Command-line options
This implies that using the cache file with different options might cause VW to rebuild the cache. The easiest way to use a cache is to always specify the same options.

Output options:

- -a [ --audit ]: print weights of features
- -p [ --predictions ] arg: file to output predictions to
- -r [ --raw_predictions ] arg: file to output unnormalized predictions to
- --sendto arg: host to send compressed examples to
- --quiet: don't output diagnostics
- -P [ --progress ] arg: progress update frequency
- --save_per_pass: save the model after every pass over the data
- --input_feature_regularizer arg: per-feature regularization input file
- --output_feature_regularizer_binary arg: per-feature regularization output file
- --output_feature_regularizer_text arg: per-feature regularization output file in text format

The -b [ --bit_precision ] option determines the value of b, which is 18 by default. Hashing the features allows the algorithm to work with very raw data (since there's no need to assign a unique id to each feature) and has only a negligible effect on generalization performance (see, for example, Feature Hashing for Large Scale Multitask Learning).
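As a sketch of how these options combine in practice (the data and file names here, train.vw and preds.txt, are placeholders, and the command is guarded so it is a no-op when vw is not installed):

```shell
# Hypothetical data file: two examples in VW's native text format.
printf '1 | price:0.23 sqft:0.25\n0 | price:0.18 sqft:0.15\n' > train.vw

# Train with a 24-bit feature table (-b), build a cache (-c), and write
# predictions to preds.txt (-p), suppressing diagnostics (--quiet).
if command -v vw >/dev/null; then
  vw -d train.vw -c -b 24 -p preds.txt --quiet
fi
```

Rerunning the same command reuses the cache built on the first run.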
Technically, this option loads the regressor and prints out feature details when a feature is encountered in the dataset for the first time.

Update rule options:

- --sgd: use regular/classic/simple stochastic gradient descent updates, i.e., non-adaptive, non-normalized, non-invariant (no longer the default, since it is often sub-optimal)
- --adaptive: use adaptive, individual learning rates (on by default)
- --normalized: use per-feature normalized updates (on by default)
- --invariant: use safe/importance-aware updates (on by default)
- --conjugate_gradient: use conjugate gradient based optimization (option in bfgs)
- --bfgs: use bfgs optimization
- --ftrl: use FTRL-Proximal optimization
- --ftrl_alpha arg (=0.005): ftrl alpha parameter (option in ftrl)
- --ftrl_beta arg (=0.1): ftrl beta parameter (option in ftrl)
- --mem arg (=15): memory in bfgs
- --termination arg (=0.001): termination threshold
- --hessian_on: use second derivative in line search
- --initial_pass_length arg: initial number of examples per pass
- --l1 arg (=0): l_1 lambda (L1 regularization)
- --l2 arg (=0): l_2 lambda (L2 regularization)
- --decay_learning_rate arg (=1): set decay factor for learning_rate between passes
- --initial_t arg (=0): initial t value
- --power_t arg (=0.5): t power value
- -l [ --learning_rate ] arg (=0.5): set the (initial) learning rate
- --loss_function arg (=squared): specify the loss function to be used (squared by default)
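A minimal sketch of selecting an update rule and loss explicitly (the data file name is a placeholder, and the commands are skipped when vw is not installed):

```shell
# Hypothetical data file with +1/-1 labels (as required by logistic loss).
printf '1 | a b c\n-1 | b c d\n' > train.vw

if command -v vw >/dev/null; then
  # Classic SGD update with light L1/L2 regularization and logistic loss.
  vw -d train.vw --sgd --l1 1e-6 --l2 1e-6 --loss_function logistic --quiet

  # Batch L-BFGS needs multiple passes, which in turn need a cache (-c);
  # -k forces a fresh cache.
  vw -d train.vw --bfgs --mem 15 --passes 10 -c -k --quiet
fi
```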
VW removes duplicate interactions over the same set of namespaces.
These values are applied on a per-example basis in online learning (SGD), but at an aggregate level in batch learning (conjugate gradient and BFGS).
The options --initial_t, --power_t, --decay_learning_rate, and -l specify the learning rate schedule, whose generic form in the k-th epoch is

    eta_t = lambda * d^k * (t0 / (t0 + w_t))^p

where w_t is the sum of the importance weights of all examples seen so far (w_t = t if all examples have importance weight 1), lambda is the initial learning rate (-l), d is the decay factor (--decay_learning_rate), t0 is --initial_t, and p is --power_t.
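As a worked numeric instance of this schedule (the parameter values here are assumptions for illustration; in particular t0 is set to a nonzero value, unlike the default of 0):

```shell
# eta_t = lambda * d^k * (t0 / (t0 + w_t))^p with assumed values:
# lambda=0.5 (-l), d=1 (--decay_learning_rate), k=0 (first epoch),
# t0=1 (--initial_t), p=0.5 (--power_t), w_t=4 (importance-weighted
# count of examples seen so far).
awk 'BEGIN {
  lambda = 0.5; d = 1; k = 0; t0 = 1; p = 0.5; w_t = 4
  eta = lambda * d^k * (t0 / (t0 + w_t))^p
  printf "%.4f\n", eta   # 0.5 * (1/5)^0.5 = 0.2236
}'
```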
You must first create a cache file; VW then treats --initial_pass_length as the number of examples in a pass, resetting to the beginning of the file after each pass.
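A sketch of this, under the behavior described above (the data file and the pass counts are placeholders; the command is skipped when vw is not installed):

```shell
# Hypothetical 4-example data file.
printf '1 | a\n-1 | b\n1 | c\n-1 | d\n' > train.vw

if command -v vw >/dev/null; then
  # Per the text above, the first pass stops after 2 examples; later
  # passes rewind through the cache (-c), which -k rebuilds fresh.
  vw -d train.vw -c -k --passes 3 --initial_pass_length 2 --quiet
fi
```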
--hessian_on is a rarely used option for L-BFGS which changes the way a step size is computed.
- -k [ --kill_cache ]: do not reuse an existing cache; always create a new one
- --compressed: use gzip format whenever possible
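A minimal sketch combining the two (the data file name is a placeholder; the command is skipped when vw is not installed):

```shell
# Hypothetical data file.
printf '1 | a b\n-1 | c d\n' > train.vw

if command -v vw >/dev/null; then
  # -k discards any stale cache; --compressed gzips the new one.
  vw -d train.vw -c -k --compressed --quiet
fi
```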