fasta-holdout-set [options] --p <primary sequences>
Split primary and control sequences into training and testing sets. Control sequences are generated by shuffling if not specified. Primary sequences will be centrally trimmed to a specified length if requested.
The name of a file containing the primary (positive) sequences in FASTA format. The file must contain at least two valid sequences or fasta-holdout-set will reject it.
fasta-holdout-set writes its output to files in a directory named
fasta-holdout-set_out, which it creates if necessary. You can change the
output directory using the --o or --oc options.
The directory will contain:
train_pos.fa - the primary training set
train_neg.fa - the control training set
test_pos.fa - the primary testing (hold-out) set
test_neg.fa - the control testing (hold-out) set
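As a quick sanity check after a run, the four output files listed above can be inspected with a short script. The following is a minimal sketch and not part of fasta-holdout-set itself: the helper names `count_fasta_records` and `summarize_output` are hypothetical, and the sketch assumes the default output directory name and that every record starts with a `>` header line.

```python
from pathlib import Path

# File names taken from the list of output files above.
OUTPUT_FILES = ("train_pos.fa", "train_neg.fa", "test_pos.fa", "test_neg.fa")


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_output(out_dir="fasta-holdout-set_out"):
    """Map each expected output file to its record count, or None if absent.

    Hypothetical helper: a missing file is reported as None rather than
    raising an error, because the hold-out files are not created when the
    hold-out fraction is 0.
    """
    out_dir = Path(out_dir)
    summary = {}
    for name in OUTPUT_FILES:
        path = out_dir / name
        summary[name] = count_fasta_records(path) if path.is_file() else None
    return summary


if __name__ == "__main__":
    for name, count in summarize_output().items():
        print(f"{name}: {count if count is not None else 'not found'}")
```

If you changed the output directory with --o or --oc, pass that directory to `summarize_output` instead of relying on the default.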
Note: All options may be preceded by a single dash (-) instead of a double dash (--) if desired.
|--n||control sequences||The name of a file containing control (negative) sequences in
The control sequences must be in the same sequence alphabet as the primary sequences.
If the average length of the control sequences is longer than that of
the primary sequences,
||If you do not provide control sequences, |
|--order||m||If you do not provide control sequences, |
|--hofract||hofract||The fraction of the primary and control sequences that
Note: If a value of 0 is specified, no hold-out set output files are created and the training set files will contain all the original sequences.
Note: If a hold-out set would contain fewer than 5 sequences,
|--seed||seed||Random seed for shuffling and sampling the hold-out set sequences (see above).||
|--ccut||size||Trim the primary sequences to their central region of size characters before creating the control sequences and before splitting the sequences into training and testing sets. A value of 0 indicates that the primary sequences should not be trimmed. Note: If you provide control sequences they will never be trimmed.||A value of 0 is used.|
|--verbosity||1|2|3|4|5||A number that regulates the verbosity level of the output
information messages. If set to 1 (quiet) then
||The verbosity level is set to 2 (normal).|
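The effect of the --ccut, --hofract and --seed options can be illustrated with a short sketch. This is an assumption-laden illustration, not the tool's actual implementation: the function names `central_trim` and `holdout_split` are hypothetical, and the rounding rule for the central region, the rounding of the hold-out size, and the random number generator are all choices made here for the example.

```python
import random


def central_trim(seq, size):
    """Return the central `size` characters of `seq`.

    A size of 0 means no trimming, matching the --ccut description.
    Assumption: sequences no longer than `size` are returned unchanged,
    and when the surplus is odd the extra character is cut from the right.
    """
    if size <= 0 or len(seq) <= size:
        return seq
    start = (len(seq) - size) // 2
    return seq[start:start + size]


def holdout_split(seqs, hofract, seed=0):
    """Split `seqs` into (training, testing) lists.

    Roughly `hofract` of the sequences go to the testing (hold-out) list.
    A fraction of 0 keeps every sequence in the training list.
    Assumption: the hold-out size is rounded to the nearest integer and
    members are chosen by a seeded shuffle.
    """
    if hofract == 0:
        return list(seqs), []
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(seqs)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * hofract))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    primary = ["ACGTACGTAC", "GGGCCCAAAT", "TTTTACGTGG", "CATGCATGCA"]
    trimmed = [central_trim(s, 4) for s in primary]  # trim before splitting
    train, test = holdout_split(trimmed, hofract=0.5, seed=1)
    print("training:", train)
    print("testing:", test)
```

Trimming is applied before the split in this sketch because the --ccut description states that primary sequences are trimmed before the control sequences are created and before the sets are divided.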