Usage: fasta-holdout-set [options] <primary sequences>

<primary sequences>

The name of a file containing the primary (positive) sequences in FASTA format. The file must contain at least two valid sequences or fasta-holdout-set will reject it.
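Below is a minimal Python sketch of the input check described above. It assumes that "valid" simply means a record with a non-empty sequence; the tool's actual validity rules (for example, alphabet checks) are not described here. `read_fasta` and `check_primary` are hypothetical helpers, not part of the MEME Suite.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # sequence lines may span several rows
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_primary(text: str) -> List[Tuple[str, str]]:
    """Reject input with fewer than two valid sequences."""
    # ASSUMPTION: "valid" is taken to mean a non-empty sequence.
    seqs = [(h, s) for h, s in read_fasta(text) if s]
    if len(seqs) < 2:
        raise ValueError(f"need at least 2 valid sequences, found {len(seqs)}")
    return seqs
```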

fasta-holdout-set writes its output to files in a directory named fasta-holdout-set_out, which it creates if necessary. You can change the output directory using the --o or --oc options. The directory will contain:

train_pos.fa - the primary training set
train_neg.fa - the control training set

Unless --hofract (see below) is set to 0, the output directory will also contain:

test_pos.fa - the primary testing (hold-out) set
test_neg.fa - the control testing (hold-out) set
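The split between training and hold-out files can be sketched in Python as follows. This is an illustration only: `split_holdout` is a hypothetical helper, and both the rounding of the hold-out size and the choice to keep every sequence in training when the hold-out set would be too small are assumptions, not documented behavior. A returned `None` stands for the --hofract 0 case (no hold-out file is written), and an empty list stands for an empty hold-out file.

```python
import random
from typing import List, Optional, Tuple


def split_holdout(seqs: List[str], hofract: float = 0.1, seed: int = 0,
                  min_test: int = 5) -> Tuple[List[str], Optional[List[str]]]:
    """Return (train, test); test is None when no hold-out file should exist."""
    if hofract == 0:
        return list(seqs), None  # --hofract 0: training gets every sequence
    n_test = round(hofract * len(seqs))  # ASSUMPTION: rounding rule is a guess
    if n_test < min_test:
        # ASSUMPTION: too few to hold out, so the test file is empty and
        # every sequence stays in training.
        return list(seqs), []
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(seqs)), n_test))
    train = [s for i, s in enumerate(seqs) if i not in chosen]
    test = [s for i, s in enumerate(seqs) if i in chosen]
    return train, test
```

Applying such a split separately to the primary and control sequences would give the contents of train_pos.fa/test_pos.fa and train_neg.fa/test_neg.fa.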

Note: All options may be preceded by a single dash (-) instead of a double dash (--) if desired.

Option	Parameter	Description	Default Behavior
Output
--n	control sequences	The name of a file containing control (negative) sequences in FASTA format. The control sequences must use the same sequence alphabet as the primary sequences. If the average length of the control sequences is greater than that of the primary sequences, `fasta-holdout-set` trims the control sequences so that both sets have the same average length.	If you do not provide control sequences, `fasta-holdout-set` creates them by shuffling a copy of each primary sequence with an m-order shuffle (see --order). The shuffle also keeps non-core (e.g., ambiguous) characters at their original positions in each sequence to avoid artifacts.
--order	m	If you do not provide control sequences, `fasta-holdout-set` performs an m-order shuffle of each primary sequence to create the control sequences. This preserves the frequencies of words of length m+1 in each shuffled sequence. Unless you specify a background model file (option --bfile), `fasta-holdout-set` also estimates an m-order Markov background model from the control sequences (or from the primary sequences if you do not provide control sequences). It creates the Markov model with the `fasta-get-markov` program, using a total pseudocount of 1. m must be in the range 0 to 5.	`fasta-holdout-set` uses m=2 for DNA and RNA, and m=0 for protein and custom alphabets.
--hofract	hofract	The fraction of the primary and control sequences that `fasta-holdout-set` randomly selects and places in the hold-out set output files. Note: If the value is 0, no hold-out set output files are created and the training set files contain all of the original sequences. Note: If a hold-out set would contain fewer than 5 sequences, `fasta-holdout-set` writes an empty output file for it.	`fasta-holdout-set` places 0.1 (10%) of the primary and control sequences in the hold-out set output files.
--seed	seed	The random seed used for shuffling (see --n) and for sampling the hold-out set sequences (see --hofract).	`fasta-holdout-set` uses a seed of 0.
--ccut	size	Trim each primary sequence to its central region of length size before creating the control sequences and before splitting the sequences into training and testing sets. A value of 0 means the primary sequences are not trimmed. Note: --ccut never trims control sequences that you provide.	A value of 0 is used (no trimming).
--verbosity	1|2|3|4|5	A number that sets the verbosity of informational messages. If set to 1 (quiet), `fasta-holdout-set` outputs only warning and error messages; at the other extreme, 5 (dump) outputs a large amount of information intended for debugging.	The verbosity level is set to 2 (normal).
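An m-order shuffle that preserves the count of every word of length m+1 can be built from an Euler walk, in the style of the Altschul and Erickson dinucleotide shuffle: each m-letter word is a vertex, each (m+1)-letter word is an edge, and a random walk that uses every edge exactly once spells out a shuffled sequence. The Python sketch below chooses each vertex's final exit edge with Wilson's algorithm so that the walk cannot get stuck early. It is not necessarily how `fasta-holdout-set` shuffles, and it does not reproduce the tool's handling of non-core characters; `m_order_shuffle` and `word_counts` are hypothetical names.

```python
import random
from collections import Counter, defaultdict


def word_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def m_order_shuffle(seq: str, m: int, seed: int = 0) -> str:
    """Shuffle seq so that the count of every (m+1)-letter word is unchanged."""
    rng = random.Random(seed)
    n = len(seq)
    if m == 0:  # 0-order: only letter counts matter, a plain permutation suffices
        letters = list(seq)
        rng.shuffle(letters)
        return "".join(letters)
    if n <= m + 1:  # at most one (m+1)-letter word: nothing to rearrange
        return seq
    # Vertices are m-letter words; each (m+1)-letter word is an edge prefix -> suffix.
    out = defaultdict(list)
    for i in range(n - m):
        out[seq[i:i + m]].append(seq[i + 1:i + m + 1])
    start, end = seq[:m], seq[n - m:]
    # Wilson's algorithm: a random tree of "last exit" edges pointing toward `end`.
    last_idx = {}
    in_tree = {end}
    for v in list(out):
        u = v
        while u not in in_tree:  # random walk; revisits overwrite, erasing loops
            last_idx[u] = rng.randrange(len(out[u]))
            u = out[u][last_idx[u]]
        u = v
        while u not in in_tree:  # add the loop-erased path to the tree
            in_tree.add(u)
            u = out[u][last_idx[u]]
    # Shuffle each vertex's exits but keep its tree edge for last.
    for v, targets in out.items():
        if v in last_idx:
            tree_edge = targets.pop(last_idx[v])
            rng.shuffle(targets)
            targets.append(tree_edge)
        else:  # only `end` has no tree edge
            rng.shuffle(targets)
    # Follow unused exits from `start`; the tree guarantees every edge is used.
    used = dict.fromkeys(out, 0)
    letters = [start]
    u = start
    for _ in range(n - m):
        w = out[u][used[u]]
        used[u] += 1
        letters.append(w[-1])  # each edge contributes its final letter
        u = w
    return "".join(letters)
```

For example, `m_order_shuffle(seq, 2, seed=0)` keeps every trinucleotide count of `seq`, matching the default m=2 for DNA. Because the walk starts at the first m-letter word and ends at the last one, those flanks are also unchanged when m > 0.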

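The --ccut trimming step can be sketched as below. `central_trim` is a hypothetical helper, and two details are assumptions: when the number of removed characters is odd, the extra one is dropped on the right, and sequences no longer than size are kept whole.

```python
def central_trim(seq: str, size: int) -> str:
    """Keep the central `size` characters of seq (size 0 means no trimming)."""
    if size <= 0 or len(seq) <= size:
        return seq  # ASSUMPTION: sequences no longer than `size` are kept whole
    start = (len(seq) - size) // 2  # ASSUMPTION: odd surplus loses one more on the right
    return seq[start:start + size]
```

Following the order given for --ccut, the trim would be applied to the primary sequences before they are shuffled into controls and before the hold-out split, e.g. `[central_trim(s, 100) for s in primary]`.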
The MEME Suite

Motif-based sequence analysis tools

fasta-holdout-set
