fasta-subsample <sequence file> <count> [options]
Create a random subset of the sequences in a FASTA-formatted file. The random seed is fixed by default, so every run of the program outputs the same subset unless a different seed is given with -seed.
<sequence file>: The name of a file of sequences in FASTA format.
<count>: The number of sequences to randomly select for inclusion in the output.
Writes a FASTA-formatted file to standard output containing the specified subsample of the original file. If -rest file is specified, the leftover (unselected) sequences are written to file, which is useful for cross-validation.
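
The selection and leftover split described above can be illustrated with a minimal Python sketch. This is not the program's implementation: the names `read_fasta` and `subsample` are hypothetical, keeping the input order within each group is an assumption, and Python's random number generator will not reproduce the subsets that fasta-subsample itself picks. Only the default seed of 1 is taken from this document.

```python
import random


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subsample(records, count, seed=1):
    """Split records into (selected, rest) with a seeded random choice.

    Assumption: both groups keep the input order. random.Random.sample
    raises ValueError if count exceeds the number of records.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), count))
    selected = [r for i, r in enumerate(records) if i in chosen]
    rest = [r for i, r in enumerate(records) if i not in chosen]
    return selected, rest


# Hypothetical toy input: four short sequences.
fasta = ">a\nACGT\n>b\nGGCC\n>c\nTTAA\n>d\nCCGG\n"
records = read_fasta(fasta)
selected, rest = subsample(records, 2)   # the sampled subset and the leftovers
again, _ = subsample(records, 2)         # same seed, so the same subset
print(selected == again)                 # True
```

In a cross-validation setting, `selected` would play the role of the standard output and `rest` the role of the file named by -rest.
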
Option | Parameter | Description | Default Behavior |
---|---|---|---|
General Options | | | |
-seed | random seed | Seed used by the random number generator to select the sequences. | A seed of 1 is used. |
-rest | file | Name of the file to which the sequences not selected for the output are written. | The unselected sequences are discarded. |
-off | offset | The offset within each sequence at which printing starts. | The sequence is output from its beginning. |
-len | len | The maximum length of each printed sequence. | The sequence is output until its end. |
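
The effect of -off and -len on a single sequence can be sketched in Python as follows. The function name `trim` is hypothetical, and treating the offset as 1-based is an assumption, since this document does not say how the offset is counted.

```python
def trim(sequence, offset=1, max_len=None):
    """Return the part of sequence starting at offset, capped at max_len.

    Assumption: offset is 1-based, so offset=1 means "from the beginning".
    max_len=None means "until the end of the sequence".
    """
    if offset < 1:
        raise ValueError("offset must be >= 1 in this sketch")
    start = offset - 1
    end = None if max_len is None else start + max_len
    return sequence[start:end]


print(trim("ACGTACGT"))                       # ACGTACGT (both defaults)
print(trim("ACGTACGT", offset=3))             # GTACGT
print(trim("ACGTACGT", offset=3, max_len=4))  # GTAC
```

A sequence shorter than the requested length is returned as far as it goes, which matches the description of -len as a maximum rather than an exact length.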