fasta-subsample

Usage:

fasta-subsample <sequence file> <count> [options]

Description

Create a random subset of the sequences in a FASTA formatted file. The random seed is fixed by default, so every run of the program outputs the same subset unless a different seed is explicitly set with -seed.

Input

<sequence file>

The name of a file of sequences in FASTA format.

<count>

The number of sequences to randomly select for inclusion in the output.

Output

Writes the specified subsample of the original file to standard output in FASTA format. If -rest <file> is specified, any leftover sequences are written to that file, which is useful for cross-validation.
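The selection and the split between selected and leftover sequences can be sketched in Python as follows. This is a minimal illustration, not the program's own implementation: the function names read_fasta and subsample are hypothetical, Python's random number generator differs from the one the program uses (so the same seed picks different sequences), and keeping the records in their original order is an assumption.

```python
import random

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def subsample(records, count, seed=1):
    """Split records into (selected, rest) using a seeded generator.

    Hypothetical helper: a fixed seed gives the same split on every call.
    Raises ValueError if count exceeds the number of records.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), count))
    selected = [r for i, r in enumerate(records) if i in chosen]
    rest = [r for i, r in enumerate(records) if i not in chosen]
    return selected, rest
```

Here selected corresponds to what is written to standard output and rest to what the -rest file would receive. For cross-validation, the two lists form disjoint sets that together cover the whole input.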

Options

General Options

-seed <random seed>

The seed the random number generator uses to select the sequences. Default: a seed of 1 is used.

-rest <file>

The name of the file that receives the sequences not selected for the output. Default: the unselected sequences are discarded.

-off <offset>

The offset within each sequence at which printing starts. Default: each sequence is output from its beginning.

-len <len>

The maximum length to which printed sequences are constrained. Default: each sequence is output until its end.
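The combined effect of -off and -len on a single sequence can be pictured with a short Python sketch. The function name trim is hypothetical, and treating the offset as 0-based is an assumption; this page does not state whether the program counts from 0 or 1.

```python
def trim(sequence, off=0, length=None):
    """Return the part of sequence starting at off, at most length long.

    Assumption: off is 0-based. With the defaults, the whole sequence
    is returned, matching the default behavior of both options.
    """
    end = None if length is None else off + length
    return sequence[off:end]
```

For example, under these assumptions trim("ACGTACGT", off=2, length=3) keeps three characters starting at the third one, and a length longer than the remaining sequence simply returns everything up to its end.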