fasta-shuffle-letters

Usage:

fasta-shuffle-letters [options] <sequence file> [<output file>]

Description

The program fasta-shuffle-letters creates a shuffled version of a FASTA file. The letters in each sequence of the input file are shuffled so that k-mer frequencies are exactly preserved, where k is 1 by default but may be set with the -kmer option. If an alphabet is specified via -dna, -rna, -protein or -alph, aliased symbols are converted to their core symbols before shuffling. If no alphabet is specified, aliased symbols are not converted and are treated as distinct letters, which may not be what you want.

The underlying implementation uses uShuffle.
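To make "k-mer frequencies are preserved" concrete, the following Python sketch counts overlapping k-mers before and after a plain random permutation of the letters. It is only an illustration of the k = 1 case: the helper names kmer_counts and shuffle_letters are hypothetical, and the code is not the uShuffle algorithm. A plain permutation always keeps letter (1-mer) counts, but it generally does not keep 2-mer counts, which is why larger values of k need a k-mer-aware shuffle.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_letters(seq, rng):
    """Plain permutation of the letters (hypothetical helper, not uShuffle).

    This preserves 1-mer counts only.
    """
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


seq = "ACGTACGTTTGACA"
shuffled = shuffle_letters(seq, random.Random(0))

# Letter (1-mer) counts are identical after any permutation.
print(kmer_counts(seq, 1) == kmer_counts(shuffled, 1))

# 2-mer counts usually differ after a plain permutation; keeping them
# equal is what a k = 2 shuffle has to guarantee.
print(kmer_counts(seq, 2) == kmer_counts(shuffled, 2))
```

Comparing kmer_counts on an input sequence and its shuffled counterpart is also a simple way to check the output of a shuffle for a chosen k.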

Input

<sequence file>

The name of a file containing sequences in FASTA format.

<output file>

(Optional) The name of a file to receive the shuffled sequences.

Output

Writes the shuffled sequences in FASTA format to the output file, or to standard output if no output file is specified.

Options

General Options

-kmer k

Shuffle the sequences so that the frequencies of all words of length k are preserved. Setting this number is a balance: the larger the number, the more realistic the resulting sequences will be, but setting k too large will prevent the sequences from being shuffled at all. A value of 2 is used by MEME-ChIP. For values larger than 1, specifying the alphabet is highly recommended because it allows aliases to be translated, which may be important in some cases, such as soft-masked sequence.

Default: A value of 1 is used.

-preserve pos

The position (1-relative) to preserve within each sequence during shuffling. All other positions are shuffled in each sequence, but position pos remains untouched. If pos is greater than the sequence length, the entire sequence is shuffled.

Default: Each sequence is shuffled in its entirety.

-fix char

Preserve the positions of all occurrences of character char. All other positions are shuffled in each sequence, but all occurrences of character char remain untouched.

Default: Each sequence is shuffled in its entirety.

-copies num

The number of shuffled copies to create for each sequence in the source.

Default: A single shuffled sequence is created for each sequence in the source.

-line num

The sequences will be output with a maximum of num symbols per line.

Default: A line length of 100 is used.

-tag text

The name of the sequence will have text appended to it.

Default: The name of the sequence will have "_shuf" appended to it.

-seed num

Set the seed of the random number generator to num.

Default: The random number generator is seeded from the computer as randomly as possible.
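The behavior described for -preserve, -fix, -seed, -tag and -line can be sketched in Python for the simplest case, k = 1. This is an illustration under stated assumptions, not the program's implementation: the functions shuffle_with_fixed and format_record are hypothetical, combining a fixed character with a preserved position in one call is a choice made for the example, and the sketch does not address how fixed positions interact with k-mer preservation for k greater than 1.

```python
import random


def shuffle_with_fixed(seq, fix_char=None, preserve_pos=None, seed=None):
    """Shuffle letters (k = 1) while leaving selected positions untouched.

    Hypothetical helper: fix_char mimics the described -fix behavior,
    preserve_pos (1-relative) mimics -preserve, seed mimics -seed.
    """
    rng = random.Random(seed)
    fixed = set()
    if fix_char is not None:
        fixed.update(i for i, c in enumerate(seq) if c == fix_char)
    # A position beyond the sequence length is ignored, so the whole
    # sequence is shuffled in that case.
    if preserve_pos is not None and 1 <= preserve_pos <= len(seq):
        fixed.add(preserve_pos - 1)

    free = [i for i in range(len(seq)) if i not in fixed]
    letters = [seq[i] for i in free]
    rng.shuffle(letters)

    out = list(seq)
    for i, c in zip(free, letters):
        out[i] = c
    return "".join(out)


def format_record(name, seq, tag="_shuf", line=100):
    """Build a FASTA record with a tagged name and wrapped sequence lines."""
    chunks = [seq[i:i + line] for i in range(0, len(seq), line)]
    return ">" + name + tag + "\n" + "\n".join(chunks) + "\n"


original = "ACNNGTNACGTA"
result = shuffle_with_fixed(original, fix_char="N", preserve_pos=1, seed=42)
print(format_record("seq1", result, line=5), end="")
```

With the same seed the sketch returns the same shuffled sequence on every run, which mirrors the reason for setting -seed when reproducible output is needed.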