fasta-grep <re> (-dna | -prot) [options]
fasta-grep displays the non-overlapping occurrences of a Perl regular expression in FASTA sequences.
fasta-grep supports the IUPAC alphabets for amino acids and nucleotides.
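To illustrate what IUPAC support can mean for a DNA pattern such as WGATAAN, the sketch below rewrites each ambiguity code as an ordinary regular-expression character class. This is only one plausible approach: the helper name `expand_iupac` is hypothetical, Python's `re` module stands in for Perl regular expressions, and nothing here describes how fasta-grep itself is implemented.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand_iupac(pattern: str) -> str:
    """Rewrite IUPAC ambiguity letters outside [...] as character classes.

    Hypothetical helper: letters already inside an explicit bracket
    expression are left untouched.
    """
    out = []
    in_class = False
    for ch in pattern:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if not in_class and len(IUPAC_DNA.get(ch, "")) > 1:
            out.append("[" + IUPAC_DNA[ch] + "]")
        else:
            out.append(ch)
    return "".join(out)

print(expand_iupac("WGATAAN"))  # [AT]GATAA[ACGT]
print(re.findall(expand_iupac("WGATAAN"), "CCTGATAAGCC"))  # ['TGATAAG']
```

A pattern that already uses explicit classes, such as A[AT]G, passes through unchanged.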
<re>: A Perl regular expression.
(-dna | -prot): The sequences are DNA or protein, respectively.
Reads FASTA formatted sequences from standard input.
Writes matches of the regular expression to standard output or, optionally, a file. The output is in the form:

    FASTA sequence ID line followed by number of matches
    [line 1 of sequence]
    [match line 1]
    [line 2 of sequence]
    [match line 2]
    ...
    [last line of sequence]
    [last match line]

For proteins, occurrences are marked on the match lines as:

    >  start of occurrence
    <  end of occurrence

For DNA, occurrences are marked on the match line as:

    >  start of occurrence
    <  end of occurrence
    *  start and end of two occurrences
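The marker legend can be made concrete with a small sketch that builds a match line for a single sequence line. It rests on several assumptions not confirmed by this page: unmarked positions are filled with spaces, `*` is read as a position where one occurrence starts and another ends, and occurrences do not cross a line wrap. The function name `marker_line` is hypothetical.

```python
def marker_line(length: int, spans: list[tuple[int, int]]) -> str:
    """Build a marker string for one sequence line.

    spans holds 0-based, inclusive (start, end) positions of occurrences.
    Assumed legend: '>' start, '<' end, '*' a start and an end together.
    """
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    marks = []
    for i in range(length):
        if i in starts and i in ends:
            marks.append("*")
        elif i in starts:
            marks.append(">")
        elif i in ends:
            marks.append("<")
        else:
            marks.append(" ")  # assumed filler for unmarked positions
    return "".join(marks)

seq = "CCTGATAAGCC"
print(seq)
print(marker_line(len(seq), [(2, 8)]))
```

Printed one above the other, the `>` sits under the first character of the occurrence and the `<` under its last.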
Option | Parameter | Description | Default Behavior |
---|---|---|---|
General Options | | | |
-dna | | Sequence is DNA. | |
-prot | | Sequence is protein. | |
-s | | Print whole matching sequences only. | |
-p | | Print positions only, not sequence. Positions are 1-based (relative to the input strand if DNA; see below). | |
-m | | Print IDs of matching sequences only. | |
-x | | Print IDs of non-matching sequences only. | |
-o | | Print occurrences only, in "raw" format. | |
-f | | Print occurrences only, in FASTA format. 1-based positions (relative to the strand of the match if DNA) are appended to the sequence ID. | |
-a | | Print all occurrences (even overlapping ones); ignored unless -o or -f is given. | |
-norc | | Only print matches to the given strand. | Print matches for both DNA strands. |
-prosite | | <re> is in PROSITE format. | <re> is a Perl regular expression. |
-erase | | Replace occurrences with 'N's (-dna) or 'X's (-prot). | |
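The difference between the default non-overlapping occurrences and the overlapping ones that -a requests can be sketched with a zero-width lookahead, a common regular-expression technique. This is an illustration only: the function `occurrences` is hypothetical, Python's `re` module differs from Perl in some details, and the sketch does not claim to reproduce fasta-grep's own logic or output format.

```python
import re

def occurrences(pattern: str, seq: str, overlapping: bool = False):
    """Return (1-based start, matched text) pairs for pattern in seq."""
    if overlapping:
        # A zero-width lookahead consumes no characters, so the scan moves
        # forward one position at a time and reports matches that share
        # characters with each other.
        regex = re.compile("(?=(" + pattern + "))")
        return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    # Plain scanning resumes after the end of each match, which yields
    # non-overlapping occurrences.
    return [(m.start() + 1, m.group(0)) for m in re.finditer(pattern, seq)]

print(occurrences("AA", "CAAAG"))                    # [(2, 'AA')]
print(occurrences("AA", "CAAAG", overlapping=True))  # [(2, 'AA'), (3, 'AA')]
```

Positions are reported 1-based here to mirror the convention used by -p and -f.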
fasta-grep WGATAAN -dna < ~/crp0.s
fasta-grep A[AT]G -dna < ~/crp0.fasta