fasta-grep

Usage:

fasta-grep <re> (-dna | -prot) [options]

Description

fasta-grep displays the non-overlapping occurrences of a PERL regular expression in FASTA sequences. fasta-grep supports the IUPAC alphabets for amino acids and nucleotides.
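IUPAC support can be pictured as expanding each degenerate nucleotide code into a regular-expression character class, so that W (A or T) becomes [AT] and N becomes [ACGT]. The Python sketch below assumes that behaviour. fasta-grep's own translation is not described here, and the function name expand_iupac is hypothetical. The sketch covers nucleotide codes only and leaves escaped characters and existing [...] classes intact.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_iupac(pattern: str) -> str:
    """Hypothetical helper: rewrite IUPAC letters as regex character classes.

    Assumption: only uppercase letters are treated as IUPAC codes; the
    character after a backslash is copied verbatim, and letters already
    inside [...] are expanded in place without extra brackets.
    """
    out = []
    in_class = False
    escaped = False
    for ch in pattern:
        if escaped:                      # e.g. the 'D' in '\D'
            out.append(ch)
            escaped = False
        elif ch == "\\":
            out.append(ch)
            escaped = True
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif ch in IUPAC_DNA:
            bases = IUPAC_DNA[ch]
            out.append(bases if in_class or len(bases) == 1 else f"[{bases}]")
        else:
            out.append(ch)
    return "".join(out)

pattern = expand_iupac("WGATAAN")
print(pattern)  # [AT]GATAA[ACGT]
print([m.start() + 1 for m in re.finditer(pattern, "TTGATAAGCAGATAAC")])  # [2, 10]
```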

Input

<re>

A PERL regular expression, or a PROSITE pattern if -prosite is given.

-dna | -prot

The sequences are DNA or protein, respectively.

Reads FASTA-formatted sequences from standard input.

Output

Writes matches to the regular expression to standard output or, optionally, a file. The output has the form:

        FASTA sequence ID line followed by number of matches
        [line 1 of sequence]
        [match line 1]
        [line 2 of sequence]
        [match line 2]
        ...
        [last line of sequence]
        [last match line]
      
For proteins, occurrences are marked on the match lines as:
        >    start of occurrence
        <    end of occurrence
      
For DNA, occurrences are marked on the match lines as:
        >    start of occurrence
        <    end of occurrence
        *    start and end of two occurrences
      
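The interleaved sequence and match lines described above can be sketched in Python as follows. The column layout, the 60-residue line width, and the use of '*' wherever a start mark and an end mark land on the same position (which also covers one-residue occurrences) are assumptions for illustration, not fasta-grep's documented behaviour; match_lines is a hypothetical name.

```python
import re

def match_lines(seq: str, pattern: str, width: int = 60) -> list[str]:
    """Build alternating sequence/match lines for one sequence.

    Hypothetical layout: '>' under the first residue of each occurrence,
    '<' under the last one, '*' where a start and an end share a position.
    The ID line with the match count is not produced here.
    """
    marks = [" "] * len(seq)
    for m in re.finditer(pattern, seq):  # non-overlapping, left to right
        if m.start() == m.end():
            continue  # an empty match has no residues to mark
        for pos, sym in ((m.start(), ">"), (m.end() - 1, "<")):
            marks[pos] = sym if marks[pos] == " " else "*"
    lines = []
    for i in range(0, len(seq), width):
        lines.append(seq[i:i + width])                      # sequence line
        lines.append("".join(marks[i:i + width]).rstrip())  # match line
    return lines

for line in match_lines("MKWGSWKWGS", "W.S"):
    print(line)
# MKWGSWKWGS
#   > <  > <
```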

Options

General Options

        -dna        Sequence is DNA.
        -prot       Sequence is protein.
        -s          Print whole matching sequences only.
        -p          Print positions only, not sequences. Positions are 1-based
                    and, for DNA, relative to the input strand (by contrast,
                    -f reports positions relative to the strand of the match).
        -m          Print IDs of matching sequences only.
        -x          Print IDs of non-matching sequences only.
        -o          Print occurrences only, in "raw" format.
        -f          Print occurrences only, in FASTA format. 1-based positions
                    (relative to the strand of the match if DNA) are appended
                    to the sequence ID.
        -a          Print all occurrences, including overlapping ones; ignored
                    unless -o or -f is given.
        -norc       Only print matches to the given strand.
                    Default: print matches for both DNA strands.
        -prosite    <re> is in PROSITE format.
                    Default: <re> is a PERL regular expression.
        -erase      Replace occurrences with 'N's (-dna) or 'X's (-prot).
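The -norc and -p entries imply that, by default, DNA is searched on both strands and reverse-strand hits are reported in input-strand coordinates. A minimal Python sketch of that coordinate mapping follows, assuming reverse-strand matches are found by scanning the reverse complement; dna_hits and revcomp are hypothetical names, and only A, C, G and T are complemented here.

```python
import re

# Assumption: reverse-strand matches come from scanning the reverse complement.
# Only A/C/G/T are complemented; other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def dna_hits(seq: str, pattern: str, norc: bool = False) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) hits; start/end are 1-based, inclusive,
    and always expressed on the input strand (the -p convention)."""
    n = len(seq)
    hits = [("+", m.start() + 1, m.end()) for m in re.finditer(pattern, seq)]
    if not norc:  # default: also search the opposite strand
        for m in re.finditer(pattern, revcomp(seq)):
            # Reverse-complement slice [s, e) covers input positions
            # n - e + 1 .. n - s (1-based, inclusive).
            hits.append(("-", n - m.end() + 1, n - m.start()))
    return hits

print(dna_hits("ACGTTTATCA", "GATAA"))             # [('-', 5, 9)]
print(dna_hits("ACGTTTATCA", "GATAA", norc=True))  # []
```

Under this scheme a pattern that is its own reverse complement, such as GATC, is reported once per strand at the same coordinates.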
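The difference between the default non-overlapping search and -a can be shown with Python's re module. Wrapping the pattern in a capturing lookahead is one common way to obtain overlapping matches; it is not necessarily how fasta-grep does it, and all_occurrences is a hypothetical name.

```python
import re

def all_occurrences(seq: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, text) for matches at every start position."""
    # A capturing lookahead is zero-width, so re.finditer retries at every
    # position and overlapping occurrences are reported. At most one match
    # (the one the regex engine prefers) is returned per start position.
    return [(m.start() + 1, m.group(1)) for m in re.finditer(f"(?=({pattern}))", seq)]

print(len(re.findall("AA", "AAAA")))  # 2 (non-overlapping, the default)
print(all_occurrences("AAAA", "AA"))  # [(1, 'AA'), (2, 'AA'), (3, 'AA')]
```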
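With -prosite, <re> is given as a PROSITE pattern. The Python sketch below translates the common PROSITE elements into a regular expression: x (any residue), [..] (allowed residues), {..} (forbidden residues), (n) and (n,m) repeats, and the < and > terminal anchors, with elements separated by '-' and an optional trailing '.'. It follows the published PROSITE pattern syntax rather than fasta-grep's own converter, which is not shown here. prosite_to_regex is a hypothetical name, and forms such as [G>] are rejected rather than translated.

```python
import re

# One PROSITE element: optional '<', a residue/x/[..]/{..}, optional repeat, optional '>'.
_ELEMENT = re.compile(r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?(>?)$")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'C-x(2,4)-C.' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start, body, repeat, end = m.groups()
        if body == "x":
            body = "."                           # any residue
        elif body.startswith("{"):
            body = "[^" + body[1:-1] + "]"       # any residue except these
        if repeat:
            body += "{" + repeat + "}"           # x(3) -> .{3}, x(2,4) -> .{2,4}
        parts.append(("^" if start else "") + body + ("$" if end else ""))
    return "".join(parts)

print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]."))  # C.{2,4}C.{3}[LIVMFYWC]
print(prosite_to_regex("<M-{P}-x(2)>"))                 # ^M[^P].{2}$
```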
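The effect of -erase on a single strand can be sketched with re.sub, replacing each non-overlapping occurrence with a run of 'N' or 'X' of the same length so the sequence length is preserved. This is an illustration only; erase_occurrences is a hypothetical name, and masking of reverse-strand DNA hits is not handled here.

```python
import re

def erase_occurrences(seq: str, pattern: str, dna: bool) -> str:
    """Replace every non-overlapping occurrence with 'N' (DNA) or 'X' (protein)."""
    mask = "N" if dna else "X"
    # The replacement has the same length as the match, so positions are unchanged.
    return re.sub(pattern, lambda m: mask * len(m.group(0)), seq)

print(erase_occurrences("ACGTTGATAAC", "GATAA", dna=True))  # ACGTTNNNNNC
print(erase_occurrences("MKWGSW", "W.S", dna=False))        # MKXXXW
```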

Examples

    fasta-grep WGATAAN -dna < ~/crp0.s
    fasta-grep 'A[AT]G' -dna < ~/crp0.fasta