fasta-grep

fasta-grep displays the non-overlapping occurrences of a Perl regular expression in FASTA sequences. It supports the IUPAC alphabets for amino acids and nucleotides.
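As a rough illustration of what IUPAC support means for a DNA pattern, the following Python sketch expands nucleotide ambiguity codes into character classes and lists non-overlapping matches. The helper names `iupac_to_regex` and `find_occurrences` are hypothetical and this is not fasta-grep's own code. It expands every letter naively, so it only suits plain patterns without bracket expressions or escapes.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Expand each IUPAC letter of a plain pattern into a character class."""
    return "".join(IUPAC_DNA.get(ch, ch) for ch in pattern.upper())

def find_occurrences(pattern, sequence):
    """Return 1-based, inclusive (start, end) pairs of non-overlapping matches."""
    regex = re.compile(iupac_to_regex(pattern))
    # re.finditer scans left to right and never reports overlapping matches.
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence.upper())]
```

Under these assumptions, `WGATAAN` becomes `[AT]GATAA[ACGT]`, which is then matched like any other regular expression.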

<re>

A Perl regular expression.

-dna | -prot

The sequences are DNA or protein, respectively.

Reads FASTA formatted sequences from standard input.
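For readers who want to experiment with the same kind of input, here is a minimal sketch of a FASTA reader in Python. The function name `read_fasta` is hypothetical, and the sketch assumes the simplest form of the format: an ID line starting with `>` followed by one or more sequence lines.

```python
def read_fasta(handle):
    """Yield (id_line, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new ID line closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line.strip():
            # Sequence lines are concatenated; blank lines are skipped.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)
```

Passing `sys.stdin` as `handle` would read records from standard input, one sequence at a time.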

Writes matches to the regular expression to standard output or, optionally, to a file. The output has the form:

        FASTA sequence ID line followed by number of matches
        [line 1 of sequence]
        [match line 1]
        [line 2 of sequence]
        [match line 2]
        ...
        [last line of sequence]
        [last match line]

For proteins, occurrences are marked on the match lines as:

        >    start of occurrence
        <    end of occurrence

For DNA, occurrences are marked on the match lines as:

        >    start of occurrence
        <    end of occurrence
        *    start and end of two occurrences
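The marker scheme can be sketched as follows. This is an illustration only: `match_line` is a hypothetical helper, and it assumes that `*` is written wherever the markers of two occurrences fall on the same column, which is one reading of the legend above.

```python
def match_line(width, occurrences):
    """Build the marker line for one sequence line of the given width.

    occurrences: (start, end) pairs, 0-based and inclusive, within this line.
    """
    marks = [" "] * width
    for start, end in occurrences:
        for pos, symbol in ((start, ">"), (end, "<")):
            # A column that already holds a marker becomes '*'.
            marks[pos] = symbol if marks[pos] == " " else "*"
    return "".join(marks).rstrip()
```

For example, two occurrences covering columns 0 to 4 and 4 to 8 share column 4, so that column is marked with `*`.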

Option	Description	Default Behavior
General Options
-dna	Sequence is DNA.
-prot	Sequence is protein.
-s	Print whole matching sequences only.
-p	Print 1-based positions only, not sequence (relative to the input strand if DNA).
-m	Print IDs of matching sequences only.
-x	Print IDs of non-matching sequences only.
-o	Print occurrences only in "raw" format.
-f	Print occurrences only in FASTA format. 1-based positions (relative to the strand of the match if DNA) are appended to the sequence ID.
-a	Print all occurrences (even overlapping ones); ignored unless -o or -f given.
-norc	Only print matches to the given strand.	Print matches for both DNA strands.
-prosite	<re> is in PROSITE format.	<re> is interpreted as a Perl regular expression.
-erase	Replace occurrences with 'N's (-dna) or 'X's (-prot).
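The difference between the default non-overlapping search and the -a behavior, and the effect of -erase, can be sketched in Python. The functions `occurrences` and `erase` are hypothetical names, not part of fasta-grep, and the sketch assumes a pattern without backreferences, since the extra capture group shifts group numbering.

```python
import re

def occurrences(regex, sequence, overlapping=False):
    """Return 1-based, inclusive (start, end) pairs for matches of regex."""
    if overlapping:
        # A zero-width lookahead consumes nothing, so the scan retries
        # at every position and also reports overlapping matches.
        finder = re.compile("(?=(" + regex + "))")
        return [(m.start(1) + 1, m.end(1)) for m in finder.finditer(sequence)]
    return [(m.start() + 1, m.end()) for m in re.finditer(regex, sequence)]

def erase(regex, sequence, fill="N"):
    """Replace each non-overlapping occurrence with a run of the fill character."""
    return re.sub(regex, lambda m: fill * len(m.group(0)), sequence)
```

With the pattern `ATA` and the sequence `ATATATA`, the non-overlapping search reports two occurrences while the overlapping search reports three; using `fill="X"` mirrors the protein case.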

    fasta-grep WGATAAN -dna < ~/crp0.s
    fasta-grep 'A[AT]G' -dna < ~/crp0.fasta

The MEME Suite

Motif-based sequence analysis tools
