fimo [options] <motifs> <database>
The name FIMO stands for "Find Individual Motif Occurrences." The program searches a database of DNA or protein sequences for occurrences of known motifs, treating each motif independently.
The program uses a dynamic programming algorithm to convert log-odds scores (in bits) into p-values, assuming a zero-order background model. By default the program reports all motif occurrences with a p-value less than 1e-4. The threshold can be set using the --thresh option.
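The score-to-p-value conversion described above can be sketched as a column-by-column dynamic program. This is a minimal illustration, not FIMO's implementation: it assumes the motif is a list of per-position dicts mapping each letter to a log-odds score in bits, that scores are rounded to a grid of 1/`scale` bits (a hypothetical discretization parameter), and that the background is zero-order, so each letter is drawn independently with its background probability.

```python
from collections import defaultdict

def score_pvalue_table(log_odds, background, scale=100):
    """Build a function mapping a log-odds score (bits) to P(score >= s).

    log_odds:   list of dicts, one per motif position, letter -> score in bits
    background: dict letter -> probability under a zero-order model
    scale:      hypothetical grid; scores are rounded to 1/scale bits
    """
    # dist[k] = probability that the positions processed so far sum to scaled score k.
    dist = {0: 1.0}
    for column in log_odds:
        new_dist = defaultdict(float)
        for partial, prob in dist.items():
            for letter, bits in column.items():
                # Each letter occurs independently with its background probability.
                new_dist[partial + round(bits * scale)] += prob * background[letter]
        dist = dict(new_dist)

    def pvalue(score_bits):
        target = round(score_bits * scale)
        # Tail probability: total mass of all achievable scores >= target.
        return sum(p for s, p in dist.items() if s >= target)

    return pvalue
```

Because each motif column is folded in one at a time, the work grows with the motif width times the number of distinct partial scores, rather than exponentially with the width.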
The p-value of each motif occurrence is converted to a q-value following the method of Benjamini and Hochberg (the q-value is defined as the minimal false discovery rate at which a given motif occurrence is deemed significant). The --qv-thresh option directs the program to use q-values rather than p-values for the threshold.
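A minimal sketch of Benjamini-Hochberg q-values as defined above: each q-value is the smallest false discovery rate at which that p-value would be called significant, i.e. the minimum of p * m / rank over its own rank and all less significant ranks. The `num_tests` parameter is an assumption of this sketch; how FIMO counts the total number of tests m may differ.

```python
def bh_qvalues(pvalues, num_tests=None):
    """Benjamini-Hochberg q-values, returned in the same order as pvalues.

    num_tests: total number of tests m; defaults to len(pvalues) (an assumption).
    """
    m = len(pvalues) if num_tests is None else num_tests
    # Indices of the p-values ordered from most to least significant.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    qvalues = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the least significant p-value down, so each q-value is the
    # minimum of p * m / rank over this rank and every larger rank.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For example, `bh_qvalues([0.01, 0.04, 0.03])` gives approximately `[0.03, 0.04, 0.04]`: the raw value for 0.03 at rank 2 would be 0.045, but it is lowered to the 0.04 of the less significant match at rank 3.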
If a motif has the strand feature set to +/- (rather than +), then fimo will search both strands for occurrences.
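Searching both strands amounts to scoring every window with the motif and with its reverse complement. The sketch below assumes a DNA log-odds matrix stored as a list of per-position dicts (a hypothetical representation) and a sequence containing only uppercase A, C, G and T; it reports the score on each strand for every window, with 1-based start positions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement_matrix(log_odds):
    """Reverse the column order and swap each base with its complement."""
    return [{COMPLEMENT[base]: score for base, score in column.items()}
            for column in reversed(log_odds)]

def scan_both_strands(sequence, log_odds):
    """Yield (start, strand, score) for every window; start is 1-based."""
    width = len(log_odds)
    rc = reverse_complement_matrix(log_odds)
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        forward = sum(column[base] for column, base in zip(log_odds, window))
        reverse = sum(column[base] for column, base in zip(rc, window))
        yield (i + 1, "+", forward)
        yield (i + 1, "-", reverse)
```

Scoring the forward sequence against a reverse-complemented matrix avoids building a reverse-complemented copy of the sequence, which matters for large databases.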
The --max-stored-scores parameter sets the maximum number of motif occurrences that will be retained in memory; it defaults to 100,000. If the number of matches found reaches this maximum, FIMO discards the least significant 50% of the stored matches, and any subsequent match that is less significant than the retained matches is also discarded.
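The retention policy described above can be sketched as a bounded store. This is an illustration only: the class `BoundedMatchStore` is hypothetical, a smaller p-value is treated as more significant, and after each purge the worst retained p-value becomes a cutoff that rejects less significant newcomers. As noted in the option table, once matches have been dropped the reported list may be incomplete and q-values only approximate.

```python
class BoundedMatchStore:
    """Hypothetical bounded store: keeps at most max_stored matches in memory."""

    def __init__(self, max_stored=100_000):
        if max_stored < 2:
            raise ValueError("max_stored must be at least 2")
        self.max_stored = max_stored
        self.matches = []   # (pvalue, match) pairs; smaller p-value = more significant
        self.cutoff = None  # set at the first purge: worst p-value still retained

    def add(self, pvalue, match):
        # After a purge, discard matches less significant than those retained.
        if self.cutoff is not None and pvalue > self.cutoff:
            return
        self.matches.append((pvalue, match))
        if len(self.matches) >= self.max_stored:
            self._purge()

    def _purge(self):
        # Keep the most significant half and tighten the cutoff accordingly.
        self.matches.sort(key=lambda pair: pair[0])
        del self.matches[len(self.matches) // 2:]
        self.cutoff = self.matches[-1][0]
```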
FIMO can make use of position specific priors (PSP) to improve its identification of true motif occurrences. To take advantage of PSPs in FIMO you must provide two command line options: --psp sets the name of a MEME PSP file, and --prior-dist sets the name of a file containing the binned distribution of priors.
<motifs>: A file containing a list of motifs in MEME format.
<database>: A file containing a collection of sequences in FASTA format.
If only one motif is supplied to FIMO, then a dash ('-') can be used to indicate that the sequence data should be read from standard input.
The FASTA header lines are used as the source of sequence names. The sequence name is the string following the initial '>' up to the first whitespace character. If the sequence name is of the form text:number-number, the text portion will be used as the sequence name. The numbers will be used as genomic coordinates, and the first number will be used as the coordinate of the first position of the sequence. In all other cases the coordinate of the first position of the sequence is taken as 1.
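The naming rules above can be sketched as a small parser. It is a minimal illustration, assuming that "number" means a non-negative decimal integer and that the text part may itself contain colons (the regular expression matches the last `:number-number` suffix); the function name `parse_fasta_header` is hypothetical.

```python
import re

# "text:number-number"; the text part is greedy, so it may itself contain colons.
COORD_RE = re.compile(r"^(.+):(\d+)-(\d+)$")

def parse_fasta_header(header):
    """Return (sequence_name, coordinate_of_first_position) for a FASTA header line."""
    if not header.startswith(">"):
        raise ValueError("FASTA header lines start with '>'")
    # The name runs from just after '>' up to the first whitespace character.
    name = re.match(r"\S*", header[1:]).group(0)
    match = COORD_RE.match(name)
    if match:
        # Genomic form: the text is the name, the first number is the start coordinate.
        return match.group(1), int(match.group(2))
    return name, 1
```

For example, `>chr1:1000-2000 promoter` yields `("chr1", 1000)`, while `>seqA promoter` yields `("seqA", 1)`.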
FIMO will create a directory, named fimo_out by default. Any existing output files in the directory will be overwritten. The directory will contain:
- fimo.xml: results in XML, using the CisML schema
- fimo.html: results in HTML
- fimo.text: results in plain text
- fimo.gff: results in GFF format
The default output directory can be overridden using the --o or --oc options which are described below.
The --text option will limit output to plain text sent to the standard output. This will disable the calculation of q-values.
The score reported in the GFF output is min(1000, -10*(log10(pvalue))).
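The GFF score formula above is easy to check numerically. In this sketch the handling of a p-value of exactly 0 (returning the cap of 1000, since log10(0) is undefined) is an assumption, not documented FIMO behaviour.

```python
import math

def gff_score(pvalue):
    """GFF score for a match: min(1000, -10 * log10(pvalue))."""
    if pvalue <= 0.0:
        return 1000.0  # assumption: treat a zero p-value as the cap
    return min(1000.0, -10.0 * math.log10(pvalue))
```

A match at the default threshold p = 1e-4 scores 40, and the score saturates at 1000 for any p-value of 1e-100 or smaller.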
The HTML and plain text output include a strand column: '+' indicates the motif matched the forward strand, '-' the reverse strand, and '.' indicates that strand is not applicable (as for amino acid sequences). The HTML and plain text output is sorted by increasing p-value.
Option | Parameter | Description | Default Behaviour |
---|---|---|---|
General Options | |||
--alpha | num | The alpha parameter for calculating position specific priors. Alpha represents the fraction of all transcription factor binding sites that are binding sites for the TF of interest. Alpha must be between 0 and 1. | An alpha value of 1 is used. |
--bgfile | background file | Read background frequencies from background file. The file should be in MEME background file format. If the argument is the keyword motif-file, then the frequencies will be taken from the motif file. | The default is to use frequencies embedded in the application from the non-redundant database. |
--max-seq-length | max | Set the maximum length allowed for input sequences to max. | A maximum sequence length of 250,000,000 is used. |
--max-strand | | If matches on both strands at a given position satisfy the output threshold, only report the match for the strand with the higher score. | Both matches are reported. |
--max-stored-scores | max | Set the maximum number of scores that will be stored. Keeping a complete list of scores may exceed available memory. Once the number of stored scores reaches the maximum allowed, the least significant 50% of scores will be dropped. In this case, the list of reported matches may be incomplete and the q-value calculation will be approximate. | The maximum number of stored matches is 100,000. |
--motif | id | Use only the motif identified by id. This option may be repeated. | Use all motifs. |
--motif-pseudo | count | A pseudocount to be added to each count in the motif matrix, after first multiplying by the corresponding background frequency. | A pseudocount of 0.1 is used. |
--no-qvalue | | Do not compute a q-value for each p-value. The q-value calculation is that of Benjamini and Hochberg (1995). | The q-values are calculated. |
--norc | | Do not score the reverse complement DNA strand. | Both strands are scored. |
--parse-genomic-coord | | When this option is specified, each sequence header will be checked for UCSC-style genomic coordinates of the form >sequence name:starting position-ending position. The starting position is then used as the coordinate of the first position of the sequence. | The first position in the sequence will be assumed to be 1. |
--psp | file | File containing position specific priors (PSP) in MEME PSP format. | |
--prior-dist | file | File containing binned distribution of priors. This file can be generated from a MEME PSP format file using the compute-prior-dist utility. | |
--qv-thresh | | Directs the program to use q-values for the output threshold. | The program thresholds on p-values. |
--text | | Limits output to plain text sent to standard out. For FIMO, the text output is unsorted, and q-values are not reported. This mode allows the program to search an arbitrarily large database, because results are not stored in memory. | |
--thresh | num | The output threshold for displaying search results. Only search results with a p-value less than the threshold will be output. The threshold can be set to use q-values rather than p-values via the --qv-thresh option. | The threshold is a p-value of 1e-4. |
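
The --motif-pseudo adjustment can be sketched as follows. This is a minimal illustration under stated assumptions: the motif is stored as per-position frequencies, the counts are recovered by multiplying by a known number of sites (`nsites`, an assumed input), and each adjusted column is renormalized to frequencies. FIMO's exact bookkeeping may differ, and the function name is hypothetical.

```python
def apply_motif_pseudocount(freqs, background, nsites, pseudocount=0.1):
    """Adjust a motif frequency matrix with a background-weighted pseudocount.

    freqs:      list of dicts, one per position, letter -> frequency
    background: dict letter -> background frequency
    nsites:     number of sites the frequencies came from (assumed known)
    """
    adjusted = []
    for column in freqs:
        # Recover counts, then add pseudocount * background frequency to each.
        counts = {b: column.get(b, 0.0) * nsites + pseudocount * background[b]
                  for b in background}
        total = sum(counts.values())
        # Renormalize so each column sums to 1 again.
        adjusted.append({b: c / total for b, c in counts.items()})
    return adjusted
```

The effect is that a letter never seen in the motif sites still receives a small nonzero frequency, so its log-odds score is finite.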
If you use FIMO in your research, please cite the following paper:
Charles E. Grant, Timothy L. Bailey, and William Stafford Noble,
"FIMO: Scanning for occurrences of a given motif",
Bioinformatics, 27(7):1017-1018, 2011.