fimo [options] <motif file> <sequence file>
You can set the statistical threshold (p-value) for reporting motif occurrences and choose whether FIMO scans only the given sequences or, where applicable, their reverse complements as well.
The program uses a dynamic programming algorithm to convert log-odds
scores into p-values, assuming a zero-order background model.
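The conversion described above can be illustrated with a small sketch. It assumes the log-odds matrix has already been scaled and rounded to integer scores; the function name `score_pvalues` and the toy matrix are hypothetical, and this is not FIMO's actual implementation. The dynamic program builds the exact distribution of total scores under an independent (zero-order) background, then sums the upper tail to obtain a p-value for each attainable score.

```python
from collections import defaultdict

def score_pvalues(int_pssm, background):
    """Return {score: P(total score >= score)} for an integer-scaled PSSM.

    int_pssm   -- list of dicts, one per motif position: letter -> integer score
    background -- dict letter -> probability (zero-order background model)
    """
    # dist[s] = probability that the positions processed so far sum to s
    dist = {0: 1.0}
    for column in int_pssm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for letter, bg in background.items():
                nxt[s + column[letter]] += p * bg
        dist = dict(nxt)
    # Accumulate the upper tail, starting from the highest score
    pvalues, tail = {}, 0.0
    for s in sorted(dist, reverse=True):
        tail += dist[s]
        pvalues[s] = tail
    return pvalues

# Toy two-column motif (illustrative values only)
bg = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
]
pv = score_pvalues(pssm, bg)
```

Because positions are treated as independent, the work grows with motif width times the number of distinct integer scores, rather than with the number of possible sequences.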
By default the program reports all motif occurrences with a p-value less than 1e-4. The threshold can be set using the --thresh option.
The p-values for each motif occurrence are converted to q-values following the method of Benjamini and Hochberg ("q-value" is defined as the minimal false discovery rate at which a given motif occurrence is deemed significant). The --qv-thresh option directs the program to use q-values rather than p-values for the threshold.
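As a rough illustration of the Benjamini and Hochberg conversion mentioned above, the sketch below computes a q-value for each p-value in a list. The function name `bh_qvalues` is hypothetical, and the sketch assumes the list passed in is the complete set of tests; it is not a description of how FIMO itself organizes the calculation.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # estimated FDR over all thresholds that still include this occurrence.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues

q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
```

The running minimum is what makes the q-values monotone: an occurrence can never have a larger q-value than a less significant one.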
If a motif has the strand feature set to +/- (rather than +), then FIMO will search both strands for occurrences.
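To make the idea of searching both strands concrete, here is a minimal sketch for a DNA alphabet. It uses exact string matching as a stand-in for motif scoring, and the names `reverse_complement` and `find_both_strands` are hypothetical; FIMO scores windows against a motif matrix rather than comparing strings.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_both_strands(seq, site):
    """Return (start, strand) pairs, 1-based, for exact matches of `site`.

    A '-' hit means the window matches the reverse complement of `site`.
    """
    hits = []
    rc_site = reverse_complement(site)
    width = len(site)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == site:
            hits.append((i + 1, "+"))
        if window == rc_site:
            hits.append((i + 1, "-"))
    return hits

hits = find_both_strands("AACGTTTGCA", "AAC")
```

Comparing each window against the reverse-complemented site is equivalent to scanning the reverse-complemented sequence with the original site, but it keeps all coordinates on the given strand.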
The parameter --max-stored-scores sets the maximum number of motif occurrences that will be retained in memory. It defaults to 100,000. If the number of matches found reaches the maximum value allowed, FIMO will discard 50% of the least significant matches, and new matches falling below the significance level of the retained matches will also be discarded.
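The pruning behaviour described above can be sketched as a small bounded container. The class name `BoundedMatchStore` and its details (for example, using the least significant retained p-value as the cutoff for later matches) are assumptions made for illustration, not FIMO's internal data structure.

```python
class BoundedMatchStore:
    """Keep at most `max_stored` matches, dropping the weaker half when full."""

    def __init__(self, max_stored):
        if max_stored < 2:
            raise ValueError("max_stored must be at least 2")
        self.max_stored = max_stored
        self.matches = []    # list of (pvalue, payload) tuples
        self.cutoff = None   # set after the first pruning step

    def add(self, pvalue, payload):
        """Store a match; return False if it is rejected as too weak."""
        if self.cutoff is not None and pvalue > self.cutoff:
            return False
        self.matches.append((pvalue, payload))
        if len(self.matches) >= self.max_stored:
            self._prune()
        return True

    def _prune(self):
        # Sort by significance and keep the most significant half.
        self.matches.sort(key=lambda m: m[0])
        self.matches = self.matches[: len(self.matches) // 2]
        # Later matches weaker than everything retained are discarded.
        self.cutoff = self.matches[-1][0]

store = BoundedMatchStore(max_stored=4)
for p in (0.04, 0.01, 0.03, 0.02):
    store.add(p, None)
```

Once pruning has happened, the stored list is no longer a complete record of all matches, which is why q-values computed from it can only be approximate.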
FIMO can make use of position-specific priors (PSPs) to improve its identification of true motif occurrences. When priors are provided FIMO uses log-posterior odds scores instead of log-odds scores. The log-posterior odds score is described in this paper:
To take advantage of PSPs in FIMO you must provide two command line options. The --psp option is used to set the name of a file containing the PSP, and the --prior-dist option is used to set the name of a file containing the binned distribution of the PSP.
The PSP can be provided in MEME PSP file format or in wiggle format. The MEME PSP file format requires that a PSP be included for every position in the sequence to be scanned. This format is usually only practical for relatively small sequence files. The wiggle format accommodates sequence segments with missing PSP values. When no PSP is available for a given position, FIMO will use the median PSP from the PSP distribution file. The wiggle format will work with large sequence files, including full genomes.
The PSP and PSP distribution files can be generated from raw scores using the create-priors utility.
The name of a file containing MEME formatted motifs. Outputs from MEME, STREME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite.
The name of a file containing a collection of sequences in FASTA format.
If only one motif is supplied to FIMO then a hyphen ('-') can be used to indicate that the sequence data should be read from standard input.
FIMO will create a directory, named fimo_out by default. Any existing output files in the directory will be overwritten. The directory will contain:
fimo.html - an HTML file that provides the results in a human-readable format
fimo.tsv - a TSV (tab-separated values) file that provides the results in a format suitable for parsing by scripts and viewing with Excel
fimo.gff - a GFF3 format file that provides the results in a format suitable for display in the UCSC genome browser
best_site.narrowPeak - a Best Site narrowPeak Format file that provides the genomic coordinates and score of the best site for each motif in each sequence
cisml.xml - an XML file that provides the results in the CisML schema
fimo.xml - an XML file that describes the inputs to FIMO and references the CisML file cisml.xml
The default output directory can be overridden using the --o or --oc options which are described below.
The --text option will limit output to TSV (tab-separated values) results sent to the standard output. This will also disable the calculation and printing of q-values.
Note: See this detailed description of the FIMO output formats for more information.
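Since the TSV output is intended for parsing by scripts, a minimal reader might look like the sketch below. The function name `read_fimo_tsv` is hypothetical, the column names (`start`, `stop`, `p-value`) are assumptions that should be checked against the header line of your own file, and the in-memory example data is invented purely for illustration.

```python
import csv
import io

def read_fimo_tsv(handle):
    """Yield one dict per motif occurrence from a FIMO-style TSV stream."""
    # Skip blank lines and '#' comment lines that may accompany the data rows.
    rows = (line for line in handle
            if line.strip() and not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        row["start"] = int(row["start"])
        row["stop"] = int(row["stop"])
        row["p-value"] = float(row["p-value"])
        yield row

# Invented example content standing in for a real output file
example = io.StringIO(
    "motif_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\n"
    "M1\tchr1\t10\t17\t+\t12.5\t1e-05\n"
    "# comment line\n"
)
hits = list(read_fimo_tsv(example))
```

To read a real file, pass an open file handle (for example `open("fimo_out/fimo.tsv")`) instead of the `io.StringIO` object.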
Option | Parameter | Description | Default Behavior |
---|---|---|---|
Output | | | |
--skip-matched-sequence | | Like the --text option, this limits output to tab-separated values (TSV) sent to standard out, but in addition, turns off output of the sequence of motif matches. This speeds up processing considerably. | All formats are output to files in the selected output directory. |
--best-site | | Limits output to FIMO Best Site narrowPeak Format sent to standard output. This output provides the genomic coordinates of the single best site for each motif in each sequence. Only sites that pass the significance threshold (see "Scoring", below) are considered. | All formats are output to files in the selected output directory. |
Motifs | | | |
--motif | id | Use only the motif identified by id. This option may be repeated. | Use all motifs. |
--motif-pseudo | count | A pseudocount to be added to each count in the motif matrix, after first multiplying by the corresponding background frequency | A pseudocount of 0.1 is used. |
Sequences | | | |
Background Model and Priors | | | |
Scoring | | | |
--thresh | num | The output threshold for displaying search results. Only search results with a p-value less than the threshold will be output. The threshold can be set to use q-values rather than p-values via the --qv-thresh option. | The threshold is a p-value of 1e-4. |
--qv-thresh | | Directs the program to use q-values for the output threshold. | The program thresholds on p-values. |
--no-qvalue | | Do not compute a q-value for each p-value. The q-value calculation is that of Benjamini and Hochberg (1995). | The q-values are calculated. |
--norc | | Do not score the reverse complement strand. | Both strands are scored if the alphabet is complementable. |
--max-strand | | If matches on both strands at a given position satisfy the output threshold, only report the match for the strand with the higher score. If the scores are tied, the matching strand is chosen at random. | Both matches are reported. |
--max-stored-scores | max | Set the maximum number of scores that will be stored. Keeping a complete list of scores may exceed available memory. Once the number of stored scores reaches the maximum allowed, the least significant 50% of scores will be dropped. In this case, the list of reported motifs may be incomplete and the q-value calculation will be approximate. | The maximum number of stored matches is 100,000. |
Misc | | | |
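The effect of the --motif-pseudo option listed in the table above can be illustrated for a single motif column. This sketch follows the wording of the option description (the pseudocount is multiplied by the background frequency before being added to each count); the function name `apply_pseudocount` and the example counts are hypothetical, and the final normalization step is an assumption about how the adjusted counts become frequencies.

```python
def apply_pseudocount(counts, background, pseudo=0.1):
    """Return smoothed letter frequencies for one motif column.

    counts     -- dict letter -> observed count in this column
    background -- dict letter -> background frequency (sums to 1)
    pseudo     -- total pseudocount, spread according to the background
    """
    adjusted = {a: counts[a] + pseudo * background[a] for a in background}
    total = sum(adjusted.values())
    return {a: adjusted[a] / total for a in adjusted}

background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
freqs = apply_pseudocount({"A": 10, "C": 0, "G": 0, "T": 0}, background)
```

Without the pseudocount, the letters with a count of zero would have frequency zero and an undefined (negative infinite) log-odds score; the smoothing keeps every score finite.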