centrimo [options] <primary sequence file> <motif file>+
The name of a file containing FASTA formatted sequences, ideally all of the same length. The sequences in this file are referred to as the "primary sequences" when a second set of (control) sequences is provided using the --neg option (see below).
The names of one or more files containing MEME formatted motifs. Outputs from MEME, STREME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite.
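As an illustration of the synopsis above, the following minimal Python sketch assembles a CentriMo argument list. The helper name `build_centrimo_command` and the file names are hypothetical; the only options used are `--oc` and `--neg`, both of which are described further down this page. The sketch only builds the command and does not run it.

```python
from typing import List, Optional, Sequence


def build_centrimo_command(
    primary_fasta: str,
    motif_files: Sequence[str],
    out_dir: Optional[str] = None,
    control_fasta: Optional[str] = None,
) -> List[str]:
    """Assemble: centrimo [options] <primary sequence file> <motif file>+"""
    if not motif_files:
        # The synopsis requires one or more motif files.
        raise ValueError("at least one motif file is required")
    cmd = ["centrimo"]
    if out_dir is not None:
        cmd += ["--oc", out_dir]  # change the output directory
    if control_fasta is not None:
        cmd += ["--neg", control_fasta]  # optional control sequences
    cmd.append(primary_fasta)  # positional: primary sequence file
    cmd.extend(motif_files)  # positional: one or more motif files
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_centrimo_command("peaks.fa", ["motifs.meme"], out_dir="centrimo_peaks")
print(" ".join(cmd))
```

If CentriMo is installed and on the `PATH`, the resulting list could be passed to `subprocess.run(cmd, check=True)`.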
CentriMo writes its output to files in a directory named centrimo_out, which it creates if necessary. You can change the output directory using the --o or --oc options.
The directory will contain:
centrimo.html - an HTML file that provides the results in a human-readable format; this file allows interactive selection of which motifs to plot the positional distribution for, as well as control over smoothing and other plotting parameters
centrimo.tsv - a TSV (tab-separated values) file that provides the results in a format suitable for parsing by scripts and viewing with Excel
site_counts.txt - a text file that lists, for each motif and each sequence position, the number of sequences where the best match of the motif occurs at the given position
Note: See the detailed description of the CentriMo output formats for more information.
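To make the quantity reported in site_counts.txt concrete, here is a minimal Python sketch that tallies, for each motif and sequence position, how many sequences have their best match at that position. It illustrates the count itself and not the layout of the file; the function name `tally_site_counts`, the motif IDs and the positions are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_site_counts(best_sites: Iterable[Tuple[str, int]]) -> Dict[str, Counter]:
    """Count best-match positions per motif.

    Each input item is (motif_id, position of the best match in one sequence).
    """
    counts: Dict[str, Counter] = {}
    for motif_id, position in best_sites:
        counts.setdefault(motif_id, Counter())[position] += 1
    return counts


# Hypothetical best-match positions, one entry per (motif, sequence) pair.
best_sites = [("motifA", 50), ("motifA", 50), ("motifA", 48), ("motifB", 10)]
site_counts = tally_site_counts(best_sites)

for motif_id, per_position in site_counts.items():
    for position in sorted(per_position):
        print(motif_id, position, per_position[position])
```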
Option | Parameter | Description | Default Behavior |
---|---|---|---|
Input/Output | |||
--neg | control sequence file | Plot the motif distributions in this set (the control sequences) as well. Also, for each enriched region in the primary sequences, the significance of the relative enrichment of the motif in that region in the primary versus control sequences is evaluated using Fisher's exact test. | |
--seqlen | length | Use only sequences of length length, ignoring all other sequences in the input file(s). | Use sequences with the same length as the first sequence, ignoring all other sequences in the input file(s). |
Scanning | |||
--score | S | The score threshold for predicting motif sites. By default, motif log-odds scores are used and the threshold S is in bits. If option --use-lo-fraction is given, motif log-odds scores are still used, but the score threshold is S times the maximum log-odds score possible for the motif. If option --use-pvalues is given, adjusted motif p-values are used instead of log-odds scores. Sequences without a match with score ≥ S (log-odds) or ≤ S (p-values) are ignored. | A threshold of 5 bits (default), 25% of maximum (--use-lo-fraction), or 0.05 (--use-pvalues) is used. |
--use-lo-fraction | | The score threshold S (see option --score, above) gives the fraction of the maximum log-odds score for a motif site. Not compatible with option --use-pvalues. | The score threshold S is in bits. |
--use-pvalues | | Use the adjusted motif p-value of potential sites for scoring motifs. The p-value of a potential site is adjusted for the number of possible positions a site could occur in the given sequence. See option --score for how to set the p-value threshold. Not compatible with option --use-lo-fraction. | Use the motif log-odds score for scoring motifs. |
--norc | | Scan only the given strand of sequences. | Scan the given and reverse complement strands of sequences with complementable alphabets. |
--sep | | Create a reverse complement for each given motif and scan separately with both. Note: this option implies --norc. | Scan with the given motifs only. |
--flip | | Reverse complement matches appear 'reflected' around sequence centers. | Do not 'flip' the sequence; use the reverse complement of the motif instead. |
Enrichment | |||
--optimize-score | | Search for the optimal score subject to the constraint given by the --score option. | The score threshold is used (see option --score, above). |
--maxreg | max region | The maximum region size to consider. | Try all region sizes up to the sequence width. |
--minreg | min region | The minimum region size to consider. Must be less than max region. | Try regions 1 bp and larger. |
--local | | Compute enrichment of all regions. | Compute enrichment of central regions. |
--cd | | Measure enrichment using the average distance between the center of the best site and the sequence center. The score threshold is varied to optimize the significance of the (small) distance, which is computed using the cumulative Bates distribution. Note 1: Only sequences with a site at or above the minimum score are considered (see the --score option, above). Note 2: If a sequence has ties for best site, their average distance is used. Note 3: This option implies the --optimize-score option, and may not be used with options --local, --neg, --minreg or --maxreg. | Enrichment is measured by counting the number of times the best site occurs in the central region vs. the flanks of the sequence. |
Output filtering | |||
--ethresh | thresh | Limit the results to motifs with an enriched region whose E-value is less than thresh. Enrichment E-values are computed by first adjusting the binomial p-value of a region for the number of regions tested using the Bonferroni correction, and then multiplying the adjusted p-value by the number of motifs in the input to CentriMo. | Include motifs with E-values up to 10. |
Miscellaneous | |||
--noseq | | Do not store sequence IDs in the output of CentriMo. | CentriMo stores a list of the sequence IDs with matches in the best region for each motif. This can potentially make the file size much larger. |
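The --score row in the table above describes three ways of interpreting the threshold S. The following minimal Python sketch restates that rule as a predicate. The function name `site_passes` and the mode labels are hypothetical and are not part of CentriMo; the comparisons follow the table text (score ≥ S for log-odds, ≤ S for p-values).

```python
from typing import Optional


def site_passes(
    score: float,
    threshold: float,
    mode: str = "bits",
    max_log_odds: Optional[float] = None,
) -> bool:
    """Return True if a candidate site meets the threshold S.

    mode "bits":        score is a log-odds score, S is in bits.
    mode "lo-fraction": score is a log-odds score, S is a fraction of the
                        maximum log-odds score possible for the motif.
    mode "pvalues":     score is an adjusted p-value, S is a p-value cutoff.
    """
    if mode == "bits":
        return score >= threshold
    if mode == "lo-fraction":
        if max_log_odds is None:
            raise ValueError("max_log_odds is required for mode 'lo-fraction'")
        return score >= threshold * max_log_odds
    if mode == "pvalues":
        return score <= threshold
    raise ValueError(f"unknown mode: {mode!r}")


# Defaults quoted in the table: 5 bits, 25% of maximum, p-value 0.05.
print(site_passes(6.0, 5.0))
print(site_passes(4.0, 0.25, mode="lo-fraction", max_log_odds=20.0))
print(site_passes(0.01, 0.05, mode="pvalues"))
```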
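The --neg row in the table above says that relative enrichment in primary versus control sequences is evaluated with Fisher's exact test. The sketch below is a generic one-sided Fisher's exact test on a 2x2 table, written in plain Python. How CentriMo fills the table and whether it uses a one-sided test are assumptions made here for illustration, and the function name `fisher_exact_greater` is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table

                      in region   not in region
        primary           a             b
        control           c             d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b  # number of primary sequences
    col1 = a + c  # sequences with the best site in the region
    total = a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    numer = sum(
        comb(col1, x) * comb(total - col1, row1 - x) for x in range(a, upper + 1)
    )
    return numer / denom


# Hypothetical counts: 3 of 4 primary vs. 1 of 4 control sequences in the region.
p = fisher_exact_greater(3, 1, 1, 3)
print(p)
```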
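The --ethresh row in the table above describes how an enrichment E-value is derived from a region's binomial p-value. A minimal Python sketch of that chain follows. It assumes the textbook Bonferroni form min(1, p × number of tests) and a uniform null in which the success probability is the fraction of positions covered by the region; CentriMo's exact formulas may differ. The function names and example numbers are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def region_evalue(p_value: float, n_regions_tested: int, n_motifs: int) -> float:
    """Bonferroni-adjust a region p-value, then scale by the number of motifs."""
    adjusted = min(1.0, p_value * n_regions_tested)  # assumed Bonferroni form
    return adjusted * n_motifs


# Hypothetical example: 100 sequences with a best site, 30 of them inside a
# region that covers 10% of the possible positions (assumed uniform null).
p_region = binomial_tail(30, 100, 0.10)
e_value = region_evalue(p_region, n_regions_tested=50, n_motifs=20)
print(p_region, e_value)
```

A motif would then be reported when its E-value is within the limit set by --ethresh.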