ame [options] <sequence file> <motif file>+
The name of a file containing a set of (primary) sequences in FASTA format. The FASTA header line of each sequence may contain a number (called a 'FASTA score') immediately following the sequence name that is used by some of AME's statistical enrichment methods.
>sequence_name score other_descriptive_text
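As an illustration of the header layout above, the following minimal sketch splits a header line into a sequence name and an optional FASTA score. The function name `parse_fasta_header` is hypothetical and this is not AME's own parser. It assumes the score, when present, is the first whitespace-separated token after the name and that anything non-numeric there is descriptive text.

```python
from typing import Optional, Tuple


def parse_fasta_header(line: str) -> Tuple[str, Optional[float]]:
    """Split a FASTA header into (sequence_name, fasta_score_or_None)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = line[1:].split()
    if not fields:
        raise ValueError("header has no sequence name")
    name = fields[0]
    score: Optional[float] = None
    if len(fields) > 1:
        try:
            # Assumption: a numeric second token is the FASTA score.
            score = float(fields[1])
        except ValueError:
            # The second token is descriptive text, not a number.
            score = None
    return name, score
```

A header such as `>seq1 0.5 some text` would yield the name `seq1` and the score `0.5`, while `>seq2 human chr1` would yield no score.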
The names of one or more files containing MEME formatted motifs. Outputs from MEME, STREME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite.
AME writes its output to files in a directory named ame_out, which it creates if necessary. You can change the output directory using the --o or --oc options.
The directory will contain the following files:
ame.html - an HTML file that provides the results in a human-readable format
ame.tsv - a TSV (tab-separated values) file that provides the results in a format suitable for parsing by scripts and viewing with Excel
sequences.tsv - (optional, --method fisher only) a TSV (tab-separated values) file that lists the true- and false-positive sequences identified by AME
In all output files, only results for significantly enriched motifs are reported.
Note: See this detailed description of the AME output formats for more information.
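Since the TSV output is intended for parsing by scripts, a generic reader can be sketched as follows. The helper name `read_tsv` is hypothetical. The sketch assumes only that the file has a single header row, and that blank lines and lines starting with `#` are not data rows. It makes no assumption about the column names, which are described in the output format documentation.

```python
import csv
from typing import Dict, List


def read_tsv(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated file with one header row into a list of dicts."""
    with open(path, newline="") as handle:
        # Assumption: blank lines and '#' comment lines are not data rows.
        lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
        return list(csv.DictReader(lines, delimiter="\t"))
```

With the default output directory, this could be called as `read_tsv("ame_out/ame.tsv")`. Each returned dictionary maps the column headings found in the file to the values in one row.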
Scores- AME uses two scores for each sequence in computing motif enrichment. The 'PWM score' is computed by scoring the sequence with the motif. The 'FASTA score' is taken from the sequence header line if one is provided there (see above); otherwise it is the rank of the sequence within the sequence file.
Partition maximization- AME sorts the sequences in increasing order of FASTA score, and then 'partitions' the sequences, labeling the first N sequences 'positive', and the rest 'negative'. AME computes the significance of motif enrichment using this labeling and the PWM scores, and then repeats the process using values of N from 1 to the total number of sequences. AME reports the partition with the highest significance.
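The partition maximization procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not AME's implementation. The names `best_partition`, `fisher_right_tail` and `pwm_threshold` are hypothetical. The sketch assumes a sequence counts as a motif 'hit' when its PWM score reaches a fixed threshold, and it uses a one-tailed Fisher's exact test computed from the hypergeometric distribution. It applies no adjustment for having tested many partitions.

```python
from math import comb
from typing import List, Optional, Tuple


def fisher_right_tail(k: int, n_pos: int, n_hits: int, total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total, n_hits, n_pos)."""
    upper = min(n_pos, n_hits)
    numerator = sum(
        comb(n_hits, i) * comb(total - n_hits, n_pos - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(total, n_pos)


def best_partition(
    scores: List[Tuple[float, float]], pwm_threshold: float
) -> Tuple[int, float]:
    """Return (N, p-value) for the most significant partition.

    `scores` holds one (fasta_score, pwm_score) pair per sequence.
    """
    if not scores:
        raise ValueError("no sequences given")
    # Sort in increasing order of FASTA score (stable for ties).
    ordered = sorted(scores, key=lambda pair: pair[0])
    # Assumption: a sequence is a 'hit' if its PWM score meets the threshold.
    hits = [pwm >= pwm_threshold for _, pwm in ordered]
    total, n_hits = len(ordered), sum(hits)

    best: Optional[Tuple[int, float]] = None
    pos_hits = 0
    for n in range(1, total + 1):
        # The first n sequences are 'positive', the rest 'negative'.
        pos_hits += hits[n - 1]
        p = fisher_right_tail(pos_hits, n, n_hits, total)
        if best is None or p < best[1]:
            best = (n, p)
    return best
```

Restricting the loop to a single value of `n` would correspond to evaluating one fixed partition rather than searching over all of them.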
Variations- The above behavior can be modified using the options described below. For example, with some enrichment methods you can switch the roles of the FASTA and PWM scores (see options --poslist and --linreg-switchxy, below). With two enrichment methods (fisher and ranksum), you can provide control sequences (see --control, below), which causes both FASTA scores and sequence order to be ignored. Two other enrichment methods (pearson and spearman), which are based on the correlation coefficient, ignore the 'negative' sequences entirely during partition maximization. You can also define which sequences are 'positive' by specifying '--fix-partition N', which causes the first N sequences (sorted by FASTA score) to be labeled 'positive'.
Option | Parameter | Description | Default Behavior |
---|---|---|---|
General Options | | | |
--text | | Output TSV format only to standard output. | AME behaves as if --oc ame_out had been specified. |
--control | file | A set of control sequences in FASTA format or the keyword --shuffle--. AME will determine if each motif is enriched in the primary sequences compared to the control sequences by labeling the primary sequences 'positive' and the control sequences 'negative', and then applying the enrichment method to that labeling. The keyword --shuffle-- causes AME to create (a minimum of 1000) control sequences by shuffling the letters in each primary sequence while preserving the frequencies of k-mers (see option --kmer, below). Note: The control sequences should have (approximately) the same distribution of lengths as the primary sequences or AME may fail to correctly detect enriched motifs and will report inaccurate p-values. | AME sorts the sequences by FASTA score and performs partition maximization, labeling the first N sequences as positive, for N=1,..,number of sequences. |
--kmer | k | Preserve the frequencies of k-mers when creating a control dataset by shuffling the letters of each primary sequence. | A value of 2 is used. |
--seed | s | Use s as the initial random number seed when shuffling sequence letters. | A value of 1 is used. |
--method | fisher|ranksum|pearson|spearman|3dmhg|4dmhg | The method for testing motif enrichment. | The one-tailed Fisher's exact test (fisher) method is used for testing motif enrichment. |
--scoring | avg|max|sum|totalhits | The method for scoring a single sequence for matches to a motif's PWM. The PWM score assigned to a sequence depends on the method selected. | The avg scoring method is used. |
--hit-lo-fraction | fraction | The hit threshold for a motif is defined as fraction times the maximum possible log-odds score for the motif. A position is considered a "hit" if the log-odds score is greater than or equal to the hit threshold. | A value of 0.25 is used. |
--evalue-report-threshold | evalue | E-value threshold for reporting a motif as significantly enriched. | A threshold of 10 is used for reporting a motif. |
--fasta-threshold | score | Applies to the Fisher's exact test only, when you use --poslist pwm and do not use --control or --fix-partition. AME will classify sequences with FASTA scores below score as 'positives'. | A maximum FASTA score of 0.001 is used by AME to classify a sequence as 'positive'. |
--fix-partition | N | Causes AME to evaluate only the single partition consisting of the first N sequences. May not be used with --control or --poslist pwm. | Partition maximization is performed. |
--poslist | pwm|fasta | For partition maximization, test thresholds on either X (PWM score) or Y (FASTA score). May not be used with --control or --fix-partition. Note: Changing poslist switches between using X and Y for determining true positives in the contingency matrix, in addition to switching which of X and Y AME uses for partition maximization. | Use the FASTA score. |
--log-fscores | | Convert FASTA scores into log-space. Only relevant for the pearson method. | Use the FASTA score directly. |
--log-pwmscores | | Convert PWM scores into log-space. Only relevant for the pearson method. | Use the PWM score directly. |
--linreg-switchxy | | Make the x-points FASTA scores and the y-points PWM scores. Only relevant for the pearson and spearman methods. | Keep the original axis. |
--noseq | | (--method fisher only) Do not output the TSV (tab-separated values) file sequences.tsv. Note: This option is recommended when there are many motifs and many input sequences, as the TSV file can become extremely large. | AME outputs the file sequences.tsv, which lists the true- and false-positive sequences identified by AME using Fisher's exact test. |
--verbose | 1|2|3|4|5 | A number that regulates the verbosity level of the output information messages. If set to 1 (quiet), AME outputs only error messages, whereas the other extreme, 5 (dump), outputs lots of mostly useless information. This option is best placed first. At verbosity level 3, AME reports the significance of each partition of the sequences that it considers. | The verbosity level is set to 2 (normal). |
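The hit threshold rule described for --hit-lo-fraction in the table above can be illustrated with a short sketch. The function names `hit_threshold` and `count_hits` are hypothetical. The sketch assumes the motif is given as a list of positions, each holding one log-odds score per letter, so that the maximum possible score is the sum of the largest score at each position. How AME stores and scans motifs internally may differ.

```python
from typing import Sequence


def hit_threshold(
    log_odds: Sequence[Sequence[float]], fraction: float = 0.25
) -> float:
    """Return fraction times the maximum possible log-odds score."""
    # Assumption: one inner sequence per motif position, one score per letter.
    max_score = sum(max(position) for position in log_odds)
    return fraction * max_score


def count_hits(position_scores: Sequence[float], threshold: float) -> int:
    """Count scores greater than or equal to the hit threshold."""
    return sum(1 for score in position_scores if score >= threshold)
```

For a two-position motif whose best letters score 1.0 and 2.0, the maximum possible score is 3.0, so the default fraction of 0.25 gives a hit threshold of 0.75.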
If you use AME in your research, please cite the following paper:
Robert McLeay and Timothy L. Bailey,
"Motif Enrichment Analysis: A unified framework and method evaluation",
BMC Bioinformatics, 11:165, 2010, doi:10.1186/1471-2105-11-165.