AME

Input

<sequence file>

The name of a file containing a set of (primary) sequences in FASTA format. The FASTA header line of each sequence may contain a number (called a 'FASTA score') immediately following the sequence name that is used by some of AME's statistical enrichment methods.

>sequence_name score other_descriptive_text

The (optional) FASTA scores can represent any biological signal related to the sequences such as expression level, peak height or fluorescence score. If the sequences do not contain FASTA scores, some of AME's statistical enrichment methods utilize the order of the sequences in the sequence file.
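As an illustration of the header layout described above, the following is a minimal sketch of how a script might pull the sequence name and the optional FASTA score out of a header line. The function name `parse_fasta_header` is hypothetical, and the rule that a non-numeric second token counts as descriptive text is an assumption of this sketch, not a statement of how AME itself parses headers.

```python
import math
from typing import Optional, Tuple


def parse_fasta_header(line: str) -> Tuple[str, Optional[float]]:
    """Split a FASTA header line into (sequence_name, score or None)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = line[1:].split()
    if not fields:
        raise ValueError("header has no sequence name")
    name = fields[0]
    score: Optional[float] = None
    if len(fields) > 1:
        try:
            value = float(fields[1])
            # Assumption: only finite numbers count as FASTA scores.
            if math.isfinite(value):
                score = value
        except ValueError:
            # The second token is descriptive text, not a score.
            score = None
    return name, score
```

For example, `parse_fasta_header(">seq1 12.5 some text")` yields the name `seq1` with score `12.5`, while a header without a number after the name yields a score of `None`.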

<motif file>+

The names of one or more files containing MEME formatted motifs. Outputs from MEME, STREME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite.

Output

AME writes its output to files in a directory named ame_out, which it creates if necessary. You can change the output directory using the --o or --oc options. The directory will contain the following files:

ame.html - an HTML file that provides the results in a human-readable format
ame.tsv - a TSV (tab-separated values) file that provides the results in a format suitable for parsing by scripts and viewing with Excel
sequences.tsv - (optional, --method fisher only) a TSV (tab-separated values) file that lists the true- and false-positive sequences identified by AME

In all output files, only results for significantly enriched motifs are reported.

Note: See this detailed description of the AME output formats for more information.
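The inputs and output-directory options described above can be combined into a command line. The sketch below only assembles an argument list and does not run anything. It assumes the executable is named `ame` and that options precede the sequence file, which is followed by the motif files; the helper name `build_ame_command` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def build_ame_command(
    sequence_file: str,
    motif_files: Sequence[str],
    out_dir: Optional[str] = None,
    out_flag: str = "--oc",
    text: bool = False,
) -> List[str]:
    """Assemble an argument list for AME (assumed executable name: 'ame')."""
    if not motif_files:
        raise ValueError("at least one motif file is required")
    if out_flag not in ("--o", "--oc"):
        raise ValueError("out_flag must be '--o' or '--oc'")
    cmd = ["ame"]
    if text:
        # TSV output goes to standard output, so no output directory is set.
        cmd.append("--text")
    elif out_dir is not None:
        cmd += [out_flag, out_dir]
    cmd.append(sequence_file)
    cmd.extend(motif_files)
    return cmd
```

The returned list could be passed to `subprocess.run` on a system where AME is installed; with no `out_dir`, AME would write to its default directory, ame_out.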

Algorithm

Scores- AME uses two scores for each sequence in computing motif enrichment. The 'PWM score' is computed by scoring the sequence with the motif. The 'FASTA score' is taken from the sequence header line if one is provided there (see above); otherwise it is the rank of the sequence within the sequence file.

Partition maximization- AME sorts the sequences in increasing order of FASTA score, and then 'partitions' the sequences, labeling the first N sequences 'positive', and the rest 'negative'. AME computes the significance of motif enrichment using this labeling and the PWM scores, and then repeats the process using values of N from 1 to the total number of sequences. AME reports the partition with the highest significance.
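The partition maximization loop can be sketched as follows. This is a simplified illustration, not AME's implementation: it assumes a sequence counts as a motif "hit" when its PWM score reaches a fixed threshold, it uses a one-tailed Fisher's exact test as the significance measure, and it makes no attempt to correct for the number of partitions tested. The function names are hypothetical.

```python
from math import comb
from typing import List, Tuple


def fisher_right_tail(tp: int, fp: int, fn: int, tn: int) -> float:
    """One-tailed Fisher's exact test: P(at least tp hits among positives)."""
    positives = tp + fn          # sequences labeled 'positive'
    hits = tp + fp               # sequences classified as motif hits
    total = tp + fp + fn + tn
    upper = min(positives, hits)
    numerator = sum(
        comb(positives, k) * comb(total - positives, hits - k)
        for k in range(tp, upper + 1)
    )
    return numerator / comb(total, hits)


def best_partition(
    records: List[Tuple[float, float]], pwm_threshold: float
) -> Tuple[int, float]:
    """records holds (fasta_score, pwm_score) pairs; returns (N, p-value)."""
    ordered = sorted(records, key=lambda r: r[0])   # increasing FASTA score
    is_hit = [pwm >= pwm_threshold for _, pwm in ordered]
    total_hits = sum(is_hit)
    n = len(ordered)
    best_n, best_p = 0, 1.0
    tp = 0
    for size in range(1, n + 1):       # first `size` sequences are 'positive'
        tp += is_hit[size - 1]
        fn = size - tp
        fp = total_hits - tp
        tn = n - size - fp
        p = fisher_right_tail(tp, fp, fn, tn)
        if best_n == 0 or p < best_p:
            best_n, best_p = size, p
    return best_n, best_p
```

With eight sequences in which the four lowest FASTA scores all have PWM scores above the threshold and the other four do not, the most significant partition is N = 4.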

Variations- The above behavior can be modified using the options described below. For example, with some enrichment methods you can switch the roles of the FASTA and PWM scores (see options --poslist and --linreg-switchxy, below). With two enrichment methods (fisher and ranksum), you can provide control sequences (see --control, below), which causes both FASTA scores and sequence order to be ignored. Two other enrichment methods (pearson and spearman), which are based on the correlation coefficient, ignore the 'negative' sequences entirely during partition maximization. You can also define which sequences are 'positive' by specifying '--fix-partition N', which causes the first N sequences (sorted by FASTA score) to be labeled 'positive'.
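For the two correlation-based methods, the quantities involved can be sketched as below. The code computes only the Pearson and Spearman coefficients between PWM scores and FASTA scores; it does not compute their significance and does not reproduce AME's partition handling. The function names, and the use of average ranks for ties, are assumptions of this sketch.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

A monotonic but non-linear relationship between the two scores gives a Spearman coefficient of 1 while the Pearson coefficient stays below 1, which is one reason to prefer one method over the other for a given dataset.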

Options

Option	Parameter	Description	Default Behavior
General Options
--text		Output in TSV format only, to standard output.	AME behaves as if `--oc ame_out` had been specified.
--control	file	A set of control sequences in FASTA format or the keyword --shuffle--. AME will determine if each motif is enriched in the primary sequences compared to the control sequences by labeling the primary sequences 'positive' and the control sequences 'negative', and then applying the enrichment method to that labeling. The keyword --shuffle-- causes AME to create (a minimum of 1000) control sequences by shuffling the letters in each primary sequence while preserving the frequencies of k-mers (see option --kmer, below). Note: The control sequences should have (approximately) the same distribution of lengths as the primary sequences or AME may fail to correctly detect enriched motifs and will report inaccurate p-values.	AME sorts the sequences by FASTA score and performs partition maximization, labeling the first N sequences as positive, for N=1,..,number of sequences.
--kmer	k	Preserve the frequencies of k-mers when creating a control dataset by shuffling the letters of each primary sequence.	A value of 2 is used.
--seed	s	Use s as the initial random number seed when shuffling sequence letters.	A value of 1 is used.
--method	fisher|ranksum|pearson|spearman|3dmhg|4dmhg	The method for testing motif enrichment. `fisher` - the one-tailed Fisher's exact test. By default, AME performs partition maximization, labeling sequences sorted by FASTA score, and classifies them using the hit threshold (see --hit-lo-fraction, below). If you specify which sequences are 'positive' using either '--control' or '--fix-partition', AME instead maximizes over all possible PWM thresholds that are at least as large as the sequence threshold defined for the scoring method in use (see --scoring, below). `ranksum` - the one-tailed Wilcoxon rank-sum test, also known as the Mann-Whitney U test. `pearson` - the significance of the Pearson correlation coefficient between the PWM score and the FASTA score. Requires FASTA scores in all of the sequence headers. If there are fewer than 30 sequences, AME computes the mean-squared error of the linear regression between the PWM score and the FASTA score instead. Not valid with `--control`. `spearman` - the significance of Spearman's rank correlation coefficient (ρ) between the PWM score ranks and the FASTA score ranks. Not valid with `--control`. `3dmhg` and `4dmhg` - the 3-dimensional (`3dmhg`) and 4-dimensional (`4dmhg`) multi-hypergeometric tests are two-tailed tests described in McLeay and Bailey, "Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data", BMC Bioinformatics 11:165, 2010. These tests require `--scoring totalhits`; the `3dmhg` function discriminates among sequences with 0, 1 or ≥ 2 hits, and the `4dmhg` function discriminates among sequences with 0, 1, 2 or ≥ 3 hits. Note: Motifs enriched in either the primary or control sequences (or at the top or bottom of the sequences if you only give one sequence file) are considered significant by these tests. Not valid with `--control`.	The one-tailed Fisher's exact test (`fisher`) method is used for testing motif enrichment.
--scoring	avg|max|sum|totalhits	The method for scoring a single sequence for matches to a motif's PWM. The PWM score assigned to a sequence is either: `avg` - the average motif odds score of all positions in the sequence; the sequence threshold assumes that the sequence has one "hit" (see --hit-lo-fraction, below) and the rest of the sites in the sequence have an average odds of 1. `max` - the maximum motif odds score over all positions in the sequence; the sequence threshold is equal to the hit threshold (see --hit-lo-fraction, below). `sum` - the sum of the motif odds scores of all positions in the sequence; the sequence threshold assumes that the sequence has one "hit" (see --hit-lo-fraction, below) and the rest of the sites in the sequence have an average odds of 1. `totalhits` - the total number of positions in the sequence whose odds score is at least the hit threshold (see `--hit-lo-fraction`, below); the sequence threshold is 1.	The `avg` scoring method is used.
--hit-lo-fraction	fraction	The hit threshold for a motif is defined as fraction times the maximum possible log-odds score for the motif. A position is considered a "hit" if the log-odds score is greater than or equal to the hit threshold.	A value of 0.25 is used.
--evalue-report-threshold	evalue	E-value threshold for reporting a motif as significantly enriched.	A threshold of 10 is used for reporting a motif.
--fasta-threshold	score	Applies only to Fisher's exact test, when you use `--poslist pwm` and do not use `--control` or `--fix-partition`. AME will classify sequences with FASTA scores below score as 'positives'.	A maximum FASTA score of 0.001 is used by AME to classify a sequence as 'positive'.
--fix-partition	N	Causes AME to evaluate only the single partition consisting of the first N sequences. May not be used with --control or --poslist pwm.	Partition maximization is performed.
--poslist	pwm|fasta	For partition maximization, test thresholds on either X (PWM score) or Y (FASTA score). May not be used with --control or --fix-partition. `pwm` - Use PWM score (X). `fasta` - Use FASTA score (Y). Hint: Be careful switching the `poslist`. It switches between using X and Y for determining true positives in the contingency matrix, in addition to switching which of X and Y AME uses for partition maximization.	Use the FASTA score.
--log-fscores		Convert FASTA scores into log-space. Only relevant for the pearson method.	Use the FASTA score directly.
--log-pwmscores		Convert PWM scores into log-space. Only relevant for the pearson method.	Use the PWM score directly.
--linreg-switchxy		Make the x-points FASTA scores and the y-points PWM scores. Only relevant for the pearson and spearman methods.	Keep the original axis.
--noseq		(--method fisher only) Do not output the TSV (tab-separated values) file `sequences.tsv`. Note: This option is recommended when there are many motifs and many input sequences, as the TSV file can become extremely large.	AME outputs file `sequences.tsv`, which lists the true- and false-positive sequences identified by AME using Fisher's exact test.
--verbose	1|2|3|4|5	A number that regulates the verbosity level of the output information messages. If set to 1 (quiet) then AME will only output error messages, whereas the other extreme, 5 (dump), outputs lots of mostly useless information. This option is best placed first. At verbosity level 3, AME will report the significance of each partition of the sequences that it considers.	The verbosity level is set to 2 (normal).
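The `--hit-lo-fraction` and `--scoring` rows in the table above can be illustrated with a small sketch. It assumes that the per-position log-odds scores of a sequence against a motif have already been computed and that they are base-2 logarithms, so that odds = 2^(log-odds); both the base and the function names are assumptions of this sketch rather than a description of AME's internals.

```python
from typing import Sequence


def hit_threshold(max_log_odds: float, fraction: float = 0.25) -> float:
    """Hit threshold: fraction times the motif's maximum log-odds score."""
    return fraction * max_log_odds


def score_sequence(
    site_log_odds: Sequence[float], method: str, threshold: float
) -> float:
    """PWM score of one sequence under the named scoring method."""
    if not site_log_odds:
        raise ValueError("sequence has no scorable positions")
    # Assumption: log-odds are base 2, so odds = 2 ** log_odds.
    odds = [2.0 ** s for s in site_log_odds]
    if method == "avg":
        return sum(odds) / len(odds)
    if method == "max":
        return max(odds)
    if method == "sum":
        return sum(odds)
    if method == "totalhits":
        # Count positions whose log-odds score reaches the hit threshold.
        return float(sum(1 for s in site_log_odds if s >= threshold))
    raise ValueError(f"unknown scoring method: {method}")
```

For a motif whose maximum log-odds score is 8, the default fraction of 0.25 gives a hit threshold of 2, so a sequence with per-position log-odds of -2, 3, 0 and 1 contains exactly one hit.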

The MEME Suite

Motif-based sequence analysis tools

Analysis of Motif Enrichment

Usage: