Usage:

meme [options] <primary sequence file>

Description


See these examples of how to use MEME.

Input

<primary sequence file>

The name of a file containing the primary sequences in FASTA format, or the word stdin to indicate that the primary sequences should be read from standard input. Note that MEME does not attempt to detect the alphabet from the primary sequences so you should specify it with the -dna, -rna, -protein or -alph options. MEME also supports a modification to the FASTA format for weighting the primary sequences.

Output

MEME writes its output to files in a directory named meme_out, which it creates if necessary. You can change the output directory using the -o or -oc options. The directory will contain:

MEME gives each motif it discovers a name, which is a consensus sequence that approximately describes the motif. Motif names are intended to be mnemonic only, and are not intended to be used for searching sequences for matches; use the position-specific probability matrix (PSPM) contained in each of the MEME output file formats for that purpose via programs such as FIMO and MAST.

Options

Option Parameter Description Default Behavior
Output
-text Output in text format only to standard output. The program behaves as if -oc meme_out had been specified.
-brief nbrief If there are more than nbrief (primary) sequences, the size of MEME's output will be reduced by suppressing the inclusion of the sequence names, motif sites and scanned sites in MEME's HTML and XML outputs, and by suppressing the tables of sequence lengths, sites and block diagrams in MEME's text output. A value of 1000 is used for nbrief.
Objective Function
-objfun classic|​ de|​ se|​ cd|​ ce|​ nc This option is used to select the objective function that MEME uses to select the width and number of sites in a motif, to calculate the motif's statistical significance, and to choose the best motif in the current iteration.
ValueNameDescription
classicClassic This is the original MEME objective function. It scores motifs using an approximation to the E-value of the information content of the motif. MEME will run much more quickly on larger datasets but the results may differ from previous versions of MEME. To more closely emulate previous versions of MEME, see the -searchsize option, below.
deDifferential Enrichment This objective function scores motifs based on the enrichment of their sites in the primary sequences compared with the control sequences. MEME reserves two subsets of primary sequences---not seen by EM or in the search for EM starts---for use in 1) choosing the width and number of sites of a given motif, and 2) computing the statistical significance of the motif, respectively. (See option -hsfrac, below.) Motif enrichment is computed (by default) using the multiple hypergeometric test (see option -test, below).
seSelective Differential Enrichment This objective function scores motifs based on the enrichment of their sites in the primary sequences compared with the control sequences. It is very similar to the de objective function, but determines the best motif width by masking all but a selected set of highest-information motif columns. (See the discussion of Motif Refinement, below.)
cdCentral Distance This objective function scores motifs based on their tendency to occur near the center of the primary sequences, which must all be of the same length. The width and number of sites of each motif are determined using the Bates distribution applied to the mean of the distances of the best sites from the centers of their sequences. MEME reserves a subset of primary sequences for use in computing the statistical significance of the motif. (See option -hsfrac, below.) Motif enrichment is computed using the Bates distribution. This objective function is designed for use with sequences from whole genome approaches such as ChIP-seq and CLIP-seq, and the primary sequences should include adequate flanking region around the expected motif sites—e.g., use sequences of 500bp for ChIP-seq.
Note: The search for starting points and their optimization are limited to the central regions of sequences, as determined by the -cefrac option, below.
ceCentral Enrichment This objective function scores motifs based on their enrichment in the central regions of the primary sequences, which must all be of same the length. (See option -cefrac, below.) The width and number of sites of each motif are determined using the binomial test. MEME reserves a subset of primary sequences for use in computing the statistical significance of the motif. (See option -hsfrac, below.) Motif enrichment is computed using the binomial test. This objective function is designed for use with sequences from whole genome approaches such as ChIP-seq and CLIP-seq, and the primary sequences should include adequate flanking region around the expected motif sites—e.g., use sequences of 500bp for ChIP-seq.
ncNumerically Correct This objective function is identical to the Classic algorithm except it scores motifs based on the estimate of the p-value of the LLR according to the NC algorithm of Hertz and Stormo, rather than using the estimate based on the product of p-values of the LLR of motif columns. Warning: Using this function causes MEME to run very slowly.
MEME uses the Classic objective function.
-test mhg|​ mbn|​ mrs The statistical test used for computing motif enrichment when the objective function is specified as Differential Enrichment (-objfun de) or Selective Differential Enrichment (-objfun se). The three options cause MEME to use the the multiple hypergeometric test (mhg), the multiple binomial test (mbn), or the multiple rank-sum test (mrs), respectively. MEME uses the multiple hypergeometric test with the de se objective functions.
-use_llr Use the log-likelihood ratio method rather than the original E-value method for evaluating EM starting points when the objective function is set to classic. (See "Search for starting points" below for more details.) The classic objective function uses the original E-value method for evaluating EM starting points.
Control Sequences, Hold-out Set and Efficiency Control
-neg control sequences The name of a file containing control sequences in FASTA format. The control sequences must be in the same sequence alphabet as the primary sequences. If the objective function requires control sequences and none are provided, MEME creates a set of control sequences by shuffling a copy of each primary sequence while preserving the frequencies of k-mers (see next option.)
-shuf kmer Preserve the frequencies of words of this size when shuffling primary sequences to create control sequences. kmer must be in the range [1,..,6].
Note: Using kmer = 1 is not advised if your sequences have been masked (e.g., using RepeatMasker) since the runs of "N"s in the masked sequence will be broken up by shuffling, resulting in shuffled sequences with very different statistical properties than the primary sequences.
MEME preserves the frequencies of words of length 2.
-hsfrac hsfrac The minimum fraction of the primary sequences that will be held out for estimating the width, number of sites and significance of motifs for certain objective functions (see above). A larger fraction of the primary sequences may be held out if the size of the primary sequences exceeds the value specified using the -searchsize option (see below).
Note: The letter frequencies in the final motifs reported by MEME are based on all primary sequences, including those in the hold-out set(s).
MEME reserves 0.5 (50%) of the primary sequences for estimating the width, number of sites and significance of motifs with objective functions that require hold-out set(s).
-cefrac cefrac The size of the central region (as a fraction of sequence length) that MEME uses for computing central enrichment of motifs when the Central Enrichment objective function (-objfun ce) is specified.
Note: Candidate starting points and EM are also restricted to this portion of the sequences with both the Central Enrichment (-objfun ce) and Central Distance (-objfun cd) objective functions.
MEME uses a central region consisting of the middle 0.25 (25%) of each sequence with the Central Enrichment objective function.
-searchsize searchsize The maximum portion of the primary sequences (in letters) used by MEME in searching for motifs. To reduce running time, if the total size of the primary sequences exceeds searchsize letters, then MEME performs the search for motifs on a random sample of the primary sequences of (approximately) searchsize letters. To ensure that the order of the primary sequences has no effect, MEME orders the sequences alphabetically (by sequence, not sequence ID) before choosing the sample (unless you specify option -norand, see below).
Note 1: This option only affects the search for motifs; the motif letter frequencies and sites reported by MEME are based on all primary sequences.
Note 2: MEME's running time will increase roughly as the square of searchsize.
Note 3: Unless you specify searchsize as 0, no individual sequence may be longer than searchsize.
Note 4: If you specify searchsize as 0, MEME will use all of the sequences (no sampling). This will cause the classic objective function to mostly closely emulate previous versions of MEME, and MEME will run very slowly on large datasets.
MEME performs sampling if the primary sequences contain more than 100,000 letters. (This is done to improve MEME's running time.)
-norand If your (primary) sequences are sorted in order of confidence (best to worst) then you should select this option. It causes MEME not to randomize the order of the input sequences if there are more than searchsize letters in the (primary) sequences (see the -searchsize option, above).
Note: Using this option may make the results of MEME depend on the order of the (primary) sequences.
The order of the input sequences is randomized before applying the -searchsize option. (This is done to ensure that MEME's results, by default, do not depend on the order of the input sequences.)
-csites csites Maximum number of sites used for computing the E-value with the Classic objective function (-objfun classic). To reduce running time, if the total number of primary sequences exceeds this value with site distribution models oops and zoops, then the search for motifs is performed on a random sample of the primary sequences of (approximately) csites sequences. If the site distribution model is anr, sequence sampling is not done, but the maximum number of sites a motif may have is limited to csites (unless this is overridden by the -maxsites or -nsites options).
Note 1: This only affects the search for motifs; the motif letter frequencies and sites reported by MEME are based on all sequences.
Note 2: MEME's running time will increase roughly as the cube of csites when it is larger than 10,000.
Note 3: This option has no effect if the objective function is not Classic.
With the Classic objective function, MEME performs sequence sampling if there are more than 1000 primary sequences (site distribution models oops and zoops), or limits the total number of sites per motif to 1000 (site distribution model anr).
Note: You can suppress sequence sampling by setting searchsize to 0 using the -searchsize option, above.
-seed seed Random seed for shuffling and sampling sequences (see above). MEME uses a random seed of 0.
Alphabet
Strands & Palindromes
-revcomp Consider both the given strand and the reverse complement strand when searching for motifs in a complementable alphabet (ie DNA). MEME searches for motifs of complementable alphabets on the given strand only.
-pal This causes MEME to only look for palindromes in complementable datasets. During EM, MEME will average the letter frequencies in complementary columns of the motif (PSPM) together. For instance, if the width of the motif is 10, columns 1 and 10, 2 and 9, 3 and 8, etc., are averaged together. For DNA the averaging would combines the frequency of A in one column with T in the other, and the frequency of C in one column with G in the other. MEME does not average the letter frequencies in complementary columns of motifs.
Contributing Site Distribution
-modoops|​zoops|​anr This option is used to describe the distribution of motif sites.
ValueNameDescription
oopsOne Occurrence Per Sequence MEME assumes that each sequence in the dataset contains exactly one occurrence of each motif. This option is the fastest and most sensitive but the motifs returned by MEME may be "blurry" if any of the primary sequences is missing them.
zoopsZero or One Occurrence Per Sequence MEME assumes that each sequence may contain at most one occurrence of each motif. This option is useful when you suspect that some motifs may be missing from some of the primary sequences. In that case, the motifs found will be more accurate than using the first option. This option takes more computer time than the first option (about twice as much) and is slightly less sensitive to weak motifs present in all of the primary sequences.
anrAny Number of Repetitions MEME assumes each sequence may contain any number of non-overlapping occurrences of each motif. This option is useful when you suspect that motifs repeat multiple times within a single sequence. In that case, the motifs found will be much more accurate than using one of the other options. This option can also be used to discover repeats within a single sequence. This option takes the much more computer time than the first option (about ten times as much) and is somewhat less sensitive to weak motifs which do not repeat within a single sequence than the other two options.
Note: tcm is a synonym for anr for backwards compatibility.
MEME assumes the Zero or One Occurrence Per Sequence model.
Number of Motifs
-nmotifsnmotifs MEME will stop searching for motifs after finding nmotifs motifs. MEME will find 1 motif.
-evtev MEME will stop searching for motifs if the last motif found has an E-value > ev, or when 1000 motifs have been found. If you also specify option -nmotifs (above), MEME will stop searching if nmotifs have been found. MEME will rely on other limits to decide when to stop searching for motifs.
-timesecs MEME will stop searching for motifs if it has found at least 1 motif and it estimates that finding any more will cause the total running time to exceed secs seconds. MEME will rely on other limits to decide when to stop searching for motifs.
Number of Motif Occurrences
-nsitesnsites When the motif site distribution model allows, MEME will only attempt to find motifs with nsites sites. Specifying this option is equivalent to setting the -minsites and -maxsites options to the same value. When the site distribution model is oops this is ignored and the number of sites is set to the number of primary sequences. See the -minsites and -maxsites options for information on the default behavior.
-minsitesminsites When the motif site distribution model allows, MEME will attempt to find motifs with at least minsites sites. When the site distribution model is oops this is ignored and the number of sites is set to the number of primary sequences. The minimum number of sites is set to 2 when it is not otherwise defined by the use of the oops model or the -nsites option.
-maxsitesmaxsites When the motif site distribution model allows, MEME will attempt to find motifs with at most maxsites sites. When the site distribution model is oops this is ignored and the number of sites is set to the number of primary sequences. When the site distribution model is zoops maxsites is set to the number of primary sequences. When the site distribution model is anr maxsites is set to 5 times the number of primary sequences.
-wnsitesweight This controls the strength of the bias towards motifs with exactly the expected number of sites as defined by the -nsites, -minsites and -maxsites options. It is a number in the range [0..1). The closer to one it is, the stronger the bias towards motifs with exactly the expected number of sites. The value of weight is set to 0.8.
Motif Width
-ww Search for motifs with a width of w. Search for motifs with widths between the range set by -minw and -maxw.
-minwminw Search for motifs with a width ≥ minw. Searches for motifs with a minimum width of 8.
-maxwmaxw Search for motifs with a width ≤ maxw. Searches for motifs with a maximum width of 50.
-allw Find starting points for EM for each possible motif width from minw to maxw, inclusive.
Note: Using this option with large motif width ranges will slow MEME substantially.
MEME finds EM starting points only for widths in a geometric progression from minw to maxw and then refines the width of the motif after running EM.
-nomatrim Do not adjust motif width using multiple alignments. With the Classic objective function, the motif is trimmed to avoid insertions and deletions.
Note: This is not done with the other objective functions, so this option has no effect with them.
-wgwg The gap opening cost for creating the alignments used to trim the motif. The opening cost for a gap is 11.
-wsws The gap extension cost for creating the alignments used to trim the motif. The extension cost for a gap is 1.
-noendgaps Do not count end gaps in the alignments used to trim the motif. End gaps are penalized like any other gap.
Background Sequence Model
-bfilebfile The name of Markov background model file. You can use a background model with MEME in order to normalize for biased distribution of letters and groups of letters in your sequences. A 0-order model adjusts for single letter biases, a 1-order model adjusts for dimer biases (e.g., GC content in DNA sequences), etc.

The background model is used by MEME for four purposes:

  1. during EM as the "null model",
  2. for calculating the log likelihood ratio of a motif,
  3. for calculating the significance (E-value) of a motif, and,
  4. for creating the position-specific scoring matrix (log-odds matrix).
If the -markov_order option is given (see below), MEME only reads the Markov model in bfile up to the given order, ignoring higher order entries.

Note that MEME uses only the 0-order portion (single letter frequencies) of the background model for purpose 4 for all objective functions other than classic, and for purposes 3 and 4 when the objective function is classic.

Note that if both strands are being considered, MEME will create a background model that averages the frequencies of letters (0-order) or words (higher order) and their reverse complements.

A background model of the order requested (see -markov_order) is estimated from the primary sequences.
-markov_order order The (maximum) order of the Markov model to read from the background file or to create from the primary sequences. If the -bfile option is not specified, MEME uses the fasta-get-markov program with a total pseudocount of 0.1 to create a Markov model from the primary sequences. MEME uses the 0-order Markov model from the background file, or creates a 0-order Markov model using an add-one prior from the primary sequences.
Position-Specific Priors
-pspPSP file The name of a MEME position-specific priors file. This can be used to bias the search for motifs in MEME. The PSP file supplies a position-specific prior distribution on the location of motif sites in sequence(s) in the input dataset. All motif sites are considered equally likely.
EM Algorithm
-maxitermaxiter The number of iterations of EM to run from any starting point. EM is run for maxiter iterations or until convergence (see -distance, below) from each starting point. MEME will use a maximum of 50 EM iterations from any starting point.
-distancedistance MEME stops iterating EM when the change in the motif frequency matrix is less than distance. Change is defined as the euclidean distance between two successive frequency matrices. The distance used for measuring convergence is 0.001.
-priordirichlet|​dmix|​mega|​megap|​addone The type of prior to use.
ValueNameDescription
dirichletSimple Dirichlet This is the default for DNA sequences. It is a simple dirichlet prior based on the background model as set by the -bfile option.
dmixDirichlets Mix This is the default for protein sequences when using the oops site distribution model. It is a mixture of Dirichlets prior. The source of the Dirichlets is specified by the -plib option.
megaMega-weight Dirichlets Mix This is an extremely low variance version of dmix. The variance is scaled inversely with the size of the dataset.
megapMega-weight Dirichlets Mix Plus This is the default for protein sequences when using the zoops or anr models. This behaves like mega until the last iteration of EM when it reverts to dmix behavior.
addoneAdd One Add +1 to each observed count.
When the prior is not selected:
ModelDNAProtein
One Occurrence Per Sequence dirichlet dmix
Zero or One Occurrence Per Sequence dirichlet megap
Any Number of Repetitions dirichlet megap
-bb The strength of the prior on model parameters: b = 0 means use intrinsic strength of prior for prior = dmix.
Priorb
dirichlet0.01
dmix0
-plibplib The name of the file containing the Dirichlet mixtures prior library. The default value of plib depends on the sequence alphabet.
Alphabetplib
DNAINSTALL_DIR/etc/dna.plib
ProteinINSTALL_DIR/etc/prior30.plib
Selecting Starts for EM
-spfuzzfuzz The fuzziness of sequence to theta mapping. The meaning of this parameter depends on the choice of mapping function as set by the -spmap option. The default value of fuzz depends on the mapping function (see -spmap).
Mapping Functionfuzz
Uniform2 / | alphabet |
e.g., 0.5 for DNA
Point Accepted Mutation120
-spmapuni|pam The mapping function to use for estimating theta.
ValueNameDescription
uniUniform Add a uniform prior of fuzz when converting a substring to an estimate of theta.
pamPoint Accepted Mutation Use PAM matrices, with the number of mutation events set by fuzz, to estimate theta.
The default mapping function depends on the sequence alphabet.
AlphabetMapping Function
DNAUniform
ProteinPoint Accepted Mutation
-consconsensus (May be repeated.) Override the sampling of starting points and just use a starting point derived from consensus. This is useful when an actual occurrence of a motif is known and can be used as the starting point for finding the motif. If this option is specified N times, starting point sampling will be suppressed for the first N motifs.
Note: With DNA and RNA sequences, motifs shorter than 6 can be difficult for MEME to find, so you may want to pad the consensus sequence with "N"s to width 6, e.g. -cons NGATAA rather than -cons GATAA.
Refer to the -spmap option for the default behavior.
Misc
-p np Use faster, parallel version of MEME with np processors. The parameter np may be a number or it may be a quoted string starting with a number and followed by arguments to the particular MPI run command for your installation (e.g., mpirun). Use a single processor.
-maxsizemaxsize Maximum allowed dataset size (in letters). You can set this in scripts to prevent very large running times. A maxsize value of 0 means "no limit".
Note: Unlike previous versions of MEME, there is no limit on dataset size by default, so this option does not normally need to be used.
No limit on dataset size.
-nostatus Print no status messages to terminal. Print minimal status messages to terminal.
-sfsf Print sf as name of sequence file. Print actual file name.
-V Print extensive status messages to terminal. Print minimal status messages to terminal.

MEME algorithm overview

MEME searches for motifs in a four step process:

  1. Search for starting points.
  2. Expectation Maximization (EM).
  3. Motif Refinement.
  4. Motif Erasing.
First, MEME searches for words in the primary sequences that are likely to be good starting points for EM. Second, MEME runs EM to convergence from each such starting point. Third, after EM converges, MEME refines the width and depth (number of sites) of the motif. Finally, MEME performs a "soft" erase procedure that essentially erases all the sites of the best motif found proportional to the certainty that the site is real. MEME repeats the above four steps to find subsequent motifs until a stopping criterion is met, as specified in the "Number of Motifs" section above.

Expectation Maximization. The EM algorithm is used to generate motif models that are optimal with respect to the primary sequences in a maximum likelihood sense. EM is prone to getting stuck in local optima, so it is important to start it from several good, initial motif models. Thus, MEME converts each candidate starting point to an initial motif model and runs EM to convergence.

Motif Refinement. MEME subjects each motif found by EM to further refinement using the selected objective function, selecting the as the final motif the one with the best value of the objective function. This refinement is over both the width and depth (number of sites) of the motif. To compute the objective function on a motif, MEME predicts all possible sites using the motif, sorts them, and then applies the objective function to the sorted sites. With the Classic and Numerically Correct objective functions, MEME tests all possible sub-motifs consisting of consecutive columns of the motif in this way. With the SE objective function, trims the motif to a single width by selecting the highest nkeep columns and masking the remainder by setting them to the background frequencies, where nkeep = max(6, 6 + floor(width-6/3). MEME uses a greedy search over sub-motifs with high information content. First, it sorts the columns of the motif by information content. Then MEME tests the sub-motifs spanned by the first 2, 3, ... etc. columns. Consequently, if the first and last motif columns have the highest information content, only the original motif is tested. With the Classic and Numerically Correct objective functions, MEME also further shortens each candidate submotif using a procedure that minimizes the number of gaps in a multiple alignment of the motifs sites.

Motif Erasing. To enable EM to find additional motifs, MEME "erases" the predicted sites of the last motif found. This is done in a "soft" fashion by adjusting a weight associated each position in the primary sequences, reducing it in proportion to its likelihood as the the start of a site for the motif. These weights are used when computing the scores of both starting points and final motifs, and a low weight causes a sequence position to be less likely to be chosen as a starting point and reduces its contribution to the motif model during EM.

Motif Objective Function

MEME uses an objective function on motifs when refining a motif, and to select the "best" motif from among all the refined motifs obtained from different starting points for EM. MEME supports several choices of objective function, which are described above under the -objfun option. Each of the objective functions is based on the significance (E-value) of the motif according to a statistical test.

The Classic objective function is based on the statistical significance of the log likelihood ratio (LLR) of the occurrences of the motif. The E-value of the motif is an estimate of the number of motifs (with the same width and number of occurrences) that would have equal or higher log likelihood ratio if the input sequences had been generated randomly according to the (0-order portion of the) background model. An approximate algorithm is used to estimate the probability of the observed (or better) LLR, and this p-value is then multiplied by an estimate of the number of possible motifs to give the E-value.

The Differential Enrichment and Selective Enrichment objective functions are based on the ability of the motif to discriminate predicted sites in the primary sequences from those in the control sequences. The motif is used to predict sites in a held-out subset of the primary sequences, and in the control sequences. The sites are sorted by their likelihood ratio scores. A statistical test is then applied to the sites in the sorted list above and below each possible score threshold. With the Selective Enrichment objective function, the best p-value among the tests is then multiplied by the total number of motifs and thresholds tested to give the E-value of the final motif. With the Discriminative Enrichment objective function, the best motif is selected as above, and then a second hold-out set of primary sequences is used for computing the final E-value. The statistical validity of the E-value is ensured by basing it on held-out primary sequences that are not used in finding the motif.

The Central Enrichment and Central Distance objective functions are based on the central enrichment of motif in the central region of the (equal-length) primary sequences. The motif is used to predict sites in each of the primary sequences, and the sites are sorted by their likelihood ratio scores. With the Central Enrichment objective function, the binomial test is then applied to the counts of the number of times the highest-scoring site occurs in the central vs. the flanking regions of the sequences. With the Central Distance objective function, the average distance of the best site in each sequence to the center of the sequence is computed, and its significance is estimated using the Bates distribution.

Multiple Alignment Trimming (Classic objective function only)

Once a candidate motif has been found the multiple alignment method performs a separate pairwise alignment of the site with the highest probability and each other possible site (the alignment includes width/2 positions on either side of the sites).

The pairwise alignments are then combined and the method determines the widest section of the motif with no insertions or deletions. If this alignment is shorter than minw, it tries to find an alignment allowing up to one insertion/deletion per motif column. This continues (allowing up to 2, 3 ... insertions/deletions per motif column) until an alignment of width at least minw is found.

The switches -nomatrim, -wg, -ws and -noendgaps control trimming of motifs using the multiple alignment method. Specifying the -nomatrim option causes MEME to skip trimming altogether. The -wg and -ws options set the costs of including gaps in the alignments and the -noendgaps allows gaps at the beginning and end to be treated specially.

After trimming, the number of occurrences is then adjusted to maximize the motif E-value, and then the motif width is further shortened to optimize the E-value.

Running time on large inputs

MEME's running time depends on the type of model you have chosen: oops < zoops < anr. Running time also depends on the objective function you have chosen: ce,cd < de, se < classic « nc, for small datasets; the order reverses (except for nc) for very large datasets. Running time also is affected by the options -csites, -searchsize, -hsfrac, or -cefrac. Running time is shorter with smaller alphabets (e.g., DNA) compared to with larger alphabets (e.g., Protein). Running time increases with the number of motif widths that MEME considers (see the options -w, -maxw, -minw, and -allw). The table below gives more details on the relationship between sequence dataset size and MEME's running time with large datasets.

objective functionrunning time per motif (for large datasets)
classic, nc approximately nlog(n) for motifs with up to csites sites, where n is the size of the sequence file; thereafter, the running time is essentially constant
de, se approximately nlog(n) for sequence files with up to searchsize divided by hsfrac letters, where n is the size of the (primary) sequence file; thereafter, the running time is essentially constant
ce, cd approximately nlog(n) for sequence files with up to searchsize divided by cefrac letters, where n is the size of the (primary) sequence file; thereafter, the running time is essentially constant

The following table shows actual running times for MEME on DNA datasets containing different numbers of 100bp sequences on a 4 GHz Intel Core i7 processor with 16GB of memory. MEME was run using either one or 6 processors (option -p), different models and objective functions (options -mod and -objfun), and the option -revcomp. (The parallel version of MEME scales up to about 128 processors. Please see https://academic.oup.com/bioinformatics/article/12/4/303/183600 for a discussion of the parallel version of MEME.)

processors OOPS model ZOOPS model ANR model
1
6

Examples

The following examples use data files provided in the MEME Suite tests/common directory. If you installed the MEME Suite using MacPorts, or if are using a Docker image, you can download the data files by clicking on their name in each of the example command lines, below.

  1. A simple DNA example:

    meme crp0.s -dna -mod oops -pal

    MEME looks for a single motif in the file crp0.s which contains DNA sequences in FASTA format. The oops site distribution model is used so MEME assumes that every sequence contains exactly one occurrence of the motif. The palindrome switch is given so the motif model (PSPM) is converted into a palindrome by combining corresponding frequency columns. MEME automatically chooses the best width for the motif in this example since no width was specified.

  2. Searching for motifs on both DNA strands:

    meme crp0.s -dna -mod oops -revcomp

    This is like the previous example except that the -revcomp switch tells MEME to consider both DNA strands, and the -pal switch is absent so the palindrome conversion is omitted. When DNA uses both DNA strands, motif occurrences on the two strands may not overlap. That is, any position in the sequence given in the input sequences may be contained in an occurrence of a motif on the positive strand or the negative strand, but not both.

  3. A fast DNA example:

    meme crp0.s -dna -mod oops -revcomp -w 20

    This example differs from example 1) in that MEME is told to only consider motifs of width 20. This causes MEME to execute about 10 times faster. The -w switch can also be used with protein datasets if the width of the motifs is known in advance.

  4. Using a higher-order background model:

    meme INO_up800.s -dna -mod anr -revcomp -bfile yeast.nc.6.freq

    In this example we use -mod anr and -bfile yeast.nc.6.freq. This specifies that

    1. the motif may have any number of occurrences in each sequence, and,
    2. the Markov model specified in yeast.nc.6.freq is used as the background model. This file contains a fifth-order Markov model for the non-coding regions in the yeast genome.

    Using a higher order background model can often result in more sensitive detection of motifs. This is because the background model more accurately models non-motif sequence, allowing MEME to discriminate against it and find the true motifs.

  5. A simple protein example:

    meme lipocalin.s -mod oops -maxw 20 -nmotifs 2

    The -dna switch is absent, so MEME assumes the file lipocalin.s contains protein sequences. MEME searches for two motifs each of width less than or equal to 20. (Specifying -maxw 20 makes MEME run faster since it does not have to consider motifs longer than 20.) Each motif is assumed to occur in each of the primary sequences because the oops distribution model is specified.

  6. Another simple protein example:

    meme farntrans5.s -mod anr -maxw 40 -maxsites 50

    MEME searches for a motif of width up to 40 with up to 50 occurrences in the entire set of input sequences. The anr site distribution model is specified, which allows each motif to have any number of occurrences in each sequence. This dataset contains motifs with multiple repeats of motifs in each sequence. This example is fairly time consuming due to the fact that the time required to initialize the motif probability tables is proportional to maxw × maxsites.

  7. A much faster protein example:

    meme farntrans5.s -mod anr -w 10 -maxsites 30 -nmotifs 3

    This time MEME is constrained to search for three motifs of width exactly ten. The effect is to break up the long motif found in the previous example. The -w switch forces motifs to be exactly ten letters wide. This example is much faster because, since only one width is considered, the time to build the motif probability tables is only proportional to maxsites.

  8. Splitting the sites into three:

    meme farntrans5.s -mod anr -maxw 12 -nsites 24 -nmotifs 3

    This forces each motif to have 24 occurrences, exactly, and be up to 12 letters wide.

  9. A larger protein example with E-value cutoff:

    meme adh.s -mod zoops -nmotifs 20 -evt 0.01

    In this example, MEME looks for up to 20 motifs, but stops when a motif is found with E-value greater than 0.01. Motifs with large E-values are likely to be statistical artifacts rather than biologically significant.

Citing