MEME -- Multiple EM for Motif Elicitation

Motif discovery tool


The input to MEME contains the following fields.

  • e-mail address
    This is the address where the confirmation message will be sent. Make sure this is a valid e-mail address.
  • description[optional]
    This will be included in the subject line of the confirmation message you will be sent. You may only use the following characters in your description:
    a-zA-Z0-9:;-_"()<>%
    All other characters will be converted to blanks. This field is optional so you may leave it blank if you wish.
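The character rule for the description field can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the code the MEME server runs; the function name `sanitize_description` is hypothetical.

```python
import re

# Any character outside the set listed in the help text: a-zA-Z0-9:;-_"()<>%
# (Assumption: this pattern mirrors the documented rule, not the server's code.)
_DISALLOWED = re.compile(r'[^a-zA-Z0-9:;\-_"()<>%]')


def sanitize_description(text: str) -> str:
    """Replace every character outside the allowed set with a blank."""
    return _DISALLOWED.sub(" ", text)


if __name__ == "__main__":
    # The comma and the exclamation mark are not allowed, so each becomes a blank.
    print(repr(sanitize_description("lexA sites, v2!")))
```

Under this reading of the rule, spaces are not in the allowed set either, but replacing a space with a blank leaves it unchanged.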
  • sequences
    This is the group of sequences (the "training set") which you want to analyze for patterns that are shared among the sequences and/or repeat within individual sequences. All of the sequences must be either protein or DNA. You may not mix different types of sequences in the same group. A large number of sequence formats are supported. To specify the sequences you wish to analyze, you can enter either a file name or the actual sequences. Do not enter both.
    • file name: This should be the name of a file on your computer containing a group of related sequences in an appropriate sequence format.
    • actual sequences: You may type or cut-and-paste the sequences in an appropriate sequence format in the window provided here.
  • motif distribution
    This is where you tell MEME how you believe occurrences of the motifs are distributed among the sequences. Selecting the correct type of distribution improves the sensitivity and quality of the motif search.
  • number of motifs
    MEME will look for up to this number of distinct motifs in the training set. MEME will stop when this number of motifs has been found, or when no further motif can be found with an E-value less than 10000.
  • number of sites[optional]
    This is the total number of sites in the training set where a single motif occurs. You can choose different limits for the minimum and maximum number of occurrences that MEME will consider. If you have prior knowledge about the number of occurrences that motifs have in your training set, limiting MEME's search in this way can increase the likelihood of MEME finding true motifs.
    For example, if you know that each motif is likely to occur at least 5 times but no more than 8 times in the training set, you could specify:
    Minimum sites = 5
    Maximum sites = 8
    MEME will then only report motifs with between 5 and 8 occurrences, inclusive, in the training set. (The same range [5,...,8] will apply, separately, to each motif MEME finds.) MEME may still find motifs whose true number of occurrences is slightly lower or higher than the limits you specify. In the above example, if there is a motif in the training set with only 4 occurrences, MEME may still find it, but it will report 5 occurrences, one of which will be erroneous. Likewise, if a motif in the training set has 9 occurrences, MEME will probably still find it, but it will report only 8 of its occurrences.
    MEME chooses the number of occurrences to report for each motif by optimizing a statistical heuristic function, restricting the number of occurrences to the range you give here, or using defaults described below if you leave these fields blank.
    Note 1:
    Leave these fields blank if you have selected "One per sequence" in the "How you think the occurrences... are distributed" field. In that case, each sequence must have exactly one occurrence of each motif. It also makes no sense to set the maximum number of sites to a number larger than the number of sequences if you have chosen the "Zero or one per sequence" distribution. In that case, there can never be more occurrences of a motif than there are sequences.
    Note 2:
    These fields are optional. If you leave them blank, MEME will choose limits depending on the type of occurrence distribution you have specified and the number of sequences (n) in the training set. MEME will also override your settings if they conflict with the type of distribution you have chosen (see Note 1, above).
    Default Numbers of Sites for each Motif

    type of distribution                      minimum sites   maximum sites
    one occurrence per sequence               n               n
    zero or one occurrence per sequence       sqrt(n)         n
    any number of repetitions per sequence    sqrt(n)         min(5*n, 50)
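As a worked illustration, the default limits in the table above can be written as a small Python function. This is a sketch under stated assumptions: the function name and the distribution keys ("one", "zero_or_one", "any") are hypothetical labels chosen here, and the text does not say how MEME rounds sqrt(n), so the square root is returned unrounded.

```python
import math


def default_site_limits(n: int, distribution: str) -> tuple[float, float]:
    """Return (minimum sites, maximum sites) for a training set of n sequences.

    The distribution keys are hypothetical labels for the three table rows.
    """
    if distribution == "one":          # one occurrence per sequence
        return (n, n)
    if distribution == "zero_or_one":  # zero or one occurrence per sequence
        return (math.sqrt(n), n)
    if distribution == "any":          # any number of repetitions per sequence
        return (math.sqrt(n), min(5 * n, 50))
    raise ValueError(f"unknown distribution: {distribution!r}")


if __name__ == "__main__":
    for name in ("one", "zero_or_one", "any"):
        print(name, default_site_limits(16, name))
```

For example, with n = 16 sequences and "any number of repetitions", the minimum is sqrt(16) = 4 and the maximum is min(80, 50) = 50.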
  • motif width
    This is the width (number of characters in the sequence pattern) of a single motif. MEME chooses the optimal width of each motif individually using a statistical heuristic function. You can choose different limits for the minimum and maximum motif widths that MEME will consider. The width of each motif that MEME reports will lie within the limits you choose.
  • use a different Markov background model
    You can provide a Markov background model for MEME to use as its model of "random" sequences. The format and effect of the background model file are described under the topic "BACKGROUND MODEL" on the MEME "man page".
    The downloadable version of MEME contains a script named "fasta-get-markov" that you can use to create background model files in the correct format from FASTA sequence files.
  • shuffle letters in input sequences
    You can ask for the letters in each of your input sequences to be shuffled. This is useful for checking whether the motifs found in your original (unshuffled) sequences might not be statistically significant. Run MEME once on the original sequences and once with the "Shuffle letters in input sequences" option, keeping all other parameter settings the same so that the comparison is valid. Then compare the best E-value (or score) for the first motif in the original run with the best E-value (or score) for the first motif in the shuffled run. If they are similar, or the E-value (score) in the shuffled run tends to be better, then the motif is probably not significant. Repeat this process for each motif, bearing in mind that the earlier motifs are the most likely to be significant.
    Concluding that a motif is significant is more problematic. For MEME, low p-value occurrences of the motif in many sequences, in a conserved position relative to other motifs (or the ends of the sequence), are evidence that it is statistically significant. You can use the shuffle option to help determine what "low" is for the particular set of sequences and MEME parameters you are using.
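To make the idea of a shuffled control concrete, here is a minimal Python sketch of shuffling the letters within one sequence. It assumes a plain uniform permutation of the letters; the text does not state which shuffling method MEME itself uses, and the function name `shuffle_letters` is hypothetical.

```python
import random


def shuffle_letters(sequence: str, rng: random.Random) -> str:
    """Return the same letters in a random order (composition is preserved)."""
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the control run is repeatable
    print(shuffle_letters("ACGTTTGACA", rng))
```

Because only the order changes, the shuffled sequence has the same length and letter composition as the original, so any motif signal that survives reflects composition rather than a conserved pattern.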
  • search given strand only
    MEME searches for motifs on both the given DNA strand and the reverse complement strand by default. Checking this box will cause MEME to search the given DNA strand only.
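For readers unfamiliar with the term, the reverse complement strand mentioned above can be computed as in the following Python sketch. It is an illustration only and handles just the four unambiguous DNA letters; the function name is hypothetical.

```python
# Translation table mapping each DNA letter to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the order of the bases."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AACGT"))
```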
  • look for palindromes only
    Checking this box causes MEME to search only for DNA palindromes. This causes MEME to average the letter frequencies in corresponding motif columns together. For instance, if the width of the motif is 10, columns 1 and 10, 2 and 9, 3 and 8, etc., are averaged together. The averaging combines the frequency of A in one column with T in the other, and the frequency of C in one column with G in the other. If this box is not checked, the columns are not averaged together.
  • Clicking on the Start search button causes your sequences to be sent to the San Diego Supercomputer Center (SDSC) and analyzed by MEME for repeated patterns.
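The column averaging described under "look for palindromes only" can be illustrated with a short Python sketch. It assumes a motif is held as a list of per-column dictionaries of letter frequencies; that representation and the function name are hypothetical, chosen for illustration, and are not MEME's internal data structures.

```python
# Complementary DNA letters: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def average_palindrome_columns(columns):
    """Average each column with the complement of its mirror-image column.

    columns: list of dicts mapping "A", "C", "G", "T" to a frequency.
    Column i is paired with column (width - 1 - i), so for width 10 the
    pairs are 1 and 10, 2 and 9, and so on (counting from 1).
    """
    width = len(columns)
    averaged = []
    for i, column in enumerate(columns):
        partner = columns[width - 1 - i]
        averaged.append({
            base: (column[base] + partner[COMPLEMENT[base]]) / 2
            for base in "ACGT"
        })
    return averaged


if __name__ == "__main__":
    motif = [
        {"A": 0.75, "C": 0.25, "G": 0.0, "T": 0.0},
        {"A": 0.0, "C": 0.0, "G": 0.5, "T": 0.5},
    ]
    for column in average_palindrome_columns(motif):
        print(column)
```

After averaging, the frequency of A in one column equals the frequency of T in its mirror column, which is what makes the resulting motif read the same on both strands. In this sketch, the middle column of an odd-width motif is simply paired with itself.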