• sequences
    This is the group of sequences (the "training set") which you want to analyze for patterns that are shared among the sequences and/or repeat within individual sequences. All of the sequences must be either protein or DNA. You may not mix different types of sequences in the same group. A large number of sequence formats are supported. To specify the sequences you wish to analyze, you can enter either a file name or the actual sequences. Do not enter both.
  • file name
    This should be the name of a file on your computer containing a group of related sequences in an appropriate sequence format.
  • actual sequences
    You may type or cut-and-paste the sequences in an appropriate sequence format in the window provided.
  • shuffle sequence letters
    You can ask for the letters in each of your input sequences to be shuffled. This is useful for checking whether the motifs found in your original (unshuffled) sequences are statistically significant. Run the analysis twice, once on the original sequences and once with the "Shuffle sequence letters" option, keeping all other parameter settings the same so that the comparison is valid. Then compare the best E-value (or score) for the first motif/alignment in the original run with the best E-value (or score) for the first motif/alignment in the shuffled run. If they are similar, or if the E-value (or score) in the shuffled run tends to be better, then the motif is probably not significant.
    Repeat this process for each motif, bearing in mind that the earlier motifs are the most likely to be significant. Concluding that a motif is significant is more problematic. For MEME, occurrences of the motif with low p-values in many sequences, and in a conserved position relative to other motifs (or to the ends of the sequence), are evidence that it is statistically significant. You can use the shuffle option to help determine what "low" means for the particular set of sequences and parameters you are using.
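
The shuffle-and-compare check described above can be sketched in Python. This is only an illustration, not how the server implements the option: the function names `shuffle_letters` and `looks_insignificant`, the `log10_margin` threshold, and the example sequences are all assumptions made for this sketch. It assumes E-values are positive numbers copied by hand from the two runs.

```python
import math
import random


def shuffle_letters(sequence, rng):
    """Return the sequence with its letters in random order.

    The letter composition is preserved; only the order changes.
    """
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)


def looks_insignificant(original_evalue, shuffled_evalue, log10_margin=1.0):
    """Hypothetical rule of thumb (the margin is an assumption).

    Flags a motif as probably not significant when the shuffled run's
    E-value is better (smaller) than the original's, or within
    `log10_margin` orders of magnitude of it.
    """
    return math.log10(shuffled_evalue) <= math.log10(original_evalue) + log10_margin


# Made-up DNA sequences, used only to show the shuffling step.
rng = random.Random(0)  # fixed seed so the shuffled set is reproducible
sequences = ["ACGTACGTTTGACA", "TTGACAGGCTAACG"]
shuffled = [shuffle_letters(s, rng) for s in sequences]
```

For example, `looks_insignificant(1e-20, 1e-1)` returns `False` (the original motif scores far better than the shuffled one), while `looks_insignificant(0.5, 0.8)` returns `True` (the two runs are about equally good). What counts as "similar" depends on your sequences and parameters, so treat the margin as a starting point rather than a fixed cutoff.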