sequences
This is the group of sequences (the "training set") that you want
to analyze for patterns that are shared among the sequences and/or
repeat within individual sequences. All of the sequences must be of
the same type: either all protein or all DNA. You may not mix
different types of sequences in the same group. Many sequence
formats are supported. To specify the sequences you wish to analyze,
enter either a file name or the actual sequences, but not both.
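As a rough illustration of the "do not mix types" rule, the following Python sketch guesses whether each sequence looks like DNA or protein and reports whether the whole group is of one type. It is not part of MEME. The function names and the letter sets are assumptions made for this example, and the guess is only a heuristic: a short protein made up entirely of the letters A, C, G, T and N would be reported as DNA.

```python
# Hypothetical helper, not part of MEME: a heuristic pre-check that a
# group of sequences does not mix DNA and protein.

# Assumed alphabets for this sketch (N and X stand for "unknown" letters).
DNA_LETTERS = set("ACGTN")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYX")


def guess_type(sequence):
    """Return 'dna', 'protein', or 'unknown' based on the letters used."""
    letters = set(sequence.upper())
    if not letters:
        return "unknown"
    if letters <= DNA_LETTERS:
        return "dna"
    if letters <= PROTEIN_LETTERS:
        return "protein"
    return "unknown"


def is_single_type(sequences):
    """True if every sequence is guessed to be of the same, known type."""
    kinds = {guess_type(s) for s in sequences}
    return len(kinds) == 1 and "unknown" not in kinds


if __name__ == "__main__":
    group = ["ACGTTGCA", "GGCATTAC", "MKVLAWRE"]
    for seq in group:
        print(seq, guess_type(seq))
    print("single type:", is_single_type(group))
```

A group that fails this check should be split into separate protein and DNA groups before it is submitted.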
file name
This should be the name of a file on your
computer containing a group of related sequences in an appropriate
sequence format.
actual sequences
You may type or cut-and-paste the sequences, in an appropriate
sequence format, into the window provided.
shuffle sequence letters
You can ask for the letters in each of your input sequences to be
shuffled. This is useful for checking whether the motifs found
using your original (unshuffled) sequences might not be
statistically significant. To check, run the analysis twice: once on
your original sequences and once with the "Shuffle sequence
letters" option. (Keep all other parameter settings the same so
that the comparison will be valid.) Compare the best E-value (or
score) for the first motif (or alignment) in the original run with
the best E-value (or score) for the first motif in the shuffled
run. If they are similar (or the E-value (score) in the shuffled
run tends to be better), then the motif is probably not
significant.
Repeat this process for each motif, bearing in mind
that the earlier motifs are most likely to be significant.
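The shuffle-and-compare procedure can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code that MEME runs: the function names are hypothetical, the E-values are assumed to have been read by hand from the two runs, and the margin of two orders of magnitude used to decide that two E-values are "similar" is an arbitrary choice for this example.

```python
import math
import random
import sys


def shuffle_letters(sequence, rng):
    """Return the sequence with its letters in random order.

    The letter composition and the length are preserved; only the order
    of the letters changes.
    """
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)


def log10_evalue(evalue):
    """log10 of an E-value, guarding against an E-value reported as 0."""
    return math.log10(max(evalue, sys.float_info.min))


def probably_not_significant(e_original, e_shuffled, margin_log10=2.0):
    """True if the original E-value is similar to, or worse than, the
    shuffled one. Smaller E-values are better.

    margin_log10 is a hypothetical threshold: the original run must beat
    the shuffled run by more than this many orders of magnitude before
    the motif stops being flagged.
    """
    return log10_evalue(e_original) >= log10_evalue(e_shuffled) - margin_log10


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is repeatable
    original = ["ACGTACGTTTGACA", "TTGACAGGCATCGA"]
    shuffled = [shuffle_letters(s, rng) for s in original]
    for before, after in zip(original, shuffled):
        print(before, "->", after)

    # Made-up E-values standing in for the first motif of each run.
    print(probably_not_significant(1e-40, 1e2))
    print(probably_not_significant(0.5, 1.0))
```

Shuffling each sequence separately keeps its length and letter composition, so any difference between the two runs reflects the order of the letters rather than their frequencies.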
Concluding that a motif is significant is more problematic. For
MEME, low p-value occurrences of the motif in many sequences, and in
a conserved position relative to other motifs (or the ends of the
sequence), are evidence that it is statistically significant. You can
use the shuffle option to help determine what "low" means for the
particular set of sequences and parameters you are using.