If your sequences are not in a standard alphabet (DNA, RNA or protein), you must input a custom alphabet file.
Specify a file to upload containing sequence coordinates in BED format. The file must be based on the exact genome version you specified in the menus above.
Select an available sequence database from this menu.
Select an available version of the sequence database from this menu.
Select an available tissue/cell-specificity from this menu.
Selecting this option will filter the sequence menu to only contain databases that have additional information that is specific to a tissue or cell line.
This option causes MEME Suite to use tissue/cell-specific information (typically from DNase I or histone modification ChIP-seq data) encoded as a position specific prior that has been created by the MEME Suite create-priors utility. You can see a description of the sequence databases for which we provide tissue/cell-specific priors here.
Note that you cannot upload or type in your own sequences when tissue/cell-specific scanning is selected.
Enter text naming or describing this analysis. The job description will be included in the notification email you receive and in the job output.
You provide one set of sequences and MEME discovers motifs enriched in this set. Enrichment is measured relative to a (higher order) random model based on frequencies of the letters in your sequences, or relative to the frequencies given in a "Custom background model" that you may provide (see Advanced options).
You provide two sets of sequences and MEME discovers motifs that are enriched in the first (primary) set relative to the second (control) set. In Discriminative mode, we first calculate a position-specific prior from the two sets of sequences. MEME then searches the first set of sequences for motifs using the position-specific prior to inform the search.
This approach is based on the simple discriminative prior "D" described in Section 3.5 of Narlikar et al. We modify their approach to search for the "best" initial motif width, and to handle protein sequences using spaced triples. Refer to the psp-gen documentation and to our paper for more details.
You provide two sets of sequences and MEME discovers motifs that are enriched in the first (primary) set relative to the second (control) set. In Differential Enrichment mode, MEME optimizes an objective function based on the hypergeometric distribution to determine the relative enrichment of sites in the primary sequences compared to the control sequences.
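To give a feel for the kind of statistic a hypergeometric test produces, here is a minimal sketch in Python. It is not MEME's actual objective function; the function name and the counts passed to it are hypothetical, and it assumes each sequence is simply classified as containing a site or not.

```python
from math import comb

def hypergeom_enrichment_pvalue(primary_with_site, n_primary,
                                control_with_site, n_control):
    """One-sided hypergeometric tail probability (illustrative only).

    Probability of seeing at least `primary_with_site` site-containing
    sequences among the primary set if the site-containing sequences were
    spread at random over the pooled primary and control sequences.
    """
    total = n_primary + n_control
    with_site = primary_with_site + control_with_site
    denominator = comb(total, n_primary)
    upper = min(n_primary, with_site)
    tail = sum(
        comb(with_site, k) * comb(total - with_site, n_primary - k)
        for k in range(primary_with_site, upper + 1)
    )
    return tail / denominator

# Hypothetical counts: 3 of 3 primary sequences have a site, 0 of 3 controls.
p = hypergeom_enrichment_pvalue(3, 3, 0, 3)
```

A small value indicates that sites are concentrated in the primary set more than chance alone would explain.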
Position-specific priors (PSPs) assign a probability that a motif starts at each possible location in your sequence data. MEME uses PSPs to guide its search, biasing the search towards sites that have higher values in the PSP. MEME creates a PSP when you use it in "Discriminative mode", up-weighting words that occur frequently in the primary dataset but infrequently in the control dataset.
Spaced triples are sub-sequences in which only the first and last letter (residue or amino acid for protein) and one interior letter are used in matches. For example, the subsequence MTFEKI contains the following triples:
MT...I M.F..I M..E.I M...KI
where "." matches anything. We use spaced triples for protein because the probability of exact matches is much lower than for DNA due to the much larger amino acid alphabet.
To score a word using spaced triples, we count how often each triple contained in the word occurs in the primary and control sequence sets, and use the maximum over all triples as the word count in the formula for scoring words described by Narlikar et al.
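The enumeration and the "maximum over all triples" step can be sketched as follows. This is an illustrative Python sketch, not the psp-gen implementation: the function names are hypothetical, and counting every matching window (rather than, say, once per sequence) is an assumption.

```python
def spaced_triples(word):
    """Return the spaced triples of `word`: the first letter, one interior
    letter and the last letter are kept; every other position becomes '.'."""
    width = len(word)
    triples = []
    for i in range(1, width - 1):
        pattern = ["."] * width
        pattern[0] = word[0]
        pattern[i] = word[i]
        pattern[-1] = word[-1]
        triples.append("".join(pattern))
    return triples

def count_matches(triple, sequences):
    """Count windows in `sequences` that match `triple` ('.' matches anything)."""
    width = len(triple)
    fixed = [(i, c) for i, c in enumerate(triple) if c != "."]
    count = 0
    for seq in sequences:
        for start in range(len(seq) - width + 1):
            if all(seq[start + i] == c for i, c in fixed):
                count += 1
    return count

def max_triple_count(word, sequences):
    """Word count used for scoring: the maximum count over all triples of `word`.
    Words shorter than three letters have no triples and score 0 here."""
    return max((count_matches(t, sequences) for t in spaced_triples(word)), default=0)

triples = spaced_triples("MTFEKI")
# Hypothetical toy sequence set for demonstration.
toy_count = max_triple_count("MTFEKI", ["MTFEKI", "MAFAAI"])
```

Running `max_triple_count` once on the primary set and once on the control set gives the two word counts that enter the scoring formula.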
Please enter sequences that you believe share one or more motifs. When running MEME in "Discriminative" or "Differential Enrichment" mode, this set of sequences is referred to as the "primary sequence set".
There may be at most 500,000 (primary) sequences in FASTA format. There is also a limit of 80,000,000 bytes for the entire contents of the input form.
See the example DNA sequences which were used to create the sample output.
Please enter sequences that you believe contain patterns you wish to avoid making motifs from. This set of sequences is referred to as the "control sequence set".
The control sequence set should contain sequences that are in some sense a contrast to likely sites for motifs (e.g. sequences rejected as unlikely to contain a transcription factor binding site), but otherwise similar to the primary sequence set.
There may be at most 500,000 control sequences in FASTA format. There is also a limit of 80,000,000 bytes for the entire contents of the input form.
You can use a background model with MEME in order to normalize for biased distribution of letters and groups of letters in your sequences. A 0-order model adjusts for single letter biases (e.g., GC content in DNA sequences), a 1-order model adjusts for dimer biases (e.g., the frequency of CG dinucleotides), etc.
By default MEME will use the letter frequencies in the primary sequence set to create a 0-order background model. Alternatively, you may select 'Upload background model' and then specify here a file containing a background model in a simple format.
The downloadable version of the MEME Suite also contains a program named fasta-get-markov that you can use to create background model files in the correct format from FASTA sequence files.
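To make the difference between model orders concrete, the sketch below tallies the word frequencies that an order-0 or order-1 model is built from. It is only an illustration in Python: the function name is hypothetical, it does not write the background file format that MEME expects (use fasta-get-markov for that), and it applies no pseudocounts or strand handling.

```python
from collections import Counter

def kmer_frequencies(sequences, order):
    """Relative frequencies of all words of length order + 1 seen in `sequences`.

    order=0 gives single-letter frequencies; order=1 gives dimer frequencies.
    """
    k = order + 1
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

# Hypothetical toy input.
seqs = ["ACGT", "AACC"]
order0 = kmer_frequencies(seqs, 0)  # letters
order1 = kmer_frequencies(seqs, 1)  # dimers
```

Two sequence sets with identical letter frequencies can still differ in their dimer frequencies, which is why a higher-order model can capture biases that a 0-order model misses.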
This is where you tell MEME how you believe occurrences of the motifs are distributed among the sequences. Selecting the correct type of distribution improves the sensitivity and quality of the motif search.
MEME will keep searching until it finds this many motifs or until it exceeds one of its other thresholds (e.g., maximum run time). Note that unlike DREME, MEME does not use an E-value threshold, so you should always check the E-value of any motifs discovered by MEME.
This is the width (number of characters in the sequence pattern) of a single motif. MEME chooses the optimal width of each motif individually using a heuristic function. You can choose limits for the minimum and maximum motif widths that MEME will consider. The width of each motif that MEME reports will lie within the limits you choose.
This is the total number of sites in the primary sequence set where a single motif occurs. You can choose limits for the minimum and maximum number of occurrences that MEME will consider. If you have prior knowledge about the number of occurrences that motifs have in your primary sequence set, limiting MEME's search in this way can increase the likelihood of MEME finding true motifs.
MEME chooses the number of occurrences to report for each motif by optimizing a heuristic function, restricting the number of occurrences to the range you give here.
If you do not select one of these fields, MEME uses the following defaults for the range of the number of motif sites, where "n" is the number of sequences in the primary sequence set:
Distribution | Minimum | Maximum |
---|---|---|
Zero or One Occurrence per Sequence | sqrt(n) | n |
One Occurrence per Sequence | n | n |
Any Number of Repetitions | sqrt(n) | min(5*n, 600) |
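The table can be restated as a small Python function. This is only a restatement of the defaults listed above; the function name and the short distribution labels are chosen for illustration, and how MEME rounds sqrt(n) is not specified here, so the value is left unrounded.

```python
from math import sqrt

def default_site_range(distribution, n):
    """Return (minimum, maximum) default number of sites for `n` primary sequences.

    `distribution` is one of the illustrative labels:
      "zoops" - Zero or One Occurrence per Sequence
      "oops"  - One Occurrence per Sequence
      "anr"   - Any Number of Repetitions
    """
    if distribution == "zoops":
        return sqrt(n), n
    if distribution == "oops":
        return n, n
    if distribution == "anr":
        return sqrt(n), min(5 * n, 600)
    raise ValueError(f"unknown distribution: {distribution!r}")

# With 100 sequences the "anr" maximum is 5 * 100 = 500;
# with 400 sequences it is capped at 600.
small = default_site_range("anr", 100)
large = default_site_range("anr", 400)
```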
Checking this box instructs MEME to NOT check the reverse complement of the input sequences for motif sites when analyzing DNA or RNA sequences.
Note: When your sequences are RNA, you should select this option to ensure that only the given strand is searched for motifs.
Checking this box causes MEME to search only for DNA palindromes.
This causes MEME to average the letter frequencies in corresponding motif columns together. For instance, if the width of the motif is 10, columns 1 and 10, 2 and 9, 3 and 8, etc., are averaged together. The averaging combines the frequency of A in one column with T in the other, and the frequency of C in one column with G in the other. If this box is not checked, the columns are not averaged together.
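The averaging described above can be sketched in a few lines of Python. This is an illustration of the column pairing only, not MEME's code; the function name and the representation of a motif as a list of per-column frequency dictionaries are assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def average_palindrome(columns):
    """Average each motif column with its mirror column on the opposite strand.

    `columns` is a list of dicts mapping A, C, G, T to frequencies.
    Column i is paired with column w-1-i, and the frequency of each letter
    is averaged with the frequency of its complement in the partner column.
    """
    width = len(columns)
    averaged = []
    for i in range(width):
        partner = columns[width - 1 - i]
        averaged.append({
            letter: (columns[i][letter] + partner[COMPLEMENT[letter]]) / 2
            for letter in "ACGT"
        })
    return averaged

# Hypothetical width-2 motif.
motif = [
    {"A": 0.8, "C": 0.2, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.4, "T": 0.6},
]
palindrome = average_palindrome(motif)
```

After averaging, the motif reads the same on both strands: the frequency of A in the first column equals the frequency of T in the last column, and so on.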
Checking this box causes MEME to shuffle each of the primary sequences individually. The sequences will still be the same length and have the same character frequencies but any existing patterns will be obliterated.
Using this option repeatedly you can get an idea of the E-values of motifs discovered in "random" sequence datasets similar to your primary dataset. This can help you determine a reasonable E-value cutoff for motifs discovered in your unshuffled primary sequence dataset.
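The effect of shuffling can be illustrated with a plain letter permutation. This Python sketch is an assumption about the general idea, not a description of how MEME shuffles internally; the function names and the `seed` parameter are hypothetical.

```python
import random

def shuffle_sequence(seq, rng):
    """Return a random permutation of the letters of `seq`."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def shuffle_sequences(sequences, seed=None):
    """Shuffle each sequence individually, preserving its length and letter counts."""
    rng = random.Random(seed)
    return [shuffle_sequence(s, rng) for s in sequences]

# Hypothetical toy input; a different seed gives a different "random" dataset.
original = ["ACGTACGTAA", "TTTTGGGCCA"]
shuffled = shuffle_sequences(original, seed=1)
```

Running the analysis on several such shuffled datasets shows what E-values arise when no real motif is present, which is the comparison the option above is meant to support.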