If your sequences are not in a standard alphabet (DNA, RNA or protein), you must input a custom alphabet file.


Click on the menu at the left to see which of the following motif input methods are available.

Type in motifs
When this option is available you may directly input multiple motifs by typing them (or using "cut-and-paste"). First select the desired motif alphabet using the menu immediately to the left. If you select the "Custom" option then you must provide an alphabet definition in the file input that immediately follows. Warning: custom alphabets are case-sensitive. You may optionally give each motif an identifier and alternate name by inputting a line like >Identifier Alternate-Name preceding the motif. You can then enter each motif as a matrix, as sequence sites or as a regular expression. You can enter multiple motifs by typing an empty line after each motif. Individual motifs will be shown in square brackets; errors in your motifs will be highlighted in red and warnings in yellow. Mouse over individual motifs to display their sequence logos. View the examples for more information on what is possible.
Upload motifs
When this option is available you may upload a file containing motifs in MEME motif format.  This includes the outputs generated by MEME and DREME, as well as files you create using the motif conversion scripts or manually following the MEME motif format guidelines.
Databases (select category)
When this option is available you can select the category of motif database desired from the list below it. Then select the motif database from the displayed list.  Consult the motif database documentation for descriptions of all the motif databases present on this MEME Suite server.
Submitted motifs
This option is only available when you have invoked the current program by clicking on a button in the output report of a different MEME Suite program.  By selecting this option you will input the motifs sent by that program.

Typed Motifs - Matrices

You may input both probability and count matrices of either orientation, and the rules described below will be used to convert the matrix into a MEME-formatted motif.

Alphabet Order

The counts/probabilities are expected to be ordered based on the alphabetical ordering of their codes.  So DNA is ordered ACGT and protein is ordered ACDEFGHIKLMNPQRSTVWY. For custom alphabets the ordering goes uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9) and finally the symbols '*', '-' and '.'.
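
The ordering described above can be sketched as a sort key. This is a minimal Python illustration rather than MEME Suite code; the helper names symbol_sort_key and order_symbols are hypothetical.

```python
import string

# Symbol classes in the order given above: A-Z, a-z, 0-9, then '*', '-', '.'.
_ORDER = string.ascii_uppercase + string.ascii_lowercase + string.digits + "*-."
_RANK = {symbol: rank for rank, symbol in enumerate(_ORDER)}


def symbol_sort_key(symbol):
    """Return the rank of a single alphabet symbol (hypothetical helper)."""
    return _RANK[symbol]


def order_symbols(symbols):
    """Sort alphabet symbols into the order expected for matrix entries."""
    return sorted(symbols, key=symbol_sort_key)


if __name__ == "__main__":
    print("".join(order_symbols("TGCA")))       # DNA core symbols
    print("".join(order_symbols("a.A1*z-Z9")))  # mixed custom symbols
```

A symbol outside the listed classes raises KeyError in this sketch; how the real parser treats such symbols is not covered here.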

Matrix Orientation

Matrix motifs may be input with either one position per row (preferred) which is called row orientation, or one position per column which is called column orientation.  The orientation is determined by picking which dimension (row or column) is equal to the alphabet size.  If both dimensions are equal to the alphabet size then row orientation is assumed.  If neither dimension is equal to the alphabet size then the closest that is still smaller than the alphabet size is picked, however if both are equally smaller then column orientation is assumed.  Finally if none of the above rules work to determine the orientation then row orientation is assumed.
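
One possible reading of these rules is sketched below in Python. It is an interpretation, not the parser's actual code: it assumes that row orientation is tested against the row length (number of columns) and column orientation against the column length (number of rows). The function name guess_orientation is hypothetical.

```python
def guess_orientation(n_rows, n_cols, alphabet_size):
    """Return "row" or "column" for a typed matrix (illustrative sketch).

    Assumption: with one position per row, each row holds one value per
    alphabet symbol, so the row length (n_cols) is compared with the
    alphabet size; with one position per column, the column length
    (n_rows) is compared instead.
    """
    row_len, col_len = n_cols, n_rows
    if row_len == alphabet_size:
        return "row"      # also covers the case where both dimensions match
    if col_len == alphabet_size:
        return "column"

    # Neither matches: prefer the length closest to, but still smaller
    # than, the alphabet size.
    row_gap = alphabet_size - row_len if row_len < alphabet_size else None
    col_gap = alphabet_size - col_len if col_len < alphabet_size else None
    if row_gap is not None and col_gap is not None:
        if row_gap == col_gap:
            return "column"  # equally smaller: column orientation
        return "row" if row_gap < col_gap else "column"
    if row_gap is not None:
        return "row"
    if col_gap is not None:
        return "column"
    return "row"             # no rule applied: fall back to row orientation
```

For example, under these assumptions an 8 x 4 DNA matrix is read as eight positions in row orientation, while a 4 x 8 matrix is read as eight positions in column orientation.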

Site counts

Once the orientation is determined, the sum of the numbers that make up the first position is calculated and rounded to the nearest integer.  If that value is larger than 1 then the matrix is assumed to be a count matrix and that value is used as the site count, otherwise the matrix is assumed to be a probability matrix and a site count of 20 is used.
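
The site-count rule can be sketched as follows. This is a hedged Python illustration; the function name infer_site_count is hypothetical, and rounding exact halves upward is an assumption, since the text only says "nearest integer".

```python
import math

DEFAULT_SITE_COUNT = 20  # used for probability matrices, per the text above


def infer_site_count(first_position):
    """Classify a matrix from its first position and return (kind, site_count)."""
    total = sum(first_position)
    rounded = math.floor(total + 0.5)  # assumption: exact halves round up
    if rounded > 1:
        return "count", rounded
    return "probability", DEFAULT_SITE_COUNT
```

With a first position of 7, 0, 21, 0 this returns a count matrix with 28 sites; with 0.25, 0.25, 0.25, 0.25 it returns a probability matrix with the default 20 sites.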

Converting to a normalized probability matrix

Once the orientation is determined then each number in the matrix is converted to a normalized probability by dividing by the sum of all the numbers for that motif position.  If any numbers are missing they are assumed to have the value zero.  As a special case if all numbers in a motif position have the value zero then they are given the uniform probability of 1 / alphabet size.
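
The normalization step can be sketched as below. This is a minimal Python illustration, not the parser itself; normalize_position is a hypothetical name, and the sketch assumes that any missing values sit at the end of the position.

```python
def normalize_position(values, alphabet_size):
    """Convert one motif position to probabilities that sum to 1."""
    # Assumption: missing values are trailing, and are treated as zero.
    padded = list(values) + [0.0] * (alphabet_size - len(values))
    total = sum(padded)
    if total == 0:
        # Special case from the text: an all-zero position becomes uniform.
        return [1.0 / alphabet_size] * alphabet_size
    return [value / total for value in padded]
```

For example, the DNA counts 7, 0, 21, 0 become 0.25, 0, 0.75, 0, and an all-zero position becomes 0.25 for every base.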

Yellow highlighting and red annotations

Red asterisks (*) indicate where the parser thinks values are missing. A yellow highlighted row or column with a red number at the end indicates that the counts for that position don't sum to the same count as the first position. The red number shows the difference. If the red number is negative then that position sums to less than the first position; if it is positive then it sums to more than the first position.


Typed Motifs - Sequence Sites or Regular Expressions

You may input one or more sites of the motif including using ambiguity codes or bracket expressions to represent multiple possibilities for a single motif position.

Ambiguity Codes

The DNA and protein alphabets include additional codes that represent multiple possible bases or amino acids. For example the DNA alphabet includes W (for weak), which indicates that the given position could be either an A (for adenosine) or a T (for thymidine).

Bracket Expressions

Bracket expressions also group together multiple codes so they share a single position. Their syntax is an opening square bracket '[' followed by one or more codes and a closing square bracket ']'. For example with a DNA motif the bracket expression [AT] means that both A and T are acceptable and is equivalent to the ambiguity code W. Any repeats of a base in a bracket expression are ignored, so for example the DNA bracket expression [AAT] has the same effect as [AT] or [AW] or W.

Multiple sites

When only one site is provided the site count is set to 20, however you can precisely control the motif by providing multiple sites.  Each of these sites can still contain ambiguity codes and bracket expressions but a single count will be divided among the selected bases for each position.  When multiple sites are provided the site count will be set to the number of sites provided.
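
The way sites, ambiguity codes and bracket expressions combine into counts can be sketched as follows. This Python sketch is an illustration under stated assumptions, not the MEME Suite parser: it handles uppercase DNA only, takes its code table from the DNA alphabet table in this document, and uses the hypothetical names parse_site and sites_to_counts.

```python
# DNA codes and the core bases they stand for (from the DNA alphabet table).
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_site(site):
    """Split a site such as "A[CG]N" into one set of core bases per position."""
    positions, i = [], 0
    while i < len(site):
        if site[i] == "[":
            close = site.index("]", i)  # raises ValueError if unclosed
            codes = site[i + 1:close]
            i = close + 1
        else:
            codes = site[i]
            i += 1
        bases = set()
        for code in codes:
            if code not in DNA_CODES:
                raise ValueError(f"unknown DNA code: {code!r}")
            bases.update(DNA_CODES[code])  # repeated bases collapse in the set
        if not bases:
            raise ValueError("a bracket expression needs at least one code")
        positions.append(bases)
    return positions


def sites_to_counts(sites):
    """Return (per-position counts, site count) for a list of typed sites."""
    parsed = [parse_site(site) for site in sites]
    width = len(parsed[0])
    counts = [dict.fromkeys("ACGT", 0.0) for _ in range(width)]
    for site in parsed:
        if len(site) != width:
            raise ValueError("all sites must have the same width")
        for column, bases in zip(counts, site):
            for base in bases:
                # Each site contributes one count, split among its bases.
                column[base] += 1.0 / len(bases)
    # A single site uses the default site count of 20; otherwise one per site.
    site_count = 20 if len(sites) == 1 else len(sites)
    return counts, site_count
```

For instance, the three sites ACGT, ACGA and WCGT give a site count of 3, with the first position receiving 2.5 counts for A and 0.5 for T.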


DNA Alphabet

DNA motifs support the standard 4 codes for the bases: adenosine (A), cytidine (C), guanosine (G) and thymidine (T) as well as supporting the following ambiguity codes.

Description   Code   Bases
Uracil        U      T
Weak          W      A, T
Strong        S      C, G
Amino         M      A, C
Keto          K      G, T
Purine        R      A, G
Pyrimidine    Y      C, T
Not A         B      C, G, T
Not C         D      A, G, T
Not G         H      A, C, T
Not T         V      A, C, G
Any           N      A, C, G, T

Protein Alphabet

Protein motifs support the standard 20 codes for the amino acids: Alanine (A), Arginine (R), Asparagine (N), Aspartic acid (D), Cysteine (C), Glutamic acid (E), Glutamine (Q), Glycine (G), Histidine (H), Isoleucine (I), Leucine (L), Lysine (K), Methionine (M), Phenylalanine (F), Proline (P), Serine (S), Threonine (T), Tryptophan (W), Tyrosine (Y) and Valine (V) as well as supporting the following ambiguity codes.

Description                          Code   Bases
Asparagine or aspartic acid          B      N, D
Glutamine or glutamic acid           Z      E, Q
Leucine or Isoleucine                J      I, L
Unspecified or unknown amino acid    X      A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

Note that the two amino acids Selenocysteine (U) and Pyrrolysine (O) are not supported by the MEME Suite.


Typed Motifs - Examples

Single site motif using ambiguity codes N and R or bracket expressions. These give an approximation of the other motifs below.

or

Multiple site motif. This lists all 28 sites and gives the same result as the count matrix below.

Count matrix motif showing row and column orientations.

or

Note that all of these can be used with an identifier and alternate name like these 3 count matrix motifs from Jaspar.


When enabled this field supports selecting motifs from the file with a space separated list of motif identifiers and/or their positions in the file.

Any numbers in the range 1 to 999 are assumed to refer to the position of the selected motif in the file, so the entry "3" always refers to the third motif.  Any other entry is assumed to be a motif identifier.

Motif identifiers cannot start with a dash and can only contain alphanumeric characters, colon ':', underscore '_', dot '.' and dash '-'.
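
The selection rules above can be sketched as a small parser. This is a hedged Python illustration, not the server's code: the name parse_selection is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and an all-digit entry with leading zeros (such as "007") is assumed to count as a position.

```python
import re

# Allowed identifier characters per the text: alphanumerics plus ':', '_',
# '.' and '-'; the first character may not be a dash.
_IDENTIFIER = re.compile(r"[A-Za-z0-9:_.][A-Za-z0-9:_.\-]*")
_NUMBER = re.compile(r"[0-9]+")


def parse_selection(text):
    """Split a space-separated selection into (positions, identifiers)."""
    positions, identifiers = [], []
    for entry in text.split():
        if _NUMBER.fullmatch(entry) and 1 <= int(entry) <= 999:
            positions.append(int(entry))   # 1-based position in the file
        elif _IDENTIFIER.fullmatch(entry):
            identifiers.append(entry)      # anything else is an identifier
        else:
            raise ValueError(f"invalid motif identifier: {entry!r}")
    return positions, identifiers
```

Under these assumptions the entry "3 MA0004.1 1000 crp" selects the third motif in the file plus the motifs named MA0004.1, 1000 and crp, because 1000 falls outside the 1 to 999 range.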


Select the desired motif database.

Consult the motif database documentation for descriptions of all the DNA, RNA and protein motif databases present on this MEME Suite server.


This option can help change the alphabet of motifs from a base alphabet to a derived alphabet.

This might be useful if you need to compare an extended DNA motif with a library of DNA motifs, or if you wish to compare RNA motifs to DNA motifs.  Note that this option will also let you do nonsensical things like compare Protein motifs to DNA motifs so use it with care.

The derived alphabet must have all the core symbols of the alphabet that it is derived from. For example if the alphabet is derived from DNA it must have ACGT as core symbols. Expanding the alphabet adds frequencies of zero for every symbol in the derived alphabet that did not exist in the base alphabet.
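
The expansion step can be sketched for a single motif position as follows. This Python sketch is illustrative only; expand_position is a hypothetical name, and the symbol orderings passed in are assumed to match the order of the probabilities.

```python
def expand_position(probabilities, base_symbols, derived_symbols):
    """Re-express one motif position in a derived alphabet.

    Symbols that exist only in the derived alphabet get frequency zero.
    """
    missing = [s for s in base_symbols if s not in derived_symbols]
    if missing:
        raise ValueError(f"derived alphabet lacks core symbols: {missing}")
    by_symbol = dict(zip(base_symbols, probabilities))
    return [by_symbol.get(symbol, 0.0) for symbol in derived_symbols]
```

For example, expanding the DNA position 0.1, 0.2, 0.3, 0.4 into a hypothetical extended alphabet ACGTm keeps the four original values and appends 0 for the added symbol m.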

Click on the menu at the left to see which of the following sequence input methods are available.
Type in sequences
When this option is available you may directly input multiple sequences by typing them. Sequences must be input in FASTA format.
Upload sequences
When this option is available you may upload a file containing sequences in FASTA format.
Upload BED file
When this option is available you may upload a file containing sequence coordinates in BED format.
Databases (select category)
When this option is available you may first select a category of sequence database from the list below it. Two additional menus will then appear where you can select the particular database and version desired, respectively. The full list of available sequence databases and their descriptions is given in the sequence database documentation.
Submitted sequences
This option is only available when you have invoked the current program by clicking on a button in the output report of a different MEME Suite program. By selecting this option you will input the sequences sent by that program.

Specify a file to upload containing sequence coordinates in BED format. The file must be based on the exact genome version you specified in the menus above.


Select an available sequence database from this menu.


Select an available version of the sequence database from this menu.


Select an available tissue/cell-specificity from this menu.


Selecting this option will filter the sequence menu to only contain databases that have additional information that is specific to a tissue or cell line.

This option causes MEME Suite to use tissue/cell-specific information (typically from DNase I or histone modification ChIP-seq data) encoded as a position specific prior that has been created by the MEME Suite create-priors utility. You can see a description of the sequence databases for which we provide tissue/cell-specific priors here.

Note that you cannot upload or type in your own sequences when tissue/cell-specific scanning is selected.


Enter text naming or describing this analysis. The job description will be included in the notification email you receive and in the job output.

Classic mode

You provide one set of sequences and MEME-ChIP reports motifs enriched in this set. MEME discovers motifs enriched relative to a random model based on frequencies of the letters in your sequences, or relative to the frequencies given in a "background model" that you may provide (see "Universal options"). STREME discovers motifs that are enriched in your sequences relative to a control set of sequences that STREME creates by shuffling each of your sequences, conserving dinucleotide frequencies. CentriMo identifies motifs enriched in a central (or uncentered, see "CentriMo options") region relative to the flanking regions.

Discriminative mode

You provide two sets of sequences and MEME-ChIP reports motifs that are enriched in the first (primary) set relative to the second (control) set. MEME motifs use the classic MEME objective function with a position-specific prior created from the primary and control sequences using psp-gen. Note: The sequences in the primary and control sets should all be the same length.

Differential Enrichment mode

You provide two sets of sequences and MEME-ChIP discovers motifs that are enriched in the first (primary) set relative to the second (control) set. In Differential Enrichment mode, MEME motifs are discovered using an objective function based on the hypergeometric distribution to determine the relative enrichment of sites in the primary sequences compared to the control sequences.


The primary sequences should all be the same length. The recommended length for ChIP-seq sequences is 500 bp centered on the summit (or center if the summit is not known) of a peak. MEME-ChIP works best with sequences no longer than 2000 bp.

There may be at most 500,000 (primary) sequences in FASTA format. There is also a limit of 80,000,000 bytes for the entire contents of the input form.

MEME-ChIP can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria (e.g., TSSs of differentially expressed or bound genes).

See the example DNA sequences which were used to create the sample output.


The control sequences should all be the same length as the primary sequences. The recommended length for ChIP-seq sequences is 500 bp centered on the summit (or center if the summit is not known) of a peak. MEME-ChIP works best with sequences no longer than 2000 bp.

There may be at most 500,000 control sequences in FASTA format. There is also a limit of 80,000,000 bytes for the entire contents of the input form.

If the primary sequences are ChIP-seq peak regions from a transcription factor ChIP-seq experiment, similar regions from a knockout cell line or organism are a possible choice for control sequences. The control sequences should be prepared in exactly the same way (e.g., repeat-masking) as the primary sequences.


When this option is selected, if the FASTA sequence header of an input sequence contains genomic coordinates in UCSC or Galaxy format, the discovered motif sites will be output in genomic coordinates. If the sequence header does not contain valid coordinates, the sites will be output with the start of the sequence as position 1.


MEME-ChIP will use this set of motifs for motif enrichment analysis and will also report if any motifs that it discovers in your sequences are similar to any motifs in this set.


This is the width (number of letters in the sequence pattern) of a single motif. MEME and STREME choose the optimal width of each motif individually using a statistical heuristic function. You can choose different limits for the minimum and maximum motif widths that MEME and STREME will consider. The width of each motif that MEME and STREME report will lie within the limits you choose.


When your sequences are in the DNA alphabet but you want them to be treated as single-stranded RNA, check this box.


The background model normalizes for biased distribution of letters and groups of letters in your sequences. A 0-order model adjusts for single letter biases, a 1-order model adjusts for dimer biases (e.g., GC content in DNA sequences), etc.

By default MEME-ChIP will determine the background Markov model from the primary sequences (or from the control sequences if you provide them). You may select a Markov model order of 0 to 4. Alternatively you may select "Upload background model" and input a file containing a background model.

The downloadable version of the MEME Suite contains a program named "fasta-get-markov" that you can use to create background model files in the correct format from FASTA sequence files.
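
To make the idea of a 0-order model concrete, the sketch below estimates single-letter frequencies from a set of sequences. It is a conceptual Python illustration only: it does not reproduce the behavior or output file format of fasta-get-markov, and the name zero_order_background is hypothetical.

```python
from collections import Counter


def zero_order_background(sequences, alphabet="ACGT"):
    """Estimate single-letter frequencies (a 0-order model) from sequences."""
    counts = Counter()
    for sequence in sequences:
        # Letters outside the alphabet (e.g. N) are skipped in this sketch.
        counts.update(ch for ch in sequence.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alphabet letters found in the sequences")
    return {letter: counts[letter] / total for letter in alphabet}
```

A higher-order model would instead tabulate frequencies of letter groups (dimers for order 1, trimers for order 2, and so on).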


This is where you tell MEME how you believe occurrences of the motifs are distributed among the sequences. Selecting the correct type of distribution improves the sensitivity and quality of the motif search.

Zero or one occurrence per sequence
MEME assumes that each sequence may contain at most one occurrence of each motif. This option is useful when you suspect that some motifs may be missing from some of the sequences. In that case, the motifs found will be more accurate than using the one occurrence per sequence option. This option takes more computer time than the one occurrence per sequence option (about twice as much) and is slightly less sensitive to weak motifs present in all of the sequences.
One occurrence per sequence
MEME assumes that each sequence in the dataset contains exactly one occurrence of each motif. This option is the fastest and most sensitive but the motifs returned by MEME may be "blurry" if any of the sequences is missing them.
Any number of repetitions
MEME assumes each sequence may contain any number of non-overlapping occurrences of each motif. This option is useful when you suspect that motifs repeat multiple times within a single sequence. In that case, the motifs found will be much more accurate than using one of the other options. This option can also be used to discover repeats within a single sequence. This option takes much more computer time than the one occurrence per sequence option (about ten times as much) and is somewhat less sensitive to weak motifs which do not repeat within a single sequence than the other two options.

MEME will keep searching until it finds this many motifs or it hits some other threshold like the maximum run time. Note that MEME does not use a p-value threshold like STREME, so you should always check the E-value of any found motifs.


This is the total number of sites in the training set where a single motif occurs. You can choose different limits for the minimum and maximum number of occurrences that MEME will consider. If you have prior knowledge about the number of occurrences that motifs have in your training set, limiting MEME's search in this way can increase the likelihood of MEME finding true motifs.

MEME chooses the number of occurrences to report for each motif by optimizing a statistical heuristic function, restricting the number of occurrences to the range you give here, or using defaults described below if you leave these fields deselected.

Distribution                           Minimum    Maximum
Zero or one occurrence per sequence    sqrt(n)    n
One occurrence per sequence            n          n
Any number of repetitions              sqrt(n)    min(5*n, 600)
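
The defaults in the table can be written out as a small lookup. This Python sketch assumes that n is the number of input sequences, which the table does not state, and it leaves sqrt(n) unrounded because the rounding rule is not given. The name default_site_limits is hypothetical.

```python
import math


def default_site_limits(distribution, n):
    """Return the default (minimum, maximum) sites for a distribution.

    Assumption: n is the number of sequences in the training set.
    """
    if distribution == "Zero or one occurrence per sequence":
        return math.sqrt(n), n
    if distribution == "One occurrence per sequence":
        return n, n
    if distribution == "Any number of repetitions":
        return math.sqrt(n), min(5 * n, 600)
    raise ValueError(f"unknown distribution: {distribution!r}")
```

For example, with n = 400 the "Any number of repetitions" defaults would be a minimum of 20 and a maximum of 600, since 5 * 400 exceeds the cap of 600.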

Checking this box causes MEME to search only for DNA palindromes. This causes MEME to average the letter frequencies in corresponding motif columns together. For instance, if the width of the motif is 10, columns 1 and 10, 2 and 9, 3 and 8, etc., are averaged together. The averaging combines the frequency of A in one column with T in the other, and the frequency of C in one column with G in the other. If this box is not checked, the columns are not averaged together.
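
The column averaging described above can be sketched as follows. This is a minimal Python illustration of the averaging step alone, not MEME's implementation; the motif is represented as a list of per-column dictionaries, and the name palindromize is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def palindromize(motif):
    """Average each column with the complement of its mirror column.

    motif is a list of columns, each a dict of frequencies for A, C, G, T.
    """
    width = len(motif)
    result = []
    for i, column in enumerate(motif):
        mirror = motif[width - 1 - i]  # column 1 pairs with column `width`, etc.
        result.append({
            base: (column[base] + mirror[COMPLEMENT[base]]) / 2
            for base in "ACGT"
        })
    return result
```

In this sketch the middle column of an odd-width motif is paired with itself, so its A and T frequencies are averaged with each other, as are its C and G frequencies.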


If your (primary) sequences are sorted in order of confidence (best to worst) then you should select this option. This will cause MEME not to randomize the order of the (primary) sequences before sampling starting points if there are more than 1000 sequences. See the MEME documentation for the -norand option for more details.


STREME stops looking for motifs when one of these limits is met. There is an additional time limit which is set by the server operator.

p-value threshold
STREME stops when three consecutive motifs are found whose p-values exceed this threshold.
Number of motifs
STREME stops if it has found this many motifs. By default STREME does not limit the number of motifs to be found.
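
The two stopping rules can be sketched over a list of motif p-values given in discovery order. This Python sketch only models the rules as worded above: it ignores the server time limit, and it does not say which of the found motifs STREME actually reports. The name motifs_before_stop is hypothetical.

```python
def motifs_before_stop(p_values, threshold, max_motifs=None, patience=3):
    """Return how many motifs are found before a stopping rule applies.

    p_values lists motif p-values in the order the motifs are found.
    """
    consecutive = 0
    for found, p_value in enumerate(p_values, start=1):
        # Count the current run of motifs whose p-value exceeds the threshold.
        consecutive = consecutive + 1 if p_value > threshold else 0
        if consecutive >= patience:
            return found  # three consecutive motifs exceeded the threshold
        if max_motifs is not None and found >= max_motifs:
            return found  # the requested number of motifs has been found
    return len(p_values)
```

For example, with a threshold of 0.05 and p-values 0.001, 0.2, 0.3, 0.01, 0.2, 0.3, 0.4, the search stops after the seventh motif, because the fifth to seventh are the first run of three that exceed the threshold.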

Specify a minimum score for a match to be considered. If a sequence does not have any matches which meet this minimum score for a given motif, then that sequence will not be considered for that motif.


This option limits the maximum region size that CentriMo will test. This option is useful if your sequences are quite long (> 500 bp) or you are interested only in narrow regions of enrichment. Limiting the size of the maximum region reduces the impact of the multiple testing correction, increasing the sensitivity of the analysis. When this option is not supplied CentriMo will test region sizes up to one less than the maximum number of places that a given motif can align to the sequence.


This is the E-value threshold CentriMo uses for reporting enriched central regions for motifs. If multiple enriched regions overlap then the region with the best p-value and smallest size will be output.


This option causes all regions up to the maximum region size to be considered even if they are not in the center. This can be useful when your sequences are aligned on a genomic landmark (e.g., TSS) since a motif might be enriched at a particular distance upstream or downstream of the landmark.


This option causes CentriMo to store the identifiers (IDs) of sequences that have their best match in the most enriched region for each motif. This will allow you to easily extract the IDs of the sequences contributing to the enrichment of one or more motifs in the CentriMo output. This option makes the CentriMo output file much larger.


Data Submission Form

Perform motif discovery, motif enrichment analysis and clustering on large nucleotide datasets.

Select the motif discovery and enrichment mode

Select the sequence alphabet

Use sequences with a standard alphabet or specify a custom alphabet.

Input the primary sequences

Enter the (equal-length) nucleotide sequences to be analyzed.


Specify the genome your BED file is based on.

Select the BED file to upload.

Input the motifs

Select, upload or enter a set of known motifs.


Input job details

(Optional) Enter your email address.

(Optional) Enter a job description.

Universal options

What should be used as the background model?

What width motifs should MEME and STREME find?

Should STREME and FIMO parse genomic coordinates?

MEME options

What is the expected motif site distribution?

How many motifs should MEME find?

How many sites per motif is acceptable?

Should MEME restrict the search to palindromes?

Should MEME randomize the sequence order?

STREME options

How should STREME limit its search?

How should sequences be aligned for site positional diagrams?

CentriMo options

What is the threshold for a motif match (bits)?

What is the maximum allowed width of an enriched region?

What is the E-value threshold for an enriched region?

Should CentriMo find non-central enriched regions?

Should CentriMo output include the IDs of sequences with a motif match?

This server has the job quota set to 10 unfinished jobs every 1 hour.

Note: if the combined form inputs exceed 80MB the job will be rejected.