CentriMo

Centrally

Choose this mode to look for motifs that are enriched in the centers of your sequences relative to their flanks.
This mode is appropriate for ChIP-seq and other types of data where you expect the distribution of motifs to be symmetrical around the sequence centers.

Anywhere

Choose this mode to look for motifs that are enriched in some confined (local) region along the sequences.
This mode is appropriate when your sequences are all aligned on a genomic landmark such as transcription start sites or splice junctions and you expect that motifs may be enriched at specific locations relative to the genomic landmark.
This mode can also detect centrally enriched motifs, but will have lower statistical power due to more multiple tests being performed.


Absolute (single dataset)

Choose this mode if you have just one set of sequences to search for enriched motifs.
In this mode, CentriMo uses the binomial test to determine if the number of sequences that have their best matches to a motif in a given region is greater than expected given that matches should be uniformly distributed along the sequence.
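As a rough illustration of the binomial test described above, the sketch below computes the probability of seeing at least a given number of best matches inside a region when matches fall uniformly along the sequence. The function name and the example numbers are hypothetical, and CentriMo's own calculation (including its multiple-testing correction) is not reproduced here.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 200 sequences have a best match to the motif,
# the region covers 50 of the 490 possible match positions,
# and 60 of the best matches fall inside that region.
n_sequences = 200
region_fraction = 50 / 490
observed_in_region = 60

p_value = binomial_tail(observed_in_region, n_sequences, region_fraction)
print(f"unadjusted p-value: {p_value:.3g}")
```

Under the uniform assumption only about 20 of the 200 best matches would be expected in the region, so 60 observed matches gives a very small p-value.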

Absolute and Differential (two datasets)

Choose this mode if you would also like to find motifs that are more locally enriched in the first set of sequences compared to the second set.
When you choose this mode, you will be able to view the motifs sorted by either absolute enrichment (E-value) or by differential enrichment (Fisher E-value) via the Sort menu on CentriMo's report.
CentriMo first determines the regions of localized enrichment in the primary set of sequences for a given motif as in Absolute mode, and then it computes the differential enrichment of those regions.
For differential enrichment, CentriMo uses Fisher's exact test to determine if the number of best matches to the motif in the region in the primary sequences is surprisingly higher than the number of best matches in the same region of the control sequences.
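To make the Fisher's exact test comparison concrete, here is a minimal one-sided version written with the standard library only. The 2x2 table layout, the function name and the counts are assumptions chosen for illustration; this is a sketch of the test itself, not of CentriMo's implementation.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

        primary:  a in region, b outside
        control:  c in region, d outside

    Returns the probability of seeing a count >= a in the top-left
    cell when all row and column totals are held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    numer = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return numer / denom


# Hypothetical counts: 60 of 200 primary sequences and 25 of 200 control
# sequences have their best match inside the enriched region.
p_value = fisher_one_sided(60, 140, 25, 175)
print(f"Fisher p-value: {p_value:.3g}")
```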


Use the menu below to choose how you wish to input your primary sequences.

Note 1: All sequences should be the same length. Suggested lengths are 500 bp for ChIP-seq, 100 bp for CLIP-seq and 1000 bp for promoter regions.

Note 2: You must convert your RNA sequences to the DNA alphabet (U to T) for use with CentriMo.
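The two notes above can be checked before uploading. The following sketch parses FASTA text, converts U to T, and reports the set of sequence lengths; the helper names and the two example sequences are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def rna_to_dna(sequence):
    """Replace U with T (and u with t), leaving other letters unchanged."""
    return sequence.translate(str.maketrans("Uu", "Tt"))


rna_fasta = ">seq1\nACGUACGU\n>seq2\nuugcaacg\n"
dna_records = [(name, rna_to_dna(seq)) for name, seq in read_fasta(rna_fasta)]
lengths = {len(seq) for _, seq in dna_records}

print(dna_records)
print("all sequences the same length:", len(lengths) == 1)
```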

See the example DNA sequences which were used to create the sample output to get an idea of input that works well for CentriMo.


Use the menu below to choose how you wish to input your control sequences.

Note 1: All sequences should be the same length as the primary sequences.

Note 2: You must convert your RNA sequences to the DNA alphabet (U to T) for use with CentriMo.

See the example DNA sequences to get an idea of input that works well for CentriMo.


Using the menu below, select the way you want to input motifs that will be tested for enrichment in your input sequences. Use the first menu below to choose how you want to input the motifs, and the second menu to choose the particular motif database you require.


The background model normalizes for biased distribution of individual letters in your sequences. By default CentriMo will create a 0-order Markov sequence model from the letter frequencies in the primary input sequences. You may also choose to use a uniform background model or to use the background model specified by the motifs.

Alternatively, you may select "Upload background" and input a file containing a background model.

The downloadable version of the MEME Suite contains a script named "fasta-get-markov" that you can use to create sequence model files in the correct format from a FASTA sequence file.


match either strand

Choose this mode if your sequences are DNA and you want CentriMo to consider motif matches on either strand to be equivalent.
This mode is usually appropriate for ChIP-seq and similar data.

match given strand only

Choose this mode if your sequences are RNA or if you want CentriMo to ignore motif matches on the reverse-complement strand of DNA sequences.
This mode is usually appropriate with CLIP-seq and similar data.

separately

Choose this mode if you want to treat matches on the reverse-complement strand separately from those on the given strand.
CentriMo will produce separate site distribution plots for each motif and its reverse-complement.
This mode is useful when your sequences have strand information, such as when they are promoter regions.

reflected

Choose this mode if you think the positions of motif matches on the reverse-complement strand should be reflected around the sequence centers.
This mode is useful with ChIP-seq data for detecting the presence of (non-palindromic) motifs for co-factors--transcription factors other than the one that was ChIP-ed.


Score ≥

Increase the match score threshold if you want CentriMo to ignore weaker matches to motifs, or decrease it to include them in the analysis.
Sequences with no match to a given motif above the match score threshold are ignored in computing that motif's enrichment.

Optimize Score

Select this option if you want CentriMo to find the optimal match score threshold.
Independently for each motif, CentriMo will consider all thresholds above 0 and will choose the one that maximizes the statistical significance of the motif's enrichment.
This option increases running time and can reduce statistical power due to increased multiple tests.


Check this option and specify the maximum width for enriched regions if you have prior knowledge of what is a reasonable limit.

By default CentriMo considers regions up to one less than the maximum number of places that a given motif will fit in a sequence.
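The default limit can be worked out by hand. The sketch below assumes that "the number of places a motif will fit" means the number of start positions for a motif of width w in a sequence of length L, that is L - w + 1; both function names are hypothetical.

```python
def motif_positions(seq_len, motif_width):
    """Number of start positions where the motif fits in the sequence."""
    return max(seq_len - motif_width + 1, 0)


def default_max_region_width(seq_len, motif_width):
    """One less than the number of places the motif fits (assumed default)."""
    return max(motif_positions(seq_len, motif_width) - 1, 0)


# A 10-wide motif in 500 bp sequences fits in 491 places,
# so regions up to 490 wide would be considered under this reading.
print(motif_positions(500, 10), default_max_region_width(500, 10))
```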

Reducing the maximum width increases the statistical power of CentriMo because fewer candidate regions are tested, which reduces the multiple testing correction.


Reduce the E-value threshold if you want CentriMo to report only more significant motif enrichments; increase it to include less significant motif enrichments in the report.

Setting the E-value threshold to the number of motifs in the input database will cause CentriMo to report a result for every motif.

Note that if there are multiple, overlapping enriched regions, then CentriMo reports the most significant overlapping region.
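The reason a threshold equal to the number of motifs reports everything can be seen with a little arithmetic, assuming the E-value is the motif's adjusted p-value multiplied by the number of motifs tested. The motif names, p-values and the "less than or equal" comparison below are illustrative assumptions.

```python
def e_value(adjusted_p_value, n_motifs):
    """E-value under the assumption E = adjusted p-value x number of motifs."""
    return adjusted_p_value * n_motifs


def reported_motifs(adjusted_p_values, threshold):
    """Return the motif names whose E-value does not exceed the threshold."""
    n_motifs = len(adjusted_p_values)
    return [
        name
        for name, p in adjusted_p_values.items()
        if e_value(p, n_motifs) <= threshold
    ]


# Hypothetical adjusted p-values for a three-motif database.
p_values = {"motifA": 1e-8, "motifB": 0.2, "motifC": 1.0}

print(reported_motifs(p_values, threshold=0.1))            # strict threshold
print(reported_motifs(p_values, threshold=len(p_values)))  # threshold = number of motifs
```

Because a p-value can never exceed 1, no E-value can exceed the number of motifs, so that threshold excludes nothing.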


Disable this option if you don't want CentriMo to store sequence identifiers in its output file.

Disabling this option will make the CentriMo output file smaller, but the CentriMo output will not be able to interactively show you the sizes of the intersection and union sets of sequences matching the motifs you select.


If your sequences are not in a standard alphabet (DNA, RNA or protein), you must input a custom alphabet file.


Click on the menu at the left to see which of the following motif input methods are available.

Type in motifs: When this option is available you may directly input multiple motifs by typing them (or using "cut-and-paste"). First select the desired motif alphabet using the menu immediately to the left. If you select the "Custom" option then you must provide an alphabet definition in the file input that immediately follows. Warning: custom alphabets are case-sensitive. You may optionally give each motif an identifier and alternate name by inputting a line like >Identifier Alternate-Name preceding the motif. You can then enter each motif as a matrix, as sequence sites or as a fixed-length regular expression. You can enter multiple motifs by typing an empty line after each motif. Individual motifs will be shown in square brackets, and errors in your motifs will be highlighted in red while warnings will be highlighted in yellow. Mouse-over individual motifs to display their sequence logos. View the examples for more information on what is possible.
Upload motifs: When this option is available you may upload a file containing motifs in MEME motif format. This includes the outputs generated by MEME and DREME, as well as files you create using the motif conversion scripts or manually following the MEME motif format guidelines.
Databases (select category): When this option is available you can select the category of motif database desired from the list below it. Then select the motif database from the displayed list. Consult the motif database documentation for descriptions of all the motif databases present on this MEME Suite server.
Submitted motifs: This option is only available when you have invoked the current program by clicking on a button in the output report of a different MEME Suite program. By selecting this option you will input the motifs sent by that program.


Typed Motifs - Matrices

You may input both probability and count matrices of either orientation and the rules described below will be used to convert the matrix into a MEME formatted motif.

Alphabet Order

The counts/probabilities are expected to be ordered based on the alphabetical ordering of their codes. So DNA is ordered ACGT and protein is ordered ACDEFGHIKLMNPQRSTVWY. For custom alphabets the ordering goes uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9) and finally the symbols '*', '-' and '.'.
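The ordering rule for alphabet symbols can be written as a sort key. This is a small sketch of the rule as stated above, with a hypothetical helper name; it is not taken from the MEME Suite source.

```python
import string

# Uppercase letters, then lowercase letters, then digits, then '*', '-' and '.'.
SYMBOL_ORDER = string.ascii_uppercase + string.ascii_lowercase + string.digits + "*-."


def order_symbols(symbols):
    """Sort alphabet symbols into the order expected for matrix columns."""
    return sorted(symbols, key=SYMBOL_ORDER.index)


print("".join(order_symbols("TGCA")))
print(order_symbols(["a", "*", "Z", "3", "B", "-"]))
```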

Matrix Orientation

Matrix motifs may be input with either one position per row (preferred) which is called row orientation, or one position per column which is called column orientation. The orientation is determined by picking which dimension (row or column) is equal to the alphabet size. If both dimensions are equal to the alphabet size then row orientation is assumed. If neither dimension is equal to the alphabet size then the closest that is still smaller than the alphabet size is picked, however if both are equally smaller then column orientation is assumed. Finally if none of the above rules work to determine the orientation then row orientation is assumed.
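One possible reading of these orientation rules is sketched below. It assumes that in row orientation the number of columns should equal the alphabet size, and that in column orientation the number of rows should; the function name and this interpretation are assumptions, so treat the result as a guide rather than a statement of what the parser does.

```python
def guess_orientation(n_rows, n_cols, alphabet_size):
    """Guess 'row' (one position per row) or 'column' (one position per column)."""
    if n_cols == alphabet_size:
        return "row"  # also covers the case where both dimensions match
    if n_rows == alphabet_size:
        return "column"
    # Neither dimension matches: prefer the one that is closest to the
    # alphabet size while still being smaller than it.
    row_gap = alphabet_size - n_cols   # gap if the matrix is row oriented
    col_gap = alphabet_size - n_rows   # gap if the matrix is column oriented
    if row_gap > 0 and col_gap > 0:
        return "row" if row_gap < col_gap else "column"  # ties go to column
    if row_gap > 0:
        return "row"
    if col_gap > 0:
        return "column"
    return "row"  # fallback when no rule applies


print(guess_orientation(9, 4, 4))  # nine positions, one per row
print(guess_orientation(4, 9, 4))  # nine positions, one per column
```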

Site counts

Once the orientation is determined, the sum of the numbers that make up the first position is calculated and rounded to the nearest integer. If that value is larger than 1 then the matrix is assumed to be a count matrix and that value is used as the site count, otherwise the matrix is assumed to be a probability matrix and a site count of 20 is used.

Converting to a normalized probability matrix

Once the orientation is determined then each number in the matrix is converted to a normalized probability by dividing by the sum of all the numbers for that motif position. If any numbers are missing they are assumed to have the value zero. As a special case if all numbers in a motif position have the value zero then they are given the uniform probability of 1 / alphabet size.
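The site count and normalization rules can be sketched as follows for a matrix that is already in row orientation (one motif position per row). The function names are hypothetical and the code only mirrors the rules as described here.

```python
def site_count(first_position, default=20):
    """Rounded sum of the first position if it exceeds 1, otherwise the default."""
    total = round(sum(first_position))
    return total if total > 1 else default


def normalize(matrix, alphabet_size):
    """Convert each position (row) to probabilities that sum to 1."""
    result = []
    for row in matrix:
        # Missing numbers are treated as zero.
        padded = list(row) + [0] * (alphabet_size - len(row))
        total = sum(padded)
        if total == 0:
            # An all-zero position becomes uniform.
            result.append([1 / alphabet_size] * alphabet_size)
        else:
            result.append([value / total for value in padded])
    return result


counts = [[3, 8, 14, 3], [3, 0, 0, 25], [0, 0, 0, 0]]
print(site_count(counts[0]))
print(normalize(counts, alphabet_size=4))
```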

Yellow highlighting and red annotations

Red asterisks (*) indicate where the parser thinks values are missing. A yellow highlighted row or column with a red number at the end indicates that the counts for that position don't sum to the same count as the first position. The red number shows the difference. If the red number is negative then that position sums to less than the first position; if it is positive then it sums to more than the first position.


Typed Motifs - Sequence Sites or Regular Expressions

You may input one or more sites of the motif including using ambiguity codes or bracket expressions to represent multiple possibilities for a single motif position.

Ambiguity Codes

The DNA and protein alphabets include additional codes that represent multiple possible bases or amino acids. For example the DNA alphabet includes W (for weak) which represents that the given position could be either an A (for adenosine) or a T (for thymidine). Note that MEME Suite regular expressions must be fixed-length, so they may not include the Kleene star character *.

Bracket Expressions

Bracket expressions also group together multiple codes so they share a single position. Their syntax is an opening square bracket '[' followed by one or more codes and a closing square bracket ']'. For example with a DNA motif the bracket expression [AT] means that both A and T are acceptable and is equivalent to the ambiguity code W. Any repeats of a base in a bracket expression are ignored so for example a DNA bracket expression [AAT] has the same effect as [AT] or [AW] or W.

Multiple sites

When only one site is provided the site count is set to 20, however you can precisely control the motif by providing multiple sites. Each of these sites can still contain ambiguity codes and bracket expressions but a single count will be divided among the selected bases for each position. When multiple sites are provided the site count will be set to the number of sites provided.
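The way sites, ambiguity codes and bracket expressions combine into counts can be sketched for DNA as below. Each site contributes one count per position, split evenly among the bases it allows. The function names are hypothetical and the code only follows the description given here.

```python
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_site(site):
    """Split a site into one set of allowed bases per motif position."""
    positions = []
    i = 0
    while i < len(site):
        if site[i] == "[":
            end = site.index("]", i)      # bracket expression: one position
            codes = site[i + 1:end]
            i = end + 1
        else:
            codes = site[i]               # single code: one position
            i += 1
        bases = set()
        for code in codes:
            bases.update(DNA_CODES[code.upper()])  # repeats are ignored by the set
        positions.append(bases)
    return positions


def count_matrix(sites):
    """Build a count matrix (rows = positions, columns = A, C, G, T)."""
    parsed = [parse_site(site) for site in sites]
    width = len(parsed[0])
    if any(len(p) != width for p in parsed):
        raise ValueError("all sites must have the same width")
    matrix = [[0.0] * 4 for _ in range(width)]
    for positions in parsed:
        for row, bases in zip(matrix, positions):
            for base in bases:
                row["ACGT".index(base)] += 1 / len(bases)
    return matrix


def site_count(sites):
    """Number of sites, or 20 when only a single site is given."""
    return len(sites) if len(sites) > 1 else 20


print(parse_site("NTRGGTCAN") == parse_site("[ACGT]T[AG]GGTCA[ACGT]"))
print(count_matrix(["CTAGGTCAT", "ATAGGTCAC"])[0])
```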


DNA Alphabet

DNA motifs support the standard 4 codes for the bases: adenosine (A), cytidine (C), guanosine (G) and thymidine (T) as well as supporting the following ambiguity codes.

Description	Code	Bases
Uracil	U	T
Weak	W	A, T
Strong	S	C, G
Amino	M	A, C
Keto	K	G, T
Purine	R	A, G
Pyrimidine	Y	C, T
Not A	B	C, G, T
Not C	D	A, G, T
Not G	H	A, C, T
Not T	V	A, C, G
Any	N	A, C, G, T


Protein Alphabet

Protein motifs support the standard 20 codes for the amino acids: Alanine (A), Arginine (R), Asparagine (N), Aspartic acid (D), Cysteine (C), Glutamic acid (E), Glutamine (Q), Glycine (G), Histidine (H), Isoleucine (I), Leucine (L), Lysine (K), Methionine (M), Phenylalanine (F), Proline (P), Serine (S), Threonine (T), Tryptophan (W), Tyrosine (Y) and Valine (V) as well as supporting the following ambiguity codes.

Description	Code	Amino acids
Asparagine or aspartic acid	B	N, D
Glutamine or glutamic acid	Z	E, Q
Leucine or Isoleucine	J	I, L
Unspecified or unknown amino acid	X	A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

Note that the two amino acids Selenocysteine (U) and Pyrrolysine (O) are not supported by the MEME Suite.


Typed Motifs - Examples

Single site motif using ambiguity codes N and R or bracket expressions. These give an approximation of the other motifs below.

NTRGGTCAN

or

[ACGT]T[AG]GGTCA[ACGT]

Multiple site motif. This lists all 28 sites and gives the same result as the count matrix below.

CTAGGTCAT
ATAGGTCAC
GTAGGTCAC
GTAGGGCAC
GTAGGGCAC
GTGGGTCAC
GTAGGTCAC
GTAGGTCAC
TTGGGTCAC
CTAGGTCAT
CTAGGTCAT
CTAGGTCAT
CTGGGTCAC
ATAGGTCAG
GTAGGTCAA
GTAGGTGAG
ATGGGTCAC
GTAGGTCAG
GTGGGTGAA
GTAGGGCAC
CTGGGTCAC
TTGGGTCAC
CTAGGTCAC
GAAGGTGAC
GTAGGTAAA
GTAGGTCAA
CAGCAGCTG
TAGGTCACA

Count matrix motif showing row and column orientations.

 3  8 14  3
 3  0  0 25
19  0  9  0
 0  1 27  0
 1  0 26  1
 0  1  4 23
 2 23  3  0
26  1  0  1
 5 15  4  4

or

 3  3 19  0  1  0  2 26  5
 8  0  0  1  0  1 23  1 15
14  0  9 27 26  4  3  0  4
 3 25  0  0  1 23  0  1  4

Note that all of these can be used with an identifier and alternate name, like these 3 count matrix motifs from JASPAR.

>MA0001.1 SEP4
0   3   79  40  66  48  65  11  65  0
94  75  4   3   1   2   5   2   3   3
1   0   3   4   1   0   5   3   28  88
2   19  11  50  29  47  22  81  1   6
>MA0002.1 RUNX1
10  12  4   1   2   2   0   0   0   8   13
2   2   7   1   0   8   0   0   1   2   2
3   1   1   0   23  0   26  26  0   0   4
11  11  14  24  1   16  0   0   25  16  7
>MA0003.1 TFAP2A
0   0   0   22  19  55  53  19  9
0   185 185 71  57  44  30  16  78
185 0   0   46  61  67  91  137 79
0   0   0   46  48  19  11  13  19


When enabled this field supports selecting motifs from the file with a space separated list of motif identifiers and/or their positions in the file.

Any numbers in the range 1 to 999 are assumed to refer to the position of the selected motif in the file, so the entry "3" always refers to the third motif. Any other entry is assumed to be a motif identifier.

Motif identifiers cannot start with a dash and can only contain alphanumeric characters as well as colon ':', underscore '_', dot '.' and dash '-'.
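The rules for this field can be sketched as a small parser that separates file positions from motif identifiers and rejects malformed identifiers. The function name and the regular expression are an interpretation of the text above, not code from the MEME Suite.

```python
import re

# Alphanumerics plus ':', '_', '.' and '-', but not starting with a dash.
IDENTIFIER = re.compile(r"[A-Za-z0-9:_.][A-Za-z0-9:_.\-]*")


def parse_selection(text):
    """Split a space-separated selection into (positions, identifiers)."""
    positions, identifiers = [], []
    for entry in text.split():
        if re.fullmatch(r"[0-9]+", entry) and 1 <= int(entry) <= 999:
            positions.append(int(entry))       # position of the motif in the file
        elif IDENTIFIER.fullmatch(entry):
            identifiers.append(entry)          # anything else is an identifier
        else:
            raise ValueError(f"invalid motif identifier: {entry!r}")
    return positions, identifiers


print(parse_selection("3 MA0001.1 1000"))
```

Here "3" selects the third motif in the file, while "1000" falls outside the range 1 to 999 and is therefore treated as an identifier.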


Select the desired motif database.

Consult the motif database documentation for descriptions of all the DNA, RNA and protein motif databases present on this MEME Suite server.


This option can help change the alphabet of motifs from a base alphabet to a derived alphabet.

This might be useful if you need to compare an extended DNA motif with a library of DNA motifs, or if you wish to compare RNA motifs to DNA motifs. Note that this option will also let you do nonsensical things like compare Protein motifs to DNA motifs so use it with care.

The derived alphabet must have all the core symbols of the alphabet that it is derived from. For example if the alphabet is derived from DNA it must have ACGT as core symbols. Expanding the alphabet adds frequencies of zero for every symbol in the derived alphabet that did not exist in the base alphabet.
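Expanding a motif to a derived alphabet amounts to adding zero-frequency columns. The sketch below illustrates this with a hypothetical derived alphabet "ACGTm" (DNA plus one extra symbol); the function name and the example alphabet are assumptions.

```python
def expand_motif(probabilities, base_alphabet, derived_alphabet):
    """Re-express a motif (rows = positions) in a derived alphabet.

    Symbols that exist only in the derived alphabet get frequency zero.
    """
    missing = [s for s in base_alphabet if s not in derived_alphabet]
    if missing:
        raise ValueError(f"derived alphabet lacks core symbols: {missing}")
    index = {symbol: i for i, symbol in enumerate(base_alphabet)}
    return [
        [row[index[s]] if s in index else 0.0 for s in derived_alphabet]
        for row in probabilities
    ]


dna_motif = [[0.7, 0.1, 0.1, 0.1], [0.0, 0.0, 1.0, 0.0]]
print(expand_motif(dna_motif, "ACGT", "ACGTm"))
```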


Click on the menu at the left to see which of the following sequence input methods are available.

Type in sequences: When this option is available you may directly input multiple sequences by typing them. Sequences must be input in FASTA format.
Upload sequences: When this option is available you may upload a file containing sequences in FASTA format.
Upload BED file: When this option is available you may upload a file containing sequence coordinates in BED format.
Databases (select category): When this option is available you may first select a category of sequence database from the list below it. Two additional menus will then appear where you can select the particular database and version desired, respectively. The full list of available sequence databases and their descriptions can be viewed HERE.
Submitted sequences: This option is only available when you have invoked the current program by clicking on a button in the output report of a different MEME Suite program. By selecting this option you will input the sequences sent by that program.


Specify a file to upload containing sequence coordinates in BED format. The file must be based on the exact genome version you specified in the menus above.


Select an available sequence database from this menu.


Select an available version of the sequence database from this menu.


Select an available tissue/cell-specificity from this menu.


Selecting this option will filter the sequence menu to only contain databases that have additional information that is specific to a tissue or cell line.

This option causes MEME Suite to use tissue/cell-specific information (typically from DNase I or histone modification ChIP-seq data) encoded as a position specific prior that has been created by the MEME Suite create-priors utility. You can see a description of the sequence databases for which we provide tissue/cell-specific priors here.

Note that you cannot upload or type in your own sequences when tissue/cell-specific scanning is selected.


Enter the email address where you want the job notification email to be sent. Please check that this is a valid email address!

The notification email will include a link to your job results.

Note: You can also access your jobs via the Recent Jobs menu on the left of all MEME Suite input pages. That menu only keeps track of jobs submitted during the current session of your internet browser.

Note: Most MEME Suite servers only store results for a couple of days. So be sure to download any results you wish to keep.


Enter text naming or describing this analysis. The job description will be included in the notification email you receive and in the job output.


Local Motif Enrichment Analysis

Select the kind of local motif enrichment to search for

Select whether to perform differential enrichment analysis

Select the sequence alphabet

Use sequences with a standard alphabet or specify a custom alphabet.

Input the primary sequences

Enter the nucleotide sequences you want to search for enriched motifs.

Select the BED file to upload.

Input the control sequences

Enter the control sequences for differential enrichment analysis.

Select the BED file to upload.

Input the motifs

Select a motif database or enter the motifs you wish to test for enrichment.

Input job details

(Optional) Enter your email address.

(Optional) Enter a job description.

Select how to treat the reverse-complement strand

Choose the match score threshold (bits)

Set the maximum width of enriched regions

Set the E-value threshold for reporting enriched regions

Include/suppress sequence IDs

What should be used as the background model?