If your sequences are not in a standard alphabet (DNA, RNA or protein), you must input a custom alphabet file.
Click on the menu at the left to see which of the following motif input methods are available.
You may input both probability and count matrices of either orientation and the rules described below will be used to convert the matrix into a MEME formatted motif.
The counts/probabilities are expected to be ordered based on the alphabetical ordering of their codes. So DNA is ordered ACGT and protein is ordered ACDEFGHIKLMNPQRSTVWY. For custom alphabets the ordering goes uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9) and finally the symbols '*', '-' and '.'.
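The ordering rule above can be illustrated with a short Python sketch. The helper name `order_symbols` is hypothetical and not part of the MEME Suite; it only encodes the ordering as described here (uppercase, lowercase, digits, then '*', '-' and '.').

```python
import string

# Symbol classes in the order given by the help text:
# A-Z, then a-z, then 0-9, then '*', '-' and '.'.
_ORDER = string.ascii_uppercase + string.ascii_lowercase + string.digits + "*-."
_RANK = {ch: i for i, ch in enumerate(_ORDER)}


def order_symbols(symbols):
    """Return the symbols sorted into the expected matrix order.

    Hypothetical helper for illustration; raises KeyError for any
    character outside the classes listed above.
    """
    return sorted(symbols, key=lambda ch: _RANK[ch])


print("".join(order_symbols("TGCA")))    # ACGT
print("".join(order_symbols(".a1A*-")))  # Aa1*-.
```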
Matrix motifs may be input with one position per row (row orientation, preferred) or one position per column (column orientation). The orientation is determined by which dimension (row or column) is equal to the alphabet size. If both dimensions are equal to the alphabet size, row orientation is assumed. If neither dimension is equal to the alphabet size, the one that is closest to the alphabet size while still smaller than it is picked; if both are equally smaller, column orientation is assumed. Finally, if none of the above rules determines the orientation, row orientation is assumed.
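One possible reading of these orientation rules is sketched below in Python. The function `guess_orientation` is hypothetical, and it assumes that the "row dimension" means the number of values in a row (so a row-oriented DNA matrix has rows of length 4). The actual parser may differ in details not stated here.

```python
def guess_orientation(n_rows, n_cols, alphabet_size):
    """Guess 'row' or 'column' orientation from the matrix shape.

    Illustrative sketch only; assumes a row's dimension is its length.
    """
    row_len = n_cols  # number of values in each row
    col_len = n_rows  # number of values in each column

    # A dimension equal to the alphabet size decides the orientation;
    # checking rows first means rows win when both are equal.
    if row_len == alphabet_size:
        return "row"
    if col_len == alphabet_size:
        return "column"

    # Otherwise prefer the dimension closest to, but smaller than,
    # the alphabet size; an exact tie goes to column orientation.
    row_short = alphabet_size - row_len
    col_short = alphabet_size - col_len
    if row_short > 0 and col_short > 0:
        return "row" if row_short < col_short else "column"
    if row_short > 0:
        return "row"
    if col_short > 0:
        return "column"

    # No rule applied (both dimensions exceed the alphabet size).
    return "row"
```

For example, an 8 x 4 DNA matrix is read as row oriented and a 4 x 8 matrix as column oriented, while a 3 x 3 matrix falls through to the tie rule and is read as column oriented.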
Once the orientation is determined, the sum of the numbers that make up the first position is calculated and rounded to the nearest integer. If that value is larger than 1 then the matrix is assumed to be a count matrix and that value is used as the site count, otherwise the matrix is assumed to be a probability matrix and a site count of 20 is used.
Once the orientation is determined then each number in the matrix is converted to a normalized probability by dividing by the sum of all the numbers for that motif position. If any numbers are missing they are assumed to have the value zero. As a special case if all numbers in a motif position have the value zero then they are given the uniform probability of 1 / alphabet size.
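The site-count and normalization rules can be sketched as follows for a matrix that has already been put into row orientation. The function name `to_probabilities` is hypothetical, and the handling of exact .5 rounding is an assumption: Python's `round` rounds halves to the nearest even integer, which may not match the real parser.

```python
def to_probabilities(rows, alphabet_size):
    """Convert a row-oriented count or probability matrix.

    Returns (site_count, probability_rows). Illustrative sketch only.
    """
    # Missing numbers are treated as zero.
    padded = [list(r) + [0.0] * (alphabet_size - len(r)) for r in rows]

    # The rounded sum of the first position decides count vs probability.
    first_sum = round(sum(padded[0]))
    site_count = first_sum if first_sum > 1 else 20

    probs = []
    for row in padded:
        total = sum(row)
        if total == 0:
            # Special case: an all-zero position becomes uniform.
            probs.append([1.0 / alphabet_size] * alphabet_size)
        else:
            probs.append([value / total for value in row])
    return site_count, probs


site_count, probs = to_probabilities([[7, 0, 21, 0], [0, 0, 0, 0], [14, 14]], 4)
print(site_count)  # 28
print(probs[2])    # [0.5, 0.5, 0.0, 0.0]
```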
Red asterisks (*) indicate where the parser thinks values are missing. A yellow highlighted row or column with a red number at the end indicates that the counts for that position don't sum to the same count as the first position. The red number shows the difference: if it is negative, that position sums to less than the first position; if it is positive, it sums to more than the first position.
You may input one or more sites of the motif. Sites may use ambiguity codes or bracket expressions to represent multiple possibilities for a single motif position.
The DNA and protein alphabets include additional codes that represent multiple possible bases. For example, the DNA alphabet includes W (for weak), which indicates that the given position could be either an A (for adenosine) or a T (for thymidine).
Bracket expressions also group together multiple codes so they share a single position. Their syntax is an opening square bracket '[' followed by one or more codes and a closing square bracket ']'. For example, with a DNA motif the bracket expression [AT] means that both A and T are acceptable and is equivalent to the ambiguity code W. Any repeats of a base in a bracket expression are ignored, so for example the DNA bracket expression [AAT] has the same effect as [AT] or [AW] or W.
When only one site is provided, the site count is set to 20; however, you can precisely control the motif by providing multiple sites. Each of these sites can still contain ambiguity codes and bracket expressions, but a single count will be divided among the selected bases for each position. When multiple sites are provided, the site count is set to the number of sites provided.
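As a rough illustration of how sites might be turned into a motif, the Python sketch below expands DNA ambiguity codes and bracket expressions and divides each site's single count evenly among the selected bases. The names `parse_site` and `sites_to_probabilities` are hypothetical, the even split is an assumption about how the count is "divided", and the sketch requires all sites to have the same width.

```python
DNA = "ACGT"
# DNA ambiguity codes and the core bases they stand for.
AMBIG = {
    "U": "T", "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG",
    "Y": "CT", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_site(site):
    """Split a site into one set of core bases per motif position."""
    positions = []
    i = 0
    while i < len(site):
        if site[i] == "[":
            end = site.index("]", i)  # ValueError if the bracket is unclosed
            codes = site[i + 1:end]
            i = end + 1
        else:
            codes = site[i]
            i += 1
        bases = set()
        for code in codes:
            bases.update(AMBIG.get(code, code))
        # Using a set ignores repeated bases, so [AAT] equals [AT].
        if not bases or not bases <= set(DNA):
            raise ValueError(f"invalid codes {codes!r} in site {site!r}")
        positions.append(bases)
    return positions


def sites_to_probabilities(sites):
    """Return (site_count, probability_rows) for a list of DNA sites."""
    parsed = [parse_site(s) for s in sites]
    width = len(parsed[0])
    if any(len(p) != width for p in parsed):
        raise ValueError("all sites must have the same width")
    probs = [[0.0] * len(DNA) for _ in range(width)]
    for positions in parsed:
        for row, bases in zip(probs, positions):
            for base in bases:
                # One count per site, shared among the selected bases.
                row[DNA.index(base)] += 1.0 / len(bases) / len(sites)
    site_count = 20 if len(sites) == 1 else len(sites)
    return site_count, probs
```

For instance, `sites_to_probabilities(["AC", "AG"])` gives a site count of 2 with A fixed at the first position and C and G each at 0.5 in the second.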
DNA motifs support the standard 4 codes for the bases: adenosine (A), cytidine (C), guanosine (G) and thymidine (T) as well as supporting the following ambiguity codes.
Description | Code | Bases |
---|---|---|
Uracil | U | T |
Weak | W | A, T |
Strong | S | C, G |
Amino | M | A, C |
Keto | K | G, T |
Purine | R | A, G |
Pyrimidine | Y | C, T |
Not A | B | C, G, T |
Not C | D | A, G, T |
Not G | H | A, C, T |
Not T | V | A, C, G |
Any | N | A, C, G, T |
Protein motifs support the standard 20 codes for the amino acids: Alanine (A), Arginine (R), Asparagine (N), Aspartic acid (D), Cysteine (C), Glutamic acid (E), Glutamine (Q), Glycine (G), Histidine (H), Isoleucine (I), Leucine (L), Lysine (K), Methionine (M), Phenylalanine (F), Proline (P), Serine (S), Threonine (T), Tryptophan (W), Tyrosine (Y) and Valine (V) as well as supporting the following ambiguity codes.
Description | Code | Amino acids |
---|---|---|
Asparagine or aspartic acid | B | N, D |
Glutamine or glutamic acid | Z | E, Q |
Leucine or Isoleucine | J | I, L |
Unspecified or unknown amino acid | X | A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y |
Note that the two amino acids Selenocysteine (U) and Pyrrolysine (O) are not supported by the MEME Suite.
Single site motif using ambiguity codes N and R or bracket expressions. These give an approximation of the other motifs below.
Multiple site motif. This lists all 28 sites and gives the same result as the count matrix below.
Count matrix motif showing row and column orientations.
Note that all of these can be used with an identifier and alternate name like these 3 count matrix motifs from Jaspar.
When enabled this field supports selecting motifs from the file with a space separated list of motif identifiers and/or their positions in the file.
Any numbers in the range 1 to 999 are assumed to refer to the position of the selected motif in the file, so the entry "3" always refers to the third motif. Any other entry is assumed to be a motif identifier.
Motif identifiers cannot start with a dash and can only contain alphanumeric characters as well as colon ':', underscore '_', dot '.' and dash '-'.
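The selection rules above could be checked with a sketch like the following. The function `parse_selection` is hypothetical. It assumes that "alphanumeric" means ASCII letters and digits, and that an entry with leading zeros (such as "007") is treated as an identifier rather than a position; neither detail is stated here.

```python
import re

# A position is a plain number from 1 to 999 (no leading zeros assumed).
_POSITION_RE = re.compile(r"[1-9][0-9]{0,2}")
# An identifier uses letters, digits, ':', '_', '.' and '-',
# and may not start with a dash.
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9:_.][A-Za-z0-9:_.-]*")


def parse_selection(text):
    """Split a space separated selection into positions and identifiers."""
    positions, identifiers = [], []
    for token in text.split():
        if _POSITION_RE.fullmatch(token):
            positions.append(int(token))
        elif _IDENTIFIER_RE.fullmatch(token):
            identifiers.append(token)
        else:
            raise ValueError(f"invalid motif identifier: {token!r}")
    return positions, identifiers


print(parse_selection("3 MA0004.1 1000"))  # ([3], ['MA0004.1', '1000'])
```

Note that "1000" falls outside the 1 to 999 range, so under these rules it is read as an identifier, not as the thousandth motif.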
Select the desired motif database.
Consult the motif database documentation for descriptions of all the DNA, RNA and protein motif databases present on this MEME Suite server.
This option can help change the alphabet of motifs from a base alphabet to a derived alphabet.
This might be useful if you need to compare an extended DNA motif with a library of DNA motifs, or if you wish to compare RNA motifs to DNA motifs. Note that this option will also let you do nonsensical things, like comparing protein motifs to DNA motifs, so use it with care.
The derived alphabet must have all the core symbols of the alphabet that it is derived from. For example if the alphabet is derived from DNA it must have ACGT as core symbols. Expanding the alphabet adds frequencies of zero for every symbol in the derived alphabet that did not exist in the base alphabet.
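A minimal sketch of this expansion, assuming motifs are stored as rows of probabilities in symbol order, might look like the code below. The function `expand_motif` and the example derived alphabet "ACGTm" are hypothetical.

```python
def expand_motif(probs, base_symbols, derived_symbols):
    """Re-express motif rows over a derived alphabet.

    Symbols that exist only in the derived alphabet get frequency zero.
    Illustrative sketch only.
    """
    missing = [s for s in base_symbols if s not in derived_symbols]
    if missing:
        raise ValueError(f"derived alphabet lacks core symbols: {missing}")
    index = {s: i for i, s in enumerate(base_symbols)}
    return [
        [row[index[s]] if s in index else 0.0 for s in derived_symbols]
        for row in probs
    ]


# A DNA motif position expanded to a made-up alphabet with one extra symbol.
print(expand_motif([[0.1, 0.2, 0.3, 0.4]], "ACGT", "ACGTm"))
# [[0.1, 0.2, 0.3, 0.4, 0.0]]
```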
Specify a file to upload containing sequence coordinates in BED format. The file must be based on the exact genome version you specified in the menus above.
Select an available sequence database from this menu.
Select an available version of the sequence database from this menu.
Select an available tissue/cell-specificity from this menu.
Selecting this option will filter the sequence menu to only contain databases that have additional information that is specific to a tissue or cell line.
This option causes MEME Suite to use tissue/cell-specific information (typically from DNase I or histone modification ChIP-seq data) encoded as a position specific prior that has been created by the MEME Suite create-priors utility. You can see a description of the sequence databases for which we provide tissue/cell-specific priors here.
Note that you cannot upload or type in your own sequences when tissue/cell-specific scanning is selected.
Enter text naming or describing this analysis. The job description will be included in the notification email you receive and in the job output.
When this option is selected, if the FASTA sequence header of an input sequence contains genomic coordinates in UCSC or Galaxy format the discovered motif sites will be output in genomic coordinates. If the sequence header does not contain valid coordinates, the sites will be output with the start of the sequences as position 1.
XSTREME will determine which of these known motifs are enriched in your sequences relative to the control sequences (using SEA). It will also compare any motifs it discovers to each of these known motifs (using Tomtom).
This is the width (number of characters in the sequence pattern) of a single discovered motif. STREME and MEME choose the optimal width of each motif individually using a statistical heuristic function. You can choose limits for the minimum and maximum motif widths that STREME and MEME will consider. The width of each motif that STREME and MEME report will lie within the limits you choose.
Specify the order (m) for the background model and sequence shuffling. By default, XSTREME uses m=2 for DNA and RNA sequences, and m=0 for protein or custom alphabet sequences. Check this box and set the value of m if you want to override the default value of m that XSTREME uses.
If you upload a background model (see option above), XSTREME will only use the m-order portion of that model. If you do not upload a background model, XSTREME will create an order-m model from the control sequences that you provide, or from the shuffled primary sequences if you don't provide control sequences.
If you do not specify a set of control sequences, XSTREME will create one by shuffling each primary sequence while preserving the frequencies of all words of length k that it contains, where k=m+1.
Check this box and enter a size if you want to limit motif discovery and enrichment analysis to the central regions of the (primary) sequences. Only the central 'size' characters of each (primary) sequence will be input to the STREME, MEME and SEA algorithms. The full-length sequences will still be used for the positional distribution plots and as input to FIMO.
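The trimming step can be pictured with the short Python sketch below. The function `central_region` is hypothetical. How an odd number of leftover characters is split between the two ends is not stated here; this sketch assumes the extra character is dropped from the right end.

```python
def central_region(seq, size):
    """Return the central `size` characters of a sequence.

    Sequences no longer than `size` are returned unchanged.
    Illustrative sketch only.
    """
    if size >= len(seq):
        return seq
    start = (len(seq) - size) // 2
    return seq[start:start + size]


print(central_region("AACCGGTT", 4))  # CCGG
```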
When your sequences are in the DNA alphabet but you want them to be treated as single-stranded RNA, check this box.
The background model normalizes for biased distribution of letters and groups of letters in your sequences. A 0-order model adjusts for single letter biases (e.g., GC content in DNA sequences), a 1-order model adjusts for dimer biases, etc.
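To make the idea of model order concrete, the Python sketch below tabulates the frequencies of all words of length m+1 in a set of sequences, which is the information an order-m model captures. The function `word_frequencies` is hypothetical. It does not reproduce the background model file format, and it makes no attempt to add pseudocounts or combine DNA strands.

```python
from collections import Counter


def word_frequencies(sequences, order):
    """Return the relative frequency of each word of length order + 1.

    Illustrative sketch only; not the MEME Suite file format.
    """
    k = order + 1
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


print(word_frequencies(["ACGT"], 0))  # single letter frequencies
print(word_frequencies(["AAC"], 1))   # dimer frequencies
```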
By default, XSTREME will determine the background Markov model from the control sequences (or from the primary sequences if you do not provide control sequences). The order of the background model depends on the sequence alphabet, but you can also set it manually (see option "What Markov order...", below). Alternatively, you may select "Upload background model" and input a file containing a background model.
The downloadable version of the MEME Suite contains a program named "fasta-get-markov" that you can use to create background model files in the correct format from FASTA sequence files.
This is where you tell MEME how you believe occurrences of the motifs are distributed among the sequences. Selecting the correct type of distribution improves the sensitivity and quality of the motif search.
Check this box if you want SEA to output the IDs of the sequences matching each significant motif in a TSV file, and the matching sites in a separate TSV file. These files can be very large, so don't check this box if you don't need that information.