Specify a file to upload containing sequence coordinates in BED format. The file must be based on the exact genome version you specified in the menus above.
Select an available sequence database from this menu.
Select an available version of the sequence database from this menu.
Select an available tissue/cell-specificity from this menu.
Selecting this option will filter the sequence menu to only contain databases that have additional information that is specific to a tissue or cell line.
This option causes MEME Suite to use tissue/cell-specific information (typically from DNase I or histone modification ChIP-seq data) encoded as a position specific prior that has been created by the MEME Suite create-priors utility. You can see a description of the sequence databases for which we provide tissue/cell-specific priors here.
Note that you cannot upload or type in your own sequences when tissue/cell-specific scanning is selected.
Enter text naming or describing this analysis. The job description will be included in the notification email you receive and in the job output.
Select the PTM file(s) that you wish to analyze. The PTM file(s) should all use the same format. This can be in either a Peptide-Spectrum Match (PSM) format or a pure-sequence format (FASTA or Raw).
For PSM formats, which contain tab-separated columns and a header line giving the column names, you must specify the name of the column that contains the modified peptides in the provided field. The website will attempt to guess the file format based on what appears in the first (non-blank) line in the file, and will indicate its guess in the 'Format?' column. The website will also indicate the correct value of the 'Modified Peptide Column Name' for that format, which you can change if your files use a slightly different format.
If the first (non-empty) line in the file contains one or more tab characters, but none of the known values for 'Modified Peptide Column Name' are found (see Peptide-Spectrum Match (PSM) format), the website will display 'unknown PSM format' in the 'Format?' column. This will allow you to use PSM formats that have different column names than the known formats, as long as modified amino acids are indicated using one of the 'Modified Peptide Formats' described in the last column of the table in Peptide-Spectrum Match (PSM) format.
For pure-sequence formats, you may not specify a 'Modified Peptide Column Name'. The peptides must use the standard IUPAC protein alphabet and must all be the same length. The pure-sequence formats are:
EGKSLGI KKQSGLA GALSRTH RMHSAGK ELKSEGL ...
Protein database (in FASTA format) to use for filling in any missing flanking amino acids, and to use as background sequences.
Minimum occurrences of modification required to output a motif. This threshold is applied after filtering and eliminating repeats (if applicable).
The p-value threshold used by motif-x for selecting significant residue/position pairs in the motif.
MoDL will stop after it finds the given number of motifs.
MoDL will stop after the given number of iterations is reached.
MoDL will stop if there is no decrease in its objective function, the minimum description length (MDL), for the given number of iterations.
The width of motifs to discover. Because motifs will be symmetric around the central, modified residue, width must be odd. The behavior of MoMo depends on the format of the PTM input file(s).
PTM file format | MoMo Behavior |
---|---|
FASTA or Raw format | No effect. An error is reported if the length of any sequence in the input files differs from Width. |
PSM format | If a modified peptide is shorter than Width, MoMo will first attempt to expand it by looking up its context in the protein database file, if given (see option --protein-database, below). If the modified peptide is still shorter than Width, MoMo will pad it on either side as required using the Protein IUPAC 'X' character. If the longest modified peptide is still shorter than Width, MoMo will set the motif width to the length of the longest (expanded and padded) modified peptide. |
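The padding step for PSM-format peptides can be illustrated with a minimal sketch. The function name `pad_to_width` and the `mod_index` argument are hypothetical and are not part of MoMo. Database expansion is omitted, and trimming peptides longer than Width is an assumption of this sketch rather than documented behavior.

```python
def pad_to_width(peptide: str, mod_index: int, width: int) -> str:
    """Center `peptide` on its modified residue and pad with 'X' to `width`.

    `mod_index` is the 0-based position of the modified residue.
    Flanks longer than width // 2 are trimmed (an assumption of this sketch).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    # Take up to `half` residues on each side of the modified residue.
    left = peptide[max(0, mod_index - half):mod_index]
    right = peptide[mod_index + 1:mod_index + 1 + half]
    # Fill any missing flanking positions with the IUPAC 'X' character.
    return ("X" * (half - len(left)) + left + peptide[mod_index]
            + right + "X" * (half - len(right)))


print(pad_to_width("KSL", 1, 7))
```

In this sketch the modified residue always ends up in the central column, which is what allows peptides of different original lengths to be aligned.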
The background peptides will be extracted from the context sequences. By default, background peptides are generated by shuffling each foreground peptide while conserving its central residue.
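As a rough illustration of the default background, the sketch below shuffles the flanking residues of one foreground peptide while leaving the central residue in place. The function name `shuffle_keep_center` is hypothetical, the peptide is assumed to have odd length, and MoMo's own shuffling procedure may differ in detail.

```python
import random


def shuffle_keep_center(peptide: str, rng: random.Random) -> str:
    """Shuffle all residues except the central one (odd-length peptide assumed)."""
    mid = len(peptide) // 2
    # Collect the flanking residues, leaving out the central position.
    flanks = list(peptide[:mid] + peptide[mid + 1:])
    rng.shuffle(flanks)
    # Reassemble with the original central residue back in the middle.
    return "".join(flanks[:mid]) + peptide[mid] + "".join(flanks[mid:])


rng = random.Random(0)  # fixed seed only so the example is repeatable
background = shuffle_keep_center("EGKSLGI", rng)
print(background)
```

The shuffled peptide keeps the amino acid composition and the modified residue of the original, but loses any positional preference in the flanks.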
Mimic the behavior of (the Harvard version of) motif-x more closely by only calculating binomial p-values no smaller than 10^-16 for residue/position pairs. Smaller p-values are set to 10^-16, and ties are broken by sorting residue/position pairs by decreasing number of peptides that match them.
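The flooring and tie-breaking rule can be sketched as follows. This assumes the p-value is the upper tail of a binomial distribution (the probability of seeing at least k matching peptides out of n when the background frequency is p). The names `binomial_tail`, `floored_pvalue` and `rank_pairs` are hypothetical, not MoMo functions.

```python
from math import comb

P_FLOOR = 1e-16  # smallest p-value distinguished in this mode


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def floored_pvalue(k: int, n: int, p: float) -> float:
    """Binomial tail p-value, never reported below P_FLOOR."""
    return max(binomial_tail(k, n, p), P_FLOOR)


def rank_pairs(pairs):
    """Sort (label, k, n, p) tuples by floored p-value, then by decreasing k."""
    return sorted(pairs, key=lambda t: (floored_pvalue(t[1], t[2], t[3]), -t[1]))


# Both pairs fall below the floor, so the one matching more peptides ranks first.
pairs = [("R at -3", 40, 50, 0.05), ("P at +1", 45, 50, 0.05)]
print([label for label, *_ in rank_pairs(pairs)])
```

Once two pairs are both floored to the same value, the p-value no longer separates them, which is why the count of matching peptides decides the order.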
All peptides that contain an 'X' (after expansion and padding, see option 'Width:', above) will be removed from the analysis.
Create one motif per mass instead of one motif per mass and central residue. (The modification mass is given as a number following the modified amino acid in the modified peptide as described in the PSM format documentation.) For example, phosphorylation is typically specified as a mass of 79.97 added to the residues S, T or Y. If this option is not checked, then three separate motifs are generated, each with a perfectly conserved central residue. If this option is checked, then all the phosphorylation events are combined into a single motif, with a mixture of S, T and Y in the central position.
Any groups of modified peptides whose <width> central residues, after expansion and padding, are identical will be replaced with a single copy. Note: Since shorter peptides will be padded with the 'X' character, which matches any other character, shorter peptides will match longer ones that contain them, and will be subject to elimination.
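A minimal sketch of repeat elimination with 'X' treated as a wildcard is shown below. The function names are hypothetical, and keeping the first peptide encountered from each group is an assumption of this sketch. Which copy MoMo retains is not stated here.

```python
def matches(a: str, b: str) -> bool:
    """True if two equal-length peptides agree at every position, with 'X' matching anything."""
    return len(a) == len(b) and all(x == y or "X" in (x, y) for x, y in zip(a, b))


def eliminate_repeats(peptides):
    """Keep one representative per group of matching peptides (first one seen)."""
    kept = []
    for pep in peptides:
        # Skip this peptide if it matches one we have already kept.
        if not any(matches(pep, k) for k in kept):
            kept.append(pep)
    return kept


peptides = ["EGKSLGI", "XGKSLGX", "EGKSLGI", "KKQSGLA"]
print(eliminate_repeats(peptides))
```

Here the padded peptide XGKSLGX is dropped because it matches EGKSLGI at every non-'X' position, which is the effect the note above describes for shorter peptides.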
Filter the entries in the PSM-formatted input file based on one of its fields. Select the check box and then specify the Field, Test and Threshold in the three columns that will be provided. (Not available with FASTA and Raw formatted input files.)
Choose the algorithm to use to discover motifs:
Algorithm | Description |
---|---|
Simple | Creates a maximum-likelihood position weight matrix (PWM) motif for each distinct central residue present in the modified peptides in the input PTM file(s). The weights in the PWM are the observed frequencies of the amino acids in the equal-length modified peptides, aligned on their central residue (the modified amino acid). If the modified peptides in the input PTM file(s) have differing lengths, their lengths are adjusted to be equal, as described below under the advanced option "How wide will the motifs be?". |
motif-x | The motif-x algorithm uses a greedy iterative search to discover motifs by recursively picking the most statistically significant position/residue pair according to binomial probability, reducing the dataset to only sequences containing that pair, and continuing until no more position/residue pairs are significant according to a user-defined threshold. If this motif has at least one statistically significant position/residue pair, all instances of the pattern are removed, and the algorithm continues to generate motifs until this condition fails. The motif-x algorithm is described in the paper Schwartz, D. and Gygi, S. P. (2005). "An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets". Nature Biotechnology, 23(11), 1391-1398. |
MoDL | The MoDL algorithm is based on the principle of minimum description length (MDL). It searches for a set of motifs that minimizes the number of bits to encode the set of modified peptides and motifs, using a greedy and iterative approach. The algorithm uses a list of candidate single-residue motifs (excluding the modified site) that exist in the modification dataset. Starting with an empty set of motifs, at each iteration, a set of potential motif sets is generated by either removing a candidate motif, adding a candidate motif, adding a candidate motif then removing a motif, merging a motif with a candidate motif, or merging a motif with a candidate motif and then removing a motif. From this set of potential motifs, the algorithm chooses the motif set with the minimum description length, and repeats the algorithm a specified number of times t, or until the description length does not change for L iterations (t=50 and L=10 by default). Finally, the algorithm returns the motif with the minimum description length among all motifs found. The MoDL algorithm is described in the paper Ritz, A., Shakhnarovich, G., Salomon, A., and Raphael, B. (2009). "Discovery of phosphorylation motif mixtures in phosphoproteomics data". Bioinformatics, 25(1), 14-21. |
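The Simple algorithm described in the table can be sketched as a per-column frequency count over equal-length peptides, grouped by central residue. This is an illustrative sketch only. The names `frequency_matrix` and `simple_motifs` are hypothetical, and the input is assumed to be already expanded and padded to a common odd width.

```python
from collections import Counter, defaultdict


def frequency_matrix(peptides):
    """Return one {amino_acid: frequency} dict per column of the aligned peptides."""
    width = len(peptides[0])
    if any(len(p) != width for p in peptides):
        raise ValueError("all peptides must have the same length")
    n = len(peptides)
    # zip(*peptides) walks the alignment column by column.
    return [{aa: count / n for aa, count in Counter(col).items()}
            for col in zip(*peptides)]


def simple_motifs(peptides):
    """Build one frequency matrix per distinct central residue."""
    groups = defaultdict(list)
    for pep in peptides:
        groups[pep[len(pep) // 2]].append(pep)
    return {center: frequency_matrix(group) for center, group in groups.items()}


peptides = ["EGKSLGI", "KKQSGLA", "GALSRTH", "RMHSAGK", "ELKSEGL"]
motifs = simple_motifs(peptides)
print(motifs["S"][0])  # frequencies observed at the first position
```

Because the peptides are aligned on the modified residue, the central column of each matrix contains a single amino acid with frequency 1.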
MoMo discovers sequence motifs associated with different types of protein post-translational modifications (PTMs) (sample output). The program takes as input a collection of PTMs identified using protein mass spectrometry. For each distinct type of PTM, MoMo uses one of three algorithms to discover motifs representing amino acid preferences flanking the modification site. See this Manual for more information.