Usage:

momo <algorithm> [options] <PTM file>+

Description

Input

<algorithm>

The name of the algorithm to use to search for motifs. Available algorithms are motifx, modl, and simple.

<PTM file>+

The names of one or more files with peptide sequences containing a post-translationally modified amino acid. (These are referred to as the "foreground" peptides.) Each file must be in PSM format, FASTA format, or "Raw" (one sequence per line) format. All files must use the same format, and MoMo will attempt to determine the format of the file(s) using the following rules, in order:

With FASTA or Raw format files, all sequences must have the same length. That length must be 7 unless a different length is specified using the option --width, below. The sequences must be in the Protein IUPAC alphabet, and the modified amino acid is assumed to be the central residue.
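The constraints on Raw format input can be illustrated with a small check. This is only a sketch, not MoMo's own code: the function name check_raw_sequences is hypothetical, and the alphabet used here (the 20 standard amino acids plus 'X') is an assumption standing in for the full Protein IUPAC alphabet.

```python
# Assumption: the 20 standard amino acids plus 'X' stand in for the
# Protein IUPAC alphabet; the alphabet MoMo accepts may be larger.
ALPHABET = set("ACDEFGHIKLMNPQRSTVWYX")


def check_raw_sequences(sequences, width=7):
    """Validate Raw-format peptides and return their central residues.

    Hypothetical helper: it mirrors the rules stated above (equal length,
    odd width, known alphabet, modified residue in the centre).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    centers = []
    for number, seq in enumerate(sequences, start=1):
        if len(seq) != width:
            raise ValueError(
                f"sequence {number} has length {len(seq)}, expected {width}"
            )
        unknown = set(seq) - ALPHABET
        if unknown:
            raise ValueError(
                f"sequence {number} contains unexpected characters: {sorted(unknown)}"
            )
        # The modified amino acid is assumed to be the central residue.
        centers.append(seq[width // 2])
    return centers


if __name__ == "__main__":
    print(check_raw_sequences(["AAASAAA", "KRPSPEL"]))
```

Run as a script, this prints the central residue of each peptide. A sequence of the wrong length raises a ValueError, matching the error described for the --width option.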

With PSM format files, one column must contain the modified peptides, with the modified amino acid indicated as described in the PSM format documentation. You can specify the name of the modified peptide column using the --sequence-column option, below. MoMo will attempt to expand all modified peptides to the width of the longest modified peptide or the requested motif width, whichever is wider. (See option --width, below, for a description of how the expansion is done.)
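The padding step of this expansion can be sketched as follows. It is an illustration under stated assumptions, not MoMo's implementation: the function name pad_around_site is hypothetical, the position of the modified residue is assumed to be known already as a zero-based index, the protein database lookup is left out, and residues beyond the requested window are simply dropped.

```python
def pad_around_site(peptide, site, width, pad="X"):
    """Return a window of `width` residues centred on peptide[site].

    Hypothetical helper: missing flanking residues are filled with the
    Protein IUPAC 'X' character, as described for PSM format input.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    if not 0 <= site < len(peptide):
        raise ValueError("site must index a residue of the peptide")
    half = width // 2
    # Residues available on each side of the modified residue.
    left = peptide[max(0, site - half):site]
    right = peptide[site + 1:site + 1 + half]
    # Pad each side up to `half` residues.
    return (
        pad * (half - len(left)) + left
        + peptide[site]
        + right + pad * (half - len(right))
    )


if __name__ == "__main__":
    print(pad_around_site("AKSPR", 2, 7))  # XAKSPRX
    print(pad_around_site("SPR", 0, 5))    # XXSPR
```

In both calls the modified residue S ends up in the central position, which is what allows motifs to be symmetric around it.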

Output

MoMo writes its output to files in a directory named momo_out, which it creates if necessary. You can change the output directory using the --o or --oc options. The directory will contain the following files:

Note: See the detailed description of the MoMo output formats for more information.

Options

Option Parameter Description Default Behavior
General Options
--psm-type comet|ms-gf+|tide|percolator
Description: If the PTM file(s) are in tab-separated Peptide-Spectrum Match (PSM) format, you can specify the name of the program that created them. This will cause MoMo to set the name of the column containing the modified peptides appropriately.
Default: MoMo will attempt to determine the type of the PTM file(s), and if they are in a PSM format, you must use the --sequence-column option, below.
--sequence-column name
Description: The name of the column containing the modified peptides in the PTM file. This option is required if the PTM file(s) are in PSM format unless you use the --psm-type option, above.
Default: None.
--width width
Description: The width of motifs to discover. Because motifs will be symmetric around the central, modified residue, width must be odd. The behavior of MoMo depends on the format of the PTM input file(s):
  PTM file format: MoMo behavior
  FASTA or Raw format: No effect. An error is reported if the length of any sequence in the input files differs from width.
  PSM format: If a modified peptide is shorter than width, MoMo will first attempt to expand it by looking up its context in the protein database file, if given (see option --protein-database, below). If the modified peptide is still shorter than width, MoMo will pad it on either side as required using the Protein IUPAC 'X' character. If the longest modified peptide is still shorter than width, MoMo will set the motif width to the length of the longest (expanded and padded) modified peptide.
Default:
  FASTA or Raw format: Motifs of width 7 are generated.
  PSM format: Motifs whose width equals the length of the longest modified peptide are generated.
--seed seed
Description: The seed for initializing the random number generator used to shuffle the foreground peptides (preserving the central residue). The shuffled peptides are used as the background peptides unless you specify option --db-background, below.
Default: A value of 0 is used.
--db-background
Description: The background peptides for the motif-x and MoDL algorithms will be extracted from the protein database if you specify option --protein-database, below.
Default: Shuffled versions (preserving the central residue) of each of the foreground peptides will be used as the background peptides for the motif-x and MoDL algorithms.
--protein-database protein database file
Description: A protein database that will be used to allow expansion of modified peptides from PSM formatted PTM input file(s) (see option --width, above), for estimating the amino acid background frequencies, and potentially for creating a set of background peptides (see option --db-background, above). (This is typically the protein database that was used to generate the PTM input file(s).) This file may be in either FASTA or Raw format. If it is in FASTA format, the sequences may be of any length. If it is in Raw format, each sequence must be of the length specified by the --width option, above, and there must be exactly one sequence per line, with no sequence ID lines. The background frequencies are used as follows:
  Algorithm: MoMo behavior
  motifx: The background frequencies are used to estimate the binomial probabilities.
  modl: The background frequencies are used to estimate the description length.
  simple: The background frequencies are included in the MEME Motif Format motif output file momo.txt.
Default: Modified peptides from PTM file(s) in PSM format are padded using the Protein IUPAC 'X' character as required, and the amino acid background frequencies are derived from the frequencies in the foreground peptides (after expansion and padding).
--filter field,lt|le|eq|ge|gt,threshold
Description: Specifies a (single) filter that causes only modified peptides that pass the given test to be included in the analysis. The test consists of three components separated by commas with no spaces in between. (If the field name contains spaces, enclose the entire test string in quotes.) The field component of the parameter specifies the name of the column in the PSM format PTM file from which the score is drawn. The next component specifies whether only modified peptides with scores less than (lt), less than or equal to (le), equal to (eq), greater than or equal to (ge), or greater than (gt) the threshold are retained.
Default: No filter.
--remove-unknowns T|F
Description: If TRUE (T), all foreground and background peptides that contain an 'X' (after expansion and padding, see option --width, above) will be removed from the analysis.
Default:
  Do not remove peptides just because they contain an 'X'.
  Remove peptides if they contain an 'X'.
--eliminate-repeats num
Description: Any groups of peptides in the foreground or background sets whose num central residues (after expansion and padding, see option --width, above) are identical will be replaced with a single copy. Because the window is symmetric around the central, modified residue, num must be odd. To turn this option off, specify a value of 0 for num. Note: Since shorter peptides will be padded with the 'X' character, which matches any other character, shorter peptides will match longer ones that contain them, and will be subject to elimination.
Default: Behave as if the value width (see option --width, above) was given for num.
--min-occurrences num
Description: Only attempt to construct a motif for a particular modification of an amino acid if there are at least num foreground and background peptides (after eliminating repeats, see option --eliminate-repeats, above) that contain it.
Default: Behave as if 5 was given for num.
--min-occurrences num
Description: The minimum number of peptides in the post-translationally modified data set needed to match the residue/position pair for each recursive iteration of motif-x. Also, MoMo only attempts to construct a motif for a particular modification of an amino acid if there are at least num foreground and background peptides (after eliminating repeats, see option --eliminate-repeats, above) that contain it.
Default: Behave as if 20 was given for num.
--single-motif-per-mass
Description: Generate a single motif that combines all central residues that have the same modification mass. Only valid with PSM formatted PTM files. (The modification mass is given as a number following the modified amino acid in the modified peptide as described in the PSM format documentation.) For example, phosphorylation is typically specified as a mass of 79.97 added to the residues S, T or Y. If this option is not given, three separate motifs are generated, each with a perfectly conserved central residue. If this option is given, then all the phosphorylation events are combined into a single motif, with a mixture of S, T and Y in the central position.
Default: Generate a motif for each combination of residue and modification mass.
--hash-fasta k
Description: If a protein database is provided in FASTA format, the process of finding the location of a peptide within the protein can be sped up using an O(1) lookup table that maps each unique k-mer to a list of its locations. If k is 0, the program will proceed using linear search instead of creating a lookup table. Note: With a full mammalian proteome as the protein database, MoMo will typically run faster using hashing (e.g., set k to 6) if your PTM file(s) contain more than 50,000 peptides in total.
Default: Behave as if 0 was given for k.
--max-motifs num
Description: MoDL will stop after it finds num motifs.
Default: Behave as if 100 was given for num.
--max-iterations num
Description: MoDL will stop after num iterations.
Default: Behave as if 50 was given for num.
--max-no-decrease num
Description: MoDL will stop if there is no decrease in MDL for num iterations.
Default: Behave as if 10 was given for num.
--score-threshold num
Description: The largest binomial probability for a residue/position pair to be counted as significant during each recursive iteration of motif-x.
Default: Behave as if 1e-6 was given for num.
--harvard
Description: Mimic the behavior of (the Harvard version of) motif-x more closely by only calculating binomial p-values no smaller than 10^-16 for residue/position pairs. Smaller p-values are set to 10^-16, and ties are broken by sorting residue/position pairs by decreasing number of peptides that match them.
Default: Calculate residue/position pair binomial p-values in log-space, allowing p-values as small as e^(-10^300).
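The k-mer lookup table described for option --hash-fasta can be sketched as follows. This is a minimal illustration assuming the protein database has already been read into a dictionary of sequences. The names build_kmer_index and find_peptide are hypothetical, and MoMo's internal data structures may differ.

```python
def build_kmer_index(proteins, k):
    """Map every k-mer to the list of (protein id, offset) pairs where it occurs."""
    index = {}
    for protein_id, sequence in proteins.items():
        for offset in range(len(sequence) - k + 1):
            kmer = sequence[offset:offset + k]
            index.setdefault(kmer, []).append((protein_id, offset))
    return index


def find_peptide(peptide, proteins, index=None, k=0):
    """Return (protein id, offset) of the first match, or None if absent.

    With k == 0 (or a peptide shorter than k) this falls back to a
    linear scan of every protein, as the option text describes.
    """
    if k == 0 or index is None or len(peptide) < k:
        for protein_id, sequence in proteins.items():
            offset = sequence.find(peptide)
            if offset >= 0:
                return protein_id, offset
        return None
    # Look up candidate locations by the peptide's first k-mer, then
    # confirm that the whole peptide matches at each candidate.
    for protein_id, offset in index.get(peptide[:k], []):
        if proteins[protein_id].startswith(peptide, offset):
            return protein_id, offset
    return None


if __name__ == "__main__":
    proteins = {"P1": "MKTAYIAKQR", "P2": "GGSPRKKAYI"}
    index = build_kmer_index(proteins, 3)
    print(find_peptide("KAYI", proteins, index, 3))  # ('P2', 6)
    print(find_peptide("SPRK", proteins))            # ('P2', 2)
```

Building the index costs time and memory proportional to the size of the database, which is consistent with the note that hashing pays off mainly when there are many peptides to look up.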

Citing

If you use MoMo in your research, please cite the following paper: A. Cheng, C. E. Grant, T. L. Bailey and W. S. Noble, "MoMo: Discovery of post-translational modification motifs", bioRxiv, preprint, 2017.