momo <algorithm> [options] <PTM file>+
The name of the algorithm to use to search for motifs. Available algorithms are motifx, modl, and simple.
The names of one or more files with peptide sequences containing a post-translationally modified amino acid. (These are referred to as the "foreground" peptides.) Each file must be either in PSM format, FASTA format, or "Raw" (one sequence per line) format. All files must use the same format, and MoMo will attempt to determine the format of the file(s) using the following rules, in order:
With FASTA or Raw format files, all sequences must have the same length, the length must be 7 or be specified using the option --width, below, the sequences must be in the Protein IUPAC alphabet, and the modified amino acid is assumed to be the central residue.
With PSM format files, one column must contain the modified peptides with the modified amino acid indicated as described in the PSM format documentation. You can specify the name of the modified peptide column using the --sequence-column option, below. MoMo will attempt to expand all modified peptides to the width of the longest modified peptide or the requested motif width, whichever is wider. (See option --width, below, for a description of how the expansion is done.)
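The FASTA/Raw constraints above can be checked before running MoMo. The following is a minimal Python sketch, not MoMo's own code: the function name `check_fixed_width_peptides` is hypothetical, and the allowed alphabet (the 20 standard amino acids plus 'X') is an assumed subset of the Protein IUPAC alphabet.

```python
# Assumption: 20 standard residues plus the unknown character 'X'.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}

def check_fixed_width_peptides(peptides, width=7):
    """Validate Raw-style peptides and return their central residues."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    central = []
    for pep in peptides:
        if len(pep) != width:
            raise ValueError(f"{pep!r} has length {len(pep)}, expected {width}")
        bad = set(pep) - ALLOWED
        if bad:
            raise ValueError(f"{pep!r} contains unexpected characters {sorted(bad)}")
        # The modified amino acid is assumed to be the central residue.
        central.append(pep[width // 2])
    return central
```

For example, `check_fixed_width_peptides(["AAASAAA", "KRPTPEE"])` returns the central residues `["S", "T"]`.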
MoMo writes its output to files in a directory named momo_out, which it creates if necessary. You can change the output directory using the --o or --oc options.
The directory will contain the following files:
momo.html - an HTML file that provides the results in a human-readable format
momo.tsv - a TSV (tab-separated values) file that provides the results in a format suitable for parsing by scripts and viewing with Excel
momo.txt - a plain text file containing the motifs discovered by MoMo in MEME Motif Format
<name>.png - PNG image files containing sequence logos for each of the motifs found by MoMo (where <name> is the motif name)
Note: See this detailed description of the MoMo output formats for more information.
|--psm-type||comet|ms-gf+|tide|percolator||If the PTM file(s) are in tab-separated Peptide-Spectrum Match (PSM) format, you can specify the name of the program that created them. This will cause MoMo to set the name of the column containing the modified peptides appropriately.||MoMo will attempt to determine the type of the PTM file(s), and if they are in a PSM format, you must use the --sequence-column option, below.|
|--sequence-column||name||The name of the column containing the modified peptides in the PTM file. This option is required if the PTM file(s) are in PSM format unless you use the --psm-type option, above.||None.|
|--width||width||The width of motifs to discover. Because motifs will be symmetric around the central, modified residue, width must be odd.||The behavior of MoMo depends on the format of the PTM input file(s). FASTA or Raw format: Motifs of width 7 are generated. PSM format: Motifs whose width is the length of the longest modified peptide are generated.|
|--seed||seed||The seed for initializing the random number generator used to shuffle the foreground peptides (preserving the central residue) to create the background peptides. Shuffling is used unless you specify option --db-background, below.||A value of 0 is used.|
|--db-background||The background peptides for the motif-x and MoDL algorithms will be extracted from the protein database if you specify option --protein-database, below.||Shuffled versions (preserving the central residue) of each of the foreground peptides will be used as the background peptides for the motif-x and MoDL algorithms.|
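The default background described above (shuffled foreground peptides with the central residue kept in place) can be sketched in Python as follows. This is an illustration under stated assumptions, not MoMo's implementation: the function names are hypothetical, and Python's `random.Random` will not reproduce the peptides that MoMo generates for the same seed.

```python
import random

def shuffle_preserving_center(peptide, rng):
    """Shuffle the flanking residues; keep the central residue fixed."""
    mid = len(peptide) // 2
    flanks = list(peptide[:mid] + peptide[mid + 1:])
    rng.shuffle(flanks)
    return "".join(flanks[:mid]) + peptide[mid] + "".join(flanks[mid:])

def make_background(foreground, seed=0):
    """Build one shuffled background peptide per foreground peptide."""
    rng = random.Random(seed)  # same seed gives the same background
    return [shuffle_preserving_center(p, rng) for p in foreground]
```

Each background peptide has the same length, amino acid composition, and central residue as its foreground counterpart, and calling `make_background` twice with the same seed gives the same result.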
|--protein-database||protein database file||A protein database that will be used to allow expansion of modified peptides
from PSM formatted PTM input file(s)
(see option --width, above),
for estimating the amino acid background frequencies, and
potentially for creating a set of background peptides (see option
(This is typically the protein database that was used to generate the PTM input file(s).)
This file may be in either FASTA or Raw format.
If it is in FASTA format, the sequences may be of any length.
If it is in Raw format, each sequence must be of the length specified by the
--width option, above, and there must be exactly one sequence per line,
with no sequence ID lines.
The background frequencies are used as follows:
|Modified peptides from PTM file(s) in PSM format are padded using the Protein IUPAC 'X' character as required, and the amino acid background frequencies are derived from the frequencies in the foreground peptides (after expansion and padding).|
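Padding with the 'X' character, as described above, might look like the following Python sketch. It assumes the position of the modified residue within the peptide is already known (`mod_index` is a hypothetical parameter) and does not attempt the protein-database expansion. It is not MoMo's code.

```python
def pad_to_width(peptide, mod_index, width):
    """Center the modified residue in a window of `width`, padding with 'X'."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    # Take up to `half` residues on each side of the modified residue.
    left = peptide[max(0, mod_index - half):mod_index]
    right = peptide[mod_index + 1:mod_index + 1 + half]
    # Fill any missing flank positions with the unknown character 'X'.
    return ("X" * (half - len(left)) + left + peptide[mod_index]
            + right + "X" * (half - len(right)))
```

For example, `pad_to_width("KRSPE", 2, 7)` gives `"XKRSPEX"`, with the modified S in the central position.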
|--filter||field,lt|le|eq|ge|gt,threshold||Specifies a (single) filter that causes only modified peptides that pass the given test to be included in the analysis. The test consists of three components separated by commas with no spaces in between. (If the field name contains spaces, enclose the entire test string in quotes.) The field component of the parameter specifies the name of the column in the PSM format PTM file from which the score is drawn. The next component specifies whether only modified peptides with scores less than (lt), less than or equal to (le), equal to (eq), greater than or equal to (ge), or greater than (gt) the threshold are retained.||No filter.|
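To make the three-part filter test concrete, here is a small Python sketch that parses a test string of the form field,op,threshold and applies it to rows read from a PSM file. The function names and the column name used in the example are hypothetical, and MoMo's own parsing may differ in details such as how non-numeric scores are handled.

```python
import operator

COMPARATORS = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ge": operator.ge, "gt": operator.gt,
}

def parse_filter(spec):
    """Split 'field,op,threshold' into (field, comparison function, float)."""
    # Split from the right so a field name containing spaces stays intact.
    field, op_name, threshold = spec.rsplit(",", 2)
    if op_name not in COMPARATORS:
        raise ValueError(f"unknown comparison {op_name!r}")
    return field, COMPARATORS[op_name], float(threshold)

def apply_filter(rows, spec):
    """Keep only the rows (dicts keyed by column name) that pass the test."""
    field, compare, threshold = parse_filter(spec)
    return [row for row in rows if compare(float(row[field]), threshold)]
```

With rows whose hypothetical "q value" column holds 0.005 and 0.2, `apply_filter(rows, "q value,le,0.01")` keeps only the first row.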
|--remove-unknowns||T|F||If TRUE (T), all foreground and background peptides that contain an 'X' (after expansion and padding, see option --width, above) will be removed from the analysis.||Do not remove peptides just because they contain an 'X'.||Remove peptides if they contain an 'X'.|
|--eliminate-repeats||num||Any groups of peptides in the foreground or background sets whose num central residues (after expansion and padding, see option --width, above) are identical will be replaced with a single copy. Because the window is symmetric around the central, modified residue, num must be odd. To turn this option off, specify 0 for num. Note: Since shorter peptides will be padded with the 'X' character, which matches any other character, shorter peptides will match longer ones that contain them, and will be subject to elimination.||Behave as if the value width (see option --width, above) was given for num.|
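The repeat elimination described above, including the rule that 'X' matches any character, can be illustrated with the Python sketch below. It assumes all peptides already have the same (odd) length and that num does not exceed that length. Keeping the first peptide of each matching group is an assumption of this sketch; the source does not say which copy MoMo retains.

```python
def central_window(peptide, num):
    """Return the num central residues of an odd-length peptide."""
    mid, half = len(peptide) // 2, num // 2
    return peptide[mid - half:mid + half + 1]

def windows_match(a, b):
    """Compare two windows position by position; 'X' matches anything."""
    return len(a) == len(b) and all(
        x == y or x == "X" or y == "X" for x, y in zip(a, b))

def eliminate_repeats(peptides, num):
    """Keep the first peptide of each group with matching central windows."""
    if num == 0:
        return list(peptides)  # option turned off
    if num % 2 == 0:
        raise ValueError("num must be odd")
    kept, kept_windows = [], []
    for pep in peptides:
        window = central_window(pep, num)
        if not any(windows_match(window, k) for k in kept_windows):
            kept.append(pep)
            kept_windows.append(window)
    return kept
```

Because wildcard matching is not transitive, the result of this greedy approach depends on the order of the input peptides.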
|--min-occurrences||num||Only attempt to construct a motif for a particular modification of an amino acid if there are at least num foreground and background peptides (after eliminating repeats, see option --eliminate-repeats, above) that contain it.||Behave as if 5 was given for num.|
|--min-occurrences||num||The minimum number of peptides in the post-translationally modified data set needed to match the residue/position pair for each recursive iteration of motif-x. Also, MoMo only attempts to construct a motif for a particular modification of an amino acid if there are at least num foreground and background peptides (after eliminating repeats, see option --eliminate-repeats, above) that contain it.||Behave as if 20 was given for num.|
|--single-motif-per-mass||Generate a single motif that combines all central residues that have the same modification mass. Only valid with PSM formatted PTM files. (The modification mass is given as a number following the modified amino acid in the modified peptide as described in the PSM format documentation.) For example, phosphorylation is typically specified as a mass of 79.97 added to the residues S, T or Y. If this option is not given, three separate motifs are generated, each with a perfectly conserved central residue. If this option is given, then all the phosphorylation events are combined into a single motif, with a mixture of S, T and Y in the central position.||Generate a motif for each combination of residue and modification mass.|
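The effect of --single-motif-per-mass on how peptides are grouped into motifs can be sketched in Python. The input here is a list of (peptide, mass) pairs in which the modified residue is central and the mass is kept as the string found in the file. Both the input representation and the function name are assumptions made for illustration; parsing the PSM modified-peptide notation is not shown.

```python
from collections import defaultdict

def group_peptides(mods, single_motif_per_mass=False):
    """Group peptides by (residue, mass), or by mass alone if requested."""
    groups = defaultdict(list)
    for peptide, mass in mods:
        residue = peptide[len(peptide) // 2]  # central, modified residue
        key = mass if single_motif_per_mass else (residue, mass)
        groups[key].append(peptide)
    return dict(groups)
```

With phosphorylated S, T and Y peptides that all carry mass "79.97", the default grouping yields three groups (one per residue), while `single_motif_per_mass=True` yields a single group containing all three peptides.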
|--hash-fasta||k||If a protein database is provided in FASTA format, the process of finding the location of a peptide within the protein can be sped up using an O(1) lookup table hashing from each unique k-mer to an arraylist of locations. If k is 0, the program will proceed using linear search instead of creating a lookup table. Note: With a full mammalian proteome as the protein database, MoMo will typically run faster using hashing (e.g., set k to 6) if your PTM file(s) contain more than 50,000 peptides in total.||Behave as if 0 was given for k.|
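The k-mer lookup table behind --hash-fasta can be illustrated with the following Python sketch, which is a simplified model rather than MoMo's implementation. The function names are hypothetical, proteins are passed as a dict from ID to sequence, and peptides shorter than k fall back to linear search (an assumption of this sketch).

```python
from collections import defaultdict

def build_kmer_index(proteins, k):
    """Map each k-mer to the list of (protein ID, offset) where it occurs."""
    index = defaultdict(list)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((pid, i))
    return index

def linear_search(peptide, proteins):
    """Scan every protein for every occurrence of the peptide (k = 0 case)."""
    hits = []
    for pid, seq in proteins.items():
        pos = seq.find(peptide)
        while pos != -1:
            hits.append((pid, pos))
            pos = seq.find(peptide, pos + 1)
    return hits

def find_peptide(peptide, proteins, index=None, k=0):
    """Locate a peptide using the k-mer index when one is available."""
    if k == 0 or index is None or len(peptide) < k:
        return linear_search(peptide, proteins)
    # Look up the first k residues, then verify the full peptide at each hit.
    return [(pid, i) for pid, i in index.get(peptide[:k], [])
            if proteins[pid].startswith(peptide, i)]
```

Building the index costs one pass over the database, so it pays off only when many peptides are looked up, which is consistent with the note above about large PTM files.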
|--max-motifs||num||MoDL will stop after it finds num motifs.||Behave as if 100 was given for num.|
|--max-iterations||num||MoDL will stop after num iterations.||Behave as if 50 was given for num.|
|--max-no-decrease||num||MoDL will stop if there is no decrease in MDL for num iterations.||Behave as if 10 was given for num.|
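The three MoDL stopping options above can be read as a single check made after each iteration. The Python sketch below is one possible interpretation, not MoDL's actual control flow: the function name is hypothetical, `mdl_history` is assumed to hold the MDL value after each completed iteration, and "no decrease" is taken to mean that no value lower than the best one so far has been seen.

```python
def stop_reason(mdl_history, num_motifs, max_motifs=100,
                max_iterations=50, max_no_decrease=10):
    """Return the name of the limit that was hit, or None to keep going."""
    if num_motifs >= max_motifs:
        return "max-motifs"
    if len(mdl_history) >= max_iterations:
        return "max-iterations"
    if mdl_history:
        # Iterations elapsed since the lowest MDL was first reached.
        best_index = mdl_history.index(min(mdl_history))
        if len(mdl_history) - 1 - best_index >= max_no_decrease:
            return "max-no-decrease"
    return None
```

The defaults in the signature mirror the default values listed in the table above.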
|--score-threshold||num||The largest binomial probability for a residue/position pair to be counted as significant during each recursive iteration of motif-x.||Behave as if 1e-6 was given for num.|
|--harvard||Mimic the behavior of (the Harvard version of) motif-x more closely by only calculating binomial p-values no smaller than 10^-16 for residue/position pairs. Smaller p-values are set to 10^-16, and ties are broken by sorting residue/position pairs by decreasing number of peptides that match them.||Calculate residue/position pair binomial p-values in log-space, allowing p-values as small as e^(-10^300).|
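The difference between the two p-value modes can be seen in a short Python sketch. It computes a binomial tail probability in log-space and, separately, applies a floor of 1e-16 to mimic the --harvard behavior. Treating the p-value as the upper tail P(X >= k) is an assumption of this sketch, as are the function names. The code requires 0 < p < 1 and is not MoMo's implementation.

```python
import math

def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_binom_upper_tail(k, n, p):
    """Natural log of P(X >= k), summed stably with the log-sum-exp trick."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    terms = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))

def pvalue_with_floor(log_p, floor=1e-16):
    """Convert a log p-value to a p-value that is never below the floor."""
    return max(math.exp(log_p), floor)
```

For an extreme case such as k = n = 2000 with p = 0.01, the log p-value is about -9210 and remains usable for ranking, whereas the ordinary p-value underflows to zero and is reported as 1e-16 once the floor is applied.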
If you use MoMo in your research, please cite the following paper: A. Cheng, C. E. Grant, T. L. Bailey and W. S. Noble, "MoMo: Discovery of post-translational modification motifs", bioRxiv, preprint, 2017.