Motif file format

Here is an example of the motif input format required by Meta-MEME. The format is based on the HTML output produced by MEME version 3.0. MEME output files, however, contain a great deal of additional information that Meta-MEME does not require. The essential elements are listed below.

First, Meta-MEME looks for the words "MEME version" or "Meta-MEME version" followed by a version number. The version number must begin with a 3.
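The version check described above can be sketched in Python as follows. This is only an illustration, not Meta-MEME's own code: the function name has_supported_version and the regular expression are assumptions, and the sketch assumes the version number is the first run of non-blank characters after the word "version".

```python
import re

# Hypothetical pattern: "MEME version" or "Meta-MEME version", then a version token.
VERSION_RE = re.compile(r"(?:Meta-)?MEME version\s+(\S+)")

def has_supported_version(text):
    """Return True if a version line is present and the version begins with 3."""
    match = VERSION_RE.search(text)
    return match is not None and match.group(1).startswith("3")
```

For example, has_supported_version("MEME version 3.0") is true, while a file announcing version 2.2 is rejected.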

Next, Meta-MEME looks for the alphabet. The alphabet must start a new line with the string "ALPHABET=" and is terminated by white space. The order in which the letters appear here is assumed to be the order used in the motifs below. Only two alphabets (protein and DNA) are supported. Ambiguous characters are not listed, although Meta-MEME does use them.
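As a rough sketch of the alphabet rule, the following Python function finds a line that begins with "ALPHABET=" and returns the letters up to the next white space. The name find_alphabet is hypothetical, and allowing optional blanks between the "=" and the first letter is an assumption rather than something the format description states.

```python
import re

# The alphabet must start a new line; optional blanks after "=" are an assumption.
ALPHABET_RE = re.compile(r"^ALPHABET=[ \t]*(\S+)", re.MULTILINE)

def find_alphabet(text):
    """Return the alphabet string, in the order the motif file declares it."""
    match = ALPHABET_RE.search(text)
    if match is None:
        raise ValueError("no line starting with ALPHABET= was found")
    return match.group(1)
```

The returned string preserves the declared letter order, which matters because the motifs are assumed to use the same order.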

Third comes the strand information, which appears only in DNA motif files. It is written as "strands: +" or "strands: + -", depending upon whether one or both strands are included.

Fourth comes the background distribution, formatted as shown. The background must start a new line with the string "Background letter frequencies (from". This is followed by a list of characters and their associated frequencies, delimited by white space.
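A minimal Python sketch of reading the background distribution is given below. It is an illustration under stated assumptions, not Meta-MEME's parser: the function name parse_background is hypothetical, the rest of the header line is assumed to carry no frequencies, and the letter/frequency pairs are assumed to start on the following line and may continue over several lines.

```python
BACKGROUND_HEADER = "Background letter frequencies (from"

def parse_background(lines, alphabet):
    """Return a dict mapping each letter of the alphabet to its background frequency."""
    it = iter(lines)
    for line in it:
        if line.startswith(BACKGROUND_HEADER):
            break
    else:
        raise ValueError("background header not found")

    freqs = {}
    # Assumption: pairs follow the header line and may span several lines.
    for line in it:
        tokens = line.split()
        for letter, value in zip(tokens[0::2], tokens[1::2]):
            freqs[letter] = float(value)
        if len(freqs) >= len(alphabet):
            break

    if set(freqs) != set(alphabet):
        raise ValueError("background letters do not match the alphabet")
    return freqs
```

Passing the alphabet in lets the sketch know when to stop reading and lets it reject a background list that does not cover exactly the declared letters.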

Fifth come the motifs. Each motif is printed as a matrix in which each row gives the amino acid or nucleotide distribution at one position. Motifs are labeled by index, as MOTIF X. The header lines at the beginning of the letter-probability matrix must be duplicated exactly, with the proper values for the alphabet length, number of sites, motif width, and E-value. For example:

BL   MOTIF 1 width=19 seqs=5
letter-probability matrix: alength= 20 w= 18 nsites= 818 E= 1.2e+002 

Older versions of MEME produce a matrix of probabilities rather than frequencies (i.e., the prior is applied to the frequencies). A file in the old format is recognized by the presence of n= rather than nsites= in the header of the letter-probability matrix. If your MEME output file is from a previous version of MEME, you will receive the following message:

  Warning: This is an old MEME file that contains
  posterior probabilities rather than frequencies.
  Meta-MEME will still work, but it would be better
  to run an updated version of MEME on your data.
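The letter-probability matrix header, including the n= versus nsites= distinction, might be parsed along the following lines. This Python sketch is illustrative only: the name parse_matrix_header and the returned dictionary keys are hypothetical, and it assumes that an old-format header differs from the current one only in using n= in place of nsites=.

```python
import re

# Field order follows the example header; "n=" marks the old format.
HEADER_RE = re.compile(
    r"letter-probability matrix:\s*alength=\s*(\d+)\s+w=\s*(\d+)"
    r"\s+(nsites|n)=\s*(\d+)\s+E=\s*(\S+)"
)

def parse_matrix_header(line):
    """Extract alphabet length, width, site count and E-value from a header line."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not a letter-probability matrix header")
    alength, width, key, nsites, evalue = match.groups()
    return {
        "alength": int(alength),
        "w": int(width),
        "nsites": int(nsites),
        "evalue": float(evalue),
        "old_format": key == "n",  # old files hold posterior probabilities
    }
```

Applied to the example header above, this yields an alphabet length of 20, a width of 18, 818 sites, and old_format set to False. A caller could print the warning when old_format is True.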

Finally, after all of the motifs, Meta-MEME looks for the motif occurrence section. This section is optional and is preceded by the line
<INPUT TYPE = HIDDEN NAME = motif-summary VALUE = "
Each subsequent line contains the motif occurrence information for one sequence in the training set. Each line begins with four values: the sequence id, the sequence p-value, the number n of motif occurrences, and the sequence length. These four values are followed by n triples, one per motif occurrence. Each triple consists of the motif id, the position of the occurrence in the sequence, and the p-value of the occurrence. The motif occurrence section must end with a line containing only a quotation mark followed by a greater-than symbol:
">
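To make the layout of an occurrence line concrete, here is a small Python sketch that splits one line into its four leading values and its n triples. The function name parse_occurrence_line is hypothetical, the fields are assumed to be separated by white space, and motif ids are kept as strings because the description does not say whether they are always numeric.

```python
def parse_occurrence_line(line):
    """Parse one motif-occurrence line into a dict with a list of triples."""
    tokens = line.split()
    if len(tokens) < 4:
        raise ValueError("expected sequence id, p-value, count and length")
    count = int(tokens[2])
    rest = tokens[4:]
    if len(rest) != 3 * count:
        raise ValueError("expected %d triples after the first four values" % count)
    occurrences = [
        # (motif id, position in the sequence, occurrence p-value)
        (rest[i], int(rest[i + 1]), float(rest[i + 2]))
        for i in range(0, len(rest), 3)
    ]
    return {
        "sequence_id": tokens[0],
        "sequence_pvalue": float(tokens[1]),
        "length": int(tokens[3]),
        "occurrences": occurrences,
    }
```

For instance, an invented line such as "seq1 1.5e-10 2 120 1 10 1e-5 2 50 2e-4" would describe a sequence of length 120 with two occurrences, of motif 1 at position 10 and of motif 2 at position 50.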

Any blank line in the example file may be replaced by arbitrary text.


Author: William Stafford Noble.
