MEME Suite Motif File Formats
MEME results are recorded in three file formats: plain text, HTML, and XML. The XML format was introduced in MEME 4.0. The plain text and HTML formats have been supported in all versions of MEME.
GLAM2 provides plain text and HTML output; these formats are described in the "Output format" section of the GLAM2 Tutorial. GLAM2 can also write its output in the MEME plain text format.
MAST will accept the plain text and HTML forms of MEME output, and also several other formats described in the MAST documentation.
FIMO will accept the plain text, HTML, and XML forms of MEME output.
GLAM2SCAN will accept the plain text and HTML forms of GLAM2 output.
Details of the MEME Output Formats
The MEME XML format is completely specified by the Document Type Definition (DTD) found at the start of the MEME XML output.
The MEME plain text and HTML formats contain a great deal of explanatory text that is not required for processing by tools such as MAST and FIMO. The essential elements are these:
- The text "MEME version" followed by a version number.
- The alphabet. The alphabet must start a new line with the string "ALPHABET=" and is terminated by white space. The letters are assumed to appear here in the same order as the columns of the motif matrices below. Only two alphabets (protein and DNA) are supported. Ambiguous characters are not listed, but may be used by applications.
- Strand information. This appears only in DNA motif files. The strand information is written as
    strands: +
  or
    strands: + -
  depending upon whether one or both strands are included.
- The background distribution. The background must start a new line with the string "Background letter frequencies (from". This is followed by a list of characters and their associated frequencies, delimited by white space.
- The motifs. Motifs are printed with each row representing the amino acid or nucleotide distribution at a particular position. Motifs are labeled by index, as "MOTIF X", where X is the motif index. The header lines at the beginning of the log-odds and letter-probability matrices must be reproduced exactly, with the proper values for the alphabet length, number of sites, motif width, and E-value. For example:
    log-odds matrix: alength= 4 w= 6 n= 1800 bayes= 9.81218 E= 1.3e+004
  and
    letter-probability matrix: alength= 20 w= 18 nsites= 818 E= 1.2e+002
  Older versions of MEME produce a matrix of probabilities rather than frequencies (i.e., the prior is applied to the frequencies). The old version is indicated by having "n=" rather than "nsites=" in the header for the letter-probability matrix. If your MEME output file is from a previous version of MEME, you will receive the following message:
    Warning: This is an old MEME file that contains posterior probabilities rather than frequencies. Meta-MEME will still work, but it would be better to run an updated version of MEME on your data.
- The motif occurrence section. This section is optional and is preceded by the line
    <INPUT TYPE = HIDDEN NAME = motif-summary VALUE = "
  Each subsequent line contains the motif occurrence information from one sequence in the training set. Each line begins with the sequence ID, the sequence p-value, the number n of motif occurrences, and the length of the sequence. These four values are followed by n triples, each corresponding to one motif occurrence. Each triple consists of the motif ID, its occurrence position in the sequence, and the motif occurrence p-value. The motif occurrence section must end with a line containing only a quotation mark followed by a greater-than symbol:
    ">