MEME Motif Format

Description

A motif is a sequence pattern that occurs repeatedly in a group of related sequences. Motifs in the MEME Suite are represented as position-dependent letter-probability matrices that describe the probability of each possible letter at each position in the pattern. Individual MEME motifs do not contain gaps. Patterns with variable-length gaps are split by MEME into two or more separate motifs.

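There are two flavors of MEME Motif Format: the MEME, STREME and DREME output formats, and the Minimal MEME Motif Format. Both are described below.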

Programs for converting many other motif formats into Minimal MEME Motif Format are provided by the downloadable version of MEME Suite.

MEME, STREME and DREME Output Formats

The HTML (.html) and text (.txt) output files from the motif discovery tools MEME, STREME and DREME are accepted as input by the programs in the MEME Suite that require MEME Motif Format. You can use these files directly as input (without any need to edit them) to other MEME Suite programs.

Minimal MEME Motif Format

The Minimal MEME Motif Format is a simple text format for motifs that is accepted by the programs in the MEME Suite that require MEME Motif Format.

The format is a plain text (ASCII) format that can easily be created by hand using a text editor (e.g., emacs, vi, TextEdit) or a word processor (e.g., MSWord, but make sure to export as plain text). However, before hand-crafting any motifs you might want to first check if there is a conversion script that can do the job for you.

A text file in Minimal MEME Motif Format can contain more than one motif. It can also optionally specify the motif alphabet, the background frequencies of the letters in the alphabet, and strand information (for motifs of complementable alphabets like DNA), as illustrated in the example files below.

Format Specification

The Minimal MEME Motif Format contains the following sections:

  1. Version (required)
  2. Alphabet (recommended)
  3. Strands (optional)
  4. Background frequencies (recommended)
  5. Motifs (required)

Each motif in the motifs section has the following sub-sections:

  1. Motif name (required)
  2. Motif letter-probability matrix (required)
  3. Motif URL (optional)

As well as the documentation you can also refer to these examples.
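The section layout listed above can be sketched as a small writer. This is a minimal illustration, not part of the MEME Suite: the function name format_minimal_meme and its parameters are hypothetical, and it covers only the version, alphabet, strands, background and motif sections described in this document.

```python
def format_minimal_meme(motifs, alphabet="ACGT", strands="+ -", background=None):
    """Return the text of a minimal MEME motif file (hypothetical helper).

    motifs: list of (identifier, alt_name_or_None, rows), where rows is a
    list of per-position probability lists ordered like `alphabet`.
    Pass strands=None for alphabets that are not complementable (e.g. protein).
    """
    # The version line must come before every other section.
    lines = ["MEME version 4", "", f"ALPHABET= {alphabet}", ""]
    if strands is not None:
        lines += [f"strands: {strands}", ""]
    if background is None:
        # Assumption for this sketch: fall back to uniform frequencies.
        background = {letter: 1.0 / len(alphabet) for letter in alphabet}
    lines.append("Background letter frequencies")
    lines.append(" ".join(f"{letter} {background[letter]:.3f}" for letter in alphabet))
    lines.append("")
    for identifier, alt_name, rows in motifs:
        name = identifier if alt_name is None else f"{identifier} {alt_name}"
        lines.append(f"MOTIF {name}")
        lines.append(f"letter-probability matrix: alength= {len(alphabet)} w= {len(rows)}")
        for row in rows:
            lines.append(" ".join(f"{p:.6f}" for p in row))
        # An empty line closes the matrix of each motif.
        lines.append("")
    return "\n".join(lines)
```

Calling format_minimal_meme([("M1", "demo", rows)]) with a list of four-column rows would yield a DNA file with a uniform background. The sketch does not check that each row sums to 1.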

MEME version line (required)

The MEME Suite requires this line to be certain that it really is reading a MEME motif file and not just something that looks slightly like it. This line must appear before any other sections in the file.

MEME version version number

The version number should be the MEME Suite version you are targeting.

For example, to target MEME Suite version 4 and above:

MEME version 4

Alphabet (recommended)

The alphabet line tells the MEME Suite what alphabet to expect the motifs to be in. If this line is not present then the MEME Suite can attempt to detect this from the background or the motifs themselves.

ALPHABET= alphabet

The alphabet can be ACGT for DNA, ACGU for RNA or ACDEFGHIKLMNPQRSTVWY for protein.

For example using a DNA alphabet:

ALPHABET= ACGT

If you are using a custom alphabet, then instead of the alphabet line you can supply the complete alphabet definition followed by a line containing just "END ALPHABET" (without the quotes).

For example, a simplified RNA alphabet:

ALPHABET "Basic RNA" RNA-LIKE
A
C
G
U
N = ACGU
END ALPHABET

Strands line (optional)

The strands line only has meaning for motifs of complementable alphabets like DNA and indicates if motifs were created from sites on both the given and the reverse complement strands of the sequences. If this line is not supplied then the MEME Suite will assume that motifs of complementable alphabets were created from both strands.

strands: which strands

Replace "which strands" with + to indicate only the given strand, or with + - to indicate both strands.

For example, to indicate that only the given strand was used:

strands: +
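The version, alphabet and strands lines described so far can be read with a few string checks. The following is a rough sketch only: parse_header is a hypothetical name, it handles just the one-line "ALPHABET=" form (not a custom alphabet definition), and it applies the documented default of both strands, which is only meaningful for complementable alphabets.

```python
def parse_header(lines):
    """Collect version, alphabet and strands from the top of a minimal MEME file."""
    # Default strands value per the text above: both strands are assumed.
    header = {"version": None, "alphabet": None, "strands": "+ -"}

    def require_version():
        # The version line must appear before any other section.
        if header["version"] is None:
            raise ValueError("'MEME version' line must come first")

    for raw in lines:
        line = raw.strip()
        if line.startswith("MEME version"):
            header["version"] = line[len("MEME version"):].strip()
        elif line.startswith("ALPHABET="):
            require_version()
            header["alphabet"] = line[len("ALPHABET="):].strip()
        elif line.startswith("strands:"):
            require_version()
            # Normalise internal whitespace, so "+   -" becomes "+ -".
            header["strands"] = " ".join(line[len("strands:"):].split())
        elif line.startswith("MOTIF"):
            require_version()
            break  # the header ends where the first motif starts
    require_version()
    return header
```

A caller would pass the lines of the file and then continue parsing from the first MOTIF line. A missing alphabet is reported as None, leaving any detection from the background or the motifs to later code.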

Background frequencies lines (recommended)

The background frequencies tell the MEME Suite how prevalent each letter of the motif alphabet was in the source sequences that were used to create the motifs. If the background frequencies are not supplied then the MEME Suite will assume uniform background frequencies. The MEME Suite uses this background to automatically create the log-odds matrices for MAST.

Background letter frequencies (from source):
letter 1 frequency 1 letter 2 frequency 2 ... (repeated) ... letter n-1 frequency n-1 letter n frequency n

The source is not required, and if you wish you can leave off the end of the first line after "Background letter frequencies". The next line lists each letter in the alphabet followed by its frequency. The letters must be listed in the same order as in the alphabet line (see also custom alphabet ordering) and the frequencies should sum to 1.

An example of uniform DNA frequencies:

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

An example of protein frequencies with a source listed:

Background letter frequencies (from lipocalin.s):
A 0.071 C 0.029 D 0.069 E 0.077 F 0.043 G 0.057 H 0.026 I 0.048 K 0.085 L 0.087
M 0.018 N 0.053 P 0.032 Q 0.029 R 0.031 S 0.058 T 0.048 V 0.069 W 0.017 Y 0.050
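The background lines can be read as alternating letter and frequency tokens. The sketch below is illustrative only: parse_background and its tolerance parameter are hypothetical, and the tolerance of 0.01 is an assumption chosen because rounded frequencies (as in the protein example above) do not always sum to exactly 1.

```python
import math


def parse_background(header_line, freq_lines, alphabet=None, tolerance=0.01):
    """Parse background frequencies into a {letter: frequency} dict."""
    if not header_line.strip().startswith("Background letter frequencies"):
        raise ValueError("not a background frequencies header")
    # The frequencies may be wrapped over several lines; join them first.
    tokens = " ".join(freq_lines).split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("expected alternating letter/frequency pairs")
    letters = tokens[0::2]
    freqs = [float(tok) for tok in tokens[1::2]]
    # Letters must follow the order of the alphabet line, when one is known.
    if alphabet is not None and "".join(letters) != alphabet:
        raise ValueError("letters are not in alphabet order")
    if not math.isclose(sum(freqs), 1.0, abs_tol=tolerance):
        raise ValueError("frequencies should sum to 1")
    return dict(zip(letters, freqs))
```

For the uniform DNA example, parse_background("Background letter frequencies", ["A 0.25 C 0.25 G 0.25 T 0.25"], alphabet="ACGT") gives a dict with four entries of 0.25.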

Motif name line (required)

The motif name line indicates the start of a new motif and designates an identifier for it that must be unique to the file. It also allows for an alternate name that does not have to be unique. Neither the identifier nor the alternate name may contain spaces or equal signs (=).

MOTIF identifier alternate name

For example:

MOTIF MA0002.1 RUNX1
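The naming rules above (a unique identifier, an optional alternate name, no spaces or equal signs) can be checked as follows. This is a strict sketch for the minimal format only: parse_motif_line is a hypothetical helper, and it rejects any extra tokens on the line, which the more complex output formats of the discovery tools may include.

```python
def parse_motif_line(line, seen_ids):
    """Return (identifier, alternate_name_or_None) from a MOTIF line.

    seen_ids is a set of identifiers already read from the same file;
    it is updated in place so that duplicates can be detected.
    """
    parts = line.split()
    if not parts or parts[0] != "MOTIF" or len(parts) not in (2, 3):
        raise ValueError("expected 'MOTIF identifier [alternate name]'")
    identifier = parts[1]
    alt_name = parts[2] if len(parts) == 3 else None
    # Splitting on whitespace already rules out spaces inside the names.
    for name in (identifier, alt_name):
        if name is not None and "=" in name:
            raise ValueError("names may not contain '='")
    if identifier in seen_ids:
        raise ValueError(f"duplicate motif identifier: {identifier}")
    seen_ids.add(identifier)
    return identifier, alt_name
```

Only the identifier is checked for uniqueness, since the alternate name is allowed to repeat within a file.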

Motif letter-probability matrix lines (required)

The letter-probability matrix is a table of probabilities where the rows are positions in the motif and the columns are letters in the alphabet. The columns are ordered alphabetically, so for DNA the first column is A, the second is C, the third is G and the last is T. For protein motifs the columns come in the order A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W and Y (see also custom alphabet ordering). Because each row contains the probability of each letter in the alphabet, the probabilities in each row must sum to 1.

letter-probability matrix: alength= alphabet length w= motif length nsites= source sites E= source E-value
... (letter-probability matrix goes here) ...

All the "key= value" pairs after the "letter-probability matrix:" text are optional. The "alength= alphabet length" and "w= motif length" values can be derived from the matrix if they are not specified, provided there is an empty line following the letter-probability matrix. The "nsites= source sites" value defaults to 20 if it is not provided, and the "E= source E-value" value defaults to zero. (Note: "S= source Score" may replace "E= source E-value".) The number of source sites is used to apply pseudocounts to the motif, and the source E-value (or source P-value or source Score) is used for filtering the motifs input to some MEME Suite programs (see MAST's -mev option).
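To illustrate why the number of source sites and the background matter, here is one common way to turn a row of probabilities into log-odds scores with a pseudocount. This is an assumption-laden sketch, not necessarily the calculation the MEME Suite or MAST performs: the function name log_odds_row, the pseudocount value of 0.1 and the base-2 logarithm are all choices made for this example.

```python
import math


def log_odds_row(probs, background, nsites=20, pseudocount=0.1):
    """Convert one matrix row to log2 odds against the background.

    probs and background are sequences in the same letter order.
    The pseudocount is spread over the letters in proportion to the
    background, so a probability of 0 still gives a finite score.
    """
    scores = []
    for p, b in zip(probs, background):
        adjusted = (p * nsites + pseudocount * b) / (nsites + pseudocount)
        scores.append(math.log2(adjusted / b))
    return scores
```

With a smaller nsites the pseudocount carries more weight, which pulls extreme probabilities further towards the background.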

An example of a DNA motif's letter-probability matrix:

letter-probability matrix: alength= 4 w= 18 nsites= 18 E= 1.1e-006
0.611111 0.000000 0.055556 0.333333
0.555556 0.000000 0.111111 0.333333
0.222222 0.166667 0.222222 0.388889
0.000000 0.111111 0.000000 0.888889
0.000000 0.055556 0.944444 0.000000
0.111111 0.000000 0.000000 0.888889
0.055556 0.000000 0.888889 0.055556
0.833333 0.111111 0.055556 0.000000
0.111111 0.388889 0.277778 0.222222
0.333333 0.055556 0.500000 0.111111
0.111111 0.222222 0.111111 0.555556
0.277778 0.222222 0.222222 0.277778
0.111111 0.055556 0.722222 0.111111
0.388889 0.166667 0.055556 0.388889
0.055556 0.000000 0.111111 0.833333
0.055556 0.777778 0.000000 0.166667
0.777778 0.000000 0.222222 0.000000
0.277778 0.611111 0.055556 0.055556
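A matrix block like the one above can be parsed by reading the optional "key= value" pairs from the header line and then collecting rows until the first empty line. The following is a minimal sketch under stated assumptions: parse_matrix_block is a hypothetical name, the row-sum tolerance of 0.001 is an arbitrary allowance for rounding, and only the defaults described in this document (nsites of 20, E-value of zero) are applied.

```python
import math
import re


def parse_matrix_block(lines, tolerance=1e-3):
    """Parse a 'letter-probability matrix:' header and the rows after it."""
    prefix = "letter-probability matrix:"
    header = lines[0].strip()
    if not header.startswith(prefix):
        raise ValueError("not a letter-probability matrix header")
    # Collect "key= value" pairs such as "alength= 4" or "E= 1.1e-006".
    params = dict(re.findall(r"(\w+)=\s*(\S+)", header[len(prefix):]))
    rows = []
    for raw in lines[1:]:
        if not raw.strip():
            break  # an empty line ends the matrix
        rows.append([float(value) for value in raw.split()])
    if not rows:
        raise ValueError("matrix has no rows")
    # alength and w fall back to the shape of the matrix itself.
    alength = int(params.get("alength", len(rows[0])))
    width = int(params.get("w", len(rows)))
    if len(rows) != width:
        raise ValueError("row count does not match w")
    for row in rows:
        if len(row) != alength:
            raise ValueError("row length does not match alength")
        if not math.isclose(sum(row), 1.0, abs_tol=tolerance):
            raise ValueError("row probabilities must sum to 1")
    return {
        "alength": alength,
        "w": width,
        "nsites": float(params.get("nsites", 20)),  # documented default
        "evalue": float(params.get("E", 0)),        # documented default
        "rows": rows,
    }
```

The sketch ignores the "S=" alternative to "E=" and stops at the first empty line, so any URL line that follows the matrix is left for the caller to handle.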

Motif URL line (optional)

The URL line specifies a web page to link to when the motif is mentioned in results.

URL web page URL

For example:

URL http://jaspar.genereg.net/cgi-bin/jaspar_db.pl?ID=MA0002.1&rm=present&collection=CORE

Other MEME Motif Formats

The motif discovery tools in the MEME Suite output motifs in several more complex formats. All of these formats are also accepted by programs in the MEME Suite that take motifs as input. The HTML formats are designed for viewing by humans, whereas the text (.txt) and XML (.xml) formats are primarily designed to make it easier for third-party applications to post-process the output of MEME Suite tools.