A motif is represented by a position-dependent scoring
matrix.
A scoring matrix is preceded by a line starting with the words
log-odds matrix: and specifying
alength, the length of the alphabet (number of
columns in the scoring matrix), and w, the
width of the motif (number of rows in the scoring matrix).
The following w lines (no blank lines allowed)
contain the rows of the scoring matrix. Row i, column
j of the matrix gives the score for the j-th letter
in alphabet appearing at position i in an
occurrence of the motif.
The spaces after the equals signs and the colon are
required.
The number of letters in alphabet must equal
alength.
Any number of additional motifs may follow the first one.
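The matrix layout described above can be sketched in Python as follows. This is a minimal illustration, not MAST's own parser: the function name parse_motifs and the sample values are hypothetical, and the header is assumed to have the form "log-odds matrix: alength= 4 w= 2", with whitespace after the colon and after each equals sign.

```python
import re

# Assumed header form: "log-odds matrix: alength= <n> w= <n>".
# The \s+ after the colon and after each "=" enforces the required spaces.
HEADER_RE = re.compile(r"^log-odds matrix:\s+alength=\s+(\d+)\s+w=\s+(\d+)")

# Illustrative values only; these scores do not come from a real motif.
SAMPLE = """ALPHABET= ACGT
log-odds matrix: alength= 4 w= 2
 1.0 -2.0 -2.0  0.5
-1.5  2.0 -0.5 -1.0
log-odds matrix: alength= 4 w= 1
 0.0  0.0  1.0 -1.0
"""


def parse_motifs(text):
    """Return one matrix per motif, each a list of w rows of alength floats."""
    lines = text.splitlines()
    motifs = []
    i = 0
    while i < len(lines):
        match = HEADER_RE.match(lines[i])
        if match is None:
            i += 1  # not a matrix header; skip (e.g. the ALPHABET= line)
            continue
        alength, w = int(match.group(1)), int(match.group(2))
        rows = []
        # The next w lines must each hold exactly alength scores.
        for row_line in lines[i + 1 : i + 1 + w]:
            scores = [float(token) for token in row_line.split()]
            if len(scores) != alength:
                # Also catches blank lines, which are not allowed.
                raise ValueError(f"expected {alength} scores, got {len(scores)}")
            rows.append(scores)
        if len(rows) != w:
            raise ValueError("file ended before w rows were read")
        motifs.append(rows)
        i += 1 + w
    return motifs
```

Calling parse_motifs(SAMPLE) yields two matrices, the first with two rows and the second with one. A blank line inside a matrix raises ValueError, matching the rule that no blank lines are allowed between rows.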
The motif file must contain a line starting with
ALPHABET=
followed by alphabet, a list containing the letters used in
the motifs. The order of the letters in alphabet must be the
same as the order of the columns of scores in the motifs. The order
need not be alphabetical and case does not matter, but there should
be no spaces in alphabet. The letters in alphabet
must be a subset of either the IUB/IUPAC DNA
(ABCDGHKMNRSTUVWY*-) or protein
(ABCDEFGHIKLMNPQRSTUVWXYZ*-) alphabets. DNA alphabets must
contain at least the letters ACGT. Protein alphabets must
contain at least the letters ACDEFGHIKLMNPQRSTVWY. All other
letters in the alphabets are optional. If any of the optional
letters are missing from alphabet, MAST automatically
generates scores for them by taking the weighted average of the
scores for the letters that the missing letter could match. (The
weights are the frequencies of the replacement letters in the
appropriate non-redundant database.) Replacements for the optional
letters are given in the following table.
Letters matched by optional letters

optional letter   matches (DNA)   matches (protein)
---------------   -------------   --------------------
B                 CGT             DN
D                 AGT
H                 ACT
K                 GT
M                 AC
N                 ACGT
R                 AG
S                 CG
U                 T               ACDEFGHIKLMNPQRSTVWY
V                 CAG
W                 AT
X                                 ACDEFGHIKLMNPQRSTVWY
Y                 CT
Z                                 EQ
*                 ACGT            ACDEFGHIKLMNPQRSTVWY
-                 ACGT            ACDEFGHIKLMNPQRSTVWY
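The weighted averaging for missing optional letters can be illustrated with a short Python sketch for the DNA case. It is not MAST's implementation: the function name fill_dna_row is hypothetical, the weights are assumed to be normalized over the letters being averaged, and the uniform frequencies are placeholders, since the actual non-redundant database frequencies are not listed here.

```python
# DNA column of the table above: optional letter -> letters it matches.
DNA_MATCHES = {
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT", "M": "AC", "N": "ACGT",
    "R": "AG", "S": "CG", "U": "T", "V": "ACG", "W": "AT", "Y": "CT",
    "*": "ACGT", "-": "ACGT",
}

# Placeholder weights (assumption): the real weights are letter frequencies
# from a non-redundant database, which this document does not list.
UNIFORM_DNA_FREQS = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def fill_dna_row(alphabet, row, freqs=UNIFORM_DNA_FREQS):
    """Map each letter to its score for one matrix row, adding averaged
    scores for every optional DNA letter that is absent from alphabet."""
    alphabet = alphabet.upper()  # case does not matter
    if len(alphabet) != len(row):
        raise ValueError("alphabet length must equal the number of columns")
    scores = dict(zip(alphabet, row))
    if not set("ACGT") <= set(scores):
        raise ValueError("a DNA alphabet must contain at least ACGT")
    for letter, matched in DNA_MATCHES.items():
        if letter in scores:
            continue  # the motif already supplies a score for this letter
        total_weight = sum(freqs[m] for m in matched)
        # Frequency-weighted average over the letters this one can match.
        scores[letter] = sum(freqs[m] * scores[m] for m in matched) / total_weight
    return scores
```

For example, with alphabet ACGT and a row of 1.0, 2.0, 3.0, 4.0, the uniform placeholder weights give N the plain mean of all four scores and R the mean of the A and G scores. A letter that already has a column in the motif keeps its own score.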
EXAMPLE
Here is an example of a DNA motif file that contains two
motifs.
In the example above, because the order of the letters in
alphabet is ACGT, the first column of each
motif gives the scores for the letter A at each position in
the motif, the second column gives the scores for C, and so
forth.
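The relationship between the letter order in alphabet and the matrix columns can be sketched in Python as follows. The function name score_site and the matrix values are hypothetical, and the sketch assumes the usual convention that the score of a site is the sum of its per-position scores.

```python
def score_site(alphabet, matrix, site):
    """Score one candidate site against a motif's scoring matrix.

    Column j of the matrix belongs to the j-th letter of alphabet, and
    row i belongs to position i of the motif.
    """
    # Letter -> column index, following the order given in alphabet.
    column = {letter: j for j, letter in enumerate(alphabet.upper())}
    if len(site) != len(matrix):
        raise ValueError("site length must equal the motif width w")
    # Assumed convention: the site score is the sum of position scores.
    return sum(matrix[i][column[ch]] for i, ch in enumerate(site.upper()))


# Illustrative matrix (w = 2, alength = 4); the values are made up.
MATRIX = [
    [1.0, -2.0, -2.0, 0.5],
    [-1.5, 2.0, -0.5, -1.0],
]
```

With alphabet ACGT, the site AC takes column 1 of row 1 and column 2 of row 2. Listing the same letters in a different order, such as TGCA, changes which column each letter reads, which is why the order in alphabet must match the column order in the motifs.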