This file format is used by many programs in the MEME Suite to model
Markov background probabilities. A background model file specifies all
*k*-mer frequencies up to a user-chosen maximum *k*. These
define a Markov model of order *k-1*.
Some of the MEME Suite programs use only the 0-order model and ignore
any higher order information in the file. You can easily create
a Markov model of any order from a FASTA file of sequences using
the fasta-get-markov command
provided with the downloadable version of the MEME Suite.

    # order 0
    a 0.324
    c 0.176
    g 0.176
    t 0.324

    # order 0
    A 2.563e-01
    C 2.437e-01
    G 2.437e-01
    T 2.563e-01
    # order 1
    AA 7.020e-02
    AC 5.388e-02
    AG 8.089e-02
    AT 5.134e-02
    CA 7.575e-02
    CC 7.050e-02
    CG 1.659e-02
    CT 8.089e-02
    GA 6.280e-02
    GC 5.652e-02
    GG 7.050e-02
    GT 5.388e-02
    TA 4.751e-02
    TC 6.280e-02
    TG 7.575e-02
    TT 7.020e-02
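As a rough illustration of how *k*-mer frequencies define a Markov model of order *k-1*, the sketch below turns the order-1 example above into transition probabilities by dividing each 2-mer probability by the probability of its first letter. The names `P1`, `P2` and `transition` are hypothetical, and this is only one plausible way to derive conditional probabilities; it is not necessarily how the MEME Suite programs do it internally.

```python
# Probabilities copied from the order-1 example above.
P1 = {"A": 2.563e-01, "C": 2.437e-01, "G": 2.437e-01, "T": 2.563e-01}
P2 = {
    "AA": 7.020e-02, "AC": 5.388e-02, "AG": 8.089e-02, "AT": 5.134e-02,
    "CA": 7.575e-02, "CC": 7.050e-02, "CG": 1.659e-02, "CT": 8.089e-02,
    "GA": 6.280e-02, "GC": 5.652e-02, "GG": 7.050e-02, "GT": 5.388e-02,
    "TA": 4.751e-02, "TC": 6.280e-02, "TG": 7.575e-02, "TT": 7.020e-02,
}


def transition(prev, nxt):
    """Hypothetical helper: P(nxt | prev) = P(prev + nxt) / P(prev)."""
    return P2[prev + nxt] / P1[prev]


# Distribution of the letter that follows a "C".
after_c = {base: transition("C", base) for base in "ACGT"}
for base, p in after_c.items():
    print(f"P({base} | C) = {p:.4f}")
```

In this example the probability of a G following a C comes out well below the 0-order probability of G, which is information that a 0-order model cannot represent.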

Each line may contain one of the following:

- Any number of white-space characters (this includes empty lines).
- A unique *k*-mer and a probability, separated and potentially surrounded by whitespace.
- One of the other options followed by a "#" character designating the rest of the line as a comment to be ignored.
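The line rules above can be sketched as a small reader. This is a minimal illustration, not the parser used by the MEME Suite; the function name `parse_background` is hypothetical, and Python's `float()` is more permissive than the probability syntax described in this document.

```python
def parse_background(text):
    """Return a list of (k-mer, probability) pairs in file order."""
    entries = []
    seen = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Everything from the first "#" onward is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank, whitespace-only or comment-only line
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<k-mer> <probability>'")
        kmer, prob = fields
        if kmer in seen:
            raise ValueError(f"line {lineno}: duplicate k-mer {kmer!r}")
        seen.add(kmer)
        entries.append((kmer, float(prob)))
    return entries


sample = """# order 0
a 0.324
c 0.176   # trailing comment

  g 0.176
t 0.324
"""
print(parse_background(sample))
```

The sketch keeps each *k*-mer exactly as written (including letter case) and leaves all numeric checks to later steps.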

For each value of *k*, up to the maximum you choose,
the file should have exactly one line for each possible
*k*-mer composed of the **core** symbols from either the standard
DNA, RNA, or protein alphabet, or from a custom alphabet.
The frequencies of all *k*-mers
must precede the frequencies of all *k+1*-mers.
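A possible way to check these layout rules is sketched below. It assumes the *k*-mers have already been read into a list in file order, that the core alphabet is passed in as a string of single-character symbols (uppercase DNA by default), and that symbol case matters; `check_layout` is a hypothetical name, not a MEME Suite function.

```python
from itertools import product


def check_layout(kmers, alphabet="ACGT"):
    """Check ordering and completeness; return the maximum k found."""
    if not kmers:
        raise ValueError("no k-mers found")
    if len(set(kmers)) != len(kmers):
        raise ValueError("each k-mer may appear only once")
    lengths = [len(kmer) for kmer in kmers]
    # All k-mers must come before any (k+1)-mer, so lengths never decrease.
    if lengths != sorted(lengths):
        raise ValueError("all k-mers must precede all (k+1)-mers")
    max_k = lengths[-1]
    for k in range(1, max_k + 1):
        expected = {"".join(p) for p in product(alphabet, repeat=k)}
        found = {kmer for kmer in kmers if len(kmer) == k}
        if found != expected:
            missing = sorted(expected - found)
            unexpected = sorted(found - expected)
            raise ValueError(
                f"k={k}: missing {missing[:5]}, unexpected {unexpected[:5]}"
            )
    return max_k


order1 = list("ACGT") + ["".join(p) for p in product("ACGT", repeat=2)]
print(check_layout(order1))
```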

For each value of *k*, the probabilities of the *k*-mers must
sum to approximately 1.0 (small allowances for rounding are made).
To define a consistent Markov model, it is necessary that,
for each value of *k*, the sum of the probabilities of the *k*-mers whose
suffix is a particular *k-1*-mer should approximately equal the probability of that *k-1*-mer,
as given in the file.
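The two conditions above can be checked along the following lines. This is a sketch under assumptions: the tolerance `tol` is an arbitrary illustrative value (this document does not state the allowance the programs actually use), and `check_consistency` is a hypothetical helper that takes a dictionary mapping each *k*-mer to its probability.

```python
from collections import defaultdict


def check_consistency(probs, tol=1e-3):
    """Return a list of problem descriptions (empty if none were found)."""
    by_len = defaultdict(dict)
    for kmer, p in probs.items():
        by_len[len(kmer)][kmer] = p

    problems = []
    for k in sorted(by_len):
        # Condition 1: the k-mers for each k sum to approximately 1.
        total = sum(by_len[k].values())
        if abs(total - 1.0) > tol:
            problems.append(f"{k}-mers sum to {total:.6f}, not 1.0")
        if k - 1 not in by_len:
            continue
        # Condition 2: summing over k-mers that share a suffix should
        # reproduce the probability of that (k-1)-mer.
        marginal = defaultdict(float)
        for kmer, p in by_len[k].items():
            marginal[kmer[1:]] += p
        for sub, expected in by_len[k - 1].items():
            if abs(marginal[sub] - expected) > tol:
                problems.append(
                    f"{k}-mers ending in {sub} sum to {marginal[sub]:.6f}, "
                    f"but {sub} has probability {expected:.6f}"
                )
    return problems


toy = {"A": 0.5, "C": 0.5, "AA": 0.4, "AC": 0.1, "CA": 0.4, "CC": 0.1}
for problem in check_consistency(toy):
    print(problem)
```

The toy model passes the sum check for both lengths but fails the suffix check, since the 2-mers ending in A sum to 0.8 while A itself is given as 0.5.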

The probabilities are numbers in the range 0 ≤ **p** ≤ 1.
They may be in simple decimal (e.g., 0.00015) or use exponential notation (e.g., 1.5e-4).
To be precise, each probability is a number **p**, where
**p** can be matched by the regular expression
`^([0]|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$`

and is in the range 0 ≤ **p** ≤ 1.
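For illustration, the syntax and range rules above could be applied to a single token as follows. The regular expression is copied from the text; the helper name `is_valid_probability` and its `exclusive` flag (which additionally rejects exactly 0 and 1) are hypothetical additions for this sketch.

```python
import re

# Regular expression copied from the format description above.
PROB_RE = re.compile(r"^([0]|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$")


def is_valid_probability(token, exclusive=False):
    """Check the syntax and the range of one probability token."""
    if PROB_RE.fullmatch(token) is None:
        return False
    p = float(token)
    if exclusive:
        return 0.0 < p < 1.0
    return 0.0 <= p <= 1.0


for token in ["0.00015", "1.5e-4", "2.563e-01", ".5", "01.5", "1.5", "-0.1"]:
    print(token, is_valid_probability(token))
```

Note that the expression requires a digit before the decimal point and does not permit a sign or redundant leading zeros, so tokens such as `.5` and `01.5` are rejected even though many number parsers would accept them.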

It is important to note that probabilities of zero (or one) are not allowed because these cause asymptotic conditions in the equations used by our programs. They are also unlikely to be correct: the fact that the dataset used to calculate a background contains no instances of "CGAAA" does not mean that this *k*-mer is impossible. For this reason, the tool fasta-get-markov automatically adds pseudocounts to the observed letter counts (unless it is specifically told not to).
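The effect of pseudocounts can be illustrated with generic additive smoothing, as sketched below. This is not the formula used by fasta-get-markov, which is not given here; the function `kmer_probs`, its default `pseudocount` value and the toy sequence are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_probs(seqs, k, alphabet="ACGT", pseudocount=0.1):
    """Estimate k-mer probabilities with a pseudocount added to every k-mer."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows that contain symbols outside the core alphabet.
            if all(ch in alphabet for ch in kmer):
                counts[kmer] += 1
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts.values()) + pseudocount * len(all_kmers)
    return {kmer: (counts[kmer] + pseudocount) / total for kmer in all_kmers}


raw = kmer_probs(["AAAA"], 1, pseudocount=0.0)
smoothed = kmer_probs(["AAAA"], 1, pseudocount=0.1)
print(raw["C"], smoothed["C"])
```

Without a pseudocount the unobserved letters get a probability of exactly zero and the observed letter a probability of exactly one; with a pseudocount every probability lies strictly between the two.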

A background model file can be created from any FASTA sequence file using the fasta-get-markov program.