MEME Dirichlet Mixtures Format

Description

A Dirichlet mixture file specifies residues' tendencies to align with one another, and is the basis for scoring columns of aligned residues in MEME and GLAM2.

Format Specification

The format is identical to that of the UCSC Dirichlet mixtures. The default 30-component mixture prior used by MEME is given here.

GLAM2 Dirichlet Mixtures Format

The GLAM2 programs use the same format as MEME but read only lines beginning with Mixture= or Alpha=. Mixture= is followed by a number giving the weight of that mixture component; the weights of all components should sum to 1. Alpha= is followed by a list of numbers: the first is the sum of the pseudocounts for that mixture component (the GLAM2 programs in fact ignore it), and the rest are the pseudocounts themselves, one for each symbol in the alphabet.

The pseudocounts should be in the same order as the alphabet symbols. For the n (nucleotide) alphabet, this is: acgt. For the p (protein) alphabet, this is: ACDEFGHIKLMNPQRSTVWY.
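The reading rules above can be sketched as a small Python parser. This is an illustration only, not the GLAM2 implementation: the function name parse_dirichlet_mixture, the error handling, and the two-component nucleotide sample are all hypothetical, and the sample values are made up rather than taken from any distributed prior file.

```python
from typing import Dict, List, Tuple

# Symbol orders as stated in the text above.
NUCLEOTIDE = "acgt"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"


def parse_dirichlet_mixture(
    text: str, alphabet: str
) -> List[Tuple[float, Dict[str, float]]]:
    """Return one (weight, {symbol: pseudocount}) pair per mixture component.

    Hypothetical helper: only lines beginning with "Mixture=" or "Alpha="
    are read; every other line is skipped.
    """
    weights: List[float] = []
    alphas: List[Dict[str, float]] = []
    for line in text.splitlines():
        if line.startswith("Mixture="):
            weights.append(float(line[len("Mixture="):].split()[0]))
        elif line.startswith("Alpha="):
            numbers = [float(x) for x in line[len("Alpha="):].split()]
            # The first number is the sum of the pseudocounts; skip it.
            counts = numbers[1:]
            if len(counts) != len(alphabet):
                raise ValueError(
                    f"expected {len(alphabet)} pseudocounts, got {len(counts)}"
                )
            # Pseudocounts are in the same order as the alphabet symbols.
            alphas.append(dict(zip(alphabet, counts)))
    if len(weights) != len(alphas):
        raise ValueError("each Mixture= line needs a matching Alpha= line")
    return list(zip(weights, alphas))


# Made-up sample for the nucleotide alphabet; the NumDistr= line is skipped.
EXAMPLE = """\
NumDistr= 2
Mixture= 0.6
Alpha= 4.0 1.0 1.0 1.0 1.0
Mixture= 0.4
Alpha= 2.0 0.5 0.5 0.5 0.5
"""

components = parse_dirichlet_mixture(EXAMPLE, NUCLEOTIDE)
for weight, pseudocounts in components:
    print(weight, pseudocounts)
```

Pairing each weight with a symbol-keyed dictionary makes the ordering convention explicit. Checking the count against the alphabet length catches a file written for a different alphabet.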

If no Dirichlet mixture file is specified, the GLAM2 programs use recode3.20comp for the p (protein) alphabet, glam_tfbs.1comp for the n (nucleotide) alphabet, and a uniform prior for user-specified alphabets.