MEME Position Specific Priors Format

Description

Position-specific priors (PSPs) let the user bias MEME's search for motifs. A PSP gives a prior distribution over the locations of motif sites in one or more sequences of the input dataset.

Format Specification

Each entry in a MEME Position Specific Priors (PSP) file names the sequence to which its prior distribution applies. MEME gives sequences not named in the PSP file a uniform prior distribution over site locations.

A PSP must be created for a specific width of motif. This width must be specified for each entry in the PSP file, and must be the same for all entries. If MEME varies the motif width during computation, MEME renormalises the PSP for each sequence.

MEME PSP format is similar to FASTA format. Each entry should start with a header line consisting of a sequence name (ID) followed by the width (WIDTH) of the motif for which the prior was created. The sequence name must match the name of a sequence in the FASTA file given to MEME. MEME ignores any other text on the header line after the name and width.

The lines that follow (PRIORS) contain one number for each position in the identically named FASTA sequence. Each number gives the prior probability of a motif site at that position in the sequence (or in the reverse complement if -revcomp is specified). The last WIDTH - 1 numbers of each entry should be 0 (the trailing zeros in the example), since a motif of that width cannot start at those positions.

All numbers in an entry must be in the range [0,1] and must sum to at most 1. If they sum to less than 1 and -mod oops is specified, MEME rescales them so that they sum to 1.
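
The constraints above can be checked before a file is handed to MEME. The following Python sketch is not part of MEME; the function names check_psp_entry and rescale_to_one and the tolerance tol are hypothetical. The rescaling shown is a plain proportional one (each value divided by the total), which is an assumption about how the values are brought to a sum of 1.

```python
import math


def check_psp_entry(priors, width, tol=1e-9):
    """Return a list of problems found in one PSP entry (empty if none)."""
    problems = []
    # Every value is a probability.
    if any(p < 0.0 or p > 1.0 for p in priors):
        problems.append("value outside [0, 1]")
    # The values may sum to less than 1, but not to more.
    if math.fsum(priors) > 1.0 + tol:
        problems.append("sum exceeds 1")
    # A motif of this width cannot start in the last width - 1 positions.
    tail = priors[max(0, len(priors) - (width - 1)):]
    if any(p != 0.0 for p in tail):
        problems.append("last width - 1 values are not all 0")
    return problems


def rescale_to_one(priors):
    """Scale the values proportionally so they sum to 1 (assumed behaviour)."""
    total = math.fsum(priors)
    if total <= 0.0:
        raise ValueError("cannot rescale an all-zero prior")
    return [p / total for p in priors]
```

An empty list from check_psp_entry means the entry satisfies the three rules stated above; it does not check that the number of values matches the length of the FASTA sequence, which would require reading the sequence file as well.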

The format is arranged like this:

>ID WIDTH
PRIORS
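
As an illustration of this layout, the Python sketch below renders one entry as text. The function name format_psp_entry is hypothetical, and the choices of seven values per line and six decimal places are cosmetic assumptions, not requirements of the format.

```python
def format_psp_entry(name, width, priors, per_line=7):
    """Render one entry as a '>ID WIDTH' header followed by the prior values."""
    lines = [f">{name} {width}"]
    # Wrap the values over several lines, per_line numbers at a time.
    for start in range(0, len(priors), per_line):
        chunk = priors[start:start + per_line]
        lines.append(" ".join(f"{p:.6f}" for p in chunk))
    return "\n".join(lines) + "\n"


# Hypothetical sequence of length 4 with a prior built for motif width 3.
print(format_psp_entry("seq1", 3, [0.25, 0.25, 0.0, 0.0]), end="")
```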
      

An example of the PSP format is given below:

>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139 0.032963
0.086047 0.057445 0.000000 0.000000 0.000000

>LACB_BOVIN 4
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854 0.046499
0.000000 0.000000 0.000000
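
A file in this format can be read with a few lines of code. The Python sketch below is not MEME's own parser; the function name parse_psp and the returned dictionary layout are hypothetical. It assumes that blank lines between entries can be skipped and that values are separated by whitespace.

```python
def parse_psp(text):
    """Parse PSP text into {sequence name: (width, [priors])}."""
    entries = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between entries
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) < 2:
                raise ValueError(f"header needs a name and a width: {raw!r}")
            # Anything after the name and width is ignored.
            name, width = fields[0], int(fields[1])
            entries[name] = (width, [])
        elif name is None:
            raise ValueError("prior values found before the first header")
        else:
            # Values for one entry may be spread over several lines.
            entries[name][1].extend(float(tok) for tok in line.split())
    return entries


EXAMPLE = """\
>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139 0.032963
0.086047 0.057445 0.000000 0.000000 0.000000

>LACB_BOVIN 4
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854 0.046499
0.000000 0.000000 0.000000
"""

for seq_name, (width, priors) in parse_psp(EXAMPLE).items():
    print(seq_name, width, len(priors), sum(priors))
```

Each printed line gives the sequence name, the motif width, the number of prior values (which should equal the sequence length), and their sum.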
      

See Also

The psp-gen tool can be used to generate position specific priors when supplied with a discriminative sequence dataset.