These priors allow the user to bias the search for motifs in MEME. They give a position-specific prior distribution on the location of motif sites in each sequence of the input dataset.
Each entry in the MEME Position Specific Priors (PSP) format names the sequence to which its prior distribution applies. Sequences not named in the PSP file are given uniform prior distributions on site locations by MEME.
A PSP must be created for a specific width of motif. This width must be specified for each entry in the PSP file, and must be the same for all entries. If MEME varies the motif width during computation, MEME renormalises the PSP for each sequence.
MEME PSP format is similar to FASTA format. Each entry should start with a header line consisting of a sequence name (ID) followed by the width (WIDTH) of the PSP prior. The sequence name must match the name of a sequence in the FASTA file input to MEME. Any other text on the header line after the name and width is ignored by MEME. The following lines (PRIORS) contain one number for each position in the identically-named FASTA sequence, where the number gives the prior probability of a motif site at that position in the sequence (or in the reverse complement if -revcomp is specified).
The last w - 1 numbers for each entry, where w is the WIDTH given on the header line, should be 0 (the trailing 0.000000 values in the example), since a motif of that width cannot start in those positions. All numbers for an entry must be in the range [0,1], and must sum to a number no greater than 1. If they sum to less than 1 and -mod oops is specified, MEME will rescale the numbers so that they sum to 1.
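The constraints above can be checked before a PSP file is handed to MEME. The following is a minimal Python sketch, not part of MEME: the function names `check_priors` and `rescale_to_one` and the tolerance `tol` are hypothetical, and the rescaling shown (dividing each value by the total) is only an assumption about what "rescale so that they sum to 1" means, not MEME's actual code.

```python
import math


def check_priors(priors, width, tol=1e-6):
    """Return a list of problems found in one PSP entry (empty if none).

    `priors` is the list of numbers for one entry, `width` is the WIDTH
    from its header line. `tol` (an assumption) absorbs rounding error
    when comparing the sum against 1.
    """
    problems = []
    # Every value must lie in the range [0, 1].
    if any(p < 0.0 or p > 1.0 for p in priors):
        problems.append("value outside [0, 1]")
    # A motif of this width cannot start in the last width - 1 positions,
    # so those values should all be 0.
    tail_start = max(0, len(priors) - (width - 1))
    tail = priors[tail_start:] if width > 1 else []
    if any(p != 0.0 for p in tail):
        problems.append("last width - 1 values are not all 0")
    # The values must sum to no more than 1.
    if math.fsum(priors) > 1.0 + tol:
        problems.append("values sum to more than 1")
    return problems


def rescale_to_one(priors):
    """Divide every value by the total so the result sums to 1.

    This illustrates one plausible rescaling; it is an assumption, not
    a description of MEME's internal computation.
    """
    total = math.fsum(priors)
    if total <= 0.0:
        raise ValueError("cannot rescale priors that sum to 0")
    return [p / total for p in priors]
```

An empty list from `check_priors` means the entry satisfies the three stated constraints. Rescaling leaves the trailing zeros at zero, so the result still satisfies the width constraint.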
The format is arranged like this:
>ID WIDTH
PRIORS
An example of the PSP format is given below:
>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139 0.032963 0.086047 0.057445 0.000000 0.000000 0.000000
>LACB_BOVIN 4
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854 0.046499 0.000000 0.000000 0.000000
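As an illustration of the layout described in this section (a header line with name and width, followed by lines of prior values), here is a minimal Python sketch of a reader. The function name `parse_psp` is hypothetical and this is not MEME's own parser; it assumes each header sits on its own line and that the prior values may be spread over any number of following lines.

```python
def parse_psp(text):
    """Parse PSP-format text into {sequence_name: (width, [priors])}."""
    entries = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) < 2:
                raise ValueError("header needs a name and a width: " + line)
            # Any text after the name and width is ignored.
            name, width = fields[0], int(fields[1])
            entries[name] = (width, [])
        elif name is not None:
            # Prior values for the current entry, one per sequence position.
            entries[name][1].extend(float(tok) for tok in line.split())
        else:
            raise ValueError("prior values found before any header line")
    return entries


sample = """\
>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139
0.032963 0.086047 0.057445 0.000000 0.000000 0.000000
>LACB_BOVIN 4 extra header text
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854 0.046499
0.000000 0.000000 0.000000
"""

psp = parse_psp(sample)
```

Each entry then maps a sequence name to its width and a list holding one prior per sequence position, so the list length can be compared against the length of the matching FASTA sequence.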
The psp-gen tool can be used to generate position-specific priors when supplied with a discriminative sequence dataset.