MEME Position Specific Priors Format

Description

These priors allow the user to bias the search for motifs by MEME. They give a position-specific prior distribution on the location of motif sites in sequence(s) in the input dataset.

Format Specification

The MEME Position Specific Priors (PSP) format includes the name of the sequence to which each prior distribution corresponds. Sequences not named in the PSP file are given uniform prior distributions on site locations by MEME.

A PSP must be created for a specific width of motif. This width must be specified for each entry in the PSP file, and must be the same for all entries. If MEME varies the motif width during computation, MEME renormalizes the PSP for each sequence. For motif widths larger than that of the prior, the renormalized prior of a site is the geometric mean of the original priors of all sites that it completely contains, normalized to sum to Pr(site). For motif widths smaller than that of the prior, the renormalized prior of a site is just the original prior, normalized to sum to Pr(>0 sites). MEME computes Pr(>0 sites) for a sequence as the sum of the priors of all potential sites in the sequence; it is 1 for the OOPS model and ≤ 1 for the ZOOPS model.
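
The renormalization described above can be sketched in Python. This is only an illustrative reading of the prose, not MEME's actual code: the function names are hypothetical, and the sketch assumes that in both cases the result is normalized to the sum of the original priors for the sequence.

```python
import math

def geometric_mean(values):
    """Geometric mean of non-negative values; 0 if any value is 0."""
    if any(v <= 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def renormalize_psp(priors, width, new_width):
    """Rescale a prior built for `width` to `new_width` (hypothetical helper).

    `priors` holds one value per sequence position; the last width - 1 are 0.
    Assumption: the output sums to the same total as the input.
    """
    length = len(priors)
    total = sum(priors)             # Pr(>0 sites) for this sequence
    n_new = length - new_width + 1  # valid start positions at the new width
    if n_new <= 0:
        raise ValueError("motif is wider than the sequence")

    if new_width > width:
        # A wider site starting at i completely contains the original sites
        # starting at i, i + 1, ..., i + (new_width - width).
        span = new_width - width + 1
        raw = [geometric_mean(priors[i:i + span]) for i in range(n_new)]
    else:
        # Same or narrower width: keep the original prior at each valid start.
        raw = list(priors[:n_new])

    raw_sum = sum(raw)
    if raw_sum == 0:
        return [0.0] * length
    scale = total / raw_sum
    # The last new_width - 1 positions cannot start a site of the new width.
    return [r * scale for r in raw] + [0.0] * (new_width - 1)

# A width-4 prior on a 7-residue sequence, rescaled to widths 5 and 3.
original = [0.1, 0.2, 0.4, 0.1, 0.0, 0.0, 0.0]
wider = renormalize_psp(original, 4, 5)
narrower = renormalize_psp(original, 4, 3)
```

In this sketch the wider prior has three non-zero entries followed by four zeros, and the narrower one keeps the original values unchanged because they already sum to the original total.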

MEME PSP format is similar to FASTA format. Each entry should start with a header line consisting of a sequence name (ID) followed by the width (WIDTH) of the prior. The sequence name must match the name of a sequence in the FASTA file input to MEME. Any other text on the header line after the name and width is ignored by MEME.

The following lines (PRIORS) contain one number for each position in the identically-named FASTA sequence, where the number gives the prior probability of a motif site at that position in the sequence (or in the reverse complement if -revcomp is specified). All numbers for a PRIORS entry must be in the range (0,1] except for the last w - 1 numbers, which should be 0 (the trailing 0.000000 values in the example), since a motif of that width cannot start in those positions. The numbers in a PRIORS entry must sum to a number no greater than 1. If they sum to less than 1 and MEME is run with the -mod oops switch, MEME will rescale the numbers so that they sum to 1. With the -mod zoops switch, MEME does not rescale the numbers to sum to 1; a very small sum represents a prior belief that the sequence contains no motif sites. The -mod anr switch to MEME currently does not allow PSPs.
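
The constraints on a PRIORS entry can be checked before running MEME. The following is a minimal sketch, not part of the MEME Suite: the function names and the rounding tolerance `tol` are assumptions made for illustration, and the rescaling function only mirrors what the text says MEME does under -mod oops.

```python
def check_priors(priors, width, tol=1e-6):
    """Return a list of problems found in one PRIORS entry (empty if none).

    `tol` is an assumed allowance for rounding in the printed values.
    """
    if width < 1 or width > len(priors):
        return [f"width {width} does not fit a sequence of length {len(priors)}"]

    problems = []
    n_starts = len(priors) - width + 1  # positions where a site can start

    # Every possible start position needs a prior in (0, 1].
    for i, p in enumerate(priors[:n_starts]):
        if not (0.0 < p <= 1.0):
            problems.append(f"position {i + 1}: {p} is outside (0, 1]")

    # The last width - 1 positions cannot start a site and should be 0.
    for i, p in enumerate(priors[n_starts:], start=n_starts):
        if p != 0.0:
            problems.append(f"position {i + 1}: expected 0, found {p}")

    if sum(priors) > 1.0 + tol:
        problems.append(f"priors sum to {sum(priors)}, which exceeds 1")
    return problems

def rescale_for_oops(priors):
    """Rescale priors to sum to 1, as described for the OOPS model."""
    total = sum(priors)
    if total <= 0:
        raise ValueError("priors must have a positive sum")
    return [p / total for p in priors]

# The ICYA_MANSE entry from the example in this document (width 4).
icya = [0.075922, 0.070764, 0.082380, 0.030292, 0.025101, 0.043139,
        0.032963, 0.086047, 0.057445, 0.000000, 0.000000, 0.000000]
issues = check_priors(icya, 4)
oops_priors = rescale_for_oops(icya)
```

Under ZOOPS the values would be left as they are, so the sum itself carries information about how likely the sequence is to contain a site.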

The format of an entry in a PSP file is:

>ID WIDTH
PRIORS
      

An example of the PSP format is given below:

>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139 0.032963
0.086047 0.057445 0.000000 0.000000 0.000000

>LACB_BOVIN 4
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854 0.046499
0.000000 0.000000 0.000000
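
Because the format is FASTA-like, it is straightforward to read programmatically. The sketch below is a hypothetical reader written for illustration (it is not a MEME Suite tool); it assumes that priors are whitespace-separated and may span several lines, as in the example above.

```python
def parse_psp(text):
    """Parse PSP-formatted text into {sequence_name: (width, [priors])}."""
    entries = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines between entries carry no information
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) < 2:
                raise ValueError(f"header needs a name and a width: {line!r}")
            # Any text after the name and width is ignored.
            name, width = fields[0], int(fields[1])
            entries[name] = (width, [])
        elif name is None:
            raise ValueError("priors found before any header line")
        else:
            entries[name][1].extend(float(token) for token in line.split())
    return entries

EXAMPLE = """\
>ICYA_MANSE 4
0.075922 0.070764 0.082380 0.030292 0.025101 0.043139 0.032963
0.086047 0.057445 0.000000 0.000000 0.000000

>LACB_BOVIN 4
0.107099 0.099822 0.116208 0.042731 0.035408 0.060854 0.046499
0.000000 0.000000 0.000000
"""

entries = parse_psp(EXAMPLE)
for seq_name, (width, priors) in entries.items():
    print(seq_name, width, len(priors), round(sum(priors), 6))
```

For the example entries this reads 12 priors for ICYA_MANSE and 10 for LACB_BOVIN, each ending in three zeros for a width of 4.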
      

See Also

The psp-gen tool can be used to generate position-specific priors when supplied with two datasets of sequences: "primary" and "control".

Notes:

Custom priors can also be built by hand, for example to favor including certain protein letters in motifs. To maximize their effect when used with MEME, a few points should be kept in mind:

  1. The width of the prior should be as close as possible to the expected width of the motifs to be discovered. The effect of the prior drops exponentially for motifs wider than the prior width. Motifs shorter than the prior will have a very strong tendency to have sites aligned with the maxima of the prior.
  2. The ratio of the largest to smallest prior values for a sequence should be large. If the size of the largest entry in PRIORS for a sequence is close to that of the smallest entry, the prior will have little effect when used with MEME.
  3. The sum of the values in PRIORS should be as large as possible when used with the ZOOPS model (MEME's default). Unless you have reason to believe that one sequence is more likely to contain motifs than another, you will want to make the sum of the values in PRIORS as large as possible for every sequence.
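
As an illustration of these points, the sketch below builds a hand-made entry that favors sites containing chosen protein letters. Everything in it is an assumption made for the example: the function name, the linear weighting scheme, the `boost` factor that controls the ratio of largest to smallest prior, and the default `total` of 0.999, which keeps the sum close to 1 while leaving room for rounding in the printed values.

```python
def letter_biased_entry(name, sequence, width, favored, boost=100.0, total=0.999):
    """Build one PSP entry whose priors favor sites rich in `favored` letters."""
    n_starts = len(sequence) - width + 1
    if n_starts < 1:
        raise ValueError("sequence is shorter than the prior width")

    # Weight each start position by the number of favored letters in its
    # window. The +1 keeps every weight positive, so all priors are > 0.
    weights = []
    for i in range(n_starts):
        window = sequence[i:i + width]
        hits = sum(ch in favored for ch in window)
        weights.append(1.0 + boost * hits)

    # Scale so the priors sum to `total`, then pad with width - 1 zeros
    # for the positions where a site of this width cannot start.
    scale = total / sum(weights)
    priors = [w * scale for w in weights] + [0.0] * (width - 1)

    header = f">{name} {width}"
    body = " ".join(f"{p:.10f}" for p in priors)
    return header + "\n" + body

# Hypothetical sequence name and residues, favoring tryptophan (W).
entry = letter_biased_entry("SEQ1", "ACDWWACDEF", 4, "W")
print(entry)
```

A larger `boost` increases the contrast between favored and unfavored positions (point 2), and `width` should be chosen to match the expected motif width (point 1).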