Various MEME Suite programs require as input a file containing protein or DNA sequences. These input files must be in FASTA format.
Every entry consists of a sequence identifier (ID), an optional comment (COMMENT), and a sequence (SEQUENCE). The format looks like this:
>ID COMMENT
SEQUENCE
The special character ">" marks the beginning of a new sequence. The ">" character is followed immediately by the sequence identifier. The rest of that line is occupied by the optional comment. Subsequent lines contain the sequence itself.
Some rules about representing sequences:
Here is an example of three protein sequences in FASTA format:
>ICYA_MANSE
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKY
DGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVN
LVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVD
NVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
QSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKI
DALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALE
KFDKALKALPMHIRLSFNPTQLEEQCHI
>BBP_PIEBR
NVYHDGACPEVKPVDNFDWSNYHGKWWEVAKYPNSVEKYGKCGWAEYTPE
GKSVKVSNYHVIHGKEYFIEGTAYPVGDSKIGKIYHKLTYGGVTKENVFN
VLSTDNKNYIIGYYCKYDEDKKGHQDFVWVLSRSKVLTGEAKTAVENYLI
GSPVVDSQKLVYSDFSEAACKVN
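The layout described above (a ">" header line carrying the identifier and optional comment, followed by one or more sequence lines) can be read with a few lines of Python. This is only a minimal sketch, not the parser used by the MEME Suite: the function name parse_fasta is hypothetical, the comment text "example comment" in the sample is made up for illustration, and the sketch assumes that blank lines and whitespace inside sequence lines can be ignored.

```python
def parse_fasta(text):
    """Return a list of (identifier, comment, sequence) tuples.

    Hypothetical helper for illustration; not part of the MEME Suite.
    """
    records = []
    ident, comment, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if ident is not None:
                records.append((ident, comment, "".join(chunks)))
            header = line[1:].split(None, 1)
            ident = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        elif ident is not None:
            # Sequence data may span several lines; join them and
            # drop any embedded whitespace.
            chunks.append("".join(line.split()))
    if ident is not None:
        records.append((ident, comment, "".join(chunks)))
    return records


# Shortened excerpt of the example above; the comment is invented.
example = """>ICYA_MANSE example comment
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
LPLENENQGKCTIAEYKY
>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
"""

records = parse_fasta(example)
for ident, comment, sequence in records:
    print(ident, repr(comment), len(sequence))
```

Each tuple keeps the comment separate from the identifier, so a header with no comment simply yields an empty string.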
When running MEME, sequence weights may be specified in the dataset file by special header lines in which the sequence identifier is "WEIGHTS" (all caps) and the comment is a list of sequence weights.
Sequence weights are numbers in the range 0 < w ≤ 1. All weights are assigned in order to the sequences in the file. If there are more sequences than weights, the remainder are given weight one. Weights may be specified by more than one "WEIGHTS" entry which may appear anywhere in the file. When weights are used, sequences will contribute to motifs in proportion to their weights.
Here is an example for a file of three sequences where the first two sequences are very similar and it is desired to down-weight them:
>WEIGHTS 0.5 .5 1.0
>seq1
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
>seq2
GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
>seq3
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
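The assignment rule for weights can be sketched in Python as follows. This is an illustration under stated assumptions, not MEME's own code: the function name assign_weights is hypothetical, and the sketch assumes that weights from all "WEIGHTS" entries are pooled in file order before being matched to the sequences in file order, with any sequences left over receiving weight one.

```python
def assign_weights(text):
    """Return a list of (identifier, weight) pairs for a FASTA text.

    Hypothetical helper for illustration; not part of the MEME Suite.
    """
    weights = []
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(">"):
            continue  # only header lines matter for this sketch
        fields = line[1:].split()
        if not fields:
            continue
        if fields[0] == "WEIGHTS":
            # Assumption: several WEIGHTS entries are pooled in file order.
            weights.extend(float(w) for w in fields[1:])
        else:
            names.append(fields[0])

    for w in weights:
        if not 0 < w <= 1:
            raise ValueError(f"weight {w} is outside the range 0 < w <= 1")

    # Sequences beyond the end of the weight list get weight one.
    padded = weights[:len(names)] + [1.0] * (len(names) - len(weights))
    return list(zip(names, padded))


weighted_example = """>WEIGHTS 0.5 .5 1.0
>seq1
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
>seq2
GDMFCPGYCPDVKPVGDFDLSAFAGAWHELAK
>seq3
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
"""

pairs = assign_weights(weighted_example)
print(pairs)
```

For the three-sequence file above, seq1 and seq2 each receive weight 0.5 and seq3 receives weight 1.0, so the two near-duplicate sequences together count about as much as the third.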