MAST -- Motif Alignment and Search Tool
Motif search tool
The input to MAST contains the following fields.
- e-mail address
Enter the e-mail address where you want MAST to send the confirmation message and your search results. Be sure that this is a valid e-mail address.
- description
This will be included in the subject lines of the messages MAST will send you. This field is optional so you may leave it blank if you wish.
- motifs
This is the name of a file on your computer that contains one or more motifs that characterize a group of related sequences. It can be a file containing the results of a MEME analysis, one or more profiles in GCG format, or motifs from any other source, as long as they are in the correct format. You can create a motif file by saving the e-mail message MEME sends you to a file. How you save an e-mail message to a file depends on the e-mail program you use; consult your system manager if you need help. MAST provides a "browse" button to help you locate the motif file you wish to use. If you use profiles, the gap opening and extension penalties will be ignored.
- sequence database
This is the name of the sequence database you wish to search for sequences containing your motifs. You may either
- select a MAST database from a pull-down menu of available databases, or,
- specify the name of a FASTA file of sequences on your computer that MAST will upload and search.
Note 1: If you specify a file on your computer, the selection in the pull-down menu will be ignored.
Note 2: Make sure that you specify the right kind of database for your motifs. MAST can only search protein databases with protein motifs. It can search DNA databases with DNA motifs, or with protein motifs if you select that option in the "DNA-ONLY OPTIONS" section.
- FASTA format
If you specify a sequence file to be uploaded and searched by MAST, it must be in FASTA format. The sequence(s) it contains must use either the MEME protein or MEME DNA alphabet.
FASTA sequences start with a header line followed by sequence lines. A header line has the character ">" in position one, followed by a unique name without any spaces, followed by (optional) descriptive text. After the header line come the actual sequence lines. Spaces and blank lines are ignored. Sequences may be in uppercase, lowercase, or a mixture of both. Here is an example of FASTA format:
>ICYA_MANSE INSECTICYANIN A FORM (BLUE BILIPROTEIN)
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
LPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDA
>LACB_BOVIN BETA-LACTOGLOBULIN PRECURSOR (BETA-LG)
MKCLLLALALTCGAQALIVTQTMKGLDI
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
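The FASTA rules described above can be illustrated with a minimal Python sketch. This is not MAST's own parser; the function name `parse_fasta` is hypothetical, and converting sequences to uppercase is an assumption made here for convenience (the text only says that both cases are accepted).

```python
def parse_fasta(text):
    """Return a list of (name, description, sequence) tuples.

    Illustrative sketch only, not MAST's parser.
    """
    records = []
    name, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if name is not None:
                records.append((name, desc, "".join(chunks)))
            # The name is the first word; the rest is optional description.
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Drop embedded spaces; uppercasing is an assumption of this sketch.
            chunks.append("".join(line.split()).upper())
    if name is not None:
        records.append((name, desc, "".join(chunks)))
    return records


example = """>ICYA_MANSE INSECTICYANIN A FORM (BLUE BILIPROTEIN)
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAK
LPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDA
>LACB_BOVIN BETA-LACTOGLOBULIN PRECURSOR (BETA-LG)
MKCLLLALALTCGAQALIVTQTMKGLDI
QKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKW
"""
records = parse_fasta(example)
for name, desc, seq in records:
    print(name, len(seq))
```

Running a check like this on a file before uploading it is a quick way to confirm that every sequence has a header and a unique name.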
- treatment of reverse complement strands
MAST can automatically generate the reverse complement strand for each nucleotide sequence in the database and treat it in one of three ways. ("Given strand" refers to the sequence as it appears in the database MAST is searching.)
- combine with given strand
MAST searches for motif occurrences on both the given strand and its reverse complement together, not allowing occurrences on the two strands to overlap each other, and displays them as a single sequence. This allows motifs to occur on either strand and still count toward the overall E-value of the match.
- treat as separate sequence
MAST searches for motifs in both the given strand and its reverse complement, treating them as two independent sequences. The results are displayed separately for the two strands, as though both had occurred in the database.
- none
MAST searches only the given strand of each sequence in the database.
Note: this field has no effect when the database contains protein sequences.
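As a rough illustration of the three strand options, the sketch below builds the list of sequences that a search would have to scan in each case. It is not MAST's implementation: the mode labels `"combine"`, `"separate"` and `"none"`, the `+`/`-` name suffixes, and the function names are all hypothetical, and only the letters A, C, G and T are complemented (any other character is left unchanged).

```python
# Complement table for the four unambiguous nucleotides (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def strands_to_search(name, seq, mode):
    """Return what would be scanned for one database entry.

    mode is a hypothetical label for the three options described above.
    """
    if mode == "none":
        return [(name, seq)]  # given strand only
    rc = reverse_complement(seq)
    if mode == "separate":
        # Two independent sequences, reported separately.
        return [(name + "+", seq), (name + "-", rc)]
    if mode == "combine":
        # One entry whose two strands are scanned together.
        return [(name, (seq, rc))]
    raise ValueError("unknown mode: " + mode)


print(reverse_complement("ATGC"))
print(strands_to_search("seq1", "ATGC", "separate"))
```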
- use individual sequence composition in E- and p-value calculation
This option can improve search selectivity when erroneous matches are due to biased sequence composition. MAST normally computes E-values and p-values using a random sequence model based on the overall letter composition of the database being searched. Selecting this option will cause MAST to use a different random model for each target sequence. The random model for each target sequence will be based on its letter composition, not that of the entire database. Using this option will tend to give more accurate E-values and increase the E-values of compositionally biased sequences. This option may increase search times substantially if used in conjunction with E-value display thresholds over 10, since MAST must compute a new set of motif score distributions for each high-scoring sequence.
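The difference between the two random models comes down to which letter frequencies are used as the background. The sketch below only computes those frequencies; how MAST turns them into score distributions is not covered here. The function name `letter_frequencies` and the toy database are assumptions made for illustration.

```python
from collections import Counter


def letter_frequencies(sequences):
    """Relative frequency of each letter across the given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts.items()}


# Toy database: the second sequence is strongly A-rich.
database = ["ACGTACGT", "AAAAAAAT"]

# Default behaviour: one model from the whole database.
db_freqs = letter_frequencies(database)

# With the option selected: one model per target sequence.
per_seq = [letter_frequencies([seq]) for seq in database]

print(db_freqs["A"])
print(per_seq[1]["A"])
```

In this toy case the A-rich sequence has a background frequency for A well above the database-wide value, which is why chance matches to an A-rich motif become less surprising under the per-sequence model.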
- ignore motifs with high E-values
MAST can ignore motifs in the query with E-values above a threshold you select. This is desirable because motifs with high E-values are unlikely to be biologically significant. The default threshold will cause MAST to use all motifs in the query, regardless of their E-values.
Note: This option is only available for motifs generated by MEME 3.0 and above.
- search nucleotide database with protein motifs
Choosing this option will cause MAST to search the nucleotide version of the selected sequence database, converting the nucleotide sequences to protein sequences in all six reading frames. By default, MAST searches the protein version of the selected database when you give it a file of protein motifs.
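Six-frame translation can be sketched as follows: three reading frames on the given strand and three on its reverse complement. The sketch assumes the standard genetic code and writes `X` for any codon containing a character other than A, C, G or T; neither choice is stated in the text above, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Frames 1-3 on the given strand, then 1-3 on the reverse complement."""
    dna = dna.upper()
    rc = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, rc)
            for offset in range(3)]


print(six_frames("ATGGCC"))
# → ['MA', 'W', 'G', 'GH', 'A', 'P']
```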
- scale motif display threshold by sequence length
MAST displays motifs that score above a threshold for all high-scoring sequences. By default, this threshold is based on the probability of the motifs without regard to the length of the sequence. The threshold was chosen with protein sequences of average length in mind. Consequently, many positions in very long sequences may match motifs with scores above this threshold by chance, making the results difficult to interpret. Selecting this option causes the motif display threshold to take sequence length into account. This will reduce the number of weak motifs displayed in long sequences and minimize the size of the output file.
- E-value display threshold
MAST displays only those sequences that match your query with E-values below the threshold you specify here. By default, sequences with E-values less than 10 are displayed. If your motifs are very short or have low information content (are not very specific), it may be impossible for any sequence to achieve a low E-value. If your MAST search returns no hits, you may wish to increase the E-value display threshold and repeat the search.
- rank of first match returned
In order to prevent excessively large results files that cannot be emailed, a maximum of 500 matching sequences is returned. By default, results for the 500 best-matching sequences are returned. If you wish to see further results, you can resubmit your query to MAST and specify a rank larger than 1. For example, specifying 501 will cause the 500 best-matching sequences to be omitted and the results will start with the 501st best-matching sequence.
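The effect of the rank setting can be pictured as taking a fixed-size window from the ranked list of hits. The sketch below is only an illustration of that arithmetic; `results_window` and the toy hit list are hypothetical names, not part of MAST.

```python
MAX_RESULTS = 500  # limit on returned sequences, as stated above


def results_window(ranked_hits, first_rank=1):
    """Return at most MAX_RESULTS hits starting at a 1-based rank."""
    if first_rank < 1:
        raise ValueError("rank must be 1 or greater")
    start = first_rank - 1  # convert 1-based rank to 0-based index
    return ranked_hits[start:start + MAX_RESULTS]


# Toy ranked list of 1200 hits, best first.
hits = ["seq%d" % rank for rank in range(1, 1201)]

first_page = results_window(hits)        # ranks 1 to 500
second_page = results_window(hits, 501)  # ranks 501 to 1000
last_page = results_window(hits, 1001)   # ranks 1001 to 1200

print(second_page[0], len(last_page))
```

To page through all results, resubmit with ranks 1, 501, 1001, and so on until fewer than 500 sequences come back.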
- text output format
Choosing this option will cause MAST to produce plain text (ASCII) output. By default, MAST output is in hypertext (HTML) format.
Clicking on the Start search button causes your motifs to be sent to the San Diego Supercomputer Center (SDSC) and used to search the database you selected.
The results of the MAST search are sent to you by e-mail.
No copies of your motifs or search results are saved at SDSC after the results have been sent to you.