Click on the menu at the left to see which of the following sequence input methods are available.
Type in sequences
When this option is available you may directly input multiple
sequences by typing them. Sequences must be input in
FASTA format.
Upload sequences
When this option is available you may upload a file containing
sequences in FASTA format.
Upload BED file
When this option is available you may upload a file containing
sequence coordinates in BED format.
Databases (select category)
When this option is available you may first select a category of
sequence database from the list below it. Two additional menus will then appear
where you can select the particular database and version desired, respectively.
The full list of available sequence databases and their descriptions
can be viewed HERE.
Submitted sequences
This option is only available when you have invoked the current
program by clicking on a button in the output report of a different MEME Suite program.
By selecting this option you will input the sequences sent by that program.
Specify a file to upload containing
sequence coordinates in BED format.
The file must be based on the exact genome version you specified in the
menus above.
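As a rough illustration of what such a file contains, the sketch below parses one BED line into a chromosome name and a coordinate pair. It relies only on the standard BED convention that the first three whitespace-separated fields are the chromosome, a 0-based start, and an end that is not included in the interval. The function name parse_bed_line is hypothetical and is not part of the MEME Suite.

```python
def parse_bed_line(line):
    """Parse one BED line into (chrom, start, end), or None for non-data lines.

    Hypothetical helper for illustration only. BED starts are 0-based and
    ends are exclusive, so the region length is end - start.
    """
    stripped = line.strip()
    # Skip blank lines, comments, and browser/track header lines.
    if not stripped or stripped.startswith(("#", "track", "browser")):
        return None
    fields = stripped.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("invalid coordinates: %d-%d" % (start, end))
    return chrom, start, end
```

Coordinates only make sense relative to one genome assembly, which is why the file must match the genome version chosen in the menus.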
Selecting this option will filter the sequence menu to only contain
databases that have additional information that is specific to a tissue
or cell line.
This option causes MEME Suite to use tissue/cell-specific information
(typically from DNase I or histone modification ChIP-seq data) encoded
as a position specific prior that
has been created by the MEME Suite create-priors
utility. You can see a description of the sequence databases
for which we provide tissue/cell-specific priors
here.
Note that you cannot upload or type in your own sequences
when tissue/cell-specific scanning is selected.
Enter the email address where you want the job notification email to
be sent. Please check that this is a valid email address!
The notification email will include a link to your job results.
Note: You can also access your jobs via the Recent Jobs
menu on the left of all MEME Suite input pages. That menu only
keeps track of jobs submitted during the current session of your internet browser.
Note: Most MEME Suite servers only store results for a couple of days,
so be sure to download any results you wish to keep.
STREME looks for motifs that are enriched in your sequences relative
to a control set of sequences.
By default STREME creates the control
set by shuffling each of your sequences, conserving k-mer
frequencies ("Shuffled input sequences"), where k=3 for
DNA and RNA sequences, and k=1 for protein or custom
alphabet sequences.
Alternatively, you may
provide a set of control sequences ("User-provided sequences").
IMPORTANT NOTE: If you provide control sequences,
they should have the same length distribution as
the primary sequences. For example, all sequences
in both sets could have the same length, or, for each sequence in
the primary set there could be exactly N sequences with the
same length as it in the control set.
Failure to ensure this may cause STREME to report inaccurate
estimates of the statistical significance (p-value)
of the motifs it finds.
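One way to check this before submitting is sketched below. It assumes the "exactly N" condition means that every sequence length occurs N times as often in the control set as in the primary set; that reading, and the function name control_multiplier, are assumptions made for illustration and are not part of STREME.

```python
from collections import Counter

def control_multiplier(primary, control):
    """Return N if each length occurs exactly N times as often in control
    as in primary; return None if the length distributions do not match.

    Hypothetical helper: sequences are plain strings without FASTA headers.
    """
    primary_counts = Counter(len(s) for s in primary)
    control_counts = Counter(len(s) for s in control)
    # Both sets must contain exactly the same sequence lengths.
    if not primary_counts or set(primary_counts) != set(control_counts):
        return None
    multipliers = set()
    for length, n_primary in primary_counts.items():
        n_control = control_counts[length]
        if n_control % n_primary != 0:
            return None
        multipliers.add(n_control // n_primary)
    # A single shared multiplier means the distributions have the same shape.
    return multipliers.pop() if len(multipliers) == 1 else None
```

A return value of None suggests the control set should be trimmed or resampled before it is used.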
Select a file of FASTA formatted
biological sequences or paste in FASTA formatted biological sequences to search for sequence motifs.
The more sequences that you can give STREME, the
more subtle the motifs it can find.
For ChIP-seq we recommend using sequences of length 100bp centered on
the summit or center of the peak. For CLIP-seq we recommend using
the actual peak regions.
The STREME webserver limits the total
length of the sequences to 10,000,000 (DNA and RNA) and 1,000,000
(protein and custom alphabets).
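The ChIP-seq recommendation and the size limits above can be sketched as follows. The function names, the clamping at chromosome ends, and the treatment of each limit as inclusive are assumptions made for illustration; only the 100bp width and the two limit values come from the text above.

```python
# Total-length limits quoted above, keyed by a hypothetical alphabet label.
LENGTH_LIMITS = {
    "dna": 10_000_000,
    "rna": 10_000_000,
    "protein": 1_000_000,
    "custom": 1_000_000,
}

def summit_window(summit, width=100, chrom_length=None):
    """Return a 0-based, end-exclusive (start, end) window centered on summit.

    The window is shifted inward when it would run past either end of the
    chromosome (an assumption; other tools may truncate instead).
    """
    start = max(0, summit - width // 2)
    end = start + width
    if chrom_length is not None and end > chrom_length:
        end = chrom_length
        start = max(0, end - width)
    return start, end

def within_limit(sequences, alphabet="dna"):
    """Check the summed sequence length against the quoted server limit."""
    return sum(len(s) for s in sequences) <= LENGTH_LIMITS[alphabet]
```

With 100bp windows, the DNA limit corresponds to roughly 100,000 peaks.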
Select a file of FASTA formatted
biological sequences or paste in FASTA formatted biological sequences to use as controls in the
search for motifs.
Your control sequences should have approximately the same length
distribution and background frequencies as your primary sequences,
but should be unlikely to contain the motifs that you are attempting to find.
The more control sequences that you can give STREME, the
more subtle the motifs it can find.
The STREME webserver limits the total
length of the control sequences to 10,000,000 (DNA and RNA) and 1,000,000
(protein and custom alphabets).
This is the width (number of characters in the sequence pattern) of a
single motif. STREME chooses the optimal width of each motif individually
using a heuristic function. You can choose limits
for the minimum and maximum motif widths that STREME will consider. The
width of each motif that STREME reports will lie within the limits you
choose.
If you do not specify a set of control sequences, STREME will
create one by shuffling each primary sequence while preserving
the frequencies of all words of length order+1 that it contains.
STREME also creates a Markov model of the given order from the control sequences
that you provide, or from the shuffled primary sequences.
Check this box and set the value of order if you want to override
the default value that STREME uses.
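To make the role of order concrete, the sketch below estimates a Markov model of a given order from a set of sequences by counting words of length order+1. It is a plain maximum-likelihood estimate with no pseudocounts, written for illustration only; the function name is hypothetical and STREME's own estimation may differ in detail.

```python
from collections import Counter, defaultdict

def markov_model(sequences, order):
    """Estimate P(next letter | previous `order` letters) from sequences.

    Hypothetical helper. Returns {context: {letter: probability}}, where
    each context is a string of length `order` (empty when order is 0).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Every word of length order+1 contributes one (context, letter) count.
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, letter_counts in counts.items():
        total = sum(letter_counts.values())
        model[context] = {a: n / total for a, n in letter_counts.items()}
    return model
```

With order 0 the model reduces to single-letter frequencies. Order 2 conditions each letter on the two before it, which matches the k=3 word length preserved by default shuffling of DNA and RNA.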
STREME stops looking for motifs when one of these limits is met. There
is an additional time limit which is set by the server operator.
p-value threshold
The probability of a motif being found that would discriminate
the primary sequences from the control sequences at least as well,
assuming that the letters in the primary sequences were randomly shuffled.
STREME stops when 3 motifs have been found whose p-values exceed
this threshold.
Number of motifs
Check this box and set the (maximum) number of motifs
you want STREME to find before it stops. STREME will ignore
the p-value threshold if this box is checked.
By default STREME does not limit the number of motifs to be found,
but uses the p-value threshold as its stopping criterion.
For the site positional distribution diagrams, align the sequences
on their left ends, on their centers, or on their right ends.
For visualizing motif distributions, center alignment is
ideal for ChIP-seq and similar data; right alignment
for sequences upstream of transcription start sites; left
alignment for many proteins or 3' UTR sequences.
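The effect of the three alignment choices on a site's plotted position can be sketched as below. The coordinate convention (a 0-based site start, measured relative to the chosen anchor) and the function name are assumptions for illustration; the diagrams STREME produces may use a different convention.

```python
def aligned_position(site_start, seq_length, align="center"):
    """Return a site's position relative to the chosen alignment anchor.

    Hypothetical helper: "left" measures from the sequence start, "center"
    from the sequence midpoint, and "right" from the sequence end.
    """
    if align == "left":
        return site_start
    if align == "center":
        return site_start - seq_length / 2
    if align == "right":
        return site_start - seq_length
    raise ValueError("align must be 'left', 'center' or 'right'")
```

Under right alignment all positions are negative, which suits sequences that end at a transcription start site.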