Click on the menu at the left to see which of the following sequence input methods are available.
Type in sequences
When this option is available you may directly input multiple
sequences by typing them. Sequences must be input in
FASTA format.
Upload sequences
When this option is available you may upload a file containing
sequences in FASTA format.
Upload BED file
When this option is available you may upload a file containing
sequence coordinates in BED format.
Databases (select category)
When this option is available you may first select a category of
sequence database from the list below it. Two additional menus will then appear
where you can select the particular database and version desired, respectively.
The full list of available sequence databases and their descriptions
can be viewed HERE.
Submitted sequences
This option is only available when you have invoked the current
program by clicking on a button in the output report of a different MEME Suite program.
By selecting this option you will input the sequences sent by that program.
Specify a file to upload containing
sequence coordinates in BED format.
The file must be based on the exact genome version you specified in the
menus above.
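As an illustration of what a BED upload contains, the following minimal sketch reads the first three columns of BED text (chromosome, 0-based start, exclusive end). The names `BedInterval` and `parse_bed` are hypothetical helpers for this example, not part of the MEME Suite.

```python
from typing import List, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def parse_bed(text: str) -> List[BedInterval]:
    """Parse the first three columns of BED text, skipping header lines."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser headers.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval: {line!r}")
        intervals.append(BedInterval(chrom, start, end))
    return intervals


example = "chr1\t1000\t1100\tpeak1\nchr2\t500\t650\tpeak2\n"
for iv in parse_bed(example):
    # Print chromosome, start, end, and region length.
    print(iv.chrom, iv.start, iv.end, iv.end - iv.start)
```

The coordinates are only meaningful against the genome version they were computed on, which is why the file must match the version selected in the menus.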
Selecting this option will filter the sequence menu to only contain
databases that have additional information that is specific to a tissue
or cell line.
This option causes MEME Suite to use tissue/cell-specific information
(typically from DNase I or histone modification ChIP-seq data) encoded
as a position specific prior that
has been created by the MEME Suite create-priors
utility. You can see a description of the sequence databases
for which we provide tissue/cell-specific priors
here.
Note that you cannot upload or type in your own sequences
when tissue/cell-specific scanning is selected.
Enter the email address where you want the job notification email to
be sent. Please check that this is a valid email address!
The notification email will include a link to your job results.
Note: You can also access your jobs via the Recent Jobs
menu on the left of all MEME Suite input pages. That menu only
keeps track of jobs submitted during the current session of your internet browser.
Note: Most MEME Suite servers only store results for a couple of days.
So be sure to download any results you wish to keep.
STREME looks for motifs that are enriched in your sequences relative
to a control set of sequences.
By default STREME creates the control
set by shuffling each of your sequences, conserving k-mer
frequencies ("Shuffled input sequences"), where k=3 for
DNA and RNA sequences, and k=1 for protein or custom
alphabet sequences.
Alternatively, you may
provide a set of control sequences ("User-provided sequences").
IMPORTANT NOTE: If you provide control sequences,
they should have the same length distribution as
the primary sequences. For example, all sequences
in both sets could have the same length, or, for each sequence in
the primary set there could be exactly N sequences with the
same length as it in the control set.
Failure to ensure this may cause STREME to report inaccurate
estimates of the statistical significance (p-value)
of the motifs it finds.
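One way to check this before submitting is to compare the sequence lengths in the two FASTA files. The sketch below is only an illustration: `read_fasta_lengths` and `same_length_distribution` are hypothetical helpers, and the test is strict (each length must make up exactly the same fraction of both sets), which is an assumption about what "same length distribution" means here.

```python
from collections import Counter
from typing import Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def same_length_distribution(primary: List[int], control: List[int]) -> bool:
    """True if each length makes up the same fraction of both sets."""
    if not primary or not control:
        return False
    p, c = Counter(primary), Counter(control)
    if p.keys() != c.keys():
        return False
    # Compare fractions by integer cross-multiplication to avoid rounding.
    return all(p[n] * len(control) == c[n] * len(primary) for n in p)
```

With this definition, a control set holding exactly N sequences of matching length for every primary sequence passes the check for any N.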
Select a file of FASTA formatted
biological sequences or paste in FASTA formatted biological sequences to search for sequence motifs.
The more sequences that you can give STREME, the
more subtle the motifs it can find.
For ChIP-seq we recommend using sequences of length 100bp centered on
the summit or center of the peak. For CLIP-seq we recommend using
the actual peak regions.
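For ChIP-seq peaks, the 100bp window can be computed from the summit position as in the sketch below. The function name `summit_window` is hypothetical, coordinates are assumed to be 0-based and half-open, and shifting the window at chromosome edges is a choice made for this example rather than a STREME recommendation.

```python
from typing import Optional, Tuple


def summit_window(summit: int, width: int = 100,
                  chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Return a 0-based, half-open (start, end) window centered on summit."""
    start = max(0, summit - width // 2)
    end = start + width
    if chrom_length is not None and end > chrom_length:
        # Shift the window left so it stays inside the chromosome.
        end = chrom_length
        start = max(0, end - width)
    return start, end


print(summit_window(1050))                     # (1000, 1100)
print(summit_window(20))                       # (0, 100)
print(summit_window(980, chrom_length=1000))   # (900, 1000)
```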
The STREME webserver limits the total
length of the sequences to 10,000,000 (DNA and RNA) and 1,000,000
(protein and custom alphabets).
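To check a file against these limits before uploading, you can sum the lengths of all sequence lines. This is a rough sketch: the helper names are hypothetical, and treating the limit as inclusive is an assumption.

```python
from typing import Iterable

# Limits quoted in the text above for the STREME web server.
MAX_TOTAL_LENGTH = {"DNA": 10_000_000, "RNA": 10_000_000,
                    "protein": 1_000_000, "custom": 1_000_000}


def total_sequence_length(lines: Iterable[str]) -> int:
    """Sum the characters on all non-header lines of FASTA input."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def within_limit(lines: Iterable[str], alphabet: str = "DNA") -> bool:
    """True if the total length does not exceed the limit (assumed inclusive)."""
    return total_sequence_length(lines) <= MAX_TOTAL_LENGTH[alphabet]
```

For a file on disk, pass the open file object directly, for example `within_limit(open("peaks.fa"), "DNA")`, where `peaks.fa` stands for your own file.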
Select a file of FASTA formatted
biological sequences or paste in FASTA formatted biological sequences to use as controls in the
search for motifs.
Your control sequences should have approximately the same length
distribution and background frequencies as your primary sequences,
but should not contain the motifs that you are attempting to find.
The more control sequences that you can give STREME, the
more subtle the motifs it can find.
The STREME webserver limits the total
length of the control sequences to 10,000,000 (DNA and RNA) and 1,000,000
(protein and custom alphabets).
This is the width (number of characters in the sequence pattern) of a
single motif. STREME chooses the optimal width of each motif individually
using a heuristic function. You can choose limits
for the minimum and maximum motif widths that STREME will consider. The
width of each motif that STREME reports will lie within the limits you
choose.
The background model normalizes for biased distribution of
letters and groups of letters in your sequences.
A 0-order model adjusts for single letter biases (e.g., GC content in DNA sequences),
a 1-order model adjusts for dimer biases (e.g., CpG dinucleotide content in DNA sequences), etc.
By default STREME will determine the background Markov model from
the control sequences (or from the primary sequences if you do not provide
control sequences). The order of the background model depends
on the sequence alphabet, but you can also set it manually (see option "What Markov order...", below).
Alternatively, you may select "Upload background model" and input a file containing
a background model.
The downloadable version of the MEME Suite contains a program named
"fasta-get-markov" that you can use to create background model files in
the correct format from FASTA sequence files.
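To make the idea of model order concrete, the sketch below counts the relative frequencies of words of length order + 1 in a set of sequences. It is an illustration only: `markov_frequencies` is a hypothetical helper, it does not write the file format produced by "fasta-get-markov", and it applies no pseudocounts or strand handling.

```python
from collections import Counter
from typing import Dict, List


def markov_frequencies(sequences: List[str], order: int) -> Dict[str, float]:
    """Relative frequencies of all observed words of length order + 1."""
    k = order + 1
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in sorted(counts.items())}


# Order 0 gives single-letter frequencies; order 1 gives dimer frequencies.
print(markov_frequencies(["ACGT", "AACC"], 0))
print(markov_frequencies(["ACGT", "AACC"], 1))
```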
Specify the order (m) for the background model and sequence shuffling.
By default, STREME uses m=2 for DNA and RNA sequences, and m=0 for
protein or custom alphabet sequences.
Check this box and set the value of m if you want to override
the default value of m that STREME uses.
If you upload a background model (see option above), STREME will only
use the m-order portion of that model.
If you do not upload a background model,
STREME will create an order-m model
from the control sequences that you provide, or from the shuffled primary sequences
if you don't provide control sequences.
If you do not specify a set of control sequences, STREME will
create one by shuffling each primary sequence while preserving
the frequencies of all words of length k that it contains,
where k=m+1.
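The relationship k=m+1 can be checked on any pair of sequences by comparing their word counts. The sketch below does not perform the shuffle itself; it only verifies whether a candidate shuffled sequence preserves the counts of all words of length k. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def preserves_kmers(original: str, shuffled: str, m: int) -> bool:
    """True if both sequences have identical counts of words of length k = m + 1."""
    k = m + 1
    return kmer_counts(original, k) == kmer_counts(shuffled, k)


# "ACAGAT" keeps the letter and dimer counts of "AGACAT" but not its trimers.
print(preserves_kmers("AGACAT", "ACAGAT", 0))  # True
print(preserves_kmers("AGACAT", "ACAGAT", 1))  # True
print(preserves_kmers("AGACAT", "ACAGAT", 2))  # False
```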
If this option is checked, STREME will NOT trim the control sequences
even if their average length exceeds that of the primary sequences.
This will cause STREME to use the (less accurate) Binomial test rather
than the Fisher Exact test if the control sequences are longer (on average)
than the primary sequences.
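For readers unfamiliar with the two tests, the sketch below computes a one-sided Fisher Exact p-value and a one-sided Binomial p-value from counts of sequences that contain a motif. It is not STREME's implementation. The parameter `p0` (the probability that a matching sequence falls in the primary set under the null hypothesis) is hypothetical; how STREME derives it when the control sequences are longer is not described here.

```python
from math import comb


def fisher_exact_greater(pos_hits: int, pos_total: int,
                         neg_hits: int, neg_total: int) -> float:
    """One-sided Fisher Exact p-value for enrichment in the primary set."""
    hits = pos_hits + neg_hits
    total = pos_total + neg_total
    upper = min(hits, pos_total)
    # Sum hypergeometric probabilities of seeing pos_hits or more primary hits.
    numerator = sum(comb(hits, x) * comb(total - hits, pos_total - x)
                    for x in range(pos_hits, upper + 1))
    return numerator / comb(total, pos_total)


def binomial_greater(pos_hits: int, hits: int, p0: float) -> float:
    """P(X >= pos_hits) for X ~ Binomial(hits, p0); p0 is an assumed input."""
    return sum(comb(hits, x) * p0 ** x * (1 - p0) ** (hits - x)
               for x in range(pos_hits, hits + 1))


# Motif present in 3 of 3 primary and 0 of 3 control sequences.
print(fisher_exact_greater(3, 3, 0, 3))  # 0.05
print(binomial_greater(3, 3, 0.5))       # 0.125
```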
STREME stops looking for motifs when one of the limits below is met. There
is an additional time limit which is set by the server operator.
p-value threshold
The probability of a motif being found that would discriminate
the primary sequences from the control sequences at least as well,
assuming that the letters in the primary sequences were randomly shuffled.
STREME stops when 3 motifs have been found whose p-values exceed
this threshold. If STREME is unable to estimate the p-values of motifs, it will
stop when 5 motifs have been found.
Number of motifs
Check this box and set the (maximum) number of motifs
you want STREME to find before it stops. STREME will ignore
the p-value threshold if this box is checked.
By default STREME does not limit the number of motifs to be found,
but uses the p-value threshold as its stopping criterion.
For the site positional distribution diagrams, align the sequences
on their left ends, on their centers, or on their right ends.
For visualizing motif distributions, center alignment is
ideal for ChIP-seq and similar data; right alignment
for sequences upstream of transcription start sites; left
alignment for many proteins or 3' UTR sequences.
When this option is selected, if the FASTA sequence header of an input
sequence contains genomic coordinates in UCSC or Galaxy format, the discovered motif sites
will be output in genomic coordinates. If the sequence header does
not contain valid coordinates, the sites will be output with
the start of the sequences as position 1.
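The conversion can be pictured with the sketch below, which handles only headers of the assumed form ">chr1:1000-2000". The helper names are hypothetical, the Galaxy header format is not covered, and the sketch assumes that the start coordinate in the header is the genomic position of the first letter of the sequence; the exact convention used by the MEME Suite may differ.

```python
import re
from typing import Optional, Tuple

# Assumed header shape: ">chr1:1000-2000", optionally followed by other text.
_COORD_RE = re.compile(r"^>?(\S+?):(\d+)-(\d+)")


def parse_header(header: str) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) if the header holds coordinates, else None."""
    match = _COORD_RE.match(header.strip())
    if match is None:
        return None
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return (chrom, start, end) if start <= end else None


def site_position(header: str, offset: int) -> Tuple[str, int]:
    """Map a 1-based offset within a sequence to (name, position)."""
    coords = parse_header(header)
    if coords is None:
        # No valid coordinates: report relative to the sequence start (position 1).
        parts = header.strip().lstrip(">").split()
        return (parts[0] if parts else ""), offset
    chrom, start, _ = coords
    return chrom, start + offset - 1


print(site_position(">chr1:1000-2000", 25))
print(site_position(">seq1 some description", 25))
```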