Click on the menu at the left to see which of the following motif input methods are available.
Type in motifs
When this option is available you may directly input multiple motifs
by typing them (or using "cut-and-paste").
First select the desired motif alphabet using the menu
immediately to the left. If you select the "Custom" option then
you must provide an alphabet definition
in the file input that immediately follows. Warning: custom alphabets are case-sensitive.
You may optionally give each motif an identifier and alternate name by
inputting a line like >Identifier Alternate-Name preceding the motif.
You can then enter each motif as a matrix, as sequence sites or as a regular expression.
You can enter multiple motifs by typing an empty line after each motif.
Individual motifs will be shown in square brackets, and errors in your
motifs will be highlighted in red while warnings will be highlighted in
yellow.
Mouse-over individual motifs to display their sequence logos.
View the examples for more information on what is possible.
Upload motifs
When this option is available you may upload a file containing
motifs in MEME motif format. This includes the outputs generated by MEME
and DREME, as well as files you create using the
motif conversion scripts
or manually following the
MEME motif format guidelines.
Databases (select category)
When this option is available you can select the category of
motif database desired from the list below it. Then select the motif
database from the displayed list.
Consult the
motif database documentation
for descriptions of all the motif databases present on this MEME Suite server.
Submitted motifs
This option is only available when you have invoked the current
program by clicking on a button in the output report of a different MEME Suite program.
By selecting this option you will input the motifs sent by that program.
You may input both probability and count matrices of either orientation
and the rules described below will be used to convert the matrix into a
MEME formatted motif.
Alphabet Order
The counts/probabilities are expected to be ordered based on the
alphabetical ordering of their codes. So DNA is ordered ACGT and
protein is ordered ACDEFGHIKLMNPQRSTVWY. For custom alphabets the ordering
goes uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9) and
finally the symbols '*', '-' and '.'.
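As a rough illustration of the ordering rule above, the following Python sketch sorts single-character symbols into the described order. The names symbol_rank and order_symbols are hypothetical and are not part of the MEME Suite; rejecting any symbol outside the listed groups is a simplifying assumption.

```python
import string

# Order described above: A-Z, then a-z, then 0-9, then '*', '-' and '.'.
_ORDER = string.ascii_uppercase + string.ascii_lowercase + string.digits + "*-."


def symbol_rank(symbol):
    """Return the sort position of a single alphabet symbol."""
    if len(symbol) != 1 or symbol not in _ORDER:
        raise ValueError(f"unsupported alphabet symbol: {symbol!r}")
    return _ORDER.index(symbol)


def order_symbols(symbols):
    """Sort symbols into the column order expected for matrix input."""
    return sorted(symbols, key=symbol_rank)
```

For example, order_symbols(list("TGCA")) gives the DNA order A, C, G, T.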
Matrix Orientation
Matrix motifs may be input with either one position per row (preferred)
which is called row orientation, or one position per column which is
called column orientation. The orientation is determined by picking
which dimension (row or column) is equal to the alphabet size. If both
dimensions are equal to the alphabet size then row orientation is assumed.
If neither dimension is equal to the alphabet size then the closest
that is still smaller than the alphabet size is picked; however, if both
are equally smaller then column orientation is assumed. Finally, if none of
the above rules work to determine the orientation then row orientation is
assumed.
Site counts
Once the orientation is determined, the sum of the numbers that make up
the first position is calculated and rounded to the nearest integer.
If that value is larger than 1 then the matrix is assumed to be a count
matrix and that value is used as the site count, otherwise the matrix is
assumed to be a probability matrix and a site count of 20 is used.
Converting to a normalized probability matrix
Once the orientation is determined then each number in the matrix is
converted to a normalized probability by dividing by the sum of all the
numbers for that motif position. If any numbers are missing they are
assumed to have the value zero. As a special case if all numbers in a
motif position have the value zero then they are given the uniform
probability of 1 / alphabet size.
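The site count and normalization rules above can be sketched in Python as follows. This is only an illustration, not the server's parser: the function names are hypothetical, the matrix is assumed to be already in row orientation, missing values are assumed to be at the end of a row, and the handling of exact half-way sums follows Python's built-in round.

```python
def site_count(first_position, default=20):
    """Site count derived from the first motif position."""
    # Note: Python's round() sends exact halves to the nearest even integer;
    # how the real parser treats such ties is not stated in the text.
    total = round(sum(first_position))
    # A rounded sum above 1 marks a count matrix; otherwise the matrix is
    # taken to hold probabilities and the default site count is used.
    return total if total > 1 else default


def normalize(matrix, alphabet_size):
    """Convert row-oriented positions into normalized probabilities."""
    result = []
    for position in matrix:
        # Assumption: missing values are trailing and count as zero.
        padded = list(position) + [0.0] * (alphabet_size - len(position))
        total = sum(padded)
        if total == 0:
            # Special case: an all-zero position becomes uniform.
            result.append([1.0 / alphabet_size] * alphabet_size)
        else:
            result.append([value / total for value in padded])
    return result
```

For a DNA count matrix whose first position is 10 0 0 10, site_count returns 20 and normalize turns that position into 0.5 0 0 0.5.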
Yellow highlighting and red annotations
Red asterisks (*) indicate where the
parser thinks values are missing. A yellow highlighted row or column
with a red number at the end indicates that the counts for that position
don't sum to the same count as the first position. The red number shows
the difference. If the red number is negative then that position sums to
less than the first position; if it is positive then it sums to more than
the first position.
Typed Motifs - Sequence Sites or Regular Expressions
You may input one or more sites of the motif including using ambiguity
codes or bracket expressions to represent multiple possibilities for a
single motif position.
Ambiguity Codes
The DNA and protein alphabets include additional codes that represent
multiple possible bases. For example the DNA alphabet includes W (for weak)
which represents that the given position could be either an A (for adenosine)
or a T (for thymidine).
Bracket Expressions
Bracket expressions also group together multiple codes so they share
a single position. Their syntax is an opening square bracket '[' followed
by one or more codes and a closing square bracket ']'. For example with a
DNA motif the bracket expression [AT] means that both A and T are
acceptable and is equivalent to the ambiguity code W. Any repeats of a
base in a bracket expression are ignored so for example a DNA bracket
expression [AAT] has the same effect as [AT] or [AW] or W.
Multiple sites
When only one site is provided the site count is set to 20; however,
you can precisely control the motif by providing multiple sites. Each of
these sites can still contain ambiguity codes and bracket expressions
but a single count will be divided among the selected bases for each
position. When multiple sites are provided the site count will be set
to the number of sites provided.
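A minimal Python sketch of how DNA sites with ambiguity codes and bracket expressions could be turned into counts is shown here. It is not the server's implementation: the function names are hypothetical, the ambiguity table is the standard IUPAC one (assumed to match the codes the server accepts), and the single count per site is assumed to be split equally among the selected bases.

```python
DNA = "ACGT"
# Standard IUPAC DNA ambiguity codes (assumed to match the server's table).
AMBIGUOUS = {
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(code):
    """Return the set of core bases that a single code stands for."""
    if code in DNA:
        return {code}
    if code in AMBIGUOUS:
        return set(AMBIGUOUS[code])
    raise ValueError(f"unknown DNA code: {code!r}")


def parse_site(site):
    """Split a site such as 'A[CG]W' into one set of bases per position."""
    positions, i = [], 0
    while i < len(site):
        if site[i] == "[":
            end = site.index("]", i)  # raises ValueError if unclosed
            bases = set()
            for code in site[i + 1:end]:
                bases |= expand(code)  # repeated bases collapse in the set
            if not bases:
                raise ValueError("empty bracket expression")
            positions.append(bases)
            i = end + 1
        else:
            positions.append(expand(site[i]))
            i += 1
    return positions


def sites_to_counts(sites):
    """Return (count matrix, site count) for sites of equal width."""
    if not sites:
        raise ValueError("at least one site is required")
    parsed = [parse_site(site) for site in sites]
    width = len(parsed[0])
    if any(len(positions) != width for positions in parsed):
        raise ValueError("all sites must have the same width")
    counts = [[0.0] * len(DNA) for _ in range(width)]
    for positions in parsed:
        for row, bases in zip(counts, positions):
            for base in bases:
                # One count per site, shared equally (an assumption).
                row[DNA.index(base)] += 1.0 / len(bases)
    # A single site gets the fixed site count of 20 described above.
    return counts, (len(sites) if len(sites) > 1 else 20)
```

With this sketch, parse_site("[AAT]"), parse_site("[AW]") and parse_site("W") all give the same single position containing A and T.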
DNA motifs support the standard 4 codes for the bases: adenosine (A),
cytidine (C), guanosine (G) and thymidine (T) as well as supporting
the following ambiguity codes.
When enabled this field supports selecting motifs from the file with a
space separated list of motif identifiers and/or their positions in the
file.
Any numbers in the range 1 to 999 are assumed to refer to the position
of the selected motif in the file, so the entry "3" always refers to the
third motif. Any other entry is assumed to be a motif identifier.
Motif identifiers cannot start with a dash and can only contain
alphanumeric characters as well as colon ':', underscore '_', dot '.'
and dash '-'.
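The selection rules above can be illustrated with a short Python sketch. The function name parse_selection is hypothetical, and treating entries with leading zeros (such as "007") as identifiers rather than positions is an assumption, since the text does not say how they are handled.

```python
import re

# Positions: plain numbers from 1 to 999 (no leading zeros, an assumption).
_POSITION = re.compile(r"[1-9][0-9]{0,2}")
# Identifiers: alphanumerics plus ':', '_', '.' and '-', not starting with '-'.
_IDENTIFIER = re.compile(r"[A-Za-z0-9:_.][A-Za-z0-9:_.\-]*")


def parse_selection(text):
    """Split a space-separated selection into (positions, identifiers)."""
    positions, identifiers = [], []
    for entry in text.split():
        if _POSITION.fullmatch(entry):
            positions.append(int(entry))
        elif _IDENTIFIER.fullmatch(entry):
            identifiers.append(entry)
        else:
            raise ValueError(f"invalid motif identifier: {entry!r}")
    return positions, identifiers
```

For instance, the entry "3 MA0001.1 1000" selects the third motif in the file plus the motifs named MA0001.1 and 1000, because 1000 lies outside the 1 to 999 range.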
This option can help change the alphabet of motifs from
a base alphabet to a derived alphabet.
This might be useful if you need to compare an extended DNA motif with
a library of DNA motifs, or if you wish to compare RNA motifs to DNA motifs.
Note that this option will also let you do nonsensical things like compare
Protein motifs to DNA motifs so use it with care.
The derived alphabet must have all the core symbols of the alphabet
that it is derived from. For example if the alphabet is derived from DNA
it must have ACGT as core symbols. Expanding the alphabet adds frequencies
of zero for every symbol in the derived alphabet that did not exist in the
base alphabet.
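As a sketch of the expansion step described above, the following Python function re-expresses a motif over a derived alphabet by adding zero frequencies for the new symbols. The function name and the representation of alphabets as strings of core symbols are assumptions made for illustration only.

```python
def expand_motif(matrix, base_symbols, derived_symbols):
    """Re-express row-oriented motif frequencies over a derived alphabet."""
    missing = set(base_symbols) - set(derived_symbols)
    if missing:
        # The derived alphabet must contain every core symbol of the base.
        raise ValueError(f"derived alphabet lacks: {sorted(missing)}")
    index = {symbol: i for i, symbol in enumerate(base_symbols)}
    return [
        # Symbols that did not exist in the base alphabet get frequency 0.
        [row[index[s]] if s in index else 0.0 for s in derived_symbols]
        for row in matrix
    ]
```

Expanding a DNA position 0.1 0.2 0.3 0.4 to a hypothetical five-symbol alphabet "ACGTm" yields 0.1 0.2 0.3 0.4 0.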
Click on the menu at the left to see which of the following sequence input methods are available.
Type in sequences
When this option is available you may directly input multiple
sequences by typing them. Sequences must be input in
FASTA format.
Upload sequences
When this option is available you may upload a file containing
sequences in FASTA format.
Upload BED file
When this option is available you may upload a file containing
sequence coordinates in BED format.
Databases (select category)
When this option is available you may first select a category of
sequence database from the list below it. Two additional menus will then appear
where you can select the particular database and version desired, respectively.
The full list of available sequence databases and their descriptions
can be viewed HERE.
Submitted sequences
This option is only available when you have invoked the current
program by clicking on a button in the output report of a different MEME Suite program.
By selecting this option you will input the sequences sent by that program.
Specify a file to upload containing
sequence coordinates in BED format.
The file must be based on the exact genome version you specified in the
menus above.
Selecting this option will filter the sequence menu to only contain
databases that have additional information that is specific to a tissue
or cell line.
This option causes MEME Suite to use tissue/cell-specific information
(typically from DNase I or histone modification ChIP-seq data) encoded
as a position specific prior that
has been created by the MEME Suite create-priors
utility. You can see a description of the sequence databases
for which we provide tissue/cell-specific priors
here.
Note that you cannot upload or type in your own sequences
when tissue/cell-specific scanning is selected.
Enter the email address where you want the job notification email to
be sent. Please check that this is a valid email address!
The notification email will include a link to your job results.
Note: You can also access your jobs via the Recent Jobs
menu on the left of all MEME Suite input pages. That menu only
keeps track of jobs submitted during the current session of your internet browser.
Note: Most MEME Suite servers only store results for a couple of days.
So be sure to download any results you wish to keep.
XSTREME will report motifs that are enriched in your (primary) sequences relative to a control set of sequences.
XSTREME inputs the primary and control sequences to the STREME motif discovery algorithm,
and to the SEA motif enrichment analysis algorithm. XSTREME will also create a Markov background
model from the control sequences (unless you provide one) that it inputs to the STREME, MEME and SEA algorithms.
If you do not provide control sequences, STREME and SEA create them by shuffling a copy of each primary sequence,
using an m-order shuffle. The value of m depends on the sequence alphabet (see Universal options, below).
Shuffling also preserves the positions of non-core (e.g., ambiguous) characters in each sequence to avoid artifacts.
Alternatively, you may provide a set of control sequences ("User-provided sequences").
IMPORTANT NOTE: If you provide control sequences,
they should ideally have the same length distribution as your
primary sequences. For example, all sequences
in both sets could have the same length, or, for each sequence in
the primary set there could be exactly N sequences with the
same length as it in the control set.
Failure to ensure this may cause XSTREME to report inaccurate
estimates of the statistical significance (p-value)
of the motifs it finds.
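One way to check the second example above before submitting a job is sketched here in Python. The function name matched_ratio is hypothetical, and the check reads "exactly N sequences with the same length" as meaning that every length occurs N times as often in the control set as in the primary set; XSTREME itself does not require this exact test.

```python
from collections import Counter


def matched_ratio(primary_lengths, control_lengths):
    """Return N if each length occurs N times as often in the controls.

    Returns None when the two length distributions do not match this way.
    """
    primary = Counter(primary_lengths)
    control = Counter(control_lengths)
    if not primary or set(primary) != set(control):
        return None
    ratios = set()
    for length, count in primary.items():
        if control[length] % count:
            return None
        ratios.add(control[length] // count)
    return ratios.pop() if len(ratios) == 1 else None
```

The inputs are simply the sequence lengths of the two FASTA files, for example [len(s) for s in primary_sequences].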
Select a file of FASTA formatted
biological sequences or paste in FASTA formatted biological sequences to search for sequence motifs.
The more sequences that you can give XSTREME, the
more subtle the motifs it can find.
For ChIP-seq we recommend using sequences of length 100bp centered on
the summit or center of the peak. For CLIP-seq we recommend using
the actual peak regions.
Select a file of FASTA formatted
biological sequences or paste in FASTA formatted biological sequences to use as controls in the
search for motifs.
XSTREME inputs the primary and control sequences to the STREME motif
discovery algorithm, and to the SEA motif enrichment analysis algorithm.
If you do not provide a background model (see "Universal options", below)
XSTREME also creates a Markov background model from the control sequences
that it inputs to the STREME, MEME and SEA algorithms.
Your control sequences should have approximately the same length
distribution and background frequencies as your primary sequences,
but should be unlikely to contain the motifs that you are attempting to find.
The more control sequences that you can give XSTREME, the
more subtle the motifs it can find.
XSTREME will determine which of these known motifs are enriched in
your sequences relative to the control sequences (using SEA). It will also compare
any motifs it discovers to each of these known motifs (using Tomtom).
This is the width (number of characters in the sequence pattern) of a
single discovered motif. STREME and MEME choose the optimal width of each motif individually
using a statistical heuristic function.
You can choose limits
for the minimum and maximum motif widths that STREME and MEME will consider. The
width of each motif that STREME and MEME report will lie within the limits you
choose.
Specify the order (m) for the background model and sequence shuffling.
By default, XSTREME uses m=2 for DNA and RNA sequences, and m=0 for
protein or custom alphabet sequences.
Check this box and set the value of m if you want to override
the default value of m that XSTREME uses.
If you upload a background model (see option above), XSTREME will only
use the m-order portion of that model.
If you do not upload a background model,
XSTREME will create an order-m model
from the control sequences that you provide, or from the shuffled primary sequences
if you don't provide control sequences.
If you do not specify a set of control sequences, XSTREME will
create one by shuffling each primary sequence while preserving
the frequencies of all words of length k that it contains,
where k=m+1.
Check this box and enter a size if you want to limit motif discovery and enrichment
analysis to the central regions of the (primary) sequences. Only the
central 'size' characters of each (primary) sequence will be input to the
STREME, MEME and SEA algorithms. The full-length sequences will still be used for the
positional distribution plots and as input to FIMO.
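A minimal Python sketch of the trimming described above follows. The function name is hypothetical, and two details are assumptions because the text does not state them: sequences no longer than 'size' are returned unchanged, and when the number of trimmed characters is odd the extra character is removed from the right end.

```python
def central_region(sequence, size):
    """Return the central `size` characters of a sequence."""
    if size <= 0:
        raise ValueError("size must be positive")
    if len(sequence) <= size:
        return sequence  # assumption: short sequences are kept whole
    start = (len(sequence) - size) // 2
    return sequence[start:start + size]
```

For example, central_region("AACCGGTT", 4) keeps "CCGG".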
For the site positional distribution diagrams, align the sequences
on their left ends, on their centers, or on their right ends.
For visualizing motif distributions, center alignment is
ideal for ChIP-seq and similar data; right alignment
for sequences upstream of transcription start sites; left
alignment for many proteins or 3' UTR sequences.
XSTREME will only include motifs in its output if their E-value is
no larger than this value.
This is also used as the default E-value threshold for STREME and MEME
(see "STREME options" and "MEME options", below).
The background model normalizes for biased distribution of
letters and groups of letters in your sequences.
A 0-order model adjusts for single letter biases (e.g., GC content in DNA sequences), a 1-order model adjusts for
dimer biases (e.g., CpG dinucleotide frequency), etc.
By default XSTREME will determine the background Markov model from
the control sequences (or from the primary sequences if you do not provide
control sequences). The order of the background model depends
on the sequence alphabet, but you can also set it manually (see option "What Markov order...", below).
Alternatively you may select "Upload background model" and input a file containing
a background model.
The downloadable version of the MEME Suite contains a program named
"fasta-get-markov" that you can use to create background model files in
the correct format from FASTA sequence files.
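To make the idea of an order-m background model more concrete, the Python sketch below tallies the frequencies of all words of length 1 to m+1 in a set of sequences. It is only an illustration with a hypothetical function name: it does not reproduce fasta-get-markov or its file format, and it ignores details such as pseudocounts, ambiguous characters and reverse complements.

```python
from collections import Counter


def kmer_frequencies(sequences, order):
    """Frequencies of all words of length 1..order+1, normalized per length."""
    tables = {}
    for k in range(1, order + 2):
        counts = Counter()
        for seq in sequences:
            # Slide a window of width k along the sequence.
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
        total = sum(counts.values())
        tables[k] = {word: n / total for word, n in counts.items()} if total else {}
    return tables
```

With order 0 only single-letter frequencies are returned; with order 1 the table for k=2 additionally captures dimer biases.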
STREME stops looking for motifs when one of the limits below is met.
There is an additional time limit which is set by the server operator.
Default E-value threshold
STREME will use the value of the E-value threshold
given above under "Universal options". The STREME E-value
is the probability of a motif being found that would discriminate
the primary sequences from the control sequences at least as well,
assuming that the letters in the primary sequences were randomly shuffled.
STREME stops when 3 motifs have been found whose E-values exceed
this threshold.
E-value threshold
Check this box and enter the desired threshold if you
want STREME to use a different E-value threshold than MEME.
Number of motifs
Check this box and set the (maximum) number of motifs
you want STREME to find before it stops. STREME will ignore
the E-value threshold if this box is checked.
By default STREME does not limit the number of motifs to be found,
but uses the E-value threshold as its stopping criterion.
MEME stops looking for motifs when one of the limits below is met.
There is an additional time limit which is set by the server operator.
Default E-value threshold
MEME will use the value of the E-value threshold
given above under "Universal options". The MEME E-value
is the probability of a motif being found that would have
information content at least as high, assuming that the letters
in the primary sequences were randomly shuffled (0-order shuffle).
MEME stops when the next motif has E-value exceeding this threshold.
E-value threshold
Check this box and enter the desired threshold if you
want MEME to use a different E-value threshold than STREME.
Number of motifs
Check this box and set the (maximum) number of motifs
you want MEME to find before it stops. MEME will ignore
the E-value threshold if this box is checked.
By default MEME does not limit the number of motifs to be found,
but uses the E-value threshold as its stopping criterion.
This is where you tell MEME how you believe occurrences of the motifs
are distributed among the sequences. Selecting the correct type of
distribution improves the sensitivity and quality of the motif search.
Zero or one occurrence per sequence
MEME assumes that each sequence may contain at most one
occurrence of each motif. This option is useful when you suspect that
some motifs may be missing from some of the sequences. In that case, the
motifs found will be more accurate than using the one occurrence per
sequence option. This option takes more computer time than the one
occurrence per sequence option (about twice as much) and is slightly less
sensitive to weak motifs present in all of the sequences.
One occurrence per sequence
MEME assumes that each sequence in the dataset contains
exactly one occurrence of each motif. This option is the fastest
and most sensitive but the motifs returned by MEME may be "blurry" if
any of the sequences is missing them.
Any number of repetitions
MEME assumes each sequence may contain any number of
non-overlapping occurrences of each motif. This option is useful when
you suspect that motifs repeat multiple times within a single sequence.
In that case, the motifs found will be much more accurate than using one
of the other options. This option can also be used to discover repeats
within a single sequence. This option takes much more computer time than
the one occurrence per sequence option (about ten times as much) and is
somewhat less sensitive to weak motifs which do not repeat within a
single sequence than the other two options.
Check this box if you want SEA to output the IDs of the sequences matching
each significant motif in a TSV file. This file can be very large, so don't
check this box if you don't need that information.