XSTREME

If your sequences are not in a standard alphabet (DNA, RNA or protein), you must input a custom alphabet file.

Click on the menu at the left to see which of the following motif input methods are available.

Type in motifs: When this option is available you may directly input multiple motifs by typing them (or using "cut-and-paste"). First select the desired motif alphabet using the menu immediately to the left. If you select the "Custom" option then you must provide an alphabet definition in the file input that immediately follows. Warning: custom alphabets are case-sensitive. You may optionally give each motif an identifier and alternate name by inputting a line like >Identifier Alternate-Name preceding the motif. You can then enter each motif as a matrix, a set of sequence sites or a fixed-length regular expression. You can enter multiple motifs by typing an empty line after each motif. Individual motifs will be shown in square brackets, errors in your motifs will be highlighted in red, and warnings will be highlighted in yellow. Mouse over individual motifs to display their sequence logos. View the examples for more information on what is possible.
Upload motifs: When this option is available you may upload a file containing motifs in MEME motif format. This includes the outputs generated by MEME and DREME, as well as files you create using the motif conversion scripts or manually following the MEME motif format guidelines.
Databases (select category): When this option is available you can select the category of motif database desired from the list below it. Then select the motif database from the displayed list. Consult the motif database documentation for descriptions of all the motif databases present on this MEME Suite server.
Submitted motifs: This option is only available when you have invoked the current program by clicking on a button in the output report of a different MEME Suite program. By selecting this option you will input the motifs sent by that program.
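
The layout of typed motif input described above (an optional >Identifier Alternate-Name line, then the motif, then an empty line) can be illustrated with a small sketch. This is not the parser used by the MEME Suite server; the function names and the sample motifs are hypothetical and only show how the blocks are separated.

```python
def parse_block(lines):
    """Split one motif block into an optional header and the motif body."""
    identifier = None
    alternate = None
    if lines[0].startswith(">"):
        # Header line of the form ">Identifier Alternate-Name".
        parts = lines[0][1:].split(None, 1)
        if parts:
            identifier = parts[0]
        if len(parts) > 1:
            alternate = parts[1]
        lines = lines[1:]
    return {"id": identifier, "alt": alternate, "body": lines}


def split_typed_motifs(text):
    """Split typed input into motif blocks separated by empty lines."""
    motifs = []
    block = []
    # The trailing "" flushes the last block even without a final empty line.
    for line in text.splitlines() + [""]:
        if line.strip():
            block.append(line.strip())
        elif block:
            motifs.append(parse_block(block))
            block = []
    return motifs


# Made-up input: one named motif given as sites, one unnamed motif.
example = ">M1 Example\nACGT\nACGA\n\nTTGACA\n"
motifs = split_typed_motifs(example)
```

Here `motifs` holds two entries, the first with identifier "M1" and alternate name "Example", the second with no header.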

Typed Motifs - Matrices

You may input both probability and count matrices of either orientation and the rules described below will be used to convert the matrix into a MEME formatted motif.

Alphabet Order

The counts/probabilities are expected to be ordered based on the alphabetical ordering of their codes. So DNA is ordered ACGT and protein is ordered ACDEFGHIKLMNPQRSTVWY. For custom alphabets the ordering goes uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9) and finally the symbols '*', '-' and '.'.
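
As a rough illustration of this ordering, the sketch below sorts single-character symbols into the order described: uppercase letters, lowercase letters, numbers, then '*', '-' and '.'. The function names are hypothetical, and rejecting any other symbol is an assumption of this sketch.

```python
def symbol_sort_key(symbol):
    """Rank a symbol: uppercase, lowercase, digits, then '*', '-', '.'."""
    trailing = "*-."
    if len(symbol) != 1:
        raise ValueError("expected a single character: %r" % (symbol,))
    if "A" <= symbol <= "Z":
        return (0, ord(symbol))
    if "a" <= symbol <= "z":
        return (1, ord(symbol))
    if "0" <= symbol <= "9":
        return (2, ord(symbol))
    if symbol in trailing:
        return (3, trailing.index(symbol))
    # Assumption: anything else is not a valid alphabet symbol here.
    raise ValueError("unsupported symbol: %r" % (symbol,))


def order_alphabet(symbols):
    """Return the symbols in the order expected for matrix columns."""
    return sorted(symbols, key=symbol_sort_key)


dna_order = "".join(order_alphabet("TGCA"))
custom_order = order_alphabet([".", "a", "9", "Z", "*", "0", "-", "B"])
```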

Matrix Orientation

Matrix motifs may be input with either one position per row (preferred), which is called row orientation, or one position per column, which is called column orientation. The orientation is determined by picking which dimension (row or column) is equal to the alphabet size. If both dimensions are equal to the alphabet size then row orientation is assumed. If neither dimension is equal to the alphabet size then the dimension that is closest to, but still smaller than, the alphabet size is picked; however, if both are equally smaller then column orientation is assumed. Finally, if none of the above rules determines the orientation then row orientation is assumed.
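
The orientation rules can be sketched as a small decision function. This is one reading of the rules above, not the server's code: the function name is hypothetical, and the sketch assumes that the dimension "picked" is the one treated as indexing the alphabet symbols (so in row orientation the number of columns matches the alphabet size).

```python
def guess_orientation(n_rows, n_cols, alphabet_size):
    """Return "row" (one position per row) or "column" (one per column)."""
    # Row orientation: each row is a position, so columns index the symbols.
    if n_cols == alphabet_size:
        return "row"  # also covers the case where both dimensions match
    if n_rows == alphabet_size:
        return "column"

    # Neither dimension matches: prefer the one closest to, but still
    # smaller than, the alphabet size.
    row_gap = alphabet_size - n_rows
    col_gap = alphabet_size - n_cols
    rows_smaller = row_gap > 0
    cols_smaller = col_gap > 0
    if rows_smaller and cols_smaller:
        if col_gap < row_gap:
            return "row"
        return "column"  # rows are closer, or both are equally smaller
    if cols_smaller:
        return "row"
    if rows_smaller:
        return "column"
    return "row"  # fallback when no rule applies
```

For example, with the DNA alphabet (size 4) an 8 x 4 matrix is read as row orientation and a 4 x 8 matrix as column orientation.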

Site counts

Once the orientation is determined, the sum of the numbers that make up the first position is calculated and rounded to the nearest integer. If that value is larger than 1 then the matrix is assumed to be a count matrix and that value is used as the site count, otherwise the matrix is assumed to be a probability matrix and a site count of 20 is used.
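
A minimal sketch of this rule follows. The function name is hypothetical, and the use of Python's built-in `round` (which rounds exact halves to the nearest even integer) is an assumption, since the text does not say how ties are broken.

```python
def infer_site_count(first_position, default_sites=20):
    """Classify a matrix from its first position and pick a site count."""
    total = round(sum(first_position))
    if total > 1:
        # The first position sums to more than 1: treat as counts.
        return total, "count"
    # Otherwise treat as probabilities and fall back to 20 sites.
    return default_sites, "probability"


count_case = infer_site_count([5, 0, 3, 2])
prob_case = infer_site_count([0.25, 0.25, 0.25, 0.25])
```

Here `count_case` is `(10, "count")` and `prob_case` is `(20, "probability")`.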

Converting to a normalized probability matrix

Once the orientation is determined then each number in the matrix is converted to a normalized probability by dividing by the sum of all the numbers for that motif position. If any numbers are missing they are assumed to have the value zero. As a special case if all numbers in a motif position have the value zero then they are given the uniform probability of 1 / alphabet size.
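
The conversion for a single motif position can be sketched as below. The function name is hypothetical; the sketch assumes that missing values are either absent from the end of the position or given as `None`.

```python
def normalize_position(values, alphabet_size):
    """Convert one motif position to normalized probabilities."""
    # Missing values (None, or absent at the end) are treated as zero.
    filled = [0.0 if v is None else float(v) for v in values]
    filled += [0.0] * (alphabet_size - len(filled))
    total = sum(filled)
    if total == 0:
        # Special case: an all-zero position becomes uniform.
        return [1.0 / alphabet_size] * alphabet_size
    return [v / total for v in filled]


counts = normalize_position([2, 2, 4, 0], 4)   # [0.25, 0.25, 0.5, 0.0]
empty = normalize_position([0, 0, 0, 0], 4)    # [0.25, 0.25, 0.25, 0.25]
```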

Yellow highlighting and red annotations

Red asterisks (*) indicate where the parser thinks values are missing. A yellow highlighted row or column with a red number at the end indicates that the counts for that position don't sum to the same count as the first position. The red number shows the difference. If the red number is negative then that position sums to less than the first position; if it is positive then it sums to more than the first position.
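
The red number can be thought of as the position's sum minus the first position's sum. The sketch below computes that difference for every position of a made-up count matrix; the function name is hypothetical and this is not the code used by the input form.

```python
def position_sum_differences(positions):
    """Return each position's sum minus the sum of the first position."""
    sums = [sum(position) for position in positions]
    first = sums[0]
    return [s - first for s in sums]


# Made-up count matrix in row orientation (one position per row).
matrix = [
    [5, 5, 0, 0],  # first position, sums to 10
    [4, 5, 0, 0],  # sums to 9
    [5, 5, 1, 0],  # sums to 11
]
differences = position_sum_differences(matrix)
```

Here `differences` is `[0, -1, 1]`: the second position sums to one less than the first, and the third to one more.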

Description	Code	Bases
Uracil	U	T
Weak	W	A, T
Strong	S	C, G
Amino	M	A, C
Keto	K	G, T
Purine	R	A, G
Pyrimidine	Y	C, T
Not A	B	C, G, T
Not C	D	A, G, T
Not G	H	A, C, T
Not T	V	A, C, G
Any	N	A, C, G, T
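
The table above can be expressed as a simple lookup. This sketch only restates the table; the names `DNA_AMBIGUITY` and `expand_dna_code` are hypothetical, and treating the core codes A, C, G and T as matching themselves is an assumption.

```python
# Ambiguity codes from the table, mapped to the bases they stand for.
DNA_AMBIGUITY = {
    "U": "T",
    "W": "AT",
    "S": "CG",
    "M": "AC",
    "K": "GT",
    "R": "AG",
    "Y": "CT",
    "B": "CGT",
    "D": "AGT",
    "H": "ACT",
    "V": "ACG",
    "N": "ACGT",
}


def expand_dna_code(code):
    """Return the set of bases matched by a DNA code."""
    if code in "ACGT" and len(code) == 1:
        return {code}  # assumption: core bases match only themselves
    return set(DNA_AMBIGUITY[code])


purines = expand_dna_code("R")
```

Here `purines` is the set containing A and G.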

Description	Code	Bases
Asparagine or aspartic acid	B	N, D
Glutamine or glutamic acid	Z	E, Q
Leucine or Isoleucine	J	I, L
Unspecified or unknown amino acid	X	A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

Typed Motifs - Sequence Sites or Regular Expressions

Ambiguity Codes

Bracket Expressions

Multiple sites

DNA Alphabet

Protein Alphabet

Typed Motifs - Examples

Motif Discovery and Enrichment Analysis

Select the type of control sequences to use

Select the sequence alphabet

Use sequences with a standard alphabet or specify a custom alphabet.

Input the sequences

Enter the sequences in which you want to find motifs.

Select the BED file to upload.

Input the control sequences

XSTREME will find motifs that are enriched relative to these sequences.

Select the BED file to upload.

Convert DNA sequences to RNA?

Input the motifs

Select, upload or enter a set of known motifs.

Input job details

(Optional) Enter your email address.

(Optional) Enter a job description.

How should XSTREME limit its search?

What width motifs should XSTREME discover?

What should be used as the background model?

What Markov order should XSTREME use for shuffling sequences and the background model?

Should XSTREME use only the central portion of the (primary) sequences?

How should sequences be aligned for site positional diagrams?

Should STREME and FIMO parse genomic coordinates?

How should STREME limit its search?

How should MEME limit its search?

What is the expected motif site distribution?

Should SEA output TSV files of matching sequences and sites?