xstreme [options] [--m <motifs>]* --p <primary sequences>
The name of a file containing the primary (positive) sequences in FASTA format on which to perform comprehensive motif analysis. The file must contain at least two valid sequences or XSTREME will reject it.
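As a pre-flight check before launching XSTREME, a script can confirm that the primary file holds at least two records. The sketch below is a minimal illustration in Python. The helper name `count_fasta_records` is hypothetical, and it only counts records that have a header line and at least one residue, which may be looser than XSTREME's own notion of a valid sequence.

```python
def count_fasta_records(text):
    """Count FASTA records that have a '>' header and at least one residue.

    Assumption: this is a simplified stand-in for XSTREME's own validity
    check, which may apply stricter rules (for example, alphabet checks).
    """
    count = 0
    in_record = False      # True once a header line has been seen
    has_residues = False   # True once the current record has sequence data
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if it had residues.
            if in_record and has_residues:
                count += 1
            in_record = True
            has_residues = False
        elif line and in_record:
            has_residues = True
    # Close the final record.
    if in_record and has_residues:
        count += 1
    return count


# Two usable records plus one header with no sequence data.
example = ">seq1\nACGTACGT\n>seq2\nTTGACA\nGGCC\n>empty\n"
print(count_fasta_records(example))  # prints 2
```

A caller could refuse to start the run when the count is below two, rather than waiting for XSTREME to reject the file.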
XSTREME writes its output to files in a directory named xstreme_out, which it creates if necessary. You can change the output directory using the --o or --oc options.
The directory will contain the following files:
xstreme.html - an HTML file that provides the results in an interactive, human-readable format that contains links to the other files produced by the analyses performed by XSTREME
xstreme.tsv - a TSV (tab-separated values) file that provides a summary of the results in a format suitable for parsing by scripts and viewing with Excel
xstreme.txt - a text file that contains all the non-redundant ab initio motifs discovered by XSTREME in MEME Motif Format
In addition, the XSTREME output directory will contain sub-directories with the results of each of the individual analyses it performed. The results in these directories are all linked to from the XSTREME HTML output file.
Note: See this detailed description of the XSTREME output formats for more information.
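Because xstreme.tsv is meant to be parsed by scripts, a generic reader is sketched below in Python. It makes two assumptions that are not stated above: that the first data line is a header row, and that blank lines or lines starting with `#` carry no data. The function name `read_summary_tsv` and the column names in the sample are hypothetical stand-ins, so check them against a real output file.

```python
import csv
import io


def read_summary_tsv(handle):
    """Return the data rows of a tab-separated file as a list of dicts.

    Assumptions: the first non-comment line is a header row, and blank
    lines or lines starting with '#' are not data.
    """
    kept = (line for line in handle
            if line.strip() and not line.startswith("#"))
    return list(csv.DictReader(kept, delimiter="\t"))


# Hypothetical stand-in content; real column names may differ.
sample = io.StringIO(
    "MOTIF_ID\tWIDTH\tEVALUE\n"
    "1-ACGTGA\t6\t1.2e-10\n"
    "\n"
    "# trailing comment line\n"
)
rows = read_summary_tsv(sample)
print(rows[0]["MOTIF_ID"], len(rows))  # prints: 1-ACGTGA 1
```

With a real run, the same function could be given a file opened from the output directory, for example `open("xstreme_out/xstreme.tsv", newline="")`. All values are returned as strings, so numeric columns need explicit conversion.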
Note: All options may be preceded by a single dash (-) instead of a double dash (--) if desired.
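To make the usage line concrete, the following Python sketch assembles an argument list with a repeated --m option. It only builds the list and does not run anything. The function name `build_xstreme_command` and the file names are hypothetical; the options used (--p, --n, --m) are the ones described in the table below.

```python
def build_xstreme_command(primary, motif_files=(), control=None):
    """Assemble an argument list that follows the usage line
    xstreme [options] [--m <motifs>]* --p <primary sequences>.
    """
    cmd = ["xstreme"]
    if control is not None:
        cmd += ["--n", control]      # optional control (negative) sequences
    for path in motif_files:
        cmd += ["--m", path]         # --m may be repeated, once per motif file
    cmd += ["--p", primary]          # required primary sequences
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_xstreme_command(
    "peaks.fa",
    motif_files=["known_a.meme", "known_b.meme"],
    control="background.fa",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where XSTREME is installed. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.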
Option | Parameter | Description | Default Behavior |
---|---|---|---|
Output | |||
Primary Sequences | |||
--p | primary sequences | [REQUIRED] The name of a file containing primary (positive) sequences in FASTA format. XSTREME will perform comprehensive motif analysis on these sequences. | None. This input is required. |
Control Sequences and Background Model | |||
--n | control sequences | The name of a file containing control (negative) sequences in FASTA format. XSTREME will report motifs that are enriched in the primary sequences relative to the control sequences. XSTREME inputs the primary and control sequences to the STREME motif discovery algorithm, and to the SEA motif enrichment analysis algorithm. If you do not provide a background model (see option --bfile, below), XSTREME also creates a Markov background model from the control sequences that it inputs to the STREME, MEME and SEA algorithms. The control sequences must be in the same sequence alphabet as the primary sequences. | If you do not provide control sequences, XSTREME creates them by shuffling a copy of each primary sequence, using an m-order shuffle (see next option). Shuffling also preserves the positions of non-core (e.g., ambiguous) characters in each sequence to avoid artifacts. |
--order | m | Estimate an m-order Markov background model from the control sequences for input to the STREME, MEME and SEA algorithms. If you do not provide control sequences, XSTREME creates them by applying an m-order shuffle to a copy of each primary sequence, which preserves the frequencies of words of length m+1 in each shuffled sequence. m must be in the range [0,..,4]. Note: If you do not specify control sequences, XSTREME estimates the background model from the primary sequences instead. XSTREME uses the fasta-get-markov program with a total pseudocount of 1 to create the Markov model. Note: If you specify a background model using --bfile (see below), XSTREME does not estimate a background model, but passes the specified background to STREME, MEME and SEA. XSTREME also passes m to STREME, MEME and SEA. | XSTREME uses m=2 (DNA and RNA), and m=0 (Protein and Custom alphabets). |
--bfile | file | Specify the source of a background model in Markov Background Model Format to be passed to STREME, MEME and SEA. | XSTREME estimates a background model from the control sequences, or from the primary sequences if you do not provide control sequences, as described above for the --order option. |
--seed | seed | Random seed to be passed to STREME, MEME and SEA. | XSTREME uses a random seed of 0. |
Input Motifs | |||
--m | motifs | [RECOMMENDED] The name of a file containing known motifs in MEME Motif Format. Outputs from MEME, STREME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME Motif Format using conversion scripts available with the MEME Suite. This option may be repeated to pass multiple files of motifs. | When no files are provided, XSTREME cannot report similar known motifs. |
Alphabet | |||
Output Filtering and Number of Motifs | |||
--evt | evt | E-value threshold for including motifs in the output. This is also used as the default E-value threshold for STREME (--streme-evt) and for MEME (--meme-evt). | A value of 0.05 is used. |
--time | minutes | The maximum time (in minutes) that XSTREME is allowed to run before terminating itself gracefully. | There is no time limit. |
Motif Width | |||
--minw | width | The minimum width of motifs to find. | A minimum width of 6 is used unless the maximum width has been set to be less than 6 in which case the maximum width is used. |
--maxw | width | The maximum width of motifs to find. | A maximum width of 15 is used unless the minimum width has been set to be larger than 15 in which case the minimum width is used. |
--w | width | Search for motifs with an exact width of width. Overrides --minw and --maxw. | See --minw and --maxw, above. |
Misc | |||
--mea-only | | Use XSTREME as a Motif Enrichment Analysis (MEA) tool. No motif discovery algorithms will be run. XSTREME will use SEA to analyze the enrichment of the known motifs you provide in your input sequences. XSTREME will cluster the enriched motifs as usual, and will show the distribution of the motif sites in your input sequences for each enriched motif it identifies. This option sets options --streme-nmotifs and --meme-nmotifs to 0, and sets option --fimo-skip (see below). | Perform motif discovery and motif enrichment analysis. |
--ctrim | size | For input to STREME, MEME and SEA, XSTREME will trim the primary sequences to their central region of size characters. (The full-length sequences will still be used for the positional distribution plots and as input to FIMO.) | The input sequences will not be trimmed. |
--align | left|center|right | For the site positional distribution diagrams, XSTREME will align the sequences on their left ends (left), on their centers (center), or on their right ends (right). For visualizing motif distributions, center alignment is ideal for ChIP-seq and similar data; right alignment for sequences upstream of transcription start sites; left alignment for many proteins or 3' UTR sequences. | Align the sequences on their centers. |
--group-thresh | gthr | Main threshold for clustering highly similar motifs in XSTREME output. All motifs in a group will have a Tomtom E-value less than or equal to gthr when compared to the seed motif for the group, which is the most significant motif in the group. | A value of 0.05 is used. |
--group-weak | wthr | Secondary threshold for clustering highly similar motifs in XSTREME output. If this is specified by the user, groups will be merged into a more significant group if all their motifs are weakly similar to the seed motif of the more significant group. wthr specifies the Tomtom E-value threshold for merging groups. | Set to be equal to twice the value of the main clustering threshold: 2 * gthr. |
--verbosity | 0|1|2|3|4|5 | A number that regulates the verbosity level of the output information messages. If set to 0 (very quiet), XSTREME will only output warning and error messages. If set to 1 (quiet), XSTREME will also output the start/stop of each pipeline step. At the other extreme, if set to 5 (dump), XSTREME and the programs in its pipeline will output a large amount of information intended for debugging. | The verbosity level is set to 1 (quiet). |
STREME Specific Options | |||
--streme-evt | E-value | Stop searching for more motifs when three successive motifs have E-values larger than this threshold. | The value specified for --evt (or its default) is used. |
--streme-nmotifs | count | Stop searching for more motifs when count motifs have been found. If count is 0, STREME will not be run. | Search stops when the --streme-evt criterion has been satisfied. |
--streme-totallength | totallength | The maximum length of each sequence set (in characters) used by STREME. If the input sequence sets exceed this limit they will be down-sampled. See the documentation on the STREME --totallength option in the STREME documentation for more details. | The total length of the input sequences to STREME is not limited. |
MEME Specific Options | |||
--meme-evt | E-value | Stop searching for more motifs if the next motif has an E-value larger than this threshold. | The value specified for --evt (or its default) is used. |
--meme-nmotifs | num | The number of motifs that MEME should search for. If num is 0, MEME will not be run. | Search stops when the --meme-evt criterion has been satisfied. |
--meme-searchsize | searchsize | The maximum portion of the primary sequences (in characters) used by MEME in searching for motifs. See the documentation on the MEME -searchsize option in the MEME documentation for more details. | MEME performs sampling if the primary sequences contain more than 100,000 characters. |
--meme-p | np | Use the faster, parallel version of MEME with np processors. The parameter np may be a number, or it may be a quoted string starting with a number and followed by arguments to the particular MPI run command for your installation (e.g., mpirun). | Use a single processor. |
--meme-brief | nbrief | If there are more than nbrief (primary) sequences, the size of MEME's output will be reduced by suppressing the inclusion of the sequence names, motif sites and scanned sites in MEME's HTML and XML outputs, and by suppressing the tables of sequence lengths, sites and block diagrams in MEME's text output. | A value of 1000 is used for nbrief. |
--meme-mod | oops|zoops|anr | The number of motif sites that MEME will find per sequence. oops - One Occurrence Per Sequence, zoops - Zero or One Occurrence Per Sequence, anr - Any Number of Repetitions. See -mod in the MEME documentation for more information. | MEME defaults to using zoops mode. |
SEA Specific Options | |||
--sea-noseqs | | Do not output the SEA matching sequences TSV file. This option is useful if the matching sequence information is not needed, as the TSV file can be very large. | SEA will output the matching sequences TSV file. |
FIMO Specific Options | |||
--fimo-skip | | Do not run FIMO. This option is useful for saving disk space if the predicted motif sites in the sequences are not needed. | Run FIMO using the most significant motif from each cluster to scan the (full-length) input sequences. |
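Several of the default rules in the table above are simple enough to restate as code. The Python sketch below mirrors the described defaults for --minw, --maxw and --w, for --group-weak, for --order, and the central trimming performed by --ctrim. It is an illustration only, not XSTREME's implementation. All function names are hypothetical, the alphabet labels are assumed strings, and the choice of where the central window starts when the leftover length is odd is an assumption.

```python
def resolve_widths(minw=None, maxw=None, w=None):
    """Return (min_width, max_width) following the Motif Width defaults."""
    if w is not None:
        return w, w                  # --w overrides --minw and --maxw
    if minw is None and maxw is None:
        return 6, 15
    if minw is None:
        minw = min(6, maxw)          # 6, unless maxw was set below 6
    if maxw is None:
        maxw = max(15, minw)         # 15, unless minw was set above 15
    return minw, maxw


def resolve_group_weak(gthr=0.05, wthr=None):
    """Default the secondary clustering threshold to 2 * gthr."""
    return 2 * gthr if wthr is None else wthr


def default_order(alphabet):
    """Default Markov order: 2 for DNA and RNA, 0 otherwise.

    Assumption: the alphabet is identified by a plain string label.
    """
    return 2 if alphabet.lower() in ("dna", "rna") else 0


def trim_to_center(seq, size):
    """Keep the central `size` characters of `seq`.

    Assumption: when the surplus is odd, the extra character is dropped
    from the right end; XSTREME may choose differently.
    """
    if len(seq) <= size:
        return seq
    start = (len(seq) - size) // 2
    return seq[start:start + size]


print(resolve_widths())              # (6, 15)
print(resolve_widths(maxw=4))        # (4, 4)
print(resolve_widths(minw=20))       # (20, 20)
print(resolve_group_weak(gthr=0.01))
print(default_order("DNA"), default_order("protein"))
print(trim_to_center("AACCGGTT", 4)) # CCGG
```

The sketch does not validate inputs; for example, it does not reject a minimum width larger than the maximum width when both are given.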