Usage:

xstreme [options] [--m <motifs>]* --p <primary sequences>

Description

Input

--p <primary sequences>

The name of a file containing the primary (positive) sequences in FASTA format on which to perform comprehensive motif analysis. The file must contain at least two valid sequences or XSTREME will reject it.
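
As a rough pre-flight check for the two-sequence requirement, the sketch below counts FASTA records by their header lines. It only illustrates the requirement: the helper names are hypothetical, and it assumes every line starting with ">" begins one record. It does not reproduce whatever validity checks XSTREME itself applies to each sequence.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def has_enough_sequences(lines, minimum=2):
    """Return True if the input holds at least `minimum` FASTA records."""
    return count_fasta_records(lines) >= minimum


# Hypothetical in-memory FASTA content; a real check would iterate over a file.
example = [">seq1", "ACGTACGT", ">seq2", "TTGACA"]
print(has_enough_sequences(example))
```

A file that passes this check can still be rejected if some of its records are not valid sequences in the chosen alphabet.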

Output

XSTREME writes its output to files in a directory named xstreme_out, which it creates if necessary. You can change the output directory using the --o or --oc options. The directory will contain the following files:

In addition, the XSTREME output directory will contain sub-directories with the results of each of the individual analyses it performed. The results in these directories are all linked to from the XSTREME HTML output file.

Note: See the detailed description of the XSTREME output formats for more information.

Note: All options may be preceded by a single dash (-) instead of a double dash (--) if desired.
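
The usage line can be turned into an argument list programmatically. The following is a minimal sketch, not part of XSTREME: the function name and the file names are hypothetical, and it assumes that --oc takes the output directory as its argument. It uses only the --p, --n, --m and --oc options named in this document, and repeats --m once per motif file.

```python
import shlex


def build_xstreme_command(primary, motif_files=(), control=None, outdir=None):
    """Assemble an xstreme argument list from the options described above."""
    cmd = ["xstreme", "--p", primary]
    if control is not None:
        cmd += ["--n", control]
    for path in motif_files:
        cmd += ["--m", path]  # --m may be repeated, once per motif file
    if outdir is not None:
        cmd += ["--oc", outdir]  # assumption: --oc is followed by a directory
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_xstreme_command(
    "peaks.fa", motif_files=["db1.meme", "db2.meme"], outdir="results"
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a system where xstreme is installed.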

Options

Option Parameter Description Default Behavior
Output
Primary Sequences
--p <primary sequences>  [REQUIRED] The name of a file containing primary (positive) sequences in FASTA format. XSTREME will perform comprehensive motif analysis on these sequences. Default: None. This input is required.
Control Sequences and Background Model
--n <control sequences>  The name of a file containing control (negative) sequences in FASTA format. XSTREME will report motifs that are enriched in the primary sequences relative to the control sequences. XSTREME passes the primary and control sequences to the STREME motif discovery algorithm and to the SEA motif enrichment analysis algorithm. If you do not provide a background model (see option --bfile, below), XSTREME also creates a Markov background model from the control sequences and passes it to the STREME, MEME and SEA algorithms. The control sequences must be in the same sequence alphabet as the primary sequences. Default: If you do not provide control sequences, XSTREME creates them by shuffling a copy of each primary sequence, using an m-order shuffle (see next option). Shuffling also preserves the positions of non-core (e.g., ambiguous) characters in each sequence to avoid artifacts.
--order <m>  Estimate an m-order Markov background model from the control sequences for input to the STREME, MEME and SEA algorithms. If you do not provide control sequences, XSTREME creates them by applying an m-order shuffle to a copy of each primary sequence. This preserves the frequencies of words of length m+1 in each shuffled sequence. m must be in the range 0 to 4 (inclusive). Note: If you do not specify control sequences, XSTREME estimates the background model from the primary sequences instead. XSTREME uses the fasta-get-markov program with a total pseudocount of 1 to create the Markov model. Note: If you specify a background model using --bfile (see below), XSTREME does not estimate a background model, but passes the specified background to STREME, MEME and SEA. XSTREME also passes m to STREME, MEME and SEA. Default: XSTREME uses m=2 (DNA and RNA), and m=0 (Protein and Custom alphabets).
--bfile <file>  Specify the source of a background model in Markov Background Model Format to be passed to STREME, MEME and SEA. Default: XSTREME estimates a background model from the control sequences, or from the primary sequences if you do not provide control sequences, as described above for the --order option.
--seed <seed>  Random seed to be passed to STREME, MEME and SEA. Default: XSTREME uses a random seed of 0.
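
To make the --order statement about word frequencies concrete, the sketch below counts the overlapping words of length m+1 in a sequence. It is an illustration only: the function name and the two short sequences are made up, and it does not implement XSTREME's shuffling or the fasta-get-markov model estimation. The second sequence is a rearrangement of the first that keeps every word of length 2, as an m=1 shuffle would, but not every word of length 3.

```python
from collections import Counter


def word_counts(seq, m):
    """Count overlapping words of length m+1, the quantity an m-order shuffle preserves."""
    k = m + 1
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


original = "ACAGA"    # hypothetical primary sequence
rearranged = "AGACA"  # hypothetical rearrangement of the same letters

# Words of length 2 (m=1) are the same in both sequences.
print(word_counts(original, 1) == word_counts(rearranged, 1))
# Words of length 3 (m=2) differ, so this is not a valid 2-order shuffle.
print(word_counts(original, 2) == word_counts(rearranged, 2))
```

A higher m therefore constrains the shuffled controls more tightly, which is why m is limited to a small range.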
Input Motifs
--m <motifs>  [RECOMMENDED] The name of a file containing known motifs in MEME format. Outputs from MEME, STREME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME Motif format using conversion scripts available with the MEME Suite. This option may be repeated to pass multiple files of motifs. Default: When no files are provided, XSTREME cannot report similar known motifs.
Alphabet
Output Filtering and Number of Motifs
--evt <evt>  E-value threshold for including motifs in the output. This is also used as the default E-value threshold for STREME (--streme-evt) and for MEME (--meme-evt). Default: A value of 0.05 is used.
--time <minutes>  The maximum time (in minutes) that XSTREME is allowed to run before terminating itself gracefully. Default: There is no time limit.
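
The way --evt acts as a fallback for --streme-evt and --meme-evt can be sketched as follows. This is an illustrative helper with hypothetical names, not XSTREME code. It takes the defaults stated in this document (0.05 for --evt, and the --evt value for the two algorithm-specific thresholds) and uses None to mean that an option was not given.

```python
def resolve_evt(evt=None, streme_evt=None, meme_evt=None):
    """Apply the documented defaults: 0.05 for evt, and evt for the two others."""
    evt = 0.05 if evt is None else evt
    return {
        "evt": evt,
        "streme_evt": evt if streme_evt is None else streme_evt,
        "meme_evt": evt if meme_evt is None else meme_evt,
    }


print(resolve_evt())
print(resolve_evt(evt=0.01, meme_evt=0.1))
```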
Motif Width
--minw <width>  The minimum width of motifs to find. Default: A minimum width of 6 is used unless the maximum width has been set to be less than 6, in which case the maximum width is used.
--maxw <width>  The maximum width of motifs to find. Default: A maximum width of 15 is used unless the minimum width has been set to be larger than 15, in which case the minimum width is used.
--w <width>  Search for motifs with an exact width of width. Overrides --minw and --maxw. Default: See --minw and --maxw, above.
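
The interaction between --minw, --maxw and --w can be written out as a small function. This is a sketch of the rules as stated in the three entries above, with a hypothetical function name. None stands for an option that was not given. The case where both limits are given with the minimum above the maximum is not described here, so the sketch passes such values through unchanged.

```python
def resolve_widths(minw=None, maxw=None, w=None):
    """Return (min_width, max_width) following the documented defaults."""
    if w is not None:
        return w, w  # --w overrides --minw and --maxw
    if minw is None:
        # Default 6, unless the maximum was set below 6.
        minw = 6 if maxw is None or maxw >= 6 else maxw
    if maxw is None:
        # Default 15, unless the minimum was set above 15.
        maxw = 15 if minw <= 15 else minw
    return minw, maxw


print(resolve_widths())
print(resolve_widths(maxw=4))
print(resolve_widths(minw=20))
print(resolve_widths(minw=3, w=8))
```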
Misc
--mea-only  Use XSTREME as a Motif Enrichment Analysis (MEA) tool. No motif discovery algorithms will be run. XSTREME will use SEA to analyze the enrichment of the known motifs you provide in your input sequences. XSTREME will cluster the enriched motifs as usual, and will show the distribution of the motif sites in your input sequences for each enriched motif it identifies. This option sets options --streme-nmotifs and --meme-nmotifs to 0, and sets option --fimo-skip (see below). Default: Perform motif discovery and motif enrichment analysis.
--ctrim <size>  For input to STREME, MEME and SEA, XSTREME will trim the primary sequences to their central region of size characters. (The full-length sequences will still be used for the positional distribution plots and as input to FIMO.) Default: The input sequences will not be trimmed.
--align <left|center|right>  For the site positional distribution diagrams, XSTREME will align the sequences on their left ends (left), on their centers (center), or on their right ends (right). For visualizing motif distributions, center alignment is ideal for ChIP-seq and similar data; right alignment for sequences upstream of transcription start sites; left alignment for many proteins or 3' UTR sequences. Default: Align the sequences on their centers.
--group-thresh <gthr>  Main threshold for clustering highly similar motifs in XSTREME output. All motifs in a group will have a Tomtom E-value less than or equal to gthr when compared to the seed motif for the group, which is the most significant motif in the group. Default: A value of 0.05 is used.
--group-weak <wthr>  Secondary threshold for clustering highly similar motifs in XSTREME output. If you specify this option, a group will be merged into a more significant group if all of its motifs are weakly similar to the seed motif of the more significant group. wthr specifies the Tomtom E-value threshold for merging groups. Default: Twice the value of the main clustering threshold: 2 * gthr.
--verbosity <0|1|2|3|4|5>  A number that regulates the verbosity level of the output information messages. If set to 0 (very quiet), XSTREME will only output warning and error messages. If set to 1 (quiet), XSTREME will also output the start/stop of each pipeline step. At the other extreme, 5 (dump), XSTREME and the programs in its pipeline will output a large amount of information intended for debugging. Default: The verbosity level is set to 1 (quiet).
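
The central trimming performed by --ctrim can be pictured with the sketch below. It is an illustration under stated assumptions, not XSTREME's implementation: the function name is hypothetical, sequences no longer than size are assumed to be left unchanged, and when the surplus is odd the extra character is assumed to be dropped from the right end. XSTREME's actual handling of both cases is not specified in this document.

```python
def trim_to_center(seq, size=None):
    """Return the central `size` characters of seq (assumed rounding: left-biased)."""
    if size is None or len(seq) <= size:
        return seq  # assumption: short sequences are passed through untrimmed
    start = (len(seq) - size) // 2
    return seq[start:start + size]


# Hypothetical 10-character sequence trimmed to its central 4 characters.
print(trim_to_center("ACGTACGTAC", 4))
```

Only the inputs to STREME, MEME and SEA are trimmed in this way; the positional distribution plots and FIMO still use the full-length sequences.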
STREME Specific Options
--streme-evt <E-value>  Stop searching for more motifs when three successive motifs have E-values larger than this threshold. Default: The value specified for --evt (or its default) is used.
--streme-nmotifs <count>  Stop searching for more motifs when count motifs have been found. If count is 0, STREME will not be run. Default: Search stops when the --streme-evt criterion has been satisfied.
--streme-totallength <totallength>  The maximum length of each sequence set (in characters) used by STREME. If the input sequence sets exceed this limit they will be down-sampled. See the documentation on the STREME --totallength option in the STREME documentation for more details. Default: The total length of the input sequences to STREME is not limited.
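
The two STREME stopping criteria can be sketched over a list of motif E-values given in discovery order. This is a simplified model with hypothetical names and made-up E-values, not STREME's code. It assumes that when a count is given it alone decides where the search stops, and that otherwise the search stops once three successive motifs exceed the threshold.

```python
def motifs_found_before_stop(evalues, evt=0.05, nmotifs=None, patience=3):
    """Return how many motifs are produced before the search stops."""
    if nmotifs is not None:
        # Assumption: a motif count replaces the E-value criterion.
        return min(nmotifs, len(evalues))
    streak = 0  # number of successive motifs with E-value above evt
    for found, evalue in enumerate(evalues, start=1):
        streak = streak + 1 if evalue > evt else 0
        if streak == patience:
            return found
    return len(evalues)


# Hypothetical E-values in the order the motifs would be discovered.
evalues = [1e-10, 1e-4, 0.2, 0.01, 0.3, 0.9, 0.5, 0.6]
print(motifs_found_before_stop(evalues))
print(motifs_found_before_stop(evalues, nmotifs=2))
```

In this model a single significant motif resets the streak, so one weak motif between strong ones does not end the search.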
MEME Specific Options
--meme-evt <E-value>  Stop searching for more motifs if the next motif has an E-value larger than this threshold. Default: The value specified for --evt (or its default) is used.
--meme-nmotifs <num>  The number of motifs that MEME should search for. If num is 0, MEME will not be run. Default: Search stops when the --meme-evt criterion has been satisfied.
--meme-searchsize <searchsize>  The maximum portion of the primary sequences (in characters) used by MEME in searching for motifs. See the documentation on the MEME -searchsize option in the MEME documentation for more details. Default: MEME performs sampling if the primary sequences contain more than 100,000 characters.
--meme-p <np>  Use the faster, parallel version of MEME with np processors. The parameter np may be a number or it may be a quoted string starting with a number and followed by arguments to the particular MPI run command for your installation (e.g., mpirun). Default: Use a single processor.
--meme-brief <nbrief>  If there are more than nbrief (primary) sequences, the size of MEME's output will be reduced by suppressing the inclusion of the sequence names, motif sites and scanned sites in MEME's HTML and XML outputs, and by suppressing the tables of sequence lengths, sites and block diagrams in MEME's text output. Default: A value of 1000 is used for nbrief.
--meme-mod <oops|zoops|anr>  The number of motif sites that MEME will find per sequence.
oops - One Occurrence Per Sequence,
zoops - Zero or One Occurrence Per Sequence,
anr - Any Number of Repetitions.
See -mod in the MEME documentation for more information.
Default: MEME defaults to using zoops mode.
SEA Specific Options
--sea-noseqs  Do not output the SEA matching sequences TSV file. This option is useful if the matching sequence information is not needed, as the TSV file can be very large. Default: SEA will output the matching sequences TSV file.
FIMO Specific Options
--fimo-skip  Do not run FIMO. This option is useful for saving disk space if the predicted motif sites in the sequences are not needed. Default: Run FIMO using the most significant motif from each cluster to scan the (full-length) input sequences.

Citing