spamo [options] <sequences> <primary motif> <secondary motifs>+
A FASTA formatted file containing many short sequences, each centered on a site expected to be relevant to the primary motif. Such a file would typically be generated by extending each ChIP-seq peak on both sides to obtain sequences about 500 bases long.
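Assuming you start from peak summit coordinates produced by a peak caller, one way to build such a file is to take a fixed-width window centred on each summit and then extract the windowed sequences (for example with bedtools getfasta). The hypothetical helper below sketches only the window arithmetic. The function name is illustrative, and the 500-base default simply follows the typical length mentioned above.

```python
def centered_window(summit, width=500, chrom_length=None):
    """Return 0-based, half-open (start, end) coordinates of a window of
    `width` bases centred on a peak summit, or None if the window would run
    off either end of the chromosome. Hypothetical helper, not part of SpaMo."""
    half = width // 2
    start = summit - half
    end = start + width
    if start < 0:
        return None  # window would start before the chromosome
    if chrom_length is not None and end > chrom_length:
        return None  # window would run past the chromosome end
    return start, end


# Hypothetical summits on a 10 kb chromosome.
for summit in (1000, 100, 9900):
    print(summit, centered_window(summit, chrom_length=10_000))
```

Windows that would run off a chromosome end are dropped rather than clipped, because a clipped window would no longer be centred on the summit.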
SpaMo scans the central section of each sequence, excluding the margin at either edge, for the primary motif. Because a margin is excluded at each edge, any sequence shorter than twice the margin plus the trimmed length of the primary motif is always discarded.
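The length rule above is simple arithmetic. The sketch below assumes 0-based coordinates and a primary site that must leave `margin` bases free on both sides; it illustrates the rule and is not SpaMo's own code. With the default margin of 150, a 500-base sequence leaves the central 200 bases for the primary site.

```python
def primary_scan_region(seq_length, margin, motif_width):
    """Return the 0-based, half-open (start, end) region in which a primary
    site of `motif_width` bases can lie, or None if the sequence is too short.

    Assumes the primary site must keep `margin` bases free on each side, so a
    sequence is usable only if seq_length >= 2 * margin + motif_width.
    """
    start = margin
    end = seq_length - margin
    if end - start < motif_width:
        return None  # too short: SpaMo would discard this sequence
    return start, end


# Default margin of 150 and a hypothetical trimmed primary motif width of 10.
print(primary_scan_region(500, 150, 10))  # (150, 350): the central 200 bases
print(primary_scan_region(309, 150, 10))  # None: shorter than 2 * 150 + 10
```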
A file containing at least one MEME formatted motif. Output files from MEME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using the conversion scripts available with the MEME Suite.
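If a motif is only available as a probability matrix, it can be written out as Minimal MEME Format text. The sketch below follows our reading of that format (version line, alphabet, strands, background, then a letter-probability matrix with rows of A, C, G, T probabilities); the function name, motif name and matrix values are hypothetical, so check the result against the MEME Suite format documentation before relying on it.

```python
def to_minimal_meme(name, matrix, nsites=20, background=(0.25, 0.25, 0.25, 0.25)):
    """Serialise one DNA motif (rows of A, C, G, T probabilities) as
    Minimal MEME Format text. Hypothetical helper, not a MEME Suite API."""
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        " ".join(f"{b} {f:.3f}" for b, f in zip("ACGT", background)),
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {len(matrix)} nsites= {nsites} E= 0",
    ]
    for row in matrix:
        # Each row is one motif position and must be a probability distribution.
        if len(row) != 4 or abs(sum(row) - 1.0) > 1e-6:
            raise ValueError(f"bad row {row!r}: need 4 probabilities summing to 1")
        lines.append(" ".join(f"{p:.6f}" for p in row))
    return "\n".join(lines) + "\n"


# Hypothetical 4-position motif, roughly "CACG".
text = to_minimal_meme("example_motif", [
    [0.1, 0.7, 0.1, 0.1],
    [0.8, 0.1, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
])
print(text)
```

The printed text can be saved to a file and passed to SpaMo as a primary or secondary motif file.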
The primary motif is the motif for which you are trying to find cofactors. If the file contains more than one motif, the first is selected by default; another can be selected using the -primary or -primaryi options.
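The selection rules can be summarised in a few lines. This hypothetical helper reproduces only the documented behaviour (first motif by default, -primary by name, -primaryi by 1-based index, never both); treating a missing name or an out-of-range index as an error is an assumption about how SpaMo behaves.

```python
def select_primary(motif_names, name=None, index=None):
    """Return the name of the primary motif: the first motif by default,
    otherwise the one chosen by name (like -primary) or by 1-based index
    (like -primaryi), but never both. Hypothetical helper."""
    if not motif_names:
        raise ValueError("the primary motif file contains no motifs")
    if name is not None and index is not None:
        raise ValueError("-primary and -primaryi cannot be used together")
    if name is not None:
        if name not in motif_names:
            raise ValueError(f"no motif named {name!r}")
        return name
    if index is not None:
        if not 1 <= index <= len(motif_names):
            raise ValueError(f"index {index} out of range 1..{len(motif_names)}")
        return motif_names[index - 1]  # convert the 1-based index
    return motif_names[0]


motifs = ["MOTIF_A", "MOTIF_B", "MOTIF_C"]      # hypothetical motif names
print(select_primary(motifs))                   # MOTIF_A (default)
print(select_primary(motifs, index=2))          # MOTIF_B
print(select_primary(motifs, name="MOTIF_C"))   # MOTIF_C
```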
One or more MEME formatted motif files containing DNA motifs, in the same formats accepted for the primary motif file described above. The secondary motifs are tested for a significant spacing relative to the primary motif, which might imply that they act together. If the motif databases contain motifs that you don't wish to scan, the motifs can be filtered by name using the -inc and -exc options.
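Filtering motif names with shell-like wildcards can be sketched with Python's fnmatch module. How SpaMo combines -inc and -exc when both are given is not stated here; this sketch assumes include patterns are applied first and exclude patterns second, with case-sensitive matching. The motif names are hypothetical. On the command line, quote each pattern (for example -inc 'MA*') so the shell does not expand it.

```python
from fnmatch import fnmatchcase


def filter_motifs(names, include=(), exclude=()):
    """Keep motif names matching any include pattern (all names if no
    include patterns are given), then drop names matching any exclude
    pattern. Hypothetical helper mirroring -inc / -exc."""
    kept = []
    for name in names:
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        if any(fnmatchcase(name, p) for p in exclude):
            continue
        kept.append(name)
    return kept


names = ["MA0004.1", "MA0006.1", "PB0001.1", "PH0002.1"]  # hypothetical names
print(filter_motifs(names, include=["MA*"]))          # ['MA0004.1', 'MA0006.1']
print(filter_motifs(names, exclude=["P?0*"]))         # ['MA0004.1', 'MA0006.1']
print(filter_motifs(names, include=["MA*", "PB*"], exclude=["*6.1"]))
```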
SpaMo writes its output to files in a directory named spamo_out, which it creates if necessary. You can change the output directory using the -o or -oc options.
The main output file is named spamo.html and can be viewed with a web browser. The spamo.html file is generated from the spamo.xml file, so the XML file is recommended when machine processing is required.
Histogram image files are only generated when the -eps and/or -png options are specified. If you are viewing the output in an older web browser, you will need to specify the -png option so that the histograms are viewable.
| Option | Parameter | Description | Default Behaviour |
|---|---|---|---|
| Input/Output | | | |
| -loadcismls | | Load CISML files to get motif position scores instead of scanning. If this flag is specified, each motif file must have a CISML file specified after it. This is not compatible with -trim, as that option must modify the motifs before scanning. | Scan sequences to determine the position scores. |
| -eps | | Output histograms in Encapsulated PostScript format, which can be included in publications. This option can be combined with -png. | Image files are not output by default, as the web page can generate the graphs on demand. |
| -png | | Output histograms in Portable Network Graphics format, which is good for web pages. This option can be combined with -eps. | Image files are not output by default, as the web page can generate the graphs on demand. |
| -dumpseqs | | Write space-separated values in columns, describing the motif matches used to make the histograms, to output files. The rows are initially in sequence name order, but various command-line tools can be used to sort them on other values. The columns contain: | No specific match information is output. |
| -dumpsigs | | Same as -dumpseqs, but only secondary matches in significant bins are dumped. | As for -dumpseqs. |
| Scanning | | | |
| -numgen | seed | Specify a number as the seed for initializing the pseudo-random number generator used to break scoring ties. The seed is included in the output so experiments can be repeated. To run multiple experiments with different seeds, use the special value 'time' (without the quotes), which sets the seed from the system clock. | A seed of 1 is used. |
| -margin | size | The distance on either side of the primary motif site that makes up the region that can contain the secondary motif site. It is also the minimum gap between the primary motif site and the edge of the sequence. These constraints mean that input sequences shorter than the trimmed length of the primary motif plus twice the margin size cannot be used by SpaMo. | A margin of 150 is used. For an input sequence of length 500, the central 200 bases are scanned for the best primary motif match, and then the 300 bases surrounding the best primary site (150 on each side) are scanned for the best secondary site. |
| -bin | size | The size of the bin used to calculate the histogram and p-values. A bin size of 1 is recommended, as it resolves spacings to single bases and gives better output. | A bin size of 1 is used. |
| -range | size | The distance from the primary motif site within which bin p-values are calculated and included in significance tests. A small range may miss significant peaks, but this is a trade-off: the larger the range, the more bins have to be tested, which increases the factor used in the Bonferroni correction for multiple tests. | A range of 150 is used. |
| -shared | fraction | Redundant sequences that have more than this fraction of identical residues are removed. After the primary motif site has been selected in each sequence, the sequence is trimmed to include only a region of size margin on either side of the primary motif site. This aligned and trimmed sequence (and its reverse complement) is then compared with all the other sequences, and the fraction of shared bases is calculated, not including the bases in the match to the primary motif. If the fraction of shared bases between the sequence (or its reverse complement) and another sequence is larger than this limit, then the second sequence is eliminated. To disable this feature, set the shared fraction to 1. | The shared fraction is set to 0.5, which means that the trimmed, aligned sequences must share 50% or more of their bases to be declared redundant. |
| -odds | odds ratio | To speed up the elimination of redundant sequences, their positions are compared in a random order, and comparison stops whenever the number of matches is so small that the odds ratio is greater than this value. The odds ratio is the probability of the given number of matches given that the sequences were generated by the background model, divided by the same probability given that they have at least fraction matching positions (as specified by the option -shared). | The odds ratio is set to 20. |
| Summarizing | | | |
| -cutoff | p-value | The p-value cutoff for bins to be considered significant. This is the p-value of a binomial test for observing at least the given number of secondary spacings in the bin, adjusted for the number of bins tested. Note that the p-value is only calculated and tested for bins within the distance from the primary motif specified by the option -range. | A bin p-value smaller than or equal to 0.05 is considered significant. |
| -evalue | E-value | The maximum E-value a secondary motif may have for its results to be printed. For each secondary motif, the E-value is the minimum p-value of all tested bins multiplied by the number of secondary motifs. It estimates the expected number of random secondary motifs that would have the given E-value or lower. | Results for all secondary motifs with an E-value smaller than or equal to 10 are printed. |
| -overlap | size | To determine whether two motifs are redundant, the most significant bins in the tested range of the two motifs are compared. For the motifs to be considered redundant, it must be possible that the sites counted in those bins overlapped, and this parameter sets the minimum overlap. For a bin size larger than 1, the overlap of the bins cannot be calculated precisely, as the actual site positions are not stored, so the maximum possible overlap is used. | A minimum overlap of 2 is required. |
| -joint | fraction | To determine whether two motifs are redundant, the most significant bins in the tested range of the two motifs are compared. Each motif's most significant bin has a list of the identifiers of the sequences that had a primary and a secondary site at the right spacing to fall into that bin. To compare the motifs for redundancy, these two sets of sequence identifiers are compared and the size of their intersection is counted. The intersection size is divided by the size of the smaller of the two sets to get the joint sequence fraction. | A minimum joint sequence fraction of 0.5 is required for two motifs to be considered redundant. |
| Motif Loading | | | |
| -pseudo | count | The pseudocount added to loaded motifs. | A pseudocount of 0.1 is added to loaded motifs. |
| -bgfile | file | The file containing the background frequencies used in applying pseudocounts. | The base frequencies of the input sequences are used as the background. |
| -trim | bits | Trim the edges of motifs based on information content. Positions at the edges of the motifs with information content less than bits are not used in scanning. This is incompatible with the -loadcismls option, as the motifs must be trimmed before scoring can take place. | Positions at the edges of the motifs with information content less than or equal to 0.25 are trimmed. |
| -primary | name | The name of the motif to select as the primary motif. This option is incompatible with -primaryi, as only one primary motif can be selected. | The first motif in the file is selected. |
| -primaryi | num | The index of the motif to select as the primary motif, counting from 1. This option is incompatible with -primary, as only one primary motif can be selected. | The first motif in the file is selected. |
| -keepprimary | | If the same file is specified for the primary and secondary motifs, then by default the primary motif is excluded from the secondaries; specifying this option keeps it. | The primary motif is excluded from the secondaries if the same file is used for the primary and secondary motifs. |
| -inc | pattern | Select the motifs with names matching the pattern. The pattern can contain shell-like wildcards (e.g., '*'), though they must be escaped or quoted to prevent the shell from expanding them. This option may be repeated, and all the patterns will be used. | Unless the -exc option has been specified, all the motifs are used. |
| -exc | pattern | Exclude the motifs with names matching the pattern. The pattern can contain shell-like wildcards (e.g., '*'), though they must be escaped or quoted to prevent the shell from expanding them. This option may be repeated, and all the patterns will be used. | Unless the -inc option has been specified, all the motifs are used. |
| Miscellaneous | | | |
| -help | | Print out a help message. | |
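The -bin, -range and -cutoff descriptions combine into a per-bin significance test. The sketch below computes an upper-tail binomial p-value and applies a Bonferroni adjustment for the number of tested bins. The null probability of a spacing landing in a given bin (p_bin) and the number of tested bins are supplied by the caller, because how SpaMo derives them is not described here; all numbers in the example are hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly.
    Adequate for modest n; for very large n a library routine such as
    scipy.stats.binom.sf avoids float overflow in comb(n, i)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def adjusted_bin_pvalue(observed, total, p_bin, tested_bins):
    """Binomial p-value for `observed` or more of `total` spacings landing in
    one bin, Bonferroni-adjusted for `tested_bins` bins and capped at 1."""
    raw = binomial_upper_tail(total, observed, p_bin)
    return min(1.0, raw * tested_bins)


# Hypothetical numbers: 1,000 spacings, 40 of them in one bin, a null
# probability of 1/150 per bin, and 600 bins tested.
p = adjusted_bin_pvalue(observed=40, total=1000, p_bin=1 / 150, tested_bins=600)
print(p, p <= 0.05)  # compared against the default -cutoff of 0.05
```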
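The E-value described under -evalue is a single multiplication followed by a threshold. This sketch assumes it is given the tested-bin p-values for each secondary motif (as described under -cutoff); the motif names, p-values and motif count are hypothetical.

```python
def secondary_evalue(bin_pvalues, n_secondary_motifs):
    """E-value of one secondary motif: its smallest tested-bin p-value
    multiplied by the number of secondary motifs."""
    return min(bin_pvalues) * n_secondary_motifs


# Hypothetical tested-bin p-values for three of 500 secondary motifs.
results = {"motif_x": [0.2, 1e-4, 0.9], "motif_y": [0.04, 0.5], "motif_z": [0.3]}
for name, pvals in results.items():
    e = secondary_evalue(pvals, n_secondary_motifs=500)
    if e <= 10:  # default -evalue threshold: only motif_x passes here
        print(name, e)
```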
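The two redundancy measures under -overlap and -joint can be sketched as follows. The joint fraction follows the description directly. The overlap helper is an assumption: it treats two secondary sites as intervals given by their start offsets from the primary site on the same strand with a bin size of 1, and it treats the two documented conditions as jointly sufficient for redundancy. Sequence identifiers and offsets are hypothetical.

```python
def joint_fraction(seqs_a, seqs_b):
    """Size of the intersection of two sequence-ID sets divided by the size
    of the smaller set (the -joint measure)."""
    a, b = set(seqs_a), set(seqs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def site_overlap(offset_a, width_a, offset_b, width_b):
    """Bases shared by two sites given their start offsets from the primary
    site (assumes same strand and a bin size of 1)."""
    return max(0, min(offset_a + width_a, offset_b + width_b) - max(offset_a, offset_b))


def redundant(seqs_a, seqs_b, overlap, min_overlap=2, min_joint=0.5):
    """Assumed rule: both documented conditions must hold."""
    return overlap >= min_overlap and joint_fraction(seqs_a, seqs_b) >= min_joint


# Hypothetical most-significant bins for two secondary motifs.
best_a = {"seq1", "seq2", "seq3", "seq4"}
best_b = {"seq2", "seq3", "seq9"}
ov = site_overlap(offset_a=10, width_a=8, offset_b=14, width_b=6)
print(joint_fraction(best_a, best_b), ov, redundant(best_a, best_b, ov))
```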
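The -shared comparison can be sketched for a single pair of sequences. This assumes both sequences have already been aligned on their primary sites and trimmed to margin + primary site + margin bases, so reversing and complementing one of them keeps the primary site in the same columns. It does not implement the -odds early stopping or the order in which sequences are eliminated; the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_fraction(a, b, margin, motif_width):
    """Fraction of identical bases between two aligned, trimmed sequences,
    skipping the primary-site columns and taking the better of b and its
    reverse complement."""
    if len(a) != len(b) or len(a) != 2 * margin + motif_width:
        raise ValueError("sequences must be aligned and trimmed to the same length")
    # Columns outside the primary motif match.
    compared = [i for i in range(len(a)) if not margin <= i < margin + motif_width]

    def fraction(other):
        same = sum(1 for i in compared if a[i] == other[i])
        return same / len(compared)

    return max(fraction(b), fraction(reverse_complement(b)))


# Hypothetical trimmed sequences: margin 3, primary site width 2.
print(shared_fraction("ACGTAGCA", "ACGTTTTT", margin=3, motif_width=2))  # 0.5
print(shared_fraction("ACGTAGCA", "TGCTACGT", margin=3, motif_width=2))  # 1.0
```

Whether a pair sharing exactly the threshold fraction counts as redundant is left to the caller, since the option and default descriptions above word the boundary differently.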
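Edge trimming with -trim can be sketched as below. The information content of a column is taken here as its relative entropy (in bits) against the background, which is an assumption about how SpaMo measures it; the boundary comparison uses "less than or equal to", following the default-behaviour wording. Only edge columns are removed, so weak interior columns survive. The motif matrix is hypothetical.

```python
from math import log2


def column_ic(column, background=(0.25, 0.25, 0.25, 0.25)):
    """Information content in bits of one motif column, computed as the
    relative entropy of the column against the background."""
    return sum(p * log2(p / q) for p, q in zip(column, background) if p > 0)


def trim_motif(matrix, bits=0.25, background=(0.25, 0.25, 0.25, 0.25)):
    """Drop low-information columns from both edges of a motif; interior
    columns are kept whatever their information content."""
    ics = [column_ic(col, background) for col in matrix]
    start, end = 0, len(matrix)
    while start < end and ics[start] <= bits:
        start += 1
    while end > start and ics[end - 1] <= bits:
        end -= 1
    return matrix[start:end]


motif = [
    [0.25, 0.25, 0.25, 0.25],  # uninformative edge column: trimmed
    [1.0, 0.0, 0.0, 0.0],      # 2 bits
    [0.4, 0.2, 0.2, 0.2],      # weak, but interior: kept
    [0.0, 0.0, 0.0, 1.0],      # 2 bits
    [0.3, 0.2, 0.3, 0.2],      # weak edge column: trimmed
]
print(len(trim_motif(motif)))  # 3
```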
If you use SpaMo in your research please cite the following paper:
Tom Whitington, Martin C. Frith, James Johnson and Timothy L. Bailey,
"Inferring transcription factor complexes from ChIP-seq data",
Nucleic Acids Research, 39(15):e98, 2011.