Usage:

spamo [options] <sequence file> <primary motif> <secondary motifs>+

Description

Inputs

<sequence file>

The name of a FASTA formatted file containing sequences (ideally of about 500bp) centered on a genomic location expected to be relevant to the primary motif. This would typically be generated by expanding either side of a ChIP-seq peak to obtain sequences of about 500 bases in length.

SpaMo scans the central section, excluding the margin on either edge, for the primary motif. Because the margin on each edge is excluded, any sequence shorter than two times the margin plus the trimmed length of the primary motif is always discarded.
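The length rule above can be illustrated with a short sketch. This is not SpaMo's own code; the function name and the 0-based half-open coordinates are assumptions made for the example.

```python
from typing import Optional, Tuple


def primary_scan_window(
    seq_len: int, margin: int, trimmed_motif_width: int
) -> Optional[Tuple[int, int]]:
    """Return the central section scanned for the primary motif.

    The result is a 0-based half-open (start, end) pair, or None when the
    sequence is too short to hold the trimmed motif plus a margin on each
    side. Hypothetical helper, written only to illustrate the rule.
    """
    min_usable_length = 2 * margin + trimmed_motif_width
    if seq_len < min_usable_length:
        return None  # such a sequence would be discarded
    return (margin, seq_len - margin)


if __name__ == "__main__":
    # A 500-base sequence with a margin of 150 leaves the central 200 bases.
    print(primary_scan_window(500, 150, 8))
    # One base too short for an 8-wide trimmed motif.
    print(primary_scan_window(307, 150, 8))
```

With a margin of 150 and a trimmed motif width of 8, the shortest usable sequence in this sketch is 308 bases.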

<primary motif>

The name of a file containing at least one MEME formatted motif. Outputs from MEME, STREME and DREME are supported, as well as Minimal MEME Format. You can convert many other motif formats to MEME format using conversion scripts available with the MEME Suite. The primary motif is the motif for which you are trying to find cofactors. If the file contains more than one motif then the first will be selected by default or another can be selected using the -primary or -primaryi options.

<secondary motifs>+

The names of one or more MEME formatted motif files containing DNA motifs (see Primary Motifs, above). The secondary motifs are tested for a significant spacing with the primary motif which might imply they act together. If the motif databases contain motifs which you don't wish to scan, the motifs can be filtered based on their name by using the -inc and -exc options.

Outputs

SpaMo writes its output to files in a directory named spamo_out, which it creates if necessary. You can change the output directory using the -o or -oc options. The directory will contain the following files:

Additional outputs may be requested using the -dumpseqs, -dumpsigs, -eps and -png options, as described below.

Note: See the SpaMo output formats documentation for more information.

Options

Option Parameter Description Default Behavior
Output
-text Output only a TSV file (spamo.tsv), not an HTML file (spamo.html). Default: both an HTML and a TSV file are output.
-eps Output histograms in Encapsulated PostScript format, which can be included in publications. This option can be used with the -png option. Default: image files are not output, as the webpage is capable of generating the graphs on demand.
-png Output histograms in Portable Network Graphics format, which is suitable for webpages. This option can be used with the -eps option. Default: image files are not output, as the webpage is capable of generating the graphs on demand.
-dumpseqs Write TSV files describing the motif matches used to make the histograms to output files named seqs_<primary_motif>_<secondary_db>_<secondary_motif>.tsv. The rows are sorted in sequence name order, but various command-line tools can be used to sort them on other values. The format of the files is described in detail in the SpaMo output formats documentation. Default: no specific match information is output.
-dumpsigs Same as -dumpseqs, but only secondary matches in significant bins are dumped. The format of the files is described in detail in the SpaMo output formats documentation. Default: no specific match information is output.
Scanning
-norc Do not scan the reverse-complement strand of the sequences for motif sites. This is useful if the input sequences are DNA but you wish to treat them as RNA. Default: both strands of the input sequences are scanned for motif sites if the alphabet has a complement (as does DNA, but not RNA or protein).
-minscore <value> The minimum score accepted as a match to either the primary or secondary motif. This value can greatly affect the results of SpaMo. If it is too high, there will be no matches to the primary motif. If it is too low, sequences with non-significant matches to the primary and/or secondary motif will reduce the effectiveness of the spacing analysis. Note: If <value> is in the range [-1,0), then the minimum score is set to the absolute value of <value> times the maximum possible match score. Default: a minimum score of 7 bits is used.
-margin <size> The distance on either side of the primary motif site that makes up the region that can contain the secondary motif site. It is also the minimum gap between the primary motif site and the edge of the sequence. These constraints mean that input sequences shorter than the trimmed length of the primary motif plus two times the margin size cannot be used by SpaMo. Default: a margin of 150 is used. For an input sequence of length 500 this means the central 200 bases are scanned for the best primary motif match, and then the 300 bases surrounding the best primary site are scanned for the best secondary site.
-range <size> The distance from the primary motif site within which p-values are calculated and included in significance tests. A small range may miss significant peaks, but a larger range means more bins have to be tested, which leads to a larger factor in the Bonferroni correction for multiple tests. Default: a range of 150 is used.
-bin <size> The size of the bin used to calculate the histogram and p-values. A bin size of 1 is recommended as it gives better output. Default: a bin size of 1 is used.
-usebestsec Use only the best match of the secondary motif that occurs within the distance range. Default: all secondary matches above the match score threshold in the margins around the primary motif match are counted.
-numgen <seed> Specify a number (or the word 'time') as the seed for initializing the pseudo-random number generator used in breaking scoring ties. The seed is included in the output so experiments can be repeated. If you wish to run multiple experiments with different seeds, you can use the special value 'time' (without the quotes), which sets the seed to the current value of the system clock. Default: a seed of 1 is used.
-shared <fraction> Sequences that share more than this fraction of identical residues are treated as redundant and removed. After the primary motif site has been selected in each sequence, the sequence is trimmed to include only a region of size margin on either side of the primary motif site. This aligned and trimmed sequence (and its reverse complement) is then compared with all the other sequences and the fraction of shared bases is calculated, not including the bases in the match to the primary motif. If the fraction of bases shared by two sequences (or by one sequence and the reverse complement of the other) is larger than this limit, then the second sequence is eliminated. To disable this feature, set the shared fraction to 1. Default: the shared fraction is set to 0.5, which means that the trimmed, aligned sequences must share 50% or more of their bases to be declared redundant.
-odds <odds ratio> To speed up the elimination of redundant sequences, their positions are compared in a random order and the comparison stops whenever the number of matches is so small that the odds ratio is greater than this value. The odds ratio is the probability of the given number of matches given that the sequences were generated by the background model, divided by the same probability given that they have at least <fraction> matching positions (as specified by the option -shared). Default: the odds ratio is set to 20.
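Two of the scanning rules above lend themselves to a short sketch: resolving a negative -minscore value into an absolute score, and the -shared redundancy test. This is an illustration rather than SpaMo's implementation. All function names are hypothetical, the inputs to the redundancy test are assumed to be equal-length DNA flanks with the primary site already removed, and the comparison is assumed to be a strict "larger than".

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def resolve_min_score(value: float, max_score: float) -> float:
    """Turn a -minscore value into an absolute score threshold.

    Values in [-1, 0) are read as a fraction of the maximum possible
    match score; anything else is used as given.
    """
    if -1 <= value < 0:
        return abs(value) * max_score
    return value


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def shared_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions holding the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_redundant(a: str, b: str, limit: float = 0.5) -> bool:
    """True if b (or its reverse complement) shares more than `limit` with a.

    Assumption: strict comparison, so a limit of 1 disables the check.
    """
    best = max(shared_fraction(a, b), shared_fraction(a, reverse_complement(b)))
    return best > limit


if __name__ == "__main__":
    print(resolve_min_score(-0.5, 20.0))   # half of the maximum score
    print(is_redundant("AAAA", "TTTT"))    # identical on the other strand
```

The sketch compares every position. The early stopping described for -odds is deliberately left out.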
Summarizing
-cutoff <p-value> The p-value cutoff for bins to be considered significant. This is the p-value of the Binomial Test for observing at least the observed number of secondary spacings in the given bin, adjusted for the number of bins tested. Note that the p-value is only calculated and tested for bins within the distance from the primary motif specified by the option -range. Default: a bin p-value smaller than or equal to 0.05 is considered significant.
-evalue <E-value> The maximum secondary motif E-value for its results to be printed. The E-value of a secondary motif is the minimum p-value of all its tested bins multiplied by the number of secondary motifs. The E-value estimates the expected number of random secondary motifs that would have the given E-value or lower. Default: results for all secondary motifs with an E-value smaller than or equal to 10 are printed.
-overlap <size> To determine if two motifs are redundant, the most significant bin in the tested range for each of the motifs is compared. For the motifs to be considered redundant, it must be possible that the sites counted in the bin could have overlapped, and this parameter sets the minimum overlap. For a bin size larger than 1 the overlap of the bins cannot be precisely calculated, as the actual site positions are not stored, and so the maximum possible overlap is used. Default: a minimum overlap of 2 is required.
-joint <fraction> To determine if two motifs are redundant, the most significant bin in the tested range of each motif is compared. The most significant bin of each motif has a list of identifiers of the sequences that had a primary and a secondary site at the correct spacing to fall into that bin. To compare the motifs for redundancy, these two sets of sequence identifiers are compared and the size of their intersection is counted. This intersection size is divided by the size of the smaller of the two sequence sets to get the joint sequence fraction. Default: a minimum joint sequence fraction of 0.5 is required for two motifs to be considered redundant.
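The summary statistics described above can be sketched as follows. This is a hedged illustration, not SpaMo's code: the function names are hypothetical, the null probability of a spacing landing in a bin (p_bin) is left to the caller because the text does not define it, and the adjustment for the number of bins is assumed to be a plain Bonferroni multiplication capped at 1.

```python
from math import comb
from typing import Iterable


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def adjusted_bin_pvalue(k: int, n: int, p_bin: float, bins_tested: int) -> float:
    """Bin p-value after Bonferroni correction for the bins tested.

    k is the number of spacings observed in the bin, n the total number of
    spacings, and p_bin the assumed null probability of landing in the bin.
    """
    return min(1.0, binomial_tail(k, n, p_bin) * bins_tested)


def motif_evalue(adjusted_pvalues: Iterable[float], n_secondary_motifs: int) -> float:
    """Minimum bin p-value times the number of secondary motifs."""
    return min(adjusted_pvalues) * n_secondary_motifs


def joint_fraction(ids_a: Iterable[str], ids_b: Iterable[str]) -> float:
    """Intersection size divided by the size of the smaller identifier set."""
    a, b = set(ids_a), set(ids_b)
    if not a or not b:
        raise ValueError("both identifier sets must be non-empty")
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    p = adjusted_bin_pvalue(k=12, n=100, p_bin=1 / 300, bins_tested=300)
    print(p <= 0.05)                       # compare against -cutoff
    print(motif_evalue([p, 0.8], 500))     # compare against -evalue
    print(joint_fraction({"s1", "s2", "s3"}, {"s2", "s3", "s4", "s5"}))
```

In this sketch a bin passes when its adjusted p-value is at most the -cutoff value, and two motifs become candidates for redundancy when their joint fraction reaches the -joint value.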
Motif Loading
-pseudo <count> The pseudocount added to loaded motifs. Default: a pseudocount of 0.1 is added to loaded motifs.
-trim <bits> Trim the edges of motifs based on their information content. The positions on the edges of the motifs with information content less than <bits> will not be used in scanning. Default: positions on the edges of the motifs with information content less than or equal to 0.25 will be trimmed.
-primary <name> The name of the motif to select as the primary motif. This option is incompatible with -primaryi as only one primary motif can be selected. Default: the first motif in the file is selected.
-primaryi <num> The index of the motif to select as the primary motif, counting from 1. This option is incompatible with -primary as only one primary motif can be selected. Default: the first motif in the file is selected.
-keepprimary If the same file is specified for the primary and secondary motifs, the primary motif is normally excluded from the secondary motifs; specifying this option keeps it. Default: the primary motif is excluded from the secondaries if the same file is used for the primary and secondary motifs.
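Edge trimming by information content, as described for the trimming option above, can be sketched as below. This is an assumption-laden illustration: the names are hypothetical, information content is measured against a uniform DNA background (the real tool may use a different background model), and the comparison uses "less than or equal to", following the stated default.

```python
from math import log2
from typing import List, Sequence, Tuple


def column_information(column: Sequence[float]) -> float:
    """Information content in bits of one DNA motif column.

    Assumes a uniform background over A, C, G and T, so the value is
    2 + sum(p * log2(p)) and lies between 0 and 2.
    """
    return 2.0 + sum(p * log2(p) for p in column if p > 0)


def trim_edges(pwm: List[Sequence[float]], bits: float = 0.25) -> Tuple[int, int]:
    """Return the 0-based half-open (start, end) of columns kept for scanning.

    Only columns on the two edges are dropped; a low-information column
    between two informative ones is kept.
    """
    start, end = 0, len(pwm)
    while start < end and column_information(pwm[start]) <= bits:
        start += 1
    while end > start and column_information(pwm[end - 1]) <= bits:
        end -= 1
    return start, end


if __name__ == "__main__":
    motif = [
        [0.25, 0.25, 0.25, 0.25],  # uninformative left edge
        [0.97, 0.01, 0.01, 0.01],
        [0.01, 0.01, 0.97, 0.01],
        [0.30, 0.20, 0.30, 0.20],  # weak right edge
    ]
    start, end = trim_edges(motif)
    print(start, end, "trimmed width:", end - start)
```

The width of the kept region is the "trimmed length" that enters the minimum sequence length rule.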
Miscellaneous

Citing