Usage:
tomtom [options] -query <query motifs> -target <target motifs>
Description:
The Tomtom program searches one or more query motifs against a database of target motifs and reports, for each query, a list of target motifs ranked by E-value. The E-value is the expected number of times that a similarity this strong would be observed by chance in a target database of random motifs. The output contains results for each query, in the order in which the queries appear in the input file.
For a given pair of motifs, the program considers all offsets, while requiring a minimum number of overlapping positions. For a given offset, each overlapping position is scored using one of the column similarity functions listed under -dist below. To compute the scores, Tomtom needs to know the frequencies of the letters of the sequence alphabet in the database being searched (the "background" letter frequencies). By default, the background letter frequencies included in the MEME input files are used. The scores of columns that overlap for a given offset are summed. This summed score is then converted to an E-value. The reported E-value is the minimal E-value over all possible offsets.
Input:
- <query motifs> - A file containing one or more motifs in MEME format. Each of these motifs will be searched against the target database.
- <target motifs> - A file containing one or more motifs in MEME format.
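As an illustration of the offset search outlined in the Description, the following Python sketch slides a query motif along a target, skips offsets whose overlap falls below the required minimum, and sums a per-column score over the overlapping positions. It is not Tomtom's implementation: the function names (column_score, best_offset), the use of negative Euclidean distance as the column score, and the example matrices are all assumptions made for illustration. The conversion of summed scores to E-values is omitted, so the sketch simply keeps the highest raw sum.

```python
import math


def column_score(q_col, t_col):
    """Negative Euclidean distance between two columns (higher = more similar).

    Stand-in column score chosen for this sketch only.
    """
    return -math.sqrt(sum((q - t) ** 2 for q, t in zip(q_col, t_col)))


def best_offset(query, target, min_overlap=5):
    """Return (score, offset, overlap) for the best alignment, or None.

    `query` and `target` are lists of columns; each column is a list of
    letter frequencies. `offset` is the target index aligned to query column 0.
    """
    # A query narrower than min_overlap lowers the requirement to its own width.
    required = min(min_overlap, len(query))
    best = None
    for offset in range(-(len(query) - 1), len(target)):
        q_start = max(0, -offset)  # first overlapping query column
        t_start = max(0, offset)   # first overlapping target column
        overlap = min(len(query) - q_start, len(target) - t_start)
        if overlap < required:
            continue
        # Sum the column scores over the overlapping positions.
        score = sum(
            column_score(query[q_start + i], target[t_start + i])
            for i in range(overlap)
        )
        if best is None or score > best[0]:
            best = (score, offset, overlap)
    return best


# Hypothetical example data: the query equals columns 2..4 of the target.
target = [
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.70, 0.10],
    [0.90, 0.03, 0.03, 0.04],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.85],
    [0.25, 0.25, 0.25, 0.25],
]
query = [list(col) for col in target[2:5]]
result = best_offset(query, target)
```

In this sketch, raw sums from offsets with different overlap lengths are not directly comparable; the program itself compares offsets by E-value, as described above.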
Output:
Tomtom writes its output to files in a directory named tomtom_out, which it creates if necessary. (You can also cause the output to be written to a different directory; see -o and -oc, below.) The main output file is named tomtom.html and can be viewed with a web browser. A second file, tomtom.txt, contains a simplified, text-only version of the output. (See -text, below, for the text output format.) For each query-target match, two additional files containing LOGO alignments are also written: an encapsulated PostScript file (.eps) and a PNG file (.png). If the convert program is not available, no PNG files will be written.
Options:
-o <output dir>
- Name of the output directory for all output files. If the output directory already exists, it will not be replaced and the program will exit without doing anything.
-oc <output dir>
- Name of the output directory for all output files. If the output directory already exists, it will be replaced ('clobbered').
-ethresh <value>
- Only report matches with E-values below the specified threshold (default = 1).
-min-overlap <value>
- Only report motif matches that overlap by this many positions or more. If a query motif is shorter than the value of -min-overlap, then the width of that motif is used as the required minimum overlap for that query. The default value is 5.
-internal
- This parameter forces the shorter motif to be completely contained in the longer motif.
-dist <allr|ed|kullback|pearson|sandelin>
- The column similarity function used to score overlapping positions. The values correspond to the Pearson correlation coefficient (pearson), average log-likelihood ratio (allr), Kullback-Leibler divergence (kullback), Euclidean distance (ed) and the Sandelin-Wasserman function (sandelin). Detailed descriptions of these functions can be found in the published description of Tomtom.
-query-pseudo <float>
- This option adds the specified pseudocount to each count in the query matrix. The default value is 0.
-target-pseudo <float>
- This option adds the specified pseudocount to each count in each target matrix. The default value is 0.
-query-url-type <jaspar|transfac|scpd|macisaac|flyreg|dpinteract|regtransbase|none>
- This option causes the names of query motifs in the output to be hyperlinks to the entry for the motif in the given online database.
-target-url-type <jaspar|transfac|scpd|macisaac|flyreg|dpinteract|regtransbase|none>
- This option causes the names of target motifs in the output to be hyperlinks to the entry for the motif in the given online database.
-text
- This option causes Tomtom to print just a tab-delimited text file to standard output. The output begins with a header, indicated by leading "#" characters. This is followed by a single title line, and then the actual values. The columns are:
- Query motif name
- Target motif name
- Optimal offset: the offset between the query and the target motif
- E-value
- Overlap: the number of positions of overlap between the two motifs
- Query consensus sequence
- Target consensus sequence
- Orientation: the orientation of the target motif with respect to the query motif
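The -text format described above lends itself to simple scripted post-processing. The following Python sketch reads such output into a list of dictionaries, assuming the eight columns appear in the order listed. The function name (parse_tomtom_text), the dictionary keys, and the sample text are hypothetical and chosen only for illustration; in particular, the sample header, title line and "+" orientation value are invented and are not taken from real Tomtom output. Because the exact form of the title line is not specified here, the sketch skips every line beginning with "#" and also skips a first remaining line whose offset field is not an integer.

```python
COLUMNS = [
    "query", "target", "offset", "e_value",
    "overlap", "query_consensus", "target_consensus", "orientation",
]


def _is_int(value):
    """Return True if `value` can be parsed as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def parse_tomtom_text(text):
    """Parse tab-delimited -text output into a list of dicts (sketch only)."""
    rows = []
    first_data_candidate = True
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header line
        fields = line.split("\t")
        if first_data_candidate:
            first_data_candidate = False
            # Treat a non-numeric offset field as the title line and skip it.
            if len(fields) < 3 or not _is_int(fields[2]):
                continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        row = dict(zip(COLUMNS, fields))
        row["offset"] = int(row["offset"])
        row["e_value"] = float(row["e_value"])
        row["overlap"] = int(row["overlap"])
        rows.append(row)
    return rows


# Invented sample text, for illustration only.
sample = (
    "# hypothetical header line\n"
    "Query\tTarget\tOffset\tE-value\tOverlap\t"
    "Query consensus\tTarget consensus\tOrientation\n"
    "motifA\tmotifB\t2\t0.0031\t6\tACGTAC\tTTACGTACGG\t+\n"
)
rows = parse_tomtom_text(sample)
```

Each returned dictionary can then be filtered or sorted, for example by the e_value key.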
Bugs: none known.
Authors: Shobhit Gupta (shobhitg@u.washington.edu), Timothy Bailey (tbailey@imb.uq.edu.au), Charles E. Grant (cegrant@gs.washington.edu) and William Noble (noble@gs.washington.edu).