shadow

Usage:

shadow [options] <tree file> <alignment file>

Description

Perform phylogenetic shadowing on a given DNA alignment, using a given tree (Boffelli et al., Science 2003). This program is a simplified version of motiph in which the equilibrium distribution is set equal to the background model rather than being taken from a given motif.

Input

<tree file>

The name of a file containing a phylogenetic tree in Phylip Newick format. This tree may contain additional species not represented in the alignment.

<alignment file>

The name of a file containing a DNA multiple alignment in ClustalW format. Alternatively, if the --list option is used, this file may contain a list of alignment files.

Output

Shadow will create a directory, named shadow_out by default. Any existing output files in the directory will be overwritten. The directory will contain:

The output directory can be changed using the --o or --oc options which are described below.

The --text option limits output to plain text sent to the standard output.

Options

General Options

Each entry gives the option and its parameter (if any), a description, and the default behavior.

--bg rate

The mutation rate for sites in the background model.
Default: the background mutation rate is set to 1.

--fg rate

The mutation rate for sites in the foreground model(s).
Default: the mutation rate is set to 1.

--gap skip|fixed|wildcard|minimum

Specifies the gap handling strategy.

Value     Description
skip      Skip those sites where any position in the alignment window contains a gap.
fixed     Sites containing gaps are assigned a fixed score, specified by --gap-cost.
wildcard  The gap character matches any base, and the score is the product of the corresponding probabilities.
minimum   The gap character is assigned the score corresponding to the least likely letter at the given position.

Default: gaps are skipped.

--gap-cost cost

Specifies the cost for gaps when using the fixed gap handling strategy.
Default: the gap cost is zero.

--list

Treat the second required input as a list of alignments, rather than a single alignment.
Default: the second required input is a single alignment.

--model single|average|jc|k2|f81|f84|hky|tn

The evolutionary model to use.

Value    Name                   Description
single   Single Score           Score the first sequence: compute the standard log-odds score of the first sequence in the alignment; ignores the tree but does NOT remove gaps.
average  Average Score          Compute the average of the standard log-odds scores of the aligned sites.
jc       Jukes-Cantor           Equilibrium base frequencies are all 1/4; the only free parameter is the mutation rate.
k2       Kimura 2-parameter     Equilibrium base frequencies are all 1/4; the free parameters are the mutation rate and the transition/transversion rate ratio.
f81      Felsenstein 1981       Equilibrium base frequencies are taken from the alignment; the only free parameter is the mutation rate.
f84      Felsenstein 1984       Equilibrium base frequencies are taken from the alignment; the free parameters are the mutation rate and the transition/transversion rate ratio. The ratio of purine-purine to pyrimidine-pyrimidine transitions is assumed to be 1.
hky      Hasegawa-Kishino-Yano  Equilibrium base frequencies are taken from the alignment; the free parameters are the mutation rate and the transition/transversion rate ratio. The ratio of purine-purine to pyrimidine-pyrimidine transitions is assumed to be equal to the ratio of purines to pyrimidines.
tn       Tamura-Nei             Equilibrium base frequencies are taken from the alignment; the free parameters are the mutation rate, the transition/transversion rate ratio, and the ratio of purine-purine transitions to pyrimidine-pyrimidine transitions.

A description of the f81 model is available in chapter 13 of Statistical Methods in Bioinformatics by Ewens and Grant. The other models are described in chapters 9 and 13 of Inferring Phylogenies by Felsenstein.

Default: behaves as if --model f81 were specified.

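To make the default f81 model concrete, the following is a minimal Python sketch of the textbook F81 transition probabilities along one branch. It is not taken from the shadow source code: the function name f81_transition_matrix, the base order A, C, G, T, and the normalization (one expected substitution per unit of rate times branch length) are assumptions made for illustration. With all frequencies equal to 1/4 the same formula reduces to the Jukes-Cantor model.

```python
import math

def f81_transition_matrix(freqs, rate, branch_length):
    """Return P where P[i][j] is the probability that base i is replaced
    by base j along a branch (hypothetical helper, textbook F81 formula).

    freqs         -- equilibrium base frequencies, assumed order A, C, G, T
    rate          -- mutation rate (the model's single free parameter)
    branch_length -- length of the branch in the tree
    """
    # Assumed normalization: scale so that rate * branch_length is the
    # expected number of substitutions per site.
    beta = 1.0 / (1.0 - sum(p * p for p in freqs))
    decay = math.exp(-beta * rate * branch_length)
    size = len(freqs)
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * freqs[j]
         for j in range(size)]
        for i in range(size)
    ]

# Frequencies as they might be estimated from an alignment (made-up values).
alignment_freqs = [0.1, 0.2, 0.3, 0.4]
f81 = f81_transition_matrix(alignment_freqs, rate=1.0, branch_length=0.5)

# Uniform frequencies give the Jukes-Cantor special case.
jc = f81_transition_matrix([0.25] * 4, rate=1.0, branch_length=0.1)
```

Each row of the returned matrix sums to 1, and for very long branches every row approaches the equilibrium frequencies, which is why the choice of equilibrium distribution matters for the score.
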
--pur-pyr ratio

The ratio of the purine transition rate to the pyrimidine transition rate. This parameter is used by the Tamura-Nei model.
Default: the ratio is set to 1.0.

--transition-transversion ratio

The ratio of the transition rate to the transversion rate. This parameter is used by the Kimura 2-parameter, F84, HKY, and Tamura-Nei models.
Default: the ratio is set to 0.5.

--bfile

The file should be in MEME background file format. The keyword motif-file can be used to indicate that the frequencies should be taken from the motif file.
Default: the alignment frequencies are used.

--max-stored-scores count

Set the maximum number of scores that will be stored. Keeping a complete list of scores may exceed available memory. Once the number of stored scores reaches the maximum allowed, the least significant 50% of scores will be dropped. In this case, the list of reported motifs may be incomplete and the q-value calculation will be approximate.
Default: the maximum number of stored matches is 100,000.

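The dropping rule described for stored scores can be sketched as follows. This is an illustrative Python sketch only, not shadow's actual data structure: the function name store_score is hypothetical, and it assumes that significance is ranked by p-value, so the "least significant 50%" are the largest p-values.

```python
def store_score(stored, pvalue, max_stored):
    """Append a p-value; once the list reaches max_stored entries, keep
    only the most significant half (hypothetical helper).

    Assumption: smaller p-value means more significant.
    """
    stored.append(pvalue)
    if len(stored) >= max_stored:
        stored.sort()                      # most significant first
        del stored[len(stored) // 2:]      # drop the least significant half
    return stored

# Tiny demonstration with a limit of 4 stored scores.
scores = []
for p in [0.5, 0.1, 0.9, 0.2]:
    store_score(scores, p, max_stored=4)
```

After such a purge the dropped scores are gone for good, which is why any statistic computed over the full list, such as a q-value, can only be approximate.
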
--no-pvalue

Skip the p-value calculation. This switch will be necessary when a large number n of species are in the tree, because the memory requirement is 4^n. This also disables computation of q-values.
Default: the p-values are calculated.

--no-qvalue

Do not compute a q-value for each p-value. The q-value calculation is that of Benjamini and Hochberg (1995).
Default: the q-values are calculated.

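For readers unfamiliar with the Benjamini and Hochberg (1995) procedure, here is a minimal Python sketch of the standard calculation. The function name benjamini_hochberg and the example p-values are made up for illustration, and shadow's own implementation may differ in detail, for example when scores have been dropped from memory.

```python
def benjamini_hochberg(pvalues):
    """Return one q-value per input p-value, in the input order.

    Standard step-up rule: with m tests and p-values ranked in ascending
    order, q(rank) = min over all ranks k >= rank of p(k) * m / k.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces that q-values never decrease with increasing p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues

example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In this example the two middle p-values (0.04 and 0.03) receive the same q-value, because the running minimum carries the smaller adjusted value down to the lower rank.
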
--output-pthresh p-value threshold

The p-value threshold for displaying search results. If the p-value of a match is greater than this value, then the match will not be printed. If both the --output-pthresh and --output-qthresh options appear on the command line, whichever appears later on the command line will be applied.
Default: the p-value threshold is set to 1e-4.

--output-qthresh q-value threshold

The q-value threshold for displaying search results. If the q-value of a match is greater than this value, then the match will not be printed.
Default: no q-value threshold is applied.

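The "whichever appears later" rule for the two threshold options can be illustrated with a short Python sketch. This is not how shadow parses its command line; the function names active_threshold and passes are hypothetical, and only the option names and the default of 1e-4 come from the descriptions above.

```python
def active_threshold(argv):
    """Return (kind, value) for the threshold that would be applied.

    kind is "pvalue" or "qvalue". The last threshold option on the
    command line wins; with neither option present, the documented
    default p-value threshold of 1e-4 applies.
    """
    active = ("pvalue", 1e-4)
    for i, arg in enumerate(argv[:-1]):
        if arg == "--output-pthresh":
            active = ("pvalue", float(argv[i + 1]))
        elif arg == "--output-qthresh":
            active = ("qvalue", float(argv[i + 1]))
    return active

def passes(match, threshold):
    """A match is printed unless its statistic exceeds the threshold."""
    kind, value = threshold
    return match[kind] <= value

# Hypothetical command line: the q-value option comes last, so it wins.
args = ["--output-pthresh", "1e-3", "--output-qthresh", "0.05",
        "tree.nwk", "alignment.aln"]
chosen = active_threshold(args)
```

A match with a p-value of 0.01 and a q-value of 0.04 would be printed under this command line, because only the q-value threshold is in effect.
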
--text

Limits output to plain text sent to standard output. For shadow, the text output is unsorted, and q-values are not reported. This mode allows the program to search an arbitrarily large database, because results are not stored in memory.
Default: outputs are created as normal.