create-priors

Usage:

create-priors [options] <wiggle file>

Description

The program create-priors takes as input a series of numeric values defined with respect to one or more DNA sequences. The program converts the data into a probabilistic prior using the method described in:

Gabriel Cuellar-Partida, Fabian A. Buske, Robert C. McLeay, Tom Whitington, William Stafford Noble, and Timothy L. Bailey,
"Epigenetic priors for identifying active transcription factor binding sites",
Bioinformatics 28(1): 56-62, 2012

A binned distribution for the priors is also generated. Examples of input data types include sequence tag counts from a DNaseI hypersensitivity or histone modification ChIP-seq assay, and sequence conservation scores.

Input

Wiggle File

A file in wiggle or bigWig format.
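For readers unfamiliar with the text wiggle format, the following is a minimal sketch of how its two declaration styles (fixedStep and variableStep) map to per-position scores. The function name parse_wiggle is hypothetical and is not part of create-priors; the sketch ignores the optional span parameter and does not handle binary bigWig files.

```python
def parse_wiggle(text):
    """Yield (chrom, position, value) for each data line of text wiggle input.

    Illustrative only: the optional 'span' parameter is ignored, so each
    data line is treated as a score for a single 1-based position.
    """
    mode = None   # "fixedStep" or "variableStep" once a declaration is seen
    chrom = None
    pos = 0
    step = 1
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] in ("fixedStep", "variableStep"):
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            chrom = params["chrom"]
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "fixedStep":
            # One value per line; the position advances by 'step'.
            yield chrom, pos, float(fields[0])
            pos += step
        elif mode == "variableStep":
            # Each line carries its own position and value.
            yield chrom, int(fields[0]), float(fields[1])
        else:
            raise ValueError("data line found before any declaration line")


example = """track type=wiggle_0
fixedStep chrom=chr1 start=100 step=10
1.5
2.0
variableStep chrom=chr2
5 0.25
9 3
"""
records = list(parse_wiggle(example))
```

Here records holds one tuple per scored position, which is the "series of numeric values defined with respect to one or more DNA sequences" that the program takes as input.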

Output

create-priors will create a directory, named create-priors_out by default. Any existing output files in the directory will be overwritten. The directory will contain:

The default output directory can be overridden using the --o or --oc options, which are described below.

Options

General Options

Each entry below lists the option and its parameter (if any), followed by its description and default behaviour.

--alpha num
    The alpha parameter for calculating position-specific priors. Alpha represents the fraction of all transcription factor binding sites that are binding sites for the TF of interest. Alpha must be between 0 and 1.
    Default: 1.0.

--beta num
    The beta parameter for calculating position-specific priors. Beta represents an estimate of the total number of binding sites for all transcription factors in the input data. Beta must be greater than 0.
    Default: 10000.

--bigwig
    Directs create-priors to output the priors in bigWig format. The name of the output file will be create-priors.bwg. This format is more efficient when the input file is large.

--est-seq-size n
    Estimated length of the full sequence represented by the wiggle input file. The wiggle file may not contain scores for every position in the underlying sequence. The sequence size is used to estimate the amount of missing data. A rough guess is all that is required.
    Default: the sequence size is estimated by summing the maximum coordinate of each sequence in the input wiggle file.

--numbins n
    Number of bins to use in the prior distribution file.
    Default: 100.

--psp
    Directs create-priors to also output the priors in MEME PSP format. The name of the output file will be create-priors.psp. This format is only suitable when the number of positions is relatively small.
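
The default behaviour of --est-seq-size and the role of --numbins can be illustrated with a short sketch. It assumes the input has already been reduced to (sequence name, position, value) tuples. All function names are hypothetical, and two points are assumptions rather than documented behaviour: that missing data is counted as the estimated size minus the number of scored positions, and that the bins are of equal width. The actual create-priors implementation and its output file layout may differ.

```python
from collections import defaultdict


def estimate_sequence_size(records):
    """Sum the maximum coordinate seen for each sequence."""
    max_coord = defaultdict(int)
    for chrom, pos, _value in records:
        max_coord[chrom] = max(max_coord[chrom], pos)
    return sum(max_coord.values())


def count_missing(records, est_seq_size=None):
    """Assumed reading of 'missing data': positions without a score."""
    size = est_seq_size if est_seq_size is not None else estimate_sequence_size(records)
    scored = len({(chrom, pos) for chrom, pos, _value in records})
    return max(size - scored, 0)


def bin_values(values, num_bins=100):
    """Equal-width histogram (an assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)  # top edge goes in last bin
        counts[idx] += 1
    return counts


records = [("chr1", 100, 1.5), ("chr1", 110, 2.0), ("chr2", 5, 0.25), ("chr2", 9, 3.0)]
size = estimate_sequence_size(records)       # 110 (chr1) + 9 (chr2)
missing = count_missing(records)             # estimated size minus scored positions
missing_given = count_missing(records, est_seq_size=1000)
histogram = bin_values([0.0, 0.1, 0.2, 0.3, 1.0], num_bins=4)
```

Because the default estimate only sees the largest coordinate per sequence, it undercounts when the scored region stops well short of the true sequence end, which is the situation where supplying --est-seq-size with a rough guess is useful.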