tgene [options] <locus_file> <annotation_file>
ftp://ftp.ensembl.org/pub/release-X/gtf
(where you must substitute an actual release number for 'X', e.g.,
ftp://ftp.ensembl.org/pub/release-102/gtf) and
ftp://ftp.ensemblgenomes.org/pub/release-X/GROUP/gtf
(where you must substitute an actual release number for 'X', and an actual group for 'GROUP',
e.g., ftp://ftp.ensemblgenomes.org/pub/release-57/fungi/gtf).
The genomic coordinates are assumed to be 1-based, closed as defined in the GTF standard.
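To illustrate the 1-based, closed convention, the following minimal sketch computes the length of a GTF feature and converts its interval to the 0-based, half-open form used by BED-style formats. The function names are hypothetical and are not part of T-Gene.

```python
from typing import Tuple


def gtf_feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, closed (inclusive) GTF coordinates."""
    if start < 1 or end < start:
        raise ValueError("GTF coordinates must satisfy 1 <= start <= end")
    # Both endpoints are part of the feature, hence the +1.
    return end - start + 1


def gtf_to_zero_based_half_open(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, closed interval to a 0-based, half-open interval."""
    if start < 1 or end < start:
        raise ValueError("GTF coordinates must satisfy 1 <= start <= end")
    # Shift the start down by one; the end is unchanged because it becomes exclusive.
    return start - 1, end


if __name__ == "__main__":
    print(gtf_feature_length(100, 200))
    print(gtf_to_zero_based_half_open(100, 200))
```

Under this convention a feature with start 100 and end 200 covers 101 bases, and the same bases are described by the 0-based, half-open interval (99, 200).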
T-Gene writes its output to files in a directory named tgene_out, which it creates if necessary. You can change the output directory using the --o or --oc options.
The directory will contain:
- tgene.html
- links.tsv
Note: See this detailed description of the T-Gene output formats for more information.
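As a rough sketch of how the usage line and the output directory fit together, the Python snippet below assembles a T-Gene argument list and lists the two output files named above. The helper names and the input file names are hypothetical, and the snippet assumes that --oc takes the output directory name as its argument; it only builds the command and does not run T-Gene.

```python
import shlex
from pathlib import Path
from typing import List


def build_tgene_command(locus_file: str, annotation_file: str,
                        out_dir: str = "tgene_out") -> List[str]:
    """Build: tgene --oc <out_dir> <locus_file> <annotation_file>.

    Assumption: --oc is followed by the name of the output directory.
    """
    return ["tgene", "--oc", out_dir, locus_file, annotation_file]


def expected_outputs(out_dir: str = "tgene_out") -> List[Path]:
    """Paths of the two output files listed in this section."""
    return [Path(out_dir) / "tgene.html", Path(out_dir) / "links.tsv"]


if __name__ == "__main__":
    # "my_loci_file" and "my_annotation.gtf" are placeholder file names.
    cmd = build_tgene_command("my_loci_file", "my_annotation.gtf", out_dir="results")
    print(shlex.join(cmd))
    for path in expected_outputs("results"):
        print(path)
```

The resulting list can be passed to `subprocess.run` on a system where T-Gene is installed.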
Option | Parameter | Description | Default Behavior
---|---|---|---
General Options | | |
--transcript-types | ttypes | A comma-separated list (no spaces) of RNA transcript types. T-Gene will only output links for transcripts of these types. | The value of ttypes is set to 'protein_coding,processed_transcript'.
--max-link-distances | mlds | A comma-separated list (no spaces) of maximum distances between a potential regulatory element (RE) and its target. By default, T-Gene will evaluate all potential links that satisfy the maximum distance criterion as well as Closest-Locus and Closest-TSS links (see options --no-closest-locus and --no-closest-tss, below). Note: If you provide a tissue panel (see Tissue Panel Options, below), there must be one distance for each histone name in histones (see option --histones, below), and each distance is used with the corresponding histone name. If you do not provide a tissue panel, you may only specify one distance. | The value of mlds is set to '500000'.
--max-pvalue | mpv | Only links whose p-value is less than or equal to mpv will be included in the output of T-Gene. If you provide a tissue panel (see Tissue Panel Options, below), T-Gene will test the CnD (Correlation and Distance) p-value; otherwise it will test the Distance p-value. Note: T-Gene does not apply the maximum p-value threshold to closest-locus and closest-TSS links, which are always included in the output unless you specify the options --no-closest-locus or --no-closest-tss, respectively (see Other Options, below). | 0.05
Tissue Panel Options | | |
--tissues | tissues | A comma-separated list (no spaces) of three (or more) tissue names that are the sources of the histone and expression data. These names must also be the names of the subfolders where T-Gene will look for the histone and expression data files. See below under options --histone-root and --expression-root for more information. Note: Because the Pearson correlation coefficient is always 1, 0 or -1 for a pair of points, it does not make sense to use fewer than three tissues. | None.
--histone-root | hrd | The root directory containing the histone modification files. The files must be in ENCODE broadPeak or ENCODE narrowPeak format, but only the first 7 fields are used (or required). The genomic coordinates are assumed to be 0-based, half-open as defined in the above standards. The histone modification files should be in subdirectories under the histone root directory, where each subdirectory is named according to the tissue from which the data is taken (see option --tissues, above). The subdirectories should be named '<hrd>/<t>', where <t> is one of the tissue names in the comma-separated tissues list. | None.
--histones | histones | A comma-separated list (no spaces) of histone modification names. The histone modification file names must match '<hrd>/<t>/*<hname>*[broad\|narrow]Peak', where <t> is one of the tissue names in the comma-separated tissues list, and <hname> is one of the histone names in the comma-separated histones list. | None.
--rna-source | Cage\|LongPap | The type of RNA expression data that you are providing. This determines the precise format expected in the GTF-format expression files that you specify. For Cage GTF files, the attributes field (column 9) should contain key-value pairs for the following keys: gene_id and trlist (which is a comma-separated list of transcript IDs), and one or two keys matching the regular expression "rpm[12]?" whose value is the RNA expression of the transcript. (Note that this is not standard GTF format because the required transcript_id key-value pair is replaced by the trlist key-value pair.) For LongPap GTF files, the attributes field (column 9) should contain key-value pairs for the following keys: gene_id, transcript_id and one or two keys matching the regular expression "[RF]PKM[12]?" whose value is the RNA expression of the transcript. | None.
--expression-root | erd | The root directory containing the RNA expression files. The files must be in a flavor of GTF format, as described above under option --rna-source. The genomic coordinates are assumed to be 1-based, closed as defined in the GTF standard. The RNA expression files must be in subdirectories under the expression root directory, where each subdirectory is named according to the tissue from which the data is taken (see option --tissues, above). The subdirectories should be named '<erd>/<t>', where <t> is one of the tissue names in the comma-separated tissues list. The RNA expression file names must match '<erd>/<t>/*<rna_source>*.gtf'. | None.
--use-gene-ids | | If your expression data files only contain gene ID information (e.g., if the 'transcript_id' fields are not unique or not specified), T-Gene can use the 'gene_id' fields instead for associating entries in the expression files with an entry in the annotation file. T-Gene will use the start and end positions given for each 'gene_id' in the expression files, and, for each 'gene_id', all expression files must agree or the results will be unpredictable. | T-Gene uses the 'transcript_id' fields in the annotation and expression files to identify transcripts, and they must be unique within the annotation file. T-Gene uses the start and end positions for each transcript as specified in the annotation file.
--lecat | lecat | (Low Expression Adjustment Threshold) If the maximum expression of a TSS is < lecat, T-Gene reduces the computed correlation values for all its links. It multiplies the computed correlation of each link by the scale factor max_expr/lecat, where max_expr is the maximum expression of the TSS across the panel of tissues. | 0 (No correlations are reduced.)
Other Options | | |
--no-closest-locus | | T-Gene will not search for closest-locus links that exceed the maximum distance requirement (see option --max-link-distances, above), and it will not output closest-locus links that exceed the maximum p-value requirement (see option --max-pvalue, above). | T-Gene includes a link to the closest locus (or loci in case of ties) for each transcript even if the locus does not meet the maximum link distance requirement or the maximum p-value requirement.
--no-closest-tss | | T-Gene will not search for closest-TSS links that exceed the maximum distance requirement (see option --max-link-distances, above), and it will not output closest-TSS links that exceed the maximum p-value requirement (see option --max-pvalue, above). | T-Gene includes a link to the closest transcript (or transcripts in case of ties) for each locus even if the transcript does not meet the maximum link distance requirement or the maximum p-value requirement.
--no-noise | | T-Gene will not add random Gaussian noise to expression or histone values that are zero. Note: Using this option will make the p-value calculations less accurate. | T-Gene adds random Gaussian noise to all zero expression and histone values.
--seed | seed | Seed for the random number generator, which is used to generate the null model for correlation p-values and to add random noise to zero expression and histone values. | 0
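
The note under --tissues and the scale factor described under --lecat can be illustrated with a short Python sketch. This is not T-Gene's implementation: the function names are hypothetical, the noise and p-value steps are omitted, and returning 0 for constant inputs is a convention chosen here to match the "1, 0 or -1" wording above.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0.0:
        return 0.0  # assumption: treat the undefined case as zero correlation
    return sum(a * b for a, b in zip(dx, dy)) / denom


def adjust_for_low_expression(corr: float, expression: Sequence[float],
                              lecat: float) -> float:
    """Scale corr by max_expr / lecat when max_expr < lecat (see --lecat)."""
    max_expr = max(expression)
    if lecat > 0 and max_expr < lecat:
        return corr * (max_expr / lecat)
    return corr  # lecat of 0 (the default) leaves every correlation unchanged


if __name__ == "__main__":
    # With only two tissues the correlation carries no information.
    print(pearson([1.0, 3.0], [5.0, 2.0]))

    # Three tissues: made-up histone and expression values for one link.
    histone = [1.0, 4.0, 2.0]
    expression = [0.5, 2.0, 1.0]
    corr = pearson(histone, expression)
    print(corr, adjust_for_low_expression(corr, expression, lecat=4.0))
```

In the three-tissue example the maximum expression is 2.0, so with lecat set to 4.0 the correlation is multiplied by 2.0/4.0 = 0.5.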