gomo [options] <go-term database> <scoring file>+
The name of a file containing GO terms mapped to the sequences in the scoring
file. Databases are provided by the web services and are formatted using a
simple tab-separated values (TSV) format:
"GO-term" "Sequence identifiers separated by tabs"
The exception to this rule is the first line which instead contains the
URL to an on-line database (if any) containing entries for the gene IDs.
The URL should have ampersands (&) replaced with &amp; and the place
for the gene ID marked by the token !!GENEID!!.
Each gene ID reported
in GOMo's output will be linked to the URL with the actual gene ID inserted.
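The database layout described above can be read with a few lines of Python. This is a minimal sketch, not GOMo's own parser: the function names `parse_go_database` and `gene_url` are hypothetical, the example URL is invented, and the code assumes exactly the layout stated here (a first line holding the URL template, then one GO term per line followed by tab-separated sequence identifiers).

```python
from typing import Dict, Iterable, List, Tuple

GENE_TOKEN = "!!GENEID!!"  # placeholder token named in the format description


def parse_go_database(lines: Iterable[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Return (url_template, {go_term: [sequence ids]}) from database lines."""
    it = iter(lines)
    # The first line is the URL template; it may be empty if there is no database.
    url_template = next(it, "").rstrip("\r\n")
    go_to_genes: Dict[str, List[str]] = {}
    for line in it:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        go_to_genes[fields[0]] = [f for f in fields[1:] if f]
    return url_template, go_to_genes


def gene_url(url_template: str, gene_id: str) -> str:
    """Fill in the gene ID; returns '' when no URL template was given."""
    if not url_template:
        return ""
    # Assumption: escaped ampersands are stored as the HTML entity "&amp;".
    return url_template.replace("&amp;", "&").replace(GENE_TOKEN, gene_id)


# Invented sample data, for illustration only.
sample = [
    "http://example.org/gene?id=!!GENEID!!&amp;view=full\n",
    "GO:0000001\tgeneA\tgeneB\n",
    "GO:0000002\tgeneC\n",
]
template, mapping = parse_go_database(sample)
print(mapping["GO:0000001"])
print(gene_url(template, "geneA"))
```

Keeping the template and the mapping separate mirrors the file itself: the first line is metadata, and every later line is one GO term with its sequences.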
The names of one or more XML files. Each file contains motif scores for a set of sequences from a genome following the CisML schema. When scoring data is available for multiple related species, GOMo can take multiple scoring files where the true sequence identifiers have been mapped to their orthologs in the reference species for which the GO-term database was supplied (see <go-term database>, above).
Scoring files may easily be created using the AMA utility
that is part of the downloadable MEME Suite. A typical command to
create a scoring file named "ama_out/ama.xml" using AMA would be:
ama -oc ama_out -pvalues <motif_file> <fasta_sequence_file> <background_file>
By default GOMo uses the p-value given for each gene in the CisML file to rank the genes. Any sequence failing to provide a p-value will cause GOMo to exit. The --gs switch causes GOMo to use the gene scores from the CisML file instead for ranking genes.
GOMo writes its output to files in a directory named gomo_out, which it
creates if necessary. You can change the output directory using the --o or
--oc options.
The directory will contain the following files:
gomo.html - an HTML file that provides the results in a human-readable format
gomo.tsv - a TSV (tab-separated values) file that provides the results in a format suitable for parsing by scripts and viewing with Excel
gomo.xml - an XML file that provides the results in a machine-readable format
You can override the creation of files altogether by specifying the --text option, which causes GOMo to output its TSV format to standard output.
Note: See this detailed description of the GOMo output formats for more information.
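Because the TSV output is meant to be parsed by scripts, a generic reader is often enough. The sketch below is an illustration under stated assumptions: it assumes the first data line is a header row, that blank lines and lines starting with "#" can be skipped, and it takes the column names from the file itself rather than hard-coding them. The function name `read_tsv_rows` and the sample columns are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_tsv_rows(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Read tab-separated rows into dictionaries keyed by the header row."""
    # Assumption: blank lines and '#' comment lines carry no result rows.
    data_lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


# Invented sample with placeholder column names, for illustration only.
sample = io.StringIO("col_a\tcol_b\nx\t1\n\n# a comment line\ny\t2\n")
rows = read_tsv_rows(sample)
print(rows)
```

The same function accepts `sys.stdin`, so it can also consume the stream produced with the --text option.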
Option | Parameter | Description | Default Behavior |
---|---|---|---|
General Options | |||
--text | | Output in tab-separated values format to standard output. Will not create an output directory or files. | |
--motifs | motifs | Path to the optional motif file in MEME Motif Format that was used to generate (all of the) scoring file(s). The motifs in this file will be used to generate sequence logos in the GOMo HTML output. Each motif may be no wider than 300 positions. | No logos are displayed in the HTML output. |
--dag | godag | Path to the optional Gene Ontology DAG file to be used for identifying the most specific terms in the GOMo xml output so they can be highlighted in the HTML output. | |
--motif | id | Use only the motif identified by id. This option may be repeated. | All motifs are used. |
--shuffle_scores | n | Generate an empirical null distribution by shuffling the sequence-to-score assignments n times. Use the resulting distribution to compute empirical p-values. | Shuffle 1000 times. |
--t | q | Threshold used on the score q-values for reporting results. To show all results use a value of 1.0. | A threshold of 0.05 is used. |
--gs | | Use the scores contained in the CisML file for ranking genes. Any sequence failing to provide a score will cause GOMo to exit. | Use the p-values contained in the CisML file for ranking genes. |
--score_E_thresh | E | All genes with E-values in the CisML file larger than E are treated as having the maximum possible score (and as having tied worst rank when the genes are sorted for the rank-sum test). The E-values are computed by multiplying the p-values by the number of genes in the CisML file. Setting E to a number less than 1 can reduce the effect of noise. The threshold will be ignored when GOMo is told to use gene scores rather than p-values via the --gs switch. | E-values are not thresholded when ranking genes. |
--min_gene_count | n | Only consider GO terms annotated with at least n genes. | A value of 1 is used, which shows all results. |
--nostatus | | Suppresses the progress information. | |
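
The arithmetic behind --score_E_thresh can be illustrated with a short sketch. It is not GOMo's implementation: the function names are hypothetical, the "maximum possible score" is assumed to be a p-value of 1.0, and tied genes are assumed to share the average of the ranks they span. Only the E-value formula (p-value times the number of genes) comes from the option description.

```python
from typing import Dict


def threshold_by_e_value(p_values: Dict[str, float], e_thresh: float) -> Dict[str, float]:
    """Replace p-values whose E-value exceeds e_thresh with the worst value."""
    n_genes = len(p_values)
    worst = 1.0  # assumption: the largest possible p-value stands in for "maximum score"
    result = {}
    for gene, p in p_values.items():
        e_value = p * n_genes  # E-value = p-value x number of genes
        result[gene] = worst if e_value > e_thresh else p
    return result


def average_ranks(values: Dict[str, float]) -> Dict[str, float]:
    """Rank ascending (1 = best); tied values share the mean of their ranks."""
    ordered = sorted(values.items(), key=lambda kv: kv[1])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


# Invented p-values for four genes, for illustration only.
p = {"g1": 0.001, "g2": 0.1, "g3": 0.3, "g4": 0.5}
clipped = threshold_by_e_value(p, e_thresh=0.5)
print(clipped)
print(average_ranks(clipped))
```

With four genes the E-values are 0.004, 0.4, 1.2, and 2.0, so a threshold of 0.5 leaves g1 and g2 unchanged and places g3 and g4 in a tie for the worst rank.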