Usage:

gomo [options] <go-term database> <scoring file>+

Description

Input

<go-term database>

The name of a file containing GO terms mapped to the sequences in the scoring file. Databases are provided by the web services and are formatted using a simple tab-separated values (TSV) format:

"GO-term" "Sequence identifiers separated by tabs"

The exception to this rule is the first line which instead contains the URL to an on-line database (if any) containing entries for the gene IDs. The URL should have ampersands (&) replaced with &amp; and the place for the gene ID marked by the token !!GENEID!!. Each gene ID reported in GOMo's output will be linked to the URL with the actual gene ID inserted.
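The layout described above can be read with a few lines of Python. This is only a sketch of the format as stated here, not GOMo's own parser: the function names `parse_go_database` and `gene_url`, the example URL, and the GO terms and gene IDs are all made up for illustration.

```python
import io

GENE_ID_TOKEN = "!!GENEID!!"  # placeholder token described in the text above


def parse_go_database(handle):
    """Return (url_template, {go_term: [sequence ids]}) from a TSV stream."""
    # First line: URL template for gene links (may be empty if there is no database).
    url_template = handle.readline().rstrip("\r\n")
    terms = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        # First column is the GO term, remaining columns are sequence identifiers.
        terms[fields[0]] = [f for f in fields[1:] if f]
    return url_template, terms


def gene_url(url_template, gene_id):
    """Insert a gene ID into the URL template; None when no URL was supplied."""
    if not url_template:
        return None
    return url_template.replace(GENE_ID_TOKEN, gene_id)


# Hypothetical database contents, held in memory instead of a file.
example = io.StringIO(
    "http://example.org/gene?db=demo&amp;id=!!GENEID!!\n"
    "GO:0000001\tgeneA\tgeneB\n"
    "GO:0000002\tgeneC\n"
)
url_template, terms = parse_go_database(example)
print(terms)
print(gene_url(url_template, "geneA"))
```

Note that the sketch leaves `&amp;` in the URL untouched, since the text above only says the file stores ampersands in that escaped form.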

<scoring file>+

The names of one or more XML files. Each file contains motif scores for a set of sequences from a genome following the CisML schema. When scoring data is available for multiple related species, GOMo can take multiple scoring files where the true sequence identifiers have been mapped to their orthologs in the reference species for which the GO-term database was supplied (see <go-term database>, above).

Scoring files may easily be created using the AMA utility that is part of the downloadable MEME Suite. A typical command to create a scoring file named "ama_out/ama.xml" using AMA would be:

    ama ama_out -pvalues <motif_file> <fasta_sequence_file> <background_file>

By default, GOMo ranks the genes by the p-value given for each gene in the CisML file. Any sequence failing to provide a p-value will cause GOMo to exit. The --gs switch causes GOMo to rank the genes by the gene scores from the CisML file instead.

Output

GOMo writes its output to files in a directory named gomo_out, which it creates if necessary. You can change the output directory using the --o or --oc options. The directory will contain the following files:

You can override the creation of files altogether by specifying the --text option, which causes GOMo to output its TSV format to standard output.

Note: See this detailed description of the GOMo output formats for more information.

Options:

General Options

--text
    Output in tab-separated values format to standard output. Will not create an output directory or files.

--motifs <motifs>
    Path to the optional motif file in MEME Motif Format that was used to generate (all of the) scoring file(s). The motifs in this file will be used to generate sequence logos in the GOMo HTML output.
    Default: No logos are displayed in the HTML output.

--dag <godag>
    Path to the optional Gene Ontology DAG file to be used for identifying the most specific terms in the GOMo XML output so they can be highlighted in the HTML output.

--motif <id>
    Use only the motif identified by id. This option may be repeated.
    Default: All motifs are used.

--shuffle_scores <n>
    Generate an empirical null by shuffling the sequence-to-score assignments n times. Use the resulting distribution to compute empirical p-values.
    Default: Shuffle 1000 times.

--t <q>
    Threshold used on the score q-values for reporting results. To show all results use a value of 1.0.
    Default: A threshold of 0.05 is used.

--gs
    Use the scores contained in the CisML file for ranking genes. Any sequence failing to provide a score will cause GOMo to exit.
    Default: Use the p-values contained in the CisML file for ranking genes.

--score_E_thresh <E>
    All genes with E-values in the CisML file larger than E are treated as having the maximum possible score (and as having tied worst rank when the genes are sorted for the rank-sum test). The E-values are computed by multiplying the p-values by the number of genes in the CisML file. Setting E to a number less than 1 can reduce the effect of noise. The threshold will be ignored when GOMo is told to use gene scores rather than p-values via the --gs switch.
    Default: E-values are not thresholded when ranking genes.

--min_gene_count <n>
    Only consider GO terms annotated with at least n genes.
    Default: A value of 1 is used, which shows all results.

--nostatus
    Suppresses the progress information.
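
The E-value thresholding described for --score_E_thresh can be illustrated with a small Python sketch. It is not GOMo's implementation: the function names and gene IDs are hypothetical, representing the "maximum possible score" as a p-value of 1.0 is an assumption, and so is giving tied genes the mean of their ranks (the text only says they share the worst rank).

```python
def apply_e_value_threshold(p_values, e_thresh, worst_p=1.0):
    """Replace p-values whose E-value exceeds e_thresh with worst_p.

    E-value = p-value * number of genes, as described for --score_E_thresh.
    worst_p = 1.0 is an assumption standing in for the maximum possible score.
    """
    n = len(p_values)
    adjusted = {}
    for gene, p in p_values.items():
        e_value = p * n
        adjusted[gene] = worst_p if e_value > e_thresh else p
    return adjusted


def average_ranks(values):
    """Rank genes by ascending value (1-based); tied genes get their mean rank."""
    ordered = sorted(values.items(), key=lambda kv: kv[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every gene tied with the gene at position i.
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[ordered[k][0]] = mean_rank
        i = j
    return ranks


# Hypothetical per-gene p-values, as they might be read from a CisML file.
p_values = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.3, "geneD": 0.6}
adjusted = apply_e_value_threshold(p_values, e_thresh=0.5)
ranks = average_ranks(adjusted)
print(adjusted)
print(ranks)
```

With four genes, geneC and geneD have E-values above 0.5, so they end up tied at the bottom of the ranking while geneA and geneB keep their original order.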

Citing