glam2scan [options] <alphabet> <glam2 motif> <sequences>
GLAM2SCAN searches a sequence database for matches to a motif discovered by GLAM2. Each match receives a score indicating how well it fits the motif.
<alphabet>: The alphabet of the motif and sequences. This can be 'p' for protein sequences, 'n' for nucleotide sequences, or the name of a GLAM2 alphabet file.
<glam2 motif>: A file containing a GLAM2 motif. If the file contains multiple motifs, GLAM2SCAN considers only the top one.
<sequences>: A file containing FASTA-formatted sequences.
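As a sketch of how the three positional arguments combine with the basic options in the options table, the Python helper below assembles a glam2scan command line and runs it with `subprocess`. The function names are hypothetical, it assumes the `glam2scan` executable is on your PATH, and it places options before the positional arguments as in the usage line. Rejecting `-2` for the protein alphabet is a choice made by this sketch, not documented glam2scan behaviour.

```python
import subprocess
from typing import List, Optional


def glam2scan_command(alphabet: str, motif_file: str, seq_file: str,
                      out_file: Optional[str] = None,
                      n_matches: Optional[int] = None,
                      both_strands: bool = False) -> List[str]:
    """Assemble a glam2scan argument list (hypothetical helper)."""
    if both_strands and alphabet == "p":
        # Design choice of this sketch: -2 is only meaningful for nucleotides.
        raise ValueError("-2 applies only to nucleotide sequences")
    cmd = ["glam2scan"]
    if out_file is not None:
        cmd += ["-o", out_file]        # write output to a file, not stdout
    if n_matches is not None:
        cmd += ["-n", str(n_matches)]  # number of matches to report
    if both_strands:
        cmd.append("-2")               # search both strands
    # Positional arguments come last, as in the usage line.
    cmd += [alphabet, motif_file, seq_file]
    return cmd


def run_glam2scan(cmd: List[str]) -> str:
    """Run the command and return its standard output (assumes glam2scan is on PATH)."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, `run_glam2scan(glam2scan_command("p", "prot_motif.glam2", "lotsa_prots.fa", n_matches=10))` would return the report for the ten best matches; when `out_file` is given, the report goes to that file and the returned string is expected to be empty.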
The output begins with some general information:
    GLAM2scan
    Version 9999

    glam2scan p prot_motif.glam2 lotsa_prots.fa
This is followed by motif matches, sorted in order of score. A motif match looks like this:
                        **.****
    SOS1_HUMAN  780 HPIE.IA 785 + 8.70
The name of the sequence containing the match appears on the left, the start and end coordinates of the match appear on either side of the matching sequence, and the match score appears on the right. The plus sign indicates the strand of the match; this is only meaningful when both strands of nucleotide sequences are searched with the -2 option. The stars above the match mark the key positions of the motif, showing how the match aligns to them.
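A minimal Python sketch of reading one match from the two-line layout: the match line is split into its six fields, and the star line is used to pick out the characters aligned to key positions. It assumes the fields are whitespace-separated, that sequence names contain no spaces, and that '.' inside the match marks a gap. That reading is consistent with the example: coordinates 780 to 785 span six residues, and HPIE.IA contains six letters. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glam2ScanMatch:
    name: str
    start: int
    end: int
    aligned: str   # matching sequence as printed; '.' assumed to be a gap
    strand: str
    score: float


def parse_match_line(line: str) -> Glam2ScanMatch:
    """Parse 'NAME START ALIGNED END STRAND SCORE' (assumed whitespace-separated)."""
    name, start, aligned, end, strand, score = line.split()
    return Glam2ScanMatch(name, int(start), int(end), aligned, strand, float(score))


def key_position_residues(stars: str, aligned: str) -> List[str]:
    """Return the characters of the match that sit under '*' (key) columns."""
    stars = stars.strip()  # drop the indentation that aligns stars over the match
    if len(stars) != len(aligned):
        raise ValueError("star line and aligned match must have equal length")
    return [c for s, c in zip(stars, aligned) if s == "*"]
```

For the SOS1_HUMAN example, the key positions hold H, P, E, a gap, I and A, while the I under the '.' column falls outside the key positions.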
Option | Parameter | Description | Default Behaviour |
---|---|---|---|
Basic Options | | | |
-o | file | Write the output to file. | Write the output to standard output. |
-n | n | Report n matches. If scores are tied, matches are sorted in alphabetical order of sequence name; if sequence names are also identical, the order is arbitrary. | |
-2 | | Search both strands of nucleotide sequences. | |
Advanced Options | | The remaining options are somewhat specialized. For typical usage, it is reasonable to set them to exactly the same values that were used with glam2 to discover the motif. | |
-D | pseudocount | Specify the deletion pseudocount. | |
-E | pseudocount | Specify the 'no-deletion' pseudocount. | |
-I | pseudocount | Specify the insertion pseudocount. | |
-J | pseudocount | Specify the 'no-insertion' pseudocount. | |
-d | file | Specify a Dirichlet mixture file. | |
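The ordering rule described for -n can be sketched in Python as follows. This assumes that higher scores rank first and that "alphabetical order" means plain string comparison. The helper is hypothetical and is not glam2scan's own code.

```python
import heapq
from typing import List, Tuple

Match = Tuple[str, float]  # (sequence name, score)


def top_matches(matches: List[Match], n: int) -> List[Match]:
    """Keep the n best matches: higher score first, ties broken by name.

    heapq.nsmallest(n, it, key) is documented as equivalent to
    sorted(it, key=key)[:n], so fully tied entries keep their input order,
    which is one valid realization of the "arbitrary" order.
    """
    return heapq.nsmallest(n, matches, key=lambda m: (-m[1], m[0]))
```

Using `heapq.nsmallest` avoids sorting the whole database when n is much smaller than the number of candidate matches.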
Some users may wish to make 'fake' glam2 motifs for input to glam2scan, for instance based on motifs found by other tools. Most of the glam2 output is ignored by glam2scan, and a minimal motif file looks like this:
                **..****
    seq1 10 HP..D.IG
    seq2  5 HPGADLIG
    seq3  7 HP..ELIG
    seq4  5 HP..ELLA
The sequence names and coordinates are ignored, but some placeholder characters should be present. The stars indicating key positions are necessary, and the first and last columns must be starred.
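Under the rules above, a Python sketch can generate such a minimal motif file from a key-position mask and a list of aligned sequences. It checks that the first and last columns are starred and fills in placeholder names and coordinates. The function name is hypothetical. Indenting the star line so it sits over the sequence columns mirrors the example, but whether glam2scan depends on that indentation is an assumption.

```python
from typing import List


def minimal_glam2_motif(key_mask: str, rows: List[str]) -> str:
    """Build a minimal 'fake' GLAM2 motif file for glam2scan (hypothetical helper).

    key_mask: '*' for key columns, '.' for other columns.
    rows: one aligned instance per motif sequence, same width as key_mask.
    """
    if not key_mask or key_mask[0] != "*" or key_mask[-1] != "*":
        raise ValueError("first and last columns must be starred")
    if set(key_mask) - {"*", "."}:
        raise ValueError("key_mask may contain only '*' and '.'")
    if not rows:
        raise ValueError("at least one aligned sequence is needed")
    for i, row in enumerate(rows, start=1):
        if len(row) != len(key_mask):
            raise ValueError(f"row {i} has width {len(row)}, expected {len(key_mask)}")
    # glam2scan ignores names and coordinates, so placeholders suffice.
    prefixes = [f"seq{i} 1 " for i in range(1, len(rows) + 1)]
    width = max(len(p) for p in prefixes)
    # Indent the star line so it sits over the aligned columns (assumed cosmetic).
    lines = [" " * width + key_mask]
    lines += [p.ljust(width) + row for p, row in zip(prefixes, rows)]
    return "\n".join(lines) + "\n"
```

Writing the result of `minimal_glam2_motif("**..****", ["HP..D.IG", "HPGADLIG", "HP..ELIG", "HP..ELLA"])` to a file gives a motif equivalent to the example above, with placeholder coordinates.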
If you use this program in your research, please cite: