index-sequence-db <sequence database directory>
Create index files for the genomes in the sequence database directory.
This program requires the update-sequence-db
program to have been run first.
If necessary, the program adds the fileSeqIndex column to the tblSequenceFile table.
It then loops through all the databases listed in the SQLite database fasta_db.sqlite
that is created by update-sequence-db
and uses fasta-file-indexer
to create the FASTA index file for each genome database.
The program then updates the tblSequenceFile record with
the name of the index file in the fileSeqIndex column.
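The steps above can be sketched in Python as follows. This is a minimal illustration, not the program's actual implementation. The `run_indexer` callback is a hypothetical stand-in for invoking fasta-file-indexer, whose command-line arguments are not described here. Naming the index file by replacing the sequence file's extension with `.fai` is one reading of "same basename" and is also an assumption. SQLite's `ALTER TABLE ... ADD COLUMN` cannot add a `UNIQUE` column, so the sketch adds a plain `TEXT` column.

```python
import sqlite3
from pathlib import Path, PurePosixPath
from typing import Callable


def ensure_index_column(conn: sqlite3.Connection) -> bool:
    """Add fileSeqIndex to tblSequenceFile if missing; return True if it was added."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(tblSequenceFile)")]
    if "fileSeqIndex" in columns:
        return False
    # Plain TEXT column: SQLite cannot add a UNIQUE column via ALTER TABLE.
    conn.execute("ALTER TABLE tblSequenceFile ADD COLUMN fileSeqIndex TEXT")
    return True


def index_all(db_dir: str, run_indexer: Callable[[Path, Path], None]) -> int:
    """Index every sequence file recorded in fasta_db.sqlite; return the count."""
    root = Path(db_dir)
    conn = sqlite3.connect(root / "fasta_db.sqlite")
    try:
        ensure_index_column(conn)
        rows = conn.execute(
            "SELECT id, fileSeq FROM tblSequenceFile ORDER BY id"
        ).fetchall()
        for seq_id, file_seq in rows:
            # Assumption: the index name is the sequence path with a .fai suffix.
            index_name = str(PurePosixPath(file_seq).with_suffix(".fai"))
            # Hypothetical callback standing in for fasta-file-indexer.
            run_indexer(root / file_seq, root / index_name)
            conn.execute(
                "UPDATE tblSequenceFile SET fileSeqIndex = ? WHERE id = ?",
                (index_name, seq_id),
            )
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```

Passing the indexer as a callback keeps the database bookkeeping separate from the external tool, so the loop can be exercised without the real indexer installed.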
The folder used to store the sequence database files that should be indexed.
The program expects the folder to contain the sequence FASTA files
and the SQLite database fasta_db.sqlite
created by update-sequence-db.
For each sequence file indexed, the program creates a corresponding index file
with the same basename but the suffix .fai.
The fasta_db.sqlite
database is updated with the name of the index file.
The program also creates a folder
called logs
containing date-stamped logs of the program's activity.
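One way to check the outputs described above is to compare the fileSeqIndex values recorded in fasta_db.sqlite with the files actually present in the folder. The helper below is a hypothetical sketch, not part of the program. It assumes the fileSeqIndex column already exists (that is, index-sequence-db has been run at least once) and that the recorded paths are relative to the database folder.

```python
import sqlite3
from pathlib import Path
from typing import List


def missing_indexes(db_dir: str) -> List[str]:
    """Return the fileSeq values whose index file is unrecorded or absent on disk."""
    root = Path(db_dir)
    conn = sqlite3.connect(root / "fasta_db.sqlite")
    try:
        rows = conn.execute(
            "SELECT fileSeq, fileSeqIndex FROM tblSequenceFile ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
    # A record is incomplete if no index name is stored or the file is missing.
    return [seq for seq, idx in rows if not idx or not (root / idx).is_file()]
```

An empty list means every recorded sequence file has an index file on disk.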
As well as downloading the sequence files from many sources, the updater tracks the files using a SQLite database. The schema of the database is given below.
Column | Type | Constraint | Description |
---|---|---|---|
id | INTEGER | PRIMARY KEY | An auto-generated unique identifier for the category. Other tables reference this field. |
name | TEXT | UNIQUE NOT NULL | The unique name of the category as shown to users. |
Column | Type | Constraint | Description |
---|---|---|---|
id | INTEGER | PRIMARY KEY | An auto-generated unique identifier for the listing. Other tables reference this field. |
categoryId | INTEGER | NOT NULL REFERENCES tblCategory (id) | The identifier of the category that contains this listing. |
name | TEXT | NOT NULL | The name of the listing shown to users. |
description | TEXT | NOT NULL | The description of the listing shown to users. |
The combination of the fields categoryId and name is unique.
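The two tables above could be declared roughly as follows. This is a sketch reconstructed from the column descriptions, not the DDL that update-sequence-db actually executes. The table names tblCategory and tblListing are taken from the REFERENCES constraints in the tables. Enabling foreign key enforcement is an assumption made for the demonstration, since SQLite leaves it off by default.

```python
import sqlite3

# DDL reconstructed from the documented columns (an approximation).
CATEGORY_AND_LISTING_DDL = """
CREATE TABLE tblCategory (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE tblListing (
    id INTEGER PRIMARY KEY,
    categoryId INTEGER NOT NULL REFERENCES tblCategory (id),
    name TEXT NOT NULL,
    description TEXT NOT NULL,
    UNIQUE (categoryId, name)
);
"""


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two tables."""
    conn = sqlite3.connect(":memory:")
    # Assumption: enforce foreign keys so that bad categoryId values are rejected.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(CATEGORY_AND_LISTING_DDL)
    return conn
```

With this declaration, inserting two listings with the same name under one category raises `sqlite3.IntegrityError`, while the same name under a different category is accepted.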
Column | Type | Constraint | Description |
---|---|---|---|
id | INTEGER | PRIMARY KEY | An auto-generated unique identifier for the sequence file. |
retriever | INTEGER | NOT NULL | An identifier for the code module that downloaded this sequence. It allows the individual code modules to ensure they don't change the records of files downloaded by other modules. |
listingId | INTEGER | NOT NULL REFERENCES tblListing (id) | The identifier of the listing that contains this sequence file. |
alphabet | INTEGER | NOT NULL CHECK (alphabet IN (1, 2, 4)) | Represents the alphabet as a power of 2 so that alphabets can be combined into a bitset. |
edition | INTEGER | NOT NULL | A machine readable version. This field is used for sorting. Larger numbers are considered newer. |
version | TEXT | NOT NULL | A human readable version which is displayed to the user. |
description | TEXT | NOT NULL | The description of the sequence file, often containing information about the source. |
fileSeq | TEXT | UNIQUE NOT NULL | The relative path to the sequence file. |
fileBg | TEXT | UNIQUE NOT NULL | The relative path to the background file. |
fileSeqIndex | TEXT | UNIQUE NOT NULL | The relative path to the index file for the sequence. |
sequenceCount | INTEGER | NOT NULL | The number of sequences. |
totalLen | INTEGER | NOT NULL | The total end-to-end combined length of the sequences. |
minLen | INTEGER | NOT NULL | The length of the shortest sequence. |
maxLen | INTEGER | NOT NULL | The length of the longest sequence. |
avgLen | REAL | NOT NULL | The average length of the sequences. |
stdDLen | REAL | NOT NULL | Currently unused. Intended to store the standard deviation of the sequence lengths. |
obsolete | INTEGER | DEFAULT 0 | Used to flag sequences as obsolete. Sequences flagged as obsolete are hidden from the interface. |
The combination of the fields listingId, alphabet and edition is unique.
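The sketch below shows how the alphabet bitset, the edition ordering and the obsolete flag might be used together to pick the newest visible sequence file for a listing. The DDL is reconstructed from the table above and may differ from what update-sequence-db actually creates. The constants `ALPHABET_A`, `ALPHABET_B` and `ALPHABET_C` are placeholder names, because this page does not say which alphabet each value denotes. The `add_file` helper and its filler values exist only for the demonstration.

```python
import sqlite3

# Placeholder names: the mapping of 1, 2 and 4 to alphabets is not given here.
ALPHABET_A, ALPHABET_B, ALPHABET_C = 1, 2, 4

SEQUENCE_FILE_DDL = """
CREATE TABLE tblSequenceFile (
    id INTEGER PRIMARY KEY,
    retriever INTEGER NOT NULL,
    listingId INTEGER NOT NULL REFERENCES tblListing (id),
    alphabet INTEGER NOT NULL CHECK (alphabet IN (1, 2, 4)),
    edition INTEGER NOT NULL,
    version TEXT NOT NULL,
    description TEXT NOT NULL,
    fileSeq TEXT UNIQUE NOT NULL,
    fileBg TEXT UNIQUE NOT NULL,
    fileSeqIndex TEXT UNIQUE NOT NULL,
    sequenceCount INTEGER NOT NULL,
    totalLen INTEGER NOT NULL,
    minLen INTEGER NOT NULL,
    maxLen INTEGER NOT NULL,
    avgLen REAL NOT NULL,
    stdDLen REAL NOT NULL,
    obsolete INTEGER DEFAULT 0,
    UNIQUE (listingId, alphabet, edition)
)
"""


def add_file(conn, listing_id, alphabet, edition, version, stem, obsolete=0):
    """Insert a demo record; statistics and file names are filler values."""
    conn.execute(
        "INSERT INTO tblSequenceFile (retriever, listingId, alphabet, edition, version,"
        " description, fileSeq, fileBg, fileSeqIndex, sequenceCount, totalLen, minLen,"
        " maxLen, avgLen, stdDLen, obsolete)"
        " VALUES (0, ?, ?, ?, ?, '', ?, ?, ?, 0, 0, 0, 0, 0.0, 0.0, ?)",
        (listing_id, alphabet, edition, version,
         stem + ".fa", stem + ".bfile", stem + ".fai", obsolete),
    )


def newest_file(conn, listing_id, alphabet_mask):
    """Return (id, version) of the newest non-obsolete file matching the mask, or None."""
    return conn.execute(
        "SELECT id, version FROM tblSequenceFile"
        " WHERE listingId = ? AND (alphabet & ?) != 0"
        " AND COALESCE(obsolete, 0) = 0"
        " ORDER BY edition DESC LIMIT 1",
        (listing_id, alphabet_mask),
    ).fetchone()
```

Because each stored alphabet is a single power of 2, a caller can OR several constants together, for example `ALPHABET_B | ALPHABET_C`, and match any of them with one bitwise AND in the query.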
Column | Type | Constraint | Description |
---|---|---|---|
id | INTEGER | PRIMARY KEY | An auto-generated unique identifier for the prior. |
sequenceId | INTEGER | NOT NULL REFERENCES tblSequenceFile (id) | The identifier of the sequence that is associated with this prior. |
filePrior | TEXT | UNIQUE NOT NULL | The relative path to the wig file (which may be gzipped). |
fileDist | TEXT | UNIQUE NOT NULL | The relative path to the dist file (which may be gzipped). |
biosample | TEXT | NOT NULL | A short descriptive name for the sample used in the experiment that the priors were derived from. |
assay | TEXT | NOT NULL | A short descriptive name for the experiment that the priors were derived from. |
source | TEXT | NOT NULL | A short descriptive name of the lab or group that performed the experiment that the priors were derived from. |
url | TEXT | NOT NULL | A URL linking to further information on the experiment. |
description | TEXT | NOT NULL | A description of the experiment which may contain HTML. |
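As a rough illustration of how the priors relate to sequence files, the sketch below declares the table from the columns above and looks up the priors attached to one sequence file. The table name `tblPriorFile` is a placeholder assumed for this example, since the name is not given in this section, and the helper `priors_for_sequence` is likewise hypothetical.

```python
import sqlite3

# Placeholder table name: the real name is not stated in this section.
PRIOR_DDL = """
CREATE TABLE tblPriorFile (
    id INTEGER PRIMARY KEY,
    sequenceId INTEGER NOT NULL REFERENCES tblSequenceFile (id),
    filePrior TEXT UNIQUE NOT NULL,
    fileDist TEXT UNIQUE NOT NULL,
    biosample TEXT NOT NULL,
    assay TEXT NOT NULL,
    source TEXT NOT NULL,
    url TEXT NOT NULL,
    description TEXT NOT NULL
)
"""


def priors_for_sequence(conn: sqlite3.Connection, sequence_id: int):
    """Return (biosample, assay, filePrior, fileDist) rows for one sequence file."""
    return conn.execute(
        "SELECT biosample, assay, filePrior, fileDist FROM tblPriorFile"
        " WHERE sequenceId = ? ORDER BY biosample, assay",
        (sequence_id,),
    ).fetchall()
```

Several priors may reference the same sequence file, so the lookup returns a list and not a single row.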