update-sequence-db

Usage:

update-sequence-db [options] <sequence database directory>

Description

Download sequence databases.

Creates a SQLite database called fasta_db.sqlite and downloads sequences from multiple sources while storing information about the sequences in the database.
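This page does not describe the layout of fasta_db.sqlite. Rather than guess at table names, the following minimal Python sketch lists whatever tables the database contains, using SQLite's built-in sqlite_master catalogue. It opens the file read-only so that inspecting it cannot modify it. The function name list_tables is hypothetical and is not part of the MEME Suite.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in an SQLite database file.

    Hypothetical helper: it reads only SQLite's own sqlite_master catalogue,
    so it makes no assumptions about the schema update-sequence-db creates.
    """
    # Open read-only through a file: URI so inspection cannot modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_tables("fasta_databases/fasta_db.sqlite") (a hypothetical path) returns the table names, which you can then examine with ordinary SELECT statements.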

The program starts in status display mode, in which it gives regular updates on what it is doing. Press Enter to switch to command mode. In command mode, two basic commands are available: "help", which lists the available commands, and "status", which switches back to status display mode. While sequences are downloading, you can also use the command "exit" to stop any further downloading.

Input

Sequence Database Directory

The folder in which to store downloaded database files. The MEME Suite expects to find sequence databases in a folder called fasta_databases, either inside the folder MEME Install Folder/db or inside the folder given to the configure script with --with-db DB Install Folder. Depending on how you configured the MEME Suite, you should specify either MEME Install Folder/db/fasta_databases or DB Install Folder/fasta_databases.
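The rule for choosing the directory can be written as a small Python sketch. The function name and the example paths are hypothetical; pass db_install_dir only if the MEME Suite was configured with --with-db.

```python
from pathlib import Path
from typing import Optional


def fasta_databases_dir(meme_install_dir: str,
                        db_install_dir: Optional[str] = None) -> Path:
    """Return the sequence database directory the MEME Suite expects.

    Hypothetical helper. db_install_dir is the folder given to the configure
    script with --with-db, or None if that option was not used.
    """
    if db_install_dir is not None:
        # Configured with --with-db: DB Install Folder/fasta_databases
        return Path(db_install_dir) / "fasta_databases"
    # Otherwise: MEME Install Folder/db/fasta_databases
    return Path(meme_install_dir) / "db" / "fasta_databases"
```

For example, fasta_databases_dir("/opt/meme") gives /opt/meme/db/fasta_databases, which is the directory you would pass as the last argument to update-sequence-db.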

Output

The program creates a folder called downloads and a folder called logs. It also creates an SQLite database called fasta_db.sqlite. Every sequence database is initially downloaded into the downloads folder. Once a download is complete, it is decompressed or merged from multiple sources as required and written to a sequence file with a .faa extension (protein) or a .fna extension (DNA). The sequence file is then processed by fasta-get-markov to calculate a 1st-order background model, which is stored in a file with the extension .bfile. fasta-get-markov also reports the number of sequences and the shortest, longest and average sequence lengths, and all of this information is stored in the SQLite database.
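To make these statistics and the background model concrete, the Python sketch below computes the same kinds of quantities from FASTA text: the number of sequences, the shortest, longest and average lengths, and a 1st-order model giving the probability of each letter given the letter before it. It is an illustration only, not a reimplementation of fasta-get-markov. It assumes a DNA alphabet of ACGT, skips letter pairs containing any other character, adds no pseudocounts, and does not reproduce the layout or normalisation of the .bfile format. All function names are hypothetical.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def summarize(lines):
    """Return the sequence count and the shortest, longest and mean lengths."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


def first_order_model(lines, alphabet="ACGT"):
    """Estimate P(next letter | previous letter) from adjacent letter pairs.

    Assumption: only pairs inside one sequence with both letters in the
    alphabet are counted, and no pseudocounts are added.
    """
    counts = {a: defaultdict(int) for a in alphabet}
    for _, seq in read_fasta(lines):
        seq = seq.upper()
        for prev, nxt in zip(seq, seq[1:]):
            if prev in counts and nxt in alphabet:
                counts[prev][nxt] += 1
    model = {}
    for prev, row in counts.items():
        total = sum(row.values())
        # Normalise each row so the conditional probabilities sum to 1.
        model[prev] = {b: (row[b] / total if total else 0.0) for b in alphabet}
    return model
```

For example, summarize(open("genome.fna")) (a hypothetical file) returns the four summary values. A file object can only be iterated once, so open the file again, or pass a list of lines, before calling first_order_model.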

Configuration

Configuration files that tweak the behaviours of the sequence database downloaders will be automatically generated in the conf directory within the specified sequence database directory.

Additionally, the miscellaneous source downloader checks the conf directory for any files ending with the extension .csv and reads them to determine sequence sources. The MEME Suite includes two such files, db_general.csv and db_other_genomes.csv, in the distribution's etc folder. These may be moved into the conf folder, but this is not done automatically during installation.
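The sketch below shows one way to make the bundled source lists available to the miscellaneous downloader. It copies db_general.csv and db_other_genomes.csv from the distribution's etc folder into conf (copying rather than moving, so the originals stay in place) and then lists the .csv files the downloader would find there. The function name and both directory arguments are hypothetical; check where your installation keeps its etc folder.

```python
import shutil
from pathlib import Path


def install_source_lists(etc_dir, db_dir,
                         names=("db_general.csv", "db_other_genomes.csv")):
    """Copy bundled source lists into <db_dir>/conf if not already present.

    Hypothetical helper. Returns the sorted names of all .csv files in conf
    afterwards, i.e. the files the miscellaneous downloader would read.
    """
    conf = Path(db_dir) / "conf"
    conf.mkdir(parents=True, exist_ok=True)
    for name in names:
        src = Path(etc_dir) / name
        dst = conf / name
        # Never overwrite a file that is already in conf (it may be edited).
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
    return sorted(p.name for p in conf.glob("*.csv") if p.is_file())
```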

Options

Each option is listed with its parameter (if any), a description, and the default behaviour when the option is not given.

Help

--help
    Display a help message and exit.
    Default: run normally.

Disable Database Sources

--no_ensembl
    Disable downloading genomes from Ensembl.
    Default: download genomes from Ensembl.
--no_epd
    Disable downloading the Eukaryotic Promoter Database.
    Default: download the Eukaryotic Promoter Database.
--no_genbank
    Disable downloading genomes from GenBank.
    Default: download genomes from GenBank.
--no_misc
    Disable downloading the miscellaneous sequence databases listed in csv files.
    Default: download the miscellaneous sequence databases listed in csv files.
--no_rsat
    Disable downloading upstream sequences from RSAT.
    Default: download upstream sequences from RSAT.
--no_ucsc
    Disable downloading genomes from UCSC.
    Default: download genomes from UCSC.
--updater <classname>
    Experimental. Specify the class name of a custom updater.

File Cleanup

--delete_old
    Delete sequence databases that were marked as obsolete on a previous update.
    Default: leave sequence databases marked as obsolete untouched.
--retain_missing
    Retain database entries for missing files.
    Default: remove database entries for missing files.

Backwards Compatibility

--csv [<directory>]
    Create a csv file and an index file listing all the databases, for backwards compatibility with older releases. The directory in which to create them may optionally be given; if it is omitted, they are placed in the sequence database directory.
    Default: do not create a csv or index file.

Miscellaneous

--bin <directory>
    Specify the location of the fasta-get-markov tool.
    Default: search the configured bin directory and, if fasta-get-markov is not present there, search the PATH.
--log <log file>
    Specify the file to write logs to.
    Default: write a log to the logs directory below the sequence database directory.
-v <log level>
    Specify the logging level (1 to 8).
    Default: logging level 3, which outputs errors, warnings and summary information.
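When scripting updates, for instance from a scheduled job, it can help to assemble the command line as an argument list rather than a single shell string. The following Python sketch covers only a few of the options above. The function name build_command is hypothetical, and it assumes that update-sequence-db is on the PATH and that -v takes the log level as the following argument.

```python
from typing import Iterable, List, Optional

# Source names taken from the --no_* options; each maps to --no_<source>.
KNOWN_SOURCES = {"ensembl", "epd", "genbank", "misc", "rsat", "ucsc"}


def build_command(db_dir: str,
                  disable: Iterable[str] = (),
                  delete_old: bool = False,
                  verbosity: Optional[int] = None) -> List[str]:
    """Assemble an update-sequence-db argument list (hypothetical helper).

    Assumes -v takes the log level as a separate argument. The sequence
    database directory goes last, as in the usage line.
    """
    argv = ["update-sequence-db"]
    for source in disable:
        if source not in KNOWN_SOURCES:
            raise ValueError(f"unknown source: {source}")
        argv.append(f"--no_{source}")
    if delete_old:
        argv.append("--delete_old")
    if verbosity is not None:
        if not 1 <= verbosity <= 8:
            raise ValueError("log level must be between 1 and 8")
        argv += ["-v", str(verbosity)]
    argv.append(db_dir)
    return argv
```

The resulting list can be passed directly to subprocess.run(..., check=True), which avoids quoting problems with directory names that contain spaces.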