This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Sequence Database Management Strategies
human ESTs from the EST division, as well as all the mRNAs from the PRI (primate)
division. But if you’ve designated all ESTs and mRNAs as the cDNA moltype, getting
all human transcripts is as easy as retrieving all records in which the species is
Homo sapiens and the moltype is cDNA. You can add several more fields to the database,
such as date created, division, and keywords, and get quite a bit of functionality
without much more complexity.
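To make the species-plus-moltype query concrete, here is a minimal Python sketch of a field-based index. It is an illustration under stated assumptions, not the schema of any particular package: the `IndexEntry` class, the `select` helper, and the accession values are all hypothetical, and only the field names (species, moltype, division) come from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One row of a hypothetical index; the sequence itself stays in the flat file."""
    accession: str  # made-up identifiers, for illustration only
    species: str
    moltype: str    # e.g. "cDNA" or "genomic"
    division: str   # e.g. "EST" or "PRI"


def select(index, **criteria):
    """Return the entries whose fields equal every criterion given."""
    return [
        entry for entry in index
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


index = [
    IndexEntry("ACC0001", "Homo sapiens", "cDNA", "EST"),
    IndexEntry("ACC0002", "Homo sapiens", "cDNA", "PRI"),
    IndexEntry("ACC0003", "Homo sapiens", "genomic", "PRI"),
    IndexEntry("ACC0004", "Pan troglodytes", "cDNA", "PRI"),
]

# All human transcripts, regardless of which division they came from.
human_transcripts = select(index, species="Homo sapiens", moltype="cDNA")
```

Because the moltype field already groups ESTs and mRNAs together, the query does not need to know about the EST and PRI divisions at all. Adding a field such as keywords would only mean adding one attribute to the entry.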
Overall, flat file indexing is a very good strategy for sequence management because it
is simple, fast, and retains the data in its original format. You don’t even have to
write any software, as both free and commercial software packages are designed
specifically for managing flat file data. Check out the Bioperl project at http://bioperl.org,
MyGenBank at http://sourceforge.net/projects/mgb, and SRS (see Table 11-4).
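If you are curious what flat file indexing looks like underneath, the following Python sketch records the byte offset of each record in a FASTA-style file and later seeks straight to it. It is a simplified illustration, not how Bioperl, MyGenBank, or SRS implement indexing; the function names and the sample records are assumptions made for the example, and an in-memory `io.BytesIO` stands in for a real file opened in binary mode.

```python
import io


def build_offset_index(handle):
    """Map each record identifier to the byte offset of its '>' header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:  # skip headers that carry no identifier
                index[fields[0].decode("ascii")] = offset
    return index


def fetch_record(handle, index, identifier):
    """Seek to a record and read it up to the next header or end of file."""
    handle.seek(index[identifier])
    lines = [handle.readline()]
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break
        lines.append(line)
    return b"".join(lines).decode("ascii")


# Hypothetical two-record flat file, held in memory for the example.
flat_file = io.BytesIO(
    b">seq1 hypothetical record\nACGT\nTTGA\n>seq2 another record\nGGCC\n"
)
offsets = build_offset_index(flat_file)
record = fetch_record(flat_file, offsets, "seq2")
```

The flat file is never modified, which is the point of the strategy: the index is a small side structure, and the data stays in its original format.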
Commercial Sequence Management Software
Several commercial software packages are designed for managing biological sequence
data. The database software is generally part of a much larger software suite that
includes sequence analysis tools such as BLAST and visualization tools to make
interpretation easier. The companies that develop these packages expend a great deal
of effort to make the various sequence analysis tasks interoperable and user friendly.
Table 11-4 gives a brief description of the software.
As you can see from the descriptions of the personnel and hardware requirements,
using these comprehensive sequence analysis systems requires a serious
Table 11-4. Commercial sequence management software

Company           Product and description
----------------  ------------------------------------------------------------
Accelrys          The popular Wisconsin GCG package is now owned by Accelrys,
                  which provides the SeqStore software for managing sequence
                  data. The system uses an Oracle database and allows
                  daily/weekly updates. To install and maintain the system,
                  you must have personnel with experience in Unix systems
                  administration and Oracle database administration. Accelrys
                  recommends a computer with at least 4 CPUs, 4-GB RAM, and
                  40-GB disk space.
                  http://www.accelrys.com

Informax          The Genomax software suite provides sequence management
                  along with a comprehensive set of interoperable tools.
                  Informax recommends a project manager, a Unix systems
                  administrator, and an Oracle database administrator to
                  manage and maintain the system, as well as a life sciences
                  expert to respond to users’ questions. Informax uses a
                  three-tiered architecture and recommends that the three
                  computers be configured with 4 CPUs and 4-8 GB RAM, and
                  that the database server have 400-GB disk space.
                  http://www.informaxinc.com

LION Biosciences  LION Biosciences offers the Sequence Retrieval System (SRS).
                  SRS is probably the most popular sequence management
                  software in use today and is used by both DDBJ and EMBL.
                  SRS is free for academic users. LION produces a separate,
                  related product, PRISMA2, which is an automatic
                  databank-updating and maintenance tool. SRS requires a
                  person with competent Unix skills to install and maintain
                  it, and a server with enough storage for the various
                  databases and indexes.
                  http://lionbioscience.com