This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Sequence Database Management Strategies
human ESTs from the EST division, as well as all the mRNAs from the PRI (primate)
division. But if you’ve designated all ESTs and mRNAs as the cDNA moltype,
getting all human transcripts is as easy as retrieving all records in which the species is
Homo sapiens and the moltype is cDNA. You can add several more fields to the
database, such as creation date, division, and keywords, and get quite a bit of
functionality without much more complexity.
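To make the species-plus-moltype lookup concrete, here is a minimal sketch in Python using an in-memory SQLite table. The table name seq_meta, its columns, and the helper functions are hypothetical, and the accessions are invented placeholders rather than real GenBank entries. It illustrates the idea only; it isn't the schema of any particular package.

```python
import sqlite3

# Hypothetical metadata: one row per sequence record.
# The accessions are invented placeholders, not real GenBank entries.
RECORDS = [
    ("EST_A1", "Homo sapiens", "cDNA", "EST"),
    ("MRNA_B2", "Homo sapiens", "cDNA", "PRI"),
    ("GEN_C3", "Homo sapiens", "genomic", "PRI"),
    ("EST_D4", "Mus musculus", "cDNA", "EST"),
]


def build_db(records):
    """Create an in-memory table holding species, moltype, and division."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seq_meta ("
        "accession TEXT PRIMARY KEY, species TEXT, moltype TEXT, division TEXT)"
    )
    conn.executemany("INSERT INTO seq_meta VALUES (?, ?, ?, ?)", records)
    return conn


def find_accessions(conn, species, moltype):
    """Return the accessions matching both species and moltype, sorted."""
    rows = conn.execute(
        "SELECT accession FROM seq_meta "
        "WHERE species = ? AND moltype = ? ORDER BY accession",
        (species, moltype),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db(RECORDS)
    # One query covers transcripts from both the EST and PRI divisions.
    print(find_accessions(conn, "Homo sapiens", "cDNA"))
```

Because the EST and the mRNA share the cDNA moltype, a single query returns both, regardless of the division each record came from.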
Overall, flat file indexing is a very good strategy for sequence management because it
is simple, fast, and retains the data in its original format. You don’t even have to
write any software, as both free and commercial software packages are designed
specifically for managing flat file data. Check out the Bioperl project at http://bioperl.org,
MyGenBank at http://sourceforge.net/projects/mgb, and SRS (see Table 11-4).
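If you’re curious what flat file indexing boils down to, the following Python sketch records the byte offset of each record in a file and later seeks straight to the record you want. It assumes a FASTA-formatted file for brevity, and the function names build_index and fetch are made up for this illustration. It is not how Bioperl, MyGenBank, or SRS implement their indexes.

```python
import os
import tempfile


def build_index(path):
    """Map each FASTA identifier to the byte offset of its header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                parts = line[1:].split()
                if parts:  # skip headers that carry no identifier
                    index[parts[0].decode("ascii")] = offset
    return index


def fetch(path, index, ident):
    """Return one record (header plus sequence lines) exactly as stored."""
    with open(path, "rb") as handle:
        handle.seek(index[ident])
        lines = [handle.readline()]
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            lines.append(line)
    return b"".join(lines).decode("ascii")


if __name__ == "__main__":
    # Invented demo records, written to a temporary file.
    demo = b">seq1 invented record\nACGTACGT\nGGCC\n>seq2 invented record\nTTTTAAAA\n"
    with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
        tmp.write(demo)
    try:
        idx = build_index(tmp.name)
        print(fetch(tmp.name, idx, "seq2"), end="")
    finally:
        os.remove(tmp.name)
```

The data file is never modified, which is the appeal of the approach: the index is small, can be rebuilt at any time, and the records stay in their original format.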
Commercial Sequence Management Software
Several commercial software packages are designed for managing biological sequence
data. The database software is generally part of a much larger software suite that
includes sequence analysis tools such as BLAST and visualization tools to make
interpretation easier. The companies that develop these packages expend a great deal
of effort to make the various sequence analysis tasks interoperable and user-friendly.
Table 11-4 gives a brief description of the software.
Table 11-4. Commercial sequence management software

Company: Accelrys
Product and description: The popular Wisconsin GCG package is now owned by Accelrys, which provides the SeqStore software for managing sequence data. The system uses an Oracle database and allows daily/weekly updates. To install and maintain the system, you must have personnel with experience in Unix systems administration and Oracle database administration. Accelrys recommends a computer with at least 4 CPUs, 4 GB of RAM, and 40 GB of disk space.
http://www.accelrys.com

Company: Informax
Product and description: The Genomax software suite provides sequence management along with a comprehensive set of interoperable tools. Informax recommends a project manager, a Unix systems administrator, and an Oracle database administrator to manage and maintain the system, as well as a life sciences expert to respond to users’ questions. Informax uses a three-tiered architecture and recommends that the three computers be configured with 4 CPUs and 4-8 GB of RAM, and that the database server have 400 GB of disk space.
http://www.informaxinc.com

Company: LION Biosciences
Product and description: LION Biosciences offers the Sequence Retrieval System (SRS). SRS is probably the most popular sequence management software in use today and is used by both DDBJ and EMBL. SRS is free for academic users. LION produces a separate, related product, PRISMA2, which is an automatic databank-updating and maintenance tool. SRS requires a person with competent Unix skills to install and maintain it, and a server with enough storage for the various databases and indexes.
http://lionbioscience.com

As you can see from the descriptions of the personnel and hardware requirements,
using these comprehensive sequence analysis systems requires a serious
commitment. For these reasons, these packages aren’t recommended for small
research groups. For larger groups, though, these products can save a lot of time and
money. It’s easy to underestimate the effort required to develop your own sequence
management system, so be cautious before embarking on such a task, and give the
professionals a chance to show you their wares.
Tools on the Internet
There are good reasons to use web-based tools for sequence management rather than
building a local database. First, you don’t have to download more data than you
need. Mirroring the entire public database isn’t efficient if you need only a slice of it.
Second, database providers take care of the most time-consuming and expensive
tasks, namely processing, storing, and indexing the data. Third, the databases are
self-updating, which means that you can always get the latest and most accurate
information. Best of all, the service is completely free. Well, maybe not completely
free since the databases are supported by taxes, but let’s all thank the various
governments and funding agencies for putting our hard-earned money toward a worthy
cause, and let’s especially recognize all the people who actually make it happen.
The downside to using web-based tools is that you have to spend time learning how
to query the database efficiently and accurately, but that’s going to be true of any
sequence management system, even your own. A more serious issue is that you will
depend on the computers and network between you and the database provider, but
this will improve over time. Still, even if you have to put up with a few glitches here
and there, the total cost in time and money is probably lower than that of building your
own local mirror.