This is the Title of the Book, eMatter Edition
Copyright © 2012 O’Reilly & Associates, Inc. All rights reserved.
Chapter 10: Installation and Command-Line Tutorial
(an optimized version of NCBI-BLAST, see Chapter 12). The CD image is located at
http://gm.sonsorol.org:8080/BioInfxToolsInstaller.cdr.
The installation procedure could not be much simpler. Double-click on the
BioInfxToolsInstaller.cdr image, open the BioInfxToolsInstaller that appears on your
desktop, and then double-click the agncbi12-20-2001.pkg. This launches a typical
installer, and after a few clicks and keystrokes, you’re done. At the end, you need to
do two more things: add one line to your .cshrc file and copy the .ncbirc file to your
home directory. To do this, open the Terminal application and type the following
two lines exactly as they appear here:
echo "source /usr/local/biotools/cshrc.biotools" >> ~/.cshrc
cp /usr/local/biotools/.ncbirc ~/.ncbirc
Macintosh OS 9 Installation
The OS 9 archive is called blast.hqx. If you click on the file icon, your browser will
most likely launch the appropriate tools to automatically expand the archive. If not,
you can use Stuffit Expander, which is available for free from http://www.stuffit.com.
The OS 9 applications look completely different from the command-line versions
because they all have a graphical interface. Don’t expect much from it: the interface
isn’t pretty, and you have to drag the window across your screen several times to
see all the buttons and text fields. (You may also experience a few system crashes
because OS 9 isn’t the ideal environment for BLAST.) You must also create a special
file to tell BLAST where to find its data directory. Create a file called ncbi.cnf in your
system folder that contains the path to the data folder. For example, if the data folder
is on a disk named MyMac and in a folder called Blast, the ncbi.cnf file should
look like this:
[BLAST]
BLASTDB=MyMac:Blast:data
Installation instructions for OS 9 are included for completeness, but Apple no longer
supports this operating system. You might want to upgrade to OS X or install one of
the Linux distributions for PPC. If you install Linux, you may have to compile the
executables from the source, but it’s worth checking if anyone has already done this.
A Google search for “Mac linux BLAST” is a good place to start.
WU-BLAST Installation
Obtaining WU-BLAST software is slightly more complicated than NCBI-BLAST
because it requires a license from Washington University in St. Louis. If you are affil-
iated with an academic institution or a nonprofit organization, the license is free. If
you are part of a for-profit enterprise, you must pay a licensing fee. The price is
high by shrink-wrapped software standards, but is similar to that of other
bioinformatics software packages available from universities. If you find the cost prohibitive, an
earlier version of WU-BLAST is available for free. The free version contains fewer
features, and is available for a limited number of operating systems, but for most
people, it works just fine. If your operating system isn’t supported and your specific
use doesn’t require gapped alignment, a free version of the classic, ungapped BLAST
with public domain source code also exists. This older version, 1.4.9, is nearly identi-
cal to NCBI-BLAST Version 1.4, which is no longer available from the NCBI.
Should you wish to license WU-BLAST or download the free versions, visit the offi-
cial site for the WU-BLAST software at http://blast.wustl.edu. The free versions can
be downloaded with a couple of clicks, but more patience is required for the licensed
version. After the license is issued, you will be sent a user-specific URL from which
to download the software. It’s a good idea to save this information because you will
use it again to download the free updates. Licensed users are notified by email as
new features are added (usually a few times per year).
WU-BLAST is available only for Unix operating systems. If you don’t have access
to a Unix computer, you can run Linux or FreeBSD under a virtual machine with
products such as VMware (http://www.vmware.com) or VirtualPC
(http://www.connectix.com).
Expanding the tarball
The software comes as a compressed Unix archive, or tarball. First, create a direc-
tory such as /usr/local/pkg/wu-blast; if you don’t have root access, create a wu-blast
directory inside your home directory. Next, download the tarball to that directory. If
you do this from a browser, the files may be extracted automatically. If not, use the
following command, where your_platform_name will be something like linuxi686.tar.Z:
tar -xzf blast2.your_platform_name
Not all versions of tar support the -z option above, in which case you can use the fol-
lowing command line:
zcat blast2.your_platform_name | tar -xf -
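The two extraction commands above can be combined into one guarded step. The following is a minimal sketch, not part of the WU-BLAST distribution: it builds a tiny stand-in archive in a temporary directory (the names demo.tar.gz and README.demo are invented for illustration), tries tar -xzf first, and falls back to decompressing through a pipe if that fails. It uses gzip -dc in place of zcat, on the assumption that gzip is installed.

```shell
#!/bin/sh
# Build a small stand-in archive so the sketch runs without the real tarball.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "stand-in for the WU-BLAST files" > "$work/src/README.demo"
tar -cf - -C "$work/src" README.demo | gzip -c > "$work/demo.tar.gz"

# Extract into dest: try the -z shortcut first, then fall back to a pipe.
cd "$work/dest" || exit 1
if tar -xzf "$work/demo.tar.gz" 2>/dev/null; then
    method="tar -xzf"
else
    gzip -dc "$work/demo.tar.gz" | tar -xf -
    method="gzip -dc | tar -xf -"
fi
echo "extracted with: $method"
ls
```

With the real tarball, replace demo.tar.gz with the file you downloaded and run the if/else block from inside your wu-blast directory.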
Before you continue with the rest of the installation procedures, look at what’s inside
the tarball.
Files and Directories
There are a number of files and two subdirectories. The most important items are
described very briefly in Table 10-2 in logical, rather than alphabetical, order. See the
WU-BLAST reference in Chapter 14 for more information.
Executables
Let’s assume the tarball has been downloaded to /usr/pkg/wu-blast, and you nor-
mally keep your executables in /usr/local/bin. Issue the following commands to put
the executables in your path.
ln -s /usr/pkg/wu-blast/blasta /usr/local/bin/blastn
ln -s /usr/pkg/wu-blast/blasta /usr/local/bin/blastp
ln -s /usr/pkg/wu-blast/blasta /usr/local/bin/blastx
ln -s /usr/pkg/wu-blast/blasta /usr/local/bin/tblastn
ln -s /usr/pkg/wu-blast/blasta /usr/local/bin/tblastx
ln -s /usr/pkg/wu-blast/xdformat /usr/local/bin
ln -s /usr/pkg/wu-blast/xdget /usr/local/bin
Note that, unlike the NCBI program blastall, blasta cannot be executed under its own
name, only through its aliases (the symbolic links).
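The link commands above can also be written as a loop, and the name-based dispatch is easy to try out without a license. The sketch below is an illustration only: fake_blasta is a made-up shell script standing in for blasta, and a temporary directory stands in for both /usr/pkg/wu-blast and /usr/local/bin. It shows the general technique of a program inspecting the name it was invoked under, not how blasta is actually implemented.

```shell
#!/bin/sh
demo=$(mktemp -d)

# fake_blasta: a hypothetical stand-in that reports the name it was called by.
cat > "$demo/fake_blasta" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
    blastn|blastp|blastx|tblastn|tblastx)
        echo "running as $(basename "$0")" ;;
    *)
        echo "cannot be run under its own name" >&2
        exit 1 ;;
esac
EOF
chmod +x "$demo/fake_blasta"

# One symbolic link per program name, all pointing at the same executable.
for prog in blastn blastp blastx tblastn tblastx; do
    ln -s "$demo/fake_blasta" "$demo/$prog"
done

# Calling a link selects the program; calling the target directly is refused.
result=$("$demo/tblastn")
echo "$result"
"$demo/fake_blasta" 2>/dev/null || echo "direct call refused"
```

Because every link points at one file, upgrading the single executable upgrades all five programs at once.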
Table 10-2. WU-BLAST files and directories

Name                                      Description
blasta                                    The WU-BLAST executable. Unlike the free version, which comes with five different BLAST executables, the licensed version has only one.
blastn, blastp, blastx, tblastn, tblastx  Symbolic links (aliases) to blasta. blasta figures out what kind of program to run based on the name of the symbolic link.
xdformat                                  Executable for formatting both nucleotide and protein databases.
xdget                                     Executable that allows you to retrieve sequences by accession number from a WU-BLAST database.
nrdb, patdb                               Programs used to create nonredundant databases. nrdb keeps only unique sequences and concatenates the descriptions of identical sequences. patdb goes a little further and removes sequences that are perfect substrings of other sequences.
gb2fasta, gt2fasta, pir2fasta, sp2fasta   Programs to convert GenBank, SwissProt, and PIR files to FASTA files. gb2fasta extracts the nucleotides, and gt2fasta extracts the proteins.
filter                                    Directory containing the complexity filtering programs used by WU-BLAST (seg, dust, and xnu).
matrix                                    Directory containing two subdirectories, aa and nt, which contain, respectively, the amino acid and nucleotide scoring matrices. Amino acid matrices such as BLOSUM62 are single files, but the nucleotide matrices exist in two forms, with the extension 4.2 or 4.4, corresponding to 4- and 16-symbol matrices.
setdb, pressdb                            Executables used to format protein and nucleotide databases. The xdformat executable replaces these programs, but they are included for those who prefer the old interface or require compatibility with older executables.
wu-blastall, wu-formatdb                  Perl scripts that mimic the NCBI-BLAST command-line interface while executing the WU-BLAST counterparts.
sysblast                                  Configuration file that allows administrators to enforce system-level resource limitations on BLAST jobs.
Environment Variables
You’ll need to set three environment variables: BLASTDB, BLASTMAT, and BLASTFILTER.
These variables correspond to the locations of the databases, scoring matrices, and
complexity filters. Each holds a colon-delimited list of locations, like the PATH
variable for executables. This is especially useful for database files, which can be
placed in several locations in the filesystem and then accessed by name rather than
by explicit path. It also lets computers read databases from a networked server or
from their local disks without the user seeing any difference. If BLASTDB isn’t set,
blasta looks in the current directory and in /usr/ncbi/blast/db. In these cases,
FASTA databases of the same name (or symbolic links to such databases) must be
present. It’s generally a better idea to use the BLASTDB variable because
this strategy uses less disk space and is much less confusing.
Two environment variables, BLASTMAT and BLASTFILTER, must be set so blasta
can find the scoring matrices and complexity filters. These variables also use colon-
delimited lists, but there’s little reason to have them in more than one location.
Now set the BLASTMAT and BLASTFILTER environment variables to the explicit
paths of the matrix and filter directories (we’ll assume that the software was unpack-
aged in /usr/local/wu-blast). Here’s how to do so in csh and its derivatives:
setenv BLASTMAT /usr/local/wu-blast/matrix
setenv BLASTFILTER /usr/local/wu-blast/filter
And in sh and its derivatives:
BLASTMAT=/usr/local/wu-blast/matrix
BLASTFILTER=/usr/local/wu-blast/filter
export BLASTMAT BLASTFILTER
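To see how a colon-delimited BLASTDB list resolves a database name, here is a minimal sketch in sh. The find_db function and all directory and file names are invented for illustration. It mimics the PATH-style lookup described above and is not blasta's actual search code.

```shell
#!/bin/sh
# Two stand-in database locations, e.g. a local disk and a network share.
root=$(mktemp -d)
mkdir "$root/local_db" "$root/server_db"
touch "$root/server_db/est_human"

BLASTDB="$root/local_db:$root/server_db"
export BLASTDB

# find_db NAME: print the first path under a BLASTDB directory that holds NAME.
find_db() {
    old_ifs=$IFS
    IFS=:                      # split the list on colons, as with PATH
    for dir in $BLASTDB; do
        if [ -e "$dir/$1" ]; then
            IFS=$old_ifs
            echo "$dir/$1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

found=$(find_db est_human)
echo "$found"
find_db no_such_db || echo "no_such_db: not found in BLASTDB"
```

Directories are tried in the order listed, so putting a local disk ahead of a network location makes the local copy win when both exist.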
Setting Resource Limits with /etc/sysblast
WU-BLAST has a special file called /etc/sysblast that sets systemwide resource limita-
tions for each machine running BLAST jobs. The /etc/sysblast file currently supports
three commands: nice, cpus, and cpusmax. The nice value gives BLAST processes a
lower priority (nice values are generally in the range of 1 to 20, with 20 being the
least demanding). If the computer is also used for other jobs, such as a workstation, setting
this to 5 makes the workstation more responsive, but the BLAST job will take over at
idle times. The cpus value is the default number of CPUs to use, and cpusmax defines
the maximum number of CPUs allowed. These two should be set on any large, mul-
tiprocessor machine. Here is a sample /etc/sysblast file:
nice = 5
cpus = 1
cpusmax = 4
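For illustration, the interaction between cpus and cpusmax can be sketched as follows. This is an assumption about how such limits are typically applied (a default when nothing is requested, a cap when too much is requested), not a description of WU-BLAST internals. The file is written to a temporary location rather than /etc/sysblast, and get_setting and effective_cpus are hypothetical helper names.

```shell
#!/bin/sh
# Write the sample settings to a temporary file instead of /etc/sysblast.
conf=$(mktemp)
cat > "$conf" <<'EOF'
nice = 5
cpus = 1
cpusmax = 4
EOF

# get_setting KEY: print the value from the "KEY = value" line in $conf.
get_setting() {
    sed -n "s/^$1[[:space:]]*=[[:space:]]*//p" "$conf"
}

# effective_cpus [REQUESTED]: default to cpus, never exceed cpusmax.
effective_cpus() {
    want=${1:-$(get_setting cpus)}
    max=$(get_setting cpusmax)
    if [ "$want" -gt "$max" ]; then
        want=$max
    fi
    echo "$want"
}

default_cpus=$(effective_cpus)
capped_cpus=$(effective_cpus 16)
echo "default: $default_cpus, requested 16: $capped_cpus"
```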