Chapter 2. Preparing Your Linux Cluster

Building a cluster takes a great deal of preparation and forethought. Depending on the solution involved, clusters can run from a few hundred dollars in spare parts to several hundred thousand for the larger, more intricate clusters. To create the right solution for your cluster, all design aspects need to be thought out properly in advance.

If the budget doesn’t involve throwing money at a vendor to come up with an instant solution, planning rests on your shoulders. Once the go-ahead has been given, it’s time to create an installation checklist.

You need to keep several things in mind when you start your checklist. What equipment best suits your needs? Do you have a preexisting service contract with a particular vendor? Does this vendor offer systems prebuilt with your needed specs, or are you going to have to add extra parts? What sort of network will you build? What sort of network cards will work best with your environment? The entire topology must be mapped out before you begin.

You also have to ask yourself what kind of cluster will best suit your needs. If your company’s goal is to build a web server, does that solution include high availability (HA) and load balancing? Do you really need SMP machines for your parallel application, or will single-processor nodes work? This chapter examines different ways to design your environment to bring your various cluster configurations up to speed quickly and easily.

Planning the Topology

Designing your cluster doesn’t stop at the hardware level. If you don’t take into account and plan for every little thing, you’ll find yourself quickly running over budget. You have to plan for network cabling, power and cooling requirements, and replacement costs should anything fail. The physical placement of the computer is important, so you need to decide if the corner table is appropriate, if a dedicated rack should be designed for the environment, or even if a new datacenter would work best.

One of the things most overlooked in the budget is the maintenance. Clusters take time and overhead to run, not to mention a qualified administrator. If something goes wrong, it’s nice to have someone who knows what the problem is right away or have a maintenance contract to bring in someone to solve the problem for you.

One Format Yields One Solution

Although it isn’t absolutely necessary, standardizing your equipment helps reduce overhead and maintenance costs. Having disparate systems means keeping track of all the separate items, which increases administrative overhead. Most system administrators already have heterogeneous networks that they have to deal with on a daily basis. Unless you’re fortunate enough to administer one operating system and one computer vendor, planning for one solution alleviates headaches down the road.

One of the benefits of staying with GNU applications for clustering is the availability of support. Although the documentation for software can be sparse, help is usually an email or newsgroup away, with responses often coming from the author of the application itself. When selecting software for your cluster, look and see what support options are available. A great deal of software written for Linux under the GNU General Public License (GPL) is written by people in their spare time, so answers to questions are usually found in the same manner.

Keeping the same set of hardware in your cluster allows for decreased administration time. Having all nodes built exactly the same decreases the amount of guessing during maintenance checks or replacement time when devices fail. Having the same hardware solution across all the nodes lets you maintain a smaller pool of replacement parts rather than having to stock replacement parts for multiple systems. Having the same ethernet card on all nodes allows you to keep one set of drivers on hand, maintain one set of patches, and look in one place for updates. Maintaining a heterogeneous environment just adds to the complexity of the solution when you have to look in several places to find the proper drivers and patches.

Spending a little extra on maintenance can be a double-edged sword. On one hand, support contracts can usually be purchased where hardware support is never more than four hours away. In case of a downed node, parts can be replaced quickly by a trained technician to bring the system up cleanly and easily. On the other hand, the contract sometimes limits an organization by making that trained technician the only one allowed to fix the server. Nowadays, some companies are replacing their Linux servers and workstations every two years or so rather than paying high maintenance costs.

The whole idea behind HA clustering is to allow for failover so that downtime becomes a non-issue. Failover happens when a secondary system detects that the primary has stopped providing services. The secondary system in an HA configuration then moves itself into the foreground, allowing the primary to be worked on and brought back to service. Knowing that there’s a secondary server that can be switched over allows an organization a certain amount of breathing room, so a maintenance contract might not even be needed. Likewise, a downed node in a parallel cluster will not adversely affect the system; processes can easily be migrated to or started on other nodes.

Get the Best Components for the Price

Although this sounds like a no-brainer, getting the best components for the price is extremely important for designing a cluster. A penny saved here and there on parts that aren’t top of the line will come back to haunt you when you least expect it.

The greatest bottleneck in any cluster configuration is the network. Faster processors and bigger hard drives don’t mean anything if the network can’t keep up with them. You can find cheap fast ethernet cards that are designed to work under Linux, although not all of them are compatible with every distribution. Gigabit ethernet cards are becoming reasonable, and Myrinet seems to be the cluster hardware of choice now. If you can afford it, spend a little more and check out the hardware-compatibility list that’s associated with your distribution. Even though a card says that it works with Linux, find out if it works with your distribution before you purchase it. Some cards are designed to work only with RPM-based distributions right off the bat. Although you can tweak the drivers for your own distribution, finding out ahead of time saves lots of potential hassle down the road. You’ll be thankful you checked.

Replacement Considerations

One thing to keep in mind is that the cost of the hardware is negligible compared to the data kept within. Hardware is easily replaced and generally cheap compared to the time it took to create that data, so it’s vital that you keep good backups. With a parallel or distributed cluster, the relevant data resides mainly on the master nodes. This makes it easy to replace nodes that have failed. With today’s cheap commodity computers, it might be easier to replace an entire node in the cluster than to take the time needed to work on the unit. This makes particular sense with diskless clusters, as the only parts involved typically are the motherboard, CPU, RAM, and network interface.

Avoid the Single Point of Failure

When designing HA or load balanced networks, keep in mind where the single points of failure (SPOF) typically lie. Often, the items with the most moving parts fail the most, but these incidents can be reduced. Hard drives can be hooked together in a RAID configuration to lessen the chance of a single disk bringing down the entire machine. Dual network cards attached to the same network offer a good level of redundancy should a card fail. A better solution is to attach each network interface card (NIC) to a different switch, then to a different network, and then ultimately to a different ISP. Remember that having only one method of contacting the administrator on duty is itself an SPOF; cellular service is spotty at times, so the admin needs to carry a pager as well.

Plan for Administrative Overhead

One of the most overlooked aspects of any budget is administration. Sure, hardware is included, perhaps maintenance, but what about the Linux administrator to handle the cluster itself? Yes, you might be the person to set up the system, but are you available to maintain and monitor it constantly? Do you have the proper training to set up this network? If not, the budget might be amended to include training or time set aside to study the applications included within the cluster. You also might budget for overtime pay, consultants, training new employees, and employee turnover.

Select the Right Distribution

When experienced users talk about Linux, they’re talking about the kernel itself. The rest of the operating system is optional, depending on the distribution. This leaves each individual or company free to add whatever software they see fit. If parts of a distribution don’t fit your needs, feel free to add your own software, take away the things that don’t fit, or modify the kernel so that it fits your needs. This is also why, as of this writing, Linux isn’t at version 8.0 and won’t be for some time: version numbers like that belong to distributions, not the kernel, a common point of confusion among people who are unfamiliar with Linux.

Keeping this in mind, you have to select the right distribution that best suits the needs of the cluster and those who administer it. Are the administrators familiar with Linux in the first place? How about the users? Will they be able to handle any distribution that you throw at them, or will they have to have an installation that guides them through the process?

There are more Linux distributions now than you can shake a stick at, and more appear daily. There are distributions for almost every purpose imaginable, from embedded devices and security-focused distributions to distributions that start from within Windows and even ones that run on game consoles. Some distributions cater to new Linux users more than others. Most distributions let you configure most of the operating system in the installation dialogs; however, certain distributions have more advanced installers than others. For example, Mandrake Linux has a full-featured graphical install that adjusts to the correct video resolution, whereas Debian leaves more of the configuration, such as the XF86Config file, to the more experienced user.

Differences in Distributions

The first thing to keep in mind is the layout of the system files. Remember, there’s nothing that says any Linux distribution has to have things in common with any other, which leads to fragmentation and a lack of standards within the Linux community. Distributions such as Red Hat tend to place their system files in non-traditional places. Of course, it works for them, but these discrepancies must be kept in mind when moving from one distribution to another.

Different distributions also support their own implementations of init scripts. Red Hat and similar distributions tend to be more System V compliant, while distributions like Slackware have no distinct directories for their run levels. Which you choose is a personal preference, but keep in mind that what works on one system might not work on all Linux implementations. For example, Slackware starts NFS in the rc.inet2 file, which initializes at boot time. Red Hat and like distributions start those services individually in /etc/rc.d/init.d.
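
For instance, restarting NFS on a Red Hat-style system uses the service’s own init script. A minimal sketch, assuming a stock install:

/etc/rc.d/init.d/nfs restart 

On Slackware, there’s no per-service script to call; you edit /etc/rc.d/rc.inet2 and restart the affected daemons by hand.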

Distributions to Avoid

When building a cluster, avoid distributions that don’t run natively on ext2 or a journaling file system such as ext3. Some distributions can run on top of DOS, FAT, or even NTFS file systems. These Linux installs mimic the native ext2 file structure, although the underlying file system is still DOS or FAT. This creates an abstraction layer between the Linux file system and the natively formatted file system that costs performance. For optimal performance, stick with a system that stays away from these abstraction layers. Common sense also warns against using any Linux distribution with “Phat” in the name for production use.

UMSDOS and loopback file systems work in much the same manner: Essentially, Linux creates a file system inside an ordinary file on the local hard drive. Although these distributions are fine for a person starting out with Linux, they’re not recommended for production use (though they could be used in a test environment or for an encrypted loopback file). If your company is short on lab equipment, a good solution is VMware (www.vmware.com), which allows for multiple virtual Linux installs on a single box.

Planning the Environment

After you decide on the distribution and type of cluster you’re going to implement, the next step is to plan the surrounding environment to house the cluster. If the cluster you’re implementing contains a large number of components, you need to think about additional power, cooling, and UPS requirements to maintain them. In addition, special types of clusters need enhanced security requirements. If you’re just putting up a small load balanced solution, chances are you’re not going to need extra power. If you already have a datacenter or Network Operations Center (NOC), chances are that you’ve already got the infrastructure in place to support a few extra computers. Of course, large parallel clusters of a few thousand nodes are going to need special design and consideration.

Consider Power and Cooling Requirements

Planning the power requirements of a large cluster requires forethought and careful layout. After the selection of the servers, monitors, UPS systems, and network equipment, an inventory is required to plan for the power infrastructure. The trick here is to plan for everything that is going to need power, not just the servers. Your existing NOC might have these components accounted for, but if you’re a home user or building an NOC from the ground up, you’re going to have to provide for the power somehow.

When planning for the infrastructure to house your cluster, remember that there is no magic solution. Many factors play a part in the design, such as the amount of space you have to work with, the outside average temperature, and the ambient room temperature. If the floor on which the cluster sits is raised, the cooling requirements will obviously be different.

Electricity Requirements

A small cluster setup can generally get away with being placed on a 20-amp circuit. If the goal is high availability, you would ideally place each server on its own circuit attached to its own UPS. A home-built cluster can fit nicely plugged into normal outlets rated for their region. Excessive cooling isn’t required, although remember that any network equipment tends to be more sensitive to temperature changes than your servers. Remember to exercise caution; you’ll notice any overdraw in power if the lights or monitors start dimming and your equipment starts smoking.

On the other hand, planning for a large cluster of powerful servers requires space in a datacenter or reserved room. If you haven’t decided to colocate your equipment, you need to either add on to your existing NOC or build one from scratch. The first step in determining the power requirements is to catalog all the devices and add up their power requirements. When calculating for your requirements, remember to calculate with the total power draw possible, rather than the average. Your server should have the total output labeled on the power supply itself or supplied in the documentation. If you’re not sure, call the manufacturer.

During the inventory, don’t forget to include anything that might have a power draw or that could generate heat. This might include special cards inside the servers, the monitors, powered racks, the lights, and even the power strips themselves. Keep in mind extra cards and peripherals that you’ve added to the systems that cause an extra power draw, such as tape drives. If the cluster in question is a large parallel cluster, this is made easier by simply adding up the power in one rack and multiplying by the number of racks.

A cluster needs adequate power and properly rated circuit breakers. All outlets need three-wire grounded plugs, for it’s only a matter of time before you start putting in power strips and extending the power draw on the existing infrastructure. For this reason, you need to estimate for future growth and the cost of replacement after the lifetime of your equipment. What seems adequate today will probably be outgrown sooner rather than later.

Provisioning for Cooling

After you generate all kinds of heat with the electricity you’re consuming, you must decide how to get rid of it. Thankfully, you’ve considered the requirements and come up with proper air conditioning to remove the heated air. Before preparing your datacenter architecture, here are some handy terms that you’ll want to arm yourself with:

  • HVAC—Heating, Ventilation, and Air Conditioning.

  • BTU—British Thermal Unit. A standard measure of how much heat it takes to raise or lower the temperature of one pound of water by one degree Fahrenheit. Heat generated in BTUs equals the total power in watts multiplied by 3.41.

  • Ton—12,000 BTUs per hour, or 3,510 watts. (Ton refers to an old ice-cooling term, back when ice was used for air conditioning.)

A one-ton air conditioning unit can remove 288,000 BTUs of heat in 24 hours, while a two-ton air conditioner can remove twice this amount. To keep this in perspective, a typical 1,350 square foot house requires a four-ton air conditioner. When planning for an HVAC system, make sure that the air circulates several times an hour. The more times you’re able to circulate the air through the room, the faster you’ll recover from a power outage. The number of times you’ll be able to clear the air from a room depends on the layout of the items in the room, the external environment, and the extraneous space.

A rough guideline to plan for the air conditioning required is to take the total power in watts and multiply that by 3.41. A datacenter that puts out 20,000 watts would then put out 68,200 BTUs per hour that have to be dissipated.
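
As a sanity check, the arithmetic is easy to script. A quick sketch in plain shell (the wattage is an example; substitute your inventory’s total):

WATTS=20000 
BTU=`expr $WATTS \* 341 / 100`            # watts x 3.41 
TONS=`expr \( $BTU + 11999 \) / 12000`    # round up to whole tons 
echo "$BTU BTUs/hr, or about $TONS tons of cooling" 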

Cooling the cluster from a raised floor can be achieved more effectively than by using ambient room air conditioners. The holes in the floor can move air directly into the racks, enabling the air to circulate within the racks themselves. Cooling the servers in this way is more effective than trying to cool them by lowering the room temperature. A raised floor in a datacenter also offers the ability to run cables and piping underneath the server racks. Not only does this allow for greater safety, it keeps unsightly cables and piping out of the way. Keep in mind that normal floors can’t handle the stress that a large cluster places on them.

Space: The Final Frontier

When planning for the layout of your equipment, keep in mind that you’ll actually have to reach the back of your servers once in a while. Putting your servers against the back wall ensures that you’ll need to get at the backs of those servers more often than not. There’s nothing like twisting your body into positions that aren’t humanly possible. Then again, if you’re into yoga, misplacement of servers might be a blessing. For the rest of us, sane placement of the equipment is a must.

Adequate space is needed for air to circulate through the racks. A basic rule of thumb: Keep the servers at least 1 1/2 inches from the sides of the cabinets. This allows for proper airflow and cooling of the equipment.

Racks for Servers and Desktops

The design of the server storage will most likely depend on the choice of servers themselves. If you’re able to build your parallel cluster from scratch, consider spending a little extra money on 1U servers. The space saved with a racked system such as this will result in lower cooling costs and higher cooling efficiency. The rack will also provide housing for keyboard, video, mouse (KVM) switches, network gear, and the UPS. Racks can easily be found in 72” and 80” configurations, housing over 50 servers in each rack.

If you’re not considering server racks for your cluster, your local home improvement store should sell tool racks that function fine for the placement of desktop or tower servers. If you use this type of storage solution, be sure that the rack lends itself well to cooling and has plenty of airflow around the systems.

You can find specialized computer companies that fit multiple servers in a small amount of space. For example, eAppliance Corporation sells a 1U chassis with four hot-swappable servers, each with three fast ethernet jacks for redundancy. You can find them at www.eappliancecorp.com/. RLX Technologies has a device with 24 servers on blades, which fits nicely in a standard 19”, 3U cabinet (www.rlxtechnologies.com).

Switches and Connectivity

With the speeds of today’s computers, the biggest bottleneck is the network. You can alleviate some of your bandwidth problems with a simple fast ethernet solution. Fast ethernet today is acceptable for most implementations, but for the cluster that needs the most out of its performance, gigabit ethernet is becoming the standard.

When choosing connectivity for your professional cluster solution, try not to skimp on any of the network components. The cluster is essentially a communication medium between nodes. When you drop the medium, you don’t have a cluster.

Connectivity with a straight hub allows for traffic between the nodes, although the hub repeats that traffic to every port. This not only slows down the network by exposing each packet to every interface, but is also inherently insecure: Anyone with knowledge of network snooping gear can sniff out passwords or other sensitive data. A switch is sufficient for most cluster implementations. It allows for direct data transfer between hosts, and the chance of snooping is greatly minimized. Managed switches, in turn, allow for greater granularity at the port level, and some allow for the implementation of virtual networks.

A second interface is a good idea when implementing heartbeat connectivity between two machines. These can be of any standard; however, consider using a crossover cable rather than hooking the interfaces up to the network. The network itself would just add another point of failure should something go wrong.

Regular 10bt ethernet cards will be more than fine for any home network solution that’s dependent on an ISP. The speed of the cable or DSL link will not be able to saturate the ethernet card, and response times are not as critical from internal servers. Because people are practically giving their 10bt cards away in favor of fast ethernet, any real home solution using Linux can easily be implemented on the fly. If you’re contemplating a parallel cluster, you can use the leftover 10bt cards, but you’ll notice a large performance hit.

Donald Becker, CTO and founder of Scyld Computing Corporation, wrote a great deal of the ethernet drivers for Linux. You can find drivers for most supported cards at www.scyld.com/network/.

A Few Words About Security

We would be doing a great disservice to the computing community as a whole if we didn’t share a few words about security. Almost every day a new exploit is found, a new virus hits the net, or a denial of service attack is launched. The majority of these attacks cost companies thousands of dollars in lost manpower recovering from downed or defaced servers.

Where do these attacks come from? A great majority of the attacks come from within the company itself. A disgruntled employee in a low-security environment can destroy years of work in a matter of minutes. Crackers (and thieves) are opportunists. By minimizing the opportunities that exist, you minimize the chance that your cluster will be compromised.

Many books have been written on the subject of security, and it’s a full-time job for many administrators. A full examination of security is beyond the scope of this book, but we can go over some important highlights.

Security in Layers

Security at the system level is only one part of the equation. Just like the OSI networking model, the security model encompasses the entire organization. To fully secure your environment, you need to restrict access at the network layer, the OS level, the physical machine, and the datacenter itself. After a hacker gets physical access to the server, that’s it. Game over.

There is always a trade-off between usability and security. There is no way to make your system totally secure; however, you can take steps to reduce the chance of a compromise. Take a proactive approach to security. Subscribe to security newsletters and newsgroups. Have a well-defined security policy at your site. Hacking at the OS level is only one way to get in; as others might tell you, social engineering (basically, the art of getting people to tell you password or security information) is easy to pull off if you’re knowledgeable enough. Take the time to educate your users; tell them who has access to what areas and which passwords. Encourage your users to use passwords that are hard to guess. Basically, the more parts of the keyboard that you can use, the harder the password will be to guess or crack. You might suggest acronyms for phrases, rather than words or dates.

It’s possible to have too many security measures in place. You want to make sure to have the most restrictive security policy in place without restricting access to your needed services, and that’s a fine line. If everyone in your environment is using ftp to access services, implementing secure ftp as a replacement can be an uphill battle. Be ready to strike some compromise. Try to get upper management involved in the decisions and get them to support whatever security policies you have in place. Otherwise, the buck will stop with you.

Security at the System Level

Use /etc/hosts.allow and /etc/hosts.deny liberally to secure who gains access to which services.
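
A minimal sketch of a default-deny policy (the subnet is an example; substitute your cluster’s own network):

# /etc/hosts.deny: refuse everything not explicitly allowed 
ALL: ALL 

# /etc/hosts.allow: permit ssh from the cluster subnet only 
sshd: 192.168.1. 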

Check your password and group files at regular intervals. A sure sign of a compromise is finding extra users on your system that nobody recognizes. Utilize shadow passwords and change passwords on a regular basis. Use only secure transmission of passwords; telnet should rarely, if ever, be used. SSH is a fine replacement and is available in both open source and commercial versions. You can find the commercial version of SSH at ftp.ssh.com, and the open source version at www.openssh.com.
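
One quick check worth scripting: Any account besides root with a UID of 0 is a classic sign of a backdoor. A sketch:

awk -F: '$3 == 0 {print $1}' /etc/passwd 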

Close down unnecessary ports in /etc/inetd.conf or edit the file so that it passes through TCP Wrappers. Keep a watch for programs that are set SUID.
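
SUID programs are easy to hunt down with find; this sketch lists every SUID file on the system so that you can compare the output against a known-good baseline:

find / -perm -4000 -type f -print 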

Install traps to detect intrusion, such as Tripwire (www.tripwire.com). Be sure to monitor the reports.

Patch your system. Although patching can be a full-time job in the enterprise, try to keep an eye out at least once a week to download and install fixes for services you’re running.

It’s also a good idea to have an image of your more important servers on hand in case a server is compromised. Newer technologies make bare metal recovery much easier; at the least, a decent system imager and a restore option can bring your system up in hours rather than days.

Security at the Network Level

Examine the services listed in /etc/services and only allow access through the firewall to the ports you absolutely need. Remember that this method doesn’t work if you’re on the same subnet. Consider blocking insecure services, such as telnet and ftp from the firewall as well.
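
A minimal sketch of a default-deny packet filter using iptables (assuming a 2.4-series kernel; the ports shown are examples to adjust):

# Drop everything inbound by default 
iptables -P INPUT DROP 
# Allow replies to connections we initiated 
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT 
# Allow only the services you absolutely need (here, ssh and http) 
iptables -A INPUT -p tcp --dport 22 -j ACCEPT 
iptables -A INPUT -p tcp --dport 80 -j ACCEPT 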

Monitoring tools, such as Big Brother (http://bb4.com) and NetSaint (www.netsaint.org), can be set up to notify a responsible person on certain events.

Physical Level Security

Keep your cluster behind locked doors, inaccessible to the general public. A knowledgeable cracker can easily gain root access at the boot prompt if the system isn’t properly secured, or with a CD-ROM in hand. If that weren’t enough, a simple disconnected power cord would wreak havoc on the live data and potentially cause a great deal of lost revenue.

For increased security at the local level, consider implementing passwords at both the BIOS and LILO levels, which helps prevent local access. You also might consider removing the CD-ROM and floppy drives, which helps prevent someone from booting the system through alternate methods.
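
With LILO, two lines in /etc/lilo.conf do the trick; a sketch (the password is a placeholder, and you must re-run /sbin/lilo after editing):

password=SomeSecret 
restricted 

The restricted keyword prompts for the password only when someone passes boot-time options (such as init=/bin/sh), which is the classic route to an unauthorized root shell. Because the password sits in the file in plain text, keep /etc/lilo.conf readable by root only.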

Many datacenters have some sort of keycard access to restrict access to the servers in addition to regular surveillance methods. More advanced colocation centers have a scale that weighs everyone so that security personnel can be assured that customers don’t leave with more than they take in.

Don’t Let Your Security Turn Against You

Keep the firewall access down to a minimum; only a select few should be able to modify the rules. A misconfigured firewall can be set to deny needed ports, or worse, all access from the internal network.

Keep tabs on your clusters. A malevolent user can easily hijack a process on a parallel cluster to run a crack program, enabling your high performance solution to compromise security or worse yet, another’s security on your equipment. Just imagine what could happen if your distributed cluster were somehow pointed against a popular site in a denial of service attack.

TCP Wrappers

TCP Wrappers was written by Wietse Venema to log incoming connections. Like the name says, TCP Wrappers provides tiny daemon wrappers to log incoming hostnames and the services that they request. TCP Wrappers can be downloaded at ftp://ftp.porcupine.org/pub/security/ and comes preinstalled on many distributions.

TCP Wrappers can monitor and secure incoming connections to popular services, such as telnet, ssh, rsh, and so on. In essence, it becomes a front end, or “wrapper,” for these services: It logs the connection information and allows or denies access based on the rules you define.

To install, first grab the latest release and uncompress to your local machine. The installation isn’t that hard, although some edits are necessary. After uncompressing the image, chmod the Makefile to 644.

Edit Makefile for the correct choice of REAL_DAEMON_DIR. Set FACILITY to log to LOG_AUTH.
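
The relevant Makefile lines end up looking something like this (the daemon directory varies by distribution; /usr/sbin is typical for Linux):

REAL_DAEMON_DIR=/usr/sbin 
FACILITY= LOG_AUTH 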

Edit the percent_m.c file to comment out the sys_errlist[] declaration:

/*  extern char *sys_errlist[]; */ 

Compile.

make linux 

TCP Wrappers does not come with make install functionality. Here’s a simple script that does just that.

#!/bin/sh 
# Hand-install the TCP Wrappers binaries, header, and library. 
for file in safe_finger tcpd tcpdchk tcpdmatch try-from 
do 
    /usr/bin/install -m 0555 -o root -g daemon $file /usr/sbin 
done 
/usr/bin/install -m 0444 -o root -g daemon tcpd.h /usr/include 
/usr/bin/install -m 0555 -o root -g daemon libwrap.a /usr/lib 

Set up syslogd to log by AUTH. Add the following line in /etc/syslog.conf. Don’t forget to use a tab to separate fields.

auth.info     /var/log/authlog 

Create the logging file:

touch /var/log/authlog 
chmod 600 /var/log/authlog 
chown root /var/log/authlog 
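
Next, have syslogd reread its configuration. Assuming your syslogd stores its PID in /var/run/syslogd.pid (the usual location), a sketch:

kill -HUP `cat /var/run/syslogd.pid` 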

Then test with the logger program:

logger -p auth.info Test 

If everything goes right, you’ll see the “Test” message that you just sent in /var/log/authlog.

Edit /etc/inetd.conf to run services through TCP Wrappers. The standard edit swaps the daemon’s path for the path to tcpd, leaving the daemon name as the argument. For example, change

ftp  stream  tcp  nowait  root  /usr/sbin/in.ftpd  in.ftpd 

to read:

ftp  stream  tcp  nowait  root  /usr/sbin/tcpd  in.ftpd 

Send a HUP signal to inetd (not syslogd this time) so that it rereads /etc/inetd.conf. Congratulations, you now have a working TCP Wrappers installation.
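
Assuming your distribution includes the killall utility, the reload is a one-liner:

killall -HUP inetd 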

If you’re curious to know how Wietse Venema pronounces his name, download the .wav file at www.porcupine.org/wietse/wietse.wav.

Secure Shell

Secure Shell (SSH) was developed as an alternative to insecure login methods. It is a replacement for rsh, rcp, and rlogin, as well as telnet and ftp. SSH also allows for secure X connections, as it never sends a clear text password. SSH, when used effectively, can easily replace these programs without end users being any the wiser. SSH provides the same interactive functionality as telnet; sftp includes more features, such as a percent-transferred monitor; and scp securely copies files across the wire without sending clear text. Two different implementations of the SSH protocol exist: One is a commercial version that you can buy with support; the other is a non-commercial, open source version. OpenSSH is becoming the standard in the Linux/BSD community.

SSH uses different ciphers for encryption, such as 3DES, IDEA, Blowfish, Twofish, ArcFour, and Cast128-cbc. SSH also uses DSA and RSA for authentication. You can authenticate using a public key, password, Kerberos, or .rhosts login. SSH protects against IP and DNS spoofing, attacks based on X authentication data, as well as attacks from snooping.

SSH comes in two protocol versions, appropriately named 1 and 2. Protocol 1 has largely been superseded by protocol 2, which is now by far the more common, although protocol 1 remains in wide use. Version 3 of the commercial SSH product is available, but that number refers to the product, not the protocol; it still installs only protocol versions 1 and 2 by default.

Be sure that, when using SSH, your passwords are more than two characters. A bug in the commercial release, 3.0.0, enabled users to bypass authentication if the password contained two characters or less. In general, it’s good practice to have as many varied characters in your passwords as possible.

The non-commercial versions can be downloaded anonymously from ftp.openssh.com, in the /pub/ssh directory. Commercial versions allow for support, whereas non-commercial versions are restricted to submitting bug reports.

Installing SSH in your environment is easy. If at all possible, install SSH using TCP Wrappers so that all incoming connections are logged. In the simplest installation, you can install SSH by using configure, make, make install. Due to the nature of this program, be sure that you get it from a reliable source; as with any program you download, it has the potential to have been compromised with security loopholes and backdoors. In some countries (notably Russia, Iraq, Pakistan, and France), it might be illegal to use encryption without a special permit.

By default, SSH 2 isn’t compatible with SSH 1. You can make them compatible by doing a few manual edits. It’s a good idea to read the README file before starting; there might be other options that you want to have in your environment. First of all, install SSH 1 with TCP Wrappers and disable the SUID bit:

tar xzvf ssh-<version number> 
cd ssh-<version number> 
./configure --with-libwrap="<path to libwrap>" --disable-suid-ssh 
make 
su 
make install 

SSH 2 can be installed using the same method. After installing SSH 1, alter SSH 2 configuration so that it can call SSH 1 when needed.

Insert/edit the lines in /etc/ssh2/ssh2_config so that the following are in the file in this form:

Ssh1Compatibility               yes 
Ssh1Path                        /usr/local/bin/ssh1 

Insert/edit the lines in /etc/ssh2/sshd2_config so that the following are in the file in this form:

Ssh1Compatibility               yes 
Ssh1Path                        /usr/local/sbin/sshd1 

Starting with 3.0.0, the SSH distribution comes with startup scripts in ssh-<version>/startup/linux/redhat. This sshd2 script can easily be modified to support any version of Linux you happen to be running. With Red Hat, this is designed to go in the /etc/rc.d/init.d/ directory, with appropriate links in rc3.d and rc0.d. Other distributions can easily start this daemon up at boot with an entry in rc.local.

SSH can usually be invoked with the following syntax:

/usr/local/bin/ssh user@remote_host 

Sftp is handled in much the same way:

/usr/local/bin/sftp user@remote_host 

When copying a file with Secure Copy, the syntax is as follows:

/usr/local/bin/scp user@local_host:/path_to_file/file 
user@remote_host:/path_to_file/file 

For more information, check out the SSH guide for UNIX administrators at www.ssh.com/support/ssh/ssh2-adminguide.pdf.

SSH Tips and Tricks

SSH is a great tool for remote connections, although documentation is hard to come by at times. Here are some little-known tips and tricks that every user should have at his or her disposal.

SSH supports the tilde character (~) for escape sequences. Pressing ~? presents you with a list:

Supported escape sequences: 
~.  - terminate connection 
~^Z - suspend ssh 
~#  - list forwarded connections 
~&  - background ssh (when waiting for connections to terminate) 
~?  - this message 
~~  - send the escape character by typing it twice 
(Note that escapes are only recognized immediately after newline.) 

You can restrict your users from logging in remotely as root by changing the /etc/ssh2/sshd2_config file. Change the following line:

PermitRootLogin   yes 

to

PermitRootLogin   no 

Using X11 forwarding, you can export the display of your favorite X application, even behind a firewall. Forwarding only works if you didn’t expressly disable it during configuration. To enable X11 forwarding, edit or add the following in your /etc/ssh2/sshd2 configuration file:

ForwardX11 yes 

At the remote server, simply type the name of the X program you’d like to forward. Do not set the DISPLAY variable to the local machine; SSH takes care of that for you.

Distributing Patches, Updates, and Software Securely

SSH can distribute software to remote machines through scripts, without user intervention. One way to distribute code is to set up a script with a dummy account using SSH, without having to type passwords in every time. Here’s a method to enable SSH 1 to accept logins without a password. Log on to the client and generate a new key pair:

ssh-keygen1 -b 1024 -C ssh-key 

When the script asks for a passphrase, hit Enter twice for no passphrase. Next, copy the public key over to the remote machine to which you want to authenticate:

scp localhost:~/.ssh/identity.pub user@remote_host: 

Log on to the remote machine with SSH:

mkdir .ssh 
chmod 700 .ssh 
cp identity.pub .ssh/authorized_keys 

Exit the shell. When you log back in with SSH 1, you are not prompted for a password. This enables you to use scp within scripts to securely copy files without intervention.
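
With the key in place, pushing a patch to every node becomes a simple loop. A sketch, assuming a hypothetical file called nodes that lists one hostname per line and an equally hypothetical update script:

#!/bin/sh 
# Push update.sh to each node and run it there; no password prompts needed. 
for host in `cat nodes` 
do 
    scp update.sh user@$host:/tmp/ 
    ssh user@$host /bin/sh /tmp/update.sh 
done 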

Developing a Backup Policy

It’s been said that your data are only as good as your last backup. With that in mind, it’s absolutely essential that you keep current backups of your data. Often, the data on the computer is worth more than the computer. Although clusters are expensive, replacing them is a much easier task than replacing all the data on them. It might be okay to lose the data on your home workstation, but when your HA cluster goes down at the multinational bank, it’s time to break out the resume.

Only the master server from a cluster should be backed up. There’s no sense in backing up redundant data from slave nodes. HA clusters should back up the master node at least once a day (more if the data is sensitive). On parallel and distributed clusters, all the data is stored on the master node, with jobs being farmed out to slave nodes. Backups of these slave nodes would just result in copies of the operating system.

Develop a Comprehensive Backup Plan

The first part of any backup plan has to include the retention policy. All further choices should revolve around how long you’re going to keep your tapes before they’re reused.

Depending on the data being backed up, you’re going to have to keep some tapes around for quite some time. Financial institutions and records departments need data around for a minimum of seven years. If this is included in your backup strategy, be sure to keep it in mind when choosing your backup media (tapes don’t come cheap).

Who is going to pay for the tapes? If your environment has different cost centers, a good strategy is to develop a service-level agreement (SLA) between the backup operator and the department whose data is being backed up. Depending on the urgency of the data and the cost center, a department might not need to have its server included in a backup scheme.

When reusing tapes, decide on a lifecycle appropriate to your budget and your environment. Although some tapes are rated at millions of reuses, do you really trust your company’s data to a worn tape?

Decide how often you’re going to do full backups. A fine line exists between backup and restore functionality. On one hand, a full backup once a day might be a bit excessive if the data is relatively static; however, if you can afford the media and time, this is among the best strategies out there. More common is the full backup once a week. In that scenario, if you lose data mid-week you’ve either lost that amount of data or have to restore not only the full backup, but also the incremental data since the last full backup. This can take some time, so decide which is more important: budget and backups or data and restores.
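
A sketch of a typical weekly rotation under cron, using dump (the file system and tape device are examples):

# /etc/crontab excerpt: full dump Sunday, incrementals the rest of the week 
0 2 * * 0 root /sbin/dump -0u -f /dev/st0 /home 
0 2 * * 1-6 root /sbin/dump -1u -f /dev/st0 /home 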

Make sure that the network backbone is capable of supporting backups if you’re doing it over the network. A nightly backup can totally saturate a regular ethernet network and render communication (even backups) useless due to the amount of time involved. Consider a separate backup network if your regular network can’t handle the traffic or your backup data needs to be secure.

Backups aren’t inherently secure. Anyone with access to a tape drive can pull off sensitive data. Several popular programs, including Veritas Netbackup and Legato Networker, are installed with root permissions. Therefore, a malicious user can pull whatever he wants from disk and dump it wherever he wants. A backup tape can also be read back into almost any file system. Secure your media as closely as you would your most sensitive data.

Select the Best Backup Strategy Available

Budget is often the deciding factor when selecting the proper backup program for any particular environment. You can find several open source programs that can back up data well, although the closed source programs support tape robots and autochangers.

Linux support from the closed source vendors has typically been spotty in the past, with the operating system being more of an afterthought than a supported OS.

dump

dump is the good ol’ reliable standard for UNIX backups and provides the back end for a good many programs.

dump operates on a single file system (partition) at a time, scanning its inodes to decide whether each file needs to be backed up. dump stores its information about backups in text format in /etc/dumpdates. This file keeps track of dates so that incremental backups can be performed.

dump can span more than one tape, although it takes user intervention to do so. Adding the -F flag tells dump to run a script at the end of each tape. This can flag a backup operator and let him know that a tape must be changed. The -n flag, when used, also sends a “wall” type request to all users in the operator group when dump needs attention.

The format of dump (from the man page) is

/sbin/dump [dumplevel] [-B records] [-b blocksize] [-d density] [-e inode 
number] [-f file] [-h level] [-L label] [-s feet] [-T date] file_to_dump 

In other words, it’s something similar to

/sbin/dump -0u -f /dev/st0 /usr/src 

This invokes dump with a full backup, records the run in /etc/dumpdates, and writes the data from /usr/src to the first SCSI tape device. If you run dump without the -u flag, /etc/dumpdates is never updated; later dumps then have no record that a previous dump ever took place, so incrementals end up behaving like full dumps.

You also can use the -f flag to send the dump to a remote device. Using /sbin/dump -0u -f remote:/dev/st1 /dev/hdc1 sends the contents to the machine called “remote.”

restore

Now that you’ve got all your data backed up, the way to restore it is by simply using the restore command. restore works either by restoring the entire file system that you initially backed up using dump or through an interactive file mode.

To restore a file system interactively, use the -i switch. This allows you to browse the dump with ls, cd, and pwd. To mark the directories and files that you want to restore, use the add command. After you’re done selecting the files to restore, the extract command restores the files to the directory (recursively, if needed) in which you started the restore. Entering help gets you a list of available commands.

When finished, an interactive restore gives you an option to set owner/mode for '.'? [y/n]. Answering y changes all permissions to root or whoever did the restore. It’s almost always a good idea to answer n to keep the permissions intact.
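
A typical interactive session might look something like this (the directory name is an example):

restore -if /dev/st0 
restore > ls 
restore > add home 
restore > extract 
restore > quit 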

Restoring an entire file system takes a different approach but isn’t much harder. Hopefully, you’ll never have to do this, but sometimes the situation presents itself during the most inopportune moments. The file system in question must first be prepared with mkfs. After that, you have to restore from the last full dump, then each incremental on top of that.

First, prepare the file system with mkfs:

mkfs /dev/hdc1 
mount /dev/hdc1 /mnt 
cd /mnt 
restore rf /dev/st0 

This initial restore restores from the first file dumped onto tape. If you’re restoring more than one file from the same tape, be sure to use the no-rewind device (such as /dev/nst0) and use the mt command to move backward and forward around the tape.
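
For example, assuming the non-rewinding SCSI tape device /dev/nst0, skipping forward past the first dump file before restoring the second looks like this:

mt -f /dev/nst0 fsf 1 
restore rf /dev/nst0 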

g4u

Ghost for UNIX (g4u) is a simple program that takes a snapshot of an operating system, passes it through gzip, and sends the image over the network to a preconfigured ftp account. g4u works on Intel-based hardware only, yet it will make an image of virtually any OS, including Linux, BSD, and Microsoft Windows.

Although not a backup program per se, its use is most invaluable for providing quick and inexpensive images. g4u is basically a great little hack. It entails a preconfigured NetBSD distribution on a floppy that supports most ethernet cards. It will back up an entire drive, including boot sector information, lilo, and the partition tables. It supports IDE and SCSI drives.

To set up g4u, download the image at www.feyrer.de/g4u/ and copy it to a floppy with either

dd if=g4u-1.5.fs of=/dev/fd0 

or

cat g4u-1.5.fs > /dev/fd0 

g4u needs an ftp server that contains an “install” username and a working DHCP server. (See Chapter 3 for information on configuring the DHCP server.)

Boot the server with the floppy installed. The NetBSD kernel will boot, detect the ethernet card if it supports it, and then offer a prompt. The format for the upload is easy:

uploaddisk ftp.server.com <filename> 

The program then prompts for the install account’s password. Enter the password and wait for the image to be uploaded.

Restoring the image is done in much the same way. Boot from the floppy and enter slurpdisk instead of uploaddisk at the prompt.

If you’d like to use a SCSI disk instead of an IDE disk, append sd0 at the end of the command line.

Veritas Netbackup

Netbackup is really two programs marketed as one backup solution. Veritas uses its Media Manager to control the tape robot and the inventory of the tapes or other media. The other program is GNU Tar.

Netbackup is an enterprise-level backup scheme, as its price range tends to be outside the budget of most home users. Netbackup also doesn’t support Linux as a server, but it will back up a Linux client.

The Java interface is kludgy and tends not to work at times, and the wizards that are included tend to configure the program incorrectly. The command-line tools, however, follow the UNIX tradition of sharp, distinct utilities. All the configuration files are stored in text format, making editing easy, although Veritas provides tools to manipulate the data. A Motif-based interface is also available, which works well but isn’t as feature-rich. When used properly, you don’t even need Netbackup to restore the tapes; you can simply use tar to read back the information, provided you know which image is on which tape.

Technical support for Netbackup tends to follow the typical Veritas approach of “ask no questions.” Support tends to be outstanding, and the tech support engineers freely offer their direct lines.

Legato Networker

Legato’s Networker product offers a relatively easy GUI to navigate, and the setup is straightforward. Networker has a great data transfer rate monitor on the display, which makes tuning easy. Networker also includes a separate command-line program that makes controlling remote backups easy without having to memorize all the command-line tools and options.

Networker stores its data and logs in a proprietary format, making parsing by hand impossible, although there are command-line programs that you can run. Networker also grooms the logs at random times, which can cause data corruption should you attempt a backup at the same time. Data corruption in the logs usually means that you have to start from scratch with an individual client—meaning that you’ve lost all previous backups.

Legato bundles Networker for sale by other vendors, so you might be able to get more attractive pricing under a different name.

Backup Tips

Hopefully, the first thing you learned in system administrator school was how to make decent backups and that they need to be taken at regular intervals (daily, for example). With all the emphasis on backups, there’s a good chance that you’ve got some method already in place, although you might not have considered all aspects of backup management. Here are a few tips to help the beginning cluster administrator along.

Do You Really Need All that Data?

There’s nothing wrong with full backups. In fact, there’s no such thing as too many backups. The more sensitive and volatile the data, the more frequent the backups should be. Some companies back up important data every half hour.

It doesn’t hurt, however, to be selective about the type of data you back up. All the dynamic data directories should be backed up, of course: the log files and the user files. But how often do you need to back up the root partition? On many systems, this partition should not change often, if at all. The same goes for /usr/local/bin. It’s not a bad idea to back these up constantly, but you just don’t need to; they generally won’t change, and you should be able to replace most of the data from a master image.

Keep Your Backups Offsite

There’s nothing like a good disaster to rid your datacenter of equipment. A small planet falling on your organization can really ruin your day, especially if it also destroys your disaster-recovery plan.

Make sure that you’ve got an offsite strategy for your media. This can include your backup operator taking the tapes home with him or finding a service that will do this for you. Services like these often have reasonable rates and secure, temperature-controlled storage. Be sure your SLA with your tape company (or your backup operator) includes a short retrieval time.

Yes, But Can You Recover the Data?

You can run backups all night long, day after day, but the data is only as good as the restore. I can’t stress this enough: Do test restores often. Don’t wait until you’ve lost critical data to find out that the data you’ve been backing up isn’t valid. Optimally, set aside time to do test restores on a backup server that mirrors your primary server. It’s also a good idea to be able to read tapes from drives other than your own; your backups would be of no value if, after your entire datacenter burned down, the tapes couldn’t be read on a brand new drive.

Secure Backups

Running remote backups over SSH is easy. Just name the directories to archive and pipe tar into ssh:

tar cvf - /directory_to_back_up | ssh user@host "dd of=/dev/tape" 
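
Restores work the same way in reverse; a sketch:

ssh user@host "dd if=/dev/tape" | tar xvf - 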

Summary

Proper planning of a Linux cluster doesn’t start only with the choice and purchase of computer equipment, but also with designing the environment to go with it. Special considerations include the air conditioning requirements, security, and a decent backup strategy.

It’s important to weigh these decisions and plan them out with management approval as a concise strategy that you stick with. It’ll save you many headaches down the road.
