I. Introduction to Cluster Concepts
by Robert W. Lucke
Building Clustered Linux Systems
Copyright
Dedication
Praise for Building Clustered Linux Systems
Hewlett-Packard® Professional Books
List of Figures
List of Tables
Preface
About This Book
Notation and Conventions
Using This Book
Production Information
Acknowledgments
Introduction
I. Introduction to Cluster Concepts
1. Parallel Power: Defining the Clustered System Approach
1.1. Avoiding Difficulties with the Word Cluster
1.2. Defining a Cluster
1.3. The Evolution of a Clustered Solution
1.3.1. Uniprocessor Systems (UPs)
1.3.2. SMP Systems
1.3.3. Networks of Independent Systems
1.3.3.1. The Introduction of Microprocessors
1.3.3.2. Evolution of Network Connections
1.3.3.3. Remote Procedure Calls
1.3.3.4. Tying Everything Together
1.4. Collapsed Network Computing for Engineering
1.5. Scientific Cluster Computing
1.5.1. An Example Parallel Problem
1.5.2. Refining the Parallel Example
1.5.3. Software Communication Facilities
1.5.4. High-Speed Interconnect (HSI)
1.6. Revisiting the Definition of Cluster
1.7. Commercial Cluster Computing
1.8. High Performance, High Throughput, and High Availability
1.9. A Formal Definition of Cluster
1.10. The Why and Wherefore of Clusters
1.11. Summary
2. One Step at a Time: A Process for Building Clusters
2.1. Building Clusters as a Complex Endeavor
2.2. Talking about the “P Word”
2.3. Presenting a Formal Cluster Creation Process
2.3.1. Phase 1: Cluster Solution Design
2.3.1.1. Technical Analysis
2.3.1.2. Preliminary Solution Design
2.3.1.3. Final Solution Design
2.3.2. Phase 2: Cluster Installation
2.3.2.1. Site Preparation
2.3.2.2. Physical Hardware Assembly
2.3.2.3. Software Installation and Configuration
2.3.3. Phase 3: Cluster Testing
2.3.3.1. Cluster Operational Testing
2.3.3.2. Cluster Acceptance
2.3.3.3. Full Operation and Release to Production
2.4. Formal Cluster Process Summary
II. Cluster Architecture and Hardware Components
3. Underneath the Hood: Cluster Hardware Components and Architecture
3.1. Hardware Categories in a Cluster
3.1.1. Passive Hardware Elements in a Cluster
3.1.2. Active Hardware Elements in a Cluster
3.1.3. Cluster Resources and the “Outside” World
3.2. A Survey of Cluster Hardware Configurations
3.3. High-Throughput Cluster Configurations
3.3.1. A “Carpet” Cluster
3.3.2. Compute “Farms and Ranches”
3.4. High-Availability Cluster Configurations
3.4.1. An Example “Virtual” Web Server
3.4.2. A Parallel Database Server
3.5. High-Performance Cluster Configurations
3.5.1. A Visualization Cluster
3.5.2. High-Performance Parallel Application Configurations
3.6. Common Cluster Hardware Architecture
3.7. Cluster Hardware Architecture Summary
4. Any Way You Slice It: Work and Master Nodes in a Cluster
4.1. Criteria for Selecting Compute Slices
4.2. An Example Compute Slice from Hewlett-Packard
4.2.1. Analysis of the Example Compute Slice
4.2.2. Comparing the Example Compute Slice with Similar Systems
4.2.3. Example Clusters Using Our Compute Slices
4.3. Thirty-two Bit and 64-Bit Compute Slices
4.3.1. Physical RAM Addressing
4.3.2. Process Virtual Address Space
4.3.3. Software Implications of 64-Bit Hardware
4.4. Memory Bandwidth
4.5. Memory and Cache Latency
4.6. Number of Processors in a Compute Slice
4.7. I/O Interface Capacity and Performance
4.7.1. PCI Implementation
4.7.2. Accelerated Graphics Port
4.8. Compute Slice Operating System Support
4.9. Master Node Characteristics
4.10. Compute Slice and Master Node Summary
5. Packet In: Cluster Networking Basics and Example Devices
5.1. A Short View of Ethernet Networking History
5.2. The Open System Interconnect (OSI) Communication Model
5.3. Ethernet Network Topologies
5.3.1. Ethernet Frames
5.3.2. Ethernet Hubs
5.3.3. Network Routers
5.4. Internet Protocol and Addressing
5.4.1. IP and TCP/UDP
5.4.2. IP Addressing
5.4.3. IP Subnetting
5.4.4. IP Supernetting
5.4.5. Ethernet Unicast, Multicast, and Broadcast Frames
5.4.6. Address Resolution Protocol (ARP)
5.4.7. IPv4 and IPv6
5.4.8. Private, Nonroutable Network Addresses
5.5. Ethernet Switching Technology
5.5.1. Half and Full Duplex Operation
5.5.2. Store and Forward versus Cut-through Switching
5.5.3. Collision Domains and Switching
5.5.4. Link Aggregation
5.5.5. Virtual LANs
5.5.6. Jumbo Frames
5.5.7. Managed versus Unmanaged Switches
5.6. Example Switches
5.6.1. A GbE Edge Switch
5.6.2. Ethernet Core Switches
5.7. Ethernet Networking Summary
6. Tying It Together: Cluster Data, Management, and Control Networks
6.1. Networked System Management and Serial Port Access
6.1.1. Remote System Management Access
6.1.2. Keyboard, Video, and Mouse Switches
6.1.3. Serial Port Concentrators or Switches
6.2. Cluster Ethernet Network Design
6.2.1. Choosing a Clusterwide IP Address Scheme
6.2.2. IP Addressing Conventions
6.2.3. Using Nonroutable Network Addresses
6.3. An Example Cluster Ethernet Network Design
6.3.1. Choosing the Type of Network and Address Ranges
6.3.2. Device Addressing Schemes
6.3.3. The Management and Control Networks
6.3.4. The Data Network
6.3.5. Example IP Address Assignments
6.4. Cluster Network Design Summary
7. Life in the Fast LAN: HSIs and Your Cluster
7.1. HSIs
7.2. HSI Latency and Bandwidth
7.3. Examining HSI Topologies
7.3.1. Some Common Topologies
7.3.2. Cross-Sectional Bandwidth
7.3.3. Clos Networks
7.3.4. Fat Tree Networks
7.4. Ethernet for HSI
7.4.1. An Example Ethernet HSI Network
7.4.2. Direct Attach Example Bandwidth
7.4.3. Multilevel Attach Example Bandwidth
7.4.4. A Larger Ethernet HSI Example
7.4.5. Other Ethernet HSI Configurations
7.5. Myricom's Myrinet HSI
7.6. Infiniband
7.7. Dolphin
7.8. Quadrics QsNet
7.9. HSI Technology Summary and Comparison
III. Cluster Software Architecture
8. The Right Stuff: Linux as the Basis for Clusters
8.1. Choosing a Cluster Operating System
8.1.1. Hardware Support
8.1.2. Operating System Stability
8.1.3. Software License Costs
8.1.4. Manageability
8.1.5. Software Flexibility
8.1.6. Openness
8.1.7. Scalability
8.1.8. Software Availability and Cost
8.1.9. Multiple Support Options
8.2. Introducing the Linux Operating System and Licensing
8.3. Linux Distributions
8.4. Managing Open-Source Software “Churn”
8.5. Commercial Linux Distributions
8.5.1. Red Hat Linux
8.5.2. SUSE Linux
8.5.3. Conclusions about Commercial Linux Distributions
8.6. Free Linux Distributions
8.6.1. The Fedora Project
8.6.2. Debian Linux
8.6.3. Conclusions about Free Distributions
8.7. Conclusions about Linux for Clusters
9. Round and Round It Goes: Booting, Disks, Partitioning, and Local File Systems
9.1. Disk Partitioning, Booting, and the BIOS
9.1.1. Default Disk Partitioning
9.1.2. A Brief Note on IA-64 Disk Partitioning
9.1.3. Red Hat Linux Boot Loaders
9.2. Booting the Linux Kernel
9.3. The Linux Initial RAM Disk Image
9.4. Linux Local Disk Storage
9.4.1. Using the Software RAID 5 Facility
9.4.2. Using Software RAID 1 for System Disks
9.4.3. RAID Multipath
9.4.4. Recovering from Software RAID Failures
9.4.4.1. Saving the Disk Partition Table
9.4.4.2. Determining Software RAID Array Status
9.4.4.3. Using mdadm in Place of raidtools
9.4.4.4. Monitoring Arrays with mdadm
9.5. Linux File System Types
9.6. The Linux /proc and devfs Pseudo File Systems
9.7. The Linux ext2 and ext3 Physical File Systems
9.7.1. File System Volume Labels
9.7.2. Creating the Example ext3 File System
9.7.3. Linux ext3 Journal Behavior and Options
9.7.4. The ext File System Stride Option for RAID
9.8. Standard Mount Options for All File Systems
9.9. The Temporary File System
9.10. Other Available File System Types
9.11. Advanced Performance Tuning
9.12. A Word about SMART Monitoring for Disks
9.13. Local Disks and File Systems Summary
10. Supporting Role: Infrastructure Services and Administration
10.1. The Big Infrastructure Picture
10.2. Initializing Your Cluster's Software Infrastructure
10.3. Infrastructure Implementation Recommendations
10.3.1. Avoiding Service Interference
10.3.2. Redundant Copies of Essential Services
10.3.3. Services with Fall-Back Capabilities
10.3.4. Single-Point Administration
10.3.5. Choosing Efficient Services
10.3.6. Management of Configuration Information
10.4. Protecting Active Configuration Information
10.5. Preparation for Infrastructure Installation
10.5.1. Order of Installation
10.5.2. Steps for Installing Infrastructure Services
10.5.3. Loading the Linux Operating System Distribution
10.6. Networking
10.6.1. Configuring Ethernet Switching Equipment
10.6.2. Network Aliases
10.6.3. Channel Bonding
10.6.4. Setting the Ethernet Link MTU Size
10.6.5. The Media-Independent Interface (MII) Tool
10.7. Enabling and Starting Linux Services
10.8. Time Synchronization
10.9. Name Services
10.9.1. Host Naming Conventions
10.9.2. The Name Service Switch File
10.9.3. The Hosts File
10.9.4. The DNS
10.9.5. The NIS
10.9.5.1. NIS Server Configuration
10.9.5.2. Modifying the NIS Slave Server List
10.9.5.3. NIS Slave Server Configuration
10.9.5.4. NIS Client Systems
10.9.5.5. Special NIS Configuration Options
10.9.5.6. Adding Custom NIS Maps
10.9.5.7. NIS Testing
10.9.5.8. NIS Summary
10.9.6. Name Resolution Recommendations
10.10. Infrastructure Services Summary
11. Reach Out and Access Something: Remote Access Services, DHCP, and System Logging
11.1. Continuing Infrastructure Installation
11.2. “Traditional” User Login and Authentication
11.2.1. Using Groups and Directory Permissions
11.2.2. Distributing Password Information with NIS
11.2.3. Introducing Kerberos
11.2.4. Configuring a Kerberos KDC on Linux
11.2.5. Creating a Kerberos Slave KDC
11.2.6. Kerberos Summary
11.3. Remote Access Services
11.4. Using BSD Remote Access Services
11.5. Kerberized Versions of BSD/ARPA Remote Services
11.6. The Secure Shell
11.6.1. SSH and Public Key Encryption
11.6.2. Configuring the SSH Client and Server
11.6.3. Configuring User Identity for SSH
11.6.4. SSH Host Keys, and Known and Authorized Hosts
11.6.5. Using the Authorized Keys File
11.6.6. Fine-Tuning SSH Access
11.6.7. SSH scp and sftp Commands
11.6.8. SSH Forwarding
11.6.9. SSH Summary
11.7. The Parallel Distributed Shell
11.7.1. Getting and Installing PDSH
11.7.2. Compiling PDSH to Use SSH
11.7.3. Using PDSH in Your Cluster
11.7.4. PDSH Summary
11.8. Configuring DHCP
11.8.1. Client-side DHCP Information
11.8.2. Configuring the DHCP Server
11.9. Logging System Activity
11.9.1. Operation of the System Logging Daemon
11.9.2. Kernel Message Logging
11.9.3. Enabling Remote Logging
11.9.4. Using logrotate to Archive Log Files
11.9.5. Using logwatch Reporting
11.9.6. An Example Subsystem Logging Design
11.9.7. Linux System Logging Summary
11.10. Access and Logging Services Summary
12. Installment Plan: Introduction to Compute Slice Configuration and Installation
12.1. Compute Slice Configuration Considerations
12.2. One Thousand Pieces Flying in Close Formation
12.3. The Single-System View
12.3.1. Shared System Structure, Individual System Personality
12.3.2. Accomplishing Shared System Structure
12.3.3. Compute Slice Software Requirements
12.4. A Generalized Network Boot Facility: pxelinux
12.4.1. Configuring TFTP for Booting
12.4.2. Configuring the pxelinux Software
12.4.3. The pxelinux Configuration Files
12.5. Configuring Network kickstart
12.5.1. The kickstart File Format
12.5.2. Making the Install Media Available for kickstart
12.5.3. The Network kickstart Directory
12.6. NFS Diskless Configuration
12.6.1. The Linux Terminal Server Project (LTSP)
12.6.2. Cluster NFS
12.7. Introduction to Compute Slice Installation Summary
13. Improving Your Images: System Installation with SystemImager
13.1. Using the SystemImager Software
13.1.1. Downloading and Installing SI
13.1.2. Configuring the SI Server
13.1.3. The SI Cold Installation Boot Process
13.1.4. SI Server Commands
13.1.5. Installing and Configuring the SI Client Software
13.1.6. Capturing a Client Image
13.1.7. Forcing Hardware-to-Driver Mapping with SystemConfigurator
13.1.8. Installing a Client Image
13.1.9. Updating Client Software without Reinstalling
13.1.10. Image Management and Naming
13.1.11. Avoiding the Big MAC-Gathering Syndrome
13.1.12. Summary
13.2. Multicast Installation
13.2.1. Multicast Basics
13.2.2. An Open-Source Multicast Facility: udpcast
13.2.3. A Simple Multicast Example
13.2.4. A More Complex Example
13.2.5. Command-line Prototyping with Multicast
13.2.6. Prototyping a Network Multicast Installation
13.2.7. Making More Modifications
13.2.8. Generalizing the Multicast Installation Prototype
13.2.9. Triggering a Multicast Installation
13.3. The SI flamethrower Facility
13.3.1. Installing flamethrower
13.3.2. Activating flamethrower
13.3.3. Additional SI Functionality in Version 3.2.0
13.4. System Installation with SI Summary
14. To Protect and Serve: Providing Data to Your Cluster
14.1. Introduction to Cluster File Systems
14.1.1. Cluster File System Requirements
14.1.2. Networked File System Access
14.1.3. Parallel File System Access
14.2. The NFS
14.2.1. Enabling NFS on the Server
14.2.2. Adjusting NFS Mount Daemon Protocol Behavior
14.2.3. Tuning the NFS Server Network Parameters
14.2.4. NFS and TCP Wrappers
14.2.5. Exporting File Systems on the NFS Server
14.2.6. Starting the NFS Server Subsystem
14.2.7. NFS Client Mount Parameters
14.2.8. Using autofs on NFS Clients
14.2.9. NFS Summary
14.3. A Survey of Some Open-Source Parallel File Systems
14.3.1. The Parallel Virtual File System (PVFS)
14.3.2. The Open Global File System (OpenGFS)
14.3.3. The Lustre File System
14.4. Commercially Available Cluster File Systems
14.4.1. Red Hat Global File System (GFS)
14.4.2. The PolyServe Matrix File System
14.4.3. Oracle Cluster File System (OCFS)
14.5. Cluster File System Summary
15. Stuck in the Middle: Cluster Middleware
15.1. Introduction to Cluster Middleware
15.1.1. Describing the Parallel Application Execution Environment
15.1.2. The HSI Message-Passing Facility
15.1.3. Load Balancing or Job Scheduling
15.1.4. Cluster Resource Management
15.1.5. Custom Scheduling
15.1.6. Monitoring, Measuring, and Managing Your Cluster
15.2. The MPICH Library
15.2.1. Introduction to MPICH
15.2.2. Downloading and Installing MPICH
15.2.3. Using mpirun
15.2.4. Special Versions of MPICH
15.2.5. MPICH Summary
15.3. The Simple Linux Utility for Resource Management
15.4. The Maui Scheduler
15.4.1. Maui Scheduler Software Architecture
15.4.2. Job Scheduling in Maui
15.4.3. Maui Scheduler Summary
15.5. The Ganglia Distributed Monitoring and Execution System
15.5.1. The Ganglia Software Architecture
15.5.2. Introducing RRD Software: rrdtool
15.5.3. Downloading and Installing Ganglia Software
15.5.4. Ganglia's gmond and gmetad Daemons
15.5.5. Adding Your Own Ganglia Metrics
15.5.6. Parallel Authentication with authd and gexec
15.5.7. Starting Parallel Programs with gexec
15.5.8. Ganglia Summary
15.6. Monitoring with Nagios
15.6.1. Explaining Nagios
15.6.2. Downloading and Installing Nagios
15.6.3. Configuring the Web Server for Nagios
15.6.4. Configuring and Using Nagios
15.6.5. Nagios Summary
15.7. Cluster Middleware Summary
15.8. An Afterword on Linux High-Availability and Open-Source
16. Put Tab A in Slot C: OSCAR, Rocks, OpenMOSIX, and the Globus Toolkit
16.1. Introducing Cluster-Building Toolkits
16.2. General Cluster Toolkit Installation Process
16.3. Installing a Cluster with OSCAR
16.3.1. OSCAR Initial Software Installation and Configuration
16.3.2. The OSCAR Installation Wizard
16.3.3. OSCAR Package Configuration
16.3.4. Building an OSCAR Compute Slice Image
16.3.5. Defining and Installing OSCAR Clients
16.3.6. Completing the OSCAR Installation
16.3.7. Adding and Deleting OSCAR Clients
16.3.8. OSCAR Summary
16.4. Installing a Cluster with NPACI Rocks
16.4.1. Getting the Rocks Software
16.4.2. Installing a Cluster Front-End Node Using Rocks
16.4.3. Completing the Installation
16.4.4. Rocks System Administration
16.4.5. Rocks Summary
16.5. The OpenMOSIX Project
16.5.1. Getting and Installing OpenMOSIX
16.5.2. Configuration of OpenMOSIX Clusters
16.5.3. OpenMOSIX Summary
16.6. Introduction to the Grid Concept
16.7. The Globus Toolkit
16.8. Cluster-Building Toolkit Summary
IV. Building and Deploying Your Cluster
17. Dollars and Sense: Cluster Economics
17.1. Initial Perceptions
17.2. Setting the Ground Rules
17.3. Cluster Cabling and Complexity
17.4. Eight-Compute Slice Cluster Hardware Costs
17.5. Sixteen-Compute Slice Cluster Hardware Costs
17.6. Thirty-two-Compute Slice Hardware Costs
17.7. Sixty-four-Compute Slice Hardware Costs
17.8. One Hundred Twenty-eight-Compute Slice Hardware Costs
17.9. The Land beyond 128 Compute Slices
17.10. Hardware Cost Trends and Analysis
17.11. Cluster Economics Summary
18. Racking Your Brains: Example Cluster Rack Assembly Steps
18.1. Examining the Cluster Assembly Process
18.2. Assembly Assumptions
18.3. Some “Rules of Thumb” for Physical Cluster Assembly
18.4. Detailed Cluster Assembly Steps
18.4.1. Physical Rack Assembly
18.4.2. Physical Management Rack Assembly
18.4.3. Physical Compute Rack Assembly
18.4.4. Physical Compute Rack System Installation
18.4.5. Physical Rack Final Assembly and Checkout
18.4.6. Individual System Checkout
18.4.7. Physical Rack Cleanup
18.4.8. Physical Rack Positioning
18.4.9. Interrack Configuration
18.4.10. Interrack Cabling
18.4.11. Final Cluster Hardware Assembly and Checkout
18.4.11.1. Master Rack Power-on
18.4.11.2. Compute Rack Power-on
18.4.11.3. Clusterwide Hardware Verification
18.5. Learning from the Example Steps
18.5.1. Finding Efficiencies in Cluster Construction
18.5.2. Parallelism in Rack Verification and Checkout
18.5.3. Parallelism in Interrack Cabling
18.5.4. Types of Teams and Specific Skills
18.6. Physical Assembly Conclusions
19. Getting Your Cluster Wired: An Example Cable-Labeling Scheme
19.1. Defining the Cable Problem
19.2. Different Classes of Cabling
19.2.1. Intrarack Cables
19.2.2. Interrack Cables
19.3. A First Pass at a Cable-Labeling Scheme
19.4. Refining the Cable Documentation Scheme
19.4.1. Labeling Cable Ends
19.4.2. Tracking and Documenting the Connections
19.5. Calculating the Work in Cable Installation
19.6. Minimizing Interrack Cabling
19.7. Cable Labeling System Summary
20. Physical Constraints: Heat, Space, and Power
20.1. Identifying Physical Constraints for Your Cluster
20.2. Space, the Initial Frontier
20.3. Power-Up Requirements
20.4. System Power Utilization
20.5. Taking the Heat
20.6. Physical Constraints Summary
A. Acronym List
B. List of URLs and Software Sources
B.1. Cluster Construction Tool Kits
B.2. Cluster Design Tools
B.3. Conversion Factors
B.4. File Systems and Volume Management
B.5. General Linux Software
B.6. Grid Tool Kits and Software
B.7. Hardware Vendors
B.8. High-Availability Software
B.9. High-Performance Graphics
B.10. HSI Technologies
B.11. Java Software for Linux
B.12. Linux Distributions and Open-Source License Examples
B.13. Monitoring and Event Generation Software and Dependencies
B.14. Networking Software, Hardware, and Examples
B.15. Open-Source Databases
B.16. Parallel Applications and Development Tools
B.17. Parallel Application Examples
B.18. Performance Benchmarks and Lists
B.19. Protocols and Messaging Libraries
B.20. Resource Management, Parallel Execution, and Scheduling
B.21. Security and Encryption
B.22. System Installation and Management Tools
Glossary
Bibliography
Part I. Introduction to Cluster Concepts