1 fault tolerant, 38
2PC (two-phase commit), 40, 141–142
3PC (three-phase commit), 40
acceleration, 108
ACID (Atomicity, Consistency, Isolation, and Durability), 6, 139–140
address resolution protocol. See ARP
addresses (IP)
Anycast, 102–103
load balancing, 63–65
managing Wackamole, 92
cost effectiveness, 34
high availability, 26
monitoring, 27–30
release cycles, 30–34
speed, 34–36
AFS protocol, priority placement, 85
agents
MTAs, 28
SNMP, 28
aggregation
passive logs for metrics, 187–188
passive sniffing log, 173–174
periodic batch, 171–172
real-time unicast, 172–173
analysis, real-time, 181–183
Anycast, static content, 102–103
Apache
memory resources, 79
Web server, 88
APIs (application programming interfaces), 67
application-layer load balancers, 67–71
applications
horizontal scalability, 5
mod_perl, 202
wwwstat, 186
applying
logging, 197
Spread, 241–245
architecture
data resiliency, 137
design, 23
five nines availability, 26
high availability
costs, 39–40
Foundry ServerIron, 45–49
growth, 55
load balancing, 41–43
maintenance, 40–41
mission-critical systems, 55–60
peer-based, 49–54
site surveys, 45
traditional, 43–44
logging, 176–177
mission-critical environments, 25
cost effectiveness, 34
high availability, 26
management, 36
monitoring, 27–30
release cycles, 30–34
speed, 34–36
production environments, 12–13
decreasing, 7–8
flawed designs, 6–7
need for, 6
real-world design of, 8–9
Spread, 227–228
ARP (address resolution protocol), 63
response packets, 97
spoofing, 96
Atomicity, Consistency, Isolation, and Durability (ACID), 6, 139–140
availability
five nines availability, 26
costs, 39–40
Foundry ServerIron, 45–49
growth, 55
load balancing, 41–43
maintenance, 40–41
mission-critical systems, 55–60
peer-based, 49–54
site surveys, 45
traditional, 43–44
static content, 90
available resource load balancing, 62
avoiding failure
control, 14–15
disaster recovery, 22–23
rapid development, 15–16
unit testing, 16–17
version control, 18–21
production environments, 12–13
stability, 14–15
disaster recovery, 22–23
rapid development, 15–16
unit testing, 16–17
version control, 18–21
batches, periodic aggregation, 171–172
binlog, 147
black-box load balancers. See Web switches
browser caching, 83
building high availability architectures, 40–41
caches, 134–135
cache-on-demand, 86–87
deploying, 117
invalidation, 134
static content clusters, 83
types of, 107
data, 112–113
distributed, 114–116
integrated/look-aside, 109–112, 122–127
layered/transparent, 107–108
write-thru/write-back, 113–114
Web, 87–88
capacity
horizontal scalability, 5
planning, 72
CARP (Common Address Redundancy Protocol), 49
casual monitors, 177
causal ordering, 228
changeset replication, 145–146
Oracle, 147
selecting, 147
CLF (Common Log Format), 180
clients
spoon-feeding, 108
testing, 223
writing, 220–223
clusters, 50
load balancing, 61–63
IP services, 63–65
web switches, 65–66
load balancing, 73
periodic batch aggregation, 171–172
static content, 82
caching, 83
upper bound, 83
CODA protocol, priority placement, 85
code, consistency, 41
collecting metrics, 191–193
Common Address Redundancy Protocol (CARP), 49
Common Log Format (CLF), 180
communication
group (Spread), 227–228
Spread, 229–230
applying, 241–245
configuring, 231–241
installing, 231
compiling Wackamole, 91
complete reliability, 138
complex scheduling, monitoring, 29
computational reuse, 109
Concurrent Versioning System (CVS), 19
conditions, race, 154
configuration
databases, 189–193
logging, 177
mod_log_spread, 179
spreadlogd, 178–179
logs, 152
MySQL
defining scope, 206–207
selecting, 207–208
technical setup, 200–206
testing, 223–226
troubleshooting, 208–223
PHP extensions, 203–206
Squid, 98
Wackamole, 92–94
connections
least, 62
Spread, 178–179
consistency, code, 41
content distribution
caches, 134–135
data, 112–113
deploying, 117
distributed, 114–116
integrated/look-aside, 109–112, 122–127
layered/transparent caches, 107–108
types of, 107
write-thru/write-back, 113–114
dynamic
serving news sites, 117–125, 127–130
two-tier execution, 130–134
context switching, 77–78
control, 14–15
disaster recovery, 22–23
rapid development, 15–16
unit testing, 16–17
versions, 18–21
cookies, 128–130
costs
architecture, 34
high availability, 39–40
replication, 126
CREATE TABLE DDL, 151
criteria for monitoring systems, 29–30
cross-vendor replication, implementing, 151–166
cull_old_hits function, 216
custom reactions, monitoring, 29
CVS (Concurrent Versioning System), 19
daemons, Spread, 229–230
data caches, 112–113
data manipulation language. See DML
data resiliency, 137
databases
distributed
data resiliency, 137
geographically distributed operations, 139
operational failover, 138
optimizing query performance, 138
overview of, 137
reliability, 138
MySQL
defining scope, 206–207
selecting, 207–208
technical setup, 200–206
testing, 223–226
troubleshooting, 208–223
RDBMS, 113
RRDtool, 188–189
collecting metrics, 191–193
configuring, 189–190
generating graphs, 194–197
replication, 139
master-master, 144
master-slave, 144–147
multimaster, 140–143
implementing cross-vendor replication, 151–166
implementing same-vendor replication, 166
dblinks, 164
DDL, CREATE TABLE, 151
decreasing scalability, 7–8
defining scope, 206–207
dependencies, services, 30
deploying caches, 117
design, 6–10, 23. See also configuration
development
internal release cycles, 31
rapid, 15–16
differential synchronization, 85
disaster recovery, 22–23
distributed databases
data resiliency, 137
geographically distributed operations, 139
master-master replication, 144
master-slave replication, 144–147
multimaster replication, 140–141
2PC, 141–142
EVS, 142–143
operational failover, 138
optimizing query performance, 138
overview of, 137
reliability, 138
cross-vendor replication, 151–154, 156–166
same-vendor replication, 166
distribution
data, 112–113
deploying, 117
integrated/look-aside, 109–112, 122–127
layered/transparent, 107–108
types of, 107
write-thru/write-back, 113–114
dynamic
serving news sites, 117–130
two-tier execution, 130–134
static content, 83–87
DML (data manipulation language), 145–147
logs, 152
race conditions, 154
replay replication, 152–162
snapshot replication, 163–166
DNS (Domain Name System)
high availability, 55–60
Round-Trip Times
static content, 101
dot.com bust, effect on large systems, 13–14
dynamic content distribution
two-tier execution, 130–134
effective resource utilization, load balancing, 62
email, high availability, 55–60
emergency releases, 33
environments
mission-critical, 25
cost effectiveness, 34
high availability, 26
management, 36
monitoring, 27–30
release cycles, 30–34
speed, 34–36
production, 12–13
events
logging
architecture, 176–177
configuring, 177
mod_log_spread, 179
optimizing, 175–176
overview of, 169–171
passive sniffing log aggregation, 173–174
periodic batch aggregation, 171–172
real-time unicast aggregation, 172–173
spreadlogd, 178–179
monitoring, 30
evolution of architecture, 8
EVS (extended virtual synchrony), 142–143
exports, revision control, 86
extended virtual synchrony (EVS), 142–143
extensibility, monitoring, 29
extensions, creating PHP, 203–206
external release cycles, 33–34
fault tolerance, 38, 90. See also availability
FIFO ordering, 228
files, creating, 92–94
Finagle’s Law, 12
five nines availability, 26
firewalls
high availability, 55–60
OSs (operating systems), 90
flexible notifications, monitoring, 29
flipping snapshots, 163–166
formatting CLF, 180
Foundry ServerIron, 45–49
frameworks, monitoring, 29–30
FreeBSD 4.9, 90
Freevrrpd, 49
front-end load balancers, 66
functions
cull_old_hits, 216
get_current_online_count, 216
get_hit_info, 211
online_init, 218
online_shutdown, 218
sl_find_compare_neighbors, 216
generating RRDtool graphs, 194–197
geographically distributed operations, 139
getcounts method, 222
getuserinfo method, 222
get_current_online_count function, 216
get_hit_info function, 211
graphs, generating, 194–197
gratuitous ARPing, 97
groups, Spread communication, 227–228
HA (high availability), 37–38
costs, 39–40
load balancing, 41–43
maintenance, 40–41
mission-critical systems, 55–60
peer-based, 49–55
traditional, 43–44
Foundry ServerIron, 45–49
site surveys, 45
Wackamole, 94–98
Web servers, 89
hardware
high availability, 43
horizontal scalability, 5
load balancing, 42
high availability
costs, 39–41
load balancing, 41–43
mission-critical systems, 55–60
peer-based, 49–55
traditional, 43–44
Foundry ServerIron, 45–49
site surveys, 45
high performance computing (HPC) systems, 72
Hitdate index, 215
horizontal scalability, 5. See also scalability
Hot Standby Router Protocol (HSRP), 45
hot-standby, 138
HPC (high performance computing) systems, 72
HSRP (Hot Standby Router Protocol), 45
HTTP acceleration mode, Squid Web servers, 89
HTTPS (secure hypertext transport protocol), 64
image serving, 99–101
implementation
cross-vendor replication, 151–166
flipping snapshots, 163–166
high availability, 40–41
monitoring, 27–28
same-vendor replication, 166
information collectors, selecting, 214–220
infrastructure
mission-critical environments, 25
cost effectiveness, 34
high availability, 26
management, 36
monitoring, 27–30
release cycles, 30–34
speed, 34–36
scalability, 5
architecture, 9–10
decreasing, 7–8
flawed designs, 6–7
need for, 6
real-world design of, 8–9
installation
Spread, 231
Squid, 98
Wackamole, 91–94
integrated caches, 109–112, 122–127
integrity, caches, 130
internal release cycles, 31–32
Internet, load balancing, 72–73
invalidation, caches, 134
IP (Internet Protocol)
addresses
Anycast, 102–103
managing Wackamole, 92
load balanced protocols, 72–73
services, 63–65
IPVS (IP virtual servers), 67
ISPs (Internet service providers), 83
keys, primary, 158
large systems, dot.com bust effect on, 13–14
latency, static content improvements, 99–101
layered caches, 107–108
layers, application-layer load balancers, 67–71
least connections load balancing, 62
load balancing, 61–63
application-layer, 67–71
definition of, 71–72
high availability, 41–43
IP services, 63–65
IPVS, 67
services, 72–73
session stickiness, 73–74
web switches, 65–66
loading MySQL modules, 220
logging
applying, 197
architecture, 176–177
binlog, 147
configuring, 177
mod_log_spread, 179
spreadlogd, 178–179
DML, 152
monitors, 177
optimizing, 175–176
overview of, 169–171
passive log aggregation for metrics, 187–188
passive sniffing log aggregation, 173–174
periodic batch aggregation, 171–172
RRDtool, 188–189
collecting metrics, 191–193
configuring, 189–190
generating graphs, 194–197
real-time analysis, 181–183
real-time monitoring, 183–186
real-time unicast aggregation, 172–173
servers, 177
look-aside caches, 109–112, 122–127
Mail Transport Agents. See MTAs
maintenance
high availability, 40–41
monitoring, 30
management
logging, 171–172
mission-critical environments, 25, 36
cost effectiveness, 34
high availability, 26
monitoring, 27–30
release cycles, 30–34
speed, 34–36
Management Information Bases. See MIBs
manual magic, 142
master-master replication, 144
master-slave replication, 144–147
memcached servers, 125
memory resources, Apache, 79
methods
getcounts, 222
getuserinfo, 222
query, 221
read_response, 222
metrics
passive log aggregation for, 187–188
RRDtool, 188–189
collecting, 191–193
configuring, 189–190
generating graphs, 194–197
MIBs (Management Information Bases), 28
mission-critical environments, 25
availability, 55–60
cost effectiveness, 34
high availability, 26
management, 36
monitoring, 27–30
release cycles, 30–34
speed, 34–36
modules, loading MySQL, 220
mod_log_spread, 179
mod_perl application, 202
monitors
casual, 177
logging, 177
Moore’s Law, 13
MTAs (Mail Transport Agents), 28
multimaster replication, 140–141
2PC, 141–142
EVS, 142–143
multinode clusters, 171–172
MySQL, 147
scope, 206–207
selecting, 207–208
technical setup, 200–206
testing, 223–226
troubleshooting, 208–223
N-1 fault tolerant, 38
name-based virtual hosting, 64
NAT (network address translation), 65
network file system (NFS), 85
networks, partitions, 142
news sites
two-tier execution, 130–134
NFS (network file system), 85
nodes, 84–85
AFS protocol, 85
CODA protocol, 85
differential synchronization, 85
NFS (network file system), 85
revision control exports, 86
thttpd Web server, 88
notifications, monitoring, 29
online_init function, 218
online_shutdown function, 218
operating systems (OSs), 90
operational failover, 138
optimization
logging, 175–176
passive log aggregation for metrics, 187–188
real-time analysis, 181–183
real-time monitoring, 183–186
queries, 138
Oracle, changeset replication, 147
ordering, 228
OSs (operating systems), 90
outages, 26. See also availability
P2P (peer-to-peer) systems, 39
partitions, networks, 142
passive log aggregation for metrics, 187–188
passive sniffing log aggregation, 173–174
peak rates, cost effectiveness, 81–82
peer-based high availability, 49–55
peer-to-peer (P2P) systems, 39
performance
dynamic content distribution, 121
logging
real-time analysis, 181–183
real-time monitoring, 183–186
passive log aggregation for metrics, 187–188
peak rates, 81–82
queries, 138
speed, 34–36
periodic batch aggregation, 171–172
perl, mod_perl application, 202
PHP extensions, creating, 203–206
PKI (public key infrastructure), 69
planning capacity, 72
platforms, selecting, 208–209
point-to-point communication, 227
predictive load balancing, 62
preferences, tracking, 127–130
primary keys, 158
priority placement, 84–85
AFS protocol, 85
CODA protocol, 85
differential synchronization, 85
NFS, 85
revision control exports, 86
thttpd Web server, 88
production
internal release cycles, 32
working, 12–13
protocols
AFS, 85
ARP, 63
CARP, 49
CODA, 85
HSRP, 45
HTTPS, 64
IP load balanced, 72–73
SNMP, 28
VRRP, 45
proxy caches, 88, 108. See also caches
public key infrastructure (PKI), 69
publishing logs
configuring, 177
mod_log_spread, 179
spreadlogd, 178–179
query method, 221
queries
caches, 113
optimizing, 138
quorums, 143
race conditions, DML, 154
random load balancing, 62
rapid development, 15–16
RDBMS (relational database management system), 113
reactions, monitoring, 29
read_response method, 222
real-time
analysis, 181–183
monitoring, 183–186
unicast aggregation, 172–173
real-world design of scalability, 8–9
recovery, disaster, 22–23
recursive name service resolution, 101
reference primary keys, 158
relational database management system. See RDBMS
release cycles
mission-critical applications, 30–34
reliability, 138
replay replication, 152–162
replication
changeset, 145–147
cross-vendor, 151–154, 156–166
databases, 139
master-master, 144
master-slave, 144–147
multimaster, 140–143
same-vendor, 166
snapshot, 163–166
requests, 61–63
IP services, 63–65
web switches, 65–66
requirements, ACID, 140
resiliency, data, 137
resolution, DNS Round-Trip Times, 101
resources
load balancing, 62
memory, 79
site processes, 78–80
reverse-proxy support, 88
revision control exports, 86
roles, logging, 176–177
Round Robin Database tool. See RRDtool
round robin load balancing, 62
routers, high availability, 55–60
RRDtool, 188–189
collecting metrics, 191–193
configuring, 189–190
rsync tool, 85
same-vendor replication, implementing, 166
scalability, 5
architecture, 9–10
decreasing, 7–8
distributed databases, 148–151
implementing cross-vendor replication, 151–166
implementing same-vendor replication, 166
flawed designs, 6–7
memcached servers, 125
need for, 6
real-world design of, 8–9
speed, 34–36
web switches, 65–66
scheduling, monitoring, 29
scope, defining, 206–207
secure hypertext transport protocol (HTTPS), 64
secure socket layer (SSL), 51
security
caches, 130
firewalls, 90
selecting
changeset replication, 147
information collectors, 214–220
MySQL, 207–208
platforms, 208–209
service providers, 210–213
tools, 200
servers
Foundry ServerIron, 45–49
IPVS, 67
logging, 176–177
memcached, 125
monitoring, 27–28
Web
choosing, 88–89
processes, context switching, 77
setting up, 98
selecting, 210–213
services
dependencies, 30
IP, 63–65
monitoring, 27–30
serving news sites, 117–134
sessions
SSL caches, 116
stickiness, 73–74
tracking, 127–130
simple distributed information caches, 115
simple network management protocol. See SNMP
single point of failure, Foundry ServerIron, 47
site surveys, 45
site processes
analyzing, 76–77
context switching, 77–78
resources, 78–80
SiteUserID index, 215
skiplists, 216
sl_find_compare_neighbors function, 216
snapshot replication, 163–166
SNMP (simple network management protocol), 28–29
software, logging, 176–177
spacachepurge, 244
speed, architecture, 34–36
spoofing, ARP, 96
spoon-feeding clients, 108
spurgecached, 242
Spread, 229–230
applying, 241–245
group communication, 227–228
installing, 231
Wackamole, 91
spreadlogd, 178–179
spuser real-time observation sessions, 181
SSL (secure socket layer), 51, 116
stability, 14–15
disaster recovery, 22–23
rapid development, 15–16
unit testing, 16–17
version control, 18–21
staging internal release cycles, 31–32
staleness, 62
static content
Anycast, 102–103
availability, 90
clustering, 82
caching, 83
upper bound, 83
distribution, 83–87
DNS Round-Trip Times, 101
improvements, 99–101
OSs (operating systems), 90
overview, 75
peak rates, 81–82
site processes
analyzing, 76–77
context switching, 77–78
resources, 78–80
Wackamole, 91
ARP spoofing, 96
benefits, 91
compiling, 91
high availability, 94–98
Web servers
choosing, 88–89
setting up, 98
static URLs, creating, 82
statistics, wwwstat program, 186
stickiness, sessions, 73–74
storage, binlog, 147
subscribers, logging, 178–179
switches
high availability, 55–60
load balancing, 65–66
web, 104
synchronization, 41
differential, 85
rsync tool, 85
tables
CREATE TABLE DDL, 151
replay replication, 152–162
snapshot replication, 163–166
technical setup, MySQL, 200–206
testing
clients, 223
MySQL, 223–226
unit, 16–17
three-phase commit (3PC), 40
thttpd Web server, 88
timeout-based caches, 123
tools
RRDtool, 188–189
collecting metrics, 191–193
configuring, 189–190
generating graphs, 194–197
rsync, 85
selecting, 200
Spread, 229–230
applying, 241–245
configuring, 231–241
installing, 231
total ordering, 228
tracking user data, 127–130
traditional high availability, 43–44
Foundry ServerIron, 45–49
peer-based, 49–51
site surveys, 45
traffic, scalability, 8
transparent caches, 107–108
troubleshooting
high availability, 40–41
logging
overview of, 169–171
passive log aggregation for metrics, 187–188
passive sniffing log aggregation, 173–174
periodic batch aggregation, 171–172
real-time analysis, 181–183
real-time monitoring, 183–186
real-time unicast aggregation, 172–173
MySQL, 208–223
two-phase commit (2PC), 40, 141–142
two-tier execution, dynamic content distribution, 130–134
types
of caches, 107
data, 112–113
distributed, 114–116
integrated/look-aside, 109–112, 122–127
layered/transparent, 107–108
write-thru/write-back, 113–114
of DML, 145–146
unification of separate systems, 42. See also load balancing
unit testing, 16–17
unplanned outages, 26. See also high availability
unsolicited ARPing, 97
upper bound, static content clusters, 83
URL-Hitdate index, 215–216
URLs (Uniform Resource Locators)
static, 82
viewing, 217
user data, tracking, 127–130
utilities
RRDtool, 188–189
collecting metrics, 191–193
configuring, 189–190
generating graphs, 194–197
selecting, 200
Spread, 229–230
applying, 241–245
configuring, 231–241
installing, 231
utilization
load balancing, 62
scalability, 8
VCSs (version control systems), 18
versions, CVS, 19
viewing URLs, 217
Virtual Router Redundancy Protocol (VRRP), 45
visualization (RRDtool), 188–189
collecting metrics, 191–193
configuring, 189–190
generating graphs, 194–197
VRRP (Virtual Router Redundancy Protocol), 45
Wackamole, 91
ARP spoofing, 96
benefits, 91
compiling, 91
high availability, 94–98
installing, 91–94
wackamole.conf file, 92–94
warm-standby, 138
web accelerators, 87
web caches, 86–88
web servers
choosing, 88–89
logging, 177
monitoring, 27–28
processes, 77
setting up, 98
web sites, Squid, 89
weighted random load balancing, 62
whitepaper approach to high availability, 43–44
Foundry ServerIron, 45–49
site surveys, 45
"who’s online" solution, 226
write-thru/write-back caches, 113–114
writing clients, 220–223
wwwstat program, 186