Part II. Distributed Systems

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

Leslie Lamport

Without distributed systems, we wouldn’t be able to make phone calls, transfer money, or exchange information over long distances. We use distributed systems daily, sometimes without even realizing it: any client/server application is a distributed system.

For many modern software systems, vertical scaling (scaling by running the same software on a bigger, faster machine with more CPU, RAM, or faster disks) isn’t viable. Bigger machines are more expensive, harder to replace, and may require special maintenance. An alternative is to scale horizontally: to run software on multiple machines connected over the network and working as a single logical entity.

Distributed systems might differ both in size, from a handful to hundreds of machines, and in characteristics of their participants, from small handheld or sensor devices to high-performance computers.

The time when database systems were mainly running on a single node is long gone, and most modern database systems have multiple nodes connected in clusters to increase storage capacity, improve performance, and enhance availability.

Even though some of the theoretical breakthroughs in distributed computing aren’t new, most of their practical application happened relatively recently. Today, we see increasing interest in the subject, more research, and new developments.

Basic Definitions

In a distributed system, we have several participants (sometimes called processes, nodes, or replicas). Each participant has its own local state. Participants communicate by exchanging messages using communication links between them.

Processes can access the time using a clock, which can be logical or physical. Logical clocks are implemented using monotonically growing counters. Physical clocks, also called wall clocks, are bound to a notion of time in the physical world and are accessible through process-local means; for example, through the operating system.
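To make the logical variant concrete, here is a minimal sketch of a Lamport clock, one common way to implement such a monotonically growing counter (the class and method names are our own, chosen for illustration, not taken from any particular library):

    class LamportClock:
        """A logical clock: a monotonically growing counter that orders events."""

        def __init__(self):
            self.time = 0

        def tick(self):
            """Advance the clock for a local event and return its timestamp."""
            self.time += 1
            return self.time

        def update(self, received_time):
            """On message receipt, move past both the local and the sender's time."""
            self.time = max(self.time, received_time) + 1
            return self.time

A process attaches its current counter value to every outgoing message, and the receiver calls update() so its clock never lags behind the sender’s; the result is an ordering of events consistent with the flow of messages, without any reference to physical time.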

It’s impossible to talk about distributed systems without mentioning the inherent difficulties caused by the fact that their parts are located apart from each other. Remote processes communicate through links that can be slow and unreliable, which makes knowing the exact state of a remote process more complicated.

Most of the research in the distributed systems field is related to the fact that nothing is entirely reliable: communication channels may delay, reorder, or fail to deliver the messages; processes may pause, slow down, crash, go out of control, or suddenly stop responding.

There are many themes in common in the fields of concurrent and distributed programming, since CPUs are tiny distributed systems with links, processors, and communication protocols. You’ll see many parallels with concurrent programming in “Consistency Models”. However, most of the primitives can’t be reused directly because of the costs of communication between remote parties, and the unreliability of links and processes.

To overcome the difficulties of the distributed environment, we need a particular class of algorithms: distributed algorithms, which have notions of local and remote state and execution, and which work despite unreliable networks and component failures. We describe algorithms in terms of state and steps (or phases), with transitions between them. Each process executes the algorithm steps locally, and a combination of local executions and process interactions constitutes a distributed algorithm.
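As a sketch of what “state and steps with transitions” can look like in code, consider a toy algorithm in which a process pings all of its peers and waits for every acknowledgment; the network object and the message format here are assumptions made purely for this example:

    from enum import Enum, auto

    class State(Enum):
        IDLE = auto()
        WAITING = auto()
        DONE = auto()

    class PingProcess:
        """A process executing the steps of a toy distributed algorithm."""

        def __init__(self, network, peers):
            self.network = network  # delivery medium; may drop or delay messages
            self.peers = set(peers)
            self.state = State.IDLE
            self.acks = set()

        def start(self):
            # Step 1: transition IDLE -> WAITING and contact every peer.
            for peer in self.peers:
                self.network.send(peer, "PING")
            self.state = State.WAITING

        def on_message(self, sender, message):
            # Step 2: respond to pings; collect acks and transition
            # WAITING -> DONE once every peer has replied.
            if message == "PING":
                self.network.send(sender, "ACK")
            elif message == "ACK" and self.state is State.WAITING:
                self.acks.add(sender)
                if self.acks == self.peers:
                    self.state = State.DONE

The local execution is the sequence of state transitions each process goes through; the distributed algorithm is what emerges when every participant runs these steps and the messages between them, possibly delayed or lost, drive the transitions.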

Distributed algorithms describe the local behavior and interaction of multiple independent nodes. Nodes communicate by sending messages to each other. Algorithms define participant roles, exchanged messages, states, transitions, executed steps, properties of the delivery medium, timing assumptions, failure models, and other characteristics that describe processes and their interactions.

Distributed algorithms serve many different purposes:

Coordination

A process that supervises the actions and behavior of several workers.

Cooperation

Multiple participants relying on one another to finish their tasks.

Dissemination

Processes cooperating to spread information to all interested parties quickly and reliably.

Consensus

Achieving agreement among multiple processes.

In this book, we talk about algorithms in the context of their usage and prefer a practical approach over purely academic material. First, we cover all the necessary abstractions, the processes and the connections between them, and progress to building more complex communication patterns. We start with UDP, where the sender has no guarantee that its message has reached its destination, and work our way up to consensus, where multiple processes agree on a specific value.
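For example, sending a datagram with Python’s standard socket module (the address and payload below are placeholders) returns as soon as the message is handed off to the network, with no indication of whether it will ever be delivered:

    import socket

    # Fire-and-forget: UDP provides no acknowledgment, ordering, or
    # delivery guarantee; this datagram may silently never arrive.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"state update", ("10.0.0.2", 9000))
    sock.close()

Everything stronger than this, from reliable delivery to agreement on a value, has to be built on top of such unreliable primitives.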
