The ZooKeeper data model

As defined by the ZooKeeper wiki, ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers. The namespace looks quite similar to a Unix filesystem. The data registers are known as znodes in the ZooKeeper nomenclature. You can see examples of znodes in the following image:

ZooKeeper's hierarchical namespace

Here, you can see that znodes are organized hierarchically in a tree, much like a standard filesystem. Some important points to note are as follows:

  • The root node has one child znode called /zoo, which in turn has three znodes.
  • Every znode in the ZooKeeper tree is identified by a path, and the path elements are separated by /.
  • The znodes are called data registers because they can store data. Thus, a znode can have children as well as data associated with it. It's analogous to having a filesystem that allows a file to also be a directory.

The data in a znode is stored as a byte array, with a maximum data size in each znode of no more than 1 MB. ZooKeeper is designed for coordination, and almost all forms of coordination data are relatively small in size; hence, this limit on the size of data is imposed. It is recommended that the actual data size be much less than this limit as well.

The slash-separated znode paths are canonical and have to be absolute. Relative paths and references are not recognized by ZooKeeper. Znode names can be composed of Unicode characters, with a few exceptions: the null character is not allowed, the name zookeeper is reserved, and "." and ".." cannot be used on their own as path components, although a dot may appear as part of a longer name.
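The path rules can be sketched as a small validator. This is a minimal illustration in Python, not ZooKeeper's own validation code; the function names validate_znode_path and is_valid_znode_path are made up for this example, and only the basic rules are checked (absolute path, no empty components, no standalone "." or "..", no null character).

```python
def validate_znode_path(path: str) -> None:
    """Raise ValueError if `path` breaks the basic znode path rules.

    Hypothetical helper for illustration; it covers only a subset of
    the checks a real ZooKeeper client performs.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute (start with '/')")
    if path == "/":
        return  # the root znode is always a valid path
    if path.endswith("/"):
        raise ValueError("path must not end with '/'")
    for component in path[1:].split("/"):
        if component == "":
            raise ValueError("empty path component")
        if component in (".", ".."):
            raise ValueError("'.' and '..' cannot be used as path components")
        if "\x00" in component:
            raise ValueError("the null character is not allowed")


def is_valid_znode_path(path: str) -> bool:
    """Boolean wrapper around validate_znode_path."""
    try:
        validate_znode_path(path)
        return True
    except ValueError:
        return False
```

With this sketch, /zoo/duck and /zoo/a.b are accepted, while zoo/duck (relative) and /zoo/./duck (standalone dot) are rejected.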

Like files in a filesystem, znodes maintain a stat structure that includes version numbers for data changes and access control list (ACL) changes, along with the timestamps associated with those changes. The version number increases whenever the znode's data changes. ZooKeeper uses the version numbers along with the associated timestamps to validate its in-core cache. The znode version number also enables clients to make conditional updates or deletes through the ZooKeeper APIs: if the version number specified doesn't match the current version of the znode, the operation fails. However, this check can be bypassed by specifying -1 as the version number while performing a znode update or delete operation.
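The version check behaves like a compare-and-set. The following is a toy in-memory model in Python, not the ZooKeeper client API; ToyZnode and BadVersionError are names invented for this sketch. It assumes the behavior of the real API in which a new znode starts at data version 0, each successful write increments the version by one, and passing -1 as the expected version matches any version.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match (toy equivalent)."""


class ToyZnode:
    """In-memory stand-in for a znode's data and data version."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0  # data version of a freshly created znode

    def set_data(self, data: bytes, expected_version: int) -> int:
        # -1 means "do not check the version"; any other value must match.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, current is {self.version}"
            )
        self.data = data
        self.version += 1  # every successful write bumps the version
        return self.version


node = ToyZnode(b"initial")
after_first_write = node.set_data(b"first", expected_version=0)

try:
    node.set_data(b"stale", expected_version=0)  # version is now 1
    stale_write_failed = False
except BadVersionError:
    stale_write_failed = True

after_forced_write = node.set_data(b"forced", expected_version=-1)
```

A client that reads a znode, computes a new value, and writes it back with the version it read will fail cleanly if another client wrote in between, which is the basis for optimistic concurrency on top of znodes.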

Types of znodes

ZooKeeper has two types of znodes: persistent and ephemeral. There is a third type that you might have heard of, called a sequential znode, which is a kind of a qualifier for the other two types. Both persistent and ephemeral znodes can be sequential znodes as well. Note that a znode's type is set at its creation time.

The persistent znode

As the name suggests, persistent znodes remain in ZooKeeper's namespace until they're explicitly deleted. A znode can be deleted by calling the delete API call. The client that created a persistent znode does not have to be the one that deletes it; any authorized client of the ZooKeeper service can delete a znode.

It's time to put this newly acquired knowledge into practice, so let's create a persistent znode using the ZooKeeper Java shell:

[zk: localhost(CONNECTED) 1] create /[PacktPub] "ApacheZooKeeper"
Created /[PacktPub]
[zk: localhost(CONNECTED) 2] get /[PacktPub]
"ApacheZooKeeper"

Persistent znodes are useful for storing data that needs to be highly available and accessible by all the components of a distributed application. For example, an application can store the configuration data in a persistent znode. The data as well as the znode will exist even if the creator client dies.

The ephemeral znode

By contrast, an ephemeral znode is deleted by the ZooKeeper service when the creating client's session ends. An end to a client's session can happen because of disconnection due to a client crash or explicit termination of the connection. Even though ephemeral nodes are tied to a client session, they are visible to all clients, depending on the configured Access Control List (ACL) policy.

An ephemeral znode can also be explicitly deleted by the creator client or any other authorized client by using the delete API call. An ephemeral znode ceases to exist once its creator client's session with the ZooKeeper service ends. Hence, in the current version of ZooKeeper, ephemeral znodes are not allowed to have children.

To create an ephemeral znode using the ZooKeeper Java Shell, we have to specify the -e flag in the create command, which can be done using the following command:

[zk: localhost(CONNECTED) 1] create -e /[PacktPub] "ApacheZooKeeper"
Created /[PacktPub]

Now, since an ephemeral znode is not allowed to have children, if we try to create a child znode under the one we just created, we get an error, as follows:

[zk: localhost(CONNECTED) 2] create -e /[PacktPub]/EphemeralChild "ChildOfEphemeralZnode"
Ephemerals cannot have children: /[PacktPub]/EphemeralChild

The concept of ephemeral znodes can be used to build distributed applications where the components need to know the state of the other constituent components or resources. For example, a distributed group membership service can be implemented by using ephemeral znodes. The property of ephemeral nodes getting deleted when the creator client's session ends can be used as an analogue of a node that is joining or leaving a distributed cluster. Using the membership service, any node is able to discover the members of the group at any particular time. We will discuss this in more detail in Chapter 4, Performing Common Distributed System Tasks.
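The membership idea can be illustrated with a toy in-memory model. This Python sketch does not talk to ZooKeeper; ToyEnsemble and its methods are hypothetical names, session IDs are plain integers, and the persistent parent /Members is assumed to exist already. It only shows the two properties described above: ephemeral znodes vanish with their session, and they cannot have children.

```python
class ToyEnsemble:
    """Toy stand-in for a ZooKeeper service that tracks ephemeral znodes."""

    def __init__(self):
        self._owners = {}  # ephemeral znode path -> owning session id

    def create_ephemeral(self, session_id: int, path: str) -> None:
        if path in self._owners:
            raise ValueError(f"node already exists: {path}")
        parent = path.rsplit("/", 1)[0]
        if parent in self._owners:
            # The parent is itself ephemeral, so it may not have children.
            raise ValueError(f"ephemerals cannot have children: {path}")
        self._owners[path] = session_id

    def close_session(self, session_id: int) -> None:
        # Ending a session removes every ephemeral znode it created.
        for path in [p for p, s in self._owners.items() if s == session_id]:
            del self._owners[path]

    def get_children(self, parent: str) -> list:
        prefix = parent.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self._owners if p.startswith(prefix))
        return sorted(n for n in names if "/" not in n)


service = ToyEnsemble()
service.create_ephemeral(1, "/Members/Host1")
service.create_ephemeral(2, "/Members/Host2")
members_before = service.get_children("/Members")

service.close_session(2)  # Host2 crashes or disconnects
members_after = service.get_children("/Members")
```

Listing the children of /Members at any time gives the current group membership, with no explicit "leave" message needed from a member that fails.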

The sequential znode

A sequential znode is assigned a sequence number by ZooKeeper as a part of its name during its creation. The value of a monotonically increasing counter (maintained by the parent znode) is appended to the name of the znode.

The counter used to store the sequence number is a signed integer (4 bytes). It has a format of 10 digits with 0 (zero) padding. For example, look at /path/to/znode-0000000001. This naming convention is useful to sort the sequential znodes by the value assigned to them.
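The naming format is easy to reproduce. The following Python sketch only formats names the way the text describes (a 10-digit, zero-padded suffix); the function sequential_name is a made-up helper, and the counter values are arbitrary examples rather than output from a real server.

```python
def sequential_name(prefix: str, counter: int) -> str:
    """Append a 10-digit, zero-padded counter to a znode name prefix."""
    return f"{prefix}{counter:010d}"


# Names generated out of order still sort correctly as plain strings,
# because zero padding makes lexicographic order match numeric order.
names = [sequential_name("/path/to/znode-", n) for n in (12, 1, 3)]
ordered = sorted(names)
```

Because the largest value of a signed 4-byte integer, 2147483647, has exactly 10 digits, every counter value fits the fixed-width format, which is what makes plain string sorting safe.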

Tip

Sequential znodes can be used for the implementation of a distributed global queue, as sequence numbers can impose a global ordering. They may also be used to design a lock service for a distributed application. The recipes for a distributed queue and lock service will be discussed in Chapter 4, Performing Common Distributed System Tasks.

Since both persistent and ephemeral znodes can be sequential znodes, we have a total of four modes of znodes:

  • persistent
  • ephemeral
  • persistent_sequential
  • ephemeral_sequential

To create a sequential znode using the ZooKeeper Java shell, we have to use the -s flag of the create command:

[zk: localhost(CONNECTED) 1] create -s /[PacktPub] "PersistentSequentialZnode"
Created /[PacktPub]0000000001
[zk: localhost(CONNECTED) 3] create -s -e /[PacktPub] "EphemeralSequentialZnode"
Created /[PacktPub]0000000008

Keeping an eye on znode changes – ZooKeeper Watches

ZooKeeper is designed to be a scalable and robust centralized service for very large distributed applications. A common design anti-pattern when clients access such services is polling, that is, a pull model. A pull model often suffers from scalability problems when implemented in large and complex distributed systems. To solve this problem, the ZooKeeper designers implemented a mechanism where clients can get notifications from the ZooKeeper service instead of polling for events. This resembles a push model, where notifications are pushed to the registered clients of the ZooKeeper service.

Clients can register with the ZooKeeper service for any changes associated with a znode. This registration is known as setting a watch on a znode in ZooKeeper terminology. Watches allow clients to get notifications when a znode changes in any way. A watch is a one-time operation, which means that it triggers only one notification. To continue receiving notifications over time, the client must reregister the watch upon receiving each event notification.

Let's walk through an example of a cluster group membership model to illustrate the concept of ZooKeeper watches and notifications:

  • In the cluster, a node, say Client1, is interested in getting notified when another node joins the cluster. Any node that is joining the cluster creates an ephemeral node in the ZooKeeper path /Members.
  • Now, another node, Client2, joins the cluster and creates an ephemeral node called Host2 in /Members.
  • Client1 issues a getChildren request on the ZooKeeper path /Members, and sets a watch on it for any changes. When Client2 creates a znode as /Members/Host2, the watch gets triggered and Client1 receives a notification from the ZooKeeper service. If Client1 now issues a getChildren request on the ZooKeeper path /Members, it sees the new znode Host2. This flow of setting watches, receiving notifications, and subsequently resetting the watches is shown in the following image:

    How two clients and ZooKeeper interact through watches and notifications

ZooKeeper watches are a one-time trigger. What this means is that if a client receives a watch event and wants to get notified of future changes, it must set another watch. Whenever a watch is triggered, a notification is dispatched to the client that had set the watch. Watches are maintained in the ZooKeeper server to which a client is connected, and this makes it a fast and lean method of event notification.
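The one-time nature of watches can be shown with a toy model. This Python sketch is not the ZooKeeper client API: ToyWatchedNode is a hypothetical class, watches are plain callbacks, and only child creation is modeled. The event name NodeChildrenChanged mirrors the event type ZooKeeper reports for child changes.

```python
class ToyWatchedNode:
    """Toy parent znode whose watches fire once and are then discarded."""

    def __init__(self):
        self.children = []
        self._watches = []  # callbacks waiting for the next change

    def get_children(self, watch=None):
        # Reading the children can optionally leave a watch behind.
        if watch is not None:
            self._watches.append(watch)
        return list(self.children)

    def add_child(self, name):
        self.children.append(name)
        # One-time trigger: clear the registered watches before firing them.
        pending, self._watches = self._watches, []
        for callback in pending:
            callback("NodeChildrenChanged")


members = ToyWatchedNode()

# A client that sets a watch once and never renews it.
one_shot_events = []
members.get_children(watch=one_shot_events.append)
members.add_child("Host2")  # triggers the watch
members.add_child("Host3")  # no watch is left, so nothing is delivered

# A client that re-registers the watch inside its callback.
renewed_events = []


def on_children_changed(event):
    renewed_events.append(event)
    members.get_children(watch=on_children_changed)  # set the next watch


members.get_children(watch=on_children_changed)
members.add_child("Host4")
members.add_child("Host5")
```

The first client hears about Host2 only, while the second client, which renews its watch each time, is notified for both Host4 and Host5.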

The watches are triggered for the following three changes to a znode:

  1. Any changes to the data of a znode, such as when new data is written to the znode's data field using the setData operation.
  2. Any changes to the children of a znode, for instance, when a child znode is created with the create operation or deleted with the delete operation.
  3. A znode being created or deleted, which could happen in the event that a new znode is added to a path or an existing one is deleted.

ZooKeeper provides the following guarantees with respect to watches and notifications:

  • ZooKeeper ensures that watches are always ordered in the first in first out (FIFO) manner and that notifications are always dispatched in order
  • A client sees the watch notification for a znode before it sees the new data of that znode
  • Watch events are ordered with respect to the updates seen by the ZooKeeper service

    Note

    Since ZooKeeper watches are one-time triggers, and because of the latency between receiving a watch event and resetting the watch, a client might miss changes made to a znode during this interval. In a distributed application in which a znode changes multiple times between the dispatch of an event and the resetting of the watch, developers must be careful to handle such situations in the application logic.
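One way an application might notice such a gap is to compare data versions between reads. This is only a sketch of the arithmetic, assuming the data version increases by exactly one per write and ignoring counter wraparound; missed_updates is a hypothetical helper, not part of any ZooKeeper API.

```python
def missed_updates(last_seen_version: int, current_version: int) -> int:
    """Count writes the client never observed between two reads.

    One write is accounted for by the watch notification itself, so only
    the writes beyond that one count as missed.
    """
    return max(0, current_version - last_seen_version - 1)


# The client read the znode at version 3 and set a watch. The znode was
# then written three times before the client re-read it at version 6.
gap = missed_updates(last_seen_version=3, current_version=6)
```

A nonzero result tells the client that intermediate values were overwritten before it could read them, so it should treat the freshly read data as the only reliable state.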

When a client gets disconnected from the ZooKeeper server, it doesn't receive any watch notifications until the connection is re-established. When the client reconnects, any previously registered watches are reregistered and triggered if needed. If the client connects to a new server, the watch will be triggered for any session events. This disconnection from one server and reconnection to another happens transparently for the client applications.

Although ZooKeeper guarantees that all registered watches get dispatched to the client, even if the client disconnects from one server and reconnects to another server within the ZooKeeper service, there is one scenario worth mentioning where a watch might be missed by a client. This happens when a client has set a watch for the existence of a znode that has not yet been created. In this case, the watch event will be missed if the znode is created and then deleted while the client is in the disconnected state.
