24
Apache Ignite in Python

This chapter's example uses Python to build a NoSQL key-value database. The example runs on the local computer, but it shouldn't be hard to adapt it to run in the cloud.

This example is much less complicated than the examples described in the previous chapters, largely because a key-value database has fewer features. Relational, graph, and document databases all provide sophisticated querying features. In contrast, a key-value database mostly just stores values and later retrieves them.

Apache Ignite can actually function as a relational database, but Chapters 16 through 19 demonstrated that kind of database, so this chapter uses only Ignite's key-value features.

A key-value database lets you associate a piece of data with a key and then later use the key to retrieve the data.
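As a rough mental model (not a description of how Ignite works internally), a Python dictionary already behaves like a tiny in-memory key-value store. The following sketch wraps a dictionary in a class named TinyStore, which is a made-up name used only for illustration. Its put and get methods are meant to resemble the cache methods used later in this chapter.

```python
# A toy in-memory key-value store. TinyStore is a hypothetical class
# used only for illustration; it is not part of Ignite or pyignite.
class TinyStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Associate value with key, replacing any earlier value."""
        self._data[key] = value

    def get(self, key):
        """Return the value stored under key, or None if there is none."""
        return self._data.get(key)


store = TinyStore()
store.put('color', 'blue')
print(store.get('color'))    # The value saved above
print(store.get('shape'))    # Nothing was saved under this key
```

The main difference with a real key-value database is that the data lives in a separate server process, so several programs can share it.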

You can compare a key-value database to your computer's environment variables. You can set an environment variable and give it a name. Later, you can get that variable by its name.

For example, to set the environment variable named GREETING in Windows, you can open a console window and enter the following command.

SET GREETING=Howdy Pard!

Later, you can use the following code to show the saved value:

ECHO %GREETING%

A program can use code to do something similar. For example, the following code shows how a Python program might display the variable that you set in the command window:

import os
print(os.getenv('GREETING'))

The following code shows the C# version:

Console.WriteLine(Environment.GetEnvironmentVariable("GREETING"));

By default, Ignite stores data in RAM, so the data is lost when the node stops. This chapter explains how to work with both volatile and nonvolatile data. The following section explains how to install Apache Ignite. The rest of the chapter describes the example program.

INSTALL APACHE IGNITE

Apache Ignite is an open source database, and you can find free and non-free versions. You can find an overview at https://en.wikipedia.org/wiki/Apache_Ignite or visit the Ignite website at https://ignite.apache.org.

You can also find a quick-start guide for Python that includes installation instructions at https://ignite.apache.org/docs/latest/quick-start/python.

Here's my abbreviated list of installation instructions.

  1. Go to https://ignite.apache.org/download.cgi#binaries. Click the Binary Releases link at the top of that page to go to the Binary Releases section and download the latest zip file. (You can also download the documentation if you like.)
  2. Unzip the files into a convenient directory. I put mine in C:\users\rod\apache-ignite-2.14.0-bin. If you just unzip the files, you'll get a directory named apache-ignite-2.14.0-bin inside the apache-ignite-2.14.0-bin directory, so either move into the zip file and drag the subdirectory to a good location or rearrange the files to suit your preferences after you unzip.

The download file is fairly large (version 2.14 is about 252 MB), so use a fast network connection for the download. It will also expand when you unzip it (version 2.14 expands to around 383 MB), so you'll need a fair amount of disk space.

The Apache Ignite community is quite active, so it's likely that a newer (and bigger) version will be available by the time you read this. There's already been a new version since I started writing this chapter.

START A NODE

Before you can use Ignite in your program, you need to start an Ignite node, and that node should remain running as long as you want to work with it.

To start a node, you need to run the batch file named ignite.bat, passing it the location of a configuration file to tell Ignite how to start the node. That configuration determines, among other things, whether the node should store data persistently.

Running the batch file isn't hard, but to make the process even easier, I created two batch files to start the node with or without persistence.

Without Persistence

If you start a node without persistence, then the data is stored in memory and is lost when the node stops. To start the node without persistence, use the following batch file, which I named start_node.bat:

cd c:\users\rod\apache-ignite-2.14.0-bin\bin
ignite.bat
pause

This file uses the cd command to set the current directory to the location of the ignite.bat file. It then runs ignite.bat to start the node. When the node starts, you should see a command window similar to the one shown in Figure 24.1.

FIGURE 24.1

That window will remain running as long as the node is running.

The batch file's final pause command is only there in case there's a problem starting ignite.bat. For example, if ignite.bat isn't in the directory passed to the cd command, the pause command makes the output pause so that you can read the error message. Without that command, the window would vanish too quickly for you to see the error message.

You can start as many nodes as you like, and they should find each other automatically.

With Persistence

If you start a node with persistence enabled, then as much data as possible is stored in memory and it is all shadowed to hard disk. If you stop the node and restart it, the node reloads the saved data.

To start the node with persistence, use the following batch file, which I named start_node_persistent.bat:

cd c:\users\rod\apache-ignite-2.14.0-bin\bin
ignite.bat ..\examples\config\persistentstore\example-persistent-store.xml
pause

This file uses the same cd command and runs ignite.bat as before. This time it passes the location of a configuration file to ignite.bat. You can look at that configuration file to see what it does. The most important part for our purposes is that it sets the value persistenceEnabled to true.
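If you would rather start the node from Python than from a batch file, the same idea can be sketched as a small helper that builds the command line, with or without a configuration file. This is only a sketch under assumptions: the function name build_ignite_command, the IGNITE_HOME value, and the file name my-config.xml are placeholders rather than anything defined by Ignite.

```python
from pathlib import PureWindowsPath


def build_ignite_command(ignite_home, config_file=None):
    """Return the command (as a list) that starts a node.

    ignite_home is the directory that contains Ignite's bin folder.
    config_file is an optional configuration file to pass to ignite.bat.
    """
    script = PureWindowsPath(ignite_home) / 'bin' / 'ignite.bat'
    command = [str(script)]
    if config_file is not None:
        command.append(config_file)
    return command


# IGNITE_HOME is a placeholder. Use your own installation directory.
IGNITE_HOME = r'C:\apache-ignite'

print(build_ignite_command(IGNITE_HOME))
print(build_ignite_command(IGNITE_HOME, 'my-config.xml'))
```

On Windows, you could then pass the resulting list to subprocess.run to launch the node. A relative configuration path is resolved against the current directory, so you would need to set that directory accordingly, just as the batch file does with cd.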

If you look closely at the avalanche of information in the command window, you'll see a couple of differences with the window shown in Figure 24.1. First, you'll find the following information:

Ignite node started OK (id=36174a6d)
[15:24:56]>>> Ignite cluster is in INACTIVE state (limited functionality
available). Use control.(sh|bat) script or IgniteCluster.state(ClusterState.ACTIVE) to change the state.
[15:24:56] Topology snapshot [ver=1, locNode=36174a6d, servers=1, clients=0,
state=INACTIVE, CPUs=8, offheap=3.1GB, heap=1.0GB]

This says the node is in an inactive state. For some reason, that's the default if you start the node with persistence enabled. If you try to access the node while it is inactive, you get the following message:

CacheCreationError: Cannot perform the operation because the cluster is
inactive. Note, that the cluster is considered inactive by default if Ignite
Persistent Store is used to let all the nodes join the cluster. To activate the
cluster call Ignite.active(true).

You can use the Ignite control script control.bat (or control.sh if you have a Linux accent) to change the state, but I found it easy enough to make the program do it.

To make it easier to test the database with and without persistence, I've broken the example program into sections that define a class, write some data, and read some data. The following sections describe those pieces. Later in the chapter, I'll summarize how to use the pieces to demonstrate persistence. In a real application, you might want to combine the pieces or call them from a main program.

CREATE THE PROGRAM

Now that you know how to start the node with and without persistence, you need to install a database adapter for it. Then you can start writing code.

Install the pyignite Database Adapter

To install the pyignite driver, simply use the following pip command:

$ pip install pyignite

If you're not familiar with pip, you can execute the following command in a Jupyter Notebook cell instead:

!pip install pyignite

That's all there is to it!

Define the Building Class

The example program saves a few pieces of information into the database. One of those pieces is a Building class. To allow Ignite to understand the data saved in the object's fields, you need to define it properly.

The following code shows how the program defines the Building class:

# Cell 1.
# Define the Building class with a schema
# so the database can understand its fields.
from pyignite import *
from pyignite.datatypes import String, IntObject
 
class Building(metaclass=GenericObjectMeta, schema={
    'name': String,
    'city': String,
    'height': IntObject,
    }):
 
    def __init__(self, name, city, height):
        self.name = name
        self.city = city
        self.height = height

This code uses the GenericObjectMeta class to define the Building class. The interesting part here is the schema, which is a dictionary that defines the class's fields and their data types. Ignite uses the data types to understand how it should save and retrieve data for Building objects. In addition to defining the class's fields, this cell creates a constructor to make initializing objects easier.

Save Data

The following cell saves data into the database.

# Cell 2.
# Create some data.
from pyignite import *
from pyignite.datatypes import String, IntObject
from pyignite.cluster import *
 
client = Client()
with client.connect('localhost', 10800):
    my_cluster = cluster.Cluster(client)
    my_cluster.set_state(ClusterState.ACTIVE)
 
    misc_data_cache = client.get_or_create_cache('misc_data')
 
    building = Building('Burj Khalifa', 'Dubai', 2717)
    misc_data_cache.put(100, building)
    
    misc_data_cache.put('fish', "humuhumunukunukuapua'a")
    misc_data_cache.put(3.14, 'pi')
 
print("Created data")

After the import statements, the code creates a client. This example assumes that the node is running on the local computer, so it connects to localhost. You can change that value to an IP address if the node is running on another computer. You can also replace the parameters with a list of IP addresses and port numbers, as in the following code:

nodes = [
    ('127.0.0.1', 10800),
    ('127.0.0.1', 10801),
    ('127.0.0.1', 10802),
]
 
with client.connect(nodes):
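If the nodes listen on consecutive ports of a single host, as in the preceding list, you could build the list instead of typing it. The following is a small sketch. The helper name make_node_list is hypothetical, and the host and port values are simply the ones used in the example.

```python
def make_node_list(host, first_port, count):
    """Return (host, port) tuples for count consecutive ports."""
    return [(host, first_port + i) for i in range(count)]


# Build the same three-node list shown in the text.
nodes = make_node_list('127.0.0.1', 10800, 3)
print(nodes)
```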

After it has connected to the node, the code gets a Cluster object representing the cluster that it is using and calls its set_state method to activate the cluster.

If you started the node without persistence, then it is already active and this does nothing. If you started the node with persistence, then it is initially inactive and this activates it.

Next, the code calls the client's get_or_create_cache method to retrieve a cache named misc_data. As the method's name implies, this gets the cache if it already exists, and it creates an empty cache if it does not exist.

The next statements add data to the cache using an assortment of data types for both keys and values.

The program creates a new Building object representing Burj Khalifa, the tallest building in the world. It then saves that object in the database, giving it the integer key 100.

Next, the cell saves the string value “humuhumunukunukuapua'a” in the cache with the string key “fish.” (It's the official state fish of Hawaii. It's also called the reef triggerfish, but humuhumunukunukuapua'a is more fun to say.)

Finally, the code saves the string value “pi” in the cache with the floating point key 3.14.

This cell finishes by displaying a message to show that it finished.

Read Data

The following cell reads data back from the node:

# Cell 3.
# Read and display the data.
from pyignite import *
from pyignite.datatypes import String, IntObject
from pyignite.cluster import *
 
client = Client()
with client.connect('localhost', 10800):
    my_cluster = cluster.Cluster(client)
    my_cluster.set_state(ClusterState.ACTIVE)
 
    misc_data_cache = client.get_or_create_cache('misc_data')
 
    building = misc_data_cache.get(100)
    print(building)
 
    text = misc_data_cache.get('fish')
    print(text)
 
    number = misc_data_cache.get(3.14)
    print(number)

The code first creates a client, connects to the database, and activates the cluster in case it is not already active. It then uses the client's get_or_create_cache method to get the cache.

The program then calls the cache's get method, passing it the keys that the previous cell used to save the data. It uses the key 100 to retrieve the Building object and prints that object. Next, it uses the string key "fish" to fetch the fish's name and displays that value. Finally, it uses the floating point key 3.14 to fetch the associated string and prints that string. Here's the result:

Building(name='Burj Khalifa', city='Dubai', height=2717, version=1)
humuhumunukunukuapua'a
pi

If you would rather not create a Building class with a schema, you can store and retrieve building data in another format such as a JSON document or delimited string.
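For example, the JSON approach might look something like the following sketch. It uses only Python's standard json module and does not talk to Ignite. The comments indicate where the cache's put and get calls would go.

```python
import json

# Represent the building as a plain dictionary instead of a schema class.
building = {'name': 'Burj Khalifa', 'city': 'Dubai', 'height': 2717}

# Serialize the dictionary into a JSON string. This string is what you
# would pass to the cache's put method in place of a Building object.
building_json = json.dumps(building)
print(building_json)

# Later, after fetching the string with the cache's get method,
# convert it back into a dictionary.
restored = json.loads(building_json)
print(restored['name'], restored['city'], restored['height'])
```

The trade-off is that the database then sees only an opaque string, so it knows nothing about the building's individual fields.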

Demonstrate Volatile Data

To demonstrate the program using volatile data, use the following steps:

  1. Run start_node.bat to start the node without persistence.
  2. Run Cell 1 to define the Building class.
  3. Run Cell 2 to create some data.
  4. Run Cell 3 to read and display the data. It should look as expected.
  5. Close the node's command window and rerun start_node.bat to restart the node, again without persistence.
  6. Run Cell 3 to read and display the data. Notice that all of the values are displayed as "None."
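The last step works without errors because asking for a missing key yields Python's None value instead of raising an exception. If you would rather display something friendlier, you could wrap the lookup in a small helper like the one sketched here. The name fetch_or_default is hypothetical, and a plain dictionary stands in for the cache so that the sketch runs without a node. It assumes only that the cache object has a get method that returns None for a missing key.

```python
def fetch_or_default(cache, key, default):
    """Return cache.get(key), or default if nothing is stored there."""
    value = cache.get(key)
    return default if value is None else value


# Plain dictionaries stand in for the cache in this sketch.
empty_cache = {}
full_cache = {'fish': "humuhumunukunukuapua'a"}

print(fetch_or_default(empty_cache, 'fish', '(no data)'))
print(fetch_or_default(full_cache, 'fish', '(no data)'))
```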

Demonstrate Persistent Data

To demonstrate the program using persistent data, use the following steps:

  1. Run start_node_persistent.bat to start the node with persistence.
  2. Run Cell 1 to define the Building class.
  3. Run Cell 2 to create some data.
  4. Run Cell 3 to read and display the data. It should look as expected.
  5. Close the node's command window and rerun start_node_persistent.bat to restart the node, again with persistence.
  6. Run Cell 3 to read and display the data. This time all the values should be displayed normally.

SUMMARY

This chapter shows how you can use Python and a key-value database to store and retrieve values in a cache. You can use a configuration file to enable persistence if you want to save data when you stop and restart the node. If persistence is enabled, the node starts in an inactive state, so you'll need to activate it either by using the control script control.bat (or control.sh in Linux) or by making the program do it.

If you like, you can use the database to pass information between the C# program described in Chapter 25 and the Python program described in this chapter. For example, you can use the C# program to save data into the node and then use the Python program to read the data. That should work for simple data types such as integers and strings, but it may not work with objects. For example, the Python and C# programs represent the Building class in slightly different ways, so they can't reliably pass Building objects through the database. If you really need to pass an object back and forth, you could serialize the object in a string, pass that to the other program, and then deserialize it on the other end.

The next chapter shows how to build a similar example program in C#. Before you move on to that example, however, use the following exercises to test your understanding of the material covered in this chapter. You can find the solutions to these exercises in Appendix A.
