That is enough theory; let's deal with clustering in RethinkDB. So far we have covered what clustering is in computing terms and what it provides. In this section, we are going to learn how to perform clustering in RethinkDB, which is by nature a distributed database.
We will also learn how to add new machines into our existing cluster, manage them from the RethinkDB administrative screen, and monitor them for any errors.
We have two ways to perform RethinkDB clustering: on a single machine, or across different machines.
We can create a RethinkDB cluster on a single machine with one simple command in under a minute. Yes, you heard it right, in a minute (assuming you have RethinkDB installed). Let's do this.
Start the default RethinkDB server using the following command in the terminal:
rethinkdb
It should start the RethinkDB server on the default ports, and you should see console output as shown here:
Now open a new terminal, and run the following command:
rethinkdb --port-offset 1 --directory rethinkdb_data2 --join localhost:29015
You should see a new RethinkDB instance starting up, as shown here:
That is it. We have our first RethinkDB cluster running. Let's verify this: visit the administrative console and you should see the two servers connected in the Servers section, as shown in the following figure:
Yes, it works! Try to execute a query from the data explorer and you should receive the same result as you would without a cluster, because no matter which instance you use, RethinkDB will automatically route the query to the appropriate node.
Let us look at the options in the command that we executed previously:
--port-offset: This makes sure that no two nodes use the same ports by shifting every port of the new instance by the given offset (here, 1)
--directory: This tells RethinkDB to use a different data directory in order to maintain consistency and avoid read/write issues
--join: This tells RethinkDB to connect to the existing seed node
I would like to make an important point here regarding failover. If you create a cluster on a single machine, you cannot achieve full failover: if one RethinkDB instance goes down, the cluster will handle it, but if the machine running RethinkDB goes down, the complete cluster goes down with it. So this is OK for learning purposes, but not for production.
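To make the effect of --port-offset concrete, here is a minimal Python sketch. It assumes the standard RethinkDB default ports (28015 for client drivers, 29015 for cluster traffic, and 8080 for the administrative console); the helper names and the rethinkdb_dataN directory naming are illustrative assumptions, not part of RethinkDB.

```python
# Default ports of a RethinkDB server started with no options.
DEFAULT_PORTS = {"driver": 28015, "cluster": 29015, "http": 8080}


def ports_for_offset(offset):
    """Return the ports an instance uses when started with --port-offset."""
    return {name: port + offset for name, port in DEFAULT_PORTS.items()}


def local_join_command(offset, seed="localhost:29015"):
    """Build the command line for an extra instance on the same machine.

    The directory naming (rethinkdb_data2, rethinkdb_data3, ...) is only
    an illustration that follows the example command above.
    """
    directory = f"rethinkdb_data{offset + 1}"
    return (
        f"rethinkdb --port-offset {offset} "
        f"--directory {directory} --join {seed}"
    )


if __name__ == "__main__":
    for offset in (1, 2):
        print(ports_for_offset(offset))
        print(local_join_command(offset))
```

With an offset of 1, the second instance serves its administrative console on port 8081 and accepts driver connections on 28016, so both instances can coexist on one machine.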
You can add new RethinkDB instances to the existing cluster using the same command, but make sure you use a different port offset and directory for each one. Next, let's create a RethinkDB cluster using different machines.
We have seen how easy it is to create a RethinkDB cluster in the same machine. Let's see how we can create it using different machines. Actually, this is much easier than creating a cluster on the same machine, because you really don't need to worry about the port and directory usage.
Let's say you have two servers, one running on 104.121.23.24 and the other on 104.121.23.25. We first need to install RethinkDB on each machine. You can find a detailed description of installing RethinkDB on Mac, Linux, or Windows machines on the official RethinkDB website (https://www.rethinkdb.com/docs/install/).
Assuming you have RethinkDB installed on both machines, log in to the machine with the 104.121.23.24 IP address and start the RethinkDB server using the following command:
rethinkdb --bind all
RethinkDB will initiate itself, and you should be able to see the following on the terminal:
Now log in to the machine with the 104.121.23.25 IP address and start the RethinkDB server using the following command:
rethinkdb --join 104.121.23.24:29015 --bind all
After running this command, you should see the two servers on the administrative screen of RethinkDB. There is our cluster.
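If the second server does not show up, a common first check is whether the cluster port of the first machine is reachable from the second one. The following is a minimal sketch using only the Python standard library; can_reach is a hypothetical helper name, and 29015 is the port used in the --join option above.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (run on the 104.121.23.25 machine):
#     can_reach("104.121.23.24", 29015)
```

A False result usually points to a firewall rule or to the first server not having been started with --bind all, rather than to a problem with the --join option itself.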
As you may notice, this is really easy to do, but is this sustainable? What I mean by sustainable here is: will it run on production? Let's find out.
Since we created the cluster from the command line, what happens if one of the servers requires a reboot? Will it rejoin the same cluster automatically? Well, no. And since we have separate machines exposed to the Internet, is this setup secure enough to run in production, with so many attackers trying to intercept our data? Well, no!
So why did we do this in the first place? Simply to learn the concept. I always believe that everyone (including me) is looking for shortcuts to get to the end result. Since we have covered the shortcut and seen why it is not good for production, let's learn to improve on it and run our cluster in production mode. That's what this book, Mastering RethinkDB, is all about.
We are going to treat each separate machine as one instance of the production cluster. Suppose you have two servers with the IPs 104.121.23.24 and 104.121.23.25. We need to define which server will act as the starting point (SEED server) for other machines to enter the cluster; for instance, say the 104.121.23.24 machine is the SEED server.
Log in to the machine with the 104.121.23.24 IP address and open the RethinkDB configuration file. Assuming it's Ubuntu, the config file is located at /etc/rethinkdb/instances.d/default.conf. If it is not present, you need to create it manually.
If it's already present, you can either uncomment the relevant lines shown here or simply write them yourself.
First, set the instance name as follows. For this instance, the name is rethink_main:
server-name = rethink_main
The next setting we need to alter or create is the one that allows RethinkDB instances running on other machines to connect to this server in order to form a cluster.
If the config file is already there, you may find this line:
# bind=127.0.0.1
Change it to:
bind = all
Then save and close the file. You will need to restart RethinkDB for the change to take effect. You can do this in one of the following ways:
Using the rethinkdb init.d script: sudo /etc/init.d/rethinkdb restart
Using the service command: sudo service rethinkdb restart
Using the systemctl command: sudo systemctl restart rethinkdb
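A commented-out bind line is easy to overlook, so it can be useful to check the edited file with a short script. The following minimal Python sketch assumes the simple key = value format with # comments shown above; parse_config and accepts_remote_peers are hypothetical helper names, and the parser keeps only the last value of a repeated key, which is a simplification.

```python
def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Simplification: a repeated key overwrites the earlier value.
            settings[key.strip()] = value.strip()
    return settings


def accepts_remote_peers(text):
    """True when the config has an active (uncommented) 'bind = all' line."""
    return parse_config(text).get("bind") == "all"


if __name__ == "__main__":
    sample = "server-name = rethink_main\n# bind=127.0.0.1\nbind = all\n"
    print(parse_config(sample))
    print(accepts_remote_peers(sample))
```

To check the real file, read /etc/rethinkdb/instances.d/default.conf and pass its contents to accepts_remote_peers.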
Once the restart is done, we can move ahead and configure our next machine. We need to ensure one thing before this: the startup script. If the server is rebooted (which happens during regular maintenance by the service provider), our RethinkDB server should start again on boot.
By default, RethinkDB automatically creates the init.d script, which a Unix-based system reads on boot to start its services. As we have our configuration file in place, the service will start automatically on boot.
This is our administrative screen after restarting the RethinkDB server:
Now let's configure our second machine and form our cluster. Log in to the machine with the 104.121.23.25 IP address and open the configuration file in your favorite editor using the following command:
sudo vi /etc/rethinkdb/instances.d/default.conf
Change the name of the server by setting the following key:
server-name=rethink_child
Again, change the bind key and allow it to connect to other RethinkDB instances:
bind=all
Now, to add this machine to the cluster, we need to add the following join setting. It joins this machine to the first machine, and we will have our cluster ready:
join=104.121.23.24:29015
Save and close the editor. To make the setting effective, we need to restart the server:
sudo /etc/init.d/rethinkdb restart
Once the restart is successful, we can check whether our cluster has formed. Visit the administrative screen and you should see a screen similar to the one shown here:
As we can see, we have two servers connected and running as one cluster. Congratulations! We have a production-ready cluster with us.
You can learn more about RethinkDB configuration files in the official docs (https://rethinkdb.com/docs/config-file/).
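The two configuration files differ only in the server name and the join line, which the following minimal Python sketch makes explicit. render_config is a hypothetical helper, not part of RethinkDB; it only emits the three keys used in this section (server-name, bind, and join).

```python
def render_config(server_name, bind="all", join=None):
    """Return config-file text for one node.

    join is the 'host:port' of the SEED server; leave it as None for the
    SEED server itself.
    """
    lines = [f"server-name={server_name}", f"bind={bind}"]
    if join is not None:
        lines.append(f"join={join}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # SEED server (104.121.23.24): no join line.
    print(render_config("rethink_main"))
    # Second server (104.121.23.25): joins the SEED server's cluster port.
    print(render_config("rethink_child", join="104.121.23.24:29015"))
```

Adding a third machine would then only require a new server name with the same join value.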
However, there is still a little work left to do on the security of the cluster. As you may have noticed, we set bind=all in our configuration, which means any machine can attempt to connect to our cluster. We need to add some security layers to prevent unwanted access.
In the next section, we are going to learn how we can secure our cluster.