Redis is known as an ultrafast data store because it can both serve reads and absorb new writes into its data set at high speed. Leaving persistence aside, there is not much performance difference between read and write operations in Redis. It is therefore important to know how to feed a large set of data into Redis in a short burst of time.
The write operations in Redis broadly fall into two types: writing or updating a few keys in response to a user action, and bulk importing a huge data set. The following Ruby code, using the redis-rb client, shows the first type, with two keys set and a counter incremented in a single pipelined batch (pipelining is explained shortly):

require "redis"

redis = Redis.new(:host => "127.0.0.1", :port => 6379)
redis.pipelined do
  redis.set "user", "user1"
  redis.set "userid", 1
  redis.incr "totallogin"
end
Writing and updating multiple keys in response to a user action is a common operation. When multiple keys are to be updated, the commands are sent to Redis sequentially. Let us see how the commands are sent and how this affects performance. Assume that, as a response to some user action, we need to write two new keys and increment a counter, so there are three commands in total.
In this case, normally we need to perform three different operations:
SET user User1
OK
SET userid 1
OK
INCR totalLogin
(integer) 14
Considering a network roundtrip of 100 ms from the client to the server, and ignoring the Redis execution time, total time to execute all three commands will be:
Total time = (request sent + response from server) + (request sent + response from server) + (request sent + response from server)
Total time = 100 ms + 100 ms + 100 ms (ignoring Redis's execution time)
Total time = 300 ms
So the total time increases proportionally with the number of commands executed. This is because Redis is a TCP server using a request/response protocol: the server and the client are connected through a network socket and pay the network latency on every round trip, even if the client runs on the same machine as the server. Suppose your Redis setup can process at least 50,000 requests per second; with a network latency of 100 ms as described above, a client sending commands sequentially can complete at most 10 requests per second, no matter how fast the Redis server works. As it is not possible to reduce the travel time between the server and the client, the solution is to reduce the number of trips made between them. In short, the fewer the round trips, the more requests Redis can process per second. Redis provides us with a solution for this problem: pipelining.
Let us take the same example and see how pipelining helps. Because all the commands are sent to the server in a single flush over the wire, there is only one round trip between the server and the client, so the total time is only a little more than 100 ms instead of 300 ms, a threefold improvement for these simple commands.
One thing to be aware of with pipelining is that the server is forced to queue the responses in memory. To prevent memory spikes, send a reasonable number of commands in each pipeline: for example, send a few hundred commands, read the responses, and then send the next batch in a new pipeline to keep memory usage in check. The performance will be almost the same.
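The batching just described can be sketched in Ruby. The helper name `import_in_batches` is hypothetical, and the sketch assumes a redis-rb client whose `pipelined` method yields a pipeline object:

```ruby
BATCH_SIZE = 500  # a few hundred commands per pipeline keeps memory in check

# Sketch: write a large set of key/value pairs through fixed-size
# pipelines, consuming each batch's replies before sending the next,
# so the server never queues too many responses at once.
def import_in_batches(redis, pairs, batch_size: BATCH_SIZE)
  pairs.each_slice(batch_size) do |batch|
    redis.pipelined do |pipe|
      batch.each { |key, value| pipe.set(key, value) }
    end # replies for this batch are read here, before the next batch
  end
end
```

With a live redis-rb connection, `import_in_batches(redis, big_hash.to_a)` would issue the writes in 500-command bursts.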
By reducing the number of roundtrips between the server and the clients, pipelining provides an efficient way of writing data into Redis at faster speeds.
There might be other situations where it is necessary to import millions of records into Redis in a very short span of time.
The second type of write imports millions of records in a short span of time. In this recipe, we will take a look at how to feed Redis with a huge amount of data as fast as possible. In such bulk imports, the data to be loaded is usually huge, involving millions of writes. Using a normal Redis client for the import is not a good idea, as sending the commands sequentially is slow and we pay the round-trip cost for every command. We could use pipelining, but a regular client would still have to build millions of commands in memory and read back all of their replies.
The most recommended way to mass-import a huge data set is to generate a text file with the commands in the Redis protocol format and use the file to import the data into Redis. The redis-cli interface provides a pipe mode to perform a bulk import from a raw file with commands as per Redis protocol specifications (http://redis.io/topics/protocol).
The protocol is simple and binary safe, and its format is as follows:
*<number of arguments> CR LF
$<number of bytes of argument 1> CR LF
<argument data> CR LF
...
$<number of bytes of argument N> CR LF
<argument data> CR LF

Here, CR is \r (ASCII character 13) and LF is \n (ASCII character 10).
For example, execute the following command:
SET samplekey testvalue
This command will look like the following in the raw file:
*3          - number of arguments: SET + key + value = 3
$3          - number of bytes in SET
SET
$9          - number of bytes in samplekey
samplekey
$9          - number of bytes in testvalue
testvalue
This translates to the following:
*3
$3
SET
$9
samplekey
$9
testvalue
Redis uses the same format for both its request and response. As the protocol itself is simple, a simple program can generate a text file with all the commands in raw format.
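Such a generator can be sketched in a few lines of Ruby; the function name `gen_redis_proto` is chosen here for illustration:

```ruby
# Build one command in the Redis protocol format described above.
# Every argument is forced to binary (.b) and measured with bytesize,
# which keeps the output binary safe.
def gen_redis_proto(*args)
  proto = "*#{args.length}\r\n".b
  args.each do |arg|
    arg = arg.to_s.b
    proto << "$#{arg.bytesize}\r\n" << arg << "\r\n"
  end
  proto
end

# For example, a file of one million SET commands could be written as:
# File.open("redis-data.txt", "wb") do |f|
#   1_000_000.times { |i| f.write(gen_redis_proto("SET", "key:#{i}", i)) }
# end
```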
Once the text file is generated, the data in it, say redis-data.txt with 1 million commands, can be imported into Redis using a single command, as follows:
cat redis-data.txt | redis-cli --pipe
After the execution, the output will look like the following:
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 1000000
The pipe mode not only sends the data to the server as fast as possible, but also reads and parses the server's replies as they become available. When it finds that there is no more data to send, it sends an ECHO command with a random 20-byte payload and then listens for the server's responses. When the server echoes those same 20 bytes back, redis-cli knows it has received the reply to the last command. Thanks to this trick, redis-cli does not need to know which commands, or how many, were sent to the server; by counting the replies, it can still give us a brief report on the status of the bulk import.
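The end-of-stream handshake can be sketched as follows; `end_marker_command` is a hypothetical helper that builds the terminating ECHO in the protocol format shown earlier:

```ruby
require "securerandom"

# Build the final ECHO command that pipe mode appends: a random
# 20-byte payload whose echo from the server marks the end of the
# reply stream. Returns the encoded command and the payload to match.
def end_marker_command
  payload = SecureRandom.bytes(20)  # binary-safe random marker
  header  = "*2\r\n$4\r\nECHO\r\n$#{payload.bytesize}\r\n".b
  [header + payload + "\r\n".b, payload]
end
```

The client would scan incoming replies for `payload`; once it appears, every earlier reply has been received and counted.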