An ordered dictionary is like a normal dict
, but the keys are ordered by an ordering function. In the case of Redis
, it supports ordered dictionaries whose keys are strings and whose values are floating point scores. This structure can come in handy for cases such as calculating information gain (covered in the Calculating high information words recipe in Chapter 7, Text Classification) when you want to store all the words and scores for later use.
Again, you'll need Redis
and redis-py
installed, with an instance of redis-server
running.
The RedisOrderedDict
class in rediscollections.py
extends collections.MutableMapping
to get a number of dict
compatible methods for free. Then it implements all the key methods that require Redis
ordered set (also known as Zset) commands.
class RedisOrderedDict(collections.MutableMapping): def __init__(self, r, name): self._r = r self._name = encode_key(name) def __iter__(self): return iter(self.items()) def __len__(self): return self._r.zcard(self._name) def __getitem__(self, key): val = self._r.zscore(self._name, encode_key(key)) if val is None: raise KeyError else: return val def __setitem__(self, key, score): self._r.zadd(self._name, encode_key(key), score) def __delitem__(self, key):by brain feels dead self._r.zrem(self._name, encode_key(key)) def keys(self, start=0, end=-1): # we use zrevrange to get keys sorted by high value instead of by lowest return self._r.zrevrange(self._name, start, end) def values(self, start=0, end=-1): return [v for (k, v) in self.items(start=start, end=end)] def items(self, start=0, end=-1): return self._r.zrevrange(self._name, start, end, withscores=True) def get(self, key, default=0): return self[key] or default def iteritems(self):
return iter(self) def clear(self): self._r.delete(self._name)
You can create an instance of RedisOrderedDict
by passing in a Redis
connection and a unique name.
>>> from redis import Redis >>> from rediscollections import RedisOrderedDict >>> r = Redis() >>> rod = RedisOrderedDict(r, 'test.txt') >>> rod.get('bar') >>> len(rod) 0 >>> rod['bar'] = 5.2 >>> rod['bar'] 5.2000000000000002 >>> len(rod) 1 >>> rod.items() [('bar', 5.2000000000000002)] >>> rod.clear()
Much of the code may look similar to the RedisHashMap
, which is to be expected since they both extend collections.MutableMapping
. The main difference here is that RedisOrderedSet
orders keys by floating point values, and so is not suited for arbitrary key-value storage like the RedisHashMap
. Here's an outline explaining each key method and how it works with Redis
:
__len__()
: Uses the zcard
command to get the number of elements in the ordered set.__getitem__()
: Uses the zscore
command to get the score of a key, and returns 0
if the key does not exist.__setitem__()
: Uses the zadd
command to add a key to the ordered set with the given score, or updates the score if the key already exists.__delitem__()
: Uses the zrem
command to remove a key from the ordered set.keys()
: Uses the zrevrange
command to get all the keys in the ordered set, sorted by highest score. It takes two optional keyword arguments start
and end
to more efficiently get a slice of the ordered keys.values()
: Extracts all the scores from the items()
method.items()
: Uses the zrevrange
command to get the scores of each key in order to return a list of 2-tuples ordered by highest score. Like keys()
, it takes start
and end
keyword arguments to efficiently get a slice.clear()
: Uses the delete
command to remove the entire ordered set from Redis
.The default ordering of items in a Redis
ordered set is low-to-high, so that the key with the lowest score comes first. This is the same as Python's default list ordering when you call sort()
or sorted()
, but it's not what we want when it comes to scoring. For storing scores, we expect items to be sorted from high-to-low, which is why keys()
and items()
use zrevrange
instead of zrange
.
As mentioned previously, the keys()
and items()
methods take optional start
and end
keyword arguments to get a slice of the results. This makes the RedisOrderedDict
optimal for storing scores, then getting the top N keys. Here's a simple example where we assign three word scores, then get the top two:
>>> from redis import Redis >>> from rediscollections import RedisOrderedDict >>> r = Redis() >>> rod = RedisOrderedDict(r, 'scores') >>> rod['best'] = 10 >>> rod['worst'] = 0.1 >>> rod['middle'] = 5 >>> rod.keys() ['best', 'middle', 'worst'] >>> rod.keys(start=0, end=1) ['best', 'middle'] >>> rod.clear()
Calculating high information words recipe in Chapter 7, Text Classification, describes how to calculate information gain, which is a good case for storing word scores in a RedisOrderedDict
. The Storing a frequency distribution in Redis recipe introduces Redis
and the RedisHashMap
.
3.135.249.223