Filtering datasets with $where

While we can filter datasets before we import them into Incanter, Incanter makes it easy to filter and create new datasets from the existing ones. We'll take a look at its query language in this recipe.

Getting ready

We'll use the same dependencies, imports, and data as we did in the Selecting columns with $ recipe.

How to do it…

Once we have the data, we query it using the $where function:

  1. For example, this creates a dataset with a row for the percentage of China's total land area that is used for agriculture:
    user=> (def land-use
             (i/$where {:Indicator-Code "AG.LND.AGRI.ZS"}
                       chn-data))
    user=> (i/nrow land-use)
    1
    user=> (i/$ [:Indicator-Code :2000] land-use)
    ("AG.LND.AGRI.ZS" "56.2891584865366")
  2. The queries can be more complicated too. This expression picks out the data that exists for 1962 by filtering any empty strings in that column:
    user=> (i/$ (range 5) [:Indicator-Code :1962]
             (i/$where {:1962 {:ne ""}} chn-data))
    
    |   :Indicator-Code |             :1962 |
    |-------------------+-------------------|
    |    AG.AGR.TRAC.NO |             55360 |
    |    AG.LND.AGRI.K2 |           3460010 |
    |    AG.LND.AGRI.ZS |  37.0949187612906 |
    |    AG.LND.ARBL.HA |         103100000 |
    | AG.LND.ARBL.HA.PC | 0.154858284392508 |

Incanter's query language is even more powerful than this, but these examples should show you the basic structure and give you an idea of the possibilities.

How it works…

To better understand how to use $where, let's break apart the last example:

($i/where {:1962 {:ne ""}} chn-data)

The query is expressed as a hashmap from fields to values (highlighted). As we saw in the first example, the value can be a raw value, either a literal or an expression. This tests for inequality.

($i/where {:1962 {:ne ""}} chn-data)

Each test pair is associated with a field in another hashmap (highlighted).

In this example, both the hashmaps shown only contain one key-value pair. However, they might contain multiple pairs, which will all be ANDed together.

Incanter supports a number of test operators. The basic boolean tests are :$gt (greater than), :$lt (less than), :$gte (greater than or equal to), :$lte (less than or equal to), :$eq (equal to), and :$ne (not equal). There are also some operators that take sets as parameters: :$in and :$nin (not in).

The last operator—:$fn—is interesting. It allows you to use any predicate function. For example, this will randomly select approximately half of the dataset:

(def random-half
  (i/$where {:Indicator-Code {:$fn (fn [_] (< (rand) 0.5))}}
            chnchn-data))

There's more…

For full details of the query language, see the documentation for incanter.core/query-dataset (http://liebke.github.com/incanter/core-api.html#incanter.core/query-dataset).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.31.165