Chapter 8. Learning from User Feedback

PredictionIO is an open source machine learning server written in Scala. Under the hood, it uses Apache Spark and/or Apache Mahout for its machine learning algorithms, and HBase, Elasticsearch, and other databases for data storage. PredictionIO provides a developer API that lets us create our own custom algorithms, evaluate them, and deploy them quickly.

We will discuss the following topics:

  • Case study: PredictionIO
  • Hybrid recommender: PredictionIO unified recommender

Introducing PredictionIO

PredictionIO is targeted at developers and data scientists because it can help with the following tasks:

  • Connecting the different pieces of a complete data processing pipeline into one coherent system
  • Prototyping predictive models
  • Training, persisting, and deploying predictive models in a distributed environment
  • Changing configurations and updating models without downtime

How does PredictionIO achieve all this, you may ask? The answer lies in its architecture, which is called DASE. It consists of the following components:

  • Data, which includes the data source and the DataPreparator
  • Algorithm(s)
  • Serving
  • Evaluator

In the next figure, we illustrate how the data, algorithm, serving, and evaluator components fit together in the whole data processing pipeline. The event server passes events to the data source. These events are the real data that we want to learn from. The DataPreparator transforms them into a representation that can then be passed on to different learning algorithms.

Once the models are trained, they are persisted to storage and can be used either for experimental evaluation or in production. The serving layer queries the different algorithms and produces a prediction for each prediction query.
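The flow just described can be sketched in a few lines of Scala. This is a minimal, hypothetical illustration, not PredictionIO's actual engine API: the traits `DataSource`, `Preparator`, `Algorithm`, and `Serving`, the `Event`, `Query`, and `Prediction` case classes, and the toy popularity algorithm are all assumptions made for this example. The trained model is kept in memory rather than persisted to storage.

```scala
// Hypothetical sketch of the DASE roles; NOT PredictionIO's real engine API.

// Raw event as it might arrive from the event server (assumed shape).
case class Event(user: String, item: String, action: String)
case class Query(user: String, num: Int)
case class Prediction(items: Seq[String])

// D: the data source reads events; the preparator reshapes them for training.
trait DataSource[TD] { def readTraining(): TD }
trait Preparator[TD, PD] { def prepare(td: TD): PD }

// A: an algorithm trains a model and answers queries with that model.
trait Algorithm[PD, M] {
  def train(pd: PD): M
  def predict(model: M, q: Query): Prediction
}

// S: serving turns the predictions of one or more algorithms into one answer.
trait Serving { def serve(q: Query, predictions: Seq[Prediction]): Prediction }

class InMemoryDataSource(events: Seq[Event]) extends DataSource[Seq[Event]] {
  def readTraining(): Seq[Event] = events
}

// Keeps only "buy" events, as (user, item) pairs.
object BuyPreparator extends Preparator[Seq[Event], Seq[(String, String)]] {
  def prepare(td: Seq[Event]): Seq[(String, String)] =
    td.collect { case Event(u, i, "buy") => (u, i) }
}

// Toy model: number of purchases per item; recommends the most-bought items.
object PopularityAlgorithm extends Algorithm[Seq[(String, String)], Map[String, Int]] {
  def train(pd: Seq[(String, String)]): Map[String, Int] =
    pd.groupBy(_._2).map { case (item, pairs) => item -> pairs.size }
  def predict(model: Map[String, Int], q: Query): Prediction =
    Prediction(model.toSeq.sortBy { case (item, n) => (-n, item) }.take(q.num).map(_._1))
}

// With a single algorithm, serving simply returns its prediction.
object FirstServing extends Serving {
  def serve(q: Query, predictions: Seq[Prediction]): Prediction = predictions.head
}

val events = Seq(
  Event("u1", "i1", "buy"), Event("u2", "i1", "buy"),
  Event("u2", "i2", "buy"), Event("u3", "i3", "view"))

val prepared = BuyPreparator.prepare(new InMemoryDataSource(events).readTraining())
val model = PopularityAlgorithm.train(prepared) // a real engine would persist this
val query = Query("u3", 2)
val result = FirstServing.serve(query, Seq(PopularityAlgorithm.predict(model, query)))
println(result.items.mkString(", ")) // i1, i2
```

Splitting the work this way means each role can be replaced independently; for example, a different `Algorithm` implementation could be trained on the same prepared data without touching the data source or the serving code.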

For evaluation, though, we need the evaluation layer, which uses the serving layer to generate predictions for test queries.
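Here is a minimal Scala sketch of that idea. It is a hypothetical illustration, not PredictionIO's evaluation API: `evaluate` calls a serving function on each held-out test query and averages a metric over the results. The metric (a simple precision, that is, the fraction of recommended items the user actually bought), the `servePopular` stand-in, and the test data are all assumptions made for this example.

```scala
// Hypothetical sketch of an evaluation layer; NOT PredictionIO's evaluation API.
case class Query(user: String, num: Int)
case class Prediction(items: Seq[String])

// Fraction of recommended items that the user actually interacted with.
def precision(p: Prediction, actual: Set[String]): Double =
  if (p.items.isEmpty) 0.0
  else p.items.count(actual.contains).toDouble / p.items.size

// Runs the serving function on each test query and averages the metric.
def evaluate(
    serve: Query => Prediction,
    testSet: Seq[(Query, Set[String])],
    metric: (Prediction, Set[String]) => Double): Double = {
  require(testSet.nonEmpty, "test set must not be empty")
  testSet.map { case (q, actual) => metric(serve(q), actual) }.sum / testSet.size
}

// Stand-in for a deployed engine: always recommends the same popular items.
val servePopular: Query => Prediction =
  q => Prediction(Seq("i1", "i2", "i3").take(q.num))

// Held-out queries paired with the items each user really bought.
val testSet = Seq(
  Query("u1", 2) -> Set("i1"),       // 1 of 2 hits -> 0.5
  Query("u2", 2) -> Set("i1", "i2"), // 2 of 2 hits -> 1.0
  Query("u3", 2) -> Set("i3"))       // 0 of 2 hits -> 0.0

val score = evaluate(servePopular, testSet, precision)
println(s"mean precision = $score") // mean precision = 0.5
```

Because `evaluate` depends only on the serving function, another engine configuration could be scored on the same test set and the two scores compared.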

Figure: The architectural diagram of the DASE components in PredictionIO

You may want to look at its documentation (https://prediction.io/whatispredictionio) for more details. Here, we present a summary:

  • Event-based: Events are the source of input to the learning algorithms
  • DASE: This gives us a complete pipeline
  • Swap and evaluate: The algorithm evaluation step is integrated with the pipeline
  • Template Gallery: There are many ready-to-use algorithms in the Template Gallery

PredictionIO's Template Gallery also offers a broad range of prebuilt engines that you can use. These can get you started quickly on your next scalable machine learning project.

Figure: A pipeline view of the PredictionIO engine

Let's now move on to some real examples. We begin by installing PredictionIO and setting it up.

Installing PredictionIO

Installing PredictionIO is very straightforward. A shell script performs the entire installation for us, and because cURL fetches it on the fly, we don't even need to download the script first. Here is how to do it:

$ bash -c "$(curl -s https://install.prediction.io/install.sh)"
Welcome to PredictionIO 0.9.4!
Installation path (/home/tuxdna/PredictionIO): 
Vendor path (/home/tuxdna/PredictionIO/vendors): 
Please choose between the following sources (1, 2 or 3):
1) PostgreSQL
2) MySQL
3) Elasticsearch + HBase
#? 3
Receive updates? [Y/n] n
--------------------------------------------------------------------------------
OK, looks good!
You are going to install PredictionIO to: /home/tuxdna/PredictionIO
Vendor applications will go in: /home/tuxdna/PredictionIO/vendors

Spark: /home/tuxdna/PredictionIO/vendors/spark-1.4.1
Elasticsearch: /home/tuxdna/PredictionIO/vendors/elasticsearch-1.4.4
HBase: /home/tuxdna/PredictionIO/vendors/hbase-1.0.0
ZooKeeper: /home/tuxdna/PredictionIO/vendors/zookeeper
Select your linux distribution:
1) Debian/Ubuntu
2) Other
#? 1
Would you like to install Java? [Y/n] n
Locating JAVA_HOME...
Found: /usr/lib/jvm/java-7-oracle
--------------------------------------------------------------------------------
Installation of PredictionIO 0.9.4 complete!
... OUTPUT SKIPPED ... 

Now that we have installed all the required dependencies, we need to make sure that the PredictionIO tools are on the PATH so that we can use them. To do that, we append their directory to the PATH in ~/.bashrc:

$ echo 'export PATH=$PATH:'"$HOME/PredictionIO/bin" >> ~/.bashrc

Now open a new terminal, so that the updated ~/.bashrc takes effect, and start all the required services:

$ pio-start-all
Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/tuxdna/PredictionIO/vendors/hbase-1.0.0/bin/../logs/hbase-tuxdna-master-matrix02.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...

Note that there should be no incompatible versions of Hadoop tools on the PATH; otherwise, HBase may not start up properly.
