PredictionIO is an open source machine learning server written in Scala. Under the hood, it uses Apache Spark and/or Apache Mahout for its machine learning algorithms, and it stores data in HBase, Elasticsearch, and other databases. PredictionIO also provides a developer API with which we can create our own custom algorithms, evaluate them, and deploy them in a fraction of the time it would otherwise take.
We will discuss the following topics:
PredictionIO is targeted at developers and data scientists, specifically because it can help with the following tasks:
How does PredictionIO achieve all this, you may ask? The answer lies in its architecture, which is called DASE (Data, Algorithm, Serving, Evaluator). It consists of the following components:
The next figure illustrates how the data, algorithm, serving, and evaluator components fit together in the whole data processing pipeline. The Event Server passes events to the data source. These events are the actual data that we want to learn from. The DataPreparator transforms them into a representation that can then be passed on to the different learning algorithms.
Once the models are trained, they are persisted to storage and can be used either for experimental evaluation or for actual production usage. The serving layer queries the different algorithms and produces a prediction for each prediction query.
For evaluation though, we need the evaluation layer, which uses the serving layer to generate predictions for test queries.
The architectural diagram of DASE components in PredictionIO:
You may want to look at its documentation (https://prediction.io/whatispredictionio) for more details. However, we will present you with a summary here:
There is also a broad range of pre-built engines that you can use from PredictionIO's Template Gallery. These templates are a convenient starting point for your next scalable machine learning project.
Here is a pipeline view of the PredictionIO engine:
Let's now move on to some real examples. We first begin by installing PredictionIO and setting it up.
Installing PredictionIO is very straightforward. We can use a shell script that does all the installation for us, and using cURL we don't even need to download that script. Here is how you do it:
$ bash -c "$(curl -s https://install.prediction.io/install.sh)"
Welcome to PredictionIO 0.9.4!
Installation path (/home/tuxdna/PredictionIO):
Vendor path (/home/tuxdna/PredictionIO/vendors):
Please choose between the following sources (1, 2 or 3):
1) PostgreSQL
2) MySQL
3) Elasticsearch + HBase
#? 3
Receive updates? [Y/n] n
--------------------------------------------------------------------------------
OK, looks good!
You are going to install PredictionIO to: /home/tuxdna/PredictionIO
Vendor applications will go in: /home/tuxdna/PredictionIO/vendors
Spark: /home/tuxdna/PredictionIO/vendors/spark-1.4.1
Elasticsearch: /home/tuxdna/PredictionIO/vendors/elasticsearch-1.4.4
HBase: /home/tuxdna/PredictionIO/vendors/hbase-1.0.0
ZooKeeper: /home/tuxdna/PredictionIO/vendors/zookeeper
Select your linux distribution:
1) Debian/Ubuntu
2) Other
#? 1
Would you like to install Java? [Y/n] n
Locating JAVA_HOME...
Found: /usr/lib/jvm/java-7-oracle
--------------------------------------------------------------------------------
Installation of PredictionIO 0.9.4 complete!
... OUTPUT SKIPPED ...
Now that we have installed all the required dependencies, we also need to make sure that the PredictionIO tools are on the PATH so that we can use them. To do that, we append the PredictionIO bin directory to PATH in ~/.bashrc:
$ echo 'export PATH=$PATH:'"$HOME/PredictionIO/bin" >> ~/.bashrc
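The command above appends a new line to ~/.bashrc every time it is run, so repeating it leaves duplicate entries behind. As a minimal sketch, the hypothetical helper below (add_to_path_once is not part of PredictionIO) writes the same export line only if it is not already present in the given file:

```shell
# Hypothetical helper, not part of PredictionIO: append a directory to PATH
# in a shell startup file, but only if the exact export line is not there yet.
add_to_path_once() {
    dir="$1"       # directory to add, for example "$HOME/PredictionIO/bin"
    rc_file="$2"   # startup file to modify, for example "$HOME/.bashrc"
    line="export PATH=\$PATH:$dir"

    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -e "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

It would be called as add_to_path_once "$HOME/PredictionIO/bin" "$HOME/.bashrc", and calling it a second time leaves the file unchanged.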
Now, open a new terminal and then start all the services required:
$ pio-start-all
Starting Elasticsearch...
Starting HBase...
starting master, logging to /home/tuxdna/PredictionIO/vendors/hbase-1.0.0/bin/../logs/hbase-tuxdna-master-matrix02.out
Waiting 10 seconds for HBase to fully initialize...
Starting PredictionIO Event Server...
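Once the services report that they have started, it can be useful to confirm that the Event Server is actually answering requests. The sketch below assumes that the Event Server listens on http://localhost:7070 and replies with a small JSON status containing the word alive. Both are assumptions about a default setup, and check_event_server is a hypothetical helper name:

```shell
# Assumption: the Event Server listens on localhost:7070 in a default setup.
EVENT_SERVER_URL="${EVENT_SERVER_URL:-http://localhost:7070}"

# Hypothetical helper: query the given URL and report whether the reply
# looks like a healthy Event Server status (assumed to contain "alive").
check_event_server() {
    url="$1"
    # --max-time keeps the check from hanging if nothing is listening
    response="$(curl -s --max-time 5 "$url" 2>/dev/null)" || response=""
    case "$response" in
        *alive*) echo "Event Server is up: $response" ;;
        *)       echo "Event Server did not respond at $url" ;;
    esac
}

check_event_server "$EVENT_SERVER_URL"
```

If the second message appears, the logs under the vendors directory shown in the output above are a reasonable place to look next.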
Note that you should not have any incompatible versions of Hadoop tools on the path. If there is an incompatible version of Hadoop tools, then HBase may not start up properly.
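A quick way to see whether any Hadoop tools are reachable through the PATH is to ask the shell where it would find them. The following is a minimal sketch. The tool names it checks (hadoop, hdfs, yarn) are an assumption about what a typical Hadoop installation provides, and find_hadoop_tools is a hypothetical helper name:

```shell
# Hypothetical helper: print every Hadoop command-line tool that the shell
# can currently resolve through PATH, together with its location.
find_hadoop_tools() {
    # Assumed tool names; adjust the list to match your Hadoop installation.
    for tool in hadoop hdfs yarn; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s -> %s\n' "$tool" "$(command -v "$tool")"
        fi
    done
}

find_hadoop_tools
```

If this prints nothing, no Hadoop tools are on the PATH. If it prints one or more locations, those are the installations to check for compatibility before starting HBase.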