Alternatively, we can load the data from a database using one of the JDBC data models. In this chapter, we will not go into the details of setting up a database, its connections, and so on, but we will sketch how this can be done.
Database connectors have been moved to a separate package, mahout-integration; hence, we have to add the package to our dependency list. Open the pom.xml file and add the following dependency:
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-integration</artifactId>
  <version>0.7</version>
</dependency>
Suppose we want to connect to a MySQL database. In this case, we also need a package that handles database connections. Add the following to the pom.xml file:
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>5.1.35</version>
</dependency>
Now that we have all of the packages, we can create a connection. First, let's initialize a DataSource object with the connection details, as follows:
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

// Connection details for the MySQL server
MysqlDataSource dbsource = new MysqlDataSource();
dbsource.setUser("user");
dbsource.setPassword("pass");
dbsource.setServerName("hostname.com");
dbsource.setDatabaseName("db");
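For orientation, the connection details set above carry the same information as a JDBC connection URL. The following is only a minimal sketch: the helper class JdbcUrlSketch and its method are hypothetical (they are not part of Mahout or the MySQL connector), and the port 3306 is an assumption, namely MySQL's default, since the text does not set a port.

```java
public class JdbcUrlSketch {
    // Hypothetical helper: assembles a MySQL JDBC URL from its parts.
    public static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // 3306 is assumed here as MySQL's default port.
        System.out.println(mysqlUrl("hostname.com", 3306, "db"));
    }
}
```

The user name and password are deliberately kept out of the URL; with a DataSource they are supplied through the setters instead.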
The mahout-integration package provides JDBCDataModel implementations for various databases that can be accessed via JDBC. By default, such a class assumes that a DataSource is available under the JNDI name jdbc/taste, which gives access to a database with a taste_preferences table that has the following schema:
CREATE TABLE taste_preferences (
  user_id BIGINT NOT NULL,
  item_id BIGINT NOT NULL,
  preference REAL NOT NULL,
  PRIMARY KEY (user_id, item_id)
);
CREATE INDEX taste_preferences_user_id_index ON taste_preferences (user_id);
CREATE INDEX taste_preferences_item_id_index ON taste_preferences (item_id);
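To make the schema more concrete, each row is a (user_id, item_id, preference) triple, and the composite primary key means that a user has at most one preference per item. The following plain-Java sketch mirrors that structure in memory. It is only an illustration under stated assumptions: the class PreferenceTableSketch and its methods are hypothetical and are not how Mahout stores or reads the data.

```java
import java.util.HashMap;
import java.util.Map;

public class PreferenceTableSketch {
    // user_id -> (item_id -> preference), mirroring PRIMARY KEY (user_id, item_id).
    private final Map<Long, Map<Long, Float>> byUser = new HashMap<>();

    // Stores one row. A repeated (user, item) pair overwrites the old value here,
    // whereas a plain SQL INSERT of a duplicate key would be rejected.
    public void put(long userId, long itemId, float preference) {
        byUser.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, preference);
    }

    // Returns the stored preference, or null if the pair has no row.
    public Float get(long userId, long itemId) {
        Map<Long, Float> items = byUser.get(userId);
        return items == null ? null : items.get(itemId);
    }

    // Number of items rated by a user; the kind of lookup the user_id index serves.
    public int itemCount(long userId) {
        Map<Long, Float> items = byUser.get(userId);
        return items == null ? 0 : items.size();
    }

    public static void main(String[] args) {
        PreferenceTableSketch table = new PreferenceTableSketch();
        table.put(1L, 10L, 4.5f);
        table.put(1L, 11L, 3.0f);
        table.put(2L, 10L, 2.0f);
        System.out.println(table.get(1L, 10L));
        System.out.println(table.itemCount(1L));
    }
}
```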
A database-backed data model is initialized from the DataSource object. In addition, we can specify a custom table name and custom column names. Note that the last argument names a timestamp column, which the schema shown earlier does not define:
DataModel dataModel = new MySQLJDBCDataModel(dbsource,
"taste_preferences", "user_id", "item_id", "preference", "timestamp");