This recipe shows how to parse comma-separated values (CSV) files using a datasource, showing a clean approach to CSV processing.
We start by installing CakePHP's datasources plugin. Download the latest release from http://github.com/mariano/datasources/downloads and uncompress the downloaded file into your app/plugins
folder. You should now have a directory named datasources
inside app/plugins
.
The datasources plugin, located at http://github.com/cakephp/datasources, is an official CakePHP plugin that offers several community-provided datasources, such as XML-RPC and SOAP. This and other recipes use a customized version of the plugin, modified for the purpose of this book.
We need some sample data to work with. Create a file named contacts.csv
and place it in a folder of your choice (such as /home/mariano
), with contents similar to the ones shown below. This example includes only two rows of data, but the file used in this recipe uses several more rows, and should include the starting header row:
name,email,country,gender,age "John Doe","[email protected]","United States of America","Male",34 "Jane Doe","[email protected]","United Kingdom","Female",25
app/config/database.php
file and add the following connection:public $csv = array( 'datasource' => 'datasources.CsvSource', 'path' => '/home/mariano/', 'readonly' => true );
Contact
in a file named contact.php
and place it in your app/models
folder with the following contents:<?php class Contact extends AppModel { public $useDbConfig = 'csv'; } ?>
contacts_controller.php
and place it in your app/controllers
folder with the following contents:<?php class ContactsController extends AppController { public function index() { $this->set('contacts', $this->paginate()); } } ?>
contacts
in your app/views
folder, and in that folder create a file named index.ctp
with the following contents:<p> <?php echo $this->Paginator->prev(); ?> <?php echo $this->Paginator->numbers(); ?> <?php echo $this->Paginator->next(); ?> </p> <table> <thead><tr> <th>ID</th> <th>Name</th> <th>Email</th> <th>Country</th> <th>Gender</th> <th>Age</th> </tr></thead> <tbody> <?php foreach($contacts as $contact) { ?> <tr> <td><?php echo $contact['id']; ?></td> <td><?php echo $contact['name']; ?></td> <td><?php echo $contact['email']; ?></td> <td><?php echo $contact['country']; ?></td> <td><?php echo $contact['gender']; ?></td> <td><?php echo $contact['age']; ?></td> </tr> <?php } ?> </tbody> </table>
If we now browse to http://localhost/contacts
, we should see a paginated list, as shown in the following screenshot:
We start by creating a new connection named csv
, specifying datasources.CsvSource
as its type, that is, a datasource named CsvSource
that is a part of a plugin named datasources
. We set the path to our CSV files to CakePHP's temporary directory using the path
setting, and we specify that we don't want that path to be created if it doesn't exist, by setting readonly
to true.
The fork we are using in this recipe adds a feature to the original plugin: allowing one to change the CSV file used via the model property table
We then create the Contact
model, specifying its underlying connection to be csv
through the useDbConfig
property. The CSV data source will then use the respective table name as the name of the file, attaching the csv
extension to it. In this case, the CSV data source will use contacts for the Contact
model, which can be changed through the model property table.
Using that file name, it will look for it in the path that was defined in the connection settings. If the file cannot be loaded, or if the path does not exist, it will throw a missing table error, just as any model with a missing table would.
Once the file is properly loaded, the datasource allows us to fetch records by issuing simple find()
calls. It supports some of the most common find settings: limit, page, fields
, and includes basic support for defining the setting conditions
to limit the obtained records (see the There's more section below).
The rest of the recipe shows how we use our Contact
model just as we would use any model, exemplifying this flexibility with a paginated list of parsed CSV records.
Other than being able to define which page to obtain (through the page
find setting) and how many records to obtain (using the limit
find setting), the CSV datasource allows for some basic filtering, by means of the handy Set::matches()
method. For example, we can modify our paginated list to obtain contacts whose ages are over 30, by adding the following conditions
setting to our index()
method:
public function index()
{
$this->paginate = array(
'conditions' => array('age >' => 30)
);
$this->set('contacts', $this->paginate());
}
The example used in this recipe is bound to the file contacts.csv
by means of the default table named for the Contact
model, but what would be required if we needed to process several CSV files and we don't want to create a model for each of those files?
Using the table
model property, we can dynamically change the underlying CSV file a model is importing from, and execute our find
operations just as if we would've created a model specifically for this file. We start by creating a model that uses the csv
connection, but that is not tied to any file:
<?php class Csv extends AppModel { public $useDbConfig = 'csv'; public $useTable = false; } ?>
Setting useTable
to false
allows us to avoid any file loading. We can then use the listSources()
datasource method to obtain a list of all CSV files available for importing, and then dynamically change the table
model property for each of those files, and fetch the actual records. We do this in the following controller
method:
public function import() { $this->loadModel('Csv'), $sources = array_flip($this->Csv->getDataSource()->listSources()); foreach($sources as $source => $null) { $this->Csv->table = $source; $sources[$source] = $this->Csv->find('all'), } debug($sources); exit; }
The list of files obtained through the listSources()
method is fetched from the path
setting specified in the datasource configuration, as defined in app/config/database.php
. This path can be changed by first cleaning up the current connection, which releases the handle to the previously configured path, by using the datasource setConfig()
method to change the path
setting, and then calling it's connect()
method to load the path:
$dataSource = $this->Csv->getDataSource(); $dataSource->close(); $dataSource->setConfig(array('path' => '/home/john/')); $dataSource->connect();
18.188.115.155