Parsing CSV files with a datasource

This recipe shows how to parse comma-separated values (CSV) files using a datasource, showing a clean approach to CSV processing.

Getting ready

We start by installing CakePHP's datasources plugin. Download the latest release from http://github.com/mariano/datasources/downloads and uncompress the downloaded file into your app/plugins folder. You should now have a directory named datasources inside app/plugins.

The datasources plugin, located at http://github.com/cakephp/datasources, is an official CakePHP plugin that offers several community-provided datasources, such as XML-RPC and SOAP. This and other recipes use a customized version of the plugin, modified for the purpose of this book.

We need some sample data to work with. Create a file named contacts.csv and place it in a folder of your choice (such as /home/mariano), with contents similar to the ones shown below. This example includes only two rows of data, but the file used in this recipe uses several more rows, and should include the starting header row:

name,email,country,gender,age
"John Doe","[email protected]","United States of America","Male",34
"Jane Doe","[email protected]","United Kingdom","Female",25

How to do it...

  1. We start by creating a connection to use the CSV datasource. Open your app/config/database.php file and add the following connection:
    public $csv = array(
    'datasource' => 'datasources.CsvSource',
    'path' => '/home/mariano/',
    'readonly' => true
    );
    
  2. Create a model named Contact in a file named contact.php and place it in your app/models folder with the following contents:
    <?php
    class Contact extends AppModel
    {
    public $useDbConfig = 'csv';
    }
    ?>
    
  3. Create its controller in a file named contacts_controller.php and place it in your app/controllers folder with the following contents:
    <?php
    class ContactsController extends AppController
    {
    public function index()
    {
    $this->set('contacts', $this->paginate());
    }
    }
    ?>
    
  4. Finally, we need to create the view. Create a folder named contacts in your app/views folder, and in that folder create a file named index.ctp with the following contents:
    <p>
    <?php echo $this->Paginator->prev(); ?>&nbsp;
    <?php echo $this->Paginator->numbers(); ?>&nbsp;
    <?php echo $this->Paginator->next(); ?>
    </p>
    <table>
    <thead><tr>
    <th>ID</th>
    <th>Name</th>
    <th>Email</th>
    <th>Country</th>
    <th>Gender</th>
    <th>Age</th>
    </tr></thead>
    <tbody>
    <?php foreach($contacts as $contact) { ?>
    <tr>
    <td><?php echo $contact['id']; ?></td>
    <td><?php echo $contact['name']; ?></td>
    <td><?php echo $contact['email']; ?></td>
    <td><?php echo $contact['country']; ?></td>
    <td><?php echo $contact['gender']; ?></td>
    <td><?php echo $contact['age']; ?></td>
    </tr>
    <?php } ?>
    </tbody>
    </table>
    
    

    If we now browse to http://localhost/contacts, we should see a paginated list, as shown in the following screenshot:

    How to do it...

How it works...

We start by creating a new connection named csv, specifying datasources.CsvSource as its type, that is, a datasource named CsvSource that is a part of a plugin named datasources. We set the path to our CSV files to CakePHP's temporary directory using the path setting, and we specify that we don't want that path to be created if it doesn't exist, by setting readonly to true.

Note

The fork we are using in this recipe adds a feature to the original plugin: allowing one to change the CSV file used via the model property table

We then create the Contact model, specifying its underlying connection to be csv through the useDbConfig property. The CSV data source will then use the respective table name as the name of the file, attaching the csv extension to it. In this case, the CSV data source will use contacts for the Contact model, which can be changed through the model property table.

Using that file name, it will look for it in the path that was defined in the connection settings. If the file cannot be loaded, or if the path does not exist, it will throw a missing table error, just as any model with a missing table would.

Note

The default csv extension can be changed by specifying the extension setting in the connection.

Once the file is properly loaded, the datasource allows us to fetch records by issuing simple find() calls. It supports some of the most common find settings: limit, page, fields, and includes basic support for defining the setting conditions to limit the obtained records (see the There's more section below).

The rest of the recipe shows how we use our Contact model just as we would use any model, exemplifying this flexibility with a paginated list of parsed CSV records.

There's more...

Other than being able to define which page to obtain (through the page find setting) and how many records to obtain (using the limit find setting), the CSV datasource allows for some basic filtering, by means of the handy Set::matches() method. For example, we can modify our paginated list to obtain contacts whose ages are over 30, by adding the following conditions setting to our index() method:

public function index()
{
$this->paginate = array(
'conditions' => array('age >' => 30)
);
$this->set('contacts', $this->paginate());
}

Dynamic loading of CSV files

The example used in this recipe is bound to the file contacts.csv by means of the default table named for the Contact model, but what would be required if we needed to process several CSV files and we don't want to create a model for each of those files?

Using the table model property, we can dynamically change the underlying CSV file a model is importing from, and execute our find operations just as if we would've created a model specifically for this file. We start by creating a model that uses the csv connection, but that is not tied to any file:

<?php
class Csv extends AppModel
{
public $useDbConfig = 'csv';
public $useTable = false;
}
?>

Setting useTable to false allows us to avoid any file loading. We can then use the listSources() datasource method to obtain a list of all CSV files available for importing, and then dynamically change the table model property for each of those files, and fetch the actual records. We do this in the following controller method:

public function import()
{
$this->loadModel('Csv'),
$sources = array_flip($this->Csv->getDataSource()->listSources());
foreach($sources as $source => $null)
{
$this->Csv->table = $source;
$sources[$source] = $this->Csv->find('all'),
}
debug($sources);
exit;
}

The list of files obtained through the listSources() method is fetched from the path setting specified in the datasource configuration, as defined in app/config/database.php. This path can be changed by first cleaning up the current connection, which releases the handle to the previously configured path, by using the datasource setConfig() method to change the path setting, and then calling it's connect() method to load the path:

$dataSource = $this->Csv->getDataSource();
$dataSource->close();
$dataSource->setConfig(array('path' => '/home/john/'));
$dataSource->connect();
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.115.155