Using schema.ini to resolve data type issues

In this recipe, we will use a schema.ini file to resolve the data types when we connect to a fixed width text file data source with four columns.

Getting ready

To follow this recipe, download this chapter's files from the Packt website and use the file called Fixed Width Sample.txt.

This is what the file looks like when opened in a text editor showing special characters:

Getting ready

Note that this file does not have any column headers. In addition, note the following:

  • The first column should be text
  • The second column should be integers
  • The third column should be text
  • The fourth column should be dates

How to do it...

Here are the steps to use the .ini file with the fixed width text file:

  1. Create a text file with the following contents:
    How to do it...
  2. Save the file as schema.ini and save it in the same directory as the Fixed Width Sample.txt file.
  3. Connect to the text file in Tableau:
    How to do it...
  4. Confirm that there are four fields in the Tableau preview window, with the same configuration as specified in the schema.ini file:
    • First field is text
    • Second field is number
    • Third field is text
    • Fourth field is date
    How to do it...
  5. Add a new sheet and create your visualization using this data set.

How it works...

Connecting to text files can sometimes be more challenging than connecting to a database or server-based data source. Relational databases will typically have the data types and constraints built in. Tableau can read this metadata and interpret the correct types and settings for the data set.

Text files can be tricky. We usually need to identify delimiters (that is, how is one field separated from another). If we want headers, we will need to either manually assign them from within Tableau, or override them in a configuration file.

If we connect to the file in this recipe from Tableau without a configuration (or schema.ini) file, this is what we will get:

How it works...

There are a few things that are incorrect or missing:

  • The field names are missing.
  • The first field contains a null value for the very last record, because Tableau assumes this field is numeric based on the first few rows. The last record has an alphanumeric value of C160, which is invalid for a numeric field. Show how to change with just default text driver properties clicking on the dropdown.
  • The second field is interpreted as a date because the values, while numeric, can assume the format of yyyymmdd.

Tableau does allow us some flexibility when working with text files. When you click on the dropdown for the text file, there is an option for Text File Properties:

How it works...

This provides another window that allows us to specify the field separators, text qualifiers (that is, characters that enclose text values), character set, and locale:

How it works...

This still makes working with fixed width files without column headers a challenge. Microsoft recommends using schema.ini for all fixed length files. schema.ini provides a way to specify the data types and other configurations for the text file that Tableau can read. It does not solve all cases, but it can help with some.

Note

The format, supported fields, and options for schema.ini are documented in the MSDN page called the schema.ini File (Text File Driver), which can be found at http://bit.ly/msdn-schema-ini.

Tableau also has a KB article called Resolving Incorrect Data Type Issues Related to Jet, which can be found at the following URL: http://bit.ly/tableau-jet-engine.

There's more...

What we used in this recipe is one of the simpler text files that can be cleaned up using a schema.ini file. In reality, there are many limitations.

If we had spaces in the third column, for example, if record #2's name is Harris Jr, this is what we will get in Tableau even if we specified the width of the string in the schema.ini file:

There's more...

What if the date format was yyyy-dd-mm and we specified it in the schema.ini like this:

There's more...

Tableau still uses the date format yyyy-mm-dd and ignores the specification in the schema.ini file:

There's more...

There are other variations that demonstrate the limitations of schema.ini. Sometimes, the best way to approach data wrangling problems is to either export to another format that Tableau can more easily read, or to resort to other tools, or even scripting. For example, Python, R, or even PowerShell are great, powerful scripting tools that can give you much more flexibility with how to shape your data.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.140.195.225