In this recipe, we will use a schema.ini
file to resolve the data types when we connect to a fixed width text file data source with four columns.
To follow this recipe, download this chapter's files from the Packt website and use the file called Fixed Width Sample.txt
.
This is what the file looks like when opened in a text editor showing special characters:
Note that this file does not have any column headers. In addition, note the following:
Here are the steps to use the .ini
file with the fixed width text file:
schema.ini
and save it in the same directory as the Fixed Width Sample.txt
file.schema.ini
file:Connecting to text files can sometimes be more challenging than connecting to a database or server-based data source. Relational databases will typically have the data types and constraints built in. Tableau can read this metadata and interpret the correct types and settings for the data set.
Text files can be tricky. We usually need to identify delimiters (that is, how is one field separated from another). If we want headers, we will need to either manually assign them from within Tableau, or override them in a configuration file.
If we connect to the file in this recipe from Tableau without a configuration (or schema.ini
) file, this is what we will get:
There are a few things that are incorrect or missing:
Tableau does allow us some flexibility when working with text files. When you click on the dropdown for the text file, there is an option for Text File Properties:
This provides another window that allows us to specify the field separators, text qualifiers (that is, characters that enclose text values), character set, and locale:
This still makes working with fixed width files without column headers a challenge. Microsoft recommends using schema.ini
for all fixed length files. schema.ini
provides a way to specify the data types and other configurations for the text file that Tableau can read. It does not solve all cases, but it can help with some.
The format, supported fields, and options for schema.ini
are documented in the MSDN page called the schema.ini
File (Text File Driver), which can be found at http://bit.ly/msdn-schema-ini.
Tableau also has a KB article called Resolving Incorrect Data Type Issues Related to Jet, which can be found at the following URL: http://bit.ly/tableau-jet-engine.
What we used in this recipe is one of the simpler text files that can be cleaned up using a schema.ini
file. In reality, there are many limitations.
If we had spaces in the third column, for example, if record #2's name is Harris Jr, this is what we will get in Tableau even if we specified the width of the string in the schema.ini
file:
What if the date format was yyyy-dd-mm
and we specified it in the schema.ini
like this:
Tableau still uses the date format yyyy-mm-dd
and ignores the specification in the schema.ini
file:
There are other variations that demonstrate the limitations of schema.ini
. Sometimes, the best way to approach data wrangling problems is to either export to another format that Tableau can more easily read, or to resort to other tools, or even scripting. For example, Python, R, or even PowerShell are great, powerful scripting tools that can give you much more flexibility with how to shape your data.
3.140.195.225