Data structure

Modeler can read data from a variety of file formats. If you take a look at the Sources palette, you will see the data import nodes, which read text files, databases, IBM SPSS Statistics or Cognos, SAS, and Excel files, among many others. In fact, unlike programs such as Microsoft Excel or IBM SPSS Statistics, where you can enter data manually, Modeler requires you to read data into the software from an external file. In this chapter, the focus will be on reading free-field text files, which are a common file type. We will also briefly show you the options for reading data from two other commonly used sources, Microsoft Excel and ODBC databases.

The following figure depicts the typical data structure used in Modeler. In general, Modeler uses a data structure in which the rows of a table represent cases and the columns represent variables. In this figure, each row represents a person and each column represents a demographic characteristic of that person. For example, the first row holds the data for the person with ID 1001. We can see that this person is 73 years old, is a high school graduate, and so on.
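As a rough stand-in for that layout outside Modeler, the sketch below reads a small delimited text file with Python's standard `csv` module, so that each record is one case and each field is one variable. Only ID 1001, age 73, and "High school graduate" come from the text; the other rows, the column names, and the comma delimiter are assumptions made for illustration.

```python
import csv
import io

# Hypothetical comma-delimited text. Only the first data row (ID 1001, age 73,
# high school graduate) is taken from the chapter; the rest is invented.
raw_text = """ID,Age,Education
1001,73,High school graduate
1002,45,College graduate
1003,31,Some college
"""

# Each row read from the file is one case (a person); each named field is
# one variable (a demographic characteristic).
reader = csv.DictReader(io.StringIO(raw_text))
records = list(reader)
fields = reader.fieldnames

first = records[0]
print(fields)
print(first["ID"], first["Age"], first["Education"])
```

Note that the `csv` module returns every value as a string, so a field such as `Age` would still need converting before any numeric work.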

When data is in this format, it is very important that each row in a file has the same unit of analysis. For example, in a file where each record or row of data represents a unique employee, the employee is the unit of analysis. In another file, each row may represent a department; in this case, the department is the unit of analysis. If a project requires merging these two files, you might need to aggregate the employee information first so that each row represents a department.
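The aggregate-then-merge idea can be sketched in plain Python, independent of Modeler. Everything here is hypothetical: the field names (`employee_id`, `department`, `salary`, `location`), the sample values, and the choice of headcount and mean salary as the summary statistics.

```python
from collections import defaultdict

# Employee-level file: the unit of analysis is the employee (invented data).
employees = [
    {"employee_id": 1, "department": "Sales", "salary": 50000},
    {"employee_id": 2, "department": "Sales", "salary": 60000},
    {"employee_id": 3, "department": "IT", "salary": 70000},
]

# Department-level file: the unit of analysis is the department (invented data).
departments = [
    {"department": "Sales", "location": "Boston"},
    {"department": "IT", "location": "Austin"},
]


def aggregate_by_department(rows):
    """Collapse employee rows into one summary per department."""
    salaries = defaultdict(list)
    for row in rows:
        salaries[row["department"]].append(row["salary"])
    return {
        dept: {"headcount": len(values), "mean_salary": sum(values) / len(values)}
        for dept, values in salaries.items()
    }


summary = aggregate_by_department(employees)

# Merge on the shared key now that both sides have one row per department.
# Departments with no employees get a headcount of 0 and no mean salary.
empty = {"headcount": 0, "mean_salary": None}
merged = [{**dept, **summary.get(dept["department"], empty)} for dept in departments]

for row in merged:
    print(row)
```

Aggregating before the merge keeps the result at one row per department; merging the raw employee rows instead would repeat each department's attributes once per employee.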
