Tables and databases

Once you have found the data set that's ideal for your visualization, it's helpful to know how data stores are structured and what the different terms are.

Data is stored in tables. A table is an array of items. It can be as simple as a single word, letter, or number, or as complicated as millions (or more) of rows of transactions, each with a timestamp, qualitative attributes (such as size or color), and numeric facts (such as the quantity of goods purchased).

Both a single text file of data and a worksheet in an Excel workbook are tables, though this may not be apparent. When tables are grouped together in a way that is designed to let a user retrieve data from them, they constitute a database. Typically, when we think of databases, we think of the Database Management Systems (DBMS) and languages that we use to make sense of the data in tables, such as Oracle, Teradata, or Microsoft's SQL Server. Currently, the Hadoop and NoSQL platforms are very popular because they are comparatively low-cost and can store very large sets of data, but Tableau Public does not enable a connection to these platforms. They are considered enterprise tools that should be used with Tableau Desktop Professional. Therefore, our discussion about these tools is limited.

Tableau Public allows users to join tables of data within a single data connection, whether or not the tables were previously related to each other, as long as they are in the same format. In other words, multiple CSV files can be joined to each other, as can multiple worksheets in the same Excel workbook. Users can then specify the conditions under which they need to retrieve data from the tables and how to aggregate it (examples are given in the following section). Thus, that data connection becomes a de facto database.
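The idea of joining two tables on a common field, applying a retrieval condition, and aggregating the result can be sketched outside Tableau Public. The following is only an illustration using Python's built-in sqlite3 module, not a description of how Tableau Public works internally. The table names, field names, and values are made up for the example.

```python
import sqlite3

# An in-memory database holding two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (code TEXT, region TEXT)")
conn.execute("CREATE TABLE sales (code TEXT, quantity INTEGER)")

# Made-up rows; "code" is the common field shared by both tables.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("A", "North"), ("B", "North"), ("C", "South")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 3), ("A", 2), ("B", 5), ("C", 4)],
)

# Join on the common field, keep only rows that meet a condition,
# and aggregate the quantity per region.
totals = conn.execute(
    """
    SELECT c.region, SUM(s.quantity)
    FROM sales AS s
    JOIN countries AS c ON s.code = c.code
    WHERE s.quantity > 2
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

print(totals)
conn.close()
```

The JOIN clause plays the role of identifying the common field, the WHERE clause is the retrieval condition, and SUM with GROUP BY is the aggregation.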

The most common format for publicly available data is a text file or a Comma-Separated Values (CSV) file. CSV files are useful because they are simple. The rows of data, which may or may not begin with a header row, are separated by line breaks. The fields within each row are separated by a delimiter character. Typically, this character is a comma, but a pipe or a tab is also common. Commas present difficulties because the content of a field can itself contain a comma, and unless that field is enclosed in quotation marks, the text after the comma shifts into a new column.
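The comma problem can be seen in a short sketch using Python's standard csv module. The sample rows are invented for illustration. The first data row has a product name that contains a comma and is enclosed in quotation marks.

```python
import csv
import io

# Hypothetical sample data with a header row; one field contains a comma.
raw = 'product,color,quantity\n"Shirt, long sleeve",blue,3\nHat,red,1\n'

# Splitting each line on every comma ignores the quotation marks,
# so the product name is broken into two columns.
naive_rows = [line.split(",") for line in raw.strip().split("\n")]

# csv.reader respects the quotation marks and keeps the field whole.
parsed_rows = list(csv.reader(io.StringIO(raw)))

# The same data with a pipe delimiter needs no quoting for this field.
piped = "product|color|quantity\nShirt, long sleeve|blue|3\n"
piped_rows = list(csv.reader(io.StringIO(piped), delimiter="|"))

print(len(naive_rows[1]), naive_rows[1])
print(len(parsed_rows[1]), parsed_rows[1])
print(len(piped_rows[1]), piped_rows[1])
```

The naive split yields four columns for the first data row, while both the quoted comma-separated version and the pipe-separated version yield the expected three.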

Many public data sources do allow data to be downloaded as Excel documents. The World Bank has a comprehensive collection, and we will demonstrate the connective capabilities of Tableau Public using one of its data products. Tables can be joined in Tableau Public by manually identifying the common field among the tables.
