The last type of data that you tend to hear about is ordinal data, and it's sort of a mixture of numerical and categorical data. A common example is star ratings for a movie or music, or what have you.
In this case, we have categorical data in that the rating could be 1 through 5 stars, where 1 might represent poor and 5 might represent excellent, but the categories do have mathematical meaning. We do know that 5 means it's better than a 1, so this is a case where the different categories have a numerical relationship to each other. So, I can say that 1 star is less than 5 stars, I can say that 2 stars is less than 3 stars, and I can say that 4 stars is greater than 2 stars in terms of a measure of quality. Now, you could also think of the actual number of stars as discrete numerical data. So, it's definitely a fine line between these categories, and in a lot of cases you can actually treat them interchangeably.
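To make the ordering idea concrete, here is a minimal sketch using only Python's standard library. The `ratings` list is made-up sample data, not anything from a real dataset. It shows that operations that depend only on order, such as comparing, sorting, and taking the median, make sense for star ratings.

```python
from statistics import median

# Made-up sample of 1-5 star ratings (an assumption for illustration only).
ratings = [5, 3, 4, 1, 4, 2, 5]

# Order comparisons are meaningful for ordinal data:
# 1 star is worse than 5 stars, 2 is worse than 3, 4 is better than 2.
assert 1 < 5 and 2 < 3 and 4 > 2

# Sorting relies only on the ordering of the categories.
sorted_ratings = sorted(ratings)

# The median also relies only on ordering, so it is a safe summary here.
middle = median(ratings)

print(sorted_ratings)  # [1, 2, 3, 4, 4, 5, 5]
print(middle)          # 4
```

Whether an average of star ratings is meaningful depends on whether you are willing to treat the stars as discrete numerical data, which is the fine line mentioned above.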
So, there you have it, the three different types. There is numerical, categorical, and ordinal data. Let's see if it's sunk in. Don't worry, I'm not going to make you hand in your work or anything.
Quick quiz: For each of these examples, is the data numerical, categorical, or ordinal?
- Let's start with how much gas is in your gas tank. What do you think? Well, the right answer is numerical. It's a continuous numerical value because you can have an infinite range of possibilities of gas in your tank. I mean, yeah, there's probably some upper bound of how much gas you can fit in it, but there is no end to the number of possible values of how much gas you have. It could be three quarters of a tank, it could be seven sixteenths of the tank, it could be 1/pi of a tank, I mean who knows, right?
- How about if you're rating your overall health on a scale of 1 to 4, where those choices correspond to the categories poor, moderate, good, and excellent? What do you think? That's a good example of ordinal data. That's very much like our movie ratings data, and again, depending on how you model that, you could probably treat it as discrete numerical data as well, but technically we're going to call that ordinal data.
- What about the races of your classmates? This is a pretty clear example of categorical data. You can't really compare purple people to green people, right, they're just purple and green, but they are categories that you might want to study and understand the differences between on some other dimension.
- How about the ages of your classmates in years? A little bit of a trick question there. If I said it had to be an integer number of years, like 40, 50, or 55 years old, then that would be discrete numerical data. If I had more precision, like 40 years, three months, and 2.67 days, that would be continuous numerical data. Either way, it's a numerical data type.
- And finally, money spent in a store. Again, that could be an example of continuous numerical data.
So again, telling these types apart is only important because you might apply different techniques to different types of data. There might be some concepts where we do one type of implementation for categorical data and a different type of implementation for numerical data, for example.
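As a rough illustration of applying different techniques to different types of data, the sketch below picks a different summary statistic for each type: the mean for numerical data, the median for ordinal data, and the mode for categorical data. The `summarize` helper and all of the sample values are hypothetical, made up for this example, and this choice of statistics is just one common convention, not the only reasonable one.

```python
from statistics import mean, median, mode

# Made-up sample data, loosely following the quiz examples (assumptions).
gas_in_tank = [0.75, 0.4375, 0.25, 0.5]        # numerical: fraction of a full tank
health_rating = [1, 3, 3, 4, 2]                # ordinal: 1 = poor ... 4 = excellent
classmate_group = ["purple", "green", "green"] # categorical: labels with no order


def summarize(values, kind):
    """Hypothetical helper: choose a summary statistic based on the data type."""
    if kind == "numerical":
        return mean(values)    # arithmetic is meaningful
    if kind == "ordinal":
        return median(values)  # only the ordering is relied on
    if kind == "categorical":
        return mode(values)    # only counting is meaningful
    raise ValueError(f"unknown data type: {kind}")


print(summarize(gas_in_tank, "numerical"))        # 0.484375
print(summarize(health_rating, "ordinal"))        # 3
print(summarize(classmate_group, "categorical"))  # green
```

Notice that asking for the mean of `classmate_group` would not even make sense, which is really the point: the type of data determines which operations are meaningful.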
So that's all you need to know about the different types of data that you'll commonly find, and that we'll focus on in this book. They're all pretty simple concepts: you've got numerical, categorical, and ordinal data, and numerical data can be continuous or discrete. There might be different techniques you apply to the data depending on what kind of data you're dealing with, and we'll see that throughout the book. Let's move on.