The Chilean plebiscite data

The Chilean plebiscite of 1988 was held to decide whether General Augusto Pinochet, ruler of Chile at the time, should extend his rule for an additional 8 years. A majority of voters (nearly 56%) picked No, marking the end of the Pinochet era. Data on voting intentions can be found in the car package, in a dataset called Chile.

Let's start by gathering some knowledge on the dataset. Uncomment and run the first line if you haven't installed car yet:

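# install.packages('car')
library(car)   # also attaches carData, which ships the Chile data
?Chile         # open the dataset's help page
head(Chile)    # show the first six rows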

The dataset has eight variables. Four of them are numerical:

  • population: The size of the respondent's community
  • age: Measured in years
  • income: Monthly income (in pesos)
  • statusquo: Scale of support for the status quo

The remaining four are categorical:

  • region: A factor with levels central (C), Metropolitan Santiago (M), north (N), south (S) and city of Santiago (SA)
  • sex: A factor with levels female (F) and male (M)
  • education: A factor with levels primary (P), post-secondary (PS) and secondary (S)
  • vote: A factor with levels abstain (A), will vote against Pinochet (N), undecided (U) and will vote in favor of Pinochet (Y)
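
The variable types and factor levels listed above can be checked directly in R. The following is a minimal sketch that uses only base R functions on the Chile data frame; it assumes car (and with it carData) is installed, and it also counts missing values per variable, since any incomplete rows matter once a model is fitted.

```r
library(car)  # attaches carData, which provides the Chile data frame

# Variable types and factor levels at a glance
str(Chile)

# Which columns are numerical, and which are factors?
sapply(Chile, is.numeric)
sapply(Chile, is.factor)

# Levels of each categorical variable
lapply(Chile[sapply(Chile, is.factor)], levels)

# Distribution of the response, counting missing answers as well
table(Chile$vote, useNA = "ifany")

# Number of missing values in each variable
colSums(is.na(Chile))
```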

Compared with the dataset just used, Chile has a higher dimensionality. Always be careful about which information you feed into a model. If anything costs a decision tree its interpretability, it is a large number of nodes (input and output). Overcomplicated models also tend to generalize badly.

Using the Chile dataset, it's time to get practical. Taking advantage of the large number of packages already available for R, let's try a decision tree to predict vote intentions from the Chile data. In what follows, we will see how the strengths and weaknesses of decision trees play out in a practical example.
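
As a rough illustration of what such a model could look like, here is a minimal sketch that fits a classification tree with the rpart package. The choice of rpart, the seed, the 70/30 split, and dropping incomplete rows with na.omit are all assumptions made for this sketch, not necessarily the approach taken in the rest of the chapter.

```r
library(car)    # provides the Chile data frame via carData
library(rpart)  # assumption: rpart is used here purely for illustration

set.seed(10)  # arbitrary seed so the random split is reproducible

# Simplification: keep only complete rows
chile <- na.omit(Chile)

# Hypothetical 70/30 split into training and test sets
train_idx <- sample(nrow(chile), size = round(0.7 * nrow(chile)))
train <- chile[train_idx, ]
test  <- chile[-train_idx, ]

# Classification tree predicting vote from all other variables
fit <- rpart(vote ~ ., data = train, method = "class")
print(fit)  # text listing of the splits

# Predict on the held-out rows and compare with the actual answers
pred <- predict(fit, newdata = test, type = "class")
table(predicted = pred, actual = test$vote)
mean(pred == test$vote)  # share of correct predictions
```

Printing the fitted object lists every split, which makes it easy to see how many nodes the tree needs and therefore how readable it remains.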
