Online repositories

Look back at the Web Technologies task view we discussed in the previous section. There is a tremendous number of R packages designed specifically to import data directly from specialized sources on the web. Among these are packages to search for and retrieve the full text of academic articles in the Public Library of Science journals (rplos), search for and download the full text of Wikipedia articles (WikipediR), download data about Berlin from the German government (BerlinData), interface with the Chromosome Counts Database (chromer), download historical financial data (quantmod), and access the information in the PubChem chemistry database (rpubchem).

These examples notwithstanding, there are many hundreds of immense repositories of public data, and it is far too much to expect the R community to have built a dedicated package for every single one. Luckily, with the ability to handle many different data formats under our belts, we can just download and import the data from these repositories ourselves. The following are a few of my favorite repositories. Perhaps some of them will have dedicated R packages for handling them by the time you read this.

  • data.gov: a huge repository of data from the US government in a variety of formats including CSV, XML, and JSON
  • data.gov.uk: the UK's equivalent repository
  • data.worldbank.org: a spot for data made available by the World Bank including data on climate change, poverty, and aid effectiveness
  • archive.ics.uci.edu/ml/: 333 (at time of writing) datasets of various lengths and widths for testing statistical learning algorithms
  • www.cdc.gov/nchs/data_access/ftp_data.htm: some health-related datasets made available by the US Centers for Disease Control and Prevention
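
As a rough sketch of the download-and-import approach described above, the following base R helper copies a CSV file to a temporary location and reads it into a data frame. The function name import_csv_from_url and the example.org address are hypothetical placeholders, not real endpoints of any repository listed here. The demonstration uses a file:// URL so it can run without a network connection; it assumes a Unix-like system, since file:// URLs take a different form on Windows.

```r
# Hypothetical helper (not part of any package): download a CSV file
# and import it as a data frame using only base R.
import_csv_from_url <- function(url, ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)   # delete the temporary copy when done
  download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  read.csv(tmp, stringsAsFactors = FALSE, ...)
}

# Offline demonstration: write a small CSV, then fetch it through a
# file:// URL exactly as we would fetch a remote file.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(country = c("A", "B"), value = c(1.5, 2.5)),
          demo_path, row.names = FALSE)
demo <- import_csv_from_url(paste0("file://", demo_path))
print(demo)

# With a real repository, the call would look like this (placeholder URL):
# my_data <- import_csv_from_url("https://example.org/path/to/dataset.csv")
```

Downloading to a temporary file first, rather than passing the URL straight to read.csv, keeps the two steps separate, so a failed or partial download is easier to tell apart from a parsing problem. For XML or JSON sources, the same pattern should apply with the appropriate reader swapped in for read.csv.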