APPENDIX C

Datasets

This appendix gives details and sources where relevant for each of the example datasets used in this book. The datasets are available as an R workspace file or as separate csv files with the downloads for this book (www.apress.com/9781484201404).
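
As a rough sketch, the downloads might be loaded along the following lines. The file names used here (datasets.RData and bottles.csv) are placeholders assumed for illustration and are not confirmed names from the download, so substitute the actual ones.

```r
# Hypothetical file names; replace with the names in the book's download.
workspace_file <- "datasets.RData"
csv_file <- "bottles.csv"

# Option 1: load every dataset at once from the R workspace file.
if (file.exists(workspace_file)) {
  load(workspace_file)
}

# Option 2: read a single dataset from its csv file.
if (file.exists(csv_file)) {
  bottles <- read.csv(csv_file)
  str(bottles)  # check the variable names and types
}
```

The file.exists checks only keep the sketch from failing when the placeholder files are absent; they can be dropped once the real paths are known.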

apartments

Description

Details of 32 one-bedroom apartments advertised for rent within a five-mile radius of Bishops Stortford, Hertfordshire, UK, in October 2012.

Variables

Town

Location of apartment

Furnished

Provided furnished (Yes or No)

Price.Cat

Rental price category (per calendar month)

Source

www.rightmove.co.uk

Used in

Chapter 6

bigcats

Description

Average weights for four big cat species

Variables

Name

Name of species. Note that the first instance of Leopard refers to Panthera pardus and the second instance refers to Uncia uncia (Snow Leopard)

Weight

Mean weight in kilograms for male of species

Source

Url: dialspace.dial.pipex.com/agarman/facts1.htm

Used in

Chapter 4

bottles

Description

Dataset giving the volume of liquid within 30 bottles of soft drink randomly selected from a production line. This is fictional data

Variables

Volume

Volume (milliliters)

Used in

Chapters 5, 10

brains

Description

Brain volume for 10 pairs of monozygotic twins, measured using magnetic resonance imaging and computer-based image analysis techniques

Variables

Pair

Pair identifier

Twin1

Total brain volume of first-born twin (cubic centimeters)

Twin2

Total brain volume of second-born twin (cubic centimeters)

Source

This data is taken from the article “Brain Size, Head Size, and IQ in Monozygotic Twins,” by Tramo, M. J., et al. and published in Neurology 1998; 50:1246-1252. Reproduced with permission. Url: lib.stat.cmu.edu/datasets/IQ_Brain_Size

Used in

Chapter 10

CIAdata1, CIAdata2

Description

Demographic data for seven European countries

Variables

country

Country name

lifeExp

Life expectancy (years)

urban

Living in urban areas (%)

pcGDP

Per capita gross domestic product ($US)

Source

Collected from the CIA World Factbook on August 5, 2012. Url: https://www.cia.gov/library/publications/the-world-factbook/

Used in

Chapters 4, 9

coffeeshop

Description

Total sales at a coffee shop over a five-day period. This is fictional data

Variables

Date

Date in format dd/mmm/yyyy

Sales

Sales for the day (£)

Used in

Chapter 3
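
A Date variable stored as text in dd/mmm/yyyy form can be converted to R's Date class with a matching format string. The following is a minimal sketch with made-up date strings (not values from the coffeeshop dataset), and it assumes the month abbreviations are in English.

```r
# Made-up example strings in dd/mmm/yyyy form (not the book's data).
date_strings <- c("01/Feb/2013", "02/Feb/2013")

# %b matches abbreviated month names in the current locale, so switch
# LC_TIME to the C locale to make sure English abbreviations are recognized.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
sales_dates <- as.Date(date_strings, format = "%d/%b/%Y")
Sys.setlocale("LC_TIME", old_locale)  # restore the previous setting

class(sales_dates)  # check that the result has class Date
diff(sales_dates)   # date arithmetic now works
```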

concrete

Description

The results of an experiment to determine the best concrete mix

Variables

Cement

Cement type (I or II)

Additive

Additive (A or B)

Additive.Dose

Additive dose (0.3%, 0.4%, or 0.5%)

Density

Density (grams per cubic centimeter)

Source

The experiment was conducted in Santiago, Chile, in 2007

Used in

Chapter 11

CPIdata

Description

Consumer price index data for six countries (2012)

Variables

country

Country name

CPI

Consumer price index (relative to New York at 100)

Source

Url: www.numbeo.com/cost-of-living/rankings_by_country.jsp

Used in

Chapter 4

customers

Description

The names and addresses of five customers living in the area of Reading, Berkshire, UK. This is fictional data

Variables

Name

Character string giving customer's full name

Address

Character string giving customer's full address

Used in

Chapter 3

endangered

Description

Conservation status of four big cat species

Variables

Name

Name of species

Status

Conservation status

Source

Url: www.bigcats.com/redlist.php

Used in

Chapter 4

fiveyearreport

Description

UK Sales (including VAT) for the years 2007 to 2011 for the Tesco, Sainsburys, and Morrisons supermarket chains

Variables

Year

Year (2007–2011)

Tesco

UK sales including VAT (£M) for Tesco

Sainsburys

UK sales including VAT (£M) for Sainsburys

Morrisons

UK sales including VAT (£M) for Morrisons

Source

Data collected from the respective annual reports for 2007–2011. Urls:
www.tescoplc.com/files/pdf/reports/tesco_annual_report_2011.pdf
www.tescoplc.com/files/pdf/reports/annual_report_2010.pdf (p16)
www.tescoplc.com/files/pdf/reports/annual_report_2009.pdf (p34)
www.tescoplc.com/files/pdf/reports/annual_report_2008.pdf (p5)
www.tescoplc.com/files/pdf/reports/annual_report_2007.pdf (p3)
www.j-sainsbury.co.uk/investor-centre/financial-performance/5-year-summary/
www.morrisons.co.uk/corporate/2011/annualreport/investor-information/five-year-summary-results/

Used in

Chapter 9

flights

Description

Flight data for seven flights departing from Southampton Airport on January 12 and 13, 2012

Variables

Date

Date of flight in format dd/mm/yyyy

Time

Time of flight in format hh:mm

Flight.Number

Alphanumeric flight number

Destination

Name of destination city

Source

Url: www.southamptonairport.com

Used in

Chapter 3
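
Because Date and Time are held as separate text variables, one possible way to work with them is to paste them together and convert the result to a date-time. This sketch uses made-up values and a hypothetical data frame name (flights_sketch), and the choice of the UTC time zone is an assumption.

```r
# Made-up values in the documented formats (not the book's data).
flights_sketch <- data.frame(
  Date = c("12/01/2012", "13/01/2012"),
  Time = c("07:15", "18:40"),
  stringsAsFactors = FALSE
)

# Combine the dd/mm/yyyy date and hh:mm time into one POSIXct variable.
# tz = "UTC" is an assumption made to keep the example reproducible.
flights_sketch$Departure <- as.POSIXct(
  paste(flights_sketch$Date, flights_sketch$Time),
  format = "%d/%m/%Y %H:%M",
  tz = "UTC"
)

str(flights_sketch)
```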

fruit

Description

Dataset of UK fruit prices in August 2012

Variables

Product

Product name

Price

Sale price (£)

Unit

Sale unit

Source

Url: www.sainsburys.co.uk

Used in

Chapter 3

grades1

Description

Fictional dataset giving the grades of 15 students belonging to three classes labeled A, B, and C.

Variables

ClassA

Grades of students in class A (%)

ClassB

Grades of students in class B (%)

ClassC

Grades of students in class C (%)

Used in

Chapters 4, 10

people

Description

Physical characteristics for a sample of 16 people. Sample selected using nonrandom methods. Data is self-reported

Variables

Subject

Respondent number

Eye.Color

Eye color (Blue, Green, or Brown)

Height

Height (centimeters)

Hand.Span

Hand span of left hand (millimeters)

Sex

Sex (1=Male, 2=Female)

Handedness

Handedness (L=left-handed, R=right-handed)

Used in

Chapter 3

people2

Description

This dataset is a clean version of the people dataset.

Variables

Subject

Respondent number

Eye.Color

Eye color (Blue, Green, or Brown)

Height

Height (centimeters)

Hand.Span

Hand span of left hand (millimeters)

Sex

Sex (Male or Female)

Handedness

Handedness (Left or Right)

Height.Cat

Height category (Tall, Medium, or Short)

Used in

Chapters 6, 8, 11
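
The difference in coding between people and people2 (Sex as 1 or 2 versus Male or Female, Handedness as L or R versus Left or Right) can be illustrated with factor labels. This is only a sketch on made-up rows with a hypothetical data frame name (people_sketch); it is not necessarily how people2 was prepared, and it does not attempt Height.Cat because the category cutoffs are not given here.

```r
# Made-up rows using the coding documented for the people dataset.
people_sketch <- data.frame(
  Sex = c(1, 2, 2),
  Handedness = c("L", "R", "R"),
  stringsAsFactors = FALSE
)

# Replace the numeric codes with descriptive factor labels.
people_sketch$Sex <- factor(people_sketch$Sex,
                            levels = c(1, 2),
                            labels = c("Male", "Female"))

# Expand the single-letter codes in the same way.
people_sketch$Handedness <- factor(people_sketch$Handedness,
                                   levels = c("L", "R"),
                                   labels = c("Left", "Right"))

table(people_sketch$Sex, people_sketch$Handedness)
```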

powerplant

Description

Thirty measurements of pressure, temperature, and output for a gas electrical turbine

Variables

Pressure

Pressure inside turbine (millibars)

Temp

Temperature inside turbine (degrees Celsius)

Output

Output of turbine (megawatts)

Source

Collected from a gas electrical turbine at a UK power station in 2010

Used in

Chapter 11

pulserates

Description

Pulse data for four people. Sample selected using nonrandom methods. Data is self-reported

Variables

Patient

Patient identifier

Pulse1

First pulse reading (beats per minute)

Pulse2

Second pulse reading (beats per minute)

Pulse3

Third pulse reading (beats per minute)

Used in

Chapter 3

resistance

Description

Gives the results of a simple experiment to compare the cubic resistance of four concrete formulations, at three, seven, and fourteen days after setting

Variables

Formula

A (Huechuraba Aggregate + Additive A)
B (Huechuraba Aggregate + Additive B)
C (Mauro Aggregate + Additive A)
D (Mauro Aggregate + Additive B)

Day3

Cubic resistance (kilograms per square meter), measured three days after setting

Day7

Cubic resistance (kilograms per square meter), measured seven days after setting

Day14

Cubic resistance (kilograms per square meter), measured fourteen days after setting

Source

Experiment conducted in Santiago, Chile, in 2007

Used in

Chapter 4

supermarkets

Description

Data for four UK supermarket chains

Variables

Chain

Name of chain

Stores

Number of stores in the UK

Sales.Area

Sales area (1,000 square feet)

Market.Share

Market share (%)

Source

Data for number of stores and total sales area is collected from the respective 2011 annual reports:
www.tescoplc.com/media/417/tesco_annual_report_2011_final.pdf
www.j-sainsbury.co.uk/investor-centre/reports/2011/annual-report-and-financial-statements-2011/
www.morrisons.co.uk/Documents/Morrisons-Annual-Report-2011.pdf
Market share is collected from Kantar Worldpanel:
www.kamcity.com/namnews/mktshare/2011/kantar-march11.htm

Used in

Chapter 2

vitalsigns

Description

Measurements of systolic blood pressure, diastolic blood pressure, and pulse rate for four patients. This is fictional data

Variables

subject

Patient identifier

test

Name of parameter (SysBP, DiaBP, Pulse)

result

Systolic blood pressure (mmHg), diastolic blood pressure (mmHg), or pulse (beats per minute)

Used in

Chapter 4
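
The vitalsigns layout holds one row per subject and test, with the measured value in result. As an illustration of what that structure implies, the sketch below spreads such data into one row per subject using base R's reshape function. The values and the data frame name (vitals_sketch) are made up, and this is one possible approach rather than necessarily the one used in the book.

```r
# Made-up values in the same long layout as vitalsigns.
vitals_sketch <- data.frame(
  subject = rep(c("P1", "P2"), each = 3),
  test = rep(c("SysBP", "DiaBP", "Pulse"), times = 2),
  result = c(120, 80, 70, 135, 85, 64),
  stringsAsFactors = FALSE
)

# One row per subject; each test becomes its own column,
# named result.SysBP, result.DiaBP, and result.Pulse.
vitals_wide <- reshape(vitals_sketch,
                       idvar = "subject",
                       timevar = "test",
                       direction = "wide")

names(vitals_wide)
```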

WHOdata

Description

Data on alcohol consumption and mortality rate for five countries

Variables

alcohol

Alcohol consumption per adult over 15 years (liters of pure alcohol per person per year)

mortality

Adult mortality rate (probability of dying between 15 and 60 years, per 1000 of population)

Source

Collected from the WHO website: apps.who.int/ghodata

Used in

Chapter 4
