Matt Wiley and

Joshua F. Wiley

Advanced R

Data Programming and the Cloud

Matt Wiley

Elkhart Group Ltd. & Victoria College, Columbia City, Indiana, USA

Joshua F. Wiley

Elkhart Group Ltd. & Victoria College, Columbia City, Indiana, USA

Any source code or other supplementary materials referenced by the author in this text are available to readers at www.apress.com. For detailed information about how to locate your book’s source code, go to www.apress.com/source-code/. Readers can also access source code at SpringerLink in the Supplementary Material section for each chapter.

ISBN 978-1-4842-2076-4

e-ISBN 978-1-4842-2077-1

DOI 10.1007/978-1-4842-2077-1

Library of Congress Control Number: 2016959581

© Matt Wiley and Joshua F. Wiley 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image, we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Family.

Introduction

R has become one of the most popular programming languages in an era where data science is increasingly prevalent. As R and data science have become more mainstream, a growing number of R users lack dedicated training in statistical computing or data science. This creates demand for books and resources that bridge the gap between applied users, who may have only an introductory background in statistics or programming, and advanced, sophisticated data analytics. This book focuses on how to use advanced programming in R to speed up everyday tasks in data analysis and data science. It is also unique in its coverage of how to set up R in the cloud and generate dynamic reports for analyses that are regularly repeated, such as a monthly analysis of company sales or a quarterly analysis of student grades, enrollment, and dropout numbers in schools, with projections of future enrollment rates.

Chapters 1 through 6 focus on programming techniques that are more advanced than those covered in the Apress offering Beginning R.

Chapters 7–10 develop powerful data management techniques, including the exciting and (comparatively) new data.table package.

From there, we delve into the modern (and slightly edgy) world of cloud computing with R. In Chapters 11–14, we walk you through getting R up and running on an Amazon cloud instance, starting from the ground up.

Finally, Chapter 15 provides you with solid techniques for creating dynamic documents and reports.

Acknowledgments

We would like to profusely thank our technical reviewer, Andrew Moskowitz. Through direct comments in chapters, e-mails about proper explanations, and Skype calls, Andrew gave us a lot of thoughtful feedback. If our readers feel that any portion explains a technique well, that is thanks to his efforts; the errors of course remain ours alone.

Mark Powers has been extraordinarily kind to us, and this book would not be here without his advocacy and support. Steve Anglin also deserves thanks for working with us to start this project. Truly, if you look at the very front of this book, there is an entire team at Apress who deserve rich and warm thanks.

Contents

  1. Chapter 1: Programming Basics
    1. Advanced R Software Choices
    2. Reproducing Results
    3. Types of Objects
    4. Base Operators and Functions
    5. Mathematical Operators and Functions
    6. References
  2. Chapter 2: Programming Utilities
    1. Help and Documentation
    2. System and Files
    3. Input
    4. Output
    5. References
  3. Chapter 3: Programming Automation
    1. Loops
    2. Flow Control
    3. *apply Family of Functions
    4. Final Thoughts
  4. Chapter 4: Writing Functions
    1. Components of a Function
    2. Scoping
    3. Functions for Functions
    4. Debugging
    5. Summary
  5. Chapter 5: Writing Classes and Methods
    1. S3 System
      1. S3 Classes
      2. S3 Methods
    2. S4 System
      1. S4 Classes
      2. S4 Class Inheritance
      3. S4 Methods
    3. Summary
  6. Chapter 6: Writing a Package
    1. Before You Get Started
      1. Version Control
    2. R Package Basics
      1. Starting a Package by Using DevTools
      2. Adding R Code
      3. Tests
    3. Documentation Using roxygen2
      1. Functions
      2. Data
      3. Classes
      4. Methods
    4. Building, Installing, and Distributing an R Package
    5. Summary
  7. Chapter 7: Introduction to Data Management Using data.table
    1. Introduction to data.table
    2. Selecting and Subsetting Data
      1. Using the First Formal
      2. Using the Second Formal
      3. Using the Second and Third Formals
    3. Variable Renaming and Ordering
    4. Computing on Data and Creating Variables
    5. Merging and Reshaping Data
      1. Merging Data
      2. Reshaping Data
    6. Summary
  8. Chapter 8: Data Munging with data.table
    1. Data Munging / Cleaning
      1. Recoding Data
      2. Recoding Numeric Values
    2. Creating New Variables
    3. Fuzzy Matching
    4. Summary
  9. Chapter 9: Other Tools for Data Management
    1. Sorting
    2. Selecting and Subsetting
    3. Variable Renaming and Ordering
    4. Computing on Data and Creating Variables
    5. Merging and Reshaping Data
    6. Summary
  10. Chapter 10: Reading Big Data(bases)
    1. SQLite
      1. Installing SQLite on Windows
      2. SQLite and R
    2. PostgreSQL
      1. Installing PostgreSQL on Windows
      2. PostgreSQL and R
    3. MongoDB
      1. Installing MongoDB on Windows
      2. MongoDB and R
    4. Summary
  11. Chapter 11: Getting a Cloud
    1. Disclaimers
    2. Starting Amazon Web Services
    3. Accessing Your Instance’s Command Line
    4. Uploading Files to Your Instance
    5. Final Thoughts
  12. Chapter 12: Cloud Ubuntu for Windows Users
    1. Common Commands
    2. Superuser and Security
    3. Installing and Using R
    4. Installing and Using RStudio Server
    5. Installing Microsoft R
    6. Installing Java
    7. Installing Shiny on Your Cloud
    8. Final Thoughts
  13. Chapter 13: Every Cloud has a Shiny Lining
    1. The Basics of Shiny
    2. Shiny in Motion
    3. Uploading a User File into Shiny
    4. Hosting Shiny in the Cloud
    5. Final Thoughts
  14. Chapter 14: Shiny Dashboard Sampler
    1. A Dashboard’s Bones
      1. Dashboard Header
      2. Dashboard Sidebar
      3. Dashboard Body
    2. Dashboard in the Cloud
    3. Complete Sampler Code
    4. References
  15. Chapter 15: Dynamic Reports and the Cloud
    1. Needed Software
      1. Local Machine
      2. Cloud Instance
    2. Dynamic Documents
    3. Dynamic Documents and Shiny
      1. server.R
      2. ui.R
      3. report.Rmd
    4. Uploading to the Cloud
    5. Summary
  16. References
  17. Index

About the Authors and About the Technical Reviewer

About the Authors


Matt Wiley is a tenured associate professor of mathematics with awards in both mathematics education and honor student engagement. He earned degrees in pure mathematics, computer science, and business administration through the University of California and Texas A&M systems. He serves as director for Victoria College’s quality enhancement plan and managing partner at Elkhart Group Limited, a statistical consultancy. With programming experience in R, C++, Ruby, Fortran, and JavaScript, he has always found ways to meld his passion for writing with his joy of logical problem solving and data science. From the boardroom to the classroom, Matt enjoys finding dynamic ways to partner with interdisciplinary and diverse teams to make complex ideas and projects understandable and solvable.


Joshua F. Wiley is a lecturer in the Monash Institute for Cognitive and Clinical Neurosciences and School of Psychological Sciences at Monash University and a senior partner at Elkhart Group Limited, a statistical consultancy. He earned his PhD from the University of California, Los Angeles, and his research focuses on using advanced quantitative methods to understand the complex interplays of psychological, social, and physiological processes in relation to psychological and physical health. In statistics and data science, Joshua focuses on biostatistics and is interested in reproducible research and graphical displays of data and statistical models. Through consulting at Elkhart Group Limited and former work at the UCLA Statistical Consulting Group, he has supported a wide array of clients ranging from graduate students, to experienced researchers, to biotechnology companies. He also develops or co-develops a number of R packages, including varian, a package to conduct Bayesian scale-location structural equation models, and MplusAutomation, a popular package that links R to the commercial Mplus software.

About the Technical Reviewer


Andrew Moskowitz is a doctoral candidate in quantitative psychology at the University of California, Los Angeles, and a self-employed statistical consultant. His quantitative research focuses mainly on hypothesis testing and effect sizes in mixed-effects models. While at UCLA, Andrew has collaborated with a number of faculty, students, and enterprises to help them derive meaning from data across an array of fields ranging from psychological services and health care delivery to marketing.
