In an Internet online advertisement, a job vacancy advertisement for a statistician reads as follows:
Job Summary |
Statistician I |
Salary: Open |
Employer: XYZ Research and Statistics |
Location: City X, State Y |
Type: Full time – entry level |
Category: Financial analyst/Statistics, Data analysis/processing, Statistical organization & administration |
Required Education: Master's Degree preferred |
XYZ Research and Statistics is a national leader in designing, managing, and analyzing financial data. XYZ partners with other investigators to offer respected statistical expertise supported by sophisticated web-based data management systems. XYZ services assure timely and secure implementation of trials and reliable data analyses.
Job Description |
Position Summary: An exciting opportunity is available for a statistician to join a small but growing group focused on financial investment analysis and related translational research. XYZ, which is located in downtown City XX, is responsible for the design, management, and analysis of a variety of investment and financial studies, as well as the analysis of associated market data. The successful candidate will collaborate with fellow statistics staff and financial investigators to design, evaluate, and interpret investment studies. |
Primary Duties and Responsibilities: Analyzes investment situations and associated ancillary studies in collaboration with fellow statisticians and other financial engineers. Prepares tables, figures, and written summaries of study results; interprets results in collaboration with other financial investigators; and assists in preparation of manuscripts. Provides statistical consultation to collaborating staff. Performs other job-related duties as assigned. |
Requirements |
Required Qualifications: Master's Degree in Statistics, Applied Mathematics, or a related field. Sound knowledge of applied statistics. Proficiency in statistical computing in R.
Preferred Responsibilities/Qualifications: Statistical consulting experience. S-Plus or R programming language experience. Experience with analysis of high-dimensional data. Ability to communicate well orally and in writing. Excellent interpersonal/teamwork skills for effective collaboration. Spanish language skills a plus. |
*In your cover letter, describe how your skills and experience match the qualifications for the position. |
To learn more about XYZ, visit www.XYZ.org. |
Clearly, one should be cognizant of the explicit requirement: an acceptable level of professional proficiency in data analysis using R programming!
Even if one is not in such a job market, for a statistician working in the fields of Finance, Asset Allocation, Portfolio Optimization, and so on, a skill set that includes R programming would be helpful and interesting.
Data are facts or figures from which conclusions can be drawn. When the data have been recorded, classified, organized, related, or interpreted within a framework so that meaning emerges, they become information. There are several steps involved in turning data into information, and these steps are known as data processing. This section describes data processing and how computers perform these steps efficiently and effectively.
It will be indicated that many of these processing activities may be undertaken using R programming, or performed in an R environment with the aid of available R packages – where R functions and data sets are stored.
The simplified flowchart shows how raw data are transformed into information:
Data processing takes place once all of the relevant data have been collected. They are gathered from various sources and entered into a computer where they can be processed to produce information – the output.
Data processing includes the following steps:
First, before raw data can be entered into a computer, they must be coded. To do this, survey responses must be labeled, usually with simple, numerical codes. This may be done by the interviewer in the field or by an office employee. The data coding step is important because it makes data entry and data processing easier.
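As a minimal sketch of this coding step (the survey responses and the code book below are invented for illustration), the mapping from labeled responses to numerical codes can be mimicked in R:

```r
# Hypothetical example: map survey responses to simple numerical codes
# using a named "code book" vector (all values invented).
responses <- c("Yes", "No", "No", "Yes", "Don't know")
codebook <- c("Yes" = 1, "No" = 2, "Don't know" = 9)  # the coding scheme
coded <- unname(codebook[responses])                   # look up each response
coded
```

The named-vector lookup plays the role of the office employee's code sheet: every response label is translated into the numeric code used for data entry.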
Surveys have two types of questions – closed questions and open questions. The responses to these questions affect the type of coding performed.
A closed question means that only a fixed number of predetermined survey responses are permitted. These responses will have already been coded.
The following question, in a survey on sporting activities, is an example of a closed question:
To what degree is sport important in providing you with the following benefits?
An open question implies that any response is allowed, making subsequent coding more difficult. In order to code an open question, the processor must sample a number of responses, and then design a code structure that includes all possible answers.
The following code structure is an example of an open question:
What sports do you participate in?
Specify (28 characters)__________________________________
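The sample-then-code procedure for an open question can also be sketched in R; the sampled responses below are invented, and the code structure is simply the set of distinct answers observed in the sample:

```r
# Sketch: derive a code structure from a sample of open-question responses
# (responses invented), then assign each response its numeric code.
sampled <- c("soccer", "tennis", "soccer", "swimming", "hockey", "tennis")
code_structure <- sort(unique(sampled))   # one code per distinct answer
codes <- match(sampled, code_structure)   # numeric code for each response
codes
```

In practice the code structure would be designed from a larger sample so that it covers all likely answers, with a residual "other" code for the rest.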
In the Census and almost all other surveys, the codes for each question field are premarked on the questionnaire. To process the questionnaire, the codes are entered directly into the database and are prepared for data capturing. The following is an example of premarked coding:
What language does this person speak most often at home?
There are programs in use that will automate repetitive and routine tasks. Some of the advantages of an automated coding system are that the process increasingly becomes
The next step in data processing is inputting the coded data into a computer database. This method is known as data capture.
This is the process by which data are transferred from a paper copy, such as questionnaires and survey responses, to an electronic file. The responses are then put into a computer. Before this procedure takes place, the questionnaires must be groomed (prepared) for data capture. In this processing step, the questionnaire is reviewed to ensure that all of the minimum required data have been reported and that they are decipherable. This grooming is usually performed during extensive automated edits.
There are several methods used for capturing data:
Some modern examples of data input devices are
Once data have been entered into a computer database, the next step is ensuring that all of the responses are accurate. This method is known as data editing.
Data should be edited before being presented as information. This action ensures that the information provided is accurate, complete, and consistent. There are two levels of data editing – micro- and macroediting.
Microediting corrects the data at the record level. This process detects errors in data through checks of the individual data records. The intent at this point is to determine the consistency of the data and correct the individual data records.
Macroediting also detects errors in data, but does this through the analysis of aggregate data (totals). The data are compared with data from other surveys, administrative files, or earlier versions of the same data. This process determines the compatibility of data.
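Both editing levels can be sketched on a small set of invented records; the variable, the valid range, and the earlier survey figure below are all assumptions for illustration:

```r
# Sketch: a record-level (micro) edit and an aggregate (macro) edit.
ages <- c(34, 27, 154, 45, -2)            # individual data records (invented)
failed <- which(ages < 0 | ages > 120)    # microedit: per-record range check
failed                                     # records 3 and 5 fail the check
mean_current <- mean(ages[-failed])       # aggregate after removing bad records
mean_previous <- 36                       # hypothetical figure from an earlier survey
abs(mean_current - mean_previous) < 10    # macroedit: are the totals compatible?
```

The microedit flags individual records that violate the editing rules; the macroedit compares an aggregate of the corrected data with a reference figure to judge overall plausibility.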
Editing is of little value to the overall improvement of the actual survey results, if no corrective action is taken when items fail to follow the rules set out during the editing process. When all of the data have been edited using the applied rules and a file is found to have missing data, then imputation is usually done as a separate step.
Nonresponse and invalid data definitely impact the quality of the survey results.
Imputation resolves the problems of missing, invalid, or incomplete responses identified during editing, as well as any editing errors that might have occurred.
At this stage, all of the data are screened for errors because respondents are not the only ones capable of making mistakes; errors can also occur during coding and editing.
Some other types of imputation methods include the following:
Imputation methods can be performed automatically, manually, or in combination.
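As an illustration of one simple automatic method, mean imputation can be sketched in R; the income values below are invented:

```r
# Sketch of mean imputation: missing responses (NA) identified during editing
# are replaced by the mean of the valid responses (values invented).
income <- c(42000, NA, 38500, 51000, NA)
income[is.na(income)] <- mean(income, na.rm = TRUE)  # fill NAs with the mean
income
```

More refined methods (hot-deck, regression, or donor imputation) follow the same pattern: detect the missing or invalid items, then replace them by values derived from the valid data.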
Quality is an essential element at all levels of processing. To ensure the quality of a product or service in survey development activities, both quality assurance and quality control methods are used.
Quality assurance refers to all planned activities necessary in providing confidence that a product or service will satisfy its purpose and the users' needs. In the context of survey conducting activities, this can take place at any of the major stages of survey development: planning, design, implementation, processing, evaluation, and dissemination.
This approach anticipates problems before they occur, and uses all available information to generate improvements. It is not restricted to any specific quality standards. It is applicable mostly at the planning stage, and is all-encompassing in its activities.
Quality control is a regulatory procedure through which one may measure quality, with preset standards, and then act on any differences. Examples of this include controlling the quality of the coding operation, the quality of the survey interviewing, and the quality of the data capture.
Quality control responds to observed problems, using ongoing measurements to make decisions on the processes or products. It requires a prespecified quality for comparability. It is applicable mostly at the processing stage following a set procedure that is a subset of quality assurance.
The quality of the data must be defined and assured in the context of being “fit for use,” which will depend on the intended function of the data and the fundamental characteristics of quality. It also depends on the users' expectations of what is considered to be useful information.
There is no standard definition among statistical agencies for the term "official statistics." There is a generally accepted, but evolving, range of quality issues underlying the concept of "fitness for use." These elements of quality need to be considered and balanced in the design and implementation of an agency's statistical program.
The following is a list of the elements of quality:
These elements of quality tend to overlap. Just as there is no single measure of accuracy, there is no effective statistical model for bringing together all these characteristics of quality into a single indicator. Also, except in simple or one-dimensional cases, there is no general statistical model for determining whether one particular set of quality characteristics provides higher overall quality than another.
After editing, data may be processed further to produce a desired output. The computer software used to process the data will depend on the form of output required. Software applications for word processing, desktop publishing, graphics (including graphing and drawing), programming, databases, and spreadsheets are commonly used. The following are some examples of ways that software can produce data:
R is an open-source, freely available, integrated software environment for data manipulation, computation, analysis, and graphical display.
The R environment consists of the following:
The term “environment” is used to show that it is indeed a planned and coherent system.
R and Statistics
R was initially written by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland, New Zealand. Since 1997 there has been the R Development Core Team, a group of about 20 people with write access to the R source code.
The original R environment, which evolved from the S/S-Plus languages, was not primarily directed toward statistics. However, since its development in the 1990s, it appears to have been "hijacked" by many working in the areas of classical and modern statistical techniques, including many applications in financial engineering, econometrics, and biostatistics with respect to epidemiology, public health, and preventive medicine. These applications are the raison d'être for writing this book.
As of this writing, the latest version of R is R-3.3.2, officially released on October 31, 2016. The primary source of R packages is the Comprehensive R Archive Network, CRAN, at http://cran.r-project.org/. R packages are also described in numerous publications; for example, the Journal of Statistical Software, now in its 45th volume, is available at http://www.jstatsoft.org/v45.
Let us get started (the R-3.3.2 version environment is being used here). Recall in Section 4.1, the R environment was obtained as follows:
Here is R:
Let us download the open-source high-level program R from the Internet and take a first look at the R computing environment.
Remark: Access the Internet at the website of CRAN (The Comprehensive R Archive Network): http://cran.r-project.org/
To install R (R-3.3.2-win32.exe): http://www.r-project.org/
=> download R
=> Select: USA
http://cran.cnr.berkeley.edu/ (University of California, Berkeley, CA)
=> Windows (95 and later)
=> base
=> R-3.3.2-win32.exe
AFTER the downloading:
=> Double-click on: R-3.3.2-win32.exe (on the desktop) to unzip and install R
=> An icon (Script R 3.3.2) will appear on one's computer "desktop" as follows: Figure 4.1
On the computer “desktop” is the R icon:
In this book, the following special color scheme legend will be used for all statements during the computational activities in the R environment, to clarify the various inputs to and outputs from the computational process:
Note: The # sign is the comment character: all text on the line following this sign is treated as a comment by the R program, that is, no computational action will be taken on such a statement. These comment statements help the programmer and user by clarifying the purposes of the remainder of the R code. The computations will proceed even if the comment statements are eliminated.
# is known as the number sign; it is also known as the pound sign/key, the hash key, and, less commonly, as the octothorp, octothorpe, octathorp, octotherp, octathorpe, and octatherp.
To use R under Windows: Double-click on the R 3.3.2 icon…
Upon selecting and clicking on R, the R window opens with the following declaration:
> # This is the R computing environment.
> # Computations may begin now!
> # First, use R as a calculator, and try a simple arithmetic
> # operation, say: 1 + 1
> 1+1
[1] 2 # This is the output!
> # WOW! It's really working!
> # The [1] in front of the output result is part of R’s way of printing
> # numbers and vectors. Although it is not so useful here, it does
> # become so when the output result is a longer vector
From this point on, this book is most beneficially read with the R environment at hand. It will be a most effective learning experience if one practices each R command as one goes along the textual materials.
This section introduces some important and practical features of the R environment (Figure 4.2). Log in and start an R session in the Windows system of the computer:
>
> # This is the R environment.
> help.start() # Outputting the page shown in Figure 4.1
> # Statistical Data Analysis Manuals
starting httpd help server ... done
If nothing happens, you should open
‘http://127.0.0.1:28103/doc/html/index.html’ yourself
At this point, explore the HTML interface to on-line help right from the desktop, using the mouse pointer to note the various features of this facility available within the R environment. Then, returning to the R environment:
> help.start()
Carefully read through each of the sections under “Manuals” – to obtain an introduction to the basic language of the R environment. Then look through the items under “Reference” to reach beyond the elementary level, including access to the available “R Packages” – all R functions and datasets are stored in packages.
For example, if one selects the Packages Reference, the following R Package Index window will open up (Figure 4.3), listing a collection of R program packages under the R library: C:\Program Files\R\R-2.14.1\library
One may now access each of these R program packages, and use them for further applications as needed.
Returning to the R environment:
>
> x <- rnorm(100)
> # Generating a pseudo-random 100-vector x
> y <- rnorm(x)
> # Generating another pseudo-random 100-vector y
> plot (x, y)
> # Plotting x vs. y in the plane, resulting in a graphic
> # window: Figure 4.4.
Remark: For reference, Appendix 1 contains the CRAN documentation of the R function plot(), available for graphic outputting, which may be found by the R code segment:
> ?plot
CRAN has documentation for many R functions and packages.
Again, returning to the R workspace, and enter
>
>
> ls() # (This is a lower-case “L” followed by “s”, viz., the ‘list’
> # command.)
> # (NOT 1 = “ONE” followed by “s”)
> # This command will list all the R objects now in the
> # R workspace:
> # Outputting:
[1] "E" "n" "s" "x" "y" "z"
Again, returning to the R workspace, and enter
>
> rm(x, y) # Removing x and y from the R workspace
> x # Calling for x
Error: object 'x' not found
> # Of course, the xs have just been removed!
> y # Calling for y
Error: object 'y' not found # Because the ys have also been
# removed!
>
> x <- 1:10 # Let x = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> x # Outputting x (just checking!)
[1] 1 2 3 4 5 6 7 8 9 10
> w <- 1 + sqrt(x)/2 # w is a weighting vector of
> # standard deviations
> dummy <- data.frame (x = x, y = x + rnorm(x)*w)
> # Making a data frame of 2 columns, x, and y, for inspection
> dummy # Outputting the data frame dummy
x y
1 1 1.311612
2 2 4.392003
3 3 3.669256
4 4 3.345255
5 5 7.371759
6 6 -0.190287
7 7 10.835873
8 8 4.936543
9 9 7.901261
10 10 10.712029
>
> fm <- lm(y~x, data=dummy)
> # Doing a simple Linear Regression
> summary(fm) # Fitting a simple linear regression of y on x,
> # then inspect the analysis, and outputting:
Call:
lm(formula = y ~ x, data = dummy)
Residuals:
Min 1Q Median 3Q Max
-6.0140 -0.8133 -0.0385 1.7291 4.2218
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.0814 2.0604 0.525 0.6139
x 0.7904 0.3321 2.380 0.0445*
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.016 on 8 degrees of freedom
Multiple R-squared: 0.4146, Adjusted R-squared: 0.3414
F-statistic: 5.665 on 1 and 8 DF, p-value: 0.04453
> fm1 <- lm(y~x, data=dummy, weights=1/w^2)
> summary(fm1) # Knowing the standard deviation,
> # then doing a weighted
> # regression and outputting:
Call:
lm(formula = y ~ x, data = dummy, weights = 1/w^2)
Residuals:
Min 1Q Median 3Q Max
-2.69867 -0.46190 -0.00072 0.90031 1.83202
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.2130 1.6294 0.744 0.4779
x 0.7668 0.3043 2.520 0.0358*
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.356 on 8 degrees of freedom
Multiple R-squared: 0.4424, Adjusted R-squared: 0.3728
F-statistic: 6.348 on 1 and 8 DF, p-value: 0.03583
> attach(dummy) # Making the columns in the data
> # frame as variables
The following object(s) are masked _by_ '.GlobalEnv': x
> lrf <- lowess(x, y) # A nonparametric local
> # regression function
> lrf
> plot (x, y) # Making a standard point plot, outputting: Figure 4.5.
> # outputting: Figure 4.6
> abline(0, 1, lty=3) # Adding the true regression line:
> # (Intercept = 0, Slope = 1),
> # outputting: Figure 4.7.
> abline(coef(fm)) # Adding the unweighted regression line:
> # outputting Figure 4.8.
> abline(coef(fm1), col="red")
> # Adding the weighted regression line:
> # outputting Figure 4.8.
> detach() # Removing data frame from the search path
> plot(fitted(fm), resid(fm), # Doing a standard diagnostic plot
+ xlab="Fitted values", # to check for heteroscedasticity**,
+ ylab="Residuals", # viz., checking for differing variance.
+ main="Residuals vs Fitted")
> # Outputting Figure 4.9.
**Heteroscedasticity occurs when the variances of the error terms differ across observations.
> qqnorm(resid(fm), main="Residuals Rankit Plot")
> # Doing a normal scores plot to check for skewness, kurtosis,
> # and outliers. (Not very useful here.)
> # Outputting Figure 4.11.
> rm(fm, fm1, lrf, x, dummy) # Removing these 5 objects
> fm
Error: object 'fm' not found # Checked!
> fm1
Error: object 'fm1' not found # Checked!
> lrf
Error: object 'lrf' not found # Checked!
> x
Error: object 'x' not found # Checked!
> dummy
Error: object 'dummy' not found # Checked!
> # END OF THIS PRACTICE SESSION!
Getting through the First Session in Section 4.2.1 shows:
Technically, R is an expression language with a simple syntax, which is almost self-explanatory. It is case sensitive, so x and X are different symbols and refer to different variables. Names may use all alphanumeric characters, plus '.' and '_', with the restriction that a name must start with '.' or a letter, and if it starts with '.' the second character must not be a digit. The command prompt > indicates when R is ready for input.
This is where one types commands to be processed by R, which happens when one hits the ENTER key. Commands consist of either expressions or assignments. When an expression is given as a command, it is immediately evaluated and printed, and the value is discarded. An assignment evaluates an expression and passes the value to a variable, but the value is not automatically printed. To print the computed value, simply enter the variable again at the next command. Commands are separated either by a new line or by a semicolon (';'). Several elementary commands may be grouped together into one compound expression by braces ('{' and '}').
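These points can be verified directly at the prompt; a minimal sketch:

```r
# Case sensitivity, and the difference between an assignment and an expression.
x <- 2 + 3   # assignment: the value 5 is stored in x; nothing is printed
x            # expression: entering the name prints the stored value
X <- 10      # X is a different variable from x (R is case sensitive)
x == X       # an expression comparing the two distinct variables
```

Entering `x` alone prints `[1] 5`, while the assignment line itself produces no output.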
Comments, starting with a hashmark/number sign (‘#’), may be put almost anywhere: everything to the end of the line following this sign is a comment.
Comments may not be used in an argument list of a function definition or inside strings. If a command is not complete at the end of a line, R will give a different prompt, by default a "+" sign, on the second and subsequent lines, and will continue to read input until the command is syntactically complete. The result of a command is printed to the output device: if the result is an array, such as a vector or a matrix, the elements are formatted with line breaks wherever necessary, with the index of the leading entry of each line labeled in square brackets: [index].
For example, an array of 15 elements may be outputted as:
> array(8, 15)
[1] 8 8 8 8 8 8 8 8 8 8
[11] 8 8 8 8 8
The labels '[1]' and '[11]' indicate the 1st and 11th elements in the output. These labels are not part of the data itself! Similarly, the labels for a matrix are placed at the start of each row and column in the output. For example, the 3 × 5 matrix M is outputted as:
>
> M <- matrix(1:15, nrow=3)
> M
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
>
Note that the storage is column-major, namely, the elements of the first column are printed first, followed by those of the second column, and so on. To fill a matrix row-wise rather than in the default column-wise fashion, add the switch byrow=T:
>
> M <- matrix(1:15, nrow=3, byrow=T)
> M
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
>
The First Session also shows that there is a host of helpful resources embedded in the R environment that one can readily access, using the online help provided by CRAN.
Please follow the step-by-step instructions given in Section 4.2 to set up an R environment. The R window now looks like this:
>
Great!
Now enter the following arithmetic operations; press "Enter" after each entry:
(a) 2 + 3 <Enter>
(b) 13 – 7 <Enter>
(c) 17 * 23 <Enter>
(d) 100/25 <Enter>
(e) Did you obtain the following results:
5, 6, 391, 4?
> th <- seq(-pi, pi, len=20)
> th
(a) How many numbers are printed out?
> z <- exp(1i*th)
> z
(b) How many complex numbers are printed out?
> par(pty="s")
(c) Along the menu-bar at the top of the R environment:
* Select and left-click on "Window", then
* Move downwards and select the 2nd option:
R Graphic Device 2 (ACTIVE)
* Go to the “R Graphic Device 2 (ACTIVE) Window”
(d) What is there?
> plot(z)
(e) Describe what is in the Graphic Device 2 Window.
To learn to do statistical analysis and computations, one may start by considering the R programming language as a simple calculator.
Start from here: just enter an arithmetic expression, press the <Enter> key, and the answer from the machine is found in the next line
>
> 2 + 3
[1] 5
>
OK! What about other calculations? Such as: 13 − 7, 3 × 5, 12/4, 7², √2, e³, e^(iπ), ln 5 = logₑ5, (4 + √3)(4 − √3), (4 + i√3)(4 − i√3),
and so on. Just try:
>
> 13 - 7
[1] 6
> 3*5
[1] 15
> 12/4
[1] 3
> 7^2
[1] 49
> sqrt(2)
[1] 1.414214
>
> exp(3)
[1] 20.08554
>
> exp(1i*pi) # 1i is used for the complex number i = √−1
[1] -1-0i # This is just the famous Euler's Identity:
# e^(iπ) + 1 = 0.
> log(5)
[1] 1.609438
> (4 + sqrt(3))*(4 - sqrt(3))
[1] 13
[Checking: (4+√3)(4−√3) = 4² − (√3)² = 16 − 3 = 13 (Checked!)]
> (4 + 1i*sqrt(3))*(4 - 1i*sqrt(3))
[1] 19+0i [Checking: (4+i√3)(4−i√3) = 4² − (i√3)²
= 16 − (−3) = 19 (Checked!)]
Remark: The [1] in front of the computed result is R's way of outputting numbers. It becomes useful when the result is a long vector. The number N in the brackets [N] is the index of the first number on that line. For example, if one generated 23 random numbers from a normal distribution
>
> x <- rnorm(23)
> x
[1] -0.5561324 0.2478934 -0.8243522 1.0697415 1.5681899
[6] -0.3396776 -0.7356282 0.7781117 1.2822569 -0.5413498
[11] 0.3348587 -0.6711245 -0.7789205 -1.1138432 -1.9582234
[16] -0.3193033 -0.1942829 0.4973501 -1.5363843 -0.3729301
[21] 0.5741554 -0.4651683 -0.2317168
>
Remark: After the random numbers have been generated, there is no output until one calls for x; namely, x has become a vector with 23 elements, called a 23-vector.
The [11] on the third line of the output indicates that 0.3348587 is the 11th element of the 23-vector x. The number of outputs per line depends on the length of each element as well as the width of the page.
R is designed to be a dynamically typed language, namely, at any time one may change the data type of any variable. For example, one can first set x to be numeric as has been done so far, say x = 7; next one may set x to be a vector, say x = c (1, 2, 3, 4); then again one may set x to a word object, such as “Hi!”. Just watch the following R environment:
>
> x <- 7
> x
[1] 7
> x <- c(1, 2, 3, 4) # x is assigned to be a 4-vector.
> x
[1] 1 2 3 4
> x <- c("Hi!") # x is assigned to be a character string.
> x
[1] "Hi!"
> x <- c("Greetings & Salutations!")
> x
[1] "Greetings & Salutations!"
> x <- c("The rain in Spain falls mainly on the
+ plain.")
> x
[1] "The rain in Spain falls mainly on the plain."
> x <- c("Calculus", "Financial", "Engineering", "R")
> x
[1] "Calculus" "Financial" "Engineering" "R"
>
The use of arrays and matrices was introduced in Section 4.2.2.
In finite mathematics, a matrix is a two-dimensional array of elements, which are usually numbers. In R, the use of the matrix extends to elements of any type, such as a matrix of character strings. Arrays and matrices may be represented as vectors with dimensions. In statistics, where most variables carry multiple values, computations are usually performed between vectors of many elements, and these operations among multivariates result in large matrices. Graphical representations are often useful for displaying the results. The following simple example illustrates how these operations are readily accomplished in the R environment:
>
> weight <- c(73, 59, 97)
> height <- c(1.79, 1.64, 1.73)
> bmi <- weight/height^2
> bmi # Read the BMI Notes below
[1] 22.78331 21.93635 32.41004
> # To summarize the results proceed to compute as follows:
> cbind(weight, height, bmi)
weight height bmi
[1,] 73 1.79 22.78331
[2,] 59 1.64 21.93635
[3,] 97 1.73 32.41004
>
> rbind(weight, height, bmi)
[,1] [,2] [,3]
weight 73.00000 59.00000 97.00000
height 1.79000 1.64000 1.73000
bmi 22.78331 21.93635 32.41004
>
Clearly, the functions cbind and rbind bind (namely, join, link, glue, concatenate) by column and row, respectively, the vectors to form new vectors or matrices.
In the analysis of, for example, health science data sets, categorical variables are often needed. These categorical variables indicate subdivisions of the original data set into various classes, for example: age, gender, disease stages, degrees of diagnosis, and so on. On input, each categorical variable in the original data set is generally delineated into its categories using a numeric code. Such variables are specified as factors in R, resulting in a data structure that enables one to assign specific names to the various categories. In certain analyses, it is necessary for R to distinguish between categorical codes and variables whose values have direct numerical meanings.
A factor consists of two items: a vector of coded values and a set of levels naming the categories. Consider the following example, in which a factor with four levels is created:
The following R code segment delineates the data set:
> cancerpain <- c(1, 4, 3, 3, 2, 4)
> fcancerpain <- factor(cancerpain, levels=1:4)
> levels(fcancerpain) <- c("none", "mild",
+ "moderate", "severe")
The first statement creates a numerical vector cancerpain that encodes the pain levels of six case subjects. Because this is a categorical variable, the factor function is used to create a factor, fcancerpain. It is called with one argument in addition to cancerpain, namely levels=1:4, which indicates that the input coding uses the values 1–4. In the final line, the pain level names are changed to the four specified character strings. The result is
> fcancerpain
[1] none severe moderate moderate mild severe
Levels: none mild moderate severe
> as.numeric(fcancerpain)
[1] 1 4 3 3 2 4
> levels(fcancerpain)
[1] "none" "mild" "moderate" "severe"
Remarks: The function as.numeric outputs the numerical coding as numbers 1–4, and the function levels outputs the names of the respective levels.
The original input coding in terms of the numbers 1–4 is no longer needed. There is an additional option: the function ordered, which is similar to the function factor used here but also records an ordering among the levels.
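A minimal sketch of ordered(), reusing the same cancer-pain coding as above (the labels are the same four level names):

```r
# ordered() works like factor() but records that the levels
# have a natural ordering: none < mild < moderate < severe.
cancerpain <- c(1, 4, 3, 3, 2, 4)
opain <- ordered(cancerpain, levels = 1:4,
                 labels = c("none", "mild", "moderate", "severe"))
opain
# Ordered levels permit order-based comparisons:
opain > "mild"
# [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE
```

Such comparisons are not meaningful for an unordered factor, which is why the distinction matters in analyses where the categories are graded.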
The body mass index (BMI) is a useful measure of human body fat based on an individual's weight and height – it does not actually measure the percentage of fat in the body. Devised in the nineteenth century, BMI is defined as a person's body weight (in kilograms) divided by the square of the height (in meters). The formula universally used in health science, BMI = weight (kg) / [height (m)]², produces a unit of measure of kg/m².
A BMI chart may be used displaying BMI as a function of weight (horizontal axis) and height (vertical axis) with contour lines for different values of BMI or colors for different BMI categories (Figure 4.12).
Generating graphical presentations is an important aspect of statistical data analysis. The R environment allows both the production of plots and fine control of their graphical features. Thus, with the previous example, the relationship between body weight and height may be examined by first plotting one versus the other, using the following R code segment:
>
> plot (weight, height)
> # Outputting: Figure 4.13.
Remarks:
> ?plot # This is a call for “Help!” within the R environment.
> # The output is the R documentation for:
> plot {graphics} # Generic X-Y plotting
This is the official documentation of the R function plot, within the R package graphics – note the special notation used for plot and {graphics}. To make full use of the R environment, one should carefully study such documentation. (R has many available packages, each containing a number of useful functions.) This document shows all the plotting options available within the R environment. A copy of this documentation is shown in Appendix 1 for reference.
For example, to change the plotting symbol, one may use the keyword pch (for “plotting character”) in the following R command:
> plot (weight, height, pch=8)
> # Outputting: Figure 4.14.
Note that the output is the same as that shown in Figure 4.13, except that the points are marked with little “8-point stars”, corresponding to plotting character pch = 8.
In the documentation for pch, a total of 26 options are available, providing different plotting characteristics for points in R graphics. They are shown in Figure 4.15.
The parameter BMI was chosen so that its value is largely independent of a person's height, thus expressing in a single number, or index, whether a case subject is overweight, and by what relative amount. Of course, one may plot "height" as the abscissa (namely, the horizontal x-axis) and "weight" as the ordinate (namely, the vertical y-axis), as follows:
> plot(height, weight, pch=8) # Outputting: Figure 4.16.
A normal BMI lies between 18.5 and 25, with midpoint (18.5 + 25)/2 = 21.75. At this BMI value, the weight of a typical "normal" person would be 21.75 × height². Thus, one can superimpose a line of "expected" weights at BMI = 21.75 on Figure 4.16. This may be accomplished in the R environment by the following code segments:
> ht <- c(1.79, 1.64, 1.73)
> lines(ht, 21.75*ht^2) # Outputting: Figure 4.17.
In the last plot, a new variable for heights (ht) was defined instead of reusing the original (height). Note also what happens when vectors of unequal lengths are combined:
Remarks:
>
> weight <- c(73, 59, 97) # a 3-vector
> height <- c(1.79, 1.64, 1.73, 1.48) # a 4-vector !
> bmi <- weight/height^2 # Outputting:
Warning message: # Note: a warning, not a fatal error!
In weight/height^2 :
longer object length is not a multiple of shorter object length
>
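The warning above comes from R's recycling rule: when two vectors of unequal length are combined, the shorter one is reused from its beginning. A small sketch:

```r
# When one length is a multiple of the other, R recycles
# silently: c(1, 2) is reused as 1, 2, 1, 2.
c(10, 20, 30, 40) + c(1, 2)
# [1] 11 22 31 42
# When the lengths are not multiples (e.g., 4 and 3), R still
# recycles, but it issues the warning seen above.
```

In the BMI example the lengths 3 and 4 are not multiples of one another, hence the warning.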
It has just been shown that a variable, such as x or M, may be assigned a matrix such as
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15
Vectors of character strings may be specified using either double or single quotes:
> c("one", "two", "three", "four", "five")
> # Double-quotes
[1] "one" "two" "three" "four" "five"
>
> c('one', 'two', 'three', 'four', 'five')
> # Single-quotes
[1] "one" "two" "three" "four" "five"
>
> c("one", 'two', "three", 'four', "five")
> # Mixed-quotes
[1] "one" ‘two’ "three" ‘four’ "five"
Note that R always prints character strings with double quotes, regardless of the quoting style used on input. However, a mismatched pair of quotes, such as "xxxxx', will not be accepted; for example, the following line produces an error, as R keeps waiting for the string to be closed:
> c("one", "two", "three", "four", "five')
Logical vectors are similarly specified using the c function, with the values T/TRUE and F/FALSE:
> c(T, F, T, F, T)
[1] TRUE FALSE TRUE FALSE TRUE
In most cases, there is no need to type logical vectors element by element: relational expressions applied to vectors of values produce logical vectors automatically. Observe
> weight <- c(73, 59, 97)
> height <- c(1.79, 1.64, 1.73)
> bmi <- weight/height^2
> bmi # Outputting:
[1] 22.78331 21.93635 32.41004
> bmi > 25 # A relational expression yields a logical vector
[1] FALSE FALSE TRUE
>
Three functions that create vectors are c, seq, and rep:
> x <- c(1, 2, 3, 4) # x is assigned to be a 4-vector.
> x
[1] 1 2 3 4
> seq(1, 20, 2) # To output a sequence from 1 to 20, in steps of 2
[1] 1 3 5 7 9 11 13 15 17 19
> seq(1, 20) # To output a sequence from 1 to 20, in steps of 1
> # (which may be omitted)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> 1:20 # This is a simplified alternative to writing seq(1, 20).
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> seq(1, 20, 2.5) # To output a sequence from 1 to 20, in steps of
> # 2.5 .
[1] 1.0 3.5 6.0 8.5 11.0 13.5 16.0 18.5
> rep(1:2, c(3,5)) # Replicating the first element (1) 3 times,
> # and then replicating the second element (2) 5 times
[1] 1 1 1 2 2 2 2 2 # This is the output.
> vector <- c(1, 2, 3, 4)
> vector # Outputting the vector
[1] 1 2 3 4
> rep(vector, 5) # Replicating vector 5 times:
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
In finite mathematics, a matrix M is a two-dimensional array of elements, generally numbers, such as
M = ( 1  4  7  10  13
      2  5  8  11  14
      3  6  9  12  15 )
and the array is usually placed inside parenthesis () or some brackets {}, [], and so on. In R, the use of a matrix is extended to elements of many types: numbers as well as character strings. For example, in R, the matrix M is expressed as
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15
In R, the above 3 × 5 matrix may be set up as a vector with dimension dim(x), using the following code segment:
> x <- 1:15
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> dim(x) <- c(3, 5) # a dimension of 3 rows by 5 columns
> x # Outputting:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15
Remark: Here a total of 15 elements, 1–15, are set to be the elements of the matrix x. Then the dimension of x is set as c(3, 5), making x to become a 3 × 5 matrix. The assignment of the 15 elements follows a column-wise procedure, namely, the elements of the first column are allocated first followed by those of the second column, then the third column, and so on.
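Individual entries of such a matrix may then be retrieved by [row, column] indexing; a quick sketch using the same x:

```r
x <- 1:15
dim(x) <- c(3, 5)   # x is now a 3 x 5 matrix, filled column-wise
x[2, 4]             # the element in row 2, column 4
# [1] 11
x[, 2]              # the whole of column 2
# [1] 4 5 6
x[3, ]              # the whole of row 3
# [1]  3  6  9 12 15
```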
Another way to generate a matrix is to use the function matrix(). The above 3 × 5 matrix may be created by the following one-line code segment, whose result is printed directly:
> matrix(1:15, nrow=3) # Outputting:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
However, if the 15 elements should be allocated by row, then the following code segment should be used; the result is again printed directly:
> matrix(1:15, nrow=3, byrow=T) # Outputting:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
Using the previous example, column names may be added to the matrix x created earlier with dim(x) <- c(3, 5):
> colnames(x) <- c("C1", "C2", "C3", "C4", "C5")
> x # Outputting:
C1 C2 C3 C4 C5
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
> t(x)
[,1] [,2] [,3]
C1 1 2 3
C2 4 5 6
C3 7 8 9
C4 10 11 12
C5 13 14 15
> t(t(x)) # which is just x, as expected!
C1 C2 C3 C4 C5
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
Yet another way is to use the function LETTERS, which is a built-in variable containing the capital letters A through Z. Other useful vectors include letters, month.name, and month.abb for lower-case letters, month names, and abbreviated names of months, respectively. Take a look:
> X <-LETTERS
> X # Outputting:
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
[16] "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
> M <- month.name
> M # Outputting:
[1] "January" "February" "March" "April" "May"
[6] "June" "July" "August" "September" "October"
[11] "November" "December"
> m <- month.abb
> m # Outputting:
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct"
[11] "Nov" "Dec"
NA is a logical constant of length 1 that contains a missing value indicator. NA may be coerced to any other vector type except raw. There are also constants NA_integer_, NA_real_, NA_complex_, and NA_character_ of the other atomic vector types that support missing values; all of these are reserved words in the R language.
The generic function is.na indicates which elements are missing.
The generic function is.na<- sets elements to NA.
The reserved words in R's parser are
if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_, and ..., ..1, ..2, and so on, the last of which are used to refer to arguments passed down from an enclosing function.
Reserved words outside quotes are always parsed as references to the objects listed above, and are not allowed as syntactic names. They are, however, allowed as nonsyntactic names, for example when enclosed in backticks.
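For example (a sketch), a reserved word becomes usable as a nonsyntactic name when quoted in backticks:

```r
# `if` is a reserved word; a bare assignment to it is a
# syntax error, but backticks make the name legal:
# if <- 3        # Error: unexpected assignment
`if` <- 3        # legal, though rarely advisable
`if` + 1
# [1] 4
```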
There are three useful R functions that are often used to create vectors:
> c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
> # The first 10 prime numbers
[1] 2 3 5 7 11 13 17 19 23 29
> seq(1, 20) # Sequence from 1 to 20
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> seq(1, 20, 1) # Sequence from 1 to 20, in steps of 1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> 1:20 # Sequence from 1 to 20, in steps of 1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> seq(1, 20, 2) # Sequence from 1 to 20, in steps of 2
[1] 1 3 5 7 9 11 13 15 17 19
> seq(1, 20, 3) # Sequence from 1 to 20, in steps of 3
[1] 1 4 7 10 13 16 19
> seq(1, 20, 10) # Sequence from 1 to 20, in steps of 10
[1] 1 11
> seq(1, 20, 20) # Sequence from 1 to 20, in steps of 20
[1] 1
> seq(1, 20, 21) # Sequence from 1 to 20, in steps of 21
[1] 1
>
> x <- c(3, 4, 5)
> rep(x, 4) # Replicate the vector x 4-times.
[1] 3 4 5 3 4 5 3 4 5 3 4 5
> rep(x, 1:3) # Replicate the elements of x: the first element
> # once, the second element twice,
> # and the third element three times.
[1] 3 4 4 5 5 5
> rep(1:3, c(3,4,5)) # For the sequence (1, 2, 3), replicate its
> # elements 3-, 4-, and 5-times, respectively
[1] 1 1 1 2 2 2 2 3 3 3 3 3
There is an interesting challenge in arithmetic that goes like this:
What is the value of √2^(√2^(√2^⋯))? Namely, an infinite ascending tower of powers of the square root of 2.
Solution: Let x be the value of this "Tower of Powers"; then it is easily seen that √2^x = x itself! Agree?
Watch the lowest √2: raising it to the power of the (identical) tower above it reproduces the whole tower.
It follows that x = 2, because √2^2 = 2. (The equation √2^x = x also has the solution x = 4, but the tower converges to the smaller root, 2.)
This shows that the value of this "Infinite Tower of Powers of √2" is just 2.
Now use the R environment to verify this interesting result:
(a) Compute √2
> sqrt(2)
(b) Compute √2√2
> sqrt(2)^sqrt(2) [a 2-Towers of √2-s]
(c) > sqrt(2)^sqrt(2)^sqrt(2) [a 3-Towers of √2-s]
(d) > sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)[a 4-Towers of √2-s]
(e) > sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)
[a 5-Towers of √2-s]
(f) Now try the following computations of 10-, 20-, 30-, and finally 40-Towers of Powers of √2, and finally reach the result of 2 (accurate to 6 places of decimal!).
> sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)
[1] 1.983668 [a 10-Towers of Powers of √2-s]
> sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^
+ sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)
[1] 1.999586 [a 20-Towers of Powers of √2-s]
>sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^
+sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^
+ sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)
[1] 1.999989 [a 30-Towers of Powers of √2-s]
>sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^
+sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^
+sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^
+ sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)^sqrt(2)
[1] 2 [a 40-Towers of Powers of √2-s]
Thus, this R computation verifies the solution.
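The same limit may be reached without typing ever-longer expressions; a sketch using a simple loop:

```r
# Build the tower iteratively: start at 1 and repeatedly
# take sqrt(2) to the current power.
x <- 1
for (i in 1:100) x <- sqrt(2)^x
x
# [1] 2
```

Each iteration adds one more level to the tower, and after 100 iterations the value agrees with 2 to machine precision.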
+, −, ×, / (division), √, squaring of a number
Enter the R environment, and do the following exercises using R programming:
(a) 7 + 31; (b) 87 − 23; (c) 3.1417 × 7²; (d) 22/7;
(e) e^√2
BMI = weight (kg) / [height (m)]²
Using 1 kg ≈ 2.2 lb, and 1m ≈ 3.3 ft ≈ 39.4 in
(a) Calculate your BMI.
(b) Is it in the “normal” range 18.5 ≤ BMI ≤ 25?
        John  Chang  Michael  Bryan  Jose
WEIGHT  69.1   62.5     74.3   70.9  96.6
HEIGHT  1.81   1.46     1.69   1.82  1.74
(a) Construct a matrix showing their BMI as the last row.
(b) Plot: (i) WEIGHT (on the y-axis) vs HEIGHT (on the x-axis)
(ii) HEIGHT vs WEIGHT
(iii) Assuming that the weight of a typical "normal" person is (21.75 × HEIGHT²), superimpose a line of "expected" weight at BMI = 21.75 on the plot in (i).
At standard temperature and pressure, the freezing and boiling points of water are 0 and 100 °C, respectively. What are the freezing and boiling points of water in degrees Fahrenheit?
Note: To create the sequence of Celsius temperatures use the R function seq(0, 100, 5).
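One possible sketch of a solution, using the conversion F = (9/5)·C + 32:

```r
celsius <- seq(0, 100, 5)           # 0, 5, 10, ..., 100 degrees Celsius
fahrenheit <- celsius * 9/5 + 32    # F = (9/5)C + 32
cbind(celsius, fahrenheit)
# The first and last rows give the answers:
# freezing point 0 C = 32 F; boiling point 100 C = 212 F.
```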
The probability of not getting infected after two consecutive acts is (1 − p)2, and the probability of not getting infected after three consecutive acts is (1 − p)3. Therefore, the probability of not getting infected after n consecutive acts is (1 − p)n, and the probability of getting infected after n consecutive acts is 1 − (1 − p)n.
For the needle-sharing injection drug use (IDU) per-act transmission probability in Table A (i.e., the non-blood-transfusion route), calculate the risk of being infected after one year (365 days) if one engages in needle-sharing IDU once daily for a year.
Do these cumulative risks seem reasonable? Why? Why not?
Table A
Estimated per-act risk (transmission probability) for acquisition of HIV, by exposure route to an infected source. Source: CDC

Exposure route                               Risk per 10,000 exposures
Blood Transfusion (BT)                                         9,000
Needle-Sharing Injection-Drug Use (IDU)                           67
Solution:
> p <- 67/10000
> p
[1] 0.0067
> q <- (1 - p)
> q
[1] 0.9933
> q365 <- q^365
> q365
[1] 0.08597238
> p365 <- 1 - q365
> p365
[1] 0.9140276
# => Probability of being infected in a year = 91.40%.
# A high risk, indeed!
What is a straddle? A straddle is a type of financial investment strategy involving simultaneously both a put and a call on a given stock, to provide additional opportunities for profiting (with the concomitant risk of losing!).
In finance, a straddle refers to two transactions that share the same security, with positions that offset one another. One holds long risk, the other short. Thus, it involves the purchase or sale of particular option derivatives that allow the holder to profit based on how much the price of the underlying security moves, regardless of the direction of price movement.
A straddle involves buying a call and put with same strike price and expiration date:
A straddle is appropriate when an investor is expecting a large move in a stock price but does not know in which direction the move will be.
The purchase of particular option derivatives is known as a long straddle, while the sale of the option derivatives is known as a short straddle.
A long straddle buys both a call option and a put option on some stock, interest rate, index, or other underlying. The two options are bought at the same strike price and expire at the same time. The owner of a long straddle makes a profit if the underlying price moves a long way from the strike price, either above or below. Thus, an investor may take a long straddle position if he thinks the market is highly volatile, but does not know in which direction it is going to move. This position is a limited risk, since the most a purchaser may lose is the cost of both options. At the same time, there is unlimited profit potential.
For example, company ABC is set to release its quarterly financial results in 2 weeks. An investor believes that the release of these results will cause a large movement in the price of the ABC's stock, but does not know whether the price will go up or down. The investor may enter into a long straddle, where one gets a profit no matter which way the price of ABC stock moves, if the price changes enough either way:
Thus, the risk is limited by the total premium paid for the options, as opposed to the short straddle where the risk is virtually unlimited.
A short straddle is a nondirectional option trading strategy that involves simultaneously selling a put and a call of the same underlying security, strike price, and expiration date. The profit is limited to the premium received from the sale of the put and call. The risk is virtually unlimited, as large moves of the underlying security's price either up or down will cause losses proportional to the magnitude of the price move. The maximum profit upon expiration is achieved if the underlying security trades exactly at the strike price of the straddle: both the put and the call comprising the straddle then expire worthless, allowing the straddle owner to keep the full credit received as profit. This strategy is called "nondirectional" because the short straddle profits when the underlying security changes little in price before the expiration of the straddle. The short straddle may also be classified as a credit spread, because the sale of the short straddle results in a credit of the premiums of the put and call.
(A risk for holder of a short straddle position is unlimited due to the sale of the call and the put options that expose the investor to unlimited losses (on the call) or losses limited to the strike price (on the put), whereas maximum profit is limited to the premium gained by the initial sale of the options.)
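The payoff and profit arithmetic of a long straddle can be sketched in a few lines of R; the strike price, premiums, and price grid below are hypothetical illustrations, not market data:

```r
K <- 100                     # hypothetical strike of the put and call
premium <- 4 + 3             # hypothetical call (4) plus put (3) premiums
St <- seq(80, 120, 10)       # possible stock prices at expiration
payoff <- abs(St - K)        # a long straddle pays |St - K|
profit <- payoff - premium   # ignoring interest on the premiums
cbind(St, payoff, profit)
# The loss is capped at the total premium (7), while the profit
# grows without bound as St moves away from the strike in
# either direction.
```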
An example of a Straddle Financial Investment, using R:
In CRAN, the package, FinancialMath: Financial Mathematics for Actuaries provides a numerical example for assessing a Straddle investment, using the Black–Scholes equation for estimating the call and put prices:
straddle.bls Straddle Spread - Black Scholes
Description
Gives a table and graphical representation of the payoff and profit of a long or short straddle for a range of future stock prices. Uses the Black–Scholes equation for estimating the call and put prices.
Usage
straddle.bls(S,K,r,t,sd,position,plot=TRUE/FALSE)
Arguments
S spot price at time 0
K strike price of the put and call
r continuously compounded yearly risk free rate
t time of expiration (in years)
sd standard deviation of the stock ( a measure of its volatility)
position either buyer or seller of option (“long” or “short”)
plot specifying whether or not to plot the payoff and profit
Details
Stock price at time t = St
For St ≤ K: payoff = K − St
For St > K: payoff = St − K
profit = payoff − (pricecall + priceput) × e^(r×t)
(For a short straddle position, the signs of the payoff and profit are reversed.)
Value
A list of two components.
Payoff A data frame of different payoffs and profits for given stock prices.
Premiums A matrix of the premiums for the call and put options, and the net cost.
See Also
option.put
option.call
strangle.bls
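A hypothetical invocation of straddle.bls() follows; the argument values are chosen purely for illustration, and the FinancialMath package must first be installed from CRAN:

```r
# install.packages("FinancialMath")   # one-time installation from CRAN
library(FinancialMath)

# Hypothetical long straddle: spot 100, strike 100, 3% risk-free
# rate, 1 year to expiration, 20% volatility.
straddle.bls(S = 100, K = 100, r = 0.03, t = 1,
             sd = 0.20, position = "long", plot = TRUE)
```

The call returns the payoff/profit table and premium matrix described above, and (with plot=TRUE) draws the characteristic V-shaped payoff diagram.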
In financial investigations, after preparing the collected data sets (as discussed in Section 3.1), the first step of the analysis is to enter the data sets into the R environment. Once the data sets are within the R environment, the analysis processes the data to obtain results leading to credible conclusions, and likely to recommendations for definitive courses of action. Several methods of data set entry will be examined.
Data Frames and Data sets: For many financial investigators, the terms data frame and data set may be used interchangeably.
Data sets: In many applications, a complete data set contains several data frames, including the real data that have been collected.
Data Frames: Rules for data frames are similar to those for arrays and matrices, introduced earlier. However, data frames are more complicated than arrays. In an array, if just one cell is a character, then all the columns will be characters. A data frame, on the other hand, may consist of columns of different types: one column numeric, another character or factor, and so on. In a data frame, each column (variable) has a name, and all columns must have the same number of rows.
These properties can be transferred from the original data set in other software formats (such as SPSS, Stata, etc.). They can also be created in R.
As an example, using a typical set of real case-control epidemiologic research data, consider the data set in Table B, from a clinical trial to evaluate the efficacy of maintenance chemotherapy for case subjects with acute myelogenous leukemia (AML), conducted at Stanford University, California, U.S.A., in 1977. After reaching a status of remission through treatment by chemotherapy, the patients who entered the study were assigned randomly to two groups: one group received maintenance chemotherapy (maintained), and the other did not (nonmaintained).
The clinical trial was to ascertain whether maintenance chemotherapy prolonged the time until relapse (= “death”).
Procedure: (1) Create an acute myelogenous leukemia (AML) data file, called AML.csv, in Windows; (2) input it into R as a data file AML.
(1) Creating a data frame for R computation
1. Data Input Using EXCEL:
(a) Open the Excel spreadsheet
(b) Type in the data so that the variable names are in row 1 of the Excel spreadsheet.
(c) Consider each row of data as an individual in the study.
(d) Start with column A.
2. Save as a .csv file:
(a) Click: "File" → "Save as" → then, in the file name box (the upper box at the bottom), type: AML
(b) In the "Save in:" box (at the top), choose "Local Disk (C:)". The file AML will then be saved in the top level of the C: drive, but another level may also be chosen.
(c) In the "Save as Type" box (the lower box at the bottom), scroll down, select, and click on CSV (Comma delimited = Comma Separated Values).
(d) Close out of Excel by clicking the big "X" at the top right-hand corner.
Table B**
Data for the AML Maintenance Clinical Study (a + indicates a censored value)
Group                     Duration of Complete Remission (weeks)
1 = Maintained (11)       9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
0 = Nonmaintained (12)    5, 5, 8, 8, 12, 16+, 23, 27, 30, 33, 43, 45
Status coding: 1 = uncensored; 0 = censored (+)
**The AML clinical study data (23 data points) are taken from Table B-1 of Tableman & Kim (2004): "Survival Analysis Using S: Analysis of Time-to-Event Data" by Mara Tableman and Jong Sung Kim, Chapman & Hall/CRC, Boca Raton, 2004.
3. In Windows, check the C: drive for the AML.csv file, namely, C:\AML.csv
4. Read AML into R:
(a) Open R
(b) Use the read.csv() function:
> aml <- read.csv("C:/AML.csv", header = T,
+ sep=",")
(c) Actually, it can also be done by
> aml <- read.csv("C:/AML.csv")
> # Read in the AML.csv file from the C:Drive of the
> # Computer, and call it
> # aml
5. Output the AML.csv file for inspection
> aml # Outputting:
weeks group status
1 9 1 1
2 13 1 1
3 13 1 0
4 18 1 1
5 23 1 1
6 28 1 0
7 31 1 1
8 34 1 1
9 45 1 0
10 48 1 1
11 161 1 0
12 5 0 1
13 5 0 1
14 8 0 1
15 8 0 1
16 12 0 1
17 16 0 0
18 23 0 1
19 27 0 1
20 30 0 1
21 33 0 1
22 43 0 1
23 45 0 1
>
Later, in Section 6.3, this data set will be revisited and further processed for survival analysis.
Data from various sources are often entered using many different software programs.
They may be transferred from one format to another through the ASCII file format.
For example, in Windows, a text file is the most common ASCII file, usually having a “.txt” extension. There are other files in ASCII format, including the “.R” command file.
Data from most software programs can be outputted or saved as an ASCII file. From Excel, a very common spreadsheet program, the data can be saved as “.csv” (comma separated values) format. This is an easy way to interface between Excel spreadsheet files and R. Open the Excel file and “save as” the csv format.
As an example, suppose the file “csv1.xls” is originally an Excel spreadsheet. After “save as” into csv format, the output file is called “csv1.csv”, the contents of which is
"name", "gender", "age"
"A", "F", 20
"B", "M", 30
"C", "F", 40
The characters are enclosed in quotes and the delimiters (variable separators) are commas. Sometimes the file may not contain quotes, as in the file “csv2.csv”.
name, gender, age
A, F, 20
B, M, 30
C, F, 40
For both files, the R command to read in the data set is the same.
> a <- read.csv("csv1.csv", as.is=TRUE)
> a
name gender age
1 A F 20
2 B M 30
3 C F 40
The argument 'as.is=TRUE' keeps all characters as they are; otherwise the characters would be coerced into factors. The variable "name" should not be factored but "gender" should. The following command should, therefore, be entered:
> a$gender <- factor(a$gender)
Note that the object “a” has class data frame and that the names of the variables within the data frame “a” must be referenced using the dollar sign notation $. Otherwise, R will state that the object “gender” cannot be found.
For files with white space (spaces and tabs) as the separator, such as in the file
"data1.txt", the command to use is read.table():
> a <- read.table("data1.txt", header=TRUE,
+ as.is=TRUE)
Consider the file "data2.txt", which is in fixed field format without field separators.
name gender age
1 A F 20
2 B M 30
3 C F 40
To read in such a file, use the function read.fwf():
> a <- read.fwf("data2.txt", skip=1, width=c(1,1,2),
+ col.names = c("name", "gender", "age"),
+ as.is=TRUE)
The previous section deals with creating data frames by reading in data created from programs outside R, such as Excel. It is also possible to enter data directly into R by using the function data.entry(). However, if the data size is large (say more than 15 columns and/or more than 25 rows), the chance of human error is high with the spreadsheet or text mode data entry. A software program specially designed for data entry, such as Epidata, is more appropriate. (http://www.epidata.dk)
The data set, in Table C, lists deaths among subjects who received a dose of tolbutamide or a placebo in the University Group Diabetes Program (1970), stratifying by age:
Table C**
Deaths Among Subjects Who Received Tolbutamide or a Placebo in the University Group Diabetes Program (1970)
              Age < 55            Age ≥ 55            Combined
          Tolbutamide Placebo Tolbutamide Placebo Tolbutamide Placebo
Deaths              8       5          22      16          30      21
Survivors          98     115          76      69         174     184
**Available at http://www.medepi.net/data/ugdp.txt
The R functions that can be used to import the data frame were introduced previously in Sections 4.3.3–4.3.13. A convenient way to enter data at the command prompt is to use the R functions
c(), matrix(), array(), apply(), list(), and data.frame(),
together with a user-defined function such as odds.ratio(),
as shown by the following examples using the data in Table C.
> #Entering data for a vector
> vector1 <- c(8, 98, 5, 115) # Using data from Table C.
> vector1
[1] 8 98 5 115
>
> vector2 <- c(22, 76, 16, 69); vector2
> # Data from Table C.
[1] 22 76 16 69
>
> # Entering data for a matrix
> matrix1 <- matrix(vector1, 2, 2)
> matrix1
[,1] [,2]
[1,] 8 5
[2,] 98 115
> matrix2 <- matrix(vector2, 2, 2); matrix2
[,1] [,2]
[1,] 22 16
[2,] 76 69
>
> # Entering data for an array
> udata <- array(c(vector1, vector2), c(2, 2, 2))
> udata
,, 1
[,1] [,2]
[1,] 8 5
[2,] 98 115
,, 2
[,1] [,2]
[1,] 22 16
[2,] 76 69
> udata.tot <- apply(udata, c(1, 2), sum); udata.tot
[,1] [,2]
[1,] 30 21
[2,] 174 184
>
> # Entering a list
> x <- list(crude.data = udata.tot, stratified.data =
+ udata)
> x$crude.data
[,1] [,2]
[1,] 30 21
[2,] 174 184
> x$stratified
,, 1
[,1] [,2]
[1,] 8 5
[2,] 98 115
,, 2
[,1] [,2]
[1,] 22 16
[2,] 76 69
>
> # Entering a simple data frame
> subjectname <- c("Peter", "Paul", "Mary")
> subjectnumber <- 1:length(subjectname)
> age <- c(26, 27, 28) # These are their true ages,
> # respectively, in 1964!
> gender <- c("Male", "Male", "Female")
> data1 <- data.frame(subjectnumber, subjectname,
+ age, gender)
> data1
subjectnumber subjectname age gender
1 1 Peter 26 Male
2 2 Paul 27 Male
3 3 Mary 28 Female
>
> # Entering a simple function
> odds.ratio <- function(aa, bb, cc, dd){ aa*dd /
+ (bb*cc)}
> odds.ratio(30, 174, 21, 184) # Data from Table C.
[1] 1.510673
The R function scan() is provided by the base package of R. This function reads data into a vector or list from the console or a file. It takes the following usage form:
scan(file = ", what = double(), nmax = -1, n = -1,
sep = "",
quote = if(identical(sep, " ")) "" else
"'"", dec = ".",
skip = 0, nlines = 0, na.strings = "NA",
flush = FALSE, fill = FALSE, strip.white =
FALSE,
quiet = FALSE, blank.lines.skip = TRUE,
multi.line = TRUE,
comment.char = "", allowEscapes =
FALSE,
fileEncoding = "", encoding = "unknown",
text)
Argument
what: The type of what gives the type of data to be read. The supported types are logical, integer, numeric, complex, character, raw, and list. If what is a list, it is assumed that the lines of the data file are records, each containing length(what) items (fields), and the list components should have elements that are one of the first six types listed, or NULL.
The what argument describes the tokens that scan() should expect in the input file.
For a detailed description of this function, execute
> ?scan
The methodology of applying scan() is similar to that of c(), as described in Section 4.4.1.4, except that it does not matter whether the numbers are entered on different lines; the result will still be a single vector.
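A quick sketch using the text argument of scan(), so that no external file is needed:

```r
# Numbers split across two lines still come back as one vector:
x <- scan(text = "1 2 3
4 5 6")
x
# [1] 1 2 3 4 5 6
```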
The function readLines() reads lines from a file, and returns them as a character vector:
> lines <- readLines("input.text")
One may limit the number of lines read per pass by using the n parameter, which gives the maximum number of lines to be read:
> lines <- readLines("input.text", n=5)
> # read 5 lines and stop
The function scan() reads one token at a time, and handles it accordingly as instructed.
An example:
Assume that the file to be scanned and read contains triplets of data (like the dates, and the corresponding daily highs and lows of financial markets):
15-Oct-1987  2439.78  2345.63
16-Oct-1987  2396.21  2207.73
19-Oct-1987  2164.16  1677.55
20-Oct-1987  2067.47  1616.23
21-Oct-1987  2087.07  1951.76
Use a list for the what argument to tell scan() that it should expect a repeating, 3-token sequence:
> triplets <- scan("triples.txt", what=list(character(0),
+ numeric(0), numeric(0)))
Give names to the list elements, and scan() will assign those names to the data:
> triplets <- scan("triples.txt",
+ what=list(date=character(0),
+ high=numeric(0), low=numeric(0)))
Read 5 records
> triplets # Outputting:
$date
[1] "15-Oct-1987" "16-Oct-1987" "19-Oct-1987" "20-Oct-1987" "21-Oct-1987"
$high
[1] 2439.78 2396.21 2164.16 2067.47 2087.07
$low
[1] 2345.63 2207.73 1677.55 1616.23 1951.76
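The triplets example can be reproduced without creating triples.txt by supplying the records through scan()'s text argument (a sketch; the whitespace-separated fields stand in for the file's contents):

```r
records <- "15-Oct-1987 2439.78 2345.63
16-Oct-1987 2396.21 2207.73
19-Oct-1987 2164.16 1677.55
20-Oct-1987 2067.47 1616.23
21-Oct-1987 2087.07 1951.76"

# Each record supplies one date (character) and two prices (numeric).
triplets <- scan(text = records,
                 what = list(date = character(0),
                             high = numeric(0),
                             low  = numeric(0)),
                 quiet = TRUE)
triplets$date       # the five dates
max(triplets$high)  # 2439.78, the highest daily high
```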
The R function source() is also from the base package. This function reads R code from a file or connection and evaluates it. It takes the following usage form:
source(file, local = FALSE, echo = verbose, print.eval = echo,
       verbose = getOption("verbose"),
       prompt.echo = getOption("prompt"),
       max.deparse.length = 150, chdir = FALSE,
       encoding = getOption("encoding"),
       continue.echo = getOption("continue"),
       skip.echo = 0, keep.source = getOption("keep.source"))
For commands that are stored in an external file, such as "commands.R" in the working directory "work," they can be executed in an R environment with the command
> source("commands.R")
The function source() instructs R to read the text and execute its contents. Thus, when one has a long, or frequently used, piece of R code, one may capture it inside a text file. This allows one to rerun the code without having to retype it, and use the function source() to read and execute the code.
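A self-contained sketch of this capture-and-source workflow, using a temporary file so that nothing in the working directory is touched:

```r
# Write a small script to a temporary file, then source() it.
script <- tempfile(fileext = ".R")
writeLines(c('greeting <- "Hi, My Friend!"',
             'print(greeting)'), script)

source(script)   # prints: [1] "Hi, My Friend!"
greeting         # the objects created by the script now exist in the session
unlink(script)   # clean up the temporary file
```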
For example, suppose the file howdy.R contains the familiar greeting:
print("Hi, My Friend!")
Then by sourcing the file, one may execute the content of the file, as in the following R code segment:
> source("howdy.R")
[1] "Hi, My Friend!"
Setting echo=TRUE will echo the script lines before they are executed, with the R prompt shown before each line:
> source("howdy.R", echo=TRUE)
> print("Hi, My Friend!")
[1] "Hi, My Friend!"
This method consists of the following R functions in the package utils.
This is a spreadsheet-like editor for entering or editing data, with the following R functions:
data.entry(..., Modes = NULL, Names = NULL)
dataentry(data, modes)
de(..., Modes = list(), Names = NULL)
The arguments of these R functions are:
... | A list of variables: currently these should be numeric or character vectors, or a list containing such vectors. |
Modes | The modes to be used for the variables. |
Names | The names to be used for the variables. |
data | A list of numeric and/or character vectors. |
modes | A list of length up to that of data giving the modes of (some of) the variables. list() is allowed. |
The function data.entry() edits an existing object, saving the changes to the original object name.
However, the function edit() edits an existing object without saving the changes to the original object name, so one must assign the result to an object name (even if it is the original name). To enter a vector, one needs to initialize a vector and then use the function data.entry(). For example:
Start by entering the R environment, and set
> x <- c(2, 4, 6, 8, 10)
> # x is initially defined as a vector of 5 elements.
> x # Just checking – to make sure!
[1] 2 4 6 8 10 # x is indeed set to be a vector of 5 elements
>
> data.entry(x) # Entering the Data Editor:
> # The Data Editor window pops up, and looking at the first
> # column: it is now named “x”, with the first 5 rows (all on first
> # column) filled, respectively, by the numbers 2, 4, 6, 8, 10
> # One can now edit this dataset by, say, changing all the
> # entries to 2, then closing the Data Editor window, and
> # returning to the R console window:
> x
[1] 2 2 2 2 2 # x is indeed changed!
> # Thus one can change the entries for x via the Data Editor,
> # and save the changes.
When using the functions data.entry(x) and edit() for data entry, there are a number of limitations:
To illustrate the ease of use of R in financial mathematics, consider a simple example of the repayment process of a loan, such as a mortgage on a piece of real estate.
Two examples are used for calculating the loan repayment process:
Example (A): A loan of one million dollars, at an interest rate of 2.5%, to be repaid by equal monthly installments over 30 years, or 360 months. One should also consider the rate and total amount of interest to be repaid over the life of the loan.
> amort.table(Loan=1000000,n=360,
+ i=0.025,pf=360,plot=TRUE)
Example (B): Example A shows that the monthly payment is $2,812.31. Suppose the borrower can afford to pay more than this monthly amount, say $5,000.00 per month. How does this alternate repayment scheme affect the overall repayment process, particularly in terms of the total interest payable over the whole life of the loan?
Package: | FinancialMath |
Date: | December 16, 2016 |
Type: | Package |
Title: | Financial Mathematics for Actuaries |
Version: | 0.1.1 |
Author: | Kameron Penn [aut, cre], Jack Schmidt [aut] |
Maintainer: | Kameron Penn <[email protected]> |
Description: | Contains financial math functions and introductory derivative functions included in the Society of Actuaries and Casualty Actuarial Society 'Financial Mathematics' exam, and some topics in the 'Models for Financial Economics' exam. |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
NeedsCompilation: | no |
Repository: | CRAN |
Date/Publication: | 2016-12-16 22:51:34 |
amort.table Amortization Table
Description
Produces an amortization table for paying off a loan while also solving for either the number of payments, loan amount, or the payment amount. In the amortization table, the payment amount, interest paid, principal paid, and balance of the loan are given for each period. If n ends up not being a whole number, outputs for the balloon payment, drop payment, and last regular payment are provided. The total interest paid, and total amount paid is also given. It can also plot the percentage of each payment toward interest versus period.
Usage
amort.table(Loan=NA, n=NA, pmt=NA, i=0.025,
ic=1, pf=1, plot=TRUE)
Arguments
Loan | loan amount |
n | the number of payments/periods |
pmt | value of level payments |
i | nominal interest rate convertible ic times per year |
ic | interest conversion frequency per year |
pf | the payment frequency- number of payments per year |
plot | tells whether or not to plot the percentage of each payment toward interest vs.period |
Details
Effective Rate of Interest: eff.i = (1 + i/ic)^ic − 1
Rate per payment period: j = (1 + eff.i)^(1/pf) − 1
Loan = pmt * a_(n|j)
Balance at the end of period t: B_t = pmt * a_(n−t|j)
Interest paid at the end of period t: i_t = B_(t−1) * j
Principal paid at the end of period t: p_t = pmt − i_t
Total Paid = pmt * n
Total Interest Paid = pmt * n − Loan
If n = n* + k, where n* is an integer and 0 < k < 1:
Last regular payment (at period n*) = pmt * s_(k|j)
Drop payment (at period n* + 1) = Loan * (1 + j)^(n*+1) − pmt * s_(n*|j)
Balloon payment (at period n*) = Loan * (1 + j)^n* − pmt * s_(n*|j) + pmt
Value
A list of two components.
Schedule | A data frame of the amortization schedule. |
Other | A matrix of the input variables and other calculated variables. |
Note
Assumes that payments are made at the end of each period.
One of n, Loan, or pmt must be NA (unknown).
If pmt is less than the amount of interest accumulated in the first period, then the function will stop because the loan will never be paid off due to the payments being too small.
If pmt is greater than the loan amount plus interest accumulated in the first period, then the function will stop because one payment will pay off the loan.
Author(s)
K. Penn and J. Schmidt
See Also
amort.period
annuity.level
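The payment formula in the Details above can be sketched directly in base R. Plugging in Example (A)'s inputs (Loan = 1,000,000, n = 360, i = 0.025, ic = 1, pf = 360) reproduces the monthly payment of $2,812.31 cited in Example (B); the helper name level_payment is illustrative, not part of FinancialMath:

```r
# Level payment implied by Loan = pmt * a_(n|j), per the Details above.
level_payment <- function(Loan, n, i, ic = 1, pf = 1) {
  eff.i <- (1 + i / ic)^ic - 1        # effective annual rate
  j     <- (1 + eff.i)^(1 / pf) - 1   # rate per payment period
  a_nj  <- (1 - (1 + j)^(-n)) / j     # annuity factor a_(n|j)
  Loan / a_nj                         # solve Loan = pmt * a_(n|j) for pmt
}

pmt <- level_payment(Loan = 1e6, n = 360, i = 0.025, ic = 1, pf = 360)
round(pmt, 2)              # 2812.31
round(pmt * 360 - 1e6, 2)  # Total Interest Paid = pmt * n - Loan
```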
Remark:
In this case, it took only 258.04 months to pay off the loan. Hence, the overall saving in interest payments over the life of the $1M loan is substantial.
(Seems attractive?)
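The 258.04-month figure in the Remark can be checked with a short base-R sketch; the assumption here is a monthly payment period (pf = 12) on the 2.5% effective annual rate, with Loan = pmt * a_(n|j) solved for n:

```r
# Solve Loan = pmt * (1 - (1 + j)^(-n)) / j for n, with pmt = 5000.
Loan <- 1e6
pmt  <- 5000
j    <- (1 + 0.025)^(1/12) - 1               # monthly rate per period
n    <- -log(1 - Loan * j / pmt) / log(1 + j)
round(n, 2)   # 258.04 months
```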
The Function list()
A list in R consists of an ordered collection of objects – its components, which may be of any mode or type. For example, a list may consist of a matrix, a numeric vector, a complex vector, a logical value, a character array, a function, and so on. A simple way to create a list would be:
Example 1: It is as easy as “1, 2, 3”!
> x <- 1
> y <- 2
> z <- 3
> list1 <- list(x, y, z) # Forming a simple list
> list1 # Outputting:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
Moreover, the components are always numbered, and may be referred to as such. Thus, if my.special.list is the name of a list with four components, they may be referred to, individually, as
my.special.list[[1]], my.special.list[[2]],
my.special.list[[3]], and my.special.list[[4]].
If one defines my.special.list as:
> my.special.list <- list(name="John", wife="Mary",
+ number.of.children=3, children.age=c(2, 4, 6))
then
> my.special.list[[1]] # Outputting:
[1] "John"
> my.special.list[[2]]
[1] "Mary"
> my.special.list[[3]]
[1] 3
> my.special.list[[4]]
[1] 2 4 6
The number of (top-level) components in a list may be found by the function length(). Thus
> length(my.special.list)
[1] 4
viz., the list my.special.list has 4 components.
To combine a set of objects into a larger composite collection for more efficient processing, the list function may be used to construct a list from its components.
As an example, consider
> odds <- c(1, 3, 5, 7, 9, 11,13,15,17,19)
> evens <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
> mylist <- list(before=odds, after=evens)
> mylist
$before
[1] 1 3 5 7 9 11 13 15 17 19
$after
[1] 2 4 6 8 10 12 14 16 18 20
> mylist$before
[1] 1 3 5 7 9 11 13 15 17 19
> mylist$after
[1] 2 4 6 8 10 12 14 16 18 20
Components of a list may be named. In such a case, a component may be referred to either by position, as name[[i]], or by its name, as
> name$component_name
both forms referring to the same object.
Example 2: A family affair -
> my.special.list <- list(name="John", wife="Mary",
+ number.of.children=3, children.age=c(2, 4, 6))
> my.special.list # Outputting:
$name
[1] "John"
$wife
[1] "Mary"
$number.of.children
[1] 3
$children.age
[1] 2 4 6
Thus, for this list:
> my.special.list[[1]]
[1] "John"
> my.special.list$name
> # This is the same as my.special.list[[1]]
[1] "John"
> my.special.list[[2]]
[1] "Mary"
> my.special.list$wife
> # This is the same as my.special.list[[2]]
[1] "Mary"
> my.special.list[[2]]
[1] "Mary"
> my.special.list[[3]]
[1] 3
> my.special.list$number.of.children
> # This is the same as my.special.list[[3]]
[1] 3
> my.special.list[[4]]
[1] 2 4 6
> my.special.list$children.age
> # This is the same as my.special.list[[4]]
[1] 2 4 6
To extract a component whose name is stored in another variable, one may use the name in double square brackets, viz., my.special.list[["name"]]. The following R code segment may be used:
> x <- "name"; my.special.list[[x]]
[1] "John"
Constructing, Modifying, and Concatenating Lists:
New lists may be constructed from existing objects by the function
list().
Thus, the form
> new.list <- list(name_1=object_1, ...,
+ name_n=object_n)
will set up a list, new.list, of n components, using object_1, …, object_n for the components and giving them the names specified.
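A minimal sketch of constructing lists from existing objects and then concatenating them with c() (the names list.A, list.B, and list.ABC are illustrative):

```r
list.A <- list(ticker = "XYZ", price = 101.25)
list.B <- list(volume = 5000)

# Concatenation with c() preserves the components and their names.
list.ABC <- c(list.A, list.B)
names(list.ABC)   # "ticker" "price" "volume"
length(list.ABC)  # 3
```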
Package ‘Dowd’
March 11, 2016
Type Package
Title Functions Ported from 'MMR2' Toolbox Offered in Kevin Dowd's Book Measuring Market Risk
Version 0.12
Date 2015-08-20
Author Dinesh Acharya <[email protected]>
Maintainer Dinesh Acharya <[email protected]>
Description Kevin Dowd's book Measuring Market Risk is a widely read book in the area of risk
measurement by students and practitioners alike. As he claims, 'MATLAB' indeed might have been
the most suitable language when he originally wrote the functions, but, with the growing popularity
of R, it is not entirely valid. As Dowd's code was not intended to be error free and was mainly for
reference, some functions in this package have inherited those errors. An attempt will be made in
future releases to identify and correct them. Dowd's original code can be downloaded from
www.kevindowd.org/measuring-marketrisk/. It should be noted that Dowd offers both 'MMR2' and
'MMR1' toolboxes. Only 'MMR2' was ported to R. 'MMR2' is a more recent version of the 'MMR1'
toolbox, and they both have mostly similar functions. The toolbox mainly contains different
parametric and non-parametric methods for measurement of market risk as well as
backtesting risk measurement methods.
Depends R (>= 3.0.0), bootstrap, MASS, forecast
Suggests PerformanceAnalytics, testthat
License GPL
NeedsCompilation no
Repository CRAN
Date/Publication 2016-03-11 00:45:03
BlackScholesCallESSim ES of Black-Scholes call using Monte Carlo Simulation
Description
Estimates ES of Black-Scholes call Option using Monte Carlo simulation
Usage
BlackScholesCallESSim(amountInvested, stockPrice, strike, r, mu, sigma,
maturity, numberTrials, cl, hp)
Arguments
amountInvested | Total amount paid for the Call Option; positive (negative) if the option position is long (short) |
stockPrice | Stock price of underlying stock |
strike | Strike price of the option |
r | Risk-free rate |
mu | Expected rate of return on the underlying asset, in annualised terms |
sigma | Volatility of the underlying stock, in annualised terms |
maturity | The term to maturity of the option in days |
numberTrials | The number of trials in the Monte Carlo simulation exercise |
cl | Confidence level for which ES is computed; a scalar |
hp | Holding period of the option in days; a scalar |
Value
ES
Author(s)
Dinesh Acharya
References
Dowd, Kevin. Measuring Market Risk, Wiley, 2007.
Lyuu, Yuh-Dauh. Financial Engineering & Computation: Principles, Mathematics, Algorithms,
Cambridge University Press, 2002.
Examples
# Market Risk of a Black-Scholes call with given parameters.
BlackScholesCallESSim(0.20, 27.2, 25, .16, .2, .05, 60, 30, .95, 30)
In the R domain:
>
> install.packages("Dowd")
Installing package into ‘C:/Users/Bert/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
# A CRAN mirror is selected
The downloaded binary packages are in
C:\Users\Bert\AppData\Local\Temp\RtmpuYe2Ox\downloaded_packages
> # 4.4.3 Stock Market Risk Analysis:
> # ES (Expected Shortfall) in the Black-Scholes Model
> library(Dowd)
> ls("package:Dowd") # Outputting:
[1] "AdjustedNormalESHotspots"
[2] "AdjustedNormalVaRHotspots"
[3] "AdjustedVarianceCovarianceES"
[4] "AdjustedVarianceCovarianceVaR"
[5] "ADTestStat"
[6] "AmericanPutESBinomial"
[7] "AmericanPutESSim"
[8] "AmericanPutPriceBinomial"
[9] "AmericanPutVaRBinomial"
[10] "BinomialBacktest"
[11] "BlackScholesCallESSim"
[12] "BlackScholesCallPrice"
[13] "BlackScholesPutESSim"
[14] "BlackScholesPutPrice"
[15] "BlancoIhleBacktest"
---------------------------------------------
[145] "tVaRPlot3D"
[146] "VarianceCovarianceES"
[147] "VarianceCovarianceVaR"
>
>
> # Market Risk of a Black-Scholes call with given
> # parameters.
> BlackScholesCallESSim(0.20, 27.2, 25, .16, .2, .05,
+ 60, 30, .95, 30)
> # Outputting the Black-Scholes Call ES (Expected Shortfall):
[1] 0.001294227
> # viz., according to the Black-Scholes Model, for this Call,
> # the ES (Expected Shortfall) is predicted to be at the 0.1%
> # level, or, very unlikely indeed!
The "expected shortfall at q% level" is the expected return on the portfolio in the worst q% of cases. ES is an alternative to Value at Risk that is more sensitive to the shape of the tail of the loss distribution.
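For intuition, ES can be estimated directly from simulated portfolio losses as the average loss beyond the VaR quantile (a generic sketch on illustrative normal losses, not the Dowd implementation):

```r
# Estimate 95% VaR and ES from simulated daily losses.
set.seed(42)
losses <- rnorm(100000, mean = 0, sd = 0.02)  # illustrative loss distribution
cl  <- 0.95
VaR <- unname(quantile(losses, cl))  # loss threshold exceeded 5% of the time
ES  <- mean(losses[losses > VaR])    # average loss in the worst 5% of cases
ES > VaR                             # TRUE: ES always lies beyond VaR
```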
Lists may be concatenated with the function c(); for example:
> list.ABC <- c(list.A, list.B, list.C)
As another example, enter the following code segment:
> x <- rexp(100); x
> # Outputting 100 exponentially-distributed
> # random numbers:
[1] 0.39136880 0.66948212 1.48543076 0.34692128 0.71533079 0.12897216
[7] 1.08455419 0.07858231 1.01995665 0.81232737 0.78253619 4.27512555
[13] 2.11839466 0.47024886 0.62351482 1.02834522 2.17253419 0.37622879
[19] 0.16456926 1.81590741 0.16007371 0.95078524 1.26048607 5.92621325
[25] 0.21727112 0.07086311 0.83858727 1.01375231 1.49042968 0.53331210
[31] 0.21069467 0.37559212 0.10733795 2.84094906 0.17899040 1.34612473
[37] 0.00290699 1.77078060 1.79505318 0.09763821 1.96568170 0.15911043
[43] 4.36726420 0.33652419 0.01196883 0.35657882 0.72797670 0.91958975
[49] 0.68777857 0.29100399 0.22553560 1.56909742 0.20617517 0.37169621
[55] 0.53173534 0.26034316 0.21965356 2.94355695 1.88392667 1.13933083
[61] 0.31663107 0.23899975 0.01544856 1.30674088 0.53674598 1.72018758
[67] 0.31035278 0.81074737 0.09104104 1.52426229 1.35520172 0.27969075
[73] 1.36320488 0.56317216 0.85022837 0.49031656 0.17158651 0.31015165
[79] 2.07315953 1.29566872 1.28955269 0.33487343 0.20902716 2.84732652
[85] 0.58873236 1.54868210 2.93994181 0.46520037 0.73687959 0.50062507
[91] 0.20275282 0.49697531 0.58578119 0.49747575 1.53430435 4.56340237
[97] 0.90547787 0.72972219 2.60686316 0.33908320
Note: The function rexp() takes the form
rexp(n, rate = 1)
with arguments:
n | number of observations. If length(n) > 1, the length is taken to be the number required. |
rate | vector of rates. |
The exponential distribution with rate λ has density
f(x) = λe^(−λx), for x ≥ 0.
If the rate λ is not specified, it assumes the default value of 1.
Remark: The function rexp() is one of the functions in R under exponential in the CRAN package stats.
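A quick sketch of this documented behaviour: with rate λ, the theoretical mean of the exponential distribution is 1/λ, and the mean of a large sample lands close to it:

```r
set.seed(123)
x <- rexp(100000, rate = 2)  # 100,000 draws with lambda = 2
mean(x)                      # close to 1/2, the theoretical mean 1/lambda
min(x) >= 0                  # TRUE: the exponential is supported on x >= 0
```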
To undertake a statistical analysis of this set of univariate data, one may call the function univar(), in the package epibasix, using the following code segment:
> library(epibasix)
> univar(x) # Outputting:
Univariate Summary
Sample Size: 100
Sample Mean: 1.005
Sample Median: 0.646
Sample Standard Deviation: 1.067
>
Thus, for this sample size of 100 elements, the mean, median and standard deviation have been computed.
For data analysis of univariate data sets, the R package epibasix may be used.
This CRAN package covers many elementary functions for statistics and econometrics. It contains elementary tools for the analysis of common problems, ranging from sample size estimation through 2 × 2 contingency table analysis to basic measures of agreement (kappa, sensitivity/specificity).
Appropriate print and summary statements are also written to facilitate interpretation wherever possible. This work is appropriate for graduate financial engineering courses.
This package is a work in progress.
To start, enter the R environment and use the code segment:
> install.packages("epibasix")
Installing package(s) into ‘C:/Users/bertchan/Documents/R/win-library/2.14’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
> # Select CA1
trying URL
'http://cran.cnr.Berkeley.edu/bin/windows/contrib/2.14/epibasix_1.1.zip'
Content type 'application/zip' length 57888 bytes (56 Kb)
opened URL
downloaded 56 Kb
package ‘epibasix’ successfully unpacked and MD5 sums checked
The downloaded packages are in
C:\Users\bertchan\AppData\Local\Temp\RtmpMFOrEn\downloaded_packages
With epibasix loaded into the R environment, to learn more about this package,
follow these steps:
1. Go to the CRAN website: http://cran.r-project.org/
2. Select (single-click) Packages, on the left-hand column
3. On the page: select E (for epibasix)
Available CRAN Packages By Name
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
4. Scroll down list of packages whose name starts with “E” or “e”, and select:
epibasix
5. When the epibasix page opens up, select: Reference manual: epibasix.pdf
6. The information is now on displayed, as follows:
Package ‘epibasix’
January 2, 2012
Version 1.1
Date 2009-05-13
Author Michael A Rotondi <[email protected]>
Maintainer Michael A Rotondi <[email protected]>
Depends R (>= 2.01)
For another example, consider the same analysis on the first one hundred Natural
Numbers, using the following R code segments:
> x <-1:100; x # Consider, and then output, the first 100
> # Natural Numbers
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
> library(epibasix)
> univar(x) # Performing a univariate data analysis
> # on the vector x, and Outputting:
Univariate Summary
Sample Size: 100
Sample Mean: 50.5
Sample Median: 50.5
Sample Standard Deviation: 29.011
And that’s it!
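The same summary can be cross-checked in base R; note that sd() computes the sample standard deviation, dividing by n − 1, which matches the 29.011 reported above:

```r
x <- 1:100        # the first 100 Natural Numbers
mean(x)           # 50.5
median(x)         # 50.5
round(sd(x), 3)   # 29.011 (sample standard deviation, divisor n - 1)
```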
When there are two variables, (X, Y), one needs to consider the following two cases:
Correlation for two variables implies a co-relationship between the variables, and does not distinguish between them as to which is the dependent and which the independent variable. Thus, one may fit a straight line to the data either by minimizing ∑(xi − x̂i)², the sum of squared deviations in the x-direction, or by minimizing ∑(yi − ŷi)², the sum of squared deviations in the y-direction. The fitted regression line will, in general, be different in the two cases – and a logical question arises as to which line to fit.
Two situations do exist, and should be considered:
Among the R packages for bivariate data analysis, a notable one, available for sample size calculations for the bivariate random intercept regression model, is bivarRIpower.
As an example, this package may be used to calculate the sample size necessary to achieve 80% power at the 5% alpha level, for null and alternative hypotheses that the correlation between random intercepts (RI) is 0 and 0.2, respectively, across six time points. Other covariance parameters are set as follows:
Correlation between residuals = 0;
Standard deviations: 1st RI = 1, 2nd RI = 2, 1st residual = 0.5, 2nd residual = 0.75
The following R code segment may be used:
> library(bivarRIpower)
> bivarcalcn(power=.80,powerfor='RI',timepts=6,
+ d1=1,d2=2, p=0,p1=.2,s1=.5,s2=.75,r=0,r1=.1)
# Outputting:
Variance parameters
Clusters = 209.2
Repeated measurements = 6
Standard deviations
1st random intercept = 1
2nd random intercept = 2
1st residual term = 0.5
2nd residual term = 0.75
Correlations
RI under H_o = 0
RI under H_a = 0.2
Residual under H_o = 0
Residual under H_a = 0.1
Con obs under H_o = 0
Con obs under H_a = 0.1831984
Lag obs under H_o = 0
Lag obs under H_a = 0.1674957
Correlation variances under H_o
------------------------------------------------------------
Random intercept = 0.005096138
Residual = 0.0009558759
Concurrent observations = 0.00358999
Lagged observations = 0.003574277
Power (%) for correlations
------------------------------------------------------------
Random intercept = 80%
Residual = 89.9%
Concurrent observations = 86.4%
Lagged observations = 80%
>
Under the correlation model, the bivariates X and Y vary together in a joint distribution, which, if this joint distribution is a normal distribution, is called a bivariate normal distribution, from which inferences may be made based on the results of sampling properly from the population. If the joint distribution is known to be non-normal, or if the form is unknown, inferential procedures are invalid. The following assumptions must hold for inferences about the population to be valid when sampling from a bivariate distribution:
Two random variables X and Y are said to be jointly normal if they can be expressed in the form
X = aU + bV, Y = cU + dV,
where U and V are independent normal random variables.
If X and Y are jointly normal, then any linear combination
Z = s1X + s2Y
has a normal distribution. The reason is that if one has X = aU + bV and Y = cU + dV for some independent normal random variables U and V, then
Z = s1(aU + bV) + s2(cU + dV) = (as1 + cs2)U + (bs1 + ds2)V.
Thus, Z is the sum of the independent normal random variables (as1 + cs2)U and (bs1 + ds2)V, and is therefore normal.
A very important property of jointly normal random variables is that zero correlation implies independence.
If two random variables X and Y are jointly normal and are uncorrelated, then they are independent.
(This property can be verified using multivariate transforms)
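The property can also be illustrated by simulation: taking X = U + V and Y = U − V with Var(U) = Var(V) makes Cov(X, Y) = Var(U) − Var(V) = 0, and the sample correlation is correspondingly near zero (a sketch, not a proof):

```r
set.seed(7)
U <- rnorm(50000)
V <- rnorm(50000)
X <- U + V   # jointly normal by construction
Y <- U - V   # Cov(X, Y) = Var(U) - Var(V) = 0
cor(X, Y)    # close to 0: uncorrelated, hence (being jointly normal) independent
```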
The following are the two similar, but distinct, approaches used for multivariate data analysis:
The following are the assumptions underlying multiple regression model analysis:
For multiple linear regression, the model equation is
yj = β0 + β1x1j + β2x2j + β3x3j + … + βnxnj + ej     (4.1)
where yj is a typical value from one of the subpopulations of Y values, and the βi values are the regression coefficients.
x1j, x2j, x3j,…, xnj are, respectively, particular values of the independent variables X1, X2, X3,…, Xn, and ej is a random variable with mean 0 and variance σ2, the common variance of the subpopulation of Y values. Generally, ej is assumed normal and independently distributed.
When Equation (4.1) consists of one dependent variable and two independent variables, the model becomes
yj = β0 + β1x1j + β2x2j + ej
A plane in three-dimensional space may be fitted to the data points. For models containing more than two independent variables, the fitted surface is a hyperplane.
If ȳ is the mean of the observed data,
ȳ = (1/n) ∑ yi
then the variability of the data set may be measured using three sums of squares (each proportional to a variance of the data): the total sum of squares, SSTot = ∑(yi − ȳ)²; the regression (explained) sum of squares, SSR = ∑(ŷi − ȳ)²; and the error (residual) sum of squares, SSE = ∑(yi − ŷi)².
The most general definition of the coefficient of multiple determination is
R² = 1 − SSE/SSTot
The parameter of interest in this model is the coefficient of multiple determination, R², obtained by dividing the explained sum of squares by the total sum of squares:
R² = ∑(ŷi − ȳ)² / ∑(yi − ȳ)²
where: | ||
∑(ŷi − ȳ)² | = | the explained variation |
 | = | the sum of squared deviations of the calculated Y values from the mean of the observed Y values, or |
 | = | the sum of squares due to regression (SSR) |
∑(yi − ŷi)² | = | the unexplained variation |
 | = | the sum of squared deviations of the original observations from the calculated values |
 | = | the sum of squares about regression, or |
 | = | the error sum of squares (SSE) |
The total variation is the sum of squared deviations of each observation of Y from the mean of the observations:
namely,
SSTot = ∑(yi − ȳ)²
or
SSTot = SSR + SSE
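The decomposition of the total variation into explained and unexplained parts can be verified numerically with lm() (a sketch on simulated data):

```r
set.seed(1)
x1 <- rnorm(50); x2 <- rnorm(50)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(50)

fit   <- lm(y ~ x1 + x2)
SSTot <- sum((y - mean(y))^2)           # total variation
SSE   <- sum(residuals(fit)^2)          # unexplained variation
SSR   <- sum((fitted(fit) - mean(y))^2) # explained variation

all.equal(SSTot, SSR + SSE)             # TRUE: SSTot = SSR + SSE
summary(fit)$r.squared - SSR / SSTot    # essentially 0: R^2 = SSR/SSTot
```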
The Multiple Correlation Model Analysis – The object of this approach is to gain insight into the strength of the relationship between variables.
The multiple regression correlation model analysis equation is
where yj is a typical value from one of the subpopulations of Y values, the βi are the regression coefficients, x1j, x2j, x3j,…, xnj are, respectively, particular known values of the random variables X1, X2, X3,…, Xn, and ej is a random variable with mean 0 and variance σ2, the common variance of the subpopulation of Y values. Generally, ej is assumed normal and independently distributed.
This model is similar to model Equation (4.5), with the following important distinction:
That is, in the correlation model, Equation (4.9), there is a joint distribution of Y and the Xi that is called a multivariate distribution.
Under this model, the variables are no longer considered as being dependent or independent, because logically they are interchangeable, and either of the Xi may play the role of Y.
The Multiple Correlation Coefficient: To analyze the relationships among the variables, consider the multiple correlation coefficient, which is the square root of the coefficient of multiple determination; hence the sample value may be computed by taking the square root of Equation (4.12), namely, R = √R².
In statistics, ANalysis Of VAriance (ANOVA) is a collection of statistical models in which the observed variance in a particular variable is partitioned into components from different sources of variation. ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a Type I error. For this reason, ANOVA is useful in comparing two, three, or more means.
ANOVA Tables: Summarized in the following tables, ANOVA is used for two different purposes:
ANOVA table for testing hypotheses about simple linear regression
Source | DF | Sum of squares | Mean squares | F-value | P-value |
Model | 1 | Σ(ŷi − ȳ)² = SSModel | SSM/1 = MSM | MSM/MSE = F(1,n−2) | Pr(F > F(1,n−2)) |
Residual | n − 2 | Σei² = SSResidual | SSR/(n − 2) = MSE | | |
Total | n − 1 | Σ(yi − ȳ)² = SSTotal | SST/(n − 1) = MST | | |
Residuals are often called errors, since they are the part of the variation that the line could not explain; thus SSResidual/df = MSE = s², the estimate of the variance about the population regression line; SSTot/(n − 1) = MSTot = sy², the total variance of the y's; and F = t² for simple linear regression.
The larger the F (and the smaller the p-value), the more of the variation in y the line explains, and so the less likely it is that H0 is true. One rejects the hypothesis when the p-value < α.
R² | = | proportion of the total variation of y explained by the regression line |
 | = | SSModel/SSTotal |
 | = | 1 − SSResidual/SSTotal |
ANOVA table for testing hypotheses about population means
Source | DF | Sum of squares | Mean squares | F-value | P-value |
Group (between) | k − 1 | Σni(x̄i − x̄)² = SSG | SSG/(k − 1) = MSG | MSG/MSE = F(k−1,N−k) | Pr(F > F(k−1,N−k)) |
Error (within) | N − k | Σ(ni − 1)si² = SSE | SSE/(N − k) = MSE | | |
Total | N − 1 | Σ(xij − x̄)² = SSTot | SSTot/(N − 1) = MSTot | | |
N = total number of observations = Σni, where ni = number of observations for group i.
The F test statistic has two different degrees of freedom: the numerator df = k − 1 and the denominator df = N − k; write F(k−1, N−k).
Note: SSE/(N − k) = MSE = sp² (the pooled sample variance; this is the "average" variance within each group), and SSTot/(N − 1) = MSTot = s², the total variance of the data (assuming NO groups).
F is approximately the variance of the (between-group) sample means divided by the average variance of the data; the larger the F (and the smaller the p-value), the more varied the means are, and so the less likely it is that H0 is true. It is rejected when the p-value < α.
R² = proportion of the total variation explained by the difference in means = SSG/SSTot.
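These quantities can be sketched with aov() on simulated three-group data, checking the table's MSG/MSE formula against the F-value R reports:

```r
set.seed(11)
g <- factor(rep(1:3, each = 30))           # k = 3 groups, n_i = 30 each
x <- rnorm(90, mean = c(0, 0.5, 1)[g])     # group means differ

fit <- aov(x ~ g)
tab <- summary(fit)[[1]]                   # the ANOVA table

# Manual MSG/MSE reproduces the table's F-value.
SSG <- sum(tapply(x, g, function(v) length(v) * (mean(v) - mean(x))^2))
SSE <- sum(tapply(x, g, function(v) (length(v) - 1) * var(v)))
F_manual <- (SSG / (3 - 1)) / (SSE / (90 - 3))
all.equal(F_manual, tab[["F value"]][1])   # TRUE
```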
2005-11-01 2005-11-02 2005-11-03 2005-11-04 2005-11-07
1 -7.174567e-05 -0.0004228193 0.0070806633 0.0086669900 0.003481978
2 8.414595e-03 0.0025193420 0.0127072920 -0.0007027570 0.006205226
3 -9.685162e-04 -0.0021474822 -0.0004991982 -0.0005400452 0.000752058
---------------------------------------------
2006-03-01 2006-03-02 2006-03-03 2006-03-06 2006-03-07
1 0.0038673107 -0.003960535 -0.004273008 0.0005488833 -0.001580974
2 0.0105047210 -0.004177747 0.001755361 0.0004602100 -0.007028455
3 0.0001534274 -0.003340033 -0.001016466 0.0001020288 -0.001403624
2006-03-08 2006-03-09 2006-03-10 2006-03-13 2006-03-14
1 -0.004631919 0.0054281930 0.0077513163 0.0026288817 -0.0011077267
2 -0.004230567 0.0035344700 0.0128746620 0.0082543960 -0.0005305960
3 -0.000989756 0.0001691144 -0.0002940006 0.0006133858 0.0006779516
2006-03-15 2006-03-16 2006-03-17 2006-03-20 2006-03-21
1 0.0039076300 -0.002874349 0.0035407443 -0.0002971253 0.0017513113
2 -0.0017679700 0.000666818 0.0043129660 0.0021624820 -0.0002753190
3 0.0001087012 0.000484471 -0.0001434976 0.0006474828 -0.0008360246
2006-03-22 2006-03-23 2006-03-24 2006-03-27 2006-03-28
1 0.0034417023 0.0042167183 -0.0005082437 -0.002477860 -0.005123098
2 0.0013273100 -0.0043185900 0.0027725900 -0.006945127 -0.000727865
3 0.0006235516 0.0002566278 0.0001903118 -0.000559792 -0.001748221
2006-03-29 2006-03-30 2006-03-31 2006-04-03 2006-04-04
1 0.0083719110 -0.0023536467 0.0032189740 0.002929918 -0.0053531483
2 0.0009252440 0.0062064890 -0.0009292510 0.006944399 -0.0026184930
3 -0.0004704764 0.0003112066 -0.0003227242 -0.001220841 -0.0005425296
2006-04-05 2006-04-06 2006-04-07 2006-04-10 2006-04-11
1 0.001672089 0.0035391920 0.0006688370 -0.0018577213 -0.0078612010
2 0.004752739 0.0015471340 0.0005119710 0.0000160000 -0.0095747430
3 0.000371034 0.0001308954 -0.0001606024 0.0004964256 0.0008645472
2006-04-12 2006-04-13 2006-04-14 2006-04-17 2006-04-18
1 -0.001794527 -6.576767e-05 0.0003451073 -0.01162293 0.0072958027
2 -0.002978612 3.927437e-03 0.0000000000 0.00000000 -0.0050312550
3 -0.001221078 -2.154967e-03 -0.0000179728 -0.00127895 0.0008801984
2006-04-19 2006-04-20 2006-04-21 2006-04-24 2006-04-25
1 0.005262631 0.004346695 -0.0001275733 -0.0055535810 -0.0008195537
2 0.010893639 0.004201724 0.0048380810 -0.0016947400 -0.0041756790
3 0.001021676 0.001559365 -0.0000182248 -0.0001482958 -0.0017556356
2006-04-26 2006-04-27 2006-04-28 2006-05-01 2006-05-02
1 0.001310372 -0.003174862 -0.0081970307 -0.0042204567 0.005113142
2 0.004699408 0.000883969 -0.0029437410 0.0000000000 0.003189327
3 -0.000656350 0.000053330 0.0002421188 -0.0007466388 0.001444051
2006-05-03 2006-05-04 2006-05-05 2006-05-08 2006-05-09
1 -0.001263695 -0.0006502133 0.006613750 0.0063954017 -0.003357079
2 -0.011943802 0.0030146350 0.011235363 0.0061756300 0.002517020
3 -0.001037496 -0.0000032026 0.002150292 0.0007181488 -0.001401839
2006-05-10 2006-05-11 2006-05-12 2006-05-15 2006-05-16
1 -0.0025826107 -0.008643966 -0.016678837 -0.006493649 -0.002878691
2 -0.0013872090 -0.001096609 -0.017989732 -0.013399941 0.003285150
3 -0.0005785598 -0.003397807 -0.004047129 -0.000274325 0.001044719
2006-05-17 2006-05-18 2006-05-19 2006-05-22 2006-05-23
1 -0.010999777 -0.0085399823 0.005933202 -0.021077161 0.007398834
2 -0.028406916 -0.0097114170 0.001848007 -0.025997761 0.018970677
3 -0.003777801 0.0001129756 0.002052023 -0.001940486 0.002266990
2006-05-24 2006-05-25 2006-05-26 2006-05-29 2006-05-30
1 -0.0004267907 0.0061746340 0.016697015 -0.0011805273 -0.017512962
2 -0.0111155890 0.0000000000 0.025842125 0.0017720070 -0.019842413
3 -0.0011767074 0.0006506848 0.002053482 0.0005608306 -0.003672446
2006-05-31 2006-06-01 2006-06-02 2006-06-05 2006-06-06
1 0.004247769 0.0048529913 0.002548541 -0.006215087 -0.007154035
2 0.009323326 0.0015364830 0.006993059 0.000000000 -0.022326545
3 0.001607127 -0.0008596914 0.002280503 -0.001082275 -0.001973220
2006-06-07 2006-06-08 2006-06-09 2006-06-12 2006-06-13
1 -0.0015405703 -0.008894756 0.004967781 -0.004911457 -0.017072989
2 0.0056383270 -0.027379310 0.012429170 -0.013895865 -0.023992295
3 -0.0003462346 -0.001317298 0.002478040 -0.001395419 -0.003113317
2006-06-14 2006-06-15 2006-06-16 2006-06-19 2006-06-20
1 -0.0008148080 0.0180758310 0.0023954587 0.001269003 -0.002988990
2 0.0022722680 0.0215693860 -0.0029262460 0.008192221 0.003859327
3 -0.0006585664 0.0004955194 -0.0000012182 -0.001387469 -0.001853536
2006-06-21 2006-06-22 2006-06-23 2006-06-26 2006-06-27
1 0.0019448093 0.005720239 0.0003236553 0.0022263750 -0.004784092
2 0.0033826850 0.007397751 0.0005459380 -0.0024934820 -0.006852493
3 -0.0002515638 -0.000396063 -0.0009850436 -0.0002276698 -0.001709434
2006-06-28 2006-06-29 2006-06-30 2006-07-03 2006-07-04 2006-07-05
1 0.002418113 0.013946565 0.0007082933 0.0051894390 0.0036357180 -0.003018747
2 0.002764602 0.014041765 0.0144738140 0.0087848690 0.0019620600 -0.008807793
3 0.000155729 0.003295715 0.0007134052 0.0007825544 -0.0005997304 -0.002343919
2006-07-06 2006-07-07 2006-07-10 2006-07-11 2006-07-12
1 0.0006826483 -0.0053623837 0.006527871 -0.0026637643 -0.000460215
2 0.0055633270 -0.0057979700 0.005982938 -0.0038346560 0.004470950
3 0.0008409888 0.0001182382 0.001124756 0.0001879154 -0.000366662
2006-07-13 2006-07-14 2006-07-17 2006-07-18 2006-07-19
1 -0.013003389 -0.0051940770 -0.0004672143 0.0002666863 0.010286439
2 -0.015294320 -0.0103727210 -0.0022662700 -0.0056064500 0.018419580
3 -0.002107831 0.0007863122 0.0008104944 -0.0013472198 0.001975698
2006-07-20 2006-07-21 2006-07-24 2006-07-25 2006-07-26 2006-07-27
1 0.001826095 -0.0080939067 0.013847674 0.0046651393 0.0005912200 -0.000995908
2 0.008023110 -0.0053387720 0.019239808 0.0006067870 0.0043343300 0.005560211
3 0.001140619 -0.0006294194 0.003391497 0.0009658202 0.0007748634 0.001358731
2006-07-28 2006-07-31 2006-08-01 2006-08-02 2006-08-03
1 0.0054530087 0.0009621367 -0.0016914860 0.001702776 3.388133e-05
2 0.0102982490 0.0002672360 0.0000000000 -0.003903872 -1.101953e-02
3 0.0008377254 0.0012288024 -0.0002328868 -0.000336470 -1.946893e-03
2006-08-04 2006-08-07 2006-08-08 2006-08-09 2006-08-10
1 0.0004074303 -0.005170347 0.001632508 0.0004397503 0.005330442
2 0.0095373900 -0.012098329 0.000553560 0.0107776200 -0.005122863
3 0.0025471316 -0.001348114 0.001391077 0.0003048744 0.000744234
2006-08-11 2006-08-14 2006-08-15 2006-08-16 2006-08-17
1 0.0003234277 0.0042420490 0.006281944 0.002505853 0.0022312940
2 0.0016952790 0.0080151810 0.014854679 0.004466055 0.0029039360
3 -0.0003461608 -0.0005048628 0.002816991 0.001751071 0.0003821246
2006-08-18 2006-08-21 2006-08-22 2006-08-23 2006-08-24
1 0.001322035 -0.0061511060 0.007819312 -0.0024164317 -0.0006088927
2 -0.003261951 -0.0030139970 0.003136533 -0.0002148440 0.0020208950
3 0.001275974 -0.0002004374 0.002229195 -0.0002547412 -0.0001443246
2006-08-25 2006-08-28 2006-08-29 2006-08-30 2006-08-31
1 0.001879964 0.0001150750 0.0039964717 -0.0013533323 0.003702581
2 0.000665318 0.0018015970 0.0060754350 0.0027585440 -0.001466397
3 0.000606633 0.0008119902 -0.0000166692 0.0008165558 0.002074951
2006-09-01 2006-09-04 2006-09-05 2006-09-06 2006-09-07
1 0.003825121 0.0022519797 0.002990688 -0.005576266 -0.0046239360
2 0.003126659 0.0045617030 -0.000424630 -0.005904139 -0.0061051260
3 0.001543151 -0.0000862626 0.000150264 -0.002215612 -0.0008681144
2006-09-08 2006-09-11 2006-09-12 2006-09-13 2006-09-14 2006-09-15
1 0.005100734 -0.006268633 0.0079822300 0.004665103 -0.0020439300 0.007261748
2 0.005038040 -0.008630266 0.0115443790 0.003047893 -0.0032925020 0.005426908
3 0.002150122 -0.002463850 0.0007386444 0.001504165 -0.0003697208 0.001212241
2006-09-18 2006-09-19 2006-09-20 2006-09-21 2006-09-22
1 -0.002313630 -0.0022077103 0.002879550 0.0017740240 -0.0120944313
2 0.003786049 -0.0025686540 0.012243951 0.0039715930 -0.0091896050
3 -0.001438723 0.0004212428 0.001333139 0.0009339414 0.0000184756
2006-09-25 2006-09-26 2006-09-27 2006-09-28 2006-09-29 2006-10-02
1 0.0015608587 0.007422705 0.0054981810 0.003993309 0.002248728 -0.004799167
2 -0.0027169680 0.012816618 0.0036014040 0.000706224 0.001409431 -0.005008288
3 0.0006968214 0.002279966 -0.0000253728 0.000284538 0.000532484 -0.001994171
2006-10-03 2006-10-04 2006-10-05 2006-10-06 2006-10-09
1 0.0011308367 0.007532294 0.007702274 0.0013383343 -1.842333e-06
2 0.0006241600 0.006657881 0.007166048 0.0010145910 3.046625e-03
3 -0.0000141456 0.001957197 0.001609076 -0.0009962644 2.434124e-04
2006-10-10 2006-10-11 2006-10-12 2006-10-13 2006-10-16
1 0.0058490050 -0.001032232 0.005763076 0.0057859347 0.0016738733
2 0.0089863210 0.001066884 0.004239901 -0.0020605700 -0.0018373360
3 -0.0004574944 0.000480683 0.001357373 0.0004827066 -0.0004780952
2006-10-17 2006-10-18 2006-10-19 2006-10-20 2006-10-23
1 -0.0072083463 0.0065625757 -0.004172731 0.0013755277 0.0067030360
2 -0.0105018630 0.0091540220 0.001017744 0.0026080710 0.0055130300
3 -0.0002134596 0.0004904236 -0.001158779 -0.0000334576 -0.0001258452
2006-10-24 2006-10-25 2006-10-26 2006-10-27 2006-10-30
1 0.0006909480 0.0007045297 0.0004360173 -0.006801505 -0.0013724143
2 -0.0035125790 0.0028470680 -0.0006172110 0.002649247 -0.0047902870
3 0.0001992902 0.0006721280 0.0005680150 0.000502184 0.0003469408
2006-10-31 2006-11-01 2006-11-02 2006-11-03 2006-11-06
1 -0.001518782 1.051133e-05 -0.0013112907 0.0043942913 0.006986784
2 -0.008238610 4.875862e-03 0.0034278750 0.0059986060 0.010778959
3 0.001557833 1.065105e-03 -0.0000054624 -0.0000061736 0.002111045
2006-11-07 2006-11-08 2006-11-09 2006-11-10 2006-11-13
1 -0.000492798 0.0010355810 -0.0042464430 -0.0022569473 0.0017332780
2 0.003869904 -0.0059257000 -0.0005821010 -0.0026741380 0.0015220670
3 0.002714624 0.0004917422 0.0007154908 -0.0007612248 -0.0003075366
2006-11-14 2006-11-15 2006-11-16 2006-11-17 2006-11-20
1 0.005042735 0.0039623527 0.000571091 -0.0041334430 -0.0008046817
2 -0.001419646 0.0067483290 0.000088600 -0.0046228550 0.0012852990
3 0.001633393 -0.0003903638 -0.000064044 -0.0004950332 0.0008012360
2006-11-21 2006-11-22 2006-11-23 2006-11-24 2006-11-27
1 0.0035024650 -0.001423817 -0.0022464847 -0.0074566157 -0.008884057
2 0.0015966630 0.000878824 -0.0028429630 -0.0120525280 -0.014190477
3 0.0000896774 -0.000362714 -0.0009733886 -0.0007358458 -0.001985401
2006-11-28 2006-11-29 2006-11-30 2006-12-01 2006-12-04
1 -0.0005720070 0.010011509 -0.0028207700 -0.0045097767 0.008035140
2 -0.0064998550 0.012901202 -0.0086468080 -0.0064254010 0.006552949
3 0.0004750374 0.001182122 0.0003041304 -0.0001392252 0.003841311
2006-12-05 2006-12-06 2006-12-07 2006-12-08 2006-12-11
1 0.0006129080 0.0012128337 0.0014398583 -0.001522731 0.006351745
2 -0.0028113180 0.0071741430 0.0074404230 -0.002915097 0.006216417
3 -0.0002819442 -0.0002501772 -0.0002762668 0.000266215 0.000853780
2006-12-12 2006-12-13 2006-12-14 2006-12-15 2006-12-18
1 0.0003822723 0.0032613870 0.0085999303 0.004852111 0.0007975687
2 0.0077836640 0.0016147920 0.0100852550 0.001294973 0.0045667980
3 0.0014112930 -0.0003111648 0.0006167068 0.002972998 -0.0018724352
2006-12-19 2006-12-20 2006-12-21 2006-12-22 2006-12-25
1 -0.007259400 0.0039956217 -0.001713398 -0.001404090 -0.0004215033
2 -0.006190852 0.0015531190 0.000940550 -0.005018284 0.0000000000
3 -0.001259345 0.0002665752 0.002535426 -0.002011179 -0.0000406544
2006-12-26 2006-12-27 2006-12-28 2006-12-29 2007-01-01
1 0.0026915907 0.009066652 0.0004164007 -0.001200784 -9.820667e-06
2 0.0000000000 0.010333820 -0.0020348850 -0.001101976 0.000000e+00
3 0.0004873476 0.002159424 -0.0010962306 -0.000159042 -1.140000e-05
2007-01-02 2007-01-03 2007-01-04 2007-01-05 2007-01-08
1 0.0016820867 0.004437189 -0.0002060123 -0.002683635 -0.001690284
2 0.0000000000 0.014834908 0.0001279650 -0.002580941 -0.004541686
3 0.0004246192 0.001879946 0.0006782908 -0.001927514 -0.001011925
2007-01-09 2007-01-10 2007-01-11 2007-01-12 2007-01-15
1 0.0044103520 -0.0006727953 0.007703170 0.003633612 0.005514158
2 0.0032808260 -0.0014696800 0.013481449 0.005427095 0.008135203
3 0.0001659808 -0.0012173928 0.001547265 -0.000644134 0.001385737
2007-01-16 2007-01-17 2007-01-18 2007-01-19 2007-01-22
1 -0.0002334037 -0.0009348027 0.0018915933 0.005485118 -0.001422125
2 -0.0023385150 0.0036009040 -0.0000306000 0.004899725 -0.004463529
3 0.0019940186 -0.0003052906 -0.0002513524 0.001870075 0.000533480
2007-01-23 2007-01-24 2007-01-25 2007-01-26 2007-01-29
1 -0.0024795987 0.010078597 -0.005617319 0.0003207103 0.0026204570
2 -0.0005070610 0.006027774 -0.001732155 -0.0095941300 0.0074250550
3 -0.0000391516 0.001921089 -0.001420198 0.0000189644 0.0006710216
2007-01-30 2007-01-31 2007-02-01 2007-02-02 2007-02-05 2007-02-06
1 0.002703326 -0.0017867277 0.005665622 0.005261889 1.452767e-05 0.0016964537
2 0.004207812 -0.0000649000 0.009002711 0.004474549 2.320000e-05 0.0009123640
3 0.002512929 0.0001274554 0.001459012 0.002004385 1.534503e-03 -0.0001289984
2007-02-07 2007-02-08 2007-02-09 2007-02-12 2007-02-13
1 0.001088440 0.0005737837 -0.0000244410 -0.003474908 0.0039971257
2 0.002937055 -0.0060861880 0.0065013820 -0.004102506 -0.0023852220
3 -0.001036504 0.0008835816 -0.0007572888 -0.001322240 0.0002060698
2007-02-14 2007-02-15 2007-02-16 2007-02-19 2007-02-20
1 0.003429143 0.0003378953 -0.0010496707 0.001237031 0.002481922
2 0.006839015 0.0014878740 0.0030582220 0.000894606 -0.003270610
3 0.001787848 0.0004864332 -0.0005123012 -0.002126213 0.001312554
2007-02-21 2007-02-22 2007-02-23 2007-02-26 2007-02-27
1 -0.0003544737 0.004133925 -0.003687659 -0.0006432653 -0.028181932
2 -0.0099202370 0.003907766 -0.000693145 -0.0035030230 -0.035746244
3 0.0008831304 -0.000198173 0.000224204 0.0003787182 -0.003557775
2007-02-28 2007-03-01 2007-03-02 2007-03-05 2007-03-06 2007-03-07
1 -0.006545974 -0.004681529 -0.007204097 -0.016965306 0.0131994283 0.001848668
2 -0.011946524 -0.001959064 0.002364749 -0.014462125 0.0116318040 0.011790712
3 -0.001276448 -0.000431924 -0.001183209 -0.000934475 0.0006908982 0.000013807
2007-03-08 2007-03-09 2007-03-12 2007-03-13 2007-03-14 2007-03-15
1 0.01238125 0.0054817963 -0.0019347513 -0.011489525 -0.016127064 0.010698674
2 0.01192565 0.0003297920 -0.0031733170 -0.007777588 -0.028205491 0.014756610
3 0.00211282 0.0003630802 0.0003176836 -0.001476343 -0.003026142 0.003330527
2007-03-16 2007-03-19 2007-03-20 2007-03-21 2007-03-22 2007-03-23
1 -0.0040807663 0.011023104 0.006051645 0.008317470 0.006371441 0.003735611
2 0.0011294700 0.014488916 0.004157206 0.007430739 0.013969587 0.001698465
3 -0.0004977044 0.001539199 0.001811865 0.002705776 0.002193543 -0.000066116
2007-03-26 2007-03-27 2007-03-28 2007-03-29 2007-03-30
1 -0.0038874940 -0.0024984800 -0.007625166 0.008954610 0.0028986417
2 -0.0075840220 -0.0047538740 -0.009870398 0.011153959 0.0004970940
3 -0.0006071596 -0.0008749648 -0.001164874 0.001407261 -0.0002237774
2007-04-02 2007-04-03 2007-04-04 2007-04-05 2007-04-06
1 -0.0013403807 0.008656804 0.0031424653 -0.0014326240 -0.0004997497
2 -0.0014515930 0.010335207 0.0013071590 0.0050004920 0.0000000000
3 0.0000685862 0.002129822 0.0004029036 -0.0001666756 -0.0000500682
2007-04-09 2007-04-10 2007-04-11
1 0.006564997 0.0004269093 -0.001272110
2 0.000000000 0.0063294250 -0.001044170
3 0.000570819 -0.0000510372 -0.000411593
SBI SPI SII LMI MPI ALT LPP25 LPP40 LPP60
3 2 3 3 1 1 3 3 1
Within cluster sum of squares by cluster:
[1] 0.003806242 0.000000000 0.005432037
(between_SS / total_SS = 75.3 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
>
++++++++++++++++++++++++++++++++++++++
Description
Adjust all columns of an OHLC object for split and dividend.
Usage
adjustOHLC(x,
adjust = c("split","dividend"),
use.Adjusted = FALSE,
ratio = NULL,
symbol.name=deparse(substitute(x)))
Arguments
x | An OHLC object |
adjust | adjust by split, dividend, or both (default) |
use.Adjusted | use the ‘Adjusted’ column in Yahoo! data to adjust |
ratio | ratio to adjust with, bypassing internal calculations |
symbol.name | used if x is not named the same as the symbol adjusting |
Details
This function calculates the adjusted Open, High, Low, and Close prices according to split and dividend information.
There are three methods available to calculate the new OHLC object prices.
By default, getSplits and getDividends are called to retrieve the respective information. These may dispatch to custom methods following the “.” methodology used by quantmod for dispatch. See getSymbols for information related to extending quantmod. This information is passed to adjRatios from the TTR package, and the resulting ratio calculations are used to adjust the observed historical prices. This is the most precise way to adjust a series.
The second method works only on standard Yahoo! data containing an explicit Adjusted column.
A final method allows for one to pass a ratio into the function directly.
All methods proceed in the same way once a ratio is available: the adjusted close is Close * ratio, and each of Open, High, and Low becomes ratio * (price - Close) + adjusted close.
Value
An object of the original class, with prices adjusted for splits and dividends.
Warning
Using use.Adjusted = TRUE will be less precise than the method that employs actual split and dividend information. This is due to the loss of precision from Yahoo!'s Adjusted column, which carries only two decimal places. The advantage is that this method can be run offline, and for short series, or those with few adjustments, the loss of precision will be small.
The resulting precision loss will be from row observation to row observation, as the calculation will be exact for intraday values.
Author(s)
Jeffrey A. Ryan
References
Yahoo Finance http://finance.yahoo.com
See Also
getSymbols.yahoo getSplits getDividends
Examples
getSymbols("AAPL", from="1990-01-01",
src="yahoo")
head(AAPL)
head(AAPL.a <- adjustOHLC(AAPL))
head(AAPL.uA <- adjustOHLC(AAPL,
use.Adjusted=TRUE))
# intraday adjustments are precise across all
# methods
# an example with Open to Close (OpCl)
head(cbind(OpCl(AAPL),OpCl(AAPL.a),
OpCl(AAPL.uA)))
# Close to Close changes may lose precision
head(cbind(ClCl(AAPL),ClCl(AAPL.a),
ClCl(AAPL.uA)))
## End
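Whichever way the ratio is obtained, the adjustment itself is simple arithmetic. A sketch with made-up numbers (the ratio below is hypothetical, not a real split or dividend factor):

```r
# Toy illustration of the adjustment step: adjusted close is Close * ratio,
# and Open/High/Low are moved by ratio * (price - Close) + adjusted close,
# which works out to scaling every price in the bar by the same ratio.
op <- 35.25; hi <- 37.50; lo <- 35.00; cl <- 37.25
ratio  <- 0.5                          # hypothetical adjustment ratio
adj_cl <- cl * ratio
adj_op <- ratio * (op - cl) + adj_cl   # equals ratio * op
adj_hi <- ratio * (hi - cl) + adj_cl
adj_lo <- ratio * (lo - cl) + adj_cl
# intraday relatives are unchanged, e.g. the open-to-close return:
(cl - op) / op
(adj_cl - adj_op) / adj_op             # identical
```

Because each day's bar is scaled by one ratio, intraday measures such as OpCl survive any adjustment method exactly; only measures spanning days, like ClCl, are exposed to rounding in the ratio.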
In the R domain:
>
> install.packages("quantmod")
Installing package into ‘C:/Users/Bert/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session
A CRAN mirror is selected.
---
also installing the dependencies ‘xts’, ‘TTR’
sums checked
The downloaded binary packages are in
C:\Users\Bert\AppData\Local\Temp\Rtmp2jzxqk\downloaded_packages
> library(quantmod)
> ls("package:quantmod")
[1] "Ad" "add_axis" "add_BBands"
[4] "add_DEMA" "add_EMA" "add_EVWMA"
[7] "add_GMMA" "add_MACD" "add_RSI"
[10] "add_Series" "add_SMA" "add_SMI"
[13] "add_TA" "add_VMA" "add_Vo"
[16] "add_VWAP" "add_WMA" "addADX"
[19] "addAroon" "addAroonOsc" "addATR"
[22] "addBBands" "addCCI" "addChAD"
[25] "addChVol" "addCLV" "addCMF"
[28] "addCMO" "addDEMA" "addDPO"
[31] "addEMA" "addEMV" "addEnvelope"
[34] "addEVWMA" "addExpiry" "addKST"
[37] "addLines" "addMACD" "addMFI"
[40] "addMomentum" "addOBV" "addPoints"
[43] "addROC" "addRSI" "addSAR"
[46] "addShading" "addSMA" "addSMI"
[49] "addTA" "addTDI" "addTRIX"
[52] "addVo" "addVolatility" "addWMA"
[55] "addWPR" "addZigZag" "addZLEMA"
[58] "adjustOHLC" "allReturns" "annualReturn"
[61] "as.quantmod.OHLC" "attachSymbols" "axTicksByTime2"
[64] "axTicksByValue" "barChart" "buildData"
[67] "buildModel" "candleChart" "chart_pars"
[70] "chart_Series" "chart_theme" "chartSeries"
[73] "chartShading" "chartTA" "chartTheme"
[76] "Cl" "ClCl" "current.chob"
[79] "dailyReturn" "Delt" "dropTA"
[82] "findPeaks" "findValleys" "fittedModel"
[85] "fittedModel<-" "flushSymbols" "futures.expiry"
[88] "getDefaults" "getDividends" "getFin"
[91] "getFinancials" "getFX" "getMetals"
[94] "getModelData" "getOptionChain" "getPrice"
[97] "getQuote" "getSplits" "getSymbolLookup"
[100] "getSymbols" "getSymbols.csv" "getSymbols.FRED"
[103] "getSymbols.google" "getSymbols.mysql" "getSymbols.MySQL"
[106] "getSymbols.oanda" "getSymbols.rda" "getSymbols.RData"
[109] "getSymbols.SQLite" "getSymbols.yahoo" "getSymbols.yahooj"
[112] "has.Ad" "has.Ask" "has.Bid"
[115] "has.Cl" "has.Hi" "has.HLC"
[118] "has.Lo" "has.OHLC" "has.OHLCV"
[121] "has.Op" "has.Price" "has.Qty"
[124] "has.Trade" "has.Vo" "Hi"
[127] "HiCl" "HLC" "importDefaults"
[130] "is.BBO" "is.HLC" "is.OHLC"
[133] "is.OHLCV" "is.quantmod" "is.quantmodResults"
[136] "is.TBBO" "Lag" "lineChart"
[139] "listTA" "Lo" "loadSymbolLookup"
[142] "loadSymbols" "LoCl" "LoHi"
[145] "matchChart" "modelData" "modelSignal"
[148] "monthlyReturn" "moveTA" "new.replot"
[151] "newTA" "Next" "oanda.currencies"
[154] "OHLC" "OHLCV" "Op"
[157] "OpCl" "OpHi" "OpLo"
[160] "OpOp" "options.expiry" "peak"
[163] "periodReturn" "quantmodenv" "quarterlyReturn"
[166] "reChart" "removeSymbols" "saveChart"
[169] "saveSymbolLookup" "saveSymbols" "seriesAccel"
[172] "seriesDecel" "seriesDecr" "seriesHi"
[175] "seriesIncr" "seriesLo" "setDefaults"
[178] "setSymbolLookup" "setTA" "show"
[181] "showSymbols" "specifyModel" "standardQuote"
[184] "summary" "swapTA" "tradeModel"
[187] "unsetDefaults" "unsetTA" "valley"
[190] "viewFin" "viewFinancials" "Vo"
[193] "weeklyReturn" "yahooQF" "yahooQuote.EOD"
[196] "yearlyReturn" "zoom_Chart" "zoomChart"
[199] "zooom"
> adjustOHLC
function (x, adjust = c("split", "dividend"), use.Adjusted = FALSE,
ratio = NULL, symbol.name = deparse(substitute(x)))
{
if (is.null(ratio)) {
if (use.Adjusted) {
if (!has.Ad(x))
stop("no Adjusted column in 'x'")
ratio <- Ad(x)/Cl(x)
}
else {
div <- getDividends(symbol.name, from = "1900-01-01")
splits <- getSplits(symbol.name, from = "1900-01-01")
if (is.xts(splits) && is.xts(div) && nrow(splits) >
0 && nrow(div) > 0)
div <- div * 1/adjRatios(splits = merge(splits,
index(div)))[, 1]
ratios <- adjRatios(splits, div, Cl(x))
if (length(adjust) == 1 && adjust == "split") {
ratio <- ratios[, 1]
}
else if (length(adjust) == 1 && adjust == "dividend") {
ratio <- ratios[, 2]
}
else ratio <- ratios[, 1] * ratios[, 2]
}
}
Adjusted <- Cl(x) * ratio
structure(cbind((ratio * (Op(x) - Cl(x)) + Adjusted), (ratio *
(Hi(x) - Cl(x)) + Adjusted), (ratio * (Lo(x) - Cl(x)) +
Adjusted), Adjusted, if (has.Vo(x))
Vo(x)
else NULL, if (has.Ad(x))
Ad(x)
else NULL), .Dimnames = list(NULL, colnames(x)))
}
<environment: namespace:quantmod>
> ## Not run:
> getSymbols("AAPL", from="1990-01-01", src="yahoo")
As of 0.4-0, ‘getSymbols’ uses env=parent.frame() and
auto.assign=TRUE by default.
This behavior will be phased out in 0.5-0 when the call will default to use auto.assign=FALSE. getOption("getSymbols.env") and getOptions("getSymbols.auto.assign") are now checked for alternate defaults.
This message is shown once per session and may be disabled by setting options("getSymbols.warning4.0"=FALSE). See ?getSymbols for more details.
[1] "AAPL"
Warning message:
In download.file(paste(yahoo.URL, "s=", Symbols.name, "&a=", from.m, :
downloaded length 452489 != reported length 200
> head(AAPL)
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
1990-01-02 35.25 37.50 35.00 37.250 45799600 1.132075
1990-01-03 38.00 38.00 37.50 37.500 51998800 1.139673
1990-01-04 38.25 38.75 37.25 37.625 55378400 1.143471
1990-01-05 37.75 38.25 37.00 37.750 30828000 1.147270
1990-01-08 37.50 38.00 37.00 38.000 25393200 1.154868
1990-01-09 38.00 38.00 37.00 37.625 21534800 1.143471
> head(AAPL.a <- adjustOHLC(AAPL))
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
1990-01-02 1.071292 1.139673 1.063694 1.132075 45799600 1.132075
1990-01-03 1.154868 1.154868 1.139673 1.139673 51998800 1.139673
1990-01-04 1.162466 1.177662 1.132075 1.143471 55378400 1.143471
1990-01-05 1.147270 1.162466 1.124477 1.147270 30828000 1.147270
1990-01-08 1.139673 1.154868 1.124477 1.154868 25393200 1.154868
1990-01-09 1.154868 1.154868 1.124477 1.143471 21534800 1.143471
> head(AAPL.uA <- adjustOHLC(AAPL,
+ use.Adjusted=TRUE))
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
1990-01-02 1.071292 1.139673 1.063695 1.132075 45799600 1.132075
1990-01-03 1.154869 1.154869 1.139673 1.139673 51998800 1.139673
1990-01-04 1.162466 1.177661 1.132074 1.143471 55378400 1.143471
1990-01-05 1.147270 1.162466 1.124477 1.147270 30828000 1.147270
1990-01-08 1.139672 1.154868 1.124477 1.154868 25393200 1.154868
1990-01-09 1.154868 1.154868 1.124476 1.143471 21534800 1.143471
>
> # intraday adjustments are precise across all methods
> # an example with Open to Close (OpCl)
> head(cbind(OpCl(AAPL),OpCl(AAPL.a),
+ OpCl(AAPL.uA)))
OpCl.AAPL OpCl.AAPL.a OpCl.AAPL.uA
1990-01-02 0.056737647 0.056737647 0.056737647
1990-01-03 -0.013157869 -0.013157869 -0.013157869
1990-01-04 -0.016339895 -0.016339895 -0.016339895
1990-01-05 0.000000000 0.000000000 0.000000000
1990-01-08 0.013333307 0.013333307 0.013333307
1990-01-09 -0.009868395 -0.009868395 -0.009868395
> # Close to Close changes may lose precision
> head(cbind(ClCl(AAPL),ClCl(AAPL.a),
+ ClCl(AAPL.uA)))
ClCl.AAPL ClCl.AAPL.a ClCl.AAPL.uA
1990-01-02 NA NA NA
1990-01-03 0.006711382 0.006711382 0.006711569
1990-01-04 0.003333333 0.003333333 0.003332535
1990-01-05 0.003322259 0.003322259 0.003322340
1990-01-08 0.006622490 0.006622490 0.006622678
1990-01-09 -0.009868395 -0.009868395 -0.009868660
>
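The small discrepancies in the third ClCl column can be reproduced with toy numbers; the ratio below is invented, and the rounding mimics a two-decimal Adjusted column (an exaggerated case, so the effect is visible):

```r
# Toy illustration of the ClCl precision loss with use.Adjusted = TRUE:
# a two-decimal Adjusted column implies slightly different per-day ratios.
cl      <- c(37.250, 37.500)        # two consecutive closes (toy values)
exact   <- cl * 0.0303912           # hypothetical exact adjustment ratio
rounded <- round(exact, 2)          # what a 2-decimal Adjusted column keeps
diff(exact) / exact[1]              # matches diff(cl) / cl[1] exactly
diff(rounded) / rounded[1]          # off, since rounding distorts the ratio
```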
Description
Charting tool to create standard financial charts given a time-series-like object. Serves as the base function for future technical analysis additions. Possible chart styles include candles, matches (1 pixel candles), bars, and lines. Charts may have a white or black background.
reChart allows for dynamic changes to the chart without having to respecify the full chart parameters.
Usage
chartSeries(x, type = c("auto",
"candlesticks", "matchsticks",
"bars","line"), subset = NULL,
show.grid = TRUE, name = NULL,
time.scale = NULL,
log.scale = FALSE, TA = 'addVo()',
TAsep=';', line.type = "l",
bar.type = "ohlc",
theme = chartTheme("black"),
layout = NA,
major.ticks='auto', minor.ticks=TRUE,
yrange=NULL,
plot=TRUE,
up.col,dn.col,color.vol
= TRUE, multi.col = FALSE, ...)
reChart(type = c("auto", "candlesticks",
"matchsticks", "bars","line"),
subset = NULL,
show.grid = TRUE,
name = NULL,
time.scale = NULL,
line.type = "l",
bar.type = "ohlc",
theme = chartTheme("black"),
major.ticks='auto', minor.ticks=TRUE,
yrange=NULL,
up.col,dn.col,color.vol =
TRUE, multi.col = FALSE,
...)
Arguments
x | an OHLC object – see details |
type | style of chart to draw |
subset | xts-style date subsetting argument |
show.grid | display price grid lines? |
name | name of chart |
time.scale | what is the timescale? automatically deduced (broken) |
log.scale | should the y-axis be log-scaled? |
TA | a vector of technical indicators and params, or character strings |
TAsep | TA delimiter for TA strings |
line.type | type of line in line chart |
bar.type | type of barchart - ohlc or hlc |
theme | a chart.theme object |
layout | if NULL bypass internal layout |
major.ticks | where should major ticks be drawn |
minor.ticks | should minor ticks be drawn? |
yrange | override y-scale |
plot | should plot be drawn |
up.col | up bar/candle color |
dn.col | down bar/candle color |
color.vol | color code volume? |
multi.col | 4 color candle pattern |
… | additional parameters |
Details
Currently, chartSeries displays standard-style OHLC charts familiar in financial applications, or line charts when not passing OHLC data. It works with objects having explicit time-series properties.
Line charts are created with close data, or from single column time series.
The subset argument can be used to specify a particular area of the series to view. The underlying series is left intact to allow for TA functions to use the full data set. Additionally, it is possible to use syntax borrowed from the first and last functions, for example, “last 4 months.”
TA allows for the inclusion of a variety of chart overlays and technical indicators. A full list is available from addTA. The default TA argument is addVo() – which adds volume, if available, to the chart being drawn.
theme requires an object of class chart.theme, created by a call to chartTheme. This function can be used to modify the look of the resulting chart. See chart.theme for details. line.type and bar.type allow further fine tuning of chart styles to user tastes. multi.col implements a color-coding scheme used in some charting applications, following these rules:
• grey => Op[t] < Cl[t] and Op[t] < Cl[t-1]
• white => Op[t] < Cl[t] and Op[t] > Cl[t-1]
• red => Op[t] > Cl[t] and Op[t] < Cl[t-1]
• black => Op[t] > Cl[t] and Op[t] > Cl[t-1]
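These four rules can be sketched as a small classifier; op and cl below are made-up prices, and the helper name candle_col is our own, not part of quantmod:

```r
# Sketch of the multi.col scheme: the colour of bar t depends on Op[t] vs
# Cl[t] and on Op[t] vs the previous close Cl[t-1] (the first bar has no
# previous close, so it gets NA here).
candle_col <- function(op, cl) {
  prev_cl <- c(NA, head(cl, -1))
  ifelse(op < cl & op < prev_cl, "grey",
  ifelse(op < cl & op > prev_cl, "white",
  ifelse(op > cl & op < prev_cl, "red",
                                 "black")))
}
op <- c(10.0, 11.5, 11.5, 10.5)   # toy opens
cl <- c(11.0, 12.0, 11.0, 12.8)   # toy closes
candle_col(op, cl)                # NA, "white", "red", "grey"
```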
reChart takes any number of arguments from the original chart call and redraws the chart with the updated parameters. One item of note: if multiple color bars/candles are desired, it is necessary to respecify the theme argument. Additionally, it is not possible to change TA parameters at present. This must be done with addTA/dropTA/swapTA/moveTA commands.
Value
Returns a standard chart plus volume, if available, suitably scaled.
If plot=FALSE a chob object will be returned.
Note
Most details can be fine-tuned within the function, though the code does a reasonable job of scaling and labeling axes for the user. The current implementation maintains a record of actions carried out for any particular chart. This is used to recreate the original when adding new indicators. A list of applied TA actions is available with a call to listTA. This list can be assigned to a variable and used in new chart calls to recreate a set of technical indicators. It is also possible to force all future charts to use the same indicators by calling setTA.
Additional motivation to add outlined candles to allow for scaling and advanced color coding is owed to Josh Ulrich, as are the base functions (from TTR) for the yet to be released technical analysis charting code.
Many improvements in the current version were the result of conversations with Gabor Grothendieck. Many thanks to him.
Author(s)
Jeffrey A. Ryan
References
Josh Ulrich - TTR package and multi.col coding
See Also
getSymbols, addTA, setTA, chartTheme
Examples
## Not run:
getSymbols("YHOO")
chartSeries(YHOO)
chartSeries(YHOO, subset='last 4 months')
chartSeries(YHOO, subset='2007::2008-01')
chartSeries(YHOO,theme=chartTheme('white'))
chartSeries(YHOO,TA=NULL) #no volume
chartSeries(YHOO,TA=c(addVo(),addBBands())) #add volume and Bollinger Bands from TTR
addMACD() # add MACD indicator to current chart
setTA()
chartSeries(YHOO) # draws chart again, this time with all indicators present
## End(Not run)
Description
Charting tool to create standard financial charts gives a time series like object. Serves as the base function for future technical analysis additions. Possible chart styles include candles, matches (1 pixel candles), bars, and lines. Chart may have white or black background. reChart allows for dynamic changes to the chart without having to respecify the full chart parameters.
Usage
chartSeries(x,
type = c("auto", "candlesticks", "matchsticks", "bars", "line"),
subset = NULL,
show.grid = TRUE,
name = NULL,
time.scale = NULL,
log.scale = FALSE,
TA = 'addVo()',
TAsep=';',
line.type = "l",
bar.type = "ohlc",
theme = chartTheme("black"),
layout = NA,
major.ticks='auto', minor.ticks=TRUE,
yrange=NULL,
plot=TRUE,
up.col,dn.col,color.vol = TRUE, multi.col = FALSE,
...)
reChart(type = c("auto", "candlesticks", "matchsticks", "bars", "line"),
subset = NULL,
show.grid = TRUE,
name = NULL,
time.scale = NULL,
line.type = "l",
bar.type = "ohlc",
theme = chartTheme("black"),
major.ticks='auto', minor.ticks=TRUE,
yrange=NULL,
up.col,dn.col,color.vol = TRUE, multi.col = FALSE,
...)
Arguments
x | an OHLC object – see details |
type | style of chart to draw |
subset | xts style date subsetting argument |
show.grid | display price grid lines? |
name | name of chart |
time.scale | what is the timescale? automatically deduced (broken) |
log.scale | should the y-axis be log-scaled? |
TA | a vector of technical indicators and params, or character strings |
TAsep | TA delimiter for TA strings |
line.type | type of line in line chart |
bar.type | type of barchart – ohlc or hlc |
theme | a chart.theme object |
layout | if NULL bypass internal layout |
major.ticks | where should major ticks be drawn |
minor.ticks | should minor ticks be drawn? |
yrange | override y-scale |
plot | should plot be drawn |
up.col | up bar/candle color |
dn.col | down bar/candle color |
color.vol | color code volume? |
multi.col | 4 color candle pattern |
… | additional parameters |
Details
Currently, chartSeries displays the standard-style OHLC charts familiar in financial applications, or line charts when not passing OHLC data. It works with objects having explicit time-series properties.
Line charts are created from close data, or from single-column time series.
The subset argument can be used to specify a particular area of the series to view. The underlying series is left intact to allow for TA functions to use the full data set. Additionally, it is possible to use syntax borrowed from the first and last functions, for example, “last 4 months.”
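The effect of a range string such as subset='2007::2008-01' can be sketched in base R. The helper below, range_subset, is an illustrative name and not part of quantmod or xts; for simplicity it assumes full YYYY-MM-DD endpoints, whereas real xts subsetting also accepts partial dates such as '2007' or '2008-01':

```r
## Hypothetical base-R sketch of ISO-8601 range subsetting in the spirit
## of xts: 'from::to' keeps observations whose dates fall in [from, to].
range_subset <- function(dates, spec) {
  ends <- strsplit(spec, "::", fixed = TRUE)[[1]]
  dates >= as.Date(ends[1]) & dates <= as.Date(ends[2])
}

d <- seq(as.Date("2007-11-01"), as.Date("2008-03-01"), by = "month")
d[range_subset(d, "2007-12-01::2008-02-01")]
# -> "2007-12-01" "2008-01-01" "2008-02-01"
```

Note that chartSeries applies such a subset only to the view; the underlying series is kept whole so that TA functions can still use the full data set.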
TA allows for the inclusion of a variety of chart overlays and technical indicators. A full list is available from addTA. The default TA argument is addVo() – which adds volume, if available, to the chart being drawn.
theme requires an object of class chart.theme, created by a call to chartTheme. This function can be used to modify the look of the resulting chart. See chart.theme for details.
line.type and bar.type allow further fine-tuning of chart styles to user tastes.
multi.col implements a color coding scheme used in some charting applications, and follows these rules:
• grey => Op[t] < Cl[t] and Op[t] < Cl[t-1]
• white => Op[t] < Cl[t] and Op[t] > Cl[t-1]
• red => Op[t] > Cl[t] and Op[t] < Cl[t-1]
• black => Op[t] > Cl[t] and Op[t] > Cl[t-1]
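The four rules above can be expressed directly in base R. This is a sketch of the coding scheme, not quantmod's internal implementation; candle_colors is a hypothetical helper name. Ties (Op[t] equal to either close) fall through to the "else" branch here, since the rules are stated with strict inequalities:

```r
## Colour each candle by today's open vs. today's close, and today's
## open vs. yesterday's close, per the multi.col rules.
candle_colors <- function(op, cl) {
  prev_cl <- c(NA, cl[-length(cl)])           # Cl[t-1]; first bar has none
  ifelse(op < cl,
         ifelse(op < prev_cl, "grey", "white"),
         ifelse(op < prev_cl, "red",  "black"))
}

op <- c(10, 11, 12, 13)
cl <- c(12, 10, 14, 11)
candle_colors(op, cl)
# -> NA "red" "white" "red"   (first bar is NA: no prior close)
```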
reChart takes any number of arguments from the original chart call and redraws the chart with the updated parameters. One item of note: if multiple color bars/candles are desired, it is necessary to respecify the theme argument. Additionally, it is not possible to change TA parameters at present; this must be done with the addTA/dropTA/swapTA/moveTA commands.
Using the R code segment below,
> x <- rnorm(1:50)
> x
> install.packages("epibasix")
> library(epibasix)
> univar(x)
> x
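Two points about this segment are worth noting. First, rnorm(1:50) works because when rnorm's first argument has length greater than one, its length (here 50) is taken as the number of values to generate. Second, epibasix::univar prints simple univariate summaries; a base-R equivalent can be sketched as below (univar_stats is an illustrative name, not the package's API):

```r
## Hedged base-R stand-in for the kind of univariate summary univar() reports.
univar_stats <- function(x) {
  c(n = length(x), mean = mean(x), median = median(x),
    sd = sd(x), min = min(x), max = max(x))
}

set.seed(1)
x <- rnorm(50)            # equivalent in effect to rnorm(1:50) above
round(univar_stats(x), 3)
```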
plot {graphics}
R Documentation
Generic X-Y Plotting
Description
Generic function for plotting of R objects. For more details about the graphical parameter arguments, see par.
For simple scatter plots, plot.default will be used. However, there are plot methods for many R objects, including functions, data.frames, density objects, and so on. Use methods(plot) and the documentation for these.
Usage
plot(x, y, ...)
Arguments
x | the coordinates of points in the plot. Alternatively, a single plotting structure, function, or any R object with a plot method can be provided. |
y | the y coordinates of points in the plot, optional if x is an appropriate structure. |
… | arguments to be passed to methods, such as graphical parameters (see par). Many methods will accept the following arguments: |
type
what type of plot should be drawn. Possible types are "p" for points, "l" for lines, "b" for both, "c" for the lines part alone of "b", "o" for both "overplotted", "h" for "histogram"-like (or high-density) vertical lines, "s" for stair steps, "S" for other steps (see Details), and "n" for no plotting.
All other types give a warning or an error; using, for example, type = "punkte" being equivalent to type = "p" for S compatibility. Note that some methods, for example plot.factor, do not accept this.
main an overall title for the plot: see title.
sub a sub title for the plot: see title.
xlab a title for the x-axis: see title.
ylab a title for the y-axis: see title.
asp the y/x aspect ratio: see plot.window.
Details
The two step types differ in their x–y preference: going from (x1,y1) to (x2,y2) with x1 < x2, type = "s" moves first horizontally, then vertically, whereas type = "S" moves the other way around.
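The distinction can be checked numerically without drawing anything: type = "s" carries the previous y forward (horizontal first), which corresponds to constant interpolation with f = 0 in base R's approx, while type = "S" jumps to the next y first (vertical first), corresponding to f = 1. This is an analogy for illustration, not the internal code of plot:

```r
## Value halfway between (1,10) and (2,20) under each stepping rule.
x <- c(1, 2, 3)
y <- c(10, 20, 30)
approx(x, y, xout = 1.5, method = "constant", f = 0)$y  # horizontal first: 10
approx(x, y, xout = 1.5, method = "constant", f = 1)$y  # vertical first:   20
```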
See Also
plot.default, plot.formula and other methods; points, lines, par.
For X-Y-Z plotting see contour, persp, and image.