Sometimes, when working with large datasets, we might find that a lot of data points on a scatter plot overlap each other. In this recipe, we will learn how to distinguish between closely packed data points by adding a small amount of noise with the jitter() function.
All you need for this recipe is to type the code at the R prompt, as we will use only base library functions. You can also save the recipe code as a script so that you can use it again later on.
First, let's create a graph that has a lot of overlapping points:
x <- rbinom(1000, 10, 0.25)
y <- rbinom(1000, 10, 0.25)
plot(x, y)
Now, let's add some noise to the data points to see whether there are overlapping points:
plot(jitter(x), jitter(y))
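The amount of noise can also be set explicitly. The following is a minimal sketch, assuming we want each point moved by at most 0.3 units in either direction; the seed, the amount value of 0.3, the pch symbol, and the semi-transparent color are illustrative choices and not part of the recipe. With amount = a, jitter() adds uniform noise drawn from the interval -a to a:

```r
# Assumed seed, used only so the example is reproducible
set.seed(42)

x <- rbinom(1000, 10, 0.25)
y <- rbinom(1000, 10, 0.25)

# Shift every value by uniform noise in [-0.3, 0.3] (illustrative amount)
xj <- jitter(x, amount = 0.3)
yj <- jitter(y, amount = 0.3)

# Filled, semi-transparent symbols make dense clusters appear darker
plot(xj, yj, pch = 16, col = rgb(0, 0, 0, alpha = 0.3))
```

Because the original values are integers, keeping the amount below 0.5 ensures that the clouds of jittered points around neighboring values do not run into each other.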
In the first graph, we plotted 1,000 random data points generated with the rbinom() function. However, only a few distinct points are visible because many data points lie in exactly the same location: both x and y can only take integer values from 0 to 10. Then, when we plotted the points after applying the jitter() function to the x and y values, we saw a lot more of the thousand points. We can also see that the data is concentrated around x and y values of 2 to 4.
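To check how much overlap there is without relying on the plot, we can count the points at each location. This is a rough sketch using table(); the seed and the variable names counts and n_visible are assumptions made for illustration:

```r
# Assumed seed, used only so the example is reproducible
set.seed(42)

x <- rbinom(1000, 10, 0.25)
y <- rbinom(1000, 10, 0.25)

# Cross-tabulate the points: each cell holds the number of points at one (x, y) location
counts <- table(x, y)

# Number of distinct locations, that is, symbols visible in the unjittered plot
n_visible <- sum(counts > 0)
n_visible

# Largest number of points stacked on a single location
max(counts)

# Share of x values that fall between 2 and 4
mean(x >= 2 & x <= 4)
```

Since x and y each take at most 11 different values, the unjittered plot can never show more than 121 distinct locations, however many points are drawn.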