Spark DataFrames within the RStudio IDE

Another simple way to start your Spark connections and browse your datasets is through the RStudio IDE. After you've installed the sparklyr package, Spark appears in the top-right part of your RStudio window, next to your R environment. If you are already connected, call spark_disconnect_all() before continuing, so that we are on the same page. With no active connection, it looks like the following screenshot:

Figure 12.1: Spark shown in RStudio IDE

Click on the left arrow to see all connections, then click on the new connection button to establish one. A window pops up where you can connect to and manage Spark using clicks alone:

Figure 12.2: Spark connection guide from RStudio IDE
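If you prefer the console, the same connection can be opened with a few lines of code. The following is only a minimal sketch: it assumes that a local Spark installation is available (if it is not, spark_install() can download one first) and that a local master is what you want, and the name sc is just a conventional choice:

```r
library(sparklyr)

# Close any connections left over from earlier sessions.
spark_disconnect_all()

# Assumption: Spark is installed locally and we want a local master.
sc <- spark_connect(master = "local")

# Check that the connection is live before going further.
spark_connection_is_open(sc)
```

A connection opened this way should also show up in the IDE, so you can mix clicks and code freely.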

Once you upload a DataFrame into your Spark connection, you can browse it by selecting it in the IDE, just as you are used to doing with your regular DataFrames in RStudio. Uploaded DataFrames appear inside the connection area shown in Figure 12.1.
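As a rough illustration of the upload step, the sketch below copies the built-in mtcars DataFrame into Spark. It assumes a local Spark installation; the table name mtcars_spark and the object names sc and mtcars_tbl are arbitrary choices made for this example:

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally.
sc <- spark_connect(master = "local")

# Copy a regular R DataFrame into Spark under a hypothetical table name.
mtcars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# List the tables registered in this connection.
src_tbls(sc)

# Preview the first rows of the Spark DataFrame.
head(mtcars_tbl)
```

When you are done, spark_disconnect(sc) closes the connection.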

The resources available in R can be used with Spark, but learning how to use them takes extra time and more space than these few pages allow. Since Spark is a great tool for big datasets, I think the hours are well spent if you deal with this kind of data. A good place to start is the Using Spark with RStudio link shown in Figure 12.2, where you will find a lot of information. I recommend you check the deployment examples.
