Connecting to the PostgreSQL database on AWS

In this section, we are going to talk about how to connect a PostgreSQL database on AWS by using the RPostgreSQL library. If you don't have this library on your R environment, you can install it by using the following command: 

install.packages("RPostgreSQL")
  • Sign into your AWS account, and click Instances from the left menu, as shown in the following screenshot:

Dashboard for an RDS Instance, which is called development
  • Scroll down to the Details section and then find the endpoint:

List of details regarding your instance
  • Now we can connect to the PostgreSQL database using the endpoint mentioned here: development.clgjf7569y1t.us-west-2.rds.amazonaws.com
When you are connecting your own database, please be sure that you are using correct endpoint. 
  • Let's use R to connect to the database. 
  • Most important, we need to install the RPostgreSQL package. We will use the install.packages() command, and we will then be able to install the package directly:
      > install.packages("RPostgreSQL")
  • Then, the next step will be calling the package by using a library() command:
      > library("RPostgreSQL")
  • We will then do a setup to connect to the PostgreSQL DB using R. We are loading  RPostgreSQL driver and using it in the following step to connect to the DB:
      drv <- dbDriver("PostgreSQL")
  • When we use the DB driver, earlier stored in a drv variable, we can then connect to the PostgreSQL DB by using the following R commands.
  • We are going to connect test2 database which we have created
  • Make sure that you are using the variable to store the connection for querying
      con <- dbConnect(drv, dbname = "test2", 

      host = "endpointofyourinstance",port=5432,user= "username", 
password = "pass"

We have learned how to connect to a PostgreSQL database using R.

  • Let's create an R script that scraps the web page and stores this data to the database on AWS. 

As you remember, we have already written the following script to scrap devveri (http://devveri.com/). We are going to use the same script to collect data, but we will be just adding a block of scripts to connect db and store data on db:

 #loading library
library(rvest)

#creating NULL variables
name <- NULL
date <- NULL
comment_count <- NULL

#links for page
urls <- "http://devveri.com/"

#reading main url
h <- read_html(urls)

#getting names
n<- html_nodes(h, xpath =
'/html/body/div[3]/div/div[1]/div/h2/a/text()')

#getting dates
d<- html_nodes(h, xpath =
'/html/body/div[3]/div/div[1]/div/p[1]/span[1]/text()')

#getting comment counts
comc<- html_nodes(h, xpath =
'/html/body/div[3]/div/div[1]/div/p[1]/span[4]/a/text()')

#saving results
name<- as.matrix(as.character(n))
date<- as.matrix(as.character(d))
comment_count <- as.matrix(as.character(comc))

#creating final data frame
final_data <- data.frame(name, comment_count, date)
  • We will need to load the library.
      library("RPostgreSQL")
  • To connect a database via R, we are going to load the PostgreSQL driver to the R environment, because we have the PostgreSQL database.
      #load db driver
drv <- dbDriver("PostgreSQL")
  • We can connect to the database by using the following command. To do it, we have to use the dbConnect function. In this function, we are defining the database name, host(endpoint on AWS), username, and password:
      #connecting to db
con <-dbConnect(drv,dbname = "test2",host = "endpointofyourinstance",
port = 5432,user = "user",password = "pass")
  • Then, let's store our final data frame to the database. To do this, we are using dbWriteTable function. First, we are defining the connection that we stored to the con variable.
  • Then, we are defining our database table name, which is going to be devveri:
     #writing final dataframes to the table
dbWriteTable(con, "devveri", final_data, row.names = FALSE,
append = TRUE)

We have created an R script to collect data from the devveri.com and store the data to the PostgreSQL database, which is hosted on AWS.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.16.23