How it works...

The recipe performs a series of lookups on the database, each one returning a little more information to work with.

In Step 1, we use the listMarts() function to get a list of all of the BioMarts available at the specified host URL. Change the URL as appropriate when you want to connect to a different server. The function returns a data frame of the available marts, from which we choose the one to connect to.

In Step 2, we create a connection object called gramene_connection with the useMart() function, passing in the server URL and the specific BioMart from Step 1.

In Step 3, we pass gramene_connection to the listDatasets() function to retrieve the datasets in this biomart. Having selected one of the datasets (atrichopoda_eg_gene), we run the useMart() function again, this time with the dataset name, to create a connection to that dataset, naming the object data_set_connection.
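
Steps 1 to 3 can be sketched as a single helper. This is a minimal sketch rather than the recipe's own code: the function name connect_to_dataset is hypothetical, and the host and mart name in the usage comment are assumptions, since neither is stated in this section. Substitute the values you see in your own listMarts() output.

```r
# Hypothetical helper wrapping Steps 1 to 3. It needs the biomaRt package
# and network access when called.
connect_to_dataset <- function(host, mart_name, dataset_name) {
  # Step 1: data frame of the marts the host offers (column: biomart)
  marts <- biomaRt::listMarts(host = host)
  if (!mart_name %in% marts$biomart) {
    stop("Mart '", mart_name, "' not found at host '", host, "'")
  }

  # Step 2: connect to the chosen mart
  mart_connection <- biomaRt::useMart(biomart = mart_name, host = host)

  # Step 3: check that the dataset exists, then connect to it
  data_sets <- biomaRt::listDatasets(mart_connection)
  if (!dataset_name %in% data_sets$dataset) {
    stop("Dataset '", dataset_name, "' not found in mart '", mart_name, "'")
  }
  biomaRt::useMart(biomart = mart_name, host = host, dataset = dataset_name)
}

# Usage (assumed host and mart name; uncomment to run against a live server):
# data_set_connection <- connect_to_dataset(
#   host         = "https://plants.ensembl.org",
#   mart_name    = "plants_mart",
#   dataset_name = "atrichopoda_eg_gene"
# )
```

Checking the mart and dataset names before connecting turns a misspelled name into a readable error rather than a failure deeper inside the query.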

In Step 4, we're nearly done working out what we can ask of the dataset. Here, we pass data_set_connection, which we created in Step 3, to the listAttributes() function to get a list of the types of information we can retrieve from this dataset.

In Step 5, we finally get some actual information with the main function, getBM(). We set the attributes argument to the names of the data we want to get back; here, we get all values for chromosome_name and save them in chrom_names, a data frame with one column.

In Step 6, we set up filters—the restrictions on which values to receive. We first ask the data_set_connection object which filters we can use with the listFilters() function. Notice from the returned filters object that we can filter on chromosome_name, so we'll use that.
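
The data frames returned by listAttributes() and listFilters() can be long, so it often helps to search them rather than read them in full. The following is a minimal sketch, assuming both results carry name and description columns. find_entries is a hypothetical helper, not part of biomaRt, and the small data frame only stands in for real listFilters() output.

```r
# Hypothetical helper: keep rows whose name or description matches a pattern.
find_entries <- function(entries, pattern) {
  hits <- grepl(pattern, entries$name, ignore.case = TRUE) |
    grepl(pattern, entries$description, ignore.case = TRUE)
  entries[hits, , drop = FALSE]
}

# Stand-in for the output of listFilters(data_set_connection); the rows are
# illustrative, not copied from a live server.
example_filters <- data.frame(
  name        = c("chromosome_name", "start", "end", "biotype"),
  description = c("Chromosome/scaffold name", "Start", "End", "Type"),
  stringsAsFactors = FALSE
)

chromosome_filters <- find_entries(example_filters, "chromosome")
print(chromosome_filters$name)

# With a live connection, the same helper applies to the real results:
# filters    <- listFilters(data_set_connection)
# attributes <- listAttributes(data_set_connection)
# find_entries(filters, "chromosome")
# find_entries(attributes, "description")
```

Recent biomaRt releases also provide searchFilters() and searchAttributes(), which take the connection and a pattern directly; check your installed version before relying on them.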

In Step 7, we set up a full query to get all genes on the first chromosome. We already have the chromosome names from Step 5, so we take the first element of the chrom_names object to use in the filter, saving it in first_chr. To perform the query, we call getBM() with the ensembl_gene_id and description attributes. We set the filters argument to the data type we wish to filter on and the values argument to the value of that filter we wish to keep. We also pass the data_set_connection object as the BioMart to use. The resulting genes object contains the ensembl_gene_id and description of each gene on the first chromosome, as follows:

##        ensembl_gene_id          description
## 1 AMTR_s00001p00009420 hypothetical protein
## 2 AMTR_s00001p00015790 hypothetical protein
## 3 AMTR_s00001p00016330 hypothetical protein
## 4 AMTR_s00001p00017690 hypothetical protein
## 5 AMTR_s00001p00018090 hypothetical protein
## 6 AMTR_s00001p00019800 hypothetical protein
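
Taken together, Steps 5 and 7 amount to two getBM() calls. The following is a minimal sketch, assuming a dataset connection such as data_set_connection from Step 3 is available. genes_on_first_chromosome is a hypothetical wrapper, not a biomaRt function, and it treats the result of the first call as a data frame with a chromosome_name column, which is the form getBM() returns.

```r
# Hypothetical wrapper around Steps 5 and 7. It needs the biomaRt package
# and network access when called.
genes_on_first_chromosome <- function(mart) {
  # Step 5: one-column data frame of chromosome names
  chrom_names <- biomaRt::getBM(attributes = "chromosome_name", mart = mart)
  first_chr <- chrom_names$chromosome_name[1]

  # Step 7: gene IDs and descriptions, restricted to that chromosome
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "description"),
    filters    = "chromosome_name",
    values     = first_chr,
    mart       = mart
  )
}

# Usage (uncomment to run against a live server):
# genes <- genes_on_first_chromosome(data_set_connection)
# head(genes)
```

The values argument also accepts a vector, so passing several chromosome names should return the genes on any of them.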