BigQuery public datasets

Google continually adds publicly available datasets that developers can use to evaluate BigQuery's capabilities and performance, and to build demo products. Users are not billed for the storage of these public datasets, but they are billed for the bytes processed by any query they run against them. As mentioned previously, the query validator can be used to estimate the number of bytes a query will process before it is run.
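The same estimate can also be obtained programmatically. The following is a minimal sketch, assuming the google-cloud-bigquery Python client library is installed and default credentials are configured; the helper names format_bytes and estimate_bytes_processed are illustrative and not part of any library. It submits the query as a dry run, so nothing is executed or billed:

```python
def format_bytes(num_bytes: int) -> str:
    """Render a byte count in binary units, e.g. 1536 -> '1.50 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


def estimate_bytes_processed(sql: str) -> int:
    """Dry-run a query and return the number of bytes it would process."""
    # Imported lazily so format_bytes works without the client library.
    from google.cloud import bigquery

    # Assumption: default credentials and a default project are configured.
    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    # A dry-run job completes immediately and only carries statistics.
    return job.total_bytes_processed
```

Passing the result of estimate_bytes_processed to format_bytes gives a readable figure to check before deciding whether to run the query for real.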

If you are an IT service provider, you can showcase your Big Data ideas using the public datasets in BigQuery. Some of the dashboards built on BigQuery data can be seen at https://www.bimeanalytics.com/dashboards.

One dataset that holds a very large amount of data is bigquery-public-data:github_repos, which stores data from GitHub repositories. One of its tables, named files, has over 2 billion records. Querying a table of this size gives users an idea of BigQuery's performance. To view the table, click on the drop-down menu in the project and choose Display project, as shown in the following screenshot:

Enter the project name bigquery-public-data in the dialog box and click on the OK button after choosing the options shown in the screenshot:

Choose the files table under the github_repos dataset in the bigquery-public-data project, as shown in the following screenshot. Look at the table's schema and run some sample queries on this table to evaluate the performance of BigQuery:
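A sample query can also be run and timed from code. The following is a rough sketch, assuming the google-cloud-bigquery Python client library and default credentials. The function names are illustrative, and the repo_name column is an assumption about the table's schema that should be confirmed in the schema view. Because queries on public datasets are billed by bytes processed, the query reads a single column only:

```python
import time

FILES_TABLE = "bigquery-public-data.github_repos.files"


def build_top_repos_query(limit: int = 10) -> str:
    """Return standard SQL listing the repositories with the most file records."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Assumption: the table has a repo_name column; check the schema first.
    return (
        "SELECT repo_name, COUNT(*) AS file_count\n"
        f"FROM `{FILES_TABLE}`\n"
        "GROUP BY repo_name\n"
        "ORDER BY file_count DESC\n"
        f"LIMIT {limit}"
    )


def run_and_time(sql: str):
    """Run the query and return (rows, elapsed_seconds, bytes_processed)."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    # Disable cached results so repeated runs measure a real scan.
    job_config = bigquery.QueryJobConfig(use_query_cache=False)
    start = time.perf_counter()
    job = client.query(sql, job_config=job_config)
    rows = list(job.result())  # blocks until the query finishes
    elapsed = time.perf_counter() - start
    return rows, elapsed, job.total_bytes_processed
```

Calling run_and_time(build_top_repos_query(10)) returns the result rows together with the wall-clock time measured on the client and the bytes billed for the scan. The elapsed time includes network latency, so it is only a rough indicator of query performance.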

According to a 2012 white paper (https://cloud.google.com/files/BigQueryTechnicalWP.pdf), BigQuery can complete a full scan of 35 billion rows and return results in tens of seconds, without any index on the table.
