We’ve covered a lot of conceptual ground, and it’s now time to start using CockroachDB as an end user. We’ve created clusters and tables; now, let’s connect to them and put them to work.
Connecting to CockroachDB
When presented with a new database, the first thing you might want to do is connect to it and start querying! In this section, we’ll connect to self-hosted and cloud-based CockroachDB clusters with tools and from code.
Connecting with Tools
There are plenty of off-the-shelf tools you can use to connect to a CockroachDB cluster. Some are free, and some you’ll pay to use.
DBeaver – https://dbeaver.com
DataGrip – https://www.jetbrains.com/datagrip
TablePlus – https://tableplus.com
Note that 26257 is the default port CockroachDB uses for database connections. As I’ve not altered the default, I’m connecting on port 26257 here.
The database we’ve connected to so far is a basic, local one that does not use client certificates. I’ll now connect to a Cockroach Cloud database to show you how to do that with DBeaver.
Unlike a local demo database, a free-tier Cockroach Cloud database requires a certificate and an argument to tell Cockroach Cloud which cluster to connect to (it’s multitenant, meaning clusters for different users are co-hosted).
With these changes made, we can connect to a Cockroach Cloud database.
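Those same settings can also be expressed as a single connection string, which is handy when a tool asks for a URL rather than individual fields. The following sketch shows the general shape of a free-tier Cockroach Cloud connection string; the host, cluster name, credentials, and certificate path are all placeholders, so copy the real values from your cluster’s connection dialog, as the exact format can vary between versions.

```python
# Sketch of a free-tier Cockroach Cloud connection string.
# Every value below is a placeholder; substitute the values from
# your cluster's connection dialog.
host = "free-tier.gcp-us-central1.cockroachlabs.cloud"  # placeholder host
cluster = "my-cluster-1234"                             # placeholder routing name
cert = "/path/to/root.crt"                              # CA cert from the console

# The cluster name travels in the "options" parameter so the multitenant
# gateway knows which tenant's cluster to route the connection to.
dsn = (
    f"postgresql://user:password@{host}:26257/defaultdb"
    f"?sslmode=verify-full"
    f"&sslrootcert={cert}"
    f"&options=--cluster%3D{cluster}"
)

print(dsn)
```

The same string works unchanged in DBeaver’s URL field or in any Postgres driver.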
Connecting Programmatically
You can connect to CockroachDB from many different programming languages. Cockroach Labs lists the available drivers on their website1 along with example applications for each.2
To give you a good introduction to working with CockroachDB programmatically, I’ll now perform some basic operations against a database using some popular drivers.
The examples that follow show how to connect to CockroachDB and get results using several common programming languages. In each case, I’ve favored succinctness over completeness.
Go Example
In this example, I’ll connect to the “defaultdb” database and perform an INSERT and SELECT against the “person” table. I’ll be using a popular Go driver for Postgres called pgx.
Next, we’ll create a simple application in a main.go file. I’ll share the key building blocks of the application here, as the complete code listing will be available on GitHub.
For brevity, I’m passing a context.Context value of context.Background(), which is fine for toy examples like this one. In real applications, however, where a query runs on behalf of a caller, you’ll want to pass the request’s context so that cancelations and deadlines propagate to your database calls.
The pgx package does not return an error from calls to Close() on *pgxpool.Pool structs, owing to the pooled nature of the connections it manages. If you’re using the pgx.Connect() variant – which returns a single connection and which I don’t recommend for multithreaded environments – you’ll receive an object whose Close() function you should call and whose returned error you should check.
Python Example
Next, we’ll create a Python application to do the same thing: connect to the “defaultdb” database, INSERT into the “person” table, and SELECT out of it.
It’s a matter of preference whether you use the connect function that takes a connection string/Data Source Name (DSN) or separate variables for each of the connection elements.
psycopg2’s cursor function returns a cursor to use when executing statements against the database. Using the connection in a with statement ensures that the transaction will be committed if the block completes and rolled back if an exception is raised.
psycopg2’s fetchall method returns a list of tuples, one for each row returned. We can access the column values of each tuple by their index.
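The flow just described can be sketched as follows. The DSN assumes a local, insecure cluster, and the “name” column and sample value are assumptions for illustration; the complete listing is on GitHub.

```python
# A minimal psycopg2 sketch: connect to defaultdb, INSERT into the
# "person" table, then SELECT the rows back. The "name" column and
# the sample value are assumptions for illustration.
DSN = "postgresql://root@localhost:26257/defaultdb?sslmode=disable"

INSERT_STMT = "INSERT INTO person (name) VALUES (%s)"
SELECT_STMT = "SELECT id, name FROM person"

def run():
    import psycopg2  # third-party driver; imported here so the sketch parses without it

    conn = psycopg2.connect(DSN)
    try:
        # Using the connection as a context manager commits the transaction
        # on success and rolls it back if an exception is raised.
        with conn:
            with conn.cursor() as cur:
                cur.execute(INSERT_STMT, ("Alice",))
        with conn:
            with conn.cursor() as cur:
                cur.execute(SELECT_STMT)
                for row in cur.fetchall():  # a list of tuples, one per row
                    print(row[0], row[1])   # column values accessed by index
    finally:
        conn.close()
```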
Ruby Example
Queries that take user-supplied values should always be parameterized. In Ruby’s pg gem, you pass parameters into a query using the exec_params function. I then pass a multidimensional array as a parameter, with each inner array containing the fields of a row to insert.
Crystal Example
Crystal’s turn! I’ve been using Crystal for a number of personal projects recently (including projects backed by CockroachDB), so I wanted to make sure I covered it.
C# Example
Designing Databases
The database schemas we’ve designed thus far have remained purposefully simple to help demonstrate specific database features like data types. The database schemas we’ll create in this section will be a more accurate reflection of what you’ll create in the wild.
Database Design
We’ll start by looking at CockroachDB’s topmost object: the database. Up to this point, we’ve created tables in the defaultdb database for simplicity. It’s now time to create our own database.
An important decision to make before creating your database is where it will live. Will it be located in a single region, or will it span multiple regions?
If your database is to span multiple regions, you’ll want to provide CockroachDB with some region hints when creating your database. These include the database’s primary region and any other region it is to operate within.
The results of SHOW REGIONS confirm that our “db_name” database is running in the European regions and has a primary region of “europe-west1”, just as we configured.
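The statements behind this might be sketched as follows (held here as Python strings and executed over a psycopg2 connection). The database name db_name and primary region “europe-west1” follow the text; “europe-west2” is an assumed second region, and the regions actually available depend on the localities your cluster’s nodes were started with.

```python
# Sketch of multi-region DDL for the "db_name" database from the text.
# "europe-west2" is an assumed second region; available regions depend
# on the --locality flags your nodes were started with.
CREATE_DB = """
CREATE DATABASE db_name
    PRIMARY REGION "europe-west1"
    REGIONS "europe-west1", "europe-west2"
"""

SHOW_REGIONS = "SHOW REGIONS FROM DATABASE db_name"

def run(conn):
    """Execute the DDL over an open psycopg2 connection."""
    with conn:
        with conn.cursor() as cur:
            cur.execute(CREATE_DB)
            cur.execute(SHOW_REGIONS)
            for row in cur.fetchall():
                print(row)  # each row names a region and whether it's primary
```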
Schema Design
If you need to logically separate database objects such as tables, a user-defined schema is a good choice. In the following example, we’ll assume we’ve decided to use user-defined schemas to logically separate our database objects. Custom schemas are optional; if you don’t specify one, the default “public” schema will be used. Imagine a business with three areas, each owned by a different team:
Retail – The team that builds and runs the website
Manufacturing – The team that makes the products
Finance – The team that manages the company accounts
In a business like this, schemas make a lot of sense, as it’s feasible that each of the business areas will need an “orders” table. The retail team will need to capture customer orders, the manufacturing team will need to capture orders for raw materials, and finance would like to capture transactions made on company cards as orders.
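A sketch of that schema-per-team layout follows; the column definitions are illustrative assumptions, but the schema names match the teams above.

```python
# One user-defined schema per team, each free to own its own "orders"
# table. Column definitions are illustrative assumptions.
STATEMENTS = [
    "CREATE SCHEMA retail",
    "CREATE SCHEMA manufacturing",
    "CREATE SCHEMA finance",
    # Three independent "orders" tables, one per schema:
    "CREATE TABLE retail.orders (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), customer_id UUID)",
    "CREATE TABLE manufacturing.orders (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), material STRING)",
    "CREATE TABLE finance.orders (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), card_number STRING)",
]

def run(conn):
    """Execute each statement over an open psycopg2 connection."""
    with conn:
        with conn.cursor() as cur:
            for stmt in STATEMENTS:
                cur.execute(stmt)
```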
Table Design
There are plenty of considerations to make when creating tables in any database, and CockroachDB is no exception. Among other things, badly named columns, incorrect use of data types, missing (or redundant) indexes, and badly configured primary keys can hurt readability and performance.
In this section, we’ll take a look at some table design best practices.
Naming
It’s important to know where CockroachDB will create a table, so provide a name that tells CockroachDB exactly where to place it. Depending on how you name a table, one of the following will happen:
Given an unqualified name (e.g., person) with a database selected, the table exists in the public schema of whichever database you currently have selected.
Given a database-qualified name (e.g., defaultdb.person or defaultdb.public.person), the table exists in the public schema of the defaultdb database.
Given an unqualified name with no database selected, you receive an error asking you to select a database.
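For concreteness, here are the three naming variants as hypothetical DDL strings; the id column is illustrative.

```python
# The three naming scenarios as DDL strings. The id column is illustrative.
UNQUALIFIED = "CREATE TABLE person (id UUID PRIMARY KEY)"
# -> created in the public schema of the currently selected database
#    (errors if no database is selected)

DB_QUALIFIED = "CREATE TABLE defaultdb.person (id UUID PRIMARY KEY)"
# -> created in the public schema of defaultdb, regardless of selection

FULLY_QUALIFIED = "CREATE TABLE defaultdb.public.person (id UUID PRIMARY KEY)"
# -> database and schema both stated explicitly
```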
Always give your tables descriptive names. The choice of singular vs. plural table names and whether you PascalCase, camelCase, snake_case, or sPoNgEbOb-case them is yours to make. Whatever you decide, it’s always a good idea to remain consistent with your naming convention.
Column Data Types
CockroachDB has many column types, and to get the best out of your database, you’ll need to know when to use them and when not to use them.
Every column, except for the ID column, is nullable.
The ID column uses the SERIAL data type, which only exists for Postgres compatibility. In most scenarios, it’s better to use the UUID data type, whose values distribute more evenly across ranges.
The length of the name and primary_role columns is not limited.
The age column can store values between -9,223,372,036,854,775,808 and +9,223,372,036,854,775,807, which, given our current scientific progress, seems unlikely.
The is_admin column can also store a large number, but the main problem with this column being of type INT is that there exists a better data type to convey the Boolean semantics of this column: BOOL.
An ENUM data type for the primary_role column may better fit, given we’d have a finite range of values.
There’s usually a data type that is well suited to the data you’d like to store.
If you can make reasonable predictions about the amount of data a column will hold, use a length-limited variant of its data type where possible (e.g., VARCHAR(255) rather than an unbounded string).
If you require data in a given column, make it nonnullable.
Use ENUMs to represent the choice between a finite set of string values.
Consider using DEFAULT values (e.g., is_admin BOOL DEFAULT false) to achieve least-privilege semantics.
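Applying this advice, a revised person table might look like the following sketch; the lengths, ENUM values, and exact column set are assumptions for illustration.

```python
# A revised "person" table applying the guidance above. Lengths and
# ENUM values are assumptions for illustration.
CREATE_ROLE = """
CREATE TYPE role AS ENUM ('engineer', 'manager', 'director')
"""

CREATE_PERSON = """
CREATE TABLE person (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- distributes evenly across ranges
    name VARCHAR(255) NOT NULL,                    -- length-limited and required
    primary_role role NOT NULL,                    -- a finite set of values
    age INT2,                                      -- -32768..32767 is ample for an age
    is_admin BOOL NOT NULL DEFAULT false           -- least-privilege default
)
"""
```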
Indexes
Use indexes to improve the performance of SELECT queries by giving CockroachDB hints as to where it should look in a dataset for the value(s) you’re requesting.
Primary indexes – Every table receives a primary index. If you have provided a primary key for a table, this will be your table’s primary index. If you have not provided a primary key, CockroachDB will create a hidden column called “rowid”, assign a unique value to each row, and use it as the primary index.
Secondary indexes – Secondary indexes are used to improve the performance of SELECT queries and are recommended for any column you will be filtering on or sorting in your queries.
Uh oh! CockroachDB has had to perform a full scan of the entire person table to fulfill our query. Queries will get linearly slower with every new row added to the database.
CockroachDB has had to sort the results manually for us before returning them because the indexes on the table sort in ascending order by default and we’ve asked for them in descending order.
With this in place, CockroachDB knows how to return query results that are ordered by date_of_birth in ascending or descending order without having to manually sort them.
If you’re indexing columns containing sequential data, such as TIMESTAMP or incrementing INT values, you may want to consider using hash-sharded indexes.3 Hash-sharded indexes ensure that indexed data is distributed evenly across ranges, which prevents single-range hotspots. Single-range hotspots occur when many reads and writes of similar, sequential values land in the same range.
Under the covers, the index values will be prefixed with a shard number derived from an FNV hash of the data you’re indexing. Any small change to the input data may result in a different hash output and a different index bucket. The index can now scale more evenly across the nodes in your cluster, resulting in increased performance.
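The two index changes discussed above might be sketched as follows. The index names are assumptions, and note that the hash-sharded syntax varies slightly between CockroachDB versions (older versions also require the experimental_enable_hash_sharded_indexes session setting).

```python
# A descending secondary index so DESC-ordered queries need no manual sort.
DESC_INDEX = """
CREATE INDEX person_date_of_birth_desc_idx
    ON person (date_of_birth DESC)
"""

# A hash-sharded index for sequential data; indexed values are bucketed by
# hash so writes spread across ranges instead of piling into one.
HASH_SHARDED_INDEX = """
CREATE INDEX person_date_of_birth_sharded_idx
    ON person (date_of_birth)
    USING HASH WITH BUCKET_COUNT = 8
"""
```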
View Design
Materialized views – CockroachDB stores the view’s resulting data for improved performance in subsequent queries. I would only recommend materialized views if you cannot index the underlying table for query performance and its data does not update frequently; if the data frequently updates, your results might be stale.
Dematerialized views – The results of a view are not stored, meaning data is always consistent with the table. As the results are not stored, CockroachDB will query the underlying table for each view query. If you’ve appropriately indexed your table for performance, dematerialized views are the best choice, as they will always return correct data.
Temporary views – Temporary views are ephemeral views that are accessible only to the session in which they were created and are deleted at the end of that session.
If you need to restrict table data to specific users or have complex queries you’d like to expose as simpler ones, views are a good choice. Let’s create some views to see what they can offer us.
Simplify Queries
Suppose you have a query whose complexity you’d like to hide (or simply not have to rewrite). You can create a view over this data to expose a simpler query.
Let’s assume we’re running an online shop and would like a query to return the number of purchases by day. Granted, this is a straightforward query, but for the sake of argument, we’ll wrap it in a view to demonstrate the concept.
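Such a view might be sketched as follows; the table and column names are assumptions for illustration.

```python
# Wrap the purchases-by-day aggregation in a view so callers can run a
# simple SELECT. Table and column names are assumptions.
CREATE_VIEW = """
CREATE VIEW purchases_by_day (day, purchases) AS
    SELECT purchased_at::DATE, count(*)
    FROM purchase
    GROUP BY purchased_at::DATE
"""

# Callers now see a trivially simple query:
QUERY = "SELECT day, purchases FROM purchases_by_day ORDER BY day"
```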
Restrict Table Data
Another good use case for views is for restricting access to data. In the following example, we’ll create a table and restrict its data to groups of users.
Assume that in the previous example, our shop operates globally. We might not want to sell all products everywhere, so limiting the products we show users in different countries makes sense.
We’re in business! Our user_gbr user has no access to query the product table directly but does have access to its data via the product_gbr view, which exposes only the products we’d like them to see.
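The grants behind this arrangement might be sketched as follows. The user_gbr and product_gbr names follow the text, while the product table’s columns and country code column are assumptions.

```python
# Restrict user_gbr to UK products via the product_gbr view.
# Column names on the product table are assumptions.
STATEMENTS = [
    """CREATE VIEW product_gbr AS
           SELECT id, name, price
           FROM product
           WHERE country = 'GBR'""",
    "CREATE USER user_gbr",
    # Grant access to the view only; note there is deliberately no
    # grant on the underlying product table.
    "GRANT SELECT ON product_gbr TO user_gbr",
]
```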
Moving Data
In this section, we’ll explore some use cases that involve moving data into and out of databases via the IMPORT and EXPORT statements and CDC (Change Data Capture).
Exporting and Importing Data
Unless you’re creating a new database to store new data, you’ll likely need to migrate data from another database. This section will show you how to export data out of and import data into CockroachDB tables.
Next, we’ll need somewhere to export our data. CockroachDB supports cloud provider destinations such as Amazon S3, Azure Storage, and Google Cloud Storage and allows you to export to self-hosted destinations.
For this example, I’ve created a basic HTTP server that I’m hosting on replit.com at https://replit.com/@robreid/importexport. You can sign up for a free Replit account and fork this Repl to test your own exports/imports.
Importing data from files makes sense in scenarios like these:
You need to insert a lot of data quickly.
You need to insert data outside of your application’s typical insert workflow. In this instance, having a process for importing data from files might make sense.
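The statements involved might be sketched like this; the destination URL is a placeholder for wherever your file server or cloud bucket lives, and the imported column list is an assumption.

```python
# EXPORT writes table data as CSV to an external destination; IMPORT INTO
# reads it back. The URL is a placeholder for your own file server or
# cloud bucket, and the (id, name) column list is an assumption.
EXPORT_STMT = """
EXPORT INTO CSV 'https://example-file-server/person'
    FROM TABLE person
"""

IMPORT_STMT = """
IMPORT INTO person (id, name)
    CSV DATA ('https://example-file-server/person/export.csv')
"""
```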
Watching for Database Changes
Like Postgres, CockroachDB supports Change Data Capture (CDC), which allows you to stream changes to table data out of CockroachDB and into external systems.
In this section, we’ll create an Enterprise Changefeed and stream data changes out of a table. Changefeeds are available in both Enterprise and Core clusters, but the creation syntax and behaviors are slightly different.
Kafka Sink
The Kafka CDC sink was the first to appear in CockroachDB, arriving in v2.1.
Webhook Sink
Feel free to fork my Repl at https://replit.com/@robreid/cdcexample.
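The two sink variants discussed above might be sketched as follows. The broker and endpoint addresses are placeholders; note too that Enterprise changefeeds require rangefeeds to be enabled via the kv.rangefeed.enabled cluster setting.

```python
# Enterprise changefeed sketches for the Kafka and webhook sinks.
# Broker/endpoint addresses are placeholders. Rangefeeds must be
# enabled first:
ENABLE_RANGEFEEDS = "SET CLUSTER SETTING kv.rangefeed.enabled = true"

KAFKA_FEED = """
CREATE CHANGEFEED FOR TABLE person
    INTO 'kafka://localhost:9092'
    WITH updated
"""

WEBHOOK_FEED = """
CREATE CHANGEFEED FOR TABLE person
    INTO 'webhook-https://example-endpoint/webhook'
    WITH updated
"""
```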