Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

5. Interacting with CockroachDB

Rob Reid¹

(1)

Liss, Hampshire, UK

We’ve covered a lot of conceptual ground, and it’s now time to start using CockroachDB as an end user. We’ve created clusters and tables; now, let’s connect to them and put them to work.

Connecting to CockroachDB

When presented with a new database, the first thing you might want to do is connect to it and start querying! In this section, we’ll connect to self-hosted and cloud-based CockroachDB clusters with tools and from code.

Connecting with Tools

There are plenty of off-the-shelf tools you can use to connect to a CockroachDB cluster. Some are free, and some you’ll pay to use.

Here are some of the popular off-the-shelf tools for interacting with CockroachDB. I will be using DBeaver Community , but both DataGrip and TablePlus have excellent support for CockroachDB:

DBeaver – https://dbeaver.com
DataGrip – www.jetbrains.com/datagrip
TablePlus – https://tableplus.com

Let’s create a CockroachDB cluster with the following command and use DBeaver to connect to it. Note that without the --insecure flag, CockroachDB generates a username and password. I’ll use these in DBeaver. Please note that I’ve omitted a lot of the command output for brevity:

$ cockroach demo --no-example-database

# ...

# - Connection parameters:

# (webui) http://127.0.0.1:8080/demologin?password=demo1892&username=demo

# (sql) postgres://demo:[email protected]:26257?sslmode=require

# - Username: "demo", password: "demo1892"

# - Directory with certificate files (for certain SQL drivers/tools): REDACTED

# ...

[email protected]:26257/defaultdb>

Now it’s time to open DBeaver and connect to the database. Take note of the username and password in the preceding output. Figure 5-1 shows the configuration used to connect to this cluster using these values.

Figure 5-1
Connection settings in DBeaver

Note that port 26257 is the default port used by CockroachDB for database connections. As I’ve not altered the default port, I’m using 26257 as the connection port number here.

Now, let’s create a table, insert some data into it, and select the data out to see how this looks in DBeaver . Figure 5-2 shows what working with data looks like using DBeaver.

The database we’ve connected to is basic and does not use client certificates. I’ll now connect to a Cockroach Cloud database to show you how to do this with DBeaver.

Unlike a local demo database, a free-tier Cockroach Cloud database requires a certificate and an argument to tell Cockroach Cloud which cluster to connect to (it’s multitenant, meaning clusters for different users are co-hosted).

Figure 5-3 shows the additional configuration values required by a Cockroach Cloud database. Namely, the host field has been updated to point to the Cockroach Cloud instance, and the database field now includes the information Cockroach Cloud requires to locate your cluster.

Figure 5-3
Basic settings for connecting to a Cockroach Cloud cluster

In addition to the basic settings , you’ll need to help DBeaver find the certificate that will authenticate your connection. Figure 5-4 shows the two additional configuration values you’ll need to set.

Figure 5-4
SSL settings for connecting to a Cockroach Cloud cluster

With these changes made, we can connect to a Cockroach Cloud database.

Connecting Programmatically

You can connect to CockroachDB from many different programming languages . Cockroach Labs lists the available drivers on their website¹ along with example applications for each.²

To give you a good introduction to working with CockroachDB programmatically , I’ll perform some basic operations against a database using some popular drivers now.

Let’s create a table to work with:

CREATE TABLE IF NOT EXISTS person (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

name TEXT NOT NULL

);

The examples that follow show how to connect to CockroachDB and get results using several common programming languages . In each case, I’ve chosen succinctness over completeness in the interest of brevity.

Go Example

In this example, I’ll connect to the “defaultdb” database and perform an INSERT and SELECT against the “person” table. I’ll be using a popular Go driver for Postgres called pgx.

First, let’s initialize the environment:

$ mkdir go_example

$ cd go_example

$ go mod init go_example

Next, we’ll create a simple application in a main.go file. I’ll share the key building blocks of the application here, as the complete code listing will be available on GitHub.

First, fetch and import the pgx package:

$ go get github.com/jackc/pgx/v4

import "github.com/jackc/pgx/v4/pgxpool"

Next, connect to the database and ensure that the connection is closed after use:

db, err := pgxpool.Connect(

context.Background(),

"postgres://demo:demo22992@localhost:26257/defaultdb")

if err != nil {

log.Fatalf("error opening database connection: %v", err)

}

defer db.Close()

For brevity, I’m passing a context.Context value of context.Background(), which is fine for toy examples like this one. However, in scenarios where you’re interacting with an actor, you’ll want to pass the request context.

The pgx package does not return an error from calls to Close() on *pgxpool.Pool structs, owing to the pooled nature of the connections it manages. If you’re using the pgx.Connect() variant of this function – which I don’t recommend for multithreaded environments – you’ll receive an object with a Close() function to call.

Suppose that instead of connecting to a local cluster, we wanted to connect to a cloud-based one; in that case, you’d pass a connection string similar to the following:

postgresql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DB_NAME>?sslmode=verify-full&sslrootcert=<PATH_TO_CERT>&options=--cluster%3D<CLUSTER_NAME>

Now we’re ready to insert data. The following statement inserts three names into the “people” table:

stmt := `INSERT INTO person (name) VALUES ($1), ($2), ($3)`

_, err = db.Exec(

context.Background(),

stmt,

"Annie Easley", "Valentina Tereshkova", "Wang Zhenyi",

)

With the data in the table, we can now read it out:

stmt = `SELECT id, name FROM person`

rows, err := db.Query(context.Background(), stmt)

if err != nil {

log.Fatalf("error querying table: %v", err)

}

var id, name string

for rows.Next() {

if err = rows.Scan(&id, &name); err != nil {

log.Fatalf("error reading row: %v", err)

}

log.Printf("%s %s", id, name)

}

Finally, we’ll run the application using the Go toolchain:

$ go run main.go

2021/11/11 09:54:46 2917ee7b-3e26-4999-806a-ba799bc515b3 Wang Zhenyi

2021/11/11 09:54:46 5339cd29-77d5-4e01-966f-38ac7c6f9fbc Annie Easley

2021/11/11 09:54:46 85288df9-25fe-439a-a670-a8e8ea70c7db Valentina Tereshkova

Python Example

Next, we’ll create a Python application to do the same thing, connect to the “defaultdb” database, INSERT into, and SELECT out of.

First, we’ll initialize the environment:

$ mkdir python_example

$ python3 -m pip install --no-binary :all: psycopg2

With the psycopg2 dependencies installed, we’re ready to write our application. Let’s import the dependency and create a connection to the database:

import psycopg2

conn = psycopg2.connect(

dbname="defaultdb",

user="demo",

password="demo9315",

port=26257,

host="localhost",

)

It’s a matter of preference whether you use the connect function that takes a connection string/Data Source Name (DSN) or separate variables for each of the connection elements.

Next, we’ll insert some data:

with conn.cursor() as cur:

cur.execute(

"INSERT INTO person (name) VALUES (%s), (%s), (%s)",

('Chien-Shiung Wu', 'Lise Meitner', 'Rita Levi-Montalcini')

)

conn.commit()

psycopg2’s cursor function returns a cursor to use when executing statements against the database. Using the with statement ensures that transactions will be rolled back in the event that an exception is raised and committed if no exception is raised.

Finally, we’ll read back the data we inserted:

with conn.cursor() as cur:

cur.execute("SELECT id, name FROM person")

rows = cur.fetchall()

for row in rows:

print(row[0], row[1])

psycopg2’s fetchall command returns a list of tuples, one for each row returned. We can access the column values for each tuple by their index.

Ruby Example

Onto Ruby ! Let’s initialize the environment:

$ mkdir ruby_example

$ cd ruby_example

Next, we’ll install the pg gem:

$ gem install pg

We’re ready to create an application. First, we’ll bring in the pg gem and create a connection to the database:

require 'pg'

conn = PG.connect(

user: 'demo',

password: 'demo9315',

dbname: 'defaultdb',

host: 'localhost',

port: "26257",

sslmode: 'require'

)

Next, let’s insert some data. We’ll reuse the person table:

conn.transaction do |tx|

tx.exec_params(

'INSERT INTO person (name) VALUES ($1), ($2), ($3)',

[['Ada Lovelace'], ['Alice Ball'], ['Rosalind Franklin']],

)

end

Queries that take parameters should always be parameterized. In Ruby, you pass parameters into a query using the exec_params function. I then pass a multidimensional array as a parameter, with each array containing the fields of a row to insert.

Let’s read the data out. The exec/exec_params function can also be used to return data. In the following data, I pass a block into which the results of the query are available as rows. For each of the rows returned, I fetch the “id” and “name” columns:

conn.transaction do |tx|

tx.exec('SELECT id, name FROM person') do |result|

result.each do |row|

puts row.values_at('id', 'name')

end

Finally, we’ll run the application:

$ ruby main.rb

2bef0b0a-3b57-4f85-b0a2-58cf5f6ab7e4

["Ada Lovelace"]

5096e818-3236-4697-b919-8695fde1581d

["Rosalind Franklin"]

cf5b1f2d-f4af-4aab-8c67-3a4ced6f6c07

["Alice Ball"]

Crystal Example

Crystal’s turn! I’ve been using Crystal for a number of personal projects recently (including projects backed by CockroachDB), so I wanted to make sure I covered it.

Let’s prepare the environment:

$ crystal init app crystal_example

$ cd crystal_example

Next, we’ll bring in a dependency that will help us communicate with CockroachDB. Open the shard.yml file and paste in the following:

dependencies:

pg:

github: will/crystal-pg

With the dependency in our shard files, the shards command will know what to fetch. Invoke it now to bring in our database driver:

$ shards install

In the src/crystal_example.cr file, let’s connect to the database:

require "db"

require "pg"

db = PG.connect "postgresql://demo:demo43995@localhost:26257/defaultdb?auth_methods=cleartext"

db.exec "INSERT INTO person (name) VALUES ($1), ($2), ($3)",

"Florence R. Sabin", "Flossie Wong-Staal", "Marie Maynard Daly"

db.query "SELECT id, name FROM person" do |rs|

rs.each do

id, name = rs.read(UUID, String)

puts "#{id} #{name}"

end

Finally, we’ll run the application:

$ crystal run src/crystal_example.cr

3bab85db-8ccb-4e9f-b1ba-3cf8ed1f0fc8 Florence R. Sabin

70f34efc-3c8a-4308-bdc6-dddafaccea57 Flossie Wong-Staal

cd37624f-2037-440e-b420-84a64a855792 Marie Maynard Daly

C# Example

Onto C#. Let’s prepare the environment . For this example, I’ll be using .NET 6 with C# 10:

$ dotnet new console --name cs_example

$ cd cs_example

Next, we’ll bring in some NuGet package dependencies. Note that Dapper is not a required dependency for working with CockroachDB; it’s just a preference in this case:

$ dotnet add package System.Data.SqlClient

$ dotnet add package Dapper

$ dotnet add package Npgsql

Now for the code. I’m omitting exception handling to keep the code succinct:

using Dapper;

using System.Data.SqlClient;

using Npgsql;

var connectionString = "Server=localhost:26257;Database=defaultdb;User Id=demo;Password=demo43995";

using (var connection = new NpgsqlConnection(connectionString))

{

var people = new List<Person>()

{

new Person { Name = "Kamala Harris" },

new Person { Name = "Jacinda Ardern" },

new Person { Name = "Christine Lagarde" },

};

connection.Execute("INSERT INTO person (name) VALUES (@Name)", people);

foreach (var person in connection.Query<Person>("SELECT * FROM person"))

{

Console.WriteLine($"{person.ID} {person.Name}");

}

public class Person

{

public Guid? ID { get; set; }

public string? Name { get; set; }

}

Finally, we’ll run the application:

$ dotnet run

4e363f35-4d3b-49dd-b647-7136949b1219 Christine Lagarde

5eecb866-a07f-47b4-9b45-86628864e778 Jacinda Ardern

7e466bd2-1a19-465f-9c70-9fb5f077fe79 Jacinda Ardern

Designing Databases

The database schemas we’ve designed thus far have remained purposefully simple to help demonstrate specific database features like data types. The database schemas we’ll create in this section will be a more accurate reflection of what you’ll create in the wild.

Database Design

We’ll start by looking at CockroachDB’s topmost object: the database . Up to this point, we’ve created tables against the defaultdb for simplicity. It’s now time to create our own database.

An important decision to make before creating your database is where it will live. Will it be located in a single region, or will it span multiple regions?

If your database is to remain within a single region, it can be created as follows:

CREATE DATABASE db_name;

The preceding command will return an error if a database called “db_name” already exists. The following command will create a database only if a database with the same name does not already exist:

CREATE DATABASE IF NOT EXISTS db_name;

If your database is to span multiple regions, you’ll want to provide CockroachDB with some region hints when creating your database. These include the database’s primary region and any other region it is to operate within.

To demonstrate this, I’ll reuse a command from an earlier chapter, which creates a CockroachDB cluster with nine nodes across three regions:

$ cockroach demo

--no-example-database

--nodes 12

--insecure

--demo-locality=region=us-east1,az=a:region=us-east1,az=b:region=us-east1,az=c:region=europe-north1,az=a:region=europe-north1,az=b:region=europe-north1,az=c:region=europe-west1,az=a:region=europe-west1,az=b:region=europe-west1,az=c:region=europe-west3,az=a:region=europe-west3,az=b:region=europe-west3,az=c

Fetching the regions confirms that we created our expected cluster topology :

root@:26257/defaultdb> SHOW REGIONS;

region | zones | database_names | primary_region_of

----------------+---------+----------------+--------------------

europe-north1 | {a,b,c} | {} | {}

europe-west1 | {a,b,c} | {} | {}

europe-west3 | {a,b,c} | {} | {}

us-east1 | {a,b,c} | {} | {}

Let’s create a database that spans the European regions and see how that affects our regions view:

CREATE DATABASE db_name

PRIMARY REGION "europe-west1"

REGIONS = "europe-west1", "europe-west3", "europe-north1"

SURVIVE REGION failure;

root@:26257/defaultdb> SHOW REGIONS;

region | zones | database_names | primary_region_of

----------------+---------+----------------+--------------------

europe-north1 | {a,b,c} | {db_name} | {}

europe-west1 | {a,b,c} | {db_name} | {db_name}

europe-west3 | {a,b,c} | {db_name} | {}

us-east1 | {a,b,c} | {} | {}

The results of SHOW REGIONS confirm that our “db_name” database is running in the European regions and has a primary region of “europe-west1”, just as we configured.

The SHOW REGIONS command takes additional arguments to further narrow down our search. The following command shows just the regions for the “db_name” database:

root@:26257/defaultdb> SHOW REGIONS FROM DATABASE db_name;

database | region | primary | zones

-----------+---------------+---------+----------

db_name | europe-west1 | true | {a,b,c}

db_name | europe-north1 | false | {a,b,c}

db_name | europe-west3 | false | {a,b,c}

Schema Design

If you need to logically separate database objects such as tables, then creating a user-defined schema is a good choice. In the following example, we’ll assume that we’ve made the decision to use user-defined schemas to logically separate our database objects. The use of custom schemas is optional. If you don’t specify one yourself, the default “public” schema will be used.

In this example, we’re building a database to support a simple online retail business. Supporting this online business are the following business areas:

Retail – The team that builds and runs the website
Manufacturing – The team that makes the products
Finance – The team that manages the company accounts

In a business like this, schemas make a lot of sense, as it’s feasible that each of the business areas will need an “orders” table. The retail team will need to capture customer orders, the manufacturing team will need to capture orders for raw materials, and finance would like to capture transactions made on company cards as orders.

Let’s create a database with some schemas to see how we might harness schema design to give us a flexible system. First, we’ll create a database:

CREATE DATABASE IF NOT EXISTS acme;

USE acme;

Next, we’ll create some users to represent people in each of the three business areas:

CREATE USER IF NOT EXISTS retail_user;

CREATE USER IF NOT EXISTS manufacturing_user;

CREATE USER IF NOT EXISTS finance_user;

Finally, we’ll create some schemas . The following statements create three schemas, one for each business area. For the retail and manufacturing schemas, we’ll give access to the finance user, as they will need to view data from tables in both schemas:

CREATE SCHEMA retail AUTHORIZATION retail_user;

GRANT USAGE ON SCHEMA retail TO finance_user;

CREATE SCHEMA manufacturing AUTHORIZATION manufacturing_user;

GRANT USAGE ON SCHEMA manufacturing TO finance_user;

CREATE SCHEMA finance AUTHORIZATION finance_user;

With users, schemas , and grants created, let’s double-check the grants using the SHOW GRANTS command:

SELECT schema_name, grantee, privilege_type FROM [SHOW GRANTS]

WHERE schema_name IN ('retail', 'manufacturing', 'finance')

ORDER BY 1, 2, 3;

schema_name | grantee | privilege_type

----------------+--------------+-----------------

finance | admin | ALL

finance | root | ALL

manufacturing | admin | ALL

manufacturing | finance_user | USAGE

manufacturing | root | ALL

retail | admin | ALL

retail | finance_user | USAGE

retail | root | ALL

Let’s connect to the database as the retail_user and create an orders table in the retail schema. Note that this is the only schema we’ll be able to do this in while connected as the retail_user:

$ cockroach sql

--url "postgres://retail_user@localhost:26257/acme?sqlmode=disable"

--insecure

CREATE TABLE retail.orders(

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

reference TEXT NOT NULL

);

Let’s do the same as the manufacturing_user:

$ cockroach sql

--url "postgres://manufacturing_user@localhost:26257/acme?sqlmode=disable"

--insecure

CREATE TABLE manufacturing.orders(

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

reference TEXT NOT NULL

);

And finally, the same again as the finance_user:

$ cockroach sql

--url "postgres://finance_user@localhost:26257/acme?sqlmode=disable"

--insecure

CREATE TABLE finance.orders(

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

reference TEXT NOT NULL

);

Let’s reconnect to the database as the root user and view the tables we’ve created:

$ cockroach sql

--url "postgres://root@localhost:26257/acme?sqlmode=disable"

--insecure

SELECT schema_name, table_name, owner

FROM [SHOW TABLES FROM acme];

schema_name | table_name | owner

----------------+------------+---------------------

finance | orders | finance_user

manufacturing | orders | manufacturing_user

retail | orders | retail_user

Table Design

There are plenty of considerations to make when creating tables in any database, and CockroachDB is no exception. Among other things, badly named columns, incorrect use of data types, missing (or redundant) indexes, and badly configured primary keys can hurt readability and performance.

In this section, we’ll take a look at some table design best practices.

Naming

It’s important to know where CockroachDB will create a table, so it’s vitally important that you provide a name that instructs CockroachDB exactly where to place it.

Where do you think CockroachDB will place the following table?

CREATE TABLE table_name();

If you said any variation of ¯\_(ツ)_/¯, you’re not far off. The following outcomes could result:

The table exists in the public schema of whichever database you currently have selected.
The table exists in the public schema of the defaultdb database.
You receive an error asking you to select a database.

In short, creating a table like this is ill-advised when working with production-scale systems. Providing the fully qualified name of the table is so important. In the following example, we’ve removed any uncertainty about where the table should live:

CREATE TABLE database_name.schema_name.table_name();

Always give your tables descriptive names . The choice of singular vs. plural table names and whether you decide to PascalCase, camelCase, snake_case, or SPoNgEBOb case is yours to make. However, whatever you decide, it’s always a good idea to remain consistent with your naming convention.

Column Data Types

CockroachDB has many column types , and to get the best out of your database, you’ll need to know when to use them and when not to use them.

Take the following table, for example. Other than my purposeful inclusion of two role-based columns, age specified as an integer and not timestamp, and omission of a database and schema, how many questionable design decisions can you spot?

CREATE TABLE users (

id SERIAL PRIMARY KEY,

name STRING,

age INT,

is_admin INT,

primary_role STRING

);

Here’s a list of potential issues I can identify in the design of this table:

Every column, except for the ID column, is nullable.
The ID column uses the SERIAL data type , which only exists for Postgres compatibility. In most scenarios, it’s better to use the UUID data type, whose values distribute more evenly across ranges.
The length of the name and primary_role columns is not limited.
The age column can store values between -9,223,372,036,854,775,807 and +9,223,372,036,854,775,807, which, given our current scientific progress, seems unlikely.
The is_admin column can also store a large number, but the main problem with this column being of type INT is that there exists a better data type to convey the Boolean semantics of this column: BOOL.
An ENUM data type for the primary_role column may better fit, given we’d have a finite range of values.

Here’s the table again but this time, with the issues I’ve identified resolved:

CREATE TYPE users_primary_role AS ENUM (

'admin',

'write',

'read'

);

CREATE TABLE users (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

name STRING(50) NOT NULL,

age INT2 NOT NULL,

is_admin BOOL NOT NULL DEFAULT false,

primary_role users_primary_role NOT NULL DEFAULT 'read'

);

In summary:

There’s usually a data type that is well suited to the data you’d like to store.
If you can make some reasonable predictions to the amount of data in a column, use the limited variable-length variant of a data type where possible.
If you require data in a given column, make it nonnullable.
Use ENUMs to represent the choice between a finite set of string values.
Consider using DEFAULT values to achieve Least Privilege semantics.

Indexes

Use indexes to improve the performance of SELECT queries by giving CockroachDB hints as to where it should look in a dataset for the value(s) you’re requesting.

CockroachDB distinguishes between two types of indexes :

Primary indexes – Every table receives a primary index . If you have provided a primary key for a table, this will be your table’s primary index. If you have not provided an index, CockroachDB will create a primary index called “rowid” and assign a unique value to each row.
Secondary indexes – Secondary indexes are used to improve the performance of SELECT queries and are recommended for any column you will be filtering on or sorting in your queries.

In the following example, I’m creating a table with no primary or secondary indexes :

CREATE TABLE person (

id UUID NOT NULL DEFAULT gen_random_uuid(),

date_of_birth TIMESTAMP NOT NULL,

pets STRING[]

);

Let’s run a query to see how our table looks like in CockroachDB:

SELECT column_name, column_default, is_nullable, data_type

FROM defaultdb.information_schema.columns

WHERE table_name = 'person';

column_name | column_default | is_nullable | data_type

----------------+-------------------+-------------+---------------------

id | gen_random_uuid() | NO | uuid

date_of_birth | NULL | NO | timestamp without time zone

pets | NULL | YES | ARRAY

rowid | unique_rowid() | NO | bigint

As you can see from the aforementioned, CockroachDB has generated a fourth column called rowid as a substitute for our missing primary key column. Values for this column are auto-incrementing but not necessarily sequential. As can be seen with a call to unique_rowid():

SELECT abs(unique_rowid() - unique_rowid()) AS difference;

difference

--------------

32768

To see what CockroachDB has generated for us, I’ll run another command to generate the table’s CREATE statement:

SHOW CREATE TABLE person;

table_name | create_statement

-------------+-----------------------------------------------------------

person | CREATE TABLE public.person (

| id UUID NOT NULL DEFAULT gen_random_uuid(),

| date_of_birth TIMESTAMP NOT NULL,

| pets STRING[] NULL,

| rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),

| CONSTRAINT "primary" PRIMARY KEY (rowid ASC),

| FAMILY "primary" (id, date_of_birth, pets, rowid)

| )

Let’s create the table again but this time, provide a primary key of our own:

CREATE TABLE person (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

date_of_birth TIMESTAMP NOT NULL,

pets STRING[]

);

Now that the ID column in our table is the primary key, CockroachDB has not created its own primary key for us:

SELECT column_name, column_default, is_nullable, data_type

FROM defaultdb.information_schema.columns

WHERE table_name = 'person';

column_name | column_default | is_nullable | data_type

----------------+-------------------+-------------+-----------------------

id | gen_random_uuid() | NO | uuid

date_of_birth | NULL | NO | timestamp without time zone

pets | NULL | YES | ARRAY

With the table in place, let’s insert some data into it and see how CockroachDB searches the table when queried:

INSERT INTO person (date_of_birth, pets) VALUES

('1980-01-01', ARRAY['Mittens', 'Fluffems']),

('1979-02-02', NULL),

('1996-03-03', ARRAY['Mr. Dog']),

('2001-04-04', ARRAY['His Odiousness, Lord Dangleberry']),

('1954-05-05', ARRAY['Steve']);

EXPLAIN SELECT id, date_of_birth FROM person

WHERE date_of_birth BETWEEN '1979-01-01' and '1997-12-31'

AND pets <@ ARRAY['Mr. Dog']

ORDER BY date_of_birth;

distribution: full

vectorized: true

• sort

│ estimated row count: 0

│ order: +date_of_birth

│

└── • filter

│ estimated row count: 0

│ filter: ((date_of_birth >= '1979-01-01 00:00:00') AND (date_of_birth <= '1997-12-31 00:00:00')) AND (pets <@ ARRAY['Mr. Dog'])

│

└── • scan

estimated row count: 5 (100% of the table; stats collected 5 minutes ago)

table: person@primary

spans: FULL SCAN

Uh oh! CockroachDB has had to perform a full scan of the entire person table to fulfill our query. Queries will get linearly slower with every new row added to the database.

Secondary indexes to the rescue! Let’s recreate the table with indexes on the columns we’ll filter and sort on. In our case, that’s the date_of_birth and the pets column:

CREATE TABLE person (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

date_of_birth TIMESTAMP NOT NULL,

pets STRING[],

INDEX (date_of_birth),

INVERTED INDEX (pets)

);

EXPLAIN SELECT id, date_of_birth FROM person

WHERE date_of_birth BETWEEN '1979-01-01' and '1997-12-31'

AND pets <@ ARRAY['Mr. Dog']

ORDER BY date_of_birth;

distribution: local

vectorized: true

• filter

│ filter: pets <@ ARRAY['Mr. Dog']

│

└── • index join

│ table: person@primary

│

└── • scan

missing stats

table: person@person_date_of_birth_idx

spans: [/'1979-01-01 00:00:00' - /'1997-12-31 00:00:00']

Success! Our query no longer requires a full table scan . Note what happens, however, if we order the results in descending order of date_of_birth:

EXPLAIN SELECT id, date_of_birth FROM person

WHERE date_of_birth BETWEEN '1979-01-01' and '1997-12-31'

AND pets <@ ARRAY['Mr. Dog']

ORDER BY date_of_birth DESC;

distribution: full

vectorized: true

• sort

│ order: -date_of_birth

│

└── • filter

│ filter: pets <@ ARRAY['Mr. Dog']

│

└── • index join

│ table: person@primary

│

└── • scan

missing stats

table: person@person_date_of_birth_idx

spans: [/'1979-01-01 00:00:00' - /'1997-12-31 00:00:00']

CockroachDB has had to sort the results manually for us before returning them because the indexes on the table sort in ascending order by default and we’ve asked for them in descending order.

Suppose you know that CockroachDB may return the results of a query in either ascending or descending order by a column. In that case, it’s worth considering adding indexes for both scenarios as follows:

CREATE TABLE person (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

date_of_birth TIMESTAMP NOT NULL,

pets STRING[],

INDEX person_date_of_birth_asc_idx (date_of_birth ASC),

INDEX person_date_of_birth_desc_idx (date_of_birth DESC),

INVERTED INDEX (pets)

);

With this in place, CockroachDB knows how to return query results that are ordered by date_of_birth in ascending or descending order without having to manually sort them.

If you’re indexing columns containing sequential data, such as TIMESTAMP or incrementing INT values, you may want to consider using hash-sharded indexes.³ Hash-sharded indexes ensure that indexed data is distributed evenly across ranges, which prevent single-range hotspots. Single-range hotspots occur when multiple similar values are queried and exist in the same range.

At the time of writing, hash-sharded indexes are still experimental, so they need to be enabled as follows:

SET experimental_enable_hash_sharded_indexes = true;

CREATE TABLE device_event (

device_id UUID NOT NULL,

event_timestamp TIMESTAMP NOT NULL,

INDEX (device_id, event_timestamp) USING HASH WITH bucket_count = 10

);

Under the covers, the values in the index will be an FNV hash representation of the data you’re indexing. Any small changes to the input data may result in a different hash output and a different index bucket. The table can now scale more evenly across the nodes in your cluster, resulting in increased performance.

View Design

A view is simply a SELECT query that you assign a name to ask CockroachDB to remember. CockroachDB supports three types of views:

Materialized views – CockroachDB stores the resulting data for improved performance in subsequent queries. I would only recommend using materialized views if you cannot index the data in your table, for query performance, and if the data in that table does not update frequently. If data frequently updates, your results might be stale.
Dematerialized views – The results of a view are not stored, meaning data is always consistent with the table. As the results are not stored, CockroachDB will query the underlying table for each view query. If you’ve appropriately indexed your table for performance, dematerialized views are the best choice, as they will always return correct data.
Temporary views – Temporary views are ephemeral views that are accessible only to the session in which they were created and deleted at the end of that session.

If you need to restrict table data to specific users or have complex queries you’d like to expose as simpler ones, views are a good choice. Let’s create some views to see what they can offer us.

Simplify Queries

Suppose you have a query whose complexity you’d like to hide (or simply not have to rewrite). You can create a view over this data to expose a simpler query.

Let’s assume we’re running an online shop and would like a query to return the number of purchases by day. Granted, this is a straightforward query , but for the sake of argument, we’ll wrap this in a view to demonstrate the concept.

First, we’ll create the tables:

CREATE TABLE customer (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

email STRING(50) NOT NULL,

INDEX(email)

);

CREATE TABLE product (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

sku STRING(20) NOT NULL,

INDEX(sku)

);

CREATE TABLE purchase (

customer_id UUID NOT NULL REFERENCES customer(id),

product_id UUID NOT NULL REFERENCES product(id),

checkout_at TIMESTAMPTZ NOT NULL

);

Next, we’ll insert some data into the tables to give us something to work with:

INSERT INTO customer (email) VALUES

('[email protected]'),

('[email protected]');

INSERT INTO product (sku) VALUES

('E81ML1GA'),

('4UXK19D0'),

('L28C0XJ4');

INSERT INTO purchase (customer_id, product_id, checkout_at) VALUES

('2df28d7d-67a3-4e83-a713-96a68a196b0b', '486ecc02-6d8e-44fe-b8ef-b8d004658465', '2021-11-23T06:50:46Z'),

('5047d78c-12cc-48e3-a36a-7cac1018ba7e', '5b2756e6-cc88-47e6-849d-6291a76f3881', '2021-11-22T08:13:13+07:00'),

('792ab188-1bb2-4ac6-b1d2-f1adbcf9223e', '6cfd7bed-bf11-4c96-9300-82c8e9dd97f2', '2021-11-23T06:50:46-05:00');

Finally, we’ll create the view. This view will be a simple selection on the purchase table, along with a grouping over the extracted day-of-week value of the checkout_at column:

CREATE VIEW purchases_by_dow (dow, purchases) AS

SELECT EXTRACT('ISODOW', pu.checkout_at), COUNT(*)

FROM purchase pu

GROUP BY EXTRACT('ISODOW', pu.checkout_at);

With this in place, we can now work against the view, which hides the complexity of the date manipulation:

SELECT * FROM purchases_by_dow;

dow | purchases

------+------------

2 | 2

1 | 1

Restrict Table Data

Another good use case for views is for restricting access to data. In the following example, we’ll create a table and restrict its data to groups of users.

Assume that in the previous example, our shop operates globally. We might not want to sell all products everywhere, so limiting the products we show users in different countries makes sense.

Before we begin, we’ll need to create and connect to a secure cluster. Let’s create a single-node cluster now.

$ mkdir certs

$ mkdir keys

$ cockroach cert create-ca

--certs-dir=certs

--ca-key=keys/ca.key

$ cockroach cert create-node

--certs-dir=certs

--ca-key=keys/ca.key

localhost

$ cockroach start-single-node

--certs-dir=certs

--store=node1

--listen-addr=localhost:26257

--http-addr=localhost:8080

$ cockroach cert create-client

root

--certs-dir=certs

--ca-key=keys/ca.key

$ cockroach sql --certs-dir=certs

First, we’ll create the products table and insert some products into it:

CREATE TABLE product (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

country_iso STRING(3) NOT NULL,

sku STRING(20) NOT NULL

);

INSERT INTO product (country_iso, sku) VALUES

('BRA', 'E81ML1GA'),

('FIN', '4UXK19D0'),

('GBR', 'L28C0XJ4');

Next, we’ll create a view to return all products for customers in the UK:

CREATE VIEW product_gbr (id, country_iso, sku) AS

SELECT id, country_iso, sku FROM product

WHERE country_iso = 'GBR';

Finally, we’ll create a user and grant them access to the view. We’ll assume the following user will serve UK customers:

CREATE USER user_gbr WITH LOGIN PASSWORD 'some_password';

GRANT SELECT ON product_gbr TO user_gbr;

REVOKE SELECT ON product FROM user_gbr;

Let’s connect to the database as the new user and see what we can (and can’t) access:

$ cockroach sql

--certs-dir=certs

--url "postgres://user_gbr:some_password@localhost:26257/defaultdb"

user_gbr@localhost:26257/defaultdb> SELECT * FROM product;

ERROR: user user_gbr does not have SELECT privilege on relation product

SQLSTATE: 42501

user_gbr@localhost:26257/defaultdb> SELECT * FROM product_gbr;

id | country_iso | sku

---------------------------------------+-------------+-----------

d1c055ed-1a2e-4383-875e-3c53a2bf65cc | GBR | L28C0XJ4

We’re in business! Our user_gbr user has no access to query the product table directly but does have access to its data via the product_gbr view, which exposes only the products we’d like them to see.

Moving Data

In this section, we’ll explore some use cases that involve moving data into and out of databases via the IMPORT and EXPORT statements and CDC (Change Data Capture).

Exporting and Importing Data

Unless you’re creating a new database to store new data, you’ll likely need to migrate data from another database. This section will show you how to export data out of and import data into CockroachDB tables.

To start, let’s create a single-node cluster:

$ cockroach demo --no-example-database --insecure

With a cluster in place, we’ll create a database with two tables: one table to export data out of and another table to import data into. We’ll also insert some data to export:

CREATE DATABASE import_export;

USE import_export;

CREATE TABLE to_export (

name TEXT NOT NULL

);

CREATE TABLE to_import (

name TEXT NOT NULL

);

INSERT INTO to_export (name) VALUES ('A'), ('B'), ('C');

Next, we’ll need somewhere to export our data. CockroachDB supports cloud provider destinations such as Amazon S3, Azure Storage, and Google Cloud Storage and allows you to export to self-hosted destinations.

For this example, I’ve created a basic HTTP server that I’m hosting in replit.com under https://replit.com/@robreid/importexport . You can sign up for a free Replit account and fork this Repl to test your own exports/imports.

With a server up and running and available at https://importexport.robreid.repl.co , I can export data as follows (you will have a different URL in the form of https://importexport.YOUR_USERNAME.repl.co):

EXPORT INTO CSV 'https://importexport.robreid.repl.co'

FROM TABLE to_export;

filename | rows | bytes

-------------------------------------------------------------------+------+-------

export16b4ae42664118980000000000000001-n707964976657694721.0.csv | 3 | 6

We can confirm that the file has been successfully ingested by the server but joining the address of the running Repl with the file name in the preceding response:

$ curl https://importexport.robreid.repl.co/export16b4ae42664118980000000000000001-n707964976657694721.0.csv

Let’s import this data into our “to_import” table now:

IMPORT INTO to_import (name)

CSV DATA (

'https://importexport.robreid.repl.co/export16b4ae42664118980000000000000001-n707964976657694721.0.csv'

);

---------------------+-----------+--------------------+------+----------------+-------

707965889435041793 | succeeded | 1 | 3 | 0 | 60

CockroachDB also supports importing data from Avro files, TSV and other delimited files, and CockroachDB, MySQL, and Postgres dump files. I recommend using the IMPORT statement in the following scenarios:

You need to insert a lot of data quickly.
You need to insert data outside of your application’s typical insert workflow. In this instance, having a process for importing data from files might make sense.

Watching for Database Changes

Like Postgres, CockroachDB supports Change Data Capture (CDC) , which allows you to stream changes to table data out of CockroachDB and into external systems.

In this section, we’ll create an Enterprise Changefeed and stream data changes out of a table. Changefeeds are available in both Enterprise and Core clusters, but the creation syntax and behaviors are slightly different.

Kafka Sink

The Kafka CDC sink was the first to appear in CockroachDB, making its first appearance in v2.1.

In this example, I’ll use Redpanda, a streaming platform that’s wire-compatible with Kafka (in the same way that CockroachDB is wire-compatible with Postgres). Let’s start an instance of Redpanda, create a topic, and consume from it:

$ docker run

--rm -it

--name redpanda

-p 9092:9092

-p 9644:9644

docker.vectorized.io/vectorized/redpanda:latest

redpanda start

--overprovisioned

--smp 1

--memory 1G

--reserve-memory 0M

--node-id 0

--check=false

$ docker exec -it

redpanda rpk topic create cdc_example

--brokers=localhost:9092

$ docker exec -it

redpanda rpk topic consume cdc_example

--brokers=localhost:9092

Moving back to Cockroach, let’s create a simple insecure demo cluster with an example table to stream data from:

$ cockroach demo --no-example-database --insecure

CREATE TABLE example (

id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

value STRING(10) NOT NULL

);

With the table in place, let’s create a Kafka Changefeed to start publishing changes:

SET CLUSTER SETTING kv.rangefeed.enabled = true;

CREATE CHANGEFEED FOR TABLE example

INTO 'kafka://localhost:9092?topic_name=cdc_example'

WITH updated;

Note that in order to create a Changefeed for multiple tables, simply provide a comma-separated string containing the tables you’d like to monitor as follows:

CREATE CHANGEFEED FOR TABLE table_one, table_two, table_three

INTO 'kafka://localhost:9092?topic_name=cdc_example'

WITH updated;

The only thing required to get data into Kafka now is to use your tables as normal. All changes to the data in our example table will be automatically published to Kafka . Let’s INSERT, UPDATE, and DELETE some data now and see what CockroachDB publishes to the cdc_example topic:

INSERT INTO example (value) VALUES ('a');

UPDATE example

SET value = 'b'

WHERE id = '4c0ebb98-7f34-436e-9f6b-7ea1888327d9';

DELETE FROM example

WHERE id = '4c0ebb98-7f34-436e-9f6b-7ea1888327d9';

You’ll start to see the events received from CockroachDB via the Redpanda consumer at this point. The first message received represents the change introduced by the INSERT statement, the second message was published after the UPDATE statement, and the last message shows the row being deleted as a result of the DELETE statement:

{

"topic": "cdc_example",

"key": "["4c0ebb98-7f34-436e-9f6b-7ea1888327d9"]",

"value": "{"after": {"id": "4c0ebb98-7f34-436e-9f6b-7ea1888327d9", "value": "a"}, "updated": "1637755793947605000.0000000000"}",

"timestamp": 1637755794706,

"partition": 0,

"offset": 44

}

{

"topic": "cdc_example",

"key": "["4c0ebb98-7f34-436e-9f6b-7ea1888327d9"]",

"value": "{"after": {"id": "4c0ebb98-7f34-436e-9f6b-7ea1888327d9", "value": "b"}, "updated": "1637755826616641000.0000000000"}",

"timestamp": 1637755826791,

"partition": 0,

"offset": 45

}

{

"topic": "cdc_example",

"key": "["4c0ebb98-7f34-436e-9f6b-7ea1888327d9"]",

"value": "{"after": null, "updated": "1637756011024955000.0000000000"}",

"timestamp": 1637756011195,

"partition": 0,

"offset": 46

}

To stop a Changefeed, you need to locate and delete the job that’s running it. Let’s find the job and cancel it now:

[email protected]:26257/defaultdb> SELECT job_id, job_type, description FROM [SHOW JOBS];

job_id | job_type | description

---------------------+------------+--------------------------------------

713311406086291457 | CHANGEFEED | CREATE CHANGEFEED FOR TABLE example INTO 'kafka://localhost:9092?topic_name=cdc_example' WITH updated

[email protected]:26257/defaultdb> CANCEL JOB 713311406086291457;

Webhook Sink

Let’s redirect our CDC events to a web server now. Owing to the potentially sensitive nature of the data in your tables, CockroachDB will only send data to HTTPS-enabled servers. Like the previous export/import example, I’ll reuse replit.com to create and run a free HTTPS server. Let’s create and run a simple HTTPS server :

package main

import (

"io/ioutil"

"log"

"net/http"

)

func main() {

http.HandleFunc("/", cdc)

log.Fatal(http.ListenAndServe(":9090", nil))

}

func cdc(w http.ResponseWriter, r *http.Request) {

defer func() {

if err := r.Body.Close(); err != nil {

log.Printf("error closing request body: %v", err)

}

}()

event, err := ioutil.ReadAll(r.Body)

if err != nil {

log.Printf("error reading request body: %v", err)

}

log.Println(string(event))

}

Feel free to fork my Repl at https://replit.com/@robreid/cdcexample .

Copy the URL of your running server (mine is https://cdcexample.robreid.repl.co ) and plug it into CockroachDB to start publishing webhook CDC events :

CREATE CHANGEFEED FOR TABLE example

INTO 'webhook-https://cdcexample.robreid.repl.co?insecure_tls_skip_verify=true'

WITH updated;

Finally, let’s generate some events with INSERT, UPDATE, and DELETE statements:

INSERT INTO example (value) VALUES ('b');

UPDATE example SET value = 'b'

WHERE id = 'aea98953-0a9b-4315-a55f-e95bc7dc8305';

DELETE FROM example

WHERE id = 'aea98953-0a9b-4315-a55f-e95bc7dc8305';

Switching back to replit.com, our server is now receiving events for CockroachDB changes :

2021/11/24 12:41:13 {"payload":[{"after":{"id":"aea98953-0a9b-4315-a55f-e95bc7dc8305","value":"b"},"key":["aea98953-0a9b-4315-a55f-e95bc7dc8305"],"topic":"example","updated":"1637757672559012000.0000000000"}],"length":1}

2021/11/24 12:41:20 {"payload":[{"after":{"id":"aea98953-0a9b-4315-a55f-e95bc7dc8305","value":"b"},"key":["aea98953-0a9b-4315-a55f-e95bc7dc8305"],"topic":"example","updated":"1637757679279445000.0000000000"}],"length":1}

2021/11/24 12:41:20 {"payload":[{"after":null,"key":["aea98953-0a9b-4315-a55f-e95bc7dc8305"],"topic":"example","updated":"1637757680248432000.0000000000"}],"length":1}

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 5. Interacting with CockroachDB

Create new playlist

Sign In

Sign Up

5. Interacting with CockroachDB

Connecting to CockroachDB

Connecting with Tools

Connecting Programmatically

Go Example

Python Example

Ruby Example

Crystal Example

C# Example

Designing Databases

Database Design

Schema Design

Table Design

Naming

Column Data Types

Indexes

View Design

Simplify Queries

Restrict Table Data

Moving Data

Exporting and Importing Data

Watching for Database Changes

Kafka Sink

Webhook Sink

Table of Contents for
5. Interacting with CockroachDB