Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

3. Concepts

Rob Reid¹

(1)

Liss, Hampshire, UK

Chapter 3ConceptsIt’s time to dive deeper into CockroachDB! This chapter explores some of CockroachDB’s building blocks, including data types, indexes, and geo-partitioning.

First, let’s explore the top-level objects that make up CockroachDB.

Database Objects

As Figure 3-1 shows, objects in CockroachDB are arranged hierarchically.

At the top level of the CockroachDB object hierarchy , there are databases. Databases contain schemas that, in turn, contain schema objects like tables and views.

Every CockroachDB cluster will contain the following three databases when it starts:

defaultdb – The defaultdb database that clients will connect to if they don’t provide a database name in their connection configuration. This database is empty and does not provide any contextual information to the schema objects contained within it. For that reason, it’s best to avoid using this database and prefer to create and reference named databases explicitly. If you don’t need this database, you can delete it.
postgres – The postgres database exists to provide compatibility with Postgres clients. If you don’t need this database, you can delete it.
system – The system database provides core information to the CockroachDB cluster, including information relating to jobs, ranges, replication, geo-locations, users, and more.

In every database, there are schemas. Schemas are CockroachDB’s second-level object. They allow you to separate objects like tables and views into logical areas. Say, for example, you have two business areas, each needing their own tables; having a schema per business area not only keeps those tables logically separated but also resolves naming collisions. Having two tables named “employee,” for example, would not be possible within the same schema. With two schemas, this is not an issue.

The following default schemas (or “System Catalogs”) exist for all new databases:

root@:26257/blah> show schemas;

schema_name | owner

---------------------+--------

crdb_internal | NULL

information_schema | NULL

pg_catalog | NULL

pg_extension | NULL

public | admin

crdb_internal – The crdb_internal schema contains information relating to the internal objects used by a database. This includes things like metrics, jobs, alerts, leases, and audit entries for CREATE operations.
information_schema – The information_schema schema contains information on user-defined objects such as columns, indexes, tables, and views.
pg_catalog – The pg_catalog schema provides compatibility with Postgres clients that are expecting a schema with this name to exist. It contains information relating to databases and the objects they contain.
pg_extension – The pg_extension schema provides extension information. By default, this includes information from the Spatial Features extension, which provides spatial data types.
public – In the same way that the defaultdb database is used if no database name is provided, the public schema is the default schema that is used if no user-defined schema is provided. All user-defined objects go into this schema unless the user provides another.

Underneath schemas are CockroachDB’s third and final-level objects. These include indexes, sequences, tables, views, and temporary objects (objects like temporary tables that are not persisted).

Data Types

CockroachDB boasts all of the data types you’ll need to build a rich database. In this section, I’ll show you how and where to use these data types .

UUID

The UUID data type stores a 128-bit UUID value. Values stored in this column can be any UUID version¹ but will all be formatted using RFC 4122 standards. Let’s create a table with a UUID column to learn more about it.

First, let’s create a contrived table that includes a UUID column:

CREATE TABLE person (id UUID);

Next, we’ll insert some data into it; note that UUIDs are valid with or without curly braces, as Uniform Resource Names (URNs), or as 16-byte strings:

INSERT INTO person (id) VALUES ('a33c928b-a138-4419-9f1f-8d8a137235d3');

INSERT INTO person (id) VALUES ('{1800ebff-bf6d-52c0-842f-d8db25e15ced}');

INSERT INTO person (id) VALUES ('urn:uuid:f2c6408a-3c3f-4071-9bcf-1a669d40c07f');

INSERT INTO person (id) VALUES (b'1oqpb0zna$*k4al~');

Selecting the UUIDs out of the table reveals their stored representation, even though we inserted them differently:

SELECT * FROM person;

----------------------------------------

a33c928b-a138-4419-9f1f-8d8a137235d3

1800ebff-bf6d-52c0-842f-d8db25e15ced

f2c6408a-3c3f-4071-9bcf-1a669d40c07f

316f7170-6230-7a6e-6124-2a6b34616c7e

UUID columns are a great choice when you need unique IDs for tables. Rather than providing the UUIDs ourselves, let’s ask CockroachDB to generate them for us on-insert:

CREATE TABLE "person" (

"id" UUID DEFAULT gen_random_uuid() PRIMARY KEY

);

CockroachDB will only generate a default value if you don’t provide one, so it’s still possible to provide values yourself.

ARRAY

The ARRAY data type stores a flat (or one-dimensional) collection of another data type. It’s indexable using inverted indexes designed to work with tokenizable data, such as the values of an array or the key-value pairs in a JSON object. Let’s create a table with an ARRAY column to learn more about it.

First, we’ll create a table with an ARRAY column . Arrays are created in the TYPE[] syntax or the TYPE ARRAY syntax as follows:

-- Create with TYPE[] syntax:

CREATE TABLE person (

id UUID DEFAULT gen_random_uuid() PRIMARY KEY,

pets STRING[]

);

-- Create with TYPE ARRAY syntax:

CREATE TABLE person (

id UUID DEFAULT gen_random_uuid() PRIMARY KEY,

pets STRING ARRAY

);

Next, we’ll insert some data:

INSERT

INTO person (pets)

VALUES (ARRAY['Max', 'Duke']),

(ARRAY['Snowball']),

(ARRAY['Gidgit']),

(ARRAY['Chloe']);

Selecting the values back out of the table reveals their representation in CockroachDB :

SELECT * FROM person;

id | pets

---------------------------------------+-------------

59220317-cc79-4689-b05f-c21886a7986d | {Max,Duke}

5b4455a2-37e7-49de-bd6f-cdd070e8c133 | {Snowball}

659dce69-03b8-4872-b6af-400e95bf43d9 | {Gidgit}

f4ef9111-f118-4f66-b950-921d8c1c3291 | {Chloe}

There are many operations you can perform against ARRAY columns . We’ll cover just the most common.

To return rows that contain a particular value in an ARRAY column, we can use the “contains” operator. The following returns the ID of any person with a pet called Max:

SELECT id FROM person WHERE pets @> ARRAY['Max'];

----------------------------------------

59220317-cc79-4689-b05f-c21886a7986d

To return rows whose ARRAY column is within a given array, we can use the “is contained by” operator. The following returns the ID of any person whose complete list of pets is contained within a given array:

SELECT id FROM person WHERE pets <@ ARRAY['Max', 'Duke', 'Snowball'];

----------------------------------------

59220317-cc79-4689-b05f-c21886a7986d

5b4455a2-37e7-49de-bd6f-cdd070e8c133

If you know the name of one of a person’s pets but not all of them, you can use the overlap operator to find the ID of any person who has a pet contained within a given array:

SELECT id FROM person WHERE pets && ARRAY['Max', 'Snowball'];

----------------------------------------

59220317-cc79-4689-b05f-c21886a7986d

5b4455a2-37e7-49de-bd6f-cdd070e8c133

Add an element to an array (note that for inserts, you can either use array_append or the append operator ||):

UPDATE person

SET pets = array_append(pets, 'Duke')

WHERE id = '59220317-cc79-4689-b05f-c21886a7986d';

Remove an element from an array:

UPDATE person

SET pets = array_remove(pets, 'Duke')

WHERE id = '59220317-cc79-4689-b05f-c21886a7986d';

To get the most out of an ARRAY column , you’ll need to use an inverted index, as without it, CockroachDB will have to perform a full table scan, as highlighted in the following:

EXPLAIN SELECT id FROM person WHERE pets @> ARRAY['Max'];

info

--------------------------------------------------------------------------

distribution: full

vectorized: true

• filter

│ estimated row count: 0

│ filter: pets @> ARRAY['Max']

│

└── • scan

estimated row count: 4 (100% of the table; stats collected 16 minutes ago)

table: person@primary

spans: FULL SCAN

You configure an inverted index on an ARRAY column in a new table as follows:

CREATE TABLE person (

id UUID DEFAULT gen_random_uuid() PRIMARY KEY,

pets STRING[],

INVERTED INDEX (pets)

);

You configure an inverted index on an ARRAY column in an existing table as follows:

CREATE INVERTED INDEX [OPTIONAL_NAME] ON person (pets);

BIT

The BIT data type stores a bit array. BIT columns can store varying numbers of bits and can either contain an exact or a variable number of bits:

CREATE TABLE bits (

exactly_1 BIT,

exactly_64 BIT(64),

any_size VARBIT,

up_to_64 VARBIT(64)

);

Values can be inserted into BIT columns as follows (note that the preceding B for each of the values denotes a binary string):

INSERT

INTO bits (exactly_1, exactly_64, any_size, up_to_64)

VALUES (

B'1',

B'1010101010101010101010101010101010101010101010101010101010101010',

B'10101',

B'10101010101'

);

BOOL

The BOOL or BOOLEAN data type stores a true or false value and is created as follows:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

wants_marketing_emails

BOOL NOT NULL

);

Values are provided to BOOL columns with Boolean literals or via type casting from integers:

INSERT

INTO person (wants_marketing_emails)

VALUES

(1::BOOL), -- True (any non-zero number)

(true), -- Literal true

(12345::BOOL), -- True (any non-zero number)

(0::BOOL), -- False (zero value)

(false); -- Literal false

BYTES

The BYTES, BYTEA, or BLOB data type stores the byte array equivalent of TEXT strings and can be created as follows:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

password

BYTES NOT NULL

);

You can insert BYTES values in several ways. TEXT strings will automatically cast to BYTES , and CockroachDB supports various encoding methods for fine-grained control of insert values:

INSERT

INTO person (password)

VALUES

('password'), -- String value

(b'password'), -- Byte array literal

(x'70617373776f7264'), -- Hex literal

(b'x70x61x73x73x77x6fx72x64'); -- Hex characters

Every resulting row from the preceding insert will have an identical password column value.

DATE

The DATE data type stores a day, month, and year value and is created as follows:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

date_of_birth

DATE NOT NULL

);

Provide values to DATE columns as string literals, interpreted literals, timestamps (which truncate to day precision), or numbers representing the number of days since the epoch:

INSERT

INTO person (date_of_birth)

VALUES

('1941-09-09'), -- String literal

(DATE '1941-09-09'), -- Interpreted literal

('1941-09-09T01:02:03.456Z'), -- Timestamp (will be truncated)

(CAST(-10341 AS DATE)); -- Number of days since the epoch

Every resulting row from the preceding insert will have an identical date_of_birth column value.

ENUM

The ENUM data type provides an enumeration that is validated upon insert and is created as follows:

CREATE TYPE planet AS ENUM (

'mercury',

'venus',

'earth',

'mars',

'jupiter',

'saturn',

'uranus',

'neptune'

);

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

favourite_planet

planet NOT NULL

);

As with many of CockroachDB’s data types, ENUM columns are castable from string literals, interpreted literals, or strings with direct casts:

INSERT

INTO person (favourite_planet)

VALUES

('neptune'), -- String literal

(planet 'earth'), -- Interpreted literal

(CAST('saturn' AS planet)); -- Cast

DECIMAL

The DECIMAL, DEC, or NUMERIC data type stores exact, fixed-point² numbers of variable size and is created either with or without a precision.

Let’s start by creating a DECIMAL column without specifying a precision and scale:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

bitcoin_balance

DECIMAL NOT NULL

);

Inserting some values into this table will reveal that only the digits required to represent the number are used:

INSERT

INTO person (bitcoin_balance)

VALUES

(0.000030),

(0.80),

(147.50);

SELECT * FROM person;

id | bitcoin_balance

---------------------------------------+------------------

2880768d-802f-4096-933d-68be971b3a73 | 147.50

975d5aa2-7769-48e1-99dc-693b6a3fc07f | 0.80

f8a19a8f-c40c-4b6d-b186-18688f020f2b | 0.000030

Now, let’s recreate the table and provide a precision and scale for the DECIMAL column this time. The DECIMAL column type now takes two arguments: the first defines the precision of the value, and the second defines the scale. The precision argument tells CockroachDB the maximum number of integral digits (digits to the left of the decimal point) and fractional digits (digitals to the right of the decimal point):

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

bitcoin_balance

DECIMAL(16, 8) NOT NULL

);

Inserting some values into this table will reveal that all eight of the fractional digits are used:

INSERT

INTO person (bitcoin_balance)

VALUES

(0.000030),

(0.80),

(147.50);

SELECT * FROM person;

id | bitcoin_balance

---------------------------------------+------------------

8de29c92-13f8-4f4c-abef-62c7ff3cdb87 | 0.00003000

ee2e01fa-7ff0-4a10-9c09-f2e607e6ec49 | 0.80000000

f9180229-d42f-45b5-b6fc-914396219da8 | 147.50000000

It is possible to insert infinite and NAN (not a number) values in a DECIMAL column as follows:

INSERT

INTO person (bitcoin_balance)

VALUES

('inf'), ('infinity'), ('+inf'), ('+infinity'),

('-inf'), ('-infinity'),

('nan');

Positive infinity Bitcoin. One can dream.

FLOAT

The FLOAT, FLOAT4 (REAL), and FLOAT8 (DOUBLE PRECISION) data types store inexact, floating-point numbers with up to 17 digits of precision and are created as follows:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

latitude

FLOAT NOT NULL,

longitude

FLOAT NOT NULL

);

INSERT

INTO person (latitude, longitude)

VALUES

(38.908200917747095, -77.03236828895616),

(52.382578005867906, 4.855332269875395),

(51.46112343492288, -0.11128454244911225),

(51.514018690098645, -0.1267503331073194);

SELECT * FROM person;

id | latitude | longitude

---------------------------------------+--------------------+----------

19f0cfcc-a073-4d96-850c-e121f1e940b6 | 52.382578005867906 | 4.855332269875395

78a03a7c-afe4-4196-bc0c-0e86841373e4 | 51.46112343492288 | -0.11128454244911225

9c68004e-2265-4ebf-b14c-20fa6292dc6c | 38.908200917747095 | -77.03236828895616

a9b502c0-8ffd-4e75-a796-4e87554a3ebd | 51.514018690098645 | -0.1267503331073194

INET

The INET data type stores valid IPv4 and IPv6 addresses and CIDRs (Classless Inter-Domain Routing) and is created as follows:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

INET NOT NULL

);

Inserting some values into this table will reveal that CockroachDB understands the IP addresses being inserted and removes any redundant masks:

INSERT

INTO person (ip)

VALUES

('10.0.1.0/24'),

('10.0.1.1/32'),

('229a:d983:f190:75ef:5f06:a5a8:f5c2:8500/128'),

('229a:d983:f190:75ef:5f06:a5a8:f5c2:853f/100');

SELECT * FROM person;

id | ip

---------------------------------------+----------------------------------

29121f2e-e977-474b-9fb5-818399ed7b9f | 10.0.1.0/24

5cce3936-9f27-4e00-b0dc-9866e621270f | 10.0.1.1

9b53053d-ce6e-416a-9b63-9d0bc9bc9870 | 229a:d983:f190:75ef:5f06:a5a8:f5c2:8500

d383ef34-b24d-48b8-b008-57f4e001faa3 | 229a:d983:f190:75ef:5f06:a5a8:f5c2:853f/100

Notice that the 10.0.1.0/24 address keeps its mask because a /24 mask for this address includes addresses from 10.0.1.0 to 10.0.1.255. On the other hand, the 10.0.1.1/32 address has a mask of /32, which means “this IP address only,” meaning we have a specific IP address, and the mask is superfluous. The same holds for the IPv6 addresses, where a /128 means “this IP address only.”

INTERVAL

The INTERVAL data type stores durations ranging from microseconds to years and is created as follows:

CREATE TABLE rodeo_records (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

duration

INTERVAL NOT NULL

);

You can provide durations as ISO 8601 strings, time strings, and seconds. Selecting the rows out of the table reveals how the values are stored and how you can cast them to seconds:

INSERT

INTO rodeo_records (duration)

VALUES

('10:45:00.0'), -- Time string

('1h30m'), -- ISO 8601 string

(30::INTERVAL); -- Seconds

SELECT id, duration, duration::INT FROM rodeo_records;

id | duration | duration

---------------------------------------+----------+-----------

102cbbc8-6001-4642-b7e4-b4cbd9b03d35 | 10:45:00 | 38700

665f4306-0407-4d48-ab1c-4c628cd3e1d3 | 01:30:00 | 5400

7fbf8859-65fa-4100-ae61-f9bff76b89ec | 00:00:30 | 30

JSONB

The JSONB, JSON data type stores arbitrary JSON objects and is ideal for storing semi-structured data (data that has a structure that doesn’t fit or may outgrow your relational database tables). JSONB columns can be created as follows:

CREATE TABLE song (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

details

JSONB NOT NULL

);

If you plan on performing complex queries against a JSON column , you’ll want to create an inverted index for the column. I’ll cover these in a subsequent chapter on performance.

To insert data into a JSONB column, simply provide a JSON string as follows:

INSERT

INTO song (details)

VALUES

('{"label": "Century Media", "release_date": "2004-09-20"}'),

('{"label": "Season of Mist", "release_date": "2010-02-15"}'),

('{"label": "Season of Mist", "release_date": "2016-02-12"}');

JSON columns are very flexible. You can query their fields and select specific values to return. Let’s do both now. The following query returns the release dates of songs released under the “Season of Mist” record label as string values:

SELECT

details->>'release_date' AS release_date

FROM

song

WHERE

details @> '{"label": "Season of Mist"}';

release_date

----------------

2010-02-15

2016-02-12

To access specific JSON fields as regular columns, create indexes or even create PRIMARY KEYs from JSON fields within JSON documents; this is also possible with the JSON data type .

The following CREATE statement recreates the table, but this time, its id and label columns are from fields in the details JSON column:

CREATE TABLE song (

UUID PRIMARY KEY AS ((details->>'id')::UUID) STORED,

label

TEXT AS ((details->>'label')::TEXT) STORED,

details

JSONB NOT NULL,

INDEX (label),

INVERTED INDEX idx_details(details)

);

INSERT

INTO song (details)

VALUES

('{"id":"60d28ed4-ee97-43d5-98a7-ba42d478f4c7", "label": "Century Media", "release_date": "2004-09-20"}'),

('{"id":"4b158ac6-386d-4143-8281-ca6f0f9c9a93", "label": "Season of Mist", "release_date": "2010-02-15"}'),

('{"id":"c82dc39d-f310-45a4-9a31-805d923f1c8e", "label": "Season of Mist", "release_date": "2016-02-12"}');

We can now treat the id and label columns exactly as we’d treat any other database column:

SELECT

id,

label,

details->>'release_date' release_date

FROM

song

WHERE

label = 'Season of Mist';

SERIAL

The SERIAL data type is not strictly a data type but rather an INT, INT2, INT4, or INT8 with a default value applied, resulting in an auto-incrementing number. SERIAL was introduced to CockroachDB to provide Postgres compatibility. As such, you should consider using the UUID data type with a default value of gen_random_uuid() instead of the SERIAL data type, as this provides 128 bits of randomness vs. SERIAL’s maximum of 64 bits.

SERIAL columns can be created as follows:

CREATE TABLE random (

SERIAL2, -- 16-bits AKA SMALLSERIAL

SERIAL4, -- 32-bits

SERIAL8 -- 64-bits AKA SERIAL or BIGSERIAL

);

Inserting data into this table will reveal that close attention to the SERIAL documentation³ is required to fully understand the nuances of this data type:

INSERT

INTO random

DEFAULT VALUES;

SELECT * FROM random;

r1 | r2 | r3

---------------------+--------------------+---------------------

698584214451748865 | 698584214451781633 | 698584214451814401

698584216070979585 | 698584216071012353 | 698584216071045121

698584216921047041 | 698584216921079809 | 698584216921112577

698584217849954305 | 698584217849987073 | 698584217850019841

As we can see from the values generated, regardless of the integer size provided by the data type definition, the default mode for generating these values (unique_rowid()) will always result in a 64-bit integer.

STRING

The STRING, TEXT, CHARACTER, CHAR, or VARCHAR data type stores either fixed or variable-length strings of Unicode characters and is created as follows:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

first_name

TEXT NOT NULL,

last_name

VARCHAR(50) NOT NULL,

grade

CHAR NOT NULL

);

By showing the columns of our table, we can see what the data types resolve to behind the scenes :

SELECT column_name, data_type

FROM [SHOW columns FROM person];

column_name | data_type

--------------+--------------

id | UUID

first_name | STRING

last_name | VARCHAR(50)

grade | CHAR

Note that VARCHAR(n) is, in fact, an alias of STRING(n), but for Postgres compatibility, it is still represented as VARCHAR(n) here, as Postgres does not have a STRING data type .

Inserting data into this table will reveal that CockroachDB faithfully represents Unicode characters:

As in Postgres, you’ll receive an error from CockroachDB if you try to insert data that will not fit into a fixed-length or limited variable-length column.

TIME/TIMETZ

The TIME and TIMEZ data types store time (minus date) values in UTC and zoned representations, respectively. They can store time values to various levels of precision, ranging from second precision down to microsecond precision.

Cockroach Labs recommends using the TIME variant and converting to local time in the front-end.

CREATE TABLE schedule (

name STRING(50)

NOT NULL,

time_of_day TIME NOT NULL

);

INSERT INTO schedule (name, time_of_day)

VALUES

('Take dogs for walk', '10:30:00'),

('Make lunch', '12:00:00');

SELECT * FROM schedule;

name | time_of_day

---------------------+--------------

Take dogs for walk | 10:30:00

Make lunch | 12:00:00

TIMESTAMP/TIMESTAMPTZ

The TIMESTAMP and TIMESTAMPTZ data types store timestamps in UTC and display the values in UTC and zoned representations, respectively.

Cockroach Labs recommends using the TIMESTAMPTZ data type over the TIMESTAMP data type for explicitness, so I’ll demonstrate the use of the TIMESTAMPTZ data type, which is created as follows:

CREATE TABLE episode_schedule (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

name

TEXT NOT NULL,

next_show_time

TIMESTAMPTZ(0) NOT NULL

);

As you can see, the TIMESTAMP data types allow for an optional precision to be provided. This precision accepts a value between zero (representing second precision) and six (representing microsecond precision). The default precision for TIMESTAMP data types is 6, so for most cases, you can omit the precision entirely.

Let’s insert some data into the table to show its representation to users in different time zones:

INSERT

INTO episode_schedule (name, next_show_time)

VALUES

('South Park - The Tail of Sc...', '2021-12-10 22:00:00+00:00'),

('South Park - Grounded Vinda...', '2021-12-10 22:30:00+00:00'),

('South Park - Make Love, Not...', '2021-12-10 23:00:00+00:00');

Let’s select the rows out of the table now, once for users in a UTC time zone and once for users in the Eastern Daylight Time (EDT) time zone :

SET TIME ZONE 'UTC';

SELECT next_show_time FROM episode_schedule;

next_show_time

--------------------------

2021-12-10 23:00:00+00

2021-12-10 22:30:00+00

2021-12-10 22:00:00+00

SET TIME ZONE 'America/New_York';

SELECT next_show_time FROM episode_schedule;

next_show_time

--------------------------

2021-12-10 18:00:00-05

2021-12-10 17:30:00-05

2021-12-10 17:00:00-05

As you can see, users in the UTC time zone see the data as it was inserted into the database, while users in the EDT time zone see the data in their local time as a -05:00 offset from UTC.

GEOMETRY

The GEOMETRY and GEOGRAPHY data types store spatial objects on variable-plane⁴ geometry and earth positions as spheroids, respectively, and are created as follows:

CREATE TABLE property (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

location

GEOMETRY NOT NULL,

INVERTED INDEX (location)

);

Both the GEOMETRY and GEOGRAPHY data types can store the following spatial objects:

GEOMETRYCOLLECTION – A collection of other spatial objects (including other GEOMETRYCOLLECTION objects)
LINESTRING – A collection of points making up an arbitrary line, useful for roads and rivers
MULTILINESTRING – A collection of LINESTRING objects
MULTIPOINT – A collection of POINT objects
MULTIPOLYGON – A collection of POLYGON objects
POINT – Two points that represent coordinates (e.g., latitude and longitude)
POLYGON – A collection of points that represent an area

Let’s insert some GEOMETRY data into our table and perform some basic operations on it. In this scenario, let’s assume we provide a service that allows users to look up stores. We hold the store locations, and users draw search areas to find them.

INSERT

INTO property (location)

VALUES

('POINT(-0.16197244907496533 51.50186005364136)'),

('POINT(-0.16139087662866003 51.498542352748814)'),

('POINT(-0.17528813494181622 51.48604279157454)');

A particularly wealthy user wishes to search for property in the Knightsbridge area of London and provides the following coordinates:

SELECT id FROM property

WHERE ST_CoveredBy(

location,

'POLYGON((-0.1642155647277832 51.496975050193456, -0.15342235565185544 51.496975050193456, -0.15342235565185544 51.50344015477113, -0.1642155647277832 51.50344015477113, -0.1642155647277832 51.496975050193456))'

);

----------------------------------------

7a37b426-c87e-454e-9d67-2848e05f8326

de7425c9-9939-4e3c-9f48-f9fabdfca9df

Their query returns just the two properties in Knightsbridge, leaving the equally lavish Chelsea property for another fabulously wealthy user.

A very neat feature of Postgres and CockroachDB is the ability to convert GEOMETRY data into GeoJSON; let’s convert the search area provided by the user into GeoJSON and visualize it on https://geojson.io :

SELECT ST_AsGeoJSON('POLYGON((-0.1642155647277832 51.496975050193456, -0.15342235565185544 51.496975050193456, -0.15342235565185544 51.50344015477113, -0.1642155647277832 51.50344015477113, -0.1642155647277832 51.496975050193456))');

{"type":"Polygon","coordinates":[[[-0.164215565,51.49697505],[-0.153422356,51.49697505],[-0.153422356,51.503440155],[-0.164215565,51.503440155],[-0.164215565,51.49697505]]]}

Pasting this data into the GeoJSON website yields the user’s search area as shown in Figure 3-2.

Figure 3-2
A GeoJSON polygon on https://geojson.io

Functions

CockroachDB provides built-in functions to make certain operations easier, and there are functions for working with most of the data types introduced in this chapter.

Built-in functions⁵ either provide functionality for working with data types or expose core system functionality such as working with stream ingestion and getting system information.

I will demonstrate the use of a small selection of built-in functions, but for brevity, I will leave the exploration of the remaining functions as an exercise to the reader. The following code example shows some of the more commonly used functions, and the comment preceding each provides a brief explanation.

-- Return the largest element in a variadic collection of items.

SELECT greatest(ARRAY[1, 2, 3], ARRAY[1, 2, 4], ARRAY[1, 2, 2]);

--> {1,2,4}

-- Return the smallest element in a variadic collection of items.

SELECT least(ARRAY[1, 2, 3], ARRAY[1, 2, 4], ARRAY[1, 2, 2]);

--> {1,2,2}

-- Return the number of non-null items in a variadic collection of items.

SELECT num_nonnulls(NULL, 1, NULL, 'a', NULL);

--> 2

-- Return the number of null items in a variadic collection of items.

SELECT num_nulls(NULL, 1, NULL, 'a', NULL);

--> 3

-- Creates a SHA256 hash from a given string or binary string.

SELECT sha256('password');

--> 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8

-- Returns the Soundex code for a given string.

SELECT soundex('cockroachdb');

--> C262

-- Returns the start position of a substring.

SELECT strpos('cockroachdb', 'db');

--> 10

-- Encodes a value into its hexadecimal representation.

SELECT to_hex('cockroachdb');

--> 636f636b726f6163686462

-- Decodes a value from its encoded representation (out of hex, escape, and base64).

SELECT decode('636f636b726f6163686462', 'hex');

--> cockroachdb

-- Converts an X, Y coordinate (or longitude, latitude) to a POINT geometry.

SELECT st_point(-0.667790874, 51.922472692);

--> 010100000054B68CF78A5EE5BFEACBCD9513F64940

-- Converts a GEOMETRY type to its GeoJSON representation.

SELECT st_asgeojson('010100000054B68CF78A5EE5BFEACBCD9513F64940');

--> {"type":"Point","coordinates":[-0.667790874,51.922472692]}

-- Returns the host bitmask from a CIDR.

SELECT hostmask('10.1.0.0/16');

--> 0.0.255.255

-- Returns the network bits mask used from a CIDR.

SELECT netmask('10.1.0.0/16');

--> 255.255.0.0

-- Gets a value out of a JSON object using dot-notation.

SELECT json_extract_path('{"first_name": "rob", "last_name": "reid"}', 'first_name');

--> "rob"

-- Sets a value in a JSON object using dot-notation.

SELECT json_set('{"first_name": "Rob", "last_name": "Reid"}', array['first_name'], '"Emily"');

--> {"first_name": "Emily", "last_name": "Reid"}

-- Determines whether a value exists in a JSON object using dot-notation.

SELECT jsonb_exists_any('{"foo": "bar"}', array['foo']);

--> true

-- Truncates a date beyond a given precision.

SELECT date_trunc('hour', '2021-10-07 18:26:37.241363+00'::TIMESTAMP);

--> 2021-10-07 18:00:00

-- Returns the difference between two dates.

SELECT age(now(), '1941-09-09');

--> 80 years 28 days 18:30:15.845593

There are many more functions to try; for a complete list of available functions, visit www.cockroachlabs.com/docs/stable/functions-and-operators.html .

Geo-partitioned Data

Geo-partitioning is one of CockroachDB’s most powerful Enterprise features , and as of v21.1.10, working with geo-partitioned data has become much more straightforward.

To get started with geo-partitioned data, you first need to decide on a partitioning strategy. For example, will all of the rows within a table need to remain in one location, or will specific rows within that table need to be separated by location? If you need all of the rows within a table to remain in one location, the REGION BY TABLE table locality is what you need. On the other hand, if you need to pin rows to different places, the REGION BY ROW table locality is a good choice.

Let’s create some database tables to demonstrate geo-partitioning for both locality strategies.

Before we start, let’s create a cluster that spans multiple continents. Start by creating a temporary Enterprise cluster using CockroachDB’s demo command :

$ cockroach demo

--no-example-database

--nodes 9

--insecure

--demo-locality=region=us-east1,az=a:region=us-east1,az=b:region=us-east1,az=c:region=asia-northeast1,az=a:region=asia-northeast1,az=b:region=asia-northeast1,az=c:region=europe-west1,az=a:region=europe-west1,az=b:region=europe-west1,az=c

This command creates an insecure, empty cluster with nine nodes. The --demo-locality argument allows us to specify the region and availability zones for the nodes to be distributed across: nine availability zones, one for each of our nine nodes. The syntax of this argument’s value looks tricky but is actually very simple; for each of the requested localities, we simply provide a colon-separated collection of region and availability zone pairs.

In order for CockroachDB to show a map of your cluster in its Node Map view, we need to run a SQL command in the CockroachDB shell. At the time of writing, only a handful of locations are configured in the system.locations table; both the United States and Europe are configured, but Asia is not. We need to tell CockroachDB where to place the asia-northeast1 cluster on its Node Map, and the following statement does just that:

INSERT into system.locations VALUES ('region', 'asia-northeast1', 35.689487, 139.691706);

Open a browser to http://localhost:8080/#/overview/map and you’ll see a visual representation of your new cluster such as that shown in Figure 3-3.

Figure 3-3
The CockroachDB Node Map view of our cluster

It’s now time to create some partitioned tables.

Whether we’re pinning table or row data, we need to make CockroachDB region-aware. The following statements create three regions in the database. These match the three regions we created when initializing the cluster with the demo command .

ALTER DATABASE defaultdb PRIMARY REGION "europe-west1";

ALTER DATABASE defaultdb ADD REGION "us-east1";

ALTER DATABASE defaultdb ADD REGION "asia-northeast1";

As discussed in Chapter 2, it’s vital you correctly set regional or zonal survival goals for your databases. While it is not essential to geo-partitioning, it is helpful to know how your database will function in the event of an outage.

ALTER DATABASE defaultdb SURVIVE REGION FAILURE;

REGION BY ROW

The REGION BY ROW table locality ensures that individual rows within a table are pinned to specific locations, depending on values set for a specific field. In human speak, a location can be a continent, a country, a state, or even a specific machine. In infrastructure speak, a location can be a region, an availability zone, or a node.

First, we’ll create a database for this example and make it region-aware. The following code statements create a database and attach regions and a primary region via ALTER statements:

CREATE DATABASE my_app;

USE my_app;

ALTER DATABASE my_app PRIMARY REGION "europe-west1";

ALTER DATABASE my_app ADD REGION "us-east1";

ALTER DATABASE my_app ADD REGION "asia-northeast1";

We don’t need to manually set a survival goal in this example because for now, we’ll stick with the default configuration , which allows CockroachDB to survive zonal failures.

Next, we’ll create a table. This table will have two columns that allow us to geo-partition data:

country – The country that is associated with the row.
crdb_region – The region that describes where CockroachDB will pin data. This column is populated automatically on-insert with a value derived from the country column. Using a case statement, we’ll ensure that when the country column is “JP” for Japan, “KR” for South Korea, or “TH” for Thailand, the crdb_region column receives the value “asia-northwest1”, etc.

The following code creates a table called “person” with the aforementioned geo-partitioning columns. It ensures that locality rules for the table are set to REGIONAL BY ROW, meaning each row will be pinned to a different region:

CREATE TABLE person (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

country

TEXT NOT NULL,

crdb_region crdb_internal_region NOT NULL AS (

CASE

WHEN country IN ('JP', 'KR', 'TH') THEN 'asia-northeast1'

WHEN country IN ('DE', 'FR', 'UK') THEN 'europe-west1'

WHEN country IN ('BR', 'MX', 'US') THEN 'us-east1'

ELSE 'europe-west1'

END

) STORED

);

ALTER TABLE "person" SET LOCALITY REGIONAL BY ROW;

Next, we’ll insert some data into the person table. We’ll provide different country codes to ensure that data is stored across each of our three regions.

INSERT into person (country)

VALUES

('JP'), ('KR'), -- Japan and South Korea -> asia-northeast1

('DE'), ('FR'), -- German and France -> europe-west1

('BR'), ('MX'); -- Brazil and Mexico -> us-east1

Selecting the values back out of the person table reveals that the crdb_region column is populated as expected:

SELECT * FROM person;

id | country | crdb_region

---------------------------------------+---------+------------------

3a77bfcf-e289-4e47-8e11-7ca0b77ade72 | FR | europe-west1

bc5f5bef-72b4-4c7a-a7d8-0ece1345a7e1 | DE | europe-west1

74a1c95b-c5a4-46ae-bd8d-d049514fb728 | BR | us-east1

477cbdc6-fe0f-436c-99fd-0f28866d394a | KR | asia-northeast1

8c5e8b31-c361-4157-8671-b723cfcf0e06 | MX | us-east1

e8a6760f-9596-4b4c-838b-94eb2098a73e | JP | asia-northeast1

By default, CockroachDB will use a replication factor of five for the person table. You can confirm this with a query for the table’s RANGES (I’ve modified the formatting to make it easier to read):

SELECT lease_holder_locality, replicas, replica_localities

FROM [SHOW RANGES FROM TABLE person]

WHERE start_key NOT LIKE '%Prefix%';

region=asia-northeast1,az=c

{3,4,5,6,7} {"region=us-east1,az=c","region=asia-northeast1,az=a","region=asia-northeast1,az=b","region=asia-northeast1,az=c","region=europe-west1,az=a"}

region=europe-west1,az=a

{3,6,7,8,9}

{"region=us-east1,az=c","region=asia-northeast1,az=c","region=europe-west1,az=a","region=europe-west1,az=b","region=europe-west1,az=c"}

region=us-east1,az=b

{1,2,3,5,7}

{"region=us-east1,az=a","region=us-east1,az=b","region=us-east1,az=c","region=asia-northeast1,az=b","region=europe-west1,az=a"}

From the preceding output , you’ll see that data in the person table is shared across five nodes, as per the default replication factor in CockroachDB. A replication factor of five provides additional robustness if a region becomes unavailable.

I’ll now drop the replication factor from five to three to show how CockroachDB will keep geo-partitioned data pinned to the regions specified in crdb_internal_region columns (in our case, the crdb_region column):

SET override_multi_region_zone_config = true;

ALTER TABLE person CONFIGURE ZONE USING num_replicas = 3;

The preceding ALTER statement will be blocked by CockroachDB, as any request to update the number of replicas in a table or database requires special permissions. However, as seen in the preceding statement, we can ask CockroachDB to permit us to make this change by overriding the multiregion zone configuration flag.

Over the next ~30 seconds, you’ll see CockroachDB reduce the number of replicas for the person table from five to three. Replicas are now pinned to the respective regions.

SELECT lease_holder_locality, replicas, replica_localities

FROM [SHOW RANGES FROM TABLE person]

WHERE start_key NOT LIKE '%Prefix%';

region=asia-northeast1,az=c

{4,5,6}

{"region=asia-northeast1,az=a","region=asia-northeast1,az=b","region=asia-northeast1,az=c"}

region=europe-west1,az=a

{7,8,9}

{"region=europe-west1,az=a","region=europe-west1,az=b","region=europe-west1,az=c"}

region=us-east1,az=b

{1,2,3}

{"region=us-east1,az=a","region=us-east1,az=b","region=us-east1,az=c"}

REGION BY TABLE

The REGION BY TABLE table locality ensures that all of the rows within a table remain in a particular location. In human speak, a location can be a continent, a country, a state, or even a specific machine. In infrastructure speak, a location can be a region, an availability zone, or a node.

The REGION BY TABLE table locality is the default configuration for tables in CockroachDB. As a result, access to regional tables will be faster when accessed from the database’s primary region.

Use this option if you wish to pin all data within a table to one region. In the following example, we’ll create a regional table.

First, we’ll create a table . Note that in this example, I’m not providing any regional information to the rows of the table; this is because the table itself is regional.

CREATE TABLE american_scientists (

UUID DEFAULT gen_random_uuid() PRIMARY KEY,

first_name

TEXT NOT NULL,

last_name

TEXT NOT NULL

);

INSERT into american_scientists (first_name, last_name)

VALUES

('Grace', 'Hopper'),

('Barbara', 'McClintock'),

('Rachel', 'Carson');

Assuming the database’s primary region is still set to europe-west1 from the REGION BY ROW example, if we were to select the table’s ranges at this point, we’d see that a European-based leaseholder currently manages our American scientists:

SELECT lease_holder_locality, replicas, replica_localities

FROM [SHOW RANGES FROM TABLE american_scientists];

region=europe-west1,az=a

{3,4,7,8,9}

{"region=us-east1,az=c","region=asia-northeast1,az=a","region=europe-west1,az=a","region=europe-west1,az=b","region=europe-west1,az=c"}

To move this table into the us-east1 region , we need to run a couple of simple statements. First, set the database’s primary region to us-east1, which will update the leaseholder for the data but will not physically locate all replicas to the United States. CockroachDB sets a default number of replicas for a new table to five, so as we have only three nodes in the us-east1 region, CockroachDB will use nodes from other regions to bring the replica count up to five:

ALTER DATABASE my_app PRIMARY REGION "us-east1";

SELECT lease_holder_locality, replicas, replica_localities

FROM [SHOW RANGES FROM TABLE american_scientists];

region=us-east1,az=c

{1,2,3,4,8}

{"region=us-east1,az=a","region=us-east1,az=b","region=us-east1,az=c","region=asia-northeast1,az=a","region=europe-west1,az=b"}

Next, drop the replica count down to three to ensure that all data moves across to the three nodes in the us-east1 region:

SET override_multi_region_zone_config = true;

ALTER TABLE american_scientists CONFIGURE ZONE USING num_replicas = 3;

SELECT lease_holder_locality, replicas, replica_localities

FROM [SHOW RANGES FROM TABLE american_scientists];

region=us-east1,az=c

{1,2,3}

{"region=us-east1,az=a","region=us-east1,az=b","region=us-east1,az=c"}

In the previous step, we reduced the number of replicas for the american_scientists table from five to three. Doing so will impact the resilience of the cluster, given that CockroachDB will now replicate data across fewer nodes. In later chapters, I share some cluster size and replication number recommendations for production environments.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 3. Concepts

Create new playlist

Sign In

Sign Up

3. Concepts

Database Objects

Data Types

UUID

ARRAY

BIT

BOOL

BYTES

DATE

ENUM

DECIMAL

FLOAT

INET

INTERVAL

JSONB

SERIAL

STRING

TIME/TIMETZ

TIMESTAMP/TIMESTAMPTZ

GEOMETRY

Functions

Geo-partitioned Data

REGION BY ROW

REGION BY TABLE

Table of Contents for
3. Concepts