Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Cross column correlation

Cross column correlation can cause a wrong estimation of the number of rows as PostgreSQL assumes that each column is independent of other columns. In reality, there are a lot of examples where this is not true. For example, one could find patterns where the first and last names in certain cultures are correlated. Another example is the country and language preference of the users. To understand cross column correlation, let's create a table called users, as follows:

CREATE TABLE users (
  id serial primary key,
  name text,
  country text,
  language text
);
INSERT INTO users(name, country, language) SELECT generate_random_text(8), 'Germany', 'German' FROM generate_series(1, 10);
INSERT INTO users(name, country, language) SELECT generate_random_text(8), 'USA', 'English' FROM generate_series(1, 10);
VACUUM ANALYZE users;

If one wants to get users whose language is German and country is Germany, he/she will end up with a wrong estimation of rows as both columns are correlated, as follows:

EXPLAIN SELECT * FROM users WHERE country = 'Germany' and language ='German';
Seq Scan on users  (cost=0.00..1.30 rows=5 width=26)
  Filter: ((country = 'Germany'::text) AND (language = 'German'::text))

Note that the number of estimated rows is five, which is calculated as follows:

Estimated number of rows = Total_number_of_rows * selectivity of country * selectivity of language

Estimated number of rows = 20 *.5 * .5 = 5

A simple solution to correct the number of rows is to change the physical design of the table and combine both fields in one, as follows:

CREATE TABLE  user_2 (
  id serial,
  name text,
  user_info jsonb
);

INSERT INTO user_2(name,  user_info) SELECT generate_random_text(8), '{"country":"Germany", "language":"German"}'  FROM generate_series(1, 10);
INSERT INTO user_2(name,  user_info) SELECT generate_random_text(8), '{"country":"USA", "language":"English"}'  FROM generate_series(1, 10);
VACUUM ANALYZE user_2;

In the preceding example, jsonb is used to wrap the country and language; the explain plan gives the correct estimation of the number of rows, as follows:

EXPLAIN SELECT * FROM user_2 WHERE user_info = '{"country":"USA", "language":"English"}';
Seq Scan on user_2  (cost=0.00..1.25 rows=10 width=60)
  Filter: (user_info = '{"country": "USA", "language": "English"}'::jsonb)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Cross column correlation

Create new playlist

Sign In

Sign Up

Cross column correlation

Table of Contents for
Cross column correlation