Chapter 5. SQL Language

Structured Query Language (SQL) is used to set up the structure of the database, to manipulate the data in the database and to query the database. This chapter will be dedicated to the Data Manipulation Language (DML).

After reading this chapter, you will understand the concept of SQL and the logic of SQL statements. You will be able to write your own SQL queries and manipulate the data using this language.

The complete reference of SQL can be found in the official PostgreSQL documentation at http://www.postgresql.org/docs/current/static/sql.html.

So, the topics we are going to cover in this chapter are as follows:

  • SQL fundamentals
  • Lexical structure
  • Select
  • Update
  • Delete

Code examples in this chapter are based on the car portal database described in the previous chapters. The scripts to create the database and fill it with data can be found in the attached media in the Chapter 5 folder. They are called schema.sql and data.sql.

All the code examples of this chapter can be found in the file examples.sql.

Refer to Chapter 2, PostgreSQL in Action for details on how to use the PSQL console.

SQL fundamentals

SQL is used to manipulate the data in the database and to query the database. Also, SQL is used to define the structure of the data. You already know that from the previous chapters. In general, SQL consists of three parts:

  • Data Definition Language (DDL)
  • Data Manipulation Language (DML)
  • Data Control Language (DCL)

The first part is used to create and manage the structure of the data, the second part is used to manage the data itself, and the third part—to control access to the data. Usually, the data structure is defined only once and then it is rarely changed. But the data is constantly inserted into the database, changed or retrieved. For that reason, DML is used much more often than DDL.

SQL is not an imperative programming language, which makes it different from many other languages. To be more specific, one cannot define a detailed algorithm of how the data should be processed, and this might make an impression of lack of control of the data. In imperative languages, the developer usually specifies it in very detailed level: where to take the data from and how to do it, how to iterate through the array of records, and when and how to process them. If it is necessary to process the data from multiple sources, the developer should implement the relationship between them in the application layer rather than in the database.

SQL, in contrast, is a declarative language. In other words: to get the same result in other languages, the developer writes a whole story. In SQL, the developer writes only one major sentence and leaves details for the database. In SQL, one just defines the format in which it is needed to get the data from the database, specifies the tables where the data is stored and states the rules for processing the data. The exact order of these operations and the actual algorithm for processing them are chosen by the database, and the developer should not care about it.

But this black-box behavior should not be treated as something bad. First, the box is not completely black: there are ways to know how the data is processed by the database engine, and there are ways to control it. And second, the logic in the SQL statement is very deterministic. Even if it is not clear how the database is processing the query on a low level, the logic of the process and the result of the query is entirely determined by the SQL statement.

This determines the size of a statement (smallest standalone element of execution). In Java, for example, every operation such as assignment a value to a variable is logically processed as a separate item of an algorithm. In contrast, the logic of SQL implies that the whole algorithm is executed all at once, as one statement. There is no way to get the state of the data at any intermediate step of the execution of the query. But this does not limit the complexity of the logic of the query. It is possible to implement any sophisticated algorithm in a single SQL statement. And usually it takes less time to implement complex logic in SQL than in any lower-level language. Developers operate with logical relational data structures and do not need to implement their own algorithms of data processing on a physical level. This is what makes SQL so powerful.

Another good thing about SQL is that there is a standard for the language, and every modern relation database supports SQL. Although different databases could support different features and implement their own dialect of SQL, the basics of the language are the same. PostgreSQL also has its own SQL dialect, and we will point out some differences to the other RDBMS. By the way, at the beginning of its history, postgres did not support SQL. It was added in 1994 and after a while the database was renamed PostgreSQL to indicate that fact.

SQL lexical structure

The minimal SQL instruction that can be executed by the database engine is called a statement. Also it can be called a command or query. For example, each of the following is a statement:

SELECT car_id, number_of_doors FROM car_portal_app.car;
DELETE FROM car_portal_app.a;
SELECT now();

SQL commands are terminated by a semicolon (;). End of input also terminates the command, but that depends on the tools used.

SQL statements can contain:

  • Keywords determine what exactly it is required from the database to be done
  • Identifiers refer to the objects in the database—tables, their fields, functions, and so on
  • Constants are parts of expressions whose values are specified directly in the code
  • Operators determine how the data is processed in the expressions
  • Special characters, such as parenthesis, brackets, commas, and so on, which have other meanings than simply being an operator
  • Whitespaces separate words from each other
  • Comments are used to describe a particular line of code

Keywords are words such as SELECT or UPDATE. They have special meaning in SQL. They are names of statements or parts of statements. The full list of keywords can be found in the documentation at http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html.

Identifiers are the names of the database objects. Objects such as tables or views can be referred by the name of the schema where it belongs to (see Chapter 3, PostgreSQL Basic Building Blocks) followed by the dot symbol (.) and the name of the object. That is called a qualified object name. If the name of the schema is included in the search_path setting or if the object belongs to the current user's schema, then it is not required to use the schema name when referring to the object. In that case, it is called an unqualified object name. Names of the fields of tables are used in the same way: table name, then dot (.) and field name. In some cases, it is not necessary to specify table the name, for example, when only one table is queried, and in other cases, it is possible to use a table alias.

SQL is not case sensitive. Both keywords and identifiers can contain any letters (a-z), digits (0-9), underscores (_) or dollar signs ($). But they cannot start with a digit or dollar sign. That makes them similar to each other and without knowing the language, sometimes it is difficult to say if some word is a keyword or an identifier. Usually, keywords are typed in upper case.

In identifiers, it is still possible to use symbols other than those mentioned earlier, by double-quoting them. Also it is possible to create objects with the same names as keywords but is not recommended.

Constants in SQL are also called literals. PostgreSQL supports three types of implicitly typed constants: numbers, strings and bit strings. To use constant values of any other data type, implicit or explicit conversion should be performed.

Numeric constants contain digits and optionally decimal point, and exponent sign. These are examples of selecting valid numeric constants:

SELECT 1, 1.2, 0.3, .5, 1e15, 12.65e-6;

String constants should be quoted. There are two kinds of syntax of string constants in PostgreSQL: single quoted constants like in SQL standard, and PostgreSQL-specific dollar-quoted constants. Putting a letter E before the string constant makes it possible to use C-style backslash escaped characters such as for a new line, or for tabulation. A single quote character (') inside a literal should be doubled (") or used as an escape string (). Putting a letter U with an ampersand before the string without any spaces in between allows you tu specify unicode characters by their code after a backslash.

The examples would be:

SELECT 'a', 'aa''aa', E'aa
aa', $$aa'aa$$,
  U&'41C418420';
?column? | ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------+----------
 a        | aa'aa    | aa      +| aa'aa    | МИР
          |          | aa       |          |

The first is simple—a letter a. The second will have a single quote in the middle. The third has C-style new line sequence: . The next string is dollar-quoted. And the last has unicode characters (the word means "peace" in Russian).

Dollar-quoted string constants always have the same value as it was written. No escape sequences are recognized and any kind of quotes are part of the string except for dollar-quote when it is written in the same way as in the beginning. Dollar-quoted strings can have their names set between the dollar signs, which makes it possible to use one dollar-quoted string inside another, like this:

SELECT $str1$SELECT $$dollar-quoted string$$;$str1$;
 ?column?
----------------------------------
 SELECT $$dollar-quoted string$$;

Here the sequences $str1$ are the quotes, and another double dollar inside the literal does not terminate the string. That's why it is very common to use dollar-quoted string to define a function body that is usually given to the PostgreSQL server as a string literal.

Bit strings are preceded by a letter B and can contain only digits 0 or 1. Alternatively, they can be preceded by a letter X and contain any digits, along with letters A-F. In that case, they are hexadecimal strings. Most of the time bit strings are converted to a numeric data type:

SELECT B'01010101'::int, X'AB21'::int;
 int4 | int4
------+-------
   85 | 43809

Operators are basic elements of data processing. They are used in SQL expressions. They take one or two arguments and return a value. The examples of operators can be addition (+), subtraction (-), and so on. PostgreSQL supports a wide range of operators for all data types. In the statements, operators look like sequences of characters from the list: + - * / < > = ~ ! @ # % ^ & | ` ?.

When several operators are used in the same expression, they are executed in a specific order. Some operators, such as multiplication (*) or division (/), have higher precedence among others, and some other operators such as logical or comparison operators have lower precedence. The operators with the same precedence are executed from left to right. The full list of operators and their precedence can be found in the documentation at http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-OPERATORS.

Special characters include:

  • Parenthesis (()): These are used to control the precedence of operations or to group expressions. Also they can have special meaning in the syntax of particular SQL command. And they are used as a part of a function name.
  • Brackets ([]): These are used to select elements from an array.
  • Colons (:): These are used to access parts of arrays.
  • Double colons (::): These are used for type casing.
  • Commas (,): These are used to separate elements of a list.
  • Periods (.): These are used to separate schema, table and column names from each other.
  • Semicolon (;): This is used to terminate a statement.
  • Asterisk (*): This is used to refer to all the fields of a table or all the elements of composite value.

Whitespaces separate words from each other. In SQL, any number of spaces, new lines, or tabulations are considered as a single whitespace.

Comments can be used in any part of SQL code. The server ignores comments treating them as whitespace. Comments are quoted in pairs of /* and */. Also, the whole line of code can be commented by using double dash (--). In this case, comment starts from double dash and ends at the end of the line.

Simply speaking, DML has only four types of statements:

  • INSERT is used to put new data into the database
  • UPDATE is used to change the data
  • DELETE is used to delete the data
  • SELECT is used to retrieve the data

The structure of every statement is strict and human readable, though the syntax of each statement is different. A complete list of detailed syntax diagrams can be found in the PostgreSQL documentation at http://www.postgresql.org/docs/current/static/sql-commands.html. In this chapter, the main elements of each statement will be described.

SELECT will be the first because it is the most used command and because very often it is used as an element of other commands. SQL allows that: it is possible to nest commands, using a result of one command as an input for another command. That nested queries are called subqueries.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.36.61