Procedural languages

SQL Server allows you to create a dynamic link library (DLL) in any language that targets the Common Language Runtime (CLR). These DLLs must be loaded into the server at boot time. To create a procedure at runtime and have it immediately available, the only choice is the built-in SQL dialect, Transact-SQL (T-SQL).

MySQL has a feature called plugins. One of the legal plugin types is a procedural language. Several languages have been tooled to work with MySQL via the plugin system, including most of the popular ones, such as PHP and Python. These functions cannot be used for stored procedures or triggers, but they can be invoked from ordinary SQL statements. For everything else, you are stuck with the built-in SQL.

PostgreSQL has full support for additional procedural languages, which can be used to create any legal entity in the database that can be created with PL/pgSQL. The language can be added to (or removed from) a running PostgreSQL instance, and any function defined using this language can also be created or dropped while PostgreSQL is running.
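As a minimal sketch of this, assuming the PL/Python server package is installed on the host, adding and removing a language from a live database is a single statement each:

```sql
-- Load the (untrusted) PL/Python language into the current database;
-- assumes the operating system package for plpython3 is installed.
CREATE EXTENSION plpython3u;

-- A function written in the new language is immediately available.
CREATE FUNCTION pyversion() RETURNS text AS $$
import sys
return sys.version
$$ LANGUAGE plpython3u;

-- Both can be dropped again without restarting the server.
DROP FUNCTION pyversion();
DROP EXTENSION plpython3u;
```

No restart, no downtime; the language is just another object in the database.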

These languages have full access to PostgreSQL's internal functions and to all the data entities that the calling user has permission for. In addition to having access to internal PostgreSQL functions, entities, and data structures, some functions (in untrusted languages) can also access external services, create or delete files and directories, or send e-mails and invoke external processes. We will discuss trusted and untrusted languages in later chapters.

Many of these plugin language extensions are available for PostgreSQL. I have used the extensions for PHP, Python, Bash, and PL/pgSQL. Yes, this means that the standard language for PostgreSQL is also installed and managed using the same extension system as any other language.

This brings us to the point that we have more developers available for PostgreSQL than you might have originally thought. Software developers are not required to learn a new development language in order to write stored procedures. They can extend PostgreSQL with a language of their choice and continue to code in the manner and workflow that they choose.

Tip

Lesson learned

There are no second-class citizens in the PostgreSQL development community. Anyone can code in (almost) any language they choose.

Third-party tools

A frequent point of comparison among the database platforms is the number of third-party applications available. I'm not so sure that the total number matters, as much as the existence of the applications you actually need.

To this end, the following is a list of the products that I have used extensively with PostgreSQL:

  • Pentaho Data Integration (Kettle): This is an outstanding Extract, Transform, and Load (ETL) tool
  • Pentaho Report Server: This is a great reporting engine
  • pgAdmin3: This is an awesome database administration tool
  • php5-pgsql: This is a package that allows native access to PostgreSQL from PHP
  • QCubed: This is a PHP development framework with PostgreSQL support
  • Yii: This is another great PHP development framework
  • Talend: This is another ETL tool that works, but it is not my favorite
  • BIRT: This is a great Java reporting tool with an easy report creation environment
  • psycopg2: These are the Python bindings for PostgreSQL

These tools have made the PostgreSQL development experience a breeze, and this is nowhere near a complete list. We could fill this book with just a list of applications that support PostgreSQL, and thanks to its liberal license, PostgreSQL is embedded in many commercial applications that you never even know about.

Tip

Lesson learned

Don't worry too much about how many tools are out there for the product. The ones that matter are available.

Platform compatibility

SQL Server is a Microsoft product. As such, it was, and will always be, a Microsoft platform tool. It is accessible to some limited degree via ODBC, but it is not a serious choice for cross-platform development.

MySQL and PostgreSQL run on virtually every operating system available today. This ability (or lack of limitation) is a strong argument for long-term stability. If any particular operating system is no longer available, or no longer supports open source software, it is fairly simple to move the database server to another platform.

Tip

Lesson learned

In the commercial operating system wars, just say no.

Application design

 

"The thing that hath been, it is that which shall be; and that which is done is that which shall be done: and there is no new thing under the sun."

 
 --Ecclesiastes 1:9 (KJV)
 

"... old things are passed away; behold, all things are become new."

 
 --2 Corinthians 5:17 (KJV)

In software development, we are always running into the situation where what is old is new again, and developers who embrace a philosophy swear by it like a religion. We swing back and forth between thin servers and thin clients, flat and hierarchical storage, desktop applications and web applications and, most appropriately for this chapter, between client and server programming.

The reason for this swing between programming implementations has nothing to do with the features that the client or the server offers. Developer experience is a much more likely influence, and it can push in either direction, depending on what the developer encountered first.

I encourage both the server-centric developer and the client-centric developer to lay down their pitchforks while reading the rest of this chapter.

We will discuss, in due time, most of the new features of server programming. If you're still not convinced, we will take a look at how you can harness the benefits of most of those features without leaving your application-centered point of view.

Databases are considered harmful

The simplest and least powerful way of looking at server programming is to view the database as a data bucket. Using only the most basic SQL statements, such as INSERT, SELECT, UPDATE, and DELETE, you can manipulate data a single row at a time and easily create application libraries that work with multiple databases.

This approach has some major drawbacks. Moving data back and forth to the database server one row at a time is extremely inefficient, and you will find that this method is simply not viable in a web-scale application.

This idea is usually associated with the concept of a database abstraction layer: a client library that allows the developer to switch the database out from under the application with little effort. Such layers are very useful in the open source development community because they let an application support many databases, but their authors have no financial incentive to extract the best possible performance from any particular one.

SQL, being based on relational algebra and tuple relational calculus, has the ability to quickly and efficiently perform set-based processing on large amounts of data; the application-side processing usually involves iterative looping, which is generally much slower.

In my 27-year career, I have never actually changed the database of an installed application without throwing away the application. One of the principles of agile software development is YAGNI (you ain't gonna need it). This is one of those cases.

Tip

Lesson learned

Data abstraction is valuable for projects that need to select a database platform at installation time. For anything else, just say no.

Encapsulation

Another technique used in more client-centric development philosophies is to isolate the database-specific calls into a library of procedures. This design is usually aimed at leaving the application in control of all the business logic. The application is still the king, and the database is still just a necessary evil.

This view of database architecture sells the application developer short by ignoring a toolbox full of tools and choosing only the hammer. Everything in the application is then painted to look like a nail and is smacked with the hammer.

Tip

Lesson learned

Don't give up on the power of the database just because it is not familiar. Use procedural languages and check out extension toolkits. There are some awesome pieces of work in there.

What does PostgreSQL offer?

So far, we've mentioned procedural languages, functions, triggers, custom data types, and operators. These things can be created directly in the database via the CREATE commands or added as libraries using extensions.

Now, we will show you some things that you need to keep in mind when programming on the server in PostgreSQL.

Data locality

If possible, keep the data on the server. Believe me, it's happier there, and performance is much better when modifying data there. If everything is done in the application layer, the data must be returned from the database, modified, and then finally sent back to the database for a commit. If you are building a web-scalable application, this should be your last resort.

Let's walk through a small snippet that uses two methods in order to make an update to a single record:

<?php
  // Connection parameters elided; pg_connect takes a keyword=value
  // string such as "host=... port=... dbname=... user=... password=..."
  $db = pg_connect("host=... port=... dbname=... user=... password=...");
  $result = pg_query($db, "SELECT * FROM customer WHERE id = 23");
  $row = pg_fetch_array($result);
  if ($row['account_balance'] > 6000) {
    $sql = "UPDATE customer SET valued_customer = true WHERE id = 23;";
    pg_query($db, $sql);
  }
  pg_close($db);
?>

This code snippet pulls a row of data from the database server to the client, makes an evaluation, and changes a customer account based on the evaluation. The result of the change is then sent back to the server for processing.

There are several things that are wrong with this scenario. First, the scalability is terrible. Imagine if this operation needed to be performed for thousands, or even millions, of customers. It would be painfully slow, because the code processes the records one by one: each record is fetched over the network, evaluated, and then updated, which means crossing the network again for every record.

The second problem is transactional integrity. What happens if the customer's account balance is changed by some other transaction between the query and the update? Is the customer still valued? That depends on the business reason for the evaluation.

Try out the following example:

<?php
  $db = pg_connect('...');
  pg_query($db, 'UPDATE customer SET valued_customer = true WHERE account_balance > 6000;');
  pg_close($db);
?>

This example is simple, has transactional integrity, and works for an incredibly large number of customers. Why point out such a simple and obvious example? Because many development frameworks work incorrectly by default: the code generator will produce some equivalent of the first, row-at-a-time example in the interest of being cross-platform, predictable, and easy to integrate into a simple design model.

This method promotes terrible practices. For systems that have a very low number of concurrent transactions, you will probably see what you expect, but as concurrency increases, the number of unintended behaviors also increases.

The second example exposes a better philosophy: operate on columns (not on rows), leave the data on the server, and let the database do the transactional work for you. That's what the database is made for.

More basics

It helps to have some basic background information before you start programming for the server. In the next few sections, we will explore the general technical environment in which you will be working. We will cover a lot of information, but don't worry too much about remembering it all right now. Just try to pick up the general idea.

Transactions

The default transaction isolation level in PostgreSQL is called Read Committed. This means that if multiple transactions attempt to modify the same data, they must wait for one another to finish before acting on the resulting data. They are served in first-come, first-served order. The final state of the data is what most people will naturally expect: the last chronological change is the one that is reflected.

PostgreSQL does not provide any way to do a dirty read. A dirty read is the ability to view the data the way it appears in someone else's transaction and to use it as if it were committed. This ability is not available in PostgreSQL because of the way in which the multiversion concurrency control works.

There are other transaction isolation methods available; you can read about them in detail at http://www.postgresql.org/docs/current/static/transaction-iso.html.
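For reference, a stricter isolation level can be requested per transaction; this is a minimal sketch, with the default Read Committed assumed everywhere else:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- All statements in this block see one consistent snapshot of the data;
-- conflicting concurrent changes cause a serialization error instead of
-- silently interleaving.
COMMIT;
```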

It is important to note that when no transaction block (BEGIN .. END) is specified, PostgreSQL treats each individual statement as its own transaction and commits immediately when the statement finishes. This gives other transactions a chance to settle between your statements. Some programming language drivers provide a transaction block around your statements, while some do not. Please check your language documentation to find out whether you are running in a transacted session.
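The difference looks like this in a single session, reusing the customer table from the earlier examples:

```sql
-- Autocommit: this statement is its own transaction and is visible
-- to other sessions as soon as it completes.
UPDATE customer SET valued_customer = true WHERE id = 23;

-- Explicit block: neither change is visible to other sessions until
-- COMMIT; a ROLLBACK here would undo both statements together.
BEGIN;
UPDATE customer SET valued_customer = true WHERE id = 23;
UPDATE customer SET account_balance = 0 WHERE id = 23;
COMMIT;
```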

Note

When using the two main clients to interact with PostgreSQL, the transaction behavior is different. The psql command-line client does not provide transaction blocks. You are expected to know when to start/stop a transaction on your own. The pgAdmin3 query window, on the other hand, wraps any statement that you submit into a transaction block for you. This way it provides a cancel option. If the transaction is interrupted, ROLLBACK will be performed and the database will go back to its former state.

Some operations are exempt from transactions. For example, a sequence object continues to increment even if the transaction fails and is rolled back. CREATE INDEX CONCURRENTLY manages its own transactions and cannot be called from within a transaction block. The same is true for VACUUM and CLUSTER.
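The sequence exemption is easy to see for yourself in a scratch database:

```sql
CREATE SEQUENCE test_seq;

BEGIN;
SELECT nextval('test_seq');  -- returns 1
ROLLBACK;

-- The rollback did not rewind the sequence:
SELECT nextval('test_seq');  -- returns 2
```

This is deliberate: it lets many concurrent transactions draw unique numbers without blocking one another.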

General error reporting and error handling

If you want to provide a status to the user during execution, you should be familiar with the RAISE NOTICE and NOTIFY commands. From a transactional perspective, the difference is that RAISE NOTICE sends the message immediately, even when wrapped in a transaction, while NOTIFY waits for the transaction to settle before sending a message. NOTIFY will, therefore, not actually notify you of anything if the transaction fails and is rolled back.
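A minimal illustration: the RAISE NOTICE message appears while the block is still running, whereas the NOTIFY payload reaches listeners only after COMMIT (and never, if the transaction rolls back):

```sql
-- In the receiving session:
LISTEN status_channel;

-- In the sending session:
BEGIN;
DO $$
BEGIN
  RAISE NOTICE 'sent immediately, even inside the transaction';
END;
$$;
NOTIFY status_channel, 'delivered only after the transaction commits';
COMMIT;
```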

User-defined functions

The ability to write user-defined functions is the powerhouse feature of PostgreSQL. Functions can be written in many different programming languages, can use any kind of control structures that the language provides, and in the case of "untrusted" languages, can perform any operation that is available in PostgreSQL.

Functions can provide features that are not even directly related to SQL. Some of the upcoming examples will show you how to get network address information, query the system, move files around, and do just about anything that your heart desires.

So, how do we access the sugary goodness of PostgreSQL? We start by declaring that we want a function:

CREATE OR REPLACE FUNCTION addition (integer, integer) RETURNS integer
AS $$
DECLARE retval integer;
BEGIN
  SELECT $1 + $2 INTO retval;
  RETURN retval;
END;
$$ LANGUAGE plpgsql;

What if we want to add three integers together? We can use the following code:

CREATE OR REPLACE FUNCTION addition (integer, integer, integer) RETURNS integer
AS $$
DECLARE retval integer;
BEGIN
  SELECT $1 + $2 + $3 INTO retval;
  RETURN retval;
END;
$$ LANGUAGE plpgsql;

We just invoked a concept called function overloading. This feature allows you to declare functions of the same name, but with different parameters that potentially behave differently. The difference can be as subtle as changing the data type of one of the arguments. The function that PostgreSQL invokes depends on the closest match to the arguments you pass in.
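Calling the two versions shows the overload resolution in action:

```sql
SELECT addition(2, 3);     -- resolves to the two-argument version, returns 5
SELECT addition(2, 3, 4);  -- resolves to the three-argument version, returns 9
```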

Suppose we want to add together any number of integers. Well, PostgreSQL has a way to do this too, as follows:

CREATE OR REPLACE FUNCTION addition (VARIADIC arr integer[]) RETURNS integer
AS $$
DECLARE retval integer;
BEGIN
  SELECT sum($1[i]) INTO retval FROM generate_subscripts($1, 1) g(i);
  RETURN retval;
END;
$$ LANGUAGE plpgsql;

This will allow you to pass in any number of integers and get an appropriate response. These functions, of course, do not handle real or numeric data types. To handle other data types, simply declare the function again with those types and call them with the appropriate parameters.
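As a sketch of that last point, a numeric overload of the variadic version looks identical apart from the types:

```sql
CREATE OR REPLACE FUNCTION addition (VARIADIC arr numeric[]) RETURNS numeric
AS $$
DECLARE retval numeric;
BEGIN
  SELECT sum($1[i]) INTO retval FROM generate_subscripts($1, 1) g(i);
  RETURN retval;
END;
$$ LANGUAGE plpgsql;

SELECT addition(1.5, 2.25, 3.25);  -- numeric literals resolve here; sums to 7
```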

For more information about variable parameters, check out http://www.postgresql.org/docs/9.3/static/xfunc-sql.html#XFUNC-SQL-VARIADIC-FUNCTIONS.

Other parameters

There is more than one way to get data into a function and out of it. We can also declare IN/OUT parameters, return a table, return a set of records, and use cursors for both the input and output.
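As one sketch of the OUT-parameter style, a function can hand back several values without declaring a composite type; the function name and columns here are illustrative:

```sql
CREATE OR REPLACE FUNCTION minmax(a integer, b integer,
                                  OUT lo integer, OUT hi integer)
AS $$
BEGIN
  -- OUT parameters act as pre-declared local variables;
  -- whatever they hold at the end becomes the result row.
  lo := least(a, b);
  hi := greatest(a, b);
END;
$$ LANGUAGE plpgsql;

SELECT * FROM minmax(7, 3);  -- lo = 3, hi = 7
```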

This brings us to a pseudotype called ANY. It allows the parameter type to be undefined, and it allows any basic data type to be passed to the function. Then, it is up to the function to decide what to do with the data. There are several other pseudotypes available in PostgreSQL, also called the polymorphic types. These are anyelement, anyarray, anynonarray, anyenum, and anyrange. You can read more about these pseudotypes at http://www.postgresql.org/docs/9.3/static/datatype-pseudo.html.

Here is an example of the pseudotype anyarray. The following simple function takes an array of any type as an argument and returns an array by removing all the duplicates:

CREATE OR REPLACE FUNCTION remove_duplicates(anyarray)
RETURNS anyarray AS
$$
  SELECT ARRAY(SELECT DISTINCT unnest($1));
$$
LANGUAGE sql;

postgres=# SELECT remove_duplicates(ARRAY[1,1,2,2,3,3]);
 remove_duplicates 
-------------------
 {1,2,3}
(1 row)

postgres=# SELECT remove_duplicates(ARRAY['a','a','b','c']);
 remove_duplicates 
-------------------
 {b,a,c}
(1 row)

More control

Once you have your function written the way you need, PostgreSQL gives you additional control over how the function executes. You can control what data the function can access and how PostgreSQL will interpret the expense of running the function.

There are two clauses that provide a security context for your functions. The first one is SECURITY INVOKER, which is the default security context. In the default context, the privileges of the calling user are respected by the function.

The other context is SECURITY DEFINER. In this context, the user privileges of the creator of the function are respected during the execution of the function. Generally, this is used to temporarily escalate user rights for a specific purpose.

This is very useful if you want to have a stricter control over your data. In an environment where security of the underlying data is important, and you don't want users to directly SELECT the data or change it using INSERT, UPDATE, or DELETE, you can create a set of functions with the security-definer attribute as APIs for the tables. This approach gives you complete control over your API behavior and how users can access the underlying objects.
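A hedged sketch of this API pattern, reusing the customer table from earlier; the role name app_role is hypothetical:

```sql
-- Runs with the privileges of its creator (for example, the table owner),
-- so callers never need SELECT rights on the customer table itself.
CREATE OR REPLACE FUNCTION get_balance(cust_id integer) RETURNS numeric
AS $$
  SELECT account_balance FROM customer WHERE id = cust_id;
$$ LANGUAGE sql SECURITY DEFINER;

-- Lock the function down, then grant it only to the application role.
REVOKE ALL ON FUNCTION get_balance(integer) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION get_balance(integer) TO app_role;
```

Users of app_role can now call get_balance() but can never touch the rows directly, which is the whole point of the API approach.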

Cost can also be defined for the function. This cost will help the query planner estimate how expensive it is to call the function. Higher orders of cost will cause the query planner to change the access path, so your function will be called as few times as possible. The PostgreSQL documentation shows these numbers to be a factor of cpu_operator_cost. That's more than a little misleading. These numbers have no direct correlation to CPU cycles. They are only relevant in comparison with one another. It's more like how some national money compares with the rest of the European Union. Some Euros are more equal than others.

To estimate your own function's complexity, start with the language you are implementing it in. For C, the default will be 1 * number of records returned, and it will be 100 for all other languages, according to the PostgreSQL documentation at http://www.postgresql.org/docs/current/static/sql-createfunction.html. For plsh, you may want to use 150 or more, depending on how many external tool calls are involved in getting an answer. The default is 100 and this seems to work reasonably well for PL/pgSQL.
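The cost is simply attached to the function definition, either at creation time or afterward; for example, to mark the earlier addition() function as relatively expensive:

```sql
-- Tell the planner each call costs roughly 150 * cpu_operator_cost,
-- so it will arrange the query to call the function as rarely as possible.
ALTER FUNCTION addition(integer, integer) COST 150;
```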
