Chapter 5. Data Types

PostgreSQL supports the workhorse data types of any database: numerics, characters, dates and times, booleans, and so on. PostgreSQL sprints ahead by adding support for dates and times with time zones, time intervals, arrays, and XML. If that’s not enough, you can even add your own custom types. In this chapter, we’re not going to dwell on the vanilla data types, but focus more on showing you ones that are unique to PostgreSQL.

Numeric Data Types

You will find your everyday integers, decimals, and floating point numbers in PostgreSQL. Of the numeric types, we just want to highlight the serial and bigserial data types and a nifty function to quickly generate arithmetic series of integers.

Serial

Strictly speaking, serial is not a data type in its own right. serial and its bigger sibling bigserial are auto-incrementing integers. This data type goes by different names in different databases, autonumber being the most common alternative moniker. When you create a table and specify a column as type serial, PostgreSQL first creates a column of data type integer and then creates a sequence object in the background. It then sets the default of the new integer column to pull its value from the sequence. In PostgreSQL, a sequence is a database object in its own right and an ANSI-SQL standard feature you will also find in Oracle, IBM DB2, SQL Server 2012+, and some other relational databases. You can inspect and edit the object using pgAdmin or with ALTER SEQUENCE. You can edit its current value, where the sequence should begin and end, and even how many numbers to skip each time. Because sequences are independent objects, you can create them separately from a table using CREATE SEQUENCE, and you can share the same sequence among different tables. If you want two tables to never end up with a common identifier field, you could have both tables pull from the same sequence when creating new rows by setting the default value of the integer column to the sequence's nextval() function.
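The shared-sequence idea can be sketched as follows (the sequence and table names here are our own illustration, not from the book):

```sql
-- one sequence feeds the identifier columns of two tables
CREATE SEQUENCE global_id_seq;

CREATE TABLE customers(
  id integer NOT NULL DEFAULT nextval('global_id_seq') PRIMARY KEY,
  cname text);

CREATE TABLE vendors(
  id integer NOT NULL DEFAULT nextval('global_id_seq') PRIMARY KEY,
  vname text);
```

Because both defaults draw from global_id_seq, a customer and a vendor can never end up with the same id.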

Generate Series Function

PostgreSQL has a nifty function called generate_series() that we have yet to find in other leading databases. It’s actually part of a family of functions for automatically creating sequential rows. What makes generate_series() such a great function is that it allows you to perform FOR .. LOOP-like behavior in SQL. Suppose we want a list of the last day of each month for a particular date range. To do this in another language would either involve some procedural loop or creating a massive cartesian product of dates and then filtering. With generate_series(), you can do it with a query as shown in Example 5-12.

Here’s another example using integers with an optional step parameter:

Example 5-1. generate_series() with stepping of 13

SELECT x FROM generate_series(1,51,13) As x;
x
----
1
14
27
40

As shown in Example 5-1, you can pass in an optional step argument that defines how many steps to skip for each successive element. Leaving out the step will default it to 1. Also note that the end value will never exceed our prescribed range, so although our range ends at 51, our last number is 40 because adding another 13 to our 40 exceeds the upper bound.

Arrays

Arrays play an important role in PostgreSQL. They are particularly useful in building aggregate functions, forming IN and ANY clauses, as well as holding intermediary values for morphing to other data types. In PostgreSQL, each data type, including custom types you build, has a companion array type. For example, integer has an integer array type integer[], character has a character array type character[], and so forth. We’ll show you some useful functions to construct arrays short of typing them in manually. We will then point out some handy functions for array manipulations. You can get the complete listing of array functions and operators in the PostgreSQL reference Array Operators and Functions.

Array Constructors

The most rudimentary way to create an array is to simply type the elements:

SELECT ARRAY[2001, 2002, 2003] As yrs;

If the elements of your array can be extracted from a query, you can use the more sophisticated constructor function: array():

SELECT array(SELECT DISTINCT date_part('year', log_ts) 
 FROM logs ORDER BY date_part('year', log_ts));

Although array() has to be used with a query returning a single column, you can specify a composite type as the output, thus achieving multicolumn results. We demonstrate this in Custom and Composite Data Types.

You can convert delimited strings to an array with the string_to_array() function as demonstrated in Example 5-2:

Example 5-2. Converting a delimited string to an array

SELECT string_to_array('abc.123.z45', '.') As x;
x
---------------
{abc,123,z45}

array_agg() is a variant function that can take a set of any data type and convert it to an array, as shown in Example 5-3:

Example 5-3. Using GROUP BY with array_agg()

SELECT array_agg(log_ts ORDER BY log_ts) As x
FROM logs
WHERE log_ts BETWEEN '2011-01-01'::timestamptz AND '2011-01-15'::timestamptz;
x
---------------
{'2011-01-01', '2011-01-13', '2011-01-14'}

Referencing Elements in An Array

Elements in arrays are most commonly referenced using the index of the element. PostgreSQL array indexes start at 1. If you try to access an element above the upper bound, you won’t get an error—only NULL will be returned. The next example grabs the first and last element of our array column.

SELECT fact_subcats[1] AS primero
  , fact_subcats[array_upper(fact_subcats, 1)] As ultimo 
 FROM census.lu_fact_types;

We used array_upper() to get the upper bound of the array. The second, required parameter of the function indicates the dimension. In our case, our array is just one-dimensional, but PostgreSQL supports multi-dimensional arrays.
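A quick illustration of the dimension argument with a two-dimensional array (our own example, not from the book):

```sql
-- dimension 1 is the outer (row) bound, dimension 2 the inner (column) bound
SELECT array_upper(ARRAY[[1,2,3],[4,5,6]], 1) As rows_ub
  , array_upper(ARRAY[[1,2,3],[4,5,6]], 2) As cols_ub;
rows_ub | cols_ub
--------+---------
      2 |       3
```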

Array Slicing and Splicing

PostgreSQL also supports array slicing using the start:end syntax. What gets returned is another array that is a subset of the original. For example, if we wanted to return from our table new arrays that just contain elements 2 through 4 of each original, we would type:

SELECT fact_subcats[2:4] FROM census.lu_fact_types;

And to glue two arrays together end to end, we simply use the concatenation operator as follows:

SELECT fact_subcats[1:2] || fact_subcats[3:4] FROM census.lu_fact_types;

Character Types

There are three basic character types in PostgreSQL: character (a.k.a. char), character varying (a.k.a. varchar), and text. Unlike in other databases you might have worked with, text is not stored any differently from varchar, and there is no performance difference for data of the same size, so PostgreSQL has no need for distinctions like mediumtext, bigtext, and so forth. Even if a column is of type text, you can still sort by it. Any data larger than what can fit in a record page gets pushed to TOAST, so how text/varchar values are stored is contingent only on the actual size of the data in the field, and PostgreSQL handles it all for you. When using varchar, there are still some gotchas when you try to enlarge the number of characters: expanding the size of an existing varchar field in a table with many rows can take a while. People have different opinions as to whether you should abandon the use of varchar and just stick with text. Rather than waste space arguing about it here, read the debate at In Defense of Varchar.

Note

The difference between varchar with no size modifier and text is subtle. varchar has a cap of around 1 GB, and text has no limit. In practice, you can do things like override the behavior of varchar operators, which you can’t do easily with text. This override is particularly useful for cross-database compatibility. We demonstrate an example of this in Using MS Access with PostgreSQL, where we show how to make varchar behave without case sensitivity and still be able to use an index. varchar without a size modifier is essentially equivalent to SQL Server’s varchar(max).

Most people use text or varchar except for cases where a value should be exactly n characters long. The reason for this? character is right-padded with spaces out to the specified size for both storage and display; this costs more storage, though it is more semantically meaningful for a key that should be a fixed length. For comparisons, the extra spaces are ignored for character, but not for varchar. Performance-wise, there is no speed benefit to using character over varchar in PostgreSQL.
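A quick illustration of these comparison semantics (our own example):

```sql
-- 'ab'::char(4) is padded to 'ab  '; char comparison ignores the trailing
-- spaces, varchar comparison does not
SELECT 'ab'::char(4) = 'ab  '::char(4) As char_eq
  , 'ab'::varchar(4) = 'ab  '::varchar(4) As varchar_eq;
char_eq | varchar_eq
--------+------------
t       | f
```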

PostgreSQL has an abundant number of functions for parsing strings. In this section, we’ll give some common recipes we’ve found useful.

String Functions

The most common manipulations done to strings are padding, trimming off whitespace, and extracting substrings. PostgreSQL has no shortage of functions to aid you in these endeavors. In this section, we’ll provide examples of them. These functions have been around since the age of dinosaurs, so regardless of which version of PostgreSQL you’re using, you should have all of them at your disposal. PostgreSQL 9.0 introduced a new string aggregate function called string_agg(), which we demonstrated in Example 3-8. string_agg() is equivalent in concept to MySQL’s group_concat().

Example 5-4. Using lpad() and rpad() to pad

SELECT lpad('ab', 4, '0') As ab_lpad, rpad('ab', 4, '0') As ab_rpad, lpad('abcde', 4, '0') As ab_lpad_trunc;
ab_lpad | ab_rpad | ab_lpad_trunc
---------+---------+---------------
00ab    | ab00    | abcd

Observe that in Example 5-4, lpad() actually truncates instead of padding.

PostgreSQL has several functions for trimming text: trim() (a.k.a. btrim()), ltrim(), and rtrim(). By default, all of them remove spaces, but you can pass in an optional argument indicating other characters to trim.

Example 5-5. Using trims to trim space and characters

SELECT a As a_before, trim(a) As a_trim , rtrim(a) As a_rt, i As i_before, ltrim(i,'0') As i_lt_0, rtrim(i,'0') As i_rt_0, trim(i,'0') As i_t_0
FROM (SELECT repeat(' ', 4) || i::text || repeat(' ', 4) As a, '0' || i::text As i 
  FROM generate_series(0, 200, 50) As i) As x;
a_before| a_trim |  a_rt   | i_before | i_lt_0 | i_rt_0 | i_t_0
--------+--------+---------+----------+--------+--------+-------
0       | 0      |     0   | 00       |        |        |
50      | 50     |     50  | 050      | 50     | 05     | 5
100     | 100    |     100 | 0100     | 100    | 01     | 1
150     | 150    |     150 | 0150     | 150    | 015    | 15
200     | 200    |     200 | 0200     | 200    | 02     | 2

Splitting Strings into Arrays, Tables, or Substrings

There are a couple of functions useful in PostgreSQL for breaking strings apart.

The split_part() function is useful for getting an element of a delimited string.

Example 5-6. Get the nth element of a delimited string

SELECT split_part('abc.123.z45', '.', 2) As x;
x
--------
123

The string_to_array() function is useful for creating an array of elements from a delimited string. By combining string_to_array() with the unnest() function, you can expand the returned array into a set of rows.

Example 5-7. Convert delimited string to array to rows

SELECT unnest(string_to_array('abc.123.z45', '.')) As x;
x
--------
abc
123
z45

Regular Expressions and Pattern Matching

PostgreSQL’s regular expression support is downright fantastic. You can return matches as tables, arrays, or do fairly sophisticated replace and updates. Back-referencing and other fairly advanced search patterns are also supported. In this section, we’ll provide a short-sampling of these. For more information, refer to the official documentation, in the following sections: Pattern Matching and String Functions.

Our example shows you how to format phone numbers stored simply as contiguous digits:

Example 5-8. Reformat a phone number using back referencing

SELECT regexp_replace('6197256719', '([0-9]{3})([0-9]{3})([0-9]{4})', E'(\\1) \\2-\\3') As x;
x
----------------
(619) 725-6719

The \1, \2, and so on refer to the capturing groups in our pattern expression. The E'' is PostgreSQL syntax for denoting an escape string, in which the reverse solidus (\) acts as an escape character; that’s why each backreference is written with a doubled backslash, so that \\1 reaches the function as \1.

You might have a piece of text with phone numbers embedded; the next example shows how to extract the phone numbers and turn them into rows all in one step.

Example 5-9. Return phone numbers in piece of text as separate rows

SELECT unnest(regexp_matches('My work phone is (619)725-6719. My mobile is 619.852.5083. 
 Mi número de casa es 619-730-6254. Call me.',
   E'[(]{0,1}[0-9]{3}[)-.]{0,1}[0-9]{3}[-.]{0,1}[0-9]{4}', 'g')) As x;
x
---------------
(619)725-6719
619.852.5083
619-730-6254

Below, we list the matching rules for Example 5-9:

  • [(]{0,1}: Starts with 0 or 1 (.

  • [0-9]{3}: Followed by 3 digits.

  • [)-.]{0,1}: Followed by 0 or 1 of ), -, or ..

  • [0-9]{3}: Followed by 3 more digits.

  • [-.]{0,1}: Followed by 0 or 1 of - or ..

  • [0-9]{4}: Followed by 4 digits.

  • regexp_matches() returns a string array consisting of matches of a regular expression. If you don’t pass in the 'g' parameter, your array will just return the first match of the regular expression. The 'g' stands for global and returns all matches of a regular expression as separate elements.

  • unnest() is a function introduced in PostgreSQL 8.4 that explodes an array into a row set.

Note

There are many ways to write the same regular expression. \d is short-hand for [0-9], for example. But given the few characters you’d save, we prefer the more descriptive long form.
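For instance, the phone-number pattern from Example 5-9 could be restated tersely with \d and ? (our own rewrite of the same pattern, not the book’s):

```sql
-- [(]? means 0 or 1 (, \d{3} means 3 digits; the E'' string doubles each \
SELECT unnest(regexp_matches('Call 619-730-6254 or (619)725-6719.',
  E'[(]?\\d{3}[)-.]?\\d{3}[-.]?\\d{4}', 'g')) As x;
x
---------------
619-730-6254
(619)725-6719
```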

In addition to the wealth of regular expression functions, you can use regular expressions with the SIMILAR TO and ~ operators. In the next example, we’ll return all description fields with embedded phone numbers.

SELECT description FROM mytable WHERE description ~  E'[(]{0,1}[0-9]{3}[)-.]{0,1}[0-9]{3}[-.]{0,1}[0-9]{4}';

Temporal Data Types

PostgreSQL support for temporal data is the best of any database we've come across. In addition to the usual dates and times, PostgreSQL supports time zones, enabling the automatic handling of DST conversions by region. Details of the various types and DST support are covered in Data Types. Specialized data types such as interval allow for easy arithmetic using dates and times. Plus, PostgreSQL has the concept of infinity and negative infinity, saving us from explicitly having to create conventions that we’ll forget. Finally, PostgreSQL 9.2 unveiled range types, which provide support for date ranges as well as numeric ranges, along with companion operators, index bindings, and functions for working with them, and the ability to create new range types. There are nine data types available in a PostgreSQL database for working with temporal data, and understanding the distinctions is important to make sure you choose the right data type for the job. These data types are all defined in the ANSI-SQL 92 specs except for the PostgreSQL range types. Many other leading databases support some, but not all, of these data types. Oracle has the most varieties of temporal types, MS SQL 2008+ comes in second, and MySQL of any version comes in last (with no support for time zones in any version, and in lower versions not even properly checking the validity of dates).

  • date just stores the month, day, and year, with no timezone awareness and no concept of hours, minutes, or seconds.

  • time records hours, minutes, seconds with no awareness of time zone or calendar dates.

  • timestamp records both calendar dates and time (hours, minutes, seconds) but does not care about the time zone. As such, the displayed value of this data won’t change when you change your server’s time zone.

  • timestamptz (a.k.a. timestamp with time zone) is a time zone-aware date and time data type. Internally, timestamptz is stored in Coordinated Universal Time (UTC), but display defaults to the time zone of the server (or database/user should you observe differing time zones at those levels). If you input a timestamp with no time zone and cast to one with time zone, PostgreSQL will assume the server’s time zone. This means that if you change your server’s time zone, you’ll see all the displayed times change.

  • timetz (a.k.a. time with time zone) is the lesser-used sister of timestamptz. It is time zone-aware but does not store the date. It always assumes DST of the current time. Some programming languages with no concept of time without date may map timetz to a timestamp with a time zone at the beginning of time (for example, Unix Epoch 1970, thus resulting in DST of year 1970 being used).

  • interval is a duration of time in hours, days, months, minutes, and others. It comes in handy when doing date-time arithmetic. For example, if the world is supposed to end in exactly 666 days from now, all you have to do is add an interval of 666 days to the current time to get the exact moment when it’ll happen (and plan accordingly).

  • tsrange is new in PostgreSQL 9.2 and allows you to define open and closed ranges of timestamp with no time zone. The type consists of two timestamps and open/closed range qualifiers. For example, '[2012-01-01 14:00, 2012-01-01 15:00)'::tsrange defines a period starting at 14:00 but ending before 15:00.

  • tstzrange is new in PostgreSQL 9.2 and allows you to define open and closed ranges of timestamp with time zone.

  • daterange is new in PostgreSQL 9.2 and allows you to define open and closed ranges of dates.
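To make the interval and range bullets concrete, here is a small sketch (the dates are arbitrary; the @> containment operator ships with the 9.2 range types):

```sql
-- date-time arithmetic with an interval
SELECT '2012-02-14 10:00'::timestamp + interval '666 days' As doomsday;

-- range containment: the ) qualifier excludes the 15:00 end bound
SELECT '[2012-01-01 14:00, 2012-01-01 15:00)'::tsrange @> '2012-01-01 14:59'::timestamp As in_range
  , '[2012-01-01 14:00, 2012-01-01 15:00)'::tsrange @> '2012-01-01 15:00'::timestamp As at_end;
-- in_range is t; at_end is f because the end of the range is open
```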

Time Zones: What They Are and What They Aren’t

A common misconception about PostgreSQL time zone-aware data types is that extra time zone information is stored along with the timestamp itself. This is incorrect. If you save 2012-02-14 18:08:00-08 (-08 being the Pacific offset from UTC), PostgreSQL internally works like this:

  • Get the UTC time for 2012-02-14 18:08:00-08. This would be 2012-02-15 02:08:00+00.

  • PostgreSQL stores the value 2012-02-15 02:08:00 in the database.

When you call the data back for display, PostgreSQL goes through the following steps:

  • Find the time zone observed by the server (or the one requested). Suppose it’s “America/New_York”; get that zone’s offset from UTC for the moment in question. For things that just have a time element, like timetz, the offset assumed—if not specified—is the current local time offset. Let’s suppose it’s -5. You can also specify an offset directly instead of a time zone name to avoid the Daylight Savings check.

  • Compute the date time 2012-02-15 02:08:00 with a -5 offset to get 2012-02-14 21:08:00, then display it as 2012-02-14 21:08:00-05.

As you can see, PostgreSQL doesn’t store the time zone but simply uses it to know how to convert it to UTC for storage. The main thing to remember is that the input time zone is only used to compute the UTC value and once stored, that input time zone information is gone. When PostgreSQL displays back the time, it always does so in the default time zone dictated by the session, user, database, or server and checks in that order. If you employed time zone aware data types, we implore you to consider the consequence of a server move from one time zone to another. Suppose you based a server in New York City, and subsequently restored the database in Los Angeles. All timestamp with time zone fields would suddenly display in Pacific time. This is fine as long as you anticipate this behavior.
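You can watch this per-session display behavior with SET timezone (our own demonstration, reusing the timestamp from above; the session dates fall in February, when New York is at -05):

```sql
SET timezone = 'America/New_York';
SELECT '2012-02-14 18:08:00-08'::timestamptz;
-- 2012-02-14 21:08:00-05

SET timezone = 'UTC';
SELECT '2012-02-14 18:08:00-08'::timestamptz;
-- 2012-02-15 02:08:00+00
```

The stored value never changes; only the display offset applied on the way out does.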

Here’s an example where something can go wrong. Suppose that McDonald’s had their server on the East Coast and the opening time for stores is timetz. A new McDonald’s opens up in San Francisco. The new franchisee phones McDonald’s HQ to add their store to the master directory with an opening time of 7 a.m. The data entry dude enters the information as he is told—7 a.m. PostgreSQL interprets this to mean 7 a.m. Eastern, and now people are waiting in line wondering why they can’t get their breakfast sandwiches at 4 a.m. Being hungry is one thing, but we can imagine many situations where a screw-up by three hours could mean life or death.

So why would anyone want to use time zone aware data types? First, it does save having to do time zone conversions manually. For example, if a flight leaves Boston at 8 a.m. and arrives in Los Angeles at 11 a.m., and your server is in Europe, you don’t want to have to figure out the offset for each manually. You could just enter the data with the Boston and Los Angeles offsets. There’s another convincing reason to use time zone aware data types: the automatic handling of Daylight Savings Time. With countries deviating more and more from each other in DST observation schedules and even changing from year to year, manually keeping track of DST changes for a globally used database would almost require a dedicated programmer who does nothing but keep up to date with the latest DST schedules.

Here’s an interesting example: a traveling salesperson catches a flight home from San Francisco to nearby Oakland. When he boards the plane, the clock at the terminal reads 2012-03-11 1:50 a.m. When he lands, the clock in the terminal reads 2012-03-11 3:10 a.m. How long was the flight? With time zone-aware timestamps, you get 20 minutes, which is the plausible answer for a short flight across the Bay. We get the wrong answer if we don’t use time zone-aware timestamps.

SELECT '2012-03-11 3:10AM'::timestamptz - '2012-03-11 1:50AM'::timestamptz;

gives you 20 minutes, while

SELECT '2012-03-11 3:10AM'::timestamp - '2012-03-11 1:50AM'::timestamp;

gives you 1 hour and 20 minutes.

We should add that your server’s time zone needs to be one that observes US daylight saving time for the above discrepancy to show up.

Let’s drive the point home with more examples, using a Boston server.

Example 5-10. Inputting time in one time zone and output in another

SELECT '2012-02-28 10:00 PM America/Los_Angeles'::timestamptz;
2012-02-29 01:00:00-05

For Example 5-10, I input my time in Los Angeles local time, but since my server is in Boston, I get a time returned in Boston local time. Note that it does give me the offset, but that is merely display information. The timestamp is internally stored in UTC.

Example 5-11. Timestamp with time zone to timestamp at location

SELECT '2012-02-28 10:00 PM America/Los_Angeles'::timestamptz AT TIME ZONE 'Europe/Paris';
2012-02-29 07:00:00

In Example 5-11, we are getting back a timestamp without time zone, so the answer you get when you run this same query will be the same as mine. The query is asking: what time is it in Paris if it’s 2012-02-28 10:00 p.m. in Los Angeles? Note the absence of a UTC offset in the result. Also notice that you can specify a time zone by its official name rather than just an offset; visit Wikipedia (http://en.wikipedia.org/wiki/Zoneinfo) for a list of official time zone names.

Operators and Functions for Date and Time Data Types

The inclusion of a temporal interval data type greatly eases date and time arithmetic in PostgreSQL. Without it, we’d have to create another family of functions or use a nesting of functions as most other databases do. With intervals, we can add and subtract timestamp data simply by using the arithmetic operators we’re intimately familiar with. Table 5-1 provides a listing of operators and functions used with date and time data types.

Table 5-1. Date and Timestamp Operators

Operator | Example
+ Adding an interval
SELECT '2012-02-10 11:00 PM'::timestamp + interval '1 hour';
2012-02-11 00:00:00
- Subtracting an interval
SELECT '2012-02-10 11:00 PM'::timestamptz - interval '1 hour';
2012-02-10 22:00:00-05
OVERLAPS Returns true or false if two temporal ranges overlap. This is an ANSI-SQL operator equivalent to the functional overlaps(). OVERLAPS takes four parameters, the first pair and the last pair constitute the two ranges.
SELECT ('2012-10-25 10:00 AM'::timestamp, '2012-10-25 2:00 PM'::timestamp) OVERLAPS 
    ('2012-10-25 11:00 AM'::timestamp, '2012-10-26 2:00 PM'::timestamp) AS x, 
  ('2012-10-25'::date, '2012-10-26'::date) OVERLAPS 
    ('2012-10-26'::date, '2012-10-27'::date) As y;
x      | y
-------+--------
t      | f

OVERLAPS considers the time periods to be half-open, meaning that the start is included but the end is not. This is slightly different behavior from the common BETWEEN operator, which considers both start and end to be included. The quirk with OVERLAPS won’t appear unless one of your ranges is a fixed point in time (a period whose start and end are identical). Do watch out for this if you’re an avid user of the overlaps function.
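A quick illustration of the fixed-point quirk (our own example): a single instant that lands exactly on another range’s end is not considered overlapping, because the end is excluded, whereas BETWEEN includes it:

```sql
-- the first pair is a fixed point in time (start = end)
SELECT ('2012-10-26'::date, '2012-10-26'::date) OVERLAPS 
    ('2012-10-25'::date, '2012-10-26'::date) As x, 
  '2012-10-26'::date BETWEEN '2012-10-25'::date AND '2012-10-26'::date As y;
x      | y
-------+--------
f      | t
```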

In addition to the operators, PostgreSQL comes with functions with temporal types in mind. A full listing can be found here Date Time Functions and Operators. We’ll demonstrate a sampling here.

Once again, we start with the versatile generate_series() function. In PostgreSQL 8.3 and above, you can use this function with temporal types and interval steps.

Example 5-12. Generate a time series using generate_series()

SELECT (dt - interval '1 day')::date As eom 
FROM generate_series('2/1/2012', '6/30/2012', interval '1 month') As dt;
eom
------------
2012-01-31
2012-02-29
2012-03-31
2012-04-30
2012-05-31

As you can see in Example 5-12, we can express dates in our local date time format, or the more global ISO Y-M-D format. PostgreSQL automatically interprets differing input formats. To be safe, we tend to stick with entering dates in ISO, because date formats vary from culture to culture, server to server, or even database to database.

Another popular activity is extracting or formatting parts of a complete date time. Here, the functions date_part() and to_char() come to the rescue. The next example will also drive home the observance of DST for a time zone-aware data type.

Example 5-13. Extracting elements of a date time

We intentionally chose a period that crosses a daylight savings switchover in US Eastern time.

SELECT dt, date_part('hour',dt) As mh, to_char(dt, 'HH12:MI AM') As formtime 
 FROM generate_series('2012-03-11 12:30 AM', '2012-03-11 3:00 AM', interval '15 minutes') As dt;
dt                     | mh | formtime
-----------------------+----+----------
2012-03-11 00:30:00-05 |  0 | 12:30 AM
2012-03-11 00:45:00-05 |  0 | 12:45 AM
2012-03-11 01:00:00-05 |  1 | 01:00 AM
2012-03-11 01:15:00-05 |  1 | 01:15 AM
2012-03-11 01:30:00-05 |  1 | 01:30 AM
2012-03-11 01:45:00-05 |  1 | 01:45 AM
2012-03-11 03:00:00-04 |  3 | 03:00 AM

By default, generate_series() will assume timestamp with time zone if you don’t explicitly cast to timestamp. It will always return timestamp with time zone.

XML

The XML data type is perhaps one of the more controversial types you’ll find in a relational database. It violates principles of normalization and makes purists cringe. Nonetheless, all of the high-end proprietary relational databases (IBM DB2, Oracle, SQL Server) support it. PostgreSQL jumped on the bandwagon and offers plenty of functions to work with data of XML type. We’ve also authored many articles on working with XML in the context of PostgreSQL. (For further reading, you can find these articles at http://www.postgresonline.com/journal/index.php?/plugin/tag/xml.) PostgreSQL comes packaged with various functions for generating, concatenating, and parsing XML data. These are outlined in PostgreSQL XML Functions.

Loading XML Data

To start, we show you how to get XML data into a table:

Example 5-14. Populate XML field

INSERT INTO web_sessions(session_id, session_state) 
 VALUES ('robe'
    , '<session><screen_properties>
 <prop><name>color</name><val>red</val></prop>
 <prop><name>background</name><val>snoopy</val></prop>
 </screen_properties></session>'::xml);
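Example 5-14 assumes the web_sessions table already exists; a minimal definition consistent with the insert (our own sketch, not necessarily the book’s actual DDL) would be:

```sql
-- hypothetical definition of the web_sessions table used above
CREATE TABLE web_sessions(session_id varchar(32) PRIMARY KEY, session_state xml);
```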

Querying XML Data

For querying XML, the xpath() function is really useful. The first argument is an XPath query statement, and the second is an XML string. The output is an array of XML objects that satisfy the XPath query. In Example 5-15, we’ll combine xpath() with unnest() to return all the screen property names. Remember that unnest() unravels the array into a row set. We then cast the XML fragment to text:

Example 5-15. Query XML field

SELECT (xpath('/prop/name/text()', prop))[1]::text As pname 
  , (xpath('/prop/val/text()', prop))[1]::text As pval 
 FROM (SELECT unnest(xpath('/session/screen_properties/prop', session_state)) As prop 
   FROM web_sessions WHERE session_id = 'robe') As x;
pname      |  pval
-----------+--------
color      | red
background | snoopy

The subquery uses unnest() to unravel the array of <prop> elements into a row set. The outer query then gets the text element inside the name and val tags of each prop element. We need to use array subscripting because XPath always returns an array, even if there’s only one element to return.

Custom and Composite Data Types

In this section, we’ll demonstrate how to define a simple custom type and use it. The composite (a.k.a. record or row) type is a special type in PostgreSQL because it’s often used to build an object that is then cast to a custom type, or as the return type of functions needing to return multiple columns.

All Tables Are Custom

As mentioned earlier, PostgreSQL automatically creates a custom type for every table. For all intents and purposes, you can use custom types just as you would any other built-in type. So we could conceivably create a table that has a column whose type is another table’s custom type, and we can go even further and make it an array of that type. We’ll go ahead and demonstrate this table turducken:

CREATE TABLE user_facts(user_id varchar(30) PRIMARY KEY, facts census.facts[]);
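The census.facts type referenced here comes from the book’s census schema; a minimal stand-in consistent with the columns shown in the output below (our own sketch, not the book’s actual DDL) might look like:

```sql
-- hypothetical shape of census.facts, inferred from the query output
CREATE TABLE census.facts(
  fact_type_id int, tract_id varchar(11), yr int,
  val numeric(12,3), perc numeric(6,2));
```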

We can create an instance of factoid composite type as follows:

ROW(86,'25001010206', 2012, 123, NULL)::census.facts

And then stuff this factoid into our table:

INSERT INTO user_facts(user_id, facts) 
 VALUES('robe', ARRAY[ROW(86, '25001010206', 2012, 123, NULL)::census.facts]);

We can add more factoids to the same row using the array || (concatenation) operator and the array constructor array():

UPDATE user_facts 
 SET facts = facts || array(SELECT F FROM census.facts AS F WHERE fact_type_id = 86) 
WHERE user_id = 'robe';

Finally, we can query our composite array column:

SELECT facts[5].*, facts[1].yr As yr_1 FROM user_facts WHERE user_id = 'robe';
fact_type_id |  tract_id   |  yr  |   val    | perc | yr_1
-------------+-------------+------+----------+------+------
86           | 25001010304 | 2010 | 2421.000 |      | 2012

Building Your Own Custom Type

Although you can easily create composite types just by creating a table, at some point, you’ll probably wish to build your own from scratch. For example, let’s build a complex number data type with the following statement:

CREATE TYPE complex_number AS (r double precision, i double precision);

We can then use this complex_number as a column type:

CREATE TABLE circuits(circuit_id text PRIMARY KEY, tot_volt complex_number);
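To populate the table, the ROW constructor works for the composite column as well (the circuit id and values here are made up for illustration):

```sql
INSERT INTO circuits(circuit_id, tot_volt) 
 VALUES ('C1', ROW(120.5, 0.25)::complex_number);
```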

We can then query our table with statements such as:

SELECT circuit_id, (tot_volt).* 
 FROM circuits;

or an equivalent:

SELECT circuit_id, (tot_volt).r, (tot_volt).i
 FROM circuits;

Note

People from other databases are a bit puzzled by the (tot_volt) syntax. If you leave out the parentheses for a composite type that is not an array, you get an error of the form missing FROM-clause entry for table “tot_volt”, because tot_volt could just as easily refer to a table called tot_volt.

Although we didn’t show it here, you can also define operators that work with your custom type. You could define a + operator for addition between two complex numbers, or between a complex number and a real number. Being able to build custom types and operators pushes PostgreSQL to the boundary of a full-fledged development environment, bringing us ever closer to our conception of an ideal world where everything is table-driven.
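As a sketch of how such an operator could be wired up (our own minimal example, not from the book; the function name complex_add is our invention):

```sql
-- an SQL function that adds two complex numbers component-wise
CREATE FUNCTION complex_add(complex_number, complex_number) RETURNS complex_number AS 
$$ SELECT ROW(($1).r + ($2).r, ($1).i + ($2).i)::complex_number $$ 
LANGUAGE sql IMMUTABLE;

-- bind the function to the + operator for our type
CREATE OPERATOR + (LEFTARG = complex_number, RIGHTARG = complex_number, 
  PROCEDURE = complex_add);

SELECT ROW(1,2)::complex_number + ROW(3,4)::complex_number As summed;
-- (4,6)
```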
