9.3. BLOB storage with FileStream

Prior to 2008, SQL Server-based applications used one of two methods for storing binary large objects (BLOBs) such as video, images, or documents (PDFs, docs, and so forth). The first method was to store the object within the database in an image or varbinary(max) column. Alternatively, BLOBs were stored in file system files, with a link to the file (hyperlink/path) stored in a table column.

Both of these methods have their pros and cons. SQL Server 2008 introduces a third method known as FileStream. This method lets you combine the benefits of both of the previous methods while avoiding their drawbacks.

Before we continue, keep in mind that character-based BLOBs are often referred to as CLOBs, or character large objects. In some texts, BLOBs and CLOBs are referred to collectively as LOBs, or large objects. For the purposes of this section, we'll use the term BLOBs to refer to either binary large objects or character large objects.

Before we cover the new FileStream option, let's briefly cover the details of the previous methods of BLOB storage, both of which are still supported in SQL Server 2008.

9.3.1. BLOBS in the database

SQL Server's storage engine is designed and optimized for storage of normal relational data such as integer and character-based data. A fundamental design component of the SQL Server engine is the 8K page size, which limits the maximum size of each record. All but the smallest BLOBs exceed this size, so SQL Server can't store them in row like the rest of the record's data.

To get around the 8K limitation, SQL Server breaks the BLOB up into 8K chunks and stores them in a B-tree structure, as shown in figure 9.6, with a pointer to the root of the tree stored in the record's BLOB column.

Prior to SQL Server 2005, the primary data type for in-database BLOB storage was the image data type. SQL Server 2005 introduced the varbinary(max) data type to overcome some of the image limitations, discussed next.

Image and text data types

The primary data type used for binary-based BLOB storage prior to SQL Server 2005 is the image data type, and the text data type supports character-based BLOBs (CLOBs). Both data types provide support for BLOBs up to 2GB.[] Still supported in SQL Server 2008, these data types have a number of drawbacks that limit their usefulness, chiefly the inability to declare image or text variables in T-SQL batches. As such, accessing and importing BLOB data required a combination of programming techniques, reducing the appeal of in-database BLOB storage somewhat.

[] The NTEXT data type, used for Unicode, supports up to 1GB of data.

Varbinary(max) and varchar(max)

Introduced in SQL Server 2005, the varbinary(max) data type, and its text equivalents varchar(max) and nvarchar(max), overcome the limitations of the image and text data types by providing support for variable declaration and a range of other operations.

Figure 9.6. SQL Server implements support for BLOB storage by using a pointer to a B-tree structure in which BLOBs are broken up into 8K chunks and linked using pointers.
 

Such support makes BLOB access and importing much simpler than the equivalent process in SQL Server 2000 with image and text data types. Here's an example:

-- Insert a jpg file into a table using OPENROWSET
INSERT INTO clients (ID, DOB, Photo)
SELECT 1, '21 Jan 1974', BulkColumn
FROM OPENROWSET (Bulk 'F:photosclient_1.jpg', SINGLE_BLOB) AS blob

As a BLOB storage strategy, in-database storage allows BLOBS to be tightly coupled with the related data. The BLOBS are transactionally consistent—that is, updates on the BLOB are rolled forward or back in line with the rest of the record, and included in backup and restore operations. All good so far. The downside, however, is significant. For databases with large numbers of BLOBs, or even moderate amounts of very large BLOBs, the database size can become massive and difficult to manage. In turn, performance can suffer.

In addressing these concerns, a common design is to store BLOBs in the file system with an appropriate reference or hyperlink stored in the column.

9.3.2. BLOBS in the file system

The alternative to storing BLOBs in the database is to store them in their native format as normal files in the file system. Windows NTFS is much better at file storage than SQL Server, so it makes sense to store them there and include a simple link in the database. Further, this approach lets you store BLOBs on lower-cost storage, driving down overall costs.

An example of this approach is shown in figure 9.7. Here, the table contains a photolink column storing the path to a file system-based file.

Figure 9.7. Rather than store the BLOB in the database, an alternate approach is to simply store the link to the file in the database.
 

The problem with this approach is twofold; the data in the database is no longer transactionally consistent with the BLOB files, and database backups aren't guaranteed to be synchronized with the BLOBs (unless the database is shut down for the period of the backup, which isn't an option for any 24/7 system).

So on one hand we have transactional consistency and strongly coupled data at the expense of increased database size and possible performance impacts. On the other hand, we have storage simplicity and good performance at the expense of transactional consistency and backup synchronization issues. Clearly, both options have significant advantages and disadvantages; DBAs and developers often passionately argue in favor of one option over another. Enter FileStream, King of the BLOBs...

9.3.3. FileStream data

Offering the advantages of both file system and in-database storage is the FileStream data type, introduced in SQL Server 2008.

Overview

FileStream provides these advantages:

  • BLOBs can be stored in the file system. The size of each BLOB is limited only by the NTFS volume size limitations. This overcomes the 2GB limit of previous in-database BLOB storage techniques, which prevented SQL Server from storing certain BLOB types such as large video files.

  • Full transactional consistency exists between the BLOB and the database record to which it's attached.

  • BLOBs are included in backup and restore operations.

  • BLOB objects are accessible via both T-SQL and NTFS streaming APIs.

  • Superior streaming performance is provided for large BLOB types such as MPEG video.

  • The Windows system cache is used for caching the BLOB data, thus freeing up the SQL Server buffer cache required for previous in-database BLOB storage techniques.

FileStream data combines the transactional strength of SQL Server with the file management and streaming performance strengths of NTFS. Further, the ability to place FileStream BLOBs on separate, NTFS-compressed volumes provides opportunities to significantly lower overall storage costs.

Unfortunately, there are some limitations with FileStream data, which we'll come to shortly. In the meantime, let's run through the process of enabling and using FileStream.

Enabling filestream

In chapter 4, we discussed installing SQL Server 2008. One of the steps involved choosing to enable FileStream data. Once it's installed, you can enable or disable FileStream using SQL Server Configuration Manager. Just right-click the SQL Server service for a selected instance and choose Properties, and then select the FILESTREAM tab (as shown in figure 9.8). Here you can enable FileStream for T-SQL access and optionally for file I/O streaming access.

Figure 9.8. You can enable FileStream using the SQL Server Configuration Manager tool.
 

Once enabled through Configuration Manager (or as part of the initial installation), the SQL Server instance must then be configured as a secondary step using the sp_configure command. For example, to configure an instance for both T-SQL and Windows streaming access:

-- Enable FileStream Access for both T-SQL and Windows Streaming
EXEC sp_configure 'filestream access level', 2
GO
RECONFIGURE
GO

Here, we used 2 as the parameter value. 1 will enable FileStream access for T-SQL only, and 0 will disable FileStream for the instance. Let's take a look now at the process of creating a table containing FileStream data.

Using filestream

When creating a database containing FileStream data, the first thing we need to do is ensure there is a FileStream filegroup. In our next example, we'll create the database with a SalesFileStreamFG filegroup by specifying CONTAINS FILESTREAM. We also use a directory name (G:FSDATASALES in this example) to specify the location of the FileStream data. For optimal performance and minimal fragmentation, disks storing FileStream data should be formatted with a 64K allocation unit size, and be placed on disk(s) separate from both data and transaction log files.

-- Create a database with a FILESTREAM filegroup
CREATE DATABASE [SALES] ON PRIMARY
(  NAME = Sales1
   , FILENAME = 'M:MSSQLDatasalesData.mdf'
)
,  FILEGROUP [SalesFileStreamFG] CONTAINS FILESTREAM
(  NAME = Sales2
   , FILENAME = 'G:FSDATASALES'
)
LOG ON
(  NAME = SalesLog
   , FILENAME = 'L:MSSQLDatasalesLog.ldf'
)
GO

Next up, we'll create a table containing a column that will store FileStream data. In this example, the Photo column contains the FILESTREAM attribute with the varbinary(max) data type. Note that we're adding a UNIQUEIDENTIFIER column to the table with the ROWGUIDCOL attribute and marking it as UNIQUE. Such columns are mandatory for tables containing FileStream data. Also note the use of the FILESTREAM_ON clause, which specifies the filegroup to use for FileStream data.

-- Create a table with a FILESTREAM column
CREATE TABLE Sales.dbo.Customer
(
   [CustomerId] INT IDENTITY(1,1) PRIMARY KEY
   , [DOB] DATETIME NULL
   , [Photo] VARBINARY(MAX) FILESTREAM NULL

, [CGUID] UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWID()
) FILESTREAM_ON [SalesFileStreamFG];
GO

At this point, we're ready to insert data into the column. For the purposes of this example, we'll insert a simple text fragment. A more realistic example (but beyond the scope of this book) would be an application that allows a user to specify a local JPEG image that would be streamed into the column:

INSERT INTO Sales.dbo.Customer (DOB, Photo)
VALUES ('21 Jan 1975', CAST ('{Photo}' as varbinary(max)));
GO

After inserting this record, inspection of the file system directory specified for the FileStream filegroup will reveal something similar to that shown in figure 9.9.

As you can see in figure 9.9, there is no obvious correlation between database records and FileStream file or directory names. It's not the intention of FileStream to enable direct access to the resulting FileStream data using Windows Explorer. The important thing is that SQL Server maintains transactional consistency with the data and includes it in backup and restore commands.

As mentioned earlier, there are some limitations with the FileStream data type that you should consider before implementing it.

Filestream limitations

Despite the obvious advantages covered earlier, FileStream has some restrictions that limit its use as a BLOB storage technique:

  • Database mirroring, which we'll cover in chapter 11, can't be enabled on databases containing FileStream data.

  • Database snapshots, covered in the next chapter, aren't capable of including FileStream data. You can create a snapshot of a database containing FileStream data, but only if you exclude the FileStream filegroup.

  • FileStream data can't be encrypted; a database that uses transparent data encryption won't encrypt the FileStream data.

  • Depending on the BLOB size and update pattern, you may achieve better performance by storing the BLOB inside the database, particularly for BLOBs smaller than 1MB and when partial updates are required (for example, when you're updating a small section of a large document).

Figure 9.9. FileStream directories and files. As shown here, there is nothing obvious about FileStream directory and filenames to enable individual BLOB objects to be identified.
 

Of these limitations, perhaps the biggest is the inability to use database mirroring on databases containing FileStream data. In such cases, alternate BLOB storage techniques such as those covered earlier in this section are required. SQL Server Books Online (BOL) contains a number of other limitations, guidelines, and best practices for using the FileStream feature.

Despite its limitations, FileStream is a powerful new feature introduced in SQL Server 2008. The same can be said for data compression, our next topic.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.3.175