CLR-Based User-Defined Types

With the advent of .NET integration in SQL Server 2005, you can now create CLR-based User-Defined Types (UDTs) using a .NET language of your choice. Much has been debated on the subject already since Microsoft announced their plans for strong .NET integration in the product. The whole concept of .NET (as well as XML) integration in a relational database is controversial. I must admit that I had my concerns. I thought that the worlds would collide, and that they belonged to separate universes. I was especially concerned about the large potential for abuse of these new features if they were not used wisely. In this respect, my original opinion still stands, but I believe that it’s the role of education to prevent such abuse. These features are integral parts of SQL Server 2005; so even though this book’s focus is T-SQL programming, I thought it should give you enough background and tools to decide when to use CLR and XML and when to use T-SQL.

My main source of information on the subject was my good friend Dejan Sarka, a SQL Server MVP and a mentor with Solid Quality Learning. Relational theory and data modeling is Dejan’s realm. While consulting with him, he made me realize that CLR and XML integration are actually in accord with the relational model. Their coexistence with T-SQL in SQL Server actually makes a lot of sense. I find his vision fascinating. To do justice to the subject, I asked Dejan to cover it, as well as the other parts of the book that discuss .NET and XML integration and their role in the relation model.

Theoretical Introduction to UDTs

Before showing you how to implement CLR UDTs, I’d first like to give you some theoretical background. This knowledge should help you decide how to implement CLR UDTs properly. A common question these days is whether SQL Server, with all this CLR (and XML) support, has become an object-relational or even an object database management system. I’ll explain why I think this is not the case and that, rather, SQL Server is becoming a more complete relational database management system.

Domains and Relations

To have a unified object-relational world, you must answer a single crucial question: what concept is there in the relational world that is a counterpart to the concept object class in the object world? In object-oriented programming (OOP), the most important concept is a class used as a template for instantiating (creating) objects; objects are variables that physically live in computer memory. So what are the most important concepts in the relational world?

Note

Note

My views on the subject have been largely affected by the work of Chris J. Date and Hugh Darwen in their famous book Foundation for Object/Relational Databases: The Third Manifesto (Addison-Wesley Professional, 1998).

Relational databases contain relations. Table 1-14 is an example of a typical Employees relation, which is physically represented in the database as a table.

Table 1-14. Employees Relation

EmployeeID: posInt

Name: string

City: cityenum

17

Fernando

Alicante

19

Alejandro

Bogotá

24

Herbert

Vienna

20

Douglas

Atlanta

In every row, you can find a proposition. For example, the second row represents an employee with an ID of 19, with the name Alejandro, who lives in Bogotá, Colombia. In the table’s header, you can find the predicates for the propositions: an employee with an ID that is a positive integer; a name that is an arbitrary string; and who lives in a city, which is included in some enumeration of cities. Notice the names of the domain components in the table’s header. Domains limit the propositions. Because an employee ID is a positive integer, it cannot be lower than zero. You can have only cities from the city list. Domains constrain the universe of discourse–they constitute the things you can talk about. Relations constitute propositions–the truths you utter about those things. With domains and relations, you can describe the whole subsystem of employees. Therefore, domains and relations are necessary and also sufficient to represent reality. These are the most important concepts in the relational world.

I have to emphasize the constraining role of domains. Propositions are assertions of facts, and databases are structured collections of propositions. If propositions are not true, the database represents falsehoods, not facts. Therefore, integrity is fundamental for databases. Constraints help you maintain integrity. But what is integrity? You must have some concepts in place to define conceptual integrity. You should be aware of the fact that you usually do not know who the users (people, applications) of a database are. Those users can come from any part of the world, any culture, any religion. You have to carefully choose the concepts on which integrity will be based. Preferably, those concepts and their nature should be few in number and agreed upon.

Let me explain this theoretical part with an example: imagine that you have a domain called Spouse. You should define a constraint for this domain: a person can have one spouse only. By defining such a constraint, you already made a mistake because some cultures and religions don’t limit this association to a one-to-one relationship! Still, a smart domain can include all possible constraints for an attribute. Whenever you try to manipulate the data, the system could check that the operands are of the correct types for the operation–for example, they have the correct domains. (A domain is a synonym for a data type.) For example, the system could allow a multiplication operation between the attributes Quantity and Price but not allow addition, which has no meaning in this context.

As you know, relational databases are based on set theory. A relation represents a set–that is, a set of propositions. A set consists of unique members; therefore, you need a way to distinguish between different propositions. You need to define candidate keys, out of which you will choose one as the primary key. Domains and keys are sufficient to implement all possible constraints. By the way, this is how the Domain-Key normal form (DKNF) is defined: a relation is in Domain/Key Normal Form if every constraint on the relation is a logical consequence of the definition of keys and domains. A relation in DKNF is free from all update anomalies. Of course, there is no simple algorithmic procedure to implement DKNF.

Relations and Classes

Developers commonly think of database relations the same way they think of classes in OOP. Object/relational mapping tools translate classes to relations. I’ll try to prove that this common perception is wrong, although the practical implementation of the mapping is usually correct.

Starting with relations, remember this: a relation is a set. Set theory defines the Axiom of Extensionality, which states that sets formed by the same elements are equal. There’s a corollary here: sets formed by different elements are different. What does this mean? If you insert, update or delete a proposition (a physical row), you are actually replacing one set (or relation) with a different one.

Note

Note

The Axiom of Extensionality comes from Zermelo-Fraenkel Set Theory, a version of set theory that is a formal system expressed in first-order predicate logic.

In programming, a variable is a holder of an encoded value. A value is an individual constant that has no location in time and space. A variable, on the other hand, has a strictly defined location in time and space. Many different variables can simultaneously contain an encoding of the same value. Updating a variable means replacing the value it holds with some other value.

Comparing the last two paragraphs, you can logically deduct that a relation is a variable! At a specific moment in a specific location, a relation contains a set; updating a relation means replacing its value with another set. Thinking of relations as variables can be quite confusing; in OOP, variables live in computer memory for a short time and are treated as transient, while relations live for a longer time on hard disk and are treated as persistent. But where exactly is the border between transience and persistence? Isn’t persistence just another level of transience and vice versa? Considering the physical layer, with all the progress in hardware, there is no strict border between memory and persistent storage anymore as well. Also consider different collections in OOP languages; isn’t a set just another collection with stricter rules?

As you can see, a relation is not the same thing as a class. To add more proofs, let’s for a moment assume that a relation is a class. If a relation were a class, it could contain other relations because a class can contain subclasses. Table 1-15 shows an example of such a relation.

Table 1-15. A Relation with a Multivalued Column

EmployeeID: posInt

Name: string

City: cityenum

Hobbies: relation

17

Fernando

Alicante

Name: string

Basketball

Beer

Type: string

Sport

NearlySport

19

Alejandro

Bogotá

...

 

24

Herbert

Vienna

...

 

20

Douglas

Atlanta

...

 

But according to relational theory, a relation can contain scalar values only (first Normal Form). A subrelation is a collection, yielding multivalued attributes. Note that a scalar value does not mean that the value cannot be internally complex. It means that you are dealing with a single item, complex as it might be. Consider SQL Server’s DATETIME data types covered earlier in the chapter. Remember that internally these datatypes consist of two parts–date and time–but we refer to a single timestamp; hence, DATETIME datatypes are scalar. Conversely, a collection of DATETIME values would be a multivalued attribute. Thus, if a relation is a class, it is no longer a relation.

To reinforce the point I’m trying to make, if a relation is a class, a row represents an object and a column represents a public instance of a variable. But in OOP, encapsulation is one of the most important concepts. Encapsulation means that internal members (variables) are hidden and accessible only through public operations (methods). Encapsulation proved to be very useful, so even special methods called properties evolved. Properties are merely standard methods that access (accessors) and replace (mutators) the values of a variable. So, starting with the assumption that a relation is a class and following the last two deductions, you get an absurd conclusion–a relation is neither a relation nor a class. Thus, a relation is not a class. QED.

I’d like to add a couple of words about entities. The term entity is very abstract and vague, and often used very loosely as a result of the difficulty in describing abstract things. There, I just had this difficulty myself, finishing the last sentence trying to describe the uses of this term and managing only the awkward "things." When you are designing a database, you are searching for entities and represent them as relations. When you are designing an OOP application, you are searching for entities and represent them as classes. So you could ask yourself: if you are searching for the same thing, aren’t then relations and classes the same? The question is tricky. Let me ask another question: what exactly is an entity? Here’s the definition from The Free Dictionary (http://www.thefreedictionary.com): "something that exists as a particular and discrete unit." It is a very abstract concept. We use the term entity to have a common dictionary for business objects, relational databases, and OOP applications. If things have a common abstract name, it does not mean they are the same!

Finally, if a relation is a class, where do domains fit in the picture?

Domains and Classes

According to Fabian Pascal (Practical Issues in Database Management [Addison-Wesley Professional, 2000]), a domain consists of the following:

  • A name

  • One or more named possible representations:

    • One is physically stored

    • At least one is declared to the users

  • Type constraints

  • A set of operators permissible on the type’s values

Note the last bullet: the set of operators permissible on the type’s values is actually what constrains us, besides explicit type constraints. Also, internal representations are hidden; you deal with presentations declared to the users. This definition does not deal with simple or complex types. It is correct for all types. Think of the integer type: addition, subtraction, and multiplication operations are defined on this domain, while integer division (DIV) does not have very nice properties. To summarize, a domain is a data type of arbitrary internal complexity, whose values are manipulable solely by means of the operators defined for the type; internal representations are hidden.

Now let’s write the definition for a class: a class is a data type of arbitrary internal complexity, whose values are manipulable solely by means of the operators defined for the type; internal representations are hidden. The definitions for a class and a domain are the same. Therefore, the proper equation says a domain is a class. Or, stated differently, an object-relational system is nothing more than a true relational system.

Complex Domains

Now that you know that a domain is the same thing as a class, you can ask another question: why not use arbitrary complex domains? You could implement an Employees relation in different ways:

  • Domain EmployeeDomain (EmployeeID, Name, City) and relation Employees (employee: EmployeeDomain)

  • Domains PosInt, String and CityEnum and relation Employees (EmployeeId: PosInt, Name: String, City: CityEnum)

The problem with the first design is that it enforces a higher level of encapsulation. An employee has no visible components. It is less open to end users. (Developers are also end users in this context.) Re-creating propositions is very difficult work, if not nearly impossible. All possible constraints except the primary key are included in the domain. Operations allowed on EmployeeDomain values are not commonly known; the author of the domain has to explain them to the public. The number of operations that the creator of the domain has to define rises exponentially with the complexity. Remember, you usually don’t know who the users of the domain are going to be. It is quite possible that the constraints will not be acceptable to all of them. You don’t want to implement such a high level of encapsulation. Clearly, the second design is preferable.

You realize that simple domains are preferable for relational database design. But why do you call built-in types such as integers simple? Well, that’s because it is simple for you to deal with values of those types. From school, you know what operations are allowed with integer types, how the operations are named, and how they should be implemented. You don’t have to learn new concepts to handle integers. The producers of database management systems are implementing standard definitions. This is why you regard built-in types as simple. Even a small deviation from commonly accepted concepts, such as SQL Server’s DATETIME data types, involves complexities that you have to learn and deal with. That’s the main reason why the second design of a relation is preferable–the design with simple domains, where propositions are less encapsulated.

You can build constraints in many different places:

  • Data types

  • Database schemas (table structures), including data types, nullability, uniqueness

  • Check and other constraints as explicit objects

  • Lookup tables, which are actually check constraints using enumerations with any finite cardinality

  • Triggers

  • Stored procedures

  • Data Access Layer (DAL) procedures

  • Middle-tier code

  • Client code

  • Screen event handlers

  • Code built dynamically from values in a database

Items at the top of the list are closer to the data. Going down the list, you’re moving away from the data. The further you are from the data, the easier it is to circumvent a constraint. On the other hand, the higher you are in the list, the harder it is to change a constraint. This fact can be a problem if a constraint is volatile. A domain is the closest possible place to the data. Therefore, it is a suitable place to implement constraints that are not changing or are extremely important. If constraints are volatile or not agreed upon, it is not the right place for their implementation. Imagine what would happen if Microsoft suddenly decided to change the implementation of the integer data type! Again, you can clearly see that domains should not be too complex. Constraints built in domains cannot be easily relaxed, and that’s just not practical.

That’s why even primary keys are not strictly enforced by relational database management systems. Imagine how procedures that perform data cleansing on text files before importing them to tables would look like if primary keys were mandatory. It’s much more practical to import the data first, cleanse it in a relational database, and finally create the constraints.

Besides logical reasons, there are important physical reasons for not using complex domains as well. The most important one is performance. If your data type is performing badly, your users cannot do much about it. Deployment is another issue. Client applications must know how to deal with your data type at the client side. Programming languages know how to deal with standard, or what we called simple, types. Code for your complex data type must be available to client applications; otherwise, they won’t know how to manipulate the values of your domain at the client side. This means that you might face a situation where you would have to deploy the code for your domain to thousands of client computers. First-time deployment is usually not a problem; you can do it along with the application deployment, but what about upgrades?

I am not entirely against complex domains. If you’re sure that your application is the sole user of a complex domain, and if constraints are so important for the business problem you are trying to solve, then a complex domain might be the right design. Another adequate use of complex types would be when you are sure that the knowledge and constraints are generally agreed upon. The question is, when do you know this? Well, if you invent a new format for storing some special kind of data, you can add a data type and make your format available in databases as well.

Why Do We Need Complex Classes?

After realizing that simple data types are preferable to complex ones inside a relational database, and equating domains and classes, you can turn the question around and ask: Why do you then need complex classes in OOP applications?

For a brief moment, imagine a classical factory. The factory has a warehouse with many production lines. The same item that is available openly in a warehouse (and is, by the way, maintained with set-oriented operations) changes its shape when it comes to a production line. In a warehouse, the item is stored in such a way that it can be used by any tool from any production line. While in a concrete production line, the item becomes more closed (that is, more encapsulated), and you do not want to make a mess by handling it with inappropriate tools.

Clearly, we need both designs: simple types with a lower level of encapsulation inside a database and more complex types in applications.

Finally, as I already mentioned, object-relational mapping tools typically map classes to relations, not to domains. You can now realize that this is the correct mapping in most of the cases, although a relation is not the same thing as a class. Anyway, in case you need to apply a more encapsulated approach, you can save data for an application using a complex domain as you sometimes have to save semi-finished products in a factory for a while. In an extreme case, you can save the state of an object (serialize it) in a binary data type column, such as VARBINARY(MAX), and thus make it unavailable to any application that does not know how to deserialize it properly.

Note that as mentioned earlier in the chapter, the XML datatype can be adequate to store the state of objects using XML serialization in a database as well. Because the XML datatype is well known, agreed upon and standardized, it opens objects that would otherwise be available only to applications that created them.

Language for Creating UDTs

When I was first introduced to the idea of supporting CLR UDTs inside SQL Server, I wondered why Microsoft hadn’t implemented the ANSI standard CREATE DOMAIN command instead. After all, I liked the idea of implementing them in .NET languages.

Well, to implement a data type you need to implement many fine-grained methods. Nonprocedural languages, such as T-SQL, consist of larger building blocks of commands. Think of it; a UDT knows to do only things you define in its methods and properties (which are actually methods as well). Now, imagine how many different methods you would have to implement just for basic operations even for a very simple type! Also, your mutator methods should be resilient to input errors; therefore, you need to control and constrain the input. Regular expressions, for example, are very useful for this task. Programming all these fine-grained details would be very awkward in a nonprocedural language.

So my conclusion is that Microsoft’s implementation of CLR-based UDTs is the correct way to go. My only disappointment is that SQL Server 2005 doesn’t support operator overloading in UDTs. Such support would have raised the level of usability of UDTs significantly. Comparison, logical, and arithmetic operators are well known all over the world. Thus, for any UDT that should support such operators, they would have been directly applicable. The lack of support for operator overloading forces you to learn how operators are implemented through UDT-specific methods. It also forces you to create your own User-Defined Aggregates (UDAs) for calculations that would have otherwise been supported by built-in aggregates directly against UDTs (for example, SUM).

I’ve mentioned that I would use complex domains if I needed to store a semi-finished product or implement an extremely important business rule, or if I invented some specific format for storing data only. Simple UDTs can be useful for standard formats that don’t need to support many operations. A good example is an IP address type–in many applications, you just need to store and read IP addresses. With operator overloading, implementing types such as complex numbers, which are very well known, with UDTs could have been much simpler.

Programming a UDT

After this long theoretical introduction, let’s get down to business–creating a concrete UDT. You need to create a class or a structure (struct in C#, Structure in Microsoft Visual Basic .NET) in .NET code. Classes in .NET live in assemblies, and SQL Server loads the .NET code from its own databases, so you have to catalog the assembly. I’ll show you how to create a complex number domain. Besides supporting basic operations (read and write), your complex numbers need to support comparisons, basic arithmetic operations, and a SUM aggregate function.

I’ll use the Cartesian form of a complex number:

cn = a + bi

where i = square root of –1 (i.e. i * i = –1).

Basic arithmetic operations are defined as:

  1. Addition: (a + bi) + (c + di) = (a + c) + (b + d)i

  2. Subtraction: (a + bi) – (c + di) = (a – c) + (b – d)i

  3. Multiplication: (a + bi) × (c + di) = (ac – bd) + (bc + ad)i

  4. Division: (a + bi)/(c + di) = ((ac + bd) + (bc – ad)i)/(c2+ d2)

UDT Requirements

Besides supporting operations intrinsic to the data type, a UDT must also support conversions to and from character strings to facilitate data input and presentation. Microsoft requires every UDT to implement default accessor and mutator specific methods for this: ToString and Parse. The ToString method converts the UDT to a string value. The Parse method converts a string input to a UDT value. If you want it to be resilient to erroneous input, you should include in it some error-checking logic. .NET attributes define the UDT’s behavior within SQL Server and provide information on which the deployment process in Visual Studio .NET relies. The most important attribute is Microsoft.SqlServer.Server.SqlUserDefinedTypeAttribute, which defines the behavior of the type itself. This attribute is required. It defines the UDT’s behavior inside SQL Server, like storage format (native or user-defined), and also helps in deployment from Visual Studio (for example, when Visual Studio sees this attribute, it knows that it has to use the CREATE TYPE command). For more details regarding UDT requirements please refer to Books Online.

All SQL Server data types, as opposed to .NET native types, should know how to deal with unknown values. A NULL instance of a UDT must be supported; you can choose how to represent it internally. If possible, represent it with a state of a UDT; otherwise, use a flag. The state of all NULL instances should be the same, so it is quite useful to create a class-level (static in C#, Shared in Visual Basic .NET), read-only variable that holds a NULL instance of your type. When you create a UDT, you have to implement the System.Data.SqlTypes.INullable interface and create a public static (Shared in Visual Basic .NET), read-only Null property. The interface consists of a single read-only IsNull property, which shows whether the value is unknown. The Null property is used to return an instance whose value is NULL.

Inheritance relationships are not recognized by SQL Server. You can use class hierarchies, but they are not defined in the SQL Server catalog and not used in T-SQL operations. In the CLR code, you can use class hierarchies; that is, use a base class and then define a subclass, which is your UDT. The subclass inherits all the methods of the base class. If the inherited methods are explicitly programmed in the subclass–and this means they are overridden (commonly you do this to change the behavior of the method, not just to repeat the code from the base class)–then SQL Server will recognize them. If not, SQL Server does not know how to go to the base class and execute the code from the base class. But at the client side, in a .NET application that has references to the assembly with the base class and the UDT class, you can use methods from the base class without explicitly overriding them, because .NET recognizes class hierarchies. So basically you would not use inheritance when you define a UDT, unless you want to have some additional functionality at the client side, and you don’t want to have the same functionality at the server side.

You have to define how to persist your UDT, or in .NET jargon, how to serialize it. The type of the serialization is defined with the Format property of the attribute SqlUserDefinedTypeAttribute. Possible formats are Native and UserDefined. Native means that the compiler will lay out the structure for you. You have to tell the compiler that you want to use compiler layout by specifying the Serializable and StructLayout(LayoutKind.Sequential) attributes. Sequential order means that variables defined first will be laid out first. (Persisting a state actually means saving the values of variables.) SQL Server takes care of normalizing the values in the variables, so you can specify that the IsByteOrdered property equals true. Normalization here means that SQL Server combines all variables of a class instance to a byte array of an arbitrary length, and transforms it into a byte array of a controllable length; such that no information will be lost in the transformation. The IsByteOrdered property means that values are ordered, so it is possible to use the column in an index, or in the ORDER BY clause of a query, and so on. You can use this property because SQL Server does the normalization. This way, values of your type can be compared, and you can use your type in a primary key, for example. With Native serialization, the MaxByteSize property should not be specified.

Native formatting is simple to implement, but it has a drawback–you can use only .NET value types in your UDT. The only really big problem is the string type, which is a reference type in .NET. To include a reference type in a UDT, you have to define your own serialization for the UDT. You have to specify Format.UserDefined and implement the IBinarySerialize interface, which has two methods: Read and Write. You can define the ordering of the values of your type in the Write method. A nice option would be, for example, to use the first 8 bytes for ordering and calculate them with your own algorithm. Then, when reading the value (Read method), you can skip these 8 bytes. If you don’t define ordering, values of your type would not be comparable. You also have to specify the MaxByteSize property, which can be at most 8000.

The UDT should support XML serialization as well. It must implement the System.Xml.IXmlSerializable interface. Or all public fields and properties must be of types that are XML serializable, or they must be decorated with the XmlIgnore attribute if overriding standard serialization is required. Our UDT does not implement an explicit XML serialization and also does not use the XmlIgnore attribute, so we can use XML serialization from string representation only.

SQL Server and .NET native types do not match one-to-one. You should use .NET SqlTypes whenever possible because they correspond directly to SQL Server native types. The System.Data.SqlTypes namespace contains the classes that represent SQL Server native data types available to the assembly.

Finally, to get a list of all requirements of a CLR user-defined data type, please refer to the "User-Defined Type Requirements" topic in Books Online.

Note

Note

You can find C# and Visual Basic .NET samples for a complex number UDT at C:Program FilesMicrosoft SQL Server90SamplesEngineProgrammabilityCLRUserDefinedDataType, if you installed the samples from the Microsoft SQL Server 2005 program group.

In this chapter, I’ll develop my own complex number UDT using a basic .NET template, but I’ll use things that I found good from the sample. You might wonder what’s wrong with the sample and why I don’t just use it as is. First of all, I want to show you the complete procedure for creating a UDT. As pointed out to me by Adam Machanic—a distinguished SQL Server MVP—another issue with the sample is that it includes overloaded comparison operators. This is misleading because SQL Server 2005 does not support operator overloading. Values of the sample type are comparable because the type is decorated with the "IsByteOrdered = true" property of the SqlUserDefinedType attribute, and SQL Server uses byte order to compare values. You can verify that comparisons still work even when you comment out all operator overloading code.

Creating a UDT

Visual Studio .NET 2005 has templates (skeletons) for creating CLR objects in SQL Server 2005 and also supports deployment of UDTs. Some editions of Visual Studio .NET 2005 don’t support SQL Server CLR objects directly–for example, the Standard edition. If you’re working with such an edition, you have to create a standard class library and then deploy the objects in SQL Server using T-SQL explicitly (via the CREATE ASSEMBLY | TYPE | AGGREGATE | FUNCTION | PROCEDURE | TRIGGER commands). In my examples, I’ll take the longer route so that you’ll be familiar with the whole process. But first let’s develop a UDT in C# step by step. I’ll present a code snippet in each step accompanied by explanations. Putting all code snippets together would give you a complete working UDT.

More Info

More Info

The complete UDT code is available to you as part of the book’s source code which you can download from http://www.insidetsql.com.

To create a UDT, first create a new project in Visual Studio 2005. Assuming that you are using Visual Studio 2005 Professional Edition or higher, choose the Visual C# project type, Database subtype, SQL Server project template. If you’re using an earlier edition of Visual Studio, use the Class Library project template. Name the project ComplexNumberCS, specify the folder C:, and confirm to create the solution.

Note

Note

You can specify any other folder that you like, but remember your choice because you will need to refer to the physical location of the assembly later when you deploy it in SQL Server.

Now that the solution has been created, add a new User-Defined Type item (relevant to SQL Server project template). Name the class ComplexNumberCS.cs. Feel free to examine the basic skeleton of the UDT that the IDE created. However, for our example, you’ll replace the template code with what I’ll provide here. If you used the Class Library project template, rename the existing class (Class1.cs) to ComplexNumberCS.cs.

The first part of our UDT declares the namespaces used by the assembly, the class name, and the attributes used:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Globalization;

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedType(Format.Native,IsByteOrdered = true)]
public struct ComplexNumberCS : INullable
{

You’re going to use RegularExpressions to split and check the input. The Globalization namespace is needed because you want to have a culture-invariant string representation of values of your type. You can see that the type uses native formatting and is byte ordered when persisted. It also implements the INullable interface. Next, define the variables you need:

//Regular expression used to parse values of the form (a,bi)
private static readonly Regex _parser
    = new Regex(@"A(s*(?<real>-?d+(.d+)?)s*,s*(?<img>-?d+
    (.d+)?)s*is*)",
                RegexOptions.Compiled | RegexOptions.ExplicitCapture);

// Real and imaginary parts
private double _real;
private double _imaginary;

// Internal member to show whether the value is null
private bool _isnull;

// Null value returned equal for all instances
private const string NULL = "<<null complex>>";
private static readonly ComplexNumberCS NULL_INSTANCE
   = new ComplexNumberCS(true);

For more details on regular expressions, please refer to the .NET documentation.

Note

Note

The printed code is formatted to be more readable in the book; for example, in practice the complete regular expression should appear in a single line.

In the next step, you define two constructors (constructor methods can be overloaded), one for a known value and one for an unknown value:

// Constructor for a known value
public ComplexNumberCS(double real, double imaginary)
{
    this._real = real;
    this._imaginary = imaginary;
    this._isnull = false;
}

// Constructor for an unknown value
private ComplexNumberCS(bool isnull)
{
    this._isnull = isnull;
    this._real = this._imaginary = 0;
}

As mentioned earlier, you have to define a ToString method in your UDT to specify the default string representation. Because your type is derived from System.Object, which includes a ToString method, you must explicitly override the inherited method:

// Default string representation
public override string ToString()
    {
    return this._isnull ? NULL : ("("
        + this._real.ToString(CultureInfo.InvariantCulture) + ","
        + this._imaginary.ToString(CultureInfo.InvariantCulture)
        + "i)");
    }

Now you need two read-only properties to handle unknown values:

// Null handling
public bool IsNull
{
    get
    {
        return this._isnull;
    }
}

public static ComplexNumberCS Null
{
    get
    {
        return NULL_INSTANCE;
    }
}

And maybe the most important method, which will accept, check, and parse the input–the Parse method:

// Parsing input using regular expression
public static ComplexNumberCS Parse(SqlString sqlString)
{
    string value = sqlString.ToString();

    if (sqlString.IsNull || value == NULL)
        return new ComplexNumberCS(true);

    // Check whether the input value matches the regex pattern
    Match m = _parser.Match(value);

    // If the input's format is incorrect, throw an exception
    if (!m.Success)
        throw new ArgumentException(
            "Invalid format for complex number. "
            + "Format is ( n, mi ) where n and m are floating "
            + "point numbers in normal (not scientific) format "
            + "(nnnnnn.nn).");

    // If everything is OK, parse the value;
    // we will get two double type values
    return new ComplexNumberCS(double.Parse(m.Groups[1].Value,
        CultureInfo.InvariantCulture), double.Parse(m.Groups[2].Value,
        CultureInfo.InvariantCulture));
}

You’ve implemented the basic operations. All additional knowledge that the type should have can be added through additional methods and properties. So let’s start with the two properties already from the Microsoft sample–Real and Imaginary; they will be the public properties that access and modify the real and imaginary parts of a complex number:

// Properties to deal with real and imaginary parts separately
public double Real
{
    get
    {
        if (this._isnull)
            throw new InvalidOperationException();

        return this._real;
    }
    set
    {
        this._real = value;
    }
}

public double Imaginary
{
    get
    {
        if (this._isnull)
            throw new InvalidOperationException();

        return this._imaginary;
    }
    set
    {
        this._imaginary = value;
    }
}

You have arrived at the last part–programming arithmetic operations. The use of #region and #endregion here allows you to collapse this section of code in the Visual Studio .NET interface.

// Region with arithmetic operations
#region arithmetic operations

// Addition
public ComplexNumberCS AddCN(ComplexNumberCS c)
{
    // null checking
    if (this._isnull || c._isnull)
        return new ComplexNumberCS(true);
    // addition
    return new ComplexNumberCS(this.Real + c.Real,
        this.Imaginary + c.Imaginary);
}

// Subtraction
public ComplexNumberCS SubCN(ComplexNumberCS c)
    {
        // null checking
        if (this._isnull || c._isnull)
            return new ComplexNumberCS(true);
        // subtraction
        return new ComplexNumberCS(this.Real - c.Real,
            this.Imaginary - c.Imaginary);
    }

    // Multiplication
    public ComplexNumberCS MulCN(ComplexNumberCS c)
    {
        // null checking
        if (this._isnull || c._isnull)
            return new ComplexNumberCS(true);
        // multiplication
        return new ComplexNumberCS(this.Real * c.Real - this.Imaginary * c.Imaginary,
            this.Imaginary * c.Real + this.Real * c.Imaginary);
    }

    // Division
    public ComplexNumberCS DivCN(ComplexNumberCS c)
    {
        // null checking
        if (this._isnull || c._isnull)
            return new ComplexNumberCS(true);
        // division
        return new ComplexNumberCS(
            (this.Real * c.Real + this.Imaginary * c.Imaginary)
              / (c.Real * c.Real + c.Imaginary * c.Imaginary),
            (this.Imaginary * c.Real - this.Real * c.Imaginary)
              / (c.Real * c.Real + c.Imaginary * c.Imaginary)
            );
    }
    #endregion
}

If you followed the example, stitching the code snippets one after the other, your UDT is now ready to be built. Choose the Build menu, and select the first option, Build ComplexNumberCS. You could also deploy the UDT from Visual Studio .NET, and check the deployment options via the project’s Properties dialog box. But as I mentioned earlier, I’d like to show you the T-SQL commands required for this task.

Deploying the UDT Using T-SQL

The next steps deploy the new UDT in SQL Server using T-SQL code. To follow this part, use SSMS. First you need to enable CLR, unless it’s already enabled, and create a test database:

USE master;
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
CREATE DATABASE Clrtest;
GO
USE Clrtest;

Now you need to import the assembly to the database. To do so, use the CREATE ASSEMBLY command. The command has a PERMISSION_SET clause, which specifies a set of code access permissions that are granted to the assembly when it is accessed by SQL Server. If the clause is not specified, SAFE level is applied by default. I recommend using SAFE because it is the most restrictive permission set. Code executed by an assembly with SAFE permissions cannot access external system resources such as files, the network, environment variables, or the registry. EXTERNAL_ACCESS allows assemblies access to some external system resources. UNSAFE level allows unrestricted resource access, both within and outside the SQL Server instance, and calls to unmanaged code. Run the following code to import the assembly:

CREATE ASSEMBLY ComplexNumberCS
FROM ' ComplexNumberCSComplexNumberCSinDebugComplexNumberCS.dll'
WITH PERMISSION_SET = SAFE;

After you catalog the assembly, you can start using your new UDT. First you have to bind the SQL Server type to the .NET code using the CREATE TYPE command:

CREATE TYPE dbo.ComplexNumberCS
EXTERNAL NAME ComplexNumberCS.[ComplexNumberCS];

Then you simply use the UDT just like any other SQL Server native type. For example, the following code creates a table with a column of the new type:

CREATE TABLE dbo.CNUsage
(
  id INT IDENTITY(1,1) NOT NULL,
  cn ComplexNumberCS NULL
);

As you can see, the cn column allows NULLs. Now insert two rows with values in the format (a, bi), as expected by the Parse method. Then insert an incorrect value to show the error that you get:

-- Correct values
INSERT INTO dbo.CNUsage(cn) VALUES('(2,3i)'),
INSERT INTO dbo.CNUsage(cn) VALUES('(1,7i)'),
GO
-- Now an incorrect value
INSERT INTO dbo.CNUsage(cn) VALUES('(1i,7)'),

You get the following output as the result of the last INSERT:

Msg 6522, Level 16, State 2, Line 1
A .NET Framework error occurred during execution of user defined routine or aggregate
'ComplexNumberCS':
System.ArgumentException: Invalid format for complex number. Format is ( n, mi ) where n and
 m are floating point numbers in normal (not scientific) format (nnnnnn.nn).
System.ArgumentException:
   at ComplexNumberCS.Parse(SqlString sqlString)
.
The statement has been terminated.

As you can see, you get a generic SQL Server error number 6522 and the message you defined when throwing an exception in the Parse method. Next, issue a SELECT statement, and notice the output that you get in Table 1-16.

SELECT * FROM dbo.CNUsage;

Table 1-16. Output from SELECT Query against CNUsage

id

cn

1

0xC000000000000000C00800000000000000

2

0xBFF0000000000000C01C00000000000000

SSMS displays each cn value as a simple byte stream. SSMS is a client tool; the UDT’s code isn’t transferred from the server to the client, not even the code implementing the ToString method.

Note

Note

If CNUsage is viewed with SSMS’s "Open Table" command, the default string representation is shown; SSMS invokes ToString behind the scenes. You can easily see this for yourself by tracing the activity submitted from SSMS to SQL Server.

To get a proper string representation, you have to call ToString explicitly to force it to execute at the server. Or you can make the code available to the client by putting the assembly in the global assembly cache (GAC), for example. Dealing with the client side is outside the scope of this book, so I’ll make calls to ToString and other methods explicitly:

SELECT id, cn.ToString() AS cn
FROM dbo. CNUsage;

This query generates the output shown in Table 1-17.

Table 1-17. Formatted Complex Numbers

id

cn

1

(2,3i)

2

(1,7i)

With an explicit call to ToString, the presentation is more readable. Ordering should follow a byte order of the serialization (remember IsByteOrdered = true?), starting with the first member, which in our case is the real component of the complex number. Issue the following query, which sorts the rows by cn and generates the output shown in Table 1-18:

SELECT id, cn.ToString() AS cn
FROM dbo.CNUsage
ORDER BY cn;

Table 1-18. Formatted and Sorted Complex Numbers

id

cn

2

(1,7i)

1

(2,3i)

You can see that the second complex number (2, 3i) is sorted before the first one (2, 3i) because its real component is smaller. If you have different ordering needs, you have to implement a user-defined serialization and use your own algorithm for the Write method.

Note

Note

When you implement an interface, you have to implement all methods defined in the interface; hence you will also have to implement the Read method.

Next, check whether the UDT can really accept NULLs:

INSERT INTO dbo.CNUsage(cn) VALUES(NULL);

SELECT id, cn.ToString() AS cn,
  cn.Real AS [Real part],
  cn.Imaginary AS [Imaginary part]
FROM dbo.CNUsage;

The output is shown in Table 1-19.

Table 1-19. NULL Complex Numbers

id

cn

Real part

Imaginary part

1

(2,3i)

2

3

2

(1,7i)

1

7

4

NULL

NULL

NULL

The NULL is accepted and returned without any problem.

Note

Note

By the way, can you tell why the id column of the unknown complex number shows the value 4, and not 3? The reason is that upon an INSERT statement, the IDENTITY value increments regardless of whether the INSERT succeeded or failed.

As you can see, I also checked the additional Real and Imaginary properties implemented in the type. Finally, you can check the four arithmetic operations. I’ll do this using variables to show that a UDT works with variables just like any other data type. I’ll also use an explicit conversion from a string to the UDT–again, just to show that it is possible; SQL Server 2005 supports casting and conversion of UDTs to strings and vice versa. Run the following code to check whether the results of the complex arithmetic are correct:

-- Arithmetic operations
-- Addition
DECLARE @cn1 ComplexNumberCS, @cn2 ComplexNumberCS, @cn3 ComplexNumberCS;
SET @cn1 = CAST('(8, 5i)' AS ComplexNumberCS);
SET @cn2 = '(2, 1i)';
SET @cn3 = @cn1.AddCN(@cn2);
SELECT @cn3.ToString(), CAST(@cn3 AS VARCHAR(MAX)), @cn3.Real, @cn3.Imaginary;
GO
-- Subtraction
DECLARE @cn1 ComplexNumberCS, @cn2 ComplexNumberCS, @cn3 ComplexNumberCS;
SET @cn1 = CAST('(3, 4i)' AS ComplexNumberCS);
SET @cn2 = '(1, 2i)';
SET @cn3 = @cn1.SubCN(@cn2);
SELECT @cn3.ToString(), CAST(@cn3 AS VARCHAR(MAX)), @cn3.Real, @cn3.Imaginary;
GO
-- Multiplication
DECLARE @cn1 ComplexNumberCS, @cn2 ComplexNumberCS, @cn3 ComplexNumberCS;
SET @cn1 = CAST('(3, 2i)' AS ComplexNumberCS);
SET @cn2 = '(1, 4i)';
SET @cn3 = @cn1.MulCN(@cn2);
SELECT @cn3.ToString(), CAST(@cn3 AS VARCHAR(MAX)), @cn3.Real, @cn3.Imaginary;
GO
-- Division
DECLARE @cn1 ComplexNumberCS, @cn2 ComplexNumberCS, @cn3 ComplexNumberCS;
SET @cn1 = CAST('(10, 5i)' AS ComplexNumberCS);
SET @cn2 = '(2, 4i)';
SET @cn3 = @cn1.DivCN(@cn2);
SELECT @cn3.ToString(), CAST(@cn3 AS VARCHAR(MAX)), @cn3.Real, @cn3.Imaginary;
GO

However, try running the following code using the plus (+) operator and SUM aggregate function:

DECLARE @cn1 ComplexNumberCS, @cn2 ComplexNumberCS, @cn3 ComplexNumberCS;
SET @cn1 = CAST('(10, 5i)' AS ComplexNumberCS);
SET @cn2 = '(2, 4i)';
SET @cn3 = @cn1 + @cn2;
SELECT SUM(cn) FROM dbo.CNUsage;

You will get the following errors:

Msg 403, Level 16, State 1, Line 4
Invalid operator for data type. Operator equals add, type equals ComplexNumberCS.
Msg 8117, Level 16, State 1, Line 5
Operand data type ComplexNumberCS is invalid for sum operator.

You can now realize how useful it would have been if SQL Server 2005 supported operator overloading with CLR user-defined types. You can overcome the plus operator problem by writing your own methods, as I demonstrated earlier. However, you can’t use UDTs in aggregate functions that depend on arithmetic (like SUM, AVG), rather in functions that depend only on comparison or NULLability (like MAX, COUNT). Fortunately, SQL Server supports user-defined aggregate functions (UDA), which must be written in a .NET language, like user-defined types. If you need an aggregate function that would support your UDT, you have to create your own.

As background for creating UDAs, please refer to Inside T-SQL Querying. Let me quickly show an example. Add a new Aggregate item to the ComplexNumberCS project in Visual Studio .NET 2005, name it ComplexNumberCS_SUM.cs, substitute its body with the code in Example 1-4, and rebuild the project.

Example 1-4. C# .NET–based ComplexNumberCS UDA

C# ComplexNumberCS_SUM UDA
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;


[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
public struct ComplexNumberCS_SUM
{
    ComplexNumberCS cn;

    public void Init()
    {
        cn = ComplexNumberCS.Parse("(0, 0i)");
    }

    public void Accumulate(ComplexNumberCS Value)
    {
        cn = cn.AddCN(Value);
    }

    public void Merge(ComplexNumberCS_SUM Group)
    {
        Accumulate(Group.Terminate());
    }

    public ComplexNumberCS Terminate()
    {
        return cn;
    }

}

Next, you need to update the assembly in the database. To do so, use the ALTER ASSEMBLY command. You shouldn’t have any problems because you’re not changing the UDT you’re already using. Remember, you have to catalog the aggregate function using the CREATE AGGREGATE command:

-- Alter assembly to add the ComplexNumberCS_SUM UDA
ALTER ASSEMBLY ComplexNumberCS
FROM 'C:ComplexNumberCSComplexNumberCSinDebugComplexNumberCS.dll';
GO

-- Create the aggregate function
CREATE AGGREGATE dbo.ComplexNumberCS_SUM(@input ComplexNumberCS)
RETURNS ComplexNumberCS
EXTERNAL NAME ComplexNumberCS.[ComplexNumberCS_SUM];

And finally, use the new aggregate function to calculate the sum of all non-NULL values in the table, and you will get (3, 10i) back:

SELECT dbo.ComplexNumberCS_SUM(cn).ToString() AS ComplexSum
FROM CNUsage
WHERE cn IS NOT NULL;

You can see that CLR UDAs can be written to support CLR UDTs.

In case your language of preference is Visual Basic .NET, for your convenience, Example 1-5 and Example 1-6 have the code for the Visual Basic .NET–based UDT and UDA, respectively. Conceptually, all the discussion about UDTs and UDAs is language independent.

Example 1-5. Visual Basic .NET–based ComplexNumberVB UDT

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions
Imports System.Globalization

<Serializable()> _
<Microsoft.SqlServer.Server.SqlUserDefinedType(Format.Native, _
   IsByteOrdered:=True)> _
Public Structure ComplexNumberVB
    Implements INullable

    Private Shared ReadOnly parser As New Regex( _
  "A(s*(?<real>-?d+(.d+)?)s*,s*(?<img>-?d+(.d+)?)s*is*)", _
    RegexOptions.Compiled Or RegexOptions.ExplicitCapture)
    Private realValue As Double
    Private imaginaryValue As Double
    Private isNullValue As Boolean
    Private Const nullValue As String = "<<null complex>>"
    Private Shared ReadOnly NULL_INSTANCE As New ComplexNumberVB(True)

    Public Sub New(ByVal real As Double, ByVal imaginary As Double)
        Me.realValue = real
        Me.imaginaryValue = imaginary
        Me.isNullValue = False
    End Sub

    Private Sub New(ByVal isnull As Boolean)
        Me.isNullValue = isnull
        Me.realValue = 0
        Me.imaginaryValue = 0
    End Sub

    Public Overrides Function ToString() As String
        If Me.isNullValue = True Then
            Return nullValue
        Else
            Return "(" & Me.realValue.ToString(CultureInfo.InvariantCulture) _
                & "," & Me.imaginaryValue.ToString( _
                  CultureInfo.InvariantCulture) _
                & "i)"
        End If
    End Function

    Public ReadOnly Property IsNull() As Boolean Implements INullable.IsNull
        Get
            Return Me.isNullValue
        End Get
    End Property

    Public Shared ReadOnly Property Null() As ComplexNumberVB
        Get
            Return NULL_INSTANCE
        End Get
    End Property

    Public Shared Function Parse(ByVal sqlString As SqlString) _
      As ComplexNumberVB
        Dim value As String = sqlString.ToString()

        If sqlString.IsNull Or value = nullValue Then
            Return New ComplexNumberVB(True)
        End If

        Dim m As Match = parser.Match(value)

        If Not m.Success Then
            Throw New ArgumentException( _
                "Invalid format for complex number. Format is " + _
                  "( n, mi ) where n and m are floating point numbers " + _
                  "in normal (not scientific) format (nnnnnn.nn).")
        End If

        Return New ComplexNumberVB(Double.Parse(m.Groups(1).Value, _
          CultureInfo.InvariantCulture), _
            Double.Parse(m.Groups(2).Value, CultureInfo.InvariantCulture))
    End Function

    Public Property Real() As Double
        Get
            If Me.isNullValue Then
                Throw New InvalidOperationException()
            End If
            Return Me.realValue
        End Get
        Set(ByVal Value As Double)
            Me.realValue = Value
        End Set
    End Property

    Public Property Imaginary() As Double
        Get
            If Me.isNullValue Then
                Throw New InvalidOperationException()
            End If
            Return Me.imaginaryValue
        End Get
        Set(ByVal Value As Double)
            Me.imaginaryValue = Value
        End Set
    End Property

#Region "arithmetic operations"

    ' Addition
    Public Function AddCN(ByVal c As ComplexNumberVB) As ComplexNumberVB
        'Null(checking)
        If Me.isNullValue Or c.isNullValue Then
            Return New ComplexNumberVB(True)
        End If
        ' addition
        Return New ComplexNumberVB(Me.Real + c.Real, _
            Me.Imaginary + c.Imaginary)
    End Function

    ' Subtraction
    Public Function SubCN(ByVal c As ComplexNumberVB) As ComplexNumberVB
        'Null(checking)
        If Me.isNullValue Or c.isNullValue Then
            Return New ComplexNumberVB(True)
        End If
        ' addition
        Return New ComplexNumberVB(Me.Real - c.Real, _
            Me.Imaginary - c.Imaginary)
    End Function


    ' Multiplication
    Public Function MulCN(ByVal c As ComplexNumberVB) As ComplexNumberVB
        'Null(checking)
        If Me.isNullValue Or c.isNullValue Then
            Return New ComplexNumberVB(True)
        End If
        ' addition
        Return New ComplexNumberVB(Me.Real * c.Real - _
          Me.Imaginary * c.Imaginary, _
            Me.Imaginary * c.Real + Me.Real * c.Imaginary)
    End Function


    ' Division
    Public Function DivCN(ByVal c As ComplexNumberVB) As ComplexNumberVB
        'Null(checking)
        If Me.isNullValue Or c.isNullValue Then
            Return New ComplexNumberVB(True)
        End If
        ' addition
        Return New ComplexNumberVB( _
            (Me.Real * c.Real + Me.Imaginary * c.Imaginary) _
              / (c.Real * c.Real + c.Imaginary * c.Imaginary), _
            (Me.Imaginary * c.Real - Me.Real * c.Imaginary) _
              / (c.Real * c.Real + c.Imaginary * c.Imaginary) _
          )
    End Function

#End Region

End Structure

Example 1-6. Visual Basic .NET–based ComplexNumberVB_SUM UDA

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server


<Serializable()> _
<Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)> _
Public Structure ComplexNumberVB_SUM

    Dim cn As ComplexNumberVB

    Public Sub Init()
        cn = ComplexNumberVB.Parse("(0, 0i)")
    End Sub

    Public Sub Accumulate(ByVal value As ComplexNumberVB)
        cn = cn.AddCN(value)
    End Sub

    Public Sub Merge(ByVal value As ComplexNumberVB_SUM)
        Accumulate(value.Terminate())
    End Sub

    Public Function Terminate() As ComplexNumberVB
        Return cn
    End Function

End Structure

The T-SQL commands needed to deploy the UDT and UDA in SQL Server are the same ones you used earlier, of course.

Once you’re done experimenting with the new UDT, run the following code for cleanup:

USE master;
DROP DATABASE Clrtest;
GO
EXEC sp_configure 'clr enabled', 0;
RECONFIGURE;
GO
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.72.74