22.1. XML Schema Design Considerations

When creating an XML schema document whose XML instances will be stored in a relational database (in non-XML form), the XML schema designer needs to consider the impacts that design decisions might have on the database representation. In addition, certain decisions need to be made explicitly that would not normally affect an XML schema document. This section lists the major considerations and provides recommendations and discussion around each of them.

22.1.1. Patterns

When patterns are used, determine whether the database needs to enforce the pattern or whether the enforcement can be left solely to the XML schema processor. If database enforcement is required, be aware that SQL does not perform regular expression pattern matching but only wildcard and single character matching. Patterns such as ‘ab{2,}x’ simply cannot be represented in SQL and would require significant effort using Java stored procedures to enforce.

However, simple patterns and most real-world examples are representable. Refer to Section 22.2.1 for more detail. This section also provides working examples of some simpler patterns.

22.1.2. Whitespace

Whitespace can be preserved, trimmed, or collapsed. Although the database can trim and collapse whitespace for every database insert and update, this should be preprocessed before the database is involved. From a design pattern perspective, this would be done using a decorating filter. Otherwise, triggers would need to convert every insert and update for every character datatype column in every table where trim or collapse is explicitly or implicitly specified.

22.1.3. Strings

The length of a string is of critical importance to a database designer. Different string lengths force substantially different implementations in the database and affect the size and performance of the RDBMS. In addition, know what languages need to be supported. These again force substantially different implementations.

22.1.4. Decimals

For the decimal datatype and all its datatypes derived by restriction, specifying an appropriate combination of the number of totalDigits, fractionDigits, minInclusive, maxInclusive, minExclusive, and maxExclusive is critically important. Databases allocate a fixed amount of space for numbers; not specifying these constraining facets leads to wasted disk space and/or XML instance values that cannot be represented in the database.

22.1.5. Floats and Doubles

The float and double datatypes are described by IEEE 754-1985, IEEE Standard for Binary Floating-Point Arithmetic. Oracle does not support this standard directly; as a result, certain numbers cannot be represented easily and some operations require special support. The annotation associated with the use of these datatypes should be clear as to whether full support is required. This support is rarely needed in most business software systems. The XML schema designer should seriously consider using the decimal datatype or one of the datatypes derived from it by restriction, if possible.

22.1.6. Boolean

For the boolean datatype, maintaining all four possible literals in the database (‘true’, ‘false’, ‘1’, and ‘0’) is problematic because a boolean has two values in the value space but four in the database. Either use a pattern to restrict the values to two literals or have the database insert and update only two of the literals.

22.1.7. Time Zones and Their Interaction with gMonth, gYear, and gDay

For the gMonth, gYear, and gDay datatypes, consider whether a time zone is necessary or whether all values can be represented in UST. If not, these can be implemented as though they were integer datatypes. With a time zone, the implementation becomes complex and somewhat baroque.

22.1.8. Time Zones and Their Interaction with Other Date/Time Datatypes

For the time, date, dateTime, gYearMonth, and gMonthDay datatypes, considering the value of a time zone is important or whether all values can be represented in UST. In Oracle, a different database column datatype is chosen that is easier to use when a time zone is not necessary.

22.1.9. IDREFS and NMTOKENS Datatypes

The IDREFS and NMTOKENS datatypes allow multiple values by definition. The database can support multiple representations and can support full or partial validation. Determine whether the individual values need to be accessed separately or only together and whether database validation is necessary.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.24.60