RDBMS implementations

So far, both of the data object implementations that we've created have overridden the _create and _update methods that were required in BaseDataObject. It would be fair, under the circumstances, to question why those were put in place at all. The short answer is that both of the implementations that have come together so far use the same process at the data store level for creating and updating records and documents, so those methods simply haven't been needed. If it were expected that hms_sys would never need any other database backend, we'd be justified in removing them from the code base entirely.

However, what would've happened if the decision to use MongoDB had gone a different way, and the preferred (or mandated) backend data store engine was an RDBMS such as Microsoft SQL Server? Or, worse, what if that sort of change was mandated after the system was operational?

Setting aside the data migration planning that would have to happen and focusing on only the application and service code, what would that kind of change require? Not much, when it comes right down to it. A generic SQL/RDBMS engine ABC (HMSSQLDataObject) might look something like the following, for a given RDBMS API/library:

class HMSSQLDataObject(BaseDataObject, metaclass=abc.ABCMeta):
    """
Provides baseline functionality, interface requirements, and 
type-identity for objects that can persist their state-data to 
a (GENERIC) SQL-based RDBMS back-end data-store.
"""
The HMSSQLDataObject class shown here is by no means complete, but it should serve as a reasonable starting point for building a full implementation of such a class, one that connects to and uses data from any of several RDBMS engines. The complete code, such as it is, can be found in the hms_core/ch-10-snippets directory of the project code.

The same _configuration class property would probably be in use, serving the same purpose. It's possible that the _data_dict_keys class attribute would also be of use in reducing record fields to a valid argument dictionary in from_data_dict. Since the SQL for the various CRUD operations (or at least the starting-point templates for them) would need to be stored where the classes can access it, a viable option would be to attach those templates as class attributes as well:

    ###################################
    # Class attributes/constants      #
    ###################################

    # - Keeps track of the global configuration for data-access
    _configuration = None
    # - Keeps track of the keys allowed for object-creation from 
    #   retrieved data
    _data_dict_keys = None
    # - SQL for various expected CRUD actions:
    _sql_create = """Some SQL string goes here"""
    _sql_read_oids = """Some SQL string goes here"""
    _sql_read_all = """Some SQL string goes here"""
    _sql_read_criteria = """Some SQL string goes here"""
    _sql_update = """Some SQL string goes here"""
    _sql_delete = """Some SQL string goes here"""

Since the SQL for the various CRUD operations would include the tables that the data is stored in, and the process of connecting to the database in most RDBMSes handles the equivalents of the connection and database in our MongoDB approach, only the connection itself needs to be tracked and made available as a property:

    ###################################
    # Property-getter methods         #
    ###################################

    def _get_connection(self):
        try:
            return self.__class__._connection
        except AttributeError:
            # - Most RDBMS libraries provide a "connect" function, or 
            #   allow the creation of a "connection" object, using the 
            #   parameters we've named in DatastoreConfig, or simple 
            #   variations of them, so all we need to do is connect:
            self.__class__._connection = RDBMS.connect(
                **self.configuration
            )
            return self.__class__._connection

As with its equivalent in the Mongo-based implementation, the connection is lazily instantiated, and its deleter method performs an actual deletion, rather than resetting the value to a default, as follows:

    ###################################
    # Property-deleter methods        #
    ###################################

    def _del_connection(self) -> None:
        try:
            del self.__class__._connection
        except AttributeError:
            # - It may already not exist
            pass

The related property declaration is identical, and is shown as follows:

    ###################################
    # Instance property definitions   #
    ###################################

    connection = property(
        _get_connection, None, _del_connection, 
        'Gets or deletes the database-connection that the instance '
        'will use to manage its persistent state-data'
    )

Object initialization is also identical, as follows:

    ###################################
    # Object initialization           #
    ###################################

    def __init__(self, 
        oid:(UUID,str,None)=None, 
        created:(datetime,str,float,int,None)=None, 
        modified:(datetime,str,float,int,None)=None,
        is_active:(bool,int,None)=None, 
        is_deleted:(bool,int,None)=None,
        is_dirty:(bool,int,None)=None, 
        is_new:(bool,int,None)=None,
    ):
        """
Object initialization.

self .............. (HMSSQLDataObject instance, required) The 
                    instance to execute against
oid ............... (UUID|str, optional, defaults to None) The unique 
                    identifier of the object's state-data record in the 
                    back-end data-store
created ........... (datetime|str|float|int, optional, defaults to None) 
                    The date/time that the object was created
modified .......... (datetime|str|float|int, optional, defaults to None) 
                    The date/time that the object was last modified
is_active ......... (bool|int, optional, defaults to None) A flag 
                    indicating that the object is active
is_deleted ........ (bool|int, optional, defaults to None) A flag 
                    indicating that the object should be considered 
                    deleted (and may be in the near future)
is_dirty .......... (bool|int, optional, defaults to None) A flag 
                    indicating that the object's data needs to be 
                    updated in the back-end data-store
is_new ............ (bool|int, optional, defaults to None) A flag 
                    indicating that the object's data needs to be 
                    created in the back-end data-store
"""
        # - Call parent initializers if needed
        BaseDataObject.__init__(self, 
            oid, created, modified, is_active, is_deleted, 
            is_dirty, is_new
        )
        # - Perform any other initialization needed

The significant differences are mostly in the methods that handle the CRUD operations. The original save method, as implemented in BaseDataObject, is left in place, and will call the _create or _update methods, as determined by the instance's is_dirty or is_new property values. Each of those methods is responsible for acquiring the SQL template from the appropriate class attribute, populating it as needed with current state-data values, sanitizing the resulting SQL, and executing it against the connection:

    ###################################
    # Instance methods                #
    ###################################

    def _create(self):
        # - The base SQL is in self.__class__._sql_create, and the 
        #   field-values would be retrieved from self.to_data_dict():
        data_dict = self.to_data_dict()
        SQL = self.__class__._sql_create
        # - Some process would have to add the values, if not the keys, 
        #   into the SQL, and the result sanitized, but once that was 
        #   done, it'd become a simple query-execution:
        self.connection.execute(SQL)

    def _update(self):
        # - The base SQL is in self.__class__._sql_update, and the 
        #   field-values would be retrieved from self.to_data_dict():
        data_dict = self.to_data_dict()
        SQL = self.__class__._sql_update
        # - Some process would have to add the values, if not the keys, 
        #   into the SQL, and the result sanitized, but once that was 
        #   done, it'd become a simple query-execution:
        self.connection.execute(SQL)

Sanitizing SQL is a very important security precaution, reducing the risk of a system being vulnerable to an SQL injection attack. Such attacks can compromise data confidentiality and integrity at a minimum, and can also raise the risk of authentication and authorization compromises, perhaps even across multiple systems, depending on password policies and how they are enforced. Most RDBMS APIs will have some mechanism for sanitizing SQL before executing it, and some will also support query parameterization, which can further reduce the risk of vulnerabilities. As a basic rule of thumb, if data supplied by a user is being passed into a query, or even into a stored procedure, it should be sanitized wherever and whenever possible.
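As a concrete, self-contained illustration of parameterization (using Python's standard sqlite3 module and a hypothetical artisans table, purely for demonstration), passing values as bound parameters lets the driver escape them, so a malicious value is stored as plain data instead of being executed:

```python
import sqlite3

# An in-memory database purely for illustration:
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE artisans (oid TEXT PRIMARY KEY, first_name TEXT)"
)

oid = "43d240cd-4c9f-44c2-a196-1c7c56068cef"
# A classic injection attempt, supplied as if it were user input:
first_name = "Robert'); DROP TABLE artisans;--"

# The ? placeholders bind the values as parameters; the driver
# never interprets them as SQL:
connection.execute(
    "INSERT INTO artisans (oid, first_name) VALUES (?, ?)",
    (oid, first_name)
)

# The malicious value is stored verbatim, and the table survives:
rows = connection.execute(
    "SELECT first_name FROM artisans"
).fetchall()
```

Had the value been interpolated directly into the SQL string instead, the same input could have terminated the statement and executed the trailing DROP TABLE.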

The delete class method is simple:

    ###################################
    # Class methods                   #
    ###################################

    @classmethod
    def delete(cls, *oids):
        # - First, we need the database-connection that we're 
        #   working with:
        connection = cls.get_connection()
        SQL = cls._sql_delete % oids
        # - Don't forget to sanitize it before executing it!
        result_set = connection.execute(SQL)
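One way to handle the variable number of oids safely (sketched here with sqlite3 and a hypothetical artisans table; the real implementation would use whatever placeholder style its driver supports) is to build one placeholder per oid and bind the values as parameters:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE artisans (oid TEXT PRIMARY KEY)")
connection.executemany(
    "INSERT INTO artisans (oid) VALUES (?)",
    [("oid-1",), ("oid-2",), ("oid-3",)]
)

def delete_artisans(connection, *oids):
    # One placeholder per oid, so the values are passed as bound
    # parameters rather than interpolated into the SQL string:
    placeholders = ", ".join("?" for _ in oids)
    SQL = "DELETE FROM artisans WHERE oid IN (%s)" % placeholders
    connection.execute(SQL, oids)

delete_artisans(connection, "oid-1", "oid-3")
remaining = [
    row[0] for row in
    connection.execute("SELECT oid FROM artisans").fetchall()
]
```

Only the placeholder list is interpolated into the template; the oid values themselves never touch the SQL string.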

Most of the pattern and approach behind the get method should look familiar; again, it has the same signature (and is intended to perform the same activities) as the methods that have been created so far to implement the required functionality of BaseDataObject:

    @classmethod
    def get(cls, *oids, **criteria) -> list:
        # - First, we need the database-connection that we're 
        #   working with:
        connection = cls.get_connection()
        # - The first pass of the process retrieves documents based 
        #   on oids or criteria.
        # - We also need to keep track of whether or not to do a 
        #   matches call on the results after the initial data-
        #   retrieval:
        post_filter = False

        # - Records are often returned as a tuple (result_set) 
        #   of tuples (rows) of tuples (field-name, field-value):
        #   ( ..., ( ('field-name', 'value' ), (...), ... ), …)

The branch that handles oid requests is as follows:

        if oids:
            # - Need to replace any placeholder values in the raw SQL
            #   with actual values, AND sanitize the SQL string, but 
            #   it starts with the SQL in cls._sql_read_oids
            SQL = cls._sql_read_oids
            result_set = connection.execute(SQL)
            if criteria:
                post_filter = True

The criteria branch is as follows:

        elif criteria:
            # - The same sort of replacement would need to happen here 
            #   as happens for oids, above. If the query uses just 
            #   one criteria key/value pair initially, we can use the 
            #   match-based filtering later to filter further as needed
            key = list(criteria.keys())[0]
            value = criteria[key]
            SQL = cls._sql_read_criteria % (key, value)
            result_set = connection.execute(SQL)
            if len(criteria) > 1:
                post_filter = True

The default branch that simply gets everything else is as follows:

        else:
            SQL = cls._sql_read_all
            result_set = connection.execute(SQL)

All of the branches generate a list of data_dict values that can be used to create object instances, though they may not be returned from the backend data store as dictionary values.

The lowest common denominator results of a query are, as noted in the preceding code comments, a tuple of tuples of tuples, which might look something like the following:

# This is the outermost tuple, collecting all of the 
# rows returned into a result_set:
(
    # Each tuple at this level is a single row:
    (
        # Each tuple at this level is a key/value pair:
        ('oid', '43d240cd-4c9f-44c2-a196-1c7c56068cef'),
        ('first_name', 'John'),
        ('last_name', 'Smith'),
        ('email', '[email protected]'),
        # ...
    ),
    # more rows could happen here, or not...
)

If the engine, or the Python API to the engine, provides a built-in mechanism for converting rows returned into dictionary instances, that's probably the preferred approach to use to make the conversion. If there isn't anything built in to handle that, converting the nested tuples to a series of dictionaries isn't difficult to do:

        # - We should have a result_set value here, so we can convert 
        #   it from the tuple of tuples of tuples (or whatever) into 
        #   data_dict-compatible dictionaries:
        # - We should have a result_set value here, so we can convert 
        #   it from the tuple of tuples of tuples (or whatever) into 
        #   data_dict-compatible dictionaries; dict() accepts any 
        #   iterable of key/value pairs, so each row converts directly:
        data_dicts = [dict(row) for row in result_set]
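Given the nested-tuple shape shown earlier, that conversion behaves as follows (a self-contained illustration with sample values):

```python
# A sample result_set in the nested-tuple shape described above
# (the values are illustrative):
result_set = (
    (
        ('oid', '43d240cd-4c9f-44c2-a196-1c7c56068cef'),
        ('first_name', 'John'),
        ('last_name', 'Smith'),
    ),
)

# dict() accepts any iterable of key/value pairs, so each row
# converts directly into a data_dict-style dictionary:
data_dicts = [dict(row) for row in result_set]
```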

From this point on, the process is pretty much the same as in the previous implementations, in JSONFileDataObject and HMSMongoDataObject:

        # - With those, we can create the initial list of instances:
        results = [
            cls.from_data_dict(data_dict) 
            for data_dict in data_dicts
        ]
        # - If post_filter has been set to True, then the request 
        #   was for items by oid *and* that have certain criteria
        if post_filter:
            results = [
                obj for obj in results if obj.matches(**criteria)
            ]

Another (potentially major) difference concerns how child objects, such as the products in an Artisan object, will have to be handled. If those child objects need to be fetched as objects and attached to their parent, and assuming that they use the same BaseDataObject-derived interface, then each child type will have a class associated with it, each of those classes will have a get method, and that get method will allow the oid of the parent object to be specified as criteria. That allows for a process like the following, used to retrieve and attach any child objects as needed (using the Artisan and Product classes as an example):

        # - Data-objects that have related child items, like the 
        #   Artisan to Product relationship, may need to acquire 
        #   those children here before returning the results. If 
        #   they do, then a structure like this should work most 
        #   of the time:
        for artisan in results:
            artisan._set_products(
                Product.get(artisan_oid=artisan.oid)
            )
        return results
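A self-contained sketch of that parent/child pattern (using minimal stub classes standing in for the real Artisan and Product data objects, with an in-memory collection instead of a data store) behaves like this:

```python
# Minimal stub classes, standing in for the real Artisan and
# Product data objects, to show the retrieval pattern:
class Product:
    _all = []

    def __init__(self, oid, artisan_oid):
        self.oid = oid
        self.artisan_oid = artisan_oid
        Product._all.append(self)

    @classmethod
    def get(cls, **criteria):
        # The real get would query the data store; this stub just
        # filters the in-memory collection by artisan_oid:
        return [
            product for product in cls._all
            if product.artisan_oid == criteria.get('artisan_oid')
        ]


class Artisan:
    def __init__(self, oid):
        self.oid = oid
        self.products = []

    def _set_products(self, products):
        self.products = list(products)


artisan = Artisan('artisan-1')
Product('product-1', 'artisan-1')
Product('product-2', 'artisan-2')

# The pattern from the get method above: fetch the children whose
# parent-oid criteria matches, and attach them to the parent:
artisan._set_products(Product.get(artisan_oid=artisan.oid))
```

Each parent row thus costs one additional child query, which is worth keeping in mind when result sets are large.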

The other members of a final business/data object class that derives from HMSSQLDataObject should, for the most part, be expected by now, since they are also required for the implementation of final data objects derived from the other two DataObject ABCs. They would include the concrete implementations of to_data_dict and matches instance methods and the from_data_dict class method, and the various class-specific variables (mostly the _sql class attributes).
