The BaseDataObject ABC

The bulk of the properties of BaseDataObject are Boolean values: flags that indicate whether an instance of the class is in a specific state. The implementations of those properties all follow a simple pattern that's already been shown in the definition of the available property of BaseProduct in the previous iteration. That structure looks like this:

###################################
# Property-getter methods         #
###################################

def _get_bool_prop(self) -> (bool,):
    return self._bool_prop

###################################
# Property-setter methods         #
###################################

def _set_bool_prop(self, value:(bool,int)):
    if value not in (True, False, 1, 0):
        raise ValueError(
            '%s.bool_prop expects either a boolean value '
            '(True|False) or a direct int-value equivalent '
            '(1|0), but was passed "%s" (%s)' % 
            (self.__class__.__name__, value, type(value).__name__)
        )
    if value:
        self._bool_prop = True
    else:
        self._bool_prop = False

###################################
# Property-deleter methods        #
###################################

def _del_bool_prop(self) -> None:
    self._bool_prop = False

###################################
# Instance property definitions   #
###################################

bool_prop = property(
    _get_bool_prop, _set_bool_prop, _del_bool_prop, 
    'Gets, sets or deletes the flag that indicates whether '
    'the instance is in a particular state'
)
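A runnable sketch of that pattern may help make it concrete. The Example class and its bool_prop flag below are hypothetical stand-ins; the real properties of BaseDataObject differ only in name and documentation:

```python
class Example:
    def __init__(self):
        self._bool_prop = False

    def _get_bool_prop(self) -> (bool,):
        return self._bool_prop

    def _set_bool_prop(self, value:(bool,int)):
        if value not in (True, False, 1, 0):
            raise ValueError(
                '%s.bool_prop expects either a boolean value '
                '(True|False) or a direct int-value equivalent '
                '(1|0), but was passed "%s" (%s)' % 
                (self.__class__.__name__, value, type(value).__name__)
            )
        # - Normalize the int equivalents to actual booleans
        self._bool_prop = True if value else False

    def _del_bool_prop(self) -> None:
        self._bool_prop = False

    bool_prop = property(
        _get_bool_prop, _set_bool_prop, _del_bool_prop,
        'Gets, sets or deletes an example Boolean flag'
    )

example = Example()
example.bool_prop = 1              # int equivalents are accepted...
assert example.bool_prop is True   # ...but stored as real booleans
del example.bool_prop              # deletion resets to the default
assert example.bool_prop is False
```

Note that the setter always stores an actual True or False, so code that reads the property never has to worry about receiving a 1 or 0 instead.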

Since the deleter methods behind those properties are also used to set the default values for an instance during initialization, they should yield specific values when the properties are deleted (that is, when those methods are called):

###################################
# Property-deleter methods        #
###################################

def _del_is_active(self) -> None:
    self._is_active = True

def _del_is_deleted(self) -> None:
    self._is_deleted = False

def _del_is_dirty(self) -> None:
    self._is_dirty = False

def _del_is_new(self) -> None:
    self._is_new = True

Unless overridden by a derived class, or by a specific object creation process, any instance derived from BaseDataObject will start with these:

  • is_active == True
  • is_deleted == False
  • is_dirty == False
  • is_new == True

So a newly created instance will be active, not deleted, not dirty, and new, the assumption being that the process of creating a new object will usually be undertaken with the intention of saving a new, active object. If any state changes are made after the instance is created, those may set the is_dirty flag to True in the process, but the fact that is_new is True means that the object's record needs to be created, rather than updated, in the backend datastore.

The only significant deviation from that standard Boolean property structure is in the documentation of the properties themselves during their definition:

###################################
# Instance property definitions   #
###################################

is_active = property(
    _get_is_active, _set_is_active, _del_is_active, 
    'Gets, sets or deletes the flag that indicates whether '
    'the instance is considered active/available'
)
is_deleted = property(
    _get_is_deleted, _set_is_deleted, _del_is_deleted, 
    'Gets, sets or deletes the flag that indicates whether '
    'the instance is considered to be "deleted," and thus '
    'not generally available'
)
is_dirty = property(
    _get_is_dirty, _set_is_dirty, _del_is_dirty, 
    'Gets, sets or deletes the flag that indicates whether '
    "the instance's state-data has been changed such that "
    'its record needs to be updated'
)
is_new = property(
    _get_is_new, _set_is_new, _del_is_new, 
    'Gets, sets or deletes the flag that indicates whether '
    'the instance needs to have a state-data record created'
)

Two of the properties of BaseDataObject, created and modified, are shown in the class diagram as datetime values: objects that represent a specific time of day on a specific date. A datetime object stores the year, month, day, hour, minute, second, and microsecond of a date/time, and provides several conveniences over, say, working with an equivalent value that is managed strictly as a timestamp number or as a string representation of a date/time. One of those conveniences is the ability to parse a value from a string, allowing the _set_created and _set_modified setter methods behind the properties to accept a string value instead of requiring an actual datetime. Similarly, datetime provides the ability to create a datetime instance from a timestamp: the number of seconds elapsed from a common starting date/time. In order to fully support all of those argument types, it's necessary to define a common format string that will be used both to parse datetime values from strings and to format them into strings. That value, at least for now, feels like it's probably best stored as a class attribute on BaseDataObject itself, so that all classes that derive from it will have the same value available by default:

class BaseDataObject(metaclass=abc.ABCMeta):
    """
Provides baseline functionality, interface requirements, and 
type-identity for objects that can persist their state-data in 
any of several back-end data-stores.
"""
    ###################################
    # Class attributes/constants      #
    ###################################

    _data_time_string = '%Y-%m-%d %H:%M:%S'

The setter methods are somewhat longer than most, since they are dealing with four different viable value types, though only two subprocesses are required to cover all of those variations. The setter process starts by type checking the supplied value, confirming that it's one of the accepted types:

def _set_created(self, value:(datetime,str,float,int)):
    if type(value) not in (datetime,str,float,int):
        raise TypeError(
            '%s.created expects a datetime value, a numeric '
            'value (float or int) that can be converted to '
            'one, or a string value of the format "%s" that '
            'can be parsed into one, but was passed '
            '"%s" (%s)' % 
            (
                self.__class__.__name__, 
                self.__class__._data_time_string, value, 
                type(value).__name__, 
            )
        )

Handling either of the legitimate numeric types is fairly straightforward. If an error is detected, we should provide more specific messaging about the nature of the problem encountered:

    if type(value) in (int, float):
        # - A numeric value was passed, so create a new 
        #   value from it
        try:
            value = datetime.fromtimestamp(value)
        except Exception as error:
            raise ValueError(
                '%s.created could not create a valid datetime '
                'object from the value provided, "%s" (%s) due '
                'to an error - %s: %s' % 
                (
                    self.__class__.__name__, value, 
                    type(value).__name__, 
                    error.__class__.__name__, error
                )
            )

The subprocess for handling string values is similar, apart from its call to datetime.strptime instead of datetime.fromtimestamp, and its use of the _data_time_string class attribute to define what a valid date/time string looks like:

    elif type(value) == str:
        # - A string value was passed, so create a new value 
        #   by parsing it with the standard format
        try:
            value = datetime.strptime(
                value, self.__class__._data_time_string
            )
        except Exception as error:
            raise ValueError(
                '%s.created could not parse a valid datetime '
                'object using "%s" from the value provided, '
                '"%s" (%s) due to an error - %s: %s' % 
                (
                    self.__class__.__name__, 
                    self.__class__._data_time_string, 
                    value, type(value).__name__, 
                    error.__class__.__name__, error
                )
            )

If the original value was an instance of datetime, then neither of the previous subprocesses would have executed. If either of them executed, then the original value argument will have been replaced with a datetime instance. In either case, that value can be stored in the underlying property attribute:

    # - If this point is reached without error, then we have a 
    #   well-formed datetime object, so store it
    self._created = value
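The two conversion subprocesses can be exercised on their own, outside the setter. This quick, standalone illustration uses a local copy of the format that _data_time_string holds:

```python
from datetime import datetime

# The same format that BaseDataObject keeps in _data_time_string
data_time_string = '%Y-%m-%d %H:%M:%S'

# - A string value is parsed into a datetime with strptime
parsed = datetime.strptime('2001-01-01 12:34:56', data_time_string)
# - A numeric value (a timestamp) is converted with fromtimestamp
from_number = datetime.fromtimestamp(parsed.timestamp())

assert parsed == from_number
# - The same format string also serializes the value back out
assert parsed.strftime(data_time_string) == '2001-01-01 12:34:56'
```

Either path yields a real datetime object, which is why the setter can fall through to a single storage statement at the end.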

For the purposes of BaseDataObject, both created and modified should always have a value, and if one isn't available when it's needed – generally only when a data object's state data record is being saved – one should be created then and there from the current date/time, which can be accomplished in the getter method with datetime.now():

def _get_created(self) -> datetime:
    if self._created is None:
        self.created = datetime.now()
    return self._created

That, in turn, implies that the deleter method should set the property storage attribute's value to None:

def _del_created(self) -> None:
    self._created = None

The corresponding property definitions are standard, except that the created property doesn't allow deletion directly; it makes no sense to allow an object to delete its own created date/time:

###################################
# Instance property definitions   #
###################################

created = property(
    _get_created, _set_created, None, 
    'Gets or sets the date-time that the state-data record of '
    'the instance was created'
)

# ...

modified = property(
    _get_modified, _set_modified, _del_modified, 
    'Gets, sets or deletes the date-time that the state-data '
    'record of the instance was last modified'
)

The last property of BaseDataObject is perhaps the most critical: oid, which is intended to uniquely identify the state data record for a given data object. That property is defined as a Universally Unique Identifier (UUID) value, which Python provides in its uuid library. There are at least two advantages to using a UUID as a unique identifier instead of some of the more traditional approaches, such as a serial record number:

  • UUIDs are not dependent on a database operation's success to be available: They can be generated in code, without having to worry about waiting for a SQL INSERT to complete, for example, or whatever corresponding mechanism might be available in a NoSQL data store. That means fewer database operations, and probably simpler ones as well, which makes things easier.

  • UUIDs are not easily predictable: A UUID is a series of 32 hexadecimal digits (with some dashes separating them into sections that are not relevant for this discussion), such as ad6e3d5c-46cb-4547-9971-5627e6b3039a. If they are generated with any of several standard functions provided by the uuid library, their sequence, if not truly random, is at least random enough to make finding a given value very difficult for a malicious user, with 3.4 × 10³⁴ possible values to look for (16 values per hex digit, 31 digits because one is reserved).

The unpredictability of UUIDs is especially useful in applications that have data accessible over the internet. Identification of records by sequential numbering makes it much easier for malicious processes to hit an API of some sort and just retrieve each record in sequence, all else being equal.

There are some caveats, though:

  • Not all database engines will recognize UUID objects as viable field types. That can be managed by storing actual UUID values in the data objects, but writing and reading string representations of those values to and from the database.
  • There may be very slight performance impacts on database operations that use UUIDs as unique identifiers as well, especially if a string representation is used instead of the actual value.
  • Their inherent unpredictability can make legitimate examination of data difficult if there aren't other identifying criteria that can be used: human-meaningful data values that can be queried against.
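The first caveat's workaround is easy to see in code. This is a minimal sketch: keep a real UUID object on the data object, but persist its string form for engines that have no native UUID field type:

```python
from uuid import UUID, uuid4

# - Generate a real UUID for the object in code; no database
#   round trip is needed to obtain it
oid = uuid4()
# - Its string form is what gets written to the data store
#   (it fits, for example, a CHAR(36) column)
stored = str(oid)
# - When the record is read back, the UUID is reconstructed
#   from that stored string
retrieved = UUID(stored)

assert retrieved == oid
assert len(stored) == 36
```

Because `UUID(str(some_uuid))` always round-trips to an equal value, the string representation is a safe persistence format.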

Even setting the advantages aside, BaseDataObject will use UUIDs for object identity (the oid property) because of a combination of requirements and expected implementations:

  • The Artisan Application won't have a real database behind it. It'll probably end up being a simple, local document store, so the generation of a unique identifier for any given data object must be something that's self-contained and not reliant on anything other than the application's code base.

  • The same oid values need to propagate to and from the Artisan Application and the Artisan Gateway service. Trying to coordinate identities across any number of artisans could lead, very quickly, to identity collisions, and mitigating that would probably require more work (maybe a lot more) without making significant changes to the requirements of the system, or at least how the various installables in the system interact. The likelihood of collisions between any two randomly-generated UUIDs is extremely low (if not impossible for all practical purposes), simply because of the number of possible values involved.

Implementation of the oid property will follow a pattern similar to the one established for the datetime-based properties. The getter method will create a value on demand, the setter method will accept UUID objects or string representations of them and create actual UUID objects internally, and the deleter method will set the current storage value to None:

def _get_oid(self) -> UUID:
    if self._oid is None:
        self._oid = uuid4()
    return self._oid

# ...

def _set_oid(self, value:(UUID,str)):
    if type(value) not in (UUID,str):
        raise TypeError(
            '%s.oid expects a UUID value, or string '
            'representation of one, but was passed "%s" (%s)' % 
            (self.__class__.__name__, value, type(value).__name__)
        )
    if type(value) == str:
        try:
            value = UUID(value)
        except Exception as error:
            raise ValueError(
                '%s.oid could not create a valid UUID from '
                'the provided string "%s" because of an error '
                '%s: %s' % 
                (
                    self.__class__.__name__, value, 
                    error.__class__.__name__, error
                )
            )
    self._oid = value

# ...

def _del_oid(self) -> None:
    self._oid = None

Most of the methods of BaseDataObject are abstract, including all of the class methods. None of them has a concrete implementation that might be reused in derived classes, so they are all very basic definitions:

    ###################################
    # Abstract methods                #
    ###################################

    @abc.abstractmethod
    def _create(self) -> None:
        """
Creates a new state-data record for the instance in the back-end 
data-store
"""
        raise NotImplementedError(
            '%s has not implemented _create, as required by '
            'BaseDataObject' % (self.__class__.__name__)
        )

    @abc.abstractmethod
    def to_data_dict(self) -> (dict,):
        """
Returns a dictionary representation of the instance which can 
be used to generate data-store records, or for criteria-matching 
with the matches method.
"""
        raise NotImplementedError(
            '%s has not implemented to_data_dict, as required by '
            'BaseDataObject' % (self.__class__.__name__)
        )

    @abc.abstractmethod
    def _update(self) -> None:
        """
Updates an existing state-data record for the instance in the 
back-end data-store
"""
        raise NotImplementedError(
            '%s has not implemented _update, as required by '
            'BaseDataObject' % (self.__class__.__name__)
        )

    ###################################
    # Class methods                   #
    ###################################

    @abc.abstractclassmethod
    def delete(cls, *oids):
        """
Performs an ACTUAL record deletion from the back-end data-store 
of all records whose unique identifiers have been provided
"""
        raise NotImplementedError(
            '%s.delete (a class method) has not been implemented, '
            'as required by BaseDataObject' % (cls.__name__)
        )

    @abc.abstractclassmethod
    def from_data_dict(cls, data_dict:(dict,)):
        """
Creates and returns an instance of the class whose state-data has 
been populated with values from the provided data_dict
"""
        raise NotImplementedError(
            '%s.from_data_dict (a class method) has not been '
            'implemented, as required by BaseDataObject' % 
            (cls.__name__)
        )

    @abc.abstractclassmethod
    def get(cls, *oids, **criteria):
        """
Finds and returns all instances of the class from the back-end 
data-store whose oids are provided and/or that match the supplied 
criteria
"""
        raise NotImplementedError(
            '%s.get (a class method) has not been implemented, '
            'as required by BaseDataObject' % (cls.__name__)
        )

The to_data_dict instance method and the from_data_dict class method are intended to provide mechanisms to represent an instance's complete state data as a dict, and create an instance from such a dict representation, respectively. The from_data_dict method should facilitate record retrieval and conversion into actual programmatic objects across most standard RDBMS-connection libraries in Python, especially if the field names in the database are identical to the property names of the class. Similar usage should be viable in NoSQL data stores as well. Though the to_data_dict method may or may not be as useful in writing records to a data store, it will be needed to match objects based on criteria (the matches method, which we'll get to shortly).

PEP-249, the current Python Database API Specification, defines an expectation that database queries in libraries that conform to the standards of the PEP will, at a minimum, return lists of tuples as result sets. Most mature database connector libraries also provide a convenience mechanism to return a list of dict record values, where each dict maps field names as keys to the values of the source records.
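The standard-library sqlite3 module is one concrete example of that convenience mechanism: setting the connection's row_factory to sqlite3.Row makes each result row behave like a mapping of field names to values, which dict() can then convert outright. This is a minimal sketch with a hypothetical artisans table:

```python
import sqlite3

connection = sqlite3.connect(':memory:')
# - sqlite3.Row gives each result row mapping-style access by
#   field name, so dict(row) produces exactly the kind of record
#   that from_data_dict expects
connection.row_factory = sqlite3.Row
connection.execute(
    'CREATE TABLE artisans (oid TEXT, contact_name TEXT)'
)
connection.execute(
    'INSERT INTO artisans VALUES (?, ?)',
    ('00000000-0000-0000-0000-000000000000', 'John Doe')
)
cursor = connection.execute('SELECT * FROM artisans')
records = [dict(row) for row in cursor.fetchall()]
```

If the field names in the table match the property names of the class, each of those record dicts can be handed straight to a from_data_dict implementation.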

The _create and _update methods are simply requirements for the record creation and record update processes, and will eventually be called by the save method. The need for separate record creation and record update processes may not be applicable to all data store engines, though; some, especially in the NoSQL realm, already provide a single mechanism for writing a record, and simply don't care whether it already exists. Others may provide some sort of mechanism that will allow an attempt to create a new record to be made first, and if that fails (because a duplicate key is found, indicating that the record already exists), then update the existing record instead. This option is available in MySQL and MariaDB databases, and may exist elsewhere. In any of those cases, overriding the save method to use those single-point-of-contact processes may be a better option.

The delete class method is self-explanatory, and sort probably is as well.

The get method requires some examination, even without any concrete implementation. As noted earlier, it is intended to be the primary mechanism for returning objects with state data retrieved from the database, and to accept both zero-to-many object IDs (the *oids argument list) and filtering criteria (in the **criteria keyword arguments). The expectation for how the whole get process will actually work is as follows:

  • If oids is not empty:

    1. Perform whatever low-level query or lookup is needed to find objects that match one of the provided oids, processing each record with from_data_dict and yielding a list of objects
    2. If criteria is not empty, filter the current list down to those objects whose matches results against the criteria are True
    3. Return the resulting list
  • Otherwise, if criteria is not empty:

    1. Perform whatever low-level query or lookup is needed to find objects that match the provided criteria values, processing each record with from_data_dict and yielding a list of objects
    2. Filter the current list down to those objects whose matches results against the criteria are True
    3. Return the resulting list
  • Otherwise, perform whatever low-level query or lookup is needed to retrieve all available objects, again processing each record with from_data_dict, yielding a list of objects and simply returning them all
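The steps above can be sketched as code. Everything here is hypothetical: DemoDataObject stands in for a concrete class, and its _store dictionary stands in for the low-level query or lookup mechanism, which a real implementation would replace with actual data store access:

```python
class DemoDataObject:
    # - Hypothetical in-memory "data store": maps oid -> record dict
    _store = {}

    def __init__(self, **state):
        self.state = dict(state)

    def matches(self, **criteria) -> bool:
        # - Simplified stand-in for BaseDataObject.matches
        return all(
            self.state.get(key) == value
            for key, value in criteria.items()
        )

    @classmethod
    def from_data_dict(cls, data_dict):
        return cls(**data_dict)

    @classmethod
    def get(cls, *oids, **criteria) -> list:
        if oids:
            # - oids supplied: low-level lookup by oid first
            records = [
                cls._store[oid] for oid in oids if oid in cls._store
            ]
        else:
            # - No oids: start from all available records
            records = list(cls._store.values())
        # - Process each record into an actual object
        results = [cls.from_data_dict(record) for record in records]
        if criteria:
            # - Filter down to objects whose matches results are True
            results = [
                obj for obj in results if obj.matches(**criteria)
            ]
        return results

DemoDataObject._store = {
    'a': {'oid': 'a', 'name': 'hat'},
    'b': {'oid': 'b', 'name': 'scarf'},
}
assert len(DemoDataObject.get()) == 2
assert DemoDataObject.get('a')[0].state['name'] == 'hat'
assert DemoDataObject.get(name='scarf')[0].state['oid'] == 'b'
```

The branching collapses nicely in practice: the only real decision is where the initial record set comes from, after which the criteria filtering is identical for every branch.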

Taken together, the combination of the oids and criteria values will allow the get class method to find and return objects that do the following:

  • Match one or more oids: get(oid[, oid, …, oid])
  • Match one or more oids and some set of criteria: get(oid[, oid, …, oid], key=value[, key=value, …, key=value])
  • Match one or more criteria key/value pairs, regardless of the oids of the found items: get(key=value[, key=value, …, key=value])
  • That simply exist in the backend data store: get()

That leaves the matches and save methods, the only two concrete implementations in the class. The goal behind matches is to provide an instance-level mechanism for comparing the instance with criteria names/values, which is the process that the criteria passed to the get method rely upon to actually find matching items. Its implementation is simpler than it might appear at first, but relies on operations against set objects, and on a Python built-in function that is often overlooked (all), so the process itself is heavily commented in the code:

###################################
# Instance methods                #
###################################

def matches(self, **criteria) -> (bool,):
    """
Compares the supplied criteria with the state-data values of 
the instance, and returns True if all instance properties 
specified in the criteria exist and equal the values supplied.
"""
    # - First, if criteria is empty, we can save some time 
    #   and simply return True - If no criteria are specified, 
    #   then the object is considered to match the criteria.
    if not criteria:
        return True
    # - Next, we need to check to see if all the criteria 
    #   specified even exist in the instance:
    data_dict = self.to_data_dict()
    data_keys = set(data_dict.keys())
    criteria_keys = set(criteria.keys())
    # - If all criteria_keys exist in data_keys, then the 
    #   intersection of the two will equal criteria_keys. 
    #   If that's not the case, at least one key-value won't 
    #   match (because it doesn't exist), so return False
    if criteria_keys.intersection(data_keys) != criteria_keys:
        return False
    # - Next, we need to verify that values match for all 
    #   specified criteria
    return all(
        [
            (data_dict[key] == criteria[key]) 
            for key in criteria_keys
        ]
    )

The all function is a nice convenience: it returns True if all of the items in the iterable it's passed evaluate to True (or at least true-ish, so non-empty strings, lists, tuples, and dictionaries, and non-zero numbers, would all be considered True). It returns False if any members of the iterable aren't True, and returns True if the iterable is empty. The results of matches will be False if either of these conditions occurs:

  • Any key in the criteria doesn't exist in the instance's data_dict – a criteria key that cannot be matched, essentially
  • Any value specified in criteria doesn't exactly match its corresponding value in the instance's data_dict
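Both failure conditions can be walked through in isolation. The data_dict below is a hypothetical to_data_dict result, not anything defined by BaseDataObject itself:

```python
# - A hypothetical to_data_dict result
data_dict = {'oid': 'a', 'name': 'hat', 'is_active': True}
data_keys = set(data_dict.keys())

criteria = {'name': 'hat', 'is_active': True}
criteria_keys = set(criteria.keys())

# - All criteria keys exist in the data-dict, so the
#   intersection equals criteria_keys...
assert criteria_keys.intersection(data_keys) == criteria_keys
# - ...and every corresponding value matches, so all(...) is True
assert all(data_dict[key] == criteria[key] for key in criteria_keys)

# - Condition 1: a criteria key that doesn't exist can never match
missing_keys = set({'color': 'red'}.keys())
assert missing_keys.intersection(data_keys) != missing_keys

# - Condition 2: an existing key with a non-matching value
#   fails the all(...) check
assert not all(
    data_dict[key] == value
    for key, value in {'name': 'scarf'}.items()
)
```

The set intersection acts as a cheap existence check before any values are compared, which is why matches can return False early without touching the values at all.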

The save method is very simple. It just calls the instance's _create or _update methods based on the current state of the instance's is_new or is_dirty flag properties, respectively, and resets those flags after either executes, leaving the object clean and ready for whatever might come next:

    def save(self):
        """
Saves the instance's state-data to the back-end data-store by 
creating it if the instance is new, or updating it if the 
instance is dirty
"""
        if self.is_new:
            self._create()
            self._set_is_new(False)
            self._set_is_dirty(False)
        elif self.is_dirty:
            self._update()
            self._set_is_dirty(False)
            self._set_is_new(False)
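A stripped-down sketch shows how the flags drive which operation runs. SaveDemo is hypothetical, with plain attributes standing in for the properties and a calls list standing in for real data store operations:

```python
class SaveDemo:
    def __init__(self):
        self.is_new = True
        self.is_dirty = False
        # - Records which back-end operations would have run
        self.calls = []

    def _create(self):
        self.calls.append('create')

    def _update(self):
        self.calls.append('update')

    def save(self):
        if self.is_new:
            self._create()
            self.is_new = False
            self.is_dirty = False
        elif self.is_dirty:
            self._update()
            self.is_dirty = False
            self.is_new = False

obj = SaveDemo()
obj.save()             # new object: the record is created
obj.is_dirty = True
obj.save()             # changed object: the record is updated
obj.save()             # clean object: nothing happens at all
assert obj.calls == ['create', 'update']
```

The third save call is the interesting one: because both flags are False, no back-end operation runs, which is exactly the "clean and ready for whatever might come next" state described above.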

The initialization of a BaseDataObject should allow values for all of its properties, but not require any of those values:

    def __init__(self, 
        oid:(UUID,str,None)=None, 
        created:(datetime,str,float,int,None)=None, 
        modified:(datetime,str,float,int,None)=None,
        is_active:(bool,int,None)=None, 
        is_deleted:(bool,int,None)=None,
        is_dirty:(bool,int,None)=None, 
        is_new:(bool,int,None)=None,
    ):

The actual initialization process follows the previously established pattern for optional arguments: calling the corresponding _del_ method for each argument, then calling the corresponding _set_ method for each argument that isn't None. Using the oid argument as an example:

        # - Call parent initializers if needed
        # - Set default instance property-values using _del_... methods

        # ...

        self._del_oid()
        # - Set instance property-values from arguments using 
        #   _set_... methods
        if oid is not None:
            self._set_oid(oid)

        # ...

        # - Perform any other initialization needed

This initializer method's signature is getting pretty long, with seven arguments (ignoring self, since that will always be present and will always be the first argument). Knowing that we'll eventually define concrete classes as combinations of BaseDataObject and one of the business object classes already defined, the signature for __init__ on those concrete classes could get much longer, too. That, though, is part of the reason why the initialization signature of BaseDataObject makes all of its arguments optional. Consider it in combination with one of those business object classes, BaseArtisan, for example, with an __init__ signature of:

def __init__(self, 
    contact_name:str, contact_email:str, 
    address:Address, company_name:str=None, 
    website:(str,)=None, 
    *products
    ):

The combined __init__ signature for an Artisan that's derived from both, while long...

def __init__(self, 
    contact_name:str, contact_email:str, 
    address:Address, company_name:str=None, 
    website:(str,)=None, 
    oid:(UUID,str,None)=None, 
    created:(datetime,str,float,int,None)=None, 
    modified:(datetime,str,float,int,None)=None,
    is_active:(bool,int,None)=None, 
    is_deleted:(bool,int,None)=None,
    is_dirty:(bool,int,None)=None, 
    is_new:(bool,int,None)=None,
    *products
    ):

... only requires the contact_name, contact_email, and address arguments that BaseArtisan requires, and allows all of the arguments to be passed as if they were keyword arguments, like this:

artisan = Artisan(
    contact_name='John Doe', contact_email='[email protected]', 
    address=my_address, oid='00000000-0000-0000-0000-000000000000', 
    created='2001-01-01 12:34:56', modified='2001-01-01 12:34:56'
)

It also allows the entire parameter set to be defined as a single dictionary and passed whole-cloth to the initializer, using the same syntax as passing a keyword argument set:

artisan_parameters = {
    'contact_name':'John Doe',
    'contact_email':'[email protected]', 
    'address':my_address,
    'oid':'00000000-0000-0000-0000-000000000000', 
    'created':'2001-01-01 12:34:56', 
    'modified':'2001-01-01 12:34:56'
}
artisan = Artisan(**artisan_parameters)

That syntax for passing arguments in a dictionary, using **dictionary_name, is a common form of argument parameterization in Python, especially in functions and methods where the full collection of arguments is unreasonably long. It requires some thought and discipline on the design side of the development process, and an eye toward being very restrictive with respect to required arguments, but in the long run it's more helpful and easier to use than it might appear at first glance.

This last structure will be critical in the implementation of the from_data_dict methods of the various classes derived from BaseDataObject – in most cases, it should allow the implementation of those methods to be little more than this:

@classmethod
def from_data_dict(cls, data_dict):
    return cls(**data_dict)
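The round trip this enables can be sketched with a hypothetical minimal class, assuming (as the pattern requires) that to_data_dict returns exactly the keyword arguments that __init__ accepts:

```python
class RoundTripDemo:
    def __init__(self, oid=None, name=None):
        self.oid = oid
        self.name = name

    def to_data_dict(self) -> dict:
        # - Keys deliberately mirror the __init__ parameter names
        return {'oid': self.oid, 'name': self.name}

    @classmethod
    def from_data_dict(cls, data_dict):
        # - The entire record dict becomes the keyword-argument set
        return cls(**data_dict)

original = RoundTripDemo(oid='a', name='hat')
restored = RoundTripDemo.from_data_dict(original.to_data_dict())
assert restored.to_data_dict() == original.to_data_dict()
```

Keeping the dict keys and the initializer's parameter names in lockstep is what makes the one-line from_data_dict implementation possible; any mismatch between them would surface immediately as a TypeError.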
