Data modeling is the process of translating the data requirements of your application to the features of your data storage technology. While the application deals in players, towns, weapons, potions, and gold, the datastore knows only entities, entity groups, keys, properties, and indexes. The data model describes how the data is stored and how it is manipulated. Entities represent players and game objects; properties describe the status of objects and the relationships between them. When an object changes location, the data is updated in a transaction, so the object cannot be in two places at once. When a player wants to know about the weapons in her inventory, the application performs a query for all weapon objects whose location is the player, possibly requiring an index.
In the last few chapters, we’ve been using the Python class db.Expando to create and manipulate entities and their properties. As we’ve been using it, this class illustrates the flexible nature of the datastore. The datastore itself does not impose or enforce a structure on entities or their properties, giving the application control over how individual entities represent data objects. This flexibility is also an essential feature for scalability: changing the structure of millions of records is a large task, and the proper strategy for doing so is specific to the task and the application.
But structure is needed. Every player has a number of health points,
and a Player
entity without a health
property, or
with a health
property whose value is not an integer, is likely
to confuse the battle system. The data ought to conform to a structure, or
schema, to meet the expectations of the
code. Because the datastore does not enforce this schema itself—the
datastore is schemaless—it is up to the application to
ensure that entities are created and updated properly.
App Engine includes a data modeling library for defining and enforcing
data schemas in Python. This library resides in the google.appengine.ext.db
package. It includes
several related classes for representing data objects, including db.Model
, db.Expando
, and db.PolyModel
. To give structure to entities of a
given kind, you create a subclass of one of these classes. The definition of
the class specifies the properties for those objects, their allowed value
types, and other requirements.
In this chapter, we introduce the Python data modeling library and discuss how to use it to enforce a schema for the otherwise schemaless datastore. We also discuss how the library works and how to extend it.
The db.Model
superclass lets you specify a
structure for every entity of a kind. This structure can include the names
of the properties, the types of the values allowed for those properties,
whether the property is required or optional, and a default value. Here is
a definition of a Book
class similar to the one we created in
Chapter 5:
from google.appengine.ext import db
import datetime

class Book(db.Model):
    title = db.StringProperty(required=True)
    author = db.StringProperty(required=True)
    copyright_year = db.IntegerProperty()
    author_birthdate = db.DateProperty()

obj = Book(title='The Grapes of Wrath', author='John Steinbeck')
obj.copyright_year = 1939
obj.author_birthdate = datetime.date(1902, 2, 27)
obj.put()
This Book class inherits from db.Model. In the class definition, we declare that all Book entities have four properties, and we declare their value types: title and author are strings, copyright_year is an integer, and author_birthdate is a date. If someone tries to assign a value of the wrong type to one of these properties, the assignment raises a db.BadValueError.
We also declare that title and author are required properties. If someone tries to create a Book without these properties set as arguments to the Book constructor, the attempt raises a db.BadValueError.
copyright_year
and author_birthdate
are
optional, so we can leave them unset on construction, and assign values to
the properties later. If these properties are not set by the time the
object is saved, the resulting entity will not have these properties—and
that’s allowed by this model.
A property declaration ensures that the entity created from the
object has a value for the property, possibly None
. As we’ll
see in the next section, you can further specify what values are
considered valid using arguments to the property declaration.
A model class that inherits from db.Model
ignores all
attributes that are not declared as properties when it comes time to save
the object to the datastore. In the resulting entity, all declared
properties are set, and no others.
This is the sole difference between db.Model
and
db.Expando
. A db.Model
class ignores undeclared
properties. A db.Expando
class saves all attributes of the
object as properties of the corresponding entity. That is, a model using a
db.Expando
class “expands” to accommodate assignments to
undeclared properties.
You can use property declarations with db.Expando
just
as with db.Model
. The result is a data object that validates
the values of the declared properties, and accepts any values for
additional undeclared properties.
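Outside the SDK, the difference can be sketched in a few lines of plain Python. The class and method names here (StrictModel, ExpandoModel, to_entity) are invented for illustration and are not part of the API:

```python
class StrictModel(object):
    """Sketch: like db.Model, only declared fields reach the saved entity."""
    declared = ('title', 'author')

    def to_entity(self):
        # Undeclared attributes are simply dropped at save time.
        return dict((name, getattr(self, name))
                    for name in self.declared if hasattr(self, name))

class ExpandoModel(object):
    """Sketch: like db.Expando, every attribute becomes a property."""

    def to_entity(self):
        return dict(vars(self))

strict = StrictModel()
strict.title = 'The Grapes of Wrath'
strict.subtitle = 'ignored'     # undeclared: silently dropped on save

expando = ExpandoModel()
expando.title = 'The Grapes of Wrath'
expando.subtitle = 'kept'       # undeclared: saved anyway

print(strict.to_entity())   # {'title': 'The Grapes of Wrath'}
print(expando.to_entity())  # includes 'subtitle' as well
```

The real classes do much more (key management, datastore round-trips), but the save-time behavior contrasted here is the essential difference.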
The official documentation refers to properties with
declarations as static properties and properties on
a db.Expando
without declarations as dynamic
properties. These terms have a nice correspondence with the
notions of static and dynamic typing in programming languages. Property
declarations implement a sort of runtime validated static typing for
model classes, on top of Python’s own dynamic typing.
As we’ll see, property declarations are even more powerful than static typing, because they can validate more than just the type of the value.
For both db.Model
and db.Expando
, object
attributes whose names begin with an underscore (_
) are
always ignored. You can use these private attributes to attach transient
data or functions to model objects. (It’s possible to create an entity
with a property whose name starts with an underscore; this convention only
applies to object attributes in the modeling API.)
Because model objects also have attributes that are methods and other features, you cannot use certain names for properties in the Python model API. Some of the more pernicious reserved names are key, kind, and parent. The official documentation has a complete list of reserved names. In the next section, we’ll see a way to use these reserved names for datastore properties even though they aren’t allowed as attribute names in the API.
Beyond the model definition, db.Model and db.Expando have the same interface for saving, fetching, and deleting entities, and for performing queries and transactions. db.Expando is a subclass of db.Model.
You declare a property for a model by assigning a property
declaration object to an attribute of the model class. The name of the
attribute is the name of the datastore property. The value is an object
that describes the terms of the declaration. As discussed earlier, the
db.StringProperty
object assigned to the
title
class attribute says that the entity that an instance
of the class represents can only have a string value for its
title
property. The required=True
argument to
the db.StringProperty
constructor says that the
object is not valid unless it has a value for the title
property.
This can look a little confusing if you’re expecting the class
attribute to shine through as an attribute of an instance of the class, as
it normally does in Python. Instead, the db.Model
class hooks
into the attribute assignment mechanism so it can use the property
declaration to validate a value assigned to an attribute of the object. In
Python terms, the model uses property descriptors to
enhance the behavior of attribute assignment.
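To make the mechanism concrete, here is a minimal sketch of such a validating descriptor in plain Python. It is not the SDK’s implementation; TypedProperty and its error messages are invented for illustration, and the sketch requires Python 3.6+ for __set_name__:

```python
class TypedProperty(object):
    """Sketch of a validating property descriptor (not the SDK's code)."""

    def __init__(self, value_type, required=False):
        self.value_type = value_type
        self.required = required
        self.name = None

    def __set_name__(self, owner, name):
        # Python 3.6+: records the attribute name at class creation.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Validation happens on every attribute assignment.
        if value is None and self.required:
            raise ValueError('%s is required' % self.name)
        if value is not None and not isinstance(value, self.value_type):
            raise TypeError('%s must be %s' %
                            (self.name, self.value_type.__name__))
        instance.__dict__[self.name] = value

class Book(object):
    title = TypedProperty(str, required=True)
    copyright_year = TypedProperty(int)

b = Book()
b.title = 'The Grapes of Wrath'      # passes validation
try:
    b.copyright_year = 'not a year'  # wrong type, rejected on assignment
except TypeError as err:
    print(err)                       # copyright_year must be int
```

The SDK’s declarations hook into assignment the same way, which is why an invalid value raises an exception at the assignment itself rather than at save time.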
Property declarations act as intermediaries between the application and the datastore. They can ensure that only values that meet certain criteria are assigned to properties. They can assign default values when constructing an object. They can even convert values between a data type used by the application and one of the datastore’s native value types, or otherwise customize how values are stored.
The db.StringProperty
declaration has a feature
that always trips me up, so I’m mentioning it here. By default, a string
property value enforced by this declaration cannot contain newline
characters. If you want to allow values with newline characters, specify
the multiline=True
argument to the declaration:
prop = db.StringProperty(multiline=True)
This feature corresponds with a similar feature in the Django web
application framework, which is used to help ensure that text fields in
forms don’t accidentally contain newline characters. This is not a
restriction of the App Engine datastore, it is merely the default
behavior of db.StringProperty
.
db.StringProperty
is an example of a property
declaration class. There are several property declaration classes
included with the Python SDK, one for each native datastore type. Each
one ensures that the property can only be assigned a value of the
corresponding type:
class Book(db.Model):
    title = db.StringProperty()

b = Book()
b.title = 99                     # db.BadValueError, title must be a string
b.title = 'The Grapes of Wrath'  # OK
Table 9-1 lists the datastore native value types and their corresponding property declaration classes.
Table 9-1. Datastore property value types and the corresponding property declaration classes

    Native value type        Property declaration class
    unicode (or str)         db.StringProperty
    db.Text                  db.TextProperty
    db.ByteString            db.ByteStringProperty
    db.Blob                  db.BlobProperty
    bool                     db.BooleanProperty
    int, long                db.IntegerProperty
    float                    db.FloatProperty
    datetime.datetime        db.DateTimeProperty
    db.Key                   db.ReferenceProperty
    users.User               db.UserProperty
    db.GeoPt                 db.GeoPtProperty
    db.Email                 db.EmailProperty
    db.Link                  db.LinkProperty
    db.Category              db.CategoryProperty
    db.PhoneNumber           db.PhoneNumberProperty
    db.PostalAddress         db.PostalAddressProperty
    db.Rating                db.RatingProperty
    db.IM                    db.IMProperty
    list                     db.ListProperty
You can customize the behavior of a property declaration
by passing arguments to the declaration’s constructor. We’ve already
seen one example: the required
argument.
All property declaration classes support the required
argument. If True
, the property is required and must not be
None
. You must provide an initial value for each required
property to the constructor when creating a new object. (You can provide
an initial value for any property this way.)
class Book(db.Model):
    title = db.StringProperty(required=True)

b = Book()                             # db.BadValueError, title is required
b = Book(title='The Grapes of Wrath')  # OK
The datastore makes a distinction between a property that is not
set and a property that is set to the null value (None
).
Property declarations do not make this distinction, because all declared
properties must be set (possibly to None
). Unless you say
otherwise, the default value for declared properties is
None
, so the required
validator treats the
None
value as an unspecified property.
You can change the default value with the default
argument. When you create an object without a value for a property that
has a default value, the constructor assigns the default value to the
property.
A property that is required and has a default value uses the
default if constructed without an explicit value. The value can never be
None
:
class Book(db.Model):
    rating = db.IntegerProperty(default=1)

b = Book()          # b.rating == 1
b = Book(rating=5)  # b.rating == 5
By default, the name of the class attribute is used as the name of
the datastore property. If you wish to use a different name for the
datastore property than is used for the attribute, specify a
name
argument. This allows you to use names already taken
by the API for class or instance attributes as datastore
properties:
class Song(db.Model):
    song_key = db.StringProperty(name='key')

s = Song()
s.song_key = 'C# min'
# The song_key attribute is stored as the
# datastore property named 'key'.
s.put()
You can declare that a property should contain only one of a fixed
set of values by providing a list of possible values as the
choices
argument. If None
is not one of the
choices, this acts as a more restrictive form of required
,
and therefore, the property must be set to one of the valid choices by
using a keyword argument to the constructor:
_KEYS = ['C', 'C min', 'C 7',
         'C#', 'C# min', 'C# 7',
         # ...
        ]

class Song(db.Model):
    song_key = db.StringProperty(choices=_KEYS)

s = Song(song_key='H min')   # db.BadValueError
s = Song()                   # db.BadValueError, None is not an option
s = Song(song_key='C# min')  # OK
All of these features validate the value assigned to a property,
and raise a db.BadValueError
if the
value does not meet the appropriate conditions. For even greater control
over value validation, you can define your own validation function and
assign it to a property declaration as the validator
argument. The function should take the value as an argument, and raise a
db.BadValueError
(or an exception of your choosing) if the
value should not be allowed:
def is_recent_year(val):
    if val < 1923:
        raise db.BadValueError

class Book(db.Model):
    copyright_year = db.IntegerProperty(validator=is_recent_year)

b = Book(copyright_year=1922)  # db.BadValueError
b = Book(copyright_year=1924)  # OK
In Chapter 6, we mentioned that
you can set properties of an entity in such a way that they are
available on the entity, but are considered unset for the purposes of
indexes. In the Python API, you establish a property as nonindexed by
using a property declaration. If the property declaration is given an
indexed
argument of False
, entities created
with that model class will set that property as nonindexed:
class Book(db.Model):
    first_sentence = db.StringProperty(indexed=False)

b = Book()
b.first_sentence = "On the Internet, popularity is swift and fleeting."
b.put()

# Count the number of Book entities with
# an indexed first_sentence property...
c = Book.all().order('first_sentence').count(1000)  # c = 0
Several property declaration classes include features for setting values automatically.
The db.DateProperty
, db.DateTimeProperty
, and db.TimeProperty
classes can populate the
value automatically with the current date and time. To enable this
behavior, you provide the auto_now
or
auto_now_add
arguments to the property declaration.
If you set auto_now=True
, the declaration class
overwrites the property value with the current date and time when you
save the object. This is useful when you want to keep track of the last
time an object was saved:
class Book(db.Model):
    last_updated = db.DateTimeProperty(auto_now=True)

b = Book()
b.put()  # last_updated is set to the current time
# ...
b.put()  # last_updated is set to the current time again
If you set auto_now_add=True
, the property is set to
the current time only when the object is saved for the first time.
Subsequent saves do not overwrite the value:
class Book(db.Model):
    create_time = db.DateTimeProperty(auto_now_add=True)

b = Book()
b.put()  # create_time is set to the current time
# ...
b.put()  # create_time stays the same
The db.UserProperty
declaration class also
includes an automatic value feature. If you provide the argument
auto_current_user=True
, the value is set to the user
accessing the current request handler if the user is signed in. However,
if you provide auto_current_user_add=True
, the value is
only set to the current user when the entity is saved for the first
time, and left untouched thereafter. If the current user is not signed
in, the value is set to None
:
class BookReview(db.Model):
    created_by_user = db.UserProperty(auto_current_user_add=True)
    last_edited_by_user = db.UserProperty(auto_current_user=True)

br = BookReview()
br.put()  # created_by_user and last_edited_by_user set
# ...
br.put()  # last_edited_by_user set again
At first glance, it might seem reasonable to set a default for a db.UserProperty this way:
from google.appengine.api import users

class BookReview(db.Model):
    created_by_user = db.UserProperty(
        default=users.get_current_user())  # WRONG
This would set the default value to be the user who is signed in when the class is imported. Subsequent requests handled by the instance of the application will use a previous user instead of the current user as the default.
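The underlying trap is plain Python: a default expression in a class body is evaluated once, when the module is imported. This sketch, with invented class names, contrasts the frozen value with a per-construction value:

```python
import time

# Anti-pattern: the default expression runs once, when the class body
# is executed at import time, not when each object is created.
class SessionBad(object):
    started_default = time.time()   # frozen at import

# What you usually want: compute the value at construction time.
class SessionGood(object):
    def __init__(self):
        self.started = time.time()  # fresh for every object/request

a = SessionGood()
time.sleep(0.05)
b = SessionGood()
print(b.started > a.started)  # True: each object gets its own timestamp
```

The auto_current_user features play the role of the per-construction value here: they compute the user at save time, not at import time.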
To guard against this mistake, db.UserProperty does not accept the default argument. You can use only auto_current_user or auto_current_user_add to set an automatic value.
The data modeling API provides a property declaration
class for multivalued properties, called db.ListProperty
. This class ensures that
every value for the property is of the same type. You pass this type to
the property declaration, like so:
class Book(db.Model):
    tags = db.ListProperty(basestring)

b = Book()
b.tags = ['python', 'app engine', 'data']
The type argument to the db.ListProperty
constructor must be the
Python representation of one of the native datastore types. Refer back
to Table 5-1 for a complete list.
The datastore does not distinguish between a multivalued property
with no elements and no property at all. As such, an undeclared property
on a db.Expando
object can’t store the empty list. If it
did, when the entity is loaded back into an object, the property simply
wouldn’t be there, potentially confusing code that’s expecting to find
an empty list. To avoid confusion, db.Expando
disallows assigning an empty list
to an undeclared property.
The db.ListProperty
declaration makes it possible
to keep an empty list value on a multivalued property. The declaration
interprets the state of an entity that doesn’t have the declared
property as the property being set to the empty list, and maintains that
distinction on the object. This also means that you cannot assign
None
to a declared list property—but this isn’t of the
expected type for the property anyway.
The datastore does distinguish between a
property with a single value and a multivalued property with a single
value. An undeclared property on a db.Expando
object can store a list with one
element, and represent it as a list value the next time the entity is
loaded.
The example above declares a list of string values. (basestring is the Python base type for str and unicode.) This case is so common that the API also provides db.StringListProperty.
You can provide a default value to db.ListProperty
, using the
default
argument. If you specify a nonempty list as the
default, a shallow copy of the list value is made for each new object
that doesn’t have an initial value for the property.
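The reason a copy is made per object is the usual mutable-default pitfall in Python. A sketch, with invented names, of what the declaration is protecting you from:

```python
class Tagged(object):
    """Sketch (invented names) of why each object needs its own copy
    of a mutable default value."""
    DEFAULT_TAGS = ['unfiled']

    def __init__(self, tags=None):
        # A shallow copy per object; assigning DEFAULT_TAGS directly
        # would make every object share (and mutate) one list.
        self.tags = list(tags) if tags is not None else list(self.DEFAULT_TAGS)

a = Tagged()
b = Tagged()
a.tags.append('python')
print(a.tags)  # ['unfiled', 'python']
print(b.tags)  # ['unfiled'], unaffected thanks to the copy
```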
db.ListProperty
does not support the
required
validator, since every list property technically
has a list value (possibly empty). If you wish to disallow the empty
list, you can provide your own validator
function that does
so:
def is_not_empty(lst):
    if len(lst) == 0:
        raise db.BadValueError

class Book(db.Model):
    tags = db.ListProperty(basestring, validator=is_not_empty)

b = Book(tags=[])           # db.BadValueError
b = Book()                  # db.BadValueError, default "tags" is empty
b = Book(tags=['awesome'])  # OK
db.ListProperty
does not allow
None
as an element in the list because it doesn’t match the
required value type. It is possible to store None
as an
element in a list for an undeclared property.
Property declarations prevent the application from creating an invalid data object, or assigning an invalid value to a property. If the application always uses the same model classes to create and manipulate entities, then all entities in the datastore will be consistent with the rules you establish using property declarations.
In real life, it is possible for an entity that does not fit a model to exist in the datastore. When you change a model class—and you will change model classes in the lifetime of your application—you are making a change to your application code, not the datastore. Entities created from a previous version of a model stay the way they are.
If an existing entity does not comply with the validity
requirements of a model class, you’ll get a db.BadValueError
when you try to fetch the
entity from the datastore. Fetching an entity gets the entity’s data,
then calls the model class constructor with its values. This executes
each property’s validation routines on the data.
Some model changes are “backward compatible” such that old
entities can be loaded into the new model class and be considered valid.
Whether it is sufficient to make a backward-compatible change without
updating existing entities depends on your application. Changing the
type of a property declaration or adding a required property are almost
always incompatible changes. Adding an optional property will not cause
a db.BadValueError
when an old entity is loaded, but if you
have indexes on the new property, old entities will not appear in those
indexes (and therefore won’t be results for those queries) until the
entities are loaded and then saved with the new property’s default
value.
The most straightforward way to migrate old entities to new schemas is to write a script that queries all the entities and applies the changes. We’ll discuss how to implement this kind of batch operation in a scalable way using task queues, in Task Chaining.
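Stripped of the datastore API, the core of such a migration script is a batched read-transform-write loop. The function and callback names below are invented for illustration; a real App Engine version would page through a query and save each batch with db.put:

```python
def migrate(entities, transform, save, batch_size=100):
    # Read each old record, apply the schema change, write in batches.
    batch = []
    for entity in entities:
        batch.append(transform(entity))
        if len(batch) >= batch_size:
            save(batch)
            batch = []
    if batch:
        save(batch)   # flush the final partial batch

# Example: give old Book records a default 'rating' property.
old_books = [{'title': 'A'}, {'title': 'B', 'rating': 5}]
saved = []
migrate(old_books,
        transform=lambda e: dict(e, rating=e.get('rating', 1)),
        save=saved.extend)
print(saved)  # [{'title': 'A', 'rating': 1}, {'title': 'B', 'rating': 5}]
```

Batching the writes keeps each datastore round trip bounded, which is what makes the task-queue chaining described later practical.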
You can model relationships between entities by storing entity keys as property values. The Python data modeling interface includes several powerful features for managing relationships.
The db.ReferenceProperty
declaration describes a
relationship between one model class and another. It stores the key of an
entity as the property value. The first argument to the
db.ReferenceProperty
constructor is the model class of the
kind of entity referenced by the property. If someone creates a
relationship to an entity that is not of the appropriate kind, the
assignment raises a db.BadValueError
.
You can assign a data object directly to the property. The property
declaration stores the key of the object as the property’s value to create
the relationship. You can also assign a db.Key
directly:
class Book(db.Model):
    title = db.StringProperty()
    author = db.StringProperty()

class BookReview(db.Model):
    book = db.ReferenceProperty(Book, collection_name='reviews')

b = Book()
b.put()

br = BookReview()
br.book = b        # sets br's 'book' property to b's key
br.book = b.key()  # same thing
We’ll explain what collection_name
does in a
moment.
The referenced object must have a “complete” key before it can be
assigned to a reference property. A key is complete when it has all its
parts, including the string name or the system-assigned numeric ID. If you
create a new object without a key name, the key is not complete until you
save the object. When you save the object, the system completes the key
with a numeric ID. If you create the object (or a db.Key
)
with a key name, the key is already complete, and you can use it for a
reference without saving it first:
b = Book()
br = BookReview()
br.book = b  # db.BadValueError, b's key is not complete

b.put()
br.book = b  # OK, b's key has system ID

b = Book(key_name='The_Grapes_of_Wrath')
br = BookReview()
br.book = b  # OK, b's key has a name

db.put([b, br])
A model class must be defined before it can be the subject of a
db.ReferenceProperty
. To declare a reference property that
can refer to another instance of the same class, you use a different
declaration, db.SelfReferenceProperty
:
class Book(db.Model):
    previous_edition = db.SelfReferenceProperty()

b1 = Book()
b2 = Book()
b2.previous_edition = b1
Reference properties have a powerful and intuitive syntax for accessing referenced objects. When you access the value of a reference property, the property fetches the entity from the datastore by using the stored key, then returns it as an instance of its model class. A referenced entity is loaded “lazily”; that is, it is not fetched from the datastore until the property is dereferenced:
br = db.get(book_review_key)  # br is a BookReview instance
title = br.book.title         # fetches book, gets its title property
This automatic dereferencing of reference properties occurs the first time you access the reference property. Subsequent uses of the property use the in-memory instance of the data object. This caching of the referenced entity is specific to the object with the property. If another object has a reference to the same entity, accessing its reference fetches the entity anew.
db.ReferenceProperty
does another clever thing: it
creates automatic back-references from a referenced object to the objects
that refer to it. If a BookReview
class has a reference
property that refers to the Book
class, the Book
class gets a special property whose name is specified by the
collection_name
argument to the declaration (e.g.,
reviews
). This property is special because it isn’t actually
a property stored on the entity. Instead, when you access the
back-reference property, the API performs a datastore query for all
BookReview
entities whose reference property equals the key
of the Book
. Since this is a single-property query, it uses
the built-in indexes, and never requires a custom index:
b = db.get(book_key)  # b is a Book instance

for review in b.reviews:
    # review is a BookReview instance
    # ...
If you don’t specify a collection_name, the name of the back-reference property is the name of the referring class, lowercased, followed by _set (e.g., bookreview_set). If a class has multiple reference properties that refer to the same class, you must provide a collection_name to disambiguate the back-reference properties:
class BookReview(db.Model):
    # Book gets a bookreview_set special property.
    book = db.ReferenceProperty(Book)

    # Book gets a recommended_book_set special property.
    recommended_book = db.ReferenceProperty(
        Book, collection_name='recommended_book_set')
Because the back-reference property is implemented as a query, it incurs no overhead if you don’t use it.
As with storing db.Key
values as properties, neither the
datastore nor the property declaration requires that a reference property
refer to an entity that exists. Dereferencing a reference property that
points to an entity that does not exist raises a db.ReferencePropertyResolveError
. Keys cannot
change, so a relationship is only severed when the referenced entity is
deleted from the datastore.
A reference property and its corresponding back-reference represent a one-to-many relationship between classes in your data model. The reference property establishes a one-way relationship from one entity to another, and the declaration sets up the back-reference mechanism on the referenced class. The back-reference uses the built-in query index, so determining which objects refer to the referenced object is reasonably fast. It’s not quite as fast as storing a list of keys on a property, but it’s easier to maintain.
A common use of one-to-many relationships is to model ownership. In the previous example, each BookReview was related to a single Book, and a Book could have many BookReviews. The BookReviews belong to the Book.
You can also use a reference property to model a one-to-one relationship. The property declaration doesn’t enforce that only one entity can refer to a given entity, but this is easy to maintain in the application code. Because the performance of queries scales with the size of the result set and not the size of the data set, it’s usually sufficient to use the back-reference query to follow a one-to-one relationship back to the object with the reference.
If you’d prefer not to use a query to traverse the back-reference,
you could also store a reference on the second object back to the first,
at the expense of having to maintain the relationship in two places.
This is tricky, because the class has to be defined before it can be the
subject of a ReferenceProperty
. One option is to use
db.Expando
and an undeclared property for one
of the classes.
A one-to-one relationship can be used to model partnership. A good
use of one-to-one relationships in App Engine is to split a large object
into multiple entities to provide selective access to its properties. A
player might have an avatar image up to 64 kilobytes in size, but the
application probably doesn’t need the 64 KB of image data every time it
fetches the Player
entity. You can create a separate
PlayerAvatarImage
entity to contain the image, and
establish a one-to-one relationship by creating a reference property
from the Player
to the PlayerAvatarImage
. The
application must know to delete the related objects when deleting a
Player
:
class PlayerAvatarImage(db.Model):
    image_data = db.BlobProperty()
    mime_type = db.StringProperty()

class Player(db.Model):
    name = db.StringProperty()
    avatar = db.ReferenceProperty(PlayerAvatarImage)

# Fetch the name of the player (a string) and a
# reference to the avatar image (a key).
p = db.get(player_key)

# Fetch the avatar image entity and access its
# image_data property.
image_data = p.avatar.image_data
A many-to-many relationship is a type of relationship between entities of two kinds where entities of either kind can have that relationship with many entities of the other kind, and vice versa. For instance, a player may be a member of one or more guilds, and a guild can have many members.
There are at least two ways to implement many-to-many relationships in the datastore. Let’s consider two of these. The first method we’ll call “the key list method,” and the second we’ll call “the link model method.”
With the key list method, you store a list of entity keys on one side of the relationship, using a db.ListProperty. Such a declaration does not have any of the features of a db.ReferenceProperty, such as back-references or automatic dereferencing, because it does not involve that class. To model the relationship in the other direction, you can implement the back-reference feature by using a method and the Python decorator @property:
class Player(db.Model):
    name = db.StringProperty()
    guilds = db.ListProperty(db.Key)

class Guild(db.Model):
    name = db.StringProperty()

    @property
    def members(self):
        return Player.all().filter('guilds', self.key())

# Guilds to which a player belongs:
p = db.get(player_key)
guilds = db.get(p.guilds)  # batch get using list of keys
for guild in guilds:
    # ...

# Players that belong to a guild:
g = db.get(guild_key)
for player in g.members:
    # ...
Instead of manipulating the list of keys, you could implement automatic dereferencing using advanced Python techniques to extend how the values in the list property are accessed. A good way to do this is with a custom property declaration. We’ll consider this in a later section.
The key list method is best suited for situations where there
are fewer objects on one side of the relationship than on the other,
and the short list is small enough to store directly on an entity. In
this example, many players each belong to a few guilds; each player
has a short list of guilds, while each guild may have a long list of
players. We put the list property on the Player
side of
the relationship to keep the entity small, and use queries to produce
the long list when it is needed.
The link model method represents each relationship as an entity. The relationship entity has reference properties pointing to the related classes. You traverse the relationship by going through the relationship entity via the back-references:
class Player(db.Model):
    name = db.StringProperty()

class Guild(db.Model):
    name = db.StringProperty()

class GuildMembership(db.Model):
    player = db.ReferenceProperty(Player,
                                  collection_name='guild_memberships')
    guild = db.ReferenceProperty(Guild,
                                 collection_name='player_memberships')

p = Player()
g = Guild()
db.put([p, g])

gm = GuildMembership(player=p, guild=g)
db.put(gm)

# Guilds to which a player belongs:
for gm in p.guild_memberships:
    guild_name = gm.guild.name
    # ...

# Players that belong to a guild:
for gm in g.player_memberships:
    player_name = gm.player.name
    # ...
This technique is similar to how you’d use “join tables” in a SQL database. It’s a good choice if either side of the relationship may get too large to store on the entity itself. You can also use the relationship entity to store metadata about the relationship (such as when the player joined the guild), or model more complex relationships between multiple classes.
The link model method is more expensive than the key list method. It requires fetching the relationship entity to access the related object.
Remember that App Engine doesn’t support SQL-style join queries on these objects. You can achieve a limited sort of join by repeating information from the data objects on the link model objects, using code on the model classes to keep the values in sync. To do this with strong consistency, the link model object and the two related objects would need to be in the same entity group, which is not always possible or practical.
If eventual consistency would suffice, you could use task queues to propagate the information. See Chapter 16.
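To make the denormalization idea concrete, here is a minimal plain-Python sketch (ordinary classes, not App Engine model classes) of repeating a guild's name on its membership link objects and keeping the copies in sync when the guild is renamed:

```python
# Plain-Python illustration of denormalizing a guild's name onto its
# membership link objects. In the datastore, the sync step would run
# in a transaction or be propagated by a task queue.
class Guild(object):
    def __init__(self, name):
        self.name = name
        self.player_memberships = []

    def rename(self, new_name):
        self.name = new_name
        # Refresh every denormalized copy of the name.
        for gm in self.player_memberships:
            gm.guild_name = new_name

class GuildMembership(object):
    def __init__(self, player_name, guild):
        self.player_name = player_name
        self.guild = guild
        # Copy the name so listing a player's guilds doesn't
        # require fetching each Guild entity.
        self.guild_name = guild.name
        guild.player_memberships.append(self)

g = Guild('Knights of the East')
gm = GuildMembership('Ned', g)
g.rename('Knights of the West')
# gm.guild_name now reflects the new guild name.
```

The trade-off is the same as with any denormalization: reads of the membership list get cheaper, while renames must touch every membership object.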
In data modeling, it’s often useful to derive new kinds of objects from other kinds. The game world may contain many different kinds of carryable objects, with shared properties and features common to all objects you can carry. Since you implement classes from the data model as Python classes, you’d expect to be able to use inheritance in the implementation to represent inheritance in the model. And you can, sort of.
If you define a class based on either db.Model
or db.Expando
, you can create other classes that
inherit from that data class, like so:
class CarryableObject(db.Model):
    weight = db.IntegerProperty()
    location = db.ReferenceProperty(Location)

class Bottle(CarryableObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()
    is_closed = db.BooleanProperty()
The subclass inherits the property declarations of the parent class.
A Bottle has five property declarations: weight, location, contents, amount, and is_closed.
Objects based on the child class will be stored as entities whose kind is the name of the child class. The datastore has no notion of inheritance, and so by default will not treat Bottle entities as if they are CarryableObject entities. This is mostly significant for queries, and we have a solution for that in the next section.
If a child class declares a property already declared by a parent class, the class definition raises a db.DuplicatePropertyError. The data modeling API does not support overriding property declarations in subclasses.
A model class can inherit from multiple classes, using Python’s own support for multiple inheritance:
class PourableObject(GameObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()

class Bottle(CarryableObject, PourableObject):
    is_closed = db.BooleanProperty()
Each parent class must not declare a property with the same name as declarations in the other parent classes, or the class definition raises a db.DuplicatePropertyError. However, the modeling API does the work to support “diamond inheritance,” where two parent classes themselves share a parent class:
class GameObject(db.Model):
    name = db.StringProperty()
    location = db.ReferenceProperty(Location)

class CarryableObject(GameObject):
    weight = db.IntegerProperty()

class PourableObject(GameObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()

class Bottle(CarryableObject, PourableObject):
    is_closed = db.BooleanProperty()
In this example, both CarryableObject and PourableObject inherit two property declarations from GameObject, and are both used as parent classes to Bottle. The model API allows this because the two properties are defined in the same class, so there is no conflict. Bottle gets its name and location declarations from GameObject.
The datastore knows nothing of our modeling classes and inheritance. Instances of the Bottle class are stored as entities of the kind 'Bottle', with no inherent knowledge of the parent classes. It’d be nice to be able to perform a query for CarryableObject entities and get back Bottle entities and others. That is, it’d be nice if a query could treat Bottle entities as if they were instances of the parent classes, as Python does in our application code. We want polymorphism in our queries.
For this, the data modeling API provides a special base class: db.PolyModel. Model classes using this base class support polymorphic queries. Consider the Bottle class defined previously. Let’s change the base class of GameObject to db.PolyModel, like so:
from google.appengine.ext.db import polymodel

class GameObject(polymodel.PolyModel):
    # ...
We can now perform queries for any kind in the hierarchy, and get the expected results:
here = db.get(location_key)

q = CarryableObject.all()
q.filter('location', here)
q.filter('weight >', 100)

for obj in q:
    # obj is a carryable object that is here
    # and weighs more than 100 kilos.
    # ...
This query can return any CarryableObject, including Bottle entities. The query can use filters on any property of the specified class (such as weight from CarryableObject) or parent classes (such as location from GameObject).
Behind the scenes, db.PolyModel does three clever things differently from its cousins:

- Objects of the class GameObject or any of its child classes are all stored as entities of the kind 'GameObject'.
- All such objects are given a property named class that represents the inheritance hierarchy starting from the root class. This is a multivalued property, where each value is the name of an ancestor class, in order.
- Queries for objects of any kind in the hierarchy are translated by the db.PolyModel class into queries for the base class, with additional equality filters that compare the class being queried to the class property’s values.
In short, db.PolyModel stores information about the inheritance hierarchy on the entities, then uses it for queries to support polymorphism.
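The mechanics can be sketched in plain Python. This is only an illustration of the idea using ordinary classes and dictionaries, not PolyModel's actual implementation (which stores the path on a hidden class property and rewrites the query):

```python
# Illustration of the PolyModel idea: store each object's ancestor
# class names as a multivalued property, and turn a query for a
# subclass into an equality filter on that property.
class GameObject(object): pass
class CarryableObject(GameObject): pass
class PourableObject(GameObject): pass
class Bottle(CarryableObject, PourableObject): pass

def class_path(cls):
    # Ancestor class names from the root down to cls, in method
    # resolution order, excluding the built-in object base.
    return [c.__name__ for c in reversed(cls.__mro__)
            if c is not object]

# Every entity stores its path; the kind is always the root class name.
entities = [{'class': class_path(c)}
            for c in (GameObject, CarryableObject, Bottle)]

# A "query for CarryableObject" becomes an equality filter on 'class',
# matching the CarryableObject entity and the Bottle entity.
matches = [e for e in entities if 'CarryableObject' in e['class']]
```

Because the datastore indexes each value of a multivalued property separately, a single equality filter on the class property is enough to match every entity anywhere below the queried class in the hierarchy.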
Each model class that inherits directly from db.PolyModel is the root of a class hierarchy. All objects from the hierarchy are stored as entities whose kind is the name of the root class. As such, your data will be easier to maintain if you use many root classes to form many class hierarchies, as opposed to putting all classes in a single hierarchy. That way, the datastore viewer and bulk loading tools can still use the datastore’s built-in notion of entity kinds to distinguish between kinds of objects.
The property declaration classes serve several functions in your data model:

- The model calls the class when a value is assigned to the property, and the class can raise an exception if the value does not meet its conditions.
- The model calls the class to convert from the value type used by the app to one of the core datastore types for storage, and back again.
- The model calls the class if no value was assigned, to determine an appropriate default value.
Every property declaration class inherits from the db.Property base class. This class implements features common to all property declarations, including support for the common constructor arguments (such as required, name, and indexed). Declaration classes override methods and members to specialize the validation and type conversion routines.
Here is a very simple property declaration class. It accepts any string value, and stores it as a datastore short string (the default behavior for Python string values):
from google.appengine.ext import db

class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None and not isinstance(value, self.data_type):
            raise db.BadValueError('Property %s must be a %s.'
                                   % (self.name, self.data_type.__name__))
        return value
And here is how you would use the new property declaration:
class Player(db.Model):
    player_name = PlayerNameProperty()

p = Player()
p.player_name = 'Ned Nederlander'
p.player_name = 12345  # db.BadValueError
The validate() method takes the value as an argument, and either returns the value, returns a different value, or raises an exception. The value returned by the method becomes the application-facing value for the attribute, so you can use the validate() method for things like type coercion. In this example, the method raises a db.BadValueError if the value is not a string or None. The exception message can refer to the name of the property by using self.name.
The data_type member is used by the base class. It represents the core datastore type the property uses to store the value. For string values, this is basestring.
The validate() method should call the superclass’s implementation before checking its own conditions. The base class’s validator supports the required, choices, and validator arguments of the declaration constructor.
If the app does not provide a value for a property when it constructs the data object, the property starts out with a default value. This default value is passed to the validate() method during the object’s construction. If it is appropriate for your property declaration to allow a default value of None, make sure your validate() method allows it.
So far, this example doesn’t do much beyond db.StringProperty. This by itself can be useful, giving the property type a class of its own for future expansion. Let’s add a requirement that player names be between 6 and 30 characters in length by extending the validate() method:
class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            if not isinstance(value, self.data_type):
                raise db.BadValueError('Property %s must be a %s.'
                                       % (self.name, self.data_type.__name__))
            if (len(value) < 6 or len(value) > 30):
                raise db.BadValueError(('Property %s must be between 6 and '
                                        '30 characters.') % self.name)
        return value
The new validation logic disallows strings with an inappropriate length:
p = Player()
p.player_name = 'Ned'              # db.BadValueError
p.player_name = 'Ned Nederlander'  # OK

p = Player(player_name='Ned')      # db.BadValueError
The datastore supports a fixed set of core value types for properties, listed in Table 5-1. A property declaration can support the use of other types of values in the attributes of model instances by marshaling between the desired type and one of the core datastore types. For example, the db.ListProperty class converts between an empty list on the app side and an unset property on the datastore side.
The get_value_for_datastore() method converts the application value to the datastore value. Its argument is the complete model object, so you can access other aspects of the model when doing the conversion.
The make_value_from_datastore() method converts in the other direction: it takes the datastore value and returns the value to be used as the object attribute in the application.
Say we wanted to represent player name values within the application by using a PlayerName class instead of a simple string. Each player name has a surname and an optional first name. We can store this value as a single property, using the property declaration to convert between the application type (PlayerName) and a core datastore type (such as unicode):
class PlayerName(object):
    def __init__(self, first_name, surname):
        self.first_name = first_name
        self.surname = surname

    def is_valid(self):
        return (isinstance(self.first_name, unicode)
                and isinstance(self.surname, unicode)
                and len(self.surname) >= 6)

class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            if not isinstance(value, PlayerName):
                raise db.BadValueError('Property %s must be a PlayerName.'
                                       % self.name)
            # Let the data class have a say in validity.
            if not value.is_valid():
                raise db.BadValueError('Property %s must be a valid PlayerName.'
                                       % self.name)
            # Disallow the serialization delimiter in the first field.
            if value.surname.find('|') != -1:
                raise db.BadValueError(('PlayerName surname in property %s '
                                        'cannot contain a "|".') % self.name)
        return value

    def get_value_for_datastore(self, model_instance):
        # Convert the data object's PlayerName to a unicode.
        return (getattr(model_instance, self.name).surname
                + u'|' + getattr(model_instance, self.name).first_name)

    def make_value_from_datastore(self, value):
        # Convert a unicode to a PlayerName.
        i = value.find(u'|')
        return PlayerName(first_name=value[i+1:], surname=value[:i])
And here’s how you’d use it:
p = Player()
p.player_name = PlayerName(u'Ned', u'Nederlander')

p.player_name = PlayerName(u'Ned', u'Neder|lander')
# db.BadValueError, surname contains serialization delimiter

p.player_name = PlayerName(u'Ned', u'Neder')
# db.BadValueError, PlayerName.is_valid() == False, surname too short

p.player_name = PlayerName('Ned', u'Nederlander')
# db.BadValueError, PlayerName.is_valid() == False, first_name is not unicode
Here, the application value type is a PlayerName instance, and the datastore value type is that value encoded as a Unicode string. The encoding format is the surname field, followed by a delimiter, followed by the first_name field. We disallow the delimiter character in the surname by using the validate() method. (Instead of disallowing it, we could also escape it in get_value_for_datastore() and unescape it in make_value_from_datastore().)
In this example, PlayerName(u'Ned', u'Nederlander') is stored as this Unicode string:
Nederlander|Ned
The datastore value puts the surname first so that the datastore will sort PlayerName values first by surname, then by first name. In general, you choose a serialization format that has the desired ordering characteristics for your custom property type. (The core type you choose also impacts how your values are ordered when mixed with other types, although if you’re modeling consistently this isn’t usually an issue.)
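A quick way to see the ordering property, using plain Python string sorting as a stand-in for the datastore's index order (the sample names are hypothetical):

```python
# The surname-first encoding makes plain lexicographic order sort
# by surname, then by first name -- the same relative order a
# datastore index over the encoded strings would produce.
def encode(surname, first_name):
    return surname + u'|' + first_name

encoded = [encode(u'Nederlander', u'Ned'),
           encode(u'Bottoms', u'Dusty'),
           encode(u'Nederlander', u'Abe')]
encoded.sort()
# Sorted: Bottoms|Dusty, Nederlander|Abe, Nederlander|Ned
```

If the first name came first in the encoding, the same sort would order values by first name instead, which is rarely what you want for name listings.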
When the app constructs a data object and does not provide a value for a declared property, the model calls the property declaration class to determine a default value. The base class implementation sets the default value to None, and allows the app to customize the default value in the model, using the default argument to the declaration.
A few of the built-in declaration classes provide more sophisticated default values. For instance, if a db.DateTimeProperty was set with auto_now_add=True, the default value is the current system date and time. (db.DateTimeProperty uses get_value_for_datastore() to implement auto_now=True, so the value is updated at every save, whether or not it already has a value.)
The default value passes through the validation logic after it is set. This allows the app to customize the validation logic and disallow the default value. This is what happens when required=True: the base class’s validation logic disallows the None value, which is the base class’s default value.
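As a sketch of this flow, here is a hypothetical, stripped-down re-creation in plain Python (not the actual db library) of a model constructor that runs each property's default value through validate(), so required=True rejects the None default:

```python
# Hypothetical sketch of the default-then-validate flow. The class
# names mirror the db library but this is not App Engine code.
class Property(object):
    def __init__(self, default=None, required=False):
        self.default = default
        self.required = required

    def default_value(self):
        return self.default

    def validate(self, value):
        # required=True disallows None, including the None default.
        if self.required and value is None:
            raise ValueError('value is required')
        return value

class Model(object):
    def __init__(self, **kwargs):
        for name, prop in self.properties().items():
            value = kwargs.get(name, prop.default_value())
            # The default passes through validation, just like an
            # explicitly assigned value.
            setattr(self, name, prop.validate(value))

    @classmethod
    def properties(cls):
        return {k: v for k, v in vars(cls).items()
                if isinstance(v, Property)}

class Player(Model):
    name = Property(default='Anonymous')
    score = Property(required=True)

p = Player(score=10)   # name falls back to 'Anonymous'
# Player() would raise ValueError: score's None default fails validation.
```

The real library's constructor is more involved, but the ordering is the same: defaults are computed first, then validated.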
To specify custom default behavior, override the default_value() method. This method takes no arguments and returns the desired default value.
Here’s a simple implementation of default_value() for PlayerNameProperty:
class PlayerNameProperty(db.Property):
    # ...

    def default_value(self):
        default = super(PlayerNameProperty, self).default_value()
        if default is not None:
            return default
        return PlayerName(u'', u'Anonymous')
In this example, we call the superclass default_value() method to support the default argument to the constructor, which allows the app to override the default value in the model. If that returns None, we create a new PlayerName instance to be the default value.
Without further changes, this implementation breaks the required feature of the base class, because the value of the property is never None (unless the app explicitly assigns a None value). We can fix this by amending our validation logic to check self.required and disallow the anonymous PlayerName value if it’s True.
If you want the application to be able to control the behavior of your custom property declaration class using arguments, override the __init__() method. The method should call the superclass __init__() method to enable the features of the superclass that use arguments (like required). The Property API requires that the verbose_name argument come first, but after that all __init__() arguments are keyword values:
class PlayerNameProperty(db.Property):
    # ...

    def __init__(self, verbose_name=None, require_first_name=False, **kwds):
        super(PlayerNameProperty, self).__init__(verbose_name, **kwds)
        self.require_first_name = require_first_name

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            # ...
            if self.require_first_name and not value.first_name:
                raise db.BadValueError('Property %s PlayerName needs a first_name.'
                                       % self.name)
        # ...
You’d use this feature like this:
class Player(db.Model):
    player_name = PlayerNameProperty(require_first_name=True)

p = Player(player_name=PlayerName(u'Ned', u'Nederlander'))

p.player_name = PlayerName(u'', u'Charo')
# db.BadValueError, first name required

p = Player()
# db.BadValueError, default value PlayerName(u'', u'Anonymous')
# has empty first_name