Chapter 4. Metadata and the Alfresco Content Model

In Chapter 3, we took a high-level tour of Alfresco Share and the Share Records Management site. In this chapter, we look at the Alfresco Content Model and, in particular, at the part of the model that is relevant to Records Management.

In this chapter, we will describe:

  • What the Alfresco Content Model is and what elements comprise it
  • How to design, create, and deploy a new content model
  • How the Alfresco Records Management Content Model is structured

This chapter describes the mechanics for entering and configuring the content model within Alfresco. Each of the basic elements of the content model is discussed — types, aspects, properties, constraints, and associations. We will discuss how you can use these content model building blocks to design and build your own model. We'll then show how a new content model can be installed and made available from the Alfresco Share user interface.

Later in the chapter, we will look in detail at the built-in Alfresco Records Management Content Model. The model reveals much about the inner workings of Records Management within Alfresco and it also provides a very useful example of how a very rich content model can be created.

The Alfresco Content Model

Content and metadata storage is a core capability of an enterprise content management system, and it is an area where Alfresco excels. The content model is the framework that prescribes exactly how content data will be stored and how it later can be searched for retrieval. The model describes the structure, the format, and inter-relationships of content. It also provides the framework for organizing content and assigning meaning to it.

While the Alfresco Content Model is built from a very small set of components, the richness and flexibility of those components enable potentially very complex content models to be created.

The content model is actually segmented into a collection of models. For example, Records Management and Workflow are each implemented as separate models.

Each of the individual models contains the description for the specific types of content that can be stored in the repository. Each content type contains a fixed set of metadata properties. Constraints can be applied to properties to limit or to closely define the range of the allowed values for the properties. Associations can also be modeled and associated with types to define relationships between content items such as parent-child relationships or content-to-content references. Dynamic properties and associations can be added at runtime by applying aspects to the content.

When a new piece of content is added to the Alfresco repository, a structure called a node is created to hold the content. Each node gets added to a tree of nodes in the repository and is associated with at least one other node in the tree that acts as its parent. Every node is assigned a content type from the content model. A node can be associated with only a single content type at any one time, although the type of a node could potentially change. Aspects containing additional properties and associations can also be added to or removed from the node at any time.

Note

Alfresco also supports the ability to set ad hoc properties on a node, ones not defined by properties associated with either the type or with applied aspects. Ad hoc properties can be stored as name-value pairs in a generic property bag associated with a node and are called residual properties. While there may be isolated cases where the use of residual properties makes sense, a suggested best practice is to avoid the use of ad hoc properties and to explicitly define all properties that will be needed within the content model.

The model namespace

Creating new content models requires us to assign names to the elements of the models that we define. Our new model must be defined in a way that allows it to globally co-exist with the names used within all other content models that have already been defined.

A common problem that occurs when creating new element names for a content model is to have a name conflict with the name of an element already used by another model definition. Name conflicts can cause the software to not run at all or for data to become accidentally corrupted because of confusion over the naming of the elements.

Suppose, for example, that we decide to add a new property called container to a document type that we define in our new custom model. There would be a problem because that name conflicts with the Alfresco repository system content model that already has a property named container.

To avoid naming conflicts like this between content models, Alfresco uses namespaces. A namespace groups together all the elements of the content model and also provides a way to create names that will guarantee their global uniqueness.

Alfresco namespaces

Namespaces are typically written as URI strings that start with an HTTP address, usually belonging to the author or the author's company, and then followed by a path that describes or organizes the types of elements contained in the namespace. All standard Alfresco namespaces have URIs that start with http://www.alfresco.org. The URI typically ends with the version number for the namespace.

The table below lists the standard Alfresco Content Model namespaces. The namespace URIs can be quite long, and writing code that prepends the namespace URI to model element names everywhere makes for very verbose and clumsy-looking code.

To avoid having to always prepend the namespace URI to an element name, namespace prefixes are defined that significantly shorten the namespace reference. So, instead of having to refer to an element like {http://www.alfresco.org/model/system/1.0}container, we can simply write sys:container. The table also lists the prefixes that are used by convention when referring to Alfresco namespaces. The files defining these models can be found in the tomcat/webapps/alfresco/WEB-INF/classes/alfresco/model directory.

| Common Prefix | Namespace | Description |
|---------------|-----------|-------------|
| alf | http://www.alfresco.org | General Alfresco Namespace |
| app | http://www.alfresco.org/model/application/1.0 | Application Model |
| bpm | http://www.alfresco.org/model/bpm/1.0 | Business Process Model |
| cm | http://www.alfresco.org/model/content/1.0 | Content Domain Model |
| d | http://www.alfresco.org/model/dictionary/1.0 | Data Dictionary Model |
| fm | http://www.alfresco.org/model/forum/1.0 | Forum Model |
| st | http://www.alfresco.org/model/site/1.0 | Site Model |
| sys | http://www.alfresco.org/model/system/1.0 | Repository System Model |
| dod | http://www.alfresco.org/model/dod5015/1.0 | DoD 5015.2 Records Management Model |
| rma | http://www.alfresco.org/model/recordsmanagement/1.0 | Records Management Model |

Important namespaces that you'll see frequently referred to are the Content Domain Model and the Dictionary Model. New content models typically inherit from or reuse definitions of these foundational models. You might also notice the Site Model included in this list. The Site Model supports the management of data related to Alfresco Share sites. At the end of the list, there are also two content models that are used by the Alfresco Records Management implementation that we will talk about towards the end of the chapter.
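
As a rough sketch of how these prefixes come into play, a custom model file declares the namespace it defines and imports the namespaces it builds on. The ex prefix, the http://www.example.com URI, and the model name below are hypothetical placeholders chosen for illustration; the imported URIs are the Dictionary and Content Domain namespaces listed in the table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical custom model; the "ex" prefix and example.com URI are placeholders -->
<model name="ex:exampleModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <description>Example custom content model</description>
  <version>1.0</version>

  <!-- Namespaces whose definitions this model reuses -->
  <imports>
    <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d"/>
    <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm"/>
  </imports>

  <!-- The namespace that this model defines -->
  <namespaces>
    <namespace uri="http://www.example.com/model/example/1.0" prefix="ex"/>
  </namespaces>

  <!-- Types, aspects, and constraints that use the ex: prefix go here -->
</model>
```

With a declaration like this in place, an element named ex:something expands to {http://www.example.com/model/example/1.0}something, so it cannot collide with a sys: or cm: element of the same local name.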

Types

Types in the Alfresco Content Model provide a way to classify content as it is added to the repository. Every node in the repository is assigned a single type, and the type brings along with it a set of properties, associations, and even aspects that are relevant for that kind of content.

Types must be uniquely named and include the namespace prefix at the beginning of the type name. Available elements that are enclosed by the <type> tag for describing the behavior of a type are as follows:

  • title — a title for the type. A text string that documents the type.
  • description — a description for the type. A text string that documents the type.
  • parent — the parent type of this type. Types can inherit from the definition of their parent type. The root type from which all types inherit is called sys:base. Subtypes inherit property, association, and constraint definitions from their parent type. Types can be nested to any depth.
  • archive — a Boolean flag that indicates whether nodes of this type are moved to the archive store, a sort of recycle bin area, when they are deleted.
  • properties — an element that encloses a list of properties for the type.
  • associations — an element that encloses a list of associations for the type.
  • mandatory-aspects — an element that encloses a list of aspects for the type.
  • includedInSuperTypeQuery — a Boolean that determines if this type is to be searched as part of a query over any of its parent types.
  • overrides — an element that encloses a list of properties that override parent properties.

    The following features from parent properties can be overridden:

    • mandatory — a subtype can make a property mandatory, but cannot relax a property declared mandatory by the parent.
    • default — the subtype can override or include a parent default value.
    • constraints — new constraints can be applied to a parent property, but existing constraints inherited from the parent cannot be removed.

Note that when defining both properties and associations for a type, the properties must be listed before the associations. It is also not possible to split the properties of a type among multiple <properties> tags; only a single <properties> tag can be used within any one type definition. An example of the definition of a content type can be found in the Records Management model for an rma:recordFolder:

<type name="rma:recordFolder">
  <title>Record Folder</title>
  <parent>cm:folder</parent>
  <archive>false</archive>
  <properties>
    <property name="rma:isClosed">
      <title>Record Folder Closed</title>
      <description>Indicates whether the folder is closed</description>
      <type>d:boolean</type>
      <protected>true</protected>
      <mandatory>true</mandatory>
      <default>false</default>
    </property>
  </properties>
  <mandatory-aspects>
    <aspect>cm:titled</aspect>
    <aspect>rma:recordComponentIdentifier</aspect>
    <aspect>rma:commonRecordDetails</aspect>
    <aspect>rma:filePlanComponent</aspect>
  </mandatory-aspects>
</type>

Overrides to properties inherited from the parent type can be defined in the subtype as follows:

<type>
  ...
  <overrides>
    <property name="cm:autoVersion">
      <default>false</default>
    </property>
  </overrides>
</type>
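
To tie the type elements together, the following is a minimal sketch of a custom type, assuming a model that declares a hypothetical ex prefix and imports the cm and d namespaces. All ex: names are invented for illustration. It shows the single properties element placed ahead of associations, as required.

```xml
<!-- Hypothetical type; every ex: name is a placeholder -->
<type name="ex:contract">
  <title>Contract</title>
  <parent>cm:content</parent>
  <archive>true</archive>
  <!-- A single properties element, listed before associations -->
  <properties>
    <property name="ex:contractNumber">
      <type>d:text</type>
      <mandatory>true</mandatory>
    </property>
  </properties>
  <associations>
    <association name="ex:relatedContracts">
      <source>
        <mandatory>false</mandatory>
        <many>true</many>
      </source>
      <target>
        <class>ex:contract</class>
        <mandatory>false</mandatory>
        <many>true</many>
      </target>
    </association>
  </associations>
  <mandatory-aspects>
    <aspect>cm:titled</aspect>
  </mandatory-aspects>
</type>
```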

Properties

Properties are one of the most important components of the definition for types and aspects. All properties in type and aspect definitions are grouped together and enclosed by a single <properties> tag. Each property is uniquely named by including a namespace prefix as the initial part of the name. The property name is an attribute of the property called name, as in <property name="rma:location">.

Available elements that are enclosed by the <property> tag for describing the behavior of a property are as follows:

  • type — the data type of the property value. This element is required.
  • title — a title for the property. A text string that documents the property.
  • description — a description of the property. A text string that documents the property.
  • mandatory — a Boolean flag indicating whether or not the property is mandatory. A mandatory property must have a value for a transaction on a node with that property to complete successfully. The mandatory flag is always enforced by the Alfresco web client. It is also enforced at the server when the <mandatory> tag is further qualified with a true value for the enforced attribute, as in <mandatory enforced="true">. When enforced is set to false, as in <mandatory enforced="false">, a transaction that leaves the property unset is not blocked, but after the transaction completes, the node is marked with the sys:incomplete aspect.
  • multiple — a Boolean flag that indicates that the property is able to support multiple values. Multiple values are handled as a list.
  • index — an element whose enabled attribute, either true or false, indicates whether the property will be indexed and searchable. When enabled is true, additional elements enclosed by the tag configure how the indexing is performed. Very often, it is known in advance that some properties will never need to be searched, and choosing not to index them saves index space.
    • atomic — a Boolean flag that indicates that the property will be indexed when a transaction on the node with this property completes. The alternative to this is that the property will be indexed as part of a background process that will run after the node transaction is completed. Properties containing binary content are typically indexed in the background.
    • stored — a Boolean value that indicates that the original value of the property before being tokenized should be stored in the index. This should only be done for properties that are expected to be relatively short.
    • tokenised — a value of true, false, or both to indicate whether the tokenized value of the property is stored in the index. Note the spelling of the element name. When the value of the property is processed for indexing, the string is cleaned, for example, by removing whitespace, and broken into smaller pieces, like individual words. Typically, it is useful to tokenize property values that contain text, but not things like numbers or dates. When the value is both, both the original and the tokenized strings are stored.
  • constraints — the constraints on the allowed values for the property.
  • default — the default value for the property.
  • protected — a Boolean flag that marks the property as maintained by the system. Clients treat a protected property as read-only rather than allowing users to edit it.

Every property must be typed. This means that each property is associated with a data type that is defined by the type element. type is the only element of those listed above that is mandatory when defining a property. Alfresco has a wide range of data types available and it's possible to add more if the data type that you need isn't available. However, for most cases, the standard data types offered by Alfresco are most likely sufficient.
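
The elements described above can be combined as in the following sketch. The property name ex:reviewers is hypothetical, and the particular flag values are chosen only to illustrate the syntax, not as recommended settings.

```xml
<!-- Hypothetical property; the ex: prefix is a placeholder -->
<property name="ex:reviewers">
  <title>Reviewers</title>
  <type>d:text</type>
  <!-- Not enforced: an unset value marks the node sys:incomplete instead of blocking -->
  <mandatory enforced="false">true</mandatory>
  <!-- The property holds a list of values -->
  <multiple>true</multiple>
  <index enabled="true">
    <atomic>true</atomic>
    <stored>false</stored>
    <tokenised>both</tokenised>
  </index>
</property>
```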

Because the core Alfresco software is written in Java, the data types available in a content model parallel very closely the data types available in Java. The following table lists some of the common data types available for use in the Alfresco Content Model. The complete list of Alfresco data types can be found in the file tomcat/webapps/alfresco/WEB-INF/classes/alfresco/model/dictionaryModel.xml.

| Data type name | Java equivalent | Description |
|----------------|-----------------|-------------|
| d:text | java.lang.String | A text or character string. |
| d:mltext | Alfresco custom type | Multilingual text. Able to store multiple translations of a text string. |
| d:content | Alfresco custom type | Arbitrary content stored as a text or binary stream. |
| d:int | java.lang.Integer | 32-bit signed two's complement integer. |
| d:long | java.lang.Long | 64-bit signed two's complement integer. |
| d:float | java.lang.Float | Single-precision 32-bit IEEE 754 floating point. |
| d:double | java.lang.Double | Double-precision 64-bit IEEE 754 floating point. |
| d:date | java.util.Date | Date value. |
| d:datetime | java.util.Date | Date and time value. |
| d:boolean | java.lang.Boolean | Boolean data, either true or false. |
| d:locale | java.util.Locale | Locale to describe a geographical or cultural region. |
| d:path | Alfresco custom type | A file path. |
| d:any | java.lang.Object | Any value, regardless of type. |

Constraints

Constraints limit the allowed range of values for a property. Within a model XML file, constraints can be defined independently of the definition for any one type or aspect. Constraints defined in this way can then be reused as part of the property definition anywhere within the model.

Note

There is no limit to the number of constraints that can be applied to a property.

<property name="cm:userName">
  <type>d:text</type>
  <mandatory>true</mandatory>
  <constraints>
    <constraint ref="cm:userNameConstraint" />
  </constraints>
</property>
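
For a constraint to be referenced with ref, it first has to be defined by name at the model level. The sketch below assumes a hypothetical ex prefix; the names ex:codeLength and ex:projectCode are invented for illustration. It shows both halves: the named definition and a property that reuses it.

```xml
<!-- Hypothetical names; the ex: prefix is a placeholder -->
<constraints>
  <constraint name="ex:codeLength" type="LENGTH">
    <parameter name="minLength"><value>1</value></parameter>
    <parameter name="maxLength"><value>20</value></parameter>
  </constraint>
</constraints>

<!-- Later, inside the properties of a type or aspect in the same model -->
<property name="ex:projectCode">
  <type>d:text</type>
  <constraints>
    <constraint ref="ex:codeLength"/>
  </constraints>
</property>
```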

It is also possible to define an in-line constraint as part of the definition of the property. In this case, the constraint cannot be applied to any other property outside the one in which it is defined. A simple example of this is the following:

<property name="test:constrainedProp">
  <type>d:text</type>
  <constraints>
    <constraint type="LENGTH">
      <parameter name="minLength"><value>0</value></parameter>
      <parameter name="maxLength"><value>100</value></parameter>
    </constraint>
  </constraints>
</property>
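
Because a property can carry more than one constraint, in-line definitions of different types can be listed side by side. The following sketch uses a hypothetical property name and illustrative limits, and requires a value to be both short and made up of uppercase letters.

```xml
<!-- Hypothetical property combining two in-line constraints -->
<property name="test:shortUpperCaseProp">
  <type>d:text</type>
  <constraints>
    <!-- At most 10 characters -->
    <constraint type="LENGTH">
      <parameter name="minLength"><value>0</value></parameter>
      <parameter name="maxLength"><value>10</value></parameter>
    </constraint>
    <!-- Uppercase letters only -->
    <constraint type="REGEX">
      <parameter name="expression"><value>[A-Z]*</value></parameter>
      <parameter name="requiresMatch"><value>true</value></parameter>
    </constraint>
  </constraints>
</property>
```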

Types of constraints

Out of the box, Alfresco supports four types of constraints, which are discussed in this section.

REGEX constraint

The REGEX constraint enforces the syntax, spelling, or format for a property value. The constraint expression is written using regular expression syntax. Valid <parameter> names for this constraint are as follows:

  • expression — the regular expression used to evaluate the incoming string.
  • requiresMatch — a Boolean value, set to either true or false, to specify whether the value must match the regular expression or must not match the expression. The default for this parameter is true, which means that the test will fail if the value does not match the regular expression.

An example of a REGEX constraint is cm:filename, which is used for matching valid filenames. This constraint is defined as part of the content model. The definition is shown here:

<constraint name="cm:filename" type="REGEX">
  <parameter name="expression">
    <value><![CDATA[(.*[\"\*\\\>\<\?\/\:\|]+.*)|(.*[\.]?.*[\.]+$)|(.*[ ]+$)]]></value>
  </parameter>
  <parameter name="requiresMatch"><value>false</value></parameter>
</constraint>

A simpler example that constrains the value of the property to be an all-uppercase string is as follows:

<constraint name="test:regexExample" type="REGEX">
  <parameter name="expression"><value>[A-Z]*</value></parameter>
  <parameter name="requiresMatch"><value>true</value></parameter>
</constraint>

Note

Regular expressions are extremely powerful, but writing one can quickly become quite complex. Many online tutorials and books explain how to write them. Resources like http://regexlib.com/ offer a large online library of regular expressions that can be reused and also provide tools for interactive online debugging of regular expressions.

LENGTH constraint

The LENGTH constraint enforces the lengths of strings to be within a range of values. Valid <parameter> names for this constraint are as follows:

  • minLength — the minimum allowed length for the string. The value must be non-negative and less than or equal to maxLength.
  • maxLength — the maximum allowed length for the string. The value must be greater than or equal to the value of minLength.

Consider the following example of a LENGTH constraint where the length of the string for the property value must be between 0 and 100:

<constraint name="test:lengthExample" type="LENGTH">
  <parameter name="minLength"><value>0</value></parameter>
  <parameter name="maxLength"><value>100</value></parameter>
</constraint>

LIST constraint

The LIST constraint forces the value of a property to be one of the values contained in an enumerated list. Typically, a user enters the value for a LIST-constrained property by selecting it from a drop-down list containing all allowed values. Valid <parameter> names for this constraint are as follows:

  • allowedValues — a list of allowed string values for the property. While the values are strings, it is possible for them to represent non-string values.
  • caseSensitive — a Boolean value, set to either true or false. This flag specifies whether the comparison against the allowed values is case-sensitive. This parameter is optional and the default is true.

The Alfresco Content Model implementation for the DoD 5015.2 Records Management specification contains the following example of a LIST constraint:

<constraint name="dod:imageFormatList" type="LIST">
  <title>Image Formats</title>
  <parameter name="allowedValues">
    <list>
      <value>Binary Image Interchange Format (BIIF)</value>
      <value>GIF 89a</value>
      <value>Graphic Image Format (GIF) 87a</value>
      <value>Joint Photographic Experts Group (JPEG) (all versions)</value>
      <value>Portable Network Graphics (PNG) 1.0</value>
      <value>Tagged Image Interchange Format (TIFF) 4.0</value>
      <value>TIFF 5.0</value>
      <value>TIFF 6.0</value>
    </list>
  </parameter>
  <parameter name="caseSensitive"><value>true</value></parameter>
</constraint>

MINMAX constraint

The MINMAX constraint enforces that a numeric value be within a range of numbers. Valid <parameter> names for this constraint are as follows:

  • minValue — the minimum allowed value for this property. minValue must be less than or equal to the maxValue.
  • maxValue — the maximum allowed value for this property. maxValue must be greater than or equal to the minValue.

An example of a constraint on a numeric property that requires the number to be between 0 and 1000 is shown next:

<constraint name="test:minMaxExample" type="MINMAX">
  <parameter name="minValue"><value>0</value></parameter>
  <parameter name="maxValue"><value>1000</value></parameter>
</constraint>

Note

Custom constraint types can also be written, but doing so requires Java. Built-in constraints are defined in the Java package org.alfresco.repo.dictionary.constraint. The <parameter> names for each constraint correspond to the setter methods of the Java class that implements the constraint. An example and a description of how to create a custom constraint can be found on the Alfresco wiki: http://wiki.alfresco.com/wiki/Constraints.
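
On the model side, a custom constraint would be declared much like a built-in one. The sketch below assumes that the type attribute accepts the fully qualified name of the implementing class. The class com.example.constraint.ExampleConstraint and its threshold parameter are hypothetical and do not exist in Alfresco.

```xml
<!-- com.example.constraint.ExampleConstraint is a hypothetical class name -->
<constraint name="test:customExample" type="com.example.constraint.ExampleConstraint">
  <!-- Would be handed to a setter such as setThreshold(...) on that class -->
  <parameter name="threshold"><value>10</value></parameter>
</constraint>
```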

Associations

Associations are relationships that are created between two types within the content model. Associations are ultimately realized as relationships between nodes in the repository and are controlled by the types assigned to the nodes. Associations must be uniquely named and include the namespace prefix at the beginning of the association name.

Two types of associations are possible — child associations and peer associations. Both types of associations consider one of the types as the source and the other as the target. The source is the type in which the association is defined.

Peer associations

For brevity, within the Alfresco Content Model, a peer association is simply referred to as an association. Available elements that are enclosed by the <association> tag for describing the behavior of an association are as follows:

  • title — the title of the association. A text string to document the association.
  • description — a description of the association. A text string to document the association.
  • source — an element that groups the parameter elements that define the source of the association:
    • mandatory — a flag that specifies whether having an association is mandatory.
    • many — a flag that specifies whether a target can be associated with more than one source.
  • target — an element that groups the parameter elements that define the target of the association:
    • class — the allowed type for the target element. Selecting a class like sys:base would allow the target to be any kind of content, since all types inherit from sys:base. This element is required for defining the target.
    • mandatory — a flag that specifies whether having an association is mandatory.
    • many — a flag that specifies whether the source type can be associated with more than one target.

An example of a peer association can be found in contentModel.xml. The association here defines a reference from one item to another piece of content:

<association name="cm:references">
  <source>
    <role>cm:referencedBy</role>
    <mandatory>false</mandatory>
    <many>true</many>
  </source>
  <target>
    <class>cm:content</class>
    <role>cm:references</role>
    <mandatory>false</mandatory>
    <many>true</many>
  </target>
</association>

Child associations

A child-association is described by the same set of enclosed elements. Additionally, the following two elements are also supported as part of the child-association definition:

  • duplicate — a Boolean flag, either true or false, that specifies whether or not children of the parent node can have the same name. If it is not allowed, a transaction cannot be committed until this condition is met.
  • propagateTimestamps — a Boolean flag, either true or false, that specifies whether a change to a child element should also update the timestamp of the parent.

An example of a child association can be found in the Records Management Content Model. This example shows a holds area that is capable of tracking the holds that have been placed:

<child-association name="rma:holds">
  <title>Holds</title>
  <source>
    <mandatory>false</mandatory>
    <many>false</many>
  </source>
  <target>
    <class>rma:hold</class>
    <mandatory>false</mandatory>
    <many>true</many>
  </target>
</child-association>

The mandatory flag is checked whenever a node with the association is committed at the end of a transaction. This holds for both <association> and <child-association> tags. If the mandatory flag is true and enforced, which is specified by writing <mandatory enforced="true">, the commit will fail if the association does not exist. If the mandatory flag is true but not enforced, the commit will succeed, but an aspect called sys:incomplete will be applied to the node.

When the two elements, mandatory and many, are considered together, they define the cardinality of the association. The following table shows how the cardinality can be determined, based on those two elements:

 

|              | mandatory = true | mandatory = false |
|--------------|------------------|-------------------|
| many = true  | 1 or more        | 0 or more         |
| many = false | 1                | 0 or 1            |

With a child association, if you delete the parent node, the child nodes will be automatically deleted. In a peer association, deleting the source node will break the association, but will not cause any other nodes to be deleted.
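
As an illustration of how these settings read in a model, the following sketch defines a hypothetical child association named ex:attachments. The cardinality comments follow the table above, and the duplicate and propagateTimestamps values are illustrative choices.

```xml
<!-- Hypothetical child association; the ex: prefix is a placeholder -->
<child-association name="ex:attachments">
  <title>Attachments</title>
  <source>
    <!-- mandatory = false, many = false: cardinality 0 or 1 -->
    <mandatory>false</mandatory>
    <many>false</many>
  </source>
  <target>
    <class>cm:content</class>
    <!-- mandatory = false, many = true: cardinality 0 or more -->
    <mandatory>false</mandatory>
    <many>true</many>
  </target>
  <!-- Children of the same parent may not share a name -->
  <duplicate>false</duplicate>
  <!-- A change to a child also updates the parent's timestamp -->
  <propagateTimestamps>true</propagateTimestamps>
</child-association>
```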

Aspects

Aspects are a shorthand method to group together property, association, and constraint definitions. Aspects can be applied to repository nodes, type definitions, or to the definition of other aspects. When an aspect is applied to, for example, a node, the properties and associations defined in the aspect are taken from it and added to those that already exist on the node. Application of aspects to types and to other aspects works in a similar way.

Much of what an aspect does overlaps with the functionality of a type. For example, like types, aspects support inheritance, with the concept of one aspect inheriting from a parent aspect. The one difference between types and aspects is that every node must have one and only one type, while any number of aspects can be applied to a node.

The application of multiple aspects to a node is often compared to multiple inheritance. Aspects can also be thought of as being similar to macros. A macro, once defined, can be reused again by referring to it by its name. In the same sort of way, a common practice in the Alfresco Content Model is to define an aspect and to then apply it to many type and aspect definitions as a mandatory aspect. For example, the aspect cm:titled from the content model is often used in the definition of a type, bringing along with it standard definitions for the properties cm:title and cm:description.

Another advantage of aspects is that they can be dynamically applied at runtime to nodes. For example, when a record is declared within Records Management, only at that time are the properties that are relevant to managing records appended to the node. In this way, only metadata relevant to an object needs to be tracked. Aspects create a clean way to assign metadata to objects and avoid tracking metadata fields that are not relevant to an object.

The definition of an aspect is very similar to that of a type. Aspects must be uniquely named and include the namespace prefix at the beginning of the aspect name. Available elements that are enclosed by the <aspect> tag for describing the behavior of an aspect are as follows:

  • title — a title for the aspect. A text string that documents the aspect.
  • description — a description for the aspect. A text string that documents the aspect.
  • parent — the parent aspect of this aspect. Aspects can inherit from the definition of their parent aspect. Aspects can be nested to any depth in inheritance.
  • archive — a Boolean flag that indicates whether nodes with this aspect are moved to the archive store, a sort of recycle bin area, when they are deleted.
  • properties — an element that encloses a list of properties for the aspect.
  • associations — an element that encloses a list of associations for the aspect.
  • mandatory-aspects — an element that encloses a list of aspects for the aspect.
  • includedInSuperTypeQuery — a Boolean that determines if this aspect is to be searched as part of a query over any of its parent aspects.
  • overrides — an element that encloses a list of properties that override parent properties.

    The following features can be overridden:

    • mandatory — a child aspect can make a property mandatory, but cannot relax a property declared mandatory by the parent.
    • default — the child aspect can override or include a parent default value.
    • constraints — new constraints can be applied to a parent property, but existing constraints inherited from the parent cannot be removed.

A good example of an aspect that is defined in the Alfresco Records Management Content Model is rma:frozen. This aspect is applied to records that are subject to a hold:

<aspect name="rma:frozen">
  <title>Frozen</title>
  <properties>
    <property name="rma:frozenAt">
      <title>Frozen At Date</title>
      <type>d:date</type>
      <mandatory>true</mandatory>
    </property>
    <property name="rma:frozenBy">
      <title>Frozen By</title>
      <type>d:text</type>
      <mandatory>true</mandatory>
      <index enabled="true">
        <atomic>true</atomic>
        <stored>false</stored>
        <tokenised>false</tokenised>
      </index>
    </property>
  </properties>
  <mandatory-aspects>
    <aspect>rma:filePlanComponent</aspect>
  </mandatory-aspects>
</aspect>
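
A custom aspect follows the same pattern. The sketch below assumes a hypothetical ex prefix and invents the names ex:reviewable and ex:reviewDate for illustration. It reuses cm:titled as a mandatory aspect, so any node given ex:reviewable also picks up cm:title and cm:description.

```xml
<!-- Hypothetical aspect; every ex: name is a placeholder -->
<aspect name="ex:reviewable">
  <title>Reviewable</title>
  <properties>
    <property name="ex:reviewDate">
      <title>Review Date</title>
      <type>d:date</type>
      <mandatory>false</mandatory>
    </property>
  </properties>
  <!-- Aspects can themselves pull in other aspects -->
  <mandatory-aspects>
    <aspect>cm:titled</aspect>
  </mandatory-aspects>
</aspect>
```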
