Chapter 2. Architecture

WHAT'S IN THIS CHAPTER?

  • Understanding the Content Application Server and its embedded content repository

  • Learning about the Alfresco Web Application Framework, including Spring Surf and Web scripts

  • Exploring deployment options

  • Integrating with the enterprise infrastructure

Alfresco, the product, has grown rapidly since its inception and therefore offers an extensive set of technologies and building blocks for implementing an ECM solution. You can use this chapter as a map for navigating your way through Alfresco and as assistance for choosing the correct approach to solving your business problems.

GUIDING PRINCIPLES

When Alfresco started in early 2005, the founding engineers were very fortunate to begin with a clean slate, which is a rare position for software development teams these days. Many of the engineers had previous experience building content management systems, so it was an ideal opportunity to step back and think deeply about how to approach building a product to support modern-day ECM requirements. Before diving into designing Alfresco, the engineers first set out the following architecture principles, which are still in use today.

Supporting ECM Requirements

Enterprise Content Management (ECM) covers a broad range of applications, including Document Management (DM), Web Content Management (WCM), Records Management (RM), Digital Asset Management (DAM), and Search. The Alfresco architecture is driven by the need to support the requirements of all these applications, resulting in a coherent solution where content and management processes are not forced into silos. Alfresco recognizes that each of these disciplines has unique and overlapping characteristics, so the design of each Alfresco capability is not done in isolation but in the context of the whole system. Failure to support the requirements of ECM is not an option.

Simple, Simple, Simple

Complexity is a barrier to entry. Many ECM deployments do not reach their full potential due to complexities enforced on developers, IT, and users of the solution. Alfresco aims to be as simple as possible to develop against, customize, deploy, and use. The simplest and probably most widely deployed ECM solution is the shared document drive. The Alfresco architecture is driven by the desire to be as simple as a shared drive.

Scaling to the Enterprise

Content is everywhere and growing at an alarming rate. This is why every service and feature of Alfresco is designed up front to scale: scale in terms of size of data set, processing power, and number of users.

A Modular Approach

Unlike many ECM systems, the Alfresco architecture takes a modular approach. Alfresco recognizes that solutions often require a pick and mix of ECM features; therefore, the architecture promotes a system where capabilities are bundled into modules whose implementation may be replaced if required, or not included at all. Cross-cutting concerns are encapsulated through Aspect-Oriented Programming (AOP) techniques. This allows for fine-tuning and optimization of an ECM solution.

Incorporating Best-of-Breed Libraries

Where possible, Alfresco incorporates best-of-breed third-party libraries. The open source nature of Alfresco lends itself to integrating with the wealth of available open source libraries. This is done whenever it is more profitable to integrate than to build, or whenever expertise is better provided by another project than in-house. This approach has allowed Alfresco to build an ECM suite efficiently, innovate in multiple areas, and react quickly to market demands.

Environment Independence

ECM is often not the complete solution; it is part of a whole. Alfresco ECM, therefore, does not dictate the environment upon which it depends. You can choose which operating system, database, application server, browser, and authentication system to use when deploying Alfresco. ECM is less about the application and more about the services embedded within an application. You can choose how to package Alfresco—for example, as a Web application, an embedded library, or a portlet.

A Solid Core

The heart of Alfresco ECM is implemented in Java. This decision was driven by the wealth of available Java libraries, monitoring tools, and enterprise integrations. Just as importantly, Java is a trusted runtime for many enterprises wishing to deploy applications in their data centers. Each Alfresco capability is implemented as a black-box Java service tested independently and tuned appropriately.

Scriptable Extensions

There is no single solution for all ECM problems. Alfresco recognizes that its solid core, although very comprehensive, cannot solve all ECM needs. Extensions will always need to be created for custom solutions, and there are many custom solutions but only one Alfresco core. Therefore, Alfresco extension points are developed using JVM-based scripting languages, allowing a much wider pool of developers to build extensions than can contribute to the Alfresco core. Extensions are packaged entities, allowing for the growth of a library of third-party reusable extensions.

A Standards-Based Approach

The Alfresco architecture always complies with standards where applicable and advantageous. Primary concerns are to reduce lock-in, improve integration possibilities, and hook into the ecosystems built around the chosen standards.

An Architecture of Participation

Finally, and most importantly, the Alfresco architecture promotes a system designed for community contribution. In particular, the architecture principles of a solid core, modularity, standards compliance, simplicity of development, and scriptable extensions encourage contribution of plug-ins and custom ECM solutions. Participation complements the open source approach to the development of Alfresco and fosters growth of the Alfresco community. As the community grows, the quality of self-service improves, as well as the quality of feedback to Alfresco. This, in turn, enhances Alfresco and creates the ultimate feedback loop.

A HIGH-LEVEL OVERVIEW

There are many ways to slice and deploy Alfresco; however, most deployments follow a general pattern. Ultimately, Alfresco is used to implement an ECM solution such as DM, WCM, RM, and DAM; across those solutions may also be elements of Collaboration and Search (as shown in Figure 2-1). The solutions are typically split between clients and server, where clients offer users a user interface to the solution, and the server provides content management services and storage. It is common for a solution to offer multiple clients against a shared server, where each client is tailored for the environment in which it is to be used.

Alfresco offers two primary Web-based clients: Alfresco Explorer and Alfresco Share. Alfresco Explorer has been offered since the initial release of Alfresco. A power-user client, it exposes all features of the Alfresco Content Application Server. Alfresco Explorer is implemented using JavaServer Faces (JSF) and is highly customizable, but it is only deployable as part of the Alfresco Content Application Server.

Alfresco Share is a recent offering, focusing on the collaboration aspects of content management and streamlining the user experience by introducing features such as Web previews, tagging, and social networks. The central concept of Alfresco Share is the notion of a site: a place where users collaborate on the production of content and publish content. It is implemented using Spring Surf and is customizable without knowledge of JSF. Alfresco Share can also be deployed to its own tier separate from the Alfresco Content Application Server and, ultimately, managed through the Alfresco WCM solution. Over time, Alfresco Share will support all the features of Alfresco Explorer.

FIGURE 2-1

Of course, Alfresco is not available only through its Web clients. To drive user adoption, clients exist for portals (via JSR-168 portlets), mobile platforms (such as Apple iPhone), Microsoft Office, and the desktop (for example, through Flex and Microsoft .NET).

A client that is often overlooked is the folder drive of the operating system. This is probably one of the most common homegrown ECM solutions where users share documents through a network drive. Using JLAN technology, which Alfresco acquired, Alfresco can look and act just like a folder drive. JLAN is the only Java server-side implementation of the CIFS protocol. With this technology, users interact with Alfresco just as they do any other normal file drive, except the content is now stored and managed in the Alfresco Content Application Server.

The Alfresco Content Application Server comprises a content repository and value-added services for building ECM solutions. Within the last few years, the content repository has been defined by the following standards:

  • CMIS (Content Management Interoperability Services)

  • JCR (Java Content Repository / JSR-170/283)

These standards provide a specification for content definition and storage, retrieval of content, versioning, and permissions. Alfresco's content repository complies with both standards, providing a highly reliable, scalable, and efficient implementation. Content stored in Alfresco is, by default, placed into a combination of RDBMS (relational database management system) and file system. Due to the support of standards-based interfaces, there is no risk of content repository lock-in.

Even though standards efforts have gone a long way to define the core building blocks for ECM, there is still a gap between the features provided by the content repository and the requirements of a typical ECM solution. The Alfresco Content Application Server provides the following categories of services built upon the content repository:

  • Content services (for example, transformation, tagging, metadata extraction)

  • Control services (for example, workflow, records management, change sets)

  • Collaboration services (for example, social graph, activities, wiki)

Clients communicate with the Alfresco Content Application Server and its services through numerous supported protocols. Programmatic access is offered through HTTP and SOAP, while application access is offered through CIFS, FTP, WebDAV, IMAP, and Microsoft SharePoint protocols.

The Alfresco installer provides an out-of-the-box prepackaged deployment where the Alfresco Content Application Server (with embedded Alfresco Explorer) and Alfresco Share are deployed as distinct Web applications inside Apache Tomcat and configured for use with MySQL.

THE ALFRESCO CONTENT APPLICATION SERVER

The primary responsibility of the server is to provide a comprehensive set of services for use in building ECM solutions. In many respects, the server is just a black box where you place and manage content. Just like an RDBMS, the Alfresco Content Application Server exposes a set of remote public interfaces for allowing a client to communicate with it (as shown in Figure 2-2).

The remote public interfaces are the only part of the server visible to the client. There are two types: Remote APIs, which allow programmatic interaction with services of the server, and protocol bindings, which map those same services for use by a protocol-compliant client.

FIGURE 2-2

Internally, the server comprises several layers. The foundation is a set of infrastructure concerns such as configuration, authentication, permissions, and transactions that cut across all capabilities. Infrastructure also shields the server from being tied to any specific environmental implementation, such as transaction managers or caching mechanisms.

The Alfresco standards-based content repository is built upon this infrastructure and is, in turn, the building block for content, control, and collaboration services. Each capability of the content repository and content services is individually bundled as a module with its own in-process interface and implementation. Modules are bound together by the infrastructure through their interfaces.

You can deploy extensions to the Alfresco Content Application Server to extend or override its capabilities. Their implementation may use the in-process interfaces offered by the content repository and content services.

The Content Repository

As already stated, two standards (CMIS and JCR) define what services a content repository should provide. These include (as shown in Figure 2-3):

  • Definition of content structure (modeling)

  • Creation, modification, and deletion of content, associated metadata, and relationships

  • Query of content

  • Access control on content (permissions)

  • Versioning of content

  • Content renditions

  • Locking

  • Events

  • Audits

  • Import/Export

  • Multilingual

  • Rules/Actions

The Alfresco content repository provides a comprehensive implementation of all of these services and exposes each of them through an Alfresco API, CMIS protocol bindings, and the JSR-170 Java API.

At the core of the Alfresco content repository is the storage engine, which is responsible for the storage and retrieval of content, metadata, and relationships. The storage engine operates on the following constructs:

  • Nodes—Provide metadata and structure to content. A node can support properties, such as author, and relate to other nodes such as folder hierarchy and annotations. Parent to child relationships are treated specially.

  • Content—The content to record, such as a Microsoft Word document or an XML fragment.

Content models are registered with the content repository to constrain the structure of nodes and the relationships between them, as well as to constrain property values.

FIGURE 2-3

The storage engine also exposes query capabilities provided by a custom query engine built on Apache Lucene that supports the following search constructs:

  • Metadata filtering

  • Path matching

  • Full text search

  • Any combination of the above

The query engine and storage engines are hooked into the transaction and permission support of the infrastructure, thus offering consistent views and permission access. Several query languages are exposed (as shown in Figure 2-4), including native Lucene, XPath, Alfresco FTS (Full Text Search), and CMIS Query Language (with embedded Alfresco FTS).
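To make the combination of search constructs concrete, the following minimal Python sketch runs an in-memory search that applies a metadata filter, a path match, and a full text match together. It is not the Alfresco query engine, nor any of the query languages listed above; the `Node` class and the `search` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Node:
    """Hypothetical stand-in for a repository node (not an Alfresco class)."""
    path: str                                   # e.g. "/company/finance/budget.txt"
    properties: dict = field(default_factory=dict)
    text: str = ""                              # extracted full text of the content


def search(nodes, metadata=None, path_pattern=None, text_term=None):
    """Return the nodes that satisfy every supplied constraint (logical AND)."""
    results = []
    for node in nodes:
        # Metadata filtering: every requested property must match exactly.
        if metadata and any(node.properties.get(k) != v for k, v in metadata.items()):
            continue
        # Path matching: shell-style wildcard against the node path.
        if path_pattern and not fnmatchcase(node.path, path_pattern):
            continue
        # Full text search: naive case-insensitive substring match.
        if text_term and text_term.lower() not in node.text.lower():
            continue
        results.append(node)
    return results


nodes = [
    Node("/company/finance/budget.txt", {"author": "alice"}, "Annual budget forecast"),
    Node("/company/finance/notes.txt", {"author": "bob"}, "Budget meeting notes"),
    Node("/company/hr/policy.txt", {"author": "alice"}, "Holiday policy"),
]

hits = search(nodes, metadata={"author": "alice"},
              path_pattern="/company/finance/*", text_term="budget")
print([n.path for n in hits])
```

Each constraint is optional, so the same function covers a metadata-only, path-only, or full-text-only search as well as any combination of the three.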

FIGURE 2-4

By default, Alfresco stores nodes in an RDBMS while content is stored in the file system. Using a database immediately brings in the benefits of databases that have been developed over many years, such as transaction support, scaling, and administration capabilities. Alfresco uses a database abstraction layer for interacting with the database, which isolates the storage engine from variations in SQL dialect. This eases the database porting effort, allowing the certification of Alfresco against all the prominent RDBMS implementations. Content is stored in the file system to allow for very large content, random access, streaming, and options for different storage devices. Updates to content are always translated to append operations in the file system. This allows for transaction consistency between database and file system.
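One way to see why append-only writes help with consistency is the small Python sketch below. It assumes a simplified design in which the database holds a pointer to the current content and the pointer is switched only when the transaction commits. The `ContentStore` and `NodeTable` classes and the `store://` URL format are hypothetical and do not reproduce the Alfresco implementation.

```python
import uuid


class ContentStore:
    """Hypothetical append-only store: existing content is never overwritten."""

    def __init__(self):
        self._blobs = {}            # content URL -> bytes (stands in for files)

    def write(self, data: bytes) -> str:
        url = f"store://{uuid.uuid4().hex}.bin"   # every write gets a new URL
        self._blobs[url] = data
        return url

    def read(self, url: str) -> bytes:
        return self._blobs[url]


class NodeTable:
    """Hypothetical stand-in for the database rows that point at content."""

    def __init__(self):
        self.content_url = {}       # node id -> content URL

    def update_content(self, store, node_id, data, fail=False):
        new_url = store.write(data)             # step 1: append the new content
        if fail:
            # Simulated rollback: the pointer is untouched, so readers still
            # see the old content; the new blob is simply left as an orphan.
            raise RuntimeError("transaction rolled back")
        self.content_url[node_id] = new_url     # step 2: commit the pointer


store, table = ContentStore(), NodeTable()
table.update_content(store, "doc-1", b"version 1")
try:
    table.update_content(store, "doc-1", b"version 2", fail=True)
except RuntimeError:
    pass
print(store.read(table.content_url["doc-1"]))
```

Because the failed update never touched the existing content, rolling back the database side is enough to restore a consistent view.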

Content Repository Services are all built upon the storage and query engines. As with the engines, the same infrastructure is shared. The concept of users and groups is introduced into these services, such as recording the author of content, who has content locked, or who has access to content. Implementation of the standards-defined services is packaged into the Alfresco content repository; however, there are two services, also provided, that are worth mentioning outside of the content-repository standards:

  • Multilingual—Support for properties that can store multiple values indexed by locale, as well as support for document translations.

  • Rules/Actions—Support for declaratively defining content management processes that are triggered when adding or updating content in folders. Think email rules. This is particularly powerful when used with clients that interact through protocols such as CIFS and FTP.
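The idea of a property that stores multiple values indexed by locale can be sketched in a few lines of Python. The `MLText` class is a hypothetical illustration, and the fallback order it uses (exact locale, then language, then a default locale) is an assumption rather than documented Alfresco behavior.

```python
class MLText:
    """Hypothetical multilingual property: one value per locale."""

    def __init__(self, default_locale="en"):
        self.default_locale = default_locale
        self._values = {}

    def set(self, locale, value):
        self._values[locale] = value

    def get(self, locale):
        # Assumed lookup order: exact locale, its language, the default locale.
        for candidate in (locale, locale.split("_")[0], self.default_locale):
            if candidate in self._values:
                return self._values[candidate]
        return None


title = MLText()
title.set("en", "Annual Report")
title.set("fr", "Rapport annuel")
print(title.get("fr_CA"))   # no fr_CA value, so the fr value is used
print(title.get("de"))      # no German value, so the default locale is used
```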

You can bundle and deploy the Alfresco content repository itself independently or as part of a greater bundle, such as the Alfresco Content Application Server.

Modularity through the Spring Framework

Looking inside Alfresco reveals a very modular system. Every moving part is encapsulated as a service, where each service provides an external face in a formally defined interface and has one or more black-box implementations. The system is designed this way to allow for:

  • Pick and mix of services for building an ECM solution

  • Reimplementation of individual services

  • Multiple implementations of a service, where the appropriate implementation is chosen based on the context within which the solution is executed

  • A pattern for extending Alfresco (at design and runtime)

  • Easier testing of services

To support this approach, Alfresco employs the Spring Framework for its factory, Dependency Injection, and Aspect-Oriented Programming capabilities.

Services are bound together (as shown in Figure 2-5) through their interfaces and configured using Spring's declarative Dependency Injection. An important point here is that a service interface is literally defined as a Java interface. For services that form the internal embedded API for extensions, cross-cutting concerns such as transaction demarcation, access control, auditing, logging, and multitenancy are plugged in through Spring AOP behind the service interface. This means that service implementations are not polluted with these concerns. It also means the cross-cutting concerns may be configured independently or even switched off across the server if, for example, performance is the top-most requirement and the feature is not necessary.
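The pattern described here can be sketched without Spring at all. The Python example below is only an analogy: an abstract base class plays the part of the Java interface, a proxy plays the part of Spring AOP, and the `wire` function plays the part of declarative configuration. All class and function names are hypothetical.

```python
from abc import ABC, abstractmethod


class NodeService(ABC):
    """Hypothetical service interface; clients depend only on this."""

    @abstractmethod
    def create_node(self, name: str) -> str: ...


class SimpleNodeService(NodeService):
    """Black-box implementation with no knowledge of auditing or transactions."""

    def create_node(self, name: str) -> str:
        return f"node:{name}"


class InterceptingProxy(NodeService):
    """Wraps an implementation and runs interceptors around each call."""

    def __init__(self, target: NodeService, interceptors):
        self._target = target
        self._interceptors = list(interceptors)

    def create_node(self, name: str) -> str:
        for before, _ in self._interceptors:
            before("create_node", name)
        result = self._target.create_node(name)
        for _, after in reversed(self._interceptors):
            after("create_node", result)
        return result


log = []
# Each cross-cutting concern is a (before, after) pair of callables.
audit = (lambda m, a: log.append(f"audit: {m}({a})"), lambda m, r: None)
transaction = (lambda m, a: log.append("tx begin"), lambda m, r: log.append("tx commit"))


def wire(enable_audit=True) -> NodeService:
    """Plays the role of declarative configuration: choose and bind the parts."""
    interceptors = [transaction] + ([audit] if enable_audit else [])
    return InterceptingProxy(SimpleNodeService(), interceptors)


service = wire()
print(service.create_node("report.doc"))
print(log)
```

Calling `wire(enable_audit=False)` switches auditing off without touching `SimpleNodeService`, which is the property the paragraph above attributes to keeping cross-cutting concerns behind the service interface.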

Multiple services are aggregated into an Alfresco subsystem where a subsystem represents a complete coherent capability of the Alfresco server, such as authentication, transformation, and protocols. As a unit, subsystems have their own lifecycle where they may be shut down and restarted while the Alfresco server is running. This is useful to disable aspects of the server, or reconfigure parts of it, such as how LDAP synchronization is mapped. Each subsystem supports its own administration interface that is accessible through property files or JMX.
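As a rough illustration of a subsystem having its own lifecycle, the Python sketch below stops one subsystem and reconfigures another while the hosting server object keeps running. The `Subsystem` and `Server` classes, the subsystem names, and the property names are all hypothetical.

```python
class Subsystem:
    """Hypothetical subsystem with its own properties and lifecycle."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = dict(properties)
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def reconfigure(self, **changes):
        # Stop, apply the new property values, and restart only this subsystem.
        self.stop()
        self.properties.update(changes)
        self.start()


class Server:
    """Stand-in for the running server that hosts several subsystems."""

    def __init__(self, *subsystems):
        self.subsystems = {s.name: s for s in subsystems}
        for s in subsystems:
            s.start()


server = Server(
    Subsystem("authentication", sync_enabled="true"),
    Subsystem("fileServers", ftp_enabled="true"),
)
server.subsystems["fileServers"].stop()                       # disable one capability
server.subsystems["authentication"].reconfigure(sync_enabled="false")
print({name: s.running for name, s in server.subsystems.items()})
```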

FIGURE 2-5

Content Services

The Alfresco Content Application Server is more than just a content repository. A significant addition is the set of extended high-value services split into the following categories (as shown in Figure 2-6):

  • Content—Advanced content management capabilities

  • Control—Encapsulation of processes

  • Collaboration—Integration of content into social networks

Whereas a content repository provides a very horizontal general set of capabilities, the purpose of these services is to provide focused building blocks to support the requirements of the various disciplines of content management, such as document management, Web content management, and records management. All of these services are implemented on top of the content repository, follow the modular patterns already described, and share the same infrastructure foundation.

FIGURE 2-6

Something that might seem obvious, but is often not appreciated, is that Alfresco provides a uniform and integrated set of services for all disciplines of content management. Often, ECM architectures are grown through acquisition and combine distinct systems, perhaps each with its own repository. This is not the case with Alfresco, which allows the use of services in isolation or with each other. Although content management disciplines may seem discrete, there is actually a lot of overlap in their requirements, and ECM solutions often require a mix of those disciplines anyway.

In summary, the advanced content services comprise the following:

  • Lifecycle—Management of content state over time

  • Transformation—Conversion of content from one type to another

  • Metadata Extraction—Synchronization of document metadata with node metadata

  • Tagging—Arbitrary user-generated tags versus formally defined classifications

Control services encapsulate processes through which content flows and comprise the following:

  • Workflow—Structured processes that include a sequence of connected steps, often involving human interactions through allocation of tasks

  • Records—File plans, record types, retention and archival policies, disposition, and reporting, all certified to the DoD 5015.2 standard

  • Change Set—Working area for making safe content modifications

  • Preview—Viewing of content as it should be before publishing

  • Deployment—Publishing of content from one environment to another

Collaboration services integrate the production and publishing of content into social networks:

  • Social Graph—Represents people and their relationship to each other, either directly or indirectly through groups or teams

  • Activities—Continuous personalized feed of activities performed by others in the social graph or by Alfresco

  • Wiki—Easy creation and editing of interlinked Web pages

  • Blog—Log of regularly maintained entries of commentary, events, and other material, such as documents and videos

  • Discussions—Threaded conversations

Alfresco continues to add services in each product release. Of course, there will always be requirements that are not fulfilled by the out-of-the-box services. Due to the modularity and available embedded API of the Alfresco Content Application Server, you can always deploy your own custom services.

Protocols

To assist the adoption and ease of use of Alfresco, the Alfresco Content Application Server supports many folder- and document-based protocols. This allows you to access and manage content held within the content repository using client tools you may already be familiar with. In fact, some users may not even know they are using Alfresco, although the content they produce or consume has been through a process managed by Alfresco.

All the protocol bindings expose folders and documents held in the Alfresco content repository. This means a client tool accessing the repository using the protocol can navigate through folders, examine properties, and read content. Most protocols also permit updates, allowing a client tool to modify the folder structure, create and update documents, and write content. Some protocols go even further and allow interaction with capabilities such as version histories, search, and tasks.

Internally, the protocol bindings interact with the Content Repository Services (as shown in Figure 2-7), which encapsulate the behavior of working with folders and files. This ensures a consistent view and update approach across all client tools interacting with the Alfresco Content Application Server.

Note

An important feature is Rules and Actions, which allows the declarative definition of what happens to content when added to a folder or updated. Interaction through a protocol also adheres to those rules, meaning Alfresco can manage sophisticated processes of which the user of the client tool is completely unaware. For example, you can set up a rule to transform documents that are placed into a specific folder to PDF. This rule is triggered whenever you add a document to that folder using any of the available protocols.
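The following Python sketch illustrates why a rule fires regardless of the protocol used: both hypothetical protocol bindings delegate to the same repository service, and the rule is attached to that service rather than to either binding. The names `FolderService`, `ftp_put`, `cifs_save`, and `transform_to_pdf` are invented for the example, and the "transformation" only records a file name.

```python
class FolderService:
    """Hypothetical repository service that every protocol binding goes through."""

    def __init__(self):
        self.files = {}     # folder -> list of file names
        self.rules = {}     # folder -> list of actions run on inbound files

    def add_rule(self, folder, action):
        self.rules.setdefault(folder, []).append(action)

    def add_file(self, folder, name):
        self.files.setdefault(folder, []).append(name)
        for action in self.rules.get(folder, []):
            action(self, folder, name)


def transform_to_pdf(service, folder, name):
    """Stand-in action: records a PDF rendition instead of converting anything."""
    if not name.endswith(".pdf"):
        service.files[folder].append(name.rsplit(".", 1)[0] + ".pdf")


# Two hypothetical protocol bindings; neither knows that rules exist.
def ftp_put(service, path):
    folder, name = path.rsplit("/", 1)
    service.add_file(folder, name)


def cifs_save(service, folder, name):
    service.add_file(folder, name)


repo = FolderService()
repo.add_rule("/inbox", transform_to_pdf)
ftp_put(repo, "/inbox/report.doc")
cifs_save(repo, "/inbox", "memo.odt")
print(repo.files["/inbox"])
```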

An Alfresco subsystem for file servers allows configuration and lifecycle management for each of the protocols either through property files or JMX.

FIGURE 2-7

Here are the supported protocols:

  • CIFS (Common Internet File System) is a protocol that allows the projection of Alfresco as a native shared file drive. Any client that can read and write to file drives can read and write to Alfresco, allowing the commonly used shared file drive to be replaced with an ECM system without users even knowing. Alfresco acquired the only Java-based CIFS server implementation, known as JLAN.

  • WebDAV (Web-based Distributed Authoring and Versioning) is a set of extensions to HTTP that lets you manage files collaboratively on Web servers. It has strong support for authoring scenarios such as locking, metadata, and versioning. Many content production tools, such as the Microsoft Office suite, support WebDAV. Additionally, there are tools for mounting a WebDAV server as a network drive.

  • FTP (File Transfer Protocol) is a standard network protocol for exchanging and manipulating files over a network. This protocol is particularly useful for bulk loading folders and files into the Alfresco content repository.

  • IMAP (Internet Message Access Protocol) is a prevalent standard for allowing email access on a remote mail server. Alfresco presents itself as a mail server, allowing clients such as Microsoft Outlook, Apple Mail, and Thunderbird to connect to and interact with folders and files held within the Alfresco content repository. Three modes of operation are supported:

    • Archive—Allows the storage of emails in the Alfresco content repository simply by using drag/drop and copy/paste from the IMAP client.

    • Virtual—Folders and files held in the Alfresco content repository are exposed as emails within the IMAP client with the ability to view metadata and trigger actions using links embedded in the email body.

    • Mixed—A combination of the above.

  • Microsoft SharePoint protocol support enables Alfresco to act as a SharePoint server, creating tight integration with the Microsoft Office suite. This allows a user who is familiar with the Microsoft task pane to view and act upon documents held within the Alfresco content repository. The collaborative features of Microsoft SharePoint, such as Shared Workspace, are all mapped to Alfresco Share site capabilities.

APIs

The Alfresco Content Application Server exposes two flavors of API, each of which has been designed for a specific type of client:

  • Remote API—Used by clients to remotely communicate with the Alfresco Content Application Server—specifically, to treat it as a black box

  • Embedded API—Used by extensions that are registered and executed within the Alfresco Content Application Server

The Remote API

The Remote API is the API primarily used when building ECM solutions against the Alfresco Content Application Server. Actually, two styles of Remote API are exposed (as shown in Figure 2-8):

  • Web services—SOAP-based service-oriented interfaces

  • RESTful—HTTP-based resource-oriented interfaces

Alfresco first introduced its Web services API in version 1.0 of its product. It covers many of the core services that the Alfresco Content Application Server provides; however, as demand for SOAP-based interfaces has started to diminish, Alfresco is putting less emphasis on this particular API. One advantage of the Web services API is that there are many tools for building client bindings, covering all of the common environments and programming languages. You can remotely interact with the Alfresco Content Application Server through this interface from anywhere, such as Java, Microsoft .NET, PHP, and Adobe Flex. To ensure such compatibility, behind the scenes Alfresco embeds the Apache CXF engine and performs thorough integration testing. The Web services API also lends itself to orchestration through third-party business process engines, allowing the integration of content services into a wider business process.

FIGURE 2-8

So, if Alfresco is putting less emphasis on its Web services API, what should be used instead? Alfresco introduced its RESTful API in version 2.1 of its product and has since been expanding its scope to cover all services of the Alfresco Content Application Server. Developers tend to prefer the style of this API due to its natural alignment with the way the Web works. If you have an HTTP client then you can communicate with Alfresco, which covers almost every environment and programming language. Other attractions include the ease of use with AJAX-oriented Web clients. Alfresco Share, a Spring Surf–based client, remotely communicates with the Alfresco Content Application Server exclusively through its RESTful API. Behind the scenes, Alfresco embeds Spring Web scripts (contributed by Alfresco) for developing its RESTful API.
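To show how little a client needs, the Python sketch below builds an authenticated HTTP request using only the standard library. The host, port, base path, credentials, and the `/example/hello` resource are assumptions made for illustration; substitute the values of a real server and of a RESTful resource that it actually exposes. The request is built but deliberately not sent.

```python
import base64
import urllib.request

# Assumed values for illustration only; substitute those of a real server.
BASE_URL = "http://localhost:8080/alfresco/service"
USER, PASSWORD = "admin", "admin"


def build_request(path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request with HTTP Basic authentication."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


request = build_request("/example/hello")   # hypothetical resource path
print(request.get_method(), request.full_url)

# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         body = response.read()
```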

The Web services and RESTful APIs provided by Alfresco, although comprehensive, are proprietary APIs. A client implemented against these APIs can only execute against Alfresco, therefore locking out content that may reside in a content repository of another vendor. This issue has been the plague of the ECM industry for many years and is the reason for the introduction of CMIS.

CMIS provides a standardized set of services for working with content repositories. CMIS is not language-specific, it does not dictate how a content repository works, and it does not seek to incorporate every feature of every content repository. Instead, the goal is to define a set of common services for working with content repositories, both Web service (SOAP)– and RESTful–based.

Alfresco provides an implementation of CMIS Web service and RESTful bindings, as well as a CMIS client API for use in Spring Surf and other environments.

CMIS is important, as it provides a focal point for developers to collaborate on, one which is not locked in to any particular content management repository, allowing the growth of tools, utilities, and clients for ECM solutions. Further detail on CMIS is provided in Chapter 4.

The Embedded API

The Embedded API is the API used when developing extensions to the Alfresco Content Application Server. Extensions, which are deployed into the server, are often dependent on existing services provided by the server. Therefore, developers of extensions use the Embedded API to gain access to those services.

The Embedded API comes in several forms, where each form is structured for a particular need or kind of extension (as shown in Figure 2-9):

  • Alfresco Java Foundation API—The set of public Java interfaces exposed by services built into the Alfresco Content Application Server

  • JCR—Standard (JSR-170) set of Java interfaces for interacting with the content repository

  • JavaScript API—An object-oriented view of the Java Foundation API specifically tailored for use in JavaScript

  • FreeMarker API—An object-oriented view of the Java Foundation API specifically tailored for use in FreeMarker templates

  • Content Definition—An API for creating and editing content models

  • Workflow Definition—An API for defining business processes

FIGURE 2-9

This allows the following kinds of extension to be developed, some of which require Java knowledge while others may be scripted:

  • Web Script—Definition and implementation of a RESTful API

  • Action—Encapsulates a process primarily used with rules

  • Transformer—Converts content from one format to another

  • Policy—Event handler registered against an event

  • Service—Encapsulates a set of related features

  • Content Model—Definition of types, aspects, and their relationships

  • Workflow—A business process

Web scripts are an interesting extension as they allow you to define your own custom RESTful API: that is, define your own Remote API for clients to interact with the Alfresco Content Application Server. A Web script implementation may use any of the Embedded APIs, such as the Java Foundation API, JCR, JavaScript, and FreeMarker, for its implementation. Developing your own Remote API is very useful for the following scenarios:

  • Exposing new extension services deployed into the Alfresco Content Application Server to remote clients

  • Providing alternate batching or transaction demarcation of existing services

  • Creating a façade for integration with a third-party tool, such as a Forms engine

In fact, Web scripts have been used for a variety of solutions that were not originally considered when the Web Script Framework was designed, solutions encouraged by the simplicity of implementing a Web script using familiar scripting and MVC approaches. They are a popular extension for the Alfresco Content Application Server.
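
The MVC shape of a Web script can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions, not the Web Script Framework's own API: the `WebScript` and `WebScriptRegistry` classes, the `/sample/doccount` URI, and the `FAKE_REPOSITORY` dictionary are all hypothetical names invented for illustration. The sketch shows a URI bound to a controller that builds a model, and a view template that renders the model as the response.

```python
from __future__ import annotations

from string import Template
from typing import Callable

# Stand-in for repository content; a real Web script would call an Embedded API.
FAKE_REPOSITORY = {"/reports": ["q1.pdf", "q2.pdf"]}


class WebScript:
    """Hypothetical pairing of an HTTP method and URI with a controller and a view."""

    def __init__(self, method: str, uri: str,
                 controller: Callable[[dict], dict], view: Template) -> None:
        self.method = method.upper()
        self.uri = uri
        self.controller = controller
        self.view = view

    def execute(self, args: dict) -> str:
        model = self.controller(args)        # controller builds the model
        return self.view.substitute(model)   # view renders the response body


class WebScriptRegistry:
    """Hypothetical dispatcher that routes a request to the matching Web script."""

    def __init__(self) -> None:
        self._scripts: dict[tuple[str, str], WebScript] = {}

    def register(self, script: WebScript) -> None:
        self._scripts[(script.method, script.uri)] = script

    def dispatch(self, method: str, uri: str, args: dict | None = None) -> tuple[int, str]:
        script = self._scripts.get((method.upper(), uri))
        if script is None:
            return 404, "no Web script bound to this URI"
        return 200, script.execute(args or {})


def count_documents(args: dict) -> dict:
    """Controller: look up a folder and expose its document count to the view."""
    folder = args.get("folder", "/")
    return {"folder": folder, "count": len(FAKE_REPOSITORY.get(folder, []))}


registry = WebScriptRegistry()
registry.register(WebScript("GET", "/sample/doccount", count_documents,
                            Template('{"folder": "$folder", "count": $count}')))
status, body = registry.dispatch("GET", "/sample/doccount", {"folder": "/reports"})
```

Because the controller and the view are separate, the same controller could be paired with a different template to return another format without touching the lookup logic.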

There is one other use case for the Embedded API. An application or client can directly embed the Alfresco Content Application Server to inherit its suite of content services (as shown in Figure 2-10). As stated before, the infrastructure of the server means it can be deployed into a number of environments, not just as a Web application.

FIGURE 2-10

Essentially, the Alfresco Content Application Server is treated as a library, where any of its services, including the content repository, can be chosen independently or mixed to provide a custom solution. The server can scale down as well as up.

CONTENT MODELING

The Alfresco content repository takes a deliberately simple approach to representing content and its relationships. It defines a small number of reusable data structures from which sophisticated content models can be built. This also allows the implementation of the content repository to support different physical storage engines depending on requirements such as read versus write performance.

At the core is hierarchical node support. Nodes are entities that can represent anything you want stored in the repository. Each node is uniquely identified and is a container for any number of named properties, where property values can be of any data type, single or multi-valued. Nodes are related to each other through relationships. A special kind of relationship called parent/child exists to represent a hierarchy of nodes where child nodes cannot outlive their parent. You can also create arbitrary relationships between nodes and define different types of nodes and relationships.
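
The node structure described above can be modeled with a short Python sketch. It is a conceptual illustration only, not the repository's implementation or API: the `Node` and `Store` classes and all of their methods are hypothetical. It shows uniquely identified nodes holding named single- or multi-valued properties, arbitrary associations between nodes, and parent/child semantics in which deleting a parent also deletes its children.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """A uniquely identified container of named properties (hypothetical model)."""
    name: str
    properties: dict[str, Any] = field(default_factory=dict)
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)
    # Arbitrary, non-hierarchical relationships: association name -> target nodes.
    associations: dict[str, list[Node]] = field(default_factory=dict)


class Store:
    """One store holding a single hierarchy of nodes under a root node."""

    def __init__(self) -> None:
        self.root = Node("root")
        self._nodes: dict[str, Node] = {self.root.node_id: self.root}

    def create_child(self, parent: Node, name: str, **properties: Any) -> Node:
        child = Node(name, dict(properties), parent=parent)
        parent.children.append(child)
        self._nodes[child.node_id] = child
        return child

    def associate(self, source: Node, name: str, target: Node) -> None:
        source.associations.setdefault(name, []).append(target)

    def delete(self, node: Node) -> None:
        # Parent/child semantics: a child cannot outlive its parent,
        # so descendants are deleted before the node itself.
        for child in list(node.children):
            self.delete(child)
        if node.parent is not None:
            node.parent.children.remove(node)
        del self._nodes[node.node_id]

    def exists(self, node: Node) -> bool:
        return node.node_id in self._nodes


store = Store()
reports = store.create_child(store.root, "Reports", title="Quarterly reports")
q1 = store.create_child(reports, "q1.pdf", author="alice", tags=["finance", "2010"])
store.delete(reports)  # q1 is removed along with its parent
```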

Logically, the repository is split into multiple stores where each store contains its own hierarchy of nodes. Nodes can represent anything, but common ECM representations include folders, documents, XML fragments, renditions, collaboration sites, and people (as shown in Figure 2-11).

FIGURE 2-11

The Alfresco content repository provides services for reading, querying, and maintaining nodes. Events are fired on changes, allowing for processes to be triggered. In particular, the content repository provides the following capabilities based on events:

  • Policies—Event handlers registered for specific kinds of node events for either all nodes or nodes of a specific type

  • Rules—Declarative definition of processes based on addition, update, or removal of nodes (for example, the equivalent of email rules)
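
As a rough illustration of how policies bind handlers to node events, consider the following Python sketch. It is an assumption-laden model rather than the repository's policy API: the `PolicyRegistry` class, its `bind` and `fire` methods, the event name, and the dictionary representation of a node are all hypothetical. It shows one handler registered for all nodes and another registered only for nodes of a specific type.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Optional

Handler = Callable[[dict], None]


class PolicyRegistry:
    """Maps (event, node type) to handlers; a node type of None means all nodes."""

    def __init__(self) -> None:
        self._handlers: dict[tuple[str, Optional[str]], list[Handler]] = defaultdict(list)

    def bind(self, event: str, handler: Handler, node_type: Optional[str] = None) -> None:
        self._handlers[(event, node_type)].append(handler)

    def fire(self, event: str, node: dict) -> None:
        # Handlers bound to all nodes run first, then the type-specific ones.
        for key in ((event, None), (event, node["type"])):
            for handler in list(self._handlers.get(key, [])):
                handler(node)


registry = PolicyRegistry()
log: list[str] = []

# Runs for every created node, whatever its type.
registry.bind("onCreateNode", lambda n: log.append(f"audit {n['name']}"))
# Runs only when a node of the illustrative type "document" is created.
registry.bind("onCreateNode", lambda n: log.append(f"index {n['name']}"), node_type="document")

registry.fire("onCreateNode", {"name": "q1.pdf", "type": "document"})
registry.fire("onCreateNode", {"name": "Reports", "type": "folder"})
```

After both events fire, `log` holds an audit entry for each node but an index entry only for the document.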

A content model defines how a node in the content repository is constrained. Each model defines one or more types, where a type enumerates the properties and relationships that a node of that type can support (as shown in Figure 2-12). Often, it is necessary to model concepts that cross multiple types of node, which the Alfresco content repository supports through the notion of an aspect. Although a node can only be of a single type, any number of aspects may be applied to a node. Both data and process can be encapsulated within an aspect, providing a flexible tool for modeling content.
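
The relationship between types and aspects can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not the content metamodel itself: the `ClassDef` and `TypedNode` classes and the type, aspect, and property names are hypothetical. It shows a node with exactly one type gaining additional permitted properties when an aspect is applied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClassDef:
    """A type or an aspect: a name plus the property names it permits."""
    name: str
    properties: frozenset[str]


@dataclass
class TypedNode:
    node_type: ClassDef                                     # exactly one type
    aspects: list[ClassDef] = field(default_factory=list)   # any number of aspects
    values: dict[str, object] = field(default_factory=dict)

    def allowed_properties(self) -> frozenset[str]:
        allowed = self.node_type.properties
        for aspect in self.aspects:
            allowed = allowed | aspect.properties
        return allowed

    def add_aspect(self, aspect: ClassDef) -> None:
        if aspect not in self.aspects:
            self.aspects.append(aspect)

    def set_property(self, name: str, value: object) -> None:
        # The content model constrains which properties a node may carry.
        if name not in self.allowed_properties():
            raise KeyError(f"property {name!r} is not defined by the type or its aspects")
        self.values[name] = value


document = ClassDef("document", frozenset({"name", "content"}))
versionable = ClassDef("versionable", frozenset({"versionLabel"}))

node = TypedNode(document)
node.set_property("name", "q1.pdf")
node.add_aspect(versionable)                # cross-cutting concept applied to this node
node.set_property("versionLabel", "1.0")    # permitted only once the aspect is applied
```

The same `versionable` aspect could be applied to a node of any other type, which is what lets an aspect model a concept that crosses types.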

FIGURE 2-12

Models also define kinds of relationships, property data types, and value constraints. A special data type called content is provided to allow a property to hold arbitrary-length binary data.

Within ECM, many patterns and models have emerged and/or been standardized for managing content. The Alfresco Content Application Server comes with many pre-defined models, such as:

  • Folder/Document hierarchy

  • Dublin Core

  • Wiki

  • Blogs

  • Discussions

  • Collaboration Sites

  • DOD 5015.2

All of these models are expressed in the content metamodel (as shown in Figure 2-13), which maps neatly to both the CMIS domain model and JCR node-type model.

FIGURE 2-13

You can define new models for specific ECM use cases, either from scratch or by inheriting definitions from existing models.

THE ALFRESCO WEB APPLICATION FRAMEWORK

It's time to change direction and focus on the Alfresco Web Application Framework. Alfresco Share and all new Web applications from Alfresco are now built on Spring Surf, a Web application framework contributed by Alfresco. This provides the typical features of a framework of this kind but with one very important design goal: to support the needs of Web content management, where the authoring, review, and publishing of Web site content is just as important as how to develop the Web site.

At the heart of Spring Surf is a site assembly framework (as shown in Figure 2-14) that bundles a full site construction object model and toolkit for building Web sites and applications. Its features are:

  • Site Dispatcher—Allows you to easily create pages and link them to the overall navigation of the Web site. It also allows you to build pages in a way that promotes reusability so that components do not need to be built more than once.

  • Templates—Allows you to define a page layout once and then reuse it across a large set of pages. You can develop pages using FreeMarker, JSP, HTML, or Java.

  • UI Library—Reusable UI components that can be bound into regions (or slots) within your page or template. They consist of back-end application logic and front-end presentation code.

  • Pages—Allows for pages to be rendered in multiple formats, such as print format, PDF, or mobile device.

  • AJAX support—Integration with YUI Library.

  • Forms—Rich Forms engine for rendering and collecting data.
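
The way templates, regions, and components fit together can be sketched in Python. This is a conceptual illustration only and does not use Spring Surf's object model: the `PageTemplate` class, the component functions, and the region names are hypothetical. It shows a layout defined once with named regions, and two pages that reuse the layout while binding different components into those regions.

```python
from __future__ import annotations

from string import Template
from typing import Callable

# A component turns the request context into a fragment of markup.
Component = Callable[[dict], str]


def site_header(context: dict) -> str:
    return f"<nav>{context['site']}</nav>"


def article_body(context: dict) -> str:
    return f"<h1>{context['title']}</h1>"


def print_body(context: dict) -> str:
    return f"<p>{context['title']} (print view)</p>"


class PageTemplate:
    """A layout defined once, with named regions that pages fill with components."""

    def __init__(self, layout: str) -> None:
        self.layout = Template(layout)

    def render(self, bindings: dict[str, Component], context: dict) -> str:
        # Render each bound component, then drop the results into the regions.
        regions = {region: component(context) for region, component in bindings.items()}
        return self.layout.substitute(regions)


two_region = PageTemplate("<header>$header</header><main>$body</main>")

# Two pages share the layout and the header component but differ in the body.
article_page = {"header": site_header, "body": article_body}
print_page = {"header": site_header, "body": print_body}

context = {"site": "Intranet", "title": "Welcome"}
html = two_region.render(article_page, context)
printable = two_region.render(print_page, context)
```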

FIGURE 2-14

Spring Surf embeds Spring Web scripts, allowing Surf component developers to use the same techniques that were used when building Alfresco Content Application Server RESTful APIs, taking advantage of scripting languages and a simple MVC approach.

Often, a Spring Surf Web site requires access to and management of content held within the Alfresco Content Application Server, such as to support user-generated content, dynamic site artifacts, personalized presentation, and tagging. To support this, Spring Surf provides the following integration services:

  • Remote—Encapsulates any number of data sources with out-of-the-box support for the Alfresco Content Application Server

  • Credentials—Manages user authentication with out-of-the-box support for the Alfresco Content Application Server

By design, Spring Surf works hand-in-hand with Alfresco Web Content Management and provides virtualized content retrieval, preview, and test support for user sandboxes and Web projects. Applications built with Spring Surf can be deployed from Alfresco Web project spaces to production servers. To help facilitate this, Spring Surf uses a lightweight XML-driven model to represent all site artifacts, such as pages, templates, themes, and chrome. This means a Spring Surf site itself can be managed with Alfresco services such as change sets, preview, and deployment. In addition, an embedded API (as shown in Figure 2-15) is provided to support programmatic control of the same artifacts.

The XML and file-based nature of Spring Surf sites lends itself to being managed in Alfresco WCM (as shown in Figure 2-16), which offers features such as:

  • Safe editing of all Spring Surf artifacts, including the ability to snapshot your site and roll it backward in time

  • Review and Approve workflow of Spring Surf site changes

  • Preview of site changes

  • Deployment of site changes to test or production servers

By offering the Surf Web application framework to Spring, it is envisioned that the community will build many more components, thus enhancing the richness of the framework. In conjunction with the CMIS client API, Spring Surf provides an open, community-backed stack for implementing Web-based content-enabled applications.

FIGURE 2-15

FIGURE 2-16

DEPLOYMENT OPTIONS

As stated at the beginning of this chapter, one of the primary architectural guiding principles is to offer choice to the developer on how they can package Alfresco and to offer choice to those who deploy Alfresco, so they can make appropriate trade-offs to suit their requirements.

Alfresco's modular design and infrastructure foundation provide a platform for allowing Alfresco to be deployed in many different forms and topologies. In particular, the infrastructure foundation protects Alfresco from the environment within which it executes, allowing the choice of components such as operating system, database, application server, Web browser, and authentication system.

It's time to investigate each of the deployment options, starting with the simplest deployment for supporting the smallest footprint and progressing towards the most sophisticated deployments to support large-scale systems. Alfresco is designed to scale down as well as up.

Embedded Alfresco

An embedded Alfresco is contained directly within a host, where the host communicates with Alfresco through its embedded API, meaning the host and Alfresco reside in the same process (as shown in Figure 2-17). Typical hosts include content-rich client applications that require content-oriented storage, retrieval, and services, but can also include hosts such as test harnesses and samples. A client may choose to embed the Alfresco Web Application Framework or Alfresco Content Application Server, or both, treating Alfresco as a third-party library. In any case, the client can pick and mix the services of Alfresco to embed, allowing very small-footprint versions of Alfresco. The host is responsible for the startup and shutdown of Alfresco.

FIGURE 2-17

The Alfresco Content Application Server

An Alfresco Content Application Server is a stand-alone server capable of servicing requests over remote protocols. A single server can support any number of different applications and clients where new applications may be arbitrarily added. Clients communicate with Alfresco through its Remote API and Protocol bindings, although a server may be configured to omit or prohibit specific access points. This type of deployment takes advantage of an application server where Alfresco is bundled as a Web application (as shown in Figure 2-18). Application server features, such as transaction management and resource pooling, are injected into the Alfresco infrastructure foundation, allowing Alfresco to take advantage of them. For example, you can embed the Alfresco Content Application Server inside Apache Tomcat for the lightest-weight deployment, as well as inside Java Enterprise Edition–compliant application servers from JBoss, Oracle, or IBM to take advantage of advanced capabilities such as distributed transactions.

FIGURE 2-18

Clustered Alfresco

To support large-scale systems, Alfresco may be clustered, where multiple Alfresco servers are set up to work with each other, allowing client requests to be fulfilled across a number of processors (as shown in Figure 2-19). Both the Alfresco Web Application Framework and Alfresco Content Application Server can be clustered, allowing each tier to scale out independently.

FIGURE 2-19

Each node of a clustered Alfresco Content Application Server shares the same content repository store, although the store itself may be replicated across the nodes, if required. Caches and search indexes are also distributed, meaning that a clustered content application server looks and acts like a single content application server. Typically, a load balancer is placed in front of the clustered Alfresco Content Application Server to distribute requests across the nodes.

This setup also supports Cloud deployments. In fact, Alfresco provides images and tools for easily deploying a clustered Alfresco Content Application Server across multiple Amazon EC2 virtual nodes.

The Backup Server

This is a special case of the clustered deployment where, in case of failure, an application can switch to a backup version of the deployed stack. Depending upon configuration, the backup version may be available immediately on failure (known as hot backup) or shortly after failure, following some configuration changes (known as warm backup). One of the nodes in the cluster is designated the master, which supports the live application, while the other node is designated the slave, which keeps itself replicated with the master (as shown in Figure 2-20). The slave remains read-only until the point of switchover.
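
The master/slave arrangement can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not how Alfresco replicates data: the `RepositoryNode` and `BackupPair` classes are hypothetical, and replication is modeled as a synchronous copy purely for brevity. It shows the slave rejecting writes until switchover, after which it serves the live application.

```python
from __future__ import annotations


class ReadOnlyError(RuntimeError):
    """Raised when a client writes to a node that is still read-only."""


class RepositoryNode:
    def __init__(self, name: str, read_only: bool = False) -> None:
        self.name = name
        self.read_only = read_only
        self.data: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        if self.read_only:
            raise ReadOnlyError(f"{self.name} is read-only until switchover")
        self.data[key] = value

    def read(self, key: str) -> str:
        return self.data[key]


class BackupPair:
    """A master serving the live application and a slave kept replicated with it."""

    def __init__(self) -> None:
        self.master = RepositoryNode("node-a")
        self.slave = RepositoryNode("node-b", read_only=True)
        self.active = self.master

    def write(self, key: str, value: str) -> None:
        self.active.write(key, value)
        if self.active is self.master:
            # Replication is internal, so it bypasses the slave's client-facing guard.
            self.slave.data[key] = value

    def switch_over(self) -> None:
        # On failure of the master, the slave becomes writable and takes over.
        self.slave.read_only = False
        self.active = self.slave


pair = BackupPair()
pair.write("doc", "v1")   # lands on the master and is replicated to the slave
pair.switch_over()        # simulate failure of the master
pair.write("doc", "v2")   # now served by the former slave
```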

FIGURE 2-20

Multitenancy

Multitenancy allows a single Alfresco Content Application Server (clustered or not) to support multiple tenants, where a tenant such as a customer, company, or organization believes they are the only user of the server as they connect to a logical partition (as shown in Figure 2-21). Physically, all tenants share the same infrastructure, such as deployed nodes in a cluster and content repository storage. However, data maintained by one tenant cannot be read or manipulated by another tenant.

FIGURE 2-21

A deployment of this type eases administration and reduces the cost associated with maintaining many different applications and user bases, in particular when upgrading core services or performing backups, as this only needs to be done once for all tenants.

Alfresco provides administration tools for managing tenants, including the creation of tenants at runtime. In conjunction with clustering, multitenancy provides an ideal deployment option for the cloud.
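
The idea of logical partitions over shared infrastructure can be sketched in Python. This is an illustrative model only, not Alfresco's multitenancy implementation or administration API: the `MultiTenantRepository` and `TenantSession` classes and the tenant names are hypothetical. It shows tenants created at runtime, each working through a session that can see only its own partition of the shared storage.

```python
from __future__ import annotations


class TenantSession:
    """A tenant's view of the server: it can reach only its own partition."""

    def __init__(self, partition: dict[str, bytes]) -> None:
        self._partition = partition

    def put(self, path: str, content: bytes) -> None:
        self._partition[path] = content

    def get(self, path: str) -> bytes:
        return self._partition[path]

    def list_paths(self) -> list[str]:
        return sorted(self._partition)


class MultiTenantRepository:
    """One shared server whose storage is split into per-tenant logical partitions."""

    def __init__(self) -> None:
        self._partitions: dict[str, dict[str, bytes]] = {}

    def create_tenant(self, tenant: str) -> None:
        # Tenants can be added while the server is running.
        if tenant in self._partitions:
            raise ValueError(f"tenant {tenant!r} already exists")
        self._partitions[tenant] = {}

    def session(self, tenant: str) -> TenantSession:
        if tenant not in self._partitions:
            raise KeyError(f"unknown tenant {tenant!r}")
        return TenantSession(self._partitions[tenant])


server = MultiTenantRepository()
server.create_tenant("acme.example")
server.create_tenant("globex.example")

acme = server.session("acme.example")
globex = server.session("globex.example")

# The same path in two tenants refers to two unrelated pieces of content.
acme.put("/plans/roadmap.txt", b"acme roadmap")
globex.put("/plans/roadmap.txt", b"globex roadmap")
```

Because every session is handed only its own partition, one tenant has no path through which to read or change another tenant's data, while upgrades and backups of the single shared server cover all tenants at once.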

THE ENTERPRISE INFRASTRUCTURE

It is not always easy to understand what enterprise class or enterprise scale means, but Alfresco, by design, inherently supports the following capabilities that can be categorized under the enterprise umbrella (as shown in Figure 2-22).

  • Environment agnostic—Choose your stack, including operating system, database, application server, and Web browser.

  • Standards compliant—Interoperate with other parts of the software stack and reuse compliant tools. Alfresco supports standards such as JSR-168 (Portlet API), JSR-170 (JCR API), CMIS, and OpenSearch.

  • Authentication—Manage users and groups either through Alfresco's built-in support or through integration with third-party user and group directories such as LDAP and Active Directory with full and incremental synchronization of user data. Authenticate and support single sign-on against NTLM and Kerberos. Control access and management of content through fine-grained permissions granted to users, groups, and roles.

    FIGURE 2-22

  • Administration—Administer all aspects of Alfresco through property files or JMX, including the reconfiguration, startup, and shutdown of subsystems at runtime.

  • Clustering—Add nodes to an Alfresco cluster and distribute content storage to support large numbers of documents and users.

  • Backup/Restore—Set up Alfresco topologies that incorporate redundancy or master/slave nodes for warm and hot backup.

  • Audit—Configurable record of actions performed through Alfresco or by users of Alfresco, stored in the database in a form that is simple for third-party reporting tools to consume.

  • Records management—File plans, record types, retention and archival policies, disposition, and reporting, all certified to the DOD 5015.2 standard.

  • Alfresco Network—Alfresco Network is a support portal for registered Alfresco Enterprise users. The Alfresco Content Application Server offers a heartbeat where it periodically sends a record of its health to Alfresco Network for subsequent reporting and pre-emptive care.
