25

Corporate Information Services

By Martin Kearn

The capability to store, collaborate on, and share documents and files has been at the heart of SharePoint products and technologies since their inception in 2001. The functionality has gone through several facelifts along the way, but the core principle of using SharePoint as a corporate information management system remains the foundation of what SharePoint is all about.

This chapter examines some of the new functionality for SharePoint 2010 relating to corporate information services, as well as re-visiting some of the capabilities that have been around for several releases.

This chapter is organized as follows:

  • Designing corporate information services — This discussion covers some high-level information and principles relating to how to design SharePoint-based corporate information services.
  • Working with documents — This section provides some detailed information on key SharePoint features relating to document and information management.
  • Document management in the enterprise — This section provides details on features, best practices, and techniques for information management in the enterprise.
  • Software boundaries — This section discusses some important software limitations relating to large-scale information repositories.

DESIGNING CORPORATE INFORMATION SERVICES

As an architect, it is important to gauge exactly how much effort must go into the design of your corporate information services. SharePoint is a solid platform, and it is relatively simple for most business users to perform basic tasks — such as creating or uploading a document, and managing it. However, it is important to fully understand the needs of your users, and realize how much assistance and direction they will need when using SharePoint.

In some organizations, it may be appropriate to offer only minimal design and configuration, and then let users use SharePoint in the default configuration. However, in other organizations, this hands-off approach is not practical, and the environment must be fine-tuned to suit the needs of the users.

Let's use Microsoft's own internal systems as an example. Microsoft has two main SharePoint-based collaboration systems. The first one is the place where users can go to create their own ad-hoc SharePoint sites. The environment is self-service oriented, and any user can create a site without approval. The sites created are chosen from the default SharePoint site templates, and they are largely unconfigured, apart from a small customization where users must state how business-critical the site is.

The second SharePoint environment is used for managing customer engagements. This environment has a little more “design” in that there are content types and document templates, as well as a more tailored default layout. These sites are created automatically whenever an engagement is set up, and are the official repositories for customer documentation and deliverables.

Microsoft is an example of an organization where the corporate information services design is largely invisible to the user, and the underlying product is left open for users to simply use.

At the other end of the scale, some organizations have a need to provide a highly tailored and “locked down” SharePoint environment. This may include some or all of the following configurations:

  • Disabling self-service site creation
  • Providing customized site templates
  • Preventing users from creating new lists and libraries
  • Pre-configuring folder structures inside document libraries
  • Providing content types and document templates
  • Preventing users from adjusting the theme, master page, or web part layout, which includes adding new web parts
  • Mandating document approval and other workflows

There can be many reasons for taking a more locked-down approach, but often it is because IT organizations perceive that SharePoint will be too complex for their users to understand without significant configuration, or they are worried about the environment “getting out of control.”

Deviating from SharePoint's default configuration can be useful to gently introduce users to the SharePoint world, and to make SharePoint seem a bit more like the systems that they are used to.

It may also be useful if the data stored within SharePoint is very sensitive, or must conform to strict standards (for example, in terms of the way documents are written or regulatory compliance).

You have several factors to consider when judging how much to deviate from SharePoint's default configuration. Let's take a closer look at a few of them.

Familiarity of Users with Online Site-Based Collaboration Tools

Many of today's business people are highly familiar with using web-based collaboration tools because of the ever-increasing popularity of websites such as Facebook, MySpace, Yahoo Groups, blogs, and more. This is especially true with “Generation Z.”

Conversely, if your organization has a history of using filesystems with mapped drives and the familiar tree view, the change to a web-based tool may have a huge usability impact. Users may find it difficult to navigate SharePoint because of the lack of an overarching tree view or structure. Users may also struggle to understand the concept of sites and how they relate to documents.

Budget Considerations

What financial resources do you have to be able to create and support any changes that are made to SharePoint's default configurations?

Though many configurations can be applied without having to write code, configuring SharePoint is often complex and costly, especially if you are trying to make configuration changes in a testable, repeatable manner.

Expected Longevity of the Deployment

For how long will the SharePoint environment be in use? Is this just a temporary system, or is it part of a wider business strategy?

It is also worth considering SharePoint's product road map. How far away is the next version of SharePoint, and will its default configuration include any of the customizations you are considering? How will the changes you are making upgrade to the next version? It is no secret that the simplest upgrade procedure is with a platform that has not been modified from the default configuration. Configuration often equates to complication when it comes to upgrading.

Security Requirements

How sensitive is the data that is stored in SharePoint?

If the data is very sensitive, then close attention must be paid to the security design of your corporate information services.

Generally speaking, it is possible to apply very tight security settings to SharePoint 2010 using configuration and no customization. This does not negate the importance of a solid design, strategy, and approach to consistent security and, more importantly, ongoing administration of your SharePoint 2010 sites.

Document Control

How much central control do you want to retain over the documents that are stored within SharePoint?

By default, users can upload any file (within the allowed file types list) to any document library where they have permissions. The base Document content type has only an optional Title field, so the users will not have to fill out a long, complex metadata form to upload their documents.

Although this approach requires the minimal amount of thinking from the user's point of view, it does not take advantage of SharePoint's many features for content types, metadata, and so on.

At the other end of the scale, many companies build rich sets of content types, each using specific document templates. The content types may include specific metadata properties, some of which may be mandatory for the user. This approach requires more upfront work on the part of the user when uploading content to SharePoint libraries, but results in content with better classification information. This makes the overall experience of finding and filtering content much better for the user.

It is important to judge which of these two approaches is most suitable for your user community. It is also important to note that different approaches may be relevant for different repositories. If you consider the Microsoft internal example outlined earlier in the chapter, the area where users can create ad-hoc sites is simple, out-of-the-box SharePoint, and the only default content types are those that come with SharePoint.

The second area that Microsoft users use to store content does have prescribed content types, mandatory metadata, and so on. This is because the content is generally more important because it relates to customer engagements.

Environments Conflicting with SharePoint

SharePoint provides lots of different services related to the storage and collaboration of documents. For many customers, SharePoint may “compete” with one or more existing systems, which are either being run in parallel with SharePoint, or are legacy systems from which the organization is migrating.

If this is the case in your organization, consider how SharePoint could be customized so that users who are familiar with competing systems will find the SharePoint interface and structure to be as intuitive as possible.

Next, let's explore some of the key features that users can enjoy when working with documents in SharePoint 2010.

WORKING WITH DOCUMENTS

This section examines some of the key features related to how users work with documents in SharePoint. All of these features are important considerations when designing your corporate information architecture.

Social Networking

One of the major new feature areas with SharePoint 2010 is the addition of social networking capabilities. There is a plethora of new capabilities in this area, which are covered in more detail in Chapter 28.

Three areas of social networking directly relate to corporate information services:

  • Rating
  • Tagging
  • Note boards

Rating

Users now have the capability to rate documents in a document library. The feature works by allowing each user to provide a simple 1–5 rating for the document by using the familiar star-based rating system that is used in media players and other social tools.

Provided the item has at least one rating, the average will be shown via illuminated stars next to the document, as seen in Figure 25-1. The rating interface is rendered as a field, so it can be seen in any list views or list view web parts. The rating also shows up in Microsoft Office.

This feature is off by default, but it can be enabled on a per-library basis. The first design decision is whether or not to use ratings. Ratings are a powerful way of using the “wisdom of crowds” to highlight popular content. If used intelligently, ratings can be a key feed into systems that highlight popular content and introduce the social network to corporate information management.

FIGURE 25-1: Ratings in a document library

However, the rating implementation in SharePoint 2010 is quite basic, and can lead to some undesirable scenarios. Some of these limitations include the following:

  • Lack of weighting — The rating scores are not weighted by how many ratings have actually been given. This means that a document with two ratings of 4 stars each would be shown as the same as a document with 100 ratings of 4 stars each. This lack of weighting means that the rating system can easily be manipulated to show content as being more popular than it really is.
  • 1–5 stars — The rating system uses a score of 1–5 stars. There is no way to change this.
  • Design and presentation — The ratings are shown as stars, and it is not possible to change the way ratings are presented.
  • Rate without reading — The rating system will allow users to rate a document even if they have never opened it. This may be seen as unfair to the author of the document. However, this practice applies to all documents in the system, which means that everyone is subject to the same limitation.
  • Delayed presentation — The back-end rating processing system is a timer job that processes ratings in batches. This means that ratings do not show up immediately after a user has rated a document. This often causes confusion for users.
  • Lack of identification — The ratings are shown to end users in a rolled-up view, which prevents individual identification of specific ratings. This can be seen as both a limitation and a benefit.
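
The lack-of-weighting limitation can be made concrete with a short sketch. The Python below is purely illustrative and is not how SharePoint computes ratings: `plain_average` mirrors the unweighted behavior described above, while `weighted_average` is one common alternative (a Bayesian-style average) that pulls sparsely rated items toward a prior. The function names, the prior mean of 3.0, and the prior weight of 10 are all assumptions chosen for the example.

```python
from typing import Sequence


def plain_average(ratings: Sequence[int]) -> float:
    """Unweighted mean: the number of ratings has no influence on the score."""
    return sum(ratings) / len(ratings)


def weighted_average(ratings: Sequence[int],
                     prior_mean: float = 3.0,
                     prior_weight: int = 10) -> float:
    """Bayesian-style average (hypothetical alternative, not a SharePoint feature).

    Behaves as if `prior_weight` extra ratings of `prior_mean` had been cast,
    so items with few ratings stay close to the prior.
    """
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


two_ratings = [4, 4]
hundred_ratings = [4] * 100

# Both documents look identical under the unweighted scheme.
print(plain_average(two_ratings), plain_average(hundred_ratings))

# With weighting, the heavily rated document scores higher.
print(round(weighted_average(two_ratings), 2),
      round(weighted_average(hundred_ratings), 2))
```

If ratings feed a custom "popular content" view, a weighting of this kind would have to be applied in that custom layer, because the built-in star display cannot be changed.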

The use and acceptance of ratings will really depend on the nature of your users. If your users are familiar with working in social networking systems, they will most likely be used to the rating concept, and understand the limitations of it. However, other types of users may find the rating system an overly open way for users to judge content without having to back up their judgments with comments or identification.

Tagging

The concept of tagging has been present on the Internet for several years in social networking sites such as Facebook and LinkedIn, and now has been introduced to SharePoint for the first time in the 2010 release.

The act of tagging in the context of SharePoint refers to a user applying a keyword to a piece of content. The keyword can typically be any word or phrase, and does not have to relate to a corporate metadata service (though it will resolve to managed terms, if they exist). Once a user has tagged a piece of content, the tag will appear in the user's newsfeed, and, therefore, be shown to any other users who are following the tagging user.

A benefit of tagging is that, when other users are looking at the content that is tagged, they will be able to see the tags that have been set, helping them form a fuller picture of what the community thinks about a given piece of content.

Tags also help search results to become more relevant, because content becomes associated with keywords that the end users have defined (normally making the relevance very high).

Any URL (including external URLs) can be tagged in SharePoint, as well as people, list items, and documents.

Tagging is very deeply integrated into SharePoint's user interface, and cannot easily be “turned off.” Because of this, few design decisions can be made with regard to tagging, and it should be embraced as a core part of your corporate information services design.

Note Boards

As a complementary technology to tags, SharePoint 2010 also introduces the concept of a note board to sites, list items, documents, and people.

Conceptually, note boards are very similar to the wall feature in Facebook in that any user can add a public comment to the content, which will be shown both on the content itself and the user's newsfeed. Note boards are a very useful way of allowing users to comment and provide opinions on content within SharePoint.

Like tags, the note board concept is deeply integrated into SharePoint's interfaces and cannot easily be disabled.

Check-in/Check-out and Versioning

Check-in/check-out and versioning have been primary features of SharePoint since its earliest versions, and remain a core part of the way documents are managed in SharePoint 2010.

The act of checking out is a way of letting other users know that someone is editing a document, because they will see “checked out to Mr. X.” In previous versions of SharePoint, a checked-out document was locked, and no one could edit the document until the user had checked it back in. Although this configuration is possible in SharePoint 2010, it is not the default option.

When a checked-out document is checked back in and versioning is enabled, a new version is created, and users can view and restore old versions. Administrators can configure the number of versions that are kept.

There are two numbering schemes to consider:

  • Major versions — Major versions simply increment the version number. For example, the version number will go from 1, to 2, to 3 and so on.
  • Major and minor versions — This scheme uses a decimal point to differentiate major and minor versions. For example, 1.3, 1.4, and 1.5 are all minor versions, but 2.0 is a major version. Settings can define how minor and major versions are shown to users with differing rights. Minor versions are also known as draft documents.

Neither the way version numbers increment nor their format can be changed. This is seen as a limitation by some organizations that have more complex versioning schemes.
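
The two numbering schemes can be sketched in a few lines of Python. This is a simplified model for illustration only, following the description above; the `VersionCounter` class and its method names are hypothetical and do not correspond to any SharePoint API.

```python
from dataclasses import dataclass


@dataclass
class VersionCounter:
    """Toy model of the version labels described above (hypothetical helper)."""
    minor_versions_enabled: bool
    major: int = 0
    minor: int = 0

    def check_in_minor(self) -> str:
        """Save a draft: increments the minor part (for example, 1.3 -> 1.4)."""
        if not self.minor_versions_enabled:
            raise ValueError("minor versions are not enabled for this library")
        self.minor += 1
        return self.label()

    def check_in_major(self) -> str:
        """Publish: moves to the next whole number and resets the minor part."""
        self.major += 1
        self.minor = 0
        return self.label()

    def label(self) -> str:
        if self.minor_versions_enabled:
            return f"{self.major}.{self.minor}"
        return str(self.major)


# Major versions only: 1, 2, 3.
majors_only = VersionCounter(minor_versions_enabled=False)
print([majors_only.check_in_major() for _ in range(3)])

# Major and minor versions, starting from draft 1.3: 1.4, 1.5, then 2.0.
drafts = VersionCounter(minor_versions_enabled=True, major=1, minor=3)
print(drafts.check_in_minor(), drafts.check_in_minor(), drafts.check_in_major())
```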

NOTE Be sure to consider storage when enabling versioning. Each version of a document constitutes a completely separate BLOB object in the database. For example, if 10 versions of a 1 MB file are stored, then 10 MB of storage is required.
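
As a rough planning aid, the arithmetic in the note above can be expressed as a small Python function. It assumes, as stated above, that every retained version is stored as a full copy of the file; the function name, its parameters, and the library sizes in the second call are hypothetical.

```python
def estimate_version_storage_mb(file_size_mb: float, versions_kept: int,
                                document_count: int = 1) -> float:
    """Estimate storage when each retained version is a full copy of the file."""
    if file_size_mb < 0 or versions_kept < 0 or document_count < 0:
        raise ValueError("inputs must be non-negative")
    return file_size_mb * versions_kept * document_count


# The example from the text: 10 versions of a 1 MB file.
print(estimate_version_storage_mb(1, 10))

# A hypothetical library of 5,000 documents averaging 2 MB, keeping 10 versions each.
print(estimate_version_storage_mb(2, 10, document_count=5000))
```

Running the estimate with a few candidate version limits is a quick way to see how strongly the limit drives database growth.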

By default, new lists and libraries are not configured for versioning, and it is something that must be enabled on a library-by-library basis if required.

The act of enablement is relatively simple on a one-off basis. However, if your customer requires a company-wide policy that states all libraries (new and current) have versioning enabled, you would have to undertake some customization activity to achieve this, because SharePoint does not allow this sort of farm-wide configuration using out-of-the-box tools.

NOTE Farm-wide configuration of versioning settings will require customization activity for your site templates, list definitions, or use of feature receivers.

If your users are using Office 2010, versioning may not be necessary because SharePoint 2010 supports co-authoring when used with Office 2010 on the client, thus removing the need to protect from save conflicts. You learn more about this later in this chapter.

NOTE SharePoint 2010 supports co-authoring when used with Office 2010.

Document libraries can be configured to use workflows such as an approval process. Using this option ensures that draft documents must be approved before they become major versions. Major versions of documents are often exposed to more users than draft versions. Therefore, if you are using major and minor versioning, it makes sense to enable approval.

You have a number of key considerations to take into account when deciding how to design versioning, including the following:

  • Is approval required?
  • Do you have to mandate versioning as a farm-wide policy? Doing this will require customization.
  • Is versioning required at all, especially in scenarios where users have Office 2010 installed?
  • If versioning is required, how will it be configured, how many drafts will be kept, and how will the security be configured?
  • Do not overlook the storage implications of maintaining versions.

Content Types

Content types are the most important aspect of any information services design, because they provide the building blocks that documents are created from. They define the data that is captured during document creation, what happens to the document after creation, and many other factors that affect the overall document life cycle.

Content types define many aspects of a document or list item, including the following:

  • The underlying document file template that is used to create documents
  • Workflows that are associated with any items created from the content type
  • Custom document information panels that will appear in Office for a document created from the content type
  • Information management policies that are applied to items created from the content type
  • Metadata columns that are applied to the content type

All documents and list items are created based on a content type of some description. By default, it is likely to be one of the out-of-the-box content types, such as Item or Document, that define only basic information, such as a blank template and the Title metadata field.

One of the key design decisions that you must make as an architect is deciding whether you will use custom content types. If you choose to use custom content types, you must establish what types of content the organization uses, and how to use the various features of content types to the best effect.

This can be a time-consuming design process, but it is very important, and when completed effectively, it will provide a great baseline for other information management functions (such as search, record management, workflow, and many others).

Content Type Parentage

Content types support inheritance models, whereby every content type has a parent. If you work all the way up the chain, you'll find the out-of-the-box Item content type. The concept is that if a parent is changed in some way (such as additional metadata columns and so on), all child content types will pick up the change without administrator intervention.

When designing custom content types, it is best practice to establish a “base” content type that is the parent (or grandparent, great-grandparent, and so on) to all other custom content types. Typically, the base content type would not provide a template, but may provide metadata fields, information management policies, and even workflows that apply to all documents in the enterprise.

This approach is good practice even if the base content type does nothing. Having the base content type in place gives the customer flexibility in the future to add broad changes to all documents in the organization. If all content types had an out-of-the-box content type such as Document as the parent, these sorts of changes would involve modifying a content type that is owned by SharePoint. Although this is a supported activity, it is certainly not considered to be best practice.
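
The value of a custom base content type can be illustrated with a simplified Python model. This is a sketch only: the `ContentType` class, the "Contoso Base Document" name, and the column names are hypothetical, and the model resolves inherited columns at read time, which is a simplification of how SharePoint actually propagates changes to child content types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Simplified model of a content type (hypothetical, not the SharePoint API)."""
    name: str
    parent: Optional[ContentType] = None
    own_columns: List[str] = field(default_factory=list)

    def columns(self) -> List[str]:
        """Resolve columns by walking up the parent chain."""
        inherited = self.parent.columns() if self.parent else []
        return inherited + [c for c in self.own_columns if c not in inherited]


# Out-of-the-box parent, custom base, and two custom children.
document = ContentType("Document", own_columns=["Title"])
base = ContentType("Contoso Base Document", parent=document)
proposal = ContentType("Proposal", parent=base, own_columns=["Customer"])
report = ContentType("Report", parent=base, own_columns=["Period"])

# A broad change is made once, on the custom base rather than on Document.
base.own_columns.append("Confidentiality")

print(proposal.columns())
print(report.columns())
print(document.columns())  # the out-of-the-box type is left untouched
```

Even though the base type starts out empty, it gives a single place to add enterprise-wide columns later without modifying a content type that SharePoint owns.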

If you are using a base content type, the next decision is which of the out-of-the-box content types to set as its parent. Common practice in this area is to use the top-level Document content type as the parent.

The reason for this is that this content type does not provide anything other than the Title field (which SharePoint needs in order to work properly). Therefore, by choosing Document, you'll minimize the dependencies.

The same conditions apply to the Item content type (Document's parent). However, the Content Organizer Feature only works if documents are based on the Document content type (or a child of it). Later in this chapter, you learn more about the Content Organizer.

Content Type Hub

One of the issues with content types in SharePoint 2007 was that they were scoped at a site or site collection level. This meant that if a content type was configured in a specific site collection, there was no easy way to move it to a different site collection, other than via list and/or site templates.

The best-practice way to address this was to use XML features to add the content types and activate the feature on all site collections where the content type was required. This approach worked well for new sites, but changing a content type after deployment was very difficult.

SharePoint 2010 introduced a new feature called the content type hub that addresses these issues. The content type hub is a site collection that houses enterprise-wide content types. The Managed Metadata Service (MMS) application then takes those content types and publishes them to any site collection that is consuming services from the MMS. This allows administrators to maintain a central library of content types that are consumed around the enterprise. Changes to the content types can be made locally in the content type hub, and replicated automatically by the MMS.

The really great thing about this feature is that the content type hub is just a regular site collection that can be created and managed like any other site collection, and the content types themselves are just normal content types. So, no special skills or knowledge are required to manage them. This allows the tasks of management to be assigned to non-administrative users, and leaves the task of replication to the MMS.

NOTE Content type hubs are just regular SharePoint sites, and the content types contained within them are just regular content types. No special skills are required.

When designing content types, it is important to consider and plan how the content type hub will be used and where it will sit. Because the content type hub is just a regular site, it can reside anywhere in the farm.

The correct location of the content type hub will depend on who is responsible for managing content types, and how they intend to access the site. For example, if content type management is a role of the farm administrators, it could reside within the Central Administration web application. Otherwise, it may reside in the web application that is closest to the user that will manage and maintain the content types. However, typically, the Central Administration web application is a great place to store the content type hub.

As with SharePoint 2007, it is still best practice to add the content types to the content type hub via XML features, rather than manual configuration. This approach allows easy re-use across different environments.

Document Sets

One of the features new to SharePoint 2010 is document sets. A document set is a collection of two or more documents, potentially of different types, that can be created and managed as a single entity. The purpose of this feature is to address scenarios where teams are preparing a closely related group of documents. Examples of this include the following:

  • A sales report that contains the main report, a Visio file for diagrams, and a presentation showing a summary of the report.
  • An expense report that contains a spreadsheet detailing the expense claim, and scanned images of the receipts.

Document sets are enabled on a site collection by activating the Document Sets Feature, as shown in Figure 25-2. This will add a new content type to the site collection called Document Set. Any content types created with this as their parent will be document sets.

FIGURE 25-2: A document set

Document sets allow all of the capabilities of regular content types. In addition, any change to columns, workflow, information management policies, and so on, can be applied to all documents within the document set.

Document sets also have an additional interface called Document Set Settings, which is where the details of the document set are configured. This screen has several options:

  • Available Content Types — This defines which content types are available for use within the document set. By default, this is just the default Document content type, but you should consider adding your own here.
  • Default Content — This allows you to add default files that are included with the document set, and choose which content types they are based on. Typically, a blank version of the document templates will be added here.
  • Shared Columns — This allows you to choose which columns from the document set are shared across all files. This means that if the column is changed once, it will apply to all files in the document set.
  • Welcome Page — When users actually view a document set in the browser, they will receive a welcome page, which may include additional content that provides instructions on how to use the document set. This page can be customized in SharePoint Designer 2010.

If you plan to use document sets, it is imperative that you have a sound content types design first, because document sets will include several content types. It is also important to establish a base document set, just as you would with content types. This way, you can define a core range of settings that apply to any document sets that use the base as a parent. Even if the base document set contains no settings or content, it is still good practice to use a base document set so that broad changes can easily be applied in the future, if required.

Navigating Documents

The task of helping users navigate documents and files has been a key factor for many years. It is one of the more challenging areas of designing corporate information services for architects, because different users will think about documents in different ways, and it is important to accommodate all of the users who will be using your system.

Different Approaches to Locating Documents

Users within an organization may think about locating documents in many different ways. A corporate information services design that works well will understand and provide functionality for each of these approaches. Poorly designed systems will try to impose one or more of these approaches on users, thus alienating a portion of the user community.

Table 25-1 describes some of the typical user types that are commonly seen in organizations:

TABLE 25-1: Typical User Behaviors in Locating Documents in Organizations

USER TYPE | DESCRIPTION
Windows Explorer users | These users have been using IT systems for a long time, and have grown up with the all-familiar Windows Explorer tree view and mapped network drives. This group often likes to see documents organized into a structure of some kind that they can expand and collapse, just like they would with a network drive in Windows Explorer.
Facebook users | These users are typically from the younger generation, and their primary experience with IT systems is via the use of online social media sites such as Facebook. These users are used to simply placing documents in a flat location, and using metadata and tagging to help other users locate the files.
“Open in Office” users | These users will typically prefer to start creating or loading documents from the desktop application that corresponds with the document they are working on. For example, if a user wants to edit a Word document, he or she will first load Word, and then expect to navigate to the document via Word's File Open dialog.
Search users | These users will not generally navigate for documents, but will rely on Search to find what they are looking for. Typically, these users look to read content more than edit it, but this is still an important use case for document authoring, too.

SharePoint Tools for Finding Documents

Fortunately, SharePoint 2010 provides many different ways to group and structure documents, which together suit all of the approaches to locating documents outlined in this chapter.

SharePoint fully supports metadata and tagging. Users should be encouraged to provide metadata for all documents, and this can be enforced through the use of mandatory columns on content types. The provision of metadata is useful for lots of reasons, not the least of which is enabling metadata-driven navigation.

Within SharePoint, there are two ways to emulate the tree view approach that Windows Explorer users like to see. The traditional approach is to create a structure of folders and subfolders within a document library. Documents can then be physically placed in the folder that they are most closely related to. Although folders are a core and supported part of SharePoint, the use of folders should generally be discouraged for the following reasons:

  • Files can only exist in a single folder, and one person's classification may not make sense to another person, thus creating a frustrating experience for one of the two people.
  • Unlike Windows Explorer, it is not easy to physically move a document from one folder to another. It is possible via the Windows Explorer View, but a typical end user may not even realize this option exists, and may find using it a long-winded process.
  • The navigation tools provided by SharePoint for navigating folder structures are quite basic, and a user who is used to Windows Explorer may find them fairly limited.

As an alternative to folder-driven navigation, SharePoint 2010 introduced metadata-driven navigation. As shown in Figure 25-3, metadata-driven navigation provides a tree-view style interface that will look very familiar to the Windows Explorer users. However, the critical difference is that documents are placed in the structure based on their metadata, not their physical location.

FIGURE 25-3: Metadata-driven navigation tree

This introduces many benefits over folder-driven structures, including the following:

  • Documents can reside in multiple logical locations. Therefore, different users can place them based on what makes sense to them, and others can place the same document in a different logical location.
  • Because it is metadata that drives the structure, it can very easily be changed by simply changing the use of metadata in the documents.
  • The metadata fields will map back to the MMS and can, therefore, be centrally managed terms, rather than any term that a user wishes to use. This enables companies to introduce consistent navigation structures.
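The first benefit above can be illustrated with a minimal sketch. It is not how SharePoint implements navigation; the document names and the Department and Region fields are invented for illustration. It only shows how one document can appear under several branches of a tree when the tree is built from metadata rather than from a physical folder path.

```python
from collections import defaultdict

# Hypothetical documents: each carries metadata tags rather than a folder path.
documents = [
    {"name": "Q3-forecast.xlsx", "tags": {"Department": "Finance", "Region": "EMEA"}},
    {"name": "Launch-plan.docx", "tags": {"Department": "Marketing", "Region": "EMEA"}},
    {"name": "Budget-2010.xlsx", "tags": {"Department": "Finance", "Region": "Americas"}},
]

def build_navigation_tree(docs):
    """Group document names under every (field, value) pair they are tagged with."""
    tree = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        for field, value in doc["tags"].items():
            tree[field][value].append(doc["name"])
    return tree

tree = build_navigation_tree(documents)
print(tree["Department"]["Finance"])  # both Finance spreadsheets
print(tree["Region"]["EMEA"])         # Q3-forecast.xlsx appears here as well
```

Changing a tag changes where the document appears, with no move operation involved.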

If users have Office 2010, they will be able to use both metadata-driven and folder-driven structures to open documents. However, older versions of Office (including Office 2007) support only the folder-driven approach.

Workflow

SharePoint 2010 workflows are built on top of Windows Workflow Foundation (WF), and the product offers a wide range of workflow functionality. This is discussed in detail in Chapter 31.

SharePoint includes several built-in workflows that are designed to facilitate common document-based workflow scenarios, including the following:

  • Disposition approval — Manages document expiration and retention by allowing participants to decide whether to retain or delete expired documents.
  • Three state — Manages a document between three different states that can be defined by administrators.
  • Collect feedback — Routes a document for review. Reviewers can provide feedback, which is compiled and sent to the document owner when the workflow has completed.
  • Gather signatures — Gathers signatures needed to complete a Microsoft Office document.
  • Approval — Routes a document for approval. Approvers can approve or reject the document, reassign the approval task, or request changes to the document.

Each workflow can be configured on a content type or a library basis. However, it is best practice to try to map workflows to content types wherever possible. This way, the content is subject to the same workflow, no matter where it resides, whereas library-based workflow configurations will apply only to the items in the library where it was configured.
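The difference between the two association styles can be sketched as a simple lookup. This is a toy model under stated assumptions, not the SharePoint object model: the "Contract" content type and the library names are hypothetical, and only the workflow names come from the list above.

```python
# Hypothetical associations: the workflow names come from the list above,
# while the content type and library names are invented for illustration.
content_type_workflows = {"Contract": ["Approval"]}
library_workflows = {"Legal Documents": ["Collect feedback"]}

def workflows_for(document):
    """Return the workflows that apply to a document in its current library."""
    by_type = content_type_workflows.get(document["content_type"], [])
    by_library = library_workflows.get(document["library"], [])
    return by_type + by_library

contract = {"name": "supplier.docx", "content_type": "Contract", "library": "Legal Documents"}
moved = dict(contract, library="Archive")  # same document after a move

print(workflows_for(contract))
print(workflows_for(moved))
```

After the move, only the content type association still applies, which is why mapping workflows to content types gives more consistent behavior.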

Offline

Users have several options for taking SharePoint content offline. The right option will depend on the use case and volume of data.

The simplest offline model is the use of Outlook 2010 to take specific lists offline in the same way that Outlook does with Exchange e-mail. Every list has a “Connect to Outlook” button in the List or Library tab. When users choose to connect a list or library to Outlook, the list will be downloaded as a new folder in Outlook, and be synchronized with the server when users perform a “Send and Receive” operation in Outlook. If the list contains Outlook-style content (such as Tasks, Contacts, and Events), the synchronization is two-way in that users can make changes in Outlook, and those changes are written back to SharePoint on the next synchronization.

In the case of documents, offline changes can be made to Office documents via SharePoint Drafts. This is a special folder in the user's My Documents that is used to store documents that are checked out. Items that are in SharePoint Drafts can be uploaded to SharePoint when the user is online. If a document library is offline in Outlook, and a user opens a document, Office will prompt the user to “edit offline,” at which point it will copy the document from Outlook to SharePoint Drafts and allow the user to edit the document offline.

The final option for taking content offline is SharePoint Workspace, which is a separately installed Office application. SharePoint Workspace 2010 is seen as the premier offline application for SharePoint, and supports many more scenarios than Outlook or SharePoint Drafts.

When considering the offline strategy for SharePoint, it is important to think about the offline scenarios that your users may be facing. Here are some key considerations:

  • Do users primarily read content offline? If so, then Outlook may be a good choice.
  • Do users need to work offline most of the time and perform most of their editing offline? If so, then SharePoint Workspace is a good choice. However, this must be deployed on the user's desktop.
  • How often are users actually offline? With the ever-increasing availability of mobile Internet, Wi-Fi hotspots, and technologies such as Direct Access or virtual private networks (VPNs), users are not fully offline as much as they used to be.
  • What role will mobile devices play in the offline strategy?

Co-authoring in Office 2010

Users will always get a richer experience with the latest version of Office when using it against SharePoint, and Office and SharePoint 2010 are no exception. When combined, users will benefit from a plethora of rich integration features. The most impressive is probably co-authoring.

Using Office 2010, multiple users can simultaneously open and edit Office documents from SharePoint libraries. While co-authoring is taking place, Office will lock the section of the file that a user is actively editing, and will ensure that no other user can edit that particular section at the same time. However, other users can edit other sections of the document, as well as add review comments.
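The section-locking behavior can be pictured with a small toy model. This is only a sketch of the idea described above and is not Office's actual co-authoring protocol; the class, section names, and user names are all hypothetical.

```python
class SectionLocks:
    """Toy model of per-section locking during co-authoring (not Office's real protocol)."""

    def __init__(self):
        self._owners = {}  # section id -> user currently editing it

    def try_edit(self, section, user):
        """Grant the section to the user if it is free or already theirs."""
        owner = self._owners.setdefault(section, user)
        return owner == user

    def release(self, section, user):
        """Free the section, but only if this user holds it."""
        if self._owners.get(section) == user:
            del self._owners[section]

locks = SectionLocks()
print(locks.try_edit("intro", "alice"))    # alice takes the intro
print(locks.try_edit("intro", "bob"))      # bob is refused while alice holds it
print(locks.try_edit("summary", "bob"))    # bob can work on another section
locks.release("intro", "alice")
print(locks.try_edit("intro", "bob"))      # the intro is free again
```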

As shown in Figure 25-4, the integration is so rich that users can even see where other users are in the document, and can communicate directly with them if they have Office Communicator or Lync installed.

This feature is enabled by default, and requires no administrative configuration to use.

FIGURE 25-4: Seeing where other users are in a document

Though Office 2010 co-authoring is a great feature, it is only supported for Word 2010, PowerPoint 2010, or OneNote 2010. Excel 2010 is not supported.

Office Web Applications

Over recent years, a wide range of new hardware devices have become popular that do not necessarily have Office or Windows installed on them. To meet this trend, there has been an increasing need for web-based business productivity tools that allow users to perform basic authoring tasks directly through their browsers, without needing any client-side software installed (other than a browser).

Before SharePoint 2010, SharePoint really depended on and assumed there was a version of Microsoft Office installed on your user's desktop in order for the user to perform many of the document and content management tasks. However, in SharePoint 2010, you now have a web-based version of Office that can be used directly through the browser. This is called Office Web Applications (OWA).

In the past, the acronym OWA has referred to Outlook Web Access, which is a web-based companion to Outlook provided by Exchange server. In the 2010 product family, OWA now refers to the broader set of Office Web Applications, not just Outlook Web Access.

As shown in Figure 25-5, OWAs provide basic editing functionality for Word, Excel, PowerPoint, and OneNote files, which support the majority of a user's core authoring tasks.

FIGURE 25-5: Basic editing functionality of an OWA

OWAs are not a replacement for the full Office client. Instead, they should be considered a simple web companion.

Microsoft employees use both Office and OWA, and find that the two technologies complement each other very well. Office is used for day-to-day authoring, whereas the OWA is very useful if you want to read a document from a search result, or make a quick change to a document that is stored in a SharePoint site.

The OWA is a separate product that is built on top of SharePoint server, and must be installed and configured separately. When installed, the OWA will appear as service applications that can be grouped and accessed like any other service application.

As an architect, you should also consider the performance overhead of running OWAs, because they are a fairly resource-intensive service and, if used, must be factored into the overall server farm design. In some scenarios, it may be appropriate to have dedicated servers for running the OWA services.

When considering the use of OWA, the following factors should be taken into account:

  • Do users need OWA?
  • Do additional resources and/or servers need to be added to support OWA?
  • Will users also have the full Office suite? If so, how will users understand when to use OWA versus when to use Office?

Let's now take a look at how SharePoint can help to manage documents and information in an enterprise-scale environment.

DOCUMENT MANAGEMENT IN THE ENTERPRISE

So far, this chapter has covered some of the end-user functionality that enables corporate information management in SharePoint 2010.

The rest of the chapter focuses on how SharePoint can help to manage content in an enterprise deployment, and introduces more administrative features.

Determining Where Documents Will Live

One of the key decisions architects need to make when planning corporate information services for SharePoint is working out where the documents will reside within the SharePoint system. SharePoint is a highly diverse technology, and has different types of repositories, nearly all of which provide the core capability of storing documents.

The Humble Document Library

The first fact to understand when planning corporate information services is that all SharePoint documents (that is, files) reside in a library of some sort. In SharePoint 2010, libraries come in various shapes and sizes, namely the following:

  • Document library
  • Forms library
  • Picture library
  • Slide library
  • Asset library
  • Data connection library
  • Wiki page library
  • Pages library

All libraries are just SharePoint lists with added functionality that relates to the type of file the library supports.

All libraries share the same core capabilities, just like any list within SharePoint. This was not the case in previous versions of SharePoint.

Libraries (just like any type of list) have the capability to contain folders. Contrary to popular belief, folders are used for more than simply organizing the documents stored within them. They can also be used to apply security information to the contents that may differ from the library itself. Folders can also be used to manage compliance information for the documents stored within.

Of course, folders are also used to organize content and provide the familiar “tree-view” interface that so many users still like to see.

However, in SharePoint 2010, the same “tree view” can be provided by creating virtual “folders” based on the metadata of the items within the library. This is part of the metadata-driven navigation functionality that was discussed earlier in this chapter.

Sites and Site Collection Structure

One of the other key factors for where documents will live inside SharePoint is your approach to site collections, and site structures within site collections.

Site collections are one of the primary administrative objects in SharePoint. As the name suggests, they are logical groupings of one or more sites. Site collections will always contain at least a root site, but may optionally contain up to 250,000 subsites that can be arranged in a hierarchical structure.

This leads to a key design decision about whether you want to encourage lots of site collections (arranged in a flat structure), or a fewer number of large site collections (with structures built inside them).

There is no best practice in this area, because it really depends on your users and how they will think about this kind of navigation mechanism. However, there has been a recent trend toward creating a larger number of top-level site collections that contain only a small number of subsites. This approach leads to a lot more flexibility with regard to what happens within the site collection, as well as how it is administered. However, the downside is that it becomes difficult to represent a logical structure with this approach.

This approach is also popular because a site collection cannot span databases. Therefore, having a larger number of smaller site collections helps with database sizing and capacity planning.
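The capacity-planning point can be made concrete with a rough sketch. Because a site collection must fit entirely within one content database, planning resembles a bin-packing exercise. The projected sizes below are hypothetical, the 200 GB figure is the content database boundary quoted later in this chapter, and first-fit-decreasing is simply one common heuristic chosen for illustration, not a SharePoint feature.

```python
DB_LIMIT_GB = 200  # content database boundary quoted later in this chapter

def assign_to_databases(site_collections, limit_gb=DB_LIMIT_GB):
    """First-fit-decreasing placement of whole site collections into databases."""
    databases = []  # each entry: {"used": total GB, "sites": [names]}
    for name, size in sorted(site_collections.items(), key=lambda kv: -kv[1]):
        if size > limit_gb:
            raise ValueError(f"{name} ({size} GB) exceeds one database")
        for db in databases:
            if db["used"] + size <= limit_gb:
                db["used"] += size
                db["sites"].append(name)
                break
        else:
            # No existing database has room, so start a new one.
            databases.append({"used": size, "sites": [name]})
    return databases

# Hypothetical projected sizes in GB.
projected = {"HR": 120, "Finance": 90, "Projects": 70, "Legal": 60, "Wiki": 30}
plan = assign_to_databases(projected)
for number, db in enumerate(plan, start=1):
    print(number, db["used"], db["sites"])
```

A single 370 GB site collection holding the same content could not be placed at all, whereas five smaller ones fit into two databases.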

The site structure decision is key to helping users understand where documents live. This is because most sites in a site collection will have at least one document library, and the method by which users access the site is the first part of how they access the documents in the document library.

Information Management Policies

Information Management Policies (IMPs) are often overlooked when architects design corporate information services for SharePoint. They are often considered to purely relate to record management, and many organizations do not have record management on their radar for initial SharePoint adoption.

Although it is true that IMPs play a big part in record management, they apply much more broadly than just records, and should play a core part of your overall information management strategy.

IMPs can be created at a site collection level and can be applied to any list or library, thus managing the items stored within them. This is one of the biggest misconceptions around IMPs in SharePoint. Many people do not realize that they can apply IMPs to any list item or document, and that IMPs are not exclusively for items in a Record Center.

IMPs are managed at the site collection level in the Site Collection Policies gallery, which is accessible via Site Settings. However, your site must have the “In place record management” feature activated in order to see this option.

Generally speaking, IMPs should be created at the site collection level and then applied to your content types, which, in turn, will apply the policies to items created from the content types. It is possible to create policies directly at the list or library level, but it is better to use content types, because the policy will then apply wherever that content type is used, rather than being bound to a specific list or library.

IMPs define several different policies relating to how the files inheriting the policies are managed. These include the following:

  • Policy statement — The policy statement is displayed to end users when they open items subject to this policy. The policy statement can explain which policies apply to the content, or indicate any special handling or information that users need to be aware of.
  • Retention — This shows the schedule for how content is managed and disposed of by specifying a sequence of retention stages. If you specify multiple stages, each stage will occur one after the other in the order they appear on this page.
  • Auditing — This specifies the events that should be audited for documents and items subject to this policy (as shown in Figure 25-6). These include the following:
    • Opening or downloading documents, viewing items in lists, or viewing item properties
    • Editing items
    • Checking out or checking in items
    • Moving or copying items to another location in the site
    • Deleting or restoring items
  • Barcodes — Assigns a barcode to each document or item. Optionally, Microsoft Office applications can require users to insert these barcodes into documents.
  • Labels — You can add a label to a document to ensure that important information about the document is included when it is printed. To specify the label, type the text you want to use in the “Label format” box. You can use any combination of fixed text or document properties, except calculated or built-in properties such as GUID or CreatedBy.
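The retention item above describes stages that occur one after the other. A minimal sketch of that sequencing follows. It assumes each stage is expressed as a number of days after the previous stage, which is a simplification made for illustration; the stage actions and durations are invented and do not represent a real policy definition.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: (days after the previous stage, action).
stages = [
    (365, "Move to archive"),
    (730, "Start disposition approval workflow"),
    (30, "Delete permanently"),
]

def schedule(created, stages):
    """Return (due date, action) pairs, each stage starting after the one before it."""
    due = created
    plan = []
    for days, action in stages:
        due = due + timedelta(days=days)
        plan.append((due, action))
    return plan

for due, action in schedule(date(2010, 5, 12), stages):
    print(due.isoformat(), action)
```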

FIGURE 25-6: Defining Auditing policies in an IMP

As an architect, your first consideration should be whether or not you wish to manage documents and files in the general collaboration areas of SharePoint. Doing so will allow greater control over the information being produced. However, it may introduce additional burdens on the users as they provide the necessary information to comply with the policies.

Regardless of whether or not you use IMPs for general documents, you should consider a Record Center and SharePoint's wider record management capability, which is discussed in depth in Chapter 32.

If you choose to use IMPs for general documents, you must decide on how policies are created and aligned with your content types. As discussed earlier in this chapter, having a sound content type design is critical, and will provide the building blocks for your IMP strategy.

It is also useful to note that you do not have to use all the features available in an IMP. For example, if you just wanted to add a retention policy, but did not require the other features available in IMPs, you simply leave the other features disabled.

Document IDs

SharePoint 2010 introduced a new feature called Document ID, which ensures that each document that is uploaded to SharePoint has a unique, human-readable ID. This can be used to easily reference the document. The advantage of this feature is that the document retains its ID if it is moved to a different location, and it can be referenced via its ID in a location-agnostic manner.

FIGURE 25-7: Using the Document ID Feature

To get this functionality, the Document ID Service Feature must be enabled on each site collection. This will add a special column to every document called Document ID, and will contain a randomly generated unique ID for the document, as shown in Figure 25-7.
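The value of a location-agnostic ID can be sketched as a lookup table that survives moves. This is a toy model only: the class, the ID format, and the URLs are hypothetical and do not reflect how the Document ID Service stores or resolves IDs.

```python
class DocumentIndex:
    """Toy index mapping a stable document ID to the document's current URL."""

    def __init__(self):
        self._location_by_id = {}

    def register(self, doc_id, url):
        self._location_by_id[doc_id] = url

    def move(self, doc_id, new_url):
        """Relocate the document; the ID itself does not change."""
        if doc_id not in self._location_by_id:
            raise KeyError(doc_id)
        self._location_by_id[doc_id] = new_url

    def resolve(self, doc_id):
        return self._location_by_id[doc_id]

index = DocumentIndex()
# The ID format and URLs below are made up for illustration.
index.register("DOC-0001", "/sites/sales/Shared Documents/proposal.docx")
index.move("DOC-0001", "/sites/archive/2010/proposal.docx")
print(index.resolve("DOC-0001"))
```

A link that stores only the ID keeps working after the move, whereas a link that stores the original URL would break.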

As an architect, you must decide on how useful this Feature will be to your users. The footprint is relatively low as far as the users are concerned in that users do not need to do anything. However, they will notice an ID field showing up in their document metadata. From a user's perspective, this Feature may add a lot of value for a relatively low cost. But from an overall design perspective, you must be sure that the Feature is enabled on every new site collection in order to ensure a consistent experience.

To achieve this, you may need to look at creating a Site Template Association Feature that activates the Document ID Service Feature upon site creation.

Managed Metadata Service Application

One of the major new feature sets in SharePoint 2010 is the Managed Metadata Service (MMS) application.

The MMS is an all-new service application that provides centrally managed, structured groups of terms available for use in metadata through the SharePoint infrastructure. This is coupled with an intuitive new user interface for tagging content and looking up terms from the MMS.

The MMS also provides content type replication, as discussed in the “Content Type Hub” section earlier in this chapter.

Understanding What a Term Is

A term is the most granular object in the MMS. It refers to a word or phrase that describes something, and might be used to tag documents, people, and other content.

Before SharePoint 2010, the concept of a term did not exist. This meant that the majority of metadata values were entered as free text, which could open up various issues, including the following:

  • Misspellings and typographic errors.
  • Varying use of terminology that means the same thing. For example, the words “car,” “automobile,” and “vehicle” all refer to the same thing.
  • Users' not understanding what data was required for a given field. For example, a field entitled “document name” could refer to the filename of the document, the title of the document, or the type of document.

The difference between a term and normal free text is that a term is managed and can benefit from the following additional management features:

  • A single official term can have multiple alternative labels. This will allow for synonyms, abbreviations, and other alternative ways of describing the same term.
  • A description can help users identify whether or not they are referring to the right term.
  • Language variations enable users to use terms in their native language, but still point to the same official term.
  • The placement in a structure further provides context to the term. (See the “Understanding Term Sets” section later in this chapter for more on this.)
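As a rough illustration of the first three features, a managed term can be modeled as one official name plus a set of alternative labels that all resolve to the same object. The `Term` class and `build_lookup` helper are hypothetical names for this sketch and are not part of the MMS API; the example reuses the synonyms mentioned earlier in this section.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    name: str                                   # the single official label
    description: str = ""
    labels: set = field(default_factory=set)    # synonyms, abbreviations, translations

def build_lookup(terms):
    """Map every label (case-insensitively) to its official term."""
    lookup = {}
    for term in terms:
        for label in {term.name, *term.labels}:
            lookup[label.lower()] = term
    return lookup

# Example term reusing the synonyms mentioned earlier in this section.
car = Term("Car", "A road vehicle", {"automobile", "vehicle", "voiture"})
lookup = build_lookup([car])
print(lookup["automobile"].name)
print(lookup["voiture"].name)
```

Whichever label a user types, the content ends up associated with the same official term, which free text cannot guarantee.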

The Act of Tagging

When users tag a piece of content, they associate that piece of content with a specific term from the MMS. This association can happen through various channels, such as the following:

  • Completing a managed metadata field that has been defined as part of a content type or a list column
  • Completing a user profile
  • Tagging a page or other URL

When users perform the act of tagging, they will generally do so by simply typing into a text box. The MMS will then suggest terms that match the characters the user has entered. The user can then choose a suggestion, or, if none applies and the term set is open, the user can enter a new term.
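The suggest-then-choose interaction can be sketched as follows. This is a simplified model, assuming prefix matching and a single open or closed flag per term set; the `TermSet` class and the product names are hypothetical, and the real tagging control is more sophisticated.

```python
class TermSet:
    """Toy term set with prefix suggestions and an open or closed submission policy."""

    def __init__(self, terms, is_open):
        self.terms = sorted(terms)
        self.is_open = is_open

    def suggest(self, typed):
        """Return terms starting with what the user has typed so far."""
        prefix = typed.lower()
        return [t for t in self.terms if t.lower().startswith(prefix)]

    def tag(self, value):
        """Return the term to apply, adding it first if the policy allows."""
        for term in self.terms:
            if term.lower() == value.lower():
                return term
        if not self.is_open:
            raise ValueError(f"'{value}' is not in this closed term set")
        self.terms.append(value)
        self.terms.sort()
        return value

products = TermSet(["SharePoint", "SQL Server", "Silverlight"], is_open=True)
print(products.suggest("s"))    # every term starting with "s"
print(products.suggest("sh"))   # narrows as the user keeps typing
print(products.tag("Lync"))     # accepted because the set is open
```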

This interface is commonly used across all areas that use managed metadata within SharePoint, and is generally referred to as a tagging application. Figure 25-8 shows an example.

FIGURE 25-8: Tagging application

Understanding Term Sets

A term set is simply a structured hierarchy of terms that will typically relate to the same topic. A term set can also be referred to as a taxonomy.

The word “taxonomy” can mean different things to different people, so it is wise to avoid using it when describing managed metadata, or else ensure that you qualify it within the context of your discussion.

As well as simply providing a structure to store and organize terms, a term set has owners, contacts, and a description, as well as a submission policy.

As an architect, your first job in designing an MMS is usually to understand what the customer's term sets are likely to be. This will entirely depend on the organization, and what members of the organization typically think about their content.

As an example, let's say that your two main term sets are as follows:

  • Product and Technologies — Contains a structured hierarchy of all Microsoft products and technologies, listed by their formal name, with abbreviations and acronyms used as labels for each term.
  • Regions and Offices — Contains all of Microsoft's physical offices, organized by country and region.

Other common term sets might include languages, document types, products or services, customers, and many more.

Where possible, it is best practice to create term sets around specific topics, rather than trying to use a single term set to represent the organization's entire terminology.

A submission policy is another important consideration for architects because this defines whether end users can submit terms to the term set (open submission policy), or whether they can only choose from predefined terms (closed submission policy). Some factors that may influence the policy here are as follows:

  • How much do term set administrators understand about the real terminology used by the business? If the answer is “not sure” or “not a lot,” then an open policy may work best because it allows the term set to organically grow (sometimes referred to as a folksonomy).
  • Is there a dedicated term set administration process? If not, then it is important to consider how new terms are added if users cannot add them directly.
  • How important is it that terms are authoritative? If the terms represent brand names or official product names, then a closed policy may work best to prevent users from adding incorrect terms that may confuse other users.

It is possible to add a “contact” to term sets. This is an e-mail address that users can use to send e-mail directly from within a tagging application, and suggest terms or provide feedback. This is a useful compromise between open and closed submission policies, because it gives users an alternative route to add terms.

Term sets are grouped into term groups, which are the highest level in the MMS. Generally, you should try to stick with a single term group unless there is a good reason to have multiple ones. Reasons for multiple term groups might include the following:

  • Securing terms so that they are available only to certain parts of the organization
  • Separating term groups so that different people or groups can manage them

The management of the MMS and its term sets happens through a Central Administration interface called the Term Store Management Tool, as shown in Figure 25-9.

FIGURE 25-9: Term Store Management Tool

Enterprise Keywords

In addition to term sets, terms are also used in the enterprise keyword repository. An enterprise keyword is a feature that can be activated on any site collection that allows users to enter simple keywords as tags against any content where the feature is enabled.

Enterprise keywords can be used in addition to or instead of managed metadata fields. The act of tagging is exactly the same between the two. However, the difference is that enterprise keywords are not bound to any specific term set (or term group), and allow users to enter anything they like (that is, the submission policy is open and cannot be closed).

If the term that the user enters matches existing managed terms, the interface will offer those terms as suggestions, but the users are under no obligation to use the suggestions.

MMS administrators can take keywords that are used in the keyword repository and do not match existing managed terms and move them into a term set, thus making them managed terms.
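One way an administrator might decide which keywords are worth promoting is to count how often unmanaged keywords are used. The sketch below assumes the keyword usage has already been exported as a plain list, which is an assumption made for illustration. The data, the threshold, and the `promotion_candidates` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical data: terms already managed, and keywords users have typed.
managed_terms = {"sharepoint", "exchange"}
keyword_usage = ["SharePoint", "intranet", "Intranet", "onboarding", "intranet", "Exchange"]

def promotion_candidates(keywords, managed, min_uses=2):
    """Count unmanaged keywords and return those used at least min_uses times."""
    counts = Counter(k.lower() for k in keywords if k.lower() not in managed)
    return [(word, n) for word, n in counts.most_common() if n >= min_uses]

print(promotion_candidates(keyword_usage, managed_terms))
```

Keywords that recur often are good evidence of the organization's real terminology and are natural candidates for a managed term set.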

The first decision for architects is whether or not to allow the use of enterprise keywords. Enterprise keywords provide a very powerful facility, and can be used to very quickly gain an understanding of the organization's real terminology. However, the lack of control may worry administrators in certain organizations.

When starting out with MMS, many administrators will choose to only use enterprise keywords because this will give them the best indication of what the real terms are within the organization. They can then start to build managed term sets from the keywords that the users are entering on a day-to-day basis.

Document Conversion Services

With SharePoint 2010, it is possible to convert a document from one file type to another. This can be very useful if you must use a batch process for incoming documents for publication on an intranet site or external website.

By default, SharePoint provides several conversion services, including the following:

  • From an InfoPath form to a web page (XML into HTML)
  • From a Word document to a web page (.docx into HTML)
  • From a Word document with macros to a web page (.docm into HTML)
  • From XML to a web page (XML into HTML)

It is possible to write custom conversion services that use the document conversion framework to convert files based on your own criteria.

From an architect's perspective, the important point to note in this section is that these services exist, and that they make a great platform on which to build a conversion application.

Document Center Sites

A Document Center site is a special type of site collection that is designed to facilitate the storage of large volumes of documents. These sites are often used to store “official documents,” and are more for final content than collaborative content.

A Document Center site is really just a standard site collection that is preconfigured with many of the features discussed in this chapter. The site also features some special web parts on the home page that lead users to common document-based activities (such as uploading a document and searching by Document ID). The site also features useful web parts that show the current use of the documents users have recently uploaded or modified, as well as the highest-rated documents.

If your organization has requirements focusing on an enterprise document repository, a Document Center site may be the right solution for them. However, it is important to consider whether requirements are best met by a Document Center or Record Center, because there is a lot of overlap.

Record Centers are discussed in detail in Chapter 32. If you are considering Document Center sites, ensure that you have read that chapter so that you can draw a well-balanced conclusion on which is the best fit.

The next section discusses some of the major software boundaries and limitations in SharePoint. These are very important because they may affect your design decisions in large-scale deployments.

SHAREPOINT SOFTWARE BOUNDARIES

Although SharePoint 2010 is certainly an enterprise-ready platform, boundaries exist for how far the platform will scale.

It is important that architects understand these boundaries because they can have a huge impact on how the platform is architected for corporate information services.

A full list of software boundaries is published online in a TechNet article called “SharePoint Server 2010 Capacity Management: Software Boundaries and Limits,” which you can view at http://technet.microsoft.com/en-us/library/cc262787.aspx.

The key points that can sometimes trip up architects are as follows:

  • Content database size — This is set at 200 GB per content database. This limitation is especially significant because a single site collection can reside in only one database, which effectively puts a limit of 200 GB on a site collection. (There are supported exceptions to this boundary for large sites like Record Centers and Document Centers. Refer to the online article referenced for more detail.)
  • File size — This is set at 2 GB. Although, at first glance, this seems like a high limit, this does rule out SharePoint as a suitable storage repository for large files like some media files or CAD drawings.
  • Major versions — This is set at 400,000. This boundary in itself is quite large. However, the architectural risk here is that this limit is much lower than the recommended number of documents in a library (30 million). This means that if you have a library with major versioning enabled, you must work to the version retention boundary, not the overall number of documents boundary.
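A design review can check planned figures against these boundaries before the system is adopted. The following is a minimal sketch using the limits quoted in the list above; the dictionary keys and the sample plan are hypothetical, and the published TechNet article remains the authoritative source for the limits and their exceptions.

```python
# Limits as quoted in the list above (SharePoint 2010).
LIMITS = {
    "content_db_gb": 200,
    "file_gb": 2,
    "major_versions": 400_000,
    "documents_per_library": 30_000_000,
}

def check_design(design):
    """Return a warning for each planned figure that exceeds its boundary."""
    return [
        f"{key}: planned {design[key]} exceeds limit {limit}"
        for key, limit in LIMITS.items()
        if design.get(key, 0) > limit
    ]

# Hypothetical plan for a media-heavy site collection.
plan = {"content_db_gb": 150, "file_gb": 4, "major_versions": 50_000}
for warning in check_design(plan):
    print(warning)
```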

Most SharePoint deployments will never come anywhere close to the software boundaries, and if they do, the limitations can generally be worked around by altering the design. But the key point here is that these limitations are something you should know about before the system gets broadly adopted, not when it hits the limitation.

SUMMARY

This chapter has provided a broad summary of the features available to architects to help them manage, access, update, and derive value from documents stored in SharePoint 2010.

In this chapter, you have discovered some features that you may not have been familiar with, or are new to SharePoint 2010.

Content types are very important. Many of the corporate information management features are built on top of content types, and this is something that you should put a lot of thought into at the start of your SharePoint journey.

The MMS is a great new addition in SharePoint 2010, and provides a simple, powerful, yet intuitive way for users to add metadata to content stored within SharePoint.

There are no “right” or “wrong” choices when it comes to corporate information services design. However, the key things to keep in mind are ensuring that you are making the right choice for your users, that you consider how they will use the technology, and how much of a jump you are asking them to make compared to their existing tools.

Chapter 26 examines some of the new business collaboration capabilities in SharePoint 2010.
