Appendix C. Stanford University Directory Architecture

Environment

The Stanford community consists of approximately 1,500 faculty, 8,000 staff, and 14,000 students. The extended community includes over 25,000 alumni. The university is organized in seven schools, several of which regularly receive top honors in national reviews. Many notable research projects are undertaken in over 100 locations, including the Stanford Linear Accelerator Center (SLAC) and the Stanford Hospital.

This environment demands sophisticated IT resources that can be easily accessed in a distributed computing model. Some IT support is provided centrally, but each school and research project has autonomy and may deploy computing resources. Central IT helps support resources that must have centralized management. The Stanford directory architecture is an example of such a resource.

Stanford employs a network-wide user identity system for authentication that is based on Kerberos. This system is known as the SUNet ID system, and it can be used to access many network services, including e-mail, the directory, Web sites, a Windows infrastructure, and other services.

The Stanford directory architecture has evolved over time to meet Stanford's diverse needs. Some of the elements of the architecture are vendor provided, while others are custom written. This composite nature of the directory architecture, along with the large, diverse environment, provides an interesting example of a data architecture that is worth a closer look.

Source Systems

Stanford has several systems of record. These systems hold authoritative data about Stanford's business. Each of these source systems is owned by the office that is responsible for the data, not by the central IT organization. For example, one source system comes from the Registrar, is based on PeopleSoft, and contains authoritative information about students. Another source system comes from Human Resources and contains authoritative information about staff and faculty. Other sources include data from SLAC and the Stanford Hospital. An ID card system for all Stanford-affiliated people is also a source system. This system maps a person's name to a unique ID card number. Figure C-1 shows the relationship between the many source systems and the central repository that integrates each of these source systems. This central repository is called the Stanford Registry.

Figure C-1. Stanford source systems

Stanford Registry

The Stanford Registry is neither an LDAP directory nor any other kind of directory, but rather a database. The rationale for this choice centers on the purpose the Registry serves. A database meets several key requirements. Most notable among these are the ability to make a large number of modifications and the ability to roll back data to a previously known state. The Stanford Registry provides custom metadirectory functionality by amalgamating all the relevant data in a single repository. The Registry eliminates duplication of information from the multiple sources, and it uses business logic specific to Stanford. For example, imagine a student who also works for the university as a staff member. At least two of the source systems hold authoritative information about this person. The Registry takes all the information and applies a rule that decides which information has higher priority. In this example, some of the information is taken from the student source, and some from the staff source. The level of modification activity, reporting, and rollback-commit functionality required led to the decision to use a database for this metadirectory purpose.
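The priority rule described above can be sketched as a per-attribute precedence table. This is only a minimal illustration: the source names (`registrar`, `hr`), the attribute names, and the precedence order are hypothetical and are not Stanford's actual business rules.

```python
from typing import Dict, List

# Hypothetical precedence: for each attribute, the sources in priority order.
# These names and orderings are assumptions made for illustration only.
PRECEDENCE: Dict[str, List[str]] = {
    "displayName": ["registrar", "hr"],
    "mail": ["hr", "registrar"],
}
DEFAULT_ORDER: List[str] = ["hr", "registrar"]


def merge_person(records: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge the per-source records for one person into a single record.

    For every attribute, the first source in the precedence list that
    supplies a non-empty value wins. Sources missing from the precedence
    list are ignored in this sketch.
    """
    merged: Dict[str, str] = {}
    attributes = {attr for record in records.values() for attr in record}
    for attr in sorted(attributes):
        for source in PRECEDENCE.get(attr, DEFAULT_ORDER):
            value = records.get(source, {}).get(attr)
            if value:
                merged[attr] = value
                break
    return merged


records = {
    "registrar": {"displayName": "Pat Lee", "mail": "pat@student.example"},
    "hr": {
        "displayName": "Patricia Lee",
        "mail": "pat@staff.example",
        "telephoneNumber": "555-0100",
    },
}
merged = merge_person(records)
```

Here the name comes from the student source and the e-mail address and phone number from the staff source, which mirrors the mixed outcome described in the text. A real implementation would live in database logic so that the merge can be committed or rolled back as a unit.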

The Registry gets information from the source systems via a periodic process involving XML formatted data. Because each of the source systems runs on a different platform, and each has a schema with slight variations from the others, the level of abstraction that XML provides is very useful. The Registry can then use the business rules it has defined to judge which source is ultimately more important or more current and whether values from multiple sources can coexist.
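To illustrate how XML can hide schema differences between sources, the following sketch maps two differently named feeds onto common attribute names. The XML layout, element names, and mapping table are all hypothetical; the text does not describe the actual feed format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical mapping from each source's element names to common names.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "registrar": {"StudentName": "displayName", "Email": "mail"},
    "hr": {"FullName": "displayName", "WorkEmail": "mail"},
}


def parse_feed(source: str, xml_text: str) -> List[Dict[str, str]]:
    """Turn one source's XML feed into records with common attribute names."""
    mapping = FIELD_MAP[source]
    root = ET.fromstring(xml_text)
    people: List[Dict[str, str]] = []
    for person in root.findall("person"):
        record = {"id": person.get("id", "")}
        for child in person:
            common_name = mapping.get(child.tag)
            # Elements the mapping does not know about are skipped.
            if common_name and child.text:
                record[common_name] = child.text.strip()
        people.append(record)
    return people


hr_feed = """
<feed>
  <person id="42">
    <FullName>Patricia Lee</FullName>
    <WorkEmail>pat@staff.example</WorkEmail>
    <CostCenter>1234</CostCenter>
  </person>
</feed>
"""
hr_people = parse_feed("hr", hr_feed)
```

Once every feed is reduced to the same attribute names, the business rules can compare values from different sources without knowing which platform produced them.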

The Stanford Registry is a copy of the authoritative data, and not a referral that points back to the source database or directory. This means that any subsequent use of this data must be read-only; otherwise, the authority of the source systems is put in jeopardy. Modifications should therefore be redirected to the authoritative source. However, for a subset of the data from the various source systems, specifically the person-related data set, the Registry is a co-owner of the authoritative data. Changes made to this subset of the data in the Registry propagate back to the source systems, just as all changes propagate to the Registry from the source systems. In other words, person data is replicated both ways.

In addition to information replicated from source systems, the Registry hosts a few other central information repositories. The Organization Registry holds an authoritative table of all the officially recognized departments, schools, and organizations associated with Stanford University. This organization data helps to provide unambiguous name resolution for applications that must differentiate between possibly ambiguous department names. For example, one application might call a department the business school, while another calls it the Graduate School of Business, while still another calls it the GSB. In addition to providing clear names, this data set also authoritatively establishes the hierarchical relationship between each department.
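The name resolution and hierarchy roles of the Organization Registry can be sketched with two small tables. The identifiers, the alias list, and the two-level hierarchy below are assumptions chosen to match the example in the text, not the Registry's real schema.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical organization table: id -> (official name, parent id).
ORGS: Dict[str, Tuple[str, Optional[str]]] = {
    "SU": ("Stanford University", None),
    "GSB": ("Graduate School of Business", "SU"),
}

# Hypothetical alias table: lowercased informal name -> organization id.
ALIASES: Dict[str, str] = {
    "business school": "GSB",
    "graduate school of business": "GSB",
    "gsb": "GSB",
}


def resolve(name: str) -> Optional[str]:
    """Return the organization id for an informal name, or None if unknown."""
    return ALIASES.get(name.strip().lower())


def path_to_root(org_id: str) -> List[str]:
    """Return official names from the organization up to the top level."""
    path: List[str] = []
    current: Optional[str] = org_id
    while current is not None:
        official_name, parent = ORGS[current]
        path.append(official_name)
        current = parent
    return path
```

With these tables, `resolve("Business School")` and `resolve("GSB")` return the same identifier, and `path_to_root` on that identifier walks up to the university, so every application sees one name and one position in the hierarchy.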

The Workgroup Registry provides a central place to define groups of people, such that the group definition can be reused for multiple services. This is similar to how groups are used in network operating systems like Windows, but it is platform independent so a group definition can be made once and be used by many services uniformly. Both departments and individual users can define groups for their own use.

The Authority Registry is still in development, but its intent is to provide a central definition of who holds authority for specific responsibilities and administrative tasks. It will tie into the Organization Registry and will be used by network services to define roles and delegate administration. The Organization, Workgroup, and Authority Registries are especially important because the university employs a noncentralized computing administration model. These repositories help to unify the distributed services that have been deployed by centrally defining groups and roles, which makes administration and interaction easier.

The Registry must provide privacy controls for information. As mandated by the federal law known as the Family Educational Rights and Privacy Act (FERPA), Stanford is liable for the privacy of student personal data. The university must honor a student's request to protect personal information. The Stanford Registry therefore has privacy settings for applicable data. Access controls are set on personal data attributes to protect the privacy of this data. All subsequent reuse of the data must also employ the same or a stricter level of privacy control.

Privacy Controls

The Registry provides privacy control in an interesting fashion that is different from traditional access control list (ACL) methods. All users (student or otherwise) can choose one of three privacy settings for each piece of information about themselves. These settings are World, Stanford, and Self. A World setting means that the information can be accessed by anyone. A Stanford setting means that the information can be accessed only by people who are members of the Stanford community. A Self setting means that the information is completely private, and only the person can access it. Of course, Stanford business processes and Stanford administrators must access data regardless of these settings to provide basic Stanford services. But these privacy settings ensure that general directory searches respect the rights of the person.

Each privacy setting is placed in a special visibility attribute that is informally associated with the attribute it is intended to protect. For example, the suVisibEmail attribute holds the privacy setting that corresponds to the mail attribute for each person entry. Almost every attribute that holds personal information has a corresponding visibility attribute. Even the person's name can be protected. Some attributes are grouped together in logical sets. For example, the suVisibAffiliation attribute protects the affiliation, o, and ou attributes. Another set covers all the personal attributes to simplify situations in which someone wants to treat all the information in the same manner.

These visibility attributes are then used as an authorization factor to determine whether any particular person has authority to access the informally linked attribute(s). Netscape Directory Server supports access control information (ACI) statements that provide this interesting authorization factor functionality. These statements can be associated with any container in the directory; but in Stanford's case, they are set at the root of the directory. The ACI statement allows a content-based access control to be implemented. In other words, the ACI statement specifies that the value of a special attribute of the requestor's binding entry must match a special attribute value of the targeted entry.

For example, imagine that I specify that my e-mail address has a privacy setting of Stanford (suVisibEmail=Stanford). Users who want to access the mail attribute of my entry must have a suPrivilegeGroup attribute on their entry with a value of Stanford to indicate that they are authorized to view my e-mail address. Otherwise, they will not get access. This functionality can be duplicated via traditional ACLs, but ACI statements allow for a much more dynamic application of access control than traditional ACLs do. Stanford's experience with the Netscape Directory Server product has been that the overhead involved with managing and processing attribute-level ACLs is greater than using ACI statements. For contrast, I will show how a comparable visibility is implemented in a traditional ACL model shortly when I turn to the Stanford Windows Infrastructure and Microsoft's Active Directory product.
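The matching rule in this example can be expressed as a small check. The sketch below models the logic only and is not Netscape ACI syntax. The attribute names suVisibEmail, suVisibAffiliation, and suPrivilegeGroup come from the text; the dictionary layout of an entry, the default of Self for unset or unlisted attributes, and the treatment of World as open to any requester are assumptions.

```python
from typing import Any, Dict, List

Entry = Dict[str, Any]

# Visibility attribute -> attributes it protects (the two sets named above).
PROTECTS: Dict[str, List[str]] = {
    "suVisibEmail": ["mail"],
    "suVisibAffiliation": ["affiliation", "o", "ou"],
}


def visibility_of(target: Entry, attribute: str) -> str:
    """Return the privacy setting that governs one attribute of an entry."""
    for vis_attr, protected in PROTECTS.items():
        if attribute in protected:
            # Assumption: an unset visibility attribute means Self.
            return target.get(vis_attr, "Self")
    # Assumption: attributes with no visibility attribute stay private.
    return "Self"


def can_read(requester: Entry, target: Entry, attribute: str) -> bool:
    """Decide whether the requester may read one attribute of the target."""
    requester_dn = requester.get("dn")
    if requester_dn is not None and requester_dn == target.get("dn"):
        return True  # people can always read their own entry
    setting = visibility_of(target, attribute)
    if setting == "World":
        return True
    if setting == "Self":
        return False
    # Content-based match: the target's setting must appear among the
    # values of the requester's suPrivilegeGroup attribute.
    return setting in requester.get("suPrivilegeGroup", [])
```

The decision depends only on values stored in the two entries, which is why a single rule at the root of the directory can cover every person entry without per-entry ACLs.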

Once all the data has been unified into the Registry, it is published in an LDAP directory, called the Stanford Directory, for subsequent use by services and applications. The method of moving the data from the Registry to the LDAP directory is a custom-designed process that is very interesting.

Directory Harvester

The directory harvester moves information from the Registry to the master directory server for the Stanford Directory. The directory harvester moves information in close to real time: as an update is made in the Registry, it is also reflected in the Directory. This functionality is enabled with the help of a special event database, which provides notification to the harvester of each change to the registry. The directory harvester is interested in only a subset of the information in the Registry. For example, it is not interested in the organization information, but it is interested in the people information. Stanford has more than one harvester, but the directory harvester is the most critical. It is unique among all the other harvesters: the directory harvester is the only one that retrieves information from the Registry for publication. All the other harvesters retrieve information from the Stanford Directory. These other harvesters tend to feed applications that require their own copy of the information, and can't look up the data via LDAP.

Event Database

The event database provides a way to track each change to an entry in a fairly simple manner. Each change results in an event posted to the Events database. The harvester keeps track of the last event ID it knows about and periodically checks the Events database for new events. So when a new event is posted, the harvester knows about it. The harvester queries the entry noted in the event and creates/deletes/modifies the corresponding directory entry. Events are triggered by each source system, but how each system accomplishes this event posting process differs between systems. For example, one source system parses an audit log of entry modifications every five minutes and creates events based on this information.
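The polling loop described above can be sketched as follows. The in-memory dictionaries stand in for the Registry and the directory, and the event tuple layout is hypothetical; only the idea of remembering the last event ID and re-reading the changed entry comes from the text.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (event ID, key of the entry that changed)


class Harvester:
    """Applies Registry changes to a directory, one event at a time."""

    def __init__(self, registry: Dict[str, dict], directory: Dict[str, dict]):
        self.registry = registry
        self.directory = directory
        self.last_event_id = 0  # highest event ID already processed

    def poll(self, events: List[Event]) -> int:
        """Process events newer than last_event_id; return how many ran."""
        applied = 0
        for event_id, key in sorted(events):
            if event_id <= self.last_event_id:
                continue  # already handled on an earlier poll
            entry = self.registry.get(key)
            if entry is None:
                # The entry no longer exists in the Registry: delete it.
                self.directory.pop(key, None)
            else:
                # Create or overwrite the directory entry with current data.
                self.directory[key] = dict(entry)
            self.last_event_id = event_id
            applied += 1
        return applied


registry = {"alice": {"mail": "alice@example.edu"}}
directory = {"bob": {"mail": "bob@example.edu"}}
harvester = Harvester(registry, directory)
events = [(1, "alice"), (2, "bob")]
first_run = harvester.poll(events)
second_run = harvester.poll(events)
```

Because the event carries only a pointer to the entry, the harvester always publishes the entry's current state, and replaying old events is harmless.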

Stanford Directory

The Stanford Directory is currently run on the Netscape Directory Server product. A single-master replication model is employed, and this single master replicates the entire directory to two sets of directory servers. The first set of directory servers primarily provides mailbox resolution for the campus e-mail services. The second set of directory servers primarily provides a general white page service via a custom-designed Web interface. Each set provides a failover backup for the other set, but the split helps to isolate service-intensive load to specific servers so users of one service aren't arbitrarily impacted by other services. Incidentally, in the short term, Stanford is actively migrating off Netscape Directory Server onto OpenLDAP. In the longer term, Stanford will closely evaluate each of the products to see which best meets its business requirements.

E-mail Service Integration

Stanford primarily runs a sendmail-based e-mail service in addition to other mail offerings. The sendmail service is integrated to perform its lookup and routing of user SMTP information against the LDAP directory. Usually this information is stored on each individual sendmail server in the form of a database mapping or flat file; but when there are multiple sendmail servers involved, the process of keeping these local mapping files synchronized while also up-to-date can be difficult. Information about how you might integrate your sendmail service with an LDAP directory can be found at http://www.iconimaging.net/~jradford/sendmail/sendmail-ldap.html. Jason Christopher Radford has provided these helpful online tips.

Web UI Integration

Currently at Stanford, directory searches are provided exclusively through a Web interface. In the future, LDAP protocol-based clients may be allowed access. The Web interface, called Stanford.Who, is quite friendly. A Web-based form is provided, and the user can search based on name. You can also designate a person's affiliation (student, staff, faculty) to help refine the name search. Alternatively, you can search based on e-mail address, campus phone number, or Stanford's network ID called the SUNet ID. Results include only the personal information that is publicly accessible. A special Web authentication system tied to the SUNet ID enforces the privacy access controls.

Updating Your Personal Information

In general, users can update their personal information via a Web interface called Stanford.You. This interface provides a portal through which users interact with the Registry (which co-owns their authoritative person data), without needing to know any specifics about the source system or the Registry and the software it runs on. Users can view their personal information and modify it as needed. Additionally, users can choose privacy settings in this interface. This is a good example of the loose directory interconnection approach noted in Chapter 5.

Active Directory Harvester

The Active Directory of the Stanford Windows Infrastructure is a subscriber to the Stanford Directory via its own event harvester, as shown in Figure C-2. Stanford chose to harvest a minimum of person-related information to AD, so only name, the primary department affiliation, authorization group information (suPrivilegeGroup), and privacy settings were harvested. The primary department affiliation is used to determine where in the root domain of AD the user's account should reside. A hierarchy of organizational units that mimic the department hierarchy relationship at the university exists in the root domain for the accounts to be created within. A person's primary department affiliation determines the location of the account in this OU hierarchy. As a result, account administration can be easily delegated to the decentralized departmental Windows administrators across campus. The harvester is capable of moving accounts between departmental OUs when the primary departmental affiliation changes.
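The placement rule can be sketched as a function that turns a department path into a distinguished name. The base DN, the Accounts OU position, and the department names below are hypothetical, and the escaping is deliberately simplified (it ignores leading and trailing spaces and a leading `#`).

```python
from typing import List

# Characters that need a backslash inside a DN attribute value.
SPECIAL = ',+"\\<>;='


def escape_rdn_value(value: str) -> str:
    """Backslash-escape DN special characters (simplified handling)."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


def account_dn(
    common_name: str,
    dept_path: List[str],
    base_dn: str = "DC=win,DC=example,DC=edu",  # hypothetical root domain
) -> str:
    """Build an account DN; dept_path runs from top-level school to leaf."""
    # DNs list the most specific component first, so reverse the path.
    ous = ["OU=" + escape_rdn_value(name) for name in reversed(dept_path)]
    parts = ["CN=" + escape_rdn_value(common_name), *ous, "OU=Accounts", base_dn]
    return ",".join(parts)


def needs_move(current_dn: str, common_name: str, dept_path: List[str]) -> bool:
    """True when the account sits somewhere other than its department OU."""
    return current_dn.lower() != account_dn(common_name, dept_path).lower()


old_dn = account_dn("Pat Lee", ["Graduate School of Business", "Finance"])
```

When the primary affiliation changes, comparing the current DN with the DN computed from the new department path tells the harvester whether to move the account.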

Figure C-2. Active Directory harvester

As shown in Figure C-2, the password information for a person's account is also written to AD. This is done via a separate process from the harvester, and tight security restrictions are placed on this data. The AD employs a Kerberos realm trust, which along with using the altSecurityIdentities attribute, allows the existing MIT-style Kerberos 5 realm to authenticate all Kerberos ticket granting ticket (TGT) requests from Windows clients. The corresponding Windows account just functions as a shadow proxy account containing the proprietary Microsoft information. The passwords are written to AD to ensure that down-level clients that don't support Kerberos authentication can participate. At a later time when these down-level clients are no longer supported, this password synchronization will be discontinued.

Privacy Control in AD

Active Directory doesn't provide many authorization factors. For example, the ACI statement functionality discussed earlier isn't supported. Active Directory, however, does support inherited ACLs. When a person's entry is created by the AD harvester, it is placed somewhere beneath an Accounts OU. This OU has an inherited ACL that allows only the owner of that entry access to the entry. Inherited ACLs are statically applied in AD, so at the time of creation the setting is copied to the entry. This establishes the minimum level of access that all entries share.

A special Windows-based service using LDAP code helps establish the more open access settings that people may have chosen. Active Directory supports an LDAP change notification control, similar in purpose to the persistent search control, which enables this service to know whenever an entry has been modified. The service then checks the entry for two things, and takes action as needed. First, it creates membership in groups that match the values of the suPrivilegeGroup attribute of the entry. Thus a World group and a Stanford group are dynamically maintained by this service, with all the appropriate entries as members. In actuality, there are far more groups dynamically created and maintained, and these groups correspond to the Workgroup Registry functionality described earlier. But for the purposes of privacy control, focus on just the two groups. Second, the service reads the privacy attributes set on the entry. The service compares the value of each of these attributes to the ACL it finds on the entry. If one of the informally linked attributes needs to have more access given (or access taken away), the service has the authority to add or remove an ACE on that entry's ACL. And of course, it uses the groups it is dynamically maintaining. This approach works quite well. If the special Windows service fails, no data is put at risk, because the default setting is more restrictive than the actual privacy desired.
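The second task of the service, comparing privacy attributes against the entry's ACL, can be sketched as a set difference. This models the logic only and does not touch real Windows security descriptors. An ACE is represented here as a hypothetical (group, attribute) pair that grants read access, and unset visibility attributes are assumed to mean Self.

```python
from typing import Any, Dict, List, Set, Tuple

Entry = Dict[str, Any]
Ace = Tuple[str, str]  # (group granted read access, attribute name)

# Visibility attribute -> attributes it protects (the two sets named earlier).
PROTECTS: Dict[str, List[str]] = {
    "suVisibEmail": ["mail"],
    "suVisibAffiliation": ["affiliation", "o", "ou"],
}


def desired_aces(entry: Entry) -> Set[Ace]:
    """Read grants implied by the entry's visibility settings."""
    aces: Set[Ace] = set()
    for vis_attr, protected in PROTECTS.items():
        setting = entry.get(vis_attr, "Self")
        # Self needs no ACE: the inherited owner-only ACL already covers it.
        if setting in ("World", "Stanford"):
            for attribute in protected:
                aces.add((setting, attribute))
    return aces


def reconcile(entry: Entry, current: Set[Ace]) -> Tuple[Set[Ace], Set[Ace]]:
    """Return (ACEs to add, ACEs to remove) for one modified entry."""
    wanted = desired_aces(entry)
    return wanted - current, current - wanted


entry = {"suVisibEmail": "Stanford", "suVisibAffiliation": "World"}
to_add, to_remove = reconcile(entry, {("World", "mail")})
```

If the service never runs, `current` simply stays empty apart from the inherited owner-only access, which is the fail-safe behavior the text describes.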

Summary

As has been demonstrated already, a great number of applications and services participate in the overall directory architecture. I've purposely simplified the number of interactions that actually happen, so the general architectural concepts can be shown in a specific real-world environment. I cannot describe fully the schema definitions, data architecture, and directory functionality in the Stanford architecture. Hopefully this snapshot will be useful in illustrating how integration can be accomplished in a real-world setting. I appreciate the opportunity Stanford has allowed me to take in describing its environment.
