Baker Tech

This large university had numerous mail servers across the campus and wanted to consolidate its infrastructure. The university had some experience with directory technology (LDAP), but no single campus-wide directory yet. A single authentication system was being developed around Kerberos. Web mail availability varied from one campus mail system to another, and some systems offered none at all. The university wanted a central mail system with web mail that had failover and used directory technology for user information, but could use the existing Kerberos servers for authentication. Good Sun and Solaris expertise existed in the IT department as well as throughout the campus, but there was little or no experience with clustering technology. The customer's existing EMC Symmetrix storage system had to be supported. The customer would use existing SNMP tools to monitor the messaging system.

Baker Tech has:

  • 40,000 students

  • 10,000 faculty, staff, and other employees

A pair of Sun Enterprise 4500 servers with eight CPUs and eight gigabytes of memory was configured as the main mailstores, and a pair of Sun Enterprise 280R servers was used for MTA and virus scanning. The architecture was designed to leverage about 1.2 terabytes of the customer's existing EMC storage subsystem and to use Sun Cluster 3.0 software for high availability (clustering or failover). Unfortunately, the customer still did not have a centralized enterprise directory, although pockets of directory services (islands) existed on campus. Additional Netra servers were added to one of these existing directory installations to support the messaging server's LDAP workload. An open source plug-in for the Sun ONE Directory was used to provide Kerberos authentication out the back end of the directory. Figure A-2 shows the Baker Tech architecture configuration.

Figure A-2. Baker Tech Architecture Diagram


User information was already partially available, so the messaging server objects needed to be added to the directory and applied to the users.

They decided to add all new accounts to this system beginning with the next semester after going live, while allowing all other users the option to migrate. This policy would be revisited each year. Backups were integrated with their existing data center backup infrastructure using Legato Backup and a tape library. They elected to do the majority of the implementation themselves due to their experience level with Sun and Solaris, even though they had no experience with the messaging product. This implementation method was not recommended by Sun.

Timeline

The overall project took eight months from start to finish, while the initial plan called for an aggressive three-month window. Several factors that contributed to the project delays are outlined in the “Lessons Learned” section. The purchase, from initial contact to placement of the order, took approximately eight weeks even though the customer's internal project plan was designed around a two-week purchase cycle. The main delay was due to issues within the purchasing department and the requirements of its procedures and processes. Equipment was delivered to the customer in three weeks once the purchasing issues were resolved.

The initial equipment installation and Solaris setup took approximately a week because the customer had significant Solaris and Sun experience. Then, the installation of the Sun Cluster 3.0 software was started. However, work that should have taken approximately two weeks took almost six weeks due to integration issues with the EMC Symmetrix storage unit. EMC recommended incorrect adapter cards for the Sun system and incorrect drivers, and the customer purchased them. After all 10 interface cards were swapped out and the latest EMC driver for the cards was installed, the EMC storage could be attached and failed over without issues. In total, just getting the basic hardware, operating system, and cluster software working took 18 weeks: eight for purchasing, three for delivery, one for installation, and six for the cluster and storage integration.

Once these initial obstacles and delays were overcome, the actual implementation of the messaging software took approximately two weeks. Load testing, backup restoration testing, and additional testing of the failover process took another three months. This process was started in May and targeted January of the following year, but this schedule was not met and production was delayed until Spring Break of the following year.
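The phase durations quoted in this timeline can be tallied to check the 18-week and eight-month figures. The following is a minimal sketch, not part of the original project plan: it assumes the phases ran strictly one after another and that "three months" of testing is roughly 13 weeks. The phase labels and variable names are illustrative only.

```python
# Phase durations in weeks, taken from the timeline narrative.
# Assumption: phases ran back to back with no overlap.
phases = [
    ("purchase cycle", 8),
    ("equipment delivery", 3),
    ("installation and Solaris setup", 1),
    ("Sun Cluster and EMC storage integration", 6),
    ("messaging software implementation", 2),
    ("load, restore, and failover testing", 13),  # assumption: three months ~ 13 weeks
]


def cumulative(phase_list):
    """Return (name, weeks, running_total) for each phase in order."""
    total = 0
    rows = []
    for name, weeks in phase_list:
        total += weeks
        rows.append((name, weeks, total))
    return rows


rows = cumulative(phases)
for name, weeks, running in rows:
    print(f"{name:<42}{weeks:>3} wk   running total {running:>3} wk")

# The first four phases cover hardware, operating system, and cluster software.
platform_weeks = rows[3][2]
total_weeks = rows[-1][2]

WEEKS_PER_MONTH = 52 / 12  # average month length in weeks
total_months = total_weeks / WEEKS_PER_MONTH

print(f"Platform ready after {platform_weeks} weeks")
print(f"Whole project: {total_weeks} weeks, about {total_months:.1f} months")
```

Under these assumptions the platform phases sum to 18 weeks and the six phases together come to 33 weeks, which rounds to the eight months reported for the overall project.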

Lessons Learned

The following lessons were learned in this case study:

  • SNMP monitoring is lost in a failover condition.

    During failover testing with the messaging product, once failover itself was working, SNMP visibility was lost whenever the cluster was in a failover condition, with both messaging instances operating on the same host. This was not a major issue for the customer because it occurred only in a failover condition, and the loss of SNMP monitoring would itself reinforce the fact that the systems required attention. This may or may not be acceptable for all customers.

  • Instrumenting and monitoring is key.

    During the initial failover and load testing, no monitoring was enabled and many statistics were not being collected. Later decisions about tuning specific parameters were difficult due to the lack of data. This meant that some load tests had to be rerun once monitoring was enabled.

  • Allow additional time for third-party storage.

    Because of the difficulties encountered, additional time should be added to the project schedule whenever third-party hardware or software is involved. The amount can vary widely based upon the product and the relationships involved.

  • For complex installations, Sun Professional Services can make a difference.

    While the installation issues were being worked through, the use of Sun Professional Services was raised again and recommended to the customer. Some of the issues the customer experienced had already been encountered and addressed by Sun Professional Services. Many of the issues that caused significant delays would have been addressed quickly and would not have caused the project schedule to slip.
