Server Technologies

All of the mySAP.com application components (SAP R/3, SAP BW, SAP APO, etc.) must run on some server hardware platform. This section covers which server hardware platforms are supported, what the servers are made of, and which types of processors, caches, system buses, memory, and I/O architectures apply to SAP environments. Crossing the 32-bit memory addressing barrier is covered at the end of the section.

Server (HW) Platform Vendors

There are many types of server hardware platforms supported with SAP software. These servers are differentiated by the type and number of processors supported, the system buses used to connect the processors and memory together with the I/O, as well as operating system and server administration features (see Figure 2-3).

Figure 2-3. Typical Server Systems for SAP


The manufacturers of server hardware supported with SAP software include companies like Hewlett-Packard, IBM, Compaq, Sun, Fujitsu-Siemens, Bull, Dell, and many others, including Intel as a primary technology provider. Many of these companies offer high-end servers with more than eight processors, while a few of them only sell standard Intel-based servers.

Hewlett-Packard, for example, supplies both high-end Unix-based systems as well as standard Intel-based systems. HP's 9000 servers support HP-UX on the PA-RISC processor family, both with 32-bit and 64-bit software, as well as on IA-64 processors.[*]

[*] HP's NetServers support Windows NT/2000 and Linux.

IBM offers several platforms, including standard Intel-based servers and their RS/6000® RISC Unix-based systems based on the PowerPC processor family. IBM helped port SAP R/3 to its AS/400 and S/390 mainframe platforms with the DB2 database. IBM also purchased Sequent for the high-end (beyond 8-way) Intel server platform.

Compaq offers its Digital Alpha-based servers with its 64-bit operating system, Tru64 Unix, along with its Intel-based servers.

Sun Enterprise servers support the Solaris Unix operating system for use with SAP on its SPARC processor family. Siemens AG supplied its servers with the Reliant UNIX operating system and has since teamed up with Fujitsu to offer high-end systems based on Sun's processors and the Solaris Unix OS.

Other server vendors generally supply Intel-based systems, or systems based on the technology of IBM, HP, or Sun.

Processors

The processor, or central processing unit (CPU), is one of the most important server components for use with SAP. The two main processor characteristics to consider are performance and reliability. Factors that impact the performance of a single CPU are its design (e.g., instructions per clock cycle), frequency, and cache architecture. An important factor that impacts the processor's, and potentially the system's, reliability is the ability to gracefully handle data errors (see the discussion about memory caches).

Processor Architectures

The traditional RISC (reduced instruction set computer) processor employs a superscalar design that can handle at least one instruction per clock cycle, and in many cases more than one. These processors parallelize the instruction stream on the fly. Processing instructions this efficiently requires a pipelining design: the processor examines a set of instructions, determines which parts can be done in parallel, and sends those parts into different CPU pipelines, each of which carries out the instructions for its given part. In the fully parallelized case, all of the pipelines work simultaneously, multiplying the power of the CPU by the number of pipelines. The pipeline design is also used for branch prediction: when the processor comes to a decision point, it must choose one of two paths on which to continue processing.

The number of different instructions, or the processor's instruction set, was purposely kept to a minimum, so initially only a limited set of operating systems (typically Unix) was designed to work on these RISC processors. Some of the first RISC processors included:

  • Hewlett-Packard's PA-RISC

  • Sun SPARC

  • IBM PowerPC

  • Compaq/Digital Alpha

  • SGI MIPS

Higher frequencies and more pipelines have helped to increase these RISC processors' performance levels over time. Most of these processors evolved to handle 64-bit data lengths and memory addresses and were the first processors to support 64-bit OS, DB, and SAP R/3 combinations.

Intel Processors

Even Intel's IA-32 processors eventually adopted many of the same design principles found in these RISC processors; the Intel Xeon processors are a good example. However, applications must be optimized for the newest IA-32 architecture to take advantage of the new processor design features, and SAP software, for the most part, is.

There are some differences among Intel processors regarding their support for larger caches and multiprocessor systems. Generally, Intel releases its fastest processors first for the workstation market, where the higher volumes are. These workstation processors do not support Intel's multiprocessor specification (MPS), which includes distributed I/O interrupt sharing and a few other features.

EPIC Architecture (IA-64)

The remaining problem with these RISC processor designs is that the parallelization the processor performs on the fly can still be improved upon. One way to do this is to give the software compilers the responsibility of deciding which instructions can be processed in parallel. This is the foundation of the IA-64 architecture, with the EPIC (Explicitly Parallel Instruction Computing) design, which Hewlett-Packard and Intel codeveloped (see Figure 2-4).

Figure 2-4. Processor (CPU) Architecture Development


EPIC depends heavily on CPU-aware compilers for the various operating systems and software applications. SAP must also compile its software with the EPIC-aware compilers for the various operating systems supported on the IA-64 architecture. Otherwise, SAP software for Windows NT/2000, for example, will run only in the IA-32 compatibility mode, which is slower than running natively on the latest generation of IA-32 processors.

EPIC or IA-64 processors were initially designed to run 64-bit versions of Windows 2000, Linux, and HP-UX. Support for a few other Unix flavors on IA-64 has also been announced; however, SAP's official support strategy for Unix operating systems on IA-64 has been restrictive. In addition, the initial IA-64 processors were not expected to outperform the existing RISC or superscalar 64-bit processors, or even Intel's latest generation of IA-32 processors. Thus, the biggest initial benefits of IA-64-based systems will be the increased memory addressing and the 64-bit software development environment on the Microsoft Windows 2000 and Linux operating systems.

TIP

Software for IA-64

If you deploy servers based on the IA-64 processors, make sure to use the versions of SAP software that are compiled explicitly for IA-64. Existing 32-bit software on the IA-64 platform runs in the IA-32 compatibility mode, which may be slower than running it on the latest IA-32 processors (Intel Pentium III Xeon, etc.).


Processor Performance

Single processor performance for SAP is crucial for batch processing. Sometimes, SAP batch jobs cannot be divided easily to run on multiple processors. In this case, the processing of the batch jobs can go only as fast as a single processor (CPU).

An oversimplified approach to estimating the performance of a single processor would be to choose the one with the highest frequency within a processor family. For comparable RISC architectures, the higher the frequency, the more instructions can be handled per second. This does not apply to the EPIC-based IA-64 processors, which can handle more instructions per clock cycle, nor does it apply to IA-32 processors, whose designs differ from those of pure RISC processor and operating system combinations.

A more appropriate measure of single-CPU performance can be made with the SPEC (Standard Performance Evaluation Corporation) benchmarks. For SAP, the integer-based benchmarks are most applicable. Examples of SPECint95 benchmark results for various servers available for SAP at the beginning of 2000 are shown in Table 2-2. Newer SPECint ratings are made available every few years. They cannot be compared with older versions because new tests are added to the suite and existing ones are changed (however, the relative performance ratings of each processor can still be compared).

Table 2-2. Processors' SPECint95 Performance at a Glance

Processor (Server for SAP)                                    Primary Cache        Secondary Cache   SPECint Peak Result
PA-RISC 8600 552MHz (HP 9000 Model N-4000)                    1024KB(D)/512KB(I)   -                 41.4
Alpha 21264A 700MHz (AlphaServer GS140E or GS60E)             16KB(D)/16KB(I)      8MB               39.1
Intel Pentium III 1000MHz (Intel VC820 motherboard)           64KB(D)/64KB(I)      1MB               35.6
PowerPC RS64-III 500MHz (IBM RS/6000 M80)                     128KB(D)/128KB(I)    8MB               24.1
Intel Pentium III Xeon 500MHz (Siemens AG Primergy 670-40)    16KB(D)/16KB(I)      2MB               22.4
UltraSPARC II 400MHz (Sun Enterprise 3500)                    16KB(D)/16KB(I)      8MB               18.3

(Source: www.spec.org)

Notice from the SPECint ratings in Table 2-2 that choosing the highest frequency alone does not guarantee the best single-processor performance. The architecture of the various processors and their primary cache sizes are important factors in determining performance. For example, compare two processors with similar frequencies: Intel's Pentium III Xeon 500MHz and HP's PA-RISC 8600 552MHz. The SPECint performance difference between them is almost 2:1.

TIP

Choose the Fastest Processor for Batch Performance

To help make batch processing jobs go faster, or to help reports generate more quickly, consider only servers with the fastest processors. This is especially true for batch jobs that cannot be divided easily for parallel processing across multiple servers or processors within one server. Don't forget, batch jobs run in SAP R/3 application instances.

One way to determine the single processor speed is to use the SPECint ratings. Do not simply assume higher frequencies mean faster performance across various types of processors. Single processor performance is design and processor architecture dependent.


Careful with SPEC Ratings

It is not possible to determine a multiprocessor server's total scalability or throughput simply by multiplying the single processor performance by the total number of processors. Total system throughput must be measured differently, which is addressed in Chapter 3. More information about SPECint performance ratings can be found at www.spec.org.

Memory Architectures

Data and application code in a computer system are normally stored in main memory while being processed. A memory architecture consists of components that make up the primary memory area and caches to help accelerate processor performance and to increase system throughput.

Main Memory

Performance and reliability are two important aspects of main memory technology, even more so in mission-critical systems than in other computers. Main memory commonly uses industry-standard DRAM (dynamic random access memory) components, typically packaged on a memory module or board. The response time, the time it takes to begin retrieving a row of data from main memory, is measured in tens of nanoseconds (usually above 50ns). By comparison, the clock of a 500MHz RISC processor ticks every 2 nanoseconds.

Main memory modules are usually structured in interleaved banks to help improve the throughput of the memory subsystem. Although interleaving does not reduce the initial response time, it does improve overall system throughput, especially when multiple processors are used. It is always important to add enough memory modules to a server to configure it for the maximum memory interleaving performance possible.

The amount of memory a server can physically address is determined by the memory bus controller design, often an integrated component of the system bus chip set. There's a difference between how the data is stored and how much of that data is addressable. Typically, data is stored in 64-bit or 128-bit word lengths simply by placing a few 32-bit SIMMs or DIMMs together (this is why memory is often added in pairs or quads). However, it can be addressed only in 32-bit, 36-bit, or 64-bit chunks, depending on the underlying hardware and the OS support.

Caches

Caches are essential elements in the memory hierarchy of any computer system, providing, at some expense in total system cost, fast access to frequently used program code or user data. There can be up to three levels of cache in a system: primary (level-one or L1), secondary (level-two or L2), and level-three (L3). The total amount of cache the system needs in order to scale depends on the location of the caches and on the speed of main memory. The further away a cache is from the processor, the longer it takes to access; thus, secondary and third-level caches need to be larger to be worthwhile. A larger cache promotes higher reuse of code and data, especially on systems with homogeneous workloads (e.g., a database server). Two primary memory and cache design approaches for large-scale SMP servers stand out:

  • Large primary (L1) processor cache (in megabytes) with a fast main memory that itself acts like a cache. This helps reduce the number of cache layers, and thus delay.

  • Smaller primary (L1) processor caches along with significantly larger L2 caches and a standard main memory design. This reduces both the processor and the main memory costs, but adds the cost of the much larger L2 (or L3, if needed) caches.

Primary (L1) caches are used to speed up the processor's access to data, allowing a single processor to be more efficient. Access to the larger main memory, or even to secondary caches, is relatively slow, which means a processor may spend more time waiting for data than processing it. For this reason, data is brought closer to the processor with smaller but faster primary data and instruction caches. L1 caches are physically very close to the processor, usually on the same chip, and can therefore run at frequencies close to that of the processor, helping improve single-processor performance. A processor with a very large on-chip primary cache, although great for performance, is more expensive to produce.

For those processors with smaller primary caches, for example the Intel Xeon processors, an L2 cache for each processor becomes essential to help symmetric multiprocessor (SMP) systems scale. Standard SMP systems typically have only a limited number of buses for all processors to access the shared main memory. This shared memory bus (or system bus) is a primary performance limiter in most SMP servers. Any reduction in system usage of the bus allows more application work to proceed, as opposed to the overhead of repeatedly refreshing program code or data from main memory into the L2 cache. L2 cache hits reduce accesses across the shared memory bus, reducing system bus traffic and improving system throughput scaling. Level-two cache sizes for servers range anywhere from 512KB to 8MB per processor.
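To see why a larger L2 cache pays off, a rough average memory access time (AMAT) model helps: AMAT = L1 hit time + L1 miss rate x (L2 hit time + L2 miss rate x main memory time). The following sketch uses purely illustrative numbers; the latencies and miss rates are assumptions, not measurements from any of the systems discussed here.

    /* Back-of-the-envelope AMAT model. All numbers are illustrative
       assumptions, not measured values. */
    #include <stdio.h>

    int main(void)
    {
        double l1_hit = 1.0;      /* ns, on-chip primary cache */
        double l2_hit = 10.0;     /* ns, secondary cache */
        double mem    = 60.0;     /* ns, main memory over the shared bus */

        double l1_miss = 0.05;    /* 5% of accesses miss L1 */

        /* Assume doubling L2 from 1MB to 2MB cuts its miss rate from
           20% to 15% of the L1 misses: */
        double amat_small = l1_hit + l1_miss * (l2_hit + 0.20 * mem);
        double amat_large = l1_hit + l1_miss * (l2_hit + 0.15 * mem);

        printf("AMAT, 1MB L2: %.2f ns\n", amat_small);  /* 2.10 ns */
        printf("AMAT, 2MB L2: %.2f ns\n", amat_large);  /* 1.95 ns */
        return 0;
    }

Under these assumed numbers, the larger L2 improves average access time by roughly 7%, and every avoided miss is also traffic removed from the shared system bus, which is consistent with the throughput gains discussed in the TIP below.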

L3 caches are sometimes employed when the overall available cache for the first two levels is not enough based on the processor and memory configuration or architecture. A third-level cache is usually much larger than the first two levels of caches and is often shared among all the processors. This is only needed when the main memory design is relatively slow.

TIP

Cache Size Choices

Sometimes a server can be configured with processors using different L2 cache sizes. For example, the Intel Xeon processors can be chosen with 512KB, 1MB, or 2MB of L2 cache. A general assumption for standard 4- or 8-way systems is a 10% overall system throughput gain (although it can vary from 8% to 15%) for every doubling of cache size, up to a certain point.[*] For example, moving from 512KB to 2MB of L2 cache is two doublings, or roughly a 20% estimated throughput gain. This needs to be factored into the sizing estimation of servers for SAP.

[*] Increases in processor frequencies often have less impact on overall system throughput than memory cache and system bus designs.


Cache and Memory Reliability

Memory bit errors can be introduced into the system from external sources (e.g., electrostatic discharge; ESD) or by material defects in the memory itself. Techniques exist to detect and correct single-bit memory errors; this is called ECC, or Error Checking and Correcting, memory. Extra parity bits are used to check reads and writes to memory and to correct single-bit errors if needed. If an error is corrected on the fly, the system can continue processing. Double-bit errors can be detected, but there is no easy way to correct them; when one is detected, a system or component halt is usually performed before any further processing occurs. This helps protect data integrity but represents downtime for the SAP system.
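To make the principle concrete, the following sketch shows single-error-correcting, double-error-detecting (SECDED) logic on a toy Hamming(7,4) code with one extra overall parity bit. This is illustrative only; real ECC memory implements a wider code (commonly 8 check bits protecting a 64-bit word) entirely in hardware.

    /* SECDED sketch on a Hamming(7,4) code plus one overall parity bit. */
    #include <stdio.h>

    static unsigned parity(unsigned x)          /* XOR of all bits */
    {
        unsigned p = 0;
        while (x) { p ^= x & 1; x >>= 1; }
        return p;
    }

    /* Encode 4 data bits into 7 Hamming bits plus an overall parity bit. */
    static unsigned encode(unsigned d)
    {
        unsigned d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
        unsigned p1 = d1 ^ d2 ^ d4;             /* covers positions 1,3,5,7 */
        unsigned p2 = d1 ^ d3 ^ d4;             /* covers positions 2,3,6,7 */
        unsigned p3 = d2 ^ d3 ^ d4;             /* covers positions 4,5,6,7 */
        unsigned cw = p1 | (p2 << 1) | (d1 << 2) | (p3 << 3)
                    | (d2 << 4) | (d3 << 5) | (d4 << 6);
        return cw | (parity(cw) << 7);          /* extra bit enables DED */
    }

    /* Decode: corrects a single-bit error, flags a double-bit error. */
    static int decode(unsigned cw, unsigned *d)
    {
        unsigned s1 = parity(cw & 0x55);        /* positions 1,3,5,7 */
        unsigned s2 = parity(cw & 0x66);        /* positions 2,3,6,7 */
        unsigned s3 = parity(cw & 0x78);        /* positions 4,5,6,7 */
        unsigned syndrome = s1 | (s2 << 1) | (s3 << 2); /* error position */
        unsigned overall  = parity(cw & 0xFF);

        if (syndrome && overall)                /* single-bit error: fix it */
            cw ^= 1u << (syndrome - 1);
        else if (syndrome && !overall)          /* double-bit error */
            return -1;                          /* real systems halt here */
        *d = ((cw >> 2) & 1) | (((cw >> 4) & 1) << 1)
           | (((cw >> 5) & 1) << 2) | (((cw >> 6) & 1) << 3);
        return 0;
    }

    int main(void)
    {
        unsigned cw = encode(0xB);              /* store data 1011 */
        unsigned d;
        cw ^= 1u << 4;                          /* flip one bit in "memory" */
        if (decode(cw, &d) == 0)
            printf("corrected data: 0x%X\n", d); /* prints 0xB */
        return 0;
    }

The decode path mirrors the behavior described above: a nonzero syndrome with odd overall parity is corrected on the fly, while a nonzero syndrome with even overall parity indicates an uncorrectable double-bit error.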

The caches used in the memory and processor subsystems may be faster and smaller, but they are also prone to the same errors as the main memory banks. The ECC technique must be applied to the memory caches in the system as well, not just to main memory.

TIP

ECC Cache Protection and Dynamic Processor Deallocation

ECC is commonly employed in the main memory of most servers today. However, ECC memory protection is not always available in all of the caches used in a system. If a processor's primary or secondary level cache is not able to correct single-bit errors, then the system's overall availability will be reduced. Use only systems that have ECC protection for all of the memory components, including the processor caches.[**]

[**] In addition, choose systems that can dynamically deallocate (without impacting the SAP application) processors and memory when there are high rates of single-bit errors, helping increase system availability.


System Bus Architectures

The connection between the processors, memory, and I/O subsystems is the most critical design element in a server that determines its overall scalability or aggregate performance of all the processors in one system. This connection is known as the system bus architecture. In a standard PC, there's usually only one processor, one memory subsystem, and one I/O subsystem, making the architecture straightforward. Enterprise computing requirements, however, demand that servers support many more processors to deliver the performance that business applications need. Architectures used to implement multiprocessing include SMP, MPP, and ccNUMA. The SAP and TPC performance benchmarks of the whole system are covered in Chapter 3.

SMP Systems

In a Symmetrical Multi-Processing (SMP) system, there's only one global, unified memory subsystem, but many processors all trying to access that memory as if it were their own. SMP is typically a bus architecture in which processors access main memory across a shared bus; this is sometimes considered a shared-everything design approach. Only one processor may control the bus at a time. To keep the contents of the memory caches coherent among multiple processors, special cache-coherency algorithms are implemented in hardware, which snoop the traffic going through the system bus chip set to enforce coherency (see Figure 2-5).

Figure 2-5. Typical SMP Architecture


The unified, global memory allows for an easier programming model, which is why most standard business applications support SMP-based systems. SAP's software is supported on servers with SMP system bus designs. For Intel multiprocessor systems, the standard hardware implementation for SMP is referred to as the Intel Multiprocessor Specification, or MPS. Standard Intel-based operating systems, such as Microsoft Windows NT/2000 or Linux, simply provide a hardware driver for the Intel MPS standard and then they are able to take advantage of the multiprocessor environment. For non-Intel systems, each server vendor typically provides their own OS versions and drivers to support the SMP servers. One of the main tasks an OS has for supporting multiprocessor servers is to distribute the I/O (interrupts) and application processing requests among the available processors in a balanced manner.

Standard SMP Chip Sets

The most common and economical SMP system bus chip sets available for servers are for 2- and 4-way processor configurations. Many of these chip sets are designed and produced by the same vendor who makes the processors, but some third-party companies also play a role here (for example, ServerWorks). Intel, for example, sells both the MPS-compatible IA-64 Itanium processors and the 4-way MPS chip set (Intel 460GX) that goes along with them.

The standard 2-way system bus chip sets are typically first designed for advanced workstations and then leveraged for the low-end server market. The most common scaling for SMP systems, where multiple processors share one memory bus, seems to be with 4-way systems (6-way chip sets are also becoming more common for Intel-based servers).

Crossbar Technologies

To help address the SMP scaling issues beyond the 4- or 6-way processors per memory bus range, special crossbar or multi-memory bus technologies are used. If a crossbar or equivalent technology were not used for large SMP systems, adding more processors would more quickly reach a point of diminishing returns.

Simple crossbar architectures for 8-way systems would have two buses accessing memory, each with four processors, along with a minimum of 4-way memory interleaving. A more complex example would be 32 processors with a 32-way interleaved memory architecture, where each processor has access to any of the 32 memory banks, similar to a hardwired cross-connect switch. Because crossbar technology is an important element in improving main memory throughput as well as overall system scalability, most high-end commercial enterprise systems differentiate themselves based on their crossbar system bus design. Examples include Sun's UE10000, IBM's RS/6000 S80, and HP's SuperDome, which uses a combination of four-processor cells, or building blocks, connected with crossbar technology. The latest generation of high-end systems can achieve over 10GB/s of system bus bandwidth to support the processor scaling requirements.

TIP

SMP Systems—The Standard Server Choice for SAP

SMP servers are the most cost-effective systems for SAP application servers, since SMP technology has essentially become standardized. SAP R/3 database servers are also a good match for high-end SMP systems; SAP BW database servers, however, may benefit from other designs.


Parallel Systems

Massively Parallel Processor (MPP) systems, on the other hand, do not have a shared memory design. Instead, each processor or processor set has its own main memory and its own direct access to I/O. While this is great for parallel processing and nearly unlimited performance scaling, it does not generally apply to SAP systems because standard operating systems and database applications don't support parallel, independent memory designs. Special software must be written or compiled for MPP environments. Typical examples include scientific applications or some web servers that can process information independently.

The one exception that applies to parallel systems is SAP's support of IBM's mainframe Parallel Sysplex, based on the S/390 systems. It is not used to run the SAP application code, because scalable application servers are available for that. It is used instead as a dedicated database server to make the DB2 database scale across multiple mainframe nodes. However, running a parallel database requires more administrative effort and is subject to more potential downtime unless it is also used in an HA configuration. This adds costs to a configuration that doesn't really outperform high-end SMP single-box systems.

ccNUMA

The art of system bus design is to provide enough memory bus bandwidth in the system so as to allow each additional processor to function at its maximum capacity but still be within economical limits. As processor performance continually improves, however, the cache-coherency requirement and access-contention to a shared memory subsystem in SMP servers are limiting factors in their scaling (although crossbar technologies have improved SMP system scaling).

Because SMP systems do have scaling limits, and because parallel systems are not an easy fit for standard business applications, a third alternative has emerged as the next generation system interconnect: ccNUMA. ccNUMA stands for cache-coherent non-uniform memory access and promises to scale a single system higher than SMP systems can. This additional scalability is important in the SAP system landscape for the critical performance bottlenecks—the database servers. Early benchmark results show that data warehousing systems, such as SAP BW with an optimized database, can benefit most readily from the ccNUMA system bus architecture. The price/performance benefits for SAP R/3 database servers, however, have not yet been as clearly demonstrated. SAP application servers, on the other hand, do not need more scalability in one server box because the client/server architecture already provides for this.

Figure 2-6 provides a generic comparison of the overall system-throughput capabilities as additional processors are added to the various architectures.

Figure 2-6. Throughput Scaling Compared


The important design goals of ccNUMA included achieving high scalability of I/O, CPU, and memory bus bandwidth while preserving the standard SMP programming model. Additional goals included low memory access latency (both local and remote) while leveraging commercial, readily available processor designs.

A ccNUMA system is implemented with multiple building blocks (cells), each typically made of four processors, one I/O subsystem, and one shared memory subsystem. These building blocks are connected using SCI, the Scalable Coherent Interface IEEE standard (see Figure 2-7). SCI is the modern equivalent of a processor-memory-I/O bus and a LAN combined and made parallel. The result is a distributed multiprocessing system with high bandwidth, low memory access latency, and a scalable architecture that allows building large systems out of many inexpensive, mass-produced building blocks. SCI provides computer-bus-like services but, instead of a bus, uses a collection of fast point-to-point unidirectional links to provide high-end throughput.

Figure 2-7. ccNUMA System Architecture


SCI supports distributed shared memory with cache coherence for tightly coupled systems. To maintain systemwide cache-coherency, the standard SMP-based snooping algorithms are used within the processor building block. Then SCI provides a directory-based cache protocol to link the various local bus protocols among the building blocks.

Memory access latency within a core 4-processor cell is generally low (fast response times). Remote memory (outside the cell) takes longer to access but must still be accessible by any processor in the system. To get maximum scaling, the average latency (and thus the remote access latency) should be as low as possible, regardless of how high the memory throughput is.

TIP

Choosing ccNUMA Systems for SAP

For SAP to take advantage of the better scaling capabilities of ccNUMA systems, SAP software must be optimized for this architecture, especially in regard to awareness of local versus remote memory locations. SAP, however, does not generally write its software for a particular vendor's architecture. SAP application servers are not a performance bottleneck, so only the database servers will be cost-effective on ccNUMA systems. An SAP R/3 database will run on a ccNUMA system, but its OLTP data patterns do not lend themselves well to the partitioning requirements of ccNUMA. The data patterns of SAP BW and other data warehouse applications, however, can take advantage of ccNUMA more easily.

Standard database software will work on ccNUMA systems without changes but is not expected to provide significant performance benefits over high-end SMP systems.[*] Because database vendors will eventually optimize their software for ccNUMA systems, consider deploying SMP servers that are ccNUMA-ready or -capable as an investment protection measure. Early TPC-H data warehousing benchmarks have already demonstrated that optimized software configurations can add significant performance gains. Thus, ccNUMA systems can be considered for very large SAP BW projects.

[*] Or over those with cell-based SMP crossbar design.


The high-end IA-64 ccNUMA systems need to be compliant with the Intel Multiprocessor Specification (MPS) to be compatible with Microsoft Windows NT/2000 or Linux. Intel MPS requires that any I/O device is accessible from any processor building block, regardless of how it is connected. In addition, all processors must be capable of sending I/O interrupts to any other processor, including remote processors. Lastly, all system memory must be accessible to all processors and all DMA devices (typically I/O controllers).

I/O Architectures

The purpose of an I/O subsystem is to connect the peripheral devices, such as disk systems and network controllers, to the rest of the system.[**] The PCI bus has been the most common I/O bus implemented in most SMP servers. The benefit of an industry standard bus is that more peripheral devices can be supported.

[**] High I/O bandwidth is critical for larger database servers.

The PCI bus supports both 32-bit and 64-bit data widths at bus frequencies of 33MHz and 66MHz, resulting in transfer rates from 133MB/s (32 bits at 33MHz) to over 500MB/s (64 bits at 66MHz is 533MB/s). Often, up to four PCI I/O connectors are used on one PCI bus (behind one PCI bus controller). Some servers have only one PCI I/O connector behind each PCI bus controller, resulting in potentially higher system I/O throughput (no I/O interrupt sharing conflicts).

Many new server and OS system combinations also support hot-plug PCI connectors, making online replacement of failed I/O adapters possible. This requires software support to disable the driver and to remove power to the PCI connector during replacement. This feature is very important for the database server in the production environment, where unplanned downtime should be minimized at all costs. Verify that the server and OS combinations used for the production database server(s) support hot-swap PCI card replacement.

TIP

64-Bit PCI Slots and High-Speed Disk and Network I/O

Not all PCI I/O connectors in a server are always capable of handling 64-bit PCI cards. The SAP database server should have enough high-speed (64-bit, 33MHz or 66MHz) PCI slots (preferably hot-swap) available to support the number of disk system host bus adapters or high-speed network controllers needed.

Mid-range systems have at least 8 PCI I/O slots; high-end servers often have over 50. Don't forget to double the number of I/O controllers needed for redundancy or failover purposes. In addition, 64-bit I/O adapters are needed for performance with 36-bit memory addressing in Microsoft Windows 2000 AWE as well as with 64-bit operating systems.


Although a 66MHz, 64-bit PCI controller can deliver quite a lot of bandwidth, the PCI architecture is beginning to reach its limits. Emerging high-end enterprise applications, such as data warehousing and very large OLTP systems, continue to require even more I/O bandwidth. In response, some industry leaders have come together to address this issue through an independent industry body called the InfiniBand(SM) Trade Association (www.infinibandta.org). This association is dedicated to developing a new common I/O specification to deliver a channel-based, switched-fabric technology that the entire industry can adopt. The InfiniBand™ architecture represents one of several new approaches to I/O technology and may be necessary to balance out the scalability gains provided by new system bus architectures.

Memory Addressing

SAP's software products make heavy use of memory. Whenever a support call is made on an SAP system, one of the first things checked is the amount of memory in the server to run the desired SAP business application. This section focuses more on the physical memory capabilities of the various hardware and OS platforms. Chapter 3 introduces methods used to determine how much memory is needed based on the number of users and other parameters for the various mySAP.com software components.

Most of the servers SAP supported during the early 1990s were capable of addressing only 32-bit memory addresses, or a maximum of 4GB for the operating system kernel, the SAP application, and the DB application combined. In the late 1990s, SAP began fully supporting 64-bit servers and operating systems with its SAP R/3 kernel (the same kernel is used for most mySAP.com application components). A 64-bit address space is four billion times bigger than a 32-bit address space (16 exabytes versus 4GB). Sixty-four-bit computing overcomes the current limitations with respect to real and virtual main memory size. To put it in perspective, the relationship between the 32-bit and 64-bit address spaces is the same as that between the length of a basketball court and the distance between the earth and the sun. SAP supports 32-bit platforms for SAP R/3 as long as customers decide to stay on 32-bit hardware, operating systems, or databases. It is also supported to combine 32-bit SAP R/3 application servers with a 64-bit SAP R/3 database server, for example.

SAP Applications and Memory Addressing

Most of the memory used in the SAP environment is shared memory. Experience has shown that the memory used by SAP R/3 is divided into 80% shared memory and 20% local memory (with some exceptions). The shared memory used by SAP includes all of the SAP R/3 buffers (program buffers, table buffers, roll buffers, etc.), as well as the SAP R/3 extended memory for user contexts. Shared memory restrictions by the operating system therefore drastically affect the configuration options of the SAP system.

Limitations of the 32-Bit Address Space

With the proliferation of high-performance servers, a new problem arose: the 4GB addressable memory limit for 32-bit applications. The number of users that can be logged on to a particular server is ruled by the available memory, both physical and logical (or virtual). The memory demand depends on the number of users and the modules they use. However, even without a single user logged on to the system, a significant minimum memory footprint is still required. Similar to the growing need for CPU power, the memory demand of SAP applications is also growing from release to release at a steady pace. Figure 2-8 demonstrates the increase in memory requirements for 100 users. The lower line marks the demand for light modules like FI; the upper line marks the demand for heavy modules like SD and PP.

Figure 2-8. Base Memory Footprint (Usage) of SAP R/3 Releases


Memory limits were not a big issue in the past, when the number of users a server could handle was limited by the available CPU power. Today's powerful multiprocessor systems, however, can easily handle more users than a 32-bit system can provide with sufficient shared memory. If the SAP application runs out of memory due to 32-bit limitations, the standard workaround has been to distribute the load across more SAP R/3 instances on more application servers, each with its own 32-bit memory space.

Benefits of 64-Bit SAP Software

64-bit technology has solved this memory-addressing problem. With 64-bit SAP kernels, an address space of many terabytes is available to each SAP work process. Preconditions for a 64-bit SAP system are a 64-bit CPU architecture and a mature 64-bit operating system, along with a proven 64-bit database. The 64-bit kernels are delivered on the SAP standard installation CDs (for most platforms). The migration from 32-bit to 64-bit kernels can be done within a few hours by simply exchanging the executables. The resource demand of the 64-bit kernel is nearly identical to that of the 32-bit version. A mix of 64-bit and 32-bit application servers is possible but not recommended, because it adds complexity to the upgrade process.

The 64-bit SAP R/3 kernel has no new functions compared with the 32-bit version. However, its memory management was considerably improved, which results in performance gains, along with some additional benefits. Comparing 32-bit and 64-bit SAP R/3 kernels, SAP has drawn the following conclusion: “Taken all in all the 64 bit SAP R/3 kernel gives a performance improvement of about 20 percent. There are a lot of statements at the same speed, but most statements are much faster. However, some exceptions still exist.”

SAP supports many variations of memory addressing, but recommends the use of 64-bit whenever possible. If the SAP application is 64-bit, the benefits include:

  • Higher performance and scalability with servers beyond 4-way;

  • Support of more SAP R/3 instances and systems per server—a system consolidation enabler;

  • Support of more users on each application server—a server consolidation enabler;

  • Faster memory-based computing to provide real-time, high-volume processing of business data in main memory (some customers reported from 15 to 30% faster processing of batch jobs);

  • Simplification of SAP R/3 customization and configuration—ease of management;

  • The Unicode releases of SAP R/3, for example 5.x and higher, will only be available on systems with large memory addressing support.

Performance and Scalability

Even on today's 32-bit platforms, SAP customers can use real main memory larger than 4GB. This is achieved with memory banks (windows) or memory mapping, which decreases the speed of low-level memory operations. With 64-bit technology, more real memory can be used while low-level memory operations are accelerated, resulting in improved overall performance. The speed of memory-mapped disk accesses, caching, and swapping increases significantly with 64-bit technology. Finally, 64-bit technology allows a bigger percentage of the database to be cached in main memory, again boosting performance compared with SAP applications on 32-bit platforms.

To serve many concurrent users on a single server, SAP R/3 buffers each user's context in main memory, so switching from one user context to another avoids disk reads and data copying. Because 64-bit platforms offer more real memory, more user contexts can be buffered, leading to higher scalability of SAP systems.

Eased System Management

With 64-bit technology, SAP R/3 applications can access terabytes of data both on disk and in memory. As database tables and indexes grow beyond their former limitations, the need to frequently archive and reorganize data is eliminated. In this way, system management becomes easier, letting SAP customers focus more on application development and less on basis or technology optimization. Note that a minimum 20GB pagefile or swapfile is required for 64-bit systems.

SAP R/3 Kernel Work Process User Contexts and Virtual Memory

With typical 32-bit technology, it is theoretically possible to address a maximum of 4GB of memory from one SAP R/3 work process. Because much of this address space is lost to the split between OS kernel and application space, the memory actually available to an SAP R/3 work process is much smaller in practice.

There's a difference between the memory addressing capabilities of the SAP application and those of the database software. SAP supports only 32-bit or 64-bit OS versions with its SAP R/3 kernel (the same SAP R/3 kernel is used for most mySAP.com components). This means that the largest single SAP R/3 user context (whether for online dialog activity or, more likely, for a batch job or report) can only be as large as the addressable memory the underlying OS provides to that individual SAP R/3 process; one SAP R/3 process is considered one application from an OS perspective. Typically, a 32-bit OS provides only 2GB of memory to the application (depending on the OS version). Some individual batch jobs or reports can easily consume more than 2GB, however.

Some earlier releases of the 64-bit SAP R/3 kernel were limited to a maximum of 40 work processes per SAP R/3 instance, which potentially limits the maximum number of users supported in one instance. To avoid having to install multiple SAP R/3 instances as a workaround, SAP has planned an SAP R/3 kernel that can support hundreds of work processes.

Database System Global Area

The database memory area (e.g., the SGA for Oracle) should be as large as possible to keep database response times low. The databases supported by SAP, however, typically appear as one large process from a system perspective, so their addressable memory is limited to what the OS provides as memory space for a single application (as opposed to the total virtual space). Many of the database software packages can also use 36-bit addressing, if offered by the OS, in addition to 32-bit and 64-bit.

Unix Memory Addressing

The 32-bit versions of the various Unix flavors supported different amounts of addressable memory for applications (shared memory). Although 4GB is the theoretical maximum addressable for a 32-bit operating system, there's always a certain amount reserved for the OS kernel files. Some Unix 32-bit OS versions only offered 1.8GB of shared memory to applications, whereas others, such as Linux, offered close to 3.8GB.

32-Bit Memory Limit Workarounds

To help get around the memory limits imposed by some 32-bit Unix OS versions, special shared memory patches and SAP R/3 instance profile parameter adjustments could be used to increase the shared memory available to the work processes. An additional solution was to use a 64-bit Unix OS version with memory windowing for the 32-bit SAP R/3 kernel. Each SAP R/3 instance or process could be given its own memory window, usually up to 1GB, from which it could allocate shared resources.
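For reference, the shared memory at issue here is typically System V shared memory. The following minimal sketch shows how a Unix process allocates and attaches such a segment; the 1GB request is an arbitrary illustration of the kind of allocation that runs into these 32-bit limits.

    /* Minimal System V shared memory sketch, assuming a generic Unix. */
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* Request a 1GB segment; on many 32-bit Unix versions a request
           this large fails once the process's shared address range
           is exhausted. */
        int id = shmget(IPC_PRIVATE, 1024UL * 1024 * 1024,
                        IPC_CREAT | 0600);
        if (id == -1)
            return 1;                       /* over the OS shared memory limit */

        void *addr = shmat(id, NULL, 0);    /* attach (map) into the process */
        if (addr == (void *)-1)
            return 1;

        /* ... SAP-style buffers would live in segments like this ... */

        shmdt(addr);                        /* detach */
        shmctl(id, IPC_RMID, NULL);         /* mark segment for removal */
        return 0;
    }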

64-Bit Unix—The Recommended Solution

Unix systems tend to have hard virtual memory limits that cannot be easily exceeded. The only solution SAP really recommends for getting around the 32-bit memory limits on Unix is to use 64-bit technology for all three layers of software: the OS, the database, and the SAP application. This is much easier to implement than the workarounds available for 32-bit technology.

TIP

Use 64-Bit for Production SAP Systems on Unix

It is highly recommended to use 64-bit Unix, 64-bit databases, and 64-bit SAP R/3 or other mySAP.com application components for production environments. This reduces many potential configuration errors and enables computing resources to be used more effectively via consolidation.


Microsoft Windows Memory Addressing

The Microsoft Windows NT operating system has always provided applications with a flat 32-bit virtual address space describing 4GB of virtual memory. The address space is usually split so that 2GB is directly accessible to the application and the other 2GB is accessible only to the Windows NT executive (kernel) software. The SAP R/3 application work processes can each be given up to 2GB, and the total of all SAP R/3 work processes is limited to 4GB (as shown in Figure 2-9). Because the core SAP R/3 modules take up a fixed amount of shared memory space, the effective memory available for user data is anywhere from 1.2 to 1.7GB, depending on the module. (The virtual memory space referenced here is the total memory or RAM addressable by the OS without swapping to the pagefile. A single pagefile on Windows NT is limited to 4GB, although multiple pagefiles can be used.)

In contrast to multiple SAP R/3 work processes, the database is considered a single process or application, so it is limited here to 2GB total. These limitations apply to both Windows NT and Windows 2000, unless advantage is taken of the features described next.

Application Memory Tuning

With Windows NT Server 4.0 Enterprise Edition, Windows 2000 Advanced Server, and Data Center, any 32-bit Intel-based system can provide applications with a 3GB flat virtual address space, with the OS kernel and executive using only 1GB. This is called the 4GB Tuning feature and is enabled with the 3GB boot-time option. For SAP R/3 work processes, the total virtual limit remains 4GB, but the largest single work process can now be 3GB in size (effectively about 2.2 to 2.7GB for actual user data, depending on the SAP R/3 module). The database buffer space increases as well, now able to address a theoretical maximum 3GB system global area.
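As a concrete illustration, 4GB Tuning is switched on by appending the /3GB switch to the server's boot entry in boot.ini. The ARC path and OS description below are placeholders; actual entries vary per installation:

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows 2000 Advanced Server" /3GB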

Very Large Memory (VLM)—PSE36

When running on Windows NT v4.0 Enterprise Edition, some databases (for example, Oracle8i version 8.1.4 and above) have been enhanced to support Intel's Extended Server Memory Architecture (ESMA) via Intel's 36-bit Page Size Extension (PSE36) device driver. This driver allows applications to access up to 64GB of RAM when running on Intel Pentium II Xeon (or newer) processors with chip sets that support more than 4GB of RAM (Intel 450NX, Profusion, etc.). Database applications can make calls to PSE36 in order to read from and write to memory not normally accessible to Windows NT's memory manager, allowing the memory above 4GB to be used as an application-managed cache (similar to a RAMdisk). By adjusting some parameters via APIs, a single database instance gets access to more memory than was previously possible, helping reduce the number of I/Os to and from disk. This does mean, however, that specific versions of the database applications are needed. SAP R/3 instances have not been adjusted to take advantage of the 36-bit addressing available with PSE36, so they remain limited to 2 or 3GB per work process.

To enable this feature for the databases, the latest Windows NT 4.0 EE service pack and Intel's PSE36 driver are needed. The PSE36 device driver was originally available on Intel's developer web site, and now is available via server OEMs.

Very Large Memory (VLM)—PAE

When running on Windows 2000 Advanced Server or Data Center, the Intel Physical Address Extension (PAE) feature of the Extended Server Memory Architecture (ESMA) can be used to take advantage of additional memory. This is a different implementation of 36-bit addressing than PSE36 and is available on Intel Pentium II Xeon (or newer) processors with chip sets that support more than 4GB of RAM (Intel 450NX, Profusion, etc.). The feature is enabled with the PAE boot-time option.
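The corresponding boot.ini entry adds the /PAE switch; on SAP application servers it is commonly combined with /3GB, as in the TIP later in this section (again, the path shown is a placeholder):

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows 2000 Advanced Server" /3GB /PAE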

The main benefit of having more than 4GB of memory available is that more SAP R/3 work processes can be run simultaneously, without changes to the SAP R/3 kernel. Each SAP R/3 work process is still limited to a 2 or 3GB virtual address space, but more total memory can now back the full stack of individual work processes because the underlying OS supports it, as shown in the middle column of Figure 2-9. This allows more users to be logged on to the same SAP application server and more work processes to be run.

Figure 2-9. Windows NT/2000 Memory Usage in an SAP R/3 System


Address Windowing Extensions (AWE)

With the release of Windows 2000, Microsoft enabled an even faster implementation of VLM support than PSE36. Called the Address Windowing Extensions (AWE), this support is a set of API calls that allows applications to access more than the traditional 3GB of RAM normally accessible to Windows NT applications. As opposed to Intel's PSE36, which was a read/write interface to the extended memory, the AWE interface takes advantage of the Intel processor architecture to provide a faster map/unmap interface. This avoids the expensive memory copying done by PSE36, so AWE is a faster implementation. Consequently, the Intel PSE36 driver is not supported on Microsoft Windows 2000.

The AWE APIs were created by Microsoft to give applications access to this additional memory. AWE allows database applications (such as SQL Server 2000 Enterprise Edition or the latest Oracle version) to use physical, nonpaged memory beyond the 32-bit virtual address space. SAP did not create a special SAP R/3 kernel for 36-bit AWE support; AWE is used only by the databases. Although large memory support first appears with the Windows 2000 Data Center edition, the AWE API itself is supported in all versions of Windows 2000. Most of the new client/server benchmarks released on Intel-based systems use this combination of software.
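For those interested in what "a set of API calls" means in practice, the abbreviated C sketch below shows the canonical AWE sequence: allocate physical pages, reserve a virtual window, map. The four AWE functions shown are the real Win32 APIs, but the surrounding code is a simplified illustration; production use also requires the SeLockMemoryPrivilege and error handling on every call.

    /* Abbreviated AWE sequence (Windows 2000). Illustrative sketch only. */
    #include <windows.h>

    #define WINDOW_BYTES (64 * 1024 * 1024)     /* 64MB mapping window */

    int awe_sketch(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        ULONG_PTR pages = WINDOW_BYTES / si.dwPageSize;
        ULONG_PTR *pfns = (ULONG_PTR *)HeapAlloc(GetProcessHeap(), 0,
                                                 pages * sizeof(ULONG_PTR));

        /* 1. Allocate physical, nonpaged pages (may live above 4GB). */
        if (!AllocateUserPhysicalPages(GetCurrentProcess(), &pages, pfns))
            return 1;

        /* 2. Reserve a virtual address window to map them into. */
        void *window = VirtualAlloc(NULL, WINDOW_BYTES,
                                    MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

        /* 3. Map the physical pages into the window. Remapping later is a
           page-table update, not a data copy as with PSE36. */
        MapUserPhysicalPages(window, pages, pfns);

        /* ... the application reads and writes through 'window' ... */

        MapUserPhysicalPages(window, pages, NULL);        /* unmap */
        FreeUserPhysicalPages(GetCurrentProcess(), &pages, pfns);
        return 0;
    }

The key point is step 3: remapping the window onto different physical pages is a fast page-table operation, which is why AWE outperforms the copy-based PSE36 approach.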

Summary of Windows NT/2000 Memory Options for SAP

Figure 2-9 shows the virtual memory available with SAP R/3 on Windows NT and Windows 2000, along with database support of Windows 2000 AWE. The two left-hand columns show how the 32-bit SAP R/3 kernel can be used with Windows NT 4.0 EE or with Windows 2000 with PAE support. The right-hand column shows how databases can be used with Windows 2000 AWE API support on an Intel-based server. This means that they can theoretically support up to 64GB of memory using the Data Center version, or 8GB using the Advanced Server. Special versions of these databases that contain this API support will be required. More information about AWE can be found at the following web link: www.microsoft.com/HWDEV/NTDRIVERS/AWE.htm.

TIP

SAP R/3 Application Servers and Windows 2000 Advanced Server

A cost-effective solution for SAP R/3 application servers is Intel-based servers with six or eight processors and support for 8GB of physical memory. Simply use Windows 2000 Advanced Server and the latest 32-bit SAP R/3 kernel, and enable the 3GB and PAE boot-time options. The only restriction is that an individual dialog or batch work process cannot exceed 3GB of memory, which can be an issue when generating very large reports. In that case, 64-bit systems are needed.


If more than 8GB of memory is needed for the SAP application servers, then the more expensive Windows 2000 Data Center version must be used, which raises an important question: if so much memory is needed, shouldn't a 64-bit solution be considered instead? A native 64-bit solution may have better support from the third-party tools typically employed to manage such a large server, and it may be easier to set up and configure when used to consolidate several applications, without hardware or software restrictions to worry about. This needs to be considered carefully on a case-by-case basis. However, there are some consolidation situations where deploying Windows 2000 Data Center (with up to 64GB using 36-bit addressing) with multiple database instances makes sense.
