Software Architectures of Parallel Database Systems

In implementing parallelism in their database software, database vendors use one of three software architectures, commonly referred to as:

  • Shared everything

  • Shared disk

  • Shared nothing

The sharing refers to the sharing of disk and memory by multiple processors.

Implementing each of these software architectures requires an appropriate underlying parallel hardware architecture. Some software architectures are a better match than others for a given hardware architecture. For example, a shared everything software architecture is a natural match for SMP hardware, because all processors in an SMP system share the same memory and disk. Other combinations are not so good. Implementing a shared everything software architecture on an MPP platform, for example, is not a good choice, because MPP hardware is based on a distributed memory architecture. The software implementation of a shared memory abstraction on top of the distributed memory of the individual nodes in an MPP system would be difficult, and performance would be poor.

The three figures in this section, Figure 2.6 through Figure 2.8, highlight the differences among the three database software architectures with respect to the sharing of memory and disks. They do not imply any particular underlying hardware architecture. It is possible to implement more than one software architecture on a given type of hardware architecture. For example, on IBM RS/6000 SP, which is an MPP system, Oracle Parallel Server runs with a shared disk software architecture. DB2/6000 Parallel Edition, on the other hand, runs on the same IBM RS/6000 SP system with a shared nothing software architecture.

Shared Everything

In a shared everything database architecture, all processors share the same memory and disks. Figure 2.6 illustrates such a system. One copy of the operating system and one copy of the database software run on the system. Shared memory allows for efficient coordination between DBMS processes. In this architecture, it is relatively simple to implement inter-query and intra-query parallelism, because the operating system automatically allocates the queries and subqueries to available CPUs. The shared everything software architecture is widely used on SMP hardware. The NUMA hardware architecture also is suitable for a shared everything software architecture.

Shared everything architecture

Figure 2-6. Shared everything architecture

All major DBMS vendors provide support for shared everything architectures. Oracle supports parallel execution on the shared everything software architecture. Shared everything software architectures have the scalability and availability limitations of the underlying hardware architecture. For example, with SMP hardware, memory, and system bottlenecks limit the number of processors that can be used in a shared everything architecture. In addition, there are two single points of failure because only one copy of the operating system and only one copy of the database software run in a shared everything architecture.

Shared Disk

In a shared disk database architecture, each node has its own CPU and memory. Storage devices (disks) are connected using a high-speed network and are shared by all nodes. Each node has access to any disk. Each node also runs its own copy of the operating system and database software. Logically and physically there is just one database, distributed among all the disks, which is accessed by all the nodes. Figure 2.7 illustrates the shared disk architecture.

Shared disk architecture

Figure 2-7. Shared disk architecture

The shared disk approach eliminates the performance bottleneck of shared memory systems as it is implemented on distributed memory hardware. To increase performance or to support a greater number of users, you can add additional nodes and disks. Because you are adding nodes, you don’t encounter the memory bottlenecks to which SMP systems are prone. Both shared disk and shared nothing systems have this scalability advantage over shared everything systems and, consequently, often are used to host very large databases. Oracle Parallel Server is based on the shared disk approach and is available on clusters (including NUMA clusters) and MPP systems.

On some platforms, such as the HP Enterprise Server, the hardware vendor supports shared disks at the hardware level by connecting nodes directly to the disks. With the HP Enterprise Server, there is no need for a software layer to simulate shared disks. On other platforms that do not provide hardware support for a shared disk architecture, a software layer simulates it. For example, IBM RS/6000 SP is a hardware platform based on MPP architecture, and each node in RS/6000 SP has its own disk. However, using a software layer called Virtual Shared Disk (VSD), any node can transparently access any disk physically located on any other node. The Virtual Shared Disk layer traps all requests for disk access. If the access is to a shared disk that is locally connected, then the VSD layer passes the request to the Logical Volume Manager (LVM) of the local node. The LVM is the operating system module that manages storage in an IBM RS/6000 SP system. When the access is to a shared disk that is connected to a remote node, the VSD layer sends the request to the LVM of the remote node via the high-speed interconnect. The remote node then accesses the disk and returns the requested data back through the interconnect to the originating node.

Shared Nothing

In a shared nothing database architecture, each node is independent and has its own memory and disk. Each node runs its own copy of the operating system and its own copy of the DBMS software. A high-speed network is used to connect the nodes together. As illustrated in Figure 2.8, processors in a shared nothing architecture do not share memory and disks with other nodes. Shared nothing architectures are possible on cluster and MPP systems.

Shared nothing architecture

Figure 2-8. Shared nothing architecture

Databases in a shared nothing architecture are partitioned, or divided, among nodes. In this sense, the term partition refers to part of a database, and not to partitions of a table or an index in the sense that Oracle uses the term. Each node has direct access to only that part of the database, referred to as the local partition, that is stored on its local disk. Nodes do not have direct access to data in nonlocal partitions.

In a shared nothing environment, database queries that access only data from the local partition execute much faster than queries that require data from a nonlocal node. When data is required from a nonlocal partition, the DBMS software transmits a query for that data to the remote node owning the partition in question. That node then executes the query, retrieves the data, and sends the results back to the requesting node. IBM’s DB2/6000 Parallel Edition, Informix’s Extended Parallel Server, and NCR’s Teradata are based on this architecture.

Tip

Oracle’s parallel processing implementation is not based on the shared nothing architecture.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.178.53