On modern computers, from mobile devices to servers, multiprocessor systems now dominate the landscape of computing. Traditionally, such systems have two (or more) processors, each with a single-core CPU. The processors share the computer bus and sometimes the clock, memory, and peripheral devices. The primary advantage of multiprocessor systems is increased throughput. That is, by increasing the number of processors, we expect to get more work done in less time. The speed-up ratio with N processors is not N, however; it is less than N. When multiple processors cooperate on a task, a certain amount of overhead is incurred in keeping all the parts working correctly. This overhead, plus contention for shared resources, lowers the expected gain from additional processors.
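One common way to see why the speed-up stays below N is Amdahl's law, which the text above does not name; it is brought in here only as an illustrative model. It assumes that a fixed fraction of the work is serial and cannot be spread across processors. The function name `amdahl_speedup` and the 10 percent serial fraction are assumptions chosen for the sketch, not figures from the text.

```python
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Upper bound on speed-up under Amdahl's law (an illustrative model).

    serial_fraction: share of the work that cannot run in parallel (0.0 to 1.0).
    n_processors:    number of processors applied to the parallel share.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial share takes the same time regardless of processor count;
    # only the parallel share is divided among the processors.
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Hypothetical workload: 10 percent of the work is serial.
    for n in (1, 2, 4, 8, 16):
        print(f"N={n:2d}  speed-up={amdahl_speedup(0.10, n):.2f}")
```

Under this assumed workload, the speed-up falls further behind N as N grows. The model ignores bus and memory contention, which would lower the real figure further.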
The most common multiprocessor systems use symmetric multiprocessing (SMP), in which each peer CPU processor performs all tasks, including operating-system functions and user processes. Figure 1.8 illustrates a typical SMP architecture with two processors, each with its own CPU. Notice that each CPU processor has its own set of registers, as well as a private (or local) cache. However, all processors share physical memory over the system bus.
The benefit of this model is that many processes can run simultaneously (N processes can run if there are N CPUs) without causing performance to deteriorate significantly. However, since the CPUs are separate, one may be sitting idle while another is overloaded, resulting in inefficiencies. These inefficiencies can be avoided if the processors share certain data structures. A multiprocessor system of this form will allow processes and resources, such as memory, to be shared dynamically among the various processors and can lower the workload variance among the processors. Such a system must be written carefully, as we shall see in Chapter 5 and Chapter 6.
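The effect of sharing a data structure on workload variance can be sketched with a toy model. The code below compares a fixed split of tasks across CPUs with a single shared ready queue from which the next free CPU takes the next task. The function names, the task costs, and the policies are hypothetical simplifications for illustration, not how any particular operating system schedules work.

```python
import heapq


def static_partition(task_costs, n_cpus):
    """Give each CPU a fixed contiguous block of tasks (nothing is shared)."""
    loads = [0] * n_cpus
    chunk = -(-len(task_costs) // n_cpus)  # ceiling division
    for i, cost in enumerate(task_costs):
        loads[i // chunk] += cost
    return loads


def shared_queue(task_costs, n_cpus):
    """Hand each task, in order, to whichever CPU becomes free first."""
    loads = [0] * n_cpus
    # Heap entries are (time at which the CPU is next free, CPU index).
    free_at = [(0, cpu) for cpu in range(n_cpus)]
    heapq.heapify(free_at)
    for cost in task_costs:
        busy_until, cpu = heapq.heappop(free_at)
        loads[cpu] += cost
        heapq.heappush(free_at, (busy_until + cost, cpu))
    return loads


if __name__ == "__main__":
    costs = [8, 7, 6, 5, 1, 1, 1, 1]  # hypothetical task run times
    print("static partition:", static_partition(costs, 2))
    print("shared queue:    ", shared_queue(costs, 2))
```

With these assumed costs, the fixed split leaves one CPU with most of the work while the other is nearly idle, whereas the shared queue evens out the two loads. A real kernel must also protect the shared queue against concurrent access, which is part of why such a system must be written carefully.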
The definition of multiprocessor has evolved over time and now includes multicore systems, in which multiple computing cores reside on a single chip. Multicore systems can be more efficient than multiple chips with single cores because on-chip communication is faster than between-chip communication.
In addition, one chip with multiple cores uses significantly less power than multiple single-core chips, an important issue for mobile devices as well as laptops.
In Figure 1.9, we show a dual-core design with two cores on the same processor chip. In this design, each core has its own register set, as well as its own local cache, often known as a level 1, or L1, cache. Notice, too, that a level 2 (L2) cache is local to the chip but is shared by the two processing cores. Most architectures adopt this approach, combining local and shared caches, where local, lower-level caches are generally smaller and faster than higher-level shared caches. Aside from architectural considerations, such as cache, memory, and bus contention, a multicore processor with N cores appears to the operating system as N standard CPUs. This characteristic puts pressure on operating-system designers (and application programmers) to make efficient use of these processing cores, an issue we pursue in Chapter 4. Virtually all modern operating systems, including Windows, macOS, and Linux, as well as Android and iOS mobile systems, support multicore SMP systems.
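An application can observe this view of the hardware directly. The minimal sketch below asks the operating system how many CPUs it reports, using Python's standard `os.cpu_count()`. The helper name `logical_cpu_count` and the fallback value of 1 are assumptions made for the example. On hardware with multiple hardware threads per core, the reported number counts logical CPUs and may exceed the number of physical cores.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of CPUs the operating system reports.

    os.cpu_count() may return None when the count cannot be determined;
    falling back to 1 is a choice made for this sketch.
    """
    count = os.cpu_count()
    return count if count is not None else 1


if __name__ == "__main__":
    print(f"The operating system reports {logical_cpu_count()} CPU(s).")
```

A program might use this number to decide how many worker threads or processes to start, which is one place where the pressure on application programmers mentioned above shows up in practice.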
Adding CPUs to a multiprocessor system will increase computing power; however, as suggested earlier, the concept does not scale very well. Once we add too many CPUs, contention for the system bus becomes a bottleneck and performance begins to degrade. An alternative approach is instead to provide each CPU (or group of CPUs) with its own local memory that is accessed via a small, fast local bus. The CPUs are connected by a shared system interconnect, so that all CPUs share one physical address space. This approach, known as non-uniform memory access, or NUMA, is illustrated in Figure 1.10. The advantage is that, when a CPU accesses its local memory, not only is it fast, but there is also no contention over the system interconnect. Thus, NUMA systems can scale more effectively as more processors are added.
A potential drawback with a NUMA system is increased latency when a CPU must access remote memory across the system interconnect, creating a possible performance penalty. In other words, one CPU cannot access the local memory of another CPU as quickly as it can access its own local memory, which slows performance. Operating systems can minimize the NUMA penalty through careful CPU scheduling and memory management, as discussed in Section 5.5.2 and Section 10.5.4. Because NUMA systems can scale to accommodate a large number of processors, they are becoming increasingly popular on servers as well as high-performance computing systems.
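The size of the NUMA penalty depends on how often a CPU has to reach across the interconnect. The sketch below models average memory-access time as a weighted mix of local and remote accesses. The function name `average_access_ns` and the latencies of 100 ns (local) and 150 ns (remote) are made-up illustrative values, not measurements of any real system.

```python
def average_access_ns(local_fraction: float,
                      local_ns: float = 100.0,
                      remote_ns: float = 150.0) -> float:
    """Average memory-access time for a given share of local accesses.

    local_fraction: share of accesses served from the CPU's own local memory.
    local_ns, remote_ns: hypothetical latencies chosen for illustration only.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    for frac in (1.0, 0.9, 0.5, 0.0):
        print(f"{frac:.0%} local -> {average_access_ns(frac):.1f} ns average")
```

In this simplified model, the average access time approaches the local latency as the local share rises, which is the effect that scheduling a process near its memory aims for. The model leaves out caches and interconnect contention.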
Finally, blade servers are systems in which multiple processor boards, I/O boards, and networking boards are placed in the same chassis. The difference between these and traditional multiprocessor systems is that each blade-processor board boots independently and runs its own operating system. Some blade-server boards are multiprocessor as well, which blurs the lines between types of computers. In essence, these servers consist of multiple independent multiprocessor systems.