
After years of delivering faster and faster chips that can easily boost the performance of most desktop software, Intel says the free ride is over. Already, chipmakers like Intel and AMD are delivering processors that have multiple brains, or cores, rather than single brains that run ever faster. The challenge is that most of today's software isn't built to handle that kind of advance.
"The software has to also start following Moore's law," Intel fellow Shekhar Borkar said, referring to the notion that chips offer roughly double the performance every 18 months to two years.
"Software has to double the amount of parallelism that it can support every two years."
I don't know how an OS would deduce and enforce shared memory dependencies between threads in a process. The application developer has to explicitly declare how access to data should be serialized and/or synchronized. It could be as easy as declaring that objects of a certain class can only be operated upon by one thread at a time. Or that a thread cannot pass a certain point until all other threads have reached this point the same number of times.
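To make the second rule concrete (a thread cannot pass a certain point until all other threads have reached it the same number of times), here is a minimal sketch using Python's standard `threading.Barrier`. The worker count, round count, and the `worker` and `run` names are hypothetical, chosen only for illustration.

```python
import threading

NUM_THREADS = 4  # hypothetical number of worker threads
NUM_ROUNDS = 3   # hypothetical number of phases each thread goes through

barrier = threading.Barrier(NUM_THREADS)
log_lock = threading.Lock()
log = []  # (round_no, worker_id) entries, in the order they were recorded


def worker(worker_id):
    for round_no in range(NUM_ROUNDS):
        # Record that this thread is working on the current round.
        with log_lock:
            log.append((round_no, worker_id))
        # No thread gets past this call for the Nth time until every
        # thread has reached it N times. The barrier resets itself
        # after each release, so it can be reused every round.
        barrier.wait()


def run():
    log.clear()
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(log)


if __name__ == "__main__":
    for round_no, worker_id in run():
        print(f"round {round_no}: worker {worker_id}")
```

The order of workers within a round varies from run to run, but no entry for a later round should ever appear before an entry for an earlier one. Note that the developer, not the OS, decided where the barrier goes.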
There are only a handful of commonly used schemes for implementing good multi-threaded code. Some languages have the primitives built in. Others don't. But these concepts are never hard to implement, for example, by embedding a lock in a struct or class. More often, the problem is that the application developers simply don't understand their code well enough to know which code needs to be a critical section or which data needs to be protected by a lock. The OS can't help here.
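As a rough illustration of embedding a lock in a class, the sketch below guards a counter so that only one thread at a time can operate on a given object. The `Counter` class and the `hammer` helper are made-up names for this example, not taken from any particular codebase.

```python
import threading


class Counter:
    """Hypothetical class whose objects allow one thread at a time."""

    def __init__(self):
        self._lock = threading.Lock()  # the lock lives inside the object
        self._value = 0

    def increment(self):
        # Critical section: the read-modify-write must not interleave
        # with the same operation running in another thread.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def hammer(counter, num_threads=8, increments=10_000):
    """Increment one shared counter from several threads at once."""
    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()


if __name__ == "__main__":
    print(hammer(Counter()))
```

The mechanism is a few lines. The hard part, as noted above, is knowing that `increment` is a critical section in the first place.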
The OS also can't help application developers figure out what parts of their application could be parallelized. Most desktop applications just don't have much potential for thread parallelism. Consider a word processor. Most of the time, it's waiting for user input. The most common user complaint regarding the performance of word processors is that they take too long to start up. Well, application startup and initialization is quite I/O-bound and usually very serial in nature.
I think what people often miss about the direction of client system architecture is that multi-core matters as much for graphics and multimedia processing as it does for general-purpose computation and logic. In just a few years, each of these kinds of execution units will begin to coexist as modular cores sharing a common bus architecture. Eventually, they could share decoders, dispatchers, and L3 cache. A typical client probably won't have more than 4-8 general-purpose cores, and many will have only 2. But a high-end client might have 32 graphics cores or more. Graphics and multimedia code has high potential for parallelization, whereas general-purpose code is highly serial on the client.
The server is a whole different story. Servers are evolving to have more and more general-purpose cores, and there's no end in sight. Their general-purpose workloads are highly parallel. Serving many requests concurrently is the classical strength of multi-threading, and most server applications are quite good at it. The integration of vector and general-purpose cores is big in the HPC market. Server applications run great on big multi-socket systems.
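For what it's worth, the "serve many requests concurrently" pattern can be sketched in a few lines with Python's standard `concurrent.futures.ThreadPoolExecutor`. The `handle_request` function and the request IDs are hypothetical stand-ins for real per-request work.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    # Hypothetical handler: a real server would parse the request,
    # touch a database or disk, and build a response here.
    return f"response to request {request_id}"


def serve(request_ids, max_workers=4):
    # Each request is independent of the others, so a pool of worker
    # threads can handle them side by side with no shared state.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(handle_request, request_ids))


if __name__ == "__main__":
    for response in serve(range(5)):
        print(response)
```

The requests share nothing, which is why this kind of workload scales with core count so much more easily than a word processor does. In standard CPython the global interpreter lock means threads like these mostly help with I/O-bound requests, so treat this as a sketch of the pattern rather than a benchmark.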