Come gather round people wherever you roam
And admit that the waters around you have grown
And accept it that soon you’ll be drenched to the bone
If your time to you is worth saving
Then you’d better start swimming or you’ll sink like a stone
For the times, they are a changing
— Bob Dylan
These are interesting times for the microprocessor industry. At the same time the multicore revolution is happening, we’re also seeing the rise of data parallel architectures. Yes, vector computing is back, but this time, it’s not just for nerds.
In a recent Linux Magazine article on processor trends, Doug Eadline wrote that mainstream computing is splitting into two architectural paths: general-purpose multicore CPUs and data parallel engines, which Eadline calls parallel/predictable computing units. The latter include GPUs, the Cell processor, and the future Larrabee processors. To that we could also add FPGAs and custom ASICs like the ClearSpeed devices.
General-purpose computing is great for software like word processors and operating systems, where the nature of the task is unpredictable from one moment to the next, and data-intensive operations are absent. This type of code is strewn with “if-then-else” statements to handle fine-grained complexity. On the other hand, predictable computing is well-suited to multimedia apps and most types of HPC, where high levels of data parallelism can be exploited. If your code contains a lot of “for” statements that process big chunks of arrays or tables, you could probably benefit from data parallelism.
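To make the distinction concrete, here is a minimal Python sketch. The function names (`handle_event`, `saxpy`, `saxpy_chunked`) and the `lanes` parameter are illustrative assumptions, not anything taken from Eadline's article. The first function is branch-heavy control code; the other two apply one uniform operation to every element, which is the kind of loop a data parallel engine is built for.

```python
def handle_event(event):
    """Control-heavy code: each call may take a different branch (hypothetical events)."""
    if event == "keypress":
        return "insert character"
    elif event == "save":
        return "write file"
    elif event == "quit":
        return "close window"
    else:
        return "ignore"


def saxpy(a, xs, ys):
    """Data parallel kernel: each output depends only on the inputs at the same index."""
    return [a * x + y for x, y in zip(xs, ys)]


def saxpy_chunked(a, xs, ys, lanes=4):
    """Same result, computed in fixed-width chunks to mimic SIMD lanes (lanes is an assumption)."""
    out = []
    for start in range(0, len(xs), lanes):
        # No element in this chunk depends on any other, so hardware
        # could compute all of them at once.
        xc = xs[start:start + lanes]
        yc = ys[start:start + lanes]
        out.extend(a * x + y for x, y in zip(xc, yc))
    return out
```

Because no iteration of the `saxpy` loop depends on another, the work can be split across as many lanes or cores as the hardware offers. The branches in `handle_event` offer no such opportunity.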
The reason CPUs have dominated the computing landscape for so long is that all applications need some sort of program control, and any data-heavy for-loops could always be implemented serially. Today, though, a high-end computer game wouldn’t be practical without a GPU or game processor. And as visual and audio media become commonplace on the Internet and in mobile devices, clients and servers will need to be equipped with chips that can process large arrays of data in real time. Data parallelism will become a requirement practically everywhere.
The same goes for high performance computing. For example, with GPU-equipped systems, we’re seeing HPC codes like seismic analysis or molecular dynamics accelerated by up to two orders of magnitude compared to CPU-based systems. The extra computing power is opening up HPC applications to a much larger audience. At the high end, the Cell-based Roadrunner has put the petaflop supercomputer on the map, and NVIDIA GPU-accelerated supers are on the drawing board.
The rise of multimedia applications and the growth of HPC mean that data parallel processors are targeted at some of the hottest markets. True, it will be multimedia that drives volume, but HPC will help to pull these processors up the performance curve, as it has done with the Cell processor. Every chip vendor is aware of this. The processor realignment explains why AMD bought ATI, why NVIDIA is expanding its lineup for the mobile and HPC markets, why Intel is making a foray into high-end visual computing with Larrabee, and why IBM is quickly constructing an ecosystem around the Cell processor.
As TG Daily’s Theo Valich pointed out, it appears that, for the first time, GPUs will be manufactured on a smaller process node than CPUs. According to him, both NVIDIA and AMD will use Taiwan Semiconductor Manufacturing Company fabs to start churning out GPU silicon on the 40nm process node in early 2009. Intel CPUs are currently at 45nm, and their move to 32nm is unlikely to happen until the second half of 2009. The five-nanometer edge for GPUs would be mostly symbolic, but as Valich notes, AMD and NVIDIA will probably make a big deal about it.
So where is this leading? Eadline believes the optimal platform for highly parallel (predictable) applications will turn out to be a single general-purpose core hooked up to some number of parallel processing engines. The Cell processor, with a PowerPC core surrounded by eight SPEs, is the current example. Larrabee will likely be a more tightly integrated version of this, with a wide SIMD unit integrated into each core, making it more like a vector-enhanced manycore CPU. AMD and NVIDIA are dabbling with CPU-GPU integrated chips, but the first generation is aimed at the low end (mobile clients). There are no public plans to integrate a CPU core into AMD’s FireStream or NVIDIA’s Tesla HPC platforms.
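The "one control core plus several parallel engines" arrangement can be sketched in software. The following is a rough illustration only, assuming hypothetical names (`offload`, `scale_chunk`) and a default of eight workers to echo the Cell's eight SPEs. It uses Python threads purely to show the structure: the control code partitions the data, hands slices to the workers, and reassembles the results. It does not model real SPE or GPU programming, and Python threads will not deliver the speedups such hardware provides.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, factor):
    """Work done by one hypothetical 'engine': a uniform operation over its slice."""
    return [factor * v for v in chunk]


def offload(data, factor, engines=8):
    """The 'control core': split the data, dispatch slices, reassemble in order."""
    # Ceiling division so every element lands in exactly one slice.
    size = max(1, -(-len(data) // engines))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=engines) as pool:
        # map() preserves input order, so results line up with the chunks.
        results = list(pool.map(scale_chunk, chunks, [factor] * len(chunks)))
    return [v for part in results for v in part]
```

Calling `offload(list(range(10)), 3)` returns each input multiplied by three, in the original order. The general-purpose part (`offload`) holds all the decision logic, while the engines run the same simple routine on different data.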
The discrete CPU will be around for a while, though. There is plenty of non-technical software that just needs a handful of cores — or even just one — to run at peak efficiency. Vanilla desktop systems and virtualized enterprise servers, equipped with multicore CPUs, will handle these apps just fine. It’s the cutting-edge applications that will require these new massively parallel architectures.
In August, a number of conferences will feature some of the latest goings-on in the data parallel realm. SIGGRAPH, the HOT CHIPS symposium, the Intel Developer Forum, and NVIDIA’s NVISION 08 conference will have a lot to say about the new processor landscape and how it’s being shaped by emerging applications. I’ll be following the events over the next few weeks and will give you my take on the developments.