Jon “Hannibal” Stokes is co-founder and Senior CPU Editor of Ars Technica. He has written for a variety of publications on microprocessor architecture and the technical aspects of personal computing. He recently published his first book, Inside the Machine – An Illustrated Introduction to Microprocessors and Computer Architecture. We interviewed him to discuss how hardware bugs are dealt with, the use of reserved bits, the performance and efficiency of console CPUs and GPUs, the possibility of building a PlayStation 3 cluster, and how he sees the relationship between CPUs and GPUs evolving.
Could you introduce yourself?
Jon Stokes: I’m a co-founder of Ars Technica and a Senior Editor at the site. I typically cover microprocessors and graphics hardware, but over the years I’ve covered a pretty broad array of additional topics, including intellectual property, national security, privacy and civil liberties, outsourcing and the H-1B visa program, and electronic voting. I have an undergraduate degree in computer engineering from LSU and two graduate degrees in the humanities from Harvard Divinity School, and I’m currently working on a Ph.D. at the University of Chicago. I am the author of Inside the Machine.
I found a list of bugs in the Intel Core Duo/Solo that was released just 20 days after the official launch of these CPUs. It’s pretty shocking. Is this common? How do CPU manufacturers react? With new revisions? By letting the software guys find a workaround?
Jon Stokes: Generally speaking, many of the major errata are fixed with new steppings of the processor. Other errata can be worked around using BIOS and OS tweaks.
There were quite a few errata with the Core Duo/Solo line, and maybe even a higher number than usual for which no fix is planned. However, I think Intel’s reasoning behind not planning fixes for these is pretty clear in this particular case.
Specifically, Core Duo/Solo (aka “Yonah”) is a transitional design between the Pentium M and the more advanced Core 2 Duo/Solo (aka “Conroe”). There really wasn’t much of a time gap between when Yonah was released in January of 2006 and when Conroe was released in July of that same year. So Yonah was obsolete pretty quickly. This being the case, there was no reason for Intel to put a lot of effort into updating this transitional microarchitecture.
Overall, I think too much is typically made of these errata in some forums. As I said above, the important bugs are fixable with new steppings, BIOS tweaks, and OS updates.
Browsing CPU specs and datasheets, I see a lot of reserved stuff. It is probably used for debugging the hardware during the development cycle, but I’m wondering what it could be used for once these CPUs hit the market. The paranoid android inside me could argue that maybe it’s just security through obscurity, and that some combinations of these reserved bits could be used to 0wn the system. Or maybe the always-cool story about an NSA conspiracy, so there could be a backdoor somewhere… What can you say about the hardware development process and the use of reserved bits?
Jon Stokes: My understanding of reserved bits is that they’re often intended not for secret features, but so that the architects can add new features to the ISA at some future point. In other words, a reserved bit gives you a “place” to insert a new option or capability, without breaking legacy software.
If I recall correctly, AMD made use of at least one reserved bit in the x86 instruction format when they created their 64-bit ISA, x86-64.
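To make that mechanism concrete, here is a small sketch (my illustration, not something from the interview) of the REX prefix that AMD introduced with x86-64. Strictly speaking, REX repurposes the one-byte encodings 0x40–0x4F (formerly the short INC/DEC instructions) rather than a single reserved bit, but the idea is the same: previously unused or spoken-for encoding space is given new meaning without breaking the rest of the instruction format.

```python
def decode_rex(byte):
    """Decode an x86-64 REX prefix byte (0x40-0x4F).

    In 32-bit mode these encodings were the one-byte INC/DEC
    instructions; in 64-bit mode AMD repurposed them, and the low
    four bits extend the ISA: W selects 64-bit operand size, and
    R, X, B each supply a fourth bit for a register field.
    """
    if byte & 0xF0 != 0x40:
        return None  # not a REX prefix
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModRM.reg
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModRM.rm / SIB.base
    }

# 0x48 is the REX.W prefix seen on instructions like "mov rax, ..."
print(decode_rex(0x48))
```

Nothing about the prefix byte itself had to change for later extensions either, which is exactly the kind of forward compatibility reserved encoding space buys you.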
What can you tell us about the CPU inside Microsoft’s XBox 360?
Jon Stokes: The Xenon CPU is a three-core, multithreaded PowerPC processor designed by IBM. Each of the three cores is very similar to the general-purpose CPU in the Cell BE (the PS3’s processor), but with some additional vector processing resources.
Probably the most important thing to note about Xenon is that it handles caching in a very special way that makes it more effective for media-intensive workloads like video decoding and gaming.
Streaming media applications tend to “dirty” the cache, which means that instead of storing a single working set in the cache and using that data for a while, they’re constantly moving data /through/ the cache. This kind of behavior makes very poor use of the cache, and in fact streaming data from one thread can result in non-streaming data from another thread being booted out of the cache needlessly.
Xenon’s fix for this is to “wire down” certain sections of the cache and dedicate them to a single thread. That way, a thread that is only moving data through the cache, and not storing it, can just dirty a small, dedicated part of cache. In a way, this “cache locking” mechanism enables the Xenon’s cache to function a little bit like the Cell’s “local store” memory.
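The effect described here can be shown with a toy model (a hypothetical LRU simulation of mine, not Xenon’s actual replacement policy): a small, reused working set shares a cache with a streaming thread that touches each line exactly once. With a shared cache the stream evicts the working set; with a partition “locked” for each thread, the working set stays resident.

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: tracks which addresses are resident."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)    # refresh on hit
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least-recently-used line
        self.lines[addr] = True
        return False

def working_set_hits(ws_cache, stream_cache, rounds=20, chunk=8):
    """Interleave a 4-line reused working set with a stream that
    touches `chunk` new lines per round; count working-set hits."""
    hits, addr = 0, 1000
    for _ in range(rounds):
        for w in range(4):                  # reused data
            hits += ws_cache.access(w)
        for _ in range(chunk):              # touch-once streaming data
            stream_cache.access(addr)
            addr += 1
    return hits

shared = LRUCache(8)                        # both threads share 8 lines
print("shared cache hits:", working_set_hits(shared, shared))

ws, stream = LRUCache(4), LRUCache(4)       # 4 lines "locked" per thread
print("partitioned hits:", working_set_hits(ws, stream))
```

In this toy run the shared cache yields zero working-set hits, because every round of streaming data pushes the reused lines out, while the partitioned configuration hits on every reuse after the first round.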
About PS3 and Xbox 360. From a pure computational point of view which system could be considered more powerful?
Jon Stokes: I think that the PS3 has more raw computational horsepower on paper, but in practice the two consoles will probably work out to about the same for most game developers. However, there are some problems in high-performance computing where the Cell Broadband Engine that powers the PS3 is much more powerful than anything else out there. The catch is that programmers have to design their code, from the algorithm level on up, to fit Cell exactly in order to see those benefits. Again, this doesn’t seem to apply to games, but if some developer figures out that it does, then eventually they could get more performance out of the PS3.
Do you think that the PS3 could be used to build cheap computational clusters, as happened with the PS2? I was thinking of places like Google…
Jon Stokes: I think this is an interesting idea, but ultimately IBM’s Cell-based products will be a better fit for clusters than a PS3 console. The advantage of the PS3 console is, of course, that it’s cheap because Sony subsidizes it. So it’s entirely possible that someone would want to use it for a cluster. It does have gigabit Ethernet, so I guess it could work.
How does their performance-per-watt compare to that of modern power-efficient CPUs such as the Intel Core 2 Duo?
Jon Stokes: Although I don’t have any real numbers to back it up, I’d say that Core 2 Duo almost certainly has them both beat in performance/watt for ordinary workloads. But again, if you’re solving one of these exotic HPC problems using Cell, and you have code that’s custom-fitted to give you an outrageous performance delta vs. a traditional architecture, then those performance/watt numbers would skew pretty drastically in Cell’s favor for those applications.
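As a back-of-the-envelope illustration of that skew (the figures below are invented purely for the arithmetic, not measurements of any real chip), a large application-specific performance delta dominates a modest power difference:

```python
def perf_per_watt(gflops, watts):
    """Simple efficiency metric: sustained GFLOPS per watt."""
    return gflops / watts

# Hypothetical numbers, chosen only to show how the ratio moves:
# a conventional dual-core on a general-purpose workload...
general = perf_per_watt(gflops=15.0, watts=65.0)
# ...vs. a Cell-like chip running code hand-fitted to its SPEs,
# sustaining many times the throughput at somewhat higher power.
fitted = perf_per_watt(gflops=150.0, watts=90.0)

print(f"general-purpose: {general:.2f} GFLOPS/W")
print(f"hand-fitted:     {fitted:.2f} GFLOPS/W")
```

Even a chip that burns more watts comes out far ahead on this metric once the workload-specific throughput advantage is large enough, which is the scenario described for exotic HPC problems on Cell.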
And what about performance-per-watt of consumer GPUs compared to those included in PS3 and XBox 360?
Jon Stokes: That’s hard to estimate, I think, because “consumer GPUs” is such a broad category. I’m sure that for high-end GPUs that are comparable in horsepower to those in the PS3 and XBox 360, the performance/watt numbers are also comparable.
Why do you think this new generation of consoles abandoned the x86 instruction set in favor of RISC CPUs?
Jon Stokes: This is a hard one to really pin down. Honestly, I think that IBM just did a great job pitching them on their chip design competency. I don’t think it had much to do with the ISA, and I also think that IBM made the sale on a case-by-case basis to each of the console makers, appealing to different aspects for Sony, MS, and Nintendo.
With Nintendo, it was about the fact that IBM had already proven they could deliver a console product with the GameCube. Nintendo was clearly pleased with the GC, and in fact the Wii is basically just a GC with a higher clock speed and a new controller.
With Sony, IBM was able to sell them on this exotic workstation chip. Sony likes to really push the envelope with their consoles, as evidenced by both the PS2 (really exotic and hard to program when it first came out) and the PS3. So IBM was able to appeal to their desire to have something radically different and potentially more powerful than everyone else.
As for MS, I have no idea how IBM pulled it off. I think that if the Xbox 360 had used a dual-core Intel x86 chip or even an Opteron, everyone would’ve been better off. This is especially true if Intel could’ve found a way to get a Core 2 Duo, with its increased vector processing capabilities, out the door in time for the console launch. Of course, even the Core 2 Duo can’t really stand up to Xenon’s VMX-128 units, especially given VMX’s superiority to the SSE family of vector instructions, so Xenon does have that edge.
But regardless of the SSE vs. VMX (or AltiVec) issue, I’m not convinced that letting IBM design a custom PPC part for Xbox 360 was the best move, because now MS has to support two ISAs in-house, and I don’t think it really buys them much extra horsepower. But I acknowledge that I may be entirely wrong on this, and in the end you’re better off asking a game developer who codes for both platforms which one he’d rather have.
It seems that AMD (+ ATI) is working on merging the CPU and GPU. At the same time, some projects, such as BrookGPU, try to exploit GPU power to crunch numbers. What is your point of view on the evolution of CPUs and GPUs?
Jon Stokes: I don’t really have much of an idea where this is headed right now. I don’t think anyone does. I mean, you could do a coarse-grained merging, like AMD says it wants to do with a GPU core and a CPU core on one die, but I’m not convinced that this is really the best way to attack the problem. Ultimately, a “merged CPU/GPU” is probably going to be a NUMA, system-on-a-chip (SoC), heterogeneous multicore design, much like Cell.
I also think it’s possible to overhype the idea of merging these two components. Regular old per-thread performance on serially dependent code with low levels of task and data parallelism will remain important for the vast majority of computing workloads from here on out, so a lot of this talk of high degrees of task-level parallelism (i.e. homogeneous multicore) and data-level parallelism (i.e. GPUs and heterogeneous multicore, like Cell) is really about the high-performance computing market, at least in the near-term.
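The distinction between serially dependent and data-parallel code can be sketched in a few lines (an illustrative example of mine; Python threads are used only to show the structure, since CPython’s GIL means real speedups would need processes or another language):

```python
from concurrent.futures import ThreadPoolExecutor

def step(x):
    return (x * 31 + 7) % 1_000_003

def serial_chain(x, n):
    """Serially dependent: each step needs the previous result,
    so extra cores cannot help -- per-thread performance rules."""
    for _ in range(n):
        x = step(x)
    return x

def data_parallel(xs):
    """Data-parallel: every element is independent, so the work
    maps cleanly onto many cores (or Cell's SPEs, or a GPU)."""
    with ThreadPoolExecutor(max_workers=4) as ex:
        return list(ex.map(step, xs))

xs = list(range(1000))
# Same results regardless of how many workers split the map:
assert data_parallel(xs) == [step(x) for x in xs]
print(serial_chain(1, 5))
```

Most everyday software looks like `serial_chain`; the multicore and GPU pitch is aimed at workloads shaped like `data_parallel`, which is the point being made about the near-term HPC focus.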
At any rate, right now we’re all sort of in a “wait and see” mode with respect to a lot of this stuff, because CPU/GPU and some of the other ideas out there right now look a lot like solutions in search of a problem.