Apple Computer plans to discuss how it will incorporate HyperTransport, a rapid chip-to-chip communications technology, into future computers later this month at its developer conference.
before it rolls off Steve’s tongue. Why talk to developers about it if it is not going to be in the next release of the Mac?
Mac developers have been urged to optimize for executable size because the systems are so bandwidth-starved. It’ll be nice to see gcc stretch its legs with O2 optimizations at least. I’ve heard IBM and Apple have been working on a PPC 970 optimized version of GCC 3.3. Wish I could afford to attend the WWDC.
I’ve always had the impression that bandwidth is measured in bits. But the article clearly states 6-12 gigabytes as the theoretical bandwidth achievable with HyperTransport.
RE: No more optimize for size:-)
Even with more CPU/memory bandwidth, we still have hard drives being a bottleneck; it’s been that way for a long time now. SATA can boost bandwidth in that area, but the hard drives themselves still need to be designed for performance. Software developers would still have to be conscious of application sizes, not for disk storage’s sake but for bandwidth’s sake, and even more so when we start working with 64-bit binaries.
It’s always bytes with bandwidth… at least for processors. Machines generally grab chunks of data.
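To put the bits-versus-bytes question in numbers, here is a minimal sketch that converts the 6-12 gigabyte range quoted from the article into gigabits. It assumes the figures are per-second rates and that a byte is 8 bits; the function name is made up for illustration.

```python
# Convert the article's quoted bandwidth range from gigabytes to gigabits.
# Assumption: the quoted figures are per-second rates.
BITS_PER_BYTE = 8

def gigabytes_to_gigabits(gigabytes_per_second):
    """Return the same rate expressed in gigabits per second."""
    return gigabytes_per_second * BITS_PER_BYTE

for gb in (6, 12):
    print(f"{gb} GB/s = {gigabytes_to_gigabits(gb)} Gbit/s")
```

Either unit describes the same link; the byte figure is simply one eighth of the bit figure.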
And on CNet no less. This sounds like an unlikely rumor to “invent”, so I’m really beginning to think that the next-generation PowerMac roll-out is truly about to begin. I can hardly wait!
Jared
Apple will likely not use HyperTransport in the same manner that computers using Opteron chips use it, said Kevin Krewell, senior editor of the Microprocessor Report, who could not confirm Apple’s plans. In Opteron, HyperTransport connects the processor to main memory.
Umm, totally and completely wrong?
For one, in Opteron systems HyperTransport is only used to access the memory controller on a different processor. The processor’s own memory controller is *on die*, not down a HyperTransport link to another chip.
Second, Apple probably will use HyperTransport in exactly the same manner as AMD: to provide non-uniform memory access in SMP systems. HyperTransport can be used to link two chipsets from two PPC970 processors, and will most likely scale in the same manner as AMD’s Opteron, 4 processors with a simple crossbar architecture, 8 by extending that crossbar two additional blocks, and so on.
The Cupertino, Calif.-based company will use HyperTransport as a high-speed link between the two processors that make up the chipset in new desktop Macintoshes, sources said.
I don’t know what to make of this. The PPC970’s chipset (for UP systems) will most likely consist of two chips: one which contains the memory controller and is attached to the 970’s front side bus, connected via HyperTransport to another chipset (similar to the one in the Opteron) which will bridge between HyperTransport and the AGP/PCI busses. Perhaps these chips are the two “processors” to which they refer.
One thing to keep in mind is that the PPC970 does not contain an internal HyperTransport controller like the Opteron. The PPC970’s bus is synchronous, double data rate, and bidirectional, and will run at either 800 or 900MHz (400 or 450MHz DDR).
Yep, Apple at least. Last year on their job boards they had openings for people to work on the PPC side of GCC to make it a better PPC compiler. Amazing what future product plans you can learn by looking at job sites.
When I read “HyperTransport” all I could think of was AMD Opteron on the Mac. But that would be rumor.
Thanks for the info above on the 970, Bascule.
Vic
This conference is going to be big…and it seems it may be even if you take out Steve’s Reality Distortion Field 🙂
They have to. The 970 doesn’t support HyperTransport and it also doesn’t have a built-in memory controller. AMD is not just using HT for CPU-CPU communication but also for CPU-PCI communications.
Thus, in an AMD system the HT nodes are (CPU+Memory) – [CPU+Memory]… – (PCI Interface) – (Graphics Interface) – (Anything else you want fast communications with).
The 970 is designed for SMP use with no local memory. Thus, Apple will need to create a (memory controller + HT) chip. The 970 directly supports up to 4 CPUs in a group. Given this information Apple can create the following HT nodes: (4 CPUs + Memory) – (PCI Interface) – (Graphics Interface) – (Anything else…).
Because HT is a standard interconnect, the prices for various modules should drop quickly. AMD is already using PCI modules for the Hammer family of chips. Nvidia already has a graphics module used in the Xbox.
Motherboard design becomes very simple (cheaper). All you have to do is design each module independently and then use HT to tie them together. Each manufacturer will be responsible for its own module design. Thus if I want to supply a direct-connect Fibre Channel controller, I would just lay out all the chips required, document how to program for the set, and sell the design along with the chips. Motherboard designers could then just grab my design, make room on the board for my chips, and then wire it up to an open HT port.
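The module idea can be sketched as a simple chain of nodes. This is only an illustration: the node names follow the post, but treating the nodes as a strictly linear chain, and the `hops` and `attach` helpers, are assumptions made for the example rather than anything from AMD or Apple.

```python
# Node names follow the post; the linear-chain model itself is an assumption.
AMD_CHAIN = ["CPU0+Memory", "CPU1+Memory", "PCI Interface", "Graphics Interface"]
APPLE_CHAIN = ["4 CPUs + Memory", "PCI Interface", "Graphics Interface"]

def hops(chain, src, dst):
    """Count the HT links crossed between two nodes in a linear chain."""
    return abs(chain.index(dst) - chain.index(src))

def attach(chain, module):
    """Return a new chain with a module wired to the open port at the end."""
    return chain + [module]

# A third-party module (hypothetical) is added without redesigning the others.
with_fc = attach(APPLE_CHAIN, "Fibre Channel Controller")
print(with_fc)
print(hops(with_fc, "4 CPUs + Memory", "Fibre Channel Controller"))
```

The point of the sketch is that adding a module changes only the chain, not the existing nodes, which is the cost argument made above.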
The 970 is designed for SMP use with no local memory. Thus, Apple will need to create a (memory controller + HT) chip. The 970 directly supports up to 4 CPUs in a group. Given this information Apple can create the following HT nodes: (4 CPUs + Memory) – (PCI Interface) – (Graphics Interface) – (Anything else…).
The rumors of the dual processor motherboard indicated there was likely a separate memory bank for each processor. If that were true, how would that fit in with HT?
In Opteron, there’s a memory controller on each chip, so each chip can have (and on the Tyan motherboard does have) its own memory. The HT link allows one processor to talk to the other processor’s memory controller and get the data it needs. It’s NUMA, not unlike a big SGI system with CrayLink cables.
As mentioned above, the 970 doesn’t have an onboard memory controller (nor HT), but Apple/IBM could use a similar layout with separate memory controller chips. The advantage is that you’re separating memory out, so you can have each processor accessing memory at the same time (presumably as long as the processors are accessing different banks of memory).
DJ: Think of it this way. Apple can design a CPU module that would include 1 to 4 CPUs, memory, and an HT interface to the rest of the system. Each memory controller would handle the L3 cache for the attached memory and feed the CPUs and HT interface. Apple can then use multiple CPU modules to increase the system’s capability.
Suppose, for example, that Apple’s memory controller only handles 8GB of RAM. Using 1 module, the max system would be 4 CPUs and 8GB RAM. Using 2 modules, the max system would be 8 CPUs and 16GB RAM. Using 3…
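The scaling in that example is plain multiplication. A minimal sketch, assuming the post’s figures (4 CPUs per module, a hypothetical 8GB controller limit) and that capacity grows linearly with the number of modules; the function name is invented for illustration:

```python
CPUS_PER_MODULE = 4      # from the post: up to 4 CPUs in a group
RAM_GB_PER_MODULE = 8    # the post's hypothetical memory controller limit

def max_system(modules):
    """Return (max CPUs, max RAM in GB), assuming linear scaling per module."""
    return modules * CPUS_PER_MODULE, modules * RAM_GB_PER_MODULE

for modules in (1, 2):
    cpus, ram_gb = max_system(modules)
    print(f"{modules} module(s): {cpus} CPUs, {ram_gb}GB RAM")
```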
Matt: I wouldn’t expect the first generation of 970 Macs to support the multi-module configuration. Having separate memory for each CPU is great; however, to make the most of it the OS has to: 1) know which physical address range each CPU has, 2) schedule jobs to run on only 1 CPU, 3) be able to move starved jobs off a busy CPU to an idle CPU, and 4) allocate memory for a job in the controlling CPU’s memory range. While all this is possible, I don’t think Apple is going to make it time-wise. Apple is fighting the 32-bit to 64-bit move (the kernel has never run on a 64-bit system) and GCC’s poor support for the PPC chipsets (GCC is not very good at handling non-x86 processors).
Thus, I think Apple is just going to make it work with this release and then work on NUMA support for the next release.
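The four OS requirements listed above can be sketched as a toy model. This is not how XNU or any real kernel does it; the `Node` class, the bump allocator, and the least-loaded placement rule are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One CPU plus the physical address range of its local memory (hypothetical)."""
    cpu: int
    base: int            # first physical address owned by this CPU
    size: int            # bytes of local memory
    next_free: int = 0   # offset used by a simple bump allocator
    jobs: list = field(default_factory=list)

    def allocate(self, nbytes):
        """Point 4: hand out memory from this CPU's own range."""
        if self.next_free + nbytes > self.size:
            raise MemoryError(f"node {self.cpu} is out of local memory")
        addr = self.base + self.next_free
        self.next_free += nbytes
        return addr

def owner_of(nodes, addr):
    """Point 1: map a physical address back to the CPU that owns it."""
    for node in nodes:
        if node.base <= addr < node.base + node.size:
            return node.cpu
    raise ValueError("address not backed by any node")

def place(nodes, job):
    """Point 2: pin a new job to a single CPU (here, the least loaded one)."""
    node = min(nodes, key=lambda n: len(n.jobs))
    node.jobs.append(job)
    return node

def rebalance(nodes):
    """Point 3: move one job from the busiest CPU to an idle CPU, if there is one."""
    busiest = max(nodes, key=lambda n: len(n.jobs))
    idle = min(nodes, key=lambda n: len(n.jobs))
    if not idle.jobs and len(busiest.jobs) > 1:
        job = busiest.jobs.pop()
        idle.jobs.append(job)
        return job
    return None

GIB = 1 << 30
nodes = [Node(cpu=0, base=0, size=4 * GIB), Node(cpu=1, base=4 * GIB, size=4 * GIB)]
home = place(nodes, "render")
addr = home.allocate(1 << 20)
print(f"job on CPU {home.cpu}, memory owned by CPU {owner_of(nodes, addr)}")
```

A job that is moved by `rebalance` keeps its old memory in this model, so its accesses become remote. That cost is part of why real NUMA support takes more than a scheduler tweak.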
As mentioned above, the 970 doesn’t have an onboard memory controller (nor HT), but Apple/IBM could use a similar layout with separate memory controller chips. The advantage is that you’re separating memory out, so you can have each processor accessing memory at the same time (presumably as long as the processors are accessing different banks of memory).
This chip is expected to be manufactured by AMD, containing much of the same logic as the Opteron itself (as far as the memory controller and HyperTransport controller go).
Matt: I wouldn’t expect the first generation of 970 Macs to support the multi-module configuration. Having separate memory for each CPU is great; however, to make the most of it the OS has to: 1) know which physical address range each CPU has, 2) schedule jobs to run on only 1 CPU, 3) be able to move starved jobs off a busy CPU to an idle CPU, and 4) allocate memory for a job in the controlling CPU’s memory range. While all this is possible, I don’t think Apple is going to make it time-wise. Apple is fighting the 32-bit to 64-bit move (the kernel has never run on a 64-bit system) and GCC’s poor support for the PPC chipsets (GCC is not very good at handling non-x86 processors).
Okay, a number of points here:
As far as scheduling goes, Mach was designed from the ground up with SMP scalability in mind. One of the advantages of a microkernel architecture is that there’s no low-level locking required… since everything is (typically) handled with message passing, scaling a microkernel across multiple processors becomes relatively simple. FreeBSD is using a highly tuned version of what was originally the Mach VMM, and so Apple took the FreeBSD unified buffer cache and attached it to XNU’s Mach VMM (why Apple won’t simply take the FreeBSD VMM and refactor it to work with XNU I’m not really certain…) and added the FreeBSD VFS. Since all this source was from FreeBSD 3.x and 4.x, that code also contains Giant, which means there’s an SMP scalability issue with its VFS, but otherwise the kernel is designed for excellent SMP scalability.
Now, the issue at hand is that Apple is going to use HyperTransport for SMP on PPC970-based systems. The only addition this will require to XNU is NUMA support. This seems like an extremely likely addition to the kernel for Panther (and will probably be one of the things Apple discusses at the conference), because if HyperTransport were simply used as the interconnect between what are essentially the north and south bridge chipsets, there would be no need for OS support. Furthermore, due to the complexity of feeding a synchronous bus with packetized data (especially a bandwidth-hungry one like AGP 8x), it would seem foolish to use HyperTransport simply for interconnect between the chipsets without plans to use it for SMP.
Most likely the new Macs will consist of PPC970 processors from IBM and a HyperTransport/memory controller chipset from AMD as well as a HyperTransport to AGP/PCI bridge chipset from AMD, as was the general feeling of Apple and AMD’s involvement from this OSnews article:
http://www.osnews.com/story.php?news_id=3363
You seem well informed on the PPC 970, but your bus info is incorrect. You state the bus is bidirectional, while in truth it is 2x450MHz UNI-directional (hence 2x) on the top model, possibly less on lesser models: one is outgoing and one incoming, if it helps you to think of it that way. I try to be educated on these things as well; just a friendly correction.
(what the title says)
You seem well informed on the PPC 970, but your bus info is incorrect. You state the bus is bidirectional, while in truth it is 2x450MHz UNI-directional (hence 2x) on the top model, possibly less on lesser models: one is outgoing and one incoming, if it helps you to think of it that way. I try
That is actually old information, and unfortunately I cannot find anything more recent on it than this, which agrees with you:
http://www.arstechnica.com/cpu/03q1/ppc970/ppc970-4.html
The most recent information I have read on the PPC970’s bus is this, although I cannot find anything to support it:
* The PPC970’s bus does not *necessarily* run at half the core clock (contrary to what Ars Technica was saying)
* The PPC970’s bus is composed of two 32-bit channels and supports 3 operating modes:
64-bits dedicated to input
32-bits dedicated to input, 32-bits dedicated to output
64-bits dedicated to output
This information is coming from an updated Ars Technica article I read but now cannot find. Please let me know if you know the URL.
I am unable to find the original article from which I got my information, so I went straight to IBM’s web site for the answer, and it appears that anonmac is correct:
http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/A2CE393ABF2…
Here’s the relevant portion:
The 970’s FSB consists of two 32-bit unidirectional buses, one for loads and the other for stores. Technically, they’re source-synchronous point-to-point interfaces instead of true multidrop buses. In that respect, they’re similar to the Alpha EV6-derived FSB on Athlon XP processors. But the 970 drives its FSB at a phenomenally high effective clock speed: up to 900MHz. This compares with 533MHz for the fastest Pentium 4 and 333MHz for the fastest Athlon XP.
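For a sense of scale, the raw peak rate implied by those figures is easy to work out. This sketch assumes one 32-bit transfer per effective clock on each bus and ignores any address, command, or protocol overhead, so real throughput would be lower; the function name is made up for illustration.

```python
BUS_WIDTH_BITS = 32           # each unidirectional bus, per the quoted passage
EFFECTIVE_CLOCK_HZ = 900e6    # "up to 900MHz" effective clock

def peak_bytes_per_second(width_bits, clock_hz):
    """Raw peak rate, assuming one full-width transfer per effective clock."""
    return width_bits / 8 * clock_hz

per_direction = peak_bytes_per_second(BUS_WIDTH_BITS, EFFECTIVE_CLOCK_HZ)
print(f"per direction: {per_direction / 1e9:.1f} GB/s")
print(f"both directions combined: {2 * per_direction / 1e9:.1f} GB/s")
```

Because the two buses are unidirectional, the combined figure is only reached when loads and stores are both in flight at the same time.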