Geek Patrol benchmarked a MacBook Pro and a PowerBook G4. Not surprisingly, they concluded: “The MacBook Pro outperformed the PowerBook G4 in almost every benchmark. Since all of the MacBook Pro’s baseline scores are over 100, it even outperformed our baseline system, a Power Mac G5 1.6GHz! The only benchmark where the PowerBook G4 outperformed the MacBook Pro, Stdlib Allocate, depends more on library performance than raw hardware performance. If you’re upgrading from a PowerBook G4 (or even an early Power Mac G5), you’ll certainly notice how much faster the MacBook Pro is, especially with multi-threaded applications.”
These benchmarks are excellent. Hopefully this is indicative of a bright future for Apple. I had my doubts about the chip switch, but so far things seem to have gone fairly seamlessly.
I am currently using a G4 PowerBook, which has been very smooth in operation and always processes things quickly. If the operational quality is as good as those benchmarks indicate, the new Apple machines must be great performers.
I’m not interested at all in upgrading to the MacBook Pro (no need – Tiger is doing quite well on the G4 I have), but it is nice to see that the Series A MacBooks are doing so well.
The benchmarks give some indication of both systems’ performance; however, synthetic benchmarks access data in nice, predictable patterns, which is not the case in real-world applications.
Real-world applications have random data access, which makes performance depend more on the system’s cache/memory architecture, as well as on the processor’s pipeline design and branch-prediction performance.
-B
So why are multithreaded apps particularly good? Is it all about branch prediction, or what?
In any event, the results are gratifying. I expected the Intel chips to be pokey as molasses. I am glad I was wrong. Clearly, MOT and IBM spent the last few years asleep at the wheel. Regrettable. They COULD have put Intel and AMD out of business if they had only TRIED HARDER. I am looking forward to seeing Vista vs. OS X benchmarks next year! They must be nervous in Redmond!
Why are multithreaded apps extra good? Because there’s a second CPU core. That means roughly double the speed for programs that can use two processing threads. It also means improved responsiveness, since with multiple cores the system can always give full attention to the user while still getting its work done.
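To make that concrete, here’s a minimal C sketch (illustrative code, not from the benchmark) that splits a workload across two POSIX threads; on a dual-core machine the two halves run concurrently, so the job finishes in roughly half the wall-clock time:

/* Split an embarrassingly parallel sum across two threads.
   Build with: gcc -O2 -pthread sum2.c (file name is illustrative). */
#include <pthread.h>
#include <stdio.h>

#define N 100000000L            /* total items of work */

static double partial[2];       /* one slot per thread, so no locking needed */

static void *worker(void *arg)
{
    long id = (long)arg;        /* 0 = first half, 1 = second half */
    double s = 0.0;
    for (long i = id * (N / 2); i < (id + 1) * (N / 2); i++)
        s += (double)i * 0.5;   /* stand-in for real per-item work */
    partial[id] = s;
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    for (long id = 0; id < 2; id++)          /* one thread per core */
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (int id = 0; id < 2; id++)
        pthread_join(t[id], NULL);           /* wait for both halves */
    printf("total = %f\n", partial[0] + partial[1]);
    return 0;
}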
Regrettable. They COULD have put Intel and AMD out of business if they had only TRIED HARDER.
Right. Because the fact that Intel and AMD have some of the brightest minds in the business, and probably spend an order of magnitude more on R&D, means that IBM and Moto could have beaten them just by “trying harder”?
It’s supposed to be faster, a new chip, etc. What I really want to know is whether the Intel build of OS X is faster than the PPC one. Isn’t the x86 build native anyway?
I just need some pro apps for my new Mac purchase… then I will buy.
What I really want to know is whether the Intel build of OS X is faster than the PPC one. Isn’t the x86 build native anyway?
Yeah, it’s all native. Pro Apps aren’t there yet, but I heard some of them would go Intel native in March.
Yeah, but isn’t the Mac’s PowerPC supposed to be faster than and superior to Intel’s?
Sorry, not trying to be a troll, just trying to prove a point.
Millions of Mac users have been misled, and we don’t really need this latest information to prove it, either.
What about battery life? 200-300% more performance is useless if the battery life is cut in half. Any reviews with more insight on this issue?
It’s supposed to be roughly the same. The battery capacity is a lot greater, but the time it can stay on is virtually the same as the PowerBook G4’s.
Terrible results from it:
malkiaBook:~/Desktop/Geekbench Preview 2 malkia$ ./geekbench
Geekbench Information
Version: Geekbench Preview 2 (r73)
Compiler: GCC 4.0.1 (Apple Computer, Inc. build 5250)
System Information
OS: Version 10.3.9 (Build 7W98)
Model: PowerBook6,3
Motherboard: PowerBook6,3
CPU: PowerPC G4 (7450)
CPU ID: 18, 11
CPU Count (Physical): 1
CPU Count (Logical): 1
CPU Frequency: 933 MHz
Bus Frequency: 132 MHz
Memory: 640 MB
CPU Integer Performance
Emulate 6502 19 (1 thread, 35.48 megahertz)
Emulate 6502 27 (4 threads, 50.97 megahertz)
Blowfish 55 (1 thread, 76.4 megabytes/sec)
Blowfish 71 (4 threads, 107.8 megabytes/sec)
bzip2 Compress 16 (1 thread, 2.96 megabytes/sec)
bzip2 Compress 26 (4 threads, 4.876 megabytes/sec)
bzip2 Decompress 21 (1 thread, 8.816 megabytes/sec)
bzip2 Decompress 28 (4 threads, 12.25 megabytes/sec)
CPU Floating Point Performance
Mandelbrot 17 (1 thread, 119.7 megaflops)
Mandelbrot 24 (4 threads, 168.1 megaflops)
Memory Performance
Latency 67 (1 thread, 155 nanoseconds/load)
Read Sequential 17 (1 thread, 126.7 megabytes/sec)
Write Sequential 35 (1 thread, 207.5 megabytes/sec)
Stdlib Allocate 34 (1 thread, 26.96 kiloallocs/sec)
Stdlib Allocate 61 (4 threads, 47.9 kiloallocs/sec)
Stdlib Write 10 (1 thread, 163.1 megabytes/sec)
Stdlib Copy 12 (1 thread, 97.36 megabytes/sec)
Stream Performance
Stream Copy 10 (1 thread, 152.8 megabytes/sec)
Stream Scale 10 (1 thread, 147.3 megabytes/sec)
Stream Add 4 (1 thread, 67.25 megabytes/sec)
Stream Triad 4 (1 thread, 61.5 megabytes/sec)
Submit results to geekpatrol.ca? [Y/n] ^C
Some of these scores are more than 10x slower than the Intel PowerMac…
And here are the results with the power cable unplugged (lower clock speed, although Geekbench still reports 933 MHz):
CPU Integer Performance
Emulate 6502 11 (1 thread, 20.3 megahertz)
Emulate 6502 23 (4 threads, 43.39 megahertz)
Blowfish 37 (1 thread, 52.23 megabytes/sec)
Blowfish 53 (4 threads, 81.75 megabytes/sec)
bzip2 Compress 11 (1 thread, 2.06 megabytes/sec)
bzip2 Compress 21 (4 threads, 3.99 megabytes/sec)
bzip2 Decompress 13 (1 thread, 5.608 megabytes/sec)
bzip2 Decompress 22 (4 threads, 9.636 megabytes/sec)
CPU Floating Point Performance
Mandelbrot 10 (1 thread, 72.79 megaflops)
Mandelbrot 20 (4 threads, 144.7 megaflops)
Memory Performance
Latency 50 (1 thread, 206.4 nanoseconds/load)
Read Sequential 13 (1 thread, 96.74 megabytes/sec)
Write Sequential 28 (1 thread, 165.9 megabytes/sec)
Stdlib Allocate 23 (1 thread, 18.26 kiloallocs/sec)
Stdlib Allocate 49 (4 threads, 39.06 kiloallocs/sec)
Stdlib Write 8 (1 thread, 137.6 megabytes/sec)
Stdlib Copy 10 (1 thread, 80.06 megabytes/sec)
Stream Performance
Stream Copy 7 (1 thread, 104.6 megabytes/sec)
Stream Scale 7 (1 thread, 101.6 megabytes/sec)
Stream Add 3 (1 thread, 52.27 megabytes/sec)
Stream Triad 2 (1 thread, 35.99 megabytes/sec)
They are about 2x-3x slower compared to when the power cable is plugged into the laptop (of course I can disable that, but I would like to conserve power).
It’s still the same benchmark they used to compare various machines three weeks ago, and it still sucks. Let’s wait for real applications with universal binaries to truly measure the real-life performance of the new Macs.
I’ll give you one: PostgreSQL. In my rudimentary testing, I’m seeing nearly a 150% gain in performance comparing an iMac Core Duo (1.63) and a Power Mac G5 dual 1.8, both with 1.5GB RAM. I would be willing to bet the numbers are even more startling compared to a PowerBook G4, or even an iMac G5, and that’s with no x86-specific optimization, just a tar xzf, ./configure, make, make install, initdb. Same test DB. The hard drive speeds should in theory give the Power Mac G5 an advantage, though that doesn’t appear to be the case.
Sure, that’s more informative, thanks. I don’t doubt that the new Intel processors are faster, but I’d like to know how much faster they are. Synthetic benchmarks are not that great at showing exact performance in, say, Photoshop.
The PostgreSQL performance hike is less surprising than it might be.
Consider first that GCC is far more heavily optimized for x86 than for PPC. (Undoubtedly recent x86 chips also represent a significant improvement.)
Secondly, PostgreSQL is a sophisticated DBMS rather than a SQL veneer over ISAM. In other words, it hits the CPU harder than some DBMSes one could name, and is therefore likely to reflect increases in CPU speed more sharply.
Beyond those factors, it should be borne in mind that the bottleneck for most DBMSes, most of the time, is not the CPU but rather the performance of the hard disk subsystem (which is not as simple as we all sometimes assume).
“Nearly 150%” – great, I believe you. However, I would be interested to see how the comparisons work out where identical hard disk models are used.
Comparing recent IDE drives with large caches and independent intelligence (formerly seen only in high-spec SCSI models) against drives a few years old really could predispose a comparison to a particular result.
I read in my computer architecture course that RISC processors are much faster than CISC. Intel’s processors are CISC and IBM’s PowerPC processors are RISC. So how the hell is this Intel processor faster than IBM’s?
Current x86 processor designs are basically RISC at the core, with some translation hardware thrown on top to interpret the CISC instructions. But saying all RISC processors are faster than CISC is simplistic nonsense. If one architecture has had a lot of work done optimizing it, it will be faster than another with fewer resources thrown at it.
Edited to add the following:
Not to mention the G4 is a several-years-old single-core CPU and the Core Duo is a brand-new dual-core CPU. Plus, many people would argue that GCC produces more optimized code for x86 processors.
Quote: “Not to mention the G4 is a several-years-old single-core CPU and the Core Duo is a brand-new dual-core CPU.”
True, but the G4 is the immediate predecessor to the Core Duo in Apple’s line of laptops, so it’s not exactly an invalid comparison.
I don’t mind the comparison; it’s quite useful. I was just replying to the original poster, who asked how the hell this Intel processor is faster than IBM’s.
Ah! Fair enough. My comments keep appearing flat, so I can’t always tell who’s replying to whom.
Where things start to get interesting, I think, is when Intel begin making chips specifically for Apple.
Intel’s main problem up until now has been that they are hugely dictated to by Microsoft. It doesn’t matter how many interesting new ways they have of improving their processors; if Microsoft isn’t willing to support a feature, then it’s a waste of time.
Now, however, Intel can float these new ideas with Apple. OS X has far fewer “legacy” requirements than Windows, and Apple will probably be more interested to hear what Intel have to say about new chip features.
Where things start to get interesting, I think, is when Intel begin making chips specifically for Apple.
Dude. Dell doesn’t even get custom Intel chips — what makes you think Apple will? Motorola and IBM weren’t happy supplying custom chips for Apple’s relatively small volume, what makes you think Intel will be, especially since they have much bigger fish to fry?
Dude, Dell doesn’t get custom chips because they make Windows boxes just the same as everyone else, so what’s the point?
For the case to be similar Dell would have to make the OS as well, which they don’t.
Dude, Intel sells processors to almost every PC OEM in the world. Every single one of those OEMs could sell any fringe operating system they wanted, and it would make just as little difference to Intel’s sales. Dude, co-developing processor and operating system functionality would be interesting for a variety of classes of operating system and programming language research, but is virtually meaningless for classic operating system designs like XNU.
Dude, I think you seriously misunderstand the costs associated with developing and fabricating these processors.
Dude, I realize you think that Intel has been liberated by Apple and all but you should really keep in mind that Intel has a fiduciary responsibility to its investors.
Dude, maybe you could tell everyone exactly what development opportunities you think Apple provides that Intel has been prevented from pursuing by Microsoft.
After all, Dude, Apple seemingly got stuck with (32-bit) x86 because Intel wasn’t going to add x86-64 support to its mobile line until Merom. Now Apple will have to transition from x86 to x86-64 and encourage all of its ISVs to support both in the future. Talk about cutting-edge.
Get a Life,
you make an interesting comment: “Intel has a fiduciary responsibility to its investors”.
Does that mean it shouldn’t work with Apple to build the best CPU/chipset Apple can make, just because Dell can’t use it? Let me remind you that Intel does have competition (AMD), and any good ideas from Apple help Intel compete, even if Microsoft sits on the sidelines and refuses to be moved into the future.
An Apple/Intel collaboration would benefit the shareholder.
Even Dell would eventually benefit because, as we’ve seen, Apple’s ideas that meet the approval of Microsoft’s developers get copied into their products.
Dell/Microsoft users should be kissing Steve’s A$$.
Running a proprietary line of processors for a company that sells fewer computers in a year than AMD sells processors in a quarter is a waste of money, because the marginal cost of the processors would be prohibitive to Apple’s profit margins. It would certainly be a lapse of judgement on the part of Intel, who would be the one getting stuck with all of the problems if Apple changed its mind. Apple certainly wasn’t looking to incur those costs from Motorola or IBM.
It is stunning that you honestly just suggested that Apple is going to think of some mind-blowingly stunning architecture for Intel. That takes a previously unlikely scenario and puts it somewhere into outer space. Please list the names of the CPU designers currently working for Apple Computer’s CPU design team.
Nobody’s talking about a “proprietary” line of processors, merely that Apple is collaborating with Intel on the next generation of processors. Intel will sell those to everyone; whether Microsoft chooses to support the features is another story.
Intel has never been dependent on Microsoft to implement its processor designs. If anything it’s constrained by having to make all of the software that runs on its architecture performant on future processor designs. That hasn’t changed at all.
Does that mean it shouldn’t work with Apple to build the best CPU/chipset Apple can make, just because Dell can’t use it?
And what exactly does Apple know about CPU design?
I don’t work for Apple; you’d have to ask the people in the Apple/Intel group.
Hey calm down.
I don’t think I’m misunderstanding anything. The original comment, not from me BTW, was that Intel may make some custom chips for Apple. The reply said that “even Dell doesn’t get custom features”. All I said was that Apple is fundamentally different from Dell and everyone else.
Apple makes the box AND writes the OS. No one else does that. Therefore they are different from Dell. Please note, in case you are offended, that I did not say “better”.
Therefore Apple hardware may be the perfect place to provide a proving ground for processor features. A bit like the Mac users were the proving ground for the iPod/iTunes environment.
BTW I didn’t start the “Dude” thing, the previous poster did.
I don’t think I’m misunderstanding anything. The original comment, not from me BTW, was that Intel may make some custom chips for Apple. The reply said that “even Dell doesn’t get custom features”. All I said was that Apple is fundamentally different from Dell and everyone else.
Except that they aren’t. The costs associated with doing custom jobs just for Apple are significantly higher than the amortized cost of doing development in the main processor lines. Apple is no different from every other OEM that purchases processors from Intel, except that they have their own operating system, which caters to a relatively trivial market niche. Dell could switch to selling EROS tomorrow, and that wouldn’t make Intel any more likely to take special orders from Dell.
Apple makes the box AND writes the OS. No one else does that.
Incurring the cost of software development doesn’t make your money any greener.
Please note in case you are offended, I did not say “better”.
I am deeply offended by your insinuation that Dell is of lesser significance than Apple, despite my having never purchased anything from Dell except a couple of LCDs. Please continue to worry that my objection to this way of thinking is emotional in nature.
Therefore Apple hardware may be the perfect place to provide a proving ground for processor features.
If Apple Computer wanted to pay for the privilege of being Intel’s guinea pig, it could buy Itanium processors. Many of the architectural changes Intel could make to its processors would simply break compatibility with the x86, and further target operating system and programming language design goals that are not compatible with Apple’s technologies. Tacking on more SSE instructions benefits Apple Computer no more than any other OEM. All of the meaningful improvements to Intel’s processors are architectural changes that don’t alter the ISA of the processor and thus make Apple’s platform irrelevant.
A bit like the Mac users were the proving ground for the iPod/iTunes environment.
I fail to see what the business decisions of the music cartel have in common with Intel.
BTW I didn’t start the “Dude” thing, the previous poster did.
I use the thread view, so I saw both of your comments. I just wanted in on the absurdity of the dude action. It’s 1993, dude!
“Apple is no different from every other OEM that purchases processors from Intel, except that they have their own operating system…”
Yes, that’s what I said.
“Incurring the cost of software development doesn’t make your money any greener. ”
Correct, but it makes your needs potentially different, and millions of chips per year might still be worth a run. Having said that, I don’t actually know the numbers; apparently you do. With current processors, what IS a profitable-sized run?
“I am deeply offended by your insinuation that Dell is of lesser significance than Apple”
I did not say “lesser significance”, I said different.
“If Apple Computer wanted to pay for the privilege of being Intel’s guinea pig…”
I wouldn’t characterise it like that.
“I fail to see what the business decisions of the music cartel have in common with Intel.”
I’m sorry, I must not have explained myself fully; I was using an analogy. In this analogy it was not the music cartels that were significant but the development of the system within the Mac environment before offering it to the Windows world.
When Apple released the iPod and established the iTunes Music Store, it was basically for Mac users only. Mac users provided a great testbed for the viability of the system. By the time it was released for Windows it was quite mature. It was just an analogy; I did not mean to imply that chips and iPods are exactly the same thing. I’m just saying that Macs provide a controlled microcosm for proving the usefulness of features.
Moving back to chips, my point is that in the case of Dell, Microsoft dictates what features of a CPU are used. In the case of Apple, Apple dictates what features of a CPU are used.
Case in point: the TPM is used by Apple right now but is not used by any released OS that runs on a Dell. No, I am not trying to imply that TPM is just for Apple; yes, I know Vista will use it.
Having said that, however, Apple is clearly in a position to adopt newer processor features faster, because there are fewer people to involve.
I’m not saying Intel WILL create custom features for Apple, but who knows, they might. I don’t believe it’s as ridiculous as you’re making it out to be.
“I just wanted in on the absurdity of the dude action. ”
So I see.
Why do I think Apple may get custom chips? Read my post above properly!
There would be no point in Intel making custom chips for Dell, since the OS would still be Windows, unable to take advantage of any new features in such a Dell-Intel chip.
But Apple, on the other hand, are making the OS as well as the computers, so they can talk to Intel about creating chip features specifically for them.
I’m not saying it will happen, just that it will be interesting if it does.
Yeah, IMHO PPC will always be better than x86. Someone just needs to make a better G5, or some other type of PPC. I personally find performance with PPC + Linux (like YDL or Slackintosh, preferably Slackintosh) way better than any of my x86 *nix boxen.
Yeah, and let’s get a benchmark at the same clock speed to see how fast Intel really is. It’s like putting a new Ferrari up against an old Mazda RX-7. It can hold its own, but… it’s kind of dated.
As everyone knows, the only important part of a processor is its ISA!
RISC versus CISC is an engineering tradeoff; it’s not a matter of one being better than the other. Moreover, the last x86 chip to be truly CISC was the Pentium, which is a decade old. All Intel and AMD processors since the Pentium Pro have been RISC processors internally. And PowerPC never was a particularly pure RISC.
Actually, x86 is an interesting example of the tradeoff shifting around beneath the hood of the processor. At first there was a move to RISC, with CPUs like the Pentium Pro and Pentium II breaking everything down into internal RISC-like operations; a single x86 instruction could produce several internal micro-operations on a Pentium Pro. Then CISC gradually crept back into the processor core. The latest x86 processors fuse micro-ops together, treating them as a single entity for the purposes of instruction scheduling.
While the original Pentium would execute
ADD [ESI],EAX
as a single operation, the Pentium Pro would execute it as three micro-ops, while the Pentium M again executes it as a single fused micro-op.
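To spell that out in C (my own illustration; the step names and breakdown are textbook-style, not Intel’s actual micro-op encoding), that one read-modify-write instruction does three RISC-like things:

/* ADD [ESI],EAX expressed as three simple micro-op-like steps
   (illustrative only). */
void add_mem_reg(int *esi, int eax)  /* esi = pointer register, eax = value */
{
    int tmp = *esi;   /* uop 1: LOAD the memory operand         */
    tmp = tmp + eax;  /* uop 2: ADD the register operand        */
    *esi = tmp;       /* uop 3: STORE the result back to memory */
}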
The reasons for this tradeoff are something you should learn later in your computer architecture course. Basically, simple instructions are easy to execute. The hardware for them is smaller, and can be made to cycle quickly. That’s why the purest RISCs, like the Alpha, have mostly 3-address instructions (two input, one output) that correspond directly to a single operation executable very quickly in hardware. On the other hand, instruction dispatch slots and reorder buffer entries are expensive. The size and complexity of these circuits grow quadratically with the number of instructions handled. Anything that reduces the number of instructions necessary to accomplish the same thing, such as using CISC-ier instructions that do more than one thing, reduces how big these expensive parts of the chip have to be.
“Fusing” micro-ops is done to tie dependent and independent instructions together to subvert the typical out-of-order scheduling of the underlying micro-ops. One of its advantages can be a reduction in power consumption, depending on the design of the RS, by reducing the number of activations associated with the macro-op. Another is to improve the general utilization of entries and decoder throughput.
I don’t know what you’re talking about when you say that the “size and complexity” of the ROB and dispatch buffer grow quadratically with the number of “instructions handled.” An explanation would be helpful there.
The point of micro-ops fusion though isn’t to change the number of entries available in the ROB and RS (which consume a large portion of the chip real estate by design) but rather to improve the utilization through increased density and scheduling to improve IPC.
It’s possible that some might find http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&… to be interesting.
“Fusing” micro-ops is done to tie dependent and independent instructions together to subvert the typical out-of-order scheduling of the underlying micro-ops.
Right. Because that simplifies the scheduling compared to treating the micro-ops separately.
Another is to improve the general utilization of entries and decoder throughput.
Which is just another way to put the same thing. With micro-ops fusion, you can have more effective instructions in flight with the same number of tracked entries. Or, you can use a smaller scheduler with fewer entries and still keep an equivalent number of instructions in flight.
I don’t know what you’re talking about when you say that the “size and complexity” of the ROB and dispatch buffer grow quadratically with the number of “instructions handled.”
The size of the structures used for out-of-order execution grows quadratically with the depth of the instruction window. Fusion of micro-ops can increase the effective size of the instruction window without increasing the physical number of OOO instructions in flight. Intel says its micro-ops fusion reduces the number of micro-ops in flight by something like 10%. This increases the effective number of instructions in flight, given the same number of ROB entries, by about 11%. To achieve the same effect by increasing the number of entries in the ROB, the area of the circuitry would increase by almost 25%.
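For what it’s worth, here is the back-of-the-envelope arithmetic behind those percentages, assuming (as above) that area grows with the square of the window depth:

\[
\frac{1}{1 - 0.10} \approx 1.11, \qquad 1.11^2 \approx 1.23
\]

Cutting micro-ops by 10% lets the same ROB entries cover an ~11% deeper effective window; growing the window ~11% by adding entries instead would cost roughly a quarter more area, which is where the “almost 25%” comes from, give or take rounding.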
The point of micro-ops fusion though isn’t to change the number of entries available in the ROB and RS (which consume a large portion of the chip real estate by design) but rather to improve the utilization through increased density and scheduling to improve IPC.
I didn’t say it increased the number of entries available in the ROB. I said it increases the number of instructions handled by the ROB. If you dispatch a LOAD-OP as a single entry, you can effectively increase your issue rate without actually increasing the issue width of the processor. If you go further and track fused-ops with a single ROB entry (as I believe the P-M does), you increase the effective size of the ROB with the same number of entries.
Right. Because that simplifies the scheduling compared to treating the micro-ops separately.
I would not say that it simplifies scheduling. It more effectively takes advantage of knowledge of the macro-ops to improve the overall scheduling. It might be argued that it’s simpler than the necessary logic to obtain the same result by the micro-op scheduler.
The size of the structures used for out-of-order execution grows quadratically with the depth of the instruction window.
The area increases quadratically with the issue-width.
I didn’t say it increased the number of entries available in the ROB.
No, you said:
Anything that reduces the number of instructions necessary to accomplish the same thing, such as using CISC-ier instructions that do more than one thing, reduces how big these expensive parts of the chip have to be.
While trivially true that reducing the number of micro-ops by introducing fused micro-ops could permit fewer ROB entries, this isn’t the intended purpose. In fact the Pentium M increases the width of the instruction window over the Pentium 3. The intended purpose is to obtain more work out of the area Intel is already paying for.
I would not say that it simplifies scheduling. It more effectively takes advantage of knowledge of the macro-ops to improve the overall scheduling.
Load-Op fusion simplifies scheduling because it collapses the number of possible dependencies between the instructions. The CPU knows that the OP depends on the LOAD, and the result of the LOAD is consumed only by the OP.
The area increases quadratically with the issue-width.
It increases quadratically with the size of the instruction window as well, to the extent that complexity is a good measure of required area.
While trivially true that reducing the number of micro-ops by introducing fused micro-ops could permit fewer ROB entries, this isn’t the intended purpose.
It’s not “trivially true”. Saying “micro-ops fusion can reduce the size of the ROB needed to achieve the same number of instructions in flight” and saying “micro-ops fusion can increase the number of instructions in flight with the same size ROB” are completely equivalent.
In fact the Pentium M increases the width of the instruction window over the Pentium 3.
The Pentium-M also has a higher IPC target than the P3. Read my sentence again and try to understand. “…reduces how big these… have to be.” If you have a certain IPC target for your design, and you determine you need to maintain a certain number of instructions in flight to achieve that IPC target, then you have to have a ROB of a particular size. If you use op-fusion, you can reduce how big your ROB has to be to still meet your target. The P-M achieves IPC comparable to the Opteron, while using a much smaller instruction window. Micro-ops fusion is one of the things that allows them to get away with doing this.
Load-Op fusion simplifies scheduling because it collapses the number of possible dependencies between the instructions.
It doesn’t change the dependencies between the instructions (the fused micro-ops are still two micro-ops, defused at the RS); the introduction of fused micro-ops removes the need for the micro-op scheduler to determine this dependency, which reduces time complexity but doesn’t reduce the complexity of the scheduler.
It’s not “trivially true”. Saying “micro-ops fusion can reduce the size of the ROB needed to achieve the same number of instructions in flight” and saying “micro-ops fusion can increase the number of instructions in flight with the same size ROB” are completely equivalent.
And not especially meaningful. The purpose of Intel’s micro-ops fusion isn’t to reduce the size of the ROB. It’s designed to reduce the pressure of the micro-op scheduler and improve density. It’s a performance/power win. The entire emphasis of your post was on the reduction of cost associated with using a smaller area of the die. Just increasing the size of the instruction window 10% (that isn’t how the instruction window would be scaled but we’ll just talk hypothetically) wouldn’t have the same impact. The processor isn’t simpler as a result of introducing micro-ops fusion, it’s actually more complex.
It doesn’t change the dependencies between the instructions (the fused micro-ops are still two micro-ops defused at the RS),
According to the information I’ve seen, the micro-ops stay fused until the RS dispatches them separately. This means the RS tracks fused micro-ops instead of unfused ones. This simplifies the determination of dependencies for the two instructions, and reduces the number of RS entries necessary to achieve the same-sized instruction window.
And not especially meaningful. The purpose of Intel’s micro-ops fusion isn’t to reduce the size of the ROB. It’s designed to reduce the pressure of the micro-op scheduler and improve density.
Of course it’s a way to reduce the size of the ROB (and RS). Improving density is not an end in itself. Why do you want to improve density? Why go to all the extra trouble in the decode stage to fuse micro-ops? So you can get away with fewer ROB and RS entries while achieving the same number of instructions in flight.
You don’t seem to understand what I’m trying to say. I’m not saying Intel introduced micro-ops fusion in order to make the ROB smaller in the PM versus the P3. I’m saying that they introduced micro-ops fusion in order to make the ROB smaller in the PM than it would otherwise have to be in order to achieve the same IPC.
It’s a performance/power win. The entire emphasis of your post was on the reduction of cost associated with using a smaller area of the die.
The focus of my post was on the reduction of circuit complexity in the crucial structures used for OOO execution. The size of these components is a good metric for evaluating performance, power, and cost, because all of these things are a function of size. A bigger ROB makes the die bigger, which costs more; it uses more power; and it can affect the clock rate by increasing the critical path length.
Informative. Thanks.
“Intel’s processors are CISC and IBM’s PowerPC processors are RISC. So how the hell is this Intel processor faster than IBM’s?”
This comment is an unfortunate relic of bourgeois thinking which your education has not entirely eliminated. You need to work harder and understand better and reform your thoughts. Your teachers will help you with this problem.
At one time it was a helpful simplification for the understanding of the masses to explain that the Risc chips employed by the Party were superior to the Cisc chips employed by the counter-revolutionaries. This was true, like all the utterances of the Party. Now however that distinction has ceased to be of any importance, and references to it and to the former policy will only confuse the masses and should be avoided.
It is especially important not to engage in destructive criticism by scrutinising the pronouncements of the Party for apparent inconsistencies. To do this will only draw attention to your failure to understand the historical process.
What matters is looking forward. We have now, as we always had, the best chips, the best architecture, and above all, the best leader….
The original PPC had more of a place in the world of computing than the current offerings from IBM and Freescale, as the “clock frequency race” really did not matter in the days of, say, the PPC604.
In those days it was more a matter of raw efficiency: how many instructions could be executed per clock cycle.
Today, everything is so overly complex that people may still argue the G4, with its AltiVec instructions, is faster at some types of jobs than even, say, the Core Duo…
However, there are so many factors that together make up the picture of what processing power is, which opens the door to very different designs, each with its own significance (Sun’s Niagara, Intel’s Core Duo and IBM’s Power5+, to name a few).
The scale is going to tip over to one side at one point, and back to the other at another point in time… It’s a sort of yin and yang of computing.
‘RISC is better’, ‘CISC is RISC’, ‘short pipelines vs. longer ones’, the focus on adequate ‘branch prediction’… It all depends on the times and the average use…
Intel might very well be faster than PPC as long as its strengths are accentuated; however, the opposite might also be true at the same time, without making the opposition rubbish…
In short: There is a time for everything!
I like my iBook 1.2; it is perfectly well suited for what I do. I am not a fast typist, and I play precious few games, nothing deeper than Civ or SMAC in Classic (addict!) when I am on a long commute.
But when taxes come in, this box is going to my wife. I loved Apple and AIM and the RISC vs. CISC argument up until 1998; then the AIM consortium sort of lost its way. I have to let it go and have my favorite Unix at full speed. There really is no other way to say it. No, I do not think it will make me type 4x faster, and I do not use the pro apps, but it will be all the right strengths and all of the future headroom without any real weaknesses. Apple probably has an emulation layer (HW) to let Windows run like Classic, and I do not think that will see the (public) light of day.
83-93 68K
93-2000 PPC
2005-? Intel
Does anybody even miss OS 7.x?
Sorry, but I’m confused. Are you buying another Mac or going over to Windows?
LOL. Of course I am staying Mac. I could not imagine having to use Windows. Not to be a hater, but Win32 is completely irrelevant for what I need to do. And once I get used to the security, and to having *everything* I need access to in ‘~/’, Windows seems alien.
Once in a while I boot OS 9 on one of the older G3s to practice carrier landings in F/A-18 Hornet (by Graphsim) or get shot down by MiG-29s in Falcon 4.0. All of my other games run natively in 10 for PPC.
So for work and fun I am all about the Mac OS. It is about the experience, not the chip.
Apple should have switched to Intel years ago.
Since all of the MacBook Pro’s baseline scores are over 100, it even outperformed our baseline system, a Power Mac G5 1.6GHz!
They’re surprised that a brand-new dual-core 2GHz CPU with four times the L2 cache outperformed a four-year-old single-core 1.6GHz CPU. Wow, I’m totally blown away myself.
We all know about these 1D benchmarks… I wanna see 2D graphs plotting CPU performance against the heat in my lap and the noise in my room.
I never see benchmarks on temperature or decibels. I have a small Crusoe-based mini-laptop. A bit slow, but at least I can touch it! (After 5 minutes of typing on a PowerBook G4 my hands start sweating; it’s so hot!) I can also listen to MP3s and actually hear them rather than those noisy fans, and even sleep with the computer turned on.
That’s what I want. Can’t trust any benchmark if they don’t give me the information I want.
7-year-old chip beaten by 2-month-old chip… film at 11.
Seriously, if the new MacBook didn’t outperform the G4, that would have been shocking and sad. The G4 was getting outdated by the time the G5s came out (which was years later than expected, what with Moto’s misstarts and so forth).
It’s very impressive how long the life of the G4 was stretched, but it was still a very old chip. Though on the flip side, the Core Duo has some PIII roots.
Seriously, if the new MacBook didn’t outperform the G4, that would have been shocking and sad. The G4 was getting outdated by the time the G5s came out (which was years later than expected, what with Moto’s misstarts and so forth).
I think the interesting part of the article is the actual margins it won by, not so much that it won. I was expecting better performance, sure, but the numbers were a little better than what I was expecting.
That the new Intel machine ate the G4 is no surprise. It’s nothing to do with the age of the core or RISC vs. CISC, though.
The G4 just hasn’t been updated at the same rate as the Intel processors and it shows, badly.
The G4 in that laptop is built on 130nm silicon; that’s two process generations behind the Core Duo.
The G4 also has a front side bus which crawls along by today’s standards.
These differences mean the G4 machine runs slower and hotter than the Intel one. There are more modern G4s, such as the 7448, with more cache and built on 90nm, but they arrived too late. The G4 is due to be replaced by the 8641D, which has on-die memory controllers and runs a lot cooler, but it isn’t due until the end of the year.
Apple have been very clever switching now, because they have been able to accentuate the difference by pitting a new generation of dual-core Intel parts against a previous generation of single-core PPC parts. If they’d waited, the differences would have been considerably smaller.
IIRC, the 1.5 GHz PowerBook uses the MPC7447A, which is a 90nm part. It was the first PowerBook to do so.
IIRC, the 1.5 GHz PowerBook uses the MPC7447A, which is a 90nm part.
Freescale have it listed as a 130nm part (you have to go digging to find it though).
what exactly does Apple know about CPU design?
I once heard a rumour that Keith Diefendorff is working for them again; if so, they know rather a lot – last time he was there he designed AltiVec…
…that in benchmarks, a Pentium 4 is faster than a Pentium 3?
The first crop of P4s included a 1.6GHz model which was easily beaten in benchmarks by a 1GHz P3. Not that it’s relevant.