Linked by Intel Researchers (for OSNews) on Wed 12th Mar 2003 23:25 UTC
Intel processors with Hyper-Threading technology can improve application performance by letting a single processor behave as if it were two, executing instructions from different threads in parallel rather than serially. However, the potential performance improvement can only be obtained if an application is multithreaded through parallelization techniques. This article presents the multithreaded code generation and optimization techniques developed for the Intel C++/Fortran compiler. We conduct a performance study of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel compiler on Hyper-Threading (HT) technology-enabled Intel single-processor and multi-processor systems.
good times ahead for all
by deb-man on Thu 13th Mar 2003 03:04 UTC

that use this tech on Linux, BSD, or Windows. When will IBM make this tech work in the 970-class processors?

OpenMP, HT, etc
by Michael on Thu 13th Mar 2003 03:18 UTC

OpenMP works best on programs where you have a dataset that can be split into chunks and then the chunks assigned to worker threads. Much "multimedia" is great for OpenMP because you can easily segment the data.
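
To make the "split into chunks" idea concrete, here is a minimal sketch in C. It is not taken from the article: the `brighten` function and the 8-bit grayscale frame are assumptions chosen for illustration. Every iteration touches exactly one pixel, so the loop can be divided among worker threads without any coordination.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: brighten an 8-bit grayscale frame in place.
   Each iteration reads and writes one pixel only, so iterations are
   independent of one another. */
void brighten(unsigned char *pixels, size_t count, int delta)
{
    assert(pixels != NULL || count == 0);

    /* With OpenMP enabled, the iteration range is divided into chunks
       and each chunk is handed to a worker thread. Without OpenMP the
       pragma is ignored and the loop simply runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)count; ++i) {
        int v = pixels[i] + delta;
        if (v > 255) v = 255;   /* clamp to the 8-bit range */
        if (v < 0)   v = 0;
        pixels[i] = (unsigned char)v;
    }
}
```

The result is the same whether the loop runs on one thread or several, which is what makes this kind of data easy to segment.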

What is problematic, though, is that language-level support for high-performance threading is immature compared to the compilers and some of the high-performance libraries (OpenMP, MPI, et al.).

We have many C++ programming frameworks that are not even thread-safe, much less amenable to thread-based optimization. Most current C++ GUI frameworks are classic examples of frameworks that were not designed with high-performance threading in mind.

As hardware threading support evolves, I would expect languages to start tracking these developments, and we will see new advanced parallelism constructs in our familiar languages.

To date, I know of only Erlang as having implemented pervasive multi-processing.

10-20 % increase in speed on a single chip. Is it worth all the trouble?
by bill gate's boyfriend on Thu 13th Mar 2003 04:06 UTC

good effort but it's not in that 50-100 percent speed increase range where you say "wow, real cool".

It's nice.

20% increase will give 80% benefit
by deb-man on Thu 13th Mar 2003 04:51 UTC

it is the law!!!

@ bill gate's boyfriend
by Anonymous on Thu 13th Mar 2003 06:30 UTC

Think again...:

An Intel P4 2800 costs 439 E, a 3.06 costs 699 E. Now do the math and figure out how many percent that is for the little increase in speed. I have seen videos from THG where two systems running head to head with video applications are equally fast. One of them is a plain 3.6 GHz P4, the other a 3.06 with HT enabled. Now, what does this tell us? In the above case you pay almost 60% extra for only about 9% more clock speed. With HT, you get 20% free and you don't care..? - So be it..

by me on Thu 13th Mar 2003 08:48 UTC

I thought pseudo-code should be written so it was easily readable. ;)

By Contributing Editor Intel Researchers
by robUx4 on Thu 13th Mar 2003 09:26 UTC

Who is that ?

RE: By Contributing Editor Intel Researchers
by Eugenia on Thu 13th Mar 2003 09:29 UTC

Read on the last page as to who is who. There are 5 of them, there was no space in the db field to mention all of them by name.

Re: Contributing Editor Intel Researchers
by mmu_man on Thu 13th Mar 2003 09:35 UTC

> Who is that ?
Wondering too who uses OSNews as Intel PR ;)

To me that looks a bit too "scientific" for the average OSNews reader, but maybe I'm wrong ;)
I'll definitely read it all sometime.

I really wonder what this thing would give with a multithreading-crazy BeOS (where we don't need optimizing compilers)...
Btw, I recently noticed VideoLan Client on BeOS was even more multithreaded than native media players ;)
(Is it on other platforms too?)

Re: Contributing Editor Intel Researchers
by Eugenia on Thu 13th Mar 2003 09:37 UTC

>To me that looks a bit too "scientific" for the average OSNews reader

I don't think so. Supposedly most of our readers are actually programmers/engineers:

complex article
by Interfacer on Thu 13th Mar 2003 09:44 UTC

I read the first 2 pages, and then decided that I will try to read it again some other time, when I have more time to digest it.

Some more layman's explanation alongside the examples would have been nice, though.

On a side note: I would very much like to run BeOS or OpenBeOS on a dual-CPU hyperthreading system, for example a dual Xeon. This would allow 4 threads in parallel.
I already use BeOS on my dual PIII and it rocks.

On the other hand, it might be worth waiting for a 32/64-bit Xeon. I still think Intel will release 32-bit-compatible CPUs once AMD starts selling them. They had better, because I will not fork out $4000 for a single-CPU Itanium 2.


was this article meant to be readable (to me)?
by Charlie on Thu 13th Mar 2003 11:02 UTC

This looks like a draft for a peer-reviewed journal paper, and hence targeted at a different audience than me and presumably a lot of others. Can't comment on the facts as I got lost on the 2nd para! I'm sadly not a computer scientist. I've got no problems with such stuff appearing on OSNEWS though - makes a break from looking at log files :-)

Ars Technica has a great introduction to hyperthreading
by Roel Schroeven on Thu 13th Mar 2003 11:40 UTC

If you want to know what hyperthreading is all about, in understandable English, read the great article at Ars Technica:

Re: Re: Contributing Editor Intel Researchers
by mmu_man on Thu 13th Mar 2003 12:48 UTC

> >To me that looks a bit too "scientific" for the average OSNews reader
> I don't think so. Supposedly most of our readers are actually programmers/engineers:

Yes, though this looks more like a Ph.D. paper ;)
(don't have anything against that btw)

Re: who wrote this?
by Jim on Thu 13th Mar 2003 13:05 UTC

Page 5 lists the references and authors. The article was written by five PhDs (among other degrees). Intel has always been very good about providing in-depth documentation about their microprocessor architecture. You have to wonder about its usefulness to AMD sometimes.

Re: Contributing Editor Intel Researchers
by Sagres on Thu 13th Mar 2003 13:57 UTC

> To me that looks a bit too "scientific" for the average OSNews reader, but maybe I'm wrong ;)

Don't let the math formulas fool you. These kinds of scientific articles always have them, but no one really reads them, unless of course there are no source code examples and we really have to ;)

Instruction streams...
by lamo on Thu 13th Mar 2003 16:16 UTC

I am surprised that OpenMP helps. It would seem the best case would be two unrelated instruction streams. OpenMP is usually used to create threads doing the same operations, so it would seem that they would be competing for the same resources. Perhaps this makes up for the lack of registers in a P4. Does having the second state allow more data to flow to the same resources? Anyone know?

I did not see mention of the negatives. Is it just die space or do single streams get a performance/latency hit?

I would imagine that, with the poor state of SMP in most OSS kernels, pretending to have two processors could easily more than offset that performance increase.
I have seen lots of tests where 2 processors slow the Linux kernel down instead of speeding it up.

But perhaps on a HPTC machine having a separate virtual processor to handle os requests might not be too bad.

Anyone know what the big P4 Xeon Linux clusters do about hyperthreading?

So/So Performance
by Hank on Thu 13th Mar 2003 18:57 UTC

Let's take a close look at their results, referring to Figures 13 and 14. Hyperthreading gives them at best a 13% speed boost over the non-hyperthreading scenario, as the single-processor case shows. The inherent parallelism of the operation is evident from the nearly factor-of-2 speed improvement in the dual-processor case. The speedup with hyperthreaded dual processors is simply an aggregate of the ten percent speed gains within each processor. What Figure 13 therefore shows is that even where the algorithm parallelizes excellently, as evidenced by the boost in the DP score, we still get only marginal speed improvements from hyperthreading.
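
As a back-of-the-envelope check on that "aggregate" reading, here is a small sketch in C. It assumes the multi-processor scaling and the per-processor hyperthreading gain simply multiply, which is a simplification of my own and not a model from the article. The function name `combined_speedup` is hypothetical, and any values passed in are illustrative rather than measurements read off the figures.

```c
#include <assert.h>

/* Estimated overall speedup relative to one processor without HT,
   ASSUMING the multi-processor scaling and the per-processor HT gain
   multiply (an illustrative simplification, not the article's model).
   mp_scaling: speedup from extra physical processors, e.g. 2.0
   ht_gain:    fractional gain from HT per processor, e.g. 0.10 */
double combined_speedup(double mp_scaling, double ht_gain)
{
    assert(mp_scaling > 0.0 && ht_gain >= 0.0);
    return mp_scaling * (1.0 + ht_gain);
}
```

Under that assumption, perfect dual-processor scaling (2.0) with a ten percent gain per processor (0.10) comes out at about 2.2 times the single-processor baseline, so nearly all of the improvement comes from the second physical processor.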

Figure 14 also shows an inherent problem with trying to fool the system into thinking there are four processors instead of only two. As it states, the algorithm is really only working on three processes simultaneously. The system, believing it has four full-fledged processors, therefore distributes idle tasks inefficiently among the two physical processors, in deference to the four simulated processors. This shows that there can be a functional decrease in speed in a hyperthreaded system. The single-processor hyperthreaded case for the algorithm used in Figure 14 did perform very well, but again, the dual-processor case makes it evident that the algorithm lends itself to parallelism.

This article therefore highlights two things in my mind:

1. OpenMP is effective in parallelizing algorithms "on the fly" so to speak.
2. Hyperthreading does increase performance, but not substantially.

Are there articles on simultaneous execution of threads doing completely different computations, rather than functionally parallel threads? For example, what kind of speedup would there be if one thread was doing the SVM calculation and the other was doing the AVSR one? Better still, what would happen if we distributed two threads for SVM execution and three for AVSR? Interesting thoughts...
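
For what it's worth, OpenMP can express that kind of setup with its `sections` construct, where each section is a different piece of work. The sketch below, in C, only shows the shape of such an experiment and says nothing about what speedup it would give. The two workloads are hypothetical stand-ins (one integer-heavy, one floating-point-heavy), not the SVM and AVSR codes from the article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical integer-heavy workload: 1^2 + 2^2 + ... + n^2. */
static long sum_of_squares(long n)
{
    long total = 0;
    for (long i = 1; i <= n; ++i)
        total += i * i;
    return total;
}

/* Hypothetical floating-point workload: 1/1 + 1/2 + ... + 1/n. */
static double harmonic_sum(long n)
{
    double total = 0.0;
    for (long i = 1; i <= n; ++i)
        total += 1.0 / (double)i;
    return total;
}

/* Run the two unrelated computations as separate OpenMP sections.
   With OpenMP enabled, each section may be executed by a different
   thread; without it, the pragmas are ignored and the two blocks
   run one after the other. */
void run_both(long n, long *int_result, double *fp_result)
{
    assert(int_result != NULL && fp_result != NULL);

    #pragma omp parallel sections
    {
        #pragma omp section
        { *int_result = sum_of_squares(n); }

        #pragma omp section
        { *fp_result = harmonic_sum(n); }
    }
}
```

Timing `run_both` with hyperthreading on and off would be one way to see whether unrelated instruction streams share a physical processor better than identical ones.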

OpenMP and Opteron support
by Guma on Fri 14th Mar 2003 16:39 UTC

Does anyone know if AMD has any tools for OpenMP in the works?
Is anyone else working on this? Will the Intel compiler (in 32-bit only :( ) work on Opteron?