The performance results show that the threaded code generated by the compiler achieved up to 1.28x speedups on a HT-enabled single-processor system and up to 2.23x speedup on a HT-enabled dual-processor system. Our three key observations are: (a) the threaded code generated by the Intel compiler yields a good performance gain with the parallelization guided by OpenMP pragmas in multimedia applications; (b) exploiting thread-level parallelism (TLP) causes inter-thread interference in caches and places greater demands on the memory system, however, Hyper-Threading technology hides the additional latency and delivers a good performance gain of the whole program; (c) Hyper-Threading technology is effective on exploiting both task-parallelism and data-parallelism inherent in multimedia applications.
1. Introduction
Simultaneous Multi-Threading (SMT) [7, 15] was proposed to allow multiple threads to compete for and share all processor’s resources such as caches, execution units, control logic, buses and memory systems. The Hyper-Threading technology (HT) [4] brings the SMT idea to the Intel architectures and makes a single physical processor appear as two logical processors with duplicated architecture state, but with shared physical execution resources. This allows two threads from a single application or two separate applications to execute in parallel, increasing processor utilization and reducing the impact of memory latency by overlapping the latency of one thread with the execution of another Hyper-Threading technology-enabled processors offer significant performance improvements for applications with a high degree of thread-level parallelism without sacrificing compatibility with the existing software or single-threaded performance. These potential performance gains are only obtained, however, if an application is efficiently multithreaded. The Intel C++/Fortran compilers support OpenMP directive- and pragma-guided parallelization, which significantly increase the domain of various applications amenable to effective parallelism. A typical example is that users can use OpenMP parallel sections to develop an application where section-A calls an integer-intensive routine and where section-B calls a floating-point intensive routine, so the performance improvement is obtained by scheduling section-A and section-B onto two different logical processors that share the same physical processor to fully utilize processor resources with the Hyper-Threading technology. The OpenMP directives or pragmas have emerged as the de facto standard of expressing thread-level parallelism in applications as they substantially simplify the notoriously complex task of writing multithreaded applications. The OpenMP 2.0 standard API [6, 9] supports a multi-platform, shared-memory, parallel programming paradigm in C++/C and Fortran95 on all popular operating systems such as Windows NT, Linux, and Unix. This paper describes threaded code generation techniques for exploiting parallelism explicitly expressed by OpenMP pragmas/directives. To validate the effectiveness of our threaded code generation and optimization techniques, we also characterize and study two workloads of multimedia applications parallelized with OpenMP pragmas and compiled with the Intel OpenMP C++ compiler on Intel Hyper-Threading architecture. Two multimedia workloads, including Support Vector Machine (SVM) and Audio-Visual Speech Recognition (AVSR), are optimized for the Intel Pentium 4 processor. One of our goals is to better explain the performance gains that are possible in the media applications through exploring the use of Hyper-Threading technology with the Intel compiler.
The remainder of this article is organized as follows. We first give a high-level overview of Hyper-Threading technology. We then present threaded code generation and optimization techniques developed in the Intel C++ and Fortran product compilers for the OpenMP pragma or directive guided parallelization, which includes the exploitation of nested parallelism, and workqueuing model extension for exploiting irregular-parallelism. Starting from Section 4, we characterize and study two workloads of multimedia applications parallelized with OpenMP pragmas and compiled with the Intel OpenMP C++ compiler on Hyper-Threading technology enabled Intel architectures. Finally, we show the performance results of two multimedia applications.
2. Hyper-Threading Technology
Hyper-Threading technology brings the concept of Simultaneous Multi-Threading (SMT) to Intel Architecture. Hyper-Threading technology makes a single physical processor appear as two logical processors; the physical execution resources are shared and the architecture state is duplicated for the two logical processors [4]. From a software or architecture perspective, this means operating systems and user programs can schedule threads to logical CPUs as they would on multiple physical CPUs. From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources [4].

With the Hyper-Threading technology, the majority of execution resources are shared by two architecture states (or two logical processors). Rapid execution engine process instructions from both threads simultaneously. The Fetch and Deliver engine and Reorder and Retire block partition some of the resources to alternate between the two intra-threads. In short, the Hyper-Threading technology improves performance of multi-threaded programs by increasing the processor utilization of the on-chip resources available in the Intel NetBurst™ microarchitecture.
- "Hyper-Threading Technology for Multimedia Apps, Page 1"
- "Hyper-Threading Technology for Multimedia Apps, Page 2"
- "Hyper-Threading Technology for Multimedia Apps, Page 3"
- "Hyper-Threading Technology for Multimedia Apps, Page 4"
- "Hyper-Threading Technology for Multimedia Apps, Page 5"


