posted by Dmitry Abrosimov on Mon 14th Apr 2003 22:28 UTC

"Use of Intel Integrated Performance Primitives"
Intel Integrated Performance Primitives (IPP) [3] is a software library that provides a variety of multimedia functions for developing high performance applications. This software includes a number of library segments, each of which is highly optimized for the use with one of the Intel processors (including the Intel Pentium 4 processor, Intel Itanium processor, and Itanium 2 processor). The IPP library is very effective for processing data arrays, such as vectors, matrices, images, etc. In this case application performance rate may increase highly. The other purpose of the IPP library is to simplify code writing. IPP includes functions that are often used in different calculation regions, such as vector and matrix algebra, image development, etc. To avoid bulky code writing, for example, multiplication of two matrixes, a developer may write only one line of code, which is a call to the corresponding library function.

Code samples of the force components (4-4a) calculation function (ordinary and IPP variants) are presented below.

Common part

/* structures definition */

typedef struct _Vect3D
{
    float x;
    float y;
    float z;
} Vect3D F;

typedef struct _Coord3D
{
    float* x;
    float* y;
    float* z;
    float* M;
} Coord3D C;
Ordinary code
/* arrays allocation */

    float         
    *x  = (float*)malloc( N * sizeof(float) ),
    *y  = (float*)malloc( N * sizeof(float) ),
    *z  = (float*)malloc( N * sizeof(float) ),
    *M  = (float*)malloc( N * sizeof(float) );

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    C.x  = x;
    C.y  = y;
    C.z  = z;
    C.M  = M;
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    Force3D( &F, N, i, &C );
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

void Force_3D( Vect3D* F, int N, int i, Coord3D* C )
{
    int j;
    float Fx = 0.0f, Fy = 0.0f, Fz = 0.0f;
    float xi = C->x[i], yi = C->y[i], zi = C->z[i];

    for( j=0; jx[j], yj = C->y[j], zj = C->z[j], mj = C->M[j];
        xj -= xi;
        yj -= yi;
        zj -= zi;
        r2  = xj*xj + yj*yj + zj*zj;
        r   = (float)sqrt( r2 );    /* distance between i and j stars */
        r2 *= r;
        r2  = mj / r2 ;
        Fx += xj*r2;
        Fy += yj*r2;
        Fz += zj*r2;
    }

    F->x = Fx;
    F->y = Fy;
    F->z = Fz;
}
IPP use code
typedef struct _Buffer3D
{
    float* dx;
    float* dy;
    float* dz;
    float* R;
    float* b;
} Buffer3D B;
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
/* arrays allocation */

    float         
    *x  = ippsMalloc_32f( N ),
    *y  = ippsMalloc_32f( N ),
    *z  = ippsMalloc_32f( N ),
    *M  = ippsMalloc_32f( N ),

    *dx = ippsMalloc_32f( N ),
    *dy = ippsMalloc_32f( N ),
    *dz = ippsMalloc_32f( N ),
    *b  = ippsMalloc_32f( N ),
    *R  = ippsMalloc_32f( N );

    C.x  = x;
    C.y  = y;
    C.z  = z;
    C.M  = M;

    B.dx = dx;
    B.dy = dy;
    B.dz = dz;
    B.R  = R;
    B.b  = b;
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    Force3D_ipp( &F, N, i, &C, &B );
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

void Force_3D_ipp( Vect3D* F, int N, int i, Coord3D* C, Buffer3D* B )
{
    IppStatus res;

    res = ippsSubC_32f   ( C->x, C->x[i], B->dx, N );
    res = ippsSubC_32f   ( C->y, C->y[i], B->dy, N );
    res = ippsSubC_32f   ( C->z, C->z[i], B->dz, N );
    res = ippsSqr_32f    ( B->dx, B->R, N );
    res = ippsSqr_32f    ( B->dy, B->b, N );
    res = ippsAdd_32f_I  ( B->b,  B->R, N );
    res = ippsSqr_32f    ( B->dz, B->b, N );
    res = ippsAdd_32f_I  ( B->b,  B->R, N );
    res = ippsInvSqrt_32f_A11( B->R, B->R, N );
    res = ippsSqr_32f    ( B->R,  B->b, N );
    res = ippsMul_32f_I  ( B->b,  B->R, N );
    res = ippsMul_32f_I  ( C->M,  B->R, N );
    res = ippsDotProd_32f( B->dx, B->R, N, &F->x );
    res = ippsDotProd_32f( B->dy, B->R, N, &F->y );
    res = ippsDotProd_32f( B->dz, B->R, N, &F->z );
}

To prove the efficiency of the IPP library, another performance test of the Kepler application was carried out. For this test computers based on two different processors, namely, a 700 MHz Intel? Pentium? III processor and a 2000 MHz Intel Pentium 4 processor were used. Two optimized modifications of the application were designed for the Pentium III processor and Pentium 4 processor respectively (these modifications differ from each other by the IPP library segments they use and by compiler presets). Both programs were compiled by the Intel C++ compiler [4]. The Pentium III processor does not support some Pentium 4 processor instructions, thus it was tested only with the first modification. All the tests were performed with number of stars N = 2000 and picture resolution 512x512 pixels.

The results of the tests are presented in Fig. 5. The figures above columns show averaged performance rate in frames per second; the blue columns correspond to the IPP disabled mode, the yellow ones correspond to the IPP enabled mode. The figure shows that the use of the IPP library in the case of substantial floating point computations (as in the Kepler application) may enhance the application performance up to 2.5 times. If integer calculations prevail (especially 1- or 2-bytes data development), this enhancement can be even greater. Fig. 5. Kepler application performance rate in frames per second shown on (from left to right): the Pentium III processor (1), Pentium 4 processor using the Pentium III processor-optimized library (2), and Pentium 4 processor using the Pentium 4 processor-optimized library (3)

References
1. http://developer.intel.com/software/products/ipp/ipp30/
2. http://www.nasa.gov/
3. http://developer.intel.com/software/products/compilers/
4. Aarseth S.J. Direct integration methods of the N-body problem. - ASS, 1971, v. 14, p. 118-132.

Copyright 2002 © Intel Corporation.
All Rights Reserved.

About the Author
Dmitry Abrosimov is a software engineer on the Intel Performance Libraries software development team at Intel Corporation, based in the Nighzy Novgorod Russia software lab. Dmitry graduated from Gorky state university in 1979 and earned a Ph. D. degree (Russian Candidate of sciences degree) in physics and mathematics in 1996. Prior to joining Intel, Dmitry worked at the Applied Physics Institute in Nizhny Novgorod, Russia, specializing in underwater acoustics.

Table of contents
  1. "Mathematical Model"
  2. "Simulation results"
  3. "Use of Intel Integrated Performance Primitives"
e p (0)    27 Comment(s)