Galaxy Dynamics Computer Simulation

Guest post by Dmitry Abrosimov 2003-04-14 3D 25 Comments

The paper considers a mathematical model of the behavior of an assembly of N stars. The ‘Kepler’ Microsoft Windows demo application based of this model enables to perform real-time simulation of star clusters dynamics for N~=2500. Such performance rate is possible through the use of the Intel Integrated Performance Primitives (IPP) library. The paper also estimates the efficiency of the IPP application and provides an example of C-code with the IPP functions calls. Computer-simulated images of the spiral galaxy forming process, as well as the real galaxies photos, are presented.

Introduction

Realistic simulation of star clusters dynamics, for example, the dynamics of galaxies, may be of interest to a broad sections of specialists – from game developers up to astrophysics scientists. While, in general, this task is not very difficult from the mathematical point of view (see section 1), it requires large computational resources if the number of stars N is big enough. Further you can see that the time of computing increases as the square of N, which presents the main difficulty for real-time simulation of star assemblages, even on modern computers. To solve the problem, we used the Intel Integrated Performance Primitives (IPP) software library [1], which enabled to increase the calculation speed and perform 2000 – 2500 stars assemblage behavior in real time. Destination and capabilities of this software are briefly described in in section 3.

The ‘Kepler’ Microsoft Windows demo application was developed using the mathematical models described in section 1. The application simulates the behavior of massive bodies assemblages like star clusters, galaxies, planetary systems, etc., in real time. At the beginning the user may set the following physical parameters:

– space dimension (2-D or 3-D);

– view angle (in 3-D case only);

– number of stars;

– gravity constant;

– initial spatial distribution of coordinates (root-mean-square (RMS) in X, Y and Z axes);

– time integration step (as the integration step decreases, the calculation accuracy becomes higher but the performance slows down);

– mean and RMS of random initial velocities;

– angular velocity of the conglomeration regular rotation.

In addition, the user may choose to either use or not use the Intel Integrated Performance Primitive library (see section 3). If IPP is used, the real-time performance rate (15-20 frames per second) for ~2500 stars assemblage may be achieved on a computer operating on a 2000 MHz Intel? Pentium?4 processor.

Fig. 2 describes an example of a spiral galaxy formation and evolution simulation. A prolate cluster of 2500 stars (Fig. 2a) was rotated counter-clockwise. Such an assemblage passes several stages of evolution and finally becomes an elliptic galaxy. To estimate the reality of this example, several photos of authentic spiral galaxies are presented in Fig. 3 (all images were taken from the NASA NSSDC web site [3] and are courtesy of the Space Telescope Science Institute).

Intel Integrated Performance Primitives (IPP) [3] is a software library that provides a variety of multimedia functions for developing high performance applications. This software includes a number of library segments, each of which is highly optimized for the use with one of the Intel processors (including the Intel Pentium 4 processor, Intel Itanium processor, and Itanium 2 processor). The IPP library is very effective for processing data arrays, such as vectors, matrices, images, etc. In this case application performance rate may increase highly. The other purpose of the IPP library is to simplify code writing. IPP includes functions that are often used in different calculation regions, such as vector and matrix algebra, image development, etc. To avoid bulky code writing, for example, multiplication of two matrixes, a developer may write only one line of code, which is a call to the corresponding library function.

Code samples of the force components (4-4a) calculation function (ordinary and IPP variants) are presented below.

Common part



/* structures definition */
typedef struct _Vect3D

{

    float x;

    float y;

    float z;

} Vect3D F;
typedef struct _Coord3D

{

    float* x;

    float* y;

    float* z;

    float* M;

} Coord3D C;

Ordinary code



/* arrays allocation */
    float

    *x  = (float*)malloc( N * sizeof(float) ),

    *y  = (float*)malloc( N * sizeof(float) ),

    *z  = (float*)malloc( N * sizeof(float) ),

    *M  = (float*)malloc( N * sizeof(float) );
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    C.x  = x;

    C.y  = y;

    C.z  = z;

    C.M  = M;

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Force3D( &F, N, i, &C );

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
void Force_3D( Vect3D* F, int N, int i, Coord3D* C )

{

    int j;

    float Fx = 0.0f, Fy = 0.0f, Fz = 0.0f;

    float xi = C->x[i], yi = C->y[i], zi = C->z[i];
    for( j=0; jx[j], yj = C->y[j], zj = C->z[j], mj = C->M[j];

        xj -= xi;

        yj -= yi;

        zj -= zi;

        r2  = xj*xj + yj*yj + zj*zj;

        r   = (float)sqrt( r2 );    /* distance between i and j stars */

        r2 *= r;

        r2  = mj / r2 ;

        Fx += xj*r2;

        Fy += yj*r2;

        Fz += zj*r2;

    }
    F->x = Fx;

    F->y = Fy;

    F->z = Fz;

}

IPP use code



typedef struct _Buffer3D

{

    float* dx;

    float* dy;

    float* dz;

    float* R;

    float* b;

} Buffer3D B;

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

/* arrays allocation */
    float

    *x  = ippsMalloc_32f( N ),

    *y  = ippsMalloc_32f( N ),

    *z  = ippsMalloc_32f( N ),

    *M  = ippsMalloc_32f( N ),
    *dx = ippsMalloc_32f( N ),

    *dy = ippsMalloc_32f( N ),

    *dz = ippsMalloc_32f( N ),

    *b  = ippsMalloc_32f( N ),

    *R  = ippsMalloc_32f( N );
    C.x  = x;

    C.y  = y;

    C.z  = z;

    C.M  = M;
    B.dx = dx;

    B.dy = dy;

    B.dz = dz;

    B.R  = R;

    B.b  = b;

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Force3D_ipp( &F, N, i, &C, &B );

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
void Force_3D_ipp( Vect3D* F, int N, int i, Coord3D* C, Buffer3D* B )

{

    IppStatus res;
    res = ippsSubC_32f   ( C->x, C->x[i], B->dx, N );

    res = ippsSubC_32f   ( C->y, C->y[i], B->dy, N );

    res = ippsSubC_32f   ( C->z, C->z[i], B->dz, N );

    res = ippsSqr_32f    ( B->dx, B->R, N );

    res = ippsSqr_32f    ( B->dy, B->b, N );

    res = ippsAdd_32f_I  ( B->b,  B->R, N );

    res = ippsSqr_32f    ( B->dz, B->b, N );

    res = ippsAdd_32f_I  ( B->b,  B->R, N );

    res = ippsInvSqrt_32f_A11( B->R, B->R, N );

    res = ippsSqr_32f    ( B->R,  B->b, N );

    res = ippsMul_32f_I  ( B->b,  B->R, N );

    res = ippsMul_32f_I  ( C->M,  B->R, N );

    res = ippsDotProd_32f( B->dx, B->R, N, &F->x );

    res = ippsDotProd_32f( B->dy, B->R, N, &F->y );

    res = ippsDotProd_32f( B->dz, B->R, N, &F->z );

}

To prove the efficiency of the IPP library, another performance test of the Kepler application was carried out. For this test computers based on two different processors, namely, a 700 MHz Intel? Pentium? III processor and a 2000 MHz Intel Pentium 4 processor were used. Two optimized modifications of the application were designed for the Pentium III processor and Pentium 4 processor respectively (these modifications differ from each other by the IPP library segments they use and by compiler presets). Both programs were compiled by the Intel C++ compiler [4]. The Pentium III processor does not support some Pentium 4 processor instructions, thus it was tested only with the first modification. All the tests were performed with number of stars N = 2000 and picture resolution 512×512 pixels.

The results of the tests are presented in Fig. 5. The figures above columns show averaged performance rate in frames per second; the blue columns correspond to the IPP disabled mode, the yellow ones correspond to the IPP enabled mode. The figure shows that the use of the IPP library in the case of substantial floating point computations (as in the Kepler application) may enhance the application performance up to 2.5 times. If integer calculations prevail (especially 1- or 2-bytes data development), this enhancement can be even greater.

Fig. 5. Kepler application performance rate in frames per second shown on (from left to right): the Pentium III processor (1), Pentium 4 processor using the Pentium III processor-optimized library (2), and Pentium 4 processor using the Pentium 4 processor-optimized library (3)

References
1. http://developer.intel.com/software/products/ipp/ipp30/
2. http://www.nasa.gov/
3. http://developer.intel.com/software/products/compilers/
4. Aarseth S.J. Direct integration methods of the N-body problem. – ASS, 1971, v. 14, p. 118-132.

About the Author
Dmitry Abrosimov is a software engineer on the Intel Performance Libraries software development team at Intel Corporation, based in the Nighzy Novgorod Russia software lab. Dmitry graduated from Gorky state university in 1979 and earned a Ph. D. degree (Russian Candidate of sciences degree) in physics and mathematics in 1996. Prior to joining Intel, Dmitry worked at the Applied Physics Institute in Nizhny Novgorod, Russia, specializing in underwater acoustics.

25 Comments

2003-04-15 12:00 am

Anonymous
The images are fuzzy to read, they hurt my eyes and brain. 🙁
2003-04-15 12:07 am

Anonymous
This makes for a nice change of pace in the types of articles posted to osnews.com I welcome more of the same.
2003-04-15 12:22 am

Anonymous
It is not the compression of the images that makes them harder to read. It is because I had to scale them down from 670pix to 600, because they wouldn’t fit on our design.

Are you sure your eyes hurt because of the image and not maybe because of the advanced math? ;-D
2003-04-15 12:24 am

Anonymous
… thumbnails wouldn’t look nice for such an article btw.

> I welcome more of the same.

Tomorrow we have one more such article.
2003-04-15 1:27 am

Anonymous
It’s (for me at least) a little difficult to fully understand the advantages of Intel’s IPP, but the article does a nice job, very interesting.

I now understand better why some 3D software (3d Studio ! maybe maya 4.5) started to appear more independent from the VGA card performance. But, since it’s closed source, I can be wrong and the cause of this independence might be something else.
2003-04-15 1:47 am

Anonymous
Eugenia, you seem to have applied a suboptimal scaling algorithm to the text that results in aliasing — jaggies — making it hard to read. Next time, try bicubic interpolation.

Ironic that this is mentioned in an article partly about IPP (http://www.intel.com/software/products/ipp/ipp30/), which among other things contains the implementation of some actual interpolated scaling algorithms (afaik only bilinear interpolation, at least when I last used it).

IPP is an extremely helpful toolkit. It covers anything from alpha blending to MPEG encoding, everything making use of Intel’s optimized, specialized instruction sets (MMX, SIMD and what not).

It’s unfortunate that Intel has not open-sourced what is arguably a potential sales-booster for Intel processors, but at $199 for a commercial license it’s not exactly expensive.
2003-04-15 1:51 am

Anonymous
>try bicubic interpolation

I use Bilinear Resample + light sharp filter. It is the best with Paint Shop Pro.
2003-04-15 2:07 am

Anonymous
Paint Shop Yuk, in my book.

Photoshop does bicubic, with style. As does, I’m sure, The Gimp and probably other imaging apps, like ThumbsPlus and ACDSee. Only with less panache. I would be happy to rescale stuff for you for future articles, should you not have access to these programs.
2003-04-15 3:14 am

Anonymous
Are you sure your eyes hurt because of the image and not maybe because of the advanced math? ;-D

Haha, nice catch.. 😛 Problem solved by save the images on HD and view in the gThumb by view as ‘zoom in’ (120% – 150%)..

Anyway, it’s nice article, I understand ok around 3/4, but not the math part. 😉
2003-04-15 6:01 am

Anonymous
all i can say is: What?
2003-04-15 7:37 am

Anonymous
For all who is really interested, I can send images with good resolution. Connect with me via e-mail.
2003-04-15 7:47 am

Anonymous
> It’s (for me at least) a little difficult to fully

> understand the advantages of Intel’s IPP…

2.5 times performance increase – isn’t it impressive? And it is not the limit.
2003-04-15 9:08 am

Anonymous
Hi

Nice article – makes a refreshing change! Two questions though…

I can see how the IPP may be useful for multimedia stuff but how much difference does it make for the array operations, over and above using either BLAS/LAPACK or Fortran 90 functions?
2003-04-15 9:17 am

Anonymous
2.5 times performance increase – isn’t it impressive? And it is not the limit.

That is very impressive. A few quick questions. Is te IPP an exclusively C library only? By that I mean that it has only C functions, with no C++ stuff there.

Also, does it work on other compilers, apart from hte Intel C++?
2003-04-15 11:39 am

Anonymous
1. Reference 4 should be replaced to the 2nd position. References 2 and 3 – to 3 and 4 respectively.

2. The symbol ‘(R)’ (registered trademark) is shown as ‘?’.
2003-04-15 12:07 pm

Anonymous
Try to answer some questions. IPP library has been written in C and Assembler. It contains a number of static and dynamic modules (.lib and .dll) which can be used with ANY C/C++ compiler without problems. Of course, you may use IPP with Fortran, if you write interfaces to necessary C-functions, as usual. But I’d like to suggest another Intel product – MKL (Math Kernel Library) http://www.intel.com/software/products/mkl/mkl52/ . It already contains Fortran interfaces for all functions.
2003-04-15 2:49 pm

Anonymous
2.5 times performance increase – isn’t it impressive? And it is not the limit.

__________________________________

The IPP library is very effective for processing data arrays, such as vectors, matrices, images, etc.

…

the IPP library is to simplify code writing. IPP includes functions that are often used in different calculation regions, such as vector and matrix algebra, image development, etc. To avoid bulky code writing, for example, multiplication of two matrixes, a developer may write only one line of code, which is a call to the corresponding library function.

Yes, thanks. (I am only an “afficionado”, not a real hard core programer like you Mister Abrosimov).

I understood that (in relation to my old and trusty dual PIII 1Ghz).

The difficult part for me was the implications in real life of IPP (MMX, SIMD are proprietary – AMD has 3DNow proprietary instructions). 2.5 increase is very good (beats the competition), the cpu chip seems to already have this lib half (if not the full instruction without floating point calc) computed on die, or so it seems, you just have to call it to calculate the rest. This is the difficult part for me to understand.

It contains a number of static and dynamic modules (.lib and .dll) which can be used with ANY C/C++ compiler without problems.

It would be intersting to know if it’s a library for windows only (.dll .lib) or if one can use it on Linux too … with the C, C++ Intel compiler for Linux (seems like you say it can be used with gcc++ too).

There are a number of applications on Linux that could be compiled optimized to use IPP on Pentium processors (i586). The first that come to mind are Blender and Moonlight and some audio/video Linux applictions (mpeg2 …), maybe even GIMP could benefit on some “GIMP extras”, mostly written in Perl, as far as i know
2003-04-15 3:34 pm

Anonymous
If you want to write Russian, send me e-mail. It seems this site doesn’t support cyrillic.
2003-04-15 3:49 pm

Anonymous
If you read IPP web page attentively, you’ll find that two IPP packages exist – for Windows and for Linux. Functionality is the same. Any compiler as well.
2003-04-15 5:13 pm

Anonymous
Wonder how my neural network will perform with this library.

Guess I’ll download the trial and spend the next few weeks tuning and see.
2003-04-16 12:27 pm

Anonymous
Dima, It was a decent joke about the first page with math/phys. equations, and how it would be accepted on this site :-). Euginia has moderated my post down anyway, I think it did not match the design of the site =)
2003-04-16 1:07 pm

Anonymous
mmmm,, advanced math,, *thinks* *smoke comes out of ears* AGHHHHHHHHHHGHH!!!! BBBBBOOOOOOOOOMMMMM! *head explodes*
2003-04-16 1:41 pm

Anonymous
What is the point of a “real time” simulation of a star cluster? Who has time to wait billions of years for the results?
2003-04-16 2:02 pm

Anonymous
>> What is the point of a “real time” simulation of a star

>> cluster? Who has time to wait billions of years for the

>> results?

‘Real time’ means more than 20 frames (or integration steps in our case) per second. So we may see the galaxy evolution as a movie (many times accelerated, of course). I thought it’s clear.
2003-04-16 11:04 pm

Anonymous
So….when do we get a Stargate?

7 points, 6 points zero in on where you want to go, the 7th is where you are from.

At least…that’s how Jackson explained it.

Jack O’Neill.