# Galaxy Dynamics Computer Simulation

The paper considers a mathematical model of the behavior of an assembly of N stars. The ‘Kepler’ Microsoft Windows demo application based of this model enables to perform real-time simulation of star clusters dynamics for N~=2500. Such performance rate is possible through the use of the Intel Integrated Performance Primitives (IPP) library. The paper also estimates the efficiency of the IPP application and provides an example of C-code with the IPP functions calls. Computer-simulated images of the spiral galaxy forming process, as well as the real galaxies photos, are presented.

Introduction

Realistic simulation of star clusters dynamics, for example, the dynamics of galaxies, may be of interest to a broad sections of specialists – from game developers up to astrophysics scientists. While, in general, this task is not very difficult from the mathematical point of view (see section 1), it requires large computational resources if the number of stars N is big enough. Further you can see that the time of computing increases as the square of N, which presents the main difficulty for real-time simulation of star assemblages, even on modern computers. To solve the problem, we used the Intel Integrated Performance Primitives (IPP) software library , which enabled to increase the calculation speed and perform 2000 – 2500 stars assemblage behavior in real time. Destination and capabilities of this software are briefly described in in section 3.

The ‘Kepler’ Microsoft Windows demo application was developed using the mathematical models described in section 1. The application simulates the behavior of massive bodies assemblages like star clusters, galaxies, planetary systems, etc., in real time. At the beginning the user may set the following physical parameters:

• – space dimension (2-D or 3-D);

• – view angle (in 3-D case only);

• – number of stars;

• – gravity constant;

• – initial spatial distribution of coordinates (root-mean-square (RMS) in X, Y and Z axes);

• – time integration step (as the integration step decreases, the calculation accuracy becomes higher but the performance slows down);

• – mean and RMS of random initial velocities;

• – angular velocity of the conglomeration regular rotation.

In addition, the user may choose to either use or not use the Intel Integrated Performance Primitive library (see section 3). If IPP is used, the real-time performance rate (15-20 frames per second) for ~2500 stars assemblage may be achieved on a computer operating on a 2000 MHz Intel? Pentium?4 processor.

Fig. 2 describes an example of a spiral galaxy formation and evolution simulation. A prolate cluster of 2500 stars (Fig. 2a) was rotated counter-clockwise. Such an assemblage passes several stages of evolution and finally becomes an elliptic galaxy. To estimate the reality of this example, several photos of authentic spiral galaxies are presented in Fig. 3 (all images were taken from the NASA NSSDC web site  and are courtesy of the Space Telescope Science Institute).

Intel Integrated Performance Primitives (IPP)  is a software library that provides a variety of multimedia functions for developing high performance applications. This software includes a number of library segments, each of which is highly optimized for the use with one of the Intel processors (including the Intel Pentium 4 processor, Intel Itanium processor, and Itanium 2 processor). The IPP library is very effective for processing data arrays, such as vectors, matrices, images, etc. In this case application performance rate may increase highly. The other purpose of the IPP library is to simplify code writing. IPP includes functions that are often used in different calculation regions, such as vector and matrix algebra, image development, etc. To avoid bulky code writing, for example, multiplication of two matrixes, a developer may write only one line of code, which is a call to the corresponding library function.

Code samples of the force components (4-4a) calculation function (ordinary and IPP variants) are presented below.

Common part

```
/* structures definition */
typedef struct _Vect3D
{
float x;
float y;
float z;
} Vect3D F;
typedef struct _Coord3D
{
float* x;
float* y;
float* z;
float* M;
} Coord3D C;
```

Ordinary code
```
/* arrays allocation */
float
*x  = (float*)malloc( N * sizeof(float) ),
*y  = (float*)malloc( N * sizeof(float) ),
*z  = (float*)malloc( N * sizeof(float) ),
*M  = (float*)malloc( N * sizeof(float) );
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
C.x  = x;
C.y  = y;
C.z  = z;
C.M  = M;
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Force3D( &F, N, i, &C );
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
void Force_3D( Vect3D* F, int N, int i, Coord3D* C )
{
int j;
float Fx = 0.0f, Fy = 0.0f, Fz = 0.0f;
float xi = C->x[i], yi = C->y[i], zi = C->z[i];
for( j=0; jx[j], yj = C->y[j], zj = C->z[j], mj = C->M[j];
xj -= xi;
yj -= yi;
zj -= zi;
r2  = xj*xj + yj*yj + zj*zj;
r   = (float)sqrt( r2 );    /* distance between i and j stars */
r2 *= r;
r2  = mj / r2 ;
Fx += xj*r2;
Fy += yj*r2;
Fz += zj*r2;
}
F->x = Fx;
F->y = Fy;
F->z = Fz;
}
```

IPP use code
```
typedef struct _Buffer3D
{
float* dx;
float* dy;
float* dz;
float* R;
float* b;
} Buffer3D B;
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
/* arrays allocation */
float
*x  = ippsMalloc_32f( N ),
*y  = ippsMalloc_32f( N ),
*z  = ippsMalloc_32f( N ),
*M  = ippsMalloc_32f( N ),
*dx = ippsMalloc_32f( N ),
*dy = ippsMalloc_32f( N ),
*dz = ippsMalloc_32f( N ),
*b  = ippsMalloc_32f( N ),
*R  = ippsMalloc_32f( N );
C.x  = x;
C.y  = y;
C.z  = z;
C.M  = M;
B.dx = dx;
B.dy = dy;
B.dz = dz;
B.R  = R;
B.b  = b;
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Force3D_ipp( &F, N, i, &C, &B );
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
void Force_3D_ipp( Vect3D* F, int N, int i, Coord3D* C, Buffer3D* B )
{
IppStatus res;
res = ippsSubC_32f   ( C->x, C->x[i], B->dx, N );
res = ippsSubC_32f   ( C->y, C->y[i], B->dy, N );
res = ippsSubC_32f   ( C->z, C->z[i], B->dz, N );
res = ippsSqr_32f    ( B->dx, B->R, N );
res = ippsSqr_32f    ( B->dy, B->b, N );
res = ippsAdd_32f_I  ( B->b,  B->R, N );
res = ippsSqr_32f    ( B->dz, B->b, N );
res = ippsAdd_32f_I  ( B->b,  B->R, N );
res = ippsInvSqrt_32f_A11( B->R, B->R, N );
res = ippsSqr_32f    ( B->R,  B->b, N );
res = ippsMul_32f_I  ( B->b,  B->R, N );
res = ippsMul_32f_I  ( C->M,  B->R, N );
res = ippsDotProd_32f( B->dx, B->R, N, &F->x );
res = ippsDotProd_32f( B->dy, B->R, N, &F->y );
res = ippsDotProd_32f( B->dz, B->R, N, &F->z );
}
```

To prove the efficiency of the IPP library, another performance test of the Kepler application was carried out. For this test computers based on two different processors, namely, a 700 MHz Intel? Pentium? III processor and a 2000 MHz Intel Pentium 4 processor were used. Two optimized modifications of the application were designed for the Pentium III processor and Pentium 4 processor respectively (these modifications differ from each other by the IPP library segments they use and by compiler presets). Both programs were compiled by the Intel C++ compiler . The Pentium III processor does not support some Pentium 4 processor instructions, thus it was tested only with the first modification. All the tests were performed with number of stars N = 2000 and picture resolution 512×512 pixels.

The results of the tests are presented in Fig. 5. The figures above columns show averaged performance rate in frames per second; the blue columns correspond to the IPP disabled mode, the yellow ones correspond to the IPP enabled mode. The figure shows that the use of the IPP library in the case of substantial floating point computations (as in the Kepler application) may enhance the application performance up to 2.5 times. If integer calculations prevail (especially 1- or 2-bytes data development), this enhancement can be even greater.

Fig. 5. Kepler application performance rate in frames per second shown on (from left to right): the Pentium III processor (1), Pentium 4 processor using the Pentium III processor-optimized library (2), and Pentium 4 processor using the Pentium 4 processor-optimized library (3)

References
1. http://developer.intel.com/software/products/ipp/ipp30/
2. http://www.nasa.gov/
3. http://developer.intel.com/software/products/compilers/
4. Aarseth S.J. Direct integration methods of the N-body problem. – ASS, 1971, v. 14, p. 118-132.

Dmitry Abrosimov is a software engineer on the Intel Performance Libraries software development team at Intel Corporation, based in the Nighzy Novgorod Russia software lab. Dmitry graduated from Gorky state university in 1979 and earned a Ph. D. degree (Russian Candidate of sciences degree) in physics and mathematics in 1996. Prior to joining Intel, Dmitry worked at the Applied Physics Institute in Nizhny Novgorod, Russia, specializing in underwater acoustics.

1. 2003-04-15 12:00 am
2. 2003-04-15 12:07 am
3. 2003-04-15 12:22 am
4. 2003-04-15 12:24 am
5. 2003-04-15 1:27 am
6. 2003-04-15 1:47 am
7. 2003-04-15 1:51 am
8. 2003-04-15 2:07 am
9. 2003-04-15 3:14 am
10. 2003-04-15 6:01 am
11. 2003-04-15 7:37 am
12. 2003-04-15 7:47 am
13. 2003-04-15 9:08 am
14. 2003-04-15 9:17 am
15. 2003-04-15 11:39 am
16. 2003-04-15 12:07 pm
17. 2003-04-15 2:49 pm
18. 2003-04-15 3:34 pm
19. 2003-04-15 3:49 pm
20. 2003-04-15 5:13 pm
21. 2003-04-16 12:27 pm
22. 2003-04-16 1:07 pm
23. 2003-04-16 1:41 pm
24. 2003-04-16 2:02 pm
25. 2003-04-16 11:04 pm