Random numbers are utilized in many different areas, ranging from cryptography (in general) to source port and process ID randomization in some operating systems. So what exactly is a random number generator?

“Anyone attempting to generate random numbers by deterministic means is, of course, living in a state of sin.” (John von Neumann)

As random numbers are so useful, and make certain algorithms practical that would otherwise be intractable (via Monte Carlo or Las Vegas techniques), I guess this sin is good.

Random numbers are used a lot in finance, for example in Monte Carlo simulation for option pricing. You can buy one million (normalised) random numbers bundled with a Monte Carlo package. Miguel

The true randomness of a PRNG output sequence can’t be determined by statistical tests, because ALL possible output sequences should be equally probable (so even 11111… could occur in theory in a random sequence). Consequently, the entropy of a (single) PRNG output sequence doesn’t provide much insight on the PRNG’s randomness.

The quality of PRNGs can be described more appropriately by complexity theory: it should be impossible to predict the next bit of an output sequence from all previous output bits with probability significantly better than 0.5 using a probabilistic polynomial-time algorithm.

I would’ve liked more explanation of how to accumulate entropy in a computer system. I only know of taking the time as a basis.

Randomization is also needed for timing when testing a person’s reaction to a stimulus.

You can use just about anything you can think of, although some would not work well in some situations.

On the software side you have memory usage, number of processes, cumulative CPU time for a group of processes, etc.

For hardware, you typically use electrical noise gleaned from resistors or capacitors. I’ve even heard people suggest using your soundcard’s line-in to record white noise.

It’s one of the rare articles I’ve seen on here lately which doesn’t seem to be flamebait. Hopefully more informative articles will show up in the future?

I use them for parsers, compilers, and hashing various string types all the time. Fill a 256-entry uint table with random numbers wherever you want to recognize various string types, i.e. alphaNum[a..zA..Z0..9], and hash the resulting alphaNum[*ch++] sequence till you fetch a 0. Most compiler texts use much less efficient but more readable boolean tests (‘a’>=*ch && …). The hash tables do have to be set up first, though.

I also use them for testing HW designs; quality doesn’t necessarily matter as much as the purists demand, since they are used more to shine a light on hidden bugs unlikely to crop up often. PRNGs are usually good enough as long as the generator is cycled to get a whole-word change. I also use R250; inlined, it gives numbers every 3 ns. I stay away from code that uses % or /, though the math/stat guys seem to like them.

Of course the same material has been covered by lots of authors (Knuth, Sedgewick, etc.), but I also like the Pentium asm programming books by the likes of Rick Booth, which concentrate on getting these codes into asm in the minimum of CPU cycles. Even if you don’t use asm, the insight gained sheds light on how C code performs.

I forgot to say that while random number algorithms are IMO much nicer than other algorithms, they tend to wreak havoc with caches. If I set up a hash table that’s bigger than the CPU cache, then the hash function might supposedly take 20 cycles per hash by counting asm cycles, but measured in real time it’s closer to 1000 cycles; i.e. random table lookups have no locality of reference, which is very bad for performance.

That is very interesting. I had never heard of using randomness in parsing; do you have any references?

I’m going to back up the first comment about Monte Carlo simulation. I work in a science lab, and we use huge loads of random numbers to simulate complex processes. We study particle flow, but there are just too many parameters to allow for analytical solutions. It always fascinates me that it works, but the results are usually verified against experimental work that closely matches our model.

IIRC I picked it up from the Rick Booth book, or M. Abrash, or another writer; there used to be lots of books on optimizing for the 486 and Pentium, and these books would essentially rehash what Sedgewick covers but in light of x86 code: what to use for fast code, and what not to use, i.e. old 16-bit code.

For the C compiler I am working on, there are around 20 masking tables (alphaNum, alpha, HexNum, DecNum, etc.), and if I expect to parse an alpha identifier I just call hash(,alphaNum,), etc. The hashed value then goes into the symbol table.

I am not sure I can pin down a reference right now without digging around.

In the usual scheme you might scan for alphaNum chars using && || boolean tests and hash each new *ch into a hash sum using shifts and xors. The problem there is that the *ch values don’t have much of a distribution: if you had to parse a source file where all the identifiers look like i, ix, ii, i0, i3, etc., simple hashes would collide a lot. By pushing each *ch through a table, you are hashing nice 32-bit random numbers and get a nicely collision-free hash function even if the inputs are very close. You also get the end-of-string test in a single ==0, since illegal chars in the table are 0s. For this to work, all the tables should use the same PRNG values from a master 256-entry random table, with illegal entries zeroed out.

The downside, as I suggested before, is that random lookup tables must be much smaller than the cache, which for ASCII-sized tables they are.

Hope that helps

If a bot is stuck in Counter-Strike, it randomly wriggles until it gets unstuck =)

Also Minesweeper, Sea Battle etc.

Also useful when testing stuff (e.g. filling databases with loads of absolutely unrelated data).

This article is pretty naive.

Simple fact: complicated generators do not produce more complicated output.

For instance, the rule 30 cellular automaton (http://mathworld.wolfram.com/Rule30.html) has passed every statistical randomness test, even as many more complicated but mathematically “justified” generators have failed over the years.

The assumption that nature somehow generates better randomness is also highly suspect at best. Just because you don’t understand it doesn’t mean it is fundamentally mysterious or better.

From a practical standpoint, sampling randomness from nature is a difficult process, and the measurement process itself at many points allows for regularities to creep in.

The Mersenne Twister PRNG has nice properties, and doesn’t ruin caches too badly either.

#define N 624
#define M 397           /* MT19937 period parameters */
#define U 0x80000000    /* upper bit mask */
#define LL 0x7FFFFFFF   /* lower 31-bit mask */
#define a 0x9908B0DF    /* twist matrix constant */
#define u 11
#define s 7
#define b 0x9D2C5680
#define t 15
#define c 0xEFC60000
#define l 18

static unsigned int x[N];
static int i;

/* bleh. initialize, you have to call it. */
void mtsrand(int seed)
{
    int j;
    for (j = 0; j < N; j++)
        x[j] = seed * ((j + 1) << 3) | 0x1;
    i = 0;
}

unsigned int mtrand(void)
{
    unsigned int y;

    /* splice the top bit of x[i] onto the low 31 bits of x[i+1] */
    y = (x[i] & U) | (x[(i + 1) % N] & LL);
    x[i] = x[(i + M) % N] ^ (y >> 1);
    if (y & 0x1)
        x[i] ^= a;   /* so “nice” 😐 */

    /* tempering */
    y = x[i];
    y ^= y >> u;
    y ^= (y << s) & b;
    y ^= (y << t) & c;
    y ^= y >> l;

    i = (i + 1) % N;
    return y;
}

from http://www.random.org/essay.html

The way the random.org random number generator works is quite simple. A radio is tuned to a frequency where nobody is broadcasting. The atmospheric noise picked up by the receiver is fed into a Sun SPARC workstation through the microphone port, where it is sampled by a program as an eight-bit mono signal at 8 kHz. The upper seven bits of each sample are discarded immediately, and the remaining bits are gathered and turned into a stream of bits with a high content of entropy. Skew correction is performed on the bit stream to ensure an approximately even distribution of 0s and 1s.

I thought the RNG was one of the kernel modules… guess I am a bit off…

Pretty sure they are called RNGs, random number generators…

Oh well, right acronym, wrong subject, I reckon.

He forgot (or does not know) that there is in fact a third kind of random number: quasirandom numbers, e.g. Sobol or Niederreiter sequences. They are “random” in a different sense: they fill up a multidimensional space more evenly than ordinary pseudorandom numbers. These kinds of RNGs are used for numerical integration.

Of course random numbers are used in far more places than just “ranging” from GPG to the OS kernel, and cryptographic strength is not the only requirement for a good RNG outside cryptography. But I guess this is what comes of spending your time only with operating systems… 😉