Actually, java.util.Random is a horrible, horrible RNG. Its least significant bit (for example) is grotesquely nonrandom. See http://www.alife.co.uk/nonrandom/ for a wonderful demonstration of its awfulness. There are other serious bugs in java.util.Random which Sun, er, Oracle, has steadfastly refused to fix or revise for a decade now due to concerns about "backward compatibility" (in an RNG!)
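If you want to see it for yourself: java.util.Random is a 48-bit linear congruential generator, and nextInt() returns the top 32 bits of the state, so the lowest bit it hands back is state bit 16, which cycles with period 2^17. A quick sketch to confirm (plain Java, nothing fancy):

    import java.util.Random;

    // Demonstrates that the least significant bit of nextInt() from
    // java.util.Random repeats with period 2^17 = 131072 -- a consequence
    // of Random being a 48-bit LCG whose low state bits have tiny periods.
    public class LowBitPeriod {
        public static void main(String[] args) {
            final int period = 1 << 17;
            Random r = new Random(42);
            int[] first = new int[period];
            for (int i = 0; i < period; i++) first[i] = r.nextInt() & 1;
            boolean repeats = true;
            for (int i = 0; i < period; i++) {
                if ((r.nextInt() & 1) != first[i]) { repeats = false; break; }
            }
            System.out.println("LSB repeats with period 2^17: " + repeats);
        }
    }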
> If you want cryptographic-quality random numbers
I think you're confusing "cryptographic-quality" with "high quality".
RNGs tend to break down into two different camps. First, there are RNGs meant for simulation of stochastic processes. These RNGs need to be fast, have good periods, and have a very high degree of statistical randomness in their output.
Second, there are RNGs meant for crypto. Speed is less important. These still need good statistical randomness, but it's also crucial that, given some M previous outputs of the RNG, you cannot predict or make any statistical claims about the next number, even if you know how the algorithm works. That property isn't important for non-crypto RNGs.
For example, Mersenne Twister is a pretty high quality, widely used RNG. But it is not cryptographically secure: given 624 consecutive 32-bit outputs, you can reconstruct its entire internal state and perfectly predict every number that follows, if you know it's MT producing them.
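For the curious, the attack isn't deep: MT's output is just a "tempered" copy of one of its 624 state words, and the tempering is invertible. Untemper 624 consecutive outputs and you hold the whole state. A sketch of the inversion (standard technique; constants are the ones from the MT19937 reference code):

    // Inverts the MT19937 tempering transform. Applying this to 624
    // consecutive 32-bit outputs recovers the generator's full internal
    // state, after which every future output can be predicted exactly.
    static int untemper(int y) {
        y ^= y >>> 18;                      // undo y ^= y >>> 18 (one step, since 18 >= 16)
        y ^= (y << 15) & 0xEFC60000;        // undo y ^= (y << 15) & 0xEFC60000
        int t = y;                          // undo y ^= (y << 7) & 0x9D2C5680,
        for (int i = 0; i < 4; i++)         // recovering 7 more low bits per pass
            y = t ^ ((y << 7) & 0x9D2C5680);
        t = y;                              // undo y ^= y >>> 11,
        for (int i = 0; i < 2; i++)         // recovering 11 more high bits per pass
            y = t ^ (y >>> 11);
        return y;
    }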
While there are many applications for crypto RNGs, for a great many tasks, a non-crypto RNG would be a better fit.
java.util.Random was meant for non-crypto tasks. And it completely fails at even those tasks.
But SecureRandom is really bad for those tasks too: it is slow. For example, my MersenneTwister implementation is about 9 times faster than SecureRandom; and my non-threadsafe MT implementation is about 36 times faster. That's a big deal if you're pulling down millions and millions of numbers.
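If anyone wants to check the speed claim on their own machine, a crude harness like this is enough to see the gap (it does no proper JIT warmup, and numbers vary a lot by JDK version and SecureRandom provider -- treat it as a sketch, not a benchmark):

    import java.security.SecureRandom;
    import java.util.Random;

    // Rough, single-run comparison of bulk nextInt() throughput.
    public class RngSpeed {
        public static void main(String[] args) {
            int n = 10_000_000;
            Random plain = new Random();
            SecureRandom secure = new SecureRandom();
            long sink = 0;
            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) sink += plain.nextInt();
            long t1 = System.nanoTime();
            for (int i = 0; i < n; i++) sink += secure.nextInt();
            long t2 = System.nanoTime();
            System.out.printf("Random:       %d ms%n", (t1 - t0) / 1_000_000);
            System.out.printf("SecureRandom: %d ms%n", (t2 - t1) / 1_000_000);
            System.out.println(sink); // keep the JIT from eliding the loops
        }
    }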
> There are other serious bugs in java.util.Random which Sun, er, Oracle, has steadfastly refused to fix or revise for a decade now due to concerns about "backward compatibility" (in an RNG!)
Backward compatibility in an RNG is actually rather important, weirdly enough. For example, anything that uses deterministic seeds for repeatability of procedural generation or optimization. (Read: Minecraft, among other more important things.)
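To make the Minecraft-style dependency concrete, here's a toy sketch (the mixing constants are arbitrary, purely for illustration): the whole point is that the same world seed yields the same terrain forever. Swap the algorithm inside Random and every saved world regenerates differently.

    import java.util.Random;

    // Toy procedural generation: height depends only on (worldSeed, x, z).
    // If Random's algorithm ever changed, identical seeds would produce
    // different worlds, breaking every saved map that relies on them.
    public class TerrainDemo {
        static int heightAt(long worldSeed, int x, int z) {
            // Arbitrary per-column seed mixing, purely illustrative.
            Random r = new Random(worldSeed ^ (x * 341873128712L + z * 132897987541L));
            return 60 + r.nextInt(8); // base height plus deterministic "noise"
        }
        public static void main(String[] args) {
            System.out.println(heightAt(8675309L, 10, 20)); // same value on every run
        }
    }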
That being said, things like this are why I wish more programming languages had proper (read: fine-grained) versioning of libraries. "I need to link to a function with this signature, that is one of these versions or any version that has been declared to be compatible with them, preferably the latest version that is so". That sort of thing.
> For example, Mersenne Twister is a pretty high quality, widely used RNG. But it is not cryptographically secure... But SecureRandom is really bad for those tasks too: it is slow. For example, my MersenneTwister implementation is about 9 times faster than SecureRandom
By your own words, you're comparing apples to oranges here...
Also, when you're writing scientific code (Monte Carlo simulations, for example, or disordered systems), the folk wisdom is that you must keep track of what seed you used. I've never had to use these records myself, but I can imagine wanting to go back and reproduce exactly the same calculation, for (e.g.) debugging or verifying new code. Now think about resurrecting some previous grad student's code that only works when used in exactly the right way and has documentation scattered through comments and notebooks---which is perfectly natural, since he probably didn't think anybody else would ever use it---and a change in PRNG algorithm could be immensely frustrating. Or think about trying to re-run old code to compare with new analytical results: you'd want to verify that you're getting exactly the same results, as a way of checking that there isn't some other bitrot hidden away somewhere.
(Now, why one would be using Java for Monte Carlo simulations I haven't the foggiest idea.)
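(The record-the-seed habit costs about one line, for what it's worth. A minimal sketch of the pattern -- in Java, since that's the topic:)

    import java.util.Random;

    // Log the seed with the results; rerunning with the same seed argument
    // reproduces the calculation bit-for-bit (assuming the PRNG is stable).
    public class SeededRun {
        public static void main(String[] args) {
            long seed = (args.length > 0) ? Long.parseLong(args[0]) : System.nanoTime();
            System.err.println("seed = " + seed); // this line goes in the lab notebook
            Random rng = new Random(seed);
            int n = 1_000_000;
            double sum = 0;
            for (int i = 0; i < n; i++) {   // toy Monte Carlo: estimate E[U^2] = 1/3
                double u = rng.nextDouble();
                sum += u * u;
            }
            System.out.println("estimate = " + (sum / n));
        }
    }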
Yeah, and this is why I'm very careful about recording the versions I use for a calculation (version number and/or commit for the language, library version numbers, and commit for my code). The thing is, that's not your first step: the first step is to try and get the thing to run on whatever Java (/Julia/Matlab/Mathematica/Fortran) you have installed. If that doesn't work, there are a thousand things it could be (improper usage, some missing file you had no clue you might need, change in libraries, change in OS/distro/CPUarch, out-of-memory with a new set of parameters); installing an old version of your language and trying it on that is not necessarily easy, and certainly not the first bug-finding step.
Philip Guo (pgbovine) addressed many of these problems in his thesis work, on Burrito (http://www.pgbovine.net/burrito.html) and CDE (http://www.pgbovine.net/cde.html). Both of these look very cool, but I've never managed to get over the hump and actually start playing with them.
At this point I should note that I'm actually in favor of fixing a bad standard-library PRNG---I just think it's important to understand (a) what could break, and (b) why users might be unhappy about the breakage. This isn't quite a Chesterton's Fence sort of situation, but it's similar in spirit.
> Backward compatibility in an RNG is actually rather important, weirdly enough. For example, anything that uses deterministic seeds for repeatability of procedural generation or optimization. (Read: Minecraft, among other more important things.)
This seems like a very, very bad thing to rely on. Other than Java, how many other languages guarantee generator determinism from version to version as part of the language contract? Certainly Lisp considers it an antipattern.
Essentially all languages allow manual seeding; almost none specify guarantees with regard to RNG behavior. I'm pretty sure this is because determinism of a given process is important (everyone who does simulation needs that), but locking the language or library itself into potentially bad RNGs, in fact specifying them, is a really bad idea.
If it's specified, I don't mind people relying on it. That being said, Java is over-specified in many ways, this potentially being one of them.
I know Python does not have this guarantee.
Personally? There should really be a couple different RNGs in Java's standard library - all combinations that make sense of [insecure/secure, can set seed / can't set seed, thread-local/global] (local/global being without distinction if one cannot set the seed). The default being secure / can't set seed.
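For comparison, what the standard library gives you today covers only a few cells of that grid (ThreadLocalRandom arrived in Java 7; and note that SecureRandom.setSeed supplements the existing seed rather than replacing it):

    import java.security.SecureRandom;
    import java.util.Random;
    import java.util.concurrent.ThreadLocalRandom;

    // The current grid, such as it is:
    public class StdlibRngs {
        public static void main(String[] args) {
            Random seedable = new Random(12345L);          // insecure, seedable, instance-local
            int i = ThreadLocalRandom.current().nextInt(); // insecure, NOT seedable, thread-local
            SecureRandom secure = new SecureRandom();      // secure; setSeed() only mixes in
            System.out.println(seedable.nextInt() + " " + i + " " + secure.nextInt());
        }
    }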
Cryptographic doesn't mean slow. If you generate into a large enough buffer (1-2 KB is good), ChaCha8 is anywhere from ~1.5 cycles/byte (SSSE3, Wolfdale) to ~0.5 cycles/byte (AVX2, Haswell). Even unoptimized implementations are still fairly efficient at ~5 cycles/byte.
If you don't care about the cryptographic properties, it could be made even faster by dropping to 6 rounds and tweaking the output step.
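The core really is tiny, which is part of why it's so fast. A plain-Java sketch of the ChaCha8 block function (djb's original state layout: 4 constant words, 8 key words, a counter, and nonce words; this is the scalar version, nowhere near the SIMD speeds quoted above):

    // One ChaCha quarter-round over state words a, b, c, d.
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // ChaCha8 block: 8 rounds = 4 (column + diagonal) double rounds,
    // then the usual feed-forward add of the input state.
    static int[] chacha8Block(int[] in) {  // in: 16 little-endian 32-bit words
        int[] x = in.clone();
        for (int r = 0; r < 8; r += 2) {
            quarterRound(x, 0, 4,  8, 12); // columns
            quarterRound(x, 1, 5,  9, 13);
            quarterRound(x, 2, 6, 10, 14);
            quarterRound(x, 3, 7, 11, 15);
            quarterRound(x, 0, 5, 10, 15); // diagonals
            quarterRound(x, 1, 6, 11, 12);
            quarterRound(x, 2, 7,  8, 13);
            quarterRound(x, 3, 4,  9, 14);
        }
        for (int i = 0; i < 16; i++) x[i] += in[i];
        return x;
    }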
If you want true randomness, then many CPUs/SoCs have environment-based TRNGs. The SoC used in the rPi can emit half a million random bits per second (they passed the statistical tests I ran them through), and many Intel chips have an RNG in them now (though I don't know if it is as easy to access in userland as the rPi's is).
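On Linux the kernel driver usually exposes the Pi's generator as a character device, so reading it needs no special API (assuming the hwrng driver is loaded and the node is /dev/hwrng, which I believe is the stock setup on Raspbian):

    import java.io.FileInputStream;
    import java.io.IOException;

    // Reads raw bytes from the hardware RNG character device.
    public class HwRng {
        public static void main(String[] args) throws IOException {
            byte[] buf = new byte[64];
            try (FileInputStream rng = new FileInputStream("/dev/hwrng")) {
                int n = rng.read(buf);
                System.out.println("read " + n + " hardware-random bytes");
            }
        }
    }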
If you need more than your current CPU provides then you could chain a bunch of them together, or pick a good PRNG for your main source and simply reseed that on a regular basis with the output of your TRNG.
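The reseed-periodically pattern is only a few lines (SecureRandom standing in here for whatever TRNG you actually have; the interval is arbitrary):

    import java.security.SecureRandom;
    import java.util.Random;

    // Fast PRNG for bulk draws, refreshed from a slow entropy source
    // every RESEED_INTERVAL outputs.
    public class ReseedingRng {
        private static final int RESEED_INTERVAL = 1_000_000;    // arbitrary
        private final SecureRandom entropy = new SecureRandom(); // stand-in TRNG
        private Random fast = new Random(entropy.nextLong());
        private int drawn = 0;

        public int nextInt() {
            if (++drawn >= RESEED_INTERVAL) {
                fast = new Random(entropy.nextLong());
                drawn = 0;
            }
            return fast.nextInt();
        }
    }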
There are some concerns with generators like those in the rPi as they are not documented, so you have no way of knowing exactly how good they are or if the manufacturer has somehow arranged for there to be some predictability - but if you are that paranoid you need to be building your own generator from scratch anyway!
Java has a number of... questionable, shall we say, design decisions.
Some of it is legacy cruft - Random was first released in, what, 1995?
Some of it is fallout of it being designed by committee - "we need it to pass this bunch of certifications, which means we have to do this bunch of things even though it doesn't make much sense to do so".
And some of it is undoubtedly the classic execution speed / maintainability tradeoff - no-one has unlimited dev time, Oracle included. An LCRNG is simple and well-known.
Also they cannot change it now, because its behaviour is documented and people certainly rely on it somewhere.
And it's about knowing what tools to use for what job anyway. If you just need to roll a few dice, Random is fine enough. If you need generated passwords, or cryptographic keys, you use SecureRandom. And if you need to run Monte Carlo simulations you'll just implement either MT-19937 or one of the WELL generators.
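(For the password case that's all of a few lines -- a sketch:)

    import java.security.SecureRandom;

    // Generates a 16-character random password from a fixed alphabet.
    public class PasswordGen {
        private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        public static void main(String[] args) {
            SecureRandom rng = new SecureRandom();
            StringBuilder pw = new StringBuilder();
            for (int i = 0; i < 16; i++)
                pw.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
            System.out.println(pw);
        }
    }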
I remember reading they pulled the implementation almost straight out of TAOCP, from good old Donald Knuth (if that's enough of a name drop). It might not be great, but at least it was from a reasonably respectable, academic source and not copy-pasta'd out of stackoverflow.