Actually, java.util.Random is a horrible, horrible RNG. Its least significant bit (for example) is grotesquely nonrandom. See http://www.alife.co.uk/nonrandom/ for a wonderful demonstration of its awfulness. There are other serious bugs in java.util.Random which Sun, er, Oracle, has steadfastly refused to fix or revise for a decade now due to concerns about "backward compatibility" (in an RNG!)
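If you want to see it for yourself: java.util.Random is a 48-bit linear congruential generator, and nextInt() returns the top 32 bits of the state, so the lowest bit it hands back is state bit 16, which cycles with period 2^17. A quick sketch to confirm (plain Java, nothing fancy):

    import java.util.Random;

    // Demonstrates that the least significant bit of nextInt() from
    // java.util.Random repeats with period 2^17 = 131072 -- a consequence
    // of Random being a 48-bit LCG whose low state bits have tiny periods.
    public class LowBitPeriod {
        public static void main(String[] args) {
            final int period = 1 << 17;
            Random r = new Random(42);
            int[] first = new int[period];
            for (int i = 0; i < period; i++) first[i] = r.nextInt() & 1;
            boolean repeats = true;
            for (int i = 0; i < period; i++) {
                if ((r.nextInt() & 1) != first[i]) { repeats = false; break; }
            }
            System.out.println("LSB repeats with period 2^17: " + repeats);
        }
    }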
> If you want cryptographic-quality random numbers
I think you're confusing "cryptographic-quality" with "high quality".
RNGs tend to break down into two different camps. First, there are RNGs meant for simulation of stochastic processes. These RNGs need to be fast, have good periods, and have a very high degree of statistical randomness in their output.
Second, there are RNGs meant for crypto. Speed is less important. These still need good statistical randomness, but it's also crucial that, given some M previous outputs of the RNG, you cannot predict or make any statistical claims about the next number, even if you know how the algorithm works. That property isn't important for non-crypto RNGs.
For example, Mersenne Twister is a pretty high quality, widely used RNG. But it is not cryptographically secure: given 624 consecutive 32-bit outputs, you can reconstruct its entire internal state and perfectly predict every number that follows, if you know it's MT producing them.
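For the curious, the attack isn't deep: MT's output is just a "tempered" copy of one of its 624 state words, and the tempering is invertible. Untemper 624 consecutive outputs and you hold the whole state. A sketch of the inversion (standard technique; constants are the ones from the MT19937 reference code):

    // Inverts the MT19937 tempering transform. Applying this to 624
    // consecutive 32-bit outputs recovers the generator's full internal
    // state, after which every future output can be predicted exactly.
    static int untemper(int y) {
        y ^= y >>> 18;                      // undo y ^= y >>> 18 (one step, since 18 >= 16)
        y ^= (y << 15) & 0xEFC60000;        // undo y ^= (y << 15) & 0xEFC60000
        int t = y;                          // undo y ^= (y << 7) & 0x9D2C5680,
        for (int i = 0; i < 4; i++)         // recovering 7 more low bits per pass
            y = t ^ ((y << 7) & 0x9D2C5680);
        t = y;                              // undo y ^= y >>> 11,
        for (int i = 0; i < 2; i++)         // recovering 11 more high bits per pass
            y = t ^ (y >>> 11);
        return y;
    }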
While there are many applications for crypto RNGs, for a great many tasks, a non-crypto RNG would be a better fit.
java.util.Random was meant for non-crypto tasks. And it completely fails at even those tasks.
But SecureRandom is really bad for those tasks too: it is slow. For example, my MersenneTwister implementation is about 9 times faster than SecureRandom; and my non-threadsafe MT implementation is about 36 times faster. That's a big deal if you're pulling down millions and millions of numbers.
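If anyone wants to check the speed claim on their own machine, a crude harness like this is enough to see the gap (it does no proper JIT warmup, and numbers vary a lot by JDK version and SecureRandom provider -- treat it as a sketch, not a benchmark):

    import java.security.SecureRandom;
    import java.util.Random;

    // Rough, single-run comparison of bulk nextInt() throughput.
    public class RngSpeed {
        public static void main(String[] args) {
            int n = 10_000_000;
            Random plain = new Random();
            SecureRandom secure = new SecureRandom();
            long sink = 0;
            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) sink += plain.nextInt();
            long t1 = System.nanoTime();
            for (int i = 0; i < n; i++) sink += secure.nextInt();
            long t2 = System.nanoTime();
            System.out.printf("Random:       %d ms%n", (t1 - t0) / 1_000_000);
            System.out.printf("SecureRandom: %d ms%n", (t2 - t1) / 1_000_000);
            System.out.println(sink); // keep the JIT from eliding the loops
        }
    }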
> There are other serious bugs in java.util.Random which Sun, er, Oracle, has steadfastly refused to fix or revise for a decade now due to concerns about "backward compatibility" (in an RNG!)
Backward compatibility in an RNG is actually rather important, weirdly enough. For example, anything that uses deterministic seeds for repeatability of procedural generation or optimization. (Read: Minecraft, among other more important things.)
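To make the Minecraft-style dependency concrete, here's a toy sketch (the mixing constants are arbitrary, purely for illustration): the whole point is that the same world seed yields the same terrain forever. Swap the algorithm inside Random and every saved world regenerates differently.

    import java.util.Random;

    // Toy procedural generation: height depends only on (worldSeed, x, z).
    // If Random's algorithm ever changed, identical seeds would produce
    // different worlds, breaking every saved map that relies on them.
    public class TerrainDemo {
        static int heightAt(long worldSeed, int x, int z) {
            // Arbitrary per-column seed mixing, purely illustrative.
            Random r = new Random(worldSeed ^ (x * 341873128712L + z * 132897987541L));
            return 60 + r.nextInt(8); // base height plus deterministic "noise"
        }
        public static void main(String[] args) {
            System.out.println(heightAt(8675309L, 10, 20)); // same value on every run
        }
    }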
That being said, things like this are why I wish more programming languages had proper (read: fine-grained) versioning of libraries. "I need to link to a function with this signature, that is one of these versions or any version that has been declared to be compatible with them, preferably the latest version that is so". That sort of thing.
> For example, Mersenne Twister is a pretty high quality, widely used RNG. But it is not cryptographically secure... But SecureRandom is really bad for those tasks too: it is slow. For example, my MersenneTwister implementation is about 9 times faster than SecureRandom
By your own words, you're comparing apples to oranges here...
Also, when you're writing scientific code (Monte Carlo simulations, for example, or disordered systems), the folk wisdom is that you must keep track of what seed you used. I've never had to use these records myself, but I can imagine wanting to go back and reproduce exactly the same calculation, for (e.g.) debugging or verifying new code. Now think about resurrecting some previous grad student's code that only works when used in exactly the right way and has documentation scattered through comments and notebooks---which is perfectly natural, since he probably didn't think anybody else would ever use it---and a change in PRNG algorithm could be immensely frustrating. Or think about trying to re-run old code to compare with new analytical results: you'd want to verify that you're getting exactly the same results, as a way of checking that there isn't some other bitrot hidden away somewhere.
(Now, why one would be using Java for Monte Carlo simulations I haven't the foggiest idea.)
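(The record-the-seed habit costs about one line, for what it's worth. A minimal sketch of the pattern -- in Java, since that's the topic:)

    import java.util.Random;

    // Log the seed with the results; rerunning with the same seed argument
    // reproduces the calculation bit-for-bit (assuming the PRNG is stable).
    public class SeededRun {
        public static void main(String[] args) {
            long seed = (args.length > 0) ? Long.parseLong(args[0]) : System.nanoTime();
            System.err.println("seed = " + seed); // this line goes in the lab notebook
            Random rng = new Random(seed);
            int n = 1_000_000;
            double sum = 0;
            for (int i = 0; i < n; i++) {   // toy Monte Carlo: estimate E[U^2] = 1/3
                double u = rng.nextDouble();
                sum += u * u;
            }
            System.out.println("estimate = " + (sum / n));
        }
    }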
Yeah, and this is why I'm very careful about recording the versions I use for a calculation (version number and/or commit for the language, library version numbers, and commit for my code). The thing is, that's not your first step: the first step is to try and get the thing to run on whatever Java (/Julia/Matlab/Mathematica/Fortran) you have installed. If that doesn't work, there are a thousand things it could be (improper usage, some missing file you had no clue you might need, change in libraries, change in OS/distro/CPUarch, out-of-memory with a new set of parameters); installing an old version of your language and trying it on that is not necessarily easy, and certainly not the first bug-finding step.
Philip Guo (pgbovine) addressed many of these problems in his thesis work, on Burrito (http://www.pgbovine.net/burrito.html) and CDE (http://www.pgbovine.net/cde.html). Both of these look very cool, but I've never managed to get over the hump and actually start playing with them.
At this point I should note that I'm actually in favor of fixing a bad standard-library PRNG---I just think it's important to understand (a) what could break, and (b) why users might be unhappy about the breakage. This isn't quite a Chesterton's Fence sort of situation, but it's similar in spirit.
> Backward compatibility in an RNG is actually rather important, weirdly enough. For example, anything that uses deterministic seeds for repeatability of procedural generation or optimization. (Read: Minecraft, among other more important things.)
This seems like a very, very bad thing to rely on. Other than Java, how many other languages guarantee generator determinism from version to version as part of the language contract? Certainly Lisp considers it an antipattern.
Essentially all languages allow manual seeding; almost none specify guarantees with regard to RNG behavior. I'm pretty sure this is because determinism of a given process is important (everyone who does simulation needs that), but locking the language or library itself into potentially bad RNGs, in fact specifying them, is a really bad idea.
If it's specified, I don't mind people relying on it. That being said, Java is over-specified in many ways, this potentially being one of them.
I know Python does not have this guarantee.
Personally? There should really be a couple different RNGs in Java's standard library - all combinations that make sense of [insecure/secure, can set seed / can't set seed, thread-local/global] (local/global being without distinction if one cannot set the seed). The default being secure / can't set seed.
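For comparison, what the standard library gives you today covers only a few cells of that grid (ThreadLocalRandom arrived in Java 7; and note that SecureRandom.setSeed supplements the existing seed rather than replacing it):

    import java.security.SecureRandom;
    import java.util.Random;
    import java.util.concurrent.ThreadLocalRandom;

    // The current grid, such as it is:
    public class StdlibRngs {
        public static void main(String[] args) {
            Random seedable = new Random(12345L);          // insecure, seedable, instance-local
            int i = ThreadLocalRandom.current().nextInt(); // insecure, NOT seedable, thread-local
            SecureRandom secure = new SecureRandom();      // secure; setSeed() only mixes in
            System.out.println(seedable.nextInt() + " " + i + " " + secure.nextInt());
        }
    }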
Cryptographic doesn't mean slow. If you generate into a large enough buffer (1-2 KB is good), ChaCha8 is anywhere from ~1.5 cycles/byte (SSSE3, Wolfdale) to ~0.5 cycles/byte (AVX2, Haswell). Even unoptimized implementations are still fairly efficient at ~5 cycles/byte.
If you don't care about the cryptographic properties, it could be made even faster by dropping to 6 rounds and tweaking the output step.
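The core really is tiny, which is part of why it's so fast. A plain-Java sketch of the ChaCha8 block function (djb's original state layout: 4 constant words, 8 key words, a counter, and nonce words; this is the scalar version, nowhere near the SIMD speeds quoted above):

    // One ChaCha quarter-round over state words a, b, c, d.
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    // ChaCha8 block: 8 rounds = 4 (column + diagonal) double rounds,
    // then the usual feed-forward add of the input state.
    static int[] chacha8Block(int[] in) {  // in: 16 little-endian 32-bit words
        int[] x = in.clone();
        for (int r = 0; r < 8; r += 2) {
            quarterRound(x, 0, 4,  8, 12); // columns
            quarterRound(x, 1, 5,  9, 13);
            quarterRound(x, 2, 6, 10, 14);
            quarterRound(x, 3, 7, 11, 15);
            quarterRound(x, 0, 5, 10, 15); // diagonals
            quarterRound(x, 1, 6, 11, 12);
            quarterRound(x, 2, 7,  8, 13);
            quarterRound(x, 3, 4,  9, 14);
        }
        for (int i = 0; i < 16; i++) x[i] += in[i];
        return x;
    }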
If you want true randomness, then many CPUs/SoCs have environment-based TRNGs. The SoC used in the rPi can emit half a million random bits per second (they passed the statistical tests I ran them through), and many Intel chips have an RNG in them now (though I don't know if it is as easy to access in userland as the rPi's is).
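On Linux the kernel driver usually exposes the Pi's generator as a character device, so reading it needs no special API (assuming the hwrng driver is loaded and the node is /dev/hwrng, which I believe is the stock setup on Raspbian):

    import java.io.FileInputStream;
    import java.io.IOException;

    // Reads raw bytes from the hardware RNG character device.
    public class HwRng {
        public static void main(String[] args) throws IOException {
            byte[] buf = new byte[64];
            try (FileInputStream rng = new FileInputStream("/dev/hwrng")) {
                int n = rng.read(buf);
                System.out.println("read " + n + " hardware-random bytes");
            }
        }
    }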
If you need more than your current CPU provides then you could chain a bunch of them together, or pick a good PRNG for your main source and simply reseed that on a regular basis with the output of your TRNG.
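The reseed-periodically pattern is only a few lines (SecureRandom standing in here for whatever TRNG you actually have; the interval is arbitrary):

    import java.security.SecureRandom;
    import java.util.Random;

    // Fast PRNG for bulk draws, refreshed from a slow entropy source
    // every RESEED_INTERVAL outputs.
    public class ReseedingRng {
        private static final int RESEED_INTERVAL = 1_000_000;    // arbitrary
        private final SecureRandom entropy = new SecureRandom(); // stand-in TRNG
        private Random fast = new Random(entropy.nextLong());
        private int drawn = 0;

        public int nextInt() {
            if (++drawn >= RESEED_INTERVAL) {
                fast = new Random(entropy.nextLong());
                drawn = 0;
            }
            return fast.nextInt();
        }
    }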
There are some concerns with generators like those in the rPi as they are not documented, so you have no way of knowing exactly how good they are or if the manufacturer has somehow arranged for there to be some predictability - but if you are that paranoid you need to be building your own generator from scratch anyway!
Java has a number of... questionable, shall we say, design decisions.
Some of it is legacy cruft - Random was first released in, what, 1995?
Some of it is fallout of it being designed by committee - "we need it to pass this bunch of certifications, which means we have to do this bunch of things even though it doesn't make much sense to do so".
And some of it is undoubtedly the classic execution speed / maintainability tradeoff - no-one has unlimited dev time, Oracle included. An LCRNG is simple and well-known.
Also they cannot change it now, because its behaviour is documented and people certainly rely on it somewhere.
And it's about knowing what tools to use for what job anyway. If you just need to roll a few dice, Random is fine enough. If you need generated passwords, or cryptographic keys, you use SecureRandom. And if you need to run Monte Carlo simulations you'll just implement either MT-19937 or one of the WELL generators.
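(For the password case that's all of a few lines -- a sketch:)

    import java.security.SecureRandom;

    // Generates a 16-character random password from a fixed alphabet.
    public class PasswordGen {
        private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        public static void main(String[] args) {
            SecureRandom rng = new SecureRandom();
            StringBuilder pw = new StringBuilder();
            for (int i = 0; i < 16; i++)
                pw.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
            System.out.println(pw);
        }
    }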
I remember reading they pulled the implementation almost straight out of TAOCP, from good old Donald Knuth (if that's enough of a name drop). It might not be great, but at least it was from a reasonably respectable, academic source and not copy-pasta'd out of stackoverflow.