
The error is in assuming that applying a simple transformation to a word increases the entropy by a meaningful amount. Capitalizing a letter, substituting a symbol, moving your hand's position on the keyboard, or repeating a letter are common things to do. A dictionary word that has had one of these things done to it, for purposes of password strength, is still a dictionary word.

People commonly think they are being random when they modify their passwords, but in point of fact, they are doing the same thing as everyone else. You cannot ever trust yourself to be random; the only things you can trust to be random come out of random number generators.

That is why I say probability does not work that way. In order for a modification to make a password or phrase meaningfully secure, it must come from a genuinely random source with a large number of live outcome possibilities. Mutating a phrase in a clever, original way that everyone else uses is pointless. It does not make the passphrase "very hard to predict". It makes it "slightly harder to predict than it was, which was not very hard in the first place."

Fundamentally, "probability does not work that way" in the sense that just because the outcome looks random and the process feels random, that doesn't mean it is.
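To make "comes out of a random number generator" concrete, here is a minimal sketch that picks passphrase words with Python's `secrets` module, which draws from the operating system's CSPRNG. The eight-word `WORDLIST` and the `random_passphrase` helper are hypothetical stand-ins for illustration; a real list would need thousands of entries to give each word a useful number of bits.

```python
import secrets

# Hypothetical stand-in word list; far too small for real use.
WORDLIST = ["correct", "horse", "battery", "staple",
            "orbit", "lantern", "velvet", "quartz"]


def random_passphrase(num_words, wordlist=WORDLIST):
    """Pick each word uniformly at random using the OS CSPRNG."""
    return [secrets.choice(wordlist) for _ in range(num_words)]


phrase = random_passphrase(4)
print(" ".join(phrase))
```

The point is that the human never gets a vote: every word is a uniform draw, so the entropy is simply the number of words times log2 of the list size.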




> Capitalizing a letter, substituting a symbol, moving your hand's position on the keyboard, or repeating a letter are common things to do. A dictionary word that has had one of these things done to it, for purposes of password strength, is still a dictionary word.

For single-word passwords it can be approximated this way. However, for anything longer, especially a natural language sentence, misspellings make a big difference.

The OED documents 171,146 words in active use. Assume that every word has at least two simple misspellings. Suddenly your dictionary becomes 513,438 entries big. This is a linear expansion, but it's in the exponent since you're taking permutations. That's a big deal. Some misspellings may be much more common than others, so you can bias your dataset accordingly, but it's still a huge expansion.

> You cannot ever trust yourself to be random; the only things you can trust to be random come out of random number generators.

This is true, but neither really here nor there. The entropy of the passphrase is already so great, and then is expanded exponentially with the addition of misspellings and substitutions, even if the distribution of those is biased.
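How much a biased distribution matters can be estimated with Shannon entropy. The sketch below compares three equally likely spellings of a word against a skewed case; the 90/5/5 split is a made-up illustration, not measured data, and `shannon_entropy_bits` is just a helper name chosen here.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Correct spelling plus two misspellings, all equally likely.
uniform = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical bias: the correct spelling is used 90% of the time.
biased = [0.90, 0.05, 0.05]

print(f"uniform: {shannon_entropy_bits(uniform):.2f} bits per word")
print(f"biased:  {shannon_entropy_bits(biased):.2f} bits per word")
```

Under that assumed split the per-word gain drops from about 1.58 bits to about 0.57 bits, so how biased the misspellings are decides how much of the expansion an attacker actually has to search.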


> "for anything longer... misspellings make a big difference."

The question to ask is, how big a difference? Put another way, how many bits of entropy do your misspellings generate?

In your example above, where each word has two common misspellings, the misspellings get you about 1.6 bits of entropy per word (log2 of 3 variants is roughly 1.58), and that is if all three spellings are equally likely. For comparison, adding another randomly selected OED word gets you just over 17 bits of entropy. If we're talking about making meaningfully stronger passwords, making a grammatically correct phrase and then adding misspellings (what the article calls "seemingly random modifications") is a less effective strategy than simply using a series of actually-random words from the OED.

It's better to add entropy 17 bits at a time (whole words) than to try to add it piecemeal, 2 bits here and 3 bits there (misspellings, punctuation).
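The arithmetic is easy to check. This sketch uses the 171,146-word figure from upthread and assumes, generously, that words are drawn uniformly at random and that each word's three spellings are equally likely; `phrase_bits` is a helper name made up for the example.

```python
import math

OED_WORDS = 171_146      # word count cited upthread
VARIANTS_PER_WORD = 3    # correct spelling plus two misspellings

bits_per_word = math.log2(OED_WORDS)
bits_per_misspelling = math.log2(VARIANTS_PER_WORD)


def phrase_bits(num_words, misspell=False):
    """Entropy in bits of num_words uniformly random words,
    optionally with a uniformly random spelling variant per word."""
    bits = num_words * bits_per_word
    if misspell:
        bits += num_words * bits_per_misspelling
    return bits


print(f"one random word:         {bits_per_word:.2f} bits")
print(f"misspelling one word:    {bits_per_misspelling:.2f} bits")
print(f"4 words, all misspelled: {phrase_bits(4, misspell=True):.1f} bits")
print(f"5 words, spelled right:  {phrase_bits(5):.1f} bits")
```

Even with every one of four words misspelled, the phrase comes out weaker than five correctly spelled random words, and the five-word version is easier to remember.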



