*Seemingly random modifications of the phrase would aid in strengthening brainwa...

frisco · on March 13, 2012

Yes, it does. A priori, P("freeeedom") << P("freedom"). This decreases the probability that an attacker will stumble across your passphrase using anything other than a pure brute-force approach (more on that below). Further, though not exactly what you're talking about, English has a huge amount (~50%) of redundancy in its structure. Attacking passphrases is a tremendously easier problem with knowledge of the statistical structure of English, if that's actually a good assumption to make. For example, the prior probability of an unknown word in a sentence being "freedom" is X, but when you know that the three preceding words are "I went seeking", the probability of the unknown word being "freedom" becomes Y > X. Beyond misspellings, reordering words (say, German-style verb inversion) might also be effective in thwarting this compression while still being easy to remember. [1]

If you want to brute force your way through, sure, there's no difference in the example sentences given. Brute force is a totally ridiculous proposition given the length of the secret involved [2], though, so you're banking on people preserving language structure to aid in memorization to constrict your search space. Otherwise you're basically screwed (not that you aren't basically screwed anyway if the sentence is really not derived from literature and 10 words long).

[1] For more, this is what you're looking for: http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pd...

[2] http://xkcd.com/936/

Dove · on March 13, 2012

The error is in assuming that applying a simple permutation to a word increases the entropy by a meaningful amount. Capitalizing a letter, substituting a symbol, moving your hand's position on the keyboard, or repeating a letter are common things to do. A dictionary word that has had one of these things done to it, for purposes of password strength, is still a dictionary word.

People commonly think they are being random when they modify their passwords, but in point of fact, they are doing the same thing as everyone else. You cannot ever trust yourself to be random; the only things you can trust to be random come out of random number generators.

That is why I say probability does not work that way. In order for a modification to make a password or phrase meaningfully secure, it must come from a genuinely random source with a large number of live outcome possibilities. Mutating a phrase in a clever, original way that everyone else uses is pointless. It does not make the passphrase "very hard to predict". It makes it "slightly less hard to predict than it was, which was not very hard in the first place."

Fundamentally, "probability does not work that way" in the sense that just because the outcome looks random and the process feels random, that doesn't mean it is.

frisco · on March 13, 2012

> Capitalizing a letter, substituting a symbol, moving your hand's position on the keyboard, or repeating a letter are common things to do. A dictionary word that has had one of these things done to it, for purposes of password strength, is still a dictionary word.

For single-word passwords it can be approximated this way. However, for anything longer, especially a natural language sentence, misspellings make a big difference.

The OED documents 171,146 words in active use. Assume that every word has at least two simple misspellings. Suddenly your dictionary becomes 513,438 words big. This is a linear expansion, but it's in the exponent since you're taking permutations. That's a big deal. Some misspellings may be much more common than others, so you can bias your dataset accordingly, but it's still a huge expansion.
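A rough sketch of that arithmetic, under assumptions that are not from the thread: words are treated as chosen uniformly and independently (a natural-language sentence would not be), and the 5-word phrase length is purely illustrative.

```python
# Dictionary sizes from the comment above.
OED_WORDS = 171_146
MISSPELLINGS_PER_WORD = 2  # "at least two simple misspellings"

# Hypothetical phrase length, chosen only for illustration.
PHRASE_LENGTH = 5


def search_space(dictionary_size: int, phrase_length: int) -> int:
    """Number of ordered word sequences, assuming uniform independent choice."""
    return dictionary_size ** phrase_length


# Each word now has 1 correct spelling plus 2 misspellings.
expanded = OED_WORDS * (1 + MISSPELLINGS_PER_WORD)

plain = search_space(OED_WORDS, PHRASE_LENGTH)
with_typos = search_space(expanded, PHRASE_LENGTH)

print(f"expanded dictionary: {expanded:,} entries")
print(f"search space grows by a factor of {with_typos // plain}")
```

The growth factor is 3 raised to the phrase length, so it compounds with every added word, but the base of that factor stays at 3.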

> You cannot ever trust yourself to be random; the only things you can trust to be random come out of random number generators.

This is true, but neither really here nor there. The entropy of the passphrase is already so great, and then is expanded exponentially with the addition of misspellings and substitutions, even if the distribution of those is biased.

lotharbot · on March 14, 2012

> "for anything longer... misspellings make a big difference."

The question to ask is, how big a difference? Put another way, how many bits of entropy do your misspellings generate?

In your above example, where each word has 2 common misspellings, the misspellings get you ~1.6 bits of entropy per word (log2 of 3 variants). For comparison, adding another randomly selected OED word gets you just over 17 bits of entropy. If we're talking about making meaningfully stronger passwords, making a grammatically correct phrase and then adding misspellings (what the article calls "seemingly random modifications") is a less effective strategy than simply using a series of actually-random words from the OED.

It's better to add entropy 17 bits at a time (whole words) than trying to add entropy piecemeal, 2 bits here and 3 bits there (misspellings, punctuation).
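For anyone who wants to check the numbers, here is a minimal sketch. It assumes each of the 3 spelling variants and each OED word is equally likely, which is the best case for the defender; real misspellings are biased and would yield fewer bits.

```python
import math

OED_WORDS = 171_146      # word count cited upthread
VARIANTS_PER_WORD = 3    # correct spelling + 2 misspellings


def entropy_bits(equally_likely_outcomes: int) -> float:
    """Entropy in bits of a uniform choice among n outcomes."""
    return math.log2(equally_likely_outcomes)


misspelling_bits = entropy_bits(VARIANTS_PER_WORD)  # per word in the phrase
extra_word_bits = entropy_bits(OED_WORDS)           # per added random word

print(f"misspelling choice per word: {misspelling_bits:.2f} bits")
print(f"one extra random OED word:   {extra_word_bits:.2f} bits")
print(f"words you must misspell to match one extra word: "
      f"{extra_word_bits / misspelling_bits:.1f}")
```

Under these assumptions you would have to apply the misspelling choice to roughly eleven words to gain what a single extra random word gives you.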

stuhood · on March 13, 2012

Attacks against this type of phrase will almost certainly use dictionaries for the next decade, until it becomes practical to do rainbow tables of size 64^76. So adding elements to avoid dictionary words definitely helps (for now).

throwaway64 · on March 13, 2012

Do note that 64^76 is approximately 57 orders of magnitude larger than the number of atoms in the visible universe; it's never going to be viable.
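A quick sanity check of that figure, assuming the commonly quoted rough estimate of 10^80 atoms in the visible universe (the estimate is an assumption here, not something stated in the thread):

```python
import math

ALPHABET_SIZE = 64
PHRASE_CHARS = 76

# Assumed order-of-magnitude estimate for atoms in the visible universe.
ATOMS_LOG10 = 80

# Work in log10 to keep the comparison readable.
table_log10 = PHRASE_CHARS * math.log10(ALPHABET_SIZE)
gap = table_log10 - ATOMS_LOG10

print(f"64^76 is about 10^{table_log10:.1f}")
print(f"that is about {gap:.0f} orders of magnitude more than 10^{ATOMS_LOG10}")
```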