This is exactly the letter-based version of the "Dissociated Press" Markov-chain algorithm, right?
I'd suspect some of the perceived quality at higher orders (particularly for the source-code example) is just coming from the transition graph becoming sparse and deterministically repeating long stretches of the input text verbatim.
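If anyone wants to check that, here's a minimal sketch of such a letter-level Markov model (not Goldberg's actual code; the corpus file name is made up). It also prints what fraction of contexts have only one observed successor, which is where the verbatim copying would come from:

    # Minimal sketch of a character-level (order-k) Markov text generator.
    # 'shakespeare.txt' is a made-up corpus path, not a file from the article.
    import random
    from collections import defaultdict

    def train_char_lm(text, order=10):
        """Map each length-`order` context to every character observed after it."""
        lm = defaultdict(list)
        for i in range(len(text) - order):
            lm[text[i:i + order]].append(text[i + order])
        return lm

    def generate(lm, order, n_chars=500):
        """Repeatedly sample a successor of the current context (duplicates are
        kept in the lists, so sampling is proportional to observed counts)."""
        history = random.choice(list(lm.keys()))
        out = list(history)
        for _ in range(n_chars):
            successors = lm.get(history)
            if not successors:
                break  # dead end: this context only occurs at the very end of the corpus
            nxt = random.choice(successors)
            out.append(nxt)
            history = (history + nxt)[-order:]
        return "".join(out)

    if __name__ == "__main__":
        text = open("shakespeare.txt", encoding="utf-8").read()
        lm = train_char_lm(text, order=10)
        # How sparse is the transition graph? A context with a single distinct
        # successor forces the generator to copy the source verbatim at that step.
        single = sum(1 for s in lm.values() if len(set(s)) == 1)
        print(f"{single / len(lm):.0%} of contexts have exactly one possible next character")
        print(generate(lm, order=10))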
Goldberg says parenthesis-balancing and indentation etc. would require lots of non-trivial human reasoning to implement, but I found the example in https://news.ycombinator.com/item?id=9585080 even more amazing: the NN actually learns meter and parts of speech. I don't find it very likely that you could get a character-based Markov chain to generate completely novel, grammatically correct constructions (with correct meter!) from such a small corpus, since you always have to weigh correctness (high order) against the ability to generalise (low order).
It's not true that it learns the meter. If you look through the large generated example you'll see that there's not much consistency in the number of syllables per line.
EDIT: Oh, I misunderstood. I know the RNN can do the parens-balancing, and that's why Goldberg said the parens-balancing was impressive, since with his method you'd need to add extra hacks around it.
People have been working for ages trying to generate passable text by looking at n-grams over whole words. It is quite surprising to see that all along we could have done better by using a simpler model!
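For comparison, the word-level version is basically the same few lines keyed on word tuples. A sketch, with a made-up corpus file name:

    # Sketch of the usual word-level n-gram generator, for comparison with the
    # character-level sketch above; 'corpus.txt' is a made-up file name.
    import random
    from collections import defaultdict

    def train_word_lm(words, order=2):
        """Map each tuple of `order` words to the words observed after it."""
        lm = defaultdict(list)
        for i in range(len(words) - order):
            lm[tuple(words[i:i + order])].append(words[i + order])
        return lm

    def generate(lm, order, n_words=100):
        """Walk the chain, sampling successors proportionally to observed counts."""
        out = list(random.choice(list(lm.keys())))
        for _ in range(n_words):
            successors = lm.get(tuple(out[-order:]))
            if not successors:
                break  # context never seen with a successor
            out.append(random.choice(successors))
        return " ".join(out)

    words = open("corpus.txt", encoding="utf-8").read().split()
    print(generate(train_word_lm(words, order=2), order=2))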
Yes, I guess computers are better than us at deciding how to organize characters in words, judging by the author's typos: 'liklihood', 'Mathematiacally', 'langauge', 'immitate', 'somehwat', 'characer', 'commma', 'Shakespear', 'characteters', 'Shakepearan'.