This is exactly the letter-based version of the "Dissociated Press" Markov-chain algorithm, right?
I'd suspect some of the perceived quality at higher orders (particularly for the source-code example) is just coming from the transition graph becoming sparse and deterministically repeating long stretches of the input text verbatim.
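If anyone wants to check that, here's a minimal sketch of such a letter-level Markov model (not Goldberg's actual code; the corpus file name is made up). It also prints what fraction of contexts have only one observed successor, which is where the verbatim copying would come from:

    # Minimal sketch of a character-level (order-k) Markov text generator.
    # 'shakespeare.txt' is a made-up corpus path, not a file from the article.
    import random
    from collections import defaultdict

    def train_char_lm(text, order=10):
        """Map each length-`order` context to every character observed after it."""
        lm = defaultdict(list)
        for i in range(len(text) - order):
            lm[text[i:i + order]].append(text[i + order])
        return lm

    def generate(lm, order, n_chars=500):
        """Repeatedly sample a successor of the current context (duplicates are
        kept in the lists, so sampling is proportional to observed counts)."""
        history = random.choice(list(lm.keys()))
        out = list(history)
        for _ in range(n_chars):
            successors = lm.get(history)
            if not successors:
                break  # dead end: this context only occurs at the very end of the corpus
            nxt = random.choice(successors)
            out.append(nxt)
            history = (history + nxt)[-order:]
        return "".join(out)

    if __name__ == "__main__":
        text = open("shakespeare.txt", encoding="utf-8").read()
        lm = train_char_lm(text, order=10)
        # How sparse is the transition graph? A context with a single distinct
        # successor forces the generator to copy the source verbatim at that step.
        single = sum(1 for s in lm.values() if len(set(s)) == 1)
        print(f"{single / len(lm):.0%} of contexts have exactly one possible next character")
        print(generate(lm, order=10))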
Goldberg says parenthesis-balancing and indentation etc. would require lots of non-trivial human reasoning to implement, but I found the example in https://news.ycombinator.com/item?id=9585080 even more amazing: the NN actually learns meter and parts of speech. I don't find it very likely that you could get a character-based Markov chain to generate completely novel, grammatically correct constructions (with correct meter!) from such a small corpus, since you always have to weigh correctness (high order) against the ability to generalise (low order).
It's not true that it learns the meter. If you look through the large generated example you'll see that there's not much consistency in the number of syllables per line.
EDIT: Oh, I misunderstood. I know the RNN can do the parens-balancing, and that's why Goldberg said the parens-balancing was impressive, since with his method you'd need to add extra hacks around it.
People have been working for ages trying to generate passable text by looking at n-grams over whole words. It is quite surprising to see that all along we could have done better by using a simpler model!
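For comparison, the word-level version is basically the same few lines keyed on word tuples. A sketch, with a made-up corpus file name:

    # Sketch of the usual word-level n-gram generator, for comparison with the
    # character-level sketch above; 'corpus.txt' is a made-up file name.
    import random
    from collections import defaultdict

    def train_word_lm(words, order=2):
        """Map each tuple of `order` words to the words observed after it."""
        lm = defaultdict(list)
        for i in range(len(words) - order):
            lm[tuple(words[i:i + order])].append(words[i + order])
        return lm

    def generate(lm, order, n_words=100):
        """Walk the chain, sampling successors proportionally to observed counts."""
        out = list(random.choice(list(lm.keys())))
        for _ in range(n_words):
            successors = lm.get(tuple(out[-order:]))
            if not successors:
                break  # context never seen with a successor
            out.append(random.choice(successors))
        return " ".join(out)

    words = open("corpus.txt", encoding="utf-8").read().split()
    print(generate(train_word_lm(words, order=2), order=2))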
Yes, I guess computers are better than us at deciding how to organize characters in words, judging by the author's typos: 'liklihood', 'Mathematiacally', 'langauge', 'immitate', 'somehwat', 'characer', 'commma', 'Shakespear', 'characteters', 'Shakepearan'.