You know, if you're at the point where you can give a human-readable spec of the problem and the AI can make a passable attempt at it, that's basically the Turing Test -- which is why I think it deserves its status as a holy grail. Something that passes would really give the impression that "there's a ghost inside here".
The problem is that all our AI techniques are fundamentally data-driven, and it's not clear what sort of data you'd feed in to represent good versus bad algorithm design.