Bro. The fundamental idea of computer logic has always been trivial, even without an understanding of binary. There are tons of mechanisms outside of binary that can mimic logic; there was never anything mysterious here. Understanding boolean logic and architecture is not a far leap from an intuitive understanding of how computers work.
Human thought and human intelligence, on the other hand, were a grand, epic concept on the scale of the origin of the universe. It was truly this mysterious thing that seemed like something we would never crack. ML brought it down and reduced and simplified that concept on a massive scale. The entire field is now an extension of the curve-fitting idea. And the disappointing thing is that the field is correct. That's all intelligence is in the end.
This is all I mean. I'm not saying ML is less interesting or easier than any other STEM field. All I'm saying is that the reduction was massive. The progress is amazing, but 99% of the wonder was lost. The gulf in our understanding was crossed in a single step, and now the average person can grasp the basics more easily than they can grasp something like quantum mechanics. There's still plenty to discover and plenty to engineer, but the fundamentals of what's going on are clearer than ever before.
So I think what happened here is that you misread what I wrote and took offense as if I were attacking the field. I'm not. I'm writing this to explain that you're mistaken.
So dial the aggression back.
Usually when people say "ML is just curve fitting" they mean to continue with something like "so it will never be able to compete with humans."
The interesting thing to me about the secret language is that it seems to imply that when DALL-E fit words to concepts, it created extrapolations in its curve fit that are more extreme than the actual training samples, i.e., its fit has out-of-domain extrema. So there are letter sequences that are more "a whale on the moon" than the actual text "a whale on the moon." A linguistic superstimulus.
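To make that concrete, here's a toy numeric analogy (mine, nothing to do with DALL-E's actual encoder), assuming only numpy: overfit a polynomial to bounded samples and the fitted curve takes values, between the samples, that are more extreme than any sample ever was.

    # Toy analogy: a curve fit whose extrema exceed every training sample.
    import numpy as np

    x = np.linspace(-1.0, 1.0, 11)        # 11 training inputs
    y = 1.0 / (1.0 + 25.0 * x**2)         # bounded targets, all in (0, 1]

    coeffs = np.polyfit(x, y, deg=10)     # overfit: degree-10 polynomial
    grid = np.linspace(-1.0, 1.0, 2001)   # probe between the samples
    fit = np.polyval(coeffs, grid)

    print("most extreme training target:", np.abs(y).max())    # 1.0
    print("most extreme point on the fit:", np.abs(fit).max()) # well above 1.0

The oscillation is the classic Runge phenomenon: the fit is anchored to the samples but free to swing past them off-sample, which is the shape of the "more whale-on-the-moon than 'a whale on the moon'" claim.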
Yes, I can confirm that's how I read the "just curve fitting" bit.
Regarding the gibberish-word-to-image issue: CLIP uses a text transformer trained by contrastive matching against images. That makes it different from GPT, which is trained to predict the probability of the next token. GPT would easily tell gibberish words from real words, or flag incorrect syntax, because they would be low-probability sequences. CLIP's text transformer doesn't do that because of the task formulation, not because of an intrinsic limitation. It's not so mysterious once you realise they could have used a different approach to get both the text embedding and a gibberish filter if they wanted.
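As a sketch of that GPT-style check, assuming the Hugging Face transformers library and the public gpt2 checkpoint (my choices for illustration, not anything OpenAI used), with one of the reported "secret language" strings as the gibberish case; the exact numbers don't matter, only the ordering:

    # Score text by average next-token negative log-likelihood under GPT-2.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def avg_nll(text):
        # Passing labels=ids makes the model return mean cross-entropy.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            return model(ids, labels=ids).loss.item()

    print(avg_nll("a whale on the moon"))      # lower: plausible English
    print(avg_nll("Apoploe vesrreaitais"))     # higher: gibberish

A language-modelling loss gives you exactly the "is this real language?" signal; CLIP's contrastive loss never asks that question, so its text tower never learns to answer it.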
A good analogy would be a Rorschach test: show an OOD image to a human and ask them to caption it. They will still say something about the image, just as DALL-E will still draw something for a fake word. The human is expected to produce a phrase whether or not the image makes sense, and DALL-E is under a similar demand. The task formulation explains the result.
The mapping from nonsense word to image is explained by the continuous embedding space of the prompt and the diffusion model's ability to generate images from noise. Any point in the embedding space, even a random one, falls closer to some concepts and further from others. The lucky concept most similar to the random embedding is what ends up steering the image generation.
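A minimal sketch of that nearest-concept argument, with made-up 4-d vectors standing in for the real embedding space (the concept names and dimensionality are purely illustrative):

    # Any random point in embedding space is still nearest to *some* concept.
    import numpy as np

    rng = np.random.default_rng(1)
    concepts = {
        "whale": rng.standard_normal(4),
        "moon":  rng.standard_normal(4),
        "bird":  rng.standard_normal(4),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # A gibberish prompt still embeds to some point in the space...
    gibberish = rng.standard_normal(4)

    # ...and some concept is always closest, so generation is never "empty".
    nearest = max(concepts, key=lambda c: cosine(concepts[c], gibberish))
    print("gibberish prompt decodes toward:", nearest)

There's no "reject" region in a continuous embedding space: every input lands somewhere, and somewhere is always near something.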
Usually, except I went on to elaborate that curve fitting is essentially what intelligence is. If Mr. Genius here had read my post more carefully, he wouldn't have had to reveal how immature he is with those comments.