I think it depends on whether one is using “AI” as a tool or as a replacement for an intelligent expert. In the former case, sure, it's maybe not expected, because the prompter is already an intelligent expert. In the latter case, then yes, I think it is, because if you gave the task to an expert and they did not notice this, I would consider them not good at their job. See also Anscombe's quartet[1] and the Datasaurus Dozen[2] (mentioned in another comment as well).
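To make the Anscombe point concrete, here is a minimal sketch (assuming seaborn is installed and can fetch its bundled copy of the quartet from its online dataset repository): all four sets share essentially the same summary statistics, yet their plots look nothing alike.

    import seaborn as sns

    # seaborn ships a loader for Anscombe's quartet (columns: dataset, x, y)
    df = sns.load_dataset("anscombe")
    for name, g in df.groupby("dataset"):
        print(name,
              round(g["x"].mean(), 2), round(g["y"].mean(), 2),
              round(g["y"].var(), 2), round(g["x"].corr(g["y"]), 3))
    # All four sets print essentially identical numbers,
    # even though their scatter plots are completely different.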
This is true, but I would replace 'intelligent expert' with 'intelligent human expert'.
Graphing data to analyze it - and then seeing shapes and creatures in said graph - is a distinctly human practice, and not an inherently necessary part of most data analysis (the obvious exception being when said data draws a picture).
I think it's because the interface uses human language that people expect AI to make the same assumptions and follow the same processes as humans. In some ways it does, in other ways it doesn't. Expecting it to be the same as a human leads to frustration and a flawed understanding of its capabilities and limits.
> Graphing data to analyze it - and then seeing shapes and creatures in said graph - is a distinctly human practice, and not an inherently necessary part of most data analysis...
I disagree. Even apart from the obviously silly dinosaur and star in the Datasaurus Dozen, the other plots depict data sets which are clustered in specific ways that point clearly to something unusual going on in the data. For instance, no competent analysis of the "dots" data set would fail to call out that the points were all clustered tightly around nine evenly spaced centers. Whether you come to that conclusion through numerical analysis or by looking at a graph is immaterial, but, at least for us meatbags, drawing a graph is highly effective.
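And the numerical route is easy enough. A rough sketch, assuming a local tab-separated copy of the Datasaurus Dozen ("DatasaurusDozen.tsv" with columns dataset, x, y, the layout the paper's download uses), that flags the "dots" set without ever drawing it:

    import pandas as pd
    from sklearn.cluster import KMeans

    # Assumes a local copy of the Datasaurus Dozen, tab-separated,
    # with columns: dataset, x, y.
    df = pd.read_csv("DatasaurusDozen.tsv", sep="\t")
    dots = df.loc[df["dataset"] == "dots", ["x", "y"]]

    # Fit nine clusters and see how tightly the points hug the centers.
    km = KMeans(n_clusters=9, n_init=10, random_state=0).fit(dots)
    print(km.cluster_centers_)        # nine roughly evenly spaced centers
    print(km.inertia_ / len(dots))    # tiny mean squared distance: very tight clusters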
> Whether you come to that conclusion through numerical analysis or by looking at a graph is immaterial, but, at least for us meatbags, drawing a graph is highly effective.
This is what I was trying to say - some things that are extremely helpful for humans (i.e. making graphs) might not be as necessary for AI, so asking a question and expecting a response contingent upon the particular way humans approach a problem is unlikely to get the results desired.
> not an inherently necessary part of most data analysis
You do realize that the LLMs did not find the data suspicious, right? I think your answer would be appropriate if they had answered (without follow-up prompting, which leaks information to the LLM!) that the data was suspicious. But in fact, all the models are saying that the data is normally distributed. Sure, the author said this, but they confirmed it. If you run normaltest on the BMI or step-count columns, you'll find that they are very much NOT normal. In fact, you can also see this from the histograms.
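For anyone who wants to reproduce that check, a minimal sketch; the file and column names here are placeholders, not whatever the article actually used:

    import pandas as pd
    from scipy import stats

    # "health_data.csv", "bmi" and "daily_steps" are hypothetical names;
    # substitute the article's actual file and columns.
    df = pd.read_csv("health_data.csv")
    for col in ["bmi", "daily_steps"]:
        stat, p = stats.normaltest(df[col].dropna())
        print(f"{col}: p = {p:.2e}")  # a tiny p-value rejects normality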
So honestly, this isn't even about the Gorilla. You're hyper-focused there because you're looking for a way to make the LLM right instead of asking why the LLM got it wrong (it did, there's no denying it, so we should understand why it is wrong, right?). The problem isn't so much about expecting it to be human; the problem is whether it can do data analysis. The problem here is that the LLM will not correct you, it will not "trust but verify" you. It is a "yes man" and is trained to generate outputs that optimize human preference. That last part alone should make you extremely suspicious, as it means that when it is wrong, it is more likely to be wrong in exactly the way you won't notice.
[1]: https://en.wikipedia.org/wiki/Anscombe's_quartet
[2]: https://en.wikipedia.org/wiki/Datasaurus_dozen