Realistically? Grade based on the thought process and the validity of the argument, not on spelling or grammar mistakes. GPT-3 is still pretty incoherent over the span of enough text.
Kids' writing can also be very incoherent, sometimes more so. But incoherent writing still counts as turned-in work and will earn you points and teacher feedback, whereas GPT-3 output should not.
Test the kids on their own essays, for example? Maybe this could itself be automated with GPT-3?
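For what it's worth, here's a minimal sketch of how that automation might look, using the legacy OpenAI completions client; the model name, prompt wording, and helper name are my own assumptions for illustration:

```python
import openai  # legacy (<1.0) client: pip install "openai<1.0"

openai.api_key = "sk-..."  # your API key

def quiz_questions(essay: str, n_questions: int = 3) -> str:
    """Ask GPT-3 for comprehension questions about a student's own essay."""
    prompt = (
        f"Here is a student's essay:\n\n{essay}\n\n"
        f"Write {n_questions} short questions that test whether the author "
        "really understands the arguments made in this essay:\n1."
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed model; any completion model works
        prompt=prompt,
        max_tokens=200,
        temperature=0.7,
    )
    return "1." + response.choices[0].text

print(quiz_questions(open("essay.txt").read()))
```

The teacher would still need to vet the questions, but generating them is the cheap part.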
The highest-quality answer involves skilled teachers with enough time, who know and understand their students. (Actually, the very highest might involve personal tutors, but let's leave that aside.)
Going down a few steps, you might combine the automated approach with skilled teachers, and maybe add human editors who can do support work asynchronously?
Watching my son try it, I see he spends more time reading the generated essay and correcting its mistakes than he would spend writing one himself. The checking process is very similar to marking, and I think it's possible he's learning more this way.
(Also, he's madly trying to automate fact-checking, which is doing no harm to his programming at all!)
No, I mean managing an AI to achieve an arbitrary task. Prompting, iterating, filtering: they all require high-level input from the user. An LLM is a complex beast, not easy to use (yet). A rough sketch of what I mean:
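Here's the prompt-iterate-filter loop made concrete, again against the legacy OpenAI completions client; the model name and the crude filter are invented for illustration:

```python
import openai  # legacy (<1.0) client

openai.api_key = "sk-..."  # your API key

def draft(prompt: str) -> str:
    """One GPT-3 completion for the given prompt."""
    resp = openai.Completion.create(
        model="text-davinci-003",  # assumed model
        prompt=prompt,
        max_tokens=300,
        temperature=0.9,  # high temperature -> diverse candidates
    )
    return resp.choices[0].text.strip()

def looks_usable(text: str) -> bool:
    """Crude automatic filter: a stand-in for the human judgment this needs."""
    return len(text.split()) > 50

# Generate several candidates, filter, then refine the prompt by hand and repeat.
prompt = "Explain why the Roman Republic fell, for a high-school audience."
candidates = [draft(prompt) for _ in range(5)]
keepers = [c for c in candidates if looks_usable(c)]
# The hard part isn't this loop -- it's rewriting `prompt` after reading `keepers`.
```

That last step, deciding what to change in the prompt after reading the output, is exactly the high-level input I'm talking about.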