The easiest way is to post CSV/JSON/whatever through the HTTP endpoint into a ReplacingMergeTree table.
Duplicates get merged out, and errors can be handled at the HTTP level. (Admittedly, one bad row in a big batch post is a pain, but I don't see that much.)
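Roughly what that looks like against ClickHouse's HTTP interface — a minimal sketch, assuming a local server on the default port 8123; the table and column names here are just placeholders:

```python
# Sketch of the approach: create a ReplacingMergeTree table and splat a CSV
# into it over HTTP. Table/column names and file path are hypothetical.
import requests

CLICKHOUSE = "http://localhost:8123/"

# ReplacingMergeTree deduplicates rows with the same sorting key at merge time.
ddl = """
CREATE TABLE IF NOT EXISTS events (
    id UInt64,
    ts DateTime,
    payload String
) ENGINE = ReplacingMergeTree
ORDER BY (id, ts)
"""
requests.post(CLICKHOUSE, data=ddl).raise_for_status()

# Post the CSV body straight into the table via the HTTP endpoint.
with open("batch.csv", "rb") as f:
    resp = requests.post(
        CLICKHOUSE,
        params={"query": "INSERT INTO events FORMAT CSV"},
        data=f,
    )
# A non-2xx status means the insert failed (e.g. one bad row in the batch),
# so error handling hooks in at the HTTP level.
resp.raise_for_status()
```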
What I meant is that you'll get an HTTP error code from the insert if it didn't work, so that can go through the error handling. This isn't really an "explore this thing", it's a "splat this data in, every minute/file/whatever". I've churned through TBs of CSVs this way, with a small preprocessor to fix some idiosyncratic formatting.
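The preprocessor can be as simple as a generator that fixes each line on the way into the POST — a rough sketch, with made-up fix-ups; swap in whatever your files actually need:

```python
# Hypothetical "small preprocessor": stream the CSV through a generator that
# cleans each line before it reaches the HTTP insert.
import requests

def cleaned_lines(path):
    with open(path, "rb") as f:
        for line in f:
            # example fix-ups only; replace with your own idiosyncrasy handling
            yield line.replace(b"\r\n", b"\n").replace(b"NULL", b"\\N")

resp = requests.post(
    "http://localhost:8123/",
    params={"query": "INSERT INTO events FORMAT CSV"},
    data=cleaned_lines("batch.csv"),  # requests streams an iterable of bytes
)
if not resp.ok:
    # the insert failed as a whole; the body usually points at the bad row
    print(resp.status_code, resp.text)
```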