The extension apparently can be configured to use a locally running instance of the server. But yes, by default it uses the remote version, and thus you post publicly the json, which may or may not be ideal depending on what you're doing.
The fact that it needs a server at all seems unnecessary. It's all written in JavaScript and isn't doing anything that couldn't be done in a browser, so I see no reason why this can't be an entirely client-side application.
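As a rough illustration of that point (a sketch only; `describeJson` is a hypothetical helper, not anything taken from the tool being discussed), parsing and summarizing a JSON document needs nothing beyond what the JavaScript runtime already ships with, so nothing has to leave the machine:

```typescript
// Hypothetical helper for illustration: walks a parsed JSON value and lists
// every leaf path with its type. Empty arrays and objects produce no entries.
// Everything runs in-process; no network access is involved.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function describeJson(text: string): string[] {
  const root = JSON.parse(text) as Json;
  const out: string[] = [];

  const walk = (value: Json, path: string): void => {
    if (Array.isArray(value)) {
      value.forEach((item, i) => walk(item, `${path}[${i}]`));
    } else if (value !== null && typeof value === "object") {
      for (const key of Object.keys(value)) {
        walk(value[key], `${path}.${key}`);
      }
    } else {
      out.push(`${path}: ${value === null ? "null" : typeof value}`);
    }
  };

  walk(root, "$");
  return out;
}
```

Calling `describeJson('{"a":1,"b":[true,null]}')` yields one entry per leaf, such as `$.a: number`. A real viewer does far more than this, but the same in-process approach applies.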
Processing multi-GB files in the browser is... fun. Doing that kind of thing on a server is easier.
*I'm not justifying doing it on the server, especially for an application like this where yes: it can be done in the client.* But I do sympathize because I know from experience why it's easier to do it server-side, without any conspiracies.
I wrote Papa Parse[0] about 10 years ago, and back then at least, it was extremely difficult to stream large files in an efficient, reliable way. Web Workers make things slightly better, but there are still so many issues with large-scale local compute in a browser tab.
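To make one small piece of the streaming problem concrete: a large file has to be read in chunks, and chunk boundaries rarely line up with record boundaries. Below is a minimal sketch of just that piece, assuming newline-delimited input that has already been decoded into strings. `LineSplitter` is a hypothetical name used for illustration; it is not Papa Parse's API and says nothing about how Papa Parse works internally.

```typescript
// Hypothetical illustration: reassembles complete lines from arbitrarily sized chunks.
class LineSplitter {
  private tail = ""; // partial line carried over from the previous chunk

  // Feed one chunk; returns the complete lines that are now available.
  push(chunk: string): string[] {
    const parts = (this.tail + chunk).split("\n");
    // The last element is either "" (the chunk ended on a newline) or an unfinished line.
    this.tail = parts.pop() ?? "";
    return parts;
  }

  // Call once after the final chunk to collect a trailing line that had no newline.
  flush(): string[] {
    const rest = this.tail;
    this.tail = "";
    return rest === "" ? [] : [rest];
  }
}
```

In a browser the chunks might come from slicing a `File` with `Blob.slice`, or from a stream reader. This sketch ignores `\r\n` line endings, newlines inside quoted fields, and multi-byte characters split across chunk boundaries, which are exactly the sort of details that make the real thing hard.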
You get deep enough into the weeds and eventually you realize you can make it work cross-browser if you know which browser you're using (YES, User-Agent does matter for things like this), and then you get called crazy for trying to find out.
Despite all this, I *100%* agree: local-only processing is a hard rule for me as well. (That's why JSON-to-Go[1] does it all client-side. `go fmt` even compiles to WASM and runs in the browser!)
> Processing multi-GB files in the browser is... fun. Doing that kind of thing on a server is easier.
This sounds like a strawman. Not everyone wrangles multi-GB files, let alone multi-GB JSON documents. Those who do are already well aware of the implications. I mean, even some popular text editors struggle with multi-GB plain text files.
You don't need a server to handle JSON. There is no excuse.
"the extension apparently can be configured to use a locally running instance of the server" - well that sounds needlessly complicated, I mean, the code could be implemented directly in the extension (I know, that's probably easier than it sounds if you are trying to maintain both the extension and the online version with the same code base).
"you post publicly the json, which may or may not be ideal depending on what you're doing" - that's never ideal, it's just a smaller problem (if the JSON is publicly available anyway) or a much bigger problem (if it's sensitive personal data).
> or a much bigger problem (if it's sensitive personal data).
Personal data is a red herring. It's not the only thing that matters. For starters, using this at work with anything not explicitly public is likely a violation of your contract. In some contexts, it may even be gross misconduct or illegal, and could expose your employer to large fines.
And, in general, I'd say a tool like this that comes without an explicit, bold warning that it's shipping data off your machine is just being rude.
> Personal data is a red herring. It's not the only thing that matters. For starters, using this at work with anything not explicitly public is likely a violation of your contract. (...)
"Personal data" means the reddest of data. If a system collects and tracks personal information then it will be expected to collect highly sensitive information that is not personal. It makes absolutely no sense at all to try to downplay security problems by coming up with excuses such as "oh it's only leaking personal data".
> The telemetry feature doesn't collect personal data, such as usernames or email addresses. It doesn't scan your code and doesn't extract project-level data, such as name, repository, or author. The data is sent securely to Microsoft servers using Azure Monitor technology, held under restricted access, and published under strict security controls from secure Azure Storage systems.
I.e. "we're not collecting personal data, so you have nothing to worry about". Plus the classic "the data is sent securely to our servers", as if that was supposed to be reassuring. It's one of the most common types of distraction I see: focusing on how the data in-flight won't leak to third parties, and ignoring the fact that it's the first party that shouldn't be getting this data in the first place.
> I mean it the other way: I see the problems routinely downplayed with excuses like "it's not collecting personal data".
You claimed that personal data was a red herring. It is not. Shipping personal data is the worst possible scenario. It's unthinkable to try to make the case that a data leak is not serious because it's just personal data.
> You claimed that personal data was a red herring. It is not. Shipping personal data is the worst possible scenario.
Which is exactly what makes it the red herring. Shipping personal data is one of the worst possible scenarios (I'd argue that, in a corporate context, shipping data that's subject to export controls is worse, as it could easily get you fired, the company fined, and potentially land someone in jail), which makes it a perfect distraction from all the other data that's being exfiltrated. "We're not collecting personal data" is the equivalent of putting a "doesn't contain asbestos" label on food packaging.
Either you do not know what's the meaning of "red herring" or you're failing to understand the problem. Personal data is the reddest of data, even and especially in a corporate context.
You can also have other data that is red, but if your infosec policies fail to prevent or stop personal information from being sent, which is the lowest of low-hanging fruit to spot, then you will assuredly be leaking more red data that is harder to spot.
It makes no sense to try to downplay the problem of leaking personal data. It's the most serious offense in any context, not only for the data but especially for what it says about the security policies in place.
> Either you do not know what's the meaning of "red herring" or you're failing to understand the problem.
Merriam-Webster: "red herring [noun] (...) 2. [from the practice of drawing a red herring across a trail to confuse hunting dogs] : something that distracts attention from the real issue"
English Wikipedia: "A red herring is something that misleads or distracts from a relevant or important question. It may be either a logical fallacy or a literary device that leads readers or audiences toward a false conclusion. A red herring may be used intentionally, as in mystery fiction or as part of rhetorical strategies (e.g., in politics), or may be used in argumentation inadvertently."
This is exactly the meaning I'm using, so I think I know it just fine. To reiterate once again: leaking personal data isn't the only way telemetry can be problematic. It's not even the major issue in practice, thanks to the associated risk of fines and bad PR (GDPR was quite helpful here). Saying that your telemetry is fine because it's not collecting personal data is just a way to distract the reader. It's the equivalent of advertising your heavily processed food product as safe "because it doesn't contain asbestos".
I agree, mostly. But since when isn’t it obvious that posting data with a browser will send that data somewhere? And the users here are (from what I can tell) developers.
I think this is a cool tool for public data and obviously I can’t paste private data sets on any public website, ever.
It stopped being obvious once some of those tools started to blur the line; there are plenty of such little utilities that do everything client-side, or at least claim to. I don't use them with anything but public data, as it takes one mistake or one silent update for the data to get shipped off my machine, but there's now a whole generation of devs who grew up with webapps and online-first software, so I can easily see some developers making this mistake.
Plus, they offer a VS Code extension. It's not so obvious that it's just the same public website underneath.
Additionally, developers who understand those concerns tend to expect that other developers understand them too, and thus would not create an online tool like this in the first place.