Canon's cloud platform has lost users' files and can't restore them (digitalcameraworld.com)
646 points by chvik on Aug 18, 2020 | 399 comments


This sounds to me like the classic “replication is not backups” situation where (at best) all of the user files were stored in a RAID array someplace and that is what the malware ate. If there had been actual, effective backups then it should have been trivial to restore non-corrupted files. It also sounds like someone made the decision not to back up the raw images because they were “big” - that is actually the one thing they should have backed up because all of the smaller files can be regenerated from the raw ones. I would not be surprised if all of this was running under someone’s desk.


> It also sounds like someone made the decision not to back up the raw images because they were “big” - that is actually the one thing they should have backed up because all of the smaller files can be regenerated from the raw ones.

Ironically my experience has been exactly the opposite. It's the demosaic-ed, fully developed copies of my photos that are larger and harder to preserve than the original RAWs. And these files can't be trivially regenerated from the RAWs, either. Let me explain.

The most important issue is that the process of taking a raw image and turning it into an edited RGB-pixel image is not obvious, at all. Tons of steps have to happen in this process, and there's currently no way to describe this process in a way that's compatible with a single open standard or even multiple pieces of software. At all. The steps can be broken down roughly into a series of "instructions", but what those instructions mean (e.g. to Lightroom) is entirely opaque and even secret in the case of closed source programs. Even open source programs, like RawTherapee and Darktable, use entirely different and incompatible approaches, algorithms, and instructions.

What this means on a practical level is that you can preserve your raw files and the instructions for replicating the edits as carefully as you like, but without the exact same piece of software (often even the same version of the program with the same settings and defaults set), your edits are as good as gone forever.

As a result I've had to take fairly drastic steps to make sure my photos are safe. For my older photos, I keep an entire VirtualBox image with Windows and Lightroom backed up along with my raw images, so that I can be sure of restoring exactly the same output files if necessary. (This more-or-less has to be a hacked version of Lightroom because you can't take chances with licensing problems preventing the program from running now that Lightroom is subscription based.) And actually for newer photos, I've moved away from Lightroom to RawTherapee. Even though I feel it's usually inferior, I feel safer since I know the pipeline from raw to burned edit is essentially public. I keep a backed up copy of the RawTherapee source code, but even if that somehow failed someone could make a RawTherapee compatible raw converter from scratch.

So that's why edited images are actually harder to preserve: whereas with raw you can just save them in 2-3 different physical locations and storage devices, to keep the edits you have to take several additional steps and there are more points of failure. Why not just keep the static edited images too? Well, I do that for my most essential photos. But that gets into the other issue, that the edited images are actually larger than the original raws.

If the point is preserving my edits, not just having a copy that's good enough for Facebook, the images have to be lossless. This pretty much means PNG or TIFF, and in fact the latter seems to be necessary since PNG doesn't handle metadata anywhere nearly as well. Unfortunately, while the compression used on raw files tends to be pretty good (which is further aided by the images being mosaiced), the compression algorithms compatible with TIFF are pretty terrible. Add that to the fact that you almost certainly want 16 bit images, in order to preserve as much raw detail as possible (in case you want high quality prints or need to do further editing), and you end up with whopping huge TIFFs. I regularly see my TIFF output files 4-5 times larger than the corresponding raws.
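A rough sanity check of that ratio, as a sketch only: the 24-megapixel resolution and 14-bit raw depth below are assumed figures for illustration, and compression is ignored on both sides.

```python
def uncompressed_mib(pixels: int, samples_per_pixel: int, bits_per_sample: int) -> float:
    """Size in MiB of an uncompressed image with the given layout."""
    return pixels * samples_per_pixel * bits_per_sample / 8 / 2**20

PIXELS = 24_000_000  # assumed sensor resolution, not a figure from the thread

# Mosaiced raw: one colour sample per pixel (Bayer pattern), assumed 14 bits deep.
raw_mib = uncompressed_mib(PIXELS, samples_per_pixel=1, bits_per_sample=14)

# Demosaiced 16-bit RGB output: three samples per pixel.
tiff_mib = uncompressed_mib(PIXELS, samples_per_pixel=3, bits_per_sample=16)

ratio = tiff_mib / raw_mib
print(f"raw ~{raw_mib:.1f} MiB, 16-bit RGB ~{tiff_mib:.1f} MiB, ratio ~{ratio:.2f}x")
```

Under those assumptions the uncompressed RGB file is already about 3.4 times the uncompressed raw, before any difference in compression efficiency widens the gap further.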

In short, managing backups for a high quality photography workflow is actually a good bit more difficult than it seems at first sight - and how it looks at first is not that easy either!


By "raw" I think GP meant the files originally provided by the users, not actual RAW files.


That might be the case. Admittedly I don't know much about this cloud platform, and the article doesn't fill in a lot of details, but I would naively have assumed the target audience for something like this would be pro and semi-pro users, who would want to store RAW files, since amateurs and those just looking to share photos would get more out of Flickr, or even Google Photos / Facebook / Twitter / etc.


Also I don't think the point of Canon's service was to store heavily edited versions of photos, if you were able to edit them at all on their service.


> Even open source programs, like RawTherapee and Darktable, use entirely different and incompatible approaches, algorithms, and instructions.

`git checkout v3.0.2`

`./build.sh`

No idea what you're talking about. As long as you have your RAWs and your sidecar file you can trivially reproduce the picture, at least that's how it works in darktable.

Hell, the old code that is deprecated and "hidden" in the newest versions is actually all still there and if you import your stuff in the newest version it is practically guaranteed to look exactly the same. In fact there is a CI system that checks for delta-Es in each release.


I mean that RawTherapee and Darktable don't use the same code as each other at all. The instructions (usually sidecar files) that tell each program what to do to generate the output are completely incompatible.

> the old code that is deprecated and "hidden" in the newest versions is actually all still there and if you import your stuff in the newest version

I'd expect this to be the case with any good photo developer, but it doesn't pay to take chances. There have been significant bugs in the past with DNG files, for example, and different bugs could be introduced in the future, or bugs you didn't know about could be fixed, and so on. There's a lot of stuff that could go wrong. It's good to be able to just tar the source code.


> I mean that RawTherapee and Darktable don't use the same code as each other at all. The instructions (usually sidecar files) that tell each program what to do to generate the output are completely incompatible.

I'll be honest, this is probably the most ridiculous argument against anything I've ever heard on hacker news. This is like saying C/gcc sucks because Javascript exists. What the fuck?

> I'd expect this to be the case with any good photo developer, but it doesn't pay to take chances. There have been significant bugs in the past with DNG files, for example, and different bugs could be introduced in the future, or bugs you didn't know about could be fixed, and so on. There's a lot of stuff that could go wrong. It's good to be able to just tar the source code.

So... it's a non issue? I don't even...


> I'll be honest, this is probably the most ridiculous argument against anything I've ever heard on hacker news. This is like saying C/gcc sucks because Javascript exists. What the fuck?

Well, sorry you feel that way. /s

The difference is that C and Javascript are widely implemented open standards. Lightroom, Darktable and RawTherapee are using three different opaque, undocumented approaches to developing raw images. None of them has been reimplemented in other software, and it would be extraordinarily difficult to actually do that, because of all the specificity and quirks. You basically need the original software, which means making sure you can still compile it, making sure you have a platform it can run on, and so on. This is more complexity than most people ordinarily expect when they talk about "backing up photos", and that's exactly my point.

> So... it's a non issue? I don't even...

I should probably not even bother responding, since you already showed with your original comment that you didn't bother to even read my post, but I really don't get this. I explained a very specific issue: that there's a lot of complexity to developing raw photos, which means you have to take extra steps to make sure your edits are properly backed up. Open source software has a distinct advantage because you can archive a copy of the software yourself, but it doesn't change the fact that you do need the original software. And in fact you might need the same version, that's what the example of DNG conversion is supposed to illustrate.

This is ... the opposite of a non-issue. In fact it's a very specific issue that I took quite a bit of time to explain in detail. Having to back up your software, or possibly even an entire VM along with your photos to make sure your edits are preserved, goes way beyond what your average person, or even many photographers think they have to do to keep proper backups.


To add to this, I have a love/hate relationship with Adobe updating their RAW Engine version, as opening up old RAW images (with edits stored in XMP sidecar format) will look completely different due to the engine interpreting edits differently, or in extreme cases, when Adobe decides to add or remove features, which has happened to me numerous times during the last ~15 years since going from RawShooter directly to Lightroom. I usually delete exports for non-critical projects to save space, but any paid jobs are backed up multiple times because I've been bitten before by not being able to render the exact same TIFF for print as 5+ years ago.


Actually this highlights a different issue--for photo editing, there isn't one "right" way to do things, like for instance text editing in like Microsoft Word or some shit. darktable has a very different approach compared to Lightroom for instance.


> None of them has been reimplemented in other software, and it would be extraordinarily difficult to actually do that, because of all the specificity and quirks.

Why is this a problem? That's like saying "I need a good GPU to run this game, this sucks!" Well, no shit I guess?

And it's not even close to that because you can always just buy/compile the right version. And for darktable you don't even need that, backward compatibility is guaranteed. Also darktable has a rudimentary Lightroom to darktable conversion tool (never used it).

> This is more complexity than most people ordinarily expect when they talk about "backing up photos", and that's exactly my point.

Not sure about the other ones, but for darktable, it's as simple as keeping the XML alive.

> I explained a very specific issue: that there's a lot of complexity to developing raw photos, which means you have to take extra steps to make sure your edits are properly backed up. Open source software has a distinct advantage because you can archive a copy of the software yourself, but it doesn't change the fact that you do need the original software. And in fact you might need the same version, that's what the example of DNG conversion is supposed to illustrate.

If you don't like having a non-destructive copy, save a high quality JPEG with your edits. Also, I can't think of any other applications (not just photo editing) that you can easily have future-proof non destructive copies of whatever you need to save.

For example: for audio, you can always just save a WAV or FLAC of your work, but you're still relying on the fact that your DAW workflow is still gonna exist 10 years in the future if you try to save your project as opposed to a mastered copy.

Code is also similar. You can save a binary which will probably work, but if you want to save your repo, unless it has good package managers you might have issues later down the line, like that left-pad incident a while back.

Hell, even for something completely different: language drifts over time, and reading Shakespeare is kind of hard unless you know how to read Early Modern English. And we don't seem to have a problem with forward compatibility, it's just a bit of a pain in the ass. I read it in High School and I didn't go insane.

This isn't a photography issue, this isn't a software issue, this is a core part of human experience.

> This is ... the opposite of a non-issue. In fact it's a very specific issue that I took quite a bit of time to explain in detail. Having to back up your software, or possibly even an entire VM along with your photos to make sure your edits are preserved, goes way beyond what your average person, or even many photographers think they have to do to keep proper backups.

You don't need a VM, at least for darktable. You're making an issue that hasn't appeared but is potentially possible into a real issue for yourself. Maybe it will become an issue, but just figuring out how to install the old version is gonna be good enough. You do you mane. I'll stick with running dt natively and trusting the forward compatibility and OpenCL acceleration.


For storing lossless RGB you might consider FLIF (https://flif.info), given you've demonstrated some flexibility in your setup. It supports 16-bit channels.


Thanks for the suggestion! I tested it on one file. Support for the format might turn out to be a problem: to use the flif command line tool, I had to convert my input file from TIFF to PNG first, which means I'd have to be extremely careful not to lose metadata if I started doing this for real. It was also pretty slow.

File sizes (DNG lossless, other formats 16 bits):

* DNG: 18.5 MiB

* TIFF: 84.2 MiB

* FLIF: 55.1 MiB

So while FLIF is unsurprisingly much better than TIFF, it's still ~3 times larger than the DNG, which means that a pretty noticeable amount of additional disk space would have to be sacrificed to store the final edited versions of all my photos.
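For anyone who wants to redo the comparison with their own files, the ratios can be computed like this. It is a minimal sketch that uses only the three sizes reported above; the variable names are illustrative.

```python
# Sizes in MiB, as reported for the single test file above.
sizes_mib = {"DNG": 18.5, "TIFF": 84.2, "FLIF": 55.1}

baseline = sizes_mib["DNG"]
ratios = {name: size / baseline for name, size in sizes_mib.items()}

# Fraction of space FLIF saves compared with the TIFF of the same image.
flif_saving = 1 - sizes_mib["FLIF"] / sizes_mib["TIFF"]

for name, r in ratios.items():
    print(f"{name}: {r:.2f}x the DNG")
print(f"FLIF saves {flif_saving:.0%} relative to TIFF")
```

For this one file the TIFF comes out at roughly 4.5 times the DNG and the FLIF at roughly 3 times, so a single sample should not be read as a general benchmark.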


This is lossless as in it losslessly preserves a frame buffer, but it isn't lossless in that it can be edited in RAW losslessly.


I'm not sold. In the cases where you lose the raw vs a post processed image, one of those is significantly more lost information. It may be non trivial to reproduce a post process image from a raw image, but the other way around is usually impossible.

In one case you've lost original sensor information, in the other case you've lost some parameters and possibly the algorithm used. The former is infinitely more difficult to synthesize from scratch.


I think that's a really shallow way to look at it. For a professional photographer it's the final version of the file that is actually sold, not the RAW or film negative. These final files are often not some 300x200 web preview. These are full sized images. You can argue the final edit has more value and is the one to preserve. Ansel Adams broke the process down into 3 parts with the initial capture being but the first. So losing the final edits is losing 66% of the total work.

I'm only a hobbyist photog and the bday parties and weddings I've shot result in thousands of images. I can't imagine having to shoot multiple events every week for years. For a pro, the edits represent thousands of hours of work. That the work can be redone theoretically might mean very little to a working photographer.


> For a pro, the edits represent thousands of hours of work. That the work can be redone theoretically might mean very little to a working photographer.

This is exactly the issue I had in mind, well said.


My wife shoots weddings. A wedding lasts an hour, the reception maybe three or four. She’ll spend the better part of a week editing the whole event.

Raw images seem to be very important until the very moment they’ve been edited (or, hell, rejected from editing). Then their worth is very little.


Some images require hours of painstaking editing. There are some photographers who will spend days or more editing an image. The edited image is as much an instantaneous snapshot of the photographer as it is the scene in front of the lens. Looking at my Darktable catalog, it is easy to estimate that I spend at least 250 hr/yr editing images.

Raw files, and sidecars, retain their value because sometimes one wishes to revisit an image/edit. If I had to pick one or the other, I might prefer to lose the edited images. That is because I believe I have more/better days of my editing life ahead of me and because I underestimate how much editing I have done. Moreover, there are plenty of RAW images awaiting the day they come to life.

In commercial work, like weddings, the final product is the edit. If there is one thing that must be retained, that is it.


So far, I have found that Darktable's edits have survived changes in versions. Someday there will probably be breaking changes, but I haven't yet felt it personally.


Every single piece of "dead" code is still in darktable and isn't accessible unless you try to import a sidecar from a previous version. LR might be shitty but the fact that the guy says darktable has this problem is just silly.


This, like reproducibility in machine learning workflows, is one of those things that needs to be baked in at the start or will become a complete nightmare.


This was a fascinating, but also disturbing, read; it does sound like your solution (quasi legal as it might be) is the one that makes the most sense if you do this professionally


The original story spoke of the rumours of malware involvement, but in the update which forms the first half of the linked article, Canon says there was no malware involved.

(“Replication isn’t backup” still applies, of course.)


accidental malware (I get that the name implies malicious and deliberate but that never stops definitions from expanding)


Do any cloud providers create backups on top of replication though?

Backing up databases (terabytes) is feasible, they're not that big.

But an entire cloud storage for photo and video for millions, we're talking maybe exabytes. The notion of making "separate" backups seems cost-prohibitive.

I am curious, though -- for services like Dropbox or Google Drive, how many replicas are there of your files? I know there must be redundancy in case a disk fails, but do they keep 2 instances of your data or more? And are the instances spread out geographically, some or all?


Google talks a bit about their general backup strategy in the SRE book:

https://landing.google.com/sre/sre-book/chapters/data-integr...

The short version is that the backups are a mix of short-term local backups + backups across the network on distributed filesystems + offline tape backup.

I can't speak to Drive, but the tape backup is certainly used for GMail. (There's a case study about Gmail having to restore from backup in the above link.)


"Do any cloud providers create backups on top of replication though?"

We do.[1]

For exactly 1.75x our normal pricing, we will replicate your entire account, nightly, to a geo-redundant site which is not open for normal customer use (and, therefore, has a lower risk profile). This GR site is the he.net core datacenter, in Fremont.

It's also worth pointing out that replication to rsync.net buys you malware/ransomware protection since your account is snapshotted, by ZFS, nightly, and those snapshots are immutable (read-only).

[1] rsync.net


I've been wondering about ransomware protection through snapshots. Presumably (and I do know more or less nothing about it) the malware aspect of it is present on the system significantly before the ransomware aspect is triggered - so restoring to yesterday's backup just puts you back in the position to get pwn3d again? How do companies get around this?


No you don't. What would you do if all the files of your customers that did not pay for backups are lost?


>customers that did not pay for backups are lost

But they pay for it.


>> we're talking maybe exabytes

I don't see the size as the big problem. If one cloud can handle the size, the backup storage system can too. For me, the real issue is timing. How do you backup a cloud full of constantly changing data? Do you draw a line in the sand, an image of the cloud state at a particular moment? Is that even possible? You have to do backups of smaller chunks, individual accounts, but eventually that just looks like another software-managed internal array structure rather than a true duplicate. Your backup system is just as susceptible to deletion error as the cloud it lives within.

Mandatory xkcd: https://xkcd.com/1737/


"For me, the real issue is timing. How do you backup a cloud full of constantly changing data? Do you draw a line in the sand, an image of the cloud state at a particular moment?"

I can only describe what we do, and, of course, there is an enormous scale difference here, but ...

Every single rsync.net account is its own ZFS filesystem, which means every single account gets snapshotted[1] nightly on a schedule. This means that the enormous operation of "backing up" all of rsync.net happens in small, manageable chunks.

Of course, the ZFS snapshots of a customer account are not "backups" per se, but if a customer chooses "geo-redundant" storage in another facility, it is those very same snapshots that we (zfs) send over. Those are, indeed, backups.[2]

The most interesting part, in my opinion, is that the daily/weekly/monthly snapshots are immutable. So you can publish your rsync.net credentials or suffer a disgruntled employee or ransomware attack, etc., and those snapshots remain safe - they are read-only.

[1] Accessible, browseable, in ~/.zfs/snapshot

[2] GR storage costs 1.75x normal pricing.


So it does not assert that the filesystem is in a consistent state before it snapshots. IE we could be midway through applying a database transaction; or there could be a reference in one file to another that doesn't exist (yet).

It could be argued that one needs to do a site wide "write cache flush", stop, and snapshot. Not that I think for one second the service provider should (or could) be in a position to detect when a good time to snapshot might be....


It can't be "just" as susceptible as what happened here.

If you even do the straightforward "don't freeze time anywhere, just rolling copy all the files onto a hard medium", you may end up with logically inconsistent data, in that files from later in the backup may actually not have existed alongside ones from earlier ones, but if you write down the timestamps periodically, you end up with a way better end state than "I lost all the files".

At worst, you end up with "I may have lost the last 3 minutes' worth of files from the last backup I did".
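A minimal sketch of that "rolling copy plus timestamps" idea in Python. The function name `rolling_backup` and the `manifest.json` file are hypothetical choices for illustration, and the temporary directories stand in for real source and backup media.

```python
import json
import shutil
import tempfile
import time
from pathlib import Path

def rolling_backup(source: Path, dest: Path) -> dict:
    """Copy every file under `source` into `dest` without freezing writes,
    recording the time at which each file was copied."""
    dest.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)                 # copies data and file metadata
        manifest[rel.as_posix()] = time.time()     # when this copy was taken
    (dest / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest

# Tiny demonstration on throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    (src / "album").mkdir(parents=True)
    (src / "album" / "a.jpg").write_bytes(b"photo A")
    manifest = rolling_backup(src, dst)
    copied = (dst / "album" / "a.jpg").read_bytes()
```

The copy is not a consistent point-in-time image, but the recorded timestamps bound how stale or inconsistent any given file can be. A real tool would also need to cope with files that disappear or change while the run is in progress.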


> Do any cloud providers create backups on top of replication though?

Yes - I work at one of the FAANGs for a team that's doing precisely this. We develop an internal disaster recovery tool that creates backups of data files that can't be touched by the creating application, and that can be read back in a disaster event to recover the data.


Are those backups for you or for your customers? The parent is referring to cloud providers, e.g. AWS S3.


both


I’m sure this is dated but here is a write-up on Google’s backup system from High Scalability - http://highscalability.com/blog/2014/2/3/how-google-backs-up...


Just using AWS S3 as an example, you can copy a bucket to another bucket (preferably in a different region) with a policy of retaining all versions- preventing the problem of a delete whether malicious or accidental, from being reflected in the backup copy.

There is also the option of using Glacier for lower cost long term storage.

If you are asking does regular S3 provide a true backup capability out of the box, it does not, or at least did not last time I actively looked at this about 2 years ago.


We don't think AWS does periodic offline backups of stuff stored in S3 so that they don't find themselves in this exact embarrassing scenario? Regardless of whether it's user-facing for AWS users, I'd hope they do, or certainly that they did before they got a good sense of the long-term reliability of S3 as a big system.


I think they do all sorts of backup stuff - and as a consumer of S3 you need to evaluate how resilient their backing up is. S3 is pretty murky about permanence guarantees so I'd always look at making sure there is a separately maintained replication script to some medium I have control over if that data is irreplaceable and the costs of losing it are significant to the business.

Judging those costs and making that call is a complex matter of course.


What would such a service then do as an end-user beyond what `aws s3 sync s3://my/bucket /mnt/my/backup/drive/` does?

(Honest question, not provocation. To me either you trust AWS or any party to never lose your data [unwise], or you basically ask them to offer a way to rsync, which AWS does)


I think that's essentially what such a service would do. You might throw in some periodic less frequent syncs (maybe you sync down to the main backup every day and sync that backup to a secondary backup weekly or monthly) and maybe some of those successive syncs are done to hosts that are usually disconnected from the network to add in a firebreak.


They publish durability numbers for S3:

designed to provide 99.999999999% durability of objects over a given year
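To put that figure in perspective, here is the arithmetic as a short Python sketch. The 10,000,000-object bucket is a hypothetical size chosen for illustration, and the calculation treats the design target as an independent per-object annual loss probability, which is a simplifying assumption.

```python
from fractions import Fraction

# 99.999999999% durability per object per year, kept exact with fractions.
annual_loss_probability = Fraction(1, 10**11)
durability = 1 - annual_loss_probability

objects_stored = 10_000_000  # hypothetical number of objects

expected_losses_per_year = objects_stored * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(f"expected objects lost per year: {float(expected_losses_per_year)}")
print(f"average years between single-object losses: {years_per_lost_object}")
```

Note that this number describes loss caused by the storage system itself. It says nothing about objects removed by a bad delete request, a buggy job, or malware with valid credentials, which is the failure mode this thread is about.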


Well, versioned bucket replication is essentially backup when you configure it in a way that the writer to the first bucket can't do any operations on the second bucket. Since bucket replication doesn't replicate deletion markers you essentially end up with your data being duplicated instead of just replicated writes.
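The behaviour described above can be illustrated with a toy model in Python. This is a sketch of the idea only: `VersionedBucket` and `replicate` are hypothetical names, and nothing here is the actual S3 API.

```python
class VersionedBucket:
    """Toy versioned bucket: writes and deletes only ever append history."""

    def __init__(self):
        self.versions = {}  # key -> list of versions; None is a delete marker

    def put(self, key, data):
        self.versions.setdefault(key, []).append(data)

    def delete(self, key):
        self.versions.setdefault(key, []).append(None)

    def get(self, key):
        history = self.versions.get(key, [])
        return history[-1] if history else None

def replicate(source, target, key):
    """Copy the newest version across, but never propagate delete markers."""
    latest = source.get(key)
    if latest is not None:
        target.put(key, latest)

primary, backup = VersionedBucket(), VersionedBucket()

primary.put("a.jpg", b"photo A")
replicate(primary, backup, "a.jpg")

primary.put("a.jpg", b"ENCRYPTED")   # malicious or accidental overwrite
replicate(primary, backup, "a.jpg")

primary.delete("a.jpg")              # delete on the primary
replicate(primary, backup, "a.jpg")

backup_history = backup.versions["a.jpg"]
```

In this model the overwrite does reach the backup, but only as a new version, so the original bytes remain in `backup_history`, and the delete never reaches the backup at all. That only helps if the writer to the first bucket has no permissions on the second one.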


A thing stored in Google Cloud is generally erasure coded, which provides redundancy against the failure of individual devices or hosts, and another copy is stored separately in a geographically separate place. So you might think of it as there being 1.7 copies of your file in each of at least two places.
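For readers unfamiliar with erasure coding, here is the simplest possible instance as a Python sketch: three data shards plus one XOR parity shard. Production systems use stronger codes with different parameters. The coding behind the "1.7 copies" figure is not stated above, and this example does not try to reproduce it.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

data_shards = [b"PHOT", b"O_DA", b"TA!!"]   # k = 3 equal-sized data shards

# Parity shard: XOR of all data shards (m = 1).
parity = data_shards[0]
for shard in data_shards[1:]:
    parity = xor_bytes(parity, shard)

# Simulate losing one data shard, then rebuild it from the survivors plus parity.
lost_index = 1
survivors = [s for i, s in enumerate(data_shards) if i != lost_index]
rebuilt = parity
for shard in survivors:
    rebuilt = xor_bytes(rebuilt, shard)

# Storage overhead is (k + m) / k, here 4/3 instead of 2x for a full mirror.
overhead = (len(data_shards) + 1) / len(data_shards)
print(rebuilt, f"overhead {overhead:.2f}x")
```

The point is that redundancy against a lost device can cost less than a full second copy. Like replication, though, it protects against hardware failure and not against a bad write or delete.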


> Do any cloud providers create backups on top of replication though?

Any serious database does offer backups in addition to replication, and that kind of validates OP's point. As for object stores there are a variety of techniques [0].

> But an entire cloud storage for photo and video for millions, we're talking maybe exabytes. The notion of making "separate" backups seems cost-prohibitive.

Cold-stores like the one Facebook built would be apt [1].

> I know there must be redundancy in case a disk fails, but do they keep 2 instances of your data or more? And are the instances spread out geographically, some or all?

See [0]. Backblaze do share quite a bit about the space, too [2][3]. With minio, one could run their S3-like system on-premise [4].

[0] https://maisonbisson.com/post/object-storage-prior-art-and-l...

[1] https://engineering.fb.com/core-data/under-the-hood-facebook...

[2] https://news.ycombinator.com/item?id=10540361

[3] https://news.ycombinator.com/item?id=17550837

[4] https://news.ycombinator.com/item?id=12392081


Basecamp used to, I have no idea if they still do. Databases were replicated, and also backed up remotely. User uploads were replicated in the storage system, and also backed up to S3.


I haven't looked at dropbox in a while, but they at least used to have an option for keeping revisions. It did come with an extra cost.


Unfortunately Dropbox's revision history isn't reliable enough for a backup. If you notice, their marketing carefully avoids use of the word "backup" despite the features seemingly implying it.

I had a client where a glitch in the Dropbox sync caused 300k+ files to be deleted when we added a new PC. Dropbox support was unable to successfully undo all the file changes, and I had to get a ticket with CS escalated to a special team to get everything restored. Even when it was finished, they could not give a guarantee that every single file deleted was restored.

Maybe things got better since they've launched Dropbox Rewind, but given that revision history has been a feature for years and it still didn't work right, I no longer trust them as a backup.


I'm completely soured on dropbox since they dialed the greed up with the arbitrary device count limit and nagging suggestions to upgrade that can't be turned off if you're near the max. I don't care enough to stop using it but I will never give them money after this.


Not to mention no delayed deletion.

Ideally, a "delete" should mark images as unavailable and queue them for deletion at a later date (e.g. in 30 days). This provides protection against accidental deletion by users, accidental account deletions/deactivations, paid accounts terminated due to lack of payment, automated software mistakes (such as this), and so on.
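A minimal in-memory sketch of that delayed-deletion pattern in Python. The `Store` class, its method names, and the 30-day `GRACE_PERIOD` are illustrative assumptions, not a description of any real service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=30)  # assumed retention window

@dataclass
class Store:
    """Toy object store with delayed ("soft") deletion."""
    live: dict = field(default_factory=dict)    # key -> data
    trash: dict = field(default_factory=dict)   # key -> (data, purge_after)

    def put(self, key, data):
        self.live[key] = data

    def delete(self, key, now):
        # Hide the object immediately, but only schedule the real deletion.
        self.trash[key] = (self.live.pop(key), now + GRACE_PERIOD)

    def restore(self, key):
        data, _ = self.trash.pop(key)
        self.live[key] = data

    def purge_expired(self, now):
        # The only place where data is destroyed for good.
        expired = [k for k, (_, purge_after) in self.trash.items() if purge_after <= now]
        for k in expired:
            del self.trash[k]
        return expired

store = Store()
t0 = datetime(2020, 8, 1)
store.put("img_001.cr2", b"raw bytes")
store.delete("img_001.cr2", now=t0)
store.purge_expired(now=t0 + timedelta(days=29))   # grace period not over yet
store.restore("img_001.cr2")                       # so the file can come back
```

Keeping the destructive step in a single, separately scheduled function means an accidental or automated delete can be undone until the grace period runs out.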

The last company I worked at, 80% of our database was inactive, useless, or redundant data, all kept to protect against all kinds of issues (such as reused usernames, subpoenas, annual reporting, etc). We could always zero out PII from an account, but we never removed a record we didn't have to. It led to a lot more RAID migrations than I would have liked to have done, but it certainly made the application easier to manage. Plus, we never worried about fragmentation or holes in our data files, cascading deletes, etc.


There was an event when a startup I was at asked Basho (they were the company behind Riak db) about backing up our data. Backing up was a little side-feature that was possible to rig up, but I recall they looked at this inquiry as if I had two heads -- as if to say, it's replicated, why jump the shark? There was a bug with one of the Riak releases, and all the data was lost. (When we scaled up with this buggy Riak release, the empty node assumed the master role, and all the child nodes went, ah... the new state has no data, let's all delete records 0..k. Fun times.)


BTsync did that to me!

One computer had a hard drive failure, BTSync deleted all the files from the computer that didn't.

Doubleplus ungood.


That's why synchronisation/replication is NOT a backup.


What’s the difference between replication and backups? Is the distinction that backups must be stored on separate infrastructure, whereas replication might still be 1 or 2 points of failure?


Replication is about having data level redundancy to protect from drive failure. Backups are about having point in time snapshots of the system state, and about having them tiered from a location perspective. The 3-2-1[1] principle says to have 3 total copies, 2 of which are local but on different devices, and 1 of which is offsite. This gives you tiers of recoverability.

It’s important from a backup perspective that it’s point in time as well, otherwise as soon as you get ransomware that encrypts your file you now have replicated those changes everywhere.

[1] https://www.backblaze.com/blog/the-3-2-1-backup-strategy/
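To make the distinction concrete, here is a toy sketch in Python. The dictionaries standing in for a primary store, a replica, and a list of snapshots are purely illustrative and do not model any real storage system.

```python
import copy

primary = {"a.jpg": "photo A", "b.jpg": "photo B"}
replica = {}     # mirrors the primary after every change
snapshots = []   # point-in-time copies, never modified after creation

def replicate():
    """Make the replica identical to the primary, whatever state that is."""
    replica.clear()
    replica.update(primary)

def snapshot():
    """Record an independent copy of the primary as it is right now."""
    snapshots.append(copy.deepcopy(primary))

replicate()
snapshot()   # taken while the data is still good

# Ransomware (or a buggy job) overwrites everything on the primary...
for name in primary:
    primary[name] = "ENCRYPTED"
replicate()  # ...and replication faithfully copies the damage.

restored = snapshots[-1]  # only the point-in-time copy still has the originals
```

After the overwrite, `replica` holds the encrypted data just like the primary, while `restored` still holds the original contents.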


I would add to that, that the offsite one should also be offline.


That’s a lot harder to pull off. What methods do you use to accomplish this?


Where I worked, weekly point in time backups were required. Those backups were put onto tape drives, those tape drives were set on a pallet and then driven by truck to an offline second location. IMO _that’s_ how it’s done properly.


When we used Iron Mountain at my last job, there were two interesting additional concerns:

1) That nobody in our company actually knew where they were physically stored (insider risk?), but

2) That we had assurance that the physical storage was far enough away that a physical disaster in our area wouldn't also touch where the offline storage was located.

There's a concomitant jump in RTO if that's what you do, but hopefully that's well understood among the stakeholders.


Definitely a commitment jump, and not always necessary. We ran our own data centers, off the topic of backups, and it was wild to me that they were so disaster proof. There were different power lines from different power companies coming in in case either of the power companies had an outage, and some giant diesel power generator that could last a long time while fully powering the data center.


Well, once upon a time, when I worked at a smaller company, I kept a revolving collection of tapes at home.

A slightly larger company also used tapes, and stored them at an undisclosed (to me) offsite location.

A much larger company kept it all in datacenters, but the "offline" backups were disconnected from the WAN when they weren't actively being used.


> That’s a lot harder to pull off. What methods do you use to accomplish this?

Depending on how much data you're backing up, sneakernet works.

When I was still in the office, my company had me rotate a set of backup hard drives between the office lockup and a strongbox in my house. The notion was that it was unlikely that the office building and my house 12 miles away would both burn down at the same time.

Of course, now I'm working from home, so all of the eggs are in one basket again.


External USB drives for me. One set of drives is stored in a fire safe at home, one set of drives is stored at the office, and one set is stored at a relative's house in another state. The ones at home get refreshed most often, the ones stored at the office get refreshed about once a quarter, and the ones at the relative's house get refreshed during holidays and family get-togethers.

The drives are encrypted (Truecrypt) since they will be outside my physical control. The ones at the office I am prepared to abandon should I get fired/laid-off.


S3 bucket with object lock is as offline and convenient as it gets


If you're going to pitch an AWS service as "offline", then pitch Glacier.


I don't know much about S3, but isn't S3, by definition, online, not offline?


AWS S3 (and a few compliant providers) offer immutable options, in both governance and compliance modes.

Allegedly, compliance mode is unalterable by any account, period; I guess it is the equivalent of the immutable attribute without an overriding account. I'm not immediately familiar with any literature on attacks on this feature, but I've also not searched hard; however, I know from my clients that it's an accepted form of WORM, and that cloud storage like S3 is considered in the same vein as tape when immutability is in play.

I suppose it will be a case in the future that proves the efficacy of AWS/Azure/s3 providers, but for now, a lot of regulatory policies for 3-2-1 allow for such storage to fulfill the "2" part of the 3-2-1 rule.


I’ve always wondered if Amazon backs up S3. I don’t think they explicitly say but I get the impression that it is the user’s responsibility to replicate to a second region to guard against data loss so I am guessing not. Object Lock wouldn’t protect against an S3 failure.


They say it's designed to provide significant durability: https://docs.aws.amazon.com/AmazonS3/latest/dev/DataDurabili... (99.999999999% durable over a year per object, and able to sustain data loss in 2 facilities).
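
To put that figure in perspective, here is a back-of-the-envelope calculation. It is only a sketch: it assumes the quoted number can be read as an independent per-object annual survival probability, and the ten-million-object bucket is a hypothetical example.

```python
from fractions import Fraction

# Quoted design target: 99.999999999% annual durability per object.
durability = Fraction("99.999999999") / 100
annual_loss_probability = 1 - durability  # exactly 1 in 10**11


def expected_annual_losses(object_count):
    """Expected objects lost per year, assuming independent losses."""
    return object_count * annual_loss_probability


# Hypothetical bucket holding ten million objects.
losses_per_year = expected_annual_losses(10_000_000)
years_per_lost_object = 1 / losses_per_year

print(float(losses_per_year))      # 0.0001
print(int(years_per_lost_object))  # 10000
```

Note that this only models the storage layer losing an object. A buggy or malicious delete issued by the account owner is not covered by a durability figure at all.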

Given the peeks that amazon has provided into the scale of S3, I don't know if you CAN 'back it up'.


As far as I can tell it’s replicated (by default - see “reduced redundancy storage”), not backed up.


Or at least in an append only data store.


Though, tread carefully there. If the thing that makes it append only is that you're just not using its destructive update features, then the store isn't really append only. Just because everyone agreed to limit the ways they interact with the data store doesn't mean you can trust buggy code, sloppy programming, and attackers to honor that agreement.


I see a bunch of explanations, but I don't think any of them really drove home the reason for why you usually need both.

Imagine if you had a document that stored useful information. You had this document automatically replicated to another system in a different time zone every time there was a change.

You think you're doing great, if there is an outage in the one system you just get your important document from the other system.

Then one day someone accidentally copies and pastes the wrong data into the file. Now what do you do? If you go to your replicated copy, it also has the bad data.

The answer is you should have had backups too, so you could go back an hour, a day, a week or even much longer, depending on how much data you have, how often it changes, how important it is, and how quickly someone would notice bad data.
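
The scenario above can be played through with a toy model. This is a minimal sketch, not how any real replication system works: `ReplicatedDoc`, its methods, and the snapshot labels are all invented for the example.

```python
class ReplicatedDoc:
    """A document with a live replica and optional point-in-time snapshots."""

    def __init__(self, content):
        self.primary = content
        self.replica = content
        self.snapshots = []  # list of (label, content) pairs

    def write(self, content):
        self.primary = content
        # Replication copies every change, good or bad, right away.
        self.replica = content

    def take_snapshot(self, label):
        # A snapshot freezes the content as it is at this moment.
        self.snapshots.append((label, self.primary))

    def restore(self, label):
        for snap_label, content in reversed(self.snapshots):
            if snap_label == label:
                self.write(content)
                return
        raise KeyError(label)


doc = ReplicatedDoc("useful information")
doc.take_snapshot("monday")
doc.write("wrong data pasted by accident")
replica_after_mistake = doc.replica  # the replica has the bad data too
doc.restore("monday")                # only the snapshot gets us back
```

After the bad write both `primary` and `replica` hold the mistake, and the snapshot is the only copy that still has the original content.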


I think the biggest factor is timing (although backups should also be on a second system).

In short, if anything that happens to the files is immediately copied to the "backup" then you don't actually have a chance to recover from any software problems. Whereas, if you make a copy of the data every night and keep the last 30 copies you can find an issue like this and go back in time to retrieve the files from before it started.
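
A nightly copy with a 30-copy retention window could look roughly like this. It is a sketch under simple assumptions: backups are identified only by their date, and `prune` and `latest_before` are hypothetical helper names.

```python
from datetime import date, timedelta


def prune(backup_dates, keep=30):
    """Split backups into (kept, removed), keeping the newest `keep` copies."""
    ordered = sorted(backup_dates, reverse=True)
    return ordered[:keep], ordered[keep:]


def latest_before(backup_dates, incident):
    """Newest backup taken strictly before the incident, or None."""
    candidates = [d for d in backup_dates if d < incident]
    return max(candidates) if candidates else None


# 45 nightly backups starting on 2020-07-01 (dates are made up).
nightly = [date(2020, 7, 1) + timedelta(days=i) for i in range(45)]
kept, removed = prune(nightly, keep=30)

# A problem that started on 2020-08-10 can be rolled back to the night before.
restore_point = latest_before(kept, date(2020, 8, 10))
# A problem that started before the retention window cannot be recovered.
too_old = latest_before(kept, date(2020, 7, 10))
```

The retention count sets how long a problem may go unnoticed before the last clean copy is rotated out.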


Replication is intended to create a 1:1 copy of the data. If data is removed from the primary system, replication will ensure it's removed from the replica.

A backup can be a 1:1 copy but it should be set up such that something going wrong with the primary (e.g. cryptolocker malware), can't affect it (since "something went wrong with the primary" is what the backup is intended to resolve).

To do that you could take the backup offline or use features like filesystem snapshots to ensure that changes can be rolled back.


The problem is that most replication setups will replicate deletes (and other modifications) as well. So if misbehaving software deletes (or corrupts) things, those deletes (or corruptions) will be replicated, and the replication does not give you a way to actually recover from a mistaken delete.


Backups protect against a wider class of problems. For example, a software vulnerability/bug or a human error could result in the deletion of an object in all replicas (because deletions are replicated too), but you'd usually need another incident to occur for you to lose the backup as well.


If your drive fails (or a single computer fails) replication saves your data, and keeps things running seamlessly.

If you `sudo rm -rf / --no-preserve-root` your drive, replication deletes everything, while backups let you restore to the last time you took a backup.


Replication includes replicating deletes (and overwrites, and subtler changes). Replication is READ / WRITE.

Backups are WRITE ONCE. They persist if the original is deleted or modified.

Separate infrastructure and geographical distribution are orthogonal.
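
The READ / WRITE versus WRITE ONCE distinction can be sketched as follows. Everything here is hypothetical (`WriteOnceStore`, `mirror`, the keys), and as noted elsewhere in the thread, a check inside the same process is only a convention; real write-once storage has to be enforced by the storage system itself.

```python
class WriteOnceStore:
    """Toy backup target: a key can be written once and never changed."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        if key in self._objects:
            raise PermissionError(f"{key!r} already exists and cannot be overwritten")
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


def mirror(source, replica):
    """Toy read/write replication: make the replica identical to the source."""
    replica.clear()
    replica.update(source)


primary = {"photo.raw": b"pixels"}
replica = {}
mirror(primary, replica)

backups = WriteOnceStore()
backups.put("2020-08-01/photo.raw", primary["photo.raw"])

del primary["photo.raw"]  # accidental or malicious delete
mirror(primary, replica)  # the delete is faithfully replicated

recovered = backups.get("2020-08-01/photo.raw")  # the backup still has it
```

The store deliberately has no delete or overwrite method, so the only way to change a backed-up object is to write a new key.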


Classically, backups are taken then stored offline (e.g. on archive tape, on optical media, or on drives that are disconnected and put in storage between backups). Otherwise, if they are not 'cold' backups, they are still disconnected so if the main storage blows up, it won't impact the data in the backups.

Replication usually involves storage that is powered up and connected somehow to the same systems as the main storage; and any data that's corrupted on the main storage would propagate to the replicated storage.


A backup would be resistant to someone accidentally deleting all the files, for instance. Replication would not.


replication is not backup and backups need to be replicated too.. and it goes on and on..


This is all because someone thought it would be “easy” to add a stream of recurring revenue for cloud photo storage but didn’t take the time to design the service from a technical perspective to be resilient enough to not crash and burn.


This can be just calculated risk.

Incredible damage for some users can be negligible damage to the company. In this case just 10TB of data was lost, so maybe thousands of users for the long term storage option.

Losing customer data has potential for big reputation or brand damage, but surprisingly often the damage is relatively small. Putting in too much effort when potential damage to business is minimal may not be worth the cost.


This comment reads as "This is normal; nothing to see here move along".

Which is more of an indictment of the state of things than of the comment itself.

If you can't protect people's shit, don't offer to hold onto it for money!


No, that comment sounds like the insurance job described in Fight Club[1]

> Narrator: A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.

[1] Video link: https://www.youtube.com/watch?v=SiB8GVMNJkE


A farmer's donkey dies. His son gets an idea on how to make some money.

Sometime later he turns up at dinner with $200. "Where did you get that money?" His father asks.

"I sold raffle tickets for the donkey" his son replied.

"What?!?" Shouts his father. "You can't sell raffle tickets for a donkey that you know has already died! Weren't people mad?"

"Just the guy who won" said the son, "so I gave him his money back."


Really, he'd want the value of a donkey - not his money back.

But with a raffle, the 'take' can easily exceed the cost of the prizes. So I guess the son would still make out ok.


He would want the value of the donkey. The son would offer the value of the tickets. Lawyers would make it somewhere in the middle. Then the lawyers would walk off with all the money.


This has been a valuable thought experiment for me.


Or that actually happened, in the known Ford Pinto Memo "affair":

https://en.wikipedia.org/wiki/Ford_Pinto#Cost–benefit_analys...


And a little further down the same page you linked....

https://en.wikipedia.org/wiki/Ford_Pinto#Retrospective_safet...

Turns out the Pinto was within the realm of normal for a car of that type and era, and if you're building something normal it's hard to justify recalling it at great expense unless everyone else is recalling it too (in which case that would be normal).


Yep, but that is an analysis from a Legal viewpoint.

What - I believe - made the Pinto case memorable was not that in a given subset of accidents it was (or was not) slightly more dangerous, it is the fact that the matter was looked into by the manufacturers and that it was waved off on a mere cost/benefit basis (with a supposed cost for increased security per car of US$ 11).

Mind you this is all in all normal, any safety norm or related technical specifications is (or should be) based on the "reasonable costs for society to obtain a reasonable increase in safety".


This thread has been a bit of a roller coaster.


AND in the Suzuki Samurai rollover deaths (I knew one victim): where they knew there was an issue very early and had a "crisis plan".

https://en.wikipedia.org/wiki/Suzuki_Motor_Corp._v._Consumer...


And meanwhile something like 20% of Bronco IIs got rolled and nobody cared. But as soon as someone puts an Explorer on its roof or a Samurai barrel rolls through an intersection, there's a problem.

Pretty much that whole generation of midsize SUVs rolled easily. That's just one of the engineering trade-offs all those vehicles made and anyone can tell you that. If Consumer Reports was picking my SUV as the one to target I'd assemble a "get these people off our back" team too.

The media has made more than a few scandals out of pure normalcy over the years. The Frontline(?) gas cap thing comes to mind as a good example. Reality doesn't seem to matter nearly as much as the marketing team and lawyers you use to shape the public's perception. It's no wonder that corporations' reaction to reported problems always seems to be to try to steamroll whoever is complaining and/or keep things on the down low.


I think GP's point was that they did the fight club math ahead of time, knowing they'd have a certain number of deaths and lawsuits, and proceeded to market with that plan.

Spin/media/marketing/CR/NHTSA after it's all public is a bigger subject, yes.


He forgot to add up the cost of bad publicity due to car crashes. If not somehow swept under the carpet, it would harm the car's credibility, reducing overall sales (and profits).


Or what? They could and they did.


Or they could like, not.

I think there's a world where corporations aren't given a carte blanche pass on human decency.

I believe that world is nicer and where possible I work to give it room to exist;

I boycott Disney because I believe what they did to the copyright system is wrong.

I don't use Google or Microsoft products because I don't think they treat users with respect. (Minus youtube which I use a mirror, and the occasional google search when I get tired of banging my head against ddg (~1-2/week, but my tolerance is probably higher than most))

I'm even abstaining from my copy of Overwatch despite loving the gameplay of a character in it, because I don't believe supporting the HK protests warrants a ban.

I'm not going to boycott Canon from this, incompetence is generally forgivable, but I certainly think it's noteworthy, and if it comes to light they are as nonchalant as OP says they can be then I won't be supporting them.

Downthread someone had mentioned there was a 10GB/user cap, so it's unlikely someone had been using it as their only copy which is also reassuring.

But then again maybe someone was just getting into photography after having a baby, and their precious first photos were lost! Or photos from a special date, or first (god forbid last) moments with a pet; it's sad to think about, and it can't be justified with "darn, guess it's gone forever so Canon could gain a small advantage using some risk calculus. Oh well, I give more of a damn about the stock market than those photos."

Don't give me that shit!

Disclaimer: I only own a pitiful EOS 20d and a cute f1.4 50mm, but my friend has been super into photography and I've been wanting to get into it again.

Thanks for reading my rant.


> there was a 10GB/user cap, so it's unlikely someone had been using it as their only copy

I don't understand why a small total size implies people are less likely to use it for their only copy of something?


I think because video and photos have the potential to be huge, especially in higher-end cameras, so it wouldn't take very long to exhaust 10gb of data, implying that they might be using a service that allows more for longer-term stuff (e.g. S3).


> I think there's a world where corporations aren't given a carte blanche pass on human decency.

I'll wager a guess that in that world a corporation is not a 'legal person'. In that world, a "person" must have human characteristics. They call it the Legal Turing Test.


"Minus youtube which I use a mirror"

?


Not the poster, but intent was probably something like "... which I use [via] a mirror"

I suppose it partially aligns with the intent of a boycott


Mhm, it was a disclosure of the extent of the boycott, as it is unfortunately difficult to get away from YouTube. Taken further it's also about denying data (which recommendations I click through, at which times, from which videos, in which moods etc, and social graph via view timings).

Edit: I don't believe that kind of stuff should be collected without explicit, informed user consent. Make it clear what you're collecting and doing with it and chances are I'll opt-in.

https://instances.invidio.us/ Deleted sibling comment: https://i.imgur.com/YhXIU6O.png


"If you can't protect people's shit, don't offer to hold onto it for money!"

If you can't sell a reliable car, don't make one?

Products fail sometimes and we in the tech-sphere get to laugh about it. No such thing as 100% failure proof. Customers that paid for the service would be angry but no doubt, they'd get their money back.

Yes they might have lost more, but that's the business sometimes.


There's a price point out there for everybody, and the lower ones are not very secure I imagine.

What's missing is, some certification level that tells you how secure the data is. So you know what you're buying into?


You mean like an SLA which says the availability of your data?


SLAs are bullshit.

Refunds are fine for retail customers, and most SLAs I’ve ever had the need to discuss seem to be based on that model.

If I buy a computer from you for $1000 and it breaks, I’m out $1000, I want my money back. Vendor might be fine with that. If I buy a commercial oven from you for $40,000 and it breaks I’m out the price of the oven, plus half of my sales for a week while it gets repaired, wages for people who are now half useless to my business, and loss of trust and/or customers for my company since I had none of my signature baked goods for the entire week.

I don’t want SLAs, I want a warranty. Insurance.


Your comment is basically saying capitalism is bad.

I agree.


Then turn off the computer and cancel your internet.


You know, people can hold even very critical views of things without full-on 120% zealot-level boycott, especially if it's about the very system that produces next to every single thing necessary for life and survival, save oxygen maybe.


Capital markets produced the goods and services that make this forum possible. If you hate capitalism, you hate basically everything that got you to this conversation. So why not stop feeding into it, in the few obvious ways you can?

Because it's fake, you don't hate capitalism if you like Hacker News, even if you think you do. Hacker News is literally owned and operated by a venture capital firm.


There is a difference between being critical of something and hating it. And if you want to change a system, there are not many ways to do so without interacting with it.

Say you happen to be born into a communist regime but dislike communism. Getting out of it might not be feasible for any reason ranging from not being allowed to, not having the means, to not wanting to leave your community behind. No matter if you now want to completely tear down the predominant socio-economic systems or just tweak them, either way you are going to have to interact with them.

If you want to destroy capitalism, hacker news is probably not the most promising place to start. If your goal is to tweak it I actually think it is.


> There is a difference between being critical of something and hating it.

The original comment said “hate” not “is bad”; probably the source of a lot of the confusion in this thread lol.

> If you want to destroy capitalism, hacker news is probably not the most promising place to start. If your goal is to tweak it I actually think it is.

Idunno, even just calling it “bad” seems to imply that there are better things, which is not really a statement about “tweaking” anything. A statement in favour of tweaking capitalism would be something like “capitalism on its own can not reliably produce moral goods”.


> The original comment said “hate” not “is bad”

I read that as a somewhat hyperbolic expression of dislike, like "I hate spinach" – strong dislike, but not actual hatred. I might actually say that I hate spinach, because I really can't stand the taste, but I don't feel hatred towards it. But yes, I guess that may have caused some misunderstanding.

> just calling it “bad” seems to imply that there are better things, which is not really a statement about “tweaking” anything.

Like an improved version of that thing. Case in point: My last batch of salsa verde was bad, too few tomatillos, too much seasoning, so I'll tweak the recipe with a note to make sure the ratios there don't skew that way next time. I don't have to give up on my recipe for good, or on salsa verde for good, or on food for good.

> A statement in favour of tweaking capitalism would be something like “capitalism on its own can not reliably produce moral goods”.

There are different variations of capitalism implemented around the world right now; there are parameters that can be tweaked, like with the German "soziale Marktwirtschaft", which was a version of a capitalist market-based economy with socialist elements. Even in the US, the flavor of capitalism in Roosevelt's day was quite a bit different from the one currently in place, I believe.


That doesn't follow.

That's like saying:

- Slavery is bad.

- Then don't wear cotton clothes.

Maybe instead change the system so there's no slavery, and have non-slaves pick up cotton and make cotton clothes?

Non-capitalist countries can have computers and internet too. In fact most computers today are made in "communist" China.


> In fact most computers today are made in "communist" China.

No. Most computers today are made in "Special Economic Zone" China, which has exempted almost all communist economic policies in favor of Free market capitalism, because even hard core authoritarian communists know that their economic model is a path to poverty for everyone, including the leaders.


Nevertheless, such a system is quite a long way from a true free-market approach. And I think it demonstrates that market based systems that aren't "free markets" can be quite effective, and indeed from a theoretical perspective it seems to me very likely that such approaches work better than a "remove regulation at all costs" approach.

EDIT: there was initially a typo in this comment. I meant "aren't free markets", not "are free markets".


It is insane to me that anyone would point to China as an example of how the rest of the world should operate.

If you do clearly you place no value on what most people call Basic Human Rights.

>> And I think it demonstrates that market based systems that are "free markets" can be quite effective,

I am not sure what you are trying to say here. Everyone knows Free markets result in the best economic outcomes; that was my point. The argument I was countering was that Free markets are bad and socialism is good, which history has proven wrong time and time again. The only counter that socialists have is "those were not real socialism".

Any society more than a couple hundred people in size will fail miserably under a socialist economic model. Only Free markets provide for the best outcomes for the most people.

That is not to say there is no issue with capitalism in general, Crony Capitalism which is what you have with Heavily regulated markets like we see in the US is a real problem and is largely the cause of Wealth Disparity in capitalist systems,

I am further confused by your statement that Free Markets are better than a "remove regulation at all costs" approach.

Truly free markets have no regulation, so I am not sure what point you are driving at.


>If you do clearly you place no value on what most people call Basic Human Rights.

For starters, who said their human rights violations go wholesale with their economic policy (and you can't have one without the other)?

Second, well, the US is not some paragon of freedom either compared to many western countries.

From the largest prison population per capita (25% of the world's inmates for 4% of the world's population), to slavery, Jim Crow, redlining, segregation until the 1970s, Japanese internment camps, the only country dropping atomic bombs (and on civilians nonetheless), McCarthyism, mass surveillance, and so on, all the way to Bush, Trump, and co (including the multi-war supporter Obama), plus constant meddling in foreign countries, invasions and occupations, etc., all the way to waterboarding, and such, yeah, I'd take "socialist" Denmark over it...


>>From the largest prison population per capita (25% of the world's inmates for 4% of the world's population)

Some of that is because many nations massively underreport their prison population (e.g. China).

I agree that the US has over criminalized Society, namely in the failed attempt to curb drug abuse by criminal enforcement

>>to slavery

yes yes, the US is the only nation to ever have slavery in all of history... ::rolleyes::

>> Jim Crow, redlining, segregation until the 1970s, Japanese internment camps, the only country dropping atomic bombs (and on civilians nonetheless), McCarthyism, mass surveillance, and so on, all the way to Bush, Trump, and co (including the multi-war supporter Obama), plus constant meddling in foreign countries, invasions and occupations, etc., all the way to waterboarding, and such, yeah, I'd take "socialist" Denmark over it...

I guess that depends on where you place your personal freedoms. All of those things are terrible; most of them are historic atrocities that many nations unfortunately have in their past, or, for the more modern ones, impacted a small number of citizens, and they are all universally hated by the population.

However with socialism even Denmark style socialism my Income would be massively stolen errr "taxed", I would not have many of my core freedoms (aka Self Defense / gun rights), I would not have the same free speech protections as the US, etc.

These all do impact me directly...


> it is insane to me that anyone would point to china as an example of how the rest of the world should operate. If you do clearly you place no value on what most people call Basic Human Rights.

I don't like the human rights abuses in China. But that doesn't mean they don't do anything well. Such an argument is silly. I could make a similar claim about the US. How does the US do on the right to healthcare for example? Awfully. But that is only tangentially related to the merit of their overall economic policy.

> everyone knows Free markets result in the best economic outcomes

No they don't. There is a lot of evidence that free markets lead to better economic outcomes than soviet-style planned economies. That they lead to the best possible outcomes of all possible economic systems is a highly suspect claim given that the economic outcomes they lead to aren't all that great (e.g. high levels of inequality).

You seem to have in your mind that free markets and planned economies (which you call socialism, although there are actually many forms of socialism that don't use a state-planned approach such as market socialism https://en.wikipedia.org/wiki/Market_socialism) are the only possible economic models. But in fact there are many other approaches. For example the scandinavian model is a mixed economy with some free-market aspects but also a large state (32% of the economy is public sector in Denmark vs 17% for the US and 50% for China https://en.wikipedia.org/wiki/List_of_countries_by_public_se...), and much stronger regulation than in the US.

> That is not to say there is no issue with capitalism in general, Crony Capitalism which is what you have with Heavily regulated markets like we see in the US is a real problem and is largely the cause of Wealth Disparity in capitalist systems

Regulation is indeed a problem in the US. But that is only because of the regulatory capture leading to ridiculous regulations that enforce monopolies. In other places (I live in the UK) regulation actually prevents monopolies (e.g. we have laws that anybody may start an ISP using the shared fibre/copper infrastructure) and leads to much better outcomes than unregulated markets. I find it hilarious that you think the US markets are heavily regulated when they are some of the least regulated amongst OECD countries.

> I am further confused by your statement that Free Markets are better than a "remove regulation at all costs" approach. Truly free markets have no regulation, so I am not sure what point you are driving at.

Having no regulation is what I am describing as "remove regulation at all costs". My point is that there is a lot of evidence that such markets don't in fact function better than a more measured approach of "regulate where it's appropriate". The political position that all markets ought to be as free as possible is almost as extreme as the one that economies ought to be state run.


>>I could make a similar claim about the US. How does the US do on the right to healthcare for example? Awfully.

Well first and foremost Healthcare is not a right. It is a Service that has to be provided based on limited resources

Rights are things like Free Speech, the Right to Self Defense, the right to not have your property stolen, etc

Aka Negative Rights

The right to be free from abuse

Things like "health care" are not Human Rights at all


The fact that you don't see things like healthcare as a right is precisely my criticism of your system. Of course everything is subject to limited resources and external constraints. Enforcing property laws is subject to limits on policing resources.

How is "the right to not have your property stolen" any different from "the right to not have preventable diseases from killing or harming you"? In both cases there is some external problem that is prevented through the use of resources: in one case it's the police and legal apparatus. In the other case it's healthcare providers.


>> Enforcing property laws is subject to limits on policing resources.

Incorrect, in the US I have no right to policing either, I have the right to defend my property myself from people that wish to take it. Our Courts have held continuously that no individual has the right to protection from the police force.

We do not have Positive Rights in the US, never have and hopefully never will because in order to have positive rights one must infringe on negative rights to supply them

So while I have the right to property, I must provide defense of that property myself. I can also ask the courts to mediate if there is dispute over said property.

Similar with healthcare, and one area I believe government regulations today intrude on my rights. I should have the right to ACCESS to care, but I would not have the right to require someone provide me with care. So for example I should have the right to purchase any medical supply, drug, etc from any person that wants to sell those supplies free from government interference, but the government should not provide me with that care, and should not steal errr tax other people's labor to provide me with that care either


Courts don't just mediate. Their judgements are backed by the power of the state (otherwise a thief could simply ignore them). And this is absolutely somebody having a positive right (to have their property ownership defended) that infringes on someone else's negative right (to take their property without the state interfering). It happens to be a right that I agree with. But I don't think you can differentiate it with the positive/negative distinction like you think you can.


>>that infringes on someone else's negative right (to take their property without the state interfering)

That is not a negative right, it is not a right at all.

The entire concept of rights, as understood by common law, is based on Natural Rights under a Lockean philosophy of Self Ownership, that you own yourself.

Under no conceptual understanding of natural rights would you have the right to take someone else's property.

So if we are going to start from a completely foreign concept of negative rights then we have no basis for a conversation at all

https://www.learnliberty.org/videos/positive-rights-vs-negat...


Rights are a much broader concept than "natural rights". See: https://plato.stanford.edu/entries/rights/ I certainly recognise property rights (although in principle there's no reason why one couldn't believe in rights without that being one of them), but I also recognise a lot of other rights (including "positive" rights). And don't recognise negative rights as being more fundamental than positive rights.

The positive/negative distinction


It is not a matter of them being more "fundamental"

There is no way for a government to provide you with healthcare (positive rights) without infringement of my negative rights to my labor (i.e. they must take a part of my labor via theft, what you call taxation, in order to provide you with the right to a service). The only other way for a government to provide a service would be indentured servitude, where they require a person to provide the service under threat of violence / imprisonment.

That is why I reject the concept of positive rights: they require that other people have their labor, property, or self seized by the government to fulfill those rights, and that is fundamentally unethical to me.

negative rights do not have this ethical problem


There's no way for government to protect your property rights without infringement on my negative rights to my liberty either.

This is particularly obvious in the case of private ownership of land. Without private property rights I would be free to walk across and otherwise make use of that land; it is your property rights which take that freedom away from me. And this isn't a theoretical issue either: prior to the enclosure of land in the 15th-19th centuries (https://en.wikipedia.org/wiki/Enclosure), most land here in the UK was common land that anybody had a right to make use of to graze animals and make a living. The expansion of private property directly took away that right and deprived many of their livelihoods. Even today we still have the Right To Roam (https://en.wikipedia.org/wiki/Freedom_to_roam#United_Kingdom) and can at least walk on private land, which provides for significantly more freedom than is available in the US (there is close to zero danger of being shot just for walking somewhere in the UK).

For other property it's less obvious, but enforcing your property rights still requires encroaching on my liberty. For example, without property rights, I would be free to walk into a shop and take whatever I wanted to. Or rock up to your unlocked car and drive off with it. The only way that this is prevented is through active enforcement by the state. Ultimately, if I persist in doing these things then I will be physically imprisoned. And it takes significant resources to run the policing and justice systems, just like healthcare. Now obviously nobody thinks I should just be able to walk up and take your car. But other forms of private property are controversial. In particular the idea that you ought to be able to keep all of that wealth that people give you (without taxation). You are only able to hold onto that wealth in the first place because the state enforces your right to keep it. There is nothing a priori that says that the state (which ultimately represents its citizens) must always enforce that right.

To summarise, many "negative rights", including the right to private property, have exactly the same ethical problems as "positive rights". At which point it becomes a simple matter of what you consider moral and important. You're right that state-provided healthcare deprives you of wealth that you could otherwise have. But the reverse is also true. You having the wealth deprives others of healthcare that they could otherwise have. You just happen to think that property rights are more "natural" and important.


I have already addressed most of this, so we are just talking in circles. It is not the government that protects my property rights; this is a point I don't think we can get past, as you seem to believe that government is the source of rights and I do not.

Under no rational ethical foundation of cooperative society would anyone have the right "to walk into a shop and take whatever I wanted to"; that would not be a civilization at all, something you seem to inherently recognize while still insisting on using this straw man to advance your flawed view of rights.

The argument about Real Property(land /natural resources) is a better one, and one I am actually sympathetic to but this argument was not one I ever used to support my position, in fact I am very much a Georgist and do believe that land & natural resources should be "owned" by humanity collectively and I have ethical problems with the foundation of homesteading.

That said, this discussion was around wealth generated not by land but by labor. If you want to fund a public healthcare system under a Georgist single tax system then I would be amenable to this discussion, but taxing wages is theft of labor; it is akin to indentured servitude and there is no way to get around this fact. However, in modern society no one suggests such a public finance method; the default go-to funding model is theft of income, either personal or business. I can never support such an unethical system.


Actually, most computers today are made in China, which hardly favors "free market capitalism". The companies take lots of subsidies, some are state owned, they depend on state infrastructure and policies (even to get workers), and so on.

At best it's a mix, but hardly "free market capitalism".


But we’d make more money!


I try to avoid normative statements because it leads to better discussion. Unfortunately this style of discussion leads to comments that project a normative stance, or criticize me for not making my normative position clear.

(positive statement is statement of what without indication of approval or disapproval, normative statement expresses a value judgment)


> In this case just 10TB of data was lost

Seems like a value judgement to me.

Anyhow, something about `innumerable, singular sites of suffering' makes the point rather moot.


"Just" is a positive assessment about relative size. I could be wrong about the positive statement, of course. But it seems a very low amount for a cloud service.

I was trying to understand how business thinking works, not judge or praise it.


> In this case just 10TB of data was lost, so maybe thousands of users for the long term storage option.

I'm not even a professional photographer and have about 2TiB of photographs.

This is barely data loss in the great scheme of things, but certainly a wake up call for Canon and their users.


A “calculated risk” doesn’t make it okay when the risk is all on the consumer losing their memories. This corporate view on business risk alone is unethical.


There is no such thing as lack of risk in this world.

When a restaurant takes your reservation for Friday night, there's a 0.00001% (or so) risk that they will burn down on Thursday or that half the staff will get heart attacks or something. Should they not do reservations at all?


If they don't have a working smoke alarm, or are storing flammable materials over the burner, or don't have enough emergency exits then no, they should not do reservations at all.

It seems Canon did not have a backup; they had replication of data. If you are storing data for money, then this is professionally negligent behavior. Not being able to prepare for everything is not a valid excuse to not prepare for what you reasonably can.


What does adherence to fire code have anything to do with taking reservations?

Also, reservations are free. It's not reasonable to expect restaurants to do anything and everything to honor 100% of reservations when many customers never show up for them and don't have any skin in the game anyway. Whatever point is trying to be made here is going out the window thanks to the "zero 9" nature of restaurant reservations. (Yes, not even 90% of restaurant reservations end up turning into actual paying tables.)


Indeed, it has nothing to do with taking reservations, but providing service which is the crux of the issue. If they accept customers while knowingly not following fire regulations, then yes, it’s wrong and unethical; similar thinking should apply to software services.


Losing a restaurant reservation on a single night when surrounded by alternatives (aka - lots of backup plans) is not the same as losing data that people trust as the backup plan for their data - the loss of which causes a lot more harm. Especially if it’s a respected brand.


“I lost your reservations” is so much less serious than “I lost the pictures of grandma’s birthday”


Limited resources means you can never realistically mitigate 100% of the risk. You just have to decide how much risk your company will accept.

People still die in risky surgeries. But we do them despite the risk on the consumer is them being dead.


I'm not sure this is a great comparison.

> People still die in risky surgeries. But we do them despite the risk on the consumer is them being dead.

The risk of doing or not doing is death. I'm not sure this is the best analogy.


> The risk of doing or not doing is death

Cosmetic surgery...


That's valid, and I dunno why you're being downvoted. I didn't read "elective surgery" from your original comment, but re-reading it, there's nothing in it that discounts it.


It's easy for this calculated risk to misfire, if the platform gets a reputation for losing data early in its life.


In this case just 10TB of data was lost

I don't know about you, but something doesn't seem right here. Just 10 TB of data? For me even 0.5TB is too much. Heck, 0.0001 is too much.


How hard can it be to have a set of data, make 2 copies and store them in different locations?


Reading their description of the problem, it wasn't that they lost one of their redundant servers, it was that their code that cleans up temporary storage accidentally was also cleaning up long-term storage. Increasing the redundancy of their storage wouldn't have helped.
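To make that failure mode concrete, here is a minimal Python sketch of a cleanup job that only ever selects files explicitly tagged as short-term. The `StoredFile` type and the `tier` field are hypothetical, and the 30-day window is taken from the short-term storage mentioned elsewhere in the thread; nothing here is Canon's actual design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical model: every file carries an explicit storage tier.
SHORT_TERM = "short_term"
LONG_TERM = "long_term"
RETENTION = timedelta(days=30)  # assumed short-term retention window


@dataclass(frozen=True)
class StoredFile:
    name: str
    tier: str
    uploaded_at: datetime


def select_expired(files, now):
    """Return only short-term files older than RETENTION.

    Anything not explicitly tagged SHORT_TERM is never eligible, so a
    long-term file handed to this job by mistake is skipped, not deleted.
    """
    return [
        f for f in files
        if f.tier == SHORT_TERM and now - f.uploaded_at > RETENTION
    ]


now = datetime(2020, 8, 18)
files = [
    StoredFile("tmp.jpg", SHORT_TERM, now - timedelta(days=45)),
    StoredFile("fresh.jpg", SHORT_TERM, now - timedelta(days=2)),
    StoredFile("wedding.cr2", LONG_TERM, now - timedelta(days=400)),
]
expired_names = [f.name for f in select_expired(files, now)]
print(expired_names)
```

A guard like this only narrows the blast radius of a misdeployed job; it does not replace an independent backup that the cleanup code cannot reach.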


If you have regular (daily, hourly) tape backups that are shipped off-site then you greatly mitigate the potential damage of software bugs. It sounds like they weren’t making backups and just treating failover replica servers as a “backup” (which they clearly aren’t).


This sounds like a case of a bug in their software uncovered a huge gaping hole in their architecture.

Put another way, while they seem to have fixed the immediate problem, there is no word on whether they are putting real BACKUP plans in place to avoid a similar problem happening and wiping out everything.


Always have a backup that is write once! Storage is cheap nowadays. If this is your core business, get your act together.


If you're storing a large enough set of data for long enough, 2 copies will be woefully insufficient to ensure you don't suffer a loss of a given subset of the data.
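A back-of-envelope way to see this, as a Python sketch. The model is deliberately crude and is an assumption of this example: each copy of an object is lost independently with the same probability over the period, and nothing gets repaired in between. The object count and loss probability are made-up illustrative numbers.

```python
def expected_objects_lost(num_objects, p_copy_loss, copies):
    """Expected number of objects for which every copy is lost.

    Assumes each copy fails independently with probability p_copy_loss
    over the period and that no copy is re-replicated in the meantime.
    """
    return num_objects * p_copy_loss ** copies


def p_any_loss(num_objects, p_copy_loss, copies):
    """Probability that at least one object loses all of its copies."""
    p_object_survives = 1 - p_copy_loss ** copies
    return 1 - p_object_survives ** num_objects


# Illustrative numbers only: 10 million objects, 0.1% loss chance per copy.
for copies in (2, 3, 4):
    lost = expected_objects_lost(10_000_000, 0.001, copies)
    risk = p_any_loss(10_000_000, 0.001, copies)
    print(copies, lost, round(risk, 4))
```

Under these assumptions, two copies of 10 million objects give an expected loss of about 10 objects, so losing something is close to certain; each extra copy cuts the expectation by a factor of a thousand. Real systems have correlated failures and repair processes, so treat this as intuition rather than a durability estimate.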


The famous Ford Pinto story comes to mind. The car with the fuel tank in a bad spot, making the car a 'fire trap'.

A risk analysis made by Ford was misinterpreted and seen as a calculated risk of recall vs settlements for injuries/death.

But It turns out it was mostly a hoax.

https://en.wikipedia.org/wiki/Ford_Pinto#Retrospective_safet...


Intent tends to matter with the law. Ford knew there was a problem, knew they could fix it, and elected not to because "it's a small number of deaths". You can replace things, not people.

That retrospective analysis utterly misses the point: for the area they were responsible for, if the company knew they could do better and elected not to, then they got treated exactly as they should have.


>Intent tends to matter with the law. Ford knew there was a problem, knew they could fix it, and elected not to because "it's a small number of deaths". You can replace things, not people.

Per the link posted by the GP, the Pinto did not have any problems relative to other cars of the same type and era. Ford's "intent" (if you can call it that, since they didn't know the future at the time) was to say "err, we don't really think there's a problem here so we'll pay out money when this happens but it's not recall-worthy", and the investigation bore that out, which is why they were not found guilty. I don't really want to defend BigCo (or any other large group with diffuse responsibility) but it's 2020 and here I am.

>That retrospective analysis utterly misses the point: for the area they were responsible for, if the company knew they could do better and elected not to, then they got treated exactly as they should have.

Everyone could always do better on any given metric but the question is at what cost and with what tradeoffs. People buy compacts and subcompacts because they are cheap and fuel efficient. You could build a subcompact that just bounces when rear-ended by a freight train but nobody would buy that because it would cost a ton and weigh a ton (several tons actually) defeating the value proposition of a compact car.


Seems like you're failing to address the 'knew they were going to kill people' part.

Everyone can always do better, but everyone doesn't know exactly which of their decisions will end up killing another person. If you DO know that, then you're obligated to change things.


>Seems like you're failing to address the 'knew they were going to kill people' part.

Damn near every engineering decision could lead to some sequence of events that kills someone and given infinite time will eventually do so. This is a core concept of pretty much every ethics in engineering class that many people here have taken in order to obtain their degree.

Don't slap doors befitting a missile silo on your car? If your cars rack up enough miles someone's gonna die in a T-bone eventually.

Build an overpass with small beams and a support on the median instead of big expensive beams and no support? Someone's gonna hit that support eventually.

The sorts of excesses that can prevent unlikely deaths are justifiable individually and in the abstract, but at a societal level we cannot afford to think this way. As we get richer we can afford to hedge against more and more remote failure modes, but even then at some point the risk is low enough.

In the 1950s we couldn't justify seat-belts in all cars. In the 1980s we couldn't justify making cars as rigid as we do today. Between wealth and technological progress things are generally getting better all the time but to hold the past to the standards of the present is asinine.


Everyone entering a car today knows that given the wrong combination of events, they may kill someone, maybe even a toddler chasing a ball. Yet, millions of people still drive daily.

Even cyclists are not free from that risk - when I leave later and take my cargo bike to pick up my kid, I might be distracted in the wrong instant and run over someone else’s child. I try hard not to, but the safest way to reduce that risk would be to walk. Still, I don’t.

We all balance risks, even deadly ones, to us and others, all of the time.


We balance risks which have uncontrollable inputs. Not controllable ones for which we now have an answer. That's the difference. You note it right in your comment - that you in fact do take measures to be safe, and try to be specifically aware of elements of the dynamic situation which require attention.

That's very different to say, if you planned on reversing a car and decided it took too much effort to look behind you before you do. Sure - chances are most of the time you won't run over a child doing this...


>Not controllable ones for which we now have an answer.

We do it all the time?

How far are you from the nearest fire extinguisher? Can't that number be reduced? That is but one example.


Has my present distance to a fire extinguisher been implicated in a spate of house fires which have killed or injured other people?

I as an individual can take the risks I choose to take with my life. I have no right to apply that to others though.

In fact the building I'm currently in was built to local standards which mandated all sorts of things in relation to providing adequate control of known risks related to the spread of fire, and has smoke detectors fitted for that purpose. Because the builder of the house might have been okay accepting that risk for themselves, but we don't let them just pass it on to others.


I wonder if they discuss that and actually write down calculations


I've designed a storage system for an image storage app for a client, and I had an extensive spreadsheet with the cost and risks of various combinations of erasure coding and data centres, as well as other mitigations.

So yes.

But this data loss highlights that all too often it ends there, with calculating the risk of loss due to hardware or data centre failures.

You need to also do the same in terms assessing how you design your software, and what processes you put in place to mitigate risks.

In this case it was the software system design that failed: They allowed the system access to modify long term stored data, when really the system ought to have been designed to allow "write once unless the user has convinced us that they really, really want the data destroyed, in which case keep a backup for a limited time" type design.

A lot of software fails in that respect. The moment your internal API's allow modification of data, you risk catastrophic loss, as Canon found out, and you really ought to have strategies for mitigation of that.
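A rough Python sketch of the "write once, delete only on explicit request, keep it around for a while anyway" idea. The `WriteOnceStore` class, its method names, and the 30-day grace period are all hypothetical choices for illustration, and the in-memory dict stands in for real storage.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=30)  # assumed retention after a delete request


class WriteOnceStore:
    """In-memory sketch: objects are written once and never modified.

    A delete request only marks the object; purge() physically removes
    it once GRACE_PERIOD has elapsed, and undelete() cancels the request.
    """

    def __init__(self):
        self._objects = {}     # key -> bytes
        self._deleted_at = {}  # key -> time the user asked for deletion

    def put(self, key, data):
        if key in self._objects:
            raise PermissionError(f"{key!r} already exists; overwrite refused")
        self._objects[key] = bytes(data)

    def get(self, key):
        if key in self._deleted_at:
            raise KeyError(key)
        return self._objects[key]

    def request_delete(self, key, now):
        if key not in self._objects:
            raise KeyError(key)
        self._deleted_at[key] = now

    def undelete(self, key):
        self._deleted_at.pop(key, None)

    def purge(self, now):
        """Remove objects whose grace period has elapsed; return their keys."""
        expired = [k for k, t in self._deleted_at.items() if now - t >= GRACE_PERIOD]
        for key in expired:
            del self._objects[key]
            del self._deleted_at[key]
        return expired


store = WriteOnceStore()
t0 = datetime(2020, 8, 1)
store.put("img001.cr2", b"raw bytes")
store.request_delete("img001.cr2", t0)
purged_early = store.purge(t0 + timedelta(days=5))  # still inside the grace period
store.undelete("img001.cr2")                        # user changed their mind
recovered = store.get("img001.cr2")
```

The point of the design is that no internal API can overwrite or immediately destroy stored data; the only destructive path is `purge`, and it can only act on objects a user flagged at least a grace period ago.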


But was this framed as easy business from gambling with users' artefacts? Or just the usual "shit happens" preliminary thinking?


I'd guess in this case nobody had thought about it as a real risk. People are often woefully unaware of the risks of accidental effects of unnecessary privileges.


This. Outside of FAANG, Gov and finance, very few consumer facing companies have the budgets to engineer redundancies.


That. Who in their right mind would expect IT professionals to take any responsibility whatsoever for the empty promises they make to their users?


I suspect this is more a view from the trenches of "modern" SaaS companies hosting their offerings on cloud providers charging a premium for storage+transit.

Because I can assure you, from the perspective of someone who worked for >10 years in a backup related part of the storage industry, our product penetration covered a huge portion of public companies, and untold numbers of non-public businesses. Given that we were one of a half dozen or so similar products, and our competitors also had non-insignificant footprints in places we didn't, most companies were taking backup of their core systems very seriously.

And even then, today, there are dozens of online backup services. There are plans which can backup a TB or two of data a year off a NAS/etc and provide a read only historical snapshot for the price of a couple starbucks coffees a month (and a reasonable internet connection). And then there are the dozens of mom & pop places which just use two or three ~$100 USB disks on a rotation with windows backup/time machine/whatever.

So, no, lack of a proper backup (and recovery) strategy isn't really a question of cost. It's more a question of the competencies of the individuals charged with configuring/maintaining the IT systems.

And sure, if you have .5PB+ of data that needs backing up, a tape library, or replicating dedupe appliance, software license, etc is going to cost $$$$ but overwhelmingly when compared with the overall IT budget for machines+storage+administrators those devices rarely crack double digit percentages of the yearly IT budgets. Further, there are ways to lower those costs even more with a bit of hunting on the resale markets where picking up a three year old piece of equipment for a tiny fraction of its original cost isn't unusual.


>budgets to engineer redundancies

You don't need that, just a backup ;)


They obviously do have _some_ backups (since they were able to restore all videos and all photos - but not all photos at the original resolution).

In the end it's all a tradeoff... ignoring cost, we all know what to do to get pretty safe storage. But how much are you really willing to pay, per month, for that safety?


>but not all photos at the original resolution

That's why you back up the original data and NOT the downscaled etc. one.

>But how much are you really willing to pay, per month, for that safety?

You have to back up ALL your customers' data, otherwise DON'T go into the file-hosting business.


But they did backup the original data - the article states that the problem was with deploying wrong cleanup code on the long-term storage (that was supposed to be only deployed for the short-term storage). How do you guard against that? You could store everything in backup storage & never delete it, but that has implications:

- privacy

- legal (GDPR)

- cost etc.

It's basically a "black swan" event, they designed their system without anticipating this particular failure mode. It's easy to claim in hindsight that "they should've or they should not go into that business" but seriously.... look in the mirror. Are you as tough on yourself when it comes to your customers? Can you honestly claim that there's absolutely no failure mode that could result in customer data loss?


These "old" files are deleted to save disk space. Instead of deleting data automatically, they could give users a disk quota and block uploads until the user explicitly moves or deletes files, eliminating all failure modes in which automated deletions go wrong: it's far from a "black swan" case.
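A small Python sketch of that alternative, where running out of space blocks the upload rather than triggering any automated deletion. The `Account` class and the byte figures are hypothetical, purely to illustrate the shape of the idea.

```python
class QuotaExceeded(Exception):
    """Raised when an upload would push the account past its quota."""


class Account:
    """Hypothetical account where only explicit user actions free space."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.files = {}  # name -> size in bytes

    @property
    def used_bytes(self):
        return sum(self.files.values())

    def upload(self, name, size):
        free = self.quota_bytes - self.used_bytes
        if size > free:
            # Block the upload instead of silently evicting older files.
            raise QuotaExceeded(f"need {size} bytes, only {free} free")
        self.files[name] = size

    def delete(self, name):
        # The only code path that removes data, and it is user-initiated.
        del self.files[name]


account = Account(quota_bytes=100)
account.upload("old.jpg", 80)
try:
    account.upload("new.jpg", 50)
    blocked = False
except QuotaExceeded:
    blocked = True
account.delete("old.jpg")
account.upload("new.jpg", 50)
```

With no background job that deletes on the user's behalf, there is simply no cleanup code to deploy against the wrong storage tier.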


> privacy

Encryption?

> legal (GDPR)

You don't have to delete offline backups... never heard of such a case

> cost etc.

Cheaper than losing your customers' data, which they trust you with.

EDIT: If you can delete a backup it's not a backup.


Under GDPR "right to be forgotten", you have to delete backups, can't keep them forever - see https://ico.org.uk/for-organisations/guide-to-data-protectio...

There are alternatives (one of which is to encrypt the backed up data & just throw away the encryption keys instead) but of course those incur additional costs
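The "throw away the keys" approach is sometimes called crypto-shredding. Below is a toy Python sketch of the key lifecycle only. The `KeyVault` class is hypothetical, and `_keystream_xor` is a stand-in cipher built from the standard library so the example runs anywhere; it is NOT secure, and a real system would use a vetted authenticated cipher from a proper cryptography library.

```python
import hashlib
import secrets


def _keystream_xor(key, data):
    """Toy stream cipher (SHA-256 in counter mode). NOT secure; stand-in only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class KeyVault:
    """Hypothetical per-user key store, kept separately from the backups."""

    def __init__(self):
        self._keys = {}

    def key_for(self, user):
        if user not in self._keys:
            self._keys[user] = secrets.token_bytes(32)
        return self._keys[user]

    def has_key(self, user):
        return user in self._keys

    def forget(self, user):
        # Erasure request: drop the key; the backup media is left untouched.
        self._keys.pop(user, None)


vault = KeyVault()
backup = {}  # stands in for immutable backup media

# Back up under the user's key, then prove a restore works.
backup["alice"] = _keystream_xor(vault.key_for("alice"), b"holiday photos")
restored = _keystream_xor(vault.key_for("alice"), backup["alice"])

# Erasure: the ciphertext stays on the media but can no longer be decrypted.
vault.forget("alice")
still_readable = vault.has_key("alice")
```

The catch is that the key store itself now needs backups and careful access control, since losing a key by accident destroys the data just as thoroughly as deleting it.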

> Cheaper than losing your customers' data, which they trust you with.

This is just opinion. And also it's heavily context-dependent(cost of losing customer data is not the same for all data or all businesses).

> EDIT: If you can delete a backup it's not a backup.

If it gets destroyed, is it a backup? How's that different from "delete"? Those were not intentionally deleted, they were accidentally destroyed. Can your backups be accidentally destroyed?


>you have to delete backups, can't keep them forever

>>The key issue is to put the backup data ‘beyond use’, even if it cannot be immediately overwritten.

It's clear that you cannot delete from WORM tapes with a lot of other data on them.

>This is just opinion.

Not for a file hosting service, the customer trusts you to do your job.

>If it gets destroyed, is it a backup?

Destroyed means physical interaction; since it is not possible to make a real-life backup, this is your last line of defence.

>Can your backups be accidentally destroyed?

Well, if someone "accidentally" breaks into my bank, opens the right safe and "accidentally" uses a flamethrower... then you are right.


Why did you stop quoting after "immediately overwritten" ?

> ie that the backup is simply held on your systems until it is replaced in line with an established schedule

Again, you can't keep backups forever, you are _required_ to delete them.

Also, not sure why you keep your backups in the bank (do you, really?) but even banks can suffer from fire/arson, explosion (accidental or not), or a number of other natural disasters (earthquake, flooding, etc).

[later edit] Also, sort of by definition, in all businesses customers trust you to do your job. Can you give an example of a business where customers don't typically trust you to do your job?


Really depends on the industry and the country (finance is 30 years, health insurance until the death of the person (in Switzerland)). You cannot go to your financial institution and ask them to delete all your records... that will never happen.

>Also, not sure why you keep your backups in the bank

Business backups: secure, really good fire protection, flood protection, and in case of a bus-hit scenario someone else has access to them.

>Can you give an example of a business where customers don't typically trust you to do your job?

Politic and Canon filehosting from now on, Airbus Max and Foxnews ;)


Right to be forgotten does not apply in cases where you're legally required to keep the data, or have other legitimate reasons to require said data, or there's significant public interest. But for _my_ photos, on a hosting service, it most definitely applies - if I request erasure, you can't keep them for 30 years in backup without breaching the law, that much I guarantee.

> Politic and Canon filehosting from now on, Airbus Max and Foxnews ;)

I know it's tongue-in-cheek, but Foxnews viewers typically trust them, that's one of the problems, right? And customers (advertisers) obviously trust them to deliver message to audience.

Airbus is better trusted than ever, especially now after the 737 Max fiasco :P - but even Boeing was obviously trusted by their customers (whether that trust was misplaced is a different issue; my point was "customers must trust me" is in no way unique or particularly important for file hosting).


>Airbus is better trusted than ever, especially now after the 737 Max fiasco :P

True true


With your argument, Google photos should also not be in business since they also don't back up your original resolution unless you pay.


But Google makes it very clear that they don't backup the original photos.


>Google photos should also not be in business since they also don't back up your original resolution unless you pay.

Not true, but you are restricted to the free 15 GB.


> since they were able to restore all videos and all photos - but not all photos at the original resolution

"Canon has said that there is no technical measure to restore lost video files, but that photo files can be restored – albeit not at their original resolution."


Ah I somehow misread that, thought they can restore videos but for photos they don't have the original resolution.


And even then, if you store your data there the responsibility is ultimately still yours.


I’m sure plenty of engineers will be made redundant by this fiasco.


"Is image.canon free to use? Yes, it is free to use."

from: image.canon/st/en/faq.html

so as of now there is no recurring revenue added


And there won't be at this rate. Besides, it is now a net negative for their users.

'The cloud is not a backup' should be a mantra that everybody who uses the cloud for their work and personal data is familiar with.


> The cloud is not a backup

Of course the cloud is a backup. Why do you think it isn’t? Because it may break? All storage may break.

The cloud is a backup. Like all backups, you need multiple independent backups.


Ok, let me spell that out more clearly. Having something stored in the cloud does not mean that you no longer need backups.


And you need to check that you can actually restore from backups - which is a step a remarkable number of people forget to do.
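One way to make that check routine is to do a trial restore into a scratch directory and compare checksums. A minimal Python sketch follows; the function names are made up for this example, and the temporary directories merely stand in for live data and a restored copy.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ after a trial restore."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(str(relative))
    return problems


# Demo with throwaway directories standing in for live data and a restore.
with tempfile.TemporaryDirectory() as live, tempfile.TemporaryDirectory() as restore:
    (Path(live) / "a.jpg").write_bytes(b"intact")
    (Path(live) / "b.jpg").write_bytes(b"original")
    (Path(restore) / "a.jpg").write_bytes(b"intact")
    (Path(restore) / "b.jpg").write_bytes(b"corrupted")
    problems = verify_restore(live, restore)
print(problems)
```

In practice you would probably compare against a checksum manifest saved at backup time instead, since the originals may be exactly what you no longer have when a restore matters.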


I was a Unix administrator responsible for backup many years ago. When I asked the manager of the IT department for funds to do a disaster recovery exercise I was turned down flat and told that no amount of discussion would change his mind.

We knew that we could restore individual files because users asked for them when they had accidentally deleted them but we had no idea whether or not our procedures for restoring the systems themselves would work.

Luckily we never had to find out.


321 rule for backups: 3 copies, 2 different media, at least 1 off site

Something is not a backup if you haven't tested that you can restore from it.

All backups not in your direct physical possession should be encrypted before being written.


Exactly this. I have onsite, and I have a cloud backup. They mitigate different risks, even for personal data.


The cloud is a backup. Just like any other backup you should test restore or at least verify the data regularly.

I trust Google more than a random hard drive (to not lose random bits) in this respect.

And unless your admin configured it away, GSuite allows takeout just like GMail afaik?


> GSuite allows takeout just like GMail afaik

Hm, interesting ok, I will look into that. I didn't think of using takeout. Thank you.

The only option I found before was to allow some random third party application access to all my files which was not an option.


It's a bit annoying for restore since it can take hours to be ready for download. Still better than Amazon Glacier though :D


Better something than nothing. They must have changed this recently, even a few months ago attempts to bulk download Gsuite data led to error messages.


It's Gmail, not GMail. :)

Also, I believe all Google products have to support takeout, just like they have to support deletion within whatever the legally mandated time frame is (months?).


I use the Cloud to store all my work and it's my way of backup. I trust Google's capacity to save my files. But, this being said, you SHOULD have other ways of backing up your data, like another HDD, storing locally in your PC too.

Being too careful is never enough.


Sure, Google can save your files. But will it? and will you never lose access to Google?

The problem with these big cloud providers is there are employees and robots with a big "disable account" button, but there's no one you can call to talk about reenabling your account. I don't know what the odds are that Google is going to hit that "disable account" button on your account, it's probably incredibly small, but if it ever happened they might have permanently deleted all your stuff before you get far enough in their customer support system to have them reinstate your account.

If my livelihood depended on Google storing my files, I'd store a backup of my data somewhere that I can access in person if push comes to shove. As a matter of fact it does, and I do.


Well, I always sync my files between two PCs so I always have them stored on these. If anything goes wrong with both of these PCs I always have Google. I also do some backups on both my external HDDs, so yeah, I don't see all this failing anytime soon.


> Sure, X can save your files. But will it? and will you never lose access to X?

Google, DropBox, a storage server in your house, external hard drives, tapes, burnt DVDs... you can swap in any backup system there and it's exactly the same. Google's cloud is just as valid as a backup as any other.


True, the problem is people thinking their files being stored at Google means they don't need another location for backing them up.


>> I trust Google's capacity to save my files

Sure, but do you trust Google's AI not to pick up something from you, or from a network you happen to be connected to, and then determine your account is in violation of their ToS, at which time all of your accounts are suspended?

If that happens, you'd better hope you can get enough social media attention for your issue to attract an actual person with authority at Google to do something...

If not, you're SOL, as is your data that Google "reliably saved" for you


One of the advantages of having another independent backup is that, if you need it, you not only have a backup somewhere other than a specific cloud provider but you also have a backup that's being made via a completely different mechanism.

But I know that, careful as I am, I still feel vulnerable if, for example, I lose my main data disk.


3 copies, 2 different media, 1 offsite


No single repository of data is ever a backup

A Backup requires

* 3 Copies of the data

* 2 Copies on Different Media / Services (e.g. Amazon Photos + Canon's image.canon)

* At least 1 copy in a geographically diverse location...

aka the 3-2-1 Rule

If you do not have those 3 items at a MINIMUM then your data is not backed up
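For what it's worth, the rule is simple enough to check mechanically. A small Python sketch, where the `Copy` record and the medium labels are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str
    medium: str    # e.g. "local_disk", "external_hdd", "cloud:provider_a"
    offsite: bool


def check_3_2_1(copies):
    """Return a list of unmet 3-2-1 requirements (empty means the rule holds)."""
    failures = []
    if len(copies) < 3:
        failures.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        failures.append("fewer than 2 different media or services")
    if not any(c.offsite for c in copies):
        failures.append("no offsite copy")
    return failures


cloud_only = [Copy("cloud upload", "cloud:provider_a", True)]
mixed = [
    Copy("laptop", "local_disk", False),
    Copy("usb drive", "external_hdd", False),
    Copy("cloud upload", "cloud:provider_a", True),
]
cloud_only_failures = check_3_2_1(cloud_only)
mixed_failures = check_3_2_1(mixed)
print(cloud_only_failures, mixed_failures)
```

A single cloud upload passes the offsite test but fails the other two, which is the situation users relying only on one hosting service were in.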


I guess thousands of companies and startups on HN would have a problem if Amazon lost files on S3. It's easy to say the cloud is no backup, but it's sometimes hard to have all data synced in multiple locations.


well it is a backup. if you lose the backup but have the original, there is no problem. if you lose the original but have the backup on a cloud, there is no problem.

of course if you lose both at the same time, that's bad. actually the best thing is of course to use multiple backup locations


It seems like there are two types of backups people think about: Archival and Current. In the case of the former the “original” may very well be the cloud hosted backup. But I suppose the solution is the same regardless: back up to multiple providers and/or locations.


As an aside, how the hell does Canon have their trademark as a TLD? Google, at least, I understand, because they control a sizeable portion of DNS; but Canon?


Any eligible organization can apply and go through the process with the somewhat new gTLD Program: https://newgtlds.icann.org/en/applicants/global-support/faqs...


Yeah no... I might be wrong but the last time I checked it cost around $200K to register a TLD. If you pay $200K, you get a TLD.


Selling customer data is a recurring stream. There’s likely also something in there about selling access to photo sets for training purposes.


Or maybe an engineering mistake? I do not think you can claim anything without knowing the details. AWS lost customer data too, I would not try to assign blame without reading the post mortem.


According to TFA (which apparently nobody reads) it most certainly was an engineering mistake. Files in long-term storage were accidentally subject to the same code that deletes files in short-term (30 day) storage.


It sounds similar to the "junior dev had access to prod database" scenario, which really isn't an engineering mistake but a process/administration one.


I knew a guy who used a different color scheme for terminal sessions connected to production servers, so he didn’t do something in the wrong window.

A guy. One. How many people have I met with shell access to production servers?


I have all hosts share the same .zshrc and config with yadm, so all terminals look the same and share keyboard shortcuts. My oh-my-zsh with powerlevel10k is configured in a similar way: when I'm connected by ssh to a host it shows hostname@domain or hostname@ip on RIGHT side of a screen. It's not much, but it's easily visible, sometimes even annoying, but at least I never deleted wrong docker images from dev :)


It seems more likely that there was some process that performed a "delete older than 30 days" on the long term storage. Likely untested code. That's what the article says anyway, and we all know that the public message is always the best spin on a scenario.


An engineering mistake combined with an obvious lack of backups.


This reminds me of that oracle thread, where a guy had lots of backups, all encrypted with a key stored in RAM.


How do you know this?

Because this explanation:

>Canon has announced the results of its investigation into the loss of image data on the image.canon cloud platform. According to Canon, when the company switched over to a new version of the image.canon software on 30 July, the code to control the short-term storage operated on both the short-term storage and the long-term storage functions, causing the loss of some images stored for more than 30 days.

seems pretty banal. Updating a service and forgetting to update the backups is not an uncommon thing, and quite a few software companies have published post mortems saying they did exactly that.


More likely than not some manager decided all those extra nines weren't really necessary.


The manager probably put five 9s onto the spec sheet, and provided absolutely no resources. You’ll never guess who’s going to get the heat for this, the manager or the poor person who didn’t follow the spec.


A reliability guarantee I like to call "nine fives".


There was an article about “five 8’s” during the Trough of Disillusionment for the original hype cycle.


I don’t know why you’re being downvoted since you can clearly see echoes of this manager in the comments above...


Too many managers, not enough engineers on HN. As an engineer I can definitely attest to managers saying "99.9% is enough"


My two cents is going to be that the ad tech MacBook deleted the old S3 bucket for mockup but it wasn’t a mockup at that point


Are you trying to discredit using MacBooks in tech or something?


What do you mean? The article says it was a software bug that accidentally deleted both short-term and long-term storage, while it should have been only short-term.


Japanese camera companies are really bad at cloud software - Sony Playmemories probably the worst offender. Also the in-camera "mini-apps" tend to be terrible.

They make good hardware but horrible software. They should really buy a computational photography company that also knows how to make good web and native apps, like https://skylum.com/


We were actually on NTT (Nippon Telegraph and Telephone) cloud for many many years up until very recently. At the time we started using them AWS, Azure, Google Cloud, etc. did not exist. But over the past few years the quality of service has dropped considerably. My guess is that their best engineers were being poached by AWS, Azure, etc. and the leftovers couldn't keep it up.


I remember paying $10 to buy a mini-app that added what should have been native functionality to my Sony camera. Then a few months later a firmware update made it incompatible.


I'd go further and say that a lot of software coming out of Japan is, if not of a poorer quality, then at least something to be assessed more carefully.

This is particularly evident in videogames, where anything for PC is liable to be extremely weirdly built and feels like it was made for Windows XP. There's a strong tendency to use archaic cheat prevention tools like nProtect GameGuard, which is a rootkit, and a weird reluctance to run servers in sensible ways. Oddly this is all actually helpful when you want to run their games on Linux because even though they don't seem to know it exists, the games are so "old" that they're pretty easy to get running.

Sony is actually probably one of the better companies for software. But they still do utterly weird nonsense that I'd never even think of. The PS4 throttles your download speed based on your ping to the download server. It's...totally wrong and confusing. But easily bypassed by running a proxy on your local network.


I take nightly images of my computer's primary drive and replicate them offsite, with alerting if it fails and automated scheduled backup restore testing.

It blows my mind to think that some random schmuck like me has better backup procedures than these multibillion dollar global corporations.

Companies that don't invest in their IT end up paying for it tenfold down the track.


Well who can blame them. The files are not theirs. /s


How do you automatically test restores?


Not the parent, but my homebrew backup script first makes the backup and then attempts to restore a canary file. If that fails it raises the alarm.

It's not perfect. I can imagine a scenario where the canary is restorable, but some other important files aren't. However it certainly protects against cases where a bug in the backup software makes a completely unreadable snapshot (which has happened to me before).
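
A minimal sketch of that canary check, assuming a plain tar archive stands in for the real backup tool. The function names and the `canary.txt` file are hypothetical, not taken from the parent's script.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup(source_dir: Path, archive: Path) -> None:
    """Stand-in for the real backup tool: tar up every file in source_dir."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                tar.add(path, arcname=path.relative_to(source_dir).as_posix())


def canary_restores(archive: Path, canary_name: str, expected_hash: str) -> bool:
    """Read the canary back out of the archive and compare its hash."""
    try:
        with tarfile.open(archive, "r:gz") as tar:
            restored = tar.extractfile(canary_name)
            if restored is None:  # present, but not a regular file
                return False
            data = restored.read()
    except (KeyError, tarfile.TarError, OSError, EOFError):
        # Missing canary or unreadable archive: both count as a failed restore.
        return False
    return hashlib.sha256(data).hexdigest() == expected_hash


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "data"
    source.mkdir()
    canary = source / "canary.txt"
    canary.write_text("known contents")
    expected = sha256_of(canary)

    archive = root / "backup.tar.gz"
    backup(source, archive)
    if not canary_restores(archive, "canary.txt", expected):
        raise SystemExit("ALARM: canary could not be restored")
    print("canary restored OK")
```

As noted above, this only proves that one known file comes back intact; it catches a completely unreadable snapshot, not damage to other files.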


I wouldn't be so harsh on your homebrew solution, deja-dup uses the same strategy to verify backups.


I use Duplicati to manage the offsite backups; it comes with a command line tool which can pull random chunks from the storage bucket, decrypt them, and check the file hashes against expected values


On this topic, can I get HN’s advice on my backup strategy?

All important files on Synology Ds218+ NAS, which has 2x12TB helium HDDs in mirror.

Daily HyperBackup to Google Drive. I test restoring files on a monthly basis.

Email alerts on failure.

Anything I can improve?


Google, as a company, is not reliable - I refuse to host anything critical on Google Drive (even G Suite) because there's zero recourse or accountability if a bug in Google's software hits you, or if Google arbitrarily decides to ban you for life from their services (e.g. due to misidentifying you as a dodgy YouTube user or repeated Google Play Store policy violations): they will delete all of your personal and business accounts and you're SOL.

What you have is fine for short-term recovery - but I'd make sure you have a long-term and/or cold-storage option set up. It doesn't need to be anything particularly fancy: I'd get a rugged, portable, external single-drive USB enclosure with a single 16TB drive and have the Synology do a backup of your most critical data onto that drive and store it in your bank's safety deposit box or better yet: leave it with a trusted friend who lives in rural Minnesota or similar.

It goes without saying to encrypt that drive as you won't have full custody of it: but use a simple, proven encryption scheme with a large ecosystem that you know that you'll be able to decrypt in 10-20 years' time.

For backup drives that I keep local to me, I refuse to encrypt them because (in my experience) the possibility of being unable to decrypt data in a desperate or urgent situation just makes me wince.


What do you make of comparable services from other providers, say, Amazon Glacier?


I've used glacier. For stuff you don't need to restore in a rush, it's cheap.

I did however have a recent health scare and it made me wonder how my non tech wife could possibly have restored the files as the interfaces are all heavy.

Not a factor I'd previously considered when assessing my backup/restore.


> For stuff you don't need to restore in a rush, it's cheap

That was my thinking. It seems a good fit as a last-resort backup. Low month-on-month storage costs, high retrieval costs. So we're essentially betting that we'll never retrieve the data. Which seems fine.

Also, it apparently has strong assurances against data-loss. Lots of nines. [0]

> how my non tech wife could possibly have restored the files as the interfaces are all heavy

It's all web-API-based, right? Is there a decent FOSS GUI to navigate it?

[0] https://aws.amazon.com/glacier/features/


If you retrieve the data slowly it's cheap. It's expensive if say you are a retailer who needs their database restored asap.

I use a Linux perl client!


I suppose the biggest risk is failure-to-pay, as with all cloud backups/storage. If I allow my payment card to expire, Amazon aren't obligated to continue to store my data. If I drop off the grid for an extended holiday, that could be a real risk.

To my knowledge, Amazon offer no means of prepaying.


> If I allow my payment card to expire, Amazon aren't obligated to continue to store my data.

Business idea: cloud archive storage where you pay when you upload data and optionally pay a modest monthly fee for real-time access to stored data, but they'll guarantee to keep your data for you if you stop paying: you'll just need to pay to retrieve that data.

As the long-term archival data wouldn't need to be stored in a data-center: just a commodity tape-library box in a basement in a farm somewhere near a freeway I imagine it would be kinda cheap to run as a business. You could set-up a Foundation or other entity to ensure long-term continuity of operations and have it self-sufficient through an endowment. E.g. a $1m endowment would easily pay for something like this into perpetuity.


If offered by an organisation that I can trust to still exist in, say, 40 years time, this might work. Maybe Google/Amazon/Microsoft have this credibility. Maybe. There's no way it would work as a start-up.

You'd also need to charge enough upfront that you can still turn a profit if they stop paying immediately. Asking for upfront payment of 40 years of data-storage fees, might be a problem.

> a commodity tape-library box in a basement in a farm somewhere near a freeway

I'm not a data-storage expert by any means, but that doesn't sound anywhere near good enough. You need to redundantly protect against flood, fire, crime, etc. You'll also need to be able to retrieve data at scale. You're essentially rolling your own Amazon Glacier.


There was Permanent.org, three months ago: https://news.ycombinator.com/item?id=22943620


I agree with this comment on how that project completely fails to provide the necessary assurances of dependable longevity: https://news.ycombinator.com/item?id=22944681


> Amazon aren't obligated to continue to store my data. If I drop off the grid for an extended holiday, that could be a real risk. To my knowledge, Amazon offer no means of prepaying.

Another option I've been meaning to set-up is two-way Synology sync between sites. My parents have a Synology box and now that they have a decent internet connection (DOCSIS, not ADSL's 768kbps upstream) we could backup each others' data on each other's Synology boxes. I just need to figure it out...


You get emails. Imho it's not a big risk.


Possibly you miss the mail, the mail goes into spam, or you forget to fix it.


I have bad experiences encrypting backup drives. If they corrupt a little the data might be gone. You also have the problem of keeping track of the key for years.


Protip: write the passphrase in sharpie on the sliver of whitespace left available on the drive’s spec sticker


What's the point in encrypting the drive if the key is written on the drive?


> Daily HyperBackup to Google Drive

Is that really backup or a sync? If you have a bitrot (or a ransomware attack encrypts it) in an important file on your disk, and you don't notice it for a week and that file gets sent to Google Drive, will you still be able to restore the original version?


You’re way ahead of the average. The only improvement I can think of would be to server side copy the backup files to another Google Drive account that the account linked to HyperBackup can not access. That way if the Synology gets hacked it cannot delete the offsite backup too.


To add slightly, a "pull" backup from a second backup storage system is possibly less likely to be a route that an attacker could take from your production system to compromise backups than a "push" backup from a more open production system.

So, for example, ssh key access from backup to production, but not vice versa. Definitely keeping that backup system as simple and low access as possible.


Synology just opened a US datacenter for their C2 backup service, it's quite easy to use and IMO better than hosting on Google.


Some way to do an offline snapshot a few times a year.


Looks like a sweet-spot to me.


How fast is your internet connection? That sounds like a big lift for a full image every night.


100/40, the daily images are incrementals which average around 8GB, I take full images once a month
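
Back-of-the-envelope for those numbers, assuming "100/40" means 100 Mbit/s down and 40 Mbit/s up, decimal gigabytes, and no protocol overhead (the function name is just for illustration):

```python
def upload_seconds(size_gb: float, uplink_mbit_per_s: float) -> float:
    """Seconds needed to upload size_gb (decimal GB) at the given uplink rate."""
    bits = size_gb * 1e9 * 8
    return bits / (uplink_mbit_per_s * 1e6)


seconds = upload_seconds(8, 40)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes")  # 1600 s, about 27 minutes
```

So an average nightly incremental fits comfortably overnight on that link.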


But insufficient redundancy for that much data saves millions of dollars!


...which can instead be pocketed by higher ups in the company food chain :)


"We will contact affected users shortly and offer..." I'm not sure how I expected this sentence to end, but it certainly wasn't anything as useless as "our deepest apologies".


Lesson for those designing cold-storage solutions--design your system, as much as reasonable, to not support deletion operations. The reason why files get lost is usually due to software bugs, bad configurations, and operator errors. Design your system to protect against these things.


To be fair, image.canon is definitely not a cold-storage or backup solution, never mind primary storage. Its main use case is acting as a buffer for auto-uploading photos from camera and then later downloading them to a computer. As an afterthought it also offers a tiny long-term storage space, 10 GB per user, smaller than most memory cards available today.


It also appears that only images stored for >30 days were affected, so in theory most photos "should" have been downloaded and stored properly etc. if people are using the service as you describe


Both AWS S3 and Google cloud storage buckets have an option that makes it impossible to delete stored objects for some period of time. The option was added for some legal compliance reasons, but I find it useful as an extra safeguard that important service data is not accidentally or maliciously deleted.


Yes, but...

This does nothing to protect against a bad cron job script. The web frontend, and even backend, are not linked to background maintenance tasks.

And a field in a DB doesn't prevent a manual DB delete, or api call to a backing store.

The real problem here, was a lack of backup testing / restore testing.

If you have backups, you MUST manually verify they can be restored, in a desired state.

And this means restores MUST be tested every time code changes happen, which affect backups.


I noticed that gitlab added a change so that once you delete a repo, it holds it for over a week and puts a big "Pending Deletion" banner on the repo.


I suppose that feature would be incompatible with user data in a GDPR world ??


No, as the right to be forgotten (if that’s what you’re thinking of) isn’t absolute.

The example I’m aware of is where personal information is held due to ‘legitimate interest’ and the request to delete isn’t deemed (however that is defined) to override that.

I won’t try to go further as it’s not 100% clear cut and IANAL.

Have a look at the legislation or the various explainers that have been posted online if you’re interested.

Edit: In this case I would guess that Canon would have trouble claiming a legitimate interest in keeping hold of your photos!


Depends on how long it is stored, you have some time (one month) to clear your systems when you get a request for deletion.


I'm not a lawyer, but my understanding is that GDPR requires companies to remove user data upon request in a reasonable time-frame. If you keep non-deletable backups for one month in order to improve reliability of your service, then my understanding is that it is fine to fulfill the GDPR data deletion requests within a one-month period, not immediately.


Filtering this stuff from a master transaction log is kinda ludicrous, not looking forward to implementing that.


It's sometimes easier to encrypt everything sensitive with a user-specific key and destroy the key.


That's how I've done it as well. The files are backed up for as long as we want, they are just useless once the user's secret is trashed. You can do userid blocklists on database restores etc., but for long term backup file storage it becomes silly.


I don't think the GDPR imposes a requirement to actively filter backups.


It's not a backup, it's the authoritative dataset.


Git is also incompatible with GDPR, you can't simply delete a file from all history.


Git is not really intended for personal data


Yeah, I don't really understand why you would like to stuff personal data into a git repository (unless we're strictly speaking about author data). It's really not the tool for it.

Now for those who decided that a blockchain is the perfect solution for storing personal data however...


Name and Email is personal data...


You can, you just have to be willing to rewrite history. See git-filter-branch (and don't forget to read the fat WARNING).


Git allows you to rewrite history, that is not the problem.

But you might have a hard time chasing down every copy, it's a distributed system by design.


Once you rewrite history, you've basically broken the git concept. It's an emergency feature if you've accidentally committed a security token or similar, or before you've pushed upstream, but if you're rewriting history with any sort of regularity, you're using git very, very wrong.


Don't store your users PII in a git repository then.


The use case is to do dataset versioning for ML. The dataset itself is updated frequently. It would be nice to use a tool that can store efficiently when small changes are made yet allow versioning for reproducibility.


If you do ML on sensitive data from actual users, reproducibility is a crime, not a feature. You need to ensure that your ML forgets deleted data completely and irreversibly.


Oh man, all my WORM tapes are not GDPR compatible...that's terrible!!!


You could always not pay the bill :)


This is actually a larger lesson.

When I work on design I try to make catastrophic failure impossible. Not implementing deletion or implementing deletion with some inseparable constraint check is one way.

For example, the files could have a flag set and from that point on the only delete function would check the flag and refuse to remove the image regardless of how it got invoked or where the image is placed (think like write protect notch on a floppy disk).

Another way is to not actually remove the files but instead introduce a grace period during which they are marked as deleted and are not accessible to the user.

The only real deletion would happen in a vacuuming system that will actually remove the files but with a hardcoded constraint next to the filesystem delete that the file must have been marked as removed at least XX days ago. This would give some time after deletion to actually recover files.
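
A minimal sketch of the flag check and the grace-period vacuum described above. The `Store` class is a hypothetical in-memory stand-in for a real filesystem, and the 30-day value for "XX days" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=30)  # the "XX days"; 30 is an assumed value


@dataclass
class StoredFile:
    protected: bool = False                # the "write-protect notch"
    deleted_at: Optional[datetime] = None  # set when the user deletes the file


class Store:
    """Hypothetical in-memory store; a dict stands in for the filesystem."""

    def __init__(self) -> None:
        self.files: dict[str, StoredFile] = {}

    def soft_delete(self, name: str, now: datetime) -> None:
        """User-facing delete: only marks the file, nothing is removed."""
        if self.files[name].protected:
            raise PermissionError(f"{name} is write-protected")
        self.files[name].deleted_at = now

    def visible(self) -> list[str]:
        """Files the user can still see."""
        return sorted(n for n, f in self.files.items() if f.deleted_at is None)

    def _purge(self, name: str, now: datetime) -> None:
        """The only code path that really removes a file.

        The checks sit right next to the removal, so they apply no matter
        which caller (cron job, migration script, admin tool) ends up here.
        """
        f = self.files[name]
        if f.protected:
            raise PermissionError(f"{name} is write-protected")
        if f.deleted_at is None or now - f.deleted_at < GRACE_PERIOD:
            raise PermissionError(f"{name} is not past its grace period")
        del self.files[name]

    def vacuum(self, now: datetime) -> list[str]:
        """Try to purge everything; the guard in _purge decides what goes."""
        purged = []
        for name in sorted(self.files):
            try:
                self._purge(name, now)
            except PermissionError:
                continue
            purged.append(name)
        return purged


t0 = datetime(2020, 8, 1, tzinfo=timezone.utc)
store = Store()
store.files["a.jpg"] = StoredFile()
store.files["b.jpg"] = StoredFile(protected=True)
store.soft_delete("a.jpg", t0)
print(store.vacuum(t0 + timedelta(days=29)))  # grace period not over: nothing purged
print(store.vacuum(t0 + timedelta(days=30)))  # a.jpg is purged, b.jpg never is
```

Even a buggy caller that runs `vacuum` against the wrong set of files cannot remove anything that was not marked as deleted at least 30 days earlier.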

Yet another technique is to move objects marked as deleted to some other, cheaper form of storage. The function that moves objects must first confirm the object is available in new storage through its API and only then removes the original.
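
The move-then-verify idea could look roughly like this. `DictStore`, `LossyStore` and `VerificationError` are hypothetical stand-ins for real storage backends and their APIs.

```python
import hashlib


class VerificationError(Exception):
    """Raised when the copy in the new storage cannot be confirmed."""


class DictStore:
    """In-memory stand-in for a storage backend with put/get/delete."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def delete(self, key: str) -> None:
        del self.objects[key]


class LossyStore(DictStore):
    """Simulates a broken destination that silently drops writes."""

    def put(self, key: str, data: bytes) -> None:
        pass


def move_verified(key: str, src: DictStore, dst: DictStore) -> None:
    """Copy to dst, read it back through dst's API, and only then delete from src."""
    data = src.get(key)
    expected = hashlib.sha256(data).hexdigest()
    dst.put(key, data)
    try:
        stored = dst.get(key)
    except KeyError as exc:
        raise VerificationError(f"{key} missing in destination; original kept") from exc
    if hashlib.sha256(stored).hexdigest() != expected:
        raise VerificationError(f"{key} differs in destination; original kept")
    src.delete(key)  # reached only after the read-back matched


hot, cold, broken = DictStore(), DictStore(), LossyStore()
hot.put("photo1", b"raw bytes 1")
hot.put("photo2", b"raw bytes 2")

move_verified("photo1", hot, cold)
print(sorted(hot.objects), sorted(cold.objects))

try:
    move_verified("photo2", hot, broken)
except VerificationError as err:
    print("move refused:", err)
print(sorted(hot.objects))  # photo2 is still in the original storage
```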

You can also prevent damage that could result from failed modification using immutable objects and Copy on Write. The copies would be deleted using the above described mechanism.

Yet another technique is to keep XX days of redo-log that could contain enough information to rebuild any object to any of its current or past state within XX days. Depending on the type of the file and nature of changes it can be much more efficient than keeping whole copies.
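
A toy version of such a redo-log, to make the idea concrete. The `RedoLog` class and integer timestamps are assumptions for illustration; a real log would also need a base snapshot so that entries older than XX days can be dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    timestamp: int          # integer ticks keep the demo simple
    key: str
    value: Optional[bytes]  # None records a deletion


class RedoLog:
    """Append-only log from which any past state can be rebuilt."""

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def record(self, timestamp: int, key: str, value: Optional[bytes]) -> None:
        if self._entries and timestamp < self._entries[-1].timestamp:
            raise ValueError("entries must be appended in time order")
        self._entries.append(LogEntry(timestamp, key, value))

    def state_at(self, timestamp: int) -> dict[str, bytes]:
        """Replay the log up to and including the given timestamp."""
        state: dict[str, bytes] = {}
        for entry in self._entries:
            if entry.timestamp > timestamp:
                break
            if entry.value is None:
                state.pop(entry.key, None)
            else:
                state[entry.key] = entry.value
        return state


log = RedoLog()
log.record(1, "img.jpg", b"v1")
log.record(5, "img.jpg", b"v2")
log.record(9, "img.jpg", None)  # a buggy job deletes the file

print(log.state_at(3))   # first version
print(log.state_at(6))   # second version
print(log.state_at(10))  # after the deletion: empty, but nothing is lost
```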

All of this of course adds overhead but you can recover a lot of that overhead by relaxing requirements on duplication of your operational or other forms of backup storage with rationale that if you have a redo log where you can locate current or past version of the object independently of operational or backup storage you can count it as one more operational or backup copy.


Yeah, it seems some of the other commenters miss that point: it’s not as if some disk crashed or something, it’s a bug in a software upgrade that deleted long term storage in addition to short term storage.

In my experience, the best way to mitigate these kind of things is to use PITR, which is always immutable. You make periodic snapshots, keep track of all changes since those snapshots. And then you use a policy to determine just how much data you want to keep here.

The problem is that not supporting delete operations is often very much impractical. Consider for example the recent heat Flickr (or was it Instagram?) has gotten for not physically deleting files but just marking them as deleted/invisible. You sometimes have really good reasons that you must support delete operations, so you must just make sure that there’s a reasonable delay before they are actually completely gone.


In case anybody else wonders what PITR is:

https://en.wikipedia.org/wiki/Point-in-time_recovery


This reminds me of this comment that was posted in the "medical data leak" discussion that is currently on the front page as well:

https://news.ycombinator.com/item?id=24196041


Good luck complying with the GDPA.


Do you mean GDPR? You don’t have to delete data immediately under the right to be forgotten.


The problem with that is that under GDPR you have to support full deletion of a user's data.


Not necessarily if you have a valid reason for storing the data. You might for example have to keep user's billing data around for legal reasons.


Was this billing data though?


Crucially, you do not have to do it immediately.


Store customer data immutable and encrypted, with a unique AES encryption key per customer. Encrypt that AES key again with a single RSA key-pair and store the encrypted AES-key in a database.

You can access your customer data, using the customer-specific AES key. You can access the customer-specific AES key using your private RSA key.

When you need to delete the customer data under GDPR, you can delete the encrypted AES key for that customer from your database.
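
A structural sketch of that scheme. Big caveat: to stay dependency-free, `toy_cipher` below is a throwaway XOR keystream standing in for AES, and a symmetric master key stands in for the RSA key pair. Neither is suitable for real data; a real system would use a vetted crypto library. The `Vault` class and its method names are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (encrypt == decrypt).

    Placeholder only, NOT secure: it stands in for AES so the sketch
    runs without third-party packages.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Vault:
    def __init__(self, master_key: bytes) -> None:
        self._master_key = master_key              # stand-in for the RSA private key
        self.wrapped_keys: dict[str, bytes] = {}   # the "database" of encrypted keys
        self.blobs: dict[str, bytes] = {}          # immutable encrypted customer data

    def _wrap(self, customer: str, key_material: bytes) -> bytes:
        # Symmetric in this toy, so the same call wraps and unwraps.
        return toy_cipher(self._master_key + customer.encode(), key_material)

    def store(self, customer: str, data: bytes) -> None:
        if customer in self.blobs:
            raise ValueError("stored data is immutable")
        data_key = secrets.token_bytes(32)  # unique key per customer
        self.wrapped_keys[customer] = self._wrap(customer, data_key)
        self.blobs[customer] = toy_cipher(data_key, data)

    def read(self, customer: str) -> bytes:
        data_key = self._wrap(customer, self.wrapped_keys[customer])
        return toy_cipher(data_key, self.blobs[customer])

    def forget(self, customer: str) -> None:
        """Erasure request: drop only the wrapped key; ciphertext may remain."""
        del self.wrapped_keys[customer]


vault = Vault(master_key=secrets.token_bytes(32))
vault.store("alice", b"holiday photos")
print(vault.read("alice"))

vault.forget("alice")
print("alice" in vault.blobs)  # ciphertext is still there ...
try:
    vault.read("alice")
except KeyError:
    print("... but it can no longer be decrypted")
```

The design choice is that only the small key table has to support real deletion; the bulk data and its backups can stay immutable.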


> Store customer data immutable and encrypted, with a unique AES encryption key per customer. Encrypt that AES key again with a single RSA key-pair and store the encrypted AES-key in a database.

Now you have the worst of both worlds. You also now have 2 points of failure where data can get lost, because if either has a problem you lose data.


I can't think of a mechanism where you can lose the customer's database records but can still recover files associated with that customer ...


Total system compromise. The backup for the files worked. The ones for the DB didn't.


Right, but how do you figure out which customers own which files? I don't label the assets FrankPJonesPortlandOregonUSA9095551212.jpg. If I've lost the database record I have no idea who owns cat_picture.jpg


If memory serves, a class of Glacier storage (AWS) isn't practical to use in the EU because the inability to delete makes it impossible to serve "Right to Erasure" GDPR requests.


Store data and keys separately.

The set of keys for all your customers should be only a few megabytes, so it is much easier to back up more frequently and to keep more copies of.

It's also easier to have a "we do daily backups and delete any backups after 21 days" policy.


Wouldn't you need to edit every single backup where the key is stored in order to comply?


I assume they mean the standard practice of encrypting your backups with per-customer encryption keys. If you ever want to destroy all of the data for a particular customer you don't have to do any editing of backups, you just have to nuke every copy of the encryption key you use for that customer.


Storing names separately can help but the subset of data you have a legitimate legal reason to "refuse to forget" can change faster than you would want to rewrite your backups (well, nobody wants to rewrite backups, but you get what I mean)


You need to denormalize the non-forgettable information. This way you don't have to worry about what you can delete or not.

Store Invoices, billing, etc, separately. These also usually have a fixed pre-determined retaining policy time. Past that time? Anonymize/aggregate or just delete it.


Legislation can change and what you once thought to be forgettable becomes suddenly not.


It'd be extremely unusual for such changes to be retrospective. For example, if you are currently required to keep certain kinds of financial records for 3 years, and you delete records older than that, you won't be held at fault when that law gets updated to say 5 years, and you then fail to produce a 4-year-old document; it has not yet been a year since the new law took effect.

This extends to criminal law too, for example with statutes of limitation. If, 20 years and 1 day ago, there was a 20-year statute of limitations for murder charges, you cannot be charged for committing a murder on that day (barring extraneous circumstances, such as tolling); even if a law was passed the day after said murder removing this limitation.


Well, yes, but the topic here was data. And if at some time you thought it was forgettable and you scheduled it for deletion, once the law changes you should make damn sure all code involved in that deletion was checked or the data will be gone - against the new law. A big opportunity for human error here.


Not necessarily. While it's not set in stone, GDPR right to erasure might not be in conflict with backup retention.

>The GDPR is open to interpretation, so we asked an EU Member State supervisory authority (CNIL in France) for clarification. CNIL confirmed that you’ll have one month to answer to a removal request, and that you don’t need to delete a backup set in order to remove an individual from it. Organizations will have to clearly explain to the data subject (using clear and plain language) that his or her personal data has been removed from production systems, but a backup copy may remain, but will expire after a certain amount of time (indicate the retention time in your communication with the data subject). Backups should only be used for restoring a technical environment, and data subject personal data should not be processed again after restore (and deleted again). While this adds some complexity, it allows organizations to have some time to re-engineer their data protection processes.

https://blog.quantum.com/2018/01/26/backup-administrators-th...


As I understood it at the time this is the difference between archives and backup.


I see memory served me incorrectly! Apologies if I have spread misinformation. It was an offhand comment from an AWS rep a few years ago and things may have changed.


When GDPR passed there was a bit of a scramble to understand it, and people had all sorts of different interpretations that they were sharing.


I suspect you can square the circle, if you encrypt all the files, and store the keys in storage you can delete quickly.


So today I learnt:

1. Canon don’t have any backup procedures in place for their cloud platform. Any hacker now will be salivating at the idea of pulling a ransomware hack I imagine.

2. Canon developers follow the ‘test in production’ methodology of continuous integration.


>Canon developers follow the ‘test in production’ methodology of continuous integration.

Assuming they are making those calls


How do you know that? Gitlab had 5 backups and could restore none.


Is it a backup if it can't be used like one?


It is and is not at the same time until you try to restore it.


Quantum mechanics goes over my head


There is one somewhat reassuring lesson here -- No matter the company, no matter the brand, no matter their worth: software development is fraught with challenges, and major mistakes for seemingly obvious things are made by everyone, all the time, at every level.


This is not one of them. Ensuring proper data loss protection is something both Amazon and Google are very good at. A company of Canon's size could easily get the proper guidance to avoid it. But it would not have been as cheap as storing the files on your own servers. So this happened because Canon did not want to spend the money on doing it right with offline backups or append-only storage in a cloud with enough nines.


Companies of Canon's size can be constrained for resources just as much as any other company. Canon is largely a legacy company at this point, with significantly declining revenue the last several years. https://i.redd.it/lrlfen2x7sm31.jpg


Even more reason to spend a little bit more to make this right. It's not expensive to get a proper expert on the matter to figure things out. The last thing they want is bad publicity for things that could earn them recurring revenue.


I don’t think they deliberately thought they were doing something inadequately, these problems are always obvious in hindsight. Issue is that eventually most companies will have this moment seeing through the lens of hindsight, the reasons are always different.


Everyone should be looking for a backup strategy that involves at least a hot backup and a cold backup, with some sort of offline validation. However this is easier said than done (because now you've to deal with multiple file systems, decisions about cryptography, lack of interest in creating something that you hope you'll never need, and so on). I wish there was a service (and don't tell me about Backblaze because it's incomplete and doesn't work that well - from my own experience) that would do all these things for you in a reliable and trustworthy way (including giving you snapshots in physical media from time to time - delivering to different addresses, possibly in different jurisdictions, and so on).


I have a multi-dimensional backup strategy, which includes “cloudy” backups of my most important assets, using services like GitHub; local hourly NAS (Synology double-redundant RAID-like) incremental backups of my whole system; and running “hot” backups of my system and development volumes, automatically updated every four hours or on demand (it's a bootable CCC clone).

I seldom need to use anything more than my “hot” backups, but have occasionally needed to restore individual files from NAS.

Whether or not I have a backup is never even something I think about at all, which is a big weight off my shoulders.

But photos and videos are a different matter from code (my main assets). They require a lot of space. It’s easy for me to be smug about backups; but photographers have a much more intense set of assets.


Yes, backing up photos and videos is hard. However, if you do so professionally, you are probably better off not risking losing data so you should take precautions. If you are a hobbyist, it’s also cheaper for you to do so. The problem is lack of a usable and straight-forward solution, in my point of view.


A problem with digital media as well is that it encourages you to just save everything because it's easier that way. But as I'm discovering well into the digital era, I wish I had more curated photos and video--though that comes with its own problems.


I dread the day I decide to go through all my photos and organize them. It's one reason why I'm so reluctant to move off of Google Photos: it's sooo easy to find stuff and it groups photos together nicely.

But I really want to move my photos off of Google.


From time to time, I have a persistent feeling that iCloud might be misplacing my files. It's just a feeling, but I realized that it's quite challenging to be sure it's not happening.


It’s not so hard. A few command line commands on a Mac. Store the md5 of each file, throw it into a JSON file. Load it all up in a dict. I do something similar every month. Gotta do something with those cores overnight :)

It’s very brute force and inelegant, but also simple and quick enough.
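
For anyone who wants to try it, here is a minimal Python sketch of that kind of check. It is not the exact script described above; the function names (file_md5, build_manifest, diff_manifests, and so on) are made up for illustration, and it assumes the files are actually present on local disk rather than cloud placeholders. MD5 is used only to spot accidental changes, not as protection against tampering.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading it in chunks to keep memory flat."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its MD5."""
    return {
        str(p.relative_to(root)): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: Dict[str, str], dest: Path) -> None:
    """Write the manifest as JSON. Keep dest outside the scanned folder."""
    dest.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(src: Path) -> Dict[str, str]:
    """Load a previously saved manifest back into a dict."""
    return json.loads(src.read_text())


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that vanished, appeared, or changed content between two runs."""
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Run build_manifest on the synced folder once a month, compare against last month's saved JSON with diff_manifests, and anything under "missing" or "changed" that you did not touch yourself is worth a closer look.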


One day, I'll do it. Honestly, I'm just afraid to find out the answer… :)


Wow, this is bad. I know you should keep local backups too, but if you shoot RAW then that quickly becomes cumbersome, and only using cloud backup looks very convenient.

If I were a professional photographer missing client material or someone who lost potentially irreplaceable memories I'd want a hell of a lot more than just "[Canon's] deepest apologies".


Fortunately it was a new service and likely not yet very commonly used. Definitely not something any professional or even amateur in their right mind would use as a primary storage, never mind the only one.

The primary use case of the service is the short-term 30-day storage mode where you can auto-upload photos straight out of camera and auto-download them to your computer. The long-term storage is an afterthought, the available space is tiny at 10GB, and if the service is not used in a year, all images are automatically deleted.


The software and service might have been ... provided "as is", without warranty of any kind, expressed or implied, including but not limited to...


> provided "as is", without warranty of any kind, expressed or implied, including but not limited to...

IANAL, but when a service is advertised as being ideal for storing your work or photos, isn't that directly implying that there is a warranty that their service is fit-for-purpose? You can't advertise something in big lettering and then countermand that in the small-print - so those magic words certainly don't shield the company from liability at all.

I understand that the "without warranty of any kind, expressed or implied"-line we see in software-licenses and EULAs is when software is distributed without consideration (e.g. open-source software), but when there is consideration (i.e. people paying Canon to host their photos...) then there's a liability if Canon lost peoples' data - so I understand they will be sued for this if anyone lost anything of value. At the very least a 100% refund...

The only way I can see Canon getting out of this is if they had prominent warnings displayed throughout their service's UX advising their users that their service was not suitable for long-term storage of valuable data.

Again, IANAL - can anyone who is a lawyer chip in?


Also NAL, but I would say it very much depends on the law in whichever jurisdiction it gets tested in - which might be the one you are in, the one Canon are in, or somewhere else entirely.

Some legal concepts such as "fit for purpose" as defined in the UK's Consumer Rights Act (2015) certainly seem relevant here, but that states that companies should offer replacement or refund. Note the word refund, not recompense, is used: Canon may be required to repay you in full everything you paid for the service. All £0.00 of it.

> i.e. people paying Canon to host their photos

Is this the case though? Several other comments have mentioned it being a free service. https://image.canon/st/en/faq.html states that too. Unless there is a non-free option too.

> but when a service is advertised as being ideal for storing your work or photos

I'm not sure what it is advertised as, but that doesn't seem to be what the service is intended for.

Reading other parts of the FAQ it seems that the service is intended as a transfer agent, with the convenience of online storage being a useful side effect. Quoting the FAQ: "Image.canon is designed to ease your imaging workflow – whether you are a professional, enthusiast, or casual user. Wirelessly connecting your camera to the service allows seamless forward of images not only to your computer and smartphone devices but ...". The implication I'm making being that they can just argue that their service is designed to move images around your devices and the user should have been backing them up from there.

Users might try to argue false advertising if "fit for purpose" doesn't fly because of the purpose being defined differently. But good luck funding such a case against the lawyers that Canon can afford. It would have to be a class-action or similar, unless some government body takes the issue up (which from the users' PoV will effectively be the same, and the best they'll get is a small voucher for a few £ off future Canon offerings). That is what a lot of things like this boil down to: legally enforceable sometimes doesn't exactly mean legal, it sometimes means "can be enforced by having a better legal team than the little guy"!


> Again, IANAL - can anyone who is a lawyer chip in?

I would expect the answers to be highly dependent on jurisdiction.


I'd personally prefer services to warrant a certain number of $$$'s in case of service outage, a certain number for data loss, and another number for data leaks.

I'd then like them to have insurance to ensure they are able to pay out rather than go bankrupt.

They can then proudly write those $$$ numbers on the feature list, and I'll use it to decide which service to buy.


Best avoid putting all your eggs in the same basket (build in some degree of redundancy in storing information, dependent on how important that info is to you).


I was wondering what those weird messages from https://image.canon/ were about.

IIUC the idea of their new platform was to be ephemeral in any case. Just to give a pipeline to stream in to other accounts.

I had no idea it was offering long term storage. Better off with S3 Glacier for that!


On-site and at least two off-site.

It sucks for their users who maybe don’t know the golden rule above. I certainly don’t blame non-tech folk who paid a known photography company to handle a complex problem on their behalf - they did the right thing in handing this responsibility to people who they believed could be trusted.

To the greater point - this is incredibly damning of Canon’s cloud storage going forward. As others have said, the amount of data lost overall is not much in the greater scheme of things, but that’s not the concern. What’s worrying is that they were able to lose any data at all. How much redundancy do they have? How are permissions managed?

When I back up photos and videos at home I have a script to chattr +i all the files individually as they’re stored, on top of the redundancy and backups. You need to protect your data from yourself, too.


There is no cloud, it's just someone else's computer.


They’re paying for their neglect of software developers. Both incidents come down to design flaws, after all. The CEOs of Canon Inc and Canon USA are too old to admit that software is more powerful than it was back when hardware sold well on pure performance and features and the software was just a driver. You would be surprised if I told you how many skilled software engineers have left the companies in the last three years. What makes things worse is that the old men aren’t aware of why they left.


The cloud is just another backup option and you should have 3. That being said, you expect more from a paid service.


I can't access the article, but there was a recent ransomware attack on Canon [1]. I wonder if this could be related to it.

[1] https://news.ycombinator.com/item?id=24185734


I think they claim that this was a coding error and not a ransomware case.

The timing is convenient though.


Ouch, I am guessing they are using some on-premise or non-major cloud provider? I feel for them though, without the right resources or budget they are doing their best. I doubt they intentionally decided to not make a good product.


Canon is no software company and doesn't have the right mindset (also see most other camera manufacturers' mobile app reviews). Probably outsourced with a focus on having low development and operating costs.


Anyone who relies on any software camera companies produce deserves to lose their data IMO. Just look at Canon DPP. Almost 20 years old, and it's still a primitive, unusable, steaming pile of shit.


I'm redirected into a refresh loop with this url: https://www.digitalcameraworld.com/cc.html


I think people would be surprised at the number of backup or file storage services that keep everything in one data center. Your files might be redundant across machines but it's not like they have a second data center somewhere with another copy of all your data, and they certainly don't back everything up to tape or something.


The camera companies are so bad at software that the software companies are building better cameras than them... in the few millimeters thickness of a phone! I salivate for the day that Google or Apple buys Nikon or Leica or whatever and shoves all of that computational photography goodness under a 24x36mm camera sensor.


As of 2019, Apple sources their iPhone camera sensor components from Sony and the camera lenses from Largan Precision (and other companies). The software that Apple contributes is important - yes (after all, this is software-defined photography now, with Portrait Mode and that fancy but fake depth-of-field stuff in the single-lens iPhones) - but let's not pretend Apple is "a camera company".


I expect Apple sold more cameras than Canon last year.


I'm not sure what the "software" companies would gain from buying Canon or Leica that they don't already have.

That said, you're also dismissing one of the biggest value propositions camera companies have, which is decades of physical user interface experience and developed hardware modularity. Phones absolutely do not replace good cameras for usability.

When travelling I take a mix of photos between my phone and camera, and when I need my camera it's irreplaceable, and when I don't need my camera, I use my phone. The phone is quick and easy, but the camera makes me a photo ninja and I get specifically configured shots quickly in a way I could never replicate with a phone interface.


I think you're replying to the wrong comment. The software companies (Google and Apple) would buy Nikon and release a device that looked and operated like a Nikon FM3HP camera with a 24x36mm sensor, interchangeable lenses, and all of Apple's image processing tech onboard. You take the photo and in 2 seconds the image is in your iPhone camera roll. You get the picture-taking optics and ergonomics of the Nikon, the photo-processing AI magic from Apple/Google, and the picture-sharing immediacy of SMS/Instagram.


Oh, I misunderstood your comment. I thought you meant for a Camera company to put their tech into a phone. You meant for software companies to put software into cameras.


Saving your shit on "the cloud": not even once.


I store my data on the nextcloud server sitting in the same room. I guess for added safety I could also set up automated encrypted backups to backblaze or something.


With spinning rust so cheap nowadays - anything could be backed up easily both in the cloud and locally.


No backups? Wut?


People call me crazy for having local backups, cloud backups, and offline backups for my critical assets.


3-2-1 backup is not crazy and also is an IT standard.
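
For anyone unfamiliar, the rule as commonly stated is: at least 3 copies of the data, on at least 2 different kinds of media, with at least 1 copy off-site. A small Python sketch of checking a setup against it follows; the BackupCopy type, the check_3_2_1 function, and the example media labels are hypothetical names for illustration, not part of any standard tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    name: str      # free-form label, e.g. "laptop" or "NAS"
    medium: str    # kind of storage, e.g. "ssd", "hdd", "object-storage"
    offsite: bool  # True if the copy lives away from the primary location


def check_3_2_1(copies: List[BackupCopy]) -> List[str]:
    """Return a list of rule violations; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 distinct media")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems
```

Note that two disks in the same RAID array would count as one medium and one location here, which is the "replication is not backups" point made elsewhere in the thread.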


I'm utterly shocked at how many people assume that "The Cloud" is both run by people infinitely more intelligent and more competent than they are and that everything there is backed up.

What a crock of shit.


[flagged]


Brilliant username. But we're not reddit so content over style please :-)


Has GDPR compliance gone too far?


No.

Losing data is just as bad from a GDPR perspective.


Oh, really? Can you actually be fined for destroying disks containing customer information?


Yes, deletion also counts as data processing. Unauthorized data processing in this case.


Yes! Accidental loss is a data breach, see page 7ff of http://ec.europa.eu/newsroom/document.cfm?doc_id=47741


Interesting, they seem to call this an “availability breach”; of course, they require disclosure if it affects people. Do data storage providers essentially need to offer an SLA, then?


The article doesn't mention GDPR.


I know it doesn’t.



