Hacker News

Brian from Backblaze here. We do angst over the idea of a "hot spare", where the very second we fail a drive it can begin rebuilding elsewhere. But a hot spare ties up hardware even when it is not in use (an extra drive sitting idle), which raises cost.

At our current scale it is becoming less and less of an open debate, because we now have 7-day-a-week staffing at our datacenter, and the datacenter techs jump right in and replace failed drives, often within an hour or so. A "hot spare" would only save a couple of hours of rebuild time. But remember, your mileage will vary: until you reach half our scale you cannot afford even a Monday-Friday datacenter tech, so you might only be able to replace failed drives on Mondays and Wednesdays, which widens your exposure.
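To put rough numbers on the exposure argument, here is a minimal sketch. It assumes exposure is simply the wait for a replacement drive plus the rebuild time; the function names and the 24-hour rebuild figure are hypothetical placeholders, not Backblaze numbers.

```python
def worst_case_wait_days(visit_weekdays):
    """Longest gap, in days, between consecutive tech visits.

    visit_weekdays: weekday numbers with Monday = 0 ... Sunday = 6.
    A drive that fails just after a visit waits this long for the next one.
    """
    days = sorted(set(visit_weekdays))
    if not days:
        raise ValueError("need at least one visit day")
    # Gap from each visit day to the next one, wrapping around the week.
    # A single weekly visit gives a gap of 0 mod 7, which really means 7 days.
    gaps = [(days[(i + 1) % len(days)] - d) % 7 or 7 for i, d in enumerate(days)]
    return max(gaps)


def worst_case_exposure_hours(visit_weekdays, rebuild_hours):
    """Hours spent with reduced redundancy: wait for the swap, then rebuild."""
    return worst_case_wait_days(visit_weekdays) * 24 + rebuild_hours


REBUILD_HOURS = 24  # hypothetical rebuild time, for illustration only

# Techs on site Mondays and Wednesdays: worst case is Wednesday -> Monday.
mon_wed = worst_case_exposure_hours([0, 2], REBUILD_HOURS)
# Techs on site Monday-Friday: worst case is Friday -> Monday.
weekdays = worst_case_exposure_hours([0, 1, 2, 3, 4], REBUILD_HOURS)
```

With these assumptions, a Monday/Wednesday schedule has a worst-case wait of 5 days and a Monday-Friday schedule has a worst-case wait of 3 days. A hot spare removes the wait entirely, which only matters much when the wait is long compared to the rebuild.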



Have you considered rebuilding into already available space in the cluster?

Something similar to how Ceph or Swift handles rebuilds? You get rid of the individual disk sitting around as a spare, though it would break the idea of a tome being a specific collection of disks. You would need to be able to identify and move a shard around your cluster into other vaults, and a shard would need to be smaller than the raw disk size.

This would increase network overhead as well (more data movement).
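A minimal sketch of what "rebuilding into already available space" could look like. This is an assumption-laden illustration, not how Ceph, Swift, or Backblaze actually place data; `Drive`, `pick_rebuild_target`, and the placement rule (most free space, never two shards of the same tome on one drive) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    free_bytes: int
    tomes: set = field(default_factory=set)  # tome ids with a shard on this drive


def pick_rebuild_target(drives, tome_id, shard_bytes):
    """Pick a surviving drive to receive a rebuilt shard, or None if none fits.

    A drive qualifies only if it has room for the shard and does not already
    hold a shard of the same tome (otherwise one failure would lose two shards).
    """
    candidates = [
        d for d in drives
        if d.free_bytes >= shard_bytes and tome_id not in d.tomes
    ]
    if not candidates:
        return None
    # Hypothetical policy: spread load by choosing the emptiest drive.
    return max(candidates, key=lambda d: d.free_bytes)


def place_rebuilt_shard(drives, tome_id, shard_bytes):
    """Reserve space for the rebuilt shard and return the chosen drive."""
    target = pick_rebuild_target(drives, tome_id, shard_bytes)
    if target is not None:
        target.free_bytes -= shard_bytes
        target.tomes.add(tome_id)
    return target
```

The check on `shard_bytes` is why a shard would need to be smaller than a raw disk: a shard that fills a whole drive can only ever be rebuilt onto an empty one, which is a hot spare by another name.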

I'm probably just rambling here, so you can probably ignore me. (You have awesome tech there, though.)


> I'm probably just rambling here

:-) Not at all! Don't assume we're some perfect team of scientists who know all the correct solutions before we start coding. We often angst over these decisions and designs, knowing that once we write the code a lot will be set in stone (hard to change) for a number of years. The reason it becomes hard to change is that we don't have a huge development team that can afford to rewrite the software every year, so we try to get it correct and then go on to work on new things or on polishing up corners that need polishing.


Two people for a year is a lot of hard drives you don't get to buy. One of the interesting things I got to experience at Google when I was there was the difference between drive economics at scale and single drive pricing. Next time you're in Sunnyvale we should chat over a beer.



