Hacker News

When I was at AWS we were using generators from a large commercial supplier. We were constantly having issues with them refusing to take over if there wasn't sufficient load. Doing so puts lots of stress on a generator and can significantly shorten its life.

We went to the manufacturer and tried to get them to make a firmware change; we wanted the generators to sacrifice themselves under most every circumstance (short of danger to humans). Generators are an irrelevant cost compared to the cost of an outage. The manufacturer refused, even when we offered to buy them without warranties.

I don't know if they still do, but at that point AWS started buying basically the same generators straight from China and writing its own firmware.

There is also a great story about an AWS-authored firmware update on a set of louver microcontrollers causing a partial outage.

AWS really likes to own the entire stack.

*It has been years since I left, so the details are a bit fuzzy, but I think I've got them right.



> we wanted the generators to sacrifice themselves under most every circumstance

The rare failure mode that can cost $100m looks like this: when the utility power fails, the switch gear detects a voltage anomaly sufficiently large to indicate a high probability of a ground fault within the data center. A generator brought online into a direct short could be damaged. With expensive equipment possibly at risk, the switch gear locks out the generator. Five to ten minutes after that decision, the UPS will discharge and row after row of servers will start blinking out. This same fault mode caused the 34-minute outage at the 2013 Super Bowl: The Power Failure Seen Around the World. Backup generators run around 3/4 of a million dollars, so I understand the switch gear engineering decision to lock out and protect an expensive component. And, while I suspect that some customers would want it that way, I’ve never worked for one of those customers, and the airline hit by this fault last summer certainly isn’t one of them either.

http://perspectives.mvdirona.com/2017/04/at-scale-rare-event...


This sounds nice, but it makes zero sense. A direct short will be fused and isolated; otherwise the UPS banks would get damaged (maybe even explode).

Edit: and reading further, there it is:

>"If there was a ground fault in the facility, the impacted branch circuit breaker would open and the rest of the facility would continue to operate on generator and the servers downstream of the open breaker would switch to secondary power and also continue to operate normally. No customer impact."


The company that maintains my generators has a dummy load that is basically a large resistor bank and fan on a trailer. It seems like Amazon could just build something like that on a larger scale to load the generators; in the short term they could even rent the dummy-load trailers and hook them up in the parking lot, right? For a permanent solution, have logic to switch in the appropriate amount of resistance to maintain the minimum load required.
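The "switch in the appropriate amount of resistance" logic could be as simple as the sketch below. It assumes the load bank is built from equal fixed-size steps and that the engine wants at least 30% of its rating; that 30% is a commonly quoted rule of thumb used here as an assumption, not a manufacturer figure. `load_bank_steps` is a hypothetical helper, not any real controller's API.

```python
import math


def load_bank_steps(it_load_kw: float, gen_rating_kw: float,
                    step_kw: float, min_fraction: float = 0.30) -> int:
    """How many fixed-size load-bank steps to switch in so the generator
    carries at least `min_fraction` of its rating.

    Hypothetical helper; min_fraction=0.30 is an assumed rule of thumb.
    """
    if step_kw <= 0 or gen_rating_kw <= 0:
        raise ValueError("ratings must be positive")
    shortfall_kw = min_fraction * gen_rating_kw - it_load_kw
    if shortfall_kw <= 0:
        return 0  # real IT load is already enough; no dummy load needed
    steps = math.ceil(shortfall_kw / step_kw)
    # Never let the dummy load push the set past its own rating.
    if it_load_kw + steps * step_kw > gen_rating_kw:
        raise ValueError("switching in the load bank would overload the generator")
    return steps


# A 2 MW set carrying only 200 kW of servers, with a bank in 100 kW steps:
print(load_bank_steps(200, 2000, 100))  # 4 steps (400 kW) brings it to 600 kW
```

Re-running this on every load measurement sheds steps as the real IT load grows. Smaller steps make each change a gentler transient for the engine.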


There are, unfortunately, a lot of states between on and off when it comes to utility power: overvoltage, voltage drop (which can cause fuses to blow, since current goes up), phase outage, phase imbalance, harmonic noise (causing ground/neutral feedback), and micro outages (a few cycles; <50 ms). All of these can confuse systems such as transfer switches, UPSes, generators, or paralleling gear.

Most downtime I've seen in datacenters has been due to irregular power, not power loss. Case in point.

Dummy loads are good for testing, but they are not typically variable: if you had 2.5 MVA generators and a 1 MW dummy load, you wouldn't be able to run more than 1 MW of critical IT load.
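The 1 MW figure follows if the 2.5 MVA rating is quoted at a 0.8 power factor (a common genset rating, assumed here): that gives 2 MW of real power, and a fixed 1 MW resistive load takes half of it. A minimal sketch of that arithmetic; `it_headroom_kw` is a hypothetical name.

```python
def it_headroom_kw(gen_kva: float, dummy_kw: float,
                   power_factor: float = 0.8) -> float:
    """Real power (kW) left for IT load when a fixed dummy load is connected.

    Assumes the set's kVA rating is quoted at `power_factor` (0.8 is an
    assumption) and that the resistive dummy load draws only real power.
    """
    rated_kw = gen_kva * power_factor
    return max(rated_kw - dummy_kw, 0.0)


# 2.5 MVA generator with a 1 MW dummy load that can't be switched out:
print(it_headroom_kw(2500, 1000))
```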

I'll say this is the first I've heard of not having enough load to start a generator. They will happily start up and idle.


I would say the automatic transfer switch should be configured conservatively and go to backup power at the first hint of instability. If irregular power is the usual cause of downtime, then I would expect the ATS to cope with those conditions, or to be replaced with one that will.
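A "transfer at the first hint of instability" policy could be sketched as below, checking for the irregular-power conditions listed upthread (over/undervoltage, phase outage, phase imbalance) plus frequency drift. Every threshold here is a hypothetical value for a 480 V / 60 Hz three-phase feed, not a vendor setting, and `PowerSample`, `anomalies`, and `should_transfer` are illustrative names rather than any real ATS interface.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical thresholds; real settings come from the vendor and site engineers.
NOMINAL_V = 480.0
V_TOL = 0.10            # +/-10% voltage window
OUTAGE_FRACTION = 0.5   # below 50% of nominal counts as a phase outage
NOMINAL_HZ = 60.0
HZ_TOL = 0.5            # +/-0.5 Hz
MAX_UNBALANCE = 0.02    # max deviation from the phase average (2%)


@dataclass
class PowerSample:
    phase_volts: Tuple[float, float, float]  # RMS voltage of each phase
    freq_hz: float


def anomalies(s: PowerSample) -> List[str]:
    """List every condition in this sample that counts as irregular power."""
    found = []
    low, high = NOMINAL_V * (1 - V_TOL), NOMINAL_V * (1 + V_TOL)
    for i, v in enumerate(s.phase_volts):
        if v < OUTAGE_FRACTION * NOMINAL_V:
            found.append(f"phase {i} outage")
        elif v < low:
            found.append(f"phase {i} undervoltage")
        elif v > high:
            found.append(f"phase {i} overvoltage")
    avg = sum(s.phase_volts) / 3
    if avg > 0 and max(abs(v - avg) for v in s.phase_volts) / avg > MAX_UNBALANCE:
        found.append("phase imbalance")
    if abs(s.freq_hz - NOMINAL_HZ) > HZ_TOL:
        found.append("frequency out of range")
    return found


def should_transfer(samples: Sequence[PowerSample], consecutive: int = 1) -> bool:
    """Conservative policy: transfer once `consecutive` samples in a row are irregular."""
    if consecutive < 1:
        raise ValueError("consecutive must be >= 1")
    run = 0
    for s in samples:
        run = run + 1 if anomalies(s) else 0
        if run >= consecutive:
            return True
    return False


healthy = PowerSample((480.0, 480.0, 480.0), 60.0)
sagging = PowerSample((480.0, 480.0, 400.0), 60.0)
print(anomalies(sagging), should_transfer([healthy, sagging]))
```

`consecutive=1` is the "first hint" behavior; raising it trades sensitivity for fewer nuisance transfers. To catch the few-cycle micro outages mentioned above, samples would have to arrive faster than one 60 Hz cycle (about 16.7 ms).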

Good point about the lack of variability. Not sure if the generators would take kindly to suddenly switching out a 1 MW dummy load to free capacity for real load.

On the other hand, it's not like these engines have fixed-RPM throttles, and they should be able to handle a wide range of loads, albeit not at peak efficiency. Something doesn't fully make sense about that story. Maybe the generators were sized for their ultimate planned capacity and were thus way too big for their deployment at the time.


I thought large diesel generators generally did operate at a fixed speed in order to output 60 Hz. I know inverter generators are common these days for small gas gensets, but have the big diesel ones switched over as well?
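The fixed speed comes from the alternator: a synchronous machine's output frequency is tied to shaft speed by n = 120·f / P (n in rpm, f in Hz, P the number of poles). A short sketch of that relation; `synchronous_rpm` is a hypothetical helper name.

```python
def synchronous_rpm(freq_hz: float, poles: int) -> float:
    """Shaft speed (rpm) a synchronous alternator needs for a given output frequency."""
    if poles <= 0 or poles % 2:
        raise ValueError("pole count must be a positive even number")
    return 120.0 * freq_hz / poles


for poles in (2, 4, 6):
    print(f"{poles}-pole: {synchronous_rpm(60, poles):.0f} rpm @ 60 Hz, "
          f"{synchronous_rpm(50, poles):.0f} rpm @ 50 Hz")
```

Large diesel sets are commonly 4-pole machines, hence the familiar 1800 rpm (60 Hz) and 1500 rpm (50 Hz) figures. The governor holds that speed and changes fueling to follow load.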


Can't you just have a smaller generator for low load? Literally wasting fuel so you can run a large generator for a low load should be criminal.


You might not realize how much fuel it takes just to maintain generators. They have to be exercised, load tested, overhauled, all on top of whatever time they spend providing backup power. Which is just as well, because an engine that sits without ever running won't operate when it is most needed, and the fuel can stagnate, or even attract water and grow algae.


It seems like excessively low load would be a relatively easy problem to solve. One could heat tanks of water to boiling and vent the steam; that could probably absorb as much energy as needed.

Or just crank up the AC :)
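To size the boiling-water idea: each kilogram needs sensible heat to reach 100 °C plus the latent heat of vaporization. The sketch below uses approximate textbook constants (about 4.186 kJ/kg·K for liquid water and 2257 kJ/kg latent heat) and assumes 20 °C inlet water; `water_boiled_kg_per_hour` is a hypothetical name.

```python
SPECIFIC_HEAT_KJ_PER_KG_K = 4.186   # liquid water, approximate
LATENT_HEAT_KJ_PER_KG = 2257.0      # vaporization at 100 °C, approximate


def water_boiled_kg_per_hour(power_kw: float, inlet_c: float = 20.0) -> float:
    """Mass of water per hour that `power_kw` can heat from `inlet_c` and boil off."""
    kj_per_kg = SPECIFIC_HEAT_KJ_PER_KG_K * (100.0 - inlet_c) + LATENT_HEAT_KJ_PER_KG
    return power_kw * 3600.0 / kj_per_kg  # kW = kJ/s; 3600 s per hour


print(f"{water_boiled_kg_per_hour(1000):.0f} kg/h per MW")
```

That works out to roughly 1.4 tonnes of water boiled off per hour for every megawatt of excess capacity, which would all need to be supplied and vented continuously.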


Huge DCs like Amazon's and ours at Google are very efficient; cooling uses at most 10% of total power, so cranking it up wouldn't help. Nor would it usually be possible anyway, as these typically use evaporative cooling, which can't really be cranked up like traditional compression cooling. And keeping a huge tank and heater ready would introduce another point of failure. The easiest option is just to have your servers do some heavy calculations.


Compared to heating a water tank, it would be a much more responsible use of power to fire up some folding@home images.

Perhaps a more financially responsible solution would be to spin up a bunch of instances that mine some sort of cryptocurrency. I doubt it'd cover the electricity costs, but it could offset them somewhat.


There are load banks made for generators, just large resistive grids. A facility that I work at has one so that we can exercise the generator at operating load without actually switching over our power. So I would think a good solution for a DC might be to have load banks you could switch in when necessary.


Somewhere I have a picture of a car that was parked in the no parking area next to the dummy load for a data center generator. All of the plastic parts on the outside are melted off and the paint is a different color on the side that got the bulk of the hot air coming off the dummy load.

I had not realized how hot it could get there, but it made sense given the energy being dumped.


Apart from the other ideas for picking up load, if you have the land for it, there are many productive ways to use excess generation capacity that can still be reliable, especially if a conventional resistive load is available on standby. Some methods can even bolster the DC’s own opex. None are cheaper than a resistive load, but I wonder: if the resistive load were modeled as requiring high availability, what would the excess capacity be worth?

* Plasma arc garbage incineration. Not smelly if the input arrives in a ceramic sealed container.

* Glass cullet smelter producing glass foam insulation bricks. Turn excess power into additional DC modular insulation. My personal favorite, because this is a giant resistive load that directly supports the DC’s bottom line.

* Aluminum recycling smelter. Build additional heat sinks for increasing rack efficiency.

* Distilled water generation. Route it back through water chillers, cut down on mineral scaling damage over time.


The generator is not going to run unless there's an outage.


I forgot to mention that you only need to set up for very small-scale production, almost maker-scale, so not a whole lot of land is required. With automation, any outage will create a steady trickle of usable goods. The DCs I work with run their generators once a month to test them (and generally, most engines don't like sitting still all the time). The output would certainly be sufficient for nearby residents, so co-generation would benefit third parties if not the DC itself.


Do you know why it's harder for a generator to power an unloaded circuit? Just curious.


The generator's mechanical parts will perform more efficiently at some speeds/loads than at others. Everything happening at a different temperature and pressure will change fluid flows and forces and put different stresses in different places. Combustion won't happen cleanly, oil and gases won't flow the way they're designed to and the engine will get dirty, etc. If you're running outside of the region the engine was designed to operate in you're stressing it in ways it was not designed to handle, lowering efficiency and shortening its lifespan.

(You could run the generator at its favorite speed and sink excess power into a dummy load. This'd waste a ton of fuel, though, enough so that I'm not actually sure whether it'd be cheaper than eating an engine. That and I'm not sure what a data-center-sized dummy load would look like.)

https://en.wikipedia.org/wiki/Power_band


> This'd waste a ton of fuel, though, enough so that I'm not actually sure whether it'd be cheaper than eating an engine.

I think you'll find fuel is extremely cheap, even when you're burning many gallons per minute.


Hmmm. I didn't actually do the numbers. Let's see...

A randomly googled 2 MW diesel generator consumes 50 gallons/hour at quarter load and 160 gallons/hour at full load [1]. So let's say it takes about 100 extra gallons/hour to run a 2 MW generator at full load instead of quarter load. The OVH incident report [OP] says that their data center has two incoming cables, each carrying 10 MVA (megavolt-amperes, roughly megawatts), giving us 20 MW as roughly the maximum power consumption of the data center. That's ten 2 MW generators, so it'd cost 1000 gallons/hour to run at full load, or about $3000/hour at the current price of diesel [2].

Yeah, that's pretty cheap, okay. You might run these generators for a few hundred hours over the lifetime of the installation and they probably cost millions of dollars, so you're not going to spend more on fuel than you'd spend having to replace the generator early. Why don't they have a dummy load for that scenario?

[1] http://www.dieselserviceandsupply.com/Diesel_Fuel_Consumptio...
[2] https://www.eia.gov/petroleum/gasdiesel/

edit: Now that I look at my math, I realize I didn't need to look at OVH in particular at all; I could have just compared the cost of fuel to the purchase price of one of the diesel generators I was looking at. >_< Meh, the math still works, so I'll leave it.
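The back-of-envelope above, written out as a sketch. The 50 and 160 gallons/hour figures and the $3/gallon price are the ones from this comment; the 300-hour lifetime is an assumed stand-in for "a few hundred hours". Without the comment's rounding, the gap is 110 gallons/hour per set, so the site total comes out slightly above $3000/hour.

```python
import math

GAL_PER_HOUR_QUARTER = 50.0   # 2 MW set at 25% load, per the chart linked as [1]
GAL_PER_HOUR_FULL = 160.0     # same set at 100% load
GEN_MW = 2.0


def extra_fuel_cost_per_hour(site_mw: float, usd_per_gallon: float) -> float:
    """Extra $/h to run every set at full load (dummy load) instead of quarter load."""
    n_gens = math.ceil(site_mw / GEN_MW)
    extra_gal_per_hour = (GAL_PER_HOUR_FULL - GAL_PER_HOUR_QUARTER) * n_gens
    return extra_gal_per_hour * usd_per_gallon


site = extra_fuel_cost_per_hour(20, 3.0)     # ten 2 MW sets
per_gen = extra_fuel_cost_per_hour(2, 3.0)   # a single set
print(f"site: ${site:,.0f}/h, one generator: ${per_gen:,.0f}/h")
print(f"one generator over an assumed 300 h: ${per_gen * 300:,.0f}")
```

Per generator, the lifetime fuel penalty (around $99,000 under these assumptions) is well below the roughly 3/4 of a million dollars per backup generator quoted upthread, which supports the per-generator comparison made in the edit.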


Also, if you have one power outage, it's more likely you'll have another soon, so you don't want to burn up your generator (likely a long lead time to replace) during the first outage, just to have nothing during the next outage.


Fuel is much more expensive in Europe. A liter of diesel is about €1.40, so $6.50 per gallon.
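Converting a per-liter price to a per-gallon price takes two factors: the exact 3.785411784 liters per US gallon, and an exchange rate that the comment doesn't state. The sketch assumes the price is in euros, and the 1.20 USD/EUR rate below is purely illustrative; `per_gallon` is a hypothetical helper.

```python
LITERS_PER_US_GALLON = 3.785411784  # exact by definition


def per_gallon(price_per_liter: float) -> float:
    """Convert a per-liter price to a per-US-gallon price in the same currency."""
    return price_per_liter * LITERS_PER_US_GALLON


eur_per_gal = per_gallon(1.40)
assumed_usd_per_eur = 1.20  # illustrative only; the real rate varies over time
print(f"EUR {eur_per_gal:.2f}/gal, about ${eur_per_gal * assumed_usd_per_eur:.2f}/gal")
```

Either way, it is roughly double the ~$3/gallon US figure used upthread.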


Does Europe tax diesel the same way no matter where it's used? I know that in the US, diesel for generators is exempt from a lot of taxes, which makes it significantly cheaper in some states.


I only know of Finland, but here diesel for work machines is much cheaper than diesel for driving.


In the UK, diesel is taxed differently for agricultural use; it typically has a red dye added to indicate it's for agricultural use only. I honestly don't know about generators, though.


You can use red diesel in a generator, I think the legality comes down to whether it's being used to power a vehicle on a public road.


You're right, some countries have exceptions. The Netherlands, where I live, used to, but no longer does. Not sure about France.


There's the externality of pollution you didn't consider.


That's true, but how much pollution does it cost to replace the generator?


Since the generators apparently run less efficiently at partial load, that's not a given either. They'll almost certainly produce more pollution per unit fuel. Hm, is that a few percent more, or a few times more?


Dummy load looks like this: http://www.simplexdirect.com/images/pic-l-atlasTrailer.jpg It's "only" a megawatt but there are bigger ones as well.

Basically a big hairdryer. :-)


My understanding is that it's the diesel engines that don't like it (the generator head itself is fine). Basically, some of the fuel in the combustion chamber doesn't get burned, and comes out the exhaust. This builds up in the exhaust system, forming an ooze in the pipes. I think it's worse on a turbo diesel, because now the ooze is coming into the turbocharger.

https://en.wikipedia.org/wiki/Wet_stacking


Adding a heat-bank / load-bank[1] solves this problem.

1. Typically these are just big fan heaters https://en.m.wikipedia.org/wiki/Load_bank


As far as I understand, it's mostly because the engine won't reach optimal temperatures and pressures, which then leads to incomplete combustion and carbon buildup in the cylinders.


And at AWS scale you can always have a human team present to provide a human backup; the cost of having two or three people on an overnight shift is trivial compared to the cost of downtime.


I wonder why they couldn't do that at OVH. The data centres are always manned, and they had 8 minutes on batteries. Why didn't they detect, after 20 seconds, that the generators hadn't started despite the power failure, and trigger a manual failover? Sounds like something the staff should be able to do?
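The check this comment describes is simple enough to automate as an alarm that pages on-site staff. The sketch below uses the comment's own numbers (a 20-second start deadline and about 8 minutes of UPS runtime); the `Status` record and `check` function are hypothetical and not part of any real building-management or DCIM system.

```python
from dataclasses import dataclass

GEN_START_DEADLINE_S = 20.0   # from the comment: alarm if not started after 20 s
UPS_RUNTIME_S = 8 * 60.0      # from the comment: ~8 minutes on batteries


@dataclass
class Status:
    utility_ok: bool
    seconds_since_utility_loss: float
    generators_online: bool


def check(status: Status) -> str:
    """Decide whether a human needs to intervene before the UPS runs dry."""
    if status.utility_ok or status.generators_online:
        return "ok"
    t = status.seconds_since_utility_loss
    remaining = max(UPS_RUNTIME_S - t, 0.0)
    if t < GEN_START_DEADLINE_S:
        return f"waiting for generators ({remaining:.0f} s of battery left)"
    return f"PAGE ON-SITE STAFF: manual failover, {remaining:.0f} s of battery left"


print(check(Status(utility_ok=False, seconds_since_utility_loss=25, generators_online=False)))
```

With these numbers, staff would be paged with roughly seven and a half minutes of battery still available.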


> The manufacturer refused

That's how you create business opportunities for others, heh


Why Diesel generators? Natural gas or gasoline should avoid wet stacking.


Diesel engines tend to be used more because they are much lower maintenance than engines running on other fuels, and fuel cost is also lower thanks to their higher efficiency.


I think storing diesel is also easier. It's not very flammable compared to petrol (gasoline) or LPG/LNG.

Although diesel can develop microbial contamination if it isn't used within 6 to 12 months.[1]

1. https://en.m.wikipedia.org/wiki/Microbial_contamination_of_d...


Why are they lower maintenance in this situation (i.e. power failsafe generators that are used for only a few hundred hours over their life)? I appreciate that diesel car/truck engines are lower maintenance, but they are used in a very different way.


Easier to generate a fixed-frequency AC output. Diesel engines can run at a fixed RPM and adjust fueling to match load, whereas gasoline and natural gas engines need a narrow air-fuel ratio (AFR) and have to vary rotation speed for power output. Maybe less of an issue these days as inverter generators are more common, though I dunno if that's the case for the big generators.


> There is also a great story about an AWS-authored firmware update on a set of louver microcontrollers causing a partial outage.

Do tell :-)


Seems like you would have to disable emissions systems to get what you described, which would be illegal ;)

The actual solution to that problem, of course, is to use smaller generators?


Or you can fudge your emissions testing.


> AWS really likes to own the entire stack.

I wish you and I could own the full stack on our own hardware, too. Imagine free drivers, firmware, and microcode, plus full schematics and specifications for all the parts. That would approach my atheist's heaven.


I'm not an expert, but a safe load for a high-power device means one that won't place much of the load on the device itself. When the device takes the load on itself, it gets really hot, so hot that parts may melt and things that were supposed to be insulated may become conductive.

Doing high-voltage electronics work without a license is a crime in a lot of places. I'm pretty sure a person with the right specialty can add a load so you won't need to hack the firmware to violate its safe operating ranges.

If someone told me this in person I'd consider contacting authorities over their negligent behaviour. The manufacturer was willing to lose business for a safety matter even though allowing the firmware hacks would be really easy.


Amazon spends billions on energy and has hundreds of hardware and electrical engineers. They are one of the best if not the best in data center engineering. They know what they are doing.



