It's not a crazy moment. They have a system running at real scale, and they found that while keeping the system up constantly with immense amounts of data in it, they were able to dynamically add a new node to their cluster --- just that balancing everything out took a lot of time for the system.
The operational challenge I infer from this is that they had waited to add that node until they really needed it, because their expectation was that adding the node would get them quick relief to their scaling issue. Instead, they got relief a few days later when the node was fully integrated.
Solution: don't wait to add nodes until the last minute.
The operational challenge I infer from this is that they had waited to add that node until they really needed it, because their expectation was that adding the node would get them quick relief to their scaling issue. Instead, they got relief a few days later when the node was fully integrated.
Solution: don't wait to add nodes until the last minute.