Uber migrates microservices to multi-cloud platform running Kubernetes and Mesos (uber.com)
176 points by belter on Oct 22, 2023 | 218 comments



> The team used existing tooling to move services between zones in order to ensure they were portable. Firstly, they allowed services to be moved back to the original zone to resolve any portability issues, but once resolved, services would be moved periodically to validate portability and prevent regressions.

This is something that most companies don’t do when they say they want to do $x to “prevent lock in”.

Uber actually is testing for portability along the way.


It's probably more cost effective to negotiate a long-term max price with your cloud provider with a force majeure clause.


Unless you’re crazy enough to work with GCP, the “my cloud provider is going to lock me in and then raise prices” scenario doesn’t happen. AWS has only ever raised prices in a few very obscure cases.

One of which is putting a price on HEAD requests in S3 (?).

AWS already gives long-term price discounts/guaranteed prices via reserved pricing, and big customers already have negotiated contracts.


Price increases are just one way you can get screwed. You can also lose out when your provider doesn't drop prices or pick up operating efficiencies that other providers have.


And when has that happened with respect to either GCP, AWS or Azure at a level that it’s worth migrating?

Even if you have done everything in a “cloud agnostic” way, “infrastructure has weight”. Any large migration isn’t just technical; it involves project management, organizational training, regression testing, compliance testing, security testing, architecture review boards, vendor negotiations, firewall changes, coordination with third parties who may only allowlist certain IP addresses, data migration, etc.

Heck they often have multiple physical network connections to the cloud provider (Direct Connect)

Anyone who thinks they can run everything on K8s and have “cloud agnosticism” has never done a very large scale migration.

You would be amazed how long it takes to do a bog standard lift and shift of hundreds of plain-jane VMs and VM-hosted databases. You can’t get any more cloud agnostic than that.

source: I’ve done a few over the years in both the “real world” and working in the cloud consulting department at AWS (Professional Services). I no longer work at AWS and have no specific loyalty to AWS.


The other thing I'd add is cloud agnosticism doesn't scale. If everyone were prepared for it, there wouldn't be enough elastic capacity with other cloud providers. You'd need enough reserved capacity in another cloud to pull it off, but I guarantee you finance will say "no." What makes the most sense is multi-region work since it's more cost effective, and it's the more likely failure scenario.


There usually isn’t enough elasticity if a region fails. Even then, you really need to have reserved capacity and maybe even a hot standby or active-active setup.


>And when has that happened with respect to either GCP, AWS or Azure at a level that it’s worth migrating?

Egress price makes it worth migrating away from those three.


You’ve never been the one neck to choke when things go wrong have you? If Billy Bob’s cloud provider goes down, you are going to constantly be blamed for making a poor decision. If anything goes wrong they are going to question your decision.

If you choose AWS (or Azure) and a region goes down - everyone else is down too. “No one ever got fired for choosing IBM”.

Choosing the most popular vendor - AWS, Salesforce, ServiceNow, or whatever vendor is in the upper right of the Gartner Magic Quadrant - never gets questioned by the powers that be.


Even if the alternate cloud provider goes offline for an entire day it still would be worth it financially compared to AWS because egress is so expensive there.


And you ignored the entire reply, didn’t you? It’s naive to think at the “one neck to choke” level that all decisions are made for purely technical reasons.

And for you to just say “it’s okay to be down an entire day” because of egress cost tells me that you have never done infrastructure requirements analysis at scale.

First you have to assess the cost of being down for a period of time, then you have to assess RTO and RPO requirements. And not all workloads have high egress costs - especially things like data lakes that may have a lot of ingress and processing costs, but relatively low egress costs.

I’ve done a lot of different cloud projects over the years, from lift and shifts to data lakes to cloud call centers to serverless to ETL jobs. You can’t just blindly repeat “egress costs” in a vacuum without understanding use cases.


I never claimed that it is always worth it to switch because of egress costs, but that egress costs are a reason to switch. If I ran my sites on AWS it would 100x the cost of running them.


These were your words:

> Egress price makes it worth migrating away from those three.

> Even if the alternate cloud provider goes offline for an entire day it still would be worth it financially compared to AWS because egress is so expensive there.

You never qualified either with “in my particular use case”. If you had, I would have had no argument. I haven’t been flown into your company along with SAs, sales, project managers, etc for a week to do a proper “as-is” assessment and to see what your requirements are.

I haven’t assessed the competencies of your staff or determined what is your competitive advantage and what is the “undifferentiated heavy lifting” in your company.

I would never make any blanket statements without knowing your specific use case, or automatically assume “cloud” is always the right or wrong answer.


>And when has that happened with respect to either GCP, AWS or Azure at a level that it’s worth migrating?

This suggests you are looking for a single example where the pricing of the big 3 is comparatively high compared to the competition, to the point where it is worth it to switch. I gave the example that the price of egress is one cost which is not competitive. If I had instead said that SQS was not competitive, obviously that wouldn't matter to businesses that don't use it enough to make a difference.


I’m looking at it from more than just “cost of infrastructure”. You also have to consider reliability, managed vs unmanaged, the competencies and expertise of your team, organizational constraints whether you have a more or less static or dynamic workload…

Microsoft and AWS have versions of the “Cloud Adoption Framework”

https://learn.microsoft.com/en-us/azure/cloud-adoption-frame...

https://aws.amazon.com/cloud-adoption-framework/

And the TOGAF framework has something similar

https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap...

I am saying when considering any “large” implementation there are a lot of considerations outside of infrastructure bills.

I’m not saying that every company should go cloud. But the “lenses” you have to look through are multifaceted


I’ve been in places where total AWS spend was a rounding error compared to revenue. Egress fees weren’t a top-10 cost and weren’t worth optimizing for.

The bosses would’ve blamed me for choosing a tier 2/3 noname provider the first time a day of downtime happens. And they would’ve been right.


Egress prices make migrating away hard. But a lot of products don't need to push much data out of AWS, especially with VPC peering based products.


And for things like transferring data to and from S3 within AWS you use an S3 gateway endpoint so data stays within AWS’s network
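
For the curious, setting one up is roughly this with boto3 (a sketch; the VPC and route table IDs are placeholders):

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Gateway endpoint for S3: S3 traffic from this VPC is routed over AWS's
  # network instead of the public internet, so it avoids NAT/egress charges.
  resp = ec2.create_vpc_endpoint(
      VpcEndpointType="Gateway",
      VpcId="vpc-0123456789abcdef0",            # placeholder VPC id
      ServiceName="com.amazonaws.us-east-1.s3",
      RouteTableIds=["rtb-0123456789abcdef0"],  # placeholder route table
  )
  print(resp["VpcEndpoint"]["VpcEndpointId"])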


Whilst there are some nice toys, e.g. Spanner, there's generally little other reason to tolerate the abusive pricing of 'the cloud', IME.


That's a long-term concern, though. The thing to worry about is a rug-pull, but no major provider will do that. They could, but they won't.


The size of the savings you can negotiate is a function of how locked in you are. A big customer will get discounts well beyond reserved instances, in return for (usually) committing to increase their expenditure above where it is.

The better your competing offer, the better the negotiating position. And while Amazon hasn’t gotten more expensive per se, it’s certainly not gotten cheaper.


This article is a recap of the original engineering article by the quoted developer and manager at Uber.

https://www.uber.com/en-GB/blog/up-portable-microservices-re...


Ok, we've changed the URL to that from https://www.infoq.com/news/2023/10/uber-up-cloud-microservic.... Thanks!


Uber microservices were such an inefficient PITA. There was a buzzword soup of half-baked infra pieces and they were always migrating. Every part of the stack was rotten. Udeploy, xterra, tchannel, schemaless, etc etc.

My peak “wtf” moment was when we had a SEV because two services that should communicate actually used different versions of thrift, both hard forked by Uber, with different implementations for sets. Passing a set from one service to another caused everything to break.


> In preparation for the move to the cloud, the company spent two years working towards making all stateless microservices portable so that their placement in zones and regions can be managed centrally without any involvement from the service engineers

I'd like to hear more about how Uber organized the engineering teams over two years to make "stateless microservices portable".

How many teams? What were the requirements to each team? What was the timeline? How did they know it was completed? How was it prioritized along other business priorities of the teams? How long did they think it would take originally? Was it worth it?


Maybe direct these questions to a C-level employee at Uber who could potentially answer them for you?


There are no doubt lots of Uber employees that post here. This is an appropriate forum to ask.


And why do you think they could answer you with any details without going through comms?


because this is the Internet and anyone can make an anonymous account via VPN, if someone were so inclined.


Yes and I would break my NDA to answer a random question on HN for what personal gain?


It's a ridiculously innocuous question. If I were worried about blowback, I'd question working for Uber at all.


TBH I don’t trust HN with my data. They have weird account policies, and I’d not be surprised if they felt like witch-hunting someone down they would…especially in the interest of ycomb alumni


I'm replying to my own comment here since it was so severely downvoted. OP was musing about a bunch of questions that didn't seem useful to the discussion. Who was he asking? And if you're going to say an Uber staff member, why doesn't his comment indicate that? It just didn't seem to add to the discussion at all.


OP here... @s3p: FWIW, I didn't take issue with your comment. Surprised it was downvoted. I do find that people often reply to such comments with something like "Uber team member here..." so it didn't seem ridiculous, but your suggestion seemed authentic and fair to me.


It seems like they’ve gotten to the “holy grail” of deployment where, in theory, developers don’t have to worry about infrastructure at all.

I’ve seen many teams go for simple/leaky abstractions on top of Kubernetes to provide a similar solution, which is tempting because it’s easy and flexible. The problem is then all your devs need to be trained in all the complexities of Kubernetes deployments anyway. Hopefully Uber abstracted away Kubernetes and Mesos enough to be worthwhile, and they have a great infra team to support the devs.


It's not clear to me that being completely unaware of your infrastructure is a good thing. I don't think it's too much trouble to ask an engineer to understand k8s and think about where their service will live, even if it's a ci system that actually deploys. Furthermore, many layers of abstraction, especially in-house abstraction, just mean you have more code to maintain, another system for people to learn, and existing knowledge that you can't leverage anymore.


There is a wide spectrum of infrastructure (and platforms, frameworks, etc) from “allows applications to do just about anything, though it may be very complex” to “severely constrains applications but greatly simplifies doing things within those constraints.” To be clear, by “just about anything” I am not talking about whether some business logic is expressible, but whether you can e.g. use eBPF and cgroups, use some esoteric network protocol, run a stateful service that pulls from a queue, issue any network call to anything on the Internet, etc.

If you are developing application software like Uber, 99.99% of the time you really do not need to be doing anything “fancy” or “exotic” in your service. Your service receives data, does some stuff with it (connects to a db or issues calls to other services), returns data. If you let those 0.01% of things dictate where your internal platform falls on that spectrum, you will make things much more complicated and difficult for the 99.99% of other stuff. That is where leaky abstractions and bugs come from, both from the platform trying to be more general than it needs to be and from pushing poorly understood boilerplate tasks (like configuring auth, certificates, TLS manually for each service) onto infrastructure users.

Being unaware (of course not completely unaware, but essentially not needing to actively consider it while doing things) of infrastructure is actually the ideal state, provided that lack of awareness is because “it just works so well it doesn’t need to be considered”. It means that it lets people get shit done without pushing configuration and leaky abstractions onto them.

I’ll give you one example of something that does an excellent job of this: Linux. Application memory in Linux requires some very complex work under the hood, but it has decent default configurations, with only a couple of commonly changed parameters that most applications don’t need to touch, and it has a very simple API for applications to interface with. Similar with send/receive syscalls and the use of files for I/O ranging from remote networking to IPC to local disk. These are wonderful APIs and abstractions that simplify very hard problems. The problem with in-house abstractions isn’t that they try to do abstraction, but that sometimes they just don’t do a good job, or churn through them faster than it takes them to stabilize.
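
A tiny illustration of that “simple API over hard problems” point (nothing Uber-specific, just standard POSIX-ish calls): the same read/write interface covers a disk file and local IPC.

  import os
  import socket

  # A plain file on disk...
  fd = os.open("/tmp/demo.txt", os.O_CREAT | os.O_RDWR, 0o600)
  os.write(fd, b"hello from disk\n")
  os.lseek(fd, 0, os.SEEK_SET)
  print(os.read(fd, 64))
  os.close(fd)

  # ...and a local socket pair (IPC) look the same to the application.
  a, b = socket.socketpair()
  os.write(a.fileno(), b"hello over a socket\n")  # same write() call
  print(os.read(b.fileno(), 64))                  # same read() call
  a.close(); b.close()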


Well put, 99% of companies don't need to introduce such complexity for their relatively trivial use cases (though well-intentioned but bad engineers will try to invent it anyway).


Part of my point is the goal with such a system is usually to require less infra work/knowledge from your devs, but it backfires if you don’t invest enough in your abstraction.

The implicit goal of these abstractions is really to centralize knowledge and best practices around the underlying tech. Kubernetes itself is trying to free developers from understanding server management, but you could argue it’s not worth using directly vs. just teaching your devs how to manage VMs, for the vast majority of organizations.

I don’t think you’re ever going to stop more and more layers of abstraction, so the best we can hope for is they’re done well. Otherwise you may as well go back to writing raw ethernet frames in assembly on bare metal.


> Part of my point is the goal with such a system is usually to require less infra work/knowledge from your devs, but it backfires if you don’t invest enough in your abstraction.

I disagree that the solution is to simply build more. Often the best thing to do is accept that devs will need to know a little infra, and work with that assumption.

> The implicit goal of these abstractions is really to central knowledge and best practices around the underlying tech.

I agree with that.

> Kubernetes itself is trying to free developers from understanding server management, but you could argue it’s not worth using directly vs. just teaching your devs how to manage VMs for the vast majority of organizations.

The difference is that spinning up a VM and setting it up to have all the features you would want from k8s would be too much to ask from a dev. You would probably just end up re-creating k8s.

> I don’t think you’re ever going to stop more and more layers of abstraction, so the best we can hope for is they’re done well. Otherwise you may as well go back to writing raw ethernet frames in assembly on bare metal.

The problem is that abstractions are not free, and most of the time they aren't done well. Once in a while you'll get one that reduces(hides) complexity and becomes an industry standard, making it a no-brainer to adopt, but most of your in-house abstractions are just going to make your life worse.


I think the biggest “win” with abstractions is that it makes it easier for infra teams to update underlying concretions (is that a word? the concrete version of the abstraction) without having to dig deep into the codebase.

e.g. with Kubernetes, if you have the actual manifests defined by every team, it is a pain to do any sort of k8s updates. A simple abstraction where teams only define the things they are interested in configuring (e.g. Helm values) simplifies this task a lot.
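
A minimal sketch of the idea in Python (not Helm's or Uber's actual mechanism, just the shape of it): teams hand over only the values they care about, and the platform deep-merges them over defaults it owns and can change centrally.

  PLATFORM_DEFAULTS = {
      "replicas": 2,
      "resources": {"cpu": "500m", "memory": "512Mi"},
      "probes": {"liveness_path": "/healthz", "period_seconds": 10},
  }

  def render_service_config(team_values: dict) -> dict:
      """Deep-merge team overrides on top of platform-owned defaults."""
      def merge(base: dict, override: dict) -> dict:
          out = dict(base)
          for key, value in override.items():
              if isinstance(value, dict) and isinstance(base.get(key), dict):
                  out[key] = merge(base[key], value)
              else:
                  out[key] = value
          return out
      return merge(PLATFORM_DEFAULTS, team_values)

  # A team only declares what is special about its service:
  print(render_service_config({"replicas": 6, "resources": {"memory": "2Gi"}}))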


All it takes is for one microservice to start hanging on a gRPC request, server hardware to stop doing some fundamental thing correctly, or some weird network quirk that 10x’s latency to half the switch ports in a rack, and you end up with insane, sophisticated cascade failures.

Because engineers don’t have to understand infra, it often spans geographies and failure domains in unanticipated, undetectable ways. In my opinion the only antidote is a thorough understanding of your stack down to the metal it’s running on.


A single engineer can’t understand everything at scale.

Even in a 100-person startup that I worked for, where I designed the infrastructure and the best practices and wrote the initial proof-of-concept code for about 15 microservices, it got to the point where I couldn’t understand everything and had to hire people to separate out the responsibilities.

We sold access to microservices to large health care organizations for their websites and mobile apps. We aggregated publicly available data on providers like licenses, education, etc.

Our scaling stood up as we added clients that could increase demand by 20% overnight, and when a little worldwide pandemic happened in 2020, causing our traffic to spike.


None of the layers of abstraction are perfect. You have to deal with the whole mess all the way down.

We've had individual EC2 instances go bad where I currently work, with Amazon acknowledging a hardware problem after a ticket is raised. The reality is, quickly resolving the issue means detecting it and moving off of the physical machine.

Naturally our tooling has no convenient way to do that, because we have layers of things trying to pretend physical machines don't matter.


No, the answer is keeping all of your VMs stateless and just using autoscaling with the appropriate health checks, even if you just have a min/max of 1.
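
Roughly this with boto3 (a sketch; the group name, launch template id, subnets and target group ARN are placeholders): even a "fleet" of one gets replaced automatically when it fails its health check.

  import boto3

  asg = boto3.client("autoscaling", region_name="us-east-1")

  asg.create_auto_scaling_group(
      AutoScalingGroupName="my-stateless-service",
      LaunchTemplate={"LaunchTemplateId": "lt-0123456789abcdef0",
                      "Version": "$Latest"},
      MinSize=1,
      MaxSize=1,
      DesiredCapacity=1,
      VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
      HealthCheckType="ELB",        # use the load balancer check, not just EC2 status
      HealthCheckGracePeriod=120,
      TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:123456789012:"
                       "targetgroup/my-service/0123456789abcdef"],
  )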


Describe a health check that can detect any possible hardware problem.

The error rate on the machines was higher in both cases, but many requests still succeeded. Amazon certainly didn't detect an issue right away either.


There is no way that you could record metrics - even custom metrics that get populated to CloudWatch via the CloudWatch Logs agent - and, over a certain threshold of errors, bring another instance up and kill the existing instance? If you can detect sporadic errors, there must be some way to automate it.

I’m assuming this isn’t a web server, if so it’s even simpler.
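
Something along these lines with boto3 (a sketch; the metric names, thresholds and SNS topic are made up), alarming on a custom error count and handing the action off to whatever recycles the instance:

  import boto3

  cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

  cloudwatch.put_metric_alarm(
      AlarmName="my-service-errors-high",
      Namespace="MyService",             # custom namespace the app/agent publishes to
      MetricName="RequestErrors",
      Dimensions=[{"Name": "AutoScalingGroupName",
                   "Value": "my-stateless-service"}],
      Statistic="Sum",
      Period=60,
      EvaluationPeriods=3,               # three bad minutes in a row
      Threshold=20,
      ComparisonOperator="GreaterThanThreshold",
      TreatMissingData="notBreaching",
      AlarmActions=["arn:aws:sns:us-east-1:123456789012:recycle-instance"],
  )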


A statistical rule moves you into the realm of deciding what rate of false positives and false negatives you'll tolerate. Based on data from exactly two incidents in this case, which is obviously a bit fraught.


  > abstractions on top Kubernetes
  > abstracted away Kubernetes
I am beginning to think it's not such a bad thing to live and work in a third-world country far away from SV-induced hype cycles. This is genuinely painful to read.


But lots of people here talk positively about services like Heroku or Fly where you just push the code somewhere and it runs without you having to know a lot about the infrastructure.

Not every software development problem is a big scale problem and once you identify such a case you can start optimization work taking all the low level details into account. In reality most scalability problems revolve around databases, caches, concurrency and locks and you probably aren't going to tackle a lot of these in your average stateless service.


Kubernetes works great for larger projects when combined with ArgoCD or similar.

They all use GitOps, which means all infra deployments and changes are tracked and can easily be rolled back on any issues. And the complexity is nothing compared to having to manage your own cloud resources using Terraform etc., which used to be the case.

And these days every developer needs to be on board with DevOps and so there are no real old-school infra teams supporting anyone.


The other "leak" in these abstractions that arises from physical limits is performance, especially when it comes to IO.

This is a major problem for databases and ultimately makes database "portability"/fault tolerance tricky, since databases work best with direct-attached storage that's inherently bound to a single physical machine.


Not to mention there are all sorts of other limits that you can hit at scale just on the compute layer itself (e.g. max PIDs, file descriptors, etc.).

I don’t know if we can truly abstract away the underlying system. The best we can do is give a best effort approximation that works in most cases, but explicitly call out the limits when they are reached.

I suspect that this is just the underlying physical limitation bubbling up: the compute ultimately runs somewhere with finite resources.


Does Uber really need 4000 microservices?


A different (better?) question is, does Uber need 4000 API contracts?

The answer to that is probably yes. APIs let us split work across systems/people/teams/regions, and provide a way for both sides of a split to work together. Uber has a lot of teams, a lot of engineers, and so it makes sense that there are a lot of API boundaries to allow them to work together more efficiently. Sometimes those APIs make sense to package as microservices.


Uber has several different APIs for users. A naive purist might think that's silly until you realize a rider is a user, a driver is a user, a courier is a user, a restaurant owner is a user, a line cook is a user, a doctor's secretary is a user, an Uber employee is a user, a freight broker is a user, an advertising manager is a user... people can simultaneously be multiple types of users and have multiple profiles as a single type of user, and did I mention that you have to properly secure PII due to being in a highly regulated industry? And that's just users.

Don't even get me started on anything money related :)


Plus there's a surprisingly high floor on the number of APIs a large company needs for basic stuff like "set up new hires automatically in all the systems needed".


> a line cook is a user, a doctor's secretary is a user

I was with you on other types of users, but can you elaborate on these particular use cases?


UberEats


Where does a line cook's use case fit in it? From what I know, Uber Eats sends an order to a restaurant, an employee manually punches in the order on their POS system, and the order ticket goes to the kitchen.


You don't think there are a variety of people at a restaurant who might interact with the system? Is this particular detail so very important to the point of the parent comment?


I think the parent comment tried to prove a point by making an extremely frivolous claim and naming every person they could think of as a “user”, which means they are either wrong or they failed to adequately make whatever point they were trying to. Uber doesn’t need an API for line cooks, so using them as a justification for a large number of microservices was not rhetorically sound.


> Uber has a lot of teams, a lot of engineers, and so it makes sense that there are a lot of API boundaries to allow them to work together more efficiently.

Isn’t that the premise of the question? Does Uber need so many engineers?


> Isn’t that the premise of the question? Does Uber need so many engineers?

The only people who can answer that are employees at Uber.


New unrelated databases, frameworks and queue services don’t write themselves duh


I wonder what setting up a local dev instance is like for anything involving more than one or two of those.


I've worked on a couple of extremely large micro services projects.

And the thing is that nobody ever needs to run the entire stack, other than for end-to-end tests, which get run in the cloud.

You just check out the services you need, and because they are designed to be isolated, the dependencies will usually be automatically stubbed out. So it's just a matter of running them, or chaining them together if you have a particular scenario to test.


Question for you: how does performance measurement and optimization work in that environment? Is the key some sort of meta tooling that understands relationships between microservices? How would you express such relationships in the first place?

Q2: How do you ensure the stubbed deps behave like the real thing?

Q3: how do you handle logging and metrics in a unified way across the stack? And related to this: how do you ever get to upgrade services' cross-cutting concerns that ideally are not reinvented in every service?


> how does performance measurement and optimization work in that environment?

SRE here. Generally speaking, each API or each service will have a contract that it must adhere to, depending on upstream and downstream relationships and their fail-safes. Each service (or API) will then be load tested in isolation.

After that, if you want to be really sure about regressions (which would include fail safes) you load test the whole thing put together.

> Is the key some sort of meta tooling that understands relationships between microservices?

This is quite hard to do when you have a lot of transactions. I don't think there's commodity software that does this because you'd need to configure that software to map on keys, then map those keys to services. Generally, the easiest way is to get engineering teams to declare upstreams and downstreams.

> Q2: How do you ensure the stubbed deps behave like the real thing?

Generally, generation. Something like protobuf or Open API generation will do.

> Q3: how do you handle logging and metrics in an unified way across the stack?

You issue high level standards like, "We'll use JSON logging with UTC time formatting". At the end of the day logging is very contextual and in a service ownership model the service owners are usually the ones reading and alerting on their logs.
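
The "JSON + UTC" standard usually ends up as a tiny shared library everyone imports; a minimal Python sketch of what that looks like:

  import json
  import logging
  import time

  class JsonUtcFormatter(logging.Formatter):
      converter = time.gmtime  # render timestamps in UTC

      def format(self, record):
          return json.dumps({
              "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
              "level": record.levelname,
              "logger": record.name,
              "msg": record.getMessage(),
          })

  handler = logging.StreamHandler()
  handler.setFormatter(JsonUtcFormatter())
  logging.basicConfig(level=logging.INFO, handlers=[handler])
  logging.getLogger("payments").info("charge authorized")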

> And related to this: how do you ever get to upgrade services crosscutting concerns that ideally are not invented in every service?

Shared dependencies. I'm not actually sure what counts as a cross-cutting concern; generally services that are this small should be designed to operate mostly independently. They're small, but "microservices" tend to have a lot of fail-safes built in. If you're referring to how we avoid writing 4000 config loaders, there's usually a team that builds a very generic config loader and everyone, or at least a majority, uses it.


> JSON logging with UTC time formatting

Perhaps the simplest, biggest impact in my log life has been adhering to these principles.


I've worked at a place that architected 100s of microservices pretty well, in a similar way that Uber apparently does.

Q1: (perf) these tools exist, the buzzword phrase is "distributed tracing". The relationships are actually not explicitly defined for the tooling to work, but rather inferred. Visualize a network call as a call-stack, where each service is a level in the stack. Jaeger (a CNCF project addressing distributed tracing) was coincidentally started by Uber.

Q2 (stubs): In my experience, mocked responses get you a long, long way. Typically the API response type that you're mocking is generated from a protobuf (or thrift, OpenAPI, etc.) file. If your dependency changes that type in a way that breaks your test, the CI platform will let them know.

If it's a more subtle change (like, it used to deterministically return 18 and now it deterministically returns 20), it's really on the service owners to communicate changes and grep the code base before making the change.
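
To make that concrete, the "generated type + mocked response" pattern looks roughly like this (the client and response types stand in for what protobuf/Thrift codegen would emit; the names are invented):

  from dataclasses import dataclass
  from unittest import mock

  @dataclass
  class GetRiderResponse:          # stand-in for a generated message type
      rider_id: str
      rating: float

  class RiderServiceClient:        # stand-in for a generated client stub
      def get_rider(self, rider_id: str) -> GetRiderResponse:
          raise NotImplementedError("real stub makes a network call")

  def test_pricing_uses_rider_rating():
      # autospec keeps the mock honest: if the generated method or type
      # changes shape, this test fails in CI instead of in production.
      client = mock.create_autospec(RiderServiceClient, instance=True)
      client.get_rider.return_value = GetRiderResponse(rider_id="r1", rating=4.9)

      rider = client.get_rider("r1")
      assert rider.rating > 4.5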

Q3 (logging/metrics): Typically by using a shared "logging" and "metrics" lib for each language. Every service will typically be a gRPC service, and accordingly emit a standardized, generated-from-protobufs set of metrics to Prometheus by default.

Q4 (how to upgrade common libraries): this is definitely a tricky one. The answer is, basically, really carefully. Typically, you'll want your infrastructure to be compatible with vX and vX+1, and give teams a deadline to cut over from logging X to X+1. The couple of weeks before that deadline usually involves a lot of cat-herding and handwringing.


Not OP, but I worked on a large microservices-based system at a leading financial institution, and when we needed to work on a single service, we had docker compose files that pulled the images for that service's dependency services so we could develop what we needed. They all just ran in our local Docker. If we wanted, we could have a massive compose file with all services in it, but typically we only needed the IAM service and a few other small ones, depending on what services we were working on.


Q2 - techniques like contract testing help here, beyond simple stubs. Also, mocked services maintained by the service's original devs that you can work against help.


What you are imagining happens (it's not pretty).


Is there typically a “cold restart” plan to rebuild the whole infra from scratch? I’m thinking of things like circular dependencies when services boot.


I've found that the eventual consistency provided by orchestrators like k8s solve this problem rather well. If the services are written using the right paradigms to handle such a situation, they will also be much more resilient to platform disruptions.

I like to view k8s a lot like erlang's OTP, if something isn't right with the state of a service, I advocate calling 'exit()' and letting the restart with exponential backoff handle the transient.
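
In code that ends up being refreshingly boring; a sketch of the crash-only style (function names are illustrative):

  import logging
  import sys

  def dependencies_healthy() -> bool:
      # e.g. can we reach the database, does the config match the expected schema
      return True

  def main():
      if not dependencies_healthy():
          logging.error("bad state, exiting so the orchestrator retries us")
          sys.exit(1)  # non-zero exit; CrashLoopBackOff supplies the exponential backoff
      # ... serve traffic ...

  if __name__ == "__main__":
      main()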


Nobody needs to if they can't possibly.


In general in a micro service environment, you try to build things so that 1) you don't need to run other things locally, and 2) if you did need to, the services are just containers so it's pretty easy to run one.

But you tend to try to write your service so that it treats everything else it depends on like a vendor-provided API. Like, if you were building a Slack bot, you wouldn't ask Slack to let you pull down and run a local copy of Slack's API to test against. You'd maybe set up a test account in Slack's production system, and run your local bot against that to test it before you deploy it with credentials to run against your real slack account.

In a microservice architecture, you integrate with other internal systems in the same way.


We're not big on microservices, but we do integrate with a lot of other systems.

I find the opaqueness of other services reduces development speed quite drastically. With local code I can view both sides of the fence and easily see if I'm using it wrong or if it's a bug in my colleague's code.

Seems that if you're constantly developing against opaque services you'd end up in the same quagmire quite quickly?


This is exactly why the "microservices" pattern is usually adopted along with the "monorepo" pattern. IMO, it's a strong anti-pattern to have the former without the latter.


You shouldn’t care about anything outside of published contracts for dependencies. You’re always dependent on underlying APIs.


Of course I shouldn't.

But when things don't work as I expect, it's far more efficient to be able to view the code on both sides, rather than only on my own side and try to guess what the other side is doing.

Besides the usual suspect of wrong understanding on my end leading to misuse, this can also be due to lacking or wrong documentation of the other system, or bugs in the other system due to unexpected inputs or similar.

Like just a few days ago we spent an unreasonable amount of time with an API of one of our customers, where we would get an empty list back for some of our queries. Turned out something in their service crashed when handed national characters, despite accepting JSON and hence UTF-8 input, with nothing in the documentation about English letters only. Rather than returning 400 or 500, the service returned 200 with an empty list, leading us to assume we did something wrong.


> But when things don't work as I expect, it's far more efficient to be able to view the code on both sides, rather than only on my own side and try to guess what the other side is doing.

Are you able to view the source code of your platform vendor? Everyone is at some level dependent on Black box APIs.

If you can document where with certain input you don’t get the expected output, you reach out to the team that is responsible for it whether internally or externally and they either explain it or they fix it.

This is the API/SDK I’ve been working with over the past five-plus years - three of them actually working at AWS (Professional Services).

https://boto3.amazonaws.com/v1/documentation/api/latest/inde...

I found a bug in one relatively new API that a service team released, I reached out to the team with a documented scenario and they fixed it.

Other times they explained what I was doing wrong. That’s what any large organization does.

I’ve worked with other vendors and internal teams plenty of times over the years.


When you want to run a local version of a service, you start it locally as normal and configure either local stubs of dependencies and dependents, or you can hook up the actual “production” services to your local service process. If doing the latter, you can create special user accounts and events that change how the network routing happens. Events from those fake users pass through regular production apps up until the service you’re testing, then they are instead routed to your local version, and you can continue the calls to other prod services.

Uber employees’ apps are special and allow us to log in as these fake users and create fake rides or deliveries, and then we can look at the traces and logs to debug and stuff.


Looks like they are shifting away from local development: https://www.uber.com/en-CL/blog/devpod-improving-developer-p...


Something like https://tilt.dev/ where you spin up a subset of the service graph in a cloud environment that hot-reloads based on local edits.


Microservices allow development orgs to scale horizontally which enables businesses to expand to adjacent markets, faster.


Static function calls are also API contracts.


There's an interesting HN comment[1] from 2020 by a former Uber engineer, which discusses the complexity a bit. It's more about UI, but the thread discusses the backend as well. In brief, something that may look super simple to the user (like handling payments) is actually quite complicated when you cover all the markets, different payment types, etc. And all this carries over to the backend as well.

[1] https://news.ycombinator.com/item?id=25376346


And also certain states and localities have different requirements for ride share. I noticed this in NYC and Seattle


Do all of those need to be microservices, or could you instead have one monolithic payment service that handled all those use cases?


Of course you could, just like you could do this in 2000 less-well-defined microservices, or 8000 more finely-grained ones.

The question is what makes you think 1 service is immediately better than however many payment services there are now?


All other things being equal, 1 service is obviously better than 4,000 services to maintain.


Not that obvious; how do you coordinate people from several teams working on it?


IBM managed to coordinate several hundred people on a single software product for decades on end.

Even Microsoft managed to do so for multiple products while also stack ranking the teams.

And I doubt there's a single service, even payments, that's as technically complex as Excel.


Do you really have 4,000 teams working on payments?

And I'd agree with the other child comment that the monolith can always be broken into separate components which are owned by different teams.


And then every time you had a change, you would have to deploy everything and your surface of failure is greater.

What would a monolith buy you?


Not having to evolve or understand or staff 4,000 micro services.

An ability to easily change the boundaries of your conceptual components, because they WILL be wrong now or in the future.


You still have to understand your boundaries when you have a large monolith unless you have one big ball of mud.

Even with a well constructed monolith, you need to have well defined “services” with contractual interfaces.

You don’t have to understand 4000 services to make one change any more than I need to understand the entire boto3 library when I am building on top of it.

https://boto3.amazonaws.com/v1/documentation/api/latest/inde...

You can’t just change your interface in a monolith either without breaking other parts of it.


Managing a numerically large set of services has its own challenges, but it pales in comparison to the complexity of a monolithic service serving the same functionality. As the other poster already pointed out, such a behemoth would be a nightmare to change at all. It would also be a scalability and reliability nightmare. We migrated away from monoliths because they don't work in modern compute architectures.


It might be easier if you have the same API for the payment microservices, but each different implementation in a different service, so approximately 100 times fewer distinct APIs than microservices.


And what service would that be?


Uber is a global company (70+ countries) operating Uber and Uber Eats.

So almost certainly they are duplicating their entire stack per-country if only to get around the vastly different regulatory environments.


I'm guessing it has to do with payment processors. I remember reading an article a while back about why the Uber app is large (100+ MB); most of it is related to payment processors and the taxes of all the places it operates globally.


No, that's not correct (mostly). Services are written in a way that supports global operations.

But that scale introduces a lot of complexity so you can't just have "one service for onboarding drivers"


Do you have knowledge of this?

I find this hard to believe given the regulations from some of the larger countries requiring, by law, customer data be processed in country.


Deployments != unique codebases.


Try and read the full comment chain again to understand the context of our discussion.


I did. Your concern about data needing to be isolated due to regulation doesn’t require you to make a complete copy of the code.


>“So almost certainly they are duplicating their entire stack per-country if only to get around the vastly different regulatory environments.”

Responding comment says no they are not and the services are built to handle global traffic.

I respond and say I doubt that due to on soil laws.

You can argue two regions with different configurations but the same code bases are different services but that’s not what we’re talking about here.


I re-read the chain and don’t follow your argument.

Do you mean that your original point was about deployments to begin with?

FWIW I work in a microservices shop for a global app in an extremely regulation heavy industry, and we run a single codebase per service, segregating regulator-imposed behaviour via flags to deployments


We're discussing whether Uber has on-soil deployments. Unless they are running afoul of on-soil laws, they likely do.

Your FWIW is exactly what we're talking about here, and I'd venture a guess nearly half this site works for some corp with duct tape, hope, and microservices powering their junk. Me too!


Yep. There might be some services that are entirely geo-specific, but I haven't seen them.

The same microservice deployed in multiple geos still counts as one service, so it's considered to be 1 out of the 4000 in this case.


It’s not just countries. I’ve used Uber across the US and you can look on the receipt and see different regulations in play depending on the city


Uber has a really liberal definition of a microservice. Every web UI or dashboard is a service (of which there are many hundreds). Every application anyone builds across their many thousands of engineers is a service. It's rare, I think, for services to have fewer than a few thousand lines of code. In my experience, most companies would have a monolith that serves multiple UIs from the same service. Uber instead ships that monolith as a library which is a framework for building individual UIs. It has its pros and cons, but I quite liked how they did it.


(Worked at Lyft) Our number of active microservices was small in comparison. 4,000 is likely an overblown number to highlight the accomplishment, possibly counting inactive ones.


Isn’t Lyft US only? Uber operates in 80+ countries


Worked at Grab. They had a ton of micro services. It was their way of partitioning databases so that they didn't have to deal with joins. Yes, it caused a lot of problems.


Lyft also doesn’t do food delivery…


From experience working at big tech I’m willing to take a guess.

Maybe a couple dozen will be actual, more complex and meaningful services. Then a few dozen more services that are somewhat more unique.

And then the majority of the long tail will be mostly cookie-cutter services doing X for lots of different use cases, where each use case is a separate deployment counted as a service (for example, systems to process streams of logs related to business logic).


The same binary with a different configuration


I've seen at least one place with many more than that in recent years. If you have one microservice "listener" per queue, another for the database processing and persistence (business logic), and another providing an API for one or more frontend UIs related to it, then the microservice tally goes up very fast. It's kind of surprising to read so many comments indicating HN readers weren't aware of this.


sounds like a massive nightmare


Most are likely limited to some subdomain, with limited communication between domains.


Massive scale*


There's quite a sizing range between monolith and microservice.

If all their IT needs are behind micro "micro" services, that figure is understandable.

Outside of the map, taxi, food, payments, and onboarding, they also have monitoring, deployment, HR, billing, legal, taxes, internationalized stuff, and the usual "..." for what I'm missing.

If you just take a standard ERP, you could easily split it into dozens, even hundreds, of microservices.


> If all their IT needs are behind micro "micro" services, that figure is understandable.

I call them nano services.


And you're pinpointing the problem with such news.

Everybody knows what a monolith is, but nobody really knows what the size of a "micro" service is.

Just for taxes, do you make one service for taxes, or one for each recipient of taxes (in the EU, one for each country; in the US, one for each state plus federal), with a different team managing each service?


But how does your "..." sum up to 4000?


Apparently they started at 1000 and went from there...

"What I Wish I Had Known Before Scaling Uber to 1000 Services" - https://youtu.be/kb-m2fasdDY


It reminds me of this thread about Netflix, with insane amounts of events and logs compared to active users.

https://news.ycombinator.com/item?id=30635369


Yup Pornhub serves much more video than Netflix and they do so without that insane amount of complexity.


Isn't Pornhub free, their only monetization being ads? Also, does Pornhub have personal recommendations, per-country and per-region libraries, and Android, iOS, and Android TV apps? Probably not.

That is a much easier business model and a lot lower level of complexity than Netflix. I imagine running Pornhub is essentially running a large website that hosts video. Probably just the billing side of Netflix is more complicated than the entirety of Pornhub's operation.


PornHub has significantly more content than Netflix, not to mention the ability for users to upload content and have it immediately available worldwide. That alone makes it significantly more complex than Netflix.


Pornhub is not about waving the engineering flag up and down to signal how cool their infra is though.


Honestly Pornhub's stack is genuinely impressive. More start-ups should just use PHP and get shit done


I think one of the reasons behind PornHub's tech stack is that their industry doesn't really lend itself to VC, so building an engineering playground for PH would be a waste of money, as no amount of complexity would net them VC money (nor an invite to a cloud provider conference), whereas most startups live and die based on the VC funding their complexity and buzzwords allow them to grift.


I think the takeaway was that choice of language might be less important than engineering strategy, because PH are successful despite their choice of stack :)


Yup, I remember that. Netflix seems to be the poster child for overengineered architecture - for something that is almost entirely commoditised nowadays (one-way video streaming over the internet).

Uber's problem space is significantly more complex than Netflix, so I'm unsure it's a fair comparison. But they do seem to have quite a lot of overengineering going on. At least that's how I feel each time I read an Uber tech article.

About the only companies which seem to justify their complex architectures are Google/Meta/Amazon imo.


> Uber's problem space is significantly more complex than Netflix

What makes you say this? Netflix serves probably several orders of magnitude more bytes, and online video is hard. At its core Uber is basically a Passenger Service System, and we have had systems like these implemented in software since the 1950s.


There are a lot more use cases in the taxi and food delivery space than in video streaming. At least by an order of magnitude. Consider various user personas for one, legal considerations, and so on. Technically each use case might be less demanding than video streaming, but overall it's much more complex.


What would the engineers be doing otherwise? You get bored if you don’t.


It's insane to hear attrition used as an excuse for architectural decisions, but I've seen it firsthand.


So we're going to get to the point where everyone gets their own microservice, right?


That is unironically a better way to do it than anywhere I've worked that does microservices.

My experience with microservice shops is you have one macro-monolith with 50 people working on it (which has all the problems of a monolith and none of the benefits), 5 actual decent microservices with a team or individual that properly maintains them, and 100 random utility micro"services" that are like 3 lines of code, used by exactly one other service, and you need 40 LOC and a network call to interact with them.

I'll take everyone has their own service any day of the week. At least when I need to interface with 12 different things I can have 12 different people to roast for not properly documenting their API. And tbh literally the only positive I can come up with for microservices is the ability to neatly fire one into the sun and rewrite it from scratch.


Isn’t that basically what happens when you split an API out to the different methods and write a Lambda for each one?


Does it matter how they organize their services? Your experience and environment will be different in so many ways that I doubt it's comparable.


Yes, there are specific business rules for each nation, region/state, and city.


Maybe they meant instances.


How else would engineers demonstrate "impact" for promotions?

/s


I worked at a SV startup (series A) for a while and an EM once mentioned struggling to keep the number of microservices under the number of engineers.


Is there a compelling article about the ideal microservice to engineer ratio (ie less than 1.0)?


I don't know of one, but this thread has some interesting discussion: https://www.reddit.com/r/ExperiencedDevs/comments/x1p5gj/my_...


You jest but also you're not wrong.


My own company has 800+ microservices. I am very familiar with the politics of microservices.


My personal microservice fiefdom is about ten. My company probably has 1000. Is this ratio normal?


Any more than 1 microservice per engineer (as in, 7 microservices for a 7 engineer team) is too much for engineers to handle during on-call incidents.

If management values business SLAs that is.


Dysfunctional organizations lead to absurd solutions to absurd problems.


There's no way that number isn't fiction; Occam's razor says it's out of the range of believable. That's ~2 per eng according to Google. That's absurd. (That eng headcount is also a bit … high.)

This sounds like a figure from someone who saw a single microservice running across 100 pods/instances and counted that as 100 "microservices".


Uber invested heavily in tooling that makes creating and deploying a new service take about 30 minutes. This was before they invested in making it as easy to share code. If you combine that with fast hiring and a big pressure to ship, it makes sense to solve every problem with a new service that calls a few others.


S3 alone is built on top of 300 micro services. I don’t find it unbelievable that Uber needs a lot of them.


I find it highly disturbing that so far I've seen Zero Uber devs on this thread adding any sort of context/info or just confirmation. Wtf is this, the NSA? KGB? Can't they just list/dump the names of said 4000 services, or is that somehow some sort of secret-sauce?


A random engineer at any company is not going to divulge non-public information about the inner workings of their company without permission.

We had to sign something at AWS agreeing not to divulge internal tooling, like the internal account factory we used to create AWS accounts. Literally tens of thousands of people know what this tool is.


You can see someone telling you how many microservices S3 has right above.


Yes that “someone” was me - a former employee who couldn’t remember whether that information was public or whether it was something I was exposed to from the inside.

It in fact was public.

https://aws.amazon.com/blogs/storage/how-automated-reasoning....

I verified that before I posted that little tidbit.


I couldn't find any explanation of where the data would be found. Are they splitting data across clouds, and constantly "porting" that data from cloud to cloud as part of their portability?

Orchestrating the application layer across clouds is interesting, but how does their data layer work?


The title is misleading. I don’t see Mesos mentioned once in the article.

I got so excited about reading how Mesos could help in the multi-cloud world, potentially as the hypervisor for running k8s.


I dislike the Uber business itself (horrible treatment of drivers, poor customer service, poor safety controls, bullying of small businesses with Uber Eats, shitty executive level team with questionable ethics).

But the underlying technology which carried them to this point is a fascinating read.



I believe the dollar amount savings figures, they’re big and worthy of a congratulations to the engineers involved!

IMO, engineering man-hour savings are a lot less trustworthy. This may eliminate or simplify some engineering processes, but IME massive migrations like this simply replace them with a different set of processes; because those are different and theoretically addressable, they’re not counted against the hours saved, as they can be bucketed into bugs / to-be-addressed-by-the-roadmap / legacy behavior migrated from the old system (which is now dangerously-fragile-legacy and not ol’-reliable-legacy). Eventually someone will come along and decide this too is an inherently flawed platform that needs to be entirely replaced at great expense, and the circle of life continues.

This is still a massive undertaking, not just from an engineering perspective but from an organizational/process one though. Whoever pulled this off essentially had to coordinate (or figure out how to simplify/explain things well enough to skip coordination) with almost every engineer and likely almost every production service in a company with thousands of engineers. Those in startups may balk at this kind of thing taking two years, but having done my own two-year projects (at a smaller but comparable scale) in a big company, I can say two years is what I’d consider a highly optimistic and unlikely outcome for a project of this magnitude.


> This may eliminate or simplify some engineering processes but IME massive migrations like this simply replace them with a different set of processes

Yes

> because they’re different

Now I have to learn an entirely new set of tools, processes, etc. that are more useful to someone else but not helpful for me. The old one had its quirks, but I knew it inside out, and now the whole org has to re-learn how to do everything we did before.


Looking forward to the future write-up of how a ZooKeeper issue nuked their entire Mesos stack.


For a company that is basically a taxi service, they seem to invest an awful lot in constant rebuilds of their extremely complex infrastructure, which raises the question of whether that is even remotely necessary or just an exercise in pretending that they are a tech company.


“Basically a taxi service,” except that Uber spans hundreds of cities, coordinates millions of drivers - none of whom work on a fixed schedule - and its only interface with customers is an app that has to be fast, accurate, and reliable at all times.


Even a single taxi event is complex. Tell Uber you want to go to the ATL airport and it'll ask which terminal. If you're catching one from the ATL airport, it'll map and walk you to the rather distant pick up spot. And we haven't even touched on payment yet...


This is just such a bad take that it makes everything else you say after it null.

And Google is just a search engine, they only need like 20 engineers…


They do food delivery, parcel couriers, regular Ubers, plan-ahead Ubers, grocery shopping, and a lot of other stuff. If anything, this is simpler than most silo-driven architectures you'd usually get with such a massively diversified business.


To be fair, all of those listed were also handled by taxis previously, just with a more manual and more distributed process: the dispatcher allocated a cab to the requester and maybe passed an initial message, and then it was directly between you and the driver.


> "Basically a taxi service"

Not defending their tech stack, but I mean that is a lot of realtime data that needs to be accurate - this is not your typical SaaS CRUD app.


Just taking payments and doing reconciliation in dozens of currencies and payment methods alone would be mind-numbingly difficult and complex.


I love these r/iamverysmart takes on HN.

Is this generally a sign of youthful wishful thinking or just plain hubris?


Oh hey, this is the thing I work on.

We're giving a talk about this at KCD Denmark on the 14th of November "Keynote: Uber - Migrating 2 million CPU cores to Kubernetes" if anyone is in the area and has any particular interest in this.


Congrats to the UP team. The platform sounds good. I especially liked the Balancer component.


To save you the deep deep dive: on OCI and GCP.


In 3 years… “Uber saved costs by migrating their microservices to their own colo”, followed by “Uber simplified operations by migrating their microservice platform to a monolith”.


Might be a good guess, there's precedent of them changing fundamental technology in a similar timeframe...

2013: "Migrating Uber from MySQL to PostgreSQL"[1]

2016: "Why Uber Engineering Switched from Postgres to MySQL"[2]

[1] https://www.yumpu.com/en/document/view/53683323/migrating-ub...

[2] https://www.uber.com/en-GB/blog/postgres-to-mysql-migration/


More like “we hired a new principal arch who drove a change they personally liked, and everything was better because it allowed a lot of time to fix tech debt”


In 5 years... "We've discovered a new paradigm for efficiently carving up and distributing computational units for our application. We call it, nanofunctions."


Later "Lowering cost and dramatically reducing complexity with nanofunctions running on monoecosystem".


Don't even start on that. Coming up with things like that isn't funny to me anymore.

I'm currently trying to get out of the industry because I'm drowning in architectural bullshit like this constantly. It is peddled by snakes, bastards and wankers who care nothing for solving problems but want to create new ones.


It's been happening since the dawn of time: fat client/thin client, static link/dynamic link, microservices/monolith, centralised/decentralised. It doesn't just migrate from one to the other; the pendulum swings back and forth, and will do for eternity.

You can be all angry about it, but being angry at the storm doesn’t affect the storm, it only affects you. A lot. Negatively.

The trick is to position yourself to maximally profit from the next trend swing. I've been doing it for 20 years now: if you can predict the next place gold will fall from the sky, then go and stand there with a really big bucket.

I always found it strange that there is a certain type of intellect that is capable of accurate observation of reality but incapable of execution (sometimes called "the disconnected intellect"). They can tell you exactly the problem and the solution, but sit angry/frustrated that the observed world doesn't match some imagined ideal in their head, and rather than adapt their internal model and be entrepreneurial enough to capture the value that generates, they bleat and complain while losing all opportunities - opportunities they can see! I can't imagine being like this.


I have ridden these fads for the last 25 years (well actually longer - I had a different career first!) and made a fuck load of money out of it. But I am tired of it now. I really don't care any more.

I am fed up with solving the same problems again and again. It's more than just the earning; there's an intellectual dishonesty in this, and it's tiring and demotivating.


Sounds like you need to move up Maslow's hierarchy of needs and focus more on self-actualization (or whatever is upwards for you)... because your posts are full of negativity and you sound miserable.


Oh I'm bloody fantastically happy. I am merely cynical about the state of the industry as it stands.

I'm literally 18 months from packing up this shit and doing what I really want to do which involves nothing whatsoever to do with computers.


Good for you and congratulations! I hope to do the same some day, but I need at the very least 10 more years (no, this is not an invitation for FIRE enthusiasts to chip in with tips or help, thank you).


> The trick is to position yourself to maximally profit from the next trend swing. I've been doing it for 20 years now: if you can predict the next place gold will fall from the sky, then go and stand there with a really big bucket.

Where will the pendulum be in 2030? Asking for a friend :D


I'll be out of the market by then, so I'll drop what I think the situation will be:

Put everything on: cost savings, energy reduction, privacy, death of advertising.

Cost savings -> inefficient languages and architectures will die because the main datacentre currency is going to be performance/watt and that's going to cost serious money when transport infra is contending with DC power consumption. Things which are compiled and not interpreted will have a cost benefit then. Rust/C# (with AOT)/Go etc. Half these bloated piles of shit with expensive build toolchains will die too.

Energy reduction -> linked with above, energy usage reduction is going to be a big one. That means reducing workforce, simplification and efficiency are going to be key drivers. This may kill some ML approaches off that consume a lot of energy. So ARM etc.

Privacy -> Confidence in surveillance states and the cloud is declining so privacy first oriented services are going to have a huge uptick. Apple / standalone systems / new opportunities.

Death of advertising -> advertising is in its death throes with AI coming in, as it decreases the signal-to-noise ratio. It becomes less effective, so discovery rather than promotion will be the way to get attention for your product. Portals / landing pages / software catalogues.

Me, I'd concentrate on cloud cost management, code efficiency, and business efficiency as the key areas to invest my time in.


I’m 30+ years in and I don’t think there’s any bad faith. It’s just people who are new seeing all of the problems with the old implementation and thinking their way will be better.


That is bad faith. They're not trying to understand the systems that they feel they can take responsibility for trying to "fix".


That isn't what bad faith typically covers?

Bad faith is most easily seen when they hold the opposition to higher standards than they themselves can hit - in particular, on purpose. It is a smoke-and-mirrors dialogue that is not intended to make progress, but only to waste the other side's time.

In contrast, being wrong and/or not fully grasping the difficulties is normal. And sometimes you get lucky and unexpectedly make progress.


This is my core contention: negligence of bringing one's efforts to suitably address the problem is not necessarily intentional a la mens rea, but rather demonstrates that a person is not appropriately engaging their mental and cognitive faculties and in essence disrespects the entire process to which they've assigned themselves. These processes are bigger than individuals and it behooves one to engage in ways that make sense. This lack of attention constitutes bad faith in that it is the opposite of good faith engagement.

Also, they are assuming their participation is a net positive, which is a position that requires some amount of intention to take, so I disagree that intention as you've cast it is really a relevant perspective here. Bad faith is more than just a rhetorical debate tactic, it's a modus operandi with regard to how someone actually engages with the topic at hand.


I can see the similarities. That said, there is an advantage to allowing a bit of hubris in young teams. It is a gambit that can uncover absurd progress.

There is also a bit of the established problems obfuscating themselves, such that it is easy to see that many new workers have been given the runaround many times.


Chesterton's Fence problems?


Ultimately it's promotion-driven development.


Yeah I call it Resume Driven Development.


Exactly. "What's Kubernetes FOR?" "To the best I can tell, it's a jobs program for our industry."


Kubernetes is great for the problems it is designed for. It's fucking terrible for the other 99%.

Source: juggle lots of clusters full of things that shouldn't be in Kubernetes.


Too many microservices where 90% of the weight is service scaffolding, and 10% is actual meat/product.


I'm not sure why this is an issue with a long-running system. Business requirements change, knowledge changes, cost structures change, etc. Unfortunately the world isn't static. I'm not sure about you, but when the facts change I also try to change.


[flagged]


Yes, no one said they did that.


[flagged]


The "in 3 years" was meant as a forward-looking statement.

The post was being sarcastic that Uber will, in 3+ years' time, claim another victory based on abandoning the cloud.


Thanks - I had misread "in 3 years" as backward-looking ("for over 3 years...") rather than forward-looking ("3 years from now...").


They were making a joke about a hypothetical post in another three years.


[flagged]



Maybe, just maybe, this article isn't aimed at the end user but at the engineer building it?


[flagged]


Might want to tweak the prompt a bit


What did he say?


Profile/settings -> Showdead -> yes

Basically the wording just looks like GPT output



