It actually is though. I don't need to build a custom upload client, I don't need to manage restart behavior, I get automatic restarts if any of the background workers fail, I have a dead letter queue built in to catch unusual failures, I can tie it all together with a common API that's a first class component of the system.
Working in the cloud forces you to address the hard problems first. If you actually take the time to do this everything else becomes _absurdly_ easy.
I want to write programs. I don't want to manage failures and fix bad data in the DB directly. I personally love the cloud and this separation of concerns.
For S3 you do need to generate a presigned URL, so that logic has to live somewhere in your application instead of "just having a generic HTTP upload endpoint".
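In practice you'd call the SDK's presign helper, but to make the "logic has to live somewhere" point concrete, here is a rough stdlib-only sketch of what a SigV4 query-presigned PUT URL actually involves. Bucket, key, and credentials below are placeholders, and this omits details a real implementation needs (session tokens, path-style endpoints, non-AWS regions):

```python
import datetime
import hashlib
import hmac
import urllib.parse

def presign_put_url(bucket, key, access_key, secret_key,
                    region="us-east-1", expires=3600, now=None):
    """Sketch of a SigV4 query-presigned PUT URL for S3 (no session-token support)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters must be sorted and fully URL-encoded for the canonical request.
    canonical_query = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "PUT",
        "/" + urllib.parse.quote(key),
        canonical_query,
        f"host:{host}\n",       # canonical headers (just "host" here)
        "host",                 # signed headers list
        "UNSIGNED-PAYLOAD",     # presigned PUTs don't sign the body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    def _hmac(key_bytes, msg):
        return hmac.new(key_bytes, msg.encode(), hashlib.sha256).digest()
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac(_hmac(_hmac(_hmac(
        ("AWS4" + secret_key).encode(), datestamp), region), "s3"), "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return (f"https://{host}/{urllib.parse.quote(key)}"
            f"?{canonical_query}&X-Amz-Signature={signature}")
```

Whether you hand-roll this or call `generate_presigned_url`, the point stands: the presign step, its expiry policy, and its error handling all become application code you own.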
Unless the solution is "don't have the problem in the first place" the cloud limitations are just getting in the way here.
The solution is to use the appropriate tool for the job. If you're locked in to highly crusty legacy software, it's inevitably going to require workarounds. There are good technical reasons why arbitrary-size single-part file uploads are now considered an anti-pattern. If you must support them, then don't be shocked if you wind up needing EC2 or some other lower-level service as a point of ingress into your otherwise-serverless ecosystem.
If we want to treat the architectural peculiarities of GP's stack as an indictment of serverless in general, then we could just as well point to the limitations of running LAMP on a single machine as an indictment of servers in general (which obviously would be silly, since LAMP is still useful for some applications, as are bare metal servers).
We downplay the cost of generating a signed URL. Getting the URL itself is only a few lines and a function call, but you then have to send it to the client. The client has to use that URL, then check back with you to see whether the upload actually arrived, resulting in a kind of pea-soup architecture unless your application is already entirely event driven. Oh, how we get suckered in…
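The three-step round trip described above can be sketched in a few lines. Everything here is hypothetical (the paths, the `FakeTransport` stand-in, the response shapes); it only exists to show how the presign handshake couples client and server:

```python
class FakeTransport:
    """Stand-in for an HTTP client; records calls instead of making them.
    All endpoint paths and response shapes below are invented for illustration."""
    def __init__(self):
        self.calls = []

    def post(self, path, body=None):
        self.calls.append(("POST", path))
        if path == "/uploads":
            # Server would presign here and hand back the grant.
            return {"id": "u1", "url": "https://bucket.example/u1?sig=..."}
        return {"ok": True}

    def put(self, url, data):
        self.calls.append(("PUT", url))
        return {"ok": True}

def upload_file(api, data, name):
    grant = api.post("/uploads", {"name": name})          # 1. ask app server for a presigned URL
    api.put(grant["url"], data)                           # 2. PUT bytes straight to storage
    return api.post(f"/uploads/{grant['id']}/complete")   # 3. tell the app server it landed
```

Note that step 3 is where the "pea soup" starts: the server can't trust the client to always report back, so in practice you also need reconciliation (bucket notifications, sweeps for orphaned grants) on top of this happy path.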
> Working in the cloud forces you to address the hard problems first.
It also forces you to address all the non-existent problems first: the ones you just wish you had, like the larger companies that genuinely have to deal with thousands of file uploads per second.
And don't forget all the new infrastructure you added to do the job of simply receiving the file in your app server and putting it where it was going to go anyway, except now via separate components that always seem to end up with individual repositories, separate deployment pipelines, and no way to be effectively tested in isolation outside their target environment.
And all the additional monitoring you need on each of the individual components that were added, particularly on those helpful background workers to make sure they're actually getting triggered (you won't know they're failing if they never got called in the first place due to misconfiguration).
And you're now likely locked into your upload system being directly coupled to your cloud vendor. Oh wait, you used MinIO to provide a backend-agnostic intermediate layer? Great, that's another layer that needs managing.
Is a content delivery network better suited to handling concurrent file uploads from millions of concurrent users than your app server? I'd honestly hope so, that's what it's designed for. Was it necessary? I'd like to see the numbers first.
At the end of the day, every system design decision is a trade-off, and almost always buys some benefit with additional complexity. That might be worth the cost, but a lot of these system designs don't need this many moving parts to achieve the same results; the extra parts only add complexity without solving a direct problem.
If you're actually that company, good for you and genuinely congratulations on the business success. The problem is that companies that don't currently and may never need that are being sold system designs that, while technically more than capable, are over-designed for the problem they're solving.
> You will have these problems. Not as often as the larger companies but to imagine that they simply don't exist is the opposite of sound engineering.
A lot of those failure-mode examples seem well suited to client-side retries and appropriate rate limiting. If we're talking file uploads then sure, there absolutely are cases where having clients go to the third party is more beneficial than costly (high variance in allowed upload size would be one to consider), but for simple upload cases I'm not convinced that high-level client retries wouldn't do the job.
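For what it's worth, the kind of client-side retry meant here is small. A minimal sketch with jittered exponential backoff (the function name and defaults are made up; tune attempts and caps for your own API):

```python
import random
import time

def with_retries(op, attempts=5, base=0.5, max_sleep=30.0, sleep=time.sleep):
    """Call op(); on failure, back off exponentially with jitter and retry.
    `sleep` is injectable so the policy can be tested without waiting."""
    for i in range(attempts):
        try:
            return op()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random fraction of the capped backoff.
            sleep(min(max_sleep, base * 2 ** i) * random.random())
```

A real version would retry only on transient errors (timeouts, 429s, 5xx) rather than bare `Exception`, but even so it's a handful of lines in the client, not a new piece of infrastructure.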
> if they never got called in the first place due to misconfiguration
I find it hard to believe that having more components to monitor will ever be simpler than having fewer. If we're being specific about vendors: the AWS console is, IMHO, the absolute worst place to go for a good centralized logging experience, so you almost certainly end up shipping your logs into a better centralized logging system that has more useful monitoring and visualisation features than CloudWatch, with the added benefit of not being the AWS console. The cost? Financial, time, and more moving parts for getting data from one place to the other. Oh, and don't forget monitoring on the log-shipping component too; that can also fail (and needs updates).
> The protocol provided by S3 is available through dozens of vendors.
It's become a de facto standard for sure, and other vendors helpfully re-implement it, though at varying levels of compatibility.
> It only matters if it is of equivalent or lessor cost.
This is precisely the point, I'm saying that adding boxes in the system diagram is a guaranteed cost as much as a potential benefit.
> Yet you explicitly ignore these
I repeatedly mentioned things that to me count as complexity that should be considered. Additional moving parts/independent components, the associated monitoring required, repository sprawl, etc.
> No, I just read the documentation, and then built it.
I also just 'read the documentation and built it', but other comments in the thread allude to vendor-specific training pushing for not only vendor-specific solutions (no surprise) but also the use of vendor-specific technology that maybe wasn't necessary for a reliable system. Why use a simple pull-based API with open standards when you can tie everything up in the world of proprietary vendor solutions that have their own common API?