One word of caution about naively porting a regular web app to lambda: since you’re charged for duration, if your app does something like make an API call, you’re paying for duration while waiting on that API call for each request. If that API breaks and hangs for 30s, and you are used to it being (say) a 300ms round trip, your costs have 100xed.
So lambda pricing scales down to cheaper than a VPS, but it also scales up a lot faster ;)
That's one thing that's really nice about Cloudflare Workers: you're billed on CPU time, not duration, so you can happily make slow requests to other services.
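Here is a rough sketch of the arithmetic behind the two comments above. The model is deliberately simplified: it ignores memory-size multipliers, per-request fees, and billing increments, and the 5 ms of CPU work per request is a hypothetical figure.

```python
def billed_seconds(cpu_s: float, wait_s: float, model: str) -> float:
    """Seconds billed for one request under a simplified pricing model.

    "duration": billed on wall-clock time (CPU work plus time spent waiting on I/O).
    "cpu": billed on CPU time only; time spent waiting on I/O is free.
    """
    if model == "duration":
        return cpu_s + wait_s
    if model == "cpu":
        return cpu_s
    raise ValueError(f"unknown model: {model}")


cpu = 0.005  # hypothetical: 5 ms of actual work per request
for model in ("duration", "cpu"):
    normal = billed_seconds(cpu, 0.3, model)  # downstream API answers in 300 ms
    hung = billed_seconds(cpu, 30.0, model)   # downstream API hangs for 30 s
    print(f"{model}: {normal:.3f}s -> {hung:.3f}s per request ({hung / normal:.1f}x)")
```

Under duration billing the hang multiplies the per-request bill by roughly 100x; under CPU-time billing it changes nothing, because the waiting is free.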
When a Linux process is waiting for a socket to have data ready to read (using a system call like select, poll, or epoll), that process is put into a sleep state by the kernel. It continues accruing “wall-clock time” (“wall time”) but stops accruing CPU time. When the kernel has data for the socket, the process starts running again.
The above tracking method works for containerized things. For virtualized things, it’s different: When a Linux system has nothing to do (all processes sleeping or waiting), the kernel puts the CPU to sleep. The kernel will eventually be woken by an interrupt from something, at which point things continue. In a virtualized environment, the hypervisor can measure how long the entire VM is running or sleeping.
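To see the process-level accounting from the container case for yourself, here is a small sketch: block in `select()` on a socket that never receives data and compare wall-clock time with CPU time. It measures an ordinary process, not a Worker or a Lambda, and the 0.5 s timeout is arbitrary.

```python
import select
import socket
import time


def wait_for_data(timeout_s: float) -> tuple[float, float]:
    """Block in select() on a socket that never receives data.

    Returns (wall_seconds, cpu_seconds) spent in the wait.
    """
    a, b = socket.socketpair()
    try:
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        # The kernel puts this process to sleep until data arrives or the timeout fires.
        select.select([a], [], [], timeout_s)
        return time.perf_counter() - wall_start, time.process_time() - cpu_start
    finally:
        a.close()
        b.close()


wall, cpu = wait_for_data(0.5)
print(f"wall: {wall:.3f}s, cpu: {cpu:.4f}s")
```

Wall time comes out at about the full timeout, while CPU time stays close to zero: the process was sleeping, not running.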
Economically, Cloudflare can do this because they only need a V8 isolate per instance (so barely any RAM besides instantiated JS objects), whereas a Lambda needs RAM for the entire runtime environment. So many more customers can run on the same machine, even if many are waiting on I/O simultaneously.
At the most basic level, your process is in the D or S state rather than the R state. Cloudflare Workers are not exactly Unix processes, but the same concept applies.
As @leef mentioned, you should transition between Lambda and <another offering> based on the current load.
I've published LambdaFlex as an Infrastructure as Code (IaC) template. It automatically scales and manages traffic between AWS Lambda and AWS Fargate [1].
This setup leverages the strengths of both services: rapid scaling, scaling down to zero, and cost-effectiveness.
Interesting. It makes me wonder about designing "lambda-client-friendly APIs".
To reduce the chances of being long-running, the API would be async: the client makes a request and quickly gets back a claim code, while the API host processes the request separately and makes a callback when done.
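A minimal in-process sketch of that shape, assuming a single background worker thread. The class and method names are hypothetical, and a real service would POST to a client-supplied webhook URL instead of calling a Python function.

```python
import queue
import threading
import uuid
from typing import Callable

Callback = Callable[[str, str], None]


class ClaimCheckApi:
    """Hypothetical async API: submit() returns a claim code at once; a background
    worker does the slow work and then calls the client back with the result."""

    def __init__(self, work: Callable[[str], str]) -> None:
        self._work = work
        self._jobs: "queue.Queue[tuple[str, str, Callback]]" = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, payload: str, callback: Callback) -> str:
        claim = str(uuid.uuid4())
        self._jobs.put((claim, payload, callback))
        return claim  # the caller can stop waiting (and paying for duration) here

    def _worker(self) -> None:
        while True:
            claim, payload, callback = self._jobs.get()
            # In a real system this would be an HTTP POST to the client's webhook URL.
            callback(claim, self._work(payload))


results: dict = {}
finished = threading.Event()


def on_result(claim: str, result: str) -> None:
    results[claim] = result
    finished.set()


api = ClaimCheckApi(work=str.upper)
claim = api.submit("hello", on_result)
finished.wait(timeout=5)
print(claim, results.get(claim))
```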
Lots of issues to consider.
Anyone have experience doing this? Would be great to hear about it.
I did this for a client using APIGW and DynamoDB. The claim check is the key, and then there’s a polling endpoint in APIGW that goes directly to DynamoDB with no Lambda in between. When the claim check is processed, the location of the data is an additional attribute in DynamoDB and the APIGW/DynamoDB integration sends an HTTP 302 with the final result URL.
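In the setup described above there is no application code at all: API Gateway reads DynamoDB directly through a service integration. The sketch below models only the decision the polling endpoint makes. The `result_url` attribute name, the 202 status, and the `Retry-After` header are assumptions for illustration.

```python
def poll_claim(table: dict, claim_id: str) -> tuple:
    """Model of the polling endpoint's decision logic.

    `table` stands in for the DynamoDB table (claim_id -> item). The worker adds a
    `result_url` attribute (hypothetical name) to the item once processing is done.
    Returns (HTTP status code, response headers).
    """
    item = table.get(claim_id)
    if item is None:
        return 404, {}
    if "result_url" not in item:
        return 202, {"Retry-After": "2"}  # still processing; ask the client to poll again
    return 302, {"Location": item["result_url"]}  # done; redirect to the final result


table = {"abc123": {"status": "PENDING"}}
print(poll_claim(table, "abc123"))
table["abc123"]["result_url"] = "https://example.com/results/abc123"
print(poll_claim(table, "abc123"))
```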
In terms of how I have thought about them before, webhooks are similar, but not quite the same thing. I've really only dealt with webhooks when the response was delayed because of a need for human interaction, for example: sending a document for e-signature and then later receiving notifications when the document is viewed and then signed or rejected.
I haven't experienced much need to make REST APIs async, only in cases where the processing was generally always very slow for reasons that couldn't be optimized away. I haven't seen much advocacy for it either.
However, if we think about lambda-based clients, then it makes a lot more sense to provide async apis. Why have both sides paying for the duration, even if the duration is relatively short?
update:
Whether or not it's cheaper for the client depends on the granularity of the pricing of the client's provider. For AWS, it seems to be per second of duration, so async would be more expensive for APIs that execute quickly.
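A sketch of why granularity matters: when each invocation's duration is rounded up separately, splitting one wait into "submit" plus "handle callback" can cost more than waiting synchronously. The durations are hypothetical, and per-request fees (which the async path pays twice) are ignored. AWS's pricing page currently describes Lambda duration as rounded up to the nearest 1 ms, so check the current granularity before relying on either case.

```python
import math


def billed_ms(duration_ms: float, granularity_ms: int) -> int:
    """Duration rounded up to the provider's billing increment."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def sync_vs_async(api_ms: float, submit_ms: float, callback_ms: float,
                  granularity_ms: int) -> tuple:
    """Billed client-side milliseconds: (wait synchronously, submit + handle callback).

    The async path is two separate invocations, so each is rounded up on its own.
    """
    sync = billed_ms(api_ms, granularity_ms)
    async_total = billed_ms(submit_ms, granularity_ms) + billed_ms(callback_ms, granularity_ms)
    return sync, async_total


print(sync_vs_async(200, 20, 20, granularity_ms=1000))  # coarse billing: async loses
print(sync_vs_async(200, 20, 20, granularity_ms=1))     # fine billing: async wins
```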
That is kind of the point of this repo. If your API implementation is too expensive for Lambda, move the same container to run on EKS, ECS on Fargate, or EC2. The opposite holds as well: if you can save money by moving it from Fargate to Lambda, then you can use the same container.
This is an interesting point. Hangs usually cost $ from user experience, with serverless they cost $ from compute. All the more reason to set strict deadlines on all API calls!
> All the more reason to set strict deadlines on all API calls!
I've seen this backfire in production though. E.g. suppose there was a 5 sec limit on an API call, but (perhaps unintentionally) that call was proportional to underlying data volumes or load. So, over time, that call time slowly creeps up. Then it gets to the 5 sec limit, and the call consistently times out. But some client process retries on failure. So, if there was no time limit, the call would complete in, say, 5.2 seconds. But now the call times out every time, so the client just continually retries. Even if the client does exponential backoff, the server is doing a ton of work (and you're getting charged a ton for lambda time), but the client never gets the data they need.
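One common mitigation for that failure mode, sketched under assumptions: keep the deadline, but cap the number of retries so a call that has crept past its limit fails a few times and then surfaces as an error, instead of being retried (and billed) forever. The function name and defaults are hypothetical, and the sketch assumes the wrapped call raises `TimeoutError` on timeout.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[float], T],
    timeout_s: float,
    max_attempts: int = 3,
    base_delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call(timeout_s)`, retrying on TimeoutError a bounded number of times.

    Backoff is exponential with full jitter. Because attempts are capped, the final
    TimeoutError is re-raised so it can trigger an alert rather than a retry loop.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call(timeout_s)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    raise AssertionError("unreachable")
```

This does not fix the creeping latency itself; it caps the cost and makes the problem visible.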
3s would rarely be acceptable, much less 30s. Not that it never happens, but it should be rare enough that the cost of the lambda isn't the main concern. (Or if it's not rare, your focus should be on fixing the issue, not really the cost of the lambdas.)
Anyway, I think you'd typically limit the lambda run time limit to something a lot shorter than 30 sec.
Used to be a hard quota of 29 seconds for APIGW, but within the last month AWS has updated that to a soft quota. You will have to trade off concurrent calls for the longer APIGW timeout though.
Well, the point is you want a short timeout if your lambda is hanging and you're being billed by the second. If you're not setting it explicitly when you deploy, you're more or less writing Amazon a blank check.
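The function timeout itself is set when you deploy (console, SAM, CDK, and so on). Inside the handler, a complementary step is to derive each outbound call's timeout from the time the invocation has left, using `context.get_remaining_time_in_millis()` from the Python Lambda context object. The margin, the cap, and the URL below are hypothetical placeholders.

```python
import urllib.request

SAFETY_MARGIN_S = 1.0   # hypothetical: time kept in reserve for logging/cleanup
MAX_DOWNSTREAM_S = 5.0  # hypothetical per-call cap, well under the function timeout


def downstream_timeout(context) -> float:
    """Pick a timeout for an outbound call from the time this invocation has left."""
    remaining_s = context.get_remaining_time_in_millis() / 1000
    return max(0.0, min(MAX_DOWNSTREAM_S, remaining_s - SAFETY_MARGIN_S))


def handler(event, context):
    timeout = downstream_timeout(context)
    if timeout <= 0:
        return {"statusCode": 503, "body": "not enough time left"}
    # urlopen raises on timeout instead of hanging until Lambda kills the invocation.
    with urllib.request.urlopen("https://example.com/api", timeout=timeout) as resp:
        return {"statusCode": resp.status, "body": resp.read().decode()}
```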
What is the pattern to use on Lambda if you actually have to call out other services, which may, sometimes, take a long time to answer?
Do you make requests with a shorter timeout and have your Lambda fail when it triggers?
Do you delegate long calls to a non-Lambda service?
In a case where a dependency has a consistent high latency and you can’t avoid it, I’d run the numbers and see if it’s more worthwhile to just run your app on ECS or something instead.
I find that running your app on ECS is generally a great way to just do away with a whole class of problems. I’ve never found lambda to do away with any of the issues I encounter on ECS.
I don’t feel great about Lambda (outside of job processing and such), as it sometimes feels like you took your application and broke it into a thousand little footguns where a mistake or bad actor can tank your business. But if your company experiences very spiky workloads or doesn’t have enough income to pay for 24/7 ECS and such, I can see the appeal.
If you have enough volume to warrant it, you could re-architect the lambdas so the before/after parts of the call are separate invocations, with an adapter service living on ECS that gets a call from the “before” lambda, handles the remote request and response, then calls the “after” lambda with the response.
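A minimal in-process sketch of that split. All three functions are hypothetical stand-ins: in a real deployment the hand-offs would be asynchronous Lambda invocations or queue messages, and the adapter would be a long-running service on ECS.

```python
from typing import Callable


def before_handler(event: dict) -> dict:
    """'Before' Lambda: do the pre-call work, hand off, and exit (billing stops)."""
    return {"request": {"query": event["query"]}, "context": {"user": event["user"]}}


def after_handler(handoff: dict, response: dict) -> dict:
    """'After' Lambda: invoked with the remote response plus the saved context."""
    return {"user": handoff["context"]["user"], "answer": response["answer"]}


def adapter_service(handoff: dict, remote: Callable[[dict], dict],
                    invoke_after: Callable[[dict, dict], dict]) -> dict:
    """Long-lived service (e.g. on ECS) that absorbs the slow remote call."""
    response = remote(handoff["request"])  # may take seconds; no Lambda is waiting
    return invoke_after(handoff, response)


result = adapter_service(
    before_handler({"query": "abc", "user": "u1"}),
    remote=lambda req: {"answer": req["query"][::-1]},  # stand-in for the slow API
    invoke_after=after_handler,
)
print(result)
```

The point of the design is that nobody is billed per-invocation duration while the remote call is in flight; the adapter's cost is a flat, always-on container.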
The simpler solution is often to just lift the app from Lambda to ECS without internal changes.
It seems like Lambda is not suited for such "hanging" use cases. To me the best use case for Lambda is for an API call that might be made very infrequently, like say 50 times a day. Doing the same with a VM instance (remember, it has to be HA, secure etc) is probably not worth the effort.
Not untrue, but also not true. It isn’t a Lambda problem; it’s the problem of running the same old, same old code on Lambda.
If you're running clusters of VMs/containers/instances that “scale up” for transient demand but sit largely idle (a very, very common outcome), then an event-oriented rewrite is going to save a big hunk of money.
But yeah, if you keep every instance busy all the time, it is definitely cheaper than lambda busy all the time.
Sure, if you’re willing to rewrite for an event-based architecture you can avoid the problem, but the post is a framework designed for running on Lambda without that rewrite. One of the stated goals is to have the same container run on EC2 or Fargate.
The minimum cost of running a task on Fargate is under a dollar a month. If you’re an AWS company, it can get cheaper if you accept the lock-in (not that there’s much lock-in with Fargate).
> If you're running clusters of VMs/containers/instances that “scale up” for transient demand but sit largely idle (a very, very common outcome), then an event-oriented rewrite is going to save a big hunk of money.
If you're using k8s, HPA is handled for you. You just need to define policies/parameters. You're talking as if rewriting an entire app doesn't cost money.
* We write our HTTP services and package them in containers.
* We add the Lambda Web Adapter into the Dockerfile.
* We push the image to ECR.
* There's a hook lambda that creates a service on ECS/Fargate (the first-party Kubernetes equivalent on AWS) and a lambda.
* Both are prepped to receive traffic from the ALB, but only one of them is activated.
For services that make sense on Lambda, the ALB routes traffic to the Lambda; otherwise, to the service.
The other comments here have more detailed arguments over which service would do better where, but the decision making tree is a bit like this:
* is this a service with very few invocations? Probably use lambda.
* is there constant load on this service? Probably use the service.
* if load is mixed or if there's a lot of idle time in the request handling flow, figure out the inflection point at which a service would be cheaper, and run that.
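Finding that inflection point is mostly arithmetic. Here is a sketch comparing pay-per-use Lambda pricing with an always-on Fargate-style task. The prices are illustrative placeholders in the ballpark of published list prices; check current pricing for your region and architecture. The sketch also ignores the always-on task's capacity ceiling, free tiers, and data transfer.

```python
HOURS_PER_MONTH = 730


def lambda_monthly_cost(requests: float, avg_duration_s: float, memory_gb: float,
                        price_per_gb_s: float, price_per_million: float) -> float:
    """Pay-per-use cost: GB-seconds of compute plus a per-request fee."""
    compute = requests * avg_duration_s * memory_gb * price_per_gb_s
    return compute + requests / 1_000_000 * price_per_million


def always_on_monthly_cost(tasks: int, vcpu: float, memory_gb: float,
                           price_per_vcpu_hour: float, price_per_gb_hour: float) -> float:
    """Cost of tasks running 24/7 (Fargate-style pricing), with no autoscaling."""
    return tasks * HOURS_PER_MONTH * (vcpu * price_per_vcpu_hour + memory_gb * price_per_gb_hour)


def breakeven_requests(always_on: float, avg_duration_s: float, memory_gb: float,
                       price_per_gb_s: float, price_per_million: float) -> float:
    """Monthly request count at which Lambda costs the same as the always-on service."""
    per_request = lambda_monthly_cost(1, avg_duration_s, memory_gb, price_per_gb_s, price_per_million)
    return always_on / per_request


# Illustrative inputs only.
service = always_on_monthly_cost(tasks=1, vcpu=0.25, memory_gb=0.5,
                                 price_per_vcpu_hour=0.04048, price_per_gb_hour=0.004445)
n = breakeven_requests(service, avg_duration_s=0.2, memory_gb=0.5,
                       price_per_gb_s=0.0000166667, price_per_million=0.20)
print(f"always-on: ${service:.2f}/month; Lambda is cheaper below ~{n:,.0f} requests/month")
```

Note that `avg_duration_s` is wall-clock time, so the idle time inside the request handling flow counts fully against Lambda here.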
While we wish there was a fully automated way to do this, this is working well for us.
I guess the part I don't get is, if your traffic is that low, doesn't that mean you could run the service on a ~$4/month instance anyway? Even if we assume the lambda option is cheaper, it's only cheaper by pennies, basically. In exchange for taking on a bunch of added complexity.
Sort of, but two things add up. One is lots of little services: I don't like the word microservices, but we do make small services for the workloads that make sense to run and maintain independently.
The other is burst, we run a business where customers send us a couple hundred thousand things to do just once a week or once a day. And of course we don't know exactly when the orders will come in. Much easier to let the lambdas run than have large running servers waiting. Autoscaling is possible, of course, but laggy autoscaling won't work, and aggressive autoscaling is basically lambda.
Not the OP, but things can be infrequent and bursty, or infrequent and expensive, or infrequent and memory-intensive. In those cases (and many more I'm sure) lambda can make sense.
I like the idea. The main thing I'd want before adopting this design is the ability to inject secrets into Lambda as environment variables, comparable to what ECS offers.
It'd also be nice to have a better log aggregation strategy for lambda than scraping cloudwatch, but that feels less important.
We do have environment variable injection, that's mostly what the hook lambda does. Same envvars are injected into both the lambda and the ECS task definition.
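The hook lambda's code isn't shown, so here is only a hypothetical sketch of the fan-out: one dict of variables rendered into both shapes. `{"Variables": {...}}` is the form of the `Environment` parameter for a Lambda function configuration (e.g. boto3's `update_function_configuration`), and a list of `{"name", "value"}` pairs is the `environment` field of an ECS container definition. For real secrets, ECS container definitions also accept a separate `secrets` list with `valueFrom` references, which this sketch does not cover.

```python
def lambda_environment(env: dict) -> dict:
    """Shape of the `Environment` parameter for a Lambda function configuration."""
    return {"Variables": dict(env)}


def ecs_environment(env: dict) -> list:
    """Shape of the `environment` list in an ECS container definition."""
    return [{"name": name, "value": value} for name, value in sorted(env.items())]


shared = {"LOG_LEVEL": "info", "DB_HOST": "db.internal"}  # hypothetical values
print(lambda_environment(shared))
print(ecs_environment(shared))
```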
That’s a choice, like using Kubernetes or being a micro service fundamentalist. A decade ago, you had people building in-house Hadoop clusters for tiny amounts of data instead of doing their jobs analyzing that data, and before that you had the J2EE types building elaborate multi server systems for what could have been simple apps.
This repeats constantly because the underlying problem is a broken social environment where people aren’t aligned with what their users really need and that’s rewarded.
I remember that, too. There were also just a lot of places where it’s like … your data is already in a SQL database, why not get better at reporting queries until you’re running so many of them that you don’t have an easy path to handle them? It was especially tragicomic in places where they were basically hoping not to need to hire analysts with technical skills, which is possibly the most MBA way ever to very expensively “save” money.
Does it, though, or are you just more familiar with traditional servers? There are a lot more things to manage in a traditional environment and things like tools are more primitive so you’ll have to invest a lot more time getting anywhere near equivalent reliability or security. That’s especially important if you have compliance requirements where you’ll be on the hook for things provided or greatly simplified by the cloud platform.
Here’s a simple one: if I need to say something accepts inbound 443 from 1.2.3.4, I create an AWS security group rule and will never need to touch that for at least a decade, probably longer. If I’m doing the same thing on premises, the starting cost is learning about something like Cisco hardware, which costs a lot more to buy and which I now have to include in my maintenance work. Over a year, I will spend no time on my AWS rule, but I will have to patch the physical switches and deal with primitive configuration tools for a complex networked device. If my organization cares about security, I will have to certify a ton of configuration and practice unrelated to packet filtering, whereas I can point auditors at the cloud service’s documentation for everything on their side of the support agreement.
What about running code? If I deploy, say, Python Lambdas behind an API Gateway or CloudFront, they’ll run for years without any need to touch them. I don’t need to patch servers, schedule downtime, provision new servers for high load or turn them off when load is low, care that the EL7 generation servers need to be upgraded to EL8, nobody is thinking about load balancer or HTTPS certificate rotation, etc. What you’re paying for, and giving up customization for, is having someone else handle all of that.
In a cloud environment, I turn a few knobs and I can command an absurd amount of compute power, have it autoconfigured with all kinds of service discovery and whatnot. Things that would have taken weeks to months to build in a physical datacenter. Some people take that spare time to train as architecture astronauts I guess, and overcomplicate everything with all kinds of fiddly layers and microservices. I get it, it's fun to play with new toys. Doesn't mean the toys are bad, they're just not necessarily ready for production on day 1.
That's because it's easy to get started with one of the hundreds of services available in the cloud and before you know it your cloud project is using 50 of them that depend on each other.
Yep. I've also worked on "serverless" projects where similar or more time is spent on IaC (Terraform, etc.) compared to application development. All this effort for something that barely gets any requests.
One of the core value propositions of AWS is to take advantage of the various services to scale more easily with more linear costs than you would be able to otherwise.
Writing and maintaining code that sets up and configures S3, SQS, SNS, DynamoDB, and Lambda is a non-zero cost, but if it then reduces the amount of code I have to write and maintain to "write the application," that's often a good thing. These services are typically solving the hard and/or tedious parts of the application, so it's actually quite valuable to just focus on developing the core differentiating points of the application instead of solving / maintaining highly available object storage or message queues.
How long would it take to set up an event-driven Python function? I know AWS Lambda and Terraform and, to me, it takes a couple of minutes at most.
Indeed, I had to learn it! Just as I had to learn how to code in Python (which makes me far more effective now), just as I had to learn LVM2 or iproute2 or C or whatever.
Once you're past the learning curve, AWS allows you to increase your productivity. Of course, and this is very important, you should still use the right tool for the right job!
For many systems, the application and the machine are so intertwined as to basically be the same thing.
A lot of web apps are just boring CRUD APIs. The business logic behind these is nothing special. The differentiating factor for a lot of them is how they scale, or how they integrate with other applications, or their performance, or how fast they are to iterate on, etc. The “machine” offers a lot of possibilities these days that can be taken advantage of, so customizing “the machine” to get what you need out of it is a big focus for many devs.
I feel this strongly. If the project in the OP is meant to run "off the shelf" servers like Express or Next, isn't this a way to reduce the amount of "programming the machine" that needs to be done? That's what I'm hoping.
Our approach to lambdas is to delegate all the real work to a "library" and just write a very thin adapter around it dealing with the Lambda details (event format etc.). Hopefully this can take care of as much of the wrapper as possible?
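A minimal sketch of that split, assuming an API Gateway proxy or function-URL style event that carries `body` and `isBase64Encoded` fields. `create_order` is a hypothetical stand-in for the library; only `handler` knows anything about Lambda.

```python
import base64
import json


# The "library": plain business logic with no knowledge of Lambda.
def create_order(customer_id: str, items: list) -> dict:
    return {"customer_id": customer_id, "item_count": len(items)}


# The thin adapter: translate the event in, call the library, translate the result out.
def handler(event: dict, context=None) -> dict:
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode()
    payload = json.loads(body)
    result = create_order(payload["customer_id"], payload.get("items", []))
    return {
        "statusCode": 201,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

Because `create_order` never sees the event format, it can be unit-tested directly and reused unchanged behind a regular web server.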
> What I mean by "programming the machine" is doing technical tasks that relates to making the cloud work in the way you want.
I take a different perspective.
When using a cloud, I need to design my application to work within my selected cloud's constraints.
A lot of the extra work comes down to the fact that if you're using a cloud (not just a single VPS on a cloud provider), you are dealing with distributed computing and have to deal with all the issues of distributed computing: consensus, error handling, etc.
I agree though, more often than not it's a lot of extra work but mostly due to dealing with distributed architecture/designing for being cost effective.
On personal projects I find designing to be cost-effective has me jumping through a lot of hoops and doing novel things. It means I can run multi-region apps with performance better than most webapps for cents/dollars. The equivalent apps I work on in my day job can cost tens to hundreds of thousands of dollars, as they don't jump through the hoops I do. "It's not worth it", "let's look at it later", "we'll just do this, it's good enough for now". As it's someone else's money, I just think "meh" and move on; it isn't my fight when the majority of people don't want to do it.
How does this compare to Fly.io, which AFAIK wakes on requests in a very similar way, with containers made into Firecracker VMs (unless using GPU, I think)?
I suppose Fly typically doesn't need to scale per request so has a typical cheaper max cost? I guess you'd just set the concurrency but how well that scales, I don't know.
What's the difference between using this vs a Lambda function with an API Gateway app fronting it? I've been doing the latter for many years with Serverless and it has worked very well. Or does this enable you to run your entire web application as a Lambda function? If so, how does it work around the 30s execution window?
This basically maps the incoming event to a plain HTTP request. You can create a typical web server and invoke it via a function URL, and the request will get mapped to your listening port.
Help me understand how this could possibly result in a healthier SDLC. Great, it is possible to run the same container on all these services (and probably more given the usual quip about the 50 ways to run a container on AWS). And, I can see that being a tempting intellectual idea and even a practical “cheat” allowing understaffed orgs to keep running the same old code in new ways.
But why? The less it changes, the more vulnerable it is, the less compliant it is, the more expensive it is to operate (including insurance), the less it improves and adds features, the less people like it.
Seems like a well-worn path to under performance as an organization.
> The less it changes, the more vulnerable it is, the less compliant it is, the more expensive it is to operate (including insurance), the less it improves and adds features, the less people like it.
I don’t follow this. Complexity is what leads to vulnerabilities. Reducing complexity by reusing the same, known code is better for security and for compliance. As a security person, if a team came to me and said they were reusing the same container in both contexts rather than creating a new code base, I would say “hell yeah, less work for all of us”.
There are other reasons why using the same container in both contexts might not be great (see other comments in this thread), but security and compliance aren’t at the top of my list at all (at least not for the reasons you listed).
If the container has any dependencies, and they change, a lack of change quickly becomes an escalating liability. I’ve heard the “fixed is reliable” arguments for decades and have never seen anything that would justify the widespread failure to keep systems patched.
As the other commenter said, nobody is saying to make your application completely immutable. You still patch it, add features, etc. But now you only have to patch one container image rather than two (or more).
> The less it changes, the more vulnerable it is, the less compliant it is, the more expensive it is to operate (including insurance), the less it improves and adds features, the less people like it.
All of that is true, but all of that cost is being paid regardless while the legacy system goes unmaintained. When the company decides to shut down a data center, the choice with legacy systems (especially niche ones) is often "lift and shift over to cloud" or "shut it down". Notably missing among the choices is "increase the maintenance budget".
All these hypotheticals aside, one actual bonus of shifting onto lambda is reduced attack surface. However crusty your app might be, you can't end up with a bitcoin miner on your server if there's not a permanent server to hack.
Abstracting the runtime environment is nice for mixed environments and supporting local development. Maybe some deployments have really low load and Lambda is dirt cheap whereas other instances have sustained load and it's cheaper to be able to swap infrastructure backend.
It doesn't have to be about life support for neglected code.
Slow as hell. Which is why everything we are doing is no longer in Lambda. Literally it was crippling and most people don't notice it in the noise of all the other problems that software complexity brings in. If you measure it, you end up going "oh shit so that's where all our money has gone!"
We have one service which has gone back to a Go program that runs locally using go run. It's then shipped in a container to ECR and then to EKS. The iteration cycle on a dev change is around 10 seconds and happens entirely on the engineer's laptop. A deployment takes around 30 seconds and happens entirely on the engineer's laptop. Apart from production, which takes 5 minutes and most of that is due to github actions being a pile of shite.
AWS CodePipeline/CodeBuild still takes a really long time: think 1 minute to notice a commit, 2 minutes to download code, 3-8 minutes to get a CodeBuild runner, however long your build is, and 30 seconds to update the Lambda.
Especially when you don't have a good local AWS runner, vs using a HTTP service you can run and live reload on your laptop.
deploying to prod in less than 10 seconds, or even /live reloading/ the AWS service would be awesome.
For fast iteration, our developers build, test and deploy locally using cdk deploy directly into their dev account, no need to wait around for the pipeline to do all those steps. Then when they’re ready, higher environments go through the pipeline.
CDK has a new(ish) feature called hotswap that also bypasses CloudFormation and makes deploys faster.
Pushing your image up to your registry, running `kubectl apply`, waiting for it to schedule your Pod and start your containers and exec'ing into them when shit breaks is WAY faster than zipping up your artifact and any layers that come with it, uploading it into S3, and re-deploying your Lambda functions (and any API Gateway apps associated with them), IMO.
When I use Serverless to do this, deploys take up to two minutes, which I sometimes do often because I have no real way of troubleshooting the application within Lambda (definitely necessary if you're using Lambda base images; less necessary if you're rolling your own; likely if you're integrating with API Gateway since documentation on the payloads sent from it is lacking IMO).
Scaling to 0 means me being able to deploy each branch/PR to a whole, fully separate environment without it costing too much. As soon as you do that everything from testing migrations (via database branching), running full end-to-end tests (including the full infra, db, backend, etc.), having proper previews for people to view when reviewing the changes (like for example the UX designer seeing the whole feature working including the backend/db changes before it going in) just falls into place.
If I don't scale to 0 I'd prefer to work on dedicated hardware, anything in-between just doesn't give me enough benefit.
You can get a similar effect with Azure App Service, which is basically a cloud hosted managed IIS web farm. A web app is “just” a folder and takes zero resources when idle.
I wouldn’t recommend Azure App Service. Container Apps / AKS / Container Instances / Static Web Apps are all fine services for different use cases, but with App Service you are buying into a pretty strange infra approach compared to traditional FaaS / PaaS / IaaS distinctions. In the past 100% of the time App Service ended up being too weird to debug and another Azure service was a better compute fit.
Lambda scales faster if you really do need that. For instance, imagine bursts of 100k requests. Cold starts on Lambda are going to be quicker than the time it takes to autoscale something else.
What actually happens is you hit the concurrency limit and return 99k throttling errors, or your lambdas will try to fire up 100k database connections and will again probably just give you 99k-100k errors. Meanwhile a 1 CPU core container would be done with your burst in a few seconds. What in the world needs to handle random bursts from 0 to 100k requests all at once though? I struggle to imagine anything.
Lambda might be a decent fit for bursty CPU-intensive work that doesn't need to do IO and can take advantage of multiple cores for a single request, which is not many web applications.
You'd obviously need to make sure the persistence pieces and downstream components can also handle this. You could dump items onto a queue or use a key-value store like DynamoDB.
A 1 CPU container isn't going to handle that many "app" requests unless you have a trivial (almost no logic) or highly optimized (like c static web server) app
Lambda apps don't need to take advantage of multiple cores since you get a guaranteed fractional core per request
One example is retail fire sales. I interviewed with a company that had this exact use case. They produced anti-bot software for limited product releases and needed to handle this load (they required their customers to submit a request ahead of time so they could pre-scale).
Also useful with telemetry systems where you might get a burst in logs or metrics and want to consume as fast as possible to avoid dropped data and buffering on the source but can dump to a queue for async processing
In my experience, real world applications mostly do deal with basically trivial requests (e.g. simple CRUD with some auth checks). The starting point I'm used to for multiple web frameworks is in the thousands of requests/second. Something like telemetry is definitely trivial, and should be easy to batch to get into the 10s of thousands of requests/second.
The 25 ms request you mentioned in another comment is what I'd categorize as extremely CPU intensive.
If you go from 0 to 100K legit requests in the same instant, any sane architecture will ramp up autoscaling, but not so instantly that it tries to serve every last one of them in that moment. Most of them get throttled. A well-behaved client will back off and retry, and a badly-behaved client can go pound sand. But reality is, if you crank the max instances knob to 100k and they all open DB connections, your DB had better handle 100k connections.
A sane architecture would be running your application on something that has at least the resources of a phone (e.g. 8+ GB RAM), in which case it should just buffer 100k connections without issue if that's what you need/want it to do. A sane application framework has connection pooling to the database built in, so the 100k requests would share ~16-32 connections and your developers never have to think about such things.
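A minimal sketch of that pooling idea in asyncio: many concurrent requests share a small, fixed set of connections, and requests beyond the pool size simply wait their turn. The "connections" are plain strings standing in for real database connections, and the 10 ms query time is hypothetical; real drivers and frameworks ship their own pools.

```python
import asyncio


class ConnectionPool:
    """Minimal async pool: many requests share a fixed number of connections."""

    def __init__(self, size: int) -> None:
        self._free: asyncio.Queue = asyncio.Queue()
        for i in range(size):
            self._free.put_nowait(f"conn-{i}")  # stand-in for a real DB connection
        self.in_use = 0
        self.max_in_use = 0

    async def query(self, sql: str) -> str:
        conn = await self._free.get()  # waits here if all connections are busy
        self.in_use += 1
        self.max_in_use = max(self.max_in_use, self.in_use)
        try:
            await asyncio.sleep(0.01)  # pretend the database takes 10 ms
            return f"{conn}: {sql}"
        finally:
            self.in_use -= 1
            self._free.put_nowait(conn)


async def main(requests: int, pool_size: int) -> int:
    pool = ConnectionPool(pool_size)
    await asyncio.gather(*(pool.query(f"SELECT {i}") for i in range(requests)))
    return pool.max_in_use


print(asyncio.run(main(requests=1_000, pool_size=16)))
```

A thousand concurrent requests never hold more than 16 connections at once.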
You need to multiply that out by request servicing time.
Say your application uses 25ms of real CPU time per request. That's 40 reqs/sec/CPU core. On a 4-core server, that's 160 reqs/sec. That's 625 seconds to clear that backlog, assuming a linear rate (it's probably sublinear unless you have good load shedding).
So that's 10 minutes to service 100k requests in your example. I'm ignoring any persistent storage (DB), since that would exist with or without Lambda, so it would need its own design/architecture.
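The same back-of-the-envelope calculation as a small function, assuming perfectly linear throughput (an upper bound, as noted above). The same formula applies with concurrent executions in place of cores, though it then ignores cold starts and scaling-rate limits.

```python
def drain_seconds(backlog: int, seconds_per_request: float, workers: int) -> float:
    """Time to clear a backlog at a perfectly linear rate.

    `workers` is CPU cores for CPU-bound requests (or concurrent executions for
    something like Lambda); real systems are usually slower than this bound.
    """
    throughput = workers / seconds_per_request  # requests per second
    return backlog / throughput


print(drain_seconds(100_000, 0.025, 4))  # 4 cores at 25 ms of CPU per request
```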
I'd call pooling part of "the DB". "DB Layer" if you must, or the interface to it, whatever. Anyway, AWS has RDS Proxy, which held up pretty well against my (ad hoc and unscientific) load tests. But if you're actually trying to handle 100K DB requests in flight at once, your DB layer probably has some distributed architecture going on already.
If you're using RDS proxy, now you're not scaling to zero, and you still can't handle 100k burst requests because lambda can't do that. So why not use a normal application architecture which actually can handle bursts no problem and doesn't need a distributed database?
Lambda could be a compelling offering for many use cases if they made it so that you could set concurrency on each invocation. e.g. only spin up one invocation if you have fewer than 1k requests in flight, and let that invocation process them concurrently. But as long as it can only do 1 request at a time per invocation, it's just a vastly worse version of spinning up 1 process per request, which we moved away from because of how poorly it scales.
If your response time is 100ms, that's 100k requests in 1 minute.
Lambda runs your code in a VM that's kept hot so repeated invocations aren't launching processes. AWS is eating the cost of keeping the infra idle for you (arguably passing it on).
A normal application can scale to 10k concurrent requests as fast as they come in (i.e. a fraction of a second). Even at 16kB/request, that's a little over 160 MB of memory. That's the point: a socket and a couple objects per request scales way better than 1 process/request which scales way better than 1 VM per request, regardless of how hot you keep the VM.
Serving 10k concurrent connections/requests was an interesting problem in 1999. People were doing 10M on one server 10 years ago[0]. Lambda is traveling back in time 30 years.
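A sketch of the "a couple of objects per request" model: 10,000 simulated requests, each spending 100 ms waiting on I/O, handled concurrently by coroutines in one process. This simulates the waiting with `asyncio.sleep` rather than real sockets, so it illustrates the concurrency model, not a benchmark.

```python
import asyncio
import time


async def handle_request(i: int) -> int:
    # Stand-in for a request that spends 100 ms waiting on I/O.
    await asyncio.sleep(0.1)
    return i


async def serve(n: int) -> float:
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(n)))
    assert len(results) == n
    return time.perf_counter() - start


elapsed = asyncio.run(serve(10_000))
print(f"10,000 concurrent requests in {elapsed:.2f}s")
```

All 10,000 finish in a little over the 100 ms each one waits, because waiting coroutines cost only memory, not a process or VM apiece.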
The container is useful for running it locally in development. It ensures that you have the same runtime environment (and dependencies) both locally and on AWS.
To some extent. Lambda making a completely fresh environment for every invocation means you get away with a lot of bad coding that an ECS instance would never allow.
Why does a web server have significantly more cold start overhead than whatever is receiving requests and passing them to the handler in the classic lambda case?
A lambda lives for a relatively short period of time, ephemeral.
It only handles a single request at a time but will live long enough to serve multiple requests.
If another request comes in while one is in process a new lambda can spin up.
As a lambda lives for a period of time you can create a pool for that individual lambda. That pool lives for the life of the lambda.
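The usual way to do this in Python is to create the client or pool at module scope (or lazily on first use), so it survives across invocations handled by the same execution environment. A minimal sketch; `make_client` is a hypothetical stand-in for an HTTP session, SDK client, or connection pool.

```python
import time

_CLIENT = None  # module scope: reused across invocations in one execution environment


def make_client() -> dict:
    """Hypothetical expensive setup (HTTP session, SDK client, DB pool...)."""
    return {"created_at": time.time()}


def get_client() -> dict:
    global _CLIENT
    if _CLIENT is None:  # only the first (cold) invocation pays for this
        _CLIENT = make_client()
    return _CLIENT


def handler(event, context=None) -> dict:
    client = get_client()
    return {"client_created_at": client["created_at"]}
```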
Now for things like HTTP clients this is generally OK.
If you have a Postgres pool etc., it's not OK. Postgres doesn't like lots of connections, and you will have POOL_SIZE * NUMBER_OF_ACTIVE_LAMBDA connections.
So unless you limit the number of Lambdas you have using reserved concurrency, Lambdas are simply not a good fit for running a PG pool locally inside the Lambda for web stuff.
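The arithmetic behind that, as a sketch. PostgreSQL's stock default for `max_connections` is 100 (managed services often derive it from instance size instead); the pool size and the number of connections reserved for other clients are hypothetical.

```python
def total_connections(pool_size: int, active_lambdas: int) -> int:
    """Connections opened when every execution environment holds its own pool."""
    return pool_size * active_lambdas


def max_safe_concurrency(db_max_connections: int, reserved_for_others: int,
                         pool_size: int) -> int:
    """Largest reserved concurrency that keeps all Lambda pools within the DB's limit."""
    return max(0, (db_max_connections - reserved_for_others) // pool_size)


print(total_connections(pool_size=5, active_lambdas=200))  # far past a limit of 100
print(max_safe_concurrency(db_max_connections=100, reserved_for_others=10, pool_size=5))
```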
If you have a batch job, that runs once a day, with a single lambda, inserts a bunch of records in to postgres or similar, then a small pool in the lambda is fine.
So a lot depends on what you are doing in the Lambda, how many you have, and what downstream service you are connecting to.
As you might have guessed, connection pools don't work well with Lambda. For database connection pool, they came up with another chargeable service, RDS Proxy, to fix the problem created by Lambda :)
Function instances should stay alive for a certain amount of time. But you will pay the cold start price one way or another, so naively using Spring Boot is probably not a good idea. Spring Boot Native might help here, but I haven't tried it yet.
But is the web server spun up and kept up somewhere magically? Another poster mentioned something about an ECS task. The reason I am asking is I genuinely don't understand, I thought lambda as an abstraction was all about something which spins up and down when finished.