There are three core elements to the AWS Lambda invocation lifecycle: the Lambda Service, the Runtime, and the User Function. These are not all the elements, but the minimum required to execute application code. Understanding how they interact makes certain limitations of the service much easier to reason about.
The Lambda Runtime API consists of three endpoints: one for requesting work, i.e. the event that triggers the User Function, and two for reporting the result, one for success and one for error.
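Concretely, these are the documented paths of API version 2018-06-01. The sketch below just builds the three URLs from the AWS_LAMBDA_RUNTIME_API environment variable the service provides; the default fallback address is my own placeholder for running outside Lambda:

```python
import os

# The Lambda Service exposes its address via this environment variable,
# e.g. "127.0.0.1:9001" (host:port, no scheme). The fallback here is
# only a placeholder for running the sketch outside Lambda.
api = os.environ.get("AWS_LAMBDA_RUNTIME_API", "127.0.0.1:9001")
base = f"http://{api}/2018-06-01/runtime"

def next_invocation_url() -> str:
    # GET: long-poll for the next invocation event
    return f"{base}/invocation/next"

def response_url(request_id: str) -> str:
    # POST: report a successful result for a given invocation
    return f"{base}/invocation/{request_id}/response"

def error_url(request_id: str) -> str:
    # POST: report a failure for a given invocation
    return f"{base}/invocation/{request_id}/error"
```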
Responsibilities of each component
Let’s first see what each element is responsible for.
The Lambda Service is basically a queue. It receives invocation events, whether from an API Gateway, an EventBridge schedule, or some other Event Source Mapping. When the first invocation is received, it creates the execution environment and spins up a Runtime to start processing the work in the queue. It also serves as an HTTP server exposing the three endpoints mentioned previously.
The Runtime is the brain of this system, but it's basically just a loop. It requests work, i.e. an event, from the Lambda Service and waits for a reply for as long as it takes. In practice it performs long polling, which blocks the runtime thread. The runtime does not handle concurrency: it works on a single thread, one event at a time.
Once it receives an event, it executes the User Function and waits for the result. Depending on the result, it calls the relevant endpoint for success or error, and then requests more work. Rinse and repeat until the execution environment is reclaimed by the Lambda platform.
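The loop described above can be sketched as follows. This is an illustrative skeleton, not the real bootstrap: the HTTP calls are abstracted behind injected callables (get_next_event, post_success, post_error are my own names) so the control flow stands out, and max_iterations exists only so the sketch can terminate.

```python
from typing import Any, Callable, Optional

def run_loop(
    get_next_event: Callable[[], tuple],        # blocking long-poll: returns (request_id, event)
    post_success: Callable[[str, Any], None],   # would POST .../invocation/{id}/response
    post_error: Callable[[str, Exception], None],  # would POST .../invocation/{id}/error
    handler: Callable[[dict], Any],             # the User Function
    max_iterations: Optional[int] = None,       # None = loop until the environment is reclaimed
) -> None:
    """Single-threaded, pull-based runtime loop: one event at a time."""
    done = 0
    while max_iterations is None or done < max_iterations:
        request_id, event = get_next_event()  # blocks until the Lambda Service has work
        try:
            result = handler(event)           # invoke the User Function
        except Exception as exc:
            post_error(request_id, exc)       # report failure, then ask for more work
        else:
            post_success(request_id, result)  # report success, then ask for more work
        done += 1
```

In a real bootstrap, get_next_event would issue a GET against the Runtime API's invocation/next endpoint and block there; the runtime only ever pulls, it is never pushed to.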
In short, the Lambda Service is responsible for orchestrating the creation of new execution environments, the runtime is responsible for orchestrating the invocations of the User Functions, and the User Function is the application code.
Why Lambda uses a pull model
You may have noticed I’ve emphasized that it’s the runtime that requests work rather than the work being scheduled or pushed to the runtime. It’s an important distinction as this means it is a pull-based system and the runtime decides when it’s ready to accept the next event. This has a couple of benefits:
- backpressure — a runtime can’t be overloaded. If it’s processing an event it won’t be flooded with other events. A push-based system would require more complicated flow control
- simpler scaling — the runtime does not have to handle concurrency. To scale the runtime, the Lambda service simply creates new execution environments i.e., scales horizontally
HTTP Runtime API
Communication with the Lambda Service happens over HTTP, but it is always initiated by the runtime. This means the runtime container only needs outbound network access. No inbound ports need to be exposed.
This simplifies container security, networking, firewall rules and sandboxing.
If AWS used a push model, the container would need a publicly reachable endpoint, which is not desirable in a multi-tenant environment.
This also makes it easy to implement new runtimes. All that is required is implementing a small HTTP client that interacts with a handful of endpoints rather than relying on messaging systems or custom binary protocols. HTTP keeps the barrier to entry low.
Cold vs warm invocations
Now it should be quite clear what causes cold and warm Lambdas.
A cold Lambda occurs when no execution environment exists at the time the invocation event reaches the Lambda Service, so a new one has to be created. Once the runtime starts and processes the first event, it immediately performs a thread-blocking long-polling request to the Lambda Service to get the next invocation.
From that point on, the runtime is essentially sitting there waiting for work. When the Lambda Service receives another invocation event, it doesn’t have to create a new execution environment and simply returns the event through the already open HTTP connection.
This is what we refer to as a warm Lambda — the execution environment already exists and the runtime is already waiting for the next event.
Runtime errors
When the runtime requests an invocation event, it receives a Lambda-Runtime-Deadline-Ms header which specifies the timestamp by which the response must be sent back to the Lambda Service. This is the Lambda timeout, and it can be enforced both by the runtime and by the Lambda platform. If the User Function takes too long to execute, the runtime enforces the deadline and sends a timeout error response.
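As a sketch of how a runtime can work with this: the Lambda-Runtime-Deadline-Ms header carries a Unix epoch timestamp in milliseconds, so the remaining budget is just that value minus the current time. The header name is real; the helper below is my own illustration.

```python
import time
from typing import Optional

def remaining_seconds(deadline_ms: int, now: Optional[float] = None) -> float:
    """Seconds left before the Lambda Service's deadline for this invocation.

    deadline_ms comes from the Lambda-Runtime-Deadline-Ms header on the
    response to the next-invocation request, and is a Unix epoch
    timestamp in milliseconds. A negative result means the deadline
    has already passed.
    """
    now = time.time() if now is None else now
    return deadline_ms / 1000.0 - now
```

A runtime can use this value to put a timeout on the User Function's execution, so it can report a timeout error instead of waiting for the platform to reclaim the environment.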
However, what happens when the runtime itself crashes? In this case the User Function crashes with it, as it's part of the same process. Lambda receives this information immediately and doesn't have to wait for the success/error message or a timeout, so it can clean up the execution environment right away.
Lambda does not guarantee atomic execution. When there’s a crash or timeout, there are no guardrails preventing partial state updates or other side effects. Depending on the event source, the Lambda Service may retry the invocation, which means the User Function can run again after a crash or timeout.
That’s why it’s important to design Lambdas with idempotency in mind.
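A minimal sketch of the idea: record each request ID when its work completes, and short-circuit a redelivery of the same ID. The in-memory dict here is a stand-in for something durable, e.g. a conditional database write; IdempotentHandler and the charge example are my own illustrations, not Lambda APIs.

```python
class IdempotentHandler:
    """Wraps a handler so a retried event (same request_id) is processed once.

    The dict is illustrative only: a real Lambda must use durable, shared
    storage (e.g. a conditional write keyed on the request ID), because a
    retry may land on a fresh execution environment with empty memory.
    """

    def __init__(self, handler):
        self._handler = handler
        self._completed = {}  # request_id -> stored result

    def __call__(self, request_id: str, event: dict):
        if request_id in self._completed:
            # Retry of an already-processed event: return the recorded
            # result without repeating the side effects.
            return self._completed[request_id]
        result = self._handler(event)        # first delivery: do the work
        self._completed[request_id] = result  # record completion last
        return result
```

Wrapping a hypothetical payment handler this way means a retried invocation returns the original result instead of charging twice.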
Summary
AWS Lambda invocation is driven by a simple pull-based loop between the runtime and the Lambda Service. The runtime continuously requests work using the Runtime API, executes the user function for each event, and reports the result back to the service. This model simplifies scaling, avoids concurrency inside the runtime, and explains why cold and warm invocations behave differently.