AWS Lambda, Performance, FinOps
Cutting auth p95 from 600 ms to under 120 ms
Right-sizing and ARM64 cutover on a Lambda authorizer that gated every request.
Client
Compliance SaaS, EU
Duration
2 weeks
Year
2026
The problem
Every API call hit a custom Lambda authorizer before reaching the business logic. The authorizer was provisioned at 256 MB on x86, so token validation, JWKS fetch, and tenant lookup ran on a starved CPU slice. p95 sat between 200 and 600 ms on cached paths and far worse on cold starts. The whole product felt slow even when downstream services responded in tens of milliseconds.
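To make the hot path concrete, here is a minimal sketch of what a request-gating Lambda authorizer looks like. All names are illustrative, not the client's code; the point is that token validation and tenant lookup run on every single API call, so a starved CPU slice taxes the whole product.

```python
# Illustrative sketch of a Lambda token authorizer. The stubs stand in for
# the expensive work: JWT validation (a JWKS fetch on cache misses) and a
# tenant-config lookup. Both ran on an underprovisioned CPU slice.

def validate_token(token):
    # Placeholder for JWT signature verification against a JWKS key set.
    # Here the tenant id is just the prefix of a fake "tenant.token" string.
    return {"sub": "user-123", "tenant": token.split(".")[0]}

def lookup_tenant(tenant_id):
    # Placeholder for a per-tenant config lookup (often a database call).
    return {"tenant_id": tenant_id, "plan": "enterprise"}

def handler(event, context):
    claims = validate_token(event["authorizationToken"])
    tenant = lookup_tenant(claims["tenant"])
    effect = "Allow" if tenant else "Deny"
    # Standard API Gateway authorizer response: an IAM policy document.
    return {
        "principalId": claims["sub"],
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
        "context": {"tenantId": tenant["tenant_id"]},
    }
```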
The solution
Profiled the authorizer end to end, then made three surgical changes. Raised memory from 256 MB to 1024 MB so CPU scaled with it, cut the function plus 27 siblings in the identity service over to ARM64 Graviton 3, and added a warm cache for JWKS and tenant config keyed by a short TTL. Validated with synthetic load and a one-week canary at 10 percent before full rollout, then applied the same memory and architecture change across 99 Lambdas in 5 services with no functional code changes.
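The warm cache can be sketched as a module-level dict with a short TTL. This is a minimal illustration, assuming an in-memory cache and illustrative names; the client's actual TTL values and fetchers are not shown here.

```python
import time

# Module-level state survives across warm invocations of the same Lambda
# sandbox, so repeat calls skip the JWKS / tenant-config round trip.
_CACHE = {}  # key -> (expiry_epoch, value)

def cached(key, fetch, ttl_seconds=60):
    """Return a cached value, calling fetch() only after the TTL lapses."""
    now = time.time()
    hit = _CACHE.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]
    value = fetch()  # e.g. the JWKS HTTP fetch or tenant lookup
    _CACHE[key] = (now + ttl_seconds, value)
    return value
```

Because the dict lives for the lifetime of the warm execution environment, the short TTL bounds staleness while keeping the common path entirely in memory.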
Architecture
Stack
Outcomes
from 600 ms to under 120 ms
Auth p95
28 in identity, 99 fleet wide
Functions migrated to ARM64
zero in production
Cold start regressions