AWS Lambda, Performance, FinOps
Cutting auth p95 from 600 ms to under 120 ms
Right-sizing and ARM64 cutover on a Lambda authorizer that gated every request.
Client
Compliance SaaS, EU
Duration
2 weeks
Year
2026
The problem
Every API call hit a custom Lambda authorizer before reaching the business logic. The authorizer was provisioned at 256 MB on x86, so token validation, JWKS fetch, and tenant lookup ran on a starved CPU slice. p95 sat between 200 and 600 ms on cached paths and far worse on cold starts. The whole product felt slow even when downstream services responded in tens of milliseconds.
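To make the hot path concrete, here is a minimal sketch of what a request-gating Lambda authorizer looks like. All names are illustrative, not the client's code; the point is that token validation and tenant lookup run on every single API call, so a starved CPU slice taxes the whole product.

```python
# Illustrative sketch of a Lambda token authorizer. The stubs stand in for
# the expensive work: JWT validation (a JWKS fetch on cache misses) and a
# tenant-config lookup. Both ran on an underprovisioned CPU slice.

def validate_token(token):
    # Placeholder for JWT signature verification against a JWKS key set.
    # Here the tenant id is just the prefix of a fake "tenant.token" string.
    return {"sub": "user-123", "tenant": token.split(".")[0]}

def lookup_tenant(tenant_id):
    # Placeholder for a per-tenant config lookup (often a database call).
    return {"tenant_id": tenant_id, "plan": "enterprise"}

def handler(event, context):
    claims = validate_token(event["authorizationToken"])
    tenant = lookup_tenant(claims["tenant"])
    effect = "Allow" if tenant else "Deny"
    # Standard API Gateway authorizer response: an IAM policy document.
    return {
        "principalId": claims["sub"],
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
        "context": {"tenantId": tenant["tenant_id"]},
    }
```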
The solution
Profiled the authorizer end to end, then made three surgical changes. Raised memory from 256 MB to 1024 MB so CPU scaled with it, cut the function plus 27 siblings in the identity service over to ARM64 Graviton 3, and added a warm cache for JWKS and tenant config keyed by a short TTL. Validated with synthetic load and a one-week canary at 10 percent before full rollout, then applied the same memory and architecture change across 99 Lambdas in 5 services with no functional code changes.
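The warm cache can be sketched as a module-level dict with a short TTL. This is a minimal illustration, assuming an in-memory cache and illustrative names; the client's actual TTL values and fetchers are not shown here.

```python
import time

# Module-level state survives across warm invocations of the same Lambda
# sandbox, so repeat calls skip the JWKS / tenant-config round trip.
_CACHE = {}  # key -> (expiry_epoch, value)

def cached(key, fetch, ttl_seconds=60):
    """Return a cached value, calling fetch() only after the TTL lapses."""
    now = time.time()
    hit = _CACHE.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]
    value = fetch()  # e.g. the JWKS HTTP fetch or tenant lookup
    _CACHE[key] = (now + ttl_seconds, value)
    return value
```

Because the dict lives for the lifetime of the warm execution environment, the short TTL bounds staleness while keeping the common path entirely in memory.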
Architecture
Stack
Outcomes
from 600 ms to under 120 ms
Auth p95
28 in identity, 99 fleet wide
Functions migrated to ARM64
zero in production
Cold start regressions