Add Lambda rollout with CodeDeploy traffic shifting, smoke tests, and alarm-based rollbacks #6721

Open
anatolzak wants to merge 5 commits into anomalyco:dev from anatolzak:feat-lambda-rollback

@anatolzak
Contributor

@anatolzak anatolzak commented Apr 11, 2026

closes #6719
closes #6720
closes #6722

Summary

Adds rollout support to sst.aws.Function using AWS CodeDeploy for controlled traffic shifting on deploy. This enables smoke testing new versions before traffic reaches them, gradual canary/linear deployments, and automatic rollback on CloudWatch alarm triggers.

Demos

  • Smoke Test — Function URL
smoke.test.function.url.mp4
  • Smoke Test — HTTP API
smoke.test.api.gateway.http.api.mp4
  • Smoke Test — Router
smoke.test.router.mp4
  • Smoke Test — Direct Lambda Invoke Calls
smoke.test.lambda.sdk.mp4
  • Lambda Rollout (canary with alarms)
canary.deployment.mp4

Changes

New components and SDK

  • FunctionRollout component (function-rollout.ts) — encapsulates all CodeDeploy resources (application, deployment group, deployment config, IAM role, deployment, waiter)
  • CodeDeployLambdaDeployment dynamic provider (codedeploy-lambda-deployment.ts) — creates CodeDeploy deployments via RPC
  • CodeDeployDeploymentWaiter dynamic provider (codedeploy-deployment-waiter.ts) — polls deployment status
  • rollout SDK (sdk/js/src/aws/rollout.ts) — rollout.handler() for typed lifecycle hook events, rollout.report() to report status back to CodeDeploy without needing the AWS SDK
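
For context on what a hook ultimately has to report: CodeDeploy invokes the hook Lambda with a DeploymentId and a LifecycleEventHookExecutionId, and expects a Succeeded or Failed status back through PutLifecycleEventHookExecutionStatus. The sketch below only builds that payload. It is not the code behind rollout.handler() or rollout.report(); buildHookStatus and runHook are hypothetical names used for illustration.

```typescript
// Event fields CodeDeploy passes to a Lambda lifecycle hook.
interface LifecycleHookEvent {
  DeploymentId: string;
  LifecycleEventHookExecutionId: string;
}

// Input shape of CodeDeploy's PutLifecycleEventHookExecutionStatus call.
interface HookStatusInput {
  deploymentId: string;
  lifecycleEventHookExecutionId: string;
  status: "Succeeded" | "Failed";
}

// Hypothetical helper: turn a smoke-test outcome into the status report.
function buildHookStatus(
  event: LifecycleHookEvent,
  passed: boolean,
): HookStatusInput {
  return {
    deploymentId: event.DeploymentId,
    lifecycleEventHookExecutionId: event.LifecycleEventHookExecutionId,
    status: passed ? "Succeeded" : "Failed",
  };
}

// Hypothetical wrapper: run a smoke test and treat any thrown error as a failure.
async function runHook(
  event: LifecycleHookEvent,
  smokeTest: () => Promise<void>,
): Promise<HookStatusInput> {
  try {
    await smokeTest();
    return buildHookStatus(event, true);
  } catch {
    return buildHookStatus(event, false);
  }
}
```

Sending the payload is left out on purpose; per the description above, the SDK takes care of that step so hook authors do not need the AWS SDK.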

Function component changes (function.ts)

  • New rollout property on FunctionArgs — supports all-at-once, canary, and linear strategies with before/after traffic hooks, alarms, SNS notifications, and conflict handling
  • addRollout() method — deferred rollout configuration for when hook functions need to reference the parent function
  • latestUrl — optional function URL pointing to the latest published version (via alias), separate from the stable url
  • targetArn / latestTargetArn — getters that resolve to the correct alias ARN for event source subscriptions
  • latestQualifier — exposed in getSSTLink for invoking the latest version via the Lambda SDK
  • Rollout transforms moved to FunctionArgs.transform.rollout (not inside rollout.transform)
  • latestAlias exposed in nodes getter
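
To make the three strategies concrete, here is a rough model of how traffic moves over time under each one, following CodeDeploy's semantics (canary shifts a first slice and then the rest after one interval; linear shifts equal slices at each interval). The field names percentage and intervalMinutes are assumptions for this sketch and may not match the actual rollout property.

```typescript
// Hypothetical strategy shapes; the real `rollout` args may differ.
type Strategy =
  | { type: "all-at-once" }
  | { type: "canary"; percentage: number; intervalMinutes: number }
  | { type: "linear"; percentage: number; intervalMinutes: number };

interface Step {
  minute: number; // minutes since the deployment started shifting traffic
  percent: number; // share of traffic on the new version from that point on
}

function trafficSchedule(s: Strategy): Step[] {
  if (s.type === "all-at-once") return [{ minute: 0, percent: 100 }];

  if (s.percentage <= 0 || s.percentage > 100) {
    throw new Error("percentage must be in (0, 100]");
  }

  if (s.type === "canary") {
    // One small slice first, everything else after a single interval.
    return [
      { minute: 0, percent: s.percentage },
      { minute: s.intervalMinutes, percent: 100 },
    ];
  }

  // Linear: add the same slice every interval until 100% is reached.
  const steps: Step[] = [];
  for (let i = 1; ; i++) {
    const percent = Math.min(100, s.percentage * i);
    steps.push({ minute: (i - 1) * s.intervalMinutes, percent });
    if (percent >= 100) break;
  }
  return steps;
}
```

Alarms and hooks are not modeled here; in a real deployment either one can stop the schedule early and send traffic back to the previous version.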

Event source compatibility

  • All subscriber/route types (Queue, Bucket, SnsTopic, Dynamo, KinesisStream, ApiGatewayV2, ApiGatewayV1, Realtime, Cron, CognitoUserPool, etc.) now accept Function instances in their type signatures. By default, the components target the stable alias through targetArn.
  • lambda.Permission — added qualifier to all permission resources across the codebase, including ssr-site.ts
  • Realtime — migrated from Function.fromDefinition to functionBuilder, uses targetArn for IoT authorizer

Go changes

  • aws-codedeploy.go — shared utilities with struct-based inputs: handleDeploymentConflict, stopDeployment, findActiveDeployment (with pagination), createDeployment, buildAppSpec
  • aws-codedeploy-lambda-deployment.go — Lambda-specific deployment handler that builds AppSpec and manages version diffing
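
For readers unfamiliar with the AppSpec that a Lambda deployment needs, the structure is small enough to sketch. The version below is a TypeScript illustration of the documented AppSpec shape with a simple version diff, not a port of the Go buildAppSpec in this PR; the input field names are assumptions.

```typescript
// Hypothetical input; field names are illustrative only.
interface AppSpecInput {
  functionName: string;
  alias: string;
  currentVersion: string;
  targetVersion: string;
  beforeAllowTraffic?: string; // name or ARN of the before-traffic hook function
  afterAllowTraffic?: string; // name or ARN of the after-traffic hook function
}

interface LambdaResource {
  Type: "AWS::Lambda::Function";
  Properties: {
    Name: string;
    Alias: string;
    CurrentVersion: string;
    TargetVersion: string;
  };
}

type Hook = { BeforeAllowTraffic: string } | { AfterAllowTraffic: string };

interface AppSpec {
  version: number;
  Resources: Record<string, LambdaResource>[];
  Hooks?: Hook[];
}

// Returns undefined when the alias already points at the target version,
// since there is no traffic to shift in that case.
function buildAppSpec(input: AppSpecInput): AppSpec | undefined {
  if (input.currentVersion === input.targetVersion) return undefined;

  const hooks: Hook[] = [];
  if (input.beforeAllowTraffic) {
    hooks.push({ BeforeAllowTraffic: input.beforeAllowTraffic });
  }
  if (input.afterAllowTraffic) {
    hooks.push({ AfterAllowTraffic: input.afterAllowTraffic });
  }

  const spec: AppSpec = {
    version: 0.0, // the AppSpec reference documents this value as 0.0
    Resources: [
      {
        [input.functionName]: {
          Type: "AWS::Lambda::Function",
          Properties: {
            Name: input.functionName,
            Alias: input.alias,
            CurrentVersion: input.currentVersion,
            TargetVersion: input.targetVersion,
          },
        },
      },
    ],
  };
  if (hooks.length > 0) spec.Hooks = hooks;
  return spec;
}
```

Serializing the object and passing it to CreateDeployment as the revision content is left to the caller.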

Examples

  • aws-lambda-rollout — full canary rollout with CloudWatch alarms and SNS notifications
  • aws-lambda-smoke-test — before-traffic hook using Lambda SDK invoke
  • aws-lambda-smoke-test-function-url — before-traffic hook using function URL
  • aws-lambda-smoke-test-http-api — smoke test with API Gateway HTTP API
  • aws-lambda-smoke-test-router — smoke test with SST Router

Testing locally

The sst/aws/rollout SDK is new and not yet published. To test examples locally:

bun run setup
bun run build:platform
cd sdk/js && bun run build

# In each example directory:
bun install
rm -rf node_modules/sst && ln -s ../../../sdk/js node_modules/sst
go run ../../cmd/sst install
go run ../../cmd/sst deploy

Notes

  • Rollout only triggers when function code changes between deploys. During sst dev, the deployed code is a stub that never changes, so rollout is effectively a no-op. The examples set dev: false so that real code is deployed and the rollout actually runs.
  • When connecting a function with rollout to event sources, pass the function directly or use fn.targetArn. Using fn.arn bypasses rollout entirely.
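
The reason fn.arn bypasses rollout is that an unqualified Lambda ARN resolves to $LATEST, while CodeDeploy only shifts traffic on an alias. A small sketch of the difference, using plain string handling; the alias name "live" in the usage note is a placeholder, not necessarily the alias this PR creates.

```typescript
// Append a version or alias qualifier to an unqualified function ARN.
function qualifyArn(functionArn: string, qualifier: string): string {
  return `${functionArn}:${qualifier}`;
}

// An unqualified function ARN has 7 colon-separated parts
// (arn:aws:lambda:region:account:function:name); a qualifier adds an 8th.
function getQualifier(arn: string): string | undefined {
  const parts = arn.split(":");
  return parts.length === 8 ? parts[7] : undefined;
}

// True when the ARN carries no qualifier, so invocations would hit $LATEST
// instead of the alias that CodeDeploy is shifting.
function bypassesRollout(arn: string): boolean {
  return getQualifier(arn) === undefined;
}
```

For example, bypassesRollout applied to an ARN ending in :function:my-fn is true, and becomes false once qualifyArn has appended ":live".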

Note on failed deployments

If a deployment fails (e.g., the before-traffic hook reports failure or an alarm triggers a rollback) and wait: true is set, the sst deploy process errors out with a message pointing to the CodeDeploy deployment in the AWS console.

If you re-run sst deploy without changing any function code, the deployment waiter will detect the deployment ID hasn't changed and skip the wait — so the deploy will succeed cleanly without re-triggering the failed deployment. Once you push a code fix and deploy again, a new CodeDeploy deployment is created and validated normally.
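
The behavior described above can be summarized as a small decision function. This is a sketch of the logic as described, not the waiter's actual code; waiterAction and its return values are hypothetical, while the status names are CodeDeploy's own deployment statuses.

```typescript
// Deployment statuses reported by CodeDeploy.
type DeploymentStatus =
  | "Created"
  | "Queued"
  | "InProgress"
  | "Baking"
  | "Ready"
  | "Succeeded"
  | "Failed"
  | "Stopped";

type WaiterAction = "skip" | "wait" | "done" | "error";

// Hypothetical decision step for a polling waiter.
function waiterAction(
  previousDeploymentId: string | undefined,
  currentDeploymentId: string,
  status: DeploymentStatus,
): WaiterAction {
  // Same deployment as the last deploy: nothing new to validate.
  if (previousDeploymentId === currentDeploymentId) return "skip";
  if (status === "Succeeded") return "done";
  // A rollback or a manual stop ends the deployment without success.
  if (status === "Failed" || status === "Stopped") return "error";
  // Every other status means the deployment is still progressing.
  return "wait";
}
```

Under this model, re-running a deploy after a failure with unchanged code yields "skip", and a later deploy with new code gets a fresh deployment ID and goes through "wait" again.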

@anatolzak
Contributor Author

Hey @vimtor! I noticed you recently added the targetArn getter to the Function component so that all event sources use the function version directly. That was a huge help for the rollout work I just put up.

The PR adds full CodeDeploy-managed rollout support to sst.aws.Function: canary, linear, and all-at-once deployment strategies with before/after traffic hooks for smoke testing, CloudWatch alarm-based automatic rollbacks, SNS notifications, and a new sst/aws/rollout SDK for reporting before/after traffic hook status, which lets developers avoid any interaction with the CodeDeploy SDK.

With rollout enabled, targetArn resolves to the stable alias ARN managed by CodeDeploy, so all event sources automatically invoke the validated version.

In the future, we could also support ECS services through CodeDeploy.

@vimtor
Collaborator

vimtor commented Apr 13, 2026

thanks for your contribution @anatolzak

this looks fantastic, your pull requests are always top tier

i'm a bit worried about the scope for this one. the function component is already very complex

i'm wondering what the common use case is here. probably doing a gradual rollout of the new function code? i'm not convinced about the other things (sdk, custom alarms or function hooks)

i'm not super familiar with codedeploy but my guess is that in 99% of cases people just want to ensure that the new lambda doesn't error

what do you think?
