Most Well-Architected Review guides describe the six pillars in broad strokes. Very few give you the actual questions you'll face in the AWS console - which is exactly what you need if you're preparing for a review, running one, or trying to convince your team to do one.
In this guide, you'll find all 57 current review questions organized by pillar, the step-by-step process to run a review, and concrete CDK examples for fixing the most common High-Risk Issues (HRIs).
I run WAFRs as part of Landing Zone and security review engagements, so this comes from practice, not from paraphrasing the AWS docs. The GitHub Gist that still ranks on the first page for this topic dates from 2018 - it's missing the entire Sustainability pillar and reflects a framework version that has since been significantly updated. This checklist is based on the November 6, 2024 framework revision, updated with 78 new best practices in April 2025.
Jump directly to the checklist if you already know what a WAFR is.
What Is a Well-Architected Review?
A Well-Architected Review (WAFR) is a structured, blame-free conversation built around a set of questions tied to AWS best practices. It is not an audit. There is no pass/fail. Teams take responsibility for the quality of their architecture, and the review surfaces what to address - prioritized by risk.
The framework was built from the experience of AWS Solutions Architects who have reviewed thousands of customer architectures. It's designed for CTOs, cloud architects, developers, and operations teams. The goal is to identify architectural and operational choices that might cause significant negative business impact before those choices cause an incident.
One concept from the official guidance worth internalizing: the distinction between one-way doors and two-way doors. A two-way door is a reversible decision - you can change it later without major consequences, so a lightweight review process is fine. A one-way door is a hard-to-reverse architectural choice (like your data model, your account structure, or your choice of managed database service). One-way doors deserve more inspection before you make them. Reviews applied at design time help you identify which decisions you're about to make that fall into that category.
The six cross-pillar design principles that apply regardless of which pillar you're reviewing:
- Stop guessing capacity needs - scale in and out automatically
- Test systems at production scale, then decommission test environments
- Automate with architectural experimentation in mind
- Build for evolutionary architectures - data-driven design changes are lower risk in the cloud
- Drive architecture decisions using data from how your workload actually behaves
- Improve through game days - simulate events in production to build organizational experience
Framework vs. Tool: What's the Difference?
Two things share the "Well-Architected" name and they work together, but they're not the same thing.
The Well-Architected Framework is the document - a whitepaper that defines the six pillars, the 57 questions, and the best practices. You can read the entire framework on the AWS docs site without touching any AWS service.
The AWS Well-Architected Tool is the free console service that administers the review. You define a workload, answer the framework questions in the console, and the Tool tracks your answers, flags High-Risk Issues (HRIs) and Medium-Risk Issues (MRIs), and generates an improvement plan with AWS-recommended remediation steps. There is no charge for the Tool itself - you only pay for underlying AWS resources. I've seen teams assume the Tool has a licensing cost and put off reviews for that reason. It doesn't.
The Tool also supports lenses (specialist extensions for specific workload types), profiles, Jira sync, cross-account sharing via AWS Organizations, and Trusted Advisor integration (Business or Enterprise Support plan required for Trusted Advisor).
How Long Does a Well-Architected Review Take?
The review itself is designed to be hours, not days - the official AWS guidance is explicit on this. A full cycle including preparation, review sessions across all six pillars, HRI prioritization, and an initial improvement plan typically takes 2-4 weeks calendar time, but the actual working time is much less.
The practical approach: spread the six pillars across multiple shorter sessions rather than trying to do everything in one marathon meeting. Two or three 90-minute sessions work better than one six-hour block. The post-review process follows a rough timeline: a recap email with findings on day 1, HRI prioritization on days 2-3, and a 90- or 180-day improvement plan window starting in week 1.
The Business Case: Cost, Credits, and ROI
Before you start working through 57 questions, it's worth understanding what you're actually getting for the investment.
The Tool is free. That's the baseline. No software cost, no licensing. The only cost of a self-directed review is the time of the people doing it.
Partner-led reviews are often free for qualifying AWS customers. AWS Partners can deliver WAFRs at no cost as part of the AWS Partner program. But the most overlooked part of the business case: completing a before-and-after Well-Architected Review with an AWS Partner can unlock up to $5,000 in AWS service credits for qualifying customers through the AWS Well-Architected Partner Program. This is almost never mentioned in WAFR content, and it's a meaningful financial incentive that effectively pays you to do the review.
The risk-reduction case is harder to quantify but real. HRIs represent architectural choices that AWS has found to result in significant negative business impact. Identifying even one critical HRI - no Multi-AZ for a production database, no backup policy, public S3 buckets containing sensitive data - before it causes an incident is easily ROI-positive. I've seen teams where a single WAFR finding, fixed before go-live, prevented what would have been a serious data exposure.
Self-Assessment vs. Partner-Led Review
There are two ways to run a WAFR: on your own using the free AWS WA Tool, or with an AWS Partner. Here's when each makes sense:
| Factor | Self-Assessment | Partner-Led |
|---|---|---|
| Cost | Free | Often free for qualifying customers |
| Time | Your team's time | Shared between teams |
| Objectivity | Limited (you review your own work) | High (external perspective) |
| AWS credit eligibility | Not eligible | Eligible (up to $5,000) |
| Depth of findings | Depends on team knowledge | Consistent with framework expertise |
Self-assessment works well when: you have a team with strong AWS knowledge, you're doing an initial baseline review to understand your current state, or you're working through a smaller or less-critical workload.
Partner-led makes more sense when: you're preparing for a compliance audit, doing due diligence ahead of a fundraise or acquisition, after a security incident, or when the credit ROI justifies the engagement. An external reviewer will also catch things your team has normalized - the things you've lived with so long you've stopped seeing them as problems.
The Complete WAFR Checklist: All 57 Questions by Pillar
This is the artifact. All 57 current review questions from the November 2024 framework revision, organized by pillar and best-practice area. These are the exact questions the AWS Well-Architected Tool presents when you work through a review.
Each question has multiple best-practice choices in the Tool. When you can't check any of the choices for a given question, or when you select "None of these apply," the Tool flags the question as a risk. The questions themselves are the starting point - the choices under each question define what "good" looks like.
| Pillar | Questions |
|---|---|
| Operational Excellence (OPS) | 11 |
| Security (SEC) | 11 |
| Reliability (REL) | 13 |
| Performance Efficiency (PERF) | 5 |
| Cost Optimization (COST) | 11 |
| Sustainability (SUS) | 6 |
| Total | 57 |
A note on trade-offs: being more reliable often costs more. Optimizing cost may reduce sustainability investment. The framework doesn't expect you to maximize every pillar simultaneously - it expects you to make these trade-offs explicitly and document them.
Operational Excellence (OPS) - 11 Questions
Operational Excellence is a commitment to building software correctly while consistently delivering a great customer experience. The four best-practice areas are Organization, Prepare, Operate, and Evolve.
Design principles worth internalizing for this pillar: organize teams around business outcomes (not functions), implement observability for actionable insights (not just logs), make frequent small reversible changes (not big bang deployments), and learn from all operational events - including the ones that didn't cause incidents.
| ID | Area | Question |
|---|---|---|
| OPS 1 | Organization | How do you determine what your priorities are? |
| OPS 2 | Organization | How do you structure your organization to support your business outcomes? |
| OPS 3 | Organization | How does your organizational culture support your business outcomes? |
| OPS 4 | Prepare | How do you implement observability in your workload? |
| OPS 5 | Prepare | How do you reduce defects, ease remediation, and improve flow into production? |
| OPS 6 | Prepare | How do you mitigate deployment risks? |
| OPS 7 | Prepare | How do you know that you are ready to support a workload? |
| OPS 8 | Operate | How do you utilize workload observability in your organization? |
| OPS 9 | Operate | How do you understand the health of your operations? |
| OPS 10 | Operate | How do you manage workload and operations events? |
| OPS 11 | Evolve | How do you evolve operations? |
Source: AWS Well-Architected Framework - Operational Excellence appendix
Security (SEC) - 11 Questions
The Security pillar is about protecting data, systems, and assets while using cloud capabilities to improve your security posture. The seven best-practice areas are: Security foundations, Identity and access management, Detection, Infrastructure protection, Data protection, Incident response, and Application security.
The key design principles: implement a strong identity foundation (least privilege, eliminate long-term static credentials), maintain traceability (monitor, alert, and audit in real time), apply security at all layers (defense in depth from the edge of your network down to the application code), and prepare for security events before they happen - not after.
| ID | Area | Question |
|---|---|---|
| SEC 1 | Security foundations | How do you securely operate your workload? |
| SEC 2 | Identity and access management | How do you manage authentication for people and machines? |
| SEC 3 | Identity and access management | How do you manage permissions for people and machines? |
| SEC 4 | Detection | How do you detect and investigate security events? |
| SEC 5 | Infrastructure protection | How do you protect your network resources? |
| SEC 6 | Infrastructure protection | How do you protect your compute resources? |
| SEC 7 | Data protection | How do you classify your data? |
| SEC 8 | Data protection | How do you protect your data at rest? |
| SEC 9 | Data protection | How do you protect your data in transit? |
| SEC 10 | Incident response | How do you anticipate, respond to, and recover from incidents? |
| SEC 11 | Application security | How do you incorporate and validate the security properties of applications throughout the design, development, and deployment lifecycle? |
Source: AWS Well-Architected Framework - Security appendix
Reliability (REL) - 13 Questions
The Reliability pillar covers the ability of a workload to perform its intended function correctly and consistently - including the ability to operate and test the workload through its entire lifecycle. The four best-practice areas are Foundations, Workload architecture, Change management, and Failure management.
One important piece of context for this pillar: Reliability received the most significant update of any pillar in the April 2025 framework refresh. Fourteen of its best practices were updated for the first time since major framework improvements started in 2022, so if you're working from pre-2025 guidance, the best-practice choices in the Tool may look different from what you remember.
Design principles: automatically recover from failure (monitor KPIs and trigger automation when a threshold is breached), test recovery procedures (simulate different failure modes), scale horizontally instead of vertically (eliminate single points of failure), and manage all infrastructure changes through automation.
| ID | Area | Question |
|---|---|---|
| REL 1 | Foundations | How do you manage Service Quotas and constraints? |
| REL 2 | Foundations | How do you plan your network topology? |
| REL 3 | Workload architecture | How do you design your workload service architecture? |
| REL 4 | Workload architecture | How do you design interactions in a distributed system to prevent failures? |
| REL 5 | Workload architecture | How do you design interactions in a distributed system to mitigate or withstand failures? |
| REL 6 | Change management | How do you monitor workload resources? |
| REL 7 | Change management | How do you design your workload to adapt to changes in demand? |
| REL 8 | Change management | How do you implement change? |
| REL 9 | Failure management | How do you back up data? |
| REL 10 | Failure management | How do you use fault isolation to protect your workload? |
| REL 11 | Failure management | How do you design your workload to withstand component failures? |
| REL 12 | Failure management | How do you test reliability? |
| REL 13 | Failure management | How do you plan for disaster recovery (DR)? |
Source: AWS Well-Architected Framework - Reliability appendix
Performance Efficiency (PERF) - 5 Questions
The Performance Efficiency pillar is about using cloud resources efficiently to meet performance requirements, and maintaining that efficiency as demand changes and technologies evolve. Five best-practice areas: Architecture selection, Compute and hardware, Data management, Networking and content delivery, and Process and culture.
PERF is the most concise pillar at five questions, but each covers broad ground. The design principles are worth noting: democratize advanced technologies by consuming complex capabilities as managed services, go global in minutes by deploying across multiple regions, use serverless architectures to remove operational overhead, experiment more often using on-demand resources, and apply mechanical sympathy by understanding how your chosen cloud service is actually designed to work.
| ID | Area | Question |
|---|---|---|
| PERF 1 | Architecture selection | How do you select appropriate cloud resources and architecture for your workload? |
| PERF 2 | Compute and hardware | How do you select and use compute resources in your workload? |
| PERF 3 | Data management | How do you store, manage, and access data in your workload? |
| PERF 4 | Networking and content delivery | How do you select and configure networking resources in your workload? |
| PERF 5 | Process and culture | How do your organizational practices and culture contribute to performance efficiency in your workload? |
Source: AWS Well-Architected Framework - Performance Efficiency appendix
Cost Optimization (COST) - 11 Questions
The Cost Optimization pillar is about running systems to deliver business value at the lowest price point. Five best-practice areas: Cloud Financial Management, Expenditure and usage awareness, Cost-effective resources, Manage demand and supply resources, and Optimize over time.
Design principles: implement Cloud Financial Management as a capability (not just a dashboard), adopt a consumption model (pay only for what you use), measure overall efficiency (business output relative to cost), stop spending on undifferentiated heavy lifting by using managed services, and analyze and attribute expenditure to specific workloads and teams. For a deeper dive on cost findings from a WAFR, the AWS cost optimization checklist maps directly to this pillar.
| ID | Area | Question |
|---|---|---|
| COST 1 | Cloud Financial Management | How do you implement cloud financial management? |
| COST 2 | Expenditure and usage awareness | How do you govern usage? |
| COST 3 | Expenditure and usage awareness | How do you monitor your cost and usage? |
| COST 4 | Expenditure and usage awareness | How do you decommission resources? |
| COST 5 | Cost-effective resources | How do you evaluate cost when you select services? |
| COST 6 | Cost-effective resources | How do you meet cost targets when you select resource type, size and number? |
| COST 7 | Cost-effective resources | How do you use pricing models to reduce cost? |
| COST 8 | Cost-effective resources | How do you plan for data transfer charges? |
| COST 9 | Manage demand and supply resources | How do you manage demand, and supply resources? |
| COST 10 | Optimize over time | How do you evaluate new services? |
| COST 11 | Optimize over time | How do you evaluate the cost of effort? |
Source: AWS Well-Architected Framework - Cost Optimization appendix
Sustainability (SUS) - 6 Questions
Sustainability is the newest of the six pillars, added to the framework in 2021. It focuses on minimizing environmental impacts - particularly energy consumption and efficiency. This is the pillar that the 2018 GitHub Gist doesn't cover at all, and two of the current top-ranking third-party WAFR guides still omit it entirely.
Six best-practice areas: Region selection, Alignment to demand, Software and architecture, Data, Hardware and services, and Process and culture. Key design principles: understand your impact (measure workload output against total environmental cost), establish sustainability goals, maximize utilization by right-sizing and reducing idle resources, use managed services (sharing infrastructure across customers maximizes efficiency), and reduce downstream impact on the customers using your service. Two tools worth knowing for this pillar: the AWS Customer Carbon Footprint Tool for measuring your workload's emissions, and Graviton processors for energy-efficient compute across EC2 and managed services.
| ID | Area | Question |
|---|---|---|
| SUS 1 | Region selection | How do you select Regions for your workload? |
| SUS 2 | Alignment to demand | How do you align cloud resources to your demand? |
| SUS 3 | Software and architecture | How do you take advantage of software and architecture patterns to support your sustainability goals? |
| SUS 4 | Data | How do you take advantage of data management policies and patterns to support your sustainability goals? |
| SUS 5 | Hardware and services | How do you select and use cloud hardware and services in your architecture to support your sustainability goals? |
| SUS 6 | Process and culture | How do your organizational processes support your sustainability goals? |
Source: AWS Well-Architected Framework - Sustainability appendix
How to Run a Well-Architected Review: The 3-Phase Process
Now that you have the questions, here's how to use them. The following summary follows the official AWS review process documentation and the guidance in the WAFR user guide. The WAFR has three formal phases: Prepare, Review, and Improve. The whole thing is designed to be a conversation, not an audit - blame-free, collaborative, focused on understanding the current state of the architecture rather than assigning fault for its shortcomings.
The timing for when to run a review matters as much as how. Reviews are most valuable at three points: during the design phase before you make one-way-door decisions, before go-live, and after significant architectural changes. Teams doing continuous reviews update their answers as the architecture evolves, using milestones in the WA Tool to track improvement over time.
Phase 1 - Prepare
Preparation is where most teams underinvest. The three key elements:
1. Define the workload and scope. The review should target a specific system or service, not your entire AWS account. "All of our infrastructure" is not a useful scope. "The order management service and its dependencies" is. Set this boundary before you open the WA Tool.
2. Align people and culture. Identify who needs to be in the room - the people who understand the architecture decisions, not just the people who manage the account. Communicate the blame-free intent explicitly before you start. If people feel like they're being evaluated, they'll defend decisions instead of honestly assessing them.
3. Gather documentation and infrastructure context. Architecture diagrams, recent incident post-mortems, existing runbooks. Anyone coming in externally to the team (an AWS Partner or someone from another team) needs context. Create the workload in the WA Tool before the review session so you're not configuring the tool during the conversation.
Identify your business outcomes before you start. Common ones: reduce costs, improve security posture, improve customer satisfaction, improve environmental sustainability. These outcomes shape how you prioritize findings afterward.
Source: Preparing for a WAFR
Phase 2 - Run the Review
Work through the questions pillar by pillar in the AWS WA Tool. A few things from the official guidance that actually matter in practice:
Divide the discussion into parts. Six pillars in one sitting is too much. Two or three sessions of 90 minutes each works better. Start with Operational Excellence and Security if you're time-constrained - those are where the highest-density HRIs typically surface.
Take notes in the WA Tool. The notes box next to each question is for capturing context - why you answered the way you did, what compensating controls exist, what the team was uncertain about. Future reviewers (including you in 12 months) will need that context.
A "maybe" means "no." When the team says "we kind of do that" or "mostly," the honest answer is usually "no" in the current state. Don't round up. The improvement plan is where you address the gap.
Don't solve during the session. The goal of the review is to capture the current state accurately, not to brainstorm solutions. When a finding surfaces and someone jumps to "we could fix that by..." - capture it in the notes and move on. Solutions belong in the Improve phase.
Keep the discussion conversational. Paraphrase the questions rather than reading them verbatim from the console. The Tool is an administrative interface; the conversation is the actual review.
Source: Running a WAFR
Phase 3 - Improve
The review session is done. Here's what happens next:
Day 1: Send a recap email to everyone who was in the review. Include who attended, the key findings summary, and a timeline for next steps. Attach the improvement plan from the WA Tool.
Days 2-3: Run an HRI prioritization meeting. Go through every High-Risk Issue and prioritize by three factors: severity of impact if the issue triggers an incident, effort to remediate, and which team owns the fix. This meeting produces a prioritized backlog, not a full remediation plan.
Week 1: Begin the improvement plan. The recommended duration is 90 days for a focused improvement sprint or 180 days for a longer program. Assign HRI ownership. For each priority HRI, the WA Tool links to AWS-recommended remediation guidance.
Ongoing: Build a cadence of follow-up meetings to review remediation progress. Save milestones in the WA Tool as you complete fixes - milestones let you compare before-and-after states and demonstrate improvement to stakeholders. Plan when you'll run the next review.
Source: Improving your workload
Prioritizing Findings: HRIs, MRIs, and What to Fix First
After the review, the AWS WA Tool classifies your findings into risk categories. The API returns five risk levels: UNANSWERED, HIGH, MEDIUM, NONE, and NOT_APPLICABLE.
High-Risk Issues (HRIs) are architectural and operational choices that AWS has found might result in significant negative impact - affecting organizational operations, assets, and individuals. These are the ones that can cause an incident or a breach. Fix these first.
Medium Risk Issues (MRIs) are choices that might negatively impact business but to a lesser extent than HRIs. Important, but not emergency-level.
The prioritization question isn't just "what's HIGH?" - it's "which HRIs would cause the most damage if they triggered an incident today?" Sort by:
- Impact severity: A public S3 bucket with sensitive customer data is more urgent than an untagged EC2 instance, even if both show as HRI.
- Remediation effort: Quick wins - high-impact, low-effort fixes - should go first. Enable CloudTrail multi-region, enable backup retention on RDS, block public S3 access. These take minutes in the console or hours in code.
- Team ownership: Assign each HRI to a team before the prioritization meeting ends. Unassigned findings don't get fixed.
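One way to make the prioritization meeting concrete is to score each finding on the factors above and sort. A minimal TypeScript sketch - the `Finding` shape, the weights, and the scoring formula are my own illustration, not anything the WA Tool outputs:

```typescript
type Finding = {
  id: string;          // WAFR question ID, e.g. "REL 9"
  title: string;
  impact: 1 | 2 | 3;   // 3 = incident- or breach-level damage
  effort: 1 | 2 | 3;   // 1 = minutes in code, 3 = multi-sprint
  owner?: string;      // unassigned findings don't get fixed
};

// Quick wins first: high impact, low effort. Within a score band,
// unowned findings surface first so they get assigned, not forgotten.
function prioritize(findings: Finding[]): Finding[] {
  const score = (f: Finding) => f.impact * 2 - f.effort;
  return [...findings].sort((a, b) => {
    if (score(b) !== score(a)) return score(b) - score(a);
    return Number(a.owner !== undefined) - Number(b.owner !== undefined);
  });
}

const backlog = prioritize([
  { id: 'COST 3', title: 'No cost anomaly detection', impact: 1, effort: 1, owner: 'platform' },
  { id: 'SEC 8', title: 'Public S3 bucket with customer data', impact: 3, effort: 1 },
  { id: 'REL 13', title: 'No tested DR plan', impact: 3, effort: 3, owner: 'platform' },
]);
console.log(backlog.map(f => f.id)); // SEC 8 first: highest impact, lowest effort
```

However you weight it, the point is to leave the meeting with an explicit ordering rather than a flat list of HRIs.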
In most reviews I've run, these are the findings that surface most consistently in the Security and Reliability pillars. They're worth looking for before you even start:
- No Multi-AZ for production databases (REL 10, REL 11) - single point of failure for your most critical data
- No automated backup or untested backup restoration (REL 9) - you have backups but have never verified you can restore from them
- Root account without MFA (SEC 2) - the highest-impact security HRI, and still common. The AWS account best practices post covers root account hardening, including SCP-based enforcement
- Public S3 buckets containing sensitive data (SEC 7, SEC 8) - often created by developers during testing and never cleaned up
- CloudTrail not enabled in all regions (SEC 4) - you can't investigate an incident in a region you're not logging
- No auto-scaling for production workloads (REL 7) - traffic spikes that aren't handled by scaling become outages
- No cost anomaly detection (COST 3) - you find out about a runaway resource when the bill arrives
The AWS Well-Architected Tool's Trusted Advisor integration can supplement your manual review on some of these. It requires AWS Business or Enterprise Support and adds a "Trusted Advisor checks" tab next to each review question - surfacing findings marked "Action recommended" (red), "Investigation recommended" (yellow), or "No problems detected" (green).
For a deeper look at remediating Security pillar findings specifically, the AWS security review checklist maps each of the seven security best-practice areas to concrete remediation steps.
Turning WAFR Findings into IaC Remediation
Here's where most teams lose the value from a WAFR. They finish the review with a list of HRIs, fix them through the AWS console, close the Jira tickets, and declare victory. Six months later, a deployment reverts the configuration, or a new environment is created without the fix, because the change was never codified.
Console fixes drift. CDK fixes stick.
When you map each HRI to an IaC change - a commit to your CDK codebase - the fix is permanent, reviewable, and auditable. It deploys to every environment. It gets reviewed in code review. It doesn't disappear when someone creates a new stack.
This is why I recommend combining WAFRs with AWS CDK best practices and treating WAFR findings as a backlog of infrastructure debt to resolve in code, not in the console. The broader case for why console fixes drift is covered in depth if you need to make the argument to your team.
The WA Tool's Jira integration supports this workflow directly. You can set the sync mode to "Sync workload - Automatic" and questions will appear in Jira in the format [QuestionID] QuestionTitle (e.g., [REL 9] Back up data) with individual choices formatted as [QuestionID | ChoiceID] ChoiceTitle. Link each Jira ticket to the CDK commit that resolves it. Now you have a traceable chain from WAFR finding to deployed infrastructure.
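If you build any tooling around that chain, it's worth emitting finding references in the same bracket format the Jira sync uses, so tickets and commits stay greppable. A small sketch - the helper names and the choice ID are my own; only the bracket format comes from the Tool:

```typescript
// "[REL 9] Back up data" for a question,
// "[REL 9 | some_choice_id] Choice title" for an individual choice.
function questionKey(questionId: string, title: string): string {
  return `[${questionId}] ${title}`;
}

function choiceKey(questionId: string, choiceId: string, title: string): string {
  return `[${questionId} | ${choiceId}] ${title}`;
}

console.log(questionKey('REL 9', 'Back up data')); // [REL 9] Back up data
```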
Common HRIs and Their CDK Fixes
Here are five of the most frequently found HRIs and what the CDK fix looks like in TypeScript. These are working CDK v2 patterns.
REL 9: Back up data - Enable automated RDS backups with point-in-time recovery
```typescript
import * as rds from 'aws-cdk-lib/aws-rds';
import { Duration } from 'aws-cdk-lib';

const database = new rds.DatabaseInstance(this, 'ProductionDatabase', {
  // ...other config
  backupRetention: Duration.days(7),
  enablePerformanceInsights: true,
  deletionProtection: true,
});
```
Without `backupRetention` set explicitly, CDK defaults to 1 day on RDS instances, which is almost always insufficient for production.
SEC 2: Manage authentication - Block root account usage via SCP
Enforcing MFA on root accounts requires an SCP applied at the organization root or relevant OU. Here's the policy JSON to attach via CDK using the Organizations L1 constructs:
```typescript
import * as organizations from 'aws-cdk-lib/aws-organizations';

const denyRootScp = new organizations.CfnPolicy(this, 'DenyRootAccess', {
  name: 'DenyRootAccountUsage',
  type: 'SERVICE_CONTROL_POLICY',
  content: JSON.stringify({
    Version: '2012-10-17',
    Statement: [{
      Sid: 'DenyRootAccountUsage',
      Effect: 'Deny',
      Action: '*',
      Resource: '*',
      Condition: {
        StringLike: {
          'aws:PrincipalArn': 'arn:aws:iam::*:root',
        },
      },
    }],
  }),
  targetIds: ['r-xxxx'], // replace with your org root ID
});
```
SEC 7/8: Protect data at rest - S3 bucket with encryption and blocked public access
```typescript
import * as s3 from 'aws-cdk-lib/aws-s3';

const secureBucket = new s3.Bucket(this, 'SecureDataBucket', {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  enforceSSL: true,
  versioned: true,
  serverAccessLogsPrefix: 'access-logs/',
});
```
Note that `BLOCK_ALL` blocks both ACL-based and policy-based public access. Don't use `BLOCK_ACLS` alone for sensitive data buckets.
SEC 4: Detect and investigate security events - Multi-region CloudTrail
```typescript
import * as cloudtrail from 'aws-cdk-lib/aws-cloudtrail';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as logs from 'aws-cdk-lib/aws-logs';

const trailBucket = new s3.Bucket(this, 'CloudTrailBucket', {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  enforceSSL: true,
});

const logGroup = new logs.LogGroup(this, 'CloudTrailLogs', {
  retention: logs.RetentionDays.ONE_YEAR,
});

new cloudtrail.Trail(this, 'OrganizationTrail', {
  bucket: trailBucket,
  sendToCloudWatchLogs: true, // without this, cloudWatchLogGroup is ignored
  cloudWatchLogGroup: logGroup,
  isMultiRegionTrail: true,
  includeGlobalServiceEvents: true,
  enableFileValidation: true,
});
```
`isMultiRegionTrail: true` is what makes the difference here. Without it, you only capture events in the region where the trail is deployed.
COST 3: Monitor cost and usage - Cost Anomaly Detection
```typescript
import * as ce from 'aws-cdk-lib/aws-ce';
import * as sns from 'aws-cdk-lib/aws-sns';

const alertTopic = new sns.Topic(this, 'CostAlertTopic');

const monitor = new ce.CfnAnomalyMonitor(this, 'CostAnomalyMonitor', {
  monitorName: 'WorkloadCostMonitor',
  monitorType: 'DIMENSIONAL',
  monitorDimension: 'SERVICE',
});

new ce.CfnAnomalySubscription(this, 'CostAnomalySubscription', {
  monitorArnList: [monitor.attrMonitorArn],
  subscribers: [{
    address: alertTopic.topicArn,
    type: 'SNS',
  }],
  subscriptionName: 'CostAnomalyAlert',
  threshold: 20,
  frequency: 'DAILY',
});
```
Linking Findings to Architecture Decision Records
The fix is only half the work. The other half is documenting why the change was made - so the next person who touches that code understands it's not arbitrary.
Architecture Decision Records (ADRs) work well here. Title each ADR with the WAFR question ID it resolves: "ADR-012: Enable Multi-AZ for Production RDS [REL 10]." This connects the decision to its source.
The full audit trail then looks like: WAFR finding -> Jira ticket in format [REL 10] Fault isolation -> ADR documenting the decision -> CDK commit with the fix -> deployed infrastructure. If someone asks in six months why the RDS instance is Multi-AZ, the answer is traceable through the entire chain.
Well-Architected Lenses: Extending the Review
The base AWS Well-Architected Framework lens is applied to all workloads by default. But the Tool also supports lenses - specialist extensions that add questions and best practices specific to particular workload types. As of November 2025, there are 17 official lenses in the Lens Catalog, including three AI-focused lenses launched at re:Invent 2025.
Lens limits worth knowing: you can apply up to 20 lenses per workload (5 at a time), and create up to 15 custom lenses per AWS account. When you remove a lens from a workload, the data is retained and restored if you re-add it later - so don't worry about losing review progress if you need to swap lenses.
Two types of lenses:
- Lens Catalog lenses: Official lenses maintained by AWS, available to all accounts without any installation.
- Custom lenses: User-defined lenses you create with your own questions, best practices, and improvement guidance. Useful if your organization has internal architectural standards or regulatory requirements not covered by the official catalog. Sharable with other AWS accounts. Defined as JSON files up to 500 KB, supporting up to 10 pillars, 20 questions per pillar, and 15 choices per question.
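To make the custom-lens format concrete, here is a minimal sketch of the JSON shape. The field names follow the published lens format specification, but the pillar, question, and choice content is invented for illustration - validate against the current schema in the WA Tool documentation before importing:

```json
{
  "schemaVersion": "2021-11-01",
  "name": "Internal Standards Lens",
  "description": "Checks workloads against our internal architecture standards.",
  "pillars": [
    {
      "id": "internal_security",
      "name": "Internal Security Standards",
      "questions": [
        {
          "id": "sec_logging",
          "title": "How do you ship audit logs to the central account?",
          "description": "All workloads must forward audit logs to the central logging account.",
          "choices": [
            {
              "id": "sec_logging_central",
              "title": "Logs are forwarded to the central logging account"
            },
            {
              "id": "sec_logging_none",
              "title": "None of these"
            }
          ],
          "riskRules": [
            { "condition": "sec_logging_central", "risk": "NO_RISK" },
            { "condition": "default", "risk": "HIGH_RISK" }
          ]
        }
      ]
    }
  ]
}
```

The `riskRules` conditions are boolean expressions over choice IDs, with `default` as the fallback - this is how an unanswered or unmet internal standard surfaces as an HRI in the same report as the official pillars.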
The right approach is to apply lenses that match your workload - not all 17 to every review.
The New AI Lenses (2025)
Three AI-focused lenses launched at re:Invent 2025 on November 18, 2025. They're designed to work together across the full AI development lifecycle:
Responsible AI lens (new): Guides safe, fair, and secure AI development. Helps balance business needs with technical requirements and supports the transition from AI experimentation to production. If you're putting any AI model into production, this lens should be on your list.
Generative AI lens (updated): Added guidance specifically for Amazon SageMaker HyperPod users, new insights on Agentic AI workflows, and updated architectural scenarios for LLM-based architectures. If you're building with Bedrock or any LLM-based feature, this lens adds relevant questions the base framework doesn't cover.
Machine Learning lens (updated): Enhanced guidance for data and AI collaborative workflows, AI-assisted development capabilities, large-scale infrastructure provisioning, and customizable model deployment. Powered by SageMaker Unified Studio, Amazon Q, SageMaker HyperPod, and Bedrock.
Which Lens Applies to Your Workload?
| If your workload uses... | Apply this lens |
|---|---|
| Lambda, API Gateway, EventBridge | Serverless Applications |
| ECS, EKS, container builds | Container Build |
| Glue, Athena, Redshift, data pipelines | Data Analytics |
| Multi-tenant SaaS architecture | SaaS |
| Bedrock, LLM-based features, Agentic AI | Generative AI + Responsible AI |
| SageMaker, custom ML models | Machine Learning |
| Migrating workloads to AWS | Migration |
| Financial services workloads | Financial Services Industry |
| Healthcare workloads | Healthcare Industry |
| Government workloads | Government |
| DevOps-focused delivery | DevOps |
| SAP on AWS | SAP |
For organizations with internal architectural standards - compliance requirements, internal SLAs, specific security controls - custom lenses let you encode those standards into the Tool and review against them alongside the official framework.
How Often Should You Run a Well-Architected Review?
The framework guidance says reviews should happen at key milestones and that teams doing continuous reviews update their answers as the architecture evolves. In practice, "continuous review" is aspirational for most teams. A more realistic cadence framework:
- Tier 1 workloads (production, customer-facing, regulated): Annual full review minimum, plus a quarterly checkpoint on HRI remediation progress. More frequent if the architecture is changing significantly.
- Tier 2 workloads (internal tools, staging, non-critical services): Annual or biannual.
- New workloads: Before go-live. This is where a review has the most impact - you can still influence the one-way-door decisions.
Beyond calendar-based cadence, certain events should trigger a review regardless of timing:
- Major architectural change (adding a new data store, moving to containers, going multi-region)
- Cost spike that doesn't have an obvious explanation
- Security incident - run a targeted Security pillar review immediately after the post-mortem
- Compliance audit preparation
- Pre-acquisition or fundraise due diligence
The WA Tool's milestone feature is useful here. Save a milestone snapshot when you complete a review or when you finish remediating a set of HRIs. Milestones let you compare your before-and-after state and show stakeholders concrete improvement over time - not just "we fixed some things," but a side-by-side comparison of HRI counts at different points in time.
Wrapping Up
The Well-Architected Review is 57 questions across 6 pillars. You now have all of them. The AWS WA Tool is free. The review is a conversation, not a test - the goal is finding HRIs before they cause incidents, not scoring a perfect review.
The business case is real: the tool is free, partner-led reviews are often free, and completing a before-and-after review with a partner can earn up to $5,000 in AWS credits. Most reviews of your own infrastructure cost you money - this one can pay you.
Map your findings to IaC. Console fixes drift. CDK fixes stick.
If you're building with GenAI or ML, apply the three new lenses launched at re:Invent 2025 - the base framework doesn't cover agentic workflows or responsible AI considerations.
The practical next step: open the AWS Well-Architected Tool in your console, create a workload for your most critical production service, and work through the Operational Excellence and Security pillars first. Those two pillars consistently surface the highest-density HRIs in most environments, and tackling them gives you immediate, measurable risk reduction.
If you have findings you've been sitting on from a previous review, the IaC remediation section above gives you the CDK patterns to close them in code rather than in the console. What HRIs keep showing up in your reviews? Leave a comment - curious what surfaces most often for others.
Get a Professional AWS Well-Architected Review
Not sure if you're missing critical HRIs? I conduct Well-Architected Reviews as part of the AWS security review service - external perspective, documented findings, and a prioritized remediation plan you can act on immediately.