Most teams come out of their first Well-Architected Review with a list of improvement items and no real sense of where to start. The best-practice codes in the Improvement Plan tab (SEC01-BP02, REL09-BP04, COST07-BP01) are not self-explanatory, and the AWS documentation behind each one can run to dozens of pages.
This article names the 10 findings that surface in the majority of production workload reviews, explains why each one appears, tells you how to detect it in your account, and points you at the official AWS documentation that walks you through the fix.
The April 2025 framework update added 78 new best practices (the Reliability Pillar was refreshed in its entirety). If your previous review used an older version, some finding codes may have changed.
For the full 57 questions behind these findings, the complete WAFR checklist covers every pillar. For how the review runs, see the Well-Architected Review process guide.
What "Findings" Actually Mean in a Well-Architected Review
A finding in a WAR is an unmet best practice. Not a bug, not a compliance violation, not a judgment on your team's competence. It is a gap between your current workload architecture and what AWS has identified as a foundational or enabling practice for that workload type.
The WA Tool classifies every unmet best practice as either an HRI (High Risk Issue) or an MRI (Medium Risk Issue). HRIs are foundational must-do practices; AWS has determined that skipping them may result in significant negative business impact affecting operations, assets, or individuals. MRIs are enabling practices: important, but the absence has more limited impact than an HRI.
Both appear in your Improvement Plan, HRIs first. Each item links directly to the AWS documentation page for that best practice. You can track remediation status (Not Started, In Progress, Complete, Risk Acknowledged) within the AWS Well-Architected Tool and generate a PDF report. If your team works in Jira, the WA Tool Jira connector (added April 2024) syncs improvement items as Epics, Tasks, and Sub-tasks.
One important caveat: HRI and MRI classifications are guidelines, not mandates. If there is a legitimate technical or business reason you cannot implement a best practice, the tool supports documenting that exception. In my experience, most teams claim an exception without evaluating it - they assume a practice does not apply before checking.
The 10 findings in this article come from the base Well-Architected Framework lens, which applies to every workload by default. Additional lenses (Serverless, SaaS, GenAI, and others) add workload-specific best practices on top of the base.
Quick-Reference Table: 10 Findings at a Glance
Use this table to cross-reference items in your own WAR improvement plan. Each finding maps to one or two of the six Well-Architected Framework pillars, and every row lists the pillar and risk level.
| # | Finding | Pillar | Risk | Best Practice Code |
|---|---|---|---|---|
| 1 | Root account not secured | Security | HRI | SEC01-BP02 |
| 2 | IAM users instead of federated identity | Security | HRI | SEC02-BP04 |
| 3 | Hardcoded secrets / no secrets management | Security | HRI | SEC03-BP01 |
| 4 | Encryption at rest not enforced | Security | HRI | SEC08-BP02 |
| 5 | No multi-region CloudTrail trail | Security | HRI | CloudTrail.1 |
| 6 | Backups untested / no recovery validation | Reliability | MRI | REL09-BP04 |
| 7 | No IaC / no CI/CD pipeline | Ops Excellence + Security | MRI + HRI | OPS05-BP04 + SEC11-BP06 |
| 8 | Service quotas not monitored | Reliability | HRI | REL01-BP01 |
| 9 | All workloads still on On-Demand pricing | Cost Optimization | HRI | COST07-BP01 |
| 10 | No Graviton evaluation | Perf + Sustainability | HRI / varies | PERF03 + SUS05 |
Let's look at each finding in detail, starting with the Security pillar, which consistently produces the highest concentration of HRIs.
Finding 1 - Root Account Not Secured
SEC01-BP02 (Security pillar, HRI). The root user is the one identity in your AWS account that cannot be constrained by IAM policies; SCPs can restrict root in member accounts but never in the management account. It has unrestricted access to every resource, which is why AWS classifies anything less than full root security as an HRI.
What triggers this finding: root access keys exist (even if never used), MFA is not enabled on root, or there is evidence root credentials were used for routine operations. GuardDuty surfaces a Policy:IAMUser/RootCredentialUsage finding type when root credentials are used, and that is often what the reviewer will look for when asking about your detective controls.
How to detect it: Open the IAM console and check root access key status. Run aws iam get-account-summary and look at AccountAccessKeysPresent. Check GuardDuty for any Policy:IAMUser/RootCredentialUsage findings in the past 90 days.
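The account-summary check can be scripted. A minimal sketch, assuming you have saved the JSON from aws iam get-account-summary and loaded it into a dict. The function name and sample values are illustrative; AccountAccessKeysPresent and AccountMFAEnabled are the SummaryMap keys the command returns.

```python
def root_account_issues(account_summary: dict) -> list[str]:
    """Return root-user problems found in `aws iam get-account-summary` output."""
    summary = account_summary.get("SummaryMap", {})
    issues = []
    # 1 means at least one root access key exists.
    if summary.get("AccountAccessKeysPresent", 0) != 0:
        issues.append("root access keys exist")
    # 1 means MFA is enabled on the root user.
    if summary.get("AccountMFAEnabled", 0) != 1:
        issues.append("root MFA is not enabled")
    return issues


# Hypothetical sample shaped like the CLI's JSON output.
sample_summary = {"SummaryMap": {"AccountAccessKeysPresent": 1, "AccountMFAEnabled": 0}}
print(root_account_issues(sample_summary))
```

An empty list means both checks pass. It says nothing about whether root was used recently, so the GuardDuty check still applies.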
How to fix it:
- Delete root access keys. Do not rotate them - delete them. Rotation preserves the risk; deletion removes it.
- Enable MFA on the root user. Hardware MFA devices are recommended for payer accounts (up to 8 MFA devices can be enrolled per root user, but enrolling more than one disables the lost-MFA recovery flow).
- In AWS Organizations with Control Tower: deploy the "Disallow Creation of Root Access Keys for the Root User" and "Disallow Actions as a Root User" SCPs on all member accounts. The AWS Organizations best practices guide covers these SCP patterns and the broader multi-account security model.
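- Alert on any future root credential use, for example with an Amazon EventBridge rule that forwards the GuardDuty root-usage finding to an SNS topic, or with a CloudWatch alarm on a CloudTrail metric filter.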
Quick win: Deleting root access keys and enabling MFA takes under 30 minutes. The Control Tower SCP is a single policy deployment. The SEC01-BP02 Secure account root user and properties guidance walks through every remediation step, and the IAM root user best practices cover MFA enrollment in detail.
If root is the highest-risk account identity, IAM users with long-lived access keys are the most common identity risk across regular workloads.
Finding 2 - IAM Users Still Active Instead of Federated Identity
SEC02-BP04 (Security pillar, HRI). IAM users with long-lived access keys are the most common credential theft vector in cloud environments. The access key does not expire, does not require MFA by default, and survives after a person leaves the organization unless someone manually revokes it. This best practice classifies them as an HRI not because IAM users cannot work, but because the lifecycle management problem is structural.
What triggers the finding: multiple IAM users with active access keys, no SSO or federation configured, identity lifecycle management is manual, or IAM users are used for human console access. Even if each individual access key has limited permissions, the aggregate risk across a fleet of long-lived credentials is what the reviewer is evaluating.
How to detect it: Generate an IAM credential report (aws iam generate-credential-report then aws iam get-credential-report). Review the access_key_1_last_used_date and password_last_used columns. Flag any access key unused for more than 90 days and any IAM user who represents a human identity rather than a service account.
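The 90-day check on the credential report is easy to automate. A rough sketch, assuming the report has already been downloaded and decoded to CSV text. The function name and sample rows are hypothetical, and the sample is trimmed to three of the report's real columns (user, access_key_1_active, access_key_1_last_used_date).

```python
import csv
import io
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)


def stale_access_keys(report_csv: str, now: datetime) -> list[str]:
    """Return users whose active access key 1 has gone unused for more than 90 days."""
    stale = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        if row["access_key_1_active"].lower() != "true":
            continue
        last_used = row["access_key_1_last_used_date"]
        # The report shows "N/A" when the key has no recorded use.
        if last_used == "N/A" or now - datetime.fromisoformat(last_used) > STALE_AFTER:
            stale.append(row["user"])
    return stale


# Hypothetical sample rows, trimmed to the columns this check needs.
sample_report = (
    "user,access_key_1_active,access_key_1_last_used_date\n"
    "alice,true,2024-01-01T00:00:00+00:00\n"
    "bob,true,N/A\n"
    "carol,false,N/A\n"
    "dave,true,2024-05-20T00:00:00+00:00\n"
)
print(stale_access_keys(sample_report, datetime(2024, 6, 1, tzinfo=timezone.utc)))
```

A real report also has access_key_2 columns, which need the same treatment. Deciding which users are humans rather than service accounts stays a manual step.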
How to fix it:
Deploy AWS IAM Identity Center and configure your corporate IdP (Okta, Entra ID, Google Workspace) via SAML 2.0 + SCIM for automated user provisioning. Create Permission Sets that map to your existing IAM roles. After federation is established, apply an SCP to the organization to deny iam:CreateUser to prevent new IAM users from being created. Then delete the existing IAM users.
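For reference, the deny policy described above is small. A sketch of the SCP document built as a Python dict and printed as JSON. The Sid is a hypothetical label, and the policy covers only the iam:CreateUser action named above.

```python
import json

# Hypothetical statement ID; the denied action is the one named in the text.
deny_iam_user_creation = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyIamUserCreation",
            "Effect": "Deny",
            "Action": "iam:CreateUser",
            "Resource": "*",
        }
    ],
}

print(json.dumps(deny_iam_user_creation, indent=2))
```

SCPs do not apply to the management account, so this control only protects member accounts. Attach it after any workloads that still rely on IAM users have been accounted for.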
The SEC02-BP04 Rely on a centralized identity provider guidance and the IAM Identity Center getting-started guide cover the federation, Permission Set, and SCIM configuration end to end.
Project scope: Multi-sprint. Identity Center federation requires planning around IdP configuration and role mapping, but Permission Sets can be deployed and tested incrementally while existing IAM users remain active during the transition.
IAM users store credentials inside AWS. Hardcoded secrets scattered through application code expose credentials before they ever reach AWS.
Finding 3 - Hardcoded Secrets and Long-Lived Credentials
SEC03-BP01 (Security pillar, HRI). Hard-coded secrets in application code, environment variables, or config files trigger an HRI here because the secret is visible to anyone with repository access and it never expires. Database passwords that were "temporarily" put in a .env file two years ago are still there.
Common locations the review will probe: environment variables in Lambda function configuration, .env files baked into AMIs, database passwords in application.properties, and API keys that appear in Git history even after deletion from the current branch.
How to detect it: Use Amazon CodeGuru Reviewer or Amazon Q to scan repositories for hardcoded credentials. Check AWS Secrets Manager - if it is empty relative to the number of databases and integrations in your workload, this finding will surface. An empty Secrets Manager is its own signal.
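As a first pass before running a managed scanner, a few lines of pattern matching will catch the obvious cases. A rough heuristic sketch, not a replacement for the tools above: the function name and the assignment pattern are assumptions and will produce both false positives and misses. The AKIA prefix is the documented format for long-lived IAM user access key IDs.

```python
import re

# Long-lived IAM user access key IDs start with "AKIA" followed by 16 characters.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Hypothetical heuristic: a name containing password/secret/api_key assigned a value.
ASSIGNED_SECRET = re.compile(
    r"(password|passwd|secret|api_key)\w*\s*[=:]\s*['\"]?[^\s'\"]+", re.IGNORECASE
)


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, kind) pairs for lines that look like hardcoded secrets."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ACCESS_KEY_ID.search(line):
            findings.append((number, "aws-access-key-id"))
        if ASSIGNED_SECRET.search(line):
            findings.append((number, "assigned-secret"))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used throughout AWS documentation.
sample_env = "DB_HOST=db.internal\nDB_PASSWORD=hunter2\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
print(scan_text(sample_env))
```

Run it over the working tree and over Git history, since a secret deleted from the current branch is still present in older commits.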
How to fix it:
Migrate secrets to AWS Secrets Manager (auto-rotation, KMS encryption, CloudTrail audit trail). For less-sensitive config that does not require rotation, use AWS Systems Manager Parameter Store with SecureString. Reference secrets by ARN from Lambda, ECS, and RDS - never by value.
The AWS Secrets Manager user guide covers automatic rotation, cross-account access, and the SDK retrieval patterns. The Parameter Store SecureString documentation explains the KMS-backed encryption tier for less-sensitive configuration.
Quick win for new workloads. For existing workloads with scattered secrets, plan as a per-application migration project.
Securing secrets protects access to your data. If the data stores themselves are unencrypted, a stolen credential gives an attacker everything.
Finding 4 - Encryption at Rest Not Enforced
SEC08-BP02 (Security pillar, HRI). This best practice flags any EBS volume, S3 bucket, or RDS instance that lacks encryption at rest. It is an HRI because unencrypted storage is the difference between a credential theft incident and a full data breach: an attacker who reaches unencrypted storage retrieves the data directly.
What the reviewer looks for: EBS volumes without encryption, S3 buckets without default server-side encryption, RDS instances created without StorageEncrypted: true, and S3 account-level Block Public Access not enabled.
One critical RDS detail: you cannot enable encryption on an existing RDS instance after creation. The remediation requires taking a snapshot, restoring to a new encrypted instance, and redirecting traffic. Plan this with a maintenance window.
How to detect it: Enable AWS Config Rules: encrypted-volumes for EBS, rds-storage-encrypted for RDS, s3-default-encryption-kms for S3. Security Hub controls S3.1 and S3.2 will also appear alongside this finding.
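If Config is not enabled yet, a one-off inventory gives the same answer for EBS and RDS. A minimal sketch, assuming the JSON output of aws ec2 describe-volumes and aws rds describe-db-instances has been loaded into dicts. The function name and sample identifiers are hypothetical; Encrypted and StorageEncrypted are the fields those commands return.

```python
def unencrypted_storage(volumes: dict, db_instances: dict) -> dict[str, list[str]]:
    """List EBS volumes and RDS instances that report no encryption at rest."""
    return {
        "ebs": [
            v["VolumeId"]
            for v in volumes.get("Volumes", [])
            if not v.get("Encrypted", False)
        ],
        "rds": [
            d["DBInstanceIdentifier"]
            for d in db_instances.get("DBInstances", [])
            if not d.get("StorageEncrypted", False)
        ],
    }


# Hypothetical sample data shaped like the CLI output.
sample_volumes = {"Volumes": [
    {"VolumeId": "vol-aaa", "Encrypted": True},
    {"VolumeId": "vol-bbb", "Encrypted": False},
]}
sample_dbs = {"DBInstances": [
    {"DBInstanceIdentifier": "orders-db", "StorageEncrypted": False},
]}
print(unencrypted_storage(sample_volumes, sample_dbs))
```

Both commands are Regional, so the inventory has to be repeated in every Region the workload uses.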
How to fix it:
Enable EBS encryption by default in every Region you use (the setting is per account and per Region: one API call per Region, applies to all new volumes there). Set S3 default encryption per bucket. Deploy AWS Config conformance packs to detect future drift.
The SEC08-BP02 Enforce encryption at rest guidance, the EBS encryption-by-default documentation, and the RDS encryption documentation cover the workload-level expectations, per-service settings, and the snapshot-and-restore migration path for existing unencrypted RDS instances.
Scope: Quick win for new resources. Existing unencrypted RDS instances require a planned snapshot-and-restore migration.
Without a complete API audit trail, you cannot tell whether that data was accessed or by whom.
Finding 5 - No Multi-Region CloudTrail Trail
CloudTrail.1 (Security Hub control, Security pillar, HRI). AWS CloudTrail records 90 days of event history by default. Without a trail configured, that 90-day window is all you have. No ongoing log for incident response weeks later, no file integrity validation, no persistent audit record for compliance.
Security Hub CSPM control CloudTrail.1 rates the absence of a multi-region trail as a High severity control failure. The WA Tool surfaces this under SEC04 (How do you detect and investigate security events?). A single-region trail only records activity in its own Region and can miss events from global services like IAM - a multi-region trail captures both.
How to detect it: Run aws cloudtrail describe-trails. Check that IsMultiRegionTrail: true and LogFileValidationEnabled: true are set.
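The same check can be expressed as a small function over the command's JSON. A sketch under the assumption that the describe-trails output is already loaded into a dict. The function name and sample values are hypothetical; trailList, IsMultiRegionTrail, LogFileValidationEnabled, and KmsKeyId are fields the command returns.

```python
def trail_gaps(describe_trails_output: dict) -> list[str]:
    """Return what is missing from the account's multi-region trail setup."""
    multi_region = [
        t for t in describe_trails_output.get("trailList", [])
        if t.get("IsMultiRegionTrail")
    ]
    if not multi_region:
        return ["no multi-region trail"]
    gaps = []
    if not any(t.get("LogFileValidationEnabled") for t in multi_region):
        gaps.append("log file validation disabled")
    # KmsKeyId is only present when the trail encrypts logs with a KMS key.
    if not any(t.get("KmsKeyId") for t in multi_region):
        gaps.append("logs not encrypted with KMS")
    return gaps


# Hypothetical sample shaped like the CLI output.
sample_trails = {"trailList": [
    {"Name": "regional-only", "IsMultiRegionTrail": False, "LogFileValidationEnabled": True},
]}
print(trail_gaps(sample_trails))
```

This does not confirm that the trail records both read and write management events. That lives in the trail's event selectors and needs a separate check.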
How to fix it:
Create one multi-region trail capturing both read and write management events. Enable log file integrity validation. Encrypt logs at rest (satisfies CloudTrail.2 in Security Hub). In a multi-account setup, send logs to a centralized S3 bucket in a dedicated logging account.
The CloudTrail Creating a trail guide walks through the multi-region setting, and the organization trail documentation covers the centralized logging pattern for AWS Organizations.
Quick win: Creating a trail takes minutes. This finding touches both CloudTrail.1 and CloudTrail.2 in Security Hub, so closing it removes two control violations simultaneously.
The first five findings all sit within the Security pillar. The Security pillar deep-dive and the most common AWS security misconfigurations post both complement this section if you want more depth. The next two findings move into Reliability, where the most common failures are not in what you built but in whether you have ever tested that it works.
Finding 6 - Backups Exist But Recovery Has Never Been Tested
REL09-BP04 (Reliability pillar, MRI). This is one of the most psychologically uncomfortable findings in a WAR. The best practice does not check whether backups are configured. It checks whether recovery has been validated: whether the team has actually restored from a backup to a test environment, verified data integrity, and confirmed the recovery completed within the defined RTO and RPO.
Having AWS Backup configured satisfies the backup question. Having never run a restore test does not satisfy the verification question. The reviewer will ask: when did you last restore from a production backup? What was the result? How long did it take? "We haven't tried" produces this finding reliably.
Common anti-patterns that trigger it: restoring backups without a written runbook, not verifying data format or checksum on restore, and not measuring actual recovery time against the RTO target.
How to detect it: Review AWS Backup job history. If you see backup jobs but no restore jobs in the past quarter, this finding will surface. Check whether RTO and RPO are formally defined for each data tier - if not documented, that is its own finding.
How to fix it:
Define RTO and RPO for each data tier. Configure AWS Backup for centralized policy-driven backups across EC2, EBS, RDS, DynamoDB, EFS, and S3. Enable DynamoDB Point-in-Time Recovery (PITR). Use AWS Elastic Disaster Recovery for continuous replication of server volumes with drill capability.
Then schedule a quarterly restore test: restore to a named test environment, verify data integrity (checksums, row counts, format validation), and document the actual time taken against your RTO. Use AWS Resilience Hub to assess whether your architecture can meet its RTO/RPO targets before an incident shows you it cannot.
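Recording each drill in a structured form makes the pass/fail decision mechanical and gives the reviewer an artifact. One possible shape, with every name and number below hypothetical: it compares the measured restore time with the RTO and uses a row count as a stand-in for whatever integrity check fits your data.

```python
from dataclasses import dataclass


@dataclass
class RestoreDrill:
    data_tier: str
    rto_minutes: int        # target from your documented RTO
    restore_minutes: int    # measured during the drill
    source_rows: int        # row count in production at backup time
    restored_rows: int      # row count in the restored copy


def drill_failures(drill: RestoreDrill) -> list[str]:
    """Return the reasons a restore drill did not meet its targets."""
    failures = []
    if drill.restore_minutes > drill.rto_minutes:
        failures.append(
            f"restore took {drill.restore_minutes} min, RTO is {drill.rto_minutes} min"
        )
    if drill.restored_rows != drill.source_rows:
        failures.append(
            f"row count mismatch: {drill.restored_rows} of {drill.source_rows}"
        )
    return failures


# Hypothetical drill: data is intact, but the restore overran the RTO.
sample_drill = RestoreDrill("orders-db", rto_minutes=60, restore_minutes=95,
                            source_rows=1000, restored_rows=1000)
print(drill_failures(sample_drill))
```

A drill that fails on time but passes on integrity is still useful: it tells you the RTO needs renegotiating or the restore path needs work.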
The REL09-BP04 Perform periodic recovery of the data guidance has the full restore-test checklist, and the AWS Backup developer guide covers the policy-driven backup and RTO/RPO assessment workflows.
Scope: Enabling AWS Backup and PITR is a quick win. The quarterly restore drill is an operational process, not a code change, but it is the part the reviewer is actually checking for.
Tested recovery protects you from data loss. Infrastructure as Code protects you from configuration drift, and its absence triggers findings in two separate pillars simultaneously.
Finding 7 - No Infrastructure as Code and No CI/CD Pipeline
OPS05-BP04 + SEC11-BP06 (Operational Excellence and Security pillars, MRI + HRI). This finding is unique because it surfaces in two pillars at different risk levels. OPS05-BP04 (use build and deployment management systems) classifies console-driven deployments as an MRI under Operational Excellence. SEC11-BP06 (deploy software programmatically) classifies persistent human write access to production environments as an HRI under Security. If your team pushes changes to production via the console or SSH, the Security pillar generates the more severe finding.
The practical implication: fixing this one improvement item closes two finding codes from your plan.
What triggers it: infrastructure changes made through the AWS console without version-controlled templates, no automated deployment pipeline, persistent IAM user or role permissions allowing engineers to directly modify production resources, and no separation between staging and production deployment paths.
Why the reviewer cares: manual deployments produce untracked changes. An environment built by hand cannot be reliably reproduced, consistently security-reviewed, or recovered from accurately after a failure. The reviewer will ask to see a deployment pipeline, a version-controlled repository, and evidence that production changes flow through that pipeline.
How to detect it: Ask whether any production resources were last modified through the console. Check AWS Config configuration history for resources with no CloudFormation stack association. If the team cannot answer "what changed in production last Thursday and who approved it," this finding surfaces.
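One way to approximate the "no stack association" check is to look for the aws:cloudformation:stack-name tag that CloudFormation applies to resources it creates. A rough sketch: the input shape (a list of dicts with resourceId and tags) is an assumption about how you export your inventory, and the sample identifiers are hypothetical.

```python
# Tag that CloudFormation adds to the resources it manages.
STACK_TAG = "aws:cloudformation:stack-name"


def resources_outside_stacks(resources: list[dict]) -> list[str]:
    """Return IDs of resources that carry no CloudFormation stack tag."""
    return [r["resourceId"] for r in resources if STACK_TAG not in r.get("tags", {})]


# Hypothetical inventory export.
sample_inventory = [
    {"resourceId": "sg-aaa", "tags": {STACK_TAG: "network-stack"}},
    {"resourceId": "sg-bbb", "tags": {"Owner": "platform"}},
    {"resourceId": "i-ccc"},
]
print(resources_outside_stacks(sample_inventory))
```

Treat the result as a list of candidates rather than a verdict. Resources managed by Terraform will not carry this tag, and not every resource type supports tags.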
How to fix it:
Adopt an Infrastructure as Code tool that fits your team - AWS CloudFormation, AWS CDK, or Terraform all close both finding codes. Store infrastructure code in Git with pull-request-based review. Build a CI/CD pipeline (AWS CodePipeline with CodeBuild and CodeDeploy for AWS-native teams, or GitHub Actions with OIDC for GitHub-based teams). Remove persistent human write access to production and route all changes through the pipeline. Use multiple environments (dev, staging, production) with configuration externalized per environment.
The OPS05-BP04 Use build and deployment management systems guidance and the SEC11-BP06 Deploy software programmatically guidance cover the deployment-pipeline expectations. If your team already runs CDK, the CDK Review service surfaces anti-patterns and missing guardrails before they become new WAR findings.
Project scope: Multi-sprint. Start by codifying one workload; migrate remaining workloads incrementally. The Security HRI closes as soon as persistent production write access is removed.
Service quota management prevents the capacity failures that happen when AWS limits meet your failover traffic.
Finding 8 - Service Quotas Not Monitored
REL01-BP01 (Reliability pillar, HRI). Every AWS service has limits on what you can create or consume. Most teams do not think about quotas until they hit them, typically at the worst possible moment: during a failover event when new resources need to be provisioned quickly.
The best practice classifies unawareness of service quotas as an HRI because hitting a quota during failover is a well-documented cause of extended outages. A workload designed for Multi-AZ may fail to launch replacement resources in the surviving AZs if the account's Regional quota for EC2 instances, ENIs, or Lambda concurrent executions is already at the limit. Service Quotas covers over 250 AWS services with approximately 3,000 quota names per Region.
"Quota drift" - where you request an increase in your primary Region but forget to apply the same increase to your DR Region - is the most common triggering pattern.
How to detect it: Open the Service Quotas console and review usage versus limit for your highest-volume services (EC2, Lambda, EBS, RDS, VPC). Enable AWS Trusted Advisor quota checks. Set up CloudWatch alarms on the AWS/Usage metric namespace for services approaching their limits.
How to fix it: Audit quotas in all active Regions. Maintain a 15% buffer between current peak usage and the quota limit, since inaccessible resources may still count against quotas during a disruption. Use quota request templates for consistent multi-Region management.
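Both checks, the buffer and the primary-versus-DR comparison, reduce to simple arithmetic once usage and limits are collected. A sketch with hypothetical quota names and numbers. How you gather the figures (Service Quotas API, CloudWatch usage metrics, a spreadsheet) is left open.

```python
BUFFER = 0.15  # keep peak usage at or below 85% of the quota


def quotas_without_buffer(peak_usage: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return quota names where peak usage eats into the 15% buffer."""
    return sorted(
        name for name, limit in limits.items()
        if peak_usage.get(name, 0) > limit * (1 - BUFFER)
    )


def quota_drift(primary: dict[str, float], dr: dict[str, float]) -> dict[str, tuple]:
    """Return quotas that are lower in the DR Region than in the primary Region."""
    return {
        name: (limit, dr.get(name, 0))
        for name, limit in primary.items()
        if dr.get(name, 0) < limit
    }


# Hypothetical figures for two Regions.
primary_limits = {"ec2-vcpus": 640, "lambda-concurrency": 1000}
dr_limits = {"ec2-vcpus": 256, "lambda-concurrency": 1000}
peak = {"ec2-vcpus": 600, "lambda-concurrency": 400}

print(quotas_without_buffer(peak, primary_limits))
print(quota_drift(primary_limits, dr_limits))
```

Here 600 vCPUs against a limit of 640 leaves about 6% headroom, and the DR Region could not absorb the primary's peak at all.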
The Service Quotas user guide covers quota request templates, the Service Quotas API, and the integration with AWS Trusted Advisor.
Scope: Discovering and monitoring quotas is a quick win. Applying increases consistently across Regions is an operational task worth scheduling quarterly.
Now for the most financially impactful finding in the Cost Optimization pillar.
Finding 9 - All Workloads Still Running On-Demand
COST07-BP01 (Cost Optimization pillar, HRI). This best practice requires a pricing model analysis for every workload component. Running everything On-Demand is appropriate for variable workloads, but running predictable, steady-state workloads On-Demand when Savings Plans would reduce that cost by 40-66% is what the WA Tool classifies as an HRI.
The pricing model decision tree:
- Compute Savings Plans save up to 66% and apply to EC2, Lambda, and Fargate (most flexible - applies across instance families and Regions).
- EC2 Instance Savings Plans save up to 72% but are locked to a specific instance family and Region.
- For RDS, Redshift, ElastiCache, and OpenSearch: Reserved Instances still apply, as Savings Plans do not cover these services.
- For fault-tolerant batch workloads: Spot Instances at up to 90% off On-Demand.
- For dev/test: stop/start scheduling saves about 76% by running 40 hours per week instead of 168.
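The arithmetic behind two of these levers is worth writing down, because it is the core of the analysis the reviewer wants to see. A small sketch: the function names and the 10,000 monthly spend figure are hypothetical, and the discount passed in is a ceiling from the list above, not a quote.

```python
HOURS_PER_WEEK = 168


def schedule_savings(running_hours_per_week: float) -> float:
    """Fraction of On-Demand cost avoided by only running part of the week."""
    return 1 - running_hours_per_week / HOURS_PER_WEEK


def committed_cost(on_demand_monthly: float, discount: float) -> float:
    """Monthly cost after applying a Savings Plan or RI discount rate."""
    return on_demand_monthly * (1 - discount)


# Dev/test running 40 of 168 hours per week.
print(f"schedule savings: {schedule_savings(40):.1%}")
# Hypothetical steady-state spend of 10,000 per month at the maximum 66% rate.
print(f"committed cost: {committed_cost(10_000, 0.66):.0f}")
```

Actual discounts depend on term, payment option, instance family, and Region, so the Cost Explorer recommendation is the figure to plan against.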
What the reviewer looks for is not just the conclusion but evidence the analysis was performed. AWS Cost Explorer's SP/RI recommendation report is the natural artifact.
How to detect it: Open AWS Cost Explorer. If the On-Demand spend line is flat and high for continuously running services with no active Savings Plans commitments, this finding will surface.
How to fix it: Run the Cost Explorer SP/RI recommendation report. Start with Compute Savings Plans for maximum flexibility. Use 1-year terms for the first commitment before moving to 3-year terms. The COST07-BP01 pricing model analysis guidance has the full service-by-service decision tree.
Quick win: Purchasing a Savings Plan takes five minutes in Cost Explorer once the analysis is done. For cost levers beyond pricing models, the AWS cost optimization guide covers rightsizing, orphaned resources, and FinOps practices.
Finding 10 - No Graviton Evaluation
PERF03 + SUS05 (Performance Efficiency and Sustainability pillars, HRI for PERF03). Performance and sustainability both benefit from Graviton, and the WAR reviewer will ask whether the team has evaluated ARM64 compatibility. "Not applicable because we run Windows workloads" is a perfectly acceptable answer. "We haven't looked at it" is not.
What Graviton delivers: Graviton3 processors offer up to 40% better price performance than comparable x86 instances and use up to 60% less energy for the same performance, with cost savings of up to 45% over equivalent x86 instances. For Lambda functions, switching the runtime architecture to arm64 brings comparable benefits with no code changes in most supported runtimes (Python, Java, Node.js, Go), although compiled binaries and native dependencies must be rebuilt for arm64.
Why most teams have not done this: they assume all software must be recompiled for ARM64, which is not true for interpreted runtimes and container images built with multi-arch support.
How to detect it: Open AWS Compute Optimizer and select the "Graviton (aws-arm64)" CPU architecture preference. It will surface which instance types in your fleet have Graviton equivalents and the projected savings. That report is the artifact the reviewer will look for as evidence the evaluation was performed.
How to fix it: Start with non-production workloads to validate performance before migrating production. For Lambda, switching the runtime architecture to arm64 is a single configuration change in the function settings, with no code changes in most supported runtimes. For EC2 and ECS workloads, validate your container images are multi-arch or rebuild them for arm64, then migrate using a rolling deployment.
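For the Lambda side, the candidates can be listed straight from the function inventory. A minimal sketch, assuming the JSON from aws lambda list-functions has been loaded into a dict. The function name and sample data are hypothetical; Functions, FunctionName, and Architectures are fields the command returns.

```python
def x86_functions(list_functions_output: dict) -> list[str]:
    """Return names of Lambda functions that are not yet running on arm64."""
    return [
        f["FunctionName"]
        for f in list_functions_output.get("Functions", [])
        # Lambda defaults to x86_64 when no architecture is specified.
        if "arm64" not in f.get("Architectures", ["x86_64"])
    ]


# Hypothetical sample shaped like the CLI output.
sample_functions = {"Functions": [
    {"FunctionName": "resize-images", "Runtime": "python3.12", "Architectures": ["x86_64"]},
    {"FunctionName": "send-email", "Runtime": "nodejs20.x", "Architectures": ["arm64"]},
]}
print(x86_functions(sample_functions))
```

Functions with native dependencies or container images still need an arm64 build before the switch, so treat the list as a backlog to test rather than a change set to apply blindly.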
The Lambda instruction set architecture documentation covers the runtime compatibility list and the function-level setting. The AWS Graviton getting-started guide (published by the Graviton team) has framework-specific migration notes for Java, Python, .NET, Node.js, and Go.
Scope note on Sustainability: AWS does not publicly classify individual Sustainability pillar best practices at the same HRI/MRI granularity as other pillars. The Performance Efficiency dimension of this finding (PERF03) is classified as High risk if unaddressed. Treat the Sustainability dimension as a medium-priority finding and use the Compute Optimizer analysis as your remediation artifact.
Quick win for Lambda; rolling project for EC2/ECS fleet migration.
How to Prioritize Your Findings After the Review
Suppose your improvement plan lists 14 HRIs and 8 MRIs. Here is a framework for deciding where to start.
Start with Security HRIs. Root access, long-lived credentials, and missing CloudTrail have the highest blast radius across your entire account - they are not scoped to a single workload. Fix these first.
Within HRIs, apply a simple risk matrix: score each finding on likelihood (1-5) times impact (1-5). An untested DR plan scores low on likelihood but catastrophic on impact - it may outrank a finding with high likelihood but limited blast radius. The WA Tool does not do this scoring for you, but making it explicit forces a real prioritization conversation.
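The scoring fits in a few lines, which makes it easy to keep next to the improvement plan. A sketch with hypothetical likelihood and impact values. Ties on score are broken by impact, which is a design choice rather than part of any AWS guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    code: str
    likelihood: int  # 1 (rare) to 5 (expected)
    impact: int      # 1 (minor) to 5 (catastrophic)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings by score, highest first; break ties on impact."""
    return sorted(findings, key=lambda f: (f.score, f.impact), reverse=True)


# Hypothetical scores agreed in a prioritization meeting.
sample_findings = [
    Finding("COST07-BP01", likelihood=4, impact=2),
    Finding("REL09-BP04", likelihood=2, impact=5),
    Finding("SEC01-BP02", likelihood=3, impact=5),
]
for finding in prioritize(sample_findings):
    print(finding.code, finding.score)
```

With these numbers the untested backup (2 x 5 = 10) outranks the pricing finding (4 x 2 = 8), which is the low-likelihood, high-impact effect described above.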
Use the quick win classification from this article. Enable CloudTrail, delete root access keys, and turn on EBS encryption by default in a single sprint. These each take less than an hour and close multiple HRI codes simultaneously. Multi-sprint projects like IAM federation or IaC adoption get queued after the quick wins demonstrate progress.
Track everything in the WA Tool's Improvement Plan. A second-pass review will compare your current state against the first-pass findings.
What to Do Next With Your Findings
Here is where most teams land after a WAR:
- Security findings produce the most HRIs in production workloads. Address root, credentials, and CloudTrail first. These are account-level controls, not workload-specific ones.
- A finding is not a failure verdict. It is a prioritized remediation item with official documentation attached. The purpose of the review is to surface these gaps, not to judge the team that built the workload.
- Quick wins (CloudTrail, EBS encryption default, root access key deletion) can close multiple HRI codes in a single sprint.
- Untested backups and unmonitored service quotas are the Reliability findings most likely to matter during an actual incident, not just during a review.
- IaC adoption closes findings in both Operational Excellence and Security simultaneously. That double close is worth prioritizing ahead of single-pillar improvements.
If you are preparing for a review, the 57-question WAFR checklist covers every question across all six pillars. If you already have a report and need help triaging HRIs and building a remediation roadmap, the Well-Architected Review service below is the structured path from findings to fixed.
Next step
Turn Your WAR Findings Into a Remediation Plan
We review your Well-Architected findings, prioritize HRIs by business impact, and deliver a remediation roadmap linked to the official AWS documentation - with clear sprint assignments so your team knows exactly what to fix and in what order.