This is a no-nonsense guide to help you prepare for the AWS Certified DevOps Engineer Professional exam. I've kept it short and to the point so you don't waste time and can focus on the services that are actually covered in the exam.
I passed the AWS Certified DevOps Engineer exam last week for the second time.
The first time I passed this exam was in 2018. Back then I had far less experience than I do now, but even with more experience you still have to study in order to pass the exam.
As you might know, taking exams requires patience and focus. Even with enough experience and knowledge, you can easily fail the exam if you don't concentrate and read the questions thoroughly. There are a couple of tactics you can apply to the exam before you even dive into the content. Please give that blog post a read after you finish reading my exam guide 😉
So let's get started! Here is the outline of what you'll find in this AWS Certified DevOps Engineer professional guide:
- AWS Certified DevOps Engineer Professional exam overview
- Content outline
- Technical Preparation notes
- Domain 1: SDLC Automation
- Domain 2: Configuration Management and Infrastructure as Code
- Domain 3: Monitoring and logging
- Domain 4: Policies and Standards Automation
- Domain 5: Incident and Event Response
- Domain 6: High availability, fault tolerance, and disaster recovery
- Study material
- AWS DevOps Engineer Professional exam – FAQ
AWS recommends that you have the following prerequisites before taking the exam:
- Two or more years’ experience provisioning, operating and managing AWS environments
- Good working knowledge of AWS core services
- Experience working with a programming or scripting language, e.g. Bash or Python
- Familiarity with Linux or Windows operating systems
- Familiarity with the AWS CLI
AWS provides the content outline in their exam guide. It gives some practical information on the AWS Certified DevOps Engineer Professional exam:
- Consists of 75 multiple-choice, multiple-answer questions.
- The exam needs to be completed within 180 minutes (you can request a permanent 30-minute extension for AWS exams if you follow this secret tip)
- Costs $300
- The minimum passing score is 750 points
- The exam is available in English, Japanese, Korean, and Simplified Chinese.
The content outline of the AWS Certified DevOps Engineer Professional exam consists of 6 separate domains, each with its own weightings. The table below lists the domains with their weightings:
Further on in the guide, a more detailed explanation is added to each domain to give a better idea of what you should know.
In this section, I've bundled up my notes, which you can use when you're preparing for the AWS Certified DevOps Engineer Professional exam. Prior to this blog post, I also released a guide with technical preparation notes for the AWS Cloud Practitioner exam. It contains foundational information that also helps for this exam, so I highly recommend reading those notes as well.
Moving on to the preparation, I've written technical notes that highlight all the important details worth remembering for the exam. I've categorized them into the domain sections as displayed in the content outline. This makes learning easier: if you take a practice exam from AWS, they share how you scored on each domain, so you can see which domains require more attention when studying.
After reading this exam guide I would definitely recommend watching the Exam Readiness: AWS Certified DevOps Engineer – Professional video from the AWS Training portal. Most of the tips and guidelines I wrote down came from that training. In addition to that, I've added more context and also written down some key tips that are worth knowing for the exam.
- A fully managed build service: Build your application from sources like AWS CodeCommit, S3, Bitbucket, and GitHub
- Build and test code: Debugging locally with an AWS CodeBuild agent is possible
- To configure build steps you create a `buildspec.yml` file in the source code of your repository.
This is what a typical AWS CodeBuild `buildspec.yml` looks like:
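The original example didn't survive here, so as a sketch: a minimal `buildspec.yml` for a hypothetical Node.js project (the runtime, commands, and `dist` output directory are assumptions, not from the original post) could look like this:

```yaml
# Hypothetical buildspec.yml for a Node.js project.
# Phase names (install, pre_build, build, post_build) are the
# standard CodeBuild phases; the commands are illustrative.
version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 18
  pre_build:
    commands:
      - npm ci            # install dependencies
  build:
    commands:
      - npm run build     # compile the application
      - npm test          # run the test suite
  post_build:
    commands:
      - echo "Build completed on $(date)"

artifacts:
  files:
    - '**/*'
  base-directory: dist    # assumed build output directory
```

CodeBuild runs the phases in order and uploads everything under `base-directory` as the build artifact, which a later pipeline stage (e.g. CodeDeploy) can consume.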
- Minimizes downtime because of a controlled deployment strategy
- Centralized control
- Iteratively release new features
- Three types of deployments: In-place, rolling, blue-green deployments
- Three sorts of deployment configurations: OneAtATime, HalfAtATime, AllAtOnce
- Ability to install CodeDeploy agents on EC2 instances and on-premises servers to perform deployments
- To specify what commands you want to run during each phase of the deployment, you use an AppSpec configuration file.
This is what a typical AWS CodeDeploy `appspec.yml` looks like:
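The original example is missing here as well, so here is a sketch of an `appspec.yml` for an EC2/on-premises deployment (the destination path and the script names under `scripts/` are hypothetical):

```yaml
# Hypothetical appspec.yml for an EC2/on-premises deployment.
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/my-app   # assumed install location
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh
      timeout: 120
```

The `hooks` section maps lifecycle events (BeforeInstall, ApplicationStart, ValidateService, etc.) to scripts bundled with your revision; knowing the order of these events is worth your time for the exam.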
- A standardized solution that adds consistency, taking you from source code through build/test to deployment in one flow
- Gives you the ability to add a manual approval step
- Pipeline actions look like this:
- Source: CodeCommit, S3, GitHub
- Build & Test: CodeBuild, Jenkins, TeamCity
- Deploy: AWS CodeDeploy / AWS CloudFormation / AWS Elastic Beanstalk / AWS OpsWorks
- Invoke: Specify a custom function to invoke e.g. AWS Lambda
- Approval: Publish SNS topic for manual approval
For services like AWS CodeDeploy, CloudFormation, Elastic Beanstalk, and OpsWorks you can apply several deployment strategies, each with its own pros and cons.
The cheat sheet below shows the types of deployments and how they rank on these criteria: impact, deploy time, zero downtime, rollback process, and deploy target.
For the second domain, it's important to know the following for the AWS Certified DevOps Engineer exam:
- Know the functions of AWS CloudFormation in depth
- Know when and how to use AWS CloudFormation, AWS Elastic Beanstalk, and AWS OpsWorks
- Understand how to deliver Docker container images into Amazon ECS using CI/CD pipelines
- When routing "portions of users" to the application, always choose Route53
- If there is a question related to compliance or configuration management of AWS resources, the answer is most likely AWS Config
- Infrastructure as code; templates are in YAML or JSON format
- Version control/replicate/update templates like code
- Integrated with CI/CD tools
- Run automated testing for CI/CD environments
AWS CloudFormation template anatomy:

```yaml
AWSTemplateFormatVersion: "version date"
Description: String
Metadata: template metadata
Parameters: set of parameters
Rules: set of rules
Mappings: set of mappings
Conditions: set of conditions
Transform: set of transforms
Resources: set of resources
Outputs: set of outputs
```
Types of CloudFormation stack updates:
- Updates with no interruption: No disruption in operation and without changing the physical name
- Updates with some interruption: Some disruption without changing the physical name
- Replacement: Resource is recreated and a new physical ID is generated
AWS Cloudformation helper scripts:
- cfn-init: Executes cfn metadata one time, typically in user data
- cfn-hup: Monitors cfn metadata and applies changes when discovered
- cfn-signal: Provides completion signal of a CreationPolicy or WaitCondition
- cfn-get-metadata: View the metadata that is stored in a CloudFormation stack.
AWS CloudFormation Template resource attributes:
- CreationPolicy Attribute: Defines a period of time during which AWS CloudFormation waits for a signal before marking the resource as CREATE_COMPLETE. Useful when you want a resource to finish configuring before CloudFormation proceeds to the next resource, e.g. software installation on an EC2 instance.
- DeletionPolicy Attribute: Preserves a backup of a resource when its stack is deleted; you can specify the options Retain or Snapshot. By default, no DeletionPolicy is set, so the resource is deleted along with the stack.
- DependsOn Attribute: Create an explicit dependency that requires a specified resource to be created before another can begin.
- Metadata Attribute: Associate structured data with a resource.
- UpdatePolicy Attribute: Define how CloudFormation updates the AWS::AutoScaling::AutoScalingGroup resource.
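To make the attributes above concrete, here is an illustrative snippet (resource names, AMI ID, and instance sizes are placeholders) combining DeletionPolicy and DependsOn:

```yaml
# Illustrative only: DeletionPolicy keeps a final snapshot of the
# database when the stack is deleted, and DependsOn forces the app
# instance to wait until the database has been created.
Parameters:
  DBPassword:
    Type: String
    NoEcho: true          # hidden from console/CLI output

Resources:
  Database:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.micro
      AllocatedStorage: "20"
      MasterUsername: admin
      MasterUserPassword: !Ref DBPassword

  AppInstance:
    Type: AWS::EC2::Instance
    DependsOn: Database   # explicit ordering: DB first, then app
    Properties:
      ImageId: ami-12345678   # placeholder AMI
      InstanceType: t3.micro
```

Note that DependsOn only controls creation order; it does not pass any connection information to the instance.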
- Application infrastructure management
- Flexibility to change the configuration of the environment over time
- There are 3 OpsWorks offerings:
- AWS OpsWorks for Chef automate
- AWS OpsWorks for Puppet Enterprise
- AWS OpsWorks Stacks
For the third domain, it's important to know the following for the AWS Certified DevOps Engineer Professional exam:
- Determine how to set up the aggregation, storage, and analysis of logs and metrics
- Apply concepts required to automate monitoring and event management of an environment
- Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications
- Determine how to implement tagging to categorize resources and get better insights into the costs
- Know the different logging options and see which is most cost-effective based on requirements
- Collect metrics and logs
- Monitor: alarms and dashboards
- Act: auto-scaling and events
- Analyze: trends and metrics
- Compliance and security
Important CloudWatch metrics:
- Metrics are kept for 15 months; the older the data, the less granular it becomes (one-minute data points are retained for 15 days, five-minute data points for 63 days, and one-hour data points for the full 15 months).
- Know these ELB metrics:
- SurgeQueueLength: Backend systems aren't able to keep up with the ELB requests
- SpillOverCount: When the above happens, the requests get dropped, hence the SpillOverCount.
- Know these EC2 metrics:
- StatusCheckFailed: Reports whether the instance passed both the instance status check and the system status check in the last minute.
- CPUCreditUsage: The number of CPU credits spent by the instance for CPU utilization. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes).
- CPUCreditBalance: The number of earned CPU credits that an instance has accrued since it was launched or started. For T2 Standard, the CPUCreditBalance also includes the number of launch credits that have been accrued.
- Know these HTTP status code metrics:
- HTTPCode_Backend_5xx: Instances or databases might be at capacity, check their metrics to verify
- HTTPCode_ELB_4xx: Check instance logs, connections are timing out
- Check latency metrics:
- If latency increases during load testing, your application might not scale horizontally, e.g. no auto-scaling, the DB is a bottleneck, or calls to external services are slow.
- CloudWatch agent installed on an instance or container
- These instances log events in a log stream
- The log streams are bundled up in a log group
- Collect, process, and analyze real-time streaming data (for quick incident response)
- Logs are ingested via Kinesis data streams or Firehose
- Analysis is done with Kinesis Data Analytics
- Use Kinesis Data Firehose if you need a fully managed service to transfer data to S3, Redshift, Elasticsearch, or Splunk.
- Use Kinesis Data Streams if real-time processing of logs is needed. However, it requires more effort to set up and manage.
- A good use case for Kinesis Data Firehose is centralizing CloudWatch log events and moving the data to S3 for longer retention and storage.
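The centralization pattern above can be sketched in CloudFormation (the log group name, account ID, and role are placeholders): a CloudWatch Logs subscription filter streams a log group's events into a Firehose delivery stream that lands in S3.

```yaml
# Illustrative: stream all events from one log group to an existing
# Firehose delivery stream. The ARNs below are placeholders; the role
# must allow CloudWatch Logs to put records into Firehose.
Resources:
  LogsToFirehose:
    Type: AWS::Logs::SubscriptionFilter
    Properties:
      LogGroupName: /myapp/application   # placeholder log group
      FilterPattern: ""                  # empty pattern = forward all events
      DestinationArn: arn:aws:firehose:eu-west-1:111122223333:deliverystream/central-logs
      RoleArn: arn:aws:iam::111122223333:role/CWLtoFirehoseRole
```

In a multi-account setup you would point the subscription filter at a logs destination in the central logging account instead of at Firehose directly.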
- Track user activity and API usage
- Log, continuously monitor, and retain account activity related to API actions
AWS CloudTrail best practices:
- Enable CloudTrail in all regions
- Enable log file validation
- Encrypt logs
- Integrate with CloudWatch logs
- Centralize logs from all accounts
- Create additional trails as needed
- Understand how to enable log integrity.
- When you see a question about auditing user actions in AWS or reporting on an API call, the answer most likely contains AWS CloudTrail.
- Use IAM roles whenever possible (avoid the usage of IAM users and groups if possible)
- Requirements to set up an IAM role:
- Trust policy: Who can assume this role
- Access permission policy: What actions and resources the one assuming the role is allowed to do
- Security is important: even if a question doesn't focus on security, lean toward the most secure answer (more encryption is better)
- Here are some highlights of AWS services that support encryption:
- S3 server-side encryption
- EBS server-side or host-based encryption
- Glacier is encrypted by default
- EFS supports only KMS encryption
- Protect AWS accounts and workloads
- Monitors your AWS environment for suspicious activity and generates findings
- Allows you to add your own threat list and trusted IP lists
- Analyzes multiple data sources: CloudTrail, VPC flow logs, and DNS logs
- Track resource configuration changes
- Sends notifications or automatically remediate when changes occur
- Enables compliance monitoring and security analysis
- When you see a question about auditing or checking the state of resources, there is a huge chance that the answer contains AWS Config.
- Agent-based solution
- Detects vulnerabilities
- Verifies security best practices
- Generates findings report
- It's a good practice to add EC2 security assessments as part of your CI/CD pipeline
- Important: Automatic assessments run through a Lambda function + CloudWatch event
- Manages systems in the Cloud and on-premises, good use case for patch management.
- When you tag patch groups, don't forget that tags are case sensitive; you can separate patch groups based on tags.
- Automate admin tasks (state manager):
- Collect software inventory
- Apply OS patches with patch manager
- Create system images
- Configure Windows and Linux systems
- Session manager
- Set maintenance windows
- SSM Parameter Store:
- Create unique parameter names (strings, string lists, secure strings)
- Encrypt with AWS KMS
- Use the API to pull parameters onto your machines
- Secrets Manager:
- Encrypted via AWS KMS (costs are higher)
- Supports automated credentials rotation
- License Manager:
- Strictly for storing licenses
- Also stores information where the license is activated
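A common way to pull Parameter Store and Secrets Manager values into your infrastructure is via CloudFormation dynamic references; this sketch assumes hypothetical parameter and secret names (`/myapp/ami-id`, `myapp/db`):

```yaml
# Illustrative: dynamic references are resolved at deploy time,
# so the actual values never appear in the template itself.
Resources:
  AppInstance:
    Type: AWS::EC2::Instance
    Properties:
      # Plain SSM parameter, pinned to version 1
      ImageId: "{{resolve:ssm:/myapp/ami-id:1}}"
      InstanceType: t3.micro

  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.micro
      AllocatedStorage: "20"
      MasterUsername: admin
      # JSON key "password" from a Secrets Manager secret
      MasterUserPassword: "{{resolve:secretsmanager:myapp/db:SecretString:password}}"
```

This keeps secrets out of version control while still letting CloudFormation manage the resources that consume them.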
- Keep cost optimization in mind when answering the exam questions. Use tagging for cost management.
- You need to have a support plan to be able to use AWS Trusted Advisor.
You can view the check descriptions and results for the following check categories:
- Cost Optimization: Highlights underutilized resources in your account to save money.
- Performance: Recommendations that can improve the speed and responsiveness of your applications.
- Security: Finds possible security improvements e.g. enable MFA on root user.
- Fault Tolerance: Looks for potential over-used resources and helps to increase resiliency in your AWS account.
- Service Limits: Checks if your AWS account approaches or exceeds service limits for your resources.
The AWS Personal Health Dashboard organizes issues in three groups: open issues, scheduled changes, and other notifications. It's important to know that you'll receive EC2 instance retirement/maintenance messages here.
- Create and manage catalogs of approved IT services
- Limit access to underlying AWS services
- Helps with consistent governance and compliance requirements
- Enable turn-key self-service solutions for all users
- Shorten deploy time by creating golden AMIs with pre-defined configurations
- Bootstrap custom user data scripts
- Configuration management with Puppet, Chef, Ansible, or OpsWorks to manage configurations
For this domain, you should know how to troubleshoot issues and restore operations. It's also important to know how to automate healing and set up event-driven automated actions, including alerting.
- Use the CloudWatch Logs agent for EC2/ECS to push logs to CloudWatch Logs
- Centralize logging in a separate account; use Kinesis Firehose to move multiple streams of logs to S3, for example.
- Log as much as you can even if you don't immediately use it
- Keep logs as long as you can and use them for long-term analysis
- CloudFormation supports Elastic Beanstalk
- Good for developers who want to set up an application without provisioning infrastructure
- Tomcat for Java
- Apache for PHP or Python apps
- Nginx/Apache for Node.js
- Passenger for Ruby apps
- Controlled with AWS Elastic Beanstalk:
- access CloudWatch
- adjust application server settings (e.g. JVM options) and pass environment variables
- Multi-AZ support/ no multi-region
- Restrict IPs with security groups and ACLs
- By default publicly available
- AWS Elastic Beanstalk supports IAM, VPC, code is stored in S3
- Multiple environments are allowed
- Only changes from Git repositories are replicated
- Stack: Container for grouping resources, e.g. EC2 instances, EIPs, DB servers
- Instance must be assigned to one layer
- App layer, DB layer, ELB layer, monitoring, DB cache
- Layers can be extended with Chef recipes
- Instances can run 24/7, load-based, or time-based
- Apps are deployed with Git or S3
- Provisioned using Chef or Puppet
- Make sure you know how lifecycle hooks work when scaling in and out.
- Instances can be put in a wait state during a lifecycle hook operation. The maximum amount of time an instance can stay in the wait state is 48 hours (the default is 1 hour).
- There are 7 termination policies:
- Use a CreationPolicy attribute on EC2 instances in an Auto Scaling group to configure or bootstrap them. Then use the `cfn-signal` helper script to signal when the instance creation process has completed successfully.
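A minimal sketch of the CreationPolicy plus `cfn-signal` pattern in CloudFormation (the AMI ID and sizes are placeholders, and the bootstrap step is elided):

```yaml
# Illustrative: CloudFormation waits for 2 success signals within
# 15 minutes before marking the Auto Scaling group CREATE_COMPLETE.
Resources:
  WebServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    CreationPolicy:
      ResourceSignal:
        Count: 2          # one signal per desired instance
        Timeout: PT15M    # ISO 8601 duration: 15 minutes
    Properties:
      MinSize: "2"
      MaxSize: "4"
      LaunchConfigurationName: !Ref LaunchConfig
      AvailabilityZones: !GetAZs ""

  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-12345678   # placeholder AMI
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # ... bootstrap the instance here (install/configure software) ...
          # Signal success (or failure via the exit code of the last command)
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} \
            --resource WebServerGroup --region ${AWS::Region}
```

If the signals don't arrive within the timeout, the stack rolls back, which is exactly the behavior several exam questions probe for.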
You should know the following for the exam:
- Know when to use Multi-AZ vs Multi-region architectures
- Know how to implement HA, scalability, and fault tolerance
- Know the right services based on business requirements e.g. RTO, RPO, and costs
- Know how to design and automate disaster recovery strategies
- Evaluate a deployment for points of failure
- Cross-region snapshot copies are good for high availability and failover scenarios
- Read replicas are important for quicker cross-region failover scenarios
- Asynchronous communications
- Direct queries to the read replicas
- Use Elasticache in front of RDS
- Cache common requests in Elasticache to offload your RDS
- Global tables are important to store data across multiple regions
- Reduce response times of eventually consistent read workloads
- For read-heavy or bursty workloads
Understand the concepts of Recovery Point Objective (RPO) and Recovery Time Objective (RTO):
- RPO: How much data can you afford to lose, e.g. the business can recover from losing the last 8 hours of data.
- RTO: How quickly must you recover from downtime, e.g. the application can be unavailable for 4 hours per month.
There are 4 types of disaster recovery:
- Backup and restore: This is the cheapest method but takes a long time to restore from disaster recovery.
- Pilot light: You replicate part of your infrastructure e.g. VPC and autoscaling groups. Once a disaster happens you scale up the compute resources.
- Warm standby: A scaled-down version of your infrastructure is replicated in the disaster recovery region. You only need to scale up the resources and update the domain to point to the disaster recovery region. This is good if you need RPO and RTO within minutes.
- Hot standby: Same as warm standby, except that the disaster recovery infrastructure is replicated one-to-one with the original environment. This is the most expensive disaster recovery method, but it minimizes your RPO and RTO to seconds instead of minutes.
On the internet you'll find a lot of study material for the AWS Certified DevOps Engineer Professional exam, and it can be really overwhelming to search for high-quality material. Luckily for you, I've spent some time curating the available study material and highlighting the stuff worth reading:
The notes I've written in the previous chapter contain keywords and summaries; don't depend solely on them! If a concept or keyword is unknown to you, see it as an incentive to dive deeper into that topic. Based on my experience with the exam, I would recommend reading the official documentation on the following services:
- Auto Scaling - pay attention to: launch templates, launch configurations, lifecycle hooks, termination policies.
- AWS Elastic Beanstalk - pay attention to: deployment methods and configurations (.ebextensions, .elasticbeanstalk configs)
- AWS OpsWorks - pay attention to: stacks, best practices.
- AWS CodeDeploy - pay attention to: working with instances, deployment configurations, applications (AppSpec), deployment groups.
- AWS CodeBuild - pay attention to: code sources and buildspec file setup.
- AWS CodePipeline - pay attention to: pipeline structure and use cases for CodePipeline
- AWS Systems Manager - pay attention to: patch baselines, maintenance windows, and run command.
- AWS CloudFormation - pay attention to: best practices, template anatomy, creationpolicy, deletionpolicy, and dependson.
If you want to update your foundational knowledge I would really recommend giving these official study guides a chance:
There is one whitepaper, in particular, that's a must-read to give you a good understanding of automation, compliance, and infrastructure as code:
After you have gone through my guide, you should be able to answer all of the AWS DevOps Engineer Professional exam questions with ease and confidence. If you still want to practice a little bit more then I can recommend you to take a couple of practice tests before taking the real exam.
Let me know if this guide helped you pass the exam!
If you don't have a lot of practical experience, the answer is yes. You'll need to learn the foundations of the DevOps way of working. That means you need to know how to automate, configure, and deploy workloads. If you're comfortable with these concepts, it's critical to spend time learning the AWS services that you don't use that often. In my case that was mostly AWS Elastic Beanstalk and AWS OpsWorks.
It took me around 4 days to prepare for this exam. I work with AWS services daily and have specialized in DevOps and migrations, so for me it was most important to focus on the AWS services I didn't use, such as AWS Elastic Beanstalk and AWS OpsWorks. If you're less experienced, it can take at least a couple of weeks to get comfortable with each service, and I recommend testing them out in the AWS Console to get a better understanding.
I've written a walkthrough on how to schedule the AWS Certification exam:
It also shows you how you can permanently request 30 minutes extra for each AWS exam!