https://www.udemy.com/aws-certified-solutions-architect-associate/learn/v4/t/lecture/5267844?start=0
https://d0.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf
Main Menu
Introduction – What is the Well Architected Framework?
Pillar 1 – Security
Pillar 2 – Reliability
Pillar 3 – Performance Efficiency
Pillar 4 – Cost Optimization
Pillar 5 – Operational Excellence
What is “The Well Architected Framework”?
The Well Architected Framework was developed by the AWS Solutions Architecture team based on their experience helping AWS customers. It is a set of questions that you can use to evaluate how well your architecture is aligned to AWS best practices.
The Five Pillars of The Well Architected Framework
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
- Operational Excellence
Structure of each pillar
- Design Principles
- Definition
- Best Practices
- Key AWS Services
- Resources
General Design Principles
- Stop guessing your capacity needs
- Think Auto-scaling from the word Go.
- Test systems at production scale
- Spin it up, test it, decommission it.
- Automate to make architectural experimentation easier
- Create and replicate with no additional effort
- Allow for evolutionary architectures
- Keep it dynamic and don’t lock yourself into bad architecture.
- Data-Driven architectures
- Use systems like CloudWatch to make fact based decisions regarding your architecture.
- Improve through game days
- Spin up a test environment to fully mimic your production, then test the heck out of it to see if it is ready to hold up under real life conditions.
Pillar 1 – Security
Design Principles
- Apply security at all layers (not just public facing firewalls!)
- Subnets -Network ACLs
- Load Balancers
- Individual EC2 instances
- Windows servers should have some sort of Anti-virus software.
- Enable Traceability
- CloudTrail
- Automate responses to security events
- Brute force attempts to Port 22
- wp-admin or xmlrpc.php attacks
- Focus on securing your system
- What is the customer responsible for? (the AWS Shared Responsibility Model)
- Securing the data
- The application
- The operating system
- Automate security best practices
- The Center for Internet Security (CIS) publishes OS hardening benchmarks
- Harden your OS and create an AMI
- All auto-scaled instances will be secured from the word go.
Definition (Need to remember these definitions!)
Security in the cloud consists of 4 areas:
- Data Protection
- Privilege Management
- Infrastructure Protection
- Detective Controls
Data Protection
Before starting the design, data should be organized and classified into segments, such as:
- Publicly available
- Only within your organization
- Only to specific members of your organization
- Only to the Board of Directors (B.O.D.)
Each segment should be protected with a ‘least privilege’ access system so people can only access what they need. Additionally, data should be encrypted wherever possible, be it at rest or in transit (HTTPS).
Best Practices
In AWS, the following practices help to protect your data:
- AWS customers maintain full control over their data.
- AWS makes it easier for you to encrypt your data and manage keys, including regular key rotation, which can be easily automated natively by AWS or maintained by a customer. (AWS Key Management Service – KMS; a minimal sketch follows the questions below.)
- Detailed logging is available that contains important content, such as file access and changes. (CloudTrail)
- AWS has designed storage systems for exceptional resiliency. Example: S3 is designed for 11 nines of durability. (if you store 10,000 objects with S3, you can expect to incur a loss of a single object once every 10,000,000 years.)
- Versioning, which can be part of a larger data lifecycle management process and can protect against accidental overwrites, deletes and similar harm.
- AWS never initiates the movement of data between regions. Content placed in a region will remain in that region unless the customer explicitly enables a feature or leverages a service that provides that functionality.
Questions to Ask
- How are you encrypting your data at rest?
- How are you encrypting your data in transit? (SSL)
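As an illustration of the encryption-at-rest and versioning points above, here is a minimal boto3 sketch (the bucket name, object key, and KMS key alias are placeholders, not values from the course):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # placeholder bucket name

# Protect against accidental overwrites/deletes with versioning
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Encrypt data at rest with a customer-managed KMS key (SSE-KMS)
s3.put_object(
    Bucket=BUCKET,
    Key="reports/q1-summary.csv",
    Body=b"column1,column2\n1,2\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",  # placeholder key alias
)
```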
Privilege Management
Privilege Management ensures that only authorized and authenticated users are able to access your resources, and only in a manner that is intended. It can include:
- Access Control Lists
- Role Based Access Controls
- Password Management (such as password rotation policies & password strength)
Questions to Ask
- How are you protecting access to and use of the AWS root account credentials?
- Have you enabled MFA?
- How are you defining roles and responsibilities of system users to control human access to the AWS Management Console and APIs?
- Are users divided into groups that allow access only to what they need?
- How are you limiting automated access (such as from applications, scripts, or third-party tools or services) to AWS resources?
- How are you managing keys and credentials?
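A minimal sketch of the privilege-management ideas above: groups that grant only what users need, plus an account-wide password policy. The group name, user name, and policy choice are illustrative assumptions, not prescriptions from the course.

```python
import boto3

iam = boto3.client("iam")

# Group users and grant only the access they need (least privilege)
iam.create_group(GroupName="ReadOnlyAnalysts")  # hypothetical group name
iam.attach_group_policy(
    GroupName="ReadOnlyAnalysts",
    PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",  # AWS managed policy
)
iam.add_user_to_group(GroupName="ReadOnlyAnalysts", UserName="alice")

# Enforce password strength and rotation at the account level
iam.update_account_password_policy(
    MinimumPasswordLength=14,
    RequireSymbols=True,
    RequireNumbers=True,
    RequireUppercaseCharacters=True,
    RequireLowercaseCharacters=True,
    MaxPasswordAge=90,
    PasswordReusePrevention=5,
)
```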
Infrastructure Protection
Outside of the cloud, this is how you protect your data center: RFID access controls, security guards, lockable cabinets, CCTV, etc. Within AWS, the physical layer is handled by AWS, so infrastructure protection really exists at your VPC level.
- How are you protecting your VPC?
- What security groups are in place?
- What ACLs are in place?
- Is the subnet public or private?
Questions to ask
- How are you enforcing network and host-level boundary protection?
- Are you just using Security Groups or are you also using Network ACLs?
- Are you using Public and Private subnets?
- Are you using Bastion hosts for the private instances?
- What users have access to these services? Root or other users also?
- How are you enforcing AWS service level protection?
- Do you have multiple users that are able to log into the console?
- Do you have groups set up so different users have different privileges?
- Do you have MFA enabled?
- Do you have a strong password protection/rotation policy?
- How are you protecting the integrity of the operating systems on your EC2 instances?
- If Windows, do you have Anti-virus installed?
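To make the security-group question concrete, here is a hedged sketch of an instance-level firewall locked down to HTTPS only (the group name and VPC ID are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Create a security group that only allows inbound HTTPS
resp = ec2.create_security_group(
    GroupName="web-tier-https-only",        # placeholder name
    Description="Allow inbound HTTPS only",
    VpcId="vpc-0123456789abcdef0",          # placeholder VPC ID
)
sg_id = resp["GroupId"]

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS from anywhere"}],
    }],
)
# No rule for port 22: administer private instances via a bastion host instead.
```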
Detective Controls
You can use detective controls to detect or identify a security breach. AWS Services to achieve this include:
- CloudTrail
- CloudWatch
- CPU or RAM goes through the roof
- Config
- S3
- Glacier
Questions to ask
- How are you capturing and analyzing AWS logs?
- Is CloudTrail turned on in each Region in which you are operating?
- Are you using any third-party virtual appliances, such as the IDS/IPS provided by Alert Logic?
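For the “Is CloudTrail turned on in each Region?” question, one option is a single multi-region trail. A minimal sketch (the trail and bucket names are placeholders, and the bucket must already grant CloudTrail write access via its bucket policy):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# One trail that records API activity in every region
cloudtrail.create_trail(
    Name="org-wide-trail",                      # placeholder trail name
    S3BucketName="my-cloudtrail-logs-bucket",   # placeholder log bucket
    IsMultiRegionTrail=True,
    EnableLogFileValidation=True,
)
cloudtrail.start_logging(Name="org-wide-trail")
```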
Key AWS Services
- Data Protection
- Encrypt your data both in transit and at rest
- ELB
- EBS
- S3
- RDS
- Privilege Management
- IAM
- MFA
- Infrastructure Protection
- VPC
- Security Groups
- Network ACLs
- Detective Controls
- CloudTrail
- Config
- CloudWatch
Resources
Pillar 1 – Security: Exam Tips
- Know the 4 areas of Security
- Data Protection
- Privilege Management
- Infrastructure Protection
- Detective Controls
- Know some of the questions to ask for each area
- Data Protection
- How are you protecting and encrypting your data at rest?
- How are you protecting and encrypting your data in transit?
- Privilege Management
- How are you protecting access to and use of the AWS root account credentials?
- How are you defining roles and responsibilities of system users to control human access to the AWS Management Console and APIs?
- How are you limiting automated access (such as from applications, scripts, 3rd party tools or services) to AWS resources?
- Are you using IAM roles?
- How are you managing Keys and credentials?
- Infrastructure Protection
- How are you enforcing network and host-level boundary protection?
- How are you enforcing AWS service level protection?
- How are you protecting the integrity of the OS?
- Anti-virus?
- Detective Controls
- How are you capturing and analyzing your AWS logs?
- Are you using CloudTrail?
Pillar 2 – Reliability
The reliability pillar covers the ability of a system to recover from service or infrastructure outages/disruptions as well as the ability to dynamically acquire computing resources to meet demand.
Design Principles
- Test recovery procedures
- Simulate different failures (turn off key instances/resources)
- Recreate scenarios that led to failures previously
- Automatically recover from failure
- Use system monitoring of Key Performance Indicators (KPIs) to trigger automation when a threshold is breached (e.g. scaling up/down).
- Scale horizontally to increase aggregate system availability
- Use multiple smaller instances instead of 1 larger one to reduce the impact of a single instance failure.
- Stop guessing capacity
- Under provision and you will experience issues when limits are exceeded or during peak times.
- Over provision and you’re wasting money.
- Manage changes through Automation.
- Changes to infrastructure should be done using automation. The changes that need to be managed are changes to the automation.
Definition
Reliability in the cloud consists of 3 areas:
- Foundations
- Change Management
- Failure Management
Best Practices
Foundations
“Before building a house, make sure that the foundation is securely in place before laying the first brick.”
- In a traditional data center world: mis-provisioning during the initial design/build can cause significant delays or other issues at deployment
- In AWS, the cloud is designed to be essentially limitless
- AWS can handle the networking and compute requirements
- They do set service limits to stop customers from accidentally over-provisioning.
Foundation questions to ask
- How are you managing AWS service limits for your account?
- How are you planning your network topology on AWS?
- Do you have an escalation path to deal with technical issues?
- Do you need to upgrade your level of support so you have a Technical Account Manager?
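One hedged way to keep an eye on service limits today is the Service Quotas API (it post-dates the original whitepaper, so treat this as an illustrative approach rather than the course’s method):

```python
import boto3

quotas = boto3.client("service-quotas")

# List the applied quotas (service limits) for EC2 in the current region
resp = quotas.list_service_quotas(ServiceCode="ec2", MaxResults=100)
for q in resp["Quotas"]:
    print(f"{q['QuotaName']}: {q['Value']}")
```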
Change Management
You need to be aware of how change affects a system so that you can plan proactively around it. Monitoring allows you to detect any changes to your environment and react.
- In traditional systems, change control is done manually and carefully coordinated with auditing.
- With AWS, you can use CloudWatch to monitor the environment and use services such as Auto Scaling to automate change in response to changes in the production environment (see the sketch after the questions below).
Change Management questions to ask
- How does your system adapt to changes in demand?
- How are you monitoring your AWS resources?
- How are you executing change management?
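A hedged sketch of automating change in response to demand: a target-tracking scaling policy on an existing Auto Scaling group (the group name and target value are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Let Auto Scaling add/remove instances to keep average CPU near 50%
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",          # placeholder: an existing ASG
    PolicyName="keep-cpu-at-50-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)
```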
Failure Management
With cloud, you should always architect your systems with the assumption that failures will occur. You should become aware of these failures, understand how they occurred, know how to respond to them, and then plan how to prevent them from happening again.
Failure Management questions to ask.
- How are you backing up your data?
- How does your system withstand component failures?
- How are you testing for resiliency?
- How are you planning for recovery?
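For the backup question, a minimal sketch of snapshotting an EBS volume (the volume ID is a placeholder; in practice you would schedule this, for example with Amazon Data Lifecycle Manager):

```python
import boto3

ec2 = boto3.client("ec2")

# Point-in-time backup of an EBS volume
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",            # placeholder volume ID
    Description="Nightly backup of app data volume",
)
ec2.create_tags(
    Resources=[snapshot["SnapshotId"]],
    Tags=[{"Key": "Backup", "Value": "nightly"}],
)
```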
Key AWS Services
- Foundations
- Identity Access Management
- VPC
- Change Management
- CloudTrail
- Config
- Failure Management
- CloudFormation (Provides templates for the creation of resources and provisions them in an orderly and predictable fashion.)
- Also recommended (but not listed in the whitepaper):
- CloudWatch/Auto Scaling
- RDS & Multi-AZ
Exam Tips
- Reliability in the cloud consists of 3 areas:
- Foundations
- Change management
- Failure management
- Foundations
- How are you managing AWS service limits for your account?
- How are you planning your network topology on AWS?
- Do you have an escalation path to deal with technical issues?
- Change Management
- How does your system adapt to changes in demand?
- How are you monitoring AWS resources?
- How are you executing change management?
- Failure Management
- How are you backing up your data?
- How does your system withstand component failures?
- How are you planning for recovery?
Pillar 3 – Performance Efficiency
https://www.udemy.com/aws-certified-solutions-architect-associate/learn/v4/t/lecture/5267852?start=0
The Performance Efficiency pillar focuses on how to use computing resources efficiently to meet your requirements and how to maintain that efficiency as demand changes and technology evolves.
Design Principles
- Democratize advanced technologies
- Rather than having your IT team learn how to host and run a new technology, they can simply consume it as a service:
- NoSQL databases
- Media transcoding
- Machine learning
- Go global in minutes
- Use CloudFormation to provision your environment in different regions globally.
- Use serverless architectures
- Removes the need to run and maintain servers.
- Use storage services (such as S3) for static web pages and content
- Use event services (such as Lambda) to host your code
- These can drastically lower operational costs because you are only paying for EXACTLY what you use!
- Experiment more often
- Try using different instance types to see if you can achieve the same results at lower costs.
- Mechanical Sympathy
- Use the technology approach that aligns best with what you are trying to achieve. Consider actual patterns of usage when determining the best database or storage solutions.
Definition
- Selection
- Compute
- Storage
- Database
- Space-time trade-off
- Network
- Review
- Monitoring
- Tradeoffs
Best Practices
Selection – Compute
- When architecting your system, it is important to choose the right kind of server
- CPU heavy
- I/O heavy
- RAM heavy
- Serverless?
- Ability to change at virtually the click of a button or API call.
Compute Questions to ask
- How do you select the appropriate instance type for your system?
- How do you ensure that you continue to have the most appropriate instance type as new instance types and features are introduced?
- How do you monitor your instances post launch to ensure they are performing as expected?
- How do you ensure that the quantity of your instances matches demand?
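To compare candidate instance types when making the compute selection, a hedged sketch using the EC2 instance-type API (the instance types listed are arbitrary examples):

```python
import boto3

ec2 = boto3.client("ec2")

# Compare vCPU and memory across a few candidate instance types
resp = ec2.describe_instance_types(InstanceTypes=["t3.micro", "m5.large", "c5.xlarge"])
for it in resp["InstanceTypes"]:
    name = it["InstanceType"]
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    print(f"{name}: {vcpus} vCPUs, {mem_gib:.1f} GiB RAM")
```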
Selection – Storage
The optimal storage solution for your environment depends on a number of factors. For example:
- Access Method – Block, File or Object?
- Patterns of Access – Random or Sequential
- Throughput Required
- Frequency of Access – Online, Offline (infrequently accessed) or Archival
- Frequency of Update – WORM (Write Once, Read Many), Dynamic
- Availability Constraints
- Durability Constraints
Options:
- S3
- 11 nines (99.999999999%) of durability with S3
- Cross-Region replication
- EBS
- SSD
- Magnetic
- Provisioned IOPs
- Ability to change by creating and restoring snapshots.
Storage questions to ask
- How do you select the appropriate storage solution for your system?
- How do you ensure that you continue to have the most appropriate storage solution as new storage solutions and features are launched?
- How do you monitor your storage solution to ensure it is performing as expected?
- How do you ensure that the capacity and throughput of your storage solutions matches demand?
Selection – Databases
The optimal database solution depends on a number of factors. Do you need:
- Database consistency?
- High availability?
- NoSQL?
- DR?
Database questions to ask
- How do you select the appropriate database solution for your system?
- How do you ensure that you continue to have the most appropriate database solution and features as new database solutions and features are launched?
- How do you monitor your databases to ensure performance is as expected?
- How do you ensure the capacity and throughput of your databases matches demand?
Selection – Network (aka Space-Time Trade Off)
This could be re-labeled “How to lower latency”
- Use Read Replicas to reduce load on the primary DB and put the replica close to the destination.
- Use Direct Connect to optimize and have predictable latency between HQ and AWS
- Use Global Infrastructure to put your environment in regions close to your customers.
- Use ElastiCache or CloudFront
Network questions to ask
- How do you select the appropriate proximity and caching solutions for your system?
- How do you ensure that you continue to have the most appropriate proximity and caching solutions as new solutions are launched?
- How do you monitor your proximity and caching solutions to ensure performance is as expected?
- How do you ensure that the proximity and caching solutions you have match demand?
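A hedged sketch of the read-replica idea above: creating a cross-region replica of an existing RDS instance so reads happen close to the readers (all identifiers, the target region, and the instance class are placeholders):

```python
import boto3

# Create the replica in the region closest to the readers
rds = boto3.client("rds", region_name="eu-west-1")   # placeholder target region

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="myapp-replica-eu",          # placeholder replica name
    # Cross-region replicas reference the source instance by ARN (placeholder ARN)
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:111122223333:db:myapp-primary",
    DBInstanceClass="db.t3.medium",
)
```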
Key AWS Services
- Compute
- Autoscaling
- Storage
- S3, EBS, Glacier
- Database
- RDS, DynamoDB, Redshift
- Networking
- CloudFront, ElastiCache, Direct Connect, RDS Read Replicas
Exam Tips
- Know the 4 different areas of Performance Efficiency
- Compute
- Storage
- Database
- Network
- Remember the 4 types of questions for each
- How do you select the right one?
- How do you stay up to date as new options are released?
- How are you monitoring and testing what you have?
- How do you make sure what you have matches your demand?
Resources
Pillar 4 – Cost Optimization
https://www.udemy.com/aws-certified-solutions-architect-associate/learn/v4/t/lecture/5267856?start=0
Use the Cost Optimization pillar to reduce your costs to a minimum and use those savings for other parts of your business. A cost-optimized system allows you to pay the lowest price possible while still achieving your business objectives.
Design Principles
- Transparently attribute expenditure.
- This expense was consumed by Dept. X and that was Dept. Y.
- Tags?
- Use managed services to reduce cost of ownership
- Email services?
- Trade capital expense for operating expense
- Spin up Dev environments when you need them and destroy them when you don’t
- Benefit from economies of scale
- AWS buys in super-mega bulk and gets deep discounts you could not get on your own.
- Stop spending money on data center operations.
- Operational Overhead
Definition
- Match Supply and Demand
- Cost-effective resources
- Expenditure awareness
- Optimizing over time
Best Practices
Match Supply and Demand
Try to optimally align supply with demand. Don’t over provision or under provision. Instead, as demand grows, so should your supply of compute resources.
- Autoscaling
- Lambda (serverless) which only executes when a request is generated.
- CloudWatch can be used to keep track of what your demand is.
Supply and Demand questions to ask
- How do you make sure your capacity matches but does not substantially exceed what you need?
- How are you optimizing your usage of AWS services?
Cost-Effective Resources
Using the correct instance type can be key to cost savings. For example, you might have a reporting process that is running on a t2.micro and takes 7 hours to complete. That same process could be run on an m4.2xlarge in a matter of minutes. The result remains the same, but the t2.micro works out more expensive because it ran for so much longer.
A well architected system will use the most cost-efficient resources to reach the end business goal.
Cost-Effective Resources questions to ask
- Have you selected the appropriate resource types to meet your cost targets?
- Have you selected the appropriate pricing model to meet your cost targets?
- Are there managed services (AWS SaaS) that you can use to improve your ROI (Return On Investment)?
Expenditure Awareness
With Cloud, you no longer have to get quotes on physical servers, choose a supplier, wait for delivery, and install and configure the new hardware before you can use it. You can provision things within seconds; however, this comes with its own issues. Many organizations have different teams, each with their own AWS accounts. Being aware of what and where each team is spending is crucial to any well architected system. You can use cost allocation tags, billing alerts, and consolidated billing to track this (see the sketch after the questions below).
Expenditure Awareness questions to ask
- What access controls and procedures do you have in place to govern AWS costs?
- How are you monitoring usage and spending?
- How do you decommission resources that you no longer need, or stop resources that are temporarily not needed?
- How do you consider data-transfer charges when designing your architecture?
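For expenditure awareness, a hedged sketch using the Cost Explorer API to break one month’s spend down by a cost-allocation tag (the tag key “Department” and the dates are assumptions, and the tag must first be activated for cost allocation in the billing console):

```python
import boto3

ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer endpoint lives in us-east-1

# Unblended cost for one month, grouped by the cost-allocation tag "Department"
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2018-03-01", "End": "2018-04-01"},  # placeholder month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "Department"}],
)
for result in resp["ResultsByTime"]:
    for group in result["Groups"]:
        amount = group["Metrics"]["UnblendedCost"]["Amount"]
        print(group["Keys"][0], amount)
```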
Optimizing Over Time
AWS moves FAST! Hundreds of new services and features are released every year. A service that you chose yesterday may not be the best service to be using today. For example, consider RDS for MySQL: Aurora was launched at re:Invent 2014 and is now out of preview. Aurora may be a better option for your business because of its performance and redundancy. You should keep track of the changes made to AWS and constantly re-evaluate your existing architecture. You can do this by:
- Subscribing to the AWS blog
- Using services such as Trusted Advisor.
Optimizing Over Time questions to ask
- How do you manage and/or consider the adoption of new services?
Key AWS Services
- Matched Supply and Demand
- Autoscaling
- Lambda
- Cost-Effective Resources
- EC2 Reserved instances
- Spot Instances
- AWS Trusted Advisor
- Expenditure Awareness
- Tags
- CloudWatch Alarms
- SNS
- Billing Alerts
- Optimizing Over Time
- AWS Blog
- Trusted Advisor
Exam Tips
- 4 Areas
- Matched Supply and Demand
- How do you make sure your capacity matches but does not substantially exceed what you need?
- How are you optimizing your usage of AWS services?
- Cost-effective resources
- Have you selected the appropriate resource types to meet your cost targets?
- Have you selected the appropriate pricing model to meet your cost targets?
- Are there managed services (AWS SaaS) that you can use to improve your ROI (Return On Investment)?
- Expenditure Awareness
- What access controls and procedures do you have in place to govern AWS costs?
- How are you monitoring usage and spending?
- How do you decommission resources that you no longer need, or stop resources that are temporarily not needed?
- How do you consider data-transfer charges when designing your architecture?
- Optimizing over time
- How do you manage and/or consider the adoption of new services?
Pillar 5 – Operational Excellence
The Operational Excellence pillar includes operational practices and procedures used to manage production workloads. This includes how planned changes are executed, as well as responses to unexpected operational events. Change execution and responses should be automated. All processes and procedures of operational excellence should be documented, tested and regularly reviewed.
Design Principles
- Perform operations with code
- Align operations processes to business objectives
- Collect metrics to confirm that your business objectives are being met.
- Make regular, small incremental changes
- Test for responses to unexpected events
- Learn from operational events and failures
- Keep operations procedures current
Definition
- Preparation
- Operation
- Response
Best Practices
Preparation
Effective preparation is required to drive operational excellence. Operations checklists will ensure that workloads are ready for production operation and prevent unintentional production promotion without effective preparation.
Documentation is king!
Be sure that documentation does not become stale or out of date as procedures change. Also make sure that it is thorough. Without application designs, environment configurations, resource configurations, response plans, and mitigation plans, documentation is not complete! If documentation is not updated and tested regularly, it will not be useful when unexpected operational events occur. If workloads are not reviewed before production, operations will be affected when undetected issues occur. If resources are not documented, when operation events occur, determining how to respond will be more difficult while the correct resources are identified.
Workloads should have:
- Runbooks – Operations guidance that operations teams can refer to so they can perform normal daily tasks.
- Day to Day Operations
- Playbooks – Guidance for responding to unexpected operational events. Playbooks should include response plans, as well as escalation paths and stakeholder notifications.
- Escalation path and procedures.
Preparation questions to ask
- What best practices for cloud operations are you using?
- How are you doing configuration management for your workload?
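A minimal sketch of “operations as code” with CloudFormation: provisioning an environment from a version-controlled template (the template file, stack name, and parameter are placeholders; the template itself is the part you would document and review):

```python
import boto3

cloudformation = boto3.client("cloudformation")

# Provision the documented environment from a version-controlled template
with open("web-tier.yaml") as f:               # placeholder template file
    template_body = f.read()

cloudformation.create_stack(
    StackName="web-tier-prod",                 # placeholder stack name
    TemplateBody=template_body,
    Parameters=[{"ParameterKey": "InstanceType", "ParameterValue": "t3.micro"}],
    Capabilities=["CAPABILITY_IAM"],           # only needed if the template creates IAM resources
)

# Block until the stack finishes creating
waiter = cloudformation.get_waiter("stack_create_complete")
waiter.wait(StackName="web-tier-prod")
```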
Operations
Operations should be standardized and manageable on a routine basis. The focus should be on automation, small frequent changes, regular quality assurance testing and defined mechanisms to track, audit, roll back and review changes. Changes should not be large and infrequent nor should they require scheduled downtime nor manual execution. A wide range of logs and metrics that are based on key operational indicators for a workload should be collected and reviewed to ensure continuous operations.
With AWS you can set up a continuous integration / continuous deployment (CI/CD) pipeline (e.g. source code repository, build systems, deployment and testing automation.) Release management processes, whether manual or automated, should be tested and based on small incremental changes, and tracked versions. You should be able to revert changes that introduce operational issues without causing operational impact.
Routine operations, as well as responses to unplanned events, should be automated. Manual processes for deployments, release management, changes and rollbacks should be avoided. Releases should not be large batches that are done infrequently. Rollbacks are more difficult in large changes, and failing to have a rollback plan, or the ability to mitigate failure impacts will prevent continuity of operations. Align monitoring to business needs, so that the responses are effective at maintaining business continuity. Monitoring that is ad hoc and not centralized, with responses that are manual, will cause more impact to operations during unexpected events.
Operations questions to ask
- How are you evolving your workload while minimizing the impact of change?
- How do you monitor your workload to ensure it is operating as expected?
Responses
Responses to unexpected operational events should be automated. This is not just for alerting, but also for mitigation, remediation, rollback and recovery. Alerts should be timely and should invoke escalations when responses are not adequate to mitigate the impact of operational events. Quality assurance mechanisms should be in place to automatically roll back failed deployments. Responses should follow a pre-defined playbook that includes stakeholders, the escalation process and procedures. Escalation paths should be defined and include both functional and hierarchical escalation capabilities. Hierarchical escalation should be automated and escalated priority should result in stakeholder notifications.
In AWS, there are several mechanisms to ensure both appropriate alerting and notification in response to unplanned operational events, as well as automated responses.
Response questions to ask
- How do you respond to unplanned operational events?
- How is escalation managed when responding to unplanned operational events?
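A hedged sketch of wiring up an automated response channel: a CloudWatch alarm on EC2 CPU that notifies an SNS topic (the instance ID, topic name, and email address are placeholders):

```python
import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")

# Notification channel for operational events
topic_arn = sns.create_topic(Name="ops-alerts")["TopicArn"]   # placeholder topic name
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="oncall@example.com")

# Alarm when average CPU stays above 80% for 10 minutes
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-web-01",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[topic_arn],
)
```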
Key AWS Services
- Preparation
- CloudFormation
- Reduces potential for human error
- All CloudFormation implementations should be fully tested before putting into production to ensure reliability.
- Auto Scaling
- AWS Config
- Provides a detailed inventory of your AWS resources and configurations.
- Continuously records configuration changes.
- AWS Service Catalog
- Create a standardized set of service offerings that are aligned to best practices.
- Tagging
- Operations
- CodeCommit
- CodeDeploy
- CodePipeline
- SDKs (Software Development Kits)
- CloudTrail
- Response
- CloudWatch + SNS
Exam Tips
Operational Excellence consists of 3 areas
- Preparation
- What best practices for cloud operations are you using?
- How are you doing configuration management for your workloads?
- Operation
- How are you evolving your workload while minimizing the impact of change?
- How do you monitor your workload to ensure it is operating as expected?
- Responses
- How do you respond to unplanned operational events?
- How is escalation managed when responding to unplanned operational events?
Summary of the Well Architected Framework
https://www.udemy.com/aws-certified-solutions-architect-associate/learn/v4/t/lecture/5267868?start=0
What are the 5 Pillars?
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
- Operational Excellence
Security
- Data Protection (Encryption)
- How are you encrypting your data at rest?
- How are you encrypting your data in transit? (SSL)
- Privilege Management
- How are you protecting access to and use of the AWS root account credentials? (MFA)
- How are you defining roles and responsibilities of system users to control human access to the AWS Management Console and APIs? (Groups?)
- How are you limiting automated access (such as from applications, scripts, or 3rd party tools or services) to AWS resources? (Roles from Identity Access Management)
- How are you managing keys and credentials?
- Infrastructure Protection (physical layer handled by AWS; you protect at the VPC level)
- How are you enforcing network and host level boundary protection?
- How are you enforcing AWS service level protection?
- How are you protecting the integrity of the operating systems on your EC2 instances?
- Detective Controls (CloudTrail)
- How are you capturing and analyzing AWS logs? (CloudWatch, CloudTrail – Verify!)
Reliability
(These seem to be the same as Pillar 5 – Operational Excellence)
- Foundations (Preparations)
- How are you managing AWS service limits for your account? (Billing alerts?)
- How are you planning your network topology on AWS?
- Do you have an escalation path to deal with technical issues?
- AWS Technical Account Manager?
- Change Management (Operations)
- How does your system adapt to changes in demand? (Auto Scaling)
- How are you monitoring AWS resources?
- How are you executing change management? (Automated via Auto Scaling)
- Failure Management (Responses)
- How are you backing up your data?
- How does your system withstand component failures?
- How are you planning for recovery?
Performance Efficiency
Each set of 4 questions is pretty much the same for each key area
- Compute
- How do you select the appropriate instance type for your system?
- How do you ensure you continue to have the right type as new types and features are introduced?
- How are you monitoring these systems to ensure they perform as expected?
- How do you ensure you have the correct quantity of these instances?
- Storage
- How do you select the appropriate storage type for your system?
- How do you ensure you continue to have the right type as new types and features are introduced?
- How are you monitoring these systems to ensure they perform as expected?
- How do you ensure you have enough capacity and throughput to match your demand?
- Database
- How do you select the appropriate database solution for your system?
- How do you ensure you continue to have the right type as new types and features are introduced?
- How are you monitoring these systems to ensure they perform as expected?
- How do you ensure you have enough capacity and throughput to match your demand?
- Network (Space-time trade-off)
- How do you select the appropriate proximity and caching solutions for your system?
- How do you ensure you continue to have the right proximity and caching solutions as new types and features are introduced?
- How are you monitoring your proximity and caching solutions to ensure they perform as expected?
- How do you ensure that the proximity and caching solutions you have match your demand?
Cost Optimization
- Matched supply and demand
- How do you make sure your capacity matches but does not substantially exceed your demand?
- How are you optimizing your usage of AWS services?
- Cost-effective resources
- Have you selected the appropriate resource types to meet your cost targets?
- Have you selected the appropriate pricing model to meet your cost targets?
- Are there managed services (SaaS) that you can use to improve your ROI?
- Expenditure awareness
- What access controls and procedures do you have in place to govern AWS costs?
- How are you monitoring usage and spending?
- How do you decommission resources that you no longer need, or stop resources that are temporarily not needed?
- How do you consider data-transfer charges when designing your architecture?
- Optimization over time
- How do you manage and/or consider the adoption of new services as they become available to you? (Subscribe to the AWS blog!)
Operational Excellence
(Still think this is the same as Pillar 2, Reliability)
- Preparation
- What best practices for cloud operations are you using?
- How are you doing configuration management for your workload?
- Operations
- How are you evolving your workload while minimizing the impact of change?
- How do you monitor your workload to ensure it is operating as expected?
- Responses
- How do you respond to unplanned operational events?
- How is escalation managed when responding to unplanned operational events?
