Monitoring and Alerting for AWS EC2

  Amazon Web Services (AWS), CloudWatch

Introduction to Monitoring and Alerting for EC2

https://www.udemy.com/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns/learn/v4/t/lecture/7082800?start=0

Monitoring EC2

  • Basic metrics (CPU Utilization, Network) are pushed every 5 minutes
    • Frequency can be increased to every 1 minute at extra charge
      • requires more compute
      • requires more storage
  • Track various standard metrics
  • Track custom metrics
  • Automatically stop, terminate, reboot or recover EC2 instances
    • Can automatically detect a VM that has crashed due to a software or hardware issue and terminate that instance
    • Ensures you do not have dead servers in your pool
    • Reduces charges – you’re charged for VMs as long as they are up
  • Two status checks (system and instance) report the overall status

Monitoring EC2 Instances

https://www.udemy.com/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns/learn/v4/t/lecture/7082802?start=0

  • Basic monitoring (5 min) is enabled by default
  • Detailed monitoring (1 min) is optional
    • Can be enabled from the EC2 console
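
Detailed monitoring can also be switched on from code instead of the console. The sketch below assumes the boto3 SDK is installed and credentials are configured; the function name is my own, and the client can be passed in so the logic can be tried without touching AWS.

```python
def enable_detailed_monitoring(instance_ids, ec2_client=None):
    """Switch instances from basic (5 min) to detailed (1 min) monitoring.

    Returns a dict mapping each instance ID to the monitoring state
    reported by EC2 (for example "pending" or "enabled").
    """
    instance_ids = list(instance_ids)
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    if ec2_client is None:
        import boto3  # third-party AWS SDK, imported lazily (assumed installed)
        ec2_client = boto3.client("ec2")
    response = ec2_client.monitor_instances(InstanceIds=instance_ids)
    return {
        item["InstanceId"]: item["Monitoring"]["State"]
        for item in response.get("InstanceMonitorings", [])
    }
```

Calling enable_detailed_monitoring(["i-0bdac641606572f86"]) would request 1 minute metrics for the example instance used later in these notes; detailed monitoring is billed extra.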

EC2 and CloudWatch Alarms

https://www.udemy.com/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns/learn/v4/t/lecture/7082804?start=0

How to use Alarms to manage our EC2 instances

  • CloudWatch > Metrics > EC2
    • Per-Instance Metrics
      • Good for setting alarms for individual instances
    • Aggregated by Instance Type
      • Aggregate value for ALL EC2 instances of that type (t2.micro, m4.large, etc.)
      • Must have Detailed Monitoring Enabled!

Per-Instance

  • Filter by the EC2 instance ID
    • Example: i-0bdac641606572f86
    • Filtering by tags is NOT supported
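
The same per-instance filter can be expressed as an InstanceId dimension when listing metrics through the API. This is a minimal sketch assuming boto3; the helper names are hypothetical and only the first page of results is read.

```python
def per_instance_metric_query(instance_id, metric_name=None):
    """Build list_metrics arguments restricted to a single EC2 instance."""
    params = {
        "Namespace": "AWS/EC2",
        # Per-instance metrics are selected by the InstanceId dimension.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
    }
    if metric_name is not None:
        params["MetricName"] = metric_name
    return params


def list_instance_metric_names(instance_id, cloudwatch_client=None):
    """Return the sorted metric names CloudWatch holds for one instance."""
    if cloudwatch_client is None:
        import boto3  # third-party AWS SDK, imported lazily (assumed installed)
        cloudwatch_client = boto3.client("cloudwatch")
    # Only the first page is read here; a NextToken loop is omitted for brevity.
    response = cloudwatch_client.list_metrics(**per_instance_metric_query(instance_id))
    return sorted({metric["MetricName"] for metric in response.get("Metrics", [])})
```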

Create a Per-Instance Alarm

  • CloudWatch > Alarms > Filter on Instance ID > Select Metric (CPU Utilization)
    • Adjust parameters (Time interval, etc.)
    • [Next] > Define Alarm
  • Define Alarm
    • Alarm Threshold
      • Name, Description, etc.
      • Criteria for setting alarm (Metric >= X)
      • Consecutive periods
    • Actions
      • Alarm state is: OK, ALARM or INSUFFICIENT_DATA
      • Notifications (Default)
        • Send notifications to a topic set up in SNS
        • Can also deliver the message to SQS (through an SNS subscription)
      • Auto Scaling
        • Works best (only?) when EC2 instances are working together as a Load Balanced group or cluster
        • Example: CPU Utilization >60
      • EC2 Action
        • Requires an IAM “EC2ActionsAccess” Role
        • Recover
        • Stop
        • Terminate
        • Reboot
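
The console steps above map fairly directly onto the CloudWatch put_metric_alarm call. The sketch below is one way to build the arguments, assuming boto3; the alarm name pattern, default threshold and function names are illustrative choices of mine. It only offers stop, terminate and reboot for a CPU alarm, on the assumption that recover is meant to be paired with the system status check metric.

```python
ALLOWED_EC2_ACTIONS = ("stop", "terminate", "reboot")


def cpu_alarm_params(instance_id, region, threshold=60.0, evaluation_periods=2,
                     ec2_action=None, sns_topic_arn=None):
    """Build put_metric_alarm arguments for a per-instance CPU alarm."""
    actions = []
    if sns_topic_arn:
        actions.append(sns_topic_arn)  # notification through an SNS topic
    if ec2_action is not None:
        if ec2_action not in ALLOWED_EC2_ACTIONS:
            raise ValueError(f"unsupported EC2 action: {ec2_action!r}")
        actions.append(f"arn:aws:automate:{region}:ec2:{ec2_action}")
    return {
        "AlarmName": f"cpu-high-{instance_id}",  # hypothetical naming scheme
        "AlarmDescription": f"CPU >= {threshold}% on {instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds; matches basic 5 minute monitoring
        "EvaluationPeriods": evaluation_periods,  # consecutive periods
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # Metric >= X
        "AlarmActions": actions,
    }


def create_cpu_alarm(cloudwatch_client=None, **kwargs):
    """Create the alarm; kwargs are passed to cpu_alarm_params."""
    if cloudwatch_client is None:
        import boto3  # third-party AWS SDK, imported lazily (assumed installed)
        cloudwatch_client = boto3.client("cloudwatch")
    cloudwatch_client.put_metric_alarm(**cpu_alarm_params(**kwargs))
```

For example, create_cpu_alarm(instance_id="i-0bdac641606572f86", region="us-east-1", ec2_action="stop") would stop the instance after two consecutive 5 minute periods at or above 60% CPU.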

EC2 System Checks and metrics

https://www.udemy.com/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns/learn/v4/t/lecture/7082806?start=0

Different types of Status checks

  • Status checks are performed every minute
    • Pass
      • All checks must pass for this status to be returned
    • Fail
  • System Status Checks
    • Monitor systems for the underlying infrastructure – Actual Hardware
    • Detects problems that require AWS involvement
      • Loss of network connectivity
      • Loss of system power
      • Software issues on the physical host
      • Hardware issues on the physical host
    • You CANNOT do anything to ‘fix’ these problems
    • You CAN terminate and replace the VM; the replacement is placed on a new host
  • Instance Status Checks
    • Issues that might prevent your VM from running applications
      • Incorrect networking or startup configurations
      • Exhausted Memory
      • Corrupt file system
      • Incompatible kernel
    • Usually can be fixed by rebooting or making configuration changes

Demo

  • You can view the status of these system checks in the EC2 console
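
The same two checks are exposed by the EC2 describe_instance_status call. The sketch below summarises its response; it assumes boto3, and the helper names are mine. Both checks must report "ok" for an instance to count as passing.

```python
def summarize_status_checks(response):
    """Map instance ID -> system and instance check status strings."""
    summary = {}
    for entry in response.get("InstanceStatuses", []):
        summary[entry["InstanceId"]] = {
            "system": entry["SystemStatus"]["Status"],      # underlying host
            "instance": entry["InstanceStatus"]["Status"],  # the VM itself
        }
    return summary


def all_checks_pass(status):
    """True only when both the system and the instance check are ok."""
    return status["system"] == "ok" and status["instance"] == "ok"


def fetch_status_checks(instance_ids, ec2_client=None):
    """Query EC2 for the status checks of the given instances."""
    if ec2_client is None:
        import boto3  # third-party AWS SDK, imported lazily (assumed installed)
        ec2_client = boto3.client("ec2")
    response = ec2_client.describe_instance_status(
        InstanceIds=list(instance_ids),
        IncludeAllInstances=True,  # also report instances that are not running
    )
    return summarize_status_checks(response)
```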

CloudWatch Metrics for EC2

  • CPU Utilization: compute in use, as a percentage
  • Disk Read and Write Operations
    • Measure the completed Read / Write operations from all Instance Store volumes available to the VM
  • Network In / Out
    • Bytes sent or received on all network interfaces on the VM
  • Status Check Failed
    • Pass or Fail for BOTH the system and instance status checks
    • If value = 0, both passed
    • If value = 1, one or both failed
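
The 0/1 rule above is a logical OR of the two checks, which is why a threshold of 1 works for alarms on these metrics. The sketch below shows that rule plus alarm arguments using the recover action. It assumes the separate StatusCheckFailed_System metric and the Maximum statistic are suitable here, so treat it as a starting point; the names are hypothetical.

```python
def status_check_failed(system_failed, instance_failed):
    """Combined metric value: 0 if both checks passed, 1 if either failed."""
    return 1 if (system_failed or instance_failed) else 0


def system_check_recover_alarm(instance_id, region, evaluation_periods=2):
    """Build put_metric_alarm arguments that recover an instance when the
    system status check keeps failing."""
    return {
        "AlarmName": f"system-check-recover-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",  # any failure within the period counts
        "Period": 60,            # status checks run every minute
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,        # value 1 means the check failed
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }
```

The returned dict can be passed to a CloudWatch client as put_metric_alarm(**params).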

Custom Metrics

  • Can publish to AWS using the AWS CLI or the API
  • Can view statistical graphs within the console
  • CloudWatch stores the data as a series of datapoints
    • Each datapoint has an associated timestamp
  • Can use these custom metrics to set up custom alarms
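
Publishing through the API amounts to sending timestamped datapoints under a namespace of your choosing. A minimal sketch assuming boto3; the namespace "Custom/Example" and the metric name in the usage note are made up for illustration.

```python
from datetime import datetime, timezone


def custom_datapoint(metric_name, value, instance_id, unit="None", timestamp=None):
    """Build one datapoint; each datapoint carries its own timestamp."""
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,  # e.g. "Percent" or "Count"; "None" means no unit
    }


def publish_datapoints(namespace, datapoints, cloudwatch_client=None):
    """Send datapoints to CloudWatch under a custom namespace."""
    if namespace.startswith("AWS/"):
        raise ValueError("the AWS/ prefix is reserved for AWS services")
    if cloudwatch_client is None:
        import boto3  # third-party AWS SDK, imported lazily (assumed installed)
        cloudwatch_client = boto3.client("cloudwatch")
    cloudwatch_client.put_metric_data(Namespace=namespace, MetricData=list(datapoints))
```

For example, publish_datapoints("Custom/Example", [custom_datapoint("QueueDepth", 7, "i-0bdac641606572f86", unit="Count")]) would record a single value that can then be graphed or used in an alarm.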

Custom metrics for EC2

https://www.udemy.com/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns/learn/v4/t/lecture/7082808?start=0

How to Create Custom Metrics

Create the IAM Roles

  • Required:
    • To send data to CloudWatch
    • For CloudWatch to fetch these metrics from the EC2 instance
    • Any time an AWS service wants to talk to another AWS service, it requires a role that grants the necessary privileges.
  • IAM > Policies > [Create Policy]
    • CloudWatch:
      • Put Metric Data
      • Get Metric Statistics
      • List Metrics
    • EC2
      • Describe Tags
    • Name: Ec2-Custom-CloudWatch
  • IAM > Roles > [Create role]
    • Add Ec2-Custom-CloudWatch policy
    • Name: CustomMetricsRole

Add the role to an existing EC2 instance

  • EC2 > Instances > Select Instance
  • Actions > Instance Settings > Attach/Replace IAM Roles > CustomMetricsRole
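
The console action can also be done through the EC2 API, which attaches an instance profile rather than the role itself. The sketch below assumes boto3 and assumes an instance profile named after the role exists (the console normally creates one for EC2 roles). It covers only the first attachment; replacing an existing profile uses a different call.

```python
def attach_instance_profile(instance_id, profile_name="CustomMetricsRole", ec2_client=None):
    """Attach an IAM instance profile to an instance with none attached.

    Returns the association state reported by EC2, e.g. "associating".
    """
    if ec2_client is None:
        import boto3  # third-party AWS SDK, imported lazily (assumed installed)
        ec2_client = boto3.client("ec2")
    response = ec2_client.associate_iam_instance_profile(
        IamInstanceProfile={"Name": profile_name},
        InstanceId=instance_id,
    )
    return response["IamInstanceProfileAssociation"]["State"]
```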

Install the monitoring scripts

sudo yum install perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https

In the next section, I found I also had to install perl-Digest-SHA to prevent this error:

Can't locate Digest/SHA.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . .) at AwsSignatureV4.pm line 23.

sudo yum install -y perl-Digest-SHA

Download and unzip the monitoring scripts

wget http://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip
unzip CloudWatchMonitoringScripts-1.2.1.zip

Delete the downloaded zip file

rm CloudWatchMonitoringScripts-1.2.1.zip

Switch to the script directory

cd aws-scripts-mon

List the contents of the directory

ls -la

  • awscreds.template
    • provides credentials for the EC2 instance
      • Access key ID and secret access key
    • Only required if you did not assign the IAM role created above.
  • AwsSignatureV4.pm
  • CloudWatchClient.pm
  • LICENSE.txt
  • mon-get-instance-stats.pl
    • Important!
  • mon-put-instance-data.pl
    • Important!
  • NOTICE.txt

Push custom metrics from EC2 to CloudWatch

https://www.udemy.com/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns/learn/v4/t/lecture/7082810?start=0

File Descriptions

  • awscreds.template
    • provides credentials for the EC2 instance
      • Access key ID and secret access key
    • Only required if you did not assign the IAM role created above.
  • AwsSignatureV4.pm
  • CloudWatchClient.pm
  • mon-get-instance-stats.pl
    • Queries AWS CloudWatch and displays the most recent metrics for the instance.
  • mon-put-instance-data.pl
    • Collects system metrics on an EC2 instance and sends them to AWS CloudWatch
      • Memory
      • Swap
      • Disk space utilization
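
To give a feel for what the memory metric involves, here is a Python sketch of a memory utilization calculation from /proc/meminfo. It is not the Perl script's code. Counting cache and buffers as available by default is my assumption about a sensible definition, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_utilization(meminfo, include_cache_buffers=False):
    """Percentage of memory in use.

    By default cache and buffers are treated as available memory, so they
    do not count as used.
    """
    total = meminfo["MemTotal"]
    available = meminfo["MemFree"]
    if not include_cache_buffers:
        available += meminfo.get("Cached", 0) + meminfo.get("Buffers", 0)
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:  # Linux only
            print(round(memory_utilization(parse_meminfo(handle.read())), 1))
    except FileNotFoundError:
        print("no /proc/meminfo on this platform")
```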

Running the scripts

./mon-put-instance-data.pl --mem-util --mem-used --mem-avail

Successfully reported metrics to CloudWatch. Reference Id: a581db4c-b47e-11e8-a10b-c96e1c074e21

Verify the Metrics

  • CloudWatch > Metrics > Linux System > InstanceId
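
The same check can be made through the API by asking for recent datapoints. The sketch below assumes the scripts publish under the namespace "System/Linux" with a metric named "MemoryUtilization" (which is how the console's "Linux System" group appears to be labelled); verify both names in your account. The helper names are mine.

```python
from datetime import datetime, timedelta, timezone


def recent_memory_query(instance_id, minutes=30, now=None):
    """Build get_metric_statistics arguments for the last few minutes."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "System/Linux",        # assumption, see the note above
        "MetricName": "MemoryUtilization",  # assumption, see the note above
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,
        "Statistics": ["Average"],
    }


def latest_datapoint(response):
    """Return the newest datapoint in a response, or None if there are none."""
    points = sorted(response.get("Datapoints", []), key=lambda p: p["Timestamp"])
    return points[-1] if points else None
```

With a boto3 CloudWatch client, latest_datapoint(client.get_metric_statistics(**recent_memory_query("i-0bdac641606572f86"))) would return the most recent average, if any data has arrived.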

Set up as a recurring cron job

crontab -e
*/5 * * * * /home/ec2-user/aws-scripts-mon/mon-put-instance-data.pl --mem-util --disk-space-util --disk-path=/ --from-cron
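
The */5 in the minute field means "every minute divisible by 5", so the script runs 12 times an hour, the same cadence as basic monitoring. A tiny sketch of that step rule (the function name is hypothetical):

```python
def minutes_for_step(step, upper=60):
    """Minutes of the hour matched by a cron minute field of the form */step."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [minute for minute in range(upper) if minute % step == 0]


if __name__ == "__main__":
    runs = minutes_for_step(5)
    print(runs)       # minutes past the hour at which the job fires
    print(len(runs))  # number of runs per hour
```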

Quiz

Amazon EC2 stands for:

  • Elastic Cloud Compute
  • Elastic Cloud Two
  • Elastic Clouding Compute
  • Elastic Cloud Computation

Note: the official name is Elastic Compute Cloud, which none of the options above matches exactly.

The two different status checks which detect issues with an AWS instance are:

  • System status check and Non-system-status check
  • System status check and Instance-status check
    • These two status checks monitor the AWS systems running behind the scenes (such as the hypervisor) and the overall health of the VM itself.
  • System status check and Monitoring system check
  • Instance status check and Monitoring status check

“The status checks are meant for monitoring your AWS instances and their underlying hardware, network and software configurations.”

  • True
  • False

Which of the following metrics identifies the processing power being consumed to run the operating system and applications on a selected instance?

  • CPUUtilization
  • DiskUtilization
  • NetworkUtilization
  • PowerUtilization

“By default, AWS EC2 is configured to provide metrics at 5 minute intervals, but detailed monitoring, which provides metric data every 1 minute, can also be enabled.”

  • True
  • False