{"id":1569,"date":"2018-09-04T09:20:40","date_gmt":"2018-09-04T09:20:40","guid":{"rendered":"http:\/\/wiki.thomasandsofia.com\/?p=1569"},"modified":"2018-09-10T23:04:27","modified_gmt":"2018-09-10T23:04:27","slug":"monitoring-and-alerting-for-aws-ec2","status":"publish","type":"post","link":"https:\/\/wiki.thomasandsofia.com\/?p=1569","title":{"rendered":"Monitoring and Alerting for AWS EC2"},"content":{"rendered":"<p><a href=\"http:\/\/wiki.thomasandsofia.com\/2018\/08\/31\/aws-cloudwatch-masterclass\/\">Main Menu<\/a><\/p>\n<h1>Section Menu<\/h1>\n<ul>\n<li><a href=\"#IntroductiontoMonitoringandalertingEC2\">Introduction to Monitoring and alerting EC2<\/a><\/li>\n<li><a href=\"#MonitoringEC2Instances\">Monitoring EC2 Instances<\/a><\/li>\n<li><a href=\"#EC2andCloudWatchAlarms\">EC2 and CloudWatch Alarms<\/a><\/li>\n<li><a href=\"#EC2SystemChecksandmetrics\">EC2 System Checks and metrics<\/a><\/li>\n<li><a href=\"#CustommetricsforEC2\">Custom metrics for EC2<\/a><\/li>\n<li><a href=\"#PushcustommetricsfromEC2toCloudWatch\">Push custom metrics from EC2 to CloudWatch<\/a><\/li>\n<li><a href=\"#Quiz\">Quiz<\/a><\/li>\n<\/ul>\n<p><a name=\"IntroductiontoMonitoringandalertingEC2\"><\/a><\/p>\n<h1>Introduction to Monitoring and alerting EC2<\/h1>\n<p><a href=\"https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082800?start=0\" target=\"_blank\" rel=\"noopener\">https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082800?start=0<\/a><\/p>\n<p>Monitoring EC2<\/p>\n<ul>\n<li>Basic metrics (CPU Utilization, Network) are pushed every 5 minutes\n<ul>\n<li>Can be increased to 1 min at extra charge\n<ul>\n<li>requires more compute<\/li>\n<li>requires more storage<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Track various standard metrics<\/li>\n<li>Track custom metrics<\/li>\n<li>Automatically start, stop, terminate, reboot or recover EC2\n<ul>\n<li>Can automatically detect 
a VM that has crashed due to some software or hardware issue and terminate that instance.<\/li>\n<li>Ensures you do not have dead servers in your pool<\/li>\n<li>Reduces charges &#8211; you&#8217;re charged for VMs as long as they are up.<\/li>\n<\/ul>\n<\/li>\n<li>Two System Level checks for the overall status<\/li>\n<\/ul>\n<p><a name=\"MonitoringEC2Instances\"><\/a><\/p>\n<h1>Monitoring EC2 Instances<\/h1>\n<p><a href=\"https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082802?start=0\" target=\"_blank\" rel=\"noopener\">https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082802?start=0<\/a><\/p>\n<ul>\n<li>Basic monitoring (5 min) enabled by default\n<ul>\n<li>Detailed monitoring: 1 min<\/li>\n<li>Can be enabled from the EC2 console<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><a name=\"EC2andCloudWatchAlarms\"><\/a><\/p>\n<h1>EC2 and CloudWatch Alarms<\/h1>\n<p><a href=\"https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082804?start=0\" target=\"_blank\" rel=\"noopener\">https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082804?start=0<\/a><\/p>\n<h2>How to use Alarms to manage our EC2 instances<\/h2>\n<ul>\n<li>CloudWatch &gt; Metrics &gt; EC2\n<ul>\n<li>Per-Instance Metrics\n<ul>\n<li>Good for setting alarms for individual instances<\/li>\n<\/ul>\n<\/li>\n<li>Aggregated by Instance Type\n<ul>\n<li>Aggregate value for ALL EC2 instances of that type (t2.micro, m4.large, etc.)<\/li>\n<li><strong>Must have Detailed Monitoring Enabled!<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Per-Instance<\/h2>\n<ul>\n<li>Filter by the EC2 instance ID\n<ul>\n<li>Example: <span id=\"detailsInstanceId\">i-0bdac641606572f86<\/span><\/li>\n<li>Filter by Tags is NOT supported<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Create a Per-Instance 
Alarm<\/h2>\n<ul>\n<li>CloudWatch &gt; Alarms &gt; Filter on Instance ID &gt; Select Metric (CPU Utilization)\n<ul>\n<li>Adjust parameters (Time interval, etc.)<\/li>\n<li>[Next] &gt; Define Alarm<\/li>\n<\/ul>\n<\/li>\n<li>Define Alarm\n<ul>\n<li>Alarm Threshold\n<ul>\n<li>Name, Description, etc.<\/li>\n<li>Criteria for setting alarm (Metric &gt;= X)<\/li>\n<li>Consecutive periods<\/li>\n<\/ul>\n<\/li>\n<li>Actions\n<ul>\n<li>Alarm is: OK, Alarm or Insufficient data<\/li>\n<li>Notifications (Default)\n<ul>\n<li>Send notifications set up in SNS<\/li>\n<li>Can also output a document to SQS<\/li>\n<\/ul>\n<\/li>\n<li>Auto Scaling\n<ul>\n<li>Works best (only?) when EC2 instances are working together as a Load Balanced group or cluster<\/li>\n<li>Example: CPU Utilization &gt; 60%<\/li>\n<\/ul>\n<\/li>\n<li>EC2 Action\n<ul>\n<li><strong>Requires an IAM &#8220;EC2ActionsAccess&#8221; Role<\/strong><\/li>\n<li>Recover<\/li>\n<li>Stop<\/li>\n<li>Terminate<\/li>\n<li>Reboot<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><a name=\"EC2SystemChecksandmetrics\"><\/a><\/p>\n<h1>EC2 System Checks and metrics<\/h1>\n<p><a href=\"https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082806?start=0\" target=\"_blank\" rel=\"noopener\">https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082806?start=0<\/a><\/p>\n<h2>Different types of Status checks<\/h2>\n<ul>\n<li>Status checks performed Every Minute\n<ul>\n<li>Pass\n<ul>\n<li>All checks must pass for this status to be returned<\/li>\n<\/ul>\n<\/li>\n<li>Fail<\/li>\n<\/ul>\n<\/li>\n<li>System Status Checks\n<ul>\n<li>Monitor systems for the underlying infrastructure &#8211; Actual Hardware<\/li>\n<li>Detects problems that require AWS involvement\n<ul>\n<li>Loss of network connectivity<\/li>\n<li>Loss of system power<\/li>\n<li>Software issues on the physical host<\/li>\n<li>Hardware issues on the physical 
host<\/li>\n<\/ul>\n<\/li>\n<li>You CANNOT do anything to &#8216;fix&#8217; these problems<\/li>\n<li>You CAN terminate and replace the VM; the replacement is automatically placed on a new host<\/li>\n<\/ul>\n<\/li>\n<li>Instance Status Checks\n<ul>\n<li>Issues that might prevent your VM from running applications\n<ul>\n<li>Incorrect networking or startup configurations<\/li>\n<li>Exhausted Memory<\/li>\n<li>Corrupt file system<\/li>\n<li>Incompatible kernel<\/li>\n<\/ul>\n<\/li>\n<li>Usually can be fixed by rebooting or making configuration changes<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Demo<\/h2>\n<ul>\n<li>You can view the status of these status checks in the EC2 console<\/li>\n<\/ul>\n<h2>CloudWatch Metrics for EC2<\/h2>\n<ul>\n<li>CPU Utilization: Compute used, as a %<\/li>\n<li>Disk Read and Write Operations\n<ul>\n<li>Measure the completed Read \/ Write operations from all Instance Store volumes available to the VM<\/li>\n<\/ul>\n<\/li>\n<li>Network In \/ Out\n<ul>\n<li>Bytes sent or received on all network interfaces on the VM<\/li>\n<\/ul>\n<\/li>\n<li>Status Check Failed\n<ul>\n<li>Pass or Fail for BOTH the system and instance status checks<\/li>\n<li>If value = 0, both passed<\/li>\n<li>If value = 1, one or both failed<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Custom Metrics<\/h2>\n<ul>\n<li>Can publish to CloudWatch using the AWS CLI or the API<\/li>\n<li>Can view statistical graphs within the console<\/li>\n<li>CloudWatch stores the data as a series of datapoints\n<ul>\n<li>Each datapoint has an associated timestamp<\/li>\n<\/ul>\n<\/li>\n<li>Can use these custom metrics to set up custom alarms<\/li>\n<\/ul>\n<p><a name=\"CustommetricsforEC2\"><\/a><\/p>\n<h1>Custom metrics for EC2<\/h1>\n<p><a href=\"https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082808?start=0\" target=\"_blank\" 
rel=\"noopener\">https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082808?start=0<\/a><\/p>\n<h2>How to Create Custom Metrics<\/h2>\n<h3>Create the IAM Roles<\/h3>\n<ul>\n<li>Required:\n<ul>\n<li>To send data to CloudWatch<\/li>\n<li>For CloudWatch to fetch these metrics from the EC2 instance<\/li>\n<li>Any time an AWS service needs to talk to another AWS service, it requires a role that grants the necessary privileges.<\/li>\n<\/ul>\n<\/li>\n<li>IAM &gt; Policies &gt; [Create Policy]\n<ul>\n<li>CloudWatch:\n<ul>\n<li>Put Metric Data<\/li>\n<li>Get Metric Statistics<\/li>\n<li>List Metrics<\/li>\n<\/ul>\n<\/li>\n<li>EC2\n<ul>\n<li>Describe Tags<\/li>\n<\/ul>\n<\/li>\n<li>Name: Ec2-Custom-CloudWatch<\/li>\n<\/ul>\n<\/li>\n<li>IAM &gt; Roles &gt; [Create role]\n<ul>\n<li>Add Ec2-Custom-CloudWatch policy<\/li>\n<li>Name: CustomMetricsRole<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Add the role to an existing EC2 instance<\/h3>\n<ul>\n<li>EC2 &gt; Instances &gt; Select Instance<\/li>\n<li>Actions &gt; Instance Settings &gt; Attach\/Replace IAM Role &gt; CustomMetricsRole<\/li>\n<\/ul>\n<h2>Install the monitoring scripts<\/h2>\n<pre>sudo yum install perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https<\/pre>\n<div class=\"warning\">In the next section, I found I also had to install the following to prevent this error:<br \/>\n<i>Can&#8217;t locate Digest\/SHA.pm in @INC (@INC contains: \/usr\/local\/lib64\/perl5 \/usr\/local\/share\/perl5 \/usr\/lib64\/perl5\/vendor_perl \/usr\/share\/perl5\/vendor_perl \/usr\/lib64\/perl5 \/usr\/share\/perl5 . .) 
at AwsSignatureV4.pm line 23.<\/i><\/p>\n<pre>sudo yum install -y perl-Digest-SHA<\/pre>\n<\/div>\n<pre>wget http:\/\/aws-cloudwatch.s3.amazonaws.com\/downloads\/CloudWatchMonitoringScripts-1.2.1.zip<\/pre>\n<pre>unzip CloudWatchMonitoringScripts-1.2.1.zip<\/pre>\n<p>Delete the downloaded zip file<\/p>\n<pre>rm CloudWatchMonitoringScripts-1.2.1.zip<\/pre>\n<p>Switch to the script directory<\/p>\n<pre>cd aws-scripts-mon<\/pre>\n<p>List the contents of the directory<\/p>\n<pre>ls -la<\/pre>\n<ul>\n<li>awscreds.template\n<ul>\n<li>provides credentials for EC2 instance\n<ul>\n<li>Access Key ID and Secret Key<\/li>\n<\/ul>\n<\/li>\n<li>Only required if you did not assign the CloudWatch role created above.<\/li>\n<\/ul>\n<\/li>\n<li>AwsSignatureV4.pm<\/li>\n<li>CloudWatchClient.pm<\/li>\n<li>LICENSE.txt<\/li>\n<li>mon-get-instance-stats.pl\n<ul>\n<li>Important!<\/li>\n<\/ul>\n<\/li>\n<li>mon-put-instance-data.pl\n<ul>\n<li>Important!<\/li>\n<\/ul>\n<\/li>\n<li>NOTICE.txt<\/li>\n<\/ul>\n<p><a name=\"PushcustommetricsfromEC2toCloudWatch\"><\/a><\/p>\n<h1>Push custom metrics from EC2 to CloudWatch<\/h1>\n<p><a href=\"https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082810?start=0\" target=\"_blank\" rel=\"noopener\">https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082810?start=0<\/a><\/p>\n<h2>File Descriptions<\/h2>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>awscreds.template\n<ul>\n<li>provides credentials for EC2 instance\n<ul>\n<li>Access Key ID and Secret Key<\/li>\n<\/ul>\n<\/li>\n<li>Only required if you did not assign the IAM CloudWatch role created above.<\/li>\n<\/ul>\n<\/li>\n<li>AwsSignatureV4.pm<\/li>\n<li>CloudWatchClient.pm<\/li>\n<li>mon-get-instance-stats.pl\n<ul>\n<li>Queries AWS CloudWatch and displays the most recent metrics for the instance.<\/li>\n<\/ul>\n<\/li>\n<li>mon-put-instance-data.pl\n<ul>\n<li>Collects system metrics on an EC2 
instance and sends them to AWS CloudWatch\n<ul>\n<li>Memory<\/li>\n<li>Swap<\/li>\n<li>Disk space utilization<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Running the scripts<\/h2>\n<pre>.\/mon-put-instance-data.pl --mem-util --mem-used --mem-avail<\/pre>\n<p>Successfully reported metrics to CloudWatch. Reference Id: a581db4c-b47e-11e8-a10b-c96e1c074e21<\/p>\n<h2>Verify the Metrics<\/h2>\n<ul>\n<li>CloudWatch &gt; Metrics &gt; Linux System &gt; InstanceId<\/li>\n<\/ul>\n<p>Set up as a recurring cron job<\/p>\n<pre>crontab -e\r\n*\/5 * * * * \/home\/ec2-user\/aws-scripts-mon\/mon-put-instance-data.pl --mem-util --disk-space-util --disk-path=\/ --from-cron\r\n<\/pre>\n<p><a name=\"Quiz\"><\/a><\/p>\n<h1>Quiz<\/h1>\n<h3>Amazon EC2 stands for:<\/h3>\n<ul>\n<li><strong>Elastic Compute Cloud<\/strong><\/li>\n<li>Elastic Cloud Two<\/li>\n<li>Elastic Clouding Compute<\/li>\n<li>Elastic Cloud Computation<\/li>\n<\/ul>\n<h3>The two different status checks which detect issues with an AWS instance are:<\/h3>\n<ul>\n<li>System status check and Non-system-status check<\/li>\n<li><strong>System status check and Instance-status check<\/strong>\n<ul>\n<li>These two status checks monitor the AWS systems running behind the scenes, such as the hypervisor, and the overall health of the VM itself.<\/li>\n<\/ul>\n<\/li>\n<li>System status check and Monitoring system check<\/li>\n<li>Instance status check and Monitoring status check<\/li>\n<\/ul>\n<h3>&#8220;The status checks are meant for monitoring your AWS instances and their underlying hardware, network and software configurations.&#8221;<\/h3>\n<ul>\n<li><strong>True<\/strong><\/li>\n<li>False<\/li>\n<\/ul>\n<h3>Which of the following metrics identifies the processing power consumed to run the operating system and applications on a selected 
instance.<\/h3>\n<ul>\n<li><strong>CPUUtilization<\/strong><\/li>\n<li>DiskUtilization<\/li>\n<li>NetworkUtilization<\/li>\n<li>PowerUtilization<\/li>\n<\/ul>\n<h3>&#8220;By default, AWS EC2 is configured to provide metrics at 5 minute intervals, but detailed monitoring, which provides metric data every 1 minute, can also be enabled.&#8221;<\/h3>\n<ul>\n<li><strong>True<\/strong><\/li>\n<li>False<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Main Menu Section Menu Introduction to Monitoring and alerting EC2 Monitoring EC2 Instances EC2 and CloudWatch Alarms EC2 System Checks and metrics Custom metrics for EC2 Push custom metrics from EC2 to CloudWatch Quiz Introduction to Monitoring and alerting EC2 https:\/\/www.udemy.com\/aws-monitoring-alerting-with-aws-cloudwatch-and-aws-sns\/learn\/v4\/t\/lecture\/7082800?start=0 Monitoring EC2 Basic metrics (CPU Utilization, Network) are pushed every 5 minutes Can be ..<\/p>\n<div class=\"clear-fix\"><\/div>\n<p><a href=\"https:\/\/wiki.thomasandsofia.com\/?p=1569\" title=\"read more...\">Read 
more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,39],"tags":[],"class_list":["post-1569","post","type-post","status-publish","format-standard","hentry","category-amazon-web-services-aws","category-cloudwatch"],"_links":{"self":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/1569","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1569"}],"version-history":[{"count":12,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/1569\/revisions"}],"predecessor-version":[{"id":1587,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/1569\/revisions\/1587"}],"wp:attachment":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1569"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1569"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1569"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}