Graduate Program KB

Cloud Monitoring

  • CloudWatch is a monitoring service that provides metrics for every service in AWS

    • Metric is a variable to monitor (Ex. CPUUtilization, NetworkIn, etc.)
    • Metrics have timestamps
    • Can create CloudWatch dashboards of metrics
  • Some important metrics:

    • EC2 instances: CPU Utilization, Status Checks, Network (not RAM)
      • Default metrics every 5 minutes
      • Detailed monitoring (paid option): metrics every 1 minute
    • EBS volumes: Disk reads / writes
    • S3 buckets: BucketSizeBytes, NumberOfObjects, AllRequests
    • Billing: Total Estimated Charge (only in us-east-1)
    • Service Limits: How much a service API is used
    • Custom metrics: Push your own metrics
  • Alarms are used to trigger notifications for any metric

    • Some alarm actions:
      • Auto Scaling: Increase or decrease EC2 instances "desired" count
      • EC2 Actions: Stop, terminate, reboot or recover an EC2 instance
      • SNS notifications: Send a notification into an SNS topic
    • Various options (sampling, %, max, min, etc.)
    • Choose the period to evaluate an alarm
    • Ex. Create a billing alarm on CloudWatch Billing metric
    • Alarm states: OK, INSUFFICIENT_DATA, ALARM
  • CloudWatch Logs can collect logs from:

    • Elastic Beanstalk: Collection of logs from application
    • ECS: Collection from containers
    • AWS Lambda: Collection from function logs
    • CloudTrail based on filter
    • CloudWatch log agents on EC2 machines or on-premises servers
    • Route 53: Log DNS queries
  • Enables real-time log monitoring

  • Adjustable CloudWatch Logs retention

  • CloudWatch Logs for EC2

    • By default, no logs from your EC2 instance will go to CloudWatch
    • Need to run a CloudWatch agent on EC2 to push log files
    • Ensure IAM permissions are correct
    • The CloudWatch log agent can be set up on-premises as well
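
The billing alarm mentioned above can be sketched as a set of parameters for CloudWatch's PutMetricAlarm API. This is a minimal sketch, not a definitive setup: the alarm name, the 50 USD threshold, the six-hour period and the SNS topic ARN (with a placeholder account ID) are all assumptions made for illustration. The code only builds the parameters; the actual call is shown as a comment because it needs boto3 and AWS credentials.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"billing-over-{threshold_usd}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # evaluate over six-hour periods (assumed value)
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        # Alarm action: send a notification into an SNS topic
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder ARN; replace with a real SNS topic in your own account.
params = billing_alarm_params(
    50, "arn:aws:sns:us-east-1:123456789012:billing-alerts"
)

# With boto3 installed and credentials configured, the alarm could be created with:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

The client is pointed at us-east-1 because that is the only region where the billing metric is available.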

Amazon EventBridge

  • A serverless event bus that routes events between AWS services and runs schedules (EventBridge Scheduler can create, execute and manage millions of schedules)

    • Used to be called CloudWatch Events
    • Schedule cron jobs (scheduled scripts) to run
    • Event pattern: Event rules to react to a service doing something
    • Targets: trigger Lambda functions, send SQS / SNS messages...
  • Schema Registry

    • Models event schemas
    • Can archive events (all / filtered) sent to an event bus (indefinitely / for a set period)
    • Ability to replay archived events
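
To make the event pattern idea concrete, here is a minimal sketch of how a rule's pattern is compared with an incoming event. The matcher is a simplified stand-in written for these notes, not EventBridge's implementation: it only handles exact-value lists and nested objects, and ignores operators such as prefix or numeric matching. The instance ID in the sample event is a placeholder.

```python
def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event.

    Simplified rules: a list in the pattern means "any of these values",
    a dict in the pattern means "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# React to EC2 instances being stopped or terminated.
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}

is_match = matches(pattern, event)
```

Fields present in the event but absent from the pattern (such as instance-id here) are ignored, so a pattern only needs to name the fields it cares about.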

AWS CloudTrail

  • A web service that records API activity in your AWS account
    • Obtain a history of events / API calls made through the Console, SDKs, CLI and AWS services
    • CloudTrail is enabled by default
    • Provides governance, compliance and audit for your AWS account
    • Put logs from CloudTrail into CloudWatch Logs or S3
    • A trail can be applied to all regions (default) or to a single region
    • If a resource is deleted in AWS, investigate CloudTrail first
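
As an illustration of investigating a deleted resource, the sketch below filters CloudTrail-style event records for destructive API calls. The field names (eventTime, eventSource, eventName, userIdentity) follow the CloudTrail record format, but the sample records, the user ARNs and the choice of "Delete" / "Terminate" prefixes are assumptions made for this example.

```python
DESTRUCTIVE_PREFIXES = ("Delete", "Terminate")  # assumed heuristic, not exhaustive


def find_deletions(records):
    """Return destructive API calls, most recent first."""
    hits = [r for r in records if r["eventName"].startswith(DESTRUCTIVE_PREFIXES)]
    # ISO 8601 UTC timestamps sort correctly as plain strings.
    return sorted(hits, key=lambda r: r["eventTime"], reverse=True)


# Hypothetical records, e.g. parsed from CloudTrail log files stored in S3.
records = [
    {"eventTime": "2024-05-01T09:00:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "RunInstances",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
    {"eventTime": "2024-05-01T10:15:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "TerminateInstances",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
    {"eventTime": "2024-05-01T11:30:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "DeleteBucket",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
]

for record in find_deletions(records):
    print(record["eventTime"], record["eventName"], record["userIdentity"]["arn"])
```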

AWS X-Ray

  • A service to monitor the components and services that make up your application

  • Typically, the traditional way to debug in production:

    • Test locally
    • Add log statements everywhere
    • Re-deploy in production
    • Log formats differ across applications which makes log analysis difficult
    • Easy to debug a single application, but hard for distributed services (connected with services such as SQS / SNS)
    • No common views of your entire architecture
  • AWS X-Ray resolves the issues of traditional debugging, some advantages are:

    • Troubleshooting performance (finding bottlenecks)
    • Understanding dependencies in a microservice architecture
    • Pinpoint service issues
    • Review request behaviour
    • Find errors and exceptions
    • Check whether time SLAs are being met
    • Find where requests are being throttled
    • Identify impacted users
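
The kind of questions listed above (where is the bottleneck, where am I throttled) can be sketched as a small analysis over trace segments. This is only an illustration of the idea, not how X-Ray itself computes its service map: the field names (name, start_time, end_time, error, fault, throttle) are borrowed from X-Ray segment documents, while the service names and timings are invented.

```python
from collections import defaultdict


def summarise(segments):
    """Aggregate call count, total time, errors and throttles per service."""
    stats = defaultdict(
        lambda: {"calls": 0, "seconds": 0.0, "errors": 0, "throttles": 0}
    )
    for seg in segments:
        s = stats[seg["name"]]
        s["calls"] += 1
        s["seconds"] += seg["end_time"] - seg["start_time"]
        s["errors"] += bool(seg.get("error") or seg.get("fault"))
        s["throttles"] += bool(seg.get("throttle"))
    return dict(stats)


def slowest(stats):
    """Name of the service that consumed the most total time."""
    return max(stats, key=lambda name: stats[name]["seconds"])


# Hypothetical segments from a small distributed application.
segments = [
    {"name": "api-gateway", "start_time": 1000.0, "end_time": 1000.25},
    {"name": "orders-service", "start_time": 1000.25, "end_time": 1001.75,
     "throttle": True},
    {"name": "orders-service", "start_time": 1002.0, "end_time": 1002.5},
    {"name": "payments-service", "start_time": 1001.0, "end_time": 1001.5,
     "fault": True},
]

summary = summarise(segments)
bottleneck = slowest(summary)
```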

Amazon CodeGuru

  • An ML-powered service for automated code reviews and application performance recommendations

  • Provides two functionalities:

    • CodeGuru Reviewer: Automated code reviews for static code analysis (development)
      • Built-in code reviews with actionable recommendations
    • CodeGuru Profiler: Visibility / recommendations about application performance during runtime (production)
      • Detect and optimise expensive lines of code pre-production
      • Identify performance and cost improvements in production
  • Amazon CodeGuru Reviewer

    • Identify critical issues, security vulnerabilities and bugs that are difficult to find
      • Ex. Common coding best practices, resource leaks, security detection, input validation
    • Uses machine learning and automated reasoning
    • Draws on lessons learned across millions of code reviews on thousands of open-source and Amazon repositories
    • Supports Java and Python
    • Integrated with GitHub, Bitbucket and AWS CodeCommit
  • Amazon CodeGuru Profiler

    • Helps understand the runtime behaviour of an application
      • Ex. Identify if your application is consuming excessive CPU capacity on a logging routine
    • Features:
      • Identify and remove code inefficiencies
      • Improve application performance (ex. reduce CPU utilisation)
      • Decrease compute costs
      • Provides a heap summary (identifies which objects are using up memory)
      • Anomaly detection
    • Supports applications running on AWS or on-premises
    • Minimal overhead on application
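
To show what "excessive CPU on a logging routine" looks like in a profile, here is a local sketch using Python's built-in cProfile module. It is a stand-in for the general idea of runtime profiling, not the CodeGuru Profiler agent, and the two functions are hypothetical examples written for these notes.

```python
import cProfile
import pstats


def format_log_line(i):
    # Deliberately wasteful: rebuilds a long string on every call.
    return " ".join(str(n) for n in range(200)) + f" request={i}"


def handle_requests(count):
    # The "real work" is trivial; most of the time goes into log formatting.
    lines = [format_log_line(i) for i in range(count)]
    return len(lines)


profiler = cProfile.Profile()
handled = profiler.runcall(handle_requests, 500)

stats = pstats.Stats(profiler)
# Map function name -> number of calls recorded by the profiler.
calls_by_function = {func[2]: entry[1] for func, entry in stats.stats.items()}

# Print the five most expensive entries by cumulative time.
stats.sort_stats("cumulative").print_stats(5)
```

In the printed report, format_log_line and the work it triggers should account for most of the cumulative time, which is the kind of hotspot a profiler is meant to surface.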

AWS Health Dashboard - Service History

  • Shows all regions and health of all services
  • Shows historical information for each day
  • Can subscribe to an RSS feed
  • Used to be called AWS Service Health Dashboard

AWS Health Dashboard - Your Account

  • AWS Account Health Dashboard is a global service providing alerts and remediation guidance when AWS is experiencing events that may impact you
    • Shows how AWS outages directly impact you and your AWS resources
    • Alerts, remediation, proactive, scheduled activities
  • While the Service Health Dashboard displays the general status of AWS services, Account Health Dashboard gives a personalised view into the performance and availability of the AWS services underlying your AWS resources
  • Dashboard displays relevant and timely information to help manage events in progress
    • Provides proactive notification to help you plan for scheduled activities
  • Can aggregate data across an entire organisation in AWS Organizations
  • Used to be called AWS Personal Health Dashboard (PHD)