Cloud Monitoring
-
CloudWatch is a monitoring service that provides metrics for every service in AWS
- Metric is a variable to monitor (Ex. CPUUtilization, NetworkIn, etc.)
- Metrics have timestamps
- Can create CloudWatch dashboards of metrics
-
Some important metrics:
- EC2 instances: CPU Utilization, Status Checks, Network (not RAM)
- Default metrics every 5 minutes
- Pay option for detailed monitoring, metrics every 1 minute
- EBS volumes: Disk reads / writes
- S3 buckets: BucketSizeBytes, NumberOfObjects, AllRequests
- Billing: Total Estimated Charge (only in us-east-1)
- Service Limits: How much a service API is used
- Custom metrics: Push your own metrics
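Since RAM is not among the default EC2 metrics, memory usage is a common candidate for a custom metric. A minimal sketch of the request shape for CloudWatch's PutMetricData call is below. The namespace `Custom/EC2`, the metric name `MemoryUtilization`, the instance ID and the helper `build_memory_metric` are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def build_memory_metric(instance_id: str, used_percent: float, when: datetime) -> dict:
    """Build keyword arguments shaped like a PutMetricData request."""
    return {
        "Namespace": "Custom/EC2",  # hypothetical namespace for our own metrics
        "MetricData": [
            {
                "MetricName": "MemoryUtilization",  # hypothetical metric name
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Timestamp": when,  # every metric datapoint carries a timestamp
                "Value": used_percent,
                "Unit": "Percent",
            }
        ],
    }


params = build_memory_metric(
    "i-0123456789abcdef0",  # made-up instance ID
    71.5,
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(params["MetricData"][0]["MetricName"], params["MetricData"][0]["Value"])

# Assuming boto3 is installed and credentials are configured, the dict could be sent with:
#   boto3.client("cloudwatch").put_metric_data(**params)
```

The sketch only builds the payload, so it runs without an AWS account; the actual push is left as a comment.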
-
Alarms are used to trigger notifications for any metric
- Some alarm actions:
- Auto Scaling: Increase or decrease EC2 instances "desired" count
- EC2 Actions: Stop, terminate, reboot or recover an EC2 instance
- SNS notifications: Send a notification into an SNS topic
- Various statistic options (sampling, %, max, min, etc.)
- Choose the period to evaluate an alarm
- Ex. Create a billing alarm on CloudWatch Billing metric
- Alarm states: OK, INSUFFICIENT_DATA, ALARM
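To make the three alarm states concrete, here is a simplified model of how one evaluation period could map to a state. This is a sketch under assumptions, not CloudWatch's actual evaluation logic: a real alarm can evaluate several periods, supports other comparison operators and lets you configure how missing data is treated. The function `alarm_state` and the sample values are hypothetical.

```python
from statistics import mean

# Statistics an alarm could apply to the datapoints of one period.
STATISTICS = {"Average": mean, "Maximum": max, "Minimum": min, "Sum": sum}


def alarm_state(datapoints: list, statistic: str, threshold: float) -> str:
    """Return the state for a single period, using a 'greater than threshold' rule."""
    if not datapoints:
        # No data arrived in the period, so the alarm cannot decide.
        return "INSUFFICIENT_DATA"
    value = STATISTICS[statistic](datapoints)
    return "ALARM" if value > threshold else "OK"


print(alarm_state([], "Maximum", 80))
print(alarm_state([42.0, 55.5], "Average", 80))
print(alarm_state([42.0, 91.0], "Maximum", 80))
```

The three calls cover an empty period, a period below the threshold and a period with a spike above it.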
-
CloudWatch Logs can collect logs from:
- Elastic Beanstalk: Collection of logs from application
- ECS: Collection from containers
- AWS Lambda: Collection from function logs
- CloudTrail based on filter
- CloudWatch log agents on EC2 machines or on-premises servers
- Route 53: Log DNS queries
-
Enables real-time log monitoring
-
Adjustable CloudWatch Logs retention
-
CloudWatch Logs for EC2
- By default, no logs from your EC2 instance will go to CloudWatch
- Need to run a CloudWatch agent on EC2 to push log files
- Ensure IAM permissions are correct
- The CloudWatch log agent can be set up on-premises as well
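The agent decides which files to push based on a JSON configuration file. A minimal sketch of the `logs` section is below, generated from Python so it can be checked. The file path `/var/log/myapp/app.log`, the log group name `myapp-logs` and the helper `build_agent_log_config` are hypothetical.

```python
import json


def build_agent_log_config(file_path: str, log_group: str) -> dict:
    """Build the 'logs' section of a CloudWatch agent configuration file."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": file_path,  # file on the instance to tail
                            "log_group_name": log_group,  # destination log group
                            "log_stream_name": "{instance_id}",  # placeholder filled in by the agent
                        }
                    ]
                }
            }
        }
    }


config = build_agent_log_config("/var/log/myapp/app.log", "myapp-logs")
print(json.dumps(config, indent=2))
```

The instance (or on-premises server) still needs IAM permissions to write to CloudWatch Logs, otherwise the agent cannot deliver anything.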
Amazon EventBridge
-
A serverless task scheduler for creating, executing and managing millions of schedules across AWS services
- Used to be called CloudWatch Events
- Schedule cron jobs (scheduled scripts) to run
- Event pattern: Event rules to react to a service doing something
- Trigger Lambda functions, send SQS / SNS messages, etc.
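The two rule types above can be sketched as follows: a schedule expression for cron-style rules, and an event pattern for reacting to a service. The matcher `matches` is a hypothetical, simplified stand-in that handles only exact-value matching (each pattern field lists accepted values, and nested objects are matched recursively); the real service supports more operators. The instance ID is made up.

```python
# Schedule rule: EventBridge cron expressions use six fields; this one means 12:00 UTC daily.
DAILY_NOON_UTC = "cron(0 12 * * ? *)"

# Event pattern rule: react when an EC2 instance reaches the "terminated" state.
PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field in the pattern accepts the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the list holds the accepted values (logical OR).
            return False
    return True


def make_event(state: str) -> dict:
    """Build a sample EC2 state-change event."""
    return {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": state},
    }


print(matches(PATTERN, make_event("terminated")))
print(matches(PATTERN, make_event("running")))
```

Fields that the pattern does not mention (such as `instance-id` here) are ignored, which is why a short pattern can match a large event.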
-
Schema Registry
- Models event schemas (the structure of the event data)
- Can archive events (all / filter) sent to an event bus (indefinitely / set period)
- Ability to replay archived events
AWS CloudTrail
- A web service that records API activity in your AWS account
- Obtain a history of events / API calls made by the Console, SDKs, CLI and AWS services
- CloudTrail is enabled by default
- Provides governance, compliance and audit for your AWS account
- Put logs from CloudTrail into CloudWatch Logs or S3
- A trail can be applied to all regions (default) or a single region
- If a resource is deleted in AWS, investigate CloudTrail first
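As a sketch of that investigation, the snippet below scans a CloudTrail log file (a JSON document with a top-level `Records` list) for destructive API calls. The sample records, the account ID, the user ARNs and the helper `find_deletions` are made up for illustration, and treating `Delete` / `Terminate` prefixes as "deletions" is an assumption, not an official rule.

```python
import json

# Hypothetical log content, trimmed to the fields this sketch uses.
SAMPLE_LOG = json.dumps({
    "Records": [
        {"eventTime": "2024-01-01T10:00:00Z", "eventSource": "s3.amazonaws.com",
         "eventName": "DeleteBucket",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
        {"eventTime": "2024-01-01T10:05:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "DescribeInstances",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
        {"eventTime": "2024-01-01T10:09:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "TerminateInstances",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
    ]
})


def find_deletions(log_text: str) -> list:
    """Return (time, API call, caller ARN) for each destructive call in the log."""
    records = json.loads(log_text).get("Records", [])
    return [
        (
            record.get("eventTime", "unknown"),
            record["eventName"],
            record.get("userIdentity", {}).get("arn", "unknown"),
        )
        for record in records
        if record.get("eventName", "").startswith(("Delete", "Terminate"))
    ]


for when, call, who in find_deletions(SAMPLE_LOG):
    print(when, call, who)
```

Each hit answers the audit questions directly: what was called, when, and by which identity.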
AWS X-Ray
-
A service to monitor the components and services that make up your application
-
Typically, the traditional way to debug in production:
- Test locally
- Add log statements everywhere
- Re-deploy in production
- Log formats differ across applications which makes log analysis difficult
- Easy to debug a single application, but hard for distributed services (connected with services such as SQS / SNS)
- No common views of your entire architecture
-
AWS X-Ray resolves the issues of traditional debugging, some advantages are:
- Troubleshooting performance (fewer bottlenecks)
- Understanding dependencies in a microservice architecture
- Pinpoint service issues
- Review request behaviour
- Find errors and exceptions
- Are time SLAs being met?
- Where am I throttled?
- Identify impacted users
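Questions such as "where is the bottleneck?" and "where am I throttled?" come down to reading trace data. The sketch below works on simplified segment records using field names from X-Ray segment documents (`name`, `start_time`, `end_time`, `throttle`). The service names, timings and the helper `summarise` are hypothetical, and real segments carry many more fields.

```python
# Hypothetical segments from one request; times are epoch seconds.
SEGMENTS = [
    {"name": "api-gateway", "start_time": 1700000000.00, "end_time": 1700000000.05},
    {"name": "orders-service", "start_time": 1700000000.05, "end_time": 1700000000.20},
    {"name": "orders-db", "start_time": 1700000000.20, "end_time": 1700000000.65,
     "throttle": True},
]


def duration(segment: dict) -> float:
    """Elapsed time of a segment in seconds."""
    return segment["end_time"] - segment["start_time"]


def summarise(segments: list) -> tuple:
    """Return (slowest segment name, its duration, names of throttled segments)."""
    slowest = max(segments, key=duration)
    throttled = [s["name"] for s in segments if s.get("throttle")]
    return slowest["name"], round(duration(slowest), 3), throttled


name, seconds, throttled = summarise(SEGMENTS)
print("slowest:", name, "seconds:", seconds, "throttled:", throttled)
```

In this made-up trace the database segment is both the slowest and the throttled one, which is the kind of cross-service view that per-application logs do not give.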
Amazon CodeGuru
-
An ML-powered service for automated code reviews and application performance recommendations
-
Provides two functionalities:
- CodeGuru Reviewer: Automated code reviews for static code analysis (development)
- Built-in code reviews with actionable recommendations
- CodeGuru Profiler: Visibility / recommendations about application performance during runtime (production)
- Detect and optimise expensive lines of code pre-production
- Identify performance and cost improvements in production
-
Amazon CodeGuru Reviewer
- Identify critical issues, security vulnerabilities and bugs that are difficult to find
- Ex. Common coding best practices, resource leaks, security detection, input validation
- Uses machine learning and automated reasoning
- Enables developers to learn across millions of code reviews on thousands of open-source and Amazon repositories
- Supports Java and Python
- Integrated with GitHub, Bitbucket and AWS CodeCommit
-
Amazon CodeGuru Profiler
- Helps understand the runtime behaviour of an application
- Ex. Identify if your application is consuming excessive CPU capacity on a logging routine
- Features:
- Identify and remove code inefficiencies
- Improve application performance (ex. reduce CPU utilisation)
- Decrease compute costs
- Provides a heap summary (identifies which objects are using up memory)
- Anomaly detection
- Supports applications running on AWS or on-premises
- Minimal overhead on application
AWS Health Dashboard - Service History
- Shows all regions and health of all services
- Shows historical information for each day
- Can subscribe to an RSS feed
- Used to be called AWS Service Health Dashboard
AWS Health Dashboard - Your Account
- AWS Account Health Dashboard is a global service providing alerts and remediation guidance when AWS is experiencing events that may impact you
- Shows how AWS outages directly impact you and your AWS resources
- Provides alerts, remediation guidance, proactive notifications and scheduled activities
- While the Service Health Dashboard displays the general status of AWS services, Account Health Dashboard gives a personalised view into the performance and availability of the AWS services underlying your AWS resources
- Dashboard displays relevant and timely information to help manage events in progress
- Provides proactive notification to help you plan for scheduled activities
- Can aggregate data from an entire AWS Organisation
- Used to be called AWS Personal Health Dashboard (PHD)