Section 4 - High Availability & Scalability
- Scalability is the ability of a system / app to adapt to handle greater loads
Vertical Scalability
- Increasing size of an instance (ex. more compute power)
- Common for non-distributed systems such as a database
- RDS, ElastiCache are services that can scale vertically
Horizontal Scalability
- Increasing number of instances
- Common for distributed systems such as web apps
- In EC2, you can use auto scaling groups and load balancers
High Availability
- Usually complements horizontal scaling
- High availability usually implies running your app in at least 2 data centres (availability zones)
- High availability can be passive (ex. RDS Multi AZ) or active (ex. horizontal scaling)
- In EC2, you can use ASG multi AZ and load balancer multi AZ
Load Balancers
- Load balancers are servers that forward traffic to multiple servers downstream
- Benefits:
- Spread load across multiple downstream instances
- Expose a single point of access (DNS name) to your app
- Easily handle failures of downstream instances
- Regular health checks
- Provide SSL termination (HTTPS) for your websites
- Enforce stickiness with cookies
- High availability across zones
- Separate public traffic from private traffic
Elastic Load Balancer
- An ELB is a managed load balancer
- AWS guarantees its uptime, upgrades, maintenance and high availability
- Running your own load balancer is cheaper to set up but takes more effort to maintain
- Integrated with many other AWS services (ex. EC2, ECS, Certificate Manager, CloudWatch, Route 53, WAF, Global Accelerator, etc.)
Health Checks
- Important for load balancers: they need to know whether the instances receiving traffic are available to reply to requests
- Health checks are done on a port and a route (/health is common)
- If the response is not 200, then the instance is unhealthy
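The health check rule above can be sketched as a small routing function. This is a simplified illustration, not how ELB is implemented: the instance IDs and the `probe_results` mapping are hypothetical, and a real load balancer also applies check intervals, healthy/unhealthy thresholds and configurable success codes.

```python
from itertools import cycle


def is_healthy(status_code: int) -> bool:
    """Treat only HTTP 200 as healthy, matching the rule in the notes."""
    return status_code == 200


def healthy_targets(probe_results: dict[str, int]) -> list[str]:
    """Return the targets whose last health check returned 200, sorted by id."""
    return sorted(t for t, code in probe_results.items() if is_healthy(code))


def route(requests: int, probe_results: dict[str, int]) -> list[str]:
    """Spread `requests` round-robin across healthy targets only."""
    targets = healthy_targets(probe_results)
    if not targets:
        raise RuntimeError("no healthy targets available")
    rr = cycle(targets)
    return [next(rr) for _ in range(requests)]


# Hypothetical last responses from GET /health on each instance.
probes = {"i-aaa": 200, "i-bbb": 503, "i-ccc": 200}
print(route(4, probes))  # ['i-aaa', 'i-ccc', 'i-aaa', 'i-ccc']
```

The unhealthy instance `i-bbb` receives no traffic until its health check returns 200 again.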
Types of Load Balancer
- Classic: HTTP, HTTPS, TCP, SSL
- TCP / HTTP based health checks
- Fixed hostname
- Application: HTTP, HTTPS, WebSocket
- Load balance multiple HTTP apps across machines (target groups) or on the same machine (ex. containers)
- Great for micro-services and container-based apps
- Health checks are at the target group level
- Port mapping feature to redirect to a dynamic port in ECS
- Fixed hostname
- App servers don't see client IP directly, the true IP is inserted in the X-Forwarded-For header
- Network: TCP, TLS, UDP
- Handle millions of requests per second at very low latency
- Has one static IP per AZ and supports assigning Elastic IP
- No free tier in AWS
- Gateway: IP
- Deploy, scale and manage multiple 3rd party network virtual appliances
- Combines transparent network gateway to provide a single entrance/exit for traffic and a load balancer to distribute the traffic
- Uses GENEVE protocol on port 6081
- Targets EC2 instances and private IP addresses
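For the ALB point about X-Forwarded-For, a backend could recover the client IP roughly as follows. This is a minimal sketch that assumes the ALB is the only proxy in front of the app and that it appends the client address to the end of the header; the `headers` dict and `peer_ip` argument are hypothetical stand-ins for whatever your web framework provides.

```python
def client_ip(headers: dict[str, str], peer_ip: str) -> str:
    """Return the client IP as seen by the load balancer.

    Falls back to the TCP peer address when the header is absent
    (for example, when the app is called directly without a load balancer).
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    forwarded = normalised.get("x-forwarded-for", "")
    # The header is a comma-separated list of addresses.
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    # Assumption: the last entry is the one the load balancer appended.
    # Earlier entries may have been supplied by the client and are not trusted.
    return hops[-1] if hops else peer_ip


print(client_ip({"X-Forwarded-For": "203.0.113.7"}, peer_ip="10.0.0.5"))  # 203.0.113.7
```

If more trusted proxies sit in front of the app, the entry to read shifts accordingly, so the index is a deployment-specific choice.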
Sticky Sessions
- Stickiness ensures a client is always routed to the same instance behind the load balancer, so its session data is retained
- Works for all load balancer types except gateway
- In CLB and ALB, stickiness uses a cookie with an expiration date that you control
- Enabling stickiness may introduce imbalance to the load over the backend EC2 instances
- Application-based cookies
- Custom
- Generated by the target
- Includes any custom attributes required by the app
- Cookie name must be specified for each target group, don't use reserved names such as AWSALB, AWSALBAPP or AWSALBTG
- Application
- Generated by load balancer
- Cookie name is AWSALBAPP
- Duration-based cookies
- Generated by load balancer
- Cookie name is AWSALB (ALB) or AWSELB (CLB)
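The cookie mechanics above can be illustrated with a toy balancer. This is a sketch under simplifying assumptions: the `StickyBalancer` class is hypothetical, the cookie stores the target id in plain text (real load balancer cookies are opaque), and expiration is left out.

```python
import itertools


class StickyBalancer:
    """Round-robin balancer that pins a client to a target via a cookie."""

    def __init__(self, targets: list[str], cookie_name: str = "AWSALB"):
        self.targets = list(targets)
        self.cookie_name = cookie_name
        self._round_robin = itertools.cycle(self.targets)

    def handle(self, cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
        """Return (chosen target, cookies to set on the response)."""
        pinned = cookies.get(self.cookie_name)
        if pinned in self.targets:
            # Known client: keep sending it to the same target.
            return pinned, {}
        # New client (or stale cookie): pick the next target and pin it.
        target = next(self._round_robin)
        return target, {self.cookie_name: target}


lb = StickyBalancer(["i-aaa", "i-bbb"])
first_target, set_cookies = lb.handle({})           # first visit, no cookie yet
second_target, _ = lb.handle(set_cookies)           # same client returns with cookie
print(first_target == second_target)                # True
```

Because returning clients bypass the round-robin, a few heavy clients pinned to one target can skew the load, which is the imbalance noted above.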
Cross-Zone Load Balancing
- With cross-zone load balancing, each load balancer node distributes traffic evenly across all registered instances in all AZs; without it, each node distributes requests only among the instances in its own AZ
- Enabled by default in ALB with no charges for inter-AZ data; disabled by default in the other types
- In NLB and GWLB, you pay charges for inter-AZ data if enabled; this is not the case in CLB
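The effect of cross-zone load balancing is easiest to see with numbers. The sketch below assumes one load balancer node per AZ and client traffic split evenly between the nodes; the AZ names and instance counts are made up for illustration.

```python
def traffic_share(instances_per_az: dict[str, int], cross_zone: bool) -> dict[str, float]:
    """Fraction of total traffic received by EACH instance, keyed by AZ."""
    # Only AZs that actually hold instances take part.
    azs = [az for az, count in instances_per_az.items() if count > 0]
    total_instances = sum(instances_per_az[az] for az in azs)
    if cross_zone:
        # Every node spreads its traffic over all instances in all AZs.
        return {az: 1 / total_instances for az in azs}
    # Each node keeps its 1/len(azs) slice of traffic inside its own AZ.
    return {az: (1 / len(azs)) / instances_per_az[az] for az in azs}


fleet = {"az-a": 2, "az-b": 8}
print(traffic_share(fleet, cross_zone=True))   # {'az-a': 0.1, 'az-b': 0.1}
print(traffic_share(fleet, cross_zone=False))  # {'az-a': 0.25, 'az-b': 0.0625}
```

With cross-zone disabled, the two instances in `az-a` each carry four times the load of an instance in `az-b`.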
SSL/TLS
- An SSL Certificate allows traffic between clients and your load balancers to be encrypted in transit
- TLS certificates are mainly used but people still refer to it as SSL
- Public SSL certificates are issued by Certificate Authorities; they carry an expiration date and must be renewed before it passes
- Load balancers use an X.509 certificate, manageable via Certificate Manager
- Alternatively, you can use your own certificates
- For an HTTPS listener:
- Specify a default certificate
- Add an optional list of certificates to support multiple domains
- Clients can use a Server Name Indication to specify the target hostname
- Ability to specify a security policy to support older versions of SSL/TLS
Server Name Indication (SNI)
- SNI solves the issue of loading multiple SSL certificates onto one web server to serve multiple websites
- Requires the client to indicate the hostname of the target server in the initial SSL handshake
- Server will then find the correct certificate or return the default one
- Only works for CloudFront, ALB and NLB
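The certificate lookup described above can be sketched as follows. This is an illustrative model only: the `select_certificate` function, the certificate labels and the single-label wildcard rule are assumptions for the example, not the matching algorithm AWS documents.

```python
from typing import Optional


def select_certificate(
    sni_hostname: Optional[str],
    certificates: dict[str, str],
    default: str,
) -> str:
    """Pick a certificate for the hostname sent in the TLS handshake."""
    if not sni_hostname:
        # Client sent no SNI: only the default certificate can be offered.
        return default
    host = sni_hostname.lower().rstrip(".")
    if host in certificates:
        return certificates[host]
    # Try a single-label wildcard such as *.example.org.
    _, _, parent = host.partition(".")
    return certificates.get(f"*.{parent}", default)


certs = {"www.example.com": "cert-www", "*.example.org": "cert-org-wildcard"}
print(select_certificate("www.example.com", certs, default="cert-default"))  # cert-www
print(select_certificate("unknown.test", certs, default="cert-default"))     # cert-default
```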
SSL Certificates for ELBs
- CLB can only support one SSL certificate
- Must use multiple CLB for multiple hostnames with multiple SSL certificates
- ALB and NLB support multiple listeners with multiple SSL certificates, using SNI to select the right one
Connection Draining
- Feature naming:
- Connection Draining for CLB, Deregistration Delay for ALB and NLB
- The time to complete in-flight requests while the instance is de-registering or unhealthy
- Configurable from 1 to 3600 seconds (default of 300 seconds)
- Can be disabled by setting value to 0
- Set a low value for short requests
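To see why the delay should match request length, the sketch below counts which in-flight requests finish before a de-registering instance is cut off. The `drain` function and the request durations are hypothetical; it assumes no new requests reach the instance once de-registration starts.

```python
def drain(remaining_seconds: list[float], deregistration_delay: int) -> tuple[int, int]:
    """Return (completed, cut_off) counts for in-flight requests.

    `remaining_seconds` holds how long each in-flight request still needs
    at the moment de-registration begins.
    """
    if not 0 <= deregistration_delay <= 3600:
        raise ValueError("delay must be between 0 and 3600 seconds")
    if deregistration_delay == 0:
        # Draining disabled: in-flight requests are dropped immediately.
        return 0, len(remaining_seconds)
    completed = sum(1 for s in remaining_seconds if s <= deregistration_delay)
    return completed, len(remaining_seconds) - completed


in_flight = [2, 30, 400]
print(drain(in_flight, 300))  # (2, 1)
print(drain(in_flight, 0))    # (0, 3)
```

A delay far longer than the longest request only slows down instance replacement, which is why a low value suits short requests.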
Auto Scaling Group
- Scale in/out EC2 instances to match the required load
- Ensure minimum/maximum number of EC2 instances running
- Automatically register new instances to a load balancer
- Re-create EC2 instance if a previous one is terminated (could be due to unhealthiness)
- ASGs are free; you only pay for the underlying instances
- Attributes:
- Launch Template configurations:
- AMI, Instance Type
- EC2 User Data
- EBS volumes
- Security groups
- SSH Key Pair
- IAM Roles for instances
- Network and subnet information
- Load balancer information
- Minimum, maximum and initial capacity
- Scaling policies
- Use CloudWatch to scale ASG by monitoring metrics such as Average CPU or a custom metric
ASG Scaling Policies
- Dynamic scaling
- Target tracking
- Simple setup
- Ex. want the ASG CPU around 40%
- Simple / step scaling
- CloudWatch alarm triggered, CPU > 70%, add 2 units
- CloudWatch alarm triggered, CPU < 30%, remove 1 unit
- Scheduled scaling
- Anticipate scaling based on usage patterns
- Predictive scaling
- Continuously forecast load and schedule scaling ahead
- Good metrics:
- CPUUtilization: Average CPU utilization across instances
- RequestCountPerTarget: Ensure number of requests per EC2 instance is stable
- Average Network In / Out
- Custom metric pushed using CloudWatch
- Scaling cooldown period is 300 seconds by default
- Will not launch or terminate instances during this period
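The step scaling example and the cooldown can be combined in a short simulation. This is a sketch, not the Auto Scaling algorithm: the `AutoScalingGroup` class here is a hypothetical model, the thresholds (70% / 30%) and adjustments (+2 / -1) come from the example above, and timestamps are plain seconds.

```python
from dataclasses import dataclass


@dataclass
class AutoScalingGroup:
    min_size: int
    max_size: int
    desired: int
    cooldown: int = 300                       # seconds, the default noted above
    last_scaling_at: float = float("-inf")    # no scaling activity yet

    def evaluate(self, avg_cpu: float, now: float) -> int:
        """Apply the step policy once and return the new desired capacity."""
        if now - self.last_scaling_at < self.cooldown:
            # Still cooling down: do not launch or terminate anything.
            return self.desired
        if avg_cpu > 70:
            change = 2      # alarm: CPU > 70%, add 2 units
        elif avg_cpu < 30:
            change = -1     # alarm: CPU < 30%, remove 1 unit
        else:
            change = 0
        # Never go below the minimum or above the maximum capacity.
        new_desired = max(self.min_size, min(self.max_size, self.desired + change))
        if new_desired != self.desired:
            self.desired = new_desired
            self.last_scaling_at = now
        return self.desired


asg = AutoScalingGroup(min_size=1, max_size=5, desired=2)
print(asg.evaluate(avg_cpu=85, now=0))    # 4  (scaled out by 2)
print(asg.evaluate(avg_cpu=90, now=100))  # 4  (inside cooldown, no change)
print(asg.evaluate(avg_cpu=90, now=300))  # 5  (capped at max_size)
```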