Elastic Load Balancing & Auto Scaling Groups

Scalability: Ability to accommodate a larger load by making hardware stronger (scale up) or by adding nodes (scale out)
- Vertical scalability: Common for non-distributed systems, such as a database
  - Increasing size of the instance
- Horizontal scalability: Implies distributed systems
  - Increasing the number of instances / systems for your application
High Availability: Running your application / system in at least 2 Availability Zones (AZs)
- Usually goes hand-in-hand with horizontal scaling
- Purpose is to mitigate damage from disasters, such as a data centre loss
Elasticity: Once a system is scalable, elasticity means it scales automatically based on the load
- Pay per use, match demand, optimised costs
Agility: New IT resources are easily attainable, reducing the time needed to make resources available to developers from weeks to minutes
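
The "pay per use, match demand" idea behind elasticity can be sketched with simple arithmetic. This is a minimal illustration, not AWS pricing: the hourly load figures, the per-instance capacity and the function name `instances_needed` are all assumptions made up for the example.

```python
import math

def instances_needed(load_rps: int, capacity_per_instance_rps: int, minimum: int = 1) -> int:
    """Smallest instance count that can serve the load, never below `minimum`."""
    return max(minimum, math.ceil(load_rps / capacity_per_instance_rps))

# Hypothetical load per hour (requests per second) and per-instance capacity.
hourly_load = [100, 100, 400, 900, 400, 100]
CAPACITY = 200

# Elastic fleet: resize every hour to match demand.
elastic_fleet = [instances_needed(load, CAPACITY) for load in hourly_load]
elastic_instance_hours = sum(elastic_fleet)

# Fixed fleet: provision for the peak hour and keep it running all the time.
fixed_instance_hours = max(elastic_fleet) * len(hourly_load)

print(elastic_fleet, elastic_instance_hours, fixed_instance_hours)
```

Under these assumed numbers the elastic fleet uses fewer instance-hours than a fleet sized for the peak, which is where the cost saving comes from.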

Load balancers: Servers that forward internet traffic to multiple EC2 instances downstream
Benefits of using a load balancer:
- Spread load across multiple downstream instances
- Expose a single point of access (DNS) to your application
- Easily handle failures of downstream instances
- Run regular health checks on your instances
- Provide SSL termination (HTTPS) for your websites
- High availability across zones
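
A toy sketch of the first few benefits above (spreading load, one point of access, skipping failed instances). It assumes plain round-robin routing and a hypothetical `RoundRobinBalancer` class with made-up instance IDs; a real ELB is a managed service and its routing is more involved than this.

```python
class RoundRobinBalancer:
    """Forwards each request to the next healthy instance in turn."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # index of the next instance to try

    def mark_unhealthy(self, instance):
        # In practice this would be the result of a failed health check.
        self.healthy.discard(instance)

    def mark_healthy(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance, skipping unhealthy ones."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = RoundRobinBalancer(["i-a", "i-b", "i-c"])
before_failure = [lb.route() for _ in range(3)]

lb.mark_unhealthy("i-b")  # simulate a failed health check
after_failure = [lb.route() for _ in range(4)]
```

Clients only ever call `route()`, so a failed instance is skipped without them noticing.
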
Elastic Load Balancer (ELB): A managed load balancer
- AWS guarantees that it will be working and takes care of upgrades, maintenance and high availability
- But AWS only provides a few configuration knobs
Setting up your own load balancer costs less, but takes more effort on your part (maintenance, integrations)
AWS offers 4 kinds of load balancers:
- Application Load Balancer (HTTP / HTTPS / gRPC): Layer 7
  - Useful scenarios: You need HTTP routing features or a static DNS name (static URL)
- Network Load Balancer (TCP / UDP / TLS): Layer 4
  - High performance, can handle millions of requests per second with low latency
  - Useful scenarios: Static IP through Elastic IP
- Gateway Load Balancer (GENEVE on IP packets): Layer 3
  - Useful scenarios: Route traffic to firewalls that you manage on EC2 instances (classic firewall, intrusion detection, deep packet inspection)
- Classic Load Balancer (previous generation; retired for EC2-Classic networks in 2023): Layer 4 & 7
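
The protocol-to-type mapping above can be written as a small lookup. This is only a memory aid under the assumption that protocol is the sole deciding factor; the function name `suggest_load_balancer` is hypothetical and not part of any AWS SDK.

```python
def suggest_load_balancer(protocol: str) -> str:
    """Map a traffic protocol to the ELB type listed in the notes above."""
    protocol = protocol.upper()
    if protocol in {"HTTP", "HTTPS", "GRPC"}:
        return "Application Load Balancer"  # Layer 7
    if protocol in {"TCP", "UDP", "TLS"}:
        return "Network Load Balancer"      # Layer 4
    if protocol == "GENEVE":
        return "Gateway Load Balancer"      # Layer 3
    raise ValueError(f"no load balancer type listed for protocol {protocol!r}")


print(suggest_load_balancer("https"))
print(suggest_load_balancer("udp"))
```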

Load on websites and applications can change; the cloud lets you create and remove servers quickly to match it
Goal of Auto Scaling Group (ASG):
- Scale out (add EC2 instances) to match an increased load
- Scale in (remove EC2 instances) to match a decreased load
- Keep the number of running machines between a minimum and a maximum
- Automatically register new instances to a load balancer
- Replace unhealthy instances
Runs only at optimal capacity, which saves costs
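
The ASG goals above can be sketched as a small reconciliation loop. This is a simulation under stated assumptions, not the AWS API: the `AutoScalingGroup` class, its field names and the `i-N` instance IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class AutoScalingGroup:
    min_size: int
    max_size: int
    desired: int
    instances: dict = field(default_factory=dict)  # instance id -> healthy?
    registered: set = field(default_factory=set)   # ids known to the load balancer
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def __post_init__(self):
        self.set_desired(self.desired)

    def set_desired(self, desired: int) -> None:
        # Desired capacity is always kept between the minimum and the maximum.
        self.desired = max(self.min_size, min(self.max_size, desired))

    def reconcile(self) -> None:
        # Terminate unhealthy instances; the scale-out loop below replaces them.
        for iid in [i for i, healthy in self.instances.items() if not healthy]:
            del self.instances[iid]
            self.registered.discard(iid)
        # Scale out: launch instances and register them with the load balancer.
        while len(self.instances) < self.desired:
            iid = f"i-{next(self._ids)}"
            self.instances[iid] = True
            self.registered.add(iid)
        # Scale in: remove the most recently launched instances first.
        while len(self.instances) > self.desired:
            iid, _ = self.instances.popitem()
            self.registered.discard(iid)


asg = AutoScalingGroup(min_size=2, max_size=4, desired=2)
asg.reconcile()
asg.set_desired(10)           # clamped to max_size
asg.reconcile()
asg.instances["i-2"] = False  # simulate a failed health check
asg.reconcile()               # i-2 is replaced by a fresh instance
```
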
Scaling strategies:
- Manual Scaling: Update the size of an ASG manually
- Dynamic Scaling: Respond to changing demand
  - Simple / Step Scaling:
    - Ex. When load rises above an upper threshold (e.g. CPU > 70%), add servers; when it falls below a lower threshold (e.g. CPU < 30%), remove servers
  - Target Tracking Scaling:
    - Ex. Want the average ASG CPU to stay at around 40%
  - Scheduled Scaling: Anticipate scaling based on known usage patterns
    - Ex. Increase minimum capacity to 10 at 5 pm on Fridays
- Predictive Scaling: Uses machine learning to predict future traffic
  - Automatically provisions the correct number of EC2 instances in advance
  - Useful when your load has predictable time-based patterns
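
A rough sketch of how the dynamic and scheduled strategies could turn a metric into a capacity. Everything here is an assumption for illustration: the function names, the 70% / 30% step thresholds, the step sizes, and the proportional formula used for target tracking (AWS does not expose its exact algorithm this way). Predictive scaling is left out because it depends on a trained forecast.

```python
import math
from datetime import datetime


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def target_tracking(current: int, metric: float, target: float,
                    min_size: int, max_size: int) -> int:
    """Proportional estimate of the capacity that brings the metric to target."""
    desired = math.ceil(current * metric / target)
    return clamp(desired, min_size, max_size)


def step_scaling(current: int, cpu_percent: float,
                 min_size: int, max_size: int) -> int:
    """Add or remove a fixed number of instances when a threshold is crossed."""
    if cpu_percent > 70:    # hypothetical upper threshold
        change = 2
    elif cpu_percent < 30:  # hypothetical lower threshold
        change = -1
    else:
        change = 0
    return clamp(current + change, min_size, max_size)


def scheduled_min_capacity(now: datetime, default_min: int = 2) -> int:
    """Raise the minimum to 10 from 5 pm on Fridays (assumed to last until midnight)."""
    if now.weekday() == 4 and now.hour >= 17:  # Monday is 0, so Friday is 4
        return 10
    return default_min


# 4 instances averaging 80% CPU with a 40% target: roughly double the fleet.
print(target_tracking(4, 80, 40, min_size=1, max_size=10))
print(step_scaling(4, 85, min_size=1, max_size=10))
print(scheduled_min_capacity(datetime(2024, 5, 3, 17, 0)))
```

Target tracking only needs the target value, while step scaling needs explicit thresholds and step sizes, which is why target tracking is usually the simpler one to configure.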