- EC2 - SAA Level
- EC2 Instance Storage
- High Availability and Scalability: ELB & ASG
- RDS, Aurora, ElastiCache
- Route 53
- Amazon S3 Introduction
- Amazon S3 Advanced
- S3 Security
EC2 - SAA Level
- Elastic IPs - A fixed public IPv4 for your instance
- Owned by you until you delete it
- Can attach to one instance at a time
- Can be used to mask the failure of an instance or software by rapidly remapping the address to another instance in your account
- By default, you can only have 5 Elastic IPs per account (you can ask AWS to increase this limit)
- Try to avoid using Elastic IP if you can
- They often reflect poor architectural decisions
- Instead use a random public IP and register a DNS name to it
- By default, your EC2 instance comes with
- A private IP (for the internal AWS network)
- A public IP (for the web and SSH. Can change if instance is stopped and started)
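Since the default public IP changes on stop/start, an Elastic IP keeps the address fixed. A minimal CLI sketch (all IDs are placeholders):

```bash
# Allocate an Elastic IP in your account (returns an AllocationId)
aws ec2 allocate-address --domain vpc

# Associate it with an instance; re-running this against another
# instance remaps the address (the rapid-failover trick above)
aws ec2 associate-address \
  --instance-id i-0123456789abcdef0 \
  --allocation-id eipalloc-0123456789abcdef0
```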
- Placement Groups define your EC2 instance placement strategy, which will be one of the following:
| Placement Group | Description | Pros | Cons | Use Cases |
|---|---|---|---|---|
| Cluster | Clusters instances into a low-latency group in a single AZ | Great network (10 Gbps bandwidth between instances with Enhanced Networking enabled) | If the AZ fails, all instances fail at the same time | Big Data jobs that need to complete fast; applications that need extremely low latency and high network throughput |
| Spread | Spreads instances across underlying hardware | Can span across AZs; reduced risk of simultaneous failure since instances are on distinct physical hardware | Limited to 7 instances per AZ per placement group | Applications that need to maximize high availability; critical applications where instances must be isolated from each other's failures |
| Partition | Spreads instances across many different partitions (on different sets of racks) within an AZ. Can span across multiple AZs in the same region. | Up to 7 partitions per AZ; scales to 100s of instances; instances in one partition don't share racks with other partitions | A partition failure can still affect many EC2 instances | Partition-aware distributed systems such as HDFS, HBase, Cassandra, Kafka |
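Creating a placement group and launching into it is one CLI call each (names, AMI and IDs are placeholders):

```bash
# Create a cluster placement group (strategy can be cluster, spread or partition)
aws ec2 create-placement-group --group-name my-cluster-pg --strategy cluster

# Launch an instance into it
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type c5.large \
  --placement GroupName=my-cluster-pg
```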
- Elastic Network Interfaces (ENI) are logical components in VPCs that represent virtual network cards.
- Can have the following attributes:
- Primary private IPv4, one or more secondary IPv4
- One Elastic IP (IPv4) per private IPv4
- One Public IPv4
- One or more security groups
- A MAC address
- You can create ENIs independently and attach them (or move them) on the fly to EC2 instances for failover
- Bound to a specific AZ
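A minimal sketch of the create-and-attach-on-the-fly flow (subnet, SG and instance IDs are placeholders):

```bash
# Create an ENI in a given subnet with a security group
aws ec2 create-network-interface \
  --subnet-id subnet-0123456789abcdef0 \
  --groups sg-0123456789abcdef0 \
  --description "failover ENI"

# Attach it to an instance as a secondary interface (device index 1);
# detach and re-attach to another instance in the same AZ for failover
aws ec2 attach-network-interface \
  --network-interface-id eni-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device-index 1
```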
- EC2 Hibernate is an alternative to stopping and terminating for EC2 instances where:
- Volatile memory is preserved (RAM state is written to a file in the root EBS volume)
- The instance boot is much faster as the OS isn't stopped/restarted
- The root EBS must be encrypted
  - Supported when:
    - The EC2 instance type supports hibernation (list the supported types with `aws ec2 describe-instance-types --filters Name=hibernation-supported,Values=true --query "InstanceTypes[*].[InstanceType]" --output text | sort`)
    - The AMI supports hibernation
    - Instance RAM size is less than 150GB
    - The instance is not bare metal
    - The root volume is a large encrypted EBS volume
    - The instance is On-Demand, Reserved or Spot
- Instances cannot be hibernated for more than 60 days
- EC2 Hibernate use cases:
- Long-running processing
- Saving the RAM state
- Services that take time to initialize
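A minimal sketch of enabling and triggering hibernation with the CLI (AMI/instance IDs are placeholders; the root device name depends on the AMI):

```bash
# Hibernation must be enabled at launch, with an encrypted root EBS volume
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type m5.large \
  --hibernation-options Configured=true \
  --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"Encrypted":true,"VolumeSize":50}}]'

# Later: hibernate instead of a normal stop (RAM is written to the root volume)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0 --hibernate
```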
EC2 Instance Storage
EBS
- An Elastic Block Store (EBS) volume is a network drive you can attach to instances as they run, allowing your instances to persist data even after termination
- Uses the network to communicate to the instance, meaning reads/writes have some latency
- Can be detached from an instance and attached to another quickly
- They can only be mounted to one instance at a time (at the CCP level)
- They are bound to a specific AZ. To move a volume across AZs, you need to snapshot it
- Free tier: 30GB of free EBS storage of type General Purpose (SSD) or Magnetic per month
- They have a provisioned capacity (size in GBs and IOPS) which can be changed. You get billed for all provisioned capacity.
- EBS Volumes have a Delete on Termination attribute that determines if the volume should be deleted if the EC2 instance it is attached to is deleted
- By default, the root EBS volume will be deleted
- By default, any other attached EBS volume is not deleted
- EBS Snapshots allow you to make a backup of your EBS volume at a given time.
- Snapshots can be copied across AZs and Regions
- It's not necessary to detach the volume to snapshot, but it's recommended
- Features:
  - EBS Snapshot Archive - Move a snapshot to an archive tier that is 75% cheaper. Restoring from the archive takes 24-72 hours
- Recycle Bin for EBS Snapshots - Allows you to setup rules to retain deleted snapshots so you can recover them after an accidental deletion. You may specify a retention period (from 1 day to 1 year)
  - Fast Snapshot Restore (FSR) - Forces a full initialization of the snapshot so there is no latency on first use (for a fee)
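A sketch of the snapshot and archive-tier calls (volume/snapshot IDs are placeholders):

```bash
# Take a snapshot of a volume
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "nightly backup"

# Move an existing snapshot to the archive tier (75% cheaper)
aws ec2 modify-snapshot-tier --snapshot-id snap-0123456789abcdef0 --storage-tier archive

# Restore it to the standard tier before use (takes 24-72 hours)
aws ec2 restore-snapshot-tier --snapshot-id snap-0123456789abcdef0 --permanent-restore
```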
- Amazon Machine Images (AMIs) are customizations of EC2 instances, allowing you to add your own software, configuration, OS, monitoring, etc.
- AMIs have a faster boot/configuration time because all your software is pre-packaged
- AMIs are built for a specific region and can be copied across regions
- You can launch EC2 instances from
- A public AMI (AWS provided)
- Your own AMI
- An AWS Marketplace AMI (an AMI provided/sold by an AWS user)
- Process: 1. Start an EC2 instance and customize it 2. Stop the instance (for data integrity) 3. Build an AMI from it (this also creates EBS snapshots) 4. Launch instances from the AMI
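Steps 3 and 4 of the process above map to these CLI calls (IDs, names and regions are placeholders):

```bash
# Build an AMI from a (stopped) instance; this also creates EBS snapshots
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "my-golden-ami"

# Copy the AMI to another region before launching there
aws ec2 copy-image \
  --source-image-id ami-0123456789abcdef0 \
  --source-region eu-west-1 \
  --region us-east-1 \
  --name "my-golden-ami"
```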
EC2 Instance Store
- EC2 Instance Store volumes are high-performance hardware disks that:
- Offer better I/O performance than EBS volumes
- Lose their storage if the instance is stopped (ephemeral storage)
- Work well as a buffer/cache for scratch data or temporary content
- Risk data loss if hardware fails
- You are responsible for backing up and replicating
- EBS volumes, by contrast, come in 6 types (gp2/gp3 General Purpose SSD, io1/io2 Provisioned IOPS SSD, st1/sc1 HDD). The SSD types compare as follows:

| | gp3 | gp2 | io2 Block Express | io1 |
|---|---|---|---|---|
| Durability | 99.8% - 99.9% durability (0.1% - 0.2% annual failure rate) | 99.8% - 99.9% durability (0.1% - 0.2% annual failure rate) | 99.999% durability (0.001% annual failure rate) | 99.8% - 99.9% durability (0.1% - 0.2% annual failure rate) |
| Use cases | Transactional workloads, boot volumes, dev/test environments | Transactional workloads, boot volumes, dev/test environments | Workloads that require sub-millisecond latency and sustained IOPS performance | Workloads that require sustained IOPS performance or more than 16,000 IOPS |
| Volume size | 1 GiB - 16 TiB | 1 GiB - 16 TiB | 4 GiB - 64 TiB | 4 GiB - 16 TiB |
| Max IOPS per volume | 16,000 (64 KiB I/O) | 16,000 (16 KiB I/O) | 256,000 (16 KiB I/O) | 64,000 (16 KiB I/O) |
| Max throughput per volume | 1,000 MiB/s | 250 MiB/s | 4,000 MiB/s | 1,000 MiB/s |
| Amazon EBS Multi-attach | Not supported | Not supported | Supported | Supported |
| NVMe reservations | Not supported | Not supported | Supported (up to 16 instances at once) | Not supported |
| Boot volume | Supported | Supported | Supported | Supported |
- When you encrypt an EBS volume:
- Data at rest is encrypted inside the volume
- All the data in flight moving between the instance and the volume is encrypted
- All snapshots are encrypted
- All volumes created from the snapshot are encrypted
- Encryption and decryption are handled transparently and have a minimal impact on latency
- EBS Encryption leverages keys from KMS (AES-256)
- You can encrypt an unencrypted EBS volume:
- Create an EBS snapshot of the volume
- Encrypt the EBS snapshot using copy
- Create a new EBS volume from the snapshot
- Attach the encrypted volume to the original instance
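The four steps above map to the following CLI calls (IDs, regions and device name are placeholders):

```bash
# 1. Snapshot the unencrypted volume
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0

# 2. Copy the snapshot with encryption enabled (uses the default KMS key
#    unless --kms-key-id is given)
aws ec2 copy-snapshot \
  --source-snapshot-id snap-0123456789abcdef0 \
  --source-region eu-west-1 \
  --encrypted

# 3. Create a new volume from the encrypted snapshot (same AZ as the instance)
aws ec2 create-volume \
  --snapshot-id snap-0fedcba9876543210 \
  --availability-zone eu-west-1a

# 4. Attach the encrypted volume to the original instance
aws ec2 attach-volume \
  --volume-id vol-0fedcba9876543210 \
  --instance-id i-0123456789abcdef0 \
  --device /dev/sdf
```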
EFS
- Elastic File System (EFS) is a managed NFS (network file system) that can be mounted on many EC2 instances
- Works well with EC2 instances in multi-AZ environment
- Highly available, scalable, expensive, pay per use
- Use cases - Content management, web serving, data sharing, WordPress
- Uses NFSv4.1 protocol
- Uses security group to control access to EFS
- Compatible with Linux-based AMI
- Encryption at rest using KMS
- POSIX file system that has a standard file API
- File system scales automatically
| Storage class | Designed for | First byte read latency | Durability (designed for) | Availability SLA | Availability Zones | Minimum billing charge per file | Minimum storage duration |
|---|---|---|---|---|---|---|---|
| EFS Standard | Active data requiring fast sub-millisecond latency performance | Sub-millisecond | 99.999999999% (11 9's) | 99.99% (Regional), 99.9% (One Zone) | ≥3 (Regional), 1 (One Zone) | Not applicable | Not applicable |
| EFS Infrequent Access | Inactive data that is accessed only a few times each quarter | Tens of milliseconds | 99.999999999% (11 9's) | 99.99% (Regional), 99.9% (One Zone) | ≥3 (Regional), 1 (One Zone) | 128 KiB | Not applicable |
| EFS Archive | Inactive data that is accessed a few times each year or less | Tens of milliseconds | 99.999999999% (11 9's) | 99.9% (Regional) | ≥3 (Regional) | 128 KiB | 90 days |
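Mounting from an EC2 instance looks like this (file-system ID and region are placeholders; the instance's security group must allow NFS, port 2049):

```bash
# Plain NFSv4.1 mount
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-0123456789abcdef0.efs.eu-west-1.amazonaws.com:/ /mnt/efs

# Or, with the amazon-efs-utils helper installed (adds TLS support)
sudo mount -t efs -o tls fs-0123456789abcdef0:/ /mnt/efs
```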
High Availability and Scalability: ELB & ASG
- Vertical Scalability means increasing the size of the instance (eg. scaling an instance from a `t2.micro` to a `t2.large` to meet demand)
  - Very common for non-distributed systems, such as a database
  - RDS and ElastiCache are services that can scale vertically
  - There's usually a limit to how much you can vertically scale
- Horizontal Scalability means increasing the number of instances / systems for your application
- Implies distributed systems
- Very common for web applications / modern applications
- EC2 instances can be horizontally scaled using ASG and/or ELB
- High Availability - Your application can survive a data center loss
- Can be passive (eg. RDS Multi-AZ)
- Can be active (eg. horizontal scaling)
- Can be achieved with a multi-AZ ASG and/or a multi-AZ ELB
Elastic Load Balancer
- Load Balancers are servers that forward traffic to multiple servers downstream
- Why use a load balancer?
- Spread load across multiple downstream instances
- Expose a single point of access (DNS) to your application
- Seamlessly handle failures of downstream instances
- Do regular health checks to your instances
- Provide SSL termination (decrypting encrypted traffic before passing it along to a web server; HTTPS) for your websites
- Enforce stickiness (session persistence) with cookies
- High availability across zones
- Separate public traffic from private traffic
- An Elastic Load Balancer is a managed load balancer integrated with many AWS offerings/services, including:
- EC2, EC2 ASGs, ECS
- AWS Certificate Manager (ACM), CloudWatch
- Route 53, AWS WAF, AWS Global Accelerator
- Health Checks are a crucial process for load balancers: they let the load balancer know whether the instances it forwards traffic to are available to reply to requests
  - Health checks are done on a port and a route (usually `/health`)
Elastic Load Balancer Types
- Classic Load Balancers (v1)
- Supports TCP (layer 4), HTTP & HTTPS (layer 7)
- Health checks are TCP or HTTP
- Fixed hostname
- Application Load Balancer (v2)
- Supports layer 7 (HTTP/HTTPS)
  - Can load balance to multiple HTTP applications across machines (target groups) based on:
    - Path in URL (eg. `example.com/users` or `example.com/posts`)
    - Hostname in URL (eg. `one.example.com` or `other.example.com`)
    - Query String, Headers (eg. `example.com/users?id=123&order=false`)
- Can load balance to multiple applications on the same machine (eg. containers)
- Supports HTTP/2 and WebSocket
- Supports redirects (eg. from HTTP to HTTPS)
- Great fit for micro-services & container-based applications
- Has a port mapping feature to redirect to a dynamic port in ECS
- Target Groups can include:
- EC2 instances
- ECS tasks
- Lambda functions (HTTP requests are translated into JSON events)
- IP addresses (must be private IPs)
- Health checks are performed at the target group level
- Fixed hostname
  - The application servers don't see the user's IP, port or protocol directly, so these are inserted into the headers `X-Forwarded-For`, `X-Forwarded-Port` and `X-Forwarded-Proto` respectively.
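As a sketch, the path-based routing described above is configured as a listener rule (ARNs are placeholder stubs):

```bash
# Forward /users/* traffic to a dedicated target group
aws elbv2 create-rule \
  --listener-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:listener/app/my-alb/abc/def \
  --priority 10 \
  --conditions Field=path-pattern,Values='/users/*' \
  --actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/users-tg/abc
```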
- Network Load Balancer (v2)
  - Supports layer 4 (TCP & UDP)
  - Handles millions of requests per second with lower latency than ALB (~100ms vs ~400ms)
  - Has one static IP per AZ and supports assigning an Elastic IP (helpful for IP whitelisting)
  - NLBs are used for extreme performance with TCP or UDP traffic
- Not included in the AWS free tier
- Target Groups can include:
- EC2 instances
- IP Addresses (must be private IPs)
- Application Load Balancers
- Health Checks support the TCP, HTTP and HTTPS protocols
- Gateway Load Balancer
- Operates at layer 3 (Network Layer (IP Packets))
- Use to deploy, scale and manage a fleet of 3rd party network virtual appliances such as:
- Firewall
  - Intrusion Detection and Prevention System
- Deep Packet Inspection System
- Payload manipulator
- Works as both a Load Balancer and a Transparent Network Gateway (single entry/exit point for all traffic)
- Target Groups can include:
- EC2 instances
- IP Addresses (must be private IPs)
Sticky Sessions
- Application-based Cookies are either:
  - Custom cookies, which are generated by the target and can include any custom attributes required by the application. Cookie names must be specified individually for each target group
  - Application cookies, which are generated by the load balancer and are named `AWSALBAPP`
- Duration-based Cookies are generated by the load balancer, with the names `AWSALB` for ALB, or `AWSELB` for CLB
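A sketch of enabling duration-based stickiness on a target group (the ARN is a placeholder):

```bash
# Enable the AWSALB duration-based cookie with a 1-day expiry
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/abc \
  --attributes Key=stickiness.enabled,Value=true \
               Key=stickiness.type,Value=lb_cookie \
               Key=stickiness.lb_cookie.duration_seconds,Value=86400
```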
Cross-Zone Load Balancing
- Cross-Zone Load Balancing means that each load balancer node distributes traffic evenly across all registered instances in all AZs. If disabled, requests are distributed evenly between ELB nodes, but each node only routes to instances within its own AZ
- Enabled by default for ALBs. No charges for inter-AZ data
- Disabled by default for NLBs & GWLBs. Fees will be charged for inter-AZ data if enabled
- Disabled by default for CLBs. No charges for inter-AZ data.
Load Balancing with SSL/TLS
- SSL certificates allow traffic between clients and your load balancer to be encrypted in transit
- SSL = Secure Sockets Layer
- TLS = Transport Layer Security. Newer version of SSL.
- Nowadays TLS certificates are mainly used but SSL is still the common name
- SSL certificates are issued by Certificate Authorities (CA)
- SSL certificates have expiration dates and must be renewed
- Load Balancers use X.509 certificates
- AWS Certificate Manager - An AWS service you can use to manage certificates
- You may also upload your own certificates instead
- HTTPS listener:
- You must specify a default certificate
- You can add an optional list of certs to support multiple domains
- Clients can use SNI to specify the hostname they reach
- You can specify a security policy to support older versions of SSL/TLS for legacy clients
- Server Name Indication (SNI) - A protocol that solves the issue of loading multiple SSL certificates onto one web server so the web server can serve multiple websites.
- A client indicates the hostname of the target server in their initial SSL handshake. The server then finds the correct certificate or the default
- Only works for ALB v2, NLB v2 and CloudFront
Connection Draining
- Known as Connection Draining for CLB or Deregistration Delay for ALB & NLB
- Refers to the time to complete "in-flight" requests while the instance is de-registering or unhealthy
- Between 1-3600 seconds (default 300)
- Can be disabled (set to 0)
- Set this to a low value if your requests are short
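For example, shortening the delay for short-lived requests (the ARN is a placeholder):

```bash
# Drop the deregistration delay from the 300s default to 30s
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/abc \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30
```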
Auto Scaling Groups
- Auto Scaling Groups allow:
  - Scaling out (adding EC2 instances) to match increased load
  - Scaling in (removing EC2 instances) to match decreased load
- Minimum/maximum limits for EC2 instance count
- Automatic registration of new instances to load balancers
- Recreation of EC2 instances in case of termination (eg. if the instance is unhealthy)
- ASG Attributes:
  - Launch Template, which includes:
- AMI + Instance Type
- EC2 User Data
- EBS Volumes
- Security Groups
- SSH Key Pair
- IAM Roles for your instances
- Network + Subnet Information
- Load Balancer information
- Min size / max size / initial capacity
- Scaling policies
- It's possible to scale an ASG based on CloudWatch alarms
  - An alarm monitors a metric such as Average CPU usage or a custom metric
  - Metrics such as Average CPU are computed across the overall ASG instances
- Can create policies to scale in/out upon trigger
- ASG Scaling Policies:
- Dynamic:
- Target tracking
- Simple/step scaling
- Scheduled
- Predictive
- Good metrics to scale on:
- CPU Utilization
- RequestCountPerTarget
- Average Network In/Out
- Any custom metric
- Scaling Cooldowns: After a scaling activity happens you enter a cooldown period (default 300 seconds). During this period, the ASG will not launch or terminate additional instances to allow for metrics to stabilize
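A sketch of a target tracking policy (the ASG name is a placeholder): keep average CPU across the ASG at ~50%.

```bash
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
    "TargetValue": 50.0
  }'
```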
RDS, Aurora, ElastiCache
RDS (Relational Database Service)
- RDS is a managed DB service for DBs that use SQL as a query language, including:
- Postgres
- MySQL
- MariaDB
- Oracle
- Microsoft SQL Server
- IBM DB2
- Amazon Aurora
- Why use RDS instead of EC2 for DB hosting? RDS is managed
- Automated provisioning, OS patching
- Continuous backups and point-in-time restore
- Monitoring dashboards
- Read replicas for improved read performance
- Multi AZ setup for disaster recovery
- Maintenance windows for upgrades
- Vertical & Horizontal scaling
- Storage backed by EBS
- However, you can't SSH into your instances
- On RDS Storage Auto Scaling:
- When RDS detects you're running out of storage, it scales automatically.
  - This will happen if free storage is less than 10% of allocated storage AND low storage lasts at least 5 minutes AND at least 6 hours have passed since the last modification
- You must set a maximum storage threshold
- RDS Read Replicas help you scale your database reads
- You can have up to 15 Read Replicas within an AZ, cross-AZ or cross-region
- Replication is async, so reads are eventually consistent
- Replicas can be promoted to their own DB
- Use cases:
- You have a production application and a reporting application that both need to read from your database at once. You can create a read replica for the reporting application to read from that takes the load off the source database.
- Things to remember:
- Read replicas are used for SELECT kinds of statements (not INSERT, UPDATE, or DELETE)
- Applications must update their connection strings to leverage read replicas
- Cross-region read replicas will incur extra network cost due to the cross-region data transmission
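Creating a replica (as in the reporting use case above) is a single call (identifiers are placeholders):

```bash
# Create a read replica of an existing RDS instance
aws rds create-db-instance-read-replica \
  --db-instance-identifier mydb-replica-1 \
  --source-db-instance-identifier mydb
```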
- RDS Multi-AZ can help you set up disaster recovery contingencies, increasing availability
- Synchronous replication
  - Database instances will be under one DNS name, giving us automatic failover in case of loss of AZ, loss of network, or instance/storage failure
- Requires no manual intervention in your app
- Not used for scaling
- How to go from a single-AZ deployment to multi-AZ (zero-downtime operation):
- Click on "modify" for the DB
- The following will happen internally:
- A snapshot is taken of the source DB instance
- A new DB is restored from the snapshot in a new AZ
- Synchronization is established between the two DBs
- RDS Custom - Managed Oracle and Microsoft SQL Server DB with OS and DB customization
- Gives us access to the underlying DB and OS so we can:
- Configure settings
- Install patches
- Enable native features
    - Access the underlying EC2 instance using SSH or SSM Session Manager
- REMEMBER: Take a DB snapshot and de-activate automation mode anytime you want to perform customization to prevent data corruption
- Backups:
- Automated:
- Full backup occurs daily during a 'backup window'
- Transaction log backups occur every 5 minutes (enables point-in-time restore)
- Retention Period = 1-35 days. Set to 0 to disable automated backups
- Manual Snapshots:
- Manually triggered by the user
- Backup is retained for as long as desired
- Restoring:
- From a backup/snapshot creates a new DB
- From S3:
- Create a backup of your on-prem DB and store it on S3
- Restore the backup file onto a new RDS instance running MySQL
- ⚠REMEMBER⚠: You still pay for storage even if your RDS DB is stopped! If you plan on stopping your DB for a long time, take a snapshot instead
- Amazon RDS Proxy
- Allows apps to pool and share DB connections established with the DB
  - Improves DB efficiency by reducing the stress on DB resources (eg. CPU, RAM) and minimizing open connections (and timeouts)
- Serverless, autoscaling, multi-AZ
- Reduced failover time by up to 66%
- Supports RDS and Aurora
- No code changes required for most apps
- Enforce IAM Authentication for DB and securely store credentials in AWS Secrets Manager
- RDS Proxy is never publicly accessible, must be accessed from the VPC
- Security Features (Shared with Amazon Aurora):
- At-rest encryption (DB master & replicas encryption using AWS KMS.)
- If the master is not encrypted, the replicas cannot be encrypted
    - To encrypt an un-encrypted DB, go through a DB snapshot & restore as encrypted
- In-flight encryption (TLS-ready by default, use the AWS TLS root certificates client-side)
- IAM Authentication (IAM roles to connect to your DB)
- Security Groups
- No SSH available (except on RDS Custom)
- Audit Logs can be enabled and sent to CloudWatch Logs for longer retention
- At-rest encryption (DB master & replicas encryption using AWS KMS.)
Amazon Aurora
- Amazon Aurora is a proprietary AWS DB optimized for the cloud that supports Postgres and MySQL
- Storage automatically grows in increments of 10GB up to 128TB
- Can have up to 15 replicas and the replication process is faster than MySQL (sub-10ms replica lag)
- Failover in Aurora is instantaneous
- More efficient than RDS but costs 20% more
- Integrates with AWS ML services like SageMaker and Comprehend
- High availability and read scaling
- Stores 6 copies of your data across 3 AZs
- 4 copies needed for writes
- 3 copies needed for reads
- Self-healing with p2p replication
- Storage is striped across 100s of volumes
- Support for cross-region replication
- Aurora as a DB Cluster
- There is a shared storage volume that automatically expands from 10GB to 128TB
- Writing to the shared storage volume is done through communicating with the Writer Endpoint, which points to the master database, which is the only entity that can write to shared storage.
- If the master fails, automated failover will occur in less than 30 seconds
- To read, connect to the Reader Endpoint, which directs requests to the cluster's set of read-replicas (can have 1-15 at a time), which all read from the shared storage volume
- Features of Aurora
- Automatic failover
- Backup and Recovery
- Isolation and security
- Industry compliance
- Push-button scaling
- Automated Patching with Zero Downtime
- Advanced Monitoring
- Routine Maintenance
- Point-in-time restore via the Backtrack feature
- Defining Custom Endpoints
- If your application involves many different types of workloads and you want to segregate them (say you want to run analytical queries on specific, more powerful read replica instances but not others), define custom endpoints that communicate with a subset of your replicas.
- The Reader Endpoint is generally not used after defining Custom Endpoints
- Serverless Aurora
- Clients communicate with a Proxy Fleet (managed by Aurora), which redirects requests to Aurora DB instances that are instantiated and scaled automatically, which all still read from the shared storage volume
- Aurora Global Database (recommended)
- 1 Primary Region (read/write)
- Up to 5 Secondary Regions (read)
- Up to 16 Read Replicas per Secondary Regions
- Promoting a Secondary region to be a Primary has an RTO of < 1 minute
- Typical cross-region replication takes < 1 second
- Backups:
- Automated:
- Cannot be disabled
- Retained for 1-35 days
- Manual Snapshots:
- Manually triggered by the user
- Backup is retained for as long as desired
- Restoring:
- From a backup/snapshot creates a new DB
- From S3:
- Create a backup of your on-prem DB using Percona XtraBackup and store the backup file on S3
- Restore the backup file onto a new Aurora cluster running MySQL
- Aurora Database Cloning - Create a new Aurora DB Cluster from an existing one
- Faster than snapshot & restore
- Uses copy-on-write protocol
- Initially, the new DB cluster uses the same data volume as the original DB cluster. When updates are made to the new DB cluster data, then additional storage is allocated and data is copied to be separated
- Fast & cost-effective
- Useful to create a "staging" database from a "production" database without impacting the production database
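The staging-from-production scenario above is a point-in-time restore with copy-on-write (cluster identifiers are placeholders):

```bash
# Clone a production cluster into a staging cluster (copy-on-write)
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier production \
  --db-cluster-identifier staging \
  --restore-type copy-on-write \
  --use-latest-restorable-time
```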
ElastiCache
- ElastiCache is a managed Redis/Memcached service with high performance and low latency
- Helps reduce load off of DBs for read intensive workloads
- Helps make your application stateless
- Usage will involve heavy application code changes
- How it works:
- Applications query ElastiCache
- If there is a cache hit, retrieve the value from ElastiCache.
- Otherwise there is a cache miss. Retrieve the value from your DB service
- Write the result to the cache
- Cache must have an invalidation strategy to make sure only the most current data is used

Redis vs. Memcached
- Redis
- Multi-AZ with Auto-Failover
- Read Replicas to scale reads and have high availability
- Data Durability using AOF persistence
- Backup and restore features
- Supports Sets and Sorted Sets (use case: real-time leaderboard store)
- Supports IAM Authentication and SSL in-flight encryption
- Memcached
- Multi-node for partitioning of data (sharding)
- No replication
- Non-persistent
- No backup and restore
- Multi-threaded architecture
- Patterns for ElastiCache:
- Lazy Loading: All the read data is cached, data can become stale in cache
- Write Through: Adds or updates data in the cache when written to a DB (no stale data)
- Session Store: Store temporary session data in a cache (using TTL features)
Route 53
- Terminology
- Domain Name System (DNS) - System that translates human-friendly hostnames into IP addresses
- Domain Registrar - Route 53, GoDaddy, etc
- DNS Records - A, AAAA, CNAME, NS
- Zone File - Contains DNS records
- Name Server - Resolves DNS queries
- Top Level Domain (TLD) - .com, .us, .in, .gov, .org...
- Second Level Domain (SLD) - amazon.com, google.com
- Sub Domain - www.example.com
- Fully Qualified Domain Name - api.www.example.com
- URL - https://api.www.example.com
- How DNS works (note you wouldn't do this every time, query results are cached with a TTL):
- Client reaches out to Local DNS server (assigned by your company or ISP) with the desired hostname
- The Local DNS server reaches out to the Root DNS Server (managed by ICANN) to get the IP of the TLD DNS Server
- The Local DNS server reaches out to the TLD DNS Server to get the IP of the SLD DNS Server
- The Local DNS server reaches out to the SLD DNS Server to get the IP of the web server under the hostname, which is finally returned to the client
- The client starts sending requests to the web server
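You can watch this resolution chain yourself with `dig` (example.com stands in for any hostname):

```bash
# Walk the full chain: root -> TLD -> SLD name servers -> answer
dig +trace example.com

# Ask only your local resolver (returns cached answers with remaining TTL)
dig example.com A
```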
- Amazon Route 53 - Highly available, scalable, fully managed and authoritative DNS and Domain Registrar
  - The only AWS service with a 100% availability SLA
- Route 53 Records specify how you want to route traffic for a domain. Each record contains:
- Domain/subdomain name
- Record Type
- Value eg. 12.34.56.78
- Routing Policy - How Route 53 responds to queries
  - TTL - Amount of time the record is cached at DNS Resolvers
- Route 53 Record Types:
- A - Maps a hostname to IPv4
- AAAA - Maps a hostname to IPv6
- CNAME - Maps a hostname to another hostname (only for non-root domain)
- NS - Name Servers for the Hosted Zone
  - Alias - Maps a hostname to an (automatically resolved) hostname for an AWS resource (works for root/non-root domains)
- Always of type A/AAAA.
- Can't set the TTL
- Targets: ELB, CloudFront Distributions, API Gateway, Elastic Beanstalk environments, S3 Websites, VPC Interface Endpoints, Global Accelerator accelerators, Route 53 records (in the same hosted zone)
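A sketch of writing a record via the CLI (zone ID, name and IP are placeholders):

```bash
# Upsert a simple A record in a hosted zone
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789ABCDEFGHIJ \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{"Value": "12.34.56.78"}]
      }
    }]
  }'
```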
- Hosted Zones - Container for records that define how to route traffic to a domain and its subdomains
- Public Hosted Zones - Contains records that specify how to route on the internet (public domain names)
- Private Hosted Zones - Contains records that specify how you route traffic within one or more VPCs (private domain names)
- Records TTL
- High TTL (eg. 24hr) = Less traffic on Route 53, but possibly outdated records
- Low TTL (eg 60sec) = More traffic on Route 53 (and thus higher costs), less outdated records, makes records easier to change
- Mandatory on all DNS records aside from Alias records
- 3rd Party Registrar with Amazon Route 53
- Create a Hosted Zone in Route 53
- Update NS Records on 3rd party website to use Route 53's Name Servers
Health Checks
- Health Checks - Checks endpoint health for public resources
- Can monitor endpoints, other health checks (calculated health checks) and CloudWatch Alarms
- Integrated with CloudWatch metrics
- About 15 global health checkers will check the endpoint health:
- Healthy/Unhealthy Threshold - 3 (default)
- Interval - 30 seconds (or 10 seconds at a higher cost)
- Supported protocols: HTTP, HTTPS, TCP
- If >18% of checkers report the endpoint is healthy, Route 53 reports it as Healthy. Otherwise, it's Unhealthy.
- Can choose which locations you want Route 53 to use
- Health Checks pass only when the endpoint responds with 2xx or 3xx status codes. They can also be set up to pass/fail based on text in the first 5120 bytes of the response
- Need to configure your router/firewall to allow incoming requests from Route 53 Health Checkers
- Only works for public endpoints. For private endpoints, you can create a CloudWatch Metric and associate a CloudWatch Alarm, then create a Health Check that checks the alarm itself.
- Calculated Health Checks - Combination of results from multiple Health Checks
  - Combined with `OR`, `AND` or `NOT`
  - Can monitor up to 256 Child Health Checks
  - Specify how many of the health checks need to pass to make the parent pass
  - Use case: Perform maintenance on your website without causing all health checks to fail
Routing Policies
- Simple - Typically used to route traffic to a single resource
- Can specify multiple values in the same record
- If multiple values are returned, a random one is chosen by the client
- When Alias is enabled, specify only one AWS resource
- Cannot be associated with Health Checks
- Weighted - Control the % of requests that go to each resource
- Assign each record a relative weight (doesn't need to sum to 100)
- DNS records must have the same name & type
- Use cases - Load balancing between regions, testing new application versions
- Assign a weight of 0 to stop sending traffic to a resource
- If all records have a weight of 0, then all records will be returned equally
- Latency-based - Redirect to the resource that has the least latency
- Based on traffic between users and AWS Regions
- Geolocation - Route depending on user location (continent/country/US state)
- Should create a "Default" record in case there's no match on location
- Use cases: website localization, restrict content distribution, load balancing...
- Geoproximity - Route traffic to your resources based on the geographic location of users and resources
- Can shift more traffic to resources based on the defined bias
- To change the size of the geographic region, specify bias values (-99 to 99)
- Resources can be:
- AWS resources (specify AWS region)
    - Non-AWS resources (specify latitude and longitude)
- You must use Route 53 Traffic Flow to use this feature
- IP-based - Provide a list of CIDRs for your clients and the corresponding endpoints/locations (user-IP-to-endpoint mappings)
- Use cases - Optimize performance, reduce network costs
- Multi-Value - Use when routing traffic to multiple resources to return multiple values to clients
- Can be associated with Health Checks to return only values for healthy resources
- Up to 8 healthy records returned per query
- Not a substitute for an ELB
Amazon S3 Introduction
- Amazon S3 Use cases
- Backup and storage
- Disaster Recovery
- Archive
- Hybrid Cloud storage
- Application hosting
- Media hosting
- Data lakes & big data analytics
- Software delivery
- Static websites
- S3 allows people to store objects (files) in buckets (directories)
- Buckets must have a globally unique name across all regions and all accounts
- Buckets are defined at the region level
- S3 looks like a global service but buckets are created in a region
- S3 Objects consist of:
- A key, which is the full path, composed of a prefix and an object name
- Object values are the content of the body
- Max object size is 5TB
    - If uploading more than 5GB, you must use 'multi-part upload'
- Metadata (list of text key/value pairs - system or user metadata)
- Tags (Unicode key/value pair - up to 10) - useful for security/lifecycle
- Version ID (if versioning is enabled)
- S3 security
- User-Based
- IAM Policies - which API calls should be allowed for a specific user from IAM
- Resource-Based
    - Bucket Policies - Bucket-wide rules from the S3 console - allows cross-account access.
      - An IAM principal can access an object if (the user's IAM permissions ALLOW it OR the resource policy ALLOWS it) AND there's no explicit DENY
- Object ACL - Finer grain access control (can be disabled)
- Bucket ACL - Less commonly used (can be disabled)
- On Bucket Policies:
  - JSON based policies:
    - Specify the `Resource` (buckets and objects) the policy applies to
    - Specify the `Effect` (Allow/Deny)
    - Specify the `Action` (the API calls) the policy applies to
    - Specify the `Principal` (the account or user) to apply the policy to
  - Use S3 bucket policies to:
    - Grant public access to a bucket
    - Force objects to be encrypted at upload time
    - Grant cross-account access
- Versioning
- You can version your files in Amazon S3
- Enabled at the bucket level
- Uploading a file with the same key as an existing one will change the "version"
  - Enabling versioning is best practice as it protects against unintended deletes and makes it easy to roll back to a previous version
- NOTES:
- Any file that is not versioned prior to enabling versioning will have version "null"
- Suspending versioning does not delete the previous versions
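Versioning is a single bucket-level setting (bucket name and prefix are placeholders):

```bash
# Enable versioning on a bucket
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# List all versions (including delete markers) under a prefix
aws s3api list-object-versions --bucket my-bucket --prefix reports/
```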
- Replication
- You must enable versioning in source and destination buckets
- Cross-Region Replication (CRR) use case: compliance, lower latency access, replication across accounts
- Same-Region Replication (SRR) use case: log aggregation, live replication between production and test accounts
- Buckets can be in different accounts
- Copying is done asynchronously
- Must give proper IAM permissions to S3
- After you enable replication, only new objects are replicated
- You can optionally replicate existing objects using S3 Batch Replication
- For DELETE operations, you can optionally enable replication of delete markers from source to target. Deletions with a version ID are not replicated
- There is no 'chaining' of replication (ie you cannot set up bucket 1 to replicate to bucket 2 and set up bucket 2 to replicate to bucket 3 and expect objects created in bucket 1 to replicate to bucket 3)
Amazon S3 Advanced
- Moving between Storage Classes
- You can transition objects between storage classes
- Moving objects between tiers can be automated using Lifecycle Rules
- Lifecycle Rules
- Transition Actions - Configure objects to transition to another storage class (eg. move objects to Standard IA class 60 days after creation, move to Glacier for archiving after 6 months)
- Expiration Actions - Configure objects to expire (delete) after some time
- Can be used to delete old versions of files if versioning is enabled
- Can be used to delete incomplete multi-part uploads
  - Rules can be created for a certain prefix (eg. `s3://mybucket/mp3/*`)
  - Rules can be created for certain object Tags (eg. `Department: Finance`)
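A sketch of a lifecycle rule combining a transition and an expiration action (bucket, rule ID and prefix are placeholders):

```bash
# Transition objects under reports/ to Standard-IA after 60 days, expire after 365
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-reports",
      "Status": "Enabled",
      "Filter": {"Prefix": "reports/"},
      "Transitions": [{"Days": 60, "StorageClass": "STANDARD_IA"}],
      "Expiration": {"Days": 365}
    }]
  }'
```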
- S3 Storage Class Analysis - Helps you decide when to transition objects to the right storage class
- Recommendations for Standard and Standard IA (does NOT work for One-Zone IA or Glacier)
- Generates a report that is updated daily
- 24 to 48 hours to start seeing data analysis
- Good first step to put together / improve Lifecycle Rules
- With Requester Pays buckets, the requester pays for the cost of data egress instead of the bucket owner
- Helpful when you want to share large datasets with other accounts
- The requester must be authenticated in AWS (cannot be anonymous)
- S3 Event Notifications
- Supported event types of SQS, SNS and Lambda:
S3:ObjectCreated:Put
S3:ObjectCreated:Post
S3:ObjectCreated:Copy
S3:ObjectCreated:CompleteMultipartUpload
S3:ObjectRemoved:Delete
S3:ObjectRemoved:DeleteMarkerCreated
S3:ObjectRestore:Post
S3:ObjectRestore:Completed
S3:ObjectRestore:Delete
S3:ObjectReplication
S3:ReducedRedundancyLostObject
S3:Replication:OperationFailedReplication
S3:Replication:OperationMissedThreshold
S3:Replication:OperationReplicatedAfterThreshold
S3:Replication:OperationNotTracked
S3:LifecycleExpiration:Delete
S3:LifecycleExpiration:DeleteMarkerCreated
S3:IntelligentTiering
S3:ObjectTagging:Put
S3:ObjectTagging:Delete
S3:ObjectAcl:Put
- Object name filtering possible (eg. *.jpg)
- REMEMBER: S3 event notifications typically deliver events in seconds but can sometimes take over a minute
- S3 bucket must have permission to perform actions with the desired service
    - SNS: `SNS:Publish`
    - SQS: `SQS:SendMessage`
    - Lambda: `lambda:InvokeFunction`
- Connecting with Amazon EventBridge:
- Advanced filtering options with JSON rules (metadata, object size, name...)
- Multiple Destinations - Can bridge to over 18 AWS services, including Step Functions, Kinesis Streams/Firehose...
- EventBridge Capabilities - Archive, Replay Events, Reliable Delivery
- S3 Select & Glacier Select
- Retrieve less data using SQL by performing server-side filtering
- Can filter by rows & columns (simple SQL statements)
- Less network transfer, less CPU cost client-side
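A sketch of server-side filtering on a CSV object (bucket, key and column name are placeholders):

```bash
# Filter rows server-side and write only the matches locally
aws s3api select-object-content \
  --bucket my-bucket \
  --key data/orders.csv \
  --expression "SELECT * FROM s3object s WHERE s.\"status\" = 'SHIPPED'" \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
  --output-serialization '{"CSV": {}}' \
  output.csv
```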
- S3 Batch Operations
- Perform bulk operations on existing S3 Objects with a single request, such as:
- Modifying object metadata & properties
- Copy objects between S3 buckets
- Encrypt un-encrypted objects
- Modify ACLs, tags
- Restore objects from S3 Glacier
- Invoke Lambda function to perform custom action on each object
- A job consists of a list of objects, the action to perform and optional parameters
- S3 Batch Operations manages, retries, tracks progress, sends completion notifications, generates reports, etc...
- You can use S3 Inventory to get object list and use S3 Select to filter your objects
Performance
- Baseline Performance
- S3 automatically scales to high request rates with a latency of 100-200ms
- Your application can achieve at least 3500 PUT/COPY/POST/DELETE or 5500 GET/HEAD requests per second per prefix in a bucket
- There are no limits to the number of prefixes in a bucket
- Multi-part Uploads:
- Recommended for files > 100MB, must use for files > 5GB
- Can help parallelize uploads
- S3 Transfer Acceleration
  - Increases transfer speed by transferring the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
- Compatible with multi-part upload
- S3 Byte-Range Fetches
- Parallelize GETs by requesting specific byte ranges
- Better resilience in case of failures
- Can be used to speed up downloads
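For example, a single byte-range fetch (bucket/key are placeholders); issuing several such ranges in parallel is the parallelized-GET pattern:

```bash
# Fetch only the first megabyte of an object (eg. to read a file header)
aws s3api get-object \
  --bucket my-bucket \
  --key big-dataset.bin \
  --range bytes=0-1048575 \
  first-mb.bin
```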
Storage Lens
- Storage Lens can help you:
- Understand, analyze, optimize storage across an entire AWS Organization
- Discover anomalies, identify cost efficiencies, apply data protection best practices across entire AWS Organization
- Aggregate data for Organization, specific accounts, regions, buckets or prefixes and view via a customizable dashboard
- Storage Lens can be configured to export metrics daily to an S3 bucket (CSV, Parquet)
- Storage Lens Default Dashboard:
- Visualize summarized insights and trends for both free and advanced metrics
- Shows multi-region and multi-account data
- Pre-configured by Amazon S3
- Storage Lens Metrics
- Free Metrics:
- Summary - General insights about your S3 storage
- StorageBytes, ObjectCount...
      - Use cases: identify the fastest-growing or unused buckets and prefixes
- Cost-Optimization - Insights about how to manage and optimize your storage costs
- NonCurrentVersionStorageBytes, IncompleteMultipartUploadStorageBytes...
      - Use cases: identify buckets with incomplete multipart uploads older than X days, identify which objects could be transitioned to lower-cost storage class
- Data-Protection - Insights for data protection features
- VersioningEnabledBucketCount, MFADeleteEnabledBucketCount, SSEKMSEnabledBucketCount, CrossRegionReplicationRuleCount
- Use cases: Identify buckets that don't follow data-protection best practices
- Access-Management - Insights for S3 Object Ownership
- ObjectOwnershipBucketOwnerEnforcedBucketCount...
- Use cases: identify which Object Ownership settings your buckets use
- Event - Insights for S3 Event Notifications
- EventNotificationEnabledBucketCount (identify which buckets have Event Notifications configured)
- Performance - Provide insights for S3 Transfer Acceleration
- TransferAccelerationEnabledBucketCount (identify which buckets have Transfer Acceleration enabled)
- And more! (about 28 free insights in total)
- Data is available for queries for 14 days
- Advanced Metrics:
- Activity - Provide insights about how your storage is requested
- AllRequests, GetRequests, PutRequests, ListRequests, BytesDownloaded...
- Status Code - Provide insights for HTTP status codes
- 200OKStatusCount, 403ForbiddenErrorCount, 404NotFoundErrorCount...
- Advanced Cost Optimization
- Advanced Data Protection
- Selecting Advanced metrics and recommendations for Storage Lens will grant you access to Advanced metrics as well as:
- CloudWatch Publishing, allowing you to access these metrics in CloudWatch without additional charges
- Prefix Aggregation - Collect metrics at the prefix level
- Making data available for queries for 15 months
S3 Security
S3 Encryption
- There are 4 methods to encrypt S3 objects:
- Server-Side Encryption (SSE) methods:
- SSE with Amazon S3-Managed Keys (SSE-S3) - Enabled by default for new buckets and new objects
- Encryption type is AES-256
      - Must set header `"x-amz-server-side-encryption": "AES256"` when uploading
- SSE with KMS Keys stored in AWS KMS (SSE-KMS) - Leverage KMS to gain more control over encryption process and to audit key usage using CloudTrail
      - Must set header `"x-amz-server-side-encryption": "aws:kms"` when uploading
      - ⚠REMEMBER⚠: You may be impacted by the KMS limits
        - When you upload, it calls the GenerateDataKey KMS API; when you download, it calls the Decrypt KMS API
        - Both calls count toward the KMS quota per second (5500, 10000, or 30000 req/s depending on your region)
        - You may request a quota increase using the Service Quotas console
- SSE with Customer-Provided Keys (SSE-C) - For when you want to manage your own encryption keys
- HTTPS must be used
- Encryption key must be provided in request headers for every request made
- Client-Side Encryption
- Use client libraries such as Amazon S3 Client-Side Encryption Library
- Clients must encrypt data themselves before sending to Amazon S3
- Customer fully manages the keys and encryption cycle
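For example, requesting a specific SSE mode at upload time; `aws s3 cp` sets the corresponding `x-amz-server-side-encryption` header for you (bucket name and KMS key ID are placeholders):

```bash
# SSE-S3 (the default for new objects anyway)
aws s3 cp report.csv s3://my-bucket/report.csv --sse AES256

# SSE-KMS with a specific key
aws s3 cp report.csv s3://my-bucket/report.csv \
  --sse aws:kms --sse-kms-key-id 1234abcd-12ab-34cd-56ef-1234567890ab
```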
- Encryption in Transit (SSL/TLS)
- Amazon S3 exposes two endpoints: HTTP and HTTPS
- You can force encryption in transit with bucket policies:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}
```
- You can also force certain server-side encryption types using bucket policies:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms" // refuse PUT calls that don't use SSE-KMS
        },
        "Null": {
          "s3:x-amz-server-side-encryption-customer-algorithm": "true" // refuse PUT calls that don't use SSE-C
        }
      }
    }
  ]
}
```
- 🛈Note: Bucket Policies are evaluated before "Default Encryption"
S3 CORS
- Cross-Origin Resource Sharing (CORS):
  - Origin = scheme (protocol) + host (domain) + port
  - A web-browser-based mechanism to allow requests to other origins while visiting the main origin
  - Cross-origin requests won't be fulfilled unless the other origin allows them via CORS headers (eg. `Access-Control-Allow-Origin`)
  - Example:
    - http://example.com/app1 & http://example.com/app2 have the same origin
    - http://www.example.com & http://other.example.com have different origins
  - If a client makes a cross-origin request to our S3 bucket, we need to enable the correct CORS headers
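A sketch of a bucket CORS configuration allowing GETs from one web origin (bucket and origin are placeholders):

```bash
aws s3api put-bucket-cors \
  --bucket my-bucket \
  --cors-configuration '{
    "CORSRules": [{
      "AllowedOrigins": ["http://www.example.com"],
      "AllowedMethods": ["GET"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }]
  }'
```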
S3 MFA
- If you enable MFA Delete on a bucket, MFA will be required to:
- Permanently delete an object version
- Suspend versioning on the bucket
- MFA will not be required to:
- Enable Versioning
- List deleted versions
- To use MFA Delete, Versioning must be enabled on the bucket
- Only the bucket owner (root account) can enable/disable MFA Delete
S3 Access Logs
- For audit purposes, you may want to log all access to S3 buckets.
- Any request made to S3, from any account (authorized or not), will be logged into another S3 bucket
- You can feed S3 log data into data analysis tools
- The target logging bucket must be in the same AWS region
- ⚠REMEMBER⚠: Don't set your logging bucket to be the monitored bucket! It will create a logging loop and your bucket will grow exponentially!
S3 Pre-signed URLS
- You can generate pre-signed URLs using the S3 console, AWS CLI or SDK
- URL Expiration:
  - via the S3 console - 1 minute up to 720 minutes (12 hours)
  - via the AWS CLI - configure expiration with the `--expires-in` parameter (in seconds). Default = 3600s, max = 604800s (~168 hours)
- Users given a pre-signed URL inherit the permissions of the user that generated the URL for GET/PUT
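For example (bucket/key are placeholders):

```bash
# Generate a GET URL valid for 5 minutes, signed with the caller's credentials
aws s3 presign s3://my-bucket/private-report.csv --expires-in 300
```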
S3 Locks
- S3 Glacier Vault Lock
- Adopt a WORM (Write Once Read Many) model
- Create a Vault Lock Policy
- Lock the policy for future edits (can no longer be changed or deleted)
- Helpful for compliance and data retention
- S3 Object Lock
- Adopt a WORM (Write Once Read Many) model
- Block an object version deletion for a specified amount of time
- Retention Modes:
- Compliance - Object versions can't be overwritten or deleted by any user (root included), object retention modes can't be changed and retention periods can't be shortened
- Governance - Only users with special permissions can overwrite or delete an object version or alter its lock settings.
- Retention Period - Protect the object for a fixed period, it can be extended
- Legal Hold - Protect the object indefinitely, independent from retention period.
    - Can be freely placed and removed using the `s3:PutObjectLegalHold` IAM permission
Access Points
- Access Points simplify security management for S3 Buckets
- Each access point has:
- Its own DNS name (Internet Origin or VPC Origin)
- An access point policy (similar to bucket policy) - manage security at scale
  - We can define the access point to be accessible only from within the VPC
- You must create a VPC Endpoint (Gateway or Interface) to access the Access Point
- The VPC Endpoint Policy must allow access to the target bucket and Access Point
- Each access point has:
S3 Object Lambda
- Use AWS Lambda Functions to change the object before it is retrieved by the caller application
- Only one S3 bucket is needed, on top of which we create an S3 Access Point and an S3 Object Lambda Access Point
- Use Cases:
- Redacting personally identifiable information for analytics or non-prod environments
- Converting across data formats (eg converting XML to JSON)
- Resizing and watermarking images on the fly using caller-specific details such as the user who requested the object