Graduate Program KB

Amazon S3

  • Amazon S3 is one of the main building blocks of AWS, advertised as "infinitely scaling" storage

    • Used by many websites, as well as by other AWS services
  • Use cases:

    • Backup and storage
    • Disaster recovery
    • Archive
    • Hybrid Cloud storage
    • Application hosting
    • Media hosting
    • Data lakes & big data analytics
    • Software delivery
    • Static website
  • Buckets: Directories which allow people to store objects (files)

    • Must have globally unique name (across all regions and all accounts)
    • Defined at the region level
    • S3 looks like a global service but buckets are created in a region
    • Naming convention:
      • No uppercase, no underscore
      • 3-63 characters long
      • Not an IP
      • Must start with a lowercase letter or number
      • Must NOT start with the prefix xn--
      • Must NOT end with the suffix -s3alias
  • Objects: Files that have a key

    • The key is the full path of the file
      • Ex. s3://my-bucket/my_folder/my_file.txt
    • There's no real concept of "directories" within buckets, even though the console UI displays folders
    • Keys are just very long names that contain slashes (/): a prefix (the "directory" part) plus an object name (the file)
    • Object values are the content of the body
      • Maximum object size is 5 TB
      • Uploads larger than 5 GB must use multi-part upload (see the sketch after this list)
    • Metadata (list of text key / value pairs - set by user or system)
    • Tags (Unicode key / value pair - up to 10), useful for security / lifecycle
    • Version ID (only if versioning is enabled)
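  • The multi-part upload requirement above can be sketched with boto3; the bucket, key, and thresholds below are illustrative placeholders, and boto3 switches to multi-part automatically once the configured threshold is crossed:

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Switch to multi-part above 5 GB (the single-PUT limit),
    # uploading in 100 MB parts; both values are illustrative
    config = TransferConfig(
        multipart_threshold=5 * 1024**3,
        multipart_chunksize=100 * 1024**2,
    )

    # Bucket and key are placeholders
    s3.upload_file("backup.tar", "my-bucket", "my_folder/backup.tar", Config=config)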

Security

  • User-Based

    • IAM Policies: Which API calls should be allowed for a specific user from IAM
  • Resource-Based

    • Bucket Policies: Bucket wide rules from the S3 console (allows cross account)
    • Object Access Control List (ACL): Finer grain (can be disabled)
    • Bucket Access Control List (ACL): Less common (can be disabled)
  • NOTE: An IAM principal can access an S3 object if

    • The user's IAM permissions ALLOW it OR the resource policy ALLOWS it
    • AND there's no explicit DENY (see the sketch after this list)
  • Encryption: Encrypt objects in S3 using encryption keys
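  • The access rule above condenses to a simple boolean check; a toy sketch of the logic (not AWS's actual evaluation engine):

    def can_access(iam_allows: bool, resource_allows: bool, explicit_deny: bool) -> bool:
        # Access requires at least one ALLOW (IAM or resource-based)
        # and no explicit DENY anywhere
        return (iam_allows or resource_allows) and not explicit_deny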

S3 Bucket Policies

  • JSON based policies

    • Resources: Buckets and objects
    • Effect: Allow / Deny
    • Actions: Set of API to Allow or Deny
    • Principal: The account or user to apply the policy to
  • Use S3 bucket policies to:

    • Grant public access to the bucket
    • Force objects to be encrypted at upload
    • Grant access to another account (Cross-Account access)
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::examplebucket/*"
                ]
            }
        ]
    }
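  • To attach the policy above with boto3, a minimal sketch ("examplebucket" comes from the policy; the rest is boilerplate):

    import json
    import boto3

    s3 = boto3.client("s3")

    public_read = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::examplebucket/*"],
        }],
    }

    # Attach the bucket policy shown above
    s3.put_bucket_policy(Bucket="examplebucket", Policy=json.dumps(public_read))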
    
  • Bucket settings for Block Public Access (can be set at the account level)

    • Block public access to buckets and objects granted through new access control lists (ACLs)
    • Block public access to buckets and objects granted through any access control lists (ACLs)
    • Block public access to buckets and objects granted through new public bucket or access point policies
    • Block public and cross-account access to buckets and objects through any public bucket or access point policies
  • These settings were created to prevent company data leaks; leave them on for buckets that should stay private (a sketch of enabling them follows)
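  • Enabling all four settings with boto3 (bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    s3.put_public_access_block(
        Bucket="examplebucket",
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )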

Static Website Hosting

  • S3 can host static websites and make them accessible on the internet
  • The website URL depends on the AWS region the bucket is created in
  • If you get a 403 Forbidden error, make sure the bucket policy allows public reads
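  • A sketch of enabling website hosting with boto3 (bucket and document names are assumptions):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_website(
        Bucket="examplebucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )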

Versioning Files with S3

  • Enable versioning at the bucket level
  • Overwriting an object with the same key creates a new version: 1, 2, 3...
  • It is best practice to version your buckets
    • Protect against accidental deletes (ability to restore a version)
    • Easy roll back to previous version
  • Notes:
    • Any file that is not versioned prior to enabling versioning will have version "null"
    • Suspending versioning doesn't delete the previous versions
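  • Enabling versioning with boto3 (bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    # Objects uploaded before this call keep the implicit "null" version ID
    s3.put_bucket_versioning(
        Bucket="examplebucket",
        VersioningConfiguration={"Status": "Enabled"},
    )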

Replication (CRR & SRR)

  • Goal is to set up asynchronous replication between a source bucket and a target bucket

    • Must enable versioning in source and destination buckets
    • Buckets can be in different AWS accounts
    • Must give proper IAM permissions to S3
  • Cross-Region Replication (CRR):

    • Use cases: Compliance, lower latency access, replication across accounts
  • Same-Region Replication (SRR):

    • Use cases: Log aggregation, live replication between production and test accounts
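  • A minimal CRR configuration sketch with boto3 (bucket names, rule ID, and the IAM role ARN are placeholder assumptions; versioning must already be enabled on both buckets):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
            "Rules": [
                {
                    "ID": "ReplicateEverything",
                    "Priority": 1,
                    "Status": "Enabled",
                    "Filter": {},  # empty filter replicates the whole bucket
                    "DeleteMarkerReplication": {"Status": "Disabled"},
                    "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
                }
            ],
        },
    )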

S3 Storage Classes

  • Amazon S3 Standard - General Purpose: For frequently accessed data (more than once a month)

    • 99.99% availability
    • Low latency and high throughput (access time of milliseconds)
    • Sustain 2 concurrent facility failures
    • Use cases: Big data analytics, mobile & gaming applications, content distribution
  • Infrequent Access (IA): For data that is accessed less frequently but requires rapid access when needed

    • Lower cost than S3 Standard
    • Amazon S3 Standard-Infrequent Access (IA)
      • 99.9% availability
      • Use cases: Disaster recovery, backups
    • Amazon S3 One Zone-Infrequent Access
      • 99.999999999% durability in a single AZ but data lost when AZ is destroyed
      • 99.5% availability
      • Use cases: Storing secondary backup copies of on-premises data or data you can recreate
  • Glacier: Low-cost object storage meant for archiving / backup

    • Pay for storage and object retrieval cost
    • Amazon S3 Glacier Instant Retrieval: For data accessed once a quarter
      • Fast retrieval, takes milliseconds
      • Minimum storage duration of 90 days
    • Amazon S3 Glacier Flexible Retrieval: For yearly accessed archive data with varying retrieval times
      • Expedited (1 to 5 minutes)
      • Standard (3 to 5 hours)
      • Bulk (5 to 12 hours, free)
      • Minimum storage duration of 90 days
    • Amazon S3 Glacier Deep Archive: For long term storage accessed less than once a year
      • Standard (12 hours)
      • Bulk (48 hours)
      • Minimum storage duration of 180 days
  • Amazon S3 Intelligent Tiering: Move objects automatically between Access Tiers based on usage

    • Small monthly monitoring and auto-tiering fee
    • No retrieval charges
    • Useful for data with changing or unknown access patterns
    • Frequent Access tier (automatic): default tier
    • Infrequent Access tier (automatic): objects not accessed for 30 days
    • Archive Instant Access tier (automatic): objects not accessed for 90 days
    • Archive Access tier (optional): configurable from 90 days to 700+ days
    • Deep Archive Access tier (optional): configurable from 180 days to 700+ days
  • Durability: Represents how rarely an object stored in S3 is expected to be lost

    • Very high durability (99.999999999%, i.e. 11 nines) of objects across multiple AZs
    • Ex. If you store 10 million objects in S3, you can on average expect to lose a single object once every 10,000 years
    • Same durability for all storage classes
  • Availability: Measures how readily available a service is

    • Availability varies depending on storage class
    • Ex. S3 Standard has 99.99% availability, which corresponds to about 53 minutes of downtime per year
  • Can move between classes manually or using S3 Lifecycle configurations
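  • A lifecycle configuration sketch with boto3 (bucket name, rule ID, and the transition days are placeholder assumptions):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="examplebucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-then-expire",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # apply to every object
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},  # Flexible Retrieval
                    ],
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )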

S3 Encryption

  • Server-Side Encryption
    • Created buckets and uploaded objects are encrypted by default
    • Server encrypts the file after receiving it
  • Client-Side Encryption
    • User encrypts the file before uploading it
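  • A server-side encryption sketch with boto3 (bucket, key, and body are placeholders; SSE-S3 with AES-256 is the default, this just makes it explicit):

    import boto3

    s3 = boto3.client("s3")

    # S3 encrypts the object server-side after receiving it
    s3.put_object(
        Bucket="examplebucket",
        Key="report.pdf",
        Body=b"file contents here",
        ServerSideEncryption="AES256",
    )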

IAM Access Analyzer for S3

  • A monitoring feature for S3 buckets, ensuring only intended people have access to them
  • Evaluates S3 Bucket Policies, S3 ACLs, S3 Access Point Policies
  • Shows you which buckets are publicly accessible, what buckets are shared with other AWS accounts, etc. for review
  • Powered by IAM Access Analyzer, which allows you to find resources in your account shared with other entities

Shared Responsibility Model for S3

  • AWS responsibilities:
    • Infrastructure
      • Global security
      • Durability
      • Availability
      • Sustain concurrent loss of data in two facilities
    • Configuration and vulnerability analysis
    • Compliance validation
  • User responsibilities:
    • S3 Versioning
    • S3 Bucket Policies
    • S3 Replication setup
    • Logging and monitoring
    • S3 Storage Classes
    • Data encryption at rest and in transit

AWS Snow Family

  • Very secure and portable devices with two use cases
    • Collect and process data at the edge
    • Migrate data into and out of AWS

Data migration

  • Devices: Snowcone, Snowball Edge, Snowmobile

  • AWS Snow Family: Offline devices to perform data migrations

    • Recommended if it takes more than a week to transfer data over a network
  • Challenges:

    • Limited connectivity and bandwidth
    • High network cost
    • Shared bandwidth
    • Connection stability
  • Snowball Edge

    • Physically transport TBs or PBs of data in and out of AWS
    • Alternative to moving data over the network and paying network fees
    • Pay per data transfer job
    • Provide block storage and S3-compatible object storage
    • Snowball Edge Storage Optimized
      • 80 TB of HDD capacity for block volume and S3 compatible object storage
    • Snowball Edge Compute Optimized
      • 42 TB of HDD or 28 TB NVMe capacity for block volume and S3 compatible object storage
    • Use cases: Large data cloud migrations, DC decommission, disaster recovery
  • Snowcone & Snowcone SSD devices

    • Small, portable, secure and durable in harsh environments
    • Used for edge computing, storage and data transfer
    • Snowcone: 8 TB of HDD storage
    • Snowcone SSD: 14 TB of SSD storage
    • Use a Snowcone where Snowball doesn't fit
    • Provide your own battery / cables
    • Two ways to send data back to AWS:
      • Send data back offline by shipping it
      • Connect it to the internet and use AWS DataSync to send data
  • Snowmobile

    • A truck which can transfer EBs (exabytes) of data
    • Each Snowmobile has 100 PB of capacity; use multiple trucks in parallel to reach exabyte scale
    • Highly secure, temperature controlled, GPS and 24/7 video surveillance
    • Better than Snowball if transferring more than 10 PB
  • Usage process:

    • Request Snowball devices from the AWS console for delivery
    • Install the snowball client / AWS OpsHub on your servers
    • Connect the snowball to your servers and copy files using the client
    • Ship back the device when completed to the correct AWS facility
    • Data will be loaded into an S3 bucket
    • Snowball is completely wiped
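  • A hypothetical sketch of the copy step, assuming the device's S3-compatible endpoint has already been unlocked; the IP address, port, credentials, and bucket name all come from the Snowball client / OpsHub and are placeholders here:

    import boto3

    snowball = boto3.client(
        "s3",
        endpoint_url="https://192.168.1.100:8443",  # device's local endpoint
        aws_access_key_id="LOCAL_ACCESS_KEY",
        aws_secret_access_key="LOCAL_SECRET_KEY",
        verify=False,  # the device presents a self-signed certificate
    )

    # Copy a local file onto the device's S3-compatible bucket
    snowball.upload_file("dataset.tar", "my-snowball-bucket", "dataset.tar")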

Edge computing

  • Devices: Snowcone, Snowball Edge

  • Processing data while it's being created on an edge location

  • Edge locations have limited or no internet access and limited or no computing power

  • Use cases: Preprocess data, machine learning at the edge, transcoding media streams

  • Can also ship back the device to AWS

  • Snowcone & Snowcone SSD (smaller)

    • 2 CPUs, 4 GB of memory, wired / wireless access
    • USB-C power using a cord or optional battery
  • Snowball Edge

    • Compute Optimized
      • 104 virtual CPUs, 416 GiB of RAM
      • Optional GPU for video processing / machine learning
      • 28 TB NVMe or 42 TB HDD of usable storage
      • Storage Clustering available for up to 16 nodes
    • Storage Optimized
      • Up to 40 virtual CPUs, 80 GiB of RAM, 80 TB storage
  • All devices can run EC2 instances & AWS Lambda functions using AWS IoT Greengrass

  • For long-term deployments, discounted pricing is available with 1- and 3-year commitments

AWS OpsHub

  • Software to manage your Snow Family Device

    • Unlocking, configuring single or clustered devices
    • Transferring files
    • Launching / managing instances running on Snow Family Devices
    • Monitor device metrics
    • Launch compatible AWS services
  • Alternatively can use the CLI

  • Snowball Edge pricing

    • Pay for device usage and data transfer out of AWS (putting data in is free)
    • On-Demand
      • One-time service fee per job
      • 10 and 15 days of usage for Storage Optimized 80 TB and 210 TB respectively
      • Shipping days don't contribute towards the days of usage
      • Pay per day for additional days of usage
    • Committed Upfront
      • Pay in advance for monthly, 1 year and 3 years of usage (Edge Computing)
      • Up to 62% discounted pricing

Hybrid cloud for storage

  • In a hybrid cloud, part of your infrastructure is on-premises and part is in the cloud

  • Reasons:

    • Long cloud migrations
    • Security requirements
    • Compliance requirements
    • IT strategy
  • S3 is a proprietary storage technology, so AWS Storage Gateway is used to expose S3 data on-premises

  • AWS Storage Gateway: Hybrid storage service to allow on-premises to seamlessly use the AWS Cloud

    • Bridge between on-premises data and cloud data in S3
    • Use cases: Disaster recovery, backup & restore, tiered storage
    • Types of Storage Gateway:
      • File Gateway
      • Volume Gateway
      • Tape Gateway