Amazon S3
-
Amazon S3 is one of the main building blocks of AWS, advertised as "infinitely scaling" storage
- Also used by many websites and other AWS services
-
Use cases:
- Backup and storage
- Disaster recovery
- Archive
- Hybrid Cloud storage
- Application hosting
- Media hosting
- Data lakes & big data analytics
- Software delivery
- Static website
-
Buckets: Directories which allow people to store objects (files)
- Must have globally unique name (across all regions and all accounts)
- Defined at the region level
- S3 looks like a global service but buckets are created in a region
- Naming convention:
- No uppercase, no underscore
- 3-63 characters long
- Must not be formatted as an IP address
- Must start with a lowercase letter or number
- Must NOT start with the prefix xn--
- Must NOT end with the suffix -s3alias
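As a minimal sketch (Python with boto3; the bucket name is a hypothetical placeholder), creating a bucket pins it to one region even though its name is globally unique:

import boto3

# Bucket names are globally unique, but the bucket itself lives in one region
s3 = boto3.client("s3", region_name="eu-west-1")
s3.create_bucket(
    Bucket="my-example-bucket-2024",  # hypothetical name; must follow the rules above
    # Required for any region other than us-east-1
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)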
-
Objects: Files that have a key
- The key is the full path of the file
- Ex. s3://my-bucket/my_folder/my_file.txt
- There's no real concept of "directories" within buckets, although the console UI can make it look that way
- Keys are just long names containing slashes (/): a prefix (the "directory" part) followed by an object name (the file)
- Object values are the content of the body
- Maximum object size is 5 TB
- If uploading more than 5 GB, must use multi-part upload
- Metadata (list of text key / value pairs - set by user or system)
- Tags (Unicode key / value pair - up to 10), useful for security / lifecycle
- Version ID (only if versioning is enabled)
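A minimal upload sketch (boto3; file, bucket, and key names are hypothetical). Note that upload_file transparently switches to multi-part upload for large files:

import boto3

s3 = boto3.client("s3")
# The key "my_folder/my_file.txt" is one long name; "my_folder/" is just a prefix
s3.upload_file(
    Filename="my_file.txt",           # local file to upload (hypothetical)
    Bucket="my-example-bucket-2024",  # hypothetical bucket
    Key="my_folder/my_file.txt",
)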
Security
-
User-Based
- IAM Policies: Which API calls should be allowed for a specific user from IAM
-
Resource-Based
- Bucket Policies: Bucket-wide rules set from the S3 console (allows cross-account access)
- Object Access Control List (ACL): Finer grain (can be disabled)
- Bucket Access Control List (ACL): Less common (can be disabled)
-
NOTE: An IAM principal can access an S3 object if
- The user IAM permissions ALLOW it OR resource policy ALLOWS it
- AND there's no explicit DENY
-
Encryption: Encrypt objects in S3 using encryption keys
S3 Bucket Policies
-
JSON based policies
- Resources: Buckets and objects
- Effect: Allow / Deny
- Actions: Set of APIs to Allow or Deny
- Principal: The account or user to apply the policy to
-
Use an S3 bucket policy to:
- Grant public access to the bucket
- Force objects to be encrypted at upload
- Grant access to another account (Cross-Account access)
{ "Version": "2012-10-17", "Statement": [ { "Sid": "PublicRead", "Effect": "Allow", "Principal": "*", "Action": [ "s3:GetObject" ], "Resource": [ "arn:aws:s3:::examplebucket/*" ] } ] }
-
Bucket settings for Block Public Access (can be set at the account level)
- Block public access to buckets and objects granted through new access control lists (ACLs)
- Block public access to buckets and objects granted through any access control lists (ACLs)
- Block public access to buckets and objects granted through new public bucket or access point policies
- Block public and cross-account access to buckets and objects through any public bucket or access point policies
-
These settings were created to prevent company data leaks; leave them on for buckets that should stay private
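As a hedged sketch (boto3; hypothetical bucket name), all four settings can be enabled in one call:

import boto3

s3 = boto3.client("s3")
# Turn on every Block Public Access setting for a bucket that should stay private
s3.put_public_access_block(
    Bucket="my-example-bucket-2024",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)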
Static Website Hosting
- Host static websites and make them accessible on the Internet
- Website URL format depends on the region, e.g. http://bucket-name.s3-website-aws-region.amazonaws.com (some regions use a dot instead of a dash before the region)
- For error 403 Forbidden, ensure the bucket policy allows public reads
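A minimal sketch of enabling website hosting (boto3; assumes index.html and error.html already exist in a hypothetical bucket):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_website(
    Bucket="my-example-bucket-2024",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # served at the root URL
        "ErrorDocument": {"Key": "error.html"},     # served on errors
    },
)

The bucket policy must still allow public s3:GetObject reads, as noted above.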
Versioning Files with S3
- Enable versioning at the bucket level
- Overwriting the same key creates a new "version": 1, 2, 3...
- It is a best practice to version your buckets
- Protect against accidental deletes (ability to restore a version)
- Easy roll back to previous version
- Notes:
- Any file that is not versioned prior to enabling versioning will have version "null"
- Suspending versioning doesn't delete the previous versions
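Enabling versioning is a single call, a minimal sketch with boto3 (hypothetical bucket name):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="my-example-bucket-2024",
    VersioningConfiguration={"Status": "Enabled"},  # "Suspended" keeps existing versions
)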
Replication (CRR & SRR)
-
Goal is to set up asynchronous replication between a source bucket and a target bucket
- Must enable versioning in source and destination buckets
- Buckets can be in different AWS accounts
- Must give proper IAM permissions to S3
-
Cross-Region Replication (CRR):
- Use cases: Compliance, lower latency access, replication across accounts
-
Same-Region Replication (SRR):
- Use cases: Log aggregation, live replication between production and test accounts
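A hedged sketch of a replication rule (boto3; the bucket names and IAM role ARN are hypothetical, and both buckets must already have versioning enabled):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        # Role that S3 assumes to read from the source and write to the destination
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",  # empty prefix = replicate all objects
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
            }
        ],
    },
)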
S3 Storage Classes
-
Amazon S3 Standard - General Purpose: For frequently accessed data (more than once a month)
- 99.99% availability
- Low latency and high throughput (access time of milliseconds)
- Sustain 2 concurrent facility failures
- Use cases: Big data analytics, mobile & gaming applications, content distribution
-
Infrequent Access (IA): For data less frequently accessed, but requires rapid access when needed
- Lower cost than S3 Standard
- Amazon S3 Standard-Infrequent Access (IA)
- 99.9% availability
- Use cases: Disaster recovery, backups
- Amazon S3 One Zone-Infrequent Access
- 99.999999999% durability in a single AZ but data lost when AZ is destroyed
- 99.5% availability
- Use cases: Storing secondary backup copies of on-premises data or data you can recreate
-
Glacier: Low-cost object storage meant for archiving / backup
- Pay for storage and object retrieval cost
- Amazon S3 Glacier Instant Retrieval: For data accessed once a quarter
- Fast retrieval, takes milliseconds
- Minimum storage duration of 90 days
- Amazon S3 Glacier Flexible Retrieval: For yearly accessed archive data with varying retrieval times
- Expedited (1 to 5 minutes)
- Standard (3 to 5 hours)
- Bulk (5 to 12 hours, free)
- Minimum storage duration of 90 days
- Amazon S3 Glacier Deep Archive: For long term storage accessed less than once a year
- Standard (12 hours)
- Bulk (48 hours)
- Minimum storage duration of 180 days
-
Amazon S3 Intelligent Tiering: Move objects automatically between Access Tiers based on usage
- Small monthly monitoring and auto-tiering fee
- No retrieval charges
- Useful for data with changing or unknown access patterns
- Frequent Access tier (automatic): default tier
- Infrequent Access tier (automatic): objects not accessed for 30 days
- Archive Instant Access tier (automatic): objects not accessed for 90 days
- Archive Access tier (optional): configurable from 90 days to 700+ days
- Deep Archive Access tier (optional): configurable from 180 days to 700+ days
-
Durability: Represents how unlikely it is for S3 to lose an object
- High durability (99.999999999%, "11 nines") of objects across multiple AZs
- Ex. Storing 10 million objects with S3, you can expect on average to lose a single object once every 10,000 years
- Same durability for all storage classes
-
Availability: Measures how readily available a service is
- Availability varies depending on storage class
- Ex. S3 Standard has 99.99% availability, which means about 53 minutes of unavailability per year
-
Can move between classes manually or using S3 Lifecycle configurations
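A minimal lifecycle sketch (boto3; bucket name and day thresholds are hypothetical) that transitions objects to colder classes as they age; the manual alternative is passing StorageClass when uploading an object:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket-2024",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-objects",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},  # Flexible Retrieval
                ],
            }
        ],
    },
)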
S3 Encryption
- Server-Side Encryption
- New buckets and uploaded objects are encrypted by default (SSE-S3)
- Server encrypts the file after receiving it
- Client-Side Encryption
- User encrypts the file before uploading it
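As a hedged sketch (boto3; hypothetical bucket and key), server-side encryption can also be requested explicitly per object:

import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-example-bucket-2024",
    Key="my_folder/secret.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",  # "AES256" = SSE-S3 (S3-managed keys)
)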
IAM Access Analyzer for S3
- A monitoring feature for S3 buckets, ensuring only intended people have access to them
- Evaluates S3 Bucket Policies, S3 ACLs, S3 Access Point Policies
- Shows you which buckets are publicly accessible, what buckets are shared with other AWS accounts, etc. for review
- Powered by IAM Access Analyzer, which allows you to find resources in your account shared with other entities
Shared Responsibility Model for S3
- AWS responsibilities:
- Infrastructure (global security, durability, availability, sustaining concurrent loss of data in two facilities)
- Configuration and vulnerability analysis
- Compliance validation
- User responsibilities:
- S3 Versioning
- S3 Bucket Policies
- S3 Replication setup
- Logging and monitoring
- S3 Storage Classes
- Data encryption at rest and in transit
AWS Snow Family
- Very secure and portable devices with two use cases
- Collect and process data at the edge
- Migrate data into and out of AWS
Data migration
-
Devices: Snowcone, Snowball Edge, Snowmobile
-
AWS Snow Family: Offline devices to perform data migrations
- Recommended if it takes more than a week to transfer data over a network
-
Challenges:
- Limited connectivity and bandwidth
- High network cost
- Shared bandwidth
- Connection stability
-
Snowball Edge
- Physically transports data to move TBs or PBs of data in and out of AWS
- Alternative to moving data over the network and paying network fees
- Pay per data transfer job
- Provide block storage and S3-compatible object storage
- Snowball Edge Storage Optimized
- 80 TB of HDD capacity for block volume and S3 compatible object storage
- Snowball Edge Compute Optimized
- 42 TB of HDD or 28 TB NVMe capacity for block volume and S3 compatible object storage
- Use cases: Large data cloud migrations, data center decommissioning, disaster recovery
-
Snowcone & Snowcone SSD devices
- Small, portable, secure and durable in harsh environments
- Used for edge computing, storage and data transfer
- Snowcone: 8 TB of HDD storage
- Snowcone SSD: 14 TB of SSD storage
- Use a Snowcone where Snowball doesn't fit
- Provide your own battery / cables
- Two ways to send data back to AWS:
- Send data back offline by shipping it
- Connect it to the internet and use AWS DataSync to send data
-
Snowmobile
- A truck which can transfer EBs (exabytes) of data
- Each Snowmobile has 100 PB of capacity (use multiple in parallel)
- Highly secure, temperature controlled, GPS and 24/7 video surveillance
- Better than Snowball if transferring more than 10 PB
-
Usage process:
- Request Snowball devices from the AWS console for delivery
- Install the snowball client / AWS OpsHub on your servers
- Connect the snowball to your servers and copy files using the client
- Ship back the device when completed to the correct AWS facility
- Data will be loaded into an S3 bucket
- Snowball is completely wiped
Edge computing
-
Devices: Snowcone, Snowball Edge
-
Processing data while it's being created on an edge location
-
Edge locations typically have limited or no internet access and limited computing power
-
Use cases: Preprocess data, machine learning at the edge, transcoding media streams
-
Can also ship back the device to AWS
-
Snowcone & Snowcone SSD (smaller)
- 2 CPUs, 4 GB of memory, wired / wireless access
- USB-C power using a cord or optional battery
-
Snowball Edge
- Compute Optimized
- 104 virtual CPUs, 416 GiB of RAM
- Optional GPU for video processing / machine learning
- 28 TB NVMe or 42 TB HDD of usable storage
- Storage Clustering available for up to 16 nodes
- Storage Optimized
- Up to 40 virtual CPUs, 80 GiB of RAM, 80 TB storage
-
All devices can run EC2 instances & AWS Lambda functions using AWS IoT Greengrass
-
For long-term deployment options, discounted pricing at 1 and 3 years
AWS OpsHub
-
Software to manage your Snow Family Device
- Unlocking, configuring single or clustered devices
- Transferring files
- Launching / managing instances running on Snow Family Devices
- Monitor device metrics
- Launch compatible AWS services
-
Alternatively can use the CLI
-
Snowball Edge pricing
- Pay for device usage and data transfer out of AWS (putting data in is free)
- On-Demand
- One-time service fee per job
- 10 and 15 days of usage for Storage Optimized 80 TB and 210 TB respectively
- Shipping days don't contribute towards the days of usage
- Pay per day for additional days of usage
- Committed Upfront
- Pay in advance for monthly, 1 year and 3 years of usage (Edge Computing)
- Up to 62% discounted pricing
Hybrid cloud for storage
-
AWS promotes a hybrid model: part of your infrastructure on-premises and part in the cloud
-
Reasons:
- Long cloud migrations
- Security requirements
- Compliance requirements
- IT strategy
-
S3 is a proprietary storage technology, so AWS Storage Gateway is used to expose S3 data on-premises
-
AWS Storage Gateway: Hybrid storage service that allows on-premises infrastructure to seamlessly use the AWS Cloud
- Bridge between on-premises data and cloud data in S3
- Use cases: Disaster recovery, backup & restore, tiered storage
- Types of Storage Gateway:
- File Gateway
- Volume Gateway
- Tape Gateway