Amazon S3
- Infinitely scaling storage.
- Used for: backup and storage, disaster recovery, archive, hybrid cloud storage, application hosting, media hosting, data lakes and big data analytics, software delivery, and static websites.
Buckets
- Amazon S3 allows people to store objects (files) in "buckets" (directories).
- They must have a globally unique name.
- Defined at the region level.
- S3 may look like a global service but buckets are created in a region.
- Naming convention:
- No uppercase.
- No underscore.
- 3-63 characters long.
- Not an IP.
- Must start with lowercase letter or number.
- Must NOT start with the prefix xn--.
- Must NOT end with the suffix -s3alias.
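- Example (a minimal sketch using Python and boto3, assuming AWS credentials are configured; the bucket name and region are hypothetical) showing that a bucket is created in a specific region and must follow the naming rules above:
    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")

    # Buckets live in a region even though S3 looks like a global service.
    # The name must be globally unique, lowercase, 3-63 characters, no underscores.
    s3.create_bucket(
        Bucket="my-example-bucket-2024",
        CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    )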
Objects
- Objects (files) have a key.
- The key is the full path.
- Key is composed of prefix + object name.
- There is no concept of directories within buckets.
- Object values are the content of the body.
- Max object size is 5TB.
- If uploading more than 5GB, must use a multi-part upload.
- Metadata (list of text key / value pairs - system or user metadata).
- Tags (Unicode key / value pair - up to 10) - useful for security / lifecycle.
- Version ID (if versioning is enabled).
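- Example (a minimal sketch using Python and boto3; the file path, bucket, and key are hypothetical) showing a key built from a prefix plus an object name, with multipart upload forced for large files:
    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Use multipart upload for anything above 100 MB (mandatory above 5 GB).
    config = TransferConfig(multipart_threshold=100 * 1024 * 1024)

    # Key = prefix ("backups/2024/") + object name ("db.dump"); there are no real directories.
    s3.upload_file(
        Filename="/tmp/db.dump",
        Bucket="my-example-bucket-2024",
        Key="backups/2024/db.dump",
        Config=config,
    )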
Security
- User Based: IAM Policies - which API calls should be allowed for a specific user from IAM.
- Resource Based:
- Bucket Policies - bucket wide rules from the S3 console - allows cross account.
- Object Access Control List (ACL) - finer grain (can be disabled).
- Bucket Access Control List (ACL) - less common (can be disabled).
- Note: an IAM principal can access an S3 object if (the user's IAM permissions allow it OR the resource policy allows it) AND there is no explicit deny.
- Encryption: encrypt objects in Amazon S3 using encryption keys.
S3 Bucket Policies
- JSON based policies:
- Resources: buckets and objects.
- Effect: Allow / Deny.
- Actions: Set of API to allow or deny.
- Principal: The account or user to apply the policy to.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "PublicRead", "Effect": "Allow", "Principal": "*", "Action": [ "s3:GetObject" ], "Resource": [ "arn:aws:s3:::examplebucket/*" ] } ] }
- Use S3 bucket policy to:
- Grant public access to the bucket.
- Force objects to be encrypted at upload.
- Grant access to another account (cross account).
- When generating a policy that should grant access to every object in the bucket, append /* to the end of the bucket ARN in the Resource field.
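- Example (a minimal sketch using Python and boto3; the bucket name is hypothetical, and Block Public Access must also allow public policies for this to take effect) attaching the public-read policy above to a bucket:
    import json
    import boto3

    s3 = boto3.client("s3")

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::examplebucket/*"],  # /* covers every object
        }],
    }

    # The bucket policy must be passed as a JSON string.
    s3.put_bucket_policy(Bucket="examplebucket", Policy=json.dumps(policy))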
Versioning
- You can version your files in Amazon S3.
- It is enabled at the bucket level.
- Same key overwrite will change the "version".
- It is best practice to version your buckets.
- Protect against unintended deletes (ability to restore a version).
- Easy roll back to previous version.
- Notes:
- Any file that is not versioned prior to enabling versioning will have version "null".
- Suspending versioning does not delete the previous versions.
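- Example (a minimal sketch using Python and boto3; the bucket name is hypothetical) enabling versioning at the bucket level:
    import boto3

    s3 = boto3.client("s3")

    # Status can later be set to "Suspended"; previous versions are kept.
    s3.put_bucket_versioning(
        Bucket="examplebucket",
        VersioningConfiguration={"Status": "Enabled"},
    )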
Replication CRR & SRR
- Must have Versioning enabled in source and destination buckets.
- Cross-Region Replication.
- Same-Region Replication.
- Buckets can be in different AWS accounts.
- Copying is asynchronous.
- Must give proper IAM permissions to S3.
- Use Cases:
- CRR - compliance, lower latency access, replication across accounts.
- SRR - log aggregation, live replication between production and test accounts.
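- Example (a minimal sketch using Python and boto3; the bucket names and IAM role ARN are hypothetical, and versioning must already be enabled on both buckets) configuring replication on the source bucket:
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            # IAM role S3 assumes to copy objects asynchronously.
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
            "Rules": [{
                "ID": "ReplicateEverything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = replicate all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }],
        },
    )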
Storage Classes
- Durability: how unlikely it is that Amazon S3 loses an object; all storage classes have the same, very high durability (99.999999999%, i.e. eleven 9's).
- Availability: measures how readily the service is available; it varies by storage class.
- Lifecycle Rules: can be used to define when S3 objects should be transitioned to another storage class or deleted after some time (see the sketch after this list).
- Types of Classes:
- Amazon S3 Standard - General Purpose.
- 99.99% Availability.
- Used for frequently accessed data.
- Low latency and high throughput.
- Sustain 2 concurrent facility failures.
- Use Cases: Big data analytics, mobile and gaming apps, content distribution, etc.
- Amazon S3 Standard-Infrequent Access (IA).
- For data that is less frequently accessed but requires rapid access when needed.
- Lower cost than S3 standard.
- 99.9% Availability.
- Use Cases: Disaster Recovery, backups.
- Amazon S3 One Zone-Infrequent Access.
- High durability (99.999999999%) but in a single AZ; data is lost if the AZ is destroyed.
- 99.5% Availability.
- Use Cases: Store secondary copies of on-premises data.
- Amazon S3 Glacier Instant Retrieval.
- Low-cost object storage meant for archiving / backup.
- Pricing: price for storage + object retrieval cost.
- Millisecond retrieval, great for data accessed once a quarter.
- Minimum storage duration 90 days.
- Amazon S3 Glacier Flexible Retrieval.
- Low-cost object storage meant for archiving / backup.
- Pricing: price for storage + object retrieval cost.
- Expedited (1 to 5 minutes).
- Standard (3 to 5 hours).
- Bulk (5 to 12 hours).
- Minimum storage duration is 90 days.
- Amazon S3 Glacier Deep Archive.
- Low-cost object storage meant for archiving / backup.
- Pricing: price for storage + object retrieval cost.
- Standard (12 hours).
- Bulk (48 hours).
- Minimum Storage duration 180 days.
- Amazon S3 Intelligent Tiering.
- Small monthly monitoring and auto-tiering fee.
- Moves objects automatically between access tiers based on usage.
- There are no retrieval charges in S3 Intelligent Tiering.
- Frequent Access tier (automatic): default tier.
- Infrequent Access tier (automatic): objects not accessed for 30 days.
- Archive Instant Access tier (automatic): objects not accessed for 90 days.
- Archive Access tier (optional): configurable from 90 days to 700+ days.
- Deep Archive Access tier (optional): configurable from 180 days to 700+ days.
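- Example (a minimal sketch using Python and boto3; the bucket name, prefix, and day counts are hypothetical) of a lifecycle rule that transitions objects to cheaper storage classes and then deletes them:
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="examplebucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},  # delete objects after one year
            }],
        },
    )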
Encryption
- Server Side Encryption (default).
- Anything uploaded into a bucket is encrypted by Amazon S3.
- Client side encryption:
- The user encrypts the file themselves before uploading it to the bucket.
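- Example (a minimal sketch using Python and boto3; the bucket and key are hypothetical) requesting server-side encryption on upload; "aws:kms" could be used instead of "AES256" to encrypt with a KMS key:
    import boto3

    s3 = boto3.client("s3")

    # S3 encrypts the object server-side after receiving it.
    s3.put_object(
        Bucket="examplebucket",
        Key="reports/confidential.txt",
        Body=b"hello",
        ServerSideEncryption="AES256",
    )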
IAM Access Analyzer for S3
- Ensures that only intended people have access to your S3 buckets.
- Analyzes all your bucket policies and access rules, reports which buckets are publicly accessible, and helps you verify that each bucket is accessed only by the people you intend.
Shared Responsibility Model for S3
- AWS responsible for:
- Infrastructure (global security, durability, availability, the ability to sustain the concurrent loss of data in two facilities).
- Configuration and vulnerability analysis.
- Compliance validation.
- You are responsible for:
- S3 versioning.
- S3 bucket policies.
- S3 replication setup.
- Logging and monitoring.
- S3 storage classes.
- Data encryption at rest and in transit.
Snow Family
- Highly secure, portable devices to collect and process data at the edge and migrate data into and out of AWS.
- Services for Data Migration: Snowcone, Snowball Edge, Snowmobile.
- Services for Edge Computing: Snowcone, Snowball Edge.
- Data migration with AWS Snow Family:
- Challenges:
- limited connectivity.
- limited bandwidth.
- high network cost.
- shared bandwidth.
- connection stability.
- If it would take more than a week to transfer the data over the network, use Snowball devices instead.
- It works by receiving a physical device from AWS, loading your data onto it locally, and shipping the device back to AWS.
- Snowball Edge:
- is used for moving TBs or PBs of data in or out of AWS.
- Alternative to moving data over the network.
- Pay per data transfer job.
- Provides block storage and Amazon S3-compatible object storage.
- Pricing: you pay for device usage and for data transferred out of AWS; data transferred into AWS is free.
- 2 types:
- Storage optimized.
- Compute optimized.
- Snowcone and Snowcone SSD:
- Small, portable computing device: rugged, secure, and able to withstand harsh environments.
- Used for edge computing, storage, and data transfer.
- 2 Types:
- Snowcone - 8TB of HDD storage.
- Snowcone SSD - 14TB of SSD storage.
- Must provide your own battery and cables.
- Can be shipped back to AWS offline, or connected to the internet to send data with AWS DataSync.
- Snowmobile:
- Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TB).
- Each snowmobile has 100 PB capacity.
- High Security.
- Better than Snowball if you are transferring more than 10 PB of data.
- Edge Computing:
- Is when you process data while it is being created at an edge location (a truck on the road, a ship at sea).
- Edge locations may have limited internet access, limited access to computing power.
- We set up a Snowball Edge or Snowcone device to do edge computing.
- Use cases: Pre-process data, machine learning at the edge, transcoding media streams.
- Ship device back to AWS when ready to transfer the data.
- AWS OpsHub:
- Software you install on your computer to manage your Snow Family devices without using the CLI.
Storage Gateway
- Bridge between on-premises data and cloud data in S3.
- Hybrid storage service that allows on-premises systems to seamlessly use the AWS Cloud.
- Use Cases: disaster recovery, backup & restore, tiered storage.
- Types of Storage Gateway:
- File Gateway.
- Volume Gateway.
- Tape Gateway.