Graduate Program KB

Amazon S3

  • Amazon S3 is one of the main building blocks of AWS, advertised as "infinitely scaling" storage

    • Used by many websites, as well as by other AWS services
  • Use cases:

    • Backup and storage
    • Disaster recovery
    • Archive
    • Hybrid Cloud storage
    • Application hosting
    • Media hosting
    • Data lakes & big data analytics
    • Software delivery
    • Static website
  • Buckets: Directories which allow people to store objects (files)

    • Must have globally unique name (across all regions and all accounts)
    • Defined at the region level
    • S3 looks like a global service but buckets are created in a region
    • Naming convention:
      • No uppercase, no underscore
      • 3-63 characters long
      • Not an IP
      • Must start with a lowercase letter or number
      • Must NOT start with the prefix xn--
      • Must NOT end with the suffix -s3alias
  • Objects: Files that have a key

    • The key is the full path of the file
      • Ex. s3://my-bucket/my_folder/my_file.txt
    • There's no real concept of "directories" within buckets, even though the console UI displays folders
    • Keys are just very long names that contain slashes (/): a prefix (the "directory" part) plus an object name (the file)
    • Object values are the content of the body
      • Maximum object size is 5 TB
      • Uploads larger than 5 GB must use multi-part upload (see the sketch after this list)
    • Metadata (list of text key / value pairs - set by user or system)
    • Tags (Unicode key / value pair - up to 10), useful for security / lifecycle
    • Version ID (only if versioning is enabled)
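  • The multi-part upload requirement above can be sketched with boto3; the bucket, key, and thresholds below are illustrative placeholders, and boto3 switches to multi-part automatically once the configured threshold is crossed:

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Switch to multi-part above 5 GB (the single-PUT limit),
    # uploading in 100 MB parts; both values are illustrative
    config = TransferConfig(
        multipart_threshold=5 * 1024**3,
        multipart_chunksize=100 * 1024**2,
    )

    # Bucket and key are placeholders
    s3.upload_file("backup.tar", "my-bucket", "my_folder/backup.tar", Config=config)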

Security

  • User-Based

    • IAM Policies: Which API calls should be allowed for a specific user from IAM
  • Resource-Based

    • Bucket Policies: Bucket wide rules from the S3 console (allows cross account)
    • Object Access Control List (ACL): Finer grain (can be disabled)
    • Bucket Access Control List (ACL): Less common (can be disabled)
  • NOTE: An IAM principal can access an S3 object if

    • The user's IAM permissions ALLOW it OR the resource policy ALLOWS it
    • AND there's no explicit DENY (see the sketch after this list)
  • Encryption: Encrypt objects in S3 using encryption keys
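  • The access rule above condenses to a simple boolean check; a toy sketch of the logic (not AWS's actual evaluation engine):

    def can_access(iam_allows: bool, resource_allows: bool, explicit_deny: bool) -> bool:
        # Access requires at least one ALLOW (IAM or resource-based)
        # and no explicit DENY anywhere
        return (iam_allows or resource_allows) and not explicit_deny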

S3 Bucket Policies

  • JSON based policies

    • Resources: Buckets and objects
    • Effect: Allow / Deny
    • Actions: Set of API to Allow or Deny
    • Principal: The account or user to apply the policy to
  • Use S3 bucket policies to:

    • Grant public access to the bucket
    • Force objects to be encrypted at upload
    • Grant access to another account (Cross-Account access)
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::examplebucket/*"
                ]
            }
        ]
    }
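  • To attach the policy above with boto3, a minimal sketch ("examplebucket" comes from the policy; the rest is boilerplate):

    import json
    import boto3

    s3 = boto3.client("s3")

    public_read = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::examplebucket/*"],
        }],
    }

    # Attach the bucket policy shown above
    s3.put_bucket_policy(Bucket="examplebucket", Policy=json.dumps(public_read))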
    
  • Bucket settings for Block Public Access (can be set at the account level)

    • Block public access to buckets and objects granted through new access control lists (ACLs)
    • Block public access to buckets and objects granted through any access control lists (ACLs)
    • Block public access to buckets and objects granted through new public bucket or access point policies
    • Block public and cross-account access to buckets and objects through any public bucket or access point policies
  • These settings were created to prevent company data leaks; leave them on for buckets that should stay private (a sketch of enabling them follows)
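  • Enabling all four settings with boto3 (bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    s3.put_public_access_block(
        Bucket="examplebucket",
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )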

Static Website Hosting

  • S3 can host static websites and make them accessible on the internet
  • The website URL depends on the AWS region the bucket is created in
  • If you get a 403 Forbidden error, make sure the bucket policy allows public reads
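  • A sketch of enabling website hosting with boto3 (bucket and document names are assumptions):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_website(
        Bucket="examplebucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )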

Versioning Files with S3

  • Enable versioning at the bucket level
  • Overwriting an object with the same key creates a new version: 1, 2, 3...
  • It is best practice to version your buckets
    • Protect against accidental deletes (ability to restore a version)
    • Easy roll back to previous version
  • Notes:
    • Any file that is not versioned prior to enabling versioning will have version "null"
    • Suspending versioning doesn't delete the previous versions
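  • Enabling versioning with boto3 (bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    # Objects uploaded before this call keep the implicit "null" version ID
    s3.put_bucket_versioning(
        Bucket="examplebucket",
        VersioningConfiguration={"Status": "Enabled"},
    )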

Replication (CRR & SRR)

  • Goal is to set up asynchronous replication between a source bucket and a target bucket

    • Must enable versioning in source and destination buckets
    • Buckets can be in different AWS accounts
    • Must give proper IAM permissions to S3
  • Cross-Region Replication (CRR):

    • Use cases: Compliance, lower latency access, replication across accounts
  • Same-Region Replication (SRR):

    • Use cases: Log aggregation, live replication between production and test accounts
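  • A minimal CRR configuration sketch with boto3 (bucket names, rule ID, and the IAM role ARN are placeholder assumptions; versioning must already be enabled on both buckets):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
            "Rules": [
                {
                    "ID": "ReplicateEverything",
                    "Priority": 1,
                    "Status": "Enabled",
                    "Filter": {},  # empty filter replicates the whole bucket
                    "DeleteMarkerReplication": {"Status": "Disabled"},
                    "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
                }
            ],
        },
    )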

S3 Storage Classes

  • Amazon S3 Standard - General Purpose: For frequently accessed data (more than once a month)

    • 99.99% availability
    • Low latency and high throughput (access time of milliseconds)
    • Sustain 2 concurrent facility failures
    • Use cases: Big data analytics, mobile & gaming applications, content distribution
  • Infrequent Access (IA): For data that is accessed less frequently but requires rapid access when needed

    • Lower cost than S3 Standard
    • Amazon S3 Standard-Infrequent Access (IA)
      • 99.9% availability
      • Use cases: Disaster recovery, backups
    • Amazon S3 One Zone-Infrequent Access
      • 99.999999999% durability in a single AZ but data lost when AZ is destroyed
      • 99.5% availability
      • Use cases: Storing secondary backup copies of on-premises data or data you can recreate
  • Glacier: Low-cost object storage meant for archiving / backup

    • Pay for storage and object retrieval cost
    • Amazon S3 Glacier Instant Retrieval: For data accessed once a quarter
      • Fast retrieval, takes milliseconds
      • Minimum storage duration of 90 days
    • Amazon S3 Glacier Flexible Retrieval: For yearly accessed archive data with varying retrieval times
      • Expedited (1 to 5 minutes)
      • Standard (3 to 5 hours)
      • Bulk (5 to 12 hours, free)
      • Minimum storage duration of 90 days
    • Amazon S3 Glacier Deep Archive: For long term storage accessed less than once a year
      • Standard (12 hours)
      • Bulk (48 hours)
      • Minimum storage duration of 180 days
  • Amazon S3 Intelligent Tiering: Move objects automatically between Access Tiers based on usage

    • Small monthly monitoring and auto-tiering fee
    • No retrieval charges
    • Useful for data with changing or unknown access patterns
    • Frequent Access tier (automatic): default tier
    • Infrequent Access tier (automatic): objects not accessed for 30 days
    • Archive Instant Access tier (automatic): objects not accessed for 90 days
    • Archive Access tier (optional): configurable from 90 days to 700+ days
    • Deep Archive Access tier (optional): configurable from 180 days to 700+ days
  • Durability: Represents how rarely an object stored in S3 is expected to be lost

    • Very high durability (99.999999999%, i.e. 11 nines) of objects across multiple AZs
    • Ex. If you store 10 million objects in S3, you can on average expect to lose a single object once every 10,000 years
    • Same durability for all storage classes
  • Availability: Measures how readily available a service is

    • Availability varies depending on storage class
    • Ex. S3 Standard has 99.99% availability, which corresponds to about 53 minutes of downtime per year
  • Can move between classes manually or using S3 Lifecycle configurations
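  • A lifecycle configuration sketch with boto3 (bucket name, rule ID, and the transition days are placeholder assumptions):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="examplebucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-then-expire",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # apply to every object
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},  # Flexible Retrieval
                    ],
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )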

S3 Encryption

  • Server-Side Encryption
    • Created buckets and uploaded objects are encrypted by default
    • Server encrypts the file after receiving it
  • Client-Side Encryption
    • User encrypts the file before uploading it
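  • A server-side encryption sketch with boto3 (bucket, key, and body are placeholders; SSE-S3 with AES-256 is the default, this just makes it explicit):

    import boto3

    s3 = boto3.client("s3")

    # S3 encrypts the object server-side after receiving it
    s3.put_object(
        Bucket="examplebucket",
        Key="report.pdf",
        Body=b"file contents here",
        ServerSideEncryption="AES256",
    )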

IAM Access Analyzer for S3

  • A monitoring feature for S3 buckets, ensuring only intended people have access to them
  • Evaluates S3 Bucket Policies, S3 ACLs, S3 Access Point Policies
  • Shows you which buckets are publicly accessible, what buckets are shared with other AWS accounts, etc. for review
  • Powered by IAM Access Analyzer, which allows you to find resources in your account shared with other entities

Shared Responsibility Model for S3

  • AWS responsibilities:
    • Infrastructure
      • Global security
      • Durability
      • Availability
      • Sustain concurrent loss of data in two facilities
    • Configuration and vulnerability analysis
    • Compliance validation
  • User responsibilities:
    • S3 Versioning
    • S3 Bucket Policies
    • S3 Replication setup
    • Logging and monitoring
    • S3 Storage Classes
    • Data encryption at rest and in transit

AWS Snow Family

  • Very secure and portable devices with two use cases
    • Collect and process data at the edge
    • Migrate data into and out of AWS

Data migration

  • Devices: Snowcone, Snowball Edge, Snowmobile

  • AWS Snow Family: Offline devices to perform data migrations

    • Recommended if it takes more than a week to transfer data over a network
  • Challenges:

    • Limited connectivity and bandwidth
    • High network cost
    • Shared bandwidth
    • Connection stability
  • Snowball Edge

    • Physically transport TBs or PBs of data in and out of AWS
    • Alternative to moving data over the network and paying network fees
    • Pay per data transfer job
    • Provide block storage and S3-compatible object storage
    • Snowball Edge Storage Optimized
      • 80 TB of HDD capacity for block volume and S3 compatible object storage
    • Snowball Edge Compute Optimized
      • 42 TB of HDD or 28 TB NVMe capacity for block volume and S3 compatible object storage
    • Use cases: Large data cloud migrations, DC decommission, disaster recovery
  • Snowcone & Snowcone SSD devices

    • Small, portable, secure and durable in harsh environments
    • Used for edge computing, storage and data transfer
    • Snowcone: 8 TB of HDD storage
    • Snowcone SSD: 14 TB of SSD storage
    • Use a Snowcone where Snowball doesn't fit
    • Provide your own battery / cables
    • Two ways to send data back to AWS:
      • Send data back offline by shipping it
      • Connect it to the internet and use AWS DataSync to send data
  • Snowmobile

    • A truck which can transfer EBs (exabytes) of data
    • Each Snowmobile has 100 PB of capacity; use multiple trucks in parallel to reach exabyte scale
    • Highly secure, temperature controlled, GPS and 24/7 video surveillance
    • Better than Snowball if transferring more than 10 PB
  • Usage process:

    • Request Snowball devices from the AWS console for delivery
    • Install the snowball client / AWS OpsHub on your servers
    • Connect the snowball to your servers and copy files using the client
    • Ship back the device when completed to the correct AWS facility
    • Data will be loaded into an S3 bucket
    • Snowball is completely wiped
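  • A hypothetical sketch of the copy step, assuming the device's S3-compatible endpoint has already been unlocked; the IP address, port, credentials, and bucket name all come from the Snowball client / OpsHub and are placeholders here:

    import boto3

    snowball = boto3.client(
        "s3",
        endpoint_url="https://192.168.1.100:8443",  # device's local endpoint
        aws_access_key_id="LOCAL_ACCESS_KEY",
        aws_secret_access_key="LOCAL_SECRET_KEY",
        verify=False,  # the device presents a self-signed certificate
    )

    # Copy a local file onto the device's S3-compatible bucket
    snowball.upload_file("dataset.tar", "my-snowball-bucket", "dataset.tar")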

Edge computing

  • Devices: Snowcone, Snowball Edge

  • Processing data while it's being created on an edge location

  • Edge locations have limited or no internet access and limited or no computing power

  • Use cases: Preprocess data, machine learning at the edge, transcoding media streams

  • Can also ship back the device to AWS

  • Snowcone & Snowcone SSD (smaller)

    • 2 CPUs, 4 GB of memory, wired / wireless access
    • USB-C power using a cord or optional battery
  • Snowball Edge

    • Compute Optimized
      • 104 virtual CPUs, 416 GiB of RAM
      • Optional GPU for video processing / machine learning
      • 28 TB NVMe or 42 TB HDD of usable storage
      • Storage Clustering available for up to 16 nodes
    • Storage Optimized
      • Up to 40 virtual CPUs, 80 GiB of RAM, 80 TB storage
  • All devices can run EC2 instances & AWS Lambda functions using AWS IoT Greengrass

  • For long-term deployments, discounted pricing is available with 1- and 3-year commitments

AWS OpsHub

  • Software to manage your Snow Family Device

    • Unlocking, configuring single or clustered devices
    • Transferring files
    • Launching / managing instances running on Snow Family Devices
    • Monitor device metrics
    • Launch compatible AWS services
  • Alternatively can use the CLI

  • Snowball Edge pricing

    • Pay for device usage and data transfer out of AWS (putting data in is free)
    • On-Demand
      • One-time service fee per job
      • 10 and 15 days of usage for Storage Optimized 80 TB and 210 TB respectively
      • Shipping days don't contribute towards the days of usage
      • Pay per day for additional days of usage
    • Committed Upfront
      • Pay in advance for monthly, 1 year and 3 years of usage (Edge Computing)
      • Up to 62% discounted pricing

Hybrid cloud for storage

  • In a hybrid cloud, part of your infrastructure is on-premises and part is in the cloud

  • Reasons:

    • Long cloud migrations
    • Security requirements
    • Compliance requirements
    • IT strategy
  • S3 is a proprietary storage technology, so AWS Storage Gateway is used to expose S3 data on-premises

  • AWS Storage Gateway: Hybrid storage service to allow on-premises to seamlessly use the AWS Cloud

    • Bridge between on-premises data and cloud data in S3
    • Use cases: Disaster recovery, backup & restore, tiered storage
    • Types of Storage Gateway:
      • File Gateway
      • Volume Gateway
      • Tape Gateway