Graduate Program KB

Section 16 - Databases

Types of Databases

  • RDBMS (SQL/OLTP): RDS, Aurora
  • NoSQL: DynamoDB (JSON), ElastiCache, Neptune (graphs), DocumentDB (Mongo), Keyspaces (Apache Cassandra)
  • Object Store: S3, Glacier (backups, archives)
  • Data Warehouse (Analytics): Redshift (OLAP), Athena, EMR
  • Search: OpenSearch (JSON)
  • Graphs: Neptune
  • Ledger: Amazon Quantum Ledger Database (QLDB)
  • Time series: Amazon Timestream

RDS

  • Managed PostgreSQL / MySQL / Oracle / SQL Server / Db2 / MariaDB / Custom
  • You provision the RDS instance size and the EBS volume type and size
  • Storage auto-scaling
  • Supports read replicas and Multi-AZ deployments
  • Security via IAM, security groups, KMS, SSL in transit
  • Managed and scheduled maintenance
  • Automated backups with point-in-time recovery (PITR)
  • Manual snapshot for longer-term recovery
  • RDS Custom provides access to customize underlying instance (Oracle and SQL Server)
  • Useful for storing relational datasets, performing SQL queries and transactions
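
As a rough illustration of the provisioning choices listed above, the sketch below builds the keyword arguments one might pass to boto3's `rds.create_db_instance`. The parameter names are real API fields, but the helper name `rds_instance_params`, the instance class, and the sizes are assumptions chosen for the example, and required credentials (such as `MasterUsername`) are deliberately left out.

```python
def rds_instance_params(identifier, engine="postgres",
                        storage_gib=100, max_storage_gib=500):
    """Build kwargs in the shape accepted by boto3's rds.create_db_instance.

    Hypothetical helper: values are illustrative, not recommendations.
    """
    if max_storage_gib <= storage_gib:
        raise ValueError("max_storage_gib must exceed storage_gib")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": "db.t3.medium",        # provisioned instance size
        "StorageType": "gp3",                     # EBS volume type
        "AllocatedStorage": storage_gib,          # initial EBS volume size (GiB)
        "MaxAllocatedStorage": max_storage_gib,   # ceiling for storage auto-scaling
        "MultiAZ": True,                          # standby in another AZ
        "StorageEncrypted": True,                 # encryption at rest via KMS
        "BackupRetentionPeriod": 7,               # days of automated backups (PITR)
        "EnableIAMDatabaseAuthentication": True,  # IAM-based DB authentication
    }
```

With credentials added, the dictionary could be unpacked into the call, e.g. `boto3.client("rds").create_db_instance(**params, ...)`.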

Aurora

  • PostgreSQL- and MySQL-compatible API, with storage separated from compute
  • Data is stored in 6 replicas across 3 AZs
  • A cluster of DB instances spans multiple AZs, with auto-scaling of read replicas
    • Clusters expose endpoints for the writer and reader DB instances, plus optional custom endpoints
  • Security via IAM, security groups, KMS, SSL in transit
  • Managed and scheduled maintenance
  • Aurora Serverless: For unpredictable workloads
  • Aurora Global: Up to 16 read DB instances in each Region, with storage replication lag under 1 second
  • Aurora Machine Learning: Perform ML using SageMaker and Comprehend on Aurora
  • Aurora Database Cloning: New cluster from existing one, faster than restoring a snapshot
  • Same use cases as RDS but less maintenance, more flexibility, performance and features
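
To see why 6 replicas across 3 AZs matter, here is a minimal sketch of the storage quorum. The quorum sizes (4 of 6 copies for writes, 3 of 6 for reads, 2 copies per AZ) come from AWS's published description of Aurora rather than from these notes, and the AZ names and function names are made up for the example.

```python
COPIES_PER_AZ = 2                 # 6 copies spread evenly over 3 AZs
AZS = ("az-a", "az-b", "az-c")    # placeholder AZ names
WRITE_QUORUM = 4                  # copies that must acknowledge a write
READ_QUORUM = 3                   # copies needed to serve a read


def available_copies(failed_azs=(), extra_failed_copies=0):
    """Count healthy storage copies after AZ and single-copy failures."""
    healthy_azs = [az for az in AZS if az not in failed_azs]
    return max(0, len(healthy_azs) * COPIES_PER_AZ - extra_failed_copies)


def can_write(failed_azs=(), extra_failed_copies=0):
    return available_copies(failed_azs, extra_failed_copies) >= WRITE_QUORUM


def can_read(failed_azs=(), extra_failed_copies=0):
    return available_copies(failed_azs, extra_failed_copies) >= READ_QUORUM
```

Under these assumptions, losing a whole AZ still leaves 4 copies, so writes continue; losing an AZ plus one more copy leaves 3, so reads continue but writes do not.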

ElastiCache

  • Managed Redis / Memcached: an in-memory data store with sub-millisecond latency
  • Supports Redis clustering (sharding) and Multi-AZ with read replicas
  • Security via IAM, security groups, KMS and Redis Auth
  • Backup, snapshot and PITR features
  • Managed and scheduled maintenance
  • Need to modify app code to leverage ElastiCache
  • Useful for read-heavy workloads with fewer writes and for caching DB query results. A common use case is storing website session data
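
The app-code change mentioned above usually takes the form of a cache-aside pattern. The sketch below uses an in-process dictionary with a TTL as a stand-in for a Redis client, so the `TTLCache` class, the `get_user` function, and the `query_db` callback are all hypothetical names for illustration only.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Redis (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]      # entry expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def get_user(user_id, cache, query_db):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                 # cache hit: no database round trip
    row = query_db(user_id)           # cache miss: run the real query
    cache.set(key, row)               # populate the cache for later reads
    return row
```

Repeated calls for the same `user_id` within the TTL reach the database only once; after the TTL expires, the next call queries the database again and refreshes the entry.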

DynamoDB

  • Managed NoSQL database with single-digit millisecond latency
  • Provisioned capacity with optional auto-scaling or on-demand capacity
  • Could replace ElastiCache as a key/value store
  • Highly available, Multi-AZ by default, reads and writes decoupled, with transaction capability
  • DAX cluster for caching reads, microsecond read latency
  • IAM for security, authentication and authorization
  • Can process change events through DynamoDB Streams or Kinesis Data Streams
  • Has a global table feature with active-active setup
  • Automated backups with PITR or on-demand backups
  • Can export to S3 without consuming RCU (within the PITR window) and import from S3 without consuming WCU
  • Useful for serverless app development with small documents (hundreds of KB at most; the item size limit is 400 KB) or as a distributed serverless cache
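
For provisioned capacity, the RCU and WCU figures can be estimated with simple arithmetic. The sketch below assumes the documented unit definitions (one RCU is one strongly consistent read per second of an item up to 4 KB, one WCU is one write per second of an item up to 1 KB; eventually consistent reads cost half, transactional operations cost double). The function names are hypothetical.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, consistency="strong"):
    """Estimate the RCU needed for a steady read rate."""
    units_per_read = math.ceil(item_size_kb / 4)   # reads are billed in 4 KB steps
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return math.ceil(units_per_read * reads_per_second * factor)


def write_capacity_units(item_size_kb, writes_per_second, transactional=False):
    """Estimate the WCU needed for a steady write rate."""
    units_per_write = math.ceil(item_size_kb)      # writes are billed in 1 KB steps
    return units_per_write * writes_per_second * (2 if transactional else 1)
```

For example, 10 strongly consistent reads per second of 6 KB items need 20 RCU, and 10 writes per second of 2.5 KB items need 30 WCU.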

S3

  • A key/value store for objects (better for bigger objects)
  • Serverless, infinite scaling, max. object size is 5 TB, versioning capability
  • Tiers: S3 Standard, S3 Infrequent Access, S3 Intelligent-Tiering, S3 Glacier (lifecycle policies to move objects between tiers)
  • Features: Versioning, encryption, replication, MFA-Delete, Access logs, etc.
  • Security: IAM, bucket policies, ACL, access points, Object Lambda, CORS, Object Lock / Vault Lock
  • Encryption: SSE-S3, SSE-KMS, SSE-C, client-side, TLS in transit, default encryption
  • Batch operations: Act on many objects with S3 Batch Operations, list objects with S3 Inventory
  • Performance: Multipart upload, S3 Transfer Acceleration, S3 Select
  • Automation with S3 Event Notifications, SNS, SQS, Lambda, EventBridge
  • Useful for static files, key/value store for big files and website hosting
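
Multipart upload splits a large object into parts, and the part size has to respect S3's documented limits (parts of 5 MiB to 5 GiB, at most 10,000 parts, objects up to 5 TiB). The sketch below picks a part size and part count under those limits; `plan_multipart` and the 8 MiB default are assumptions for illustration, not an AWS API.

```python
MIB = 1024 ** 2
MIN_PART = 5 * MIB            # minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024 * MIB     # maximum part size (5 GiB)
MAX_PARTS = 10_000            # maximum number of parts per upload
MAX_OBJECT = 5 * 1024 ** 4    # maximum object size (5 TiB)


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_multipart(object_size, preferred_part_size=8 * MIB):
    """Return (part_size, part_count) in bytes for a multipart upload."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Grow the part size if the preferred one would need more than 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART,
                    ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("part size exceeds the 5 GiB limit")
    return part_size, ceil_div(object_size, part_size)
```

A 100 MiB object with the default preference yields 13 parts of 8 MiB (the last one partial), while a 5 TiB object forces the part size up so the count stays at 10,000.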

DocumentDB

  • DocumentDB is a fully managed, MongoDB-compatible NoSQL database from AWS, with high availability and replication across 3 AZs
  • MongoDB is used to store, query and index JSON data
  • Similar deployment concepts as Aurora
  • Storage automatically grows in increments of 10 GB, and the service scales to workloads with millions of requests per second

Neptune

  • A fully managed graph database that's highly available across 3 AZs with up to 15 read replicas
  • Can build and run apps working with highly connected datasets, optimized for complex and hard queries
  • Stores up to billions of relations and queries the graph with millisecond latency
  • Useful for knowledge graphs such as Wikipedia, fraud detection, recommendation engines and social networking
  • Streams
    • Get real-time ordered sequence of every change to your graph data
    • Changes are available immediately after writing
    • No duplicates, there is a strict order
    • Streams data is accessible through an HTTP REST API
    • Useful for sending notifications for certain changes, maintaining synchronized graph data with another data store, replication across regions
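
To make the "highly connected datasets" use case concrete, here is a small friends-of-friends recommendation over an in-memory adjacency map. Neptune itself is queried with graph languages such as Gremlin, openCypher or SPARQL; this plain-Python version is only a sketch of the kind of traversal involved, and the graph data and function name are invented for the example.

```python
from collections import Counter


def recommend_friends(graph, person, limit=3):
    """Suggest people two hops away, ranked by number of mutual friends."""
    direct = graph.get(person, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


social_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
```

Here `recommend_friends(social_graph, "alice")` ranks `dave` (two mutual friends) ahead of `erin` (one).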

Keyspaces

  • A managed, Apache Cassandra-compatible database service (Cassandra is an open-source NoSQL distributed database)
  • Scalable and highly available, with tables replicated 3 times across multiple AZs
  • Automatically scales tables up/down based on app traffic
  • Less than 10 ms latency at any scale; can handle thousands of requests per second
  • Capacity: On-demand mode or provisioned mode with auto-scaling
  • Encryption, backup, PITR
  • Useful for storing IoT device information, time series data, etc.

QLDB

  • A ledger is a book for recording financial transactions. QLDB is a fully managed ledger database used to review the history of all changes made to your app data
  • Highly available with replication across 3 AZs
  • The journal is immutable: no entry can be removed or modified, and this is cryptographically verifiable
  • Has 2-3x better performance than common ledger blockchain frameworks
  • Unlike Amazon Managed Blockchain, QLDB has no decentralization component, in accordance with financial regulation rules
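
The idea of an immutable, cryptographically verifiable journal can be illustrated with a simple hash chain, where each entry's hash covers the previous entry's hash. This is a deliberately simplified sketch, not QLDB's actual journal format or API; the `Journal` class and its fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class Journal:
    """Append-only log where each entry is chained to the one before it."""

    def __init__(self):
        self._entries = []

    def append(self, data):
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        payload = json.dumps(data, sort_keys=True)   # stable serialization
        digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self._entries.append(
            {"data": payload, "prev_hash": prev_hash, "hash": digest}
        )
        return digest

    def verify(self):
        """Recompute every hash; any edited entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["data"]).encode()
            ).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Because each hash depends on all earlier entries, changing any past record makes `verify()` return `False`, which is the property that lets a reader audit the full change history.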

Timestream

  • A fully managed time series database
  • Scales automatically to adjust capacity; marketed as up to 1,000 times faster and as little as 1/10 the cost of relational databases
  • Features scheduled queries, multi-measure records and SQL compatibility
  • Tiering: recent data is kept in memory and historical data in cost-optimized storage
  • Built-in time series analytics functions
  • Encryption in transit and at rest
  • Useful for IoT apps, operational apps, real-time analytics, etc.
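
The storage tiering described above can be pictured as routing each record by its age. The sketch below assumes two retention periods, one for the in-memory tier and one for the cost-optimized (magnetic) tier; the specific durations and the `storage_tier` function are illustrative assumptions, since real retention values are configured per table.

```python
from datetime import datetime, timedelta, timezone


def storage_tier(record_time, now,
                 memory_retention=timedelta(hours=24),
                 magnetic_retention=timedelta(days=365)):
    """Return which tier would hold a record of the given age."""
    age = now - record_time
    if age < timedelta(0):
        raise ValueError("record_time is in the future")
    if age <= memory_retention:
        return "memory"       # recent data: fast, in-memory tier
    if age <= magnetic_retention:
        return "magnetic"     # historical data: cost-optimized tier
    return "expired"          # past both retention periods


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
```

With these example settings, a reading from one hour before `now` lands in the memory tier, one from a week earlier in the magnetic tier, and one from two years earlier is past retention.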