Section 16 - Databases
Types of Databases
- RDBMS (SQL/OLTP): RDS, Aurora
- NoSQL: DynamoDB (JSON), ElastiCache, Neptune (graphs), DocumentDB (Mongo), Keyspaces (Apache Cassandra)
- Object Store: S3, Glacier (backups, archives)
- Data Warehouse (Analytics): Redshift (OLAP), Athena, EMR
- Search: OpenSearch (JSON)
- Graphs: Neptune
- Ledger: Amazon Quantum Ledger Database
- Time series: Amazon Timestream
RDS
- Managed PostgreSQL / MySQL / Oracle / SQL Server / Db2 / MariaDB / Custom
- You provision the RDS instance size and the EBS volume type and size
- Storage auto-scaling
- Supports read replicas and Multi-AZ
- Security via IAM, security groups, KMS, SSL in transit
- Managed and scheduled maintenance
- Automated backups with point-in-time recovery (PITR)
- Manual snapshots for longer-term recovery
- RDS Custom gives access to customize the underlying instance (Oracle and SQL Server)
- Useful for storing relational datasets, performing SQL queries and transactions
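A minimal Python sketch of the PITR idea: a restore target must fall between the start of the automated backup retention period and the latest restorable time. It assumes a retention setting of 1-35 days and a roughly 5-minute lag between "now" and the latest restorable time (ASSUMED_LOG_LAG); the function names are hypothetical and this is not the RDS API. Anything older than the window needs a manual snapshot.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the latest restorable time trails "now" by a few minutes.
ASSUMED_LOG_LAG = timedelta(minutes=5)


def restorable_window(now, retention_days, log_lag=ASSUMED_LOG_LAG):
    """Return (earliest, latest) restorable times for automated backups.

    retention_days is the automated backup retention period (1-35 days;
    0 disables automated backups, so there is no PITR window at all).
    """
    if not 1 <= retention_days <= 35:
        raise ValueError("PITR needs a retention period of 1-35 days")
    return now - timedelta(days=retention_days), now - log_lag


def can_restore_to(target, now, retention_days):
    """True if target lies inside the (simplified) PITR window."""
    earliest, latest = restorable_window(now, retention_days)
    return earliest <= target <= latest


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(days=3), now, retention_days=7))   # True
print(can_restore_to(now - timedelta(days=10), now, retention_days=7))  # False: outside retention
```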
Aurora
- PostgreSQL- and MySQL-compatible API, with storage and compute separated
- Data is stored as 6 replicas across 3 AZs
- A cluster of DB instances spans multiple AZs, with auto-scaling of read replicas
- Each cluster has a writer endpoint and a reader endpoint, and custom endpoints can be defined for chosen DB instances
- Security via IAM, security groups, KMS, SSL in transit
- Managed and scheduled maintenance
- Aurora Serverless: For unpredictable workloads
- Aurora Global: Up to 16 read replicas in each secondary region, storage replication in under 1 second
- Aurora Machine Learning: Run ML predictions on Aurora data using SageMaker and Comprehend
- Aurora Database Cloning: New cluster from existing one, faster than restoring a snapshot
- Same use cases as RDS, with less maintenance and more flexibility, performance and features
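Why 6 copies across 3 AZs matters can be sketched with Aurora's storage quorum: a write needs 4 of the 6 copies and a read needs 3 of 6. The Python model below counts surviving copies under AZ or disk failures; the AZ names and the availability() helper are hypothetical, and it ignores how Aurora actually segments and repairs storage.

```python
AZS = ("az-a", "az-b", "az-c")  # hypothetical AZ names
COPIES_PER_AZ = 2               # 6 copies across 3 AZs
WRITE_QUORUM = 4                # a write needs 4 of 6 copies
READ_QUORUM = 3                 # a read needs 3 of 6 copies


def availability(failed_azs=(), extra_failed_copies=0):
    """Count surviving copies and check the read/write quorums (simplified)."""
    failed = set(failed_azs)
    unknown = failed - set(AZS)
    if unknown:
        raise ValueError(f"unknown AZs: {sorted(unknown)}")
    total = COPIES_PER_AZ * len(AZS)
    healthy = max(total - COPIES_PER_AZ * len(failed) - extra_failed_copies, 0)
    return {
        "healthy": healthy,
        "can_write": healthy >= WRITE_QUORUM,
        "can_read": healthy >= READ_QUORUM,
    }


print(availability(["az-a"]))
# {'healthy': 4, 'can_write': True, 'can_read': True}
print(availability(["az-a"], extra_failed_copies=1))
# {'healthy': 3, 'can_write': False, 'can_read': True}
```

Losing a whole AZ keeps both reads and writes available; losing an AZ plus one more copy still allows reads.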
ElastiCache
- Managed Redis / Memcached: an in-memory data store with sub-millisecond latency
- Supports clustering (sharding) for Redis and Multi-AZ with read replicas
- Security via IAM, security groups, KMS and Redis AUTH
- Backup, snapshot and PITR features
- Managed and scheduled maintenance
- Requires application code changes to use ElastiCache
- Useful for read-heavy, write-light workloads and for caching DB query results. A common use case is storing session data for websites
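The "modify app code" point is easiest to see with the cache-aside (lazy loading) pattern: the application checks the cache first, falls back to the database on a miss, then populates the cache with a TTL. In this Python sketch TTLCache is a hypothetical in-memory stand-in for Redis/Memcached and SlowDatabase is a hypothetical backing store; neither is a real client library.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for Redis/Memcached."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


class SlowDatabase:
    """Hypothetical backing database that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def query_user(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)


def get_user(user_id, cache, db, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = db.query_user(user_id)
    if row is not None:
        cache.set(key, row, ttl_seconds)
    return row


now = [0.0]  # fake clock so the TTL can be demonstrated deterministically
cache = TTLCache(clock=lambda: now[0])
db = SlowDatabase({1: {"name": "alice"}})
get_user(1, cache, db)  # miss: reads from the DB
get_user(1, cache, db)  # hit: served from the cache
now[0] = 301            # the 300 s TTL has passed
get_user(1, cache, db)  # expired: reads from the DB again
print(db.queries)       # 2
```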
DynamoDB
- Managed NoSQL database with single-digit millisecond latency
- Provisioned capacity with optional auto-scaling or on-demand capacity
- Could replace ElastiCache as a key/value store
- Highly available with Multi-AZ by default; reads and writes are decoupled; supports transactions
- DAX cluster for caching reads, microsecond read latency
- IAM for security, authentication and authorization
- Event processing with DynamoDB Streams, which integrates with Lambda or Kinesis Data Streams
- Global Tables feature provides an active-active, multi-region setup
- Automated backups with PITR or on-demand backups
- Can export to S3 without consuming RCUs (within the PITR window) and import from S3 without consuming WCUs
- Useful for serverless app development with items up to a few hundred KB (max item size is 400 KB) or as a distributed serverless cache
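For provisioned capacity, the capacity units follow simple rounding rules: 1 WCU covers one write per second of an item up to 1 KB, 1 RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and transactional operations cost double. The Python helpers below are hypothetical names that just apply that arithmetic.

```python
import math

MAX_ITEM_KB = 400  # DynamoDB item size limit


def _check_size(item_kb):
    if not 0 < item_kb <= MAX_ITEM_KB:
        raise ValueError("item size must be > 0 and <= 400 KB")


def wcu_needed(writes_per_sec, item_kb, transactional=False):
    """1 WCU = one write/s of up to 1 KB; item size rounds up to whole KB."""
    _check_size(item_kb)
    units_per_write = math.ceil(item_kb / 1)
    return writes_per_sec * units_per_write * (2 if transactional else 1)


def rcu_needed(reads_per_sec, item_kb, consistency="strong"):
    """1 RCU = one strongly consistent read/s of up to 4 KB."""
    _check_size(item_kb)
    units_per_read = math.ceil(item_kb / 4)
    if consistency == "strong":
        return reads_per_sec * units_per_read
    if consistency == "eventual":  # half the cost of a strong read
        return math.ceil(reads_per_sec * units_per_read / 2)
    if consistency == "transactional":  # double the cost of a strong read
        return reads_per_sec * units_per_read * 2
    raise ValueError("consistency must be strong, eventual or transactional")


print(wcu_needed(10, 2.5))                 # 30: each 2.5 KB write rounds up to 3 WCU
print(rcu_needed(10, 6))                   # 20: each 6 KB read rounds up to 2 RCU
print(rcu_needed(10, 6, "eventual"))       # 10
print(rcu_needed(10, 6, "transactional"))  # 40
```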
S3
- A key/value store for objects (better for bigger objects)
- Serverless with virtually unlimited scaling; max object size is 5 TB; supports versioning
- Tiers: S3 Standard, S3 Standard-IA, S3 Intelligent-Tiering, S3 Glacier (lifecycle policies move objects between tiers)
- Features: Versioning, encryption, replication, MFA-Delete, Access logs, etc.
- Security: IAM, bucket policies, ACLs, access points, S3 Object Lambda, CORS, Object Lock / Glacier Vault Lock
- Encryption: SSE-S3, SSE-KMS, SSE-C, client-side, TLS in transit, default encryption
- Batch operations: S3 Batch Operations for bulk actions on objects, S3 Inventory for listing files
- Performance: Multi-part upload, S3 Transfer Acceleration, S3 Select
- Automation with S3 Event Notifications, SNS, SQS, Lambda, EventBridge
- Useful for static files, key/value store for big files and website hosting
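Multipart upload has hard limits worth knowing: every part except the last must be at least 5 MiB, a part can be at most 5 GiB, an upload can have at most 10,000 parts, and the object can be at most 5 TiB. The Python helper below (plan_multipart is a hypothetical name, and the 100 MiB preferred part size is an assumed starting point, not an AWS default) picks a part size that respects those limits.

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MIN_PART = 5 * MiB    # all parts except the last must be >= 5 MiB
MAX_PART = 5 * GiB
MAX_PARTS = 10_000
MAX_OBJECT = 5 * TiB


def plan_multipart(object_size, preferred_part=100 * MiB):
    """Return (part_size, part_count) for a multipart upload."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    part_size = max(preferred_part, MIN_PART)
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("part size exceeds the 5 GiB limit")
    return part_size, math.ceil(object_size / part_size)


print(plan_multipart(1 * GiB))     # (104857600, 11)
print(plan_multipart(5 * TiB)[1])  # 10000
```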
DocumentDB
- DocumentDB is AWS's fully managed, MongoDB-compatible document database (NoSQL), with high availability and replication across 3 AZs
- MongoDB is used to store, query and index JSON data
- Similar deployment concepts as Aurora
- Storage automatically grows in 10 GB increments; scales to workloads with millions of requests per second
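"Store, query and index JSON data" can be illustrated with a tiny in-memory collection: documents are stored as JSON, a query on an indexed field uses a value-to-document lookup, and a query on an unindexed field falls back to a full scan. This is a hypothetical Python sketch of the general idea, not the pymongo API and not how DocumentDB indexes data internally.

```python
import json
from collections import defaultdict


class Collection:
    """Hypothetical in-memory document collection (not the pymongo API)."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1
        self._indexes = {}  # field -> {value: set of doc ids}; values must be hashable

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        stored = json.loads(json.dumps(doc))  # keep a JSON-serializable copy
        self._docs[doc_id] = stored
        for field, index in self._indexes.items():
            if field in stored:
                index[stored[field]].add(doc_id)
        return doc_id

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def find(self, field, value):
        if field in self._indexes:
            ids = self._indexes[field].get(value, set())  # indexed lookup
        else:
            ids = {i for i, d in self._docs.items() if d.get(field) == value}  # full scan
        return [self._docs[i] for i in sorted(ids)]


users = Collection()
users.insert({"name": "ana", "city": "Lisbon"})
users.insert({"name": "bo", "city": "Oslo"})
users.create_index("city")
users.insert({"name": "cy", "city": "Lisbon"})  # added to the existing index
print([d["name"] for d in users.find("city", "Lisbon")])  # ['ana', 'cy']
```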
Neptune
- A fully managed graph database that's highly available across 3 AZs with up to 15 read replicas
- Build and run apps that work with highly connected datasets; optimized for complex, hard queries
- Store up to billions of relations and query the graph with millisecond latency
- Useful for knowledge graphs such as Wikipedia, fraud detection, recommendation engines and social networking
- Neptune Streams:
  - Get a real-time, ordered sequence of every change to your graph data
  - Changes are available immediately after writing
  - No duplicates, strict ordering
  - Streams data is accessible via an HTTP REST API
  - Useful for sending notifications on certain changes, keeping graph data synchronized with another data store, and replicating across regions
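A Python sketch tying graph queries to the Streams use case "keeping graph data synchronized with another data store": a consumer replays ordered change records into an in-memory copy of the graph, skipping anything at or below its checkpoint, then runs a friends-of-friends recommendation query. The record shape (seq, op, edge) and all names are hypothetical; the real Neptune Streams record format and query languages differ.

```python
from collections import defaultdict

# Hypothetical change records; the real Neptune Streams format differs.
CHANGES = [
    {"seq": 1, "op": "ADD", "edge": ("alice", "knows", "bob")},
    {"seq": 2, "op": "ADD", "edge": ("bob", "knows", "carol")},
    {"seq": 3, "op": "ADD", "edge": ("bob", "knows", "dave")},
    {"seq": 4, "op": "ADD", "edge": ("alice", "knows", "dave")},
    {"seq": 5, "op": "REMOVE", "edge": ("alice", "knows", "dave")},
]


class GraphReplica:
    """In-memory copy of a graph kept in sync by replaying change records."""

    def __init__(self):
        self.out_edges = defaultdict(set)  # vertex -> {(label, target)}
        self.checkpoint = 0                # last applied sequence number

    def apply(self, records):
        # Records are assumed to arrive in commit order (strict ordering).
        for rec in records:
            if rec["seq"] <= self.checkpoint:
                continue  # already applied, e.g. re-polled after a restart
            src, label, dst = rec["edge"]
            if rec["op"] == "ADD":
                self.out_edges[src].add((label, dst))
            elif rec["op"] == "REMOVE":
                self.out_edges[src].discard((label, dst))
            else:
                raise ValueError(f"unknown op: {rec['op']!r}")
            self.checkpoint = rec["seq"]

    def neighbors(self, vertex, label):
        return {dst for lbl, dst in self.out_edges.get(vertex, set()) if lbl == label}

    def recommend(self, person):
        """Friends-of-friends who are not already direct friends."""
        friends = self.neighbors(person, "knows")
        candidates = set()
        for friend in friends:
            candidates |= self.neighbors(friend, "knows")
        return sorted(candidates - friends - {person})


replica = GraphReplica()
replica.apply(CHANGES[:4])
print(replica.recommend("alice"))  # ['carol']
replica.apply(CHANGES)             # seq 1-4 are skipped, seq 5 is applied
print(replica.recommend("alice"))  # ['carol', 'dave']
```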
Keyspaces
- A managed Apache Cassandra-compatible database service (Cassandra is an open-source, distributed NoSQL database)
- Scalable and highly available, with tables replicated 3 times across multiple AZs
- Automatically scales tables up/down based on app traffic
- Single-digit millisecond latency at any scale; can handle thousands of requests per second
- Capacity: On-demand mode or provisioned mode with auto-scaling
- Encryption, backup, PITR
- Useful for storing IoT device information, time series data, etc.
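For the IoT/time series use case, Cassandra-style tables are typically modeled with a partition key (e.g. the device) and a clustering key (e.g. the timestamp), so each device's readings are stored sorted by time, as in the CQL key PRIMARY KEY ((device_id), ts). The Python class below is a hypothetical in-memory model of that layout, not a Keyspaces client; like Cassandra, writing the same primary key twice overwrites (upserts) the row.

```python
import bisect
from collections import defaultdict


class TimeSeriesTable:
    """Hypothetical model of PRIMARY KEY ((device_id), ts):
    device_id is the partition key, ts the clustering key."""

    def __init__(self):
        self._partitions = defaultdict(list)  # device_id -> [(ts, value)] sorted by ts

    def insert(self, device_id, ts, value):
        rows = self._partitions[device_id]
        keys = [r[0] for r in rows]
        i = bisect.bisect_left(keys, ts)
        if i < len(rows) and rows[i][0] == ts:
            rows[i] = (ts, value)  # same primary key: upsert
        else:
            rows.insert(i, (ts, value))

    def range_query(self, device_id, start_ts, end_ts):
        """Rows of one partition with start_ts <= ts <= end_ts, oldest first."""
        rows = self._partitions.get(device_id, [])
        keys = [r[0] for r in rows]
        lo = bisect.bisect_left(keys, start_ts)
        hi = bisect.bisect_right(keys, end_ts)
        return rows[lo:hi]

    def latest(self, device_id, n=1):
        """The n most recent rows of one partition, newest first."""
        if n <= 0:
            return []
        return self._partitions.get(device_id, [])[-n:][::-1]


table = TimeSeriesTable()
table.insert("sensor-1", 30, 21.5)
table.insert("sensor-1", 10, 20.0)
table.insert("sensor-1", 20, 20.7)
table.insert("sensor-1", 20, 20.9)  # overwrites the ts=20 row
print(table.range_query("sensor-1", 10, 20))  # [(10, 20.0), (20, 20.9)]
print(table.latest("sensor-1", 2))            # [(30, 21.5), (20, 20.9)]
```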
QLDB
- A ledger is a book for recording financial transactions. QLDB is a fully managed ledger database used to review the history of all changes made to your app data
- Highly available with replication across 3 AZs
- The system is immutable: no entry can be removed or modified, and the history is cryptographically verifiable
- Has 2-3x better performance than common ledger blockchain frameworks
- Unlike Amazon Managed Blockchain, there is no decentralization component (QLDB is centralized), in line with financial regulation rules
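"Cryptographically verifiable" can be illustrated with a hash chain: each journal entry stores the SHA-256 hash of the previous entry, so altering any past entry breaks verification. This Python sketch is a simplified linear chain with hypothetical names; QLDB's actual journal and digest verification (based on a Merkle tree) are more involved.

```python
import hashlib
import json


def _digest(payload, prev_hash):
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Journal:
    """Simplified append-only journal: each entry chains to the previous hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # dicts with payload, prev, hash

    def append(self, payload):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"payload": payload, "prev": prev, "hash": _digest(payload, prev)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; any edited or reordered entry fails the check."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(entry["payload"], prev):
                return False
            prev = entry["hash"]
        return True


journal = Journal()
journal.append({"account": "A-1", "balance": 100})
journal.append({"account": "A-1", "balance": 80})
print(journal.verify())  # True
journal.entries[0]["payload"]["balance"] = 1000  # simulate tampering with history
print(journal.verify())  # False
```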
Timestream
- A fully managed time series database
- Scales capacity automatically; AWS claims it is thousands of times faster than relational databases at 1/10th the cost
- Features scheduled queries, multi-measure records and SQL compatibility
- Tiering for storing recent data in memory and historical data in a cost-optimized storage
- Built-in time series analytics functions
- Encryption in transit and at rest
- Useful for IoT apps, operational apps, real-time analytics, etc.
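The tiering bullet can be sketched as a routing decision by record age: recent data lives in the memory store, older data in the cost-optimized magnetic store, and data past retention is removed. In this Python sketch the retention values are hypothetical (in Timestream they are configured per table), and measuring both periods from the record's timestamp is a simplifying assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention settings; configured per table in Timestream.
MEMORY_RETENTION = timedelta(hours=24)
MAGNETIC_RETENTION = timedelta(days=365)


def tier_for(record_time, now, memory_retention=MEMORY_RETENTION,
             magnetic_retention=MAGNETIC_RETENTION):
    """Return the storage tier a record would sit in (simplified model).

    Assumption: both periods are measured from the record's timestamp,
    so magnetic_retention must be longer than memory_retention.
    """
    if magnetic_retention <= memory_retention:
        raise ValueError("magnetic retention must exceed memory retention")
    age = now - record_time
    if age < timedelta(0):
        raise ValueError("record timestamp is in the future")
    if age <= memory_retention:
        return "memory"    # recent data: fast queries
    if age <= magnetic_retention:
        return "magnetic"  # historical data: cost-optimized storage
    return "expired"       # beyond retention: no longer stored


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(tier_for(now - timedelta(hours=1), now))   # memory
print(tier_for(now - timedelta(days=30), now))   # magnetic
print(tier_for(now - timedelta(days=400), now))  # expired
```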