Graduate Program KB

Section 12 - AWS Integration & Messaging

SQS

  • Fully managed service to decouple applications
  • Attributes:
    • Unlimited throughput and number of messages in queue
    • Default message retention of 4 days (max. 14 days)
    • Less than 10ms latency on publish/receive
    • Limitation of 256KB per message sent
  • Potentially duplicate and out-of-order messages
  • Producing messages:
    • Use SendMessage API (SDK) to produce messages to SQS
    • Messages are persisted in SQS until a consumer deletes it
  • Consuming messages:
    • Consumers could be EC2 instances, servers or lambda
    • Poll SQS for messages (receive up to 10 messages at a time)
    • Process the messages then delete the messages using the DeleteMessage API

SQS Security

  • Encryption
    • In-flight encryption using HTTPS API
    • At-rest encryption using KMS keys
    • Client-side encryption if the client wants to perform encryption/decryption
  • Access controls: IAM policies to regular access to SQS API
  • SQS Access Policies
    • Useful for cross-account access to SQS queues
    • Useful for allowing other services to write to an SQS queue

SQS Message Visibility Timeout

  • After a message is polled, it becomes invisible to other consumers
  • Visibility timeout is 30 seconds by default (means message also has 30 seconds to be processed)
  • Messages not processed within the timeout will be processed twice
  • Calling the ChangeMessageVisibility API can increase the time
    • If the timeout is too high (hours) and the consumer crashes then re-processing will take time
    • If the timeout is too low (seconds) then you risk duplicates

SQS Long Polling

  • Consumers can optionally wait for messages to arrive if there are none in queue, this is called long polling
  • Decrease the number of API calls made to SQS while increasing efficiency and reducing latency of app
  • Wait times can vary between 1 second to 20 seconds with the latter being preferable
  • Long polling can be enabled at the queue level or at the API level using WaitTimeSeconds

SQS FIFO Queue

  • First In First Out ordering of messages in the queue
  • Limited throughput of 300 msg/s without batching, 3000 msg/s with batching
  • Exactly-once send capability (removing duplicates)
  • Messages are processed in order by the consumer

SNS

  • Event producer sends messages to an SNS topic
  • Many event receivers (subscribers) want to listen to the SNS topic notifications
  • Each subscriber to the topic will receive all messages (unless filtered)
  • Up to 12.5 million subscriptions per topic, with a 100000 topics limit
  • Integrates with many other AWS services

Publishing with SNS

  • Topic Publish (SDK)
    • Create topic, create subscriptions and publish to the topic
  • Direct Publish (for mobile apps SDK)
    • Create platform app, create platform endpoint, publish to the platform endpoint
    • Works with Google GCM, Apple APNS, Amazon ADM, etc.

SNS Security

  • Encryption
    • In-flight encryption using HTTPS API
    • At-rest encryption using KMS keys
    • Client-side encryption if the client wants to perform encryption/decryption
  • Access controls: IAM policies to regular access to SNS API
  • SNS Access Policies
    • Useful for cross-account access to SNS queues
    • Useful for allowing other services to write to an SNS queue

SNS + SQS: Fan Out

  • Push once in SNS and receive in all SQS queues that are subscribers
  • Fully decoupled with no data loss
  • SQS allows for ata persistence, delayed processing and retries
  • Can add more SQS subscribers over time
  • Must ensure SQS queue access policy allows for SNS to write
  • Cross-Region Delivery works with SQS queues in other regions

SNS FIFO Topic

  • First In First Out ordering of messages in the topic
  • Similar features as SQS FIFO:
    • Ordering by Message Group ID
    • Deduplication using Deduplication ID or Content Based Deduplication
  • Can have SQS Standard and FIFO queues as subscribers
  • Limited throughput is same as SQS FIFO

SNS Message Filtering

  • JSON policy used to filter messages sent to SNS topic's subscriptions
  • No filter policy means they receive all messages

Kinesis

  • Collect, process and analyze streaming data in real-time
    • Data can be ingested via application logs, metrics, website clickstreams, IoT telemetry data, etc.
  • Kinesis Data Streams: Capture, process and store data streams
  • Kinesis Data Firehose: Load data streams into AWS data stores
  • Kinesis Data Analytics: Analyze data streams with SQL or Apache Flink
  • Kinesis Video Streams: Capture, process and store video streams

Kinesis Data Streams

  • 1 day to 365 days retention period
  • Can replay data
  • Data is immutable, it can't be deleted once inserted
  • Data sharing the same partition goes to the same shard
  • Producers: SDK, Kinesis Producer Library, Kinesis Agent
  • Consumers: Write your own using Kinesis Client Library or SDK, or use a managed consumer such as Lambda, Data Firehose, Data Analytics, etc.
  • Capacity modes:
    • Provisioned
      • Choose number of provisioned shards, scaling manually or using an API
      • Each shard gets 1 MB/s in and 2 MB/s out
      • Pay per shard provisioned per hour
    • On-demand
      • Don't need to provision to manage capacity
      • Default provisioned capacity is 4 MB/s in or 4000 records per second
      • Scales automatically based on observed throughput peak during the last 30 days
      • Pay per stream per hour & data in/out per GB

Kinesis Data Streams Security

  • Control access / authorization using IAM policies
  • Encryption in-flight using HTTPS endpoints
  • Encryption at rest using KMS
  • Implement encryption/decryption of data on client side
  • VPC endpoints available for Kinesis to access within VPC
  • Monitor API calls using CloudTrail

Kinesis Data Firehose

  • Fully managed service, no administration, automatic scaling
  • Pay for data input through Firehose
  • Buffer interval of 0 seconds to 90 seconds, with a minimum size of 1 MB
  • Supports many data formats, conversions, transformations, compressions
  • Supports custom data transformations using AWS Lambda
  • Can send failed or all data to a backup S3 bucket

Kinesis Data Streams vs. Firehose

  • Data Streams
    • Streaming service for ingest at scale
    • Custom code to create producer/consumer
    • Real-time
    • Manage scaling with shard splitting and merging
    • Data storage for 1 to 365 days
    • Supports replay capability
  • Firehose
    • Load streaming data into S3, Redshift, OpenSearch, 3rd party or custom HTTP
    • Fully managed
    • Near real-time
    • Automatic scaling
    • No data storage
    • Doesn't support replay capability

Ordering Data Into SQS

  • No ordering for standard SQS
  • For SQS FIFO, not using a Group ID will consume messages in the order they're sent when only one consumer is involved
    • Use a Group ID if you want to scale the number of consumers (similar to partition key in Kinesis)

Kinesis vs. SQS Ordering

  • Context: 100 trucks, 5 kinesis shards and 1 SQS FIFO
  • Data Streams
    • Average 20 trucks per shard
    • Trucks have their data ordered within each shard
    • Maximum amount of consumers in parallel we can have is 5
    • Can receive up to 5 MB/s of data
  • SQS FIFO
    • Only have one SQS FIFO queue
    • Will have 100 Group ID
    • Can have up to 100 consumers due to the 100 Group ID
    • You will have up to 300 msg/s or 3000 if using batching

Amazon MQ

  • Managed message broker service for RabbitMQ and ActiveMQ
  • Traditional apps running from on-premises may use open protocols such as MQTT, AMQP, STOMP, Openwire and WSS
  • Use Amazon MQ instead of re-engineering the app to use SQS and SNS
  • Does not scale as much as SQS/SNS
  • Can run on servers and in Multi-AZ with failover
  • Has both queue feature and topic features