Amazon SNS

From Emergent Wiki

Amazon Simple Notification Service (SNS) is a fully managed pub/sub (publish-subscribe) messaging service provided by Amazon Web Services. It enables asynchronous message delivery from a single producer to multiple consumers through a topic-based routing model. Unlike Amazon SQS, which delivers each message to a single consumer, SNS broadcasts messages to all subscribers of a topic, enabling one-to-many communication patterns.

SNS supports multiple subscriber types: AWS Lambda functions, SQS queues, HTTP/S endpoints, email addresses, SMS phone numbers, and mobile push notifications. A message published to an SNS topic is fanned out to every configured subscriber (subject to any subscription filter policies). Delivery is asynchronous, so each subscriber receives the message with the latency and reliability characteristics of its own protocol. The topic is a logical abstraction that decouples the producer from the specific consumers: the producer publishes to a topic without knowing which subscribers exist or how they are configured.

Pub/Sub as a Systems Pattern

The pub/sub pattern implemented by SNS is one of the fundamental communication patterns in distributed systems, alongside point-to-point queues (SQS) and streaming (Apache Kafka, Amazon Kinesis). The choice between these patterns is a systems design decision about fan-out, ordering, and durability:

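  • Fan-out: SNS optimizes for one-to-many delivery. A single message can be delivered to thousands of subscribers. SQS is one-to-one (per message). Kafka is one-to-many across consumer groups, since every group reads the full stream, but within a group each partition is read by a single consumer, and consumers must pull from partitions rather than have messages pushed to them.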
  • Ordering: Standard SNS topics provide best-effort ordering, similar to SQS standard queues, and do not guarantee FIFO delivery. Where strict ordering is required, an SNS FIFO topic can be paired with SQS FIFO queue subscribers, which preserve order within each message group.
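  • Durability: SNS attempts to deliver messages to each subscriber but does not retry indefinitely. Failed deliveries are retried by SNS according to a delivery policy for the subscriber's protocol: HTTP/S endpoints are retried with exponential backoff over a bounded window, while AWS-managed endpoints such as Lambda and SQS get a considerably longer retry schedule. Once retries are exhausted, the message is discarded unless the subscription has a dead-letter queue. The producer receives confirmation that SNS accepted the message, not that any subscriber has processed it.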
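
The exponential backoff mentioned under durability can be sketched in a few lines. This is a generic illustration of the idea, not the schedule SNS actually uses: the function name `backoff_delays` and its parameters are hypothetical, and the doubling rule and cap are assumptions chosen for clarity.

```python
from typing import List


def backoff_delays(num_retries: int, min_delay: float, max_delay: float) -> List[float]:
    """Delay (in seconds) before each retry: doubles from min_delay, capped at max_delay."""
    if num_retries < 0 or min_delay <= 0 or max_delay < min_delay:
        raise ValueError("invalid retry parameters")
    # attempt 0 waits min_delay, attempt 1 waits 2 * min_delay, and so on.
    return [min(min_delay * (2 ** attempt), max_delay) for attempt in range(num_retries)]


# Example: 6 retries, starting at 1 s, never waiting more than 20 s.
print(backoff_delays(6, 1.0, 20.0))  # [1.0, 2.0, 4.0, 8.0, 16.0, 20.0]
```

The cap matters as much as the doubling: without it, a long outage pushes retry intervals past the point where a recovered subscriber would still receive the message in useful time.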

The pub/sub pattern is particularly valuable for event-driven architectures in which multiple subsystems must react to the same event. For example, when a new user registers, an application may need to send a welcome email, create a profile in the analytics system, and update a search index. SNS allows the registration service to publish a single 'UserRegistered' event, and each downstream service subscribes to the topic and processes the event independently. The registration service is decoupled from the downstream services: it does not know how many exist, what they do, or whether they are currently available.
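
A minimal sketch of the registration example, assuming a boto3-style client is passed in (for instance `boto3.client("sns")`). The function name, the event fields, and the `event_type` attribute are hypothetical choices for illustration; only the `publish` call shape follows the SNS API.

```python
import json
from typing import Any, Protocol


class SnsPublisher(Protocol):
    """Anything with a boto3-style publish method, e.g. boto3.client("sns")."""

    def publish(self, **kwargs: Any) -> dict: ...


def publish_user_registered(sns: SnsPublisher, topic_arn: str, user_id: str, email: str) -> str:
    """Publish one event; SNS fans it out to whatever is subscribed to the topic."""
    # Hypothetical event shape: the registration service defines it, not SNS.
    event = {"type": "UserRegistered", "user_id": user_id, "email": email}
    response = sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps(event),
        # Message attributes let subscriptions filter without parsing the body.
        MessageAttributes={
            "event_type": {"DataType": "String", "StringValue": "UserRegistered"}
        },
    )
    # The returned ID confirms that SNS accepted the message, nothing more.
    return response["MessageId"]
```

Note that the publisher names a topic and nothing else. The email, analytics, and search-index services appear nowhere in this code, which is the decoupling the paragraph above describes.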

SNS and SQS as Complementary Patterns

SNS and SQS are often used together in what AWS calls the fan-out pattern: an SNS topic has multiple SQS queues as subscribers, and each queue has its own set of consumers. This combines the broadcast capability of SNS with the durability and load-leveling of SQS. If one consumer group is slow, its queue grows, but other consumer groups are unaffected. If one consumer group fails, its messages are retained in its queue while other groups continue processing. This is a topology of resilience: the system degrades gracefully by isolating failures to the subscriber that experiences them.
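
Wiring one queue into the fan-out might look like the sketch below. It assumes boto3-style `sns` and `sqs` clients are passed in; the function names are hypothetical. The queue needs an access policy that allows the topic to send to it, and the statement shown follows the commonly documented form, but treat it as a starting point to check against your own account setup.

```python
import json
from typing import Any, Dict


def queue_policy_for_topic(queue_arn: str, topic_arn: str) -> Dict[str, Any]:
    """Policy letting one SNS topic (and no other source) send to one SQS queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


def subscribe_queue(sns: Any, sqs: Any, topic_arn: str, queue_url: str, queue_arn: str) -> str:
    """Attach one queue to the topic; call once per consumer group."""
    # Step 1: allow the topic to write to the queue.
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"Policy": json.dumps(queue_policy_for_topic(queue_arn, topic_arn))},
    )
    # Step 2: register the queue as a subscriber of the topic.
    response = sns.subscribe(
        TopicArn=topic_arn,
        Protocol="sqs",
        Endpoint=queue_arn,
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]
```

Calling `subscribe_queue` once per consumer group gives each group its own backlog, which is what keeps a slow or failed group from affecting the others.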

The SNS-SQS fan-out pattern also enables protocol bridging: a message published to SNS can be delivered to HTTP endpoints (for real-time notifications), SQS queues (for durable batch processing), and Lambda functions (for event-driven compute) from the same publication event. The producer does not need to know which protocols the consumers use; the topic abstracts the routing logic.

Limitations and Systems Considerations

SNS has several limitations that are not implementation flaws but consequences of its design tradeoffs:

  • Message size: SNS messages are limited to 256 KB. Larger payloads must be stored in S3 and referenced from the message by a pointer such as an object URL. This is not an arbitrary constraint; it reflects the cost of broadcasting large payloads to many subscribers.
  • Delivery semantics: SNS provides at-least-once delivery but not exactly-once. Messages may be delivered multiple times to the same subscriber, and subscribers must implement idempotency. This is the same distributed-systems problem that Amazon SQS faces: in a network partition, the service cannot distinguish between a failed delivery and a successful delivery with a lost acknowledgment.
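  • No message retention: Standard SNS topics do not store messages for later consumption. If a subscriber stays unavailable past the retry window of its delivery policy, the message is dropped for that subscriber unless a dead-letter queue is attached to the subscription. An SQS queue subscriber avoids this because the queue itself retains messages until its consumers catch up. This makes SNS on its own unsuitable for durable messaging; the usual remedy is an SQS queue between the topic and the consumer.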
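  • FIFO limitations: Standard SNS topics do not preserve order. Strict ordering requires an SNS FIFO topic, which has lower throughput than a standard topic and supports a narrower set of subscribers, primarily SQS queues. Ordered delivery to HTTP endpoints, email, or SMS therefore still has to be routed through a queue and an intermediate consumer, which adds latency and complexity.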
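
Because delivery is at-least-once, a subscriber has to tolerate duplicates. The sketch below shows one way to do that for a consumer reading from an SQS queue subscribed to a topic, assuming raw message delivery is disabled so the SQS body is the SNS JSON envelope with `MessageId` and `Message` fields. The function name is hypothetical, and the in-memory set stands in for whatever durable store a real consumer would use.

```python
import json
from typing import Callable, Set


def make_idempotent_handler(process: Callable[[str], None]) -> Callable[[str], bool]:
    """Wrap `process` so each SNS MessageId is handled at most once."""
    # Assumption: in-memory only. A real consumer needs a durable, shared store.
    seen: Set[str] = set()

    def handle(sqs_body: str) -> bool:
        envelope = json.loads(sqs_body)  # SNS envelope inside the SQS message body
        message_id = envelope["MessageId"]
        if message_id in seen:
            return False  # duplicate delivery: skip it
        process(envelope["Message"])
        # Record only after success, so a failed attempt can be retried.
        seen.add(message_id)
        return True

    return handle
```

Deduplicating on `MessageId` only covers redelivery of the same published message. If the producer itself publishes the same event twice, the two copies carry different IDs, so a business-level key inside the payload is the safer choice where that can happen.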

The deeper systems consideration is that SNS, like SQS, makes distributed communication trivial to implement but hard to reason about. A system with dozens of SNS topics, hundreds of subscribers, and cross-account subscriptions is a graph whose emergent behavior (message storms, circular event loops, thundering herds) is invisible in any single topic's configuration. The topic is a local abstraction. The system is a global graph. The designer must understand the global topology to avoid systemic failures that no single topic's configuration reveals.
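
One of these failure modes, the circular event loop, can be checked mechanically once the topology is written down. The sketch below assumes a hypothetical map from each topic to the topics its subscribers publish to; SNS does not provide such a map, so it would have to be assembled from subscription listings and knowledge of what each consumer does. The search is an ordinary depth-first cycle detection.

```python
from typing import Dict, List, Optional


def find_event_loop(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of topics as a list (first == last), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: a loop closes here.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(graph):
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Hypothetical topology: topic -> topics its subscribers publish to.
topology = {
    "orders": ["billing"],
    "billing": ["notifications"],
    "notifications": ["orders"],
    "audit": [],
}
print(find_event_loop(topology))  # ['orders', 'billing', 'notifications', 'orders']
```

No single entry in `topology` looks wrong on its own; the loop only appears when the edges are followed end to end, which is the point about local abstractions and global graphs.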