How Real-Time Chat Actually Works Under the Hood

Building a chat system that works for two users is easy. Building one that stays responsive, consistent, and affordable at 100,000 concurrent users is a fundamentally different engineering problem. This guide explains how the key pieces fit together — not as a textbook abstract, but as a practical map of the decisions and tradeoffs that define production chat systems.

The architecture described here reflects patterns used by real systems, including the design choices that underpin platforms like this one.

The Core Problem: HTTP Wasn't Designed for This

The web's original protocol, HTTP, follows a request-response pattern: the client asks and the server answers, and the server cannot send anything the client did not request. Chat needs the opposite: the server must push messages to clients as soon as they arrive, without waiting for the client to ask.

Three solutions evolved:

  • Long polling: The client makes a request and the server holds it open until there's something to send; the client then immediately issues a new request. Works everywhere, but every delivery costs a full HTTP request cycle, and it doesn't scale elegantly.
  • Server-Sent Events (SSE): A one-way persistent HTTP connection where the server pushes data to the client. Simpler than WebSockets, but it only supports the server-to-client direction, so client messages still travel over ordinary HTTP requests.
  • WebSockets: A persistent, full-duplex TCP connection established through an HTTP upgrade handshake. Both sides can send at any time. This is the standard for production chat systems.

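Socket.IO, used in many Node.js chat systems, layers several conveniences on top of WebSockets: an HTTP long-polling transport for networks where WebSockets are blocked (by default it connects over long polling first and upgrades to WebSocket when possible), automatic reconnection, and a room/namespace abstraction that maps cleanly to chat concepts.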
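
To make the HTTP upgrade handshake mentioned above concrete, here is a minimal sketch of the one computation a server performs during it, as defined in RFC 6455: deriving the Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key. The function names are illustrative, and real servers delegate this to a WebSocket library rather than writing it by hand.

```typescript
import { createHash } from "node:crypto";

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Computes the Sec-WebSocket-Accept value the server returns for the
// Sec-WebSocket-Key the client sent in its upgrade request.
function computeAcceptKey(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// Builds the raw "101 Switching Protocols" response. After this response,
// the same TCP connection carries WebSocket frames in both directions.
function buildUpgradeResponse(secWebSocketKey: string): string {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    "",
    "",
  ].join("\r\n");
}
```

The point of the exchange is that the connection starts life as ordinary HTTP, which is why WebSockets pass through most existing web infrastructure.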

The Single-Server Architecture

Understanding the scaling problem requires understanding what works when you don't need to scale. On a single server:

  1. A user connects via WebSocket. The server holds an in-memory map of userId → socket.
  2. When user A sends a message to user B, the server looks up B's socket in the map and emits directly to it.
  3. Active pair tracking, the waiting queue, session data — all of this lives in server memory.

This is fast and simple. The problem is that it only works until you need more than one server process.
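
The three steps above can be sketched in a few lines. This is an illustrative model, not a real server: ChatSocket is a hypothetical stand-in for a WebSocket or Socket.IO socket, and LocalRegistry is an assumed name.

```typescript
// Hypothetical stand-in for a real socket object.
interface ChatSocket {
  emit(event: string, payload: unknown): void;
}

class LocalRegistry {
  // userId -> socket, held only in this process's memory.
  private sockets = new Map<string, ChatSocket>();

  connect(userId: string, socket: ChatSocket): void {
    this.sockets.set(userId, socket);
  }

  disconnect(userId: string): void {
    this.sockets.delete(userId);
  }

  // Looks up the recipient's socket and emits directly to it.
  // Returns false when the recipient is not connected to this process.
  sendMessage(fromUserId: string, toUserId: string, text: string): boolean {
    const target = this.sockets.get(toUserId);
    if (!target) return false;
    target.emit("receive-message", { from: fromUserId, text });
    return true;
  }
}
```

The false branch of sendMessage is exactly where this design breaks once a second process exists: the recipient may be online, just not in this map.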

Why One Server Eventually Fails You

A single Node.js process runs its JavaScript on one thread, so it effectively uses one CPU core. Even with efficient I/O, there's a ceiling on concurrent connections (typically 10,000–65,000 per process depending on configuration and workload). If you need more, or if you want redundancy so the service survives a server restart, you need multiple instances. At that point the userId → socket map becomes the critical problem: if user A is connected to server 1 and user B is connected to server 2, server 1 can't look up B's socket directly.

Horizontal Scaling With Redis Pub/Sub

The standard solution is a message broker: a shared communication layer that all server instances can publish to and subscribe to. Redis is the most common choice for this in Node.js architectures because it's fast, simple, and usually already present as the session/state cache.

How the Socket.IO Adapter Works

A Socket.IO adapter such as @socket.io/redis-adapter (formerly published as socket.io-redis, built on Redis Pub/Sub) or the newer @socket.io/redis-streams-adapter (built on Redis Streams) hooks into every emit call. Besides delivering to matching sockets on the local instance, it publishes the operation to a Redis Pub/Sub channel or Stream that every server instance reads from. When an operation arrives, each server checks whether it holds the target socket, and the one that does delivers the message locally.

This means:

  • User A on server 1 sends a message to user B on server 2.
  • Server 1 publishes to Redis: "emit 'receive-message' to socket xyz".
  • Server 2, subscribed to the channel, receives the instruction.
  • Server 2 holds socket xyz and delivers the message.

The application code doesn't change — io.to(socketId).emit(...) works the same way. The adapter handles the distribution transparently.
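
The flow above can be modelled without Redis at all. The sketch below is a simplification under stated assumptions: FakeChannel is a hypothetical in-memory stand-in for a Redis channel, ServerInstance is an assumed name, and every emit goes through the channel (a real adapter also delivers to local sockets directly). It is not the adapter's actual implementation.

```typescript
type Instruction = { socketId: string; event: string; payload: unknown };
type Handler = (instruction: Instruction) => void;
type Deliver = (event: string, payload: unknown) => void;

// In-memory stand-in for a Redis Pub/Sub channel (hypothetical, not a Redis client).
class FakeChannel {
  private handlers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  // Every subscriber sees every published instruction.
  publish(instruction: Instruction): void {
    for (const handler of this.handlers) handler(instruction);
  }
}

class ServerInstance {
  readonly name: string;
  private channel: FakeChannel;
  // socketId -> delivery callback, for sockets connected to this instance only.
  private localSockets = new Map<string, Deliver>();

  constructor(name: string, channel: FakeChannel) {
    this.name = name;
    this.channel = channel;
    channel.subscribe((instruction) => this.deliverIfLocal(instruction));
  }

  attach(socketId: string, deliver: Deliver): void {
    this.localSockets.set(socketId, deliver);
  }

  // Publishes "emit <event> to socket <socketId>" instead of looking the socket up locally.
  emitTo(socketId: string, event: string, payload: unknown): void {
    this.channel.publish({ socketId, event, payload });
  }

  // Only the instance that holds the target socket delivers the message.
  private deliverIfLocal(instruction: Instruction): void {
    const deliver = this.localSockets.get(instruction.socketId);
    if (deliver) deliver(instruction.event, instruction.payload);
  }
}
```

With two instances sharing one channel, calling emitTo("xyz", ...) on the first reaches a socket attached to the second, and the calling code never needs to know which instance holds it.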

State That Must Live in Redis, Not Server Memory

Any state that multiple server instances need to read or write must be stored in Redis, not in local server memory:

  • User sessions — which user is connected, their current state, their active conversation partner.
  • The waiting queue — the list of users waiting to be matched. A Redis list with atomic operations ensures two servers don't match the same user twice.
  • Active pair data — which two users are currently connected.
  • Online counts — maintained with a Redis Set for O(1) cardinality queries.

Atomic Operations and Race Conditions

Distributed systems create race conditions that don't exist on a single server. The matching problem is a textbook example: if two servers simultaneously try to match users from the waiting queue, they might both read the same user and match them twice.

The solution is a Lua script executed inside Redis. Redis runs each script atomically: no other command executes between the script's read and write operations. A typical atomicJoinQueue script:

  1. Check if the user is already in a conversation. If yes, return early.
  2. Check if the waiting queue is non-empty. If yes, pop a candidate atomically.
  3. Verify the candidate is still valid (connected, not already matched).
  4. Store the match data for both users and return the match.
  5. If the queue was empty, push the current user to the queue and return.

Because this runs as a single Redis operation, no other server can interleave between steps 2 and 4. The match is created exactly once.
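
The logic of such a script can be sketched as a plain synchronous function. This is an in-memory model in TypeScript, not the Lua script itself: MatchState and its fields are a hypothetical layout, and the function is only atomic here because JavaScript runs it to completion on one thread, which mirrors how Redis runs a script. Skipping stale candidates and trying the next one is an assumption; the steps above do not say what happens when validation fails.

```typescript
// In-memory model of the shared state the script would touch (hypothetical layout).
interface MatchState {
  waitingQueue: string[];         // models a Redis list
  partnerOf: Map<string, string>; // models per-user match data
  connected: Set<string>;         // models live session keys
}

type JoinResult =
  | { status: "already-matched"; partner: string }
  | { status: "matched"; partner: string }
  | { status: "queued" };

function atomicJoinQueue(state: MatchState, userId: string): JoinResult {
  // Step 1: return early if the user is already in a conversation.
  const existing = state.partnerOf.get(userId);
  if (existing !== undefined) return { status: "already-matched", partner: existing };

  // Steps 2-3: pop candidates until one is still valid.
  while (state.waitingQueue.length > 0) {
    const candidate = state.waitingQueue.shift() as string;
    const valid =
      candidate !== userId &&
      state.connected.has(candidate) &&
      !state.partnerOf.has(candidate);
    if (!valid) continue; // stale entry: drop it and try the next one

    // Step 4: record the match for both users.
    state.partnerOf.set(userId, candidate);
    state.partnerOf.set(candidate, userId);
    return { status: "matched", partner: candidate };
  }

  // Step 5: nobody valid to match with, so wait in the queue.
  state.waitingQueue.push(userId);
  return { status: "queued" };
}
```

In a real deployment the same steps would run inside Redis so that the guarantee holds across server processes, not just within one.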

Distributed locks (implemented with Redis SET NX PX) solve a related problem: preventing multiple tabs or reconnects from the same user from creating duplicate sessions simultaneously.
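
A minimal sketch of that locking pattern follows. FakeLockStore is a hypothetical in-memory model of the two Redis behaviours involved, the key name lock:session:<userId> is an assumption, and time is passed in explicitly so the example is deterministic. In real Redis the compare-and-delete on release has to be done in a Lua script to stay atomic.

```typescript
// In-memory model of SET key value NX PX ttl plus a compare-and-delete release.
class FakeLockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  // Mirrors SET NX PX: succeeds only if the key is absent or has expired.
  setIfAbsent(key: string, value: string, ttlMs: number, now: number): boolean {
    const current = this.entries.get(key);
    if (current && current.expiresAt > now) return false;
    this.entries.set(key, { value, expiresAt: now + ttlMs });
    return true;
  }

  // Deletes the key only if it still holds the caller's token.
  deleteIfValue(key: string, value: string, now: number): boolean {
    const current = this.entries.get(key);
    if (!current || current.expiresAt <= now || current.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

// The token identifies the holder (for example, one tab or one socket).
function acquireSessionLock(
  store: FakeLockStore, userId: string, token: string, ttlMs: number, now: number,
): boolean {
  return store.setIfAbsent(`lock:session:${userId}`, token, ttlMs, now);
}

// Releasing with the wrong token is a no-op, so one tab cannot free another tab's lock.
function releaseSessionLock(
  store: FakeLockStore, userId: string, token: string, now: number,
): boolean {
  return store.deleteIfValue(`lock:session:${userId}`, token, now);
}
```

The expiry matters as much as the exclusivity: if the holder crashes without releasing, the lock frees itself after the TTL instead of blocking the user forever.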

Message Persistence: PostgreSQL via Prisma

Redis is fast but not designed as a primary data store. Messages that need to persist — conversation history, friendship data, user records — go to PostgreSQL. The typical pattern in a chat system:

  • Hot path (real-time delivery): Redis handles presence, routing, and the current session state. This keeps latency under 10ms for most operations.
  • Persistence path (write to DB): Messages are written to PostgreSQL asynchronously after delivery. Users don't wait for the DB write before the message appears.
  • Read path (history): When a user loads conversation history, it comes from PostgreSQL, not Redis.

A DB concurrency queue with exponential backoff handles burst writes during high traffic. Instead of hammering PostgreSQL with thousands of simultaneous writes, the queue caps the number of parallel operations at some limit N, absorbs spikes, and retries transient failures so that messages are not lost.
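
One way such a queue might look is sketched below. WriteQueue and its parameters are assumed names, the task is any function returning a promise (in practice, a database write), and for brevity it retries every error rather than only transient ones. A task keeps its slot while it backs off, which is a deliberate simplification.

```typescript
type Task<T> = () => Promise<T>;

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => { setTimeout(resolve, ms); });
}

class WriteQueue {
  private active = 0;
  private waiting: Array<() => void> = [];
  private maxParallel: number;
  private maxRetries: number;
  private baseDelayMs: number;

  constructor(maxParallel: number, maxRetries: number, baseDelayMs: number) {
    this.maxParallel = maxParallel;
    this.maxRetries = maxRetries;
    this.baseDelayMs = baseDelayMs;
  }

  // Resolves with the task's result, or rejects once retries are exhausted.
  async run<T>(task: Task<T>): Promise<T> {
    await this.acquireSlot();
    try {
      return await this.withRetry(task);
    } finally {
      this.releaseSlot();
    }
  }

  private acquireSlot(): Promise<void> {
    if (this.active < this.maxParallel) {
      this.active++;
      return Promise.resolve();
    }
    // No free slot: park until a finishing task hands its slot over.
    return new Promise<void>((resolve) => {
      this.waiting.push(() => resolve());
    });
  }

  private releaseSlot(): void {
    const next = this.waiting.shift();
    if (next) next(); // slot is transferred, so the active count is unchanged
    else this.active--;
  }

  private async withRetry<T>(task: Task<T>): Promise<T> {
    for (let attempt = 0; ; attempt++) {
      try {
        return await task();
      } catch (error) {
        if (attempt >= this.maxRetries) throw error;
        // Exponential backoff: baseDelayMs, then 2x, 4x, and so on.
        await sleep(this.baseDelayMs * Math.pow(2, attempt));
      }
    }
  }
}
```

Because delivery has already happened on the hot path, the caller can fire run(...) without awaiting it and only log a failure once the retries are exhausted.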

Connection Lifecycle and Edge Cases

A production chat system spends most of its complexity not on the happy path but on edge cases:

  • Reconnects: Mobile users lose connectivity constantly. When a socket disconnects, the server waits briefly before cleaning up state — a reconnecting user within that window gets their existing conversation restored (rejoined event) rather than being re-queued.
  • Multi-tab handling: A user opening multiple tabs shouldn't create multiple sessions. A distributed lock held in Redis ensures only one tab establishes an active session.
  • Stale state: TTLs on all Redis keys (sessions, match data, conversations) ensure that zombie state from crashed sessions doesn't persist indefinitely.
  • Graceful degradation: If Redis becomes unavailable, the system should fail loudly (for example, by rejecting new messages with a clear error) rather than silently dropping messages or delivering them to the wrong recipients.
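
The reconnect grace window in the first bullet can be sketched as follows. SessionTracker and its method names are hypothetical, state is kept in memory for brevity (the text above places it in Redis), and timestamps are passed in rather than read from a clock so the behaviour is easy to follow.

```typescript
type ReconnectResult =
  | { event: "rejoined"; partner: string }
  | { event: "requeue" };

class SessionTracker {
  private partnerOf = new Map<string, string>();
  // userId -> timestamp (ms) after which the conversation is torn down.
  private pendingCleanup = new Map<string, number>();
  private graceMs: number;

  constructor(graceMs: number) {
    this.graceMs = graceMs;
  }

  pair(a: string, b: string): void {
    this.partnerOf.set(a, b);
    this.partnerOf.set(b, a);
  }

  // Does not clean up immediately; only schedules cleanup after the grace window.
  onDisconnect(userId: string, now: number): void {
    if (this.partnerOf.has(userId)) this.pendingCleanup.set(userId, now + this.graceMs);
  }

  // Within the window the existing conversation is restored; after it, the user is re-queued.
  onReconnect(userId: string, now: number): ReconnectResult {
    this.sweep(now);
    this.pendingCleanup.delete(userId);
    const partner = this.partnerOf.get(userId);
    return partner !== undefined ? { event: "rejoined", partner } : { event: "requeue" };
  }

  // Tears down conversations whose grace window has elapsed. Returns the expired users.
  sweep(now: number): string[] {
    const expired: string[] = [];
    this.pendingCleanup.forEach((deadline, userId) => {
      if (deadline > now) return;
      const partner = this.partnerOf.get(userId);
      this.partnerOf.delete(userId);
      if (partner !== undefined) this.partnerOf.delete(partner);
      this.pendingCleanup.delete(userId);
      expired.push(userId);
    });
    return expired;
  }
}
```

The length of the window is a product decision: long enough to ride out a mobile network handover, short enough that the other user is not left talking to nobody.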

Image Uploads: Avoiding Bottlenecks

Routing image uploads through the chat server creates unnecessary load. The standard pattern for chat image handling:

  1. Client requests a pre-signed URL from the server (a temporary URL that authorizes a direct upload to cloud storage).
  2. Client uploads the image directly to cloud storage (e.g., Cloudflare R2 or S3) — the chat server never touches the image data.
  3. Client sends the chat server only the resulting public URL as a message.
  4. Server delivers that URL to the recipient via the normal message path.

This keeps the chat server stateless with respect to media and allows the storage layer to scale independently.
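
From the client's side, the four steps can be sketched as one function. Everything in UploadDeps is a hypothetical interface injected for illustration: in practice requestUploadUrl would call an endpoint on the chat server, putObject would be an HTTP PUT to the storage provider, and sendMessage would emit over the existing socket.

```typescript
// Hypothetical dependencies, injected so the flow does not assume any particular backend.
interface UploadDeps {
  // Step 1: ask the chat server for a short-lived upload URL and the eventual public URL.
  requestUploadUrl(
    fileName: string,
    contentType: string,
  ): Promise<{ uploadUrl: string; publicUrl: string }>;
  // Step 2: send the bytes straight to object storage.
  putObject(uploadUrl: string, body: Uint8Array, contentType: string): Promise<void>;
  // Step 3: send an ordinary chat message containing only the URL.
  sendMessage(message: { type: "image"; url: string }): void;
}

async function sendImage(
  deps: UploadDeps,
  fileName: string,
  contentType: string,
  body: Uint8Array,
): Promise<string> {
  const { uploadUrl, publicUrl } = await deps.requestUploadUrl(fileName, contentType);
  // The image bytes go to storage; the chat server never sees them.
  await deps.putObject(uploadUrl, body, contentType);
  // Only announce the image once the upload has succeeded.
  deps.sendMessage({ type: "image", url: publicUrl });
  return publicUrl;
}
```

Sending the message only after the upload resolves avoids recipients receiving a URL that does not yet point at anything.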

Key Takeaways

  • WebSockets are the standard for real-time chat — persistent, full-duplex, and low-latency.
  • Single-server architectures fail at scale because the socket-to-user map can't be shared; Redis pub/sub or Streams-based adapters solve this transparently.
  • Any state accessed by multiple server instances must live in Redis, not server memory.
  • Atomic Lua scripts in Redis prevent race conditions in matching and session management.
  • Hot path (Redis) and persistence path (PostgreSQL) serve different roles — don't conflate them.
  • Image uploads should bypass the chat server entirely using pre-signed URLs to cloud storage.
  • Most production complexity lives in edge cases: reconnects, multi-tab handling, stale state cleanup, and graceful failure modes.