WebSocket vs SSE Benchmark: SSE Uses 40% Less Memory at 100K Connections

A team was building a real-time dashboard — prices, inventory counts, threshold alerts. They picked WebSocket without hesitation, thinking it was the obvious choice for real-time communication. When the benchmark results came in, the memory usage made them reconsider.

The Bottom Line

At 100K concurrent connections, SSE uses roughly 40% less memory than WebSocket.

These are Ark Protocol’s real benchmark numbers. They implemented both protocols in Rust + Axum, then ran identical stress tests:

Metric	WebSocket	SSE	Delta
Memory/connection	~52KB	~31KB	-40%
100K total memory	~5.2GB	~3.1GB	-2.1GB
CPU (low-frequency)	baseline	baseline	~same
Horizontal scaling	needs stateful LB	standard HTTP LB	SSE wins
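
The totals in the table follow directly from the per-connection figures. As a quick sanity check, here is a minimal sketch of that arithmetic; the helper names `total_memory_gb` and `relative_saving` are made up for illustration, and decimal units (1 GB = 1,000,000 KB) are assumed because that is what the rounded table values imply.

```rust
/// Total memory in GB for `connections` sockets at `per_conn_kb` KB each.
/// Assumes decimal units (1 GB = 1_000_000 KB), matching the rounded table values.
fn total_memory_gb(per_conn_kb: f64, connections: u64) -> f64 {
    per_conn_kb * connections as f64 / 1_000_000.0
}

/// Relative saving of `smaller` against `larger`, as a fraction between 0 and 1.
fn relative_saving(larger: f64, smaller: f64) -> f64 {
    (larger - smaller) / larger
}
```

With the table's inputs, `total_memory_gb(52.0, 100_000)` gives about 5.2 and `relative_saving(52.0, 31.0)` gives about 0.40, which is where the headline 40% comes from.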

The gap isn’t from some micro-optimization in a corner of the code. It comes from fundamental differences in how the two protocols work under the hood.

Protocol Mechanics: Why the Memory Gap Exists

WebSocket: A Permanent Dedicated Phone Line

Once a WebSocket connection is established, the TCP connection stays open. Both sides can send data anytime. This means:

Each connection needs an independent Task on the server: read/write separation, state tracking, connection lifecycle management
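Protocol has frame parsing overhead: FIN and reserved bits (4 bits), Opcode (4 bits), Mask bit (1 bit), Payload Length (7 bits, extended to 16 or 64 bits for larger payloads), plus a 32-bit masking key on every client-to-server frame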
Application-layer heartbeat needs its own timer: even without data transfer, heartbeat packets still consume memory
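
To make the frame parsing step concrete, here is a rough sketch of decoding the fixed part of a WebSocket frame header as laid out in RFC 6455. The `FrameHeader` struct and `parse_frame_header` function are hypothetical names for this illustration, not part of Axum or of the benchmark code; a real server would rely on its WebSocket library for this.

```rust
/// Fixed part of a WebSocket frame header (RFC 6455, section 5.2).
#[derive(Debug, PartialEq)]
struct FrameHeader {
    fin: bool,         // final fragment of a message
    opcode: u8,        // 4 bits: 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    masked: bool,      // 1 bit: set on every client-to-server frame
    payload_len: u64,  // decoded from the 7-bit field or its 16/64-bit extension
    header_len: usize, // bytes taken by the header, including any masking key
}

/// Parses the header at the start of `buf`; returns None if more bytes are needed.
fn parse_frame_header(buf: &[u8]) -> Option<FrameHeader> {
    if buf.len() < 2 {
        return None;
    }
    let fin = buf[0] & 0x80 != 0;
    let opcode = buf[0] & 0x0F;
    let masked = buf[1] & 0x80 != 0;
    let (payload_len, mut header_len) = match buf[1] & 0x7F {
        126 => {
            // 16-bit big-endian extended length
            let b = buf.get(2..4)?;
            (u16::from_be_bytes([b[0], b[1]]) as u64, 4)
        }
        127 => {
            // 64-bit big-endian extended length
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(buf.get(2..10)?);
            (u64::from_be_bytes(bytes), 10)
        }
        n => (n as u64, 2),
    };
    if masked {
        header_len += 4; // the 32-bit masking key follows the length
    }
    if buf.len() < header_len {
        return None;
    }
    Some(FrameHeader { fin, opcode, masked, payload_len, header_len })
}
```

Every incoming frame goes through a step like this before the payload can be unmasked and handed to the application, which is the per-message work an SSE response avoids.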

When connection counts are low, this isn’t a problem. At scale, memory starts screaming — every single connection is eating RAM.

SSE: A Mailbox on Your Door

SSE is HTTP-based, with the server pushing data one-way to the client. Once the connection is established, a browser client only needs the built-in EventSource API, and the server sends text events on demand.

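HTTP/2 multiplexing lets connections be shared: several SSE streams from the same client can travel over one TCP connection (HTTP/1.1 pipelining does not provide this; there, each stream holds its own connection)
No WebSocket-style frame parsing overhead: it’s just a streaming HTTP response with Content-Type text/event-stream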
HTTP middleware natively understands SSE: NGINX, Cloudflare, and AWS ALB all treat it as an ordinary long-lived HTTP response, although proxy buffering and idle timeouts may still need tuning so the stream is not held back or cut off
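
For comparison, the whole SSE "framing" is a few lines of text. The sketch below encodes one event in the `text/event-stream` format; `format_sse_event` is a hypothetical helper written for this illustration (in Axum, `Event::default().data(...)` produces this output for you).

```rust
/// Encodes one event in the `text/event-stream` wire format.
fn format_sse_event(event: Option<&str>, data: &str) -> String {
    let mut out = String::new();
    if let Some(name) = event {
        out.push_str("event: ");
        out.push_str(name);
        out.push('\n');
    }
    // Each line of the payload gets its own `data:` field.
    for line in data.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n'); // a blank line tells the client to dispatch the event
    out
}
```

A call such as `format_sse_event(Some("price"), "42")` yields the text `event: price`, then `data: 42`, then a blank line, which the browser's EventSource delivers as one `price` event.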

The trade-off is clear: one-way server→client only. Client wants to send data? Open another HTTP request.

When to Use Which: Real-World Scenarios

Go with SSE when

Single-direction data flow is your use case. For example:

Real-time price feeds (server pushes, client watches)
Inventory count notifications
Log streams, monitoring dashboards, CI/CD build status
Alert threshold triggers

In these scenarios the client never initiates, and SSE handles it perfectly.

Horizontal scaling is SSE’s strong suit. SSE runs on standard HTTP, so you can route traffic through any HTTP-aware load balancer. Adding 100 backend servers is trivial — no connection state sharing problems like with WebSocket.

Go with WebSocket when

Bidirectional, frequent, low-latency interaction is required. For example:

Chat rooms, collaborative editing (multiple people operating simultaneously)
Game commands (you move, the other player sees immediately)
Financial order submission (bidirectional handshake confirmation)

In these scenarios the bidirectional channel is non-negotiable. Forcing SSE means opening two connections (one push, one pull), which adds complexity.

Gray area: Hybrid Architecture

Some teams use both: SSE for high-volume downstream data (market data, notifications), WebSocket for small high-frequency upstream commands (orders, chat). This is a reasonable compromise.
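
One way to picture the hybrid split is a small routing rule that assigns each kind of message to a transport. This is only a sketch: the `MessageKind` and `Transport` enums and the `transport_for` function are invented for illustration, and the categories simply mirror the examples in the paragraph above.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Transport {
    Sse,
    WebSocket,
}

#[derive(Debug, Clone, Copy)]
enum MessageKind {
    MarketData,   // high-volume, server -> client
    Notification, // server -> client
    Order,        // small, frequent, client -> server with confirmation
    Chat,         // bidirectional
}

/// Picks a transport per message kind: downstream-only traffic goes over SSE,
/// anything the client has to send or confirm goes over WebSocket.
fn transport_for(kind: MessageKind) -> Transport {
    match kind {
        MessageKind::MarketData | MessageKind::Notification => Transport::Sse,
        MessageKind::Order | MessageKind::Chat => Transport::WebSocket,
    }
}
```

Keeping the rule in one place makes it easy to revisit later, for example if a notification type starts needing acknowledgements.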

Interpreting the Benchmark: CPU and Latency Are a Different Story

Ark Protocol’s tests showed a 40% memory gap, but CPU usage was nearly identical. SSE saves memory, WebSocket saves CPU? Be careful with that conclusion.

What actually impacts CPU is message frequency and payload size, not the protocol itself. In millisecond-level high-frequency push scenarios, WebSocket’s binary frames (a 2-byte header for small server-to-client payloads) are more compact than SSE’s text events (data: ...\n\n), so WebSocket tends to spend less CPU per message there.
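
The size difference can be put into numbers. The sketch below counts only framing bytes on the wire for a server-to-client message; it says nothing about parsing cost or compression, and the function names `ws_header_len` and `sse_overhead` are made up for this example.

```rust
/// Header bytes for an unmasked (server-to-client) WebSocket frame.
fn ws_header_len(payload_len: usize) -> usize {
    match payload_len {
        0..=125 => 2,       // length fits in the 7-bit field
        126..=65_535 => 4,  // 16-bit extended length
        _ => 10,            // 64-bit extended length
    }
}

/// Framing bytes SSE adds around `payload`: a 6-byte `data: ` prefix per line,
/// one newline closing the last line, and one blank line ending the event.
/// Newlines inside the payload are already counted in the payload itself.
fn sse_overhead(payload: &str) -> usize {
    let lines = payload.split('\n').count();
    6 * lines + 2
}
```

For a single-line payload such as a price tick, that is 2 bytes of WebSocket header against 8 bytes of SSE framing. The gap is small per message and only starts to matter at very high message rates.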

Don’t make decisions based on memory alone:

Dimension	SSE advantage	WebSocket advantage
Memory usage	✅ 40%↓ at 100K connections	-
CPU efficiency (high-freq)	-	✅ binary frames more compact
Bidirectional	❌	✅ native support
Horizontal scaling	✅ standard HTTP LB	❌ needs state sharing
Middleware compatibility	✅ standard HTTP	❌ needs Upgrade-aware proxies
Reconnection	browser auto-reconnects	manual implementation
Heartbeat	keep-alive comment lines (built into Axum's Sse)	ping/pong frames or app-layer impl

Code Comparison: Axum SSE vs WebSocket

SSE Version

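// Written against axum 0.7+; tokio-stream needs its `sync` feature for BroadcastStream.
use axum::{
    extract::State,
    response::sse::{Event, KeepAlive, Sse},
    routing::get,
    Router,
};
use std::{convert::Infallible, time::Duration};
use tokio::sync::broadcast;
use tokio_stream::{wrappers::BroadcastStream, Stream, StreamExt};

// Each client subscribes to the shared broadcast channel; no per-client table is kept.
async fn sse_handler(
    State(tx): State<broadcast::Sender<String>>,
) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let stream = BroadcastStream::new(tx.subscribe())
        // Drop "lagged" errors from slow receivers instead of sending empty events.
        .filter_map(|msg| msg.ok())
        .map(|msg| Ok(Event::default().data(msg)));
    // A comment line every 15 s keeps proxies from closing an idle stream.
    Sse::new(stream).keep_alive(KeepAlive::new().interval(Duration::from_secs(15)))
}

#[tokio::main]
async fn main() {
    let (tx, _rx) = broadcast::channel::<String>(100);
    let app = Router::new()
        .route("/stream", get(sse_handler))
        .with_state(tx.clone());
    // Broadcast to all subscribers with tx.send(...)
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    println!("SSE server running on :8080");
    axum::serve(listener, app).await.unwrap();
}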

SSE core advantages:

One short handler plus a router, done
No connection mapping table to maintain
Browser handles reconnection automatically
Load balancers support it natively

WebSocket Version

use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        Path, State,
    },
    response::IntoResponse,
};
use futures_util::{SinkExt, StreamExt};
use std::{collections::HashMap, sync::Arc};
use tokio::sync::{broadcast, Mutex};

#[derive(Clone)]
struct WSState {
    // Shared channel for messages that go to every client.
    tx: broadcast::Sender<String>,
    // Connection table: client id -> sender for messages addressed to that client.
    peers: Arc<Mutex<HashMap<String, broadcast::Sender<String>>>>,
}

async fn ws_handler(
    ws: WebSocketUpgrade,
    State(state): State<WSState>,
    Path(client_id): Path<String>,
) -> impl IntoResponse {
    ws.on_upgrade(move |socket| handle_socket(socket, state, client_id))
}

async fn handle_socket(socket: WebSocket, state: WSState, client_id: String) {
    // Register a per-client channel so other tasks can address this connection.
    let (direct_tx, mut direct_rx) = broadcast::channel::<String>(100);
    state.peers.lock().await.insert(client_id.clone(), direct_tx);
    let mut rx = state.tx.subscribe();
    let (mut ws_sender, mut ws_receiver) = socket.split();

    // Writer: forward broadcast and direct messages to this socket.
    let writer = async {
        loop {
            let next = tokio::select! {
                m = rx.recv() => m,
                m = direct_rx.recv() => m,
            };
            // Stop on a closed or lagged channel, or when the socket rejects the send.
            let Ok(text) = next else { break };
            if ws_sender.send(Message::Text(text.into())).await.is_err() {
                break;
            }
        }
    };
    // Reader: handle incoming frames until the client disconnects.
    let reader = async {
        while let Some(Ok(msg)) = ws_receiver.next().await {
            if let Message::Text(text) = msg {
                println!("Received: {}", text);
            }
        }
    };
    // Finish as soon as either half ends so the cleanup below always runs.
    tokio::select! {
        _ = writer => {},
        _ = reader => {},
    }
    state.peers.lock().await.remove(&client_id);
}

WebSocket core challenges:

Must maintain connection mapping table (HashMap + Mutex)
Reconnection needs manual implementation
Load balancing requires sticky sessions or WebSocket-aware LB
Code size is 60-70% larger than SSE

One-Line Decision

Need to watch data, prioritizing scalability → SSE

Need to talk back, low-latency bidirectional → WebSocket

Most monitoring, notification, and real-time data stream scenarios can just use SSE.

