UNION without ALL

Severity: Info

UNION combines the results of two queries and removes duplicate rows by sorting or hashing the entire combined result set. If duplicates are impossible (different tables, mutually exclusive conditions) or acceptable, this deduplication step is wasted work. UNION ALL skips it entirely and is always faster.

Example

-- Sorts the combined result to remove duplicates
SELECT id, name FROM employees_us
UNION
SELECT id, name FROM employees_eu;

-- Skips deduplication — faster when duplicates are impossible or acceptable
SELECT id, name FROM employees_us
UNION ALL
SELECT id, name FROM employees_eu;

Why it matters

To deduplicate, PostgreSQL adds a Sort or HashAggregate node to the query plan. This consumes CPU and memory proportional to the combined result size. For large result sets, the sort may spill to disk.

When the two queries draw from different tables or use mutually exclusive WHERE conditions, duplicates cannot occur. The deduplication step runs but finds nothing to remove — pure overhead.

When UNION is appropriate

UNION is the right choice when duplicates are genuinely possible and you need to eliminate them:

-- The same customer may appear in both tables
SELECT email FROM newsletter_subscribers
UNION
SELECT email FROM order_customers;

How to fix it

If you’ve verified that duplicates are impossible or acceptable, replace UNION with UNION ALL:

SELECT id, name FROM employees_us
UNION ALL
SELECT id, name FROM employees_eu;

Detection

Query Doctor flags this nudge on any SELECT statement that uses UNION without ALL. Review each flagged query to determine whether deduplication is actually needed.