UNION without ALL
Severity: Info
UNION combines the results of two queries and removes duplicate rows by sorting or hashing the entire combined result set. If duplicates are impossible (different tables, mutually exclusive conditions) or acceptable, this deduplication step is wasted work. UNION ALL skips it entirely and is always faster.
Example
Section titled “Example”-- Sorts the combined result to remove duplicatesSELECT id, name FROM employees_usUNIONSELECT id, name FROM employees_eu;
-- Skips deduplication — faster when duplicates are impossible or acceptableSELECT id, name FROM employees_usUNION ALLSELECT id, name FROM employees_eu;Why it matters
Section titled “Why it matters”To deduplicate, PostgreSQL adds a Sort or HashAggregate node to the query plan. This consumes CPU and memory proportional to the combined result size. For large result sets, the sort may spill to disk.
When the two queries draw from different tables or use mutually exclusive WHERE conditions, duplicates cannot occur. The deduplication step runs but finds nothing to remove — pure overhead.
When UNION is appropriate
Section titled “When UNION is appropriate”UNION is the right choice when duplicates are genuinely possible and you need to eliminate them:
-- The same customer may appear in both tablesSELECT email FROM newsletter_subscribersUNIONSELECT email FROM order_customers;How to fix it
Section titled “How to fix it”If you’ve verified that duplicates are impossible or acceptable, replace UNION with UNION ALL:
SELECT id, name FROM employees_usUNION ALLSELECT id, name FROM employees_eu;Detection
Section titled “Detection”Query Doctor flags this nudge on any SELECT statement that uses UNION without ALL. Review each flagged query to determine whether deduplication is actually needed.