How Optimization Works
Most query optimization tools rely on rules and heuristics — pattern matching against known anti-patterns. Query Doctor does that too (query analysis nudges), but the core of its optimization engine takes a fundamentally different approach: it asks PostgreSQL itself.
Production statistics, not guesswork
When the analyzer connects to your database, it exports the contents of PostgreSQL’s internal statistics catalogs — pg_statistic and the relation metadata from pg_class. These are the same statistics that PostgreSQL’s own query planner uses to make decisions:
- Row counts and page counts for every table and index
- Column-level statistics — null fractions, distinct value counts, average widths, most common values, histograms, and correlation data
- Index metadata — which indexes exist, their sizes, and visibility information
The analyzer restores these statistics into its own local PostgreSQL instance. This means the local planner operates with the exact same data distribution knowledge as your production database. When it estimates that a sequential scan will read 50,000 pages or that an index scan will return 200 rows, those estimates are grounded in real data — not defaults or assumptions.
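As a concrete illustration, the same statistics can be inspected by hand; pg_stats is PostgreSQL's human-readable view over pg_statistic. The table name orders here is hypothetical:

```sql
-- Row and page counts the planner sees for a table:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'orders';

-- Per-column statistics: null fractions, distinct counts, widths, correlation:
SELECT attname, null_frac, n_distinct, avg_width, correlation
FROM pg_stats
WHERE tablename = 'orders';
```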
What-if analysis with real cost models
Once the analyzer has your production statistics, it performs what-if analysis for each query:
1. Baseline plan — the analyzer runs EXPLAIN against your query with no changes, using the imported statistics. This produces a cost estimate that reflects how PostgreSQL would plan the query on your production system today.
2. Candidate indexes — the analyzer examines the query’s WHERE clauses, JOIN conditions, and ORDER BY expressions to identify columns that could benefit from indexing. It generates candidate indexes across these columns, including multi-column combinations.
3. Optimized plan — for each candidate index set, the analyzer creates the indexes in a transaction, re-runs EXPLAIN, and captures the new cost. The indexes are rolled back immediately — nothing is persisted.
4. Comparison — the baseline and optimized costs are compared directly. If the optimized cost is at least 5% lower, the query is marked as having improvements available, and the specific CREATE INDEX statements are surfaced as recommendations.
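One iteration of this loop can be sketched as plain SQL. The table, columns, and index name below are hypothetical, and EXPLAIN (GENERIC_PLAN) requires PostgreSQL 16 or later:

```sql
BEGIN;

-- Create a candidate index inside a transaction (hypothetical columns):
CREATE INDEX candidate_idx ON orders (customer_id, created_at);

-- Re-plan the query with the candidate index in place;
-- GENERIC_PLAN allows $1 placeholders without bound values:
EXPLAIN (GENERIC_PLAN)
SELECT * FROM orders WHERE customer_id = $1 ORDER BY created_at;

-- Discard the candidate index; nothing is persisted:
ROLLBACK;
```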
The key insight is that none of this is heuristic. The improvement percentage you see in the UI — “54% cost reduction” — comes from PostgreSQL’s own planner evaluating two real plans with real statistics. It’s the same calculation PostgreSQL would make if those indexes actually existed on your production system.
Why this is accurate
Traditional query advisors estimate improvements using rules like “adding an index on a filtered column should help.” That’s directionally true, but it can’t tell you how much it helps — or whether it helps at all for your specific data distribution.
Query Doctor’s approach is accurate because it removes the guesswork:
- Same cost model — PostgreSQL’s planner uses configurable cost parameters (seq_page_cost, random_page_cost, cpu_tuple_cost, etc.) to weight different operations. The analyzer uses the same parameters, so its cost estimates are directly comparable to your production system.
- Same statistics — the planner’s row estimates, selectivity calculations, and join cardinality estimates all derive from pg_statistic. By importing your production statistics, the analyzer’s estimates match what your production planner would produce.
- Same planner logic — the analyzer doesn’t reimplement PostgreSQL’s planning. It runs the actual PostgreSQL planner, which means it accounts for the same edge cases, cost model nuances, and optimization strategies.
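The cost parameters in question are ordinary server settings, so you can read your production values directly from pg_settings:

```sql
-- Planner cost parameters currently in effect:
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'cpu_index_tuple_cost', 'cpu_operator_cost');
```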
This is why you’ll often see the analyzer’s cost estimates closely match what you’d get from running EXPLAIN directly on your production database.
What the percentage means
When Query Doctor reports a cost reduction — say, 72% — it means:
PostgreSQL’s planner estimates that the query’s total cost would drop by 72% if the recommended indexes were created, given the current data distribution in your database.
This is not a synthetic benchmark or a guess. It’s a direct comparison of two plans produced by the same planner with the same statistics. The only difference is the available indexes.
Queries with a large improvement (50%+ reduction and absolute cost difference >100) are flagged as critical nudges. Smaller improvements are flagged as warnings. Queries where no index combination produces at least a 5% improvement are marked as already optimal.
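The classification rules above can be expressed as a single SQL expression. The costs in the VALUES row are made-up numbers for illustration only:

```sql
-- Hypothetical baseline and optimized costs; thresholds mirror the rules above.
SELECT baseline_cost,
       optimized_cost,
       round(100 * (1 - optimized_cost / baseline_cost)) AS reduction_pct,
       CASE
         WHEN optimized_cost <= baseline_cost * 0.50
              AND baseline_cost - optimized_cost > 100 THEN 'critical nudge'
         WHEN optimized_cost <= baseline_cost * 0.95  THEN 'warning'
         ELSE 'already optimal'
       END AS classification
FROM (VALUES (1076.0, 295.0)) AS q (baseline_cost, optimized_cost);
```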
Limitations
Statistics can be stale
The accuracy of the analysis depends on the accuracy of your database’s statistics. PostgreSQL collects statistics via ANALYZE (or autovacuum’s auto-analyze), and these are sample-based snapshots. If your data distribution has changed significantly since the last ANALYZE, the statistics — and therefore the optimizer’s estimates — may not reflect reality.
Signs of stale statistics:
- Tables that have had large bulk inserts or deletes since the last ANALYZE
- Columns with highly skewed distributions where the sample didn’t capture the skew
- Tables where autovacuum is infrequent or disabled
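One way to check for these conditions is PostgreSQL's cumulative statistics views, which track modification counts and analyze timestamps per table:

```sql
-- Tables with the most changes since their statistics were last refreshed:
SELECT relname,
       n_mod_since_analyze,   -- rows inserted/updated/deleted since last ANALYZE
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC;
```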
You can manually refresh statistics on specific tables:
```sql
ANALYZE my_table;
```

Write overhead isn’t measured
The analyzer evaluates the read performance impact of adding indexes. It doesn’t account for the write overhead — every INSERT, UPDATE, and DELETE must also update each index on the affected table. For write-heavy tables, adding indexes has a real cost that isn’t reflected in the optimization percentage.
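A related check you can run yourself: PostgreSQL's index statistics reveal indexes that are maintained on every write but never used for reads, which is the write-overhead trade-off at its worst:

```sql
-- Indexes that have never served a scan but still cost every write:
SELECT indexrelname,
       idx_scan,                                        -- scans served by the index
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```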
Generic plans only
The analyzer uses GENERIC_PLAN mode, which produces plans without specific parameter values. This matches how PostgreSQL plans most prepared statements after the first few executions, but it means the analysis doesn’t account for parameter-specific plan choices. In rare cases, a query that benefits from an index for most parameter values might not benefit for specific ones (or vice versa).
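The difference is visible when you compare the two EXPLAIN forms side by side. The table and column here are hypothetical, and GENERIC_PLAN requires PostgreSQL 16 or later:

```sql
-- Generic plan: no parameter values, one plan for all bindings:
EXPLAIN (GENERIC_PLAN)
SELECT * FROM orders WHERE status = $1;

-- Value-specific plan: the planner may choose differently for a common value:
EXPLAIN
SELECT * FROM orders WHERE status = 'shipped';
```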
Statistics target limits
PostgreSQL’s default statistics target is 100 (the number of histogram buckets and most-common-values entries). For columns with very high cardinality or unusual distributions, this may not capture enough detail for precise estimates. You can increase the target per column:
```sql
ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 500;
ANALYZE my_table;
```

This gives the planner (and the analyzer) more detailed distribution data to work with.