CI Integration

The analyzer doubles as a GitHub Action that reviews every pull request for index recommendations. It captures the queries your test suite runs against a real PostgreSQL database, analyzes them against your schema, and posts a comment on the PR with its findings.

This works with any language, ORM, or query builder — the analyzer reads PostgreSQL’s auto_explain logs, so it sees every query regardless of how it was generated.
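To make the log-reading step concrete, here is a minimal sketch of pulling a query back out of one auto_explain entry. The sample text is hand-written for illustration (real entries carry a timestamp and process prefix); this is not the analyzer's actual parser.

```python
import json

# One auto_explain entry roughly as it appears in the server log
# (illustrative sample, not real analyzer output).
entry = """LOG:  duration: 0.412 ms  plan:
{
  "Query Text": "SELECT * FROM users WHERE email = $1",
  "Plan": {
    "Node Type": "Seq Scan",
    "Relation Name": "users",
    "Total Cost": 155.00
  }
}"""

# Everything after the first "plan:" marker is a JSON document.
plan = json.loads(entry.split("plan:", 1)[1])

print(plan["Query Text"])         # the statement the test suite ran
print(plan["Plan"]["Node Type"])  # root plan node
```

Because the plans are captured at the log level, it makes no difference whether the query came from an ORM, a query builder, or raw SQL.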

How it works:

  1. Your CI pipeline starts PostgreSQL with auto_explain logging enabled.
  2. Your migrations and seed scripts set up the schema and seed data.
  3. Your test suite (integration, e2e, load tests, etc.) runs queries against that database.
  4. The analyzer parses the auto_explain logs, introspects the schema, and generates index recommendations.
  5. A comment is posted to the PR with the results.

Queries can run inside rolled-back transactions and will still be captured. The analyzer’s own work is also done in transactions that are always rolled back — no data is modified.

You'll need:

  • GitHub Actions on an Ubuntu runner (ships with PostgreSQL pre-installed)
  • A test suite that hits a real PostgreSQL database — the source doesn’t matter (unit, integration, e2e)
  • At least one row per table in your seed data so the analyzer can test index configurations
  • pull-requests: write permission for the job so the analyzer can post PR comments

The analyzer action requires these environment variables:

| Variable | Required | Description |
| --- | --- | --- |
| `POSTGRES_URL` | Yes | PostgreSQL connection string (`postgres://user@host/db`) |
| `LOG_PATH` | Yes | Path to the auto_explain log file; must match `log_directory`/`log_filename` in your PostgreSQL config |
| `GITHUB_TOKEN` | Yes | GitHub token for posting PR comments; use `${{ github.token }}` |

Ubuntu runners ship with PostgreSQL 16. Append the auto_explain configuration and start the service:

```yaml
jobs:
  query-doctor:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Start PostgreSQL
        run: |
          sudo tee -a /etc/postgresql/16/main/postgresql.conf <<EOF
          shared_preload_libraries = 'auto_explain'
          auto_explain.log_min_duration = 0
          auto_explain.log_analyze = true
          auto_explain.log_verbose = true
          auto_explain.log_buffers = true
          auto_explain.log_format = 'json'
          logging_collector = on
          log_directory = '/var/log/postgresql'
          log_filename = 'postgres.log'
          EOF
          sudo tee /etc/postgresql/16/main/pg_hba.conf > /dev/null <<EOF
          host all all 127.0.0.1/32 trust
          host all all ::1/128 trust
          local all all peer
          EOF
          sudo systemctl start postgresql.service
          sudo -u postgres createuser -s -d -r -w runner
          sudo -u postgres createdb testing
          sudo chmod 666 /var/log/postgresql/postgres.log
```

Change the username (runner) and database name (testing) to suit your project.
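If PostgreSQL starts but the log never fills with plans, the usual cause is a typo in the appended settings. A small standalone check along these lines can catch that before the job runs; the required values mirror the config above, and `check_conf` is just a local helper, not part of the analyzer:

```python
# Settings appended in the step above; all must be present for plans to be logged.
REQUIRED = {
    "shared_preload_libraries": "'auto_explain'",
    "auto_explain.log_min_duration": "0",
    "auto_explain.log_format": "'json'",
    "logging_collector": "on",
}

def check_conf(text: str) -> list[str]:
    """Return the names of required settings missing from a postgresql.conf body."""
    present = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" in line:
            key, value = line.split("=", 1)
            present[key.strip()] = value.strip()
    return [k for k, v in REQUIRED.items() if present.get(k) != v]

# A fragment that forgot two of the settings:
conf = "shared_preload_libraries = 'auto_explain'\nauto_explain.log_min_duration = 0\n"
print(check_conf(conf))  # names of the settings still missing
```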

Run your migration and seed scripts against the database:

```yaml
- name: Migrate and seed
  run: npm run migrate && npm run seed
  env:
    POSTGRES_URL: postgres://runner@localhost/testing
```

The analyzer needs at least one row per table to test index configurations — empty tables produce no useful query plans. Your seed script should insert representative data: a handful of rows per table is enough, but make sure every table your queries touch has data. If you don’t have a seed script, a simple SQL file with INSERT statements works fine:

```yaml
- name: Migrate and seed
  run: |
    npm run migrate
    psql postgres://runner@localhost/testing -f seed.sql
```
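If you would rather generate that seed.sql than hand-write it, a throwaway script along these lines does the job. The `users` and `orders` tables are placeholders, not anything the analyzer requires; substitute your own schema:

```python
# Placeholder schema: substitute your own tables and columns.
rows = {
    "users": [{"id": 1, "email": "a@example.com"}],
    "orders": [{"id": 1, "user_id": 1, "total": 42}],
}

def sql_literal(value):
    """Render a Python value as a SQL literal (strings get quote-escaped)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

statements = []
for table, records in rows.items():
    for record in records:
        cols = ", ".join(record)
        vals = ", ".join(sql_literal(v) for v in record.values())
        statements.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")

# One INSERT per row; at least one row per table keeps the planner honest.
with open("seed.sql", "w") as f:
    f.write("\n".join(statements) + "\n")
```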

Execute whatever tests generate database queries:

```yaml
- name: Run tests
  run: npm run test:integration
  env:
    POSTGRES_URL: postgres://runner@localhost/testing

- name: Run Query Doctor
  uses: query-doctor/analyzer@v0
  env:
    GITHUB_TOKEN: ${{ github.token }}
    POSTGRES_URL: postgres://runner@localhost/testing
    LOG_PATH: /var/log/postgresql/postgres.log
```

LOG_PATH tells the analyzer where to find the auto_explain log file. It must match the log_directory/log_filename you configured in step 1.

The analyzer reads the logs, introspects the database schema, and posts a PR comment with index recommendations.

By default, the analyzer works with the statistics generated by your test data. For more accurate recommendations that reflect real-world data distribution, you can sync statistics from your production database.

Create the following function in your production database:

```sql
CREATE OR REPLACE FUNCTION _qd_dump_stats(include_sensitive_info boolean)
RETURNS jsonb AS $$
SELECT json_agg(t) FROM (
  SELECT
    c.table_name AS "tableName",
    c.table_schema AS "schemaName",
    cl.reltuples, cl.relpages, cl.relallvisible,
    json_agg(
      json_build_object(
        'columnName', c.column_name,
        'dataType', c.data_type,
        'isNullable', (c.is_nullable = 'YES')::boolean,
        'stats', (
          SELECT json_build_object(
            'stanullfrac', s.stanullfrac,
            'stawidth', s.stawidth,
            'stadistinct', s.stadistinct,
            'stakind1', s.stakind1, 'stanumbers1', s.stanumbers1,
            'stakind2', s.stakind2, 'stanumbers2', s.stanumbers2,
            'stakind3', s.stakind3, 'stanumbers3', s.stanumbers3,
            'stakind4', s.stakind4, 'stanumbers4', s.stanumbers4,
            'stakind5', s.stakind5, 'stanumbers5', s.stanumbers5,
            'stavalues1', CASE WHEN $1 THEN s.stavalues1 ELSE NULL END,
            'stavalues2', CASE WHEN $1 THEN s.stavalues2 ELSE NULL END,
            'stavalues3', CASE WHEN $1 THEN s.stavalues3 ELSE NULL END,
            'stavalues4', CASE WHEN $1 THEN s.stavalues4 ELSE NULL END,
            'stavalues5', CASE WHEN $1 THEN s.stavalues5 ELSE NULL END
          ) FROM pg_statistic s
          WHERE s.starelid = a.attrelid AND s.staattnum = a.attnum
        )
      ) ORDER BY c.ordinal_position
    ) AS columns
  FROM information_schema.columns c
  JOIN pg_attribute a
    ON a.attrelid = (quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass
    AND a.attname = c.column_name
  JOIN pg_class cl ON cl.relname = c.table_name
  JOIN pg_namespace n ON n.oid = cl.relnamespace
  WHERE c.table_name NOT LIKE 'pg_%'
    AND n.nspname <> 'information_schema'
  GROUP BY c.table_name, c.table_schema, cl.reltuples, cl.relpages, cl.relallvisible, n.nspname
) t;
$$ LANGUAGE sql STABLE SECURITY DEFINER;
```

Then dump and provide the stats file:

```sh
psql -d yourdb -At -F "" -c "select _qd_dump_stats(false)" > stats.json
```

Pass false to include_sensitive_info to exclude actual cell values from the dump — only statistical distributions are included. Pass true if you need the most accurate recommendations and your data isn’t sensitive.
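Before committing stats.json, it is worth confirming that no cell values slipped in. The sketch below checks a hand-built sample that mirrors the shape the function above emits (an array of tables, each with planner counters and a `columns` array); with `include_sensitive_info = false`, every `stavaluesN` field should be null:

```python
import json

# Minimal sample matching the shape _qd_dump_stats emits (abbreviated).
stats = json.loads("""
[
  {
    "tableName": "users",
    "schemaName": "public",
    "reltuples": 1000,
    "relpages": 12,
    "relallvisible": 12,
    "columns": [
      {
        "columnName": "email",
        "dataType": "text",
        "isNullable": false,
        "stats": {"stanullfrac": 0, "stawidth": 32, "stadistinct": -1,
                  "stavalues1": null, "stavalues2": null, "stavalues3": null,
                  "stavalues4": null, "stavalues5": null}
      }
    ]
  }
]
""")

# Any non-null stavaluesN means actual cell values leaked into the dump.
leaks = [
    (t["tableName"], c["columnName"])
    for t in stats
    for c in t["columns"]
    if c["stats"] and any(c["stats"].get(f"stavalues{i}") is not None for i in range(1, 6))
]
print("sensitive values leaked:", leaks)  # an empty list means the dump is clean
```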