
Modify Statistics

PostgreSQL and other database systems decide how to execute a query based on metadata about the data (records) a table contains. Using real statistics (e.g. from Production) or realistically faked ones (e.g. an expected future record count) allows our system to give meaningful insights without needing large volumes of data. Magic!
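To see why this matters, EXPLAIN reports the planner's row estimates, which are derived from stored statistics rather than from scanning the data. A minimal sketch (the table and column names here are hypothetical):

```sql
-- The rows=... figure in the resulting plan comes from pg_class.reltuples
-- and the pg_statistic entries for the filtered column, not from reading
-- the table itself.
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';

-- Statistics are normally refreshed with:
ANALYZE orders;
```

Because the estimates depend only on this metadata, importing realistic statistics can change plan choices even on a near-empty database.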

When configuring the connection string to sync from an external database, you are presented with options for the source of the statistics. Select the From connection option and the statistics will be gathered at the same time the schema is pulled.

Coming soon: specify different sources for schemas and queries versus statistics.

Manually importing statistics from a source database involves creating a JSON file containing the metadata from that database. This file is generated via a helper function that you create in the database.

  1. Connect to the desired database in the PostgreSQL instance (e.g. using the psql utility) and run the command below.

    This action requires the connected user to have the CREATE privilege on the schema.

    The include_sensitive_info argument determines whether metadata columns that could contain sensitive information, such as the most common values for a column, are collected. true is the only option fully supported at this time.

    CREATE OR REPLACE FUNCTION _qd_dump_stats(include_sensitive_info boolean) RETURNS jsonb AS $$
    WITH table_columns AS (
      SELECT
        c.table_name,
        c.table_schema,
        cl.reltuples,
        cl.relpages,
        cl.relallvisible,
        -- cl.relallfrozen,
        n.nspname AS schema_name,
        json_agg(
          json_build_object(
            'columnName', c.column_name,
            'dataType', c.data_type,
            'isNullable', (c.is_nullable = 'YES')::boolean,
            'characterMaximumLength', c.character_maximum_length,
            'numericPrecision', c.numeric_precision,
            'numericScale', c.numeric_scale,
            'columnDefault', c.column_default,
            'stats', (
              SELECT json_build_object(
                'starelid', s.starelid,
                'staattnum', s.staattnum,
                'stanullfrac', s.stanullfrac,
                'stawidth', s.stawidth,
                'stadistinct', s.stadistinct,
                'stakind1', s.stakind1, 'staop1', s.staop1, 'stacoll1', s.stacoll1, 'stanumbers1', s.stanumbers1,
                'stakind2', s.stakind2, 'staop2', s.staop2, 'stacoll2', s.stacoll2, 'stanumbers2', s.stanumbers2,
                'stakind3', s.stakind3, 'staop3', s.staop3, 'stacoll3', s.stacoll3, 'stanumbers3', s.stanumbers3,
                'stakind4', s.stakind4, 'staop4', s.staop4, 'stacoll4', s.stacoll4, 'stanumbers4', s.stanumbers4,
                'stakind5', s.stakind5, 'staop5', s.staop5, 'stacoll5', s.stacoll5, 'stanumbers5', s.stanumbers5,
                'stavalues1', CASE WHEN $1 THEN s.stavalues1 ELSE NULL END,
                'stavalues2', CASE WHEN $1 THEN s.stavalues2 ELSE NULL END,
                'stavalues3', CASE WHEN $1 THEN s.stavalues3 ELSE NULL END,
                'stavalues4', CASE WHEN $1 THEN s.stavalues4 ELSE NULL END,
                'stavalues5', CASE WHEN $1 THEN s.stavalues5 ELSE NULL END
              )
              FROM pg_statistic s
              WHERE s.starelid = a.attrelid AND s.staattnum = a.attnum
            )
          )
          ORDER BY c.ordinal_position
        ) AS columns
      FROM information_schema.columns c
      JOIN pg_attribute a
        ON a.attrelid = (quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass
       AND a.attname = c.column_name
      JOIN pg_class cl
        ON cl.oid = a.attrelid
      JOIN pg_namespace n
        ON n.oid = cl.relnamespace
      WHERE c.table_name NOT LIKE 'pg_%'
        AND n.nspname <> 'information_schema'
        AND c.table_name NOT IN ('pg_stat_statements', 'pg_stat_statements_info')
      GROUP BY c.table_name, c.table_schema, cl.reltuples, cl.relpages, cl.relallvisible, n.nspname
    ),
    table_indexes AS (
      SELECT
        t.relname AS table_name,
        json_agg(
          json_build_object(
            'indexName', i.relname,
            'reltuples', i.reltuples,
            'relpages', i.relpages,
            'relallvisible', i.relallvisible
            -- 'relallfrozen', i.relallfrozen
          )
        ) AS indexes
      FROM pg_class t
      JOIN pg_index ix ON ix.indrelid = t.oid
      JOIN pg_class i ON i.oid = ix.indexrelid
      JOIN pg_namespace n ON n.oid = t.relnamespace
      WHERE t.relname NOT LIKE 'pg_%'
        AND n.nspname <> 'information_schema'
      GROUP BY t.relname
    )
    SELECT json_agg(
      json_build_object(
        'tableName', tc.table_name,
        'schemaName', tc.table_schema,
        'reltuples', tc.reltuples,
        'relpages', tc.relpages,
        'relallvisible', tc.relallvisible,
        -- 'relallfrozen', tc.relallfrozen,
        'columns', tc.columns,
        'indexes', COALESCE(ti.indexes, '[]'::json)
      )
    )
    FROM table_columns tc
    LEFT JOIN table_indexes ti
      ON ti.table_name = tc.table_name;
    $$ LANGUAGE sql STABLE SECURITY DEFINER;
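    As an optional sanity check (this step is a suggestion, not part of the official procedure), you can confirm the function was created and returns a JSON array before exporting:

    ```sql
    -- Should return 'array' once at least one user table has statistics;
    -- a NULL result means no tables matched the filters in the function.
    SELECT jsonb_typeof(_qd_dump_stats(true));
    ```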
  2. Execute the function and save the resulting JSON output to a file.


    One way to do this is to run the command below in a terminal that has access to both the psql utility and the source system. Substitute the appropriate values for <yourdb>, the connection options to the instance, and the output file. As written, the resulting file will be named stats.json and created in the current working directory.

    psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(true)" > stats.json
  3. After syncing the same source system, navigate to the Setup / Statistics page and click the Import Statistics button:

    Import Statistics Button

    In the file selector modal, navigate to and select the file generated in the previous step. Once added, navigate back to Live Queries to see the updated planner behavior.

Coming (back) soon!

Coming soon!

A few common issues are noted below. If you need further assistance, please reach out on Discord.

Confirm that the connected user has the appropriate permissions to perform these tasks.

If you receive the following error:

ERROR: function _qd_dump_stats() does not exist

Confirm the following:

  • The name of the database that the function was created in.
  • That the function was created successfully.
  • The name of the database that you are connecting to when executing the function.

It should be the same database in both places.
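If in doubt, you can query the catalogs directly to see whether (and in which schema) the function exists in the database you are currently connected to. A sketch:

```sql
-- Lists the schema and name of the helper function if it exists in the
-- current database; zero rows means it was created elsewhere (or the
-- CREATE FUNCTION statement failed).
SELECT n.nspname AS schema, p.proname AS function
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.proname = '_qd_dump_stats';
```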