Open source · MIT licensed · v1.4.0 · Context7 plugin

A Data Questioning Tool that tells you the what and the why.

Unifies your scattered data into one source of truth. Upgrades your existing models, dashboards, and queries into a causal semantic layer you didn't have to write. Picks up on trends and surfaces business insights, all wrapped in a quality harness that puts guardrails on the AI so the reports it generates stay on-spec.

Built for ClickHouse and BigQuery first. Snowflake · Databricks · others: WIP, contributors welcome ↗

New · LLM Wiki semantic layer

Your task board is already a semantic layer. dqt extracts it.

Dump tickets, SQL, and BI reports into raw/. Point Claude Code at the vault — it synthesises dataset descriptions, metric definitions, and causal edges into wiki/. No manual YAML authoring.

Based on Karpathy's LLM Wiki pattern ↗

★ Star on GitHub →

35 detector algorithms · 29 declarative checks · multiple warehouse engines · MIT (no vendor lock-in)

The hour after the alert

Most monitoring tools tell you a row count dropped.
They don't tell you why.

You set a threshold. It fires. Slack lights up. Now you're bouncing between dbt docs, the warehouse, and your BI tool — trying to figure out which upstream model changed, whether the spike in nulls explains the dashboard regression, and whether this is worth waking the on-call engineer for.

dqt was built for the part that comes after the alert. It reads your dbt manifest, parses your warehouse SQL into a column-level lineage graph, runs 35 statistical detectors and 29 declarative checks, and discovers causal relationships across your metrics — so the next time something moves, you already know what moved it.

Without dqt

✗ orders.amount null_fraction ≥ 0.05: threshold exceeded

Now what? Go dig through git log, dbt docs, warehouse history…

With dqt

✗ orders.amount null_fraction = 12.4% (baseline 0.3%)

Lineage: stg_payments → orders → revenue. Schema break in stg_payments 6h ago.

Causal candidate: stg_payments → orders.amount (E-value 3.2, pending human review).

Four layers. One library.

Statistical detectors

Every column. Every run.

MAD, double-MAD, isolation forest, KS, STL residual z-scores, adjusted boxplot fences. Plus completeness, validity, freshness, schema-change, and SQL-assertion checks. Every detector returns the same (verdict, score, plain_english) shape.

mad_outlier_fraction · ks_pvalue · stl_residual_zscore · isolation_forest_fraction
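To make the shared result shape concrete, here is a minimal sketch of a MAD-based outlier fraction in plain Python. It is not dqt's implementation: the function name only mirrors the mad_outlier_fraction slug, and the 3.5 cutoff (a common rule of thumb for robust z-scores), the 1% and 5% thresholds, and the wording of the summary are all assumptions chosen for illustration.

```python
from statistics import median


def mad_outlier_fraction(values, cutoff=3.5, warn=0.01, fail=0.05):
    """Return (verdict, score, plain_english) for a list of numbers.

    Illustrative only: cutoff and thresholds are assumed, not dqt defaults.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so the robust z-score is undefined.
        # This sketch reports zero outliers rather than dividing by zero.
        score = 0.0
    else:
        # 0.6745 rescales MAD so the score is comparable to a z-score
        # when the data happen to be normal.
        outliers = sum(1 for v in values if 0.6745 * abs(v - med) / mad > cutoff)
        score = outliers / len(values)

    if score >= fail:
        verdict = "fail"
    elif score >= warn:
        verdict = "warn"
    else:
        verdict = "pass"
    return verdict, score, f"{score:.1%} of values are outliers ({verdict})"


verdict, score, plain_english = mad_outlier_fraction(
    [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]
)
print(verdict, score, plain_english)
```

Returning the same three fields from every detector is what lets a report or alert treat a MAD check and a freshness check identically.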

Column-level lineage

Parsed from your SQL.

dqt walks your dbt manifest and warehouse DDL with sqlglot to build a column-level dependency graph. From any incident, get an automatic blast radius — every downstream table and metric, ranked by exposure.
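The blast-radius idea can be sketched as a breadth-first walk over a column-level edge list. This is an illustration under stated assumptions, not dqt's code: the edges are written by hand rather than parsed with sqlglot, the column names beyond stg_payments, orders, and revenue are hypothetical, and hop distance stands in for whatever exposure ranking dqt actually uses.

```python
from collections import deque


def blast_radius(edges, start):
    """Map every column downstream of `start` to its hop distance."""
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in distance:  # first visit is the shortest path in BFS
                distance[child] = distance[node] + 1
                queue.append(child)

    del distance[start]  # the incident column is not part of its own blast radius
    return distance


# Hypothetical column-level edges (upstream, downstream).
EDGES = [
    ("stg_payments.amount", "orders.amount"),
    ("orders.amount", "revenue.total"),
    ("orders.amount", "dashboard.avg_order_value"),
    ("stg_users.id", "orders.user_id"),
]

impacted = blast_radius(EDGES, "stg_payments.amount")
for column, hops in sorted(impacted.items(), key=lambda item: item[1]):
    print(hops, column)
```

Columns fed by an unrelated source, such as orders.user_id here, stay out of the result, which is the point of tracking lineage per column rather than per table.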

LLM Wiki · Semantic layer

raw/ holds facts. wiki/ holds knowledge.

dqt uses Karpathy's LLM Wiki pattern. Dump your Trello tickets, SQL files, and BI reports into raw/. Point Claude Code at the vault. It synthesises wiki/ — dataset descriptions, metric definitions, causal edges — from the artifacts your team already has. YAML contracts compatible with dbt's semantic_models.yml.

raw/tickets/ · raw/sql/ · raw/reports/ → wiki/metrics/ · wiki/lineage/

Causal discovery

Granger. PCMCI+. Transfer Entropy.

dqt runs causal discovery across your metric time series, prunes edges with stability selection, and proposes directed metric→metric relationships annotated with lag, confidence, and E-values. Every edge reviewed by a human before entering the production DAG.

The only data questioning tool that ships causal discovery.

Karpathy's LLM Wiki pattern

Your data warehouse
already has documentation.
It's in your Trello board.

Every BI request your GTM team filed is a semantic definition waiting to be extracted. The ticket says what the metric means. The SQL says how it's computed. The report says what thresholds matter.

dqt uses Karpathy's LLM Wiki structure: raw/ for atomic source documents, wiki/ for synthesised knowledge. Point Claude Code at the vault and it writes the semantic layer for you — from the artifacts your team already has.

Read the full workflow guide →

1. Export Trello tickets + attachments
   SQL files, report HTMLs, metric definitions

2. Put them in raw/
   raw/tickets/ · raw/sql/ · raw/reports/ · raw/schema/

3. Point Claude Code at the vault
   cd vault && claude .

4. Claude Code synthesises wiki/
   datasets, metrics, lineage, causal edges, all grounded in your actual data

5. dqt generates per-column docs + checks
   write_vault() · dqt run checks.yaml
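If you want to set up the folder layout before exporting anything, a small script is enough. The sketch below is not part of dqt: init_vault is a hypothetical helper, and it creates only the directory names that appear on this page (four under raw/, two under wiki/).

```python
import tempfile
from pathlib import Path

RAW_DIRS = ("tickets", "sql", "reports", "schema")
WIKI_DIRS = ("metrics", "lineage")


def init_vault(root):
    """Create the raw/ and wiki/ folders and return them as relative paths."""
    root = Path(root)
    for name in RAW_DIRS:
        (root / "raw" / name).mkdir(parents=True, exist_ok=True)
    for name in WIKI_DIRS:
        (root / "wiki" / name).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


# Demonstrated in a throwaway directory; point it at your real vault path instead.
with tempfile.TemporaryDirectory() as tmp:
    layout = init_vault(tmp)
print("\n".join(layout))
```

Because every mkdir call uses exist_ok=True, the helper is safe to re-run on a vault that already has content.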

Recommended workflow

Use dqt with Claude Code.

Three plugins turn Claude Code into a grounded data questioning engineer that knows your warehouse, knows the dqt API, and can run checks from natural language.

1. Context7 ↗

Up-to-date dqt docs

Connects Claude Code to dqt's live documentation and source — all 64 detector and check slugs, the exact YAML schema, and adapter protocol. No training-data lag.

▸ Write checks from business rules
▸ Pick the right detector for your data shape
▸ Debug failures with current API knowledge

2. Superpowers ↗

Agentic development skills

Gives Claude Code structured workflows for planning, executing, and reviewing multi-step tasks — essential for building out a full dqt check suite or semantic layer from scratch.

▸ Plan + execute check suites step by step
▸ TDD for detector configs
▸ Subagent-driven semantic layer build

3. Warehouse MCP ↗

Live warehouse access

Each warehouse publishes its own MCP — e.g. mcp-clickhouse for ClickHouse, or the Postgres MCP server. Connect Claude Code to your warehouse and it can inspect live schemas, sample real distributions, and write dqt checks grounded in your actual data.

▸ Schema introspection from live tables
▸ Sample-driven detector recommendations
▸ Auto-generate semantic.yaml from DDL

Claude Code · all three plugins active

you

Look at my orders table and write dqt checks for amount_usd. Flag outliers and distribution shift.

claude code

Querying your warehouse via MCP...
orders.amount_usd: right-skewed, p99=$4,820.
Context7 confirms: use adjusted_boxplot_fraction (handles skew via medcouple) + ks_pvalue for drift.

checks:
  - table_name: orders
    column_name: amount_usd
    detector_slug: adjusted_boxplot_fraction
  - table_name: orders
    column_name: amount_usd
    detector_slug: ks_pvalue
    params: {alpha: 0.01}

Full detector catalog

Three distinct problem domains: point outliers, distribution drift, and time-series anomalies. For any given column you typically need one or two — auto_outlier_fraction picks the right one automatically based on the data's distribution. Note: zscore_outlier_fraction assumes normality — use MAD or double-MAD on real warehouse data.
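One way the z-score caveat shows up in practice is that a single extreme value inflates the standard deviation enough to hide itself. The sketch below compares a plain z-score rule with a MAD rule on ten hypothetical order amounts. The function names and cutoffs (3.0 and 3.5) are assumptions for illustration, not dqt's detectors or defaults.

```python
from statistics import mean, median, stdev


def zscore_flags(values, cutoff=3.0):
    """Values more than `cutoff` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > cutoff]


def mad_flags(values, cutoff=3.5):
    """Values whose MAD-based robust z-score exceeds `cutoff`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [v for v in values if 0.6745 * abs(v - med) / mad > cutoff]


amounts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]
print(zscore_flags(amounts))  # [] : 500 sits only about 2.85 sigma from the mean
print(mad_flags(amounts))     # [500]
```

With ten values, no point can exceed (n - 1) / sqrt(n) ≈ 2.85 sample standard deviations, so a cutoff of 3 can never fire here. The median and MAD barely move when one value explodes, which is why the robust rule still catches it.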

Statistical & ML algorithms · 35

Declarative checks · 29

Three lines to your first check.

Runs in notebooks. Runs in CI.
No server required.

from dqt import Check, Runner, MemoryStore

check = Check(
    schema_name="public",
    table_name="orders",
    column_name="amount",
    detector_slug="mad_outlier_fraction",
)

# `adapter` is assumed to be a warehouse adapter instance you have already created
result = Runner(MemoryStore()).run(check, adapter)

print(result.plain_english)
# → "0.82% of values are outliers — within the 1% warn threshold"

No server required. The optional FastAPI service and dashboard are there when you want them — and stay out of the way when you don't.
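For CI, the usual pattern is to turn verdicts into a process exit code so a failing check blocks the pipeline. The helper below is a hypothetical sketch, not a dqt function. It assumes only that each result exposes one of the three verdict strings (pass, warn, fail).

```python
def exit_code(verdicts, fail_on_warn=False):
    """Return 1 if any verdict should block the pipeline, else 0."""
    blocking = {"fail", "warn"} if fail_on_warn else {"fail"}
    return 1 if any(v in blocking for v in verdicts) else 0


verdicts = ["pass", "warn", "pass"]  # stand-ins for result.verdict values
print(exit_code(verdicts))
print(exit_code(verdicts, fail_on_warn=True))
```

In a CI script, pass the returned value to sys.exit() so the job fails when a blocking verdict is present.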

From zero to first incident.

Getting started

Four steps. No database, no server. Runs in a notebook or a CI job — wherever Python runs.

Install

pip install dqtlib

Run your first check

from dqt import Runner, MemoryStore
from dqt.checks.models import Check
from dqt.adapters.local import LocalAdapter
import pandas as pd

df    = pd.read_csv("orders.csv")
store = MemoryStore()

check = Check(
    schema_name="public", table_name="orders",
    column_name="amount_usd",
    detector_slug="wasserstein_1",   # drift detection
)
result = Runner(store).run_in_memory(
    check,
    reference=df[df.date < "2024-01-01"],
    current  =df[df.date >= "2024-01-01"],
)
print(result.verdict, result.plain_english)

Read the result

verdict          pass · warn · fail                        threshold decision
score            0.3142                                    raw metric (Wasserstein distance)
plain_english    "Distance 0.31 — above warn threshold"    human-readable summary
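To see where a score like this comes from, here is a minimal sketch of the one-dimensional Wasserstein-1 distance, computed as the area between the two empirical CDFs. It is written from the textbook definition rather than taken from dqt. The warn and fail thresholds are hypothetical, since dqt's defaults are not stated on this page.

```python
from bisect import bisect_right


def wasserstein_1(reference, current):
    """Area between the two empirical CDFs (1-D Wasserstein-1 distance)."""
    ref, cur = sorted(reference), sorted(current)
    points = sorted(set(ref) | set(cur))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right), so the area is a rectangle.
        cdf_ref = bisect_right(ref, left) / len(ref)
        cdf_cur = bisect_right(cur, left) / len(cur)
        total += abs(cdf_ref - cdf_cur) * (right - left)
    return total


def verdict_for(score, warn, fail):
    """Map a score to pass / warn / fail using caller-supplied thresholds."""
    if score >= fail:
        return "fail"
    return "warn" if score >= warn else "pass"


reference = [10, 12, 11, 13]
current = [11, 13, 12, 14]  # every value shifted up by 1
score = wasserstein_1(reference, current)
print(score, verdict_for(score, warn=0.5, fail=2.0))
```

The distance is expressed in the column's own units (a uniform shift of 1 gives a distance of 1), so sensible thresholds depend on the scale of the data being compared.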

Open the dashboard

pip install "dqtlib[dashboard]"  # adds FastAPI + uvicorn
dqt dashboard --port 8080
# → http://127.0.0.1:8080

Checks, column distribution profiles, and Granger causality inference — all in one place. No signup, no cloud, no persistent state beyond the process.

Read the full guide → · All 64 checks & detectors →

Drop it in next to the tools you already use.

dbt: reads manifest.json and semantic_models.yml directly

Airflow · Dagster · Prefect: runs as one Python task

Snowflake · BigQuery · Postgres · Databricks: adapter-based; bring your own connection

OpenLineage: ingests events from any non-dbt pipeline

DuckDB: embedded analytics engine for sample-level stats

Install it. Point it at your warehouse.
See your first incident in five minutes.

★ Star on GitHub → · Open the dashboard →

Open source · MIT licensed · Python 3.12+ · No telemetry · No signup · No credit card

About the author

Anton Barr is an engineer and data geek with 25+ years building data systems. A student of 質 (shitsu): quality, substance, the inner nature of a thing. dqt is a personal project built by a practitioner who believes craft and precision are the same thing, and who got tired of tools that answer what but never why.