Datacenter Registry
Client: Internal • Industry: AI Infrastructure / Geospatial • Completed: June 15, 2026
Geospatial Data Provenance DuckDB Python Next.js
Challenge
Mapping the AI buildout means mapping its physical inputs — which datacenters exist, who owns them, how much power they draw, where that power comes from, and how the grid connects it all. That data is scattered across regulatory filings (FERC, EIA, ISO interconnection queues), company disclosures, trade press, and open geodata (HIFLD, PeeringDB, OSM), with no canonical, auditable place to assemble it. Aggregators exist but bury their sourcing, so a number can’t be trusted or contested.
Solution
Built a global datacenter registry where every fact carries its own provenance, paired with a static map dashboard that has no server dependency.
- 14-entity graph model — facilities, campuses, organizations, substations, generation assets, interconnection requests, transmission lines, fiber routes, landing points, chip deployments, cooling systems, supply contracts, relationships, and events — all modeled identically
- Observation envelope on every field value carrying source URL, trust tier, observed date, method, and confidence; estimates always flagged with their basis and never silently mixed with observed values
- Git-versioned canonical JSONL as source of truth: every data change is a diffable, auditable commit; DuckDB is a disposable build artifact rebuilt on every dcdb build
- Bulk ingest modules for HIFLD substations and transmission, EIA-860M generation assets, PeeringDB facility stubs, and OSM datacenters, plus a researched Tier-1 facility seed corpus
- Tier-based conflict resolution — all observations are kept; the build picks the best value by tier then recency, and the dashboard can surface “250 MW (filing) vs 300 MW (press)”
- Fully static Next.js + MapLibre dashboard serving PMTiles and GeoJSON with togglable power layers and click-through facility dossiers — no backend infrastructure
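The observation envelope and the tier-then-recency rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the registry's actual schema: the class name, field names, and both helper functions are hypothetical, and it assumes that a lower tier number means a more trusted source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One sourced claim about a single field (all names are illustrative)."""
    value: float
    source_url: str
    tier: int                    # assumption: 1 is the most trusted tier
    observed: date
    method: str                  # e.g. "filing" or "press"
    confidence: float            # 0.0 to 1.0
    is_estimate: bool = False
    basis: Optional[str] = None  # must be set when is_estimate is True

    def __post_init__(self):
        # Estimates are never allowed in without a stated basis.
        if self.is_estimate and not self.basis:
            raise ValueError("an estimate must state its basis")


def _rank(obs):
    """Sort key: best tier first, then most recent observation."""
    return (obs.tier, -obs.observed.toordinal())


def resolve(observations):
    """Pick the winning observation without discarding the others."""
    return min(observations, key=_rank)


def describe_conflict(observations, unit="MW"):
    """Render all competing values, best first, for display."""
    ranked = sorted(observations, key=_rank)
    return " vs ".join(f"{o.value:g} {unit} ({o.method})" for o in ranked)
```

Given a tier-1 filing reporting 250 MW and a tier-3 press report of 300 MW, `resolve` returns the filing while `describe_conflict` still produces "250 MW (filing) vs 300 MW (press)", so the losing value remains visible rather than being overwritten.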
Impact
- Coverage spanning ~6.8K facility stubs, ~73K substations, 7.1K generation plants, 50.3K transmission lines, and ~200 researched Tier-1 facility seeds
- Every fact is contestable: provenance and confidence travel with the value, so disagreements resolve against sources rather than assertions
- Canonical-JSONL-as-truth makes the entire registry diffable in git — corrections are reviewable history, not silent overwrites
- Static deployment to Cloudflare Pages means the map runs with zero backend cost or operational surface
- Built only on public, per-field-cited sources — an inspectable infrastructure graph feeding the SpotWire power and facility indices
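Why canonical JSONL diffs cleanly in git can be shown with a small Python sketch. It assumes, hypothetically, that each record has an "id" field and that canonical form means sorted records and sorted keys; the function names and record layout are illustrative and not taken from the project.

```python
import json


def to_canonical_jsonl(records, key="id"):
    """Serialize records as one JSON object per line in a stable order.

    Sorting records by `key` and sorting keys inside each object means the
    same data always produces the same bytes, regardless of input order.
    """
    lines = [
        json.dumps(rec, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
        for rec in sorted(records, key=lambda r: r[key])
    ]
    return "".join(line + "\n" for line in lines)


def changed_lines(before, after):
    """Return the lines of `after` that do not appear in `before`."""
    old = set(before.splitlines())
    return [line for line in after.splitlines() if line not in old]


# Hypothetical records: correcting one field touches exactly one line.
v1 = [{"id": "fac-002", "mw": 300}, {"id": "fac-001", "mw": 120}]
v2 = [{"id": "fac-001", "mw": 120}, {"id": "fac-002", "mw": 250}]
diff = changed_lines(to_canonical_jsonl(v1), to_canonical_jsonl(v2))
```

Because the input order and key order are normalized away, `diff` holds only the corrected fac-002 line, which is the property that makes each correction a small, reviewable commit.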