How Uplift OS Ingests 500,000+ General Ledger Transactions

A real-estate private equity fund our size (a thousand doors across a handful of assets, a low nine-figure portfolio) sits in an awkward middle. We’re large enough that property-level accounting in QuickBooks doesn’t scale, and small enough that an institutional Yardi-plus-five-data-engineers stack would consume the entire G&A budget. The question I had to answer in year one of Uplift Capital was simple: how do we run institutional-grade operating analytics on a fund that doesn’t have an institutional-sized team?

The answer is Uplift OS. It ingests roughly half a million general ledger transactions across our properties (current activity plus the history we inherited at acquisition), normalizes them into a single canonical schema, runs enrichment passes for fee leakage and unit economics, and feeds a small set of dashboards plus partner-letter exports. I built it myself. This post is a walk-through of how it works and why I think every operator-led fund will eventually run something like it.

The Problem in Plain Numbers

Uplift Capital owns and operates multifamily across DFW and the Texoma corridor. Each property uses a third-party property manager whose accounting system is some flavor of Yardi or RealPage. Every month, each property produces:

A trial balance
A general ledger detail (the line-item transactions — every rent payment, every plumbing invoice, every utility bill)
A rent roll
An A/R aging
A bank reconciliation

Multiply by 6 properties, 12 months. Add 4–5 years of historical data we inherited at acquisition. The arithmetic puts us at 500,000+ GL transactions sitting across dozens of disconnected exports, with overlapping but non-identical chart-of-accounts conventions, partial coverage of unit-level detail, and the occasional one-off CSV from a manager who’s still emailing us spreadsheets.
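As a rough sanity check, the arithmetic can be run backwards from the headline number to an implied volume per property per month. This is only a back-of-envelope sketch: it assumes five full years of monthly exports (the upper end of the inherited history) and an even spread across properties, neither of which is stated above.

```python
# Back-of-envelope check using only the figures quoted in the text.
properties = 6
months_per_year = 12
years_of_history = 5       # assumption: upper end of the "4-5 years" inherited
total_gl_rows = 500_000    # headline figure

# Number of monthly GL exports implied by those figures.
property_months = properties * months_per_year * years_of_history

# Average GL lines per property per month, assuming an even spread.
rows_per_property_month = total_gl_rows / property_months

print(property_months)                 # 360
print(round(rows_per_property_month))  # 1389
```

On those assumptions, each monthly GL detail export carries on the order of 1,400 lines.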

The job: turn that mess into something I can ask questions of in seconds.

“What’s the average days-to-fill across the portfolio in Q4 vs. Q3, broken out by submarket?”

“Show me every fee charged by Property Manager X that doesn’t have a corresponding lease event.”

“Across all 2024 turnovers, what’s our true cost-to-turn versus the budget? Which line items are blowing up?”

If those questions take more than thirty seconds to answer, the operation is running blind.

The Three Layers

Uplift OS is conceptually three layers stacked on a single Postgres instance. I’ll walk through each.

1. Ingestion

The ingestion layer’s only job is to land raw exports without losing fidelity. Every monthly upload from every property gets stored verbatim, in a per-property-per-month immutable table:

yardi_pm_johnson_creek_2024_12_gl_detail
yardi_pm_johnson_creek_2024_12_trial_balance
yardi_pm_johnson_creek_2024_12_rent_roll
realpage_buena_vista_2024_12_gl_detail
...

A few principles I had to internalize early:

Raw stays raw. We don’t touch the original export. Ever. If a transformation has a bug, I don’t want to re-fetch from a property manager who doesn’t owe me re-fetches.
Versioning is on every row. Each ingested row carries loaded_at, source_file_hash, and source_file_path. If the same property re-sends an amended export, both versions exist.
Per-source schema. Yardi and RealPage have meaningfully different GL detail schemas. I don’t try to coerce them at ingestion. That’s the next layer’s problem.

The Python script that handles ingestion is, deliberately, three hundred lines. It does one thing: parse the file, hash it, write to the right table. No analytics, no joins, no business logic. Roughly half the lines are exception handling for the dozen ways a CSV from a property manager can be malformed.
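A minimal sketch of that parse, hash, and write loop is below. It is not the actual script: it uses an in-memory SQLite database as a stand-in for Postgres, stores each parsed row as a JSON array of strings, and the function names, column layout beyond the three versioning fields, and file paths are all hypothetical.

```python
import csv
import hashlib
import io
import json
import re
import sqlite3
from datetime import datetime, timezone


def raw_table_name(source: str, prop: str, year: int, month: int, report: str) -> str:
    """Build a per-property-per-month table name in the style listed above."""
    name = f"{source}_{prop}_{year}_{month:02d}_{report}".lower()
    # The name is interpolated into SQL, so only allow safe identifier characters.
    if not re.fullmatch(r"[a-z0-9_]+", name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def ingest(conn: sqlite3.Connection, table: str, file_bytes: bytes, source_path: str) -> int:
    """Land one export verbatim and return the number of rows written."""
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "row_num INTEGER, raw_row TEXT, loaded_at TEXT, "
        "source_file_hash TEXT, source_file_path TEXT)"
    )
    # An identical re-send is a no-op; an amended file has a new hash and
    # lands alongside the earlier version.
    already_loaded = conn.execute(
        f"SELECT 1 FROM {table} WHERE source_file_hash = ? LIMIT 1", (file_hash,)
    ).fetchone()
    if already_loaded:
        return 0
    text = file_bytes.decode("utf-8-sig", errors="replace")
    rows = list(csv.reader(io.StringIO(text)))
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)",
        [(i, json.dumps(row), loaded_at, file_hash, source_path) for i, row in enumerate(rows)],
    )
    conn.commit()
    return len(rows)


# Usage with made-up file contents.
conn = sqlite3.connect(":memory:")
table = raw_table_name("yardi_pm", "johnson_creek", 2024, 12, "gl_detail")
original = b"date,account,amount\n2024-12-01,R&M - Interior,125.00\n"
amended = b"date,account,amount\n2024-12-01,R&M - Interior,152.00\n"

first = ingest(conn, table, original, "uploads/original.csv")
repeat = ingest(conn, table, original, "uploads/original.csv")
second = ingest(conn, table, amended, "uploads/amended.csv")
versions = conn.execute(f"SELECT COUNT(DISTINCT source_file_hash) FROM {table}").fetchone()[0]
```

The sketch skips exact duplicates by hash, which is one possible policy; keeping every upload, duplicates included, would be equally consistent with the principles above.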

2. Normalization

The normalization layer is where the chart-of-accounts wars are fought. Every property manager has its own conventions:

“Repairs and Maintenance” at one property is “R&M – Interior” + “R&M – Exterior” + “R&M – Make-Ready” at another.
“Property Management Fee” is sometimes a single fee of 4% of effective gross income, sometimes a base management fee plus a turnover supplement plus a leasing commission split, all charged separately.
One operator splits utilities by unit; another bills back at the property level.

I maintain a canonical chart of accounts — about 80 line items spanning operating revenue, operating expense, capex, financing, and reserves. Every raw GL row gets mapped to one canonical line. The mapping is rule-based first (regex on the GL description) and then LLM-assisted for the long tail. Every mapping decision is logged so I can audit later when the categorization looks off.
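The rule-first, LLM-second flow might look roughly like the sketch below. The regex rules, canonical account names, and confidence values are illustrative assumptions. The LLM step is represented by a plain callable so the example runs without any API; `fake_llm` is a hypothetical stand-in, not a real client.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical rules: (pattern on the raw GL description, canonical account).
RULES = [
    (re.compile(r"\bR&M\b|repairs?\s+(and|&)\s+maint", re.I), "repairs_and_maintenance"),
    (re.compile(r"\b(mgmt|management)\s+fee\b", re.I), "property_management_fee"),
    (re.compile(r"\b(electric|water|gas|sewer|trash)\b", re.I), "utilities"),
]


@dataclass
class Mapping:
    raw_account: str
    canonical_account: str
    mapping_method: str        # "rule" | "llm" | "manual"
    mapping_confidence: float  # 0-1

# Every decision is appended here so it can be audited later.
audit_log: List[Mapping] = []

LlmFallback = Callable[[str], Tuple[str, float]]


def map_account(raw_account: str, llm_fallback: Optional[LlmFallback] = None) -> Mapping:
    """Try the regex rules first, then the LLM callable, else queue for manual review."""
    result = None
    for pattern, canonical in RULES:
        if pattern.search(raw_account):
            result = Mapping(raw_account, canonical, "rule", 1.0)
            break
    if result is None and llm_fallback is not None:
        canonical, confidence = llm_fallback(raw_account)
        result = Mapping(raw_account, canonical, "llm", confidence)
    if result is None:
        result = Mapping(raw_account, "unmapped", "manual", 0.0)
    audit_log.append(result)
    return result


def fake_llm(description: str) -> Tuple[str, float]:
    """Stand-in for a model call; always returns the same made-up answer."""
    return ("operating_expense_other", 0.6)


by_rule = map_account("R&M – Make-Ready")
by_llm = map_account("Pest Control Contract", fake_llm)
unmapped = map_account("Pest Control Contract")
```

Putting the rules first means the LLM only sees descriptions no rule matched, which keeps the long tail small and every decision attributable to a method.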

The output of the normalization layer is the table I actually run analytics against:

parcel_year_gl_normalized:
  property_id        text
  property_name      text  -- denormalized for query convenience
  gl_year            int
  gl_month           int
  posted_date        date
  canonical_account  text  -- one of ~80 values
  raw_account        text  -- preserved for audit
  amount_cents       bigint
  vendor_name        text
  description        text
  source_export      text  -- which raw table this came from
  mapping_method     text  -- "rule" | "llm" | "manual"
  mapping_confidence float -- 0-1

That’s it. One wide table, one row per GL line, normalized account, full lineage back to the raw export. Every report in Uplift OS is a query against some slice of this table.
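To make "a query against some slice of this table" concrete, here is a small runnable sketch. It uses SQLite in place of Postgres, and the three rows, property IDs, vendor names, and the 0.8 review threshold are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parcel_year_gl_normalized (
        property_id TEXT, property_name TEXT, gl_year INTEGER, gl_month INTEGER,
        posted_date TEXT, canonical_account TEXT, raw_account TEXT,
        amount_cents INTEGER, vendor_name TEXT, description TEXT,
        source_export TEXT, mapping_method TEXT, mapping_confidence REAL
    )
""")

# Made-up sample rows, one tuple per GL line.
rows = [
    ("p1", "Johnson Creek", 2024, 12, "2024-12-03", "repairs_and_maintenance",
     "R&M - Interior", 12500, "Example Plumbing", "leak repair",
     "yardi_pm_johnson_creek_2024_12_gl_detail", "rule", 1.0),
    ("p1", "Johnson Creek", 2024, 12, "2024-12-09", "repairs_and_maintenance",
     "R&M - Make-Ready", 40000, "Example Turn Co", "unit turn",
     "yardi_pm_johnson_creek_2024_12_gl_detail", "rule", 1.0),
    ("p2", "Buena Vista", 2024, 12, "2024-12-05", "utilities",
     "Water/Sewer", 30000, "Example Water", "december bill",
     "realpage_buena_vista_2024_12_gl_detail", "llm", 0.7),
]
conn.executemany(
    "INSERT INTO parcel_year_gl_normalized VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)", rows
)

# Spend by property and canonical account for one year.
totals = conn.execute(
    """
    SELECT property_name, canonical_account, SUM(amount_cents) AS total_cents
    FROM parcel_year_gl_normalized
    WHERE gl_year = ?
    GROUP BY property_name, canonical_account
    ORDER BY property_name, canonical_account
    """,
    (2024,),
).fetchall()

# Lineage columns double as a review queue: LLM mappings below a threshold.
needs_review = conn.execute(
    "SELECT COUNT(*) FROM parcel_year_gl_normalized "
    "WHERE mapping_method = 'llm' AND mapping_confidence < ?",
    (0.8,),
).fetchone()[0]

print(totals)
print(needs_review)
```

Because amounts are stored as integer cents, sums stay exact, and the lineage columns let the same table serve both reporting and mapping audits.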

3. Enrichment

The enrichment layer is where the operating intelligence actually happens. Two passes I’ll describe here.

Pass 1: fee leakage. Property managers charge fees. Some are contractual and well-defined (4% management fee on EGI). Many are not. There are turn fees that should be capped, leasing commissions that should taper, late fees that should split with us, and processing fees on credit card payments that may or may not be permitted as pass-throughs. The fee-leakage detector flags every line item categorized as a fee, joins it to the relevant lease event (where applicable), checks it against the management agreement (which I’ve digitized as a structured document), and produces a per-property monthly fee-anomaly report.
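Two of those checks can be sketched in a few lines: comparing the management fee charged against the contractual percentage of EGI, and listing fee lines with no matching lease event. The field names, the tolerance, and the sample amounts are assumptions for illustration; the real detector and agreement structure are not shown in this post.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class FeeTerms:
    management_fee_bps: int      # 400 basis points = 4% of EGI
    tolerance_cents: int = 100   # ignore rounding-level differences


def management_fee_overcharge(egi_cents: int, fee_charged_cents: int, terms: FeeTerms) -> int:
    """Return cents charged above the contractual fee, or 0 if within tolerance."""
    expected = egi_cents * terms.management_fee_bps // 10_000
    overcharge = fee_charged_cents - expected
    return overcharge if overcharge > terms.tolerance_cents else 0


def fees_without_lease_event(
    fee_lines: List[Dict[str, Optional[str]]], lease_event_ids: Set[str]
) -> List[Dict[str, Optional[str]]]:
    """Anti-join: keep fee lines whose lease_event_id matches no known lease event."""
    return [line for line in fee_lines if line["lease_event_id"] not in lease_event_ids]


# Made-up month: EGI of $250,000.00 and a management fee billed at $10,500.00.
terms = FeeTerms(management_fee_bps=400)
overcharge = management_fee_overcharge(25_000_000, 1_050_000, terms)

fee_lines = [
    {"description": "Leasing commission", "lease_event_id": "le-1"},
    {"description": "Turn fee", "lease_event_id": None},
    {"description": "Leasing commission", "lease_event_id": "le-9"},
]
orphans = fees_without_lease_event(fee_lines, {"le-1", "le-2"})

print(overcharge)                           # 50000
print([o["description"] for o in orphans])  # ['Turn fee', 'Leasing commission']
```

Working in integer cents and basis points avoids floating-point drift, which matters when the anomaly being hunted is a small recurring difference.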

In our first full audit using this pass, we found systematic over-charging on two properties that totaled a mid-five-figure sum over the prior twelve months. The fee terms were all in writing in the management agreements; nobody had compared them to the GL line by line. This one pass paid for the entire build of Uplift OS.

Pass 2: unit-level economics. Most multifamily reporting stops at property-level NOI. That’s the level our trial balance arrives at, but it’s not where the interesting questions live. The questions I want to answer are unit-level: which floor plans are actually most profitable after turn cost? Which buildings have a deferred-maintenance penalty that doesn’t show up in headline NOI? Which submarkets are losing pricing power?

To get there, the enrichment pass joins:

Per-unit lease records (from the rent roll)
Per-unit work-order history (from the property manager’s CRM, where available)
Per-property capex schedules (from internal planning)

The output is a unit-year record — one row per unit per year, with revenue, expense allocation, turn cost, vacancy days, and a rolled-up unit NOI. From there, every floor-plan and building-level question is a GROUP BY away.
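A unit-year record and the roll-up it enables could be sketched as follows. The field names mirror the list in the paragraph above, but the class itself, the choice to subtract turn cost inside unit NOI, and the sample figures are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UnitYear:
    unit_id: str
    floor_plan: str
    year: int
    revenue_cents: int
    allocated_expense_cents: int
    turn_cost_cents: int
    vacancy_days: int

    @property
    def unit_noi_cents(self) -> int:
        # Assumption: unit NOI here is net of both allocated expense and turn cost.
        return self.revenue_cents - self.allocated_expense_cents - self.turn_cost_cents


def avg_noi_by_floor_plan(records: List[UnitYear]) -> Dict[str, int]:
    """The in-memory equivalent of GROUP BY floor_plan with AVG(unit_noi)."""
    sums: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for record in records:
        sums[record.floor_plan][0] += record.unit_noi_cents
        sums[record.floor_plan][1] += 1
    return {plan: total // count for plan, (total, count) in sums.items()}


# Made-up records for three units in one year.
records = [
    UnitYear("A1", "1x1", 2024, 1_440_000, 600_000, 0, 0),
    UnitYear("A2", "1x1", 2024, 1_320_000, 600_000, 250_000, 30),
    UnitYear("B1", "2x2", 2024, 1_800_000, 750_000, 150_000, 12),
]
averages = avg_noi_by_floor_plan(records)
print(averages)  # {'1x1': 655000, '2x2': 900000}
```

In the sample, the 1x1 that turned during the year pulls its floor plan's average down, which is exactly the kind of effect property-level NOI hides.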

What This Costs to Build (and Run)

I want to be specific because I think most operators read about these systems and assume the cost is institutional.

The full Uplift OS infrastructure runs on:

A self-hosted Postgres instance ($0 — runs on hardware I already own)
A handful of Python scripts on the same box
A Next.js dashboard for querying the wide normalized table
Claude API calls for the LLM-assisted categorization (~$30/month at our ingestion rate)
About three weekends of my own time to build the v1, plus an hour or two a month of maintenance

Total monthly run cost: roughly $30, almost all of it Claude API. Total time-to-build: under 100 hours of my own work.

The substitute, institutional fund-administration outsourcing, would have cost low five figures monthly at our portfolio size and given us reports a month after the fact, in formats I couldn’t run my own queries against. The cost differential is roughly three orders of magnitude.

The Lesson, If There Is One

The framing I keep coming back to with operator-builders I talk to is this: operating intelligence is a moat that can no longer be outsourced. Five years ago, building a system like this required a small data team. Today, the same build is a few weekends of Python + Postgres + a careful read of the property management agreements, and the marginal cost of a query is a few cents.

If you’re operating a real estate fund and you don’t already have line-by-line visibility into your GL, you’re either paying someone a lot of money to maintain that visibility for you, or you don’t actually have it. Both states will become harder to defend over the next few years as it becomes obvious how much of “institutional-grade analytics” is actually three Python scripts and a willingness to read your own management agreements carefully.

Uplift OS is the work behind the work. It’s how I sleep at night knowing what’s really happening across the portfolio. It’s also why, when I talk to other GPs about going AI-native, I tell them to start with operating intelligence before they touch deal sourcing or any of the more glamorous use cases. The boring ingestion-and-normalization layer is the foundation everything else sits on.

If you’re building something similar, the most useful piece of advice I can give: don’t try to make ingestion clever. Make it dumb, immutable, and well-versioned. The cleverness belongs further up the stack, where it can be tested, audited, and replaced without re-fetching the world.