I Used AI to Build a Small-Cap Bank Screener. It Runs Without AI.
Community banks are one of the last places in public markets where the research advantage still belongs to the person willing to do unglamorous work.
There are roughly 4,000 U.S. commercial banks with publicly traded equity. Most have no analyst coverage. A meaningful fraction trade below tangible book. Many file their most important financial data not with the SEC, but with a banking regulator most equity investors have never heard of.
I built a system to screen them systematically. The system runs deterministically against public data with no AI involvement at runtime. But AI — specifically Claude Code and Codex working as development partners — built nearly every layer of it. That distinction is the point.
The data problem
If you want to understand a public industrial company, you read the 10-K. One document, standard structure, every analyst in the country has read it.
Banks are different. The authoritative fundamental data — capital adequacy, credit quality, loan composition, charge-off history — lives in quarterly call reports filed with the FFIEC, not in SEC filings. The SEC filings exist, but they’re structured for holding companies, not operating banks, and they elide the regulatory detail that actually matters for underwriting credit and capital.
The pipeline stitches five public sources into a single screening model:
| Source | What it provides |
|---|---|
| FFIEC Call Reports | Capital, credit quality, loan composition, profitability — quarterly regulatory filings, as of any quarter-end |
| FFIEC / NIC | Bank identity and holding-company structure; the RSSD ID that uniquely identifies each bank regardless of ticker |
| SEC EDGAR | Issuer-level equity facts: tangible common equity, shares outstanding, dividends, repurchases |
| Polygon | Ticker universe, daily prices, market cap, and 30-day average dollar volume across NYSE, Nasdaq, and OTC |
| EDGAR Form 4 | Insider transaction history — whether the people running the bank are buying its stock |
The identity-matching problem is underappreciated. A bank holding company (First National Corporation, say, trading on Nasdaq as FXNC) is a different legal entity from its operating bank subsidiary (First Bank), which is the entity that files call reports. Matching tickers to RSSD IDs requires resolving CIK-to-RSSD mappings from NIC data, with manual overrides for ambiguous cases. There are always ambiguous cases.
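The override-first matching described above can be sketched roughly as follows. This is an illustration, not the pipeline's actual code: the function name, the `Resolution` record, the three input mappings, and every ticker and ID in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resolution:
    ticker: str
    rssd_id: Optional[int]
    method: str  # "override", "cik_match", or "unresolved"


def resolve_rssd(ticker, ticker_to_cik, cik_to_rssds, overrides):
    """Map a ticker to a single RSSD ID, preferring manual overrides."""
    # Manual overrides win: they record a decision a human already made.
    if ticker in overrides:
        return Resolution(ticker, overrides[ticker], "override")

    cik = ticker_to_cik.get(ticker)
    candidates = cik_to_rssds.get(cik, []) if cik is not None else []

    # Accept the automated match only when it is unambiguous.
    if len(candidates) == 1:
        return Resolution(ticker, candidates[0], "cik_match")

    # Zero or several candidates: report it rather than guess.
    return Resolution(ticker, None, "unresolved")


# Hypothetical inputs; none of these identifiers are real.
ticker_to_cik = {"AAAA": 1001, "BBBB": 1002, "CCCC": 1003}
cik_to_rssds = {1001: [500001], 1002: [500002, 500003]}
overrides = {"BBBB": 500003}

results = [resolve_rssd(t, ticker_to_cik, cik_to_rssds, overrides)
           for t in ("AAAA", "BBBB", "CCCC")]
```

Returning an explicit `"unresolved"` result, rather than picking the first candidate, is one way to keep ambiguous cases visible until someone adds an override.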
Once those mappings are clean, the pipeline builds trailing-twelve-month aggregates for flow variables — net income, charge-offs, net interest income — from four sequential quarterly call reports, combined with point-in-time capital and balance sheet data. This produces a screen that works as of any quarter-end without waiting for annual filings.
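A minimal sketch of the trailing-twelve-month step, assuming each flow variable has already been reduced to standalone quarterly values keyed by quarter-end date. Call Report income items are generally reported year-to-date, so a real pipeline would need a de-cumulation step before this one. The function names and sample figures are hypothetical.

```python
from datetime import date


def quarter_index(d: date) -> int:
    """Number quarters so that consecutive quarter-ends differ by exactly 1."""
    return d.year * 4 + (d.month - 1) // 3


def ttm_sum(quarterly: dict, as_of: date) -> float:
    """Sum the four standalone quarterly values ending at `as_of`.

    `quarterly` maps quarter-end dates to standalone (not year-to-date) flows.
    Raises ValueError if any of the four quarters is missing, so a gap never
    silently turns into an understated TTM figure.
    """
    by_index = {quarter_index(d): v for d, v in quarterly.items()}
    end = quarter_index(as_of)
    needed = range(end - 3, end + 1)
    missing = [i for i in needed if i not in by_index]
    if missing:
        raise ValueError(f"missing {len(missing)} of 4 quarters as of {as_of}")
    return sum(by_index[i] for i in needed)


# Hypothetical net income by quarter, in $ millions.
net_income = {
    date(2023, 9, 30): 2.0,
    date(2023, 12, 31): 2.5,
    date(2024, 3, 31): 1.5,
    date(2024, 6, 30): 3.0,
}
ttm_net_income = ttm_sum(net_income, date(2024, 6, 30))
```

Point-in-time items such as capital ratios would simply be read from the `as_of` quarter rather than summed.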
The screening gates
The first pass is seven hard gates. A bank that fails any one of them doesn’t make the screen — not because these thresholds are sacred, but because they define the universe where the thesis is coherent.
| Metric | Threshold | Rationale |
|---|---|---|
| CET1 ratio | ≥ 9.0% | Well-capitalized floor plus operating buffer |
| Texas ratio | < 25% | NPA ÷ (TBV + loan-loss reserve); distress early warning |
| NPL ratio | < 3.0% | Nonperforming loans ÷ total loans |
| NCO rate (TTM) | < 1.0% | Net charge-offs ÷ average loans |
| Profitability | Positive TTM | Earners only — no turnarounds |
| Asset size | $250M – $10B | Community and small-cap only |
| 30-day ADV | ≥ $50K | Minimum liquidity for entry and exit |
The Texas ratio is the bank-specific metric worth explaining. Developed by Gerard Cassidy, now at RBC Capital Markets, while analyzing Texas bank failures in the 1980s, it measures distressed assets against the cushion that absorbs losses before equity is impaired. A ratio approaching 100% has historically predicted failure with meaningful reliability. Screening at 25% keeps every candidate well clear of that territory.
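As an illustration, the seven gates and the Texas ratio can be written as a table of predicates. The thresholds come from the table above. The `BankSnapshot` fields, the gate names, the inclusive asset-size bounds, and the treatment of a non-positive cushion as an automatic failure are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class BankSnapshot:
    cet1_ratio: float         # percent
    npa: float                # nonperforming assets, dollars
    tangible_book: float      # dollars
    loan_loss_reserve: float  # dollars
    npl_ratio: float          # percent
    nco_rate_ttm: float       # percent
    net_income_ttm: float     # dollars
    total_assets: float       # dollars
    adv_30d: float            # 30-day average dollar volume


def texas_ratio(b: BankSnapshot) -> float:
    """NPA divided by (TBV + loan-loss reserve), in percent."""
    cushion = b.tangible_book + b.loan_loss_reserve
    if cushion <= 0:
        return float("inf")  # assumption: no cushion means the gate fails
    return 100.0 * b.npa / cushion


# One predicate per hard gate; a bank must pass all seven.
GATES = {
    "cet1": lambda b: b.cet1_ratio >= 9.0,
    "texas_ratio": lambda b: texas_ratio(b) < 25.0,
    "npl": lambda b: b.npl_ratio < 3.0,
    "nco_ttm": lambda b: b.nco_rate_ttm < 1.0,
    "profitability": lambda b: b.net_income_ttm > 0,
    "asset_size": lambda b: 250e6 <= b.total_assets <= 10e9,
    "adv_30d": lambda b: b.adv_30d >= 50_000,
}


def failed_gates(b: BankSnapshot) -> list:
    """Return the names of every gate the bank fails (empty means it passes)."""
    return [name for name, check in GATES.items() if not check(b)]
```

Returning the list of failed gates, instead of a single pass/fail flag, makes it easy to see why a name dropped out of the universe.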
A typical universe run produces roughly 240–280 exchange-listed candidates that clear all gates. From there, the ranking begins.
What a century of value theory contributes
The screening gates filter out distressed situations. The scoring model — 100 points across six components — identifies where the opportunity is within the clean universe. Each framework I’ve found useful contributes a distinct weight.
Graham and Dodd: price is the first question. Security Analysis, 1934. The margin of safety doctrine says cheapness alone isn’t enough, but it’s the non-negotiable first condition. Forty-five of 100 score points go to discount to 1.3× tangible book value. Community banks have traded at 1.3–1.7× TBV through most of market history. A bank at 0.8× TBV, absent a credit or structural reason for the discount, has a margin of safety Graham would recognize. The remaining score tries to answer whether that discount is orphaned cheapness or rationalized permanent impairment.
Buffett and Munger: quality at a price, not quality at any price. The Berkshire framework is not “buy cheap banks.” It’s “buy well-run banks when they’re cheap.” Return on assets at 1.5% or above is the profitability benchmark, worth 12 of 100 score points. Capital buffer above regulatory minimums adds 8 more. The management capital allocation review, a separate sidecar outside the screen, asks explicit questions about buyback discipline, dividend rationality, and whether management allocates capital like an owner or an empire-builder. Buffett’s insight is that management quality is the primary driver of long-run bank outcomes, and that quality isn’t quantifiable from call report data. The screen filters for the preconditions; the diligence work answers the management question.
Greenblatt: special situations are their own category. Some banks screen cheap because they’re actually cheap. Some screen cheap because they recently acquired another institution, restated goodwill, or took a one-time credit mark that distorted trailing tangible book. These are special situations — events that break the normal relationship between the screen and underlying value. The pipeline tracks every acquisition, recap, and accounting reset in a pro forma event review. A bank where the screen is stale gets flagged, not disqualified — but the memo must reconcile the pro forma before any thesis is drawn.
AQR: quality deterioration masquerading as value. Some stocks look cheap because they’re getting worse, not because they’re overlooked. A community bank with declining NIM, rising CRE concentration, and an efficiency ratio trending toward 80% is not a value opportunity; it’s a melting ice cube trading at a discount. The pipeline’s risk flags serve this function: elevated CRE-to-capital (≥150% of Tier 1), high uninsured deposits (≥50%), and an NCO rate approaching the gate are memo prompts that force an explicit explanation of why the deterioration doesn’t invalidate the thesis before a name advances.
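A rough sketch of those three flags. The 150% and 50% thresholds are from the text above. How close to the gate counts as "approaching" is not specified, so the 75% warning fraction below is an assumption, as are the function and flag names and the reading of uninsured deposits as a percent of total deposits.

```python
def risk_flags(cre_to_tier1_pct, uninsured_deposit_pct, nco_rate_ttm_pct,
               nco_gate_pct=1.0, nco_warn_fraction=0.75):
    """Return memo-prompt flags; a flag asks for an explanation, it does not reject."""
    flags = []
    if cre_to_tier1_pct >= 150.0:
        flags.append("elevated_cre")
    if uninsured_deposit_pct >= 50.0:
        flags.append("high_uninsured_deposits")
    # Assumption: "approaching the gate" means at least 75% of the 1.0% NCO gate.
    if nco_rate_ttm_pct >= nco_warn_fraction * nco_gate_pct:
        flags.append("nco_near_gate")
    return flags


# Hypothetical inputs, all in percent.
flags = risk_flags(cre_to_tier1_pct=180.0,
                   uninsured_deposit_pct=30.0,
                   nco_rate_ttm_pct=0.2)
```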
Bank specialist investors: the regulatory lens. The bank investors I’ve found most useful — Tom Brown at Second Curve, Joe Stilwell at Stilwell Value, John Hempton at Bronte Capital — share an orientation the generalist frameworks underweight: regulatory capital is not just a balance sheet number, it’s the operating constraint that determines what the business can do. A bank with 15% CET1 is a different machine than one with 9% CET1; the former has deployment optionality the latter doesn’t. The system flags unusually high CET1 (≥30%) because extreme overcapitalization is sometimes the thesis, not a data anomaly. The insider alignment signal — 3 points for positive net insider buy, 2 more for two or more purchase transactions in the trailing twelve months — reflects Stilwell’s emphasis on skin-in-the-game as the diagnostic that separates owner-operators from caretakers.
The score across those six components weights the thesis: 45 points for price, 20 for upside to 1.5× TBV, 12 for profitability, 8 for capital buffer, 5 for credit health, and 10 split across small-cap overlookedness and insider alignment. Those weights required argument. The frameworks above are the argument.
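To make the weights concrete, here is a sketch of three of the components. Only the point weights and the insider rule come from the text. The 5/5 split of the final ten points is inferred from the insider signal totaling 3 + 2 points. The linear scaling, the 0.65× TBV level at which the price component earns full credit, and the independent treatment of the two insider checks are all assumptions, as are the names.

```python
WEIGHTS = {
    "price": 45,
    "upside": 20,
    "profitability": 12,
    "capital_buffer": 8,
    "credit_health": 5,
    "overlookedness": 5,     # inferred: 10 points minus 5 insider points
    "insider_alignment": 5,  # 3 + 2, per the insider rule
}


def clamp01(x: float) -> float:
    """Clamp a fraction to [0, 1] so no component exceeds its weight."""
    return max(0.0, min(1.0, x))


def price_points(p_tbv: float, anchor: float = 1.3,
                 full_credit_at: float = 0.65) -> float:
    """Points for discount to 1.3x TBV (linear scale is an assumption)."""
    fraction = (anchor - p_tbv) / (anchor - full_credit_at)
    return WEIGHTS["price"] * clamp01(fraction)


def roa_points(roa_pct: float, benchmark_pct: float = 1.5) -> float:
    """Full points at or above the 1.5% ROA benchmark, linear below it."""
    return WEIGHTS["profitability"] * clamp01(roa_pct / benchmark_pct)


def insider_points(net_insider_buy: float, purchase_count_ttm: int) -> int:
    """3 points for positive net buying, 2 more for two or more purchases."""
    points = 0
    if net_insider_buy > 0:
        points += 3
    if purchase_count_ttm >= 2:
        points += 2
    return points
```

Clamping each component matters because a bank trading far below the full-credit level, or earning well above the ROA benchmark, should not be able to crowd out the other components.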
Where the AI actually is
Here’s where most AI-in-investing content gets the framing wrong: the AI isn’t reading the call reports. It isn’t scoring the banks. It isn’t picking the names.
Claude Code and Codex built the machine.
Every structural layer — the FFIEC bulk-fetch and TTM aggregation, the Polygon-to-RSSD identity resolver, the SEC EDGAR facts extractor, the Form 4 parser, the gate evaluator, the opportunity scorer, the memo shortlist renderer, the CLI orchestrator — was pair-programmed with AI in the loop as a development partner. Not via copy-paste from Stack Overflow. As an active collaborator that held the context of the investment thesis, understood the regulatory data model, and could reason about why a TTM aggregation for a bank that changed fiscal year mid-period needs a different handling path than the default case.
That’s a qualitatively different relationship than “AI helped me write code.” The investment framework lives in the code because it was articulated to the AI clearly enough that the AI could implement it correctly. The gates, the scoring weights, the risk flag thresholds — all of those required expressing what I believed about bank investing in terms precise enough for a machine to operationalize. The AI held me accountable to that precision in ways a solo development session doesn’t.
Codex challenged the weighting structure. Claude Code surfaced edge cases in the clamping logic. The resulting model is more defensible than the one I would have built alone, because the building process required defending it.
What the AI doesn’t do, and can’t do:
- Fill the promotion checklist. Every candidate that advances to active diligence passes through a source-backed gate: valuation reconciliation, capital and credit review, funding model assessment, CRE tie-out against actual filing disclosures, management capital allocation review, event reconciliation if applicable. Each row requires a primary source URL. The system defines what needs filling. A human fills it.
- Reconcile the pro forma. When a bank acquired another institution and the tangible book figure reflects purchase accounting marks, the screen number is stale. Rebuilding the combined entity on a normalized basis (bridging reported TBV to pro forma TBV, estimating the earn-back period on the premium) is credit work. The system flags the staleness. The operator does the reconstruction.
- Write the memo. Investment memos cite every claim against a primary source. The pipeline generates citation blocks (SEC filing links, FDIC API links, company-page references), but the thesis prose is written after the diligence work, not before it. The score says the candidate is interesting. The memo explains why.
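The split between "the system defines what needs filling" and "a human fills it" can be sketched as a checklist validator. The row names follow the checklist described above. The data structure, the function names, and the rule that a row counts as filled only with a non-empty finding and an http(s) URL are assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

REQUIRED_ROWS = (
    "valuation_reconciliation",
    "capital_and_credit_review",
    "funding_model_assessment",
    "cre_tie_out",
    "management_capital_allocation_review",
)
EVENT_ROWS = ("event_reconciliation",)  # required only when an event applies


@dataclass
class ChecklistRow:
    name: str
    finding: str     # written by the human doing the diligence
    source_url: str  # primary source backing the finding


def has_source(row: ChecklistRow) -> bool:
    """Accept only rows whose source parses as an http(s) URL with a host."""
    parsed = urlparse(row.source_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def blocking_rows(rows, has_event: bool = False) -> list:
    """Return the required rows that are missing, empty, or unsourced."""
    by_name = {r.name: r for r in rows}
    required = REQUIRED_ROWS + (EVENT_ROWS if has_event else ())
    return [
        name for name in required
        if name not in by_name
        or not by_name[name].finding.strip()
        or not has_source(by_name[name])
    ]
```

A candidate would advance only when `blocking_rows` returns an empty list. The code never writes a finding itself.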
What surprised me in the build
The identity problem is harder than the analysis problem. Matching 4,000+ Polygon tickers to FFIEC RSSD IDs is where most of the engineering effort went. Getting that mapping clean — holding companies with multiple banking subsidiaries, tickers that changed after mergers, banks that converted from mutual to stock form with a new CIK but an old RSSD — is the unsexy foundation that makes the rest of the system valid. Garbage in the universe means garbage in the screen. The manual overrides file exists because the automated matching always has edge cases, and pretending otherwise produces silent errors.
Automation earns the right to judgment. The screen compresses 240+ candidates into a rank order that focuses attention. Without it, the research process is a random walk through company filings. With it, manual diligence concentrates on the 15–20 names where the quantitative case is strong enough that the qualitative question matters. The system doesn’t replace judgment; it tells me where to apply judgment, and what questions to bring.
The century-old playbook is why the AI can implement it. Graham and Dodd wrote Security Analysis in 1934. Buffett’s letters have been public since the 1970s. Greenblatt published The Little Book That Beats the Market in 2005. AQR has been publishing quality-value research for twenty years. There is no shortage of primary source material on how value investors think about cheapness, quality, and capital allocation. That literature gives the AI enough context to build a scoring model that reflects the thesis — not just a generic valuation spreadsheet. The depth of the intellectual tradition is the reason the machine can hold it.
The honest summary
The AI is the contractor that built the building. The building runs without the contractor. The investment thesis is what the building was designed to serve.
That framing — AI builds the deterministic infrastructure, the deterministic infrastructure operationalizes the thesis, the thesis comes from a hundred years of careful thinking about capital allocation — is how I expect most serious investment tooling to work for the next decade.
What it isn’t: AI scoring your stocks. What it is: AI compressing the time and precision required to build a machine that applies a rigorous framework systematically, so the human’s time concentrates on the judgment work the machine can’t do.
The value theory is a hundred years old. The databases are public. The engineering is, in retrospect, not the hard part. The hard part is having the patience to complete the checklist before writing the thesis — and the discipline to build a machine that enforces that patience, not one that shortcuts it.