
Keeping Your Ledger Private: Building a Local Financial AI

A practical guide to analyzing sensitive financial data locally using Qwen 2.5 Coder, Ollama, and Python — with zero data leaving your machine.

Last month I needed to answer a simple question: “How much more am I spending on food since I moved?” My bank’s app could not tell me. A spreadsheet could, but I had twelve months of transactions across two accounts and did not feel like writing pivot tables.

The obvious move was to paste the CSV into an LLM and ask. But this particular CSV contained my full transaction history — rent, salary, pharmacy visits, every coffee and bar tab — and “paste it into ChatGPT” means “upload it to a server I do not control, under retention policies I did not write.”

So I did what any stubborn engineer would do: I set up a fully local AI financial analyst that runs on my own hardware, answers natural-language questions about my spending, and sends zero bytes over the network.

This post walks through exactly how to build it. Total setup time: 15 minutes.


Why Your Financial Data Deserves Better

Every API call to a cloud LLM is a data transfer. When you send your bank statement to a hosted model, that data traverses the public internet, lands on third-party infrastructure, and becomes subject to policies you cannot audit.

For most tasks, this tradeoff is fine. For financial data, it is not. A single CSV export from your bank contains:

  • Transaction history — merchant names, amounts, timestamps
  • Recurring patterns — rent, mortgage, subscriptions, salary deposits
  • Inferred geography — merchant locations reveal where you live, shop, and travel
  • Behavioral signals — spending at pharmacies, therapists, or political organizations

Data breaches, subpoenas, rogue employees, and logging bugs are not theoretical risks. They are common. Even if the provider promises not to train on your data, the attack surface is real the moment the data leaves your machine.

The fix is straightforward: run the model locally. No network calls. No API keys. No trust required.


What We Are Building

Before diving into the details, here is the end result:

$ python analyst.py

Local Financial AI ready. Type 'quit' to exit.

You: What was my highest spending category in October?

Analyzing...

Analyst: I'll work through this step by step.

1. Filtering transactions to October 2024 (Date between 2024-10-01 and 2024-10-31).
2. Selecting only debits (Amount < 0).
3. Grouping by Category and summing the absolute values.

Results:
  - Groceries: $487.32
  - Utilities: $167.44
  - Transport: $67.00
  - Dining: $46.80

Your highest spending category in October was **Groceries** at **$487.32**.

Everything above happened on localhost. The model ran on my GPU. The CSV never left my disk.


The Stack

  • Ollama — model runtime: handles downloading, quantization, and local inference
  • Qwen 2.5 Coder — the LLM: strong at structured data, arithmetic, and code generation
  • Python + ollama client — glue script: loads your data, builds prompts, queries the model
  • YAML config — schema definition: teaches the model what your CSV columns mean

Why Qwen 2.5 Coder Specifically?

Financial analysis demands capabilities that most general-purpose chat models lack:

  1. Structured data parsing. Financial data lives in CSVs, tables, and ledgers. The model must interpret column headers, data types, and row relationships without hallucinating structure that is not there.

  2. Reliable arithmetic. “What did I spend on groceries in October?” requires filtering, grouping, and summing. General-purpose chat models are notoriously bad at multi-step math. Code-tuned models perform better here because their training distribution is heavy on deterministic logic — they naturally decompose math into discrete steps rather than guessing.

  3. Code generation. The most reliable way to get an LLM to analyze tabular data is to have it write code that does the analysis. Instead of “mentally” summing 47 transactions, Qwen 2.5 Coder generates a Python snippet or SQL query that computes the answer deterministically.

  4. Structured chain-of-thought. Prompted to reason step by step, Qwen 2.5 Coder plans explicitly before answering: identify the relevant columns, determine filter criteria, choose the aggregation method, then execute. This decomposition dramatically reduces errors compared to models that jump straight to an answer.

Hardware requirements are modest. The 7B parameter variant fits in 8 GB of VRAM. The 14B variant runs well on 16 GB. A recent MacBook with Apple Silicon handles either. You do not need a datacenter.
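Those VRAM figures follow from simple arithmetic. Here is a rough estimator — a sketch only, assuming roughly 4-bit quantization (Ollama's default quant levels, exact bits-per-weight, and KV-cache overhead all vary, so the constants below are illustrative):

```python
# Rough VRAM estimate for a quantized model: parameters * bytes-per-weight,
# plus fixed headroom for the KV cache and runtime overhead.
# bits_per_weight=4.5 approximates a 4-bit quant with metadata overhead.
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1024**3
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(7))   # ≈ 5.2 — comfortably inside 8 GB
print(estimate_vram_gb(14))  # ≈ 8.8 — comfortably inside 16 GB
```

The point is not the exact numbers but the shape of the calculation: quantized weights are the dominant term, which is why a 14B model fits on a consumer GPU at all.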


Setup (15 Minutes)

I run this on a Linux machine with an AMD RX 9070 XT (16 GB VRAM). Most modern MacBooks with Apple Silicon will also work — inference is slower but entirely functional.

1. Install Ollama

# Linux / macOS
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version

2. Pull the Model

# 7B — fits in 8 GB VRAM
ollama pull qwen2.5-coder:7b

# 14B — recommended if you have 16 GB VRAM
ollama pull qwen2.5-coder:14b

3. Smoke Test

ollama run qwen2.5-coder:14b

Type a test prompt. If you get a coherent response, the model is running entirely on your hardware. Exit with /bye.

4. Install Python Dependencies

python -m venv .venv && source .venv/bin/activate
pip install ollama pyyaml

Preparing Your Data

The CSV

Export transactions from your bank as CSV. Before loading it, strip sensitive headers your bank may include (account numbers, routing numbers). The model does not need them, and removing them is good hygiene.
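If you would rather script that cleanup than edit the file by hand, a small helper works. The column names in SENSITIVE below are examples, not your bank's actual headers — check what your export contains:

```python
import csv

# Headers a bank export might include that the model never needs.
# These names are examples -- check what your own bank exports.
SENSITIVE = {"Account Number", "Routing Number", "Card Number"}

def strip_sensitive(in_path: str, out_path: str) -> list[str]:
    """Copy a CSV, dropping every column whose header is in SENSITIVE.
    Returns the names of the columns that were removed."""
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        kept = [c for c in header if c not in SENSITIVE]
        rows = list(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=kept)
        writer.writeheader()
        for row in rows:
            writer.writerow({c: row[c] for c in kept})
    return [c for c in header if c in SENSITIVE]
```

Run it once on each export, then hand the model only the cleaned file.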

Here is a minimal example — save this as checking_2024.csv to follow along:

Date,Description,Amount,Category
2024-10-01,Whole Foods,-62.47,Groceries
2024-10-02,Shell Gas Station,-45.00,Transport
2024-10-03,Netflix,-15.99,Entertainment
2024-10-05,Employer Inc,4250.00,Income
2024-10-08,Corner Bakery,-18.30,Dining
2024-10-12,Trader Joes,-87.52,Groceries
2024-10-15,Electric Company,-167.44,Utilities
2024-10-18,Uber,-22.00,Transport
2024-10-22,Costco,-137.33,Groceries
2024-10-25,Sushi Palace,-28.50,Dining
2024-10-28,Whole Foods,-200.00,Groceries
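It is worth knowing the ground truth before you ask the model anything. A dozen lines of stdlib Python (with the sample rows inlined so the snippet stands alone) reproduce the "highest spending category" query:

```python
import csv
import io
from collections import defaultdict

SAMPLE = """Date,Description,Amount,Category
2024-10-01,Whole Foods,-62.47,Groceries
2024-10-02,Shell Gas Station,-45.00,Transport
2024-10-03,Netflix,-15.99,Entertainment
2024-10-05,Employer Inc,4250.00,Income
2024-10-08,Corner Bakery,-18.30,Dining
2024-10-12,Trader Joes,-87.52,Groceries
2024-10-15,Electric Company,-167.44,Utilities
2024-10-18,Uber,-22.00,Transport
2024-10-22,Costco,-137.33,Groceries
2024-10-25,Sushi Palace,-28.50,Dining
2024-10-28,Whole Foods,-200.00,Groceries
"""

# Sum the absolute value of debits (Amount < 0) per category.
totals: dict[str, float] = defaultdict(float)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    amount = float(row["Amount"])
    if amount < 0:  # debits only
        totals[row["Category"]] += -amount

top = max(totals, key=totals.get)
print(top, round(totals[top], 2))  # Groceries 487.32
```

Any answer the model gives you about this file should match what this loop computes.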

The YAML Context File

Raw CSV column names like TRNS_AMT or POST_DT mean nothing to a model without context. Instead of hoping the model infers your schema, define it explicitly.

Create finance_context.yaml:

dataset:
  name: "Personal Ledger - 2024"
  source: "checking_2024.csv"
  description: >
    Monthly bank statement exports from checking account.
    Contains all debit and credit transactions for the calendar year 2024.

schema:
  columns:
    - name: "Date"
      type: "date"
      format: "YYYY-MM-DD"
      description: "Transaction posting date"
    - name: "Description"
      type: "string"
      description: "Merchant name or transaction description as reported by the bank"
    - name: "Amount"
      type: "float"
      description: "Transaction amount in CAD. Negative values are debits (spending). Positive values are credits (income)."
    - name: "Category"
      type: "string"
      description: "Spending category. One of: Groceries, Dining, Transport, Utilities, Rent, Entertainment, Healthcare, Income, Transfer, Other"

life_events:
  - date: "2024-06-15"
    event: "Moved to a new apartment"
    impact: "Rent increased from $1,800 to $2,400. Utilities provider changed."
  - date: "2024-09-01"
    event: "Changed jobs"
    impact: "Salary deposits shifted from bi-weekly to semi-monthly. Commute costs decreased due to remote work."

analysis_rules:
  - "Always filter by date range before aggregating"
  - "Treat negative amounts as spending and positive amounts as income"
  - "When asked about 'highest spending', return the category with the largest total absolute value of negative transactions"
  - "Round currency values to two decimal places"

Why this matters: This YAML file acts as a persistent system prompt. It tells the model what the data looks like, what the columns mean, and how to interpret ambiguous queries. You write it once and reuse it across sessions. When you switch banks or add accounts, update the schema — the quality of the model’s answers tracks how faithfully this file describes your data.

Adapting to your bank: Export a CSV, open it, note the column names and formats, and map them into the schema above. Most banks use slightly different headers — Transaction Date vs Date, Debit vs negative Amount — so spend two minutes getting this right.
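One cheap guard against header drift is to compare the YAML's declared columns against the CSV's actual header before each session. This helper is a sketch, separate from the script in the next section — pass it the dict you get back from yaml.safe_load:

```python
import csv

def check_schema(context: dict, csv_path: str) -> list[str]:
    """Compare column names declared in the parsed YAML context against the
    CSV's actual header row. Returns names present on only one side;
    an empty list means the two agree."""
    declared = {col["name"] for col in context["schema"]["columns"]}
    with open(csv_path, newline="") as f:
        actual = set(next(csv.reader(f)))
    return sorted(declared ^ actual)  # symmetric difference
```

If this returns anything non-empty, fix the YAML (or the export) before asking the model a single question.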


The Script

Save this as analyst.py. It loads the YAML context, reads your CSV, and runs an interactive Q&A loop against the local model.

import csv
import yaml
import ollama

MODEL = "qwen2.5-coder:14b"

def load_context(yaml_path: str) -> dict:
    with open(yaml_path, "r") as f:
        return yaml.safe_load(f)

def load_csv_as_text(csv_path: str, max_rows: int = 100) -> str:
    """Read a CSV and convert it to a text table for the model.

    Note: If your CSV exceeds max_rows, only the first max_rows are included.
    Pre-filter large files by date range to ensure the model sees the data
    you actually care about.
    """
    with open(csv_path, "r") as f:
        reader = csv.DictReader(f)
        rows = []
        for i, row in enumerate(reader):
            if i >= max_rows:
                rows.append(f"... (truncated — first {max_rows} rows shown)")
                break
            rows.append(" | ".join(f"{k}: {v}" for k, v in row.items()))
    return "\n".join(rows)

def build_system_prompt(context: dict, csv_text: str) -> str:
    """Construct a system prompt from the YAML context and CSV data."""
    schema_desc = "\n".join(
        f"  - {col['name']} ({col['type']}): {col['description']}"
        for col in context["schema"]["columns"]
    )

    events_desc = ""
    if "life_events" in context:
        events_desc = "\nLIFE EVENTS (Consider these when explaining spending changes):\n"
        events_desc += "\n".join(
            f"  - {event['date']}: {event['event']} ({event['impact']})"
            for event in context["life_events"]
        )

    rules = "\n".join(f"  - {rule}" for rule in context["analysis_rules"])

    return f"""You are a financial data analyst. You analyze personal transaction data
and answer questions precisely using the data provided.

DATASET: {context['dataset']['name']}
DESCRIPTION: {context['dataset']['description']}

SCHEMA:
{schema_desc}
{events_desc}

ANALYSIS RULES:
{rules}

TRANSACTION DATA:
{csv_text}

When answering:
1. First, identify which columns and filters are relevant to the question.
2. Describe your calculation step by step.
3. Provide the final answer with exact numbers rounded to two decimal places.
If the data is insufficient to answer, say so explicitly."""

def ask(question: str, system_prompt: str) -> str:
    """Send a question to the local Qwen model and return the response."""
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response["message"]["content"]

def main():
    context = load_context("finance_context.yaml")
    csv_text = load_csv_as_text(context["dataset"]["source"])
    system_prompt = build_system_prompt(context, csv_text)

    print("Local Financial AI ready. Type 'quit' to exit.\n")

    while True:
        question = input("You: ").strip()
        if question.lower() in ("quit", "exit", "q"):
            break
        if not question:
            continue

        print("\nAnalyzing...\n")
        answer = ask(question, system_prompt)
        print(f"Analyst: {answer}\n")

if __name__ == "__main__":
    main()

How the Reasoning Chain Works

When you ask “What was my highest spending category in October?”, the model does not eyeball the answer. Because the system prompt instructs step-by-step analysis and because Qwen 2.5 Coder is trained to decompose problems computationally, its reasoning follows a structured path:

  1. Filter by date — select rows where Date falls within 2024-10-01 to 2024-10-31
  2. Filter by type — keep only rows where Amount < 0 (debits)
  3. Group — group filtered rows by Category
  4. Aggregate — sum the absolute values of Amount within each group
  5. Rank — return the category with the highest sum

This mirrors exactly how you would write a GROUP BY query or a pandas pipeline. The coding-centric training makes this decomposition natural — the model “thinks in code” even when responding in prose.
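pandas is not part of the minimal stack above, but it makes the parallel concrete. The five steps collapse into one short pipeline — a sketch only, and highest_spending is a hypothetical helper, not something analyst.py or the model defines:

```python
import pandas as pd

def highest_spending(df: pd.DataFrame, start: str, end: str) -> tuple[str, float]:
    in_range = df[df["Date"].between(start, end)]                # 1. filter by date
    debits = in_range[in_range["Amount"] < 0]                    # 2. debits only
    totals = debits.groupby("Category")["Amount"].sum().abs()    # 3-4. group + sum |Amount|
    ranked = totals.sort_values(ascending=False)                 # 5. rank
    return ranked.index[0], round(ranked.iloc[0], 2)

# e.g.:
# df = pd.read_csv("checking_2024.csv")
# highest_spending(df, "2024-10-01", "2024-10-31")
```

ISO dates compare correctly as plain strings, which is why no date parsing is needed here.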


Security and Isolation

This is the core value proposition of the entire setup. Let’s be explicit about the security properties:

No network calls. Ollama serves the model on localhost:11434. The Python script communicates over a local socket. No data touches a network interface. You can verify this yourself: run the entire workflow with your Wi-Fi off.

No telemetry. Ollama is open source. No analytics, no usage reporting, no phone-home behavior. Audit the code if you want to confirm.

No persistent memory. LLMs do not “remember” between sessions. Each conversation starts fresh. Your financial data exists only in the system prompt for the duration of the query and is discarded when the process ends.

File system isolation. The model cannot access your file system. It only sees the text you pass in the prompt. Even if the model were somehow compromised, it has no mechanism to read files, open sockets, or exfiltrate data.

You control the hardware. The model weights sit on your disk. Inference happens on your CPU or GPU. There is no server to be breached, no API key to be leaked, no third-party employee who can view your prompts.

This is data sovereignty in the most literal sense. The data never leaves your custody.


Practical Tips From Daily Use

I have been running this setup daily for several weeks. A few lessons:

  • Pre-filter large datasets. The 14B model handles several hundred rows well. If you have thousands of transactions, filter by date range before passing to the model — otherwise you hit the context window limit and the model works with truncated (potentially misleading) data.

  • Treat the YAML as a living document. When you add a new account, change banks, or start tracking a new category, update the schema. Five minutes of YAML editing saves hours of confused model output.

  • Spot-check early, relax later. For your first few sessions, run queries where you already know the answer. If the model says you spent $342.17 on dining in March, open a spreadsheet and verify. Once you have confirmed accuracy on a dozen queries, you can trust the output with more confidence.

  • Use life_events liberally. The model cannot know you moved apartments or changed jobs unless you tell it. These annotations are what turn a generic calculator into something that actually understands your financial story.
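The pre-filtering tip above is a one-function change. Something like this works — a sketch, not part of analyst.py, assuming the ISO Date column from the schema; run it on a large export and point finance_context.yaml at the output file:

```python
import csv

def filter_by_date(in_path: str, out_path: str, start: str, end: str) -> int:
    """Copy only rows whose Date falls in [start, end] (ISO dates compare
    correctly as plain strings). Returns the number of rows kept."""
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if start <= row["Date"] <= end:
                writer.writerow(row)
                kept += 1
    return kept
```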


Conclusion

For years, the assumption has been that useful AI requires cloud infrastructure — massive GPU clusters, proprietary APIs, data pipelines routed through someone else’s servers. That assumption is increasingly wrong.

Models like Qwen 2.5 Coder, running on consumer hardware via Ollama, are capable of real analytical work. Not toy demos — actual, useful analysis of real data, running on a laptop or a gaming rig.

For financial data, local-first is not a preference. It is a requirement. Your spending history is not training data for OpenAI or Google to serve you targeted ads. Your bank statements are not prompts for a model you do not control. And your ledger is nobody’s business but your own.

The setup takes fifteen minutes. The only thing between you and a private financial AI is the decision to stop sending your data to someone else’s computer.