Why Accurate Context Matters More Than Clever Prompting (Part 3)

Created on 2026-01-09

Published on 2026-01-09

I'm trying to show auditors and program managers in government what useful AI use actually looks like by automating a desk review process end-to-end with coding agents like OpenAI's Codex.

Auditing and program management workloads already have documented structure: rules, templates, approvals, and repeatable processes. Coding agents let you turn that structure into a tool you can run whenever you need it, with logs and checks instead of copy/paste madness.

Automating cost report desk reviews

So far in this series on automating desk review processes, we've extracted the cost report data and run the desk review validations against it.

In this next step, we're turning those desk-reviewed results into the documents districts actually use and rely on for their reimbursement: the completed desk review letters and findings reports.

One term you'll be hearing a lot more of is context engineering: the practice of designing, structuring, and optimizing the information (context) provided to an AI model so it produces accurate, high-quality outputs.

In this case, it's about giving the model the right inputs, templates, and guardrails so it fills in the blanks consistently instead of making judgment calls. And in government work, the best context is the structure and documentation that already support programs: statutes, guidance, forms, templates, training materials, and manuals.
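To make "fill in the blanks" concrete, here's a tiny, purely hypothetical illustration in Python. The real templates in this project are Word documents, but the idea is the same: the approved wording is frozen, and the only thing supplied at run time is the values for the placeholders (the placeholder names and values below are made up for the example).

from string import Template

# The approved wording stays frozen in the template; only the blanks get filled.
letter = Template(
    "Dear $contact_name,\n\n"
    "The desk review of $district_name identified state costs of "
    "$$${total_state} and federal costs of $$${total_federal}."
)

print(letter.substitute(
    contact_name="Jane Doe",            # hypothetical values for illustration
    district_name="Example District",
    total_state="12,345.00",
    total_federal="6,789.00",
))

The model never gets the chance to "improve" the sentence; its output only ever lands in the blanks.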

Shrinking the lane for the model

When applying context engineering to a task like this, one of the most important factors is how to shrink the lane the model works in. A vague prompt lets the model search over an effectively infinite space of possible answers, and you're left hoping it made the right assumptions. The goal is to give it enough context that it's choosing among the fewest clearly correct options.

Large AI models are trained to be helpful. If you give them a vague prompt, they might try to "improve" things: rewriting the wording of a letter to sound friendlier, reformatting a standardized table when there are no values to report, or "clarifying" and "simplifying" a regulatory citation to make it more readable.

That may be fine when you're exploring ideas or building toy apps, but in real-world program management, auditing, and reimbursement work, it's a disaster. You simply cannot have an AI rewriting pre-approved legal templates or contractual language. You need it to fill in the blanks, not redesign the process using its own judgment.

Building a prompt

When I build these tools, I use a framework that treats the AI less like a magic box and more like a junior staff member: state the goal, point it to the reference material, and spell out the guardrails so it knows exactly what it can and can't change.

One way my prompt style has changed is realizing that asking the model to "act smart" or "be an expert" isn't the best way to get the output I want. It's about designing a lane where the correct result is the easiest result.

I'll share a portion of the prompt for this step below, but you'll have to review the GitHub page to see the entire thing.

Prompt

## Goal
Build a reporting script that:
1. Reads all districts from the database
2. Aggregates cost data per district (state vs. federal totals)
3. Fills in both Word templates for each district
4. Converts each to PDF
5. Saves outputs in a structured folder

## Context
The database `program_management/extract_data/cost_reports.db` contains 
desk-reviewed data for multiple districts. Schema reference is in 
`DATABASE_REFERENCE.md`.

Tables you'll use:
- `cost_reports` – employee-level salary/benefits data
- `contact_info` – district contact metadata  
- `desk_review_findings` – finding narratives per district

The folder `templates/` contains:
- `desk_review_letter_template.docx`
- `desk_review_findings_template.docx`
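To give a sense of the shape of the script this prompt asks for, here's a minimal sketch. It is not the code Codex generated: the column names (district_id, district_name, state_share, federal_share), the docxtpl library for template filling, and headless LibreOffice for PDF conversion are assumptions for illustration, and it leaves out the contact info and findings pieces for brevity.

# Rough sketch only; column names and libraries are assumptions, not the
# actual Codex-generated code.
import sqlite3
import subprocess
from pathlib import Path

from docxtpl import DocxTemplate  # pip install docxtpl

DB_PATH = "program_management/extract_data/cost_reports.db"
TEMPLATE_DIR = Path("templates")
OUTPUT_DIR = Path("output")


def district_totals(conn):
    """Aggregate state vs. federal totals per district (column names assumed)."""
    conn.row_factory = sqlite3.Row
    return conn.execute(
        """
        SELECT district_id, district_name,
               SUM(state_share)   AS total_state,
               SUM(federal_share) AS total_federal
        FROM cost_reports
        GROUP BY district_id, district_name
        """
    ).fetchall()


def fill_template(template_path, context, out_docx):
    """Render a Word template's placeholders with one district's values."""
    doc = DocxTemplate(template_path)
    doc.render(context)
    out_docx.parent.mkdir(parents=True, exist_ok=True)
    doc.save(out_docx)


def convert_to_pdf(docx_path, out_dir):
    """Convert a .docx to PDF with headless LibreOffice (one of several options)."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf",
         "--outdir", str(out_dir), str(docx_path)],
        check=True,
    )


def main():
    with sqlite3.connect(DB_PATH) as conn:
        for row in district_totals(conn):
            context = dict(row)  # district_id, district_name, totals
            out_dir = OUTPUT_DIR / str(row["district_id"])
            for name in ("desk_review_letter_template.docx",
                         "desk_review_findings_template.docx"):
                out_docx = out_dir / name.replace("_template", "")
                fill_template(TEMPLATE_DIR / name, context, out_docx)
                convert_to_pdf(out_docx, out_dir)


if __name__ == "__main__":
    main()

The real script would also pull the contact metadata and findings narratives into the same context before rendering. The structure is the point, though: every step is deterministic, and the judgment calls were made up front in the templates and rules.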

Completed Reports

Here's an example of the outputs from the Python code generated by the Codex agent. An impressive result for sure. It initially listed entries in the findings table even when a district had no findings, but Codex was able to take my feedback and correct the issue.
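A fix like that usually comes down to a small guard rather than more prompting. Here's a rough sketch, assuming a finding_text column and template fields named findings and has_findings (none of which I'm claiming match the actual generated code):

import sqlite3


def findings_context(conn: sqlite3.Connection, district_id: int) -> dict:
    """Build the template fields for the findings section (names assumed)."""
    rows = conn.execute(
        "SELECT finding_text FROM desk_review_findings WHERE district_id = ?",
        (district_id,),
    ).fetchall()
    findings = [text for (text,) in rows]
    # Only flag findings when there actually are any, so the template can
    # render "No findings noted" instead of an empty or padded table.
    return {"findings": findings, "has_findings": bool(findings)}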

Reflection

This demo uses six cost reports. The real process I managed had over 400 and took nearly a year and hundreds of billable hours to review the data, document findings, run calculations, and produce reimbursement packets to the appropriate standards.

I'm not claiming this approach replaces that entire engagement. But it shows where the leverage is: once you've defined the rules, templates, and structure, the repetitive execution doesn't have to be manual. Real, important work can shift from producing documents to verifying them.

That's the thread through this whole series. Extraction, validation, and now output generation: none of it required clever prompting or AI tricks. It required pointing the model at the structure it needed to perform the task and narrowing its job to filling in blanks.

Training materials and documented procedures gathering dust in a SharePoint folder are often written off as bureaucratic overhead. It might be time to rethink that when coding agents can turn those rules and documents into a robust, automated process.

Check out the YouTube video if you want to see how well the coding agent can automate these repetitive tasks.

Code & templates on GitHub: https://github.com/scottlabbe/program_management