Unlocking Institutional Memory with AI: Reimagining Audit Knowledge Management

Created on 2025-03-25 13:33

Published on 2025-06-02 11:46

The Hidden Knowledge Challenge

Every organization has documents on shared drives that are hard to find, hard to use, and often forgotten due to organizational silos, lack of awareness, insufficient metadata, or resistance to new technologies. Teams inherit shared folders of data from previous teams or team members. In many cases, new staff aren't given dedicated time to become familiar with this older material, even though it may have been foundational to current team workflows.

During my time as a legislative auditor, I produced massive amounts of documentation used to plan audits, learn about an agency and their processes, develop findings and conclusions, and communicate results to a wide audience.

Untapped Resources Gathering Digital Dust

Just think of all the documents organizations create that are never used again once the immediate project or need is satisfied.

These documents can represent hundreds or thousands of hours of work, yet it's difficult to leverage this knowledge for future work because the information is trapped in old project folders unknown to anyone not involved in creating them.

Retrieval Augmented Generation: A Knowledge Base Empowered Chatbot

What if there was a better way to unlock the value in these document repositories? One popular technique to turn a set of files into an AI knowledge base is called Retrieval Augmented Generation or RAG.

RAG converts a knowledge base (like a folder full of PDFs) into a searchable index whose contents can be retrieved and fed to a language model to generate responses to a query.
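To make the indexing idea concrete, here is a minimal sketch of how documents get broken into chunks before they are indexed. In a real pipeline each chunk would then be embedded (for example, with an OpenAI embedding model) and stored in a vector database; the chunking step below is a simplified illustration, not the notebook's exact code, and runs without an API key.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Break text into ~chunk_size-character chunks that overlap, so a
    sentence cut at one boundary still appears whole in a neighboring chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks
```

Overlapping chunks are a common design choice: they trade a little storage for better odds that a relevant passage survives intact in at least one chunk.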

RAG involves four key processes:

  1. Indexing - Documents are broken down into meaningful chunks and stored in a searchable database.

  2. Retrieval - When a user poses a natural language question, the system searches the indexed documents for relevant information.

  3. Augmentation - The retrieved content is combined with the user's query to enhance context.

  4. Generation - The LLM generates responses informed by the retrieved documents.
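The retrieval, augmentation, and generation steps above can be sketched in a few lines. In a real system, retrieval scores chunks by vector-embedding similarity; a simple word-overlap score stands in here so the example runs without an API key, and the final model call is shown only as a comment. The sample chunks are invented for illustration.

```python
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 2 (Retrieval): rank chunks by how many query words they share.
    A stand-in for embedding similarity search."""
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def augment(query: str, context: list[str]) -> str:
    """Step 3 (Augmentation): combine retrieved chunks with the user's question."""
    return ("Answer using only this context:\n"
            + "\n---\n".join(context)
            + f"\n\nQuestion: {query}")

# Invented example chunks, as if produced by the indexing step.
chunks = [
    "The 2021 audit found weaknesses in grant monitoring controls.",
    "Agency staffing levels declined 12 percent over five years.",
    "The cafeteria menu was updated in March.",
]

query = "What did the audit find about grant monitoring?"
prompt = augment(query, retrieve(query, chunks))
# Step 4 (Generation) would send `prompt` to a model, e.g.:
# client.chat.completions.create(model="gpt-4o-mini",
#                                messages=[{"role": "user", "content": prompt}])
```

The key point is that the model never sees your whole document set, only the handful of chunks the retriever judged relevant to the question.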

The potential benefits of embracing AI frameworks like RAG are significant.

Hands-on RAG Example

I put together this Google Colab notebook to break down the process a little more for anyone who wants to try it out. The notebook opens with some PDFs included in a Reports folder. Feel free to put your own reports in there and change the questions based on what's included in them. One thing you will need is a paid OpenAI account and an API key to use the model.

https://colab.research.google.com/drive/11ZXW4WeTSGsvmIAF1epVhQ29-Yik28Cg?usp=sharing

The objective was straightforward: transform a set of static documents into an interactive knowledge base without requiring complex infrastructure. The last step in the notebook shows the model's response to the query along with the top sources retrieved to support that response.

For my test case, I used audit reports that I had helped create as a legislative auditor. In some cases, I wrote the report; in others, I was a team member performing testing. I focused on my own work because I wanted to easily spot any errors in the responses—a critical step in evaluating AI solutions before implementing them into workflows.

If you want to follow along with the example questions I set up, you'll need to follow the links below to download the reports and upload them to the notebook.

Broader implications

With a tool like this, each team member can search through the collective knowledge of past work in their own way, new team members can have easy access to institutional knowledge, and teams can make more informed decisions about approaches and directions for new projects.

Example questions from the notebook:

Example output

Other Considerations

This notebook uses a small but powerful closed model, meaning the PDFs you upload are sent to OpenAI's gpt-4o-mini model. A solution like this is probably not appropriate for files containing sensitive, confidential, or proprietary information.

The notebook also uses OpenAI's services to build the searchable index and to generate responses from your documents, so there is a cost to running it, although it will be minimal for a small collection of PDFs.

Let me know if you have questions or ideas about this kind of tool framework.

#AI #RAG #RetrievalAugmentedGeneration #KnowledgeManagement #DocumentAI #AuditInnovation #LegislativeAudit