Created on 2024-11-05 18:30
Published on 2024-11-06 14:42
In the world of finance and auditing, efficiency and accuracy are paramount. Traditional methods of data compilation—especially when dealing with numerous spreadsheets—are not only time-consuming but also prone to human error. However, with the advent of AI tools like ChatGPT and Python's Jupyter Notebooks, handling complex data extraction, transformation, and compilation has never been easier.
In case you're unfamiliar, Jupyter Notebooks are interactive documents that combine live code, visualizations, and narrative text in one place. They allow you to write and run code in small, manageable chunks while documenting your process, making them great for both development and sharing your work.
The Challenge of Manual Data Compilation
In my experience as a legislative auditor, compiling data from multiple spreadsheets was a labor-intensive task that could take days to complete. Whether it was income statement data from various districts, staffing records from different contractors, or operational data about program participants, the process involved endless copying and pasting. This method was not only inefficient but also increased the risk of errors that could compromise the integrity of the analysis and add days or weeks to project timelines.
Introducing AI-Powered Automation
One of my favorite examples of leveraging AI for automation involves using ChatGPT in conjunction with Python and Jupyter Notebooks. While Python programming might seem daunting to many finance and auditing professionals, AI tools have made it significantly more accessible. The initial learning curve is surmountable, especially with AI led step-by-step assistance tailored to any learning level.
Importantly, while ChatGPT created the code, this approach processes all data locally on your computer—meaning your sensitive financial or personal information never leaves your system, maintaining complete data security.
A Practical Example: Compiling Templates into One Dataset
In the video below, I start with a folder of template-based spreadsheets with information about districts, names, positions, and salaries and compile all the data into one data table for further analysis. The python code is written by ChatGPT and it's running in a Jupyter notebook.
This Python code reads each spreadsheet in the folder, extracts the specified data from multiple tabs, and compiles it into a single dataset according to my preferences. The result is a neatly organized dataset exported to a file I can use in whatever way is needed.
Prompt Overview
Here's the exact prompt I used to create this code with ChatGPT with some other options to suit different needs.
'''The overall goal is to create a python jupyter notebook to {extract, compile, analyze, graph…} data from a {table, spreadsheet, folder of spreadsheets…} to one table of data in a spreadsheet.
Tab Name: 'Input Data' - The first few rows are blank. I want the row headers in A4 and A5 to be column headers in the output dataset. I want the values in B4 and B5 to repeat for as many rows are in the 'Salaries' tab.
Tab Name: 'Salaries' - I want the headers in A3:F3 to be the next columns in the resulting dataset. The data begins in A4:F4 but each spreadsheet has a different number of rows to extract. There are blank cells at the end of the data.
Here's the folder location: path\to\data
Name the resulting file in the same location: Program_Salaries.xlsx'''
Final Thoughts
What once took days now happens in a matter of seconds. The automation not only saves time but also enhances accuracy by minimizing human error. The streamlined process allows professionals to focus on analysis and decision-making rather than mundane data handling.
Integrating AI into your business processes doesn't mean overhauling everything or compromising data security by sending your data to AI black boxes. It's about leveraging AI as a tool to help you discover simpler, more efficient ways to work. The initial investment in learning and adopting these tools is well worth the effort, allowing new levels of productivity while keeping your operations streamlined and secure.
Here's a link to the actual data and notebook if you want to try it for yourself.
https://github.com/scottlabbe/excel_extract_salaries/tree/main