Financial forecasting tool using historical data and other variablesRecord inserted or updated successfully.
AI & Audit Automation

Financial forecasting tool using historical data and other variables

Author : CA. Shilpa Susan Kuriachan

Watch on Youtube

As Chartered Accountants, we need to prepare projections / project reports / projected cash flows. These projections analyse the historical data of the company as well as the future trends in the industry / market environment. The AI Use case prepares a powerful tool to help CAs analyse these external parameters.


Prompt 1 to ChatGPT-5 - Using the data in the attached excel file, how do I prepare a financial forecasting report quarter wise for period 2020 to 2025

Quarter ended Net Sales Rs Gold price per 10gms Festive season Other business factors

30-Jun-16 213832 27445 0 0

30-Sep-16 198751 27445 0 0

31-Dec-16 325500 27445 1 0

31-Mar-17 286209 29156 0 0

30-Jun-17 335221 29156 0 0

30-Sep-17 271076 29156 0 0

31-Dec-17 356986 29156 1 0

31-Mar-18 329206 31391 0 0

30-Jun-18 357166 31391 0 0

30-Sep-18 358208 31391 0 0

31-Dec-18 489034 31391 1 0

31-Mar-19 398550 39108 0 0

30-Jun-19 404744 39108 0 0

30-Sep-19 352777 39108 0 0

31-Dec-19 540866 39108 1 0

31-Mar-20 375435 50151 0 1


Prompt 2 – give me a excel macro code for preparing the forecast file

Prompt 3 – create an excel macro based on sarimax to prepare forecast report

Prompt 4 – errors generated are as below :

Shell shellCommand, vbNormalFocus error

Forecast_report sheet not displayed

blank excel is displayed

SyntaxError: unexpected character after line continuation character

SyntaxError: invalid syntax

SyntaxError: invalid syntax

❌ No Excel file passed.

NameError: name 'pd' is not defined. Did you mean: 'id'?

KeyError: "None of [Index(['Gold price per 10gms', 'Festive season', 'Other business factors'], dtype='object')] are in the [columns]"

ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

exog contains inf or nans


Prompt 5 – Please re-write the python script

# === FINAL AUTO-CLEANING SARIMAX FORECAST SCRIPT ===

import sys, os, time, re

import pandas as pd

import numpy as np

from statsmodels.tsa.statespace.sarimax import SARIMAX

from openpyxl import load_workbook


print("Raw argv:", sys.argv)


# === 1. Read Excel path ===

try:

excel_file = sys.argv[1]

except IndexError:

print("❌ No Excel file passed.")

sys.exit(1)


print(f"📘 Processing workbook: {excel_file}")


# === 2. Load Excel ===

df = pd.read_excel(excel_file, sheet_name="Sheet1")

df.columns = df.columns.str.strip().str.lower()

print("Detected columns:", df.columns.tolist())


# === 3. Universal cleaner for Indian-style data ===

def clean_to_numeric(val):

"""Convert Excel cell to clean float (handles Rs, ₹, commas, %, text)."""

if pd.isna(val):

return np.nan

if isinstance(val, str):

# remove currency symbols, commas, and spaces

val = val.replace("Rs.", "").replace("Rs", "").replace("₹", "")

val = re.sub(r"[^\d\.\-]", "", val) # keep only digits, dot, minus

try:

return float(val)

except:

return np.nan


for col in df.columns:

if "quarter" not in col:

df[col] = df[col].apply(clean_to_numeric)


# === 4. Identify columns ===

date_col = [c for c in df.columns if "quarter" in c][0]

df[date_col] = pd.to_datetime(df[date_col], errors="coerce")

df.set_index(date_col, inplace=True)

df = df.asfreq("QE")


target_col = [c for c in df.columns if "net" in c and "sale" in c][0]

endog = df[target_col]


possible_exog = [c for c in df.columns if any(k in c for k in ["gold", "festive", "business"])]

exog = df[possible_exog] if possible_exog else None

print("Matched exogenous columns:", possible_exog)


# === 5. Clean missing / infinite values ===

df.replace([np.inf, -np.inf], np.nan, inplace=True)

df.ffill(inplace=True)

df.bfill(inplace=True)


if exog is not None:

exog.replace([np.inf, -np.inf], np.nan, inplace=True)

exog.ffill(inplace=True)

exog.bfill(inplace=True)


# Drop any completely empty rows

if exog is not None:

combined = pd.concat([endog, exog], axis=1).dropna()

endog, exog = combined[target_col], combined[exog.columns]

else:

endog = endog.dropna()


print(f"✅ Clean data: Endog length = {len(endog)}, Exog shape = {None if exog is None else exog.shape}")


# === 6. Data sanity checks ===

if len(endog) == 0:

print("❌ No numeric data found after cleaning. Please verify Excel values.")

sys.exit(1)

if len(endog) < 8:

print("⚠️ Too few valid data points — switching to univariate SARIMA.")

exog = None


# === 7. Fit SARIMAX or SARIMA model ===

try:

print("🔄 Fitting SARIMAX model...")

model = SARIMAX(endog, exog=exog, order=(1, 1, 1), seasonal_order=(1, 1, 0, 4),

enforce_stationarity=False, enforce_invertibility=False)

results = model.fit(disp=False)

except Exception as e:

print(f"⚠️ Multivariate model failed ({e}). Retrying with univariate SARIMA...")

exog = None

model = SARIMAX(endog, order=(1, 1, 1), seasonal_order=(1, 1, 0, 4),

enforce_stationarity=False, enforce_invertibility=False)

results = model.fit(disp=False)


# === 8. Forecast next 24 quarters (2020–2025) ===

future_steps = 24

future_index = pd.date_range("2020-03-31", periods=future_steps, freq="Q")

future_exog = exog.reindex(future_index, method="ffill") if exog is not None else None


forecast = results.get_forecast(steps=future_steps, exog=future_exog)

forecast_df = forecast.summary_frame(alpha=0.05)

forecast_df = forecast_df.rename(columns={

"mean": "Forecasted Net Sales (Rs)",

"mean_ci_lower": "Lower CI",

"mean_ci_upper": "Upper CI"

})

forecast_df.index.name = "Quarter Ended"


# === 9. Save to Excel ===

forecast_file = os.path.splitext(excel_file)[0] + "_Forecast.xlsx"

forecast_df.to_excel(forecast_file, sheet_name="Forecast_Report")


print(f"\n✅ Forecast_Report successfully saved to:\n{forecast_file}")

print("📊 Forecast preview:")

print(forecast_df.head(5))

time.sleep(1)


# === 10. Generate and save forecast plot ===

import matplotlib.pyplot as plt


plt.figure(figsize=(10, 5))

plt.plot(endog.index, endog, label="Actual Sales", linewidth=2)

plt.plot(forecast_df.index, forecast_df["Forecasted Net Sales (Rs)"], 'r--', label="Forecast", linewidth=2)


# Confidence interval shading

plt.fill_between(

forecast_df.index,

forecast_df["Lower CI"],

forecast_df["Upper CI"],

color="gray", alpha=0.2, label="95% Confidence Interval"

)


plt.title("Quarterly Net Sales Forecast (2020–2025)")

plt.xlabel("Quarter Ended")

plt.ylabel("Net Sales (Rs)")

plt.legend()

plt.grid(True, linestyle="--", alpha=0.6)

plt.tight_layout()


plot_file = os.path.splitext(excel_file)[0] + "_Forecast.png"

plt.savefig(plot_file, dpi=300)

plt.close()


print(f"📈 Forecast chart saved to: {plot_file}")

FINAL OUTPUT


In CMD paste the location of the python script and the source dataset excel file

python "D:/DISA/AICA/presentation/hackathon/Demo/sarimax_forecast.py" "D:/DISA/AICA/presentation/hackathon/Demo/Sourcedataset1.xlsx"