Financial forecasting tool using historical data and other variables
Author : CA. Shilpa Susan Kuriachan
Author : CA. Shilpa Susan Kuriachan
As Chartered Accountants, we need to prepare projections / project reports / projected cash flows. These projections analyse the historical data of the company as well as the future trends in the industry / market environment. The AI Use case prepares a powerful tool to help CAs analyse these external parameters.
Prompt 1 to ChatGPT-5 - Using the data in the attached excel file, how do I prepare a financial forecasting report quarter wise for period 2020 to 2025
Quarter ended Net Sales Rs Gold price per 10gms Festive season Other business factors
30-Jun-16 213832 27445 0 0
30-Sep-16 198751 27445 0 0
31-Dec-16 325500 27445 1 0
31-Mar-17 286209 29156 0 0
30-Jun-17 335221 29156 0 0
30-Sep-17 271076 29156 0 0
31-Dec-17 356986 29156 1 0
31-Mar-18 329206 31391 0 0
30-Jun-18 357166 31391 0 0
30-Sep-18 358208 31391 0 0
31-Dec-18 489034 31391 1 0
31-Mar-19 398550 39108 0 0
30-Jun-19 404744 39108 0 0
30-Sep-19 352777 39108 0 0
31-Dec-19 540866 39108 1 0
31-Mar-20 375435 50151 0 1
Prompt 2 – give me a excel macro code for preparing the forecast file
Prompt 3 – create an excel macro based on sarimax to prepare forecast report
Prompt 4 – errors generated are as below :
Shell shellCommand, vbNormalFocus error
Forecast_report sheet not displayed
blank excel is displayed
SyntaxError: unexpected character after line continuation character
SyntaxError: invalid syntax
SyntaxError: invalid syntax
❌ No Excel file passed.
NameError: name 'pd' is not defined. Did you mean: 'id'?
KeyError: "None of [Index(['Gold price per 10gms', 'Festive season', 'Other business factors'], dtype='object')] are in the [columns]"
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).
exog contains inf or nans
Prompt 5 – Please re-write the python script
# === FINAL AUTO-CLEANING SARIMAX FORECAST SCRIPT ===
import sys, os, time, re
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from openpyxl import load_workbook
print("Raw argv:", sys.argv)
# === 1. Read Excel path ===
try:
excel_file = sys.argv[1]
except IndexError:
print("❌ No Excel file passed.")
sys.exit(1)
print(f"📘 Processing workbook: {excel_file}")
# === 2. Load Excel ===
df = pd.read_excel(excel_file, sheet_name="Sheet1")
df.columns = df.columns.str.strip().str.lower()
print("Detected columns:", df.columns.tolist())
# === 3. Universal cleaner for Indian-style data ===
def clean_to_numeric(val):
"""Convert Excel cell to clean float (handles Rs, ₹, commas, %, text)."""
if pd.isna(val):
return np.nan
if isinstance(val, str):
# remove currency symbols, commas, and spaces
val = val.replace("Rs.", "").replace("Rs", "").replace("₹", "")
val = re.sub(r"[^\d\.\-]", "", val) # keep only digits, dot, minus
try:
return float(val)
except:
return np.nan
for col in df.columns:
if "quarter" not in col:
df[col] = df[col].apply(clean_to_numeric)
# === 4. Identify columns ===
date_col = [c for c in df.columns if "quarter" in c][0]
df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
df.set_index(date_col, inplace=True)
df = df.asfreq("QE")
target_col = [c for c in df.columns if "net" in c and "sale" in c][0]
endog = df[target_col]
possible_exog = [c for c in df.columns if any(k in c for k in ["gold", "festive", "business"])]
exog = df[possible_exog] if possible_exog else None
print("Matched exogenous columns:", possible_exog)
# === 5. Clean missing / infinite values ===
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.ffill(inplace=True)
df.bfill(inplace=True)
if exog is not None:
exog.replace([np.inf, -np.inf], np.nan, inplace=True)
exog.ffill(inplace=True)
exog.bfill(inplace=True)
# Drop any completely empty rows
if exog is not None:
combined = pd.concat([endog, exog], axis=1).dropna()
endog, exog = combined[target_col], combined[exog.columns]
else:
endog = endog.dropna()
print(f"✅ Clean data: Endog length = {len(endog)}, Exog shape = {None if exog is None else exog.shape}")
# === 6. Data sanity checks ===
if len(endog) == 0:
print("❌ No numeric data found after cleaning. Please verify Excel values.")
sys.exit(1)
if len(endog) < 8:
print("⚠️ Too few valid data points — switching to univariate SARIMA.")
exog = None
# === 7. Fit SARIMAX or SARIMA model ===
try:
print("🔄 Fitting SARIMAX model...")
model = SARIMAX(endog, exog=exog, order=(1, 1, 1), seasonal_order=(1, 1, 0, 4),
enforce_stationarity=False, enforce_invertibility=False)
results = model.fit(disp=False)
except Exception as e:
print(f"⚠️ Multivariate model failed ({e}). Retrying with univariate SARIMA...")
exog = None
model = SARIMAX(endog, order=(1, 1, 1), seasonal_order=(1, 1, 0, 4),
enforce_stationarity=False, enforce_invertibility=False)
results = model.fit(disp=False)
# === 8. Forecast next 24 quarters (2020–2025) ===
future_steps = 24
future_index = pd.date_range("2020-03-31", periods=future_steps, freq="Q")
future_exog = exog.reindex(future_index, method="ffill") if exog is not None else None
forecast = results.get_forecast(steps=future_steps, exog=future_exog)
forecast_df = forecast.summary_frame(alpha=0.05)
forecast_df = forecast_df.rename(columns={
"mean": "Forecasted Net Sales (Rs)",
"mean_ci_lower": "Lower CI",
"mean_ci_upper": "Upper CI"
})
forecast_df.index.name = "Quarter Ended"
# === 9. Save to Excel ===
forecast_file = os.path.splitext(excel_file)[0] + "_Forecast.xlsx"
forecast_df.to_excel(forecast_file, sheet_name="Forecast_Report")
print(f"\n✅ Forecast_Report successfully saved to:\n{forecast_file}")
print("📊 Forecast preview:")
print(forecast_df.head(5))
time.sleep(1)
# === 10. Generate and save forecast plot ===
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.plot(endog.index, endog, label="Actual Sales", linewidth=2)
plt.plot(forecast_df.index, forecast_df["Forecasted Net Sales (Rs)"], 'r--', label="Forecast", linewidth=2)
# Confidence interval shading
plt.fill_between(
forecast_df.index,
forecast_df["Lower CI"],
forecast_df["Upper CI"],
color="gray", alpha=0.2, label="95% Confidence Interval"
)
plt.title("Quarterly Net Sales Forecast (2020–2025)")
plt.xlabel("Quarter Ended")
plt.ylabel("Net Sales (Rs)")
plt.legend()
plt.grid(True, linestyle="--", alpha=0.6)
plt.tight_layout()
plot_file = os.path.splitext(excel_file)[0] + "_Forecast.png"
plt.savefig(plot_file, dpi=300)
plt.close()
print(f"📈 Forecast chart saved to: {plot_file}")
FINAL OUTPUT
In CMD paste the location of the python script and the source dataset excel file
python "D:/DISA/AICA/presentation/hackathon/Demo/sarimax_forecast.py" "D:/DISA/AICA/presentation/hackathon/Demo/Sourcedataset1.xlsx"