AI-Powered Workflow - Automation of IT Refund Status Check & CommunicationRecord inserted or updated successfully.
AI & Audit Automation

AI-Powered Workflow - Automation of IT Refund Status Check & Communication

Author : CA. kadiyala praveen

Watch on Youtube

The Problem Statement

Challenge: Manual, repetitive and time-consuming process for checking Income Tax Refund Status and downloading and mailing ITR forms to clients.

Description of Manual Process: Chartered Accountants and their staff must manually perform the following steps for every client:

  1. Log in to the Income Tax Portal.
  2. Navigate to the 'Know Your Refund Status' section and input the Assessment Year.
  3. Navigate to the 'E-File' → 'View Filed ITR' section to download the filed ITR form.
  4. Draft and send a separate email with the ITR form attached to the client.
  5. Send a follow-up notification via WhatsApp.

This multi-step process is highly susceptible to human error, is monotonous, and significantly consumes billable hours, especially during peak filing seasons.

The Solution (Automated Workflow)

The proposed solution is a hybrid automation model that combines Robotic Process Automation (RPA) for task execution and AI for intelligent data interaction, creating a seamless and efficient end-to-end compliance workflow.

Key Components:

  1. Input Data: A master Excel sheet containing client-specific variables (PAN, Portal Password, Email ID, WhatsApp Number, Assessment Year).
  2. Automation Tool: Microsoft Power Automate Desktop (PAD).
  3. AI Layer: Python (Streamlit, Pandas) integrated with the Gemini API.

Automated Process Flow:

  1. Read Input: PAD reads the input data from the master Excel file.
  2. RPA Execution (PAD):
  3. Logs into the Income Tax Portal using client credentials.
  4. Navigates to download the Refund Status and the Filed ITR Form for the specified Assessment Year.
  5. Utilizes built-in actions to send the downloaded ITR Form via email to the client.
  6. Automates WhatsApp Web to send a direct notification confirming the email dispatch.
  7. Data Logging & Error Handling: PAD logs the Refund Status and the overall Flow Status (e.g., 'Mailed Successfully', 'Invalid Username/Password') back into the Excel sheet.
  8. AI Data Interface (Chatbot): The updated Excel sheet serves as the dataset for an AI-powered chatbot, allowing team members to ask natural language queries about the aggregated client data and related tax provisions.



AI Implementation (The Gemini API & Chatbot)

The solution moves beyond basic RPA by implementing an AI-driven interface using the Gemini API for sophisticated natural language processing (NLP) and document generation.

ComponentTechnology UsedFunctionality & Role of AI

Data InterfacePython, Streamlit, PandasStreamlit provides the user-friendly interface. Pandas is used to ingest and process the updated dataset containing refund statuses.
Intelligent Query HandlingGemini API (Google AI)The Gemini API processes complex, conversational queries related to the uploaded dataset.
Examples of AI Capability:Data Retrieval: "What is the refund status for client ABC?" → AI searches the dataset and provides the exact status.
Domain Expertise: "How long does it typically take to credit a refund as per IT Act Provisions?" → AI retrieves and summarizes relevant Income Tax provisions.
Content Generation: "Draft a professional email to client XYZ explaining their refund status is 'Sent to Bank.'" → AI generates a structured, context-aware email draft.

Impact, Efficiency Gains, and Scalability

Key MetricBefore Automation (Manual)After Automation (RPA + AI)

Time per Client∼5−7 minutes<1 minute (run time)
Time SavingsN/A∼80−90% of process time saved.
Accuracy/Error RateProne to human errors in credential entry, email addressing, or attachment oversight.Near 100% accuracy. Errors (e.g., 'Invalid Password') are logged and isolated automatically.
Staff FocusMonotonous data entry and repetitive communication.Value-added tasks, client consultation, and complex case analysis.

Key Benefits:

  1. Enhanced Client Service: Clients receive their ITR forms and refund status notifications immediately and automatically, improving the perception of professionalism and speed.
  2. Efficiency: Frees up significant human resource time, allowing CA firms to handle a larger volume of clients without increasing staff size.
  3. Scalability: The underlying logic is universally applicable. This model can be replicated to automate data collection and status checks for other statutory portals like GST, EPF, MCA, TDS Reconciliation, etc., creating a unified compliance dashboard.
  4. Actionable Intelligence: The AI layer transforms a static report (Excel sheet) into a dynamic, queryable knowledge base, providing instant insights and drafting capabilities for the team.


Input file before running the flow: (User need to fill PAN/Password/Mail/Whatsapp number/AY)


 

Updated Excel after running the flow:



AI Chatbot Interface using Python/Streamlit/Gemini API:

Interface for uploading the database



AI Response from Dataset – Query & Bot Response:




Power Automate Desktop Flows for reference:








Python + Streamlit + Gemini API code:

import streamlit as st

import pandas as pd

import os

from google import genai

from google.genai.errors import APIError

st.set_page_config(page_title="Advanced Data Query Bot 🤖", layout="wide")

GEMINI_API_KEY = XXXXXXXXXXXXXXXXXXXXXXX

try:

   if GEMINI_API_KEY == "YOUR_ACTUAL_GEMINI_API_KEY_HERE" or not GEMINI_API_KEY:

       st.error("API Key not found. Please replace the placeholder in the code with your actual key.")

       st.stop()

     

   client = genai.Client(api_key=GEMINI_API_KEY)

   model = "gemini-2.5-flash"

   

   tools_config = [{"google_search": {}}]

   

except Exception as e:

   st.error(f"Error initializing Gemini client: {e}")

   st.stop()


@st.cache_data

def load_data(uploaded_file):

   """Loads, standardizes, and caches the Excel data."""

   try:

       df = pd.read_excel(uploaded_file)

       df.columns = ["AY", "Refund Status", "Client Name", "Refund $", "Date of Filing", "Refund Date"]

       

       date_cols = ["Date of Filing", "Refund Date"]

       for col in date_cols:

           df[col] = pd.to_datetime(df[col], errors='coerce').dt.strftime('%Y-%m-%d')

           

       df[date_cols] = df[date_cols].fillna('N/A')           

       return df

   except Exception as e:

       st.error(f"Error loading or standardizing data: {e}. Check your file structure.")

       return pd.DataFrame()

def generate_gemini_response(df, user_query):

   """

   Crafts a prompt with the data and sends it to the Gemini model

   for context-aware querying, including tool use.

   """

 data_text = df.astype(str).to_markdown(index=False)

       system_instruction = (

       "You are an expert financial data analysis bot and tax assistant. "

       "Your primary task is to answer the user's question using the provided 'Table Data'. "

       "The columns are AY, Refund Status, Client Name, Refund $, Date of Filing, and Refund Date. "

       

       "**DATA QUERY INSTRUCTIONS:** "

       "If the query relates to the data (e.g., specific Client Name or status): "

       "1. Identify the relevant rows based on the client name or other identifiers. "

       "2. If the user asks for the **time taken to credit the refund**, calculate the difference "

       "between 'Refund Date' and 'Date of Filing' for the specific client. "

       "3. Provide the requested information clearly and concisely. "

       

       "**GENERIC QUERY INSTRUCTIONS (If the Google Search Tool is available):** "

       "If the query is a general question (e.g., 'How long does a tax refund usually take?') "

       "that cannot be answered with the table data, use the Google Search tool to find and cite the answer "

       "based on relevant tax provisions or typical processing times. "

       "**Always state your source or mention if the answer is from general knowledge or external search.**"

   )


   user_prompt = (

       f"**User Query:** '{user_query}'\n\n"

       f"**Table Data (use this first):**\n{data_text}"

   )


   try:

       response = client.models.generate_content(

           model=model,

           contents=user_prompt,

config=genai.types.GenerateContentConfig(

system_instruction=system_instruction,

               tools=tools_config

           )

       )

       return response.text

   except APIError as e:

       return f"An API Error occurred: {e}"

   except Exception as e:

       return f"An unexpected error occurred: {e}"



# --- Streamlit App Layout ---


st.title("Advanced Tax Data Bot (Gemini + Search) 📊🔍")

st.markdown("Upload your Excel data and ask specific questions about client refund statuses, time taken, or general tax refund provisions.")


# File Uploader

uploaded_file = st.file_uploader(

   "Upload Excel File (.xlsx or .xls)",

   type=['xlsx', 'xls']

)


if uploaded_file is not None:

   # Load and process the data

   data_df = load_data(uploaded_file)

   

   if not data_df.empty:

       st.success(f"Data loaded successfully with {len(data_df)} records.")

       

       st.subheader("Data Preview")

       st.dataframe(data_df.head())

       

       # User Input for Query

       user_input = st.text_input(

           "Enter your query (e.g., 'How long did it take for the refund of John Doe?' or 'How long does tax refund credit take as per Indian tax law?')",

           key="user_query"

       )

       

       if user_input:

           with st.spinner("Processing your query with Gemini (and Google Search, if needed)..."):

               # Get the response from the Gemini model

               response = generate_gemini_response(data_df, user_input)

           

           # Display the result

           st.subheader("Bot Response")

           st.info(response)

   

   else:

       st.warning("Could not process the uploaded file. Please ensure it's a valid Excel file with the expected headers.")


else:

   st.info("Please upload an Excel file to begin.")