Bank Statement to Balance Sheet - AI-Powered Transaction Classification and Financial Statement Preparation for Indian CA Practice
AI & Accounting

Bank Statement to Balance Sheet - AI-Powered Transaction Classification and Financial Statement Preparation for Indian CA Practice

Author : CA. Krupanand Bammidi

Watch on Youtube

1. Overview and Objective

Bank Statement to Balance Sheet (v2.0) is a locally-running, AI-assisted accounting automation platform built specifically for Indian CA practice. It accepts bank statements in PDF, Excel, and CSV formats from all major Indian banks, classifies every transaction using a 112-rule deterministic engine combined with Gemini AI, and generates a complete set of financial statements -- Schedule III Balance Sheet, Profit and Loss Account, Notes to Accounts, and Trial Balance -- in a single session.


The platform is built on Python and Streamlit and runs entirely on the CA's local machine. No client data is transmitted to any external server during the main workflow. Where Gemini AI is invoked for ambiguous transactions, all personally identifiable information is automatically stripped from the narration before transmission, in compliance with the Digital Personal Data Protection Act, 2023.


Key outcome: A bank statement engagement that previously required two to three working days for transaction classification and financial statement preparation is completed in under one hour, with higher consistency, full audit traceability, and a DPDP-compliant AI processing log.


2. Target Audience

User GroupPrimary Use

Practising CAs (sole proprietors and small firms)Bank statement processing, classification, Trial Balance and Schedule III generation for proprietary, partnership, LLP, and company clients
CA firms handling multiple SME clientsBatch processing across clients; rule sets carry over between sessions
Statutory AuditorsMulti-agent audit compliance check across Companies Act 2013, Income Tax Act 1961, GST Act 2017, and Labour Law
CA students and articleship traineesSupervised classification with CA review and override at every step


3. Problem Statement

3.1 Time and Accuracy

For a practising CA handling small and medium business clients without in-house bookkeepers, the year-end workflow is familiar: the client provides a bank statement, and the CA must convert it into a Trial Balance and financial statements. For a client with 800 to 1,200 transactions, this classification work consumed two to three working days per client -- time that could not be billed effectively and that created a bottleneck during the March-April filing season.


The narrations in Indian bank statements are rarely descriptive. A single day's transactions might read: "UPI/412938761234/PYTM," "NEFT/AXIB0001234/REF8872991," "NACH DR HDFC CREDILA," "INT DEB 250126." There is no merchant name, no purpose, no account description. Each narration requires a judgment call as to whether it represents a business expense, a capital item, a loan repayment, a statutory payment, or a personal drawing.


Classification errors have downstream consequences. A single misclassification -- a capital expenditure coded as a revenue expense, or a director's drawing coded as a salary -- flows into the Trial Balance, distorts the Profit and Loss, and produces a Balance Sheet that does not balance or does not correctly represent the financial position.

3.2 Data Privacy Constraint

When evaluating cloud-based AI tools to assist with narration classification, a further constraint emerged. Passing raw bank statement data to any external server -- containing client UPI IDs, account numbers, mobile numbers, merchant identities, and transaction amounts -- raises obligations under the Digital Personal Data Protection Act, 2023. As a Data Fiduciary within the meaning of Section 2(i) of that Act, I could not satisfy myself that purpose limitation under Section 5, valid consent under Section 6, and the security safeguard obligations under Section 8(4) were being met by cloud-based AI tools. The solution had to maintain data locally.

3.3Financial Statement Preparation

Even after classification was complete, preparing Schedule III-compliant financial statements required additional manual work: mapping classified ledgers to the correct Schedule III heads, constructing the Notes to Accounts, adding the prior year comparison column from last year's file, verifying the Trial Balance, and populating the entity profile (CIN, FRN, director names, auditor details) across multiple pages. This assembly work added two to four hours per client, even when the classification itself was accurate.


4. Solution Architecture and Process Flow

The platform follows a seven-step sequential workflow with a visual progress tracker. Each step gates the next: the CA must confirm data before proceeding to classification, must confirm classification before proceeding to reports, and must finalise the review queue before generating financial statements. This gate structure ensures no data flows forward without the CA's active approval.

Step 1 -- Upload and Extract

The platform accepts bank statements in three formats via tabbed upload panels: PDF, Excel (.xlsx/.xls), and CSV (.csv/.txt). For text-based PDF statements, extraction uses pdfplumber with bank-specific column profiles tuned for HDFC, ICICI, SBI, Axis, Kotak, IDFC FIRST, Bank of Baroda, and other major Indian banks. The system detects the bank format automatically from header patterns when Auto Detect is selected.


For scanned or image-based PDFs, the system detects insufficient text yield from pdfplumber and automatically invokes pytesseract OCR on rasterised page images at 300 DPI. For mixed documents -- part text, part scanned -- both engines run page-by-page and their outputs are merged before entering the pipeline. An MD5 hash signature prevents re-processing of unchanged files in the same session.

Step 2 -- Clean and Structure

Extracted transactions are displayed in an editable data grid. The CA can correct dates, amounts, or narrations before classification begins. The running balance check validates that the balance column is arithmetically consistent with the debit and credit columns across all rows. Mismatches are displayed with expected balance, actual balance, and difference, enabling the CA to identify extraction errors before they propagate. The CA clicks Confirm Data to lock the cleaned dataset.

Step 3 -- Classification Engine (Three Layers)

This is the core of the platform. Classification proceeds in three sequential layers.


Layer 1 -- Deterministic Rule Engine (112 rules, 10 priority tiers):

Priority TierCoverageConfidence

Tier 1Statutory Payment Identifiers: GST CPIN, TDS TAN/TRACES, PF TRRN, ESI challan, Professional Tax, Advance Tax challan0.95 to 0.98
Tier 2Named Indian counterparties: electricity boards (BESCOM, TNEB, MSEDCL), telecom (Airtel, Jio, BSNL), insurers (LIC, Star Health, New India), gas utilities (Indane, HP Gas)0.90 to 0.96
Tier 3Payroll and employee costs: salary, wages, director remuneration, bonus, PF/ESI employer contribution, gratuity0.88 to 0.94
Tier 4Loan and finance: EMI, NACH debit, OD interest, FD maturity, dividend, interest income0.85 to 0.92
Tier 5Business income sub-categories: exam fees, training, consultation, commission, rental, sales0.85 to 0.90
Tier 6Operational expenses (25 rules): rent, medical supplies, staff welfare, travel, repairs, professional fees, marketing, software, courier, printing0.82 to 0.90
Tier 7Bank charges: service charges, SMS, cheque book, RTGS/NEFT, minimum balance penalty0.88 to 0.95
Tier 8Cash transactions: ATM withdrawals, cash deposits, counter withdrawals0.85 to 0.92
Tier 9Inter-account transfers: own account, sweep-in, sweep-out0.88 to 0.93
Tier 10Capital expenditure flags: large one-time debits to identified capex narration patterns0.78 to 0.88


Director Name Matching: After rule classification, the engine extracts director and signatory names from the entity profile (Director 1, Director 2, Managing Director, Chairman, CFO, Partner). Narrations in flagged rows that contain these name segments are reclassified to Director Remuneration with confidence 0.82 and a Review flag.


Layer 2 -- Pattern Engine (Gemini AI for low-confidence rows):

Transactions that fall below the CA-configurable confidence threshold (default 0.75, range 0.50 to 0.95) are passed to the AI classification module. Before any data leaves the local machine, the PII redaction function strips all personally identifiable information from each narration:

  1. UPI virtual payment addresses (word@word format) replaced with [UPI]
  2. Indian mobile numbers (10-digit, starting with 6-9) replaced with [MOB]
  3. Bank account numbers (9 to 18 consecutive digits) replaced with [ACCT]
  4. IFSC codes (4 alpha + 0 + 6 alphanumeric) replaced with [IFSC]
  5. Email addresses replaced with [EMAIL]
  6. Aadhaar numbers (12-digit with optional spaces) replaced with [AADHAAR]

The Gemini model receives only anonymised narration text, transaction amount, debit/credit direction, and entity nature. It returns Primary Head, Sub Head, Detailed Category, confidence percentage, and a one-sentence reasoning note.


Layer 3 -- Confidence Adjustment and Flag Assignment:

After AI classification, confidence scores are adjusted upward where a valid primary head was returned and the raw score exceeded 0.60. Rows remaining below 0.72 receive Review Flag = Review. Rows classified into generic fallback categories (General Expense, UPI Payment - Verify Purpose, Unclassified Payment - Verify Purpose) are force-set to Review Flag = Review with confidence 0.42, ensuring the CA reviews all genuinely ambiguous rows.


Learning Mode: When enabled, high-confidence AI results (confidence >= 0.85, flag = OK) are stored as persistent user rules, applied in future classification runs for the same client.


Director Balance Sheet Reclassification: After initial classification, rows coded as Director Remuneration where the narration contains advance, drawings, personal expense, or loan to director keywords (and no salary indicators) are automatically reclassified to Assets / Loans and Advances -- Directors with confidence 0.91 and flag Auto-Reclassified.


The Step 3 interface has five tabs:

  1. Classify: Confidence threshold slider, Learning Mode toggle, Classify button, dashboard showing Total / By Rules / Pattern Engine / For Review counts
  2. Results: Payee extraction via narration parser, traffic light flag indicators, status column (Auto-Accepted >= 80% / Review Suggested 60-80% / Needs Confirmation < 60%), filter by status and flag, inline editable classified data, Excel download
  3. Bulk Actions: Groups flagged rows by extracted payee with total debits and credits. Preview count and amounts before applying. Applies reclassification to all rows from selected payee in one click
  4. Gemini AI: Manual single-transaction Gemini assist. PII redaction applied before each API call. Model auto-selected from available Gemini models prioritising flash models. Results shown with Primary Head, Sub Head, Confidence, Category, and Reasoning
  5. Rules: Merged view of user rules (priority <= 50) and system rules (priority > 50) in editable grid. User rules saved persistently to disk

Step 4 -- Tally Import and Reconciliation (Large Client Path)

For clients who maintain books in Tally, the platform accepts Trial Balance exports or voucher exports in Excel, CSV, or JSON format. The format auto-detection identifies whether the uploaded file is a Trial Balance or a voucher register. For Trial Balance format, a ledger-to-account-head resolution engine maps each Tally ledger to a Schedule III Primary Head and Sub Head using two-level matching: (1) GROUP_TO_HEAD covering 23 standard Tally group names, and (2) LEDGER_KEYWORD_MAP covering approximately 80 ledger keyword patterns. Each ledger is converted into a synthetic classified row. Known Tally group aggregate rows are excluded to prevent double-counting. Small Client sessions skip Step 4 automatically.

Step 5 -- Review Fixes

The review queue presents all transactions with Review Flag = Review. Quick action controls: Confirm All, Skip All, Download Review Template (Excel), Upload Reviewed Template. The client email workflow allows the CA to dispatch a review email via Gmail SMTP containing an Excel file of flagged transactions with a DPDP Act 2023 compliant data notice. The client fills in the Purpose column and replies; the CA uploads the response file to apply corrections automatically.

Bulk Edit by narration keyword allows the CA to reclassify all transactions whose narration contains a specified text string in one operation. The Review Interface presents each queued transaction with a Review Action dropdown (Confirm Classification / Skip for Later / Pending) and inline override fields. Finalise Review blocks if any row is still Pending, and advances to Step 6 when all rows are either Confirmed or Skipped.

Step 6 -- Financial Reports

Dashboard KPI cards show Total Income, Total Expenses, and Net Profit / Loss. An Income Hierarchy Reasonableness Check runs three automated warnings before statement generation: (1) Other/Miscellaneous Income exceeding Operating Income (HIGH -- likely classification error); (2) Finance Costs exceeding Operating Expenses (MEDIUM -- loan principal likely miscoded as interest); (3) Total Expenses exceeding three times Revenue (MEDIUM -- capital items likely coded as operating expenses).


The extended Trial Balance Schedule shows Opening Debit/Credit, Period Debit/Credit, and Closing Debit/Credit for every ledger, with balance validation. Opening Balances and Adjustments editors allow entry of prior year closing balances and year-end journal entries (depreciation, provisions, accruals) not captured in the bank statement. Pre-populated with 37 standard ledger rows.


Prior Year Figures panel provides 24 line items (18 Balance Sheet, 6 Profit and Loss) for the previous year comparison column. The Copy from Opening Balances button auto-maps account names to Schedule III line items using a 60-entry account-to-key mapping dictionary.


Generate Financial Statements produces the full output pack:

OutputFormat

Balance SheetSchedule III Division I -- formatted as HTML with company header and auditor sign-off block
Statement of Profit and LossSchedule III -- formatted as HTML with prior year comparison column
Notes to AccountsReference numbers matching the Balance Sheet
Balance Sheet LedgerLedger-level detail underlying each Schedule III line
P&L Ledger DetailTransaction-level detail underlying each P&L line
Trial BalanceStandard debit/credit Trial Balance with balance validation
Ledger SummaryAggregated view by Primary Head and Sub Head
Ledger ScrutinyWorking paper flagging miscellaneous bucket entries
Expense BreakdownCategorised expense detail for audit working papers
Excel WorkbookCover sheet (entity profile), all data sheets, all statement sheets in one workbook
Print-ready HTMLBalance Sheet + P&L for browser print to PDF (Ctrl+P)
Financial Statements EmailFull Excel workbook dispatched to client via Gmail SMTP from within the platform

Step 7 -- Multi-Agent Audit Compliance Check


The MasterAuditController runs a full statutory audit across five domains:

Audit AgentCoverage

Companies Act AgentCARO 2020 reporting requirements, Section 134 (Board's Report), Section 135 (CSR if applicable), Section 184 (Director interest disclosure), Section 188 (Related Party Transactions)
Income Tax AgentCash payments exceeding Rs. 10,000 to a single person in a day (Section 40A(3), Income Tax Act, 1961), TDS deduction verification, advance tax adequacy assessment
GST AgentInput Tax Credit eligibility under Section 16, CGST Act, 2017 including 180-day payment verification, RCM applicability checks
Labour Law AgentPF compliance (EPF and MP Act, 1952), ESI compliance, Bonus Act applicability, Gratuity Act provisions
Financial Analysis AgentKey financial ratios, anomaly detection, variance analysis


Results are displayed across five severity tabs (Critical / High / Medium / Low / Info) and five area tabs (Companies Act / Income Tax / GST / Standards on Auditing / Others). Each finding includes Category, Severity, Section Reference, Description, Amount, and Recommendation. Results persist to audit_results.json for session continuity. The DPDP Act 2023 AI Data Processing Log at the bottom of Step 7 records every Gemini API call with full field-level disclosure.


5. Application Features Summary

FeatureDetail

Supported Input FormatsPDF (text and scanned via OCR), Excel (.xlsx/.xls), CSV (.csv/.txt)
Supported BanksHDFC, SBI, ICICI, Axis, Kotak, IDFC FIRST, Bank of Baroda, and others via Auto Detect
Classification Engine112 rules across 10 priority tiers; Gemini AI Pattern Engine for low-confidence rows
Entity FormatsSchedule III Company, NCE (Proprietorship/Partnership), LLP, NPO/Trust/NGO
Accounting FrameworksAS (Accounting Standards), Ind AS (Indian Accounting Standards)
Output StatementsBalance Sheet, P&L, Notes to Accounts, Trial Balance, Ledger Scrutiny, Expense Breakdown
Prior Year ColumnAuto-populated from opening balances or manual entry; 24 line items
Client EmailFlagged transaction review file and final financial statements via Gmail SMTP from within the platform
Audit ComplianceMulti-agent check across Companies Act, Income Tax, GST, Labour Law
DPDP CompliancePII redaction before every AI call; full API transmission log in Step 7
Session ManagementAuto-save at each step; multi-client session list with resume, delete, and erase options
Learning ModeCA corrections converted to persistent rules; applied in future runs
Data Residency100% local; no client data stored on any external server


6. Technical Setup

Prerequisites

  1. Python 3.8 or higher
  2. Gemini API key -- Google AI Studio, free tier sufficient for classification volume
  3. Gmail App Password for email functions (optional)

Core Python Dependencies

streamlit, pandas, pdfplumber, pytesseract, Pillow, openpyxl, python-dotenv, google-generativeai

Module Structure

ModuleResponsibility

app.pyOrchestration shell, 7-step workflow, DPDP PII redaction utilities, AI transmission log
classification.py112-rule engine, RULE_COLUMNS, user rule persistence
ai_classifier.pyGemini AI classification, learn_from_correction, get_classification_stats
classifier.pyclassify_transactions core function
narration_parser.pyPayee/payer extraction from Indian bank narration formats
parser.pyPDF/Excel/CSV extraction, OCR fallback, SUPPORTED_BANKS profiles
tally_import.pyTally voucher and Trial Balance import, GROUP_TO_HEAD mapping
reports.pybuild_financial_statement_pack, Trial Balance builders, export_to_excel
review.pyReview interface, bulk edit, build_review_template, apply_review_decisions
audit_agents.pyMasterAuditController, per-domain audit agents, AuditFinding
audit_compliance.pyAUDIT_AREAS, run_all_audit_checks, export_audit_compliance_report
rules_engine.pyClassification tree, persist_rule, export_rules_snapshot
session_manager.pyMulti-client session save, load, list, delete
email_client.pysend_review_email, send_statements_email, Gmail SMTP, DPDP notice
entity_profile.pyCompany/board/auditor profile, build_entity_cover_dataframe
navigation.pyProgress tracker, step gating, advance_to_step
theme.pyUI components, HTML financial statement rendering, build_print_html


Launch command: streamlit run app.py -- opens at http://localhost:8501. No cloud account required for the main classification and report generation workflow.


7. Key Benefits and Impact

MetricBeforeAfter

Classification time per client (800-1,200 transactions)2 to 3 working daysUnder 1 hour
Financial statement assembly2 to 4 additional hoursUnder 3 minutes
Prior year comparison columnManual -- open last year's file, copy figuresAuto-populated from opening balances
Classification consistencyVaries with fatigue and deadline pressureIdentical 112-rule set applied to every row
Client review processPhone calls and WhatsApp messagesStructured Excel review file via email from within platform
DPDP Act complianceNo documented basis for cloud AI useFull PII redaction + API transmission log per session
Audit observationsPrepared separately after finalisationAuto-generated in Step 7 with section references
Bank format supportEach bank required manual re-formattingAuto-detected for all major Indian banks
Data securityClient data uploaded to cloud AI toolsAll data remains on the CA's local machine


8. Compliance and Regulatory Basis

Regulatory ReferenceApplication in Platform

DPDP Act, 2023 -- Sections 5, 6, 8(4)PII redaction before all AI API calls; DPDP compliance log in Step 7; purpose limitation enforced by design
Companies Act, 2013 -- Schedule IIIBalance Sheet and P&L generated in Schedule III Division I format; Notes to Accounts with reference numbers
Companies Act, 2013 -- Sections 134, 135, 184, 188Companies Act audit agent checks in Step 7 (Board's Report, CSR, Director interest, Related Party Transactions)
Income Tax Act, 1961 -- Section 40A(3)Cash payment limit check (Rs. 10,000 per person per day) in Income Tax audit agent
CGST Act, 2017 -- Section 16ITC eligibility and 180-day payment verification in GST audit agent
EPF and MP Act, 1952PF deduction and employer contribution verification in Labour Law audit agent
ICAI Accounting StandardsEntity profile field for AS vs Ind AS selection; framework-aware classification and reporting
ICAI Standards on AuditingSA area tab in Step 7 audit compliance output


9. Sample Classification Output

FieldExample 1 (Rule Match)Example 2 (AI Assist)

Raw NarrationUPI/412938761234/GSTCPIN20250113/CBECNEFT/AXIB0001234/REF8872991
After PII RedactionNot applicable -- rule matchNEFT/[IFSC]/REF[ACCT]
Primary HeadLiabilitiesExpenses (flagged for review)
Sub HeadTax PayablesOperational
Detailed CategoryGST Payment -- Output TaxVendor Payment -- Verify Purpose
Rule / Methodgst_cpin_tier1Gemini AI -- Pattern Engine
Confidence Score0.970.58
Review FlagOKReview


10. DPDP Act 2023 -- AI Processing Log (Sample Entry)

Log FieldValue

Timestamp2026-05-22 14:32:17
Call Typesingle_transaction
Records Sent1
PII RedactedYes -- UPI IDs, Mobile Numbers, Account Numbers, IFSC Codes, Aadhaar
AI ProviderGoogle Gemini
Modelgemini-2.0-flash (auto-selected from available models)
PurposeTransaction classification for accounting
Fields SentRedacted narration, Amount, Direction, Entity nature
Fields WithheldEntity name, Client name, Raw UPI IDs, Mobile numbers, Account numbers


This log is exportable from Step 7 and constitutes the firm's documented record of data processing activities under Section 8(4) of the Digital Personal Data Protection Act, 2023.


11. Conclusion and Future Roadmap

Bank Statement to Balance Sheet v2.0 automates the most time-consuming segment of small-client accounting engagements for Indian CA practice -- transaction classification and financial statement preparation -- by combining a deterministic 112-rule engine with Gemini AI as a controlled fallback, all within a DPDP Act 2023 compliant architecture where no client data leaves the CA's machine.


The platform demonstrates that AI can be deployed in CA practice without compromising data sovereignty, client confidentiality, or the CA's professional judgment. Every classification is reviewable and overridable by the CA. Every AI call is logged with field-level disclosure. Every financial statement requires the CA's explicit confirmation before generation.


Planned additions for v3.0 include:

  1. GSTR-1 and GSTR-3B reconciliation against classified income rows
  2. Form 26AS and AIS reconciliation against TDS-related bank entries
  3. Direct Tally XML import and Tally-compatible voucher export
  4. ITR-4 (44AD) computation from classified data for eligible assesses
  5. WhatsApp integration for client review workflow
  6. PDF digital signature on exported financial statements


This platform embodies ICAI's vision for AI-assisted CA practice: technology that amplifies the Chartered Accountant's judgment, handles scale and repetition, builds statutory compliance into every output, and keeps professional control, data sovereignty, and ethical responsibility firmly with the CA throughout the engagement.