Bank Statement to Balance Sheet - AI-Powered Transaction Classification and Financial Statement Preparation for Indian CA Practice
Author : CA. Krupanand Bammidi
Author : CA. Krupanand Bammidi
Bank Statement to Balance Sheet (v2.0) is a locally-running, AI-assisted accounting automation platform built specifically for Indian CA practice. It accepts bank statements in PDF, Excel, and CSV formats from all major Indian banks, classifies every transaction using a 112-rule deterministic engine combined with Gemini AI, and generates a complete set of financial statements -- Schedule III Balance Sheet, Profit and Loss Account, Notes to Accounts, and Trial Balance -- in a single session.
The platform is built on Python and Streamlit and runs entirely on the CA's local machine. No client data is transmitted to any external server during the main workflow. Where Gemini AI is invoked for ambiguous transactions, all personally identifiable information is automatically stripped from the narration before transmission, in compliance with the Digital Personal Data Protection Act, 2023.
Key outcome: A bank statement engagement that previously required two to three working days for transaction classification and financial statement preparation is completed in under one hour, with higher consistency, full audit traceability, and a DPDP-compliant AI processing log.
| User Group | Primary Use |
| Practising CAs (sole proprietors and small firms) | Bank statement processing, classification, Trial Balance and Schedule III generation for proprietary, partnership, LLP, and company clients |
| CA firms handling multiple SME clients | Batch processing across clients; rule sets carry over between sessions |
| Statutory Auditors | Multi-agent audit compliance check across Companies Act 2013, Income Tax Act 1961, GST Act 2017, and Labour Law |
| CA students and articleship trainees | Supervised classification with CA review and override at every step |
For a practising CA handling small and medium business clients without in-house bookkeepers, the year-end workflow is familiar: the client provides a bank statement, and the CA must convert it into a Trial Balance and financial statements. For a client with 800 to 1,200 transactions, this classification work consumed two to three working days per client -- time that could not be billed effectively and that created a bottleneck during the March-April filing season.
The narrations in Indian bank statements are rarely descriptive. A single day's transactions might read: "UPI/412938761234/PYTM," "NEFT/AXIB0001234/REF8872991," "NACH DR HDFC CREDILA," "INT DEB 250126." There is no merchant name, no purpose, no account description. Each narration requires a judgment call as to whether it represents a business expense, a capital item, a loan repayment, a statutory payment, or a personal drawing.
Classification errors have downstream consequences. A single misclassification -- a capital expenditure coded as a revenue expense, or a director's drawing coded as a salary -- flows into the Trial Balance, distorts the Profit and Loss, and produces a Balance Sheet that does not balance or does not correctly represent the financial position.
When evaluating cloud-based AI tools to assist with narration classification, a further constraint emerged. Passing raw bank statement data to any external server -- containing client UPI IDs, account numbers, mobile numbers, merchant identities, and transaction amounts -- raises obligations under the Digital Personal Data Protection Act, 2023. As a Data Fiduciary within the meaning of Section 2(i) of that Act, I could not satisfy myself that purpose limitation under Section 5, valid consent under Section 6, and the security safeguard obligations under Section 8(4) were being met by cloud-based AI tools. The solution had to maintain data locally.
Even after classification was complete, preparing Schedule III-compliant financial statements required additional manual work: mapping classified ledgers to the correct Schedule III heads, constructing the Notes to Accounts, adding the prior year comparison column from last year's file, verifying the Trial Balance, and populating the entity profile (CIN, FRN, director names, auditor details) across multiple pages. This assembly work added two to four hours per client, even when the classification itself was accurate.
The platform follows a seven-step sequential workflow with a visual progress tracker. Each step gates the next: the CA must confirm data before proceeding to classification, must confirm classification before proceeding to reports, and must finalise the review queue before generating financial statements. This gate structure ensures no data flows forward without the CA's active approval.
The platform accepts bank statements in three formats via tabbed upload panels: PDF, Excel (.xlsx/.xls), and CSV (.csv/.txt). For text-based PDF statements, extraction uses pdfplumber with bank-specific column profiles tuned for HDFC, ICICI, SBI, Axis, Kotak, IDFC FIRST, Bank of Baroda, and other major Indian banks. The system detects the bank format automatically from header patterns when Auto Detect is selected.
For scanned or image-based PDFs, the system detects insufficient text yield from pdfplumber and automatically invokes pytesseract OCR on rasterised page images at 300 DPI. For mixed documents -- part text, part scanned -- both engines run page-by-page and their outputs are merged before entering the pipeline. An MD5 hash signature prevents re-processing of unchanged files in the same session.
Extracted transactions are displayed in an editable data grid. The CA can correct dates, amounts, or narrations before classification begins. The running balance check validates that the balance column is arithmetically consistent with the debit and credit columns across all rows. Mismatches are displayed with expected balance, actual balance, and difference, enabling the CA to identify extraction errors before they propagate. The CA clicks Confirm Data to lock the cleaned dataset.
This is the core of the platform. Classification proceeds in three sequential layers.
Layer 1 -- Deterministic Rule Engine (112 rules, 10 priority tiers):
| Priority Tier | Coverage | Confidence |
| Tier 1 | Statutory Payment Identifiers: GST CPIN, TDS TAN/TRACES, PF TRRN, ESI challan, Professional Tax, Advance Tax challan | 0.95 to 0.98 |
| Tier 2 | Named Indian counterparties: electricity boards (BESCOM, TNEB, MSEDCL), telecom (Airtel, Jio, BSNL), insurers (LIC, Star Health, New India), gas utilities (Indane, HP Gas) | 0.90 to 0.96 |
| Tier 3 | Payroll and employee costs: salary, wages, director remuneration, bonus, PF/ESI employer contribution, gratuity | 0.88 to 0.94 |
| Tier 4 | Loan and finance: EMI, NACH debit, OD interest, FD maturity, dividend, interest income | 0.85 to 0.92 |
| Tier 5 | Business income sub-categories: exam fees, training, consultation, commission, rental, sales | 0.85 to 0.90 |
| Tier 6 | Operational expenses (25 rules): rent, medical supplies, staff welfare, travel, repairs, professional fees, marketing, software, courier, printing | 0.82 to 0.90 |
| Tier 7 | Bank charges: service charges, SMS, cheque book, RTGS/NEFT, minimum balance penalty | 0.88 to 0.95 |
| Tier 8 | Cash transactions: ATM withdrawals, cash deposits, counter withdrawals | 0.85 to 0.92 |
| Tier 9 | Inter-account transfers: own account, sweep-in, sweep-out | 0.88 to 0.93 |
| Tier 10 | Capital expenditure flags: large one-time debits to identified capex narration patterns | 0.78 to 0.88 |
Director Name Matching: After rule classification, the engine extracts director and signatory names from the entity profile (Director 1, Director 2, Managing Director, Chairman, CFO, Partner). Narrations in flagged rows that contain these name segments are reclassified to Director Remuneration with confidence 0.82 and a Review flag.
Layer 2 -- Pattern Engine (Gemini AI for low-confidence rows):
Transactions that fall below the CA-configurable confidence threshold (default 0.75, range 0.50 to 0.95) are passed to the AI classification module. Before any data leaves the local machine, the PII redaction function strips all personally identifiable information from each narration:
The Gemini model receives only anonymised narration text, transaction amount, debit/credit direction, and entity nature. It returns Primary Head, Sub Head, Detailed Category, confidence percentage, and a one-sentence reasoning note.
Layer 3 -- Confidence Adjustment and Flag Assignment:
After AI classification, confidence scores are adjusted upward where a valid primary head was returned and the raw score exceeded 0.60. Rows remaining below 0.72 receive Review Flag = Review. Rows classified into generic fallback categories (General Expense, UPI Payment - Verify Purpose, Unclassified Payment - Verify Purpose) are force-set to Review Flag = Review with confidence 0.42, ensuring the CA reviews all genuinely ambiguous rows.
Learning Mode: When enabled, high-confidence AI results (confidence >= 0.85, flag = OK) are stored as persistent user rules, applied in future classification runs for the same client.
Director Balance Sheet Reclassification: After initial classification, rows coded as Director Remuneration where the narration contains advance, drawings, personal expense, or loan to director keywords (and no salary indicators) are automatically reclassified to Assets / Loans and Advances -- Directors with confidence 0.91 and flag Auto-Reclassified.
The Step 3 interface has five tabs:
For clients who maintain books in Tally, the platform accepts Trial Balance exports or voucher exports in Excel, CSV, or JSON format. The format auto-detection identifies whether the uploaded file is a Trial Balance or a voucher register. For Trial Balance format, a ledger-to-account-head resolution engine maps each Tally ledger to a Schedule III Primary Head and Sub Head using two-level matching: (1) GROUP_TO_HEAD covering 23 standard Tally group names, and (2) LEDGER_KEYWORD_MAP covering approximately 80 ledger keyword patterns. Each ledger is converted into a synthetic classified row. Known Tally group aggregate rows are excluded to prevent double-counting. Small Client sessions skip Step 4 automatically.
The review queue presents all transactions with Review Flag = Review. Quick action controls: Confirm All, Skip All, Download Review Template (Excel), Upload Reviewed Template. The client email workflow allows the CA to dispatch a review email via Gmail SMTP containing an Excel file of flagged transactions with a DPDP Act 2023 compliant data notice. The client fills in the Purpose column and replies; the CA uploads the response file to apply corrections automatically.
Bulk Edit by narration keyword allows the CA to reclassify all transactions whose narration contains a specified text string in one operation. The Review Interface presents each queued transaction with a Review Action dropdown (Confirm Classification / Skip for Later / Pending) and inline override fields. Finalise Review blocks if any row is still Pending, and advances to Step 6 when all rows are either Confirmed or Skipped.
Dashboard KPI cards show Total Income, Total Expenses, and Net Profit / Loss. An Income Hierarchy Reasonableness Check runs three automated warnings before statement generation: (1) Other/Miscellaneous Income exceeding Operating Income (HIGH -- likely classification error); (2) Finance Costs exceeding Operating Expenses (MEDIUM -- loan principal likely miscoded as interest); (3) Total Expenses exceeding three times Revenue (MEDIUM -- capital items likely coded as operating expenses).
The extended Trial Balance Schedule shows Opening Debit/Credit, Period Debit/Credit, and Closing Debit/Credit for every ledger, with balance validation. Opening Balances and Adjustments editors allow entry of prior year closing balances and year-end journal entries (depreciation, provisions, accruals) not captured in the bank statement. Pre-populated with 37 standard ledger rows.
Prior Year Figures panel provides 24 line items (18 Balance Sheet, 6 Profit and Loss) for the previous year comparison column. The Copy from Opening Balances button auto-maps account names to Schedule III line items using a 60-entry account-to-key mapping dictionary.
Generate Financial Statements produces the full output pack:
| Output | Format |
| Balance Sheet | Schedule III Division I -- formatted as HTML with company header and auditor sign-off block |
| Statement of Profit and Loss | Schedule III -- formatted as HTML with prior year comparison column |
| Notes to Accounts | Reference numbers matching the Balance Sheet |
| Balance Sheet Ledger | Ledger-level detail underlying each Schedule III line |
| P&L Ledger Detail | Transaction-level detail underlying each P&L line |
| Trial Balance | Standard debit/credit Trial Balance with balance validation |
| Ledger Summary | Aggregated view by Primary Head and Sub Head |
| Ledger Scrutiny | Working paper flagging miscellaneous bucket entries |
| Expense Breakdown | Categorised expense detail for audit working papers |
| Excel Workbook | Cover sheet (entity profile), all data sheets, all statement sheets in one workbook |
| Print-ready HTML | Balance Sheet + P&L for browser print to PDF (Ctrl+P) |
| Financial Statements Email | Full Excel workbook dispatched to client via Gmail SMTP from within the platform |
The MasterAuditController runs a full statutory audit across five domains:
| Audit Agent | Coverage |
| Companies Act Agent | CARO 2020 reporting requirements, Section 134 (Board's Report), Section 135 (CSR if applicable), Section 184 (Director interest disclosure), Section 188 (Related Party Transactions) |
| Income Tax Agent | Cash payments exceeding Rs. 10,000 to a single person in a day (Section 40A(3), Income Tax Act, 1961), TDS deduction verification, advance tax adequacy assessment |
| GST Agent | Input Tax Credit eligibility under Section 16, CGST Act, 2017 including 180-day payment verification, RCM applicability checks |
| Labour Law Agent | PF compliance (EPF and MP Act, 1952), ESI compliance, Bonus Act applicability, Gratuity Act provisions |
| Financial Analysis Agent | Key financial ratios, anomaly detection, variance analysis |
Results are displayed across five severity tabs (Critical / High / Medium / Low / Info) and five area tabs (Companies Act / Income Tax / GST / Standards on Auditing / Others). Each finding includes Category, Severity, Section Reference, Description, Amount, and Recommendation. Results persist to audit_results.json for session continuity. The DPDP Act 2023 AI Data Processing Log at the bottom of Step 7 records every Gemini API call with full field-level disclosure.
| Feature | Detail |
| Supported Input Formats | PDF (text and scanned via OCR), Excel (.xlsx/.xls), CSV (.csv/.txt) |
| Supported Banks | HDFC, SBI, ICICI, Axis, Kotak, IDFC FIRST, Bank of Baroda, and others via Auto Detect |
| Classification Engine | 112 rules across 10 priority tiers; Gemini AI Pattern Engine for low-confidence rows |
| Entity Formats | Schedule III Company, NCE (Proprietorship/Partnership), LLP, NPO/Trust/NGO |
| Accounting Frameworks | AS (Accounting Standards), Ind AS (Indian Accounting Standards) |
| Output Statements | Balance Sheet, P&L, Notes to Accounts, Trial Balance, Ledger Scrutiny, Expense Breakdown |
| Prior Year Column | Auto-populated from opening balances or manual entry; 24 line items |
| Client Email | Flagged transaction review file and final financial statements via Gmail SMTP from within the platform |
| Audit Compliance | Multi-agent check across Companies Act, Income Tax, GST, Labour Law |
| DPDP Compliance | PII redaction before every AI call; full API transmission log in Step 7 |
| Session Management | Auto-save at each step; multi-client session list with resume, delete, and erase options |
| Learning Mode | CA corrections converted to persistent rules; applied in future runs |
| Data Residency | 100% local; no client data stored on any external server |
streamlit, pandas, pdfplumber, pytesseract, Pillow, openpyxl, python-dotenv, google-generativeai
| Module | Responsibility |
| app.py | Orchestration shell, 7-step workflow, DPDP PII redaction utilities, AI transmission log |
| classification.py | 112-rule engine, RULE_COLUMNS, user rule persistence |
| ai_classifier.py | Gemini AI classification, learn_from_correction, get_classification_stats |
| classifier.py | classify_transactions core function |
| narration_parser.py | Payee/payer extraction from Indian bank narration formats |
| parser.py | PDF/Excel/CSV extraction, OCR fallback, SUPPORTED_BANKS profiles |
| tally_import.py | Tally voucher and Trial Balance import, GROUP_TO_HEAD mapping |
| reports.py | build_financial_statement_pack, Trial Balance builders, export_to_excel |
| review.py | Review interface, bulk edit, build_review_template, apply_review_decisions |
| audit_agents.py | MasterAuditController, per-domain audit agents, AuditFinding |
| audit_compliance.py | AUDIT_AREAS, run_all_audit_checks, export_audit_compliance_report |
| rules_engine.py | Classification tree, persist_rule, export_rules_snapshot |
| session_manager.py | Multi-client session save, load, list, delete |
| email_client.py | send_review_email, send_statements_email, Gmail SMTP, DPDP notice |
| entity_profile.py | Company/board/auditor profile, build_entity_cover_dataframe |
| navigation.py | Progress tracker, step gating, advance_to_step |
| theme.py | UI components, HTML financial statement rendering, build_print_html |
Launch command: streamlit run app.py -- opens at http://localhost:8501. No cloud account required for the main classification and report generation workflow.
| Metric | Before | After |
| Classification time per client (800-1,200 transactions) | 2 to 3 working days | Under 1 hour |
| Financial statement assembly | 2 to 4 additional hours | Under 3 minutes |
| Prior year comparison column | Manual -- open last year's file, copy figures | Auto-populated from opening balances |
| Classification consistency | Varies with fatigue and deadline pressure | Identical 112-rule set applied to every row |
| Client review process | Phone calls and WhatsApp messages | Structured Excel review file via email from within platform |
| DPDP Act compliance | No documented basis for cloud AI use | Full PII redaction + API transmission log per session |
| Audit observations | Prepared separately after finalisation | Auto-generated in Step 7 with section references |
| Bank format support | Each bank required manual re-formatting | Auto-detected for all major Indian banks |
| Data security | Client data uploaded to cloud AI tools | All data remains on the CA's local machine |
| Regulatory Reference | Application in Platform |
| DPDP Act, 2023 -- Sections 5, 6, 8(4) | PII redaction before all AI API calls; DPDP compliance log in Step 7; purpose limitation enforced by design |
| Companies Act, 2013 -- Schedule III | Balance Sheet and P&L generated in Schedule III Division I format; Notes to Accounts with reference numbers |
| Companies Act, 2013 -- Sections 134, 135, 184, 188 | Companies Act audit agent checks in Step 7 (Board's Report, CSR, Director interest, Related Party Transactions) |
| Income Tax Act, 1961 -- Section 40A(3) | Cash payment limit check (Rs. 10,000 per person per day) in Income Tax audit agent |
| CGST Act, 2017 -- Section 16 | ITC eligibility and 180-day payment verification in GST audit agent |
| EPF and MP Act, 1952 | PF deduction and employer contribution verification in Labour Law audit agent |
| ICAI Accounting Standards | Entity profile field for AS vs Ind AS selection; framework-aware classification and reporting |
| ICAI Standards on Auditing | SA area tab in Step 7 audit compliance output |
| Field | Example 1 (Rule Match) | Example 2 (AI Assist) |
| Raw Narration | UPI/412938761234/GSTCPIN20250113/CBEC | NEFT/AXIB0001234/REF8872991 |
| After PII Redaction | Not applicable -- rule match | NEFT/[IFSC]/REF[ACCT] |
| Primary Head | Liabilities | Expenses (flagged for review) |
| Sub Head | Tax Payables | Operational |
| Detailed Category | GST Payment -- Output Tax | Vendor Payment -- Verify Purpose |
| Rule / Method | gst_cpin_tier1 | Gemini AI -- Pattern Engine |
| Confidence Score | 0.97 | 0.58 |
| Review Flag | OK | Review |
| Log Field | Value |
| Timestamp | 2026-05-22 14:32:17 |
| Call Type | single_transaction |
| Records Sent | 1 |
| PII Redacted | Yes -- UPI IDs, Mobile Numbers, Account Numbers, IFSC Codes, Aadhaar |
| AI Provider | Google Gemini |
| Model | gemini-2.0-flash (auto-selected from available models) |
| Purpose | Transaction classification for accounting |
| Fields Sent | Redacted narration, Amount, Direction, Entity nature |
| Fields Withheld | Entity name, Client name, Raw UPI IDs, Mobile numbers, Account numbers |
This log is exportable from Step 7 and constitutes the firm's documented record of data processing activities under Section 8(4) of the Digital Personal Data Protection Act, 2023.
Bank Statement to Balance Sheet v2.0 automates the most time-consuming segment of small-client accounting engagements for Indian CA practice -- transaction classification and financial statement preparation -- by combining a deterministic 112-rule engine with Gemini AI as a controlled fallback, all within a DPDP Act 2023 compliant architecture where no client data leaves the CA's machine.
The platform demonstrates that AI can be deployed in CA practice without compromising data sovereignty, client confidentiality, or the CA's professional judgment. Every classification is reviewable and overridable by the CA. Every AI call is logged with field-level disclosure. Every financial statement requires the CA's explicit confirmation before generation.
Planned additions for v3.0 include:
This platform embodies ICAI's vision for AI-assisted CA practice: technology that amplifies the Chartered Accountant's judgment, handles scale and repetition, builds statutory compliance into every output, and keeps professional control, data sovereignty, and ethical responsibility firmly with the CA throughout the engagement.