Bank Statement Analyser - Bank Audit NPA Tracker
Author: CA. Karan Panjwani
1. Overview & Objective
NPA Lens v8.0 is an AI-powered forensic audit engine built for Indian banking institutions. It automates the classification of loan accounts under RBI's Income Recognition & Asset Classification (IRAC) norms — processing bank statement PDFs of any quality (digital, scanned, photographed, or grainy) and delivering a complete NPA audit report in seconds.
Key Outcome: A 50-account NPA audit that takes a team 2–3 days manually is completed in under 30 minutes — with higher consistency, full audit traceability, and publication-ready observation notes.
2. Target Audience
User Group | Primary Use
Statutory Auditors (CA Firms) | Bulk NPA verification & formal audit note generation for annual bank audits
Concurrent Auditors | Early NPA detection and irregular account flagging during ongoing audits
Credit / Loan Officers | Account health review for loan renewal, recall, or restructuring decisions
Internal Audit Departments | Portfolio-wide NPA scanning and pre-statutory audit preparation
RBI / NABARD Inspectors | Independent classification cross-check during regulatory inspections
3. Problem Statement
NPA classification under RBI IRAC norms is deceptively complex. While the regulation appears simple — 'interest or principal overdue for more than 90 days' — the practical audit process involves multiple compounding challenges:
• Document Quality: Statements arrive as digital PDFs, low-resolution scans, or mobile photographs, making data extraction error-prone.
• IRAC Complexity: Agricultural accounts follow entirely different overdue timelines (18–24 months) based on crop season — requiring specialist knowledge.
• Wash Transaction Detection: Artificial interest credits that are immediately reversed (contra entries) must be identified and excluded from genuine repayment calculations.
• Volume: A single concurrent audit may involve 200–500 loan files. Manual processing at 30–60 minutes per file is operationally unsustainable.
• Audit Note Quality: Writing RBI IRAC-compliant observation paragraphs requires deep regulatory expertise; quality varies widely across audit teams.
HEALTH WARNING -
4. AI Methodology & Process Flow
NPA Lens v8.0 uses a 4-Layer extraction and classification pipeline. Each layer has an automatic fallback to maximise data recovery even from severely degraded documents.
4.1 Document Ingestion & Pre-Processing (Layers 1 & 2)
• PDF → Image: Each page converted to 300 DPI images via Poppler, with fallback to pdf2image / pymupdf.
• Document Type Detection: Auto-classifies as digital, photo (uneven lighting), grainy scan, or clean scan.
• Sauvola Adaptive Binarization: Per-pixel threshold based on local mean and std-dev — preserves thin character strokes in bank table cells far better than global Otsu thresholding.
• Shadow Removal & Denoising: Uneven lighting correction for mobile-photographed documents; Fast NLM denoising calibrated per document type.
• Grid Detection: Morphological H+V kernel analysis isolates table ruling lines to help OCR understand row boundaries.
• Auto-Deskew & Resize: Corrects tilt up to ±45°; normalises to ≤1800px for consistent OCR performance.
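The Sauvola step above sets a separate threshold for every pixel, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the surrounding window. The sketch below is a minimal pure-Python illustration of that rule, not the NPA Lens implementation: the function name, the 3-pixel window, k = 0.2 and R = 128 are assumptions chosen for readability, and a production pipeline would normally use an optimised image library instead.

```python
from statistics import mean, pstdev

def sauvola_binarize(gray, window=3, k=0.2, r=128.0):
    """Binarise a grayscale image (list of rows, values 0-255) with Sauvola's rule.

    window, k and r are illustrative defaults, not values taken from NPA Lens.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the local neighbourhood, clipped at the image border.
            patch = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            m, s = mean(patch), pstdev(patch)
            threshold = m * (1 + k * (s / r - 1))
            # Pixels brighter than the local threshold become white, others black.
            row.append(255 if gray[y][x] > threshold else 0)
        out.append(row)
    return out

# A light page with one dark "ink" pixel in the middle.
page = [
    [200, 200, 200, 200],
    [200,  40, 200, 200],
    [200, 200, 200, 200],
]
binary = sauvola_binarize(page)
```

Because the threshold follows the local mean, the single dark pixel is kept as ink while the light background around it stays white, which is the property that helps preserve thin strokes in table cells.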
4.2 Zonal OCR Engine (Layer 2) — 4 Backends, Auto-Fallback
Tier | Engine | Notes | Install
1 | RapidOCR (PP-OCRv4) | Best accuracy. ONNX runtime. Works on all platforms, no GPU needed. | pip install rapidocr-onnxruntime
2 | EasyOCR | Deep learning OCR with GPU support. Excellent on degraded documents. | pip install easyocr
3 | Tesseract PSM-6 | Reliable open-source OCR. Optimised for uniform block text. | pip install pytesseract (the Tesseract binary must also be installed on the system)
4 | Tesseract Full-page | Offline last-resort fallback. No API dependency required. | (bundled)
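The tiered fallback in the table above can be sketched as a loop that tries each engine in order and moves on when an engine fails or returns too little text. This is an illustrative sketch only: the function name, the `min_chars` cut-off and the stub backends are assumptions, and the stubs stand in for the real OCR libraries so the example runs without them.

```python
from typing import Callable, List, Optional, Tuple

OcrBackend = Tuple[str, Callable[[bytes], str]]

def run_ocr_with_fallback(
    image: bytes, backends: List[OcrBackend], min_chars: int = 20
) -> Tuple[Optional[str], str, List[str]]:
    """Try each backend in tier order; return (engine name, text, log lines)."""
    log: List[str] = []
    for name, recognise in backends:
        try:
            text = recognise(image)
        except Exception as exc:  # e.g. package missing or model failed to load
            log.append(f"{name}: failed ({exc})")
            continue
        usable = len(text.strip())
        if usable >= min_chars:
            log.append(f"{name}: ok ({usable} chars)")
            return name, text, log
        log.append(f"{name}: too little text ({usable} chars)")
    return None, "", log

# Hypothetical stubs standing in for the real engines.
def rapidocr_stub(image: bytes) -> str:
    raise ImportError("rapidocr-onnxruntime not installed")

def easyocr_stub(image: bytes) -> str:
    return "   "  # engine ran but recognised nothing useful

def tesseract_stub(image: bytes) -> str:
    return "01-04-2023 INT DEBIT 4,250.00 1,20,000.00"

engine, text, log = run_ocr_with_fallback(
    b"fake-image-bytes",
    [("RapidOCR", rapidocr_stub), ("EasyOCR", easyocr_stub), ("Tesseract PSM-6", tesseract_stub)],
)
```

The returned log lines are the kind of per-layer notes an extraction log could display, so the auditor can see which engine produced the text.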
4.3 Claude AI Reasoning (Layer 3) — Extraction Paths
• PATH A (Digital PDFs): pdfplumber extracts raw text → Claude parses it into structured JSON (date, narration, withdrawal, deposit, balance).
• PATH B (Scanned/Photo): Zonal OCR produces noisy text → Claude reconciles ambiguities, aligns columns, returns clean JSON.
• PATH C (Vision Fallback): Preprocessed binary image sent directly to Claude Vision for end-to-end extraction when text paths fail.
• PATH D (Offline): Tesseract full-page only. No API required. Lower accuracy on scanned documents.
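One way to picture how the four paths relate is a small selection function. The sketch below is an assumption about the decision order, not the tool's actual logic: the function name, the character-count inputs and the 200-character cut-off are all hypothetical.

```python
def choose_extraction_path(has_api_key, embedded_chars, ocr_chars,
                           force_vision=False, min_chars=200):
    """Return 'A', 'B', 'C' or 'D' for one document (illustrative rules only)."""
    if not has_api_key:
        return "D"  # offline: Tesseract full-page only
    if force_vision:
        return "C"  # user forced Claude Vision in the sidebar
    if embedded_chars >= min_chars:
        return "A"  # digital PDF with a usable text layer
    if ocr_chars >= min_chars:
        return "B"  # scanned or photographed page with usable OCR text
    return "C"      # both text paths failed: send the image to Claude Vision

digital = choose_extraction_path(True, embedded_chars=5400, ocr_chars=0)
scanned = choose_extraction_path(True, embedded_chars=0, ocr_chars=1800)
unreadable = choose_extraction_path(True, embedded_chars=0, ocr_chars=35)
offline = choose_extraction_path(False, embedded_chars=5400, ocr_chars=0)
```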
4.4 RBI IRAC Classification Engine
After extraction, the IRAC engine applies the complete ruleset:
• Non-Agricultural (90-Day Rule): Identifies the date on which interest first remained unserviced for more than 90 consecutive days, allocating credits against interest debits on a FIFO basis.
• Wash / Contra Detection: Same-day equal debit-credit pairs flagged and excluded from genuine credit calculations.
• Agricultural Accounts: Short-duration crops → NPA after Overdue Date + 18–24 months (crop seasons). Long-duration crops → configurable additional season months.
• Asset Sub-Classification: Substandard (SS < 12m) → Doubtful 1 (12–24m) → Doubtful 2 (24–48m) → Doubtful 3 (48m+) → Loss Asset (D3 + zero credits >12m).
• Account Health Assessment: Flags accounts as Healthy / Irregular based on credit flow regularity, debit-credit ratio, and maximum inter-credit gap (>45 days = Irregular).
• AI Audit Note: Claude generates a 3–5 sentence formal RBI IRAC observation paragraph per account — citing specific dates, rupee amounts, and regulatory provisions.
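The 90-day rule and wash detection described above can be sketched together. The code below is a simplified illustration under stated assumptions, not the NPA Lens rule engine: the `Txn` structure and function names are hypothetical, a wash pair is taken to be a credit matched by a same-day non-interest debit of equal amount, and the NPA date is taken to be the first day on which the oldest unpaid interest is more than 90 days old.

```python
from collections import deque
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Set

@dataclass
class Txn:
    day: date
    kind: str    # "interest" (interest debit), "debit" (other debit) or "credit"
    amount: int  # whole rupees, to keep the arithmetic exact

def flag_wash(txns: List[Txn]) -> Set[int]:
    """Indices of same-day, equal-amount credit/debit pairs (both legs)."""
    washed: Set[int] = set()
    for i, credit in enumerate(txns):
        if credit.kind != "credit":
            continue
        for j, debit in enumerate(txns):
            if (debit.kind == "debit" and j not in washed
                    and debit.day == credit.day and debit.amount == credit.amount):
                washed.update((i, j))
                break
    return washed

def first_npa_date(txns: List[Txn], audit_date: date, limit_days: int = 90) -> Optional[date]:
    """First day the oldest unpaid interest is more than limit_days old, else None."""
    washed = flag_wash(txns)
    genuine = sorted(
        (t for i, t in enumerate(txns) if i not in washed and t.day <= audit_date),
        key=lambda t: t.day,
    )
    unpaid = deque()  # [date charged, amount still unpaid], oldest first
    for t in genuine + [Txn(audit_date, "audit", 0)]:  # sentinel closes the period
        if unpaid:
            breach = unpaid[0][0] + timedelta(days=limit_days + 1)
            if breach <= t.day:
                return breach
        if t.kind == "interest":
            unpaid.append([t.day, t.amount])
        elif t.kind == "credit":
            remaining = t.amount
            while remaining > 0 and unpaid:  # FIFO: settle the oldest interest first
                paid = min(remaining, unpaid[0][1])
                unpaid[0][1] -= paid
                remaining -= paid
                if unpaid[0][1] == 0:
                    unpaid.popleft()
    return None

ledger = [
    Txn(date(2023, 1, 31), "interest", 4000),
    Txn(date(2023, 2, 28), "interest", 4000),
    Txn(date(2023, 3, 10), "credit", 4000),   # genuine: clears January interest
    Txn(date(2023, 3, 31), "interest", 4000),
    Txn(date(2023, 4, 15), "credit", 8000),   # wash: reversed the same day
    Txn(date(2023, 4, 15), "debit", 8000),
]
npa = first_npa_date(ledger, audit_date=date(2023, 6, 30))
```

In this ledger the April credit would have cleared all outstanding interest had it been genuine. Once the wash pair is excluded, the February interest stays unpaid and the sketch returns 30 May 2023.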
5. Application Features
5.1 Single File Mode
• Upload one bank statement PDF; configure loan type (Agri / Non-Agri) and crop parameters.
• Extraction Log showing which OCR/AI path was used and per-layer quality notes.
• Executive Dashboard: Asset Category · Classification (NPA / Standard) · Overdue & NPA Dates · Account Health.
• Financial Aggregates: Total Credits · Debits · Interest Charged · Genuine Credits.
• Unserviced Interest Analysis: Amount outstanding · Serviced amount · Servicing Ratio (%) · Max Days Unserviced.
• Colour-coded transaction ledger with interest rows, wash flags, and contra entries highlighted.
• AI Audit Note + individual account CSV / Excel export.
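The unserviced-interest figures and the credit-gap health flag shown in this mode reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes genuine credits are applied to interest first, and it applies only the 45-day gap test rather than the full health assessment.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

def servicing_summary(interest_debited, genuine_credits):
    """Serviced amount, unserviced amount and servicing ratio (%) to one decimal."""
    interest = Decimal(interest_debited)
    credits = Decimal(genuine_credits)
    serviced = min(credits, interest)
    unserviced = interest - serviced
    if interest == 0:
        ratio = Decimal("100.0")  # nothing was charged, so nothing is unserviced
    else:
        ratio = (serviced / interest * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return {"serviced": serviced, "unserviced": unserviced, "ratio_pct": ratio}

def max_credit_gap_days(credit_dates):
    """Longest gap in days between consecutive credits (0 if fewer than two)."""
    ordered = sorted(credit_dates)
    return max(((b - a).days for a, b in zip(ordered, ordered[1:])), default=0)

def health(credit_dates, max_gap=45):
    return "Irregular" if max_credit_gap_days(credit_dates) > max_gap else "Healthy"

summary = servicing_summary(48250, 12000)
gap = max_credit_gap_days([date(2023, 1, 5), date(2023, 5, 12)])
status = health([date(2023, 1, 5), date(2023, 5, 12)])
```

With interest of ₹48,250 and genuine credits of ₹12,000, the unserviced amount is ₹36,250 and the ratio rounds to 24.9%. The two example credit dates are 127 days apart, so the account is flagged Irregular.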
5.2 Bulk Mode (2–5 Files)
• Upload multiple PDFs; progress bar with per-file status updates.
• Aggregate dashboard: Total Files · NPA Count · Standard Count · Pass Rate (%).
• NPA Accounts Tab and All Accounts Tab for portfolio-level drill-down.
• One-click consolidated NPA Audit Report export (CSV + Excel). Account numbers auto-masked to last 4 digits.
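Masking and the portfolio pass rate can be sketched in a few lines. This is an illustration under assumptions: the function names and the "X" fill character are hypothetical, and the pass rate is assumed to be the share of files classified as Standard.

```python
def mask_account(number: str, visible: int = 4, fill: str = "X") -> str:
    """Keep only the last `visible` characters of an account number."""
    cleaned = "".join(ch for ch in number if ch.isalnum())  # drop spaces and dashes
    if len(cleaned) <= visible:
        return cleaned
    return fill * (len(cleaned) - visible) + cleaned[-visible:]

def pass_rate(classifications) -> float:
    """Percentage of accounts classified as Standard, rounded to one decimal."""
    total = len(classifications)
    if total == 0:
        return 0.0
    standard = sum(1 for c in classifications if c == "Standard")
    return round(standard / total * 100, 1)

masked = mask_account("3011 2200 4567 8901")
rate = pass_rate(["Standard", "NPA", "Standard", "Standard", "NPA"])
```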
5.3 Sidebar Configuration
• Anthropic API Key entry with live connection test · Audit Date · Loan Type toggle.
• Agricultural settings: Short / Long Duration crop · Multicrop flag · Long Duration months.
• Vision Override: Force Claude Vision mode regardless of document quality.
6. AI Prompts Architecture
Three Claude prompts power the pipeline, each with a distinct system role:
Prompt 1 — Ledger Extraction (Path A): System: bank statement extraction specialist. Task: parse raw pdfplumber text into JSON array with fields date, narration, withdrawal, deposit, balance. Handle Indian number format. Return valid JSON only.
Prompt 2 — OCR Reconciliation (Path B): System: expert at parsing bank ledger data from noisy OCR output. Task: fix OCR errors (0/O confusion, split numbers, misaligned columns), output clean JSON transaction array.
Prompt 3 — Audit Note Generation: System: senior Chartered Accountant writing formal audit observations under RBI IRAC norms. Task: given classification, NPA date, interest figures, and health status — write a 3–5 sentence formal paragraph citing specific dates, rupee amounts, and relevant RBI IRAC provisions.
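Whatever the extraction prompts return still has to be checked before it reaches the rule engine. The sketch below shows one possible validation step for the JSON array described in Prompt 1, including conversion of Indian-format amounts such as 1,20,000.50. The function names and error handling are assumptions, not part of NPA Lens.

```python
import json
from decimal import Decimal

FIELDS = ("date", "narration", "withdrawal", "deposit", "balance")

def parse_amount(value):
    """Convert '1,20,000.50' or '₹4,250.00' to Decimal; blank or None gives None."""
    if value is None:
        return None
    text = str(value).replace(",", "").replace("₹", "").strip()
    if not text:
        return None
    return Decimal(text)  # raises decimal.InvalidOperation on non-numeric text

def parse_ledger(reply: str):
    """Validate a model reply that should be a JSON array of transaction rows."""
    rows = json.loads(reply)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of transactions")
    ledger = []
    for n, row in enumerate(rows):
        missing = [f for f in FIELDS if f not in row]
        if missing:
            raise ValueError(f"row {n} is missing fields: {missing}")
        ledger.append({
            "date": row["date"],
            "narration": row["narration"],
            "withdrawal": parse_amount(row["withdrawal"]),
            "deposit": parse_amount(row["deposit"]),
            "balance": parse_amount(row["balance"]),
        })
    return ledger

reply = (
    '[{"date": "01-04-2023", "narration": "INT DEBIT", '
    '"withdrawal": "4,250.00", "deposit": "", "balance": "1,20,000.50"}]'
)
rows = parse_ledger(reply)
```

Using `Decimal` rather than floats keeps rupee amounts exact, which matters when credits are later matched against debits of equal value.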
7. Key Benefits & Impact
Benefit | Description | Impact
Speed | End-to-end classification including AI audit note per file. | 30 sec vs. 30–60 min manually
Accuracy | FIFO interest allocation, wash detection, IRAC rules applied consistently. | 100% rule consistency
Document Range | Digital PDFs, scanned, photographed, and grainy copies handled. | 4-path OCR + Claude Vision
Audit Trail | Every decision includes reason string, NPA date, and financial aggregates. | Full traceability
Regulatory Language | AI audit notes use RBI IRAC terminology; publication ready. | Consistent across audit staff
Cost Savings | Reduces audit team size for NPA scanning; senior CAs focus on judgement. | 60–70% time reduction
Data Privacy | Account numbers auto-masked (last 4 digits) in all exports. | PDPB-aligned handling
8. Technical Setup
Prerequisites & Install
• Python 3.8+ · Anthropic API key (console.anthropic.com) · Poppler (PDF-to-image conversion).
• pip install anthropic streamlit pandas numpy python-dateutil rapidocr-onnxruntime pillow opencv-python pytesseract pdfplumber openpyxl
Configuration & Launch
• API Key: Set ANTHROPIC_API_KEY environment variable, or paste in the hardcoded constant, or enter in the sidebar at runtime.
• Model: CLAUDE_MODEL = 'claude-sonnet-4-5' (default). Alternatives: claude-opus-4-5 (highest capability), claude-haiku-4-5-20251001 (fastest).
Launch: streamlit run npa_lens_v8.py → opens at http://localhost:8501
Offline Mode: Without an API key, the tool falls back to PATH D (Tesseract). NPA classification still runs; only AI Audit Note generation is disabled.
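The three key sources and the offline fallback can be sketched as follows. The precedence shown (sidebar, then environment variable, then constant) is an assumption, as are the function names and the `HARDCODED_API_KEY` constant; only the `ANTHROPIC_API_KEY` variable name comes from the setup notes above.

```python
import os

HARDCODED_API_KEY = ""  # hypothetical constant; left empty in this sketch

def resolve_api_key(sidebar_value="", env=None):
    """Return the first non-blank key from sidebar, environment, then constant."""
    env = os.environ if env is None else env
    candidates = (sidebar_value, env.get("ANTHROPIC_API_KEY", ""), HARDCODED_API_KEY)
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None

def capabilities(api_key):
    """Which features are available with or without a key."""
    if api_key is None:
        return {"extraction": "PATH D (Tesseract)", "classification": True, "audit_note": False}
    return {"extraction": "PATH A/B/C (Claude)", "classification": True, "audit_note": True}

offline = capabilities(resolve_api_key("", env={}))
online = capabilities(resolve_api_key("", env={"ANTHROPIC_API_KEY": "env-key"}))
```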
9. Sample AI-Generated Audit Note
Input: Classification: NPA · NPA Date: 15-Apr-2023 · Interest Debited: ₹48,250 · Genuine Credits: ₹12,000 · Health: Irregular (max credit gap 127 days)
The loan account has been classified as Non-Performing Asset (NPA) effective 15th April 2023, in accordance with RBI's Master Circular on Prudential Norms on Income Recognition, Asset Classification and Provisioning pertaining to Advances. Total interest debited aggregated to ₹48,250, against which genuine credits of only ₹12,000 were received, leaving unserviced interest of ₹36,250 — a servicing ratio of 24.9%, well below the standard classification threshold. The account exhibits an irregular credit pattern with a maximum inter-credit gap of 127 days, substantially in excess of the 45-day benchmark. Accordingly, the account is recommended for provisioning as a Substandard Asset at 15% of secured outstanding balance, and interest income accrued but not received should be reversed from the Profit & Loss Account per applicable income recognition norms.
10. Compliance & Regulatory Basis
Regulatory Reference | Application in NPA Lens
RBI Master Circular (IRAC Norms) | Core 90-day classification rules for non-agricultural accounts
RBI Circular on Agricultural Advances | Crop-season NPA timelines; last disbursal & overdue date computation
RBI Provisioning Norms | SS / D1 / D2 / D3 / Loss sub-classification with provisioning rates
RBI Wash Transaction Clarifications | Contra and wash entry exclusion from genuine credit calculations
NABARD/SLBC Guidelines | Agri account indicators (Crop Cycles), multicrop season interpretation
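The SS / D1 / D2 / D3 / Loss bands listed in section 4.4 can be sketched as a lookup on the age of the NPA. This is a minimal illustration, not the tool's code: the function names are hypothetical, the band boundaries are assumed to be half-open (12 months falls in Doubtful 1, 24 in Doubtful 2, 48 in Doubtful 3), and provisioning rates are deliberately left out.

```python
from datetime import date

def whole_months(start: date, end: date) -> int:
    """Completed calendar months between two dates (never negative)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not yet complete
    return max(months, 0)

def sub_classify(months_as_npa: int, months_since_last_credit: int = 0) -> str:
    """Map NPA age in months to the asset sub-class used in section 4.4."""
    if months_as_npa < 12:
        return "Substandard"
    if months_as_npa < 24:
        return "Doubtful 1"
    if months_as_npa < 48:
        return "Doubtful 2"
    if months_since_last_credit > 12:
        return "Loss Asset"  # Doubtful 3 age with no credits for over 12 months
    return "Doubtful 3"

npa_date = date(2023, 4, 15)
at_year_end = sub_classify(whole_months(npa_date, date(2024, 3, 31)))
one_year_on = sub_classify(whole_months(npa_date, date(2024, 4, 15)))
```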
Disclaimer: NPA Lens is a decision-support tool. All classifications must be reviewed by a qualified CA or authorised bank official before use in official filings or regulatory submissions.
11. Conclusion & Future Roadmap
NPA Lens v8.0 automates NPA classification for Indian banks and auditors by merging AI reasoning with a coded RBI IRAC rule engine. It converts days of manual auditing into minutes of consistent, traceable output that remains subject to professional review.
Operational Growth: Moving toward direct Core Banking System (CBS) integration and specialized Agri-rule sets to eliminate manual data entry.
Proactive Credit Management: Shifting from retrospective reporting to Early Warning Signals (EWS) that flag stress before the 90-day default threshold.
Regulatory Signal: RBI's existing RBIA circular directives to larger UCBs and commercial banks have already laid the conceptual groundwork. Extension of these principles to asset quality prediction — from FY 2027–28 onwards — is widely anticipated by the Indian audit community.
Regulatory Future-Proofing: Architecting features for the RBI's Risk-Based Internal Audit (RBIA) framework (effective April 2027), including behavioral risk scoring and predictive default analytics. Built for today's IRAC obligations; ready for tomorrow's RBIA mandate.