Intelligent GSTR-2B Reconciliation
Author: CA. Tapas Ruparelia
THE PROBLEM
Chartered Accountants and their team face significant challenges in reconciling the Purchase Register (PR) with GSTR-2B data downloaded from the GST Portal.
· Data Structure Mismatch: PR data (from Tally/SAP) often has different headers, formatting (commas in numbers), and date formats compared to the GST Portal.
· Format Limitations: The GSTR-2B Excel provided by the portal contains nested headers and merged cells, making standard VLOOKUP/Pivot Tables impossible without manual cleaning.
· Volume & Complexity: Manual matching fails when there are minor typos in Invoice Numbers (e.g., "INV/001" vs "INV-001") or small rounding differences, leading to unclaimed Input Tax Credit (ITC).
· Privacy Concerns: Clients are hesitant to upload sensitive financial data to third-party cloud SaaS platforms.
THE SOLUTION
A secure, locally hosted Python application that automates the end-to-end reconciliation process. The tool utilizes an intelligent "3-Pass Matching Engine" to identify matches not just by exact values, but by recognizing patterns, cleaning data anomalies, and applying user-defined tolerances
HOW IT WORKS
· Smart Data Ingestion:
o GSTR-2B JSON Parsing: Directly consumes the complex JSON file from the GST portal, eliminating the need to clean the portal's Excel file.
o Intelligent PR Reader: Auto-detects header rows in client data (e.g., skipping top rows in Tally exports) and sanitizes number formats (removing commas/spaces).
· Advanced Reconciliation Logic (The 3-Pass Engine):
o Pass 1 (Exact Match): Matches Invoice No + GSTIN + Taxable Value (within configurable ₹ tolerance) + Date (within configurable day tolerance).
o Pass 2 (Pattern AI): Normalizes invoice numbers (removes special characters, financial year suffixes etc ) to find fuzzy matches.
o Pass 3 (GSTIN Logic): Identifies cases where the Invoice Number matches but the GSTIN has a typo or State Code mismatch.
· Comprehensive Reporting:
Generates a single Excel file
with 4 sheets: Reconciliation (Side-by-side comparison), PR Only (Missing in 2B), 2B Only (Missing in PR), and GSTIN-wise Summary
TECH STACK
· Language: Python 3.13
· Frontend: Streamlit (for a clean, browser-based UI running locally)
· Data Processing: Pandas (High-performance data manipulation)
· Database: SQLite (For storing client-specific column mapping configurations)
· Deployment: Localhost (Ensures 100% Data Privacy)
5. Innovation & Impact
· Efficiency: Reduces reconciliation time for 2,000+ line items from hours to seconds.
· Accuracy: "Pattern Matching" logic catches valid ITC that manual eyes often miss due to formatting differences.
· User Experience: "One-time mapping" memory remembers column settings for each client, reducing repetitive work.
· Data Security: Runs entirely on the user's machine; no data ever leaves the system