GST FraudShield
Author: CA Shruti Dang
| A | INTRODUCTION |
About the Tool
GST FraudShield is a privacy-conscious, AI-assisted GST audit and fraud-risk intelligence platform built for Chartered Accountants, tax officers, and compliance professionals operating under India's Goods and Services Tax framework. The platform accepts GSTR-1 return files, extracts GSTIN-wise transaction intelligence, enriches supplier profiles through a four-tier cascade (local cache, live GST portal lookup, AI inference, and HSN-based analysis, tried in that order), and scores each supplier against a deterministic, explainable fraud rule engine.
Every risk flag comes with a rule code, severity label, and plain-English explanation — making results defensible in audit proceedings. The system runs as both a browser-based web dashboard and a full-featured Electron desktop application.
Key Objectives
Target Users
| B | PROBLEM STATEMENT |
Pain Point 1: GST Fraud Detection is Reactive, Not Proactive
India loses an estimated Rs. 1 lakh crore annually to GST fraud, primarily through fake invoice networks, bogus ITC claims, and shell GSTINs. Current audit workflows rely on manual scrutiny of returns — a process that is slow, inconsistent, and scales poorly across the 1.4 crore+ active GSTINs on the portal. By the time fraud is detected, credits have already been availed and offenders may have cancelled registrations.
Pain Point 2: GSTIN Enrichment is Fragmented and Time-Consuming
Verifying a counterparty GSTIN requires navigating the GST portal manually, copying legal names, checking registration status, and noting HSN/dealing-in codes. For an auditor reviewing 500+ suppliers, this is days of work. There is no free tool that automates this enrichment pipeline with fallback intelligence.
Pain Point 3: Fraud Rule Outputs Lack Explainability
Black-box ML fraud scores are not actionable in audit proceedings. Assessees demand rule-level justification for every risk flag raised. GST FraudShield addresses this with a fully deterministic, explainable rule engine where every flag carries a code, score, and plain-English rationale.
| C | TECHNOLOGICAL SOLUTION |
Application Architecture
GST FraudShield follows a modular pipeline architecture that cleanly separates the file parsing, GSTIN enrichment, fraud rule evaluation, and presentation layers. The system runs in two modes: a web dashboard served by an Express.js backend, and a full-featured Electron desktop application with live GST portal automation.
| # | Module | Description |
| 1 | File Parser | Accepts GSTR-1 JSON/Excel; extracts invoices, GSTINs, HSN codes, tax values |
| 2 | GSTIN Enrichment | Four-tier lookup: Cache → Proxy → AI (Anthropic) → HSN Inference |
| 3 | Business Inference | Predicts industry from top HSN chapters; assigns confidence score |
| 4 | Fraud Rule Engine | Applies 5 deterministic rules; produces per-GSTIN score, level, flags |
| 5 | Dashboard / Export | Renders summary cards, risk table, flag list; exports Excel/CSV/PDF/Word |
Technology Stack
| Component | Technology |
| Backend API | Node.js + Express.js (port 3020) |
| Desktop Shell | Electron + electron-builder (Windows / macOS / Linux) |
| Portal Automation | Puppeteer — headless Chromium, pool of 5 browser pages |
| Real-time Comms | WebSocket (ws library, port 3018) |
| Local Cache | NeDB-Promises — embedded JSON database, 7-day TTL |
| AI Enrichment | Anthropic Claude API (optional, configured via API key) |
| Export | Excel, CSV, JSON, Print/PDF, Word (docx) |
| D | CORE MODULES & FEATURES |
Module 1: GST File Parser
Accepts GSTR-1 style JSON files and normalises them into a structured transaction dataset. Supports B2B invoices, credit/debit notes (CDNR), and HSN summary blocks. For every invoice line the parser extracts:
Module 2: Four-Tier GSTIN Enrichment Engine
For every unique counterparty GSTIN encountered, the system attempts enrichment through a prioritised four-tier cascade:
| Tier | Source | Method | Trigger |
| 1 | Cache | NeDB local store (7-day TTL) | Always checked first |
| 2 | GST Portal Proxy | Puppeteer → services.gst.gov.in | Cache miss |
| 3 | AI Lookup | Anthropic Claude API | Proxy unavailable |
| 4 | HSN Inference | Invoice HSN chapter → industry map | All external sources fail |
Enriched attributes include: legal name, trade name, GST status, registration date, address, HSN/dealing-in codes, source, confidence score, and predicted industry.
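The cascade above can be sketched as an ordered list of lookups tried until one returns a profile. This is a minimal illustration, not the tool's actual code: the names `enrichGstin`, `readCache`, and the `tiers` shape are hypothetical, and an in-memory `Map` stands in for the NeDB store.

```javascript
// Hypothetical sketch of the four-tier cascade; names and shapes are assumptions.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL from the table above

// In-memory stand-in for the NeDB cache: GSTIN -> { profile, storedAt }.
const cache = new Map();

// Tier 1: return a cached profile, treating entries older than the TTL as a miss.
function readCache(gstin, now) {
  const entry = cache.get(gstin);
  if (!entry) return null;
  if (now - entry.storedAt > SEVEN_DAYS_MS) {
    cache.delete(gstin);
    return null;
  }
  return entry.profile;
}

// tiers: ordered array of { source, lookup }, where lookup(gstin) resolves to a
// profile object or null. A tier that throws is treated as unavailable.
async function enrichGstin(gstin, tiers, now = Date.now()) {
  const cached = readCache(gstin, now);
  if (cached) return { ...cached, fromCache: true };

  for (const { source, lookup } of tiers) {
    try {
      const profile = await lookup(gstin);
      if (profile) {
        const enriched = { ...profile, gstin, source };
        cache.set(gstin, { profile: enriched, storedAt: now });
        return enriched;
      }
    } catch (err) {
      // Tier unavailable: fall through to the next one.
    }
  }
  // Every tier failed; the caller can treat this as an unresolved GSTIN.
  return { gstin, source: "none", confidence: 0 };
}
```

Passing the tiers in as data keeps the fallback order in one place, so a new enrichment source would be one more entry in the array rather than a change to the control flow.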
Module 3: HSN-Based Business Profile Inference
When external enrichment is unavailable, the inference engine analyses the HSN codes across all invoices for a GSTIN, weights them by taxable value, and maps the dominant HSN chapters to an industry category. The top 5 HSN codes by value are selected and a confidence score derived from their dominance ratio. Mapped industries: Electronics, Automobile, Textile, Chemicals, Metal, Pharma, Food, Plastics, Paper, Services.
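As a rough sketch of this inference step, the function below sums taxable value per HSN code, keeps the top five codes, and picks the industry with the largest share of that value. The chapter-to-industry map is illustrative only, and the confidence formula (winning industry's share of top-five value) is one possible reading of "dominance ratio"; the tool's real map and formula are not given here.

```javascript
// Illustrative chapter-to-industry map (assumption; one chapter per industry).
const CHAPTER_TO_INDUSTRY = {
  "85": "Electronics", "87": "Automobile", "52": "Textile", "29": "Chemicals",
  "72": "Metal", "30": "Pharma", "21": "Food", "39": "Plastics",
  "48": "Paper", "99": "Services",
};

// lines: [{ hsn: "8517", taxableValue: 600000 }, ...] (hypothetical shape)
function inferIndustry(lines) {
  // Weight each HSN code by its total taxable value.
  const valueByHsn = new Map();
  for (const { hsn, taxableValue } of lines) {
    valueByHsn.set(hsn, (valueByHsn.get(hsn) || 0) + taxableValue);
  }

  // Keep the five codes with the highest value.
  const top = [...valueByHsn.entries()].sort((a, b) => b[1] - a[1]).slice(0, 5);
  const topTotal = top.reduce((sum, [, value]) => sum + value, 0);
  if (topTotal <= 0) return { industry: "Unknown", confidence: 0, topHsn: [] };

  // Roll the top codes up to industries via their 2-digit chapter.
  const valueByIndustry = new Map();
  for (const [hsn, value] of top) {
    const industry = CHAPTER_TO_INDUSTRY[String(hsn).slice(0, 2)] || "Unknown";
    valueByIndustry.set(industry, (valueByIndustry.get(industry) || 0) + value);
  }

  // Assumed dominance ratio: winning industry's value / top-five value.
  const [industry, industryValue] =
    [...valueByIndustry.entries()].sort((a, b) => b[1] - a[1])[0];
  return { industry, confidence: industryValue / topTotal, topHsn: top.map(([hsn]) => hsn) };
}
```

For example, a supplier whose top codes fall mostly in chapter 85 with a smaller share in chapter 39 would be labelled Electronics, with a confidence equal to the chapter 85 share of the top-five value.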
Module 4: Deterministic Fraud Rule Engine
The deterministic fraud rule engine is the centrepiece of GST FraudShield. Five rule categories are applied to every counterparty GSTIN. Scores are additive and capped at 100. Risk levels: HIGH (score >= 60), MEDIUM (30 to 59), LOW (below 30).
| Rule Code | Description | Score | Severity | Trigger Condition |
| UNKNOWN_GSTIN | GSTIN not found or unresolved | 20 | HIGH | Enrichment missing or low-confidence |
| HSN_MISMATCH | Invoice HSN differs from portal profile | 30 | HIGH | No 2-digit, 4-digit, or exact match |
| HIGH_CONCENTRATION | Single GSTIN dominates total value | 15 | MEDIUM/HIGH | >=40% share of total value (HIGH if >=60%) |
| INVOICE_SPLITTING | Many small invoices instead of one large | 10 | MEDIUM | >=8 invoices, average < Rs. 50,000, total >= Rs. 5 lakh |
| ROUND_VALUE_PATTERN | High proportion of round-value lines | 5 | LOW | >=60% round values, >=3 transactions |
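The rule table can be read as a set of independent checks whose scores are summed, capped, and mapped to a level. The sketch below follows the scores and thresholds in the table. It is not the tool's source: the input shape, the low-confidence cut-off, the definition of a "round" value, and the choice to flag HSN_MISMATCH when any invoice HSN matches no profile code are all assumptions, and each is marked in the comments.

```javascript
const LOW_CONFIDENCE = 0.5; // assumption: cut-off for "low-confidence" enrichment
const ROUND_UNIT = 1000;    // assumption: "round" means a multiple of Rs. 1,000

// Strongest match level between an invoice HSN and a profile HSN, or null.
function hsnMatchLevel(invoiceHsn, profileHsn) {
  const a = String(invoiceHsn);
  const b = String(profileHsn);
  if (a === b) return "exact";
  if (a.length >= 4 && b.length >= 4 && a.slice(0, 4) === b.slice(0, 4)) return "4-digit";
  if (a.length >= 2 && b.length >= 2 && a.slice(0, 2) === b.slice(0, 2)) return "2-digit";
  return null;
}

// supplier (hypothetical shape): { profile: { confidence, hsnCodes } | null,
//   invoices: [{ value, hsn }], shareOfTotal: fraction between 0 and 1 }
function scoreSupplier(supplier) {
  const flags = [];
  const flag = (code, score, severity, reason) => flags.push({ code, score, severity, reason });
  const { profile, invoices, shareOfTotal } = supplier;
  const values = invoices.map((inv) => inv.value);
  const total = values.reduce((sum, v) => sum + v, 0);

  if (!profile || profile.confidence < LOW_CONFIDENCE) {
    flag("UNKNOWN_GSTIN", 20, "HIGH", "GSTIN not found or enrichment is low-confidence.");
  }

  // Assumption: flag when any invoice HSN matches none of the profile's codes.
  const profileHsn = (profile && profile.hsnCodes) || [];
  const unmatched = invoices.filter(
    (inv) => profileHsn.length > 0 && !profileHsn.some((p) => hsnMatchLevel(inv.hsn, p))
  );
  if (unmatched.length > 0) {
    flag("HSN_MISMATCH", 30, "HIGH", `${unmatched.length} invoice(s) use HSN codes outside the supplier profile.`);
  }

  if (shareOfTotal >= 0.4) {
    const severity = shareOfTotal >= 0.6 ? "HIGH" : "MEDIUM";
    flag("HIGH_CONCENTRATION", 15, severity, "Supplier holds a dominant share of total value.");
  }

  if (values.length >= 8 && total / values.length < 50000 && total >= 500000) {
    flag("INVOICE_SPLITTING", 10, "MEDIUM", "Many small invoices add up to a large total.");
  }

  const roundCount = values.filter((v) => v > 0 && v % ROUND_UNIT === 0).length;
  if (values.length >= 3 && roundCount / values.length >= 0.6) {
    flag("ROUND_VALUE_PATTERN", 5, "LOW", "Most invoice values are round figures.");
  }

  // Additive score, capped at 100, then mapped to a risk level.
  const score = Math.min(100, flags.reduce((sum, f) => sum + f.score, 0));
  const level = score >= 60 ? "HIGH" : score >= 30 ? "MEDIUM" : "LOW";
  return { score, level, flags };
}
```

In this sketch each rule fires at most once per supplier, so the listed scores sum to at most 80 and the cap of 100 would only matter if a rule could contribute more than once. Because every flag carries its own code, severity, and reason, the returned `flags` array is enough to explain how a score was reached.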
| E | HOW IT WORKS — END-TO-END WORKFLOW |
Step-by-Step Processing Pipeline
| # | Step | Description |
| 1 | Upload | User selects GSTR-1 JSON or Excel files. Multiple files can be loaded and merged into a single audit session. |
| 2 | Parse | The file parser validates GSTIN formats and normalises transaction rows across B2B, CDNR, and HSN blocks. |
| 3 | Enrich | Every unique counterparty GSTIN is passed through the four-tier enrichment cascade. Results are cached for 7 days. |
| 4 | Infer | For GSTINs with no external data, the HSN inference engine predicts industry and business profile. |
| 5 | Score | The fraud rule engine applies all five rules. Scores are aggregated and capped at 100. Risk level is assigned. |
| 6 | Review | Dashboard renders summary cards, GSTIN-wise risk table, fraud flag list, and plain-English explanations. |
| 7 | Export | One-click export generates Excel, CSV, JSON, PDF, and Word documents for audit workpapers. |
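The GSTIN format check in the Parse step, and the collection of unique counterparty GSTINs that feeds the Enrich step, might look like the sketch below. The pattern encodes the regular-taxpayer GSTIN layout (2-digit state code, 10-character PAN, entity code, the letter Z, and a check character) and does not verify the check character. The function names and the `row.gstin` field are hypothetical, and whether the tool's parser applies exactly this pattern is an assumption.

```javascript
// Structural pattern only: the final check character is not validated here.
const GSTIN_PATTERN = /^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$/;

// True when the value has the 15-character regular-taxpayer GSTIN layout.
function isValidGstinFormat(value) {
  return typeof value === "string" && GSTIN_PATTERN.test(value.trim().toUpperCase());
}

// rows: [{ gstin: "..." , ... }] (hypothetical normalised transaction rows).
// Returns de-duplicated, upper-cased GSTINs plus the rows that failed the check.
function uniqueCounterparties(rows) {
  const valid = new Set();
  const rejected = [];
  for (const row of rows) {
    const gstin = String(row.gstin || "").trim().toUpperCase();
    if (isValidGstinFormat(gstin)) {
      valid.add(gstin);
    } else {
      rejected.push(row);
    }
  }
  return { gstins: [...valid], rejected };
}
```

Keeping the rejected rows rather than dropping them silently lets a reviewer see which invoice lines never reached enrichment or scoring.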
Dual Deployment Modes
| F | BENEFITS, IMPACT & DIFFERENTIATORS |
Challenge vs Solution Mapping
| Challenge | How GST FraudShield Solves It |
| Manual GSTIN verification takes hours per entity | Four-tier auto-enrichment with caching processes hundreds of GSTINs in seconds |
| HSN mismatches go undetected in manual audits | Automated HSN comparison (2-digit, 4-digit, exact) flags every mismatch |
| Invoice splitting and round-value patterns invisible in raw data | Rule engine detects both patterns with configurable thresholds |
| Fraud scores lack explainability for audit proceedings | Every flag carries a rule code, severity, score, and plain-English reason |
| GST portal inaccessible for bulk programmatic lookup | Puppeteer proxy with page pooling handles portal barriers gracefully |
| No fallback when portal is down or GSTIN is new | AI + HSN inference provides enrichment even without portal data |
| Reporting requires separate tools | Built-in multi-format export: Excel, CSV, JSON, PDF, Word |
Quantified Benefits
| Dimension | Impact |
| Speed | Audit pipeline completes in seconds for hundreds of GSTINs vs. days of manual work |
| Accuracy | Multi-source enrichment with confidence scoring reduces false negatives in fraud detection |
| Scale | Handles large multi-file GSTR-1 datasets without performance degradation |
| Privacy | Local cache means financial data stays on-premises unless AI enrichment is explicitly enabled |
| Coverage | Five rule categories covering the most prevalent GST fraud patterns recognised by tax authorities |
| Cost | Open-source stack with zero per-use licensing cost; deployable on any standard laptop or server |
Unique Differentiators
| G | DEPLOYMENT, LIMITATIONS & ROADMAP |
Deployment Options
Current Limitations
Future Roadmap
| Feature | Description |
| GSTR-2B Integration | Full reconciliation of purchase register against GSTR-2B for ITC risk detection alongside fraud scoring |
| ML Anomaly Layer | Complement rule-based scoring with an unsupervised anomaly detection model trained on GST return patterns |
| Network Graph Analysis | Visualise GSTIN-to-GSTIN transaction networks to detect circular trading and accommodation entry chains |
| Multi-Client Dashboard | CA firm portal to load, switch, and compare audit results across multiple client GSTINs in one session |
| GSTR-3B Pre-fill | Auto-populate GSTR-3B Table 4 values from reconciled data with one-click export to the filing portal |
| WhatsApp/Email Alerts | Automated compliance reminders to suppliers with pending or late GST filings from vendor scorecard |
| Tally/ERP Integration | Direct data pull via ODBC from Tally Prime and SAP, eliminating the export-import step |
| H | CONCLUSION |
GST FraudShield represents a practical, deployable answer to one of India's most pressing tax compliance challenges: the detection of fraudulent invoice networks and bogus ITC claims within the GST ecosystem. By combining structured file parsing, automated GSTIN enrichment, AI-assisted inference, and a fully explainable rule-based fraud engine, the platform delivers audit-grade intelligence in seconds — at a cost accessible to every CA practice.
The architecture is deliberately modular and open. Parser, enrichment, inference, and fraud modules are independently maintained and testable. New fraud rules can be added without touching the data pipeline, and new enrichment sources can be plugged in without disrupting the rule engine, so GST FraudShield can evolve with India's GST law and portal landscape.
The dual-mode deployment model — browser dashboard for shared access, Electron desktop for privacy-first offline use — ensures the tool fits diverse CA practice environments. Built-in multi-format export means audit findings are immediately usable in compliance filings, client reports, and regulatory submissions.
The vision is straightforward: every tax professional in India, whether an independent CA or a GST intelligence officer, should have access to the same quality of fraud detection intelligence that today requires specialised teams and expensive enterprise software. GST FraudShield is that equaliser.