GST FraudShield
AI & Accounting


Author: CA Shruti Dang

A. INTRODUCTION


About the Tool

GST FraudShield is a privacy-conscious, AI-assisted GST audit and fraud-risk intelligence platform built for Chartered Accountants, tax officers, and compliance professionals working under India's Goods and Services Tax framework. The platform accepts GSTR-1 return files, extracts GSTIN-wise transaction intelligence, enriches each counterparty's profile through a four-tier cascade (local cache, live GST portal lookup, AI inference, and HSN-based analysis), and scores each counterparty GSTIN against a deterministic, explainable fraud rule engine.

Every risk flag comes with a rule code, severity label, and plain-English explanation — making results defensible in audit proceedings. The system runs as both a browser-based web dashboard and a full-featured Electron desktop application.

Key Objectives

  1. Automate GSTIN-level fraud risk scoring from raw GSTR-1 JSON or Excel uploads.
  2. Identify suspicious patterns: unknown GSTINs, HSN mismatches, invoice splitting, high transaction concentration, and round-value manipulation.
  3. Enrich every counterparty GSTIN using a four-tier lookup: Cache → GST Portal Proxy → Anthropic AI → Internal HSN Inference.
  4. Deliver fully explainable results — every risk flag maps to a named rule, score, and severity level — suitable for audit workpapers.
  5. Protect data privacy: the modular backend runs locally; no transaction data is transmitted to third-party services beyond optional AI enrichment.
  6. Provide one-click export in Excel, CSV, JSON, PDF, and Word for filing and reporting workflows.

Target Users

  1. Practising Chartered Accountants conducting GST audits and ITC verification.
  2. In-house tax and finance teams at mid-to-large Indian businesses.
  3. GST Intelligence officers performing risk-based audit selection.
  4. Tax technology teams at CA firms and fintech startups building compliance workflows.



B. PROBLEM STATEMENT


Pain Point 1: GST Fraud Detection is Reactive, Not Proactive

India loses an estimated Rs. 1 lakh crore annually to GST fraud, primarily through fake invoice networks, bogus ITC claims, and shell GSTINs. Current audit workflows rely on manual scrutiny of returns — a process that is slow, inconsistent, and scales poorly across the 1.4 crore+ active GSTINs on the portal. By the time fraud is detected, credits have already been availed and offenders may have cancelled registrations.

  1. No existing lightweight tool integrates live GST portal enrichment with rule-based fraud scoring in a single pipeline.
  2. Manual checks on invoice patterns (round-value clustering, splitting, concentration) require days of spreadsheet analysis per entity.
  3. HSN code mismatches — a key fraud indicator — go undetected unless the auditor cross-references the portal profile individually.

Pain Point 2: GSTIN Enrichment is Fragmented and Time-Consuming

Verifying a counterparty GSTIN requires navigating the GST portal manually, copying legal names, checking registration status, and noting HSN/dealing-in codes. For an auditor reviewing 500+ suppliers, this is days of work. There is no free tool that automates this enrichment pipeline with fallback intelligence.

  1. Portal barriers (CAPTCHAs, rate limits) make bulk lookup impractical without automation.
  2. No tool combines portal lookup with AI-based inference as a fallback for unavailable GSTINs.
  3. HSN-based business profiling — crucial for verifying supplier legitimacy — requires manual industry expertise today.

Pain Point 3: Fraud Rule Outputs Lack Explainability

Black-box ML fraud scores are not actionable in audit proceedings. Assessees demand rule-level justification for every risk flag raised. GST FraudShield addresses this with a fully deterministic, explainable rule engine where every flag carries a code, score, and plain-English rationale.



C. TECHNOLOGICAL SOLUTION


Application Architecture

GST FraudShield follows a modular pipeline architecture that cleanly separates file parsing, GSTIN enrichment, fraud rule evaluation, and presentation layers. The system runs in two modes: a web dashboard served by an Express.js backend, and a full-featured Electron desktop application with live GST portal automation.

# | Module | Description
1 | File Parser | Accepts GSTR-1 JSON/Excel; extracts invoices, GSTINs, HSN codes, tax values
2 | GSTIN Enrichment | Four-tier lookup: Cache → Proxy → AI (Anthropic) → HSN Inference
3 | Business Inference | Predicts industry from top HSN chapters; assigns confidence score
4 | Fraud Rule Engine | Applies 5 deterministic rules; produces per-GSTIN score, level, flags
5 | Dashboard / Export | Renders summary cards, risk table, flag list; exports Excel/CSV/PDF/Word
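As an illustration of the CSV path in Module 5, the TypeScript sketch below writes a GSTIN-wise risk table using RFC 4180 quoting, so that legal names containing commas or double quotes do not break the column layout. The RiskRow shape, the column headings and both function names are hypothetical, not the tool's actual export code.

```typescript
// Sketch: CSV export of the GSTIN-wise risk table with RFC 4180 quoting.
// ASSUMPTION: RiskRow, the column headings and both function names are hypothetical.

interface RiskRow { gstin: string; legalName: string; score: number; level: string; flags: string[] }

function csvField(value: string | number): string {
  const text = String(value);
  // Quote a field only if it contains a comma, double quote or line break,
  // doubling any embedded double quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows: RiskRow[]): string {
  const lines = [["GSTIN", "Legal Name", "Score", "Risk Level", "Flags"].map(csvField).join(",")];
  for (const r of rows) {
    const cells = [r.gstin, r.legalName, r.score, r.level, r.flags.join("; ")];
    lines.push(cells.map(csvField).join(","));
  }
  return lines.join("\r\n"); // RFC 4180 specifies CRLF between records
}
```

Quoting only when needed keeps ordinary rows readable in a text editor while remaining valid for spreadsheet import.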


Technology Stack

Component | Technology
Backend API | Node.js + Express.js (port 3020)
Desktop Shell | Electron + electron-builder (Windows / macOS / Linux)
Portal Automation | Puppeteer (headless Chromium), pool of 5 browser pages
Real-time Comms | WebSocket (ws library, port 3018)
Local Cache | NeDB-Promises, embedded JSON database, 7-day TTL
AI Enrichment | Anthropic Claude API (optional, configured via API key)
Export | Excel, CSV, JSON, Print/PDF, Word (docx)
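The "pool of 5 browser pages" can be pictured as a fixed-size resource pool: each portal lookup borrows a page, and callers beyond the pool size wait until one is handed back. The TypeScript sketch below is generic; T stands in for a Puppeteer Page (which in practice would come from browser.newPage()), and ResourcePool and its methods are hypothetical names, not GST FraudShield's implementation.

```typescript
// Sketch: a fixed-size pool that lends out at most N resources at a time.
// ASSUMPTION: T stands in for a Puppeteer Page; class and method names are hypothetical.

class ResourcePool<T> {
  private idle: T[];
  private waiters: Array<(resource: T) => void> = [];

  constructor(resources: T[]) {
    this.idle = [...resources];
  }

  /** Runs task with a borrowed resource and always hands it back, even on error. */
  async use<R>(task: (resource: T) => Promise<R>): Promise<R> {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }

  private acquire(): Promise<T> {
    const free = this.idle.pop();
    if (free !== undefined) return Promise.resolve(free);
    // Pool exhausted: park the caller until release() hands over a resource.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  private release(resource: T): void {
    const next = this.waiters.shift();
    if (next) next(resource); // give it straight to the longest-waiting caller
    else this.idle.push(resource);
  }
}
```

Because release() runs in a finally block, a lookup that fails on a CAPTCHA or timeout still returns its page, so the pool cannot slowly drain.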



D. CORE MODULES & FEATURES


Module 1: GST File Parser

Accepts GSTR-1 style JSON files and normalises them into a structured transaction dataset. Supports B2B invoices, credit/debit notes (CDNR), and HSN summary blocks. For every invoice line the parser extracts:

  1. Filing GSTIN and counterparty GSTIN (validated against the standard 15-character Indian GSTIN format)
  2. Document type, date, invoice value, taxable value, CGST / SGST / IGST / cess
  3. HSN code, place of supply, filing period, and source file reference

Module 2: Four-Tier GSTIN Enrichment Engine

For every unique counterparty GSTIN encountered, the system attempts enrichment through a prioritised four-tier cascade:

Tier | Source | Method | Trigger
1 | Cache | NeDB local store (7-day TTL) | Always checked first
2 | GST Portal Proxy | Puppeteer → services.gst.gov.in | Cache miss
3 | AI Lookup | Anthropic Claude API | Proxy unavailable
4 | HSN Inference | Invoice HSN chapter → industry map | All external sources fail


Enriched attributes include: legal name, trade name, GST status, registration date, address, HSN/dealing-in codes, source, confidence score, and predicted industry.
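The cascade in the table can be sketched in TypeScript as below. A Map stands in for the NeDB store, and the portal, AI and inference steps are injected as hypothetical callbacks rather than real Puppeteer or Anthropic calls; the profile fields follow the attribute list above. Passing null for the AI tier models the case where no API key is configured.

```typescript
// Sketch of a four-tier enrichment cascade: cache -> portal proxy -> AI -> HSN inference.
// ASSUMPTIONS: the profile shape, the Lookup signature and the in-memory Map
// (standing in for the NeDB store) are hypothetical, not GST FraudShield's API.

type Source = "cache" | "portal" | "ai" | "hsn";

interface GstinProfile {
  gstin: string;
  legalName?: string; tradeName?: string; status?: string;
  registrationDate?: string; address?: string;
  hsnCodes: string[];
  source: Source;
  confidence: number; // 0..1
  predictedIndustry?: string;
}

type Lookup = (gstin: string) => Promise<GstinProfile | null>;

const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day cache lifetime

class EnrichmentCascade {
  private cache = new Map<string, { profile: GstinProfile; storedAt: number }>();
  private tiers: Lookup[];
  private infer: (gstin: string) => GstinProfile;
  private now: () => number;

  // Pass ai = null when no Anthropic API key is configured; that tier is then skipped.
  constructor(portal: Lookup, ai: Lookup | null, infer: (gstin: string) => GstinProfile,
              now: () => number = () => Date.now()) {
    this.tiers = ai ? [portal, ai] : [portal];
    this.infer = infer;
    this.now = now;
  }

  async enrich(gstin: string): Promise<GstinProfile> {
    // Tier 1: a cached profile younger than the TTL wins outright.
    const hit = this.cache.get(gstin);
    if (hit && this.now() - hit.storedAt < TTL_MS) return { ...hit.profile, source: "cache" };

    // Tiers 2 and 3: the first external source that answers wins.
    for (const lookup of this.tiers) {
      try {
        const profile = await lookup(gstin);
        if (profile) {
          this.cache.set(gstin, { profile, storedAt: this.now() });
          return profile;
        }
      } catch {
        // Portal timeout, CAPTCHA or API error: fall through to the next tier.
      }
    }
    // Tier 4: no external data, so infer a profile from invoice HSN codes.
    return this.infer(gstin);
  }
}
```

One design choice worth noting: the sketch caches only externally sourced profiles, so a GSTIN that fell back to HSN inference is retried against the portal on the next run instead of being pinned to a low-confidence guess for seven days. The tool's own caching policy may differ.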

Module 3: HSN-Based Business Profile Inference

When external enrichment is unavailable, the inference engine analyses the HSN codes across all invoices for a GSTIN, weights them by taxable value, and maps the dominant HSN chapters to an industry category. The top 5 HSN codes by value are selected, and a confidence score is derived from their dominance ratio. Mapped industries: Electronics, Automobile, Textile, Chemicals, Metal, Pharma, Food, Plastics, Paper, Services.
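The inference steps above can be sketched as follows. The chapter-to-industry ranges are an illustrative, partial assumption (for example chapter 85 for Electronics, 87 for Automobile, and 99 for services SAC codes), and "dominance ratio" is interpreted here as the dominant industry's share of the top-5 HSN value; the tool's exact mapping and formula may differ.

```typescript
// Sketch of HSN-based industry inference.
// ASSUMPTIONS: the chapter ranges below are an illustrative, partial mapping, and
// confidence is taken as the dominant industry's share of the top-5 HSN value.

interface HsnLine { hsn: string; taxableValue: number }
interface Inference { industry: string; confidence: number; topHsn: string[] }

// [first chapter, last chapter, industry]; Pharma (ch. 30) precedes the wider Chemicals range.
const CHAPTER_RANGES: ReadonlyArray<readonly [number, number, string]> = [
  [30, 30, "Pharma"], [1, 24, "Food"], [28, 38, "Chemicals"], [39, 39, "Plastics"],
  [47, 49, "Paper"], [50, 63, "Textile"], [72, 83, "Metal"], [85, 85, "Electronics"],
  [87, 87, "Automobile"], [99, 99, "Services"], // SAC codes for services start with 99
];

function industryForHsn(hsn: string): string {
  const chapter = parseInt(hsn.slice(0, 2), 10);
  const range = CHAPTER_RANGES.find(([lo, hi]) => chapter >= lo && chapter <= hi);
  return range ? range[2] : "Other";
}

function inferProfile(lines: HsnLine[]): Inference {
  // 1. Weight every HSN code by its total taxable value.
  const byHsn = new Map<string, number>();
  for (const { hsn, taxableValue } of lines) byHsn.set(hsn, (byHsn.get(hsn) ?? 0) + taxableValue);

  // 2. Keep the five highest-value codes.
  const top = Array.from(byHsn.entries()).sort((a, b) => b[1] - a[1]).slice(0, 5);
  const topValue = top.reduce((sum, [, v]) => sum + v, 0);
  if (topValue <= 0) return { industry: "Unknown", confidence: 0, topHsn: [] };

  // 3. Roll the top codes up to industries; the largest industry share is the confidence.
  const byIndustry = new Map<string, number>();
  for (const [hsn, v] of top) {
    const industry = industryForHsn(hsn);
    byIndustry.set(industry, (byIndustry.get(industry) ?? 0) + v);
  }
  let best = { industry: "Unknown", value: 0 };
  byIndustry.forEach((value, industry) => { if (value > best.value) best = { industry, value }; });

  return { industry: best.industry, confidence: best.value / topValue, topHsn: top.map(([h]) => h) };
}
```

For example, a GSTIN whose top codes are 60% chapter 85 by value would be labelled Electronics with a confidence of 0.6.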

Module 4: Deterministic Fraud Rule Engine

The centrepiece of GST FraudShield. Five rule categories are applied to every counterparty GSTIN. Scores are additive and capped at 100. Risk level: HIGH (>=60), MEDIUM (>=30), LOW (<30).

Rule Code | Description | Score | Severity | Trigger Condition
UNKNOWN_GSTIN | GSTIN not found or unresolved | 20 | HIGH | Enrichment missing or low-confidence
HSN_MISMATCH | Invoice HSN differs from portal profile | 30 | HIGH | No 2-digit, 4-digit, or exact match
HIGH_CONCENTRATION | Single GSTIN dominates total value | 15 | MEDIUM / HIGH | >=40% share (HIGH if >=60%)
INVOICE_SPLITTING | Many small invoices instead of one large invoice | 10 | MEDIUM | >=8 invoices, average < Rs. 50,000, total >= Rs. 5 lakh
ROUND_VALUE_PATTERN | High proportion of round-value lines | 5 | LOW | >=60% round values, >=3 transactions
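The five rules and the score banding can be expressed as a single scoring function. In this TypeScript sketch the thresholds come from the table and are gathered into one overridable object; the input shape, the 0.5 low-confidence cut-off, flagging HSN_MISMATCH per distinct invoice HSN code, and the definition of a round value as a multiple of Rs. 1,000 are all assumptions rather than the tool's documented behaviour.

```typescript
// Sketch of the five-rule engine. ASSUMPTIONS (not confirmed by the tool): the
// CounterpartyData shape, a 0.5 "low confidence" cut-off, treating multiples of
// Rs. 1,000 as round values, and flagging HSN_MISMATCH per distinct invoice HSN.

type Severity = "LOW" | "MEDIUM" | "HIGH";
interface Flag { code: string; score: number; severity: Severity; reason: string }

interface CounterpartyData {
  gstin: string;
  invoiceValues: number[]; // one entry per invoice
  invoiceHsn: string[];    // HSN codes seen on this GSTIN's invoices
  profileHsn: string[];    // HSN/dealing-in codes from enrichment (may be empty)
  enriched: boolean;       // false when every enrichment tier came back empty
  confidence: number;      // 0..1
}

// Defaults mirror the rule table; callers can override any threshold.
const DEFAULT_THRESHOLDS = {
  minConfidence: 0.5, concentrationMedium: 0.4, concentrationHigh: 0.6,
  splitMinCount: 8, splitMaxAverage: 50_000, splitMinTotal: 500_000,
  roundMinShare: 0.6, roundMinCount: 3, roundUnit: 1_000,
};
type Thresholds = typeof DEFAULT_THRESHOLDS;

function hsnMatchLevel(hsn: string, profile: string[]): "exact" | "4-digit" | "2-digit" | "none" {
  if (profile.includes(hsn)) return "exact";
  if (profile.some((p) => p.slice(0, 4) === hsn.slice(0, 4))) return "4-digit";
  if (profile.some((p) => p.slice(0, 2) === hsn.slice(0, 2))) return "2-digit";
  return "none";
}

function scoreCounterparty(d: CounterpartyData, grandTotal: number, t: Thresholds = DEFAULT_THRESHOLDS) {
  const flags: Flag[] = [];
  const n = d.invoiceValues.length;
  const total = d.invoiceValues.reduce((sum, v) => sum + v, 0);

  if (!d.enriched || d.confidence < t.minConfidence)
    flags.push({ code: "UNKNOWN_GSTIN", score: 20, severity: "HIGH",
      reason: `${d.gstin} could not be resolved with enough confidence.` });

  const mismatched = Array.from(new Set(d.invoiceHsn))
    .filter((h) => d.profileHsn.length > 0 && hsnMatchLevel(h, d.profileHsn) === "none");
  if (mismatched.length > 0)
    flags.push({ code: "HSN_MISMATCH", score: 30, severity: "HIGH",
      reason: `Invoice HSN ${mismatched.join(", ")} has no match in the registered profile.` });

  const share = grandTotal > 0 ? total / grandTotal : 0;
  if (share >= t.concentrationMedium)
    flags.push({ code: "HIGH_CONCENTRATION", score: 15,
      severity: share >= t.concentrationHigh ? "HIGH" : "MEDIUM",
      reason: `Accounts for ${(share * 100).toFixed(1)}% of total value.` });

  if (n >= t.splitMinCount && total / n < t.splitMaxAverage && total >= t.splitMinTotal)
    flags.push({ code: "INVOICE_SPLITTING", score: 10, severity: "MEDIUM",
      reason: `${n} invoices averaging Rs. ${Math.round(total / n)}, Rs. ${total} in total.` });

  const roundCount = d.invoiceValues.filter((v) => v % t.roundUnit === 0).length;
  if (n >= t.roundMinCount && roundCount / n >= t.roundMinShare)
    flags.push({ code: "ROUND_VALUE_PATTERN", score: 5, severity: "LOW",
      reason: `${roundCount} of ${n} invoice values are multiples of Rs. ${t.roundUnit}.` });

  // Additive score, capped at 100, then banded into a risk level.
  const score = Math.min(100, flags.reduce((sum, f) => sum + f.score, 0));
  const level: Severity = score >= 60 ? "HIGH" : score >= 30 ? "MEDIUM" : "LOW";
  return { gstin: d.gstin, score, level, flags };
}
```

For instance, a GSTIN that triggers UNKNOWN_GSTIN, HSN_MISMATCH and HIGH_CONCENTRATION scores 20 + 30 + 15 = 65 and is banded HIGH. With the weights in the table, all five rules together sum to 80, so the cap of 100 only comes into play if rules are added or reweighted.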



E. HOW IT WORKS: END-TO-END WORKFLOW


Step-by-Step Processing Pipeline

# | Step | Description
1 | Upload | User selects GSTR-1 JSON or Excel files. Multiple files can be loaded and merged into a single audit session.
2 | Parse | The file parser validates GSTIN formats and normalises transaction rows across B2B, CDNR, and HSN blocks.
3 | Enrich | Every unique counterparty GSTIN is passed through the four-tier enrichment cascade. Results are cached for 7 days.
4 | Infer | For GSTINs with no external data, the HSN inference engine predicts industry and business profile.
5 | Score | The fraud rule engine applies all five rules. Scores are aggregated and capped at 100, and a risk level is assigned.
6 | Review | The dashboard renders summary cards, a GSTIN-wise risk table, the fraud flag list, and plain-English explanations.
7 | Export | One-click export generates Excel, CSV, JSON, PDF, and Word documents for audit workpapers.
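Steps 1 and 3 imply that rows from several files are pooled and that enrichment runs once per unique counterparty GSTIN rather than once per invoice. A small TypeScript sketch of that grouping follows; SessionRow and buildSession are hypothetical names, and the per-GSTIN and session-wide totals it accumulates are the inputs a concentration rule would need.

```typescript
// Sketch: merge rows from several uploaded files into one audit session, grouped
// by counterparty GSTIN. ASSUMPTION: SessionRow and buildSession are hypothetical.

interface SessionRow { sourceFile: string; counterpartyGstin: string; taxableValue: number; hsn: string }
interface CounterpartyGroup { gstin: string; rows: SessionRow[]; total: number }

function buildSession(files: SessionRow[][]): { groups: CounterpartyGroup[]; grandTotal: number } {
  const byGstin = new Map<string, CounterpartyGroup>();
  let grandTotal = 0;
  for (const rows of files) {
    for (const row of rows) {
      let group = byGstin.get(row.counterpartyGstin);
      if (!group) {
        group = { gstin: row.counterpartyGstin, rows: [], total: 0 };
        byGstin.set(row.counterpartyGstin, group);
      }
      group.rows.push(row);
      group.total += row.taxableValue; // per-GSTIN value, numerator of the concentration share
      grandTotal += row.taxableValue;  // session-wide denominator
    }
  }
  // One group per unique GSTIN, in first-seen order, so each is enriched only once.
  return { groups: Array.from(byGstin.values()), grandTotal };
}
```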


Dual Deployment Modes

  1. Web Dashboard Mode: Express.js server on port 3020. Users upload files, trigger an audit via POST /api/audit, and view results instantly. Ideal for cloud or firm-wide deployment.
  2. Electron Desktop Mode: Packaged app for Windows/macOS/Linux. Adds GST proxy server (Puppeteer, port 3017), WebSocket live updates (port 3018), cache management, and full export. Designed for privacy-first, offline environments.



F. BENEFITS, IMPACT & DIFFERENTIATORS


Challenge vs Solution Mapping

Challenge | How GST FraudShield Solves It
Manual GSTIN verification takes hours per entity | Four-tier auto-enrichment with caching completes hundreds of GSTINs in seconds
HSN mismatches go undetected in manual audits | Automated HSN comparison (2-digit, 4-digit, exact) flags every mismatch
Invoice splitting and round-value patterns invisible in raw data | Rule engine detects both patterns with configurable thresholds
Fraud scores lack explainability for audit proceedings | Every flag carries a rule code, severity, score, and plain-English reason
GST portal inaccessible for bulk programmatic lookup | Puppeteer proxy with page pooling handles portal barriers gracefully
No fallback when portal is down or GSTIN is new | AI + HSN inference provides enrichment even without portal data
Reporting requires separate tools | Built-in multi-format export: Excel, CSV, JSON, PDF, Word


Quantified Benefits

Dimension | Impact
Speed | Audit pipeline completes in seconds for hundreds of GSTINs vs. days of manual work
Accuracy | Multi-source enrichment with confidence scoring reduces false negatives in fraud detection
Scale | Handles large multi-file GSTR-1 datasets without performance degradation
Privacy | Local cache means financial data stays on-premises unless AI enrichment is explicitly enabled
Coverage | Five rule categories covering the most prevalent GST fraud patterns recognised by tax authorities
Cost | Open-source stack with zero per-use licensing cost (the optional Anthropic API is billed separately); deployable on any standard laptop or server


Unique Differentiators

  1. Only tool combining live GST portal proxy automation with AI-based GSTIN enrichment as a seamless fallback.
  2. Deterministic, explainable fraud scoring — every risk flag maps to a named rule, making results contestable and audit-ready.
  3. Four-tier enrichment cascade guarantees a best-effort profile even for new, inactive, or portal-inaccessible GSTINs.
  4. Dual deployment flexibility: browser-based for firm-wide access, Electron desktop for privacy-sensitive offline environments.
  5. HSN-based industry inference brings supplier profiling capability to auditors without access to the GST portal.
  6. Open, extensible architecture: new fraud rules can be added without touching parser or enrichment layers.



G. DEPLOYMENT, LIMITATIONS & ROADMAP


Deployment Options

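  1. Local / Offline (Electron Desktop): Packaged installers for Windows (.exe), macOS (.dmg), and Linux (.AppImage). Runs entirely on the local machine and keeps working offline, falling back to cached and HSN-inferred profiles when the GST portal and AI lookups are unreachable; suitable for firms with strict data policies.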
  2. Web Server Mode: Node.js application on any server or cloud VM. Accessible from any browser on the network. Suitable for shared firm-wide use.
  3. Developer / Source Mode: Clone repository, run npm install, and start with npm run start:app (web) or npm run electron (desktop). Requires Node.js 18+.

Current Limitations

  1. The modular backend accepts GSTR-1 JSON only; GSTR-2B and e-way bill formats are handled only in the large Electron UI file, not in the modular backend.
  2. GST portal automation depends on portal availability; heavy CAPTCHA enforcement may reduce proxy hit rates during peak filing periods.
  3. AI enrichment requires an Anthropic API key to be configured; not included by default.
  4. No persistent multi-session storage in web mode; audit results must be exported before the session ends.

Future Roadmap

Feature | Description
GSTR-2B Integration | Full reconciliation of the purchase register against GSTR-2B for ITC risk detection alongside fraud scoring
ML Anomaly Layer | Complement rule-based scoring with an unsupervised anomaly detection model trained on GST return patterns
Network Graph Analysis | Visualise GSTIN-to-GSTIN transaction networks to detect circular trading and accommodation entry chains
Multi-Client Dashboard | CA firm portal to load, switch, and compare audit results across multiple client GSTINs in one session
GSTR-3B Pre-fill | Auto-populate GSTR-3B Table 4 values from reconciled data with one-click export to the filing portal
WhatsApp/Email Alerts | Automated compliance reminders, driven by the vendor scorecard, to suppliers with pending or late GST filings
Tally/ERP Integration | Direct data pull via ODBC from Tally Prime and SAP, eliminating the export-import step



H. CONCLUSION


GST FraudShield represents a practical, deployable answer to one of India's most pressing tax compliance challenges: the detection of fraudulent invoice networks and bogus ITC claims within the GST ecosystem. By combining structured file parsing, automated GSTIN enrichment, AI-assisted inference, and a fully explainable rule-based fraud engine, the platform delivers audit-grade intelligence in seconds — at a cost accessible to every CA practice.

The architecture is deliberately modular and open. Parser, enrichment, inference, and fraud modules are independently maintained and testable. New fraud rules can be added without touching the data pipeline, and new enrichment sources can be plugged in without disrupting the rule engine, so GST FraudShield can evolve with India's GST law and portal landscape.

The dual-mode deployment model — browser dashboard for shared access, Electron desktop for privacy-first offline use — ensures the tool fits diverse CA practice environments. Built-in multi-format export means audit findings are immediately usable in compliance filings, client reports, and regulatory submissions.

The vision is straightforward: every tax professional in India, whether an independent CA or a GST intelligence officer, should have access to the same quality of fraud detection intelligence that today requires specialised teams and expensive enterprise software. GST FraudShield is that equaliser.