AI-Powered Video Analysis & Forensic Accounting Suite

Author: CA Shailesh Wadhawaniya

A Multimodal Approach to Evidence Analysis Using Visual Language Models (VLMs)

1. Document Metadata

Field	Detail
Author	CA Shailesh Wadhawaniya
Date	17th October 2025
Category	Forensic Accounting, Audit & Assurance, AI in Professional Practice, Business CCTV Monitoring and Early Fraud Detection

2. Executive Summary

The proliferation of unstructured data, particularly video footage, presents both a massive challenge and a significant opportunity for forensic auditing. Traditional audit procedures are often limited to text-based analysis, while crucial physical evidence contained in video surveillance (CCTV) goes unreviewed due to its sheer volume.

The AI-Powered Video Analysis & Forensic Accounting Suite is a multimodal digital assistant designed to bridge this gap. It uses Visual Language Models (VLMs), AI systems that interpret both language and visual data, to "watch" and analyze video evidence from local files or from live streams in real time.

By combining VLM insights (e.g., identifying unauthorized access or asset movement) with conventional forensic data analysis (e.g., Benford’s Law and Entity Network Mapping), the suite provides a comprehensive, cross-referenced view of potential fraud. Its key deliverables include:

Automated detection of high-risk physical events in video footage.
Secure, offline analysis of sensitive client data using local LLMs.
Generation of PII-redacted, cryptographically secured reports for legal review.

3. The Problem

Chartered Accountants and forensic professionals face several critical challenges in modern fraud investigations:

Overwhelming Video Volume: Security systems can generate hundreds of hours of footage daily. Manually reviewing this to find a few seconds of suspicious activity is impractical, time-consuming, and prone to human error.
Data Privacy and Security Risks: Uploading sensitive client financial data and proprietary surveillance footage to public, cloud-based AI services poses significant confidentiality risks.
Lack of Correlation: Existing tools analyze financial data and physical evidence separately. Proving fraud often requires correlating a suspicious transaction with a specific event visible in video, a process that is currently manual and speculative.
Inefficient Reporting: Generating court-ready evidence involves tedious PII redaction, secure encryption, and standardized legal narrative drafting, which slows down the investigation lifecycle.

4. The Objective

The primary objective is to build a secure, comprehensive forensic suite that enhances the efficiency and depth of audit investigations by:

Enabling Multimodal Analysis: Simultaneously analyzing structured financial data (ledgers) and unstructured visual evidence (videos/images).
Adhering to a Privacy-First Design: Ensuring all analysis is performed locally using downloaded LLMs/VLMs served through Ollama, eliminating cloud exposure for confidential data.
Providing Real-Time Risk Detection: Processing live RTSP streams from CCTV cameras to identify and flag high-risk events as they occur.
Delivering Court-Ready Output: Generating legally compliant, PII-redacted, and cryptographically secure reports with verifiable integrity hashes.
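
To make the "PII-redacted" and "verifiable integrity hash" objectives concrete, here is a minimal sketch in Python. It is not the suite's actual export code: the function names, the placeholder tokens, and the three patterns (email, Indian PAN format, 10-digit Indian mobile number) are assumptions chosen for illustration, and a real redaction pass would need a much broader rule set plus human review.

```python
import hashlib
import re

# Illustrative patterns only (assumptions, not the suite's real rule set).
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[PAN]", re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")),   # PAN: 5 letters, 4 digits, 1 letter
    ("[PHONE]", re.compile(r"\b[6-9]\d{9}\b")),            # 10-digit Indian mobile number
]


def redact_pii(text):
    """Replace every match of each pattern with its placeholder token."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def integrity_hash(data):
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def seal_report(text):
    """Redact the report text, then fingerprint the redacted version."""
    redacted = redact_pii(text)
    return redacted, integrity_hash(redacted.encode("utf-8"))
```

Anyone holding the redacted report can recompute the digest and compare it with the recorded value; a single changed byte produces a different hash.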

5. Solution Architecture & Components

The solution employs a hybrid architecture combining advanced Visual Language Models (VLMs), computer vision (YOLO), and specialized forensic data modules, all managed through a local Python interface.

Module / Component	Functionality
VLM (Vision Language Model)	Interprets video frames, objects, and scene context to provide nuanced descriptions, identify non-obvious entities (e.g., car brand/model), and assign risk levels (Low/Medium/High).
CV Engine (YOLOv8)	Provides a high-speed, baseline analysis for object detection in real-time streams, serving as a rapid filter for suspicious activity.
Forensic Data Engine	Performs statistical checks (Benford’s Law, Outlier Detection), financial ratio computation, and applies jurisdiction-specific compliance rules on ledger data.
Entity Network Analyzer	Visualizes transaction flows between accounts and parties, revealing complex relationships and potential collusion.
Secure Export Module	Redacts Personally Identifiable Information (PII), encrypts the final audit report using AES, and generates cryptographic hash reports for evidence integrity.
Audit Trail Database	Uses a resilient SQLite database to log all analysis steps, queries, and detected events, maintaining a robust chain of custody.
LLM Client (Ollama)	Manages connections to run powerful LLM/VLM models locally on the auditor's machine, ensuring complete data privacy.
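
As an illustration of the statistical checks attributed to the Forensic Data Engine, the following sketch applies a Benford's Law first-digit test to a list of ledger amounts. Benford's Law predicts that the leading digit d appears with probability log10(1 + 1/d). The function names and the choice of a chi-square statistic are assumptions made here for illustration; the source does not say which test statistic the suite uses.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of freedom
# (nine digit classes minus one).
CHI2_CRITICAL_8DF_5PCT = 15.507


def leading_digit(amount):
    """Return the first significant digit of an amount, or None for zero."""
    mantissa = f"{abs(amount):.15e}"  # scientific notation, e.g. '1.2345...e+02'
    digit = int(mantissa[0])
    return digit if digit != 0 else None


def benford_expected():
    """Expected first-digit proportions under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_chi_square(amounts):
    """Return (chi-square statistic, observed proportions) for the amounts."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero amounts to test")
    observed = Counter(digits)
    chi2 = sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )
    proportions = {d: observed.get(d, 0) / n for d in range(1, 10)}
    return chi2, proportions
```

A statistic above the critical value suggests the first-digit distribution departs from Benford's Law and the ledger deserves a closer look. It is a screening signal, not proof of manipulation, and it is unreliable on small samples or on amounts confined to a narrow range.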

6. Data & Input Requirements

Video Inputs: Local files (MP4, AVI, MOV) and live RTSP Streams for direct CCTV integration.
Financial Inputs: Structured data (CSV, Excel) and unstructured evidence (images of receipts/invoices, PDFs).
Contextual Inputs: Scene descriptions for the VLM, camera IDs, and transaction details.
Security & Privacy: The application is designed to operate fully offline. By utilizing local LLM/VLM installations, no confidential data ever leaves the auditor's secure environment.
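
A minimal sketch of how a single video frame and its scene context might be sent to a locally running Ollama instance follows. It uses Ollama's local HTTP endpoint (`/api/generate` on port 11434 by default), so the frame never leaves the machine. The prompt wording, the function names, and the model tag `qwen2.5vl` are assumptions; the source only states that a Qwen-VL model is used, so substitute whichever model tag is actually installed.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_vlm_request(frame_bytes, scene_context, model="qwen2.5vl"):
    """Build the JSON payload for one frame. The model tag is an assumption."""
    prompt = (
        f"Scene context: {scene_context}\n"
        "Describe what is happening in this frame and rate the risk "
        "as Low, Medium, or High."
    )
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(frame_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON response
    }


def ask_local_vlm(payload, url=OLLAMA_URL, timeout=120):
    """POST the payload to the local Ollama server and return the model's text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With Ollama running and a vision model pulled, `ask_local_vlm(build_vlm_request(jpeg_bytes, "Warehouse loading bay, after hours"))` would return the model's free-text description, which the application would then need to parse for the risk level.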

7. Implementation Workflow

Setup: The user configures the connection to their local Ollama LLM/VLM instance.
Video Ingestion: A video file or RTSP stream URL is provided, along with the desired sampling rate and scene context.
Visual Analysis: The engine processes the video, categorizes events by risk level (Low/Medium/High), saves keyframes of high-risk events, and logs all metadata.
Financial Ingestion: The user uploads ledger files (CSV/Excel) or folders containing unstructured evidence.
Forensic Scan: The system runs Benford’s analysis, ratio checks, and Entity Network Mapping on the financial data.
Evidence Correlation: Using Natural Language Queries, the auditor searches the event database to visually correlate flagged transactions with high-risk video events.
Final Export: The suite generates a PII-redacted, encrypted legal narrative and a comprehensive compliance report.
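
The Evidence Correlation step above can be approximated, in its simplest form, as a time-window join between flagged transactions and logged video events. The sketch below is an assumption about how that matching could work, not the suite's implementation: the record fields (`id`, `timestamp`, `camera_id`, `risk`, `description`) and the 30-minute default window are hypothetical.

```python
from datetime import datetime, timedelta

RISK_RANK = {"Low": 0, "Medium": 1, "High": 2}


def correlate(transactions, events, window_minutes=30, min_risk="High"):
    """Pair each transaction with video events close to it in time.

    Returns (transaction id, camera id, event description) tuples for events
    at or above min_risk that fall within the window on either side of the
    transaction timestamp. Timestamps are ISO 8601 strings.
    """
    window = timedelta(minutes=window_minutes)
    threshold = RISK_RANK[min_risk]
    matches = []
    for txn in transactions:
        txn_time = datetime.fromisoformat(txn["timestamp"])
        for event in events:
            if RISK_RANK[event["risk"]] < threshold:
                continue  # skip events below the requested risk level
            event_time = datetime.fromisoformat(event["timestamp"])
            if abs(event_time - txn_time) <= window:
                matches.append((txn["id"], event["camera_id"], event["description"]))
    return matches
```

A match only shows that a transaction and a physical event were close in time. Whether they are actually related remains the auditor's judgment, which is why the workflow keeps a human in this step.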

8. Technology Stack

Layer	Tools & Technologies
Frontend / UI	Streamlit (Python)
Core Logic	Python 3.10+
AI Engine	Ollama (local) running a Qwen-VL model
Computer Vision	ultralytics (YOLOv8), opencv-python
Data Handling	pandas, openpyxl, PyMuPDF
Database	SQLite (with WAL and FTS5)
Visualization	pyvis (for network graphs)
Security & Export	cryptography.hazmat (for AES encryption), fpdf2 (for PDF reports)
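
To illustrate the database layer, here is a minimal sketch of an append-only audit log in SQLite with write-ahead logging (WAL) enabled. The table name, columns, and function names are hypothetical. The stack lists FTS5 for full-text search; this sketch uses a plain `LIKE` query instead so that it runs on any SQLite build, and an FTS5 virtual table could replace it where the build supports it.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_db(path):
    """Open (or create) the audit database and enable WAL journaling."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a write is in progress (file-based DBs).
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            logged_at TEXT NOT NULL,
            action    TEXT NOT NULL,
            detail    TEXT NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def log_step(conn, action, detail):
    """Append one analysis step with a UTC timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit_log (logged_at, action, detail) VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), action, detail),
        )


def search_log(conn, keyword):
    """Return (action, detail) rows whose detail contains the keyword."""
    rows = conn.execute(
        "SELECT action, detail FROM audit_log WHERE detail LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return rows.fetchall()
```

Recording every step with a timestamp is what allows the chain of custody to be reconstructed later. Parameterized queries keep user-supplied search terms from being interpreted as SQL.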

9. Key Benefits & Impact

Dimension	Impact
Time Efficiency	Aims to cut manual review of surveillance footage from hours to minutes by surfacing only the flagged events for human review.
Detection Depth	Links statistical financial anomalies directly to specific physical events, providing compelling, correlated evidence.
Data Security	Protects client confidentiality by performing all AI analysis offline on the auditor's machine, which supports compliance with stringent data protection regulations.
Audit Quality	Produces detailed, explainable, and traceable risk reports with verifiable cryptographic integrity.
Scalability	Adaptable for internal audits, forensic investigations, and regulatory compliance checks across various industries.

10. Challenges & Limitations

Hardware Requirements: Running powerful VLMs locally is computationally intensive and requires a system with a dedicated GPU.
Contextual Ambiguity: Complex or subjective actions in video may still require human validation to interpret intent accurately.
Network Stability: Real-time stream analysis is dependent on the stability and bandwidth of the local network connected to the CCTV system.
Model Maintenance: Local AI models require periodic updates to maintain optimal performance and accuracy.

11. Future Scope

IoT Sensor Integration: Incorporate non-visual data (e.g., access control logs, temperature sensors) to enrich event context.
Autonomous Correlation Agents: Develop LLM agents that can automatically suggest links between video events and ledger entries.
Predictive Risk Profiling: Use machine learning to identify patterns that predict high-risk activities before they occur.
Secure Cloud Integration: Implement encrypted connectors for private cloud video management systems.
Migration to PostgreSQL: Move from SQLite to PostgreSQL for better scalability, easier cloud integration, and stronger security controls.

12. Conclusion

The AI-Powered Video Analysis & Forensic Accounting Suite represents a critical evolution in audit technology. By successfully integrating advanced multimodal AI with established forensic practices in a privacy-centric framework, it empowers auditors with a powerful tool to handle the complexity and scale of modern digital investigations. This suite provides a traceable, explainable, and secure method for transforming raw data into court-ready evidence, setting a new standard for the profession.