AI-Powered ITR Preparation from Source Documents
Author: CA Shubham Goyal
AutoITR is a locally hosted web application built for Chartered Accountants and audit teams to automate one of the most repetitive parts of a practice’s workflow: retrieving, unlocking and reading taxpayer documents from the Income Tax e-Filing portal. From a single form (PAN, e-Filing password and date of birth), the tool logs into incometax.gov.in, navigates the portal, and downloads Form 26AS, the Annual Information Statement (AIS), the Taxpayer Information Summary (TIS) and the last filed ITR. It removes every password and encryption layer at the point of download and produces a structured income, deduction and TDS summary for the taxpayer. The entire process runs on the user’s own machine, with credentials never written to disk and no third-party service anywhere in the data path.
For a practicing CA, gathering a client’s tax documents from the e-Filing portal is a recurring, manual and surprisingly fiddly task. Each client requires a fresh login, dismissal of the dual-login popup, navigation to TRACES for Form 26AS, then back to AIS, then to the filed returns — repeated every cycle, for every client.
This creates several practical challenges:
There is no integrated, privacy-first tool that handles the complete workflow from credential entry to a filing-ready summary.
AutoITR collapses the entire workflow into a single web form backed by four integrated modules:
Fetch (Browser Automation): A background subprocess drives a real Chromium browser using Playwright (Python async API). It logs into the e-Filing portal, dismisses the dual-login popup, navigates to TRACES to download Form 26AS (both the canonical caret-delimited text ZIP and the rendered PDF), retrieves AIS and TIS, and fetches the filed ITR — printing every step to a live log the operator can watch in real time.
Decryption & Password Removal: Each document is unlocked the moment it lands, so the user never types a password again. The Form 26AS ZIP is re-written with its encryption flag cleared using the standard-library zipfile module; AIS and TIS PDFs are decrypted and re-saved with pypdf; and the AIS JSON, which is encrypted with AES-256-CBC under a PBKDF2-SHA256-derived key, is decrypted using pycryptodome. The input to the key derivation is the concatenation of the lowercased PAN, a fixed constant and the date of birth. The captcha required for the AIS JSON download is solved locally by ddddocr, an ONNX-based OCR model, with no external API call.
Past Files (Document Archive): Every fetch leaves a per-client document library on disk, organised by PAN and document type. The archive presents each taxpayer as a collapsible card with Download and in-browser View options for every file, including a sub-page that lists the contents of each 26AS ZIP.
Analyze (Structured Summary): Rather than parsing PDFs, AutoITR reads the portal’s own structured outputs — the ITR JSON, the caret-delimited TRACES text file inside the 26AS ZIP, and the decrypted AIS JSON. From these it produces a clean, deterministic summary of income by head, Chapter VI-A deductions, TDS deposited and tax payable or refundable, in roughly a tenth of a second per taxpayer.
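As an illustration of why reading the caret-delimited text is fast and deterministic, here is a minimal Python sketch of that style of parsing. The sample rows, the column order and the function names `parse_caret_rows` and `total_column` are assumptions made for this example; the real TRACES file has more sections and columns, and this is not AutoITR's actual parser.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical sample rows; the real 26AS text file has a different,
# richer layout (section headers, more columns).
SAMPLE = """\
1^ABC LTD^DELA12345B^500000.00^50000.00^50000.00
2^XYZ BANK^MUMX67890C^20000.00^2000.00^2000.00
"""


def parse_caret_rows(text):
    """Split every non-empty line into stripped fields on the '^' delimiter."""
    return [
        [field.strip() for field in line.split("^")]
        for line in text.splitlines()
        if line.strip()
    ]


def total_column(rows, col):
    """Sum one numeric column, skipping rows where it is missing or non-numeric."""
    total = Decimal("0")
    for row in rows:
        try:
            total += Decimal(row[col])
        except (IndexError, InvalidOperation):
            continue  # header or malformed row: ignore it
    return total


rows = parse_caret_rows(SAMPLE)
tds_deposited = total_column(rows, 5)  # column 5 is assumed to hold TDS deposited
```

Using `Decimal` rather than `float` keeps rupee amounts exact, which matters when the totals feed a tax computation.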
Backend & Web: Python 3.12, Flask 3 (Jinja2 templating and signed-cookie session authentication), Gunicorn as the production WSGI server, and a per-fetch subprocess with per-job log files for live progress streaming.
Browser Automation: Playwright (Python async API) driving Chromium, with standard anti-detection mitigations: the AutomationControlled Blink feature disabled, a custom user agent and masking of the navigator.webdriver property.
Decryption: Standard-library zipfile for Form 26AS ZIP password removal; pypdf for AIS/TIS PDF decryption; and pycryptodome (AES-256-CBC with PBKDF2-SHA256 key derivation) for the AIS JSON.
Captcha / OCR: ddddocr, a local ONNX-runtime OCR model for the AIS JSON download captcha — runs entirely on the local machine with no network call.
Frontend: Bootstrap 5.3 and Bootstrap Icons with vanilla JavaScript for live status polling — no SPA framework and no build step.
Architecture: Local-first design — the application runs on the user’s machine, downloaded documents are stored under data/{26as, ais, itr}/<PAN>/, and credentials live only inside the fetch subprocess and disappear when it exits.
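Two of the decryption steps listed in the stack above need nothing beyond the standard library, and can be sketched as follows. This is a hedged illustration, not AutoITR's code: the function names are invented here, and the constant, salt and iteration count used for the AIS key are not given in this write-up, so they appear only as caller-supplied parameters. Note that Python's `zipfile` decrypts only the legacy ZipCrypto scheme, so the sketch assumes the 26AS ZIP uses it.

```python
import hashlib
import io
import zipfile


def strip_zip_password(src_bytes, password):
    """Return a copy of the ZIP with every member stored unencrypted.

    Each member is read with the password and written into a fresh archive,
    so the new entries carry no encryption flag. `password` must be bytes.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            dst.writestr(info.filename, src.read(info, pwd=password))
    return out.getvalue()


def derive_ais_key(pan, dob, constant, salt, iterations):
    """Derive a 32-byte (AES-256) key with PBKDF2-HMAC-SHA256.

    The secret is lowercase(PAN) + constant + DOB, as described above.
    `constant`, `salt` and `iterations` are placeholders for values this
    document does not specify.
    """
    secret = (pan.lower() + constant + dob).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=32)
```

The derived key would then be handed to an AES-256-CBC cipher (pycryptodome in AutoITR's stack) together with the IV to decrypt the AIS JSON payload.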
Multi-user Authentication: Replace the current single hardcoded user with a proper user store, password hashing and per-user data partitioning.
Multi-year Analysis: Extend the Analyze module to show year-on-year income and TDS trends for a taxpayer.
Batch Fetches: Allow a CA to queue a batch of PANs (say, 50) to run unattended overnight.
Export: Export the Analyze summary as a formatted PDF or Excel working paper ready for the audit file.
Hosted Deployment: Offer an optional hosted, multi-tenant version with TLS, audit logging and role-based access for larger firms.
AutoITR represents a practical, privacy-first approach to one of the most routine yet time-consuming tasks in tax practice — assembling and reading a client’s documents from the Income Tax portal. By integrating automated fetching, end-to-end decryption, a per-client document archive and a structured summary into a single local application, it lets CAs move from minutes of manual clicking and document-opening to a one-form, one-click, filing-ready view of every client. Built around the actual workflow a practice follows, with credentials kept ephemeral and all data retained on the user’s machine, it is straightforward to adopt and provides a foundation that can grow toward a hosted, multi-user platform in future iterations.