Create your TaxLens: Simplifying Tax Review, Instantly
Author : CA. Vishnu Manickam Ganesan
Author : CA. Vishnu Manickam Ganesan
๐ Extract Specific Information from Bulk Filed Income Tax Returns (JSON)
In this usecase, we demonstrate how to extract specific data from multiple income tax return JSON files in a structured and step-by-step manner using Python. This is particularly useful for professionals who deal with large volumes of ITR data and need to collate essential details efficiently into an Excel or structured output.
โ Step 1: Open a Single JSON File
We begin by selecting and opening a single JSON file using Python. The file is opened with the appropriate indentation to allow for better readability and inspection of its internal structure. This step helps us understand the layout of the JSON and identify the relevant paths for extracting the required information.
โ Step 2: Extract Information from the Opened JSON
Once the file is loaded, we move on to extracting specific details from it. This step involves pinpointing the required fields based on the JSON's hierarchy. We identify the keys and sub-keys that hold the data we are interested in and isolate them using standard Python techniques.
โ Step 3: Apply the Extraction Logic to All JSON Files in a Folder
After successfully extracting the information from a single file, we scale the logic to work across an entire folder of JSON files. Regardless of the ITR form type (ITR1, ITR2, ITR3, etc.), our approach loops through each file, applies the extraction steps, and consolidates the results into a uniform structureโready to be saved or exported.
๐ผ Use Case: Partner Remuneration & Profit Mapping from Partnership Firm Returns
For this demonstration, we focus on a real-world use caseโextracting Partner Remuneration and Profit Share details from all the filed ITR JSONs of partnership firms.
๐งพ Objective
In any audit or tax review scenario, one of the common pain points is verifying whether the income credited to a partner (from the firmโs ITR) has been appropriately reflected in their individual return. This involves opening each firm's ITR JSON, locating the profit split, noting the partnerโs name and PAN, and then matching it manually across returnsโa time-consuming and error-prone task.
โ๏ธ Solution
With this automated approach:
โ Step 1: Open a Single JSON File
Chat GPT Prompt:
โI want python code to read json files along with indent path D:\AI HACKATHON FINALS\ALL JSON\AYYAPPASAAMYFINANCE_2425.jsonโ
Python Code
Output
โ Step 2: Extract Information from the Opened JSON
Chat GPT Prompt:
โAbove is the output of JSON File in this i need python code extract the partner details table in the following format Name| PAN No| Share Holding %| Remuneration| Profit In this format compute profit for each partner based on share holding % applied for the net profit which will be mentioned in JSON as "PartnerAccBalTrf". Ignore the fields if any other information is not available but include all information which is available.โ
Python Code
OUTPUT
โ Step 3: Apply the Extraction Logic to All JSON Files in a Folder
Chat GPT Prompt:
โI need python code to extract this information from all the json files which is meant for ITR5 (ignore other ITR forms Like ITR1,2,3,4,7) now in each json file profit should be calculated based on respective profit share ratio. Include Firm Name Column and Firm Column So columns are Firm Name| Firm Pan No| Partner Name| Partner Pan No| Share Holding Ratio| Remuneration| Profit Transferred Firm Name is available under "AssesseeName" - "SurNameOrOrgName" and Firm PAN No is available under "PartA_GEN1"-"PAN" Other information you have taken correctly. In the code i need to save this data as excel file in same folderโ
Python Code
Output
Screenshot of Excel File with all partnership firms information
๐ Extending the Approach Across All ITR Types
Building on the same foundational method, we now expand the logic to extract additional relevant data from all types of ITR JSON filesโnot just for partnership firms, but across individuals, HUFs, companies, and others.
๐งฐ What We Aim to Extract
By applying this approach consistently, we can compile:
๐งฐ What We Aim to Extract
By applying this approach consistently, we can compile:
Extracting name, PAN, mobile number, email ID, and address fields from each JSON to maintain an updated, auto-generated contact list.
Identifying clients who have paid advance tax and capturing the amount, date, and mode of paymentโuseful for planning, follow-up, and advising.
Automatically spotting cases where interest under Section 234B, 234C, or other sections is highโhelping in red-flag reporting and advisory.
๐ฏ Why This Matters
Traditionally, extracting such insights required individually opening each return, scanning through multiple pages, and manually collating the dataโa tedious and error-prone process.
Now, with one integrated sweep across folders of JSONs, we can build targeted, insightful dashboards that serve as decision-support tools in practice.
โ Step 1: Train GPT with all ITR Types.
Chat GPT Prompt:
โI will upload sample json text files of 7 ITR types. I need following columnar information from all JSON files in the folder irrespective of its ITR type. But for you to understand where the specific information is stored am uploading the json text one by one in this chat. the Information which i require has following Columns
Assessee Name| PAN| Aadhar | Address| Email Id| Phone Noโ
PYTHON CODE
Output of Client Address book
Chat GPT Prompt:
โSimilarly based on the ITR and informations which i uploaded ! Grab Advance tax details paid by assessees ! ignore files if 0 advance tax paid !โ
Output of Advance Tax Paid Client List
Chat GPT Prompt:
โSimilarly give python code to get this information File Name| PAN| Interest under section 234B| Interest Under Section 234C| mobile noโ
Output of Client List Who paid interest U/s 234B And 234C:
โ Conclusion
By leveraging this structured, Python-based approach, we have transformed what was once a time-consuming and manual review process into an automated, scalable, and accurate solution.
Whether it is:
โall of this can now be done in just a few clicks, with full control, clarity, and confidence.
๐ What This Means for Practice
๐ Better Compliance Tracking
๐ Massive Time Savings
๐ Centralized, Ready-to-Use Data
๐ Faster Review and Advisory Decisions
This is not just automationโthis is elevation of how we work with data in taxation and audit.