Create your TaxLens: Simplifying Tax Review, InstantlyRecord inserted or updated successfully.
AI & Audit Automation

Create your TaxLens: Simplifying Tax Review, Instantly

Author : CA. Vishnu Manickam Ganesan

Watch on Youtube

๐Ÿ” Extract Specific Information from Bulk Filed Income Tax Returns (JSON)



In this usecase, we demonstrate how to extract specific data from multiple income tax return JSON files in a structured and step-by-step manner using Python. This is particularly useful for professionals who deal with large volumes of ITR data and need to collate essential details efficiently into an Excel or structured output.

โœ… Step 1: Open a Single JSON File

We begin by selecting and opening a single JSON file using Python. The file is opened with the appropriate indentation to allow for better readability and inspection of its internal structure. This step helps us understand the layout of the JSON and identify the relevant paths for extracting the required information.

โœ… Step 2: Extract Information from the Opened JSON

Once the file is loaded, we move on to extracting specific details from it. This step involves pinpointing the required fields based on the JSON's hierarchy. We identify the keys and sub-keys that hold the data we are interested in and isolate them using standard Python techniques.

โœ… Step 3: Apply the Extraction Logic to All JSON Files in a Folder

After successfully extracting the information from a single file, we scale the logic to work across an entire folder of JSON files. Regardless of the ITR form type (ITR1, ITR2, ITR3, etc.), our approach loops through each file, applies the extraction steps, and consolidates the results into a uniform structureโ€”ready to be saved or exported.

๐Ÿ’ผ Use Case: Partner Remuneration & Profit Mapping from Partnership Firm Returns

For this demonstration, we focus on a real-world use caseโ€”extracting Partner Remuneration and Profit Share details from all the filed ITR JSONs of partnership firms.


๐Ÿงพ Objective

In any audit or tax review scenario, one of the common pain points is verifying whether the income credited to a partner (from the firmโ€™s ITR) has been appropriately reflected in their individual return. This involves opening each firm's ITR JSON, locating the profit split, noting the partnerโ€™s name and PAN, and then matching it manually across returnsโ€”a time-consuming and error-prone task.

โš™๏ธ Solution

With this automated approach:

  1. We extract remuneration and profit transferred to each partner from every partnership firm ITR in the folder.
  2. The extracted data includes Firm Name, Firm PAN, Partner Name, Partner PAN, Share Percentage, Remuneration, and Profit Transferred.
  3. This consolidated view helps us cross-verify the reporting in individual returns with minimal effort.

โœ… Step 1: Open a Single JSON File



Chat GPT Prompt:

โ€œI want python code to read json files along with indent path D:\AI HACKATHON FINALS\ALL JSON\AYYAPPASAAMYFINANCE_2425.jsonโ€


Python Code


Output


โœ… Step 2: Extract Information from the Opened JSON

Chat GPT Prompt:

โ€œAbove is the output of JSON File in this i need python code extract the partner details table in the following format Name| PAN No| Share Holding %| Remuneration| Profit In this format compute profit for each partner based on share holding % applied for the net profit which will be mentioned in JSON as "PartnerAccBalTrf". Ignore the fields if any other information is not available but include all information which is available.โ€


Python Code




OUTPUT




โœ… Step 3: Apply the Extraction Logic to All JSON Files in a Folder

Chat GPT Prompt:

โ€œI need python code to extract this information from all the json files which is meant for ITR5 (ignore other ITR forms Like ITR1,2,3,4,7) now in each json file profit should be calculated based on respective profit share ratio. Include Firm Name Column and Firm Column So columns are Firm Name| Firm Pan No| Partner Name| Partner Pan No| Share Holding Ratio| Remuneration| Profit Transferred Firm Name is available under "AssesseeName" - "SurNameOrOrgName" and Firm PAN No is available under "PartA_GEN1"-"PAN" Other information you have taken correctly. In the code i need to save this data as excel file in same folderโ€

Python Code








Output

Screenshot of Excel File with all partnership firms information




๐Ÿ“‚ Extending the Approach Across All ITR Types

Building on the same foundational method, we now expand the logic to extract additional relevant data from all types of ITR JSON filesโ€”not just for partnership firms, but across individuals, HUFs, companies, and others.



๐Ÿงฐ What We Aim to Extract

By applying this approach consistently, we can compile:



๐Ÿงฐ What We Aim to Extract

By applying this approach consistently, we can compile:

  1. ๐Ÿ“’ Client Address Book

Extracting name, PAN, mobile number, email ID, and address fields from each JSON to maintain an updated, auto-generated contact list.

  1. ๐Ÿ’ฐ Advance Tax Paid Clients List

Identifying clients who have paid advance tax and capturing the amount, date, and mode of paymentโ€”useful for planning, follow-up, and advising.

  1. โš ๏ธ Clients with Heavy Interest Liabilities

Automatically spotting cases where interest under Section 234B, 234C, or other sections is highโ€”helping in red-flag reporting and advisory.

๐ŸŽฏ Why This Matters

Traditionally, extracting such insights required individually opening each return, scanning through multiple pages, and manually collating the dataโ€”a tedious and error-prone process.

Now, with one integrated sweep across folders of JSONs, we can build targeted, insightful dashboards that serve as decision-support tools in practice.


โœ… Step 1: Train GPT with all ITR Types.

Chat GPT Prompt:

โ€œI will upload sample json text files of 7 ITR types. I need following columnar information from all JSON files in the folder irrespective of its ITR type. But for you to understand where the specific information is stored am uploading the json text one by one in this chat. the Information which i require has following Columns

Assessee Name| PAN| Aadhar | Address| Email Id| Phone Noโ€


PYTHON CODE



Output of Client Address book


Chat GPT Prompt:

โ€œSimilarly based on the ITR and informations which i uploaded ! Grab Advance tax details paid by assessees ! ignore files if 0 advance tax paid !โ€


Output of Advance Tax Paid Client List


Chat GPT Prompt:

โ€œSimilarly give python code to get this information File Name| PAN| Interest under section 234B| Interest Under Section 234C| mobile noโ€

Output of Client List Who paid interest U/s 234B And 234C:


โœ… Conclusion

By leveraging this structured, Python-based approach, we have transformed what was once a time-consuming and manual review process into an automated, scalable, and accurate solution.

Whether it is:

  1. Extracting partner profit and remuneration from firm returns
  2. Building a client address book from multiple ITRs
  3. Listing advance tax paid or identifying interest-heavy cases

โ€”all of this can now be done in just a few clicks, with full control, clarity, and confidence.


๐Ÿš€ What This Means for Practice

๐Ÿ”      Better Compliance Tracking

๐Ÿ•’      Massive Time Savings

๐Ÿ“Š      Centralized, Ready-to-Use Data

๐Ÿ“ˆ        Faster Review and Advisory Decisions

This is not just automationโ€”this is elevation of how we work with data in taxation and audit.