Automated GSTR-3B PDF to Excel Extractor for Monthly Turnover Analysis
Author : CA Priya Seht
Author : CA Priya Seht
Use Case Title :Automated GSTR-3B PDF to Excel Extractor for Monthly Turnover Analysis
For the (stock) audit report which requires monthly sales summary for a financial year, every time, the user opens the GSTR 3B of each month, manually adds up the figure of outward supplies, copies/writes them in excel/on paper respectively, sorts them month wise and then punches the final figures in the report.
This tool automates extraction of GSTR-3B data (Table 3.1 tables a,b,c and e i.e outward supplies as reported in the GSTR-3B) from PDFs using OCR and text parsing, converts it all into structured data in MS Excel, sorts it as per the Financial Year (i.e April to March chronologically) and then provides ready-to-analyse results for the year to the user. The tool also converts the taxable figures so extracted into Lakhs.
The user gets multiple advantages by using this tool:-
a) The time lost due in opening each PDF, noting down the figures and then converting it in Lakhs is saved
b) Manual errors are eliminated
c) The tool is packaged as an EXE and can be used by anyone without any technical knowledge
d) Works offline, thus data privacy is maintained
Prerequisites:
• Python installed and configured
• Tesseract OCR installed with correct path mapping
• Poppler installed and path configured
• Basic folder structure ready with input PDFs
• Required Python libraries installed using pip
The following prompts were used with Chatgpt to create this tool:-
reader and when I feed in the PDFs in it , it gives me the figures. For eg: When I run that
software , all the GSTR3B in a folder are read and the output is given in Excel stating
month-wise sale output taxable figure. It can be done using Python, please suggest.
• Tell me what all needs to be installed with the relevant working links to download
everything.
• The extraction tool executed but it is not extracting the data from the pdf. I want "total
taxable value" of "(a) Outward taxable supplies (other than zero rated, nil rated and
exempted)" plus (b) Outward taxable supplies (zero rated) plus (c ) Other outward
supplies (nil rated, exempted)
• I have poppler in downloads, python is installed in C Drive and tesseract-ocr-w64-
setup-5.5.0.20241111 also executed and saved in C Drive.
•I want to create an exe file of this extractor according to this path.
•Now, I want it all to be mentioned in chronological order of a Financial Year
starting From Apr, May,June,July till March.
• Now, I want you to ensure that the excel sheet so created is formatted properly with the
proper column width and the months are arranged from April onwards i.e April, May,
June, Jul, Aug, Sep, Oct,Nov,Dec,Jan,Feb and March. Further, add a last column wherein
the value is converted into lakhs with decimals Like 12345678 is converted as 123.46 .