Inspiration
Splitting a bill is an everyday problem. Working out everyone's share is a headache: people juggle several different apps and websites, and the person who paid has to remember who ate what before any split can be calculated.
What it does
The Telegram bot uses Tesseract, an open-source OCR engine whose development was long sponsored by Google, to convert the image received from the user into text. This text is passed to OpenAI's text-davinci-003 model, which extracts the items, their prices, and any taxes or other charges. The bot then creates a Telegram poll in which people vote for the items they should split. Once everyone has voted, the splits are calculated and the PayLah bot is optionally called to send the payments.
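The split calculation at the end of that flow can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function name `compute_splits`, the shape of the `items` and `votes` dictionaries, and the choice to spread taxes in proportion to each person's subtotal are all assumptions.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP


def compute_splits(items, votes, extra_charges=Decimal("0")):
    """Hypothetical helper: turn poll votes into an amount owed per person.

    items:         {item name: Decimal price}
    votes:         {item name: [people who voted to share that item]}
    extra_charges: total taxes and fees, spread in proportion to each
                   person's subtotal (an assumption, not the bot's rule)
    """
    subtotals = defaultdict(Decimal)  # Decimal() starts at 0
    for name, price in items.items():
        sharers = votes.get(name, [])
        if not sharers:
            continue  # nobody claimed this item; left unassigned in this sketch
        share = price / len(sharers)
        for person in sharers:
            subtotals[person] += share

    claimed = sum(subtotals.values(), Decimal("0"))
    cent = Decimal("0.01")
    totals = {}
    for person, subtotal in subtotals.items():
        extra = extra_charges * subtotal / claimed if claimed else Decimal("0")
        totals[person] = (subtotal + extra).quantize(cent, rounding=ROUND_HALF_UP)
    return totals


items = {
    "Chicken rice": Decimal("6.00"),
    "Iced tea": Decimal("2.00"),
    "Fries": Decimal("4.00"),
}
votes = {
    "Chicken rice": ["Ann"],
    "Iced tea": ["Ben"],
    "Fries": ["Ann", "Ben"],
}
for person, amount in compute_splits(items, votes, Decimal("1.20")).items():
    print(person, amount)
```

Because each amount is rounded to the cent on its own, the rounded amounts can differ from the bill total by a cent or two, so a real bot would need a rule for who absorbs the remainder.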
How we built it
We used python-telegram-bot to build the bot and pytesseract to generate the OCR text. Significant pre-processing is done to improve OCR results, including rotation, shadow removal, denoising and thresholding. Once the OCR engine returns the text, OpenAI's text-davinci-003 model parses it. We chose a model over regex because regex is too rigid to parse noisy text effectively. Using the model also lets us spell-check the item names.
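To illustrate the thresholding step, here is a small pure-Python sketch of Otsu's method, which picks the grey level that best separates dark ink from a light background. The write-up does not say which thresholding technique was used, so Otsu is an assumption, and the function names are hypothetical. In practice an image library such as OpenCV would do this on real images rather than on nested lists.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximizes between-class variance.

    pixels is a 2D list of integer grey values in the range 0-255.
    Values <= t are treated as one class, values > t as the other.
    """
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_low = 0  # number of pixels at or below the candidate threshold
    sum_low = 0     # sum of their grey values
    for t in range(256):
        weight_low += hist[t]
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is in the lower class; nothing left to split
        sum_low += t * hist[t]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        var_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map every pixel to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [[255 if value > t else 0 for value in row] for row in pixels]


# Tiny synthetic "image": dark values around 30, bright values around 200.
sample = [
    [30, 35, 200],
    [32, 210, 205],
    [28, 31, 198],
]
print(binarize(sample))
```

A single global threshold like this tends to struggle with uneven lighting, which is one reason shadow removal matters before thresholding.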
Challenges we ran into
- Handling files with python-telegram-bot was challenging. Documentation exists for several different versions of the library, which made troubleshooting hard. As we had never made a Telegram bot before, the whole process was tedious, especially because we wanted to handle the entire user cycle inside the Telegram app.
- Getting usable OCR output was a challenge as well. Without pre-processing, the output was extremely noisy and sparse. Cropping and deskewing were held up by occlusions, so we explored several background separation techniques. In the end, we abandoned cropping and focused on shadow removal and thresholding instead.
- Post-processing is a challenge even with great OCR output. While we could extract the items and their respective prices with pure regex, spelling errors were plentiful. Extracting taxes and other charges, which can be labeled in many different ways, was a challenge too. So although the regex approach worked for 60% of the scripts we tested, we went with an NLP approach, as these models are becoming more popular, more robust and easier to implement. However, hyperparameter tuning was essential to curb the creativity of these models, since we imposed a syntax on the output.
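The regex baseline mentioned in the last point might look something like the sketch below. The patterns, keyword lists, and the name `parse_receipt` are hypothetical, chosen only to show why a fixed pattern is brittle: a clean line parses, while a line with OCR noise falls through untouched.

```python
import re
from decimal import Decimal

# Assumed line shape: free-text name, then a price with two decimals at the end.
LINE_RE = re.compile(r"^(?P<name>.+?)\s+\$?(?P<price>\d+\.\d{2})$")
# Assumed keywords for taxes and fees; real receipts label these in many ways.
CHARGE_RE = re.compile(r"\b(gst|tax|svc|service)\b", re.IGNORECASE)
# Summary lines that are neither items nor charges.
SKIP_RE = re.compile(r"\b(sub\s*total|total|cash|change)\b", re.IGNORECASE)


def parse_receipt(text):
    """Split OCR text into (items, charges, unparsed lines)."""
    items, charges, unparsed = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            unparsed.append(line)  # noisy line the pattern cannot handle
            continue
        name = match["name"].strip()
        price = Decimal(match["price"])
        if SKIP_RE.search(name):
            continue
        target = charges if CHARGE_RE.search(name) else items
        target.append((name, price))
    return items, charges, unparsed


ocr_text = """
Chicken Rice   6.00
Iced Tea $2.00
Fr1es 4,0O
SVC CHG 10%  1.20
GST 0.80
TOTAL 14.00
"""
items, charges, unparsed = parse_receipt(ocr_text)
print("items:", items)
print("charges:", charges)
print("unparsed:", unparsed)
```

Here the misread line `Fr1es 4,0O` ends up in `unparsed`, and the misspelled item name would stay wrong even if the price had survived. Those are the two gaps a language model was used to fill.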
Accomplishments that we're proud of
Getting great OCR output from 80% of our test receipts, and navigating python-telegram-bot to get it all working.
What we learned
We learnt that Telegram bots are not as easy to make as they seem, especially with advanced data handling and interactions.
What's next for payments-split-bot
LayoutLM could be used to parse the receipts better. Training our own model for parsing OCR output, using Andrej Karpathy's nanoGPT, is also under consideration.