Hack@Brown2018: The Retextualizationizer
APIs used:
There's a lot of information in the world. We set out to build a tool that makes it easy to pick out the crucial points in any long piece of text. (We're hoping it will be especially useful to college students with heavy reading assignments.)
Description
We use Gensim and NLTK to bring natural language processing and machine learning into an easy-to-use tool. Specify your preferred summary length (in words) and number of keywords, or use our default settings! Paste text directly, use the microphone for speech-to-text, or select a PDF. Note: keywords may be stemmed.
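To illustrate how a summary word budget and a keyword count can be applied: Gensim 3.x shipped `gensim.summarization.summarize(text, word_count=...)` and `gensim.summarization.keywords(text, words=...)`, but that module was removed in Gensim 4.0, and the writeup doesn't say which calls the project made. The sketch below therefore swaps in a much simpler word-frequency scorer using only the standard library. It is not the project's code; the function names, the tiny stopword list and the scoring rule are all assumptions.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real tool would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, word_count: int = 50) -> str:
    """Greedily take the highest-scoring sentences that still fit in the
    word budget, then return them in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        # Average token frequency, so long sentences are not automatically favoured.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= word_count:  # skip sentences that would overflow the budget
            chosen.append(i)
            used += n
    return " ".join(sentences[i] for i in sorted(chosen))


def keywords(text: str, count: int = 5) -> list[str]:
    """Return the `count` most frequent non-stopword tokens."""
    return [w for w, _ in Counter(tokenize(text)).most_common(count)]
```

Gensim's summarizer ranks sentences with a TextRank variant (graph centrality) rather than raw frequency; the part that carries over is the loop that fills a fixed word budget and then restores the original sentence order.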
How we built it
We used Flask to connect our Python backend and processing with our HTML5/CSS/JS frontend. The backend uses Gensim and NLTK to provide summarization through multiple methods.
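A minimal sketch of how a Flask backend can expose summarization to a JavaScript frontend, assuming the frontend POSTs JSON. The `/summarize` route, the `text`/`word_count` field names, the default of 50 words and the `summarize_text` helper are all hypothetical; the helper is a placeholder standing in for the real Gensim/NLTK processing so the example stays self-contained.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def summarize_text(text: str, word_count: int) -> str:
    """Hypothetical placeholder for the NLP backend: just keeps the first
    `word_count` words so the example runs without extra libraries."""
    return " ".join(text.split()[:word_count])


@app.route("/summarize", methods=["POST"])
def summarize_endpoint():
    # Assumed request body: {"text": "...", "word_count": 50}
    data = request.get_json(silent=True) or {}
    text = data.get("text", "")
    if not text.strip():
        return jsonify(error="no text provided"), 400
    word_count = int(data.get("word_count", 50))  # assumed default summary length
    return jsonify(summary=summarize_text(text, word_count))
```

Saved as `app.py`, this can be started with `flask --app app run` (Flask 2.2 or later); the frontend would then call it with `fetch("/summarize", {method: "POST", ...})` and render the `summary` field.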
Challenges
Most of us are first-time hackers! It was also our first experience with Flask, PDF processing, speech recognition software, and with full-stack development in general.
Backlog/what's next:
- Handle PDF formatting better: cleanly formatted PDFs are handled well, but more complex formatting can cause problems. Also, with scientific papers the program sometimes pulls sentences from the references section, so in the future we would like to trim that section out.
- One idea we started with but did not implement was to link each sentence in the summary back to the original text, so that users can quickly learn more about a topic and easily pinpoint lines or pull quotes.
- For now, we return stemmed keywords. Stemming reduces inflected and derived words to a common base form, but the resulting stems are sometimes not actual words. Eventually we would like to return real words instead.
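For the first backlog item, one simple approach is to cut the text at a line that consists only of a reference-style heading before summarizing. This is a minimal sketch, not something the project built; the list of heading words is an assumption and won't cover every paper layout.

```python
import re

# Assumed heading words that start a reference list; not exhaustive.
_REF_HEADING = re.compile(
    r"^\s*(references|bibliography|works cited)\s*$", re.IGNORECASE | re.MULTILINE
)


def strip_references(text: str) -> str:
    """Drop everything from the last reference-style heading onward.

    Only a line consisting solely of the heading matches, so in-sentence
    mentions ("see the references") are ignored. The last match is used in
    case a table of contents also lists the heading on its own line."""
    matches = list(_REF_HEADING.finditer(text))
    if not matches:
        return text
    return text[: matches[-1].start()].rstrip()
```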
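For the second backlog item, linking summary sentences back to the source is straightforward for an extractive summary, because each summary sentence is copied from the input. A minimal sketch, assuming sentences appear verbatim in the source (it returns `None` when they don't, for example if PDF extraction changed the whitespace); `locate_sentences` is a hypothetical helper name.

```python
from typing import Optional


def locate_sentences(source: str, summary: list[str]) -> list[tuple[str, Optional[int]]]:
    """Map each summary sentence to the 1-based line where it starts in
    `source`, or None if it cannot be found verbatim."""
    located = []
    for sentence in summary:
        offset = source.find(sentence)
        if offset == -1:
            located.append((sentence, None))
        else:
            # Number of newlines before the match gives the line index.
            located.append((sentence, source.count("\n", 0, offset) + 1))
    return located
```

The character offset from `find` could equally be returned so the frontend can scroll to and highlight the exact quote.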
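For the third backlog item, a common heuristic is to map each stem back to the most frequent word in the input that produces it. For example, NLTK's `PorterStemmer().stem("studies")` gives `"studi"`, which this would turn back into `"studies"`. A minimal sketch, assuming the same stemming function that produced the keywords is passed in; `unstem_keywords` is a hypothetical name, and the function is stemmer-agnostic so it can be used with NLTK's stemmer or any other.

```python
import re
from collections import Counter, defaultdict
from typing import Callable


def unstem_keywords(text: str, stems: list[str], stem: Callable[[str], str]) -> list[str]:
    """Replace each stem with the most frequent word in `text` that reduces to it.

    `stem` must be the same function that produced `stems`
    (for example NLTK's `PorterStemmer().stem`)."""
    forms: dict[str, Counter] = defaultdict(Counter)
    for word in re.findall(r"[a-z]+", text.lower()):
        forms[stem(word)][word] += 1
    # Fall back to the stem itself if no surface form was seen.
    return [forms[s].most_common(1)[0][0] if forms[s] else s for s in stems]
```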