Inspiration

I wanted to experiment with taking notes using speech-to-text software. Sometimes it's easier to understand something when you can hear the person's tone as they say it.

What it does

Allows users to record and upload audio clips, which are then transcribed, saved, and made text-searchable. Users can also play back the audio clips they find.

How I built it

I wrote several Cloud Functions to handle file format conversion, speech-to-text transcription, Google Cloud Storage uploads, and synchronization between the Algolia index, the Firestore database, and Google Cloud Storage.

The audio files are saved to Google Cloud Storage, and the transcription data plus the audio file's location within Google Cloud Storage are saved in Firestore. Algolia contains the username, the text transcription, and the Firestore document ID that links it back to the Firestore document. This way, whenever we update or delete an audio clip in Firestore, the corresponding data can also be updated or removed in both Google Cloud Storage and Algolia.
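The record shapes and the cleanup fan-out described above can be sketched as plain helpers. This is a minimal sketch, not the actual schema: the field names (`username`, `transcription`, `storagePath`) are illustrative assumptions.

```javascript
// Sketch of the data model described above. Field names such as
// "storagePath" are illustrative assumptions, not the real schema.

// Build the Algolia record from a Firestore audio-clip document.
// The Firestore document ID becomes the Algolia objectID, which is
// the link that keeps the two stores in sync.
function algoliaRecordFor(firestoreId, doc) {
  return {
    objectID: firestoreId,        // links back to the Firestore document
    username: doc.username,
    transcription: doc.transcription,
  };
}

// Given a deleted Firestore document, list the resources that must
// also be removed: the audio file in Cloud Storage and the Algolia record.
function cleanupTargetsFor(firestoreId, doc) {
  return {
    storagePath: doc.storagePath, // e.g. "audio/<firestoreId>.wav"
    algoliaObjectID: firestoreId,
  };
}
```

In a Firestore `onDelete` trigger, these targets would be handed to the Cloud Storage and Algolia clients to delete the audio file and the search record together.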

Algolia was used to make the database fully text-searchable.

Google authentication ensures that the app is secure and that the APIs can only be accessed by authorized users.
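A minimal sketch of how that authorization gate might look on an HTTPS endpoint: pull the Bearer token out of the request, then verify it with the Firebase Admin SDK. The helper name and the 401 handling are assumptions, not the project's actual code.

```javascript
// Extract the Firebase ID token from an incoming request's
// Authorization header ("Bearer <token>"). Returns null if absent.
function bearerTokenFrom(headers) {
  const auth = headers.authorization || '';
  const match = auth.match(/^Bearer (.+)$/);
  return match ? match[1] : null;
}

// In the real Cloud Function, the extracted token would be checked with
// the Firebase Admin SDK before touching Firestore or Storage, e.g.:
//
//   const token = bearerTokenFrom(req.headers);
//   if (!token) return res.status(401).send('Unauthorized');
//   const user = await admin.auth().verifyIdToken(token); // throws if invalid
```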

Challenges I ran into

I spent a long time trying to convert audio files to .wav in the browser before deciding to handle the conversion in a Cloud Function instead.

What's next for sound-bite

We would like to add additional natural language processing features so that we can sort sound bites by sentiment and show the positivity/negativity associated with certain keywords.

We would also like to integrate a Maps API so that users can associate sound bites with geographic landmarks, making those sound bites visible only when you are near that location.
