StoryTeller
🌟 Inspiration
The inspiration for StoryTeller came from the idea of making stories more immersive and visually engaging. Audiobooks are popular, but adding visuals that follow the narrative in real time adds a layer of interaction that makes stories feel alive. This project aims to enhance storytelling by generating an image for each scene as the story is narrated, creating something close to an animated experience.
🚀 What We Learned
Throughout this project, we delved into several new areas and strengthened existing skills:
- Text-to-Speech (TTS) Integration: We learned to implement TTS functionality, allowing the system to read stories aloud.
- AI-Powered Image Generation: We explored image-generation tools, focusing on generating unique images that align with story content.
- Frontend & CSS Layouts: We practiced advanced CSS layouts and responsive design, creating a user-friendly interface with a header, footer, main content, and sidebar.
- Event-Driven State Management: Managing play/pause functionality within a single button and linking TTS with image display required careful event handling and state management.
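As an illustration of the single-button play/pause handling described above, here is a minimal sketch of a reducer-style state machine. The names (`PlaybackState`, `PlaybackEvent`, `playbackReducer`) and the exact set of states are assumptions made for this example and are not taken from the project's code.

```typescript
// Hypothetical playback states for a single play/pause button.
type PlaybackStatus = "idle" | "playing" | "paused" | "finished";

interface PlaybackState {
  status: PlaybackStatus;
  sentenceIndex: number; // index of the sentence currently being read
}

// Events the UI and the TTS player could emit (assumed for this sketch).
type PlaybackEvent =
  | { type: "TOGGLE" } // the play/pause button was clicked
  | { type: "SENTENCE_ENDED"; totalSentences: number }; // TTS finished a sentence

const initialState: PlaybackState = { status: "idle", sentenceIndex: 0 };

function playbackReducer(state: PlaybackState, event: PlaybackEvent): PlaybackState {
  switch (event.type) {
    case "TOGGLE":
      // One button, two meanings: pause while playing, otherwise (re)start.
      if (state.status === "playing") return { ...state, status: "paused" };
      if (state.status === "finished") return { status: "playing", sentenceIndex: 0 };
      return { ...state, status: "playing" };
    case "SENTENCE_ENDED": {
      // Ignore stray "ended" events that arrive while paused or idle.
      if (state.status !== "playing") return state;
      const next = state.sentenceIndex + 1;
      return next >= event.totalSentences
        ? { status: "finished", sentenceIndex: state.sentenceIndex }
        : { status: "playing", sentenceIndex: next };
    }
    default:
      return state;
  }
}
```

A function with this shape could be passed to React's `useReducer`, so that both the button label and the displayed image derive from the same state.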
🛠️ How We Built the Project
Frontend Setup:
- Used React for modular components and dynamic UI updates.
- Created a layout with a header, footer, text panel, and image display panel.
- CSS styling was essential for alignment, spacing, and responsive design.
TTS and Image Generation Logic:
- Integrated AWS Polly as the TTS engine to read each sentence or paragraph of the story aloud, controlled via a play/pause button.
- Used OpenAI's image generation API to create visual scenes based on the narration. Each time a new sentence is read, the API generates a related image, which is then displayed.
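The sentence-by-sentence flow above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `speak`, `generateImage`, and `showImage` are hypothetical callbacks standing in for the AWS Polly playback, the OpenAI image request, and the React state update, and the sentence splitter is a deliberately naive one.

```typescript
// Naive sentence splitter: runs of text ending in ., ! or ? (assumed good enough for a demo).
function splitIntoSentences(story: string): string[] {
  const parts = story.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Hypothetical dependencies, injected so the loop stays independent of any SDK.
interface NarrationDeps {
  speak: (sentence: string) => Promise<void>; // resolves when the audio has finished
  generateImage: (prompt: string) => Promise<string>; // resolves to an image URL
  showImage: (url: string) => void; // e.g. a React state setter
}

async function narrateStory(story: string, deps: NarrationDeps): Promise<void> {
  for (const sentence of splitIntoSentences(story)) {
    // Start the image request first so it runs while the sentence is being read.
    const image = deps
      .generateImage(sentence)
      .then(deps.showImage)
      .catch(() => {
        // If generation fails, keep the previous image and continue narrating.
      });
    // Move on only when both the audio and the image step are done.
    await Promise.all([deps.speak(sentence), image]);
  }
}
```

Injecting the three callbacks keeps the loop easy to test with fakes and leaves the choice of TTS and image provider outside the narration logic.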
Backend:
- Utilized AWS Polly for the text-to-speech voice.
🔥 Challenges Faced
- Synchronizing TTS with Image Generation: It was challenging to synchronize TTS playback with real-time image generation. We needed to manage timing delays so that the generated images would display smoothly with each sentence read.
- Database: We could not get the database working, so we dropped it from this project.
- Performance Optimization: Image generation can be time-intensive, so optimizing API calls and managing transitions between story sentences was crucial for a smooth experience.
- UI Layout Adjustments: Ensuring that all components displayed correctly on different screen sizes and resolutions required extensive CSS tweaks and media queries.
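One common way to address the synchronization and performance challenges above is to request the image for the next sentence while the current one is still being read, and to cache results so that replaying a sentence does not trigger a second API call. The sketch below shows that idea; it is not necessarily what the project implemented, and `ImagePrefetcher` and `ImageGenerator` are hypothetical names.

```typescript
// Hypothetical signature for whatever function calls the image API.
type ImageGenerator = (prompt: string) => Promise<string>;

class ImagePrefetcher {
  // Cache the promise itself so concurrent requests for a sentence share one API call.
  private readonly cache = new Map<string, Promise<string>>();
  private readonly generate: ImageGenerator;

  constructor(generate: ImageGenerator) {
    this.generate = generate;
  }

  // Return the cached (possibly still pending) image for a sentence, or start a request.
  get(sentence: string): Promise<string> {
    let pending = this.cache.get(sentence);
    if (pending === undefined) {
      pending = this.generate(sentence);
      this.cache.set(sentence, pending);
      // Drop failed requests from the cache so a later call can retry.
      pending.catch(() => {
        this.cache.delete(sentence);
      });
    }
    return pending;
  }

  // Warm the cache for the sentences that follow the one currently being read.
  prefetch(sentences: string[], currentIndex: number, lookahead = 1): void {
    const upcoming = sentences.slice(currentIndex + 1, currentIndex + 1 + lookahead);
    for (const sentence of upcoming) {
      void this.get(sentence);
    }
  }
}
```

Calling `prefetch(sentences, i)` when sentence `i` starts playing gives the image request roughly one sentence of narration time to complete before its image is needed.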