Hi! I am at Table 310
Inspiration
As builders and founders ourselves, we constantly saw VC analysts juggling 10+ tabs just to understand what a startup does. Signals were scattered across Crunchbase, GitHub, LinkedIn, Product Hunt, and company websites — but there was no unified way to surface a clear, investor-relevant profile. SignalStackVC was born from that pain. We wanted a tool that felt like having your own research analyst — but faster, smarter, and automatic.
What it does
SignalStackVC aggregates fragmented data across the web and builds an intelligent, structured profile of any startup — in seconds. You input a company name, and SignalStackVC pulls in high-signal data (funding, team, traction, tech, etc.) from trusted sources like Crunchbase, LinkedIn, GitHub, and press coverage. It categorizes content using semantic embeddings, filters noise using confidence scoring, and generates a clean, JSON-based company profile with clickable sources.
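As a rough illustration of what a JSON profile with sources and confidence filtering could look like, here is a minimal sketch. The class names, field names, and the 0.5 cutoff are assumptions made for this example, not the actual SignalStackVC schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Fact:
    """One extracted statement plus where it came from (hypothetical shape)."""
    category: str      # e.g. "funding", "team", "tech"
    text: str
    confidence: float  # 0.0 to 1.0
    source_url: str    # the clickable source shown next to the fact


@dataclass
class CompanyProfile:
    name: str
    facts: list = field(default_factory=list)

    def to_json(self, min_confidence=0.5):
        # Drop low-confidence facts before export (assumed threshold).
        kept = [asdict(f) for f in self.facts if f.confidence >= min_confidence]
        return json.dumps({"name": self.name, "facts": kept}, indent=2)


# Placeholder data, not real results.
profile = CompanyProfile("ExampleCo")
profile.facts.append(
    Fact("funding", "Raised a seed round", 0.9, "https://example.com/news")
)
profile.facts.append(
    Fact("team", "Unverified claim", 0.2, "https://example.com/forum")
)
output = json.loads(profile.to_json())
```

Keeping the source URL on every fact means each line of the exported profile can be traced back to the page it came from.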
How we built it
- 🧠 Smart Query Engine: Combines rule-based and LLM-generated queries tailored to each startup.
- 🔍 Search Layer: Uses SerpAPI to fetch high-relevance Google search results per query.
- 🧭 Categorization Engine: Uses OpenAI embeddings + cosine similarity to classify each result into funding, tech, team, product, etc.
- 🔗 Async Web Scraper: Enriches high-confidence links with live page content using aiohttp.
- 🤖 Profile Generator: Uses GPT-4 to generate a structured profile from categorized data, with strict constraints to avoid hallucination.
- 🗂 Backend Infra: Modular, CLI-friendly Python backend with rate limiting, deduplication, and JSON export.
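The categorization step above can be sketched roughly as follows. This is a simplified illustration, not the project's code: it uses toy 3-dimensional vectors in place of real OpenAI embeddings, and the function names and the 0.5 threshold are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def categorize(result_vec, category_vecs, min_confidence=0.5):
    """Return (category, score) for the closest category.

    If the best score is below min_confidence, the category is None so the
    caller can treat the result as noise.
    """
    best_name, best_score = None, -1.0
    for name, vec in category_vecs.items():
        score = cosine_similarity(result_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_confidence:
        return None, best_score
    return best_name, best_score


# Toy vectors standing in for one reference embedding per category.
CATEGORY_VECS = {
    "funding": [1.0, 0.0, 0.0],
    "team": [0.0, 1.0, 0.0],
    "tech": [0.0, 0.0, 1.0],
}

label, score = categorize([0.9, 0.1, 0.0], CATEGORY_VECS)
```

In a real pipeline the vectors would come from an embedding model, and the threshold would need tuning against labeled search results.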
Challenges we ran into
- Entity disambiguation: Sometimes we fetched unrelated results for similarly named tokens or products (e.g. VLY crypto vs vly.ai).
- Result quality: Google results can be noisy; we had to build custom filtering, scoring, and fallback logic to keep profiles clean.
- OpenAI hallucinations: Without strict grounding, the model sometimes mixed up team members or made up features — so we tightened prompts and added source matching logic.
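One simple form of source matching is to check that every name in a generated profile actually appears in the scraped source text. The sketch below assumes plain case-insensitive substring matching; the function name and sample data are hypothetical.

```python
def unsupported_claims(claimed_names, source_texts):
    """Return the claimed names that appear in none of the source texts.

    Matching is a case-insensitive substring check, which is deliberately
    strict: a name the model produced but no source mentions gets flagged.
    """
    lowered = [text.lower() for text in source_texts]
    return [
        name
        for name in claimed_names
        if not any(name.lower() in text for text in lowered)
    ]


# Placeholder sources and names, not real data.
sources = [
    "Jane Doe is the co-founder and CEO of ExampleCo.",
    "ExampleCo raised a seed round, according to its blog.",
]
flagged = unsupported_claims(["Jane Doe", "John Smith"], sources)
```

Flagged names can then be dropped from the profile or sent back through a stricter prompt.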
Accomplishments that we're proud of
- Built a fully functioning backend pipeline in under 48 hours.
- Extracted clean profiles for real YC startups like vly.ai, including team info and product summaries with cited sources.
- Designed a confidence-weighted categorization system that balances rules + embeddings.
- Modularized the codebase so it’s production-ready and extendable.
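To make the idea of balancing rules and embeddings concrete, here is one possible way to blend the two signals into a single confidence value. The keyword lists, the 0.4 rule weight, and the function names are all assumptions for illustration.

```python
# Hypothetical keyword rules per category.
RULE_KEYWORDS = {
    "funding": ("raised", "seed round", "series a"),
    "team": ("founder", "ceo", "cto"),
}


def rule_score(text, category):
    """1.0 if any keyword for the category appears in the text, else 0.0."""
    lowered = text.lower()
    keywords = RULE_KEYWORDS.get(category, ())
    return 1.0 if any(k in lowered for k in keywords) else 0.0


def blended_confidence(text, category, embedding_score, rule_weight=0.4):
    """Weighted average of the rule score and the embedding similarity.

    The embedding score is clamped to [0, 1] so the result stays in [0, 1].
    """
    emb = max(0.0, min(1.0, embedding_score))
    return rule_weight * rule_score(text, category) + (1.0 - rule_weight) * emb


# Rule hit (1.0) weighted 0.4 plus embedding similarity 0.7 weighted 0.6.
conf = blended_confidence("ExampleCo raised a seed round", "funding", 0.7)
```

A higher rule weight favors precise keyword hits, while a lower one lets the embeddings catch phrasing the rules miss.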
What we learned
- Combining LLMs with traditional rule-based systems leads to surprisingly robust data pipelines.
- Query quality is everything — the better your search prompts, the better the downstream data.
- Entity resolution is hard — and crucial. Filtering and exact-match logic saved us from tons of noise.
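As an example of exact-match filtering, the sketch below keeps a search result only if its URL is on the company's domain or its text mentions that domain as a whole token. The result dictionary keys (`url`, `title`, `snippet`) and the function name are assumptions, not the project's actual data model.

```python
import re
from urllib.parse import urlparse


def matches_entity(result, company_domain):
    """Return True if a search result clearly refers to the given company.

    A result passes when its host is the company domain (or a subdomain),
    or when its title/snippet contains the domain as a standalone token.
    """
    domain = company_domain.lower()
    host = (urlparse(result["url"]).hostname or "").lower()
    if host == domain or host.endswith("." + domain):
        return True
    text = f'{result.get("title", "")} {result.get("snippet", "")}'.lower()
    # Lookarounds stop "notvly.ai" or "vly.ai-clone" from counting as a hit.
    pattern = r"(?<![\w.-])" + re.escape(domain) + r"(?![\w-])"
    return re.search(pattern, text) is not None


on_domain = {"url": "https://vly.ai/about", "title": "About"}
token_noise = {
    "url": "https://example.com/prices",
    "title": "VLY token price today",
    "snippet": "Track VLY crypto",
}
press_mention = {"url": "https://news.example.com/x", "title": "vly.ai launches"}

kept = [
    r for r in (on_domain, token_noise, press_mention)
    if matches_entity(r, "vly.ai")
]
```

Matching on the domain rather than the bare name is what separates the company from the similarly named token in this example.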
What's next for SignalStackVC
- 🎯 Frontend: Launch a UI where VCs can search for and watch startups, not just generate profiles from the CLI.
- 📊 Watchlists & Alerts: Let users track startup momentum over time (team changes, funding updates).
- 💬 Founder Signals: Extract founder posts from Substack, Medium, and LinkedIn.
- 🧠 Semantic Search: Let users search for “early-stage devtools with high traction” and get ranked results.
- 🔐 User Accounts & History: Let analysts save startup profiles, comment on them, and collaborate.
Built With
- fastapi
- react