SignalStackVC - Table 310

Hi! I am at Table 310

Inspiration

As builders and founders ourselves, we constantly saw VC analysts juggling 10+ tabs just to understand what a startup does. Signals were scattered across Crunchbase, GitHub, LinkedIn, Product Hunt, and company websites — but there was no unified way to surface a clear, investor-relevant profile. SignalStackVC was born from that pain. We wanted a tool that felt like having your own research analyst — but faster, smarter, and automatic.

What it does

SignalStackVC aggregates fragmented data across the web and builds an intelligent, structured profile of any startup — in seconds. You input a company name, and SignalStackVC pulls in high-signal data (funding, team, traction, tech, etc.) from trusted sources like Crunchbase, LinkedIn, GitHub, and press coverage. It categorizes content using semantic embeddings, filters noise using confidence scoring, and generates a clean, JSON-based company profile with clickable sources.
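To make the output concrete, here is a rough sketch of what a JSON profile with per-claim sources could look like. The `Claim` and `CompanyProfile` names, the field layout, and the example values are assumptions for illustration; the write-up does not specify the actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Claim:
    """One statement in the profile, tied to the page it came from."""
    text: str
    source_url: str


@dataclass
class CompanyProfile:
    """Hypothetical profile shape: one list of sourced claims per category."""
    name: str
    funding: List[Claim] = field(default_factory=list)
    team: List[Claim] = field(default_factory=list)
    tech: List[Claim] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested Claim objects.
        return json.dumps(asdict(self), indent=2)


# Placeholder data, not a real company.
profile = CompanyProfile(name="ExampleCo")
profile.team.append(Claim("Founded by two engineers.", "https://example.com/team"))
profile_json = profile.to_json()
```

Keeping the source URL next to every claim is what would let a UI render each statement as a clickable citation.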

How we built it

🧠 Smart Query Engine: Combines rule-based and LLM-generated queries tailored to each startup.
🔍 Search Layer: Uses SerpAPI to fetch high-relevance Google search results per query.
🧭 Categorization Engine: Uses OpenAI embeddings + cosine similarity to classify each result into funding, tech, team, product, etc.
🔗 Async Web Scraper: Enriches high-confidence links with live page content using aiohttp.
🤖 Profile Generator: Uses GPT-4 to generate a structured profile from categorized data, with strict constraints to avoid hallucination.
🗂 Backend Infra: Modular, CLI-friendly Python backend with rate limiting, deduplication, and JSON export.
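As a rough illustration of the categorization step (not the project's actual code), the sketch below assigns a result to the category whose reference vector is closest by cosine similarity. The three-dimensional vectors, the `categorize` helper, and the `min_score` threshold are assumptions; in the real pipeline the vectors would come from an embeddings model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def categorize(result_vec, category_vecs, min_score=0.5):
    """Return (label, score) for the best category, or 'uncategorized'."""
    best_label, best_score = None, -1.0
    for label, vec in category_vecs.items():
        score = cosine_similarity(result_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return "uncategorized", best_score
    return best_label, best_score


# Toy stand-ins for real embedding vectors.
CATEGORY_VECS = {
    "funding": [1.0, 0.0, 0.0],
    "tech": [0.0, 1.0, 0.0],
    "team": [0.0, 0.0, 1.0],
}

label, score = categorize([0.9, 0.1, 0.0], CATEGORY_VECS)
```

The threshold gives low-similarity results an explicit "uncategorized" bucket instead of forcing them into the nearest category.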

Challenges we ran into

Entity disambiguation: Sometimes we fetched unrelated results for similarly named tokens or products (e.g. VLY crypto vs vly.ai).
Result quality: Google results can be noisy; we had to build custom filtering, scoring, and fallback logic to keep profiles clean.
OpenAI hallucinations: Without strict grounding, the model sometimes mixed up team members or made up features — so we tightened prompts and added source matching logic.
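One simple form that source matching could take is checking that every name in the generated profile literally appears in the scraped text. The sketch below is an assumption about how such a check might look, with a hypothetical `unsupported_names` helper and made-up names; it is not the project's implementation.

```python
def normalize(text):
    """Casefold and collapse whitespace so matching ignores formatting."""
    return " ".join(text.split()).casefold()


def unsupported_names(profile_names, source_texts):
    """Return the names from the profile that appear in none of the sources."""
    haystack = normalize(" ".join(source_texts))
    return [name for name in profile_names if normalize(name) not in haystack]


# Placeholder data for illustration only.
sources = ["ExampleCo was founded by Ada Example in 2024."]
flagged = unsupported_names(["Ada Example", "Bob Placeholder"], sources)
```

Any name that comes back in `flagged` can be dropped from the profile or sent back to the model for another pass.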

Accomplishments that we're proud of

Built a fully functioning backend pipeline in under 48 hours.
Extracted clean profiles for real YC startups like vly.ai, including team info and product summaries with cited sources.
Designed a confidence-weighted categorization system that balances rules + embeddings.
Modularized the codebase so it’s production-ready and extendable.
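A confidence score that balances rules and embeddings could be as simple as a weighted sum. The keyword list, the `rule_weight` of 0.4, and the function names below are assumptions chosen for illustration, not the values the project used.

```python
FUNDING_KEYWORDS = ("raised", "seed round", "series a")  # hypothetical rule set


def rule_hit(text, keywords=FUNDING_KEYWORDS):
    """True if any rule keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def blended_confidence(text, embedding_score, rule_weight=0.4):
    """Weighted sum of a binary rule signal and an embedding similarity."""
    clamped = max(0.0, min(1.0, embedding_score))  # keep the result in [0, 1]
    rule_score = 1.0 if rule_hit(text) else 0.0
    return rule_weight * rule_score + (1.0 - rule_weight) * clamped


with_rule = blended_confidence("ExampleCo raised a seed round", 0.5)
without_rule = blended_confidence("ExampleCo is hiring engineers", 0.5)
```

With these weights, a keyword match lifts a borderline embedding score, while a strong embedding score can still carry a result that matches no rule.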

What we learned

Combining LLMs with traditional rule-based systems leads to surprisingly robust data pipelines.
Query quality is everything — the better your search prompts, the better the downstream data.
Entity resolution is hard — and crucial. Filtering and exact-match logic saved us from tons of noise.
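The exact-match idea can be sketched as a filter that keeps a search result only if it lives on the company's domain or mentions the full company name as a standalone token. The `matches_entity` helper, the result dictionary keys, and the sample results are assumptions for illustration.

```python
import re
from urllib.parse import urlparse


def matches_entity(result, name, domain):
    """Keep a result hosted on the company domain or naming it exactly."""
    host = urlparse(result["url"]).netloc.lower()
    if host == domain or host.endswith("." + domain):
        return True
    text = f'{result["title"]} {result["snippet"]}'
    # Require the full name, not embedded in a longer word or hostname.
    pattern = r"(?<![\w.])" + re.escape(name) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up search results, not real pages.
results = [
    {"url": "https://vly.ai/about", "title": "About", "snippet": ""},
    {"url": "https://example.com/a", "title": "VLY token price", "snippet": "VLY crypto"},
    {"url": "https://example.org/b", "title": "vly.ai launches", "snippet": ""},
]
kept = [r["url"] for r in results if matches_entity(r, "vly.ai", "vly.ai")]
```

Matching on the full name plus domain is deliberately strict: it drops some valid results, but it keeps lookalike tokens out of the profile.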

What's next for SignalStackVC

🎯 Frontend: Launch a UI where VCs can search and watch startups, not just generate profiles from the CLI.
📊 Watchlists & Alerts: Let users track startup momentum over time (team changes, funding updates).
💬 Founder Signals: Extract founder posts from Substack, Medium, and LinkedIn.
🧠 Semantic Search: Let users search for “early-stage devtools with high traction” and get ranked results.
🔐 Accounts & History: Let analysts save, comment on, and collaborate on startup profiles.

Built With

fastapi
react

Updates

Tanvi Guttula started this project — Jun 22, 2025 01:46 PM EDT
