Hi! I am at Table 310

Inspiration

As builders and founders ourselves, we constantly saw VC analysts juggling 10+ tabs just to understand what a startup does. Signals were scattered across Crunchbase, GitHub, LinkedIn, Product Hunt, and company websites — but there was no unified way to surface a clear, investor-relevant profile. SignalStackVC was born from that pain. We wanted a tool that felt like having your own research analyst — but faster, smarter, and automatic.

What it does

SignalStackVC aggregates fragmented data across the web and builds an intelligent, structured profile of any startup — in seconds. You input a company name, and SignalStackVC pulls in high-signal data (funding, team, traction, tech, etc.) from trusted sources like Crunchbase, LinkedIn, GitHub, and press coverage. It categorizes content using semantic embeddings, filters noise using confidence scoring, and generates a clean, JSON-based company profile with clickable sources.
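To make the "JSON-based company profile with clickable sources" concrete, here is a minimal sketch of what such a profile could look like. The class names, field names, and the 0-to-1 confidence scale are all hypothetical, chosen for illustration; the real SignalStackVC schema may differ.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class Fact:
    """One claim in the profile, tied to the page it came from (hypothetical shape)."""
    text: str
    source_url: str
    confidence: float  # assumed scale: 0.0 (weak) to 1.0 (strong)


@dataclass
class CompanyProfile:
    name: str
    # Category name (funding, team, tech, ...) -> sourced facts in that category.
    sections: Dict[str, List[Fact]] = field(default_factory=dict)

    def add(self, category: str, fact: Fact) -> None:
        self.sections.setdefault(category, []).append(fact)

    def to_json(self) -> str:
        # asdict() recurses through the nested dict, lists, and Fact dataclasses.
        return json.dumps(asdict(self), indent=2)


profile = CompanyProfile(name="ExampleCo")
profile.add("funding", Fact("Raised a seed round", "https://example.com/press", 0.9))
print(profile.to_json())
```

Keeping a source URL on every fact, rather than one list of links per profile, is what would let a reader click through from any individual claim.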

How we built it

  • 🧠 Smart Query Engine: Combines rule-based and LLM-generated queries tailored to each startup.
  • 🔍 Search Layer: Uses SerpAPI to fetch high-relevance Google search results per query.
  • 🧭 Categorization Engine: Uses OpenAI embeddings + cosine similarity to classify each result into funding, tech, team, product, etc.
  • 🔗 Async Web Scraper: Enriches high-confidence links with live page content using aiohttp.
  • 🤖 Profile Generator: Uses GPT-4 to generate a structured profile from categorized data, with strict constraints to avoid hallucination.
  • 🗂 Backend Infra: Modular, CLI-friendly Python backend with rate limiting, deduplication, and JSON export.
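The categorization step above can be sketched roughly as follows. This is an illustration under stated assumptions, not our exact code: the three-dimensional vectors are toy stand-ins for real OpenAI embeddings, and `CATEGORY_VECTORS`, `categorize`, and the `min_score` threshold are hypothetical names and values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(embedding, category_vectors, min_score=0.5):
    """Return (label, score) for the closest category, or 'uncategorized' below min_score."""
    best_label, best_score = None, -1.0
    for label, vector in category_vectors.items():
        score = cosine_similarity(embedding, vector)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return "uncategorized", best_score
    return best_label, best_score


# Toy reference vectors; in practice each would be the embedding of a category description.
CATEGORY_VECTORS = {
    "funding": [1.0, 0.0, 0.0],
    "team": [0.0, 1.0, 0.0],
    "tech": [0.0, 0.0, 1.0],
}

label, score = categorize([0.9, 0.1, 0.0], CATEGORY_VECTORS)
print(label, round(score, 3))
```

The score doubles as a confidence value, so low-scoring results can be dropped before the scraper spends time on them.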

Challenges we ran into

  • Entity disambiguation: Sometimes we fetched unrelated results for similarly named tokens or products (e.g., the VLY crypto token vs. vly.ai).
  • Result quality: Google results can be noisy; we had to build custom filtering, scoring, and fallback logic to keep profiles clean.
  • OpenAI hallucinations: Without strict grounding, the model sometimes mixed up team members or made up features — so we tightened prompts and added source matching logic.
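One simple form of source matching is to keep a generated name only if it appears verbatim in the scraped text. The sketch below shows that idea; `grounded_names` is a hypothetical helper, and plain substring matching is an assumption that real matching logic would likely need to relax (nicknames, initials, and so on).

```python
import re


def normalize(text):
    """Lowercase and collapse runs of whitespace so matching ignores layout differences."""
    return re.sub(r"\s+", " ", text).strip().lower()


def grounded_names(candidate_names, source_texts):
    """Split names into (kept, dropped) by whether they occur in at least one source text."""
    corpus = [normalize(text) for text in source_texts]
    kept, dropped = [], []
    for name in candidate_names:
        needle = normalize(name)
        if needle and any(needle in document for document in corpus):
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


sources = [
    "Jane Doe is the CEO of ExampleCo.",
    "ExampleCo   was founded by John  Roe",
]
kept, dropped = grounded_names(["Jane Doe", "John Roe", "Alex Poe"], sources)
print(kept, dropped)
```

Anything in `dropped` was produced by the model without support in the sources, so it can be removed from the profile or flagged for review.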

Accomplishments that we're proud of

  • Built a fully functioning backend pipeline in under 48 hours.
  • Extracted clean profiles for real YC startups like vly.ai, including team info and product summaries with cited sources.
  • Designed a confidence-weighted categorization system that balances rules + embeddings.
  • Modularized the codebase so it’s production-ready and extensible.
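As a rough illustration of how a confidence-weighted score can balance rules and embeddings, the sketch below blends a keyword-hit ratio with an embedding similarity. The keyword lists, the `rule_weight` default of 0.4, and the function names are assumptions made for the example, not the values we shipped.

```python
# Hypothetical keyword rules per category; matching is a plain substring check.
RULE_KEYWORDS = {
    "funding": ("raised", "seed", "series a"),
    "team": ("founder", "ceo", "cto"),
}


def rule_score(text, category):
    """Fraction of the category's keywords that appear in the text (0.0 to 1.0)."""
    keywords = RULE_KEYWORDS.get(category, ())
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for keyword in keywords if keyword in lowered)
    return hits / len(keywords)


def combined_confidence(rule, embedding, rule_weight=0.4):
    """Weighted average of a rule score and an embedding similarity, both in [0, 1]."""
    if not 0.0 <= rule_weight <= 1.0:
        raise ValueError("rule_weight must be between 0 and 1")
    embedding = max(0.0, min(1.0, embedding))  # clamp negative or >1 similarities
    return rule_weight * rule + (1.0 - rule_weight) * embedding


snippet = "ExampleCo raised a seed round"
rules = rule_score(snippet, "funding")
print(round(combined_confidence(rules, 0.8), 3))
```

Raising `rule_weight` favors precise but brittle keyword matches; lowering it leans on the embeddings, which generalize better but are noisier.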

What we learned

  • Combining LLMs with traditional rule-based systems leads to surprisingly robust data pipelines.
  • Query quality is everything: the better your search queries, the better the downstream data.
  • Entity resolution is hard — and crucial. Filtering and exact-match logic saved us from tons of noise.
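The exact-match filtering mentioned above can be approximated like this. It is a sketch under assumptions: search results are modeled as dicts with `url`, `title`, and `snippet` keys, and `mentions_entity` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse


def mentions_entity(result, name, domain):
    """True if the result is hosted on the company's domain or names the company exactly."""
    host = urlparse(result["url"]).hostname or ""
    if host == domain or host.endswith("." + domain):
        return True
    text = f'{result.get("title", "")} {result.get("snippet", "")}'
    # Require the full name as a standalone token, so "VLY" alone does not match "vly.ai".
    pattern = r"(?<![\w.])" + re.escape(name) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


results = [
    {"url": "https://vly.ai/about", "title": "About", "snippet": ""},
    {"url": "https://coins.example/vly", "title": "VLY token price", "snippet": "VLY crypto chart"},
    {"url": "https://news.example/post", "title": "vly.ai launches", "snippet": ""},
]
relevant = [r for r in results if mentions_entity(r, "vly.ai", "vly.ai")]
print([r["url"] for r in relevant])
```

Here the crypto token page is filtered out because it neither lives on the company's domain nor mentions the full name.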

What's next for SignalStackVC

  • 🎯 Frontend: Launch a UI where VCs can search and watch startups, not just generate profiles from the CLI.
  • 📊 Watchlists & Alerts: Let users track startup momentum over time (team changes, funding updates).
  • 💬 Founder Signals: Extract founder posts from Substack, Medium, and LinkedIn.
  • 🧠 Semantic Search: Let users search for “early-stage devtools with high traction” and get ranked results.
  • 🔐 User Accounts & History: Let analysts save, comment on, and collaborate on startup profiles.
