A CLI for managing a local personal knowledge base with semantic search. Index your markdown documents and search them using natural language queries. Pairs perfectly with magenta.nvim, a coding assistant plugin for Neovim.
- Standalone: Manage and search your own notes, documentation, and knowledge base via the CLI.
- Agent Skill: Expose your knowledge base to AI agents like Claude Code by creating a skill.
- Single DB file: Commit the DB to git to turn your repo's markdown docs into a searchable knowledge base for your whole team. Run sync (for example in CI/CD) to keep everything up to date.
Currently only Markdown files (.md) are supported. Search is an exact brute-force nearest-neighbor scan (there is no approximate nearest-neighbor index), so this is best suited for small to medium document collections (not full codebases).
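To illustrate what a brute-force search means in practice, here is a minimal in-memory sketch. It is not the project's implementation (the tool relies on sqlite-vec for storage and search); the `IndexedChunk` and `SearchHit` types, the function names, and the choice of cosine similarity as the metric are all assumptions made for illustration.

```typescript
// Hypothetical in-memory stand-in for the stored chunk embeddings.
interface IndexedChunk {
  id: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  score: number;
}

// Cosine similarity between two vectors of equal length.
// (Assumed metric; the README does not state which one is used.)
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-K search: score every chunk, sort by score, keep the best K.
function bruteForceSearch(
  query: number[],
  chunks: IndexedChunk[],
  topK: number
): SearchHit[] {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      score: cosineSimilarity(query, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Every query touches every stored vector, so the cost grows linearly with the number of chunks. That is why the approach fits small to medium collections better than very large ones.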
- Node.js 18+
- AWS credentials configured (access to Bedrock for the Cohere Embed v4 and Anthropic Claude Haiku 4.5 models)
  - us-east-1 or eu-west-1 region access for Cohere Embed v4
- SQLite3 with sqlite-vec extension
# Install dependencies
npm install
# Track a directory of markdown files
npx tsx scripts/cli.ts ~/pkb.db track ~/docs
# Index all tracked sources
npx tsx scripts/cli.ts ~/pkb.db sync
# Search your knowledge base
npx tsx scripts/cli.ts ~/pkb.db search "how to configure X"

You can expose your PKB to Claude Code (or similar agents) by creating a skill:
- Create a skill directory:
    mkdir -p ~/.claude/skills/my-knowledge
- Clone this repo into the skill directory:
    git clone <repo-url> ~/.claude/skills/my-knowledge/pkb
    cd ~/.claude/skills/my-knowledge/pkb
    npm install
- Create ~/.claude/skills/my-knowledge/skill.md:
    ---
    name: my-knowledge
    description: My personal knowledge base containing notes on <topics>. Search here for <what kind of information>.
    ---

    ## Searching
    \`\`\`bash
    npx tsx ~/.claude/skills/my-knowledge/pkb/scripts/cli.ts ~/.pkb/knowledge.db search "<query>" [topK]
    \`\`\`
    ...
- Track your files and sync:
    npx tsx ~/.claude/skills/my-knowledge/pkb/scripts/cli.ts ~/.pkb/knowledge.db track ~/notes
    npx tsx ~/.claude/skills/my-knowledge/pkb/scripts/cli.ts ~/.pkb/knowledge.db sync
- (Optional) Run sync in watch mode to automatically reindex tracked sources on file changes:
    npx tsx ~/.claude/skills/my-knowledge/pkb/scripts/cli.ts ~/.pkb/knowledge.db sync --watch
Markdown documents are split into chunks for indexing:
- Hard splits on headings (h1-h6) - each section becomes a separate unit
- Soft splits on paragraphs and code blocks when sections exceed ~2000 characters (~500 tokens)
- Sentence-level splits for very long paragraphs
- Character-level splits with overlap as a last resort
Each chunk preserves its heading hierarchy context (e.g., # Guide > ## Configuration > ### AWS).
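The first two splitting levels can be sketched roughly as follows. This is a simplified illustration, not the project's chunker: `chunkMarkdown`, `softSplit`, and the `Chunk` type are hypothetical names, blank-line-separated blocks stand in for "paragraphs and code blocks", and the sentence-level and character-level fallbacks are left out, so a single oversized block is kept whole here.

```typescript
interface Chunk {
  headings: string[]; // heading path, e.g. ["# Guide", "## Configuration"]
  text: string;
}

// Soft split: greedily pack blank-line-separated blocks up to maxChars.
// A single block longer than maxChars is kept as-is in this sketch.
function softSplit(text: string, maxChars: number): string[] {
  if (text.length <= maxChars) return [text];
  const pieces: string[] = [];
  let current = "";
  for (const block of text.split(/\n\s*\n/)) {
    const candidate = current === "" ? block : current + "\n\n" + block;
    if (candidate.length > maxChars && current !== "") {
      pieces.push(current);
      current = block;
    } else {
      current = candidate;
    }
  }
  if (current !== "") pieces.push(current);
  return pieces;
}

// Hard split on headings, tracking the heading hierarchy for each chunk.
function chunkMarkdown(markdown: string, maxChars: number = 2000): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: { level: number; title: string }[] = [];
  let body: string[] = [];
  let inFence = false;

  const flush = (): void => {
    const text = body.join("\n").trim();
    body = [];
    if (text === "") return;
    for (const piece of softSplit(text, maxChars)) {
      chunks.push({ headings: stack.map((h) => h.title), text: piece });
    }
  };

  for (const line of markdown.split("\n")) {
    // Toggle on fence lines so "# comment" inside code is not read as a heading.
    if (/^\s*(`{3,}|~{3,})/.test(line)) inFence = !inFence;
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush(); // a heading always closes the previous section
      const level = match[1].length;
      while (stack.length > 0 && stack[stack.length - 1].level >= level) {
        stack.pop();
      }
      stack.push({ level, title: line.trim() });
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}
```

Keeping the heading path alongside each chunk is what lets a short passage under "### AWS" remain identifiable as part of the "Configuration" section of a guide.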
PKB implements Contextual Retrieval to improve search accuracy. Before embedding each chunk, an LLM generates additional context that situates the chunk within the full document.
For example, a chunk containing:
Set the ACL to private to restrict access.
Gets augmented with context like:
This chunk describes AWS S3 bucket access control configuration. ACL refers to Access Control List.
This context is prepended to the chunk before embedding, helping the vector search understand ambiguous references and acronyms.
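As a rough sketch of that flow, assuming a context generator and an embedder are available behind simple interfaces: the names `ContextGenerator`, `Embedder`, `buildEmbeddingInput`, and `embedWithContext` are hypothetical and not taken from the project's code, and the blank-line separator between context and chunk is also an assumption.

```typescript
// Hypothetical interfaces; the project's actual interfaces may differ.
interface ContextGenerator {
  // Returns a short description situating `chunk` within `document`.
  generateContext(document: string, chunk: string): Promise<string>;
}

interface Embedder {
  embed(text: string): Promise<number[]>;
}

// Prepend the generated context to the chunk text.
// Falls back to the bare chunk when no context was produced.
function buildEmbeddingInput(context: string, chunk: string): string {
  const trimmed = context.trim();
  return trimmed === "" ? chunk : trimmed + "\n\n" + chunk;
}

// Generate context for one chunk, then embed the augmented text.
async function embedWithContext(
  document: string,
  chunk: string,
  contextGenerator: ContextGenerator,
  embedder: Embedder
): Promise<{ text: string; embedding: number[] }> {
  const context = await contextGenerator.generateContext(document, chunk);
  const text = buildEmbeddingInput(context, chunk);
  return { text, embedding: await embedder.embed(text) };
}
```

Only the text sent to the embedding model changes; the original chunk can still be stored and returned unmodified in search results.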
- Embeddings: Cohere Embed v4 via AWS Bedrock
- Context Generation: Claude Haiku 4.5 via AWS Bedrock
Other models are easily implemented by extending the existing interfaces.
npx tsx scripts/cli.ts --help