About the Project

Inspiration

The inspiration for iLLuMinator came from a deep curiosity about how large language models work under the hood, and a desire to go beyond wrapper APIs and truly build intelligence from the ground up. I wanted to demystify transformers and gain hands-on experience building a fully custom model that could not only understand language but also retrieve and reason over real knowledge, much as humans do when answering questions.

Inspired by seminal works like "Attention Is All You Need" and "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", I set out to create a hybrid architecture that could blend generation with intelligent retrieval using my own custom transformer implementation.

What I Learned

Building iLLuMinator taught me a great deal, from theoretical underpinnings to real-world implementation:

  • Transformers: Implemented causal self-attention, masking, learned positional encodings, and residual connections from scratch (see the sketch after this list).

  • Language Modeling: Learned next-token prediction training techniques and how to tune generation quality using temperature and top-k sampling (also sketched after this list).

  • Information Retrieval: Integrated semantic vector search using FAISS and SentenceTransformers for relevance-based document retrieval.

  • Modular Design: Structured a full codebase to support multiple assistant interfaces and configurations.

  • Engineering Challenges: Managed memory, handled tokenizer mismatches, and built a fallback mechanism for stability.

  • Training Dynamics: Explored training stability, regularization, and optimization (e.g., AdamW, gradient clipping, layer norm).
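
To make the attention point concrete, here is a minimal PyTorch sketch of causal self-attention. It is illustrative only: the class name, the fused QKV projection, and the dimension choices are my assumptions for the example, not iLLuMinator's actual code.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttention(nn.Module):
        """Minimal multi-head causal self-attention (illustrative sketch)."""
        def __init__(self, d_model, n_heads, max_len=512):
            super().__init__()
            assert d_model % n_heads == 0
            self.n_heads = n_heads
            self.d_head = d_model // n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
            self.proj = nn.Linear(d_model, d_model)
            # lower-triangular mask keeps each position from attending to the future
            mask = torch.tril(torch.ones(max_len, max_len)).view(1, 1, max_len, max_len)
            self.register_buffer("mask", mask)

        def forward(self, x):
            B, T, C = x.shape
            q, k, v = self.qkv(x).split(C, dim=2)
            # reshape to (B, n_heads, T, d_head) so heads attend independently
            q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
            att = F.softmax(att, dim=-1)
            out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
            return self.proj(out)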
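
Likewise, temperature and top-k sampling can be sketched as below. This is the standard technique; the function name and defaults are hypothetical rather than taken from the project.

    import torch
    import torch.nn.functional as F

    def sample_next_token(logits, temperature=1.0, top_k=50):
        """Sample one token id from a (vocab_size,) logits vector."""
        logits = logits / max(temperature, 1e-8)         # temperature scaling
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[-1]] = float("-inf")       # keep only the k best logits
        probs = F.softmax(logits, dim=-1)
        return torch.multinomial(probs, num_samples=1).item()

Lower temperatures sharpen the distribution toward greedy decoding, while top-k truncation prevents low-probability tokens from ever being sampled.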

How I Built It

The iLLuMinator project consists of multiple components, all custom-coded:

  • Transformer Core: A configurable transformer supporting n-layer depth, d-dimensional embeddings, and h-head self-attention.
  • RAG System: Combines a document embedding pipeline with FAISS-based retrieval and response conditioning (sketched below).
  • Smart Assistant Interface: Interactive CLI-based chatbot using the trained model and dynamic context assembly.
  • Training Pipeline: Built using PyTorch, supporting both basic and RAG-enhanced datasets.
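
The retrieval half of such a RAG system can be sketched as follows, assuming a standard SentenceTransformers encoder paired with a flat FAISS inner-product index. The model name, sample documents, and helper function are illustrative, not the project's exact pipeline.

    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # assumed embedding model; the actual project may use a different one
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    documents = ["Transformers use self-attention.", "FAISS enables fast vector search."]
    doc_vecs = encoder.encode(documents, normalize_embeddings=True)

    # inner product on unit-normalized vectors equals cosine similarity
    index = faiss.IndexFlatIP(doc_vecs.shape[1])
    index.add(np.asarray(doc_vecs, dtype="float32"))

    def retrieve(query, k=2):
        q = encoder.encode([query], normalize_embeddings=True)
        scores, ids = index.search(np.asarray(q, dtype="float32"), k)
        return [(documents[i], float(s)) for i, s in zip(ids[0], scores[0])]

The retrieved passages are then prepended to the prompt so the generator can condition its response on them.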

The training objective is to minimize next-token prediction loss:

L = -∑_{t=1}^{T} log P(x_t | x_{<t})

where x_t is the target token at time step t and x_{<t} denotes all preceding tokens.
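
In PyTorch this objective reduces to token-level cross-entropy over shifted inputs and targets. A minimal sketch, assuming logits of shape (batch, seq_len, vocab) predicted from tokens[:, :-1]:

    import torch
    import torch.nn.functional as F

    def next_token_loss(logits, tokens):
        # logits: (batch, seq_len, vocab) predicted from inputs tokens[:, :-1]
        # tokens: (batch, seq_len + 1) integer token ids
        targets = tokens[:, 1:]                 # shift left by one position
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))

    # one optimization step with the stabilizers mentioned earlier (AdamW, clipping);
    # `model` is a placeholder for the trained transformer:
    # optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    # loss = next_token_loss(model(tokens[:, :-1]), tokens)
    # loss.backward()
    # torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    # optimizer.step(); optimizer.zero_grad()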


Challenges Faced

The project came with its fair share of obstacles:

  • From-Scratch Design: Implementing transformer logic manually (causal masking, multi-head attention, positional encodings) required debugging low-level tensor operations.
  • Parameter Explosion: Managing memory during CPU-only training, especially when testing larger vocabulary and sequence sizes.
  • Tokenizer Issues: Ensuring tokenizers matched model expectations without external dependency mismatches.
  • Model Stability: Training deep models on small machines led to instability, which I mitigated with smaller batch sizes and fewer layers.
  • Retrieval Alignment: Getting FAISS-based retrieval results to align contextually with generation required careful preprocessing and embedding tuning.

Through persistence and iteration, iLLuMinator grew into a modular, flexible, and intelligent system that represents a serious step toward custom, scalable, and understandable AI.

Built With

python, pytorch, faiss, sentence-transformers