It started with a 300-line file.
We were reading through a minimal GPT implementation and realized that the entire architecture of modern AI (the same architecture powering ChatGPT, Claude, and Gemini) fits in a file you can read in an afternoon. Transformers are not magic. They are matrix multiplications, attention weights, and a loss function pointed at a corpus of text.
But almost nobody knows that. Most people interact with AI as a black box: you type, it answers, and the entire learning process is invisible. We wanted to break open that box. What if you could watch a language model learn, step by step, in real time, right in your browser? What if the loss curve, the attention heads, and the token probabilities were all visible and interactive as the model trained?
That's PocketGPT.
What It Does
PocketGPT is an interactive educational platform that lets you train a miniature GPT model from scratch and watch it learn in real time. No setup, no GPU, no machine learning background required.
Watch It Learn: Choose a dataset (Shakespeare, Python code, or Wikipedia), hit play, and watch a character-level transformer train live. The loss curve updates every step. Generated text evolves from random noise to recognizable prose. A next-token probability tower shows the model's confidence distribution at every moment, with an interactive temperature slider so you can see how sampling randomness affects output.
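The temperature slider is easy to sketch: temperature rescales the model's logits before the softmax that produces the probability tower. This is a minimal stand-alone illustration of that math, not PocketGPT's actual code; the function name and sample logits are hypothetical.

```python
import math

def sample_distribution(logits, temperature=1.0):
    """Turn raw logits into a probability distribution.

    Temperature divides the logits before softmax: values below 1.0
    sharpen the distribution (more confident picks), values above 1.0
    flatten it (more random output). Illustrative helper only.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # hypothetical next-token scores
cool = sample_distribution(logits, temperature=0.5)
hot = sample_distribution(logits, temperature=2.0)
# The low-temperature distribution puts more mass on the top token.
```

Dragging the slider toward zero makes generation nearly deterministic; pushing it past 1.0 makes the output noisier, which is exactly what the probability tower shows live.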
Attention Cinema: Visualize the transformer's attention mechanisms as they develop. Live-updating 2D heatmaps and a 3D attention surface show which tokens each head focuses on. Watch diffuse, random attention patterns sharpen into purposeful structure as training progresses. A layer and head selector lets you isolate individual attention heads and observe how they specialize over time.
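The values those heatmaps display are the rows of softmax(QK^T / sqrt(d)), one row-stochastic matrix per head. A pure-Python sketch of that computation, with a causal mask so each token can only attend to earlier positions (names and example vectors are illustrative, not PocketGPT's internals):

```python
import math

def attention_weights(q, k, causal=True):
    """Attention matrix softmax(Q K^T / sqrt(d)), row by row.

    Each row sums to 1; with causal=True, position i attends only to
    positions j <= i. These rows are what a 2D attention heatmap plots.
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if causal:
            # Mask out future positions before the softmax.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy query/key vectors
W = attention_weights(q, k)
# W[0] is [1.0, 0.0, 0.0]: the first token can only attend to itself.
```

Early in training these rows look near-uniform; as heads specialize, mass concentrates on a few columns, which is the "sharpening" visible in the cinema view.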
Style Transfer: Paste in your own writing and watch the model learn your voice. In under two minutes, generated text shifts from gibberish toward something that sounds like you — your vocabulary, your punctuation habits, your sentence rhythm. The style evolution display shows the progression side by side.
Every feature comes with a built-in guided tutorial — three chapters, 20+ steps — that explains what you're looking at and why it matters.
How We Built It
The model is a character-level transformer in micro_gpt.py — between 7,900 and 200,000 parameters depending on the size preset. Architecturally identical to GPT: token embeddings, positional embeddings, N transformer blocks (each with LayerNorm, multi-head self-attention, and a feed-forward layer), cosine learning rate decay with optional warmup, and a weight-tied output projection. Built on PyTorch, running on CPU.
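Where the 7,900-to-200,000 range comes from can be sketched with a parameter-count formula for this standard GPT layout (token and positional embeddings, two LayerNorms plus QKV/output projections and a 4x-wide MLP per block, weight-tied output head). The function and the example config below are assumptions for illustration, not PocketGPT's actual presets.

```python
def gpt_param_count(vocab_size, d_model, n_layers, block_size, ffn_mult=4):
    """Rough parameter count for a GPT-style model with a weight-tied
    output projection. Assumes biased linear layers and LayerNorms with
    weight + bias; real presets may differ slightly."""
    emb = vocab_size * d_model + block_size * d_model      # token + positional
    attn = 4 * d_model * d_model + 4 * d_model             # QKV + out proj
    ffn = 2 * ffn_mult * d_model * d_model + (ffn_mult + 1) * d_model
    norms = 2 * (2 * d_model)                              # two LayerNorms/block
    per_block = attn + ffn + norms
    final_ln = 2 * d_model
    return emb + n_layers * per_block + final_ln           # head tied, no extra

# Hypothetical tiny preset: 65-char vocab, width 16, 1 block, context 32.
n = gpt_param_count(vocab_size=65, d_model=16, n_layers=1, block_size=32)
```

Because the count grows roughly with the square of the width, small changes to `d_model` and `n_layers` span the whole small-to-large preset range.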
The architecture is based on bdunaganGPT by Brian Dunagan (MIT), which is itself derived from Andrej Karpathy's Zero to Hero lecture series and nanoGPT work.
The backend is Flask + Flask-SocketIO. Training runs in a background thread and streams metrics — loss values, generated text samples, attention snapshots, token probability distributions — to the frontend via WebSocket in real time. Sessions are managed with a full lifecycle (idle → running → paused → completed), checkpoint save/load, and support for .docx upload via python-docx.
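The streaming pattern is simple to sketch: a background training loop checks the session state each step and pushes a metrics event to the client. In the real backend that push would be `socketio.emit` from a Flask-SocketIO background thread; here `emit` is a plain method collecting events so the control flow is visible. All names below are illustrative, not PocketGPT's actual classes.

```python
from dataclasses import dataclass, field

@dataclass
class TrainingSession:
    """Minimal sketch of the session lifecycle: idle -> running ->
    paused -> completed. Events stand in for WebSocket emissions."""
    state: str = "idle"
    events: list = field(default_factory=list)

    def emit(self, event, payload):
        # Stand-in for socketio.emit(event, payload) to the browser.
        self.events.append((event, payload))

    def run(self, total_steps, train_step):
        self.state = "running"
        for step in range(total_steps):
            if self.state == "paused":   # a pause request stops the loop
                return
            loss = train_step(step)      # one optimizer step in the real app
            self.emit("metrics", {"step": step, "loss": loss})
        self.state = "completed"

session = TrainingSession()
# Fake train_step with a decreasing "loss" so the stream has a curve.
session.run(3, train_step=lambda step: 1.0 / (step + 1))
```

Keeping the loop in its own thread and pushing one small payload per step is what lets the loss curve, text samples, and attention snapshots all update without polling.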
The frontend is React 19 + Vite, styled with Tailwind CSS, animated with Framer Motion, and uses Three.js for the 3D attention heatmap. Every visualization updates live via Socket.IO. The UI was designed from scratch with a neuro aesthetic: dark backgrounds, gold accents, monospace fonts. The tutorial system uses a spotlight overlay with scroll-to-element behavior and a chapter/step structure, guiding users through the UI without ever leaving the page and teaching them how LLMs work as they use the demo.
Accomplishments That We're Proud Of
We shipped a real-time training visualizer in a hackathon: streaming WebSocket events from a live PyTorch training loop to a React frontend, keeping the UI responsive and synchronized at every step, all while handling pause/resume/step controls, model save/load, and multiple concurrent feature tabs.
The 3D attention heatmap. Building an interactive Three.js surface that updates live during training, with orbit controls, fullscreen mode, and synchronized playback — that was a stretch goal we weren't sure we'd hit.
The tutorial system. Writing a 20-step guided tutorial that actually teaches transformer concepts: not just how to use the UI, but why attention heads exist, what loss means, and how temperature shapes output.
Tech Stack
| Layer | Technology |
|---|---|
| Model | PyTorch (from scratch) |
| Backend | Python, Flask, Flask-SocketIO |
| Real-time | WebSocket (Socket.IO) |
| Frontend | React 19, Vite |
| Styling | Tailwind CSS, Framer Motion |
| 3D Viz | Three.js |
| File parsing | python-docx |
Built By
Jacob Mobin — @jacobamobin
Ethan Cha — @ethanchac
CTRL HACK DEL 2.0