Technical Posts
Retrieval Pipelines, Re-Ranking, and Grounding: Building Production RAG
A practical guide for software engineers on building production-grade RAG systems using hybrid retrieval, re-ranking, and grounding techniques to reduce hallucinations and improve answer quality.
Vector Embeddings & Similarity: The Foundation of RAG
A practical deep-dive into vector embeddings and cosine similarity — the mathematical foundation that makes retrieval in RAG systems actually work.
Vector Databases, ANN, and Chunking: Storing Knowledge for Retrieval
A practical guide for software engineers covering how vector databases use Approximate Nearest Neighbor algorithms to search millions of embeddings efficiently, and how to chunk documents intelligently so your RAG pipeline actually retrieves useful, precise context.
Page-Aware AI Chat: Floating Widget and Per-Page Context
A practical walkthrough of adding per-page context awareness to a floating AI chat widget built with Hugo and Netlify Functions, covering layout overrides, slug injection, priority chunk labeling, and the prompt engineering fix that made summarise-this-post actually work.
Building an AI Chat Assistant for a Static Blog — No Vector DB Required
A practical walkthrough of building a conversational AI assistant for a Hugo static site using TF-IDF retrieval over a flat JSON knowledge base — no vector database, no backend server, no embeddings infrastructure required.
TCP/IP, DNS, and Data Transmission Protocols Explained
A practical, code-illustrated guide to how TCP/IP, DNS, and modern data transmission protocols work under the hood — from handshakes and packet routing to WebSockets, gRPC, and QUIC.
AI Prompting Techniques: System Prompts, Few-Shot, CoT, and Structured Output
A practical engineering guide to four core LLM prompting techniques—system prompts, few-shot examples, chain-of-thought reasoning, and structured output—covering real failure modes and production-ready patterns.
The Evolution: Beyond Transformers
A practical walkthrough of how the Transformer architecture evolved from encoder-decoder to decoder-only models, why attention’s quadratic scaling became a hard wall, and how Mamba’s state space models are being absorbed into hybrid architectures that dominate production today.
Training for Greatness: Speed, BLEU Records, and the Multimodal Vision
A practical deep-dive into how the original Transformer model shattered translation benchmarks, slashed training costs, and laid the architectural foundation for every major LLM that followed.
Inside the Machine: Encoders, Decoders, and Masking
A practical deep-dive into how the Transformer’s encoder and decoder stacks work, covering residual connections, positional encoding, masked self-attention, and cross-attention with code examples throughout.