Context Engineering, Routing, GraphRAG, Reflection, Speculative Decoding
Five new blogs on the patterns powering modern AI apps.
Each one breaks down a topic that matters in modern AI engineering: Context, Routing, Retrieval, Agents, and Inference Speed.
Let’s get into it.
1. Context Engineering
The model is the engine. The context is the fuel.
A great context with an average model often beats an average context with a great model. So the next time we debug an AI app, the first question is not “is the model wrong?” - it is “is the context right?”
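To make that concrete, here is a minimal sketch of assembling a context window under a token budget. Everything in it - the helper names, the 4-chars-per-token estimate, the priority order - is an illustrative assumption, not code from the post:

```python
def count_tokens(text: str) -> int:
    # Rough proxy: ~4 characters per token. A real app would use
    # the model's own tokenizer.
    return max(1, len(text) // 4)

def build_context(system: str, few_shot: list[str], retrieved: list[str],
                  history: list[str], query: str, budget: int = 800) -> str:
    # The system prompt and the user's query are non-negotiable;
    # reserve their tokens first.
    parts = [system]
    remaining = budget - count_tokens(system) - count_tokens(query)

    # Everything else competes for what is left. Fill in priority
    # order - retrieved docs, few-shot examples, then the most
    # recent history - and drop whole blocks that no longer fit.
    for block in retrieved + few_shot + list(reversed(history)):
        cost = count_tokens(block)
        if cost <= remaining:
            parts.append(block)
            remaining -= cost

    parts.append(query)
    return "\n\n".join(parts)

print(build_context(
    system="You are a support assistant.",
    few_shot=["Q: How do I reset my password?\nA: Use the 'Forgot password' link."],
    retrieved=["Doc: Password reset links expire after 24 hours."],
    history=["User reported a login issue yesterday."],
    query="Why did my reset link stop working?",
))
```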
In this blog, we cover what goes inside the context window, the eight components that compete for that space, common patterns like RAG, few-shot, memory, and tool calling, plus the mistakes that quietly kill quality - context rot, “lost in the middle”, and stale history.
Read it: https://outcomeschool.com/blog/context-engineering
2. LLM Routing
Most user queries are simple. “What is 2 + 2?” does not need a frontier LLM. But most apps send every query to one anyway.
A frontier LLM can be 30x more expensive than a small one. LLM Routing is the layer that picks the right model for each query - small for easy, frontier for hard.
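As a toy illustration of the rule-based strategy, here is a router sketch. The model names, the price gap, and the keyword heuristic are all assumptions for the example:

```python
SMALL_MODEL = "small-llm"        # cheap and fast
FRONTIER_MODEL = "frontier-llm"  # can cost ~30x more per token

# Naive difficulty signals; a real router might use a trained
# classifier or embeddings instead of keywords.
HARD_SIGNALS = ("prove", "debug", "analyze", "step by step", "explain why")

def route(query: str) -> str:
    # Long or reasoning-heavy queries go to the frontier model;
    # everything else stays on the small one.
    q = query.lower()
    if len(q.split()) > 40 or any(s in q for s in HARD_SIGNALS):
        return FRONTIER_MODEL
    return SMALL_MODEL

for q in ["What is 2 + 2?",
          "Debug this race condition in my scheduler, step by step."]:
    print(f"{route(q):>14} <- {q}")
```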
In this blog, we cover the anatomy of a router, five routing strategies (rule-based, classifier, embedding, LLM-as-router, and cascade), a full trace example across four queries, and how this is different from Mixture of Experts.
Read it: https://outcomeschool.com/blog/llm-routing
3. GraphRAG
Normal RAG treats every chunk as an island. But our data is not islands - it is a web.
“How is Person A connected to Company X through their work history?” - normal RAG cannot answer this. The information is scattered across many chunks, and similarity search alone cannot stitch them together.
GraphRAG fixes this by building a knowledge graph during indexing, then walking the graph at query time. In this blog, we cover the indexing phase, the query phase, local search vs global search (with the map-reduce step), and the trade-offs you should know before you commit.
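Here is a toy version of that multi-hop idea - a handful of invented triples, an adjacency map, and a breadth-first walk that answers the Person A to Company X question similarity search cannot:

```python
from collections import deque

# (subject, relation, object) triples extracted at indexing time.
# These entities and edges are invented for the example.
triples = [
    ("Person A", "worked_at", "Startup Y"),
    ("Startup Y", "acquired_by", "Company X"),
    ("Person A", "studied_at", "University Z"),
]

# Build an undirected adjacency map so we can walk in both directions.
graph: dict[str, list[tuple[str, str]]] = {}
for s, rel, o in triples:
    graph.setdefault(s, []).append((rel, o))
    graph.setdefault(o, []).append((f"inverse({rel})", s))

def connect(start: str, goal: str) -> str:
    # Breadth-first search over the graph: the multi-hop step that
    # chunk-by-chunk similarity search cannot do on its own.
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return " ".join(path)
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"-{rel}->", nxt]))
    return "no connection found"

print(connect("Person A", "Company X"))
# Person A -worked_at-> Startup Y -acquired_by-> Company X
```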
Read it: https://outcomeschool.com/blog/graphrag
4. Reflection Agent
A plain LLM call gives us one draft and stops. A Reflection Agent gives us a draft, critiques it, revises it, and keeps going until the work passes its own quality bar.
Think of it as a writer working with an editor on the same essay - except both are the LLM.
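The loop itself is small. Here is a sketch with a placeholder `llm` function standing in for real model calls - the prompts and the APPROVED convention are assumptions for the example, not the post's implementation:

```python
def llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM client call here.
    return f"[model output for: {prompt[:40]}...]"

def reflection_agent(task: str, max_rounds: int = 3) -> str:
    # Generator: produce the first draft.
    draft = llm(f"Write a response for this task:\n{task}")
    for _ in range(max_rounds):
        # Critic: judge the draft against the task.
        critique = llm(
            "Critique this draft against the task. "
            "Reply APPROVED if it needs no changes.\n"
            f"Task: {task}\nDraft: {draft}"
        )
        # Stop check: exit once the Critic is satisfied.
        if "APPROVED" in critique:
            break
        # Generator again: revise using the critique as feedback.
        draft = llm(
            f"Revise the draft using this critique.\n"
            f"Critique: {critique}\nDraft: {draft}"
        )
    return draft

print(reflection_agent("Write a product description for noise-canceling headphones."))
```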
In this blog, we cover the five parts of a Reflection Agent (Generator, Critic, Tools, Memory, Stop Check), a full trace where the agent rewrites a weak product description into a strong one, how it differs from a ReAct Agent, and seven failure modes - including the sneaky one where the Critic asks for specifics and the Generator hallucinates them to comply.
Read it: https://outcomeschool.com/blog/reflection-agent
5. Speculative Decoding
LLMs generate one token at a time. The GPU is not slow at math - it is slow at moving weights from memory into the compute units for every single token. So most of the GPU’s compute sits idle while we wait.
Speculative Decoding fills that gap. A small draft model writes a few tokens fast. The big target model verifies all of them in a single parallel pass. We get a 2x to 3x speedup, and the math guarantees zero quality loss.
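Here is a toy, fully deterministic version of the loop. Because both stand-in “models” are greedy, the lossless rejection-sampling rule collapses to an exact-match check - the real rule compares the two models’ token probabilities:

```python
DRAFT_LEN = 5  # tokens the small model proposes per step

def draft_model(ctx: list[str]) -> str:
    # Stand-in for a small, fast model - wrong at position 4.
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return vocab[len(ctx) % len(vocab)]

def target_model(ctx: list[str]) -> str:
    # Stand-in for the big model's next-token choice.
    vocab = ["the", "cat", "sat", "on", "the", "mat"]
    return vocab[len(ctx) % len(vocab)]

def speculative_step(ctx: list[str]) -> list[str]:
    # 1) The draft model writes DRAFT_LEN tokens sequentially (cheap).
    drafted = []
    for _ in range(DRAFT_LEN):
        drafted.append(draft_model(ctx + drafted))
    # 2) The target verifies them; a real system checks every
    #    position in one parallel forward pass.
    accepted = []
    for tok in drafted:
        if target_model(ctx + accepted) == tok:
            accepted.append(tok)   # match: a token for free
        else:
            break                  # first mismatch: stop accepting
    # 3) The target always contributes one token of its own, so even
    #    a fully rejected draft still makes progress.
    accepted.append(target_model(ctx + accepted))
    return accepted

print(" ".join(speculative_step([])))
# the cat sat on the  <- 4 drafts accepted, 1 rejected, +1 from target
```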
In this blog, we cover the core idea, a step-by-step walkthrough, the rejection sampling rule that makes it lossless, real numbers showing the timeline savings (10s → 3.5s), and where it is used in production today (vLLM, TensorRT-LLM, DeepSeek-V3’s MTP).
Read it: https://outcomeschool.com/blog/speculative-decoding
That’s it for now

