Context Engineering, Routing, GraphRAG, Reflection, Speculative Decoding
Five new blogs on the patterns powering modern AI apps.
Each one breaks down a topic that matters in modern AI engineering: Context, Routing, Retrieval, Agents, and Inference Speed.
Let’s get into it.
1. Context Engineering
The model is the engine. The context is the fuel.
A great context with an average model often beats an average context with a great model. So the next time we debug an AI app, the first question is not “is the model wrong?” - it is “is the context right?”
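To make that concrete, here is a minimal sketch of assembling a context window under a token budget. Everything in it - the helper names, the 4-chars-per-token estimate, the priority order - is an illustrative assumption, not code from the post:

```python
def count_tokens(text: str) -> int:
    # Rough proxy: ~4 characters per token. A real app would use
    # the model's own tokenizer.
    return max(1, len(text) // 4)

def build_context(system: str, few_shot: list[str], retrieved: list[str],
                  history: list[str], query: str, budget: int = 800) -> str:
    # The system prompt and the user's query are non-negotiable;
    # reserve their tokens first.
    parts = [system]
    remaining = budget - count_tokens(system) - count_tokens(query)

    # Everything else competes for what is left. Fill in priority
    # order - retrieved docs, few-shot examples, then the most
    # recent history - and drop whole blocks that no longer fit.
    for block in retrieved + few_shot + list(reversed(history)):
        cost = count_tokens(block)
        if cost <= remaining:
            parts.append(block)
            remaining -= cost

    parts.append(query)
    return "\n\n".join(parts)

print(build_context(
    system="You are a support assistant.",
    few_shot=["Q: How do I reset my password?\nA: Use the 'Forgot password' link."],
    retrieved=["Doc: Password reset links expire after 24 hours."],
    history=["User reported a login issue yesterday."],
    query="Why did my reset link stop working?",
))
```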
In this blog, we cover what goes inside the context window, the eight components that compete for that space, common patterns like RAG, few-shot, memory, and tool calling, plus the mistakes that quietly kill quality - context rot, “lost in the middle”, and stale history.
Read it: https://outcomeschool.com/blog/context-engineering
2. LLM Routing
Most user queries are simple. “What is 2 + 2?” does not need a frontier LLM. But most apps send every query to one anyway.
A frontier LLM can be 30x more expensive than a small one. LLM Routing is the layer that picks the right model for each query - small for easy, frontier for hard.
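As a toy illustration of the rule-based strategy, here is a router sketch. The model names, the price gap, and the keyword heuristic are all assumptions for the example:

```python
SMALL_MODEL = "small-llm"        # cheap and fast
FRONTIER_MODEL = "frontier-llm"  # can cost ~30x more per token

# Naive difficulty signals; a real router might use a trained
# classifier or embeddings instead of keywords.
HARD_SIGNALS = ("prove", "debug", "analyze", "step by step", "explain why")

def route(query: str) -> str:
    # Long or reasoning-heavy queries go to the frontier model;
    # everything else stays on the small one.
    q = query.lower()
    if len(q.split()) > 40 or any(s in q for s in HARD_SIGNALS):
        return FRONTIER_MODEL
    return SMALL_MODEL

for q in ["What is 2 + 2?",
          "Debug this race condition in my scheduler, step by step."]:
    print(f"{route(q):>14} <- {q}")
```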
In this blog, we cover the anatomy of a router, five routing strategies (rule-based, classifier, embedding, LLM-as-router, and cascade), a full trace example across four queries, and how this is different from Mixture of Experts.
Read it: https://outcomeschool.com/blog/llm-routing
3. GraphRAG
Normal RAG treats every chunk as an island. But our data is not islands - it is a web.
“How is Person A connected to Company X through their work history?” - normal RAG cannot answer this. The information is scattered across many chunks, and similarity search alone cannot stitch them together.
GraphRAG fixes this by building a knowledge graph during indexing, then walking the graph at query time. In this blog, we cover the indexing phase, the query phase, local search vs global search (with the map-reduce step), and the trade-offs you should know before you commit.
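Here is a toy version of that multi-hop idea - a handful of invented triples, an adjacency map, and a breadth-first walk that answers the Person A to Company X question similarity search cannot:

```python
from collections import deque

# (subject, relation, object) triples extracted at indexing time.
# These entities and edges are invented for the example.
triples = [
    ("Person A", "worked_at", "Startup Y"),
    ("Startup Y", "acquired_by", "Company X"),
    ("Person A", "studied_at", "University Z"),
]

# Build an undirected adjacency map so we can walk in both directions.
graph: dict[str, list[tuple[str, str]]] = {}
for s, rel, o in triples:
    graph.setdefault(s, []).append((rel, o))
    graph.setdefault(o, []).append((f"inverse({rel})", s))

def connect(start: str, goal: str) -> str:
    # Breadth-first search over the graph: the multi-hop step that
    # chunk-by-chunk similarity search cannot do on its own.
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return " ".join(path)
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"-{rel}->", nxt]))
    return "no connection found"

print(connect("Person A", "Company X"))
# Person A -worked_at-> Startup Y -acquired_by-> Company X
```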
Read it: https://outcomeschool.com/blog/graphrag
4. Reflection Agent
A plain LLM call gives us one draft and stops. A Reflection Agent gives us a draft, critiques it, revises it, and keeps going until the work passes its own quality bar.
Think of it as a writer working with an editor on the same essay - except both are the LLM.
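The loop itself is small. Here is a sketch with a placeholder `llm` function standing in for real model calls - the prompts and the APPROVED convention are assumptions for the example, not the post's implementation:

```python
def llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM client call here.
    return f"[model output for: {prompt[:40]}...]"

def reflection_agent(task: str, max_rounds: int = 3) -> str:
    # Generator: produce the first draft.
    draft = llm(f"Write a response for this task:\n{task}")
    for _ in range(max_rounds):
        # Critic: judge the draft against the task.
        critique = llm(
            "Critique this draft against the task. "
            "Reply APPROVED if it needs no changes.\n"
            f"Task: {task}\nDraft: {draft}"
        )
        # Stop check: exit once the Critic is satisfied.
        if "APPROVED" in critique:
            break
        # Generator again: revise using the critique as feedback.
        draft = llm(
            f"Revise the draft using this critique.\n"
            f"Critique: {critique}\nDraft: {draft}"
        )
    return draft

print(reflection_agent("Write a product description for noise-canceling headphones."))
```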
In this blog, we cover the five parts of a Reflection Agent (Generator, Critic, Tools, Memory, Stop Check), a full trace where the agent rewrites a weak product description into a strong one, how it differs from a ReAct Agent, and seven failure modes - including the sneaky one where the Critic asks for specifics and the Generator hallucinates them to comply.
Read it: https://outcomeschool.com/blog/reflection-agent
5. Speculative Decoding
LLMs generate one token at a time. The GPU is not slow at math - it is slow at moving weights from memory into the compute units for every single token. So most of the GPU’s compute sits idle while we wait.
Speculative Decoding fills that gap. A small draft model writes a few tokens fast. The big target model verifies all of them in a single parallel pass. We get a 2x to 3x speedup, and the math guarantees zero quality loss.
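Here is a toy, fully deterministic version of the loop. Because both stand-in “models” are greedy, the lossless rejection-sampling rule collapses to an exact-match check - the real rule compares the two models’ token probabilities:

```python
DRAFT_LEN = 5  # tokens the small model proposes per step

def draft_model(ctx: list[str]) -> str:
    # Stand-in for a small, fast model - wrong at position 4.
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return vocab[len(ctx) % len(vocab)]

def target_model(ctx: list[str]) -> str:
    # Stand-in for the big model's next-token choice.
    vocab = ["the", "cat", "sat", "on", "the", "mat"]
    return vocab[len(ctx) % len(vocab)]

def speculative_step(ctx: list[str]) -> list[str]:
    # 1) The draft model writes DRAFT_LEN tokens sequentially (cheap).
    drafted = []
    for _ in range(DRAFT_LEN):
        drafted.append(draft_model(ctx + drafted))
    # 2) The target verifies them; a real system checks every
    #    position in one parallel forward pass.
    accepted = []
    for tok in drafted:
        if target_model(ctx + accepted) == tok:
            accepted.append(tok)   # match: a token for free
        else:
            break                  # first mismatch: stop accepting
    # 3) The target always contributes one token of its own, so even
    #    a fully rejected draft still makes progress.
    accepted.append(target_model(ctx + accepted))
    return accepted

print(" ".join(speculative_step([])))
# the cat sat on the  <- 4 drafts accepted, 1 rejected, +1 from target
```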
In this blog, we cover the core idea, a step-by-step walkthrough, the rejection sampling rule that makes it lossless, real numbers showing the timeline savings (10s → 3.5s), and where it is used in production today (vLLM, TensorRT-LLM, DeepSeek-V3’s MTP).
Read it: https://outcomeschool.com/blog/speculative-decoding
That’s it for now

