π₀.5, from scratch.

π₀.5 uses a discrete categorical objective to shape the language model during training, and a continuous flow-matching objective to act at runtime — and that asymmetry is the architecture.
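Those two objectives can be sketched in a few lines. This is a toy illustration, not π₀.5's actual losses: the vocabulary size, action dimension, linear interpolation convention, and zero-initialized predictor are all assumptions made for the sketch.

```python
import math
import random

random.seed(0)

# --- Discrete objective: cross-entropy over a categorical token distribution ---
# This is what shapes the language model during training.
def cross_entropy(logits, target_idx):
    # log-sum-exp with a max shift for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]  # -log p(target token)

# --- Continuous objective: flow matching on an action vector ---
# Regress the velocity that carries noise to the real action along a
# straight-line path (conventions vary across papers; this is one common form).
def flow_matching_loss(action, noise, t, predicted_velocity):
    # the interpolant x_t = (1 - t) * noise + t * action would be the network
    # input; along this straight path the target velocity is constant:
    target_velocity = [a - n for a, n in zip(action, noise)]
    return sum((p - v) ** 2
               for p, v in zip(predicted_velocity, target_velocity)) / len(action)

logits = [random.gauss(0, 1) for _ in range(32)]  # toy 32-token vocabulary
ce = cross_entropy(logits, target_idx=7)

action = [random.gauss(0, 1) for _ in range(8)]   # toy 8-dim action chunk
noise = [random.gauss(0, 1) for _ in range(8)]
fm = flow_matching_loss(action, noise, 0.5, [0.0] * 8)  # untrained predictor
```

The point of the contrast: cross-entropy compares against one correct token out of a finite set, while flow matching regresses a continuous velocity field — one loss for discrete symbols, one for continuous motion.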

This is a six-stop walk through π₀.5 — a vision-language-action policy from Physical Intelligence that took mobile-manipulation robots into homes it had never seen during training. It is built for someone who has never opened a transformer paper, never heard of FAST tokenization, and does not yet know what an action expert is. By the end you will know what π₀.5 is, what it computes, what it learns, and where it breaks.

The site assumes you know what a neural network is and what a robot looks like. Everything else — multi-head attention, prefix-LM masking, flow matching, knowledge insulation, the FAST tokenizer — is built up from primitives. Each page surfaces a 5-minute spine; deeper material is tucked into optional collapses.

You can stop wherever you have what you came for. Each page ends with a "so what?" takeaway in plain language.