Building an open world in the browser, part 29: One controller, any body

By Oleg Sidorkin, CTO and Co-Founder of Cinevva

New here? Use the series guide. It explains what a spike is and links all parts.

Part 28 covered grass and occlusion. Twenty-eight parts built a world to stand in: terrain you can read, water you can swim, foliage that holds up to the horizon, a server that remembers what you changed. This part is about the thing that moves through all of it, and it's the payoff of the whole engine, because the goal isn't a player controller. It's a controller that doesn't care what body it's driving, where its animations came from, or whether a human or an AI is at the wheel. Spike 58 builds movement as a stack of pluggable behaviors over a physics engine that knows nothing about locomotion, then proves the architecture by running three different bodies through one copy of it. Spike 59 hangs a real retargeted avatar on that controller without changing a single line of it. Spike 60 hangs a different animation pack that needs no retargeting at all, and pulls the clip-picking logic out into something you can actually test.

A physics engine that knows nothing about walking

Open Spike 58 in a new tab ↗ · View source

The design rule is severe and it's the whole point: the capsule engine owns no locomotion. No walk, no run, no jump, no friction, no top-speed cap, not even gravity. It integrates a kinematic capsule against terrain and nothing more. Every behavior (walk, slide, glide, climb, swim, crouch, stamina) lives in a self-contained controller registered with the engine. Each frame the engine ticks every controller, asks each one whether it wants control, and lets the highest-priority claimant write velocity. Walk has no special status. It's just the lowest-priority controller that always says yes, so it's the default when nothing else fires.

A controller is six small functions: tick, which updates the controller's own internal state every frame even when it isn't active; wantsControl, a pure predicate that claims the frame; applyForces, which only the winner runs and which writes velocity and applies its own gravity if it wants any; plus the optional onEnter, onExit, and stateName. Swim returns ownsCollision: true from applyForces to take over terrain handling, because its buoyancy spring would otherwise fight the engine's foot-snap.
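A minimal sketch of that contract and the priority arbitration, to make the shape concrete. The function names come from the text above; everything else (the state fields, the numeric priorities, the gravity and slope constants, the stepFrame helper) is an assumption made for illustration and not the spike's actual code.

```typescript
// Hypothetical shapes: the article names the six functions, not their signatures.
interface CapsuleState {
  vx: number;
  vy: number;
  grounded: boolean;
  slopeDeg: number; // slope under the capsule, in degrees
  inWater: boolean;
}

interface ApplyResult {
  ownsCollision: boolean; // true: the controller handles terrain itself
}

interface LocoController {
  id: string;
  priority: number; // higher wins arbitration
  tick(s: CapsuleState, dt: number): void; // every frame, active or not
  wantsControl(s: CapsuleState): boolean; // pure predicate
  applyForces(s: CapsuleState, dt: number): ApplyResult; // winner only
  onEnter?(s: CapsuleState): void;
  onExit?(s: CapsuleState): void;
  stateName?(s: CapsuleState): string;
}

const GRAVITY = 9.8; // assumed value; the engine itself applies none
const MAX_WALK_SLOPE_DEG = 45; // assumed threshold

const walkController: LocoController = {
  id: "walk",
  priority: 0,
  tick: () => {},
  wantsControl: () => true, // always yes, so it is the default
  applyForces: (s, dt) => {
    s.vx = 4;
    if (!s.grounded) s.vy -= GRAVITY * dt; // walk opts in to gravity itself
    return { ownsCollision: false };
  },
  stateName: () => "walk",
};

const slideController: LocoController = {
  id: "slide",
  priority: 10,
  tick: () => {},
  wantsControl: (s) => s.slopeDeg > MAX_WALK_SLOPE_DEG,
  applyForces: (s, dt) => {
    s.vy -= GRAVITY * dt;
    return { ownsCollision: false };
  },
};

const swimController: LocoController = {
  id: "swim",
  priority: 20,
  tick: () => {},
  wantsControl: (s) => s.inWater,
  applyForces: (s) => {
    s.vy = 0; // stand-in for a buoyancy spring
    return { ownsCollision: true }; // engine should skip its foot-snap
  },
};

// One frame: tick everyone, pick the highest-priority claimant, run only it.
function stepFrame(
  controllers: LocoController[],
  s: CapsuleState,
  dt: number,
  active: LocoController | undefined,
): LocoController | undefined {
  for (const c of controllers) c.tick(s, dt);
  let winner: LocoController | undefined;
  for (const c of controllers) {
    if (!c.wantsControl(s)) continue;
    if (winner === undefined || c.priority > winner.priority) winner = c;
  }
  if (winner !== active) {
    active?.onExit?.(s);
    winner?.onEnter?.(s);
  }
  winner?.applyForces(s, dt);
  return winner;
}
```

Nothing in stepFrame mentions walking, sliding, or swimming. Removing a controller from the array removes the behavior, which is the property the rest of the chapter leans on.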

Even the things that aren't locomotion still leak through that contract, which is what forced the next idea. The piece that earns the architecture is channels. An earlier single-channel design ran everything through one arbitration, which meant a stamina tracker or a crouch stance had to register as locomotion and then decline control, with wantsControl hard-wired to return false, just to run its bookkeeping. The fix splits controllers into named channels that arbitrate independently and apply in a fixed order: resource, then stance, then locomotion. Resource runs first because its writes, like stamina decay, get read by the others. Stance runs second because crouch shrinking the capsule height has to land before walk reads it to cap top speed. Locomotion runs last and owns the per-frame velocity write. So a stamina observer, a crouch modifier, and an active swim controller all coexist cleanly, each in its own channel, with no controller having to lie about what it is. The demo visualizes all of this with a bare capsule that changes color by active controller, so you can watch arbitration happen: green walk flips to orange on a steep slope as slide takes over, to cyan in the lake as swim wins, to cream midair when you tap glide.
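Here is a rough sketch of the channel ordering, assuming a simplified controller shape with a single apply function. The channel names and their order come from the text; the capsule fields, the stamina rates, the heights, and the speed caps are invented for the example.

```typescript
type ChannelName = "resource" | "stance" | "locomotion";
const CHANNEL_ORDER: ChannelName[] = ["resource", "stance", "locomotion"];

// Hypothetical capsule fields, just enough to show cross-channel reads.
interface ToyCapsule {
  stamina: number;
  height: number;
  speed: number;
  sprinting: boolean;
  crouching: boolean;
}

interface ChannelController {
  id: string;
  channel: ChannelName;
  priority: number;
  wantsControl(c: ToyCapsule): boolean;
  apply(c: ToyCapsule, dt: number): void;
}

// Each channel arbitrates on its own; channels apply in a fixed order.
function stepChannels(
  controllers: ChannelController[],
  c: ToyCapsule,
  dt: number,
): Record<ChannelName, string | null> {
  const winners: Record<ChannelName, string | null> = {
    resource: null,
    stance: null,
    locomotion: null,
  };
  for (const channel of CHANNEL_ORDER) {
    let winner: ChannelController | undefined;
    for (const ctl of controllers) {
      if (ctl.channel !== channel || !ctl.wantsControl(c)) continue;
      if (winner === undefined || ctl.priority > winner.priority) winner = ctl;
    }
    if (winner !== undefined) {
      winner.apply(c, dt);
      winners[channel] = winner.id;
    }
  }
  return winners;
}

const staminaCtl: ChannelController = {
  id: "stamina",
  channel: "resource",
  priority: 0,
  wantsControl: () => true,
  apply: (c, dt) => {
    // Assumed rates: drain 20/s while sprinting, recover 10/s otherwise.
    c.stamina = c.sprinting
      ? Math.max(0, c.stamina - 20 * dt)
      : Math.min(100, c.stamina + 10 * dt);
  },
};

const standCtl: ChannelController = {
  id: "stand",
  channel: "stance",
  priority: 0,
  wantsControl: () => true,
  apply: (c) => { c.height = 1.8; },
};

const crouchCtl: ChannelController = {
  id: "crouch",
  channel: "stance",
  priority: 10,
  wantsControl: (c) => c.crouching,
  apply: (c) => { c.height = 0.9; },
};

const walkCtl: ChannelController = {
  id: "walk",
  channel: "locomotion",
  priority: 0,
  wantsControl: () => true,
  apply: (c) => {
    // Reads what resource and stance wrote earlier in the same frame.
    if (c.height < 1.2) c.speed = 2;
    else if (c.sprinting && c.stamina > 0) c.speed = 8;
    else c.speed = 4;
  },
};
```

Because stance applies before locomotion, a crouch that starts this frame caps walk speed this frame rather than one frame late. The same holds for stamina running out mid-sprint.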

One engine, three bodies

The real test of "owns no locomotion" isn't the player. It's whether the same engine, untouched, can drive something that isn't a player at all. createCapsuleEngine is a pure factory with no module-level state, no singletons, and no per-instance side effects, so the spike instantiates it three times. The player is one instance with the full controller set. A rideable horse is a second instance that registers only a walk controller, which is the entire idea made literal: a body's movement vocabulary is just whichever controllers you registered, so the horse is faster on the flat and physically can't climb, swim, or glide, because those controllers were never added. An autonomous NPC is a third instance, ticked every frame alongside the player, driven by a wander controller that synthesizes its own input so the body steers itself with no hands on the keyboard. The engine never learns that one of its bodies is a horse or that another is AI-driven. They're all the same capsule integrator with different controller lists.
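The "one factory, different controller lists" idea can be sketched roughly like this. createToyEngine is a stand-in I made up for the example, since the real signature of createCapsuleEngine isn't shown here; the controller shape, speeds, and turn rate are likewise assumptions.

```typescript
interface MoveInput { x: number; z: number; }
interface BodyState { x: number; z: number; vx: number; vz: number; }

interface BodyController {
  id: string;
  priority: number;
  tick(dt: number): void;
  wantsControl(s: BodyState): boolean;
  applyForces(s: BodyState, input: MoveInput): void;
}

// Toy stand-in for the engine factory: all state lives in the closure,
// so every call returns a fully independent body.
function createToyEngine() {
  const state: BodyState = { x: 0, z: 0, vx: 0, vz: 0 };
  const controllers: BodyController[] = [];
  return {
    state,
    register(c: BodyController): void { controllers.push(c); },
    vocabulary(): string[] { return controllers.map((c) => c.id); },
    step(input: MoveInput, dt: number): void {
      for (const c of controllers) c.tick(dt);
      const claimants = controllers.filter((c) => c.wantsControl(state));
      claimants.sort((a, b) => b.priority - a.priority);
      const winner = claimants[0];
      if (winner !== undefined) winner.applyForces(state, input);
      state.x += state.vx * dt;
      state.z += state.vz * dt;
    },
  };
}

function makeWalk(topSpeed: number): BodyController {
  return {
    id: "walk",
    priority: 0,
    tick: () => {},
    wantsControl: () => true,
    applyForces: (s, input) => {
      s.vx = input.x * topSpeed;
      s.vz = input.z * topSpeed;
    },
  };
}

// Wander ignores the host's input and synthesizes its own each frame.
function makeWander(speed: number, turnRate: number): BodyController {
  const walk = makeWalk(speed);
  let heading = 0;
  return {
    id: "wander",
    priority: 0,
    tick: (dt) => { heading += turnRate * dt; },
    wantsControl: () => true,
    applyForces: (s) =>
      walk.applyForces(s, { x: Math.cos(heading), z: Math.sin(heading) }),
  };
}

// Three bodies from one factory. The real player registers the full
// controller set; only walk is shown to keep the sketch short.
const playerBody = createToyEngine();
playerBody.register(makeWalk(4));

const horseBody = createToyEngine();
horseBody.register(makeWalk(9)); // faster, and walk is all it knows

const npcBody = createToyEngine();
npcBody.register(makeWander(2, 0.5));
```

The horse "can't swim" here in the most literal sense: horseBody.vocabulary() contains only walk, so there is nothing for arbitration to pick when it reaches water.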

Mounting is the one piece that deliberately lives outside the engine. Swapping control between the player and the horse means coordinating two engines, and a controller runs inside one engine and can't see across that boundary, so the mount logic sits at the host level: it's a small state machine that decides which engine gets stepped this frame, freezes the other, eases the rider onto the saddle with a smoothstep over three quarters of a second, and tells the camera which body to track. The engine factory never hears about any of it. That's the line the architecture draws and keeps. Behaviors that belong to one body are controllers; coordination between bodies is the host's job, and keeping those separate is why a fourth or fortieth body would cost nothing new.
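A host-level mount state machine along those lines might look like the sketch below. The 0.75 second smoothstep ease comes from the text; the phase names, the saddle height, the SteppableBody shape, and the choice to freeze both bodies during the ease are assumptions for illustration.

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface SteppableBody { position: Vec3; step(dt: number): void; }
type MountPhase = "onFoot" | "mounting" | "riding";

const MOUNT_EASE_SECONDS = 0.75; // "three quarters of a second"
const SADDLE_HEIGHT = 1.5; // assumed offset above the horse origin

// Classic smoothstep on a clamped 0..1 parameter.
function smoothstep01(t: number): number {
  const c = Math.min(1, Math.max(0, t));
  return c * c * (3 - 2 * c);
}

function saddleOf(horse: SteppableBody): Vec3 {
  const p = horse.position;
  return { x: p.x, y: p.y + SADDLE_HEIGHT, z: p.z };
}

// Lives outside both engines: it only decides who gets stepped.
function createMountHost(player: SteppableBody, horse: SteppableBody) {
  let phase: MountPhase = "onFoot";
  let elapsed = 0;
  let easeStart: Vec3 = { ...player.position };
  return {
    phase: (): MountPhase => phase,
    cameraTarget: (): SteppableBody => (phase === "riding" ? horse : player),
    requestMount(): void {
      if (phase !== "onFoot") return;
      phase = "mounting";
      elapsed = 0;
      easeStart = { ...player.position };
    },
    update(dt: number): void {
      if (phase === "onFoot") {
        player.step(dt); // horse stays frozen
        return;
      }
      const saddle = saddleOf(horse);
      if (phase === "mounting") {
        // Assumption: neither engine steps while the rider eases over.
        elapsed += dt;
        const k = smoothstep01(elapsed / MOUNT_EASE_SECONDS);
        player.position.x = easeStart.x + (saddle.x - easeStart.x) * k;
        player.position.y = easeStart.y + (saddle.y - easeStart.y) * k;
        player.position.z = easeStart.z + (saddle.z - easeStart.z) * k;
        if (elapsed >= MOUNT_EASE_SECONDS) phase = "riding";
        return;
      }
      horse.step(dt); // player engine stays frozen
      player.position.x = saddle.x; // rider is pinned to the saddle
      player.position.y = saddle.y;
      player.position.z = saddle.z;
    },
  };
}
```

The two bodies are passed in as opaque steppable things, so the host never needs to know what controllers either one registered.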

Tested without a browser

Because the engine reaches for no window, no document, and no Three.js, the whole thing runs headless. The spike ships a Node harness that mocks the terrain interface, drives the engine frame by frame with scripted input, and asserts on the resulting state, so the regressions that are miserable to catch by play-testing get caught in a script instead: the jump-and-land transition, swim entry and exit thresholds, climb auto-clearing when a surface flattens, stamina drain rates. Feeding the same input and timestep sequence twice and checking for byte-identical output state pins the engine as deterministic, which is the property a networked build will eventually lean on. One regression the harness caught is worth keeping: stepping from walkable ground onto a steeper-than-walk slope used to engage the slide controller and bounce the player back up the hill. The fix was a descent gate. Sliding only starts if the capsule is actually falling onto the slope, so walking horizontally into a steep face now blocks cleanly instead of sliding, and the test that asserts "walking into a non-walkable slope blocks the player" keeps it fixed.
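To give a feel for what such a harness checks, here is a small self-contained sketch. The simulate function is a toy stand-in for the real engine (it hard-codes gravity and a jump impulse for brevity, which the real engine deliberately does not), and the terrain interface, the 45 degree walkable limit, and canStartSlide are hypothetical names for the example.

```typescript
interface TerrainLike { heightAt(x: number, z: number): number; }
interface FrameInput { moveX: number; jump: boolean; }
interface SimState { x: number; y: number; vy: number; grounded: boolean; }

const TOY_GRAVITY = 9.8; // toy only; the real engine owns no gravity
const WALKABLE_SLOPE_DEG = 45; // assumed walk limit

// Descent gate: a steep slope alone is not enough, the capsule must be
// falling onto it. Walking horizontally into a steep face (vy = 0) blocks.
function canStartSlide(slopeDeg: number, vy: number): boolean {
  return slopeDeg > WALKABLE_SLOPE_DEG && vy < 0;
}

// Drives the toy body frame by frame and records every resulting state.
function simulate(
  terrain: TerrainLike,
  script: FrameInput[],
  dt: number,
): SimState[] {
  const s: SimState = { x: 0, y: terrain.heightAt(0, 0), vy: 0, grounded: true };
  const trace: SimState[] = [];
  for (const input of script) {
    if (s.grounded && input.jump) {
      s.vy = 5;
      s.grounded = false;
    }
    s.x += input.moveX * 4 * dt;
    const ground = terrain.heightAt(s.x, 0);
    if (s.grounded) {
      s.y = ground; // foot-snap
    } else {
      s.vy -= TOY_GRAVITY * dt;
      s.y += s.vy * dt;
      if (s.y <= ground) {
        s.y = ground;
        s.vy = 0;
        s.grounded = true;
      }
    }
    trace.push({ ...s });
  }
  return trace;
}

// Mocked terrain and a scripted input: jump on frame 0, walk for 2 seconds.
const flatTerrain: TerrainLike = { heightAt: () => 0 };
const jumpScript: FrameInput[] = Array.from({ length: 120 }, (_, i) => ({
  moveX: 1,
  jump: i === 0,
}));

const runA = simulate(flatTerrain, jumpScript, 1 / 60);
const runB = simulate(flatTerrain, jumpScript, 1 / 60);

// Same inputs and timesteps must serialize to the same bytes.
const isDeterministic = JSON.stringify(runA) === JSON.stringify(runB);
```

The determinism check only means something because simulate reads no clock, no random source, and no module-level state. Any of those would make two runs diverge.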

Plugging in a real avatar without touching the controllers

Open Spike 59 in a new tab ↗ · View source

Spike 59 is the test of whether the controller layer is really decoupled from the body. It swaps the colored capsule for a real skinned character, the 3MIKE FBX rig with Quaternius Universal Animation Library clips retargeted onto it, and the controllers don't change at all. The seam is a single string. Each controller already reports a state name through stateName (idle, walk, run, jump, fall, land, slide, glide, swim, swimIdle), and the avatar layer maps that name to a retargeted clip through an alias table and crossfades on switch. Walk picks its own sub-state dynamically from the grounded flag, vertical velocity, and horizontal speed, so a single walk controller drives idle, walk, run, jump, fall, and land, and the avatar just follows the reported name. Because this spike is single-player, the avatar drops the wire-integer indirection and multi-character abstraction from the earlier networked spikes and maps the state name straight to a clip. The proof is that the entire visual upgrade from capsule to rigged human touched zero lines of locomotion code, which is exactly what a pluggable controller is supposed to buy.
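The string seam is small enough to sketch in full. The state names below are the ones listed in the text; the clip names in the alias table are placeholders (the real retargeted clip names aren't given here), and the speed thresholds, the landing hold, and the crossfade time are assumed values.

```typescript
type StateName =
  | "idle" | "walk" | "run" | "jump" | "fall"
  | "land" | "slide" | "glide" | "swim" | "swimIdle";

// Placeholder clip names, not the actual UAL clip names.
const CLIP_ALIASES: Record<StateName, string> = {
  idle: "Clip_Idle",
  walk: "Clip_Walk",
  run: "Clip_Run",
  jump: "Clip_JumpStart",
  fall: "Clip_Fall",
  land: "Clip_Land",
  slide: "Clip_Slide",
  glide: "Clip_Glide",
  swim: "Clip_Swim",
  swimIdle: "Clip_TreadWater",
};

const IDLE_MAX_SPEED = 0.1; // assumed thresholds
const WALK_TO_RUN_SPEED = 4;
const LAND_HOLD_SECONDS = 0.2;
const CROSSFADE_SECONDS = 0.15;

interface WalkSample {
  grounded: boolean;
  vy: number;
  hSpeed: number;
  secondsSinceLanding: number; // assumption: how "land" gets detected
}

// One walk controller reports six different state names.
function walkSubState(p: WalkSample): StateName {
  if (!p.grounded) return p.vy > 0 ? "jump" : "fall";
  if (p.secondsSinceLanding < LAND_HOLD_SECONDS) return "land";
  if (p.hSpeed < IDLE_MAX_SPEED) return "idle";
  return p.hSpeed < WALK_TO_RUN_SPEED ? "walk" : "run";
}

// The avatar layer: follow the reported name, crossfade only on change.
// `play` is whatever the rendering side provides to start a clip.
function createAvatarBinding(
  play: (clip: string, fadeSeconds: number) => void,
) {
  let current: StateName | null = null;
  return (state: StateName): void => {
    if (state === current) return;
    current = state;
    play(CLIP_ALIASES[state], CROSSFADE_SECONDS);
  };
}
```

The binding takes a play callback rather than a mixer, so the same function can drive a colored capsule, a retargeted rig, or a test that just records which clips were requested.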

A pack that needs no retargeting, and a picker you can test

Open Spike 60 in a new tab ↗ · View source

Spike 60 plugs in a third body, Synty's POLYGON Base Locomotion pack, and the loader is almost nothing. Each clip ships as a standalone FBX carrying an embedded copy of the same Synty skeleton plus one baked animation, and because the character rig and every clip use identical bone names, you grab the clip off fbx.animations[0] and play it directly on the character's mixer with no retargeting library in the loop. Three.js resolves animation track targets by bone name rather than object identity, so a Synty or Mixamo-style pack authored against the matching rig just works. That's the deliberate contrast with the UAL path from the previous spike, which needs heavyweight retargeting because the source clips and the target rig were authored against different skeletons. Same controller, same state-name seam, two completely different animation pipelines behind it.
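Since binding is by name, "does this pack need retargeting" reduces to a name comparison you can run on plain strings. The sketch below is that check, not Three.js code: it assumes track names of the form nodeName.property (for example Hips.quaternion), and the bone names in the usage are invented for illustration rather than taken from the Synty rig.

```typescript
// Extract the target node name from a track name such as "Hips.quaternion".
// Assumes the simple "<nodeName>.<property>" form.
function trackTargetName(trackName: string): string {
  const dot = trackName.lastIndexOf(".");
  return dot === -1 ? trackName : trackName.slice(0, dot);
}

// Returns the tracks that would find no bone on the character.
// An empty result means the clip can play directly, no retargeting.
function unresolvedTracks(trackNames: string[], boneNames: string[]): string[] {
  const known = new Set(boneNames);
  return trackNames.filter((t) => !known.has(trackTargetName(t)));
}

// Hypothetical names for illustration only.
const characterBones = ["Root", "Hips", "Spine_01", "Head"];

const matchingPackTracks = [
  "Hips.position",
  "Hips.quaternion",
  "Spine_01.quaternion",
  "Head.quaternion",
];

const foreignPackTracks = [
  "mixamorigHips.quaternion",
  "mixamorigSpine.quaternion",
];

const matchingMisses = unresolvedTracks(matchingPackTracks, characterBones);
const foreignMisses = unresolvedTracks(foreignPackTracks, characterBones);
```

When the miss list is empty, the direct path in Three.js is short: take the clip from the loaded FBX and hand it to the character's mixer with mixer.clipAction(clip).play(). A non-empty list is the signal that the clip was authored against a different skeleton and needs the retargeting route instead.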

The other half of spike 60 is making the clip-picking logic testable. Choosing which clip to play is full of judgment thresholds, and that logic had been buried inside the avatar layer next to FBXLoader and the DOM where it couldn't be exercised. The spike extracts the picker into pure functions that touch neither Three.js nor the window: they take a plain player record (velocity, horizontal speed, facing, grounded, ground normal, impact velocity) and a clip-action stand-in, and return a clip alias string. That lets the thresholds live as named, asserted constants. Jump splits into walking, running, and sprinting variants by speed buckets aligned with the walk controller's actual sprint threshold; landings split into soft, medium, and hard by impact velocity; uphill and downhill clip variants fire from a slope-projection dot product with a flat-clip deadzone below about seven degrees; an idle-to-locomotion bridge always resolves forward so a standing start never plays a clip backward; and a stop bridge is gated by a minimum time-in-loop, because Synty's foot-phase stop clips contain about a second of authored deceleration that looks ridiculous tacked onto a step the player barely took. A small state debouncer holds off the fall state for a few frames so a one-frame loss of ground contact across a seam doesn't flicker the animation. Pulling all of that out of the rendering shell means the rules can be unit tested without a GPU, the same headless discipline the engine itself got in spike 58, which is the difference between locomotion feel you tune by guessing and locomotion feel you can pin down.
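A few of those picker rules, sketched as pure functions. The rule shapes follow the paragraph above, and the seven degree deadzone and the roughly one second stop guard come from the text. All other thresholds and every clip alias are made-up placeholders, and the sketch leaves out the clip-action stand-in argument to stay short.

```typescript
interface PlayerRecord {
  hSpeed: number;
  facing: { x: number; z: number }; // unit vector on the ground plane
  groundNormal: { x: number; y: number; z: number }; // unit vector
  impactVelocity: number; // downward speed at touchdown, positive
}

// Illustrative constants, not the spike's tuned values.
const JUMP_RUN_SPEED = 4;
const JUMP_SPRINT_SPEED = 7; // would mirror the walk controller's sprint threshold
const SOFT_LANDING_MAX = 4;
const MEDIUM_LANDING_MAX = 9;
const SLOPE_DEADZONE_DEG = 7;
const MIN_LOOP_SECONDS_FOR_STOP = 1;

function pickJump(p: PlayerRecord): string {
  if (p.hSpeed >= JUMP_SPRINT_SPEED) return "jumpSprint";
  return p.hSpeed >= JUMP_RUN_SPEED ? "jumpRun" : "jumpWalk";
}

function pickLanding(p: PlayerRecord): string {
  if (p.impactVelocity < SOFT_LANDING_MAX) return "landSoft";
  return p.impactVelocity < MEDIUM_LANDING_MAX ? "landMedium" : "landHard";
}

// Project facing onto the horizontal part of the ground normal. The normal
// leans downhill, so a negative projection means the player faces uphill.
// The magnitude equals sin(slope angle) when facing straight up or down,
// which lets the deadzone be expressed as sin(7 degrees).
function pickSlopeVariant(p: PlayerRecord): "uphill" | "downhill" | "flat" {
  const projection =
    p.facing.x * p.groundNormal.x + p.facing.z * p.groundNormal.z;
  const deadzone = Math.sin((SLOPE_DEADZONE_DEG * Math.PI) / 180);
  if (projection < -deadzone) return "uphill";
  if (projection > deadzone) return "downhill";
  return "flat";
}

// Skip the authored deceleration if the player barely got moving.
function shouldPlayStopBridge(secondsInLoop: number): boolean {
  return secondsInLoop >= MIN_LOOP_SECONDS_FOR_STOP;
}

// Reports airborne only after more than `holdFrames` consecutive
// ungrounded frames, so a one-frame gap at a seam never reaches the picker.
function createFallDebouncer(holdFrames: number) {
  let airborneFrames = 0;
  return (grounded: boolean): boolean => {
    airborneFrames = grounded ? 0 : airborneFrames + 1;
    return airborneFrames > holdFrames;
  };
}
```

One side effect of the projection approach is that walking straight across a steep slope reads as flat, because facing is perpendicular to the normal's horizontal part. Whether that matches the spike's behavior is a guess on my part.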

Technology referenced in this chapter

A locomotion-free physics engine. The capsule engine integrates a kinematic body against terrain and owns no walk, run, jump, friction, speed cap, or gravity. Every behavior is a registered controller exposing tick, wantsControl, applyForces, and optional onEnter/onExit/stateName. Walk is just the lowest-priority always-yes default, and a controller can return ownsCollision: true to take over terrain handling (swim does, so its buoyancy spring doesn't fight foot-snap).

Independent arbitration channels. Controllers register into named channels (resource, stance, locomotion) that arbitrate separately and apply in a fixed order, so observers like stamina and modifiers like crouch coexist with active locomotion instead of faking control and declining it. Resource writes (stamina) are read by stance and locomotion; stance writes (crouch capsule height) are read by locomotion's speed cap; locomotion runs last and owns the velocity write.

One factory, many bodies. createCapsuleEngine is a pure factory with no singletons, so the same engine drives the player, a walk-only rideable horse, and a self-steering NPC fed synthetic input. A body's movement vocabulary is exactly its registered controller set, so the horse can't climb or swim because those controllers were never added. Mounting lives at the host level, not in a controller, because it coordinates two engines and a controller can't see past the engine it runs in. See GPU-driven LOD.

Headless, deterministic testing. The engine touches no window, document, or Three.js, so a Node harness drives it frame by frame and asserts on jump-and-land, swim thresholds, climb auto-clear, and stamina drain. Replaying identical input twice and checking for identical output proves determinism, and a regression test pins the descent gate that stops a horizontal walk into a steep face from sliding the player backward.

Body-agnostic avatar binding with a testable picker. Controllers report a state-name string and the visual layer maps it to a clip, so swapping a capsule for a retargeted 3MIKE + UAL avatar touches zero locomotion code. Synty POLYGON clips share bone names with the rig and play with no retargeting (Three.js binds tracks by bone name), unlike the UAL path's heavyweight retargeting. The clip picker is extracted into pure functions with named constants for jump speed buckets, landing severity, slope-projection variants, an always-forward idle bridge, a stop-bridge minimum-time guard, and a fall-state debouncer, so locomotion feel is unit-testable rather than guessed.

Part 29 of 29. This is the final part of the series so far.

Previous: Part 28 - Grass to the horizon, and ground that hides itself

Series guide: /blog/2026-02-25-open-world-browser-series-guide
