Building an open world in the browser, part 17: Animations that didn't need retargeting, and a live asset search

By Oleg Sidorkin, CTO and Co-Founder of Cinevva

New here? Use the series guide. It explains what a spike is and links all parts.

Part 16 gave us a world you can place objects in. This part gives the player something better to do in it: a combat-grade animation set, and a way to pull any of a thousand CC0 models into the world by typing a search.

262 clips, one skeleton, three retargeting fixes

Open Spike 35 in a new tab ↗ · View source

The player rig is a CC4-style 3MIKE skeleton with 213 bones, including an 80-plus-bone facial rig and full finger joints. The animation sources don't match it. The first cut retargeted hand-picked Mixamo combat clips, but the source pool was thin, and locomotion was split awkwardly between Mixamo and a few Kimodo BVH idles. The real win came when we switched the source library to Quaternius's two Universal Animation Library (UAL) packs: 262 clips on one consistent UE5-style 65-bone mannequin, covering sword combos, bow archery, climbs, wallruns, dodges, hit reactions, emotes, and a much cleaner locomotion set.

That's a 65-bone source driving a 213-bone target, and the two rigs share no bone names, no reference pose, and no limb proportions. Every clip gets remapped in three steps: a bone-name map, a bind-pose alignment so both rigs share a reference orientation, and position-track scaling so the shorter source rig doesn't put the character knee-deep in the floor. Getting that right meant chasing a sequence of bugs, each of which taught something specific.
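To make the first two steps concrete, here is a minimal sketch of a bone-name map plus bind-delta rotation transfer. It assumes local-space quaternions per bone and per frame, and bind rotations that have already been aligned to a shared reference pose. The convention `target = targetBind * inverse(sourceBind) * sourceFrame` is one common way to transfer a delta, not necessarily the exact math in the spike. Bone names such as `upperarm_l` and `CC_Base_L_Upperarm` are hypothetical, and position scaling is left out.

```typescript
// Quaternions stored as [x, y, z, w]; only the operations this sketch needs.
type Quat = [number, number, number, number];

// Hamilton product a * b.
function qMul(a: Quat, b: Quat): Quat {
  const [ax, ay, az, aw] = a;
  const [bx, by, bz, bw] = b;
  return [
    aw * bx + ax * bw + ay * bz - az * by,
    aw * by - ax * bz + ay * bw + az * bx,
    aw * bz + ax * by - ay * bx + az * bw,
    aw * bw - ax * bx - ay * by - az * bz,
  ];
}

// The inverse of a unit quaternion is its conjugate.
function qConj(q: Quat): Quat {
  return [-q[0], -q[1], -q[2], q[3]];
}

// Rotation of `rad` radians about a unit axis.
function qAxisAngle(ax: number, ay: number, az: number, rad: number): Quat {
  const s = Math.sin(rad / 2);
  return [ax * s, ay * s, az * s, Math.cos(rad / 2)];
}

// Steps 1 and 2: rename bones, then transfer each frame's rotation as a delta
// from the source bind, re-applied on top of the aligned target bind.
function retargetRotations(
  clip: Record<string, Quat[]>, // source bone -> local rotation per frame
  boneMap: Record<string, string>, // source bone -> target bone (hypothetical names)
  sourceBind: Record<string, Quat>, // source local bind rotations
  targetBind: Record<string, Quat>, // target local bind rotations, already aligned
): Record<string, Quat[]> {
  const out: Record<string, Quat[]> = {};
  for (const [src, frames] of Object.entries(clip)) {
    const dst = boneMap[src];
    if (dst === undefined) continue; // unmapped bones get no track at all
    const invSrcBind = qConj(sourceBind[src]);
    const dstBind = targetBind[dst];
    out[dst] = frames.map((q) => qMul(dstBind, qMul(invSrcBind, q)));
  }
  return out;
}
```

Note the consequence of the `continue`: a source bone with no map entry writes nothing, so the matching target bone simply holds its bind pose.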

The first result came out over-twisted, with every shoulder and elbow rotated about 30° too far. The cause was a pose mismatch: the UAL source ships a true T-pose, CC4's bind is an A-pose, and the retargeter assumed both rigs sat in matching reference poses, so the A-to-T difference got added into every per-frame delta. The fix forces CC4's arm chain into a true T-pose before capturing the bind, then retargets, so the deltas stay small.
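A deliberately simplified single-axis model shows why the error equals the A-to-T difference. Rotations about one axis just add, so a retarget reduces to `targetBind + (sourceFrame - sourceBind)`. The -30° A-pose angle and the -45° frame below are hypothetical numbers chosen for illustration; real arm chains need the full quaternion treatment.

```typescript
// Single-axis model: shoulder angle in degrees about one axis, so rotations
// simply add. 0 = arm horizontal (T-pose); negative = arm lowered.
const UAL_BIND_DEG = 0; // UAL source ships a true T-pose
const CC4_A_POSE_BIND_DEG = -30; // hypothetical A-pose angle, for illustration only
const CC4_FORCED_T_BIND_DEG = 0; // arm chain forced into a T-pose before capture

// Transfer the source's per-frame delta onto the target's bind.
function retargetDeg(sourceFrameDeg: number, sourceBindDeg: number, targetBindDeg: number): number {
  const delta = sourceFrameDeg - sourceBindDeg;
  return targetBindDeg + delta;
}

const sourceFrame = -45; // source arm lowered 45 degrees in some frame

const buggy = retargetDeg(sourceFrame, UAL_BIND_DEG, CC4_A_POSE_BIND_DEG);
const fixed = retargetDeg(sourceFrame, UAL_BIND_DEG, CC4_FORCED_T_BIND_DEG);

console.log(`source ${sourceFrame}, buggy ${buggy}, fixed ${fixed}`);
// prints "source -45, buggy -75, fixed -45": the buggy result is off by exactly
// the A-to-T difference, and capturing the bind in a T-pose removes it.
```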

Then the body stopped translating: knockbacks recoiled the upper body but left the feet planted. UAL puts displacement on the root bone, not the pelvis, so reading the pelvis position alone gave near-zero motion. The fix sums root.position and pelvis.position, scales the horizontal part, and writes a single hip-position track. While doing this we found vertical offsets were off by about 12%, because we had used the global limb-proportion ratio for the Y axis when pelvis height needs the hip-to-floor ratio. Two ratios, two axes: $\text{verticalRatio} = \left| \text{targetHipY} / \text{sourceHipY} \right|$ for Y, and the proportion ratio for the horizontal. With that split, the vertical overshoot was gone.

The root bone caused one more problem: a stream of RL_BoneRoot.position NaN warnings. We had mapped UAL's root (at $Y = 0$) onto a target bone whose position track gets normalized by dividing by its bind Y. Divide by zero, NaN, silently dropped track. The fix was to drop the root mapping entirely, since its translation already lives in the hip track.
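Here is a sketch of the root-plus-pelvis sum and the two-ratio scaling. It assumes the root bone is unrotated and unscaled, so adding the two local positions gives the pelvis position relative to the floor; the function and field names are hypothetical. The `normalizeByBindY` guard is an extra added for illustration: the spike's actual fix was removing the root mapping, and the guard only turns a silent NaN into a loud error.

```typescript
type Vec3 = { x: number; y: number; z: number };

interface HipScaling {
  horizontalRatio: number; // global limb-proportion ratio (overall rig size)
  targetHipY: number; // target pelvis height above the floor at bind
  sourceHipY: number; // source pelvis height above the floor at bind
}

// Sum root + pelvis per frame (UAL keeps displacement on the root), then scale
// X/Z by the proportion ratio and Y by |targetHipY / sourceHipY|.
function buildHipTrack(root: Vec3[], pelvis: Vec3[], s: HipScaling): Vec3[] {
  if (root.length !== pelvis.length) throw new Error("root and pelvis tracks differ in length");
  if (s.sourceHipY === 0) throw new Error("sourceHipY is zero; cannot form the vertical ratio");
  const verticalRatio = Math.abs(s.targetHipY / s.sourceHipY);
  return root.map((r, i) => {
    const p = pelvis[i];
    return {
      x: (r.x + p.x) * s.horizontalRatio,
      y: (r.y + p.y) * verticalRatio,
      z: (r.z + p.z) * s.horizontalRatio,
    };
  });
}

// Hypothetical guard: normalizing by a bind height of ~0 (like a root at Y = 0)
// would produce NaN, so fail loudly instead of emitting a poisoned track.
function normalizeByBindY(bone: string, trackY: number[], bindY: number): number[] {
  if (Math.abs(bindY) < 1e-6) throw new Error(`${bone}: bind Y is ~0, refusing to normalize`);
  return trackY.map((y) => y / bindY);
}
```

At the bind frame the scaled height is `sourceHipY * targetHipY / sourceHipY = targetHipY`, which is exactly the property the single global ratio was breaking.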

The smallest fix was the most satisfying to watch. The character gripped the sword with limp bind-pose fingers because the bone map had no finger entries, and the retargeter only writes tracks for mapped bones. Adding 30 finger bones (five fingers, three segments, two hands, dropping UAL's fourth tip helper that doesn't deform) made the hand close on the grip and open on the release.
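Generating the finger entries rather than typing 30 of them by hand avoids typos. The naming patterns below (`index_01_l` on the UAL side, `CC_Base_L_Index1` on the CC4 side) are assumptions for illustration; check both rigs' real bone names before using anything like this.

```typescript
// Hypothetical naming patterns: UAL/UE5-style "index_01_l" on the source,
// CC-style "CC_Base_L_Index1" on the target.
const FINGERS = ["thumb", "index", "middle", "ring", "pinky"] as const;
// Segments 1 to 3 deform; UAL's fourth tip helper does not, so it is skipped.
const SEGMENTS = [1, 2, 3] as const;
const SIDES = [["l", "L"], ["r", "R"]] as const;

function buildFingerMap(): Record<string, string> {
  const map: Record<string, string> = {};
  for (const [srcSide, dstSide] of SIDES) {
    for (const finger of FINGERS) {
      const cap = finger.charAt(0).toUpperCase() + finger.slice(1);
      for (const seg of SEGMENTS) {
        map[`${finger}_0${seg}_${srcSide}`] = `CC_Base_${dstSide}_${cap}${seg}`;
      }
    }
  }
  return map;
}

const fingerMap = buildFingerMap();
console.log(Object.keys(fingerMap).length); // 30 = 5 fingers x 3 segments x 2 hands
```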

Proving 262 clips without watching 262 clips

You can't eyeball-verify a 262-clip library, so we built an offline trajectory parity test: a headless Node script that loads each pack, samples the source skeleton at 60Hz, runs the retargeting pipeline, and compares per-bone world positions and rotations against the source after scaling. Pelvis Y drift came in at 0.003 m max. The hands showed a constant 2.5° offset that first read as "fingers aren't tracking," but a constant offset is the bind delta between the flat UAL hand and CC4's slightly cupped bind, and it's invariant across the clip. Real animation errors show up as drift that varies frame to frame. Once that was clear, the test became a one-shot regression check: if a clip's drift stops holding that constant baseline, a recent change broke retargeting.
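The constant-versus-varying distinction can be sketched as a per-bone classifier over the sampled error series. The tolerances, the `ParityResult` shape, and the regression rule are hypothetical; the point is that a steady bind offset and real frame-to-frame error are separated by the spread of the errors, not by their size.

```typescript
type Quat = [number, number, number, number]; // [x, y, z, w], unit length

// Angle between two orientations in degrees (sign-insensitive: q and -q match).
function quatAngleDeg(a: Quat, b: Quat): number {
  const dot = Math.abs(a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]);
  return (2 * Math.acos(Math.min(1, dot)) * 180) / Math.PI;
}

type Verdict = "match" | "constant-bind-delta" | "drift";

interface ParityResult {
  meanDeg: number;
  spreadDeg: number;
  verdict: Verdict;
}

// errorsDeg: one angular error per sampled frame for a single bone.
function classifyBoneParity(errorsDeg: number[], matchTolDeg = 0.5, spreadTolDeg = 0.5): ParityResult {
  if (errorsDeg.length === 0) throw new Error("no frames sampled");
  const meanDeg = errorsDeg.reduce((sum, e) => sum + e, 0) / errorsDeg.length;
  const spreadDeg = Math.max(...errorsDeg) - Math.min(...errorsDeg);
  let verdict: Verdict;
  if (spreadDeg > spreadTolDeg) verdict = "drift"; // varies frame to frame: a real error
  else if (meanDeg > matchTolDeg) verdict = "constant-bind-delta"; // steady offset: bind difference
  else verdict = "match";
  return { meanDeg, spreadDeg, verdict };
}

// Regression rule: a bone regresses if it starts drifting or its constant
// offset moves away from the recorded baseline.
function hasRegressed(current: ParityResult, baseline: ParityResult, tolDeg = 0.25): boolean {
  return current.verdict === "drift" || Math.abs(current.meanDeg - baseline.meanDeg) > tolDeg;
}
```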

A side-by-side reference mannequin made the visual half of debugging decisive. Pressing backslash shows whichever source rig owns the current clip next to the player, so a "twisted shoulder" question becomes "is the twist in the source, or did retargeting add it?" Numerical tests catch regressions, the visual reference catches bind-pose mistakes the numbers don't surface, and together they retired the guessing game.

With the pipeline solid, switching the WASD/jump/swim baseline from Mixamo to UAL was a thin alias map: the FSM still speaks generic state names like idle and walk, resolved to UAL clip names at playback. Swimming needed per-clip rig offsets because the freestyle and tread-water poses anchor the pelvis at different anatomical heights, so we submerge the rig half a meter for active swimming and deeper for treading, blending between them at a smooth 5Hz.
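A sketch of the alias map and the depth blend, under stated assumptions: only `idle` and `walk` come from the text, the other states and all UAL clip names are placeholders, the 0.8 m treading depth stands in for "deeper", and "5Hz" is interpreted as an exponential smoothing rate of 5 per second.

```typescript
// Generic FSM states; "idle" and "walk" come from the text, the rest are hypothetical.
type LocoState = "idle" | "walk" | "run" | "jump" | "swim" | "tread";

// Placeholder UAL clip names; the real names in the packs will differ.
const CLIP_ALIASES: Record<LocoState, string> = {
  idle: "Idle_Loop",
  walk: "Walk_Loop",
  run: "Jog_Fwd_Loop",
  jump: "Jump_Start",
  swim: "Swim_Fwd_Loop",
  tread: "Swim_Idle_Loop",
};

// The FSM keeps speaking generic names; resolution happens at playback.
function resolveClip(state: LocoState): string {
  return CLIP_ALIASES[state];
}

// Per-clip rig offsets in meters below the water line: 0.5 m for active
// swimming (from the text) and a hypothetical 0.8 m for treading water.
const SWIM_DEPTH: Partial<Record<LocoState, number>> = { swim: 0.5, tread: 0.8 };

function targetDepthFor(state: LocoState): number {
  return SWIM_DEPTH[state] ?? 0;
}

// Exponential approach toward the target depth. With alpha = 1 - exp(-rate * dt),
// two half-length steps mathematically land where one full step would, so the
// blend looks the same at any frame rate.
function smoothDepth(current: number, target: number, dt: number, rateHz = 5): number {
  const alpha = 1 - Math.exp(-rateHz * dt);
  return current + (target - current) * alpha;
}
```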

Type a word, get a model

Open Spike 36 in a new tab ↗ · View source

Spike 34 ate a day hand-curating a CC0 pack. The long-term answer is a search box. Polyhaven publishes about 1,100 CC0 models behind a permissive JSON API and a deterministic CDN, and this spike wires the whole path (query, thumbnails, load, render) from a static page with no build step, in roughly 300 lines of vanilla JS plus three.js.

On boot it fetches the full catalog once, about 600 KB. Search is pure client-side scoring (name beats id, id beats category, category beats tag) with a 120 ms debounce, rendering the top 60 cards. Thumbnails lazy-load through an IntersectionObserver so typing doesn't fire 60 requests at once.

The interesting piece is loading. Polyhaven's file endpoint exposes multi-file glTF, not GLB, with textures split into separate files and shared across resolutions, and it hands back an include map from each relative path to its absolute CDN URL. Rather than download and patch the glTF JSON ourselves, we feed that map through LoadingManager.setURLModifier, which fires for every dependency the loader requests (the .bin, each texture) and resolves it through the CDN. One click, one apparent file. Both the API and the CDN send permissive CORS headers (we verified this with curl before writing any client code), so no proxy is needed.

PBR materials render correctly under default RoomEnvironment lighting and ACES tone mapping, with no per-asset fix-ups, and 1k textures keep a typical model at 2 to 5 MB instead of 20 to 40 MB at 4k.
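The client-side scoring can be sketched as follows. The catalog shape (id mapped to `name`, `categories`, `tags`) is an assumption about the Polyhaven listing, and the 8/4/2/1 weights are hypothetical: they are powers of two so a single name match outranks an id, category, and tag match combined.

```typescript
// Assumed catalog shape: id -> { name, categories, tags }.
interface AssetInfo {
  name: string;
  categories: string[];
  tags: string[];
}
type Catalog = Record<string, AssetInfo>;

// Hypothetical weights encoding "name beats id beats category beats tag".
const WEIGHTS = { name: 8, id: 4, category: 2, tag: 1 };

function scoreAsset(id: string, a: AssetInfo, terms: string[]): number {
  let score = 0;
  for (const t of terms) {
    if (a.name.toLowerCase().includes(t)) score += WEIGHTS.name;
    if (id.toLowerCase().includes(t)) score += WEIGHTS.id;
    if (a.categories.some((c) => c.toLowerCase().includes(t))) score += WEIGHTS.category;
    if (a.tags.some((g) => g.toLowerCase().includes(t))) score += WEIGHTS.tag;
  }
  return score;
}

// Rank every asset, drop non-matches, keep the top `limit` ids.
function search(catalog: Catalog, query: string, limit = 60): string[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return [];
  return Object.entries(catalog)
    .map(([id, a]) => ({ id, score: scoreAsset(id, a, terms) }))
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score || x.id.localeCompare(y.id))
    .slice(0, limit)
    .map((r) => r.id);
}

// Trailing-edge debounce: only the last call within `ms` actually runs.
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms = 120): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}
```

In the page, the debounced wrapper would sit on the search input's `input` event and re-render the cards from `search(catalog, query)`.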

A meshoptimizer WASM pass rounds it out with a non-destructive reducer: each mesh stashes a clone of its original geometry, and changing the ratio rebuilds an index buffer from that clone rather than simplifying cumulatively. Multi-material geometry simplifies per group and rebuilds geometry.groups so material slots don't collapse. An armchair goes from 5,626 triangles at full to 2,812 at half.
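The non-destructive reducer boils down to always slicing from the stashed original, never from the previous result. In this sketch the simplifier is injected as a plain function so the bookkeeping runs on its own; in the browser it would presumably wrap meshoptimizer's `MeshoptSimplifier.simplify` (check its current signature in the meshoptimizer docs). The `SimplifiableMesh` shape is a hypothetical stand-in for a three.js `BufferGeometry` plus its stashed clone; a single-material mesh would use one group covering the whole index buffer.

```typescript
interface Group {
  start: number;
  count: number;
  materialIndex: number;
}

// Hypothetical stand-in for a geometry plus its stashed full-detail clone.
interface SimplifiableMesh {
  positions: Float32Array; // xyz per vertex, never modified
  originalIndex: Uint32Array; // stashed clone of the full-detail index buffer
  originalGroups: Group[]; // stashed clone of the full-detail groups
  index: Uint32Array; // current (possibly reduced) index buffer
  groups: Group[]; // current groups, rebuilt on every change
}

// Injected simplifier: returns a reduced index list for one slice.
type SimplifyFn = (indices: Uint32Array, positions: Float32Array, targetIndexCount: number) => Uint32Array;

function applyRatio(mesh: SimplifiableMesh, ratio: number, simplify: SimplifyFn): void {
  const parts: Uint32Array[] = [];
  const groups: Group[] = [];
  let cursor = 0;
  for (const g of mesh.originalGroups) {
    // Always start from the original slice, so repeated changes never compound.
    const slice = mesh.originalIndex.subarray(g.start, g.start + g.count);
    // Target must be a whole number of triangles.
    const target = Math.max(3, Math.floor((g.count * ratio) / 3) * 3);
    const reduced = ratio >= 1 ? slice : simplify(slice, mesh.positions, target);
    parts.push(reduced);
    // Rebuild the group so this material slot keeps its own range.
    groups.push({ start: cursor, count: reduced.length, materialIndex: g.materialIndex });
    cursor += reduced.length;
  }
  const index = new Uint32Array(cursor);
  let offset = 0;
  for (const p of parts) {
    index.set(p, offset);
    offset += p.length;
  }
  mesh.index = index;
  mesh.groups = groups;
}
```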

Technology referenced in this chapter

Skeleton retargeting with bind-pose alignment. Mapping animation from one skeleton to another with different bone names, proportions, and reference poses requires three corrections: a bone-name map, a bind-pose alignment so both rigs share a reference orientation (forcing the target's A-pose arm chain into the source's T-pose), and position-track scaling. A pose mismatch adds the A-to-T rotation into every per-frame delta, over-rotating every affected joint by that difference. Bones whose motion is already carried by another track should be left out of the map: a root bone at $Y = 0$, mapped onto a target whose position track is normalized by its bind Y, triggers a silent divide-by-zero.

Two scale ratios for one rig. Horizontal displacement uses the global limb-proportion ratio (overall skeleton size), but vertical pelvis offset uses the hip-to-floor ratio $\left| \text{targetHipY} / \text{sourceHipY} \right|$, because the leg-to-torso proportion differs between rigs. Using one ratio for both axes leaves climbs overshooting and recoveries sinking below the floor. UAL displacement also lives on the root bone, not the pelvis, so both must be summed into a single hip-position track to preserve motion across knockback, climb, and locomotion clips.

Offline trajectory parity testing. A headless script samples the source skeleton at 60Hz, runs the retargeting pipeline, and compares per-bone world transforms against the source. A constant per-frame offset is the harmless bind delta, while drift that varies frame to frame is a real error, so the test becomes a regression check that fires when a change drops or distorts a track. Per-pack GLB isolation (fresh load, freed before the next) avoids cache contention across the sweep.

LoadingManager.setURLModifier for CDN glTF graphs. When a CDN ships glTF as a relative-URI graph with an accompanying include map (relative path to absolute URL), setURLModifier resolves every dependency the loader requests through the CDN without rewriting the JSON. This collapses a multi-file, multi-resolution distribution into a single-click load. Disposing each prior model's geometry, materials, and texture maps before the next load prevents hundreds of MB of GPU memory accumulating across a browsing session.
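Both halves of that paragraph fit in a short sketch. The interfaces are minimal structural stand-ins, so the code runs without three.js; a `THREE.LoadingManager` already has a compatible `setURLModifier`, though passing a real scene graph to `disposeModel` may need a cast. The include-map shape (a URL string or an object with a `url` field) is an assumption about the Polyhaven files response.

```typescript
// Assumed include-map shape: relative path -> URL string or { url }.
type IncludeMap = Record<string, string | { url: string }>;

// Anything exposing setURLModifier, such as a three.js LoadingManager.
interface URLModifierHost {
  setURLModifier(transform?: (url: string) => string): unknown;
}

function installIncludeMap(manager: URLModifierHost, gltfUrl: string, include: IncludeMap): void {
  // Relative URIs in the .gltf are resolved against the file's directory.
  const base = gltfUrl.slice(0, gltfUrl.lastIndexOf("/") + 1);
  manager.setURLModifier((url: string) => {
    const rel = url.startsWith(base) ? url.slice(base.length) : url;
    const hit = include[rel] ?? include[decodeURIComponent(rel)];
    if (hit === undefined) return url; // the .gltf itself or anything unmapped
    return typeof hit === "string" ? hit : hit.url;
  });
}

interface Disposable {
  dispose(): void;
}
interface MaterialLike extends Disposable {
  [key: string]: unknown;
}
interface Object3DLike {
  geometry?: Disposable;
  material?: MaterialLike | MaterialLike[];
  children: Object3DLike[];
}

function isTexture(v: unknown): v is Disposable {
  return (
    typeof v === "object" && v !== null &&
    (v as { isTexture?: unknown }).isTexture === true &&
    typeof (v as { dispose?: unknown }).dispose === "function"
  );
}

// Free geometry, materials, and texture maps of the previous model, each once.
function disposeModel(root: Object3DLike): void {
  const geometries = new Set<Disposable>();
  const materials = new Set<MaterialLike>();
  const textures = new Set<Disposable>();
  const visit = (o: Object3DLike): void => {
    if (o.geometry) geometries.add(o.geometry);
    const mats = o.material === undefined ? [] : Array.isArray(o.material) ? o.material : [o.material];
    for (const m of mats) {
      materials.add(m);
      // Texture slots (map, normalMap, roughnessMap, ...) are plain properties.
      for (const value of Object.values(m)) if (isTexture(value)) textures.add(value);
    }
    o.children.forEach(visit);
  };
  visit(root);
  geometries.forEach((g) => g.dispose());
  materials.forEach((m) => m.dispose());
  textures.forEach((t) => t.dispose());
}
```

Collecting into sets first means a geometry or material shared by several meshes is disposed once, not once per mesh.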

Non-destructive mesh simplification. Storing a clone of each mesh's original geometry and rebuilding only the index buffer per simplification ratio keeps changes fast and avoids cumulative damage from repeated simplification. Running per geometry.groups slice and reconstructing the groups preserves multi-material assignments. See LOD and meshoptimizer for how this feeds distance-based LOD.

Part 17 of 29.
Previous: Part 16 - Structure for a world that keeps growing
Next: Part 18 - A scatter brush that feels AI-placed
Series guide: /blog/2026-02-25-open-world-browser-series-guide
