Building an open world in the browser, part 18: A scatter brush that feels AI-placed

By Oleg Sidorkin, CTO and Co-Founder of Cinevva

New here? Use the series guide. It explains what a spike is and links all parts.

Part 17 gave the player a combat-grade animation set and a way to pull any CC0 model into the world. This part goes back to the creator's side. Spike 34's palette places one prop per click, which is fine for staging a hero object and useless for a forest. Spike 37 is the brush: drag across the terrain and trees fill in where trees belong.

"AI-placed" without an AI

The question this spike answers is whether a purely heuristic brush feels intelligent enough to skip the LLM. The test of "AI-placed" is concrete: trees stay off cliffs, rocks tilt into the slope, beach pebbles stop at the waterline, all on the first stroke. We got there with slope and altitude predicates, weighted draws, and per-family spacing, and not a single model call.

The brush works on a 257×257 CPU heightmap with hand-tuned features so every preset has somewhere to land: northern mountains for the mixed-slope picks, an eastern cliff strip for scree, a southern coastal plain for beach and meadow, a lake bowl in the south-west. The terrain bakes vertex colors from an (altitude, slope) biome classifier, so before you paint a single tree you can see where a preset will fire. Five presets ship as flat data, each a list of picks like { category, weight, slopeMin, slopeMax, altMin, altMax, minSpacing, alignToSlope }. The Cliff and Scree preset sets slopeMin: 0.3 so rocks only land on actual slopes, and alignToSlope: true so each boulder's up vector follows the surface normal.
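A minimal sketch of what one of those flat-data presets could look like. The field names come from the pick shape quoted above; the units, the preset-level fields (name, densityPerM2 living on the preset), the category names, and every number except slopeMin: 0.3 and alignToSlope: true are assumptions for illustration, not the shipped recipe.

```typescript
// Field names follow the pick shape quoted in the article; values are illustrative.
interface ScatterPick {
  category: string;      // prop family to draw from (names below are hypothetical)
  weight: number;        // relative weight among picks whose predicates pass
  slopeMin: number;      // assumed unit: 0 = flat, 1 = vertical
  slopeMax: number;
  altMin: number;        // assumed unit: metres
  altMax: number;
  minSpacing: number;    // assumed unit: metres between props of the same family
  alignToSlope: boolean; // true: the prop's up vector follows the surface normal
}

interface ScatterPreset {
  name: string;          // hypothetical field
  densityPerM2: number;  // assumed to live on the preset
  picks: ScatterPick[];
}

const cliffAndScree: ScatterPreset = {
  name: "Cliff and Scree",
  densityPerM2: 0.2,
  picks: [
    { category: "boulder", weight: 3, slopeMin: 0.3, slopeMax: 0.9, altMin: 0, altMax: 400, minSpacing: 2.5, alignToSlope: true },
    { category: "rubble", weight: 1, slopeMin: 0.3, slopeMax: 0.7, altMin: 0, altMax: 400, minSpacing: 1.0, alignToSlope: true },
  ],
};

// Keep only the picks whose slope and altitude predicates pass at a sample point.
function passingPicks(preset: ScatterPreset, slope: number, altitude: number): ScatterPick[] {
  return preset.picks.filter(
    (p) => slope >= p.slopeMin && slope <= p.slopeMax && altitude >= p.altMin && altitude <= p.altMax,
  );
}
```

Because the preset is plain data, the predicate filter is the only code that has to understand it, and an empty result is itself the "nothing belongs here" signal.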

For each stroke the scatter engine samples densityPerM2 × area candidate points inside the brush disc, reads height and slope per candidate, filters the preset's picks to those whose predicates pass, weighted-draws one, then runs a spacing check against an in-radius spatial hash. The whole thing is deterministic: a seedable Mulberry32 RNG owns every draw, so (seed, brush events) reproduces any session exactly. On the boot terrain a Mixed Forest stroke on flat meadow placed 139 of 158 candidates in 5 ms, while the same preset on a cliff placed only 106 of 226, and the HUD attributed 81 of the 120 rejections to slope. That rejection breakdown is the whole UX: you can see why the cliff took fewer than half its candidates instead of guessing.
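The stroke loop described above can be sketched roughly as follows. This is a simplified stand-in, not the spike's engine: it filters on slope only, and the names BrushPick, SpacingHash, and paintStroke, the cell size, and the slopeAt callback are all hypothetical. The Mulberry32 generator is the widely circulated public-domain formulation.

```typescript
type Rng = () => number;

// Mulberry32: small seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface BrushPick { category: string; weight: number; slopeMin: number; slopeMax: number; minSpacing: number; }
interface Placement { x: number; z: number; category: string; }
interface StrokeResult { placed: Placement[]; rejectedBySlope: number; rejectedBySpacing: number; }

function weightedDraw(picks: BrushPick[], rng: Rng): BrushPick {
  const total = picks.reduce((sum, p) => sum + p.weight, 0);
  let r = rng() * total;
  for (const p of picks) {
    r -= p.weight;
    if (r < 0) return p;
  }
  return picks[picks.length - 1]; // guard against float rounding
}

// Per-family spatial hash: only props of the same category block each other.
class SpacingHash {
  private cells = new Map<string, Placement[]>();
  private cellSize: number;
  constructor(cellSize: number) { this.cellSize = cellSize; }

  private key(category: string, cx: number, cz: number): string {
    return `${category}:${cx}:${cz}`;
  }

  isFree(x: number, z: number, category: string, minSpacing: number): boolean {
    const reach = Math.ceil(minSpacing / this.cellSize);
    const cx = Math.floor(x / this.cellSize);
    const cz = Math.floor(z / this.cellSize);
    for (let dx = -reach; dx <= reach; dx++) {
      for (let dz = -reach; dz <= reach; dz++) {
        for (const p of this.cells.get(this.key(category, cx + dx, cz + dz)) ?? []) {
          if ((p.x - x) ** 2 + (p.z - z) ** 2 < minSpacing ** 2) return false;
        }
      }
    }
    return true;
  }

  insert(p: Placement): void {
    const k = this.key(p.category, Math.floor(p.x / this.cellSize), Math.floor(p.z / this.cellSize));
    const cell = this.cells.get(k);
    if (cell) cell.push(p); else this.cells.set(k, [p]);
  }
}

function paintStroke(
  cx: number, cz: number, radius: number, densityPerM2: number,
  picks: BrushPick[], slopeAt: (x: number, z: number) => number,
  rng: Rng, hash: SpacingHash,
): StrokeResult {
  const result: StrokeResult = { placed: [], rejectedBySlope: 0, rejectedBySpacing: 0 };
  const candidates = Math.round(densityPerM2 * Math.PI * radius * radius);
  for (let i = 0; i < candidates; i++) {
    const r = radius * Math.sqrt(rng()); // sqrt keeps the disc uniformly covered
    const theta = 2 * Math.PI * rng();
    const x = cx + r * Math.cos(theta);
    const z = cz + r * Math.sin(theta);
    const slope = slopeAt(x, z);
    const passing = picks.filter((p) => slope >= p.slopeMin && slope <= p.slopeMax);
    if (passing.length === 0) { result.rejectedBySlope++; continue; }
    const pick = weightedDraw(passing, rng);
    if (!hash.isFree(x, z, pick.category, pick.minSpacing)) { result.rejectedBySpacing++; continue; }
    const placement: Placement = { x, z, category: pick.category };
    hash.insert(placement);
    result.placed.push(placement);
  }
  return result;
}
```

Every random decision goes through the one rng, so replaying the same seed against the same brush events yields the same placements, and the two rejection counters are what a HUD breakdown would read from.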

The point of keeping presets as flat data is that the LLM version, when it lands, is a JSON swap rather than a rewrite. paint({ preset }) doesn't care whether preset.picks came from a hand-tuned recipe or a worker that expanded "deciduous forest with mossy boulders" into weights. The engine never hardcodes a prop id either, so dropping in a different catalog needs no engine changes.

From 300 draw calls to 49

The first cut rendered each placement as a clone(true) of a multi-mesh group, which is fine at a few hundred props and a wall at the 2,500 cap, where draw calls climb into the thousands. We swapped to InstancedMesh before that hit, with one bucket per (propId, partIndex). Each bucket grows by doubling: allocate a bigger InstancedMesh, copy the live matrices, swap the scene parent, dispose the old attribute. Erase is a swap-remove, so removing one instance is O(1) regardless of bucket size. Determinism, spacing, and the rejection HUD all carry through unchanged because the swap lives entirely below the placement record.
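The grow-by-doubling and swap-remove mechanics can be sketched without three.js by treating a bucket as a flat array of 4×4 matrices. MatrixBucket and its members are hypothetical names; in the real spike the storage is an InstancedMesh's instance matrix, and growing also involves swapping the mesh in the scene and disposing the old attribute, which this sketch leaves out.

```typescript
const FLOATS_PER_MATRIX = 16;

class MatrixBucket {
  matrices: Float32Array;
  count = 0;
  private slotToId: number[] = [];            // back-references: slot -> placement id
  private idToSlot = new Map<number, number>();

  constructor(capacity: number) {
    this.matrices = new Float32Array(Math.max(1, capacity) * FLOATS_PER_MATRIX);
  }

  add(id: number, matrix: ArrayLike<number>): void {
    if (this.count * FLOATS_PER_MATRIX === this.matrices.length) {
      // Full: allocate double the capacity and copy the live matrices across.
      const grown = new Float32Array(this.matrices.length * 2);
      grown.set(this.matrices);
      this.matrices = grown;
    }
    this.matrices.set(matrix, this.count * FLOATS_PER_MATRIX);
    this.slotToId[this.count] = id;
    this.idToSlot.set(id, this.count);
    this.count++;
  }

  // O(1): move the last instance into the freed slot, then shrink by one.
  remove(id: number): boolean {
    const slot = this.idToSlot.get(id);
    if (slot === undefined) return false;
    const last = this.count - 1;
    if (slot !== last) {
      const from = last * FLOATS_PER_MATRIX;
      this.matrices.copyWithin(slot * FLOATS_PER_MATRIX, from, from + FLOATS_PER_MATRIX);
      const movedId = this.slotToId[last];
      this.slotToId[slot] = movedId;
      this.idToSlot.set(movedId, slot);       // patch the moved instance's index
    }
    this.slotToId.pop();
    this.idToSlot.delete(id);
    this.count = last;
    return true;
  }

  slotOf(id: number): number | undefined {
    return this.idToSlot.get(id);
  }
}
```

The placement record only ever holds an id, which is why the swap can reorder slots freely without the layers above noticing.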

A diagnostic on the MegaKit pack settled a real architecture question. A multi-primitive glTF mesh (trunk plus leaves) can reach three.js as either one mesh with an array material and geometry.groups, or as separate sibling meshes with one material each. The loader takes the second path for this pack: every part is a single-material mesh with empty groups. That's the better shape for scatter, because separate buckets per primitive let the trunk bucket grow independently from the leaf bucket if their counts diverge. Same draw-call count either way, better memory shape with the split. The measured win held up: a forest stroke that was ~300 draw calls became 49, and a full multi-stroke session reached 3,221 instances at 75 FPS in 51 draw calls, a count the clone path could never reach before frame budget collapsed.

Distance LOD, and four bugs hiding in it

Instancing cut draw calls but every instance still drew its full triangle count, even the trees 90 m out contributing two pixels of leaf detail. So we baked three LOD levels per prop part with meshoptimizer (full, 50%, 15%), extended the bucket key to (propId, partIndex, lod), and added a move() that shuttles a placement between sibling buckets with no allocation. Distance bands are 0 to 30 m, 30 to 90 m, and beyond, with ±4 m of hysteresis around each boundary so a camera hovering near a band edge doesn't thrash a placement back and forth re-uploading its matrix every frame. Re-evaluation is capped at 4 Hz and gated on the camera actually having moved, so a still camera costs one squared-distance compare per frame.
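A rough sketch of band selection with hysteresis and of the re-evaluation gate. The 30 m and 90 m boundaries, the 4 m margin, and the 4 Hz cap come from the paragraph above; the reading of "±4 m" as "cross the boundary by 4 m before switching", the movement threshold, and all names are assumptions.

```typescript
const LOD_BOUNDARIES_M = [30, 90]; // band edges from the article
const HYSTERESIS_M = 4;
const MIN_EVAL_INTERVAL_MS = 250;  // 4 Hz cap

// Returns the LOD a placement should use, given the LOD it is in now.
// A placement only leaves its band after overshooting the edge by the margin.
function nextLod(currentLod: number, distanceM: number): number {
  let lod = currentLod;
  while (lod < LOD_BOUNDARIES_M.length && distanceM > LOD_BOUNDARIES_M[lod] + HYSTERESIS_M) lod++;
  while (lod > 0 && distanceM < LOD_BOUNDARIES_M[lod - 1] - HYSTERESIS_M) lod--;
  return lod;
}

interface LodGate { lastEvalMs: number; camX: number; camY: number; camZ: number; }

// NaN positions make the first call always evaluate.
function createLodGate(): LodGate {
  return { lastEvalMs: -Infinity, camX: NaN, camY: NaN, camZ: NaN };
}

// True when LODs should be re-evaluated this frame.
// minMoveM is a hypothetical threshold for "the camera actually moved".
function shouldReevaluate(gate: LodGate, nowMs: number, x: number, y: number, z: number, minMoveM = 0.05): boolean {
  const dx = x - gate.camX;
  const dy = y - gate.camY;
  const dz = z - gate.camZ;
  if (dx * dx + dy * dy + dz * dz < minMoveM * minMoveM) return false; // still camera: one compare
  if (nowMs - gate.lastEvalMs < MIN_EVAL_INTERVAL_MS) return false;    // rate cap
  gate.lastEvalMs = nowMs;
  gate.camX = x;
  gate.camY = y;
  gate.camZ = z;
  return true;
}
```

With these numbers a tree at 33 m that is currently in the near band stays there, and one at 28 m that is currently in the middle band stays there too, so hovering around 30 m triggers no moves.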

That LOD path is where the instructive bugs lived. The first showed up as placements vanishing or duplicating as the camera orbited, worse as the scene filled up. The cause was a shared scratch matrix: move() read a placement's transform into the module-scoped _tmpMat, but the source bucket's swap-remove used that same _tmpMat for its own internal shuffle, clobbering the carried matrix before the destination wrote it. The bug only spared the case where the moved slot was already last in its bucket, roughly a 1/count chance, which is exactly the "rare flicker that gets worse as the scene grows" the playtest saw. Fix was a dedicated _carryMat reserved for move() alone. Stress-tested at 1,274 cumulative moves, the cluster stayed pixel-identical.
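The clobbering is easy to reproduce in miniature. In this sketch, sharedScratch stands in for the module-scoped _tmpMat and carryBuffer for the dedicated _carryMat; the bucket type and helper names are hypothetical, and the back-reference bookkeeping is omitted to keep the aliasing visible.

```typescript
const STRIDE = 16; // floats per 4x4 matrix

const sharedScratch = new Float32Array(STRIDE); // stand-in for _tmpMat
const carryBuffer = new Float32Array(STRIDE);   // stand-in for _carryMat, used by move() only

interface RawBucket { data: Float32Array; count: number; }

function readSlot(bucket: RawBucket, slot: number, out: Float32Array): void {
  out.set(bucket.data.subarray(slot * STRIDE, slot * STRIDE + STRIDE));
}

function writeSlot(bucket: RawBucket, slot: number, src: Float32Array): void {
  bucket.data.set(src, slot * STRIDE);
}

// Swap-remove shuffles the last matrix into the freed slot via the shared scratch.
function swapRemove(bucket: RawBucket, slot: number): void {
  const last = bucket.count - 1;
  if (slot !== last) {
    readSlot(bucket, last, sharedScratch);
    writeSlot(bucket, slot, sharedScratch);
  }
  bucket.count = last;
}

// Moves one instance between sibling buckets. `carry` holds the transform across
// the removal, so it must not be the buffer swapRemove uses internally.
function moveInstance(src: RawBucket, slot: number, dst: RawBucket, carry: Float32Array): void {
  readSlot(src, slot, carry);
  swapRemove(src, slot);          // overwrites sharedScratch unless slot was last
  writeSlot(dst, dst.count, carry);
  dst.count++;
}

// Builds a bucket whose matrices are tagged 1, 2, 3... in their first element.
function makeTaggedBucket(instances: number, capacity: number): RawBucket {
  const bucket: RawBucket = { data: new Float32Array(capacity * STRIDE), count: instances };
  for (let i = 0; i < instances; i++) bucket.data[i * STRIDE] = i + 1;
  return bucket;
}
```

Passing sharedScratch as the carry reproduces the bug: moving slot 0 out of a three-instance bucket delivers a copy of instance 3 to the destination, so instance 1 vanishes and instance 3 appears twice. Passing carryBuffer delivers instance 1 as intended, and moving the last slot is safe either way, which matches the 1/count escape hatch described above.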

The second bug was subtler: every LOD transition felt smooth except the first one. Trees crossing into LOD1 visibly shifted shading even though their silhouette barely changed, while bigger triangle drops later in the ladder looked fine. The simplifier with LockBorder never moves or invents vertices, so surviving vertices keep their normals exactly, but we were calling computeVertexNormals() after every simplification anyway. LOD0 returns the original artist-authored normals untouched; LOD1 and up got three.js's generic face-average recompute. The 0-to-1 boundary was the only place in the ladder where the normal regime changed, so that's where the pop lived. Dropping the one defensive line fixed the shading and, as a bonus, cut per-prop bake time roughly in half because we stopped recomputing normals on four LODs per part.

Auditing what the simplifier produced surfaced a third win. Each LOD was an original.clone() with a fresh index, and BufferGeometry.clone() deep-copies every attribute, so five LODs held five independent copies of position, normal, UV, and color buffers whose values were bit-identical across all of them. We refactored to share attribute references and only own a private index buffer per LOD, dropping a typical tree part from 20 distinct attribute identities to 9 and uploading each vertex buffer to the GPU once. Two contracts come with aliased storage: don't mutate attribute data through any single LOD, and don't dispose() a single LOD geometry, since both would hit every sibling that shares the buffer.
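The difference between the two approaches comes down to buffer identity, which can be shown with plain typed arrays. This sketch does not use three.js; PartGeometry and the helper names are hypothetical, and it uses five levels only to mirror the count in the paragraph above.

```typescript
interface PartGeometry {
  position: Float32Array;
  normal: Float32Array;
  uv: Float32Array;
  color: Float32Array;
  index: Uint32Array;
}

// What a deep clone does: every vertex attribute is copied for each LOD.
function deepCloneWithIndex(base: PartGeometry, index: Uint32Array): PartGeometry {
  return {
    position: base.position.slice(),
    normal: base.normal.slice(),
    uv: base.uv.slice(),
    color: base.color.slice(),
    index,
  };
}

// The shared variant: attributes are aliased, only the index is private.
function shareWithIndex(base: PartGeometry, index: Uint32Array): PartGeometry {
  return { position: base.position, normal: base.normal, uv: base.uv, color: base.color, index };
}

// Counts distinct buffer identities, which is what drives separate GPU uploads.
function distinctBuffers(lods: PartGeometry[]): { vertex: number; index: number } {
  const vertex = new Set<Float32Array>();
  const index = new Set<Uint32Array>();
  for (const g of lods) {
    vertex.add(g.position).add(g.normal).add(g.uv).add(g.color);
    index.add(g.index);
  }
  return { vertex: vertex.size, index: index.size };
}

function buildLods(base: PartGeometry, indices: Uint32Array[], share: boolean): PartGeometry[] {
  return indices.map((idx) => (share ? shareWithIndex(base, idx) : deepCloneWithIndex(base, idx)));
}
```

For five index buffers over one base, the cloned set reports 20 distinct vertex buffers, while the shared set reports 4 vertex buffers plus 5 index buffers, 9 in total. The aliasing is also why a write through any one LOD's position array is visible through all of them.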

The fourth bug had nothing to do with painting. Just waving the cursor over the terrain dropped the frame rate, with no button held. The pointermove handler raycast against the terrain mesh, a 131,072-triangle plane with no spatial structure, so three.js walked the entire index buffer per event at up to 1,000 events per second. We didn't need the mesh for that lookup at all, because the terrain is a parametric heightmap. An adaptive ray-march against sampleHeight (big strides high above the surface, a 0.4 m floor near it, then 12 bisections on the sign flip) costs roughly 8 to 30 samples per ray instead of 131,072 triangle tests, about three orders of magnitude cheaper, and hover holds the frame cap again.
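One way the adaptive march might look. The 0.4 m floor and the 12 bisections are from the paragraph above; scaling the stride to half the current height above the terrain is an assumption, as are the function name, the Vec3Like type, and the maxDist parameter. The direction is expected to be normalized.

```typescript
interface Vec3Like { x: number; y: number; z: number; }
type HeightFn = (x: number, z: number) => number;

const MIN_STEP_M = 0.4;
const BISECTIONS = 12;

// Marches a ray against a height function and writes the hit point into `out`.
// Returns false if the ray starts below the surface or misses within maxDist.
function raycastHeightmap(
  rayOrigin: Vec3Like, dir: Vec3Like, sampleHeight: HeightFn, maxDist: number, out: Vec3Like,
): boolean {
  // gap(t) = ray height minus terrain height; a hit is where it changes sign.
  const gapAt = (t: number): number =>
    rayOrigin.y + dir.y * t - sampleHeight(rayOrigin.x + dir.x * t, rayOrigin.z + dir.z * t);

  let tPrev = 0;
  let gapPrev = gapAt(0);
  if (gapPrev < 0) return false;

  while (tPrev < maxDist) {
    // Big strides high above the surface, never smaller than the floor.
    const step = Math.max(MIN_STEP_M, gapPrev * 0.5);
    const t = Math.min(maxDist, tPrev + step);
    const gap = gapAt(t);
    if (gap <= 0) {
      // Sign flip between tPrev and t: bisect to refine the crossing.
      let lo = tPrev;
      let hi = t;
      for (let i = 0; i < BISECTIONS; i++) {
        const mid = (lo + hi) / 2;
        if (gapAt(mid) > 0) lo = mid; else hi = mid;
      }
      const tHit = (lo + hi) / 2;
      out.x = rayOrigin.x + dir.x * tHit;
      out.z = rayOrigin.z + dir.z * tHit;
      out.y = sampleHeight(out.x, out.z);
      return true;
    }
    tPrev = t;
    gapPrev = gap;
  }
  return false;
}
```

Writing into a caller-supplied out keeps the hover path free of per-event allocations. The trade-off of large strides is that a thin ridge between two samples can be skipped, so the stride factor wants tuning against the steepest features the terrain actually has.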

The cost just moves; make sure it moves off the click

After swapping the spike to WebGPURenderer on three r184 (the production target), a DevTools profile showed the very first paint blocking for 265 ms, 79% of it inside the meshoptimizer WASM. The bake was real work, around 180 simplify calls for a cold preset, but it was running inside the click handler because preloadProps only fetched and parsed scenes, never triggered the LOD bake. The fix was to make preset selection do the full bake in the background: preloadProps now calls the part-resolution path, caches the in-flight promise so a fast click joins it instead of forking a duplicate, and memoizes the per-geometry preprocessing the simplifier was redoing four times per part. First paint dropped from 209 ms to 4 ms in the HUD. The WASM time didn't vanish, it just left the user's critical path and runs while they're looking at the terrain deciding where to paint.
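The "fast click joins the in-flight bake" behaviour is a small promise cache. A minimal sketch, assuming a bake function keyed by prop id; BakeCache, BakedProp, and the eviction-on-failure policy are hypothetical, not the spike's preloadProps.

```typescript
interface BakedProp { propId: string; lodCount: number; }

class BakeCache {
  bakeCalls = 0; // exposed so the dedup is observable
  private inFlight = new Map<string, Promise<BakedProp>>();
  private bakeFn: (propId: string) => Promise<BakedProp>;

  constructor(bakeFn: (propId: string) => Promise<BakedProp>) {
    this.bakeFn = bakeFn;
  }

  // Preset selection and the first click both call this. Whoever arrives
  // second receives the same promise instead of starting a duplicate bake.
  get(propId: string): Promise<BakedProp> {
    const existing = this.inFlight.get(propId);
    if (existing) return existing;

    this.bakeCalls++;
    const started = this.bakeFn(propId);
    this.inFlight.set(propId, started);
    // Assumed policy: forget failed bakes so a later call can retry.
    started.catch(() => {
      if (this.inFlight.get(propId) === started) this.inFlight.delete(propId);
    });
    return started;
  }
}
```

Storing the promise rather than the result is the important part: a click that lands mid-bake has something to await, and once the bake resolves the same entry serves as the finished cache.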

That's the recurring lesson of this spike. Almost none of these fixes changed what the brush does. They changed when the cost lands: off the click, off the hover, off the boundary the camera is hovering near. A scatter tool that feels instant isn't doing less work, it's doing the work where the user isn't waiting on it.

Technology referenced in this chapter

Heuristic suitability scatter. A brush samples candidate points in a disc, reads (height, slope) per point from a CPU heightmap, filters a preset's picks by slope and altitude predicates, weighted-draws one, and rejects it if it violates per-family minimum spacing tracked in a spatial hash. Slope-aligned picks rotate their up vector to the surface normal. This produces placement that reads as intentional (trees off cliffs, rocks tilted into slopes, pebbles stopping at the waterline) with no learned weights, and keeps the preset as flat data so an LLM-generated pick list is a drop-in swap.

Deterministic placement under async loads. A seedable Mulberry32 RNG owns every draw, so (seed, brush events) reproduces a session exactly. RNG draws happen before any await, and spacing reservations are inserted into the spatial index before the glTF clone resolves, so concurrent candidates respect each other and async asset loading can't perturb the sequence.

Bucketed InstancedMesh with O(1) edits. One InstancedMesh per (propId, partIndex, lod), capacity doubled on demand by copying live matrices into a larger buffer. Erase and FIFO-evict are swap-remove with a back-reference array patching the moved instance's index, so a removal is O(1) regardless of bucket size. A diagnostic confirmed glTF parts arrive as single-material meshes, making one-bucket-per-primitive the active path and giving each primitive an independently growable bucket.

Distance LOD with hysteresis and shared attribute buffers. Three meshopt-simplified levels per part, selected by distance bands with ±4 m hysteresis so a camera near a boundary doesn't thrash, re-evaluated at a capped rate and gated on real camera motion. Because LockBorder simplification never moves vertices, all LODs share one set of position/normal/UV/color buffers and differ only in their private index buffer, cutting distinct GPU vertex buffers by roughly half. Skipping a defensive computeVertexNormals keeps artist normals identical across LODs and removes the only shading discontinuity in the ladder. See LOD and meshoptimizer.

Analytic heightmap raycast for high-frequency lookups. A pointermove-rate cursor lookup against a 131k-triangle plane mesh walks the whole index buffer per event. Replacing it with an adaptive ray-march against the analytic height function (large strides far from the surface, a small floor near it, bisection on the sign flip of $\text{ray}_y - \text{terrain}_y$) costs tens of samples instead of more than a hundred thousand triangle tests, about three orders of magnitude cheaper, and a pre-allocated output vector keeps the hot path allocation-free.

Move work off the interaction's critical path. Expensive one-time work (meshopt LOD bakes, WGSL pipeline compiles) should run during idle gaps, not inside the click handler. Preloading the active preset's full bake on preset selection, caching the in-flight promise so a fast click joins rather than forks it, and memoizing per-geometry preprocessing dropped first-paint latency from 209 ms to 4 ms without doing any less total work.

Part 18 of 29. Previous: Part 17 - Animations that didn't need retargeting, and a live asset search. Next: Part 19 - The imposter that has to survive a forest. Series guide: /blog/2026-02-25-open-world-browser-series-guide
