
Building an open world in the browser, part 20: Faking depth on a flat plane

By Oleg Sidorkin, CTO and Co-Founder of Cinevva

New here? Use the series guide. It explains what a spike is and links all parts.

Part 19 used a flat quad to fake a whole tree at distance. This part uses a flat quad to fake depth up close: parallax occlusion mapping, the trick that makes a cobblestone road look like it has 5 cm of recessed grouting without spending a single extra vertex. The goal was to land it on the production stack (Three.js r184, WebGPU, TSL) so terrain detail materials can carry that depth illusion where it matters and pay flat-texture cost everywhere else.

Three ways to fake depth, side by side

Spike 39: live demo and source.

The spike lays out three flat 5×5 m planes side by side. All three share the same material scaffold and differ only in the UV that feeds the samplers. Flat samples the texture directly and serves as the reference baseline. Single-sample parallax shifts the UV once along the view direction by the height at that point, which is cheap and acceptable at low amplitude but swims at grazing angles. POM ray-marches in tangent space: step along the view ray, find the first layer where the ray passes below the heightfield, and refine the crossing. Tangent space stays simple because every test plane is axis-aligned, so the view direction maps into tangent space with a couple of sign flips instead of a full per-vertex TBN matrix. The texture set is pulled live from the Polyhaven file API, the same path Part 17's model search used.
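For an axis-aligned plane the tangent frame is constant, so moving the view vector into tangent space reduces to a swizzle and a sign flip. The sketch below assumes a Three.js PlaneGeometry rotated −π/2 about X to lie flat. In that setup +u runs along world +X, +v runs along world −Z, and the normal is world +Y. toTangentSpaceFlat and viewDirWorld are hypothetical helper names, and the spike's actual mapping may use different signs for its planes.

```typescript
type Vec3 = [number, number, number];

// Normalize a 3-vector (assumes non-zero length).
function normalize(v: Vec3): Vec3 {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
}

// Direction from a surface point toward the camera, in world space.
function viewDirWorld(fragPos: Vec3, cameraPos: Vec3): Vec3 {
  return [cameraPos[0] - fragPos[0], cameraPos[1] - fragPos[1], cameraPos[2] - fragPos[2]];
}

// Hypothetical helper for a ground plane with the constant tangent frame
// T = +X (u), B = -Z (v), N = +Y, i.e. a PlaneGeometry rotated -PI/2 about X.
// (dot(V, T), dot(V, B), dot(V, N)) collapses to a swizzle plus one sign flip.
function toTangentSpaceFlat(viewWorld: Vec3): Vec3 {
  const v = normalize(viewWorld);
  return [v[0], -v[2], v[1]];
}

// Camera straight overhead: the tangent-space view is +Z, so there is no parallax shift.
console.log(toTangentSpaceFlat(viewDirWorld([0, 0, 0], [0, 10, 0])));
// Camera raised and moved toward -Z (the +v direction): equal v and z components.
console.log(toTangentSpaceFlat(viewDirWorld([0, 0, 0], [0, 1, -1])));
```

A plane facing a different axis needs its own swizzle, but the idea is the same: each plane gets a fixed permutation with signs instead of a per-vertex TBN.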

Two WebGPU walls, and a branchless ray-march

The textbook POM loop breaks out of the search on the first crossing. On r184 that doesn't work, for two separate reasons. If(...).and(...) compiled without errors but produced WGSL where the loop body never executed, so the post-loop refinement ran on garbage and the plane rendered nearly white. And Break() as a standalone node hadn't shipped to the r184 build at all, so even with a working If there was no way to express "stop on first crossing." Both trace back to known three.js issues around TSL control flow over-optimizing across If and Loop boundaries in this version range.

The rewrite is branchless. Every iteration unconditionally samples the texture, which keeps texture access in uniform control flow as the WGSL spec requires. It then folds the new state in through a done flag held as a float. Once done flips to 1, the per-iteration mix calls degenerate to "keep state unchanged," which is the branchless equivalent of a break. The done flag is built with a step helper implemented as 0.5 + 0.5 × sign(x + ε). Boolean-to-float coercion has been spotty across the r18x line, and sign() is universally safe. The cost is that every fragment runs all 64 iterations regardless of where it actually crosses. That's the right trade at fragment scale, for two reasons. The worst-case fragments already pay for max steps. And even a "real" break saves little on a GPU, because lanes in a SIMD group run in lockstep and the group keeps going until its slowest lane finishes. A clean fallback matters here: a final mix(baseUV, refined, done). At zero amplitude (the far end of the distance fade) no fragment crosses, done stays 0, and the POM material is bit-identical to flat. That's the whole point of the distance-LOD trick: collapse to flat cost where the effect is sub-pixel anyway.
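To make the done-flag pattern concrete outside of TSL, here is a CPU-side TypeScript sketch of a branchless linear search. It is not the spike's TSL code. heightAt is a hypothetical stand-in for the texture sample. Scaling the max offset by amplitude is an assumption. The loop follows the Unity-style layout described in the next section: an initial advance, with the ray starting one layer below the top. Secant refinement is left out, so the crossing offset from the linear search is used directly. Every iteration samples, and done only gates how the new state is folded in.

```typescript
type Vec2 = [number, number];
type Vec3 = [number, number, number];

const EPS = 1e-5;

// step() built from sign(): 1 when x > -EPS, 0 when x < -EPS.
// Same shape as the 0.5 + 0.5 * sign(x + eps) helper; no bool-to-float coercion.
function stepSign(x: number): number {
  return 0.5 + 0.5 * Math.sign(x + EPS);
}

function mix(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Branchless linear search (no secant refinement in this sketch).
// heightAt(u, v): height in [0, 1], 1 = top of the volume (Unity convention).
// viewTS: tangent-space direction from the surface toward the camera (z > 0).
function branchlessParallaxUV(
  heightAt: (u: number, v: number) => number,
  baseUV: Vec2,
  viewTS: Vec3,
  amplitude: number,
  numSteps: number,
): Vec2 {
  const stepSize = 1 / numSteps;
  // Per-layer UV shift; the full max offset points away from the camera.
  const perStepX = ((amplitude * viewTS[0]) / -viewTS[2]) * stepSize;
  const perStepY = ((amplitude * viewTS[1]) / -viewTS[2]) * stepSize;

  // Initial advance: start one layer below the top.
  let offX = perStepX;
  let offY = perStepY;
  let currH = heightAt(baseUV[0] + offX, baseUV[1] + offY);
  let rayH = 1 - stepSize;
  let done = 0;

  for (let i = 0; i < numSteps; i++) {
    // Latch done once the heightfield reaches the ray.
    done = Math.max(done, stepSign(currH - rayH));
    const live = 1 - done;
    // Sample unconditionally (in WGSL this keeps the fetch in uniform control flow).
    const nextX = offX + perStepX;
    const nextY = offY + perStepY;
    const sampled = heightAt(baseUV[0] + nextX, baseUV[1] + nextY);
    // With live == 0 every update keeps the old value: the branchless "break".
    offX = mix(offX, nextX, live);
    offY = mix(offY, nextY, live);
    rayH = mix(rayH, rayH - stepSize, live);
    currH = mix(currH, sampled, live);
  }

  // Fallback: fragments that never latched done keep the flat UV.
  return [mix(baseUV[0], baseUV[0] + offX, done), mix(baseUV[1], baseUV[1] + offY, done)];
}

const flat = (_u: number, _v: number) => 0.5;
const viewTS: Vec3 = [0.6, 0, 0.8];
console.log(branchlessParallaxUV(flat, [0.5, 0.5], viewTS, 0.1, 8));
console.log(branchlessParallaxUV(flat, [0.5, 0.5], viewTS, 0, 8));
```

The second call shows the zero-amplitude case. The per-step offset is zero there, so the result is exactly baseUV whatever done ends up as.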

The bug was a discipline failure, not a math failure

The branchless version ran but looked distorted. Moderate amplitude produced streaky horizontal artifacts, and low amplitude gave a subtly wrong, not-quite-crisp result. The fix came from a one-line prompt: go read the canonical reference. The LlamAcademy tutorial that inspired this just uses a Unity ShaderGraph node, so the real implementation lives in Unity's PerPixelDisplacement.hlsl. Reading it line by line surfaced three semantic differences I'd unwittingly introduced:

- An off-by-one in the ray-height baseline. Unity does an initial advance before the loop, so my frame of reference was a whole step out of phase, and about half the crossings landed in the wrong layer.
- A sign convention on the max offset that the refinement step depends on.
- A cumulative-offset versus cumulative-UV bookkeeping choice that made my refinement math work harder and tangled the sign.

The root cause wasn't any single error; it was mixing two references. I'd taken the LearnOpenGL POM tutorial as my guide. It uses similar but different sign conventions and a different refinement formula, so I ended up in a mongrel state where two-thirds of the math matched one source and one-third matched the other. The rewrite is a near-verbatim port of Unity's HLSL into TSL: same variable names, same initial advance, same refinement, with the branchless done flag kept on top. The lesson is worth carrying. When you port a known-good shader from another stack, port it line-for-line with the same names first, then refactor for local style. Don't re-derive against a second reference mid-port.
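The structure described above looks roughly like this when transliterated to TypeScript with Unity-style variable names (texOffsetPerStep, rayHeight, prevHeight, currHeight, pt0/pt1, delta0/delta1). The structure has three parts: an initial advance, a ray height starting one step below the top, and a secant refinement that rebuilds a cumulative offset from the intersection height. This is a CPU-side sketch modeled on the general shape of PerPixelDisplacement.hlsl. It is not a copy of that file and not the spike's TSL. The heightAt callback and the amplitude scaling of the max offset are assumptions. Details such as the 0.01 secant tolerance should be checked against the actual file.

```typescript
type Vec2 = [number, number];
type Vec3 = [number, number, number];

// Sketch of a Unity-style POM: initial advance, linear search, then a
// 3-iteration secant refinement. Returns the UV offset to add to the base UV.
// heightAt(offset): height in [0, 1] at baseUV + offset; 1 = top (the geometric plane).
// viewDirTS: tangent space, from the surface toward the camera (z > 0).
function parallaxOcclusionOffset(
  heightAt: (offset: Vec2) => number,
  viewDirTS: Vec3,
  amplitude: number,
  numSteps: number,
): Vec2 {
  const stepSize = 1 / numSteps;
  // The ray marches camera -> surface, so the max offset uses the reversed view xy.
  const parallaxMaxOffsetTS: Vec2 = [
    (amplitude * viewDirTS[0]) / -viewDirTS[2],
    (amplitude * viewDirTS[1]) / -viewDirTS[2],
  ];
  const texOffsetPerStep: Vec2 = [parallaxMaxOffsetTS[0] * stepSize, parallaxMaxOffsetTS[1] * stepSize];

  // Initial advance before the loop: the ray starts one step below the top.
  let texOffsetCurrent: Vec2 = [0, 0];
  let prevHeight = heightAt(texOffsetCurrent);
  texOffsetCurrent = [texOffsetPerStep[0], texOffsetPerStep[1]];
  let currHeight = heightAt(texOffsetCurrent);
  let rayHeight = 1 - stepSize;

  // Linear search: stop at the first layer where the heightfield is above the ray.
  for (let i = 0; i < numSteps; i++) {
    if (currHeight > rayHeight) break;
    prevHeight = currHeight;
    rayHeight -= stepSize;
    texOffsetCurrent = [texOffsetCurrent[0] + texOffsetPerStep[0], texOffsetCurrent[1] + texOffsetPerStep[1]];
    currHeight = heightAt(texOffsetCurrent);
  }

  // Secant refinement between the ray heights of the two bracketing layers.
  let pt0 = rayHeight + stepSize;
  let pt1 = rayHeight;
  let delta0 = pt0 - prevHeight;
  let delta1 = pt1 - currHeight;
  let offset: Vec2 = texOffsetCurrent;
  for (let i = 0; i < 3; i++) {
    const intersectionHeight = (pt0 * delta1 - pt1 * delta0) / (delta1 - delta0);
    // Cumulative offset, not cumulative UV: 0 at the top, full max offset at the bottom.
    offset = [
      (1 - intersectionHeight) * parallaxMaxOffsetTS[0],
      (1 - intersectionHeight) * parallaxMaxOffsetTS[1],
    ];
    currHeight = heightAt(offset);
    const delta = intersectionHeight - currHeight;
    if (Math.abs(delta) <= 0.01) break;
    if (delta < 0) {
      // Candidate point is below the surface: it becomes the new lower bound.
      delta1 = delta;
      pt1 = intersectionHeight;
    } else {
      delta0 = delta;
      pt0 = intersectionHeight;
    }
  }
  return offset;
}

// Flat heightfield at 0.3: the exact answer is (1 - 0.3) * maxOffset.
console.log(parallaxOcclusionOffset(() => 0.3, [0.6, 0, 0.8], 0.1, 8));
// Flat heightfield at 1: peaks sit on the plane, so there is no shift.
console.log(parallaxOcclusionOffset(() => 1, [0.6, 0, 0.8], 0.1, 8));
```

The second call makes Unity's "plane is the top" convention visible: a surface at h = 1 never moves, and everything lower parallaxes away from the camera.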

A reference plane that can't lie

The side-by-side was missing the obvious thing: a real-geometry plane. Without it, "POM looks pretty good" is unfalsifiable. Pretty good compared to what? So the spike added a fourth plane, the same heightmap pushed through actual vertex positions. WebGPU has no hardware tessellation (it's simply not in the spec, cut for Metal compatibility), so the substitute is a densely subdivided plane (256×256 segments, 131,072 triangles) with vertex displacement in the vertex stage. The same amplitude uniform drives both POM and the geometry plane, so they fade together and the comparison stays apples-to-apples at every distance.
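The geometry side of the comparison comes down to a grid build plus per-vertex displacement. The spike displaces in the vertex stage on the GPU. This CPU-side TypeScript version only shows the vertex layout, the shared amplitude, and where the triangle figure comes from: 256 × 256 cells × 2 triangles = 131,072. buildDisplacedGrid, the UV orientation, and the convention that h = 0.5 lies on the plane are assumptions for illustration.

```typescript
// Hypothetical CPU version of a vertex-displaced reference plane:
// a (segments x segments) grid over a size x size plane in XZ, Y displaced by height.
// Assumes h = 0.5 lies on the plane, and the same amplitude value drives POM.
function buildDisplacedGrid(
  segments: number,
  size: number,
  heightAt: (u: number, v: number) => number,
  amplitude: number,
): { positions: Float32Array; indices: Uint32Array } {
  const n = segments + 1; // vertices per row
  const positions = new Float32Array(n * n * 3);
  for (let iy = 0; iy < n; iy++) {
    for (let ix = 0; ix < n; ix++) {
      const u = ix / segments;
      const v = iy / segments;
      const k = (iy * n + ix) * 3;
      positions[k] = (u - 0.5) * size; // +u runs along world +X
      positions[k + 1] = amplitude * (heightAt(u, v) - 0.5); // displaced height
      positions[k + 2] = -(v - 0.5) * size; // +v runs along world -Z
    }
  }
  // Two triangles per cell, wound counter-clockwise when seen from +Y.
  const indices = new Uint32Array(segments * segments * 6);
  let w = 0;
  for (let iy = 0; iy < segments; iy++) {
    for (let ix = 0; ix < segments; ix++) {
      const a = iy * n + ix;
      const b = a + 1;
      const c = a + n;
      const d = c + 1;
      indices.set([a, b, d, a, d, c], w);
      w += 6;
    }
  }
  return { positions, indices };
}

const grid = buildDisplacedGrid(256, 5, () => 0.5, 0.05);
console.log(grid.indices.length / 3); // 131072
```

Because amplitude is a plain parameter here, fading it toward zero flattens the mesh in step with the POM fade. That mirrors the shared-uniform setup described above.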

With ground truth on screen, the qualitative claims became measurable. At a 16° orbit looking down, POM and the tessellated plane agree on internal shading. At grazing angles they diverge exactly where they must: POM clamps to the geometry's perfectly straight rectangular edge, while the real mesh shows a bumpy horizon profile of actual peaks and valleys catching the light. So POM's edge "swimming" is now provably intrinsic to the algorithm, not an artifact of the texture or lighting. The two cost shapes are also clear: POM is fragment-bound (cost scales with covered pixels), the tessellated plane is vertex-bound (cost scales with mesh density regardless of coverage). For a terrain chunk, which already pays the vertex cost of a heightmap-driven plane, POM is the right answer for sub-mesh detail.
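The two cost shapes can be made concrete with a rough work model. This is an illustrative assumption, not a measurement from the spike. It assumes the Unity-style layout: 2 setup samples, one sample per loop iteration (all of them, since the loop is branchless), and 3 secant samples. It counts the displaced grid as roughly one vertex-shader invocation per unique vertex, which is an idealized post-transform cache. pomHeightSamples and gridVertexInvocations are hypothetical names.

```typescript
// Illustrative work model, not a measurement.
// Branchless POM: every covered fragment pays every iteration, wherever it crosses.
function pomHeightSamples(coveredPixels: number, maxSteps: number): number {
  return coveredPixels * (2 + maxSteps + 3);
}

// Displaced grid: roughly one vertex invocation per unique vertex (ideal cache reuse
// assumed), paid whether the plane covers ten pixels or the whole screen.
function gridVertexInvocations(segments: number): number {
  return (segments + 1) * (segments + 1);
}

// POM work scales with coverage...
console.log(pomHeightSamples(10000, 64), pomHeightSamples(1000000, 64));
// ...the displaced grid's vertex work does not.
console.log(gridVertexInvocations(256));
```

The model ignores texture caching, mip selection, and overdraw. Its only purpose is to show which axis each technique scales along.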

The reference plane also caught a subtle UX bug. The user noticed the surface appeared to sink as amplitude increased. That traced to Unity's convention of treating the geometric plane as the top of the heightfield. Peaks anchor flush and everything else parallaxes downward, which drags the average surface below the flat baseline by (1 − mean_h) × amplitude. The fix re-centers the convention so that h = 0.5 is the plane: peaks rise toward the camera and valleys recess. The algorithm runs exactly as Unity prescribes. The spike just shifts the final offset back by half the max offset, so that "amplitude" means what a person dragging a slider expects.
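Here is a minimal worked version of the re-centering. It assumes the refined offset has Unity's cumulative form (1 − h) × maxOffset. Subtracting half the max offset turns that into (0.5 − h) × maxOffset. unityOffset, recenterOffset, and meanSink are hypothetical names, and the spike may apply the shift at a different point in its shader.

```typescript
type Vec2 = [number, number];

// Unity convention: the plane is the top of the heightfield (h = 1), so a crossing
// at height h shifts the UV by (1 - h) * maxOffset, always away from the camera.
function unityOffset(h: number, maxOffset: Vec2): Vec2 {
  return [(1 - h) * maxOffset[0], (1 - h) * maxOffset[1]];
}

// Re-centered convention: subtract half the max offset so h = 0.5 sits on the plane.
// The result is (0.5 - h) * maxOffset: peaks shift toward the camera, valleys away.
function recenterOffset(offset: Vec2, maxOffset: Vec2): Vec2 {
  return [offset[0] - 0.5 * maxOffset[0], offset[1] - 0.5 * maxOffset[1]];
}

// Average apparent depth below the flat baseline (world units) for a given plane level:
// planeLevel = 1 is Unity's convention, 0.5 the re-centered one.
function meanSink(meanH: number, amplitude: number, planeLevel: number): number {
  return (planeLevel - meanH) * amplitude;
}

const maxOffset: Vec2 = [-0.075, 0];
console.log(recenterOffset(unityOffset(0.5, maxOffset), maxOffset)); // mid-height: no shift
console.log(meanSink(0.4, 0.05, 1), meanSink(0.4, 0.05, 0.5));
```

Re-centering removes the sink exactly only when the heightmap's mean is 0.5. For a mean of 0.4 it drops from 0.6 × amplitude to 0.1 × amplitude.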

One more thing the reference plane settled. A "Steps" slider seemed to do nothing, which read like a plumbing bug but wasn't. The three-iteration secant refinement after the linear search is so good (Tatarchuk's 2006 POM paper notes a 4-step search plus 3-step secant is visually indistinguishable from a 64-step search) that on a smooth heightmap every step count from 4 to 64 converges to the same sub-texel UV. The fix was a toggle, not a re-plumbing: turn the secant off and the step slider becomes the only control over crossing precision, so dropping to 4 visibly stair-steps the cobblestone and cranking to 64 smooths it back. The toggle is a 0/1 uniform that mixes every secant state update to a no-op when off, so toggling never rebuilds the material and never stutters.
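The toggle can be sketched in one dimension by parametrizing the ray with t, the fraction of the max offset travelled. The ray then sits at height 1 − t, and the UV offset is t × maxOffset. This is a hypothetical CPU-side illustration, not the spike's TSL. secantOn stands in for the 0/1 uniform. Every secant state update is mixed by it, and the result is mixed between the linear-search answer and the secant answer. With the secant on, a flat heightfield lands on the same t at 4 and at 64 steps. With it off, the answer snaps to the layer grid.

```typescript
const EPS = 1e-5;

// step() from sign(): 1 when x > -EPS, 0 when x < -EPS.
function stepSign(x: number): number {
  return 0.5 + 0.5 * Math.sign(x + EPS);
}

function mix(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// heightAlongRay(t): heightfield at baseUV + t * maxOffset, in [0, 1], 1 = top.
// secantOn: 0 or 1, standing in for a uniform; 0 turns every secant update into a no-op.
// Returns t (multiply by maxOffset for the UV offset).
function pomAlongRay(heightAlongRay: (t: number) => number, numSteps: number, secantOn: number): number {
  const stepSize = 1 / numSteps;

  // Branchless linear search with a Unity-style initial advance.
  let prevH = heightAlongRay(0);
  let currH = heightAlongRay(stepSize);
  let rayH = 1 - stepSize;
  let done = 0;
  for (let i = 0; i < numSteps; i++) {
    done = Math.max(done, stepSign(currH - rayH));
    const live = 1 - done;
    const sampled = heightAlongRay(1 - rayH + stepSize); // next layer, sampled unconditionally
    prevH = mix(prevH, currH, live);
    currH = mix(currH, sampled, live);
    rayH = mix(rayH, rayH - stepSize, live);
  }
  const tLinear = 1 - rayH;

  // Secant refinement, 3 iterations, every state update gated by secantOn.
  let pt0 = rayH + stepSize;
  let pt1 = rayH;
  let delta0 = pt0 - prevH;
  let delta1 = pt1 - currH;
  let tSecant = tLinear;
  let converged = 0;
  for (let i = 0; i < 3; i++) {
    const ih = (pt0 * delta1 - pt1 * delta0) / (delta1 - delta0);
    const delta = ih - heightAlongRay(1 - ih);
    const active = secantOn * (1 - converged);
    tSecant = mix(tSecant, 1 - ih, active);
    converged = Math.max(converged, stepSign(0.01 - Math.abs(delta)));
    // Bracket update: a point below the surface (delta < 0) replaces pt1, otherwise pt0.
    const update = active * (1 - converged);
    const lower = update * stepSign(-delta);
    const upper = update - lower;
    pt1 = mix(pt1, ih, lower);
    delta1 = mix(delta1, delta, lower);
    pt0 = mix(pt0, ih, upper);
    delta0 = mix(delta0, delta, upper);
  }
  return mix(tLinear, tSecant, secantOn);
}

const flat03 = () => 0.3; // exact crossing at t = 0.7
for (const steps of [4, 64]) {
  console.log(steps, pomAlongRay(flat03, steps, 1), pomAlongRay(flat03, steps, 0));
}
```

In this branchless form the secant samples are still taken when the toggle is off; only the state updates are masked. Flipping the value therefore changes the result without changing the work or the code path.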

Technology referenced in this chapter

Parallax occlusion mapping in TSL. POM ray-marches the view direction through a heightfield in tangent space, finds the first layer where the ray drops below the surface, and refines the crossing, producing recessed-mortar depth on a flat quad with no extra geometry. A final mix(baseUV, refined, done) makes the material bit-identical to flat when no fragment crosses, which is what lets distance-LOD amplitude attenuation collapse the cost to flat-texture cost at range. See terrain materials.

Branchless loops for WebGPU control flow. On Three.js r184, TSL If(...).and(...) can compile to WGSL whose loop body never runs, and standalone Break() isn't available. The portable pattern has two parts. First, an unconditional texture sample per iteration, which keeps texture access in uniform control flow per the WGSL spec. Second, a done flag held as a float that mixes each state update to a no-op once set. A step helper built from sign(x + ε) avoids unreliable boolean-to-float coercion. Every fragment pays the full max iteration count regardless of where it would have exited early, which is the correct trade at fragment scale.

Verbatim shader porting. Porting a known-good shader from another engine should be line-for-line with the original's variable names first, refactor for local style second. Mixing two references (Unity's PerPixelDisplacement.hlsl and the LearnOpenGL tutorial) produced a mongrel with an off-by-one ray baseline, an inverted offset sign, and a refinement formula whose clamp masked out-of-range weights as spatial discontinuities. One canonical ground truth, not a re-derivation.

Vertex-displaced ground-truth reference. With no hardware tessellation in WebGPU, a densely subdivided plane (256² segments) displaced in the vertex stage stands in as real geometry to validate a fragment-stage fake. Driving both with the same amplitude uniform keeps the comparison honest across distance. POM is fragment-bound (scales with covered pixels) and the geometry plane is vertex-bound (scales with mesh density), so they diverge exactly at silhouette edges, proving POM's edge swim is intrinsic, not an artifact.


Part 20 of 29. Previous: Part 19 - The imposter that has to survive a forest · Next: Part 21 - A faster renderer that wasn't faster · Series guide: /blog/2026-02-25-open-world-browser-series-guide