Building an open world in the browser, part 30: A camera that respects walls

By Oleg Sidorkin, CTO and Co-Founder of Cinevva

New here? Use the series guide. It explains what a spike is and links all parts.

Part 29 gave us one controller that drives any body. The body moves correctly now. It walks, slides, swims, glides, climbs into caves, and ducks under overhangs. The problem is what's watching it. For twenty-nine parts the camera was a stock orbit rig that followed the player and did exactly one clever thing to avoid embarrassment: it refused to tilt below the horizon so it couldn't slide under flat ground. That clamp is the tell. It exists because the camera had no idea where the world's geometry actually was, so the only defence against clipping was to forbid the angles where clipping was most likely. Walk up to a hill and the camera sat inside the hill. Step into one of the marching-cubes caves from Part 7 and you were looking at the inside of a rock. Build a house with the authoring tools from Part 16 and stand in a room, and the camera floated outside the wall looking at siding. This part gives the camera the same respect for geometry the body already has.

The shape of the problem, and the shape of the fix

A third-person camera has one job that's hard and a dozen that are easy. The easy ones are following, smoothing, and orbit input, and we already had those. The hard one is what the academic literature calls a visibility constraint, and the survey everyone cites, Christie and Olivier's Camera Control in Computer Graphics, frames the whole field around it: keep the subject framed and unoccluded while respecting the world. At runtime that reduces to a deceptively simple question asked every frame. The player is the pivot. The user has rotated and zoomed to a desired camera position some distance behind them. How far back can the camera actually sit along that line before it pokes into something solid? Answer that honestly and the camera tucks itself in front of the hill, slides down the boom as you back into a corner, and stops at the cave ceiling instead of punching through it.

The pattern that answers it is old and proven. Unreal calls it a spring arm, Godot ships a SpringArm3D node, and Unity's Cinemachine splits it across a third-person follow rig and a deoccluder extension. The idea is always the same. Hang the camera off the end of a boom anchored at the pivot. Hold it at the desired length when the path is clear, retract it toward the pivot when something's in the way, and spring it back out when the way clears. Mark Haigh-Hutchinson's Real-Time Cameras, written by the camera lead on Metroid Prime, spends whole chapters on the failure modes that turn a naive version of this into something that makes players sick. We took the pattern and built our own, small enough to read in one sitting, in public/world/src/camera-rig.mjs.

The boom holds the camera at the user's desired zoom until an obstacle intrudes, then retracts along the same line so the probe sphere rests against the surface instead of the near plane punching through it.

A boom that owns only its length

The design rule that kept the rig small is a borrow from the controller work in Part 29: own one thing, completely, and refuse the rest. The rig owns the length of the boom and nothing else. Yaw, pitch, damping of the orbit, touch and wheel gestures, all of that stays with OrbitControls, which already solves it well and which we have no interest in rewriting. So the rig is not a camera controller. It's a post-process that runs after the orbit math and corrects exactly one number, the distance from pivot to camera.

That decision sounds tidy and it almost broke immediately, because of how OrbitControls thinks. At the top of every update it reads the camera's current position and derives its orbit radius from it. That's normally invisible. But the instant our rig shortens the camera to dodge a wall, the next frame OrbitControls reads that shortened position, concludes the user must have zoomed in, and bakes the shortening into the user's desired zoom. A few frames of that and the camera has collapsed onto the player's head and won't come back out. The fix is two calls that bracket the orbit update and it's the whole integration. Before OrbitControls runs, beforeControls() restores the camera to last frame's full, un-shortened distance, so the orbit math always reads the user's true zoom. After OrbitControls runs, afterControls(dt) reads that freshly orbited desired position, resolves the collision, damps the length, and writes the camera where it should actually render. The user's intent and the collision correction never touch each other, and the dolly stays exactly as responsive as it was before the rig existed. The headless test we wrote for the rig pins this precisely: drive the boom into a wall for sixty frames, clear the wall, and the length springs back to the user's full zoom of ten metres rather than sticking at the collision distance.
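A minimal sketch of that bracket, under a few assumptions: vectors are plain `{x, y, z}` objects rather than three.js types, `probe` is a hypothetical callback returning the allowed boom length, and damping is left out, so `afterControls` takes no `dt` here. Only the names `beforeControls` and `afterControls` come from the rig; everything else is illustrative.

```javascript
// Sketch only: BoomRig and probe are hypothetical, not the real camera-rig.mjs.
class BoomRig {
  constructor(camera, pivot, probe) {
    this.camera = camera; // any object with a mutable .position {x, y, z}
    this.pivot = pivot;   // {x, y, z} the boom is anchored to
    this.probe = probe;   // (pivot, dir, maxDist) => allowed distance
    this.desired = null;  // last un-shortened camera position
  }

  // Before the orbit update: undo last frame's shortening so the orbit
  // controller derives its radius from the user's real zoom.
  beforeControls() {
    if (this.desired) Object.assign(this.camera.position, this.desired);
  }

  // After the orbit update: remember the desired position, then move the
  // camera to the collision-limited point on the same line.
  afterControls() {
    const p = this.pivot;
    const c = this.camera.position;
    this.desired = { x: c.x, y: c.y, z: c.z };
    const dx = c.x - p.x, dy = c.y - p.y, dz = c.z - p.z;
    const full = Math.hypot(dx, dy, dz);
    if (full === 0) return 0;
    const dir = { x: dx / full, y: dy / full, z: dz / full };
    const len = Math.min(full, this.probe(p, dir, full));
    c.x = p.x + dir.x * len;
    c.y = p.y + dir.y * len;
    c.z = p.z + dir.z * len;
    return len;
  }
}
```

In a frame loop the order would be `rig.beforeControls()`, then the orbit controller's own update, then `rig.afterControls()`. Because the un-shortened position is restored first, the orbit math never sees the collision distance.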

One contract, any collider

The rig never asks what the world is made of. A collider is just an object with a probe method, and the rig hands it a ray, a maximum distance, and the camera's probe radius, and gets back one number, the furthest the camera may travel before that collider blocks it. The rig queries every registered collider and takes the nearest hit. That's the entire contract, and it's the same move the character controller made when it turned locomotion into pluggable behaviours. Terrain plugs in, props plug in, building shells plug in, each behind the same probe, and the rig stays ignorant of all of them. Adding a new kind of obstacle is adding a collider to a list, not editing the camera.
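As a sketch of how small that contract can be, assume a collider is any object with a `probe(origin, dir, maxDist, radius)` method that returns a distance. The argument order, the `resolveBoom` helper, and the toy `floorCollider` are all hypothetical and exist only to make the example runnable.

```javascript
// Hypothetical contract: probe(origin, dir, maxDist, radius) -> the furthest
// the camera may travel from origin along dir before this collider blocks it.
function resolveBoom(colliders, origin, dir, maxDist, radius) {
  let allowed = maxDist;
  for (const collider of colliders) {
    const d = collider.probe(origin, dir, maxDist, radius);
    if (d < allowed) allowed = d; // the nearest hit wins
  }
  return Math.max(0, allowed);
}

// Illustrative collider: an infinite flat floor at y = height.
const floorCollider = (height) => ({
  probe(origin, dir, maxDist, radius) {
    if (dir.y >= 0) return maxDist; // boom is not heading toward the floor
    // Distance at which the probe sphere's centre reaches height + radius.
    const t = (height + radius - origin.y) / dir.y;
    return Math.min(maxDist, Math.max(0, t));
  },
});
```

With one floor at `y = 0`, a pivot two metres up, and a boom pointing straight down with a half-metre probe, `resolveBoom` allows 1.5 metres. Adding another obstacle type means pushing another object with a `probe` method onto the list.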

One detail in the contract earns its keep, and it's the probe radius. We don't cast a thin ray from pivot to camera, we cast a sphere big enough to contain the camera's near plane. A single ray stops the camera centre at the wall, but the near plane has width, so its corners would already be buried in the wall before the centre ray ever reported a hit. Sweeping a small sphere instead of a ray is what every shipping implementation does, Unreal exposes it as the probe size, and it's the difference between a camera that rests cleanly against a surface and one that lets you see through it at the edges of the screen.

We don't guess that radius, we derive it. The furthest point of the near plane from the camera is one of its corners, and the distance to it falls straight out of the projection. With near plane $n$ and vertical field of view $θ$, the half-height is $h = n \tan (θ / 2)$, the half-width is $w = h \cdot \text{aspect}$, and the corner sits at

$$r_{near} = \sqrt{n^{2} + w^{2} + h^{2}}$$

The probe radius is that corner distance times a small safety margin, with a fixed floor so it never drops below a sensible minimum on a very narrow frustum. Whenever the field of view, aspect ratio, or near plane changes, the radius is recomputed, so a window resize or a zoom that touches the projection can't quietly leave the probe too small to cover the corners it's meant to protect.
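The derivation fits in a few lines. In this sketch the margin of 1.1 and the floor of 0.05 are placeholder values, not the rig's actual constants, and `probeRadius` is a hypothetical name.

```javascript
// Probe radius from the projection. fovDeg is the vertical field of view in
// degrees; margin and floor defaults are assumed values for illustration.
function probeRadius(near, fovDeg, aspect, margin = 1.1, floor = 0.05) {
  const h = near * Math.tan((fovDeg * Math.PI) / 360); // half-height of near plane
  const w = h * aspect;                                // half-width of near plane
  const corner = Math.hypot(near, w, h);               // camera to near-plane corner
  return Math.max(floor, corner * margin);
}
```

With a three.js `PerspectiveCamera`, the inputs would be `camera.near`, `camera.fov`, and `camera.aspect`, and the function would be called again whenever any of them changes.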

The collider that knows about caves

The terrain collider is where this gets interesting, because terrain in our engine isn't a heightmap. Since Part 7 it's been a signed distance field, a function that returns how far any point in space is from the nearest solid surface and whether it's inside or outside the rock. Positive is air, negative is rock, and that single fact is why our camera can do something a heightmap camera structurally cannot. A heightmap knows the ground height at an x and z. It has no concept of a ceiling, because there's only ever one surface above any point. So a heightmap camera can stop you walking into a hill, but it has no idea an overhang lip or a cave roof is hanging over the pivot, and it sails right through both. A distance field knows about every surface in three dimensions, so the same probe that stops the camera against a hillside stops it against a cave ceiling without a single special case.

A heightmap stores one surface per column, so it never sees the slab of rock above the player and lets the boom punch up through the ceiling. The distance field is negative inside that slab, so the probe contacts it and the camera holds just below the cave roof.

Walking the probe along the field is a technique with a name and a paper behind it. John Hart's Sphere Tracing, from 1996, is the standard way to march a ray against a distance field, and the trick is that the field doesn't just tell you whether you've hit something, it tells you a safe distance you can advance without hitting anything. So instead of creeping along in tiny fixed steps, you sample the field, step forward by the slack it reports, and repeat, taking long strides through open air and short careful ones as you close on a surface.

There's a catch our terrain forces on us, though. A true distance field reports the real Euclidean distance to the nearest surface, and a full stride by that slack is always safe. But where the terrain is still a heightmap rather than carved voxels, the field we can cheaply sample isn't true distance, it's vertical clearance, the gap straight down to the ground. On a slope that number over-reports how far the camera can actually move, because the nearest rock is off to the side, not directly below. The honest distance is smaller by a factor that grows with the gradient, $\sqrt{1 + ‖ \nabla h ‖^{2}}$, so a stride sized for the reported slack overshoots and can hop clean over a ridge. The fix is under-relaxation, advancing by only a fraction of the reported slack rather than all of it, which stays safe up to roughly a sixty-degree slope and costs nothing on the true-distance voxel regions beyond a few extra samples. We pair that with a stride floor, so a cave wall a cell or two thick is never stepped clean over, and a tighter ceiling that keeps the march cheap. When the sphere finally touches the surface a short bisection tightens the contact point, and the camera backs off by its own radius and rests there.
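One way to see where a bound like sixty degrees can come from, as a simplified check rather than the rig's own derivation: take a planar slope at angle $α$, so that $‖ \nabla h ‖ = \tan α$, write $v$ for the vertical clearance the heightmap reports, and ignore the probe radius. The true distance to the slope and the condition for a stride $λ v$ to be safe are then

$$d = \frac{v}{\sqrt{1 + \tan^{2} α}} = v \cos α, \qquad λ v \leq d \iff λ \leq \cos α$$

Under those assumptions a factor of $λ = 1/2$, chosen here only for illustration, is safe exactly when $\cos α \geq 1/2$, which is $α \leq 60°$.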

Written out, the boom is a ray $x (t) = p + t d$ from the pivot $p$ along the unit direction $d$, and the march advances by an under-relaxed fraction of the field's slack to contact, clamped at both ends:

$$t_{n + 1} = t_{n} + \operatorname{clamp} (λ (Φ (x (t_{n})) - r), s_{min}, s_{max})$$

Here $Φ$ is the signed distance, positive in air and negative in rock, $r$ is the probe radius, $λ \in (0, 1]$ is the under-relaxation factor that keeps the over-reporting heightmap field from hopping a slope, and $s_{min}, s_{max}$ are the step clamps that keep a thin wall from being skipped and the march from running long. Contact is the first $t$ where $Φ (x (t)) \leq r$, meaning the sphere surface has reached the rock, and the camera holds at that arc length less its radius.
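A compact sketch of that march, assuming the field is available as a plain function `sdf(x, y, z)`. The function name, the option names, and every default value (`lambda`, `sMin`, `sMax`, `maxSteps`, `bisections`) are placeholders for illustration. The sketch returns the last boom length at which the probe sphere was still clear; any further back-off the rig applies is not shown.

```javascript
// Under-relaxed sphere trace along p + t * d. sdf is positive in air and
// negative in rock. All defaults are assumed values, not the rig's tuning.
function marchBoom(sdf, p, d, maxDist, radius, opts = {}) {
  const { lambda = 0.5, sMin = 0.05, sMax = 4, maxSteps = 64, bisections = 6 } = opts;
  const slackAt = (t) => sdf(p.x + d.x * t, p.y + d.y * t, p.z + d.z * t) - radius;

  let clear = 0; // last parameter known to be free of contact
  let t = 0;
  for (let i = 0; i < maxSteps; i++) {
    const slack = slackAt(t);
    if (slack <= 0) {
      // Contact lies between the last clear sample and t: bisect to tighten.
      let lo = clear;
      let hi = t;
      for (let j = 0; j < bisections; j++) {
        const mid = 0.5 * (lo + hi);
        if (slackAt(mid) <= 0) hi = mid;
        else lo = mid;
      }
      return lo;
    }
    if (t >= maxDist) return maxDist; // reached the desired length unobstructed
    clear = t;
    // Advance by a fraction of the slack, clamped to [sMin, sMax].
    const stride = Math.min(sMax, Math.max(sMin, lambda * slack));
    t = Math.min(maxDist, t + stride);
  }
  return clear; // step budget spent: stay at the last clear sample
}
```

Against a flat wall five metres away with a half-metre probe, the march takes long strides at first, shrinks them as the slack falls, and the bisection lands just short of 4.5 metres.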

The collider carries one more method as a safety net, depenetration. Collision should keep the camera out of solid in the first place, but a few situations sneak past it, a pivot straddling a wall thin enough that the boom starts inside it, a cave freshly carved by the terrain tools while the camera sat in the rock that's now gone, a building piece dropped around the camera. For those the collider checks whether the camera ended the frame inside solid, where $Φ (c) < r$ , and if it did, reads the field's gradient, which points straight toward open air because $Φ$ grows as you leave the rock, and steps the camera out along it:

$$c \leftarrow c + (r - Φ (c)) \frac{\nabla Φ (c)}{‖ \nabla Φ (c) ‖}$$

A handful of those iterations converges on the $r$-isosurface, and it's the thing that means a sculpting tool can dig the ground out from under the camera and the view recovers on the next frame instead of going black.
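The update can be sketched as follows, assuming the field is a plain function `sdf(x, y, z)` and estimating the gradient with central differences. The finite-difference step `eps`, the iteration count, and the function name are assumptions; the real collider may obtain the gradient differently.

```javascript
// Push a point out of solid along the field gradient until it sits at least
// `radius` from the surface. iterations and eps are assumed defaults.
function depenetrate(sdf, c, radius, iterations = 4, eps = 0.01) {
  const out = { x: c.x, y: c.y, z: c.z };
  for (let i = 0; i < iterations; i++) {
    const phi = sdf(out.x, out.y, out.z);
    if (phi >= radius) break; // already clear of the surface
    // Central-difference gradient; it points toward open air because the
    // field grows as you leave the rock.
    const gx = sdf(out.x + eps, out.y, out.z) - sdf(out.x - eps, out.y, out.z);
    const gy = sdf(out.x, out.y + eps, out.z) - sdf(out.x, out.y - eps, out.z);
    const gz = sdf(out.x, out.y, out.z + eps) - sdf(out.x, out.y, out.z - eps);
    const len = Math.hypot(gx, gy, gz);
    if (len === 0) break; // flat field: no direction to push along
    const push = radius - phi;
    out.x += (gx / len) * push;
    out.y += (gy / len) * push;
    out.z += (gz / len) * push;
  }
  return out;
}
```

On an exact distance field a single step lands on the target isosurface. The extra iterations are there for fields that only approximate distance, such as the vertical-clearance regions.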

Snapping in, easing out, and not flinching at fence posts

A boom that simply jumps to the collision distance every frame is worse than no boom at all, because the world is full of thin things the camera passes behind for a single frame, a fence post, a lamp, a tree trunk, and a camera that lunges to dodge each one and lunges back is nauseating. Haigh-Hutchinson's book and Itay Keren's much-loved talk on camera motion both land on the same intuition, which is that the camera should react to threats and danger faster than it relaxes from them. So the damping is deliberately asymmetric. When an occluder appears and the boom needs to shorten, it snaps in almost instantly, because a frame of clipping is ugly and the player forgives a fast tuck. When the occluder clears and the boom wants to lengthen, it eases out slowly, and only after a short dwell timer of continuous clearance has elapsed. That dwell is the hysteresis that kills the flinch. Whip the camera past a thin post and the post never clears long enough to trigger the slow extension, so the camera glides past it as if it weren't there, which is exactly what your eye wants. Cinemachine exposes the same idea as separate damping-into and damping-out-of collision values, and the asymmetry is the part that makes it feel like a camera operator rather than a spring.

In code it's one line of exponential smoothing with the rate switched on the sign of the change. If $ℓ$ is the current boom length, $a$ the length collision allows this frame, and $Δ t$ the frame time, then

$$ℓ \leftarrow ℓ + (a - ℓ) (1 - e^{- k Δ t}), \qquad k = \begin{cases} k_{in} & a \leq ℓ \\ k_{out} & a > ℓ \end{cases}, \qquad k_{in} ≫ k_{out}$$

and the ease-out branch only runs once the clearance has held for the dwell time $τ$. The $1 - e^{- k Δ t}$ form matters beyond looking tidy. It fixes the response time constant at $1 / k$ regardless of frame rate, so the camera feels identical at 30 and 144 frames per second, where the naive constant-blend $ℓ \leftarrow ℓ + α (a - ℓ)$ would snap faster on a fast machine and mush on a slow one.
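A sketch of that smoothing with the dwell gate in place. The class name and the rate and dwell values (`kIn = 30`, `kOut = 3`, `dwell = 0.25` seconds) are placeholders chosen for illustration, not the rig's tuning.

```javascript
// Asymmetric exponential damping with a dwell timer before easing out.
// All constants are assumed values for illustration.
class BoomDamper {
  constructor({ kIn = 30, kOut = 3, dwell = 0.25 } = {}) {
    this.kIn = kIn;     // fast rate used when the boom must shorten
    this.kOut = kOut;   // slow rate used when the boom may lengthen
    this.dwell = dwell; // seconds of continuous clearance before easing out
    this.clearFor = 0;  // how long the path has been clear so far
  }

  // length: current boom length; allowed: what collision permits this frame.
  step(length, allowed, dt) {
    if (allowed <= length) {
      this.clearFor = 0; // any obstruction resets the dwell timer
      return length + (allowed - length) * (1 - Math.exp(-this.kIn * dt));
    }
    this.clearFor += dt;
    if (this.clearFor < this.dwell) return length; // hold until the dwell elapses
    return length + (allowed - length) * (1 - Math.exp(-this.kOut * dt));
  }
}
```

Because each step multiplies the remaining error by $e^{- k Δ t}$, thirty steps of 1/30 s and a hundred and twenty steps of 1/120 s leave the same error after one second, which is the frame-rate independence described above.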

The last behaviour is for the tight interiors that started this whole part. When the boom collapses short enough that the camera is right on top of the player, we hide the player's own avatar and let the view sit near first person. This is what Breath of the Wild does in a cramped shrine and what most third-person games fall back to in a corner, because the alternative, a camera jammed against a wall staring at the back of a head, is useless. The rig exposes a single flag for it, and the world loop reads the flag and toggles the local avatar's visibility. Below a metre of boom you're effectively in first person, the walls are honoured, and the moment you back into a room with space the avatar fades back in and the boom extends.

What plugs in next

The terrain collider ships today and it's the hard half, because terrain is everywhere and a distance field is the awkward thing to probe. The props and the building shells from the authoring spikes are the easy half, and the contract is already waiting for them. A second collider, a raycast collider, casts from the pivot toward the camera against a list of meshes and reports the nearest hit the same way the terrain collider does. The cheap version casts a single ray, which is fine until prop counts climb, and the upgrade is to swap the ray for a swept sphere using three-mesh-bvh, Garrett Johnson's library that wraps a mesh in a bounding-volume hierarchy so spatial queries run in log time instead of brute force. Either way the rig doesn't change. It queries a longer list of colliders and takes the nearest hit, which is the entire point of having built the contract first and the colliders second.
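The cheap single-ray version might look like the sketch below. It assumes a raycaster shaped like three.js's `THREE.Raycaster`, which has `set(origin, direction)`, a `far` property, and `intersectObjects(objects, recursive)` returning hits sorted nearest first. The factory name, the `probe` argument order, and subtracting the probe radius from the hit distance as a stand-in for a true swept sphere are all assumptions.

```javascript
// Hypothetical raycast collider. `raycaster` is expected to behave like a
// THREE.Raycaster; origin and dir would be THREE.Vector3 in real use.
function makeRaycastCollider(raycaster, meshes) {
  return {
    probe(origin, dir, maxDist, radius) {
      raycaster.set(origin, dir);
      raycaster.far = maxDist + radius; // look one radius past the camera
      const hits = raycaster.intersectObjects(meshes, true);
      if (hits.length === 0) return maxDist;
      // Crude radius handling: stop one probe radius short of the hit point.
      return Math.max(0, Math.min(maxDist, hits[0].distance - radius));
    },
  };
}
```

A single ray can still miss geometry that only the edges of the probe sphere would touch, which is the gap the swept-sphere upgrade is meant to close.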

Technology referenced in this chapter

A boom that owns one number. The camera rig is a post-process over OrbitControls, not a replacement for it. It owns the length of the boom and leaves yaw, pitch, zoom, and gesture handling with the orbit controller we already trusted. The integration is two bracketing calls: beforeControls() restores the previous frame's full distance so the orbit math reads the user's true zoom rather than mistaking a collision shortening for a dolly-in, and afterControls(dt) resolves the collision and writes the rendered position. Without that pair the camera collapses onto the player over a few frames.

A pluggable collider contract. A collider is any object with a probe that answers "how far can the camera travel from the pivot along this ray before you block it." The rig queries every collider and takes the nearest hit, ignorant of whether the obstacle is terrain, a prop, or a wall, which is the same own-one-thing discipline the pluggable character controller used for locomotion. The probe is a swept sphere sized to contain the near plane, not a thin ray, so the frustum corners never clip a surface the centre ray would miss. Its radius is derived from the projection, the distance to a near-plane corner $\sqrt{n^{2} + w^{2} + h^{2}}$ times a safety margin, and recomputed whenever the field of view, aspect, or near plane changes.

A signed-distance terrain probe that honours overhangs and caves. Because terrain is a signed distance field rather than a heightmap, the same probe that stops the camera against a hillside stops it against a cave ceiling, the case a heightmap raycast structurally cannot see. The march is Hart-style sphere tracing, stepping by an under-relaxed fraction of the slack the field reports and clamped at both ends, because the heightmap regions report vertical clearance rather than true distance and over-report how far the camera can move on a slope, so a full stride would hop a ridge. A gradient-driven depenetration pass is the safety net that recovers the view when the ground is sculpted out from under the camera.

Asymmetric damping with a dwell timer. The boom snaps in fast when an occluder appears and eases out slowly once it clears, and only after a short window of continuous clearance, so whipping past a fence post never makes the camera lunge. Below a collapse threshold the rig flags near-first-person and the world loop hides the local avatar, the standard fallback for tight interiors instead of a camera buried in a wall.

References

The framing of camera control as a visibility constraint comes from Marc Christie and Patrick Olivier, Camera Control in Computer Graphics (Computer Graphics Forum, 2008). The distance-field march is John C. Hart, Sphere Tracing: A Geometric Method for the Antialiased Ray Tracing of Implicit Surfaces (The Visual Computer, 1996). The spring-arm pattern and its probe-sphere collision are documented in Epic's Spring Arm Component and Unity's Cinemachine Deoccluder and Third Person Follow. The motion and damping intuition is from Mark Haigh-Hutchinson, Real-Time Cameras (Morgan Kaufmann, 2009), and Itay Keren, Scroll Back: The Theory and Practice of Cameras in Side-Scrollers (GDC 2015). The mesh-collider upgrade path is Garrett Johnson's three-mesh-bvh.

Part 30 of 30. Previous: Part 29 - One controller, any body. Series guide: /blog/2026-02-25-open-world-browser-series-guide
