Building an open world in the browser, part 7: Marching cubes and the first real caves

By Oleg Sidorkin, CTO and Co-Founder of Cinevva

New here? Use the series guide. It explains what a spike is and links all parts.

Heightmaps are great until you need overhangs.

The moment you want carved tunnels, floating rock lips, or cave ceilings, a pure heightfield pipeline starts blocking you. A heightmap stores one Y value per XZ coordinate, so it cannot represent any surface that folds back over itself. We needed a volumetric representation.

Spike 12 implemented marching cubes on the GPU using WebGPU compute shaders. The algorithm evaluates a signed distance field (SDF) on a 3D grid and extracts a triangle mesh at the zero-crossing surface. Each grid cell is classified into one of 256 cases from the inside/outside signs at its 8 corners, and a lookup table supplies the triangles to emit for that case. We ran this on four active 64-cubed chunks simultaneously and tested animated SDF edits with per-frame remesh.

Open Spike 12 in a new tab ↗ · View source

The first win was confidence in the compute pipeline itself. A single dispatch could evaluate the SDF, classify cells, and emit vertices into a GPU buffer without any CPU readback. The second win was discovering how fast "it works" turns into artifact hunting. Missing triangles were rarely a marching cubes theory problem. They were table index mismatches, incorrect draw ranges reading past the active vertex count, or edge-case interactions near chunk boundaries where neighboring SDF samples weren't available.
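Two of those failure modes come down to bookkeeping, and a small sketch makes them concrete. This is not the Spike 12 code: the function names are hypothetical, and the logic runs on the CPU here purely for illustration (in a no-readback pipeline the same clamp would live in a shader that writes the indirect draw buffer). It assumes a chunk of N cells per axis needs N + 1 samples per axis, and it uses the four-value layout that WebGPU's `drawIndirect` reads: vertex count, instance count, first vertex, first instance.

```typescript
// Hypothetical helpers; names and the CPU-side placement are assumptions.

/** A chunk with `cellsPerAxis` cells needs one extra sample per axis, so the
 *  outermost cells can read their far corners (data the neighbor chunk owns). */
function samplesPerAxis(cellsPerAxis: number): number {
  return cellsPerAxis + 1;
}

/** Total SDF samples that must be available for one padded chunk. */
function paddedSampleCount(cellsPerAxis: number): number {
  const n = samplesPerAxis(cellsPerAxis);
  return n * n * n;
}

/** Build indirect draw arguments that never read past the written vertices.
 *  The count is clamped to the buffer capacity and rounded down to whole
 *  triangles, so a stale or overflowing counter cannot draw garbage. */
function buildDrawArgs(activeVertexCount: number, bufferCapacity: number): Uint32Array {
  const clamped = Math.min(activeVertexCount, bufferCapacity);
  const wholeTriangles = clamped - (clamped % 3);
  // Layout: [vertexCount, instanceCount, firstVertex, firstInstance]
  return new Uint32Array([wholeTriangles, 1, 0, 0]);
}
```

For a 64-cell chunk this means sampling a 65 x 65 x 65 grid, which is exactly the data that goes missing when a neighboring chunk has not been evaluated yet.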

This spike forced us to think in zones. Near the camera, you want volumetric freedom so players can carve, dig, and see caves. Far from the camera, you want clipmap efficiency where a flat heightmap is cheaper and perfectly adequate. That duality became the backbone of the architecture we kept refining from Spike 13 onward.

One of my favorite debugging moments was the wireframe toggle while edits were running. Watching topology form and dissolve in real time made quality tradeoffs immediately visible. You could see where vertex density was high enough, where it got too coarse, and exactly where LOD transitions would eventually need Transvoxel support to avoid cracks.

In part 8 we cover the integration challenge. Keeping raw compute-driven meshes and Three.js scene graph logic in one stable rendering pipeline was harder than the isolated demo suggested.

Technology referenced in this chapter

Marching cubes. An algorithm for extracting a triangle mesh from a 3D scalar field (Lorensen and Cline, 1987). Each cell in a regular 3D grid is classified by sampling the field at its 8 corners. The sign pattern produces a case index (0-255), and a lookup table maps each case to a set of triangles. Vertices are placed on grid edges by linearly interpolating between the edge's two corners to find where the field crosses zero. The algorithm is embarrassingly parallel since each cell is processed independently, making it ideal for GPU compute. See our landscape guide on SDFs and marching cubes.
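The two per-cell steps described above, classification and edge interpolation, can be sketched in a few lines. This is a minimal illustration and not the spike's shader code. It assumes the zero-crossing is the surface, that bit i of the case index is set when corner i is inside (negative), and it leaves out the 256-entry triangle table; corner ordering differs between published tables, so treat the bit convention as an assumption.

```typescript
type Vec3 = [number, number, number];

/** Build the 0-255 case index from the signs of the 8 corner samples.
 *  Assumed convention: bit i is set when corner i is inside (value < 0). */
function caseIndex(cornerValues: number[]): number {
  if (cornerValues.length !== 8) throw new Error("expected 8 corner samples");
  let index = 0;
  for (let i = 0; i < 8; i++) {
    if (cornerValues[i] < 0) index |= 1 << i;
  }
  return index;
}

/** Place a vertex on the edge p0-p1 where the field crosses zero,
 *  by linear interpolation of the two corner values v0 and v1. */
function interpolateEdge(p0: Vec3, p1: Vec3, v0: number, v1: number): Vec3 {
  const denom = v0 - v1;
  // Equal values give no unique crossing; fall back to the midpoint.
  const t = denom === 0 ? 0.5 : v0 / denom;
  return [
    p0[0] + t * (p1[0] - p0[0]),
    p0[1] + t * (p1[1] - p0[1]),
    p0[2] + t * (p1[2] - p0[2]),
  ];
}
```

Case indices 0 and 255 mean the cell is entirely outside or entirely inside, so no triangles are emitted for them. Every other index selects a row in the triangle table.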

Signed Distance Fields (SDFs). A volumetric representation that stores, at every point in 3D space, the signed distance to the nearest surface. Positive values are outside, negative are inside, and the zero-crossing is the surface. SDFs can represent arbitrary 3D shapes: caves, arches, overhangs, and floating geometry that heightmaps can't express. Editing is natural: adding material is a min() on the distance field, removing (digging) is max() with a negated shape, smooth blending uses smoothMin(). See SDF terrain representation.
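A small sketch of those edit operations, using the sign convention above (positive outside, negative inside). The helper names are hypothetical, the ground plane at y = 0 is an assumption for the example, and the smooth minimum shown is one common polynomial form; the source does not say which variant the spike uses.

```typescript
type Vec3 = [number, number, number];
type Sdf = (p: Vec3) => number;

/** Sphere: distance to the center minus the radius. */
const sphere = (center: Vec3, radius: number): Sdf => (p) =>
  Math.hypot(p[0] - center[0], p[1] - center[1], p[2] - center[2]) - radius;

/** Assumed example terrain: solid below y = 0, air above. */
const ground: Sdf = (p) => p[1];

/** Adding material is the union of two shapes: min of the distances. */
const addMaterial = (a: Sdf, b: Sdf): Sdf => (p) => Math.min(a(p), b(p));

/** Digging subtracts a shape: max of the terrain and the negated shape. */
const dig = (terrain: Sdf, shape: Sdf): Sdf => (p) =>
  Math.max(terrain(p), -shape(p));

/** Polynomial smooth minimum; k controls the blend width.
 *  It equals min(a, b) whenever |a - b| >= k. */
function smoothMin(a: number, b: number, k: number): number {
  const h = Math.max(k - Math.abs(a - b), 0) / k;
  return Math.min(a, b) - h * h * k * 0.25;
}

// Usage: carve a spherical cave of radius 1, centered 2 units underground.
const cave = dig(ground, sphere([0, -2, 0], 1));
```

Sampling `cave` at the sphere's center returns a positive value (air), while a point well below the sphere stays negative (rock). That sealed pocket is exactly the kind of shape a heightmap cannot store.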

WebGPU compute shaders. GPU programs that run general-purpose computation, not tied to the rasterization pipeline. A compute shader dispatches workgroups of threads that execute in parallel. For marching cubes, each thread processes one grid cell: sample the SDF, classify the cell, look up the triangulation, interpolate edge vertices, and append the results to a mesh buffer using atomic counters. No CPU readback is needed because the output buffer is used directly as vertex data for rendering. Will Usher's webgpu-marching-cubes demonstrates real-time 256^3 grid processing in the browser. See our landscape guide on WebGPU-driven LOD.
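The thread-per-cell mapping and the atomic append can be modeled on the CPU to show the arithmetic. This is a sketch under assumptions, not the spike's shader: the 4 x 4 x 4 workgroup size, the x-fastest cell ordering, and all names are hypothetical. `AppendCounter` imitates what an atomic add on a shared counter does on the GPU, where each thread reserves a private range of output vertices.

```typescript
// Assumed workgroup size: 4 x 4 x 4 = 64 threads per workgroup.
const WORKGROUP_SIZE = 4;

/** Workgroups to dispatch along one axis so every cell gets a thread. */
function workgroupsPerAxis(cellsPerAxis: number, workgroupSize: number = WORKGROUP_SIZE): number {
  return Math.ceil(cellsPerAxis / workgroupSize);
}

/** Decode a flat cell index into (x, y, z), with x varying fastest. */
function cellCoords(flatIndex: number, cellsPerAxis: number): [number, number, number] {
  const x = flatIndex % cellsPerAxis;
  const y = Math.floor(flatIndex / cellsPerAxis) % cellsPerAxis;
  const z = Math.floor(flatIndex / (cellsPerAxis * cellsPerAxis));
  return [x, y, z];
}

/** CPU model of an atomic append counter over a fixed-size vertex buffer. */
class AppendCounter {
  private next = 0;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  /** Reserve `count` vertices. Returns the start offset, or null when the
   *  range would not fit. The counter advances either way, as an atomic
   *  add would, so `requested` can exceed the capacity. */
  reserve(count: number): number | null {
    const start = this.next;
    this.next += count;
    return start + count <= this.capacity ? start : null;
  }

  /** Total vertices requested so far, including rejected reservations. */
  get requested(): number {
    return this.next;
  }
}
```

With these assumptions a 64-cell chunk dispatches 16 workgroups per axis. Because the counter can run past the buffer capacity, whatever consumes it for drawing should clamp the value first.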

Hybrid heightmap + SDF architecture. The practical approach for browser terrain: heightmaps cover the entire world (cheap, compact), while SDF volumes exist only in chunks that need caves, overhangs, or creator-carved features (5-10% of chunks). Near the camera, volumetric freedom allows carving and caves. Far away, heightmaps render terrain efficiently where overhangs are not needed. See hybrid terrain representation.
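One way to picture the hybrid layout is a chunk record where the heightmap is always present and the SDF volume is optional. The sketch below is an assumption about how such a choice could be expressed, not the architecture's actual code: the `TerrainChunk` shape, the `pickMesher` name, and the single distance threshold are all hypothetical.

```typescript
/** Hypothetical chunk record: every chunk has heights, few have a volume. */
interface TerrainChunk {
  heights: Float32Array; // one height per XZ sample, always present
  sdf?: Float32Array;    // distance samples, only for carved or cave chunks
}

type Mesher = "marching-cubes" | "heightmap";

/** Use the volumetric path only when the chunk has SDF data and sits inside
 *  the near zone; everything else falls back to the heightmap path. */
function pickMesher(
  chunk: TerrainChunk,
  distanceToCamera: number,
  volumetricRadius: number,
): Mesher {
  const hasVolume = chunk.sdf !== undefined;
  return hasVolume && distanceToCamera <= volumetricRadius
    ? "marching-cubes"
    : "heightmap";
}
```

The design point is that the expensive representation is opt-in per chunk, so most of the world never pays for it.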


Part 7 of 12.
Previous: Part 6 - Clipmaps changed the plot
Next: Part 8 - Integration without losing our baseline
Series guide: /blog/2026-02-25-open-world-browser-series-guide