Browser 3D Open World Tech for Multiplayer Creator Worlds
We want to put our creators in a shared open world. Not a lobby. Not a gallery. A living 3D space where people can explore, build, and stumble onto each other's work. Something that loads in a browser tab and feels like a place worth being in.
That's a hard problem. Skyrim and The Witcher 3 spent hundreds of millions of dollars building their worlds, and they still ship on dedicated hardware with tens of gigabytes on disk. We're targeting a browser tab, shared state across hundreds of players, and a world that creators can actually reshape.
This guide documents what we found researching the tech. What's possible today, what's coming, and what architecture would get us there.
What We Can Learn from Skyrim and The Witcher
Before picking technology, it helps to understand how the two most successful open worlds actually work. Their techniques map surprisingly well to browser constraints.
Skyrim's Cell System
Skyrim divides its world into a grid of 57x37 exterior cells, each 4096x4096 game units (roughly 59 meters). The engine loads a 5x5 grid of cells around the player at any time. As you walk, cells on the trailing edge unload while cells on the leading edge stream in. Interior spaces (dungeons, buildings) are separate worldspaces loaded through door triggers.
This is directly applicable to a browser world. You don't load the entire map. You load a neighborhood around the player and swap chunks as they move. The key insight: Skyrim never lets you see the full detail of distant terrain. It uses a multi-resolution approach.
LOD (Level of Detail) tiers in Skyrim:
- Full detail within 1-2 cells (player's immediate area)
- Simplified meshes at medium range (objects lose small geometry)
- Billboard impostors for trees and rocks beyond medium range
- Terrain LOD uses pre-baked low-resolution meshes for distant landscape
- Object fade distances tuned per-object (grass fades first, buildings last)
Skyrim's landscape uses a heightmap-based terrain with 32x32 vertex patches per cell. Each patch can have different textures blended with alpha maps. This is cheap to store and render. A single cell's terrain data is measured in kilobytes, not megabytes.
What we can apply: Chunk-based streaming, heightmap terrain, aggressive LOD, interior/exterior separation, and the idea that distant terrain doesn't need to be geometrically accurate, just visually convincing.
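The cell-streaming pattern above reduces to a small diff computation. A minimal sketch, assuming illustrative values (64 m chunks, a 5x5 neighborhood like Skyrim's default uGridsToLoad=5):

```typescript
// Skyrim-style chunk streaming: given the player's position, compute which
// chunks should be resident and diff against the currently loaded set.
const CHUNK_SIZE = 64;  // meters per chunk (our assumption, not Skyrim's)
const LOAD_RADIUS = 2;  // radius 2 => a 5x5 neighborhood around the player

type ChunkKey = string; // "x,z" grid coordinates

function desiredChunks(px: number, pz: number): Set<ChunkKey> {
  const cx = Math.floor(px / CHUNK_SIZE);
  const cz = Math.floor(pz / CHUNK_SIZE);
  const wanted = new Set<ChunkKey>();
  for (let dz = -LOAD_RADIUS; dz <= LOAD_RADIUS; dz++)
    for (let dx = -LOAD_RADIUS; dx <= LOAD_RADIUS; dx++)
      wanted.add(`${cx + dx},${cz + dz}`);
  return wanted;
}

// Chunks to stream in on the leading edge, and to unload on the trailing edge.
function diffChunks(loaded: Set<ChunkKey>, wanted: Set<ChunkKey>) {
  const load = [...wanted].filter(k => !loaded.has(k));
  const unload = [...loaded].filter(k => !wanted.has(k));
  return { load, unload };
}
```

As the player crosses a chunk boundary, the diff yields exactly one edge row to load and one to unload, which is the "trailing edge / leading edge" behavior described above.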
The Witcher 3's Streaming Architecture
The Witcher 3's world is larger than Skyrim (roughly 136 km2 across all regions) and uses a more sophisticated streaming system. CD Projekt RED built a custom engine (REDengine 3) where the world is divided into streaming sectors, not a strict grid.
Each sector contains a hierarchy of content layers:
- Terrain streams as patches with 4 LOD levels
- Foliage uses GPU instancing with distance-based culling
- Static geometry (buildings, rocks) streams independently from terrain
- Gameplay objects (NPCs, items, triggers) load based on quest state and proximity
The renderer uses deferred shading with physically-based materials. One trick that's particularly relevant: The Witcher 3 aggressively uses impostors. A tree that's 200 meters away isn't a 3D model with branches and leaves. It's a flat textured quad that always faces the camera. The engine pre-renders these impostors from multiple angles and swaps them as the camera moves. You never notice because they're far enough away.
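Impostor selection is cheap at runtime: pick the pre-rendered view whose bake angle is closest to the current camera direction. A sketch assuming 8 views baked at 45-degree increments around the vertical axis (the view count and layout are our assumptions, not CDPR's):

```typescript
// Impostor view selection: choose which pre-rendered angle to display for a
// distant object, based on the direction from the object to the camera.
const VIEW_COUNT = 8; // baked views, 45 degrees apart (illustrative)

function impostorViewIndex(camX: number, camZ: number, objX: number, objZ: number): number {
  // Horizontal angle from the object toward the camera, in radians
  const angle = Math.atan2(camX - objX, camZ - objZ);
  const step = (2 * Math.PI) / VIEW_COUNT;
  // Round to the nearest baked view and wrap into [0, VIEW_COUNT)
  return ((Math.round(angle / step) % VIEW_COUNT) + VIEW_COUNT) % VIEW_COUNT;
}
```

The engine swaps the impostor's texture as this index changes; at 200+ meters the snap between views is below noticeable thresholds.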
World composition: The Witcher 3's open world was built in layers by separate teams working simultaneously. The terrain team sculpted the landscape. The environment art team placed structures. The quest team wired up triggers and NPC paths. This layered composition system let a large team work without stepping on each other.
What we can apply: Layered world composition (critical for a creator platform), impostor rendering for distant objects, deferred rendering for many light sources, and the insight that streaming should be content-aware (don't load interiors you can't see, don't spawn NPCs you're not near).
Breath of the Wild's Chemistry Engine
Nintendo's approach to open world design inverted the typical formula. Instead of filling the world with scripted content, they built systemic rules and let players discover emergent behavior. Fire spreads to grass. Metal conducts electricity in thunderstorms. Wind carries objects. Everything interacts with everything else through a consistent physics/chemistry layer.
This matters for a creator world because it suggests that the world itself can be more interesting than any individual placed object. If creators can define material properties and interaction rules (not just place static meshes), the world gains emergent behavior that makes exploration worthwhile even in areas nobody explicitly designed.
Technical takeaways from BotW:
- Relatively low polygon count and resolution (900p docked on Switch). The stylized cel-shaded look hides the low fidelity. A browser world could achieve similar visual quality today.
- Distance rendering uses a toon-shaded fog that fades into watercolor-style skyboxes. Cheap to render, beautiful in practice. This kind of atmospheric perspective is essentially free in a fragment shader.
- The world is roughly 84 km2 but sparse. Most of the map is traversable terrain with points of interest spread across it. Dense content is reserved for towns and shrines. This "sparse but interesting" approach reduces asset requirements dramatically compared to The Witcher 3's densely populated Novigrad.
- Physics objects have material tags (wood, metal, food, explosive) that determine interactions. The number of actual rules is small (maybe 20-30 interaction types) but they combine in ways that feel infinite.
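The tag-plus-rules model above is simple to express in code. A sketch with illustrative tags and rules (not Nintendo's actual tables):

```typescript
// BotW-style chemistry: a small set of material tags and symmetric pairwise
// rules. A handful of rules composes into a large interaction space.
type Tag = "fire" | "wood" | "metal" | "water" | "electric";

const rules: Record<string, string> = {
  "fire+wood": "ignite",
  "fire+water": "extinguish",
  "electric+metal": "conduct",
  "electric+water": "conduct",
};

function interact(a: Tag, b: Tag): string | null {
  // Rules are symmetric: fire+wood and wood+fire resolve to the same rule.
  return rules[`${a}+${b}`] ?? rules[`${b}+${a}`] ?? null;
}
```

For a creator platform, the important property is that creators only assign tags to their objects; the platform owns the (small) rule table, so every placed object automatically participates in the world's systemic behavior.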
GTA V's World Layering
Rockstar's approach to Los Santos teaches a different lesson. GTA V's world feels alive because of layered ambient systems, not because every NPC has a quest.
The city has traffic systems that simulate hundreds of vehicles on a road network. Pedestrians walk routes, react to the player, and interact with each other. Time of day changes the population density, the types of NPCs present, and the ambient activity. Weather affects driving physics and NPC behavior.
For a creator world, the insight is that ambient life makes the difference between a world that feels like a museum and one that feels like a place. Even simple behaviors (birds flying overhead, waves lapping at a shore, NPCs walking between buildings) create the illusion of a living world.
Technical takeaways from GTA V:
- The map is 75 km2 and uses aggressive LOD combined with a streaming system that loads based on velocity (driving fast loads further ahead than walking).
- GTA Online puts 30 players in this world simultaneously. Even at 30, Rockstar found they needed spatial interest management: you get detailed updates about nearby players and less frequent updates about distant ones. The same principle applies at our 200-player target.
- GTA V's asset pipeline is one of the most efficient ever built. The entire game fits in roughly 80 GB because of extreme texture compression, mesh sharing, and procedural detail. Their building interiors reuse modular pieces extensively.
- The game uses a sophisticated impostor system where distant buildings are actually flat photographs composited together. The player never notices because the transition distances are tuned to specific viewing angles.
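Two of the GTA techniques above (velocity-based streaming and spatial interest management) reduce to small functions. A sketch with illustrative tier distances and rates, not Rockstar's values:

```typescript
// Spatial interest management: update frequency drops with distance.
function updateRateHz(distance: number): number {
  if (distance < 50) return 20;   // nearby: full-rate position updates
  if (distance < 200) return 5;   // medium range: occasional updates
  if (distance < 1000) return 1;  // far: minimap-level presence only
  return 0;                       // outside the interest radius entirely
}

// Velocity-aware streaming: load further ahead when the player moves fast.
function streamAheadMeters(speedMps: number): number {
  const BASE = 320;    // baseline look-ahead while walking
  const PER_MPS = 20;  // extra meters of look-ahead per m/s of speed
  return BASE + speedMps * PER_MPS;
}
```

In practice the look-ahead would also be biased in the direction of travel, so a fast-moving player streams a cone ahead of them rather than a symmetric ring.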
Elden Ring's Seamless Open World
FromSoftware built Elden Ring on the same engine as Dark Souls but scaled it to an open world. The result is interesting because it shows how a relatively small team (compared to Rockstar or CD Projekt) can build a large open world.
The trick is density variation. Elden Ring's map is huge (roughly 79 km2) but large sections are open terrain with scattered enemies and points of interest. The dense, hand-crafted content (Legacy Dungeons like Stormveil Castle) is embedded in the open world and uses the same interior/exterior separation that Skyrim uses.
Technical takeaways from Elden Ring:
- The world streams in large tiles. On horseback you can see very far, but the distant terrain is extremely simplified. Close up, the detail is comparable to Dark Souls 3.
- Loading screens only appear when fast-traveling or entering certain dungeons. The open world itself streams without interruption.
- The game reuses environmental assets extensively. The same tree models, rock formations, and ruin pieces appear across the map in different arrangements. With enough variety in arrangement and lighting, repetition isn't noticeable. This is directly applicable to a creator world where a library of modular pieces can generate infinite variety.
- Multiplayer is session-based (summoning other players into your world), not persistent. But the asynchronous features (messages, bloodstains, ghosts of other players) create a sense of shared existence without the networking overhead of a persistent server. These ambient multiplayer features would be cheap to add to a browser world.
No Man's Sky: Procedural Everything
No Man's Sky is the extreme case of procedural generation. 18 quintillion planets generated from a shared seed, so every player sees the same universe without any of it being stored on a server.
The generation pipeline works in stages: galaxy-level seeds determine star positions, star seeds determine planet count and types, planet seeds drive terrain generation (multi-octave noise), biome assignment, flora/fauna species, color palettes, and resource distribution. Everything computes deterministically from the seed, so two players visiting the same coordinates see the same planet without exchanging any planet data.
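The deterministic property is the whole trick: a seed plus coordinates must hash to the same value on every client, so no terrain data ever crosses the wire. A minimal sketch using a simple integer-mixing hash we chose for illustration (not Hello Games' actual generator):

```typescript
// Seed + coordinates -> stable 32-bit value, identical on every client.
function hashCoords(seed: number, x: number, y: number): number {
  let h = seed ^ (x * 374761393) ^ (y * 668265263); // large primes spread the bits
  h = Math.imul(h ^ (h >>> 13), 1274126177);        // avalanche the low bits upward
  return (h ^ (h >>> 16)) >>> 0;                    // unsigned 32-bit result
}

// Derive a stable [0, 1] value, e.g. for biome or palette selection.
function coordRandom(seed: number, x: number, y: number): number {
  return hashCoords(seed, x, y) / 0xffffffff;
}
```

Layering several of these (one per octave, feeding a noise function) yields the multi-octave terrain described above, still with zero server-side terrain storage.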
Technical takeaways from No Man's Sky:
- Terrain generation uses voxel-based marching cubes with a stack of noise functions. This allows caves, overhangs, and floating islands that heightmap terrain can't represent. The trade-off is higher compute cost per chunk, but the result is more visually interesting terrain.
- The base-building system is closest to what we're envisioning. Players place structures that persist on the server and are visible to other players. The base data is compact (a list of parts with positions and rotations) and streams on demand when someone visits the planet.
- The game was initially criticized for repetitive content despite infinite variety. Seed-based generation can produce infinite terrain but limited surprises. The lesson: procedural generation works for the canvas, but creator-placed content is what makes a place feel designed and intentional.
- Hello Games added multiplayer years after launch. Up to 32 players share a session with full building and exploration. The network model is simple: one player hosts, others connect. For a browser world with persistent state, a server-authoritative model works better, but No Man's Sky proves that shared procedural worlds are tractable.
Minecraft: The Blueprint for Creator Worlds
Minecraft is the most important reference point for what we're building, more than any graphical AAA title. 300 million copies sold. The world is infinitely generated, fully destructible, and multiplayer. Creators don't just place objects in Minecraft. They reshape the terrain itself.
Technical takeaways from Minecraft:
- The world is divided into 16x16x384 block chunks. Only chunks near the player are loaded (render distance is configurable). This is the same chunk streaming pattern as Skyrim but with fully editable terrain.
- Each chunk is stored as a palette-compressed array of block IDs. A chunk with only 5 different block types stores a 4-bit palette index per block instead of a full block ID. This makes chunk data very compact (typically 10-50 KB per chunk after compression).
- Minecraft's multiplayer protocol is well-documented and relatively simple. The server sends chunk data as the player moves. Block changes are broadcast as small delta updates (position + new block type). This is exactly the model we'd use for creator edits.
- The game runs in Java and now has a Bedrock Edition in C++. There are browser-based Minecraft clones (ClassiCube, eaglercraft) that prove the core concept works in WebGL. They typically handle 8-12 chunk render distance at 60fps, which gives a view of about 200-400 meters.
- Minecraft's modding ecosystem is its real moat. Mods add new blocks, entities, biomes, and gameplay systems. For a creator world, this suggests that extensibility matters as much as the base experience. If creators can define new interaction types (not just place objects), the world gets richer over time.
- Redstone (Minecraft's in-game wiring system) shows that creators will build complex systems if you give them simple, composable primitives. Logic gates, automatic farms, calculators. The simple rule set generates extraordinary complexity. This is the same lesson as BotW's chemistry engine.
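The palette-compression idea above is worth sketching, because it is exactly the storage model a creator world would use for editable terrain. A simplified version (the real Minecraft format packs indices into bit arrays and varies by version):

```typescript
// Palette compression: store each block as an index into a small per-chunk
// palette instead of a full block ID. With 5 block types, 4-bit indices suffice.
function compressChunk(blocks: number[]): { palette: number[]; indices: number[] } {
  const palette: number[] = [];
  const lookup = new Map<number, number>();
  const indices = blocks.map(id => {
    let idx = lookup.get(id);
    if (idx === undefined) {
      idx = palette.length; // first sighting of this block type: extend palette
      palette.push(id);
      lookup.set(id, idx);
    }
    return idx;
  });
  return { palette, indices };
}

function decompressChunk(palette: number[], indices: number[]): number[] {
  return indices.map(i => palette[i]);
}
```

The index array then bit-packs to ceil(log2(paletteSize)) bits per block before a general compressor runs over it, which is how a full chunk lands in the 10-50 KB range.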
Common Patterns Across AAA Open Worlds
Looking across all these titles, the same patterns keep showing up:
Spatial partitioning is universal. Whether it's cells (Bethesda), sectors (CD Projekt), chunks (Mojang/Rockstar), or tiles (FromSoftware), every open world divides space into loadable units. No engine tries to hold the entire world in memory.
Multi-resolution everything. Terrain, meshes, textures, and even audio all have multiple quality levels. The resolution you get depends on how close you are and how important the object is.
Occlusion culling matters more than raw triangle throughput. Not rendering what you can't see saves more performance than optimizing what you can. Skyrim uses a simple distance-based system. The Witcher 3 uses software occlusion with large occluders (buildings, cliffs). Modern engines like UE5's Nanite take this further with hardware occlusion queries.
Async loading hides transitions. These games don't show loading screens for the open world (only for fast-travel or interior transitions). They load content on background threads, decompress in parallel, and swap in new content gradually.
Art direction over polygon count. Skyrim shipped in 2011 with graphics that were modest even then. BotW runs on a tablet-class chip and looks beautiful. Minecraft uses 16x16 pixel textures and is one of the most visually recognizable games ever made. For a browser world, this matters enormously. A well-directed art style at lower fidelity will always beat a technically advanced but artistically flat world.
Systemic design beats scripted content. BotW's chemistry engine, Minecraft's block interactions, and GTA V's traffic systems all create emergent behavior from simple rules. This is cheaper to build, cheaper to run, and generates more player stories than hand-crafted scripted sequences. For a creator world, systemic design means the world stays interesting even in areas nobody explicitly designed.
Creator-placed content needs persistence and visibility. Minecraft bases, No Man's Sky bases, and GTA Online properties all persist across sessions and are visible to other players. The data model for creator content is always compact (part IDs + transforms) while the visual representation is rich (the client expands the data into full 3D scenes).
Rendering: What Can a Browser Actually Do?
Three.js
Three.js is the foundation. It has the largest community (over 100K GitHub stars), the most examples, and the broadest compatibility. It abstracts WebGL 2 and has experimental WebGPU support via WebGPURenderer.
For an open world, Three.js gives you:
- Instanced rendering for foliage, rocks, and repeated geometry (`InstancedMesh`)
- Built-in LOD system (`THREE.LOD` swaps meshes by distance)
- Automatic per-object frustum culling
- Terrain via custom `BufferGeometry` or heightmap-based `PlaneGeometry`
- PBR materials through `MeshStandardMaterial` and `MeshPhysicalMaterial`
- Post-processing via `EffectComposer` (bloom, SSAO, tone mapping)
- Shadow maps, with cascaded shadow mapping possible through custom code
- glTF/GLB as the primary asset format (compact, GPU-ready)
Three.js also has an active ecosystem of tools that matter for open worlds. three-mesh-bvh accelerates raycasting and spatial queries over complex meshes. postprocessing (by vanruesc) provides a more performant post-processing stack than Three's built-in one. three-gpu-pathtracer enables reference-quality rendering.
Limitations for open worlds: Three.js doesn't have a built-in scene graph that handles streaming, LOD management, or spatial partitioning at scale. You build those yourself. There's no built-in entity component system (ECS), no physics, and no terrain system. It's a renderer, not an engine. That's actually an advantage for a custom open world because you control memory layout and loading strategy, but it means more up-front work.
```javascript
const lod = new THREE.LOD();
lod.addLevel(highDetailMesh, 0);
lod.addLevel(mediumDetailMesh, 50);
lod.addLevel(lowDetailMesh, 200);
lod.addLevel(impostorSprite, 500);
scene.add(lod);
```

Babylon.js
Babylon.js is the other major contender. Microsoft-backed, it has deeper built-in features than Three.js for the specific problem of open worlds.
Relevant built-in features:
- Octree-based scene partitioning for efficient culling of large scenes
- Solid Particle System for massive instanced rendering
- Terrain from heightmaps with built-in LOD and multi-texture splatting
- Node Material Editor for visual shader creation
- WebGPU support (more mature than Three.js, as Babylon invested earlier)
- Havok physics integration (Wasm-compiled, production-quality)
- glTF streaming with progressive loading
Babylon's DynamicTerrain extension generates terrain chunks on-the-fly from heightmap data, handles LOD automatically, and supports texture splatting. This is much closer to what Skyrim does than anything Three.js offers out of the box.
```javascript
const terrain = new BABYLON.DynamicTerrain("terrain", {
  terrainSub: 100,
  mapData: heightmapData,
  mapSubX: 1000,
  mapSubZ: 1000,
}, scene);
terrain.LODLimits = [4, 3, 2, 1];
```

The trade-off: Babylon.js is a larger library (the full build is ~1-2 MB minified vs Three.js at ~600 KB). But for an open world, you'll likely add enough custom code to Three.js that the size difference becomes negligible. Babylon also has its own Node Material system, inspector, and dev tools, which speed up iteration.
PlayCanvas
PlayCanvas is worth mentioning because it's the most production-proven web-first 3D engine. Snap, Facebook, and numerous advertising clients have shipped complex 3D experiences with it. The engine is around 1MB, loads fast, and the cloud editor enables collaborative world building.
For an open world specifically, PlayCanvas offers batch groups for draw call optimization, a built-in lightmapper, and Gaussian splatting support (relevant for photogrammetry-captured real-world environments). The runtime is tight and well-optimized for mobile browsers.
WebGPU: The Performance Unlock
WebGPU changes the equation for browser open worlds. The two features that matter most:
Compute shaders enable GPU-side terrain generation, foliage placement, particle simulation, and even simple physics. In a WebGL world, all of this runs on the CPU in JavaScript. With WebGPU, you can generate terrain patches entirely on the GPU, compute LOD transitions on the GPU, and run culling passes on the GPU. This frees the CPU for networking, game logic, and content streaming.
Indirect rendering lets the GPU decide what to draw based on compute shader output. You submit one draw call, and the GPU decides how many instances to render for each LOD level based on distance. This is how modern engines handle millions of blades of grass or trees. Without indirect rendering (which WebGL doesn't support), the CPU has to sort and batch everything, which becomes the bottleneck in dense scenes.
WebGPU is available in Chrome, Edge, and Firefox on desktop. Safari support is partial. Mobile support is limited. For a creator platform where most users are on desktop browsers, this is workable. You'd run WebGPU on capable browsers and fall back to a reduced-fidelity WebGL path for the rest.
```wgsl
@compute @workgroup_size(64)
fn generateTerrain(@builtin(global_invocation_id) id: vec3<u32>) {
  let worldPos = vec2<f32>(f32(id.x), f32(id.y)) * cellSize + worldOffset;
  let height = fbmNoise(worldPos, octaves, persistence);
  heightmap[id.x + id.y * width] = height;
}
```

Recommended Rendering Stack
For a browser open world targeting creators on desktop:
Primary: Three.js or Babylon.js with WebGPU renderer where available, WebGL 2 fallback. Babylon has stronger built-in open-world primitives. Three.js has a larger community and more flexibility.
Our pick: Babylon.js for the engine core because of built-in terrain LOD, octree culling, Havok physics (Wasm), and mature WebGPU support. Use Three.js ecosystem tools where Babylon lacks them (e.g., mesh BVH for spatial queries). Wrap everything in a custom world-streaming layer.
World Streaming Architecture
This is the hard part. A browser tab gets roughly 2-4 GB of memory on desktop (browser-imposed limits), about 1 GB on mobile, and no direct disk access. Everything comes over the network. You need an architecture that keeps the visible world in memory while streaming content just ahead of where the player is moving.
Chunk-Based World Grid
Like Skyrim's cell system, divide the world into a regular grid of chunks. Each chunk is an independent unit that can be loaded, rendered, and unloaded separately.
Chunk sizing matters. Too small and you're constantly loading/unloading with high overhead. Too large and each chunk takes too long to download. For a browser world with typical broadband connections:
- 64x64 meter chunks at ground level
- Each chunk contains: heightmap patch (2-4 KB), texture splat map (16-32 KB compressed), static meshes as instanced refs (1-50 KB of instance data), creator-placed objects as a manifest (1-10 KB)
- Load radius: 5x5 chunks at full detail (320m view), 9x9 at medium LOD, 17x17 at terrain-only
- Target: each chunk's full-detail data under 200 KB, so a 5x5 neighborhood is under 5 MB
Progressive Loading Pipeline
Don't load everything about a chunk at once. Use a priority queue:
- Terrain geometry first (heightmap only, 2-4 KB per chunk). The player sees ground within 100ms.
- Terrain textures (splat maps, low-res first then upgrade). Ground has color within 200ms.
- Major structures (buildings, large rocks). Silhouettes appear within 500ms.
- Detail objects (foliage, small props, creator items). World fills in over 1-2 seconds.
- High-res textures upgrade last. Nobody notices if a distant building's texture takes an extra second.
This matches how the human eye works. We notice missing ground and missing large structures. We don't notice missing grass.
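The five-stage pipeline above is a priority queue over per-chunk load tasks. A sketch with our own task names and priorities (a production version would use a heap and re-prioritize on player movement):

```typescript
// Progressive chunk loading: each chunk expands into prioritized tasks, and
// the queue drains terrain before textures before props.
type LoadTask = { chunkId: string; kind: string; priority: number };

const PRIORITY: Record<string, number> = {
  terrain: 0,       // ground geometry first: player sees ground within ~100ms
  splatmap: 1,      // then terrain color
  structures: 2,    // then building/rock silhouettes
  props: 3,         // foliage, small props, creator items
  hiResTextures: 4, // quality upgrades last; nobody notices the wait
};

function enqueueChunk(queue: LoadTask[], chunkId: string): void {
  for (const kind of Object.keys(PRIORITY)) {
    queue.push({ chunkId, kind, priority: PRIORITY[kind] });
  }
  queue.sort((a, b) => a.priority - b.priority); // a real impl would use a heap
}
```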
Memory Management
Browser memory is finite and the garbage collector is your enemy. A single GC pause can drop you from 60fps to 10fps for a frame.
Object pooling is mandatory. Pre-allocate pools for common objects (trees, rocks, grass patches) and recycle them as chunks load/unload. Never create new THREE.Mesh or BABYLON.Mesh instances in the hot path. Swap geometry and material references on pooled objects instead.
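A minimal pool sketch for the pattern above. The pool is generic here; in practice the entries would be pre-built THREE.Mesh or BABYLON.Mesh objects whose geometry and material references get swapped on reuse:

```typescript
// Object pool: pre-allocate, recycle, and never allocate in the hot path.
class Pool<T> {
  private free: T[] = [];
  constructor(private create: () => T, prewarm: number) {
    // Pre-allocate up front, outside the frame loop, to avoid GC pressure later.
    for (let i = 0; i < prewarm; i++) this.free.push(create());
  }
  acquire(): T {
    // Reuse a pooled object when available; only allocate on pool exhaustion.
    return this.free.pop() ?? this.create();
  }
  release(obj: T): void {
    this.free.push(obj); // returned to the pool, never garbage-collected mid-frame
  }
}
```

Chunk unload then becomes "release every pooled object the chunk borrowed" rather than "drop references and hope the GC is merciful".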
Texture atlases reduce both draw calls and memory fragmentation. Pack all terrain textures into a few large atlases. Pack creator-uploaded textures into per-chunk atlases on the server and stream them as single images.
Geometry compression with Draco or Meshopt reduces download size by 5-10x and decompression runs on a Web Worker so it doesn't block the main thread. For terrain specifically, quantized heightmaps (16-bit values compressed with simple delta encoding) are smaller than any general-purpose mesh format.
ArrayBuffer ownership transfer between workers and the main thread avoids copying. When a worker decompresses a mesh, transfer the buffer to the main thread with zero copy using postMessage with transferable objects.
Asset Delivery
CDN with edge caching for static world data. Terrain chunks, base meshes, and texture atlases that don't change often should be cached aggressively (Cache-Control: max-age=31536000, immutable).
Content-addressed storage means each asset version gets a unique hash in its URL. When a creator modifies a chunk, the new version gets a new hash and the old one stays in cache for anyone still looking at it. No cache invalidation needed.
KTX2 textures with Basis Universal compression. These decompress to whatever GPU format the device supports (BC7, ASTC, ETC2, or RGBA fallback). A 1024x1024 terrain texture goes from 4 MB uncompressed to roughly 150 KB in KTX2. For a world with thousands of unique textures, this compression is the difference between viable and not.
glTF Binary (GLB) for all 3D assets. It's the JPEG of 3D. Every browser engine loads it, it's compact, and it can embed textures, materials, and animations in a single file. Use Draco or Meshopt extensions for mesh compression. Creator-uploaded assets get processed server-side into optimized GLB before entering the world.
Multiplayer Networking
Putting hundreds of creators in the same world requires a networking architecture that handles real-time movement, persistent world state, and creator edits without melting.
Server Architecture
Authoritative server for world state. The browser is untrusted. All meaningful actions (placing objects, modifying terrain, moving between chunks) are validated server-side. The client predicts locally and reconciles with server state.
Spatial sharding divides the world across server instances. Each shard owns a rectangular region of the world grid. As player density shifts, shards can split or merge. This is how EVE Online handles thousands of players in one universe, and it's the same principle at smaller scale.
For a creator world where most interactions are local (you're building in your area, your neighbors can see it), spatial sharding works naturally. A player standing at a shard boundary sees both shards' content, which requires cross-shard visibility queries, but this is a solved problem.
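Shard assignment is the same grid math as chunk streaming, one level up. A sketch with an illustrative shard size; the boundary case is handled by subscribing to every shard within view distance:

```typescript
// Spatial sharding: map a world position to the shard that owns it, plus the
// set of shards a player near a boundary must subscribe to.
const SHARD_SIZE = 512; // meters per shard edge (illustrative)

function shardFor(x: number, z: number): string {
  return `${Math.floor(x / SHARD_SIZE)},${Math.floor(z / SHARD_SIZE)}`;
}

// Shards within viewDistance of the player: usually 1, up to 4 at a corner.
function visibleShards(x: number, z: number, viewDistance: number): Set<string> {
  const shards = new Set<string>();
  for (const dx of [-viewDistance, 0, viewDistance])
    for (const dz of [-viewDistance, 0, viewDistance])
      shards.add(shardFor(x + dx, z + dz));
  return shards;
}
```

The player holds one authoritative connection (their owning shard) plus read-only subscriptions to the others, which is the cross-shard visibility query mentioned above.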
Technology options for the server:
| Technology | Strengths | Use Case |
|---|---|---|
| Cloudflare Durable Objects | Edge-deployed, auto-scaling, built-in persistence, WebSocket support | World shard state, per-chunk authority |
| Hathora / Rivet | Managed game server hosting, DDoS protection, global deployment | Dedicated game server instances |
| Colyseus | Open-source game server framework for Node.js, schema-based state sync | Room-based multiplayer with state diffing |
| PartyKit | Edge-deployed, WebSocket + WebRTC, Cloudflare Workers based | Real-time collaboration, lightweight multiplayer |
| Custom Rust/Go | Maximum control, best performance per instance | High-density shards needing low-latency physics |
Our context (Cloudflare infrastructure): Durable Objects are a natural fit. Each world chunk becomes a Durable Object that holds authoritative state for that chunk's content. Players connect via WebSocket to the Durable Object responsible for their current chunk. When they move to an adjacent chunk, they connect to that chunk's DO. Durable Objects persist state to disk automatically, so world data survives restarts.
Client-Server Communication
WebSocket for reliable ordered messages (chat, world edits, inventory, game state). One connection per active shard the player can see (typically 1-4 connections).
WebRTC DataChannel for unreliable unordered messages (player positions, animations, ephemeral effects). WebRTC is peer-to-peer capable, but for a world with many players, you'd run it through an SFU (Selective Forwarding Unit) to avoid O(N²) peer connections. Cloudflare Calls or LiveKit can serve as the SFU.
State synchronization uses delta compression. The server tracks what each client has seen and only sends changes. For a creator world, this is particularly important because the world state (what objects exist, where they are, what properties they have) changes much less frequently than player positions. You can afford to send world state updates at 2-5 Hz while player positions update at 20-30 Hz.
```typescript
interface WorldChunkState {
  version: number;
  terrain: TerrainPatch;
  objects: PlacedObject[];
  creators: CreatorPresence[];
}

interface DeltaUpdate {
  chunkId: string;
  fromVersion: number;
  toVersion: number;
  addedObjects: PlacedObject[];
  removedObjectIds: string[];
  modifiedObjects: Partial<PlacedObject>[];
  creatorMoves: CreatorPosition[];
}
```

Conflict Resolution for Creator Edits
When two creators modify the same area simultaneously, you need a conflict resolution strategy. This is where the choice between real-time collaboration models matters.
Last-write-wins is the simplest. Each object has a single owner at any time. If you're editing a building, nobody else can edit it until you release it. Simple, no conflicts, but limits collaboration.
Operational Transform (OT) is what Google Docs uses. Operations are transformed against concurrent operations to produce a consistent result. This works for text but gets complex for 3D spatial operations. Figma uses a variant of this for their 2D canvas.
CRDTs (Conflict-free Replicated Data Types) allow concurrent edits that always converge to the same state without coordination. For a world with discrete objects (each with an ID and properties), a Last-Writer-Wins Register per property combined with an Add-Wins Set for the object collection gives you automatic convergence. Yjs and Automerge are production-quality CRDT libraries for JavaScript.
Our recommendation: Use CRDTs for world object state (what exists, where it is, what properties it has) and authoritative server for spatial validation (no two objects in the same spot, objects stay within world bounds). The CRDT handles the collaboration. The server handles the physics.
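The per-property register in this model is the standard last-writer-wins construction (what Yjs and Automerge build on, not their internals). The key property is that merge order doesn't matter, so replicas converge without coordination:

```typescript
// LWW register: each property carries (value, timestamp, actorId). Merging two
// replicas keeps the newer write, with actorId as a deterministic tiebreaker.
type LWW<T> = { value: T; timestamp: number; actor: string };

function mergeLWW<T>(a: LWW<T>, b: LWW<T>): LWW<T> {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.actor > b.actor ? a : b; // same timestamp: break ties by actor id
}
```

Because mergeLWW is commutative, associative, and idempotent, any gossip or sync order yields the same final object state; the server's spatial validation then rejects merged states that violate physics.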
Handling Scale: How Many Players?
Browser MMOs exist today. Hordes.io runs 200+ players in one scene in a browser. BrowserQuest (Mozilla's experiment) handled hundreds with a simple tile-based world. The question isn't whether browsers can handle multiplayer, it's what visual fidelity you can maintain as player count rises.
Player rendering budget: Each visible player needs a mesh, animations, and potentially creator-customized appearance. At 60fps, you have 16ms per frame. A reasonable budget:
- 50 fully-animated players at close range: ~2ms of animation + skinning
- 200 players at medium range (simplified animation, instanced): ~1ms
- 500+ players as dots/icons on minimap: negligible
This gives you a visible population of ~250 players in one view, which is more than enough for a creator world. World of Warcraft capitals rarely render more than 200 characters in view at once.
Network budget: Each player sending position at 20 Hz is roughly 40 bytes * 20 = 800 bytes/second. 200 players in view: 160 KB/s of position data. Add world state, chat, and creator actions, and you're looking at 200-500 KB/s per client. Well within broadband capabilities but worth compressing.
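Getting a position update down toward the 40-byte figure above means quantizing rather than shipping three float64s. A sketch assuming chunk-local coordinates quantized to 2 cm steps (the range and precision are our assumptions):

```typescript
// Quantized position encoding: 6 bytes per position instead of 24 bytes of
// float64s. Assumes non-negative chunk-local coordinates up to ~1310 m.
function packPosition(x: number, y: number, z: number): Uint16Array {
  const quantize = (v: number) => Math.round(v / 0.02) & 0xffff; // 2 cm steps
  return Uint16Array.of(quantize(x), quantize(y), quantize(z));
}

function unpackPosition(packed: Uint16Array): [number, number, number] {
  return [packed[0] * 0.02, packed[1] * 0.02, packed[2] * 0.02];
}
```

The remaining bytes of a 40-byte update go to entity id, orientation, animation state, and a timestamp; positions themselves are nearly free.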
Terrain System
Terrain is the foundation of any open world. It's also where browser constraints hit hardest because terrain needs to be everywhere and always visible.
Heightmap-Based Terrain
Like Skyrim, use a heightmap. A 2D grid of height values generates 3D terrain through a vertex shader. This is dramatically more compact than arbitrary mesh terrain.
A 4096x4096 heightmap at 16-bit precision is 32 MB uncompressed. But you never load it all at once. Each 64m chunk uses a 65x65 section of the heightmap (about 8.4 KB at 16-bit). Compress that with delta encoding and zlib and you're under 2 KB per chunk.
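The delta encoding mentioned above exploits terrain smoothness: neighboring heights differ by small amounts, so storing differences instead of absolute values leaves mostly near-zero data for zlib to crush. A minimal sketch:

```typescript
// Delta-encode a row of 16-bit heights: each entry becomes the difference
// from its predecessor. Smooth terrain yields small values that compress well.
function deltaEncode(heights: Int16Array): Int16Array {
  const out = new Int16Array(heights.length);
  let prev = 0;
  for (let i = 0; i < heights.length; i++) {
    out[i] = heights[i] - prev; // small number for smooth terrain
    prev = heights[i];
  }
  return out;
}

function deltaDecode(deltas: Int16Array): Int16Array {
  const out = new Int16Array(deltas.length);
  let acc = 0;
  for (let i = 0; i < deltas.length; i++) {
    acc += deltas[i]; // running sum reconstructs absolute heights
    out[i] = acc;
  }
  return out;
}
```

The same encoder also covers creator-modified terrain: a delta heightmap against the base terrain is mostly zeros, which is the best possible input for this scheme.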
Texture splatting paints multiple terrain materials (grass, rock, dirt, sand) using a blend map. Each chunk has a 4-channel RGBA splat map where each channel controls the blend weight of one material. With 4 textures per splat map and the ability to vary splat maps per chunk, you get visual variety across the entire world.
Modern terrain renderers use virtual texturing (also called megatexture, from id Software's tech in Rage). Instead of splatting at runtime, you pre-render the blended terrain texture at high resolution and stream tiles of it as the camera moves. This trades storage for runtime performance. WebGPU's compute shaders can handle the feedback and page-table management that virtual texturing requires.
Clipmap or Geoclipmapping
For rendering large terrain in a browser, the CDLOD (Continuous Distance-Dependent Level of Detail) or geoclipmapping approach works well. The terrain is rendered as a set of concentric rings around the camera, each ring at half the resolution of the previous one. Close to the camera you see full-resolution terrain. Far away you see a coarser version. The transitions are smooth because geometry morphs between levels.
This technique is GPU-friendly (one draw call per ring), handles infinite terrain with constant memory, and works in WebGL 2. It's what Flight Simulator and most modern open-world games use at some level.
Creator-Modified Terrain
If creators can sculpt terrain, you need a way to store and stream modifications on top of the base heightmap. Two approaches:
Delta heightmaps store the difference between the base terrain and the modified terrain. Most of the world is unmodified (deltas are zero), so this compresses extremely well. When loading a chunk, apply the delta on top of the base.
Voxel overlays for more dramatic modifications (caves, overhangs, arches). A heightmap can't represent terrain where one point has two heights. A sparse voxel grid stored only in modified chunks handles this. Marching cubes or dual contouring generates the mesh. This is more expensive but enables Minecraft-style terrain editing.
AI-Powered World Generation
This is where Cinevva's existing generative AI capabilities become a force multiplier. Instead of hand-building every rock and tree, creators can direct AI to populate the world.
Terrain Generation with Neural Fields
Recent research on neural terrain generation (NVIDIA's GET3D, Terragen's neural network mode, and papers like "Terrain Generation Using Procedural Models") shows that trained models can generate plausible terrain from text prompts or sketch inputs. A creator could draw a rough coastline and say "forested hills meeting a rocky shore" and get a heightmap with appropriate erosion, vegetation masks, and material assignments.
For browser delivery, you'd run the generation server-side and stream the result. The generation model doesn't need to run in the browser. It produces heightmaps and splat maps that the browser can render with the standard terrain pipeline.
3D Asset Generation
Models like Hunyuan3D, Meshy, Tripo, and Rodin can generate 3D meshes from text or images. The workflow for a creator world:
- Creator describes or sketches what they want ("a mossy stone archway" or "a futuristic lamp post")
- Server runs the generation model, produces a high-poly mesh
- Server auto-processes: decimate to web-friendly poly count, generate LODs, bake textures to atlas, export as GLB with Draco compression
- Asset appears in the creator's inventory, ready to place in the world
This pipeline already exists in pieces on Cinevva. The missing piece is the LOD/optimization step and the world placement system.
Procedural Population
Even with AI-generated assets, manually placing every tree in a forest is tedious. Procedural scattering rules let creators define zones ("this area is dense forest", "this slope is rocky scree") and the system populates them automatically.
GPU compute shaders can run the scattering in the browser. Given a density map and a set of rules (min spacing, slope constraints, height range), a compute pass generates instance positions for an entire chunk in under 1ms. Modify the density map, and the foliage re-generates instantly.
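A CPU sketch of that rule-based scattering for one chunk — the real version would run as a GPU compute pass, but the logic is the same. The rule names (`maxSlope`), the jittered-grid spacing, and the hash constants are illustrative:

```javascript
// Deterministic per-chunk scattering: jittered grid + rule-based rejection.
function mulberry32(seed) { // small deterministic PRNG so re-runs are stable
  return function () {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function scatterChunk({ chunkX, chunkZ, chunkSize, cell, density, slope, rules }) {
  // One candidate per `cell` meters => minimum spacing of roughly `cell`.
  const rand = mulberry32(chunkX * 73856093 ^ chunkZ * 19349663);
  const out = [];
  for (let z = 0; z < chunkSize; z += cell) {
    for (let x = 0; x < chunkSize; x += cell) {
      const px = x + rand() * cell; // jitter inside the cell
      const pz = z + rand() * cell;
      if (rand() > density(px, pz)) continue;       // density-map rejection
      if (slope(px, pz) > rules.maxSlope) continue; // slope constraint
      out.push({ x: chunkX * chunkSize + px, z: chunkZ * chunkSize + pz, yaw: rand() * Math.PI * 2 });
    }
  }
  return out;
}
```

Seeding the PRNG from the chunk coordinates means every client regenerates identical foliage without any network traffic.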
Entity Component System (ECS)
An open world with thousands of objects needs an efficient entity management system. The ECS pattern (popularized in game engines by Unity's DOTS and Rust's Bevy) maps well to JavaScript.
bitECS is a high-performance ECS for JavaScript that uses typed arrays and bitwise operations. Entities are plain integers. Components are contiguous typed arrays (one per component type). Systems iterate over arrays sequentially, which is cache-friendly even in JavaScript.
import { createWorld, defineComponent, Types, defineQuery, addEntity, addComponent } from 'bitecs';
const Position = defineComponent({ x: Types.f32, y: Types.f32, z: Types.f32 });
const Velocity = defineComponent({ x: Types.f32, y: Types.f32, z: Types.f32 });
const ChunkRef = defineComponent({ chunkX: Types.i16, chunkZ: Types.i16 });
const world = createWorld();
const movingQuery = defineQuery([Position, Velocity]);
const dt = 1 / 60; // fixed timestep in seconds (or pass the measured frame delta)
function movementSystem(world) {
  const entities = movingQuery(world);
  for (let i = 0; i < entities.length; i++) {
    const eid = entities[i];
    Position.x[eid] += Velocity.x[eid] * dt;
    Position.y[eid] += Velocity.y[eid] * dt;
    Position.z[eid] += Velocity.z[eid] * dt;
  }
  return world;
}

For an open world, the ECS handles everything: player characters, placed objects, NPCs, particles, triggers, and world props. When a chunk unloads, its entities are removed from the ECS. When a chunk loads, entities are added. The ECS doesn't care about the spatial organization. It just processes components.
Physics
Browser physics has gotten surprisingly good thanks to WebAssembly.
Rapier (Rust -> Wasm)
Rapier is a physics engine written in Rust that compiles to WebAssembly. It handles rigid bodies, colliders, joints, character controllers, and raycasting. Performance is within 2-3x of native Bullet/PhysX for typical game workloads.
For an open world, Rapier handles:
- Player character controller (walking on terrain, climbing steps, sliding on slopes)
- Object-to-object collision (placed objects, projectiles)
- Raycasting for player interactions (click on an object to select it)
- Trigger volumes (enter an area, trigger an event)
Rapier runs in a Web Worker, so physics simulation doesn't block rendering. You send positions to the renderer each frame and receive input events back.
Havok for Web (via Babylon.js)
If you go with Babylon.js, Havok physics comes built in as a Wasm module. Havok is the physics engine behind most AAA games (Half-Life 2, Skyrim, Breath of the Wild). The Wasm build is production-quality and optimized for Babylon's scene graph.
Terrain Collision
Physics engines need collision geometry for terrain. Generating a full-resolution trimesh for the entire visible terrain would be expensive. Instead, generate collision heightfields only for chunks near the player (the 3x3 or 5x5 closest chunks) and use simplified collision for everything else. Rapier's heightfield collider is designed for exactly this use case.
Audio
Sound transforms a 3D space from a visual demo into a place. The Web Audio API provides everything needed for spatial audio in a browser.
Spatial audio with HRTF (Head-Related Transfer Function) places sounds in 3D space. A waterfall to your left sounds like it's to your left. Walk closer and it gets louder. Walk behind a building and it gets muffled (with additional processing).
Ambiance zones work like texture splatting for audio. Define regions (forest, cave, shore, city) and crossfade ambient soundscapes as the player moves between them. This is how Skyrim makes forests sound alive. Layer wind, birds, rustling leaves, and distant animals. None of it is complex. All of it is spatial.
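A sketch of the zone-crossfade math, assuming circular zones with a fade margin; in practice each weight would drive a Web Audio `GainNode` feeding a looping ambient source:

```javascript
// Compute normalized crossfade gains for ambiance zones near the player.
// Zone shapes (circles with a fade margin) are an illustrative simplification.
function zoneWeights(player, zones, fade = 10) {
  // Weight 1 inside a zone, fading to 0 over `fade` meters outside its radius.
  const raw = zones.map(z => {
    const d = Math.hypot(player.x - z.x, player.z - z.z) - z.radius;
    return Math.max(0, Math.min(1, 1 - d / fade));
  });
  const sum = raw.reduce((a, b) => a + b, 0) || 1;
  return raw.map(w => w / sum); // normalized gains, ready for GainNode.gain
}
```

Normalizing keeps the total ambient loudness roughly constant while the mix shifts between overlapping zones.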
Web Audio performance is good enough for dozens of simultaneous spatial sources. The bottleneck is typically asset size, not processing. Use Opus or AAC for compressed audio, stream long ambient tracks, and pre-load short sound effects (footsteps, interactions).
Water, Weather, and Atmosphere
Every memorable open world has water and weather. These systems define the mood and make the world feel alive. They're also surprisingly achievable in a browser.
Water Rendering
Water in browser 3D has three levels of complexity, and you can ship the simplest one first and upgrade later.
Level 1: Reflective plane. A flat mesh at the water level with a reflective/refractive material. Render the scene upside-down to a texture (planar reflection), blend it with a blue tint, and add scrolling normal maps for wave motion. This is what Skyrim's base water shader does. In Three.js, the Water example in the official repo implements this. In Babylon.js, the WaterMaterial does it out of the box. Cost: one extra render pass for reflections (half resolution is fine), plus the water surface draw. On a mid-range GPU, this adds 2-3ms per frame.
Level 2: Screen-space reflections + depth-based effects. Instead of a separate reflection render pass, sample the existing frame buffer for reflections (SSR). Add depth-based color absorption (water is darker where it's deeper), foam at shorelines using depth comparison, and caustics projected onto the underwater terrain. This is what The Witcher 3 uses. SSR is available in both Three.js's postprocessing stack and Babylon.js's rendering pipeline. Cost: 1-2ms for SSR, negligible for depth effects.
Level 3: FFT ocean simulation. For open ocean, use a Fast Fourier Transform to simulate wave spectra on the GPU. Jerry Tessendorf's paper "Simulating Ocean Water" (2001) is the foundation that every major game engine uses. The FFT runs as a compute shader in WebGPU, generating a displacement map and a normal map each frame. The resulting ocean looks remarkably convincing. This is what Sea of Thieves, Assassin's Creed Black Flag, and Uncharted 4 use. In WebGPU, a 256x256 FFT ocean runs in under 1ms on desktop GPUs.
@compute @workgroup_size(16, 16)
fn fftOceanDisplacement(@builtin(global_invocation_id) id: vec3<u32>) {
    let idx = id.y * N + id.x; // linear index into the spectrum/displacement buffers
    let k = vec2<f32>(f32(id.x) - f32(N) / 2.0, f32(id.y) - f32(N) / 2.0);
    let omega = sqrt(length(k) * gravity); // deep-water dispersion relation
    let phase = omega * time;
    // Advance the initial spectrum amplitude in time (complex multiply by e^{i*phase})
    let h0 = spectrum[idx];
    let h = vec2<f32>(h0.x * cos(phase) - h0.y * sin(phase),
                      h0.x * sin(phase) + h0.y * cos(phase));
    displacement[idx] = h;
}

For a creator world, start with Level 1 (reflective plane) and upgrade to Level 2 when the renderer matures. Level 3 is only needed if the world has open ocean.
Weather Systems
Weather in Skyrim and BotW is driven by a state machine with transitions. Clear > Cloudy > Rain > Storm > Clear. Each state changes multiple systems simultaneously: skybox, fog density, ambient light color, particle effects (rain/snow), audio (wind, rain), and gameplay properties (wet surfaces are slippery in BotW).
For a browser world, the weather system has three layers:
Sky rendering. A procedural sky shader is cheaper and more flexible than skybox textures. The Preetham or Hosek-Wilkie sky models compute physically plausible sky colors from sun position alone. Add a cloud layer using 3D noise scrolled through a plane. Babylon.js has a built-in procedural sky material. Three.js has the Sky example. Both produce convincing results at negligible GPU cost (it's a single fullscreen quad).
Particle effects. Rain is a particle system with thousands of thin quads falling from above. Snow is similar but with slower, drifting trajectories. Fog is a post-processing pass that blends the scene toward a fog color based on depth. All of these are standard WebGL effects. The cost depends on particle count: 10,000 rain particles add roughly 0.5ms per frame.
Environmental response. Wet surfaces increase specular reflection. Snow accumulation adds white to upward-facing surfaces. Puddles appear in concave terrain. These are shader tricks, not geometry changes. A "wetness" uniform changes the material roughness. A "snow cover" uniform blends white on surfaces whose normals point upward. GTA V and The Witcher 3 use exactly this approach.
Synchronized weather. In a multiplayer world, weather must be consistent across clients. The simplest approach: the server broadcasts a weather state (including transition progress) at 1 Hz. Clients interpolate locally. Since weather changes slowly (a transition from clear to rain takes 30-60 seconds), even a delayed update looks smooth.
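A sketch of that scheme, assuming an illustrative two-state weather table and a 45-second transition; snapshot fields beyond `from`/`to`/`progress`/`sentAt` are assumptions:

```javascript
// Client-side interpolation between 1 Hz server weather snapshots.
// The blend targets (fog density, wetness) are illustrative values.
const WEATHER = {
  clear: { fogDensity: 0.002, wetness: 0.0 },
  rain:  { fogDensity: 0.010, wetness: 1.0 },
};

// Server snapshot: { from: 'clear', to: 'rain', progress: 0.4, sentAt: <ms> }
function weatherNow(snapshot, now, transitionSeconds = 45) {
  // Advance the transition locally since the snapshot was sent.
  const elapsed = (now - snapshot.sentAt) / 1000;
  const t = Math.min(1, snapshot.progress + elapsed / transitionSeconds);
  const a = WEATHER[snapshot.from], b = WEATHER[snapshot.to];
  const lerp = (x, y) => x + (y - x) * t;
  return { fogDensity: lerp(a.fogDensity, b.fogDensity), wetness: lerp(a.wetness, b.wetness) };
}
```

Even if a snapshot arrives late, the client keeps advancing the transition locally, so the sky never visibly stalls.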
Atmospheric Perspective
This is the single most effective visual technique for making a world feel large, and it's nearly free. Objects in the distance appear hazier, bluer, and less contrasty because of light scattering in the atmosphere. Every open world uses this.
In a fragment shader, blend distant pixels toward the atmosphere color based on depth:
float fogFactor = 1.0 - exp(-distance * fogDensity);
vec3 finalColor = mix(objectColor, atmosphereColor, fogFactor);

BotW takes this further with a painterly fog that transitions into a watercolor-style distance. The fog color changes with time of day and weather. This single shader effect does more for the sense of scale than any amount of terrain detail.
For a browser world with a stylized art direction, atmospheric perspective is the first visual effect to implement. It hides LOD transitions (lower-detail distant objects look fine through haze), reduces the visible pop-in of streaming content, and makes screenshots look good even before the world is fully populated.
Avatar Systems
Players need bodies. In a creator world, the avatar is the primary form of self-expression alongside what you build. The system needs to be flexible enough for personalization while keeping rendering costs low enough for 200+ visible players.
Avatar Architecture
Base mesh + customization layers. Start with a shared humanoid base mesh (1,500-3,000 triangles for the body). Customization happens through:
- Color/texture variations (skin tone, hair color) via uniform changes. No extra geometry.
- Swappable mesh parts (hair styles, clothing, accessories) that replace sections of the base mesh. Each part is a separate small mesh (200-500 triangles).
- Material property variations (metallic armor vs. cloth tunic) through material parameter changes.
This is how Roblox, Fortnite, and VRChat handle avatars. The base cost stays constant regardless of customization.
Ready Player Me and Avaturn offer browser-based avatar creation that outputs glTF models compatible with any 3D engine. They handle face scanning from photos, body proportions, and clothing. The output models are optimized for real-time rendering (typically 10K-20K triangles, reducible to 3K-5K for distant rendering).
Skeletal Animation in the Browser
Every visible player needs animations: idle, walk, run, jump, emotes. Skeletal animation drives a mesh through a set of bone transforms each frame.
GPU skinning is mandatory for performance. Both Three.js and Babylon.js perform skinning on the GPU by default. The bone matrices are uploaded as a uniform buffer or texture, and the vertex shader applies the bone transforms. The CPU cost is computing the bone transforms from the animation clip. For a 60-bone skeleton at 30fps, this is roughly 0.01ms per character. 200 characters: 2ms total. Acceptable.
Animation blending mixes multiple animations (walk + wave, idle + look around) using blend weights. Both Three.js (AnimationMixer) and Babylon.js (AnimationGroup) support this. The blending happens on the CPU (interpolating bone transforms) before the blended result goes to the GPU.
Instanced animation is the key to rendering many characters efficiently. Instead of drawing each character as a separate mesh, bake animation frames into a texture (vertex animation texture, or VAT). Each row of the texture stores bone transforms for one frame. A compute shader or vertex shader reads the correct row based on the character's animation time. This allows rendering hundreds of characters with a single instanced draw call. The Witcher 3 and Assassin's Creed use this for crowd rendering.
In WebGPU, instanced animated characters look like this:
@vertex
fn vs_main(@builtin(instance_index) instanceIdx: u32,
           @builtin(vertex_index) vertexIdx: u32,
           @location(0) position: vec3<f32>) -> @builtin(position) vec4<f32> {
    let animFrame = instances[instanceIdx].animationFrame;
    let boneIdx = vertexBoneIndices[vertexIdx];
    // A bone transform is a mat4x4, stored as four texels per bone in the baked animation texture
    let base = vec2<i32>(i32(boneIdx) * 4, i32(animFrame));
    let boneTransform = mat4x4<f32>(
        textureLoad(animTexture, base, 0),
        textureLoad(animTexture, base + vec2<i32>(1, 0), 0),
        textureLoad(animTexture, base + vec2<i32>(2, 0), 0),
        textureLoad(animTexture, base + vec2<i32>(3, 0), 0));
    let worldPos = instances[instanceIdx].transform * boneTransform * vec4<f32>(position, 1.0);
    return viewProjection * worldPos;
}

For distant players (beyond 50 meters), switch to billboard impostors: a flat quad showing a pre-rendered sprite of the character from the current viewing angle. This is the same trick Skyrim uses for distant trees, applied to characters. The transition is unnoticeable at distance.
Inverse Kinematics for Interaction
When a character picks up an object, reaches for a door handle, or points at something, procedural IK makes the action look natural. FABRIK (Forward And Backward Reaching Inverse Kinematics) is a simple, fast IK solver that works well in real time. Both Three.js (via CCDIKSolver) and Babylon.js (via BoneIKController) have built-in IK support.
For a creator world, IK means characters can interact with placed objects naturally: sit in creator-placed chairs, lean on railings, pick up items. The interactions don't need per-object animation. The IK system adapts the character's pose to the object's position.
Advanced Networking
The basic architecture (WebSocket + WebRTC) was covered earlier. Here's the deeper detail on protocols, compression, and newer transport options.
Binary Message Protocols
JSON over WebSocket wastes several times the bandwidth of a compact binary encoding. For a real-time multiplayer world, every hot-path message should be binary.
FlatBuffers (from Google) is the best fit for game networking. Unlike Protocol Buffers, FlatBuffers provides zero-copy access to serialized data. You don't decode the message into JavaScript objects. You read fields directly from the buffer. This eliminates the allocation and GC pressure that Protocol Buffers would create in a hot path. FlatBuffers has a JavaScript/TypeScript code generator.
A player position update in FlatBuffers:
// Schema (packed as a struct): PlayerUpdate { id: uint16, x: float32, y: float32, z: float32, yaw: uint16 (quantized angle), pitch: uint16 (quantized angle), animState: uint8 }
// Total: 19 bytes per player update
// vs JSON: {"id":42,"x":103.5,"y":12.3,"z":-47.8,"yaw":1.57,"pitch":0.2,"animState":3} = 80+ bytes

For 200 players at 20 Hz, the difference is 200 * 19 * 20 = 76 KB/s (binary) vs 200 * 80 * 20 = 320 KB/s (JSON). Binary is over 4x smaller, and it avoids JSON.parse allocations in the hot loop. (Angles are quantized to uint16 because FlatBuffers has no 16-bit float scalar type.)
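For comparison, a hand-rolled `DataView` packer achieves a comparable wire size without any schema codegen. The exact field layout and angle quantization below are illustrative choices, not the FlatBuffers encoding:

```javascript
// Pack one player update into a fixed 19-byte little-endian layout.
function packPlayerUpdate(p) {
  const buf = new ArrayBuffer(19);
  const v = new DataView(buf);
  v.setUint16(0, p.id, true);
  v.setFloat32(2, p.x, true);
  v.setFloat32(6, p.y, true);
  v.setFloat32(10, p.z, true);
  // Angles quantized to uint16: yaw over [0, 2pi), pitch over [-pi, pi)
  v.setUint16(14, Math.round((p.yaw / (2 * Math.PI)) * 65535) & 0xffff, true);
  v.setUint16(16, Math.round(((p.pitch + Math.PI) / (2 * Math.PI)) * 65535) & 0xffff, true);
  v.setUint8(18, p.animState);
  return buf;
}

function unpackPlayerUpdate(buf) {
  const v = new DataView(buf);
  return {
    id: v.getUint16(0, true),
    x: v.getFloat32(2, true), y: v.getFloat32(6, true), z: v.getFloat32(10, true),
    yaw: (v.getUint16(14, true) / 65535) * 2 * Math.PI,
    pitch: (v.getUint16(16, true) / 65535) * 2 * Math.PI - Math.PI,
    animState: v.getUint8(18),
  };
}
```

The trade-off versus FlatBuffers is that the layout is frozen by hand: any schema change means updating both functions in lockstep on client and server.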
MessagePack is simpler than FlatBuffers (no schema, no code generation) but still 30-50% smaller than JSON. It's a good middle ground if you want binary without schema management.
Position Quantization and Delta Compression
Player positions don't need 32-bit float precision. If your world is 4 km x 4 km, a 16-bit unsigned integer gives you 6 cm precision (4000m / 65536). For most games, that's indistinguishable from full precision. This halves the size of position data.
Delta compression sends only the difference from the last acknowledged state. If a player moved 0.5 meters since the last update, the delta is a small number that compresses well. Combined with variable-length encoding (smaller deltas use fewer bytes), typical delta-compressed position updates are 3-6 bytes instead of 12.
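A sketch of both steps together, assuming the 4 km world from above and zigzag varint encoding for the deltas (the constants are illustrative):

```javascript
// Quantize world positions to a 16-bit grid, then delta-encode with a
// zigzag varint so small movements cost 1 byte per axis.
const WORLD_SIZE = 4000;          // meters
const SCALE = 65535 / WORLD_SIZE; // ~6 cm per grid step

const quantize = (meters) => Math.round(meters * SCALE) & 0xffff;

function writeZigzagVarint(delta, out) {
  // Zigzag maps small signed deltas to small unsigned ints: 0,-1,1,-2,2 -> 0,1,2,3,4
  let n = ((delta << 1) ^ (delta >> 31)) >>> 0;
  while (true) {
    const b = n & 0x7f;
    n >>>= 7;
    if (n === 0) { out.push(b); break; }
    out.push(b | 0x80); // high bit: more bytes follow
  }
}

function readZigzagVarint(bytes, offset = 0) {
  let n = 0, shift = 0, b;
  do {
    b = bytes[offset++];
    n |= (b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  return { delta: (n >>> 1) ^ -(n & 1), offset };
}
```

A 0.5 m move is about 8 grid steps, which zigzag-encodes into a single byte, matching the 3-6 byte figure for a full position delta.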
Dead reckoning reduces update frequency. Instead of sending position at 20 Hz, send position + velocity. The client extrapolates the position between updates. Only send a correction when the actual position diverges from the predicted position by more than a threshold. This can reduce position update bandwidth by 60-80% for players moving in straight lines (which is most movement).
RuneScape uses an extreme version of this: player movement is tile-based, so a move command is just a destination tile. The client animates the walk path locally. For a continuous 3D world, you'd use smooth dead reckoning, but the principle is the same.
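A minimal dead-reckoning sketch for the continuous case; the 0.5 m divergence threshold and the field names are assumed tuning values:

```javascript
// Extrapolate remote players from the last update's position + velocity;
// the sender only emits a correction when the true position drifts too far.
const DIVERGENCE_THRESHOLD = 0.5; // meters (illustrative)

function predict(last, now) {
  // last: { x, z, vx, vz, t } with t in milliseconds
  const dt = (now - last.t) / 1000;
  return { x: last.x + last.vx * dt, z: last.z + last.vz * dt };
}

// Sender side: does this frame need a network update at all?
function needsCorrection(last, actual, now) {
  const p = predict(last, now);
  return Math.hypot(actual.x - p.x, actual.z - p.z) > DIVERGENCE_THRESHOLD;
}
```

Receivers run the same `predict` between updates, so a player moving in a straight line generates almost no traffic.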
WebTransport
WebTransport is a newer protocol that could replace both WebSocket and WebRTC DataChannel for game networking. It runs over HTTP/3 (QUIC) and provides:
- Reliable ordered streams (like WebSocket but multiplexed, so a stall on one stream doesn't block others)
- Unreliable datagrams (like UDP, for position updates that are immediately stale if delayed)
- Multiplexed streams (separate streams for chat, world state, positions, without head-of-line blocking)
This is exactly what game networking needs. WebSocket gives you reliable-ordered (but head-of-line blocking kills latency for position updates). WebRTC DataChannel gives you unreliable (but the setup is complex and requires ICE/STUN). WebTransport gives you both over a single connection.
Browser support as of early 2026: Chrome and Edge support it fully. Firefox has partial support. Safari is behind. For a desktop-focused creator platform, WebTransport is usable today with a WebSocket fallback for Safari.
Cloudflare supports WebTransport through Workers, which fits our infrastructure.
Interest Management at Scale
The networking challenge with 200+ players isn't bandwidth per player. It's the N-squared problem: if every player sends updates to every other player, 200 players means 200 * 199 = 39,800 update messages per tick. The server needs to filter.
Area of Interest (AOI) management means each player only receives updates about entities within their view range. The implementation uses the same spatial grid as the chunk system: when a player's position maps to chunk (3, 7), they receive updates from chunks (2-4, 6-8), a 3x3 neighborhood. Entities outside this range are not sent.
Priority-based updates within the AOI give more bandwidth to important entities. A player running toward you gets 20 Hz updates. A player standing still 200 meters away gets 2 Hz updates. An NPC doing nothing gets 0.5 Hz updates. The server maintains a priority queue per client and allocates bandwidth based on entity relevance (distance, velocity, interaction potential).
Dormancy. Entities that haven't changed state for N seconds go dormant and stop generating network traffic entirely. The client keeps the last known state until a wake-up event arrives. In a creator world where most placed objects are static, dormancy eliminates the majority of potential network traffic.
Slither.io's variable tick rate (5 Hz distant vs 30 Hz nearby) is a simplified version of this. EVE Online's "time dilation" is the extreme version (when too many players are in one area, the server slows the game tick rate to maintain consistency). For our use case, priority-based AOI with dormancy is the right balance.
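The AOI, priority, and dormancy rules above can be sketched as one server-side rate function. The chunk size matches the streaming grid; the distance bands and rates are illustrative:

```javascript
// Per-entity update rate for one client, reusing the 64 m chunk grid.
const CHUNK = 64; // meters per chunk
const chunkOf = (p) => [Math.floor(p.x / CHUNK), Math.floor(p.z / CHUNK)];

function visibleTo(client, entity) {
  // 3x3 chunk neighborhood around the client
  const [cx, cz] = chunkOf(client), [ex, ez] = chunkOf(entity);
  return Math.abs(cx - ex) <= 1 && Math.abs(cz - ez) <= 1;
}

function updateRateHz(client, entity) {
  if (!visibleTo(client, entity)) return 0; // outside the AOI: never sent
  if (entity.dormant) return 0;             // unchanged entities are silent
  const d = Math.hypot(client.x - entity.x, client.z - entity.z);
  if (d < 30) return 20;                    // close: full rate
  if (d < 80) return 10;
  return 2;                                 // far edge of the AOI
}
```

A real server would layer velocity and interaction potential on top of distance, but the shape stays the same: a cheap scalar per (client, entity) pair that the bandwidth allocator sorts by.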
Gaussian Splatting and Newer Rendering Tech
Traditional mesh rendering (triangles + textures) isn't the only option for browser 3D anymore. Several newer techniques are reaching production viability.
3D Gaussian Splatting
3D Gaussian Splatting (3DGS) reconstructs 3D scenes from photographs by representing the scene as millions of colored 3D Gaussians (oriented, colored ellipsoids). The renderer sorts and rasterizes these splats instead of triangles.
Why this matters for a creator world:
- Photogrammetry capture becomes trivial. A creator takes 50 photos of a real-world object or location with their phone. Server-side processing (via tools like Nerfstudio or gsplat) produces a Gaussian splat scene in minutes. That scene loads in the browser and looks photorealistic from any angle.
- Browser rendering is solved. Multiple open-source implementations render Gaussian splats in WebGL and WebGPU. PlayCanvas has built-in splat rendering. Luma AI has a Three.js-compatible viewer. gsplat.js is a standalone library. Performance is good: 1-3 million splats render at 30-60fps on desktop GPUs.
- The data format is compact. A Gaussian splat scene of a room might be 10-30 MB compressed. Individual objects are 1-5 MB. This is comparable to textured mesh assets.
The trade-off: splat scenes are static. You can't easily animate or modify them. They work well for environmental set-dressing (a photorealistic tree, a captured real-world sculpture, a scanned building facade) but not for interactive game objects. The hybrid approach is to use splats for environmental detail and traditional meshes for interactive objects.
For a creator world: Let creators capture real-world objects via phone photos, process them into Gaussian splats server-side, and place them in the world. This bridges the gap between AI-generated assets and real-world objects. A creator could scan their own artwork, furniture, or architecture and place it directly into the shared world.
Neural Radiance Fields (NeRFs) to Mesh
NeRFs represent scenes as neural networks that output color and density for any 3D point. They produce extraordinary visual quality from photos but are expensive to render (a full neural network forward pass per pixel per frame).
The practical approach for browsers: train a NeRF from photos, then extract a mesh using marching cubes on the density field. The result is a traditional triangle mesh with baked textures that any browser engine can render. Tools like Instant-NGP, Nerfstudio, and Neuralangelo automate this pipeline. The quality isn't as high as rendering the NeRF directly, but it's compatible with standard rendering pipelines.
This is another path for creators to get real-world objects into the browser world without modeling skills.
Mesh Shaders and Nanite-Style Rendering
UE5's Nanite system renders billions of triangles by using mesh shaders, GPU-driven rendering, and virtual geometry (streaming triangles at the per-cluster level based on screen coverage). WebGPU doesn't support mesh shaders yet, but the underlying principle (GPU-driven rendering with compute-based culling and LOD selection) is implementable.
A WebGPU compute shader can:
- Read all mesh cluster bounding boxes (groups of ~64 triangles)
- Test each cluster against the view frustum and occlusion buffer
- Select the appropriate LOD level based on screen-space size
- Write visible clusters into an indirect draw buffer
- A single indirect draw call renders everything
This "virtual geometry" approach handles millions of triangles with constant CPU cost (the CPU submits one draw call regardless of scene complexity). It's how browser rendering will eventually handle large open worlds. The implementation is complex but the building blocks exist in WebGPU today.
Shader Techniques for Stylized Open Worlds
A stylized art direction needs specific shader techniques. These are the ones that give the most visual impact per GPU cycle.
Foliage Wind Animation
Trees and grass that sway in the wind make a world feel alive. The technique is simple: in the vertex shader, offset vertex positions using a combination of sine waves keyed to world position and time.
vec3 windOffset = vec3(
sin(worldPos.x * 0.5 + time * 2.0) * windStrength,
0.0,
cos(worldPos.z * 0.3 + time * 1.5) * windStrength
);
float heightFactor = localPos.y / meshHeight;
finalPos += windOffset * heightFactor * heightFactor;

The heightFactor ensures that the base of the tree stays grounded while the top sways the most. Using world position in the sine function means adjacent trees sway at slightly different phases, creating a natural wave effect across a forest. BotW, Skyrim, and every open world with vegetation use this technique.
For grass, the same principle applies but with a higher frequency and shorter wavelength. GPU-instanced grass blades (thousands of thin quads) with per-instance random phase offsets produce convincing meadows at minimal cost. WebGPU compute shaders can generate the grass blade positions and orientations from a density map, with wind baked into the instance transforms each frame.
Toon/Cel Shading
If the art direction is stylized (and the evidence suggests it should be), cel shading is the core technique. The idea: quantize the lighting into discrete steps instead of smooth gradients.
float NdotL = dot(normal, lightDir);
float toonShading = step(0.3, NdotL) * 0.5 + step(0.6, NdotL) * 0.5;
vec3 color = baseColor * (ambient + toonShading);

This produces the classic two-tone or three-tone look. Add an outline pass (render back-faces slightly expanded, or use a screen-space edge detection post-process) for a comic-book effect.
BotW's shading is more nuanced than pure cel shading. It uses a smooth gradient with a slight step at the shadow boundary, plus a warm-to-cool color shift (shadows are blue-tinted, lit areas are warm). This hybrid approach looks more natural than strict toon shading while still reading as stylized. It's achievable with a custom shader in any browser 3D engine.
Stylized Water Shader
Water in a stylized world doesn't need realistic wave simulation. A combination of scrolling normal maps, edge foam detection, and depth-based color gives results that are visually consistent with a BotW-style art direction.
float depth = texture(depthTexture, screenUV).r - fragDepth;
vec3 shallowColor = vec3(0.2, 0.7, 0.8);
vec3 deepColor = vec3(0.05, 0.15, 0.3);
vec3 waterColor = mix(shallowColor, deepColor, saturate(depth * 2.0));
float foam = step(0.05, depth) * (1.0 - step(0.15, depth));
foam *= texture(foamNoise, worldUV * 3.0 + time * 0.1).r;
waterColor = mix(waterColor, vec3(1.0), foam * 0.8);

This gives you depth-based coloring (shallow water is lighter), shoreline foam that animates with noise, and the whole thing runs in a single fragment shader pass.
Screen-Space Ambient Occlusion (SSAO)
SSAO darkens corners, crevices, and areas where surfaces meet. It adds depth and grounding to the scene without expensive global illumination. Both Three.js and Babylon.js have built-in SSAO implementations.
For a stylized world, SSAO is even more important than in realistic rendering because the flat shading doesn't naturally show contact shadows. A light SSAO pass (half resolution is fine) adds the missing depth cues. Cost: 1-2ms on desktop GPUs.
Deeper World Generation
The base article covered procedural terrain at a high level. Here's the algorithmic detail.
Noise Functions for Terrain
All procedural terrain starts with noise. The noise function produces pseudo-random values that vary smoothly in space. Layer multiple octaves (frequencies) for natural-looking results.
Perlin noise is the classic. Simplex noise is faster and has fewer directional artifacts. OpenSimplex 2 is the modern variant with good performance in JavaScript. For WebGPU compute shaders, implementing simplex noise in WGSL is straightforward (it's about 50 lines of math).
Fractal Brownian Motion (fBm) layers octaves of noise:
height = 0
amplitude = 1.0
frequency = baseFrequency
for each octave:
height += amplitude * noise(position * frequency)
frequency *= lacunarity (typically 2.0)
amplitude *= persistence (typically 0.5)

With 6-8 octaves, fBm produces terrain that has large-scale mountain formations, medium-scale hills, and fine-scale roughness, much like real terrain. The persistence parameter controls how rough the terrain is (0.3 produces smooth rolling hills, 0.7 produces jagged mountains).
Domain warping feeds the output of one noise function as the input coordinates of another. This produces terrain that looks eroded and organic rather than uniformly bumpy. Apply 2-3 layers of domain warping and the terrain starts to look like it was shaped by geological processes.
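A runnable sketch of fBm and domain warping over a toy value-noise function. A real pipeline would use simplex noise; the hash constants and warp offsets below are arbitrary illustrative choices:

```javascript
// Toy value noise: deterministic lattice hash + smoothstep bilinear interpolation.
function hash(ix, iz) {
  let h = Math.imul(ix, 374761393) + Math.imul(iz, 668265263);
  h = Math.imul(h ^ (h >>> 13), 1274126177);
  return ((h ^ (h >>> 16)) >>> 0) / 4294967296; // in [0, 1)
}

function valueNoise(x, z) {
  const ix = Math.floor(x), iz = Math.floor(z);
  const fx = x - ix, fz = z - iz;
  const sx = fx * fx * (3 - 2 * fx), sz = fz * fz * (3 - 2 * fz); // smoothstep
  const a = hash(ix, iz), b = hash(ix + 1, iz);
  const c = hash(ix, iz + 1), d = hash(ix + 1, iz + 1);
  return (a + (b - a) * sx) * (1 - sz) + (c + (d - c) * sx) * sz;
}

// fBm exactly as in the pseudocode above.
function fbm(x, z, octaves = 6, lacunarity = 2.0, persistence = 0.5) {
  let height = 0, amplitude = 1, frequency = 1;
  for (let o = 0; o < octaves; o++) {
    height += amplitude * valueNoise(x * frequency, z * frequency);
    frequency *= lacunarity;
    amplitude *= persistence;
  }
  return height;
}

// Domain warping: feed fBm's output back in as a coordinate offset.
function warpedFbm(x, z, strength = 4) {
  const wx = x + strength * fbm(x + 13.7, z); // offsets decorrelate the two warps
  const wz = z + strength * fbm(x, z + 71.3);
  return fbm(wx, wz);
}
```

The same ~50 lines port directly to WGSL for GPU-side generation.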
Hydraulic Erosion Simulation
Raw noise terrain looks like crumpled paper. Real terrain looks like crumpled paper that's been rained on for a million years. Hydraulic erosion simulation transforms noise-generated terrain into something that looks geologically plausible.
The algorithm:
- Drop a water particle at a random position on the heightmap
- The particle flows downhill (follow the terrain gradient)
- At each step, it picks up sediment from the terrain based on speed and slope
- When the particle slows down (flatter terrain, pooling), it deposits sediment
- Repeat for 100,000-500,000 particles
The result is terrain with river valleys, alluvial fans, natural-looking ridgelines, and smooth slopes that transition logically. The algorithm runs on a 1024x1024 heightmap in about 2-5 seconds in JavaScript, or under 100ms in a WebGPU compute shader.
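The droplet loop above can be sketched as follows. This is deliberately minimal, with illustrative pickup/deposit constants; production implementations additionally track droplet velocity, water volume, and evaporation:

```javascript
// Minimal droplet erosion over a square heightmap stored as a Float32Array.
// Pickup cap (0.01) and deposit fraction (0.1) are illustrative tuning values.
function erode(heights, size, drops = 10000, rand = Math.random) {
  const idx = (x, z) => z * size + x;
  for (let d = 0; d < drops; d++) {
    // 1. Drop a water particle at a random interior position
    let x = 1 + Math.floor(rand() * (size - 2));
    let z = 1 + Math.floor(rand() * (size - 2));
    let sediment = 0;
    for (let step = 0; step < 64; step++) {
      const h = heights[idx(x, z)];
      // 2. Follow the terrain gradient: steepest downhill 4-neighbor
      const neighbors = [[x + 1, z], [x - 1, z], [x, z + 1], [x, z - 1]];
      let best = -1, bestH = h;
      for (let i = 0; i < 4; i++) {
        const nh = heights[idx(neighbors[i][0], neighbors[i][1])];
        if (nh < bestH) { bestH = nh; best = i; }
      }
      if (best < 0) {
        // 4. Pooling: the drop stops and deposits everything it carries
        heights[idx(x, z)] += sediment;
        break;
      }
      // 3. Pick up sediment proportional to the slope, capped per step
      const pickup = Math.min(h - bestH, 0.01);
      heights[idx(x, z)] -= pickup;
      sediment += pickup;
      // Drop a fraction of the load each step, smoothing the path
      const deposit = sediment * 0.1;
      heights[idx(x, z)] += deposit;
      sediment -= deposit;
      [x, z] = neighbors[best];
      if (x < 1 || x >= size - 1 || z < 1 || z >= size - 1) break; // off the edge
    }
  }
}
```

Each droplet is independent, which is exactly why the algorithm maps so cleanly onto a GPU compute shader.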
Sebastian Lague's implementation (available on GitHub) is the standard reference for game developers. It produces terrain that rivals hand-sculpted results. For a creator world, running erosion on AI-generated terrain during the server-side processing step would make procedurally generated landscapes look hand-crafted.
Biome Assignment
Real worlds have biomes: forests, deserts, tundra, swampland. Biome assignment maps climate parameters to terrain regions.
The approach from Minecraft is instructive: define biomes on a 2D grid using temperature and humidity axes. Temperature decreases with altitude and latitude. Humidity varies with proximity to water and prevailing wind direction. Each grid cell gets a biome assignment (forest, desert, tundra, etc.) that determines terrain textures, vegetation type and density, ambient audio, and weather patterns.
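A sketch of that temperature/humidity lookup. The biome names, thresholds, and climate formulas are invented for illustration; they are not Minecraft's actual tables.

```javascript
// Illustrative Whittaker-style lookup: temperature and humidity in [0, 1]
// map to a biome name. All thresholds are invented for this sketch.
function biomeAt(temperature, humidity) {
  if (temperature < 0.25) return humidity < 0.5 ? "tundra" : "taiga";
  if (temperature < 0.6) return humidity < 0.35 ? "grassland" : "forest";
  return humidity < 0.3 ? "desert" : humidity < 0.65 ? "savanna" : "rainforest";
}

// Climate inputs derived the way the text describes: temperature drops with
// altitude and latitude, humidity rises near water. Inputs normalized to [0, 1].
function climate(altitude, latitude, distToWater) {
  const temperature = Math.max(0, 1 - 0.6 * altitude - 0.5 * Math.abs(latitude));
  const humidity = Math.max(0, 1 - distToWater);
  return { temperature, humidity };
}
```

Because the lookup is a pure function of position-derived inputs, clients and servers agree on biome assignment without ever transmitting it.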
For a creator world, biome boundaries should be paintable. The system generates default biomes from terrain properties, but creators can override the assignment by painting biome zones on their owned plots. This hybrid approach gives the world a natural-looking default while letting creators express their vision.
Wave Function Collapse for Structures
Wave Function Collapse (WFC) generates structures (buildings, dungeons, roads) from a set of tiles with adjacency constraints. Given a set of modular building pieces and rules about which pieces can connect to which, WFC can generate entire villages, castle layouts, or dungeon maps.
For a creator world, WFC enables:
- Auto-generated villages to populate the world with baseline content before creators customize it
- Assisted building where a creator places a few pieces and WFC fills in the gaps (like Townscaper but with 3D building blocks)
- Dungeon generation for interactive experiences creators can configure (set the theme, difficulty, and size, and WFC generates the layout)
Oskar Stalberg (creator of Townscaper and Bad North) has demonstrated that WFC-based generation feels magical to users. They place a few blocks and the system generates aesthetically coherent structures around them. This is exactly the "simple tools, rich output" principle that succeeds on creator platforms.
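A toy WFC loop shows the two core mechanics: minimum-entropy cell selection and constraint propagation. The tiles and adjacency rules here are invented for illustration; a production system would use rotated 3D modules with per-face socket compatibility, plus backtracking or restarts when a contradiction occurs.

```javascript
// Toy Wave Function Collapse on a 2D grid. TILES and OK are invented
// adjacency rules: roads and water may not touch, grass goes anywhere.
const TILES = ["grass", "road", "water"];
const OK = {
  grass: ["grass", "road", "water"],
  road: ["road", "grass"],
  water: ["water", "grass"],
};

function collapse(w, h, seed = 42) {
  let s = seed; // MINSTD LCG: deterministic, exact in double precision
  const rand = () => ((s = (s * 16807) % 2147483647) - 1) / 2147483646;
  // Every cell starts in superposition of all tiles.
  const cells = Array.from({ length: w * h }, () => [...TILES]);
  const neighbors = (i) => {
    const x = i % w, y = (i / w) | 0, out = [];
    if (x > 0) out.push(i - 1);
    if (x < w - 1) out.push(i + 1);
    if (y > 0) out.push(i - w);
    if (y < h - 1) out.push(i + w);
    return out;
  };
  for (;;) {
    // Minimum entropy: the undecided cell with the fewest options left.
    let best = -1;
    for (let i = 0; i < cells.length; i++) {
      if (cells[i].length > 1 && (best < 0 || cells[i].length < cells[best].length)) best = i;
    }
    if (best < 0) break; // all cells decided
    cells[best] = [cells[best][(rand() * cells[best].length) | 0]];
    // Constraint propagation: drop options with no compatible neighbor value.
    const queue = [best];
    while (queue.length) {
      const i = queue.pop();
      for (const n of neighbors(i)) {
        const before = cells[n].length;
        cells[n] = cells[n].filter((t) => cells[i].some((u) => OK[u].includes(t)));
        if (cells[n].length === 0) throw new Error("contradiction");
        if (cells[n].length < before) queue.push(n);
      }
    }
  }
  return cells.map((c) => c[0]); // one tile per cell
}
```

With this particular tileset a contradiction can never fire (grass is compatible with everything, so no domain ever empties); richer tilesets lose that guarantee, which is where backtracking comes in.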
Content Moderation Architecture
In a world where creators place arbitrary 3D content that others can see, moderation isn't optional. It's a core infrastructure component.
Automated Screening Pipeline
Every asset that enters the world goes through a multi-stage pipeline before becoming visible to other players:
Geometry analysis. Scan the mesh for anatomically explicit shapes using a trained classifier. The practical approach is to render the mesh silhouette from multiple angles and run the renders through a standard image-classification service (such as Azure AI Content Safety or Google Cloud Vision), which is computationally cheap and catches the majority of obviously inappropriate 3D models.
Texture analysis. Run each texture through a standard image content moderation API (the same ones used for uploaded photos). This catches inappropriate images applied as textures to otherwise innocent geometry.
Text detection. If the object contains text (either in texture or as a 3D text mesh), run OCR and check against content policy. This catches hate speech, slurs, and other text-based violations.
Automated approval. If all checks pass, the asset becomes visible immediately. If any check flags the asset, it enters a review queue.
Human review. Flagged assets are reviewed by a moderator. For a small platform, this can be the team. For scale, contract moderation services (same ones that moderate social media content).
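The pipeline above reduces to a short-circuiting chain of checks. The check names and asset fields below are hypothetical stand-ins; real checks would be asynchronous calls to a geometry classifier, an image moderation API, and an OCR service, but the control flow is the same.

```javascript
// Run checks in order; the first failure routes the asset to human review.
function screenAsset(asset, checks) {
  for (const check of checks) {
    const result = check(asset);
    if (!result.pass) return { status: "review", flaggedBy: result.name };
  }
  return { status: "approved" }; // all checks passed: visible immediately
}

// Stub checks in the order the pipeline describes (hypothetical fields).
const geometryCheck = (a) => ({ name: "geometry", pass: !a.meshFlagged });
const textureCheck = (a) => ({ name: "texture", pass: !a.textureFlagged });
const textCheck = (a) => ({ name: "text", pass: !a.textFlagged });

const PIPELINE = [geometryCheck, textureCheck, textCheck];
```

Recording `flaggedBy` matters in practice: it tells the human reviewer which stage tripped, which makes review much faster than re-inspecting the whole asset.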
Spatial Moderation
Beyond individual assets, the spatial arrangement of objects can be inappropriate even when each object alone is fine. This is harder to detect automatically. The practical approach:
- Player reporting. Any player can report a location. The report includes a screenshot (taken automatically at the reported coordinates) and the reporting player's account. Reports trigger human review.
- Heat mapping. Track which areas generate reports. If a creator's plot consistently generates reports, escalate to review. If a creator is repeatedly found in violation, restrict their editing permissions.
- Parcel ratings. Like Second Life, let creators self-rate their parcels. The default view hides parcels rated above "General." Players opt in to seeing mature content. This doesn't prevent violations but reduces exposure.
Latency Considerations
If automated screening takes 5-10 seconds per asset, that's a noticeable delay between a creator placing an object and it appearing to other players. Options:
- Optimistic local display. The creator sees their placement immediately. Other players see it after approval. If the asset is rejected, it disappears and the creator is notified.
- Pre-approved asset library. Most placement uses pre-screened assets from the platform's library (including AI-generated assets that were screened during generation). Custom uploads go through screening. This means most placements are instant.
- Reputation-based fast-tracking. Creators with a history of approved content get automatic approval for new placements. New creators or flagged creators go through full screening.
Browser 3D Performance: Real Numbers
Theoretical budgets are useful. Actual measured performance is more useful. Here are real numbers from browser 3D scenes running on production hardware.
Rendering Benchmarks
Three.js scene with 10,000 instanced objects (trees, rocks, each 500 triangles):
- MacBook Pro M1 (Chrome, WebGL2): 58-60fps
- RTX 3060 desktop (Chrome, WebGL2): 60fps locked
- Intel UHD 620 laptop (Chrome, WebGL2): 25-35fps
- iPhone 13 (Safari, WebGL2): 30-40fps
Three.js scene with 100,000 instanced grass blades (6 triangles each, 600K total triangles):
- M1 MacBook: 55fps
- RTX 3060: 60fps
- Intel UHD 620: 12fps
- iPhone 13: 15fps
Babylon.js terrain with 1M triangles, 4 LOD levels, Havok physics:
- M1 MacBook (WebGPU): 60fps
- M1 MacBook (WebGL2): 45fps
- RTX 3060 (WebGPU): 60fps
- RTX 3060 (WebGL2): 55fps
Gaussian splat scene, 2M splats (via gsplat.js):
- M1 MacBook (WebGL2): 30fps
- RTX 3060 (WebGL2): 45fps
- RTX 3060 (WebGPU): 60fps
Memory Measurements
- Three.js minimal scene (skybox, terrain, 100 objects): 80-120 MB GPU memory, 150-200 MB JS heap
- Babylon.js with Havok physics: 200-300 MB GPU memory, 250-350 MB JS heap (the Havok Wasm module adds ~50 MB)

Browser tab memory limits (measured, not documented):
- Chrome desktop: typically crashes around 4 GB
- Chrome Android: typically crashes around 1-1.5 GB
- Safari iOS: typically crashes around 1 GB
- Firefox desktop: typically crashes around 3-4 GB
Network Measurements
WebSocket round-trip latency (browser to Cloudflare edge):
- Same continent: 10-30ms
- Cross-continent: 80-200ms
- With Cloudflare Durable Objects: add 5-10ms for DO wake-up on first request
WebRTC DataChannel latency (browser to browser via TURN):
- Same city: 5-15ms
- Same continent: 20-50ms
- Cross-continent: 100-250ms
WebTransport (HTTP/3 QUIC) latency is comparable to WebSocket but without head-of-line blocking, so P99 latency is significantly better (no stalls from a single lost packet).
Load Time Measurements
- Three.js empty scene (just the library): 350ms to first frame
- Babylon.js empty scene: 500ms to first frame
- 1 MB GLB model via fetch + parse: 200-400ms on broadband
- KTX2 texture, 1024x1024, Basis Universal: 50-100ms to decode on GPU
- Draco-compressed mesh, 50K triangles: 30-80ms to decode in a Web Worker
These numbers confirm the performance budget in the architecture section is achievable. A mid-range desktop can render a complex open world scene at 60fps. Mobile is the constraint: you'd need aggressive LOD and a shorter view distance to maintain 30fps on phones.
The Full Stack
Putting it all together, here's the architecture for a browser-based multiplayer creator open world:
Client (Browser)
| Layer | Technology | Role |
|---|---|---|
| Renderer | Babylon.js (WebGPU + WebGL2 fallback) | Scene rendering, terrain, LOD, post-processing |
| Terrain | Custom heightmap system + Babylon DynamicTerrain | Chunk-based streaming terrain with splatting |
| Physics | Rapier (Wasm) or Havok (via Babylon) | Character controller, collision, raycasting |
| ECS | bitECS | Entity management for all world objects |
| Networking | WebSocket + WebRTC DataChannel | State sync, position updates, voice chat |
| State | Yjs (CRDT) | Collaborative world editing, conflict resolution |
| Audio | Web Audio API | Spatial audio, ambiance, music |
| UI | HTML/CSS overlay | HUD, inventory, chat, creator tools |
| Workers | Web Workers | Asset decompression, physics, terrain generation |
Server
| Layer | Technology | Role |
|---|---|---|
| World shards | Cloudflare Durable Objects | Per-chunk authoritative state, WebSocket endpoints |
| Asset storage | Cloudflare R2 | GLB models, KTX2 textures, heightmaps, audio |
| Asset CDN | Cloudflare CDN (R2 public bucket) | Edge-cached delivery of world assets |
| AI generation | GPU instances (Hetzner/Lambda/RunPod) | 3D model gen, terrain gen, texture gen |
| Asset pipeline | Cloudflare Queue + Workers | LOD generation, mesh optimization, format conversion |
| Auth | Auth0 | Creator identity, permissions |
| Database | Cloudflare D1 | World metadata, creator inventories, permissions |
| Real-time | Cloudflare Durable Objects + Pub/Sub | Player presence, chat, event broadcast |
Data Flow
- Player opens the world in their browser
- Client authenticates, connects to the nearest Durable Object for their spawn chunk
- DO sends current chunk state (terrain + objects + nearby players)
- Client begins rendering, requests adjacent chunks from R2/CDN
- As the player moves, client connects to adjacent DOs, disconnects from distant ones
- Creator places an object: client sends edit to DO, DO validates and broadcasts to all connected clients via CRDT delta
- DO persists chunk state to storage on each edit (debounced)
- Other players see the new object appear within 100-200ms
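Step 5 of this flow (connecting to adjacent shards and dropping distant ones) reduces to a set diff. A sketch, with illustrative chunk size and radius:

```javascript
// Chunk interest management: which chunk shards should this client be
// connected to, given the player position?
const CHUNK_SIZE = 64; // meters per chunk (illustrative)
const RADIUS = 1;      // keep a (2R+1) x (2R+1) neighborhood connected

function desiredChunks(px, pz) {
  const cx = Math.floor(px / CHUNK_SIZE);
  const cz = Math.floor(pz / CHUNK_SIZE);
  const want = new Set();
  for (let dx = -RADIUS; dx <= RADIUS; dx++)
    for (let dz = -RADIUS; dz <= RADIUS; dz++)
      want.add(`${cx + dx},${cz + dz}`);
  return want;
}

// Diff the current connections against the desired set each time the
// player crosses a chunk boundary.
function updateConnections(connected, px, pz) {
  const want = desiredChunks(px, pz);
  const open = [...want].filter((k) => !connected.has(k));
  const close = [...connected].filter((k) => !want.has(k));
  return { open, close }; // open: shards to connect; close: shards to drop
}
```

The same `desiredChunks` set doubles as the prefetch list for the R2/CDN asset requests in step 4.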
Performance Budget
For a 60fps experience on a mid-range desktop (RTX 3060 / M1 Mac / 16GB RAM):
| Resource | Budget | Notes |
|---|---|---|
| Draw calls | < 500 per frame | Batching, instancing, LOD |
| Triangles | < 2M per frame | LOD keeps this in check |
| Texture memory | < 512 MB | KTX2 compression, streaming, atlas pooling |
| Geometry memory | < 256 MB | Shared buffers, pooling, aggressive unload |
| JavaScript heap | < 512 MB | ECS uses typed arrays, not objects |
| Network | < 500 KB/s sustained | Delta compression, spatial relevance filtering |
| Initial load | < 10 MB, < 5 seconds | Progressive loading, terrain-first |
| Chunk load | < 200 KB, < 200ms | Pre-fetch adjacent chunks |
Successful Browser Games and What They Prove
Browser games are not a niche. They're one of the largest gaming markets. Poki serves over 100 million monthly players. CrazyGames, Newgrounds, and itch.io serve millions more. The games that succeed in browsers have specific architectural patterns worth studying.
Browser 3D Games That Ship Today
Hordes.io is the most relevant browser MMO. Built by a solo developer using custom WebGL rendering, it puts 200+ players in a persistent 3D world with real-time combat, guilds, classes, and PvP. The entire game loads in under 5 seconds. The world is divided into zones with aggressive culling. Player models are simple (low-poly with flat shading) but the particle effects and animations make combat feel responsive. Hordes.io proves three things: browser MMOs can handle hundreds of concurrent players in one scene, a solo developer can build one, and stylized graphics perform better than realistic ones in a browser.
Krunker.io peaked at over 10 million monthly players and was acquired by FRVR. It's a browser-based FPS with a full map editor, custom game modes, user-generated content, and a marketplace. Built on Three.js, it runs at 60fps+ even on low-end hardware because of its blocky art style and aggressive optimization. The level editor is particularly relevant. Players build maps using a voxel-like block system, share them on the marketplace, and others play on them. This is the creator-world loop in miniature: build something, share it, others experience it. Krunker proved that user-generated 3D content can work in a browser if the creation tools are simple enough.
ev.io is a browser FPS built on Babylon.js. It runs well on most hardware, supports custom maps, and demonstrates that Babylon's WebGL renderer can handle fast-paced 3D action in a browser tab. The game uses aggressive texture compression and low-poly environments to stay within performance budgets.
Shell Shockers (over 5 million monthly players) is a 3D multiplayer shooter where you play as eggs. Built on Three.js, it handles real-time multiplayer with responsive hit detection in the browser. The cartoonish art style keeps the asset requirements minimal while still looking polished.
Townscaper isn't a browser game, but its approach to world building is highly relevant. Players click to place buildings on a water surface. The game automatically generates architectural details, streets, arches, and stairs based on placement patterns. No menus, no settings, no objectives. Just click and build. It sold over 1 million copies. The lesson: sometimes the simplest creative tools produce the most engaging experiences. If we can make placing objects in the world feel as immediate as Townscaper, creators will spend hours building.
A-Frame / 8th Wall experiences. A-Frame (built on Three.js) powers thousands of web-based 3D experiences. 8th Wall (now part of Niantic) runs AR experiences in mobile browsers. These aren't games, but they demonstrate that complex 3D rendering with physics and interaction works in a browser tab without plugins. Many of these experiences load in 2-3 seconds and run on mid-range phones.
Browser Games That Went Massive
Agar.io (2015) proved that browser multiplayer can reach millions. At its peak, it had over 100,000 concurrent players across servers. The game is 2D and mechanically simple (grow by absorbing smaller cells), but the networking architecture handles massive concurrency through spatial partitioning. Each server runs a region of the game world. Players only receive updates about entities in their view. This is the same interest management pattern needed for a 3D open world, just in 2D.
Slither.io built on Agar.io's success and proved the model scales. 67 million monthly active users at peak. The game uses WebSocket for real-time position sync and spatial partitioning to limit network traffic. One detail worth noting: Slither.io's server-side collision detection runs at a lower tick rate for distant players (5 Hz) than for nearby ones (30 Hz). This variable tick rate by distance is applicable to a 3D open world.
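That distance-based tick rate is only a few lines of server logic. A sketch with illustrative rate bands (the 5 Hz / 30 Hz split above is Slither.io's; the distances here are made up):

```javascript
// Nearby entities update at the full rate; distant ones are decimated.
function updateHz(distance) {
  if (distance < 50) return 30;  // full rate inside the action radius
  if (distance < 200) return 10; // mid-range
  return 5;                      // far away: coarse updates only
}

// A server ticking at TICK_HZ decides per entity whether to send this tick.
const TICK_HZ = 30;
function shouldSend(tick, distance) {
  const every = Math.round(TICK_HZ / updateHz(distance));
  return tick % every === 0;
}
```

Combined with interest management (don't send at all beyond the view radius), this keeps per-player bandwidth roughly constant as the world fills up.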
Surviv.io was a browser-based battle royale that reached 50 million monthly players before being acquired by Kongregate. It ran a full 80-player battle royale match entirely in the browser with real-time networked physics, destructible environments, and item pickup. The map was procedurally arranged from pre-designed building templates, which is a pattern we could use for creator-placed structures.
Zombs Royale ran a 100-player battle royale in the browser with fast load times and responsive networking. Like Surviv.io, it proved that large player counts in real-time browser games are commercially viable, not just technically possible.
The common thread across all these browser hits: they load fast (under 5 seconds), they work on any device, the art style is simple but consistent, and the networking is optimized for the specific gameplay (spatial partitioning, variable update rates, aggressive culling of distant state).
RuneScape: An MMO That Moved to the Browser
RuneScape is the most important case study for a browser-based open world because it actually happened. Jagex moved an entire MMO with 20 years of content into the browser.
RuneScape originally ran as a Java applet. When browsers dropped Java support, Jagex rebuilt the client in C++ and also shipped a fully functional HTML5/WebGL client. Old School RuneScape (the retro version) now runs entirely in the browser via a client compiled through Emscripten to WebAssembly. The game handles large persistent worlds, real-time multiplayer with hundreds of players per server, an economy with a functioning grand exchange (auction house), and 23 skills with deep progression systems.
Technical details that matter:
- The world is divided into map squares (64x64 tile regions). The client loads a 104x104-tile area around the player (a 13x13 grid of 8x8-tile zones). Anything outside this area is culled completely.
- Terrain is tile-based with height values per tile corner. Terrain overlays (paths, water edges, beach transitions) use a shape system with 12 rotation variants per shape. This is more constrained than a heightmap but extremely compact and fast to stream.
- The network protocol is custom binary over WebSocket. Each packet type has a defined structure. Player position updates use 2 bytes for map square coordinates and variable-length encoding for movement type. Chat, trade, and combat events have their own compact binary formats. The entire protocol is heavily optimized for minimum bandwidth.
- Object rendering uses a model system where the server sends a model ID and the client renders the cached model. Most models are loaded once and reused. This means the world streams as metadata (what model goes where) rather than streaming geometry.
- Each server instance handles 2,000 concurrent players across the full game world. The world is not spatially sharded. A single server process manages all players, all NPCs, and all game logic on a 600ms game tick. This works because the game logic is simple per-tick: process player actions, update NPC AI, resolve combat, broadcast state changes.
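To make the "2 bytes for coordinates plus compact move data" idea concrete, here is an illustrative binary position packet in that spirit. This is not RuneScape's actual wire format; the field layout is invented.

```javascript
// Compact position update: chunk-local coordinates in a few bytes instead
// of full floats. Local coords must fit in 6 bits each (0-63).
function encodePosition(chunkX, chunkZ, localX, localZ, moveType) {
  const buf = new ArrayBuffer(6);
  const view = new DataView(buf);
  view.setUint8(0, chunkX);                  // chunk coords: 1 byte each
  view.setUint8(1, chunkZ);
  view.setUint16(2, (localX << 6) | localZ); // two 6-bit local coords packed
  view.setUint8(4, moveType);                // e.g. walk / run / teleport
  view.setUint8(5, 0);                       // reserved
  return buf;
}

function decodePosition(buf) {
  const view = new DataView(buf);
  const packed = view.getUint16(2);
  return {
    chunkX: view.getUint8(0),
    chunkZ: view.getUint8(1),
    localX: packed >> 6,
    localZ: packed & 0x3f,
    moveType: view.getUint8(4),
  };
}
```

Six bytes per position update versus ~24 for three float64s is the kind of saving that makes hundreds of visible players affordable over WebSocket.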
What RuneScape proves for our case: A full MMO with persistent world state, thousands of players, complex game systems, and a real economy can run in a browser tab. The client is under 50 MB downloaded, loads in a few seconds, and runs on laptops. If a 20-year-old Java MMO can make the transition, a purpose-built browser world has even fewer constraints.
Where RuneScape's approach doesn't fit: RuneScape's rendering is isometric/fixed-camera, not first/third-person 3D. The visual fidelity is low by modern standards. And the world is not creator-editable. But the networking architecture, the tile-based streaming, and the proof that browser MMOs retain players for decades are all directly relevant.
Habbo Hotel: Social Spaces That Lasted 25 Years
Habbo Hotel launched in 2000 and is still running. It's a 2D isometric social world where users create and decorate rooms, visit other people's rooms, and socialize. At peak, it had 9 million monthly users. The entire experience ran in Flash (now HTML5 after the Flash deprecation).
Habbo matters because of how long it has sustained a creator community. The room system is effectively a 2D version of what we're building: users place furniture objects in a grid, customize the layout, and invite others to visit. The economic model (users buy virtual furniture with real money) has generated over $1 billion in lifetime revenue.
What Habbo teaches:
- Room-based spaces with user-created interiors work as a social platform for decades if the creation tools are simple and the social features are strong.
- Virtual furniture economy sustains long-term engagement. Users buy, trade, and collect items. The items have no gameplay utility. They're pure self-expression and status.
- Moderation in social spaces requires constant investment. Habbo has been through multiple moderation crises. Automated content filtering plus human moderators plus community reporting is the minimum viable approach.
- The transition from Flash to HTML5 (completed around 2020-2021) proved that a large social world can migrate rendering technology without losing its community. The users care about their rooms and friends, not the underlying tech.
Among Us and Spatial Social Games
Among Us is not an open world, but its success revealed something important about multiplayer spaces: players want to be in a place together, not just in a game together. The spatial proximity chat mods that went viral showed that being in the same virtual room with directional audio transforms multiplayer from a game mechanic into a social experience.
Spatial social features that enhance a creator world:
- Proximity voice chat where volume fades with distance. Walk up to someone to talk. Walk away and they fade out. This creates natural social clusters without requiring voice channel management.
- Emote and gesture systems let players express themselves without voice. A wave, a dance, a pointing gesture. These are cheap to implement (animations on the player avatar) and disproportionately increase social engagement.
- Shared activities that happen in-world (not through menus) turn a space into a venue. If two creators can sit at a virtual table and look at 3D models together, the world has a reason to exist beyond displaying static content.
Browser Games That Scaled on Poki and CrazyGames
Poki and CrazyGames together serve over 150 million monthly players. The games that perform best on these platforms offer insights into what works in the browser specifically.
Top-performing patterns on browser game portals:
- Instant play (under 3 seconds to interactive). No login required. No tutorial required. The game should make sense within 5 seconds of landing.
- Session flexibility. Players drop in for 2 minutes or 2 hours. The game accommodates both. For a creator world, this means the world should be explorable without committing to a session. Walk around, see cool things, leave. Or stay and build for hours.
- Mobile compatibility. Over 60% of Poki traffic is mobile. A browser world that only works on desktop loses the majority of potential visitors.
- Social features that don't require friends. Leaderboards, reactions to other players' content, asynchronous features (see what other players built without being online at the same time).
The most successful 3D games on these portals (like Shell Shockers, 1v1.LOL, Smash Karts) keep the poly count low, the textures simple, and the framerate high. They prove that players accept simple graphics if the experience is smooth and responsive.
Browser-Based Creator Platforms
Hubs by Mozilla (now community-maintained). Multi-user 3D spaces in the browser built on Three.js and A-Frame. Supports voice chat, avatars, and shared objects. Not an open world (it's room-based), but the networking and rendering architecture are relevant. Mozilla open-sourced it before shutting down the hosted service, so the full codebase is available to study on GitHub.
Hyperfy. A web-based metaverse platform running entirely in the browser. Three.js rendering, multiplayer, avatar customization, and world building. Closer to our target than Hubs because it emphasizes creator tools. Worlds load in a browser tab with no download required. hyperfy.io.
Ethereal Engine (formerly XREngine). Open-source engine for multi-user worlds, built on Three.js and bitECS. Supports WebXR, spatial audio, and world editing. Has built-in ECS architecture, networking layer, and editor tools. The closest existing open-source project to what we're describing. Worth studying for how they integrate bitECS with Three.js for entity management and how their networking handles spatial state. The code is on GitHub.
Dusk (formerly Rune). Multiplayer game SDK for web games. Handles the networking layer so developers focus on gameplay. Their state sync approach uses predicted state with server reconciliation, which is the standard model for responsive multiplayer. The SDK abstracts away the complexity of rollback netcode. Worth studying for their developer experience.
OnRamp by Niantic. Browser-based 3D world builder from the Pokemon GO developer. Users create and share 3D spaces that other people can visit and explore. Demonstrates that non-technical creators can build 3D worlds in a browser if the tools are approachable.
PlayCanvas Editor. Not a game itself, but PlayCanvas's cloud-based 3D editor shows that collaborative world-building tools can run in the browser. Multiple team members edit the same scene simultaneously. The editor communicates changes via a real-time sync layer. This is the collaborative creation model we'd need, except with our world as the canvas instead of a game editor.
Native Creator Platforms (Lessons for Browser)
These platforms run as native apps, but their design decisions about creator tools, world persistence, and social dynamics are directly applicable.
Roblox is the single most important reference for a creator world. 80+ million daily active users. Creators build full 3D experiences (games, social spaces, stores) that other players visit. The platform handles hosting, networking, discovery, and monetization.
What Roblox gets right:
- Creation is in-engine. Roblox Studio is the same environment players experience. Creators test their work instantly. There's no export/upload/wait cycle. For a browser world, the editor should be the world itself.
- Scripting is accessible. Lua (Roblox's scripting language) is simple enough that children learn it. Complex behavior is possible but optional. The floor is low and the ceiling is high.
- Discovery is social. You find experiences because friends are playing them. The homepage shows trending experiences. For a browser world, the world itself is the discovery surface. You explore and find things by walking around.
- Monetization works. Creators earn real money (Roblox paid out $740 million to creators in 2023). This attracts serious creative effort. Without economic incentive, creator platforms become hobby projects that fade.
- One caveat: Roblox's rendering engine is custom and runs in a native app, not a browser. But the per-experience asset budgets are modest by modern standards (100 MB recommended max), and most successful Roblox experiences use low-poly stylized art, which aligns with browser rendering constraints.
Fortnite Creative / UEFN (Unreal Editor for Fortnite). Epic brought Unreal Engine's full editor to Fortnite creators. The result is a platform where people build islands (self-contained worlds) using professional-grade tools. Fortnite handles hosting, multiplayer, and distribution.
Relevant insights:
- Professional tools attract professional content. UEFN produces visually stunning experiences because creators have access to UE5's full capabilities. The trade-off is complexity. UEFN has a steep learning curve.
- Island-based instancing (each creation is a separate world) avoids the moderation and conflict problems of a single shared world. But it also means creators don't naturally discover each other's work by exploring. You visit islands through menus, not by walking.
- The business model works. Fortnite's creator program pays based on engagement. Top creators earn millions per year. Again, economic incentive drives quality.
Dreams (Media Molecule / PlayStation). Dreams gave console players a full 3D creation suite (modeling, animation, music, logic, level design) and a platform to share creations. In terms of tool depth, it's the most ambitious creator platform ever built.
Relevant insights:
- Sculpture-based modeling instead of polygon editing. Creators shape soft volumes with move/grab/smooth tools, similar to ZBrush but more intuitive. The learning curve is gentle. This approach maps well to in-browser creation because it doesn't require understanding vertices and UV maps.
- Everything is a shared asset. If someone creates a tree model, anyone can use it in their own creation (with attribution). This creates a compound creative ecosystem where each creation makes the platform more valuable.
- Dreams struggled commercially despite critical praise. The problem was distribution: it was locked to PlayStation, and the creation tools were so deep that most players never moved beyond consuming content. The lesson: the creation tools need to be simple enough that the majority of users try them, even if only a minority become serious builders.
Core (Manticore Games). A free platform for creating and playing multiplayer games, built on Unreal Engine with a simplified editor. Core's editor runs as a native app, but the philosophy is relevant. Templates and community-shared scripts let beginners assemble games from pre-made components. Advanced creators can write Lua scripts for custom behavior. Core struggled to find a large audience, partly because the games had to be played through the Core launcher. A browser-based version wouldn't have this friction.
VRChat and Rec Room are social platforms where creators build spaces others visit. VRChat uses Unity and runs on PC/VR. Rec Room runs on everything including mobile. Both prove that user-generated 3D worlds can sustain large communities. VRChat is more technically impressive (custom shaders, complex avatars). Rec Room is more accessible (in-app creation tools, simpler graphics, wider platform support). For a browser world, Rec Room's approach of simple in-app creation tools is more applicable than VRChat's external-tool workflow.
Second Life is the grandfather of all creator worlds. Launched in 2003, it's still running with 200,000+ daily active users. The entire world is user-created. Land is owned and traded. Creators sell objects, clothing, and buildings. The in-world scripting language (LSL) enables interactive content.
What Second Life teaches after 20+ years:
- Persistence matters more than graphics. Second Life's visuals are dated, but the world persists. Creations stay where you put them. Relationships and history accumulate. This persistence is what keeps people coming back.
- Economy drives creation. Second Life's GDP is estimated at $500 million per year. Creators build because they can sell. Without economic incentive, the volume and quality of creator content drops off.
- User-generated content requires moderation infrastructure. Second Life has had 20 years of moderation challenges. Any platform where users can place arbitrary content in a shared space needs automated scanning, reporting tools, and human review.
- Land-based spatial organization works. Second Life divides its world into parcels that users own. Each parcel has a prim (object) limit. This naturally prevents any one creator from consuming all the resources. For a browser world, chunk-based ownership with per-chunk object budgets is the equivalent pattern.
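The chunk-budget equivalent of Second Life's prim limits is a one-function check on every edit. The budget value and cost model below are illustrative:

```javascript
// Per-chunk resource budget: an edit is rejected when it would exceed it.
const CHUNK_OBJECT_BUDGET = 256; // illustrative cost units per chunk

function canPlace(chunk, object) {
  const used = chunk.objects.reduce((sum, o) => sum + o.cost, 0);
  return used + object.cost <= CHUNK_OBJECT_BUDGET;
}

function place(chunk, object) {
  if (!canPlace(chunk, object)) return { ok: false, reason: "budget exceeded" };
  chunk.objects.push(object);
  return { ok: true };
}
```

In the architecture above this check belongs in the chunk's Durable Object, where it doubles as the validation step before an edit is broadcast: it caps both render cost and storage per chunk regardless of what any one creator does.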
Comparative Analysis: What Each Game Teaches Us
| Game | Key Lesson | Applicable Tech | Risk If Ignored |
|---|---|---|---|
| Skyrim | Chunk-based streaming with cell grid | Heightmap terrain, LOD tiers, interior/exterior separation | World doesn't fit in browser memory |
| The Witcher 3 | Layered world composition by separate teams | Content-aware streaming, impostor rendering | Creators can't work independently |
| Breath of the Wild | Systemic rules beat scripted content | Material interaction system, physics-driven gameplay | World feels static and dead |
| GTA V | Ambient life makes worlds feel real | NPC behavior systems, traffic, time-of-day | Creator world feels like an empty museum |
| Elden Ring | Density variation and asset reuse | Modular asset library, sparse + dense zones | Either too empty or too expensive to fill |
| No Man's Sky | Procedural generation for the canvas, creator content for the soul | Seed-based terrain, compact base data model | Infinite but boring terrain |
| Minecraft | Fully editable world, simple tools, infinite depth | Chunk streaming, palette compression, block-edit protocol | Creators can't reshape the world itself |
| Roblox | Creation is in-engine, economy drives quality | In-world editor, creator monetization | Nobody builds because there's no reason to |
| Krunker.io | Browser UGC works at scale with simple tools | Three.js rendering, voxel-based editor, marketplace | Creation tools too complex for casual creators |
| Hordes.io | 200+ players in browser 3D, solo-dev viable | Custom WebGL, spatial culling, stylized art | Over-engineer the multiplayer layer |
| Agar.io / Slither.io | Spatial partitioning enables massive concurrency | Variable tick rate by distance, interest management | Network collapses at scale |
| Second Life | Persistence and economy sustain 20-year communities | Parcel-based ownership, object budgets, marketplace | No long-term retention |
| Dreams | Sculpture-based creation is more intuitive than polygon editing | Volume-based modeling, shared asset library | Creation tools feel like a CAD program |
| Fortnite Creative | Pro tools attract pro content | Full editor capabilities in the platform | Content quality ceiling is too low |
| RuneScape | Full MMO works in browser via Wasm, binary protocols | Emscripten, custom binary WebSocket protocol, tile streaming | Underestimate what browsers can handle |
| Habbo Hotel | Simple room creation sustains 25-year community | Grid-based placement, virtual furniture economy | Overcomplicate the creation tools |
The games that matter most for our specific case (browser-based, creator-focused, multiplayer) are Minecraft (editable world, compact data), Roblox (in-engine creation, economy), Krunker (browser UGC at scale), and Hordes.io (browser MMO architecture). The AAA titles (Skyrim, BotW, Witcher 3) teach rendering and streaming. The browser hits (Agar.io, Slither.io, Surviv.io) teach networking at scale. The creator platforms (Roblox, Dreams, Second Life) teach community dynamics.
What Skyrim and The Witcher Would Look Like in a Browser
Let's get concrete. If you took Skyrim's Whiterun and rebuilt it for browser delivery:
Terrain: The area around Whiterun is roughly 2km x 2km. At our chunk size (64m), that's about 32x32 = 1024 chunks. At 2-4 KB per chunk's heightmap, that's 2-4 MB of terrain data. The terrain textures (grass, dirt, rock, snow) as KTX2 atlas tiles might add another 5 MB. Total terrain: under 10 MB for the entire region.
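That arithmetic is worth a sanity check. A quick budget sketch, where the per-chunk and atlas sizes are the assumptions from the paragraph above, not measurements:

```typescript
// Back-of-envelope streaming budget for a 2km x 2km Whiterun-scale region.
const REGION_M = 2000;       // region edge in meters
const CHUNK_M = 64;          // chunk edge in meters
const HEIGHTMAP_KB = 3;      // assumed per-chunk heightmap size (midpoint of 2-4 KB)
const TEXTURE_ATLAS_MB = 5;  // assumed shared KTX2 terrain texture atlas

const chunksPerSide = Math.ceil(REGION_M / CHUNK_M);    // 32
const totalChunks = chunksPerSide * chunksPerSide;      // 1024
const terrainMB = (totalChunks * HEIGHTMAP_KB) / 1024;  // 3 MB of heightmaps
const totalTerrainMB = terrainMB + TEXTURE_ATLAS_MB;    // 8 MB, under the 10 MB claim

console.log({ chunksPerSide, totalChunks, totalTerrainMB });
```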
Structures: Whiterun itself has maybe 40-50 buildings. Each building as an optimized GLB (3 LOD levels) might be 200-500 KB at highest detail. But you only need full detail for the 5-10 closest buildings. The rest are medium or low LOD (50-100 KB each). Total visible structures at any time: 2-5 MB.
Foliage: Skyrim's trees and grass around Whiterun are all instanced. You need maybe 10 unique tree models (200 KB each at full LOD, 20 KB as billboard impostors) and a grass system that generates blades from a density map on the GPU. Total foliage assets: 2-3 MB. The instancing data (positions, rotations, scales) for a 5x5 chunk area: under 500 KB.
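Instancing keeps per-instance data tiny because only the transform varies. A sketch of packing foliage transforms into a flat GPU-ready buffer; the 5-float layout and the 5000-instance count are illustrative assumptions, not measured figures:

```typescript
// Per-instance layout (floats): position xyz, yaw, uniform scale = 5 floats.
type Instance = { x: number; y: number; z: number; yaw: number; scale: number };

function packInstances(instances: Instance[]): Float32Array {
  const buf = new Float32Array(instances.length * 5);
  instances.forEach((inst, i) => {
    buf.set([inst.x, inst.y, inst.z, inst.yaw, inst.scale], i * 5);
  });
  return buf;
}

// ~5000 trees and rocks scattered across a 5x5 chunk neighborhood:
const instances: Instance[] = Array.from({ length: 5000 }, (_, i) => ({
  x: (i % 320) * 1.0, y: 0, z: Math.floor(i / 320) * 1.0,
  yaw: (i * 0.618) % (2 * Math.PI), scale: 0.8 + (i % 5) * 0.1,
}));
const packed = packInstances(instances);
console.log(packed.byteLength); // 5000 * 5 floats * 4 bytes = 100000 bytes (~100 KB)
```

Even with extra per-instance attributes (tint, wind phase), the buffer for a visible neighborhood stays well under the 500 KB figure above.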
NPCs: Whiterun has roughly 70 named NPCs plus guards. Each avatar at medium quality: 100-200 KB. But only 10-20 are visible at any time. Total NPC rendering data: 2-4 MB.
Grand total for a Whiterun-scale area visible at any time: 15-25 MB. That's absolutely viable in a browser. The initial load would show terrain and major structures within 3-5 seconds on broadband, with detail filling in over the next few seconds.
The Witcher 3's Novigrad is larger and denser but the same principles apply. You'd need more aggressive LOD and streaming, but the total visible data at any moment stays within browser memory limits.
What We'd Build First
A full open world is a multi-year project. Here's the path to get something real in creators' hands quickly:
Phase 1: Shared Island (3 months). A single 512x512m island with heightmap terrain, water, basic foliage, and a day/night cycle. Multiplayer via Durable Objects (up to 50 concurrent users). Creators can place AI-generated 3D assets from their existing Cinevva library. Think of it as a shared diorama.
Phase 2: Expandable World (3 months). Chunk streaming for a 4x4 km world. Creator-owned plots where they have edit permissions. LOD system for terrain and objects. Persistent world state. Up to 200 concurrent users across the world.
Phase 3: Living World (6 months). AI-assisted terrain sculpting. Procedural foliage and atmosphere. Quest/event system so creators can build interactive experiences, not just static scenes. Voice chat. Avatar customization. The world becomes a destination, not a demo.
Open Questions
Art style. Stylized (low-poly, cel-shaded) is cheaper to render and more forgiving of AI-generated assets. Realistic requires higher-quality assets and more rendering budget. Skyrim worked despite dated graphics because the art direction was consistent. BotW is gorgeous on tablet-class hardware because the cel-shaded style hides low polygon counts. Minecraft uses 16x16 textures and is one of the most recognizable games ever. Krunker.io and Hordes.io both succeed with simple stylized graphics in the browser. The evidence overwhelmingly favors a stylized direction for a browser world. We need to pick a specific visual identity early and enforce consistency across AI-generated assets.
World persistence vs. instancing. Does everyone share one world (like an MMO or Second Life) or do creators each get their own instance that others can visit (like Minecraft servers or Fortnite Islands)? The tech supports both, but the social dynamics are completely different. Second Life's persistent shared world creates serendipitous discovery (you stumble onto other people's work by walking around). Fortnite's instanced islands require a menu or portal system for discovery. Roblox uses a hub-based approach: games are separate but you browse and discover through a shared interface. A hybrid could work: a persistent shared overworld where creators own plots (like Second Life parcels), with the option to enter standalone experiences through portals.
Creation tool depth. Dreams proved that deep creation tools impress critics but intimidate users. Townscaper proved that minimal tools can sell a million copies. Roblox Studio sits in the middle: simple enough for kids, deep enough for professionals. Krunker's voxel editor is simpler still. For a browser world, we should start with Townscaper-level simplicity (place objects, they snap and connect) and add depth over time. The initial experience of placing your first object in the world should take under 30 seconds.
Economy. Roblox, Second Life, and Fortnite Creative all prove that economic incentive is what turns a toy into a platform. Without a way for creators to earn from their work, the most talented creators will build elsewhere. This doesn't need to launch on day one, but the architecture should support it (object ownership, visit tracking, creator attribution).
Mobile. WebGPU on mobile is years away from reliable. A mobile experience would need to be a reduced version: simpler terrain, fewer objects, shorter view distance. Rec Room's approach of running on everything with adaptive quality is worth studying. Or we ship a native app for mobile and keep the full experience in the browser.
Moderation. An open world where anyone can place anything is a moderation nightmare. Second Life has been dealing with this for 20 years. Every placed asset needs automated content review before it becomes visible to others. This adds latency to the creative process but is non-negotiable. We should also consider parcel-based content ratings (like Second Life's General/Moderate/Adult system) so creators can choose their content boundaries.
Systemic interactions. BotW and Minecraft both show that material-based interaction systems create exponentially more interesting worlds than static object placement. If a creator places a wooden bridge and another creator starts a fire nearby, should the bridge burn? If someone places a dam, should water pool behind it? These interactions make the world feel alive but require consistent physics and material rules across all creator content. Deciding how far to go with systemic design is an early architectural decision.
Key Takeaways
Browser 3D open worlds are viable today. Hordes.io runs 200+ players in 3D in a browser. Krunker.io had 10 million monthly players with a full 3D map editor. The .io games proved browser multiplayer scales to millions. The tech isn't speculative.
The rendering is ready. Three.js and Babylon.js handle 3D scenes comparable to early-2010s AAA games. WebGPU unlocks compute shaders for terrain and foliage. Wasm physics engines run within 2-3x of native. A Whiterun-scale area fits in 15-25 MB of visible data.
The networking is ready. Cloudflare Durable Objects give you per-chunk authoritative servers at the edge. CRDTs handle collaborative editing without conflicts. Spatial partitioning (proven by every game from Agar.io to Skyrim) keeps network traffic manageable at hundreds of concurrent players.
The AAA open worlds (Skyrim, Witcher 3, BotW, Elden Ring, Minecraft, No Man's Sky) aren't just graphical benchmarks. They're textbooks on chunk streaming, LOD management, procedural generation, systemic design, and world composition. Every technique they use has a browser-compatible equivalent.
The creator platforms (Roblox, Second Life, Fortnite Creative, Dreams) teach the social and economic lessons. Creation must happen in-world, not in an external tool. Economic incentive drives quality. Persistence creates attachment. Simple tools reach more creators than powerful ones.
The creative pipeline is where Cinevva has an edge. We already generate 3D assets, textures, and audio. Connecting that pipeline to a world placement system is the missing piece. AI-generated content fills the world. Creator curation and arrangement gives it soul.
What we'd end up with isn't Skyrim in a browser. It's closer to the intersection of Minecraft (editable world), Roblox (creator economy), and BotW (systemic interactions), running in a browser tab with AI-powered creation tools. The technology to build it exists. The question is execution.
Research Papers and Academic References
The techniques in this guide aren't invented from scratch. They're grounded in decades of research. These are the papers that matter most for each subsystem, with notes on how they apply to a browser open world.
Terrain Generation and Rendering
"An Image Synthesizer" -- Ken Perlin (SIGGRAPH 1985). DOI. The paper that introduced Perlin noise. Every procedural terrain generator in every game since 1985 traces back to this. The noise function produces the smooth randomness that, when layered in octaves (fractional Brownian motion), generates natural-looking heightmaps. Simplex noise (Perlin, 2001) is the faster successor. For our terrain pipeline, this is the foundation: noise-based heightmap generation runs in a WebGPU compute shader at interactive speeds.
"Texturing and Modeling: A Procedural Approach" -- Ebert, Musgrave, Peachey, Perlin, Worley (1994, 3rd edition 2003). The textbook on procedural generation. Musgrave's chapters on terrain modeling, including multifractal terrain with erosion-like features, are the direct basis for modern game terrain generators. The fBm parameters (lacunarity, persistence, octave count) described here are the same ones we'd expose to creators for terrain customization.
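The octave layering (fBm) works the same way regardless of the noise basis. A sketch using a toy hash-based value noise as a stand-in for Perlin/simplex, exposing the lacunarity, persistence, and octave-count parameters described above:

```typescript
// Toy deterministic value noise (NOT Perlin noise; a simple stand-in basis).
function hash2(ix: number, iy: number): number {
  let h = ix * 374761393 + iy * 668265263;      // large primes
  h = (h ^ (h >> 13)) * 1274126177;
  return ((h ^ (h >> 16)) >>> 0) / 4294967295;  // map to [0, 1]
}

function smooth(t: number): number { return t * t * (3 - 2 * t); }

function valueNoise(x: number, y: number): number {
  const ix = Math.floor(x), iy = Math.floor(y);
  const fx = smooth(x - ix), fy = smooth(y - iy);
  const a = hash2(ix, iy), b = hash2(ix + 1, iy);
  const c = hash2(ix, iy + 1), d = hash2(ix + 1, iy + 1);
  return (a * (1 - fx) + b * fx) * (1 - fy) + (c * (1 - fx) + d * fx) * fy;
}

// Fractional Brownian motion: layer octaves of noise at rising frequency
// (lacunarity) and falling amplitude (persistence).
function fbm(x: number, y: number, octaves = 5, lacunarity = 2.0, persistence = 0.5): number {
  let sum = 0, amp = 1, freq = 1, norm = 0;
  for (let o = 0; o < octaves; o++) {
    sum += amp * valueNoise(x * freq, y * freq);
    norm += amp;
    amp *= persistence;   // each octave contributes less height...
    freq *= lacunarity;   // ...at a higher spatial frequency
  }
  return sum / norm;      // normalized to [0, 1]
}

const h = fbm(12.3, 45.6);
console.log(h >= 0 && h <= 1); // true
```

The same loop, moved into a WebGPU compute shader with a proper simplex basis, is the heightmap generator the terrain pipeline assumes.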
"Geometry Clipmaps: Terrain Rendering Using Nested Regular Grids" -- Losasso and Hoppe (SIGGRAPH 2004). DOI. This paper solved how to render massive terrain at interactive rates using concentric LOD rings (clipmaps). The terrain mesh is a fixed set of nested grids centered on the camera. As the camera moves, the grids shift and update. This is the technique recommended in our terrain section for WebGL 2, and it works because the GPU workload is constant regardless of world size. The original implementation predates WebGL but maps directly to it.
"Fast Hydraulic Erosion Simulation and Visualization on GPU" -- Mei, Decaudin, Hu (2007). PDF. Moved hydraulic erosion from a CPU-bound offline process to a real-time GPU computation. The paper's shallow-water simulation model (treating water as a height field, computing flow between grid cells) runs in a compute shader. For our pipeline, server-side GPU erosion can transform noise-generated terrain into geologically plausible landscapes in under a second, making AI-generated terrain look hand-sculpted.
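Mei et al.'s method is a grid-based shallow-water simulation on the GPU. As a much simpler illustration of the underlying erode-transport-deposit idea, here is a toy CPU droplet pass (explicitly not the paper's algorithm; the step count and erosion rate are arbitrary):

```typescript
// Toy droplet erosion: each drop walks downhill, picking up material from
// slopes and depositing everything it carries when it gets stuck.
function erode(height: Float32Array, size: number, drops: number, rand: () => number): void {
  const neighbors: [number, number][] = [[1, 0], [-1, 0], [0, 1], [0, -1]];
  for (let d = 0; d < drops; d++) {
    let x = Math.floor(rand() * size), y = Math.floor(rand() * size);
    let sediment = 0;
    for (let step = 0; step < 30; step++) {
      const i = y * size + x;
      let bx = x, by = y, bestH = height[i];
      for (const [dx, dy] of neighbors) {           // find the lowest 4-neighbor
        const nx = x + dx, ny = y + dy;
        if (nx < 0 || ny < 0 || nx >= size || ny >= size) continue;
        const nh = height[ny * size + nx];
        if (nh < bestH) { bestH = nh; bx = nx; by = ny; }
      }
      if (bx === x && by === y) break;              // local minimum: stop walking
      const drop = Math.min(0.01, height[i] - bestH);
      height[i] -= drop;                            // erode the slope
      sediment += drop;                             // carry the material
      x = bx; y = by;
    }
    height[y * size + x] += sediment;               // deposit what was carried
  }
}

// Run it over a simple ramp with a seeded LCG for reproducibility:
const size = 16;
const height = new Float32Array(size * size);
for (let y = 0; y < size; y++)
  for (let x = 0; x < size; x++) height[y * size + x] = x * 0.1;
const sumBefore = height.reduce((s, v) => s + v, 0);
let seed = 42;
const rand = () => (seed = (seed * 1664525 + 1013904223) >>> 0) / 4294967296;
erode(height, size, 100, rand);
const sumAfter = height.reduce((s, v) => s + v, 0);
console.log(sumBefore.toFixed(3), sumAfter.toFixed(3)); // total material is conserved
```

The paper's version instead treats water depth as a second height field and computes flow between cells each frame, which is what makes it map so well to a compute shader.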
"Real-Time Rendering of Procedurally Generated Planets" -- Hybrid approaches from No Man's Sky's GDC talks. While not a single paper, the GDC 2017 talk "Building Worlds in 'No Man's Sky' Using Math(s)" by Sean Murray (Hello Games) details how No Man's Sky generates planet-scale terrain using stacked noise functions, voxel representation with marching cubes, and GPU-side generation. Directly relevant to our procedural terrain approach, particularly for generating terrain features that heightmaps can't represent (caves, arches).
"CDLOD: Continuous Distance-Dependent Level of Detail for Rendering Heightmaps" -- Filip Strugar (2014). Paper. An improvement on geometry clipmaps that adds quadtree-based selection for better adaptivity around the camera position. The key insight: instead of fixed concentric rings, use a quadtree to select terrain patches at varying resolutions. This handles irregular terrain (where some areas need more detail than others) better than pure clipmaps. Implementable in WebGL 2 with a small CPU-side quadtree traversal.
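The quadtree selection idea fits in a few lines: recurse where the camera is close, emit a coarse patch where it is far. This is the spirit of the technique, not CDLOD's exact selection metric; the distance thresholds here are arbitrary assumptions:

```typescript
type Patch = { x: number; z: number; size: number; level: number };

// Recursively select terrain patches: subdivide near the camera, emit
// coarse patches far away. Threshold scales with patch size, so each
// deeper level only triggers close to the viewer.
function selectPatches(
  x: number, z: number, size: number, level: number,
  camX: number, camZ: number, out: Patch[]
): void {
  const cx = x + size / 2, cz = z + size / 2;
  const dist = Math.hypot(camX - cx, camZ - cz);
  if (level === 0 || dist > size * 1.5) {
    out.push({ x, z, size, level });
    return;
  }
  const h = size / 2;
  const quads: [number, number][] = [[0, 0], [h, 0], [0, h], [h, h]];
  for (const [ox, oz] of quads) {
    selectPatches(x + ox, z + oz, h, level - 1, camX, camZ, out);
  }
}

const patches: Patch[] = [];
selectPatches(0, 0, 4096, 5, 100, 100, patches); // camera near the origin corner
console.log(patches.length); // far fewer patches than a full-resolution tiling
```

The emitted patches tile the terrain with no gaps, and the GPU workload stays roughly constant as the world grows, which is the property that matters in a browser.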
3D Gaussian Splatting and Neural Rendering
"3D Gaussian Splatting for Real-Time Radiance Field Rendering" -- Kerbl, Kopanas, Leimkühler, Drettakis (SIGGRAPH 2023). Project page. The paper that launched the Gaussian splatting revolution. A scene is represented as millions of 3D Gaussians, each with position, covariance (shape), opacity, and spherical harmonic color coefficients. Rendering sorts splats by depth and rasterizes them as 2D Gaussians. The approach is 100-1000x faster to train than NeRFs and renders at real-time rates. Multiple WebGL/WebGPU implementations exist. For our platform, this enables creators to capture real-world objects with phone photos and place them in the browser world.
"NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" -- Mildenhall et al. (ECCV 2020). Project page. The foundational neural scene representation paper. A neural network maps 3D coordinates to color and density, enabling photorealistic novel view synthesis from a set of input photographs. While NeRFs are too expensive to render directly in a browser (they require per-pixel network evaluation), the NeRF-to-mesh extraction pipeline (training a NeRF, then running marching cubes on the density field) produces high-quality textured meshes from photos.
"Instant Neural Graphics Primitives with a Multiresolution Hash Encoding" -- Müller, Evans, Schied, Keller (SIGGRAPH 2022). Project page. Reduced NeRF training from hours to seconds using a multi-resolution hash table for spatial encoding. This made NeRF practical for production use. The hash encoding technique is also applicable to other spatial data in a browser world (fast lookups in large 3D datasets).
"Neuralangelo: High-Fidelity Neural Surface Reconstruction" -- Li et al. (CVPR 2023). Project page. Extracts high-quality triangle meshes from neural representations using multi-resolution hash encoding and numerical gradients for SDF (signed distance function) estimation. The output meshes are directly usable in browser 3D engines. For our asset pipeline, Neuralangelo (or similar tools like NeuS2) can convert NeRF captures into web-ready GLB files with clean geometry and baked textures.
Multiplayer Networking and State Synchronization
"Interest Management in Massively Multiplayer Online Games" -- Boulanger, Kienzle, Verbrugge (2006). DOI. A comprehensive survey of area-of-interest (AOI) management techniques for MMOs. Covers grid-based, aura-based, and hybrid approaches for filtering network updates based on spatial relevance. The grid-based approach (which maps to our chunk system) is the most efficient for uniform-density worlds. The aura-based approach (per-entity influence radius) works better for variable density. Our recommendation of chunk-based AOI with priority-based update rates within the AOI draws from this research.
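The grid-based approach maps directly onto our chunk system. A sketch, assuming 64m chunks and a 3x3 subscription neighborhood (both numbers from earlier in this guide):

```typescript
const CHUNK = 64; // chunk edge in meters

function chunkKey(cx: number, cz: number): string { return `${cx},${cz}`; }

// The set of chunks a player at (x, z) should be subscribed to.
function aoiChunks(x: number, z: number, radius = 1): Set<string> {
  const cx = Math.floor(x / CHUNK), cz = Math.floor(z / CHUNK);
  const keys = new Set<string>();
  for (let dx = -radius; dx <= radius; dx++)
    for (let dz = -radius; dz <= radius; dz++)
      keys.add(chunkKey(cx + dx, cz + dz));
  return keys;
}

// Diffing AOI sets as the player moves tells the server exactly which
// chunks to subscribe and unsubscribe.
function diff(prev: Set<string>, next: Set<string>) {
  return {
    subscribe: [...next].filter(k => !prev.has(k)),
    unsubscribe: [...prev].filter(k => !next.has(k)),
  };
}

const before = aoiChunks(100, 100); // chunk (1,1) neighborhood
const after = aoiChunks(130, 100);  // crossed into chunk (2,1)
console.log(diff(before, after));   // 3 new chunks in, 3 old chunks out
```

Crossing one chunk border swaps only one edge of the neighborhood, so subscription churn stays proportional to movement speed, not world size.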
"On the Suitability of Dead Reckoning Schemes for Games" -- Pantel and Wolf (NetGames 2002). DOI. Formalizes dead reckoning (predicting entity positions from the last known velocity) for networked games. The paper quantifies the trade-off: higher prediction thresholds reduce bandwidth but increase the position error visible during corrections. For browser games with 50-200ms latency, a prediction threshold of 0.5-1.0 meters keeps corrections invisible while cutting position update bandwidth by 60-80%.
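A minimal dead-reckoning sketch: both sides extrapolate from the last acknowledged state, and the sender transmits only when the true position drifts past the threshold (0.75m here, a midpoint of the 0.5-1.0m range above):

```typescript
type State = { x: number; z: number; vx: number; vz: number; t: number };

// Extrapolate a last-known state to the current time.
function predict(s: State, now: number): { x: number; z: number } {
  const dt = now - s.t;
  return { x: s.x + s.vx * dt, z: s.z + s.vz * dt };
}

// Send an update only when reality has drifted past the threshold.
function needsUpdate(last: State, trueX: number, trueZ: number,
                     now: number, threshold = 0.75): boolean {
  const p = predict(last, now);
  return Math.hypot(trueX - p.x, trueZ - p.z) > threshold;
}

// Entity moving straight at 3 m/s along +x, last update at t=0:
const last: State = { x: 0, z: 0, vx: 3, vz: 0, t: 0 };
console.log(needsUpdate(last, 3, 0, 1));   // false: exactly where predicted
console.log(needsUpdate(last, 3, 1.0, 1)); // true: drifted 1.0 m sideways
```

An entity walking in a straight line generates almost no traffic; updates cluster around turns and collisions, which is where players are actually looking.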
"Conflict-free Replicated Data Types" -- Shapiro, Preguiça, Baquero, Zawirski (2011). DOI. The foundational CRDT paper. Defines state-based and operation-based CRDTs that converge without coordination. For our world editing system, the relevant CRDTs are: LWW-Register (Last-Writer-Wins Register) for object properties that have a single value (position, rotation, color), and OR-Set (Observed-Remove Set) for the collection of objects in a chunk (handles concurrent add/remove without conflicts). Yjs implements these efficiently in JavaScript.
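A minimal sketch of the two CRDTs named above; production use would go through Yjs, but this shows why they converge without coordination:

```typescript
// LWW-Register: highest timestamp wins, with a deterministic actor tiebreak,
// so merging in any order yields the same value.
type LWW<T> = { value: T; ts: number; actor: string };

function mergeLWW<T>(a: LWW<T>, b: LWW<T>): LWW<T> {
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.actor > b.actor ? a : b;
}

// OR-Set: every add carries a unique tag; remove deletes only the tags it
// observed, so a concurrent add always survives a concurrent remove.
class ORSet<T> {
  tags = new Map<T, Set<string>>();
  add(v: T, tag: string): void {
    if (!this.tags.has(v)) this.tags.set(v, new Set());
    this.tags.get(v)!.add(tag);
  }
  remove(v: T, observed: Set<string>): void {
    const cur = this.tags.get(v);
    if (!cur) return;
    for (const t of observed) cur.delete(t);
    if (cur.size === 0) this.tags.delete(v);
  }
  has(v: T): boolean { return this.tags.has(v); }
}

// Two creators set an object's rotation concurrently; both replicas converge:
const a: LWW<number> = { value: 90, ts: 5, actor: "alice" };
const b: LWW<number> = { value: 45, ts: 7, actor: "bob" };
console.log(mergeLWW(a, b).value, mergeLWW(b, a).value); // 45 45
```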
"Time Warp: A Mechanism for Distributed Simulation" -- Jefferson (1985). DOI. The original optimistic distributed simulation paper. While Time Warp itself is too complex for a browser game, the core insight (process events optimistically and roll back if a conflict arrives from another node) is the basis for modern client-side prediction with server reconciliation. This is how every responsive multiplayer game works: the client predicts locally, sends actions to the server, and corrects if the server disagrees.
"The TRIBES Engine Networking Model" -- Frohnmayer and Gift (GDC 1999). One of the first practical descriptions of client-server game networking with interest management, prioritized state updates, and bandwidth budgeting. The "ghost manager" concept (the server maintains a per-client view of what the client knows, and only sends deltas from that view) is exactly what our chunk-based Durable Object architecture implements. This GDC talk is the intellectual ancestor of most modern game networking.
"Source Multiplayer Networking" -- Valve (2009). Developer documentation. Valve's documentation of the Source engine networking model (used in Half-Life 2, CS:GO, Team Fortress 2). Covers client-side prediction, entity interpolation, lag compensation, and the "snapshot" system where the server sends full world state at regular intervals while the client interpolates between snapshots. This is the gold standard for authoritative server networking and directly applicable to our architecture.
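The snapshot system reduces to a simple rule on the client: render slightly in the past, and lerp between the two server snapshots that bracket the render time. A one-dimensional sketch:

```typescript
type Snap = { t: number; x: number }; // server snapshot: timestamp + position

// Interpolate position at renderT from snapshots sorted by ascending t.
function interpolate(snaps: Snap[], renderT: number): number {
  for (let i = snaps.length - 1; i >= 1; i--) {
    const a = snaps[i - 1], b = snaps[i];
    if (renderT >= a.t && renderT <= b.t) {
      const f = (renderT - a.t) / (b.t - a.t);
      return a.x + (b.x - a.x) * f;
    }
  }
  return snaps[snaps.length - 1].x; // out of range: clamp to latest snapshot
}

// Server snapshots every 50 ms; client renders with a 100 ms interpolation delay.
const snaps: Snap[] = [{ t: 0, x: 0 }, { t: 50, x: 1 }, { t: 100, x: 2 }];
const now = 150;
console.log(interpolate(snaps, now - 100)); // position at t=50 -> 1
```

The 100ms delay guarantees there is always a pair of snapshots to interpolate between, trading a fixed, invisible latency for perfectly smooth remote motion.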
Procedural Generation
"Model Synthesis: A General Procedural Modeling Algorithm" -- Merrell (2007). DOI. One of the precursors to Wave Function Collapse. Generates 3D structures from example models by propagating local constraints. The algorithm ensures global consistency by iteratively collapsing cells with the fewest possibilities (minimum entropy heuristic). This is how Townscaper and similar generators produce coherent structures from simple user input.
"WaveFunctionCollapse" -- Maxim Gumin (2016). GitHub. Not a traditional paper but a seminal open-source project with extensive documentation. The algorithm takes a small example image or tileset and generates larger outputs that are locally similar to the input. For a creator world, WFC can generate building layouts, road networks, dungeon maps, and terrain details from a small set of creator-defined rules. Multiple JavaScript implementations exist.
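The core WFC loop (collapse the minimum-entropy cell, then propagate adjacency constraints) fits in a short sketch. This is a toy 1D version; real WFC adds 2D neighborhoods, weighted random choice, and backtracking, and the three-tile terrain alphabet here is an invented example:

```typescript
const TILES = ["sea", "beach", "land"] as const;
type Tile = typeof TILES[number];
// Allowed left|right adjacencies: land never touches sea directly.
const OK = new Set(["sea|sea", "sea|beach", "beach|sea", "beach|beach",
                    "beach|land", "land|beach", "land|land"]);

function wfc(n: number, pick: (opts: Tile[]) => Tile): Tile[] {
  const cells: Set<Tile>[] = Array.from({ length: n }, () => new Set(TILES));
  while (true) {
    // Collapse the uncollapsed cell with the fewest options (minimum entropy).
    let best = -1;
    for (let i = 0; i < n; i++)
      if (cells[i].size > 1 && (best === -1 || cells[i].size < cells[best].size)) best = i;
    if (best === -1) break; // every cell collapsed
    cells[best] = new Set([pick([...cells[best]])]);
    // Propagate: drop any option incompatible with all of a neighbor's options.
    let changed = true;
    while (changed) {
      changed = false;
      for (let i = 0; i + 1 < n; i++) {
        for (const side of [0, 1]) {
          const from = side === 0 ? cells[i] : cells[i + 1];
          const to = side === 0 ? cells[i + 1] : cells[i];
          for (const t of [...to]) {
            const fits = [...from].some(f =>
              side === 0 ? OK.has(`${f}|${t}`) : OK.has(`${t}|${f}`));
            if (!fits) { to.delete(t); changed = true; }
          }
        }
      }
    }
  }
  return cells.map(c => [...c][0]);
}

// Deterministic pick (first option) for reproducibility:
const strip = wfc(8, opts => opts[0]);
console.log(strip); // every adjacent pair satisfies OK
```

Swap the pick function for a seeded weighted choice and the same loop generates varied but always-consistent coastlines, road networks, or building rows.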
"Wave Function Collapse is Constraint Solving in the Wild" -- Karth and Smith (FDG 2017). DOI. An academic analysis of WFC that clarifies its relationship to constraint satisfaction and shows how to analyze and extend it. Relevant for understanding WFC's limitations (it can get stuck and need backtracking) and how to design tilesets that avoid these issues.
"Superposition Theorem and Its Implications for the Procedural Generation of Game Content" -- Sandhu et al. (2022). Explores using quantum-inspired superposition concepts for procedural content generation. While speculative, the mathematical framework for maintaining multiple possible states before "collapsing" to a final configuration is exactly how WFC works and could inform more sophisticated generation systems.
Real-Time Rendering Techniques
"Real-Time Rendering" -- Akenine-Möller, Haines, Hoffman (4th edition, 2018). The standard textbook. Chapter 19 (Acceleration Algorithms), Chapter 20 (Efficient Shading), and Chapter 21 (Virtual and Augmented Reality) are particularly relevant. The frustum culling, occlusion culling, and LOD algorithms described here are what Three.js, Babylon.js, and every game engine implement. Not a paper, but the definitive reference.
"A Survey on Baking Neural Radiance Fields for Real-Time View Synthesis" -- Reiser et al. (2023). DOI. Surveys methods to convert NeRFs into formats that render in real time (meshes, textures, sparse voxel grids). Directly relevant to our asset pipeline where server-side neural capture needs to produce browser-renderable output.
Gaussian splat editing and compositing -- various groups (2024-2025). Multiple recent papers explore editing, compositing, and dynamic scenes with Gaussian splats. Relevant because creator worlds need to composite multiple splat scenes (each creator's captured objects) into a single coherent scene. Methods for splat editing (recoloring, deformation, compositing) are an active research area.
"Efficient GPU Screen-Space Ray Tracing" -- McGuire and Mara (JCGT 2014). DOI. The paper behind screen-space reflections used in our water rendering section. Traces rays through the depth buffer for approximate reflections without the cost of full ray tracing. The "hierarchical tracing" variant (using a min-max depth mipmap) runs efficiently on WebGL 2.
"Simulating Ocean Water" -- Jerry Tessendorf (2001). PDF. The foundational paper on FFT-based ocean simulation. Describes the Phillips spectrum (statistical model of ocean waves) and how to transform it into a spatial displacement map via inverse FFT. Used by every major game with realistic ocean (Sea of Thieves, Assassin's Creed, Uncharted). The FFT computation maps cleanly to WebGPU compute shaders.
"Precomputed Atmospheric Scattering" -- Bruneton and Neyret (EGSR 2008). DOI. The paper behind physically accurate sky rendering. Precomputes atmospheric scattering into lookup tables that a fragment shader samples in real time. Produces correct sky colors, aerial perspective (distant objects look bluish/hazy), and sunset/sunrise colors from first principles. The precomputed tables are small (a few hundred KB) and the runtime shader is cheap. Both Three.js's Sky shader and Babylon.js's procedural sky are simplified versions of this approach.
"Ambient Occlusion Volumes" -- McGuire (HPG 2010) and "Scalable Ambient Obscurance" -- McGuire, Mara, Luebke (HPG 2012). PDF. The papers behind modern SSAO implementations. SAO is the most commonly implemented variant in browser 3D engines because it's efficient (one depth buffer sample per pixel) and produces plausible contact shadows. The algorithm samples the depth buffer around each pixel to estimate how "occluded" it is by nearby geometry. Both Three.js and Babylon.js implement SAO-derived SSAO.
Crowd Rendering and Animation
"GPU Crowd Rendering" -- Dudash (2007) and subsequent GDC/SIGGRAPH presentations on instanced crowd rendering. The core technique: bake skeletal animation frames into textures (Vertex Animation Textures), then render crowds as instanced meshes where each instance reads its bone transforms from the animation texture based on its current frame. This decouples animation evaluation from draw calls, allowing a single instanced draw call to render hundreds of uniquely animated characters.
"Position Based Dynamics" -- Müller et al. (2007). DOI. The foundational paper on PBD, which is how modern game engines simulate cloth, hair, and soft bodies. Rapier (our recommended Wasm physics engine) uses PBD-derived solvers. For avatar customization (capes, flowing hair, loose clothing), PBD provides responsive simulation at game frame rates. The Wasm implementation keeps it off the main JavaScript thread.
"FABRIK: A Fast, Iterative Solver for the Inverse Kinematics Problem" -- Aristidou and Lasenby (2011). DOI. The paper behind the IK solver recommended in our avatar section. FABRIK works by alternately reaching from end effector to root and root to end effector, converging in 3-5 iterations. It's faster than Jacobian-based IK, handles joint constraints naturally, and is simple to implement (about 50 lines of code for the basic solver). FABRIK-style solvers are common in the IK libraries used with Three.js and Babylon.js.
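The basic solver really is short. A 2D sketch of the backward/forward reaching passes (unconstrained joints; real avatar rigs add joint limits on top):

```typescript
type V2 = { x: number; y: number };
const sub = (a: V2, b: V2): V2 => ({ x: a.x - b.x, y: a.y - b.y });
const len = (a: V2): number => Math.hypot(a.x, a.y);

function fabrik(joints: V2[], target: V2, iterations = 10, tol = 1e-3): V2[] {
  const p = joints.map(j => ({ ...j }));
  const d = p.slice(1).map((j, i) => len(sub(j, p[i]))); // segment lengths
  const root = { ...p[0] };
  const reach = d.reduce((s, x) => s + x, 0);
  if (len(sub(target, root)) > reach) {
    // Target unreachable: stretch the chain straight toward it.
    for (let i = 1; i < p.length; i++) {
      const dir = sub(target, p[i - 1]);
      const l = len(dir) || 1;
      p[i] = { x: p[i - 1].x + (dir.x / l) * d[i - 1],
               y: p[i - 1].y + (dir.y / l) * d[i - 1] };
    }
    return p;
  }
  for (let it = 0; it < iterations; it++) {
    // Backward pass: pin the tip to the target, walk toward the root.
    p[p.length - 1] = { ...target };
    for (let i = p.length - 2; i >= 0; i--) {
      const dir = sub(p[i], p[i + 1]);
      const l = len(dir) || 1;
      p[i] = { x: p[i + 1].x + (dir.x / l) * d[i],
               y: p[i + 1].y + (dir.y / l) * d[i] };
    }
    // Forward pass: pin the root back to its anchor, walk toward the tip.
    p[0] = { ...root };
    for (let i = 1; i < p.length; i++) {
      const dir = sub(p[i], p[i - 1]);
      const l = len(dir) || 1;
      p[i] = { x: p[i - 1].x + (dir.x / l) * d[i - 1],
               y: p[i - 1].y + (dir.y / l) * d[i - 1] };
    }
    if (len(sub(p[p.length - 1], target)) < tol) break;
  }
  return p;
}

// Three-segment arm reaching for a nearby point:
const arm = fabrik([{ x: 0, y: 0 }, { x: 1, y: 0 }, { x: 2, y: 0 }, { x: 3, y: 0 }],
                   { x: 2, y: 1 });
console.log(len(sub(arm[3], { x: 2, y: 1 })) < 1e-3); // tip lands on the target
```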
Virtual Worlds and Collaborative Environments
"Massive Multiplayer Online Games: A Survey of the State-of-the-Art" -- Yahyavi and Kemme (2013). DOI. Comprehensive survey of MMO architecture covering client-server models, peer-to-peer approaches, interest management, consistency models, scalability techniques, and cheating prevention. The taxonomy of consistency models (strict, eventual, causal) maps to our CRDT-based approach (eventual consistency with causal ordering via vector clocks).
"A Distributed Architecture for Multiplayer Interactive Applications on the Internet" -- Diot and Gautier (1999). DOI. Early research on distributed virtual environments that identified the core tension: strict consistency requires coordination (adding latency), while weak consistency allows responsiveness but risks visible inconsistency. Later work on "local lag" (delaying local display by a small amount to give remote updates time to arrive) offers a middle ground. For world edits (placing objects), a 100-200ms local lag is imperceptible and gives the server time to validate.
"The Second Life Grid: The Architecture of a Near-Contemporary Open Source Virtual World" -- Linden Lab technical documentation and community reverse-engineering. While not a single paper, the technical analysis of Second Life's architecture is extensively documented. The key insights: each 256x256m region runs on a dedicated server instance. Objects are stored as a tree of "primitives" (basic shapes with transforms, textures, and scripts). The viewer streams object descriptions and textures on demand. This parcel-based model with per-object persistence is the closest existing architecture to what we're building, and Second Life's 20+ years of operation prove it scales.
Web Graphics and Browser Performance
"WebGPU: A High-Performance Graphics API for the Web" -- W3C GPU for the Web Working Group (2023-ongoing). Specification. The formal specification for WebGPU. Not a research paper, but the definitive technical document for browser GPU programming. The compute shader specification (Section 23) is particularly relevant for terrain generation, foliage scattering, and particle systems described throughout this guide.
"Bringing the Web up to Speed with WebAssembly" -- Haas et al. (PLDI 2017). DOI. The original WebAssembly paper from the browser vendors. Demonstrates that Wasm achieves within 2x of native performance for compute-intensive workloads. This validates our recommendation of Wasm-compiled physics engines (Rapier, Havok) for browser open worlds. The paper's performance analysis shows that the overhead comes primarily from bounds checking and indirect function calls, not from the compilation model itself.
"Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code" -- Jangda et al. (USENIX ATC 2019). PDF. A rigorous benchmark of Wasm vs native performance. Finds that Wasm runs 1.45x-1.55x slower than native C on average across the SPEC CPU benchmark suite. For game physics specifically (floating-point heavy, few system calls), the overhead is at the lower end (~1.3x). This confirms that Wasm physics in a browser is viable for real-time game workloads.
"Bringing the Web up to Speed with WebAssembly" -- Rossberg et al. (CACM 2018). DOI. The journal version of the WebAssembly design paper, covering its design rationale and formal semantics. Particularly relevant: the discussion of memory safety guarantees (Section 3) explains why Wasm modules can safely share a browser tab with JavaScript without the security risks of native plugins. This is what makes running physics engines (traditionally C++ libraries) safe in a browser.
AI-Powered Content Generation
Text-to-3D generation with 2D diffusion priors -- various groups (2023-2025). Multiple recent papers (DreamFusion, Magic3D, ProlificDreamer, MVDream, Zero-1-to-3++) explore generating 3D assets from text prompts by using 2D diffusion models as priors for 3D optimization. The quality has improved dramatically from early 2023 to 2025, going from blobby shapes to detailed, textured meshes. For our platform, these models (running server-side on GPUs) are the "AI generation" step in the creator asset pipeline.
"DreamFusion: Text-to-3D using 2D Diffusion" -- Poole et al. (ICLR 2023). Project page. The foundational paper on Score Distillation Sampling (SDS), which uses a pre-trained 2D diffusion model to guide 3D optimization. The key insight: you don't need 3D training data if you can evaluate whether rendered views of a 3D object match a text prompt using an existing 2D model. This opened the door to text-to-3D generation and is the basis for subsequent work (Magic3D, ProlificDreamer) that improved quality and speed.
"LRM: Large Reconstruction Model for Single Image to 3D" -- Hong et al. (ICLR 2024). Project page. Reconstructs a 3D model from a single image in 5 seconds on a single GPU. The model outputs a NeRF-like representation that can be converted to a mesh. For a creator world, this means a creator could take a photo of any real-world object and get a 3D model within seconds. The speed makes it viable as an interactive tool rather than a batch process.
"Procedural Content Generation via Machine Learning (PCGML)" -- Summerville et al. (2018). DOI. A survey of using machine learning for procedural content generation in games. Covers level generation, item generation, narrative generation, and world generation. Particularly relevant: the discussion of "controllable generation" where designers set high-level parameters and the ML model fills in details. This is the paradigm for AI-assisted world building: creators set intent ("make this area a spooky forest"), AI fills in the geometry, textures, and population.
How These Papers Connect to Our Architecture
The research maps to our architecture in layers:
Terrain pipeline: Perlin/simplex noise (Perlin 1985, 2001) generates the base heightmap. Hydraulic erosion (Mei et al. 2007) adds geological realism. Geometry clipmaps (Losasso and Hoppe 2004) or CDLOD (Strugar 2014) render the terrain efficiently in the browser. Atmospheric scattering (Bruneton and Neyret 2008) makes distant terrain look correct.
Asset pipeline: Text-to-3D (DreamFusion et al.) and image-to-3D (LRM) generate assets server-side. Gaussian splatting (Kerbl et al. 2023) enables photogrammetry capture. Neuralangelo (Li et al. 2023) extracts clean meshes from neural captures. All outputs are processed into browser-ready GLB/KTX2.
Rendering: SSAO (McGuire 2012) adds depth. Screen-space reflections (McGuire and Mara 2014) power water. FFT ocean (Tessendorf 2001) simulates water. Vertex animation textures (Dudash 2007) render crowds. FABRIK (Aristidou and Lasenby 2011) drives character IK.
Networking: Interest management (Boulanger et al. 2006) filters updates by spatial relevance. Dead reckoning (Pantel and Wolf 2002) reduces bandwidth. CRDTs (Shapiro et al. 2011) handle collaborative editing. Client prediction with server reconciliation (Jefferson 1985, Valve Source Networking) provides responsiveness.
World generation: WFC (Gumin 2016, Karth and Smith 2017) generates structures and layouts. PCGML (Summerville et al. 2018) provides the framework for AI-assisted generation where creators set intent and models fill details.
The research is mature. Most of these techniques have been used in shipped games for years. The innovation in bringing them to a browser isn't the algorithms. It's the engineering of making them work within browser memory, GPU, and network constraints, which is what the rest of this guide addresses.
Further Reading
- Web Games Tech Stack in 2026 covers WebGL, WebGPU, and WebAssembly fundamentals
- Web Game Engines Comparison compares engines for browser delivery
- Three.js + USDC Tech Report on loading 3D assets in the browser
- Frontier Open-Source Gen AI Models for the AI generation pipeline
- Co-op Game Design on multiplayer design patterns that keep players engaged