Why we built our own WebGPU engine instead of forking PlayCanvas

By Oleg Sidorkin

Every technical person who looks at Cinevva World asks the same thing within five minutes. You're a small team. Mature web 3D engines exist. Why did you write your own renderer, your own character physics, your own animation system, instead of standing on PlayCanvas or Babylon and shipping faster?

It's a fair question, and "not invented here" is the wrong answer to it. We did the homework. We ran the existing engines, read their source, and shipped spikes on top of two of them before deciding. This post is the honest version of that decision: what the off-the-shelf options are genuinely good at, the four places our needs diverged hard enough to justify building, and the situations where you should pick one of them over copying us. We documented the build itself in a long-running open-world-in-the-browser engineering series, so where a claim below has a working log behind it, I'll link to it.

The options we actually evaluated

Four things get called "web game engines," and they aren't the same kind of thing.

Three.js is a rendering library, not a game engine. It gives you a scene graph, materials, loaders, and a renderer, and then it gets out of your way. There's no editor, no physics, no entity system, no opinion about how your game is structured. That's the appeal and the cost. You build everything above the renderer yourself, but nothing fights you. It's MIT-licensed, it has the largest ecosystem in the space, and when you hit an obscure shader bug at 2am there's a forum answer waiting.

Babylon.js is a full engine with a scene graph, a physics integration (Havok), an asset pipeline, and a web editor. It's MIT-licensed, backed by a team at Microsoft, and its WebGPU work has been moving quickly. If you want batteries included and you're happy living inside the engine's structure, it's a strong default.

PlayCanvas is the closest thing to a "Unity for the web." As of this writing the engine is at v2.19.6, released June 5, 2026, and the engine itself is open source under MIT. The thing most people actually use, though, is the hosted visual editor, and that editor is a commercial product, not open source. PlayCanvas uses an entity-component system, you script in TypeScript or JavaScript, assets flow through a server-side GLB pipeline, and real commercial games have shipped on it (Snap runs production work on PlayCanvas, for one). Its renderer runs WebGL2 with a WebGPU path that is still maturing rather than the default. That last detail matters more than it sounds, and I'll come back to it.

Unity WebGL isn't a web engine at all. It's an export target. You build in the Unity desktop editor and compile to a WebGL bundle. It's the right tool when you already have a Unity game and want it in a browser, and the wrong tool when "loads instantly in a tab on a mid-range phone" is a hard requirement, because the runtime and download weight come along for the ride.

Any of these is a reasonable foundation for a normal web game. We didn't have a normal web game.

Decision one: WebGPU is our floor, not our finish line

The split that pushes us off every shelf is this. For us, WebGPU is a requirement, not a feature we'll grow into.

Our terrain isn't a static heightmap. It's a hybrid of a streamed heightmap field and marching-cubes chunks backed by a signed-distance field, so the world can have real caves and overhangs, and creators can sculpt it live. The sculpt brushes, the foliage scatter, and the terrain meshing all run as compute shaders. Take compute away and the world doesn't degrade, it doesn't run.

That's the opposite of where the general-purpose engines sit today. Their WebGPU support is designed as progressive enhancement on top of a WebGL2-first renderer, with a fallback path for browsers that lack it. PlayCanvas in particular is WebGL2-first with WebGPU still in beta. That's the correct call for them, because their job is to run the widest possible matrix of games on the widest possible matrix of devices. Our job is narrower and deeper, so we made the opposite call: we went WebGPU-only at spike 13 and never looked back. Browsers without WebGPU aren't downgraded, they're unsupported, and we track that as a reach number instead of pretending a WebGL2 fallback is one config flag away. It isn't. It would be a partial rewrite of our terrain and foliage stages.
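To make "unsupported, not downgraded" concrete, here is a minimal sketch of a hard WebGPU gate. It is illustrative rather than lifted from our codebase: the function name `checkWebGpuSupport`, the `Support` type, and the reason strings are hypothetical, and the small structural interfaces stand in for the real `navigator.gpu` typings so the snippet compiles on its own. The only real API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no adapter is available.

```typescript
// Minimal structural types (assumption: stand-ins for the real WebGPU typings).
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical result type: there is deliberately no "fallback" variant.
type Support =
  | { supported: true }
  | { supported: false; reason: "no-webgpu-api" | "no-adapter" };

// Hard gate: either WebGPU is usable or the browser is reported as unsupported.
async function checkWebGpuSupport(nav: NavigatorLike): Promise<Support> {
  if (!nav.gpu) {
    return { supported: false, reason: "no-webgpu-api" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (adapter === null) {
    return { supported: false, reason: "no-adapter" };
  }
  return { supported: true };
}
```

In a browser you would pass `navigator` itself. The two failure reasons are the kind of thing you could count to get a reach number, though how that is actually logged is outside this sketch.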

Building on a WebGL2-first engine would have meant either fighting its fallback assumptions on every compute feature, or maintaining two rendering paths forever. Owning the renderer let us treat compute as the baseline.

Decision two: a character solver, not a physics engine

The textbook move is to drop in a physics engine. We tried it. An early spike validated Rapier running in a worker, and it felt fine.

We wrote our own anyway, and there's no Rapier, Cannon, or Ammo anywhere in the shipped build. The reason is scope. We don't need rigid bodies, joints, ragdolls, or a constraint solver. We need one capsule to move correctly against terrain, and we need walking, sliding, gliding, climbing, and swimming to agree on a single shared answer to "am I grounded, on what surface, at what angle." A general physics engine makes that harder, not easier, because those modes end up fighting its internal springs and dampers.

So our character controller is a pluggable multi-channel state machine. Each mode is a small unit that says whether it wants control this frame and, if it wins, writes velocity and facing. They arbitrate by priority across three channels, resource then stance then locomotion, and they all read the same terrain query. Collision is a capsule probe against the heightmap or the signed-distance field depending on the chunk.
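The arbitration can be sketched roughly like this. It is a simplified illustration under stated assumptions, not the shipped controller: I'm assuming one winner per channel, channels visited in the order resource, stance, locomotion, and a numeric `priority` where higher wins. The `Mode`, `TerrainQuery`, and `Output` shapes and the `walk` and `swim` example modes are hypothetical.

```typescript
type Channel = "resource" | "stance" | "locomotion";
const CHANNEL_ORDER: Channel[] = ["resource", "stance", "locomotion"];

// Hypothetical shared terrain answer that every mode reads.
interface TerrainQuery {
  grounded: boolean;
  slopeDeg: number;
  submerged: boolean;
}

interface Output {
  velocity: [number, number, number];
  facing: number; // yaw in radians
}

interface Mode {
  name: string;
  channel: Channel;
  priority: number; // assumption: higher number wins within a channel
  wantsControl(q: TerrainQuery): boolean;
  apply(q: TerrainQuery, out: Output): void;
}

// For each channel in order, the highest-priority mode that wants control
// gets to write into the output. Returns the names of the winners.
function arbitrate(modes: Mode[], q: TerrainQuery, out: Output): string[] {
  const winners: string[] = [];
  for (const channel of CHANNEL_ORDER) {
    let best: Mode | undefined;
    for (const m of modes) {
      if (m.channel !== channel || !m.wantsControl(q)) continue;
      if (best === undefined || m.priority > best.priority) best = m;
    }
    if (best !== undefined) {
      best.apply(q, out);
      winners.push(best.name);
    }
  }
  return winners;
}

// Two example locomotion modes (speeds are made-up numbers).
const walk: Mode = {
  name: "walk", channel: "locomotion", priority: 0,
  wantsControl: (q) => q.grounded,
  apply: (_q, out) => { out.velocity = [0, 0, 4]; },
};
const swim: Mode = {
  name: "swim", channel: "locomotion", priority: 10,
  wantsControl: (q) => q.submerged,
  apply: (_q, out) => { out.velocity = [0, 0, 1.5]; },
};
```

The point of the shape is that `walk` and `swim` never negotiate with each other. They both read the same `TerrainQuery`, and the arbiter decides who writes.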

The detail I'm proudest of is unglamorous. Ground detection scans every surface in the vertical column beneath your feet and picks the highest one at or below the capsule, instead of trusting the SDF gradient. On the edge of an overhang the nearest surface sideways is the cliff face, so a gradient-based normal flickers between "floor" and "wall" and you get phantom sliding. The column query makes standing on a ledge boring, which is exactly what you want. An off-the-shelf heightfield collider would not have given us that for free, because it can't see our terrain in the first place. The terrain lives on the GPU in our format. No external engine can collide against it without us copying the whole field into its collider format on every edit.
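The selection rule at the heart of the column query is small enough to sketch. This is an illustration under assumptions, not the engine's code: `SurfaceHit`, `pickGround`, and the `tolerance` parameter are hypothetical names, and I'm assuming the column scan has already produced a list of surface heights beneath the character.

```typescript
// One surface crossing found in the vertical column (hypothetical shape).
interface SurfaceHit {
  height: number; // world-space Y of the surface
  label: string;  // for illustration only, e.g. "ledge" or "cave floor"
}

// Pick the highest surface at or below the capsule's bottom.
// `tolerance` (an assumption) lets a surface slightly above the feet still count.
function pickGround(
  column: SurfaceHit[],
  capsuleBottomY: number,
  tolerance = 0,
): SurfaceHit | undefined {
  let best: SurfaceHit | undefined;
  for (const hit of column) {
    // Surfaces above the feet are ceilings or overhangs, never ground.
    if (hit.height > capsuleBottomY + tolerance) continue;
    if (best === undefined || hit.height > best.height) best = hit;
  }
  return best;
}
```

Because the rule only compares heights within the column, the cliff face next to a ledge never enters the decision, which is why the answer stays stable at an overhang's edge.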

Decision three: animation that travels across rigs

Avatars use the Synty POLYGON rig, and our motion clips come from a large open animation library. Because the rig and the clips share bone names, the common case needs no retargeting at runtime: the tracks just bind by name. We wrote up the full character pipeline in universal characters.
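Binding by name amounts to a lookup from each clip track to a skeleton bone. The sketch below is illustrative and makes assumptions: `Track`, `bindTracks`, and the idea that a track carries a plain `bone` string are hypothetical, and real clip formats usually encode the target differently.

```typescript
// Hypothetical minimal track: just the bone name it animates.
interface Track {
  bone: string;
}

interface BindResult {
  bound: Map<string, number>; // track bone name -> index into the skeleton
  missing: string[];          // track bones the skeleton doesn't have
}

// Bind tracks to bones by exact name match.
function bindTracks(boneNames: string[], tracks: Track[]): BindResult {
  const indexByName = new Map<string, number>();
  boneNames.forEach((name, i) => indexByName.set(name, i));

  const bound = new Map<string, number>();
  const missing: string[] = [];
  for (const track of tracks) {
    const index = indexByName.get(track.bone);
    if (index === undefined) {
      missing.push(track.bone);
    } else {
      bound.set(track.bone, index);
    }
  }
  return { bound, missing };
}
```

A non-empty `missing` list is one cheap signal that a clip and a rig don't share a naming scheme and the clip needs a real retargeting pass.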

The interesting work is the case where they don't match. We built a retargeting pass that maps one skeleton onto another, and getting it right meant solving three specific problems. You align bind poses first, because an A-pose against a T-pose silently adds about thirty degrees per joint. You scale root motion and stride with two separate ratios, hip-to-floor for the vertical and overall proportion for the horizontal. And you test retargeted clips against their source at a fixed sample rate, so regressions show up in testing before a player sees them. That pipeline is what lets us add new animation sources, and eventually creator-supplied characters, without hand-fixing every clip.
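The two-ratio scaling is the easiest of the three to show. This sketch is an illustration under assumptions, not the pipeline itself: it assumes Y is up, it interprets "overall proportion" as the ratio of total rig heights, and `RigMetrics` and `retargetRootMotion` are hypothetical names.

```typescript
type Vec3 = [number, number, number];

// Hypothetical per-rig measurements taken in the bind pose.
interface RigMetrics {
  hipHeight: number;   // hip-to-floor distance
  totalHeight: number; // stand-in for "overall proportion" (an assumption)
}

// Scale a root-motion delta from the source rig to the target rig.
// Vertical (Y) uses the hip-to-floor ratio; horizontal (X, Z) uses the
// overall proportion ratio.
function retargetRootMotion(delta: Vec3, source: RigMetrics, target: RigMetrics): Vec3 {
  const vertical = target.hipHeight / source.hipHeight;
  const horizontal = target.totalHeight / source.totalHeight;
  return [delta[0] * horizontal, delta[1] * vertical, delta[2] * horizontal];
}
```

Using one ratio for everything is the tempting shortcut. It goes wrong for a character with short legs and a long torso: that character needs a different vertical scale than its overall height suggests, and a single ratio leaves the feet hovering above the ground or sunk into it.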

On top of that sits an animation state machine that picks the right clip family each frame from the motion state: jump variant by speed, landing severity by impact, slope direction from the dot of velocity against the terrain gradient, foot-phase-correct stops so you don't moonwalk to a halt, and a hundred-millisecond debounce so bumping a wall doesn't strobe you between walk and fall. None of this is exotic, but it's the kind of thing you only get by owning the layer.
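Two of those pieces are compact enough to sketch. As with the other snippets, this is illustrative and rests on assumptions: the terrain gradient is taken to be the horizontal gradient of height (so it points uphill), the `threshold` default is a made-up number, and `slopeDirection` and `DebouncedState` are hypothetical names. The hundred-millisecond hold comes from the text above.

```typescript
type Slope = "uphill" | "downhill" | "flat";

// Dot horizontal velocity against the height gradient (dh/dx, dh/dz).
// Positive means the character is gaining height, negative means losing it.
function slopeDirection(
  vx: number, vz: number,
  gx: number, gz: number,
  threshold = 0.05, // assumption: dead zone so flat ground doesn't flicker
): Slope {
  const climbRate = vx * gx + vz * gz;
  if (climbRate > threshold) return "uphill";
  if (climbRate < -threshold) return "downhill";
  return "flat";
}

// A state only changes after the new raw value has been seen
// continuously for `holdMs`.
class DebouncedState<T> {
  private current: T;
  private candidate: T;
  private since = 0;
  private readonly holdMs: number;

  constructor(initial: T, holdMs = 100) {
    this.current = initial;
    this.candidate = initial;
    this.holdMs = holdMs;
  }

  update(raw: T, nowMs: number): T {
    if (raw === this.current) {
      // Back to the committed state: drop any pending change.
      this.candidate = raw;
      return this.current;
    }
    if (raw !== this.candidate) {
      // A new contender: start its timer.
      this.candidate = raw;
      this.since = nowMs;
    }
    if (nowMs - this.since >= this.holdMs) {
      this.current = raw;
    }
    return this.current;
  }
}
```

With this shape, a single frame of "fall" caused by clipping a wall never reaches the animation layer, because the raw state flips back to "walk" before the hold time elapses.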

Decision four: the part no engine ships

The other three decisions are about how the world runs. This one is about what the product is, and it's why the comparison to PlayCanvas is ultimately a category error.

PlayCanvas, Babylon, Unity, and a hand-rolled Three.js app all assume the same shape: you build your game at a desk, in a 2D editor, looking at the world from outside, and then you hit play and ship a separate runtime. Even their in-engine editing is you-at-a-workstation manipulating a scene you're looking at.

Cinevva World inverts that. You create from inside the world, as an avatar standing in the same space your players will stand in, by describing what you want and watching it appear. Creation and play are one continuous session, not a build step bridged to a runtime. The closest reference points are Roblox, Rec Room, and Horizon Worlds, not web 3D engines, and even those keep most creation at a desk. The thing that makes it tractable without a mouse-driven editor is the AI builder, which turns "put a lantern-lit dock here" into geometry and placement.

You can't bolt that onto a general-purpose engine, because it isn't a rendering feature. It's the premise of the whole stack, from how terrain is editable at runtime to how the network treats every object as something a person standing nearby just made.

When you should not do what we did

Here's the part the "we built our own engine" genre usually skips. If you want to ship a browser game this quarter, do not copy us. Use PlayCanvas if you want a Unity-style editor and a managed pipeline. Use Babylon if you want a full engine with physics included and you're happy inside its structure. Use Three.js if you want maximum control and minimum opinion and you have the team to build above it. Port from Unity if you already have a Unity game.

Writing your own renderer, physics, and animation is only the right call when your product's premise is incompatible with the shelf, when the thing you're building isn't a game on an engine but a place that happens to be made of one. That was true for us. It probably isn't true for you, and that's fine. The point of doing the evaluation honestly is knowing which case you're in before you write the first line.