Avataar's Varya makes AI video 10x cheaper, and that's the real story

Avataar.ai unveiled Varya at an event in New Delhi today, with the Secretary of India's Ministry of Electronics and IT in the room. It's a 14-billion-parameter AI video model that the company puts in the same quality bracket as Alibaba's Wan 2.2 and Google's Veo 3, but the number that matters isn't a quality benchmark. It's the price. Varya generates video at roughly ₹0.48 per second, which the company says is up to ten times cheaper than the leading global models. It does that by running a distilled architecture that cuts generation from the usual 50 diffusion steps down to four, while keeping output quality close to the full-step models.

The model got there with help from the IndiaAI Mission, which provided subsidized access to national AI compute. It's deliberately trained for Indian contexts, so it represents regional diversity, festivals, food, clothing, and everyday environments better than models trained mostly on Western data. There's a live demo at varya.avataar.ai, and Avataar says it will release Varya as an open-weight model, with training data, on India's AIKosh portal, so developers can self-host or fine-tune it. A technical report on the architecture and distillation method is coming too.

Four steps instead of fifty

The technical move here is the whole point. Diffusion models generate by starting from noise and refining over many steps, and each step is a full pass through the network. Fifty steps means fifty passes. Distillation trains a faster student model to reproduce what the slow many-step teacher produces, but in a handful of steps. Four steps instead of fifty is 12.5 times fewer passes, which means more than a tenfold cut in compute per frame, and compute per frame is where almost all the cost of video generation lives.
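The step arithmetic fits in a few lines. This is a rough sketch, assuming every denoising pass costs the same and ignoring fixed overheads such as text encoding and decoding the final frames. The function name is illustrative and does not come from Avataar.

```python
def step_reduction(teacher_steps: int, student_steps: int) -> float:
    """Ratio of denoising passes, assuming each pass costs the same."""
    if teacher_steps <= 0 or student_steps <= 0:
        raise ValueError("step counts must be positive")
    return teacher_steps / student_steps


# Step counts quoted in the article: 50 for the usual sampler, 4 for Varya.
ratio = step_reduction(50, 4)
print(f"{ratio:.1f}x fewer passes")  # 12.5x fewer passes
```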

The wall-clock numbers make it concrete. On an NVIDIA H200, Avataar says Varya turns out a 5-second 720p clip in about 45 seconds. It clocks Wan 2.2 at around 1,230 seconds for the same job, roughly 27 times longer. That's not a small win at the margin. It's a different order of magnitude, and it's the difference between a model you can serve to a crowd and one you can only afford to run for a demo.
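To see what those timings imply for serving, here is a back-of-the-envelope sketch using only the figures Avataar quotes. It assumes one clip is generated at a time on a single GPU with no batching or queueing. The helper names are made up for illustration.

```python
CLIP_SECONDS = 5      # length of the generated clip
VARYA_WALL_S = 45     # Avataar's figure for Varya on an H200
WAN22_WALL_S = 1230   # Avataar's figure for Wan 2.2 on the same job


def speedup(slow_s: float, fast_s: float) -> float:
    """How many times faster the fast run is than the slow one."""
    return slow_s / fast_s


def gpu_seconds_per_video_second(wall_s: float, clip_s: float) -> float:
    """GPU time spent for each second of finished video."""
    return wall_s / clip_s


def clips_per_gpu_hour(wall_s: float) -> float:
    """Clips one GPU can produce in an hour, one at a time."""
    return 3600 / wall_s


print(f"speedup: {speedup(WAN22_WALL_S, VARYA_WALL_S):.1f}x")
print(f"Varya:   {gpu_seconds_per_video_second(VARYA_WALL_S, CLIP_SECONDS):.0f} GPU-s per video second, "
      f"{clips_per_gpu_hour(VARYA_WALL_S):.0f} clips per GPU-hour")
print(f"Wan 2.2: {gpu_seconds_per_video_second(WAN22_WALL_S, CLIP_SECONDS):.0f} GPU-s per video second, "
      f"{clips_per_gpu_hour(WAN22_WALL_S):.1f} clips per GPU-hour")
```

The wall-clock ratio comes out larger than the 12.5x step ratio. The announcement doesn't say what accounts for the rest, so the step count alone should not be read as the full explanation.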

This is the same trick that made fast image generation practical over the last two years. Watching it land on video at production scale, with a real per-second price attached, is the signal worth paying attention to.

Why cost is the 2026 video story

For two years the AI video conversation was about capability. Whose model does 4K. Whose model syncs audio. Whose model holds a character across shots. That race mostly resolved. Kling 3.0 shipped 4K with audio in March, LTX 2.3 did it open-source, and the frontier models are all close enough that the differences are arguments, not gaps.

What didn't resolve was the economics. We wrote about how Sora burned roughly a million dollars a day against two million in lifetime revenue before OpenAI shut it down. The capability was never the problem. The cost of serving it was. A model that's beautiful and unaffordable is a demo, not a product.

So the interesting frontier in 2026 isn't a model that does something new. It's a model that does the same thing for a tenth of the cost. Varya is one data point. The broader move across the whole field (distillation, fewer steps, smaller models tuned for specific contexts) is the thing that turns AI video from a money-losing showcase into something a small team can actually build on.

What it means for builders

If you're building anything that generates video as part of the product, not as a one-off, the per-second cost is your unit economics. It's the difference between a feature you can offer everyone and a feature you ration or charge a premium for. A 10x cost drop doesn't make your product 10x better. It makes a whole category of products that were underwater suddenly viable.
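A quick way to feel the difference is to run a workload through both prices. The sketch below uses the ₹0.48 per second figure from the announcement and takes "up to ten times cheaper" at its upper bound. The workload (10,000 users, 20 clips each, 5 seconds per clip) is a hypothetical chosen for illustration, not data from any real product.

```python
PRICE_INR_PER_SECOND = 0.48  # Varya figure quoted in the announcement
COST_MULTIPLE = 10           # "up to ten times cheaper", taken at its upper bound


def monthly_cost_inr(users: int, clips_per_user: int,
                     seconds_per_clip: int, price_per_second: float) -> float:
    """Total generation spend for a month, in rupees."""
    return users * clips_per_user * seconds_per_clip * price_per_second


# Hypothetical workload: 10,000 users, 20 five-second clips each per month.
cheap = monthly_cost_inr(10_000, 20, 5, PRICE_INR_PER_SECOND)
pricey = monthly_cost_inr(10_000, 20, 5, PRICE_INR_PER_SECOND * COST_MULTIPLE)

print(f"at Varya's price: ₹{cheap:,.0f} per month")
print(f"at 10x the price: ₹{pricey:,.0f} per month")
```

Under these assumptions the same feature costs about ₹4.8 lakh a month at one price and about ₹48 lakh at the other, which is the gap between a default-on feature and a rationed one.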

This is why we keep watching the cheap end of the curve more than the flashy end. At Cinevva, the generation tools in the platform only work as a business if the cost of a generation is low enough that creators can experiment freely without us bleeding money on every click. The lesson from Sora was that great output at the wrong cost ends the company. The lesson from Varya is that the cost is finally starting to move, fast, and the teams that build on the cheap-and-good models will outlast the ones chasing the expensive-and-spectacular ones.

The regional angle matters too. A model trained for Indian contexts and priced for Indian budgets is a reminder that the next wave of these tools won't all come from the same handful of US labs, and won't all be priced for US wallets. Cheaper, more local, more specific is a different competitive axis than bigger and more general, and it's one the incumbents are slow to defend.
