
Pika Agents launch real-time video chat with a face, voice, and memory

Pika Labs released Pika Agents today, two days after Sora went dark. The product is a sharp pivot from "generate a clip" to "generate a presence." Each agent has a face, a voice, a personality, and persistent memory that survives across Slack, Telegram, Discord, X, Notion, Figma, and Google Meet. Once you've shaped one, it sticks around and learns how you like ideas framed.

Figure: A walkthrough of what Pika Agents can do on day one

PikaStream 1.0: the engine under the hood

The real-time video chat is powered by PikaStream 1.0, a model Pika first previewed on April 2. It generates personalized 24 FPS video at 480p on a single H100, with end-to-end speech-to-video latency of around 1.5 seconds. The agent responds with lip-sync, facial expressions, and what Pika calls "emotionally appropriate" body language. Pricing is $0.20 per minute of streamed video.
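To put those figures in concrete terms, here is a minimal sketch that turns the quoted numbers (24 FPS, roughly 1.5 seconds of latency, $0.20 per minute) into a per-frame time budget and a per-call cost. The function names are ours and purely illustrative; nothing here touches a Pika API, and the 30-minute call is an assumed example.

```python
# Back-of-the-envelope numbers derived from the figures quoted above.
# Function names are illustrative, not part of any Pika API.

FPS = 24              # frames per second, as quoted
LATENCY_S = 1.5       # approximate end-to-end speech-to-video latency
PRICE_PER_MIN = 0.20  # USD per minute of streamed video


def frame_interval_ms(fps: float = FPS) -> float:
    """Time available to produce each frame, in milliseconds."""
    return 1000.0 / fps


def frames_in_latency(latency_s: float = LATENCY_S, fps: float = FPS) -> float:
    """How many frames' worth of playback time the quoted latency spans."""
    return latency_s * fps


def stream_cost_usd(minutes: float, price_per_min: float = PRICE_PER_MIN) -> float:
    """What a viewer-facing stream of the given length would bill."""
    return round(minutes * price_per_min, 2)


if __name__ == "__main__":
    print(f"frame budget: {frame_interval_ms():.1f} ms")
    print(f"latency in frames: {frames_in_latency():.0f}")
    print(f"30-minute call (assumed length): ${stream_cost_usd(30):.2f}")
```

At 24 FPS the model has a little under 42 ms per frame, and the 1.5-second delay is about 36 frames of playback. A half-hour call at the listed rate comes to $6.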

The integration model is the interesting part. Instead of building a standalone app, Pika ships skills. The Google Meet skill lets you invite your "AI Self" into a meeting as a participant. The agent joins the call, listens, and responds in video. Zoom and FaceTime are listed as next.

Memory that persists across surfaces

Pika is selling each agent as a single entity that follows you across tools. The agent you've trained in Discord remembers your context when you ask it something in Notion. It remembers the running joke from yesterday and the brief you gave it last week. For repeat creative work (a weekly newsletter cover, a recurring brand video, an agency client whose voice you keep getting wrong), the persistence is the actual product.
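Pika has not published how this memory is stored, so the following is only a conceptual sketch of the idea described above: memories belong to the agent rather than to the surface they were written on, so a lookup from one tool can return context captured in another. Every class, method, and surface name here is hypothetical.

```python
# Conceptual sketch only: NOT Pika's implementation or API.
# It illustrates memory keyed by agent rather than by surface.
from dataclasses import dataclass, field


@dataclass
class Memory:
    surface: str  # where the memory was captured, e.g. "discord"
    text: str     # what the agent was told


@dataclass
class AgentMemory:
    """Hypothetical store shared by every surface the agent lives on."""
    entries: list[Memory] = field(default_factory=list)

    def remember(self, surface: str, text: str) -> None:
        """Record something the agent learned on a given surface."""
        self.entries.append(Memory(surface, text))

    def recall(self, keyword: str) -> list[Memory]:
        """Search all memories, regardless of which surface wrote them."""
        needle = keyword.lower()
        return [m for m in self.entries if needle in m.text.lower()]


if __name__ == "__main__":
    agent = AgentMemory()
    agent.remember("discord", "Brief: the weekly newsletter cover uses a warm palette")
    agent.remember("slack", "Running joke: the intern's cat approves all storyboards")

    # A later question asked from Notion still finds the Discord brief.
    for hit in agent.recall("newsletter"):
        print(f"[{hit.surface}] {hit.text}")
```

The point of the design is that `recall` takes no surface argument at all: the surface is metadata on a memory, not a partition key.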

Early users have been training agents on specific recurring tasks: storyboard-to-animation, brief-to-ad-cut, ongoing character consistency across episodic content. The agent owns the rhythm. You direct.

Figure: Pika 2026 feature breakdown including Agents and PikaStream

The launch campaign is its own thing

Pika commissioned a launch film with production company Ceiling Train and director Josh Cohen. The film leans hard into the "Black Mirror" framing, with users "birthing" their AI Selves and letting them loose. It trended on X for two days straight. Whether that's a feature or a warning depends on which side of the AI agent debate you sit on.

Why this matters for game and creator workflows

For game studios, persistent video agents are an obvious fit for NPC prototyping, voice direction, and table-read sessions. Drop the agent into a Discord channel for the writing team and treat it as the in-canon voice of a character you're developing. For creator workflows, the agency-team-of-one play is real: one operator, three agents trained on three clients, all running in parallel across Slack and Meet.

The unit economics also look durable in a way Sora's didn't. At $0.20 per minute, a stream bills $12 per hour against the single H100 it occupies, which looks roughly margin-positive at current GPU prices. The agent personality data, not the model weights, is where the moat lives.
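The margin claim is easy to sanity-check with a few lines of arithmetic. The sketch below assumes one stream per H100, as the paragraph above does, and takes the GPU's hourly cost as an input. The $3.00 figure in the example is a placeholder we picked for illustration, not a quoted price; substitute whatever your provider charges.

```python
# Rough unit-economics check. The GPU hourly cost is an assumed input,
# not a figure from Pika or any cloud provider.

PRICE_PER_MIN = 0.20  # USD per streamed minute, as quoted in the article


def revenue_per_gpu_hour(utilization: float = 1.0,
                         streams_per_gpu: int = 1,
                         price_per_min: float = PRICE_PER_MIN) -> float:
    """Revenue one GPU earns per hour at a given fraction of billed time."""
    return price_per_min * 60 * streams_per_gpu * utilization


def break_even_utilization(gpu_cost_per_hour: float,
                           streams_per_gpu: int = 1,
                           price_per_min: float = PRICE_PER_MIN) -> float:
    """Fraction of each hour that must be billed to cover the GPU cost."""
    return gpu_cost_per_hour / (price_per_min * 60 * streams_per_gpu)


if __name__ == "__main__":
    assumed_gpu_cost = 3.00  # hypothetical USD per H100-hour
    print(f"revenue at full utilization: ${revenue_per_gpu_hour():.2f}/hour")
    print(f"break-even utilization: {break_even_utilization(assumed_gpu_cost):.0%}")
```

Under that placeholder cost, a GPU would need to be streaming about a quarter of the time to break even on hardware alone. This ignores bandwidth, idle capacity held for low latency, and everything else on the cost side.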

What we'd test first

If you're a small team, the highest-value experiment is probably the Google Meet skill. Train an agent on your product positioning, invite it to your next external pitch as a silent participant, then have it summarize the conversation back to you. It's the AI meeting note-taker pattern but with a face that the other side can see and react to.

For larger studios, the more interesting test is multi-agent. Two agents in the same channel, each trained on different parts of a brand, debating a creative choice while a human moderates. That's the workflow shape that wasn't possible before persistent memory across surfaces.
