LTX 2.3 brings open-source 4K video with synchronized audio
Lightricks has released LTX 2.3, an open-source video generation model that produces native 4K video at up to 50 FPS with synchronized stereo audio in a single generation pass. Lightricks bills it as the first open-source model to generate high-resolution video and audio together, and it runs on consumer hardware.
Official LTX 2.3 introduction from Lightricks
What's in the model
LTX 2.3 is a 22-billion-parameter DiT (Diffusion Transformer) that handles video and audio generation as a unified task. It supports text-to-video, image-to-video, audio-to-video, video-to-video, and depth conditioning. You give it a text prompt and get back a video clip with matching sound. No separate audio generation step, no post-syncing.
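The five conditioning modes can be pictured as different combinations of inputs. The sketch below is a hypothetical request validator: the `Mode` enum, the `REQUIRED_INPUTS` table, and `validate_request` are illustrative names, not part of any Lightricks API, and it assumes every mode takes a text prompt plus at most one extra conditioning input.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical names for the conditioning modes described above.
    TEXT_TO_VIDEO = "t2v"
    IMAGE_TO_VIDEO = "i2v"
    AUDIO_TO_VIDEO = "a2v"
    VIDEO_TO_VIDEO = "v2v"
    DEPTH = "depth"


# Assumption: each mode needs a prompt plus at most one extra conditioning input.
REQUIRED_INPUTS = {
    Mode.TEXT_TO_VIDEO: {"prompt"},
    Mode.IMAGE_TO_VIDEO: {"prompt", "image"},
    Mode.AUDIO_TO_VIDEO: {"prompt", "audio"},
    Mode.VIDEO_TO_VIDEO: {"prompt", "video"},
    Mode.DEPTH: {"prompt", "depth"},
}


def validate_request(mode: Mode, inputs: dict) -> list[str]:
    """Return the sorted names of required inputs that are missing (empty = OK)."""
    provided = {name for name, value in inputs.items() if value is not None}
    return sorted(REQUIRED_INPUTS[mode] - provided)
```

For example, `validate_request(Mode.IMAGE_TO_VIDEO, {"prompt": "a knight at dusk"})` returns `["image"]`, flagging the missing reference image before any generation is attempted.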
The model generates at 24, 25, 48, or 50 FPS in clips up to 20 seconds. It handles both horizontal and native portrait (9:16) aspect ratios, which matters for anyone targeting social media formats. Audio output is 24 kHz stereo.
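The output limits above translate directly into frame and sample counts. The following is a minimal sketch, assuming "4K" means 3840×2160 UHD and that frame count is simply frame rate times duration; `ClipSpec` is a hypothetical helper, not a Lightricks type, and real pipelines may round frame counts to a model-specific multiple.

```python
from dataclasses import dataclass

SUPPORTED_FPS = (24, 25, 48, 50)  # frame rates listed in the article
MAX_SECONDS = 20                  # maximum clip length listed in the article
AUDIO_SAMPLE_RATE = 24_000        # 24 kHz
AUDIO_CHANNELS = 2                # stereo


@dataclass(frozen=True)
class ClipSpec:
    fps: int
    seconds: float
    portrait: bool = False

    def __post_init__(self):
        # Reject settings outside the limits quoted in the article.
        if self.fps not in SUPPORTED_FPS:
            raise ValueError(f"unsupported frame rate: {self.fps}")
        if not 0 < self.seconds <= MAX_SECONDS:
            raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")

    @property
    def frame_count(self) -> int:
        # Assumption: frames = fps * duration, rounded to a whole frame.
        return round(self.fps * self.seconds)

    @property
    def resolution(self) -> tuple[int, int]:
        # Assumption: 4K is 3840x2160; portrait (9:16) swaps width and height.
        return (2160, 3840) if self.portrait else (3840, 2160)

    @property
    def audio_samples(self) -> int:
        # Total samples across both stereo channels.
        return round(AUDIO_SAMPLE_RATE * self.seconds) * AUDIO_CHANNELS
```

For a maximum-length clip, `ClipSpec(fps=50, seconds=20)` gives 1,000 video frames and 960,000 audio samples (480,000 per channel).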
What changed from LTX 2
The biggest improvements over the previous version are sharper details and better motion. LTX 2 tended to produce somewhat static visuals. LTX 2.3 addresses this with a rebuilt VAE (variational autoencoder), which Lightricks says yields crisper textures and more natural movement.
Prompt understanding got a significant upgrade too. The text connector is now 4x larger with added gated attention, which means the model more actively references different parts of your prompt throughout the generation process. If you describe a specific camera movement while a character performs an action, it's better at maintaining both simultaneously.
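To make "gated attention" concrete, here is a generic sketch of gated cross-attention, in which a video-side query attends over text-token keys and values and a sigmoid gate computed from the query scales each output channel. This is not Lightricks' text connector; the single query, tiny dimensions, and per-channel gate are assumptions chosen for illustration.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_cross_attention(query, keys, values, gate_weights, gate_bias):
    """One query (e.g. a video token) attends over text-token keys/values.

    The attention output is scaled element-wise by a sigmoid gate computed
    from the query, so each channel of the text signal can be passed through
    or suppressed. Returns (gated output, attention weights).
    """
    d = len(query)
    # Scaled dot-product scores against every text token.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Plain attention output: weighted sum of the value vectors.
    attended = [
        sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))
    ]
    # One gate per output channel: sigmoid(row . query + bias).
    gates = [
        sigmoid(sum(g * q for g, q in zip(row, query)) + b)
        for row, b in zip(gate_weights, gate_bias)
    ]
    return [g * a for g, a in zip(gates, attended)], weights
```

With gates near 1 the layer behaves like ordinary cross-attention; with gates near 0 it mutes the text signal for that token. A learned gate is one way a model can decide, position by position, how strongly to lean on the prompt.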
Other improvements include cleaner audio with fewer artifacts, motion control features like first/last frame guidance, and camera effects including dolly, jib, and focus shift.
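One way to picture first/last frame guidance is as a schedule that pins specific frame indices to reference images and leaves every other frame for the model to fill in. The helper below is hypothetical (`keyframe_schedule` is not a Lightricks function) and assumes frame count is frame rate times duration.

```python
def keyframe_schedule(fps, seconds, first_image=None, last_image=None):
    """Map frame indices to conditioning images for first/last-frame guidance.

    Hypothetical helper: frames absent from the returned dict are generated
    freely. Returns (total frame count, {frame_index: image}).
    """
    n_frames = round(fps * seconds)
    if n_frames < 2:
        raise ValueError("need at least two frames to pin both ends")
    schedule = {}
    if first_image is not None:
        schedule[0] = first_image
    if last_image is not None:
        schedule[n_frames - 1] = last_image
    return n_frames, schedule
```

For a 5-second clip at 24 FPS with both ends pinned, the schedule covers frames 0 and 119 out of 120, and the model interpolates the motion in between.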
Hands-on look at what LTX 2.3 can produce
Performance
Lightricks claims the model runs 18x faster than competing models on H100 GPUs. The company also says that on consumer-grade hardware it can produce a 5-second clip at 24 FPS in about 4 seconds. That's fast enough for iterative work where you're generating multiple takes and picking the best one.
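Taking the consumer-hardware figure at face value, the arithmetic is easy to check. This sketch only restates the quoted numbers; it does not measure anything, and `realtime_factor` is an illustrative helper.

```python
def realtime_factor(clip_seconds, fps, wall_seconds):
    """Return (frames generated, frames per wall-clock second, real-time factor).

    A real-time factor above 1.0 means the clip is produced faster than it plays.
    """
    frames = round(clip_seconds * fps)
    return frames, frames / wall_seconds, clip_seconds / wall_seconds


frames, fps_out, rtf = realtime_factor(clip_seconds=5, fps=24, wall_seconds=4)
print(frames, fps_out, rtf)  # 120 30.0 1.25
```

A 5-second, 24 FPS clip is 120 frames; producing it in 4 seconds means 30 frames per second of wall-clock time, or 1.25x faster than real time.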
Why this matters for game development
Open weights mean you can run LTX 2.3 locally without API costs or usage limits. For game developers, that opens up several use cases: generating cutscene prototypes, creating trailer footage during pre-production, producing marketing materials, or rapid-prototyping cinematic sequences before committing to full production.
The combined video-plus-audio output is particularly useful for game trailers and cinematics where you'd otherwise need to sync audio separately. Being able to iterate on visual and audio tone simultaneously, in a single generation pass, compresses the feedback loop considerably.
The model is free to use for organizations with less than $10M in annual revenue. It's available through the Lightricks API Playground, ComfyUI, PyTorch, and third-party platforms like Replicate.