LTX-2.3: Sharper Video, Native Portrait, and Cleaner Audio in Lightricks’ Latest Open-Source Model

Lightricks has released LTX-2.3, the latest update to its open-source DiT-based audio-video generation model, bringing a rebuilt VAE for sharper output, native portrait video support, cleaner audio, and improved prompt adherence, all under the Apache 2.0 license.


What’s New in LTX-2.3

Released on March 5, 2026, LTX-2.3 is a significant refinement of the LTX-2 foundation model — a 22-billion-parameter Diffusion Transformer (DiT) that generates synchronized video and audio from a single architecture. The update focuses on four key areas:

  • Rebuilt VAE: A completely redesigned variational autoencoder trained on higher-quality data produces sharper fine details, more realistic textures, and cleaner edges across all resolutions. Previous versions were noted for “softer than desired” output, particularly with hair and edge detail — LTX-2.3 addresses this directly.
  • Better prompt understanding: An upgraded gated-attention text connector bridges prompt encoding and generation more faithfully. Complex descriptions of timing, motion, and expression now translate more accurately into the output (a generic sketch of the gating pattern follows this list).
  • Native portrait video: For the first time, LTX supports vertical 1080×1920 (9:16) video trained on native portrait data — not cropped landscape footage.
  • Cleaner audio: A new vocoder with filtered training data removes silence gaps, noise artifacts, and random sounds. Audio alignment with visuals is tighter across both text-to-video and audio-to-video pipelines.
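Lightricks has not published the connector's internals in this announcement, but gated attention is a well-established pattern: cross-attention output is scaled by a learned gate before being added back to the main stream. The PyTorch sketch below shows the generic pattern only, not the model's actual code; all names here are illustrative.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Generic gated cross-attention connector (illustrative, not
    Lightricks' implementation): video tokens attend to text-encoder
    tokens, and a learned gate controls how much text signal is
    injected back into the video stream."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # tanh(0) = 0, so the connector starts out as an identity map
        # and learns how much text conditioning to admit during training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, video_tokens: torch.Tensor, text_tokens: torch.Tensor):
        # Queries come from the video stream; keys/values from the prompt.
        attended, _ = self.attn(self.norm(video_tokens), text_tokens, text_tokens)
        return video_tokens + torch.tanh(self.gate) * attended
```

The zero-initialized gate is the detail that matters in practice: it lets a connector like this be attached to a pretrained backbone without disturbing its behavior at the start of training.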

Technical Specifications

LTX-2.3 ships in two variants: a full dev checkpoint and a distilled version optimized for faster inference (8 denoising steps for stage 1, 4 for stage 2). Both are 22B parameters. Key capabilities include the following, with a usage sketch after the list:

  • Resolution: Up to 4K (2160p)
  • Duration: Up to 20 seconds per generation, extendable via a dedicated endpoint
  • Frame rates: 24 or 48 FPS
  • Pipelines: Text-to-video, image-to-video, audio-to-video, video-to-video, extend-video, retake-video, and keyframe interpolation
  • Controls: Multiple LoRAs for pose, camera motion, inpainting, depth conditioning, and region-based regeneration
  • Optimization: FP8 quantization for reduced VRAM, xFormers and Flash Attention 3 support
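For a sense of how the distilled checkpoint might be driven from Python, here is a minimal sketch assuming a diffusers-style integration like the LTXPipeline class that earlier LTX-Video releases shipped with; the repo id is a placeholder, and the exact class name and argument set for LTX-2.3 may differ.

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Placeholder repo id -- check the Lightricks org on Hugging Face
# for the actual LTX-2.3 checkpoint names.
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-2.3-distilled",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Native portrait output. Earlier LTX pipelines required dimensions
# divisible by 32, so 1080 is rounded up to 1088 here.
frames = pipe(
    prompt="A violinist busking at dusk while the camera slowly orbits her",
    width=1088,
    height=1920,
    num_frames=121,          # roughly 5 seconds at 24 FPS
    num_inference_steps=8,   # the distilled stage-1 step count from the specs
).frames[0]

export_to_video(frames, "portrait_clip.mp4", fps=24)
```

Note that this single call collapses the schedule into one stage; a faithful two-stage run would follow the 8-step base pass with the 4-step second stage.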

Image-to-video generation also received targeted improvements: the reworked training pipeline reduces the common “Ken Burns” effect — where generated videos produce slow pans or freeze instead of genuine motion.
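Under the same assumptions as the sketch above, an image-to-video call might look like this, using the LTXImageToVideoPipeline class from the earlier diffusers integration; a motion-explicit prompt is the usual first defense against the static-pan failure mode described above.

```python
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-2.3-dev",  # placeholder repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("portrait_photo.png")  # any local path or URL works

# Spelling out concrete subject motion discourages the model from
# falling back to a slow pan over an essentially frozen frame.
frames = pipe(
    image=image,
    prompt="The subject turns toward the camera, smiles, and waves",
    width=768,
    height=1152,
    num_frames=121,
    num_inference_steps=40,
).frames[0]

export_to_video(frames, "i2v_clip.mp4", fps=24)
```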

Availability and Ecosystem

LTX-2.3 weights are available on Hugging Face under the Apache 2.0 license, including the base dev checkpoint, FP8 quantized variant, and distilled model. ComfyUI has day-0 support with stable reference workflows, and Lightricks is also shipping LTX Desktop Beta — a free, open-source desktop application.
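Pulling the weights locally is an ordinary Hugging Face download; the repo id below is a placeholder, since the announcement does not spell out the exact repository names.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- check the Lightricks org on Hugging Face for
# the actual dev / FP8 / distilled repository names.
local_dir = snapshot_download(
    repo_id="Lightricks/LTX-2.3-dev",
    allow_patterns=["*.safetensors", "*.json"],  # skip auxiliary files
)
print("Weights downloaded to:", local_dir)
```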

For cloud users, fal.ai hosts all seven endpoints, with pricing starting at $0.04/second for fast 1080p text-to-video and scaling to $0.24/second for 4K output. At those rates, a maximum-length 20-second clip costs $0.80 at the fast 1080p tier and $4.80 in 4K.

What This Means

LTX-2.3 continues to push the boundary of what open-source video generation can deliver. With native 4K, synchronized audio, portrait support, and a full LoRA fine-tuning framework available under a permissive license, it offers a compelling alternative to closed models. The combination of a rebuilt VAE and improved prompt adherence addresses two of the most common pain points in AI video generation — soft output and prompt drift. For researchers and developers building video generation pipelines, the Apache 2.0 licensing and comprehensive tooling (ComfyUI workflows, control LoRAs, desktop app) lower the barrier to adoption significantly.
