Krea 2: A From-Scratch Foundation Image Model With Style Control

Krea released Krea 2 on May 12, 2026. It is the company’s first foundation image model built entirely from scratch, with a focus on aesthetics, style control, and creative direction. Built on a 12-billion-parameter Diffusion Transformer, Krea 2 ships as a family of variants (Medium, Large, and a 2-second Turbo), and two checkpoints, Raw and Turbo, are released as open weights under a custom license. It ranks among the top 10 models on the Artificial Analysis text-to-image leaderboard.
A Foundation Model Built From Scratch
Unlike Krea’s earlier work, which fine-tuned existing open models, Krea 2 is a foundation model trained from the ground up. The team frames its core goal as moving past the homogeneous “AI look” (the over-smoothed, plasticky aesthetic that betrays a synthetic image) toward genuine visual taste. The model is designed to span grainy film photography, clean studio shots, cinematic stills, illustrations, and digital paintings, and it relies on style control rather than ever-longer prompts: creators can pass reference images and moodboards, combine multiple style references with adjustable influence strength, and tune cohesiveness across a batch of images.
Architecture
Krea 2 uses a single-stream Diffusion Transformer (DiT) scaled to 12 billion parameters. The design borrows several components from recent transformer research:
- Attention: Grouped-Query Attention (GQA) with gated sigmoid attention and QK-Norm.
- MLP: SwiGLU layers at 4× expansion.
- Normalization: zero-centered RMSNorm.
- Positional encoding: 3D axial RoPE.
- Text encoder: Qwen 3 VL with multilayer feature aggregation.
- Autoencoders: Qwen Image VAE and FLUX 2 VAE.
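The article names 3D axial RoPE without further detail. A minimal sketch of the general technique, under stated assumptions: each query or key vector is split into three chunks, one per position axis (assumed here to be an image or frame index, a row, and a column, as in FLUX-style models), and each chunk receives ordinary 1D rotary embedding for its own coordinate. The axis order, the `axes_dims` split, the base `10000.0`, and all function names are hypothetical, not Krea 2’s published configuration.

```python
import math
from typing import List, Sequence

def rope_1d(x: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Standard 1D RoPE: rotate each pair (x[i], x[i+1]) by pos * base**(-i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even number of dimensions")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def axial_rope_3d(x: Sequence[float], pos: Sequence[float],
                  axes_dims: Sequence[int]) -> List[float]:
    """Apply 1D RoPE separately to consecutive chunks of x, one chunk per axis.

    pos: the token's (index, row, col) coordinates (hypothetical axis order).
    axes_dims: how many dimensions of x each axis gets (hypothetical split).
    """
    if sum(axes_dims) != len(x) or len(pos) != len(axes_dims):
        raise ValueError("axes_dims must partition x, one entry per coordinate")
    out: List[float] = []
    start = 0
    for p, d in zip(pos, axes_dims):
        out += rope_1d(x[start:start + d], p)
        start += d
    return out

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

# Usage: a 16-dim attention head split 4/6/6 across (index, row, col).
dims = (4, 6, 6)
q = [0.1 * i for i in range(16)]
k = [1.0 - 0.05 * i for i in range(16)]
# Both query/key pairs sit at the same per-axis offset (0, 1, 2).
s1 = dot(axial_rope_3d(q, (0, 2, 3), dims), axial_rope_3d(k, (0, 1, 1), dims))
s2 = dot(axial_rope_3d(q, (0, 5, 7), dims), axial_rope_3d(k, (0, 4, 5), dims))
print(abs(s1 - s2) < 1e-9)  # True: the score depends only on relative position
```

Because each chunk is rotated by an angle proportional to its own coordinate, the query-key dot product depends only on the per-axis offsets between two tokens, which is what lets attention reason about relative 2D position (plus the index axis) without learned position tables.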
A notable efficiency trick: “lightweight timestep modulation” replaces the per-block MLPs typically used for timestep conditioning, which the team says cuts the parameter count by 20–30% with no loss in quality.
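Rough arithmetic shows why per-block conditioning layers are worth cutting. The sketch counts weights for a generic DiT block (Q/K/V/O projections with GQA, plus a SwiGLU MLP at 4× expansion) and compares two ways of injecting the timestep: a DiT-style adaLN layer, `Linear(d, 6d)` emitting shift/scale/gate vectors for attention and MLP, in every block, versus a PixArt-α-style “adaLN-single” variant with one shared modulation layer plus a small learned offset table per block. Krea has not described its scheme, so adaLN-single is a swapped-in stand-in, and the width, head counts, and depth below are made-up values, not Krea 2’s.

```python
from dataclasses import dataclass

@dataclass
class BlockDims:
    d: int               # hidden width (hypothetical)
    n_heads: int         # query heads
    n_kv_heads: int      # key/value heads shared across query groups (GQA)
    mlp_ratio: int = 4   # SwiGLU expansion, per the article

def core_block_params(b: BlockDims) -> int:
    """Attention + SwiGLU weight count for one block (biases and norms ignored)."""
    head_dim = b.d // b.n_heads
    kv_dim = b.n_kv_heads * head_dim
    attn = 2 * b.d * b.d + 2 * b.d * kv_dim   # Q and O are d x d; K and V are d x kv_dim
    mlp = 3 * b.d * (b.mlp_ratio * b.d)       # gate, up, and down projections
    return attn + mlp

def modulation_layer_params(b: BlockDims, n_mod: int = 6) -> int:
    """One Linear(d, n_mod * d) with bias, emitting the shift/scale/gate vectors."""
    return b.d * n_mod * b.d + n_mod * b.d

def model_params(b: BlockDims, depth: int, lightweight: bool, n_mod: int = 6) -> int:
    core = depth * core_block_params(b)
    if lightweight:
        # adaLN-single style: one modulation layer shared by the whole network,
        # plus a learned n_mod x d offset table in each block.
        return core + modulation_layer_params(b, n_mod) + depth * n_mod * b.d
    # DiT-style adaLN: every block owns its own modulation layer.
    return core + depth * modulation_layer_params(b, n_mod)

dims = BlockDims(d=3072, n_heads=24, n_kv_heads=8)   # made-up sizes
full = model_params(dims, depth=40, lightweight=False)
light = model_params(dims, depth=40, lightweight=True)
print(f"{full / 1e9:.2f}B -> {light / 1e9:.2f}B params ({1 - light / full:.1%} fewer)")
```

With these hypothetical sizes the per-block modulation layers account for a little over a quarter of all weights, because a d-to-6d projection costs about 6d², which is half the 12d² of a 4× SwiGLU MLP. Arithmetic like this can only illustrate the parameter saving; the “no quality loss” part rests on Krea’s own evaluation.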
Training Pipeline
Krea 2 is trained in a multi-stage pipeline:
- Pretraining with progressive resolution scaling through 256px, 512px, and 1024px stages.
- Midtraining for broad stylistic coverage.
- Supervised fine-tuning (SFT) on curated high-quality data.
- Preference optimization (STPO) with human annotations.
- Reinforcement learning using multi-reward GRPO.
- Timestep distillation via Trajectory Distribution Matching (TDM) — the technique behind the Turbo variant.
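For the multi-reward GRPO stage, the defining idea of GRPO (Group Relative Policy Optimization) is that it needs no learned value model: several samples are drawn for the same prompt, and each sample’s advantage is its reward standardized against its own group’s mean and standard deviation. The sketch below computes only those advantages and omits the clipped policy-gradient update. Krea has not disclosed how it combines its rewards, so the weighted sum, the reward names, and the weights are hypothetical; normalizing each reward separately before summing is an equally plausible variant.

```python
import statistics
from typing import Dict, List, Sequence

def combine_rewards(scores: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Weighted sum of several reward models' scores, per sample (an assumption)."""
    n = len(next(iter(scores.values())))
    return [sum(w * scores[name][i] for name, w in weights.items()) for i in range(n)]

def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Usage: four images sampled for one prompt, scored by two hypothetical reward models.
scores = {
    "aesthetic": [0.62, 0.71, 0.55, 0.80],
    "prompt_match": [0.90, 0.40, 0.85, 0.75],
}
weights = {"aesthetic": 0.5, "prompt_match": 0.5}
advantages = group_relative_advantages(combine_rewards(scores, weights))
print([round(a, 2) for a in advantages])
```

Samples that beat their siblings get positive advantages and are reinforced; the rest are pushed down. Because the baseline is the group itself, no separate critic network has to be trained alongside the image model.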
The team reports using 8-bit training at low and medium resolutions for a 15–20% gain in training speed.
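Token arithmetic makes the cost gap between the 256px, 512px, and 1024px stages concrete. The sketch assumes each transformer token covers a 16×16 pixel patch (for example, an 8× VAE downsample followed by 2×2 patchification). That factor is an assumption, but the ratios hold for any fixed factor: each doubling of resolution quadruples the token count and multiplies pairwise attention work by 16.

```python
def latent_tokens(pixels: int, pixels_per_token: int = 16) -> int:
    """Tokens for a square image, assuming each token covers a
    pixels_per_token x pixels_per_token patch (hypothetical factor)."""
    side = pixels // pixels_per_token
    return side * side

base = latent_tokens(256)
for res in (256, 512, 1024):
    n = latent_tokens(res)
    # Self-attention compares every token with every other, so its cost grows with n**2.
    print(f"{res}px: {n} tokens, attention cost x{n * n // (base * base)} vs 256px")
```

This prints 256, 1024, and 4096 tokens, with attention cost ×1, ×16, and ×256 relative to 256px. MLP and projection costs grow only linearly with token count, so the real per-step gap is smaller than the attention ratio alone, but the early stages still let the model see far more images per unit of compute.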
Variants, Speed, and Open Weights
Krea 2 ships in several sizes. Medium is the smaller, faster, more cost-efficient variant and is strong on illustration, anime, and painterly styles. Large is more than twice the size of Medium, with particular strength in photorealism and “raw” aesthetics such as motion blur, grain, and low dynamic range. The distilled Turbo generates an image in roughly 2 seconds, which Krea says places it among the fastest models available, open or proprietary. Krea open-sourced two checkpoints, K2 Raw and K2 Turbo, taken at different milestones in training and released as open weights under a custom license.
What This Means
Krea 2 lands in an increasingly crowded open-weights image generation space, but its angle is distinct. Where many recent models compete on prompt adherence or raw parameter efficiency, Krea 2 bets on aesthetic range and style controllability as its differentiator, treating style as “something you can guide, mix, strengthen, reduce, and push.” Releasing both a Raw checkpoint (closer to base training and more steerable) and a distilled Turbo checkpoint gives researchers and product teams flexibility: the Raw weights for experimentation and fine-tuning, the Turbo weights for low-latency production. A top-10 finish on the Artificial Analysis leaderboard, and second place among independent labs, suggests the aesthetics-first approach does not come at the cost of overall quality.
Related Coverage
- Google Launches Nano Banana 2: Pro-Quality Image Generation at Flash Speed (another model chasing speed plus quality, from a proprietary lab).
- Z-Image: Alibaba’s Efficient 6B Open-Source Image Generation Model (a contrasting take on open-weights efficiency).
- Introducing FLUX.1 Kontext: Black Forest Labs’ Breakthrough in AI Image Editing (background on the FLUX line, whose FLUX 2 VAE Krea 2 uses as one of its autoencoders).




