Krea 2: A From-Scratch Foundation Image Model With Style Control

Krea released Krea 2 on May 12, 2026. It is the company's first foundation image model built entirely from scratch, with a focus on aesthetics, style control, and creative direction. Built on a 12-billion-parameter Diffusion Transformer, Krea 2 ships as a family of variants (Medium, Large, and a Turbo that generates an image in about 2 seconds). Two checkpoints, Raw and Turbo, are released as open weights under a custom license. The model ranks among the top 10 on the Artificial Analysis text-to-image leaderboard.


Krea 2 hero image showing a range of aesthetic styles produced by the model, from film photography to cinematic stills and digital paintings
Image credit: Krea 2 Technical Report

A Foundation Model Built From Scratch

Unlike Krea’s earlier work, which fine-tuned existing open models, Krea 2 is a foundation model trained from the ground up. The team frames its core goal as moving past the homogeneous “AI look” (the over-smoothed, plasticky aesthetic that betrays a synthetic image) toward genuine visual taste. The model is designed to span grainy film photography, clean studio shots, cinematic stills, illustrations, and digital paintings, and it leans on style control rather than ever-longer prompts. Creators can pass reference images and moodboards, combine multiple style references with adjustable influence strength, and tune cohesiveness across a batch.
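
Krea has not published how multiple style references are combined, so the following is only a conceptual sketch of what "adjustable influence strength" could mean. It assumes each reference has already been encoded into a fixed-length style vector and blends them by a normalized weighted average. The function name `blend_styles`, the vectors, and the strengths are all hypothetical and are not part of any Krea API.

```python
import math

def blend_styles(embeddings, strengths):
    """Blend hypothetical style vectors by a normalized weighted average.

    embeddings: list of equal-length lists of floats (assumed style encodings)
    strengths:  one non-negative influence value per reference (assumed knob)
    """
    if not embeddings or len(embeddings) != len(strengths):
        raise ValueError("need exactly one strength per style reference")
    total = sum(strengths)
    if total <= 0:
        raise ValueError("at least one strength must be positive")

    dim = len(embeddings[0])
    mixed = [0.0] * dim
    for vec, strength in zip(embeddings, strengths):
        share = strength / total  # relative influence of this reference
        for i in range(dim):
            mixed[i] += share * vec[i]

    # Rescale to unit length so the blend has a comparable magnitude
    # regardless of how many references were mixed.
    norm = math.sqrt(sum(v * v for v in mixed))
    return [v / norm for v in mixed] if norm > 0 else mixed

# Toy 3-dimensional "style" vectors, purely illustrative.
film_grain = [1.0, 0.0, 0.0]
watercolor = [0.0, 1.0, 0.0]

# Film grain is given three times the influence of watercolor.
blended = blend_styles([film_grain, watercolor], [3.0, 1.0])
print(blended)
```

Raising one strength relative to the others pulls the blend toward that reference, which is the behavior the "strengthen, reduce" controls describe, whatever mechanism the real model uses.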

Architecture

Krea 2 uses a single-stream Diffusion Transformer (DiT) scaled to 12 billion parameters. The design borrows several efficiency-minded components from recent transformer research:

  • Attention: Grouped-Query Attention (GQA) with gated sigmoid attention and QK-Norm.
  • MLP: SwiGLU layers at 4× expansion.
  • Normalization: zero-centered RMSNorm.
  • Positional encoding: 3D axial RoPE.
  • Text encoder: Qwen 3 VL with multilayer feature aggregation.
  • Autoencoders: Qwen Image VAE and FLUX 2 VAE.
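
Two of the components above are small enough to sketch in plain Python. The example below is a minimal illustration, not Krea's implementation. It assumes "zero-centered RMSNorm" means the learned gain is stored as an offset from one (so a zero-initialized parameter gives plain RMS normalization), and it assumes the SwiGLU MLP uses three bias-free matrices with a hidden width of 4× the model width. The tiny dimensions and weight values are made up for demonstration.

```python
import math

def zero_centered_rmsnorm(x, gamma, eps=1e-6):
    """RMS-normalize x, then scale by (1 + gamma).

    Assumption: the gain is parameterized around zero, so gamma = 0
    leaves the normalized activations unscaled.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [(1.0 + g) * v / rms for v, g in zip(x, gamma)]

def silu(z):
    """SiLU / swish activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) )."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Toy sizes: model width 2, hidden width 4x that (illustrative only).
d = 2
hidden = 4 * d
w_gate = [[0.1 * (i - j) for j in range(d)] for i in range(hidden)]
w_up = [[0.05 * (i + j + 1) for j in range(d)] for i in range(hidden)]
w_down = [[0.01 * (i + 1) for _ in range(hidden)] for i in range(d)]

x = [3.0, 4.0]
normed = zero_centered_rmsnorm(x, gamma=[0.0] * d)  # gamma = 0: plain RMSNorm
y = swiglu_mlp(normed, w_gate, w_up, w_down)
print(normed, y)
```

Storing the gain as an offset from one means weight decay pulls the scale toward 1 instead of toward 0, which is one commonly cited motivation for this parameterization.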

A notable efficiency trick: “lightweight timestep modulation” replaces the per-block MLPs typically used for conditioning, which the team reports cuts parameter count by 20–30% with no loss in quality.
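
To see why per-block conditioning layers can account for so much of a model, here is some back-of-envelope arithmetic. It is a sketch under stated assumptions, not Krea's accounting: it assumes the baseline is the adaLN-style design from the original DiT, where each block projects the conditioning vector from width d to 6d, and it ignores biases, norms, gate projections, and embeddings. The width `d = 4096` and the `kv_ratio` value are hypothetical.

```python
def block_params(d, kv_ratio=1.0, mlp_expansion=4, per_block_modulation=True):
    """Approximate weight count of one transformer block (biases ignored).

    kv_ratio: fraction of key/value width kept under grouped-query
              attention (1.0 = standard multi-head attention).
    """
    attn = (2.0 + 2.0 * kv_ratio) * d * d   # Wq and Wo, plus shrunken Wk and Wv
    mlp = 3.0 * d * (mlp_expansion * d)     # SwiGLU: gate, up, and down matrices
    # Assumed baseline: one d -> 6d projection per block for shift/scale/gate.
    mod = 6.0 * d * d if per_block_modulation else 0.0
    return attn + mlp + mod

d = 4096  # hypothetical model width; the ratio below does not depend on it

for label, kv_ratio in (("multi-head", 1.0), ("grouped-query", 0.25)):
    with_mod = block_params(d, kv_ratio, per_block_modulation=True)
    without_mod = block_params(d, kv_ratio, per_block_modulation=False)
    saved = 1.0 - without_mod / with_mod
    print(f"{label}: modulation is {saved:.1%} of block weights")

saved_mha = 1.0 - block_params(d, 1.0, 4, False) / block_params(d, 1.0, 4, True)
saved_gqa = 1.0 - block_params(d, 0.25, 4, False) / block_params(d, 0.25, 4, True)
```

Under these assumptions the per-block projection is roughly a quarter of each block's weights, the same order of magnitude as the reported saving. Whatever lightweight mechanism replaces it still costs some parameters, so the real figure depends on details this sketch does not model.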

Diagram of the Krea 2 single-stream multimodal Diffusion Transformer block
Image credit: Krea 2 Technical Report

Training Pipeline

Krea 2 is trained in a six-stage pipeline, with pretraining stepping resolution up through 256px, 512px, and 1024px:

  1. Pretraining with progressive resolution scaling.
  2. Midtraining for broad stylistic coverage.
  3. Supervised fine-tuning (SFT) on curated high-quality data.
  4. Preference optimization (STPO) with human annotations.
  5. Reinforcement learning using multi-reward GRPO.
  6. Timestep distillation via Trajectory Distribution Matching (TDM) — the technique behind the Turbo variant.
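
The reinforcement learning step in the list above can be illustrated with the core of GRPO: several samples are generated for the same prompt, and each sample's advantage is its reward standardized against the rest of its group. The sketch below shows only that calculation. How Krea combines its multiple rewards is not described here, so the weighted sum, the reward names, and the weights are assumptions made for illustration.

```python
import statistics

def combine_rewards(samples, weights):
    """Collapse several reward signals into one scalar per sample.

    Assumption: a simple weighted sum. The actual multi-reward
    scheme is not specified in this article.
    """
    return [sum(weights[name] * s[name] for name in weights) for s in samples]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four images sampled for the same prompt, scored by two hypothetical
# reward models.
group = [
    {"aesthetic": 0.9, "alignment": 0.7},
    {"aesthetic": 0.4, "alignment": 0.8},
    {"aesthetic": 0.6, "alignment": 0.2},
    {"aesthetic": 0.2, "alignment": 0.2},
]
weights = {"aesthetic": 0.5, "alignment": 0.5}  # hypothetical weighting

scalar_rewards = combine_rewards(group, weights)
advantages = group_relative_advantages(scalar_rewards)
print(advantages)
```

Because advantages are measured relative to the group, no separate value network is needed: samples that beat their siblings are reinforced and the rest are discouraged.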

The team reports using 8-bit training at low and medium resolutions for a 15–20% gain in training speed.
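
A rough token count shows why the low- and medium-resolution stages are where most training steps can be run cheaply. The sketch below assumes an 8× spatial downsampling in the autoencoder and 2×2 latent patches, which are common choices in latent diffusion transformers but are not confirmed for Krea 2. It also ignores text tokens.

```python
def tokens_per_image(side_px, vae_downsample=8, patch=2):
    """Number of image tokens for a square image.

    Assumptions (not confirmed for Krea 2): the VAE shrinks each side by
    `vae_downsample`, and the transformer groups latents into
    patch x patch tokens.
    """
    latent_side = side_px // vae_downsample
    return (latent_side // patch) ** 2

base = tokens_per_image(256)
tokens = {}
for side in (256, 512, 1024):
    n = tokens_per_image(side)
    tokens[side] = n
    # Self-attention compares every token with every other token,
    # so its cost grows with the square of the token count.
    attn_cost = (n / base) ** 2
    print(f"{side}px: {n} tokens, about {attn_cost:.0f}x the attention cost of 256px")
```

Under these assumptions each doubling of resolution quadruples the sequence length, so running early training at low resolution leaves far more of the compute budget for the expensive 1024px stage.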

Diagram of the multi-stage Krea 2 training pipeline
Image credit: Krea 2 Technical Report

Variants, Speed, and Open Weights

Krea 2 ships in several sizes. Medium is the smaller, faster, more cost-efficient variant, strong on illustration, anime, and painterly styles. Large is more than twice the size, with particular strength in photorealism and “raw” aesthetics like motion blur, grain, and low dynamic range. The distilled Turbo generates an image in roughly 2 seconds, placing it among the fastest models available across both open and proprietary systems. Krea released two checkpoints, K2 Raw and K2 Turbo, captured at distinct milestones of training, as open weights under a custom license.

What This Means

Krea 2 lands in an increasingly crowded open-weights image generation space, but its angle is distinct. Where many recent models compete on prompt adherence or raw parameter efficiency, Krea 2 bets on aesthetic range and style controllability as the differentiator — treating style as “something you can guide, mix, strengthen, reduce, and push.” Releasing both a Raw checkpoint (closer to base training, more steerable) and a distilled Turbo checkpoint gives researchers and product teams flexibility: the Raw weights for experimentation and fine-tuning, the Turbo weights for low-latency production. A top-10 finish on the Artificial Analysis leaderboard — and 2nd among independent labs — suggests the aesthetics-first approach does not come at the cost of overall quality.
