OpenAI and Broadcom Unveil Jalapeño, a Custom LLM Inference Chip

On June 24, 2026, OpenAI and Broadcom unveiled Jalapeño, OpenAI’s first custom-designed silicon: an “Intelligence Processor” built from scratch for large language model inference. The chip is the first product of the companies’ 10-gigawatt accelerator partnership announced last October, and OpenAI says it went from initial design to manufacturing tape-out in just nine months, a cycle the company believes is the fastest ever for a high-performance semiconductor.


OpenAI and Broadcom leaders display the Jalapeño inference chip at its unveiling.
Image credit: OpenAI

What Jalapeño Is

Jalapeño is a “blank-slate” accelerator designed specifically for LLM inference — the act of running a trained model to serve answers — rather than a general-purpose AI chip adapted from earlier workloads. OpenAI designed the architecture around its own understanding of how frontier models behave: the kernels, memory-movement patterns, networking, and serving systems behind ChatGPT, Codex, and its API. Broadcom (NASDAQ: AVGO) handled the silicon implementation and networking, including its Tomahawk networking silicon, while Canadian manufacturer Celestica contributed board, rack, and system integration.

The stated goal is to combine the throughput of today’s leading accelerators with latency closer to that of specialized inference systems, making the chip well suited to interactive products at scale. Crucially, OpenAI says Jalapeño is built to run all LLMs across the industry, not just its own. Engineering samples are already running real ML workloads, including the GPT-5.3-Codex-Spark model, in the lab at production target frequency and power.

Performance and the Nine-Month Sprint

OpenAI says early testing shows Jalapeño will deliver “performance per watt substantially better than current state-of-the-art,” with a detailed technical report promised in the coming months. The company attributes the efficiency to an architecture that reduces data movement and balances compute, memory, and networking, so that realized utilization lands much closer to theoretical peak. The gap between realized and peak utilization is a recurring bottleneck for GPU-based inference.
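
To make the two metrics in this claim concrete, here is a minimal Python sketch of how realized utilization and performance per watt are typically computed. OpenAI has not published figures, so every number below is a placeholder invented for illustration, and the names `AcceleratorRun`, `utilization`, and `tokens_per_joule` are hypothetical, not part of any OpenAI or Broadcom tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorRun:
    """One inference benchmark run. All fields are hypothetical inputs."""
    name: str
    peak_tflops: float        # theoretical peak compute of the chip
    achieved_tflops: float    # compute actually sustained while serving
    tokens_per_second: float  # serving throughput
    watts: float              # average power draw during the run

    def utilization(self) -> float:
        """Fraction of theoretical peak compute that was actually realized."""
        if self.peak_tflops <= 0:
            raise ValueError("peak_tflops must be positive")
        return self.achieved_tflops / self.peak_tflops

    def tokens_per_joule(self) -> float:
        """Performance per watt: (tokens/second) / watts = tokens per joule."""
        if self.watts <= 0:
            raise ValueError("watts must be positive")
        return self.tokens_per_second / self.watts


# Placeholder numbers chosen only to keep the arithmetic easy to follow.
# They are NOT measurements of Jalapeño or of any real GPU.
baseline = AcceleratorRun("hypothetical-gpu", 1000.0, 300.0, 20000.0, 1000.0)
candidate = AcceleratorRun("hypothetical-asic", 800.0, 560.0, 30000.0, 750.0)

for run in (baseline, candidate):
    print(f"{run.name}: utilization={run.utilization():.0%}, "
          f"tokens/joule={run.tokens_per_joule():.1f}")
```

With these made-up inputs, the chip with the lower theoretical peak sustains a larger share of it (70% versus 30%) and delivers twice the tokens per joule, which is the kind of trade-off the utilization argument describes.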

The most striking claim is the development speed. A nine-month design-to-tape-out cycle for a high-performance ASIC is extraordinarily fast; such projects typically take years. OpenAI attributes this to deep software-hardware co-development with Broadcom and — notably — the use of its own AI models to accelerate parts of the design and optimization process. As the company frames it, the same models served to users are now helping build the infrastructure that will run future models.

“Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers,” said Richard Ho, who leads OpenAI’s hardware program. “We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models.”

What This Means

Jalapeño extends OpenAI’s strategy of owning its full stack — from products to models and now to chips. By designing the hardware itself, OpenAI can co-optimize every layer toward the same goal: faster, cheaper, more reliable inference. President and co-founder Greg Brockman called the chip “part of our long-term full-stack infrastructure strategy to make compute more abundant.”

For the broader market, the move adds pressure on Nvidia’s pricing power in AI accelerators by giving a major buyer a custom alternative for inference. Jalapeño is the first step in a multi-generation platform targeting initial deployment by the end of 2026, scaling to gigawatt-class data centers with Microsoft and other partners. Broadcom CEO Hock Tan described it as “just the beginning of a multi-generation roadmap.” If AI-assisted chip design continues to compress development timelines, it could lower the cost of compute across the industry — and reshape who controls the hardware underneath frontier AI.
