Jan-v1-4B: Next-Gen Agentic LLM for Web-Enhanced Reasoning

A blog-post-style overview of the Jan-v1-4B model, as published on the Hugging Face Hub:

Overview

Jan-v1-4B is the inaugural model in the Jan family, built for agentic reasoning and problem-solving within the Jan App, an AI assistant platform from Menlo Research. It is fine-tuned from the Lucy model and leverages increased model scale for improved performance (Hugging Face).

Powered by Qwen3-4B-thinking, the model targets advanced reasoning and robust tool integration, making it well suited for complex, multi-step tasks.


Performance Highlights

  • On the SimpleQA benchmark for factual question answering, Jan-v1 achieves 91.1% accuracy, a notable result for a model of this size.
  • It also performs strongly on chat and instruction-following benchmarks, showing balanced conversational ability.

Getting Started

Integration with Jan App

Users can access Jan-v1 directly by selecting it in the Jan App interface; no additional setup is required.

Local Deployment

To run Jan‑v1 locally, two popular frameworks are supported:

  • vLLM:

    ```bash
    vllm serve janhq/Jan-v1-4B \
      --host 0.0.0.0 \
      --port 1234 \
      --enable-auto-tool-choice \
      --tool-call-parser hermes
    ```

  • llama.cpp:

    ```bash
    llama-server --model Jan-v1-4B-Q4_K_M.gguf \
      --host 0.0.0.0 \
      --port 1234 \
      --jinja \
      --no-context-shift
    ```
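
Both serving commands expose an OpenAI-compatible HTTP API on port 1234, so the model's tool-calling support (enabled above via `--enable-auto-tool-choice` and the `hermes` parser in vLLM) can be exercised with a standard chat-completions request. The sketch below is illustrative, not from the model card: the `get_weather` tool is a hypothetical example, and the endpoint path assumes the default OpenAI-compatible layout of both servers.

```python
import json
import urllib.request

# Default OpenAI-compatible endpoint for the server commands shown above.
BASE_URL = "http://localhost:1234/v1/chat/completions"

# Hypothetical tool definition in the OpenAI function-calling schema.
# The server's tool-call parser turns the model's output into structured
# "tool_calls" entries matching this schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # example tool, not provided by the server
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "janhq/Jan-v1-4B",
    "messages": [{"role": "user", "content": "What's the weather in Hanoi?"}],
    "tools": tools,
}

def chat(body: dict) -> dict:
    """POST a chat-completions request to the local server, return parsed JSON."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# chat(payload)  # uncomment with a server running; the assistant message may
#                # then contain "tool_calls" instead of plain text
```

The same request shape works against either server, since both follow the OpenAI chat-completions format.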


Recommended Inference Settings

Users are advised to apply the following parameters for optimal performance:

```yaml
temperature: 0.6
top_p: 0.95
top_k: 20
min_p: 0.0
max_tokens: 2048
```



Quantization Variant

A GGUF-formatted version, Jan-v1-4B-GGUF, offers multiple quantization options (4-bit, 5-bit, 6-bit, and 8-bit) for efficient local deployment.


Summary

| Attribute     | Description                                                      |
| ------------- | ---------------------------------------------------------------- |
| Model Type    | Open-source, 4-billion-parameter agentic LLM                     |
| Architecture  | Lucy-based, leveraging Qwen3-4B-thinking                         |
| Benchmarks    | 91.1% SimpleQA accuracy; strong chat/instructional performance   |
| Deployment    | Integrated in Jan App; local support via vLLM and llama.cpp      |
| Settings      | Recommended inference parameters (temp, top_p/k, etc.)           |
| Quant Variant | GGUF version with efficient quantization support                 |