Jan-v1-4B: Next-Gen Agentic LLM for Web-Enhanced Reasoning

A blog-post-style overview of the Jan-v1-4B model, as published on the Hugging Face Hub:

Overview

Jan-v1-4B is the inaugural model in the Jan family, built for agentic reasoning and problem-solving within the Jan App, an AI assistant platform from Menlo Research. It is fine-tuned from the Lucy model and leverages increased model scale for improved performance (Hugging Face).

Powered by Qwen3-4B-thinking, the model targets advanced reasoning and robust tool integration, making it well suited for complex, multi-step tasks.


Performance Highlights

  • On the SimpleQA benchmark for factual question answering, Jan-v1 achieves 91.1% accuracy, a notable result for a model of this size.
  • It also performs strongly on chat and instruction-following benchmarks, showing balanced conversational ability.

Getting Started

Integration with Jan App

Users can access Jan-v1 directly by selecting it in the Jan App interface; no additional setup is required.

Local Deployment

To run Jan‑v1 locally, two popular frameworks are supported:

  • vLLM:

    ```bash
    vllm serve janhq/Jan-v1-4B \
      --host 0.0.0.0 \
      --port 1234 \
      --enable-auto-tool-choice \
      --tool-call-parser hermes
    ```

  • llama.cpp:

    ```bash
    llama-server --model Jan-v1-4B-Q4_K_M.gguf \
      --host 0.0.0.0 \
      --port 1234 \
      --jinja \
      --no-context-shift
    ```
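
Both serving commands expose an OpenAI-compatible HTTP API on port 1234, so the model's tool-calling support (enabled above via `--enable-auto-tool-choice` and the `hermes` parser in vLLM) can be exercised with a standard chat-completions request. The sketch below is illustrative, not from the model card: the `get_weather` tool is a hypothetical example, and the endpoint path assumes the default OpenAI-compatible layout of both servers.

```python
import json
import urllib.request

# Default OpenAI-compatible endpoint for the server commands shown above.
BASE_URL = "http://localhost:1234/v1/chat/completions"

# Hypothetical tool definition in the OpenAI function-calling schema.
# The server's tool-call parser turns the model's output into structured
# "tool_calls" entries matching this schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # example tool, not provided by the server
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "janhq/Jan-v1-4B",
    "messages": [{"role": "user", "content": "What's the weather in Hanoi?"}],
    "tools": tools,
}

def chat(body: dict) -> dict:
    """POST a chat-completions request to the local server, return parsed JSON."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# chat(payload)  # uncomment with a server running; the assistant message may
#                # then contain "tool_calls" instead of plain text
```

The same request shape works against either server, since both follow the OpenAI chat-completions format.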


Recommended Inference Settings

Users are advised to apply the following parameters for optimal performance:

```yaml
temperature: 0.6
top_p: 0.95
top_k: 20
min_p: 0.0
max_tokens: 2048
```



Quantization Variant

A GGUF-formatted version, Jan-v1-4B-GGUF, offers multiple quantization options (4-bit, 5-bit, 6-bit, and 8-bit) for efficient local deployment.


Summary

| Attribute     | Description                                                      |
| ------------- | ---------------------------------------------------------------- |
| Model Type    | Open-source, 4-billion-parameter agentic LLM                     |
| Architecture  | Lucy-based, leveraging Qwen3-4B-thinking                         |
| Benchmarks    | 91.1% SimpleQA accuracy; strong chat/instructional performance   |
| Deployment    | Integrated in Jan App; local support via vLLM and llama.cpp      |
| Settings      | Recommended inference parameters (temp, top_p/k, etc.)           |
| Quant Variant | GGUF version with efficient quantization support                 |