FuriosaAI develops data center AI accelerators. Our RNGD (pronounced "Renegade") accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.

Each of the popular Hugging Face models listed here ships with a pre-compiled bundle, so you can run it on RNGD immediately. All you need is an RNGD accelerator and a recent version of Furiosa-LLM. See the Quick Start to get set up.

Each repository ships the model files together with a matching Furiosa Executable Bundle (FXB), so furiosa-llm serve <repo> runs it on RNGD with no extra setup. See FXB for how bundles are built, cached, and distributed.

Need a model with a custom configuration? Build your own FXB with fxb build, as described in the Furiosa Docs. For more information, see Supported Models in the SDK documentation, and learn more about RNGD at https://furiosa.ai/rngd.

Featured Pre-compiled Models

The table below highlights a selection of the pre-compiled models; you can find all of them at https://huggingface.co/furiosa-ai/models, and curated sets at https://huggingface.co/furiosa-ai/collections.

Pre-compiled Model	Quantization	Base Model	Supported Version
furiosa-ai/EXAONE-4.0-32B-FP8	FP8	LGAI-EXAONE/EXAONE-4.0-32B-FP8	>= 2026.1
furiosa-ai/K-EXAONE-236B-A23B-NVFP4A16	NVFP4A16	LGAI-EXAONE/K-EXAONE-236B-A23B	>= 2026.3
furiosa-ai/Llama-3.1-8B-Instruct	BF16	meta-llama/Llama-3.1-8B-Instruct	>= 2025.2
furiosa-ai/Llama-3.3-70B-Instruct	BF16	meta-llama/Llama-3.3-70B-Instruct	>= 2025.3
furiosa-ai/Qwen2.5-0.5B-Instruct	BF16	Qwen/Qwen2.5-0.5B-Instruct	>= 2026.1
furiosa-ai/Qwen3-30B-A3B-FP8	FP8	Qwen/Qwen3-30B-A3B-FP8	>= 2026.3
furiosa-ai/Qwen3-30B-A3B-Instruct-2507-FP8	FP8	Qwen/Qwen3-30B-A3B-Instruct-2507-FP8	>= 2026.3
furiosa-ai/Qwen3-30B-A3B-Thinking-2507-FP8	FP8	Qwen/Qwen3-30B-A3B-Thinking-2507-FP8	>= 2026.3
furiosa-ai/Qwen3-32B-FP8	FP8	Qwen/Qwen3-32B-FP8	>= 2026.1
furiosa-ai/Qwen3-4B-FP8	FP8	Qwen/Qwen3-4B-FP8	>= 2026.3
furiosa-ai/Qwen3-8B-FP8	FP8	Qwen/Qwen3-8B-FP8	>= 2026.3
furiosa-ai/Qwen3-Coder-30B-A3B-Instruct-FP8	FP8	Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8	>= 2026.3
furiosa-ai/Qwen3-Embedding-8B	BF16	Qwen/Qwen3-Embedding-8B	>= 2026.1
furiosa-ai/Qwen3-Reranker-8B	BF16	Qwen/Qwen3-Reranker-8B	>= 2026.1
furiosa-ai/Qwen3-VL-32B-Instruct	BF16	Qwen/Qwen3-VL-32B-Instruct	>= 2026.3
furiosa-ai/Solar-Open-100B-NVFP4A16	NVFP4A16	upstage/Solar-Open-100B	>= 2026.3
furiosa-ai/gpt-oss-120b	MXFP4	openai/gpt-oss-120b	>= 2026.3

Examples

First, install the prerequisites by following Installing Furiosa-LLM.

Then, run the following command to start the Furiosa-LLM server. We use furiosa-ai/Qwen3-4B-FP8 here because it is one of the smaller featured models, so it is quick to download and run:

furiosa-llm serve furiosa-ai/Qwen3-4B-FP8

Qwen3 is a hybrid reasoning model that thinks by default. To return the chain of thought in a separate field, launch it with the qwen3 reasoning parser:

furiosa-llm serve furiosa-ai/Qwen3-4B-FP8 \
  --reasoning-parser qwen3
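
With the reasoning parser enabled, a client can read the chain of thought separately from the final answer. The following is a minimal Python sketch of that split, not the documented Furiosa-LLM schema: it assumes an OpenAI-style chat completion response and assumes the parser stores the reasoning under a message field named reasoning_content. The helper split_reasoning and the sample response are hypothetical and written by hand for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def split_reasoning(response: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a chat completion response dict.

    Either element is None when the corresponding field is absent.
    """
    message = response["choices"][0]["message"]
    # ASSUMPTION: the reasoning parser places the chain of thought in
    # "reasoning_content"; check the server's actual response to confirm.
    reasoning = message.get("reasoning_content")
    answer = message.get("content")
    return reasoning, answer


# Hand-written sample shaped like an OpenAI-style response; not real server output.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "The user asks for the capital of France.",
                "content": "The capital of France is Paris.",
            }
        }
    ]
}

reasoning, answer = split_reasoning(sample_response)
print("reasoning:", reasoning)
print("answer:", answer)
```

If the server is started without the reasoning parser, the reasoning field may simply be missing, in which case this helper returns None for it and leaves the answer untouched.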

Once your server has launched, you can query the model with input prompts:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "furiosa-ai/Qwen3-4B-FP8",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
    }' \
    | python -m json.tool
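
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl command above. The helper names build_chat_request and ask are hypothetical, and reading the reply from choices[0].message.content assumes the server returns an OpenAI-style chat completion response.

```python
import json
import urllib.request

# Same host, port, and path prefix as the curl example.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the chat completions endpoint without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to a running server and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # ASSUMPTION: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

Once the server is running, calling ask("furiosa-ai/Qwen3-4B-FP8", "What is the capital of France?") returns the reply text. Keeping request construction separate from sending makes the payload easy to inspect without a live server.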

To learn more about usage, see Quick Start with Furiosa-LLM, the per-model guides under Supported Models, and the FXB guide.