FuriosaAI develops data center AI accelerators. Our RNGD (pronounced "Renegade") accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.
Each of these popular Hugging Face models ships a pre-compiled bundle, so you can run it immediately on RNGD — all you need is an RNGD accelerator and a recent version of Furiosa-LLM. See the Quick Start to get set up.
Each repository ships the model files together with a matching Furiosa Executable Bundle (FXB), so furiosa-llm serve <repo> runs it on RNGD with no extra setup. See FXB for how bundles are built, cached, and distributed.
Need a model with a custom configuration? Build your own FXB with fxb build, as described on Furiosa Docs.
Visit Supported Models in the SDK documentation for more information, and learn more about RNGD at https://furiosa.ai/rngd.
The table below highlights a selection of the pre-compiled models; you can find all of them at https://huggingface.co/furiosa-ai/models, and curated sets at https://huggingface.co/furiosa-ai/collections.
First, install the prerequisites by following Installing Furiosa-LLM.
Then, run the following command to start the Furiosa-LLM server. We use furiosa-ai/Qwen3-4B-FP8 here — the smallest of the featured models, so it is the quickest to download and run:
furiosa-llm serve furiosa-ai/Qwen3-4B-FP8
Qwen3 is a hybrid reasoning model that thinks by default. To return the chain of thought in a separate field, launch it with the qwen3 reasoning parser:
furiosa-llm serve furiosa-ai/Qwen3-4B-FP8 \
    --reasoning-parser qwen3
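As a rough sketch of what "a separate field" means for client code, the Python snippet below splits a chat-completions response into its reasoning and its final answer. The field name reasoning_content is an assumption borrowed from other OpenAI-compatible servers rather than something this page confirms, split_reasoning is a hypothetical helper, and the sample response is hand-written for illustration, not real model output.

```python
from typing import Optional, Tuple


def split_reasoning(response: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a chat-completions response dict."""
    message = response["choices"][0]["message"]
    # Assumption: the reasoning parser stores the chain of thought under
    # "reasoning_content". Check the Furiosa-LLM docs for the actual name.
    reasoning = message.get("reasoning_content")
    # "content" may be missing or None; fall back to an empty string.
    answer = message.get("content") or ""
    return reasoning, answer


# Hand-written sample response (illustrative only, not real server output).
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "The user asks for the capital of France.",
                "content": "Paris.",
            }
        }
    ]
}

reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

If the server is started without a reasoning parser, the reasoning field would presumably be absent; the helper then returns None for it and leaves the answer text untouched.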
Once your server has launched, you can query the model with input prompts:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "furiosa-ai/Qwen3-4B-FP8",
        "messages": [{"role": "user", "content": "What is the capital of France?"}]
    }' \
    | python -m json.tool
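The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: same URL, same header, same JSON body. It assumes the server replies in the usual OpenAI chat-completions shape (choices[0].message.content); build_chat_request and chat are hypothetical helper names, not part of Furiosa-LLM.

```python
import json
import urllib.request

# Same host and port as the curl example above.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request equivalent to the curl command above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, base_url: str = BASE_URL,
         timeout: float = 60.0) -> str:
    """Send the prompt and return the assistant's reply text."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server from the previous step running, chat("furiosa-ai/Qwen3-4B-FP8", "What is the capital of France?") should return the reply as a plain string. Keeping request construction separate from the network call makes the payload easy to inspect without a running server.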
To learn more about usage, see Quick Start with Furiosa-LLM, the per-model guides under Supported Models, and the FXB guide.