FuriosaAI develops data center AI accelerators. Our RNGD (pronounced “Renegade”) accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.
Get started fast with common inference tasks on RNGD using these pre-compiled, popular Hugging Face models – no manual conversion or quantization needed. Requires Furiosa SDK 2025.2 or later on a server with an RNGD accelerator.
Need a model with custom configurations? Compile it yourself using our Model Preparation Workflow on Furiosa Docs. Visit Supported Models in the SDK documentation for more information, and learn more about RNGD at https://furiosa.ai/rngd.
| Pre-compiled Model | Description | Base Model |
|---|---|---|
| furiosa-ai/bert-large-uncased-INT8 | INT8 quantized, optimized for MLPerf | google-bert/bert-large-uncased |
| furiosa-ai/DeepSeek-R1-Distill-Llama-8B | BF16 | deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
| furiosa-ai/EXAONE-3.5-7.8B-Instruct | BF16 | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct |
| furiosa-ai/Llama-3.1-8B-Instruct | BF16 | meta-llama/Llama-3.1-8B-Instruct |
| furiosa-ai/Llama-3.1-8B-Instruct-FP8 | FP8 quantized | meta-llama/Llama-3.1-8B-Instruct |
First, install the prerequisites by following Installing Furiosa-LLM.
Then, run the following command to start the Furiosa-LLM server with the Llama-3.1-8B-Instruct-FP8 model:

```sh
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct-FP8
```
Once your server has launched, you can query the model with input prompts:
```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }' \
  | python -m json.tool
```
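Since the server exposes an OpenAI-compatible chat completions endpoint, you can send the same request from Python. Below is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000` (the `chat` helper and its default URL are illustrative, not part of the SDK).

```python
import json
import urllib.request

# The same request body as the curl example above.
payload = {
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

def chat(payload, url="http://localhost:8000/v1/chat/completions"):
    """POST a chat completion request and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    reply = chat(payload)
    # OpenAI-compatible responses place the text under choices[0].message.content.
    print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library should work the same way by pointing its base URL at the server.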
You can also learn more about usage in Quick Start with Furiosa-LLM.