Deploy large language models (LLMs) on AMD Ryzen™ AI NPUs—in minutes.
Think Ollama, but purpose-built for AMD Ryzen™ NPUs.
FastFlowLM runs LLaMA, DeepSeek-R1, and more, hardware-optimized for lightning-fast, low-power, private, always-on AI on the silicon already in your AI PC.
👨‍💻 Made for Developers Building Local AI Agents
🧠 No Low-Level Tuning Needed
You don’t need to understand NPU internals. Just run your model; FastFlowLM handles the low-level hardware optimization.
🧰 Works Like Ollama—But For AMD Ryzen™ NPU AI
CLI and API simplicity developers love, with the efficiency and control of native NPU execution.
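Because the workflow is Ollama-style, talking to a local FastFlowLM server from Python could look like the minimal sketch below. This assumes an Ollama-compatible REST endpoint; the port (11434), path (`/api/chat`), and model tag are illustrative assumptions, so check the FastFlowLM documentation for the exact values.

```python
# Minimal sketch: chat with a local FastFlowLM server over an assumed
# Ollama-compatible REST API. Port, endpoint path, and model tag below
# are assumptions for illustration only.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",  # assumed Ollama-style endpoint
    json={
        "model": "llama3.2:1b",         # assumed model tag
        "messages": [
            {"role": "user", "content": "Summarize the benefits of NPU-only inference."}
        ],
        "stream": False,                # request a single JSON response
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

Existing tools that already speak the Ollama chat format should be able to point at the local endpoint with no code changes beyond the base URL.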
💻 No GPU or CPU Burden
FastFlowLM runs entirely on the Ryzen™ NPU, leaving the CPU and GPU free for other applications.
📏 Full Context Support
All FastFlowLM models support full context windows—up to 128k tokens on LLaMA 3.1/3.2—so you can run long-form reasoning and RAG without compromise.
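When stuffing long documents into a 128k-token window, a quick pre-flight estimate helps avoid oversized prompts. The sketch below uses a rough 4-characters-per-token heuristic, which is an assumption; a real tokenizer gives exact counts.

```python
# Rough check that a long document plus question fits in the 128k-token
# context window quoted for LLaMA 3.1/3.2. The characters-per-token ratio
# is a coarse heuristic, not an exact tokenizer.
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # rough assumption; varies by tokenizer and language

def fits_in_context(document: str, question: str, reserve_for_output: int = 2_000) -> bool:
    est_tokens = (len(document) + len(question)) // CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

if __name__ == "__main__":
    doc = "lorem ipsum " * 10_000  # stand-in for a long retrieved document
    print(fits_in_context(doc, "What are the key findings?"))
```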
⚡ Performance That Speaks for Itself
Benchmarked against AMD Ryzen™ AI Software 1.4 (GAIA or Lemonade):

LLM Decoding Speed (TPS: tokens per second)
🚀 Up to 14.2× faster in LLM decoding (vs. the NPU-only baseline)
🚀 Up to 16.2× faster in LLM decoding (vs. the hybrid iGPU+NPU baseline)

Power Efficiency
🔋 Up to 2.66× more power efficient in LLM decoding (vs. NPU-only)
🔋 Up to 11.38× more power efficient in LLM decoding (vs. hybrid)
🔋 Up to 3.4× more power efficient in LLM prefill (vs. NPU-only or hybrid mode)

Latency (LLM Prefill Speed)
🚀 Matches or exceeds the Time to First Token (TTFT) of the NPU-only and hybrid modes
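To reproduce these metrics on your own machine, the sketch below measures TTFT (time from request to first streamed token) and decode TPS (tokens generated after the first token divided by elapsed decode time) against an assumed Ollama-style streaming endpoint; the port, path, and model tag are illustrative assumptions.

```python
# Illustrative measurement of TTFT and decode TPS from a streaming response.
# Assumes an Ollama-compatible /api/generate endpoint; adjust for your setup.
import json
import time

import requests

start = time.perf_counter()
first_token_at = None
tokens = 0

with requests.post(
    "http://localhost:11434/api/generate",   # assumed streaming endpoint
    json={"model": "llama3.2:1b", "prompt": "Explain NPUs in one paragraph."},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        if chunk.get("done"):
            break
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrived
        tokens += 1                               # one streamed chunk ≈ one token

end = time.perf_counter()
ttft = (first_token_at - start) if first_token_at else float("nan")
decode_tps = (tokens - 1) / (end - first_token_at) if tokens > 1 else 0.0
print(f"TTFT: {ttft:.3f}s, decode speed: {decode_tps:.1f} tokens/s")
```

Run the prompt a few times and average the numbers, since the first request typically includes model load time.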