
Deploy large language models (LLMs) on AMD Ryzen™ AI NPUs—in minutes.

Think Ollama, but purpose-built for AMD Ryzen™ NPUs.

FastFlowLM runs LLaMA, DeepSeek-R1, and more, hardware-optimized for lightning-fast, low-power, private, always-on AI on the silicon already in your AI PC.

👨‍💻 Made for Developers Building Local AI Agents

  • 🧠 No Low-Level Tuning Needed

    • You don’t need to understand NPU internals: just run your model, and FastFlowLM handles the low-level hardware optimization.

  • 🧰 Works Like Ollama—But For AMD Ryzen™ NPU AI

    • CLI and API simplicity developers love, with the efficiency and control of native NPU execution (see the API sketch after this list).

  • 💻 No GPU or CPU Burden

    • FastFlowLM runs entirely on the Ryzen™ NPU, freeing system resources for other applications.

  • 📏 Full Context Support

    • All FastFlowLM models support full context windows—up to 128k tokens on LLaMA 3.1/3.2—so you can run long-form reasoning and RAG without compromise.

  • ⚡ Performance That Speaks for Itself

    • Compared to AMD Ryzen™ AI Software 1.4 (GAIA or Lemonade):

    • LLM Decoding Speed (TPS: Tokens per Second)
      🚀 Up to 14.2× faster in LLM decoding (vs NPU-only baseline)
      🚀 Up to 16.2× faster in LLM decoding (vs hybrid iGPU+NPU baseline)

    • Power Efficiency
      🔋 Up to 2.66× more power efficient in LLM decoding (vs NPU-only)
      🔋 Up to 11.38× more power efficient in LLM decoding (vs hybrid)
      🔋 Up to 3.4× more power efficient in LLM prefill (vs NPU-only or hybrid mode)

    • Latency (LLM Prefill Speed)
      🚀 Time to First Token (TTFT) that matches or improves on both NPU-only and hybrid modes
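
Because FastFlowLM is described above as working like Ollama, driving a local model from code should feel familiar. The sketch below assumes an Ollama-compatible /api/generate endpoint on localhost; the port (11434), endpoint path, and model tag (llama3.2:1b) are illustrative assumptions, so check the FastFlowLM documentation for the exact server address and available models.

```python
# Minimal sketch: request a completion from a locally served FastFlowLM model.
# The endpoint, port, and model tag below are assumptions based on the
# Ollama-style API described above, not confirmed values.
import json
import urllib.request

def generate(prompt: str, model: str = "llama3.2:1b",
             host: str = "http://localhost:11434") -> str:
    """Send a prompt to the local server and return the generated text."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON reply instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body.get("response", "")

if __name__ == "__main__":
    print(generate("Summarize why on-device NPU inference saves power."))
```

Because inference runs on the NPU behind that local endpoint, a script like this leaves the CPU and GPU free for the rest of your application, which is what makes always-on local agents practical.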
