NPU-first runtime

The fastest, most efficient LLM inference on NPUs

FastFlowLM (FLM) delivers an Ollama-style developer experience optimized for tile-structured NPU accelerators. Install in seconds, stream tokens instantly, and run context windows up to 256k — all with dramatically better efficiency than GPU-first stacks. Our GA release for AMD Ryzen™ AI NPUs is available today, with betas for Qualcomm Snapdragon and Intel Core Ultra coming soon.

  • Runtime size

    ~16 MB

  • Context

    Up to 256k tokens

  • Supported chips

    Ryzen™ AI (Strix, Strix Halo, Kraken)

GPT-OSS on NPU

GPT-OSS-20B streaming fully on the Ryzen™ AI NPU

Runs GPT-OSS-20B at 19 tokens per second (TPS) with 10× the efficiency of a GPU, the fastest MoE on any NPU.

Gemma3 (Vision) on NPU

Gemma3 (Vision) understands and describes images

Understand and describe images instantly — FastFlowLM runs Google Gemma3 fully on the NPU for fast, private, and efficient vision inference.

Whisper on-device

Transcribe and summarize long-form audio locally

Transcribe hours of audio locally. FLM runs OpenAI Whisper fully on the NPU, keeping transcription fast, private, and efficient.

Llama 3.2 on NPU

Interact with Llama 3.2-3B via Open WebUI

Runs Meta Llama 3.2-3B at 28 TPS with more than 10× the efficiency of a GPU, the fastest on any NPU.

Install

From download to first token in under a minute

FastFlowLM ships as a 16 MB runtime with an Ollama-style CLI and a server compatible with the OpenAI API. No drivers, no guesswork—just run the installer, pull a model, and start chatting.

  • Zero-conf installer

    Signed installers for every Ryzen™ AI 300 laptop: download, run, done.

  • Drop-in APIs

    OpenAI-compatible APIs — plug in your existing tools instantly.

  • Secure by default

    On-device security: local tokens and full offline mode.

Quickstart

CLI
# Download and run the signed installer
Invoke-WebRequest https://github.com/FastFlowLM/FastFlowLM/releases/latest/download/flm-setup.exe `
  -OutFile flm-setup.exe
Start-Process .\flm-setup.exe -Wait
# Pull a model, then start an interactive session with a 128k-token context window
flm pull llama3.2:3b
flm run llama3.2:3b --ctx-len 131072
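
Because the server speaks the OpenAI API, any existing client can point at it once a model is running. Here is a minimal PowerShell sketch, assuming the server is reachable locally and exposes the standard /v1/chat/completions route; the base URL, port, and model tag are placeholders, so adjust them to your setup:

$body = @{
    model    = "llama3.2:3b"
    messages = @(@{ role = "user"; content = "Summarize FastFlowLM in one sentence." })
    stream   = $false
} | ConvertTo-Json -Depth 5
# Assumed local endpoint; replace host and port with your FLM server address
Invoke-RestMethod -Uri "http://localhost:11434/v1/chat/completions" `
  -Method Post -ContentType "application/json" -Body $body |
  ForEach-Object { $_.choices[0].message.content }

The response follows the usual chat-completions schema, so OpenAI SDKs and front ends such as Open WebUI can target the same endpoint without changes.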

Models

One runtime, every Ryzen-ready model

Pull curated FastFlowLM recipes. The runtime streams tokens via OpenAI-compatible API, so existing apps work without rewrites.

Flagship reasoning

GPT-OSS · DeepSeek-R1 · Qwen3

Optimized kernels with smart context reuse.

Vision & speech

Gemma3 · Qwen3-VL · Whisper

VLM and audio pipelines run on the NPU, enabling private multimodal assistants.
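
If the endpoint accepts OpenAI-style multimodal messages, an image request could look like the sketch below; the content format, the gemma3:4b tag, and the local address are assumptions for illustration, not documented behavior:

# Sketch only: vision request over the OpenAI-compatible API (format assumed)
$imageB64 = [Convert]::ToBase64String([IO.File]::ReadAllBytes("photo.jpg"))
$body = @{
    model    = "gemma3:4b"   # hypothetical tag; use whatever model you pulled
    messages = @(@{
        role    = "user"
        content = @(
            @{ type = "text"; text = "Describe this image." },
            @{ type = "image_url"; image_url = @{ url = "data:image/jpeg;base64,$imageB64" } }
        )
    })
} | ConvertTo-Json -Depth 8
Invoke-RestMethod -Uri "http://localhost:11434/v1/chat/completions" `
  -Method Post -ContentType "application/json" -Body $body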

Local private edge database

Retrieval-Augmented Generation (RAG) · Embedding Model

Build and run a complete RAG workflow fully on the NPU, without relying on the CPU or GPU.
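
As a rough sketch of how the embedding side could be driven, assuming the server exposes the standard /v1/embeddings route and an embedding model has already been pulled (the model tag, host, and port below are hypothetical placeholders):

$body = @{
    model = "nomic-embed-text"   # hypothetical embedding-model tag
    input = @("FastFlowLM runs LLMs on the NPU.", "Whisper transcribes audio on-device.")
} | ConvertTo-Json -Depth 4
$resp = Invoke-RestMethod -Uri "http://localhost:11434/v1/embeddings" `
  -Method Post -ContentType "application/json" -Body $body
$resp.data | ForEach-Object { $_.embedding.Count }   # one vector per input string

Store the returned vectors in a local index, then feed retrieved passages back into the chat endpoint to close the RAG loop entirely on-device.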

Benchmarks

Proof on silicon, not slides

FastFlowLM is tuned on real Ryzen™ AI hardware with synthetic and application-level workloads. Expect a steady 20–80 tok/s across supported models at under 2 W (CPU + NPU), plus deterministic latency for agentic chains.

  • Full-stack telemetry

    See exactly where compute goes with NPU, CPU, and memory counters.

  • Scenario-driven suites

    Instruction tuning, RAG, chat, and multimodal tests with real workloads.

Llama 3.2 1B @ Q4_1 (4-bit with bias)

66 tok/s

Ryzen™ AI 9 HX 350 · ms-level latency

Gemma 3 4B Vision

~3 sec

Image understanding on XDNA2 NPU

Power draw (CPU + NPU)

< 2 W

Full assistant stack vs ~25 W GPU baseline

Remote test drive

No Ryzen™ AI hardware yet? Launch the hosted FastFlowLM + Open WebUI sandbox and stream from a live AMD Ryzen™ AI box (Kraken Point).

  • Live hardware

    Same builds we use internally, refreshed with every release.

  • Guest access

    Instant login with rotating demo credentials.