⚡ FastFlowLM (FLM) — Unlock Ryzen™ AI NPUs

Run large language models — now with Vision, Audio, Embedding, and MoE support — on AMD Ryzen™ AI NPUs in minutes.
No GPU required. Faster and over 10× more power-efficient than GPU inference. Supports context lengths up to 256k tokens.

Ultra-Lightweight (16 MB). Installs in under 20 seconds.

📦 The only out-of-the-box, NPU-first runtime built exclusively for Ryzen™ AI.
🤝 Think Ollama — but deeply optimized for NPUs.
✨ From Idle Silicon to Instant Power — FastFlowLM Makes Ryzen™ AI Shine.

    🔽 Download  📊 Benchmark  📦 Models

👉 GitHub  📖 Docs  📺 Demos

    🧪 Test Drive  💬 Discord

FLM in Action — Fast and Ultra-Efficient

Runs GPT-OSS-20B at 19 TPS (tokens per second) with over 10× the power efficiency of a GPU — the fastest MoE on any NPU.

[Demo: GPT-OSS-20B running speed-boosted on a Ryzen AI NPU]

Transcribe hours of audio locally — FLM runs OpenAI Whisper fully on the NPU — fast, private, and efficient.

[Demo: Whisper transcription on the NPU]

Runs Meta Llama 3.2-3B at 28 TPS with over 10× the power efficiency of a GPU — the fastest on any NPU.

[Demo: Llama 3.2-3B on the NPU]

Why FastFlowLM (FLM)?

🧠 No Low-Level Tuning Needed

      Run your models without worrying about NPU internals — FLM handles all the hardware-level optimization. 

🧰 Ollama Simplicity — Optimized for Ryzen™ AI NPUs

      Same CLI/API workflow developers love, but deeply optimized for AMD’s Ryzen™ AI NPU architecture (see the sketch after this list).

💻 Free Your GPU & CPU

      FLM runs entirely on the Ryzen™ AI NPU, leaving the rest of the system free for other workloads.  
📏 Full Context Lengths

      All FLM models support their full context length — up to 256k tokens — enabling long-form reasoning.
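
Because the workflow mirrors Ollama's, prompting a locally served model can be as simple as one HTTP call. Below is a minimal Python sketch; the port, endpoint path, and model tag are assumptions based on typical Ollama-compatible servers, not confirmed here — see the Docs for FLM's actual CLI and API.

    # Minimal sketch: prompt a locally served FLM model over HTTP.
    # Assumes an Ollama-compatible endpoint on localhost:11434; the port,
    # path, and model tag are illustrative, not confirmed by this page.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",  # assumed Ollama-style endpoint
        json={
            "model": "llama3.2:3b",             # illustrative model tag
            "prompt": "Why run LLMs on an NPU instead of a GPU?",
            "stream": False,                    # return a single JSON object
        },
        timeout=120,
    )
    print(resp.json()["response"])              # Ollama-style response field

If the endpoint is compatible, everything above the HTTP layer works exactly as it would with Ollama; only the runtime underneath changes, which is what leaves the GPU and CPU free.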

About the Company

FastFlowLM Inc. is a startup developing a runtime with custom kernels optimized for AMD Ryzen™ AI NPUs, enabling LLMs to run faster, more efficiently, and with extended contexts — all without GPU fallback. FLM is free for commercial use by companies with up to USD 10 million in annual revenue.

📩 Contact: info@fastflowlm.com
