AMD Ryzen AI Max+ 395: The 128GB Chip That Runs 100B-Parameter Models Locally

Image: the AMD Ryzen AI Max+ 395, the Strix Halo APU with 16 Zen 5 cores, a 40-compute-unit Radeon 8060S integrated GPU, a 50 TOPS NPU, and up to 128GB of unified LPDDR5X memory, shown in a compact mini PC on a desk. It can load and run 100-billion-parameter local language models that will not fit on a single consumer discrete GPU; its roughly 256 GB per second of memory bandwidth makes it a capacity win rather than a raw-speed win.

The Ryzen AI Max+ 395 is AMD’s answer to a specific, growing frustration: the best open-weight language models have outgrown the graphics cards most people can buy. A 120-billion-parameter model will not fit in the 16 or 24GB of memory on a consumer GPU, so running it locally has meant either quantizing it into mediocrity or renting cloud time. AMD’s Strix Halo chip attacks that problem from a different direction. By giving a single processor up to 128GB of unified memory that the CPU and integrated GPU share, the Ryzen AI Max+ 395 can load models that no single consumer discrete GPU can hold, in a mini PC that costs around two thousand dollars.

The short version: this is a capacity breakthrough, not a speed one, and the distinction is the whole story. The Ryzen AI Max+ 395 lets you run very large models locally that were previously off-limits without a multi-GPU rig or a cloud account. It does not run them as fast as a high-end discrete GPU runs the smaller models that fit in GPU memory, because it is limited by memory bandwidth rather than raw compute. Understanding that tradeoff is the difference between being delighted and being disappointed by this chip. This piece covers what the Ryzen AI Max+ 395 is, why the unified memory matters, what it can actually run and how fast, the honest caveat that the viral benchmarks obscure, pricing and availability, and who should actually buy one.

What the Ryzen AI Max+ 395 is

The Ryzen AI Max+ 395 is an APU, a single chip that combines a CPU, a GPU, and a neural processing unit, from AMD’s Strix Halo family. It pairs 16 Zen 5 CPU cores running up to 5.1GHz with a large integrated GPU, the 40-compute-unit RDNA 3.5 Radeon 8060S, and a 50 TOPS XDNA 2 NPU for on-device AI acceleration. It was the first x86 processor to ship with up to 128GB of unified memory.

That unified memory is the headline. The chip uses up to 128GB of LPDDR5X-8000 on a 256-bit bus, shared between the CPU and the integrated GPU. That works out to a theoretical peak of 256 GB per second, with real-world bandwidth of roughly 215 GB per second and up. On a conventional PC, system RAM and GPU VRAM are separate pools, and a model has to fit in the GPU’s much smaller VRAM to run on the GPU. Here, the GPU can address the whole 128GB pool, which is what makes the large-model trick possible.
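The 256 GB per second figure follows directly from the bus width and transfer rate quoted above. A minimal sketch of that arithmetic, where the function name is illustrative and the 215 GB per second input is simply the low end of the range cited in the text:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_mt_s: int) -> float:
    """Theoretical peak memory bandwidth in decimal GB/s.

    Each transfer moves bus_width_bits / 8 bytes, and the memory performs
    transfers_mt_s million transfers per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mt_s * 1e6 / 1e9


# LPDDR5X-8000 on a 256-bit bus, as described in the article.
strix_halo_peak = peak_bandwidth_gb_s(256, 8000)

# Fraction of the theoretical peak represented by the low end of the
# real-world range quoted in the article (215 GB/s).
low_end_fraction = 215 / strix_halo_peak

print(f"Theoretical peak: {strix_halo_peak:.0f} GB/s")
print(f"Low end of the real-world range: {low_end_fraction:.0%} of peak")
```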

You will find the Ryzen AI Max+ 395 in compact desktops and mini PCs rather than thin laptops, including AMD’s own Ryzen AI Halo developer platform, the Framework Desktop, and mini PCs like the GMKtec EVO-X2.

Why the unified memory matters

The constraint on local AI has never really been compute; it has been memory. To run a language model, the whole model has to fit in the memory the processor can reach. A 120-billion-parameter model needs about 240GB at 16-bit precision, about 120GB at 8 bits per weight, and still around 60GB when quantized to 4 bits, while consumer graphics cards top out at 16 to 32GB. That is why running frontier-size open models locally has been the preserve of people with multiple high-end GPUs or a willingness to rent cloud compute.
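A rough way to size a model is parameters times bits per weight. The sketch below applies that rule to a 120-billion-parameter model against a few memory budgets. The function names and the 8GB allowance for context cache and runtime overhead are assumptions chosen for illustration, and the 110GB budget is the Linux figure reported later in this article.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         memory_gb: float, overhead_gb: float = 8.0) -> bool:
    """True if weights plus an assumed overhead fit in memory_gb.

    overhead_gb is a hypothetical allowance for the context cache and
    runtime buffers; real usage depends on context length and runtime.
    """
    return weights_gb(params_billions, bits_per_weight) + overhead_gb <= memory_gb


# Memory budgets in GB: typical consumer GPUs and the Linux-usable pool.
budgets = {"16GB GPU": 16, "32GB GPU": 32, "110GB usable unified": 110}

for bits in (16, 8, 4):
    size = weights_gb(120, bits)
    verdicts = ", ".join(
        f"{name}: {'fits' if fits(120, bits, gb) else 'does not fit'}"
        for name, gb in budgets.items()
    )
    print(f"120B at {bits}-bit is about {size:.0f} GB ({verdicts})")
```

Under these assumptions, only the 4-bit version fits, and only in the large unified pool.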

The Ryzen AI Max+ 395 changes the math by making 128GB addressable by the GPU. Suddenly a model that physically cannot load on a single consumer discrete GPU loads and runs on a single mini PC. For anyone who wants to run large open-weight models privately, offline, or without per-token cloud bills, that is a genuine unlock. Our explainer on Llama covers the open-weight model families this hardware is built to run, and our look at hardware for AI agents covers why memory capacity, more than raw compute, increasingly decides what you can run at the edge.

What it can actually run, and how fast

The performance numbers are real and worth knowing. On a 30-billion-parameter model, the Ryzen AI Max+ 395 reaches around 100 tokens per second, which is comfortable real-time speed for a single user chatting with a model. It runs OpenAI’s GPT-OSS 120B, a 120-billion-parameter model, entirely in unified memory at roughly 55 tokens per second, which is slower but still usable for single-user work. That speed is possible because GPT-OSS 120B is a mixture-of-experts model that activates only about 5 billion of its parameters for each token, so far less than the full model is read from memory per token. AMD reports that the integrated Radeon 8060S delivers up to about 2.2 times the throughput of Intel’s Arc 140V, and that on 14-billion-parameter models the platform runs up to roughly 12 times faster than an Intel Core Ultra 258V laptop.

The pattern is consistent: this chip is very good at running large models at usable single-user speeds, and it comfortably beats the integrated graphics in mainstream thin-and-light laptops. Where it needs a caveat is any comparison to a high-end discrete GPU, which is where most of the confusion has come from.

The honest caveat: capacity, not speed

The viral framing around this chip, some version of "it beats an RTX 5080," is misleading, and it is worth being precise about why. The Ryzen AI Max+ 395 can appear to beat a high-end GPU only on models the GPU physically cannot load. A 16GB graphics card cannot hold a 120B model at all, so on that specific test the AMD chip "wins" by running a model the GPU cannot run. That is a capacity result, not a speed result.

On raw speed, a high-end discrete GPU is far ahead wherever a model actually fits in its memory. The Ryzen AI Max+ 395 is bandwidth-constrained: its roughly 256 GB per second of memory bandwidth is about one-seventh of the roughly 1,792 GB per second on an RTX 5090. Memory bandwidth is the binding constraint on how fast a model generates tokens, so for models under about 32 billion parameters, which fit comfortably in a discrete GPU’s VRAM, the discrete GPU is meaningfully faster. The honest positioning is that the Ryzen AI Max+ 395 lets you run models you otherwise could not run at all on a single consumer machine, and runs them at usable rather than blazing speeds. If your models fit on a GPU you already own, a GPU is the faster choice.
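The bandwidth argument can be turned into a back-of-the-envelope ceiling. The sketch below assumes each generated token reads every active weight from memory exactly once, and it ignores compute, the context cache, and runtime overhead, so real throughput will be lower. The function name and the example inputs (5 billion active parameters at 4 bits per weight) are illustrative assumptions rather than measurements; for a dense model, the active parameter count is simply the full parameter count.

```python
def max_tokens_per_s(bandwidth_gb_s: float,
                     active_params_billions: float,
                     bits_per_weight: float) -> float:
    """Upper bound on decode speed when memory bandwidth is the only limit.

    Assumes one full read of the active weights per generated token.
    """
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


STRIX_HALO_GB_S = 256    # theoretical peak quoted in the article
RTX_5090_GB_S = 1792     # figure quoted in the article

# How much more bandwidth the discrete GPU has.
bandwidth_ratio = RTX_5090_GB_S / STRIX_HALO_GB_S

# Illustrative workload: 5 billion active parameters at 4 bits per weight.
apu_ceiling = max_tokens_per_s(STRIX_HALO_GB_S, 5, 4)
gpu_ceiling = max_tokens_per_s(RTX_5090_GB_S, 5, 4)

print(f"Bandwidth ratio: {bandwidth_ratio:.1f}x")
print(f"APU ceiling: {apu_ceiling:.0f} tokens/s")
print(f"GPU ceiling: {gpu_ceiling:.0f} tokens/s (only if the model fits in VRAM)")
```

The ceilings scale with bandwidth, which is the whole point: whatever the model, the same weights stream about seven times faster on the discrete GPU, provided they fit there in the first place.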

Pricing and availability

Pricing depends on the memory configuration, because the memory is the point. As of mid-2026, a 64GB configuration starts around 1,499 dollars, while the 128GB version that you need for the largest models runs closer to 2,200 dollars. The Framework Desktop with 128GB has been widely cited as the best value at around 1,999 dollars. AMD’s Ryzen AI Halo developer platform built on the chip began pre-orders in mid-2026, initially through Micro Center, and third-party mini PCs are available from vendors like GMKtec.

For a machine that runs 100B-plus parameter models locally, roughly two thousand dollars is a striking price. The right comparison is not a gaming laptop; it is the multi-GPU workstation or the ongoing cloud bill you would otherwise need to run the same models.

Software and setup

Getting the most out of the chip takes a little care. On Windows, LM Studio with the Vulkan backend works without special configuration, which is the easiest path for most people. Ollama also works but may need to be manually pointed at the GPU. On Linux you can unlock more of the memory for graphics use, with reports of around 110GB usable, though that does not happen automatically and Windows does not reach the same level. If you are buying this chip specifically to run large local models, plan to spend a little time on driver and runtime setup to get the full benefit.
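Once a runtime is set up, it is worth measuring tokens per second on your own machine rather than relying on published figures. A minimal sketch for Ollama, assuming its local HTTP API is listening on the default port and that a non-streaming reply from `/api/generate` includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper names and the model tag in the usage comment are placeholders.

```python
import json
import urllib.request

# Ollama's default local endpoint; change this if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Decode speed from an Ollama generate response.

    eval_count is the number of generated tokens and eval_duration is
    the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Send one non-streaming request and return the measured tokens/s."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))


# Example usage (requires a running Ollama server and a pulled model;
# the tag below is a placeholder for whatever model you have installed):
# print(measure("your-model-tag", "Explain unified memory in two sentences."))
```

A speed far below what you expect can be a sign that the runtime is running on the CPU rather than the GPU, which is the situation the manual configuration step is meant to fix.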

Who it’s for

The Ryzen AI Max+ 395 makes sense for a specific and growing group: people who want to run large open-weight models locally, whether for privacy, offline use, cost control, or experimentation, and who value being able to load a 100B-plus model at all over running a smaller model at maximum speed. For a developer prototyping against big open models, a privacy-conscious team keeping data on-premises, or an enthusiast who wants a frontier-size model on their desk without a server rack, it is a compelling and affordable option.

It is the wrong choice if your priority is raw inference speed on models that already fit in a discrete GPU, or if you are running high-throughput multi-user inference where dedicated accelerators and their bandwidth win decisively. As always, match the hardware to the workload: the Ryzen AI Max+ 395 is a capacity machine, and it is an excellent one for the jobs that need capacity.

Frequently Asked Questions

What is the AMD Ryzen AI Max+ 395?

It is an APU from AMD’s Strix Halo family that combines 16 Zen 5 CPU cores, a 40-compute-unit RDNA 3.5 Radeon 8060S integrated GPU, and a 50 TOPS NPU, with up to 128GB of unified LPDDR5X memory shared between CPU and GPU. That large shared memory pool lets it load and run very large local AI models that will not fit on a single consumer discrete GPU.

Can it really run 120-billion-parameter models locally?

Yes. With 128GB of unified memory, it can load a model like the 120B GPT-OSS entirely in memory and run it at roughly 55 tokens per second, which is usable for single-user work. Smaller models are faster: around 100 tokens per second on a 30B model. The key is that the GPU can address the full 128GB pool, so models that a 16 or 24GB graphics card cannot hold will still run here.

Is it faster than a high-end discrete GPU?

Only on models the GPU cannot load. The chip is memory-bandwidth-constrained, at roughly 256 GB per second versus around 1,792 GB per second on an RTX 5090, and bandwidth largely determines token speed. For models that fit in a discrete GPU’s VRAM (roughly under 32B parameters), the discrete GPU is meaningfully faster. The Ryzen AI Max+ 395 wins on capacity, not raw speed.

How much does it cost?

As of mid-2026, a 64GB configuration starts around 1,499 dollars and the 128GB version is around 2,200 dollars, with the Framework Desktop at 128GB frequently cited as the best value near 1,999 dollars. Prices vary by vendor and configuration. The 128GB version is the one to get if running the largest models is the goal.

Where can I buy one?

In compact desktops and mini PCs rather than thin laptops: AMD’s Ryzen AI Halo developer platform (pre-orders began mid-2026, initially through Micro Center), the Framework Desktop, and mini PCs from vendors such as GMKtec. Confirm current availability and configuration with the vendor.

What software do I need to run models on it?

On Windows, LM Studio with the Vulkan backend works without special configuration and is the easiest path. Ollama also works but may need to be pointed at the GPU manually. On Linux you can allocate more memory to graphics (reports of around 110GB usable), which helps with the largest models, though it requires configuration. Plan for a little setup time.

Who should buy the Ryzen AI Max+ 395?

People who want to run large open-weight models locally for privacy, offline use, cost control, or experimentation, and who value being able to load a 100B-plus model over running a smaller one at maximum speed. It is not the right pick for maximum speed on models that already fit in a discrete GPU, or for high-throughput multi-user serving, where dedicated accelerators win.

Why does memory bandwidth matter so much?

Generating each token requires reading the model’s weights from memory, so the speed of a language model on a given machine is largely set by memory bandwidth. The Ryzen AI Max+ 395 has a large memory pool but modest bandwidth compared with a high-end GPU, which is why it can hold huge models but runs them at moderate speeds. Capacity and bandwidth are different things, and this chip trades toward capacity.

Digital Matters

IT Infrastructure Desk