Today I learned I can run my very own DeepSeek R1 chatbot on just $6,000 of PC hardware, no megabucks Nvidia GPUs required
The catch is that it's only really fast enough to serve its semi-useful gibberish to one user at a time.
Got the impression that a bazillion dollars' worth of GPUs are required to run a cutting-edge chatbot? Think again. Matthew Carrigan, an engineer at AI tools outfit HuggingFace, claims that you can run the hot new DeepSeek R1 LLM on just $6,000 of PC hardware. The kicker? You don't even need a high-end GPU.
Carrigan's suggested build involves a dual-socket AMD EPYC motherboard and a couple of compatible AMD chips to go with it. Apparently, the spec of the CPUs isn't actually that critical. Instead, it's all about the memory.
"Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost, $6,000. All download and part links below." (Matthew Carrigan, January 28, 2025)
"We are going to need 768GB (to fit the model) across 24 RAM channels (to get the bandwidth to run it fast enough). That means 24 x 32 GB DDR5-RDIMM modules," Carrigan explains.
Links are helpfully provided and the RAM alone comes to about $3,400. Then you'll need a case, PSU, a mere 1 TB SSD, some heatsinks and fans.
Indeed, Carrigan says this setup gets you the full DeepSeek R1 experience with no compromises. "The actual model, no distillations, and Q8 quantization for full quality," he explains.
From there, simply "throw" on Linux, install llama.cpp, download 700 GB of weights, input a command line string Carrigan helpfully provides and Bob's your large language model running locally, as they say.
Notable in all this is a total absence of mention of expensive Nvidia GPUs. So what gives? Well, Carrigan provides a video of the LLM running locally on this setup plus a rough performance metric.
"The generation speed on this build is 6 to 8 tokens per second, depending on the specific CPU and RAM speed you get, or slightly less if you have a long chat history. The clip above is near-realtime, sped up slightly to fit video length limits," he says.
The video shows the model generating text at a reasonable pace. But that, of course, is for just one user. Open this setup up to multiple users and the per-user performance would, we assume, quickly become unusable.
In other words, that's $6,000 of hardware to support, in effect, a single user. So, this likely isn't an approach that's practical for setting up an AI business serving hundreds, thousands or even millions of users. For that kind of application, GPUs may well be more cost effective, even with their painful unit price.
Carrigan suggests a GPU-based build capable of running the full Q8 model would quickly run into six figures, albeit with better performance.
"And if you got this far: Yes, there's no GPU in this build! If you want to host on GPU for faster generation speed, you can! You'll just lose a lot of quality from quantization, or if you want Q8 you'll need >700GB of GPU memory, which will probably cost $100k+" (Matthew Carrigan, January 28, 2025)
But it is intriguing to learn that you don't actually need a bazillion dollars' worth of GPUs to get a full-spec LLM running locally. Arguably, it also provides insight into the true scale of intelligence implied by the latest LLMs.
As an end user experiencing what can seem like consciousness streaming out of these bots, you might assume it takes huge computation to generate an LLM's output. But this setup is doing it on a couple of AMD CPUs.
So, unless you think a couple of AMD CPUs are capable of consciousness, this hardware solution demonstrates the prosaic reality of even the very latest and most advanced LLMs. Maybe the AI apocalypse isn't quite upon us after all.
Jeremy has been writing about technology and PCs since the 90nm Netburst era (Google it!) and enjoys nothing more than a serious dissertation on the finer points of monitor input lag and overshoot followed by a forensic examination of advanced lithography. Or maybe he just likes machines that go “ping!” He also has a thing for tennis and cars.