Open source local models... deepseek-r1 isn't going to take your job (yet)
But why is no-one talking about the Middle Eastern models?
Let’s celebrate Chinese New Year by disrupting OpenAI’s business model. Let’s say that we want to run our models ourselves on our own computers: no cloud usage. We proclaim empowerment for individuals, the right to our own computation and all those techno-liberties.
There are many models that you can run on your own computer. It doesn’t matter how old your computer is: Google’s gemma2 or Microsoft’s phi3 will be up to the job.
I assembled a collection of them and asked “Why is the sky blue?”
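Here’s a minimal sketch of what “asking” looks like in practice, assuming an ollama-style local HTTP API on its default port and a model you’ve already pulled; swap in whatever local runtime and model tag you actually use.

```python
# Minimal sketch: ask a locally-hosted model a question, assuming the ollama
# HTTP API is running on its default port and the model has been pulled
# (e.g. `ollama pull gemma2`).
import requests

def ask(model: str, prompt: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=3600,  # the big models really can take this long
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("gemma2", "Why is the sky blue?"))
```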
I ran these models on an i7-8700 system (6 cores, 12 threads) running at 3.20GHz with 64GB of RAM, supported by an NVIDIA GeForce GTX 1050 Ti. The idea is that this should reflect what business desktops are likely to have in them this year. Laptops are a bit behind that, of course.
I also plotted the number of tokens output per second from each model that I had easy access to. That does (naturally) show a very strong relationship with the complexity of the model. Bigger and more complex models take longer to generate output.
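The timing itself is nothing fancy. A rough sketch of how the tokens-per-second numbers can be collected, again assuming the ollama API (which reports eval_count and eval_duration with each response); the model tags below are illustrative.

```python
# Rough sketch of collecting tokens-per-second figures. Assumes the ollama
# HTTP API, which reports eval_count (output tokens) and eval_duration
# (nanoseconds spent generating) in each response.
import requests

MODELS = ["gemma2", "phi4", "qwen2.5-coder", "deepseek-r1", "qwq"]  # illustrative tags
PROMPT = "Why is the sky blue?"

for model in MODELS:
    data = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=3600,
    ).json()
    tps = data["eval_count"] / (data["eval_duration"] / 1e9)
    print(f"{model}: {data['eval_count']} tokens at {tps:.1f} tokens/second")
```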
Too slow
The larger models (e.g. the 70 billion parameter version of deepseek-r1) are too slow for synchronous work. One token per second is roughly 40 words per minute (a token is about three-quarters of an English word). Most people can type faster than that for an extended time, and almost everyone can dictate faster. A model that slow is not useful in a chat, unless you need to brainstorm some ideas. Something that has a wide variety of knowledge and takes a few minutes to come up with an idea is better than staring at a blank wall, but only just.
Generally, a big model like that only makes sense in integrated systems: it could generate answers to customer support queries while they are still in a queue; it could file email; it could do a quality review of other work; it could orchestrate other tasks. But nothing where you would wait for a response.
Too chatty
There wasn’t much relationship between model size — which should be a measure of how smart the model was — and how much it had to say in an answer.
The model I marked as “deepseek-r1_latest” has 7 billion parameters, and can generate 7.3 tokens per second on my computer (about 300 words a minute). You can just about keep up with reading it. But deepseek-r1 has been trained to spend a lot of time reasoning about what it is going to say before it says it. It spent 2 minutes 12 seconds all up (about 9 TikTok videos): it wrote 582 words in the “<think>” section deciding what it needed to say, and then the actual answer was 208 words.
So really, you are only getting about 2-3 useful tokens per second out of deepseek-r1. That’s not as fast as a human being, so it’s relegated to asynchronous work.
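If you want to put a number on the “useful” rate yourself, strip out the <think> section and divide what’s left by the wall-clock time. A quick sketch, using the 208-word answer and 132 seconds from the run above:

```python
# Quick sketch: estimate the "useful" output rate by throwing away the
# <think>...</think> section that reasoning models emit before the answer.
import re

def useful_words_per_second(full_output: str, elapsed_seconds: float) -> float:
    answer = re.sub(r"<think>.*?</think>", "", full_output, flags=re.DOTALL)
    return len(answer.split()) / elapsed_seconds

# With the run above: a 208-word answer delivered over 132 seconds is about
# 1.6 words per second, i.e. roughly 2 tokens per second of actual answer.
```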
And that was lightning fast compared to the model I’ve marked as “qwq_latest”, which spent more than 20 minutes answering the question. It’s a combination of being quite slow (1.2 tokens per second) and rambling on a lot. The end result is a lot slower than a distracted-by-social-media intern.
The challengers
So here’s my list of models that you can run on a moderate desktop or good laptop today, that output faster than you can type or read, and that are capable enough to do interesting and useful things. They are also all free to use.
Remember: there is no data-security concern with these models. They run completely under your control and the data never leaves your computer. This makes them somewhat compelling regardless of who created them.
Microsoft’s Phi4 — trained on textbooks, it’s very good at giving textbook answers (unsurprisingly). The Phi models and QwQ were the only ones to talk about Rayleigh’s relationship: the amount of scattering is inversely proportional to the fourth power of the wavelength (there’s a quick sanity-check of that after this list).
Alibaba’s Qwen — I was using the version that is tweaked to be good at programming, but it did fine on this (simple) question, as it does with most tasks.
Google’s Gemma 2 — the tiniest and fastest model; it even considered questions like “why isn’t the sky violet?”
Falcon3 — why is no-one talking about this? Funded by the Abu Dhabi government, it got straight to the point with the most succinct answer to the question. It’s not like the Abu Dhabi government is going to give up and let the USA and China dictate their AI future for them.
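And for the curious, that fourth-power relationship Phi4 and QwQ mentioned is easy to sanity-check yourself. The wavelengths below are round textbook numbers, not anything the models quoted:

```python
# Back-of-the-envelope check on the fourth-power relationship: how much more
# strongly is blue light scattered than red? Rough wavelengths only.
blue_nm, red_nm = 450, 700
ratio = (red_nm / blue_nm) ** 4
print(f"Blue light is scattered about {ratio:.1f}x more strongly than red")  # ~5.9x
```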