DeepSeek's Story Is Deeper than You Imagine

Let's cut to the chase. The rise of DeepSeek turned out to be more about politics than technology. Some would even argue that DeepSeek's entire existence is a psyop. Either way, the past week has been a hectic one for tech giants like OpenAI. A new player in the AI market, the Chinese company DeepSeek, took the world by storm when its latest model, R1, was reported to match or outperform its American counterparts in areas such as programming. Believe it or not, that isn't why this story is fascinating. ChatGPT and Claude, 'the kings of AI', each charge $20 a month for access to their latest models. DeepSeek, on the other hand, publishes its models openly, so you can run them locally on your own machine (hardware permitting) and match or beat the aforementioned giants for exactly ZERO bucks.

People who work with artificial intelligence are sharply divided on DeepSeek's legitimacy: some downplay its advancements, while others praise it for embarrassing 'monopolists' like NVIDIA. This isn't a page about politics, and we have no reason to advocate for either side. We just want to give you a breakdown of the entire story.

What Is DeepSeek?

Like OpenAI, DeepSeek is a company that does research and development in artificial intelligence. It was founded in mid-2023 and is funded by the Chinese hedge fund High-Flyer. Things get a bit crazy with the claim that its model cost just $5.5 million to develop, compared with OpenAI's reported $100 million for GPT-4. That claim sent an economic tsunami through the market and wiped roughly $600 billion off NVIDIA's market value. You might be wondering whether $5.5 million is really the total cost. Of course, it isn't. DeepSeek themselves make clear that the figure covers only the final training run of DeepSeek-V3, the base model that R1 is built on. That means it does not account for buying the GPUs, running costs, or R&D. To be clear, we aren't claiming to have caught DeepSeek red-handed; they say this themselves in their own research paper:

Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

So no, it wasn't a $5.5 million side project that toppled OpenAI.
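If you want to see where the headline number comes from, the arithmetic in DeepSeek's excerpt is easy to check. The short Python script below simply re-derives their figures from the GPU-hour counts they report. Keep in mind that the $2-per-GPU-hour rental price is DeepSeek's own assumption, not a real invoice, and nothing here includes the research, ablation, or hardware costs they explicitly exclude.

```python
# Figures taken from the DeepSeek-V3 excerpt quoted above.
PRETRAIN_GPU_HOURS = 2_664_000
CONTEXT_EXTENSION_GPU_HOURS = 119_000
POST_TRAINING_GPU_HOURS = 5_000
RENTAL_PRICE_PER_GPU_HOUR = 2.0  # USD; DeepSeek's stated assumption, not an actual bill
CLUSTER_GPUS = 2048
GPU_HOURS_PER_TRILLION_TOKENS = 180_000


def total_gpu_hours() -> int:
    """Pre-training + context extension + post-training."""
    return PRETRAIN_GPU_HOURS + CONTEXT_EXTENSION_GPU_HOURS + POST_TRAINING_GPU_HOURS


def training_cost_usd() -> float:
    """Total GPU hours priced at the assumed rental rate."""
    return total_gpu_hours() * RENTAL_PRICE_PER_GPU_HOUR


def days_per_trillion_tokens() -> float:
    """Wall-clock days to train on one trillion tokens using the whole cluster."""
    return GPU_HOURS_PER_TRILLION_TOKENS / CLUSTER_GPUS / 24


def pretraining_days() -> float:
    """Wall-clock days for the full pre-training stage on the whole cluster."""
    return PRETRAIN_GPU_HOURS / CLUSTER_GPUS / 24


if __name__ == "__main__":
    print(f"Total GPU hours: {total_gpu_hours():,}")  # Total GPU hours: 2,788,000
    print(f"Training cost: ${training_cost_usd():,.0f}")  # Training cost: $5,576,000
    print(f"Days per trillion tokens: {days_per_trillion_tokens():.1f}")  # 3.7
    print(f"Pre-training days: {pretraining_days():.1f}")  # 54.2
```

The pre-training stage works out to about 54 days, which lines up with the paper's "less than two months."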

Is the Model Actually Good?

DeepSeek is very interesting because of its training methodology. Earlier chat models, including OpenAI's GPT models, were shaped largely by supervised fine-tuning (learning from human-written example answers) followed by reinforcement learning from human feedback. DeepSeek-R1 puts large-scale reinforcement learning front and center; its sibling, R1-Zero, was trained with reinforcement learning alone, with no supervised fine-tuning first. What does that actually mean, though? Training DeepSeek is like training a dog. When it does something right, it is rewarded. Conversely, when it does something wrong, it is, you could say, 'punished'. This pushes DeepSeek to spell out its reasoning step by step before answering, which makes it particularly strong at logic-heavy tasks. You can actually see this in action while DeepSeek is 'Thinking...':

DeepSeek Reasoning With Itself

Do you notice how it talks to itself to work out how to respond? It draws on what it learned during reinforcement learning to reason its way to a conclusion before answering, which can help it catch its own mistakes and may make it less prone to hallucinations, though it does not eliminate them.
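To make the dog-training analogy a little more concrete, here is a toy Python sketch. DeepSeek's R1 paper describes simple rule-based rewards (one for getting the final answer right, one for wrapping the reasoning in <think> tags) and a reinforcement learning method called GRPO, which scores each sampled answer relative to the average of a group of answers to the same question. The reward weights, the exact-match grading, and the function names below are our own illustrative assumptions, not DeepSeek's code, and the sketch only computes the scores; it does not update any model.

```python
import re
import statistics

# Hypothetical format check: reasoning inside <think>...</think>, then the final answer.
THINK_PATTERN = re.compile(r"^<think>.*?</think>\s*(.+)$", re.DOTALL)

FORMAT_REWARD = 0.5    # illustrative weight, not DeepSeek's actual value
ACCURACY_REWARD = 1.0  # illustrative weight, not DeepSeek's actual value


def reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward: a bonus for using <think> tags plus a bonus for a correct answer."""
    text = completion.strip()
    match = THINK_PATTERN.match(text)
    score = FORMAT_REWARD if match else 0.0
    # Grade only the text after </think> when tags are present; exact match is a simplification.
    answer = match.group(1).strip() if match else text
    if answer == reference_answer:
        score += ACCURACY_REWARD
    return score


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: how far each sample's reward sits above or below its group's mean."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every sample scored the same, so none is preferred over the others.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    # Four hypothetical sampled answers to "What is 2 + 3?"
    samples = [
        "<think>2 plus 3 is 5</think> 5",
        "<think>2 plus 3 is 6</think> 6",
        "5",
        "6",
    ]
    rewards = [reward(s, "5") for s in samples]
    print(rewards)  # [1.5, 0.5, 1.0, 0.0]
    print([round(a, 2) for a in group_relative_advantages(rewards)])
```

Answers with a positive advantage are the 'good dog' cases that training would make more likely; negative ones are the 'punished' cases. Scoring against the group average means no separate model is needed to judge what counts as a good answer.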

How?

We're not entirely sure how DeepSeek managed to achieve this with supposedly lower costs, older hardware, and different training methods. You see, Chinese firms are subject to U.S. export controls that restrict their access to the most advanced AI chips, which is why DeepSeek's paper cites the H800, a cut-down variant NVIDIA made for the Chinese market, rather than the flagship H100. This leaves two possibilities: either the export controls aren't effective and Chinese companies are acquiring these chips illegally, or the restrictions forced them to become self-sufficient and innovate in efficiency and research collaboration. Either way, this is great news for us consumers.

Good for Us

To be frank, most people don't actually care about DeepSeek's possible ties to the CCP. Let's face it: politics is not going to save Anthropic, OpenAI, and Musk's xAI from being forced into consumer-friendly practices like competitive pricing. Even if DeepSeek were proven guilty of malicious data collection and censorship, the average user probably wouldn't care, and the recent rush to RedNote suggests as much:

Americans flock to Chinese TikTok alternative RedNote: ‘We have the same struggles’
Despite security concerns, Americans are flooding the app, where Chinese users are welcoming them with open arms – and Luigi Mangione memes

This leaves American AI firms with only one choice: developing powerful models more efficiently and at lower cost.