For various reasons, I was able to spend much more time on this topic since Sunday than I usually would have. On Sunday morning, the topic somehow piqued my interest, and I have been trying to understand, as a non-expert, what is going on here.
For full disclosure: I have no positions in any of the MAG7 stocks, but that might make me just as biased as someone who has mortgaged his family home to invest in NVIDIA.
On Sunday morning, I initially used mostly Twitter, but during the day it was flooded with MAGA crap. Twitter is still a good place at an early stage for "virally developing situations", but it gets washed over with (AI-written) turd pretty quickly.
The DeepSeek topic is interesting on many dimensions. Here are some facts (taken from Wikipedia, but confirmed by other sources):
- DeepSeek is a subsidiary of an AI/quant investment firm called HighFlyer based in China. It was spun out in 2023 as a subsidiary, funded by its parent's money, and released its first really good model (V2) in May 2024, outperforming local Big Tech rivals while simultaneously undercutting them massively on price.
- The model that caused the "Panic of January 27th" was actually DeepSeek R1, the reasoning model that had already been released in November 2024 as a lite version, followed by V3, a very powerful (normal) LLM, in December.
- On January 20th, DeepSeek then released the "full" R1 version, which outperformed the competing ChatGPT o1 model in most dimensions (or was at least equal).
So it took quite some time until people realized that there was a really powerful Chinese model out there. In my opinion, that timeline also contradicts the "hedge fund releases top LLM to make money by shorting MAG7 stocks" theory to a very large degree.
What seems to have shocked most people in the beginning was the fact that DeepSeek mentioned that the pure "compute cost" of training was only 5 mn USD. This compares to a total of 1 bn USD "training cost" for ChatGPT's o1 model, for which OpenAI had just started to charge 200 USD per month for unlimited access. One of the reasons for the low cost was that they trained on a limited number of old NVIDIA chips. At least for me, it was not possible to compare those numbers even at a high level. What was included, for instance, in the 1 bn for ChatGPT? Nobody really knows.
Very soon, Twitter began to fill up with posts claiming that this is all a Chinese hoax, it cannot be, they have cheated, it's a Chinese psyop, they want to steal your data, they stole from the great American models, they want to destabilize America, etc. MAGA in full force. So if you checked Twitter on Sunday afternoon, you would most likely have believed that this was nothing.
However, the Chinese had not only granted access to the model through a web app, but also offered it for free download as an "open source" model, including a very detailed paper about what they did.
Some experts quickly pointed out that the new model did indeed include a couple of very smart "tweaks", or even architectural differences, that made the model not only easier to train but also more performant on old hardware.
It was also really interesting to see how the "Big Tech" guys reacted to DeepSeek, depending on what their vested interests are.
So where does that leave us? To be clear, I haven't become an AI expert over the past 3 days. All I can do is look at what people who know much more than I do are saying and weigh it against their vested interests.
So for me the most probable interpretation is as follows:
- DeepSeek is really a very good model and surprised most of the American players
- Maybe the true training cost was higher than 5 mn USD, but the tweaks they made suggest that they were quite limited in computational resources
- The model seems to contain a couple of innovative features that make it both easier to train and able to run on less demanding hardware, and therefore cheaper
So is this the "Black Swan" for the MAG7? Personally, I don't think so. Overall AI adoption will clearly speed up if models are cheaper to train and cheaper to run.
Maybe some of the big players will scale back their data center plans somewhat, maybe not. However, it makes the story more complex. The story so far was that only with the newest NVIDIA chips could you develop a really good model. Access to the newest generation of NVIDIA chips was the single most important factor determining the future of any AI start-up or other AI model company.
I guess this will definitely change. New players will emerge and offer models with great capabilities requiring a lot less CapEx than xAI, OpenAI, Anthropic, etc. This will be great news for users; for the existing players it will mean that the cost of capital has increased for the time being. How many "professional" users will pay OpenAI 200 USD/month for something they can download for free and run themselves at a fraction of the cost? I assume that many of the current LLM developers will scramble to make their current cash buffers last longer than planned before the next funding round. And in the VC space, the 2024 AI vintage might look very bad in 12-18 months' time already.
It is therefore also not so surprising that Apple, which so far has not officially developed its own LLM, actually saw its share price increase. They will have many more partners to choose from in the future and might easily be able to run "distilled" models on their phones, which could be a great value proposition for privacy-minded customers.
But what about NVIDIA? Honestly, I do not know. My best guess is that maybe in a few quarters growth starts to slow down a little, maybe not. After 3 days of researching DeepSeek, I am not able to fully understand their business model and all the implications of this.
Summary & takeaways
Full disclosure: This post was written without the help of any LLM; during my research, however, I did use various AI tools.