DeepSeek-V3 and Reasoning Model R1: Four Views (a) Explanations (b) The Chinese Perspective (c) DeepSeek Impact on Demand for Inference Chips & Training Chips, and (d) LPBI Group: Expert Content for ML Models in Healthcare, Pharmaceutical, Medical and Life Sciences
Curator: Aviva Lev-Ari, PhD, RN
With DeepSeek dominating the news on January 27, 2025, it became compelling to cover several aspects of this hot Artificial Intelligence technology.
This curation has four Parts:
Part A: Explanations
Part B: The Chinese Perspective
Part C: DeepSeek's Potential Impact on Demand for Inference Chips & Training Chips, and
Part D: LPBI Group: Expert Content for ML Models in Healthcare, Pharmaceutical, Medical and Life Sciences
Part A: Explanations by Morgan Brown
🧵 Finally had a chance to dig into DeepSeek’s r1…
Let me break down why DeepSeek's AI innovations are blowing people's minds (and possibly threatening Nvidia's $2T market cap) in simple terms…
0/ First off, shout out to @doodlestein who wrote the must-read on this here:
The Short Case for Nvidia Stock – All the reasons why Nvidia will have a very hard time living up to the currently lofty expectations of the market.
https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda
1/ First, some context: Right now, training top AI models is INSANELY expensive. OpenAI, Anthropic, etc. spend $100M+ just on compute. They need massive data centers with thousands of $40K GPUs. It’s like needing a whole power plant to run a factory.
2/ DeepSeek just showed up and said “LOL what if we did this for $5M instead?” And they didn’t just talk – they actually DID it. Their models match or beat GPT-4 and Claude on many tasks. The AI world is (as my teenagers say) shook.
3/ How? They rethought everything from the ground up. Traditional AI is like writing every number with 32 decimal places. DeepSeek was like “what if we just used 8? It’s still accurate enough!” Boom – 75% less memory needed.
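To make the precision point concrete, here is a minimal Python sketch of the idea, assuming simple int8 quantization (DeepSeek actually uses FP8 mixed-precision training, and nothing here is their code):

```python
import numpy as np

# Illustrative sketch: symmetric int8 quantization of fp32 weights.
# DeepSeek uses FP8 mixed precision; this toy uses int8 to show the memory math.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(1_000_000).astype(np.float32)

# One scale factor maps the fp32 range onto the int8 range [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to check how much accuracy the compression costs.
recovered = weights_int8.astype(np.float32) * scale

print(f"fp32: {weights_fp32.nbytes / 1e6:.1f} MB")            # 4.0 MB
print(f"int8: {weights_int8.nbytes / 1e6:.1f} MB (75% less)") # 1.0 MB
print(f"max abs error: {np.abs(weights_fp32 - recovered).max():.5f}")
```

Storing each weight in 8 bits instead of 32 is exactly the "75% less memory" in the claim; the open question is always how much accuracy the lower precision costs.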
4/ Then there’s their “multi-token” system. Normal AI reads like a first-grader: “The… cat… sat…” DeepSeek reads in whole phrases at once. 2x faster, 90% as accurate. When you’re processing billions of words, this MATTERS.
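A toy sketch of why that matters, assuming a stand-in `dummy_model` function rather than a real LLM: emitting k tokens per forward pass cuts the number of expensive passes by a factor of k:

```python
# Toy contrast between token-at-a-time decoding and multi-token decoding.
# `dummy_model` stands in for an LLM forward pass; the pass counts are the point.

VOCAB = ["The", "cat", "sat", "on", "the", "mat", ".", "<eos>"]
forward_passes = 0

def dummy_model(context: list[str], n_tokens: int) -> list[str]:
    global forward_passes
    forward_passes += 1  # each call = one (expensive) forward pass
    return [VOCAB[(len(context) + i) % len(VOCAB)] for i in range(n_tokens)]

def decode(total: int, tokens_per_pass: int) -> list[str]:
    global forward_passes
    forward_passes, out = 0, []
    while len(out) < total:
        out += dummy_model(out, tokens_per_pass)
    return out

decode(8, 1); print("one token per pass  :", forward_passes)  # 8 passes
decode(8, 4); print("four tokens per pass:", forward_passes)  # 2 passes
```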
5/ But here’s the really clever bit: They built an “expert system.” Instead of one massive AI trying to know everything (like having one person be a doctor, lawyer, AND engineer), they have specialized experts that only wake up when needed.
[color added by curator; see Part D, below]
6/ Traditional models? All 1.8 trillion parameters active ALL THE TIME. DeepSeek? 671B total but only 37B active at once. It’s like having a huge team but only calling in the experts you actually need for each task.
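A minimal sketch of that routing idea, with toy sizes and random weights rather than DeepSeek's actual architecture: a router scores every expert for each token, but only the top-k experts run:

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D = 8, 2, 16   # toy sizes; real MoE layers route among hundreds

experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router                # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]  # only the top-k experts "wake up"
    gate = np.exp(scores[top]); gate /= gate.sum()
    # Only TOP_K of N_EXPERTS weight matrices are touched for this token --
    # the toy analogue of 37B active parameters out of 671B total.
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))

print(moe_forward(rng.standard_normal(D)).shape)  # (16,): used 2 of 8 experts
```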
7/ The results are mind-blowing:
– Training cost: $100M → $5M
– GPUs needed: 100,000 → 2,000
– API costs: 95% cheaper
– Can run on gaming GPUs instead of data center hardware
8/ “But wait,” you might say, “there must be a catch!” That’s the wild part – it’s all open source. Anyone can check their work. The code is public. The technical papers explain everything. It’s not magic, just incredibly clever engineering.
9/ Why does this matter? Because it breaks the model of “only huge tech companies can play in AI.” You don’t need a billion-dollar data center anymore. A few good GPUs might do it.
10/ For Nvidia, this is scary. Their entire business model is built on selling super expensive GPUs with 90% margins. If everyone can suddenly do AI with regular gaming GPUs… well, you see the problem.
11/ And here’s the kicker: DeepSeek did this with a team of <200 people. Meanwhile, Meta has teams where the compensation alone exceeds DeepSeek’s entire training budget… and their models aren’t as good.
12/ This is a classic disruption story: Incumbents optimize existing processes, while disruptors rethink the fundamental approach. DeepSeek asked “what if we just did this smarter instead of throwing more hardware at it?”
13/ The implications are huge:
– AI development becomes more accessible
– Competition increases dramatically
– The “moats” of big tech companies look more like puddles
– Hardware requirements (and costs) plummet
14/ Of course, giants like OpenAI and Anthropic won’t stand still. They’re probably already implementing these innovations. But the efficiency genie is out of the bottle – there’s no going back to the “just throw more GPUs at it” approach.
15/ Final thought: This feels like one of those moments we’ll look back on as an inflection point. Like when PCs made mainframes less relevant, or when cloud computing changed everything.
AI is about to become a lot more accessible, and a lot less expensive. The question isn’t if this will disrupt the current players, but how fast.
/end
P.S. And yes, all this is available open source. You can literally try their models right now. We’re living in wild times! 🚀
Momma, I’m going viral! No substack or gofundme to share but a few things to add/clarify:
1/ The DeepSeek app is not the same thing as the model. Apps are owned and operated by a Chinese corporation, the model itself is open source.
2/ Jevons paradox is the counter argument. Thanks papa @satyanadella. Could be a mix shift in chip type, compute type, etc., but we’re constrained by power and compute right now, not demand constrained.
3/ The techniques used are not groundbreaking. It’s the combination of them with the relative model performance that is so exciting. These are common engineering techniques that, combined, really fly in the face of “more compute is the only answer” for model performance. Compute is no longer a moat.
4/ Thanks to all for pointing out my NVIDIA market cap numbers miss and other nuances – will do better next time, coach. 🫡
SOURCE
https://threadreaderapp.com/thread/1883686162709295541.html
Part B: The Chinese Perspective
© 2025 Jordan Schneider
SOURCE
From: ChinaTalk <chinatalk@substack.com>
Date: Tuesday, January 28, 2025 at 9:54 AM
To: Aviva Lev-Ari <avivalev-ari@alum.berkeley.edu>
Subject: DeepSeek: The View from China
And
https://www.chinatalk.media/p/deepseek-the-view-from-china
The Mystical DeepSeek. ‘The most important thing about DeepSeek is pushing intelligence’
- Founder and CEO Liang Wenfeng is the core person of DeepSeek. He is not the same type of person as Sam Altman. He is very knowledgeable about technology.
- DeepSeek has a good reputation because it was the first to release reproducible versions of MoE, o1-style models, and so on. It succeeded by acting early, but whether or not it did the absolute best remains to be seen. Moving forward, the biggest challenge is that resources are limited and can only be invested in the highest-potential areas. DeepSeek’s research capability and culture are still strong, and if given 100,000 or 200,000 chips, they might be able to do better.
- From its preview to its official release, DeepSeek’s model’s long-context capabilities improved rapidly. DeepSeek’s 20K long-context window can be achieved with very conventional methods.
- The CEO of Scale.ai said that DeepSeek has 50,000 chips, but that is definitely not reality. According to public information, DeepSeek had 10,000 old A100 chips and possibly 3,000 H800 cards before the ban. DeepSeek pays great attention to compliance and has not purchased any non-compliant GPUs, so it should have few chips. The way the United States uses GPUs is too extravagant.
- DeepSeek focused all its efforts on a single goal and subsequently gave up many things, such as multimodality. DeepSeek is not just serving people, but seeking intelligence itself, which may have been a key factor in its success.
- In some ways, quant trading can be said to be the business model of DeepSeek. Huanfang (High-Flyer, the quantitative investment company founded by Liang Wenfeng) is the product of the last round of machine learning. DeepSeek’s highest priority is to push intelligence; money and commercialization are not high priorities. China needs several leading AI labs to explore things that can beat OpenAI. Intelligence takes a long time to develop and has begun to differentiate again this year, so new innovations are bound to result.
- From a technical perspective, DeepSeek has been instrumental as a training ground for talent.
- The business model of AI labs in the United States is not good either. AI does not have a good business model today and will require viable solutions in the future. Liang Wenfeng is ambitious; DeepSeek does not care about the model and is just heading towards AGI.
- Many of the insights from DeepSeek’s paper involve saving hardware costs. On a couple of big dimensions of scaling, DeepSeek’s techniques are able to reduce costs.
- In the short-term, everyone will be driven to think about how to make AI more efficient. In the long-run, questions about computing power will remain. Demand for compute remains strong and no company has enough.
- Discussing DeepSeek’s organization:
- When investing, we always choose the most advanced talent. But we see from DeepSeek’s model (the team is mostly smart young people who graduated from domestic universities) that a group that coheres well may also gradually advance their skills together. It has yet to be seen whether poaching one person might break DeepSeek’s advantage, but for now this seems unlikely.
- While there’s a lot of money in the market, DeepSeek’s core advantage is its culture. The research culture of DeepSeek and ByteDance are similar, and both are critical for determining the availability of funding and long-term viability. Only with an important business model can there be a sustainable culture. Both DeepSeek and ByteDance have very good business models.
- Why did DeepSeek catch up so fast?
- Reasoning models require high-quality data and training. For LLMs or multimodal AI, it’s difficult to catch up with a closed source model from scratch. The architecture of pure reasoning models hasn’t changed much, so it’s easier to catch up in reasoning.
- One reason R1 caught up quickly was that the task was not particularly difficult. Reinforcement learning only made the model’s choices more accurate. R1 did not break through the efficiency of Consensus-32; it still spends 32 times the compute, which is equivalent to moving from deep serial processing to parallelization — not pushing the boundaries of intelligence, just making it easier to reach.
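For readers unfamiliar with the term, “Consensus-32” refers to sampling a model 32 times and majority-voting the answers. A minimal sketch of that baseline, with a noisy stand-in for the model call:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    # Stand-in for one stochastic model call; real use would sample an LLM.
    return random.choice(["42"] * 3 + ["41"])

def consensus(question: str, k: int = 32) -> str:
    # k independent samples, majority vote: k times the compute, run in parallel.
    votes = Counter(sample_answer(question) for _ in range(k))
    return votes.most_common(1)[0][0]

print(consensus("What is 6 x 7?"))  # usually "42"
```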
Pioneers vs. Chasers: ‘AI Progress Resembles a Step Function – Chasers Require 1/10th the Compute’
Points 13 – 17
[Points 18-48 were a long technical discussion that we’ve machine-translated below]
Why didn’t the other companies take the DeepSeek approach: ‘Models from the big labs need to maintain a low profile’
Points 49, 50
The Divergence and Bets of 2025 Technology: ‘Can We Find Architectures Beyond Transformer?’
Points 51 – 56
Have developers moved from closed-source models to DeepSeek? ‘Not yet’
Points 57 – 62
OpenAI Stargate’s $500B Narrative and Changes in Computing Power Demand
- The emergence of DeepSeek has led people to question the latest $500B narrative from Nvidia and OpenAI. There’s no verdict yet on compute — and OpenAI’s $500B narrative is their attempt to throw themselves a lifeline.
- Regarding the doubts about OpenAI’s $500B infrastructure investment: because OpenAI is a commercial company, it could be risky if debt is involved.
- $500B is an extreme number — likely to be executed over 4 or 5 years. SoftBank and OpenAI are the leading players (the former providing capital, the latter technology) — but SoftBank’s current funds can’t support $500B; rather SoftBank is using its assets as collateral. OpenAI, meanwhile, isn’t very cash-rich either, and other AI companies are more technical participants than they are funding providers. So it will be a struggle to fully realize the $500B vision.
- OpenAI’s $500B computing power makes sense: during the exploration phase, the cost of trial and error is high, with both human and investment costs being substantial. But although the path isn’t clear and getting from o1 to R1 won’t be easy, at least we can see what the finish line looks like: we can track the intermediate markers, and from day one, aim for others’ proven end states; this gives us a better bearing on our progress. Being at the frontier exploring the next generation is most resource-intensive. The followers don’t bear exploration costs — they’re always just following. If Google/Anthropic succeed in their exploration areas, they might become the frontier company.
- In the future, Anthropic might replace all their inference with TPU or AWS chips.
- Domestic Chinese companies were previously constrained by computing power, but now it’s proven that the potential technical space is vast. For more efficient models, we might not need especially large cards — we can provide relatively customized chips that can be adapted for compatibility with AMD and ASIC. From an investment perspective, Nvidia’s moat is very high, but ASIC will have yet greater opportunities.
- The DeepSeek situation isn’t really about compute — it’s about America realizing China’s capabilities and efficiency. DeepSeek isn’t Nvidia’s vulnerability; Nvidia will grow as long as AI grows. Nvidia’s strength is its ecosystem, which has been built up over a long time. Indeed, when technology develops rapidly, the ecosystem is crucial. The real crisis comes, though, when technology matures like electricity: it becomes commoditized; then everyone will focus on products, and many ASIC chips will emerge for specific scenario optimization.
Impact on the Secondary Market: ‘Short-term sentiment is under pressure, but the long-term narrative continues’
Points 70 – 74
Open-Source vs Closed Source: ‘If capabilities are similar, closed source will struggle.’
Points 75 – 78
The Impact of DeepSeek’s Breakthrough: ‘Vision Trumps Technology’
- DeepSeek’s breakthrough made the outside world realize China’s AI strength. Previously, outsiders thought China’s AI progress lagged America by two years, but DeepSeek shows the gap is actually 3 to 9 months, and in some areas, even shorter.
- When it comes to technologies and sectors that America has historically blocked China from accessing, if China can break through nonetheless, those sectors ultimately become highly competitive. AI might follow this pattern — and DeepSeek’s success may well prove this.
- DeepSeek didn’t suddenly explode. R1’s impressive results reverberated throughout America’s entire AI establishment.
- DeepSeek stands on the shoulders of giants — but exploring the frontier still requires much more time and human capital cost. R1 doesn’t mean that future training costs will decrease.
- AI explorers definitely need more computing power; China, as a follower, can leverage its engineering advantages. How Chinese large-model teams use less computing power to produce results, thereby having some definite resilience — or even doing better — might end up being how the US-China AI landscape plays out in the future.
- China is still replicating technical solutions; reasoning was proposed by OpenAI in o1, so the next gap between various AI labs will be about who can propose the next reasoning. Infinite-length reasoning might be one vision.
- The core difference between different AI labs’ models lies not in technology, but in what each lab’s next vision is.
- After all, vision matters more than technology.
Technical Discussion
There was a deep technical discussion in the article that we’ve machine-translated below.
Technical Detail 1: Supervised Fine-Tuning (SFT). ‘No need for SFT on the reasoning level’
Points 18 – 27
Technical Detail 2: Data. ‘DeepSeek values data annotation’
Points 28 – 30
Technical Detail 3: Distillation. ‘The limit of distillation is that model diversity drops’
Points 31 – 43
Technical Detail 4: Process Reward. ‘The upper limit of process reward is human, but the upper limit of outcome supervision is the model itself.’
Points 44 – 48
SOURCE of the Chinese Perspective
Part C: DeepSeek Impact on Demand for “Inference Chips” and “Training Chips”
Watch Full Interviews with Ark’s Cathie Wood
- Ark’s Wood on DeepSeek, AI, Crypto, Trump | Cathie Wood Full Interview
https://youtu.be/EKELCEW8lNo
- Cathie Wood Talks DeepSeek Lessons, Musk, Driverless Cars & UK
https://youtu.be/aThejSuMX-I
“Inference Chips” and “Training Chips”: Technology explained
AI Chips Explained: Training vs. Inference Processors Unveiled
Inference chips and training chips are both types of AI chips that serve different purposes. Training chips are used to develop AI models, while inference chips are used to deploy those models in real-world applications.
An “inference chip” is designed to efficiently execute a trained AI model on new data, making predictions in real time while prioritizing low latency and low power consumption. A “training chip” is optimized for the computationally intensive process of initially training a machine learning model, which requires high processing power and memory bandwidth, often at the cost of power efficiency. In short, inference chips are for “applying” the learned model, while training chips are for “learning” the model itself.
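A minimal PyTorch sketch of why the two workloads stress hardware differently (illustrative, not tied to any particular chip): training runs forward and backward passes plus an optimizer update, while inference is a single gradient-free forward pass:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)

# Training step: forward pass, backward pass, and optimizer state updates --
# the compute- and memory-bandwidth-hungry workload training chips target.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()

# Inference step: one forward pass with gradients disabled --
# latency and power per prediction are what inference chips optimize.
model.eval()
with torch.no_grad():
    pred = model(torch.randn(1, 128)).argmax(dim=-1)
print(pred)
```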
SOURCE
Training vs. Inference (But, Really: Training Then Inference)
To recap: the AI training stage is when you feed data into your learning algorithm to produce a model, and the AI inference stage is when your algorithm uses that training to make inferences from data. Here’s a chart for quick reference:
| Training | Inference |
| --- | --- |
| Feed training data into a learning algorithm | Apply the model to the inference data |
| Produces a model comprising code and data | Produces output data |
| One time-ish (requirement to retain training data in case of re-training) | Often continuous |
The difference may seem inconsequential at first glance, but defining these two stages helps to show the implications for AI adoption, particularly for businesses. That is, given that inference is much less resource intensive (and therefore, less expensive) than training, it’s likely to be much easier for businesses to integrate already-trained AI algorithms with their existing systems.
And, as always, we’re big believers in demystifying terminology for discussion purposes. Let us know what you think in the comments, and feel free to let us know what you’re interested in learning about next.
SOURCE
AI 101: Training vs. Inference
November 9, 2023 by Stephanie Doyle
https://www.backblaze.com/blog/ai-101-training-vs-inference/
“AI is really two markets, training and inference. Inference is going to be 100 times bigger than training. Nvidia is really good at training but very miscast at inference.” – Chamath Palihapitiya
Let’s discuss.
Below I lay out the timestamps relevant to AMD investors:
7:35 – Meta AI business strategy
10:00 – Open source impact on LLM marketplace
12:10 – Telecom analogy (capex discussion)
16:35 – Closed source model economic viability
19:50 – Meta overspend on training (Nvidia)
SOURCE
https://www.reddit.com/r/AMD_Stock/comments/1cf765y/ai_is_really_two_markets_training_and_inference/
Part D: LPBI Group: Expert Content for ML Models in Healthcare, Pharmaceutical, Medical and Life Sciences
LPBI Group’s Journal http://pharmaceuticalintelligence.com has a fully developed ontology for the Healthcare, Pharmaceutical, Medical and Life Sciences domains of knowledge.
The ontology comprises more than 750 categories of research. Each category consists of multiple scientific articles curated by domain knowledge experts in the fields of Healthcare, Pharmaceutical, Medical and Life Sciences.
- Each article is a token, a Non-Fungible Token (NFT): a unique, scientifically written piece that constitutes a Prior Art artifact from the perspective of intellectual property and copyright law.
- Each category of research is “An expert system knowledge base”
- Examples: The last column in this table represents the number of articles in this category of research
- Each curation is written by an expert in this domain, and
- Each of the 469 articles in Example #1 was assigned to this category of research by an EXPERT in this domain.
- The universe of 469 articles represents an “Expert System Knowledge Base” in the domain of biological networks, gene regulation and evolution.
- Example #1 comprises 469 NFTs
- Example #2 comprises 1,022 NFTs
- Example #3 comprises 681 NFTs
- An ML model can be trained on the content of a Master file that includes the content of all 469 article files mentioned in Example #1 – that process is performed on Training Chips (see the sketch after this list).
- The outcomes of the model involve the phase of Inference; that process is performed on Inference Chips.
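A minimal sketch of that pipeline, with hypothetical file names and paths (not LPBI’s actual storage): the training phase assembles the category’s articles into one Master file, and the inference phase would query the fine-tuned model:

```python
from pathlib import Path

# Hypothetical layout: one plain-text file per curated article in the category.
category_dir = Path("biological_networks_gene_regulation_evolution")

# Training phase (runs on training chips): build the Master file corpus.
articles = sorted(category_dir.glob("*.txt"))  # e.g., the 469 article files
master = "\n\n".join(p.read_text(encoding="utf-8") for p in articles)
Path("master_file.txt").write_text(master, encoding="utf-8")
print(f"Master file built from {len(articles)} articles")

# A domain model would then be fine-tuned on master_file.txt.
# Inference phase (runs on inference chips): the trained model answers queries;
# the model object and its API are placeholders here, not a real library call.
```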
Example #1: 469 articles in Biological Networks, Gene Regulation and Evolution
Expert, Author, Writer (EAW): Dr. Larry Bernstein
Degree: BS, MS, MD
Specialty: Clinical Pathology
e-Mail: larry.bernstein@gmail.com
Points (a) to (f) above are applicable as well to Examples #2 and #3, below, or to any other category of research from the universe of 750+ categories that consists of more than 50 articles.
Example #2: 1,022 articles in CANCER BIOLOGY & Innovations in Cancer Therapy
Contributor EAW: Prabodh Kumar Kandala, PhD
Specialty: Preclinical Oncology
e-Mail: Prabodh.kandala@gmail.com
Contributor EAW: Ritu Saxena, PhD
e-Mail: ritu.uab@gmail.com
Contributor EAW: Dr. Larry Bernstein
Degree: BS, MS, MD
Specialty: Clinical Pathology
e-Mail: larry.bernstein@gmail.com
Contributor EAW: Stephen J. Williams
Degree: Ph.D. Pharmacology
Specialty: cancer pharmacology, ovarian specialty
e-Mail: sjwilliamspa@comcast.net
Phone: 215-487-0259
Contributor EAW: Tilda Barliya
Degree: PhD
Specialty: Cancer biology, cell biology, nanotechnology and drug delivery
e-Mail: tildabarliya@gmail.com
Phone: +972-50-8622289
Example #3: 681 articles in Frontiers in Cardiology and Cardiovascular Disorders
EAW: Aviva Lev-Ari, PhD, RN
EAW: Justin D. Pearlman
Degree: MD ME PhD MA FACC
Specialty: Internal Medicine, Cardiology, Cardiovascular Radiology, Image Processing, Computer Science, Electronic Records
e-Mail: jdpmdphd@gmail.com
Phone: 617-894-6888
Collectively, the categories of research are:
- “Expert systems domain knowledge bases”
- They are ready for ML model development in every domain where a category comprises more than 50 articles.
- Total number of categories of research in the Journal’s Ontology N = 757 on 1/28/2025
[Image: The Scientific Journal]