Training a large language model takes enormous amounts of hardware and time. DeepSeek cut both by "distilling" knowledge from other LLMs, which brought their reported training cost down to $5.5 million. The video also references a similar effort that cost only $450 in compute time.
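For readers unfamiliar with the term, distillation means training a smaller "student" model to imitate a larger "teacher" model's output distribution rather than learning everything from raw data. Below is a minimal sketch of the standard soft-target distillation loss (Hinton et al., 2015) in PyTorch; the function name and toy tensors are illustrative assumptions, not DeepSeek's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: the student mimics the teacher's output distribution.

    Dividing logits by a temperature > 1 softens both distributions, so the
    student learns the teacher's relative confidences rather than just its
    top answer. The temperature**2 factor keeps gradient magnitudes
    comparable across different temperatures.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * temperature**2

# Toy usage: random logits over a 10-token vocabulary for a batch of 4.
teacher = torch.randn(4, 10)                       # frozen large model's outputs
student = torch.randn(4, 10, requires_grad=True)   # small model being trained
loss = distillation_loss(student, teacher)
loss.backward()                                    # gradients flow only into the student
print(loss.item())
```

The appeal is economic: the expensive reasoning has already been paid for in the teacher's training run, and the student only needs enough compute to copy the teacher's behavior.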
Another thing DeepSeek had going for them was human resources. First, they had people who could hand-code performance-critical routines in assembly language rather than using a high-level programming language. Second, they put a lot of time into a distributed workload management system rather than counting on the brute-force capability of the latest Nvidia gear to plow through training.
Providers all need to make some money, but what they really want is to reach AGI, Artificial General Intelligence. AGI has long been a philosophical notion, but today it's no longer crazy talk to suggest that we'll build such a thing.