Uploads from Dwarkesh Patel

Curated by: Dwarkesh Patel (977 videos)

Currently Playing: How GPT, Claude, and Gemini are actually trained and served – Reiner Pope

Did a very different format with Reiner Pope: a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.

It’s a bit technical, but I encourage you to hang in there; it’s really worth it. Fewer than a handful of people understand the full stack of AI, from chip design to model architecture, as well as Reiner does. It was a real delight to learn from him.

Reiner is CEO of MatX, a new chip startup (full disclosure: I’m an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.

𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒

* Wrote up some flashcards and practice problems to help myself retain what Reiner taught. Hope they're helpful to you too! https://reiner-flashcards.vercel.app/
* Download markdown of transcript here to chat with an LLM: https://gist.github.com/dwarkeshsp/79100f0fdeed69d76241903bb0604dbe
* Transcript: https://www.dwarkesh.com/p/reiner-pope

𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒

- Jane Street needs constant access to incredibly low-latency compute. I recently asked one of their engineers, Clark, to talk me through how they meet these demands. Our conversation, which touched on everything from FPGAs to liquid cooling, was extremely helpful as I prepped to interview Reiner. You can watch the full discussion and explore Jane Street’s open roles at https://janestreet.com/dwarkesh
- Google’s Gemma 4 is the first open model that’s let me shut off the internet and create a fully disconnected "focus machine". This is because Gemma is small enough to run on my laptop, but powerful enough to actually be useful. So, to prep for this interview, I downloaded Reiner’s scaling book, disconnected from wifi, and used Gemma to help me break down the material. Check it out at https://goo.gle/Gemma4
- Cursor helped me turn some notes I took on how gradients flow during large-scale pretraining into a great animation. At first, I wasn’t sure of the best way to visualize the concept, but Cursor’s Composer 2 Fast model let me iterate on different ideas almost instantaneously. You can check out the animation in my recent blog post: https://www.dwarkesh.com/p/what-i-learned-april-15. And if you have something to visualize yourself, go to https://cursor.com/dwarkesh

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒

0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, “As we now know, pipelining is not wise.”
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography

Tracks in this Playlist