Uploads from Programming Throwdown

Curated by: Programming Throwdown (374 videos)

Currently Playing: 180: Reinforcement Learning

Intro topic: Grills

News/Links:
* You can’t call yourself a senior until you’ve worked on a legacy project
  * https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
* Recraft might be the most powerful AI image platform I’ve ever used — here’s why
  * https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
* NASA has a list of 10 rules for software development
  * https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
* AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
  * https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show
* Patrick:
  * The Player of Games (Iain M. Banks)
  * https://a.co/d/1ZpUhGl (non-affiliate)
* Jason:
  * Basic Roleplaying Universal Game Engine
  * https://amzn.to/3ES4p5i

Patreon Plug
* https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show
* Patrick:
  * Pokemon Sword and Shield
* Jason:
  * Features and Labels (https://fal.ai)

Topic: Reinforcement Learning
* Three types of AI
  * Supervised Learning
  * Unsupervised Learning
  * Reinforcement Learning
* Online vs Offline RL
* Optimization algorithms
  * Value optimization
    * SARSA
    * Q-Learning
  * Policy optimization
    * Policy Gradients
    * Actor-Critic
    * Proximal Policy Optimization
* Value vs Policy Optimization
  * Value optimization is more intuitive (Value loss)
  * Policy optimization is less intuitive at first (policy gradients)
  * Converting values to policies in deep learning is difficult
* Imitation Learning
  * Supervised policy learning
  * Often used to bootstrap reinforcement learning
* Policy Evaluation
  * Propensity scoring versus model-based
* Challenges to training RL model
  * Two optimization loops
    * Collecting feedback vs updating the model
  * Difficult optimization target
  * Policy evaluation
* RLHF & GRPO

Programming Throwdown Episode 180, March 17, 2025
★ Episode details: https://share.transistor.fm/s/3d34f87a
★ Additional episodes: https://www.programmingthrowdown.com/
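
The show notes list SARSA and Q-Learning under value optimization without spelling out how they differ. As a minimal sketch (not taken from the episode), the two can be written as the same tabular temporal-difference update with different bootstrap targets. The function names, the toy Q-table, and the values of `alpha` and `gamma` below are all assumptions chosen for illustration.

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha of the way toward the target."""
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical toy table: q[state][action] for 2 states and 2 actions.
q = [[0.0, 0.0], [1.0, 3.0]]

# Q-Learning: target = 1.0 + 0.5 * max(1.0, 3.0) = 2.5
td_update(q, 0, 0, q_learning_target(q, 1.0, 1, gamma=0.5), alpha=0.5)

# SARSA, assuming the agent took action 0 next: target = 1.0 + 0.5 * 1.0 = 1.5
td_update(q, 0, 1, sarsa_target(q, 1.0, 1, 0, gamma=0.5), alpha=0.5)

print(q[0])  # [1.25, 0.75]
```

Given the same transition, Q-Learning moves its estimate toward the greedy next-state value, while SARSA moves toward the value of the action the current policy actually chose, which is why the two entries in the table end up different.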
