Cursor on Dwarkesh Podcast

Podcast: Dwarkesh Podcast
Host: Dwarkesh Patel
Episode: Alex Imas and Phil Trammell – What remains scarce after AGI?
Publish date: June 5, 2026
Placement: post-roll, at 60:16 in the episode
Category: SaaS
Length: 1:13

Transcript

One of the biggest problems in RL right now is credit assignment, because you have these extremely long rollouts, and you need to know why they succeeded or failed. One of Cursor's researchers, Sasha Rush, gave me a blackboard lecture on how they use targeted RL with textual feedback to deal with this problem and train Composer 2.5. I filmed it on my iPhone, so apologies for the camera work.

So we've generated this output. Yeah. It's just a sequence of tokens. We're gonna send that sequence of tokens to this model that's gonna read it. And then it's gonna isolate a specific, say, time that it says is problematic. Yep. Then we're just gonna do text manipulation. We're just gonna take that trajectory, and we're literally just gonna, like, smash in some extra tokens.

After Cursor injects these hint tokens, they run another forward pass. The trajectory itself doesn't change, but the hint causes the model to assign lower probability to the error tokens. Cursor then trains the original model to match those probabilities, basically teaching it to down-weight these specific mistakes. There's a lot more nuance that we couldn't include in this minute. If you wanna watch the full thing, I posted it on my Twitter. And if you wanna try out Composer 2.5, head to cursor.com/dwarkesh.
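
The hint-then-distill step described in the transcript can be illustrated with a toy sketch. This is not Cursor's implementation: there is no real language model here, just a list of logits for a single flagged position. The function names, `hint_penalty`, `lr`, and `steps` are all hypothetical, and the "second forward pass with hint tokens" is replaced by an assumed stand-in that lowers the logit of the flagged error token. What the sketch does show is the shape of the idea: freeze the hinted distribution as a target, then train the unhinted logits to match it.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hinted_probs(logits, error_token, hint_penalty):
    """Stand-in for the forward pass with hint tokens in context.

    ASSUMPTION: the hint's only effect is to lower the logit of the
    flagged token by `hint_penalty`. A real model would recompute
    probabilities from the modified context instead.
    """
    hinted = list(logits)
    hinted[error_token] -= hint_penalty
    return softmax(hinted)


def distill_step(logits, target, lr):
    """One gradient step on cross-entropy(target, softmax(logits)).

    The gradient of that loss with respect to the logits is (p - target).
    """
    p = softmax(logits)
    return [z - lr * (pi - ti) for z, pi, ti in zip(logits, p, target)]


def train_on_flagged_position(logits, error_token,
                              hint_penalty=2.0, lr=0.5, steps=50):
    """Train unhinted logits to match the frozen, hinted distribution."""
    target = hinted_probs(logits, error_token, hint_penalty)
    for _ in range(steps):
        logits = distill_step(logits, target, lr)
    return logits, target


if __name__ == "__main__":
    start = [2.0, 1.0, 0.5]  # token 0 is the flagged "error" token
    trained, target = train_on_flagged_position(start, error_token=0)
    print("before:", softmax(start)[0])
    print("after: ", softmax(trained)[0])
    print("target:", target[0])
```

The target is computed once and held fixed, which mirrors the transcript's description of training the original model to match the probabilities from the hinted pass. Only the flagged position is touched, which is the sense in which the update is targeted.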

Auto-transcribed · lightly cleaned