TASM April 25, 2024

Sat May 11, 2024

Note: This post is really late. I've been spending more time programming than I have since my younger startup days. As I write this note, it's the second week of May and I'm playing catch-up on various Antler and TASM notes. Sorry, kinda not sorry.


We aren't going through Zvi's thing this week because our main news guy isn't here, and no one else has read the entire thing yet. So we're just talking about interesting things we know about.

The Talk - RLHF

What is RL?

RL (reinforcement learning) is a way of training models by placing them in an environment, letting them take actions, and giving reward for the actions you want to see more of.
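To make that loop concrete, here is a minimal sketch of it, not anything shown in the talk. The two-action environment, the `run_bandit` name, and the epsilon-greedy rule are all assumptions picked for illustration: action 1 stands in for "the action you want", so it is the only one that pays out.

```python
import random

def run_bandit(steps=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)

    # Hypothetical environment: action 1 is the behaviour we want,
    # so it is the only action that earns reward.
    def environment(action):
        return 1.0 if action == 1 else 0.0

    values = [0.0, 0.0]  # running average reward seen for each action
    counts = [0, 0]      # how many times each action was taken

    for _ in range(steps):
        # Explore occasionally; otherwise take the best-looking action.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: values[a])

        reward = environment(action)

        # Incremental mean update of the estimate for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts

values, counts = run_bandit()
print(values, counts)
```

The agent starts out with no idea which action is good, stumbles onto action 1 while exploring, and from then on takes it most of the time because its estimated reward is higher.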

How is RL different from ML?

What is RLHF?

The architecture of RLHF

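The "human feedback" part can be thought of as a process that acts on the reward predictor: human judgments are used to train the predictor, and the predictor in turn supplies the reward signal for RL.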
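One common way to picture this is a reward predictor trained on pairwise comparisons, where a person says which of two outputs they prefer. The sketch below is an assumption about how that could look, not the architecture from the talk: the linear predictor, the logistic (Bradley-Terry style) preference loss, the function names, and the tiny hand-written comparison set are all hypothetical.

```python
import math

def train_reward_predictor(comparisons, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights from (preferred, rejected) feature pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            diff = [p - r for p, r in zip(preferred, rejected)]
            # Predicted reward gap between the preferred and rejected item.
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Probability the predictor agrees with the human's choice.
            p_agree = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log(p_agree).
            for i in range(n_features):
                w[i] += lr * (1.0 - p_agree) * diff[i]
    return w

def predicted_reward(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))

# Made-up human feedback: each pair is (preferred, rejected).
# The imaginary labeller likes outputs with more of feature 0.
comparisons = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.3], [0.2, 0.3]),
    ([0.9, 0.9], [0.1, 0.9]),
]

w = train_reward_predictor(comparisons, n_features=2)
print(w)
```

The human never writes down a reward number here. They only pick winners, and the predictor's weights get nudged until its scores agree with those picks. An RL loop would then use `predicted_reward` in place of a hand-coded reward.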

