TASM April 25, 2024

Sat May 11, 2024

Note: This post was really late. I've been taking time to do more programming than I've done since my younger startup days. As I write this note, we're in the second week of May and I'm playing catch-up with various Antler and TASM notes. Sorry, kinda not sorry.


We aren't going through Zvi's thing this week because our main news guy isn't here, and no one else has read the entire thing yet. So we're just talking about interesting things we know about.

The Talk - RLHF

What is RL?

RL (reinforcement learning) is a way of training models by putting them in an environment, letting them take actions, and rewarding the actions you want.
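To make that loop concrete, here's a minimal sketch of the environment/action/reward cycle. It wasn't part of the talk. It assumes the simplest possible environment (a multi-armed bandit, where each action pays out with a fixed probability) and an epsilon-greedy agent. The names `run_bandit`, `reward_probs` and `epsilon` are made up for illustration.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit environment.

    reward_probs[a] is the (hypothetical) chance that action a pays out 1.0.
    Returns the agent's value estimates and how often it took each action.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # times each action was taken
    values = [0.0] * n_actions  # running average reward per action

    for _ in range(steps):
        # The agent takes an action: usually its current best guess,
        # occasionally a random one so it keeps exploring.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment responds with a reward.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # The agent nudges its estimate for that action toward the reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
print("estimated values:", [round(v, 2) for v in values])
print("times chosen:", counts)
```

Nothing ever tells the agent which action is "correct". It only sees rewards, and ends up favoring the action that pays out most often.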

How is RL different from ML?

What is RLHF?

The architecture of RLHF

The "human feedback" part can be thought of as a process that acts on the reward predictor.
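Here's a rough sketch of what "acting on the reward predictor" could look like. This is my own illustration rather than anything shown at the meetup. It assumes feedback arrives as pairwise preferences ("I like A better than B"), that the reward predictor is a plain linear model, and that it's fit with a Bradley-Terry style logistic loss. The "human" is a hypothetical stand-in function with a hidden scoring rule, and every name below is invented for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predicted_reward(w, x):
    """Linear reward predictor: a weighted sum of the features of x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def simulated_human_prefers_first(a, b):
    """Hypothetical stand-in for a human labeler comparing two outcomes.

    The hidden rule (likes feature 0, dislikes feature 1) is an assumption
    made up for this example; the reward predictor never sees it directly.
    """
    hidden = (1.0, -1.0)
    return predicted_reward(hidden, a) > predicted_reward(hidden, b)


def train_reward_predictor(pairs, dims, epochs=50, lr=0.1):
    """Fit weights so that preferred items get higher predicted reward.

    Models P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    and does gradient ascent on the log-likelihood of the human's choices.
    """
    w = [0.0] * dims
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = [p - r for p, r in zip(preferred, rejected)]
            p_correct = sigmoid(predicted_reward(w, diff))
            for i in range(dims):
                w[i] += lr * (1.0 - p_correct) * diff[i]
    return w


# Collect "human feedback": 200 comparisons between random outcomes,
# stored as (preferred, rejected) pairs.
rng = random.Random(0)
pairs = []
for _ in range(200):
    a = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    b = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    pairs.append((a, b) if simulated_human_prefers_first(a, b) else (b, a))

w = train_reward_predictor(pairs, dims=2)
agreement = sum(
    predicted_reward(w, a) > predicted_reward(w, b) for a, b in pairs
) / len(pairs)
print("learned weights:", [round(wi, 2) for wi in w])
print("agreement with the labeler:", agreement)
```

The human never writes down a reward function. They only compare outcomes, and the predictor is adjusted until it agrees with those comparisons. An RL agent would then be trained against `predicted_reward` instead of a hand-coded reward.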

Creative Commons License

all articles at langnostic are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

Reprint, rehost and distribute freely (even for profit), but attribute the work and allow your readers the same freedoms.
