Discussion about this post

Daniel Popescu / ⧉ Pluralisk

Wow, the point about RLHF not obviously fitting the sequential decision-making framing, which you discussed in your last post, really got me thinking. Does this suggest the 'state' for LLM alignment is inherently more nebulous or continuous than in typical RL scenarios?

