this post was submitted on 27 Jan 2025
From someone in the field
https://github.com/huggingface/open-r1
Unfortunately, that's not very clear without more detail. What kind of reward model are they talking about?
This is potentially a 1000x difference in required resources, assuming you believe DeepSeek's quoted spending figure, so it would have to be an extraordinary change.