this post was submitted on 22 Jan 2025

Technology

R1 uses a training method called direct reinforcement learning. Reinforcement learning is a paradigm distinct from both supervised and unsupervised learning. Instead of imitating labelled, step-by-step solutions, the model learns only from a reward signal. It explores different approaches by generating several candidate answers to each problem. These answers are grouped, and each one is evaluated with a reward score. The score acts as a fitness function, so the model can learn which strategies work and adjust them over time. By reinforcing the approaches that earn high rewards, R1 progressively improves its problem-solving ability. This resembles how humans learn to solve problems through trial and error.
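Here is a toy Python sketch of the group-scoring step. It assumes a simple rule-based reward that gives 1.0 if the final answer matches a reference answer and 0.0 otherwise. Each reward is then normalized against its group's mean and standard deviation. That normalization is the core idea of Group Relative Policy Optimization (GRPO), the algorithm described in the DeepSeek-R1 paper. The names `reward` and `group_advantages` and the sample answers are hypothetical. This is not DeepSeek's code, and it leaves out the actual model update.

```python
import statistics


def reward(answer: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the final answer matches the reference."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """Score each answer relative to its group: (reward - group mean) / group std.

    Uses the population standard deviation; real implementations differ in details
    such as sample vs. population std and adding a small epsilon.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every answer scored the same, so no answer is better than the others.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Hypothetical group of sampled answers to the question "What is 12 * 12?"
candidates = ["144", "124", "144", "142"]
rewards = [reward(c, "144") for c in candidates]
advantages = group_advantages(rewards)

for c, r, a in zip(candidates, rewards, advantages):
    print(f"{c}: reward={r}, advantage={a:+.2f}")
```

Answers that beat their group's average get a positive advantage and would be reinforced. Below-average answers get a negative advantage and would be discouraged. If every answer in a group earns the same reward, that group provides no learning signal.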

KrasnaiaZvezda@lemmygrad.ml, 5 months ago

It is more expensive, although DeepSeek's model is much cheaper than the others, which makes this much less of a factor. Additionally, these "reasoning models" aren't necessarily better at every task, so for many things a normal, cheaper model may still be preferred.
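A reasoning model can cost more per query even when its per-token price is lower. The back-of-the-envelope sketch below shows why. All token counts and per-million-token prices are made-up placeholders, not real pricing for DeepSeek or any other provider. The sketch also assumes that the extra reasoning tokens are billed as output tokens.

```python
def query_cost(input_tokens: int, output_tokens: int,
               input_price: float, output_price: float) -> float:
    """Dollar cost of one query, with prices quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical query: a 500-token prompt and a 300-token final answer.
PROMPT_TOKENS = 500
ANSWER_TOKENS = 300
# Hypothetical: the reasoning model first writes 3,000 reasoning tokens,
# assumed here to be billed as output tokens.
REASONING_TOKENS = 3_000

# Placeholder prices in dollars per million tokens, not real figures.
normal = query_cost(PROMPT_TOKENS, ANSWER_TOKENS,
                    input_price=2.0, output_price=8.0)
reasoning = query_cost(PROMPT_TOKENS, ANSWER_TOKENS + REASONING_TOKENS,
                       input_price=0.5, output_price=2.0)

print(f"normal model:    ${normal:.5f} per query")
print(f"reasoning model: ${reasoning:.5f} per query")
```

In this toy example, the reasoning model charges a quarter of the per-token price. It still costs about twice as much per query because it produces eleven times as many output tokens.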