this post was submitted on 04 Jul 2025

Instead of just generating the next response, it simulates entire conversation trees to find paths that achieve long-term goals.

How it works:

  • Generates multiple response candidates at each conversation state
  • Simulates how conversations might unfold down each branch (using the LLM to predict user responses)
  • Scores each trajectory on metrics like empathy, goal achievement, coherence
  • Uses MCTS with UCB1 to efficiently explore the most promising paths
  • Selects the response that leads to the best expected outcome
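
To make the selection step concrete, here is a minimal sketch of UCB1 selection and score backpropagation over a conversation tree. It is not the project's actual code: the `Node` class, the function names, and the exploration constant `c = sqrt(2)` (the textbook default) are all assumptions for illustration, and the LLM calls that generate candidates and score trajectories are left out.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One conversation state in the search tree (hypothetical structure)."""
    response: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 value: mean score plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_score / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node) -> Node:
    """Pick the child with the highest UCB1 value."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits))


def backpropagate(node: Optional[Node], score: float) -> None:
    """Add a trajectory score to every node on the path back to the root."""
    while node is not None:
        node.visits += 1
        node.total_score += score
        node = node.parent


def best_response(root: Node) -> str:
    """Return the visited candidate with the best mean score."""
    visited = [ch for ch in root.children if ch.visits > 0]
    return max(visited, key=lambda ch: ch.total_score / ch.visits).response
```

In a full loop, each iteration would call `select_child` repeatedly from the root down to a leaf, simulate the conversation from there, score it (for example a weighted mix of empathy, goal achievement, and coherence), and pass that score to `backpropagate`.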

Limitations:

  • Scoring is done by the same LLM that generates responses
  • Branch pruning is naive - just threshold-based instead of something smarter like progressive widening
  • Memory usage grows with tree size; there is currently no node recycling
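
For comparison with threshold-based pruning, here is a rough sketch of the progressive widening rule mentioned above. In its usual form, a node may only add a new child while its child count is below k * N^alpha, where N is the node's visit count. The function names and the defaults `k = 1.0` and `alpha = 0.5` are assumptions for illustration, not values from the project.

```python
import math


def max_children(visits: int, k: float = 1.0, alpha: float = 0.5) -> int:
    """Cap on children for a node visited `visits` times (always at least 1)."""
    return max(1, math.ceil(k * visits ** alpha))


def should_expand(num_children: int, visits: int,
                  k: float = 1.0, alpha: float = 0.5) -> bool:
    """True if the node is allowed to generate one more response candidate."""
    return num_children < max_children(visits, k, alpha)
```

With these defaults a node gets a second candidate only after a few visits and a fourth after about ten, so rarely visited branches stay narrow, which also limits how fast the tree (and memory use) grows.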
yogthos@lemmygrad.ml, 2 weeks ago (last edited 2 weeks ago)

I think that's an interesting approach as well. There are a bunch of research papers on using MCTS with LLMs, a few examples here:

https://arxiv.org/abs/2503.19309

https://arxiv.org/abs/2505.23229

https://arxiv.org/abs/2504.02426

https://arxiv.org/abs/2504.11009

https://arxiv.org/abs/2502.13428