Tangential, but I think I understand how these reasoning-based systems are supposed to work. I saw some sample output from R1, and it generates a thought process for answering the prompt before actually answering it. I can see how that would make the answers more logically sound, but the thinking part was something like 2 to 3 times longer than the actual answer.
I'm assuming OpenAI, Anthropic, etc. are doing something similar. As a concept I see nothing wrong with it, but since these services charge per token, wouldn't this process balloon the number of tokens generated per query? It would make querying them much more expensive.
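Back-of-the-envelope, assuming the reasoning trace is billed like any other output tokens (the price and token counts below are invented, just to show the multiplier, not any provider's actual pricing):

    # Rough cost estimate: all numbers are hypothetical, for illustration only.
    PRICE_PER_OUTPUT_TOKEN = 10.00 / 1_000_000  # e.g. $10 per 1M output tokens

    answer_tokens = 500           # tokens in the visible answer
    reasoning_multiplier = 2.5    # thinking trace ~2-3x the answer length

    plain_cost = answer_tokens * PRICE_PER_OUTPUT_TOKEN
    reasoning_cost = answer_tokens * (1 + reasoning_multiplier) * PRICE_PER_OUTPUT_TOKEN

    print(f"without reasoning: ${plain_cost:.4f} per query")
    print(f"with reasoning:    ${reasoning_cost:.4f} per query "
          f"({reasoning_cost / plain_cost:.1f}x)")

With those made-up numbers, a 2-3x longer thinking trace roughly triples or quadruples the per-query output cost, which is the ballooning I'm asking about.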