I haven't looked into many LLMs, but Microsoft will use your data to train the next version of Copilot. If you're a paying enterprise customer, your data won't be used for that.
I suspect Google is also using every bit of data they can get their hands on. They have a habit of handing out shiny new stuff in exchange for your data. That's exactly why Android and Chrome don't require your money.
There might be a way to mitigate that damage. You could categorize the training data by source. If it's verified to be written by a human, give it a higher weight. If not, it's probably contaminated by AI, so give it a lower weight. Humans still exist, so it's still possible to obtain clean data. Quantity is still a problem, though, since these models are really thirsty for data.
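To make the weighting idea a bit more concrete, here's a rough sketch of what it could look like when sampling training documents. Everything in it is made up for illustration: the `Document` class, the source labels, and the weight values are my assumptions, not how any real training pipeline does it.

```python
import random
from dataclasses import dataclass

# Hypothetical weights per source category. The numbers are assumptions
# chosen for illustration, not tuned values.
SOURCE_WEIGHTS = {
    "verified_human": 1.0,
    "unverified": 0.2,
}


@dataclass
class Document:
    text: str
    source: str  # expected to be a key of SOURCE_WEIGHTS


def sampling_weights(docs):
    """Return one weight per document; unknown sources get the lowest weight."""
    fallback = min(SOURCE_WEIGHTS.values())
    return [SOURCE_WEIGHTS.get(d.source, fallback) for d in docs]


def sample_batch(docs, k, rng=None):
    """Draw k documents with replacement, favoring higher-weight sources."""
    rng = rng or random.Random()
    return rng.choices(docs, weights=sampling_weights(docs), k=k)


corpus = [
    Document("forum post from a verified account", "verified_human"),
    Document("scraped page of unknown origin", "unverified"),
    Document("another scraped page", "unverified"),
]
weights = sampling_weights(corpus)
batch = sample_batch(corpus, k=4, rng=random.Random(0))
```

The unverified documents aren't thrown away here, they just get drawn less often. That's one way to deal with the quantity problem: you keep the volume but let the verified human text count for more.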