            
AI companies are violating a basic social contract of the web and ignoring robots.txt (www.theverge.com)
          
Better yet, point the crawler to a massive text file of almost, but not quite, grammatically correct garbage to poison the model: something it will recognize as language and internalize, but that will severely degrade the quality of its output.
Maybe one of the lorem ipsum generators could help.
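For anyone curious what such a generator might look like, here is a rough sketch of the "almost but not quite grammatical" idea above. The word lists, the templates, and the function names `garbage_sentence` and `garbage_text` are all made up for illustration, and whether text like this would actually degrade a model is not something this sketch tests.

```python
import random

# Hypothetical word pools; any vocabulary would do.
NOUNS = ["crawler", "server", "model", "robot", "page", "index"]
VERBS = ["parses", "ignores", "fetches", "poisons", "degrades", "reads"]
ADJECTIVES = ["massive", "polite", "broken", "hungry", "quiet"]
ADVERBS = ["quickly", "almost", "never", "loudly"]

# Hypothetical templates: the first is well formed, the others have
# deliberately scrambled word order so the text looks like language
# but is not quite grammatical.
TEMPLATES = [
    "the {adj} {noun} {verb} the {noun2} {adv}.",
    "{adv} the {noun} {verb} a {adj} {noun2} of.",
    "a {noun} {adj} {verb} the {noun2} {adv} the.",
]


def garbage_sentence(rng):
    """Fill one randomly chosen template with randomly chosen words."""
    sentence = rng.choice(TEMPLATES).format(
        adj=rng.choice(ADJECTIVES),
        noun=rng.choice(NOUNS),
        noun2=rng.choice(NOUNS),
        verb=rng.choice(VERBS),
        adv=rng.choice(ADVERBS),
    )
    # Capitalize the first letter so it resembles an ordinary sentence.
    return sentence[0].upper() + sentence[1:]


def garbage_text(n_sentences, seed=0):
    """Return n_sentences of filler text; the seed makes output repeatable."""
    rng = random.Random(seed)
    return " ".join(garbage_sentence(rng) for _ in range(n_sentences))


if __name__ == "__main__":
    print(garbage_text(5))
```

Writing the result of `garbage_text` to a file in a loop would produce the "massive text file"; a real lorem ipsum generator or a Markov chain over actual prose would likely give more convincing output than these fixed templates.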