this post was submitted on 24 May 2025
1395 points (99.0% liked)
Science Memes
14653 readers
3258 users here now
Welcome to c/science_memes @ Mander.xyz!
A place for majestic STEMLORD peacocking, as well as memes about the realities of working in a lab.
Rules
- Don't throw mud. Behave like an intellectual and remember the human.
- Keep it rooted (on topic).
- No spam.
- Infographics welcome, get schooled.
This is a science community. We use the Dawkins definition of meme.
Research Committee
Other Mander Communities
Science and Research
Biology and Life Sciences
- !abiogenesis@mander.xyz
- !animal-behavior@mander.xyz
- !anthropology@mander.xyz
- !arachnology@mander.xyz
- !balconygardening@slrpnk.net
- !biodiversity@mander.xyz
- !biology@mander.xyz
- !biophysics@mander.xyz
- !botany@mander.xyz
- !ecology@mander.xyz
- !entomology@mander.xyz
- !fermentation@mander.xyz
- !herpetology@mander.xyz
- !houseplants@mander.xyz
- !medicine@mander.xyz
- !microscopy@mander.xyz
- !mycology@mander.xyz
- !nudibranchs@mander.xyz
- !nutrition@mander.xyz
- !palaeoecology@mander.xyz
- !palaeontology@mander.xyz
- !photosynthesis@mander.xyz
- !plantid@mander.xyz
- !plants@mander.xyz
- !reptiles and amphibians@mander.xyz
Physical Sciences
- !astronomy@mander.xyz
- !chemistry@mander.xyz
- !earthscience@mander.xyz
- !geography@mander.xyz
- !geospatial@mander.xyz
- !nuclear@mander.xyz
- !physics@mander.xyz
- !quantum-computing@mander.xyz
- !spectroscopy@mander.xyz
Humanities and Social Sciences
Practical and Applied Sciences
- !exercise-and sports-science@mander.xyz
- !gardening@mander.xyz
- !self sufficiency@mander.xyz
- !soilscience@slrpnk.net
- !terrariums@mander.xyz
- !timelapse@mander.xyz
Memes
Miscellaneous
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Some details. One of the major players doing the tar pit strategy is Cloudflare. They're a giant in networking and infrastructure, and they use AI (more traditional, nit LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.
Making nonsense is indeed detectable, but that misunderstands the purpose: economics. Scraping bots are used because they're a cheap way to get training data. If you make a non zero portion of training data poisonous you'd have to spend increasingly many resources to filter it out. The better the nonsense, the harder to detect. Cloudflare is known it use small LLMs to generate the nonsense, hence requiring systems at least that complex to differentiate it.
So in short the tar pit with garbage data actually decreases the average value of scraped data for bots that ignore do not scrape instructions.
The fact the internet runs on lava lamps makes me so happy.