this post was submitted on 17 Aug 2025
89 points (90.8% liked)

Technology

39732 readers
549 users here now

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.


Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 6 years ago
all 32 comments
[–] Pechente@feddit.org 126 points 4 weeks ago (3 children)

Wikipedia going down like that makes me sad, especially since their traffic costs have gone up significantly due to AI crawlers.

[–] clb92@feddit.dk 45 points 4 weeks ago* (last edited 4 weeks ago) (3 children)

Why would anyone crawl Wikipedia when you can freely download the complete databases in one go, likely served on a CDN...

But sure, crawlers, go ahead and spend a week doing the same thing in a much more expensive, disruptive and error-prone way...

[–] eager_eagle@lemmy.world 15 points 4 weeks ago* (last edited 4 weeks ago) (1 children)

There are valid reasons for not wanting the whole database e.g. storage constraints, compatibility with ETL pipelines, and incorporating article updates.

What bothers me is that they -- apparently -- crawl instead of just... using the API, like:

https://en.wikipedia.org/w/api.php?action=parse&format=json&page=Lemmy_%28social_network%29&formatversion=2

I'm guessing they just crawl the whole web and don't bother to add a special case to turn Wikipedia URLs into their API versions.
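For illustration, the "special case" could be fairly small. Below is a minimal sketch, assuming the crawler only needs to handle plain `https://<lang>.wikipedia.org/wiki/<Title>` article URLs. The helper name `article_url_to_api_url` is hypothetical. It rewrites such a URL into the same `action=parse` API call as the example above, using only Python's standard library.

```python
from urllib.parse import urlencode, urlparse, unquote


def article_url_to_api_url(article_url):
    """Map a Wikipedia article URL to its action=parse API URL.

    Hypothetical helper: handles only /wiki/<Title> paths on *.wikipedia.org
    and returns None for anything else (query strings and #fragments are dropped).
    """
    parsed = urlparse(article_url)
    host = parsed.netloc
    is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    if not is_wikipedia or not parsed.path.startswith("/wiki/"):
        return None

    # Decode any percent-escapes so the title is re-encoded exactly once below.
    title = unquote(parsed.path[len("/wiki/"):])

    # Same parameters as the example API URL in the comment above.
    params = {"action": "parse", "format": "json", "page": title, "formatversion": 2}
    return f"{parsed.scheme}://{host}/w/api.php?{urlencode(params)}"


print(article_url_to_api_url("https://en.wikipedia.org/wiki/Lemmy_(social_network)"))
# https://en.wikipedia.org/w/api.php?action=parse&format=json&page=Lemmy_%28social_network%29&formatversion=2
```

A real crawler would also need rate limiting and a descriptive User-Agent, which Wikimedia asks API clients to send.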

[–] clb92@feddit.dk 10 points 4 weeks ago

> valid reasons for not wanting the whole database e.g. storage constraints

If you're training AI models, surely you have a couple TB to spare. It's not like Wikipedia takes up petabytes or anything.

[–] limer@lemmy.ml 13 points 4 weeks ago

Vibe coding

[–] Pechente@feddit.org 1 points 4 weeks ago (2 children)

My comment was based on a podcast I listened to (Tech Won’t Save Us, I think?). My guess is they also wanna crawl all the edits, discussions, etc., which usually aren’t included in the complete dumps.

[–] mesamunefire@piefed.social 3 points 4 weeks ago

Good podcast.

[–] clb92@feddit.dk 3 points 4 weeks ago

Dumps with complete page edit history can be downloaded too, as far as I can see, so no need to crawl that.

[–] ThePantser@sh.itjust.works 14 points 4 weeks ago (1 children)

Yes, they should really block crawlers or force them to pay. The only way I can think of is to require registering an account to access content, but that goes against what they originally intended. But these are new times and it's probably for the best. It wouldn't be hard to flag obvious AI scrapers.

[–] skvlp@lemmy.wtf 4 points 4 weeks ago

It seems there are ways to stop crawlers. Do a web search for "stop ai crawlers" or similar to learn more. I hope it doesn’t escalate into an arms race, but I realise I might be disappointed.
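The most common first step is a robots.txt that names the AI crawlers' user agents. Here is a minimal sketch, assuming the bots identify themselves honestly as `GPTBot` (OpenAI) and `CCBot` (Common Crawl) and actually honor robots.txt, which is voluntary. The file contents and the helper name `build_robots_parser` are illustrative. Python's standard `urllib.robotparser` is used to check how the rules would be interpreted.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block two named AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_robots_parser(text):
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_robots_parser(ROBOTS_TXT)
for agent in ("GPTBot", "CCBot/2.0", "Mozilla/5.0"):
    print(agent, rp.can_fetch(agent, "https://example.org/wiki/Some_Page"))
```

This only stops crawlers that choose to comply. Anything that ignores robots.txt or fakes its user agent needs server-side measures, which is where the arms race comes in.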

[–] SebaDC@discuss.tchncs.de 5 points 4 weeks ago

And the click-through rate is dropping.

[–] v4ld1z@lemmy.zip 44 points 4 weeks ago (1 children)

Can't reddit just die or something?

[–] Pechente@feddit.org 34 points 4 weeks ago (3 children)

Ironically, it’s getting more popular, but to me it seems to be growing the way Facebook used to. At some point your weird uncle is on it and all the good content creators just leave.

[–] balder1991@lemmy.world 10 points 4 weeks ago

In fact, I think that at least in Brazil, Reddit is becoming more and more popular. Go back five years or so and Brazilian subs in Portuguese were small and low-traffic, except for one or two (e.g. /r/brasil).

Now there are a bunch of different themes and I see new topics being discussed quite often.

I believe the reason is that Facebook has enshittified so much that its communities are dying quickly, and Brazilians are finding Reddit works better for simple discussions. Also, no one posts anything personal on FB anymore; it’s all Instagram-style reposts, so it has lost any purpose.

[–] mesamunefire@piefed.social 7 points 4 weeks ago

Also, Google makes it the first result for a lot of searches now.

[–] lance20000@lemmy.ca 5 points 4 weeks ago (1 children)

Reddit stopped being fun last November.

I am not engaging with it as much as I used to, and now I am actively annoyed with it and trying to make Lemmy my default again.

[–] Pechente@feddit.org 8 points 4 weeks ago (1 children)

What happened last November? I quit when they turned off the API.

[–] lance20000@lemmy.ca 2 points 4 weeks ago
[–] dparticiple@sh.itjust.works 18 points 4 weeks ago

By whose measurements? [citation needed]

[–] foremanguy92_@lemmy.ml 14 points 4 weeks ago (1 children)
[–] HoleSailor@feddit.org 8 points 4 weeks ago (1 children)
[–] foremanguy92_@lemmy.ml 6 points 4 weeks ago

I never really supported the porn industry, but I kinda miss the porn magazines and the websites made for it.

For multiple reasons, viewing porn on mainstream social media is far more dangerous than on dedicated websites, especially for young people... Sadly... One more thing that tends to get worse...

[–] Jack_Burton@lemmy.ca 12 points 4 weeks ago (1 children)

The fact that the biggest increase is ChatGPT and the biggest decrease is Wikipedia really sums things up.

[–] alphabethunter@lemmy.world 12 points 3 weeks ago (1 children)

The biggest decrease is WhatsApp, though.

[–] Jack_Burton@lemmy.ca 1 points 3 weeks ago

Whoops, shows how much I was paying attention haha

[–] pineapple@lemmy.ml 6 points 4 weeks ago (1 children)

What's Bing doing above Wikipedia?

[–] J_on_LemmyML@lemmy.ml 7 points 4 weeks ago

It's the default search engine in the Microsoft Edge browser and in Windows search.

[–] ragas@lemmy.ml 4 points 4 weeks ago (1 children)

How the fuck is WhatsApp a webpage?

[–] swizzelmuppet@feddit.org 4 points 4 weeks ago

web.whatsapp.com is the web UI for it. Quite handy.

Top 10 websites?

[–] Zerush@lemmy.ml 1 points 4 weeks ago* (last edited 4 weeks ago)

The rise of chatbots goes hand in hand with the decline of intelligence and human common sense. AI can be useful, but not when it is used as a substitute for one's own intelligence rather than as a simple research tool. Worse still is the filter bubble effect added by Google and Bing.