Is Mastodon's Link-Previewing Overloading Servers?

pedroapero@lemmy.ml · 6 months ago

Is Mastodon's Link-Previewing Overloading Servers?

algernon@lemmy.ml · 6 months ago

…and here I am, running a blog that if it gets 15k hits a second, it won’t even bat an eye, and I could run it on a potato. Probably because I don’t serve hundreds of megabytes of garbage to visitors. (The preview image is also controllable iirc, so just, like, set it to something reasonably sized.)

moreeni@lemm.ee · 6 months ago

Wait, you’re going to tell me you don’t actually have to serve bloat on a blog like It’s FOSS? No way!

algernon@lemmy.ml · 6 months ago

I only serve bloat to AI crawlers.

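# In the http context: flag requests whose User-Agent matches a listed crawler.
map $http_user_agent $badagent {
  default     0;
  # list of AI crawler user agents in "~crawler 1" format
}

# In the server context: send flagged requests to /gpt, whatever they asked for.
if ($badagent) {
  rewrite ^ /gpt;
}

# /gpt proxies a plain-text copy of the Bee Movie script.
location /gpt {
  proxy_pass https://courses.cs.washington.edu/courses/cse163/20wi/files/lectures/L04/bee-movie.txt;
}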

…is a wonderful thing to put in my nginx config. (you can try curl -Is -H "User-Agent: GPTBot" https://chronicles.mad-scientist.club/robots.txt | grep content-length: to see it in action ;))

delirious_owl@discuss.online · 6 months ago

Your bandwidth bill lol

algernon@lemmy.ml · 6 months ago

I don’t think serving 86 kilobytes to AI crawlers will make any difference in my bandwidth use :)

delirious_owl@discuss.online · 6 months ago

Oic, it’s a redirect now

algernon@lemmy.ml · 6 months ago

It’s not. It just doesn’t get enough hits for that 86k to matter. Fun fact: most AI crawlers hit /robots.txt first, get served the Bee Movie script, fail to interpret it, and leave without crawling further. If I’d let them crawl the entire site, that’d result in about two megabytes of traffic. By serving an 86 KB file that doesn’t pass as robots.txt and has no links, I actually save bandwidth. Not on a single request, but by preventing a hundred others.

Skull giver@popplesburger.hilciferous.nl · edit-2 6 months ago

deleted by creator

Moonrise2473@lemmy.ml · 6 months ago

Or serve a gzip bomb (is that possible?)
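On whether that is possible: in principle yes, because highly repetitive data compresses extremely well under gzip, so a small response can expand into a very large one on the client. The following is only a minimal sketch that measures the ratio for a buffer of zero bytes; the function name is made up for illustration, and it says nothing about how any particular crawler would react.

```python
import gzip


def compression_ratio(raw_size: int) -> float:
    """Return raw size divided by gzip-compressed size for raw_size zero bytes."""
    raw = bytes(raw_size)  # a buffer of raw_size zero bytes
    compressed = gzip.compress(raw)
    return raw_size / len(compressed)


if __name__ == "__main__":
    size = 10_000_000  # 10 MB of zeros, an arbitrary illustrative size
    print(f"{size} bytes compress at roughly {compression_ratio(size):.0f}:1")
```

To have a client actually inflate such a payload, the server would also have to send it with a `Content-Encoding: gzip` header; that part is not shown here.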

Skull giver@popplesburger.hilciferous.nl · edit-2 6 months ago

deleted by creator

kopper [they/them]@lemmy.blahaj.zone · 6 months ago

https://git.arielaw.ar/arisunz/ir34/

jmcs@discuss.tchncs.de · 6 months ago

There’s no reason why 114MB of static content over 5 minutes should be an issue for a public-facing website. Hell, I probably could serve that and the images with a Raspberry Pi over my home Internet and still have bandwidth to spare.

I think they are throwing stones at the wrong glass house/software stack.

Sleepkever@lemm.ee · 6 months ago

It is not, but a write amplification of 36704:1 is one hell of an exploitable surface.

With that same Raspberry Pi and a single 1 Gbit connection you could also do 333333 POST requests of 3 KB in a single second, made from fake accounts, preferably with a fake follower on a lot of fediverse instances. That would result in those fediverse servers theoretically requesting 333333 * 114MB = ~38Gigabyte/s. At least for as long as you can keep posting new posts for a few minutes and the hosting servers still have bandwidth. DDoSing with a ‘botnet’ of fediverse servers/accounts made easy!

I’m actually surprised it hasn’t been tried yet now that I think about it…
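A quick sanity check of the arithmetic in this comment, as a sketch only. It assumes decimal units (1 KB = 1000 bytes, 1 Gbit/s = 10^9 bits per second) and takes the 3 KB post size and the 114 MB of triggered traffic per post from the comment itself; the function names are hypothetical. Under those assumptions a 1 Gbit/s link fits roughly 41,667 such posts per second (333,333 posts of 3 KB would need about 8 Gbit/s), and 333,333 posts at 114 MB each multiply out to about 38 terabytes per second rather than gigabytes. The amplification argument holds either way, since each small post still triggers orders of magnitude more traffic than it costs to send.

```python
# Decimal units are an assumption; the thread does not say which it uses.
KB = 1_000
MB = 1_000_000
TB = 1_000_000_000_000

POST_SIZE = 3 * KB            # size of one post, from the comment
TRAFFIC_PER_POST = 114 * MB   # preview traffic triggered per post, from the comment


def posts_per_second(link_bits_per_second: float, post_size_bytes: int) -> float:
    """How many posts of the given size fit through the link each second."""
    return link_bits_per_second / 8 / post_size_bytes


def amplified_bytes_per_second(posts: float, traffic_per_post_bytes: int) -> float:
    """Aggregate traffic requested from the linked site per second."""
    return posts * traffic_per_post_bytes


if __name__ == "__main__":
    feasible = posts_per_second(1e9, POST_SIZE)
    print(f"posts/s on 1 Gbit/s: {feasible:,.0f}")
    quoted = amplified_bytes_per_second(333_333, TRAFFIC_PER_POST)
    print(f"333,333 posts/s -> {quoted / TB:.1f} TB/s")
    limited = amplified_bytes_per_second(feasible, TRAFFIC_PER_POST)
    print(f"link-limited rate -> {limited / TB:.2f} TB/s")
```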

algernon@lemmy.ml · 6 months ago

That would result in those fediverse servers theoretically requesting 333333 * 114MB = ~38Gigabyte/s.

On the other hand, if the linked site didn’t serve garbage and fit in about 1 MB like a normal site, then this would be only ~325 MB/s, and while that’s still high, it’s not the end of the world. If it’s a site that actually puts effort into being optimized, and a request fits in ~300 KB (still a lot, in my book, for what is essentially a preview, with only tiny parts of the actual content loaded), then we’re looking at 95 MB/s.

If said site puts effort into making its previews reasonable, and serves ~30 KB, then that’s 9 MB/s. It’s 3190 in the Year of Our Lady Discord. A potato can serve that.
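The point of this comment is that the aggregate load scales linearly with how much each preview fetch downloads. A small sketch of that scaling, taking the quoted baseline (~38 gigabytes per second at 114 MB per fetch) at face value and assuming decimal units; the function name is hypothetical. The results land near the ~325, 95, and 9 figures above, with the small gaps presumably down to rounding or binary versus decimal units. Whatever the true baseline is, the ratios are what matter.

```python
KB = 1_000
MB = 1_000_000


def scaled_rate(baseline_rate: float, baseline_size: int, new_size: int) -> float:
    """Scale an aggregate rate linearly with the bytes downloaded per preview fetch."""
    return baseline_rate * new_size / baseline_size


if __name__ == "__main__":
    baseline_rate = 38_000 * MB   # the "~38Gigabyte/s" figure quoted in the thread
    baseline_size = 114 * MB      # traffic per post behind that figure
    for label, size in [("1 MB", 1 * MB), ("300 KB", 300 * KB), ("30 KB", 30 * KB)]:
        rate = scaled_rate(baseline_rate, baseline_size, size)
        print(f"{label:>6} per fetch -> {rate / MB:,.0f} MB/s")
```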

MentalEdge@sopuli.xyz · 6 months ago

Foss project: has 100 open issues

A year passes

Foss project: 50 issues got resolved, 50 new ones have been opened in the meantime

Why hasn’t this giant project fixed a single bug?

0x1C3B00DA@fedia.io · 6 months ago

This issue has been noted since Mastodon was initially released more than 7 years ago. It has also been filed multiple times over the years, indicating that previous small “fixes” for it haven’t fully fixed the issue.

dsemy@lemm.ee · 6 months ago

I’m sure an affected website could have paid a web developer to find a solution to this issue in the past 7 years if it was that important to them.

veroxii@aussie.zone · 6 months ago

Or probably pay an extra $5 for the better hosting plan.

Die4Ever@programming.dev · edit-2 6 months ago

Or use Cloudflare (properly)

pedroapero@lemmy.ml · 6 months ago

They say they do in the article.

Die4Ever@programming.dev · 6 months ago

Then they aren’t using it properly

0x1C3B00DA@fedia.io · 6 months ago

People have submitted various fixes but the lead developer blocks them. Expecting owners of small personal websites to pay to fix bugs of any random software that hits their site is ridiculous. This is Mastodon’s fault and they should fix it. As long as the web has been around, the expected behavior has been for a software team to prioritize bugs that affect other sites.

dsemy@lemm.ee · 6 months ago

If they don’t want to pay to fix it, they can just block the user agent (or just fix their website, this issue is affecting them so much mainly because they don’t cache).

Relying on the competence of unaffiliated developers is not a good way to run a business.

0x1C3B00DA@fedia.io · 6 months ago

Relying on the competence of unaffiliated developers is not a good way to run a business.

This affects any site that’s posted on the fediverse, including small personal sites. Some of these small sites belong to people who didn’t set the site up themselves and don’t know how to block a user agent, or can’t. Mastodon letting a bug like this languish when it affects the small independent parts of the web that Mastodon is supposed to be in favor of is directly antithetical to its mission.

dsemy@lemm.ee · 6 months ago

The reason (IMO) this has languished as much as it has, is that most sites handle this fine; though I agree that it should have been fixed by now.

aleph@lemm.ee · 6 months ago

deleted by creator

dsemy@lemm.ee · 6 months ago

They also state their opinion that the issue “should have been prioritized for a faster fix… Don’t you think as a community-powered, open-source project, it should be possible to attend to a long-standing bug, as serious as this one?”

It’s crazy how every single entity who has any issue with any free software project always seems to assume their needs should be prioritized.

delirious_owl@discuss.online · 6 months ago

Well, the users collectively should dictate the priorities.

dsemy@lemm.ee · 6 months ago

Why should they? The users of a free software project aren’t entitled to anything.

If users want to dictate priorities they should become developers, and if they can’t or won’t, they should at least try to support the developers financially.

delirious_owl@discuss.online · 6 months ago

Because democracy

delirious_owl@discuss.online · 6 months ago

Just fucking cache.

If a GET request is breaking your server, you’re doing something horribly wrong.

uis@lemm.ee · edit-2 6 months ago

It’s about an amplification attack. No matter how well you cache, you will still send replies.

delirious_owl@discuss.online · 6 months ago

Doesn’t apply to GET

uis@lemm.ee · 6 months ago

Next stage: doesn’t apply to DNS

For context, the DNS amplification factor is about 150.

Rimu@piefed.social · 6 months ago

In the comments on the article people have debugged their Cloudflare/caching configuration for them and told them what they’re doing wrong.

Moonrise2473@lemmy.ml · 6 months ago

If I understand right, that means link previews are requested every single time a user sees it? The instance should request it once a week, cache it, and serve that to users.

ReveredOxygen@sh.itjust.works · 6 months ago

I believe instances generate the preview as soon as it’s federated. The problem is that if you have many followers, each of their instances will try to generate a preview at the same time
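A rough sketch of the fan-out described in this comment, from the linked site's point of view. It assumes each federated instance fetches the page exactly once and that the site keeps a simple in-memory cache; `CachedPage` and `simulate_fanout` are made-up names for illustration, not part of Mastodon or of any web server. The cache reduces the expensive page render to a single call, but the bytes sent still grow with the number of instances, which is the amplification concern raised earlier in the thread.

```python
import time
from typing import Callable, Optional


class CachedPage:
    """Serve a page from memory, re-rendering at most once per TTL."""

    def __init__(self, render: Callable[[], bytes], ttl_seconds: float) -> None:
        self._render = render
        self._ttl = ttl_seconds
        self._body: Optional[bytes] = None
        self._expires_at = 0.0
        self.renders = 0      # how often the expensive render ran
        self.bytes_sent = 0   # total response bytes served

    def get(self) -> bytes:
        now = time.monotonic()
        if self._body is None or now >= self._expires_at:
            self._body = self._render()
            self._expires_at = now + self._ttl
            self.renders += 1
        self.bytes_sent += len(self._body)
        return self._body


def simulate_fanout(instances: int, page_size: int) -> CachedPage:
    """Each instance fetches the page once, all within the cache TTL."""
    page = CachedPage(lambda: b"x" * page_size, ttl_seconds=300)
    for _ in range(instances):
        page.get()
    return page


if __name__ == "__main__":
    page = simulate_fanout(instances=1_000, page_size=30_000)
    print(f"renders: {page.renders}, bytes sent: {page.bytes_sent:,}")
```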

helenslunch@feddit.nl · edit-2 8 days ago

deleted by creator