Sadly it's hard to tell whether this is an actual DDoS attack or scrapers descending on the site. It all looks very similar.
The search engines always seemed happy to announce that they are in fact GoogleBot/BingBot/Yahoo/whatever, and frequently provided you with their expected IP ranges. The modern companies, mostly AI companies, seem more interested in flying under the radar and have less respect for the internet infrastructure as a whole. So we're now at a point where I can't tell if it's an ill-willed DDoS attack or just shitty AI startup number 7 reloading training data.
> To me, Anubis is not only a blocker for AI scrapers. Anubis is a DDoS protection.
Anubis is DDoS protection, just with updated marketing. Tools like this have existed forever: Cloudflare Challenges, https://github.com/RuiSiang/PoW-Shield, or plain old HashCash.
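For anyone who hasn't looked under the hood: all of these boil down to roughly the same hashcash-style puzzle. A minimal sketch of the idea (not Anubis's actual code, and the difficulty number is made up):

    import hashlib
    import os

    DIFFICULTY = 16  # leading zero bits required; a made-up number, tune to taste

    def solve(challenge: bytes) -> int:
        """Client side: grind nonces until the hash has enough leading zero bits."""
        nonce = 0
        while True:
            digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
                return nonce
            nonce += 1

    def verify(challenge: bytes, nonce: int) -> bool:
        """Server side: a single hash, so checking is essentially free."""
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

    challenge = os.urandom(16)                   # handed out per visitor
    print(verify(challenge, solve(challenge)))   # True, after ~2**16 hashes of client work

The asymmetry is the whole point: the visitor pays up front in CPU, the server pays one hash to check, and a bot that can't or won't run the loop never gets in.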
I keep saying that Anubis really has nothing much to do with AI (e.g. some people might mistakenly think that it magically "blocks AI scrapers"; it only slows down abusive-rate visitors). It really only deals with DoS and DDoS.
I don't understand why people are using Anubis instead of all the other tools that already exist. Is it just marketing? Saying the right thing at the right time?
> Solving the challenge–which is valid for one week once passed–
One thing I've noticed recently, with the Arch Wiki adding Anubis, is that this one-week validity doesn't magically fix the user annoyances with Anubis. I use Temporary Containers for every tab, which means I constantly get Anubis regenerating tokens, since the cookie gets deleted as soon as the tab is closed.
Perhaps this is my own problem, but given the state of tracking on the internet, I don't feel that refusing to save cookies is an extremely out-of-the-ordinary circumstance.
It's not Anubis that saved your website; literally any sort of CAPTCHA, or some dumb modal with a button to click through to the real contents, would've worked.
These crawlers are designed to work on 99% of hosts; if you tweak your site ever so slightly out of spec, the bots won't know what to do.
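To the point about the dumb modal: even something this crude (a toy sketch, nothing to do with the actual setup in the post) is enough to strand a crawler that only follows links and never submits forms:

    # Toy version of the "dumb modal" gate: the real page is only served once the
    # visitor has clicked a button that sets a cookie. Everything here is made up
    # for illustration.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    GATE = b'<html><body><form method="post" action="/enter">' \
           b'<button>Enter the site</button></form></body></html>'

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if "passed=1" in self.headers.get("Cookie", ""):
                self._respond(200, b"<html><body>The real content.</body></html>")
            else:
                self._respond(200, GATE)   # no cookie yet: everyone sees the gate

        def do_POST(self):
            # The button click sets the cookie and bounces the visitor back to /.
            self._respond(303, b"", [("Set-Cookie", "passed=1"), ("Location", "/")])

        def _respond(self, code, body, extra=()):
            self.send_response(code)
            for name, value in extra:
                self.send_header(name, value)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()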
Anubis is nice, but could we have a PoW system integrated into the protocol layer (HTTP or TLS, I'm not sure) so we don't have to require JS?
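Roughly what I'm picturing, from the client's side. The header names and the 429 handshake are entirely made up; this is not an existing standard, just the shape such an exchange could take:

    # Purely hypothetical JS-free, header-based PoW handshake (client side).
    import hashlib
    import urllib.error
    import urllib.request

    def fetch_with_pow(url: str) -> bytes:
        try:
            return urllib.request.urlopen(url).read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or "X-PoW-Challenge" not in err.headers:
                raise
            challenge = err.headers["X-PoW-Challenge"]
            bits = int(err.headers.get("X-PoW-Difficulty", "16"))
            nonce = 0
            while True:   # the same hashcash grind as above, no JavaScript needed
                digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
                if int.from_bytes(digest, "big") >> (256 - bits) == 0:
                    break
                nonce += 1
            retry = urllib.request.Request(url, headers={"X-PoW-Response": str(nonce)})
            return urllib.request.urlopen(retry).read()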
As usual, there is a negative side to this kind of protection: I was trying to download some raw files from a git repository and instead of the data I got a bunch of HTML, which on a quick look turned out to be the Anubis challenge page. Another issue was broken links to issue tickets on the main page, where Anubis was asking a wrapper script to solve hashes. The lesson here: after deploying Anubis, please carefully check the impact. There might be some unexpected issues.
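A cheap way to catch that kind of breakage is to re-fetch a handful of the URLs your scripts and CLIs depend on right after deployment and flag anything that suddenly comes back as HTML. The endpoints below are placeholders; point it at your own raw-file and API routes:

    # Quick post-deployment smoke test: hit endpoints that scripts/CLIs depend on
    # and flag any that come back as HTML (likely the Anubis challenge page).
    # The URLs are placeholders for illustration.
    import urllib.request

    ENDPOINTS = [
        "https://git.example.org/repo/raw/main/README.md",
        "https://git.example.org/api/v1/issues/1",
    ]

    for url in ENDPOINTS:
        with urllib.request.urlopen(url) as resp:
            ctype = resp.headers.get("Content-Type", "")
            body = resp.read(512)
            if "text/html" in ctype or b"<html" in body.lower():
                print(f"SUSPECT  {url} -> {ctype} (challenge page?)")
            else:
                print(f"OK       {url} -> {ctype}")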
> We use a stack consisting of Apache2, PHP-FPM, and MariaDB to host the web applications.
Oh hey, that's a pretty utilitarian stack, and I'm happy to see MariaDB being used out there.
Anubis is also really cool. I do imagine that proof of work might become more prevalent in the future to deal with the sheer number of bots and bad actors out there (a shame that they exist), though in the case of hijacked devices it might just slow them down, hopefully to a manageable degree, rather than banning their IPs outright.
I do wonder if we'll ever see HTTP-only versions of PoW too, not just JS-based options, though that might need to be a web standard or something.
Does anyone know of a solution that works without JS?
It's so bad we're going to the old gods for help now. :)
Kinda love how deep this gets into the whole social contract side of open source. Honestly, it's been a pain figuring out what feels right when folks mix legal rules and personal asks.
As someone who has a lot of experience with (not AI related) web scraping, fingerprinting and WAFs, I really like what Anubis is doing.
Amazon, Akamai, Kasada, and other big players in the WAF/anti-bot industry will charge you millions for the illusion of protection and half-baked JavaScript fingerprint collectors.
They usually calculate how "legit" your request is based on ambiguous factors, like the vendor name of your GPU (good luck buying flight tickets from inside a VM) or how anti-aliasing is implemented in your fonts/canvas. Total bullshit. Most web scrapers know how to bypass it. Especially the malicious ones.
But the biggest reason I'm against this kind of system is how it reinforces the browser monoculture. Your UA is from Servo or Ladybird? You're out of luck. That's why the idea of choosing a purely browser-agnostic way of "weighing the soul" of a request resonates strongly with me. Keep up the good work!
I don’t really understand why this solved this particular problem. The post says:
> As an attacker with stupid bots, you’ll never get through. As an attacker with clever bots, you’ll end up exhausting your own resources.
But the attack was clearly from a botnet, so the attacker isn’t paying for the resources consumed. Why don’t the zombie machines just spend the extra couple seconds to solve the PoW (at which point, they would apparently be exempt for a week and would be able to continue the attack)? Is it just that these particular bots were too dumb?
From looking at some of the rules, like https://github.com/TecharoHQ/anubis/blob/main/data/bots/head... it seems that Anubis explicitly punishes bots that are "honest" about their user agent. I might be missing something, but isn't this just pressuring anyone who does anything bot-related to lie about their user agent?
A flat-out user-agent blacklist seems really weird; it's going to reward the companies that are more unethical in their scraping practices over the ones that report their user agent truthfully. From the repo it also seems like all the AI crawlers are set to DENY, which, again, rewards AI companies that don't disclose their identity in the user agent.
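To make the incentive problem concrete, a deny rule of that shape boils down to something like this (the regex is made up, not the actual policy file):

    # Illustration of why UA-based deny rules mostly punish honest bots.
    import re

    DENY_UA = re.compile(r"GPTBot|ClaudeBot|CCBot|Bytespider", re.IGNORECASE)

    def decide(user_agent: str) -> str:
        return "DENY" if DENY_UA.search(user_agent) else "CHALLENGE"

    print(decide("GPTBot/1.0"))  # DENY -- the honest crawler is refused outright
    print(decide("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"))
    # CHALLENGE -- the same crawler with a spoofed browser UA gets to try the PoW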
Seems like rate-limiting expensive pages would be much easier and less invasive. Also caching...
And I would argue Anubis does nothing to stop real DDoS attacks that just indiscriminately blast sites with tens of Gbps of traffic at once from many different IPs.
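On the rate-limiting suggestion: a per-client token bucket in front of just the expensive routes is only a handful of lines (illustrative numbers, not tuned for anything in particular):

    # Minimal per-client token bucket for expensive routes.
    import time
    from collections import defaultdict

    RATE = 0.5     # tokens refilled per second -> roughly 30 requests/minute sustained
    BURST = 10.0   # bucket capacity            -> short bursts are fine

    _buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

    def allow(client_ip: str) -> bool:
        """Return True if this client may hit an expensive page right now."""
        bucket = _buckets[client_ip]
        now = time.monotonic()
        bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["ts"]) * RATE)
        bucket["ts"] = now
        if bucket["tokens"] >= 1.0:
            bucket["tokens"] -= 1.0
            return True
        return False   # caller answers with 429 or serves a cached copy instead

Per-IP limits admittedly don't buy you much against the botnet case, where every request comes from a different address.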
Sort of tangential, but I'm surprised folks are still using Apache all these years later. Is there a certain language that makes it a better fit than Nginx? Or is it just the ease of configuration that still pulls people in? I switched to Nginx I don't even know how many years ago and never looked back; just more or less wondering if I should reconsider.
Looks similar to haproxy-protection: https://gitgud.io/fatchan/haproxy-protection/
Can Anubis be restyled to be more... professional? I like the playfulness, but I know at least some of my clients will not.
If I see a cute cartoon next to a cryptocurrency-mining-style "KHash/s" readout, I am gonna leave that site real quick!
It should explain that it isn't mining anything, just verifying the browser or some such.