More attacks by dumb AI-bots
Makarius
makarius at sketis.net
Wed Nov 26 20:14:19 CET 2025
On 17/11/2025 19:32, Makarius wrote:
> On 17/11/2025 11:20, Makarius wrote:
>> We need to find a proper solution for hgweb + Apache specifically.
>
> The Mercurial mailing list has this thread "HELP: Fighting AI scrapers":
> https://lists.mercurial-scm.org/pipermail/mercurial/2025-September/106698.html
>
> It is interesting to read, but also converges to Anubis, without any brilliant
> ideas. Brilliant ideas is what we need, though. I probably need to study the
> hgweb implementation.

I've now invested 1h to study the hgweb implementation behind
isabelle.sketis.net/repos, notably mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py.

The answer is rather plain and simple:

changeset:   83654:52cd371a36dd
tag:         tip
user:        wenzelm
date:        Wed Nov 26 20:00:47 2025 +0100
files:       Admin/Mercurial/mercurial-6.1.4-hgweb.patch
description:
adhoc patch for hgweb.wsgi: provide "hg clone" via HTTP without suffering from
Denial-of-Service attacks on website content (e.g. by Non-Intelligent Agents);

diff -r f152543f9e16 -r 52cd371a36dd Admin/Mercurial/mercurial-6.1.4-hgweb.patch
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Admin/Mercurial/mercurial-6.1.4-hgweb.patch Wed Nov 26 20:00:47 2025 +0100
@@ -0,0 +1,16 @@
+diff -ru mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py mercurial-6.1.4-patched/mercurial/hgweb/hgweb_mod.py
+--- mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py 2022-06-16 15:09:43.000000000 +0200
++++ mercurial-6.1.4-patched/mercurial/hgweb/hgweb_mod.py 2025-11-26 19:40:54.320858407 +0100
+@@ -371,6 +371,12 @@
+         )
+         if handled:
+             return res.sendresponse()
++        else:
++            rctx.tmpl = rctx.templater(req)
++            res.status = b'500 Internal Server Error'
++            res.headers[b'Content-Type'] = rctx.tmpl.render(b'mimetype', {b'encoding': encoding.encoding})
++            return rctx.sendtemplate(b'error', error=b'Usage: hg clone URL DIR')
++
+ 
+         # Old implementations of hgweb supported dispatching the request via
+         # the initial query string parameter instead of using PATH_INFO.

In other words: there is no longer a website to browse (nor to attack), only
the "wireprotocol" of the hg client via HTTP.
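
To illustrate the distinction the patch relies on, here is a minimal standalone sketch, not the actual Mercurial code. It assumes that hg clients mark HTTP wire-protocol requests with a "cmd" query parameter, and that every other request is a website request. The names WIRE_COMMANDS, looks_like_wire_request and respond are hypothetical, and the command set is only an illustrative subset.

```python
from urllib.parse import parse_qs, urlsplit

# Illustrative subset of wire-protocol command names (assumption: the real
# server knows more commands and decides this inside Mercurial itself).
WIRE_COMMANDS = {
    "capabilities", "batch", "heads", "known",
    "getbundle", "listkeys", "lookup", "branchmap",
}


def looks_like_wire_request(url: str) -> bool:
    """Return True if the URL carries a known wire-protocol command in ?cmd=..."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    cmds = query.get("cmd")
    return bool(cmds) and cmds[0] in WIRE_COMMANDS


def respond(url: str) -> tuple[str, str]:
    """Mimic the patched dispatch: wire requests pass, everything else is refused."""
    if looks_like_wire_request(url):
        # Placeholder: the real server would run the wire-protocol handler here.
        return ("handled", "(wire protocol response)")
    # Same status and message as in the patch above.
    return ("500 Internal Server Error", "Usage: hg clone URL DIR")
```

With this sketch, a URL such as https://isabelle.sketis.net/repos/isabelle?cmd=capabilities counts as a wire request, while a browsing URL below the repository (e.g. a hypothetical .../file/tip page) gets the error response.
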
Thus all the important automatic jobs that use "hg clone
https://isabelle.sketis.net/repos/isabelle" should work again, and should
perform better than before, because we disregard the AI nonsense altogether.
The error page could be a bit nicer, saying more clearly what is wrong and
what needs to be done instead, but the main focus is still the Isabelle2025-1
release.
Makarius