More attacks by dumb AI-bots

Makarius makarius at sketis.net
Wed Nov 26 20:14:19 CET 2025


On 17/11/2025 19:32, Makarius wrote:
> On 17/11/2025 11:20, Makarius wrote:
>> We need to find a proper solution for hgweb + Apache specifically.
> 
> The Mercurial mailing list has this thread "HELP: Fighting AI scrapers":
> https://lists.mercurial-scm.org/pipermail/mercurial/2025-September/106698.html
> 
> It is interesting to read, but also converges to Anubis, without any brilliant 
> ideas. Brilliant ideas are what we need, though. I probably need to study the 
> hgweb implementation.

I have now spent 1h studying the hgweb implementation behind 
isabelle.sketis.net/repos, notably mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py

The answer is rather plain and simple:

changeset:   83654:52cd371a36dd
tag:         tip
user:        wenzelm
date:        Wed Nov 26 20:00:47 2025 +0100
files:       Admin/Mercurial/mercurial-6.1.4-hgweb.patch
description:
adhoc patch for hgweb.wsgi: provide "hg clone" via HTTP without suffering from 
Denial-of-Service attacks on website content (e.g. by Non-Intelligent Agents);


diff -r f152543f9e16 -r 52cd371a36dd Admin/Mercurial/mercurial-6.1.4-hgweb.patch
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Admin/Mercurial/mercurial-6.1.4-hgweb.patch	Wed Nov 26 20:00:47 2025 +0100
@@ -0,0 +1,16 @@
+diff -ru mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py mercurial-6.1.4-patched/mercurial/hgweb/hgweb_mod.py
+--- mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py	2022-06-16 15:09:43.000000000 +0200
++++ mercurial-6.1.4-patched/mercurial/hgweb/hgweb_mod.py	2025-11-26 19:40:54.320858407 +0100
+@@ -371,6 +371,12 @@
+         )
+         if handled:
+             return res.sendresponse()
++        else:
++            rctx.tmpl = rctx.templater(req)
++            res.status = b'500 Internal Server Error'
++            res.headers[b'Content-Type'] = rctx.tmpl.render(b'mimetype', {b'encoding': encoding.encoding})
++            return rctx.sendtemplate(b'error', error=b'Usage: hg clone URL DIR')
++
+
+         # Old implementations of hgweb supported dispatching the request via
+         # the initial query string parameter instead of using PATH_INFO.


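In other words: any request that hgweb does not handle via the wire protocol 
now gets a plain error page instead of rendered web content. There is no 
longer a website to browse (nor to attack), only the wire protocol of the hg 
client via HTTP.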
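The patch decides inside hgweb, after its own wire-protocol dispatch has had a chance to handle the request. For illustration only, the same policy can be sketched as a separate WSGI wrapper in front of hgweb; this is a hypothetical sketch, not the Isabelle setup. A wrapper cannot see whether hgweb handled a request, so it has to guess from the query string: Mercurial's HTTP wire protocol sends the command as `?cmd=NAME`, but (as the comment in the patch context notes) old-style hgweb URLs also dispatch via the query string. The sketch therefore checks against an assumed allowlist of wire-protocol command names rather than the mere presence of `cmd`.

```python
from urllib.parse import parse_qs

# Assumption: a hand-picked subset of Mercurial's HTTP wire protocol (v1)
# command names; Mercurial's own command table is the authoritative list.
WIREPROTO_COMMANDS = {
    "batch", "between", "branchmap", "branches", "capabilities",
    "changegroup", "changegroupsubset", "clonebundles", "getbundle",
    "heads", "hello", "known", "listkeys", "lookup", "protocaps",
    "pushkey", "stream_out", "unbundle",
}

# Same text as the error page in the patch, but sent as plain text.
USAGE = b"Usage: hg clone URL DIR\n"


def is_wireproto_request(environ):
    """Guess whether the request is an hg wire protocol call (?cmd=NAME)."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    cmd = params.get("cmd", [None])[0]
    return cmd in WIREPROTO_COMMANDS


class WireprotoOnly:
    """Hypothetical WSGI middleware: forward wire protocol requests to the
    wrapped app (e.g. hgweb), answer everything else with a usage error."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_wireproto_request(environ):
            return self.app(environ, start_response)
        # Status code chosen to match the patch (500), not a recommendation.
        start_response(
            "500 Internal Server Error",
            [("Content-Type", "text/plain; charset=utf-8"),
             ("Content-Length", str(len(USAGE)))],
        )
        return [USAGE]


if __name__ == "__main__":
    from wsgiref.util import setup_testing_defaults

    def fake_hgweb(environ, start_response):
        # Stand-in for the real hgweb application.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"wire protocol response\n"]

    app = WireprotoOnly(fake_hgweb)
    for query in ("cmd=capabilities", "cmd=log", ""):
        environ = {"QUERY_STRING": query}
        setup_testing_defaults(environ)  # fill in the remaining WSGI keys
        statuses = []
        body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
        print(repr(query), statuses[0], body)
```

Run directly, the demo passes `cmd=capabilities` through to the stand-in application and answers `cmd=log` (an old-style web command) and a plain page request with the usage error. The in-process patch is more robust than such a wrapper, since it needs no list of command names that might drift between Mercurial versions.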

Thus all-important automatic jobs with "hg clone 
https://isabelle.sketis.net/repos/isabelle" should work again, and perform 
better than before, because we disregard the AI nonsense altogether.

The error page could be a bit nicer, saying more clearly what is wrong and 
what needs to be done instead, but the main focus is still the Isabelle2025-1 
release.
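A nicer error page might simply state the problem and the remedy. A minimal sketch, assuming a plain-text body and a fixed clone URL taken from this message; the names `usage_message` and `usage_app` are hypothetical and not part of hgweb, and unlike the patch this bypasses hgweb's `error` template. The status stays at 500 to match the patch.

```python
# Assumption: a single, fixed clone URL (the one quoted in this message).
CLONE_URL = "https://isabelle.sketis.net/repos/isabelle"


def usage_message(clone_url: str) -> bytes:
    """Plain-text error body: what is wrong, and what to do instead."""
    lines = [
        "This server does not provide a web interface for browsing.",
        "It only serves the Mercurial wire protocol over HTTP.",
        "",
        "To obtain the repository, use the hg client:",
        "",
        f"    hg clone {clone_url} DIR",
        "",
    ]
    return "\n".join(lines).encode("utf-8")


def usage_app(environ, start_response):
    """Hypothetical WSGI app that serves the friendlier error page."""
    body = usage_message(CLONE_URL)
    start_response(
        "500 Internal Server Error",
        [("Content-Type", "text/plain; charset=utf-8"),
         ("Content-Length", str(len(body)))],
    )
    return [body]
```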


	Makarius


