<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<div dir="auto"><u></u><br></div><blockquote class="tutanota_quote" style="border-left: 1px solid #93A3B8; padding-left: 10px; margin-left: 5px;"><div dir="auto">I understand that the use of LLMs for formal proofs is an exciting and fascinating prospect, but it should be kept in mind that the majority of these LLMs suck up information from anywhere, without regard for licensing. This is why the Isabelle dev blog is restricted to accounts. I'd recommend that the maintainers revisit Anubis, as it is not as "nuclear" as restricting the dev blog to accounts.<br></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">Jul 10, 2025, 05:42 by lp15@cam.ac.uk:<br></div><blockquote class="tutanota_quote" style="border-left: 1px solid #93A3B8; padding-left: 10px; margin-left: 5px;"><div dir="auto">Hello all, you may be interested in a little experiment I tried using the DeepSeek LLM yesterday. I asked whether it could translate a theorem header from n-dimensional real vectors to abstract Euclidean spaces. My query:<br></div><blockquote><div dir="auto">The following theorem header fixes variable "a" to type "real^'n". Changing this type to "'a::euclidean_space" requires modifying the header, replacing $ and χ by their analogues for Euclidean spaces.<br></div><div dir="auto"><br></div><div dir="auto">proposition<br></div><div dir="auto">fixes a :: "real^'n"<br></div><div dir="auto">assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"<br></div><div dir="auto">shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n else x$i) ` (cbox a b) ∈ lmeasurable"<br></div><div dir="auto">and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) ` cbox a b)<br></div><div dir="auto">= measure lebesgue (cbox a b)"<br></div></blockquote><div dir="auto"><br></div><div dir="auto">On its first attempt it produced something that seemed to be almost right but clunky and referred to something non-standard. Pushing it a little further, I got the following, which is far as I can tell is correct. (Isabelle accepts it.)<br></div><blockquote><div dir="auto">proposition<br></div><div dir="auto">fixes a :: "'a::euclidean_space"<br></div><div dir="auto">assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"<br></div><div dir="auto">and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"<br></div><div dir="auto">shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) ` (cbox a b) ∈ lmeasurable"<br></div><div dir="auto">and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) ` cbox a b)<br></div><div dir="auto">= measure lebesgue (cbox a b)”<br></div></blockquote><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">Everything was accompanied by quantities of plausible sounding narrative, e.g. "this version is more abstract and elegant”. A sample:<br></div><blockquote><div dir="auto">This version is more type-safe because:<br></div><div dir="auto">• No need to worry about index bounds<br></div><div dir="auto">• The basis vectors are first-class entities<br></div><div dir="auto">• The notation more clearly reflects the mathematical intent<br></div><div dir="auto">Note that in this formulation:<br></div><div dir="auto">• Basis is the canonical basis of the Euclidean space<br></div><div dir="auto">• ∙ is the inner product (giving the component in a particular basis direction)<br></div><div dir="auto">• *⇩R is the scalar multiplication<br></div><div dir="auto">• The sum reconstructs the vector from its components<br></div><div dir="auto">Would you like me to explain any part of this formulation in more detail? Or would you like assistance with proving this version of the theorem?<br></div></blockquote><div dir="auto"><br></div><div dir="auto">My impression is that it was grabbing material from a variety of places, in particular the AFP entry Ordinary_Differential_Equations. But the word "elegant" doesn't appear there and in fact appears in the entire distribution + AFP only 42 times.<br></div><div dir="auto"><br></div><div dir="auto">I recall doing the precise task above in the past and needing the better part of an hour to get it right. I didn't try to port the proof (which was long and complicated) and I would be surprised if it could get that right.<br></div><div dir="auto"><br></div><div dir="auto">I'm not sure how much more can be accomplished with the free version of DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases and summarise them. These tools may already be useful for tasks such as identifying duplicate material.<br></div><div dir="auto"><br></div><div dir="auto">Larry<br></div></blockquote><div dir="auto"><br></div></blockquote><div dir="auto"><br></div> </body>
</html>