[isabelle-dev] NEWS: Improved strategy for printing term abbreviations

Makarius makarius at sketis.net
Thu Oct 24 23:29:52 CEST 2024


*** General ***

* Printing term abbreviations now uses a different strategy: alternate
top-down, bottom-up, top-down etc. until a normal form is reached. This
is more efficient than the former top-down strategy, and it conforms to
AST rewriting (command 'translations'). Rare INCOMPATIBILITY: slightly
different results may be printed (often slightly "better" ones).


*** ML ***

* The new operation Pattern.rewrite_term_yoyo alternates top-down,
bottom-up, top-down etc. until a normal form is reached. This often
works better than the former rewrite_term_top, which is still available
but has been renamed to rewrite_term_topdown to emphasize that something
notable has changed here.
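For intuition, here is a rough sketch of such a "yoyo" strategy, written in
Python over a toy first-order term representation (this is an illustration
under my own assumptions, not the actual Isabelle/ML code):

```python
# Toy sketch of a "yoyo" rewrite strategy: alternate full top-down and
# bottom-up passes until one pass leaves the term unchanged, i.e. a
# normal form is reached. Terms are atoms (strings) or applications,
# written as tuples ("f", arg1, ..., argN). A rule is a partial
# function: it returns the rewritten term, or None if it does not apply.

def rewrite_yoyo(rule, term):
    def at_root(t):
        # Apply the rule exhaustively at the root position.
        changed = False
        t2 = rule(t)
        while t2 is not None:
            t, changed = t2, True
            t2 = rule(t)
        return t, changed

    def map_args(f, t):
        # Recurse into the arguments of an application.
        if isinstance(t, tuple):
            results = [f(arg) for arg in t[1:]]
            t = (t[0],) + tuple(r for r, _ in results)
            return t, any(c for _, c in results)
        return t, False

    def top_down(t):
        t, c1 = at_root(t)
        t, c2 = map_args(top_down, t)
        return t, c1 or c2

    def bottom_up(t):
        t, c1 = map_args(bottom_up, t)
        t, c2 = at_root(t)
        return t, c1 or c2

    # Alternate passes; if a full pass changes nothing, the rule applies
    # nowhere, so the term is in normal form.
    passes = [top_down, bottom_up]
    i, changed = 0, True
    while changed:
        term, changed = passes[i](term)
        i = 1 - i
    return term

# Example rule: x + 0 ~> x (with "+" spelled as the binary operator "add").
def add_zero(t):
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "add" and t[2] == "0":
        return t[1]
    return None

# rewrite_yoyo(add_zero, ("add", ("add", "x", "0"), "0")) yields "x"
```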


This refers to Isabelle/bbed9f218158 and AFP/b347675c7769.

I've brushed up the Ast.normalize implementation slightly: it stems from my 
own Isabelle student project from three decades ago (under the informal title 
of "Rewriting Isabelle"). The current version is here: 
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/Syntax/ast.ML;bbed9f218158$227

The corresponding version of Pattern.rewrite_term_yoyo is here 
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/more_pattern.ML;bbed9f218158$57 
--- I've changed the typical Berghofer style into more contemporary 
Haftmann-Wenzel Isabelle/ML, with moderate use of combinators like "perhaps" 
and "perhaps_loop".
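For readers unfamiliar with these combinators, here are loose Python analogues
(the Isabelle/ML originals work with partial functions of type 'a -> 'a
option; the exact ML signatures differ, so treat this only as a sketch):

```python
# Loose Python analogues of the combinators "perhaps" and "perhaps_loop",
# with None playing the role of ML's NONE for a partial function.

def perhaps(f, x):
    # Apply f once; if f declines (returns None), keep x unchanged.
    y = f(x)
    return x if y is None else y

def perhaps_loop(f, x):
    # Apply f repeatedly until it declines: a fixed-point iteration.
    y = f(x)
    while y is not None:
        x, y = y, f(y)
    return x

# Example (hypothetical): repeatedly strip a trailing "!" from a string.
strip_bang = lambda s: s[:-1] if s.endswith("!") else None
# perhaps(strip_bang, "hi!!") gives "hi!"; perhaps_loop gives "hi".
```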


The performance of Ast.normalize is still pretty good, even with all the 
encoded position constraints that we now have in input and output. In 
performance measurements of Syntax.read_term / string_of_term, this part can 
usually be ignored.

The corresponding term normalization operations in Syntax.check_terms / 
uncheck_terms are much slower, because many other things are going on there, 
e.g. the somewhat slow type-inference in check_term. The term abbreviations in 
uncheck_term are now reasonably fast.

This is also due to another change: 
https://isabelle-dev.sketis.net/rISABELLEb61abd1e5027 --- it amends earlier 
attempts at performance tuning for terms with schematic variables, which had 
bad consequences for the average situation.

I've also revisited the Earley parser recently (for markup), and got the 
impression that it is rather fast, although I have done only sporadic 
measurements. That part can usually be ignored in overall measurements.


In conclusion, here is an example by Hanno Becker (18-Jul-2024 on isabelle-users):

abbreviation ‹y x ≡ (x + x)›

abbreviation (input) ‹pow3 f t ≡ (f (f (f t)))›
abbreviation (input) ‹id0 ≡ pow3 id›
abbreviation (input) ‹id1 ≡ pow3 id0›
abbreviation (input) ‹id2 ≡ pow3 id1›
abbreviation (input) ‹id3 ≡ pow3 id2›
abbreviation (input) ‹id4 ≡ pow3 id3›
abbreviation (input) ‹id5 ≡ pow3 id4›
abbreviation (input) ‹id6 ≡ pow3 id5›

term ‹id6 ((0::nat) + 0)›

That is now < 0.5s on my 3.6 GHz Intel Linux box from 2017, instead of 
"seconds on Apple M1". Inflating the term further towards id7 or id8 makes it 
"seconds" again.
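For scale, a back-of-the-envelope count (my own arithmetic, not from the
original thread): pow3 applies the previous abbreviation three times, so each
further level triples the fully expanded term.

```python
# Each level id_{n+1} = pow3 id_n applies the previous abbreviation three
# times, so id_n unfolds to 3**(n+1) occurrences of "id" -- which the
# printer then has to fold back into abbreviations.
def expanded_size(n):
    return 3 ** (n + 1)

# id6 unfolds to 2187 applications of id; id7 triples that to 6561.
```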


If there are more concrete, real-world examples, now would be an opportunity 
to look at performance profiles.


	Makarius

