[isabelle-dev] Summary: WHY have 'a set back?

Tue Aug 30 22:31:49 CEST 2011

Florian Haftmann wrote:
> envisaged working plan
> ----------------------
>
> a) general
> * continue discussion about having set or not, end with a summary (for
> the mailing list archive)

Among all the technical details about the HOW, the WHY part of this
discussion seems to have come to a halt. Here is my attempt of a
summary of what we currently know.

1. In favour of an explicit set type:

1.1. Type discipline (for users):

   The notational conventions seem to be a matter of taste (some users
   like to explicitly distinguish between sets and functions, others
   prefer to write just P instead of {x. P x} in a set
   context). However, our library has many results about both sets and
   predicates, but almost none about arbitrary mixtures of the
   two. Thus, mixing them is usually a bad idea. Inspection of theories
   in the distribution and AFP has shown that such a mixture usually
   requires unfolding mem_def later on to finish the proof.

   Example:
     http://isabelle.in.tum.de/repos/isabelle/rev/510ac30f44c0
     (in Multivariate_Analysis, mem_def and Collect_def disappear from
     proofs when mixture is eliminated)

1.2. Type discipline (for tools): Even when all specifications and lemmas
   respect the set/pred separation, it is occasionally lost by applying
   seemingly harmless lemmas such as subset_antisym ([intro!] by
   default), or set_eqI, whose assumptions contain set notation, while
   the conclusion is generic. Thus, users will be forced into awkward
   situations, and the only way to recover from them is to unfold
   mem_def etc. The only way to avoid this is to disable the automatic
   use of such lemmas (with dramatic losses) or to introduce the type
   discipline.

   In other instances, some simplification rules cannot be enabled by
   default, since (even though they are perfectly sensible) they would
   make the confusion pervasive. The simplifier's eta-expansive
   behaviour can make this worse (turning "A" into "%x. A x" where A is
   a set), but the problem exists independently from this.

   Example:

http://isabelle.in.tum.de/repos/isabelle/file/fd520fa2fb09/src/HOL/Complete_Lattice.thy#l823
     (Here, mem_def is required, since the initial simp call does not
     preserve the the type discipline)

http://isabelle.in.tum.de/repos/isabelle/file/fd520fa2fb09/src/HOL/Equiv_Relations.thy#l367
     (similar situation)

1.3. Higher-Order Unification

   Higher-Order Unification sometimes fails on type "'a => bool" where
   it works for explicit "'a set", which is first-order.  The reasons
   for this effect are unclear and were not investigated in 2008, since
   HOU and its implementation in Isabelle are rather intricate.

   Examples (from the set elimination 2008):
   http://isabelle.in.tum.de/repos/isabelle/rev/d334a6d69598
   http://isabelle.in.tum.de/repos/isabelle/rev/6a4d5ca6d2e5
   http://isabelle.in.tum.de/repos/isabelle/rev/c0fa62fa0e5b

1.4. Coercions

   Sets can be declared as covariant for the sake of coercion
   inference, which makes more sense for subtyping.
   (A similar issue exists for Nitpick's monotonicity inference, but
   there it is solved independently already.)

1.5. Class instances (+,*)

   Sets can get their own class instances differing from functions. The
   only known instance where this is practically useful is in
   HOL/Library/Sets_and_Functions.thy, where one can then define A + B
   = { x + y | x y. x : A & y : B }. So this is rather minor.

1.6. Code generation

   Code generation for sets is simplified by eliminating the need for
   an explicit constructor (as in Library/Cset.thy) as an intermediate
   type. However, for other types (maps), the data refinement problem
   persists.

2. Contra an explicit set type:

2.1. Quite a lot of work for users out there: Cleaning up set/pred
   confusion from our own theories and the AFP is already taking
   significant time. Some newer AFP entries where confusion occurs also
   in definitions and lemmas (in particular DataRefinementIBP and
   GraphMarkingIBP) require significant reengineering. (The original
   author, Viorel Preoteasa kindly agreed to help us here). But even
   for those theories it seems that the changes improve overall
   readability and automation.

2.2. "(Fairly) small loss" in sledgehammer's performance, according to
   Jasmin's measurements.

(summary ends here, my own opinion follows:)

So, it seems that we can conclude that instead of the nice unified
development that we hoped for in 2008, we got quite a bit of confusion
(points 1.1 and 1.2), in addition to the drawbacks that were already
known (1.3, 1.5-1.6). If we had to choose between the two versions now
(with no status quo to consider), the case would be pretty clear. So
the question is whether our users out there will tolerate that they
have to fix quite a number of theories.

I think I am pro 'a set in the end, assuming we can fix the remaining
technical issues.

Please remind me of any important points I have missed.

Alex