# [isabelle-dev] counter example checking from ML

Lucas Dixon ldixon at inf.ed.ac.uk
Mon Sep 21 22:26:27 CEST 2009

```Hi, a little more progress in understanding the issue...

I've made a little test program that highlights the change in behaviour
that we noticed.

The point is to stress test counter example finding and see at what
point a false-conjecture slips past. In particular, when performing 5
iterations, this code measures the average number of times it takes to
before the false conjecture is not spotted by the test_term function.
So, the bigger the number that comes out, the better test_term is doing
(at generating examples that show the conjecture to be false).

Here's the code:

ML {*
(* how many times until this false conjecture slips past codegen *)
fun count_until_missed i =
(case Codegen.test_term @{context} @{term "%a (b :: nat). a + b = a
* b"} 5
of NONE => i
| SOME _ => count_until_missed (i + 1));

(* average of the function's results, preformed n times *)
fun avg f n =
let fun sumf 0 a = a
| sumf k a = sumf (k - 1) (a + f())
in  (Real.fromInt (sumf n 0)) / (Real.fromInt n) end;

(* avg number of times until this false conjecture slips past codegen *)
val count_avg = avg (fn () => count_until_missed 1) 50;
*}

I don't have enough internet to download and test on 2008; but there has
been a significant difference even between Isabelle 2009 and a recent
dev snapshot (32589:9353798ce61f):

On Isabelle dev snapshot (32589:9353798ce61f)
val count_avg = 14.11 : real

On Isabelle 2009
val count_avg = 32.14 : real

(takes some time... good news: you've speeded up codegen a lot! In fact,
this difference might even be compensating for the bad behaviour in
iterations :)

The deterministic version should, I think, not terminate for this
problem and many like it. But I haven't tried it yet.

These are of course averages and change a bit, but having repeated the
experiment many times, I think those numbers are aprox right. At least
they give a rough benchmarking mechanism.

This is a case where your random sampling looks to be doing the wrong
thing - with respect to the spec I've made up in my head ;-). Since "(a
= 0 & b = 0 | a = 2 & b = 2) = (a + b = a * b)", 5 iterations is going
to have to be picking the same case at least three times; a quick hack
would be to cache the examples you've already looked at. But even better
would be for your random example chooser to automatically pick the next
unpicked example, rather than repeat known results. This is probably
more of a problem for conjectures with a small number of variables, but
would still be generally a good idea.

best,
lucas

Florian Haftmann wrote:
> Hi Lucas,
>
> indeed the whole quickcheck matter has grown into a complex story.
>
> First, we currently (e.g. following development in the hg repository)
> have a kind of fork:  there is the "old" quickcheck, residing in
> Pure/codegen.ML, and the "new" one in HOL/Quickcheck.thy; the most
> prominent difference between both is that the "new" one tries to
> internalize as much as possible into the logic, which allows to use type
> classes and avoids tinkering too much with glueing raw ML snippet
> strings together.
>
> Concerning determinism, the "old" quickcheck is inherently randomized.
> The "new" one uses an open state monad which passes a random seed around
> explicitly and can thus be also used to enforce determinism.  If you
> want to use this, be aware of the pitfalls which arise when using
> unofficial intermediate stages of development rather than official releases!
>
> The key for understanding is in HOL/Tools/quickcheck_generators.ML,
> function mk_generator_expr (export it on demand and try something like
> Quickcheck_Generators.mk_generator_expr @{theory} @{term
> "\<lambda>n::nat.  n ~= n^2"} [@{typ nat}]);  for further background
> knowledge on this machinery, refer to ยง4.4 in
> http://www4.in.tum.de/~haftmann/phd.shtml.  Don't hesitate to ask
> further questions.
>
> Hope this helps,
> 	Florian
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

```