[isabelle-dev] Sledgehammer machine learning

Thu Nov 29 17:20:40 CET 2012

Hi all,

As you might already know, Daniel Kühlwein (a Ph.D. student of Josef Urban) and
I have been working on a machine-learning based relevance filter for
Sledgehammer, called MaSh. The idea is to learn from successful proofs which
facts are more likely to be useful for future proofs.

We are looking for volunteers to try it out. By default, Sledgehammer uses the
good old MePo (Meng & Paulson) filter, but if you do either of the following,
you'll find yourself using a combination of MePo and MaSh:

  1. sledgehammer_params [fact_filter = mesh]  -- at the top of your theory
  2. sledgehammer [fact_filter = mesh]  -- for a single invocation
  3. export MASH=yes

("mesh" = "mash" + "mepo"; yes, the names are a bit crazy. You can also cheat
and drop "fact_filter =".)

You will need a recent repository version (e.g. 05f8ec128e83) as well as Python
on your machine (which you probably already have).

The Sledgehammer manual has some documentation on MaSh. To get the latest
version:

    isabelle build_doc Sledgehammer
    isabelle doc sledgehammer

Sections 5.1, 6.1, and 7.2 are relevant. There's not much to know really -- the
learning and use of that learning takes place automatically (but can be guided
in various ways).

Example:

    theory Scratch
    imports Main
    begin

    lemma "map f xs = ys ==> rev (zip xs ys) = zip (rev xs) (rev ys)"

The command

    sledgehammer learn_isar

forces learning at this point, so that we can start using MaSh right away.
(Otherwise, learning takes place in the background and is available only
from the second Sledgehammer invocation.) This takes about 5 seconds and prints

    MaShing through 7127 facts for Isar proofs...
    Learned 4644 nontrivial Isar proofs.

Next step:

    sledgehammer [prover = e, fact_filter = mesh, verbose]

This prints

    ATP: Including (up to) 1000 relevant facts:
    zip_append list_induct2 ... divide_nonneg_pos wf_lenlex.
    ...
    Facts in "e" proof (of 1000): zip_rev at 4, length_map at 11.
    ...
    Try this: by (metis length_map zip_rev) (20 ms).

"@4" and "@11" indicate that the 4th and 11th facts are used for the proof. The
lower the numbers, the better. If we try again

    sledgehammer [prover = e, fact_filter = mesh, verbose]

we get

    Facts in "e" proof (of 1000): zip_rev at 1, length_map at 5.

which is an improvement that comes from learning. In contrast, MePo doesn't get
any smarter:

    sledgehammer [prover = e, fact_filter = mepo, verbose]

    Facts in "e" proof (of 1000): zip_rev at 4, length_map at 14.

The learning is persistent across sessions and should be robust in the face of
theory reorganizations etc.

If you want to improve MaSh's accuracy, you can let

    sledgehammer learn_atp

run for hours with your favorite theories loaded. It will then invoke
Sledgehammer on each available theorem to learn from its proofs. You can
interrupt it any time. Learning from ATP proofs is usually better than learning
from human-written proofs, according to evaluations by Kühlwein, Urban et al.

I hope those of you who use Sledgehammer regularly will give this a try and let
me know about your experience. We've had very useful feedback about brand new
features this way before (e.g. the tight SPASS integration). If you have nice
examples, they might easily end up in one of our papers.

Cheers,

Jasmin