[isabelle-dev] mira disk usage

Mon Sep 3 13:57:52 CEST 2012

On Sat, 1 Sep 2012, Lars Noschinski wrote:

> On 01.09.2012 01:15, Gerwin Klein wrote:
>> I guess the problem is that build -b produces heap images for all sessions. 
>> These are many, and each image is hundreds of MB. The previous setup had 
>> images for selected base sessions only and no images for the rest (almost 
>> everything).
>
> The "AFP" session in mira does only produce the base images, AFAICS:
>
> $ ls AFP_c5dd6e9db74c4e9fadb373ec946c413a/polyml-5.4.1_x86_64-linux/
> Collections                HOL-Nominal      Jinja               Pure
> Group-Ring-Module          HOL-Probability  LatticeProperties Refine_Monadic
> HOL                        HOL-Word         List-Infinite       Simpl
> HOL-Multivariate_Analysis  HOLCF            Nat-Interval-Logic  log
>
>> ps: "isabelle build" takes about 5 sec on my system to figure out 
>> dependencies (I think). Should this be faster?

Isabelle build produces only the required images, i.e. "inner" nodes of 
the hierarchy.  A more ambitious version would not even do that, forking 
processes directly without going through image save/load first.

In the mira configuration 
http://isabelle.in.tum.de/repos/isabelle/file/7b6beb7e99c1/Admin/mira.py I 
don't see redundant -b options, in agreement with the above observation of 
images found later in the file-system.

> I guess this time is spent hashing the dependencies to figure out, 
> whether they have changed -- at least, it is not time based.
>
> I know that Git uses stat info as a first step and only if this changes 
> proceeds reading the file.

Ever since I am taking care of theory loading and session management 
(approx. 1997) I am aware of the various ways how file identifies are 
managed, including their advantages and disadvantages.  The "stat" model 
is tied to a physical file-system, and was gradually phased from Isabelle 
in the past couple of years.  SHA1 computation on the whole file content 
is fast enough, and has the advantage that it is well-defined for 
non-local files, say via HTTP.  Or files that are not files at all, just a 
vector of characters sent from the front-end to the back-end.

The slowness of the preparatory stage of Isabelle build has a different 
reason: reworking the old 'use' command to fit into the IDE model (without 
a file-system present) I've realized that the prover can now tell about 
auxiliary file-dependencies, via the new 'ML_file' command.  This requires 
a full outer-parse of files that might contain such commands: it also 
explains the renaming from unspecific 'use' to specific 'ML_file', because 
the latter does not occur by accident in source files so often.

This is a significant conceptual improvement, so the question if it is 
possible to do this superficial parsing a bit faster can be sorted out in 
the coming months.  It might be just a matter to use some of Oderski's 
.par combinators.

It also indicates how huge the HOL image has now become, And how great 
Poly/ML 5.5.0 is at digesting that so quickly.

Generally, Scala/JVM technology has a tradition of slow startup, but once 
it is running it runs quite fast.  When I start Proof General Emacs now 
and again by accident, I am surprised by its quick startup, but very slow 
running afterwards.  The same for Python tools like "hg view" or "hgtk".

The EPLF guys have a funny trick in their Scala TTY loop to make it appear 
faster on startup as it is, but I have no plans to imitate this 
illusionistic approach.

 	Makarius