A HOLTrace file records a sequence of HOL Light proofs and theorems. This document specifies version 1 of the file format. Ths document is aimed at implementors writing tools to read or write HOLTrace files.
Files in HOLTrace format conventionally have filenames ending .50
,
although this is not required.
This document allows multiletter italicized variable names. Concatenation here does not mean multiplication.
Overall structure
The file is a sequence of lines. Each line consists of printable ASCII characters (bytes 32 through 126) followed by LF (byte 10). There is always a terminating LF. This is a binary file format: LF does not mean CR LF.
The first line is HOLTrace 1
(followed by LF).
Each subsequent line begins with a 1-character tag
and continues with a sequence of zero or more arguments.
Arguments are encoded nonnegative integers,
except that some tags also have text as a final argument.
Integers are encoded in a way that often appears negative or non-numeric. Specifically:
-
Each integer is encoded relative to an integer n defined below by the context.
-
Integer arguments n−1 through n−10 are encoded as the characters of
!@#$%^&*()
respectively: i.e.,!
means n−1,@
means n−2, etc. -
An integer argument i ≥ n is encoded as the uppercase base-32 representation of the integer i. Lowercase is not accepted as a synonym for uppercase.
-
An integer argument i ≤ n−11 is encoded as the uppercase base-32 representation of the negative integer i−n (starting with a minus sign), or as the uppercase base-32 representation of the integer i if that has strictly fewer bytes. Lowercase is again not accepted as a synonym for uppercase.
For example, for encodings relative to n = 10000:
argument 0
means 0;
9
means 9;
A
means 10;
V
means 31;
10
means 32;
VV
means 1023;
100
means 1024;
8OG
means 8976;
-VV
means 8977;
-10
means 9968;
-V
means 9969;
-B
means 9989;
)
means 9990;
!
means 9999;
9OG
means 10000;
9OH
means 10001.
Spaces between arguments are suppressed in the file:
e.g., tag b
with arguments 0
and bool
is written as b0bool
.
Exception:
if an encoded argument ends with one of the characters 0123456789ABCDEFGHIJKLMNOPQRSTUV
and the next encoded argument begins with one of those characters (or a space)
then a space is inserted:
e.g., tag e
with arguments 17
and 14
is written e17 14
.
Order
Some of the line definitions below
say that a line refers to other lines in the file.
Some references are via explicit arguments:
for example, the line b2!!fun
specifies a type
by reference to the previously specified type.
Some references are implicit:
for example, a theorem is always proven by the most recent inference.
These references must be to lines earlier in the file than the current line. In other words, circular references are prohibited, and lines must appear in topological order.
A file writer normally deduplicates objects
(for example, there is only one b0bool
line),
but a file reader does not rely on this.
For example,
correctly implementing the semantics of the S
tag
relies on recognizing whether various objects are equal.
Types
Each line with tag a
or b
is a "type line" specifying a type.
Each type is assigned an identifying number (a "type ID") for future reference,
counting from 0 upwards:
the first type line specifies type 0, i.e., the type having type ID 0;
the next type line specifies type 1;
the next type line after that specifies type 2;
etc.
To track these references,
the file reader mantains a type counter
that starts at 0 and that increments by 1 immediately after each type line.
(The file writer also does this.)
Specifically, type lines have the following meanings:
-
a
text: The type variable named text; HOL Light'sTyvar
(text). For example, the lineaB
specifies a type variable namedB
; the linea?328
specifies a type variable named?328
. -
b
n a ... z text: The type named text with arguments a,...,z; HOL Light'sTyapp
(text,[a;...;z]). The first argument n says the number of a through z arguments (0 or more, not necessarily 26); those arguments are type IDs specifying theTyApp
arguments. These type IDs are encoded relative to the current type ID, so!
refers to the most recent type line before this, and@
refers to the most recent type line before that. The argument n is encoded relative to the integer-1
(so it never uses!
etc.).
For example, the lines b0bool
and b2!!fun
and b2@!fun
(in that order) specify types
bool
and fun(bool,bool)
and fun(bool,fun(bool,bool))
,
i.e.,
bool
and bool->bool
and bool->bool->bool
.
Terms
Each line with tag c
or d
or e
or f
is a "term line" specifying a term.
Each term is assigned an identifying number (a "term ID") for future reference,
counting from 0 upwards:
the first term line specifies term 0, i.e., the term having term ID 0;
the next term line specifies term 1;
the next term line after that specifies term 2;
etc.
To track these references,
the file reader mantains a term counter
that starts at 0 and that increments by 1 immediately after each term line.
The term counter is separate from the type counter.
For example,
if the first five lines have tags b
, b
, b
, c
, and d
respectively,
then those lines specify type 0, type 1, type 2, term 0, and term 1 respectively.
Specifically, term lines have the following meanings:
-
c
type text: The constant named text whose type is ID type; HOL Light'sConst
(text,type). The encoding of type is relative to the type counter, so!
refers to the most recently specified type. For example, after the linesb0bool
andb2!!fun
andb2@!fun
mentioned above, the linec!=
specifies a constant named=
of typebool->bool->bool
. -
d
type text: The variable named text whose type is ID type; HOL Light'sVar
(text,type). For example, after the linesb0bool
andb2!!fun
andb2@!fun
andc!=
mentioned above, the lined#T
specifies a constant namedT
of typebool
. -
e
type fterm xterm: The evaluation fterm(xterm); HOL Light'sComb
(fterm,xterm). The type argument is a type ID for the type of the value. The fterm and xterm arguments are term IDs for, respectively, the function being evaluated and the evaluation point. These term IDs are encoded relative to the term counter, so!
refers to the most recently specified term before this line. For example, after the linesc!=
andd#T
mentioned above specifying terms=
andT
, the linee@@!
specifies a termComb(=,T)
, i.e. =(T), meaning the value of the = function applied to T. The first@
in this example refers to the value type, namelybool->bool
. -
f
type vterm tterm: The function vterm ↦ tterm; HOL Light'sAbs
(vterm,tterm). The type argument is a type ID for the type of the function. The vterm and tterm arguments are term IDs for, respectively, the function variable and the function output. These term IDs are encoded relative to the term counter, For example, afterd0x
specifies a variable namedx
of typebool
, the linef$!!
specifies the termAbs(x,x)
, i.e., x↦x, meaning the function that maps x to x. The$
in this example refers to the function type, namelybool->bool
.
Theorems
Each line with tag t
is a "theorem line" specifying a theorem.
Each theorem is assigned an identifying number (a "theorem ID") for future reference,
counting from 0 upwards:
the first theorem line specifies theorem 0, i.e., the theorem having theorem ID 0;
the next theorem line specifies theorem 1;
the next theorem line after that specifies theorem 2;
etc.
To track these references,
the file reader mantains a theorem counter
that starts at 0 and that increments by 1 immediately after each theorem line.
The theorem counter is separate from the term counter and the type counter.
Specifically, theorem lines have the following meaning:
t
A ... Z conclusion: The theorem saying that conclusion follows given hypotheses A through Z. The arguments (for the hypotheses and the conclusion) are term IDs encoded relative to the term counter. There must be at least one argument.
(Warning:
Often the literature on logic says "sequent" rather than "theorem",
reserving "theorem" for the case of no hypotheses,
although the conclusion can still be a term of the form P implies Q
.
"Sequent" can also refer to a more general concept
allowing a disjunction of any number of conclusions.)
Some readers that check inferences
expect the hypotheses of each theorem
to be sorted in a particular order.
The order is outside the scope of this document
(but is connected to the reason for variables being labeled d
instead of v
).
As a complete example of a theorem,
here are annotations explaining
the first lines with lowercase-letter tags in a trace of hol.ml
:
line | specifies | as |
---|---|---|
b0bool |
type 0 | type bool |
b2!!fun |
type 1 | type bool->bool |
b2@!fun |
type 2 | type bool->bool->bool |
c!= |
term 0 | constant = of type bool->bool->bool |
d#T |
term 1 | variable T of type bool |
e@@! |
term 2 | value =(T) of type bool |
b2@#fun |
type 3 | type (bool->bool)->bool |
b2#!fun |
type 4 | type (bool->bool)->(bool->bool)->bool |
c!= |
term 3 | constant = of type (bool->bool)->(bool->bool)->bool |
d%p |
term 4 | variable p of type bool |
f$!! |
term 5 | function p↦p of type bool->bool |
e@#! |
term 6 | value =(p↦p) of type (bool->bool)->bool |
e%!@ |
term 7 | value =(p↦p)(p↦p) of type bool |
e%^! |
term 8 | value =(T)(=(p↦p)(p↦p)) of type bool |
c%T |
term 9 | constant T of type bool |
e$)! |
term 10 | value =(T) of type bool->bool |
e%!$ |
term 11 | value =(T)(=(p↦p)(p↦p)) of type bool |
t! |
theorem 0 | theorem =(T)(=(p↦p)(p↦p)) |
This theorem has no hypotheses. Its conclusion is that the constant T is equal to the equality of p↦p and p↦p, i.e., that T is true if and only if p↦p equals p↦p.
Theorem names
Lines with tag /
assign names to theorems.
Specifically, these lines have the following meaning:
/
th text: Theorem th has name text. Here th is a theorem ID, encoded relative to the theorem counter. For example,/!T_DEF
says that the most recent theorem is namedT_DEF
. The name text is required to be nonempty.
A theorem can have multiple names, but multiple theorems cannot have the same name.
Inferences
Each line with an uppercase letter tag (within the list of tags below) is an "inference line" specifying an inference (a way to prove a theorem). Each inference is assigned an identifying number (an "inference ID") for future reference, counting from 0 upwards: the first inference line defines inference 0, i.e., the inference having inference ID 0; the next inference line defines inference 1; the next inference line after that defines inference 2; etc. To track these references, the file reader mantains an inference counter that starts at 0 and that increments by 1 immediately after each inference line. The inference counter is separate from the theorem counter, the term counter, and the type counter.
For example,
in a trace of hol.ml
,
the first lines include the lines shown above (b0bool
etc.) with lowercase-letter tags,
and also one inference line that proves theorem 0,
namely a C!T
line right before the c%T
line.
Some readers of HOLTrace files ignore inferences; inference lines can be simply removed from the file without affecting those readers. However, file readers that check inferences expect each theorem to be preceded in the file by at least one inference and to be proven specifically by the last of those inferences. (Non-inference lines can appear between that inference and the theorem. There can also be further inferences that prove the same theorem in different ways.)
Except for the arguments listed as text below,
all arguments on inference lines are nonnegative integers.
Arguments labeled type are type IDs, encoded relative to the current type counter;
arguments labeled term are term IDs, encoded relative to the current term counter;
arguments labeled th (or TH) are theorem IDs, encoded relative to the current theorem counter.
For example, !
refers to the most recently specified type or term or theorem respectively.
Ten of the inference tags are HOL Light's ten primitive inference rules:
-
B
term: HOL Light'sBETA
primitive inference rule. The input term is required to have the form (x↦y)(x), where x is a variable and y is a term. This rule generates the conclusion (x↦y)(x) = y with no hypotheses. -
E
th TH: HOL Light'sMK_COMB
primitive inference rule. There are two input theorems th and TH, which are required to have conclusions of the form a = b and C = D respectively, with a having suitable type to be evaluated at C. This rule generates the conclusion a(C) = b(D) from a merged list of hypotheses of the input theorems. -
F
term th: HOL Light'sABS
primitive inference rule. The input theorem th is required to have a conclusion of the form a = b. The input term is required to be a variable that does not appear free in any of the hypotheses of th. This rule generates the conclusion (term↦a) = (term↦b) from the same hypotheses as th. -
P
th TH: HOL Light'sEQ_MP
primitive inference rule. There are two input theorems th and TH, which are required to have conclusions of the form a = b and A respectively, where a is the same as A modulo renaming of bound variables. This rule generates the conclusion b from a merged list of hypotheses of the input theorems. -
R
term: HOL Light'sREFL
primitive inference rule. This rule generates the conclusion term = term with no hypotheses. -
S
th oldterm newterm ... oldterm newterm: HOL Light'sINST
primitive inference rule for term substitution in the hypotheses and conclusion of theorem th, including renaming of bound variables if necessary. Beware that replacing term variable v with t and replacing term variable w with u is written in HOLTrace format with arguments v t w u, whereas theINST
input is [t,v;u,w]. There can be 0 substitutions, although this is not useful. -
T
th oldtype newtype ... oldtype newtype: HOL Light'sINST_TYPE
primitive inference rule for type substitution in the hypotheses and conclusion of theorem th, including renaming of bound variables if necessary. Beware that replacing type variable v with t and replacing type variable w with u is written in HOLTrace format with arguments v t w u, whereas theINST_TYPE
input is [t,v;u,w]. There can be 0 substitutions, although this is not useful. -
U
term: HOL Light'sASSUME
primitive inference rule. The input term is required to have typebool
. This rule generates the conclusion term from the hypothesis term. -
V
th TH: HOL Light'sTRANS
primitive inference rule. There are two input theorems th and TH, which are required to have conclusions of the form a = b and B = C respectively, where b is the same as B modulo renaming of bound variables. This rule generates the conclusion a = C from a merged list of hypotheses of the input theorems. -
Z
th TH: HOL Light'sDEDUCT_ANTISYM_RULE
primitive inference rule. This rule generates the conclusion p = P, where p and P are the conclusions of th and TH respectively, from a merged list of the following hypotheses:- hypotheses of th that are different from P modulo renaming of bound variables;
- hypotheses of TH that are different from p modulo renaming of bound variables.
For example, given theorems drawing the conclusion p from the hypothesis P and drawing the conclusion P from the hypothesis p, this rule generates the conclusion p = P with no hypotheses.
The other inference tags are as follows:
-
X
term: HOL Light'snew_axiom
(reported asAXIOM
by ProofTrace). This rule generates the conclusion term from no hypotheses, and adds term to a list of axioms. -
C
term text: HOL Light'snew_basic_definition
(reported asDEFINITION
by ProofTrace). The input term is required to have the form c = t, where c is a variable named text of some type. (HOL Light'snew_basic_definition
takes a variable as input; beware that ProofTrace reports the output, which has a constant.) The term t is required to have no free variables, and to have no type variables except for type variables appearing in c. This rule generates the conclusion C = t with no hypotheses, where C is a constant named text. -
A
reptype repterm th text: HOL Light'snew_basic_type_definition
(reported asTYPE_DEFINITION
by ProofTrace). The input theorem th is required to have a conclusion of the form repterm(x) with no hypotheses, where x is some term of type reptype. The final argument text is required to have the form]
abstypename]
absname]
repname]
tyvar0]
tyvar1]
...]
, starting and ending with]
. This produces a conclusion (with no hypotheses) of the form absname(repname(a
)) =a
wherea
is a variable of type reptype,- absname maps from abstypename(tyvar0,tyvar1,...) to reptype, and
- repname maps the other way.
The names tyvar0, tyvar1, etc. must be the output of the following: take all type variables appearing in repterm, sort in lexicographic order, and remove duplicates.
Each
A
line is assigned another counter, the "typedef ID", which starts at 0 and is incremented after eachA
line. -
D
typedef: the other theorem from HOL Light'snew_basic_type_definition
(this theorem is reported on a secondTYPE_DEFINITION
line by ProofTrace). Here typedef is a typedef ID of anA
line. This produces a conclusion (with no hypotheses) of the form repterm(r
) = (repname(absname(r
)) =r
) wherer
is a variable.
As an example of a type definition,
take type variables A
and B
,
and consider the set of functions
of the form (a ↦ b ↦ a=x and b=y)
having type A->B->bool
,
where x and y range over A
and B
respectively.
Types in HOL Light are nonempty,
so this set is nonempty.
Assume that the most recent theorem proves that this set is nonempty;
that the 430th term before now is this set,
with type (A->B->bool)->bool
;
and that type 49 is the type A->B->bool
.
Then the inference
A1H-DE!]prod]ABS_prod]REP_prod]A]B]
returns the theorem ABS_prod(REP_prod(a)) = a
,
where ABS_prod
maps A->B->bool
to a new type prod(A,B)
and REP_prod
maps the other way.
All constant names must be distinct strings,
where constant names mean the following:
=
;
@
;
text in each C
line;
absname in each A
line;
repname in each A
line.
All type names must be distinct strings,
where type names mean the following:
bool
;
fun
;
ind
;
abstypename in each A
line.
ProofTrace IDs
The tags ,
and ;
are used to enable exact regeneration of a ProofTrace file
given an appropriate HOLTrace file.
Exact regeneration means a byte-for-byte match.
This often requires
multiple copies of the same theorem
and multiple proofs of the same theorem;
ProofTrace has its own set of IDs distinguishing these.
A file reader that handles ProofTrace-related tags
maintains a "ProofTrace counter"
that starts at 0 and that increments by 1 immediately after
each line with ,
or ;
as a tag.
The ProofTrace counter is separate from
the other counters mentioned above.
ProofTrace lines have the following meanings:
-
,
newth inf ptth ptTH: The current ProofTrace ID includes inference inf deriving theorem newth. Here newth is a theorem ID (encoded relative to the theorem counter); inf is an inference ID (encoded relative to the inference counter); ptth and ptTH are ProofTrace IDs (encoded relative to the ProofTrace counter) for theorems matching the th and TH inputs to the inference rule. The ptTH argument is omitted if the inference rule does not involve TH, and the ptth argument is omitted if the inference rule does not involve th. For example, the line,!!
means that the current ProofTrace ID includes the most recent inference deriving the most recent theorem, via an inference rule not involving TH or th. -
;
: No effect other than increasing the ProofTrace ID counter. (ProofTrace IDs are mostly sequential starting from 0 but skip numbers on certain occasions.)
Outside the context of generating ProofTrace outputs
(to check against ProofTrace or to interoperate with tools that expect ProofTrace format),
lines with tag ,
(or ;
) are not useful and can be removed,
typically saving more than half of the space in a HOLTrace file
(and speeding up further processing).
File readers not using ProofTrace tags also do not need the inference counter,
although an "inference initialized" flag
is useful for checking each theorem against the most recent inference.
A ProofTrace-to-HOLTrace converter
generates /
before ,
when it sees an ID that has a theorem name.
A HOLTrace-to-ProofTrace converter
treats /
as applying to the current ProofTrace ID.
Version: This is version 2025.06.16 of the "Format" web page.