The Australian National University
CSL Home | CECS Home | ANU Home | Search ANU | HORUS | Staff Home

Logic and Computation Group: Potential Projects

Projects

The following are suggested projects which members of the Logic and Computation group have indicated they are willing to supervise. Some of them could be undertaken either briefly for a summer internship or in greater depth and detail for a Masters or PhD thesis. Suitability for PhD, Masters, Honours or Summer Scholarship is indicated in each case.

Students interested in pursuing research at any level in the group are invited to contact the group members directly, mentioning any of the advertised projects which they find interesting.

Optimal policies for stochastic local search
Study of the performance limits of local search methods for propositional satisfiability problems.
Supervisor: John Slaney

Stochastic Local Search (SLS) algorithms for propositional satisfiability (SAT) problems have been shown to perform spectacularly well where the problems are fairly unstructured (random k-SAT, for instance) but to be less impressive in the cases of more structured problems. In this project, we treat the search problem as a Markov decision process (MDP) in which the various moves available to the SLS solver, such as "flip a random bit" or "flip a random bit in a random unsatisfied clause", are the actions determining (stochastic) transitions and in which the optimal policy is that determining the stochastic shortest path to a goal state.

Of course, practical algorithms cannot really solve an MDP with trillions of states in reasonable time, so this is not proposed as a solution method. The point of the project is to carry out a theoretical and experimental examination of the effect of different moves and different strategies with a view to comparing strategies to the optimum.

Requirements on the part of the student are an interest in automated reasoning, some mathematical background and a willingness to undertake a good deal of experimental work.

The project could be a suitable basis for an honours or maybe a Masters thesis. Alternatively it could form part of a larger PhD project on local search or on the mathematics of boolean logic.

Son of SCOTT: a high-performance automatic theorem prover
Development of a semantically guided theorem prover with world-beating performance.
Supervisor: John Slaney

The SCOTT project, which has been running intermittently for about 15 years, is concerned with using (small) models of sets of formulae to enhance the performance of a theorem prover for first order logic. Two recent developments have made it interesting to start a new phase of the project. Firstly, in 2004 we developed prototype of SOS (Son of SCOTT), a new prover with a much more efficient way of using semantic information. Secondly, the underlying theorem prover Otter was superceded by the much faster Prover9. Putting all of this together, it is now possible to construct one of the best systems in the world for general purpose first order reasoning.

Writing, testing, refining and using an important new piece of reasoning software is a project requiring creativity as well as programming skills. Investigating the effectiveness of semantic guidance as an element of first order deduction further requires an understanding of logic and proof. For the right student, this project offers the opportunity to "push the envelope" of what mechanised reasoning software can do and thus to make an impact on the applications of automated reasoning.

This is a PhD project. Fragments of it could be undertaken as the research components of lower degrees.

ARIA: Automated reasoning in algebra
A critical examination of the application of automated systems to algebraic reasoning.
Supervisors: Tomasz Kowalski, John Slaney

Some of the most impressive successes of automatic proof generation have been in abstract algebra, the most celebrated being the proof of the Robbins conjecture by McCune using the theorem prover EQP. Moreover, a brief examination of the TPTP (Thousands of Problems for Theorem Provers) library reveals a large collection of problems from elementary algebra: in addition to the ALG section, there are whole sections on group theory, ring theory, field theory, lattice theory and boolean algebra. This emphasis on algebraic reasoning is perhaps not too surprising given that algebra naturally consists of theories determined by small sets of axioms and features symbolic rather than, say, numerical reasoning. TPTP contains many problems from both classical and non-classical propositional logic for much the same reasons. Unlike propositional logic, however, algebra engages the interest of a large mathematical community, so a success of automated reasoning such as the settling of the Robbins conjecture is a significant event. At the same time, it has been often enough said that the Robbins conjecture - what we should now call EQP's Theorem - is "just algebra", as though that makes it less important than the much more difficult reasoning in analysis or number theory being attempted in other automated reasoning research programs. Behind this reaction is perhaps the thought that algebraic manipulations are expected to be easy for machines, however painful they may be for human mathematicians.

An examination of the relationship between automated reasoning and abstract algebra is overdue. The proposed ARIA project will survey the past and present state of research into automatic proof of algebraic theorems and will help to shape such research by directing it towards what is of interest to algebraists rather than towards what is easy for the existing tools. That is, in a sentence, we intend to substitute mathematical pull for technological push.

This is quite a flexible project. It could be undertaken as a one-year project by an honours student with a strong mathematical background, or extended to form the basis for PhD level research.

Real-time Logic
Concepts and techniques for reasoning with unstable data.
Supervisor: John Slaney

In all standard views and applications of logic, we are concerned with the timeless notion of logical closure of a set (or sometimes a multiset or whatever) of formulae. The consequences of data just are what they are, instantaneously, even though in practice it takes time to enumerate them - indeed, it would take forever to enumerate them all. In this project, we ask whether the standard notion is the most appropriate way to think of the matter, given that sometimes a reasoning agent must make and continually update its deductions, and base decisions upon them, against a background of changing data. Does it make sense to think of consequence as somehow rippling out from the input data, so that at any given moment the "correct" answer as to whether a given conclusion follows depends on its deductive distance from the input? Are there useful notions of logical robustness (insensitivity to small changes), for instance, that can be used to qualify proof systems or databases? Does the reliability of inference depend on time as well as on logical accuracy?

These questions are very radical - there is virtually no literature on the subject. We can approach them both from the standpoint of logical theory, as they may suggest new foundations for the very notion of consequence, and from that of practical automated reasoning, as they may lead to new design features for implementations aimed at real-time applications.

This is a challenging PhD project, potentially with a high impact but also with a risk that we find nothing useful. One I would be keen to explore in collaboration with an adventurous student!

Logical analysis of constraint models
Application of automated deduction to data-independent CSPs
Supervisor: Peter Baumgartner

Constraint programming is a highly successful technology for solving all kinds of combinatorial problems. It is one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it. A constraint model consists of a (declarative) specification of the problem domain (e.g. what a schedule is) and a data part specifying the concrete problem.

Now, an interesting situation appears when the data part is absent. Can we still do useful things then? Yes! Logical techniques based on automated theorem proving can be used to analyze the domain specification. For instance, one might want to optimize the given specification. The situation is comparable to query optimization in databases, which amounts to modifying a query (without knowing the database) so that it can be solved more efficiently.

Here are some possible topics. Each of them admits practical and/or theoretical focus.

  • Translation of problem descriptions ("models") into logical formalism.
  • Model optimization via mapping to logic proof tasks.
  • Application of SMT technology for problem analysis.
  • Integration and extension of the above mentioned base technology into the G12 platform.

The work would be embedded in a larger project that develops NICTA's software platform for solving large scale industrial combinatorial optimisation problems.

Instance-based methods
(Mainly) extending the theorem prover Darwin which is (co-)developed in our group
Supervisor: Peter Baumgartner

Instance-based methods for first-order theorem proving search for proofs by maintaining a set of instances of clauses and analyzing it for satisfiability until completion. IBMs are conceptually different from well established methods like resolution or free-variable analytic tableaux, with a different search space and different termination behaviour. This makes them attractive from a practical point of view as a complementary method. For instance, IBMs are decision procedures for function-free clause sets and thus capture the complexity class NEXPTIME. Topics include:

  • Equality reasoning: reasoning with equations is of paramount importance in virtually all application areas. As the theory of doing this within Darwin is already developed (one could always improve, though) this is mostly a programming project. Programming language is OCaml.
  • Numbers: first-order theorem provers are utterly bad with numbers. This project is about adding and experimenting with finite-domain integer arithmetic to Darwin. This could be done by clever encoding arithmetic in the input and/or extending Darwin.
  • Minimal model computation: Darwin can compute finite models of first-order logic formulas (provided they exist). For some applications, e.g., within computational linguistics or diagnosis, it is not enough to deliver just any model. One wants to have a minimal model. In diagnosis, for instance, this corresponds to the experience that usually only a minimal set of components of a device breaks at a time.
  • Some domains are necessarily infinite, e.g., the integers. Instance-based methods are bad at detecting satisfiability in presence of axiom sets that admit infinite models only. Can we find a clever mechanism to improve the situation, i.e. to discover and represent (certain) infinite models? This is a challenging (also) theoretical project.
  • Many practically interesting problem classes are NEXPTIME-complete. Examples include satisfiability of SHOIQ knowledge bases (important for the Semantic Web), first-order model expansion (relevant for constraint solving), satisfiability in the Ackermann-Class with equality, and First-Order logic with two variables and counting quantifiers. This suggests a number of investigations: to look at existing transformations of the above problems into function-free clause logic, check if they are feasible, perhaps improve them and use tools such as our Darwin prover to solve them.

Verified compilation
Towards a provably correct compiler for a standard programming language.
Supervisor: Michael Norrish

It is now possible to describe real world languages, such as C, Java and Ada, using formal methods (typically by giving the languages an operational semantics). Real world chip-architectures (such as the ARM instruction set) have also been formally described. The big, remaining project is to tie two such descriptions together, producing a compiler that is proven to produce correct output.

Issues:

  • Modern compilers include many optimisations. Can a verified compiler also include verified optimisations?

  • The “obvious” way to get a verified compiler is to define the function that maps from source-code to target assembly in a logic, and then prove that this function preserves behaviour (for some value of “behaviour”). Another approach might also be possible: write code to prove that a piece of assembly implemented the same behaviour as the source. As long as the proof procedure had high coverage (it might be tailored to only verify the output of a particular compilation function, which function might also output hints to help the procedure do the proof), this would be “just as good”.

Tangents and Variations:

  • Compile a functional or logic language like SML or Prolog.

  • Give a semantics to the sort of code that appears in operating system kernels, which mixes C and assembler, potentially within the same function.

  • Verify sophisticated optimisations on intermediate languages.

Further reading:

But: Xavier Leroy may well “do” this whole project in the near future; his existing verifications go from a largish subset of C to a subset of the PowerPC instruction set.

Formal Protocol Design and Analysis
Contribution to the specification and alanysis of communication protocols
Supervisor: Michael Norrish

Design a language capable of expressing the intricacies of modern communications protocols (such as TCP), but which also admits a reasonably efficient checking function, able to determine if an observed trace could have been generated by a given protocol description. Such a tool and language would be extremely valable to the community of protocol designers and implementors.

This work would form part of NICTA's project "Valildating Network Semantics", which is a collaboration with researchers at the University of Cambridge.

Tangents and Variations:

  • Write a tool to animate a specification so that it behaves as a (possibly inefficient) implementation of the protocol.

Further reading:

Theory of Co-algebraic Types with Binders
Development of new logical theory relevant to programming language semantics
Supervisor: Michael Norrish

Researchers in the area of programming language semantics have long been exercised by the problems thrown up the notions of binders and α-equivalence. Some progress has been made recently, but focussing on types that are quotients of algebraic or inductive types, where values are finite in size. Co-algebraic types are potentially infinite in size, which means that assumptions like the one that a value only has finitely many free variables will no longer hold.

Developing a theory of co-algebraic types with binders would probably involve proving co-induction and co-recursion principles for these types, and demonstrating their utility in proofs drawn from the literature. One possible example would be the Boehm tree, which is used in the theory of the λ-calculus.

Further reading:

Mechanise X, Apply to Y
Contribution both to mechanised mathematics and to automated reasoning applications.
Supervisor: Michael Norrish

A possible structure for a PhD project is to mechanise some large-ish piece of mathematics (X) and to then use this to solve some “real” computer science problem (Y). Mechanisation is likely to require sustained use of some tool capable of serving as a reasoning assistant.

To bring this PhD template down to earth, consider a couple of examples. One might, for instance, mechanise real analysis, and then use it to verify floating-point algorithms ( John Harrison). Or mechanise probability theory, and then use it to verify probabilistic algorithms ( Joe Hurd).

Theorem-Proving Implementation—Interfaces
Exploring the area where automated reasoning overlaps with HCI
Supervisor: Michael Norrish

How would one take an interactive, but powerful, tool such as ACL2, HOL, Isabelle, or PVS and make it more user-friendly, without compromsing its ability to handle large problems? A graphical user-interface might be appropriate, but would a GUI put off the power-user? Perhaps a more significant rethinking of the theorem-proving work-flow is necessary.

Further reading:

Theorem-Proving Implementation—Core Technology
Proving more theorems faster, with or without human intervention
Supervisor: Michael Norrish

This would be work on topics such as first-order proof, rewriting and decision procedures for subject domains like arithmetic. There is a wealth of expertise in CSL and the Logic & Computation programme on these areas, so I might not necessarily be the best person to act as supervisor, though work on these technologies in an LCF setting (which tends to be interesting research) would be something I would be involved with.

Proof Optimisation
Improving low-level proofs extracted from high-level ones
Supervisor: Michael Norrish

A proof over an LCF-style kernel is usually expressed (by the user) in terms of high-level steps or tactics. Ultimately however, the proof is still executed at the lowest level of primitive inference steps. If these steps are recorded, the resulting “proof trace” is very long, for all that it may be possible to execute it quickly.

A possible project would be to investigate these traces and optimise them so that they proved the same results but did so faster. Many of the tricks of compiler optimisation might well have proof analogues, and insights from proof theory may also be relevant.

This may turn out not to involve enough new work to warrant a PhD, so it might be best undertaken as the research component of a lower degree first, to act as a pilot project for something larger if necessary.

General HOL4 Upgrade
Programming project
Supervisor: Michael Norrish

Upgrade the current HOL implementation (“HOL4”) to run under the MLton compiler. This would require a fairly substantial re-engineering of the basic HOL work-flow, and the way it represents theories. Once this much was done, there’d be scope to play with connections to Eclipse (see interfaces project above).

Further reading:

Finding Models of Unsolvable First Order Goals
Implementing a generator of counter-examples for use in a proof assistant
Supervisors: Peter Baumgartner and Michael Norrish

When using an interactive proof assistant, a powerful tactic for finishing goals is the use of first order proof procedures. Unfortunately, these procedures are all-or-nothing affairs: if they succeed, they succeed completely, but if they fail to prove the goal, the user knows only that the provided lemmas and assumptions were not sufficient to allow the procedure to find a proof.

The advent of model generation tools promises a way out of this painful situation. These tools can be used to generate what is effectively a counter-example to false goals. This is much more helpful for the user. Either the tool's output really is a counter-example and their proof will never succeed, or the counter-example will be impossible because the user has forgotten to tell the tools some relevant lemma from a database of known facts.

The student’s task in this project would be to implement a connection between two existing tools: the interactive system HOL4, and the model generation tool Darwin. This connection would provide a user-friendly implementation of the scheme described above: when an invocation of the first order prover in HOL failed, the connection implementation would translate the HOL goal into terms that Darwin could understand, call Darwin, and then finally translate Darwin's provided counter-example back into a human-readable form for the user’s benefit.

This project would suit a student familiar with functional programming (HOL is written in SML), and happy with the concepts of first order logic (terms, formulas, quantifiers).