Trustworthy File Systems
Develop a framework for cogenerating high-performance systems (file systems in particular) code and proofs which guarantee that the generated low-level code is a correct implementation up to some abstract specification of the system.
File systems (FSs) are too important, and current ones are too buggy, to remain unverified. Yet the most successful verification methods for functional correctness remain too expensive for current file system implementations — we need verified correctness but at reasonable cost.
We are working towards this goal with ext2fs from linux, and BilbyFS, a high-performance flash file system as case studies. BilbyFs is carefully designed to be highly modular, so it can be verified against a high-level functional specification one component at a time. This modular implementation is captured in a set of domain-specific languages (DSLs) from which we produce the design-level specification, as well as its optimised C implementation. Importantly, we also automatically generate the proof linking these two artefacts. The combination of these features dramatically reduces verification effort. Verified file systems are now within reach for the first time.
Approach: Cogeneration of code and proof
Our approach relies on number of key observations:
- FSs lend themselves to modular implementation, partially as a result to a number of pre-defined abstraction levels. The implementation can be decomposed into multiple components where the correctness of each component can be verified separately.
- Due to the nature of FSs, their implementations tend not to have much sharing among data structures. This feature can be well captured by linear type systems, which directly enable easy memory management and as a consequence build a straight linkage between a pure high-level semantics and a stateful low-level semantics.
- The control logic is cluttered and obscured with error-handling code; in some modules error-handling makes up 80% of all code. Much of this is boilerplate (but nevertheless error-prone) and an also be abstracted away into a DSL. This then lets the implementer focus on the high-level logic.
- In-memory data structures for maintaining the file system information can largely be handled by suitable abstract data types (ADTs), which are separately implemented and verified.
- A large subset of FS code complexity can be easily described using DSLs, namely code for serialising and de-serialising between rich, in-memory data structures and flat, on-medium representations and code implementing the actual file-system logic.
The below figure shows the main components of our framework. Dark grey boxes are the tools that we employ in this framework. Light grey boxes represent code or specifications that are automatically generated; white boxes represent semantics; block arrows represent the proofs which connect them (with green boxes being the part that we have managed to fully automate). Dotted boxes indicate components which the FS implementer must provide. While the correctness specification, proofs and implementation (dotted boxes and arrows in the figure) are specific to the FS being built, ADT library is reusable across many different FSs.
As indicated in the diagram, the file system implementer must provide four distinct components:
- A high-level specification of FS functionality (in Isabelle/HOL)
- The proof that the COGENT generated high-level specification corresponds to the functional correctness specification of the FS
- The implementation of the FS in COGENT
- The implementation of ADTs and interfaces to OS in C, together with proofs that the implementation is correct with respect to the functional correctness specification (partially provided by us or other ADT authors)
COGENT compiler takes COGENT source code as input and produces, from the bottom of the figure above: the C code, the semantics of the C code expressed in Isabelle/Simpl, the same expressed as a monadic functional program, a monomorphic deep embedding of the COGENT program in A normal form, a polymorphic A-normal deep embedding of the same, an A-normal shallow embedding, and finally a 'neat' shallow embedding of the COGENT program that is syntactically close to the COGENT source code. Several theorems rely on the COGENT program being well-typed, which we prove automatically using type inference information from the compiler. The solid arrows on the right-hand side of the figure represent refinement proofs. The only arrow that is not formally verified is the one crossing from C code into Isabelle/HOL at the bottom (the red arrow) — this is the C parser, which is a mature verification tool used in a number of large-scale verifications (e.g. seL4).
For well-typed COGENT programs, we prove a series of refinement relations. The refinement proofs state that every behaviour exhibited by the C code can also be exhibited by the COGENT code and, furthermore, that the C code is always well-defined, including that e.g. the generated C code never dereferences a null pointer, and never causes signed overflow. It also implies that the generated C code is type-safe and memory-safe, meaning the code will never try to dereference an invalid pointer, or try to dereference two aliasing pointers of incompatible types. From the generated proofs, We additionally get guarantees that the generated code handles all error cases, is free of memory leaks, and never double-frees a pointer.
The project explores a new methodology for designing and formally verifying the correctness of systems code, based on a new language COGENT. We use file systems for case studies. Specific contributions include:
- Specification of FS correctness
- The use of linearly typed COGENT language for real world FS implementation
- Proof of FS operations correctness and how COGENT reduces the effort
- The design and formal development of the COGENT language and its certifying compiler
- The fully automatic refinement tower between C code and COGENT's language specification
Code and documentations for this project can be found here: Github
- Given the fact that FS code consist of a large portion of data serialisation/deserialisation, and most of them are boilerplate code, we plan to have another data description language on top of COGENT to further automate verifiably correct FS implementation. We have designed a prototype of that language.
- From the studies we conducted to date, we have a rather clear picture of the pros and cons of the COGENT design, and will examine other alternatives and improvements.
Gabriele Keller (project leader), gabriele.keller<at>nicta.com.au
|Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O'Connor, Joel Beeren, Yutaka Nagashima, Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein and Gernot Heiser|
Cogent: verifying high-assurance file system implementations
International Conference on Architectural Support for Programming Languages and Operating Systems, Atlanta, GA, USA, April, 2016
|Sidney Amani and Toby Murray|
Specifying a realistic file system
Workshop on Models for Formal Analysis of Real Systems, pp. 1-9, Suva, Fiji, November, 2015
||Gabriele Keller, Toby Murray, Sidney Amani, Liam O'Connor-Davis, Zilin Chen, Leonid Ryzhyk, Gerwin Klein and Gernot Heiser|
File systems deserve verification too!
Workshop on Programming Languages and Operating Systems (PLOS), pp. 1-7, Farmington, Pennsylvania, USA, November, 2013
A revised version of this paper was published in Operating Systems Review, Volume 48, Issue 1, January 2014, pages 58-64.
||Sidney Amani, Leonid Ryzhyk and Toby Murray|
Towards a fully verified file system
Poster presentation at EuroSys Doctoral Workshop, Bern, Switzerland, April, 2012