Monday, December 4, 2017

Build a Compiler for AlbaCore in Spring 2018

Profs. Brockman and Thain are currently recruiting multiple undergraduate students to participate in a spring project at the intersection of compilers and computer architecture.  The objective is to build a software toolchain (compiler, assembler, simulator, documentation, etc) that will allow programs written in C-minor to be compiled to the albaCore computer architecture.  This package will be used in future offerings of logic design to assist students in running real, complex programs on custom FPGA hardware.

Juniors or seniors who have taken either compilers or architecture (or both) are invited to apply by contacting either Prof. Thain ( or Brockman (  The project will be offered as a three-credit undergraduate research class in Spring 2018.

Tuesday, June 27, 2017

Talk at ScienceCloud Workshop

Prof. Thain gave the opening talk, "Seamless Scientific Computing from Laptops to Clouds", at the ScienceCloud workshop preceding High Performance Distributed Computing 2017 in Washington, DC.  This talk gives an overview of the problem of migrating scientific codes from the comfortable environment of a laptop to the complex environment of a cluster or a cloud, highlighting our new tools for software deployment and resource management for bioinformatics and high energy physics applications.

Wednesday, May 31, 2017

Online Course in Data Intensive Scientific Computing

We are happy to announce the pilot of a new online short course in Data Intensive Scientific Computing.  This is the equivalent of a one-credit seminar which provides an introduction to the challenges of scientific computing at large scale and the tools used to address those problems.

The course was designed to augment our summer REU program in DISC, but is also suitable for undergraduate students taking research credits, and for graduating students in all disciplines looking for an introduction to topics and tools in scientific computing.

By default, the online course is ungraded: anyone is welcome to sign up, view the lectures, take the quizzes, and follow the tutorials.  If you want to receive a grade, talk to a faculty member at your institution to see if they will work with you on the material.

The course is developed by Prof. Douglas Thain and Prof. Paul Brenner, produced by the Office of Digital Learning at the University of Notre Dame, and offered through the EdX Edge platform.

You can check out a sample lecture here:

And here is an overview of the course structure:

Thursday, February 2, 2017

Writing a Compilers Textbook

To my surprise, I am in the final steps of writing a textbook!  You can see a sample chapter today at

The effort began in the fall of 2016, as I was putting together my materials for CSE 40243, our undergraduate class in Compilers and Language Design.  This class focuses on the challenges of engineering a working language: students implement a working compiler that translates a C-like language into X86 assembly.

While there are a variety of solid textbooks that are great for a graduate course in compiler theory and optimization, none quite had the flavor I was looking for.  Nearly every CS grad needs to write a parser, evaluator, or translator for some kind of little language in their career, but relatively few need to dig deeply into assembly language optimization.  So, I wanted to focus on language design choices and show that simple languages are not hard to implement.

I began to combine my handwritten chalkboard notes and some sample code into a LaTeX document, and the next thing you know, I have seven chapters written.  I expect to finalize everything in the spring 2017 semester.

What has made it relatively easy so far is that my compiler automatically generates many of the figures and code examples automatically, so relatively few things have to be drawn by hand.  For example, this sample AST is produced automatically by the compiler emitting Graphviz DOT code from the internal representation.  Neat, eh?

Following the example of Remzi and Andrea Arpaci-Dusseau with OSTEP the book will be made available for free online in PDF form, and also in an inexpensive hardcover edition printed on-demand.

Stay tuned for the release later in 2017...

Tuesday, April 5, 2016

NunyaOS: An Experimental OS Kernel

This semester, I am organizing an experimental class around the design of an operating system kernel.  Six students formed a team in response to a call for volunteers, and now busy designing NunyaOS, an experimental OS kernel.  Building on top of the Basekernel, they have built a system that boots an X86 machine, reads a CD-ROM filesystem, runs multiple processes in paged virtual memory, and has a simple windowing system.  We are off too a good start.

To try it out, download the source, build it, and run it in a VM like this:
qemu-system-i386 --cdrom basekernel.iso

The key organizing principle of NunyaOS is hierarchical containment.  This means that each process lives within a security container.  Within that container, the process has complete authority to manipulate its resources.  It also has the power to create sub-containers and then place child processes within them.  The containment can be applied to each of the resources within the system -- currently the filesystem, the window system, and the memory allocator.  As a result, each process lives a in a sort of a lightweight virtual machine, where it perceives itself to be the superuser.

For example, here are a few nested containers, each with their own filesystem root, display, and memory allocation:

Ideally, every child process will live in a container, so that we can eliminate attack vectors between code provided from different sources.  For example, your desktop should run your web browser in a container, your web browser should run each tab in a container, and each tab should run downloaded code (like a video codec) in yet another container.  In this way, untrusted code has very little leeway to affect other elements of your system.

Of course, this idea changes the customs by which processes interact with each other.  We can no longer build programs that scatter data all over the filesystem, and expect others to read it.  There are many challenges here, and we have only begun to dig into them.

Wednesday, September 16, 2015

Sandboxes, Distributed Computing, and Closures

The sandbox abstraction comes up over and over again in the context of distributed computing.  We didn't invent it, but it appears so frequently that it's worth giving it a name and explaining how it works.  This abstraction is our model of execution in the Makeflow workflow system, the Work Queue master-worker framework, the Umbrella environment generator, and other systems, and this is what enables these tools to interoperate.

The sandbox abstraction is a way of specifying a remote execution in a way that can be efficiently delivered and precisely reproduced.  To run a task, the user states the following:

run C = T( A, B ) in environment E

In this example, A and B are the input files to the task, T is the command to be run, and C is the output file produced.  (Obviously, there can be more input and output files as needed.)

The sandbox itself is a private namespace in which the task can run, isolated from the outside world.  This enables the task to perceive the input and output files in a different way than the called.  A sandbox can be implemented in many ways: the simplest is just a plain old directory, but it could a Linux container or even a whole virtual machine.

The environment E is all the additional data that needs to be present within the sandbox: the operating system, the filesystem tree, program binaries, scripts, etc, which are represented by L1, L2, and L3 above.  The environment must be compatible with the sandbox technology.  For example, a tarball is a sufficient environment for executing within a directory sandbox, while a virtual machine image is needed for a virtual machine sandbox. 

Now, the pieces must all come together:  The sandbox must be created and the environment unpacked within it.  The input files must be moved to the execution site and copied or otherwise connected to the sandbox.  The task is run, producing the output, which must then be moved outside of the sandbox to the desired location.  Then, the sandbox may be discarded.

Once you begin to execute all tasks using the sandbox abstraction, many things become easier.
  • Executing tasks at remote sites becomes very easy, because all of the necessary dependencies are explicit and can be moved around the world.  (e.g. Work Queue
  • Similar tasks running on the same machine can share input objects, to improve efficiency.  (e.g. Umbrella)
  • Multiple tasks can be chained together while respecting independent namespaces.  (e.g. Makeflow)
Of course, all of these properties are not accidental: they have a good precedent in the realm of language theory.  A sandbox execution is really just a closure, which is the name for a function combined with an environment, which is a set of bindings from names to values.

Wednesday, May 20, 2015

Writing Solid Tests is (Still) Hard

We have a nice little automatic build-and-test system for the Cooperative Computing Tools which has nicely brought together the capabilities of Github, Condor, and Docker.

Every proposed merge to the codebase is packaged up as a build job which is dispatched to our Condor pool.  Some of those jobs run natively on bare hardware, some jobs run on virtual machines, and some are running in Docker containers, but all of them are managed by Condor, so we can have a zillion builds going simultaneously without too much conflict.

The result is that anytime someone proposes a pull request, it gets run through the system and a few minutes later we get a new row on the web display that shows whether each platform built and tested correctly.  It's very handy, and provides for objective evaluation and gentle pressure on those who break the build.

(I should point out here that Patrick Donnelly and Ben Tovar have done a bang-up job of building the system, and undergraduate student Victor Hawley added the Docker component.)

Some days the board is all green, and some days it looks more like this:

But the hardest part of this seems to be writing the tests properly.  Each test is a little structured script that sets up an environment, runs some component of our software, and then evaluates the results.  It might start up a Chirp server and run some file transfers, or run Parrot on a tricky bit of Unix code, or run Makeflow on a little workflow to see if various options work correctly.

Unfortunately, there are many ways that the tests can fail without revealing a bug in the code! We recently added several platforms to the build, resulting in a large number of test failures.  Some of these were due to differences between Unix utilities like sh, dd, and sed on the various machines. Others were more subtle, resulting from race conditions in concurrent actions.  (For example, should you start a Master in the foreground and then a Worker in the background, or vice versa.)  There is a certain art to being able to write a shell script that is portable and robust.

There is also a tension in the complexity of the tests.  On one hand, you want short, focused tests that exercise individual features, so that they can be completed in a few minutes at give immediate feedback.

On the other hand, you also want to run big complex applications, so as to test the system at scale and under load.  We don't really know that a given release of Parrot works at scale until it has run on 10K cores for a week for a CMS physics workload.  If each core consumes 30W of power over 7 days, that's a 50 megawatt-hour test!  Yikes!

Better not run that one automatically.