Tuesday, June 27, 2017

Talk at ScienceCloud Workshop

Prof. Thain gave the opening talk, "Seamless Scientific Computing from Laptops to Clouds", at the ScienceCloud workshop preceding High Performance Distributed Computing 2017 in Washington, DC.  This talk gives an overview of the problem of migrating scientific codes from the comfortable environment of a laptop to the complex environment of a cluster or a cloud, highlighting our new tools for software deployment and resource management for bioinformatics and high energy physics applications.

Wednesday, May 31, 2017

Online Course in Data Intensive Scientific Computing

We are happy to announce the pilot of a new online short course in Data Intensive Scientific Computing.  This is the equivalent of a one-credit seminar that introduces the challenges of scientific computing at large scale and the tools used to address them.

The course was designed to augment our summer REU program in DISC, but is also suitable for undergraduate students taking research credits, and for graduate students in all disciplines looking for an introduction to the topics and tools of scientific computing.

By default, the online course is ungraded: anyone is welcome to sign up, view the lectures, take the quizzes, and follow the tutorials.  If you want to receive a grade, talk to a faculty member at your institution to see if they will work with you on the material.

The course is developed by Prof. Douglas Thain and Prof. Paul Brenner, produced by the Office of Digital Learning at the University of Notre Dame, and offered through the EdX Edge platform.

You can check out a sample lecture here:



And here is an overview of the course structure:

Thursday, February 2, 2017

Writing a Compilers Textbook

To my surprise, I am in the final steps of writing a textbook!  You can see a sample chapter today at compilerbook.org.

The effort began in the fall of 2016, as I was putting together my materials for CSE 40243, our undergraduate class in Compilers and Language Design.  This class focuses on the challenges of engineering a working language: students implement a complete compiler that translates a C-like language into X86 assembly.

While there are a variety of solid textbooks that are great for a graduate course in compiler theory and optimization, none quite had the flavor I was looking for.  Nearly every CS grad needs to write a parser, evaluator, or translator for some kind of little language in their career, but relatively few need to dig deeply into assembly language optimization.  So, I wanted to focus on language design choices and show that simple languages are not hard to implement.

I began to combine my handwritten chalkboard notes and some sample code into a LaTeX document, and before I knew it, I had seven chapters written.  I expect to finalize everything in the spring 2017 semester.

What has made it relatively easy so far is that my compiler generates many of the figures and code examples automatically, so relatively few things have to be drawn by hand.  For example, this sample AST is produced by the compiler emitting Graphviz DOT code from the internal representation.  Neat, eh?
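The technique is simple enough to sketch in a few lines of C.  This is not the textbook's actual code; it is a minimal sketch with a made-up node type, showing how a recursive walk over an AST can emit DOT text for Graphviz to render:

#include <stdio.h>

/* A hypothetical binary AST node; the compiler in the book uses its own node type. */
struct ast_node {
    const char *label;
    struct ast_node *left;
    struct ast_node *right;
};

/* Emit one DOT node per AST node, plus edges to its children. */
void ast_emit_dot(struct ast_node *n, FILE *out)
{
    if (!n) return;
    fprintf(out, "  n%p [label=\"%s\"];\n", (void *)n, n->label);
    if (n->left) {
        fprintf(out, "  n%p -> n%p;\n", (void *)n, (void *)n->left);
        ast_emit_dot(n->left, out);
    }
    if (n->right) {
        fprintf(out, "  n%p -> n%p;\n", (void *)n, (void *)n->right);
        ast_emit_dot(n->right, out);
    }
}

int main(void)
{
    /* The expression (1+2)*3 as a tiny hand-built tree. */
    struct ast_node one   = { "1", 0, 0 };
    struct ast_node two   = { "2", 0, 0 };
    struct ast_node three = { "3", 0, 0 };
    struct ast_node add   = { "+", &one, &two };
    struct ast_node mul   = { "*", &add, &three };

    printf("digraph ast {\n");
    ast_emit_dot(&mul, stdout);
    printf("}\n");
    return 0;
}

Pipe the output through dot -Tpng and Graphviz draws the tree for you.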



Following the example of Remzi and Andrea Arpaci-Dusseau with OSTEP, the book will be made available for free online in PDF form, and also in an inexpensive hardcover edition printed on demand.

Stay tuned for the release later in 2017...

Tuesday, April 5, 2016

NunyaOS: An Experimental OS Kernel

This semester, I am organizing an experimental class around the design of an operating system kernel.  Six students formed a team in response to a call for volunteers, and are now busy designing NunyaOS, an experimental OS kernel.  Building on top of the Basekernel, they have built a system that boots an X86 machine, reads a CD-ROM filesystem, runs multiple processes in paged virtual memory, and has a simple windowing system.  We are off to a good start.

To try it out, download the source, build it, and run it in a VM like this:
qemu-system-i386 --cdrom basekernel.iso

The key organizing principle of NunyaOS is hierarchical containment.  This means that each process lives within a security container.  Within that container, the process has complete authority to manipulate its resources.  It also has the power to create sub-containers and then place child processes within them.  The containment can be applied to each of the resources within the system -- currently the filesystem, the window system, and the memory allocator.  As a result, each process lives in a sort of lightweight virtual machine, where it perceives itself to be the superuser.

For example, here are a few nested containers, each with their own filesystem root, display, and memory allocation:

Ideally, every child process will live in a container, so that we can eliminate attack vectors between code provided from different sources.  For example, your desktop should run your web browser in a container, your web browser should run each tab in a container, and each tab should run downloaded code (like a video codec) in yet another container.  In this way, untrusted code has very little leeway to affect other elements of your system.
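To make the nesting rule concrete, here is a toy model in C.  This is not NunyaOS or Basekernel code, just an illustration of the invariant: a child container may only hold a subset of the resources its parent already holds.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A toy container: a filesystem subtree and a memory budget.
   (The real kernel also confines the window system.) */
struct container {
    struct container *parent;
    const char *fs_root;      /* the only subtree this container may see   */
    long memory_limit;        /* the most memory it may allocate, in bytes */
};

/* The containment invariant: a child can only narrow what its parent holds. */
bool container_is_nested(const struct container *c)
{
    if (!c->parent) return true;   /* the root container owns everything */
    return strncmp(c->fs_root, c->parent->fs_root, strlen(c->parent->fs_root)) == 0
        && c->memory_limit <= c->parent->memory_limit;
}

int main(void)
{
    struct container root    = { NULL,     "/",                  256L * 1024 * 1024 };
    struct container browser = { &root,    "/apps/browser",      128L * 1024 * 1024 };
    struct container tab     = { &browser, "/apps/browser/tab1",  32L * 1024 * 1024 };

    printf("browser container nested correctly: %d\n", container_is_nested(&browser));
    printf("tab container nested correctly:     %d\n", container_is_nested(&tab));
    return 0;
}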

Of course, this idea changes the customs by which processes interact with each other.  We can no longer build programs that scatter data all over the filesystem, and expect others to read it.  There are many challenges here, and we have only begun to dig into them.




Wednesday, September 16, 2015

Sandboxes, Distributed Computing, and Closures

The sandbox abstraction comes up over and over again in the context of distributed computing.  We didn't invent it, but it appears so frequently that it's worth giving it a name and explaining how it works.  This abstraction is our model of execution in the Makeflow workflow system, the Work Queue master-worker framework, the Umbrella environment generator, and other systems, and this is what enables these tools to interoperate.

The sandbox abstraction is a way of specifying a remote execution so that it can be efficiently delivered and precisely reproduced.  To run a task, the user states the following:

run C = T( A, B ) in environment E

In this example, A and B are the input files to the task, T is the command to be run, and C is the output file produced.  (Obviously, there can be more input and output files as needed.)

The sandbox itself is a private namespace in which the task can run, isolated from the outside world.  This enables the task to perceive the input and output files in a different way than the caller.  A sandbox can be implemented in many ways: the simplest is just a plain old directory, but it could be a Linux container or even a whole virtual machine.

The environment E is all the additional data that needs to be present within the sandbox: the operating system, the filesystem tree, program binaries, scripts, and so forth.  The environment must be compatible with the sandbox technology.  For example, a tarball is a sufficient environment for executing within a directory sandbox, while a virtual machine image is needed for a virtual machine sandbox.

Now, the pieces must all come together:  The sandbox must be created and the environment unpacked within it.  The input files must be moved to the execution site and copied or otherwise connected to the sandbox.  The task is run, producing the output, which must then be moved outside of the sandbox to the desired location.  Then, the sandbox may be discarded.
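For a plain directory sandbox, that entire lifecycle fits in a few lines.  Here is a minimal sketch in C that shells out for each step; the file names A, B, C, the tarball env.tar.gz, and the task command T are hypothetical stand-ins, and real systems like Work Queue do the equivalent with proper file transfer and error handling:

#include <stdio.h>
#include <stdlib.h>

/* A minimal directory-sandbox lifecycle: create, populate, run, extract, discard. */
int main(void)
{
    system("mkdir -p sandbox");                        /* 1. create the sandbox          */
    system("tar xzf env.tar.gz -C sandbox");           /* 2. unpack the environment E    */
    system("cp A B sandbox/");                         /* 3. deliver the input files     */
    int result = system("cd sandbox && ./T A B > C");  /* 4. run the task T, producing C */
    system("cp sandbox/C .");                          /* 5. move the output back out    */
    system("rm -rf sandbox");                          /* 6. discard the sandbox         */
    return result == 0 ? 0 : 1;
}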

Once you begin to execute all tasks using the sandbox abstraction, many things become easier.
  • Executing tasks at remote sites becomes very easy, because all of the necessary dependencies are explicit and can be moved around the world.  (e.g. Work Queue)
  • Similar tasks running on the same machine can share input objects, to improve efficiency.  (e.g. Umbrella)
  • Multiple tasks can be chained together while respecting independent namespaces.  (e.g. Makeflow)
Of course, none of these properties are accidental: they have a good precedent in the realm of language theory.  A sandbox execution is really just a closure: a function combined with an environment, which is a set of bindings from names to values.
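C has no built-in closures, but the analogy is easy to see by building one by hand: a function pointer bundled with a set of name-to-value bindings.  The sketch below is only an illustration of the analogy, not code from any of the tools above:

#include <stdio.h>
#include <string.h>

/* A binding from a name to a value, and an environment as a list of bindings. */
struct binding { const char *name; int value; };

/* A closure: a function together with the environment it should run in. */
struct closure {
    int (*fn)(struct binding *env, int nbindings);
    struct binding *env;
    int nbindings;
};

/* Look a name up in the environment, much as a sandboxed task
   finds its input files by name inside the sandbox. */
static int env_lookup(struct binding *env, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (!strcmp(env[i].name, name)) return env[i].value;
    return 0;
}

/* The "task" T: it only sees what its environment provides. */
static int task_T(struct binding *env, int n)
{
    return env_lookup(env, n, "A") + env_lookup(env, n, "B");
}

int main(void)
{
    struct binding env[] = { { "A", 3 }, { "B", 4 } };
    struct closure run = { task_T, env, 2 };

    /* run C = T(A, B) in environment E */
    printf("C = %d\n", run.fn(run.env, run.nbindings));
    return 0;
}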

Wednesday, May 20, 2015

Writing Solid Tests is (Still) Hard

We have a nice little automatic build-and-test system for the Cooperative Computing Tools that brings together the capabilities of Github, Condor, and Docker.

Every proposed merge to the codebase is packaged up as a build job which is dispatched to our Condor pool.  Some of those jobs run natively on bare hardware, some run on virtual machines, and some run in Docker containers, but all of them are managed by Condor, so we can have a zillion builds going simultaneously without too much conflict.

The result is that anytime someone proposes a pull request, it gets run through the system and a few minutes later we get a new row on the web display that shows whether each platform built and tested correctly.  It's very handy, and provides for objective evaluation and gentle pressure on those who break the build.

(I should point out here that Patrick Donnelly and Ben Tovar have done a bang-up job of building the system, and undergraduate student Victor Hawley added the Docker component.)

Some days the board is all green, and some days it looks more like this:



But the hardest part of this seems to be writing the tests properly.  Each test is a little structured script that sets up an environment, runs some component of our software, and then evaluates the results.  It might start up a Chirp server and run some file transfers, or run Parrot on a tricky bit of Unix code, or run Makeflow on a little workflow to see if various options work correctly.

Unfortunately, there are many ways that the tests can fail without revealing a bug in the code!  We recently added several platforms to the build, resulting in a large number of test failures.  Some of these were due to differences between Unix utilities like sh, dd, and sed on the various machines.  Others were more subtle, resulting from race conditions in concurrent actions.  (For example, should you start a Master in the foreground and then a Worker in the background, or vice versa?)  There is a certain art to writing a shell script that is portable and robust.

There is also a tension in the complexity of the tests.  On the one hand, you want short, focused tests that exercise individual features, so that they can be completed in a few minutes and give immediate feedback.

On the other hand, you also want to run big, complex applications, so as to test the system at scale and under load.  We don't really know that a given release of Parrot works at scale until it has run on 10K cores for a week for a CMS physics workload.  If each core consumes 30W of power over those 7 days, that's 10,000 cores x 30 W x 168 hours, or roughly a 50 megawatt-hour test!  Yikes!

Better not run that one automatically.


Monday, May 19, 2014

Toward a Common Model of Highly Concurrent Programming

(This is the short version of a talk I gave at the MTAGS workshop at Supercomputing 2013.  See the slides here.)

Historically, highly concurrent programming has been closely associated with high performance computing.  Two programming models have been dominant: shared memory machines in which concurrency was expressed via multiple threads, and distributed memory machines in which concurrency was expressed via explicit message passing.  It is widely agreed that both of these programming models are very challenging, even for the veteran programmer.  In both cases, the programmer is directly responsible for designing the program from top to bottom and handling all of the issues of granularity, consistency, and locality necessary to achieve acceptable performance, with very little help from the runtime or operating systems.

However, a new approach to concurrent programming has been emerging over the last several years, in which the user programs in a much higher level language and relies upon the system to handle many of the challenging underlying details.  To achieve this, the program is successively decomposed into simpler representations, such that each layer of the system can gradually adapt it to the hardware available.

The layers can be described as follows:
  • A declarative language (DL) for compactly representing a complete program.
  • A directed acyclic graph (DAG) to represent the expanded program and its resources.
  • A bag of independent tasks (BOT) with explicit input and output dependencies.
  • A shared-nothing cluster to which data and tasks must be assigned.
Several different research communities have arrived at this computing model somewhat independently: the high performance computing community, the scientific workflow community, and the cloud computing community.  In each case, the scale and complexity of the systems in use eventually made it impossible for the programmer or the user to take responsibility for all of the challenges of parallel/distributed computing.  Although each community employs different technologies and has distinct optimization goals, the overall structure of these systems is surprisingly similar.

A (very incomplete) selection of systems that follow this model:


Layer                           Cloud Stack    Workflow Stack       HPC Stack
Declarative Language (DL)       Pig            Weaver               Swift-T
Directed Acyclic Graph (DAG)    Map-Reduce     Makeflow             -
Bag of Tasks (BOT)              JobTracker     Work Queue Master    Turbine
Distributed Data                HDFS           Work Queue Workers   MPI

Each layer of the system fulfills a distinct need.  The declarative language (DL) at the top is compact, expressive, and easy for end users, but is intractable to analyze in the general case because it may have a high order of complexity, possibly Turing-complete.  The DL can be used to generate a (large) directed acyclic graph (DAG) that represents every single task to be executed.  The DAG is not a great user-interface language, but it is much more suitable for a system to perform capacity management and optimization because it is a finite structure with discrete components.  A DAG executor then plays the graph, dispatching individual tasks as their dependencies are satisfied.  The BOT consists of all the tasks that are ready to run, and is then scheduled onto the underlying computer, using the data dependencies made available from the higher levels.
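The DAG-to-BOT step is the heart of every executor in the table above.  The following toy C program is not taken from any of those systems; it just shows the pattern: track each task's count of unmet dependencies, move tasks whose count reaches zero into the ready bag, and dispatch from the bag until it is empty.

#include <stdio.h>

#define MAXTASKS 16

/* A toy DAG node: which tasks depend on this one, and how many of its own
   dependencies are still unmet. */
struct task {
    const char *name;
    int unmet_deps;
    int children[MAXTASKS];
    int nchildren;
};

int main(void)
{
    /* A diamond-shaped workflow: A -> B, A -> C, then B and C -> D. */
    struct task dag[] = {
        { "A", 0, { 1, 2 }, 2 },
        { "B", 1, { 3 },    1 },
        { "C", 1, { 3 },    1 },
        { "D", 2, { 0 },    0 },
    };
    int ntasks = 4;

    /* The bag of tasks (BOT): everything currently ready to run. */
    int bag[MAXTASKS], nbag = 0;
    for (int i = 0; i < ntasks; i++)
        if (dag[i].unmet_deps == 0) bag[nbag++] = i;

    /* Dispatch ready tasks; as each one completes, release its children. */
    while (nbag > 0) {
        int t = bag[--nbag];
        printf("dispatching task %s\n", dag[t].name);
        for (int j = 0; j < dag[t].nchildren; j++) {
            int c = dag[t].children[j];
            if (--dag[c].unmet_deps == 0) bag[nbag++] = c;
        }
    }
    return 0;
}

A real system dispatches the ready tasks onto a cluster and moves their data as well, but the ready-bag loop has the same shape in all three stacks.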

Why bother with this sort of model?  It allows us to compare the fundamental capabilities and expressiveness of different kinds of systems.  For example, in the realm of compilers, everyone knows that a proper compiler consists of a scanner, a parser, an optimizer, and a code generator.  Through these stages, the input program is transformed from a series of tokens to an abstract syntax tree, an intermediate representation, and eventually to assembly code.  Not every compiler uses all of these stages, much less the same code, but the common vocabulary makes it easier to understand, compare, and design new systems.