2012-09-18

Rust (2): Linked Task Failure

In my last post, I gave an introduction to Rust's syntax and memory/concurrency model. None of that stuff was anything I contributed -- that's what I'll talk about in this post.

Rust has a built-in mechanism for failure, sort of light-weight exceptions that can be thrown but not caught. It is written "fail" (or "fail "reason"", or sometimes "assert expr"), and it causes the task to unwind its stack, running destructors and freeing owned memory along the way, and then exit itself.

There are library convenience wrappers for handling failure on the other side of the task boundary, so:

    let result = do task::try {  // spawns and waits for a task
        fail "oops!";
    };
    assert result.is_err();

(There is talk of extending failure to support throwing values of an "any" type and catching them, but that will take development effort.)

But not all failure is created equal. In some cases you might need to abort the entire program (perhaps you're writing an assert which, if it trips, indicates an unrecoverable logic error); in other cases you might want to contain the failure at a certain boundary (perhaps a small piece of input from the outside world, which you happen to be processing in parallel, is malformed and its processing task can't proceed).

Hence the need for different linked failure spawn modes, which was my main project at Mozilla this summer. One of the main motivations for configurable failure propagation is Servo, a parallel web browser being written in Rust (again from Mozilla Research), so along with the code examples below I'll also include a web-browser-style use case for each failure mode.

Linked Task Failure


By default, task failure is bidirectionally linked, which means if either task dies, it kills the other one.

    do task::spawn {
        do task::spawn {
            fail;  // All three tasks will die.
        }
        sleep_forever();  // will get woken up by force
    }
    sleep_forever();  // will get woken up by force

There are plans for Servo to have parallel HTML/CSS parsing and lexing, so the parse phase can start before lexing finishes. If an error happens during either phase, though, the other one should stop immediately -- an application for bidirectionally linked failure.


Supervised Task Failure


If you want parent tasks to kill their children, but not for a child task's failure to kill the parent, you can call task::spawn_supervised for unidirectionally linked failure.

The function task::try uses spawn_supervised internally, with additional logic to wait for the child task to finish before returning. Hence:

    let (receiver,sender) = pipes::stream();
    do task::spawn {  // bidirectionally linked
        // Wait for the supervised child task to exist.
        let message = receiver.recv();
        // Kill both it and the parent task.
        assert message != 42;
    }
    do task::try {  // unidirectionally linked
        sender.send(42);
        sleep_forever();  // will get woken up by force
    }
    // Flow never reaches here -- parent task was killed too.

Supervised failure is useful in any situation where one task manages multiple children tasks, such as with a parent tab task and several image render children tasks, each of the latter of which could fail due to corrupted image data. This failure mode was inspired by Erlang.


This mode of failure propagation was also the hardest to fully support, because parent task failure must propagate across multiple generations even if an intermediate generation has already exited:

    do task::spawn_supervised {
        do task::spawn_supervised {
            sleep_forever();  // should get woken up by force
        }
        // Intermediate task immediately exits.
    }
    wait_for_a_while();
    fail;  // must kill grandchild even if child is gone

Unlinked Task Failure


Finally, tasks can be configured to not propagate failure to each other at all, using task::spawn_unlinked for isolated failure.

    let (time1, time2) = (random(), random());
    do task::spawn_unlinked {
        sleep_for(time2);  // won't get forced awake
        fail;
    }
    sleep_for(time1);  // won't get forced awake
    fail;
    // It will take MAX(time1,time2) for the program to finish.

If you're a Firefox user, you're probably familiar with this screen. Using tasks with isolated failure would prevent the entire browser from crashing if one particular tab crashed.


Wrap-Up


I'd also like to note that asynchronous failure is one of the few sources of nondeterminism in Rust. This code, for example, is dependent on task scheduling patterns:

    fn random_bit() -> bool {
        let result = do task::try {  // supervised
            do task::spawn { fail; }  // linked
            // Might get through here ok; might get killed.
        };
        return result.is_success();
    }

The fact that Rust has no shared state between tasks makes it difficult to trip over inherent randomness in scheduling patterns.

Other sources of nondeterminism include (1) a certain library for shared state, which I'll talk about in my next post; (2) the ability to select on multiple pipes at once; (3) the ability to detect when a pipe endpoint was closed before the message was received (called "try_send()"); and of course (4) system I/O (which includes random number generation). Eric Holk and I believe that in absence of these five things, Rust code (including one-to-one pipe communication) is deterministic.

If you're interested, the slide deck I used for my end-of-internship presentation on linked failure (with more of the same pictures) is here.

No comments:

Post a Comment