2012-09-04

Rust (1): Primer

I spent my summer at Mozilla Research working on Rust. There were several interesting things I did that I'll write about in subsequent posts; this one is an introduction/primer.

Rust is an experimental, still-in-development language that is geared towards parallelism and performance while at the same time providing a strong static type system. (You can read the other buzzwords on the website.)

Syntax Primer


On the front page of Rust's website, there is a code snippet:
 
    fn main() {
        for 5.times {
            println("Here's some Rust!");
        }
    }

This looks sort of cutesy and imperative, but it is actually syntactic sugar for a more functional-programming idiom. The above code is equivalent to:
 
    fn main() {
        times(5, || { println("Here's some Rust!"); true });
    }

where "|args*| { stmt* }" is the lambda/closure syntax (like in Ruby), and "times" is a core library function implemented as:
 
    fn times(count: uint, blk: fn() -> bool) {  // 'blk' is a stack-allocated closure
        if count > 0 {
            if blk() {  // Keep looping only while blk returns true (false means "break")
                times(count-1, blk);  // Iterate until count hits 0
            }
        }
    }
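The 2012 syntax above no longer compiles. Here is a rough sketch of the same iteration protocol in present-day Rust. It is not the original library code, and it makes two assumptions: `u32` stands in for `uint`, and a generic `FnMut` closure passed by `&mut` stands in for the old stack closure.

```rust
// Sketch: `blk` returns true to continue and false to stop early
// (the old "iteration protocol").
fn times<F: FnMut() -> bool>(count: u32, blk: &mut F) {
    // `&&` short-circuits, so `blk` is never called once `count` reaches 0.
    if count > 0 && blk() {
        times(count - 1, blk); // recurse on the remaining count, like the 2012 version
    }
}

fn main() {
    // Rough equivalent of `for 5.times { println("Here's some Rust!"); }`.
    times(5, &mut || {
        println!("Here's some Rust!");
        true // keep iterating
    });
}
```

Returning `false` from the closure is how an early `break` was expressed under this protocol.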

The long and short of this is that idiomatic Rust typically has a lot of curly-brace "control flow blocks" that are actually closures, and higher-order functions are commonplace.

Concurrency


So, when I was giving my end-of-internship talk (which I'll link in my next post), I showed how easy it is to add parallelism to your Rust program.
 
    fn main() {
        for 5.times {
            do task::spawn { // create 5 tasks to print a message in parallel
                println("Here's some Rust!");
            }
        }
    }

'task::spawn' has the signature "fn spawn(child: ~fn())" and is implemented with magic (unsafe code and runtime calls) internally. The 'do' syntax is similar to the 'for' syntax, but doesn't use the "iteration protocol" in which the closure returns bool.

(That code is equivalent to "times(5, || { task::spawn(|| { println("..."); }); true });".)
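Rust later replaced tasks with OS threads and `task::spawn` with `std::thread::spawn`. The following is a rough sketch of the same five-way spawn in today's syntax. The helper name `spawn_five` is hypothetical.

```rust
use std::thread;

// Spawns five threads (standing in for 2012-era tasks), each printing a message,
// and returns how many of them finished.
fn spawn_five() -> usize {
    let mut handles = Vec::new();
    for _ in 0..5 {
        // `thread::spawn` takes an owned, `Send + 'static` closure,
        // which plays the role of the old `~fn()`.
        handles.push(thread::spawn(|| {
            println!("Here's some Rust!");
        }));
    }
    // Wait for every thread; `join` returns Err if that thread panicked.
    let mut finished = 0;
    for handle in handles {
        handle.join().expect("thread panicked");
        finished += 1;
    }
    finished
}

fn main() {
    println!("{} threads finished", spawn_five());
}
```

Unlike the 2012 snippet, this sketch joins every thread. A current Rust program exits when `main` returns and does not wait for spawned threads to finish.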

The Memory Model


If you've a sharp eye, you're wondering what that "~" is that I snuck in on the type of the closure for the child task. That's actually a pointer type, of which Rust has three (none of which can be null, by the way):
  • ~T is a unique pointer to a T. It points to memory allocated in the send heap, which means data inside of unique pointers can be sent between tasks. You can copy unique pointers, but only by deep copying (otherwise they wouldn't be unique!). By default they are also "non-implicitly-copyable", so the compiler will issue a warning if you copy one without writing the "copy" keyword.
  • @T is a managed pointer to a T. Currently, these are reference-counted and cycle-collected (they may be full-on GCed in the future). Copying one increments the reference count, so multiple managed pointers can point to the same data. These are allocated on a per-task private heap, and cannot be sent between tasks.
  • &T is a borrowed pointer to a T. It can point to the inside of arbitrary data structures: on the stack, inside ~ or @ pointers, etc. Rust has a static analysis, called the "borrow checker", that ensures borrowed pointers cannot outlive the scope of the pointed-to data (i.e., it is impossible for Rust programs to have a use-after-free).

    Behind this analysis is a sophisticated region system, developed by Niko Matsakis, which you can read about in this tutorial on his blog. I'll also talk a bit more about these in a later post.
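For readers on a current compiler, here is a loose sketch of how the three pointer kinds roughly map onto today's standard library. The mapping is my approximation and does not come from the 2012 design: `Box<T>` for `~T`, `Rc<T>` for `@T`, and plain references for `&T`. Note that `Rc` has no cycle collector, so reference cycles leak. The function name `demo` is hypothetical.

```rust
use std::rc::Rc;

// Approximate modern counterparts (not a 1:1 mapping):
//   ~T ~ Box<T> : single owner; copying means an explicit deep clone
//   @T ~ Rc<T>  : reference-counted, thread-local (no cycle collection)
//   &T ~ &T     : borrowed; the borrow checker stops it outliving its referent
fn demo() -> (i32, i32, usize, i32) {
    // Unique pointer: `clone` allocates a fresh copy, so the two boxes are independent.
    let a: Box<i32> = Box::new(1);
    let mut b = a.clone();
    *b += 1;

    // Managed-pointer analogue: cloning an Rc only bumps the reference count.
    let shared = Rc::new(String::from("hello"));
    let alias = Rc::clone(&shared);
    let count = Rc::strong_count(&shared); // `shared` and `alias` both point at one String

    // Borrowed pointer into the inside of a Box.
    let r: &i32 = &*b;
    let borrowed = *r;

    // let dangling: &i32 = { let tmp = Box::new(5); &*tmp }; // rejected by the borrow checker

    drop(alias);
    (*a, *b, count, borrowed)
}

fn main() {
    let (a, b, count, borrowed) = demo();
    println!("a = {a}, b = {b}, refcount = {count}, borrowed = {borrowed}");
}
```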

The end result here is that in Rust there can be no shared state between tasks; tasks may only communicate by message-passing or by moving unique values into unique closures. More technically, there is a built-in "send" kind that denotes whether a type may be sent to another task: ~T is sendable if T is sendable; @T and &T are never sendable; structs (conjunctive types) and enums (disjunctive types) are sendable if their contents are sendable; primitive types are always sendable.
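The send kind survives in today's Rust as the `Send` marker trait. The following is a minimal sketch against current std. The helper `assert_send` is hypothetical and only compiles for sendable types. `sum_in_other_thread`, also hypothetical, moves a uniquely owned box into a `move` closure, which is the modern counterpart of a unique closure. One difference from 2012: today `&T` is `Send` whenever `T: Sync`, so the modern rule for borrowed pointers is looser than the one described above.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical helper: compiles only if `T` is `Send`.
fn assert_send<T: Send>() {}

// A struct is automatically Send when all of its fields are Send.
#[allow(dead_code)]
struct Pair {
    count: u64,
    label: Box<String>,
}

fn sum_in_other_thread(data: Box<Vec<u64>>) -> u64 {
    let (tx, rx) = channel();
    // `move` transfers ownership of `data` and `tx` into the closure;
    // the calling thread can no longer touch them.
    let handle = thread::spawn(move || {
        let total: u64 = data.iter().sum();
        tx.send(total).expect("receiver hung up");
    });
    let total = rx.recv().expect("sender hung up");
    handle.join().expect("worker thread panicked");
    total
}

fn main() {
    assert_send::<u64>();           // primitives are Send
    assert_send::<Box<Vec<u64>>>(); // Box<T> is Send when T is Send
    assert_send::<Pair>();          // structs inherit Send from their fields
    // assert_send::<std::rc::Rc<u64>>(); // does not compile: Rc<T> is never Send
    println!("{}", sum_in_other_thread(Box::new(vec![1, 2, 3, 4])));
}
```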

Communication


Tasks can pass messages between each other using pipes, which are Rust's communication primitive. Pipes consist of a send endpoint and a receive endpoint, each of which is a noncopyable type (or "linear type", by correspondence with linear logic).

Pipes' noncopyability ensures that communication is one-to-one (i.e., multiple tasks cannot send or receive on the same pipe). This lets their internal synchronisation be much simpler than an N-to-N channel would require, and hence blazing fast. The other benefit of noncopyability is that it allows for pipe protocols: statically enforced send/receive state machines that ensure you can't send or receive values of the "wrong" type, or (for example) try to receive when the other endpoint is also receiving.
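To make the idea of a statically enforced protocol concrete, here is a typestate sketch in present-day Rust. It is not Eric Holk's pipes implementation. It builds a hypothetical ping/pong protocol on top of two `std::sync::mpsc` channels, and names like `ClientSend`, `ServerRecv` and `ping_pong` are illustrative only. Each operation consumes the current endpoint and returns the endpoint for the next state, so sending or receiving out of turn is a compile error.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical protocol: the client sends a number, the server replies with number + 1.
// One type per protocol state; none derive Clone, so each endpoint has exactly one owner.
struct ClientSend { tx: Sender<u32>, rx: Receiver<u32> }
struct ClientRecv { tx: Sender<u32>, rx: Receiver<u32> }
struct ServerRecv { tx: Sender<u32>, rx: Receiver<u32> }
struct ServerSend { tx: Sender<u32>, rx: Receiver<u32> }

impl ClientSend {
    fn send_ping(self, n: u32) -> ClientRecv {
        self.tx.send(n).expect("server hung up");
        ClientRecv { tx: self.tx, rx: self.rx } // next state: must receive
    }
}

impl ClientRecv {
    fn recv_pong(self) -> (u32, ClientSend) {
        let n = self.rx.recv().expect("server hung up");
        (n, ClientSend { tx: self.tx, rx: self.rx }) // back to the send state
    }
}

impl ServerRecv {
    fn recv_ping(self) -> (u32, ServerSend) {
        let n = self.rx.recv().expect("client hung up");
        (n, ServerSend { tx: self.tx, rx: self.rx })
    }
}

impl ServerSend {
    fn send_pong(self, n: u32) -> ServerRecv {
        self.tx.send(n).expect("client hung up");
        ServerRecv { tx: self.tx, rx: self.rx }
    }
}

// Wire two one-directional channels into a matched pair of endpoints.
fn new_protocol() -> (ClientSend, ServerRecv) {
    let (to_server, from_client) = channel();
    let (to_client, from_server) = channel();
    (
        ClientSend { tx: to_server, rx: from_server },
        ServerRecv { tx: to_client, rx: from_client },
    )
}

fn ping_pong(n: u32) -> u32 {
    let (client, server) = new_protocol();
    let handle = thread::spawn(move || {
        let (got, server) = server.recv_ping();
        let _server = server.send_pong(got + 1); // server is back in its receive state
    });
    let client = client.send_ping(n);
    // client.send_ping(n); // would not compile: `client` is now a ClientRecv
    let (reply, _client) = client.recv_pong();
    handle.join().expect("server thread panicked");
    reply
}

fn main() {
    println!("{}", ping_pong(41));
}
```

One caveat of this approximation: `mpsc::Sender` is itself cloneable, so the one-to-one guarantee here comes only from the wrapper types not implementing `Clone`. Dropping an endpoint mid-protocol is also not ruled out by the types. It surfaces on the other side as a runtime error, hence the `expect` calls.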

I was working closely this summer with Eric Holk, the one responsible for pipes. You can read more about them (some examples, some performance, some type theory) on his blog.

Conclusion


I've got several more posts coming up to talk about the two cool things I personally worked on this summer. Hopefully this post has gotten you enough up to speed on what's going on in Rust to follow along with what I did.

Hopefully also I've gotten you excited about using Rust to write parallel programs that are both safe and performant. I know I am.

3 comments:

  1. So, I see the times function is recursive, wouldn't iterative be a small bit faster? Can rust do a simple c-style for(init,condition,increment) loop without calling another function?

    1. Since Rust uses LLVM as a backend, the tail call should be optimized into a loop. LLVM doesn't guarantee tail-call optimization in all cases, but will provide it for functions with the same number of arguments. You could instead use the imperative "let x = 0; while x < 5 { ...; x+=1; }" construct, which is what I believe the standard library's times function is actually implemented with.
