tag:blogger.com,1999:blog-88164274314425061102024-03-05T21:34:34.236-05:00Winning Race Conditionsone researcher's thoughts on concurrent programming and how to improve itBenhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.comBlogger22125tag:blogger.com,1999:blog-8816427431442506110.post-67423383889258642602015-10-15T17:46:00.003-04:002015-10-15T18:03:48.596-04:00PLDI evaluationRunning experiments for PLDI has begun in earnest. My evaluation plan calls for 800 CPU-days of testing:<br />
<br />
<br />
80 P2 thread libraries * 6 test cases<br />
79(?) Pintos kernels * 2 test cases<br />
<br />
638 codebase+testcase pairs total<br />
<br />
For each one, a 10-hour control experiment, a 10-cpu * 1-hour "live" experiment, and a 10-cpu * 1-hour "data race false negative" experiment (don't worry, the paper will explain it... should it get published!).<br />
<br />
(80*6+79*2)*3*10 = 19,140 cpu-hours = 797.5 cpu-days.<br />
<br />
And 200 CPUs to do it with.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi68oQ-Kbsxxlmx73aOhtycAmqNPTBNbszDRYbxGSbZZ2vhFFHfAFyCADJagIsTwF851chqEUcnG91YscQNVqCNnSd1fKfGzsA8zS-PnfQLg1aoMPtTYXg3hxdMz9s-_EYrRU83kTNvws_A/s1600/pldi-testbed-busy.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi68oQ-Kbsxxlmx73aOhtycAmqNPTBNbszDRYbxGSbZZ2vhFFHfAFyCADJagIsTwF851chqEUcnG91YscQNVqCNnSd1fKfGzsA8zS-PnfQLg1aoMPtTYXg3hxdMz9s-_EYrRU83kTNvws_A/s320/pldi-testbed-busy.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXCRqCH4yrcfXERb3gFthqicClLtSik3x6tOR0rsORsdvoeFZ05AvPCyGkRUe-z_ZaXlu-ZYvvrdaFg5n7kT54oWoCRhe2S2arqHS4YnHnunQXR2dM3kirksPFFkKfxDv1w56hjDIoBpTr/s1600/pldi7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXCRqCH4yrcfXERb3gFthqicClLtSik3x6tOR0rsORsdvoeFZ05AvPCyGkRUe-z_ZaXlu-ZYvvrdaFg5n7kT54oWoCRhe2S2arqHS4YnHnunQXR2dM3kirksPFFkKfxDv1w56hjDIoBpTr/s320/pldi7.png" width="320" /></a></div>
<br />Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-36814637949964994242015-09-09T05:28:00.002-04:002015-09-09T05:30:21.346-04:00big green buttonHello internet, it's been a while.<br />
<br />
Tonight I'm having a "1% moment" of research. That is, 99% of the time, I either have my head on the grindstone, or am endlessly worrying and guilting myself about not getting enough done, being an impostor, etc.; but that other 1% is why I'm still a grad student. Because sometimes at 5 in the morning I finish writing mindless automation glue code, finish patching horribly broken anonymous student code, and finish debugging the bugs in my bug-finding software (ha), and finally reach a state where I can hit a big green button marked "GO RUN THE EXPERIMENT" and watch the computer do something absolutely frickin' amazing.<br />
<br />
My current project is an extension of Landslide that automatically searches for new preemption points (PPs) during the course of a systematic test, adds new state spaces to explore using those PPs, and figures out with state space estimation which state spaces are most likely to finish testing in a given CPU budget. I'm calling it "iterative deepening" by analogy with the <a href="https://chessprogramming.wikispaces.com/Iterative+Deepening">chess AI technique</a>, and you can find my latest talk slides <a href="http://www.cs.cmu.edu/~410-s15/lectures/L13_Landslide.pdf">here</a> for more details.<br />
<br />
But mostly the purpose of this post is for me to share some eye-candy. Here's what Landslide looks like when it's feeling victorious.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYMrwaw35eQDjAs75J_KhD6xgHTpJOu18xuyIFU92agcwB3OzEwUsxs77VnJRtPQVcNNahzit1dKSi0hOD3Z6IJkPW0euksAS6i1ntk95agwyXdqrHtX25begAKYDcfD9puPupA0H_2Tgw/s1600/landslide-id-berk-result-example2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYMrwaw35eQDjAs75J_KhD6xgHTpJOu18xuyIFU92agcwB3OzEwUsxs77VnJRtPQVcNNahzit1dKSi0hOD3Z6IJkPW0euksAS6i1ntk95agwyXdqrHtX25begAKYDcfD9puPupA0H_2Tgw/s320/landslide-id-berk-result-example2.png" width="225" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibZKklZTklvD7zHvC8-WSic3G77prw9IXOfef0fmwPCDWuQFGrmoTpzRoeCTik-Xy46pDyAOZyNUwcBKM1BvRWZoc1WzD-ROpU-50ZGOxy4stIfMbDL8_eMTDRuKwEccGWIUQs7owO9pSk/s1600/landslide-id-berk-result-example.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibZKklZTklvD7zHvC8-WSic3G77prw9IXOfef0fmwPCDWuQFGrmoTpzRoeCTik-Xy46pDyAOZyNUwcBKM1BvRWZoc1WzD-ROpU-50ZGOxy4stIfMbDL8_eMTDRuKwEccGWIUQs7owO9pSk/s320/landslide-id-berk-result-example.png" width="181" /></a></div>
<br />
The key thing to note here is that bugs are only found in state spaces
with data-race preemption points, which only Landslide's iterative
deepening framework is capable of identifying and using. IOW, these bugs
would be missed by any other systematic testing tool that interposed
only on thread library calls.<br />
<br />
I've finally got a conference deadline in my sights where getting accepted seems realistic. It's been a looong build-up to this point. Keep your eyes peeled.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-42242485068768827962012-10-20T12:30:00.001-04:002012-12-10T15:02:31.502-05:00What is a "data race" and when is a race not a data race?I've been meaning to write about this since my <a href="http://winningraceconditions.blogspot.com/2012/09/rust-0-index-and-conclusion.html">post series about Rust</a> (in particular <a href="http://winningraceconditions.blogspot.com/2012/09/rust-4-typesafe-shared-mutable-state.html">here</a>, where I wrote <i>"while data races are no longer possible, race conditions in general still are"</i> about the RWARC library). In general, Rust statically guarantees freedom from data races, though not freedom from all races. But what does that mean?<br />
<br />
A <i>data race</i> is when multiple threads concurrently access the same memory location where at least one access is a write. "Concurrently" here could mean either literally at the same time (threads run on different CPUs) or abstractly at the same time (threads interleave with each other on the same CPU); i.e., no synchronisation primitive enforces that one thread's access completes before the other begins.<br />
<br />
<center>
<table><tbody>
<tr><th>Thread 1</th><th>Thread 2</th></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>if (p != NULL)</b></td><td><br /></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><br /></td><td><b>p = NULL;</b></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>output(p->data);</b></td><td></td></tr>
</tbody></table>
</center>
<br />
<br />
Data race detectors, such as <a href="http://tinyurl.com/eraser-paper">Eraser</a> and <a href="http://valgrind.org/docs/manual/hg-manual.html">Helgrind</a>, analyse threads' mutual-exclusion and happens-before relationships to identify unsafe concurrent accesses like these. But it's possible to stop accesses from being concurrent without enforcing correct behaviour:<br />
<br />
<center>
<table><tbody>
<tr><th>Thread 1</th><th>Thread 2</th></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>mutex_lock(m);</b></td><td><br /></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>bool ok = p != NULL;</b></td><td><br /></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>mutex_unlock(m);</b></td><td><br /></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><br /></td><td><b>mutex_lock(m);</b></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><br /></td><td><b>p = NULL;</b></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><br /></td><td><b>mutex_unlock(m);</b></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>mutex_lock(m);</b></td><td><br /></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>if (ok) output(p->data);</b></td><td><br /></td></tr>
<tr style="font-family: "Courier New",Courier,monospace;"><td><b>mutex_unlock(m);</b></td><td></td></tr>
</tbody></table>
</center>
<br />
<br />
Now the data race is gone, but the bug has simply become a <i>higher-level</i> race condition. Most literature calls this an "atomicity violation" (and some literature even uses "race" to mean exclusively data races).<br />
<br />
You might think this code looks silly, but if you're working in a project with many layers of abstraction and function/module boundaries, this kind of mistake can be all too easy to make, and data race detectors <i>are powerless to find them</i>.<br />
<br />
Consider this real-world example. When I started at Mozilla this summer, Rust 0.2 had recently shipped, and its release notes mentioned that it was "Helgrind-clean" (meaning no data races existed). Yet the Rust runtime contained this code:<br />
<br />
<center>
<table><tbody>
<tr><td><div style="font-family: "Courier New",Courier,monospace;">
<b>bool rust_task::blocked() {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> mutex_lock(this->lifecycle_lock);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> bool is_blocked = this->state == task_state_blocked;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> mutex_unlock(this->lifecycle_lock);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> return is_blocked;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b>}</b></div>
</td></tr>
</tbody></table>
</center>
<br />
Sure, accessing the field was safely protected by the mutex, but once it dropped the lock and returned, all bets were off as to whether the returned value was still accurate or not. (I fixed several bugs related to this, and removed this function entirely.)<br />
<br />
In a similar vein, Rust's type system guarantees that concurrent tasks cannot share state but instead must use message-passing to communicate, which precludes the possibility of data races completely by enforcing happens-before relationships on all data accesses (or in the case of <a href="http://winningraceconditions.blogspot.com/2012/09/rust-4-typesafe-shared-mutable-state.html">this post</a>, by enforcing mutual-exclusion relationships). Yet it's still possible to write nondeterministic programs in Rust (using select2, failure propagation, etc), and so race conditions are still possible.<br />
<br />
The moral of this story is that data races are only one of many types of races, and though many tools exist for finding them, just because one guarantees absence of data races does not
mean your code is completely concurrency-safe. Not to say these tools aren't useful, but they often fail where more sophisticated race-finding techniques could succeed, and even still, no automated race-finding tool can substitute for a careful human brain when reasoning about concurrency.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-18185570821159549162012-10-20T11:25:00.000-04:002012-10-20T11:25:34.499-04:00What exactly is a "race condition", anyway?A friend of mine is taking the operating systems class at UMD, in which the second project is to implement inter-process <a href="http://www.cs.umd.edu/%7Eshankar/412-F12/projects/project2/">signals</a>. He noted a peculiarity in the specification: that processes are not woken up immediately if they receive a signal while blocked (e.g. on child processes, on keyboard/disk input). As a result, it could be completely random whether or not a process receiving a signal gets killed immediately or sleeps forever.<br />
<br />
He discussed this with the professor, and they disagreed over whether this nondeterminism constituted a "race condition" or not. After all, the specification allows for signals to fail to wake up processes under certain circumstances, so there's nothing wrong about implementing it that way. On the other hand, a kernel whose signalling mechanism always wakes up processes in bounded time (i.e., finitely long -- whereas waiting for keyboard input could take forever) could provide stronger guarantees about inter-process communication.<br />
<br />
In my interpretation, both arguments don't tell the entire story. For starters, race conditions don't necessarily entail <i>wrong</i> behaviour; I've seen plenty of "benign" race conditions with comments along the lines of "if X and Y race, Z will happen, and this is OK". Benign races aside, though, "race condition" to me means "unexpected behaviour occurs nondeterministically". So, if you want to be precise, <b>it's important to talk about race conditions <i>with respect to certain expectations</i>.</b><br />
<br />
Someone writing a userspace program for this kernel who didn't realise that signals might never get taken (and hence produced code that sometimes accidentally sleeps forever) could say they were bitten by a race in the kernel. But if they'd read the spec carefully, they might've written code that handled the nondeterminism more robustly. They could say the spec's nondeterminism made it less useful than other possible specs, but it wouldn't be fair to blame the particular implementation of this spec for being buggy.<br />
<br />
In short, I would say the specification itself has a race condition in it, but implementations thereof don't. What's important is who holds the expectations and who nondeterministically breaks them.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-86189667472605825442012-09-26T14:18:00.004-04:002012-09-26T21:52:33.033-04:00Rust (0): Index and ConclusionThis four-post series on <a href="http://www.rust-lang.org/">Rust</a> is intended to introduce you to the language, to teach you about Rust's cool language features, and to give a debriefing of what I contributed to it this summer.<br />
<br />
These posts are targetted for an audience with some knowledge of programming language design principles. You should be lightly familiar with both systems programming languages such as C++ and with functional languages such as Haskell or ML, and preferably strongly skilled in at least one or the other domain.<br />
<br />
Do feel free to skip ahead, if you're already familiar with parts of the language, or to bail out early, if you're not interested in an involved tour of concurrency primitives. All the same, I hope you get something out of some or all of these posts.<br />
<ol>
<li> <b><a href="http://winningraceconditions.blogspot.com/2012/09/rust-1-primer.html">Primer</a></b> - an introduction to the language's syntax, memory model, and concurrency model </li>
<li> <b><a href="http://winningraceconditions.blogspot.com/2012/09/rust-2-linked-task-failure.html">Linked Task Failure</a></b> - advanced parallel programming and error handling with tasks (my first project) </li>
<li> <b><a href="http://winningraceconditions.blogspot.com/2012/09/rust-3-typesafe-shared-state.html">Typesafe Shared State</a></b> - an overview of the region system and a parallelism library that makes heavy use of it </li>
<li> <b><a href="http://winningraceconditions.blogspot.com/2012/09/rust-4-typesafe-shared-mutable-state.html">Typesafe Shared Mutable State</a></b> - using trickery with Rust's type system to achieve a completely safe interface for common concurrency idioms (my second project) </li>
</ol>
I'd like to close with an argument for why I think Rust is the "language of the future" for systems programming.<br />
<ul>
<li> Rust's <b>strong static type system</b> relieves programmers from worrying about many types of errors they should never have to. NULL pointer crashes, memory management errors, surprising implicit type coercions, and dynamic cast exceptions don't exist anymore. Meanwhile, features like closures and higher-order functions (missing in C++ (until very recent versions)), algebraic datatypes and parametric polymorphism (both missing in Go), and <a href="http://pcwalton.github.com/blog/2012/08/08/a-gentle-introduction-to-traits-in-rust/">traits</a> (existential types; a combination of haskell-style typeclasses and OO-style interfaces) allow you to concisely express ideas that would otherwise involve a lot of legwork in certain "conventional" languages. </li>
<li> Unlike other functional languages, however, Rust has heavy <b>focus on performance</b> as well. Stack-allocated data lets you often avoid dynamic allocation overhead and garbage collection (even closures can sometimes be entirely on the stack). The <i>region system</i> and <i><a href="http://dl.rust-lang.org/doc/tutorial-borrowed-ptr.html">borrow checker</a></i> allow for type-and-memory-safe aliasing of arbitrary data with no runtime overhead. Explicit copyability as part of the type system lets you be aware of when expensive copies might occur. </li>
<li> Finally (and this is the big one, for me), Rust's type system includes a <b>concurrency-aware memory model</b>. Forbidding unprotected shared state and using message-passing over pipes as the main communication mechansim means programmers no longer have to worry about data races, and is also friendly to massively-parallel applications where cache-line contention is a serious worry. The use of <i>noncopyable types</i> means the message-passing library can safely assume all communication will be one-to-one, which allows for a <a href="http://theincredibleholk.wordpress.com/2012/07/12/supporting-multiple-senders-with-rust-pipes/">blazing fast</a> implementation under the hood. Noncopyable types also give other strong guarantees, such as the safety of ARCs and the fact that two tasks cannot deadlock when communicating over a single pipe. </li>
</ul>
Hopefully I've gotten you excited about using Rust for safe + performant parallel programming (or maybe several months from now, when its features and syntax are more stable). And to the Rust community: Thanks, it's been a blast.<br />
<br />
<br />
<center>
<a href="http://www.rust-lang.org/"><img src="http://www.rust-lang.org/logos/rust-logo-128x128-blk.png" /></a></center>
Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com19tag:blogger.com,1999:blog-8816427431442506110.post-77757791384181067902012-09-25T16:05:00.000-04:002012-09-27T10:02:31.217-04:00Rust (4): Typesafe Shared Mutable StateThis post is a continuation of <a href="http://winningraceconditions.blogspot.com/2012/09/rust-3-typesafe-shared-state.html">shared immutable state</a>. Before I introduce how we do safe shared mutable state, I'll take a moment to show why unprotected shared mutable state is dangerous.<br />
<br />
<h3>
Dangers of Shared State</h3>
<br />
If you're a functional programmer, you're probably used to a language in which nested data structures are allocated in several heap cells, each of which is garbage-collected, so multiple users can freely alias into the same data, implicitly copy to make changes, and so on.<br />
<br />
Rust's approach is somewhat different: it focuses on stack-allocation, avoiding expensive implicit copies, and predictable performance. In fact, heap-allocation only occurs when you write the <b style="font-family: "Courier New",Courier,monospace;">@</b> or <b><span style="font-family: "Courier New",Courier,monospace;">~</span></b> sigil; and, absent <b style="font-family: "Courier New",Courier,monospace;">@</b>-pointers, Rust's representation semantics <a href="https://twitter.com/pcwalton/statuses/241393172419330048">don't</a> involve garbage collection at all. Instead:<br />
<ol>
<li> Data types are representated with <b>interior types</b>, meaning data types are embedded directly within one another rather than using pointer indirection. You can, of course, create borrowed pointers to such types and pass them between functions. </li>
<li> Stack-allocated and <b><span style="font-family: "Courier New",Courier,monospace;">~</span></b>-allocated values are <b>owned data</b>, which get eagerly freed/deinitialised immediately upon going out of scope or being overwritten. </li>
<li> Rustic data structures can have <b>in-place mutability</b>, indicated with the <b style="color: #783f04; font-family: "Courier New",Courier,monospace;">mut</b>
keyword. While also supported by many other functional languages, in Rust it presents new difficulties with aliasing pointers because of point #2 above.</li>
</ol>
<br />
With such a C/C++-like representation model, the prospect of sharing mutable state among multiple actors is a lot more dangerous. To show why, let's say we added a data-race-enabling function to ARC's interface:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">fn</span> get_mut<T: <span style="color: #38761d;">Const Send</span>>(arc: &a/</b><b><span style="color: #38761d;">ARC</span></b><b><T>) -> &a/</b><b><span style="color: #783f04;">mut</span></b><b> T</b></div>
<br />
Then we can commit badness like:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> arc: </b><b><span style="color: #38761d;">ARC</span></b><b><</b><b><span style="color: #38761d;">Option</span></b><b><~</b><b><span style="color: #38761d;">int</span></b><b>>> = ARC(<span style="color: #990000;">Some</span>(~</b><b><span style="color: #990000;">31337</span></b><b>));</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> arc2 = clone(&arc);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> </b><b><span style="color: #351c75;">task</span><span style="color: #666666;">::</span></b><b>spawn |</b><b><span style="color: #783f04;">move</span></b><b> arc2| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Might print "Some(~31337)". Might print "None". Might segfault.</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #351c75;">io</span><span style="color: #666666;">::</span></b><b>println(<span style="color: #741b47;">fmt!</span>(<span style="color: #990000;">"%?"</span>, *get(&arc2)));</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // Frees and deinitialises the owned pointer inside the ARC.</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> *get_mut(&arc) = </b><b><span style="color: #990000;">None</span></b><b>;</b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // (But what if this runs after the other task determines the data</b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // is Some, but before it dereferences the contained pointer??)</b></div>
<br />
With sufficient cleverness, this can even be harnessed to implement arbitrary type coercion. (See my solution <a href="https://gist.github.com/3776683">here</a>.)<br />
<br />
<h3>
Reader-Writer ARCs</h3>
<br />
The ARC already existed when I arrived at Mozilla, but there was no similar (and safe) solution for the state being mutable. I created the <b>RWARC</b>, with a <a href="http://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock">reader-writer lock</a> inside, to fill this gap.<br />
<br />
You create them just like you create ARCs:<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b><br /></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> RWARC<T: </b><b><span style="color: #38761d;">Const Send</span></b><b>>(data: T) -> </b><b><span style="color: #38761d;">RWARC</span></b><b><T></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> clone<T: </b><b><span style="color: #38761d;">Const Send</span></b><b>>(arc: &</b><b><span style="color: #38761d;">RWARC</span></b><b><T>) -> </b><b><span style="color: #38761d;">RWARC</span></b><b><T></b></div>
<br />
But when using them, instead of getting an unlimited-use reference to the data inside, you give the interface a closure to run on the data, and it runs the closure for you with the rwlock held in the correct mode.<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> read <T: </b><b><span style="color: #38761d;">Const Send</span></b><b>>(arc: &</b><b><span style="color: #38761d;">RWARC</span></b><b><T>, blk: </b><b><span style="color: #783f04;">fn</span></b><b>(&T))</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> write<T: </b><b><span style="color: #38761d;">Const Send</span></b><b>>(arc: &</b><b><span style="color: #38761d;">RWARC</span></b><b><T>, blk: </b><b><span style="color: #783f04;">fn</span></b><b>(&</b><b><span style="color: #783f04;">mut</span></b><b> T))</b></div>
<br />
The key difference is that the region associated with the data pointer is the region of the closure, rather than some arbitrary region defined by the caller. This allows <b><span style="font-family: "Courier New",Courier,monospace;">read()</span></b> and <b><span style="font-family: "Courier New",Courier,monospace;">write()</span></b> to enforce that the contained reader-writer lock is always held in the correct mode when references to the data exist.<br />
<br />
Now we can fix the example from before.<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> arc = RWARC(</b><b><span style="color: #990000;">Some</span></b><b>(~</b><b><span style="color: #990000;">31337</span></b><b>));</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">for</span></b><b> </b><b><span style="color: #990000;">5</span></b><b>.times {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> arc2 = clone(&arc);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> </b><b><span style="color: #351c75;">task</span><span style="color: #666666;">::</span></b><b>spawn |</b><b><span style="color: #783f04;">move</span></b><b> arc2| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> read(&arc2) |state: &</b><b><span style="color: #38761d;">Option</span></b><b><~</b><b><span style="color: #38761d;">int</span></b><b>>| {</b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // Long-running reads on state still happen in parallel.</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #351c75;">io</span><span style="color: #666666;">::</span></b><b>println(</b><b><span style="color: #741b47;">fmt!</span>(<span style="color: #990000;">"%?"</span></b><b>, *state));</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> write(&arc) |state: &</b><b><span style="color: #783f04;">mut</span></b><b> </b><b><span style="color: #38761d;">Option</span></b><b><~</b><b><span style="color: #38761d;">int</span></b><b>>| {</b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // Exclusive write access. No other aliases to state can exist concurrently.</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> *state = </b><b><span style="color: #990000;">None</span></b><b>;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<br />
Note that while data races are no longer possible, race conditions in general still are. (I mentioned earlier that shared mutable state introduces nondeterminism.) Here, anywhere between zero and five "<b style="color: #990000; font-family: "Courier New",Courier,monospace;">None</b>"s will be printed.<br />
<br />
The compiler will, of course, reject code that tries to cheat the interface:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> escaped_state;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> write(&arc) |state| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> escaped_state = state; <span style="color: #134f5c;">// <i><span style="color: red;">ERROR:</span> reference not valid outside of its lifetime</i></span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<br />
A brief informal justification of safety:<br />
<ul>
<li>The <b style="color: #38761d;"><span style="font-family: "Courier New",Courier,monospace;">Const</span></b> restriction still enforces that readers only see deeply immutable state. Also, even with mutable state, it still prevents cycles from being created, because the RWARC itself does not have the Const kind. </li>
<li>References to the shared state cannot escape the closure called by <b><span style="font-family: "Courier New",Courier,monospace;">read()</span></b> or <b><span style="font-family: "Courier New",Courier,monospace;">write()</span></b>. In effect, the region system statically enforces that the lock must be held in order to access the state.</li>
</ul>
<br />
<h3>
The Concurrency Primitives You Know and Love</h3>
<br />
<i><b>Condition Variables</b></i> <br />
<br />
The RWARC also comes with some other features to remind you of home (if "home" to you means old C-style concurrency primitives you fought off race conditions with back in the day). We have condition variables:<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b><br /></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> write_cond<T: </b><b><span style="color: #38761d;">Const Send</span></b><b>>(arc: &</b><b><span style="color: #38761d;">RWARC</span></b><b><T>, blk: </b><b><span style="color: #783f04;">fn</span></b><b>(&</b><b><span style="color: #783f04;">mut</span></b><b> T, &</b><b><span style="color: #38761d;">Condvar</span></b><b>))</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b><br /></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> wait(cond: &</b><b><span style="color: #38761d;">Condvar</span></b><b>)</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> signal(cond: &</b><b><span style="color: #38761d;">Condvar</span></b><b>) -> </b><b><span style="color: #38761d;">bool</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> broadcast(cond: &</b><b><span style="color: #38761d;">Condvar</span></b><b>) -> </b><b><span style="color: #38761d;">uint</span></b></div>
<br />
These work as you might expect. Like the <b style="font-family: "Courier New",Courier,monospace;">&</b><b style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">mut</span></b><b style="font-family: "Courier New",Courier,monospace;"> T</b> reference, the <b style="color: #38761d; font-family: "Courier New",Courier,monospace;">Condvar</b> reference can only be used inside the closure (i.e., while the lock is held).<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> arc = RWARC(~[]);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> arc2 = clone(&arc);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> </b><b><span style="color: #351c75;">task</span><span style="color: #666666;">::</span></b><b>spawn |</b><b><span style="color: #783f04;">move</span></b><b> arc2| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> write_cond(&arc2) |state,cond| {</b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // Poor man's message-passing. Of course, pipes are much</b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // faster; rwarcs and condvars are built on top of pipes.</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #351c75;">vec</span><span style="color: #666666;">::</span></b><b>push(state, ~</b><b><span style="color: #990000;">"hello there!"</span></b><b>);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> signal(cond);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> write_cond(&arc) |state,cond| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">while</span></b><b> state.len() == 0 {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> wait(cond);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #351c75;">io</span><span style="color: #666666;">::</span></b><b>println(</b><b><span style="color: #351c75;">vec</span><span style="color: #666666;">::</span></b><b>pop(state));</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<br />
(The more seasoned concurrency hackers among you might now be wondering what if you wanted to associate multiple conditions with the same state? That can be done too -- gritty details are in the <a href="http://dl.rust-lang.org/doc/std/arc.html#implementation-for-condvar">docs</a>.)<br />
<br />
<br />
<i><b>Downgrade (or, Now You're Just Showing Off with the Region System)</b></i><br />
<br />
(Do feel free to zone out for this section.)<br />
<br />
If you're used to being able to atomically "downgrade" write access into read access without letting other writers through in the meantime, you can do that here too. (I'm presenting this feature mostly just to show off more stuff you can do by combining the region system with noncopyable types.)<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<div style="font-family: "Courier New",Courier,monospace;">
<div style="color: #134f5c;">
<b> </b><b>// Calls a closure which will write, then downgrade, then read.</b><b></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> write_downgrade<T: </b><b><span style="color: #38761d;">Const Send</span></b><b>>(arc: &</b><b><span style="color: #38761d;">RWARC</span></b><b><T>, blk: </b><b><span style="color: #783f04;">fn</span></b><b>(</b><b><span style="color: #38761d;">RWWriteMode</span></b><b>/&a<T>))</b><br />
<div style="color: #134f5c;">
<b> </b><b>// Converts a "write permission" token to a "read permission" token.</b></div>
</div>
</div>
<b></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> downgrade<T: </b><b><span style="color: #38761d;">Const Send</span></b><b>>(token: </b><b><span style="color: #38761d;">RWWriteMode</span></b><b>/&a<T>) -> </b><b><span style="color: #38761d;">RWReadMode</span></b><b>/&a<T></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b><br /></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> write</b><b><T: </b><b><span style="color: #38761d;">Const Send</span></b><b>></b><b>(token: &</b><b><span style="color: #38761d;">RWWriteMode</span></b><b><T>, blk: </b><b><span style="color: #783f04;">fn</span></b><b>(&</b><b><span style="color: #783f04;">mut</span></b><b> T))</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> read </b><b><T: </b><b><span style="color: #38761d;">Const Send</span></b><b>></b><b>(token: &</b><b><span style="color: #38761d;">RWReadMode</span></b><b> <T>, blk: </b><b><span style="color: #783f04;">fn</span></b><b>(&T))</b></div>
<br />
Here, the <b style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">RWWriteMode</span></b> and <b style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">RWReadMode</span></b> are noncopyable "permission tokens" that allow the user to write or read, and <b><span style="font-family: "Courier New",Courier,monospace;">downgrade()</span></b> is a function that consumes the write token and wakes up any readers waiting on the rwlock. Since the tokens are noncopyable, the caller cannot still have write permissions after calling <b><span style="font-family: "Courier New",Courier,monospace;">downgrade()</span></b> (which would, of course, result in data races).<br />
<br />
The "<b style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">RWWriteMode</span></b><b style="font-family: "Courier New",Courier,monospace;">/&a</b>" syntax indicates an opaque data structure with region pointers inside. While the write mode token is passed by ownership (so that it can in turn be surrendered to <b><span style="font-family: "Courier New",Courier,monospace;">downgrade()</span></b>), its scope is still constrained by the associated region, which means it can't escape from the closure passed to <b><span style="font-family: "Courier New",Courier,monospace;">write_downgrade()</span></b>. And <b><span style="font-family: "Courier New",Courier,monospace;">downgrade()</span></b> converts a write mode token to a read mode token with the same region, so the latter can't escape either.<br />
<br />
Complex as the above functions may seem, using the interface simply looks like this:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> write_downgrade(&arc) |token| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> write(&token) |mutable_state| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> ...</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> token = downgrade(<span style="color: #783f04;">move</span> token);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span></b><b> read(&token) |immutable_state| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> ...</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<br />
<br />
<i><b>Unwrap</b></i> <br />
<br />
Finally, RWARCs (ARCs too) also now have a mechanism to get your data back out again.<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">fn</span></b><b> unwrap<T: <span style="color: #38761d;">Const Send</span>>(arc: </b><b><span style="color: #38761d;">RWARC</span></b><b><T>) -> T</b></div>
<br />
Of course, it wouldn't be valid to reclaim ownership of the data while other tasks might still have aliases to it. Instead, unwrap() blocks the calling task until its reference is the only reference alive, and then takes ownership of the data instead of freeing it. (To avoid deadlock, subsequent callers to unwrap() on the same ARC immediately fail.)<br />
<br />
This adds expressivity in two ways: it relieves you from having to deeply-copy the shared data if you need to own it (which would be extra problematic if it had noncopyables inside), and it automatically synchronises with the ARC's other users. You could use this to implement a fork-join pattern, like so:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">let</span> arc = RWARC(some_data);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">for</span></b><b> num_cpus().times {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> arc2 = clone(&arc);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn |</b><b><span style="color: #783f04;">move</span></b><b> arc2| {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> process_data(arc2); <span style="color: #134f5c;">// might read, write, whatever</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">let</span></b><b> modified_data = unwrap(</b><b><span style="color: #783f04;">move</span></b><b> arc); <span style="color: #134f5c;">// blocks on all child tasks at once</span></b></div>
<div style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
<b> // do more of the algorithm, etc.</b></div>
<br />
All this without ever once copying the data. <br />
<br />
This about wraps up the contributions I made this summer at Mozilla. In my next post I'll conclude the series with a summary of why I like Rust so much.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-3500854921595866652012-09-22T00:38:00.000-04:002012-09-24T17:52:23.410-04:00Rust (3): Typesafe Shared StatePreviously I <a href="http://winningraceconditions.blogspot.com/2012/09/rust-1-primer.html">introduced Rust</a>, talking about syntax, pointer types, and light-weight parallelism and message-passing. I also wrote about my own summer project, <a href="http://winningraceconditions.blogspot.com/2012/09/rust-2-linked-task-failure.html">flexible failure propagation between tasks</a>, talking about some more advanced programming techniques with Rustic tasks.<br />
<br />
Through it all you might have been wondering, <i>"No shared state?! I see the value in eliminating data races, but isn't it sometimes what you want?"</i> Yes! That's what this post is for.<br />
<br />
Consider: When spawning a bunch of tasks to parallelly process a large data structure, it would be a shame to have to deeply copy the whole thing and send one copy over a pipe to each task (expensive in both space and time). You'd want each task to be able to alias the same data instead.<br />
<br />
<h3>
Shared Immutable State</h3>
<br />
Rust's standard library includes the <b>ARC</b>, which stands for <b>A</b>tomically <b>R</b>eference-<b>C</b>ounted object. The ARC serves as a wrapper-handle to some data you wish to share; rather than copying the data itself, you instead copy just the handle, which just involves atomically incrementing a reference count for the contained data.<br />
<br />
To create an ARC:<br />
<div style="font-family: "Courier New",Courier,monospace;">
<b><br /></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Given ownership of some data, wraps it in an ARC.</span></b></div>
<b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">fn</span> ARC</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><T: <span style="color: #38761d;">Const Send</span>>(data: T) -> <span style="color: #38761d;">ARC</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"><T></span></b><br />
<br />
The polymorphic type <b><span style="font-family: "Courier New",Courier,monospace;">T</span></b> is constrained by the <i><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">Send</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;"></span></span></b> kind</i> (which I mentioned in my primer post), so it can only be used with data of types that you could also otherwise send over pipes, and also by the <i><b><span style="color: #783f04; font-family: "Courier New",Courier,monospace;"><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">Const</span></span></b></span></b> kind</i>, which means the data can have no mutable interior fields (the type has to be <i>deeply immutable</i> to guarantee no data races).<br />
<br />
Like pipe endpoints, the ARC is a <i>noncopyable type</i>. New handles to the same ARC cannot be freely created (for that would bypass the reference counting mechanism); they must be made using the rest of the interface. (ARC also uses destructors internally, so the moment an ARC handle leaves scope, the reference count gets dropped. When the count hits zero, the data will be freed as well.)<br />
<br />
And to use an ARC:<br />
<br />
<b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #134f5c;">// Creates a new handle to the ARC.</span></span></b><br />
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span></span></b><b> clone<T: <b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">Const Send</span></span></b></b><b>>(arc: &</b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">ARC</span></span></b><b><T>) -> </b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">ARC</span></span></b><b><T></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Get an immutable pointer to the underlying data.</span></b></div>
<b><span style="font-family: "Courier New",Courier,monospace;"> </span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> get</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><T: <b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">Const Send</span></span></b></span></b><b><span style="font-family: "Courier New",Courier,monospace;">>(arc: &a/</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #38761d;">ARC</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"><T>) -> &a/T</span></b><br />
<br />
You'll notice the use of <b><span style="font-family: "Courier New",Courier,monospace;">&</span></b>-pointers (<i>borrowed</i> pointers) in this interface. In <b><span style="font-family: "Courier New",Courier,monospace;">clone()</span></b>, this means the argument ARC is passed by-reference rather than by-ownership to create the new handle. The interface of <b style="font-family: "Courier New",Courier,monospace;">get()</b> introduces some new syntax, <b style="font-family: "Courier New",Courier,monospace;">&a/T</b>, which to explain I'll need to introduce <b>regions</b>.<br />
<br />
As I hinted at in my primer post, borrowed pointers are statically analysed to ensure they don't outlive the data they were borrowed from. This is done by associating a <i>region</i> with the borrowed pointer to denote its lifetime (which is tied to some lexical scope or inherited from other data's lifetime).<br />
<br />
Mostly, regions exist behind-the-scenes, since the compiler can infer them when needed. Sometimes it is useful, though, to explicitly write that two regions will be the same -- the <b style="font-family: "Courier New",Courier,monospace;">&a/T</b> syntax denotes a borrowed pointer to a <b style="font-family: "Courier New",Courier,monospace;">T</b> with some lifetime <b style="font-family: "Courier New",Courier,monospace;">a</b>. Because the same <i>region variable</i> is used to borrow the ARC itself ("<b style="font-family: "Courier New",Courier,monospace;">&a/<span style="color: #38761d;">ARC</span></b><b style="font-family: "Courier New",Courier,monospace;"><T></b>"), the compiler knows to enforce in <b style="font-family: "Courier New",Courier,monospace;">get()</b>'s caller that the returned pointer cannot outlive the associated ARC handle. <b style="font-family: "Courier New",Courier,monospace;">get()</b> is said to be <i>region-parametric</i>; that is, the region variable <b style="font-family: "Courier New",Courier,monospace;">a</b> can be instantiated with whatever region is appropriate at each call-site.<br />
<br />
<h3>
Examples</h3>
<br />
Here's a code snippet that demonstrates basic ARC usage. I create an ARC with a <b><span style="font-family: "Courier New",Courier,monospace;">BigDataStructure</span></b> inside, <b style="font-family: "Courier New",Courier,monospace;">clone</b> a second handle, and then in two parallel tasks <b style="font-family: "Courier New",Courier,monospace;">get</b> references into them.<br />
<b><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> init() -> BigDataStructure { ... }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> access(x: &BigDataStructure) { ... }</span></b><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><b> </b></span><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> main() {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
<span style="color: #783f04;">let</span> arc1 = ARC(init()); <span style="color: #134f5c;">// refcount == 1</span></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> arc2 = clone(&arc1); <span style="color: #134f5c;">// refcount == 2</span></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
<span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn |<span style="color: #783f04;">move</span> arc2| { <span style="color: #134f5c;">// gives child ownership of 2nd handle</span></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"></span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> x2: &BigDataStructure = get(&arc2);</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
access(x2); <span style="color: #134f5c;">// in parallel with the below</span></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
<span style="color: #134f5c;">// arc2 gets dropped. BigDataStructure might get freed here.....</span></span><br style="color: #134f5c; font-family: "Courier New",Courier,monospace;" /><span style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
// (note: x2 can no longer be accessed)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
}</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> x1: &BigDataStructure = get(&arc1);</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
access(x1); <span style="color: #134f5c;">// in parallel with the above</span></span><br style="color: #134f5c; font-family: "Courier New",Courier,monospace;" /><span style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
// arc1 gets dropped. .....or it might get freed here.</span><br style="color: #134f5c; font-family: "Courier New",Courier,monospace;" /><span style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
// (note: x1 can no longer be accessed)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
}</span></b><br />
<br />
Here are some examples of ways the type system prevents unsafe usage.<br />
<ul>
<li>
First, the compiler won't let me bypass the reference-counting mechanism:<br />
<br />
<b><span style="font-family: "Courier New",Courier,monospace;"> </span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> arc1 = ARC(init()); <span style="color: #134f5c;">// refcount == 1</span></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> arc2 = arc1; <span style="color: #134f5c;">// <i><span style="color: red;">ERROR:</span> copying a noncopyable value</i></span></span><br style="color: #134f5c; font-family: "Courier New",Courier,monospace;" /><span style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
// double free :(</span></b><br />
<br />
If ARC handles were copyable, two destructors would run here and the reference count would get decremented too many times.<br />
</li>
<li>
The compiler will also stop me from using the reference from <b style="font-family: "Courier New",Courier,monospace;">get()</b> after the associated ARC handle went out of scope (which is legal in a language like C++, and would result in a use-after-free):<br />
<br />
<b><span style="font-family: "Courier New",Courier,monospace;"> </span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> broken_get(arc: <span style="color: #38761d;">ARC</span><BigDataStructure></span></b><b><span style="font-family: "Courier New",Courier,monospace;">) -> &a/BigDataStructure {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #134f5c;">// note the unconstrained region variable ^</span></span></b><br style="font-family: "Courier New",Courier,monospace;" /><b><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> x = get(&arc);</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
<span style="color: #783f04;">r</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">eturn</span> x; <span style="color: #134f5c;">// <i><span style="color: red;">ERROR:</span> reference not valid outside of its lifetime</i></span></span><br style="color: #134f5c; font-family: "Courier New",Courier,monospace;" /><span style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
// note: the arc handle would get dropped here(??)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
}</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
access(broken_get(ARC(init()))); <span style="color: #134f5c;">// use after free :(</span></span></b><br />
</li>
<li>
Finally, I will try to surrender ownership of my ARC handle by sending it over a pipe (perhaps
to another task), while still holding on to a pointer I borrowed from it
with <b style="font-family: "Courier New",Courier,monospace;">get()</b>.<br />
<br />
<b><span style="font-family: "Courier New",Courier,monospace;"> </span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> (sender,receiver) = <span style="color: #351c75;">pipes</span><span style="color: #666666;">::</span>stream();</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> arc = ARC(init());</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">let</span></span></b><b><span style="font-family: "Courier New",Courier,monospace;"> x = get(&arc); <span style="color: #134f5c;">// <i><span style="color: lime;">NOTE:</span> loan of local variable granted here</i></span></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
sender.send(<span style="color: #783f04;">move</span> arc); <span style="color: #134f5c;">// <i><span style="color: red;">ERROR:</span> moving out of local variable</i></span></span><br style="color: #134f5c; font-family: "Courier New",Courier,monospace;" /><span style="color: #134f5c; font-family: "Courier New",Courier,monospace;">
// <i>prohibited due to outstanding loan</i></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">
access(x); <span style="color: #134f5c;">// unknown whether arc is still alive(??</span></span></b>)<br />
<br />
But the compiler's <i>borrow checker</i> stopped me, because the "loan" I had created earlier was still in scope.</li>
</ul>
<br />
<h3>
Safety</h3>
<br />
Because Rust intentionally has no language features to support shared state, the ARC library provides it by using unsafe code internally. Given that unsafe code "shifts the burden of proof from the compiler to the programmer", how can we know the interface is right?<br />
<br />
While we are working on a proof of the region system's correctness in general, we don't have a proof for this interface in particular (though I'd be curious how one would look!). Nevertheless, we can be quite confident in the ARC's safety because of the guarantees that Rust's language features provide:<br />
<ol>
<li>The <span style="color: #38761d;"><b style="font-family: "Courier New",Courier,monospace;">Const</b></span> kind restriction and the immutable pointer returned by <b style="font-family: "Courier New",Courier,monospace;">get()</b> ensure that once inside an ARC, data can never be modified. This makes data races impossible, and also precludes the possibility of constructing a cyclic reference among ARCs. (Reference counting is a safe memory management strategy only in absence of cycles.)</li>
<li>The use of noncopyable ("linear") types for the ARC handles ensures that the reference count exactly matches the number of handles, and therefore the associated data will only be freed when all handles have left scope.</li>
<li>The regioned type signature of <b style="font-family: "Courier New",Courier,monospace;">get()</b> ensures that a reference to the contained data must be outlived by its associated handle (and hence, by #2, outlived also by the contained data itself).</li>
</ol>
Stay tuned for a follow-up post explaining a still more advanced interface I created for safely sharing <i>mutable</i> state between tasks.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-57640013657983669042012-09-18T12:17:00.003-04:002012-09-23T14:22:51.076-04:00Rust (2): Linked Task Failure<div style="font-family: inherit;">
In my <a href="http://winningraceconditions.blogspot.com/2012/09/rust-1-primer.html">last post</a>, I gave an introduction to Rust's syntax and memory/concurrency model. None of that stuff was anything I contributed -- that's what I'll talk about in this post.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Rust has a built-in mechanism for failure, sort of light-weight exceptions that can be thrown but not caught. It is written "<b style="color: red; font-family: "Courier New",Courier,monospace;">fail</b>" (or "<b style="color: #990000;"><span style="font-family: "Courier New",Courier,monospace;"><span style="color: red;">fail</span> "reason"</span></b>", or sometimes "<b style="font-family: "Courier New",Courier,monospace;"><span style="color: red;">assert</span> <i>expr</i></b>"), and it causes the task to unwind its stack, running destructors and freeing owned memory along the way, and then exit itself.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
There are library convenience wrappers for handling failure on the other side of the task boundary, so:</div>
<div style="font-family: "Courier New",Courier,monospace;">
<b><br /></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">let</span> result = </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>try</b><b> { <span style="color: #134f5c;">// spawns and waits for a task</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: red;">fail</span> <span style="color: #990000;">"oops!"</span>;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> };</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: red;">assert</span> result.is_err();</b></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
(There is talk of extending failure to support throwing values of an "any" type and catching them, but that will take development effort.)</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
But not all failure is created equal. In some cases you might need to abort the entire program (perhaps you're writing an assert which, if it trips, indicates an unrecoverable logic error); in other cases you might want to contain the failure at a certain boundary (perhaps a small piece of input from the outside world, which you happen to be processing in parallel, is malformed and its processing task can't proceed).</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Hence the need for different <b>linked failure spawn modes</b>, which was my main project at Mozilla this summer. One of the main motivations for configurable failure propagation is <a href="https://github.com/mozilla/servo">Servo</a>, a parallel web browser being written in Rust (again from Mozilla Research), so along with the code examples below I'll also include a web-browser-style use case for each failure mode.</div>
<div style="font-family: inherit;">
<br /></div>
<h3 style="font-family: inherit;">
Linked Task Failure</h3>
<div style="font-family: inherit;">
</div>
<div style="font-family: inherit;">
<br />
By default, task failure is <i>bidirectionally linked</i>, which means if either task dies, it kills the other one.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn</b><b> {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: red;">fail</span>; <span style="color: #134f5c;">// All three tasks will die.</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> sleep_forever(); <span style="color: #134f5c;">// will get woken up by force</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> sleep_forever(); <span style="color: #134f5c;">// will get woken up by force</span></b></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
There are plans for Servo to have parallel HTML/CSS parsing and lexing, so the parse phase can start before lexing finishes. If an error happens during either phase, though, the other one should stop immediately -- an application for bidirectionally linked failure.</div>
<div style="font-family: inherit;">
<br /></div>
<center>
<a href="http://bblum.net/images/rust-intern/linked.png"><img src="http://bblum.net/images/rust-intern/linked.scale.png" /></a></center>
<div style="font-family: inherit;">
<br /></div>
<h3 style="font-family: inherit;">
Supervised Task Failure</h3>
<div style="font-family: inherit;">
</div>
<div style="font-family: inherit;">
<br />
If you want parent tasks to kill their children, but not for a child task's failure to kill the parent, you can call task::spawn_supervised for <i>unidirectionally linked</i> failure.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
The function task::try uses spawn_supervised internally, with additional logic to wait for the child task to finish before returning. Hence:</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">let</span> (receiver,sender) = </b><b><span style="color: #783f04;"></span><span style="color: #351c75;">pipes</span><span style="color: #666666;">::</span></b><b>stream();</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn</b><b> { <span style="color: #134f5c;">// bidirectionally linked</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Wait for the supervised child task to exist.</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">let</span> message = receiver.recv();</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Kill both it and the parent task.</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: red;">assert</span> message != <span style="color: #990000;">42</span>;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>try</b><b> { <span style="color: #134f5c;">// unidirectionally linked</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> sender.send(<span style="color: #990000;">42</span>);</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> sleep_forever(); <span style="color: #134f5c;">// will get woken up by force</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Flow never reaches here -- parent task was killed too.</span></b></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Supervised failure is useful in any situation where one task manages multiple children tasks, such as with a parent tab task and several image render children tasks, each of the latter of which could fail due to corrupted image data. This failure mode was inspired by <a href="http://www.erlang.org/doc/design_principles/des_princ.html">Erlang</a>.</div>
<div style="font-family: inherit;">
<br /></div>
<center>
<a href="http://bblum.net/images/rust-intern/supervised.png"><img src="http://bblum.net/images/rust-intern/supervised.scale.png" /></a></center>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
This mode of failure propagation was also the hardest to fully support, because parent task failure must propagate across multiple generations even if an intermediate generation has already exited:</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn</b><b>_supervised {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn</b><b>_supervised {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> sleep_forever(); <span style="color: #134f5c;">// should get woken up by force</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Intermediate task immediately exits.</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> wait_for_a_while();</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: red;">fail</span>; <span style="color: #134f5c;">// must kill grandchild even if child is gone</span></b></div>
<div style="font-family: inherit;">
<br /></div>
<h3 style="font-family: inherit;">
Unlinked Task Failure</h3>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Finally, tasks can be configured to not propagate failure to each other at all, using task::spawn_unlinked for <i>isolated failure</i>.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
</div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;"> </span></b><b><span style="color: #783f04;">let </span>(time1, time2) = (random(), random());</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b><span style="color: #783f04;"> do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn</b><b>_unlinked {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> sleep_for(time2); </b><b><span style="color: #134f5c;">// won't get forced awake</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: red;">fail</span></b><b>;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> sleep_for(time1); </b><b><span style="color: #134f5c;">// won't get forced awake</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: red;">fail</span></b><b>;</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #134f5c;">// It will take MAX(time1,time2) for the program to finish.</span></b></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
If you're a Firefox user, you're probably familiar with this screen. Using tasks with isolated failure would prevent the entire browser from crashing if one particular tab crashed.</div>
<div style="font-family: inherit;">
<br /></div>
<center>
<a href="http://bblum.net/images/rust-intern/isolated.png"><img src="http://bblum.net/images/rust-intern/isolated.scale.png" /></a></center>
<div style="font-family: inherit;">
<br /></div>
<h3 style="font-family: inherit;">
Wrap-Up</h3>
<div style="font-family: inherit;">
</div>
<div style="font-family: inherit;">
<br />
I'd also like to note that asynchronous failure is one of the few sources of nondeterminism in Rust. This code, for example, is dependent on task scheduling patterns:</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">fn</span> random_bit() -> <span style="color: #38761d;">bool</span> {</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">let</span> result = </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>t</b><b>ry { <span style="color: #134f5c;">// supervised</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> </b><b><span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn</b><b> { <span style="color: red;">fail</span>; } <span style="color: #134f5c;">// linked</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #134f5c;">// Might get through here ok; might get killed.</span></b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> };</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> <span style="color: #783f04;">return</span> result.is_success();</b></div>
<div style="font-family: "Courier New",Courier,monospace;">
<b> }</b></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
The fact that Rust has no shared state between tasks makes it difficult to trip over inherent randomness in scheduling patterns.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
Other sources of nondeterminism include (1) a certain library for shared state, which I'll talk about in my next post; (2) the ability to select on multiple pipes at once; (3) the ability to detect when a pipe endpoint was closed before the message was received (called "try_send()"); and of course (4) system I/O (which includes random number generation). Eric Holk and I believe that in absence of these five things, Rust code (including one-to-one pipe communication) is deterministic.</div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
If you're interested, the slide deck I used for my end-of-internship presentation on linked failure (with more of the same pictures) is <a href="http://bblum.net/rust-intern-presentation.pdf">here</a>.</div>
Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-15805670484723458932012-09-04T15:14:00.003-04:002012-09-22T13:01:46.142-04:00Rust (1): PrimerI spent my summer at Mozilla Research working on <a href="http://www.rust-lang.org/">Rust</a>. There were several interesting things I did that I'll write about in subsequent posts; this one is an introduction/primer.<br />
<br />
Rust is an experimental, still-in-development language that is geared towards parallelism and performance while at the same time providing a strong static type system. (You can read the other buzzwords on the website.)<br />
<br />
<h3>
<b>Syntax Primer</b></h3>
<br />
<span style="font-size: small;">On the front page of Rust's website, there is a code snippet:</span><br />
<span style="font-size: small;"></span>
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> </span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">fn</span> main() {</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">for</span> <span style="color: #990000;">5</span>.times {</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> println(<span style="color: #990000;">"Here's some Rust!"</span>);</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><br /></span>
<span style="font-size: small;">This looks sort of cutesy and imperative, but actually there is some syntax sugar going on which facilitates a more functional-programming idiom. The above code is equivalent to:</span><br />
<span style="font-size: small;">
<b><span style="font-family: "Courier New",Courier,monospace;"> </span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">fn</span> main() {</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> times(</span><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #990000;">5</span></span>, || { </span><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: "Courier New",Courier,monospace;">println(<span style="color: #990000;">"Here's some Rust!"</span>);</span> <span style="color: #990000;">true</span> });</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><br /></span>
<span style="font-size: small;">where "<b><span style="font-family: "Courier New",Courier,monospace;">|<i>args*</i>| { <i>stmt*</i> }</span></b>" is the lambda/closure syntax (like in Ruby), and "<span style="font-family: "Courier New",Courier,monospace;"><b>times</b></span>" is a core library function implemented as:</span><br />
<span style="font-size: small;"><b></b>
<b><span style="font-family: "Courier New",Courier,monospace;"> </span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> </span><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span></span> times(count: <span style="color: #38761d;">uint</span>, blk: <span style="color: #783f04;">fn</span>() -> <span style="color: #38761d;">bool</span>) { <span style="color: #134f5c;">// 'blk' is a stack-allocated closure</span></span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">if</span> count > <span style="color: #990000;">0</span> {</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">if</span> blk() { <span style="color: #134f5c;">// Only continue looping if blk succeeds</span></span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> times(count-<span style="color: #990000;">1</span>, blk); <span style="color: #134f5c;">// Iterate until count hits 0</span></span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><br /></span>
<span style="font-size: small;">The long and short of this is that idiomatic Rust typically has a lot of curly-brace "control flow blocks" that are actually closures, and higher-order functions are commonplace.</span><br />
<br />
<h3>
<b>Concurrency</b></h3>
<br />
So, when I was giving my end-of-internship talk (which I'll link in my next post), I showed how easy it is to <span style="font-size: small;">add parallelism to your rust program.</span><br />
<span style="font-size: small;"></span>
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> </span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">fn</span> main() {</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">for</span> <span style="color: #990000;">5</span>.times {</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="color: #783f04;">do</span> <span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn { <span style="color: #134f5c;">// create 5 tasks to print a message in parallel</span></span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> <span style="font-family: "Courier New",Courier,monospace;">println(<span style="color: #990000;">"Here's some Rust!"</span>);</span></span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><b><span style="font-family: "Courier New",Courier,monospace;"> }</span></b></span><br />
<span style="font-size: small;"><br /></span>
<span style="font-size: small;">'<b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #351c75;">task</span><span style="color: #666666;">::</span>spawn</span></b>' has the signature "<b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #783f04;">fn</span> spawn(child: ~<span style="color: #783f04;">fn</span>())</span></b>" and is implemented with magic (unsafe code and runtime calls) internally. The '<span style="color: #783f04;"><b><span style="font-family: "Courier New",Courier,monospace;">do</span></b></span>' syntax is similar to the '<span style="color: #783f04;"><b><span style="font-family: "Courier New",Courier,monospace;">for</span></b></span>' syntax, but doesn't use the "iteration protocol" in which the closure returns <span style="color: #38761d;"><b><span style="font-family: "Courier New",Courier,monospace;">bool</span></b></span>.</span><br />
<span style="font-size: small;"><br /></span>
<span style="font-size: small;">(That code is equivalent to "<b><span style="font-family: "Courier New",Courier,monospace;">times(</span><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #990000;">5</span></span>, || { </span></b><b><span style="font-family: "Courier New",Courier,monospace;"><b><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #351c75;">task</span><span style="color: #666666;">::</span></span></b>spawn(|| { println(</span></b><b><span style="font-family: "Courier New",Courier,monospace;"><b><span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: "Courier New",Courier,monospace;"><span style="color: #990000;">"..."</span></span></span></b>); }); <span style="color: #990000;">true</span> });</span></b>".)</span><br />
<br />
<h3>
<b>The Memory Model</b></h3>
<br />
If you've a sharp eye, you're wondering what that "<b><span style="font-family: "Courier New",Courier,monospace;">~</span></b>" is that I snuck in on the type of the closure for the child task. That's actually a pointer type, of which Rust has three (none of which can be null, by the way):<br />
<ul>
<li><b><span style="font-family: "Courier New",Courier,monospace;">~T</span></b> is a <i>unique</i> pointer to a <b><span style="font-family: "Courier New",Courier,monospace;">T</span></b>. It points to memory allocated in the <i>send heap</i>, which means data inside of unique pointers can be sent between tasks. You can copy unique pointers, but only by deeply copying (otherwise they wouldn't be unique!) (and by default, they are "non-implicitly-copyable", so the compiler will issue warnings if you copy them without writing the "copy" keyword).</li>
<li><b><span style="font-family: "Courier New",Courier,monospace;">@T</span></b> is a <i>managed</i> pointer to a <b><span style="font-family: "Courier New",Courier,monospace;">T</span></b>. Currently, these are reference-counted and cycle-collected (they may be full-on GCed in the future). Copying one increments the reference count, so multiple managed pointers can point to the same data. These are allocated on a per-task private heap, and cannot be sent between tasks.</li>
<li><b><span style="font-family: "Courier New",Courier,monospace;">&T</span></b> is a <i>borrowed</i> pointer to a <b><span style="font-family: "Courier New",Courier,monospace;">T</span></b>. It can point to the inside of arbitrary data structures - on the stack, inside <b><span style="font-family: "Courier New",Courier,monospace;">~</span></b> or <b><span style="font-family: "Courier New",Courier,monospace;">@</span></b> pointers, etc. Rust has a static analysis, called the "borrow checker", that ensures that borrowed pointers must not outlive the scope of the pointed-to data (i.e., it is impossible for rust programs to have a use-after-free).<br />
<br />
Behind this analysis is a sophisticated <i>region system</i>, developed by Niko Matsakis, which you can read about in <a href="http://smallcultfollowing.com/babysteps/blog/2012/07/19/yet-another-tutorial-on-borrowed-pointers/">this tutorial</a> on his blog. I'll also talk a bit more about these in a later post.</li>
</ul>
The end result here is that in Rust there can be no shared state between tasks; tasks may only communicate by message-passing or by moving unique values into unique closures. More technically said, there is an inherent <i>"send" kind</i> that denotes whether a type may be sent to another task. <b><span style="font-family: "Courier New",Courier,monospace;">~T</span></b> is sendable if <b><span style="font-family: "Courier New",Courier,monospace;">T</span></b> is sendable; <b><span style="font-family: "Courier New",Courier,monospace;">@T</span></b> and <b><span style="font-family: "Courier New",Courier,monospace;">&T</span></b> are never sendable; structs (conjunctive types) and enums (disjunctive types) are sendable if their contents are sendable; primitive types are always sendable.<br />
<br />
<h3>
<b>Communication</b></h3>
<br />
Tasks can pass messages between each other using <i>pipes</i>, which is Rust's communication primitive. Pipes consist of a send endpoint and a receive endpoint, each of which is a <i>noncopyable type</i> (or "linear type", by correspondence with linear logic).<br />
<br />
Pipes' noncopyability ensures that communication is one-to-one (i.e., multiple tasks cannot send or receive on the same pipe), which allows their internal synchronisation implementation to be much simpler than N-to-N might require, and hence also be blazing fast. The other benefit of noncopyability is it allows for <i>pipe protocols</i>, statically-enforced send/receive state machines that ensure you can't send/receive values of the "wrong" type, or (for example) try to receive when the other endpoint is also receiving.<br />
<br />
I was working closely this summer with Eric Holk, the one responsible for pipes. You can read more about them (some examples, some performance, some type theory) on <a href="http://theincredibleholk.wordpress.com/tag/rust/">his blog</a>.<br />
<br />
<h3>
<b>Conclusion</b></h3>
<br />
I've got several more posts coming up to talk about the two cool things I personally worked on this summer. Hopefully this post has gotten you enough up to speed on what's going on in Rust to follow along with what I did.<br />
<br />
Hopefully also I've gotten you excited about using Rust to write parallel programs that are both safe and performant. I know I am.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com3tag:blogger.com,1999:blog-8816427431442506110.post-73615192446999673272012-07-01T06:23:00.001-04:002012-08-05T22:29:44.338-04:00Linux's leap-second deadlocks<b>Intro</b><br />
<br />
The <a href="http://en.wikipedia.org/wiki/Leap_second">leap second</a> is an extra second we insert irregularly at midnight at the end of certain months as determined by astronomers. UTC clocks render this as 23:59:60.<br />
<br />
Yesterday at that time, linux servers around the world became wedged or experienced huge CPU spikes due to <b>deadlock bugs in the leap second code</b>. <a href="http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-during-a-leap-second">This post</a> was linked on hackernews today, and has a good summary of some of the bugs in the comments. Here I'll discuss the leap second deadlocks from a concurrency researcher's perspective.<br />
<br />
<b>Five Bugs</b><br />
<br />
I did a bit of digging to see where in Linux's code things were actually going wrong. It turns out there have actually been <i>five different bugs</i> related to the leap second management. It also turns out that we've seen linux deadlock at the leap second before, <a href="https://lkml.org/lkml/2007/7/3/103">in 2007</a>. (Race conditions can exist unnoticed for a very long time in infrequently-tested code paths. Who knew?!)<br />
<br />
Each of these bugs result from some interaction with a spin-lock called "xtime_lock". Take a look at this code (trimmed and approximated a bit), which has moved and changed between different functions over the years but currently lives in "ntp_leap_second" in kernel/time/ntp.c.
<pre> write_seqlock(&xtime_lock);
switch (time_state) {
case TIME_INS:
timekeeping_leap_insert(-1);
time_state = TIME_OOP;
clock_was_set();
printk(KERN_NOTICE
"Clock: inserting leap second 23:59:60 UTC\n");
break;
case TIME_DEL:
timekeeping_leap_insert(1);
time_state = TIME_WAIT;
clock_was_set();
printk(KERN_NOTICE
"Clock: deleting leap second 23:59:59 UTC\n");
break;
// (more cases omitted ...)
}
write_sequnlock(&xtime_lock);</pre>
I will list the bugs in chronological order.
<ol><li>
One deadlock is described <a href="http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg15039.html">here</a> (fixed by same, in 2007). The function clock_was_set() calls smp_call_function() to "retrigger CPU local events" in the high-resolution timer subsystem. Unfortunately, it's forbidden to call smp_call_function() in "atomic context", which this code is in because it holds a spinlock (and moreover, is running in the timer interrupt handler). This was "fixed" in <a href="http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg15039.html">this commit</a> - they simply removed the call.
</li><li>
Another deadlock bug is shown in <a href="https://lkml.org/lkml/2009/1/2/373">this post</a> (It was fixed in 2008, according to <a href="https://lkml.org/lkml/2009/6/18/421">this</a> - but not all machines' linux versions had that fix for the 2008-2009 new year's leap second). The above code's call to printk actually needs to schedule the logging daemon kthread in order to print. Linux has a complicated "completely fair" scheduling algorithm that, when under enough system load, it needs to check the timer to determine what scheduling pattern to use. Thus, <i>only when under heavy load</i>, the call to printk() while holding xtime_lock would attempt to acquire xtime_lock again, causing deadlock.
</li><li>
A third deadlock bug, linked from the serverfault post I linked to in the intro, is shown and fixed in <a href="https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b43ae8a619d17c4935c3320d2ef9e92bdeed05d">this commit</a>, dated a month and a half ago of this year. Nine days prior, the "ntp_lock" spinlock was split out from "xtime_lock", for finer locking granularity, but was wrong here because of circular lock ordering. Only kernels built from this nine-day window would have this bug.
</li><li>
Despite all these bugs' fixes, yesterday saw linux servers having problems around the world. Many people reported <a href="https://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/">huge CPU spikes</a>, which turned out to be resulting from <a href="https://lkml.org/lkml/2012/6/30/122">futex misbehaviours</a>. (Literally while writing this post, the systems hacker friend who told me yesterday about the bug had the same futex problem on his own server.)<br />
It turned out that in bug #1 above, removing clock_was_set() was wrong after all. Technical details about this are <a href="https://lkml.org/lkml/2012/7/1/203">here</a> (also, major props to John Stultz, the same guy who fixed bug #3, for being on the case promptly last night, and for offering clarification when I emailed him). In short, the bug happened because the missing call caused sub-second high-resolution timers to always immediately return, which causes userspace applications that use them in loops to instead run in tight loops eating up CPU. The popular-seeming fix to this was to run 'date -s "`date`"', which calls settimeofday(), which calls clock_has_changed(), replacing the missing call (<a href="https://lkml.org/lkml/2012/7/1/27">reference</a>).
</li><li>
And yet, there is still another bug lurking whose cause nobody seems to have discovered yet. In the serverfault post I linked in the intro, there was also an error message "[3161000.864001] BUG: spinlock lockup on CPU#1, ntpd/3358". This message is printed when the kernel detects a spinlock has been held for a second or more, indicating (you guessed it) deadlock. The linux folks don't appear to have figured this one out yet (not to disparage them - it's only been a day).
</li></ol>
<br />
<b>Research Applicability</b><br />
<br />
An obvious zeroth-order conclusion: There were/are disproportionately many disastrous race conditions in the leap second code simply because it's such an infrequently-executed codepath. Hence, there's a call for some extra form of verification apart from "run the code as it is" - whether it's code-coverage-based stress tests, symbolic execution, or static analysis. I think static analysis would do best here.<br />
<br />
One interesting thing to note is all of the locks involved here are global - the xtime_lock and the ntp_lock, and also the "interrupt handling context" which is a global property of the code. Projects like RacerX (<a href="http://www.stanford.edu/~engler/racerx-sosp03.pdf">paper pdf</a>) would be well-capable of finding deadlocks #2 and #3 by simple callgraph analysis.<br />
<br />
Bug #1 is especially interesting, though. This "can't call scheduler code while in interrupt context" is a very simple property, for which there's currently a <i>runtime</i> check for (in kernel/smp.c):
<pre> /* Can deadlock when called with interrupts disabled. */
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled() && !oops_in_progress);</pre>
But I think this property can be ensured at <i>compile-time</i> - so that code with this bug will never build, let alone ship to production systems. I'd <a href="http://cgrouphacking.blogspot.com/2011/02/atomic-all-nighters.html">thought about this</a> before, and last fall I wrote a primitive checker for this as a class project, which I called <b>Atomic All-Nighters</b>, and which would have been <i>directly capable</i> of finding bug #1 without ever even compiling or running the code. Here's the <a href="http://bblum.net/atomic-all-nighters.pdf">writeup pdf</a>.<br />
<br />
My high-minded idealist goal for the Atomic All-Nighters project is to get these static properties of the code representable by constructs in the kernel programming language itself. That way, the compiler would refuse to build any code that had this sort of bug. I'm going to work more on that project more when I get back to CMU to start on my ph.d. in the fall. I'm excited for it.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com4tag:blogger.com,1999:blog-8816427431442506110.post-45410492684841806972012-05-23T00:52:00.002-04:002012-08-05T22:30:45.318-04:00thesisTwo weeks ago I gave my master's thesis defence presentation, and last friday I submitted the thesis document itself. <a href="http://bblum.net/landslide-defence.pdf">here</a> is my defence slide deck, and <a href="http://bblum.net/thesis.pdf">here</a> is the dissertation.<br />
<br />
Among other things, in the dissertation I talk about:
<ul>
<li>How Landslide deals with the challenges of kernel-space pertaining to systematic exploration</li>
<li>How I evaluated Landslide (I met with students of 15-410 and gave them Landslide, and they found bugs with it)</li>
<li>What testing approaches I think should be involved in the future of systematic exploration</li>
<li>A handful of pictures (some educational and some for levity), and some humourous prose here and there as well</li>
</ul>
Hope you enjoy!Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-48355439286299324992012-04-06T12:09:00.001-04:002012-08-05T22:30:45.330-04:00410 lectureToday I presented Landslide to 15-410, in an attempt to get students to try landslide out, so I can evaluate how effective it is when students are using it on their own kernels.<br />
<br />
<a href="http://bblum.net/landslide-lecture.pdf">Here</a> are my lecture slides!Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-47998522731364529712012-02-07T14:09:00.000-05:002012-08-05T22:30:45.326-04:00convergingI have been playing around with various settings in the <i>arbiter</i> - the body of code that says when an instruction is interesting enough to be a choice point, and that also chooses whom to run next at a choice point we haven't visited before - and am interested to note the differences in the direction of tree traversal.<br />
<br />
At any choice point we haven't visited before, choosing any thread to run will make progress with the search (the real progress logic is in the <i>explorer</i>, which marks off old branches as all explored, uninteresting, or still interesting) - so the arbiter can choose to continue running the current thread (unless it descheduled itself), or to preempt and run somebody else. I call staying with the current thread the "leftmost branch", so a left-to-right tree traversal is one which starts off with fewer preemptions.<br />
<br />
The interesting trend is that left-to-right traversals seem to take more time to find a bug (which I did not expect), but produce shorter, more concise choice traces at the end (which I did). Traversing left-to-right finds an 8-long choice trace (2-3-4-2-3-2-3-4) in 6.5 minutes, while a right-to-left traversal of the same tree finds a 15-long choice trace (2-3-2-3-4-3-4-3-4-3-2-4-2-3-4) in 2.5 minutes.<br />
<br />
Also, with one particular bug that I edited the guest kernel to have, a different failure came up in each traversal order (one tripped an assert, and another got a thread wedged on an unowned-but-locked mutex). I wasn't surprised, but still think it is very cool.<br />
<br />
Next up: making sure landslide can easily find already-known bugs in student-submitted kernels, rather than just in my own.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-50774541085338567682011-12-08T16:57:00.002-05:002012-08-05T22:30:45.336-04:00the lay of the (kernel-)landthe main question that's been flying around in my research-head over the last couple months is, "what's actually so different about kernel-space that makes this research, instead of just re-implementing user-space stuff that people have already done?"<br />
<br />
if I could write a comprehensive answer to this question, I'd have my thesis, so of course my answer isn't very put-together, but I think the scattered parts I've figured out (mostly thanks to talking to people at the PDL retreat) are concrete enough to talk about here.<br />
<br />
<b>environment</b><br />
<br />
it should be no surprise that wrapping a program in a dynamic testing framework is a lot harder when there's no "magic kernel" to sit on the other side of said framework from the thing being tested (in short, the kernel is the thing being tested itself). there are a few possibilities I've thought about for approaching this:<br />
<br />
1. running the guest kernel natively on the hardware itself. the testing framework would need to live inside the kernel itself, which presents some nasty challenges: for one, making sure that changing the thing being tested doesn't change the way its bugs show up; for two, it would take a bunch of complicated details to make run at all (such as, how is interrupt delivery controlled? certainly not automatically by the hardware anymore). (another possibility would be to use something like system management mode, which I haven't thought enough about.)<br />
<br />
2. simulating the guest kernel. this is what I'm actually doing for the project, using Simics, because it's easiest to actually write code for. controlling the kernel is as simple as asking Simics to call into Landslide once per instruction executed, and also asking it to trigger a timer interrupt when we decide it's time to preempt the currently running kernel thread. the downside, of course, is that simulation is slow, and running code once per simulated instruction makes it even slower.<br />
<br />
3. virtualisation. this would be sort of a happy middle-ground, where the guest kernel is running (for the most part) directly on the hardware, so there would be much less performance hit than with full simulation, but also the VMM holds some degree of control over the kernel that the testing framework could make use of. nevertheless, without per-instruction control, it would still require extra trickery both to track exactly what the guest kernel is doing (e.g. when does somebody get added to the runqueue?) and also to decide precisely when to trigger a nondeterminism event. requiring annotations inside the guest kernel would be one approach, but that would have its own limitations in terms of tracking "shared memory accesses" as potential decision points.<br />
<br />
<b>exploration strategies</b><br />
<br />
for the most part, the "decision tree" looks about the same as in multithreaded user-space programs, but some challenges arise that are inherent from having the concurrency implementation itself being part of what's being tested. for one, detecting when one thread switches to another is no longer a trivial thing that simply falls out of knowing which process is currently running (the kernel is "the process"), and also, once we decide which thread we want to run, causing that thread to run can't just be done by calling "yield to this thread"; you have to understand what the scheduler needs to have happen to cause that thread to start running. (both of these named problems are already solved in Landslide.)<br />
<br />
another interesting related problem arises when considering how to implement partial order reduction (the main algorithm that dBug uses). it works by determining that certain "steps" in each interleaving pattern are "independent" of each other, by virtue of their memory accesses being not in conflict (read/write or write/write). however, since each "step" both starts and ends in the context switcher / runqueue code (which is no longer a magic black box), there will necessarily be memory accesses to the same locations in each. some concessions will need to be made in accuracy to be able to say anything is independent of anything else - this is one of the things I intend to implement over the coming months.<br />
<b> </b><br />
<b> possible applications</b><br />
<br />
one major perspective I gained at the PDL retreat was that the things people want to debug kernels for<b> </b>are different than the things people want to debug userspace programs for. for the most part, the core bits of professional kernels (Linux, ...) are correct, since that's where all the talented programmers look most often; it's where the less talented programmers work - i.e., device drivers - that problems tend to show up.<br />
<br />
(the reason I'm getting away with not thinking about this in my project itself is because the case study is Pebbles, the kernel that students implement for 15-410.)<br />
<br />
so thinking about device drivers is a big avenue for future development of my subject. instead of just saying, "the timer interrupt is the only source of nondeterminism we need; we'll test for races between threads", the races we look for would be between interrupt handlers, bottom-halves, and threads executing related system calls. in this application, it's less meaningful to try to use a bunch of threads, and more meaningful to focus on the code paths themselves, so the timer interrupt would play second-fiddle to device interrupts (and the data that the device be communicating with those interrupts, such as network packets or disk read buffers).<br />
<br />
another possible avenue is to think about is kernels that run on "non-CPU" processors - for example, controller firmware for storage devices (the most striking example I remember discussing) are structured in a concurrent way to service parallel i/o requests. such "kernels" are already tested in simulated environments (so said the guy I was talking with about this), and so instrumenting the simulators with a dynamic race detection framework would not be too big of a step up.<br />
<br />
<b>(in summary)</b><br />
<br />
for the most part, these are all "pipe dream"-type thoughts, since I have all of a year to do the project itself and show that it works at all in kernel environments. but of course part of research is justifying its potential broader applications, so this stuff I consider to be the building blocks of my future work section.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com2tag:blogger.com,1999:blog-8816427431442506110.post-78978519832288577062011-11-10T19:37:00.000-05:002012-08-05T22:30:45.346-04:00vanish_vanisha few weeks ago i hit the big milestone of being able to actually explore the decision tree. in addition to the meta-scheduler, this required me to build the following pieces of infrastructure:<br />
<br />
- tree explorer, to find the next unexplored branch. (71 lines of code)<br />
- save-and-restore, to travel back in time, as previously discussed (310 lines of code)<br />
- pretty-printing / diagnostics-reporting "found a bug" routine (42 lines of code)<br />
- deadlock detection (trivial, given the agent-tracking code already in place)<br />
<br />
i've been using <span style="font-family: "Courier New",Courier,monospace;">vanish_vanish</span>, a small userspace program that simply forks a child, which vanishes, then vanishes itself, to test the infrastructure as it comes into existence. the reason this is interesting is because i modified my student kernel so its vanish implementation does approximately as follows:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">void NORETURN vanish()</span><br />
<span style="font-family: "Courier New",Courier,monospace;">{</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> task_lock(self);</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> ...</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> for_each_child(self) {</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> task_lock(child); // A</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> child->parent = &init_process;</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> task_unlock(child);</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> }</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> ...</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> task_lock(self->parent); // B</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> notify_waiter(self->parent);</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> task_unlock(self->parent);</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> ...</span><br />
<span style="font-family: "Courier New",Courier,monospace;">}</span><br />
<br />
the astute programmer will note that a concurrently vanishing parent and child could deadlock if the parent tries to execute line A while the child tries to execute line B, but this is not guaranteed. the extra nice thing about this test case is that to find the bug, one only needs to consider calls to <span style="font-family: "Courier New",Courier,monospace;">task_lock</span> as potential choice points (actually, only even such calls from within <span style="font-family: "Courier New",Courier,monospace;">vanish</span>, which is what i'm really doing here). (i'm going to add shared memory accesses soon, but that will make the tree explode, which would be bad for finding whether my infrastructure is correct.)<br />
<br />
using <span style="font-family: "Courier New",Courier,monospace;">vanish_vanish</span> as the test case and the naively-implemented guest kernel, landslide can find the deadlock in a little bit over 4 minutes:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">...</div><span style="font-family: "Courier New",Courier,monospace;">[SCHEDULE] about to switch threads 3 -> 4</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[SCHEDULE] about to switch threads 4 -> 3</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[SCHEDULE] DEADLOCK! (3 -> 4 -> 3)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] **** A bug was found! ****</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] **** Choice trace follows. ****</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 1: at eip 0x00105ae5, trigger_count 1144575, TID 1</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 2: at eip 0x00105ae5, trigger_count 1158049, TID 2</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 3: at eip 0x00106930, trigger_count 1677311, TID 3</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 4: at eip 0x00106930, trigger_count 1677854, TID 4</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 5: at eip 0x00106930, trigger_count 1712596, TID 3</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 6: at eip 0x00106930, trigger_count 1747558, TID 4</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 7: at eip 0x00106930, trigger_count 1747805, TID 3</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 8: at eip 0x00106930, trigger_count 1748273, TID 2</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 9: at eip 0x00106930, trigger_count 1749356, TID 2</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Choice 10: at eip 0x00106930, trigger_count 1750372, TID 4</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Current eip 0x00106957, trigger_count 1750826, total triggers 61208725</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">[BUG!] Total choice points 2705, total backtracks 1378, depths 18934</span><br />
<br />
apart from looking impressive, this output is a good prompt to talk about where the project is headed to next.<br />
<br />
to conclude that landslide can find deadlocks such as this in under 5 minutes is not an honest statistic, because the decision tree will get a lot bigger when i enrich the set of decision points, which is a necessary next step. (consider: if you removed one of the <span style="font-family: "Courier New",Courier,monospace;">task_lock</span> calls, there would still be a race, but it would not be a deadlock anymore, and we would need to look at shared memory accesses to find it.) what we can conclude from this test output, however, is the exploration rate: on average, it takes landslide 0.09 seconds per decision point in the tree. it's not clear how fast this is in terms of my eventual goal; this will become clear when i build a better set of decision points and try testing for more subtle race conditions.<br />
<br />
i'm certainly proud to have enough infrastructure to be able to say "look! it works!", but it's not clear how much weight that should carry until i have some evidence from a more realistic test case.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-18561746849070844032011-11-09T13:19:00.001-05:002012-08-05T22:30:45.295-04:00PDL retreati presented landslide at the <a href="http://www.pdl.cmu.edu/Retreat/retreat11.shtml">PDL retreat</a>, as a 10-minute work-in-progress talk and also during the poster sessions. (<a href="http://maximegalon.andrew.cmu.edu/landslide-wip-pdl-2011.pdf">here</a> are my slides.)<br />
<br />
there's not much to say that wouldn't sound trite - in short, how great it was to synchronise with people from industry about what my project and i ought to be expecting out of each other, and also how great it was to get so many different advices from people about how to chart a path for my future.<br />
<br />
one thing that's really worth mentioning is that during the industry feedback session at the end of the retreat, christos from vmware called out my project as "the Holy Grail for anybody who does kernel development; I'd like to see that work moving forward."Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-2301507893759591882011-09-17T20:31:00.005-04:002012-08-05T22:30:45.300-04:00time travelan important tenet of dynamic race condition detection is being able to selectively go back in time to an earlier point in the guest's execution and make a different decision about which thread interleaving to choose next. the set of possible execution interleavings can be thought of as a tree, where each edge/node represents a scheduling decision (which thread to run next?) at a choice point (a point at which it might be interesting to preempt the guest).<br />
<br />
the really interesting question that I'll be tackling in the months to come is how to reduce the size of this tree to something manageable - for which I'll be borrowing ideas from dBug, possibly among other previous work - but for now, i'm setting up infrastructure that lets me explore the tree.<br />
<br />
in order to do so productively, though, landslide needs to preserve at least some state when jumping back to an earlier point to make a different decision, or else it won't be able to do any meaningful analysis (e.g., where to prune the tree, or even where it's visited before). but, landslide also tracks a lot of information about the guest kernel's execution (so it can know, for example, which threads are runnable), and that information should not be preserved, since it's specific to each particular execution path.<br />
<br />
what data is important to preserve when travelling "back in time", as it were, and what data should we discard? the implementation mechanism for restoring old state is already somewhat obtuse, and combined with all the different types of state that landslide manages, makes this somewhat difficult to think about.<br />
<br />
I realised, though, that there's already been a situation where I had to solve exactly this problem...<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://maximegalon.andrew.cmu.edu/images/braid-green.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="358" src="http://maximegalon.andrew.cmu.edu/images/braid-green.png" width="640" /></a></div><br />
now I need to figure out how to work the game's other mechanics into my code as well, right?Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-70844632940160289392011-07-15T17:58:00.005-04:002012-08-05T22:30:45.340-04:00The Importance of Being Assertive, A Trivial Style Guideline for Serious Programmersone of the key ways that landslide "talks" to the guest kernel (i.e., manipulating thread interleaving patterns) is by triggering timer interrupts at key points in the kernel's execution. in terms of very general race detection, this is about the equivalent of looking at the code and thinking "what happens if we get preempted here?" (which is the traditional way of writing "race-free" concurrent code); of course, in this project, it will have some extra automated cleverness attached so it can be effective.<br />
<br />
because landslide's code gets invoked at every instruction of the guest's execution, we must have a notion of a scheduling operation being "still in progress" - that is, after triggering a timer interrupt, there will be several instructions before a target thread (one we decided to switch to) actually starts running. if we take note of when the target thread starts up, we can provide a few guarantees about the instructions that will be run until then - namely, that the kernel must be executing in an interrupt handler and/or the context switcher (and NOT running the "meaty bits" of any other thread's code-path). this may seem obvious, but it is still an invariant that i would like to rely on, so i expressed it as an assert in the code:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"> assert(ACTION(scheduler, context_switch) || HANDLING_INTERRUPT(scheduler));</div><br />
imagine my surprise, testing the implementation, when this assert tripped!<br />
<br />
after a bit of debugging, i discovered that the invariant violation was happening at the instruction immediately following the instruction at which i tried to trigger a timer interrupt. it turns out that, in some cases, simics may decide to delay interrupt processing by a few instructions (seemingly by a non-deterministic amount, too) after i set the CPU's pending interrupt flags.<br />
<br />
the fix (ensuring that when i decide to trigger a timer interrupt it is actually received immediately) is nothing special; what is important is to realise that my programming environment had some peculiarity that broke an assumption that i didn't even realise i was making when i established the invariants of my own code. so upon finding this new assumption (and writing code to make sure it worked), i added another assert:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"> if (scheduler->just_triggered_timer_interrupt) {<br />
assert(get_current_eip() == get_timer_wrap_begin() &&</div><span style="font-family: "Courier New",Courier,monospace;"> "simics attempted to delay our interrupt! :<");</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> scheduler->just_triggered_timer_interrupt = false;</span><br />
<span style="font-family: "Courier New",Courier,monospace;"> }</span><br />
<br />
now if something else of this nature goes wrong, i will know immediately, with a useful error message to boot. but imagine if i'd never written that first assert to begin with? simics could have merrily delayed all my interrupts for however long it wanted, i would never have known, and <i>wherever i would decide to trigger interrupts </i>(i.e., notionally good places for exposing race conditions) <i>would have no bearing on when they actually happened!</i> i could spend months on this project and it would never work right and i might never know why.<br />
<br />
use asserts, fellow hackers - not just comments or thoughts in your heads. you'll be happy for it later.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-44075121813798323972011-07-04T00:55:00.003-04:002012-08-05T22:30:45.306-04:00agencein order to schedule threads in any interesting way, landslide will need to have pretty tight control over the internal scheduler of the kernel under test. this is not easy, especially since every kernel will have its own slightly different way of tracking scheduler state, so landslide will need a flexible and accurate way of monitoring what's going on in the guest kernel.<br />
<br />
i just finished a bit of framework that lets landslide see a "reflection" of the guest kernel's scheduler. soon, we'll be able to bend the guest's scheduling patterns to our whims, but for now, here's what we see when passively viewing the runqueue during the boot-up sequence (comments are my own, post-hoc):<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">simics> c</span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">switched threads 1 -> -268370093 at 0x1009cf <i>// garbage from init i haven't cleaned up yet</i></span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">switched threads -268370093 -> 1 at 0x105385 <i>// it's research-quality; give me a break</i></span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">switched threads 1 -> 2 at 0x102f7b</span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">new agent 2 at eip 0x105613 -- [2, 1] <i>// shell fork()ed from init</i></span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">agent 2 gone at eip 0x1056d5 -- [1] <i>// shell blocks on readline()</i></span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">agent 1 gone at eip 0x1056d4 -- [] <i> // take init off RQ to schedule</i></span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">switched threads 2 -> 1 at 0x102f7b</span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">new agent 1 at eip 0x105613 -- [1] <i> // init back on RQ after context-switch</i></span></div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">agent 1 gone at eip 0x1056d5 -- [] <i> // init blocks on wait()</i></span></div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div>one particularity of the exhibited guest kernel (my student kernel) is that every context-switch involves pulling the target thread off of the runqueue and putting it back on later - which we see clearly here. also, keep in mind that this is all from within simics; the kernel itself is totally unmodified. (obviously, the same output could be achieved by putting print statements into the kernel at key points, which is not the point here.)<br />
<br />
i also use the term "agent" to refer to a thread that is currently on the runqueue (i.e. can be context-switched to at any moment); it is recycled terminology from dBug. <br />
<br />
anyway, so if i type a command at the prompt, it continues:<br />
<br />
<span style="font-size: small;"><span style="font-family: "Courier New",Courier,monospace;">new agent 2 at eip 0x105613 -- [2] <i>// kbd handler sees "\n" and puts shell on RQ</i></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">agent 2 gone at eip 0x1056d4 -- [] <i>// take shell off RQ to schedule</i></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">switched threads 1 -> 2 at 0x102f7b</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">new agent 2 at eip 0x105613 -- [2] <i>// shell back on RQ after context-switch</i></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">switched threads 2 -> 3 at 0x102f7b</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">new agent 3 at eip 0x105613 -- [3, 2] <i>// shell forks "juggle" process</i></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">switched threads 3 -> 4 at 0x102f7b</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">new agent 4 at eip 0x105613 -- [4, 3, 2] <i>// "juggle" forks its own child threads...</i></span></span> <br />
<br />
...and so on.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-47581633072091585592011-06-14T14:38:00.001-04:002012-08-05T22:30:54.711-04:00secret sauceJiří tells meBenhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-45237966616702876232011-06-14T14:27:00.004-04:002012-08-05T22:30:45.313-04:00mandelbratwurstLast friday, my research accidentally met its goal of finding bugs - or at least, in one very specific case.<br />
<br />
I was looking into ways for my code to generate keyboard interrupts+input - it turns out simics has a nice interface in the <span style="font-family: "Courier New",Courier,monospace;">kbd0.key_event</span> attribute - and testing it by triggering some hard-coded keyboard input when the kernel reaches a particular (also hard-coded) point in the <span style="font-family: "Courier New",Courier,monospace;">fork()</span> path. the input was "<span style="font-family: "Courier New",Courier,monospace;">mandelbrot\n</span>", which (being received by the shell) should cause the so-named test to start running - except the input would be repeated when <span style="font-family: "Courier New",Courier,monospace;">fork()</span> runs again, so the kernel's <span style="font-family: "Courier New",Courier,monospace;">readline()</span> logic would have to deal with multiple inputted lines while another program is writing to the console at the same time.<br />
<br />
Here's what <span style="font-family: "Courier New",Courier,monospace;">pathos</span>, the 410 reference kernel, does:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://www.contrib.andrew.cmu.edu/%7Ebblum/mandelbro-pathos.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.contrib.andrew.cmu.edu/%7Ebblum/mandelbro-pathos.png" /></a></div><br />
And what <span style="font-family: "Courier New",Courier,monospace;">BrOS</span>, the student kernel of Eric Faust (who is periodically keeping me company and scoffing at my code), does:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://www.contrib.andrew.cmu.edu/%7Ebblum/mandelbro-bros.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.contrib.andrew.cmu.edu/%7Ebblum/mandelbro-bros.png" /></a></div><br />
Slightly different, and not in agreement with the Pebbles specification, though Eric claims it's misdesigned rather than buggy. Finally, here's what <span style="font-family: "Courier New",Courier,monospace;">POBBLES</span>, my own student kernel, does:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://www.contrib.andrew.cmu.edu/%7Ebblum/mandelbro-pobbles.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.contrib.andrew.cmu.edu/%7Ebblum/mandelbro-pobbles.png" /></a></div><br />
Note the top-left character - that is where every single character of the mandelbrot fractal is being drawn. Also, the input string (rather, parts thereof) is drawn five times, despite only being issued twice. No "misdesign" would cause that...Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0tag:blogger.com,1999:blog-8816427431442506110.post-12683189555730058892011-06-14T12:35:00.004-04:002012-08-05T22:30:45.290-04:00welcomeHello! This is the project blog for Ben Blum's 5th year master's project at CMU, under the advisory of <a href="http://www.cs.cmu.edu/%7Egarth/">Garth Gibson</a> and working with Jiří Šimša.<br />
<br />
Jiří has a project called <a href="http://www.pdl.cmu.edu/dBug">dBug</a>, which does runtime concurrency verification on userland programs by interposing itself between the application under test and the dynamic libraries, thereby to control the nondeterministic aspects of the program's execution. I hope to extend these same ideas to work in kernel-space, where concurrency issues can be much more intricate and subtle.<br />
<br />
15-410 - Operating System Design and Implementation - is a class at CMU where students, in a six-week-long project, implement a small UNIX-like kernel, called <a href="http://www.cs.cmu.edu/%7E410/p2/kspec.pdf">Pebbles</a>, from the ground up. Students do most of their development in <a href="http://www.virtutech.com/">Simics</a>, an x86 simulator, and typically produce code which has many race conditions, which have to be spotted either by the student during the project or by the grader after the project.<br />
<br />
I aim to develop a system for race condition detection on Pebbles kernels in the form of a Simics module, to be used by the 410 course staff to help grade student submissions and by the students to help debug their code, and hopefully to serve as a starting point for similar tools in more complicated environments (i.e., industrial kernels such as Linux).<br />
<br />
The project is called <i>landslide</i> because it shows that <i>pebbles</i> kernels are not as stable as one might hope.Benhttp://www.blogger.com/profile/00057900193141052120noreply@blogger.com0