the all-thing | 2010-07-29 19:44:58 -0400 ========================================== My little continuation framework -------------------------------- Date: June 18, 2009 9:26pm Author: William Morgan Labels: continuations, web URL: http://all-thing.net/my-little-continuation-framework.txt Since continuations are so fast in Ruby 1.9 [1], I've been playing with a simple little contination-based web framework I wrote based on code I ripped out of Whisper [2]. Using continuations is really awesome for webapps because you can write application control flow just like you would write the regular program control flow. You can write it in a single method, store state in local variables, use @if@ statements, whatever. For example, here's the entire code for a webapp that lets you increment and decrement a number: def run counter = 0 while true click = html <Current counter value: #{counter}.

To increment, #{link_to "click here", :increment}

To decrement, #{link_to "click here", :decrement}

Thanks!

EOS case click when :increment counter += 1 when :decrement counter -= 1 end end end Pretty obvious huh? It's a loop, and when someone clicks on the increment link, you increment the counter, and when they click on the decrement counter, you decrement it. The app displays the HTML you see, and provides @/increment@ and @/decrement@ links. The logic isn't spread out across different methods and different classes like it would be in a regular framework. I added back-button support too, so pressing the back button after clicking increment brings you to the previous number, and further increments and decrements do the right thing. I realize frameworks like Seaside and Wee have been doing this forever, but it's new to me and very fun. [1] http://all-thing.net/fibers-via-continuations [2] http://masanjin.net/whisper/ (Reply to this at http://all-thing.net/my-little-continuation-framework.txt.) Ritex 0.3 released ------------------ Date: June 18, 2009 2:41am Author: William Morgan Labels: ritex, releases URL: http://all-thing.net/ritex-0.3.txt I've released Ritex 0.3. No API or functionality changes; this is just a set of miscellaneous tweaks that make Ritex work on Ruby 1.9. (Reply to this at http://all-thing.net/ritex-0.3.txt.) Fibers via Continuations ------------------------ Date: June 1, 2009 2:47pm Author: William Morgan Labels: ruby, ruby1.9, fibers, continuations URL: http://all-thing.net/fibers-via-continuations.txt In the last post I talked about some differences between fibers and continuations [1]. What may not have been clear is that continuations are more primitive and flexible than fibers are. In fact, you can implement fibers using continuations. Here's how. The basic idea is that we want to maintain two variables with continuations in them, @inside@ and @outside@. The first one will transfer execution into the block of code that forms the fiber. The second will transfer control back to the outside world. When the outside world calls @#resume@, we save our continuation point as @outside@, and call the current @inside@ continuation. When, within the block, @#yield@ is called, we save our current continuation point as @inside@, and transfer code back to the current @outside@. There are a few more details in terms of passing values from @#yield@ to @#resume@, handling the return value of the block, and handling excessive calls to #@resume@, but that's the basic story. Here's the code: require 'continuation' class CFiber class Error < StandardError; end def initialize &block @block = block callcc do |cc| @inside = cc return end @var = @block.call self @inside = nil @outside.call end def resume raise Error, "dead cfiber called!" unless @inside callcc do |cc| @outside = cc @inside.call end @var end def yield var callcc do |cc| @var = var @inside = cc @outside.call end end end This is also runnable on Ruby 1.8--just remove the @require@. So why does Ruby 1.9 bother to implement fibers, when we can just use continuations? I don't know what the real answer is, but "speed" is at least a _good_ answer. Let's do some some benchmarking to compare the two: require 'benchmark' n = ARGV.shift.to_i Benchmark.bm do |bm| bm.report " fibers" do f = Fiber.new do x, y = 0, 1 loop do Fiber.yield y x, y = y, x + y end end n.times { |i| f.resume } end bm.report "cfibers" do f = CFiber.new do |c| x, y = 0, 1 loop do c.yield y x, y = y, x + y end end n.times { |i| f.resume } end end We'll start with backporting that code to the Ruby 1.8.7 that Ubuntu provides (ruby 1.8.7 (2008-08-11 patchlevel 72)). For 10000 Fibonacci numbers, we see: table{margin-left: auto; margin-right: auto;}. | |_. user |_. system |_. total |_. real | | _cfibers_ |>. 0.810000 |>. 0.070000 |>. 0.880000 |>. 0.879930 | That's roughly 11.4kfps (that's thousand Fibonacci numbers per second) that we can produce using continuation-based fibers. Let's try the ancient Ruby 1.9.0 that Ubuntu provides (Ruby 1.9.0 (2008-06-20 revision 17482)): table{margin-left: auto; margin-right: auto;}. | |_. user |_. system |_. total |_. real | | _fibers_ |>. 0.040000 |>. 0.000000 |>. 0.040000 |>. 0.037583 | | _cfibers_ |>. 18.680000 |>. 1.770000 |>. 20.450000 |>. 20.482006 | Wow, fibers are fast: 250kfps. But things have gotten significantly worse for cfibers, clocking at a measely 0.489kfps for cfibers. Finally let's try the latest and greatest Ruby 1.9.1 (ruby 1.9.1p129 (2009-05-12 revision 23412)): table{margin-left: auto; margin-right: auto;}. | |_. user |_. system |_. total |_. real | | _fibers_ |>. 0.040000 |>. 0.000000 |>. 0.040000 |>. 0.035148 | | _cfibers_ |>. 0.150000 |>. 0.000000 |>. 0.150000 |>. 0.155890 | Fibers are just as fast as before, but continuations have improved dramatically--from 11.4kfps to 66.6kfps. Still, native fibers are more than three times faster. So perhaps Ruby 1.9.1 is the best of both worlds. When you need fast non-preemptive concurrency, you can use native fibers; when you need to implement your own crazy control structures, you can use continuations and be assured that they're still pretty darn fast (at least, as far as Ruby operations are concerned). [1] http://all-thing.net/fibers (Reply to this at http://all-thing.net/fibers-via-continuations.txt.) Fibers vs Continuations ----------------------- Date: May 31, 2009 9:20pm Author: William Morgan Labels: ruby1.9, ruby, fibers, continuations URL: http://all-thing.net/fibers.txt Ruby 1.9 has both fibers and continuations. The two are often mentioned in the same breath. They do vaguely similar-sounding things, and are implemented in Ruby 1.9 with similar mechanics underneath the hood [1], much as how continuations and threads were implemented with the same underlying mechanics in Ruby 1.8 [2] [PDF, p. 14]. But implementation similarities aside, continuations and fibers have very different semantics. A fiber behaves as a thread without preemption. Like a thread, you create it, and it eventually dies; unlike a thread, you must manually call @yield@ and @resume@ to transfer control in and out of it, instead of just letting the runtime call them for you whenever it feels like it. Like a thread, when you resume a fiber, you have the same call stack and heap state (local variables) as when you left. What's nice about fibers is that, since you keep explicit control of the order of execution, you can get thread-like behavior without all the hassle of mutexes and synchronization. Of course you have to deal with the hassle of ordering all your operations, but you at least have the option of avoiding the fun race-condition game that always seems to crop up in threaded programming. What about continuations? Instead of fibers' create, kill, yield, and resume operations, a continuation only really has two operations: capture and resume. A continuation is captured once, and may be resumed multiple times. When you resume a continuation, the call stack is reverted to what it looked like when it was captured, but the heap state stays the same. There's no exit point or death for a continuation (at least until Ruby gets bounded continuations); execution simply continues from the capture point. What's nice about continuations is that you can use them to implement control structures. Loops, exceptions, cross-procedure gotos... almost every control structure you can come up with can be implemented with continuations. In fact, you can implement fibers using continuations [3]! Let's look at an example. Here's the fiber-based Fibonacci computation from the InfoQ article on Fibers in Ruby 1.9 [4] fib = Fiber.new do x, y = 0, 1 loop do Fiber.yield y x, y = y, x + y end end 20.times { puts fib.resume } Here we call @yield@ from within the fiber once we've computed a number, which transfers control to the main function, and which prints out the number yielded and then calls @resume@ to transfer control back to the fiber. A thread version looks very similar: require 'thread' q = SizedQueue.new 1 fib = Thread.new do x, y = 0, 1 loop do q.push y x, y = y, x + y end end 20.times { puts q.pop } Since we don't have explicit control over the scheduling, we implicitly scheduled the order of operations by using a synchronized @SizedQueue@ data structure, which blocks the computation thread from computing a new number until the printing thread is ready to receive it. (There are many ways we could've accomplished this.) Here's the version using continuations: require 'continuation' c, x, y, i = callcc { |cc| [cc, 0, 1, 1] } puts y c.call c, y, x + y, i + 1 if i < 20 You'll notice there are no loops, and variables are never changed after assignment. In fact the code is starting to look suspiciously like an inductive proof, with one line that like a base case and another line that looks like a recursive case. You can see why continuations make functional-programming enthusiasts get excited! This implementation works because resuming the continuation (the call to @c.call@) replaces the call stack and point of execution with what they were at the point it was captured (the call to @callcc@). In contrast, @resume@-ing the fiber moved us back to the point we were when the fiber called @yield@, and so the outer @loop@ in the fiber implementation was necessary. Beyond call stacks, another major difference between fibers and continuations is the way the heap is treated. Multiple fibers on the same section of code do not share local variables. Multiple continuations on the same section of code _do_. Here's a brief example. First, the fibers version: fib = (0 ... 5).map do |i| Fiber.new do x = 0 Fiber.yield x x += 1 end end fib.each { |f| puts f.resume } We create five fibers, and call @resume@ on them once each. As you'd expect, this prints out a series of 0's. The variable @x@ is not shared between the multiple fibers. Of course, the fiber constructor here is a block, and blocks are closures, so we could _make_ them share state by moving the @x = 0@ line outside the @map@ line. But that's a result of having closures, not of fibers per se. Let's try an example with multiple continuations, all jumping into the same point in the code: require 'continuation' x = 0 c = callcc { |cc| cc } d = callcc { |cc| cc } if c e = callcc { |cc| cc } if c && d f = callcc { |cc| cc } if c && d && e x += 1 puts x c.call if c d.call if d e.call if e f.call if f We initialize @x@ to 0, create 4 separate continuations, add one to @x@, and call the continuations in order. (The postfix @if@ statements ensure that the continuations variables aren't set or called more than once. Calling @c.call@ without arguments will jump back to the @c = callcc@ line and set @c@ to @nil@.) Silly, but it illustrates the point: the output is "1 2 3 4 5", meaning that the four continuations all share the same heap. When @d@ is called, its @x@ is the same as the @x@ of @c@, and even though it was 0 when @d@ was captured, it has since been modified by the resumption of @c@. When @e@ is called, its @x@ is also the same @x@, and so on. (In fact this whole example depends on this behavior--each of the continuation variables are only set once, and must "retain" their value across all rentries to continuations above them.) In additon to multiple continuations being able to share state, the converse is true too: multiple resumes on the same continuation will share state: require 'continuation' x = 0 c = callcc { |cc| cc } x += 1 puts x c.call c while x < 5 This outputs the same thing as the examples above. Hopefully that clears up some of the confusion. Here's the summary: table{margin-left: auto; margin-right: auto;}. |_. Fibers |_. Continuations | |^. Four operations: create, exit, yield, resume. |^. Two operations: capture and resume. | |^. Upon resume, call stack is wherever it was at the last yield. |^. Upon resume, call stack is where it was when captured. | |^. Do not share state except via closure. |^. Multiple continuations and multiple invocations of the same continuation can share state. | [1] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/19609 [2] http://www.atdot.net/~ko1/pub/ContinuationFest-ruby.pdf [3] http://all-thing.net/fibers-via-continuations [4] http://www.infoq.com/news/2007/08/ruby-1-9-fibers: (One reply on this article at http://all-thing.net/fibers.txt.) Whisper 0.5 released -------------------- Date: May 20, 2009 4:48pm Author: William Morgan Labels: whisper, releases URL: http://all-thing.net/whisper-0.5.txt I've released Whisper [1] version 0.5. Lots of good stuff since 0.3 (I didn't announce 0.4 because it was a minor bugfix release): * Nested comments are now properly supported. * New
 and  blocks added.
* A new @whisper-process-email@ command for manually reprocessing email.   You
can also offload all email processing to this program instead of the main
Whisper server, if you like.
* New dependency for the 0.2 version of RiTeX [2], which   has equation array
support (see announcement [3] for details).
* Better mbox-splitting code, now that I've figured out how to do this properly
  in Sup [4].
* RiTeX macros now properly persist throughout an entry.
* Many other minor bugfixes: attribution lines in emails, various incorrect
bits of HTML output, escaping of Ritex error messages, etc.

Try it now!
1. @sudo gem install whisper --source http://masanjin.net/@
2. @whisper-init @
3. Follow the instructions.

[1] http://masanjin.net/whisper/
[2] http://ritex.rubyforge.org/
[3] http://all-thing.net/ritex-0.2
[4] http://sup.rubyforge.org/

(Two replies on this article at http://all-thing.net/whisper-0.5.txt.)

Teaching your grandmother to suck eggs
--------------------------------------
  Date: May  6, 2009  1:31pm
Author: William Morgan
Labels: ancient-greek, whisper, ubuntu
   URL: http://all-thing.net/suck-eggs.txt

If you're reading a random diatribe on whether C and C++ are good for
numerical computing [1] and happen to come across the curious expression
"teaching your grandmother to suck eggs", and decide to learn more about it,
you'll quickly find references to early usages in the 1749 Henry Fielding
novel, ??Tom Jones [2]??, in which the protagonist recounts:

   "I remember my old schoolmaster, who was a prodigious great scholar,
    used often to say, Polly matete cry town is my daskalon. The English
    of which, he told us, was, That a child may sometimes teach his
    grandmother to suck eggs."

And if you then think to yourself, what the heck is "Polly matete cry town is
my daskalon"? you need only grab your handy copy of William Shepard Walsh's
1909 Handy-book of literary curiosities [3], look up "Polly matete" in the
index, and find that it's the transliteration (transphoneticization?) of:

   "πολλοι μαθηται κρειττονες διδασκαλον"

which is the last line of a Greek epigram attributed "sometimes to Phillippus
of Thessalonica, sometimes to Lucilius (both of whom lived in the early days
of the Roman Empire)", translated as:


Hermes, the volatile, Arcady's president,
  Lacquey of deities, robber of herds,
In this gymnasium constantly resident,
  Light-fingered Aulus bore off with these words:
Many a scholar, by travelling faster
On learning's high-road, runs away with his master.


So there you go. And if you're wondering what the original phrase means, Walsh
provides this helpful explanatory rhyme:


Teach not a parent's mother to extract
  The embryo juices of an egg by suction:
The good old lady can the feat enact
  Quite irrespective of your kind instruction.


As a side note, Whisper [4] now supports poems, and I just learned how to type
Greek in Ubuntu.

[1] http://scienceblogs.com/goodmath/2006/11/the_c_is_efficient_language_fa.php
[2] http://en.wikipedia.org/wiki/The_History_of_Tom_Jones,_a_Foundling
[3] http://books.google.com/books?id=hrJkAAAAMAAJ
[4] http://masanjin.net/whisper/

(One reply on this article at http://all-thing.net/suck-eggs.txt.)

Jaunty Jackalope rockin' my world
---------------------------------
  Date: May  1, 2009  4:01am
Author: William Morgan
Labels: ubuntu
   URL: http://all-thing.net/jaunty.txt

Ubuntu 9.04 is really pleasing me. Bootup time is 20 seconds (!) from GRUB to
login prompt, which is pretty incredible. It takes another 20 seconds from
typing my password and hitting enter to a Desktop with all my bells and
whistles loaded. In total, only 4 times the amount of time it takes for @ri
clone@ to print something useless!

Other things I'm enjoying:
* The fancy notification for IMs, wireless connections, etc is really sweet.
It looks nice, is non-intrusive, and properly stops when the corresponding
window has focus. I just wish it were somehow configurable.
* The orange satellite light on my laptop lights up every time the wireless is
 accessed. Every time I send an IM I feel like I'm a fucking astronaut.
* Various wireless issues I was having with 8.10 seem to have disappeared.
* Screen profiles [1]. So cool.

[1] https://help.ubuntu.com/9.04/serverguide/C/screen-profiles.html

(Reply to this at http://all-thing.net/jaunty.txt.)

git-wtf 58b87fe9 released
-------------------------
  Date: May  1, 2009  1:08am
Author: William Morgan
Labels: git, releases, git-wtf
   URL: http://all-thing.net/git-wtf-58b87fe9-released.txt

I've released version 58b87fe9 of git-wtf, available here:
http://git-wt-commit.rubyforge.org/git-wtf [1]

This version contains a fairly major change: branches on origin are treated as
equal to local branches, and branches that are remote-only are denoted with @{
}@. So now there are three possible symbols: @( )@ for local-only, @{ }@ for
remote-only, and @[ ]@ for branches that appear on both origin and your local
repo.

The motivation was dealing with the fact that Sup has very many feature
branches going at once, but I work on it on several different computers and
typically only have a subset of them checked out. I didn't want anyone to be
left out....

I also fixed a few minor things like removing the restriction that version
branches be local branches.

[1] http://git-wt-commit.rubyforge.org/git-wtf

(Reply to this at http://all-thing.net/git-wtf-58b87fe9-released.txt.)

Ritex 0.2 released
------------------
  Date: April  2, 2009  1:02pm
Author: William Morgan
Labels: ritex, releases
   URL: http://all-thing.net/ritex-0.2.txt

It's been almost four years since the previous release, so I'm happy to
announce that Ritex [1] 0.2 has been released today. This version features
many bugfixes an improvements, most notably:

* Array options are now supported. (Necessary to get   the @eqnarray@-style
equation alignment in this   post [2]
* Unary minus heuristics are much improved.

Here's a quick demo of the unary minus:

table{margin-left: auto; margin-right: auto;}. | @-x@                |
-x                | | @x-x@               | x-x
          | | @x--x@              | x--x              | |
@\alpha-x@          | \alpha - x        | | @\alpha\,-x@        |
\alpha\, - x      |

Sadly, just as with LaTeX itself, there are still times where you have to hint
to get the right behavior:

table{margin-left: auto; margin-right: auto;}. | @\sin-x@            |
\sin-x            | | @\sin{-x}@          | \sin{-x}
 |

 Over the years since the last release it looks like there are two new options
for generating MathML in Ruby. Itex2MML [3] has developed Ruby bindings, and
there's some other project just called MathML [4]. The big win for Ritex over
these packages, of course, is macro support:

table{margin-left: auto; margin-right: auto;}. |
@\define{\onion}{\hat{\theta}}@     | \define{\onion}{\hat{\theta}} _-_     | |
@\define{\potato}[1]{E_\theta[#1]}@ | \define{\potato}[1]{E_\theta[#1]} _-_ | |
@\potato{\onion}@                   | \potato{\onion}
      |

A quick @gem install ritex@ should get it for you, and you can see some more
example input/output pairs here [5].


[1] http://masanjin.net/ritex/
[2] http://localhost:9292/smoothing.)
[3] http://golem.ph.utexas.edu/~distler/blog/itex2MML.html
[4] http://mathml.rubyforge.org/
[5] http://masanjin.net/ritex/report.xml

(Reply to this at http://all-thing.net/ritex-0.2.txt.)

Smoothing users' votes
----------------------
  Date: March 31, 2009 11:16pm
Author: William Morgan
Labels: stats
   URL: http://all-thing.net/smoothing.txt

In a previous post [1] I describe how you can cook up a Bayesian framework
that results in IMDB's so-called "true Bayesian estimate", a formula which, on
its face, doesn't look particularly Bayesian.

As my astute commenters pointed out, this formula has many simpler
interpretations without needing to invoke the B word. For example, it's a
linear interpolation between two values: \define{\that}{\hat{\theta}} 
\that=\lambda(v) R + (1-\lambda(v))\tau

where R is our mean vote, \tau is some smoothing
target, \lambda(v) is the smoothing weight.  \lambda(v) can be any
function, as long as it increases with v, stays between 0 and
1, and is 0 when v is 0. Those constraints give you the right
behavior: with no votes, your estimate is \tau exactly; as you add
votes, it approaches R, and \lambda(v) controls how fast
that happens.

This formulation naturally leads to the following question: if I'm smoothing
like this to deal with paucity-of-data issues, what value of \tau
should I pick?  IMBD uses the \tau=C, the global movie mean.
Intuitively that makes sense, but is it the right choice?

What's nice about the expression for \that above is that the
behavior we're most interested in is when v=0, i.e. when there
are no votes. In that case, \that=\tau, because of how I've constrained
\lambda(v). So finding the best \tau is equivalent to
finding the best \that when v=0.


\define{\risk}{R(\theta, \that)}
\define{\loss}{L(\theta, \that)}
\define{\exp}[1]{E_\theta\left[#1\right]}


Happily, we can answer the question of the best \that
analytically, at least if we're happy to imagining that there is a "true"
value of the movie \theta.

Given \theta, we can define a loss function \loss that
describes how bad we think a particular value of \that is. But we
don't really know what \theta is for any movie (if we did, we
wouldn't be bothering with any of this). So we can generalize that a step
further and define a risk function \risk=\exp{\loss} quantifying our _expected
loss_: the aggregate of the loss function across all possible values of
\theta, weighted by the probability of each value. This gives us
the tool we really need to answer the question above: the \that
that minimizes our risk is the winner.

In the absence of any specific notions about errors, we'll use the standard
loss function for reals, squared-error loss: \loss = (\theta-\that)^2. Then it's just
a matter of churning the crank: 
\array{\arrayopts{\colalign{right center left}}
\risk & = & \exp{\loss} \\
                 & = & \exp{(\theta-\that)^2} \\
                 & = & \exp{\theta^2-2\theta\that + \that^2} \\
                 & = & \exp{\theta^2} - 2\that \exp{\theta} + \that^2
}


We can drop that first term since we're only interested in minimimizing this
as a function of \tau. To find the minimum:


\array{\arrayopts{\colalign{right center left}}
\frac{d}{d\that} {-2\that} \exp{\theta} + \that^2 & = & 0 \\
-2 \exp{\theta} + 2\that & = & 0 \\
\that & = & \exp{\theta}
}


Unsurprisingly, we see that the best estimate of \theta under
squared-error loss is the mean of the distribution of \theta. Since
we're interested in the case where v=0, this implies that the
best value to use for \tau is also the mean.

So IMDB's choice of C makes sense: the mean vote over all your
movies is a great estimate of the mean of the distribution of
\theta.

A couple concluding points:
1. This answer is specific to squared-error loss; if you plug in another loss
function, the optimal value for \tau might very well change. And
you might actually have a specific model in mind for how "bad" mis-estimates
are. Maybe over-estimates are worse than under-estimates, or something like
that.
2. The definition of the distribution of \theta is actually
completely vague above. In fact we don't even talk about it; we just use it
implicitly in our \exp{\cdot} terms. So you should feel free to plug in
(the mean of) whatever distribution you believe most accurately represents
your product/movie/whatever. IMDB could arguable to better by plugging in
per-category means, or something even fancier.
3. IMDB is actually a particularly bad case because movie opinions are extremely
subjective.  If you're serious about modeling very subjective things, we
should be talking about multinomial models, Dirichlet priors, and the like.

But the take-home message is: in the absence of a specific loss function that
you really believe, smoothing towards the mean isn't just intuitive, it's
minimizing your risk.

[1] http://all-thing.net/bayesian-average

(Reply to this at http://all-thing.net/smoothing.txt.)

Pages
-----
* Page 1: http://all-thing.net/index.txt
* Page 2: You're reading it.
* Page 3: http://all-thing.net/index/2.txt
* Page 4: http://all-thing.net/index/3.txt
* Page 5: http://all-thing.net/index/4.txt
* Page 6: http://all-thing.net/index/5.txt
* Page 7: http://all-thing.net/index/6.txt
* Page 8: http://all-thing.net/index/7.txt


This delicious text version served up by Whisper .