the all-thing | 2010-09-06 18:18:17 -0400
==========================================

Indirect threading for VM opcode dispatch
-----------------------------------------

Date: January 3, 2009 10:50am
Author: William Morgan
Labels: vm, optimization, whisper
URL: http://all-thing.net/indirect-threading.txt

There's a good discussion, with lots of interesting details, on a recent patch submission that adds indirect threading to the Python VM [1].

(And by "discussion" I mean a single, un-threaded sequence of comments where you have to manually figure out who's replying to what, which apparently is what everyone in the world is happy with nowadays except for me. Email clients have had threading since 1975, bitches, so get with the fucking program. _[Hence, Whisper--ed.]_)

Pointed to by programming.reddit.com [2], which remains surprisingly useful, as long as you cut yourself off once the comment thread devolves (as it invariably does) into meta-argumentation.

Indirect threading is a vaguely neat trick that I first learned about around the time I was getting into the Rubinius code. The inner loop of a VM goes through the opcodes one at a time, dispatching each to a block of handler code. Normally each handler ends by jumping back to the top of the loop, where the next opcode is decoded. With indirect threading, each handler instead jumps directly to the handler for the next opcode.

The big benefit is not so much that you save a jump per opcode (which the compiler may optimize away anyway), but that the CPU can do branch prediction on a per-opcode basis. With a single dispatch point at the top of the loop, every opcode shares one indirect branch, which many CPUs predict poorly. When each handler ends with its own jump, the predictor can learn which opcode tends to follow which, so common opcode sequences keep the pipeline full instead of stalling on mispredictions.

But the discussion shows that this kind of thing is very compiler- and architecture-dependent. You have to spend a lot of time making sure that GCC optimizes the "right way" for your particular architecture, that it doesn't over-optimize by collapsing the per-handler jumps back into one, etc.
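To make the dispatch difference concrete, here is a minimal sketch in C of a toy stack machine. The opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the functions run_switch and run_threaded are hypothetical, not taken from Python or Rubinius. run_switch is the conventional loop: every handler breaks back to one switch at the top, so all opcodes share a single dispatch branch. run_threaded uses GCC's "labels as values" extension (also supported by Clang, but not ISO C): a table maps each opcode to a handler address, and each handler ends with its own `goto *` to the next handler. This is one common way to implement the threaded dispatch described above, not necessarily exactly what the Python patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not from any real VM). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Switch dispatch: every handler "break"s back to the single switch at the
   top of the loop, so one shared branch selects every next handler. */
static int run_switch(const int *code)
{
    int stack[STACK_MAX];
    size_t sp = 0; /* number of values on the stack */
    size_t pc = 0; /* index of the next instruction word */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++]; /* operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return -1;
        }
    }
}

/* Threaded dispatch via GCC/Clang labels-as-values: each handler ends with
   its own copy of the indirect jump, giving the branch predictor one
   branch site per handler instead of one for the whole loop. */
static int run_threaded(const int *code)
{
    /* Indexed by opcode; order must match the enum above. */
    static void *const handlers[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    int stack[STACK_MAX];
    size_t sp = 0;
    size_t pc = 0;

    /* Fetch the next opcode and jump straight to its handler.
       No range check: the sketch assumes well-formed bytecode. */
#define DISPATCH() goto *handlers[code[pc++]]

    DISPATCH();

do_push:
    assert(sp < STACK_MAX);
    stack[sp++] = code[pc++];
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}
```

For example, both functions return 20 for the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }, which computes (2 + 3) * 4. Unlike the switch version, the threaded version does no range check on the opcode, so a bad opcode there is undefined behavior. Nothing in the source guarantees that GCC keeps the four DISPATCH() jumps separate: it may merge them back into one shared jump. That is exactly the kind of thing you end up checking in the generated assembly.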
OTOH, the submitter is reporting a 20% speedup, and this is the very heart of the VM, so it could very well be worth spending time on such trickery.

More information:

* The structure and performance of efficient interpreters [3] [pdf]
* Inline threading, TraceMonkey, etc. [4]
* A pypy-dev thread on threaded interpretation [5]
* Various performance-specific bits of the V8 JavaScript engine design [6]

[1] http://bugs.python.org/issue4753
[2] http://programming.reddit.com/
[3] http://www.jilp.org/vol5/v5paper12.pdf
[4] http://blog.mozilla.com/dmandelin/2008/08/27/inline-threading-tracemonkey-etc/
[5] http://codespeak.net/pipermail/pypy-dev/2008q4/004916.html
[6] http://code.google.com/apis/v8/design.html

This delicious text version served up by Whisper.