It’s been 12 years since the first MathML spec was released, and math on the
web is still largely unsupported and incredibly complicated to get right. If
that isn’t a spec failure, I don’t know what is.
Personally, after a year of doing my best to do MathML the “right” way, I’ve
given up trying to be correct. I’m now using MathJax
to render math, a solution that is, while absolutely horrible, far less
horrible than before. In particular, people can actually see math.
But William! you might say, all you need to do for math on the web is to
generate MathML. Firefox has supported MathML for years and years! And that’s
true, BUT:
- Random browsers simply don’t support MathML (e.g. Chrome, Safari).
- The browsers that do support it (e.g. Firefox) support it only in strict compliance mode.
And strict compliance mode is an absolute hell for everyone involved. You must
produce valid XHTML and send your content as text/xml instead
of text/html. Any kind of non-conforming XML produces a horrible
cryptic red error message instead of displaying the page. You will begin to
live in fear of screwing things up whenever you make a change to your layout,
and god help you if you have any kind of UGC or templating or any kind of
non-trivial content generation.
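To make that failure mode concrete, here is a minimal sketch of a pre-publish well-formedness check. It uses REXML from Ruby's standard distribution as a stand-in for the browser's strict XML parser; the helper name `xml_error` and the sample markup are made up for illustration, and a browser may well reject things REXML accepts (or the other way around).

```ruby
require "rexml/document"

# Hypothetical helper: returns nil if the markup parses as well-formed XML,
# otherwise the first line of the parser's complaint.
def xml_error(markup)
  REXML::Document.new(markup)
  nil
rescue REXML::ParseException => e
  e.message.lines.first.to_s.strip
end

good = "<p>All tags <b>closed</b>.</p>"
bad  = "<p>One <b>unclosed tag.</p>"

[good, bad].each do |markup|
  problem = xml_error(markup)
  puts(problem ? "would break the page: #{problem}" : "well-formed")
end
```

An HTML parser would shrug off the second string and render something sensible; an XML parser refuses the whole document, which is the behavior described above.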
MathJax smooths that away. You can embed MathML or even LaTeX math markup
directly into a text/html document, and it will do the magic to
turn it into math in the browser. If the browser has native MathML support,
then great, it will use that. If the browser has web font support, then great,
you get pretty fonts. And if not, the degradation is graceful. And you get the
nice error-robust rendering that makes HTML nice.
I’m still using Ritex to translate LaTeX math
into MathML, because I like the syntax and because I didn’t feel like going
back and translating all the math. I’ve changed
Whisper to emit text/html as the
content type. So now I should have the best of all possible worlds.
Let’s try it:
If you see math above, I have succeeded. If not, I have failed.
I’ve released Whisper version 0.5.
Lots of good stuff since 0.3 (I didn’t announce 0.4 because it was
a minor bugfix release):
- Nested comments are now properly supported.
- New <pre> and <poem> blocks added.
- A new whisper-process-email command for manually reprocessing email.
You can also offload all email processing to this program instead of the main
Whisper server, if you like.
- New dependency on the 0.2 version of RiTeX, which
has equation array support (see announcement for details).
- Better mbox-splitting code, now that I’ve figured out how to do this properly
in Sup.
- RiTeX macros now properly persist throughout an entry.
- Many other minor bugfixes: attribution lines in emails, various incorrect
bits of HTML output, escaping of Ritex error messages, etc.
Try it now!
sudo gem install whisper --source http://masanjin.net/
whisper-init <blog directory>
- Follow the instructions.
If you’re reading a random diatribe on whether C and C++ are good for
numerical computing and happen to come across the curious expression “teaching
your grandmother to suck eggs”, and decide to learn more about it, you’ll
quickly find references to early usages in the 1749 Henry Fielding novel, Tom
Jones, in which the protagonist recounts:
I remember my old schoolmaster, who was a prodigious great scholar, used
often to say, Polly matete cry town is my daskalon. The English of which, he
told us, was, That a child may sometimes teach his grandmother to suck eggs.
And if you then think to yourself, what the heck is “Polly matete cry town is
my daskalon”? you need only grab your handy copy of William Shepard Walsh’s
1909 Handy-book of literary curiosities, look up “Polly
matete” in the index, and find that it’s the transliteration (transphoneticization?) of:
πολλοι μαθηται κρειττονες διδασκαλον
which is the last line of a Greek epigram attributed “sometimes to Phillippus of Thessalonica, sometimes to Lucilius (both of whom lived in the early days of the Roman Empire)”, translated as:
On a Stolen Statue of Mercury
Hermes, the volatile, Arcady's president,
Lacquey of deities, robber of herds,
In this gymnasium constantly resident,
Light-fingered Aulus bore off with these words:
Many a scholar, by travelling faster
On learning's high-road, runs away with his master.
So there you go. And if you’re wondering what the original phrase means, Walsh
provides this helpful explanatory rhyme:
Teach not a parent's mother to extract
The embryo juices of an egg by suction:
The good old lady can the feat enact
Quite irrespective of your kind instruction.
As a side note, Whisper now supports
poems, and I just learned how to type Greek in Ubuntu.
So apparently WebKit has no real MathML support. Empirically, it seems
like you get some stuff like Greek symbols, but things like sums and whatnot
don’t appear. Oh well. Mac users, switch to Firefox, or ignore the math posts.
I’ve released Whisper 0.3. This is mostly a bugfix release, with generally
better email support, including support for MIME multipart email.
How to do it:
sudo gem install whisper --source http://masanjin.net/
whisper-init <blog directory>
- Follow the instructions!
I’ve released Whisper 0.2. Beyond some minor
bugfixes, the big enhancement in this one is that the “post as micro mailing
list” idea now works. The comments on every post form a mailing list, with
everyone who commented auto-receiving everyone else’s comments, and all replies
being archived on the mailing list.
Of course you can set your reply settings on a per-comment basis to disable
this, or to restrict it to only send immediate replies to your comment. The
only thing you can’t do so far is change your settings (e.g. from all to none)
once you’ve made them. That will be coming later.
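A rough sketch of how those per-comment reply settings could decide who gets mailed when a new comment arrives. This is not Whisper's actual code: the `Comment` struct, the `:all` / `:replies` / `:none` setting names, and the `recipients` helper are all assumptions made for illustration.

```ruby
# Hypothetical model: each comment remembers its author's email, the comment
# it replies to (nil for top-level), and the notification setting chosen
# when it was posted.
Comment = Struct.new(:id, :email, :parent_id, :notify)

# Who should receive a copy of new_comment, given the comments already on
# the post?
def recipients(existing, new_comment)
  interested = existing.select do |c|
    case c.notify
    when :all     then true                          # whole-thread subscription
    when :replies then c.id == new_comment.parent_id # direct replies only
    else false                                       # :none
    end
  end
  # One copy per address, and never mail authors their own comment.
  interested.map(&:email).uniq - [new_comment.email]
end

thread = [
  Comment.new(1, "alice@example.com", nil, :all),
  Comment.new(2, "bob@example.com",   1,   :replies),
  Comment.new(3, "carol@example.com", 1,   :none),
]
reply_to_bob = Comment.new(4, "dave@example.com", 2, :all)
puts recipients(thread, reply_to_bob).inspect
```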
Still to go: trackbacks, I guess, and maaaaybe add textarea comments.
Get it: sudo gem install whisper --source http://masanjin.net/
I’ve finally pulled in all the old comments from the Blogspot blog. A painful
process of semi-automated Atom to YAML+Textile conversion, and the resulting
comments are not threaded, but they’re at least here now.
As a side note, I’m really liking having my posts stored in a git repo. I can
write them locally, tweak them and see how things look, and push when they’re
finally ready to be published.
As another side note, MathML is being a shitshow as usual. Firefox 3.1 (but
not 3.0?) apparently craps out at embedded style sheets in XML (craps out as
in, refuses to display the blog and displays a big red error instead), or some
shit. So I’ve removed some stylesheet line from the master template and now
everything seems to work in both Firefoxes. But that line is critical
according to Putting mathematics on the Web with
MathML so god only knows what I’ve broken in the
process.
The big problem with all this MathML stuff is that the XML wonks apparently
managed to trick everyone into violating Postel’s law and failing hard when the
browser doesn’t like something about the XML it sees. So the moment anything is
slightly out of whack, no one can see your blog. Maybe that’s why no one in the
world uses MathML except for me?
That brings to mind an old Mark Pilgrim post about XML and Postel’s
Law which is a good
read, and includes this memorable quote:
Various people have tried to mandate this principle out of existence, some
going so far as to claim that Postel’s Law should not apply to XML, because
(apparently) the three letters “X”, “M”, and “L” are a magical combination that
signal a glorious revolution that somehow overturns the fundamental principles
of interoperability.
Good stuff. Too bad that was five fucking years ago and I’m still dealing
with this shit.
I’ve released Whisper 0.1. Now you can blog like
me. It will happily serve static files, though if you’re expecting heavy
traffic, you might put it behind something like Nginx. (See instructions in the
configuration file for more.)
How to do it:
sudo gem install whisper --source http://masanjin.net/
whisper-init <blog directory>
- Follow the instructions!
I’ve done some benchmarking on Whisper. Here are the results, with
a few points of comparison:
| system | req/s | ms/req | delta ms/req |
|---|---|---|---|
| nginx static | 13736.04 | 7.280 | |
| rack/thin | 3065.24 | 32.624 | 25.344 |
| whisper/no logging | 1918.56 | 52.123 | 19.499 |
| whisper | 1833.40 | 54.544 | 2.421 |
Nginx static is nginx serving a static file. We see it can handle 13k
requests per second, and takes about 7ms for a single request. If we add
a simple Thin server on top of that, going through Rack, we immediately
drop requests/second by a factor of about 4.5, and it takes us an extra
25ms/request. That’s the cost of using Ruby.
Adding Whisper on top of that requires another 19.5 ms/request,
bringing our rate down to 1919 requests/second, or over 7 times slower
than Nginx serving static files. And if you want logging with that, add
another 2.4 ms/request.
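The per-layer costs quoted above are just differences between adjacent rows of the table. A small sketch that redoes the arithmetic from the published figures; the constant and method names are mine, and the numbers are copied from the table rather than re-measured.

```ruby
# ms/req and req/s figures copied from the benchmark table.
MS_PER_REQ = {
  "nginx static"       => 7.280,
  "rack/thin"          => 32.624,
  "whisper/no logging" => 52.123,
  "whisper"            => 54.544,
}
REQ_PER_SEC = {
  "nginx static"       => 13736.04,
  "rack/thin"          => 3065.24,
  "whisper/no logging" => 1918.56,
  "whisper"            => 1833.40,
}

# Cost each layer adds over the layer below it, in ms/request.
def layer_deltas(ms_per_req)
  ms_per_req.each_cons(2).map do |(_, below), (name, above)|
    [name, (above - below).round(3)]
  end.to_h
end

# How many times slower a system is than plain nginx, by throughput.
def slowdown(req_per_sec, name)
  req_per_sec["nginx static"] / req_per_sec[name]
end

layer_deltas(MS_PER_REQ).each { |name, delta| puts "#{name}: +#{delta} ms/req" }
puts format("whisper/no logging is %.1fx slower than nginx static",
            slowdown(REQ_PER_SEC, "whisper/no logging"))
```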
That 2.4ms/request is interesting, because it’s basically the result of
a few puts statements. Yes, Ruby is expensive. The bare Rack/Thin
performance shows the headroom I have on the Ruby side (i.e. without
rewriting the whole thing in C). If a puts is that expensive, then
stripping out a couple debugging statements and caching some regexp
results would probably result in a very noticeable improvement in
performance.
But how many requests/second do you need to be able to survive being
Slashdotted? A brief web search suggests a high estimate of “several
hundred”. Let’s say that means 300 req/s. That means that Whisper is
already 6 times the Slashdot effect requirement. So it’s almost definitely not
worth complicating the code for the sake of performance.
Experiment parameters: these are all tests using ab (the Apache
benchmark tool) with 100 concurrent requests, averaged over 50k
requests. The tests were performed by connecting to localhost (i.e.
going over the network stack but not over the network itself), on a
quad-core Intel 2GHz (Q8200) running 64-bit Linux 2.6.27. YMMV.
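One thing worth spelling out: with 100 requests in flight, the ms/req column is not the latency of a lone request. The figures are consistent with ms/req being roughly concurrency divided by throughput, which is an inference from the numbers rather than anything stated in the ab output here. A sketch of that relationship, plus the Slashdot headroom calculation; both method names are made up.

```ruby
# Assumed relationship: with `concurrency` requests in flight at once, each
# request takes about concurrency / throughput seconds from start to finish.
def ms_per_request(req_per_sec, concurrency)
  concurrency / req_per_sec.to_f * 1000.0
end

# How many times over a target load a given throughput is.
def headroom(req_per_sec, target_req_per_sec)
  req_per_sec / target_req_per_sec.to_f
end

puts format("nginx static: %.3f ms/req", ms_per_request(13736.04, 100))
puts format("whisper:      %.3f ms/req", ms_per_request(1833.40, 100))
puts format("headroom over 300 req/s: %.1fx", headroom(1833.40, 300))
```

The computed values land within about a thousandth of a millisecond of the ms/req column, and the headroom comes out at about 6.1, matching the "6 times" estimate.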
Over the past month or so I’ve been spending some time hacking together yet
another blogging platform, to satisfy all my (admittedly weird) blogging
desires. It’s finally at the point where I can host my fascinating insights on
it, so here you go. It’s called Whisper, and you’re looking at it now.
Interesting features:
- No RDBMS. Storing your blog entries in an RDBMS is like driving to work in the
Space Shuttle.
- YAML+Textile, sitting on a disk. Like Hobix, blog posts
and comments are stored on disk in regular files, using a mix of YAML and
Textile. This means you can keep your content under version control, and you
can edit it with whatever editor you desire. Unlike Hobix, the entry content is
stored in a separate file from the metadata, so there’s none of the trickiness
of embedding Textile in YAML.
- Sits directly on top of Rack (or Thin). No intermediate layer to slow
things down. These particular bits are served from Thin over a unix socket to
Nginx.
- Lazy cached dependency graph: every bit of content is cached, built lazily,
and a part of a big dependency graph. That means almost every request is
served directly from memory, and making a change, like adding or updating an
entry, forces a regeneration of only those bits that require it.
Infrequently-requested bits of content eventually expire.
- Markup enhancements: I’ve added some extra processing on top of Textile to do
the things I’ve always wanted to do. Ruby code is automatically
syntax-highlighted, LaTeX math expressions are turned into MathML (via
RiTeX), etc. Finally I can write purty-lookin’
math and code without a ridiculous amount of effort.
- Threaded comments. Why would you not have this?
- Comments via email. This is still a work in progress, but comments can
currently only be made by entering your email address, and replying to the
resulting email. This allows you to quote, thread, and generally have a
reasonable discussion, which is what email is good at, and what typing shit
into little text areas on your web browser is not. The eventual goal
is to automatically mirror the entire conversation, but right now it just
mirrors individual replies.
- Multiformat support. In addition to HTML and RSS output, there’s a plain
text mode for the hard-core.
- Pagination, labels, per-label and per-author indices, etc.
- The whole thing amounts to a little over 1200 lines of code.
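For the curious, the lazy cached dependency graph idea from the list above can be sketched in a few lines. This is a toy illustration, not the code in the repo: the `LazyGraph` class and its method names are invented here, it assumes the graph has no cycles, and it leaves out the expiry of infrequently-requested content.

```ruby
# Hypothetical sketch of a lazily built, cached dependency graph.
class LazyGraph
  def initialize
    @builders = {} # name => block that builds the content
    @deps     = {} # name => names of the nodes it is built from
    @cache    = {} # name => content already built
  end

  # Register a node; the block receives the built values of its dependencies.
  def define(name, deps: [], &builder)
    @builders[name] = builder
    @deps[name] = deps
  end

  # Build on first request, then serve straight from memory.
  def fetch(name)
    @cache.fetch(name) do
      inputs = @deps.fetch(name).map { |dep| fetch(dep) }
      @cache[name] = @builders.fetch(name).call(*inputs)
    end
  end

  # Forget a node and everything built from it, directly or indirectly.
  # Nothing is rebuilt until somebody asks for it again.
  def invalidate(name)
    @cache.delete(name)
    @deps.each { |node, deps| invalidate(node) if deps.include?(name) }
  end
end

entries = ["first post"]
graph = LazyGraph.new
graph.define(:entries) { entries.dup }
graph.define(:index, deps: [:entries]) { |es| es.map(&:upcase).join("\n") }

puts graph.fetch(:index)   # built now, cached afterwards
entries << "second post"
graph.invalidate(:entries) # drops :entries and :index, nothing else
puts graph.fetch(:index)   # rebuilt on demand
```

Invalidation only throws cached results away; the rebuild happens the next time something is requested, which is what keeps an update cheap.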
The code’s still a while away from being ready for public consumption, but I’ve
put up a git repo here: git://masanjin.net/whisper.
The next steps are to flesh out the code enough to make it usable by other
people, make a gem, and maybe publish some performance numbers.