Most programmers are by now familiar with the difference between the number of
bytes in a string and the number of characters. Depending on the string’s
encoding, the relationship between these two measures can be either trivially
computable or complicated and compute-heavy.
With the advent of Ruby 1.9, the Ruby world at last has this distinction
formally encoded at the language level: String#bytesize is the number of
bytes in the string, and String#length and String#size the number of
characters.
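A quick illustration of the distinction, as a minimal sketch assuming Ruby 1.9 or later and a UTF-8 source file (the sample string is my own):

```ruby
# encoding: utf-8
s = "日本語"        # three characters, each three bytes in UTF-8

puts s.bytesize   # 9: the number of bytes
puts s.length     # 3: the number of characters
puts s.size       # 3: size is an alias for length
```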
But when you’re writing console applications, there’s a third measure you have
to worry about: the width of the string
on the display. ASCII characters take up one column when displayed on
screen, but super-ASCII characters, such as Chinese, Japanese and Korean
characters, can take up multiple columns. This display width
is not trivially computable from the byte size of the character.
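To make the idea concrete, here is a rough sketch of a width calculation. It is not how the console gem works. The helper names and the hand-picked list of double-width code point ranges are my own assumptions, and the list is deliberately incomplete (it ignores combining characters, for one).

```ruby
# encoding: utf-8

# Hypothetical helper: true if the code point falls in one of a few
# ranges that terminals usually draw two columns wide.
def wide_code_point?(cp)
  [
    0x1100..0x115F,   # Hangul Jamo initial consonants
    0x3040..0x30FF,   # Hiragana and Katakana
    0x4E00..0x9FFF,   # CJK Unified Ideographs
    0xAC00..0xD7A3,   # Hangul syllables
    0xFF01..0xFF60    # Fullwidth forms
  ].any? { |range| range.include?(cp) }
end

# Hypothetical helper: approximate number of terminal columns for str.
def approx_display_width(str)
  str.each_char.inject(0) do |cols, ch|
    cols + (wide_code_point?(ch.ord) ? 2 : 1)
  end
end

puts approx_display_width("hello")    # 5 columns
puts approx_display_width("日本語")    # 6 columns, though only 3 characters
```

Note that neither the byte size (9) nor the character count (3) of the second string gives you the column count.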
Finding the display width of a string is critical to any kind of console
application that cares about the width of the screen, i.e. is not simply
printing stuff and letting the terminal wrap. Personally, I’ve been needing it
forever:
- Trollop needs it because it tries to format
the help screen nicely.
- Sup needs it in a million places because it
is a full-fledged console application and people use it for reading mail in all
sorts of funny languages.
The actual mechanics of how to compute string width make for an interesting
lesson in UNIX archaeology, but suffice it to say that I’ve travelled the path
for you, with help from Tanaka Akira of pp fame, and I am happy to announce
the release of the Ruby console gem.
The console gem currently provides these two methods:
- Console.display_width: calculates the display width of a string.
- Console.display_slice: returns a substring according to display offset and display width parameters.
There is one horrible caveat outstanding, which is that I haven’t managed to
get it to work on Ruby 1.8. Patches to this effect are most welcome, as are,
of course, comments and suggestions.
Try it out!
If you’re writing a multithreaded Ruby program that uses ncurses,
you might be curious why your program stops running when you call
Ncurses.getch. Sup has been plagued
by this issue since 2005. Thankfully, I think I finally understand
it.
The problem is that there is a bug in the Ruby ncurses library
such that using blocking input will block all Ruby threads when
it waits for user input, instead of just the calling thread. So
Ncurses.getch will cause everything to grind to a halt. This is
probably due to the library not releasing the GVL when blocking on
stdin.
This bug is present in the latest rubygems version of ncurses,
0.9.1. It has been fixed in the latest libncurses-ruby Debian
packages (1.1-3).
To see if you have a buggy, blocking version of the ruby ncurses
library, run this program:
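require 'rubygems'
require 'ncurses'
require 'thread'

Ncurses.initscr
Ncurses.noecho
Ncurses.cbreak
Ncurses.curs_set 0

# Background thread: it can only replace the BAD message if it gets to
# run while the main thread is waiting in getch.
Thread.new do
  sleep 0.1
  Ncurses.stdscr.mvaddstr 0, 0, "library is GOOD."
end

begin
  Ncurses.stdscr.mvaddstr 0, 0, "library is BAD."
  Ncurses.getch  # waits for a keypress
ensure
  # Restore the cursor and the terminal before exiting.
  Ncurses.curs_set 1
  Ncurses.endwin
  puts "bye"
end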
(I purposely require rubygems in there to load the rubygems
ncurses library if it’s present; you can drop this if you don’t
use rubygems.)
There are two workarounds to this problem. First, you can simply
tell ncurses to use nonblocking input:
Ncurses.nodelay Ncurses.stdscr, true
But if you’re writing a multithreaded app, you probably aren’t
interested in nonblocking input, unless you want a nasty polling
loop.
The better choice is to add a call to IO.select before getch,
which will block the calling thread until there’s an actual
keypress, and then allow getch to pick it up:
if IO.select [$stdin], nil, nil, 1
  Ncurses.getch
end
This IO.select call uses a one-second timeout, so you’ll have to handle the
nil it returns whenever the timeout expires. But the background threads should no longer block.
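The timeout behavior of IO.select is easy to see in isolation. Here is a minimal sketch that uses a pipe as a stand-in for $stdin, so it runs without ncurses or a terminal. The pipe, the variable names and the 0.05-second timeout are my own choices for the demonstration.

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so select gives up after the timeout
# and returns nil. In a real input loop you would just try again.
timed_out = IO.select([reader], nil, nil, 0.05)
puts "no input yet" if timed_out.nil?

# Simulate a keypress arriving.
writer.write("k")
writer.flush

# Now select returns arrays of ready IO objects, and the read that
# follows will not block.
ready = IO.select([reader], nil, nil, 0.05)
key = reader.read(1) if ready
puts "got #{key.inspect}"
```

While the calling thread sits in IO.select, other Ruby threads keep running, which is the point of the workaround.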
There is one further complication, which is that you won’t be able
to receive the pseudo-keypresses Ncurses emits when the terminal
size changes, since they don’t show up on $stdin and thus the
select won’t pass. The solution is to install your own signal
handler:
trap("WINCH") { ... handle sigwinch ... }
You will still see the resize events coming from getch, but only
once the user presses a key. You can drop them at this point.
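As a rough sketch of such a handler, assuming a Unix-like system: the version below only sets a flag, leaving the real redraw work to the main loop. The flag name is my own, and the Process.kill call is there purely to simulate a resize for the demonstration.

```ruby
resized = false

# Keep the handler tiny: just record that a resize happened.
trap("WINCH") { resized = true }

# Simulate a terminal resize by sending SIGWINCH to this process.
Process.kill("WINCH", Process.pid)
sleep 0.1   # give Ruby a moment to run the handler

puts "terminal was resized" if resized
```

In a real application the main loop would check the flag after each IO.select call and redraw the screen there.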
That should be enough to make any multithreaded Ruby ncurses app
able to function. Of course, once everyone’s using a fixed version of
the ncurses libraries, you can do away with the select and set
nodelay to false.
(One last hint for the future: I’ve found it necessary to set nodelay
to false before every call to getch; otherwise a ctrl-c will
magically change it back to nonblocking mode. Not sure why.)
The 0.7 release ain’t the only exciting Sup news.
Here’s a list of interesting features that are currently cooking in Sup next,
along with the associated branch name.
- zsh completion for sup commandline commands, thanks to Ingmar
Vanhassel. (zsh-completion)
- Undo support for many commands, thanks to Mike Stipicevic.
(undo-manager)
- You can now remove labels from multiple tagged threads, thanks to
Nicolas Pouillard, using the syntax -label. (multi-remove-labels)
- Sup works on terminals with transparent backgrounds (and that’s fixed
copy-and-paste for me too!), thanks to Mark Alexander.
(default-colors)
- Pressing ‘b’ now lets you roll buffers both forward and backward,
also thanks to Nicolas Pouillard. (roll-buffers)
- Duplicate messages (including messages you send to a mailing list, and
then receive a copy of) should now have their labels merged, except
for unread and inbox labels. So if you automatically label messages
from mailing lists via the before-add-hook, that should work better
for you now. (merge-labels)
- Saving message state is now backgrounded, so pressing ‘$’ after
reading a big thread shouldn’t interfere with your life. It still
blocks when closing a buffer, though, so I have to make that work.
(background-save)
- Email canonicalization, also thanks to Nicolas Pouillard. The mapping
between email addresses and names is no longer maintained across multiple
emails. (dont-canonicalize-email-addresses)
The canonicalization one is a weird one. There’s been a long-standing problem
in Sup where names associated with email addresses are saved and reused.
Unfortunately many automated systems like JIRA, evite, blogger, etc. will send
you email on behalf of someone else, using the same email address but different
names. The issue was compounded because Sup decided that longer names should
always replace shorter ones, so receiving some spam claiming to be from your
address but with a random name would have all sorts of crazy effects.
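To see why that rule goes wrong, here is a toy model of it. This is not Sup's actual code. The helper name, the hash-based store and the sample names are hypothetical, chosen only to illustrate the "longer name wins" behavior described above.

```ruby
# Hypothetical model of the old behavior: one remembered name per
# address, with a longer name always replacing a shorter one.
def remember_name(names, email, name)
  current = names[email]
  names[email] = name if current.nil? || name.length > current.length
  names
end

names = {}
remember_name(names, "me@example.com", "William")
remember_name(names, "me@example.com", "Totally Legit Prize Department")  # spam
remember_name(names, "me@example.com", "William")                        # too short to win

puts names["me@example.com"]   # Totally Legit Prize Department
```

Once the long spam name is in, no amount of legitimate mail with the shorter real name will displace it.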
Addresses are still stored in the index, both for search purposes, and for
thread-index-mode. (Otherwise thread-index-mode has to reread the headers
from the message source, which is slow.) Once thread-view-mode is opened, the
headers must be read from the source anyways, so the email address is updated
to the correct version.
So, incoming new email should be fine. Sup will store whatever name is in the
headers, and won’t do any canonicalization.
For older email, you can update the index manually by viewing the message in
thread-view-mode, and forcing Sup to re-save it, e.g. by changing the labels
and then changing them back. Marking it as unread, and then reading it, is an
easy way to accomplish this, at least for read messages.
You can also make judicious use of sup-sync to do this for all messages in
your index.
Sup 0.7 has been released.
You can read the announcement here
The big win in this release is that Ferret index corruption issues
should now be fixed, thanks to an extensive program of adding locking and
thread safety.
The other nice change is that text entry will now scroll to the right
upon overflow, thanks to some arcane Curses magic.
Development of Sup is done with Git. Sup follows a
topic branch methodology: features and bugfixes typically start off as
“topic” branches from master, and are merged into an “integration”/“version”
branch next for integration testing. After n cycles of additional bugfix
commits to the topic branch, and re-merges into next, the topic branches are
finally merged down to master, to be included in the next release.
I really like this approach because I think it evinces the real power of Git:
that merges are so foolproof that I can pick and choose, on a
feature-by-feature basis, which bits of code I want at each level of
integration. That’s crazy cool. And users can stick to master if they want
something stable, and next if they want the latest-and-greatest features.
The biggest problem I’ve had, though, is that long-lived topic branches often
conflict with each other. This happens both when merging into next and when
merging into master. I don’t think there’s a way around it; isolating
features in this way has all the benefits above, but it also means that when
they touch the same bits of code, you’ll get a conflict.
As a lazy maintainer, the biggest question I’ve had is: is there a way to push
the burden of conflict resolution to the patch submitter? Is there a way for me
to say: hey, your change conflicts with Bob’s. Can you resolve the conflict and
send it to me?
One option I’ve considered is to have contributors publish not only their
feature branches, but their next branch as well. Assuming they aren’t mucking
about with their next branch otherwise, if it contains just the merge commit,
I can merge it into mine, and it should be a fast-forward that gets me the
merge commit, conflict resolution and all.
But I don’t like that idea because, in every other case, I’m merging in the
feature branches directly. Why should I suddenly start merging in next just
because you have a conflict?
Furthermore, Sup primarily receives email contributions via git format-patch,
and I do the dirty deed of sorting them into branches and merging things
around. Requiring everyone to host a git repo iff they produce a conflicting
patch seems silly. (And git format-patch, unfortunately, produces nothing for
merge commits, even if they have conflict resolution changes. Maybe there’s a
good reason for this, or maybe not. I’m not sure.)
After some effort, and some git-talk discussion, I have a solution. And no, it
doesn’t involve sharing git-rerere caches. (Which it seems that some people
do!)
For the contributor: once you have resolved the conflict, do a git diff
HEAD^. This will output the conflict resolution changes. Email that to the
maintainer along with your patch.
For the maintainer:
$ git checkout next
$ git merge <offending branch>
[... you have a conflict, yada yada ...]
$ git checkout next .
$ git apply --index <resolution patch filename>
$ git commit
Running git merge gets you to the point where you have a conflict. Running
git checkout next . sets your working directory to the state it was in before
you merged. And git apply applies the resolution changes.
You lose authorship of the conflict resolution, but you can use git commit
--author to set it.
I think the ideal solution would be for git format-patch to produce something
usable in this case. I see some traffic on the Git list that suggests this is
being considered, so hopefully one day this rigmarole will not be necessary.
In Rethinking Sup part I, I
concluded that Sup the MUA is an evolutionary dead end, and that the future
lies in Sup the Service (STS). But what does that mean?
One thing I want to make clear it does not mean is any abandonment of the Sup
curses UI. That particular “user experience” has been refined over the past few
years to become my ideal email interface. It would be silly to throw that away.
What will happen to the curses code is that it will become one client among
(hopefully) many. Once there’s a clear delineation between UI and backend, you
can make a UI choice independent of making a choice to use Sup in the first
place. You can run sup-curses-client if you want. Or you can build
a web interface, or an Openmoko interface. Working with ncurses has always been
the least enjoyable part of Sup, so maybe I’ll actually enjoy learning
Javascript.
What backend functionality will STS actually provide? If I were simply
reworking Sup into a client and a server, the obvious answer would be “a
searchable, labelable, threaded view of large amounts of email”.
But reworking Sup is a great time to extend its original goals. In particular,
I would love for STS to handle other types of documents besides email. I’ve
always used my inbox as a mechanism for writing notes to myself. I’ve
experimented briefly with reading RSS feeds through it. I’d like STS to support
email, of course, but not to be limited by it.
My grand vision: STS will be a searchable, labelable, threaded view of large
numbers of documents.
You can throw whatever you want in there, and STS will store it, thread it, and
let you label and search for it. Email, RSS feeds, notes, jabber and IRC logs,
web pages, RI documents—I want you to be able to throw them all in there. I
want you to be able to annotate any of those things by adding notes and
threading them against the original objects. Basically I want STS to be the
primary tool you use for organizing and recalling all the textual information
you’ve ever encountered in your life.
Cool, huh?
There’s another convenient benefit to this transformation: no one will expect
STS to act like a MUA. STS does its own storage. You add your email and your
other documents to the server and then you can throw those files away (or not).
There are no more questions of supporting IMAP or various mbox dialects or “why
doesn’t Sup treat Maildir correctly”. The files are in STS, and once they’re
there, they’re out of your hands. You’ll be able to export them, of course, and
if you’re crazy you might be able to write an IMAP server translation layer
for STS, but there will be no more expectation of realtime Maildir handling. As
I explained in part I, that’s a game I don’t want to play.
STS is a grander vision than a MUA, and it no longer has to be hobbled by the
constraints of being expected to act like one.
Some other nice benefits of reworking Sup into STS:
- You’ll be able to run multiple clients at once.
- It’s an opportunity to rework some things. For example, one of the most
noticeably slow operations in Sup (“Classic”) is assembling a large thread.
This is because I made a decision early on to do all threading at search time.
That made certain things easier (in particular, I could change the threading
model without having to rescan the entire index), but in retrospect the cost is
too high. STS will maintain document trees directly.
- I can replace Ferret with Sphinx. It’s been a good couple years, but the
periodic non-deterministic index corruption that’s been an issue for over a
year is an exit sign to me.
Working with Sphinx is nowhere near as nice as working with Ferret, but speed
and stability go a long way.
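Returning to the threading point: the sketch below shows what maintaining document trees directly could look like. It is purely illustrative and is not the STS design. The class and method names are hypothetical, and persistence is ignored entirely.

```ruby
# Hypothetical sketch: each document records its parent when it is
# added, so assembling a thread is a tree walk rather than a
# search-time computation.
class DocumentTree
  def initialize
    @parent = {}
    @children = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a document, optionally as a reply to parent_id.
  def add(id, parent_id = nil)
    @parent[id] = parent_id
    @children[parent_id] << id if parent_id
  end

  # Every document in the same thread as id, root first, depth first.
  def thread_for(id)
    collect(root_of(id))
  end

  private

  def root_of(id)
    id = @parent[id] while @parent[id]
    id
  end

  def collect(id)
    [id] + @children[id].flat_map { |child| collect(child) }
  end
end

tree = DocumentTree.new
tree.add("a")
tree.add("b", "a")
tree.add("c", "b")
tree.add("d", "a")
puts tree.thread_for("c").inspect   # ["a", "b", "c", "d"]
```

The trade-off is the one mentioned above: with the tree stored at insert time, changing the threading model means rewriting stored links instead of just changing a query.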
I’ve been working on the code for STS on and off for the past couple weeks and
it’s slowly starting to come together. Once the major components have at least
been all sketched, I will host a git repo.
It’s been clear to me for a while now that Sup has been trying to be two very
different things at once, thus pleasing no one and irritating everyone. There’s
Sup the email client, which is kind of the standard view of things. And then
there’s Sup the service: a threaded, fielded, searchable, labelable view into
your email.
Sup the email client is lacking in many ways, as many people have been very
quick to point out to me. The most obvious of these is that it refuses to,
you know, actually write back any state to your mailstore.
Specifically, read/unread state is never written anywhere except its internal
index. Furthermore, mailstore rescans of most any type are incredibly slow.
These two features make using it in conjunction with other clients near
impossible, which pretty much breaks one of the primary principles of tool
design: don’t break other tools. (Then there’s also the problem of IMAP
connections being terrifically slow and prone to crashes, but I lay most of
that blame on IMAP being a crappy protocol and the Ruby IMAP libraries leaving
a lot to be desired.)
Sup the service, on the other hand, suffers from the rather obvious flaw of not
being exposed in any manner other than through Sup itself (and irb, I suppose).
I think the reason for this bizarre situation stems from my goal of fusing two
very different things together: mutt and Gmail. Mutt is a client; Gmail is a
service; Sup cherry-picks functionality, and lack of functionality, from both.
Examples: I refused to have Sup write back to mailstores because Gmail didn’t
have to export to your local Maildir or mbox file, so why should I? (Well
technically, I said I would accept patches that did that, but that I wouldn’t
be working on that feature myself. A fine distinction!) At the same time, I
pooh-poohed the notion of a Sup server because mutt didn’t have a server, and
so why should Sup? And so on.
For Sup to evolve into something more useful than it is, and that appeals to a
broader audience than it currently does, I believe it has to go down one of
these routes completely. And I believe I know which one, and I believe this can
be done without compromising the basic user experience, which I would be very
reluctant to do because it has been lovingly tweaked over the years to be
William’s Ideal Email Experience.
The first option is to make Sup more of a client. In order to be a real email
client, Sup must be able to interoperate with other clients. This means it has
to write back all its state to the mailstores: read/unread status in whatever
manner the mailstore supports, and probably something like all labels in a
special header. It must also be able to do a full rescan in a fast manner, so
that changes by other clients are reflected.
Right off the bat, that seems impossible, redundant with other software, and
not that interesting. As I wrote in a sup-talk thread from a few months
ago:
Sup is never going to be able to compete with programs like Mutt in
terms of operations like “open up a mailstore of some format X, and mark a
bunch of messages as read, and move a bunch of messages to this other
mailstore.” That’s a tremendous amount of work to get right, get safe and get
fast, and Mutt’s already done it well, and I sure don’t want to have to
reimplement it. Competing with mutt on grounds of speed, stability, and
breadth of Mailstore usage is a recipe for fail. Ruby sure as shit ain’t gonna
come close to C for speed (at least until Rubinius gets LLVM working), and
mutt’s already hammered out all the quirkinesses with Exchange, etc.
But not only would it be impossible, it wouldn’t be interesting. The things
that make Sup valuable are the UI, the indexing and the flags, and those simply
don’t translate to external mailstores. Furthermore, Sup is aimed at the
mailstores of the future (my present mailstores), which are so big that mutt
can’t handle them anyways.
So that leaves Sup as a service. And that’s where things get interesting. But
I’ll save that for a later post.