Replacing Clever Code with Unremarkable Code in Go

by Baron Schwartz on 04 Jun 2013

Not too long ago, my primary programming language was Perl. I've written a lot of Perl, including some things that I think are quite clever. And therein lies the problem.

Clever code is dangerous. It feels great: when I'm puzzling over a sticky problem (the abstraction is wrong, the language design doesn't support the abstraction I want, and so on) and then I find some neat trick, it's so … cool. The feeling of delight at an “elegant” way to do something is such a blast of endorphins, such a reward for struggling through.

A lot of the clever code I've written involved callbacks. In fact, I once blogged a review of a Perl book, Higher-Order Perl, that takes the notion of callbacks deeper and deeper, until you're breathless at the beauty of it. You feel initiated into an inner circle. You feel like you've discovered LISP in an alternate dimension (but you haven't).

One of the things I like about Go is that it offers much better ways to write things that you'd usually write with callbacks. The resulting code feels unremarkable, not clever at all. In fact, it reads like a straightforward solution to the problem. As a programmer, I don't get a high out of that, but it's much better. A straightforward solution to the problem is superior to one that makes you feel like a high priest for having discovered it.

As an example, in Percona Toolkit several of the tools have the notion of a “processing pipeline.” An object or bit of data is passed into a function, whose job is to manipulate it and pass it to another function, and so on in turn until all of the functions are done and the data comes out the other end.
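To make the idea concrete, here is a minimal, synchronous sketch of such a pipeline in Go. The `Event` type, the `Stage` signature, and the `run` helper are hypothetical names chosen for illustration, not code from Percona Toolkit. Each stage returns the (possibly modified) data plus a flag saying whether it should continue; returning false drops it.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical unit of data flowing through the pipeline,
// standing in for something like a parsed query event.
type Event struct {
	Query string
}

// Stage manipulates an event and reports whether it should continue
// down the pipeline. Returning false drops the event.
type Stage func(Event) (Event, bool)

// run passes ev through each stage in order, stopping early if a
// stage drops it.
func run(stages []Stage, ev Event) (Event, bool) {
	for _, s := range stages {
		var ok bool
		ev, ok = s(ev)
		if !ok {
			return ev, false
		}
	}
	return ev, true
}

func main() {
	stages := []Stage{
		// Normalize whitespace and case.
		func(e Event) (Event, bool) {
			e.Query = strings.ToLower(strings.TrimSpace(e.Query))
			return e, true
		},
		// Drop anything that is not a SELECT.
		func(e Event) (Event, bool) {
			return e, strings.HasPrefix(e.Query, "select")
		},
	}
	for _, q := range []string{"  SELECT 1 ", "UPDATE t SET x=1"} {
		if out, ok := run(stages, Event{Query: q}); ok {
			fmt.Printf("%q\n", out.Query) // prints "select 1"
		}
	}
}
```

This form makes adding, removing, and dropping cheap, but it has no notion of timeouts, retries, or termination, which are exactly the requirements that make the design hard.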

The design is slightly difficult to get working well in Perl, in part because the obvious-feeling solutions have many shortcomings. Here are some of the requirements we discovered over time:

  1. The pipeline isn't pre-defined. One should be able to add and remove stages at will.
  2. Any pipeline stage should be able to drop a bit of data and not pass it further.
  3. Any pipeline stage should be able to terminate the whole program.
  4. Any pipeline stage should be able to time out. The feeder process at the head of the pipeline should be able to time out, too.
  5. The feeder should be able to retry failures, as should individual stages.
  6. It should be possible to handle termination, or an error, gracefully without losing all of the work that's been done.

These types of requirements are necessary for a resilient, robust program that can do useful work on lots of data in unexpected conditions.

The requirements are also surprisingly difficult to fulfill in Perl. An expert Perl programmer can probably suggest several good ways to do this type of work, but a novice or intermediate programmer will almost certainly struggle. Try to imagine how to do this with iteration, recursion, chained callbacks, and so on. One or more of the requirements brings in edge cases that are subtle or annoying to handle. That's a sign of trouble. If a program that does something this simple (passing data along an assembly line) requires an expert to create or understand, then there's a lot of risk: risk of bugs, key-man risk, schedule risk, and so on.

Take a look at the source code of pt-query-digest and search for “pipeline,” and you get a sense of how complicated the pipeline is to handle. Read some of the comments about the edge cases with starting an iteration, the “terminator,” and “data left in the pipeline” — youch. I am the original author of this program and it's clearly out of control. It's a huge risk to work with code like that.

I don't like risk. I especially don't like it to be lurking deeply in something that ought to be simple.

Is it reasonable to say that this type of program ought to be simple? Yes. The task is simple. It's essentially the same as Unix pipe-and-filter design. That realization highlights the real problem: Perl's design doesn't make this simple, obvious job easy to implement in a simple, obvious way. In fact, most C-like languages have the same problem.

Enter Go. Go's design actually does make programs like this simple and obvious. If you find yourself writing callbacks in Go, you're almost certainly doing it wrong. You should be using channels instead. A channel is a typed conduit. You can send and receive data over the channel. The channel is the analog to the Unix shell's | character. It is one of the fundamental building blocks of concurrency. It helps avoid code that's tightly coupled, that needs explicit synchronization, or that causes action at a distance.
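Here is a minimal sketch of that analogy, assuming a toy pipeline of integers: `gen` plays the role of a command writing to stdout, and `square` is a filter in the middle of a pipe. The names are hypothetical; the shape is similar to the one used in the Go blog's pipelines article. Each stage owns its output channel and closes it when done, which is what lets the next stage's `range` loop end.

```go
package main

import "fmt"

// gen emits its arguments on a channel, like a command writing lines
// to stdout, and closes the channel when it runs out.
func gen(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		for _, n := range nums {
			out <- n
		}
		close(out)
	}()
	return out
}

// square is a filter: it reads from one channel and writes to another,
// like a program in the middle of a shell pipeline.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		for n := range in {
			out <- n * n
		}
		close(out)
	}()
	return out
}

func main() {
	// Compose the stages the way a shell would chain them with |.
	for n := range square(square(gen(1, 2, 3))) {
		fmt.Println(n) // prints 1, 16, 81
	}
}
```

Nesting the calls reads inside out, but it composes the same way `gen 1 2 3 | square | square` would if those were shell commands.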

And as a result, programs written with channels feel straightforward and obvious. I've recently written or rewritten several programs with channels in a handful of lines of code, without any mutexes or callbacks; in other languages they would have been nightmares. If you're interested, I encourage you to spend half an hour watching this video from Google I/O 2013, because the presenter explains it much better than I can.
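As one illustration of coordinating goroutines without a mutex, here is a sketch of fanning work out to several workers and merging their results. It assumes the standard `sync.WaitGroup` (a synchronization primitive, but no explicit locking of shared data) to learn when every worker is done; `worker` and `process` are hypothetical names. Only the calling goroutine appends to `results`, so no lock is needed around it.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// worker doubles each value it receives. Several workers read from the
// same input channel; each value goes to exactly one of them.
func worker(in <-chan int, out chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	for v := range in {
		out <- v * 2
	}
}

// process fans vals out to n workers and collects their results.
func process(vals []int, n int) []int {
	if n < 1 {
		n = 1 // at least one worker, or the feeder would block forever
	}
	in := make(chan int)
	out := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(in, out, &wg)
	}
	// Feed the inputs, then close in so the workers' range loops end.
	go func() {
		for _, v := range vals {
			in <- v
		}
		close(in)
	}()
	// Close out once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	var results []int
	for r := range out {
		results = append(results, r)
	}
	sort.Ints(results) // workers finish in no particular order
	return results
}

func main() {
	fmt.Println(process([]int{3, 1, 2}, 4)) // prints [2 4 6]
}
```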

I like many things about Go, but replacing callbacks with language constructs that map directly to the real task at hand is one of my favorites.

By the way, I don't know the original source of the image, but for those readers who aren't climbers, what the climber does is astonishing — and probably more sensational than what is required. Sort of like using callbacks to create a pipeline!
