Go I/O is Fun*!

2014-01-25

* For certain definitions of fun.

One of my favorite parts of Go is the interface type. There has been plenty of hate recently about the missing generics but, to be honest, I haven’t missed them one bit. One of the reasons is the use of interfaces in Go, especially in the standard library. The io package is a great example.

Whether you open a file, read from a network, or request a web page, they all implement the io.Reader interface. This means that your code can be generic in the sense that the source of the data is irrelevant.

The notion of interfaces allows for some very interesting design patterns. One pattern that I’ve found useful lately is to wrap an interface with some logic while providing the same interface, so that existing code doesn’t need to change. Let me give you an example of what I mean.

Suppose that you already have an application that POSTs data to a web server for permanent storage. Right now your code accepts an io.Reader and uses it as the data being POSTed. Now let’s say you want to encrypt that data beforehand. One option would be to encrypt the data to another file and then use that file’s io.Reader to send the data.

It’s not a bad solution, but a more elegant one is to encrypt the data on the fly as it’s sent to the server. There is little overhead, and no additional disk space or I/O is required. We can do this because http.Request accepts any io.Reader: our encryption wrapper need only implement the io.Reader interface, and http.Request will happily use it.

This has become common enough for me that I’ve written a library called wrapio. It wraps a given io.Reader or io.Writer with some mutator or inspector. As data is passed through, it can be manipulated or checked without making huge copies in memory or disk.

A working example of this can be found at this gist. Let’s walk through some of the code.

block = wrapio.NewBlockReader(bme.BlockSize(), in)
last = wrapio.NewLastFuncReader(pad, block)
crypt = wrapio.NewFuncReader(func(p []byte) {
    bme.CryptBlocks(p, p)
}, last)

First, since we are using AES to encrypt the data, we need to do some preparation before the actual encryption. Block mode ciphers can only encrypt data whose length is a multiple of the block size. We use NewBlockReader to wrap the input in another io.Reader that only emits full blocks, except possibly the last one. To accommodate a final block shorter than the block size, we use NewLastFuncReader to call the pad function on the last block; it looks at the last block and adds some padding to the end if necessary. Finally, we wrap that io.Reader in another io.Reader that encrypts each block before passing it along.

In just a few lines of code we’ve implemented an on-the-fly encryption mechanism that can send the data anywhere an io.Reader is used. Existing parts of your application that use an io.Reader can generally accept it without any changes.

What if we are getting the data back from the server and want to decrypt it? No problem, we just do the same with the io.Reader we get back from the response body.

block = wrapio.NewBlockReader(bmd.BlockSize(), in)
crypt = wrapio.NewFuncReader(func(p []byte) {
    bmd.CryptBlocks(p, p)
}, block)
last = wrapio.NewLastFuncReader(unpad, crypt)

The order is slightly different because we want to decrypt the block before we remove the padding from the end of the data.

I can’t vouch for the effectiveness of the encryption being done here, but I think it gives you a good idea of how powerful interfaces are. By wrapping the I/O interfaces you can do just about anything with your data as it’s being passed through. Functions like ioutil.ReadAll and io.Copy become extremely useful and in some of my cases have become the only function call in the main loops of my programs.

As always, feedback, pull requests, comments, etc. are welcome. Enjoy your I/O!


Dynamic Server Management With Doozer and Groupcache

2013-11-16

Groupcache, while not a complete replacement for memcached, is an amazing caching library. In just a few lines of code, you can greatly improve the access time to your immutable data. One of the problems many people quickly run into when using groupcache is maintaining a list of peers where the cached data is distributed.

Using a configuration file is a fairly brittle method for maintaining the list. Fortunately, this isn’t a new problem. There are several programs which store configuration information in a highly available and consistent system. ZooKeeper is fairly popular with some of the people I know. There is even a library for connecting to ZooKeeper in Go: gozk. Let’s be Go purists for a few minutes though and give Doozer a try.

Applications like Doozer solve the problem of maintaining a list of peers in two ways. First, we can fetch a current list of peers from the Doozer server. Second, we can create a watch on the list of peers. Whenever the list changes, Doozer will push us an updated list. We can use both of these to keep our list of peers up to date.

I’ve put together a contrived example in a gist. In this example, we are caching definition responses from a dict service. Why would you want to do that? I don’t know. Maybe you host your own dict service and there is a spelling bee craze in your town. The actual cached data isn’t what’s interesting in this discussion.

Here is how we’ll set up our groupcache:

// Setup the cache.
pool = groupcache.NewHTTPPool("http://" + addr)
dict = groupcache.NewGroup("dict", 64<<20, groupcache.GetterFunc(
    func(ctx groupcache.Context, key string, dest groupcache.Sink) error {
        def, err := query(key)
        if err != nil {
            err = fmt.Errorf("querying remote dictionary: %v", err)
            log.Println(err)
            return err
        }

        log.Println("retrieved remote definition for", key)
        dest.SetString(def)
        return nil
    }))

The query function can be found farther down in the file; it basically dials the dict service, asks for the definition, and returns the first one. With this setup, we only have to query the dict service if the word isn’t already in the cache.

If we left it at that, each instance of groupcache would only store results locally. We want to distribute the cache among a group of servers though, so we need to tell groupcache where its peers are with HTTPPool.Set. In our application we do that by first fetching the list from Doozer, then we add ourselves to the list, and finally we call the Set method:

// Run the initial get.
data, rev, err := d.Get(peerFile, nil)
if err != nil {
    log.Println("initial peer list get:", err)
    log.Println("using empty set to start")
    peers = []string{}
} else {
    peers = strings.Split(string(data), " ")
}

// Add myself to the list.
peers = append(peers, "http://"+addr)
rev, err = d.Set(peerFile, rev,
    []byte(strings.Join(peers, " ")))
if err != nil {
    log.Println("unable to add myself to the peer list (no longer watching).")
    return
}
pool.Set(peers...)
log.Println("added myself to the peer list.")

The calls to Doozer are relatively simple. In fact most of this code is logging and error checking. We take this a step further by actively listening to signals and changes from Doozer and updating the peer list accordingly:

// Setup signal handling to deal with ^C and others.
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, os.Interrupt, syscall.SIGTERM) // SIGKILL can't be caught

// Get the channel that's listening for changes.
updates := wait(d, peerFile, &rev)

for {
    select {
    case <-sigs:
        // Remove ourselves from the peer list and exit.
        for i, peer := range peers {
            if peer == "http://"+addr {
                peers = append(peers[:i], peers[i+1:]...)
                d.Set(peerFile, rev, []byte(strings.Join(peers, " ")))
                log.Println("removed myself from peer list before exiting.")
            }
        }
        os.Exit(1)

    case update, ok := <-updates:
        // If the channel was closed, we should stop selecting on it.
        if !ok {
            updates = nil
            continue
        }

        // Otherwise, update the peer list.
        peers = update
        log.Println("got new peer list:", strings.Join(peers, " "))
        pool.Set(peers...)
    }
}

We start by setting up two channels. The first is used for receiving signals from the operating system. When we kill the application, we remove ourselves from the peer list before we exit. The second is used for receiving changes from Doozer. Doozer doesn’t give us a channel to receive from, but wrapping one around it is fairly simple. If you aren’t familiar with how to do that, you can check out the wait function.

Grab the code and try it out yourself! You’ll first want to download Doozer and start up an instance if you don’t have one:


./doozerd -w false

You may also need to get the libraries if you don’t have them already:


go get github.com/golang/groupcache
go get github.com/ha/doozer

In one terminal run:


go run main.go -addr="127.0.0.1:8000"

Then in another terminal, run a similar command using a different port:


go run main.go -addr="127.0.0.1:8001"

Notice in the first terminal that Doozer notifies us of the change:

got new peer list:  http://127.0.0.1:8000 http://127.0.0.1:8001

You can keep adding new instances and you should see all the other systems receive the updates. Try killing one of the instances (Ctrl+C) and notice that it removes itself from the list.

You can verify that the service works using curl. For example:


$ time curl http://localhost:8002/define/piston
Piston \Pis"ton\, n. [F. piston; cf. It. pistone piston, also
   pestone a large pestle; all fr. L. pinsere, pistum, to pound,
   to stamp. See {Pestle}, {Pistil}.] (Mach.)
   A sliding piece which either is moved by, or moves against,
   fluid pressure. It usually consists of a short cylinder
   fitting within a cylindrical vessel along which it moves,
   back and forth. It is used in steam engines to receive motion
   from the steam, and in pumps to transmit motion to a fluid;
   also for other purposes.
   [1913 Webster]
curl http://localhost:8002/define/piston  0.01s user 0.01s system 7% cpu 0.264 total

$ time curl http://localhost:8002/define/piston
Piston \Pis"ton\, n. [F. piston; cf. It. pistone piston, also
   pestone a large pestle; all fr. L. pinsere, pistum, to pound,
   to stamp. See {Pestle}, {Pistil}.] (Mach.)
   A sliding piece which either is moved by, or moves against,
   fluid pressure. It usually consists of a short cylinder
   fitting within a cylindrical vessel along which it moves,
   back and forth. It is used in steam engines to receive motion
   from the steam, and in pumps to transmit motion to a fluid;
   also for other purposes.
   [1913 Webster]
curl http://localhost:8002/define/piston  0.01s user 0.01s system 76% cpu 0.020 total

Because we’ve cached the results of the definition of piston, the second request is substantially faster. Overall, the code involved is fairly simple and the wins are huge. You no longer need to worry about maintaining a list of peers in some configuration file because your code will take care of it.

There are a couple of things you can do to improve upon the code we have here. Some applications may have multiple caches they want to maintain. In our contrived example, we may want a cache for different dictionaries. Making this change isn’t too hard. You’d want to make a channel yourself and then modify the wait function to use that channel when responding to changes. Then for each cache, you’d want to run wait for that cache in a goroutine.

Another way to improve upon this code might be to maintain the list of peers more rigorously. In a catastrophic or network failure, the peer might not be able to remove itself from Doozer. To solve this problem, you could have a cron job periodically poll the servers on the list and remove the servers that are not accessible.


Hosting Your Own Godoc

2013-05-20

One of the biggest reasons to use Go is its great tooling. Godoc is one of those tools, and it makes documentation so simple that you’ll want to document your code instead of dreading it. Besides having a command line interface for displaying documentation, it can also start an HTTP server. Golang.org is a public version of this for the standard packages. GoDoc does a similar thing for public packages.

I have some private repositories though that I host internally, and I’ve always wanted to be able to view their package documentation. There are at least a couple of ways to accomplish this. First, if you simply want to host it on your local machine, godoc can do that for you:

$ godoc -http=:6060

This will create an HTTP web server that listens on port 6060 and serves up the standard documentation as well as anything in your GOPATH.

What I’ve always wanted to do though is have it run on the server that’s hosting all my repositories. I finally spent some time getting this working and it turns out to be quite simple. Since the HTTP server is already built, we just need a way to get the system to recognize it as a service. I use Arch Linux, where systemd is the default service manager. If you use something else, this should translate fairly easily to other service managers.

Since systemd can manage starting and stopping services using signals, we really just need to tell systemd how to start up the godoc server. Here is the godoc.service file in full:

[Unit]
Description=Go Documentation
Wants=network.target

[Service]
Environment=GOPATH=/srv/go/ GOROOT=/usr/lib/go/
ExecStart=/usr/bin/godoc -index -http=:6060
RestartSec=30
Restart=always

[Install]
WantedBy=multi-user.target

The meat of the configuration is ExecStart. It is simply a call to godoc with the -http parameter. I added -index so that the documentation could be searched. I thought this would be useful since it’s intended to be online full-time.

One thing to note, I set the GOPATH to /srv/go. Since systemd doesn’t otherwise load environment variables, the GOPATH would not normally be set. You can change that path to your actual GOPATH. I simply used a symbolic link to my actual go repositories.

Copy your service file to a place where systemd will recognize it:

$ sudo cp godoc.service /etc/systemd/system

That’s all there is to it! I now have a fully functional, locally hosted Go documentation web site. I use it like golang.org, but it also searches and displays my private packages. I can use it like any other systemd service:

$ sudo systemctl enable godoc.service # Start at boot.
$ sudo systemctl {start,stop,restart} godoc.service

If you are using Arch Linux, I made an AUR package named godoc-systemd. You can install it like you would any other AUR package:

$ wget https://aur.archlinux.org/packages/go/godoc-systemd/PKGBUILD
$ makepkg
$ sudo pacman -U godoc-systemd-1.0-1-any.pkg.tar.xz

I can’t say enough about how impressed I am with the tooling built for Go. Godoc is only one of many tools, and it is a good example of how useful, extendable, and manageable the Go tools are.
