April 1, 2007 12:00 AM

Identifiers CAN Say Why

A while ago I wrote about the rule of thumb I use to decide when to write a comment: "Identifiers say what. Comments say why." Since then I've had the pleasure of working on a couple of projects with Ivan Moore, and he's shown me that identifiers can say "why" as well. His style is to use a few long, descriptive identifiers to explain why particularly odd or complex pieces of code exist in the system. I've started using the style as well and have found it very useful. For example, here's some code from an imposteriser in jMock 2:

    private <T> Class<?> createProxyClass(Class<T> mockedType, Class<?>... ancilliaryTypes) {
        Enhancer enhancer = new Enhancer();
        enhancer.setClassLoader(mockedType.getClassLoader());
        if (mockedType.isInterface()) {
            enhancer.setSuperclass(Object.class);
            enhancer.setInterfaces(prepend(mockedType, ancilliaryTypes));
        }
        else {
            enhancer.setSuperclass(mockedType);
            enhancer.setInterfaces(ancilliaryTypes);
        }
        enhancer.setCallbackType(InvocationHandler.class);
        enhancer.setNamingPolicy(
            NAMING_POLICY_THAT_ALLOWS_IMPOSTERISATION_OF_CLASSES_IN_SIGNED_JARS);
        enhancer.setUseFactory(true);
        
        Class<?> proxyClass = enhancer.createClass();
        return proxyClass;
    }

It's pretty clear, I think, why that naming policy is being used.
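
The same idea carries over to other languages. Here is a minimal Python sketch of it; the billing-service scenario and every name in it are invented for illustration and have nothing to do with jMock:

```python
import time

# Hypothetical scenario: an upstream billing service rejects bursts of calls.
# The constant's name records why the pause exists, so the call site needs
# no explanatory comment.
PAUSE_THAT_KEEPS_US_UNDER_THE_BILLING_SERVICE_RATE_LIMIT = 0.01  # seconds


def submit_all(invoices, submit, pause=time.sleep):
    """Submit each invoice in turn, pausing after every call."""
    for invoice in invoices:
        submit(invoice)
        pause(PAUSE_THAT_KEEPS_US_UNDER_THE_BILLING_SERVICE_RATE_LIMIT)
```

Anyone tempted to delete the pause can see at a glance what it protects against, and can remove it once that reason goes away.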

This style works well for enterprise software. Computer software, being precise and logical, is never a good fit for the messy, illogical ways that groups of people work together [1]. Elegant code will need some clunky workaround for strange organisational decisions, and it's important to clearly explain in the code why those workarounds exist. Hopefully it will allow programmers to remove the kludge after the next round of reorgs reshuffles the management again.

  1. I'll rant about workflow systems another time!
Posted on April 24, 2007 [ Permalink | Comments ]

We were doing test-driven development but forgot about the tests!

On a recent project our system had to use information maintained by another system. After discussing various ways of connecting the two systems – database views, web services, messaging – we decided that the team writing the other system would provide us with a compiled JAR that hid whatever jiggery-pokery they preferred behind a convenient API.

Up to this point, we had been mocking out their system and so already had interface definitions for the API. We even had tests that defined the behaviour we wanted from the implementation. The other team were happy to work from those tests and integrate them into their build to ensure that they didn't break the API in the future.

However, we forgot one important detail. Our end-to-end integration tests need to set up test data behind that API. The tests we handed over to the other team did not define an API for doing that. We now have no way of priming our tests with test data without being coupled to their database schema – the very thing we were trying to avoid.
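
One way to picture what was missing: alongside the read API, the tests could have specified a second, test-only API for loading data. The sketch below is in Python rather than the Java of the real project, and all of its names (AccountLookup, DataPrimer, InMemoryAccounts) are hypothetical, chosen only to show the shape of the idea:

```python
from abc import ABC, abstractmethod


class AccountLookup(ABC):
    """Hypothetical read-only API that the other team's library implements."""

    @abstractmethod
    def balance_of(self, account_id):
        ...


class DataPrimer(ABC):
    """Hypothetical companion API: loads test data without exposing the schema."""

    @abstractmethod
    def add_account(self, account_id, balance):
        ...


class InMemoryAccounts(AccountLookup, DataPrimer):
    """Stand-in implementation, present only to make the sketch runnable."""

    def __init__(self):
        self._balances = {}

    def add_account(self, account_id, balance):
        self._balances[account_id] = balance

    def balance_of(self, account_id):
        return self._balances[account_id]


def check_lookup_contract(lookup, primer):
    """A contract test that primes its own data through the primer API."""
    primer.add_account("acc-1", 250)
    assert lookup.balance_of("acc-1") == 250
```

A contract test written this way would have forced the priming API into existence along with the read API, because the test cannot run without it.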

Posted on April 10, 2007 [ Permalink | Comments ]

Scrapheap Challenge at SPA2007, part 3: Name that Tune

The second challenge in the SPA 2007 Scrapheap Challenge workshop was:
A Christmas Quiz Game. I want a quiz game to play at Christmas. The most important feature is that I don't want to have to prepare the quiz beforehand. The second most important feature is that the quiz should help avoid family arguments by keeping scores. Exactly how the game is to be played is part of the challenge. What is the quiz about? Is there a quizmaster? If not, how do players make their guesses? Do all players try to answer the same questions or do they take turns?

I have been writing a location-aware music player in my research project at Imperial College, so I had quite a large collection of music on my laptop in Ogg Vorbis format, categorised by artist, album and track, and I knew how to use the GStreamer library in Python to play that music. So I decided to write an automatic version of Name that Tune. The idea of the game would be to play ten seconds of a random song and the players would have to guess the artist and song name. This time Ivan tried the challenge as well on his laptop so we swapped ideas as we worked.

The GStreamer APIs are quite complex, dealing as they do with the asynchronous playback of generic media streams. While exploring my music collection I discovered I had a command-line program called ogg123 installed, which plays a single Ogg Vorbis file. We changed our plans and decided to run ogg123 from a command-line Python program and so avoid the difficulties of writing an event-driven application -- we were pressed for time, after all.

To play the first ten seconds we planned to run ogg123 in the background, sleep for ten seconds and then kill the background process. However, on reading the help text for ogg123 we found that it had a command-line argument to play the first n seconds. Our code became much simpler: there was no need to run the process in the background now, so we could concentrate on recording the scores and managing players.
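
For comparison, the plan we abandoned would have looked something like the sketch below. It is written for Python 3 rather than the Python 2 of the listing at the end of this article; play_for is a made-up helper name and song.ogg is a placeholder file:

```python
import subprocess
import sys
import time


def play_for(command, seconds):
    """Run command in the background, wait, then stop it if still running."""
    process = subprocess.Popen(command)
    try:
        time.sleep(seconds)
    finally:
        if process.poll() is None:  # the player has not finished by itself
            process.terminate()
        process.wait()
    return process.returncode


# The abandoned plan (needs ogg123 and a real file to run):
#   play_for(["ogg123", "--quiet", "song.ogg"], 10)
# The single blocking call that replaced it:
#   subprocess.call(["ogg123", "--quiet", "--end", "10", "song.ogg"])
```

The blocking call needs no process bookkeeping at all, which is why the whole helper could be dropped.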

While writing the program we made sure that it was always in a working state. We started with a simple program that picked a random music file and played ten seconds. We then added code to print out the artist and song name to act as a question. Next, we added a list of players, passed in on the command line, and made the program ask each player in turn. Then we made the program keep track of the scores for each player and print out the scores after each question and when the program exits. Finally we rotated the quiz-master responsibility between the players: when a player answers a question they become the quiz-master for the next player.

We only just got the program working before the deadline and had no time to clean it up. The code is at the end of this article.

At the end of the challenge there was some controversy as to whether our solution actually passed the first requirement: that I should not have to prepare the quiz data beforehand. After all, I had to rip the Ogg files from my CD collection onto my laptop's file system before I could run the quiz. If someone had done the same thing with their iTunes database we would have accepted that as a solution but because I don't store my main music collection on my laptop we conceded that our solution didn't meet the requirements. (Update: Carlos Villela has created a solution that controls iTunes on MacOS X.)

The winning pair wrote a similar program in Perl. Instead of playing music they asked questions about films. Their solution was ingenious: they screen-scraped IMDB to get the name of the film and then presented several questions about the film: what was the genre, when was it made, who were the starring actors, and so on. To verify the answers, the quizmaster switched to their web browser: the program had opened the IMDB page about the film in the browser while the quizmaster was asking the questions!

Here's our code:

#!/usr/bin/python

import sys
import os
import subprocess
import random
from itertools import *


class Track:
    def __init__(self, artist, album, track, file):
        self.artist = artist
        self.album = album
        self.track = track
        self.file = file

    def __str__(self):
        return self.__class__.__name__ + str(self.__dict__)

def all_tracks(root):
    for artist in os.listdir(root):
        artist_path = os.path.join(root, artist)
        if os.path.isdir(artist_path):
            for album in os.listdir(artist_path):
                album_path = os.path.join(artist_path, album)
                if os.path.isdir(album_path):
                    for track in os.listdir(album_path):
                        if track.endswith(".ogg"):
                            file_path = os.path.join(album_path, track)
                            yield Track(artist, album, track, file_path)


def run_turn(player, tracks, scores, time=5):
    track = random.choice(tracks)
    
    print "Can", player, "guess this track in", time, "seconds"
    print "Artist:", track.artist
    print "Album: ", track.album
    print "Track: ", track.track
    
    subprocess.call(["ogg123", "--quiet", "--end", str(time), track.file])

    answer = None
    while answer != 'y' and answer != 'n':
        sys.stdout.write("Correct? (y/n) ")
        answer = raw_input().lower()
    
    if answer == 'y':
        scores[player] = scores[player]+1
    
    print player, "has", scores[player], "points"
    
    print "Pass the computer to", player
    print player, "press Return to continue"
    raw_input()


def find_winners(scores):
    best_score = 0
    best_players = []
    
    for player, score in scores.items():
        if score > best_score:
            best_score = score
            best_players = [player]
        elif score == best_score:
            best_players.append(player)

    return best_players


def print_scores(scores):
    for player, score in scores.items():
        print player, "scored", score
    
    print ""
    winners = find_winners(scores)
    if len(winners) == 1:
        print "the winner is:", winners[0]
    else:
        print "the winners are:", ", ".join(winners)


tracks = list(all_tracks("/home/nat/music"))
players = sys.argv[1:]
scores = dict( (player,0) for player in players )

try:
    print players[-1], "asks the first question"
    print players[-1], "press Return to continue"
    raw_input()
    
    for player in cycle(players):
        run_turn(player, tracks, scores)

except KeyboardInterrupt:
    pass

print ""
print ""
print "Final scores:"
print_scores(scores)
Posted on April 5, 2007 [ Permalink | Comments ]

Scrapheap Challenge at SPA2007, part 2: The Presentation Package

Here is my solution to the Presentation Package challenge from the SPA 2007 Scrapheap Challenge workshop. The challenge was:

A Presentation Package. I want to be able to type in a list of sentences that summarise what I will talk about during each slide of the presentation. For each summary the tool should suggest pictures that illustrate the point I want to put across and let me pick one picture per slide to build a presentation. It shows that presentation full-screen.

To write a solution in 90 minutes I used as much of the infrastructure of the GNOME desktop environment as I could.

The user writes their slide summaries in a text file using the Gedit text editor. For example:

Scrapheap Challenge is a workshop about using other peoples software
We have created a scrapheap for you to use called the Internet
You have to work in pairs
You will be given three challenges
The first pair to complete the challenge wins
Then swap pairs before the next challenge
We will have a short retrospective after each challenge
And a long retrospective at the end of the workshop
Prizes will be awarded on completely arbitrary criteria

I wrote a little Python script that turned those summaries into comma-separated tags and used a Python API to the Flickr search webservice to pull down ten pictures that matched each set of tags. I chose Python because I know it well, it has a large standard library for doing internet stuff, and it lets you write terse but readable code, which is good when you want to get a lot done in a short time. I chose Flickr because it contains a lot of stunning photos, Google don't provide automatable search APIs any more and I've had problems with the Yahoo image search in the past.

The script is below. It's what I wrote on the day, in 90 minutes, while experimenting with the Flickr API, so it could be tidied up. I think it's still pretty readable, though, which is one of Python's big strengths in my opinion.

import sys
import os
from itertools import *
from urllib2 import urlopen

from flickr import photos_search

BatchSize = 10

fluff = set([
    "then", "there", "with", "have", "will"
])

def search(title):
    words = set([word.lower() for word in title.split() if len(word) > 3])
    tags = ",".join(words - fluff)
    return photos_search(tags=tags,
                         tag_mode="any",
                         sort="interestingness-desc",
                         per_page=BatchSize)


titles = [line for line in
          [line.strip() for line in open(sys.argv[1]).readlines()]
          if line != ""]

results = [(title, search(title)) for title in titles]

os.system("rm -rf slides/")
os.makedirs("slides/chosen")
for (title, photos), slide_index in izip(results, count(1)):
    print title
    slide_dir = "slides/choose/%02i - %s"%(slide_index,title)
    os.makedirs(slide_dir)
    
    for photo, photo_index in izip(photos,count(1)):
        url = photo.getURL(urlType='source')
        print "    Loading ", url
        data = urlopen(url).read()
        
        local_file = slide_dir + "/%02i.%02i - %s.jpg"%(slide_index,photo_index,title)
        
        open(local_file, "wb").write(data)

The script creates two folders, slides/choose and slides/chosen. Under slides/choose it creates a folder per slide, named after the summary of that slide.

For each summary in the user's text file the script downloads ten photos from Flickr that have any tags in common with the words in the summary, ordered by "interestingness", whatever that means. The downloaded photos are saved into the appropriate folder under slides/choose.
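
To make the tag matching concrete, here is the tag-building step from the script on its own, rewritten as a Python 3 sketch. The function name tags_for is made up for this example, and the tags are sorted only so that the output is predictable; the script above does not sort them:

```python
# Same stop-word list as the script above.
FLUFF = {"then", "there", "with", "have", "will"}


def tags_for(summary):
    """Return the comma-separated tag string for one slide summary."""
    # Keep words longer than three letters, lower-cased, minus the fluff.
    words = {word.lower() for word in summary.split() if len(word) > 3}
    return ",".join(sorted(words - FLUFF))


print(tags_for("You will be given three challenges"))
# prints: challenges,given,three
```

Because the search uses tag_mode="any", a photo needs to carry only one of those tags to be returned.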

The user then opens the slides/choose and slides/chosen folders in Nautilus, the GNOME file manager, and drags one picture per slide from the subfolders of slides/choose into slides/chosen.

To give a presentation, the user opens the slides/chosen folder in Nautilus and double-clicks on the first slide to open it in the GNOME image viewer. Hitting F11 in the image viewer shows the slides fullscreen. Hitting Space shows the next slide in the folder. The user can also navigate forwards and back with the Page-Up and Page-Down keys.

The final presentations are surprisingly good.

Posted on April 2, 2007 [ Permalink | Comments ]

Scrapheap Challenge at SPA2007, part 1.

Ivan and I ran our Scrapheap Challenge workshop again at last week's SPA conference. This time we were hoping to get the participants to invent their own challenges in an Improv-style brainstorm at the start of the workshop, a section we called "Whose Line of Code is it Anyway?". Unfortunately this attempt was a bit of a flop, possibly because everyone had to get up early on a Sunday to get to Cambridge while the train services were cancelled and so weren't in a very up-beat, brainstorming kind of mood, or more probably because neither Ivan nor I had any experience of Improv whatsoever.

Luckily we had some pre-canned challenges in reserve, so the workshop wasn't a total washout:

  1. A Presentation Package. I want to be able to type in a list of sentences that summarise what I will talk about during each slide of the presentation. For each summary the tool should suggest pictures that illustrate the point I want to put across and let me pick one picture per slide to build a presentation. It shows that presentation full-screen.
  2. A Christmas Quiz Game. I want a quiz game to play at Christmas. The most important feature is that I don't want to have to prepare the quiz beforehand. The second most important feature is that the quiz should help avoid family arguments by keeping scores. Exactly how the game is to be played is part of the challenge. What is the quiz about? Is there a quizmaster? If not, how do players make their guesses? Do all players try to answer the same questions or do they take turns?
  3. Real-time Sloppographer. A tool for developers that shows them if they are making the code better or worse as they work. We ran over time a bit on tea-breaks before this challenge so the participants only had half an hour to produce a solution instead of an hour and a half.

This time the challenges were more open-ended and the applications were more interactive than in the workshop we ran at PoMoPro. Dynamic languages that played well with other software and webservices won out by a slight margin – Perl being the most successful – but Unix pipes-and-filters were not very useful.

Here is what the participants found helped their efforts:

  • Break the problem into parts
  • "Luck" in finding components
  • Focus on the minimal functionality and grow the system from there.
    • Incremental, small steps
    • Always keep the system working
  • Don't be fixated on the technology
  • Start with example code and change to fit your needs
  • Find components that do more of the problem: are a better fit
  • Divide the research task (e.g. finding useful components or services) between the pair
  • Keep it Simple, Stupid!
  • Code examples that are easy to find

Here is what the participants found hindered their efforts:

  • Version incompatibilities
  • Environment dependencies
  • Difficult to find what you're looking for
  • Descriptions of components or services not good for getting a quick idea of what the API does
  • Confusing language (gobbledegook) used to describe the API.
  • Leaving integration to later
  • Bad example code
  • Closed formats

Normally we work through the challenges before running the workshop to make sure they are achievable in the time available. However, we hadn't done so this time because we were hoping that we wouldn't have to use them. On the plus side, that meant we were able to participate in the challenges during the workshop. Ivan has already written up our solution to the Real-Time Sloppographer. I will write up our solutions to the Presentation Package and Quiz Game in later articles.

Update: I have published our solution to the Presentation Package challenge.

Update: I have published our solution to the Quiz Game challenge.

Posted on April 2, 2007 [ Permalink | Comments ]