Test-Driven Design

Growing Object-Oriented Software, Guided by Tests

Steve and I have been working on a book for the last few months entitled Growing Object-Oriented Software, Guided by Tests. We're now at the stage where we can put content online to garner feedback. We'll post a chapter every week or so. Eventually, the book will appear as a Rough Cut on Safari.

If you'd like to read and comment, please join the Yahoo group we've set up for discussion about the content. We'd love to hear what you think.

Posted on July 25, 2008

Allowing tests to drive development

I have noticed that teams who are new to TDD reach for ever more sophisticated tools to help them test their code, rather than refactoring their code to make it more testable. By doing so, they lose one of the major benefits of TDD: the feedback it gives on internal design quality.

Code is difficult to test when classes are tightly coupled, are entangled by hidden dependencies like static methods or singletons, or don't clearly distinguish between internal implementation details and peer objects (dependencies, policies or notifications). To unit-test classes like this, it can be tempting to use tools that use clever class loader tricks and bytecode manipulation [1, 2] to break the tight dependencies. But by using tools like this you lose one of the major benefits of unit-testing in the TDD process: rapid feedback on the internal quality of the software.

TDD is not merely about testing — that is, about verifying the external quality of the software: whether it is reliable enough, accurate enough, fast enough, etc. It also gives you feedback about its internal quality: how well designed the code is, such as how loosely coupled and cohesive the classes are, whether dependencies are explicit or hidden, whether information hiding is being used appropriately. And that translates into feedback about how maintainable the code is: how easy it is to extend, change or correct its behaviour over its lifetime.

If you use unit-testing tools that let you side-step poor dependency management in the design, you lose this valuable source of feedback. And when you do eventually need to address these design issues, because you have to modify the production code, it will be much harder to do so. The poor structure will have influenced the design of other parts of the system that rely upon it. The programmers responsible for the change will not understand the code as well as those who wrote it (even if they are the same people). It's far easier to nip these design issues in the bud as you discover them than to let them remain and affect the design of the rest of the code.

I therefore use a simple rule of thumb when choosing technologies to help me write unit tests:

Break dependencies in unit tests only with techniques that would be acceptable in the production code.

Would I use classloader magic and bytecode manipulation to set dependencies in production code? No. Would I use reflection to modify private fields in production code? No. Would I refactor to introduce an interface between objects? Yes. Would I refactor an object to get a dependency through its constructor instead of from a global variable? Yes.
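The last two answers describe ordinary refactorings. As a minimal sketch (all names here are hypothetical, not taken from any project mentioned in this post), a class that used to read the current time from a global source can instead receive a `TimeSource` through its constructor, so a test can substitute a fixed time using nothing but plain Java.

```java
import java.time.Instant;

// Hypothetical interface introduced between the object and its dependency.
interface TimeSource {
    Instant now();
}

// Production implementation: delegates to the system clock.
class SystemTimeSource implements TimeSource {
    public Instant now() {
        return Instant.now();
    }
}

// Before the refactoring, this class might have called Instant.now() directly,
// a hidden global dependency. Now the dependency is explicit in the constructor.
class OverdueChecker {
    private final TimeSource timeSource;

    OverdueChecker(TimeSource timeSource) {
        this.timeSource = timeSource;
    }

    boolean isOverdue(Instant dueDate) {
        return timeSource.now().isAfter(dueDate);
    }
}

// Test double: passed in exactly as production code would pass its own
// implementation; no class loader tricks or reflection are needed.
class FixedTimeSource implements TimeSource {
    private final Instant fixed;

    FixedTimeSource(Instant fixed) {
        this.fixed = fixed;
    }

    public Instant now() {
        return fixed;
    }
}
```

A test can then write `new OverdueChecker(new FixedTimeSource(Instant.parse("2008-01-06T12:00:00Z")))` and check the result against known due dates.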

Following this rule of thumb ensures that code is always easy to change. There are no nasty surprises late in the project when new functionality requires massive changes to the design and developers are pressured to hack the change in because the estimated costs of doing it the right way are too high.

Posted on January 6, 2008

Tricks with Test Data Builders: Refactoring Away Duplicated Logic Creates a Domain Specific Embedded Language for Testing

Test Data Builders remove a lot of duplication from test code, but there can often still be duplicated logic at the point at which the built objects are used. Many different tests will have very similar code that creates an object using a builder and then passes it to the code under test. We can address this duplication by factoring out test scaffolding that works with builders, not system objects. Doing so produces a higher level testing API that more clearly communicates the intent of the test and hides away unimportant details of how the system is being tested.

For example, consider a system to process orders. Orders are sent into our system and processed asynchronously. To perform an end-to-end system test, the test must create an order, send the order to our system and track the processing of the order by waiting for correlated events to appear on the system's monitoring topic and driving the client through its user interface. That would look something like the following (where the requestSender and progressMonitor do lots of behind-the-scenes magic with JMS connections, sessions, message producers and consumers, message properties and correlation IDs).

@Test public void reportsTotalSalesOfOrderedProducts() {
    Order order1 = anOrder()
        .withLine("Deerstalker Hat", 1)
        .withLine("Tweed Cape", 1)
        .withCustomersReference(1234)
        .build();

    requestSender.send(order1);
    progressMonitor.waitForConfirmation(order1);
    progressMonitor.waitForCompletion(order1);

    Order order2 = anOrder()
        .withLine("Deerstalker Hat", 1)
        .withCustomersReference(5678)
        .build();

    requestSender.send(order2);
    progressMonitor.waitForConfirmation(order2);
    progressMonitor.waitForCompletion(order2);

    TotalSalesReport report = gui.openSalesReport();
    report.displaysTotalSalesFor("Deerstalker Hat", equalTo(2));
    report.displaysTotalSalesFor("Tweed Cape", equalTo(1));
}

It is tempting to pull this duplication into a "helper" method that builds and uses an object. For example:

@Test public void reportsTotalSalesOfOrderedProducts() {
    submitOrderFor("Deerstalker Hat", "Tweed Cape");
    submitOrderFor("Deerstalker Hat");

    TotalSalesReport report = gui.openSalesReport();
    report.displaysTotalSalesFor("Deerstalker Hat", equalTo(2));
    report.displaysTotalSalesFor("Tweed Cape", equalTo(1));
}

void submitOrderFor(String ... products) {
    OrderBuilder orderBuilder = anOrder()
        .withCustomersReference(customersReference++);

    for (String product : products) {
        orderBuilder = orderBuilder.withLine(product, 1);
    }

    Order order = orderBuilder.build();
    
    requestSender.send(order);
    progressMonitor.waitForConfirmation(order);
    progressMonitor.waitForCompletion(order);
}

private int customersReference = 1;

However, this refactoring leaves us with the same difficulties that we encountered with the Object Mother when we have to vary data in different tests, difficulties that we avoided by using builders to create our test data. We will need to submit orders with different properties and submit different kinds of events (orders, order amendments, order cancellations, etc.), and so the helper methods multiply:

void submitOrderFor(String ... products) { ... }
void submitOrderFor(String product, int count) { ... }
void submitOrderFor(String product, int count, String otherProduct, int otherCount) { ... }
void submitOrderFor(String product, double discount) { ... }
void submitOrderFor(String product, String giftVoucherCode) { ... }
... etc ...

Instead, we can pass an order builder to the method that sends an order into the system, just as we do when combining builders. That method can add properties through the builder before building the order and sending it into the system.

@Test public void reportsTotalSalesOfOrderedProducts() {
    sendAndProcess(anOrder()
        .withLine("Deerstalker Hat", 1)
        .withLine("Tweed Cape", 1));
    sendAndProcess(anOrder()
        .withLine("Deerstalker Hat", 1));
    
    TotalSalesReport report = gui.openSalesReport();
    report.displaysTotalSalesFor("Deerstalker Hat", equalTo(2));
    report.displaysTotalSalesFor("Tweed Cape", equalTo(1));
}

void sendAndProcess(OrderBuilder orderDetails) {
    Order order = orderDetails
        .withDefaultCustomersReference(customersReference++)
        .build();
    
    requestSender.send(order);
    progressMonitor.waitForConfirmation(order);
    progressMonitor.waitForCompletion(order);
}

private int customersReference = 1;
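The post doesn't show how `withDefaultCustomersReference` works. One plausible reading, sketched here as an assumption, is that it sets the reference only if the test hasn't already set one explicitly, so a test can pin a specific reference (as the amendment test later in this post does) while every other order gets a fresh value from the counter. The `Order` and `OrderBuilder` classes below are cut-down, hypothetical stand-ins for the real ones.

```java
import java.util.ArrayList;
import java.util.List;

// Cut-down stand-in for the system's Order class.
class Order {
    final Integer customersReference;
    final List<String> lines;

    Order(Integer customersReference, List<String> lines) {
        this.customersReference = customersReference;
        this.lines = lines;
    }
}

class OrderBuilder {
    private Integer customersReference = null; // null means "not set by the test"
    private final List<String> lines = new ArrayList<>();

    static OrderBuilder anOrder() {
        return new OrderBuilder();
    }

    OrderBuilder withLine(String product, int quantity) {
        lines.add(quantity + " x " + product);
        return this;
    }

    // An explicit value always wins.
    OrderBuilder withCustomersReference(int reference) {
        this.customersReference = reference;
        return this;
    }

    // A default value is applied only if no explicit reference has been given.
    OrderBuilder withDefaultCustomersReference(int reference) {
        if (customersReference == null) {
            customersReference = reference;
        }
        return this;
    }

    Order build() {
        return new Order(customersReference, new ArrayList<>(lines));
    }
}
```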

Finally, a bit of judicious renaming can change the language of the test so that it communicates more about what behaviour is being tested than how the system implements that behaviour.

@Test public void reportsTotalSalesOfOrderedProducts() {
    havingReceived(anOrder()
        .withLine("Deerstalker Hat", 1)
        .withLine("Tweed Cape", 1));
    havingReceived(anOrder()
        .withLine("Deerstalker Hat", 1));
    
    TotalSalesReport report = gui.openSalesReport();
    report.displaysTotalSalesFor("Deerstalker Hat", equalTo(2));
    report.displaysTotalSalesFor("Tweed Cape", equalTo(1));
}

@Test public void takesAmendmentsIntoAccountWhenCalculatingTotalSales() {
    Customer theCustomer = aCustomer().build();

    havingReceived(anOrder().from(theCustomer)
        .withCustomersReference(10)
        .withLine("Deerstalker Hat", 1)
        .withLine("Tweed Cape", 1));

    havingReceived(anOrderAmendment().from(theCustomer)
        .withCustomersReference(10)
        .withLine("Deerstalker Hat", 2));

    TotalSalesReport report = gui.openSalesReport();
    report.displaysTotalSalesFor("Deerstalker Hat", equalTo(2));
    report.displaysTotalSalesFor("Tweed Cape", equalTo(1));
}

Test Data Builders are a foundation upon which we can define higher-level testing APIs that better communicate the intent of our tests, in a language that is closer to that used by non-technical project stakeholders, and so greatly help communication within the project.

Update: Thanks to David Peterson and Michael Hunger for helpful feedback. I've fixed typos in the test code and improved the test names. Hopefully the code is easier to follow now.

Posted on December 21, 2007

Tricks with Test Data Builders: Emphasise the Domain Model with Factory Methods

Tests that use Test Data Builders can be made less noisy by combining builders. This still leaves some noise in the test: the test code overly emphasises how the tests are building objects at the expense of what they are building. A future reader of the test will be far more interested in what objects are being used than in the way that those objects are constructed.

We can de-emphasise the builders further by instantiating them in clearly named factory methods:

Order order = 
    anOrder().fromCustomer(
          aCustomer().withAddress(
              anAddress().withNoPostcode())).build();
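A minimal sketch of such factory methods, assuming they are static methods gathered in one class. The class name `Builders` and the cut-down domain classes are hypothetical. Each factory method just instantiates a builder, and the builders accept other builders, so the test never calls `build()` on the inner objects itself.

```java
// Cut-down, immutable domain classes (stand-ins for the real ones).
class Address {
    final String postcode; // null when there is no postcode
    Address(String postcode) { this.postcode = postcode; }
}

class Customer {
    final Address address;
    Customer(Address address) { this.address = address; }
}

class Order {
    final Customer customer;
    Order(Customer customer) { this.customer = customer; }
}

class AddressBuilder {
    private String postcode = "NW1 3RX"; // safe default
    AddressBuilder withNoPostcode() { postcode = null; return this; }
    Address build() { return new Address(postcode); }
}

class CustomerBuilder {
    private AddressBuilder address = new AddressBuilder();
    CustomerBuilder withAddress(AddressBuilder address) { this.address = address; return this; }
    Customer build() { return new Customer(address.build()); }
}

class OrderBuilder {
    private CustomerBuilder customer = new CustomerBuilder();
    OrderBuilder fromCustomer(CustomerBuilder customer) { this.customer = customer; return this; }
    Order build() { return new Order(customer.build()); }
}

// Clearly named factory methods that hide the "new XxxBuilder()" noise.
// Tests would typically bring them into scope with a static import.
final class Builders {
    private Builders() {}
    static OrderBuilder anOrder() { return new OrderBuilder(); }
    static CustomerBuilder aCustomer() { return new CustomerBuilder(); }
    static AddressBuilder anAddress() { return new AddressBuilder(); }
}
```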

When we do this, the naming convention we've used for builder methods up to now gets in the way instead of making things clearer. The builder code looks better if we rename the methods to reflect the relationship between objects only, and not include the type of object at the far end of the relationship:

Order order = 
    anOrder().from(aCustomer().with(anAddress().withNoPostcode())).build();

This relies on Java's method overloading and so only works for properties that have unique, user-defined types. Longer method names are necessary for primitive types, or if the built object has different relationships with the same type of object. For example, most of the fields of an Address are Strings, and so the builder methods must be explicitly named after the field. However, the post code is strongly typed and so can be passed to an overloaded method:

Address aLongerAddress = anAddress()
    .withStreet("222b Baker Street")
    .withCity("London")
    .with(postCode("NW1", "3RX"))
    .build();
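A sketch of how this might look. `PostCode`, the static `postCode` factory method and this cut-down `AddressBuilder` are assumptions for illustration, not code from the post. The String-valued fields get explicitly named methods, while the strongly typed postcode goes through a method simply called `with`. Only one `with` is shown; each further user-defined type would get its own `with` overload, and the compiler would pick the right one from the argument's static type.

```java
// Strongly typed value, so it can be passed to a with(...) method unambiguously.
final class PostCode {
    final String outward;
    final String inward;

    PostCode(String outward, String inward) {
        this.outward = outward;
        this.inward = inward;
    }

    @Override
    public String toString() {
        return outward + " " + inward;
    }
}

final class Address {
    final String street;
    final String city;
    final PostCode postCode;

    Address(String street, String city, PostCode postCode) {
        this.street = street;
        this.city = city;
        this.postCode = postCode;
    }
}

class AddressBuilder {
    private String street = "1 High Street"; // safe, arbitrary defaults
    private String city = "London";
    private PostCode postCode = new PostCode("NW1", "3RX");

    static AddressBuilder anAddress() { return new AddressBuilder(); }

    // Factory method for the value type; reads well inside with(...).
    static PostCode postCode(String outward, String inward) {
        return new PostCode(outward, inward);
    }

    // Both fields are Strings, so overloading can't tell them apart: use explicit names.
    AddressBuilder withStreet(String street) { this.street = street; return this; }
    AddressBuilder withCity(String city) { this.city = city; return this; }

    // Unique, user-defined type: a plain with(...) is unambiguous.
    AddressBuilder with(PostCode postCode) { this.postCode = postCode; return this; }

    Address build() { return new Address(street, city, postCode); }
}
```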

Posted on December 16, 2007

Tricks with Test Data Builders: Defining Common State

Using separate Test Data Builders to construct objects with common state leads to duplication and can make the test code harder to read and maintain. For example:

Invoice invoiceWith10PercentDiscount = new InvoiceBuilder()
    .withLine("Deerstalker Hat", new PoundsShillingsPence(0, 3, 10))
    .withLine("Tweed Cape", new PoundsShillingsPence(0, 4, 12))
    .withDiscount(0.10)
    .build();

Invoice invoiceWith25PercentDiscount = new InvoiceBuilder()
    .withLine("Deerstalker Hat", new PoundsShillingsPence(0, 3, 10))
    .withLine("Tweed Cape", new PoundsShillingsPence(0, 4, 12))
    .withDiscount(0.25)
    .build();

Instead, you can initialise a single builder with the common state and then repeatedly call its build method after defining the values that apply only to each built object:

InvoiceBuilder products = new InvoiceBuilder()
    .withLine("Deerstalker Hat", new PoundsShillingsPence(0, 3, 10))
    .withLine("Tweed Cape", new PoundsShillingsPence(0, 4, 12));

Invoice invoiceWith10PercentDiscount = products
    .withDiscount(0.10)
    .build();

Invoice invoiceWith25PercentDiscount = products
    .withDiscount(0.25)
    .build();

This can make tests much easier to read because there is less code and you can give the builder a descriptive name.

However, you have to be careful if the built objects need different fields to be initialised. Because the withXXX methods change the state of the shared builder, objects built later will be created with the same state as those created earlier unless it is explicitly overridden. For example, in the following code, the second invoice has both a discount and a gift voucher, which is not what the code appears to communicate at first glance.

InvoiceBuilder products = new InvoiceBuilder()
    .withLine("Deerstalker Hat", new PoundsShillingsPence(0, 3, 10))
    .withLine("Tweed Cape", new PoundsShillingsPence(0, 4, 12));

Invoice invoiceWithDiscount = products
    .withDiscount(0.10)
    .build();

Invoice invoiceWithGiftVoucher = products
    .withGiftVoucher("12345")
    .build();

A solution is to add a method or copy constructor to the builder that copies state from another builder:

InvoiceBuilder products = new InvoiceBuilder()
    .withLine("Deerstalker Hat", new PoundsShillingsPence(0, 3, 10))
    .withLine("Tweed Cape", new PoundsShillingsPence(0, 4, 12));

Invoice invoiceWithDiscount = new InvoiceBuilder(products)
    .withDiscount(0.10)
    .build();

Invoice invoiceWithGiftVoucher = new InvoiceBuilder(products)
    .withGiftVoucher("12345")
    .build();

Alternatively, you could add a factory method to the builder that returns a new builder with a copy of the builder's state:

InvoiceBuilder products = new InvoiceBuilder()
    .withLine("Deerstalker Hat", new PoundsShillingsPence(0, 3, 10))
    .withLine("Tweed Cape", new PoundsShillingsPence(0, 4, 12));

Invoice invoiceWithDiscount = products.but().withDiscount(0.10)
    .build();

Invoice invoiceWithGiftVoucher = products.but().withGiftVoucher("12345")
    .build();
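Both options can be sketched with a cut-down `InvoiceBuilder`. The simplifications are assumptions made to keep the example short: lines are plain strings, the discount is a `double` and `Invoice` exposes its fields directly. `but()` just delegates to the copy constructor, and the copy takes its own copy of the line list so that the two builders never share mutable state.

```java
import java.util.ArrayList;
import java.util.List;

final class Invoice {
    final List<String> lines;
    final double discount;
    final String giftVoucher; // null when there is no voucher

    Invoice(List<String> lines, double discount, String giftVoucher) {
        this.lines = lines;
        this.discount = discount;
        this.giftVoucher = giftVoucher;
    }
}

class InvoiceBuilder {
    private final List<String> lines;
    private double discount = 0.0;
    private String giftVoucher = null;

    InvoiceBuilder() {
        this.lines = new ArrayList<>();
    }

    // Copy constructor: starts from another builder's state. The list is copied
    // so that later withLine calls on either builder don't affect the other.
    InvoiceBuilder(InvoiceBuilder other) {
        this.lines = new ArrayList<>(other.lines);
        this.discount = other.discount;
        this.giftVoucher = other.giftVoucher;
    }

    // Factory-method alternative: a new builder holding a copy of this one's state.
    InvoiceBuilder but() {
        return new InvoiceBuilder(this);
    }

    InvoiceBuilder withLine(String product) {
        lines.add(product);
        return this;
    }

    InvoiceBuilder withDiscount(double discount) {
        this.discount = discount;
        return this;
    }

    InvoiceBuilder withGiftVoucher(String code) {
        this.giftVoucher = code;
        return this;
    }

    Invoice build() {
        return new Invoice(new ArrayList<>(lines), discount, giftVoucher);
    }
}
```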

The safest option is to make every withXXX method return an entirely new copy of the builder instead of modifying the builder and returning this.
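A minimal sketch of that option, using a hypothetical, cut-down `Invoice` class: every field of the builder is final and each `withXXX` method returns a new builder, so a shared builder such as `products` cannot be changed by later calls. The trade-off is a little more code in each method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Invoice {
    final List<String> lines;
    final double discount;
    final String giftVoucher; // null when there is no voucher

    Invoice(List<String> lines, double discount, String giftVoucher) {
        this.lines = lines;
        this.discount = discount;
        this.giftVoucher = giftVoucher;
    }
}

// Immutable builder: no method ever changes an existing instance.
final class InvoiceBuilder {
    private final List<String> lines; // always an unmodifiable list
    private final double discount;
    private final String giftVoucher;

    InvoiceBuilder() {
        this(Collections.<String>emptyList(), 0.0, null);
    }

    private InvoiceBuilder(List<String> lines, double discount, String giftVoucher) {
        this.lines = lines;
        this.discount = discount;
        this.giftVoucher = giftVoucher;
    }

    InvoiceBuilder withLine(String product) {
        List<String> newLines = new ArrayList<>(lines);
        newLines.add(product);
        return new InvoiceBuilder(Collections.unmodifiableList(newLines), discount, giftVoucher);
    }

    InvoiceBuilder withDiscount(double newDiscount) {
        return new InvoiceBuilder(lines, newDiscount, giftVoucher);
    }

    InvoiceBuilder withGiftVoucher(String code) {
        return new InvoiceBuilder(lines, discount, code);
    }

    Invoice build() {
        return new Invoice(lines, discount, giftVoucher);
    }
}
```

With this builder, the "discount and gift voucher" example above behaves as it reads: `products.withDiscount(0.10)` and `products.withGiftVoucher("12345")` start from the same untouched state.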

Posted on December 12, 2007

Test Data Builders: an alternative to the Object Mother pattern

If you are strict about your use of constructors and immutable value objects, constructing objects in a valid state can be a bit of a chore.

Usually in application code, such objects are constructed in few places and all the information required by the constructor is at hand, having been provided by user input, obtained from a database query or received in a message for example. In tests, on the other hand, you have to provide all those constructor arguments every time you want to create an object, whether to test its behaviour or to create a value to use as input to the code being tested.

Invoice invoice = new Invoice(
    new Recipient("Sherlock Holmes",
        new Address("222b Baker Street", 
                    "London", 
                    new PostCode("NW1", "3RX"))),
    new InvoiceLines(
        new InvoiceLine("Deerstalker Hat", 
            new PoundsShillingsPence(0, 3, 10)),
        new InvoiceLine("Tweed Cape", 
            new PoundsShillingsPence(0, 4, 12))));

The code to create all those objects makes tests messy and hard to read and fills the tests with lots of unnecessary information that has nothing to do with the behaviour being tested. It also makes tests brittle: changes to the constructor arguments or the structure of the objects will break many tests.

The Object Mother pattern is one attempt to avoid this problem. An Object Mother is a class that contains a number of (usually static) Factory Methods that create objects for use in tests. For example, we could create an Object Mother for invoices we want to use in tests:

Invoice invoice = TestInvoices.newDeerstalkerAndCapeInvoice();

An Object Mother helps keep tests readable by moving the code that creates new objects out of the tests themselves and giving clear names to the objects being constructed. It also helps maintain the test data by gathering the code that creates new objects together into the Object Mother class and allowing it to be reused between tests.

However, the Object Mother pattern does not cope at all well with variation in the test data. Every time programmers need some slightly different test data they add another factory method to the Object Mother.

Invoice invoice1 = TestInvoices.newDeerstalkerAndCapeAndSwordstickInvoice();
Invoice invoice2 = TestInvoices.newDeerstalkerAndBootsInvoice();
...

Over time, the Object Mother becomes bloated, messy and hard to maintain. Either programmers add new factory methods without refactoring, in which case the Object Mother becomes full of duplicated code, or programmers refactor diligently, in which case the Object Mother becomes full of many, many fine-grained methods that each contain little more than a single new statement.

A solution is to use the Builder Pattern. For each class you want to use in a test, create a Builder for that class that:

  1. Has an instance variable for each constructor parameter
  2. Initialises its instance variables to commonly used or safe values
  3. Has a `build` method that creates a new object using the values in its instance variables
  4. Has "chainable" public methods for overriding the values in its instance variables.

For example, a builder of Invoice objects might look like:

public class InvoiceBuilder {
    Recipient recipient = new RecipientBuilder().build();
    InvoiceLines lines = new InvoiceLines(new InvoiceLineBuilder().build());
    PoundsShillingsPence discount = PoundsShillingsPence.ZERO;

    public InvoiceBuilder withRecipient(Recipient recipient) {
        this.recipient = recipient;
        return this;
    }

    public InvoiceBuilder withInvoiceLines(InvoiceLines lines) {
        this.lines = lines;
        return this;
    }

    public InvoiceBuilder withDiscount(PoundsShillingsPence discount) {
        this.discount = discount;
        return this;
    }

    public Invoice build() {
        return new Invoice(recipient, lines, discount);
    }
}

Tests that don't care about the precise values in an Invoice can create one in a single line:

Invoice anInvoice = new InvoiceBuilder().build();

Tests that want to use specific values can define them inline without filling the test with unimportant details:

Invoice invoiceWithNoPostcode = new InvoiceBuilder()
    .withRecipient(new RecipientBuilder()
        .withAddress(new AddressBuilder()
            .withNoPostcode()
            .build())
        .build())
    .build();

I've used Builders for creating test data on a couple of projects now and I've found that, compared to Object Mothers, they make it much easier to create test data in-line in the test code without making tests brittle or creating lots of duplication. Tests are isolated from those aspects of the objects' structure that have no bearing on the test. For example, code that creates the invoice with no postcode needs to know that an invoice has a recipient, that has an address, that has a postcode, but has no further dependencies on the structure of invoices, recipients and addresses. You can add constructor arguments without breaking tests at all. Removing constructor arguments is easy as well with modern refactoring IDEs.

Another benefit is that the test code is easier to write and read because the parameters are clearly identified. Compare:

TestAddresses.newAddress(
    "Sherlock Holmes", 
    "222b Baker Street", 
    "London", 
    "NW1");

to:

new AddressBuilder()
    .withName("Sherlock Holmes")
    .withStreet("222b Baker Street")
    .withCity("London")
    .withPostCode("NW1", "3RX")
    .build();

Nothing in the first example will tell you that "London" has been accidentally passed as the second street line instead of the city name.

In some cases, Builders have so improved the code that they ended up being used in the production code as well.

Further techniques for using Test Data Builders are described in the "Tricks with Test Data Builders" posts above.

Credits: builder picture from the Keep Scotland Beautiful campaign.

Update: thanks to Richard Hansen for pointing out a typo in the builder code, which is now fixed.

Posted on August 27, 2007

Lessons Learned Using FIT

I've used the FIT framework on a few recent projects with both positive and negative results. Here are some rough notes to my future self about what worked, what didn't and what to beware of. Here "specification" means the FIT HTML document and "test" means the act of interpreting the HTML with fixture classes in a test-run.

Know your Audience

FIT's reflectopornographic architecture introduces a great deal of overhead when writing and maintaining tests and diagnosing test failures. That overhead might be worth it if FIT helps you communicate with users and customers. However, if the FIT specifications are not read by anyone outside the development team (and I include business analysts in the development team) then the overhead is just not worth it. Instead, choose a tool that works smoothly with your IDE, such as JUnit, and cooperate closely with non-technical team members when writing end-to-end JUnit tests. JUnit extensions like LiFT might help with this communication. I've found it pretty easy to refactor JUnit tests into a DSL style like LiFT's for applications that LiFT itself does not support.

Nobody in the Team "Owns" the Specifications

Don't let any one role in the team have sole responsibility for the specifications. They exist to help customers, BAs, testers and developers collaborate. If a developer finds the spec hard to read when they come to write fixtures or implement the specified functionality, the team should work together to make the spec more comprehensible. If BAs or testers think that more scenarios should be tested, the team should work together to make the spec more comprehensive. If the developers need to change the spec to use their library of common fixtures, the team should work together to refactor the specifications. If the team can't collaborate over the specifications, the project has problems that can't be solved by a testing framework.

Focus on the Differences between Examples

Write the specifications to focus on one aspect of system behaviour. Write multiple specifications that specify different slices of functionality. Separate happy and sad paths into different specifications. Don't include information in the specification that is also included in other specifications. Don't write specifications that contain a single enormous table that covers all possible combinations: it is very difficult to understand what is specified, which makes it much harder to diagnose test failures.

Intersperse Explanatory Text

Intersperse explanatory text among the tables to explain what each table means. Don't write a paragraph or two of overview at the top of the document and then follow it with table after table of test data.

Design for Failure

When writing a specification or its fixtures, watch the test fail for different reasons and make sure the fixture generates good diagnostics.

Write Custom Fixtures that Support Your Application

The default FIT fixtures have confusing names. Despite using FIT for quite some time I never could remember the difference between a RowFixture and a ColumnFixture. All tables have rows and columns! This is quite ironic when one considers that FIT was designed to aid communication.

In practice I found it easiest to extend DoFixture to write document-level fixtures and then use custom fixtures for each table in the document. When writing custom fixtures it was easier to extend the Fixture base class than to extend one of the generic, reusable fixture classes or to wrap a generic fixture class around a "system under test" object.

Reuse Fixtures Between Many Specifications

Each fixture type introduces maintenance overhead, because fixture code is difficult to refactor, so try to share fixture types between specifications.

If specifications concentrate on the differences between examples then it's quite easy to write a fixture that initialises properties to defaults, one or two of which are overridden from the HTML. That fixture can then be used to test many different specifications.

For example, if I wrote a fixture to enter an order into an order-processing system I'd default most of the order details in the fixture class but let the HTML override any of those defaults. I could then use the fixture to test specifications for happy path order processing (overriding different names or products), when the customer has not given their postcode, when the order contains multiple lines for the same product, or for products with different delivery schedules, etc. etc.
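The defaults-with-overrides idea does not depend on FIT's API, so here is a framework-free sketch of it rather than a real fixture class. The `OrderRowDefaults` name, the column names and the default values are all hypothetical; in a real fixture the overrides would come from the cells of a row in the HTML table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Every order property has a default; a row from a specification
// overrides only the columns it mentions.
class OrderRowDefaults {
    private static final Map<String, String> DEFAULTS = new LinkedHashMap<>();

    static {
        DEFAULTS.put("customer", "Sherlock Holmes");
        DEFAULTS.put("product", "Deerstalker Hat");
        DEFAULTS.put("quantity", "1");
        DEFAULTS.put("postcode", "NW1 3RX");
    }

    // Returns the complete set of order properties for one row.
    static Map<String, String> orderFor(Map<String, String> overrides) {
        Map<String, String> order = new LinkedHashMap<>(DEFAULTS);
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            if (!DEFAULTS.containsKey(entry.getKey())) {
                // Fail with a clear diagnostic if a specification uses an unknown column.
                throw new IllegalArgumentException("unknown column: " + entry.getKey());
            }
            order.put(entry.getKey(), entry.getValue());
        }
        return order;
    }
}
```

A specification for "the customer has not given their postcode" would then override just the `postcode` column with an empty value and leave everything else at its default.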

Refactor Fixture Classes Very Aggressively

Because FIT uses a lot of reflection and does not include the reflected method names in the documents themselves, refactoring FIT code is very time consuming. However, if you don't refactor you end up with shed loads of fixture code that becomes an increasing maintenance burden. In situations like this, little and often is the best approach. The refactoring will be a continual drag on velocity but it's better than putting the work off until the refactoring is too much work to even contemplate.

Refactor the Specifications

To build fixtures that can be used to test multiple specifications you will need to find and emphasise the commonality between the specifications themselves. You will discover this commonality as you develop the system and its FIT tests, so specifications will not start out using common table structures. Therefore you will have to refactor the specifications themselves to redesign their tables to work with the reusable fixtures you write.

This is a good thing: it will make the system description more consistent and easier to read.

Don't Drive Business Objects Directly

FIT is used to communicate the capabilities of the system. When a customer sees a green FIT test, they understand it to mean that the system performs as specified, not that some objects, when poked in the right order, perform as specified. If you use FIT to drive business objects only, the actual end-to-end behaviour of the system can get forgotten. On one project, requirements had been signed off as complete because of green FIT tests but I found that the main function of the program contained nothing but a "TODO: finish this" comment!

The only reason to drive business objects directly from FIT tests is as an optimisation, if there are too many combinations to reasonably test in slow-running end-to-end tests. But make sure that there is at least one end-to-end test that verifies that the specified behaviour hangs together system-wide.

Update: In the comments, Rex Madden wrote:

I agree with everything but the last point. We used to test everything end-to-end with FIT, but found it to be less useful than using it to test groups of business objects. We now use them to test some part of the domain in isolation. We tend to call them "business rule" tests. We still always have some sort of test (usually in Selenium) for testing the system end to end. With FIT, we can focus on the domain driven design stuff without worrying about the database or gui.

I asked:

What advantage does FIT give over JUnit for testing objects? Do you show the FIT reports to customers or users?

Personally I prefer JUnit for driving domain-driven design because I get rapid feedback on whether the object model reflects the domain while FIT hides the object model altogether.

Rex replied to me by email:

With FIT, the customer can write the tests. It helps them be more exhaustive in the examples, as they can play with scenarios. And they can do it well before we even think about the object model. It may be a few days before we get that section of the code, but if we have the FIT specifications beforehand, we can better make decisions on how to implement. The customer can also refer to them later.

Also, domain driven design is a lot more than just the object model. It's about getting the language of the customer into the code base somehow. Having the customer write the FIT tests helps drive the ubiquitous language by introducing key words into the comments and tables.

That certainly coincides with my experience of FIT. In my experience, customers and end-users find it incredibly easy to write examples. The FIT way of writing specifications through concrete examples is very intuitive. I once showed some preliminary FIT tests to a customer and they grabbed a pencil and paper and started sketching out tables of what they wanted to see the software do, without any explanation of how FIT worked. That really sold me on the value of FIT.

However, if customers don't write the specifications, I don't think the high maintenance overhead of FIT is worthwhile.

I also wholeheartedly agree with the idea that "domain driven design is a lot more than just the object model". Developing a ubiquitous language (or System of Names) is very important. On my current project we're trying to encourage the development of a ubiquitous language by reflectively mapping object field names and values into the GUI, thereby forcing the code to use the same language as the user interface and the users. More on that another time...

Posted on July 23, 2007

Refactoring Tests

An attitude that can be very detrimental to a test-driven development process is to consider the tests as sacrosanct, a sacred specification of the system that once written can never be changed. Even when such a team do change the tests, they do not improve them as they learn more about the application domain, the technologies they are using and the practices of test-driven development itself. As a result, the tests become confusing to read and hard to diagnose when they fail. Eventually the team find the tests so hard to work with that they are tempted to remove them from the build altogether. This situation is made worse when one part of the organisation (QA, for example) considers that it "owns" the tests but expects another part of the organisation (development) to diagnose and fix test failures.

I find it very helpful to refactor tests very frequently while I'm developing. I use the tests as a record of what my pair and I currently understand about the functionality we're working on. When working on something I don't understand well I will often refactor a test every few minutes or even every few seconds to jot down what I have learned, usually renaming identifiers or pulling out methods and variables to introduce explanatory names.

To show what I mean, here are the changes that were recorded in my Eclipse local history for just one test in jMock 2 as I analysed and fixed a defect (warning: low-level hackery ahead). Steve Gilbert, who reported the defect, helpfully included a JUnit 4 test case that demonstrated the failure. The first thing I did was add the test to the jMock 2 test suite and convert it to JUnit 3 like the rest of the jMock tests.

public class ComparatorTest extends TestCase {
  private Mockery context = new JUnit4Mockery() {{
      setImposteriser(ClassImposteriser.INSTANCE);
  }};

  public void testFailingTest() throws Exception {
    final MyComparator comparator = context.mock(MyComparator.class);
    final MyComparable a = new MyComparable();
    final MyComparable b = new MyComparable();

    context.checking(new Expectations() {{
      one(comparator).compare(a, b); will(returnValue(0));
    }});

    compare(a, b, comparator);
  }

  public void testPassingTest() throws Exception {
    final Comparator<MyComparable> comparator = context.mock(MyComparator.class);
    final MyComparable a = new MyComparable();
    final MyComparable b = new MyComparable();

    context.checking(new Expectations() {{
      one(comparator).compare(a, b); will(returnValue(0));
    }});

    compare(a, b, comparator);
  }

  private <T> int compare(T a, T b, Comparator<T> c) {
    return c.compare(a, b);
  }

  static class MyComparable implements Comparable {
    public int compareTo(Object o) {
      return 0;
    }
  }

  static class MyComparator implements Comparator<MyComparable> {
    public int compare(MyComparable o1, MyComparable o2) {
      return 0;
    }
  }
}

I suspected that the bug had nothing to do with Comparators and Comparables. I explored that possibility by using strings instead of MyComparable objects to see if anything changed.

public class ComparatorTest extends TestCase {
  private Mockery context = new JUnit4Mockery() {{
      setImposteriser(ClassImposteriser.INSTANCE);
  }};

  public void testFailingTest() throws Exception {
    final MyComparator comparator = context.mock(MyComparator.class);

    context.checking(new Expectations() {{
      one(comparator).compare("a", "b"); will(returnValue(0));
    }});

    compare("a", "b", comparator);
  }

  public void testPassingTest() throws Exception {
    final Comparator<String> comparator = context.mock(MyComparator.class);
    
    context.checking(new Expectations() {{
      one(comparator).compare("a", "b"); will(returnValue(0));
    }});

    compare("a", "b", comparator);
  }

  private <T> int compare(T a, T b, Comparator<T> c) {
    return c.compare(a, b);
  }
        
  public static class MyComparator implements Comparator<String> {
    public int compare(String s1, String s2) {
      return 0;
    }
  }
}

The test still failed in the same way. Comparables had nothing to do with the problem so I introduced some new types into the test to better describe what was really being exercised. I also renamed the compare method to useTheMock for the same reason.

public class ComparatorTest extends TestCase {
    private Mockery context = new JUnit4Mockery() {{
        setImposteriser(ClassImposteriser.INSTANCE);
    }};
    
    public void testFailingTest() throws Exception {
        final AnImplementation mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        useTheMock(mock, "a");
    }
    
    public void testPassingTest() throws Exception {
        final AnInterface mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        useTheMock(mock, "a");
    }

    private void useTheMock(AnInterface mock, String string) {
        mock.doSomethingWith(string);
    }

    public interface AnInterface {
        void doSomethingWith(String arg);
    }

    public static class AnImplementation implements AnInterface {
        public void doSomethingWith(String arg) {
        }
    }
}

Both tests passed! I didn't expect that. The only significant difference between the AnInterface type I had defined and Comparator is that AnInterface is not generic. I made it generic to see what would happen.

public class ComparatorTest extends TestCase {
    private Mockery context = new JUnit4Mockery() {{
        setImposteriser(ClassImposteriser.INSTANCE);
    }};
    
    public void testFailingTest() throws Exception {
        final AnImplementation mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        useTheMock(mock, "a");
    }
    
    public void testPassingTest() throws Exception {
        final AnInterface<String> mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        useTheMock(mock, "a");
    }

    private <T> void useTheMock(AnInterface<T> mock, T arg) {
        mock.doSomethingWith(arg);
    }

    public interface AnInterface<T> {
        void doSomethingWith(T arg);
    }

    public static class AnImplementation implements AnInterface<String> {
        public void doSomethingWith(String arg) {
        }
    }
}

That made the test fail again. That showed me that the bug had something to do with non-generic classes implementing generic interfaces, so I renamed the test class to record what I'd learned.

Originally the test methods were well named because they accompanied the bug report: one test passed and the other failed unexpectedly. Now that they were integrated into the jMock test suite, those names no longer made sense. After all, once I had fixed the bug they should both pass. The difference between the two tests seemed to be that one calls the mock through the mocked concrete class rather than through the interface, so I renamed them to reflect that.

public class MockingImplementationOfGenericTypeAcceptanceTests extends TestCase {
    private Mockery context = new JUnit4Mockery() {{
        setImposteriser(ClassImposteriser.INSTANCE);
    }};
    
    public void testWhenInvokedThroughClass() throws Exception {
        final AnImplementation mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        useTheMock(mock, "a");
    }
    
    public void testWhenInvokedThroughInterface() throws Exception {
        final AnInterface<String> mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        useTheMock(mock, "a");
    }

    private <T> void useTheMock(AnInterface<T> mock, T arg) {
        mock.doSomethingWith(arg);
    }

    public interface AnInterface<T> {
        void doSomethingWith(T arg);
    }

    public static class AnImplementation implements AnInterface<String> {
        public void doSomethingWith(String arg) {
        }
    }
}

Tracing through with the debugger, I found that the problem arises when the expectation is set up through the concrete class but the method is actually invoked through the interface. Our passing test sets up expectations and invokes mocked methods through the interface. The failing test sets up expectations through the class and invokes mocked methods through the interface. What about the third case: when a test sets up and invokes through the class? I renamed the methods again to record what I'd learned and added the additional case to see what happened.

After that refactoring the useTheMock method was pointless, because it was only used in one test and it made the tests harder to read, so I inlined it.

public class MockingImplementationOfGenericTypeAcceptanceTests extends TestCase {
    private Mockery context = new JUnit4Mockery() {{
        setImposteriser(ClassImposteriser.INSTANCE);
    }};
    
    public void testWhenDefinedAndInvokedThroughClass() throws Exception {
        final AnImplementation mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});
            
        mock.doSomethingWith("a");
    }
    
    public void testWhenDefinedThroughClassAndInvokedThroughInterface() throws Exception {
        final AnImplementation mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});
        
        ((AnInterface<String>)mock).doSomethingWith("a");
    }
    
    public void testWhenDefinedAndInvokedThroughInterface() throws Exception {
        final AnInterface<String> mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        mock.doSomethingWith("a");
    }

    public interface AnInterface<T> {
        void doSomethingWith(T arg);
    }

    public static class AnImplementation implements AnInterface<String> {
        public void doSomethingWith(String arg) {
        }
    }
}

Digging around with the debugger some more, I discovered that when a method of a generic interface is invoked through the interface but implemented by a non-generic class, the invocation gets routed through a "bridge method" that implements the downcast. Bridge methods are not visible at compile time: they are an implementation detail generated by the compiler and only visible at runtime. Luckily, that information is exposed through the reflection API: java.lang.reflect.Method has an isBridge() method. I added a comment to the test to explain why the test fails. (Note: I've removed foul language about Java generics from the comment to protect those with tender sensibilities.)

public class MockingImplementationOfGenericTypeAcceptanceTests extends TestCase {
    private Mockery context = new JUnit4Mockery() {{
        setImposteriser(ClassImposteriser.INSTANCE);
    }};
    
    public void testWhenDefinedAndInvokedThroughClass() throws Exception {
        final AnImplementation mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});
            
        mock.doSomethingWith("a");
    }
    
    public void testWhenDefinedThroughClassAndInvokedThroughInterface() throws Exception {
        final AnImplementation mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});
        
        // Note: this is invoked through a "bridge" method and so the method
        // invoked when expectations are checked appears to be different from
        // that invoked when expectations are captured.
        ((AnInterface<String>)mock).doSomethingWith("a");
    }
    
    public void testWhenDefinedAndInvokedThroughInterface() throws Exception {
        final AnInterface<String> mock = context.mock(AnImplementation.class);

        context.checking(new Expectations() {{
            one(mock).doSomethingWith("a");
        }});

        mock.doSomethingWith("a");
    }

    public interface AnInterface<T> {
        void doSomethingWith(T arg);
    }

    public static class AnImplementation implements AnInterface<String> {
        public void doSomethingWith(String arg) {
        }
    }
}
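The bridge method that the comment in this test refers to can be seen directly with reflection. The following is a standalone sketch (BridgeMethodDemo and describe() are illustrative names, not part of jMock): it lists every declared doSomethingWith method of the same AnInterface/AnImplementation pair and whether Method.isBridge() reports it as a bridge.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BridgeMethodDemo {
    interface AnInterface<T> {
        void doSomethingWith(T arg);
    }

    static class AnImplementation implements AnInterface<String> {
        public void doSomethingWith(String arg) {
        }
    }

    // Describes each declared method with the given name, for example
    // "doSomethingWith(String) bridge=false". The result is sorted because
    // getDeclaredMethods() returns methods in no particular order.
    static List<String> describe(Class<?> type, String methodName) {
        List<String> descriptions = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                descriptions.add(m.getName() + "("
                        + m.getParameterTypes()[0].getSimpleName()
                        + ") bridge=" + m.isBridge());
            }
        }
        Collections.sort(descriptions);
        return descriptions;
    }

    public static void main(String[] args) {
        for (String line : describe(AnImplementation.class, "doSomethingWith")) {
            System.out.println(line);
        }
    }
}
```

The class declares two methods with that name: the doSomethingWith(String) written in the source, with bridge=false, and a compiler-generated doSomethingWith(Object), with bridge=true, which casts its argument and calls the String version. An imposteriser that intercepts both sees two different Method objects for what the programmer thinks of as one method.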

Now I could fix the bug. I changed the ClassImposteriser so that it does not intercept bridge methods but lets them forward the invocation to the concrete implementation. This way, expectations specified via bridged or non-bridged methods can be satisfied by invocations of bridged or non-bridged methods.
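The effect of that change can be sketched without a bytecode library. In this hypothetical stand-in, RecordingImposter plays the role of the subclass an imposteriser would generate: it overrides only the concrete doSomethingWith(String) and leaves bridging to the compiler, so calls made through the class and through the generic interface both arrive at the same recording method. The shouldIntercept() filter is an illustrative assumption, not jMock's actual code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class BridgeAwareInterception {
    interface AnInterface<T> {
        void doSomethingWith(T arg);
    }

    static class AnImplementation implements AnInterface<String> {
        public void doSomethingWith(String arg) {
        }
    }

    // Hand-written stand-in for a generated imposter: it overrides only the
    // concrete method and records each call it receives.
    static class RecordingImposter extends AnImplementation {
        final List<String> invocations = new ArrayList<>();

        @Override
        public void doSomethingWith(String arg) {
            invocations.add("doSomethingWith(" + arg + ")");
        }
    }

    // Hypothetical filter an imposteriser could apply to each method it is
    // about to override: leave compiler-generated bridge methods alone.
    static boolean shouldIntercept(Method method) {
        return !method.isBridge();
    }

    public static void main(String[] args) {
        RecordingImposter imposter = new RecordingImposter();
        imposter.doSomethingWith("a");                         // through the class
        ((AnInterface<String>) imposter).doSomethingWith("b"); // through the interface, via a bridge
        System.out.println(imposter.invocations);

        for (Method m : AnImplementation.class.getDeclaredMethods()) {
            if (m.getName().equals("doSomethingWith")) {
                System.out.println(m + " intercept=" + shouldIntercept(m));
            }
        }
    }
}
```

Both calls are recorded by the one overriding method, and the filter selects only the non-bridge declaration for interception. A real class imposteriser generates its subclass at runtime rather than through javac, which is why it has to make this choice explicitly.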

Posted on May 15, 2007 [ Permalink | Comments ]

We were doing test-driven development but forgot about the tests!

Test equipment with probes

On a recent project our system had to use information maintained by another system. After discussing various ways of connecting the two systems – database views, web services, messaging – we decided that the team writing the other system would provide us with a compiled JAR that hid whatever jiggery-pokery they preferred behind a convenient API.

Up to this point, we had been mocking out their system and so already had interface definitions for the API. We even had tests that defined the behaviour we wanted from the implementation. The other team were happy to work from those tests and integrate them into their build to ensure that they didn't break the API in the future.

However, we forgot one important detail. Our end-to-end integration tests need to set up test data behind that API. The tests we handed over to the other team did not define an API for doing that. We now have no way of priming our tests with test data without being coupled to their database schema – the very thing we were trying to avoid.

Posted on April 10, 2007 [ Permalink | Comments ]

Looking at tests from a different perspective.

An astronomer using a telescope

As I mentioned before, I've been writing Protest with Phil Dawes. Protest is a Python test framework and toolset for writing programmer tests and generating useful stuff from those tests. Currently the most interesting tool is the API documentation generator.

Writing API documentation by hand is a chore that offends my hard-learned distaste for duplication. When I've written tests that specify the API's behaviour, I don't think I should have to repeat that information in documentation. Even worse, the documentation cannot tell me when it's incorrect.

On the other hand, I really don't like Javadoc and similar tools for Python, such as Epydoc. Putting documentation into the code being documented ends up a maintenance nightmare. The code becomes hard to read and hard to navigate, lost in a mass of comments or docstrings.

So, Protest takes the same approach as Testdox and generates documentation from programmer tests that follow some straightforward conventions. I've tried to make it generate higher quality documentation than Testdox, which only creates a rudimentary overview of a test suite. The generated documentation has rich navigation links and a graphical visualisation of the module structure of the documented code. Having seen the importance of concrete examples in the Scrapheap Challenge workshop, the documentation includes the test code to show how to use every feature of the documented classes. Explanatory text can be added to any example by giving the test a documentation string in reStructuredText format, keeping long comments out of the production code and in the tests where they are more useful.
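The post doesn't spell out Protest's conventions, but the core Testdox idea, turning a test method name into a sentence, is easy to sketch. This Java version is a hypothetical illustration, not Protest's or Testdox's code: toSentence() strips a leading "test" prefix and splits the camel-case name into lower-case words.

```java
class TestNameToSentence {
    // Hypothetical helper: "testDrawsNextCelAfterCelDuration"
    // becomes "draws next cel after cel duration".
    static String toSentence(String methodName) {
        String name = methodName.startsWith("test")
                ? methodName.substring("test".length())
                : methodName;
        StringBuilder sentence = new StringBuilder();
        for (char c : name.toCharArray()) {
            // Each upper-case letter starts a new word, except at the very start.
            if (Character.isUpperCase(c) && sentence.length() > 0) {
                sentence.append(' ');
            }
            sentence.append(Character.toLowerCase(c));
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSentence("testInitiallyDrawsFirstCel"));
        System.out.println(toSentence("testDrawsNextCelAfterCelDuration"));
    }
}
```

A naive splitter like this renders acronyms letter by letter ("testParsesXML" becomes "parses x m l"), which is one small example of how generated documentation pushes you to tidy up test names.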

A surprising side effect of the tool is how much it helps you improve your test code. I like to think I'm pretty good at writing tests. I designed the documentation generator to work with the way I like to make my tests self-explanatory. However, when I saw my test code in the documentation, rather than the IDE, I found lots of room for improvement: the test names needed tweaking to generate better titles, variables needed better names to identify the roles they played in the test, test data needed to be made self-describing, and so forth. I was pleased to find that the tool did more than just generate documentation: it gave me a fresh perspective on my test code that highlighted where it needed improvement.

Posted on March 30, 2006 [ Permalink | Comments ]

Too clever by half!

The Mekon

I recently wrote a simple test framework to use on a project that is using Python on Series 60 phones. I began the project using PyUnit but found the API unintuitive. I wanted something more "pythonic", so I rolled my own.

I was initially puzzled about how to bootstrap the development of the test framework. How could I write the test framework itself test-first? Then I came up with a cunning plan: because I was also writing my project test-first, I could use the project as the test for the test framework that was being used to test the project. A simple, straightforward solution to my dilemma, as I'm sure you'll agree. Maybe.

The way I figured it, I'd write a test in my not-yet-existing test framework that would test my not-yet-existing project. When I first ran the test, it would not report that my project did not behave as expected. Instead it would report that the test framework did not behave as expected. So, I would keep writing the test framework and running the tests until it reported the expected failure of my project. I could then make the tests pass by writing project code, and start all over again until I had a working project tested by a working test framework. Pretty soon the test framework would be complete and I could focus just on the project.

It worked like a charm. I soon had a simple, flexible test framework and well tested project code. But of course, I was missing tests for the test framework itself. As I added features to the framework and built some useful development tools around it (more about those later) other developers started to get interested in the framework and wanted to use it on their projects too. This meant I had to extend the framework with features that my project didn't need and so I could no longer test the test framework against the project and vice versa.

I had to bite the bullet and write tests for the test framework. I hate testing after the fact — it's boring and error prone, but what else could I do? I should have written good tests in the first place instead of being too clever for my own good.

Posted on March 17, 2006 [ Permalink | Comments ]

Environmentally Friendly Deployment

Keep your fan clean with environment tests

Deployment. After weeks of development the rubber finally hits the road... and often the shit hits the fan and the egg hits the face.

This time, the deployment of my project went smoothly thanks to a suite of Environment Tests, a technique I learned about at XTC and XP Day 4. An environment test verifies that the environment into which you are about to deploy an application is in the state that you expect: that expected directories exist; that files are readable when running under service accounts; that database logins are set up; that stored procedures have been loaded into the database; etc. etc. Environment tests are also used to verify that the development environment accurately reflects the production environment.
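As a rough sketch of the idea, here is an environment check in Java covering the first two items on that list. The class name and the directories it checks are hypothetical; a real suite would list the paths the application actually uses, and would have to run under the service account for the permission checks to mean anything.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DirectoryEnvironmentCheck {
    // Returns a description of each problem found; an empty list means the
    // directories look the way the application expects. Permissions are
    // checked as whichever account runs this code.
    static List<String> check(List<Path> readableDirs, List<Path> writableDirs) {
        List<String> problems = new ArrayList<>();
        for (Path dir : readableDirs) {
            if (!Files.isDirectory(dir)) {
                problems.add("missing directory: " + dir);
            } else if (!Files.isReadable(dir)) {
                problems.add("not readable: " + dir);
            }
        }
        for (Path dir : writableDirs) {
            if (!Files.isDirectory(dir)) {
                problems.add("missing directory: " + dir);
            } else if (!Files.isWritable(dir)) {
                problems.add("not writable: " + dir);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical deployment layout; substitute the application's real paths.
        List<Path> readable = Arrays.asList(Paths.get("/opt/myapp/config"));
        List<Path> writable = Arrays.asList(Paths.get("/var/log/myapp"));
        List<String> problems = check(readable, writable);
        for (String problem : problems) {
            System.out.println(problem);
        }
        System.exit(problems.isEmpty() ? 0 : 1);
    }
}
```

Returning every problem rather than stopping at the first one means a single run before deployment reports everything that needs fixing.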

My first suite of environment tests was written in JScript and ran in the Windows Script Host (WSH). JScript is easier to write than the Windows batch language, but it's awkward to access the filesystem and other operating system functions from WSH.

But that wasn't the biggest problem. As I made my tests more accurate, the JScript tests ended up duplicating functionality in the application. For example, to test that my application would be able to connect to the database with its assigned username and password I had to parse the application's configuration file, generate the connection string and try a connection.

At this point I realised my mistake... I shouldn't have duplicated the functionality of the application in the tests, I should have used it. I therefore rewrote the tests in the same language as the application. The environment tests now call through the application code to probe the environment in exactly the same way as the actual application.
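The difference is easiest to see in a small example. In this sketch, AppConfig stands in for the application's own configuration code, and logDirectory() and the log.dir property are hypothetical names. The environment check asks the application where its log directory is instead of parsing the configuration file a second time; a database check would work the same way, calling the application's own connection code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical stand-in for application code shared by the deployed
// executables and the environment tests.
class AppConfig {
    private final Properties properties;

    AppConfig(Properties properties) {
        this.properties = properties;
    }

    Path logDirectory() {
        return Paths.get(properties.getProperty("log.dir", "logs"));
    }
}

// Environment test: probes the environment through AppConfig, so it checks
// exactly the path the application itself will use.
class LogDirectoryEnvironmentTest {
    static String check(AppConfig config) {
        Path dir = config.logDirectory();
        if (!Files.isDirectory(dir)) {
            return "log directory missing: " + dir;
        }
        if (!Files.isWritable(dir)) {
            return "log directory not writable: " + dir;
        }
        return "ok";
    }
}
```

If the application's rules for locating its log directory change, the environment test picks up the change automatically, because there is no second copy of those rules to keep in step.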

The application is structured as a package of core services — domain model, data mapping layer, etc. — that are used by multiple client packages. The clients include the deployed executables, some development tools for ad-hoc exploratory testing, the unit tests, the integration tests and now the environment tests.

Those environment tests have already saved my bacon more than once. At deploy time they detected that the application's login had not been set up correctly in the production database. Later, while helping with the install of another application, they detected that the install had deleted registry entries used by COM objects shared between the two applications. Both problems were easy to fix because they were detected by the environment tests without having to actually run the application in the live environment.

My fan is still shiny and clean and my face is still egg-free.

Update 27/05/2005: fixed link to Environment Tests presentation

Posted on May 25, 2005 [ Permalink | Comments ]

Regional Accents

Jay Fields wonders when it is appropriate to use #region blocks in C#. I've found a good use for them! Here's a test from nMock 2, a port of jMock to C# that I've been writing on and off over the last few weeks.

region1.gif

What's the value of ANY_NUMBER? You shouldn't care! It means any number. So I hid the declaration in a region. If the reader opens that region:

region2.gif

What, you really want to know? Ok, but don't say I didn't warn you.

region3.gif

Did I say it was a good use of regions? Perhaps that was not completely accurate.

Posted on May 16, 2005 [ Permalink | Comments ]

Import Tests

Sniffing out Illegal Imports

A project I am working on hit a snag recently: all of a sudden the program stopped producing correct results for a subset of the input data. I tracked the problem down and found that the culprit was a package of calculation routines I'm importing from another project. Some regression in their code fed incorrect results into my calculation, which caused my final results to be way out of whack.

Unfortunately, my acceptance/regression suites hadn't caught the error. My acceptance tests were testing just what the business users wanted to see: that the calculation engine itself generated good results when fed good data. To test that, and to get a predictable test environment, I had captured input data and expected results in spreadsheets that were fed to the calculation engine.

One table of input data captured what would be calculated in production by the imported code that eventually caused the regression. Ideally I would have fed test data through that code, but it was not possible to connect it to a different source of data: the data access code was entangled within the calculation code. So, I used the test data to test my calculation engine only and relied on manual eyeball tests to catch end-to-end regressions, which they did.

As part of hunting down the bug, I wrote a test that verified that the imported code still provided the behaviour that the rest of my code required of it. My next problem was where to put this test: should I add it to my project's test suite, or to that of the project whose code I am importing?

Eventually I decided to add it to both suites. The code that I am importing is being written by another project that has its own set of acceptance criteria. By adding the test to their suite, they will detect when a change would regress my code. However, their project may have to meet goals that are inconsistent with mine, and in that case they will change their acceptance tests. If I relied on their acceptance tests, I wouldn't catch the error until end-to-end testing. By keeping a copy of the test in my suite, and changing it as I need to, I define exactly what my code needs their code to do, can detect when their code no longer does that, and will be shown the location of the error if a regression occurs again.
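An import test can be as small as a handful of checks written from the importing side. In this sketch the imported code is represented by a hypothetical ImportedCalculations.compoundGrowth() (the post doesn't name the real package or its API), and the checks cover only the behaviour the importing project depends on, so the same class could be copied into both projects' suites.

```java
// Hypothetical stand-in for the imported calculation routines.
class ImportedCalculations {
    static double compoundGrowth(double principal, double ratePerPeriod, int periods) {
        return principal * Math.pow(1 + ratePerPeriod, periods);
    }
}

// Import test: states only what the importing project requires of the
// imported code, so a failure points straight at the imported package.
class ImportedCalculationsImportTest {
    private static final double TOLERANCE = 1e-9;

    static void assertClose(double expected, double actual, String what) {
        if (Math.abs(expected - actual) > TOLERANCE) {
            throw new AssertionError(what + ": expected " + expected + " but was " + actual);
        }
    }

    static void testZeroPeriodsLeavesPrincipalUnchanged() {
        assertClose(100.0, ImportedCalculations.compoundGrowth(100.0, 0.05, 0), "zero periods");
    }

    static void testGrowthCompoundsEachPeriod() {
        assertClose(121.0, ImportedCalculations.compoundGrowth(100.0, 0.10, 2), "two periods at 10%");
    }

    public static void main(String[] args) {
        testZeroPeriodsLeavesPrincipalUnchanged();
        testGrowthCompoundsEachPeriod();
        System.out.println("imported calculations behave as required");
    }
}
```

Because the test feeds known inputs straight into the imported routine, it sidesteps the entangled data access that stopped the acceptance tests from exercising that code.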

Like a unit test with mock objects or an environment test, this test defines what one bit of code requires from another. However, its scope is larger than that of a unit test and smaller than that of an environment test. I didn't know what to call it so, because it defines and tests the functionality that one assembly needs to import from another, I've called it an "Import Test."

Posted on March 21, 2005 [ Permalink | Comments ]

State vs. Interaction Based Testing Example

In a comment to my previous post on this topic, Jason Yip asked for some example code. Here's a simplified example from a previous project. Note, this is all from memory and I haven't compiled or run any of this code; consider it pseudocode.

I was writing an interactive, graphical simulation. Well, a video game but that doesn't sound so impressive on a CV. The simulation rendered graphics, represented as Drawable objects, and ran simulation activities, represented as Animated objects, every timeslice. The Drawable and Animated interfaces are shown below:

public interface Drawable {
	void draw( GraphicsSurface g );
}
public interface Animated {
	void animate( double deltaT );
}

I implemented animated sprites as objects that implement both the Drawable and the Animated interfaces. A Sprite's animation was defined by a fixed number of Drawable cels and the duration that the cels will be displayed for. The Sprite displays each cel for the cel duration before stepping to the next in the loop.

Using mock objects (interaction based testing), my tests specified that a sprite draws its cels in order, stepping to the next after the cel duration.

import java.util.ArrayList;
import java.util.List;

import org.jmock.Mock;
import org.jmock.MockObjectTestCase;

public class SpriteTest extends MockObjectTestCase {
	static final double CEL_DURATION = 1.0;

	GraphicsSurface display;
	Mock cel1;
	Mock cel2;
	Sprite sprite;

	public void setUp() {
		display = (GraphicsSurface)newDummy(GraphicsSurface.class, "graphics");

		cel1 = mock(Drawable.class, "cel1");
		cel2 = mock(Drawable.class, "cel2");
		List cels = new ArrayList();
		cels.add( cel1.proxy() );
		cels.add( cel2.proxy() );
		
		// assign the field: declaring a local here would leave the field null
		sprite = new Sprite( CEL_DURATION, cels );
	}

	public void testInitiallyDrawsFirstCel() {
		cel1.expects(once()).method("draw").with(same(display));
		
		sprite.draw( display );
	}

	public void testDrawsNextCelAfterCelDuration() {
		cel1.expects(once()).method("draw").with(same(display))
			.id("draw tick 1");
		cel1.expects(once()).method("draw").with(same(display))
			.after("draw tick 1")
			.id("draw tick 2");
		// the prior call was made on a different mock, so name that mock
		cel2.expects(once()).method("draw").with(same(display))
			.after(cel1, "draw tick 2")
			.id("draw tick 3");
		
		sprite.draw( display );
		sprite.animate( CEL_DURATION/2 );
		sprite.draw( display );
		sprite.animate( CEL_DURATION/2 );
		sprite.draw( display );
	}

	...
}

Here's a possible implementation that will pass these tests.

import java.util.ArrayList;
import java.util.List;

class Sprite implements Animated, Drawable {
	private List cels;
	private double celDuration;

	private int currentCelIndex = 0;
	private double celTimer = 0.0;
	
	public Sprite( double celDuration, List cels ) {
		this.cels = new ArrayList( cels ); // defensive copy: List has no public clone()
		this.celDuration = celDuration;
	}
	
	public void draw( GraphicsSurface display ) {
		((Drawable)cels.get(currentCelIndex)).draw(display);
	}

	public void animate( double deltaT ) {
		celTimer += deltaT;
		// step to the next cel as soon as a full cel duration has elapsed
		while (celTimer >= celDuration) {
			celTimer -= celDuration;
			currentCelIndex = (currentCelIndex+1) % cels.size();
		}
	}
}

Now, how would I test this with state-based testing? I would have to expose the current cel as a property:

public class SpriteTest extends TestCase {
	...

	public void testInitiallyDrawsFirstCel() {
		assertEquals( "should be showing first cel", 0, sprite.getCurrentCelIndex() );
	}
       
	public void testDrawsNextCelAfterCelDuration() {
		sprite.animate( CEL_DURATION/2 );
		assertEquals( "should be showing first cel", 0, sprite.getCurrentCelIndex() );
		sprite.animate( CEL_DURATION/2 );
		assertEquals( "should be showing second cel", 1, sprite.getCurrentCelIndex() );
	}
}
public class Sprite ... {
	...
	public int getCurrentCelIndex() {
		return currentCelIndex;
	}
}

But now I've had to create a new method on the Sprite class just for testing. This is not a good idea:

  1. I'll have to maintain the method when I change the internals of Sprite.
  2. The API is now more complex than it needs to be -- the getCurrentCelIndex() method is just noise that does not contribute to the required functionality of the class and will confuse maintenance programmers who have to learn how to use the class.
  3. The tests are now misleading because they don't express how one should use the class: domain code should never call getCurrentCelIndex() but the tests say the opposite.
  4. The tests do not actually test that the sprite draws the cel with the current cel index, only that it changes the current cel index. That's because drawing is done in a "Tell, Don't Ask" style — there is no state to assert about.

Drawback 1 is the one that hits you first when extending code. Later in the project I extracted the concept of a "clip" of cels so that a single clip could be shared by many sprite instances, and introduced finite and looped clips. Sprites used an iterator over cels instead of maintaining a current index. Looped animations were represented as a clip of infinite size: the iterator looped around the clip.

public interface CelIterator extends Drawable {
	boolean hasNext();
	void next();
}
public class Sprite ... {
	private CelIterator currentCel;
	private double celDuration;
	private double celTimer = 0.0;
	
	public Sprite( double celDuration, Clip clip ) {
		this.currentCel = clip.iterator();
		this.celDuration = celDuration;
	}
	
	public void draw( GraphicsSurface display ) {
		currentCel.draw(display);
	}
	
	public void animate( double deltaT ) {
		celTimer += deltaT;
		while (celTimer >= celDuration && currentCel.hasNext() ) {
			celTimer -= celDuration;
			currentCel.next();
		}
	}
}
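The post doesn't show Clip or the CelIterator implementations. One plausible shape, assuming a Clip simply hands out a fresh CelIterator to each sprite, is sketched below; LoopedClip and FiniteClip are hypothetical names. The looped iterator always reports hasNext() so the sprite keeps cycling, while the finite one stops on its last cel.

```java
import java.util.ArrayList;
import java.util.List;

interface GraphicsSurface { }

interface Drawable {
    void draw(GraphicsSurface g);
}

interface CelIterator extends Drawable {
    boolean hasNext();
    void next();
}

interface Clip {
    CelIterator iterator();
}

// Hypothetical looped clip: behaves like a clip of infinite size.
class LoopedClip implements Clip {
    private final List<Drawable> cels;

    LoopedClip(List<Drawable> cels) {
        this.cels = new ArrayList<>(cels);
    }

    public CelIterator iterator() {
        return new CelIterator() {
            private int index = 0;

            public boolean hasNext() { return true; }

            public void next() { index = (index + 1) % cels.size(); }

            public void draw(GraphicsSurface g) { cels.get(index).draw(g); }
        };
    }
}

// Hypothetical finite clip: stays on the last cel once it gets there.
class FiniteClip implements Clip {
    private final List<Drawable> cels;

    FiniteClip(List<Drawable> cels) {
        this.cels = new ArrayList<>(cels);
    }

    public CelIterator iterator() {
        return new CelIterator() {
            private int index = 0;

            public boolean hasNext() { return index < cels.size() - 1; }

            public void next() {
                if (hasNext()) {
                    index++;
                }
            }

            public void draw(GraphicsSurface g) { cels.get(index).draw(g); }
        };
    }
}
```

Keeping the position in the iterator rather than the clip is what lets one clip be shared by many sprite instances: each sprite gets its own iterator.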

Apart from the setUp() method, which passed a Clip to the Sprite instead of a List, I didn't have to change any tests. If I had used state based testing, on the other hand, what could I have done? The Sprite doesn't store the current cel index any more. I could have exposed the current cel through the CelIterator, but then my tests would be pushing a bad design decision into my domain code. They should be guiding me towards good designs, not bad designs.

What I care about is what my objects do, not what state they happen to store to coordinate what they do, or happen to leave lying about in memory after they've finished doing it. The only visible manifestation of their behaviour is the messages that they send to other objects. That's why I find interaction based testing easier than state based testing when writing object-oriented code.
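The same interaction-based specification can be written without a mocking library, which makes the point concrete: the check below observes the sprite only through the messages it sends to its cels. It is a sketch with a hand-rolled recording fake (RecordingCel and SpriteInteractionCheck are hypothetical helpers), using a Sprite whose cel advances once a full cel duration has elapsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

interface GraphicsSurface { }

interface Drawable {
    void draw(GraphicsSurface g);
}

interface Animated {
    void animate(double deltaT);
}

class Sprite implements Animated, Drawable {
    private final List<Drawable> cels;
    private final double celDuration;
    private int currentCelIndex = 0;
    private double celTimer = 0.0;

    Sprite(double celDuration, List<Drawable> cels) {
        this.cels = new ArrayList<>(cels);
        this.celDuration = celDuration;
    }

    public void draw(GraphicsSurface display) {
        cels.get(currentCelIndex).draw(display);
    }

    public void animate(double deltaT) {
        celTimer += deltaT;
        while (celTimer >= celDuration) {
            celTimer -= celDuration;
            currentCelIndex = (currentCelIndex + 1) % cels.size();
        }
    }
}

// Hand-rolled mock cel: records its name each time it is told to draw on
// the expected surface, and fails if given any other surface.
class RecordingCel implements Drawable {
    private final String name;
    private final GraphicsSurface expectedSurface;
    private final List<String> log;

    RecordingCel(String name, GraphicsSurface expectedSurface, List<String> log) {
        this.name = name;
        this.expectedSurface = expectedSurface;
        this.log = log;
    }

    public void draw(GraphicsSurface g) {
        if (g != expectedSurface) {
            throw new AssertionError(name + " drawn on unexpected surface");
        }
        log.add(name);
    }
}

class SpriteInteractionCheck {
    // Returns the order in which the cels were told to draw themselves.
    static List<String> drawSequence() {
        double celDuration = 1.0;
        GraphicsSurface display = new GraphicsSurface() { };
        List<String> log = new ArrayList<>();
        Sprite sprite = new Sprite(celDuration, Arrays.<Drawable>asList(
                new RecordingCel("cel1", display, log),
                new RecordingCel("cel2", display, log)));

        sprite.draw(display);
        sprite.animate(celDuration / 2);
        sprite.draw(display);
        sprite.animate(celDuration / 2);
        sprite.draw(display);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(drawSequence());
    }
}
```

The recorded sequence is cel1, cel1, cel2, the same ordering the jMock test specifies, and nothing in the check reads the Sprite's fields, so replacing the index with a CelIterator would not affect it.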

Posted on August 6, 2004 [ Permalink | Comments ]

State vs Interaction Based Testing

Soldiers on parade

Martin Fowler has recently written an article comparing what he calls "state-based" and "interaction-based" unit testing. The article doesn't really cover the subject in much depth but one statement in particular surprised me: "interaction-based tests are ... more coupled to the implementation of a method." I think that Martin is spot on when he says "one of the hardest things for people to understand in OO design is the 'Tell Don't Ask' principle," but that principle has a big influence on how you write tests and is exactly what makes interaction-based testing necessary. In an object-oriented design, an object's state is an implementation detail that should be properly encapsulated and its interactions with its environment should be its only visible behaviour. If you follow the "Tell, Don't Ask" style, objects have very little visible state to assert about.

When writing a program, I care only about what that program does, not the internal state that the program uses to control what it does. The only visible behaviour that a program has is its interactions with external entities, such as I/O devices or remote processes. When a program is divided into modules, likewise the internal state of a module is unimportant; it is only how a module interacts with other modules that matters.

In a procedural program, modules interact by reading and writing state that is stored in shared data structures, and so changes to that state should be tested. In an object oriented program, on the other hand, a program is modularised as collaborating objects that perform actions by sending messages to other objects. To effect changes in the program's environment, application objects send messages to objects that represent entities in the environment. The behaviour of a program, and that of the objects within it, is defined solely in terms of message sending, and those messaging interactions are what should be tested.

In software that has a pure object-oriented design, in which logic operating upon state is defined only in the objects that hold that state and objects interact in a "Tell, Don't Ask" style, objects expose next to no visible state that can be used for state based testing. Making assertions about state and state changes therefore requires objects to provide access to their internals that is not necessary for the normal execution of the software, and ties the tests to implementation details that should be properly encapsulated.

Blaming "brittleness" of tests upon interaction-based testing is a red herring. Both interaction-based tests and state-based tests become brittle if they make assertions upon implementation details and overly constrain the interfaces between modules. Whether you prefer a procedural style in which you test the changes of visible state in data structures, or an object-oriented style in which you test the coordination of actions between objects, you need to carefully choose what your tests specify to keep them from being brittle. In state-based tests you have to be careful not to test for inconsequential state changes or make overly tight assertions about state values. In interaction-based tests you have to be careful not to test for inconsequential interactions or make overly tight assertions about parameter values. This is why jMock provides so much flexibility in the way a test can define constraints upon method signatures, parameter values, invocation ordering, expected invocation counts, etc.
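The following Java sketch shows the difference between a loose and an overly tight constraint on a parameter value. The names are hypothetical, and a hand-written test double stands in for a mock-object library rather than reproducing jMock's API. The expectation always pins down the customer exactly, but takes a `java.util.function.Predicate` for the message text, so each test states only as much as it cares about.

```java
import java.util.function.Predicate;

// Hypothetical role: tells a customer something.
interface Notifier {
    void send(String customerId, String message);
}

// Hypothetical object under test.
class OrderService {
    private final Notifier notifier;

    OrderService(Notifier notifier) {
        this.notifier = notifier;
    }

    void ship(String customerId, String orderId) {
        notifier.send(customerId, "Order " + orderId + " has shipped. Thank you!");
    }
}

// Hand-written expectation: matches the customer exactly, but constrains
// the message only as far as the supplied predicate does.
class ExpectingNotifier implements Notifier {
    private final String expectedCustomer;
    private final Predicate<String> messageConstraint;
    private int matchingCalls;

    ExpectingNotifier(String expectedCustomer, Predicate<String> messageConstraint) {
        this.expectedCustomer = expectedCustomer;
        this.messageConstraint = messageConstraint;
    }

    @Override
    public void send(String customerId, String message) {
        if (customerId.equals(expectedCustomer) && messageConstraint.test(message)) {
            matchingCalls++;
        }
    }

    boolean satisfiedExactlyOnce() {
        return matchingCalls == 1;
    }
}
```

An expectation built with `m -> m.contains("X17")` is satisfied by `ship("c-42", "X17")`. One built with `m -> m.equals("Order X17 has shipped.")` is not, and would have to be rewritten every time the wording of the notification changed.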

I think that the most important benefit of interaction-based testing is that it helps reduce the amount of mutable state in a program. Mutable state makes a program harder to understand and maintain because the behaviour of a piece of code cannot be easily predicted merely by reading the text but depends on the sequence of events that put it and its environment into significant states. By concentrating on the interactions between objects instead of state changes, interaction-based testing guides the design towards objects that transform data as they pass it around rather than store data and perform logic on the state of other objects.
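Here is a minimal sketch of that style, assuming a hypothetical `LineSink` role invented for this example. Each object receives a line, transforms or filters it, and passes the result on to the next object. None of them stores the data it handles, so the only thing a test can check is what arrives at the end of the chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical role: anything that can be told about a line of text.
interface LineSink {
    void line(String text);
}

// Trims each line and passes it on; keeps no mutable state.
final class Trimming implements LineSink {
    private final LineSink next;

    Trimming(LineSink next) {
        this.next = next;
    }

    @Override
    public void line(String text) {
        next.line(text.trim());
    }
}

// Passes on only non-empty lines; also keeps no mutable state.
final class SkipBlank implements LineSink {
    private final LineSink next;

    SkipBlank(LineSink next) {
        this.next = next;
    }

    @Override
    public void line(String text) {
        if (!text.isEmpty()) {
            next.line(text);
        }
    }
}

// Test-side collaborator that records what reached the end of the chain.
final class CollectingSink implements LineSink {
    final List<String> received = new ArrayList<>();

    @Override
    public void line(String text) {
        received.add(text);
    }
}
```

For example, telling the chain `new Trimming(new SkipBlank(sink))` about the lines `"  hello "`, `"   "` and `"world"` delivers `"hello"` and `"world"` to `sink`.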

I know this makes me sound like a functional programming zealot instead of an object-oriented programming zealot, so I've dug up this quote by Alan Kay to reestablish my OO purist credentials:

"Doing encapsulation right is a commitment not just to abstraction of state, but to eliminate state oriented metaphors from programming." (Alan Kay, The Early History of Smalltalk)

I have come to think of object-oriented programming as an inversion of functional programming. In a lazy functional language data is pulled through functions that transform the data and combine it into a single result. In an object-oriented program, data is pushed out in messages to objects that transform the data and push it out to other objects for further processing.
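The inversion can be sketched in Java, with an `Iterator` standing in, loosely, for a lazy list; the class and method names are hypothetical. Both methods compute the sum of the squares of the even numbers in their input. In the pull version the consumer asks the source for each value; in the push version the producer sends each value into a chain of objects that transform it and pass it on.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.IntConsumer;

final class PullVersusPush {
    // Pull: the consumer drives, asking the source for each value in turn.
    static int sumOfEvenSquaresPull(Iterator<Integer> source) {
        int total = 0;
        while (source.hasNext()) {
            int n = source.next();
            if (n % 2 == 0) {
                total += n * n;
            }
        }
        return total;
    }

    // Push: each object is told about a value, filters or transforms it,
    // and tells the next object.
    static final class EvenOnly implements IntConsumer {
        private final IntConsumer next;
        EvenOnly(IntConsumer next) { this.next = next; }
        @Override public void accept(int n) { if (n % 2 == 0) next.accept(n); }
    }

    static final class Square implements IntConsumer {
        private final IntConsumer next;
        Square(IntConsumer next) { this.next = next; }
        @Override public void accept(int n) { next.accept(n * n); }
    }

    // The end of the chain accumulates the result.
    static final class Sum implements IntConsumer {
        int total;
        @Override public void accept(int n) { total += n; }
    }

    static int sumOfEvenSquaresPush(List<Integer> values) {
        Sum sum = new Sum();
        IntConsumer pipeline = new EvenOnly(new Square(sum));
        for (int n : values) {
            pipeline.accept(n); // the producer drives
        }
        return sum.total;
    }
}
```

Given the values 1 to 6, both methods return 56 (4 + 16 + 36).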

Update: example and code.

Posted on July 14, 2004 [ Permalink | Comments ]

Don't test the methods of your classes

The most perfect expression of human behavior is a string quartet. — Jeffrey Tate

I recently had to work on some code I wrote before I had really understood test-driven development. I had written the tests first and the code had 100% test coverage, but the tests were not very useful in helping me understand the code.

The problem was that I had written tests for each method, testing pre- and postconditions and class invariants. This is the wrong approach to writing programmer tests. Instead, each test case should specify a describable aspect of the functionality an object provides to its clients. That aspect will probably involve multiple methods of the class. As a maintenance programmer using the tests as documentation you want to know how the behaviour of those methods is interrelated, not how each method acts individually.
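A sketch of the difference, using a hypothetical undo history invented for this example and plain static methods in place of a test framework such as JUnit. Rather than one test for `undo()` and another for `redo()`, each test is named after an aspect of behaviour and exercises several methods together, which is what a maintenance programmer reading the tests needs to learn.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class under test: an undo history of named edits.
final class UndoHistory {
    private final Deque<String> done = new ArrayDeque<>();
    private final Deque<String> undone = new ArrayDeque<>();

    void record(String edit) {
        done.push(edit);
        undone.clear(); // a new edit invalidates anything that could be redone
    }

    // Throws NoSuchElementException if there is nothing to undo.
    String undo() {
        String edit = done.pop();
        undone.push(edit);
        return edit;
    }

    // Throws NoSuchElementException if there is nothing to redo.
    String redo() {
        String edit = undone.pop();
        done.push(edit);
        return edit;
    }

    boolean canUndo() { return !done.isEmpty(); }
    boolean canRedo() { return !undone.isEmpty(); }
}

// Each test specifies one describable aspect of the history's behaviour.
final class UndoHistoryBehaviour {
    static void check(boolean condition, String description) {
        if (!condition) throw new AssertionError(description);
    }

    static void redoReappliesUndoneEditsInReverseOrderOfUndoing() {
        UndoHistory history = new UndoHistory();
        history.record("a");
        history.record("b");
        check(history.undo().equals("b"), "undo takes back the latest edit first");
        check(history.undo().equals("a"), "then the one before it");
        check(!history.canUndo(), "nothing left to undo");
        check(history.redo().equals("a"), "redo reapplies the last edit undone");
        check(history.redo().equals("b"), "then the one undone before it");
        check(!history.canRedo(), "nothing left to redo");
    }

    static void recordingANewEditDiscardsTheRedoHistory() {
        UndoHistory history = new UndoHistory();
        history.record("a");
        history.undo();
        check(history.canRedo(), "an undone edit can be redone");
        history.record("c");
        check(!history.canRedo(), "a new edit discards the redo history");
        check(history.undo().equals("c"), "the new edit is the one undone");
    }
}
```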

Before writing that code I had recently written an application in Eiffel and the Design by Contract approach was fresh in my mind. This is one of the reasons I don't really like Design by Contract: it doesn't help you understand your code because the DbC specifications are attached to each feature of a class, instead of specifying aspects of a class's functionality that each involve the interplay of multiple features. Using DbC to specify the interrelation of methods requires the programmer to add features to the class that are otherwise unnecessary, and has the result of actually making the class harder to understand.

Posted on December 15, 2003 [ Permalink | Comments ]

Always make your tests fail

An airliner crash test

I ported some old code that I wrote in my branch of the mockobjects.com dynamic mock packages to the new jMock codebase. Looking through the test suite I found some tests that didn't actually test anything! They just passed without bothering to check the behaviour of the object being tested. Hmmm... I must have written the test and the code to be tested at the same time, and so expected the test to pass. Which of course it appeared to do with flying colours. What a stupid thing to do. I should have written each test, watched it fail and only then written the code to make it pass.

Lesson: always make your tests fail before you make them pass.

Or, as is already written on the C2 Wiki, "never write a line of code without a failing test."
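As a hypothetical reconstruction of the mistake (not the original mockobjects.com or jMock code; every name here is invented), the first test below sets up an expectation but never checks it, so it passes even though `rename()` does nothing. The second checks the expectation and therefore fails until `rename()` is implemented. Watching a test fail before writing the code is what reveals which kind you have written.

```java
// Hypothetical collaborator role.
interface Listener {
    void changed(String name);
}

// Hypothetical object under test, deliberately not yet implemented.
final class Registry {
    private final Listener listener;

    Registry(Listener listener) {
        this.listener = listener;
    }

    void rename(String newName) {
        // Not implemented yet: should tell the listener about the new name.
    }
}

// Hand-written expectation that must be verified explicitly.
final class ExpectingListener implements Listener {
    private final String expected;
    private boolean received;

    ExpectingListener(String expected) {
        this.expected = expected;
    }

    @Override
    public void changed(String name) {
        if (name.equals(expected)) {
            received = true;
        }
    }

    void verify() {
        if (!received) {
            throw new AssertionError("expected changed(" + expected + ")");
        }
    }
}

final class RegistryTests {
    // Passes vacuously: the expectation is never verified.
    static void vacuousTest() {
        ExpectingListener listener = new ExpectingListener("new-name");
        new Registry(listener).rename("new-name");
    }

    // Verifies the expectation, so it fails until rename() is implemented.
    static void meaningfulTest() {
        ExpectingListener listener = new ExpectingListener("new-name");
        new Registry(listener).rename("new-name");
        listener.verify();
    }
}
```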

Posted on December 11, 2003 [ Permalink | Comments ]