March 1, 2005 12:00 AM

When Metaprograms Attack!

Shark Attack!

One of the most cunning, or, to be honest, most stupid, bits of code I ever wrote was a Tcl program that wrote itself. It was a browser for a distributed object directory that would let the user examine and invoke objects across the network.

The browser queried the directory service, used CORBA's reflection mechanism to interrogate the objects it found, and wrote new classes to represent those remote objects, along with the GUI components to display them, get and set their attributes, invoke their methods and display the returned values. The entire program was two or three screenfuls of code, but it wrote itself to be much larger as it ran.
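To give a flavour of the technique, here is a minimal sketch in Python rather than Tcl, and without CORBA. It builds a proxy class at runtime from a description of a remote interface. The description format, the `invoke` callback and the `_get_`/`_set_` naming are all invented for this example; they are not taken from the original program.

```python
# Hypothetical description of a remote interface, standing in for whatever a
# reflection service might return. All names here are invented.
ACCOUNT_INTERFACE = {
    "name": "Account",
    "attributes": ["owner", "balance"],
    "operations": ["deposit", "withdraw"],
}


def make_proxy_class(description, invoke):
    """Build a proxy class at runtime from an interface description.

    `invoke(target, operation, args)` is an assumed callback that would
    forward the call to the remote object.
    """
    def make_operation(op_name):
        def operation(self, *args):
            return invoke(self._target, op_name, args)
        operation.__name__ = op_name
        return operation

    def make_attribute(attr_name):
        def getter(self):
            return invoke(self._target, "_get_" + attr_name, ())

        def setter(self, value):
            invoke(self._target, "_set_" + attr_name, (value,))

        return property(getter, setter)

    def init(self, target):
        self._target = target

    # The class body is assembled as a dictionary and only becomes a class
    # when type() is called, so none of it exists in any source file.
    namespace = {"__init__": init}
    for op_name in description["operations"]:
        namespace[op_name] = make_operation(op_name)
    for attr_name in description["attributes"]:
        namespace[attr_name] = make_attribute(attr_name)
    return type(description["name"], (object,), namespace)


# A fake transport that records calls instead of touching a network.
calls = []


def fake_invoke(target, operation, args):
    calls.append((target, operation, args))
    return 42


Account = make_proxy_class(ACCOUNT_INTERFACE, fake_invoke)
account = Account("object-ref-1")
result = account.deposit(100)
balance = account.balance
```

The `Account` class never appears in the source text, which is exactly what makes this style compact and also what makes it awkward to step through in a debugger.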

Unfortunately it contained a bug that occurred infrequently and seemingly at random. I couldn't work out where the bug was in the code that the program wrote because I couldn't put breakpoints into code that didn't exist until runtime. That meant that I couldn't find the real bug, the bug in the code that wrote the code that contained the bug that actually occurred in the running program.

Eventually I rewrote the program in a more traditional style. What a shame. It was so very nearly the coolest code I ever wrote.

Posted on March 23, 2005 [ Permalink | Comments (2) ]

Import Tests

Sniffing out Illegal Imports

A project I am working on hit a snag recently: all of a sudden the program stopped producing correct results for a subset of the input data. I tracked the problem down and found that the culprit was a package of calculation routines I'm importing from another project. Some regression in their code fed incorrect results into my calculation, which caused my final results to be way out of whack.

Unfortunately, my acceptance/regression suites hadn't caught the error. My acceptance tests were testing just what the business users wanted to see: that the calculation engine itself generated good results when fed good data. To test that, and to get a predictable test environment, I had captured input data and expected results in spreadsheets that were fed to the calculation engine.

One table of input data captured what would be calculated in production by the imported code that eventually caused the regression. Ideally I would have fed test data through that code, but it was not possible to connect it to a different source of data: the data access code was entangled within the calculation code. So, I used the test data to test my calculation engine only and relied on manual eyeball tests to catch end-to-end regressions, which they did.

As part of hunting down the bug, I wrote a test that verified that the imported code still provided the behaviour that the rest of my code required of it. My next problem was where to put this test: should I add it to my project's test suite, or to that of the project whose code I am importing?

Eventually I decided on adding it to both suites. The code that I am importing is being written by another project that has its own set of acceptance criteria. By adding the test to their suite, they will detect when a change will regress my code. However, it's possible that their project may have to meet goals that are inconsistent with mine. In that case, they will change their acceptance tests. If I relied on their acceptance tests alone, I wouldn't catch the error until end-to-end testing. By keeping a copy of the test in my suite, and changing it as I need to, I define exactly what my code needs their code to do and can detect when their code no longer does that. The test will also point to the location of the error when a regression occurs again.

Like a unit test with mock objects or an environment test, this test defines what one bit of code requires from another. However, its scope is larger than that of a unit test and smaller than that of an environment test. I don't know of an existing name for it, so, because it defines and tests the functionality that one assembly needs to import from another, I've called it an "Import Test."
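A minimal sketch of what such a test might look like, written here in Python with `unittest`. The `accrued_interest` function is a stand-in defined inline so the example runs on its own; in a real suite it would be imported from the other project's package, and both its name and its signature are invented for illustration.

```python
import unittest


# Stand-in for a routine imported from the other project. In a real suite this
# would be an import of their package; the name and signature are hypothetical.
def accrued_interest(principal, annual_rate, days):
    return principal * annual_rate * days / 365.0


class AccruedInterestImportTest(unittest.TestCase):
    """Pins down only the behaviour that the importing code relies on."""

    def test_zero_days_accrues_nothing(self):
        self.assertEqual(accrued_interest(1000.0, 0.05, 0), 0.0)

    def test_full_year_accrues_the_annual_rate(self):
        self.assertAlmostEqual(accrued_interest(1000.0, 0.05, 365), 50.0)

    def test_interest_scales_linearly_with_principal(self):
        one = accrued_interest(1000.0, 0.05, 30)
        two = accrued_interest(2000.0, 0.05, 30)
        self.assertAlmostEqual(two, 2 * one)


def run_import_tests():
    """Run the import tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(AccruedInterestImportTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The same test class can be copied into both suites. The copy in the importing project is the one that states what that project actually needs, so it is the one to change when those needs change.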

Posted on March 21, 2005 [ Permalink | Comments (5) ]

Encapsulation is not Information Hiding

A Gemini space capsule in orbit

I have recently had problems integrating code written on other projects into my own application. In each case, the problem was caused by a reference to an external resource (a file name, for example) being hard-coded in the class of some object in an internal package within the code I was trying to use. The explanation for the design of these packages, and why I shouldn't change things, was that the dependency was "encapsulated" within the object in question.

Many articles and books use the word "encapsulation" as a synonym for "information hiding". However, encapsulation and information hiding are two separate, orthogonal concepts:

  • Information hiding conceals how an object implements its functionality behind the abstraction of the object's API.
  • Encapsulation ensures that the behaviour of an object can only be affected through the object's API.

Information hiding lets one build higher level abstractions on top of lower level details. Good information hiding lets one ignore details of the system that are unrelated to the task at hand.

Encapsulation ensures that there are no unexpected dependencies between conceptually unrelated parts of the system. Good encapsulation lets one easily predict how a change to one object will, or will not, impact on other parts of the system. Achieving encapsulation requires the use of common coding techniques: defining immutable value types, avoiding global variables and singletons, copying collections or mutable value objects when storing them in instance variables or returning them from methods, and so forth.
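Two of those techniques, immutable value types and defensive copying, can be sketched as follows in Python. The `Money` and `Portfolio` classes are invented for this example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Money:
    """Immutable value type: fields cannot be reassigned after creation."""
    amount: int
    currency: str


class Portfolio:
    """Keeps its own copy of the holdings so that only its API can change them."""

    def __init__(self, holdings):
        self._holdings = list(holdings)   # copy on the way in

    def holdings(self):
        return list(self._holdings)       # copy on the way out

    def add(self, holding):
        self._holdings.append(holding)


original = [Money(100, "GBP")]
portfolio = Portfolio(original)

original.append(Money(5, "USD"))               # mutating the caller's list...
portfolio.holdings().append(Money(7, "EUR"))   # ...or the returned list...
# ...does not affect the portfolio. Only a call through its API does.
portfolio.add(Money(20, "GBP"))
```

Without the two `list(...)` copies, the caller's list and the portfolio's list would be the same object, and a change made far away in the program could silently alter the portfolio.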

When hiding information it is important that the right information is hidden in the right place. The problems I encountered were caused by information about the environment of the application, which should have been specified at the application scope, being hidden in a lower level class that should have been passed that information, not known it a priori.

The problem I have with using the word "encapsulation" to mean "information hiding" is that encapsulation is always a good thing to do but hiding information in the wrong place is not. Suppose I say:

  • "Let's encapsulate the exact data structure used by the cache in the CachingStockLoader class."
  • "Let's encapsulate the name of the application's log file in the CalculationProgressListener class."

That sounds good to me, but is it really?

I find it much easier to make good decisions when I am clear about when I am doing encapsulation and when I am doing information hiding. For example, I would restate the statements above to be explicit that they really refer to information hiding:

  • "Let's hide the exact data structure used by the cache in the CachingStockLoader class."
  • "Let's hide the name of the application's log file in the CalculationProgressListener class."

That will make me realise that the first decision is correct but the second is suspect. Code that loads stocks should not have to care whether the loader caches previous requests in a hash table or red-black tree or whatever. The name of the application's log file, on the other hand, should probably not be hidden away in that CalculationProgressListener class; it should be specified by the application and passed to instances when they are constructed.
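As a rough sketch of that second point, here is one way the listener could receive the log file name from the application, written in Python. Only the class name comes from the discussion above; the `progress` method, the `main` function and the `calculation.log` file name are assumptions made for the example.

```python
import os
import tempfile


class CalculationProgressListener:
    """Writes progress messages to a log file chosen by the application."""

    def __init__(self, log_path):
        # The path is passed in by the caller rather than hard-coded here.
        self._log_path = log_path

    def progress(self, message):
        with open(self._log_path, "a", encoding="utf-8") as log:
            log.write(message + "\n")


def main(log_dir):
    # Application scope: the one place that knows about the environment.
    log_path = os.path.join(log_dir, "calculation.log")
    listener = CalculationProgressListener(log_path)
    listener.progress("started")
    listener.progress("finished")
    return log_path


# Another application, or a test, can point the listener somewhere else
# without touching the class. Here a temporary directory is used.
with tempfile.TemporaryDirectory() as tmp:
    with open(main(tmp), encoding="utf-8") as log:
        logged = log.read().splitlines()
```

Because the listener no longer knows where the log lives, code that reuses it is free to choose a different location.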

I find it essential to keep the distinction between "encapsulation" and "information hiding" in mind when thinking about design decisions or discussing them when pair programming. When I use the wrong word to think or talk about a design decision I find it harder to realise when the decision is incorrect. When I admit that I am doing information hiding, not encapsulation, I can better decide whether the information I am hiding should be hidden at all and if so, where I should hide it.

Posted on March 17, 2005 [ Permalink | Comments (1) ]