Binstock on Software: programming

Showing posts with label programming. Show all posts

Wednesday, April 23, 2008

Perfecting OO's Small Classes and Short Methods

In The ThoughtWorks Anthology a new book from the Pragmatic Programmers, there is a fascinating essay called “Object Calisthenics” by Jeff Bay. It’s a detailed exercise for perfecting the writing of the small routines that demonstrate characterize good OO implementations. If you have developers who need to improve their ability to write OO routines, I suggest you have a look-see at this essay. I will try to summarize Bay’s approach here.

He suggests writing a 1000-line program with the constraints listed below. These constraints are intended to be excessively restrictive, so as to force developers out of the procedural groove. I guarantee if you apply this technique, their code will move markedly towards object orientation. The restrictions (which should be mercilessly enforced in this exercise) are:

1. Use only one level of indentation per method. If you need more than one level, you need to create a second method and call it from the first. This is one of the most important constraints in the exercise.

2. Don’t use the ‘else’ keyword. Test for a condition with an if-statement and exit the routine if it’s not met. This prevents if-else chaining; and every routine does just one thing. You’re getting the idea.

3. Wrap all primitives and strings. This directly addresses “primitive obsession.” If you want to use an integer, you first have to create a class (even an inner class) to identify it’s true role. So zip codes are an object not an integer, for example. This makes for far clearer and more testable code.

4. Use only one dot per line. This step prevents you from reaching deeply into other objects to get at fields or methods, and thereby conceptually breaking encapsulation.

5. Don’t abbreviate names. This constraint avoids the procedural verbosity that is created by certain forms of redundancy—if you have to type the full name of a method or variable, you’re likely to spend more time thinking about its name. And you’ll avoid having objects called Order with methods entitled shipOrder(). Instead, your code will have more calls such as Order.ship().

6. Keep entities small. This means no more than 50 lines per class and no more than 10 classes per package. The 50 lines per class constraint is crucial. Not only does it force concision and keep classes focused, but it means most classes can fit on a single screen in any editor/IDE.

7. Don’t use any classes with more than two instance variables. This is perhaps the hardest constraint. Bay’s point is that with more than two instance variables, there is almost certainly a reason to subgroup some variables into a separate class.

8. Use first-class collections. In other words, any class that contains a collection should contain no other member variables. The idea is an extension of primitive obsession. If you need a class that’s a subsumes the collection, then write it that way.

9. Don’t use setters, getters, or properties. This is a radical approach to enforcing encapsulation. It also requires implementation of dependency injection approaches and adherence to the maxim “tell, don’t ask.”

Taken together, these rules impose a restrictive encapsulation on developers and force thinking along OO lines. I assert than anyone writing a 1000-line project without violating these rules will rapidly become much better at OO. They can then, if they want, relax the restrictions somewhat. But as Bay points out, there’s no reason to do so. His team has just finished a 100,000-line project within these strictures.

Tuesday, February 19, 2008

Restarting the Platypus And the Lessons Learned

As many of you know, I have spent much of my free time during the last 24 months working on an open-source project called Platypus. The project's goal is to implement a command language like TeX, which enables users to embed formatting commands directly into text and generate documents of typeset quality in PDF, Microsoft Word, and HTML. The aims of Platypus are to be much easier to use than Tex and to provide many features of interest to developers, especially for printing code and listings.

After approximately 20,000 lines of Java code and comments, I have concluded that I need to restart and re-architect the project. The more I code, the more I see that I am adding top floors to a leaning tower. Eventually I'll topple it. So by restarting Platypus, I hope to straighten out architectural shortcomings and deliver a better, more expandable product more quickly.

In the process of coming to this decision, I have been able to crystallize several key lessons, a few of which I could probably have seen foreseen.

PROJECT AND DESIGN LESSONS

1) It's extremely difficult to figure out where your architecture is deficient if you have never done the kind of project you're currently undertaking. The best you can do is layout some basic architecture, abide by good dev practices, and learn as you go. Alas, as in this case, it took 20K lines of work to recognize that the architecture was irretrievably flawed and how.

2) First, do the known hard parts that can't be avoided. In the case of Platypus, I knew from the get-go I wanted a full programming language for the user to employ (the lack of which is one of the major failings of TeX). Early on, I decided that language would be JavaScript (JS). And having decided that and knowing that Java 6 had built-in support for JS, I put the issue aside for later implementation. When I revisited implementing JS, I realized that my command syntax no longer worked well with JS and that some commands would have been better implemented in JS. Had I written code and worked with embedded JS from the beginning, I could have avoided these dissonances, and I would have experienced Platypus more from the perspective of the user.

3) If you have to write a parser, write it just once. I wrote a parser for basic and compound commands, but did not anticipate intricacies of the syntax for very complex commands (think of commands to specify a table, for example). When it came time to add these commands, I found myself undoing a lot of parser work and then trying to back-fit existing syntax in to the new grammar. Parsers are a nightmare to get right, so make sure you write them just once. This means planning all your syntax ahead of time and in great detail.

4) Learn how to present your project crisply. Everyone understands writing a debugger for a hot new language is a cool project that will attract contributors. But projects where there is no immediately identifiable built-in community require crisply articulated messages. I did not do this well.

PROGRAMMING LESSONS:

a) Design Java classes around dependency injection. This will make the classes work better together and help you test them well. I am not saying use a DI framework, which is overkill in many situations, just use the DI pattern.

b) When it comes to modularity for input and output processing, plug-ins are an excellent design model. They form a very convenient way to break projects into sub-projects, and they make it easier for contributors to work on a discrete part of the project.

c) Unit testing delivers extra value when you're writing difficult code. The ability to be deep in the parser and test for some specific semantic occurrence right away is pure gold. Unit testing can also show up design errors. At one point, I was implementing a whole slew of similar commands (relating to output of special characters). I'd made each special character its own class; so each character required: copying a class, renaming it, and changing two strings inside it. Then rinse, lather, repeat. Likewise, my unit tests were just copied, tweaked, and run. This obviously is not a great use of unit tests (what are you actually testing? your ability to tweak a test template?). This was a good clue that the design needs revisiting. And in fact, it did. In the new design, one class will handle all special characters.

OBSERVATIONS ABOUT TOOLS

1) Of all the Java IDEs I've used, I like IntelliJ IDEA best. And since I review IDEs regularly for InfoWorld, I've used lots of them including the high-priced pups. IDEA provides the best user experience, bar none. However, as the number of classes in Platypus climbed towards 200, I noticed IDEA became very slow. Clicking on a file to haul it into an edit window was a tedious process, sometimes taking 30 seconds or more (even with v. 7 of the product). Because of this, I will look elsewhere at restart. I expect to use NetBeans 6 (a hugely improved IDE, by the by) at least initially. We'll see how that works out.

2) I switched from Ant to Maven while working on Platypus. Maven is a much better solution for many build steps than Ant. See my blog post on this. However, I dislike both products. I find that I still have to waste lots of time doing simple configurations. Also, I also don't like using XML for configuring builds. I generally concur with Tapestry's Howard Ship, that Ivy plus some other tool might be a better solution. I'll explore this as I go.

3) Continuous Integration (CI) is a good concept and there are truly great tools out there. But outside of providing historical data about a project, CI's value is limited on a one-developer project. This especially true when builds are already being done on a separate machine and only code that past tests is checked into the repository. (Nonetheless, the historical data is reason enough to continue using it.)

There are surely other lessons to be learned, and as they come to me, I'll post them on this blog if they seem useful.

Words of Thanks

It would be quite wrong to end this post without pausing to deeply thank Jeff Frederick, who was exceedingly generous with his time and insights while I worked on this first phase and who hastened my realization of several important aspects that I've touched on in this post. Thank you!

Tuesday, January 22, 2008

Excellent Explanation of Dependency Injection (Inversion of Control)

I've read lots of explanations of Dependency Injection or DI (formerly known as Inversion of Control) and the associated Hollywood Principle ("Don't call us, we'll call you."). They all tend to be unclear, either because they delve immediately into highly detailed explanations, or they tie the explanation specifically to one particular technology. Such that either the pattern is lost or its simplicity is. Here is clearest explanation I've found--slightly edited for brevity (from the very good Spring in Action, 2nd. Ed. by Craig Walls):

"Any nontrivial application is made up of two or more classes that collaborate with each other to perform some business logic. Traditionally, each object is responsible for obtaining its own references to the objects it collaborates with (its dependencies). When applying DI, the objects are given their dependencies at creation time by some external entity that coordinates each object in the system. In other words, dependencies are injected into objects."

I find that very clear.

Dependency Injection was originally called Inversion of Control (IoC) because the normal control sequence would be the object finds the objects it depends on by itself and then calls them. Here, this is reversed: The dependencies are handed to the object when it's created. This also illustrates the Hollywood Principle at work: Don't call around for your dependencies, we'll give them to you when we need you.

If you don't use DI, you're probably wondering why it's a big deal. It delivers a key advantage: loose coupling. Objects can be added and tested independently of other objects, because they don't depend on anything other than what you pass them. When using traditional dependencies, to test an object you have to create an environment where all of its dependencies exist and are reachable before you can test it. With DI, it's possible to test the object in isolation passing it mock objects for the ones you don't want or need to create. Likewise, adding a class to a project is facilitated because the class is self-contained, so this avoids the "big hairball" that large projects often evolve into.

The challenge of DI is writing an entire application using it. A few classes are no big deal, but a whole app is much more difficult. For entire applications, you frequently want a framework to manage the dependencies and the interactions between objects. DI frameworks are often driven by XML files that help specify what to pass to whom and when. Spring is a full-service Java DI framework; other lighter DI frameworks include NanoContainer and the even more lightweight PicoContainer .

Most of these frameworks have good tutorials to help beginners find their way.

Tuesday, December 11, 2007

Beautiful Code vs. Readable Code

For many years--decades actually--I was a big fan of beautiful code. I read almost everything by Brian Kernighan, Jon Bentley, and P. J. Plauger. This passion for elegant code was an attempt to re-create the rush I felt when I first read:

*x++ = *y++

in the C Programming Language. I'd never seen anything so beautifully succinct. It was luminous!

But as years passed, I read many clever algorithms, many impressive optimizations, many small tricks. And I got less and less charge from each of these discoveries. The reason, quite frankly, is that they almost always fell into one of two categories: some very elegant expressiveness in a new language (Ruby converts from Java can attest to this) or a technique that I'm not likely to ever use. In other words, I was chasing baubles.

In time, my esthetic sense turned to code clarity for its jollies. Today, if I can pick up a blob of complex code, read it in one pass, and accurately understand what it's doing; then I feel the rush again. I most often have this feeling when reading the code of great, non-academic developers. To be honest, when reveling in such moments, I frequently have the perception that my code is not like theirs. Even my best code doesn't quite snap together like theirs does. And I have wondered what I could do to improve my code clarity.

Kent Beck's new book, Implementation Patterns is a short handbook on code clarity. I have read much of it and already I recognize some bad habits that undermine my code's readability. Beck basically looks at typical coding issues and dispenses sage advice.

This means that some recommendations are best suited to beginners, while there are just enough of the other more subtle suggestions to keep the attention of a veteran who cares about clarity. For example, one poor habit I have developed without thinking about it too much is mixing levels of abstraction in the same method. So, for example (using Beck's example):

void process() {
input();
count++;
output();
}

Here the second statement is clearly at a different level of abstraction than the other two, which makes the code harder to read quickly. Beck proposes the following, which I agree is clearer.

void process() {
input();
tally();
output();
}

There are many other habits of mine that this book has illuminated. And in the half a dozen changes it will bring to my style, I think I have derived more benefit than in all the essays I've read on beautiful code.

Before leaving off, however, I should point out two caveats. The beginner-to-intermediate material dominates; so you'll need to skim over large parts of the text to extract the valuable nuggets. (However, this aspect makes it a great gift for junior programmers at your site.) A second point is that the book lacks for good editing. A book on code clarity should be pellucid; this one is not. (Consider the use of the word 'patterns,' which is highly misleading. It's not at all about patterns.) But these are forgivable issues. The book is a useful read if you share my appreciation of clear code.

Tuesday, July 31, 2007

Needed Code Reminders

In a recent blog post, Larry O'Brien, points out that we need more varied types of tags in our code than the overworked TODO. He suggests one that enables you to mark code intended for a future feature (despite YAGNI). Which I understand, but would strongly tend to avoid. I think the XP folks have it right in hewing closely to YAGNI.

But Larry's larger theme comes up for me as well. There are some useful tags I can think of. The most urgent to me is one that you can put on suspected cruft. Something like: CRUFT. This would suggest that maintenance needs to be done to check if this code is in use anywhere, and if not then to delete it. Not just "in use" in the technical sense, but in the sense of doing anything that's still needed.

I've also been hankering for a more graded version of TODO. Such as TODO-NOW and then plain TODO. The latter would mean "todo sometime."

IntelliJ (and presumably other IDEs) enable you to do create these custom tags and have them recognized/processed by the IDE, which is a really helpful.

Thursday, July 12, 2007

A Limitation in Subversion. Any ideas?

Today, I was speaking with an expert at CollabNet (the Subversion folks) about a problem I have with Subversion. To my amazement, he told me Subversion can't handle my (apparently) simple problem.

When I work on Platypus, I start by checking out my code from the public Subversion repository. I put it in my local project directory, and work there. Along the way, I might write code that breaks existing tests. At some point before I get all tests working, I want to check my code into an SCM, so that I can get back to where I am right now, if something should go wrong. Best practices in SCM forbid committing code to a project directory if it breaks tests. So, the way to do what I need, it would seem, is to use a temporary repositorty to commit to until I'm finished coding. Then, when all tests pass, I can commit the code back to the original public Platypus repository.

But Subversion does not permit checking code from one directory into different repositories. The suggestion from CollabNet was to copy the code to a temporary directory from which I could commit it to a separate Subversion repository. This, of course, is a non-starter--you don't keep two live copies of the code you're working on. That makes for all sorts of errors.

Another suggestion was to create a branch in the public repository and commit the small changes to that branch, and eventually merge the branch when all tests pass. I don't like this either, because it forces me to check non-working code into a public repository--the whole thing I want to avoid.

The only solution I've come up with is to use a different SCM (Perforce) for my local repository. Check temporary code into it. Then when everything is copascetic, check the tested code into the public Subversion directory. This works, but it's hardly elegant.

How do you solve this problem or how do you avoid it to begin with? TIA

Monday, May 28, 2007

Comments--How many should you have?

While there is considerable conversation about how many unit tests to write (I have two recent posts on the topic--here and here--based on conversations with vendors), few people have much to say re how many comments there should be. Once the usual themes (more comments, keep comments up-to-date, avoid pointless comments) have been stated, the conversation ends. Everyone understands what relevant and up-to-date comments mean, but few will hazard a guess as to how many of them necessary.

Interestingly, the comment ratio is a key factor in one of the most interesting metrics, the maintainability index (MI). It is also a ratio that is rewarded by Ohloh.net, the emerging site for tracking details of open-source projects. Ohloh gives projects a kudo for a high percentage of comments. The question is how high is high enough to earn the kudo. According to Ohloh, the average open source project runs around 35% comments. Projects in the top third overall get the kudo. I don't know the cut-off for this top third, but I do know Apache's FOP with a 46% ratio of comments definitely qualifies.

Comment count is a metric that is particularly easy to spoof. You could do what some OSS projects do and list all the license terms at the top of each file. I've always disliked scrolling through this pointless legalese. In my file headers, I simply point to the URL of the license, which is sufficient. But all the license boilerplate inflates comment counts amazingly. So does commenting out code and leaving it in the codebase (gag!) or writing Javadoc for simple getters and setters (also gag).

But where legitimate comments are concerned, 35% is probably a good working number to shoot for. I find many OSS projects at this ratio have quite readable code. I'll be working to bring my own codebase up to this level.

Friday, May 18, 2007

Groovy Gaining Traction

Java developers suddenly have a wealth of choices when it comes to dynamic languages that run on the JVM. There's JavaFX, which Sun announced at JavaOne this year, and JRuby, which Sun expects to complete sometime this year, and then, of course, there's my favorite: Groovy. Groovy makes writing Java programs far easier. It essentially takes Java and removes the syntactical cruft, leaving a neat language that makes you terrifically productive.

Because Groovy took a long time getting out of the gate, it's taken some licks in the press. However, it's clear that Java developers are catching on to its benefits. The JavaOne bookstore published its daily top-10 sales during the show. The picture on this post, shows the Day 2 list with two Groovy titles in the top 10 (at places 5 and 8). Overall, the Groovy bible, Groovy in Action, came in at number 5 for the show. Interest is definitely growing.

If you haven't tried Groovy yourself, it's definitely worth a look. Here are a couple of good overviews:

The Groovy Getting Started Guide
Andrew Glover's Overview and Tutorial on DeveloperWork

Wednesday, May 16, 2007

Unit Testing Private Variables and Functions

How do you write unit tests to exercise private functions and check on private variables? For my projects, I have relied on a technique of adding special testing-only methods to my classes. These methods all have names that begin with FTO_ (for testing only). My regular code may not call these functions. Eventually, I'll write a rule that code-checkers can enforce to make sure that these violations of data hiding don't accidentally appear in non-test code.

However, for a long time I've wanted to know if there is a better way to do this. So, I did what most good programmers do--I asked someone who knows testing better than I do. Which meant talking to the ever-kind Jeff Frederick, who is the main committer of the popular CI server Cruise Control (and the head of product development at Agitar).

Jeff contended that the problem is really one of code design. If all methods are short and specific, then it should be possible to test a private variable by probing the method that uses it. Or said another way: if you can't get at the variable to test it, chances are it's buried in too much code. (Extract Method, I have long believed, is the most important refactoring.)

Likewise private methods. Make 'em small, have them do only one thing, and call them from accessible methods.

I've spent a week noodling around with this sound advice. It appeals to me because almost invariably when I refactor code to make it more testable, I find that I've improved it. So far, Jeff is mostly right. I can eliminate most situations by cleaning up code. However, there are a few routines that look intractable. While I work at find a better way to refactor them (a constant quest of mine, actually), I am curious to know how you solve this problem.

Saturday, January 27, 2007

One activity that is inherently productive: unit testing

In a post today in his blog, Larry O'Brien states: "...let's be clear that of all the things we know about software development, there are only two things that we know to be inherently highly productive: Well-treated talented programmers and iterative development incorporating client feedback"

I find it hard not to add unit testing to this list.

Of all the things that have changed in how I program during the last six or seven years, nothing comes close to unit testing in terms of making me more productive. In addition, it has made me a better programmer, because as I write code I am thinking about how to test it. And the changes that result are almost always improvements.

Binstock on Software