Binstock on Software: testing

Showing posts with label testing. Show all posts

Tuesday, November 02, 2021

Jacobin JVM project after three months

Development on Jacobin, the JVM written in go that supports Java 11, has been proceeding rapidly. In the 100 days since the beginning of the project, there have been 314 pushed commits. I'll give more stats below. Here's where we stand:

Jacobin can read, parse, format check, and load class files. This process happens very quickly. For example, running all these steps on one of the largest classes in the JDK distribution, BigDecimal.class, takes just 2ms. When parsed, BigDecimal has 1567 entries in its constant pool, 37 fields, and 167 methods. That's a huge class!

When a class is loaded by Jacobin or any other JVM, it necessarily pulls in other classes to be loaded. For example, all classes run from the command-line have a superclass. Often, that superclass is java.lang.Object, which depends on other classes. Among these are java.lang.Class and java.lang.String; various I/O classes are needed as well. The OpenJDK-based JVMs (essentially, all JVMs except IBM's J9 and some embeddable VMs) address this need by preloading hundreds of widely used classes at JVM start-up. For a look at the list of all the classes loaded just to display the JVM version info, run this from the command line:

java -verbose:class -version

On my Java 11 test system, this command preloads 381 classes (in 347ms!) While Jacobin does not need as many classes loaded to run the specified class, it needs a subset of them. The next step in the project is to identify the required classes and load them quickly. To this end, loading opertions (parsing and format checking) will need to be done in parallel. Fortunately, one of the go language's strengths is a rich set of easy-to-use resources for precisely this kind of concurrent operation.

After this task is completed, work will begin on execution.

Testing Thoroughly

One of the principal goals of Jacobin is to be a reliable JVM. This requires disciplined work in the planning, development, and testing. Development is based entirely in tasks which are logged in a cloud-instance of JetBrains' excellent tool, YouTrack (graciously provide for free). You can see the presence of this tracking, in that every commit on GitHub starts with the corresponding task name. (Presently, the most recent task is JACOBIN-89.) Quality of the code is reviewed by automatic linters on GitHub. Currently, the code merits an A+. The goreport badge on the jacobin GitHub project, takes you to the most recent report.

Testing is done on a near-fanatical basis. Let me explain:

In 2005, I was a contractor with Agitar, a now-shuttered company that made a tool which would read a Java codebase and generate unit tests for missing areas of coverage. It worked great. In conversations with their sales engineers, they told me they used a back-of-the-envelope calculation to assess a company's commitment to testing. They compared the size of the test codebase to the production code. If the test codebase was 50% the size, the company had some commitment to testing. Over 80% was a clear and strong commitment to testing, and over 100% meant a deeply engrained testing culture.

The current code base of Jacobin consists of 8,342 lines (includes: code, comments, blank lines). Of those, 4,718 lines are in tests. That is, the testing codebase is 130.2% the size of the production code. The goal is to get that ratio even higher. Future quarterly updates will reveal our success in this effort.

Want to help?

It's always great to know a project is interesting to others. If Jacobin is interests you and you want to encourage its progress, a GitHub star is our preference. If you want to participate more directly, let me know in the comments, which are kept private. We also love code reviews, suggestions, and later on, we'll surely need folks to do testing. Whatever your interest, thanks for your time!

Wednesday, May 16, 2007

Unit Testing Private Variables and Functions

How do you write unit tests to exercise private functions and check on private variables? For my projects, I have relied on a technique of adding special testing-only methods to my classes. These methods all have names that begin with FTO_ (for testing only). My regular code may not call these functions. Eventually, I'll write a rule that code-checkers can enforce to make sure that these violations of data hiding don't accidentally appear in non-test code.

However, for a long time I've wanted to know if there is a better way to do this. So, I did what most good programmers do--I asked someone who knows testing better than I do. Which meant talking to the ever-kind Jeff Frederick, who is the main committer of the popular CI server Cruise Control (and the head of product development at Agitar).

Jeff contended that the problem is really one of code design. If all methods are short and specific, then it should be possible to test a private variable by probing the method that uses it. Or said another way: if you can't get at the variable to test it, chances are it's buried in too much code. (Extract Method, I have long believed, is the most important refactoring.)

Likewise private methods. Make 'em small, have them do only one thing, and call them from accessible methods.

I've spent a week noodling around with this sound advice. It appeals to me because almost invariably when I refactor code to make it more testable, I find that I've improved it. So far, Jeff is mostly right. I can eliminate most situations by cleaning up code. However, there are a few routines that look intractable. While I work at find a better way to refactor them (a constant quest of mine, actually), I am curious to know how you solve this problem.

Monday, April 30, 2007

How many unit tests per method? A rule of thumb

The other day, I met with Burke Cox who heads up Stelligent, a company that specializes in helping sites set up their code-quality infrastructure (build systems, test frameworks, code coverage analysis, continuous integration--the whole works, all on a fixed-price basis). One thing Stelligent does before leaving is to impart some of the best practices they've developed over the years.

A best practice Stelligent uses for determining the number of unit tests to write for a given method struck me as completely original. The number of tests is based on the method's cyclomatic complexity (aka McCabe complexity). This complexity metric starts at 1 and adds 1 for every path the code can take. Many tools today generate cyclomatic complexity counts for methods. Stelligent's rule of thumb is:

complexity 1-3: no test (likely the method is a getter/setter)
complexity 3-10: 1 test
complexity 11+: number of tests = complexity / 2

I like this guide, but would change one aspect. I think a cyclomatic complexity of 10 should have more than 1 test. I'd be more inclined to go with: complexity 3-10: # of tests = complexity / 3.

Note: you have to be careful with cyclomatic complexity. I recently wrote a switch statement that had 40 cases. Technically, that's a complexity measure of 40. Obviously, it's pointless to write lots of unit tests for 40 case statements that differ trivially. But, when the complexity number derives directly from the logic of the method, I think Stelligent's rule of thumb (with my modification) is an excellent place to start.

Monday, April 23, 2007

Effectiveness of Pair-wise Tests

The more I use unit testing, the more I wander into areas that are more typically the province of true testers (rather than of developers). One area I frequently visit is the problem of combinatorial testing, which is how to test code where there is a large number of possible values it must handle. Let's say, I have a function with four parameters that are all boolean. There are, therefore, 16 possible combos. My temptation is to write 16 unit tests. But the concept of pairwise testing argues against this test-every-permutation approach. It is based on the belief that most bugs occur between the interaction of pairs of values (rather than a specific configuration of three, or four of them). So, pair-wise experts look at the 16 possible values my switches can have and choose the minimum number in which all pairs of values have been exercised. It turns out there are 5 tests that will exercise every pair of switch combinations.

The question I've wondered about is if I write those five unit tests, rather than the more ambitious 16 tests, what have I given up? The answer is: not much. At the recent Software Test & Performance Conference, I attended a session by BJ Rollison who heads up TestingMentor.com when he's not drilling Microsoft's testers in the latest techniques. He provided some interesting figures from Microsoft's analysis of pair-wise testing.

For attrib.exe (which takes a path + 6 optional args), he did minimal testing, pairwise, and comprehensive testing with the following results:

Minimal: 9 tests, 74% code coverage, 358 code blocks covered.
Pairwise: 13 tests, 77% code coverage, 370 code blocks covered
Maximal: 972 tests, 77% code coverage, 370 code blocks covered

A similar test exploration with findstr.exe (which takes a string + 19 optional args) found that pairwise testing via 136 tests covered 74% of the app, while maximal coverage consisting of 3,533 tests covered 76% of the app.

These numbers make sense. Surely, if you test a key subset of pairs possibilities, testing additional combinations is not likely to exercise completely different routines, so code coverage should not increase much for the tests that exceed pair-wise recommendations. What surprised me was that pairwise got such high-numbers to begin with. 70+ % is pretty decent coverage.

From now on, pair-wise testing will be part of my unit-testing design toolbox. For a list of tools that can find the pairs to test, see here. Rollison highly recommended Microsoft's free PICT tool (see previous link), which also provides a means to specify special relationships between the various factors in the combinations.

Thursday, March 22, 2007

Characterization Tests

In my current column in SD Times, I discuss characterization tests, which are an as-yet little discussed form of testing. They are unit tests whose purpose is not to validate the correctness of code but to capture in tests the behavior of existing code. The idea, first popularized in Michael Feathers' excellent volume Working Effectively with Legacy Code, is that the tests can reveal the scope of any changes you make to a codebase.

You write comprehensive tests of the codebase before you touch a line. Then, you make your changes to the code, and rerun the tests. The failing tests reveal the dependencies on the code you modified. You clean up the failing tests and now they reflect the modified code base. Then, rinse, lather, repeat.

The problem historically with this approach is that writing the tests is a colossal pain, especially on large code bases. Moreover, large changes in the code break so many tests that no one wants to go back and redo 150 tests, say, just to capture the scope of changes. And so frequently, the good idea falls by the way side--with unfortunate effects when it comes time for functionally testing the revised code.To the rescue comes Agitar with a free service called JunitFactory. Get a free license and send them your Java code they will send you back the characterization tests for it. Cool idea. Change your code, run the tests, verify that nothing unexpected broke. And then have JUnitFactory re-create a complete set of characterization tests. How cool is that?

Actually, I use Agitar's service not only for characterization but for my program as I am developing it. The tests always show me unit tests I never thought of writing. Try it out. You'll like it.

Thursday, March 01, 2007

How Many Unit Tests Are Enough?

Recently, I was down visiting the folks at Agitar, who make great tools for doing unit testing. Going there always results in interesting conversations because they really live and breathe unit testing and are always finding new wrinkles in how to apply the technique. During one conversation, they casually threw out a metric for unit testing that I'd never heard before. It answers the question of: How many unit tests are enough? You'll note that pretty much all the books and essays on unit testing go to great pains to avoid answering this question, for fear, I presume, that by presenting any number they will discourage developers from writing more. Likewise, if the suggested number is high, they risk discouraging developers who will see the goal as overwhelming and unreachable.

The engineers at Agitar, however, did offer a number (as a side comment to an unrelated point). They said (I'm paraphrasing) that if the amount of test code equals the amount of program code, you're generally in pretty good shape. Their experience shows that parity of the two codebases translates into code coverage of around 70%, which means a well-tested app. Needless to say, they'd probably want to qualify this statement, but as a rule of thumb, I think it's a very practical data point--even if a bit ambitious.

I will see how it works out on my open-source projects. Currently, my ratio of test-code to app-code is (cough, cough) just over 42%. I guess I know what I'll be doing this weekend.

Saturday, January 27, 2007

One activity that is inherently productive: unit testing

In a post today in his blog, Larry O'Brien states: "...let's be clear that of all the things we know about software development, there are only two things that we know to be inherently highly productive: Well-treated talented programmers and iterative development incorporating client feedback"

I find it hard not to add unit testing to this list.

Of all the things that have changed in how I program during the last six or seven years, nothing comes close to unit testing in terms of making me more productive. In addition, it has made me a better programmer, because as I write code I am thinking about how to test it. And the changes that result are almost always improvements.

Binstock on Software