Showing posts with label Apache Commons. Show all posts

Monday, August 20, 2018

Apache Commons ArrayUtils.toString(Object) versus JDK Arrays.toString(Object)

Apache Commons Lang provides an ArrayUtils class that includes the method toString(Object) that "Outputs an array as a String." In this post, I look at situations in which this method might still be useful when the JDK provides the Arrays.toString(Object[]) method [and several overloaded versions of that method on the Arrays class for arrays of primitive types].

At one point, a reason to use the Apache Commons Lang ArrayUtils.toString(Object) method might have been that there was no alternative provided by the JDK. Arrays.toString(Object[]) was introduced with J2SE 5 (late 2004), whereas Apache Commons Lang has had ArrayUtils.toString(Object) since at least Lang 2.0 (late 2003). Although there's just over a year's difference between those releases, many organizations are more willing to upgrade libraries than JDK versions, so it's possible some organizations elected to use the Lang API because they had not yet adopted JDK 5 even after its General Availability release. By today, however, it's likely that only a very small percentage of Java deployments are not using JDK 5 or later, so this no longer seems to be a reason to use the Apache Commons Lang method instead of the JDK method in newly written code.

Another reason Apache Commons Lang's ArrayUtils.toString(Object) might be selected over the JDK's Arrays.toString(Object[]) is the format of the string constructed from the array of elements. This doesn't seem like a very compelling motivation because the respective outputs are not that different.

For my examples in this post, I'm assuming an array of Strings defined like this:

/** Array of {@code String}s used in these demonstrations. */
private static final String[] strings
   = {"Dustin", "Inspired", "Actual", "Events"};

When the array defined above is passed to the JDK's Arrays.toString(Object[]) and to Apache Commons Lang's ArrayUtils.toString(Object), each method generates its own String representation of the array. The two representations are compared below.

JDK Arrays.toString(Object[]) vs. ACL ArrayUtils.toString(Object)
Comparing Single String Output of Typical Java Array

Input Array | JDK Arrays.toString(Object[]) | Apache Commons Lang ArrayUtils.toString(Object)
{"Dustin", "Inspired", "Actual", "Events"} | [Dustin, Inspired, Actual, Events] | {Dustin,Inspired,Actual,Events}

The table illustrates that both methods' generated Strings are very similar in substantive terms, but there are cosmetic differences in their output. The JDK version surrounds the array contents of the generated string with square brackets while the Apache Commons Lang version surrounds the array contents with curly braces. The other obvious difference is that the JDK delimits the array elements with a comma and a space while the Apache Commons Lang representation delimits them with just a comma and no space.
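The cosmetic difference can be reproduced with the JDK alone. The following is a minimal sketch: jdkStyle simply calls Arrays.toString(Object[]), while curlyStyle is a hypothetical helper that imitates the curly-brace, comma-only layout with String.join (Java 8 or later assumed). It is not the Apache Commons Lang implementation, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

public class ArrayToStringFormats {
    /** Same array of Strings used in the post's demonstrations. */
    static final String[] STRINGS = {"Dustin", "Inspired", "Actual", "Events"};

    /** JDK rendering: square brackets, elements separated by a comma and a space. */
    static String jdkStyle(String[] array) {
        return Arrays.toString(array);
    }

    /**
     * Hypothetical JDK-only imitation of the curly-brace layout:
     * curly braces, elements separated by a comma only.
     * This is an illustration, not Apache Commons Lang code.
     */
    static String curlyStyle(String[] array) {
        return "{" + String.join(",", array) + "}";
    }

    public static void main(String[] args) {
        System.out.println(jdkStyle(STRINGS));
        System.out.println(curlyStyle(STRINGS));
    }
}
```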

If the Apache Commons Lang ArrayUtils.toString(Object) allowed the "style" of its output to be customized, that might strengthen the argument that its style of representation is an advantage. However, as can be seen in the method's implementation, it always uses ToStringStyle.SIMPLE_STYLE.

Another minor difference between the two approaches being discussed here for presenting a Java array as a single String representation is the handling of null passed to the methods. Both methods return a non-null, non-empty String when passed null, but the contents of that string differ depending on the implementation invoked. The JDK's Arrays.toString(Object[]) returns the string "null" when null is passed to it while the Apache Commons Lang's ArrayUtils.toString(Object) returns the string "{}" when null is passed to it.

The "{}" returned by ArrayUtils.toString(Object) is easy to understand and in some ways is more aesthetically pleasing for presenting the string version of null provided for an array. However, it could be argued that the "{}" implies an empty array instead of a null. The Apache Commons Lang version does indeed return the same "{}" string for an empty array as well (and that matches exactly how one would declare an empty array with an array initializer). The JDK's Arrays.toString(Object[]) method provides the string "null" for null input and provides "[]" for an empty array input.

It could be argued that the JDK approach of presenting the string version of a null array parameter as "null" is more consistent with what a Java developer might expect given other similar situations in which a String representation of null is provided. Both the implicit String conversion of null (see Section 5.1.11 of Java SE 10 Language Specification for more details) and the String returned by calling String.valueOf(Object) on a null parameter present the string "null". The implicit String conversion of null for an array type results in the "null" string as well.
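The JDK behaviors described in the last few paragraphs can be checked with a short sketch. The class and method names here are hypothetical; only standard JDK calls are used.

```java
import java.util.Arrays;

public class NullArrayStrings {
    /** Arrays.toString(Object[]) given a null array reference. */
    static String nullArray() {
        return Arrays.toString((Object[]) null);
    }

    /** Arrays.toString(Object[]) given an empty array. */
    static String emptyArray() {
        return Arrays.toString(new String[0]);
    }

    /** String.valueOf(Object) given null. */
    static String valueOfNull() {
        return String.valueOf((Object) null);
    }

    /** Implicit String conversion of a null array reference. */
    static String concatenatedNull() {
        String[] none = null;
        return "" + none;
    }

    public static void main(String[] args) {
        System.out.println(nullArray());        // null
        System.out.println(emptyArray());       // []
        System.out.println(valueOfNull());      // null
        System.out.println(concatenatedNull()); // null
    }
}
```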

Another difference between ArrayUtils.toString(Object) and Arrays.toString(Object[]) is the type of the parameter expected by each method. The ArrayUtils.toString(Object) implementation expects an Object and so accepts just about anything one wants to provide to it. The JDK's Arrays.toString(Object[]) requires an array (or null); non-array types cannot be passed to it. It's debatable which approach is better, but I generally prefer more strictly typed APIs that only allow what they advertise (which helps enforce their contract). In this case, because the desired functionality is to pass in an array and have a String representation of that array returned, I prefer the more definitively typed method that expects an array. On the other hand, one might argue for the method that accepts a general Object because then any arbitrary object (such as a Java collection) can be passed to the method.

In general, I don't like the idea of using a method on a class called ArrayUtils to build a String representation of anything other than an array. I have seen this method used on Java collections, but that is unnecessary as Java collections already provide reasonable toString() implementations (arrays cannot override Object's toString() and that's why they require these external methods to do so for them). It's also unnecessary to use ArrayUtils.toString(Object) to ensure that a null is handled without a NullPointerException because Objects.toString(Object) and String.valueOf(Object) handle that scenario nicely and don't pretend to be an "array" method (in fact, they don't help much with arrays).
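As a rough illustration of the point above, the following sketch (hypothetical class and method names, JDK calls only) shows that a collection's own toString() is already readable, that Objects.toString handles null safely, and that an array's inherited toString() is the unhelpful case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class ToStringWithoutArrayUtils {
    /** Collections already provide a readable toString(). */
    static String listText() {
        List<String> list = Arrays.asList("Dustin", "Inspired");
        return list.toString();
    }

    /** Objects.toString(Object) returns "null" rather than throwing. */
    static String nullText() {
        return Objects.toString(null);
    }

    /** Objects.toString(Object, String) substitutes a caller-chosen default for null. */
    static String nullWithDefault() {
        return Objects.toString(null, "<none>");
    }

    /** Arrays inherit Object's toString(): type name, '@', and a hexadecimal hash. */
    static String rawArrayText() {
        String[] array = {"Dustin"};
        return array.toString();
    }

    public static void main(String[] args) {
        System.out.println(listText());
        System.out.println(nullText());
        System.out.println(nullWithDefault());
        System.out.println(rawArrayText());
    }
}
```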

The difference in parameter expected by each implementation that provides a String representation of a provided array leads into the motivation that I believe is most compelling for choosing the third party library-provided ArrayUtils.toString(Object) over the built-in Arrays.toString(Object[]), but it's for one specific case that this is a significant advantage: multi-dimensional Java arrays. The JDK's Arrays.toString(Object[]) is designed for a single-dimensional Java array only. The Apache Commons Lang ArrayUtils.toString(Object), however, nicely supports presenting a single String representation even of multi-dimensional Java arrays. Its method-level Javadoc advertises this advantage: "Multi-dimensional arrays are handled correctly, including multi-dimensional primitive arrays." To illustrate the differences in these methods' output for a multi-dimensional array, I'll be using this ridiculously contrived example:

/** Two-dimensional array of {@code String}s used in demonstrations. */
private static final String[][] doubleDimStrings
   = {{"Dustin"}, {"Inspired", "Actual", "Events"}};

The output from passing that two-dimensional array of Strings to the respective methods is shown in the following table.

JDK Arrays.toString(Object[]) vs. ACL ArrayUtils.toString(Object)
Comparing Single String Output of Two-Dimensional Array

Input Array | JDK Arrays.toString(Object[]) | Apache Commons Lang ArrayUtils.toString(Object)
{{"Dustin"}, {"Inspired", "Actual", "Events"}} | [[Ljava.lang.String;@135fbaa4, [Ljava.lang.String;@45ee12a7] | {{Dustin},{Inspired,Actual,Events}}

The table just shown demonstrates that the JDK's Arrays.toString() is not particularly helpful once a Java array has more than a single dimension. The Apache Commons Lang's ArrayUtils.toString(Object) is able to present a nice single String representation even of the multi-dimensional array.
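The JDK column of that table can be reproduced with a short sketch: each element of the outer array is itself a String[], so Arrays.toString(Object[]) falls back to each inner array's inherited Object.toString(). The sketch also calls Arrays.deepToString(Object[]), a related JDK method (also present since J2SE 5) that is not part of the comparison in this post; it is included only so readers without Commons Lang on the classpath can see a JDK-only rendering of nested arrays. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class TwoDimArrayStrings {
    /** Same two-dimensional array used in the post's demonstration. */
    static final String[][] DOUBLE_DIM = {{"Dustin"}, {"Inspired", "Actual", "Events"}};

    /** Shallow rendering: inner arrays appear as type name, '@', and a hex hash. */
    static String shallow() {
        return Arrays.toString(DOUBLE_DIM);
    }

    /** Deep rendering: inner arrays are expanded element by element. */
    static String deep() {
        return Arrays.deepToString(DOUBLE_DIM);
    }

    public static void main(String[] args) {
        System.out.println(shallow()); // hash values vary from run to run
        System.out.println(deep());    // [[Dustin], [Inspired, Actual, Events]]
    }
}
```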

I have intentionally steered clear of comparing the two alternatives covered in this post in terms of performance because I have rarely found the performance difference of these types of methods to matter in my daily work. However, if this functionality was needed in a case where every millisecond counted, then it might be worth trying each in realistic scenarios to choose the one that works best. My intuition tells me that the JDK implementation would generally perform better (especially if working with arrays of primitives and able to use one of Arrays's overloaded toString() methods intended for primitives), but my intuition has been wrong before when it comes to questions of performance.

The following table summarizes my post's discussion on characteristics of Apache Commons Lang's (version 3.7) ArrayUtils.toString(Object) and the JDK's (JDK 10) Arrays.toString(Object[]).

JDK Arrays.toString(Object[]) vs. ACL ArrayUtils.toString(Object)

Input Type | JDK Arrays.toString(Object[]) | Apache Commons Lang ArrayUtils.toString(Object)
Single-Dimension Array | "[Dustin, Inspired, Actual, Events]" | "{Dustin,Inspired,Actual,Events}"
Double-Dimension Array | "[[Ljava.lang.String;@135fbaa4, [Ljava.lang.String;@45ee12a7]" | "{{Dustin},{Inspired,Actual,Events}}"
null | "null" | "{}"
Empty Single-Dimension Array | "[]" | "{}"

This post has looked at some possible motivations for choosing the third-party Apache Commons Lang's ArrayUtils.toString(Object) over the built-in JDK's Arrays.toString(Object[]) for generating single String representations of arrays. I find the most obvious situation to choose this particular third-party library over the built-in alternative is for multi-dimensional arrays.

Monday, December 23, 2013

Determining Presence of Characters or Integers in String with Guava CharMatcher and Apache Commons Lang StringUtils

A recent Reddit post asked the question, "Is there a predefined method for checking if a variable value contains a particular character or integer?" That question-based title was also asked a different way, "A method or quick way for checking if a variable contains any numbers say or ('x',2,'B') like a list?" I am not aware of any single method call within the standard SDK libraries to do this (other than using a carefully designed regular expression), but in this post I answer those questions using Guava's CharMatcher and Apache Commons Lang's StringUtils class.

Java's String class does have a contains method that can be used to determine if a single character is contained in that String or if a certain explicitly specified sequence of characters is contained in that String. However, I'm not aware of any way in a single executable statement (not counting regular expressions) to ask Java if a given String contains any of a specified set of characters without needing to contain all of them or contain them in the specified order. Both Guava and Apache Commons Lang do provide mechanisms for just this thing.
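For comparison, here is a JDK-only sketch of the "contains any of these characters" check. It assumes Java 8 or later for the stream-based variant, which was released after this post was written; the loop variant works on older versions. The class and method names are hypothetical and are not part of either library.

```java
public class ContainsAnyWithJdk {
    /**
     * Stream-based check (Java 8+ assumed): true if any character of
     * {@code candidate} also appears somewhere in {@code characters}.
     */
    static boolean containsAny(String candidate, String characters) {
        return candidate.chars().anyMatch(c -> characters.indexOf(c) >= 0);
    }

    /** Equivalent loop for JDKs without streams. */
    static boolean containsAnyByLoop(String candidate, String characters) {
        for (int i = 0; i < candidate.length(); i++) {
            if (characters.indexOf(candidate.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String candidate = "Inspired by Actual Events";
        System.out.println(containsAny(candidate, "dA")); // true
        System.out.println(containsAny(candidate, "dQ")); // true
        System.out.println(containsAny(candidate, "Q"));  // false
    }
}
```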

Apache Commons Lang (version 3.1 used in this post) provides overloaded StringUtils.containsAny methods that easily accomplish this request. Both overloaded versions expect the first parameter passed to them to be the String (or more precisely, the CharSequence) to be tested to see if it contains a given letter or integer. The first overloaded version, StringUtils.containsAny(CharSequence, char...) accepts zero or more char elements to be tested to see if any of them are in the String represented by the first argument. The second overloaded version, StringUtils.containsAny(CharSequence, CharSequence) expects the second argument to contain all the potential characters to be searched for in the first argument as a single sequence of characters.

The following code listing demonstrates using this Apache Commons Lang approach to determine if a given string contains certain characters. All three statements will pass their assertions because "Inspired by Actual Events" does include 'd' and 'A', but not 'Q'. Because it is only necessary for any one of the provided characters to be present to return true, the first two assertions of true pass. The third assertion passes because the string does NOT contain the only provided letter and so the negative is asserted.

Determining String Contains A Character with StringUtils
private static void demoStringContainingLetterInStringUtils()
{
   assert StringUtils.containsAny("Inspired by Actual Events", 'd', 'A');  // true: both contained
   assert StringUtils.containsAny("Inspired by Actual Events", 'd', 'Q');  // true: one contained
   assert !StringUtils.containsAny("Inspired by Actual Events", 'Q');      // true: none contained (!)
}

Guava's CharMatcher can also be used in a similar manner as demonstrated in the next code listing.

Determining String Contains A Character with CharMatcher
private static void demoStringContainingLetterInGuava()
{
   assert CharMatcher.anyOf("Inspired by Actual Events").matchesAnyOf(new String(new char[]{'d', 'A'}));
   assert CharMatcher.anyOf("Inspired by Actual Events").matchesAnyOf(new String (new char[] {'d', 'Q'}));
   assert !CharMatcher.anyOf("Inspired by Actual Events").matchesAnyOf(new String(new char[]{'Q'}));
}

What if we specifically want to make sure at least one character in a given String/CharSequence is a numeric digit, but we cannot be guaranteed that the entire string is numeric? The same approach as used above with Apache Commons Lang's StringUtils can be applied here with the only change being that the provided characters to be matched are the numeric digits 0 through 9. This is shown in the next code listing.

Determining String Contains a Numeral with StringUtils
private static void demoStringContainingNumericDigitInStringUtils()
{
   assert !StringUtils.containsAny("Inspired by Actual Events", "0123456789");
   assert StringUtils.containsAny("Inspired by Actual Events 2013", "0123456789");
}

Guava's CharMatcher has a really slick way of expressing this question of whether a provided sequence of characters includes at least one numeral. This is shown in the next code listing.

Determining String Contains a Numeral with CharMatcher
private static void demoStringContainingNumericDigitInGuava()
{
   assert !CharMatcher.DIGIT.matchesAnyOf("Inspired by Actual Events");
   assert CharMatcher.DIGIT.matchesAnyOf("Inspired by Actual Events 2013");
}

CharMatcher.DIGIT provides a concise and expressive approach to specifying that we want to match a digit. Fortunately, CharMatcher provides numerous other public fields similar to DIGIT for convenience in determining if strings contain other types of characters.
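A JDK-only counterpart to the digit check can be sketched as follows, assuming Java 8 or later for CharSequence.chars(). The class and method names are hypothetical. One caveat: Character.isDigit accepts any Unicode decimal digit, so it is broader than the explicit "0123456789" set used with StringUtils above.

```java
public class ContainsDigitWithJdk {
    /** True if at least one character is a digit according to Character.isDigit. */
    static boolean containsDigit(CharSequence candidate) {
        return candidate.chars().anyMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(containsDigit("Inspired by Actual Events"));      // false
        System.out.println(containsDigit("Inspired by Actual Events 2013")); // true
    }
}
```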

For completeness, I have included the single class containing all of the Guava and Apache Commons Lang examples above in the next code listing. This class's main() function can be run with the -enableassertions (or -ea) flag set on the Java launcher and will complete without any AssertionErrors.

StringContainsDemonstrator.java
package dustin.examples.strings;

import com.google.common.base.CharMatcher;
import static java.lang.System.out;

import org.apache.commons.lang3.StringUtils;

/**
 * Demonstrate Apache Commons Lang StringUtils and Guava's CharMatcher support
 * for determining if a particular character or set of characters or integers
 * is contained within a given String.
 * 
 * This class's tests depend on asserts being enabled, so specify the JVM option
 * -enableassertions (-ea) when running this example.
 * 
 * @author Dustin
 */
public class StringContainsDemonstrator
{
   private static final String CANDIDATE_STRING = "Inspired by Actual Events";
   private static final String CANDIDATE_STRING_WITH_NUMERAL = CANDIDATE_STRING + " 2013";
   private static final char FIRST_CHARACTER = 'd';
   private static final char SECOND_CHARACTER = 'A';
   private static final String CHARACTERS = new String(new char[]{FIRST_CHARACTER, SECOND_CHARACTER});
   private static final char NOT_CONTAINED_CHARACTER = 'Q';
   private static final String NOT_CONTAINED_CHARACTERS = new String(new char[]{NOT_CONTAINED_CHARACTER});
   private static final String MIXED_CONTAINED_CHARACTERS = new String (new char[] {FIRST_CHARACTER, NOT_CONTAINED_CHARACTER});
   private static final String NUMERIC_CHARACTER_SET = "0123456789";

   private static void demoStringContainingLetterInGuava()
   {
      assert CharMatcher.anyOf(CANDIDATE_STRING).matchesAnyOf(CHARACTERS);
      assert CharMatcher.anyOf(CANDIDATE_STRING).matchesAnyOf(MIXED_CONTAINED_CHARACTERS);
      assert !CharMatcher.anyOf(CANDIDATE_STRING).matchesAnyOf(NOT_CONTAINED_CHARACTERS);
   }

   private static void demoStringContainingNumericDigitInGuava()
   {
      assert !CharMatcher.DIGIT.matchesAnyOf(CANDIDATE_STRING);
      assert CharMatcher.DIGIT.matchesAnyOf(CANDIDATE_STRING_WITH_NUMERAL);
   }

   private static void demoStringContainingLetterInStringUtils()
   {
      assert StringUtils.containsAny(CANDIDATE_STRING, FIRST_CHARACTER, SECOND_CHARACTER);
      assert StringUtils.containsAny(CANDIDATE_STRING, FIRST_CHARACTER, NOT_CONTAINED_CHARACTER);
      assert !StringUtils.containsAny(CANDIDATE_STRING, NOT_CONTAINED_CHARACTER);
   }

   private static void demoStringContainingNumericDigitInStringUtils()
   {
      assert !StringUtils.containsAny(CANDIDATE_STRING, NUMERIC_CHARACTER_SET);
      assert StringUtils.containsAny(CANDIDATE_STRING_WITH_NUMERAL, NUMERIC_CHARACTER_SET);
   }

   /**
    * Indicate whether assertions are enabled.
    * 
    * @return {@code true} if assertions are enabled or {@code false} if
    *    assertions are not enabled (are disabled).
    */
   private static boolean areAssertionsEnabled()
   {
      boolean enabled = false; 
      assert enabled = true;
      return enabled;
   }

   /**
    * Main function for running methods to demonstrate Apache Commons Lang
    * StringUtils and Guava's CharMatcher support for determining if a particular
    * character or set of characters or integers is contained within a given
    * String.
    * 
    * @param args Command line arguments; none expected.
    */
   public static void main(String[] args)
   {
      if (!areAssertionsEnabled())
      {
         out.println("This class cannot demonstrate anything without assertions enabled.");
         out.println("\tPlease re-run with assertions enabled (-ea).");
         System.exit(-1);
      }

      out.println("Beginning demonstrations...");
      demoStringContainingLetterInGuava();
      demoStringContainingLetterInStringUtils();
      demoStringContainingNumericDigitInGuava();
      demoStringContainingNumericDigitInStringUtils();
      out.println("...Demonstrations Ended");
   }
}

Guava and Apache Commons Lang are very popular with Java developers because of the methods they provide beyond what the SDK provides that Java developers commonly need. In this post, I looked at how Guava's CharMatcher and Apache Commons Lang's StringUtils can be used to concisely but expressively test to determine if any of a set of specified characters exists within a provided string.

Monday, March 19, 2012

ToString: Hexadecimal Representation of Identity Hash Codes

I have blogged before on the handy Apache Commons ToStringBuilder and I was recently asked what the seemingly cryptic text appearing in the generated String output constitutes. The colleague asking the question correctly surmised that what he was looking at was a hash code, but it did not match his instance's hash code. I explained that ToStringBuilder adds the identity hash code in hexadecimal format to its output. In this post, I look in more depth at ToStringBuilder's use of the identity hash code presented in hexadecimal format. Even those not using ToStringBuilder might find this information useful as Java's standard Object.toString() also uses a hexadecimal representation of what is effectively its identity hash code.

I'll begin with a very simple Java example using ToStringBuilder. This example uses three Java classes (Person.java, Employee.java, and Main.java) that are shown next.

Person.java
package dustin.examples;

import org.apache.commons.lang.builder.ToStringBuilder;

/**
 * A simple representation of a Person intended only to demonstrate Apache
 * Commons ToStringBuilder.
 * 
 * @author Dustin
 */
public class Person
{
   /** Person's last name (surname). */
   protected final String lastName;

   /** Person's first name. */
   protected final String firstName;

   /**
    * Parameterized constructor for obtaining an instance of Person.
    * 
    * @param newLastName Last name of new Person instance.
    * @param newFirstName First name of new Person instance.
    */
   public Person(final String newLastName, final String newFirstName)
   {
      this.lastName = newLastName;
      this.firstName = newFirstName;
   }

   /**
    * Provide String representation of this Person instance.
    * @return My String representation.
    */
   @Override
   public String toString()
   {
      final ToStringBuilder builder = new ToStringBuilder(this);
      builder.append("First Name", this.firstName);
      builder.append("Last Name", this.lastName);
      return builder.toString();
   }
}

Employee.java
package dustin.examples;

import java.util.Objects;
import org.apache.commons.lang.builder.ToStringBuilder;

/**
 * Simple class intended to demonstrate ToStringBuilder.
 * 
 * @author Dustin
 */
public class Employee extends Person
{
   /** Employee ID. */
   private final String employeeId;

   /**
    * Parameterized constructor for obtaining an instance of Employee.
    * 
    * @param newLastName Last name of the employee.
    * @param newFirstName First name of the employee. 
    * @param newId Employee's employee ID.
    */
   public Employee(
      final String newLastName, final String newFirstName, final String newId)
   {
      super(newLastName, newFirstName);
      this.employeeId = newId;
   }

   /**
    * Provide String representation of me.
    *
    * @return My String representation.
    */
   @Override
   public String toString()
   {
      final ToStringBuilder builder = new ToStringBuilder(this);
      builder.appendSuper(super.toString());
      builder.append("Employee ID", this.employeeId);
      return builder.toString();
   }

   /**
    * Simple object equality comparison method.
    * 
    * @param obj Object to be compared to me for equality.
    * @return {@code true} if the provided object and I are considered equal.
    */
   @Override
   public boolean equals(Object obj)
   {
      if (obj == null)
      {
         return false;
      }
      if (getClass() != obj.getClass())
      {
         return false;
      }
      final Employee other = (Employee) obj;
      if (!Objects.equals(this.employeeId, other.employeeId))
      {
         return false;
      }
      return true;
   }

   /**
    * Hash code for this instance.
    * 
    * @return My hash code.
    */
   @Override
   public int hashCode()
   {
      int hash = 3;
      hash = 19 * hash + Objects.hashCode(this.employeeId);
      return hash;
   }
}

Main.java (Version 1)
package dustin.examples;

import static java.lang.System.out;

/**
 * Simple class enabling demonstration of ToStringBuilder.
 * 
 * @author Dustin
 */
public class Main
{
   /**
    * Main function for running Java examples with ToStringBuilder.
    * 
    * @param args the command line arguments
    */
   public static void main(String[] args)
   {
      final Person person = new Person("Washington", "Willow");
      out.println(person);
      final Employee employee = new Employee("Lazentroph", "Frank", "56");
      out.println(employee);
   }
}

The above example is simple and its output is shown next:

The output depicted above shows the String in question in the ToStringBuilder-generated output for both instances. The String representation of the instance of the Person class includes the String "1f5d386" and the String representation of the instance of the Employee class includes the String "1c9b9ca". These strings are the hexadecimal representation of each object's identity hash code.

The strings "1f5d386" and "1c9b9ca" do not look like the integer hash codes many of us are used to seeing because of their hexadecimal representation. The Integer.toHexString(int) method [available since JDK 1.0.2] is a convenience method for printing an integer in hexadecimal format and can be used to convert "normal" hash codes to see if they match those generated by ToStringBuilder. I have added calls to this method on the instances' hash codes in the new version of the Main class.

Main.java (Version 2)
package dustin.examples;

import static java.lang.System.out;

/**
 * Simple class enabling demonstration of ToStringBuilder.
 * 
 * @author Dustin
 */
public class Main
{
   /**
    * Main function for running Java examples with ToStringBuilder.
    * 
    * @param args the command line arguments
    */
   public static void main(String[] args)
   {
      final Person person = new Person("Washington", "Willow");
      out.println(person);
      out.println("\tHash Code (ten): " + person.hashCode());
      out.println("\tHash Code (hex): " + Integer.toHexString(person.hashCode()));

      final Employee employee = new Employee("Lazentroph", "Frank", "56");
      out.println(employee);
      out.println("\tHash Code (ten): " + employee.hashCode());
      out.println("\tHash Code (hex): " + Integer.toHexString(employee.hashCode()));
   }
}

Executing the above leads to the following output:

As the output indicates, the hexadecimal representation of the hash code for the Person instance does indeed match that shown in the ToStringBuilder-generated String for that instance. However, the same cannot be said for the Employee instance. The difference is that the Person class does not override the hashCode() method and so uses the identity hash code by default, while the Employee class does override hashCode() and its hash code therefore differs from the identity hash code.

The third version of Main outputs the identity hash code using System.identityHashCode(Object) [discussed in further detail in my blog post Java's System.identityHashCode].

Main.java (Version 3)
package dustin.examples;

import static java.lang.System.out;

/**
 * Simple class enabling demonstration of ToStringBuilder.
 * 
 * @author Dustin
 */
public class Main
{
   /**
    * Main function for running Java examples with ToStringBuilder.
    * 
    * @param args the command line arguments
    */
   public static void main(String[] args)
   {
      final Person person = new Person("Washington", "Willow");
      out.println(person);
      out.println("\tHash Code (ten): " + person.hashCode());
      out.println("\tHash Code (hex): " + Integer.toHexString(person.hashCode()));
      out.println("\t\tIdentity Hash (ten): " + System.identityHashCode(person));
      out.println("\t\tIdentity Hash (hex): " + Integer.toHexString(System.identityHashCode(person)));

      final Employee employee = new Employee("Lazentroph", "Frank", "56");
      out.println(employee);
      out.println("\tHash Code (ten): " + employee.hashCode());
      out.println("\tHash Code (hex): " + Integer.toHexString(employee.hashCode()));
      out.println("\t\tIdentity Hash (ten): " + System.identityHashCode(employee));
      out.println("\t\tIdentity Hash (hex): " + Integer.toHexString(System.identityHashCode(employee)));
   }
}

With this in place, we can now compare the identity hash code to the string generated by ToStringBuilder.

The last example definitively demonstrates that ToStringBuilder includes the hexadecimal representation of the system identity hash code in its generated output. If one wants to use the hexadecimal representation of the overridden hash code rather than of the identity hash code, an instance of ToStringStyle (typically an instance of StandardToStringStyle) can be used and the method setUseIdentityHashCode(boolean) can be invoked with a false parameter. This instance of ToStringStyle can then be passed to the ToStringBuilder.setDefaultStyle(ToStringStyle) method.
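The relationship between hashCode() and System.identityHashCode(Object) can also be sketched without ToStringBuilder. The two nested classes below are hypothetical stand-ins: Plain does not override hashCode() (like Person), and Keyed overrides it using the same arithmetic as Employee.

```java
import java.util.Objects;

public class IdentityHashDemo {
    /** Hypothetical class that does NOT override hashCode(). */
    static class Plain { }

    /** Hypothetical class that overrides equals(Object) and hashCode(). */
    static class Keyed {
        private final String id;

        Keyed(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Keyed && Objects.equals(this.id, ((Keyed) obj).id);
        }

        @Override
        public int hashCode() {
            return 19 * 3 + Objects.hashCode(this.id);
        }
    }

    /** True when the object's hashCode() is the same as its identity hash code. */
    static boolean usesIdentityHash(Object object) {
        return object.hashCode() == System.identityHashCode(object);
    }

    public static void main(String[] args) {
        Plain plain = new Plain();
        Keyed keyed = new Keyed("56");
        System.out.println("Plain uses identity hash: " + usesIdentityHash(plain));
        System.out.println("Keyed hashCode (hex): " + Integer.toHexString(keyed.hashCode()));
        System.out.println("Keyed identity (hex): "
            + Integer.toHexString(System.identityHashCode(keyed)));
    }
}
```

For an instance of Plain the two values always agree; for an instance of Keyed they will normally differ, because the overridden value is derived from the id field rather than from the object's identity.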

As a side note, the equals(Object) and hashCode() methods in the Employee class shown above were generated automatically by NetBeans 7.1. I was happy to see that, with my source version of Java for that project specified as JDK 1.7, this automatic generation of these two methods took advantage of the Objects class.

I have used ToStringBuilder-generated output throughout this post to facilitate discussion of hexadecimal representations of identity hash codes, but I could have simply used the JDK's own built-in "default" Object.toString() implementation for the same purpose. In fact, the Javadoc even advertises this:

The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:

getClass().getName() + '@' + Integer.toHexString(hashCode())

The only reason I did not use this example to begin with is that I almost always override the toString() method in my classes and do not get this "default" implementation. However, when I use ToStringBuilder to implement my overridden toString() methods, I do see these hexadecimal representations. I am likely to reduce my use of ToStringBuilder as I increase my use of Objects.toString().
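The Javadoc formula quoted above can be checked directly against an object that does not override toString(). This is a minimal sketch with hypothetical class and method names.

```java
public class DefaultToStringDemo {
    /** Builds the String that the Object.toString() Javadoc describes. */
    static String expectedDefault(Object object) {
        return object.getClass().getName() + '@' + Integer.toHexString(object.hashCode());
    }

    public static void main(String[] args) {
        Object plain = new Object();
        System.out.println(plain);                  // java.lang.Object@<hex>, value varies per run
        System.out.println(expectedDefault(plain)); // same text as the line above
    }
}
```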

Many of us don't think about hexadecimal representations or identity hash codes in our daily Java work. In this blog post, I have used ToStringBuilder's output as an excuse for looking a little closer at these two concepts. Along the way, I also briefly looked at the Integer.toHexString(int) method, which is useful for printing numbers in their hexadecimal representation. Knowing about Java's support for hexadecimal representation is important because it does show up in toString() output, in labeling of colors, memory addresses, and in other places.
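A small sketch of Integer.toHexString(int) and the reverse conversion follows. The helper names are hypothetical, and Integer.parseUnsignedInt assumes Java 8 or later. The value 0x1f5d386 is simply the Person hash string from this post written as a Java hexadecimal literal.

```java
public class HexStringDemo {
    /** Unsigned hexadecimal text for an int, as used in default toString() output. */
    static String hex(int value) {
        return Integer.toHexString(value);
    }

    /** Parses hexadecimal text back into an int (Java 8+ assumed). */
    static int fromHex(String hexText) {
        return Integer.parseUnsignedInt(hexText, 16);
    }

    public static void main(String[] args) {
        System.out.println(hex(255));           // ff
        System.out.println(hex(-1));            // ffffffff
        System.out.println(hex(0x1f5d386));     // 1f5d386
        System.out.println(fromHex("ffffffff")); // -1
    }
}
```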

Thursday, September 1, 2011

Checking for Null or Empty or White Space Only String in Java

.NET Framework 4 introduces a new method on its String class called IsNullOrWhiteSpace that checks whether a provided String is null, empty, or consists only of "white space." This handy method is in addition to the method IsNullOrEmpty that has been available since .NET 2. These potentially very useful (and commonly used) methods are not part of Java's standard JDK String, but in this post I look at how Apache Commons Lang and Guava provide methods similar to these or from which methods similar to these can be easily written.

A typical "standard Java" approach for detecting whether a String is null, is empty, or consists solely of white space is as shown in the next code listing.

Java Approach
/**
 * Demonstrate checking for String that is not null, not empty, and not white
 * space only using standard Java classes.
 *
 * @param string String to be checked for not null, not empty, and not white
 *    space only.
 * @return {@code true} if provided String is not null, is not empty, and
 *    has at least one character that is not considered white space.
 */
public static boolean isNotNullNotEmptyNotWhiteSpaceOnlyByJava(
   final String string)
{
   return string != null && !string.isEmpty() && !string.trim().isEmpty();
}

In the above example, one might compare the length of the String to zero rather than call the isEmpty() method. There are other approaches as well, including regular expressions. The code could also be made more concise by dropping the separate check for emptiness, because the trim().isEmpty() call already returns true for an empty String.
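The variations just mentioned might look something like the following sketch. The class and method names are hypothetical and chosen only for illustration. The three methods agree for ordinary spaces, tabs, and newlines, but they are not guaranteed to agree for every character: trim() strips all characters at or below U+0020, while the regular expression \S uses its own, narrower definition of white space.

```java
import java.util.regex.Pattern;

/** Hypothetical class showing variations on the standard Java check. */
class StandardJavaBlankChecks
{
   /** Matches any single character that the regex engine considers non-whitespace. */
   private static final Pattern NON_WHITESPACE = Pattern.compile("\\S");

   /** Compares the trimmed length to zero instead of calling isEmpty(). */
   public static boolean byLength(final String string)
   {
      return string != null && string.trim().length() > 0;
   }

   /** More concise form: trim().isEmpty() also covers the empty String. */
   public static boolean byTrimOnly(final String string)
   {
      return string != null && !string.trim().isEmpty();
   }

   /** Regular expression form: look for at least one non-whitespace character. */
   public static boolean byRegex(final String string)
   {
      return string != null && NON_WHITESPACE.matcher(string).find();
   }

   public static void main(final String[] arguments)
   {
      final String[] samples = {null, "", "   ", " \t\n", " a "};
      for (final String sample : samples)
      {
         System.out.println(
            byLength(sample) + " " + byTrimOnly(sample) + " " + byRegex(sample));
      }
   }
}
```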

The Google Guava library can also be used effectively for determining a non-null, not empty String with at least one non-whitespace character. An example of that is shown next.

Guava Approach
// import com.google.common.base.Strings;
/**
 * Demonstrate checking for String that is not null, not empty, and not white
 * space only using Guava.
 *
 * @param string String to be checked for not null, not empty, and not white
 *    space only.
 * @return {@code true} if provided String is not null, is not empty, and
 *    has at least one character that is not considered white space.
 */
public static boolean isNotNullNotEmptyNotWhiteSpaceOnlyByGuava(final String string)
{
   return !Strings.isNullOrEmpty(string) && !string.trim().isEmpty();
}

The Guava approach above uses the "standard Java" approach for determining that the String is not white space only. However, Guava provides the convenient Strings.isNullOrEmpty(String) static method for determining if a given String is null or empty.

The example using Apache Commons Lang is the simplest and most concise approach, as depicted in the next code listing.

Apache Commons Lang Approach
// import org.apache.commons.lang.StringUtils;
/**
 * Demonstrate checking for String that is not null, not empty, and not white
 * space only using Apache Commons Lang classes.
 *
 * @param string String to be checked for not null, not empty, and not white
 *    space only.
 * @return {@code true} if provided String is not null, is not empty, and
 *    has at least one character that is not considered white space.
 */
public static boolean isNotNullNotEmptyNotWhiteSpaceOnlyByCommons(
   final String string)
{
   return StringUtils.isNotBlank(string);
}

With Apache Commons Lang, one simple call does it all! The StringUtils.isNotBlank(String) static method does exactly what we wanted in this particular case: check a String to ensure it is not null, not empty, and not white space only. Its Javadoc documentation says as much: 'Checks if a String is not empty (""), not null and not whitespace only.'
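For readers without Apache Commons Lang on the classpath, a check in the same spirit can be sketched in plain Java. This is not the library's source. The class and method names below are hypothetical, and the use of Character.isWhitespace(char) as the definition of "white space" is my assumption about a reasonable equivalent.

```java
/** Hypothetical stand-alone sketch of an "is not blank" check. */
class BlankCheckSketch
{
   /**
    * Returns true only if the provided String is not null and contains at
    * least one character for which Character.isWhitespace(char) is false.
    */
   public static boolean isNotBlankSketch(final String string)
   {
      if (string == null)
      {
         return false;
      }
      for (int index = 0; index < string.length(); index++)
      {
         if (!Character.isWhitespace(string.charAt(index)))
         {
            return true;
         }
      }
      return false;   // empty, or every character was white space
   }

   public static void main(final String[] arguments)
   {
      // An EM SPACE (U+2003) is white space to Character.isWhitespace,
      // but String.trim() does not remove it.
      final String emSpace = "\u2003";
      System.out.println(isNotBlankSketch(emSpace));
      System.out.println(!emSpace.trim().isEmpty());
   }
}
```

The main method highlights one design consideration: a check based on Character.isWhitespace and a check based on trim() can disagree for Unicode space characters above U+0020, so it is worth deciding which definition of "white space" an application needs.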

The above approaches and variations on the above approaches can be used in Java to determine if a String is not null, not empty, and not white space only. They are easy to employ and the Apache Commons Lang approach is particularly concise. However, it would be nice to have this support as a standard part of the Java String. It is difficult for me to finish a post without talking about Groovy, so now I transition to the ability to use Groovy's dynamic support to "add" methods to Java's String class to support this check.

The Groovy code in the next code listing shows how an instance method could be added to String to check for these conditions. In this case, I'm borrowing from Apache Commons Lang's name for the method and using isNotBlank() for the injected method name.

Using metaClass.methodMissing to 'Add' isNotBlank() Method to String
java.lang.String.metaClass.methodMissing=
{ String name, args ->
   if (name == "isNotBlank")
   {
      // Do NOT need to check for null because this method could not have been
      // invoked on a null String. Groovy's GDK extension of String has the
      // isAllWhitespace() method that returns {@code true} for all white space
      // including empty String.
      return !delegate.isAllWhitespace()
   }
   // Any other unresolved method name is still reported as missing rather
   // than silently returning null.
   throw new MissingMethodException(name, delegate.getClass(), args)
}

The above Groovy snippet makes an "isNotBlank()" method available on String instances by intercepting invocations of that name when they cannot otherwise be resolved (missing method). I don't have a null check because an instance method, by its nature, can only be invoked on a non-null instance. The methods discussed previously checked for null because they were static methods working on a provided instance (and not on themselves). With Groovy clients and the safe navigation operator (?.), this is less of a deal-breaker than it might be in Java. The example above also demonstrates a method that Groovy's GDK extension of String provides: isAllWhitespace().

An advantage of the above instance-level approach is that we can ask the String itself whether it contains anything other than white space. As you may have noticed, it was not strictly necessary to intercept missing methods on String to get this behavior: the isAllWhitespace() method that Groovy's GDK adds to String already counts an empty String as all white space, so a client could simply negate a call to that method.

Suppose that we want to use a static method in Groovy like the Java examples shown earlier so that null can be handled as well. Because this is Groovy, we could simply use the Apache Commons Lang class or the Guava class just as we did in Java. For realistic purposes, that would be a good approach. But for showing off Groovy's grooviness, the next approach is preferable. In this case, a new static method is defined on String that implements the check we have seen above. This enables Groovy client code to call a static method directly on String itself for the check.

Using metaClass.static to 'Add' New Static Method to String
java.lang.String.metaClass.static.isNotNullNotEmptyNotWhitespaceOnly=
{ String string ->
   return string != null && !string.isAllWhitespace()
}

This last example makes it possible to call String.isNotNullNotEmptyNotWhitespaceOnly(String) from Groovy code, passing it any String (or null), to check for being not null, not empty, and not white space only. It also demonstrates how to inject a static method into an existing class in Groovy.


Conclusion

It would be a minor added convenience if standard Java had a class and/or methods added to check Strings for more common conditions. Java 7 has added the Objects class for performing some very common functionality on Objects, so a new class called Strings or StringUtils might do the same thing for Java Strings.