So Why is Spock Such a Big Deal?

Spock testing in a Java environment is all the rage. Russel Winder talks through the history of testing on the JVM and demonstrates why Spock is so groovy.

We will take as read the fact that all programmers know that testing is a good thing, and that they do it as a matter of course. Whether programmers use test-driven development (TDD), behaviour-driven development (BDD), and/or some other process, we will take as read that all programmers have good test coverage of their systems at all times.

So the only question is which testing framework?

For some platforms, there is very little argument about which test framework to use. For example, with Go and D there is a built-in framework that most people use. There tend to be extensions, but most programmers just use what is provided by default. With C++ though, as most ACCU members will know, there has been a long history of a plethora of frameworks. This article, though, is about the Java Platform. If you think this just means Java and JUnit, then… wrong.

A bit of history

Given that we are talking about the Java Platform milieu only here…

In the beginning (mid-1990s), Java was used for making Web browser plugins, and few plugin writers really cared much about testing. As Java started being used server-side, the development process known as TDD jumped from its Smalltalk roots to the rapidly expanding Java-verse. Kent Beck and Erich Gamma created JUnit based on the architecture of sUnit that Kent Beck had developed for use with Smalltalk in the early 1990s. Unlike other programming languages of the time, where the model was ‘everyone writes their own test framework’, the model in the Java-verse rapidly became ‘use JUnit’. JUnit just became an integral part of the Java-verse, treated almost as a part of the Java Platform – even though it was, and is, not.

Around 2003 though there was a stirring in the Java-verse: generics and annotations were coming to Java 5. Although this wasn’t a change of computational model, and so JUnit could work as it had ever done, annotations brought a whole new way of thinking about Java code and about test frameworks. Cédric Beust saw this as an opportunity and set about creating TestNG. This replaced the naming conventions and use of inheritance that were integral to the JUnit way of working with the use of annotations, and it changed the way programmers implemented their tests.

JUnit remained in maintenance mode whilst TestNG rushed into the new Java 5 style of programming. Now there were two. However TestNG, the ‘new kid in town’ was having to fight the incumbent and entrenched JUnit for usage. For many, working with Java meant using Eclipse and JUnit came as standard, whereas TestNG was an ‘added extra’. Programmers had to do something to switch from JUnit to TestNG, and generally didn’t, even though TestNG brought new capabilities as well as a Java 5 way of working.

What seemed like an age after the release of Java 5, a new JUnit appeared, JUnit4 – the old JUnit was relabelled JUnit3. In many, many ways, JUnit4 was just a copy of TestNG. Where JUnit4 and TestNG differ, TestNG is generally the better framework. The most obvious ‘grand difference’ was that JUnit was a unit testing framework whereas TestNG was a test framework covering unit, integration, smoke, and some system testing. However JUnit4 was JUnit and therefore, at least in many Java programmers’ minds, the one true Java test framework. More importantly though, JUnit4 was seen as an upgrade from JUnit3 and so it was very easy for all the IDEs, and in particular Eclipse, to claim they were modern and hip with Java 5 by switching JUnit3 out and JUnit4 in.

Where else have we seen technical superiority ignored in the market?

So JUnit4 became established as the Java 5 style test framework, leaving TestNG as a minor player. Of course, some people didn’t bother to move from JUnit3 since they could see no benefit to the use of annotations rather than a naming convention and use of inheritance. These people argued that the switch from a naming convention to use of annotations didn’t actually bring any new capability: JUnit4 was not actually a functional improvement over JUnit3. TestNG brought integration and system testing mindsets yes, but most programmers still thought testing meant unit testing. The JUnit3 ‘die hards’ were, indeed are, wrong.

Things get groovy

Concurrent with the JUnit4 vs. TestNG vs. JUnit3 battle came the invention, development and rise of the Groovy programming language. ¹ Whereas Java is a statically-typed compiled language, Groovy is a dynamic language. Yes, Groovy is compiled to JVM bytecodes just as Python is compiled to PVM bytecodes, but a Groovy program is a dynamically typed system. Perhaps bizarrely, Groovy now has the capability of being statically type checked, and indeed fully statically compiled. This makes it a competitor to Java as a statically typed language as well as being a dynamic symbiote to the static Java.

In its dynamic language guise, Groovy is much closer to Smalltalk than Java ever can be. Algorithms, programming techniques, and idioms of Smalltalk are much easier to represent in Groovy than they are in Java. The JUnit3 way of working is completely natural in Groovy where it can be a little awkward in Java. Of course Groovy can work with JUnit4 and TestNG since it is symbiotic with Java, inter-working very easily with Java.

For a while Groovy used principally the JUnit3 approach, to the extent of integrating it directly into the runtime system via the GroovyTestCase class. Of course, JUnit4 and TestNG could be used, but Groovy arose in a fundamentally JUnit3 context, and the model of working fitted very well.

Then around 2007–2009 Abstract Syntax Tree Transformations (AST Transforms) came to Groovy, formally released in Groovy 1.6. These are very similar to what are called macros in other languages or annotations in Java. Many see them as ways of providing just the sort of thing that annotations and macros can provide. This misses some of the truly devious things that can be achieved with Groovy AST transforms. Peter Niederwieser, however, did not miss the potential: he proceeded to create Spock, based on the new AST transform capabilities, with a desire to escape the straitjackets that are JUnit3, JUnit4, and TestNG.

Testing by example

Clearly the best way of showing Spock’s, indeed any test framework’s, capabilities and comparing with other frameworks, e.g. JUnit3, JUnit4, and TestNG, is by example. For this we need some code that needs testing: code that is small enough to fit on the pages of this august journal, ² but which highlights some critical features of the test frameworks.

We need an example that requires testing, but that gets out of the way of the testing code because it is so trivial.

We need factorial.

Factorial is the classic example usually used to contrast the imperative and functional ways of programming, and so is beloved of teachers of first year undergraduate programming courses. ³ I like this example though because it allows investigating techniques of testing, and allows comparison of test frameworks.

Factorial is usually presented via the recurrence relation:

0! = 1
n! = n × (n − 1)!  for n > 0

However, this way of presenting the semantics of factorial is just the beginning of the tragedy that is most people’s first recursive (functional) implementation.

Being naïvely imperative

Let us completely avoid the whole recursive function thing for this article, since we are focusing on testing. ⁴ Let us instead consider what almost every programmer would write as an iterative (imperative) implementation using Java ⁵ (see Listing 1).

package uk.org.russel.stuff;

public class Factorial_Naïve {
  public final static Integer iterative(
      final Integer n) {
    Integer total = 1;
    for (Integer i = 2; i <= n; ++i) {
      total *= i;
    }
    return total;
  }
}

Listing 1

We can imagine a programmer constructing the JUnit4 test as shown in Listing 2 and feeling very pleased with themselves.

package uk.org.russel.stuff;

import org.junit.Test;
import static
    org.junit.Assert.assertEquals;

import static
    uk.org.russel.stuff.Factorial_Naïve.iterative;

public class Test_Factorial_Naïve_JUnit4_Java {
  @Test  public void zero() {
    assertEquals(new Integer(1), iterative(0));
  }
  @Test  public void one() {
    assertEquals(new Integer(1), iterative(1));
  }
  @Test  public void seven() {
    assertEquals(new Integer(5040), iterative(7));
  }
}

Listing 2

There is so much wrong with these codes, it is difficult to know where to start – and switching to TestNG with this style of testing will not help. The two most obvious problems with this test are:

1. What happens for negative arguments? (Factorial is undefined for negative arguments.)
2. What happens for arguments greater than 12? (The above implementation will give the wrong answer.)

Point 1 is just highlighting the fact that most programmers tend to consider testing only the success modes of their code and fail to deal with the failure modes. Good QA people tend to immediately break things, exactly because they look at failure modes, which leads to tensions, sometimes animosity, but is a road that eventually leads to DevOps.

Point 2 is actually also about failure modes, but is about underlying implementations rather than testing of the domain of the units. In this case it is about the fixed size of JVM integral types and the overflow that occurs. This means we must immediately give up using Integer and switch to BigInteger: how else can we deal with a function whose values are generally big numbers? ⁶
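To make the overflow concrete, here is a minimal sketch in plain Java that compares fixed-width arithmetic against a BigInteger reference and reports the first argument at which each goes wrong. The class and method names are hypothetical, chosen for illustration, and primitive int and long are used rather than the boxed types of Listing 1.

```java
import java.math.BigInteger;

// Hypothetical illustration class: not part of the article's code base.
final class FactorialOverflowSketch {

    // Reference value computed with arbitrary precision.
    static BigInteger exact(int n) {
        BigInteger total = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            total = total.multiply(BigInteger.valueOf(i));
        }
        return total;
    }

    // Same loop with 32-bit arithmetic, which wraps silently on overflow.
    static int withInt(int n) {
        int total = 1;
        for (int i = 2; i <= n; i++) {
            total *= i;
        }
        return total;
    }

    // Same loop with 64-bit arithmetic, which also wraps silently.
    static long withLong(int n) {
        long total = 1L;
        for (int i = 2; i <= n; i++) {
            total *= i;
        }
        return total;
    }

    // Smallest argument for which the 32-bit result differs from the exact one.
    static int firstWrongInt() {
        for (int n = 0; ; n++) {
            if (!BigInteger.valueOf(withInt(n)).equals(exact(n))) {
                return n;
            }
        }
    }

    // Smallest argument for which the 64-bit result differs from the exact one.
    static int firstWrongLong() {
        for (int n = 0; ; n++) {
            if (!BigInteger.valueOf(withLong(n)).equals(exact(n))) {
                return n;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("int is first wrong at n = " + firstWrongInt());   // 13
        System.out.println("long is first wrong at n = " + firstWrongLong()); // 21
    }
}
```

Note that no exception is raised anywhere: the fixed-width versions simply return a wrong number, which is why a test suite that stops at seven never notices.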

Becoming less naïve

Listing 3 is something of a transliteration of the earlier implementation to using BigInteger. Overloading is employed to provide implementations for different argument types to try and fully cover the domain. Note that negative arguments are now dealt with.

package uk.org.russel.stuff;

import java.math.BigInteger;

public class Factorial {

  public final static BigInteger iterative(
      final Integer n) {
    if (n < 0) {
      throw new IllegalArgumentException(
"Argument must be a non-negative Integer.");
    }
    return iterative(BigInteger.valueOf(n));
  }

  public final static BigInteger iterative(
      final Long n) {
    if (n < 0l) {
      throw new IllegalArgumentException(
"Argument must be a non-negative Long.");
    }
    return iterative(BigInteger.valueOf(n));
  }

  public final static BigInteger iterative(
      final BigInteger n) {
    if (n.compareTo(BigInteger.ZERO) < 0) {
      throw new IllegalArgumentException(
"Argument must be a non-negative BigInteger.");
    }
    BigInteger total = BigInteger.ONE;
    if (n.compareTo(BigInteger.ONE)  > 0) {
      BigInteger i = BigInteger.ONE;
      while (i.compareTo(n) <= 0) {
        total = total.multiply(i);
        i = i.add(BigInteger.ONE);
      }
    }
    return total;
  }

}

Listing 3

I think we can all agree that writing code in Java working with BigInteger is somewhat less than pleasant.

Of course we must have some tests (see Listing 4). Which I guess is fine, well fine-ish, anyway.

package uk.org.russel.stuff;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

import java.math.BigInteger;

import static
  uk.org.russel.stuff.Factorial.iterative;

public class Test_Factorial_JUnit4_Java {

  @Test
  public void zero() {
    assertEquals(BigInteger.ONE, iterative(0)); }

  @Test
  public void one() {
    assertEquals(BigInteger.ONE, iterative(1)); }

  @Test
  public void seven() {
    assertEquals(BigInteger.valueOf(5040),
                 iterative(7)); }

  @Test(expected=IllegalArgumentException.class)
  public void minusOne() { iterative(-1); }

}

Listing 4

Being Groovy

Instead of using Java for the test code, we can use Groovy code. Although Groovy is a dynamic language whereas Java is a statically typed one, Groovy is based on the exact same data model and so we can just access the JUnit4 features directly, as shown in Listing 5.

package uk.org.russel.stuff

import org.junit.Test
import static org.junit.Assert.assertEquals

import static uk.org.russel.stuff.Factorial.iterative

// Sketch: the Java JUnit4 tests for Factorial,
// transliterated to Groovy.
class Test_Factorial_JUnit4_Groovy {

  @Test
  void zero() { assertEquals(1G, iterative(0)) }

  @Test
  void one() { assertEquals(1G, iterative(1)) }

  @Test
  void seven() { assertEquals(5040G, iterative(7)) }

  @Test(expected = IllegalArgumentException)
  void minusOne() { iterative(-1) }

}

Listing 5

One could argue that there is little or no benefit accruing here to using Groovy rather than Java, even though being able to render the BigInteger literals more readably makes for a nicer read of the testing code. And there are no semicolons.

As previously there is little or no benefit to using TestNG compared to JUnit4 in this situation.

So why use Groovy at all?

Two obvious reasons spring to mind:

1. We can rewrite the Factorial implementation in Groovy: Groovy can be a very nice, statically-typed, compiled language simply by using the @CompileStatic AST transform. We could write the factorial implementations using Groovy code as shown in Listing 6, which produces the same results with fundamentally the same performance as the earlier Java code. With BigInteger literals and the ability to define operators on types, ⁷ the code is much easier to read and much easier to maintain. Despite this superiority of Groovy, many think they have to use Java for production code.
2. We can use Spock.

package uk.org.russel.stuff

import groovy.transform.CompileStatic

@CompileStatic
class Factorial_Groovy {

  static BigInteger iterative(Integer n) {
    if (n < 0) {
      throw new IllegalArgumentException(
'Argument must be a non-negative Integer.')
    }
    iterative(n as BigInteger)
  }

  static BigInteger iterative(Long n) {
    if (n < 0) {
      throw new IllegalArgumentException(
        'Argument must be a non-negative Long.')
    }
    iterative(n as BigInteger)
  }

  static BigInteger iterative(BigInteger n) {
    if (n < 0G) {
      throw new IllegalArgumentException(
'Argument must be a non-negative BigInteger.')
    }
    def total = 1G
    if (n > 1G) { (2G..n).forEach{total *= it} }
    total
  }

}

Listing 6

Enter Spock

Let’s dive straight into an example: Listing 7 is a Spock version of the tests of the Java implementation of factorial.

package uk.org.russel.stuff

import spock.lang.Specification

import static
  uk.org.russel.stuff.Factorial.iterative

class Test_Factorial_Spock_Groovy 
  extends Specification {

  def zero() {
    expect:
    iterative(0) == 1G
  }

  def one() {
    expect:
    iterative(1) == 1G
  }

  def seven() {
    expect:
    iterative(7) == 5040G
  }

  def minusOne() {
    when:
    iterative(-1)
    then:
    thrown(IllegalArgumentException)
  }
}

Listing 7

Very Groovy. And very much a return to the sUnit/JUnit3 sort of thinking, in that inheritance is used to mark classes that are test code, and method names are important: Spock assumes all except some specific method names are test methods, ‘feature methods’ in Spock nomenclature. Have we lost anything by not using annotations to specify test methods? Not really. Have we gained anything using Groovy? Apart from a much nicer way of expressing BigInteger literals, arguably not – except that we can use Spock. Has Spock brought something to the case? Definitely.

The whole naming and structuring of tests is revolutionized. Assuming methods are test methods cleans things up, but the real win is that Spock steps away from the traditional test method structure. Instead, Spock uses a block structuring of test methods to give much more of an obvious Arrange–Act–Assert structure. Labels introduce blocks of code. expect blocks are sequences of Boolean expressions that are assertions about the state – a mix of act and assert. when/then block pairs separate the action from the assertion. In the last test method we see the Spock way of specifying that an exception is expected. If nothing else, the code reads much more easily and enables both TDD- and BDD-style thinking about tests.
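For readers who think in Java, the when/then pairing for an expected exception can be approximated by hand. The following is only a sketch of the idea, not how Spock implements it: the helper thrownBy and the stand-in factorial function are hypothetical names invented for this illustration.

```java
import java.math.BigInteger;

// Hypothetical illustration class: a hand-rolled analogue of when/then + thrown().
final class WhenThenSketch {

    // Stand-in for the code under test, with the same negative-argument check as Listing 3.
    static BigInteger factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("Argument must be non-negative.");
        }
        BigInteger total = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            total = total.multiply(BigInteger.valueOf(i));
        }
        return total;
    }

    // Runs the "when" action and returns the exception if it has the expected type.
    // Fails with an AssertionError if nothing, or something else, is thrown.
    static <T extends Throwable> T thrownBy(Class<T> expected, Runnable when) {
        try {
            when.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Expected " + expected.getName() + " but got " + t, t);
        }
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        // when: the action; then: the assertion about what it threw.
        IllegalArgumentException e =
            thrownBy(IllegalArgumentException.class, () -> factorial(-1));
        System.out.println("caught: " + e.getMessage());
    }
}
```

The action and the assertion are separated here much as they are in Spock, but the separation has to be built and maintained by hand, which is rather the point.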

Of course, there is a lot more to Spock that makes it the framework of choice for Java and Groovy codebases – also possibly Scala, Ceylon, and Kotlin codebases. Let us delve into arguably the most important.

Getting parameterized: the Spock variant

The idea of writing one test method for each test case is fine in principle. Actually it is a very, very good idea. However the idea of manually writing one test method for each test case is clearly a very, very silly one. We should be getting the framework to write the methods for us given input of a table of test cases. Data-driven testing is a very good idea, and any framework that does not support this cleanly, with easy use, is clearly not fit for purpose.

TestNG has ‘data providers’ which work very well. JUnit has ‘parameterized tests’ which are a little less nice than TestNG data providers but can achieve more or less the same thing. ⁸
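Stripped of any framework, the data-driven idea is just a table of cases and a loop that checks each row. The sketch below, in plain Java with hypothetical names and a stand-in factorial function, shows the bare mechanism that data providers and parameterized tests package up. It is an assumption-laden illustration, not the API of any of the frameworks mentioned.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class: table-driven checking without a test framework.
final class DataDrivenSketch {

    // Stand-in for the code under test.
    static BigInteger factorial(int n) {
        BigInteger total = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            total = total.multiply(BigInteger.valueOf(i));
        }
        return total;
    }

    // The table of test cases: argument mapped to expected result.
    static Map<Integer, BigInteger> cases() {
        Map<Integer, BigInteger> table = new LinkedHashMap<>();
        table.put(0, BigInteger.ONE);
        table.put(1, BigInteger.ONE);
        table.put(7, BigInteger.valueOf(5040));
        table.put(12, BigInteger.valueOf(479001600));
        table.put(20, new BigInteger("2432902008176640000"));
        return table;
    }

    // Checks every row and returns one description per failing row.
    static List<String> run(Map<Integer, BigInteger> table) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<Integer, BigInteger> row : table.entrySet()) {
            BigInteger actual = factorial(row.getKey());
            if (!actual.equals(row.getValue())) {
                failures.add("factorial(" + row.getKey() + ") was " + actual
                    + ", expected " + row.getValue());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> failures = run(cases());
        System.out.println(failures.isEmpty() ? "all rows pass" : failures.toString());
    }
}
```

Adding a test case is now a one-line change to the table. What a framework adds on top is reporting each row as a test in its own right.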

Listing 8 is an extended version of a test for the Java iterative factorial implementation, using some of the power of Spock. Here we can see the power associated with use of Groovy:

Method names can be arbitrary strings.
Operator definition allows a nice syntax for internal DSLs, giving:
- a tabular structure of data (as in the first test method); and
- an iterable over which to iterate (as in the second method).

package uk.org.russel.stuff

import spock.lang.Specification
import spock.lang.Unroll

import static uk.org.russel.stuff.Factorial.iterative

class Test_Factorial_Spock_Parameterized_Groovy extends Specification {
  @Unroll
  def 'iterative(#i) succeeds'() {
    expect:
    iterative(i) == r
    where:
    i | r
    0 | 1G
    1 | 1G
    7 | 5040G
    12 | 479001600G
    20 | 2432902008176640000G
    40 | 815915283247897734345611269596115894272000000000
  }
  @Unroll
  def 'iterative(#i) throws exception'() {
    when:
    iterative(i)
    then:
    thrown(IllegalArgumentException)
    where:
    i << [-1, -2, -5, -10, -20, -100]
  }
}

Listing 8

In the second method the iterable need not be a literal: it can be (and usually is) just a variable referring to a computed iterable. This allows for very powerful data-driven testing.

The Spock features are:

the where clause which enforces the iteration structure over the iterable providing data.
the @Unroll AST transform, which causes Spock to rewrite the code creating one test method per entry in the iterable using the method as a ‘template’.

So the code as written represents 12 test methods, with the name of each of them incorporating the value of the data that the method was generated for – this is what the #i in the method name does for us. Without the @Unroll the test still works but it is just a single test method with iteration – not as good as the situation with the @Unroll.
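The naming effect is easy to picture as simple template expansion. The sketch below is only an illustration of what the generated names look like. The class and method are hypothetical, and Spock itself does this work via its AST transform rather than by string replacement at this level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class: expanding a name template once per data row.
final class UnrollNameSketch {

    // Substitutes each data value for the #i placeholder, giving one name per row.
    static List<String> unroll(String template, List<?> values) {
        List<String> names = new ArrayList<>();
        for (Object value : values) {
            names.add(template.replace("#i", String.valueOf(value)));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : unroll("iterative(#i) throws exception",
                                  Arrays.asList(-1, -2, -5, -10, -20, -100))) {
            System.out.println(name);
        }
    }
}
```

Each data row gets its own line in the test report, so a failure for one value is identified by name rather than being buried inside a loop.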

This is surely jump up and down for joy impressive?

Getting parameterized: the TestNG Variant

So as to ‘prove’ the Spock way of doing things is superior in all ways, it is necessary to show an alternative. To date we have seen JUnit4 codes, but with comments that ‘TestNG is better’. So now is the time for a TestNG example. Listing 9 is a test of the iterative factorial function using TestNG and its data providers.

package uk.org.russel.stuff;

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import static org.testng.Assert.assertEquals;

import java.math.BigInteger;

import static uk.org.russel.stuff.Factorial.iterative;

public final class Test_Factorial_TestNG_DataProvider_Java {
  @DataProvider
  private final Object[][] positiveData() {
    return new Object[][] {
        {0, BigInteger.valueOf(1)},
        {1, BigInteger.valueOf(1)},
        {7, BigInteger.valueOf(5040)},
        {12, BigInteger.valueOf(479001600)},
        {20, new BigInteger("2432902008176640000")},
        {40, new BigInteger("815915283247897734345611269596115894272000000000")}
    };
  }
  @DataProvider
  private final Object[][] negativeData() {
    return new Object[][]{{-1}, {-2}, {-5}, {-10}, {-20}, {-100}};
  }
  @Test(dataProvider = "positiveData")
  public void positiveArgumentShouldWork(final long n, final BigInteger expected) {
    assertEquals(iterative(n), expected);
  }
  @Test(dataProvider = "negativeData", expectedExceptions = {IllegalArgumentException.class})
  public void negativeArgumentShouldThrowException(final long n) { iterative(n); }
}

Listing 9

This creates 12 distinct test methods, just as the Spock version did. However, for me, there is just so much text here, especially compared to the Spock version. Java is a verbose language, and here it shows. Coding this TestNG code in Groovy doesn’t help that much because of the use of arrays and the annotations. Much as I used to love TestNG for testing, I have deserted its use for use of Spock.

Getting parameterized: the JUnit4 variant

I suggest we just do not go here. JUnit4 parameterized tests rely on the use of public classes, and so you have to have one test per file. In this case we would have to have two files (one for positive values, one for negative values) with most of the content the same. Let us leave this as an exercise for the reader. I can assure you that after just a short while, you will agree that the TestNG way is far superior to the JUnit4 way. The only moot point will be whether the TestNG approach comes anywhere close to the readability and efficacy of the Spock approach. Not so much a moot point, more a foregone conclusion. Spock long and prosper.

Mocking the code under test

Some people like to use mocks when unit testing, and perhaps a bit when integration testing. ⁹ Others claim that any use of mocks in any form of testing misses the point about what testing is and what testable software structuring is. We shall ignore this entire debate for the purposes of this article.

So why this little section? JUnit3, JUnit4, and TestNG have no notion of mock built in to the framework. Instead there is EasyMock, JMock, Mockito, an entire plethora of mocking frameworks, some of which are not at all bad. Spock though has directly absorbed earlier work on mocks in a Groovy context: Groovy being a dynamic language, it is incredibly easy to do mocking, monkey patching, stubs, fakes, spies, etc., etc. So whilst mocking is a ‘big deal’ in Java, hence many sophisticated mocking frameworks using all sorts of (bizarre) reflection techniques, ¹⁰ mocking in a dynamic language is actually rather easy – but still benefits from a formalized framework, cf. unittest.mock in Python.
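To give a flavour of the reflection involved on the Java side, here is a minimal hand-rolled mock built on the JDK's java.lang.reflect.Proxy. It is a sketch under stated assumptions: the Greeter interface, the Recorder handler, and the mock helper are all hypothetical names for this illustration, and real mocking frameworks do a great deal more than this.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class: a hand-rolled mock using a JDK dynamic proxy.
final class ProxyMockSketch {

    // Hypothetical collaborator that the code under test would depend on.
    interface Greeter {
        String greet(String name);
    }

    // Records every call and answers each one with the same canned result.
    // Caveat: toString, hashCode and equals on the proxy are routed here too,
    // so this sketch must not have those invoked on the mock.
    static final class Recorder implements InvocationHandler {
        final List<String> calls = new ArrayList<>();
        private final Object cannedResult;

        Recorder(Object cannedResult) {
            this.cannedResult = cannedResult;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            calls.add(method.getName() + Arrays.toString(args));
            return cannedResult;
        }
    }

    // Creates an object implementing the given interface, backed by the handler.
    static <T> T mock(Class<T> type, InvocationHandler handler) {
        return type.cast(Proxy.newProxyInstance(
            type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder("hello");
        Greeter greeter = mock(Greeter.class, recorder);
        System.out.println(greeter.greet("Spock")); // the canned result
        System.out.println(recorder.calls);         // the recorded interaction
    }
}
```

Even this toy version needs an interface, a handler, and a cast, and it works only for interfaces. Mocking concrete classes is where the frameworks resort to bytecode generation.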

The point here is that dynamic languages are great languages for writing testing frameworks, whereas things can get rather complicated in static ones. Groovy is a splendid base for Spock, and Spock makes most excellent use of Groovy and its capabilities. ¹¹

Conclusions

I expect that you are already impressed by Spock and want to use it for all Java (and Groovy) code testing. Many people working on the JVM have had the Spock revelation, and I hope there will be more articles on Spock in the pages of this august journal. ¹² Certainly I have a few ideas for more articles, some of which will expand on the Spock theme.

Obviously the set of available testing frameworks associated with the JVM is much, much larger than I have set out here: there is ScalaTest, ScalaCheck, Specks,… the list goes on. In the main though people tend to use Scala frameworks for Scala code, Ceylon frameworks for Ceylon code, and by habit JUnit (or TestNG) for Java code. Many are though now using Spock for any Java or Groovy code. The point here is that Groovy and Java have a special relationship in that Groovy uses the Java data model directly whereas Scala, Ceylon, and Kotlin do not. These other languages are able to inter-work with Java easily (so as to access the Java Platform in its entirety), but there is an adaptation layer. This is not the case with Groovy. Thus Spock, JUnit, and TestNG are in direct competition for testing Java and Groovy code. For me, Spock wins, hands down. Many others believe the same thing.

Places to look

This is a list of links (checked on 2 Jan 2016) of places for further information about Spock and the other technologies mentioned in this article:

Java’s homes: https://www.java.com, http://openjdk.java.net/
JUnit’s home: http://junit.org/
TestNG’s home: http://testng.org/
Groovy’s home: http://www.groovy-lang.org/
Spock’s home: http://spockframework.org still redirects to the now defunct Googlecode project area.
Spock’s documentation is at http://docs.spockframework.org/ which redirects to a GitHub Pages area.

The project is active at GitHub https://github.com/spockframework

Acknowledgements and thanks

Thanks to Frances Buontempo and the anonymous reviewers for various comments and feedback on an earlier version of this article. All the typos were fixed, but that doesn’t mean there are none left! Most of the points of content led to updates to the article, but one or two I chose not to take on board. The ‘ignored’ topics raised lead to quite long points, that may end up as short articles in the future.

¹ Groovy is now (2015 Q4) a top-level Apache project, and is properly called Apache Groovy.
² This is not though an annual magazine sent out only in August.
³ Though with the changes to UK school curriculum in 2014 of IT to computing, this example may well have to move down the age scale.
⁴ I shall reserve the right to rant about this in another article.
⁵ We will ignore the whole ‘Integer’ vs. ‘int’ thing for the purposes of this article, which is about testing not benchmarking.
⁶ Note that switching from ‘Integer’ to ‘Long’ serves no useful purpose other than raising the point of failure from arguments greater than 12 to arguments greater than 20.
⁷ We leave for another article a rant about how excluding operator definition from Java may not actually have been as good a programming language design choice as the Project Green people thought in the early 1990s.
⁸ For anyone trying to undertake data-driven testing, TestNG data providers are a much nicer tool than JUnit4 parameterized tests. This is a good reason for using TestNG over JUnit4. Of course Spock is even better than TestNG, so the only choice is Spock.
⁹ Anyone found using mocks as part of what they claim is system or end-to-end testing clearly needs some re-education.
¹⁰ Java’s reflection system exists, but is not really that good.
¹¹ The real reason for this section is to be a bit of a tease for a future article.
¹² I think I already did the August ‘joke’.