Benefits of Well Known Interfaces in Closed Source Code

Designing a good API is a significant challenge. Arun Saha suggests taking inspiration from outside.

The availability of a high quality data structure library is a necessary ingredient for the success and timely completion of any software project. It allows the programmers to focus on the problem domain rather than the solution domain. But what are the options if no such library is available and an in-house one has to be developed? Fortunately, all is not lost. The in-house library can be designed to use a standardized or well-known interface, which reduces a lot of the strategic design, tactical design, testing, learning, adaptation, and maintenance efforts. This article focuses on two key aspects, interface design and functional testing.

Introduction

Consistent use of a library keeps uniformity, both syntactic and semantic, across a project. It is essential for the development and maintenance of any large or multi-programmer code base. In C++, the standard library specifies a set of data structures, a.k.a. containers (for example, array, vector, list, map, set, unordered_map, unordered_set, bitset), and algorithms (for example, find, search, sort, partial_sort) that are usable with any suitable built-in or user-defined type [ C++2011 , relevant sections: 20, 23, 24, 25]. The availability of the standard library provides immense benefits to a project: the programmers can look beyond the repetitive structural and algorithmic issues and focus more on the issues of the problem domain. The first implementation of such a type-independent library was published by SGI and is known as the Standard Template Library.

Although these containers and algorithms are specified in the C++ standards (C++1998, C++2003, and upcoming C++0x), they are not part of the core C++ language ; the library extends the language to provide some general components [ Josuttis99 ].

There are multiple implementations of the C++ standard library available. Among them, SGI, GNU and STLport are open-source implementations, and Dinkumware is a commercial one. [ Implementations ]

However, there exist systems and environments, mostly embedded systems, where the C++ language is used without the standard library. One such example is ‘Embedded C++’ [ EC++ ]; it is a subset of C++ which prohibits templates (among other things) and thereby a major part of standard library, including the containers and the algorithms, is unavailable.

If some project wants to use the standard library and if one of the open-source implementations is technically and legally suitable, then that can be chosen to be used – end of story.

However, in commercial software or a proprietary code base, using open-source software is frequently not an option. There are multiple reasons, and the following is a non-exhaustive list:

licensing or legal issues (for example, the requirement of publishing derivative work or modifications to the open-source code)
the code is not actively maintained (for example, as of March 2011, the latest release of STLport is from December 2008)
the code is not well documented and hence difficult to understand and maintain
the code does not match the in-house development policies or coding standards (for example, the use of exceptions or asserts) and changing them requires significant rework.

Thus, the commercial houses have two major options for using a C++ data structure library:

Option A : Purchase the library software from a vendor and license it appropriately
Option B : Develop the necessary library in-house .

Our experience is with Option B (Develop), and in the remainder of this article we shall share two major lessons learned from that choice. One is the interface design and the other is comparative testing.

Interface design

The first and foremost item in developing a library is designing the interface. By interface, we mean all the public methods and attributes that are visible to the user code. While it is possible to design an interface in multiple ways, it is hard to produce the ‘right’ one. However, though the choice of Option B means developing an in-house implementation, fortunately there is still something that can be ‘borrowed’ from the C++ standard library. The interface!

For the interface of the to-be-developed library, our recommendation is to choose exactly the one specified in the C++ standard.

There are many reasons why.

It is the standard

API design is hard. A study of the obstacles faced by developers when learning APIs [ Robillard09 ] notes:

APIs support code reuse, provide high-level abstractions that facilitate programming tasks, and help unify the programming experience (for example, by providing a uniform way to interact with list structures).

The interface of the C++ standard library is widely known; virtually every C++ programmer is aware of it. For example, to insert an item at the end of a list or vector , the de facto, the idiomatic, and the most natural way is to use the method push_back() .

Since it is the standard, some other benefits include:

Known Roadmap: Between the library developers and the library users, there is a clear understanding about what is offered or may be offered versus what is not.
Reference point: In case of any confusion or disagreement internal to the library development team or between library developers and library users, the standard specification serves as the authoritative reference point.
Time savings: Following a standard eliminates the design debates and the time spent on interface design.
Superior Design: The API of the C++ standard library is standardized by the C++ standardization committee, which includes many of the world's top C++ experts. Over time, it has also been reviewed by other experts outside the committee and used by thousands of projects and millions of users. As a result of such rigorous analysis, extensive review, and widespread use, the interface has become so robust that it would be short-sighted to ignore it.
Cultural effect: The users of the library have the feel of using the C++ standard library, albeit an in-house implementation.

Lower barrier to entry

One of the costs (and often a barrier) of using a library is learning its interface. The aforementioned study warns that:

APIs have grown very large and diverse, which has prompted some to question their usability. It would be a pity if the difficulty of using APIs would nullify the productivity gains they offer.

It would be a bigger pity if programmers had to, on top of that, learn different APIs – for example the C++ standard library and potentially different in-house libraries at different organizations – for doing the same job, such as inserting an element into a list. The number of APIs that we are talking about here is large: dozens of classes, each with scores of methods, scores of algorithms, and a long list of idioms and good practices. There exists a significant amount of material – books, articles, tutorials, blogs, forums, newsgroups, mailing lists – on aspects of the C++ standard library; mastering it and becoming an effective user involves a substantial learning curve.

If the in-house library uses the same API as the C++ standard library, then the cost of training the programmers is drastically reduced, if not eliminated, because they can simply continue to apply their pre-acquired knowledge (or learn from already existing materials). This applies equally well to C++-skilled programmers who are hired in future. Conversely, if the in-house library is built with a different API, much of that knowledge and mastery no longer applies.

Long term impact

Any software interface, standardized or otherwise, has long-term implications. An implementation can be modified easily, but once an interface is published and the remaining code base starts using it, changing that interface is extremely hard. Choosing an already stable interface reduces such impacts.

Also, if for some reason, in future, the organization wants to switch from Option B (Develop) to Option A (Purchase), then the migration is extremely easy because all user code is written against the same interface.

Rule of least surprise

The Art of Unix Programming [ Raymond03 ] observes:

The easiest programs to use are those that demand the least new learning from the user – or, to put it another way, the easiest programs to use are those that most effectively connect to the user’s pre-existing knowledge.

So, following an existing standard is the most natural choice to make.

Testability

If the in-house library follows the same interface as the C++ standard library, then testing the correctness of the library is much easier. This important aspect is now explained in more detail.

Testing

The choice of interface specification is a good first step, but that itself is not sufficient. The crucial design invariant – the interface compatibility with the C++ standard library – has to be actively maintained. That leads to the following questions:

Syntax conformance: Does the in-house library conform to the interface specified by the C++ standard library?
Semantic conformance: Does the in-house library provide behaviour exactly as specified in the C++ standard library?

The solution that we found most useful is to develop a test suite for the library with the following strategy:

Each unit of the library, for example a container, an iterator, an algorithm, or an allocator has its own unit test.
Separate unit tests are independent and stand-alone C++ programs, all of which are run in a regression suite.
Each unit test verifies the behaviour of a unit against the specification in the C++ standard.
A unit test exercises each interface of the unit in all possible ways.

It is best to explain with examples. In the following, excerpts from the vector test code are shown.

Comparative testing

All the tests follow a common structure: at the beginning of the test code, a control is provided to run the test against either a reference standard, or the in-house code. Listing 1 shows the structure for vector .

// vector_test.cpp
typedef unsigned long int Type;
#ifdef STD_REF
  #include <vector>       // From standard library
  typedef std::vector< Type > TypeVector;
#else
  #include "vector.hh"    // From in-house library
  typedef inhouse::vector< Type > TypeVector;
#endif
typedef TypeVector::iterator TypeVectorIter;

#include <cassert>
#include <cstddef>        // size_t
#define UNIT_TEST assert

static const Type Values[] = {10, 20, 30, 40, 50,
   60, 70};
static const size_t ValuesLength =  
   sizeof( Values ) / sizeof( Values[ 0 ] );
int main() { 
  size_t valuesIndex = 0; 
  TypeVector vut;  // Vector Under Test 

  UNIT_TEST( vut.empty() ); 
  for( valuesIndex = 0; 
       valuesIndex < ValuesLength;
       ++valuesIndex ) {
    vut.push_back( Values[ valuesIndex ] );
  }

  UNIT_TEST( ! vut.empty() );
  UNIT_TEST( vut.size() == ValuesLength );
  UNIT_TEST( vut.front() == Values[ 0 ] );
  UNIT_TEST( vut.back() == 
     Values[ ValuesLength - 1 ] );
  valuesIndex = 0;
  for( TypeVectorIter it = vut.begin();
       it != vut.end();
       ++it, ++valuesIndex ) {
    UNIT_TEST( *it == Values[ valuesIndex ] );
    UNIT_TEST( *it == vut[ valuesIndex ] );
    UNIT_TEST( *it == vut.at( valuesIndex ) );
  }
  UNIT_TEST( valuesIndex == ValuesLength );
  UNIT_TEST( ! vut.empty() ); 
  vut.clear(); 
  UNIT_TEST( vut.empty() ); 
 }

Listing 1

First it defines the type of the elements that the vector holds. For simplicity in this example, we used the built-in type unsigned long int, although it could be any user-defined type (struct or class). When the macro STD_REF is defined, we run this unit test on a reference implementation of the standard library. Otherwise, we run this unit test on the in-house library. Observe that, in both configurations, we defined a type named TypeVector. The remainder of the file vector_test.cpp runs all tests on TypeVector, without any knowledge of the source of the library code.

Thus we have a simple way of choosing one among many possible vector implementations and running the unit test on the chosen one. If the implementations conform to the C++ standard, then the unit test compiles with all of them and produces identical results on all of them.

Test construction

The next task of the unit testing strategy is creating the test cases. All the test cases are created as a sequence of two steps:

Do some operation(s) on the unit (here, vector ).
Programmatically verify that the properties and contents of the data structure match the expected result(s).

The rest of Listing 1 shows an example of some simple test cases applied on vector , where programmatic verification is done using asserts.

It tests some methods of vector ( empty() , push_back() , size() , front() , back() , at() , begin() , end() , clear() , operator[] ) and the type vector::iterator .

For each unit, the conformance and correctness testing consists of a few simple steps. The steps for compiling and running for vector are as follows:

CXX="g++ -W -Wall -Werror -ansi -pedantic -std=c++0x"
$CXX -DSTD_REF -D_GLIBCXX_DEBUG vector_test.cpp -o ref_vector
$CXX vector_test.cpp -o inhouse_vector
./ref_vector
./inhouse_vector

Things to note for these steps:

Build settings: The compiler used is the GNU C++ compiler with warnings enabled and treated as errors, and with strict conformance to the C++0x standard. For use in the target environment, the in-house library is (cross) compiled and linked with a different C++ compiler, and (successfully) run on a different OS on a different CPU.
Reference build: The unit test code is built to be run with the reference standard library. Also, the GNU library debug macro ( _GLIBCXX_DEBUG ) is defined for stricter checks. Successful completion of this step implies that the unit test code in vector_test.cpp is syntactically compliant with the reference C++ standard library.
Inhouse build: The unit test code is built to be run with the in-house library. Successful completion of this step along with the previous step implies that, for the interfaces the test uses, the in-house library is also syntactically compliant with the C++ standard.
Reference execution: The unit test is executed on the reference standard library. Successful completion implies that the unit testing code ( vector_test.cpp ) is semantically correct.
Inhouse execution: The unit test is executed on the in-house library. Successful completion indicates that, for the operations exercised, the in-house library is semantically compliant with the C++ standard. In other words, the in-house vector implementation exhibited the expected standard behaviour.

This example is rather simplistic; it uses only a few of the member functions available in vector. In reality, there are many more methods in the vector class template. To obtain basic confidence in the conformance and correctness of the in-house library, the unit test code tests each method in isolation. Then the methods are tested in different combinations and sequences.

Other experiences

Without risking any non-conformance to the standard interface, the implementation of the in-house library can offer some niceties which may or may not be available in other implementations. Here are two examples.

Have log/trace messages at important points in the code that can be triggered based on a log level selected by the user code. For example, generation of log messages whenever memory is allocated or deallocated.
Maintain class invariants. For example, in the list class template, we kept the following private attributes:
- size_ : number of elements in the list (this also helped the size() method to have O(1) complexity)
- news_ : number of times an element is added to the list .
- deletes_ : number of times an element is removed from the list .
Thus we had the following invariant
```
			news_ - deletes_ == size_
```
We asserted on this invariant as a pre-condition and post-condition of every mutator method in the list class template.

Some other general strategies:

Writing the unit tests (for example vector_test.cpp ) before implementing the unit (for example vector ). Since it is known what exactly to expect, the development of a standards compliant in-house library is an ideal scenario for applying the principles of Test Driven Development [ TDD ], and employing it has been immensely helpful to us.
Comparison of program size, for example comparing size between ref_vector and inhouse_vector
Comparison of program speed, for example comparing running time between ref_vector and inhouse_vector

Conclusion

Consistent interfaces make life easier, and software development is no exception. This article emphasizes that the interface provided by the C++ standard library, which sometimes goes unappreciated and overlooked, is very valuable in itself. As the author of the in-house library, I have realized numerous times that choosing to follow the standard interface was the most important design decision made. Following the interface conventions of the C++ standard library has tremendously helped (non-library) programmers to understand and use the newly written in-house library. It brought the programmers to a common and consistent style, both syntactically and semantically. Overall, it has proven to be a great step in reducing software complexity in the organization’s code base.

References

[C++2011] ‘Working Draft, Standard for Programming Language C++’, 02 2011. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

[EC++] ‘The Embedded C++ specification’, 1999. http://www.caravan.net/ec2plus/spec.html

[Implementations] ‘Dinkumware C++ Standard Library’ (http://www.dinkumware.com/), ‘The GNU C++ Library Documentation’ (http://gcc.gnu.org/onlinedocs/libstdc++/), ‘SGI Standard Template Library Programmer’s Guide’, 1994 (http://www.sgi.com/tech/stl/), ‘STLport C++ Standard Library’ (http://www.stlport.org/)

[Josuttis99] N. M. Josuttis, The C++ Standard Library, A Tutorial and Reference . Addison-Wesley, 1999.

[Raymond03] E. S. Raymond, The Art of Unix Programming , 2003. http://catb.org/~esr/writings/taoup/html/ch01s06.html#id2878339

[Robillard09] M. P. Robillard, ‘What Makes APIs Hard to Learn? Answers from Developers’, IEEE Software , vol. 26, no. 6, 2009. http://www.cs.mcgill.ca/~martin/papers/software2009a.pdf

[TDD] ‘Test-driven development’, accessed 2011-March-10. http://en.wikipedia.org/wiki/Test-driven_development