Supporting Threads in Standard C++ (Part 3)

Recap

The previous articles in this series on threads offered two suggestions:

Use a thread function interface class to separate the application-specific code from the thread library facilities (the Thread-Runs-Polymorphic-Object design).
Use a Handle-and-Reference-Counted-Body technique to manage the lifetime of objects representing threads.

The resulting thread model can form the basis of a useful thread library. In practice, however, it does not address some problems.

Active Objects

At the heart of the project I have been working on for the past two years are active objects known as Jobs. There are several different kinds of Job and each runs in its own thread. Roughly speaking, a Top Job runs Middle Jobs and a Middle Job runs Bottom Jobs. (I have changed the names to protect the innocent.)

So, naturalists observe, a flea hath smaller fleas that on him prey; and these have smaller fleas to bite 'em, and so proceed ad infinitum. (Jonathan Swift)

Figure 1 - Job class structure.

// A 2-step Job.
class Job : public Thread::Function {
public:
  Job() : thread(0), done(false) {}
  ~Job() {delete thread;}
  void Start();
  void Step1_Completed();
  void Step2_Completed();
  void Wait() const {thread->wait();}
  . . .
private:
  class Command;
  virtual int Run();  // implement Thread::Function::Run()
  void Start_Step1();
  void Start_Step2();
  void Terminate() {done = true;}
  Thread* thread;
  Queue   command_queue;
  bool  done;
};
void Job::Start() {
  thread = new Thread(*this);
  command_queue.Put(new Command(this,
                &Job::Start_Step1));
}
void Job::Step1_Completed() {
  command_queue.Put(new Command(this,
                 &Job::Start_Step2));
}
void Job::Step2_Completed() {
  command_queue.Put(new Command(this,
                 &Job::Terminate));
}

Each Job class hides its thread behind a suitable interface, as shown in Figure 1. The thread model is essentially the same as the Thread-Runs-Polymorphic-Object design described in the previous articles.

Each Job contains a queue of Command objects [ GoF ]. Public member functions add commands to the back of the queue; the private Run() function removes commands from the front of the queue and executes them in the internal thread. Figure 1 shows the structure of a 2-step Job class.
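Figure 1 declares Job::Command but does not define it. The following is a minimal sketch of such a class. It assumes that a Command simply binds a Job pointer to a pointer-to-member-function and calls it when executed. The log member and the public placement of Command are hypothetical, and were added only so the example can be exercised on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cut-down Job: just enough to show the Command class.
class Job {
public:
    std::vector<std::string> log;  // records which steps ran (illustration only)
    class Command;                 // public here so the example can build commands directly
    void Start_Step1() { log.push_back("step 1"); }
    void Start_Step2() { log.push_back("step 2"); }
};

// A Command binds a Job object to one of its member functions [GoF Command].
class Job::Command {
public:
    typedef void (Job::*Action)();
    Command(Job* job, Action action) : job_(job), action_(action) {}
    // Executing the command calls the bound member function on the bound Job.
    void operator()() const { (job_->*action_)(); }
private:
    Job*   job_;
    Action action_;
};
```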

The queue class simplifies the Job functions by providing blocking read and write functions that can safely be called from any thread. The Command objects call one of the Job's member functions to do the real work.
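The Queue class itself is not shown. The sketch below is one minimal way to build it, and it keeps the Put()/Get() names used in the figures. It assumes an unbounded queue built on std::mutex and std::condition_variable. Under that assumption Put() only waits briefly for the lock, and Get() is the only call that waits for data.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>   // used by the usage example
#include <utility>

// Minimal thread-safe FIFO (assumed unbounded).
template <typename T>
class Queue {
public:
    void Put(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        not_empty_.notify_one();  // wake one waiting reader
    }
    T Get() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() re-checks the predicate, so spurious wake-ups are harmless
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
private:
    std::mutex              mutex_;
    std::condition_variable not_empty_;
    std::deque<T>           items_;
};
```

A reader that calls Get() on an empty queue simply sleeps until a writer in another thread calls Put().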

A client at a higher level creates a Job and calls the Start() function, which starts the Job's thread, creates a new start-step-1 command and adds it to the queue. The Run() function removes the command from the front of the queue, executes it (which initiates step 1) and deletes it.

When step 1 finishes, a client at a lower level calls Step1_Completed(), which adds a new start-step-2 command to the back of the command queue. The Run() function removes the command from the front of the queue, executes it (which initiates step 2) and deletes it.

Similarly, when step 2 finishes, a client at a lower level calls Step2_Completed(), which adds a new terminate command to the back of the command queue. The Run() function removes the command from the front of the queue, executes it (which sets the done flag) and deletes it. The Run() function then exits its processing loop and the thread terminates.

Figure 2 - The Run() function.

int Job::Run() {  // 'virtual' belongs only on the declaration in the class
  while (!done)
  {
    Command* command = command_queue.Get();
    (*command)();
    delete command;
  }
  return 0;
}

Note that all the public member functions execute in a client thread and all the private member functions execute in the Job's own thread. This is how the Job class hides its thread from the client code.
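Taken together, Figures 1 and 2 can be turned into the self-contained sketch below, which replays the sequence described above. It assumes std::thread in place of the article's Thread class and a simple std::mutex/std::condition_variable queue. The Log() and StepThread() accessors are hypothetical additions. They check that the steps run in order and that they run in the Job's own thread rather than the caller's.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Minimal blocking queue (assumed unbounded, so only Get() waits).
template <typename T>
class Queue {
public:
    void Put(T item) {
        { std::lock_guard<std::mutex> lock(mutex_); items_.push_back(item); }
        not_empty_.notify_one();
    }
    T Get() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = items_.front();
        items_.pop_front();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};

// The 2-step Job of Figures 1 and 2, with std::thread standing in for the
// article's Thread class (an assumption; the original library is not shown).
class Job {
public:
    Job() : done(false) {}
    ~Job() { Wait(); }                        // blocks until Terminate() has run
    void Start() {
        worker = std::thread([this] { Run(); });
        command_queue.Put(new Command(this, &Job::Start_Step1));
    }
    void Step1_Completed() { command_queue.Put(new Command(this, &Job::Start_Step2)); }
    void Step2_Completed() { command_queue.Put(new Command(this, &Job::Terminate)); }
    void Wait() { if (worker.joinable()) worker.join(); }
    // Hypothetical observation helpers; only safe to call after Wait().
    const std::vector<std::string>& Log() const { return log; }
    std::thread::id StepThread() const { return step_thread; }
private:
    class Command {
    public:
        Command(Job* j, void (Job::*a)()) : job(j), action(a) {}
        void operator()() const { (job->*action)(); }
    private:
        Job* job;
        void (Job::*action)();
    };
    void Run() {                              // runs in the Job's own thread
        while (!done) {
            Command* command = command_queue.Get();
            (*command)();
            delete command;
        }
    }
    void Start_Step1() { step_thread = std::this_thread::get_id(); log.push_back("step 1"); }
    void Start_Step2() { log.push_back("step 2"); }
    void Terminate()   { done = true; }

    std::thread worker;
    Queue<Command*> command_queue;
    bool done;                                // touched only by the Job's own thread
    std::vector<std::string> log;
    std::thread::id step_thread;
};
```

The public functions only put commands on the queue. All the private functions run inside Run(), on the worker thread.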

Now, I could not help noticing that there is a lot of duplication here. Top, Middle and Bottom Jobs all look much the same. And the same pattern (small 'P') crops up elsewhere in our code. This is a golden opportunity for some refactoring [ Fowler ].

Refactoring

In C++ there are several ways to factor out common code. We could create a common base class, for example, or a class template, or a separate module altogether. Which should we choose for the family of Job classes?

The template option is not appropriate here because the Job classes do not have a common interface. In the real application Top, Middle and Bottom Jobs have public member functions with very different names and purposes. A Job base class would cure the code duplication problem for the Job classes, but the same problem occurs elsewhere in the program. So, I think we are led to a more general refactoring - one that works for active objects of all kinds.

In the good old days, when multi-threading operating systems were not generally available [1], many software systems were divided into multiple processes. Such systems were often designed using Mascot diagrams. Those diagrams showed processes as circles and the processes were linked by "channels" that functioned as message queues. In a Mascot diagram, "process" meant a real (physical) process as known to the operating system. (Data flow diagrams from the era of structured design methodologies are very similar except that they show abstract (logical) processes. The logical processes in a DFD could be nested and mapped to physical processes in arbitrary ways.)

Designing systems using Mascot diagrams was relatively straightforward. Concurrency problems did arise, but (as far as I remember) they were less numerous and less troublesome than they are in modern multi-threading programs. I think this is partly because software these days is more complex and threads encourage more concurrent processing. But it is also partly because the art of designing software for a multi-threading environment seems to have been lost.

The Mascot model is very general. It qualifies as a Design Pattern (capital 'P') [ GoF ] and it solves certain problems of communication between concurrent threads. So, instead of re-inventing the wheel, I propose to use this Pattern to build a new thread model - an "application-oriented" model - and then use the new thread model to refactor the Top, Middle and Bottom Job classes.

Application-Oriented Thread Model

To keep things simple I shall describe a rather naïve implementation of the application-oriented thread model. Better implementations will be mentioned, but not explained in detail.

The key feature of the application-oriented thread model is that the thread class contains an input queue. The queue contains pointers to function objects that are executed in sequence in the context of the thread. Each function object must conform to the interface defined by the thread::command class.

Figure 3 - Application-Oriented Thread Interface.

// Generic thread class.
class thread {
public:
  class command;
  thread();
  ~thread();

  void push_back (command*);
  void terminate();
  void wait() const;
  . . .
private:
  class body;
  body* hidden_implementation;
};

class thread::command {
public:
  virtual ~command() {}
  virtual void operator() () = 0;
};

Figure 3 shows the thread and thread command classes. It also shows a thread body class declared as an incomplete type. The thread body hides all details of the implementation. (This is the Cheshire Cat/Pimpl/Bridge Pattern, again.)

Once started, a Job proceeds in a series of event-driven steps. The hardware drives the Bottom Jobs, which drive the Middle Jobs which, in turn, drive the Top Jobs.

As with the Job classes illustrated above, client code adds function objects to the queue using the push_back() function, while the hidden implementation removes each function object from the queue, executes it and disposes of it using the delete operator. A better implementation would provide command objects in the form of Handles that could be passed by value, giving a more flexible way of managing the lifetime of the command objects.
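The sketch below shows one way to give commands value semantics. It assumes that std::shared_ptr is an acceptable stand-in for a hand-written Handle/Body pair. Copies of a command_handle share one body, and that body is deleted when the last handle goes away, so the queue no longer has to call delete itself. The command_handle and counting_command names are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <memory>

// Abstract command, as in Figure 3.
class command {
public:
    virtual ~command() {}
    virtual void operator()() = 0;
};

// Hypothetical value-semantic Handle: copies share one reference-counted body.
// std::shared_ptr stands in for a hand-written Handle/Body implementation.
class command_handle {
public:
    explicit command_handle(command* body) : body_(body) {}
    void operator()() const { (*body_)(); }
private:
    std::shared_ptr<command> body_;
};

// Example command that counts executions and destructions (illustration only).
struct counting_command : command {
    int& runs;
    int& deaths;
    counting_command(int& r, int& d) : runs(r), deaths(d) {}
    ~counting_command() { ++deaths; }
    void operator()() { ++runs; }
};
```

With handles in the queue, the consumer can simply pop an entry after executing it. The body survives for as long as any other handle still refers to it.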

The terminate() function just adds a 'terminate' command object to the back of the queue. The 'terminate' command is provided by the implementation; it does not have to be provided by the client code. Executing the 'terminate' command causes the thread to terminate.

The wait() function suspends the calling thread until the thread represented by the thread object has terminated. When wait() returns, it is safe to destroy the thread object. A Handle/Body mechanism would provide more flexible lifetime management here, too. (The previous article in this series illustrates this technique.)
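A naive implementation of the Figure 3 interface might look like the sketch below. It assumes C++11 std::thread, std::mutex and std::condition_variable underneath. The app namespace, the body's member names and the "safety net" in the body destructor are illustrative choices, not part of the original design. As described above, the terminate command is supplied by the implementation. Each command runs outside the lock, so a command may itself call push_back().

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

namespace app {  // hypothetical namespace, to avoid name clashes

class thread {
public:
    class command;
    thread();
    ~thread();
    thread(const thread&) = delete;             // one object per running thread
    thread& operator=(const thread&) = delete;

    void push_back(command*);  // takes ownership; executed later in this thread
    void terminate();          // queues the implementation's 'terminate' command
    void wait() const;         // blocks the caller until this thread has finished
private:
    class body;
    body* hidden_implementation;
};

class thread::command {
public:
    virtual ~command() {}
    virtual void operator()() = 0;
};

// The hidden body: a std::thread draining a mutex-protected queue.
class thread::body {
public:
    // Provided by the implementation, not by client code.
    class terminate_command : public command {
    public:
        explicit terminate_command(body& b) : owner(b) {}
        void operator()() { owner.done_ = true; }
    private:
        body& owner;
    };

    body() : done_(false), worker_([this] { run(); }) {}
    ~body() {
        if (worker_.joinable()) {               // safety net if wait() was never called
            put(new terminate_command(*this));
            worker_.join();
        }
        for (command* c : queue_) delete c;     // anything queued after 'terminate'
    }
    void put(command* c) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push_back(c); }
        not_empty_.notify_one();
    }
    void join() { if (worker_.joinable()) worker_.join(); }  // call from one waiter only

private:
    void run() {                                // the thread's processing loop
        while (!done_) {
            command* c;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                not_empty_.wait(lock, [this] { return !queue_.empty(); });
                c = queue_.front();
                queue_.pop_front();
            }
            (*c)();                             // execute outside the lock
            delete c;
        }
    }
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<command*> queue_;
    bool done_;                                 // touched only by the worker thread
    std::thread worker_;                        // declared last: starts once the rest exist
};

thread::thread() : hidden_implementation(new body) {}
thread::~thread() { delete hidden_implementation; }
void thread::push_back(command* c) { hidden_implementation->put(c); }
void thread::terminate() {
    hidden_implementation->put(new body::terminate_command(*hidden_implementation));
}
void thread::wait() const { hidden_implementation->join(); }

}  // namespace app
```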

All the declarations should be in a suitable namespace to minimise the risk of name clashes.

The effect of refactoring the Job classes to use the new thread model is illustrated in Figure 4.

Figure 4 - The Refactored Job Class.

// 2-step Job class.
class Job : public thread {
public:
  void Start();

  void Step1_Completed();
  void Step2_Completed();
private:
  class Command;
  void Start_Step1();
  void Start_Step2();
};

void Job::Start() {
  push_back(new Command(this, &Job::Start_Step1));
}
void Job::Step1_Completed() {
  push_back(new Command(this, &Job::Start_Step2));
}

void Job::Step2_Completed() {
  terminate();
}

This version of a Job class inherits all the thread behaviour from the thread class. It needs no user-defined constructor or destructor, no implementation of the Run() function, no Terminate() function and none of the data that the earlier version of the class required. The code that is left is all specific to the Job class. And the interface has remained unchanged. All the hallmarks of a successful refactoring are here.
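The compact, self-contained sketch below checks that Figure 4 really needs nothing more. Its thread class is a cut-down stand-in for Figure 3. The Pimpl is omitted for brevity, and a virtual destructor is added because Job derives from the class. The steps member is a hypothetical addition, used only to observe the Job. As with the original design, wait() must return before the Job is destroyed, because the base-class destructor runs only after Job's own members have been destroyed.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

namespace app {  // hypothetical: compact stand-in for Figure 3 (Pimpl omitted)

class thread {
public:
    class command {
    public:
        virtual ~command() {}
        virtual void operator()() = 0;
    };
    thread() : done_(false), worker_([this] { run(); }) {}
    virtual ~thread() {                       // virtual because Job derives from thread
        wait();                               // assumes terminate() was queued earlier
        for (command* c : queue_) delete c;
    }
    thread(const thread&) = delete;
    thread& operator=(const thread&) = delete;

    void push_back(command* c) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push_back(c); }
        not_empty_.notify_one();
    }
    void terminate() { push_back(new terminate_command(*this)); }
    void wait() const { if (worker_.joinable()) worker_.join(); }

private:
    struct terminate_command : command {
        explicit terminate_command(thread& t) : owner(t) {}
        void operator()() { owner.done_ = true; }
        thread& owner;
    };
    void run() {
        while (!done_) {
            command* c;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                not_empty_.wait(lock, [this] { return !queue_.empty(); });
                c = queue_.front();
                queue_.pop_front();
            }
            (*c)();
            delete c;
        }
    }
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<command*> queue_;
    bool done_;
    mutable std::thread worker_;  // mutable so that wait() can stay const, as in Figure 3
};

}  // namespace app

// Figure 4's Job: no constructor, destructor, Run() or Terminate() of its own.
class Job : public app::thread {
public:
    void Start()           { push_back(new Command(this, &Job::Start_Step1)); }
    void Step1_Completed() { push_back(new Command(this, &Job::Start_Step2)); }
    void Step2_Completed() { terminate(); }
    std::vector<std::string> steps;  // hypothetical; written by the Job's thread, read after wait()
private:
    class Command : public app::thread::command {
    public:
        Command(Job* j, void (Job::*a)()) : job(j), action(a) {}
        void operator()() { (job->*action)(); }
    private:
        Job* job;
        void (Job::*action)();
    };
    void Start_Step1() { steps.push_back("step 1"); }
    void Start_Step2() { steps.push_back("step 2"); }
};
```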

Reflection

We started with a family of active objects (the Top, Middle and Bottom Jobs). We identified a classic case of unwarranted code duplication. And we refactored to remove the duplication. The whole process was a routine application of good programming principles that led to a thread model that suited one particular program. But we have not compared the new thread model with the more common designs, nor have we considered whether the new model would be useful in other applications.

In discussing these points, I must stress that I do not have enough data for any firm conclusions. Nevertheless, it is worth explaining how the application-oriented thread model arose and exploring some of the characteristics of the two threading models.

In our project, the Top, Middle, Bottom hierarchy of Jobs was established early in the design. The normal path through each Job is a simple sequence or repeating loop. However, the user can intervene at any time to pause or abort the Job. At each step the Job would initiate an asynchronous operation (e.g. start a lower-level job or perform an operation on a device) and wait for one of several outcomes (success, failure, pause or abort). The Win32 WaitForMultipleObjects() API call seemed to provide a natural way to do this. In practice, though, managing the multiple Event objects required by this approach became rather complex.

Although it was far from clear at the time, replacing the multiple Events with some sort of message queue simplified the design considerably. A Job could now wait for a "new message" event, read the message and perform some appropriate action. In effect, multiple un-parameterised Events were replaced by a single parameterised Event, with the message as the parameter.

The downside of a classic message queue is that the receiving function tends to become a big, error-prone switch statement. Our answer to that problem was to put Command objects into the input queue, or rather pointers to Command objects, because Commands are necessarily polymorphic. And that is the design described at the beginning of this article.
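For comparison, the sketch below shows the kind of receiver that a classic message queue tends to produce. The Message values follow the outcomes listed earlier (success, failure, pause, abort). The StepState type and the handle() function are hypothetical. Every new message kind, and every step that treats a message differently, adds more cases to the switch. A queue of Command objects instead lets each message carry its own behaviour, so the loop of Figure 2 stays unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical message type: one parameterised "event" carrying the outcome.
enum class Message { Success, Failure, Pause, Abort };

// Hypothetical Job state, for illustration only.
struct StepState {
    int step = 1;
    bool paused = false;
    bool finished = false;
};

// Classic message-queue receiver: all behaviour is concentrated in one switch.
std::string handle(StepState& s, Message m) {
    switch (m) {
    case Message::Success: ++s.step;          return "advance";
    case Message::Failure: s.finished = true; return "fail";
    case Message::Pause:   s.paused = true;   return "pause";
    case Message::Abort:   s.finished = true; return "abort";
    }
    return "unknown";  // not reached for valid enumerators
}
```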

This is just one team's experience and one data point does not make a trend, but it suggests that threads and input queues work well together. The design methods based on Mascot and Data Flow diagrams seem to support this idea. The input queue of the application-oriented model does, of course, carry more overhead than the lower-level thread designs supported by most threading libraries. But I wonder if multi-threaded applications usually build in such machinery anyway. If so, a library supporting application-oriented threading could be a useful addition to the programmer's toolkit.

References

[Fowler] Refactoring - Improving the Design of Existing Code, by Martin Fowler, ISBN 0-201-48567-2.

[GoF] Design Patterns - Elements of Reusable Object-Oriented Software, by Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, ISBN 0-201-63361-2.

[1] In those days operating systems were simple, efficient and reliable.