Modelling and Software Development

pattern n. ... 2. a model or design or instructions according to which something is to be made ... 5. a regular form or order in which a series of actions or qualities etc. occur ...

pattern v. 1. to model according to a pattern [1].

Anyone who has studied human geography has probably come across the work of Walter Christaller. In 1933 he proposed a model to describe the settlement patterns in southern Germany (see side bar). While this model shed some light on where settlements had occurred, nobody really expected it to give the actual location of settlements; it described settlement patterns in an abstracted way.

Models, by their nature, make assumptions to simplify things: when Airfix produced their model F-111, the company attempted to show you what a particular jet bomber looked like, but they never claimed it would fly. (OK, I'm sure I was not the only boy who tried to fly the odd Airfix model out of the bedroom window!)

From a programming perspective we are concerned with information models: this places us closer to the models of Christaller than to those of Airfix.

If at this point you wonder what all this has to do with software, let me spell it out: when we create computer systems we are creating models. Sometimes these are obvious: I once worked in a department modelling the electricity market, and its predictions were used directly by management to decide which electricity contracts to sign. Sometimes these models are less obvious: a customer relationship management (CRM) system models the expected interactions between company and customer, and when the model fails we find post-it notes on people's terminals: "If Jack Smith phones, transfer him to Jo."

Our models have boundaries: within these, conditions and scenarios are dealt with. Some boundaries are explicit, some are tacit. Economists are familiar with these boundaries: every economic model comes with the "all other things being equal" precondition. Their models attempt to describe activity provided every other boundary condition remains unchanged. In many cases these models make sweeping generalisations. The monetarist model (see side bar) attempts to predict inflation in a closed economy. In itself it is useless because it is so general, but it does allow economists to reason about an economy. It also forms the starting point for the sophisticated models used to make economic predictions.

Central Place Theory

Originally devised to explain settlement patterns in southern Germany, the theory has been applied to other settlement patterns and found to fit North American geography well. After studying southern Germany, Christaller devised a model which assumed a uniform flat plain with no natural advantages of one place over another, so settlements could occur anywhere. Given this, where would towns occur? He concluded that people would cluster in settlements equidistant from one another, each town having a hinterland of equal size. Circular spheres of influence would leave some areas uncovered, so the model uses hexagons instead of circles, each with a town at the centre.
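Why hexagons rather than circles? The figures below are an added illustration, not Christaller's own working. Pack equal circular hinterlands of radius r as tightly as possible, each touching six neighbours, and each circle sits inside a regular hexagon whose centre-to-edge distance (apothem) is r. The share of the plain the circles actually cover is then:

```latex
\[
\frac{\text{area of circle}}{\text{area of hexagon}}
  = \frac{\pi r^{2}}{2\sqrt{3}\, r^{2}}
  = \frac{\pi}{2\sqrt{3}}
  \approx 0.907
\]
```

Even in the best arrangement, then, roughly 9% of the plain lies outside every town's sphere of influence, and enlarging the circles to close the gaps only makes them overlap. Regular hexagons tile the plain with neither gaps nor overlaps.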

In Christaller's model natural features like rivers and mountains were ignored. The model could suggest where settlements would arise, and natural features could then be allowed for when looking at real settlements: if there was a river close by, the settlement would be a bit closer to it. Above all, the model allowed geographers to reason about settlement patterns, and it provided a common standard by which to measure and compare them: it provided a language to exchange information.

Nobody ever expected to find a perfect example of the model; that wasn't the point. It was a tool for modelling the real world.

What are the models we build?

In software development we build many different kinds of model; on any project, the different models look at the problem domain from different perspectives.

Metaphors are small models

Drawing an analogy by metaphor is setting up a small, quick model: we rely on the fact that the listener already knows the thing we are comparing to, so when Kevlin compares software development to gardening, he is relying on the fact that most of us have an idea what gardening entails.

Metaphors are used in one of two modes:

Educators use a metaphor to describe a new concept to us in terms of a known one
We use it to reason about a system: because X is like Y in one respect, is it similar in other respects?

These are the same process at different stages. We tie our subject (e.g. software engineering) to a target (e.g. gardening) by pointing out a similarity, then follow this up with a less obvious similarity. In doing this we leverage knowledge in one area to increase knowledge in another. Think of it as proof by induction.

Specifications

In traditional development a business analyst writes a specification document that is then implemented. Some organisations still work this way, but many companies make do with a vague statement of intent: "We intend to build an equities trading system". This relies on the developers' understanding of what features an equities trading system should contain; the developers have their own mental model of what the system needs to do. (This explains why banks prefer to hire people with experience in the financial markets, and why these people can command premium wages.)

Monetarist theory of money

One of the simplest economic models, it is also one of the most far-reaching: it was advocated by Milton Friedman and, at least in public, believed by Margaret Thatcher and Ronald Reagan.

M x V = P x T

Where, in any given time period:

M = the quantity of money in circulation
V = the velocity of circulation
P = the average price of the goods sold (the price level)
T = the number of transactions

Spending is the quantity of money multiplied by the speed at which it changes hands; this must equal the value of all the goods sold in the same time period at their prevailing prices. It is obvious really when you think about it: everything we spend must equal everything we buy.

Hence, if we increase the amount of money in the system, all other things being equal (velocity and number of transactions remain constant), prices must rise: inflation, q.e.d.
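The "all other things being equal" step can be made concrete by rearranging the equation for P and holding V and T fixed. This is a minimal C++ sketch; the function name priceLevel and the sample figures are invented for illustration and carry no units.

```cpp
#include <cassert>

// Equation of exchange from the side bar: M x V = P x T.
// Rearranged for the price level: P = (M x V) / T.
// Units are deliberately left abstract; any figures passed in are illustrative.
double priceLevel(double money, double velocity, double transactions) {
    assert(transactions > 0.0);  // the model says nothing if there are no transactions
    return money * velocity / transactions;
}
```

With M = 100, V = 5 and T = 250, priceLevel returns 2; doubling M to 200 while V and T stay fixed returns 4. The code simply restates the arithmetic: it knows nothing of savings, investment or what counts as money, which is exactly the abstraction the side bar describes.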

Again, this is an idealised model and ignores little factors like savings, investment and what actually constitutes money, but the idea is clear. Some elements of reality can be abstracted away to show the central concept. More complex models used by banks, firms and governments peek into the future; each of these makes assumptions, and each has some mathematical model at its heart that, almost by definition, is inaccurate.

At heart, both Christaller's and the monetarist models are attempts to reason about information and systems, which is not that different from what we do when we design a system.

Whether it is a large document, in developers' heads or on many individual CRC cards, the specification is a model of the problem. Yet it is usually incomplete, and frequently inconsistent with itself.

Design

While I hope all systems have a design, I believe most use the Topsy Design Pattern: they just grow.

A design is a high-level representation of the source code and as such it is indisputably a model. It is a model of the solution, not the problem.

Like any model, it makes assumptions and abstractions. It is important that everyone working on the system understands the model: if I think we are building an Airfix Lancaster bomber and you think we are building a B-17, we may well get something that looks like a four-engined World War II bomber, but it will be neither one thing nor the other. Auntie Dotty may think it is a good model, but her terms of reference are different to ours.

Source code

Our ultimate model of the solution: the point where we start to discover the inconsistencies and holes in the specifications.

I once worked on a train timetabling system. The specification was long and inevitably contained omissions and errors. These were fixable: even when they occurred late in the development cycle we could add new rules and change existing ones. The most difficult problems occurred when rules conflicted with each other. Usually this wasn't obvious until the source code was examined and we found that a fix for requirement A had introduced a bug: requirement A was quite respectable, but nobody foresaw that, when implemented, the result contradicted requirement B. Only when the specification was codified in the pure logic of code did it become clear that neither A nor B represented the true requirement.

Our voluminous specification model was neither complete nor self-consistent, and much was still locked in people's heads.

Other models

Specification, design and code may be the first models that spring to mind, but there are other models in software development:

Test suites: tests sit part way between problem and solution; they attempt to apply the solution to the problem.
Process models: we define models for how we develop software, our processes. SSADM, Extreme Programming, waterfall and the like are models of process. Those who read my article on Extreme Programming will recognise this as one of my criticisms [3]: Beck sets out a model called Extreme Programming (XP), then says you cannot modify this model, because if you do it is no longer XP. I can't accept this: XP is a process model, no team will ever have the exact conditions of the C3 team, and it must be adapted for each case.
Delivery schedules: need I say more?

What are our tools?

When we use CASE tools like Rational Rose our modelling is obvious, but even when we write C++, Java or Pascal we are codifying our logic in a language model. The languages and machines that run our models are all Turing equivalent, so no computer or language is really more powerful than another; but each brings different techniques for modelling the problem, for thinking about the problem, and this is where their power lies.

Object-oriented Java is not more powerful than procedural Pascal because it runs faster; it is more powerful because it allows us to think, to model, in terms of different concepts.

Beyond language we have notations: when I draw a UML diagram you know that a rectangle means one thing and a circle means another. Actually, I will take issue with my own argument here: I think many of our notations, especially UML, rely too much on subtleties. A folded corner on a rectangle means it is different from a regular rectangle, while a dotted line is different from a solid line. I think we often try to put too much information into our notations.

I found the following story in Software Fundamentals: "if the presenter showed a block diagram, Dave [Parnas] would ask about the semantics - the meaning of different block shapes ... the meaning of an arrow; whether an unfilled arrow meant something different than a solid, filled arrow... Usually these frills had no meaning. They certainly didn't aid careful analysis, and they often got in the way." [2]

In recent years the patterns community has moved to define more labels for more models. In the case of the GoF book they defined a meta-model that could be used to describe all their models [3]. Now when I say Singleton, Chain of Responsibility or Mediator you know what I mean. OK, I deliberately included Mediator because it is not so well known, and this is one of the problems the patterns community faces: it has been so successful in defining patterns that, outside a core half dozen, few are widely known.
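For readers who have not met Mediator, here is a minimal C++ sketch of the idea as the GoF book describes it: colleague objects talk only to a mediator, which alone knows how they interact. ChatRoom and Participant are invented names for illustration, and the broadcast rule is a deliberately trivial stand-in for real interaction logic.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class ChatRoom;  // the mediator, defined below

// A colleague: holds a reference to the mediator, never to other colleagues.
class Participant {
public:
    Participant(std::string name, ChatRoom& room)
        : name_(std::move(name)), room_(room) {}
    void send(const std::string& text);  // routed through the mediator
    void receive(const std::string& from, const std::string& text) {
        inbox_.push_back(from + ": " + text);
    }
    const std::string& name() const { return name_; }
    const std::vector<std::string>& inbox() const { return inbox_; }
private:
    std::string name_;
    ChatRoom& room_;
    std::vector<std::string> inbox_;
};

// The mediator: the one place that knows who is connected to whom.
class ChatRoom {
public:
    void join(Participant& p) {
        // Joining twice would deliver every message twice.
        assert(std::find(members_.begin(), members_.end(), &p) == members_.end());
        members_.push_back(&p);
    }
    void broadcast(const Participant& sender, const std::string& text) {
        for (Participant* member : members_)
            if (member != &sender) member->receive(sender.name(), text);
    }
private:
    std::vector<Participant*> members_;
};

inline void Participant::send(const std::string& text) {
    room_.broadcast(*this, text);
}
```

Adding a third participant, or changing who hears what, means touching only ChatRoom; the participants stay ignorant of each other. That decoupling is the point of the pattern, though a real mediator would usually carry far more policy than a simple broadcast.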

Why are our models inexact?

Like Christaller, we can never expect our models to describe a situation exactly: by their nature they are abstractions. The simplifications we make to create and generalise the model reassert themselves in real life. This brings us into the realm of Chaos Theory: a small variation can, when repeated over time, magnify into a significant difference.
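That sensitivity can be seen in a few lines of C++. This sketch uses the logistic map, a standard textbook example of chaos chosen here for illustration rather than taken from the article: with r = 4 the gap between two nearby runs roughly doubles on average at each step, so runs that start 1e-10 apart soon bear no relation to each other. The function names and starting values are arbitrary assumptions.

```cpp
#include <cassert>
#include <cmath>

// One step of the logistic map x -> r * x * (1 - x).
// With r = 4 the map is chaotic on the interval [0, 1].
double logisticStep(double x, double r = 4.0) {
    return r * x * (1.0 - x);
}

// Iterate two copies of the map whose starting points differ by 'gap' and
// return the largest separation seen, including the initial one.
double maxDivergence(double x0, double gap, int steps) {
    assert(steps >= 0);
    double a = x0;
    double b = x0 + gap;
    double largest = std::fabs(b - a);
    for (int i = 0; i < steps; ++i) {
        a = logisticStep(a);
        b = logisticStep(b);
        largest = std::fmax(largest, std::fabs(b - a));
    }
    return largest;
}
```

maxDivergence(0.2, 1e-10, 0) is just the initial gap of about 1e-10, but over 100 iterations the separation grows to a sizeable fraction of the whole [0, 1] range: a rounding error in the inputs has become a completely different outcome.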

We also face the problem of Catastrophe Theory: when multiple parameters are varied, things can suddenly break down. And as if this weren't bad enough, we also have to face the law of diminishing returns [4].

The key with a model is identifying variability [5]. However, we frequently miss points of variability and need to adjust our models accordingly; but add too many points of variability, too many ifs and buts, and instead of a model we have just a list of special cases.

The more parameters a model has, the less useful it is. There is no model of the game of soccer because there are too many parameters which can affect the game; we can only make generalisations: Manchester United usually win, Everton usually lose [6].

We should not expect our models to be exact, nor should we expect to follow them blindly. Sometimes it just doesn't make sense. Yes, we would like our singletons to be nicely destroyed at the end of the program, but what does it matter if the OS will clean things up? Sometimes the work involved is not justified: consider the GPS system in a cruise missile. What does a memory leak matter in the final few milliseconds before impact? Attempting to have a GPS singleton delete itself neatly is more work for the programmers, and adds more variability to a system at the exact moment when it must be totally predictable.
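In C++ the two choices might look like this. It is a hedged sketch, not taken from any real guidance system; TidyGps and LeakyGps are hypothetical names. The first is a function-local static, which the runtime destroys during normal program termination; the second is created with new and deliberately never deleted, so no destructor runs at shutdown and the operating system reclaims the memory when the process ends.

```cpp
// Hypothetical "tidy" singleton: a function-local static, constructed on first
// use and destroyed automatically during normal program termination.
class TidyGps {
public:
    static TidyGps& instance() {
        static TidyGps theInstance;
        return theInstance;
    }
    void recordFix() { ++fixes_; }
    int fixes() const { return fixes_; }
    TidyGps(const TidyGps&) = delete;
    TidyGps& operator=(const TidyGps&) = delete;
private:
    TidyGps() = default;
    int fixes_ = 0;
};

// Hypothetical "leaky" singleton: allocated once with new and never deleted.
// No destructor runs at exit; the operating system reclaims the memory.
class LeakyGps {
public:
    static LeakyGps& instance() {
        static LeakyGps* theInstance = new LeakyGps;  // intentionally leaked
        return *theInstance;
    }
    void recordFix() { ++fixes_; }
    int fixes() const { return fixes_; }
    LeakyGps(const LeakyGps&) = delete;
    LeakyGps& operator=(const LeakyGps&) = delete;
private:
    LeakyGps() = default;
    int fixes_ = 0;
};
```

Both give callers the same interface. The leaky form also sidesteps destruction-order problems between singletons, because nothing is torn down; the cost is that any cleanup a destructor would have done simply never happens, which is only acceptable when, as in the missile, nobody needs it.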

And this applies not only to code and patterns: the process models of Yourdon, Jackson and Beck are really just more Christaller models. Yes, we can learn from them, we can compare ourselves to them, but attempting to adopt them exactly as laid out by the authors is about as sensible as ignoring a mountain range when building your village!

Summary: Just what does a model give us?

Models provide us with many benefits:

They give an idea or concept a label
A label allows us to communicate more efficiently
They allow us to compare different concepts
They qualify the main characteristics of an idea

However, they come with drawbacks:

No model will ever exactly describe an idea: if it does it is not a model
They selectively hide elements, and what is hidden can be manipulated to the advantage of the modeller
Applied incorrectly they can be wasteful of resources: you can't apply Christaller to mountainous regions; to do so would simply waste your time.

The value of a model lies in the abstractions it makes: by focusing on the important elements we are not distracted by the irrelevant ones. This is a classic definition of software abstraction, but it also brings us back to Christaller: on the German plains Christaller accurately isolated the abstractions that describe settlement patterns. But you would never expect to find Christaller's settlement patterns exactly, because there are things like rivers, hills, particularly fertile areas of land and so on. Equally, you should not expect your software model, your pattern, to describe your software exactly.

No more green fields

I carry this model of the perfect job in my mind: at the interview I'm told, "we have a few people here, we have a problem and we have no idea how we are going to solve it". Well, I never really expect to hear this, but I would settle for: "our current team has decided to develop a completely new product without using any existing code".

What I want is green field development: a brand new project, with no old code to maintain, no creaking database schema, no legacy source code control system...

In my old age, in my scepticism, I don't believe there are any green field projects left. So much has been done that every area has been touched by legacy systems. Sometimes this is obvious: we must keep the current system working. Sometimes it is contradictory: "we are writing a new product but we plan to salvage as much as we can from the old one". Even if a company were to throw away all the code, it would want to keep the user interface, or the database format, or the file format.

Yesterday's models form part of today's reality: people's expectations have changed. Ten years ago a text-based calendar system was magic; today people want an easy-to-use GUI, voice control, and, and...

Your perspective is always shaped by what you know and what has come before; today's problems are shaped by what exists. So, even if a company doesn't have an application installed, its users will be accustomed to some systems [7].

Some of these advances are good: defining an XML schema is better than defining a byte-by-byte file format. But such developments can also be limiting: you may need to use XML because it is a buzzword; never mind that you are writing Space Invaders, you must use XML somewhere! Models can constrain us too.

It is not only technology that limits us. If you think of your problem domain as a blank canvas, or better, a Christaller-like uniform plain, it is still bounded:

Our Easterly edge is compatibility: we must take data from this system, or produce output for that system, we are asked to reuse as much of the existing system as we can - even though we are supposed to replace it!
A project deadline and costing form our Westerly edge: maybe we have a drop-dead project deadline, or maybe it simply affects our bottom line! Maybe if we don't deliver in time the company will cease to be. Even worse can be no deadline: an abyss on the Western edge into which our team disappears on a blue-sky mission to seek out and explore strange new technologies.
To the North there is the advance of technology: our project will be neither the first nor the last system to be out of date when delivered; or maybe you must target 386 machines running Windows 3.1 to save on costly upgrades.
And to the South our co-workers: we usually don't get to choose the people we work with. Some are there when we arrive, some are hired without our involvement; some may be fresh from college and lacking in real-world experience, others may be jaded by too many failed projects or overexposure to COBOL and Lisp.

And then there is terrain that is not flat and fertile:

Barren patches mean that crops don't take: you may advocate Extreme Programming, templated designs or code reviews, but if people are unenthusiastic and management won't back you, nothing will grow.
Quagmires: you advance a generic design, but before you can cut a line of code you are taken at your word and now your design must work for your department and four others. You are bogged down in endless meetings, advocacy, design reviews. Sir Humphrey would be proud!
The seasons change: in the autumn the company decides to standardise on Oracle for all database work, so suddenly you must abandon SQL Server; winter brings a cash flow crisis and nothing will get signed off; spring brings a thaw but suddenly Java is in favour; and then comes summer, when everyone is on holiday, nothing is decided and even less done!

And through the centre of the plain runs the rift valley of office politics: will your proposals prove too radical for the management? Would co-operation with the New York office further someone's career? Does the office secretary hate contractors and make their life difficult? Could you use COM in your design? Is the senior architect convinced he knows all the answers? Is the director's ear bent by a long standing employee who grew up on Pascal and frankly, doesn't know the first thing about objects? Sometimes you have to choose your battles.

Conclusion

Models are an essential way of abstracting problems and patterns. Using models we can codify and communicate ideas, and this in turn allows us to learn and to share ideas. However, as other disciplines know, models have limits: we should not expect our idealised models to be used straight out of the box. Every model, whether it is a design pattern, a process or a programming style, must be adapted to our present circumstances.

[1] Oxford paperback dictionary, 1983.

[2] Introduction to Abstract Types Defined as Class Variables, Software Fundamentals: Collected Papers of David L. Parnas, edited by Hoffman and Weiss, Addison-Wesley, 2001.

[3] Design Patterns, Gamma et al, Addison-Wesley, 1995.

[4] See any good economics textbook for a description of diminishing returns. Nor do I give references for Chaos Theory, Catastrophe Theory, Christaller or Monetarism; likewise these can be found in good mathematics, human geography and economics textbooks. Google searches provide lots of sources on all.

[5] See Coplien, Multi-Paradigm Design for C++, Addison-Wesley, 1998, for discussions on commonality and variability analysis.

[6] This will change next season, I'm sure.

[7] Overload 37, May 2000.