Introduction to Expert Programming (in C)

Lesson 5: Simplicity (continued)

For this lesson, we emphasize particular types of simplicity: determinism and equivalences. The C language behaves more ``deterministically'' than many other programming languages, and this is reflected in several cases of equivalence between superficially different syntax.

After reviewing the C syntax of pointers and arrays to expose its simplicity, we challenge the student to practice his programming skill with two exercises. If you think these exercises are difficult, a ``self-fulfilling prophecy'' may make your programs long and difficult. If you recognize their inherent simplicity you will be through this lesson quickly.

5.1	Simplicity is the enemy of uncertainty.
5.1	Design Goal: Determinism and Regularity

5.2	Simplicity means conciseness and consistency.
5.2	Design Goal: Pointer/Array syntax in C

5.3	Simplicity may require retyping working code.
5.3	Exercise: Path-counting routine

5.4	Simplicity of the whole may require complexity of a piece.
5.4	Exercise: Waldo's Nim (Sparse Arrays)

5.5	Simplicity is in the eye of the beholder.
5.5	Design Goal: Macros for Simplicity

S i m p l i c i t y i s t h e e n e m y o f u n c e r t a i n t y .

Lesson 5.1: Determinism and Regularity

In a car with automatic transmission, you may not be able to predict whether it will down-shift when you step on the gas -- that's a function of engine speed, how hard you push the pedal and so on. The automatic algorithm may be fine, even better than what the human operator would have done with a manual transmission, but that's not my point here. The point is you're no longer ``in the driver's seat'' even if that's where you're sitting!

People buy automatic transmissions because that's what they want, but what would you think of a manual gearshift which sometimes puts you in 4th when you shift to 3rd because, to the machine, 4th seems like a ``friendlier'' gear? I'd like you to think of situations, whether computer-related or not, where you are often inconvenienced or annoyed by automated processes that do not have clearly defined responses. Just to give my pet peeve, consider certain mouse-clicks (say in a browser) in an Internet-cafe environment. Many of these will have meanings dependent on the history of earlier cafe users. This kind of uncertainty (non-determinism) is exacerbated by malicious webware (e.g. what looks like a Close-click downloading a porno viewer instead).

Programming languages sometimes have non-deterministic behavior. In FORTRAN, the behavior of a DO-loop may be unpredictable if you `GOTO' the middle of the loop. This is for a good reason -- FORTRAN compilers trade-off determinism in irregular codings in order to emphasize highly optimized object code -- but I personally like determinism (`what you see is what you get'). My cars have manual transmissions, and when I want a special code optimization in my software, I optimize it myself.

Another example of non-determinism in programming would be using routines for which no source code is provided. This may be OK for very well-defined functions, but many functions aren't quite as well-defined as they might pretend. In the same vein would be overloadings in a C++ program. Even if the relevant source codes are visible, it will not always be immediately clear which object's properties are inherited in a given context. The answer may be deterministic, but if you have to study and solve a maze of inheritance relationships to find the answer, you are denied the clear simplicity which is the fruit of determinism.

Another virtue whose fruit is clarity and simplicity is regularity, by which we mean data structures or handlings which are made equivalent, so that a multiplicity of objects becomes a single object type.

Approximate equivalence is not the same thing as true equivalence. I'm sure we've all been frustrated by things which purported to be ``almost equivalent'' to something else, but where we got bitten by one of the exceptions. It's a very comfortable feeling when you know that an equivalence is 100%, complete, total and with no exceptions whatsoever.

Here's a simple example in C:


        (p && q)
        (p ? !!q : 0)

These two expressions are exactly the same in C. They're not usually the same or almost the same or sometimes the same; they are always exactly identical. (As long as the two codes have exactly the same effect, a compiler might generate different codes for the two cases, but I'd be at least mildly disappointed of a good one that did).

In this and all other C language discussions we exclude pre-processing tricks. Obviously you can use ``#define'' tricks (e.g. macros with unbalanced parentheses) to create exceptions to the rules I show, but would that really be helpful?

Accept no imitations of regularity. Often an unabashedly irregular system will be less confusing that one which aspires only to almost-regularity.

Here's another example of C synonyms:


        (p, q)
        (p ? q : q)    /* provided p is not of void type */
        ((p, q))

The special comma operator in C is almost notorious in confusing learners. Textbooks talk about a ``sequence point'' and some beginners get the idea that C's special comma operator has some special purpose they cannot quite understand. However (p, q) has exactly the same effect as (p ? q : q) and the textbooks don't usually bother with the term ``sequence point'' in explaining the latter. Next time someone expresses confusion about the comma-operator, show him the alternative (p ? q : q). (If he says that looks bizarre, answer that avoiding the bizarrity is the only reason C bothers having the comma operator at all.)

(Of course C uses commas in a variety of list contexts that have nothing to do with the ``special'' comma operator. Reviewing simple C syntax is not a purpose of these lessons, but I mention in passing that the comma in a formula ((p, q)) will always be the ``special comma'' -- if it is legal at all -- and the comma in (p, q) is ``special'' if and only if it has the same effect as ((p, q)).)

(Exercise: Think of some other expressions in C which have an alternative exactly equivalent expression.)

Applications and command-processing shells follow a convention in Unix, where the first three file descriptors are called stdin (0), stdout (1) and stderr (2). The stdio library also refers to three objects of the same names, and they are almost the same. But you may have been bitten by the following behavior:


        fclose(stdin);                      /* close any input script */
        freopen("Torture.log", "a", stderr);  /* redirect diagnostics */
        freopen("/dev/tty", "r", stdin);    /* use terminal for input */
        printf("stdin=%d stderr=%d\n", fileno(stdin), fileno(stderr));
        status = system("torturer");         /* setup complete -- run */

Compile and run this program (without the system()) if you don't see the problem. The equivalence between the two entities named stdin is ruptured when you do something a little complex. (It's often hard to avoid such rupturing in multi-layer systems, but when you do, highlight the fact in the manual section to avoid any confusion, don't leave it as a bug to be discovered by a ``power customer'' !)

This peculiarity of stdio may be viewed as a feature or as a flaw, but we must agree that when (rarely) stdin != stdin, an important regularity has been rendered moot.

S i m p l i c i t y m e a n s c o n c i s e n e s s a n d c o n s i s t e n c y .

Lesson 5.2: Pointer/Array syntax in C

Here's a simple table which provides an overview of C semantics.
Operand Semantics in C
Mode of symbol: -> Function Array Pointer Structure Union Int, etc.
Usage Context: -\
In declaration or sizeof() < ---- Confer with cdecl ---- >
In ordinary expression or as function argument Invoke As pointers s . f == (&s)->f As arithmetic values
Allowed as Lvalue No No Yes Yes after 1984 Yes

**Operand Semantics in C**
Mode of symbol: ->	Function	Array	Pointer	Structure	Union	Int, etc.
Usage Context: -\
In declaration or sizeof()	< ---- Confer with cdecl ---- >
In ordinary expression or as function argument	Invoke	As pointers	s . f == (&s)->f	As arithmetic values
Allowed as Lvalue	No	No	Yes	Yes after 1984	Yes

A key point in this table is that pointers and arrays are interpreted the same in ordinary expressions (although different code is generated).

The data types can be compounded, except that ints, etc. are primitive and functions must return lvalues. Thus valid types include ``functions returning pointers to arrays of functions returning structures of arrays of floats unioned with pointers to functions returning structures of arrays.''

Another point worthy of note is that the dot operator (s . f) can always be replaced with the arrow operator ((&s)->f). If you understand this you understand C; if you don't you don't.

Contrariwise if the arrow operator is confounding you this morning you can replace it with a dot: (p->f == (*p) . f).

Recalling the discussion of regularity above, let's emphasize the complete and unrestricted equivalence of the stated forms. We are often condemned to imperfection in this world, and simplifying notions come with caveats and implementation bugs. The elegance and perfection (simplicity) of C language syntax stands out in bold and noble contrast.

The utter elegance of C is demonstrated again by the following expressions, all of which are exactly equivalent (if they're legal at all) when used in ordinary expressions:


        * ("Howdy" + jj)
        * (jj + "Howdy")
         "Howdy" [jj]
         jj ["Howdy"]

Addition is always commutative in C (though not in C++) so once you know the first three expressions are equivalent, the fourth joins the group of necessity. This comes as a surprise to most when they first learn it: -- an array and its index don't seem commutable.

Realizing that *ptr and ptr[0] are synonyms, and that & is just the inverse of *, we know immediately that all of the following (or rather that subset of them legal in a given context) must be equivalent:


        * ptr
        * * & ptr
        * & * ptr
        & * * ptr            /* legal if (*ptr) is a pointer */
        & & * * * ptr        /* legal if (**ptr) is a pointer */
        * (ptr + 0)
        ptr [0]

Appreciate the sheer simplicity and elegance ! Note that C's choice of 0 for the first index of an array follows as the night the day as long as we insist that


        ptr == ptr + 0

Despite this, one still hears some disaffected C programmers say something like: ``Dennis Ritchie should have consulted me; arrays and structures should behave the same, and the first element of an array should be [1] not [0].'' Frankly, I think such comments show confusion.

Note that C has simple solutions for these simple complaints. Consider:


        float   Costs[29];         /* I wish I could be an Lvalue :-(  */
        struct  { ... } Platoon;   /* I wish I could be a pointer :-(  */

Simply replace these definitions with:


        struct { float foo[29]; }  Costs;   /* I can be an Lvalue :-)  */
        struct  { ... } Platoon[1];         /* I can be a pointer :-)  */

As to starting arrays at index 1, that's how Numerical Recipes in C does it. The simple, regular, flexible grammar of C makes it trivial for Numerical Recipes to adopt the deviation it chooses.

C combines the most important virtues of both low-level and high-level programming languages. True, C language may fail a code simplicity test when measured against intelligent use of the incisive expressive power possessed by ``high-level'' languages like C++, SQL, or Prolog. And true, C lacks the beautiful simplicity of ``low-level'' syntax-driven languages like Lisp, Forth or Prolog. No language can satisfy everyone's ideals, but I think Dennis Ritchie's beautiful creation is an excellent compromise, combining most of the semantic simplicity of low-level language and most of the coding simplicity of high-level language. Try to appreciate its merits before seeking improvements!

C has a reputation as a difficult obfuscated language. The reality is just the opposite! (I wonder if C's supposed difficulties are advertised by programmers seeking job security.)

S i m p l i c i t y m a y r e q u i r e r e t y p i n g w o r k i n g c o d e .

Exercise 5.3: Aggregation operators are regular

This is an exercise for the student; To get maximal benefit from this course you should now design, write, test and debug a C program. Try to make it a simplicity of beauty. Please try to work the problem yourself before looking at the suggested answer.

In a cycle-free directed graph, how many paths are there from A to B? How long is the longest path? The shortest?

Write a program to solve this problem. You need not read in an arbitrary directed graph: just generate a large (but, for your convenience, simple) test case.

When you've debugged your code, and rewritten and tweaked it to make it beautiful, take another look at it. Does it capture a particular simplicity?

S i m p l i c i t y o f t h e w h o l e m a y r e q u i r e c o m p l e x i t y o f a p i e c e .

Exercise 5.4: Waldo's Nim (Sparse arrays)

This is another (C language) coding exercise for the student. (It also has a Suggested Solution, though some kibitzers will argue that that code should get a very low grade. :-)

Play Waldo's Nim against a teletype client. No extra credit here for fancy graphics, but do not play a losing move when a win is available. (The subtitle `sparse array' is a hint towards my suggested solution, but feel free to take a very different approach and mail your debugged code to me. It will be my privilege to post the best one.)

Waldo's Nim is a game in the Nim or Chomp family. Start with twelve rows of twelve jelly beans each. Two players alternate turns, removing (and eating) jelly beans. Whoever eats the last jelly bean loses. The only legal moves are a-by-b ``rectangles:'' You remove b beans from each of the a smallest (non-empty) rows. The player can choose his own a and b, as long as both numbers are positive, there are at least a non-empty rows, and at least b beans in each non-empty row.

S i m p l i c i t y i s i n t h e e y e o f t h e b e h o l d e r .

Lesson 5.5: Macros for Simplicity

The present letter is a very long one, simply because I had no leisure to make it shorter.
- Blaise Pascal (1623-62)
I am never satisfied until I have said as much as possible in a few words, and writing briefly takes far more time than writing at length.
- Karl Gauss (1777-1855)
I didn't have time to write a short letter, so I wrote a long one instead.
- Mark Twain (1835-1910)
Never before have I written so long a letter.... I can assure you that it would have been much shorter if I had been writing from a comfortable desk ...
- Martin Luther King, Jr. (1929-1968)

Macros and functions are both used to ``factor'' code, and thereby call attention to its simplicity; a main difference is that macros can be used in syntactic contexts where functions can not be. Some languages have macro facilities much more powerful than C's; some have facilities less powerful. I like C's ``happy medium'' -- flexible enough for most purposes, yet simple enough to be very transparent.

Since macros will distract the reader somewhat, I tend to avoid macros when their value is slight. On the other hand, I've written some horrendous macros when they seemed appropriate.

Here are four programs on this website which make extensive use of macros:

The Logic puzzle solver from Lesson 9.2.
The Flexible hash library from Lesson 3.5
The Suggested Solution to Waldo's Nim, from Lesson 5.4.
A subroutine to check for five-in-a-row in a Connect-Five game.

Of these, example (1) uses macros to fashion a mini-language; example (2) uses macros so the program can be easily tuned for a specific application and recompiled; examples (3) and (4) are the ``horrendous'' macros, yet I'm not sure any alternative would be better. (I submitted both examples to Usenet discussion and no one could offer an improvement.)

Many macros are used to avoid code replication -- define a macro once and invoke it many times -- but saving time is not the primary purpose (after all, you might be able to replicate the code using an editor facility). Avoiding the replication makes it easier to understand or modify the code. To emphasize the real purpose of macros, I've quoted Pascal and Twain at the top of this section.

Please send me some e-mail.

Please visit my family tree.