Understanding while(*s++ = *t++)

Consider the following implementation of strcpy:

/* strcpy: copy t to s; pointer version 3 */
void strcpy (char *s, char *t)
  {
    while (*s++ = *t++)
      ;
  }

K & R says the idiom of strcpy should be mastered. So does Joel [1]. Indeed, understanding it requires a surprisingly deep knowledge of C.

Order of evaluation

By operator precedence, the parts of the statement bind in this order:

  1. while
  2. postfix increment
  3. indirection
  4. assignment

The effective order is:

  1. while
  2. indirection
  3. assignment
  4. increment

Walkthrough

The first thing evaluated is the while expression. Appendix A9.5 of K & R says that the substatement will execute repeatedly so long as the expression remains unequal to 0 [2]. Since the substatement is just the null statement (a lone ;), everything here happens in the expression.
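As a smaller illustration of a while loop whose body is the null statement, here is a minimal sketch that measures a string by scanning for its terminator. The function name length_by_scan is made up for this example and is not from K & R.

```c
#include <stddef.h>

/* Hypothetical helper: counts the chars before the terminating 0. */
size_t length_by_scan (const char *p)
  {
    const char *start = p;

    while (*p++)
      ;   /* null statement: all the work happens in the test */

    /* p stopped one past the terminator, so subtract 1 */
    return (size_t) (p - start - 1);
  }
```

Calling length_by_scan ("abc") walks over a, b, c, and the terminator, then reports 3.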

The expression is

*s++ = *t++

It involves two variables, s and t, and three operators: dereference (*), increment (++), and assignment (=). The C spec fully specifies the precedence and associativity of operators, that is, how operands group with operators. It does not fix the order in which the operands themselves are evaluated, which is largely undefined [3]. Fortunately, since dereference and increment are unary operators (they have one operand), this consideration only applies to the two operands of the assignment.
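To see what "the order of operands is not fixed" means for an assignment, here is a minimal sketch in which each operand is a function call that records when it ran. All the names (slot, left, right, assign_demo) are invented for illustration. A given compiler will usually pick one order consistently, but the language doesn't promise which.

```c
#include <stdio.h>

char slot;        /* the object being assigned to */
int calls;        /* how many operand functions have run so far */
int left_rank;    /* 1 if the left operand was evaluated first, else 2 */
int right_rank;   /* likewise for the right operand */

/* Left operand: yields a pointer to slot and records when it ran. */
char *left (void)
  {
    left_rank = ++calls;
    return &slot;
  }

/* Right operand: yields the char to store and records when it ran. */
char right (void)
  {
    right_rank = ++calls;
    return 'x';
  }

/* Hypothetical demo: the store always happens, whichever side ran first. */
char assign_demo (void)
  {
    calls = 0;
    *left () = right ();
    printf ("left operand ran %s\n", left_rank == 1 ? "first" : "second");
    return slot;
  }
```

Either way, slot ends up holding 'x'. In *s++ = *t++ the two sides touch different objects, so the order makes no observable difference there either.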

The order of precedence and associativity is given in the spec by the order of the sections [4]. Increment comes first, followed by dereference, finishing with assignment [5].
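Because increment binds tighter than dereference, *p++ groups as *(p++) and not as (*p)++. The following sketch contrasts the two. The helper names read_then_advance and bump_pointee are hypothetical, chosen only for this example.

```c
/* Hypothetical helper: *p++ groups as *(p++).
   It yields the char p currently points at, and p itself moves on. */
char read_then_advance (char **pp)
  {
    char *p = *pp;
    char c = *p++;   /* same as *(p++) */

    *pp = p;         /* hand the advanced pointer back to the caller */
    return c;
  }

/* Hypothetical helper: (*p)++ increments the pointed-to char instead.
   It yields the old char, and p stays where it was. */
char bump_pointee (char *p)
  {
    return (*p)++;
  }
```

Given char buf[] = "ab" and char *p = buf, read_then_advance (&p) returns 'a' and leaves p at buf + 1, while bump_pointee (buf) returns 'a' and changes buf[0] to the next character code.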

How postfix-increment operates in this context is a little tricky. At first glance, it seems that by immediately incrementing s and t, we would not copy the first character. Consider the following [6]:

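#include <stdio.h>

char s[] = "first sentence";
char t[] = "the second one";

int main (void)
  {
    char *sp = s;
    char *tp = t;

    /* before the increment: first char of each string */
    printf ("%c\n", *sp);
    printf ("%c\n", *tp);

    sp++;
    tp++;

    /* after the increment: second char of each string */
    printf ("%c\n", *sp);
    printf ("%c\n", *tp);

    return 0;
  }

This prints:

f
t
i
h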

Here, we incremented the pointer and then dereferenced it. This is fundamentally different from how *s++ = *t++ works! Postfix increment yields the current value of its operand; the increment itself is a side effect that lands afterward. So, it is the current value of s or t that is dereferenced [8]. The assignment then occurs, storing the char that t points at into the location that s points at, and the stored char becomes the value of the whole expression [9]. The increments of s and t are only guaranteed to be complete by the time the while test finishes, but the effect is the same as if they happened last. Since s and t are pointers, incrementing moves them according to their type. As pointers to char, they then point at the next respective char in memory.
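One way to picture these steps is to expand the expression into separate statements. The sketch below is an illustration only: copy_expanded is a made-up name, and the order shown for the pointer updates is just one of the orders a compiler may choose.

```c
/* Hypothetical expansion of  while (*s++ = *t++) ;  into single steps. */
void copy_expanded (char *s, char *t)
  {
    for (;;)
      {
        char *old_s = s;   /* the value of s++ is the old s */
        char *old_t = t;   /* the value of t++ is the old t */
        char c;

        /* the increments are side effects; they only need to be
           finished before the test completes */
        s = s + 1;
        t = t + 1;

        /* store through the old pointers; the value of the
           assignment is the char that was stored */
        c = (*old_s = *old_t);

        if (c == 0)        /* the while test: stop on the terminator */
          break;
      }
  }
```

Run on the arrays "first sentence" and "the second one", this leaves both holding "the second one", the same as the one-line version.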

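The process repeats. while checks the value of the expression, which is the char just copied. When that char is the end of the string, \0 (which equals 0), the loop terminates. Note that the terminator itself has already been copied by then, so s ends up properly terminated.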
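To check that the terminator really is copied before the loop stops, here is a small sketch that counts how many times the test runs. The name copy_counting and the counter are additions for this example; the comma operator is used only to bump the counter before each test.

```c
/* Hypothetical variant: same loop, but it reports how many times
   the controlling expression was evaluated. */
int copy_counting (char *s, char *t)
  {
    int tests = 0;

    while (tests++, (*s++ = *t++))
      ;

    return tests;
  }
```

Copying "hi" into a buffer holding "xxxxxx" evaluates the test three times: once each for h, i, and the terminator. Afterward the buffer holds h, i, \0, followed by the untouched x characters.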

Let's see it all in action:

#include <stdio.h>

char s[] = "first sentence";
char t[] = "the second one";

/* strcpy: copy t to s; pointer version 3 */
/* (strcpy is also a standard library name; rename this function
   if your compiler complains about the redefinition) */
void strcpy (char *s, char *t)
  {
    while (*s++ = *t++)
      ;
  }

int main (void)
  {
    printf ("Before copy:\n");
    printf ("%s\n", s);
    printf ("%s\n\n", t);

    strcpy (s, t);

    printf ("After copy:\n");
    printf ("%s\n", s);
    printf ("%s\n", t);

    return 0;
  }

This prints:

Before copy:
first sentence
the second one

After copy:
the second one
the second one

Footnotes:

2

A9.5 Iteration Statements

Iteration statements specify looping.
  iteration-statement:
    while (expression) statement
    do statement while (expression);
    for (expression_opt; expression_opt; expression_opt) statement

In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains unequal to 0; the expression must have arithmetic or pointer type. With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration.

3

A7. Expressions

The precedence and associativity of operators is fully specified, but the order of evaluation of expressions is, with certain exceptions, undefined, even if the subexpressions involve side effects. That is, unless the definition of the operator guarantees that its operands are evaluated in a particular order, the implementation is free to evaluate operands in any order, or even to interleave their evaluation.

4

A7. Expressions

The precedence of expression operators is the same as the order of the major subsections of this section, highest precedence first.

5

Dereference is also called "indirection" and that's how it's referred to in the specification. The relevant sections are:

  A7.3 Postfix Expressions
    A7.3.4 Postfix Incrementation

  A7.4 Unary Operators
    A7.4.3 Indirection Operator

  A7.17 Assignment Expressions

6

It's not obvious why things need to be defined this way. First, since we want to modify the arrays (by copying one into the other), we must define them using array syntax.

char s[] = "first sentence";

This creates an array which is initialized using the string literal "first sentence".

Second, the name of the array, s, is not a variable. Even though it decays to a pointer to the array's initial element, it's not a modifiable lvalue [7]. We can't operate on the array name as though it were a pointer. Things like s++ violate a constraint of the language, so the compiler rejects them outright. It's not entirely clear why C is specified this way. The reason is likely historical. To modify the array, we need to use either array indexing or a pointer to the array. In the case of our function, we can use the name s because the parameter s is a pointer variable in its own right: when the array is passed to the function, a pointer to its first element is copied into s, and that copy is a modifiable lvalue.

We may be tempted to declare the string explicitly using a pointer,

char *s = "first sentence";

This creates a pointer to a string literal. String literals are static: their values are retained across block exits. More importantly, the behavior of a program that tries to alter a string literal is undefined, so we can't safely modify the object the pointer references.
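The difference between the two declarations can be sketched as follows. The names writable, read_only, and capitalize_first are invented for this example, and declaring the literal pointer as const char * is a suggestion that lets the compiler catch accidental writes; it is not something the original listing does.

```c
#include <ctype.h>

char writable[] = "first sentence";         /* array: our own copy of the chars */
const char *read_only = "first sentence";   /* pointer to the literal itself */

/* Hypothetical helper: upper-cases the first char of a writable string. */
void capitalize_first (char *p)
  {
    *p = (char) toupper ((unsigned char) *p);
  }
```

capitalize_first (writable) is fine and turns the array into "First sentence". Passing read_only is flagged by the compiler because of the const, which is the point: writing through it would be undefined.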

7

An lvalue is an expression that refers to an object, that is, a region of storage. A modifiable lvalue is one that can appear on the left side of an assignment. An rvalue is anything else.

8

It's a little confusing because the spec says that the result of postfix incrementation is not an lvalue (A7.3.4). Here "result" means the value that the expression s++ yields, which is the old value of s. That is a plain pointer value, not the object s itself, so something like s++ = x is not allowed.

Indirection doesn't require an lvalue as its operand, only a pointer. Instead, indirection results in an lvalue if the operand is a pointer to an object of arithmetic, structure, union, or pointer type (A7.4.3). So although s++ is not an lvalue, *s++ is.

Assignment requires an lvalue as its left operand. So, the result of the indirection must be an lvalue. This implies that the operand of the indirection is a pointer to an object of one of the listed types. Here it is a pointer to char, and char is an arithmetic type.

9

A7.17 Assignment Expressions

There are several assignment operators; all group right-to-left.

  assignment-expression:
    conditional-expression
    unary-expression assignment-operator assignment-expression

  assignment-operator: one of
    = *= /= %= += -= <<= >>= &= ^= |=

All require an lvalue as left operand, and the lvalue must be modifiable…

The type of an assignment expression is that of its left operand, and the value is the value stored in the left operand after the assignment has taken place.

2023-01-12

©2024 Excalamus.com