Different strlen depending on other variable assignment


Statement of problem

I'm using strlen to get the length of a char array, sentence.

When I run the following, strlen reports the sentence's length as 12, even though the array is only 10 bytes wide:

/* mre.c */
#include <stdio.h>
#include <string.h>


int
main (void)
{
  int length, i, beg;
  char sentence[10] = "this is 10";

  length = strlen (sentence);
  printf ("length: %d\n", length);

  for (i = 0; i < length; i++)
    {
      beg = 0;
    }

  return 0;
}

length: 12

When beg = 0; is commented out, strlen returns the expected result:

/* mre.c */
#include <stdio.h>
#include <string.h>


int
main (void)
{
  int length, i, beg;
  char sentence[10] = "this is 10";

  length = strlen (sentence);
  printf ("length: %d\n", length);

  for (i = 0; i < length; i++)
    {
      /* beg = 0; */
    }

  return 0;
}

length: 10

I notice that if I print the sentence one character at a time in a shell inside Emacs, I see two extra characters:

/* mre.c */
#include <stdio.h>
#include <string.h>


int
main (void)
{
  int length, i, beg;
  char sentence[10] = "this is 10";

  length = strlen (sentence);
  printf ("length: %d\n", length);

  for (i = 0; i < length; i++)
    {
      beg = 0;
      printf ("%c", sentence[i]);
    }

  return 0;
}

length: 12
this is 10^@^@

I'm at a loss for how to explain this.

Conclusion

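The strange behavior is undefined behavior. The literal "this is 10" has exactly 10 characters, so char sentence[10] = "this is 10"; fills the whole array and leaves no room for the null terminator. (C allows this initialization and silently drops the terminator.) Without a terminator, sentence isn't a string, and strlen keeps reading past the end of the array until it happens to find a null byte. Whatever lies in memory after the array, such as another local variable, decides where it stops. That's why an apparently unrelated assignment like beg = 0; can change the result.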

C strings

String constant

Quoting (gnu-c-manual) String Constants (which follows K&R almost verbatim):

A string constant is a sequence of zero or more characters, digits, and escape sequences enclosed within double quotation marks. A string constant is of type "array of characters". All string constants contain a null termination character ('\0') as their last character.

It goes on to say,

The null termination character lets string-processing functions know where the string ends.

There is no way to tell the length of a string without a null terminator. Library functions that take a string expect it to end with one!

There are two common practices:

  • allocate enough space to include the null terminator
/* one more than the number of characters */
char str[6] = "hello";

/* sometimes written this way to make the extra byte explicit */
char str[5+1] = "hello";
  • let the compiler allocate memory

When written without specifying the length, the compiler will automatically allocate the right amount.

/* compiler automatically allocates 6 bytes */
char str[] = "hello";

It's advised to "ask the compiler". That is, let the compiler allocate space and then use sizeof to get the size.

Two kinds of quotes

Single-quotes

Single quotes define character constants.

The constant has type int, and its value is the character code of that character.

[Although] the character constant’s value has type int…the character code is treated initially as a char value, which is then converted to int. If the character code is greater than 127 (0177 in octal), the resulting int may be negative on a platform where the type char is 8 bits long and signed.

Double-quotes – string constant (a.k.a. string literal).

A string constant has type "array of char" (char[N], where N includes the null terminator); in most expressions it is converted to a char * pointing at its first character. It has static storage duration, which means it exists for the whole run of the program and persists after the block that uses it exits. Its contents should not be changed (the array is often stored in a read-only area of memory). Modifying a string literal is undefined behavior: it may result in SIGSEGV or, if the location isn't read-only, cause unexpected results. Double quotes tell the compiler to append a null terminator automatically. Depending on where it's displayed, the null terminator is rendered as ^@, written as '\0', or written simply as 0.

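#include <stdio.h>

int
main (void)
{
  char str[5+1] = "hello";

  printf ("%c", str[4]);   /* last visible character: 'o' */
  printf ("%c", str[5]);   /* the null terminator, shown as ^@ in Emacs */
  return 0;
}

o^@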

strlen

The strlen function,

returns the offset of the terminating null byte within the array

When no terminating null byte exists, you get undefined behavior. The function will continue scanning memory until something stops it: a null byte, a memory access violation (segfault), or something else. There's no way to predict what will happen.


2023-01-10

Powered by peut-publier

©2024 Excalamus.com