151. Reverse Words in a String

Problem statement P1
Problem statement P2
- Solution B1
- Reflection

Problem statement P1

Given an input string s, reverse the order of the words.

A word is defined as a sequence of non-space characters. The words in s will be separated by at least one space.

Return a string of the words in reverse order.

Solution P1a: Arrays

One technique for reversing a string is to swap characters at opposite ends of the string. This approach requires only half of the string be processed. This is because you swap the first and last character, the second and second to last character, and so on, up until the middle.

Using character swapping requires two passes to reverse the words in a sentence:

Reverse the entire string
Reverse each word in the string

The time complexity will be O(N²), where N is the length of the string. The first pass takes N/2 steps, but the second pass nests a N/2 maximum loop inside a N step loop. This gives N/2 + (N*N)/2 => O(N²). Space is O(N) as it scales with the length of the string.

#include <stdio.h>
#include <string.h>

/* reverse a STRING between indices [BEG, END) */

void
reverse (char string[], int beg, int end)
{
  int length = end - beg;
  char swap;
  int i;

  for (i = 0; i < length / 2; i++)
    {
      swap = string[beg + i];
      string[beg + i] = string[end - 1 - i];
      string[end - 1 - i] = swap;
    }
}


int
main (void)
{
  char sentence[] = "today is a  new day!";
  int length, word_begin, current;

  length = strlen (sentence);

  /* print indices */

  printf (" ");
  for (current = 0; current < length; current++)
    {
      printf ("%d", current % 10);
    }
  printf (" <--index\n");

  /* reverse string */

  printf ("'%s'\n", sentence);
  reverse (sentence, 0, length);
  printf ("'%s'\n", sentence);

  /* reverse words */

  for (current = 0; current < length; current++)
    {
      if (sentence[current] == ' ')
        {
          reverse (sentence, word_begin, current);
          word_begin = current + 1;
        }
      else if (current == length - 1)
        {
          reverse (sentence, word_begin, length);
        }
    }

  printf ("'%s'\n", sentence);

  return 0;
}

 01234567890123456789 <--index
'today is a  new day!'
'!yad wen  a si yadot'
'day! new  a is today'

Solution P1b: Pointers

This solution, modified from GeeksForGeeks, is basically the same as Solution P1a but differs in implementation. It uses the same two steps:

Reverse the entire string
Reverse each word in the string

Rather than working with arrays, it uses pointers instead of indexing off the length. I feel it's slightly easier to understand because the pointer always references the current character. In Solution P1a, especially in the reverse function, there needed to be an offset calculation. That's not needed in this solution.

The drawback is that in order to reverse the entire string, we need a pointer to the end of the string. The only way to get that is to traverse the string. This makes the code less flexible. The code can't be rearranged so that each word is reversed before the entire sentence is. The sentence must be reversed first.

The time and space complexity is the same as Solution P1a, for the same reasons. O(N²) for time and O(N) for space.

#include <stdio.h>

/* reverse array starting at BEGIN and ending at END */

void
reverse (char *begin, char *end)
{
  char temp;
  while (begin < end)
    {
      temp = *begin;
      *begin++ = *end;
      *end-- = temp;
    }
}

int
main ()
{
  char s[] = "today is a new day!";
  char *word_begin = s;
  char *current = s;

  printf ("%s\n", s);

  /* reverse words */

  while (*current)
    {
      current++;
      if (*current == ' ')
        {
          reverse (word_begin, current - 1);
          word_begin = current + 1;
        }
      else if (*current == '\0')
        {
          reverse (word_begin, current - 1);
        }
    }

  printf ("%s\n", s);

  /* reverse entire string */

  reverse (s, current - 1);

  printf ("%s\n", s);

  return 0;
}

today is a new day!
yadot si a wen !yad
day! new a is today

Reflection

The array solution is more flexible than the pointer solution in that the two steps are independent, allowing code to be rearranged without error. However, it is prone to off-by-one errors. The pointer solution feels conceptually cleaner. Yet, the code is less robust. We could find the end of the string first, reverse the whole sentence, and then process the words (see below). This feels contrived (as do all these kinds of problems).

#include <stdio.h>

/* reverse array starting at BEGIN and ending at END */

void
reverse (char *begin, char *end)
{
  char temp;
  while (begin < end)
    {
      temp = *begin;
      *begin++ = *end;
      *end-- = temp;
    }
}

int
main ()
{
  char s[] = "today is a new day!";
  char *word_begin = s;
  char *current = s;

  printf ("%s\n", s);

  /* reverse entire string */

  /* move to end of sentence */
  while (*current)
    {
      current++;
    }

  reverse (s, current - 1);

  printf ("%s\n", s);

  /* reverse words */

  /* move to start of sentence */
  current = s;

  while (*current)
    {
      current++;
      if (*current == ' ')
        {
          reverse (word_begin, current - 1);
          word_begin = current + 1;
        }
      else if (*current == '\0')
        {
          reverse (word_begin, current - 1);
        }
    }

  printf ("%s\n", s);

  return 0;
}

today is a new day!
!yad wen a si yadot
day! new a is today

As far as algorithms go, it would be good to find a better time complexity. O(N²) is not great. Unfortunately, I don't think the current technique–reversing words as we traverse the sentence–can be made more efficient. The problem needs a different approach.

Problem statement P2

The previous solutions have O(N²) runtime complexity. On reflection, it seems the approach, which requires nested loops, forces that complexity. Can we find a different, better approach?

Let's use Polya's method:

What is the unknown? The steps required to reverse the words in the string.
What are the data? An array of characters.
What is the condition? The words the characters make up must be in the reverse order to their present arrangement.

The previous approach operated on the characters of the words, not the words themselves. Can we operate on words rather than characters? What do we know about the characters? The characters have two types: whitespace and non-whitespace. Whitespace is relevant only so far as distinguishing a word. The previous problem preserved whitespace. How might we approach the problem if we removed the restriction to preserve whitespace?

Given an input string s, reverse the order of the words.

A word is defined as a sequence of non-space characters. The words in s will be separated by at least one space.

Return a string of the words in reverse order concatenated by a single space.

Note that s may contain leading or trailing spaces or multiple spaces between two words. The returned string should only have a single space separating the words. Do not include any extra spaces.

Solution B1

Fred Brooks, Rob Pike, and Robert Sedgewick (and probably many others) all point out that data structures dictate algorithms. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident.

The previous solutions used an array of characters. If instead we had a stack of words, with the first word of the sentence at the bottom and the last word on top, we could pop the words off the list and our sentence would be reversed word-wise.

Array of characters:
"If this is the sentence."

Stack of words:

  sentence. --+
  the --------+---+
  is ---------+---+---+
  this -------+---+---+---+
  If ---------+---+---+---+---+
              |   |   |   |   |
              V   V   V   V   V
       sentence. the  is this If

That seems to prove it. With a change in data structure, the algorithm is trivial:

Split the sentence into words
Put the words on a stack, starting with the first word
Once all the words in the sentence have been put on the stack, pop until none remain, joining them with a space.

The difficult part, now, is to implement it.

How should we represent a word?

Let's not over-complicate it: a word is an array of characters. We might choose to null terminate the array so that each word is a proper string.

How should we implement the stack?

A stack is a collection of items governed by two operations, push and pop, such that popping removes the last item pushed (Last In, First Out). The collection is traditionally stored in an array or a linked list. An array requires a maximum number of items be declared. A linked list allows for arbitrary amounts of data. It would be easy enough to use an array, but that hardly seems like well designed software. After all, isn't that what we're trying to do here?

A linked list stack is constructed of nodes where each node has two fields, data and a reference to the previous node.

                                                                  +-- push --+
                                                                  |          |
        +-----+-----+        +-----+-----+        +-----+-----+   |    +-----+-----+
        |*prev|     |        |*prev|     |        |*prev|     |   |    |     |     |
NULL<------   |data |<----------   |data |<----------   |data |<--+    |     |     |
        +-----+-----+        +-----+-----+        +-----+-----+        +-----+-----+

                                                        |
                                                        |
                                                        V
                                                       pop

The advantage of a linked list is we don't need to allocate storage up-front. The question then is, when do we allocate storage and how much? We can defer answering these questions by using flexible array fields. If the last member of a struct is an array whose size is omitted, the initializer will determine the size (since C99).

struct node
{
  struct node *prev;
  char data[];
};

typedef struct node Stack;

To answer how much storage we need, we'll wait until the last possible moment–when we're about to push an item onto the stack. At that point, we'll know what word is being pushed, how big it is, and, therefore, be fully equipped to allocate storage.

Stack *
push (Stack * stack, char *text)
{
  Stack *sp = malloc (sizeof (Stack) + strlen (text));

  strcpy (sp->data, text);

  if (stack == NULL)
    {
      sp->prev = 0;
    }
  else
    {
      sp->prev = stack;
    }
  return sp;
}

Whenever we malloc, we must free. The safest place to allocate memory is where we free it. Unfortunately, it doesn't seem to make much sense to call free on push. The word must exist we until we pop it.

int
is_empty (Stack * stack)
{
  return stack == NULL;
}


Stack *
pop (Stack * stack)
{
  Stack *sp;
  sp = stack;
  if (is_empty (stack))
    {
      printf ("STACK UNDERFLOW\n");
    }
  else
    {
      stack = stack->prev;
      free (sp);
    }
  return stack;
}

Equipped with the right data structure, the challenge becomes the problems of reading and parsing the sentence. To read, we use getline. Given a NULL pointer, a size, and a stream, it allocates space, and reads from the stream until a newline character. It then calculates the size of the input and sets the pointer to data read in.

char *sentence = NULL;
size_t len = 0;
ssize_t read;

printf ("Enter sentence> ");

read = getline (&sentence, &len, stdin);

Next, we parse the sentence with strtok. By splitting on space and newline, we get each word (and not a trailing newline on the last word). strtok is initialized by passing a reference to the string to be split. Subsequent calls require a NULL pointer, as the function keeps track of its position within the sentence. It continues until the end of the sentence and returns a NULL pointer. This allows us to put everything in a loop.

for (word = strtok (sentence, DELIM); word; word = strtok (NULL, DELIM))
  {
  }

After the sentence is processed, we read the words back before popping them.

All together, it looks like:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


#define DELIM " \n"

/* prototype */

struct node
{
  struct node *prev;
  char data[];
};

typedef struct node Stack;


Stack *push (Stack *, char *);
int is_empty (Stack *);
Stack *pop (Stack *);


/* implementation */

Stack *
push (Stack * stack, char *text)
{
  Stack *sp = malloc (sizeof (Stack) + strlen (text));

  strcpy (sp->data, text);

  if (stack == NULL)
    {
      sp->prev = NULL;
      stack = sp;
    }
  else
    {
      sp->prev = stack;
      stack = sp;
    }
  return stack;
}


int
is_empty (Stack * stack)
{
  return stack == NULL;
}


Stack *
pop (Stack * stack)
{
  Stack *sp;
  sp = stack;
  if (is_empty (sp))
    {
      printf ("STACK UNDERFLOW\n");
    }
  else
    {
      stack = stack->prev;
      free (sp);
    }
  return stack;
}


int
main (void)
{
  Stack *stack = NULL;
  char *sentence = NULL;
  char *word;
  size_t len = 0;
  ssize_t read;

  printf ("Enter sentence> ");

  read = getline (&sentence, &len, stdin);

  if (-1 != read)
    {
      for (word = strtok (sentence, DELIM); word; word = strtok (NULL, DELIM))
        {
          stack = push (stack, word);
        }

      while (!is_empty (stack))
        {
          printf ("%s ", stack->data);
          stack = pop (stack);
        }
      printf ("\b\n");
    }

  return 0;
}

Reflection

Armed with the right data structure, the bookkeeping is eliminated.

Time complexity improves to O(N), limited only by the length of the string processed. Space complexity is also O(N), again limited by the size of the sentence.

2023-01-11

151. Reverse Words in a String

Table of Contents

Problem statement P1

Solution P1a: Arrays

Solution P1b: Pointers

Reflection

Problem statement P2

Solution B1

Reflection