Module 3: Debugging

At a Glance

Prerequisites

Debugging

More important than knowing how to make something work is knowing what to do when it doesn't. Debugging is probably one of the most important skills you can learn, since things rarely work perfectly the first time.

Printf Debugging

A common method of debugging is inserting print statements throughout the program to log program flow and variable states. Although this has its place in debugging and is still useful, it has many drawbacks, especially in an embedded context.

  • Requires recompilation every time we want to look at something different - For embedded, this also means reflashing.
  • Requires cleanup after the problem is found.
  • Requires dump functions for anything more than an integer.
  • Printf is huge - For embedded, adding printf support can eat up a lot of RAM and flash.
  • Printf is slow - It takes a long time to actually format the string, then transmitting it through your debug output has an additional delay.
    • This can actually block the main program for a noticeable amount of time.
    • This will wreck havoc with any timing-sensitive operations such as race conditions.
  • It requires a debug output - In Module 2: Hello World, we needed a UART-to-USB adapter to see the serial output.
  • It can't handle crashes and relies on the debug output to be working.

Of course, like any tool, this has its place. If you're okay with the drawbacks, printf is great for convenient, human-readable continuous logging. We have a set of LOG macros for human-readable print statements in our HAL, such as the one you used in Module 2: Hello World.

GPIO Debugging

A higher-speed variant of this technique on embedded platforms is toggling GPIOs at certain points and using an oscilloscope or logic analyzer to trace execution. I find this most useful for timing-sensitive debugging.

As an example, I tend to use this technique to determine bottlenecks in functions by toggling the IO before and after chunks of code and looking at the timing conditions. This is useful because it doesn't require a programmer to be attached.

GDB

In software, knowing how to use a debugger like GDB is invaluable. Debuggers allow us to do things like step through running programs, inspect variables, and set breakpoints without recompilation. Since it can access the target's memory, it doesn't take any effort to dump entire structures or walk through arrays.

GDB is usually the first tool I reach for when something's wrong. Since it doesn't require recompilation, I can attach it to running programs or microcontrollers without any hassle. My workflow usually looks like this:

  1. Attach to the target using GDB
  2. Set up breakpoints on the functions or lines I suspect
  3. Run the program until my breakpoints are hit
  4. Look at the function's parameters, local variables, and the stack trace for any anomalies
  5. Continue execution and repeat until I've found the problem

Useful Commands

Keep in mind that this isn't an exhaustive list, just some commands that I find myself using a lot. GDB has many more abilities, including support for Python scripting!

Fun fact: GDB can handle short forms of commands. For example, bt instead of backtrace. GDB also supports tab completion for pretty much everything.

CommandExample/Short FormPurpose

help

help cmd

h

h b

Prints GDB command help

Prints help for the specific command

runrBegins the program and runs until next breakpoint or program end
continuecContinues the program until next breakpoint or program end
stepsSingle-step code, descending into functions
nextnSingle-step code without descending into functions
finishfinRun until the function/loop/etc. is complete
backtracebtPrint the stacktrace
up
Go up one context level in the stack
down
Go down one context level in the stack
print variablep/x fooPrints the specified variable - this command is very powerful and can be used with arbitrary memory addresses

break function

break file:line

break location if expr

b prv_func

b module.c:50

b module.c:50 if x > 20

Set a breakpoint at the specified function

Set a breakpoint at the specified line in the specified file

Set a breakpoint that will only stop if the expression is true

info breaki bLists breakpoints

delete

delete #breakpoint

d

d 1

Delete all breakpoints

Delete the specified breakpoint

Example

Let's create a simple program to test GDB with. To start, you should be in our development environment.

Terminal
# Make sure you're sshed into the vagrant box
vagrant ssh
# Should report vagrant
whoami
cd ~/shared/firmware
# Check the state of your local repo
git status
# Move back into the getting started branch
git checkout wip_getting_started

# Initialize new project
make new PROJECT=debug_test

Now, just like before, let's create a new main.c in the project's src folder.

main.c
#include <stdint.h>
#include "soft_timer.h"
#include "gpio.h"
#include "interrupt.h"
#include "log.h"
#include "wait.h"

#define DEBUG_TEST_TIMEOUT_MS 50

typedef struct DebugTestStorage {
  GpioAddress *leds;
  size_t num_leds;
  size_t current_led;
} DebugTestStorage;

static DebugTestStorage s_storage;

static void prv_init_leds(DebugTestStorage *storage) {
  GpioAddress leds[] = {
    { .port = GPIO_PORT_C, .pin = 8 },  //
    { .port = GPIO_PORT_C, .pin = 9 },  //
    { .port = GPIO_PORT_C, .pin = 6 },  //
    { .port = GPIO_PORT_C, .pin = 7 },  //
  };

  GpioSettings led_settings = {
    .direction = GPIO_DIR_OUT,
    .state = GPIO_STATE_HIGH
  };

  storage->leds = leds;
  storage->num_leds = SIZEOF_ARRAY(leds);
  storage->current_led = 0;

  for (size_t i = 0; i < SIZEOF_ARRAY(leds); i++) {
    gpio_init_pin(&leds[i], &led_settings);
  }
}

static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
  DebugTestStorage *storage = context;

  // Toggle the current LED
  GpioAddress *led = &storage->leds[storage->current_led];
  LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin);
  gpio_toggle_state(led);

  // Next LED
  storage->current_led = (storage->current_led + 1) % storage->num_leds;

  soft_timer_start_millis(DEBUG_TEST_TIMEOUT_MS, prv_timeout_cb, &storage, NULL);
}

int main(void) {
  interrupt_init();
  soft_timer_init();
  gpio_init();

  prv_init_leds(&s_storage);
  uint32_t buffer[20];
  LOG_DEBUG("%ld LEDs set up\n", s_storage.num_leds);
  soft_timer_start_millis(DEBUG_TEST_TIMEOUT_MS, prv_timeout_cb, &s_storage, NULL);

  size_t i = 0;
  while (true) {
    wait();
    buffer[i % SIZEOF_ARRAY(buffer)] += i;
    i++;
  }

  return 0;
}

Just as side note, now would be a perfect time to commit your work! Initial revisions that might change drastically as you try to fix bugs are usually a good reference for reverting to when your fixes don't quite work out.

To begin debugging with GDB, we have a special gdb target in our build system.

Terminal
# This works with STM32 as well - just remove PLATFORM=x86
make gdb PROJECT=debug_test PLATFORM=x86

You should see the following prompt appear:

GDB
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from build/bin/x86/debug_test...done.
(gdb)

First, let's try to run it as is.


GDB
(gdb) r
Starting program: /home/vagrant/shared/firmware/build/bin/x86/debug_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[0] projects/debug_test/src/main.c:58: 4 LEDs set up
[0] projects/debug_test/src/main.c:43: Toggling LED P2.8

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400c01 in printf (__fmt=0x401623 "[%u] %s:%u: Toggling LED P%d.%x\n")
    at /usr/include/x86_64-linux-gnu/bits/stdio2.h:104
104       return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
(gdb)

If instead you get an error like

Program received signal SIG37, Real-time event 37.
0x00007ffff75fd369 in __sigqueue (pid=5662, sig=37, val=..., val@entry=...) at ../sysdeps/unix/sysv/linux/sigqueue.c:40
40 ../sysdeps/unix/sysv/linux/sigqueue.c: No such file or directory.

Try exiting gdb and running the following command, then trying again (see https://stackoverflow.com/a/16742034):

echo "set auto-load safe-path /" >> ~/.gdbinit

Woah, what happened? It seems like we hit a segfault, but GDB was able to catch it. Let's take a closer look at what caused that.

It's also possible that you won't get a segfault, but it should produce the wrong output!


First, we use backtrace to print the state of the stack. We can see that the segfault is occuring in prv_timeout_cb, which is the timer callback we registered. We can use up and down to move between function scopes in the stack. In our case, we move up to prv_timeout_cb. Since it's a segfault, we're probably attempting to access invalid memory. Looking at led and storage, we see that they're completely invalid. Since led is derived from storage, we should look at where storage is defined.

What's a callback?

A callback is just a function that we pass to be called at a later time! We also commonly provide user context variables since the function has likely gone out of scope.

For example, on line 61, soft_timer_start_millis takes in a callback function and context variable. Once the specified time has elapsed, the callback function is run (prv_timeout_cb) with the provided context (&s_storage).

GDB
(gdb) bt
#0  0x0000000000400c11 in printf (__fmt=0x401623 "[%u] %s:%u: Toggling LED P%d.%x\n")
    at /usr/include/x86_64-linux-gnu/bits/stdio2.h:104
#1  prv_timeout_cb (timer_id=<optimized out>, context=0x7fffffffd9c0)
    at projects/debug_test/src/main.c:45
#2  0x0000000000400e7e in prv_soft_timer_interrupt () at libraries/ms-common/src/x86/soft_timer.c:38
#3  prv_soft_timer_handler (interrupt_id=<optimized out>)
    at libraries/ms-common/src/x86/soft_timer.c:49
#4  <signal handler called>
#5  0x0000000000400ad0 in main () at projects/debug_test/src/main.c:64
(gdb) up
#1  prv_timeout_cb (timer_id=<optimized out>, context=0x7fffffffd9c0)
    at projects/debug_test/src/main.c:45
45        LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin);
(gdb) p led
$1 = (GPIOAddress *) 0x800000c029c0
(gdb) p *led
Cannot access memory at address 0x800000c029c0
(gdb) l
40      static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
41        DebugTestStorage *storage = context;
42
43        // Toggle the current LED
44        GPIOAddress *led = &storage->leds[storage->current_led];
45        LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin);
46        gpio_toggle_state(led);
47
48        // Next LED
49        storage->current_led = (storage->current_led + 1) % storage->num_leds;
(gdb) p storage
$2 = (DebugTestStorage *) 0x7fffffffd9c0
(gdb) p *storage
$3 = {leds = 0x7fffffffd9c0, num_leds = 3257981071963740416, current_led = 6301696}
(gdb)

Now we know it's segfaulting because storage is invalid, but why is that? storage is provided through the context parameter of soft_timer_start_millis, which is called on lines 51 and 62. Looking at the code, the first call to soft_timer_start_millis is on line 62, so it should be equal to s_storage.

Let's set a breakpoint on prv_timeout_cb and compare storage with s_storage, since they should be identical. Since storage is set in the function, we'll step forward a few lines so that it's initialized. Due to optimizations, although it's the first line in the code, it won't necessarily be the first thing to be executed.

You could also just set a breakpoint on the line: b main.c:41 and step forward one line (s or n) so it's executed.

GDB
(gdb) b prv_timeout_cb
Breakpoint 1 at 0x400bd6: file projects/debug_test/src/main.c, line 40.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/vagrant/shared/firmware/build/bin/x86/debug_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[0] projects/debug_test/src/main.c:60: 4 LEDs set up
Breakpoint 1, prv_timeout_cb (timer_id=0, context=0x602130 <s_storage>)
    at projects/debug_test/src/main.c:40
40      static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
(gdb) n
45        LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin);
(gdb)
40      static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
(gdb)
44        GPIOAddress *led = &storage->leds[storage->current_led];
(gdb)
41        DebugTestStorage *storage = context;
(gdb)
40      static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
(gdb) p storage
$1 = (DebugTestStorage *) 0x602130 <s_storage>
(gdb) p &s_storage
$2 = (DebugTestStorage *) 0x602130 <s_storage>
(gdb)

So the first iteration seems to match up. Let's continue the program execution and the next time the breakpoint is hit.

GDB
(gdb) c
Continuing.
[0] projects/debug_test/src/main.c:45: Toggling LED P2.8

Breakpoint 1, prv_timeout_cb (timer_id=1, context=0x7fffffffd9c0)
    at projects/debug_test/src/main.c:40
40      static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
(gdb) n
45        LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin);
(gdb)
40      static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
(gdb)
44        GPIOAddress *led = &storage->leds[storage->current_led];
(gdb)
41        DebugTestStorage *storage = context;
(gdb)
40      static void prv_timeout_cb(SoftTimerId timer_id, void *context) {
(gdb) p storage
$3 = (DebugTestStorage *) 0x7fffffffd9c0
(gdb) p *storage
$4 = {leds = 0x7fffffffd9c0, num_leds = 5124990741377889536, current_led = 6301696}
(gdb)

storage has changed and it's obviously invalid now. If we continue running the program, we'll hit a segfault.

So like we said earlier, soft_timer_start_millis provides context, which storage is assigned as. Other than line 62, context is provided on line 51 as &storage.

That actually passes the address of the storage variable (which is on the stack) when we actually want to pass the value of storage that contains the address of s_storage. So the fix is to replace &storage with storage. With that change, it no longer segfaults!


Here's another great place to commit your changes! It's a nice, cohesive change that can be summarized in a line or two if necessary.

Hopefully, you were able to see how GDB allows you to observe the state and data of your program without recompiling or relying on debug output. If you are more of a visual learner, CS246 has an amazing GDB tutorial video on their Youtube page that goes over these commands and some other helpful ones. Although they use C++, many of the same principles still apply to C.

There's actually another issue in this code. The expected behavior is for all LEDs to toggle in sequence, but the values are all wrong. Can you figure out why?

Hint: Be careful with addresses to stack-allocated variables!