Module 3: Debugging
At a Glance
Prerequisites
Debugging
More important than knowing how to make something work is knowing what to do when it doesn't. Debugging is probably one of the most important skills you can learn, since things rarely work perfectly the first time.
Printf Debugging
A common method of debugging is inserting print statements throughout the program to log program flow and variable states. Although this has its place in debugging and is still useful, it has many drawbacks, especially in an embedded context.
- Requires recompilation every time we want to look at something different - For embedded, this also means reflashing.
- Requires cleanup after the problem is found.
- Requires dump functions for anything more than an integer.
- Printf is huge - For embedded, adding printf support can eat up a lot of RAM and flash.
- Printf is slow - It takes a long time to actually format the string, then transmitting it through your debug output has an additional delay.
- This can actually block the main program for a noticeable amount of time.
- This will wreck havoc with any timing-sensitive operations such as race conditions.
- It requires a debug output - In Module 2: Hello World, we needed a UART-to-USB adapter to see the serial output.
- It can't handle crashes and relies on the debug output to be working.
Of course, like any tool, this has its place. If you're okay with the drawbacks, printf is great for convenient, human-readable continuous logging. We have a set of LOG
macros for human-readable print statements in our HAL, such as the one you used in Module 2: Hello World.
GPIO Debugging
A higher-speed variant of this technique on embedded platforms is toggling GPIOs at certain points and using an oscilloscope or logic analyzer to trace execution. I find this most useful for timing-sensitive debugging.
As an example, I tend to use this technique to determine bottlenecks in functions by toggling the IO before and after chunks of code and looking at the timing conditions. This is useful because it doesn't require a programmer to be attached.
GDB
In software, knowing how to use a debugger like GDB is invaluable. Debuggers allow us to do things like step through running programs, inspect variables, and set breakpoints without recompilation. Since it can access the target's memory, it doesn't take any effort to dump entire structures or walk through arrays.
GDB is usually the first tool I reach for when something's wrong. Since it doesn't require recompilation, I can attach it to running programs or microcontrollers without any hassle. My workflow usually looks like this:
- Attach to the target using GDB
- Set up breakpoints on the functions or lines I suspect
- Run the program until my breakpoints are hit
- Look at the function's parameters, local variables, and the stack trace for any anomalies
- Continue execution and repeat until I've found the problem
Useful Commands
Keep in mind that this isn't an exhaustive list, just some commands that I find myself using a lot. GDB has many more abilities, including support for Python scripting!
Fun fact: GDB can handle short forms of commands. For example, bt
instead of backtrace
. GDB also supports tab completion for pretty much everything.
Command | Example/Short Form | Purpose |
---|---|---|
help help cmd | h h b | Prints GDB command help Prints help for the specific command |
run | r | Begins the program and runs until next breakpoint or program end |
continue | c | Continues the program until next breakpoint or program end |
step | s | Single-step code, descending into functions |
next | n | Single-step code without descending into functions |
finish | fin | Run until the function/loop/etc. is complete |
backtrace | bt | Print the stacktrace |
up | Go up one context level in the stack | |
down | Go down one context level in the stack | |
print variable | p/x foo | Prints the specified variable - this command is very powerful and can be used with arbitrary memory addresses |
break function break file:line break location if expr | b prv_func b module.c:50 b module.c:50 if x > 20 | Set a breakpoint at the specified function Set a breakpoint at the specified line in the specified file Set a breakpoint that will only stop if the expression is true |
info break | i b | Lists breakpoints |
delete delete #breakpoint | d d 1 | Delete all breakpoints Delete the specified breakpoint |
Example
Let's create a simple program to test GDB with. To start, you should be in our development environment.
# Make sure you're sshed into the vagrant box vagrant ssh # Should report vagrant whoami cd ~/shared/firmware # Check the state of your local repo git status # Move back into the getting started branch git checkout wip_getting_started # Initialize new project make new PROJECT=debug_test
Now, just like before, let's create a new main.c
in the project's src
folder.
#include <stdint.h> #include "soft_timer.h" #include "gpio.h" #include "interrupt.h" #include "log.h" #include "wait.h" #define DEBUG_TEST_TIMEOUT_MS 50 typedef struct DebugTestStorage { GpioAddress *leds; size_t num_leds; size_t current_led; } DebugTestStorage; static DebugTestStorage s_storage; static void prv_init_leds(DebugTestStorage *storage) { GpioAddress leds[] = { { .port = GPIO_PORT_C, .pin = 8 }, // { .port = GPIO_PORT_C, .pin = 9 }, // { .port = GPIO_PORT_C, .pin = 6 }, // { .port = GPIO_PORT_C, .pin = 7 }, // }; GpioSettings led_settings = { .direction = GPIO_DIR_OUT, .state = GPIO_STATE_HIGH }; storage->leds = leds; storage->num_leds = SIZEOF_ARRAY(leds); storage->current_led = 0; for (size_t i = 0; i < SIZEOF_ARRAY(leds); i++) { gpio_init_pin(&leds[i], &led_settings); } } static void prv_timeout_cb(SoftTimerId timer_id, void *context) { DebugTestStorage *storage = context; // Toggle the current LED GpioAddress *led = &storage->leds[storage->current_led]; LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin); gpio_toggle_state(led); // Next LED storage->current_led = (storage->current_led + 1) % storage->num_leds; soft_timer_start_millis(DEBUG_TEST_TIMEOUT_MS, prv_timeout_cb, &storage, NULL); } int main(void) { interrupt_init(); soft_timer_init(); gpio_init(); prv_init_leds(&s_storage); uint32_t buffer[20]; LOG_DEBUG("%ld LEDs set up\n", s_storage.num_leds); soft_timer_start_millis(DEBUG_TEST_TIMEOUT_MS, prv_timeout_cb, &s_storage, NULL); size_t i = 0; while (true) { wait(); buffer[i % SIZEOF_ARRAY(buffer)] += i; i++; } return 0; }
Just as side note, now would be a perfect time to commit your work! Initial revisions that might change drastically as you try to fix bugs are usually a good reference for reverting to when your fixes don't quite work out.
To begin debugging with GDB, we have a special gdb
target in our build system.
# This works with STM32 as well - just remove PLATFORM=x86 make gdb PROJECT=debug_test PLATFORM=x86
You should see the following prompt appear:
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from build/bin/x86/debug_test...done. (gdb)
First, let's try to run it as is.
(gdb) r Starting program: /home/vagrant/shared/firmware/build/bin/x86/debug_test [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [0] projects/debug_test/src/main.c:58: 4 LEDs set up [0] projects/debug_test/src/main.c:43: Toggling LED P2.8 Program received signal SIGSEGV, Segmentation fault. 0x0000000000400c01 in printf (__fmt=0x401623 "[%u] %s:%u: Toggling LED P%d.%x\n") at /usr/include/x86_64-linux-gnu/bits/stdio2.h:104 104 return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); (gdb)
If instead you get an error like
Program received signal SIG37, Real-time event 37.
0x00007ffff75fd369 in __sigqueue (pid=5662, sig=37, val=..., val@entry=...) at ../sysdeps/unix/sysv/linux/sigqueue.c:40
40 ../sysdeps/unix/sysv/linux/sigqueue.c: No such file or directory.
Try exiting gdb and running the following command, then trying again (see https://stackoverflow.com/a/16742034):
echo "set auto-load safe-path /" >> ~/.gdbinit
Woah, what happened? It seems like we hit a segfault, but GDB was able to catch it. Let's take a closer look at what caused that.
It's also possible that you won't get a segfault, but it should produce the wrong output!
First, we use backtrace to print the state of the stack. We can see that the segfault is occuring in prv_timeout_cb
, which is the timer callback we registered. We can use up
and down
to move between function scopes in the stack. In our case, we move up to prv_timeout_cb
. Since it's a segfault, we're probably attempting to access invalid memory. Looking at led
and storage
, we see that they're completely invalid. Since led
is derived from storage
, we should look at where storage
is defined.
What's a callback?
A callback is just a function that we pass to be called at a later time! We also commonly provide user context variables since the function has likely gone out of scope.
For example, on line 61, soft_timer_start_millis
takes in a callback function and context variable. Once the specified time has elapsed, the callback function is run (prv_timeout_cb
) with the provided context (&s_storage
).
(gdb) bt #0 0x0000000000400c11 in printf (__fmt=0x401623 "[%u] %s:%u: Toggling LED P%d.%x\n") at /usr/include/x86_64-linux-gnu/bits/stdio2.h:104 #1 prv_timeout_cb (timer_id=<optimized out>, context=0x7fffffffd9c0) at projects/debug_test/src/main.c:45 #2 0x0000000000400e7e in prv_soft_timer_interrupt () at libraries/ms-common/src/x86/soft_timer.c:38 #3 prv_soft_timer_handler (interrupt_id=<optimized out>) at libraries/ms-common/src/x86/soft_timer.c:49 #4 <signal handler called> #5 0x0000000000400ad0 in main () at projects/debug_test/src/main.c:64 (gdb) up #1 prv_timeout_cb (timer_id=<optimized out>, context=0x7fffffffd9c0) at projects/debug_test/src/main.c:45 45 LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin); (gdb) p led $1 = (GPIOAddress *) 0x800000c029c0 (gdb) p *led Cannot access memory at address 0x800000c029c0 (gdb) l 40 static void prv_timeout_cb(SoftTimerId timer_id, void *context) { 41 DebugTestStorage *storage = context; 42 43 // Toggle the current LED 44 GPIOAddress *led = &storage->leds[storage->current_led]; 45 LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin); 46 gpio_toggle_state(led); 47 48 // Next LED 49 storage->current_led = (storage->current_led + 1) % storage->num_leds; (gdb) p storage $2 = (DebugTestStorage *) 0x7fffffffd9c0 (gdb) p *storage $3 = {leds = 0x7fffffffd9c0, num_leds = 3257981071963740416, current_led = 6301696} (gdb)
Now we know it's segfaulting because storage
is invalid, but why is that? storage
is provided through the context
parameter of soft_timer_start_millis
, which is called on lines 51 and 62. Looking at the code, the first call to soft_timer_start_millis
is on line 62, so it should be equal to s_storage
.
Let's set a breakpoint on prv_timeout_cb
and compare storage
with s_storage
, since they should be identical. Since storage
is set in the function, we'll step forward a few lines so that it's initialized. Due to optimizations, although it's the first line in the code, it won't necessarily be the first thing to be executed.
You could also just set a breakpoint on the line: b main.c:41
and step forward one line (s
or n
) so it's executed.
(gdb) b prv_timeout_cb Breakpoint 1 at 0x400bd6: file projects/debug_test/src/main.c, line 40. (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/vagrant/shared/firmware/build/bin/x86/debug_test [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [0] projects/debug_test/src/main.c:60: 4 LEDs set up Breakpoint 1, prv_timeout_cb (timer_id=0, context=0x602130 <s_storage>) at projects/debug_test/src/main.c:40 40 static void prv_timeout_cb(SoftTimerId timer_id, void *context) { (gdb) n 45 LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin); (gdb) 40 static void prv_timeout_cb(SoftTimerId timer_id, void *context) { (gdb) 44 GPIOAddress *led = &storage->leds[storage->current_led]; (gdb) 41 DebugTestStorage *storage = context; (gdb) 40 static void prv_timeout_cb(SoftTimerId timer_id, void *context) { (gdb) p storage $1 = (DebugTestStorage *) 0x602130 <s_storage> (gdb) p &s_storage $2 = (DebugTestStorage *) 0x602130 <s_storage> (gdb)
So the first iteration seems to match up. Let's continue the program execution and the next time the breakpoint is hit.
(gdb) c Continuing. [0] projects/debug_test/src/main.c:45: Toggling LED P2.8 Breakpoint 1, prv_timeout_cb (timer_id=1, context=0x7fffffffd9c0) at projects/debug_test/src/main.c:40 40 static void prv_timeout_cb(SoftTimerId timer_id, void *context) { (gdb) n 45 LOG_DEBUG("Toggling LED P%d.%x\n", led->port, led->pin); (gdb) 40 static void prv_timeout_cb(SoftTimerId timer_id, void *context) { (gdb) 44 GPIOAddress *led = &storage->leds[storage->current_led]; (gdb) 41 DebugTestStorage *storage = context; (gdb) 40 static void prv_timeout_cb(SoftTimerId timer_id, void *context) { (gdb) p storage $3 = (DebugTestStorage *) 0x7fffffffd9c0 (gdb) p *storage $4 = {leds = 0x7fffffffd9c0, num_leds = 5124990741377889536, current_led = 6301696} (gdb)
storage
has changed and it's obviously invalid now. If we continue running the program, we'll hit a segfault.
So like we said earlier, soft_timer_start_millis
provides context
, which storage
is assigned as. Other than line 62, context
is provided on line 51 as &storage
.
That actually passes the address of the storage
variable (which is on the stack) when we actually want to pass the value of storage
that contains the address of s_storage
. So the fix is to replace &storage
with storage
. With that change, it no longer segfaults!
Here's another great place to commit your changes! It's a nice, cohesive change that can be summarized in a line or two if necessary.
Hopefully, you were able to see how GDB allows you to observe the state and data of your program without recompiling or relying on debug output. If you are more of a visual learner, CS246 has an amazing GDB tutorial video on their Youtube page that goes over these commands and some other helpful ones. Although they use C++, many of the same principles still apply to C.
There's actually another issue in this code. The expected behavior is for all LEDs to toggle in sequence, but the values are all wrong. Can you figure out why?
Hint: Be careful with addresses to stack-allocated variables!