FW16 Bootloader (Flash-over-CAN)

This page covers the bootloader that runs on all of our STM32L433s.

The bootloader will support DFU over CAN. It will also support multi-node flashing, full-vehicle DFU, and per-node configuration data stored in flash.

FW15 Bootloader (Sorry, there is no documentation on this, but this page pretty much covers it): https://github.com/uw-midsun/fwxv/tree/main/projects/bootloader

DATAGRAM PROTOCOL:

image-20241216-000141.png
Datagram as of 15/12/2024

DESIGN CHOICES:

  1. The bootloader will be located at the base of the flash memory, 0x0800 0000. Ideally this takes very little memory; in the FW15 car, the bootloader took ~16 kB of 64 kB and thus was never permanent. We now have 256 kB of flash to work with.

    1. Maybe we use the LL library and not the CubeIDE HAL? To be determined

  2. Write to flash memory every 2048 bytes (the flash page size of the STM32L433). The L4 flash is programmed one double word (8 bytes) at a time, so flashing should be much faster than on the FW15 car.

  3. Custom datagram protocol over CAN. Since a CAN frame carries at most 8 data bytes, the client must split every message into 8-byte chunks.

  4. Use a sequencing protocol where a sequencing datagram is sent every 2048 bytes. It contains the sequence number, which lets us detect lost packets, and a CRC32 of the binary chunk, which is used to validate flash integrity.

  5. Protect the bootloader memory region using the built-in MPU. This will prevent the bootloader from ever being erased by a run-time program.

  6. Each MCU will have a custom config page consisting of the following:

    1. Project Name

    2. GitHub Branch Name

    3. ID Number (1-16)

    4. To be expanded if needed

  7. SCons build system with OpenOCD must support the new memory map.

    1. Reverse compatibility to ONLY flash applications

    2. Selection between flashing the bootloader/application

    3. The bootloader will only be built using the release mode

    4. Erase only the required memory regions

  8. Watchdog to prevent hanging within the bootloader. Likely the independent watchdog rather than the window watchdog, which adds unnecessary complexity.

  9. SysTick ISR to time out of the bootloader. This interrupt increments a counter that tracks how many seconds have passed.

  10. A bare-metal, while-true loop that constantly polls the CAN bus for new messages. These messages will be fed into the state machine, where the datagram ID selects which procedure runs.

  11. State Data as of 15/12/2024 from FW15:

    typedef struct {
      uintptr_t application_start;
      uintptr_t current_address;
      uint32_t bytes_written;
      uint32_t binary_size;
      uint32_t packet_crc32;
      uint16_t expected_sequence_number;
      uint16_t buffer_index;
      BootloaderStates state;
      BootloaderError error;
      uint16_t target_nodes;
      bool first_byte_received;
    } BootloaderStateData;

    1. application_start comes from the linker script. This is a pointer to the start address of the application

    2. current_address is a value that is incremented during DFU to allow for correct memory flashing

    3. bytes_written counts the number of bytes written to memory. It allows us to preemptively stop DFU

    4. binary_size is updated upon receiving the start message. It is compared to bytes_written

    5. packet_crc32 is updated upon receiving a sequencing message. It is validated before/after writing to flash memory

    6. expected_sequence_number is updated upon receiving a sequencing message. It is validated every time a sequencing message is received

    7. buffer_index is incremented every time we receive a data message, normally 8 bytes at a time. The final data message may carry fewer than 8 bytes, and the index accounts for that.

    8. The rest are not critical to operation

  12. Custom linker scripts for both the bootloader and application memory. Please reference this document written by me to learn more about linker scripts: Linker scripts
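
Design choices 2 and 4 can be sketched on a host machine as follows. This is only a rough sketch under stated assumptions, not the FW16 implementation: the flash is simulated with a RAM array, the CRC32 variant (reflected polynomial 0xEDB88320) is a guess since the page does not name one, the sequencing datagram is assumed to arrive before the data it describes, and every name except the state fields borrowed from BootloaderStateData is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 2048u /* flash page size from design choice 2 */
#define DOUBLE_WORD 8u  /* flash programming granularity */

typedef enum {
  CHUNK_OK, CHUNK_BAD_SEQUENCE, CHUNK_BAD_CRC, CHUNK_OVERFLOW, CHUNK_FLASH_ERROR
} ChunkResult;

typedef struct {
  uint8_t buffer[PAGE_SIZE];         /* one page of pending data */
  uint16_t buffer_index;             /* bytes currently buffered */
  uint16_t expected_sequence_number; /* next sequence number we accept */
  uint32_t packet_crc32;             /* CRC32 announced for this chunk */
  size_t current_offset;             /* stand-in for current_address */
} ChunkState;

/* Host-side stand-in for the application flash region (assumption). */
static uint8_t sim_flash[4u * PAGE_SIZE];

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
uint32_t crc32_compute(const uint8_t *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

/* Copies len bytes into the simulated flash, one double word at a time. */
static bool sim_flash_write(size_t offset, const uint8_t *src, size_t len) {
  if (offset % DOUBLE_WORD != 0u || len % DOUBLE_WORD != 0u) return false;
  if (offset + len > sizeof(sim_flash)) return false;
  for (size_t i = 0; i < len; i += DOUBLE_WORD) {
    memcpy(&sim_flash[offset + i], &src[i], DOUBLE_WORD);
  }
  return true;
}

/* Sequencing datagram: reject out-of-order chunks, remember the CRC32. */
ChunkResult on_sequencing(ChunkState *s, uint16_t sequence, uint32_t crc) {
  assert(s != NULL);
  if (sequence != s->expected_sequence_number) return CHUNK_BAD_SEQUENCE;
  s->packet_crc32 = crc;
  s->buffer_index = 0;
  return CHUNK_OK;
}

/* Data datagram: append up to 8 payload bytes to the page buffer. */
ChunkResult on_data(ChunkState *s, const uint8_t *payload, uint8_t len) {
  assert(s != NULL && payload != NULL);
  if (len > DOUBLE_WORD || s->buffer_index + len > PAGE_SIZE) return CHUNK_OVERFLOW;
  memcpy(&s->buffer[s->buffer_index], payload, len);
  s->buffer_index = (uint16_t)(s->buffer_index + len);
  return CHUNK_OK;
}

/* End of chunk: validate the CRC32, pad to a double word, then write. */
ChunkResult commit_chunk(ChunkState *s) {
  assert(s != NULL);
  if (crc32_compute(s->buffer, s->buffer_index) != s->packet_crc32) return CHUNK_BAD_CRC;
  while (s->buffer_index % DOUBLE_WORD != 0u) {
    s->buffer[s->buffer_index++] = 0xFFu; /* erased-flash value as padding */
  }
  if (!sim_flash_write(s->current_offset, s->buffer, s->buffer_index)) return CHUNK_FLASH_ERROR;
  s->current_offset += s->buffer_index;
  s->buffer_index = 0;
  s->expected_sequence_number++;
  return CHUNK_OK;
}
```

On the real MCU, sim_flash_write would be replaced by the page erase and double-word program routines of whichever library we end up choosing (LL or HAL), and a rejected chunk would be reported back to the client so it can resend.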

IMPORTANT LEARNINGS FROM FW15 (ARYAN’S NOTES):

  1. The vector table offset in the system-control-block (SCB->VTOR) was automatically set to 0x0800 0000 by CMSIS. This means the bootloader had the correct memory address for its NVIC/SCB, etc. but the application didn’t. I manually updated the CMSIS file to reference the linkerscript memory_start.

  2. Writing/erasing to flash takes a long time! That is why we need valid sequencing between the client/MCU. Using acknowledge messages helped synchronize communication.

  3. I debugged/ran everything with GDB. I do not recommend this; maybe an #ifdef that includes UART during debugging would be helpful?

  4. Smoke test everything. Since we were using the standard peripheral library, I had to rewrite simpler HAL APIs using register-level functions. This required smoke tests to validate everything.

  5. Ensure memory alignment! If something is not 4-byte aligned, use padding.

  6. Jumping back to the bootloader after entering the application caused some funky issues that I was too lazy to debug.

    1. The temporary solution I used is resetting the entire MCU (thus entering the bootloader). What is more appropriate is jumping to the bootloader’s memory address.

  7. FW15 achieved flashing in < 3 seconds.
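
Learnings 1 and 6 both involve handing control between two images that each have their own vector table. One cheap safeguard, sketched below, is to sanity-check the application's first two vector table entries before jumping. On a Cortex-M, the first entry is the initial stack pointer and the second is the reset handler address with the Thumb bit set. Everything concrete here is an assumption for illustration: the function name, the 32 kB bootloader size, and the RAM bounds. The real values belong in the linker scripts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative memory map; all four values are assumptions. */
#define APP_FLASH_START 0x08008000u /* hypothetical: app starts after a 32 kB bootloader */
#define APP_FLASH_END   0x08040000u /* end of 256 kB of flash */
#define RAM_START       0x20000000u
#define RAM_END         0x20010000u /* assumes 64 kB of contiguous SRAM */

/*
 * Returns true if the first two vector table entries look like a real
 * application. An erased region reads 0xFFFFFFFF and fails both checks.
 */
bool app_vectors_plausible(uint32_t initial_sp, uint32_t reset_handler) {
  /* The stack grows down, so the initial SP may equal RAM_END. */
  bool sp_ok = initial_sp > RAM_START && initial_sp <= RAM_END &&
               (initial_sp % 4u) == 0u;
  /* Cortex-M code is Thumb only, so bit 0 of the handler must be set. */
  bool thumb_ok = (reset_handler & 1u) != 0u;
  uint32_t entry = reset_handler & ~1u;
  bool entry_ok = entry >= APP_FLASH_START && entry < APP_FLASH_END;
  return sp_ok && thumb_ok && entry_ok;
}

/*
 * On target, after this check passes, the bootloader would point SCB->VTOR
 * at the application's vector table, load the stack pointer from the first
 * entry, and branch to the reset handler. That part is hardware-specific
 * and is left out of this host-side sketch.
 */
```

If the check fails, staying in the bootloader and waiting for a new DFU is likely safer than jumping into an empty or half-written application.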

DESIGN IMPROVEMENTS/THOUGHTS

  1. Some sort of simple cryptography would be cool to add to the sequencing packet, as we have a spare 2 bytes.

  2. Implement a GUI for the Python scripts. I had a Python script that controlled the commands via the terminal.

    1. This also includes cleaning up the Python scripts. They became incredibly messy near the end because of how frustrated I became during bootloader development (it wasn’t working for a few days).

  3. Depending on our future application sizes, there may be a benefit to dual-bank flashing, where we define a section of memory for a secondary application. In FW15 our applications never exceeded 64 kB; if that size remains constant, we could store 3+ applications in memory!

  4. Use an LED to indicate bootloader mode

  5. Implement multi-project flashing in parallel. This means we can flash BMS and PD at the exact same time.

    1. This could be done by implementing a node bitset in the CAN ID. By masking out the node bitset (let's say the lower 8 bits), we can check whether the remaining ID bits mark a bootloader command. If they do, we then check whether this node's bit is set in the bitset. This would allow us to flood the bus with both BMS and PD binaries.
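
The node bitset idea can be sketched as below. This is a rough illustration only: the 8-bit bitset width follows the "let's say" above, the prefix value 0x5 and all function names are made up, and an 11-bit standard CAN ID is assumed. Addressing all 16 node IDs would need a 16-bit bitset, which only fits in an extended (29-bit) ID.

```c
#include <stdbool.h>
#include <stdint.h>

#define NODE_BITSET_BITS 8u                              /* lower 8 bits of the ID */
#define NODE_BITSET_MASK ((1u << NODE_BITSET_BITS) - 1u) /* 0xFF */
#define BOOTLOADER_PREFIX 0x5u /* hypothetical marker in the upper 3 ID bits */

/* Builds a bootloader CAN ID that targets every node set in node_bitset. */
uint32_t make_bootloader_id(uint32_t node_bitset) {
  return (BOOTLOADER_PREFIX << NODE_BITSET_BITS) | (node_bitset & NODE_BITSET_MASK);
}

/* True if the bits above the node bitset match the bootloader marker. */
bool can_id_is_bootloader(uint32_t can_id) {
  return (can_id >> NODE_BITSET_BITS) == BOOTLOADER_PREFIX;
}

/* True if can_id is a bootloader command addressed to node_index (0..7). */
bool can_id_targets_node(uint32_t can_id, uint8_t node_index) {
  if (node_index >= NODE_BITSET_BITS) return false;
  if (!can_id_is_bootloader(can_id)) return false;
  return (((can_id & NODE_BITSET_MASK) >> node_index) & 1u) != 0u;
}
```

With this layout, a single frame whose bitset has two bits set is accepted by both nodes, while two different binaries would be sent under two different bitsets, so each node simply ignores frames where its bit is clear.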