Miscellaneous Ideas & Cool Things

This page has some ideas and other things I found that could be useful in the future.

--print-memory-usage: Printing memory usage by section from GCC

It turns out that the GNU linker, which GCC invokes, has an option that works similarly to our pretty_size.sh. If you pass -Wl,--print-memory-usage to GCC for stm32f0xx by adding it to LDFLAGS in platform/stm32f0xx/platform.mk, the linker prints output like this at the end of the make build output:

Building power_distribution.elf for stm32f0xx
Memory region         Used Size  Region Size  %age Used
           FLASH:       28712 B       128 KB     21.91%
             RAM:        7736 B        16 KB     47.22%

It looks like the linker uses the memory regions defined in the linker script (our stm32f0discovery_def.ld) to determine how much of each region we used.
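As a sanity check on how to read that table, here is a minimal sketch of the arithmetic that appears to be behind the "%age Used" column. It assumes the linker's "KB" means 1024 bytes, which is consistent with the numbers above; region_used_percent is a hypothetical helper, not something from our codebase.

```c
#include <stdint.h>

// Assumption: the linker reports region sizes in binary kilobytes.
#define BYTES_PER_KB 1024u

// Percentage of a memory region in use, given the used size in bytes and
// the region size in KB (the units the linker prints in its table).
static inline double region_used_percent(uint32_t used_bytes, uint32_t region_size_kb) {
  return 100.0 * (double)used_bytes / (double)(region_size_kb * BYTES_PER_KB);
}
```

Plugging in the FLASH row (28712 B of 128 KB) gives roughly 21.91, and the RAM row (7736 B of 16 KB) gives roughly 47.22, matching the output above.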

See https://embeddedartistry.com/blog/2020/08/17/three-gcc-flags-for-analyzing-memory-usage/ for more on this flag. For linker script resources, see https://blog.thea.codes/the-most-thoroughly-commented-linker-script/ and https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_3.html.

Generic GPIO

We have a few different kinds of GPIO pins in MSXIV: there are the native GPIO pins on the STM32, accessible via the gpio library, and a few types of I2C IO expanders accessible via their drivers (MCP23008 with mcp23008_gpio_expander, PCA9539R with pca9539r_gpio_expander).

All of these drivers/libraries implement the same basic interface for each pin – init_pin, set_state, toggle_state, and get_state – but they’re entirely incompatible at the moment, so you can’t mix and match GPIO types. This leads to awkward situations like in the bts_7200_load_switch and bts_7040_load_switch drivers, where there are two separate init functions and a whole ton of duplication where the only change is whether one pin is a native STM32 GPIO pin or a PCA9539R IO expander pin.

In any remotely object-oriented language, this problem is solved by having each driver’s GpioAddress class inherit from a base AbstractGpioAddress class or implement an IGpioAddress interface; then libraries that need to manipulate GPIO pins but don’t need to care about what they are can pass around AbstractGpioAddress instances and call set_state on them, and dynamic dispatch will be used to route the call to the appropriate driver. Unfortunately, C is not remotely object-oriented, so we have to implement it ourselves.

I don’t have a precise design for this, but we could possibly use a form of virtual method table - have the GpioAddress-type structs contain a pointer to a vtable struct which contains function pointers for each virtual function (init_pin, set_state, etc). The vtable struct would be a global constant for each type. See https://stackoverflow.com/a/8194632.
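To make the vtable idea concrete, here is a rough sketch of one way it could look. Every name in it (GenericGpioAddress, GenericGpioVtable, the fake registers, load_switch_set_enabled) is hypothetical, it returns bool rather than our usual status codes, and the two "drivers" just flip bits in in-memory variables so the example runs on a host machine. Real implementations would call the gpio library or perform an I2C transaction instead.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { GENERIC_GPIO_STATE_LOW = 0, GENERIC_GPIO_STATE_HIGH } GenericGpioState;

struct GenericGpioAddress;

// The vtable: one function pointer per "virtual" operation.
typedef struct {
  bool (*set_state)(const struct GenericGpioAddress *address, GenericGpioState state);
  bool (*get_state)(const struct GenericGpioAddress *address, GenericGpioState *state);
} GenericGpioVtable;

// The "base class". Concrete address structs embed it as their first member,
// so a pointer to the concrete struct converts to and from a base pointer.
typedef struct GenericGpioAddress {
  const GenericGpioVtable *vtable;
} GenericGpioAddress;

// Dispatch helpers: callers only ever see GenericGpioAddress.
static inline bool generic_gpio_set_state(const GenericGpioAddress *address,
                                          GenericGpioState state) {
  return address->vtable->set_state(address, state);
}

static inline bool generic_gpio_get_state(const GenericGpioAddress *address,
                                          GenericGpioState *state) {
  return address->vtable->get_state(address, state);
}

// toggle_state needs no vtable entry: it is built from get_state and set_state.
static inline bool generic_gpio_toggle_state(const GenericGpioAddress *address) {
  GenericGpioState state;
  if (!generic_gpio_get_state(address, &state)) return false;
  return generic_gpio_set_state(address, state == GENERIC_GPIO_STATE_HIGH
                                             ? GENERIC_GPIO_STATE_LOW
                                             : GENERIC_GPIO_STATE_HIGH);
}

// Bit helpers for the fake 16-bit output registers used below.
static bool prv_write_bit(uint16_t *reg, uint8_t pin, GenericGpioState state) {
  if (pin >= 16u) return false;
  uint16_t mask = (uint16_t)(1u << pin);
  *reg = (state == GENERIC_GPIO_STATE_HIGH) ? (uint16_t)(*reg | mask) : (uint16_t)(*reg & ~mask);
  return true;
}

static bool prv_read_bit(uint16_t reg, uint8_t pin, GenericGpioState *state) {
  if (pin >= 16u) return false;
  *state = ((reg >> pin) & 1u) ? GENERIC_GPIO_STATE_HIGH : GENERIC_GPIO_STATE_LOW;
  return true;
}

// Stand-in for a native STM32 pin.
typedef struct {
  GenericGpioAddress base;  // must be the first member
  uint8_t port;
  uint8_t pin;
} NativeGpioAddress;

#define NUM_FAKE_PORTS 2u
static uint16_t s_native_ports[NUM_FAKE_PORTS];  // fake output registers

static bool prv_native_set_state(const GenericGpioAddress *address, GenericGpioState state) {
  const NativeGpioAddress *native = (const NativeGpioAddress *)address;
  return native->port < NUM_FAKE_PORTS && prv_write_bit(&s_native_ports[native->port], native->pin, state);
}

static bool prv_native_get_state(const GenericGpioAddress *address, GenericGpioState *state) {
  const NativeGpioAddress *native = (const NativeGpioAddress *)address;
  return native->port < NUM_FAKE_PORTS && prv_read_bit(s_native_ports[native->port], native->pin, state);
}

// Stand-in for an IO expander pin.
typedef struct {
  GenericGpioAddress base;  // must be the first member
  uint8_t i2c_address;
  uint8_t pin;
} ExpanderGpioAddress;

static uint16_t s_expander_outputs;  // fake output register of a single expander

static bool prv_expander_set_state(const GenericGpioAddress *address, GenericGpioState state) {
  return prv_write_bit(&s_expander_outputs, ((const ExpanderGpioAddress *)address)->pin, state);
}

static bool prv_expander_get_state(const GenericGpioAddress *address, GenericGpioState *state) {
  return prv_read_bit(s_expander_outputs, ((const ExpanderGpioAddress *)address)->pin, state);
}

// One global constant vtable per pin type.
static const GenericGpioVtable s_native_vtable = { prv_native_set_state, prv_native_get_state };
static const GenericGpioVtable s_expander_vtable = { prv_expander_set_state, prv_expander_get_state };

// A driver that does not care what kind of pin it was given.
static inline bool load_switch_set_enabled(const GenericGpioAddress *enable_pin, bool enabled) {
  return generic_gpio_set_state(enable_pin, enabled ? GENERIC_GPIO_STATE_HIGH : GENERIC_GPIO_STATE_LOW);
}
```

With something like this, a load switch driver would need only one init function taking a GenericGpioAddress pointer, and callers would pass &native.base or &expander.base as appropriate. The cost is one extra pointer per address struct and an indirect call per operation.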

CAN ID allocation reform

A (standard) CAN frame consists of a few fields: the arbitration field (11-bit ID), control, data (0-8 bytes), CRC, and end of frame.

The arbitration field contains the arbitration ID, which is the ID of a CAN message on the physical layer. It’s used both to identify a message and to determine which message wins if two nodes try to send two messages at the same time: the message with the lower arbitration ID wins and gets sent first.

An important point is that if two nodes send messages with the same arbitration ID at the same time, neither node backs off during arbitration, so the two frames collide on the bus and, unless they are bit-for-bit identical, cause bus errors. The easiest way to avoid this scenario in production is to embed a device ID into the arbitration ID so that each node only sends CAN messages with its device ID. This is what we currently do. We split the arbitration ID as follows:

                    11 bits - arbitration ID
|-------------------------------------------------------------|
|--------------------------------|-----|----------------------|
      6 bits - message ID         1 bit    4 bits - device ID
                                 ACK flag

The message ID is the actual ID of the CAN message, e.g. SYSTEM_CAN_MESSAGE_SOLAR_FAULT; the device ID is uniquely assigned to each project, e.g. SYSTEM_CAN_DEVICE_CENTRE_CONSOLE; and the ACK flag is used for the acknowledgement system. Normally, the ACK flag is 0 to indicate a data message. When a “critical” message is received (currently defined as a message with message ID <= 13), our CAN library automatically responds with a message with the same message ID, ACK = 1 to indicate an acknowledgement message, and the device ID of the current device. This can be used to make sure certain devices have received certain messages with can_ack_add_request, e.g. the power up/down sequences in centre console use this extensively.
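The split and the ACK rule described above can be sketched in a few lines of C. This is an illustration of the layout only, not the CAN library's actual code: CanIdFields and the can_id_* helpers are hypothetical names, and the threshold of 13 is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

// Field widths, from least to most significant: device ID, ACK flag, message ID.
#define CAN_ID_DEVICE_BITS 4u
#define CAN_ID_ACK_BITS 1u
#define CAN_ID_MESSAGE_BITS 6u

#define CAN_ID_ACK_SHIFT CAN_ID_DEVICE_BITS
#define CAN_ID_MESSAGE_SHIFT (CAN_ID_DEVICE_BITS + CAN_ID_ACK_BITS)

#define CAN_ID_DEVICE_MASK ((1u << CAN_ID_DEVICE_BITS) - 1u)
#define CAN_ID_MESSAGE_MASK ((1u << CAN_ID_MESSAGE_BITS) - 1u)

// Messages with a message ID at or below this value are "critical" and get ACKed.
#define CAN_ID_MAX_CRITICAL_MESSAGE_ID 13u

typedef struct {
  uint8_t message_id;  // 6 bits
  bool is_ack;         // 1 bit
  uint8_t device_id;   // 4 bits
} CanIdFields;

// Combine the three fields into an 11-bit arbitration ID.
static inline uint16_t can_id_pack(CanIdFields fields) {
  return (uint16_t)(((fields.message_id & CAN_ID_MESSAGE_MASK) << CAN_ID_MESSAGE_SHIFT) |
                    ((fields.is_ack ? 1u : 0u) << CAN_ID_ACK_SHIFT) |
                    (fields.device_id & CAN_ID_DEVICE_MASK));
}

// Split an 11-bit arbitration ID back into its fields.
static inline CanIdFields can_id_unpack(uint16_t raw) {
  CanIdFields fields = {
    .message_id = (uint8_t)((raw >> CAN_ID_MESSAGE_SHIFT) & CAN_ID_MESSAGE_MASK),
    .is_ack = ((raw >> CAN_ID_ACK_SHIFT) & 1u) != 0u,
    .device_id = (uint8_t)(raw & CAN_ID_DEVICE_MASK),
  };
  return fields;
}

// True for a data message (not itself an ACK) whose message ID is critical.
static inline bool can_id_needs_ack(uint16_t raw) {
  CanIdFields fields = can_id_unpack(raw);
  return !fields.is_ack && fields.message_id <= CAN_ID_MAX_CRITICAL_MESSAGE_ID;
}

// Arbitration ID of the ACK: same message ID, ACK = 1, our own device ID.
static inline uint16_t can_id_make_ack(uint16_t received, uint8_t own_device_id) {
  CanIdFields fields = can_id_unpack(received);
  fields.is_ack = true;
  fields.device_id = own_device_id;
  return can_id_pack(fields);
}
```

For example, message ID 13 sent by device 2 packs to 0x1A2, and the ACK for it from device 5 packs to 0x1B5. The ACK's arbitration ID is numerically larger than that of the message it acknowledges.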

Due to this splitting, we have 2^6=64 possible message IDs per device ID and 2^4=16 possible device IDs. The issue is that our codegen tooling doesn’t allow a message with the same message ID to be sent from multiple devices, except for ACKs! There’s no way to specify multiple sources for a message. This limits us to just 64 message IDs total, rather than 64 message IDs per device ID, and this is really tight.

Here are a few suggestions to improve our CAN ID allocation. All of them are backwards-incompatible and will require changes to the core CAN library as well as codegen-tooling, so tread carefully.

Use the device ID as well as the message ID to determine the message

This one is fairly simple: our current codegen tooling doesn’t allow multiple device IDs to share the same message ID (so if two devices want to send the same message, they have to use two message IDs, like SYSTEM_CAN_MESSAGE_FRONT_CURRENT_MEASUREMENT and SYSTEM_CAN_MESSAGE_REAR_CURRENT_MEASUREMENT). To give us more messages, let’s change it so that two CAN messages with the same message ID and different device IDs are considered different messages. This would give us 64 messages per device ID, for a total of 1024 possible messages. In essence, we’d just consider message ID + device ID to identify a distinct CAN message, rather than just the message ID.

This would require extensive changes to codegen-tooling, to the CAN library, and to telemetry, as well as reflashing all the boards onsite when we’re done.
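As a rough illustration of what "message ID + device ID identifies the message" could mean in code, the sketch below derives a single lookup key from the pair. The key function, the constants, and the example IDs are all hypothetical and only show the counting argument, not a proposed implementation.

```c
#include <stdint.h>

#define NUM_MESSAGE_IDS 64u  // 2^6
#define NUM_DEVICE_IDS 16u   // 2^4
#define NUM_DISTINCT_MESSAGES (NUM_MESSAGE_IDS * NUM_DEVICE_IDS)

// Hypothetical IDs, chosen only for this example.
#define EXAMPLE_MESSAGE_CURRENT_MEASUREMENT 40u
#define EXAMPLE_DEVICE_FRONT 3u
#define EXAMPLE_DEVICE_REAR 4u

// Unique key in [0, NUM_DISTINCT_MESSAGES) for each (message ID, device ID) pair,
// usable as an index into a table of message definitions or handlers.
static inline uint16_t can_message_key(uint8_t message_id, uint8_t device_id) {
  return (uint16_t)((message_id % NUM_MESSAGE_IDS) * NUM_DEVICE_IDS +
                    (device_id % NUM_DEVICE_IDS));
}
```

Under this scheme, the front and rear current measurements could share one message ID and still be distinct messages, because their keys differ in the device ID part.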

An important thing to consider here is priority. Since a smaller arbitration ID has priority on the CAN bus, and the message ID takes up the most significant bits of the arbitration ID, the message ID determines the priority of a message on the bus. So if you’re writing firmware for a tiny unimportant sensor and need a few CAN messages to send back voltage data, don’t start your message IDs at 0 and increment: otherwise your sensor might delay something important like the BPS heartbeat. I think it’s also reasonable to keep the convention of defining a set of critical message IDs which get ACKed, and priority levels: see the top of can_messages.asciipb in codegen-tooling-msxiv for some idea of priority levels.

Use a special message ID for ACKs instead of a bit in the arbitration ID

An ACK message currently has the following attributes:

same message ID as the message it’s ACKing, ACK bit set to 1, device ID of the current device
contains 1 byte of data: the ACK status (see CanAckStatus from can_ack.h)

Instead, we could just reserve a message ID for ACKs (one message ID per device ID if we implement the previous suggestion), then send both the message ID (+ device ID) we’re ACKing and the ACK status in the data. This would allow us to remove the ACK bit from the arbitration ID, so we can then allocate it to the message ID and double the number of message IDs (or double the number of device IDs).

This would require less extensive changes, but still some wide-ranging backwards-incompatible changes to codegen-tooling, the CAN library, and telemetry.
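Here is a sketch of what an ACK could look like under the reserved message ID scheme, assuming we spend the freed bit on the message ID (7 bits) and keep 4 bits of device ID. The reserved ID of 0, the 3-byte payload layout, and all the names are assumptions made for illustration. The status byte stands in for CanAckStatus.

```c
#include <stdint.h>

// Assumed reformed layout: 7-bit message ID in the high bits, 4-bit device ID below it.
#define REFORMED_DEVICE_BITS 4u
#define REFORMED_DEVICE_MASK 0xFu
#define REFORMED_MESSAGE_MASK 0x7Fu

// Hypothetical choice of the message ID reserved for ACKs.
#define REFORMED_ACK_MESSAGE_ID 0u

typedef struct {
  uint16_t arbitration_id;  // 11 bits used
  uint8_t dlc;              // number of valid bytes in data
  uint8_t data[8];
} ReformedCanFrame;

// Combine a 7-bit message ID and a 4-bit device ID into an arbitration ID.
static inline uint16_t reformed_can_id_pack(uint8_t message_id, uint8_t device_id) {
  return (uint16_t)(((message_id & REFORMED_MESSAGE_MASK) << REFORMED_DEVICE_BITS) |
                    (device_id & REFORMED_DEVICE_MASK));
}

// Build the ACK for a received message. The ACK always uses the reserved
// message ID; what is being ACKed travels in the data bytes instead:
//   data[0] = message ID of the ACKed message
//   data[1] = device ID of the ACKed message
//   data[2] = ACK status
static inline ReformedCanFrame reformed_make_ack(uint16_t received_id, uint8_t own_device_id,
                                                 uint8_t ack_status) {
  ReformedCanFrame ack = {
    .arbitration_id = reformed_can_id_pack(REFORMED_ACK_MESSAGE_ID, own_device_id),
    .dlc = 3u,
    .data = { (uint8_t)((received_id >> REFORMED_DEVICE_BITS) & REFORMED_MESSAGE_MASK),
              (uint8_t)(received_id & REFORMED_DEVICE_MASK), ack_status },
  };
  return ack;
}
```

Note that every ACK from a given device now has the same arbitration ID regardless of what it acknowledges, so its priority is fixed by the reserved message ID rather than inherited from the original message.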

Advantages of the current ACK bit system:

the priority of an ACK is the same as the message it’s ACKing (actually very slightly lower, since setting the ACK bit makes the arbitration ID slightly larger)

Advantages of the reformed special message ID system:

double the message IDs (or device IDs)

We’d have to choose an appropriate message ID to reserve. 0 or 1 might be appropriate because ACKs are sent for critical messages, so they are themselves critical; in particular, ACKs are used for the BPS heartbeat system and power sequence messages. Since critical messages aren’t sent too often, the exact value probably doesn’t matter as long as it’s lower than the IDs of all the messages sent frequently/continuously.