...

See https://embeddedartistry.com/blog/2020/08/17/three-gcc-flags-for-analyzing-memory-usage/, also https://blog.thea.codes/the-most-thoroughly-commented-linker-script/ and https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_3.html for some linker script resources.

...

I don’t have a precise design for this, but we could possibly use a form of virtual method table: each GpioAddress-type struct would contain a pointer to a vtable struct, which in turn contains a function pointer for each virtual function (init_pin, set_state, etc.). There would be one vtable struct per type, defined as a global constant. See https://stackoverflow.com/a/8194632.
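To make the idea concrete, here is a minimal sketch of the vtable pattern in C. Only GpioAddress, init_pin and set_state come from the paragraph above; the two fake backends, the table names, and the gpio_init_pin/gpio_set_state wrappers are assumptions made up for illustration, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical sketch: apart from GpioAddress, init_pin and set_state,
// every name below is invented for illustration.
struct GpioAddress;

// One function pointer per "virtual" operation.
typedef struct GpioVtable {
  bool (*init_pin)(const struct GpioAddress *address);
  bool (*set_state)(const struct GpioAddress *address, bool high);
} GpioVtable;

typedef struct GpioAddress {
  const GpioVtable *vtable;  // global constant table for this GPIO type
  uint8_t port;
  uint8_t pin;
} GpioAddress;

#define NUM_FAKE_PINS 16

// Two fake backends that only record pin state in arrays. They stand in
// for, say, on-chip GPIO and an I/O expander.
static bool s_onchip_state[NUM_FAKE_PINS];
static bool s_expander_state[NUM_FAKE_PINS];

static bool prv_onchip_init_pin(const GpioAddress *address) {
  if (address->pin >= NUM_FAKE_PINS) return false;
  s_onchip_state[address->pin] = false;
  return true;
}

static bool prv_onchip_set_state(const GpioAddress *address, bool high) {
  if (address->pin >= NUM_FAKE_PINS) return false;
  s_onchip_state[address->pin] = high;
  return true;
}

static bool prv_expander_init_pin(const GpioAddress *address) {
  if (address->pin >= NUM_FAKE_PINS) return false;
  s_expander_state[address->pin] = false;
  return true;
}

static bool prv_expander_set_state(const GpioAddress *address, bool high) {
  if (address->pin >= NUM_FAKE_PINS) return false;
  s_expander_state[address->pin] = high;
  return true;
}

// One constant vtable per GPIO type, shared by every address of that type.
static const GpioVtable ONCHIP_GPIO_VTABLE = {
  .init_pin = prv_onchip_init_pin,
  .set_state = prv_onchip_set_state,
};

static const GpioVtable EXPANDER_GPIO_VTABLE = {
  .init_pin = prv_expander_init_pin,
  .set_state = prv_expander_set_state,
};

// Generic entry points: callers never need to know the concrete type.
bool gpio_init_pin(const GpioAddress *address) {
  assert(address != NULL && address->vtable != NULL);
  return address->vtable->init_pin(address);
}

bool gpio_set_state(const GpioAddress *address, bool high) {
  assert(address != NULL && address->vtable != NULL);
  return address->vtable->set_state(address, high);
}
```

A caller would build an address such as `{ .vtable = &EXPANDER_GPIO_VTABLE, .port = 0, .pin = 3 }` and pass it to gpio_set_state like any other pin. The cost is one extra pointer per address struct and an indirect call per operation.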

CAN ID allocation reform

A (standard) CAN frame consists of a few fields: the arbitration field (which carries the 11-bit arbitration ID), control, data (0-8 bytes), CRC, and end of frame.

...

The arbitration field contains the arbitration ID, which is the ID of a CAN message on the physical layer. It’s used both to identify a message and to determine which message wins if two nodes try to send two messages at the same time: the message with the lower arbitration ID wins and gets sent first.

An important point is that two nodes must never send messages with the same arbitration ID at the same time: arbitration can’t pick a winner, so both nodes keep transmitting, and if their frames differ anywhere after the arbitration field the result is a bus error. The easiest way to avoid this scenario in production is to embed a device ID into the arbitration ID so that each node only sends CAN messages with its device ID. This is what we currently do. We split the arbitration ID as follows:

                    11 bits - arbitration ID
|-------------------------------------------------------------|
|--------------------------------|-----|----------------------|
      6 bits - message ID         1 bit    4 bits - device ID
                                 ACK flag

The message ID is the actual ID of the CAN message, e.g. SYSTEM_CAN_MESSAGE_SOLAR_FAULT; the device ID is uniquely assigned to each project, e.g. SYSTEM_CAN_DEVICE_CENTRE_CONSOLE; and the ACK flag is used for the acknowledgement system. Normally, the ACK flag is 0 to indicate a data message. When a “critical” message is received (currently defined as a message with message ID <= 13), our CAN library automatically responds with a message with the same message ID, ACK = 1 to indicate an acknowledgement message, and the device ID of the current device. Senders can use can_ack_add_request to make sure certain devices have received certain messages; the power up/down sequences in centre console use this extensively.
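The split described above can be sketched as a pair of pack/unpack helpers. This is an illustration only: the field widths and the "message ID <= 13" rule come from the text, but CanIdFields and all function and macro names are hypothetical and are not the CAN library’s actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Field widths from the layout above. All names here are hypothetical.
#define CAN_DEVICE_ID_BITS 4u
#define CAN_ACK_FLAG_BITS 1u
#define CAN_MSG_ID_BITS 6u
#define CAN_CRITICAL_MSG_ID_MAX 13u  // "critical" means message ID <= 13

typedef struct {
  uint8_t msg_id;     // 6 bits, most significant
  bool is_ack;        // 1 bit
  uint8_t device_id;  // 4 bits, least significant
} CanIdFields;

// Build the 11-bit arbitration ID: [ msg_id | ack | device_id ].
uint16_t can_id_pack(CanIdFields fields) {
  assert(fields.msg_id < (1u << CAN_MSG_ID_BITS));
  assert(fields.device_id < (1u << CAN_DEVICE_ID_BITS));
  uint16_t raw = fields.msg_id;
  raw = (uint16_t)((raw << CAN_ACK_FLAG_BITS) | (fields.is_ack ? 1u : 0u));
  raw = (uint16_t)((raw << CAN_DEVICE_ID_BITS) | fields.device_id);
  return raw;
}

// Split an 11-bit arbitration ID back into its three fields.
CanIdFields can_id_unpack(uint16_t raw) {
  CanIdFields fields;
  fields.device_id = (uint8_t)(raw & ((1u << CAN_DEVICE_ID_BITS) - 1u));
  raw >>= CAN_DEVICE_ID_BITS;
  fields.is_ack = (raw & 1u) != 0;
  raw >>= CAN_ACK_FLAG_BITS;
  fields.msg_id = (uint8_t)(raw & ((1u << CAN_MSG_ID_BITS) - 1u));
  return fields;
}

// A received data message is ACKed only if it is critical.
bool can_id_needs_ack(CanIdFields received) {
  return !received.is_ack && received.msg_id <= CAN_CRITICAL_MSG_ID_MAX;
}

// The ACK keeps the message ID, sets the ACK flag, and carries the
// responder's own device ID.
CanIdFields can_id_make_ack(CanIdFields received, uint8_t own_device_id) {
  CanIdFields ack = {
    .msg_id = received.msg_id,
    .is_ack = true,
    .device_id = own_device_id,
  };
  return ack;
}
```

For example, message ID 13 sent by device 2 packs to (13 << 5) | 2 = 418, and the ACK sent back by device 5 packs to (13 << 5) | (1 << 4) | 5 = 437.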

Due to this splitting, we have 2^6=64 possible message IDs per device ID and 2^4=16 possible device IDs. The issue is that our codegen tooling doesn’t allow a message with the same message ID to be sent from multiple devices, except for ACKs! There’s no way to specify multiple sources for a message. This limits us to just 64 message IDs total, rather than 64 message IDs per device ID, and this is really tight.

Here are a few suggestions to improve our CAN ID allocation. All of them are backwards-incompatible and will require changes to the core CAN library as well as codegen-tooling, so tread carefully.

Use the device ID as well as the message ID to determine the message

This one is fairly simple: our current codegen tooling doesn’t allow multiple device IDs to share the same message ID (so if two devices want to send the same message, they have to use two message IDs, like SYSTEM_CAN_MESSAGE_FRONT_CURRENT_MEASUREMENT and SYSTEM_CAN_MESSAGE_REAR_CURRENT_MEASUREMENT). To give us more messages, let’s change it so that two CAN messages with the same message ID and different device IDs are considered different messages. This would give us 64 messages per device ID, for a total of 1024 possible messages. In essence, we’d just consider message ID + device ID to identify a distinct CAN message, rather than just the message ID.
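As a rough sketch of what "message ID + device ID identifies a message" could look like on the receive side, the table below is indexed by both fields, giving 64 × 16 = 1024 slots. Everything here (the handler type, the function names, the example handlers) is hypothetical and only meant to illustrate the keying scheme, not how the CAN library or codegen-tooling would actually implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_MSG_IDS 64u
#define NUM_DEVICE_IDS 16u
#define NUM_CAN_MESSAGES (NUM_MSG_IDS * NUM_DEVICE_IDS)  // 1024 distinct messages

// Hypothetical handler type, one per (message ID, device ID) pair.
typedef void (*CanRxHandler)(const uint8_t *data, size_t len);

static CanRxHandler s_handlers[NUM_CAN_MESSAGES];

// Combine both fields into a single key in the range 0..1023.
size_t can_message_key(uint8_t msg_id, uint8_t device_id) {
  assert(msg_id < NUM_MSG_IDS && device_id < NUM_DEVICE_IDS);
  return (size_t)msg_id * NUM_DEVICE_IDS + device_id;
}

// Returns false if that (message ID, device ID) pair is already taken.
bool can_register_rx_handler(uint8_t msg_id, uint8_t device_id, CanRxHandler handler) {
  size_t key = can_message_key(msg_id, device_id);
  if (s_handlers[key] != NULL) return false;
  s_handlers[key] = handler;
  return true;
}

// Returns false if nothing is registered for that pair.
bool can_dispatch(uint8_t msg_id, uint8_t device_id, const uint8_t *data, size_t len) {
  CanRxHandler handler = s_handlers[can_message_key(msg_id, device_id)];
  if (handler == NULL) return false;
  handler(data, len);
  return true;
}

// Example handlers: the same message ID sent by two different devices.
static unsigned s_front_count;
static unsigned s_rear_count;

static void prv_front_current(const uint8_t *data, size_t len) {
  (void)data;
  (void)len;
  s_front_count++;
}

static void prv_rear_current(const uint8_t *data, size_t len) {
  (void)data;
  (void)len;
  s_rear_count++;
}
```

With this keying, the front and rear current measurements could share one message ID and be told apart purely by the sender’s device ID, e.g. `can_register_rx_handler(20, 3, prv_front_current)` and `can_register_rx_handler(20, 4, prv_rear_current)`.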

This would require extensive changes to codegen-tooling, to the CAN library, and to telemetry, as well as reflashing all the boards onsite when we’re done.

An important thing to consider here is priority. Since a smaller arbitration ID has priority on the CAN bus, and the message ID takes up the most significant bits of the arbitration ID, the message ID determines the priority of a message on the bus. So if you’re writing firmware for a tiny unimportant sensor and need a few CAN messages to send back voltage data, don’t start your message IDs at 0 and increment, or your sensor might delay something important like the BPS heartbeat. I think it’s also reasonable to keep the convention of defining a set of critical message IDs which get ACKed, along with priority levels: see the top of can_messages.asciipb in codegen-tooling-msxiv for some idea of priority levels.
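A small sketch of why the message ID dominates priority, assuming the current 6/1/4 bit split. The function names are hypothetical, and can_wins_arbitration only models the "lower ID wins" rule; it is not a simulation of the bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Current layout: 6-bit message ID, 1-bit ACK flag, 4-bit device ID.
// Hypothetical helper names, for illustration only.
uint16_t can_arbitration_id(uint8_t msg_id, bool is_ack, uint8_t device_id) {
  assert(msg_id < 64 && device_id < 16);
  return (uint16_t)(((unsigned)msg_id << 5) | ((is_ack ? 1u : 0u) << 4) | device_id);
}

// The frame with the numerically lower arbitration ID wins and is sent first.
bool can_wins_arbitration(uint16_t id_a, uint16_t id_b) {
  return id_a < id_b;
}
```

Under these assumptions, a sensor using message ID 0 from device 15 (arbitration ID 15) beats a message with ID 1 from device 0 (arbitration ID 32): the device ID only breaks ties between messages that share a message ID and ACK flag.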

Use a special message ID for ACKs instead of a bit in the arbitration ID

An ACK message currently has the following attributes:

  • same message ID as the message it’s ACKing, ACK bit set to 1, device ID of the current device

  • contains 1 byte of data: the ACK status (see CanAckStatus from can_ack.h)

Instead, we could just reserve a message ID for ACKs (one message ID per device ID if we implement the previous suggestion), then send both the message ID (+ device ID) we’re ACKing and the ACK status in the data. This would allow us to remove the ACK bit from the arbitration ID, so we can then allocate it to the message ID and double the number of message IDs (or double the number of device IDs).
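One possible shape for this, as a sketch under stated assumptions: the ACK bit is folded into the message ID (7 bits of message ID, 4 bits of device ID), message ID 0 is reserved for ACKs, and the payload is three bytes (ACKed message ID, ACKed device ID, status). The reserved value, the payload order, the CanFrame struct, and all function names are hypothetical. The status byte stands in for CanAckStatus from can_ack.h, whose values are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAN_ACK_MSG_ID 0u       // hypothetical reserved message ID
#define CAN_ACK_PAYLOAD_LEN 3u  // acked msg ID, acked device ID, status

// Hypothetical minimal frame representation.
typedef struct {
  uint16_t arbitration_id;
  uint8_t dlc;
  uint8_t data[8];
} CanFrame;

// Assumed new layout: 7-bit message ID above a 4-bit device ID, no ACK bit.
uint16_t can_arbitration_id(uint8_t msg_id, uint8_t device_id) {
  assert(msg_id < 128 && device_id < 16);
  return (uint16_t)(((unsigned)msg_id << 4) | device_id);
}

// Build an ACK: the arbitration ID says "ACK from own_device_id", and the
// data says which message is being ACKed and with what status.
CanFrame can_build_ack(uint8_t acked_msg_id, uint8_t acked_device_id,
                       uint8_t own_device_id, uint8_t status) {
  CanFrame frame = { 0 };
  frame.arbitration_id = can_arbitration_id(CAN_ACK_MSG_ID, own_device_id);
  frame.dlc = CAN_ACK_PAYLOAD_LEN;
  frame.data[0] = acked_msg_id;
  frame.data[1] = acked_device_id;
  frame.data[2] = status;
  return frame;
}

// Returns false if the frame is not a well-formed ACK.
bool can_parse_ack(const CanFrame *frame, uint8_t *acked_msg_id,
                   uint8_t *acked_device_id, uint8_t *status) {
  if ((frame->arbitration_id >> 4) != CAN_ACK_MSG_ID) return false;
  if (frame->dlc != CAN_ACK_PAYLOAD_LEN) return false;
  *acked_msg_id = frame->data[0];
  *acked_device_id = frame->data[1];
  *status = frame->data[2];
  return true;
}
```

Note that in this scheme every ACK has the same message ID, so ACKs from different devices are distinguished on the bus only by the responder’s device ID in the low four bits.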

This would require less extensive changes than the previous suggestion, but they would still be wide-ranging and backwards-incompatible, touching codegen-tooling, the CAN library, and telemetry.

Advantages of the current ACK bit system:

  • the priority of an ACK is nearly the same as the message it’s ACKing (very slightly lower, since the ACK bit is set to 1 and sits just below the message ID)

Advantages of the reformed special message ID system:

  • double the message IDs (or device IDs)

We’d have to choose an appropriate message ID to reserve. 0 or 1 might be appropriate because ACKs are sent for critical messages, so they are themselves critical; in particular, ACKs are used for the BPS heartbeat system and power sequence messages. Since critical messages aren’t sent too often, the exact value probably doesn’t matter as long as it’s lower than the IDs of all messages that are sent frequently or continuously.