BMS AFE Rev 1.0 Validation - Walkthrough

Setup

Connected to a 12S 1P Li-ion battery stack (some of the many old laptop batteries that were hanging around my basement).

IP and IN on the AFEs seem to be backwards compared to the vertical duraclick connectors. Had to swap the pins in the cable to match it up.

The connections to the cells seem to be crimped improperly. Parts of them were squished during crimping, allowing the pins to push out the end of the connector. We should ask Molex for the proper crimp tool, as this happens a lot. This may also be contributing to the fact that the 14 and 16 pin connectors are tough to remove, requiring 1 hand on the board and 1 hand on the connector.

 

I dropped the board. It landed on the screw hole. Maybe we should consider moving the screws away from the edge of the board a little more? But the main point here is that I should have been more careful with the board. It landed on a thin rubber mat on the floor in my work area.

 

Power Validation

None of the power nets are connected to each other. Good.

I presume toggling the CS pin is supposed to keep the device from going into the sleep state?

Measuring 5V is good. Nothing on either of the 3V0 VREF outputs.

The board seems to be resetting at a rate independent of the rate at which I toggle the CS pin. By resetting, I mean the 5V LED turns off for a short time (about 100 ms) before turning back on. So the device seems to be going into a sleep state or something before waking up again via the SPI command.

 

Battery Stack Monitor

@Gerald Aryeetey (Deactivated) and @Micah Black tried to get the LTC6811 chip to talk using code from the old car, modified to work with just 1 AFE.

The AFE always seems to reply with a '0' for the cell voltage measurement, causing a bad CRC error. It is unknown at this point whether it is a problem with the code or a problem with the hardware.

Here is a scope image of the VREF2 pin when the commands are sent.

So it seems like the device is waking up. More debugging to do. @Liam Hawkins any ideas?

The AFE's references take time to wake up, about 3.5 ms. The scope shows 5 ms here, but it would be within the specified range if measured from the end of the rising edge.

Reading further, this wait cycle only applies when waking up the AFE to measure it.

 

So we went on testing trying to get data from it.

We connected a logic analyzer to the output to see if we were getting the correct data out. The output shows the same data that we send in code, so this verifies that we have the correct SPI port and pinouts. However, still no response from the AFEs.

We looked at the ISOSPI input on the AFEs with an oscilloscope, and that all seemed fine:
Note - the peaks are much higher (> ±1V) when zoomed in.

So the data is getting to the AFE side accurately. It is waking up from the communication and powering the DRIVE pin to enable the 5V regulator - the green LED on the board turns on - and then goes back to sleep after 2s.

We did notice that the MISO and MOSI lines seem to be following each other, which I didn’t really expect. The MISO/MOSI pins seem to be shorted on the board, but I remember @Liam Hawkins mentioned at one point that the LTC6820 chips are not ‘idle’ - they will actively drive some of the lines, which might be causing this measurement. We should probably do some more digging into the LTC6820 datasheet to see what we should expect on the SPI port.

Any thoughts @Liam Hawkins ?

Debugging Jun 6 2020

Going off the last comment of the SPI lines being shorted, I double checked them and they seemed to be fine. No shorts that I could see. I measured the other SPI bus on the carrier (for the Current Sense) and it was not shorted - which seemed like the expected behaviour.

I checked all the pins on the LTC6820 and confirmed that it was only the MISO/MOSI pins that were shorted. Thinking it could be some solder stuck underneath the chip connecting the backs of the pins, I flooded those pins with solder, then removed it using the solder wick. Still shorted.

Poking around the board a little more, I found the culprit. MISO/MOSI on SPI1 are shorted underneath the controller board connector, so that you can only see it if you’re looking at an angle less than about 30 degrees to the horizontal. And it really does just look like flux unless you have the right lighting. I tried to get a picture, but it doesn’t really do it justice:

Plugging everything back in, we immediately get data back!

But is it the data expected?
We are sending it: 0x00, 0x02, 0x2b, 0x0a which should be the command to read configuration register A, followed by the 2 bytes of the Packet Error Code used to check for data errors.

It sent back:
0xFA, 0x00, 0x00, 0x00, 0x00, 0x00, 0xC6, 0x52

Which should be the 6 bytes of data in Configuration Register A followed by the 2 bytes of the packet error code. According to the Validation Page (and the datasheet), this is expected - with only the GPIOs and the DTEN bits set.

And the python script I had created earlier for generating and checking the CRC gives a CRC of 0xC652 for the same 6-byte message so everything is working!
Yay!
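
For anyone who wants to repeat the PEC check by hand, here is a minimal sketch. It is not the original script mentioned on this page, and the function name pec15 is made up for illustration. It assumes the PEC scheme described in the LTC6811 datasheet: a 15-bit CRC with polynomial 0x4599, seeded with 0x0010, data fed in MSB first, and a 0 bit appended at the end to fill 16 bits.

```python
def pec15(data):
    """Return the 16-bit packet error code for a sequence of bytes.

    Assumed scheme (from the LTC6811 datasheet): 15-bit CRC, polynomial
    0x4599, initial remainder 0x0010, bits processed MSB first, and the
    final 15-bit remainder shifted left by one so the LSB is always 0.
    """
    remainder = 0x0010
    for byte in data:
        for bit in range(7, -1, -1):
            din = (byte >> bit) & 1
            # Feedback is the incoming bit XOR the top bit of the remainder.
            feedback = din ^ ((remainder >> 14) & 1)
            remainder = (remainder << 1) & 0x7FFF
            if feedback:
                remainder ^= 0x4599
    return (remainder << 1) & 0xFFFF


if __name__ == "__main__":
    command = bytes([0x00, 0x02])                         # read config register A
    reply = bytes([0xFA, 0x00, 0x00, 0x00, 0x00, 0x00])   # 6 data bytes received
    print(f"command PEC: 0x{pec15(command):04X}")
    print(f"reply PEC:   0x{pec15(reply):04X}")
```

Run against the bytes above, this gives 0x2B0A for the command and 0xC652 for the reply, which matches the PEC bytes seen on the wire.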

Here’s the python program if anyone wants it:

This is the modified smoke_spi main.c I used to get the data:

 

Now going back to the Plutus code that we were originally trying to run to read voltages: It works!

 

[0] projects/plutus/src/main.c:19: THIS IS THE START
[0] libraries/ms-helper/src/persist.c:112: Found valid section at 0x801f800 (0x10 bytes), loading data
[0] projects/plutus/src/main.c:22: Board type: 0
[0] projects/plutus/src/ltc_afe_fsm.c:65: TRIGGERED CELL CONVERSION SUCCESSFULLY
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 0 with value 40889
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 1 with value 40862
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 2 with value 40590
[0] projects/plutus/src/ltc_afe_impl.c:240: CALCULATING PACKET ERROR CODE (CRC) FOR rev_pec=48032 and data_pec=48032
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 3 with value 40897
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 4 with value 41418
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 5 with value 41621
[0] projects/plutus/src/ltc_afe_impl.c:240: CALCULATING PACKET ERROR CODE (CRC) FOR rev_pec=64312 and data_pec=64312
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 6 with value 41517
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 7 with value 41479
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 8 with value 41443
[0] projects/plutus/src/ltc_afe_impl.c:240: CALCULATING PACKET ERROR CODE (CRC) FOR rev_pec=39510 and data_pec=39510
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 9 with value 39617
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 10 with value 41053
[0] projects/plutus/src/ltc_afe_impl.c:230: Got cell voltage for cell index: 11 with value 40681
[0] projects/plutus/src/ltc_afe_impl.c:240: CALCULATING PACKET ERROR CODE (CRC) FOR rev_pec=38042 and data_pec=38042
[0] projects/plutus/src/ltc_afe_fsm.c:79: READING FROM CELLS
[0] projects/plutus/src/ltc_afe_fsm.c:86: RUNNING CELL CALLBACK

Comparing to the Keysight U1282A multimeter, all voltages were within 1mV, except for the top 2 cells (the last 2 printed out), which the LTC6811 measured low by 1.3mV and 1.6mV respectively.
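
To compare the logged values against a multimeter they first need converting to volts. The sketch below assumes the LTC6811 cell ADC resolution of 100 uV per count (from the datasheet); the names LSB_VOLTS, RAW_COUNTS and counts_to_volts are made up for illustration and are not from the Plutus code.

```python
LSB_VOLTS = 100e-6  # assumed: 100 uV per ADC count, per the LTC6811 datasheet

# Raw values copied from the log above, cell index 0 to 11.
RAW_COUNTS = [40889, 40862, 40590, 40897, 41418, 41621,
              41517, 41479, 41443, 39617, 41053, 40681]


def counts_to_volts(raw):
    """Convert one raw cell reading to volts."""
    return raw * LSB_VOLTS


if __name__ == "__main__":
    for index, raw in enumerate(RAW_COUNTS):
        print(f"cell {index:2d}: {counts_to_volts(raw):.4f} V")
    total = sum(counts_to_volts(raw) for raw in RAW_COUNTS)
    print(f"stack total: {total:.4f} V")
```

With this scaling, the first logged value of 40889 corresponds to 4.0889 V.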

 

Overall, voltage reading is a success!

 

Temperature Readings

-I need to attach some thermistors first…

I put a variety of them on so that we can estimate temperature accuracy over the full range:

Thermistor Cell Input number   Bottom Resistor Value   Expected Voltage in V (10k upper resistor, 3V0 ref)   Temperature Result Expected in C
0                              10k                     1.5                                                   25
1                              3.3k                    0.74436                                               52
2                              10k                     1.5                                                   25
3                              4.7k                    0.959186                                              43
4                              10k                     1.5                                                   25
5                              5.1k                    1.013245                                              41
6                              10k                     1.5                                                   25
7                              6.8k                    1.214285                                              34
8                              10k                     1.5                                                   25
9                              10k pot                 0-1.5                                                 25+
10                             10k                     1.5                                                   25
11                             3.3k                    0.74436                                               52
12                             4.7k                    0.959186                                              43
13                             5.1k                    1.013245                                              41
14                             6.8k                    1.214285                                              34
15                             10k                     1.5                                                   25
16                             10k                     1.5                                                   25
17                             10k NTC                 1.5@25C                                               25
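
The expected voltages in the table come from a plain resistor divider: 10k on top, the listed resistor on the bottom, fed from the 3V0 reference. Here is a rough sketch of that calculation, plus a Beta-model conversion from resistance to temperature. The Beta value of 3950 K and the 10k-at-25C nominal are assumptions for illustration only (they are not stated on this page), so check them against the actual thermistor datasheet. All names are made up for this example.

```python
import math

V_REF = 3.0        # 3V0 reference, volts
R_TOP = 10_000.0   # upper divider resistor, ohms
R_25 = 10_000.0    # ASSUMED thermistor resistance at 25 C, ohms
BETA = 3950.0      # ASSUMED Beta constant in kelvin, not from this page
T_25_K = 298.15    # 25 C in kelvin


def divider_voltage(r_bottom):
    """Voltage across the bottom resistor of the divider."""
    return V_REF * r_bottom / (R_TOP + r_bottom)


def beta_temperature_c(r_bottom):
    """Temperature in C that the Beta model assigns to a given resistance."""
    inv_t = 1.0 / T_25_K + math.log(r_bottom / R_25) / BETA
    return 1.0 / inv_t - 273.15


if __name__ == "__main__":
    for r in (10_000.0, 6_800.0, 5_100.0, 4_700.0, 3_300.0):
        print(f"{r / 1000:4.1f}k: {divider_voltage(r):.6f} V, "
              f"{beta_temperature_c(r):.1f} C")
```

A 10k bottom resistor gives exactly half the reference (1.5 V) and, by definition of the model, 25 C.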


Updates July 9, 2020

There were a few bugs to work out in the code to print the values, and once printed they didn’t match what was expected. I probed the COMM lines between the 2 chips (pins 17, 18, 19), and there was nothing - just held at 5V by the pullup resistor for the 2s before the chip goes to sleep.
I also probed the output of the chip - pulled to GND for the whole time. (pin 43)

I also verified the outputs of the thermistors, and verified that they are at the correct voltages as listed in the table above.

@Gerald Aryeetey (Deactivated) and I found a few errors in the code that explained why this was not working.

  1. There was no PEC sent at the end of the WRCOMM data bits

  2. The data bits of the WRCOMM command seemed to be getting sent out in the wrong order - d0, icom0, fcom0 instead of icom0, d0, fcom0.
    I found this error with the logic analyzer.

In the screenshot above, I hard-coded it to send:
ICOM0: 8
D0: 5
FCOM0: 9
(which are the correct commands, I just wanted to remove 1 extra possible error) and it is sending
5, 8, 9 and then 4 padding bits (I think).
So, somehow the bitfields are messing up the order.

I had this note in the compiler when they were at uint8_t and changed them to uint16_t and then didn't get that error.

Apparently there were some changes to how bitfields work in GCC4.4: https://gcc.gnu.org/gcc-4.4/changes.html
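
One way to sidestep bitfield ordering altogether is to build the bytes with explicit shifts and masks, so the compiler has no say in the layout. Below is a minimal sketch of that idea in Python; the same shifts carry over directly to C. It assumes the COMM register layout from the LTC6811 datasheet (4-bit ICOM, then 8-bit D, then 4-bit FCOM, sent MSB first). The function name pack_comm_pair is made up and is not from the Plutus code.

```python
def pack_comm_pair(icom, data, fcom):
    """Pack one ICOM / D / FCOM triple into the two bytes sent on the wire.

    Assumed layout: first byte = ICOM[3:0] then D[7:4],
                    second byte = D[3:0] then FCOM[3:0].
    """
    if not (0 <= icom <= 0xF and 0 <= fcom <= 0xF and 0 <= data <= 0xFF):
        raise ValueError("field out of range")
    first = (icom << 4) | (data >> 4)
    second = ((data & 0x0F) << 4) | fcom
    return bytes([first, second])


if __name__ == "__main__":
    # Same hard-coded values as the test above: ICOM0 = 8, D0 = 5, FCOM0 = 9.
    packed = pack_comm_pair(0x8, 0x05, 0x9)
    print(" ".join(f"0x{b:02X}" for b in packed))
```

Because the order is spelled out in the shifts, the result does not depend on how any particular compiler version packs bitfields.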

I (@Micah Black) will hard-code some more stuff while @Gerald Aryeetey (Deactivated) looks into a fix for this.

 

Updates July 10, 2020

We fixed the PEC and the correct order for the COMM register data bits.

With those fixes, we got communication happening over the external SPI port to the MUX. Still not getting any usable data from it though (all ~2.1V for the measured channels - they should be around 1.5V or below according to the calculations above). Note that the measurements are just noise - they can easily be ‘adjusted’ by touching them or probing with a scope (the scope input impedance pulls the measurement down to almost 0V).

I really wish this board had better test points for all of the communication outputs - but that’s beside the point.

 

The mux datasheet mentions that the data should be clocked into the serial input MSB-first, and the LTC6811 datasheet mentions that all data transfers happen MSB first, so we should be good on that front.

 

I noticed this seemingly weird behavior of the 3V0_VREF2 pin measured at the input to the buffer. The device turns on, takes 12 temperature measurements, then goes back to sleep. It seems the 3V0_VREF2 is going to sleep between every AUX measurement. The period between the dips in 3V0 lines up with what I'm seeing on the logic analyzer for the read AUX commands, roughly a 13 ms period (note that I had slowed down the SPI communication to get a better visual of the external SPI port on the scope earlier, so the 13 ms period will not be what is in the car). @Liam Hawkins Is this expected behavior? Should the references be powering off? The sleep transition takes 2s, and the green LED for 5V0 indicating DRIVE is always on, so the IC seems to be going into the STANDBY state.

This is happening whether the thermistors are connected or not, so it should not be due to overloading the buffer or the regulator.

For next steps, I’ll try getting some wires soldered to the TQFP mux package and hopefully connect them to the logic analyzer or scope so we can see what is actually happening (I had verified communication was present measuring just 1 channel at a time, so I am unsure if timings are accurate).

Updates July 11, 2020

I hooked up the LTC external SPI interface (the one to the MUX) to the logic analyzer and verified that the signal is getting passed along correctly.

This screenshot should be correctly setting the bits A2 and A0 to enable the input 5 of the mux.

None of this makes sense…
So going back through the configuration and reading everything carefully:

ADG731: Data is shifted in to the register, MSB first, on the FALLING edge of the clock.
Checking this against the logic analyzer output, the analyzer was set up to sample on the RISING edge of the clock (CPHA=1). Swapping this to the correct CPHA=0 for sampling on the falling edge:
(This is the same spot in the transmission as the above diagram)

Notice that all the bits are shifted by 1, and there is a ‘1' that gets shifted into the first nEN bit of the ADG731’s input shift register. Thus, the mux is NEVER ENABLED. I think we found our problem.

Time to dig through the LTC6811 datasheet and see if the external SPI mode can be changed.
And I think we have a bigger problem than we realized here. The LTC6811 only supports SPI MODE 3:

Which is in conflict with what the ADG731 supports - either:

  • CPOL=1, CPHA=0 for SPI MODE 2

  • CPOL=0, CPHA=1 for SPI MODE 1
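
To keep the mode numbers straight, here is a small sketch that works out which clock edge data is sampled on for each CPOL/CPHA pair, and what happens to the mode if the clock is inverted between the two chips. It uses the usual SPI convention (mode = 2*CPOL + CPHA); the function names are made up for illustration.

```python
def spi_mode(cpol, cpha):
    """Conventional SPI mode number for a CPOL/CPHA pair."""
    return (cpol << 1) | cpha


def sample_edge(cpol, cpha):
    """Clock edge on which data is sampled.

    CPHA=0 samples on the leading edge and CPHA=1 on the trailing edge.
    The leading edge is rising when the clock idles low (CPOL=0) and
    falling when it idles high (CPOL=1).
    """
    return "rising" if cpol == cpha else "falling"


def invert_clock(cpol, cpha):
    """Inverting SCLK flips the idle level (CPOL) and leaves CPHA alone."""
    return 1 - cpol, cpha


if __name__ == "__main__":
    for cpol in (0, 1):
        for cpha in (0, 1):
            print(f"mode {spi_mode(cpol, cpha)}: CPOL={cpol} CPHA={cpha} "
                  f"samples on the {sample_edge(cpol, cpha)} edge")
    cpol, cpha = invert_clock(1, 1)  # start from mode 3
    print(f"mode 3 with inverted clock looks like mode {spi_mode(cpol, cpha)} "
          f"({sample_edge(cpol, cpha)} edge)")
```

Under this convention, mode 3 samples on the rising edge, while modes 1 and 2 both sample on the falling edge. Inverting the clock of a mode 3 master makes it look like mode 1 to the receiver.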

 

So at this point, we need to find a way to modify the interface, or we need to find a new MUX.

 

Options:

ADG732 - The ADG732 is a parallel controlled 32ch mux. Pins A0-A4 control the channel, and the WR pin must be toggled in order to latch the input data. This means that we need 6 pins to control it, and thus the device is not compatible.

Modify Interface - The LTC6811 outputs are Open-Drain, and the inputs on the ADG731 I believe are high impedance. If we invert the clock signal, I think we will be able to get sampling in the correct spot of the signal. SCLK is on pin 19 of the ADG731, and pin 33 of the LTC6811. The signal is pulled HIGH to 5V0 with R1.

I think I can cut the trace between PIN 19 and R1 and insert an N-FET and an extra pull-up resistor to achieve the signal inversion.

This is the circuit that would be put in between the pins to invert the clock:

I swapped in that circuit between the pins, using a DMN3023L for the external FET since there were some lying around from BMS carrier (these are the FETs used to power the contactors). Turns out the FETs are likely a little too powerful and/or the pullup isn’t strong enough, since the signal never fully recovers. The scope shot below is of the LTC GPIO pin acting as SCLK. You can see it properly disables the FET, but it takes 10+ µs to fully recover. This time is a function of the gate capacitance of the FET and the external pullup resistance.

Also visible in this graph is a cool effect that I’ve heard of before but never observed - The Miller Effect.

Basically, it is a plateau in the gate voltage as the FET turns on. At the start of the curve, the gate voltage rises in a waveform that would be expected from a simple RC circuit, and it rises like this until the FET starts to turn on. Once the FET starts conducting, the drain voltage starts to change, and the parasitic Gate to Drain capacitance has to be charged through that whole voltage swing. While that is happening, nearly all of the current coming in through the pullup goes into the Gate to Drain capacitance instead of raising the gate voltage, which gives the plateau observed in the FET switching waveforms. The capacitance itself also changes in value with the voltage across it, which changes the shape of the plateau. Once the drain voltage has finished moving, the gate voltage goes back to rising along the RC curve.
More info on the Miller Effect as well as FET behaviour is in these Application Notes (which I think are an amazing read) (Vgp - Voltage Gate Plateau - in the figure is the Miller Plateau)

So, we know a power FET is not the right part for the job - a small signal FET or BJT should work much better. I have some 2N3904 NPN BJTs lying around which I think will do a bit of a better job.

I swapped in the 2N3904 BJT and I got some data!!!!

Thermistor on Cell #   Expected mV   Reading mV   Error mV (Reading - Expected)
0                      1500          1502.8       +2.8
1                      744.36        742.8        -1.56
2                      1500          1501.3       +1.3
3                      959.186       958.7        -0.486
4                      1500          1504.5       +4.5
5                      1013.245      1013.2       -0.045

And it's very accurate!

Here’s the SCLK signal with the 2N3904 in place:
You can see it rises up to just under 5V between pulses, which is well above the 2.4V logic high threshold of the ADG731:

YAY! It all works.

This is the circuit to implement the clock inversion:

Here's a picture of the test setup with the modifications:

And for good measure, here’s a picture of the logic analyzer output with the modifications:

 

@James Jo We need to implement this circuit on AFE rev 2.

 

Now we just have to figure out why we are only getting 6 readings from the LTC6811 IC.

@Gerald Aryeetey (Deactivated) This is our next task.