Potential risk warning for CiA 443 sub-sea systems supporting a bit rate change via Layer Setting Services
Although ESAcademy is not an active member of the CiA 443 group, we have several customers and business partners who use CiA 443, and we came across a potential reliability issue with regard to bit rate changes.
It is our understanding that the reliability requirements for CiA 443 sub-sea applications are very high. Bootloaders are written and tested so that even a power failure at any time or severe communication errors cannot break the system. In the worst case, an application is not programmed, the device remains in bootloader mode, and it is simply re-programmed.
However, allowing the CAN bit rate to change with the currently specified mechanisms bears the risk of one or multiple devices failing. If the devices in a CAN network are not all configured to use the same bit rate, communication fails at a very low level. Devices will recognize that there are errors on the bus and potentially take themselves offline (bus off). As long as the devices remain configured to use these different bit rates, this error state cannot be resolved.
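To give a feel for how quickly a device can take itself offline, here is a minimal sketch of the transmit side of CAN fault confinement. It is a simplification under stated assumptions: it only models the transmit error counter (TEC) rising by 8 per failed transmission, with the usual thresholds for error passive and bus off, and it ignores successful transmissions and the special cases of the real protocol. The function name is hypothetical.

```python
# Simplified model of CAN fault confinement, transmit side only.
# Assumption: every transmission attempt fails, as expected when the
# other devices on the bus run at a different bit rate.
TEC_STEP_ON_ERROR = 8      # TEC increase per failed transmission
ERROR_PASSIVE_LIMIT = 128  # TEC of 128 or more: error passive
BUS_OFF_LIMIT = 256        # TEC of 256 or more: bus off


def state_after_failed_transmissions(failures: int) -> str:
    """Return the fault confinement state after consecutive failures."""
    tec = 0
    for _ in range(failures):
        tec += TEC_STEP_ON_ERROR
        if tec >= BUS_OFF_LIMIT:
            return "bus off"
    if tec >= ERROR_PASSIVE_LIMIT:
        return "error passive"
    return "error active"


if __name__ == "__main__":
    for n in (15, 16, 32):
        print(n, state_after_failed_transmissions(n))
```

In this simplified model a device that keeps transmitting into a mismatched network reaches bus off after 32 consecutive failed attempts.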
How could such a situation occur?
The Layer Setting Services (LSS, see CiA 305) allow the bit rate to be set up, provided all devices connected to the network support these services. Although the timing of the actual bit rate switch is very well specified and synchronized, the storing of this information (nodes copying it to their local non-volatile memory) is not. It happens “one-by-one” and, as no timings are specified, this could take seconds or even minutes. If there are severe bus communication errors or even a power failure during this time, then not all devices will have the same bit rate configured.
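The failure window described above can be illustrated with a small simulation. This is only a sketch, not an LSS implementation: the `Node` class, the function names and the bit rate values are hypothetical, and a "power failure" is modelled simply as the store loop stopping after a given number of nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    stored_bitrate_kbps: int  # value held in non-volatile memory


def store_one_by_one(nodes, new_bitrate_kbps, power_fails_after):
    """Store the new bit rate node by node.

    Power is lost once `power_fails_after` nodes have completed their
    store. Returns True only if every node stored the new value.
    """
    for count, node in enumerate(nodes):
        if count == power_fails_after:
            return False  # interrupted: remaining nodes keep the old value
        node.stored_bitrate_kbps = new_bitrate_kbps
    return True


def rates_after_power_up(nodes):
    """Set of bit rates the nodes will start with after the next power-up."""
    return {node.stored_bitrate_kbps for node in nodes}


if __name__ == "__main__":
    network = [Node(node_id, 125) for node_id in range(1, 5)]
    completed = store_one_by_one(network, 500, power_fails_after=2)
    print(completed, rates_after_power_up(network))
```

If the loop is interrupted after two of four nodes, the network comes back up with two different stored bit rates, which is exactly the unrecoverable mismatch.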
Possible solutions:
1.) Do not use LSS to switch CAN bit rates; only use it for node ID assignment.
2.) Use a power-on default bit rate. Changes to the bit rate are not stored in non-volatile memory; they are only temporary. With each reset or power cycle, all devices fall back to their initial default bit rate.
3.) Use automatic bit rate detection. Note: this only works if not all nodes rely on it; at least one node must be communicating for the others to be able to detect the bit rate. This feature is not available with all CAN controllers (it requires a passive listen-only mode).
4.) Check with the CiA 305 group what else can be done to make the bit rate switch safer, for example by synchronizing not only the time of the physical switch, but also the time when this information is stored into non-volatile memory.
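Option 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not device firmware: the `Device` class, its method names and the default of 125 kbit/s are all hypothetical. The point it shows is that the active bit rate lives only in volatile state, so a reset always restores a common default.

```python
DEFAULT_BITRATE_KBPS = 125  # hypothetical power-on default


class Device:
    """Device whose bit rate changes are temporary (never persisted)."""

    def __init__(self):
        self.active_bitrate_kbps = DEFAULT_BITRATE_KBPS

    def activate_bitrate(self, kbps):
        # Held in RAM only; nothing is written to non-volatile memory.
        self.active_bitrate_kbps = kbps

    def reset(self):
        # A reset or power cycle always falls back to the default.
        self.active_bitrate_kbps = DEFAULT_BITRATE_KBPS


if __name__ == "__main__":
    devices = [Device() for _ in range(4)]
    # An interrupted change: only the first two devices switch.
    for device in devices[:2]:
        device.activate_bitrate(500)
    for device in devices:
        device.reset()
    print({device.active_bitrate_kbps for device in devices})
```

Even when the change reaches only some devices, a reset of the whole network brings every device back to the same bit rate.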
Until this is solved, we recommend that existing systems do not use bit rate switching by LSS.