Error analysis of medical devices

For a short-term project, the task was to identify and correct a sporadic fault that caused a medical device to crash. The manufacturer’s development department had been unable to locate the error because the documentation was incomplete and some components had been procured from external companies that no longer exist. The project involved drivers written in C, sequence code in C++, and a GUI developed in C#. The setup comprised multiple networked devices, which also communicated with a Windows laptop. Because the investigation was extensive, only a brief summary of selected work follows.

Brief summary

  1. Migration of software applications from Visual Studio 2008 to Visual Studio 2015.
  2. Elimination of various compiler warnings found in different software components.
  3. Addition of log messages to the software application for better diagnostics.
  4. Localization of the fault with the help of an existing flowchart, which narrowed it down to the initialization phase of a device.
Flowchart
  • USB driver analysis, including a search for blocking calls and potential deadlocks.
  • Analysis of the frames sent over the interface at the byte level, including frame decoding.
  • Firmware analysis of the medical devices.
  • Development of an application for automatic brute forcing of the initialization phase and for generating log and statistic messages.
Try to make sense of the log messages
Frame decoding
Captured timeouts
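The idea behind the automated test application, repeating the initialization many times and collecting statistics, can be sketched roughly as follows. This is only a minimal illustration and not the client that was actually developed: the names `init_outcome`, `init_fn`, `init_stats`, `run_init_trials`, and `fake_attempt` are hypothetical, and the stand-in device simply fails on every fourth attempt.

```c
#include <stdio.h>

/* Possible outcomes of one initialization attempt (hypothetical names). */
typedef enum {
    INIT_OK = 0,
    INIT_FRAME_TIMEOUT,
    INIT_PORT_CLOSED,
    INIT_OUTCOME_COUNT
} init_outcome;

/* Callback that performs one initialization attempt against the device. */
typedef init_outcome (*init_fn)(void *ctx);

/* Tally of all attempts, one counter per outcome. */
typedef struct {
    unsigned runs;
    unsigned count[INIT_OUTCOME_COUNT];
} init_stats;

/* Run the initialization `runs` times and count each outcome. */
init_stats run_init_trials(init_fn attempt, void *ctx, unsigned runs)
{
    init_stats stats = {0};
    for (unsigned i = 0; i < runs; ++i) {
        init_outcome r = attempt(ctx);
        if ((unsigned)r < INIT_OUTCOME_COUNT)
            stats.count[r]++;
        stats.runs++;
    }
    return stats;
}

/* Print one summary line per outcome. */
void print_init_stats(const init_stats *s)
{
    static const char *const names[INIT_OUTCOME_COUNT] = {
        "ok", "frame timeout", "port closed"
    };
    for (int k = 0; k < INIT_OUTCOME_COUNT; ++k)
        printf("%-14s %5u / %u\n", names[k], s->count[k], s->runs);
}

/* Stand-in for a real device: every 4th attempt reports a frame timeout. */
init_outcome fake_attempt(void *ctx)
{
    unsigned *calls = (unsigned *)ctx;
    ++*calls;
    return (*calls % 4 == 0) ? INIT_FRAME_TIMEOUT : INIT_OK;
}
```

In a real setup the callback would open the port, run the device initialization, and map the result to one of the outcomes. Running the same harness on several machines then makes the different error frequencies directly comparable.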
An excerpt from the results report, heavily abbreviated, reads:

According to the code analysis (usb_win.c) of the USB schedule function, the COM port is closed after a timeout. When the COM port is open, i.e., a connection has been established at startup, the port is in the “COM_CHECK” state. In this state, the counter/timer is set (“iTimo=10”) and the state changes to “COM_WAIT”. In the “COM_WAIT” state, the program waits for responses on the COM port, in this case responses from the hardware, and decrements the counter/timer “iTimo”. According to the code comments, a response is expected within 500 ms.

Several hundred automated initialization runs were performed using the extended test client I developed. In the resulting log files, various frame timeouts and COM connection terminations could be identified. These errors were observed on several different systems, but with different frequencies of occurrence.
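The behaviour described in the excerpt can be pictured as a small state machine. The following is a simplified sketch, not the code from usb_win.c: only the state names “COM_CHECK” and “COM_WAIT” and the counter “iTimo” come from the report, while `COM_CLOSED`, `usb_port`, `usb_schedule_step`, and the return to `COM_CHECK` after a response are assumptions made for illustration.

```c
#include <stdbool.h>

/* COM_CHECK and COM_WAIT are named in the report; COM_CLOSED is assumed. */
typedef enum { COM_CLOSED, COM_CHECK, COM_WAIT } com_state;

typedef struct {
    com_state state;
    int iTimo;   /* remaining scheduler calls before the timeout fires */
} usb_port;

/* One scheduler call. `response_ready` tells whether the device answered. */
com_state usb_schedule_step(usb_port *p, bool response_ready)
{
    switch (p->state) {
    case COM_CHECK:
        p->iTimo = 10;              /* arm the counter/timer */
        p->state = COM_WAIT;
        break;
    case COM_WAIT:
        if (response_ready) {
            p->state = COM_CHECK;   /* assumption: start the next cycle */
        } else if (--p->iTimo <= 0) {
            p->state = COM_CLOSED;  /* timeout: the COM port is closed */
        }
        break;
    case COM_CLOSED:
        break;                      /* stays closed until reopened */
    }
    return p->state;
}
```

The point of the sketch is that the timeout is counted in scheduler calls, not in milliseconds. How long the port really waits therefore depends entirely on how often the scheduler invokes the task.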
Errors during initialization phase
It was found that the more powerful a system is, the fewer errors occur. The “schedule_USB” task was called every 1 ms by the scheduler. Through its internal counter “iTimo”, the task therefore waited only 10 ms in total (10 calls × 1 ms) for a response from the medical device. Depending on the system, however, the response sometimes took longer than 10 ms, which led to frame timeouts, COM port timeouts, and COM port connection terminations. The call interval of the task was first increased to 50 ms and later to 150 ms, so that a total of 1500 ms (10 calls × 150 ms) must now pass before a timeout occurs. This adjustment led to a significant improvement on all systems: no more initialization errors were detected during the automated initialization tests, and the log files no longer showed any frame timeouts or COM connection terminations. In addition, the initialization process as a whole now performs significantly better.
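The arithmetic behind the fix is simply counter value times call interval. The helper functions below are hypothetical and idealized (they ignore scheduler jitter), and the 25 ms response time in the usage note is an invented example value, not a measurement.

```c
/* Total waiting time: number of counted calls times the call interval. */
unsigned timeout_budget_ms(unsigned ticks, unsigned interval_ms)
{
    return ticks * interval_ms;
}

/* Non-zero if a response arriving after `response_ms` still fits the budget. */
int response_in_time(unsigned response_ms, unsigned ticks, unsigned interval_ms)
{
    return response_ms <= timeout_budget_ms(ticks, interval_ms);
}
```

With “iTimo=10”, a 1 ms interval gives a budget of 10 ms, 50 ms gives 500 ms, and 150 ms gives 1500 ms. A device that answers after 25 ms, for instance, would miss the original budget but fit comfortably into either of the later ones. The 500 ms value also coincides with the response time mentioned in the code comments, which may hint at the interval the counter was originally designed for.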
No errors after fix.