Network hangs during MQTT test

Hello, I'm running a test of the MQTT Client interface on a FreeRTOS/lwIP platform, using the IAR debugger, on a device connected to the local LAN. The lwIP version is 2.1.2. The problem is that after anywhere from 1 to 15 hours of testing, the network hangs (no ping replies, and apparently no interrupts). When I break in with the debugger, the stack (Tasks List) appears corrupted, as in the picture. Apart from the network, all other tasks keep working correctly, and no exception or ASSERT appears to have fired.

The test publishes 2 values (SysTime and TExt) every 2 seconds to a public broker (cloudmqtt.com) and subscribes to 1 topic ('LED'). The mqtt_opts settings are left at their defaults. The problem is more likely to happen when MQTT tracing is disabled, and especially (but not only) upon receiving the subscribed message.

These are the last lines of output I got before the network hung:

    tcp_sent_cb: Calling QoS 0 publish complete callback
    STACK WaterMark =729
    Publish: ICON65646464/SysTime=5565373
    mqtt_publish: Publish with payload length 7 to topic "ICON65646464/SysTime"
    mqtt_output_send: tcp_sndbuf: 8192 bytes, ringbuf_linear_available: 11, get 245, put 21
    Publish: ICON65646464/TExt=23.4
    mqtt_publish: Publish with payload length 4 to topic "ICON65646464/TExt"
    tcp_sent_cb: Calling QoS 0 publish complete callback
    MQTT: LED ON
    STACK WaterMark =729
    Publish: ICON65646464/SysTime=5565929
    mqtt_publish: Publish with payload length 7 to topic "ICON65646464/SysTime"
    mqtt_output_send: tcp_sndbuf: 8192 bytes, ringbuf_linear_available: 32, get 48, put 80
    Publish: ICON65646464/TExt=22.9
    mqtt_publish: Publish with payload length 4 to topic "ICON65646464/TExt"

[Image: PCB state when this problem happens]

Hmm, there seem to be several places that could cause this type of corruption, but I suggest starting with the network driver. Which chip are you running on, and where did you get lwIP and the lwIP port from?

Thanks Richard. Checking the sources, I found that the port belonged to an older SDK, so I updated to SDK 2.6.0 (with lwIP 2.1.2). The network controller is internal to the NXP K65 microcontroller. The driver in use is:

    ; @file: startup_MK65F18.s
    ; @purpose: CMSIS Cortex-M4 Core Device Startup File
    ; MK65F18
    ; @version: 3.0
    ; @date: 2015-3-25
    ; @build: b151210

After the upgrade I have never again seen the Task Stack corrupted in the debugger as in the first image, but the problem shown in the second one still happens at random intervals: sometimes it resets the controller (without an exception), sometimes it just hangs the network. For example, in mqtt_output_send():

    ...
    err = altcp_write(tpcb, mqtt_ringbuf_get_ptr(rb), send_len, TCP_WRITE_FLAG_COPY | (wrap ? TCP_WRITE_FLAG_MORE : 0));
    ...
    if (err == ERR_OK) {
      mqtt_ringbuf_advance_get_idx(rb, send_len);
      /* Flush */
      altcp_output(tpcb);
    }

altcp_output() gets stuck in this loop:

    /* useg should point to last segment on unacked queue */
    useg = pcb->unacked;
    if (useg != NULL) {
      for (; useg->next != NULL; useg = useg->next);
    }

apparently because the PCB is corrupted as shown. If so, I also wonder why the previous call to altcp_write() passes the check in tcp_write_checks() without triggering:

    LWIP_ASSERT("tcp_write: no pbufs on queue => both queues empty", pcb->unacked == NULL && pcb->unsent == NULL);

If you have any ideas on where I could look, or need more info, let me know. Bye

Maybe it is not the definitive solution, and I'm asking you to recheck it, but it seems to work better this way. Note: the segment pointed to by **useg** was used in comparisons despite a possible (and in my case verified) NULL value.

~~~
err_t tcp_output(struct tcp_pcb *pcb)
{
  …
    /* put segment on unacknowledged list if length > 0 */
    if (TCP_TCPLEN(seg) > 0) {
      seg->next = NULL;
      /* unacked list is empty? */
      if (pcb->unacked == NULL) {
        pcb->unacked = seg;
        useg = seg;
        /* unacked list is not empty? */
      }
      //else
      else if (useg) //(-IABR Jun 29, 2019 useg CAN be NULL )
      {
        /* In the case of fast retransmit, the packet should not go to the tail
         * of the unacked queue, but rather somewhere before it.
         * We need to check for this case. -STJ Jul 27, 2004 */
        if (TCP_SEQ_LT(lwip_ntohl(seg->tcphdr->seqno), lwip_ntohl(useg->tcphdr->seqno))) {
          /* add segment to before tail of unacked list, keeping the list sorted */
          struct tcp_seg **cur_seg = &(pcb->unacked);
          while (*cur_seg &&
                 TCP_SEQ_LT(lwip_ntohl((*cur_seg)->tcphdr->seqno), lwip_ntohl(seg->tcphdr->seqno))) {
            cur_seg = &((*cur_seg)->next );
          }
          seg->next = (*cur_seg);
          (*cur_seg) = seg;
        } else {
          /* add segment to tail of unacked list */
          useg->next = seg;
          useg = useg->next;
        }
      }
      /* do not queue empty segments on the unacked list */
    }
  …
}
~~~

Sorry, I'm not too familiar with the lwIP code these days (we have had our own TCP/IP stack for some years now), so I'm not sure what I'm looking at here.