lwip crashing, apparently in sys_check_timeouts

Keith Rubow
I am using lwIP 2.0.3 with NO_SYS = 1 on a bare-metal 386EX processor
with a Wiznet IIM7010A Ethernet interface operating in MAC RAW mode.
Yes, this hardware is REALLY old, but we have lots of it out in the
field and need to fix a problem, and lwIP seems like the best way to
fix it.

My board has a hardware watchdog timer that resets the processor if it
is not refreshed at least every 5 seconds. Since implementing lwIP I
have been experiencing extremely infrequent system crashes. My debugging
indicates that I call sys_check_timeouts, and then the watchdog timer
times out, resetting the system. Normally the watchdog timer is
refreshed many times per second in the main program loop. It appears
that sys_check_timeouts is taking at least 5 seconds to execute.

This crash (or watchdog timer reset) happens very infrequently. The
system will run flawlessly for 2-3 days before crashing. Unfortunately I
do not have source-level debugging capability on this old hardware. It
is possible that a memory exception is vectoring me to a default
exception handler that simply hangs in a loop until the watchdog timer
resets the system (but proper code should never cause a memory
exception). Or sys_check_timeouts could simply be taking too long to
execute. Can sys_check_timeouts ever take more than 5 seconds to
execute?

My IIM7010A Ethernet interface is operated in polled mode. My main
loop operates as follows:
for (;;) {
    if (received ethernet frame available in wiznet) {
        // read and process the ethernet frame for this interface
        ethernetif_input(&wiznetif);
    }
    sys_check_timeouts();
    if (I have data to send) {
        size = size of my data to send;    // size is never more than 128 bytes
        // if there is enough free space to send it
        if (tcp_sndbuf(userinfo.pcb) >= size) {
            // if the write is successful
            if (tcp_write(userinfo.pcb, mybuffer, size, TCP_WRITE_FLAG_COPY) == ERR_OK) {
                tcp_output(userinfo.pcb);    // initiate sending of data
            }
        }
    }
    // none of it is very time consuming, should take only a few tens of milliseconds
    do other stuff not related to lwip or tcp/ip
    reset_watchdog_timer();
}

I use IPv4 with a fixed IP address. I have a single TCP/IP connection up,
use a callback function for received data, and everything is working
perfectly except for the occasional crashes. My debugging ability is
limited to writing values to debug variables in a region of memory that
is preserved across reboots. My debug variables always show that the
last thing I did before the restart was call sys_check_timeouts, and
that I never returned from it. Any ideas?
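
For illustration only, here is a minimal sketch of a reboot-preserved breadcrumb of this kind. It also records the longest completed call, which might help answer the "can it take more than 5 seconds" question. The struct layout, the names (call_trace, trace_boot, trace_enter, trace_leave) and the magic value are all hypothetical. It assumes the record sits in the memory region that survives a reset and that a millisecond clock such as lwIP's sys_now() is available.

```c
#include <stdint.h>

#define TRACE_MAGIC 0x54524143UL   /* arbitrary marker: record is initialised */

/* Hypothetical record kept in the RAM region that survives a reset. */
struct call_trace {
    uint32_t magic;          /* TRACE_MAGIC once initialised */
    uint32_t in_call;        /* 1 while the traced call is running */
    uint32_t entered_ms;     /* clock value when the last call started */
    uint32_t max_ms;         /* longest completed call seen so far */
    uint32_t resets_inside;  /* boots that found in_call still set */
};

/* Call once at boot, before the main loop. */
void trace_boot(struct call_trace *t)
{
    if (t->magic != TRACE_MAGIC) {
        /* First power-up (or corrupted record): start from scratch. */
        t->magic = TRACE_MAGIC;
        t->in_call = 0;
        t->entered_ms = 0;
        t->max_ms = 0;
        t->resets_inside = 0;
    } else if (t->in_call) {
        /* The previous run was reset while the traced call was active. */
        t->resets_inside++;
        t->in_call = 0;
    }
}

/* Call immediately before the traced function. */
void trace_enter(struct call_trace *t, uint32_t now_ms)
{
    t->entered_ms = now_ms;
    t->in_call = 1;
}

/* Call immediately after the traced function returns. */
void trace_leave(struct call_trace *t, uint32_t now_ms)
{
    uint32_t elapsed = now_ms - t->entered_ms;  /* unsigned: wrap-safe */
    if (elapsed > t->max_ms) {
        t->max_ms = elapsed;
    }
    t->in_call = 0;
}
```

In the main loop this would bracket the call: trace_enter(&trace, sys_now()); sys_check_timeouts(); trace_leave(&trace, sys_now());. A max_ms that creeps upward over days would point at slow execution, while a reset with a small max_ms would point more toward a hang or an exception.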

Keith



_______________________________________________
lwip-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-users

Re: lwip crashing, apparently in sys_check_timeouts

Stephen Cowell
When presented with problems like this I start by replacing all the
default handlers with handlers that log a hit. Since you have NVR that
you can read and write, this should be easy... I use my User Page Flash
for the same purpose. This will help you divide and conquer.
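
A rough sketch of what such hit-logging could look like, under stated assumptions: the log lives in the reset-preserved memory, CPU exceptions arrive on vectors 0..31 (the usual x86 convention, where for example 6 is invalid opcode and 13 is general protection), and each default handler stub calls crash_log_hit() with its own vector number before spinning until the watchdog fires. The names crash_log, crash_log_boot and crash_log_hit are made up for this example.

```c
#include <stdint.h>

#define CRASH_MAGIC  0x43525348UL  /* arbitrary marker: log is initialised */
#define NUM_VECTORS  32            /* assumes CPU exceptions use vectors 0..31 */
#define NO_VECTOR    0xFFu         /* "no handler has been hit yet" */

/* Hypothetical log kept in memory that survives a watchdog reset. */
struct crash_log {
    uint32_t magic;
    uint8_t  last_vector;          /* most recent vector that was hit */
    uint16_t hits[NUM_VECTORS];    /* per-vector hit counters */
};

/* Call once at boot. Keeps the previous run's evidence if it is valid. */
void crash_log_boot(struct crash_log *rec)
{
    unsigned i;
    if (rec->magic == CRASH_MAGIC) {
        return;
    }
    rec->magic = CRASH_MAGIC;
    rec->last_vector = NO_VECTOR;
    for (i = 0; i < NUM_VECTORS; i++) {
        rec->hits[i] = 0;
    }
}

/* Call from each default handler with that handler's vector number. */
void crash_log_hit(struct crash_log *rec, uint8_t vector)
{
    rec->last_vector = vector;
    if (vector < NUM_VECTORS && rec->hits[vector] < 0xFFFFu) {
        rec->hits[vector]++;       /* saturate instead of wrapping */
    }
}
```

After a reset, a last_vector other than NO_VECTOR would suggest the system reached an exception handler rather than simply running long inside sys_check_timeouts.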
__
Steve

On 8/20/2018 1:13 PM, Keith Rubow wrote:

> [...]


Re: lwip crashing, apparently in sys_check_timeouts

Sergio R. Caprile
In reply to this post by Keith Rubow
You can just check the code yourself, but no: sys_check_timeouts()
should return ASAP, since it is meant to be called as frequently as
possible (AFAP?) and lives in the main loop of many systems.
However, from time to time something needs to be done, for example
resending a TCP segment, so it will do some work. It does call a system
function you provide, though, to get the current time in ms (sys_now()).
You should first check for blocking there, and then perhaps try to
discover the whole scenario to get a clue as to where the dragon is
lurking. Did you happen to log the traffic so you can correlate crashes
with (for example) retransmissions?
Just rambling.
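
As a sketch of what "no blocking in the time function" could look like, the code below reads a millisecond counter without waiting on anything and adds a small check for the clock stepping backwards. The assumptions: a timer interrupt increments a counter (ms_ticks is a hypothetical name), and the compiler may load a 32-bit value as two 16-bit halves, so a single read can be torn by that interrupt. Whether either applies to this particular port is unknown. It only illustrates two things worth ruling out, since lwIP's timeout handling depends on this clock being consistent.

```c
#include <stdint.h>

/* Hypothetical millisecond counter, incremented by a timer interrupt. */
volatile uint32_t ms_ticks;

/*
 * Read the counter twice until both reads agree, so a value torn by the
 * timer interrupt (possible if the load is done in two halves) is never
 * returned. Normally exits after one or two passes.
 */
uint32_t read_ticks(const volatile uint32_t *counter)
{
    uint32_t a, b;
    do {
        a = *counter;
        b = *counter;
    } while (a != b);
    return a;
}

/* Candidate body for lwIP's sys_now(): no I/O, no waiting on hardware. */
uint32_t now_ms(void)
{
    return read_ticks(&ms_ticks);
}

/*
 * Nonzero if 'now' lies before 'prev'. The unsigned difference treats a
 * 32-bit counter wraparound as moving forward, not backward.
 */
int time_went_backwards(uint32_t prev, uint32_t now)
{
    return (uint32_t)(now - prev) > 0x7FFFFFFFUL;
}
```

Logging time_went_backwards(previous, current) into the preserved debug memory on every loop pass would show whether the clock ever misbehaves shortly before a reset.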
