lwip hangs after >12 hours of work


Artem Moroz
Hi, All!

I have an STM32F7 board with lwIP 2.0.3 running a PPPoS connection with a SIMCOM
modem. My lwipopts.h file is attached:
<http://lwip.100.n7.nabble.com/file/t2270/lwipopts.h>
After a long period of time, when the modem fails to transmit data, the PPPoS
connection seems to be "hung". It does nothing. No retries. I've tried to
simulate the behavior by dropping the connection based on a counter variable,
but in that case it does recover.

I have the following output function, which prints the error "MDM tx timeout end"
once, then "MDM tx send failed" two times, and after that is no longer invoked.

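static u32_t ppp1_output_cb(ppp_pcb *pcb, u8_t *data, u32_t len, void *ctx)
{
        //printf("ppp1_o len %d begin\n", len);
        osEvent evt={0};

        /* Drain any messages already waiting in the TX queue. */
        do
        {
                evt = osMessageGet(msgLwipTxUartHandle, 0);

        } while(evt.status == osEventMessage);

        /* Modem not ready: the data is dropped but reported as sent. */
        if (!g_bModemNetworkInited)
        {
                printf("MDM tx not inited end\r\n");
                return len;
        }

        /* A previous send timed out: the data is dropped but reported as sent. */
        if (g_bModemSendFailed)
        {
                printf("MDM tx send failed\r\n");
                return len;
        }

        /* Start the DMA transfer while holding the modem semaphore. */
        osSemaphoreWait(semModemHandle, osWaitForever);
        HAL_UART_Transmit_DMA(g_pUartModem, data, len);
        osSemaphoreRelease(semModemHandle);

// static int nSent = 0;
//
// if (nSent < 200)
// {
// nSent++;
        /* Wait up to 500 ms for a message on the TX queue. */
        evt = osMessageGet(msgLwipTxUartHandle, 500);
        if (evt.status == osEventMessage)
        {
                //printf("MDM tx len %lu\n", len);
                return len;
        }
// }
// else
// {
// osDelay(500);
//
// }

        /* No message arrived in time: latch the failure flag. Nothing in this
           function clears it, so every later call returns early above. */
        g_bModemSendFailed = 1;
        printf("MDM tx timeout end\r\n");
        //pppapi_close(pcb, 1);

        return len;
}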

What may be the problem? How can I trace it down?





--
Sent from: http://lwip.100.n7.nabble.com/lwip-users-f3.html

_______________________________________________
lwip-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-users

Re: lwip hangs after >12 hours of work

Patrick Klos-2
I know it's a long shot, but I'm gonna go off on a tangent and ask: how does your system measure elapsed time?

Bottom line: check that your timers aren't rolling over.  If you happen to have a 100 kHz clock and use it to time things with 32-bit unsigned integers, the counter will roll over in about 12 hours (11.93 hours, to be more precise).  I've seen such odd timing-related issues several times in the last few months, so I thought I'd mention it.
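
The arithmetic behind that figure, and the usual wrap-safe way to compare tick counts, can be sketched roughly as follows. This is only an illustration: the helper names (`rollover_hours`, `ticks_elapsed`, `timeout_expired`, `timeout_expired_naive`) are made up here and are not part of lwIP or any HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hours until a free-running 32-bit counter wraps at the given tick rate. */
static double rollover_hours(double tick_hz)
{
    return 4294967296.0 / tick_hz / 3600.0;   /* 2^32 ticks */
}

/* Elapsed ticks between two samples. Unsigned subtraction is modulo 2^32,
 * so the result stays correct across a single wrap of the counter. */
static uint32_t ticks_elapsed(uint32_t now, uint32_t start)
{
    return (uint32_t)(now - start);
}

/* Wrap-safe timeout check: compare elapsed time, not absolute deadlines. */
static bool timeout_expired(uint32_t now, uint32_t start, uint32_t timeout)
{
    return ticks_elapsed(now, start) >= timeout;
}

/* Fragile variant, shown for contrast: the computed deadline can wrap,
 * which makes the comparison misfire near the rollover point. */
static bool timeout_expired_naive(uint32_t now, uint32_t start, uint32_t timeout)
{
    return now >= (uint32_t)(start + timeout);
}
```

For a 100 kHz tick, `rollover_hours(100000.0)` comes out at roughly 11.93. With `start = 0xFFFFFF00` and a timeout of 500 ticks, the naive check already reports expiry 16 ticks later, while the subtraction-based check does not.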

If that doesn't shed any light, a few follow-up questions:
  1. Is it always the same amount of time (give or take a little)? (If yes, answer #2.)
  2. If you start the system but don't start the connection for 30 minutes, does the failure occur after the same amount of system uptime or the same amount of connection time?
  3. How many characters have been transmitted and received during that timeframe?
Good luck hunting!
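
One possible way to gather the numbers asked for in questions 2 and 3 is a small diagnostic record that is updated from the link-up handler and the UART paths, then printed when the failure is detected. This is a minimal sketch under assumptions: `link_stats_t` and the `stats_*` functions are hypothetical names, and the uptime value is assumed to come from a millisecond tick counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical diagnostic record; none of these names come from lwIP. */
typedef struct {
    uint32_t link_up_ms;   /* uptime (ms) when the PPP link came up */
    uint32_t tx_bytes;     /* bytes handed to the UART since link-up */
    uint32_t rx_bytes;     /* bytes received from the UART since link-up */
} link_stats_t;

/* Call when the link comes up: remember the time and reset the counters. */
static void stats_link_up(link_stats_t *s, uint32_t uptime_ms)
{
    s->link_up_ms = uptime_ms;
    s->tx_bytes = 0;
    s->rx_bytes = 0;
}

static void stats_tx(link_stats_t *s, uint32_t len) { s->tx_bytes += len; }
static void stats_rx(link_stats_t *s, uint32_t len) { s->rx_bytes += len; }

/* Format a one-line report: system uptime, connection time, byte counts.
 * Returns the value returned by snprintf. */
static int stats_report(const link_stats_t *s, uint32_t uptime_ms,
                        char *buf, size_t n)
{
    return snprintf(buf, n, "uptime=%lus link=%lus tx=%lu rx=%lu",
                    (unsigned long)(uptime_ms / 1000u),
                    (unsigned long)((uint32_t)(uptime_ms - s->link_up_ms) / 1000u),
                    (unsigned long)s->tx_bytes,
                    (unsigned long)s->rx_bytes);
}
```

Comparing the report from several failures shows whether the hang tracks system uptime, connection time, or the amount of data moved.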

Patrick Klos
Klos Technologies, Inc.

On 3/25/2020 7:06 PM, Artem Moroz wrote:
Hi, All!

I have an STM32F7 board with lwIP 2.0.3 running a PPPoS connection with a SIMCOM
modem. My lwipopts.h file is attached:
<http://lwip.100.n7.nabble.com/file/t2270/lwipopts.h>
After a long period of time, when the modem fails to transmit data, the PPPoS
connection seems to be "hung". It does nothing. No retries. I've tried to
simulate the behavior by dropping the connection based on a counter variable,
but in that case it does recover.


Re: lwip hangs after >12 hours of work

Trampas Stern
Along the same lines: does a reboot clear the problem?  If so, check your timers.  If not, consider hardware overheating.

On Wed, Mar 25, 2020 at 7:52 PM Patrick Klos <[hidden email]> wrote:
I know it's a long shot, but I'm gonna go off on a tangent and ask how your system does elapsed timing?


Re: lwip hangs after >12 hours of work

Artem Moroz
In reply to this post by Patrick Klos-2
I have increased the data download size and now it hangs much more
frequently, after approximately 10 minutes. I am investigating.




Re: lwip hangs after >12 hours of work

Trampas Stern
Sounds like a possible stack overflow. 
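
One common way to test the stack overflow theory is to paint the stack with a known pattern and later measure how much of it is still untouched (FreeRTOS exposes the same idea through `uxTaskGetStackHighWaterMark`). The sketch below imitates the technique on a plain array so it can run anywhere. `STACK_FILL`, `stack_paint` and `stack_unused_words` are illustrative names, and the array stands in for a real task stack that is assumed to grow downward.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u   /* arbitrary paint pattern */

/* Fill the whole stack area with the pattern before the task starts. */
static void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count untouched words from the low end. With a downward-growing stack,
 * index 0 is the last word to be used, so a result near zero means the
 * task has come close to, or gone past, the end of its stack. */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t unused = 0;
    while (unused < words && stack[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

Checking the unused count of the tcpip and PPP-related tasks shortly before the hang would show whether any of them is running out of headroom under the larger transfers.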


On Thu, Mar 26, 2020 at 10:20 AM Artem Moroz <[hidden email]> wrote:
I have increased the data download size and now it hangs much more
frequently, after approximately 10 minutes. I am investigating.




Re: lwip hangs after >12 hours of work

Artem Moroz
Could there be a problem in the PPPoS input path, such as some bad input
data that hangs PPPoS? I doubt this is a stack overflow or timer wrap-around;
I have double-checked both.




Re: lwip hangs after >12 hours of work

Patrick Klos-2
On 3/26/2020 11:23 AM, Artem Moroz wrote:
Could there be a problem in the PPPoS input path, such as some bad input
data that hangs PPPoS? I doubt this is a stack overflow or timer wrap-around;
I have double-checked both.

I doubt an input issue would consistently show up at 12 hours.

How about the other questions?
  1. Is it 12 hours after you boot your device or 12 hours after the connection is started? (Test by starting the connection an hour after booting the device.)
  2. How many characters have been transmitted and received when the failure occurs?
  3. How does the failure manifest itself?  Does your device actually crash or just stop communicating?
You said it happens sooner if you do larger transfers?  Any other clues with regard to that?

What more can you tell us about your usage?

Patrick

