lwIP, FreeRTOS, STM32: SSL client "hangs" on semaphore take using infinite timeout


bulek44
Hello,

I've finally tracked down a nasty situation where my SSL client task
"hangs" after a few hours and takes no further action.

My analysis suggests that it hangs while taking a semaphore with an
infinite timeout inside lwIP.


#1
I was quite surprised to see that the semaphore is taken with an infinite
timeout. That means the task will never resume or learn that something is
wrong (it seems to take the semaphore in order to send a message through
lwIP). Shouldn't such code always be written in a non-blocking manner,
returning after some finite interval if the semaphore is not available?


#2
AFAIK, blocking calls should be avoided, particularly if they can
potentially block forever.


#3
There is a setting in the lwIP code that enables/disables TCP/IP core
locking, LWIP_TCPIP_CORE_LOCKING, which triggers the use of the semaphore.
Does anyone know what happens if I disable that setting?


Thanks in advance,
regards,
Bully.

Call stack (innermost frame first; the line being executed in each frame is
marked with a "<-- executing" comment):

in sys_arch_sem_wait() at sys_arch.c:322 0x802ecc6:

#if (osCMSIS < 0x20000U)
  while(osSemaphoreWait (*sem, osWaitForever) != osOK);   /* <-- executing */
  return (osKernelSysTick() - starttime);
#else
  while(osSemaphoreAcquire(*sem, osWaitForever) != osOK);
  return (osKernelGetTickCount() - starttime);
#endif

in lwip_netconn_do_write() at api_msg.c:1675 0x801ff28:

/**
 * Send some data on a TCP pcb contained in a netconn
 * Called from netconn_write
 *
 * @param m the api_msg_msg pointing to the connection
 */
void
lwip_netconn_do_write(void *m)
{
  struct api_msg *msg = (struct api_msg*)m;

  if (ERR_IS_FATAL(msg->conn->last_err)) {
    msg->err = msg->conn->last_err;
  } else {
    if (NETCONNTYPE_GROUP(msg->conn->type) == NETCONN_TCP) {
#if LWIP_TCP
      if (msg->conn->state != NETCONN_NONE) {
        /* netconn is connecting, closing or in blocking write */
        msg->err = ERR_INPROGRESS;
      } else if (msg->conn->pcb.tcp != NULL) {
        msg->conn->state = NETCONN_WRITE;
        /* set all the variables used by lwip_netconn_do_writemore */
        LWIP_ASSERT("already writing or closing", msg->conn->current_msg == NULL &&
          msg->conn->write_offset == 0);
        LWIP_ASSERT("msg->msg.w.len != 0", msg->msg.w.len != 0);
        msg->conn->current_msg = msg;
        msg->conn->write_offset = 0;
#if LWIP_TCPIP_CORE_LOCKING
        if (lwip_netconn_do_writemore(msg->conn, 0) != ERR_OK) {
          LWIP_ASSERT("state!", msg->conn->state == NETCONN_WRITE);
          UNLOCK_TCPIP_CORE();
          sys_arch_sem_wait(LWIP_API_MSG_SEM(msg), 0);   /* <-- executing */
          LOCK_TCPIP_CORE();
          LWIP_ASSERT("state!", msg->conn->state != NETCONN_WRITE);
        }
#else /* LWIP_TCPIP_CORE_LOCKING */
        lwip_netconn_do_writemore(msg->conn);
#endif /* LWIP_TCPIP_CORE_LOCKING */
        /* for both cases: if lwip_netconn_do_writemore was called, don't ACK the APIMSG
           since lwip_netconn_do_writemore ACKs it! */
        return;
      } else {
        msg->err = ERR_CONN;
      }
#else /* LWIP_TCP */
      msg->err = ERR_VAL;
#endif /* LWIP_TCP */
#if (LWIP_UDP || LWIP_RAW)
    } else {
      msg->err = ERR_VAL;
#endif /* (LWIP_UDP || LWIP_RAW) */
    }
  }
  TCPIP_APIMSG_ACK(msg);
}
 
.....





--
Sent from: http://lwip.100.n7.nabble.com/lwip-devel-f11621.html

_______________________________________________
lwip-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-devel

Re: lwIP, FreeRTOS, STM32: SSL client "hangs" on semaphore take using infinite timeout

goldsimon@gmx.de
"bulek44" wrote:

> I've noticed and at last analyzed/debug the nasty situation, where my SSL
> client task "hangs" after few hours with no further action.
>
> I've analyzed the situation and it seems that it hangs on taking a semaphore
> with infinite timeout inside LwIP part.
>
>
> #1
> I was quite surprised, because I've noticed, that semaphore is taken
> (called) by using infinite timeout. That means that task will never resume
> or know that something is wrong (it seems to take semaphore to
> send a message through LwIP). Shouldn't such code always be written in more
> non-blocking manner and return in some finite time interval if no semaphore
> is available...

No. This is a mutex, not a semaphore. There's no way to specify which timeout
is OK and at which time an error should be raised. Seems you found a bug
somewhere, but adding a timeout to taking the core lock won't fix that.

>
>
> #2
> AFAIK, blocking calls should be avoided, particularly if they show potential
> to be blocking forever.

Well, no. Even for nonblocking sockets, an OS just doesn't work that way: at
some point you *have* to block to ensure thread synchronization.

>
>
> #3
> There is a setting in the LwIP code that enables/disables IP Core Locking -
> LWIP_TCPIP_CORE_LOCKING and triggers the use of semaphore.
> Has anyone any idea, what happens if I disable that setting ?

You'll get the message passing variant. lwip_write() will send a message
containing what to do to the tcpip_thread and block on a semaphore waiting
for the response.

That won't fix your bug, either.
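
As a rough illustration of that message-passing pattern, here is a minimal
host-side sketch using POSIX threads rather than lwIP itself. The names
demo_msg, demo_worker and demo_call are hypothetical; the worker only stands
in for tcpip_thread, and the per-message semaphore for the one the caller
blocks on.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical request: not an lwIP structure. */
struct demo_msg {
  int   input;   /* what the caller wants done */
  int   result;  /* filled in by the worker */
  sem_t done;    /* posted by the worker when the request is finished */
};

/* One-slot mailbox protected by a mutex and condition variable. */
static struct demo_msg *mailbox;
static pthread_mutex_t mbox_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mbox_cond = PTHREAD_COND_INITIALIZER;

/* Worker thread (stand-in for tcpip_thread): handles exactly one message. */
static void *demo_worker(void *arg)
{
  struct demo_msg *m;
  (void)arg;

  pthread_mutex_lock(&mbox_lock);
  while (mailbox == NULL) {
    pthread_cond_wait(&mbox_cond, &mbox_lock);
  }
  m = mailbox;
  mailbox = NULL;
  pthread_mutex_unlock(&mbox_lock);

  m->result = m->input * 2;  /* placeholder for the real work */
  sem_post(&m->done);        /* wake the blocked caller */
  return NULL;
}

/* Caller side: post the request, then block until the worker answers. */
static int demo_call(int input)
{
  struct demo_msg m;
  pthread_t worker;
  int rc;

  m.input = input;
  m.result = 0;
  rc = sem_init(&m.done, 0, 0);
  assert(rc == 0);
  rc = pthread_create(&worker, NULL, demo_worker, NULL);
  assert(rc == 0);

  pthread_mutex_lock(&mbox_lock);
  mailbox = &m;
  pthread_cond_signal(&mbox_cond);
  pthread_mutex_unlock(&mbox_lock);

  /* No timeout here: if the worker never posts, the caller waits forever. */
  while (sem_wait(&m.done) != 0) {
    /* retry if interrupted by a signal */
  }

  pthread_join(worker, NULL);
  sem_destroy(&m.done);
  return m.result;
}
```

The caller in this sketch still blocks indefinitely if the worker never
answers, which is why switching variants alone would not be expected to make
such a hang go away.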

You'll need to find out which task has locked the tcpip core mutex and why.
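
One possible way to do that, sketched here with POSIX threads on a host
rather than with the actual lwIP port, is to wrap the lock so that it records
who took it. The traced_lock type and its functions are hypothetical helpers,
not part of lwIP or FreeRTOS. On FreeRTOS itself, xSemaphoreGetMutexHolder()
may serve a similar purpose, assuming it is enabled in FreeRTOSConfig.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical debug wrapper: a mutex plus a label for its current holder. */
struct traced_lock {
  pthread_mutex_t mutex;
  const char *owner;  /* label passed by the holder, NULL when free */
};

static void traced_lock_init(struct traced_lock *l)
{
  pthread_mutex_init(&l->mutex, NULL);
  l->owner = NULL;
}

/* Try to take the lock without blocking.
 * Returns NULL on success, otherwise the label of the current holder. */
static const char *traced_lock_try(struct traced_lock *l, const char *who)
{
  const char *holder;

  if (pthread_mutex_trylock(&l->mutex) == 0) {
    l->owner = who;
    return NULL;
  }
  /* Unsynchronized read: acceptable for diagnostics only. */
  holder = l->owner;
  return (holder != NULL) ? holder : "(unknown)";
}

static void traced_lock_release(struct traced_lock *l)
{
  assert(l->owner != NULL);  /* releasing a lock nobody holds is a bug */
  l->owner = NULL;
  pthread_mutex_unlock(&l->mutex);
}
```

In a debug build, a task that fails to get the lock could log the returned
label before falling back to a blocking wait, which would show who was
holding the lock at the moment of the hang.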

Regards,
Simon


Re: lwIP, FreeRTOS, STM32: SSL client "hangs" on semaphore take using infinite timeout

bulek44
Hello,

thanks for the answer. It seems I need to investigate further. It's hard to
say what could be holding that mutex, since I have only one task using lwIP
(DHCP, network setup, then SSL communication, which is where it hangs,
specifically while sending to or reading from the socket).

The only thing that comes to mind is that I have enabled
LWIP_NETIF_LINK_CALLBACK. Its parameter description reads:

  LWIP_NETIF_LINK_CALLBACK (Callback Function on Interface Link Changes)
  Parameter Description: Set parameter allow to support a callback function
  from an interface whenever the link changes (i.e., link down).
  Dependency: None.

Is there a chance that this callback could cause the problem when the link
status changes? I'm not sure which task the callback runs in.

Is there anything else inside lwIP that could take the mutex?

Are the rules for calling lwIP functions from other tasks described
somewhere (should I use mutexes)? I will actually only use the link-change
callbacks, nothing else.

Thanks in advance,
regards,
Bulek






