Reentrency problem within mem.c module

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Reentrency problem within mem.c module

EVS Hardware Dpt
Hello,

Recently we found a problem in our product. After some times we found that we had a reentrency problem within mem.c module.

In our application when sending data to our Ethernet driver, the driver call pbuf_ref(p) for each pbuf it receives then in the TX ISR handler, it calls pbuf_free(p). On the receive side, RX interrupt simply set a flag telling that some packets are waiting for service. I believe a lot of people here are using this kind of mechanism.

So beside the TX End ISR handler which only call pbuf_free as necessary, all other parts of the stack are running at normal level (ie not interrupt level). So there is effectively only one thread for the stack. We dont have an OS (NO_SYS = 1) but SYS_LIGHTWEIGHT_PROT  = 1. Stack version is currently 1.1.1, but latest CVS version still exhibit this behavior.

Our problem :

We are receiving a lot of data (70MB/s rate on Gigabit interface), so the stack send a lot of ACK.

For each ACK, pbuf_alloc() is called, with pbuf flag as PBUF_RAM => mem_malloc is called, imagine at this point we have a TX Done interrupt, as no SYS_ARCH_PROTECT(lev) was called, the task is interrupted

TX Done Interrupt :
we call pbuf_free(p) in our interrupt handler, which in turn call mem_free and cause our problem because we end up with inconsistencies with Heap management. Typically this fire an ASSERT in a subsequent mem.c function.

In the source code, you can see that there was some protection in the form of sys_sem_wait/sys_sem_signal, but as previously mentioned, we don't use OS Primitives.


The solution of this problem, once found was simply to add SYS_ARCH_PROTECT protection mechanism as an alternative to sys_sem_xxx function already present. The generic form is :

#if SYS_LIGHTWEIGHT_PROT
   SYS_ARCH_PROTECT(level)
#else
   sys_sem_wait(mem_sem)
#endif /* SYS_LIGHTWEIGHT_PROT */


I want to have your advice to known if it's something that must be corrected (I believe so), or left as is.

Thanks,
Fred.

_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users
Reply | Threaded
Open this post in threaded view
|

Re: Reentrency problem within mem.c module

Siva Velusamy

The solution of this problem, once found was simply to add SYS_ARCH_PROTECT protection mechanism as an alternative to sys_sem_xxx function already present. The generic form is :

#if SYS_LIGHTWEIGHT_PROT
   SYS_ARCH_PROTECT(level)
#else
   sys_sem_wait(mem_sem)
#endif /* SYS_LIGHTWEIGHT_PROT */


I want to have your advice to known if it's something that must be corrected (I believe so), or left as is.


Yes, you need to have SYS_ARCH_PROTECT properly defined if you do not implement the sys_arch layer.

-Siva

_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users
Reply | Threaded
Open this post in threaded view
|

Re: Reentrency problem within mem.c module

goldsimon@gmx.de
In reply to this post by EVS Hardware Dpt
Hi,

I also encountered this problem when working with a DMA enabled MAC.  
As you said, this bug does still exist in CVS head, only nobody ever  
really cared to change it, I think.

The reason that _I_ didn't change it is CVS head includes a new  
option to forware mem_malloc() calls to new memp pools of different  
sizes (this was introduced to prevent memory fragmentation). And  
since memp_malloc() uses SYS_ARCH_PROTECT macros only, this solves  
your problem. The downside is that while memory fragmentation is  
prevented, memory usage is higher (because of the fixed size pools)  
_and_ you dont have it in v1.1.1!

The reason mem_malloc() doesn't use the SYS_ARCH_PROTECT macros is  
that this disables interrupt for  too long time span (if your heap is  
big and fragmented). And if your product has higher priority tasks  
than the TCPIP stack, these tasks might get slowed down.

A generic solution for this problem is to either let the user  
configure which protection to use or to put freed memory blocks on a  
list that i later processed in normal context. But to implement this,  
we would need a sys_sem_trywait() in the sys layer, which we  
currently haven't.

Anyway, could you please file a bug report on savannah so this bug  
doesn't get lost (if we don't fix it, it has to be documented).

Simon


Am 25.10.2007 um 16:55 schrieb EVS Hardware Dpt:

> Hello,
>
> Recently we found a problem in our product. After some times we  
> found that we had a reentrency problem within mem.c module.
>
> In our application when sending data to our Ethernet driver, the  
> driver call pbuf_ref(p) for each pbuf it receives then in the TX  
> ISR handler, it calls pbuf_free(p). On the receive side, RX  
> interrupt simply set a flag telling that some packets are waiting  
> for service. I believe a lot of people here are using this kind of  
> mechanism.
>
> So beside the TX End ISR handler which only call pbuf_free as  
> necessary, all other parts of the stack are running at normal level  
> (ie not interrupt level). So there is effectively only one thread  
> for the stack. We dont have an OS (NO_SYS = 1) but  
> SYS_LIGHTWEIGHT_PROT  = 1. Stack version is currently 1.1.1, but  
> latest CVS version still exhibit this behavior.
>
> Our problem :
>
> We are receiving a lot of data (70MB/s rate on Gigabit interface),  
> so the stack send a lot of ACK.
>
> For each ACK, pbuf_alloc() is called, with pbuf flag as PBUF_RAM =>  
> mem_malloc is called, imagine at this point we have a TX Done  
> interrupt, as no SYS_ARCH_PROTECT(lev) was called, the task is  
> interrupted
>
> TX Done Interrupt :
> we call pbuf_free(p) in our interrupt handler, which in turn call  
> mem_free and cause our problem because we end up with  
> inconsistencies with Heap management. Typically this fire an ASSERT  
> in a subsequent mem.c function.
>
> In the source code, you can see that there was some protection in  
> the form of sys_sem_wait/sys_sem_signal, but as previously  
> mentioned, we don't use OS Primitives.
>
>
> The solution of this problem, once found was simply to add  
> SYS_ARCH_PROTECT protection mechanism as an alternative to  
> sys_sem_xxx function already present. The generic form is :
>
> #if SYS_LIGHTWEIGHT_PROT
>    SYS_ARCH_PROTECT(level)
> #else
>    sys_sem_wait(mem_sem)
> #endif /* SYS_LIGHTWEIGHT_PROT */
>
>
> I want to have your advice to known if it's something that must be  
> corrected (I believe so), or left as is.
>
> Thanks,
> Fred.
> _______________________________________________
> lwip-users mailing list
> [hidden email]
> http://lists.nongnu.org/mailman/listinfo/lwip-users



_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users
Reply | Threaded
Open this post in threaded view
|

Re: Reentrency problem within mem.c module

goldsimon@gmx.de
In reply to this post by Siva Velusamy

Am 25.10.2007 um 17:21 schrieb Siva Velusamy:

>
> The solution of this problem, once found was simply to add  
> SYS_ARCH_PROTECT protection mechanism as an alternative to  
> sys_sem_xxx function already present. The generic form is :
>
> #if SYS_LIGHTWEIGHT_PROT
>    SYS_ARCH_PROTECT(level)
> #else
>    sys_sem_wait(mem_sem)
> #endif /* SYS_LIGHTWEIGHT_PROT */
>
>
> I want to have your advice to known if it's something that must be  
> corrected (I believe so), or left as is.
>
>
> Yes, you need to have SYS_ARCH_PROTECT properly defined if you do  
> not implement the sys_arch layer.

mem_malloc() does't use the SYS_ARCH_PROTECT macros, so in this case,  
this doesn't help

Simon


_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users
Reply | Threaded
Open this post in threaded view
|

Re: Reentrency problem within mem.c module

Jonathan Larmour
In reply to this post by goldsimon@gmx.de
Simon Goldschmidt wrote:
> Hi,
>
> I also encountered this problem when working with a DMA enabled MAC. As
> you said, this bug does still exist in CVS head, only nobody ever really
> cared to change it, I think.

I have a similar requirement (calling pbuf_free()). My solution is making
sure it's only ever done in the context of the TCP/IP thread.
tcpip_callback() can be used for this.

> The reason mem_malloc() doesn't use the SYS_ARCH_PROTECT macros is that
> this disables interrupt for  too long time span (if your heap is big and
> fragmented). And if your product has higher priority tasks than the
> TCPIP stack, these tasks might get slowed down.
>
> A generic solution for this problem is to either let the user configure
> which protection to use or to put freed memory blocks on a list that i
> later processed in normal context. But to implement this, we would need
> a sys_sem_trywait() in the sys layer, which we currently haven't.

I think doing a tcpip_callback() call (or something similar), so the tcpip
thread handles it, would be smaller and simpler. (Of course for NO_SYS==1,
you can just call pbuf_free directly since there should only be one context
in lwIP as it is).

Perhaps pbuf_free() is a sufficiently common operation for this sort of
thing, that we could make a new pbuf_free_safe() call or something, which
either calls pbuf_free() (NO_SYS==1) or tcpip_callback to call pbuf_free
(NO_SYS==0).

Jifl
--
eCosCentric Limited      http://www.eCosCentric.com/     The eCos experts
Barnwell House, Barnwell Drive, Cambridge, UK.       Tel: +44 1223 245571
Registered in England and Wales: Reg No 4422071.
------["The best things in life aren't things."]------      Opinions==mine


_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users
Reply | Threaded
Open this post in threaded view
|

Re: Reentrency problem within mem.c module

goldsimon@gmx.de
Jonathan Larmour schrieb:
> I think doing a tcpip_callback() call (or something similar), so the tcpip
> thread handles it, would be smaller and simpler. (Of course for NO_SYS==1,
> you can just call pbuf_free directly since there should only be one context
> in lwIP as it is).
>  
That's the problem: you can't simply call pbuf_free directly because you
could be sending a packet at non-interrupt level while trying to free a
pbuf at TX interrupt level!

For NO_SYS==0, I favour your tcpip_callback since the interrupt function
doesn't need to walk the heap and thus is faster. But I would favour a
solution that solves this problem for both NO_SYS settings...

> Perhaps pbuf_free() is a sufficiently common operation for this sort of
> thing, that we could make a new pbuf_free_safe() call or something, which
> either calls pbuf_free() (NO_SYS==1) or tcpip_callback to call pbuf_free
> (NO_SYS==0).
>  
pbuf_free_safe() would be enough in this situation, but since memp_free
is the problem (no pbuf_free()),
I think memp_free_safe() would be better since it is more generic.

Simon


_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users
Reply | Threaded
Open this post in threaded view
|

Re: Reentrency problem within mem.c module

EVS Hardware Dpt

pbuf_free_safe() would be enough in this situation, but since memp_free
is the problem (no pbuf_free()),
I think memp_free_safe() would be better since it is more generic.


mem_free_safe could do the trick but only when paired with a mem_malloc_safe. Or you have another idea with mem_free_safe.

I like your idea of putting a list of must_be_freed buffer.

We could have a lock variable that is set in mem_malloc

void mem_malloc()
{
   //Acquire Lock
   mem_lock = 1;

   //Do all the allocating stuff

   //Release Lock
   mem_lock = 0;
}

in mem_free()
{
   if(mem_lock == 1)
   {
      //Put buffer on the must_be_freed list for later processing (mem_free_list)
   }
   else
   {
      //Continue the free process

     //And also free other item in the must_be_freed list
     mem_free_list()
   }
}

We could then process list in another portion of the stack and why not in mem_malloc.


As a side note, this kind of functionality (list access) could be achieved with some Atomic primitives (InterlockedCompareExchange, AtomicCompareExchange). I our code we use often this kind of functions. This could be interesting to provide some macro, that could be redefined to meet each platform.

A basic implementation (compatible with all platforms) would be
typedef u32_t uDef_t; //Default architecture unsigned value (here, 32 bits processor => 32 bits unsigned value)

uDef_t AtomicCompareExchange(uDef_t* Address, uDef_t Comparand, uDef_t NewVal)
{
   uDef_t OldValue;
   SYS_ARCH_DECL_PROTECT(old_level);

   old_level = SYS_ARCH_PROTECT();
   if((OldValue = *Address) == Comparand){
      *Address = NewVal;     
   }
   SYS_ARCH_UNPROTECT(old_level);
   return OldValue;
}

In a win32 system this would be the InterlockedCompareExchange function, in a PowerPC platform this would be a function using atomic primitives functionnalities.

Fred

_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users
Reply | Threaded
Open this post in threaded view
|

Re: Reentrency problem within mem.c module

goldsimon@gmx.de
In reply to this post by goldsimon@gmx.de
Simon Goldschmidt schrieb:
> Anyway, could you please file a bug report on savannah so this bug  
> doesn't get lost (if we don't fix it, it has to be documented).
>  
Did that already:

http://savannah.nongnu.org/bugs/index.php?21433


_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users