Receive path stuck due to pbuf_alloc returning NULL


Sebastian Gonzalez
Hi,

I am using lwIP 1.3.2 with an Atmel AT91SAM7X512 and FreeRTOS. We have already used this combination in other projects without problems, but we are now using our design in a network with a high density of UDP broadcast traffic, which causes the system to stop receiving.
The transmit path keeps working: I can see ARP request messages going out in the Wireshark traces.
After debugging and searching I found that several people have had the same issue: the pbuf_alloc call from low_level_input in the Ethernet driver returns NULL during the packet storm and keeps returning NULL afterwards, as if the TCP/IP task weren't fast enough to free the pbufs, so packets from the EMAC no longer reach the upper layers.
I understand that during a packet storm all packets that can't be processed are dropped; that is actually the behaviour I expect. What I don't understand is why the consumer is unable to free the packets that have already been passed to the upper layer.
I have tried giving the TCP/IP thread a higher priority, with no effect. I also changed the number of pbufs from 8 to 16 and noticed that the problem then took longer to appear.
Is there a recommended value for the number of pbufs, given my limited memory?

Best regards.

Sebastian
 
Re: Receive path stuck due to pbuf_alloc returning NULL

Sylvain Rochet
Hi Sebastian,


On Mon, May 27, 2013 at 10:01:44AM -0700, Sebastian Gonzalez wrote:

> [...]

As usual, this looks like a bug in the MACB driver. You have to check
carefully that the lwIP pbufs get free()d whatever happens along the
input and output paths.


I can't speak for the AT91 MACB driver, but the AT32 MACB driver suffers
from a huge bug in that area: it only free()s the MACB TX buffers of
successfully sent frames, which ends up locking the TX path while the RX
path is still alive and keeps allocating until all pbufs are used.

  void vClearMACBTxBuffer(void) {
    /* The first buffer in the frame should have the bit set automatically. */
    if( xTxDescriptors[ uxNextBufferToClear ].U_Status.status & AVR32_TRANSMIT_OK ) {
      [...]
    }
  }

  "Before a transmission, bit 31 is the "used" bit which must be zero
   when the control word is read. It is written to one when a frame has
   been transmitted."

Guess what happens if the "transmit ok" bit is not set: the "should
have the bit set" assumption is false, and the buffers of that frame
are never cleared :>


If it helps, I attached my patch against the AT32 MACB driver, which
helps the system recover; maybe the AT91 driver is similar. The patch
is not perfect because it drops all queued frames, which I consider
acceptable because it only happens a few times per week on a very, very
loaded ethif.
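
The stall described above can be modelled on a host without any MACB hardware. This is only a sketch under stated assumptions: `tx_desc`, `tx_queue`, `tx_reclaim`, `tx_reset` and `tx_free_slots` are hypothetical names, not functions from the AT32 driver or from the attached patch. It shows an in-order reclaim that never gets past a descriptor whose "done" bit is never written, and a recovery routine that drops everything still queued.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4

/* Hypothetical software model of one TX descriptor. */
struct tx_desc {
    bool queued; /* handed to the MAC, buffer still held by the driver */
    bool done;   /* status bit written back by the MAC after a transmission */
};

static struct tx_desc tx_ring[TX_RING_SIZE];
static unsigned tx_next_to_clear;
static unsigned tx_next_to_queue;

/* Queue one frame; fails when the ring is full. */
static bool tx_queue(void)
{
    struct tx_desc *d = &tx_ring[tx_next_to_queue];
    if (d->queued)
        return false;
    d->queued = true;
    d->done = false;
    tx_next_to_queue = (tx_next_to_queue + 1) % TX_RING_SIZE;
    return true;
}

/* In-order reclaim: stops at the first descriptor that is not marked done.
   If a failed frame never gets its bit set, nothing behind it is freed. */
static unsigned tx_reclaim(void)
{
    unsigned freed = 0;
    while (tx_ring[tx_next_to_clear].queued && tx_ring[tx_next_to_clear].done) {
        tx_ring[tx_next_to_clear].queued = false;
        tx_ring[tx_next_to_clear].done = false;
        tx_next_to_clear = (tx_next_to_clear + 1) % TX_RING_SIZE;
        freed++;
    }
    assert(freed <= TX_RING_SIZE);
    return freed;
}

/* Recovery on a TX error: drop every queued frame and restart the ring. */
static unsigned tx_reset(void)
{
    unsigned dropped = 0;
    for (unsigned i = 0; i < TX_RING_SIZE; i++) {
        if (tx_ring[i].queued)
            dropped++;
        tx_ring[i].queued = false;
        tx_ring[i].done = false;
    }
    tx_next_to_clear = 0;
    tx_next_to_queue = 0;
    return dropped;
}

/* Number of descriptors available for new frames. */
static unsigned tx_free_slots(void)
{
    unsigned n = 0;
    for (unsigned i = 0; i < TX_RING_SIZE; i++)
        if (!tx_ring[i].queued)
            n++;
    return n;
}
```

In a real driver the reset would also have to reprogram the controller's TX queue pointer and release any buffers attached to the descriptors; that part is hardware specific and is left out here.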


Sylvain

_______________________________________________
lwip-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-users

Attachment: at32-macb-fix-non-free-pbuf-on-failed-tx-frame.diff (2K)

Re: Receive path stuck due to pbuf_alloc returning NULL

Sebastian Gonzalez
Hi Sylvain,

Thank you for your input. I have checked the files that you sent, and this issue is already covered in my driver.
Based on my tests it looks like a race condition on the receive path. Pbufs are allocated in the ethernetif layer and passed to an upper layer for processing. This upper layer does not free the pbufs, not because of a memory leak but because it can't yet (for example during IP reassembly).
At some point, when the traffic rises, the upper layer needs more packets in order to complete frames and release the pbufs, but the ethernetif layer cannot allocate more pbufs because it has run out of memory, so the two end up in a deadlock.
How can I break it? Would closing every socket in my application layer do it?
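
The suspected deadlock can be reproduced in a toy model. This is a sketch only, assuming a fixed pool and a single reassembly queue; `pool_alloc`, `pool_release`, `rx_fragment` and `sim_reset` are hypothetical names, not lwIP functions. lwIP itself has the `IP_REASS_MAX_PBUFS` and `IP_REASS_MAXAGE` options to bound how long and how many pbufs reassembly may hold, but nothing in this thread establishes that reassembly is the actual cause.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 8

static int pool_free = POOL_SIZE; /* buffers left in the pool */
static int held;                  /* fragments parked until their datagram completes */

static bool pool_alloc(void)
{
    if (pool_free == 0)
        return false;
    pool_free--;
    return true;
}

static void pool_release(int n)
{
    pool_free += n;
    assert(pool_free <= POOL_SIZE);
}

static void sim_reset(void)
{
    pool_free = POOL_SIZE;
    held = 0;
}

/* Receive one fragment. 'completes' is true when it is the last missing
   piece of the datagram. 'max_held' caps the parked fragments; 0 means
   no cap. Returns false when the driver has to drop the frame. */
static bool rx_fragment(bool completes, int max_held)
{
    if (!pool_alloc())
        return false;
    held++;
    if (completes || (max_held > 0 && held > max_held)) {
        /* Datagram delivered, or given up on: free everything parked. */
        pool_release(held);
        held = 0;
    }
    return true;
}
```

Without a cap, eight incomplete fragments exhaust the pool and the fragment that would complete the datagram can no longer be allocated, so nothing is ever released. With a cap below the pool size the model always keeps at least one buffer available.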

Best regards.

Sebastian.


On 27/05/2013 19:54, Sylvain Rochet [via lwIP] wrote:
> [...]

--
Sebastián González
Research and Development
Tel: 902 82 00 82
Fax: 902 82 00 83

[hidden email]
www.by.com.es


Re: Receive path stuck due to pbuf_alloc returning NULL

Sebastian Gonzalez
Hi,

By the way, this question was already asked on the mailing list and got no answers:

http://lwip.100.n7.nabble.com/during-Boradcast-Storm-pbuf-alloc-returns-Zero-release-pbuf-tp21289p21291.html

Best regards.

Sebastian.
Re: Receive path stuck due to pbuf_alloc returning NULL

Ivan Delamer-2
In reply to this post by Sebastian Gonzalez
Hello Sebastian,

The example AT91 port from FreeRTOS has some issues that show up in
high-traffic scenarios, typically with many UDP broadcasts or
multicasts, but they could also be caused by high-volume TCP.

I have optimized the driver in many ways, so I don't remember exactly
the changes I made to fix this, but it was something like this:

- When pbufs are allocated in ethernetif.c, there are some conditions
where the packets aren't freed properly (if not an IP packet?). This led
to a memory leak.

- Sometimes there are not enough memp messages to pass packets/messages
to the tcpip thread. I increased this from 6? to 20.
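
For reference, the lwIP options that most likely correspond to the two points above look like this in lwipopts.h. This is a sketch with illustrative values, not a recommendation: the mapping of "memp messages" to `MEMP_NUM_TCPIP_MSG_INPKT` is an assumption, and the `#error` check encodes a rule of thumb rather than a documented lwIP requirement.

```c
/* lwipopts.h fragment. Values are illustrative, not recommendations. */

/* Number of buffers in the pbuf pool the driver allocates RX frames from. */
#define PBUF_POOL_SIZE            16

/* Number of tcpip_msg structures available for passing incoming packets
   to tcpip_thread. */
#define MEMP_NUM_TCPIP_MSG_INPKT  20

/* Requested depth of the tcpip_thread mailbox. Whether it is honoured
   depends on the port's sys_mbox_new() implementation. */
#define TCPIP_MBOX_SIZE           20

/* Assumed rule of thumb: do not allow more RX pbufs than there are
   messages to carry them to tcpip_thread. */
#if MEMP_NUM_TCPIP_MSG_INPKT < PBUF_POOL_SIZE
#error "fewer input messages than RX pbufs"
#endif
```

With fewer messages than pbufs, input can fail even though a pbuf was available, which makes the error path in the driver (freeing the pbuf when input fails) run far more often under load.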

The AT91 EMAC also sometimes behaves unexpectedly when receiving packets
really fast. For example, if you are in the IRQ for a transmit event and
a frame is received in the meantime, you need to take care. My ISR looks
like this:

void vEMACISR_Handler( void )
{
    volatile unsigned portLONG ulIntStatus __attribute__ ((unused));
    volatile unsigned portLONG ulTxStatus;
    portBASE_TYPE xHigherPriorityTaskWoken = pdFALSE;

    /* Find the cause of the interrupt, cleared on read. */
    ulIntStatus = AT91C_BASE_EMAC->EMAC_ISR;

    if( AT91C_BASE_EMAC->EMAC_RSR & AT91C_EMAC_REC )
    {
        /* A frame has been received, signal the lwIP task so it can
        process the Rx descriptors. */
        AT91C_BASE_EMAC->EMAC_RSR = AT91C_EMAC_REC;
        xSemaphoreGiveFromISR( xSemaphore, &xHigherPriorityTaskWoken );
    }

    ulTxStatus = AT91C_BASE_EMAC->EMAC_TSR;
    if( ulTxStatus & AT91C_EMAC_COMP )
    {
        /* A frame has been transmitted.  Mark all the buffers used by the
        frame just transmitted as free again. */
        AT91C_BASE_EMAC->EMAC_TSR = AT91C_EMAC_COMP;
        vClearEMACTxBuffer();
    }
    if( ulTxStatus & ( AT91C_EMAC_UND | AT91C_EMAC_BEX ) )
    {
        /* A frame Tx failed. Reset Tx buffers. */
        AT91C_BASE_EMAC->EMAC_TSR = AT91C_EMAC_UND | AT91C_EMAC_BEX;
        EMAC_Statistics.tx_errors++;
        vResetEMACTxBuffer();
    }

    /* Clear the interrupt. */
    AT91C_BASE_AIC->AIC_EOICR = 0;

    /* If a task was woken by a frame being received then we may need to
    switch to another task.  If the unblocked task has a higher priority
    than the interrupted task it will execute immediately after the ISR
    completes. */
    if( xHigherPriorityTaskWoken )
    {
        portYIELD_FROM_ISR();
    }
}
/*-----------------------------------------------------------*/

void vEMACISR_Wrapper( void )
{
    /* Save the context of the interrupted task. */
    portSAVE_CONTEXT();

    /* Call the handler to do the work.  This must be a separate
    function to ensure the stack frame is set up correctly. */
    vEMACISR_Handler();

    /* Restore the context of whichever task will execute next. */
    portRESTORE_CONTEXT();
}

I hope this helps. I struggled a lot when moving from a low-traffic
lab network to a high-traffic production network.

Cheers
Ivan



Re: Receive path stuck due to pbuf_alloc returning NULL

Sebastian Gonzalez
Dear Ivan,

I had some issues with the driver that had to be debugged as well, for example adding AT91C_EMAC_ROVR and AT91C_EMAC_RXUBR as interrupt sources in order to signal an overrun in the ISR. Apart from that, my handler looks pretty much like yours.
The number of messages that could be queued was also 6 in my code, versus 8 pbufs. Given that these messages are used both for incoming packets and for other messages to the thread, this number was definitely wrong. I changed it to 16 and testing looks good.
I'll test this solution for a couple of days.

Thank you so much.
Best regards.

Sebastian.


On 28/05/2013 18:06, Ivan Delamer-2 [via lwIP] wrote:
> [...]

Re: Receive path stuck due to pbuf_alloc returning NULL

Sebastian Gonzalez
Hi,

I am still having problems, although it now takes longer for the system to stop receiving under the same stress tests.
I made a small test with a semaphore that is taken every time a pbuf is allocated and given back every time the packet is processed, but with no positive results. Either not every pbuf is freed in the TCP/IP thread after being processed, or there is a memory leak as Ivan suggested.
Is there any way to release all the memory? I don't mind having to close my application sockets if that restores the system and I can reopen them.
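
A plain counter is often easier to read than a semaphore for this kind of test. The following is a minimal host-side sketch, assuming hypothetical wrappers `tracked_alloc` and `tracked_free` around the allocator; they are not lwIP functions. On the target the same idea would wrap pbuf_alloc and pbuf_free, and the counters would need a critical section because allocation and release happen in different contexts. lwIP's own statistics (LWIP_STATS with MEMP_STATS) keep similar used/max figures per pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pbuf. */
struct tracked_buf {
    size_t len;
    unsigned char data[];
};

static long bufs_outstanding; /* allocated and not yet freed */
static long bufs_high_water;  /* largest value bufs_outstanding has reached */

/* Allocate a buffer and account for it. Not thread safe as written. */
static struct tracked_buf *tracked_alloc(size_t len)
{
    struct tracked_buf *b = malloc(sizeof *b + len);
    if (b == NULL)
        return NULL;
    b->len = len;
    bufs_outstanding++;
    if (bufs_outstanding > bufs_high_water)
        bufs_high_water = bufs_outstanding;
    return b;
}

/* Free a buffer and account for it. A NULL pointer is ignored. */
static void tracked_free(struct tracked_buf *b)
{
    if (b == NULL)
        return;
    assert(bufs_outstanding > 0); /* catches a double free in the accounting */
    bufs_outstanding--;
    free(b);
}
```

If `bufs_outstanding` keeps climbing while traffic is idle, buffers are being leaked; if it sits at the pool size and drops again once a consumer runs, buffers are only being held.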

Thanks.

Sebastian.
Re: Receive path stuck due to pbuf_alloc returning NULL

Ivan Delamer-2
In reply to this post by Sebastian Gonzalez
I don't know if this still applies, but a while ago there was a memory
leak in the AT91/FreeRTOS implementation of ethernetif_input() when
passing the pbuf to netif->input.

I have this code:

     switch( htons( ethhdr->type ) )
     {
       /* IP packet? */
       case ETHTYPE_IP:
       case ETHTYPE_ARP:
       case ETHTYPE_IPV6:
         /* pass to network layer */
         if (pxNetIf->input( p, pxNetIf ) != ERR_OK)
         {
           pbuf_free( p );
#if LINK_STATS
           lwip_stats.link.drop++;
           lwip_stats.link.err++;
#endif /* LINK_STATS */
         }
         break;

       default:
         pbuf_free( p );
#if LINK_STATS
         lwip_stats.link.drop++;
         lwip_stats.link.proterr++;
#endif /* LINK_STATS */
         p = NULL;
         break;
     }

The key is to free the pbuf not only if packet type is unrecognized,
but also if pxNetIf->input fails (this is where I had my memory leak).

Cheers
Ivan

> Date: Thu, 30 May 2013 02:14:04 -0700 (PDT)
> From: Sebastian Gonzalez <[hidden email]>
> [...]

Re: Receive path stuck due to pbuf_alloc returning NULL

Sebastian Gonzalez
Dear Ivan,

The issue that you mention is already covered in the latest example from Atmel; see the code below.
After reviewing my code I noticed that all sockets are opened and closed as they are used, except for one UDP socket that remained open continuously and is used to receive broadcast frames. I am now closing this socket and reopening it every 30 seconds if no frames are received; this releases any memory that could be stuck in the deadlock. Looks good so far.
I'll leave the system under test this weekend.
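
The timing side of that workaround can be sketched as a small inactivity watchdog. This is only an illustration: `rx_watchdog` and its functions are hypothetical names, and the actual close, socket and bind calls are left to the caller because they depend on the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inactivity watchdog for one receive socket. */
struct rx_watchdog {
    uint32_t timeout_ms; /* allowed silence before the socket is recycled */
    uint32_t last_rx_ms; /* tick of the last received frame or last recycle */
};

static void rx_watchdog_init(struct rx_watchdog *w, uint32_t now_ms,
                             uint32_t timeout_ms)
{
    assert(timeout_ms > 0);
    w->timeout_ms = timeout_ms;
    w->last_rx_ms = now_ms;
}

/* Call whenever a frame is received on the socket. */
static void rx_watchdog_feed(struct rx_watchdog *w, uint32_t now_ms)
{
    w->last_rx_ms = now_ms;
}

/* Returns true when the socket should be closed and reopened. The
   unsigned subtraction keeps the comparison correct across tick wrap.
   On expiry the window restarts so the caller recycles only once. */
static bool rx_watchdog_expired(struct rx_watchdog *w, uint32_t now_ms)
{
    if ((uint32_t)(now_ms - w->last_rx_ms) < w->timeout_ms)
        return false;
    w->last_rx_ms = now_ms;
    return true;
}
```

In the receive task this would be polled after each receive timeout; when it returns true, the task closes the UDP socket, opens and binds a new one, and carries on. Closing the socket frees whatever is still queued on it, which is presumably why the workaround releases the stuck memory.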

Thanks.

    switch( htons( ethhdr->type ) )
    {
        /* IP or ARP packet? */
        case ETHTYPE_IP:
        case ETHTYPE_ARP:
#if PPPOE_SUPPORT
        /* PPPoE packet? */
        case ETHTYPE_PPPOEDISC:
        case ETHTYPE_PPPOE:
#endif /* PPPOE_SUPPORT */
            /* full packet send to tcpip_thread to process */
            if( xNetIf->input( p, xNetIf ) != ERR_OK )
            {
                LWIP_DEBUGF( NETIF_DEBUG, ("ethernetif_input: IP input error\n") );
                pbuf_free( p );
                p = NULL;
            }
            break;

        default:
            pbuf_free( p );
            p = NULL;
            break;
    }

On 30/05/2013 19:10, Ivan Delamer-2 [via lwIP] wrote:
> [...]

Re: Receive path stuck due to pbuf_alloc returning NULL

tushartp
In reply to this post by Ivan Delamer-2
Hello Ivan,

Could you please send me the .c and .h files for the AT91 EMAC driver?

The receive path gets stuck during high TCP/IP traffic and I can no longer connect to the AT91.
The strange thing is that I can still ping the IP address and it responds.


Thanks,
Tushar
Re: Receive path stuck due to pbuf_alloc returning NULL

Mary Jane
In reply to this post by Sebastian Gonzalez
Hi all, I have a similar problem. I am using FreeRTOS 9.0.0 and the lwIP 2.1.0
contrib FreeRTOS port. The ISR sends a task notification to the Ethernet input
thread, which posts the received packets to a message box; the packets are
then processed in tcpip_thread().

I can successfully receive and answer SNMP packets at 1.3 Mbps. Above this
rate, the response rate to SNMP requests gradually decreases as I increase
the incoming traffic. Above 13 Mbps there is no answer at all.

The problem is that even after I stop sending SNMP requests to the device, I
can no longer ping it. The program no longer enters the EMAC RX ISR, although
it continues to execute other threads.

I investigated and observed that the problem happens when I
use pbuf_alloc(PBUF_POOL). If I prevent the program from calling pbuf_alloc
during the storm and re-enable pbuf_alloc after the storm, I can ping the
device.

Any help will be appreciated.

Regards
Mary Jane



--
Sent from: http://lwip.100.n7.nabble.com/lwip-users-f3.html

_______________________________________________
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users

Re: Receive path stuck due to pbuf_alloc returning NULL

Sergio R. Caprile
In reply to this post by Sebastian Gonzalez
If your device does not enter the Eth Rx ISR, that is because either the
controller is not interrupting anymore or the OS is not accepting the IRQ.
pbuf_alloc() does not control your chip, your driver does.
Your driver should gracefully discard the incoming frame if it can't
accept its contents (put them in a pbuf).
This should also include leaving the chip in a usable state so it can
interrupt again, and refraining from writing anywhere; something like:

        p = pbuf_alloc(PBUF_RAW, somesize, PBUF_POOL);
        if (p != NULL) {
                // copy the data from the chip into the pbuf
        } else {
                // LINK_STATS_INC(link.memerr); was useful a long time ago,
                // and I guess it still is
                // flush the chip so it "thinks data has been delivered"; I
                // read the frame anyway, as if delivering it, ymmv
                return; // can't go ahead, and the next frame will likely
                        // have to be discarded too... mmm... ymmv
        }
        // perhaps check some packet types and eventually
        if (netif->input(p, netif) != ERR_OK)
                pbuf_free(p);
        // but if you will not deliver, then pbuf_free(p);

Perhaps your driver is not checking the returned value and goes ahead when
it shouldn't?
Maybe your driver does check the value but forgets to properly take care of
the controller when there is no place to put the data?
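
A minimal host-side sketch of the drop path described above, assuming a
descriptor-ring style MAC. The ring, the two-slot pool and every name in it
(rx_ring, mock_pbuf_alloc, rx_service) are hypothetical stand-ins, not lwIP
or vendor driver APIs. The point it illustrates is that the descriptor goes
back to the controller whether or not the allocation succeeded:

```c
#include <assert.h>
#include <stdbool.h>

#define RX_RING_LEN 4   /* hypothetical number of receive descriptors */
#define POOL_LEN    2   /* hypothetical pool size, stands in for PBUF_POOL */

/* Stand-in for a MAC receive descriptor: the "hardware" sets owned_by_sw
 * when a frame has arrived; software clears it to hand the slot back. */
struct rx_desc { bool owned_by_sw; };

static struct rx_desc rx_ring[RX_RING_LEN];
static bool pool_in_use[POOL_LEN];
static unsigned rx_delivered, rx_dropped;

/* Mock of pbuf_alloc(..., PBUF_POOL): returns a pool slot index, or -1
 * when the pool is exhausted (the real call returns NULL). */
int mock_pbuf_alloc(void)
{
    for (int i = 0; i < POOL_LEN; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Mock of pbuf_free(): called once the stack is done with a buffer. */
void mock_pbuf_free(int slot)
{
    assert(slot >= 0 && slot < POOL_LEN);
    pool_in_use[slot] = false;
}

/* Service every pending descriptor. Returns how many were handed back. */
unsigned rx_service(void)
{
    unsigned released = 0;

    for (int i = 0; i < RX_RING_LEN; i++) {
        if (!rx_ring[i].owned_by_sw)
            continue;

        if (mock_pbuf_alloc() >= 0)
            rx_delivered++;  /* a real driver copies the frame and calls netif->input() */
        else
            rx_dropped++;    /* no memory: the frame is discarded */

        /* Done in BOTH branches, otherwise the ring fills up and the
         * controller has nowhere to put the next frame. */
        rx_ring[i].owned_by_sw = false;
        released++;
    }
    return released;
}
```

With all four descriptors pending and only two pool slots free, rx_service()
delivers two frames, drops two, and still returns all four descriptors to
the controller.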


|

Re: Receive path stuck due to pbuf_alloc returning NULL

Mary Jane
Hi Sergio,

Thanks for the answer.

> If your device does not go to the Eth Rx ISR that is because either the
> controller is not interrupting anymore or the OS is not accepting the IRQ.


I don't think FreeRTOS controls the EMAC interrupts, so what I am facing is
probably your first assumption.

>         p = pbuf_alloc(PBUF_RAW, somesize, PBUF_POOL);
>         if (p != NULL) {
>                 // copy the data from the chip into the pbuf
>         } else {
>                 // LINK_STATS_INC(link.memerr); was useful a long time ago,
>                 // and I guess it still is
>                 // flush the chip so it "thinks data has been delivered"; I
>                 // read the frame anyway, as if delivering it, ymmv
>                 return; // can't go ahead, and the next frame will likely
>                         // have to be discarded too... mmm... ymmv
>         }
>         // perhaps check some packet types and eventually
>         if (netif->input(p, netif) != ERR_OK)
>                 pbuf_free(p);
>         // but if you will not deliver, then pbuf_free(p);
>
> Perhaps your driver is not checking returned value and goes ahead when
> it shouldn't ?
> Maybe your driver does check the value and forgets to properly care for
> the controller when there is no place to put the data ?


I am implementing something similar to this. What I don't understand is
that neither the ISR nor the Ethernet input task uses any return value from
hdkif_input(), which implements this logic. I don't even see what
pbuf_alloc has to do with the EMAC controller. It looks to me as if
pbuf_alloc should not affect the ISR mechanism. However, as I said in my
question, with pbuf_alloc() disabled the program keeps entering the ISR.

I tried changing some options in lwIP. I enabled MEMP_OVERFLOW_CHECK and
MEMP_SANITY_CHECK in opt.h and LWIP_FREERTOS_SYS_ARCH_PROTECT_USES_MUTEX in
sys_arch.c, and that made an improvement: interrupts now fail after 7-8
minutes instead of 1-2 seconds.

I am kind of a newbie, not very familiar with the device, FreeRTOS or lwIP.
But these observations make me think the problem is with lwIP.

Can you help me further?
Regards
Mary Jane




Re: Receive path stuck due to pbuf_alloc returning NULL

Sergio R. Caprile
In reply to this post by Sergio R. Caprile
Your scenario is composed of four main pieces of software (to give them
some name). lwIP will handle the TCP/IP stuff but will not mess with your
controller. Your driver will. Your driver is not lwIP, it is your
responsibility. I guess you handed it off to your vendor and are using
something that is vendor provided.
The third piece of the puzzle is your port, the adaptation layer that
glues lwIP to your OS (or bare metal). Your port is not lwIP, it is your
responsibility. You are using the FreeRTOS one, which seems to be a working
port and a good one; I don't use it and I can't comment on it.

IRQs, taking care of the controller --> driver
semaphores, locks --> port

when called: frames in, frames out --> lwIP

and then there is your application, using lwIP within your port framework.

The only connection between pbuf_alloc() and your controller is that
your driver will use pbuf_alloc to get the memory to put the incoming
frame there. The piece of code that will get the interrupt, handle the
semaphore or whatever, read the frame, store it in the pbuf (the piece
of memory returned by pbuf_alloc()), and keep the controller healthy is
your driver.
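
As a rough illustration of that single point of contact, here is a sketch
of the copy step, assuming the controller has already placed a frame in a
contiguous buffer. struct mock_pbuf is a cut-down stand-in that only
mirrors the next/payload/len/tot_len fields of lwIP's struct pbuf, and
copy_frame_to_chain is a hypothetical helper name; a pool allocation may
come back as a chain of several segments, so the copy walks the chain:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for lwIP's struct pbuf; only the fields used here. */
struct mock_pbuf {
    struct mock_pbuf *next;   /* next segment in the chain, or NULL */
    void *payload;            /* start of this segment's data area */
    unsigned short len;       /* bytes available in this segment */
    unsigned short tot_len;   /* this segment plus all following ones */
};

/* Copy frame_len bytes from the controller's buffer into the chain,
 * filling one segment after another. Returns the number of bytes copied,
 * which is less than frame_len if the chain is too short. */
size_t copy_frame_to_chain(struct mock_pbuf *p,
                           const unsigned char *frame, size_t frame_len)
{
    size_t copied = 0;

    assert(frame != NULL);
    for (struct mock_pbuf *q = p; q != NULL && copied < frame_len; q = q->next) {
        size_t n = frame_len - copied;
        if (n > q->len)
            n = q->len;       /* never write past the end of a segment */
        memcpy(q->payload, frame + copied, n);
        copied += n;
    }
    return copied;
}
```

Nothing in this step touches the controller; releasing the hardware buffer
afterwards remains a separate job for the driver.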

You seem to have a problem with the driver not operating correctly on
the chip, or there is some semaphore/mutex/equivalent in your port that
is not properly set up.

You have to at least know whether IRQs are not being generated or not
being accepted. Place a breakpoint on the IRQ handler and see if it
fires and then what happens there. (Of course you will need to provide
some traffic so your controller can receive it.)
If it doesn't fire, try to find the code that disables/re-enables the IRQ
and why that might fail (race conditions, deadlocks, that stuff).
Welcome to the fascinating world of debugging embedded systems.
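
When a breakpoint is not practical, the same question can be approached
with counters. The sketch below is only one possible way to do it: all
names (rx_snapshot, rx_diag_take, rx_diag_classify, the three counters) are
hypothetical, and it assumes the driver increments the counters at the
obvious places and that traffic is being sent to the device between the two
snapshots:

```c
#include <assert.h>   /* only needed by checks that exercise this code */
#include <stdint.h>

/* Counters a driver could bump: on ISR entry, after a successful
 * netif->input(), and whenever pbuf_alloc() returns NULL. */
static volatile uint32_t isr_entries;
static volatile uint32_t frames_delivered;
static volatile uint32_t alloc_failures;

struct rx_snapshot {
    uint32_t isr_entries;
    uint32_t frames_delivered;
    uint32_t alloc_failures;
};

enum rx_verdict {
    RX_HEALTHY,     /* interrupts fire and frames reach the stack */
    RX_IRQ_SILENT,  /* no ISR entry at all: look at the controller/driver */
    RX_STARVED      /* ISR runs, but every frame is dropped for lack of pbufs */
};

/* Take a copy of the counters, e.g. from a low-priority monitor task. */
struct rx_snapshot rx_diag_take(void)
{
    struct rx_snapshot s = { isr_entries, frames_delivered, alloc_failures };
    return s;
}

/* Compare two snapshots taken some time apart while traffic is present. */
enum rx_verdict rx_diag_classify(struct rx_snapshot before,
                                 struct rx_snapshot after)
{
    if (after.isr_entries == before.isr_entries)
        return RX_IRQ_SILENT;
    if (after.frames_delivered == before.frames_delivered &&
        after.alloc_failures != before.alloc_failures)
        return RX_STARVED;
    return RX_HEALTHY;
}
```

RX_IRQ_SILENT points at the controller or driver, while RX_STARVED points
at pbufs not being freed quickly enough on the consumer side.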

There is another possibility, that you are violating the threading rules
somewhere.
Please check it: https://www.nongnu.org/lwip/2_1_x/pitfalls.html



Re: Receive path stuck due to pbuf_alloc returning NULL

Mary Jane
Hi Sergio,

I finally managed to solve the problem. As I said, FreeRTOS has nothing to
do with the interrupts; the problem was that the ISR just didn't fire.

But you are right in the sense that the problem is not with lwIP, it is all
about the EMAC driver.

This issue is no longer related to lwIP, but other people using lwIP with
the TMS570 may hit the same problem, so here is the link to the solution:
https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/678879

Regards,
Mary Jane


