tcp_active_pcbs corrupt after resetting connection ???

Terence D
Hi, I'm using lwIP 1.4.1 with a Texas Instruments Tiva Launchpad development
board (the TM4C1294).  I'm seeing an odd issue occur when resetting a
connection during stress testing.  The Tiva board is the TCP/IP server.  I
have a client application I made that runs on a Windows machine and connects
automatically when starting up.

My stress test consists of making multiple connections to the board.  A new
connection - a new instance of the Windows client app - is made every two
seconds until there are six connections.  I then wait five seconds, kill all
of the Windows client apps and start over making new connections every two
seconds.

The Tiva dev board does some various send/receive communication between the
client app and the board - which all appears to work fine - but then under
certain logic will reset the client connection using the following code:

        tcp_arg(pcb, NULL);
        tcp_sent(pcb, NULL);
        tcp_recv(pcb, NULL);
        tcp_err(pcb, NULL);
        tcp_close(pcb);

This code runs in lwIPHostTimerHandler, which of course is called
from the Ethernet interrupt handler.  Based on other examples, I see no
problem with this.  However, intermittently, it appears to corrupt the
tcp_active_pcbs linked list in lwIP's tcp.c.

Maybe one out of five times, at some point after resetting the client
connection using the above code, the firmware code will get stuck usually in
tcp_fasttmr of tcp.c.  This is because one of the pcb "next" pointers will
point to itself or one of the earlier pcb pointers.  This results in an
infinite loop in tcp_fasttmr as it walks the list - it never gets to a null
next pointer.

Anyone have any idea of what may be happening here?
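
The symptom described above (a "next" pointer that points to itself or to an earlier pcb) can be caught early with a cycle check run before the list is walked. The following is only a minimal sketch: `struct node` is a hypothetical stand-in for lwIP's `struct tcp_pcb`, and `list_has_cycle` is not an lwIP function. It uses Floyd's tortoise-and-hare technique, so it needs no extra memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcp_pcb: only the "next" link matters. */
struct node {
    struct node *next;
};

/* Returns true if following "next" from head never reaches NULL.
 * Floyd's cycle detection: "fast" advances two links per step and
 * "slow" one, so they can only meet again if the list loops. */
static bool list_has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}
```

In a debug build, a check like this could be called on the list head right after the close sequence and again at the start of the timer handler, which would narrow down when the list first becomes circular.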



--
Sent from: http://lwip.100.n7.nabble.com/lwip-users-f3.html

_______________________________________________
lwip-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-users

Re: tcp_active_pcbs corrupt after resetting connection ???

goldsimon@gmx.de
On 27.03.2019 at 20:05, Terence D wrote:
> Hi, I'm using lwIP 1.4.1 with a Texas Instruments Tiva Launchpad development
> board (the TM4C1294).

Now that's a really old version of lwIP!

> [..]
> This code is run in the lwIPHostTimerHandler.  Which of course is called
> from the Ethernet interrupt handler.

Wait a minute, "of course"? Have you read this (mostly valid for 1.4.x
as well):
https://www.nongnu.org/lwip/2_1_x/pitfalls.html

> Based on other examples, I see no problem with this.

Ehrm, based on which examples? When running code from ETH interrupt
handler, you have to *know* what you are doing! Basically this means:
*no* access into lwIP from any other interrupt priority or main loop
*unless* the ethernet interrupt is disabled.
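
The rule above can be sketched as a guard around every lwIP call made from outside the Ethernet ISR. This is only an illustration: `eth_irq_disable`, `eth_irq_enable`, `lwip_raw_call` and `main_loop_touch_lwip` are hypothetical names, and the interrupt state is simulated with a flag so the sketch compiles on a host. On a TivaWare target the two stubs would presumably map to the driver library's interrupt enable/disable calls for the Ethernet interrupt; that mapping is an assumption to verify against the driver library documentation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the platform's interrupt-controller calls.
 * On real hardware these would mask and unmask the Ethernet interrupt. */
static bool eth_irq_enabled = true;
static void eth_irq_disable(void) { eth_irq_enabled = false; }
static void eth_irq_enable(void)  { eth_irq_enabled = true; }

/* Hypothetical stand-in for any raw-API call made outside the ISR
 * (tcp_write, tcp_close, ...). The assert documents the rule: code
 * outside the ISR must not enter lwIP while the ISR could preempt it. */
static unsigned lwip_calls = 0;
static void lwip_raw_call(void)
{
    assert(!eth_irq_enabled);
    lwip_calls++;
}

/* Called from the main loop or from another interrupt priority. */
static void main_loop_touch_lwip(void)
{
    eth_irq_disable();   /* the ISR can no longer preempt us inside lwIP */
    lwip_raw_call();
    eth_irq_enable();    /* let the ISR run again */
}
```

The critical section should stay short, since received frames are not serviced while the interrupt is masked.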

> However, intermittently, it appears to corrupt the
> lwIP's tcp.c's tcp_active_pcbs linked list.

And that is exactly what I would have given as an example of what goes
wrong if you don't watch out for what I wrote in the lines above ;-)

Try to clean up your code (in terms of execution threads) and if you
can, think about upgrading to a more recent version.

Regards,
Simon

Re: tcp_active_pcbs corrupt after resetting connection ???

Terence D
Hi Simon - Thanks for the reply.  I've made my own replies below:

> Hi, I'm using lwIP 1.4.1 with a Texas Instruments Tiva Launchpad development
> board (the TM4C1294).

>>Now that's a really old version of lwIP!

Yes, unfortunately 1.4.1 is the version that is packaged in TI's latest release of Tivaware and not a more recent version.

> [..]
> This code is run in the lwIPHostTimerHandler.  Which of course is called
> from the Ethernet interrupt handler.

>>Wait a minute, "of course"? Have you read this (mostly valid for 1.4.x
>>as well):

Yes, I've been working with lwIP on the Tiva for some time now and have been aware of these instructions for quite a while.  I have taken great care to make sure all lwIP calls (all tcp_* calls) are made on a single "thread", namely the Tiva's Ethernet interrupt.  For example, when I need to send new data from outside the Ethernet interrupt, I place the data in a thread-safe queue.  This queue is then processed (i.e. the data from the queue is sent using tcp_write and tcp_output) only during the Tiva's Ethernet interrupt.
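
The queue hand-off described above could look roughly like the following single-producer, single-consumer ring buffer. This is a sketch under stated assumptions, not the code from the actual project: all names (`tx_queue`, `tx_item`, `txq_init`, `txq_push`, `txq_pop`) and the slot sizes are hypothetical, and it assumes exactly one producer (the main loop) and one consumer (the Ethernet ISR). C11 atomics with release/acquire ordering ensure a slot's contents are visible before the index that publishes it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TXQ_SLOTS 8u  /* must be a power of two so index wrap-around works */

struct tx_item {
    uint16_t len;
    uint8_t  data[64];
};

struct tx_queue {
    struct tx_item slots[TXQ_SLOTS];
    atomic_uint head;  /* written only by the producer (main loop) */
    atomic_uint tail;  /* written only by the consumer (Ethernet ISR) */
};

static void txq_init(struct tx_queue *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* Producer side, called from the main loop. Returns false when full. */
static bool txq_push(struct tx_queue *q, const struct tx_item *item)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == TXQ_SLOTS) {
        return false;
    }
    q->slots[head % TXQ_SLOTS] = *item;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side, called from the ISR, which would then pass the item
 * to tcp_write and tcp_output. Returns false when empty. */
static bool txq_pop(struct tx_queue *q, struct tx_item *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail) {
        return false;
    }
    *out = q->slots[tail % TXQ_SLOTS];
    /* Release the slot back to the producer after copying it out. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because each index is written by only one side, neither side has to disable interrupts to use the queue, and no lwIP function is called from the producer context.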

> Based on other examples, I see no problem with this.

>>Ehrm, based on which examples? 

TI's Tivaware contains examples of using lwIP where the lwIPHostTimerHandler function makes tcp_ calls.  The examples have lwIPEthernetIntHandler() as the ISR; this calls lwiplib.c's lwIPServiceTimers(), which calls lwIPHostTimerHandler().

>>When running code from ETH interrupt
>>handler, you have to *know* what you are doing! Basically this means:
>>*no* access into lwIP from any other interrupt priority or main loop
>>*unless* the ethernet interrupt is disabled.

Totally agree.  I've taken great care in my code to always do this and, as far as I can tell, this is indeed what I am doing.  I never do any calls into lwIP from anywhere except the Ethernet interrupt handler.

> However, intermittently, it appears to corrupt the
> lwIP's tcp.c's tcp_active_pcbs linked list.

>>And I would have written that as an example if you hadn't watched out
>>for what I wrote in the lines above ;-)

Right, you're saying the tcp_active_pcbs linked list being corrupted like this is a common result of multiple execution contexts in lwIP code.  I could understand that.

>>Try to clean up your code (in terms of execution threads) and if you
>>can, think about upgrading to a more recent version.

The code is quite clean and, as an experienced developer, I've reviewed it many times to verify the logic is correct and no thread issues exist.  Other than this issue, the 1.4.1 lib is performing very well.  I'm doing a lot of communication with a number of clients over long periods of time without any issues.  Unfortunately, using non-Tivaware 3rd party code is probably not an option for me at this point.  I do understand 1.4.1 is quite an old version of lwIP, so if the answer boils down to "upgrade" being the advised solution, I could understand this.

Re: tcp_active_pcbs corrupt after resetting connection ???

goldsimon@gmx.de


On 27.03.19 22:00, Terence Darwen wrote:

> [..]

Well, I'm sorry to say I cannot tell you what's the reason for the bug
you're seeing. I cannot recall a specific bug leading to this, but of
course it could be that it's a bug that has been fixed by now. However,
using lwIP like that (only in the ETH interrupt) is quite uncommon and
hard to review, so I wouldn't put concurrency problems aside as a reason
here...

Regards,
Simon

Re: tcp_active_pcbs corrupt after resetting connection ???

Terence D


On Wed, Mar 27, 2019 at 4:13 PM Simon Goldschmidt <[hidden email]> wrote:


> [..]

Okay, no problem, Simon.  Thank you for the replies. 
