TCP state machine problem? LWIP 1.4.1

TCP state machine problem? LWIP 1.4.1

Fabian Koch-2

Hey all,

 

we had some weird behavior with a TCP connection on LWIP 1.4.1 when the peer (non-LWIP) has a cable disconnect:

 

  • LWIP has an established TCP connection #1 running fine
  • Peer has a cable disconnect
  • Our application on top of LWIP runs into a receive timeout (500 ms) and closes the socket
  • Peer reconnects cable
  • Our application opens a new connection #2 which again is established and running fine
  • The FIN/ACK and PSH/ACK retransmissions of connection #1 also reach the peer, which answers with RST/ACK
  • This keeps on looping until we restart the whole machine with LWIP

 

Also, I have a sort of “netstat” implemented on top of the LWIP socket API. It iterates over all possible sockets we have and, wherever it finds a valid conn pointer, prints the details (local address, remote address, port, TCP state and so on). Connection #1 no longer shows up in this view!
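
One possible reason connection #1 disappears from such a view: once the application closes the socket, the socket slot no longer points at the connection, while the stack may still keep the TCP control block on its own internal list until the close handshake completes or times out. The sketch below is a toy model of that mismatch, not lwIP code; `pcb_model`, `socket_model` and both counting functions are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCP control block kept by the stack. */
struct pcb_model {
    int conn_id;                   /* e.g. 1 for connection #1 */
    const struct pcb_model *next;  /* next entry on the stack's internal list */
};

/* Hypothetical stand-in for one socket slot owned by the application. */
struct socket_model {
    const struct pcb_model *pcb;   /* NULL once the application closed the socket */
};

/* What a socket-based "netstat" sees: only slots that still hold a pcb. */
size_t count_via_sockets(const struct socket_model *socks, size_t n)
{
    size_t count = 0;
    assert(socks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (socks[i].pcb != NULL) {
            count++;
        }
    }
    return count;
}

/* What the stack itself still tracks: every pcb on its internal list. */
size_t count_via_pcb_list(const struct pcb_model *head)
{
    size_t count = 0;
    for (const struct pcb_model *p = head; p != NULL; p = p->next) {
        count++;
    }
    return count;
}
```

With connection #1 closed by the application but still on the stack's list, the socket view reports one connection and the list view reports two. A debugging view that walks the stack's own pcb lists (rather than the socket table) would therefore be the place to look for the state of connection #1.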

 

In my mind, the TCP state machine should be in FIN_WAIT_1 while the peer cable is disconnected?

And it should just jump to either CLOSED or TIME_WAIT when receiving the RSTs upon cable reconnect?
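
For reference, the transitions asked about here can be written down as a small model. This is a sketch of the RFC 793 active-close path only, not lwIP's `tcp_process()`; the enum and function names are made up for illustration, and states such as CLOSE_WAIT are left out. Note that under RFC 793 an acceptable RST takes a synchronized connection to CLOSED, not TIME_WAIT.

```c
#include <assert.h>

/* Subset of the RFC 793 states that matter for an active close. */
enum state_model {
    ST_CLOSED, ST_ESTABLISHED, ST_FIN_WAIT_1,
    ST_FIN_WAIT_2, ST_CLOSING, ST_TIME_WAIT
};

/* Events in this model (hypothetical names). */
enum event_model {
    EV_APP_CLOSE,            /* application closes the socket, FIN is sent */
    EV_RX_ACK_OF_FIN,        /* peer acknowledges our FIN */
    EV_RX_FIN,               /* peer sends its own FIN */
    EV_RX_RST_ACCEPTABLE,    /* RST that passes the acceptability check */
    EV_RX_RST_UNACCEPTABLE   /* RST that fails the check and is ignored */
};

enum state_model next_state(enum state_model s, enum event_model ev)
{
    assert((int)s >= (int)ST_CLOSED && (int)s <= (int)ST_TIME_WAIT);

    /* An acceptable RST tears the connection down from any of these states. */
    if (ev == EV_RX_RST_ACCEPTABLE) {
        return ST_CLOSED;
    }
    /* An unacceptable RST changes nothing, so retransmissions continue. */
    if (ev == EV_RX_RST_UNACCEPTABLE) {
        return s;
    }

    switch (s) {
    case ST_ESTABLISHED:
        if (ev == EV_APP_CLOSE) return ST_FIN_WAIT_1;
        break;
    case ST_FIN_WAIT_1:
        if (ev == EV_RX_ACK_OF_FIN) return ST_FIN_WAIT_2;
        if (ev == EV_RX_FIN) return ST_CLOSING;
        break;
    case ST_FIN_WAIT_2:
        if (ev == EV_RX_FIN) return ST_TIME_WAIT;
        break;
    case ST_CLOSING:
        if (ev == EV_RX_ACK_OF_FIN) return ST_TIME_WAIT;
        break;
    default:
        break;
    }
    return s; /* event not handled in this simplified model */
}
```

In this model, a connection stuck in FIN_WAIT_1 that keeps retransmitting despite incoming RSTs corresponds to the `EV_RX_RST_UNACCEPTABLE` branch, which is one hypothesis worth checking against the pcap.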

 

I attached a clipped pcap that shows only connection #1, with the problem starting at packet #19. Imagine the final exchange going on forever to understand the problem ;o)

 

Any comments or debugging ideas appreciated.

 

Kind regards

Fabian

 


_______________________________________________
lwip-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-users

Attachment: LWIP_1_4_1_TCP_state.pcapng (15K)

Re: TCP state machine problem? LWIP 1.4.1

Sergio R. Caprile
mmm... the ACK number..., I think I've seen this one or two years ago,
search the list and/or the patches for "one less" or something like that.
I'm not fresh on this, but I think that is the problem, the ACK to the
RST has the wrong number and causes a retransmission. I can't remember
if this is also related to the half-closed connection; you might check
on that too.
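
If the "one less" remark refers to a sequence or ACK number that is off by one, one place where that would matter is the check that decides whether an incoming RST is accepted at all. The sketch below follows the RFC 793 rule (in SYN-SENT the RST must acknowledge the SYN; otherwise its sequence number must fall inside the receive window) using modulo-2^32 arithmetic. It is an illustration under those assumptions, not a copy of lwIP's code; `conn_model`, `seg_model` and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the connection fields used by the check. */
struct conn_model {
    uint32_t snd_nxt;   /* next sequence number we will send */
    uint32_t rcv_nxt;   /* next sequence number we expect to receive */
    uint32_t rcv_wnd;   /* current receive window size */
    int      syn_sent;  /* non-zero while the connection is in SYN-SENT */
};

/* Hypothetical view of the header fields of an incoming RST segment. */
struct seg_model {
    uint32_t seqno;
    uint32_t ackno;
};

/* True if seq is inside the receive window, computed modulo 2^32.
 * A zero window only accepts seq == rcv_nxt (RFC 793, zero-length segment). */
int seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    if (rcv_wnd == 0) {
        return seq == rcv_nxt;
    }
    /* Unsigned subtraction wraps, so this also works across 0xFFFFFFFF -> 0. */
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/* Returns 1 if the RST should reset the connection, 0 if it is ignored. */
int rst_acceptable(const struct conn_model *c, const struct seg_model *s)
{
    assert(c && s);
    if (c->syn_sent) {
        return s->ackno == c->snd_nxt;
    }
    return seq_in_window(s->seqno, c->rcv_nxt, c->rcv_wnd);
}
```

An RST whose sequence number is one below `rcv_nxt` fails this check and is ignored, so the local side would keep retransmitting. Comparing the RST's sequence number in the pcap against the last ACK number sent by the LWIP side would show whether that is what happens here.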

Re: TCP state machine problem? LWIP 1.4.1

goldsimon@gmx.de
On 08.03.2019 15:47, Sergio R. Caprile wrote:
> mmm... the ACK number..., I think I've seen this one or two years ago,
> search the list and/or the patches for "one less" or something like that.
> I'm not fresh on this, but I think that is the problem, the ACK to the
> RST has the wrong number and causes a retransmission. I can't remember
> if this is also related to the half-closed connection; you might check
> on that too.

I know this might not be an option, but 1.4.1 is *really* old and this
one as well as numerous other things might already be fixed in one of
the newer versions.

Regards,
Simon

Re: TCP state machine problem? LWIP 1.4.1

Fabian Koch-2
Hey Simon,

> I know this might not be an option, but 1.4.1 is *really* old and
> this one as well as numerous other things might already be fixed
> in one of the newer versions.

Yes, unfortunately this is not an option. We considered updating to 2.0.3 a while ago and I tried integrating it, but it requires quite a few changes that we can't justify for the old product.
Is there a chance to find a specific fix/patch for this?

I searched the bug reports and patches on Savannah as well as the mailing list, but did not find anything that really matches.

Any hints would be much appreciated.

Kind regards
Fabian

Re: TCP state machine problem? LWIP 1.4.1

Sergio R. Caprile
I guess this is what I remember... not exactly your problem nor a
helping hand but perhaps you can start digging here:
http://savannah.nongnu.org/bugs/?48328


Re: TCP state machine problem? LWIP 1.4.1

Fabian Koch-2
Update:

Even after copying the top part of tcp_process() from the current LWIP master into my 1.4.1 working copy, I still get this RST/ACK ping-pong.

I fear that something in tcp_receive(), or even deeper, has changed as well.

At least in the referenced bug report, the current state of master seems to have helped. My only remaining idea is to blame the peer's IP stack, but that is not very productive.

Any further ideas or help on this?

Kind regards
Fabian

Re: TCP state machine problem? LWIP 1.4.1

Sergio R. Caprile
Nope, I'm sorry, no further ideas on my side.
Since you can't upgrade, perhaps diffing against git commits around that
time or against git head would provide a clue on what to change.
There could be another bug report more related to your problem, I just
didn't find it but I haven't done an exhaustive search.
Good luck

