endless loop after window updates

Andre Puschmann
Hey folks,
I still have those (at least to me) curious problems with the stack. I
did some more research and figured out a way to drive lwIP into a kind of
endless loop with only 2 packets every 1500ms.
If I send many packets very quickly to a Windows box and then produce
some load on it (e.g. write the received data into a file), Windows XP
decreases its receive_wnd with every single ACK. lwIP handles this fine up
to the point where Windows resets its receive window (e.g. to
13720 bytes, packet #226 in the trace).
lwIP then sends as many packets as fit into the new window, which
probably isn't the right behavior, is it?
After the following ACK, lwIP continues bursting out packets regardless
of the sequence numbers XP is expecting.
I am still wondering why the whole system then gets into this
endless loop with a transfer rate of 1 packet per second.
It is always the first of two transmitted packets that gets lost, which
seems very odd, at least to me.
Furthermore, so far there is no way to "heal" the connection.

Any suggestions what might be the problem or where to pay more attention
to are more than welcome!


I uploaded a trace which shows the described behavior.
bzipped: http://www.puschmann.net/public/stream_with_wnd_update.tar.bz2
(~130k)

unzipped: http://www.puschmann.net/public/stream_with_wnd_update.cap (~300k)



Kind regards


Andre



_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users

Re: endless loop after window updates

Kieran Mansley
On Mon, 2006-11-13 at 20:20 +0100, Andre Puschmann wrote:

> Hey folks,
> I still have those (at least to me) curious problems with the stack. I
> did some more research and figured out a way to drive lwIP into a kind of
> endless loop with only 2 packets every 1500ms.
> If I send many packets very quickly to a Windows box and then produce
> some load on it (e.g. write the received data into a file), Windows XP
> decreases its receive_wnd with every single ACK. lwIP handles this fine up
> to the point where Windows resets its receive window (e.g. to
> 13720 bytes, packet #226 in the trace).
> lwIP then sends as many packets as fit into the new window, which
> probably isn't the right behavior, is it?

Looks fine to me: there's no obvious sign of congestion, so it can send
as many packets (up to the advertised receive window) as it likes.  The
other end continues to advertise more space to send into, and so lwIP
carries on sending.  No problems here as far as I can see.

> After the following ACK, lwIP continues bursting out packets regardless
> of the sequence numbers XP is expecting.
> I am still wondering why the whole system then gets into this
> endless loop with a transfer rate of 1 packet per second.
> It is always the first of two transmitted packets that gets lost, which
> seems very odd, at least to me.

No packets are getting lost.  In each burst there are three packets from
lwIP.  The first is a retransmission of a previous packet, the second
has a bad checksum (and so will need to be retransmitted, as the receiver
will throw it away; that retransmission forms the first packet of the
next burst) and the third is fine.  There is then a delay while lwIP
waits to see if it gets acknowledgements for the data.  Because one packet was
thrown away, it doesn't, and so it retransmits the missing packet and
can then carry on; this is the start of the next burst.  Because it
interprets having to retransmit as a sign of congestion, it can now
only send a few packets at a time, rather than the long streams it was
sending before.  The connection would "heal" automatically if it didn't keep
sending packets with bad checksums.

I've not seen anything get into such a repetitive and regular state
before.  Something is clearly wrong.  It is almost as if the act
of retransmitting a packet is causing another one to get a bad checksum.
That is the sort of thing I think you need to look into.

Hope that helps,

Kieran




Re: endless loop after window updates

Andre Puschmann
Kieran Mansley wrote:

> On Mon, 2006-11-13 at 20:20 +0100, Andre Puschmann wrote:
>> Hey folks,
>> I still have those (at least to me) curious problems with the stack. I
>> did some more research and figured out a way to drive lwIP into a kind of
>> endless loop with only 2 packets every 1500ms.
>> If I send many packets very quickly to a Windows box and then produce
>> some load on it (e.g. write the received data into a file), Windows XP
>> decreases its receive_wnd with every single ACK. lwIP handles this fine up
>> to the point where Windows resets its receive window (e.g. to
>> 13720 bytes, packet #226 in the trace).
>> lwIP then sends as many packets as fit into the new window, which
>> probably isn't the right behavior, is it?
>
> Looks fine to me: there's no obvious sign of congestion, so it can send
> as many packets (up to the advertised receive window) as it likes.  The
> other end continues to advertise more space to send into, and so lwIP
> carries on sending.  No problems here as far as I can see.

You're right. I guess I was a bit confused about this large number of
packets.

>> After the following ACK, lwIP continues bursting out packets regardless
>> of the sequence numbers XP is expecting.
>> I am still wondering why the whole system then gets into this
>> endless loop with a transfer rate of 1 packet per second.
>> It is always the first of two transmitted packets that gets lost, which
>> seems very odd, at least to me.
>
> No packets are getting lost.  In each burst there are three packets from
> lwIP.  The first is a retransmission of a previous packet, the second
> has a bad checksum (and so will need to be retransmitted, as the receiver
> will throw it away; that retransmission forms the first packet of the
> next burst) and the third is fine.  There is then a delay while lwIP
> waits to see if it gets acknowledgements for the data.  Because one packet was
> thrown away, it doesn't, and so it retransmits the missing packet and
> can then carry on; this is the start of the next burst.  Because it
> interprets having to retransmit as a sign of congestion, it can now
> only send a few packets at a time, rather than the long streams it was
> sending before.  The connection would "heal" automatically if it didn't keep
> sending packets with bad checksums.
>
> I've not seen anything get into such a repetitive and regular state
> before.  Something is clearly wrong.  It is almost as if the act
> of retransmitting a packet is causing another one to get a bad checksum.
> That is the sort of thing I think you need to look into.

What I mean is that, to lwIP, it looks like the first of two packets always
gets lost. It's true that Windows drops this packet because of a bad checksum,
but after lwIP retransmits the packet (with the same checksum), Windows
accepts it, and that confuses me. I can't see any differences between those
two packets.


> Hope that helps,
>
> Kieran

Kind Regards

André




Re: Re: endless loop after window updates

Kieran Mansley
On Tue, 2006-11-14 at 20:38 +0100, Andre Puschmann wrote:

> It's true that Windows drops this packet because of a bad checksum,
> but after lwIP retransmits the packet (with the same checksum), Windows
> accepts it, and that confuses me. I can't see any differences between those
> two packets.

There are differences in the payloads.  Compare, for example, frames 227
and 247 (247 is a retransmission of 227).  The end of the payload of
frame 227 (as displayed by Ethereal) is:

04c0  dd e3 e0 e0 e0 e0 d2 d5  d2 d5 d4 d5 d3 d3 d2 d6   ........ ........
04d0  d5 d7 d5 d5 d6 d5 d8 d6  d4 d4 d9 d5 da d6 d6 d6   ........ ........
04e0  d6 d9 d7 d5 d6 db d7 d7  d6 d8 d6 d8 d8 db dc d9   ........ ........
04f0  db d8 d8 d6 da dd db da  d9 de dc de d8 da dc da   ........ ........
0500  dc db dc de dd df de db  d9 db dc e2 dd dc dc dc   ........ ........
0510  de e0 e0 df de e0 e3 e0  e1 e0 e2 e0 e1 e2 e1 e0   ........ ........
0520  e3 e0 e1 e3 e0 dd e2 df  e5 e1 e2 e0 e5 e1 e1 e4   ........ ........
0530  df e2 e0 e2 e2 e3 e2 e0  e6 e2 e5 e0 e0 e4 e1 e4   ........ ........
0540  df e2 df e2 e4 e1 e1 e3  df e0 df e3 e2 e0 e3 e1   ........ ........
0550  dd e4 df df e0 e2 e3 e4  e6 e1 e1 e4 e0 df e1 e1   ........ ........
0560  e1 de df e0 e3 e1 de e4  e2 e0 e1 dd df e2 e1 e0   ........ ........
0570  e3 e4 e2 de e0 e0 e1 de  de e0 de df dd e0 e0 df   ........ ........
0580  de e3 dc e0 de df df de  dd dd df dd dc dd d9 db   ........ ........
0590  da da da d9 da dd d9 df  dc db d8 da d8 db dc d7   ........ ........
05a0  dc d9 dc da da d9 db dc  dd d8 d8 da d7 dc bc 00   ........ ........
05b0  05 a0                                              ..

The end of the payload of frame 247 (as displayed by Ethereal) is:

04c0  dd e3 e0 e0 e0 e0 e1 e0  e1 e2 e2 e0 df e1 e1 e1   ........ ........
04d0  dd e4 e1 e0 e2 e0 e3 e1  df e3 e2 e4 df e2 e2 e0   ........ ........
04e0  e0 de df df e3 e0 e2 e0  e0 e0 df e3 e3 e1 e2 e4   ........ ........
04f0  e4 e1 e1 e2 e1 e1 e1 e1  e5 de dc e0 e2 e2 df e1   ........ ........
0500  e2 e2 e1 de e0 e4 e1 e0  de de e2 df df e0 de de   ........ ........
0510  e0 e1 de db de e5 e1 e1  de dd de df dd de df de   ........ ........
0520  e5 dc df dd dd df dd db  dc de dd de df dd de dc   ........ ........
0530  de df db de dc da dc da  da da dc e0 da da d8 d8   ........ ........
0540  dc db dd de d5 d9 d7 d5  de d9 da d7 db da d6 d4   ........ ........
0550  da d7 dc d9 d5 d7 d9 d8  d8 d5 d5 d5 d7 d5 d6 d4   ........ ........
0560  d4 d3 d9 d2 d1 d5 d3 d5  d0 d4 d0 d5 d1 d3 d4 d2   ........ ........
0570  d3 d0 d5 d1 d1 d1 d1 d1  cf d1 cd cf d3 ce ce ce   ........ ........
0580  d0 cc cf cf ce ca ca ca  ca ca c7 cd cd cd c8 c8   ........ ........
0590  c8 cb c8 c8 c8 ca cb c8  ca c9 c7 c6 c4 c5 c7 c2   ........ ........
05a0  c4 c5 c6 c8 c4 c5 c4 c3  c3 c6 c5 c1 c5 c4 bc 00   ........ ........
05b0  05 a0

There are lots of differences between these.  However, as the checksum
is the same for both, the two payloads must have been identical when they
were checksummed, and the differences were introduced after that, for
example when the data was transferred from lwIP to your hardware.

Kieran


