Throughput benchmark question

Re: Throughput benchmark question - nasty ~1.5 second pauses

Dave Nadler
Hi all - Back to look at this delay issue. Update:
I studied the driver and the ST-provided FreeRTOS/LwIP/CMSIS glue and it all looks OK,
unlike the extremely buggy code ST provides for the STM32F7xx series.

Again, after a lost packet, the host (a PC running Windows 10) issues a single duplicate ACK and waits.
LwIP receives the single duplicate ACK and, by design, ignores it (tcp_in.c lines 1207-1227).
LwIP then takes two passes through slow_tmr (0.5-second intervals) before retransmitting the lost packet.
Hence the nasty >1 second delay.
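
To make the single-dupack stall concrete, here is a minimal sketch in C of the classic three-duplicate-ACK rule. It is not lwIP's code: sender_state, on_ack, and DUPACK_THRESHOLD are hypothetical names, and the sketch ignores sequence-number wraparound and the extra conditions a real stack checks before counting an ACK as a duplicate.

```c
#include <stdbool.h>
#include <stdint.h>

#define DUPACK_THRESHOLD 3  /* classic fast-retransmit threshold */

/* Hypothetical, heavily simplified sender state (not lwIP's struct tcp_pcb). */
typedef struct {
    uint32_t last_ack;  /* highest cumulative ACK number seen so far */
    uint8_t  dupacks;   /* consecutive duplicate ACKs for last_ack */
} sender_state;

/* Feed one incoming ACK number.
 * Returns true exactly when the duplicate count reaches the threshold,
 * which is the moment fast retransmit would fire. */
static bool on_ack(sender_state *s, uint32_t ackno)
{
    if (ackno == s->last_ack) {
        if (s->dupacks < UINT8_MAX) {
            s->dupacks++;
        }
        return s->dupacks == DUPACK_THRESHOLD;
    }
    /* Any other ACK number is treated as new data being acknowledged. */
    s->last_ack = ackno;
    s->dupacks = 0;
    return false;
}
```

With a receiver that sends only one duplicate ACK, on_ack never returns true, so the lost segment is recovered only when the retransmission timer expires.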

I tried patching LwIP to immediately retransmit on a duplicate ack (line 1215):
              if (pcb->dupacks >= 1 /* DRN kludge prevents 1+sec delays after lost packet, should be: 3 */) {
                /* Do fast retransmit (checked via TF_INFR, not via dupacks count) */
                tcp_rexmit_fast(pcb);
              }
Wireshark now shows a TCP out-of-order packet, which it did not show unpatched (after the 1.5 sec delay):

http://www.nadler.com/backups/20190503_Lwip_pause_kludgeFix.pcapng

Is this OK? Or is there something wrong in tcp_rexmit_fast?
The client picks up and continues happily regardless, and I get the desired throughput with no problematic long pauses...

Thanks!
Best Regards, Dave

On 3/15/2019 5:56 PM, Dave Nadler wrote:
To recap: LwIP 2.1.2 on FreeRTOS 9, STM32F429, IPv4, TCP.
I want to see how much I can consistently push through the stack.
I made a simple test server (sockets API) which repeatedly outputs 101-character lines.
I access the server via PuTTY raw mode on Windows over a local network.
I can usually send 3 lines per msec for a second (3000 lines in 1 second), but...
sometimes I get ~1-second pauses (as seen in PuTTY or TeraTerm).
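
For reference, a test sender of this kind might look roughly like the sketch below. It is an assumption, not the actual test server: the line layout (a counter, padding, and CRLF, 101 characters in total) and the names fill_line and send_all are made up for illustration. It uses POSIX send(); on an lwIP target the sockets API header would be lwip/sockets.h instead.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>  /* on an lwIP target: #include "lwip/sockets.h" */

#define LINE_LEN 101  /* assumption: 99 payload characters plus "\r\n" */

/* Build one fixed-length line: zero-padded counter, '.' padding, CRLF.
 * buf must have room for LINE_LEN characters plus the terminating NUL. */
static void fill_line(char buf[LINE_LEN + 1], unsigned long counter)
{
    int n = snprintf(buf, LINE_LEN + 1, "%010lu ", counter);
    memset(buf + n, '.', (size_t)(LINE_LEN - 2 - n));
    buf[LINE_LEN - 2] = '\r';
    buf[LINE_LEN - 1] = '\n';
    buf[LINE_LEN] = '\0';
}

/* send() may accept fewer bytes than requested, so loop until done.
 * Returns 0 on success, -1 if the connection failed or was closed. */
static int send_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t sent = send(fd, data, len, 0);
        if (sent <= 0) {
            return -1;
        }
        data += sent;
        len -= (size_t)sent;
    }
    return 0;
}
```

A server loop would accept a connection, then call fill_line and send_all repeatedly with an increasing counter; the counter makes a missing or repeated line easy to spot in the terminal output.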

Here's the capture:

http://www.nadler.com/backups/20190227_Lwip_pause.pcapng

Everything is going swimmingly until packet 4316 in the capture.

The windows client notes a missing segment and issues a duplicate ACK as expected.
This exact pattern is quite repeatable.
FreeRTOS is running happily during the evil pause (LED blinky task uninterrupted).

Why does the LwIP application take ~1.5 seconds to retransmit the data?

Again, thanks for your time and any hints...
Best Regards, Dave


-- 
Dave Nadler, USA East Coast voice (978) 263-0097, [hidden email], Skype 
 Dave.Nadler1

_______________________________________________
lwip-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-users

Re: Throughput benchmark question - nasty ~1.5 second pauses

goldsimon@gmx.de
On 05.05.2019 at 15:47, Dave Nadler wrote:

> Hi all - Back to look at this delay issue. Update:
> I studied the driver and ST-provided FreeRTOS/LwIP/cmsis glue and all
> looks AOK,
> unlike the extremely buggy code ST provides for ST32F7xx series.
>
> Again, after a lost packet, the host (PC running Windows 10) issues a
> _*single*_ duplicate-ack and waits.
> LwIP receives the single duplicate ack and by design _*ignores it*_
> (tcp_in.c lines 1207-1227).
> LwIP takes two passes through slow_tmr (.5 sec intervals) before
> retransmitting the lost packet.
> Hence nasty >1 second delay.
>
> I tried patching LwIP to _immediately_ retransmit on a duplicate ack
> (line 1215):
>                if (pcb->dupacks >= 1 /* DRN kludge prevents 1+sec delays
> after lost packet, should be: 3 */) {
>                  /* Do fast retransmit (checked via TF_INFR, not via
> dupacks count) */
>                  tcp_rexmit_fast(pcb);
>                }
> Wireshark shows a TCP out-of-order packet, which it did not do unpatched
> (after the 1.5 sec delay):
>
> http://www.nadler.com/backups/20190503_Lwip_pause_kludgeFix.pcapng
>
> Is this OK? Or is there something wrong in tcp_rexmit_fast?

No, fast_rexmit is supposed to kick in after 3 dupacks, not at the first
dupack.

Regards,
Simon


Re: Throughput benchmark question - nasty ~1.5 second pauses

Dave Nadler
On 5/6/2019 3:12 PM, [hidden email] wrote:
> On 05.05.2019 at 15:47, Dave Nadler wrote:
>> Hi all - Back to look at this delay issue. Update:
>> I studied the driver and ST-provided FreeRTOS/LwIP/cmsis glue and all
>> looks AOK,
>> unlike the extremely buggy code ST provides for ST32F7xx series.
>>
>> Again, after a lost packet, the host (PC running Windows 10) issues a
>> _*single*_ duplicate-ack and waits.
>> LwIP receives the single duplicate ack and by design _*ignores it*_
>> (tcp_in.c lines 1207-1227).
>> LwIP takes two passes through slow_tmr (.5 sec intervals) before
>> retransmitting the lost packet.
>> Hence nasty >1 second delay.
>>
>> I tried patching LwIP to _immediately_ retransmit on a duplicate ack
>> (line 1215):
>>               if (pcb->dupacks >= 1 /* DRN kludge prevents 1+sec delays after lost packet, should be: 3 */) {
>>                 /* Do fast retransmit (checked via TF_INFR, not via dupacks count) */
>>                 tcp_rexmit_fast(pcb);
>>               }
>> Wireshark shows a TCP out-of-order packet, which it did not do unpatched
>> (after the 1.5 sec delay):
>>
>> http://www.nadler.com/backups/20190503_Lwip_pause_kludgeFix.pcapng
>>
>> Is this OK? Or is there something wrong in tcp_rexmit_fast?
>
> No, fast_rexmit is supposed to kick in after 3 dupacks, not at the first
> dupack.


Thanks Simon, I can see that is the code's intent, except the PC never sends more than one dupack.
Sorry if I wasn't clear, my precise question is:
Why is tcp_rexmit_fast resulting in an out-of-order packet as reported by Wireshark?
Is something wrong in tcp_rexmit_fast?
Thanks again,
Best Regards, Dave



Re: Throughput benchmark question - nasty ~1.5 second pauses

goldsimon@gmx.de
On 06.05.2019 at 23:22, Dave Nadler wrote:

> On 5/6/2019 3:12 PM, [hidden email] wrote:
>> On 05.05.2019 at 15:47, Dave Nadler wrote:
>>> Hi all - Back to look at this delay issue. Update:
>>> I studied the driver and ST-provided FreeRTOS/LwIP/cmsis glue and all
>>> looks AOK,
>>> unlike the extremely buggy code ST provides for ST32F7xx series.
>>>
>>> Again, after a lost packet, the host (PC running Windows 10) issues a
>>> _*single*_ duplicate-ack and waits.
>>> LwIP receives the single duplicate ack and by design _*ignores it*_
>>> (tcp_in.c lines 1207-1227).
>>> LwIP takes two passes through slow_tmr (.5 sec intervals) before
>>> retransmitting the lost packet.
>>> Hence nasty >1 second delay.
>>>
>>> I tried patching LwIP to _immediately_ retransmit on a duplicate ack
>>> (line 1215):
>>>                if (pcb->dupacks >= 1 /* DRN kludge prevents 1+sec delays
>>> after lost packet, should be: 3 */) {
>>>                  /* Do fast retransmit (checked via TF_INFR, not via
>>> dupacks count) */
>>>                  tcp_rexmit_fast(pcb);
>>>                }
>>> Wireshark shows a TCP out-of-order packet, which it did not do unpatched
>>> (after the 1.5 sec delay):
>>>
>>> http://www.nadler.com/backups/20190503_Lwip_pause_kludgeFix.pcapng
>>>
>>> Is this OK? Or is there something wrong in tcp_rexmit_fast?
>>
>> No, fast_rexmit is supposed to kick in after 3 dupacks, not at the first
>> dupack.
>
>
> Thanks Simon, I can see that is the code's intent, except the PC never
> sends more than one dupack.
> Sorry if I wasn't clear, my precise question is:
> Why is tcp_rexmit_fast resulting in an out-of-order packet as reported
> by Wireshark?
> Is something wrong in tcp_rexmit_fast?

If I understood you right, the out-of-order packet is only shown if you
modify tcp_rexmit_fast to be called after the first dupack?

If so, Wireshark probably just cannot handle the retransmission taking
place so soon with only one dupack, suspects someone has fiddled
around with the TCP stack instead of keeping to the specs, and just marks
it as out-of-order... ;-)

Regards,
Simon
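
As a rough illustration of how a capture tool could end up labelling an early retransmission as out-of-order, here is a simplified sketch in C of a timing-based classifier. It is not Wireshark's code: the type and function names are hypothetical, and the thresholds (a 3 ms out-of-order window, at least two duplicate ACKs, and a 20 ms window for fast retransmission) are assumptions based on my reading of the Wireshark User's Guide, which lists more conditions than are modelled here.

```c
#include <stdint.h>

/* Assumed thresholds; check them against the Wireshark documentation. */
#define OOO_WINDOW_MS         3.0
#define FAST_REXMIT_DUPACKS   2u
#define FAST_REXMIT_WINDOW_MS 20.0

typedef enum {
    SEG_NORMAL,
    SEG_OUT_OF_ORDER,
    SEG_FAST_RETRANSMISSION,
    SEG_RETRANSMISSION
} seg_class;

/* Hypothetical per-segment summary as a capture analyzer might see it. */
typedef struct {
    uint32_t seq;                   /* sequence number of this segment */
    uint32_t next_expected;         /* highest seq + len seen so far, same direction */
    unsigned dupacks_seen;          /* duplicate ACKs seen in the reverse direction */
    double   ms_since_last_segment; /* gap to the previous segment, same direction */
    double   ms_since_last_ack;     /* gap to the last ACK in the reverse direction */
} seg_info;

static seg_class classify(const seg_info *s)
{
    if (s->seq >= s->next_expected) {
        return SEG_NORMAL;          /* new data, nothing unusual */
    }
    /* Old sequence number arriving almost immediately: assume reordering. */
    if (s->ms_since_last_segment <= OOO_WINDOW_MS) {
        return SEG_OUT_OF_ORDER;
    }
    if (s->dupacks_seen >= FAST_REXMIT_DUPACKS &&
        s->ms_since_last_ack <= FAST_REXMIT_WINDOW_MS) {
        return SEG_FAST_RETRANSMISSION;
    }
    return SEG_RETRANSMISSION;      /* e.g. a timer-driven retransmit */
}
```

Under these assumptions, a retransmit sent 1.5 seconds late with one dupack is classified as a plain retransmission, while the same segment resent within a few milliseconds of the previous one is classified as out-of-order. Whether the patched capture actually falls inside that window depends on its timestamps.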
