Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS


Claudius Zingerli
Hello all,

I'm working on a project using lwIP 1.4.1, FreeRTOS 7.4.2 on an
STM32F407 MCU.
I have several UDP/TCP/Multicast services running well, but when I tried
to measure TCP bandwidth with Iperf as well as with dd|nc, I get very
low results.
Iperf basically just sends a lot of data, and my lwIP application simply
discards it (using netconn_recv(); netbuf_delete() or
netconn_recv_tcp_pbuf(); pbuf_free()).

An analysis with Wireshark shows the following:
(TCP_MSS=TCP_WND=1460)
- SYN,SYNACK,ACK,PSH,PSH (as usual)
- ZeroWindow (client stuck), WindowUpdate (some ms later)
- PSH, ZeroWindow, WindowUpdate,...

As I understand it, this is how TCP works: bandwidth is quite low (a few
hundred kB/s) with these settings, but it works.
When I try to increase TCP_WND to, e.g., 5kB, the following problems arise:
- Dup ACKs (from lwIP)
- lots of retransmissions (from Linux)
The bandwidth then drops to the B/s to kB/s range at most. I have spent
hours on this but have no clue where to look next. Any ideas what the
reason could be? (Iperf from Linux to Linux runs at full line speed.)

One interesting thing: I get about 0.5% packet loss with ping -f
(100 pings per second; the lost packets never seem to arrive at the
Ethernet interrupt). MCU load is always quite low (I have a low-priority
blink task that still gets its CPU time, as well as ...).
Things I have already checked or fixed (my design is based on ST's Ethernet code):
- checked all stacks, NULL pointers and malloc failures
- checked that each pbuf fits into the Tx buffer
- checked that there is enough pbuf memory to fit each Rx packet
- in packet reception, I try to drain the input queue (by checking
  DMARxDescToGet->Status & ETH_DMARxDesc_OWN)
- ETH_DMASR_RBUS is cleared in low_level_input()

I have simply run out of ideas for fixing the problem. Is this about tuning
lwipopts.h? Attached is my current version.

Best regards

Claudius

_______________________________________________
lwip-users mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-users

Attachment: lwipopts.h (9K)

Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

FreeRTOS info
On 21/06/2013 08:59, Claudius Zingerli wrote:

> Hello all,
>
> I'm working on a project using lwIP 1.4.1, FreeRTOS 7.4.2 on an
> STM32F407 MCU.
> I have several UDP/TCP/Multicast services running well, but when I tried
> to measure TCP bandwidth with Iperf as well as with dd|nc, I get very
> low results.
> Iperf basically just sends a lot of data and lwIP drops it (using
> netconn_recv();netbuf_delete() or netconn_recv_tcp_pbuf();pbuf_free();)
>
> An analysis with Wireshark shows the following:
> (TCP_MSS=TCP_WND=1460)
> - SYN,SYNACK,ACK,PSH,PSH (as usual)
> - ZeroWindow (client stuck), WindowUpdate (some ms later)
> - PSH, ZeroWindow, WindowUpdate,...


Ever so slightly off topic -

It sounds like there are lots of people doing good work with FreeRTOS
and lwIP here, and I'm sorry I don't get the time to contribute to these
threads more often.  In the past I have attempted to maintain an
"example integration" running in the FreeRTOS Win32 simulator, but
projects discussed here go far beyond that.

I would be very grateful if people could occasionally post frameworks of
their code in the FreeRTOS Interactive site for others to reference.

http://interactive.freertos.org

Regards,
Richard.

+ http://www.FreeRTOS.org
Designed for microcontrollers. More than 103000 downloads in 2012.

+ http://www.FreeRTOS.org/plus
Trace, safety certification, FAT FS, TCP/IP, training, and more...


Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

ella
The problem is not FreeRTOS but the buggy and ugly STM32 netif driver. I have studied the original driver provided by ST and had no choice but to rewrite it.

Just one example of the flawed architecture of this driver. This is from low_level_output():

    buffer =  (u8 *)(DMATxDescToSet->Buffer1Addr);
    for(q = p; q != NULL; q = q->next)
    {
      /* no check that l + q->len stays within ETH_TX_BUF_SIZE */
      memcpy((u8_t*)&buffer[l], q->payload, q->len);
      l = l + q->len;
    }

Consider that the buffers are allocated as
extern uint8_t Tx_Buff[ETH_TXBUFNB][ETH_TX_BUF_SIZE];
and are linked to chained DMA descriptors.

If a packet is bigger than ETH_TX_BUF_SIZE, the copy runs past the end of the current buffer into the next one (and, for the last buffer, past the end of Tx_Buff), and the code does not handle this. The same happens in the RX path, so it is no surprise you have problems with big packets.
And this is only one place; there are a number of others, as well as a few races.
In short: DO NOT USE THIS DRIVER.


Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Claudius Zingerli
Dear Ella,

Well, well... That's what I'm currently considering as well: a complete
rewrite of the MAC driver. ST's code is definitely ugly and inconsistent,
and often seems to have been copy-pasted from older code without really
being adapted to the new devices/functions.
I already fixed the problem you mentioned by ASSERTing that l + q->len
does not exceed the buffer size. (ST's driver deals with this to some
extent later, by splitting a frame over multiple buffers in
ETH_Prepare_Transmit_Descriptors(...), but I'm not sure whether that still
works with the last chained DMA descriptor.)

Before I write my own MAC driver, I wanted to get a benchmark running
with the original code to compare it with the LPC Iperf benchmarks. They
implemented a zero-copy MAC driver for lwIP that achieves almost line
speed (at much lower clock rates than the ST part).
For my device, ping round-trip time is between 130us and 250us (one switch
between the Linux host and the STM32F407 running at 150MHz), but with
non-zero packet loss; TCP is unreliable, and UDP seems to work but has
not been benchmarked yet.

So: are there any open (BSD/GPL), stable and ideally zero-copy MAC
drivers for STM32F4x7 + FreeRTOS available? Maybe I can get some
inspiration from ChibiOS's driver, as it seems to be written largely
without ST code.

Regards

Claudius


On 6/23/2013 7:16 AM, ella wrote:

> The problem is not FreeRTOS but the buggy and ugly STM32 netif driver. I have
> studied the original driver provided by ST and had no choice but to rewrite it.
>
> Just one example of the flawed architecture of this driver. This is from
> low_level_output():
>
>      buffer =  (u8 *)(DMATxDescToSet->Buffer1Addr);
>      for(q = p; q != NULL; q = q->next)
>      {
>        memcpy((u8_t*)&buffer[l], q->payload, q->len);
>        l = l + q->len;
>      }
>
> Consider that buffers are allocated as
> extern uint8_t Tx_Buff[ETH_TXBUFNB][ETH_TX_BUF_SIZE];
> and are linked to chained DMA descriptors.
>
> If a packet is bigger than ETH_TX_BUF_SIZE, you are in potential danger of
> a wrap-around that is not handled in the code. The same happens in the RX
> flow, so no surprise you have problems with big packets.
> And this is only one place; there are a number of others. There are also a
> few races.
> In short: DO NOT USE THIS DRIVER.

[OT] Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Claudius Zingerli
In reply to this post by FreeRTOS info
Dear Richard,

I completely agree with your request to put code online. But I'm
currently working on some quite fundamental problems, so at this stage a
svn/git-like repository (or links to one) would be much more practical
than uploading a zip of alpha-level code. I could come back to that at a
later stage.

Claudius

On 6/21/2013 1:11 PM, FreeRTOS Info wrote:
> I would be very grateful if people could occasionally post frameworks of
> their code in the FreeRTOS Interactive site for others to reference.
>
> http://interactive.freertos.org



Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Claudius Zingerli
In reply to this post by ella
Hi ella and all,

Some progress here: I am seeing a lot of CRC and alignment errors (in the
STM32's MMC counters). There is at least /some/ correlation between these
errors and lwIP behaving strangely: when the MMC counters increase, lwIP
gets into trouble; when they don't, lwIP mostly works fine. This may in
turn be related to using RMII between the MAC and PHY and generating the
RMII clock with the STM32 PLL. According to the datasheet, jitter and
precision should be OK for the PHY, but this might not be the cleanest
solution (the datasheet hints that the RMII clock should preferably be
sourced without going through the PLL). So as a next step, I'm going to
use a dedicated 50MHz oscillator to clock the PHY and the MCU.
On the software side, my own implementation of the MAC driver is on the
way. It could probably be open-sourced if there is interest.

Claudius



Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Jeff Barlow
On 7/1/2013 12:56 AM, Claudius Zingerli wrote:
> ...This may further be related to the usage of RMII between the MAC
> and PHY and creating the RMII-Clock with the STM32-PLL. Datasheet
> jitter and precision should be OK for the PHY, but this might not be
> the cleanest solution (There is some hint in the datasheet that good
> guys should source the RMII clock by bypassing the PLL). So in a
> next step, I'm going to use a dedicated 50MHz oscillator to clock the
> PHY and MCU.

This is a known issue. Deriving the RMII-Clock with the STM32-PLL is
just not a robust design.

There are several PHY chips (Micrel, etc.) that have built-in RMII clock
generators that can use a low-cost 25MHz crystal and provide a nice
low-jitter clock back to the MCU.
--
Later,
Jeff


Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

ella
In reply to this post by Claudius Zingerli
Hi,
The clock jitter is documented in the ST errata. (This document is not easy to find; instead of hiding it, ST should have put it in the datasheet in big bold letters.) As far as I understand it, you cannot use the ST PLL for either MII or RMII, since in both cases it does not meet the PHY's long-term jitter requirement. Without digging deeper into the meaning of this erratum and its possible consequences, we used an external 25MHz crystal with MII. (For RMII you will need a 50MHz one.)

As for open source, I also thought about it, but ST's ignorance stopped me. I think a respectable company like ST has to take care of the ugly source code it provides on its website. This is not only true of the Ethernet driver; their Peripheral Library is in exactly the same state. (In my projects I do not use it at all and write my own library, adding support for different HW modules as the need arises.)
So I decided not to help them and not to disclose any code. But if you are looking for cooperation, I'm in. The final goal is a stable driver with zero-copy. To my understanding, the MAC and DMA peripherals of the STM32F2xx are sufficient for that.

Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Claudius Zingerli
In reply to this post by Jeff Barlow
On 7/1/2013 9:00 PM, Jeff Barlow wrote:
> On 7/1/2013 12:56 AM, Claudius Zingerli wrote:
[RMII-Clock from PLL is bad]
> This is a known issue. Deriving the RMII-Clock with the STM32-PLL is
> just not a robust design.
>
> There are several PHY chips (Micrel, etc) that have built in RMII clock
> generators that can use a low cost 25MHz crystal and provide a nice low
> jitter clock back to the MCU.

In the final design, we plan to use a 3-port switch from Micrel. It can be
clocked from 25MHz or 50MHz, but the board I'm using to develop the
software has a DP83848, which does need a 50MHz clock source for RMII.
The STM32F4 /might/ be able to handle 50MHz as a main clock source. (The
datasheet is ambiguous about that: the drawing says 4-26MHz HSE, but the
table says 1-50MHz HSE. One could interpret this as: above 26MHz one has
to use an oscillator, while below that a crystal would work as well.)

Regards

Claudius


Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Jeff Barlow
On 7/2/2013 4:50 AM, Claudius Zingerli wrote:
> In the final design, we plan use a 3-port switch from Micrel. It can be
> clocked from 25MHz or 50MHz, but the board I'm using to develop the
> software has a DP83848 that does need a 50MHz clock source for RMII.
> STM32F4 /might/ be able to handle 50MHz as a main clock source.

I see. For a one-off dev board I think it's always less frustrating to
just use a separate 50MHz oscillator. The older PHY chips can be really
fussy about clock jitter. I think once you get away from that DP83848
you'll find things less painful.

I was just suggesting using the Micrel RMII clock output to feed the
RMII clock input on the MCU. I've never tried feeding a 50MHz clock into
the HSE on a STM32F407 but it strikes me as risky. I do seem to recall
that some of the PHYs also have a direct 25MHz output that would work
for that. Don't know if that includes your switch chip, however.
--
Later,
Jeff


Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Krzysztof Wesołowski

We use a 25MHz HSE on the STM32F4, forward the same clock via MCO to the Micrel PHY, and then use the Micrel's 50MHz output for RMII.

On Jul 2, 2013 8:50 PM, "Jeff Barlow" wrote:
> I was just suggesting using the Micrel RMII clock output to feed the RMII clock input on the MCU. I've never tried feeding a 50MHz clock into the HSE on a STM32F407 but it strikes me as risky. I do seem to recall that some of the PHYs also have a direct 25MHz output that would work for that. Don't know if that includes your switch chip, however.

Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

ella
Hi,
Could you please tell me the exact part number of the Micrel PHY that can work in RMII mode with an external 25MHz crystal? I'd like to try it as well.
Thanks.

Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Claudius Zingerli
Hi ella,

On 7/3/2013 5:51 AM, ella wrote:
 > Can you tell me please exact part number of the Micrel PHY that can
 > work in RMII with external 25MHz crystal. I'd like to try it as well.

We plan to use KSZ8863RLL. It can be clocked from 25MHz (xtal,osc) or
50MHz (osc). No practical experience yet. Anyone using that IC as well?

Regards
Claudius

PS: Using a 50MHz oscillator for the DP83848 results in: ping with 0ppm
packet loss and 92us/117us/258us min/avg/max RTT, and 92Mbps TCP receive
bandwidth with Iperf on an STM32F407 clocked at 150MHz, connected to a
fast Linux PC via a D-Link USB Fast Ethernet adapter.


Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Pomeroy, Marty
 

>> > Can you tell me please exact part number of the Micrel PHY

>> We plan to use KSZ8863RLL.

We're using the KSZ8031. It has been working at 100MHz for about a year
with an LPC1788 over RMII.

Marty


Re: Low Iperf performance of lwip 1.4.1 on STM32 and FreeRTOS

Jeff Barlow
In reply to this post by ella
On 7/2/2013 8:51 PM, ella wrote:
> Can you tell me please exact part number of the Micrel PHY that can work in
> RMII with external 25MHz crystal.

I think most of the newer Micrel parts work that way. Have a look at
<http://www.micrel.com/index.php/en/products/lan-solutions/phys.html>

I'd guess other vendors' recent designs would be similar.
--
Later,
Jeff
