I've been troubleshooting an RX performance issue for a couple of weeks now
and while I've made incremental gains and improved my own driver-side code,
I'm still missing something big here and I suspect that I'm likely just
misusing lwIP or have a bad config. Any help would be greatly appreciated.
- Port: Unix
- Hardware: Modern desktop-class hardware for both RX and TX
- API: Socket (I know the raw API is more performant).
- Protocol of concern: TCP
For each of the following tests I'm sending and receiving blobs of data of
exponentially increasing size (from 4 MB to 256 MB). All sizes exhibit roughly
the same problem over the course of their TCP streams.
- (1) OS native sockets TX -> OS native sockets RX (for baseline)
- (2) lwIP socket API TX -> OS native RX (no issue here)
- (3) OS native TX -> lwIP socket RX (poor performance)
- (4) lwIP socket TX -> lwIP socket RX (exceedingly poor performance)
- In the capture file I have attached, (126.96.36.199) is the native OS stack
sending to lwIP at (188.8.131.52)
I have tested against both the macOS and Linux stacks; both react similarly
to the odd lwIP behavior.
In all test configurations (1/2/3/4) at home on my LAN I get roughly
comparable throughput and everything behaves just fine. However when I test
from a workstation at my office to home things change. For test (2) I see
about 90% of baseline, but for test (3) I usually see 25% of baseline (very
rarely it performs around 90% like test (2)). And test (4) is absolutely
horrendous with about 5% of baseline.
When looking at the attached packet capture, I see a smattering of DUP ACKs,
retransmissions, and in some cases ACKs that seem out of order or extremely
old but that are not duplicates!
It appears that eventually the sending side decides to reduce the segment size.
I've turned on the relevant DEBUG options and observed delayed ACKs from lwIP.
I've also turned on STATS and am not noticing any errors whatsoever.
- Given that I only see this from my office I am suspecting that either
packet loss or the increased latency is creating a situation that my current
lwIP config isn't handling well.
- Since I can observe >95% of baseline when on my LAN I doubt there is a
bottleneck in my code or lwIP itself.
- Since the performance seems to get worse with the addition of (lwIP RX)
on either side I'm suspecting that if I fix my issue in test (3) it should
also fix test (4).
- After much reading, a common theme of dropped frames comes up. I've
inspected and simplified my Ethernet driver to the point where I don't
believe this is a possibility, especially given how well it performs on my
LAN.
- I've read that TCP_TMR_INTERVAL is tick based and not based on an actual
timer. I've toyed around with this value, lowering it to 1 and raising it to
4000, but I feel this is a shortsighted approach asking for trouble. I've
looked at the unix port and it does look like it's using a clock, so I'm
somewhat confused on this point.
- Initially I thought this could be a window-scaling issue, so I've bumped
it up to a pretty high value for testing.
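For reference, the window-scaling knobs I'm experimenting with in lwipopts.h
look roughly like this (the values here are illustrative, sized for a
high-latency path, not a tuned recommendation):

```c
/* lwipopts.h fragment -- illustrative values, not a tuned recommendation. */
#define TCP_MSS            1460
#define LWIP_WND_SCALE     1                 /* enable RFC 1323 window scaling */
#define TCP_RCV_SCALE      4                 /* shift for the advertised window */
#define TCP_WND            (256 * TCP_MSS)   /* ~365 KB receive window */
#define TCP_SND_BUF        (256 * TCP_MSS)
#define TCP_SND_QUEUELEN   (4 * TCP_SND_BUF / TCP_MSS)
#define MEMP_NUM_TCP_SEG   TCP_SND_QUEUELEN
#define PBUF_POOL_SIZE     1024              /* enough pbufs to back the windows */
```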
- Am I out in the weeds?
- What else can I do to narrow down the issue?
- What are some reasonable values to explore in lwipopts.h given my setup?
Your msg is too long for me, I'm too lazy to read it and too dumb to
keep focus at the same time.
Your capture file is long too, but fortunately retransmissions happen
right at the beginning.
I see you are ACKing 100 ms later, several frames later.
I see (at least once) that you ACK a frame and then, milliseconds later, you
ACK again, even several times in a row (frame #177 and starting at #182).
That looks (to me) like a time base problem; check your sys_now() and
your port. I'm more of the bare metal type so I can't tell you much more
on how to set up an OS port. I've seen the unix port long ago and used it
as bare metal; I don't know how it will hand timing info to lwIP.
And... 2814 bytes per frame? Jumbo frames? Can you try with more
common MTUs over the Internet? Just in case.
Try to run a perf test over UDP; this will take the timers out of the
scenario and you can check for possible frame loss. The UDP datagrams should
be numbered, though.
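A rough sketch of what I mean, with plain BSD sockets over loopback just to
show the mechanics (in a real test the sender and receiver sit on your two
hosts; the function name and structure here are my invention):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send npkts numbered UDP datagrams to ourselves over loopback, then
 * drain the socket and count lost/reordered datagrams.  Returns 0 on
 * success.  Over a real path, run the two halves on separate hosts. */
int udp_loss_probe(int npkts, uint32_t *lost, uint32_t *reordered)
{
    *lost = 0;
    *reordered = 0;

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;
    socklen_t alen = sizeof(addr);
    getsockname(rx, (struct sockaddr *)&addr, &alen);

    /* Sender side: every datagram carries its sequence number. */
    for (uint32_t seq = 0; seq < (uint32_t)npkts; seq++) {
        uint32_t wire = htonl(seq);
        sendto(tx, &wire, sizeof(wire), 0,
               (struct sockaddr *)&addr, sizeof(addr));
    }

    /* Receiver side: drain the queue, count losses and reordering. */
    uint32_t received = 0;
    int64_t highest = -1;
    for (;;) {
        uint32_t wire;
        if (recv(rx, &wire, sizeof(wire), MSG_DONTWAIT) < 0)
            break;                           /* queue empty */
        received++;
        int64_t seq = (int64_t)ntohl(wire);
        if (seq < highest)
            (*reordered)++;                  /* arrived after a later one */
        else
            highest = seq;
    }
    *lost = (uint32_t)npkts - received;

    close(rx);
    close(tx);
    return 0;
}
```

Over loopback both counters should be zero; over your office-to-home path,
nonzero counts point at loss on the path or in your driver.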
In any case, violating threading rules causes lots of strange artifacts,
make sure you don't.
Thank you for taking a look and thank you for the suggestions. After
considering your idea about testing with plain UDP, I decided to try ICMP
packet-loss measurements and found that it was indeed my driver dropping
frames on occasion. Ugh.
Fixing my driver-side code has resolved all of the issues previously
described.