[bug #57452] CHECKSUM_ON_COPY leads to random bit error in TCP checksum

[bug #57452] CHECKSUM_ON_COPY leads to random bit error in TCP checksum

Simon Goldschmidt
URL:
  <https://savannah.nongnu.org/bugs/?57452>

                 Summary: CHECKSUM_ON_COPY leads to random bit error in TCP checksum
                 Project: lwIP - A Lightweight TCP/IP stack
            Submitted by: vbrzeski
            Submitted on: Thu 19 Dec 2019 09:42:25 PM UTC
                Category: None
                Severity: 3 - Normal
              Item Group: Faulty Behaviour
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None
            lwIP version: git head

    _______________________________________________________

Details:

Hello,

I am having problems after developing a LWIP_CHKSUM_COPY() assembly routine.
My implementations of a MEMCPY that is faster than the standard library's, and
of a faster LWIP_CHKSUM that uses ADDC instructions, were both successful.
Unfortunately this improvement has caused me much frustration.

Naturally I assumed my assembly code was at fault, so I disabled all of these
macros, and found that the error still exists. Since the bug is consistent
(the checksum is off by 1 in its MSB), I tried to track it down, but after a
few days I have been unsuccessful.

I have done:
 - Sanity checks of my MEMCPY(), LWIP_CHKSUM_COPY() and LWIP_CHKSUM() - data
copied correctly and checksum correct
 - Turned off these features
 - Turned off my asm byteswapping functions
 - Turned off compiler optimizations (only constant folding)

The bug seems random when I chase it, and I cannot pin down the behavior that
reproduces it; however, it occurs very often. I have attached logs of debug
messages as well as a network capture taken with the sanity check override
disabled (by commenting out line 1595 in tcp_out.c).

Target: Infineon XC22xxM with ASIX Ax88796b netif

Thanks & regards,
-Victor




    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Thu 19 Dec 2019 09:42:25 PM UTC  Name: csum_copy_capture.pcapng  Size: 69KiB   By: vbrzeski
examples of checksum issues
<http://savannah.nongnu.org/bugs/download.php?file_id=48092>
-------------------------------------------------------
Date: Thu 19 Dec 2019 09:42:25 PM UTC  Name: teraterm.log  Size: 597KiB   By: vbrzeski
examples of checksum issues
<http://savannah.nongnu.org/bugs/download.php?file_id=48093>

    _______________________________________________________

Reply to this item at:

  <https://savannah.nongnu.org/bugs/?57452>

_______________________________________________
  Message sent via Savannah
  https://savannah.nongnu.org/


_______________________________________________
lwip-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/lwip-devel

[bug #57452] CHECKSUM_ON_COPY leads to random bit error in TCP checksum

Simon Goldschmidt
Follow-up Comment #1, bug #57452 (project lwip):

Have you tried TCP_CHECKSUM_ON_COPY_SANITY_CHECK? That should run
TCP_CHECKSUM_ON_COPY and traditional checksum in parallel and report errors at
the same time. If you put a breakpoint on the error-reporting lines or insert
code to dump state there, you might be able to debug what's wrong.
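
For reference, a minimal lwipopts.h sketch for turning on the sanity check
might look like the following. The two checksum option names are the ones
discussed in this thread; which debug switch gates the mismatch message is an
assumption here, so verify all of them against your lwIP tree.

```c
/* lwipopts.h fragment (sketch; verify names against your lwIP version) */

/* Compute the TCP checksum while copying data into the segment. */
#define LWIP_CHECKSUM_ON_COPY              1

/* Also compute the checksum the traditional way and compare the two. */
#define TCP_CHECKSUM_ON_COPY_SANITY_CHECK  1

/* Assumption: the mismatch is only reported through LWIP_DEBUGF, so debug
 * output has to be enabled, and TCP_DEBUG is assumed to be the relevant
 * switch. */
#define LWIP_DEBUG                         1
#define TCP_DEBUG                          LWIP_DBG_ON
```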


[bug #57452] CHECKSUM_ON_COPY leads to random bit error in TCP checksum

Simon Goldschmidt
Follow-up Comment #2, bug #57452 (project lwip):


[comment #1:]
> Have you tried TCP_CHECKSUM_ON_COPY_SANITY_CHECK? That should run
> TCP_CHECKSUM_ON_COPY and traditional checksum in parallel and report errors at
> the same time. If you put a breakpoint on the error-reporting lines or insert
> code to dump state there, you might be able to debug what's wrong.

Yes, I have.

I have even gone as far as writing a sanity check for each call to my own
LWIP_CHKSUM (compared against algorithm 2) and LWIP_CHKSUM_COPY (compared
against algorithm 2 and memcmp).
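
A host-side version of that kind of check could be sketched as follows. This
is not the code used on the target: ref_sum, portable_chksum_copy and sweep are
hypothetical names, the reference is a plain RFC 1071 style sum over big-endian
16-bit words rather than lwIP's algorithm 2, and a routine that returns its
result in a different byte order would need a swap before the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEN 64u

/* Signature assumed for the routine under test: copy len bytes, return sum. */
typedef uint16_t (*chksum_copy_fn)(void *dst, const void *src, size_t len);

/* Reference: 16-bit one's-complement sum over big-endian words.
 * Deliberately simple so it is easy to trust; fine for lengths up to 64 KiB. */
static uint16_t ref_sum(const uint8_t *data, size_t len)
{
    uint32_t acc = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        acc += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        acc += (uint32_t)data[len - 1] << 8;  /* pad odd trailing byte with 0 */
    }
    while (acc >> 16) {
        acc = (acc & 0xFFFFu) + (acc >> 16);  /* end-around carry */
    }
    return (uint16_t)acc;
}

/* Hypothetical stand-in for the optimized routine: memcpy, then sum. */
static uint16_t portable_chksum_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return ref_sum((const uint8_t *)dst, len);
}

/* Runs fn over every length 0..MAX_LEN at source offsets 0..3 and returns
 * the number of cases where the copy or the returned sum was wrong. */
static unsigned sweep(chksum_copy_fn fn)
{
    uint8_t src[MAX_LEN + 4];
    uint8_t dst[MAX_LEN];
    unsigned failures = 0;
    size_t i, off, len;

    for (i = 0; i < sizeof src; i++) {
        src[i] = (uint8_t)(i * 37u + 0xF1u);  /* arbitrary high-valued pattern */
    }
    for (off = 0; off < 4; off++) {
        for (len = 0; len <= MAX_LEN; len++) {
            uint16_t got = fn(dst, src + off, len);
            if (memcmp(dst, src + off, len) != 0 ||
                got != ref_sum(src + off, len)) {
                failures++;
            }
        }
    }
    return failures;
}

/* The stand-in must agree with the reference on every case. */
static void self_test(void)
{
    assert(sweep(portable_chksum_copy) == 0);
}
```

Sweeping odd lengths and unaligned source offsets is a deliberate choice, since
those are the cases where an optimized routine is most likely to diverge from a
byte-wise reference.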

I have set breakpoints, but the issue seems random. I did find a bug in my
netif driver, but it proved unrelated. There just seems to be a random bit
error that appears with neither the normal checksum routine nor my own
LWIP_CHKSUM.

When using TCP_CHECKSUM_ON_COPY_SANITY_CHECK, the incorrect checksum is
corrected in the packet itself, so Wireshark does not flag it; the error is
reported only through the DEBUGF output.

I have attached a capture, as well as a log. To find an example of the error,
search the log for "tcp_output_segment: calculated checksum is FF56 instead of FF55"

Every single one of the checksum errors is off by 1. I will dedicate more time
to debugging in a week or two.
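
As general background, and not a diagnosis of this bug: in 16-bit
one's-complement arithmetic, discarding the carry when two partial sums are
combined changes the result by exactly 1, which is one common way to end up
with an off-by-one checksum. The sketch below only illustrates that arithmetic;
oc_add and oc_add_dropped_carry are hypothetical helpers, not lwIP functions.

```c
#include <assert.h>
#include <stdint.h>

/* Correct one's-complement add: the carry out of bit 15 is added back in. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFFu) + (s >> 16));
}

/* Faulty variant: the carry out of bit 15 is silently discarded. */
static uint16_t oc_add_dropped_carry(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

static void carry_demo(void)
{
    /* 0xF000 + 0x2345 = 0x11345: the carry must wrap around. */
    assert(oc_add(0xF000u, 0x2345u) == 0x1346u);
    assert(oc_add_dropped_carry(0xF000u, 0x2345u) == 0x1345u);  /* off by 1 */

    /* Without a carry the two variants agree, so the fault looks random. */
    assert(oc_add(0x1234u, 0x1111u) == oc_add_dropped_carry(0x1234u, 0x1111u));
}
```

Because the two variants differ only when a carry occurs, a fault of this kind
depends on the data being summed, which can make it look intermittent.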

