lwIP Checksum routine


lwIP Checksum routine

Sathya Thammanur
Hi all,
The lwip_chksum() function in lwip/src/core/inet.c seems to be unoptimized: it reads and adds halfwords. Wouldn't it be better to do word reads and word additions? Some prologue and epilogue code would be needed to handle the parts of the buffer that are halfword-aligned but not word-aligned. Has anyone tried this in any of their ports? Or am I missing something here?

Thanks,
Sathya
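A minimal C sketch of the word-access idea, assuming a C99 compiler. chksum_word() (a hypothetical name) sums 32-bit words into a 64-bit accumulator, with a halfword prologue to reach a 4-byte boundary and an epilogue for a trailing halfword and odd byte. chksum_halfword() is a halfword-at-a-time reference in the style of the generic loop, kept only for comparison; neither is lwIP's actual code. Word loads are legal here because each 32-bit word holds two halfwords and 2^16 ≡ 1 (mod 0xffff), so adding the whole word gives the same one's complement sum as adding the two halves, in either byte order. Loads go through memcpy() to stay free of alignment and aliasing problems; many compilers turn an aligned 4-byte memcpy() into a single load, but check the generated code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator down to 16 bits, adding carries back in
 * (end-around carry), as the Internet checksum requires. */
static uint16_t fold16(uint64_t acc)
{
    while (acc >> 16) {
        acc = (acc & 0xffffu) + (acc >> 16);
    }
    return (uint16_t)acc;
}

/* Reference: one native-order halfword at a time. */
uint16_t chksum_halfword(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t acc = 0;
    uint16_t h;

    for (; len > 1; len -= 2, p += 2) {
        memcpy(&h, p, 2);
        acc += h;
    }
    if (len == 1) {                       /* odd trailing byte, zero-padded */
        uint8_t last[2] = { p[0], 0 };
        memcpy(&h, last, 2);
        acc += h;
    }
    return fold16(acc);
}

/* Word-access variant (hypothetical). Gives the same result for any
 * alignment; the prologue only matters for speed. */
uint16_t chksum_word(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t acc = 0;
    uint16_t h;
    uint32_t w;

    /* Prologue: one halfword to move a 2-aligned pointer to a 4-byte boundary. */
    if (((uintptr_t)p & 2u) && len >= 2) {
        memcpy(&h, p, 2);
        acc += h;
        p += 2;
        len -= 2;
    }
    /* Main loop: 32-bit loads; each word carries two halfwords. */
    for (; len >= 4; len -= 4, p += 4) {
        memcpy(&w, p, 4);
        acc += w;
    }
    /* Epilogue: trailing halfword, then odd byte. */
    if (len >= 2) {
        memcpy(&h, p, 2);
        acc += h;
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        uint8_t last[2] = { p[0], 0 };
        memcpy(&h, last, 2);
        acc += h;
    }
    return fold16(acc);
}
```

Both functions return the uncomplemented sum in host byte order. Whether the word version is faster is a property of the CPU, not of the C code.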


_______________________________________________
lwip-users mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/lwip-users

Re: lwIP Checksum routine

Jim Gibbons
We did an optimization for one port (Nios II).  This is very CPU dependent.  In our particular case, we did better with 16-bit accesses owing to a slow shifter.  We got the best results by handling 8 halfwords per pass of an outer loop.  This allowed us to use small constant offsets that could be encoded in the load instructions, e.g., acc += data[0]; acc += data[1]; and so on.  The loop overhead and the pointer update (data += 8) then became a much smaller fraction of the CPU time.
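The unrolled loop described above might look like this in portable C. It is a sketch, not the Nios II port's code: chksum_unrolled8() is a hypothetical name, and it assumes 2-byte-aligned input of at most 32768 halfwords (one maximum-size IP datagram), which keeps the 32-bit accumulator from overflowing. Because the offsets 0 to 7 are constants, a compiler can typically fold them into the load instructions' displacement field; whether it does is target dependent.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-way unrolled halfword checksum (hypothetical sketch).
 * Assumes nhalf <= 32768 so 'acc' cannot overflow; an odd trailing
 * byte is left to the caller. Returns the uncomplemented sum. */
uint16_t chksum_unrolled8(const uint16_t *data, size_t nhalf)
{
    uint32_t acc = 0;

    /* Eight loads at constant offsets, then one pointer update, so the
     * loop overhead is amortised over eight additions. */
    while (nhalf >= 8) {
        acc += data[0]; acc += data[1]; acc += data[2]; acc += data[3];
        acc += data[4]; acc += data[5]; acc += data[6]; acc += data[7];
        data += 8;
        nhalf -= 8;
    }
    /* Remaining 0..7 halfwords. */
    while (nhalf--) {
        acc += *data++;
    }
    /* Fold carries back in; two folds suffice for any 32-bit value. */
    acc = (acc & 0xffffu) + (acc >> 16);
    acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)acc;
}
```

Eight was the factor that suited that CPU; the right unroll factor elsewhere is something to measure.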

But, as I said, this stuff is very CPU dependent.  Considering that, I think that the core code is as it should be. 

It's a simple thing to change for your particular CPU, so I would urge you to do so.  I would also urge you to try a couple of different things and measure your results.  We were surprised when we found that full word accesses weren't good for us, and you may find some surprising things with your CPU.

You might also want to check your Ethernet chip.  Some of the newer ones can compute the checksums in hardware at transmission time.
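In later lwIP releases (not the 2005 inet.c discussed in this thread), both suggestions map onto lwipopts.h settings: LWIP_CHKSUM lets a port substitute its own checksum routine, LWIP_CHKSUM_ALGORITHM selects among the generic C variants when LWIP_CHKSUM is not overridden, and the CHECKSUM_GEN_* options stop lwIP from generating checksums in software when the MAC inserts them on transmit. A config sketch, assuming such a release; my_port_chksum is a hypothetical name whose prototype must match lwip_standard_chksum() in that release's inet_chksum.c.

```c
/* lwipopts.h fragment (assumes a later lwIP release). */

/* Option A: route checksums through a port-specific routine.
 * my_port_chksum is hypothetical; match lwip_standard_chksum()'s prototype. */
#define LWIP_CHKSUM  my_port_chksum

/* Option B (instead of A): keep the generic C code but choose one of its
 * built-in variants (1, 2 or 3), measuring each on your CPU. */
/* #define LWIP_CHKSUM_ALGORITHM  3 */

/* Only if the Ethernet MAC really inserts checksums on transmit:
 * skip generating them in software. */
#define CHECKSUM_GEN_IP   0
#define CHECKSUM_GEN_UDP  0
#define CHECKSUM_GEN_TCP  0
```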

Good luck!

--
Jim Gibbons
[hidden email]
Gibbons and Associates, Inc.
TEL: (408) 984-1441
900 Lafayette, Suite 704, Santa Clara, CA
FAX: (408) 247-6395

Re: lwIP Checksum routine

Sathya Thammanur
Thanks for the reply, Jim.

Sathya

On 11/14/05, Jim Gibbons <[hidden email]> wrote:

Re: lwIP Checksum routine

Ashutosh Srivastava
In reply to this post by Jim Gibbons
Thanks for this optimization info. I have already started coding the
checksum computation in assembly for my processor.

Can anyone suggest other critical parts of lwIP that give a
performance gain when optimized in assembly?

Thanks,
Ashutosh
----- Original Message -----
Sent: Tuesday, November 15, 2005 4:52 AM
Subject: Re: [lwip-users] lwIP Checksum routine

Re: Re: lwIP Checksum routine

timmy brolin
In reply to this post by Sathya Thammanur
The checksum routine should really be written in assembly. By writing it in assembly you can take advantage of the carry flag. This is not possible in C.

A very efficient assembly version first loads a large chunk of data into registers using a "load multiple" instruction, then adds all the 16- or 32-bit registers together using an "add with carry" instruction, looping as many times as necessary.

Processors with a 32-bit "add with carry" instruction can compute the checksum very quickly this way, but even a 16-bit "add with carry" instruction gives good results.
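C has no direct access to the carry flag, but the carry can be recovered with a comparison, which gives a portable baseline to measure hand-written assembly against. A minimal sketch, assuming a C99 compiler: add_eac() and chksum_eac() are hypothetical names, and the 16-byte memcpy() into four locals only stands in for a "load multiple". Whether the compiler emits real load-multiple or add-with-carry instructions depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's complement 32-bit add: after the plain add, (acc < w) is 1 exactly
 * when the addition wrapped, i.e. it recovers the carry-out; adding it back
 * is the end-around carry. The second add cannot wrap again. */
static uint32_t add_eac(uint32_t acc, uint32_t w)
{
    acc += w;
    return acc + (acc < w);
}

/* Block-oriented checksum (hypothetical): 16 bytes per pass into four
 * locals, each added with end-around carry. Returns the uncomplemented
 * 16-bit sum in host byte order. */
uint16_t chksum_eac(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t acc = 0;
    uint32_t w[4];
    uint16_t h;

    for (; len >= 16; len -= 16, p += 16) {
        memcpy(w, p, sizeof w);
        acc = add_eac(acc, w[0]);
        acc = add_eac(acc, w[1]);
        acc = add_eac(acc, w[2]);
        acc = add_eac(acc, w[3]);
    }
    for (; len >= 4; len -= 4, p += 4) {     /* leftover whole words */
        memcpy(&w[0], p, 4);
        acc = add_eac(acc, w[0]);
    }
    if (len >= 2) {                          /* leftover halfword */
        memcpy(&h, p, 2);
        acc = add_eac(acc, h);
        p += 2;
        len -= 2;
    }
    if (len == 1) {                          /* odd byte, zero-padded */
        uint8_t last[2] = { p[0], 0 };
        memcpy(&h, last, 2);
        acc = add_eac(acc, h);
    }
    /* Fold 32 -> 16 bits; two folds are always enough. */
    acc = (acc & 0xffffu) + (acc >> 16);
    acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)acc;
}
```

Because 0xffff divides 2^32 - 1, a 32-bit end-around-carry sum folds to the same 16-bit result as summing halfwords directly.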

If you are looking for other things to optimise, make sure that routines such as memcpy() and memset() use either DMA or "load/store multiple" assembly instructions.

/Timmy Brolin

-----Original Message-----
From: "Ashutosh Srivastava" <[hidden email]>
To: "Mailing list for lwIP users" <[hidden email]>
Date: Tue, 15 Nov 2005 12:26:13 +0530
Subject: Re: [lwip-users] lwIP Checksum routine