RE : Optimizations for applications requiring limited functionality.

RE : Optimizations for applications requiring limited functionality.

Frédéric BERNON
Hi Roger,

>I have noticed a decrease in performance (about 40%)
40%? What does this figure measure? Maximum output bandwidth, number of cycles used, footprint? If I understand what you wrote, it was maximum bandwidth? And due to statistics alone? That seems strange...

About reducing code size ("raw" API, ARP, ICMP echo, TCP/IP, and UDP), I think you can set options like this (some are based on the latest CVS), but read opt.h and CHANGELOG for more details:

#define SYS_LIGHTWEIGHT_PROT            1 /* I assume you use a single-threaded model, or have your own thread-safety mechanism */
#define NO_SYS                          1 /* Same assumption. Simon Goldschmidt put it as "NO_SYS=1 means raw-API/polling only and NO_SYS=0 means netconn/socket-API & tcpip_thread()", which is true in most cases */
#define MEM_LIBC_MALLOC                 1 /* Set to 0 if you have no C runtime, or if its malloc is not efficient */
#define MEMP_SANITY_CHECK               0 /* Enable in debug builds if you suspect memory corruption */
#define LWIP_ARP                        1 /* Needed if you have an ARP device (mainly Ethernet), of course */
#define ARP_QUEUEING                    0 /* Reduces code size (if you do use it, note that this feature has received several patches since 1.2.0) */
#define ETHARP_TRUST_IP_MAC             0 /* Reduces the cycles consumed if your device receives a lot of packets */
#define ETHARP_TCPIP_INPUT              0 /* Not needed if you use the raw API */
#define ETHARP_TCPIP_ETHINPUT           0 /* Not needed if you use the raw API */
#define IP_FORWARD                      0 /* Not needed unless you are building a router or using loopif */
#define IP_OPTIONS                      1 /* Keep at 1, unless your device is deployed on a network you do not FULLY control */
#define IP_REASSEMBLY                   0 /* Reduces footprint, but you cannot receive fragmented packets */
#define IP_FRAG                         0 /* Reduces footprint, but you cannot send fragmented packets */
#define LWIP_RAW                        0 /* Warning: this is for raw PCBs (used, for example, to write a ping or traceroute tool), not for what is called the "raw API" (also known as the low-level "core", "callback", or native API) */
#define LWIP_DHCP                       0 /* You did not say you need it */
#define LWIP_SNMP                       0 /* You did not say you need it */
#define LWIP_IGMP                       0 /* You did not say you need it */
#define LWIP_UDP                        1 /* You said you need it */
#define LWIP_TCP                        1 /* You said you need it */

#define LWIP_NETIF_HOSTNAME             0 /* You did not say you need it */
#define LWIP_NETIF_API                  0 /* Can be used with the sequential API */
#define LWIP_NETIF_CALLBACK             0 /* You did not say you need it */
#define LWIP_HAVE_LOOPIF                0 /* You did not say you need it */

#define LWIP_EVENT_API                  0 /* I assume you use CALLBACK_API; set this to 1 only if you need the event API instead */

#define LWIP_COMPAT_SOCKETS             0 /* You don't need it */
#define LWIP_POSIX_SOCKETS_IO_NAMES     0 /* You don't need it */
#define LWIP_TCP_KEEPALIVE              0 /* You don't need it */
#define LWIP_SO_RCVTIMEO                0 /* You don't need it */
#define SO_REUSE                        0 /* Does not work */
#define LWIP_STATS                      0 /* You already said you disabled it */
#define PPP_SUPPORT                     0 /* You did not say you need it */

You can try this, or wait for other developers' comments... Note that I have not set any values for sizes or delays, which you will have to tune to get the performance you need...
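
To illustrate the "sizes" mentioned above, here is a rough sketch of the kind of size-related entries one might add to lwipopts.h. The option names are the ones I believe opt.h provides (check your version), and every value is a placeholder assumption, not a recommendation from this thread:

```c
/* lwipopts.h fragment (sketch): size-related options not covered by the list above.
 * All values are hypothetical placeholders and must be tuned for the target. */
#define PBUF_POOL_SIZE                  16            /* number of buffers in the pbuf pool */
#define PBUF_POOL_BUFSIZE               1536          /* size of each pool buffer, in bytes */
#define TCP_MSS                         1460          /* 1500-byte Ethernet MTU minus 40 bytes of IP and TCP headers */
#define TCP_SND_BUF                     (4 * TCP_MSS) /* TCP send buffer space, in bytes */
#define TCP_WND                         (4 * TCP_MSS) /* advertised TCP receive window, in bytes */
```

Larger buffers and windows generally trade RAM for throughput, so measuring after each change is the only reliable guide.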

====================================
Frédéric BERNON
HYMATOM SA
IT Project Manager
Microsoft Certified Professional
Tel.: +33 (0)4-67-87-61-10
Fax: +33 (0)4-67-70-85-44
Email: [hidden email]
Web Site: http://www.hymatom.fr
====================================

-----Original Message-----
From: lwip-users-bounces+frederic.bernon=[hidden email] [mailto:lwip-users-bounces+frederic.bernon=[hidden email]] On Behalf Of Roger Cover
Sent: Saturday, April 14, 2007 01:32
To: [hidden email]
Subject: [lwip-users] Optimizations for applications requiring limited functionality.


Greetings,

I have completed my upgrade to version 1.2.0 (from 0.6.3). The reliability of my system is much improved. The developers have done a good job increasing the robustness of the library.

Now that I have version 1.2.0 running, I have noticed a decrease in performance (about 40%). Part of the decrease I initially noticed was because my old lwipopts.h file used an old (and no longer correct) method to turn statistics collection off. I am using the "raw" API and I only need ARP, ICMP echo, TCP/IP, and UDP. The major performance issue is only in UDP.

This brings me to my question: I already found how to turn off the statistics and "raw" packet code. What else can I turn off in lwipopts.h to remove things I don't need? Version 1.2.0 has a lot of features that 0.6.3 did not have, and any help jump-starting my performance improvement quest will be greatly appreciated.

Regards,
Roger W. Cover
Spectral Instruments, Inc.
420 N. Bonita Ave.
Tucson, AZ 85745
Voice: 520-884-8821 ext. 144
FAX: 520-884-8803


_______________________________________________
lwip-users mailing list
[hidden email] http://lists.nongnu.org/mailman/listinfo/lwip-users

Re: RE : Optimizations for applications requiring limited functionality.

Andrew Lentvorski
Frédéric BERNON wrote:

> #define SO_REUSE                        0 /* Does not work */

What's the problem with SO_REUSE?

-a



RE: RE : Optimizations for applications requiring limited functionality.

Roger Cover
In reply to this post by Frédéric BERNON
Greetings Frédéric,

The performance decrease I measured was relative to version 0.6.3 of lwIP. The measurement is the total transfer time for a 33560192 byte data set from my instrument to an application on my PC using TCP/IP. The time was 13.98 seconds for lwIP 0.6.3 and 19.56 seconds for lwIP 1.2.0. I am using the same "driver", with minor modifications to accommodate the API changes in the lwIP code from 0.6.3 to 1.2.0, and the same applications on the PC and my embedded PPC405 processor. Removing the statistics improved the performance, but did not recover the entire 40%.

I will let you know what improvements I get from the lwipopts.h changes you suggested.

Regards,
Roger

Re: RE : Optimizations for applications requiring limited functionality.

Timmy Brolin
If you want to increase performance in a limited functionality application, perhaps you don't need the UDP checksum?
I think most of the CPU cycles related to TCP or UDP communication are consumed in the checksum calculation.

/Timmy
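
As a sketch of what dropping the UDP checksum could look like in lwipopts.h: the two option names below are the ones I believe opt.h provides for this, so treat them as an assumption and verify them against your lwIP version before relying on them.

```c
/* lwipopts.h fragment (sketch): skip software UDP checksums.
 * Assumes your lwIP version's opt.h defines these two options. */
#define CHECKSUM_GEN_UDP                0 /* do not compute checksums for outgoing UDP datagrams */
#define CHECKSUM_CHECK_UDP              0 /* do not verify checksums on incoming UDP datagrams */
```

UDP over IPv4 allows a datagram to be sent without a checksum, but doing so leaves only the link-layer CRC to catch corrupted payloads, so this is a trade-off rather than a free optimization.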


RE: Optimizations for applications requiring limited functionality.

Roger Cover
Greetings Timmy,
 
I have rewritten the checksum routine in assembler for my processor (as recommended by Adam Dunkels in the thread "Gigabit Ethernet and lwIP"). It is not my experience that this is the largest consumer of CPU cycles. ip_output_if() seems to be where my application spends over 80% of its time on UDP transfers, but this is called after the checksum calculation is completed. I have not profiled the TCP/IP transfer yet, just measured its total time.
 
Regards,
Roger

Re: Optimizations for applications requiring limited functionality.

Timmy Brolin
That sounds very strange. Are you sure?
When I rewrote the checksum routine in assembly, I doubled the total
performance of our TCP/IP stack.
For UDP communications, the only loop of any significance I can think of
is the checksum routine.
Regards,
Timmy Brolin
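
For readers unfamiliar with the loop being discussed, here is a minimal portable sketch of a one's-complement Internet checksum in the style of RFC 1071. It is not lwIP's own checksum code, and the function name `inet_checksum_sketch` is made up for illustration; an assembly rewrite like the ones described in this thread would typically replace a byte-oriented loop of this kind with wider loads and add-with-carry instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum over a byte buffer (RFC 1071 style).
 * Illustrative sketch only; the name is hypothetical, not an lwIP function. */
uint16_t inet_checksum_sketch(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t sum = 0;

    assert(p != NULL || len == 0);

    /* Add the buffer as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
        /* Fold early so the 32-bit accumulator cannot overflow on long buffers. */
        if (sum & 0x80000000u) {
            sum = (sum & 0xFFFFu) + (sum >> 16);
        }
    }

    /* A trailing odd byte is treated as the high byte of a zero-padded word. */
    if (len == 1) {
        sum += (uint32_t)p[0] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

The per-word work is tiny, which is why the cost of this routine depends almost entirely on how tight the inner loop is on a given processor.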



RE: Optimizations for applications requiring limited functionality.

Roger Cover
Greetings All,

I have made a discovery about my problem. I upgraded to version 9.1 of the Xilinx EDK at the same time I upgraded to lwIP 1.2.0. This EDK upgrade changed the GNU compiler suite that I am using. The new version of the compiler is the source of my problem. It generates much less efficient code, even with optimizations turned up.

The time spent in the lwIP library (for my UDP transfer) is now only 4.2% of the total transfer time. The bulk of the transfer time is in the Xilinx driver code (82%). The suggestions I received (thanks to Frédéric Bernon) to remove unused options from lwIP did reduce the time used by the lwIP library. Unfortunately, that was not my problem.

I am not sure why the new version of the compiler is so much less efficient. The old compiler produced code that transferred my 33554432-byte dataset in 5.8 seconds. The code produced with the new compiler takes 8.8 seconds (62.5% of the throughput performance). I will be looking into that.

My mistake was in presuming that the same driver source code would produce the same executable code under EDK 8.1 and EDK 9.1. That led me to the incorrect conclusion that the difference in performance was in the new lwIP library. Thank you all for your help.

Regards,
Roger W. Cover
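
Because the thread quotes slowdowns both as extra transfer time and as a fraction of throughput, a small sketch of the arithmetic may help keep the two apart. The helper names are made up for illustration, and the inputs are simply the rounded timings quoted in these posts:

```c
#include <assert.h>

/* Average throughput in bytes per second for one transfer. */
double throughput_bytes_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds;
}

/* Percentage by which the transfer time grew going from old_s to new_s. */
double time_increase_pct(double old_s, double new_s)
{
    assert(old_s > 0.0);
    return (new_s / old_s - 1.0) * 100.0;
}

/* New throughput as a percentage of the old one, for the same amount of data. */
double relative_throughput_pct(double old_s, double new_s)
{
    assert(new_s > 0.0);
    return old_s / new_s * 100.0;
}
```

With the TCP timings (13.98 s and 19.56 s), `time_increase_pct` gives roughly 40% more transfer time, which corresponds to about 71% of the original throughput. With the UDP timings (5.8 s and 8.8 s), the transfer takes roughly 52% longer, and `relative_throughput_pct` gives about 66%, slightly different from the 62.5% quoted, which may have been computed from unrounded timings.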



RE: Optimizations for applications requiring limited functionality.

Matthew Yingling
Hi Roger,

Xilinx upgraded their compiler from GCC 3.4.1 to 4.1.1 between EDK 8.2 and
9.1.  If you are confident the compiler is creating less efficient code, I
recommend you open a case with them.  I have no experience with the new EDK,
but I suspect the new compiler may not be fully optimized yet.

Matthew


