Email: Password: Remember Me | Create Account (Free)

Back to Subject List

Old thread has been locked -- no new posts accepted in this thread
Jan Waclawek
08/20/05 08:54
Read: 890 times

#99736 - tradeoff
Responding to: Maarten Brock's previous message
There is usually a tradeoff between code size and speed.
For example, algorithm-based CRC-16 will take up around 20 asm lines (cca 30 bytes); the table driven version takes up 512 bytes on the tables alone... Here, the increase in code size is less pronounced, but the "tricks" certainly take up some code space.

My "fast" solution (I called it "behemoth") is huge compared to "standard code", as it is a quite enormous tree with a different branch almost for each input.
It started with dividing the shift into bigger parts according to '51 capabilities, performed one after the other: first the byte-shifts, then nibble-shifts and finally the bit-shifts (#1). Then I made up the "tree", so that the bytes which are completely zero don't get shifted by nibbles and bits (#2). The final version contains some tune-ups, e.g. replacing two clr c by one anl Rx,#3Fh (and two other "tricks" - see description in the header) (#3). Here is the evolution of times and size:
         -------- cycles ------|- bytes -
         worst   best   average   size
#1        81      20      50.6     77
#2        69      18      34.8    208
#3        64      18      32.5    204
But nowadays we see increasing amount of code memory size in microcontrollers, so maybe it's time to change strategy.
Btw. isn't it possible make the choice of "strategy" (speed vs. size) an option in the compilers?

One more word of caution - the solutions are "speed-optimal" for standard 12-clocker '51s and the 2- and 6-clockers, which have the same instruction-cycle structure as the standard. The 4- and 1-clockers have different munber of instruction cycles per various instruction groups. As my "behemoth" uses quite a lot of jumps - and jumps execute longer on singleclockers than other instructions - and also might spoil the jump-cache on the >=40MHz variants (SiLabs, uPSD34xx). Also mutiply and divide tend to execute significantly longer on singleclockers compared to other instructions, so it might turn out, that the "conventional" solution is comparable in terms of execution time to these "tuned-up"'s.

Jan Waclawek

PS. IMHO you should ask Craig as per usage of the results in SDCC - although IANAL.
PS2. Craig, isn't it possible to see the other solutions, too?

List of 19 messages in thread
First challenge done, new challenge up      Craig Steiner      08/13/05 20:05      
   Seems about right to me...      Graham Cole      08/15/05 03:33      
      2 weeks?        Jan Waclawek      08/15/05 03:39      
   re:challenge      Jacob Boyce      08/15/05 09:30      
      "move the data intelligently"      Dan Henry      08/15/05 10:51      
         re:      Jacob Boyce      08/15/05 11:19      
            Overlapping data is part of the challeng      Dan Henry      08/15/05 12:15      
               Yep!      Craig Steiner      08/15/05 12:19      
            Should work      Craig Steiner      08/15/05 12:17      
   And the Winner is...      Maarten Brock      08/19/05 02:11      
      A worthy winner      Graham Cole      08/19/05 08:54      
         Yes      Craig Steiner      08/19/05 11:23      
            one + one      Jan Waclawek      08/20/05 06:27      
         Note taken      Maarten Brock      08/20/05 08:19      
            tradeoff      Jan Waclawek      08/20/05 08:54      
      exec time and size      Jan Waclawek      08/19/05 11:06      
         exec time and size II        Jan Waclawek      08/20/05 09:01      
   Public domain....      Graham Cole      08/24/05 05:14      
      Open source?      José Félix Díaz Ivorra      08/24/05 08:24      

Back to Subject List