Different implementations of a simple Collatz iterator and their performance

From ScienceZero

Revision as of 03:38, 4 January 2009

The Collatz conjecture was proposed by Lothar Collatz in 1937. The conjecture is also known as the 3n + 1 conjecture.

The procedure is: if n is even, divide it by 2; otherwise multiply it by 3 and add 1. This is repeated until n reaches 1. The unproven conjecture is that the procedure reaches 1 for every positive integer n.
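
The procedure can be sketched in C++ as below. This is a minimal reference version for illustration, not one of the benchmarked programs; the names collatz_next and collatz_steps are made up for this example. A 64-bit integer is assumed because intermediate values can grow well beyond the start value.

```cpp
#include <cassert>
#include <cstdint>

// One application of the rule: halve an even n, map an odd n to 3n + 1.
std::uint64_t collatz_next(std::uint64_t n) {
    if (n % 2 == 0) return n / 2;
    assert(n <= (UINT64_MAX - 1) / 3);  // guard against 64-bit overflow
    return 3 * n + 1;
}

// Number of rule applications needed to bring n down to 1.
// Precondition: n >= 1 (n == 0 would stay at 0 forever).
std::uint64_t collatz_steps(std::uint64_t n) {
    assert(n >= 1);
    std::uint64_t steps = 0;
    while (n != 1) {
        n = collatz_next(n);
        ++steps;
    }
    return steps;
}
```

For example, starting at 6 the sequence is 6, 3, 10, 5, 16, 8, 4, 2, 1, so collatz_steps(6) returns 8.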

The benchmark times the iteration of the first 2^26 values of n.
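
A timing harness for a benchmark of this kind might look like the following sketch. The benchmark source is not shown on this page, so the function names, the decision to sum the step counts (which also keeps the compiler from discarding the loop), and the use of std::chrono are all assumptions. The upper limit is left as a parameter.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Total number of steps over every start value in [1, limit).
std::uint64_t total_steps_below(std::uint64_t limit) {
    std::uint64_t total = 0;
    for (std::uint64_t start = 1; start < limit; ++start) {
        std::uint64_t n = start;
        while (n != 1) {
            if (n % 2 == 0) {
                n /= 2;
            } else {
                assert(n <= (UINT64_MAX - 1) / 3);  // overflow guard
                n = 3 * n + 1;
            }
            ++total;
        }
    }
    return total;
}

// Runs the loop once, stores the step total in total_out and
// returns the elapsed wall-clock time in seconds.
double time_run_seconds(std::uint64_t limit, std::uint64_t& total_out) {
    const auto begin = std::chrono::steady_clock::now();
    total_out = total_steps_below(limit);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - begin).count();
}
```

The asserts should be compiled out (for example with NDEBUG defined) for any real measurement, since they add a comparison to every odd step.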

Visual C++

Vista 64, 2.4 GHz core 2 Q6600

  • 22.1s - 64 bit exe, one thread, unrolled 1000 times
  • 30.5s - 64 bit exe, one thread, unrolled 10 times
  • 45.9s - 32 bit exe, one thread, unrolled 1000 times
  • 49.2s - 64 bit exe, one thread, not unrolled
  • 53.3s - 32 bit exe, one thread, unrolled 10 times
  • 76.1s - 32 bit exe, one thread, not unrolled


Vista 32, 2.66 GHz core 2 E6750

  • 41.3s - 32 bit exe, one thread, unrolled 1000 times
  • 47.7s - 32 bit exe, one thread, unrolled 10 times
  • 65.6s - 32 bit exe, one thread, not unrolled


Visual Basic 2008

The code is compiled to Common Intermediate Language (CIL), which the .NET just-in-time (JIT) compiler turns into native code. The advantage is that the JIT compiler can optimize the code for the specific CPU at runtime. In this case the CPU with the lower clock speed but with 64-bit registers available is the fastest because of this optimization. The code should also run unmodified on Linux with Mono (http://www.mono-project.com/Main_Page) installed. The multithreaded version uses Microsoft Parallel Extensions (http://www.microsoft.com/uk/msdn/screencasts/screencast/290/Intro-to-Parallel-Extensions-to-the-NET-Framework.aspx) to .NET Framework 3.5 and required almost no change to the program.
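
The reason so little has to change for a multithreaded version is that every start value is independent of the others, so the range can simply be divided among workers. The sketch below shows the idea in C++ with std::thread; it is not the Visual Basic code and does not use Parallel Extensions. The names steps_for and parallel_total_steps, and the interleaved split of start values, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Steps needed for a single start value n >= 1.
static std::uint64_t steps_for(std::uint64_t n) {
    std::uint64_t steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}

// Sums the steps over [1, limit) using the given number of threads.
// Worker w handles start values 1 + w, 1 + w + workers, ... so that
// small and large start values are spread evenly over the threads.
std::uint64_t parallel_total_steps(std::uint64_t limit, unsigned workers) {
    assert(workers >= 1);
    std::vector<std::uint64_t> partial(workers, 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, limit, workers, w] {
            std::uint64_t local = 0;  // accumulate locally, write once
            for (std::uint64_t start = 1 + w; start < limit; start += workers) {
                local += steps_for(start);
            }
            partial[w] = local;
        });
    }
    std::uint64_t total = 0;
    for (unsigned w = 0; w < workers; ++w) {
        pool[w].join();
        total += partial[w];
    }
    return total;
}
```

Each thread writes only to its own slot of partial, so no locking is needed. On some toolchains the program must be built with thread support enabled (for example -pthread with GCC).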

Vista 64, 2.4 GHz core 2 Q6600

  • 55.4s - .net exe, one thread
  • 11.3s - .net exe, four threads, unrolled 100 times


Vista 32, 2.66 GHz core 2 E6750

  • 66.4s - .net exe, one thread
  • 42.9s - .net exe, one thread, unrolled 100 times
  • 33.6s - .net exe, two threads
  • 22.9s - .net exe, two threads, unrolled 100 times


x86 assembly language

Because of the suspected low quality of the 32-bit code generated by modern C++ compilers, a hand-crafted assembly version was made for comparison. The performance gain came from keeping the value in a 64-bit MMX register and exploiting some statistical properties of the algorithm that are not obvious to a normal compiler.
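
The page does not say which properties of the algorithm the assembly version exploits. As one illustration of the kind of knowledge a compiler cannot derive by itself, the C++ sketch below uses a well-known property: for odd n, 3n + 1 is always even, so the following halving can be folded into the same step. This is an assumed example, not necessarily what the assembly code does, and the name shortcut_steps is made up here.

```cpp
#include <cassert>
#include <cstdint>

// Counts the same steps as the plain rule, but handles an odd n and the
// halving that must follow it in one loop iteration.
std::uint64_t shortcut_steps(std::uint64_t n) {
    assert(n >= 1);
    std::uint64_t steps = 0;
    while (n != 1) {
        if (n % 2 == 0) {
            n /= 2;
            steps += 1;
        } else {
            assert(n <= (UINT64_MAX - 1) / 3);  // overflow guard
            n = (3 * n + 1) / 2;  // 3n + 1 is even, so this is exact
            steps += 2;           // one multiply step plus one halving
        }
    }
    return steps;
}
```

The loop runs fewer iterations than the plain version while reporting the same step count, for instance 111 for a start value of 27.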

Vista 32, 2.66 GHz core 2 E6750

  • 55.2s - optimized, not unrolled
  • 26.4s - optimized, unrolled 10 times


CUDA

GeForce 8800 GTS 512 MB (G92) 1.625 GHz

  • 2.07s - 128 blocks x 256 threads


BSGP

The Bulk-Synchronous GPU Programming (BSGP) compiler (http://www.kunzhou.net/#BSGP) is advanced and capable of impressive results, but the documentation is severely lacking and the future of the compiler seems unclear.

GeForce 8800 GTS 512 MB (G92) 1.625 GHz

  • 2.04s - 32768 threads


Conclusion

External links