Write programs with Cython for higher performance

My favorite language is Python. Its code is elegant and practical, but unfortunately it is slower than most languages. Most people also assume that speed and ease of use are mutually exclusive, and writing C code really is painful. Cython tries to eliminate this trade-off by letting you combine Python syntax with C data types and functions, giving you the best of both worlds. Keep in mind that I am by no means an expert in this area; these are notes from my first real experience with Cython:

EDIT: Based on some of the feedback I received, there seems to be some confusion. Cython is used to generate C extensions, not stand-alone programs. All of the speedup here comes from accelerating a single function of an existing Python application. Nothing is rewritten in C or Lisp, and no C extension is written by hand. It is simply an easy way to bring C's speed and C data types into Python functions.
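Because the output is an ordinary extension module, the rest of the application keeps importing it like any other module. The sketch below shows one common way to keep a pure-Python fallback available. The module name `fast_geo` is hypothetical, and the fallback is the pure-Python `great_circle` function developed in this article:

```python
import math

try:
    # Compiled Cython extension; "fast_geo" is a hypothetical module name.
    from fast_geo import great_circle
except ImportError:
    # Pure-Python fallback with the same signature.
    def great_circle(lon1, lat1, lon2, lat2):
        radius = 3956.0  # miles
        x = math.pi / 180.0
        a = (90.0 - lat1) * x
        b = (90.0 - lat2) * x
        theta = (lon2 - lon1) * x
        c = math.acos(math.cos(a) * math.cos(b) +
                      math.sin(a) * math.sin(b) * math.cos(theta))
        return radius * c
```

If the extension has not been compiled, the `ImportError` branch runs. In either case, callers see the same function signature.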

Let's say we want to make the following great_circle function faster. It computes the distance between two points along the surface of the Earth (the great-circle distance):

import math

def great_circle(lon1, lat1, lon2, lat2):
    radius = 3956  # miles
    x = math.pi / 180.0

    a = (90.0 - lat1) * (x)
    b = (90.0 - lat2) * (x)
    theta = (lon2 - lon1) * (x)
    c = math.acos((math.cos(a) * math.cos(b)) +
                  (math.sin(a) * math.sin(b) * math.cos(theta)))
    return radius * c
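For reference, the function applies the spherical law of cosines. Let φ stand for latitude and λ for longitude, both in degrees, and let R = 3956 miles be the Earth's radius, as in the code. The function converts the colatitudes a and b and the longitude difference θ to radians, then computes the central angle c:

```latex
\begin{aligned}
a &= (90 - \varphi_1)\,\frac{\pi}{180}, \qquad
b = (90 - \varphi_2)\,\frac{\pi}{180}, \qquad
\theta = (\lambda_2 - \lambda_1)\,\frac{\pi}{180},\\
c &= \arccos\bigl(\cos a \,\cos b + \sin a \,\sin b \,\cos\theta\bigr),\\
d &= R\,c, \qquad R = 3956 \text{ miles}.
\end{aligned}
```

With the benchmark coordinates used below, c is about 0.38 radians, so the distance comes out at roughly 1,500 miles.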

Assuming the function is saved as p1.py, let's call it 500,000 times and measure how long that takes:

import timeit

lon1, lat1, lon2, lat2 = -72.345, 34.323, -61.823, 54.826
num = 500000

t = timeit.Timer("p1.great_circle(%f,%f,%f,%f)" % (lon1, lat1, lon2, lat2),
                 "import p1")
print("Pure python function", t.timeit(num), "sec")

About 2.2 seconds. That is too slow!
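A single timeit run can be skewed by whatever else the machine is doing. A slightly more robust measurement repeats the run a few times and keeps the fastest result. The sketch below does that and divides by the number of calls, so 2.2 seconds for 500,000 calls would come out as about 4.4 microseconds per call. To let the snippet run on its own, the function is defined inline here rather than imported from p1:

```python
import math
import timeit

def great_circle(lon1, lat1, lon2, lat2):
    radius = 3956.0  # miles
    x = math.pi / 180.0
    a = (90.0 - lat1) * x
    b = (90.0 - lat2) * x
    theta = (lon2 - lon1) * x
    c = math.acos(math.cos(a) * math.cos(b) +
                  math.sin(a) * math.sin(b) * math.cos(theta))
    return radius * c

def per_call_seconds(func, args, number=500_000, repeat=5):
    """Best-of-`repeat` time for `number` calls of func(*args), per call."""
    timings = timeit.repeat("func(*args)",
                            globals={"func": func, "args": args},
                            number=number, repeat=repeat)
    return min(timings) / number  # the fastest run is the least disturbed one
```

Calling `per_call_seconds(great_circle, (-72.345, 34.323, -61.823, 54.826))` repeats the measurement above. The same helper also works unchanged on the compiled `c1.great_circle` and `c2.great_circle` versions later on.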

Let's quickly rewrite it in Cython, in a file called c1.pyx, and see whether it makes a difference:

import math

def great_circle(float lon1, float lat1, float lon2, float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi / 180.0
    cdef float a, b, theta, c

    a = (90.0 - lat1) * (x)
    b = (90.0 - lat2) * (x)
    theta = (lon2 - lon1) * (x)
    c = math.acos((math.cos(a) * math.cos(b)) + (math.sin(a) * math.sin(b) * math.cos(theta)))
    return radius * c

Note that we still import math: Cython lets you mix and match Python and C data types to some extent. The conversion between them is automatic, but it is not free. In this example we defined a Python function, declared its input parameters as float, and declared every local variable with the C float type. The calculation itself still uses Python's math module.
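One side effect of these declarations is worth knowing about. Cython's `float` is the C `float`, a 32-bit single-precision number, while a Python `float` is a C `double`. The sketch below emulates single precision in pure Python by round-tripping values through `struct`, which shows how far the result drifts. This is an approximation of what the compiled code does (Python's `math.cos` rounded to float stands in for `cosf`), not a model of any particular C math library:

```python
import math
import struct

def f32(value):
    """Round a Python float (a C double) to the nearest 32-bit C float."""
    return struct.unpack("f", struct.pack("f", value))[0]

def great_circle_double(lon1, lat1, lon2, lat2):
    """The computation in double precision, as pure Python does it."""
    x = math.pi / 180.0
    a = (90.0 - lat1) * x
    b = (90.0 - lat2) * x
    theta = (lon2 - lon1) * x
    return 3956.0 * math.acos(math.cos(a) * math.cos(b) +
                              math.sin(a) * math.sin(b) * math.cos(theta))

def great_circle_float(lon1, lat1, lon2, lat2):
    """Approximation of the all-`cdef float` version.

    Arguments are rounded to C floats on entry, each float-only operation and
    each assignment to a float variable rounds to single precision, and
    f32(math.cos(...)) etc. stand in for cosf/sinf/acosf.
    """
    lon1, lat1, lon2, lat2 = (f32(v) for v in (lon1, lat1, lon2, lat2))
    x = f32(f32(3.14159265) / 180.0)
    a = f32((90.0 - lat1) * x)
    b = f32((90.0 - lat2) * x)
    theta = f32(f32(lon2 - lon1) * x)
    cos_ab = f32(f32(math.cos(a)) * f32(math.cos(b)))
    sin_ab = f32(f32(f32(math.sin(a)) * f32(math.sin(b))) * f32(math.cos(theta)))
    c = f32(math.acos(f32(cos_ab + sin_ab)))
    return f32(f32(3956.0) * c)

args = (-72.345, 34.323, -61.823, 54.826)
print("double precision:", great_circle_double(*args))
print("single precision:", great_circle_float(*args))
```

The two results agree to several significant digits but generally not to the last one. If that matters, declaring the variables as `double` keeps the full precision of a Python float.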

Now we need to translate it into C code and compile it into a Python extension. The best way to do this is with a distribution script called setup.py, but here we do it by hand to see what the magic involves:

# This creates c1.c, the C source code for a Python extension
cython c1.pyx

# Compile the object file (python3-config prints the -I flags for Python's headers)
gcc -c -fPIC $(python3-config --includes) c1.c

# Link it into a shared library
gcc -shared c1.o -o c1.so
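The three commands can also be scripted. The sketch below makes several assumptions: a Linux system with gcc, and the `cython` command available on the PATH. It asks `sysconfig` for the Python header directory instead of hard-coding it. It also adds `-O2`, which is not in the original commands, so that the C compiler optimizes the generated code. For real projects, the setup.py route mentioned above (using `Cython.Build.cythonize`) handles these flags for you.

```python
import subprocess
import sysconfig

def build_commands(module):
    """Return the cython/gcc command lines that build `module`.pyx in place."""
    include_dir = sysconfig.get_paths()["include"]  # directory containing Python.h
    # e.g. ".cpython-312-x86_64-linux-gnu.so"; fall back to plain ".so"
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return [
        ["cython", "-3", f"{module}.pyx"],                    # .pyx -> .c
        ["gcc", "-c", "-fPIC", "-O2", f"-I{include_dir}",
         f"{module}.c", "-o", f"{module}.o"],                  # .c -> .o
        ["gcc", "-shared", f"{module}.o",
         "-o", f"{module}{ext_suffix}"],                       # .o -> extension
    ]

def build(module):
    """Run each step in order, stopping at the first one that fails."""
    for cmd in build_commands(module):
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `build("c1")` runs the same three steps as above for c1.pyx. Because the file name uses `EXT_SUFFIX`, the result is named something like `c1.cpython-312-x86_64-linux-gnu.so`, which `import c1` finds just like `c1.so`.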

Now you should have a c1.so file that Python can import. Let's time it:

t = timeit.Timer("c1.great_circle(%f,%f,%f,%f)" % (lon1, lat1, lon2, lat2),
                 "import c1")
print("Cython function (still using python math)", t.timeit(num), "sec")

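About 1.8 seconds. That is not the big boost we had hoped for. The calls into Python's math module are the likely bottleneck. Although math.cos and friends are implemented in C, each call goes through the Python API: the attribute is looked up, the C float is converted into a Python float object, and the result is converted back. Let's replace them with functions from the C standard library: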

cdef extern from "math.h":
    float cosf(float theta)
    float sinf(float theta)
    float acosf(float theta)

def great_circle(float lon1, float lat1, float lon2, float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi / 180.0
    cdef float a, b, theta, c

    a = (90.0 - lat1) * (x)
    b = (90.0 - lat2) * (x)
    theta = (lon2 - lon1) * (x)
    c = acosf((cosf(a) * cosf(b)) + (sinf(a) * sinf(b) * cosf(theta)))
    return radius * c

Where the previous version used import math, this one uses cdef extern to declare the functions it needs from a header file, here math.h from the C standard library. With the costly Python calls replaced, we build a new shared library (c2.so, using the same commands as before) and test it again:

t = timeit.Timer("c2.great_circle(%f,%f,%f,%f)" % (lon1, lat1, lon2, lat2),
                 "import c2")
print("Cython function (using trig function from math.h)", t.timeit(num), "sec")

Now we're getting somewhere: 0.4 seconds, about 5 times faster than the pure Python function. Can we go faster still? c2.great_circle() is still called as a Python function, so every call pays the overhead of the Python API (building the argument tuple, converting each argument, and so on). If we move the work into a pure C function, we might be able to speed things up further:

cdef extern from "math.h":
    float cosf(float theta)
    float sinf(float theta)
    float acosf(float theta)

cdef float _great_circle(float lon1, float lat1, float lon2, float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi / 180.0
    cdef float a, b, theta, c

    a = (90.0 - lat1) * (x)
    b = (90.0 - lat2) * (x)
    theta = (lon2 - lon1) * (x)
    c = acosf((cosf(a) * cosf(b)) + (sinf(a) * sinf(b) * cosf(theta)))
    return radius * c

def great_circle(float lon1, float lat1, float lon2, float lat2, int num):
    cdef int i
    cdef float x = 0.0  # initialized so that num <= 0 does not return garbage
    for i from 0 <= i < num:
        x = _great_circle(lon1, lat1, lon2, lat2)
    return x

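Note that we still have a Python function (def). It now takes an extra argument, num, and runs the loop itself. The loop is written as for i from 0 <= i < num: rather than the more Pythonic but much slower for i in range(num):. (This no longer holds: in current Cython versions, for i in range(num): with i declared as cdef int compiles to the same plain C loop, and the for ... from syntax is deprecated.) The real calculation happens in the C function (cdef), which returns a C float. This version takes only 0.2 seconds, about 10 times faster than the original Python function.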
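Since the loop now runs inside the extension, timing it takes one call instead of 500,000 calls from timeit. A small helper for that is sketched below. `stand_in` is a hypothetical pure-Python function with the same `(lon1, lat1, lon2, lat2, num)` signature, so the snippet runs without a compiled module. At the 0.2 seconds reported above, the cost is about 0.4 microseconds per iteration, compared with roughly 4.4 microseconds per call for the original pure-Python version.

```python
import math
import time

def time_batched(batched_func, args, num=500_000):
    """Time ONE call to a function that loops `num` times internally.

    `batched_func` must accept (*args, num), like the Cython great_circle above.
    Returns (result, total_seconds, seconds_per_iteration).
    """
    if num <= 0:
        raise ValueError("num must be positive")
    start = time.perf_counter()
    result = batched_func(*args, num)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed / num

def stand_in(lon1, lat1, lon2, lat2, num):
    """Hypothetical pure-Python stand-in with the batched signature."""
    x = 0.0
    for _ in range(num):
        d = math.pi / 180.0
        a = (90.0 - lat1) * d
        b = (90.0 - lat2) * d
        theta = (lon2 - lon1) * d
        x = 3956.0 * math.acos(math.cos(a) * math.cos(b) +
                               math.sin(a) * math.sin(b) * math.cos(theta))
    return x

if __name__ == "__main__":
    args = (-72.345, 34.323, -61.823, 54.826)
    result, total, per_iter = time_batched(stand_in, args, num=100_000)
    print(f"{result:.1f} miles, {total:.3f} s total, {per_iter * 1e6:.2f} us/iteration")
    # With the compiled module (hypothetical name "c3"):
    #   import c3; time_batched(c3.great_circle, args)
```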

To check how close to optimal this is, you can write a small application in pure C and time it:

#include <stdio.h>
#include <math.h>

#define NUM 500000

float great_circle(float lon1, float lat1, float lon2, float lat2) {
    float radius = 3956.0;
    float pi = 3.14159265;
    float x = pi / 180.0;
    float a, b, theta, c;

    a = (90.0 - lat1) * (x);
    b = (90.0 - lat2) * (x);
    theta = (lon2 - lon1) * (x);
    /* float variants, matching the Cython version */
    c = acosf((cosf(a) * cosf(b)) + (sinf(a) * sinf(b) * cosf(theta)));
    return radius * c;
}

int main(void) {
    int i;
    float x = 0.0f;
    for (i = 0; i < NUM; i++)
        x = great_circle(-72.345, 34.323, -61.823, 54.826);
    printf("%f\n", x);
    return 0;
}

Compile it with gcc -o ctest ctest.c -lm and time it with time ./ctest: it takes about 0.2 seconds. This gives me confidence that my Cython extension is about as efficient as my C code (not that my C skills are anything special).

How much Cython can speed up a program usually depends on how many loops, numeric operations, and Python function calls are slowing it down. Some people have reported speedups of 100 to 1000 times in some cases; for other tasks, Cython may not help much. Keep this in mind before frantically rewriting your Python code in Cython:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” - Donald Knuth

In other words, write the program in Python first and see whether it meets your needs. In most cases its performance will be good enough. When it does feel slow, use a profiler to find the bottleneck functions and rewrite those in Cython, and you will soon see better performance.
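Python's standard library already covers the profiling step. The sketch below profiles a hypothetical `main` workload, standing in for a real application, with `cProfile`. It then prints the functions with the highest cumulative time. In this toy case, `great_circle` dominates the report, which is exactly the kind of hotspot worth moving into Cython:

```python
import cProfile
import io
import math
import pstats

def great_circle(lon1, lat1, lon2, lat2):
    x = math.pi / 180.0
    a = (90.0 - lat1) * x
    b = (90.0 - lat2) * x
    theta = (lon2 - lon1) * x
    return 3956.0 * math.acos(math.cos(a) * math.cos(b) +
                              math.sin(a) * math.sin(b) * math.cos(theta))

def main(num=100_000):
    """Hypothetical workload standing in for a real application."""
    total = 0.0
    for _ in range(num):
        total += great_circle(-72.345, 34.323, -61.823, 54.826)
    return total

def profile_report(func, *args, limit=10):
    """Run func(*args) under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()

if __name__ == "__main__":
    print(profile_report(main, 10_000))
```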

