I am experimenting with Cython and OpenCV and trying to benchmark image-manipulation performance. I have optimized my Cython code as much as I could, but it is still slower than the plain Python version. I understand that most of the work is executed in C because of OpenCV, yet I expected Cython to speed up the Python loops. Can anyone tell me if there is anything I can do to improve it? My code follows:
# load_images.py
import cv2
from random import randint
import numpy as np

def fetch_images(n):
    def get_img():
        x = randint(640, 6144)
        y = randint(640, 6144)
        return np.random.rand(x,y, 3).astype(np.uint8)
    return [get_img() for _ in range(n)]

def resize_img(img):
    img = cv2.resize(img, (640, 640))
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

def preprocess(images):
    return [resize_img(img) for img in images]
# load_images_cy.pyx
import cv2
from random import randint
import numpy as np
cimport numpy as np
cimport cython

ctypedef np.uint8_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
cdef np.ndarray[DTYPE_t, ndim=3] get_img():
    cdef int x = randint(640, 6144)
    cdef int y = randint(640, 6144)
    return np.random.rand(x,y, 3).astype(np.uint8)

cpdef list fetch_images(int n):
    cdef int _;
    return [get_img() for _ in range(n)]

cdef np.ndarray[DTYPE_t, ndim=2] resize_img(np.ndarray[DTYPE_t, ndim=3] img):
    cdef np.ndarray[DTYPE_t, ndim=3] im;
    im = cv2.resize(img, (640, 640))
    return cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

cpdef np.ndarray[DTYPE_t, ndim=3] preprocess(list images):
    cdef np.ndarray[DTYPE_t, ndim=3] img;
    cdef np.ndarray[DTYPE_t, ndim=3] collection = np.empty((len(images), 640, 640), dtype=np.uint8);
    cdef int i;
    for i, img in enumerate(images):
        collection[i] = resize_img(img)
    return collection
# main.py
import load_images_cy
import load_images
import timeit
images = load_images.fetch_images(20)
result_cy = timeit.timeit(lambda: load_images_cy.preprocess(images), number=20)
result_py = timeit.timeit(lambda: load_images.preprocess(images), number=20)
print(f'{result_py/result_cy} times faster')
Output:
0.9192241989059127 times faster
Cython is primarily meant for interfacing with C code, and writing Python extension modules more easily. While performance improvements can be obtained through Cython, it is not intended to be a drop-in speed-up for Python code.
PyPy, however, is intended to be a more-or-less drop-in speed-up for Python code. It provides an alternate interpreter which is generally faster than CPython, the reference/default Python implementation.
Also, your decorators here:
@cython.boundscheck(False)
@cython.wraparound(False)
cdef np.ndarray[DTYPE_t, ndim=3] get_img():
    ...
apply only to get_img, not to any of the other functions below. I'm not sure whether that was intentional. There should also not be a blank line between the decorators and the function they decorate.
If you want to stick with Cython, and gain performance improvements through it, consider altering the compilation options, such as providing -O2 or -O3.
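To get a feel for why compiling the loop changes so little here, it can help to time the loop machinery separately from the work done in each iteration. The sketch below is only an illustration under stated assumptions: heavy_work is a hypothetical stand-in for a C-backed call such as cv2.resize (it uses sorted, which also does its real work in C), and noop, run and items are names invented for this example.

```python
import timeit

def heavy_work(data):
    # Hypothetical stand-in for a C-backed call such as cv2.resize:
    # sorted() also does its real work in C.
    return sorted(data)

def noop(data):
    # Does nothing, so timing it measures only the loop and call overhead.
    return data

def run(func, items):
    # Same shape as preprocess(): a Python-level loop around a library call.
    return [func(item) for item in items]

# 20 "images", each a reversed list of 50,000 integers.
items = [list(range(50_000, 0, -1)) for _ in range(20)]

loop_only = timeit.timeit(lambda: run(noop, items), number=20)
with_work = timeit.timeit(lambda: run(heavy_work, items), number=20)

print(f"loop overhead: {loop_only:.6f} s")
print(f"loop + work:   {with_work:.6f} s")
print(f"share spent in loop machinery: {loop_only / with_work:.2%}")
```

If the share printed on the last line is tiny, the loop itself is not where the time goes, and typing or compiling it is unlikely to show up in the benchmark.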
I am trying to use a modified version of the RANSAC regressor algorithm. Rather than changing sklearn's function (to avoid problems with sklearn 0.24.1 and some warnings from the function), I found an implementation on GitHub and tried to cythonize it to improve speed before making my modifications. To my surprise, the speed gain was very poor. Is it because the code is full of NumPy calls, or did I do something wrong? Below are the Python version and the Cython version (with the modifications needed to avoid errors):
##Python version
import numpy as np

def ransac_polyfit(x,y,order, t, n=0.8,k=100,f=0.9):
    besterr = np.inf
    bestfit = np.array([None])
    for kk in range(k):
        maybeinliers = np.random.randint(len(x), size=int(n*len(x)))
        maybemodel = np.polyfit(x[maybeinliers], y[maybeinliers], order)
        alsoinliers = np.abs(np.polyval(maybemodel, x)-y) < t
        if sum(alsoinliers) > len(x)*f:
            bettermodel = np.polyfit(x[alsoinliers], y[alsoinliers], order)
            thiserr = np.sum(np.abs(np.polyval(bettermodel, x[alsoinliers])-y[alsoinliers]))
            if thiserr < besterr:
                bestfit = bettermodel
                besterr = thiserr
    return bestfit
##Cython version
cimport cython
cimport numpy as np
import numpy as np
np.import_array()

@cython.boundscheck(False)
@cython.wraparound(False)
cdef ransac_polyfit(np.ndarray[np.npy_int64, ndim=1] x,
                    np.ndarray[np.npy_int64, ndim=1] y,
                    np.npy_intp order,
                    np.npy_float32 t,
                    np.npy_float32 n=0.8,
                    np.npy_intp k=100,
                    np.npy_float32 f=0.9):
    cdef np.npy_intp kk, i, ransac_control
    cdef np.npy_float64 thiserr, besterr = -1.0
    cdef np.ndarray[np.npy_int64, ndim=1] maybeinliers
    cdef np.ndarray[np.npy_bool, ndim=1] alsoinliers
    cdef np.ndarray[np.npy_float64, ndim=1] bestfit, maybemodel, bettermodel
    bestfit = np.zeros(1, dtype = np.float64)
    for kk in range(k):
        maybeinliers = np.random.randint(len(x), size=int(n*len(x)))
        maybemodel = np.polyfit(x[maybeinliers], y[maybeinliers], order)
        alsoinliers = np.abs(np.polyval(maybemodel, x)-y) < t
        if sum(alsoinliers) > len(x)*f:
            bettermodel = np.polyfit(x[alsoinliers], y[alsoinliers], order)
            thiserr = np.sum(np.abs(np.polyval(bettermodel, x[alsoinliers])-y[alsoinliers]))
            if (thiserr < besterr) or (besterr == -1.0):
                bestfit = bettermodel
                besterr = thiserr
    # Since I can't return an empty array, I had to set it as an array with a zero,
    # and the ransac_control variable tells whether the function was able to find a good model
    if (besterr == -1.0):
        ransac_control = 0
        return ransac_control, bestfit
    else:
        ransac_control = 1
        return ransac_control, bestfit
PS: I couldn't attach the image of the HTML page for the Cython code because this is my first question.
In my code, I am trying to define a dynamic array whose number of rows and columns changes depending on new conditions inside the function, meaning I might add more rows or columns. I tried to build a two-dimensional pointer array, and I want to be able to pass this 2-D pointer array as an argument to a function.
This is a small part of my code:
Update: test.pyx
from libc.string cimport memset
import numpy as np
cimport numpy as np
cimport cython
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, free
from libc.math cimport log, exp
from cython_gsl cimport *
import ctypes

cdef gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937)

cdef int** zeros2(dim):
    assert len(dim) == 2
    cdef int i
    cdef int **matrix
    matrix = <int**> malloc(sizeof(int*) * dim[0])
    for i from 0 <= i < dim[0]:
        matrix[i] = <int*> malloc(sizeof(int) * dim[1])
        memset(matrix[i], 0, sizeof(int) * dim[1])
    return matrix

@cython.cdivision(True)
@cython.wraparound(False)
@cython.boundscheck(False)
cdef void generator(double* alpha,int* D, double* m):
    cdef Py_ssize_t i
    for i from 0 <= i < D[0]:
        m[i]=gsl_ran_beta(r, alpha[0], 1)
    return

@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
cdef void initializer(double* alpha, int* D, int* N, double* m, int** Z ):
    cdef int i, j
    generator(alpha, D, &m[0])
    for i from 0 <= i < D[0]:
        for j from 0 <= j < N[0]:
            Z[j][i]= gsl_ran_bernoulli(r, m[i])
            print Z[j][i]
    return

def run(int n, int d, double alpha):
    cdef np.ndarray[double, ndim=1, mode='c'] mu=np.empty((d,), dtype=ctypes.c_double)
    cdef int **Z = zeros2((n, d))
    initializer(&alpha, &d, &n, &mu[0], <int **>(&Z[0][0]) )
setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize
from numpy import get_include
import numpy
import cython_gsl
from Cython.Distutils import build_ext
ext_modules = [
Extension(
"test",
["test.pyx"],
libraries=cython_gsl.get_libraries(),
library_dirs=[cython_gsl.get_library_dir()],
include_dirs=[numpy.get_include(), cython_gsl.get_include()])
]
ext_modules = cythonize(ext_modules)
setup(
name='test',
ext_modules=ext_modules,
cmdclass={'build_ext': build_ext})
Update:
The code compiles, but when I import the module and call the run function from Python I get this error:
>>> import test
>>> test.run( 10, 4,0.9)
Segmentation fault (core dumped)
I am not sure whether the 2-dimensional array I defined is the best way to build a dynamic array, and I do not understand why I get this error.
Any suggestions would be most welcome.
Your immediate problem is that:
<int **>(&Z[0][0])
takes the address of the first element of the first row and casts it to an int**. It's actually an int* (because it's the address of an int). Therefore the memory that initializer writes to is nonsense and you get a segmentation fault. Casts are often an indication that you're doing something wrong.
You just need to pass Z which is already an int**.
The problem is that n, d and alpha are Python variables, so taking &n is not something you can do. You can change run to a cdef function, or create a temporary C variable:
cdef int _n = n;
and then pass &_n
However, based on your code, what's the point of passing a pointer to these three variables, anyway? You don't modify them. You can simply pass them without a pointer.
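To make the int* versus int** distinction concrete, here is a small sketch that mimics the same memory layout from Python with ctypes. It is only an analogy for the C code, not a replacement for it; N_ROWS, N_COLS, IntPtr, rows and addr_first_int are names made up for the illustration, and Z plays the role of the int** from the question.

```python
import ctypes

N_ROWS, N_COLS = 3, 4
IntPtr = ctypes.POINTER(ctypes.c_int)

# Each row is its own block of ints, like one malloc per row in zeros2.
rows = [(ctypes.c_int * N_COLS)() for _ in range(N_ROWS)]

# Z plays the role of the int**: a table of pointers, one per row.
Z = (IntPtr * N_ROWS)(*(ctypes.cast(row, IntPtr) for row in rows))

# Writing through Z[j][i] follows the row pointer and lands in rows[j].
Z[1][2] = 7
print(rows[1][2])

# &Z[0][0] is the address of the first int in row 0 (an int*).
# It is not the address of the pointer table, so treating it as an
# int** makes the callee read ints as if they were row pointers.
addr_first_int = ctypes.addressof(Z[0].contents)
print(addr_first_int == ctypes.addressof(rows[0]))
print(addr_first_int == ctypes.addressof(Z))
```

The last two comparisons show that the address of the first element belongs to row 0 and is unrelated to the pointer table, which is why passing Z itself is the right thing to do.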
I would like to calculate the p-values for a large 2D NumPy array of t values. However, this takes a long time and I would like to speed it up. I tried using GSL.
Although a single call to gsl_cdf_tdist_P is much faster than scipy.stats.t.sf, iterating over the ndarray is very slow. I would like help improving this.
See the code below.
GSL_Test.pyx
import cython
cimport cython
import numpy
cimport numpy
from cython_gsl cimport *

DTYPE = numpy.float32
ctypedef numpy.float32_t DTYPE_t

cdef get_gsl_p(double t, double nu):
    return (1 - gsl_cdf_tdist_P(t, nu)) * 2

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef get_gsl_p_for_2D_matrix(numpy.ndarray[DTYPE_t, ndim=2] t_matrix, int n):
    cdef unsigned int rows = t_matrix.shape[0]
    cdef numpy.ndarray[DTYPE_t, ndim=2] out = numpy.zeros((rows, rows), dtype='float32')
    cdef unsigned int row, col
    for row in range(rows):
        for col in range(rows):
            out[row, col] = get_gsl_p(t_matrix[row, col], n-2)
    return out

def get_gsl_p_for_2D_matrix_def(numpy.ndarray[DTYPE_t, ndim=2] t_matrix, int n):
    return get_gsl_p_for_2D_matrix(t_matrix, n)
ipython
import GSL_Test
import numpy
import scipy.stats
a = numpy.random.rand(3544, 3544).astype('float32')
%timeit -n 1 GSL_Test.get_gsl_p_for_2D_matrix(a, 25)
1 loop, best of 3: 7.87 s per loop
%timeit -n 1 scipy.stats.t.sf(a, 25)*2
1 loop, best of 3: 4.66 s per loop
UPDATE: By adding cdef declarations I was able to reduce the computation time, but it is still not faster than scipy. I modified the code above to include the cdef declarations.
%timeit -n 1 GSL_Test.get_gsl_p_for_2D_matrix_def(a, 25)
1 loop, best of 3: 6.73 s per loop
You can get some small gain in raw performance by using a raw special function instead of stats.t.sf. Looking at the source, you find (https://github.com/scipy/scipy/blob/master/scipy/stats/_continuous_distns.py#L3849)
def _sf(self, x, df):
    return sc.stdtr(df, -x)
So that you can use stdtr directly:
np.random.seed(1234)
x = np.random.random((3740, 374))
t1 = stats.t.sf(x, 25)
t2 = stdtr(25, -x)
1 loop, best of 3: 653 ms per loop
1 loop, best of 3: 562 ms per loop
If you do reach for Cython, the typed memoryview syntax often gives you faster code than the old ndarray syntax:
from scipy.special.cython_special cimport stdtr
from numpy cimport npy_intp
import numpy as np

def tsf(double [:, ::1] x, int df=25):
    cdef double[:, ::1] out = np.empty_like(x)
    cdef npy_intp i, j
    cdef double tmp, xx
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            xx = x[i, j]
            out[i, j] = stdtr(df, -xx)
    return np.asarray(out)
Here I'm also using the cython_special interface, which is only available in the dev version of scipy (http://scipy.github.io/devdocs/special.cython_special.html#module-scipy.special.cython_special), but you can use GSL if you want.
Finally, if you suspect a bottleneck in iterations, don't forget to inspect the output of cython -a to see if there's some python overhead in the hot loops.
Here is the Black (Black-Scholes less the dividend) option pricing model for options on futures, written in Cython with actual multi-threading, but I can't run it. (NOW FIXED, SEE LATER POST BELOW FOR ANSWER). I am using Python 3.5 with the Microsoft Visual Studio 2015 compiler. Here is the serial version, which takes 3.5s for 10M options: Cython program is slower than plain Python (10M options 3.5s vs 3.25s Black Scholes) - what am I missing?
I attempted to make this parallel by using nogil, but after compiling I cannot access the internal function CyBlackP. There are several issues with this (at least on Windows).
1) When generating the OpenMP code, Cython assumes you are beyond v2.0, but Microsoft Visual Studio 2015 is stuck on the old version, which requires signed iterators. My workaround: the first attempt to build the code errors out; then open the generated CyBlackP.cpp file in Microsoft Visual Studio 2015, search for size_t __pyx_t_2 (line 1430), change it to ssize_t __pyx_t_2, change the next line from size_t __pyx_t_3 to ssize_t __pyx_t_3 to get rid of the signed/unsigned errors, and compile again.
2) You can't go directly from NumPy arrays into the function, as nogil only works on pure C/C++ functions, so I have several helper functions that convert the NumPy array inputs into C++ vector format, pass those to a C++ function, and then convert the returned vector back to a NumPy array.
I'm posting the parallel code here for others to use, and I'm sure someone out there can figure out why I can't access the parallel function from Python. The non-parallel version was accessed like this: from CyBlackP.CyBlackP import CyBlackP.
The code is below, with steps on how to build it. Save the first file as CyBlackP.pyx
[note that the function exposed to Python here is CyBlackP, which converts the NumPy input arrays into C vectors through the helper functions, then passes the C vectors to the C function CyBlackParallel, which runs with nogil and OpenMP. The results are then converted back to a NumPy array and returned from CyBlackP back to Python]:
import numpy as np
cimport numpy as np  # needed for the np.ndarray buffer types used below
cimport cython
from cython.parallel cimport prange
from libcpp.vector cimport vector

cdef extern from "math.h" nogil:
    double exp(double)
    double log(double)
    double erf(double)
    double sqrt(double)

cdef double std_norm_cdf(double x) nogil:
    return 0.5*(1+erf(x/sqrt(2.0)))

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cdef CyBlackParallel(vector[double] Black_PnL, vector[double] Black_S, vector[double] Black_Texpiry, vector[double] Black_strike, vector[double] Black_volatility, vector[double] Black_IR, vector[int] Black_callput):
    cdef int i
    N = Black_PnL.size()
    cdef double d1, d2
    for i in prange(N, nogil=True, num_threads=4, schedule='static'):
        d1 = ((log(Black_S[i] / Black_strike[i]) + Black_Texpiry[i] * (Black_volatility[i] * Black_volatility[i]) / 2)) / (Black_volatility[i] * sqrt(Black_Texpiry[i]))
        d2 = d1 - Black_volatility[i] * sqrt(Black_Texpiry[i])
        Black_PnL[i] = exp(-Black_IR[i] * Black_Texpiry[i]) * (Black_callput[i] * Black_S[i] * std_norm_cdf(Black_callput[i] * d1) - Black_callput[i] * Black_strike[i] * std_norm_cdf(Black_callput[i] * d2))
    return Black_PnL

cdef vector[double] arrayToVector(np.ndarray[np.float64_t,ndim=1] array):
    cdef long size = array.size
    cdef vector[double] vec
    cdef long i
    for i in range(size):
        vec.push_back(array[i])
    return vec

cdef vector[int] INTarrayToVector(np.ndarray[np.int64_t,ndim=1] array):
    cdef long size = array.size
    cdef vector[int] vec
    cdef long i
    for i in range(size):
        vec.push_back(array[i])
    return vec

cdef np.ndarray[np.float64_t, ndim=1] vectorToArray(vector[double] vec):
    cdef np.ndarray[np.float64_t, ndim=1] arr = np.zeros(vec.size())
    cdef long i
    for i in range(vec.size()):
        arr[i] = vec[i]
    return arr

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef CyBlackP(np.ndarray[np.float64_t, ndim=1] PnL, np.ndarray[np.float64_t, ndim=1] S0, np.ndarray[np.float64_t, ndim=1] Texpiry, np.ndarray[np.float64_t, ndim=1] strike, np.ndarray[np.float64_t, ndim=1] volatility, np.ndarray[np.float64_t, ndim=1] IR, np.ndarray[np.int64_t, ndim=1] callput):
    cdef vector[double] Black_PnL, Black_S, Black_Texpiry, Black_strike, Black_volatility, Black_IR
    cdef np.ndarray[np.float64_t, ndim=1] Results
    cdef vector[int] Black_callput
    Black_PnL = arrayToVector(PnL)
    Black_S = arrayToVector(S0)
    Black_Texpiry = arrayToVector(Texpiry)
    Black_strike = arrayToVector(strike)
    Black_volatility = arrayToVector(volatility)
    Black_IR = arrayToVector(IR)
    Black_callput = INTarrayToVector(callput)
    Black_PnL = CyBlackParallel (Black_PnL, Black_S, Black_Texpiry, Black_strike, Black_volatility, Black_IR, Black_callput)
    Results = vectorToArray(Black_PnL)
    return Results
Save the next piece of code as setup.py for use by Cython:
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np

ext_modules = [Extension("CyBlackP",sources=["CyBlackP.pyx"],
                         extra_compile_args=['/Ot', '/openmp', '/favor:INTEL64', '/EHsc', '/GA'],
                         language='c++')]

setup(
    name= 'Generic model class',
    cmdclass = {'build_ext': build_ext},
    include_dirs = [np.get_include()],
    ext_modules = ext_modules)
Then from a command prompt, type: python setup.py build_ext --inplace --compiler=msvc to build.
Any help on getting access to this function is appreciated, not sure why I can't seem to locate it after compiling. I can import CyBlackP or from CyBlackP import * but I can't get to the actual function to calculate the option values.
Here is a realistic NumPy test script to use if you want to test this Cython function:
BlackPnL = np.zeros(10000000)
Black_S=np.random.randint(200, 10000, 10000000)*0.01
Black_Texpiry=np.random.randint(1,500,10000000)*0.01
Black_strike=np.random.randint(1,100,10000000)*0.1
Black_volatility=np.random.rand(10000000)*1.2
Black_IR=np.random.rand(10000000)*0.1
Black_callput=np.sign(np.random.randn(10000000))
Black_callput=Black_callput.astype(np.int64)
Okay, I figured out what was wrong by running Dependency Walker (http://www.dependencywalker.com/) on the CyBlackP.cp35-win_amd64.pyd file generated by Cython. It showed that 2 DLLs were not found: msvcp140_app.dll and vcomp140_app.dll. These are just the x64 versions of the MSVC OpenMP and CRT DLLs, C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\redist\x64\Microsoft.VC140.OpenMP\vcomp140.dll and C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\redist\x64\Microsoft.VC140.CRT\msvcp140.dll, renamed with _app inserted and copied to the \CyBlackP\ project directory. I also updated my setup.py like this, which gets rid of the annoying import statement (now just from CyBlackP import CyBlackP):
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np
import os

module = 'CyBlackP'

ext_modules = [Extension(module, sources=[module + ".pyx"],
                         extra_compile_args=['/Ot', '/favor:INTEL64', '/EHsc', '/GA', '/openmp'],
                         language='c++')]

setup(
    name = module,
    cmdclass = {'build_ext': build_ext},
    include_dirs = [np.get_include(), os.path.join(np.get_include(), 'numpy')],
    ext_modules = ext_modules)
I wrote Python code that handles a lot of data and therefore takes a long time to run, so I discovered Cython and began converting my code.
Basically, all I did was change the function declarations (cdef type name(arguments with variable types)), declare cdef variables with their types, and declare cdef classes.
I'm writing all the .pyx files in Eclipse, compiling with the command python setup.py build_ext --inplace, and running the result from Eclipse.
My issue is that when I compare the Python and Cython speeds, there isn't any difference.
I ran the command cython -a <file> to generate an HTML file, and there are a lot of yellow lines.
I don't know whether I'm doing something wrong or should include something else, and I don't know how to get rid of these yellow lines.
Because the code is very long, I'm pasting only the part that I'd like to speed up.
main.pyx
'''there are a lot of ndarray objects stored in a file and in this step I get each of them until there are no more items '''
cdef ReadWavePoints (WavePointManagement wavePointManagement, ColumnManagement columnManagement):
    cdef int runReadWavePoints
    wavePointManagement.OpenWavePointFileLoad(wavePointsFile)
    runReadWavePoints = 1
    while runReadWavePoints == 1:
        try:
            wavePointManagement.LoadWavePointFile()
            wavePointManagement.RoundCoordinates()
            wavePointManagement.SortWavePointList()
            GroupColumnsVoxels(wavePointManagement.GetWavePointList(), columnManagement)
        except:
            wavePointManagement.CloseWavePointFile()
            columnManagement.CloseWriteColumnFile()
            break

cdef GroupColumnsVoxels (object wavePointList, ColumnManagement columnManagement):
    cdef int indexWavePointRef, indexWavePoint
    cdef int saved
    cdef double voxelValue
    cdef int sizeWavePointList
    sizeWavePointList = len(wavePointList)
    indexWavePointRef = 0
    while indexWavePointRef < sizeWavePointList - 1:
        saved = 0
        voxelValue = (wavePointList[indexWavePointRef]).GetValue()
        for indexWavePoint in xrange(indexWavePointRef + 1, len(wavePointList)):
            if (wavePointList[indexWavePointRef]).GetX() == (wavePointList[indexWavePoint]).GetX() and (wavePointList[indexWavePointRef]).GetY() == (wavePointList[indexWavePoint]).GetY():
                if (wavePointList[indexWavePointRef]).GetZ() == (wavePointList[indexWavePoint]).GetZ():
                    if voxelValue < (wavePointList[indexWavePoint]).GetValue():
                        voxelValue = (wavePointList[indexWavePoint]).GetValue()
                else:
                    saved = 1
                    CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), voxelValue)
                    indexWavePointRef = indexWavePoint
                    if indexWavePointRef == sizeWavePointList - 1:
                        CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), (wavePointList[indexWavePointRef]).GetValue())
                    break
            else:
                saved = 1
                CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), voxelValue)
                columnObject = columnInstance.Column((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY())
                columnManagement.AddColumn(columnObject)
                MaximumHeightColumn((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ())
                indexWavePointRef = indexWavePoint
                break
        if saved == 0:
            CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), voxelValue)
            indexWavePointRef = indexWavePoint
            columnObject = columnInstance.Column((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY())
            columnManagement.AddColumn(columnObject)
            MaximumHeightColumn((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ())

'''I check if the data stored in a voxel is lower than the new one; if it's the case, I store it'''
cdef CheckVoxel (double X, double Y, double Z, double newValue):
    cdef object bandVoxel, structvalCheckVoxel, out_str
    cdef tuple valueCheckVoxel
    bandVoxel = datasetVoxels.GetRasterBand(int(math.floor(Z/0.3))+1)
    structvalCheckVoxel = bandVoxel.ReadRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, buf_type=gdal.GDT_Float32)
    valueCheckVoxel = struct.unpack('f', structvalCheckVoxel)
    if newValue > valueCheckVoxel[0]:
        out_str = struct.pack('f', newValue)
        bandVoxel.WriteRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, out_str)

'''I check if this point has the highest Z and I store this information'''
cdef MaximumHeightColumn(double X, double Y, double newZ):
    cdef object bandMetricMaximumHeightColumn, structvalMaximumHeightColumn, out_strMaximumHeightColumn
    cdef tuple valueMaximumHeightColumn
    bandMetricMaximumHeightColumn = datasetMetrics.GetRasterBand(10)
    structvalMaximumHeightColumn = bandMetricMaximumHeightColumn.ReadRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, buf_type=gdal.GDT_Float32)
    valueMaximumHeightColumn = struct.unpack('f', structvalMaximumHeightColumn)
    if newZ > round(valueMaximumHeightColumn[0], 1):
        out_strMaximumHeightColumn = struct.pack('f', newZ)
        bandMetricMaximumHeightColumn.WriteRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, out_strMaximumHeightColumn)
WavePointManagement.pyx
'''this class serializes, rounds and sorts the points of each ndarray'''
import cPickle as pickle
import numpy as np
cimport numpy as np
import math

cdef class WavePointManagement(object):
    '''
    This class manages all the points extracted from the waveform
    '''
    cdef object fileObject, wavePointList
    __slots__ = ('wavePointList', 'fileObject')

    def __cinit__(self):
        '''
        Constructor
        '''
        self.fileObject = None
        self.wavePointList = np.array([])

    cdef object GetWavePointList(self):
        return self.wavePointList

    cdef void OpenWavePointFileLoad (self, object fileName):
        self.fileObject = file(fileName, 'rb')

    cdef void LoadWavePointFile (self):
        self.wavePointList = None
        self.wavePointList = pickle.load(self.fileObject)

    cdef void SortWavePointList (self):
        self.wavePointList = sorted(self.wavePointList, key=lambda k: (k.x, k.y, k.z))

    cdef void RoundCoordinates (self):
        cdef int indexPointObject, sizeWavePointList
        for pointObject in self.GetWavePointList():
            pointObject.SetX(round(math.floor(pointObject.GetX()/0.25)*0.25, 2))
            pointObject.SetY(round(math.ceil(pointObject.GetY()/0.25)*0.25, 2))
            pointObject.SetZ(round(math.floor(pointObject.GetZ()/0.3)*0.3, 1))

    cdef void CloseWavePointFile(self):
        self.fileObject.close()
setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy
ext = Extension("main", ["main.pyx"], include_dirs = [numpy.get_include()])
setup (ext_modules=[ext],
cmdclass = {'build_ext' : build_ext}
)
test_cython.py
'''this is the file I run with eclipse after compiling'''
from main import main
main()
How could I speed up this code?
Your code jumps back and forth between using NumPy arrays and lists. As a result, there is virtually no difference between the code that Cython produces and what the Python interpreter would run.
The following code produces a python list, and the key function is a pure python function as well.
self.wavePointList = sorted(self.wavePointList, key=lambda k: (k.x, k.y, k.z))
You will want to use ndarray.sort (or numpy.sort if you don't want to sort inplace). To do this you will also need to change how your objects are stored in the array. That is, you will need to use a structured array. See numpy.sort for examples on how to sort structured arrays -- particularly the last two examples on the page.
Once you have your data stored in a numpy array then you need to tell cython about how the data is stored in the array. This includes providing type information and the dimensions of the array. This page provides more information how to work efficiently with numpy arrays.
An example of how to create and sort structured arrays:
import numpy as np
cimport numpy as np

DTYPE = [('name', 'S10'), ('height', np.float64), ('age', np.int32)]

cdef packed struct Person:
    char name[10]
    np.float64_t height
    np.int32_t age

ctypedef Person DTYPE_t

def create_array():
    values = [('Arthur', 1.8, 41), ('Lancelot', 1.9, 38),
              ('Galahad', 1.7, 38)]
    return np.array(values, dtype=DTYPE)

cpdef sort_by_age_then_height(np.ndarray[DTYPE_t, ndim=1] arr):
    arr.sort(order=['age', 'height'])
Next, you will need to convert your code from using Python methods to using the standard C library functions for a further speed-up. Below is an example using RoundCoordinates. `cpdef` means the function is also exposed to Python through a wrapper function.
cimport cython
cimport numpy as np
from libc.math cimport floor, ceil, round
import numpy as np

DTYPE = [('x', np.float64), ('y', np.float64), ('z', np.float64)]

cdef packed struct Point3D:
    np.float64_t x, y, z

ctypedef Point3D DTYPE_t

# Caution should be used when turning the bounds check off as it can lead to undefined
# behaviour if you use an invalid index.
@cython.boundscheck(False)
cpdef RoundCoordinates_cy(np.ndarray[DTYPE_t] pointlist):
    cdef int i
    cdef DTYPE_t point
    for i in range(len(pointlist)): # this line is optimised into a c loop
        point = pointlist[i] # creates a copy of the point
        # x and y are rounded to 2 decimals (scale by 100), z to 1 decimal (scale by 10)
        point.x = round(floor(point.x/0.25)*25) / 100
        point.y = round(ceil(point.y/0.25)*25) / 100
        point.z = round(floor(point.z/0.3)*3) / 10
        pointlist[i] = point # overwrites the old point data with the new data
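When replacing Python's round(value, ndigits) with the one-argument C round, the scale factor has to match the number of decimals, so it is worth checking the rewritten arithmetic against the original expressions before trusting it. The sketch below does that check in plain Python; round_original, round_scaled, samples and mismatches are names invented for the illustration. Python's round and C's round differ on exact halves, but the scaled values here are whole numbers, so that difference does not come into play.

```python
import math

def round_original(x, y, z):
    # The expressions from the question's RoundCoordinates method.
    return (round(math.floor(x / 0.25) * 0.25, 2),
            round(math.ceil(y / 0.25) * 0.25, 2),
            round(math.floor(z / 0.3) * 0.3, 1))

def round_scaled(x, y, z):
    # Integer-scaled variant: 2 decimals -> scale by 100, 1 decimal -> scale by 10.
    return (round(math.floor(x / 0.25) * 25) / 100,
            round(math.ceil(y / 0.25) * 25) / 100,
            round(math.floor(z / 0.3) * 3) / 10)

# Sample coordinates on both sides of zero, reused for x, y and z.
samples = [i * 0.137 for i in range(-500, 501)]
mismatches = [v for v in samples
              if round_original(v, v, v) != round_scaled(v, v, v)]
print(len(mismatches))
```

An empty mismatches list on a representative sample is a reasonable sign that the scaled form can stand in for the original.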
Finally, before rewriting your entire code base, you should profile your code to see in which functions the program spends most of its time, and optimise those functions before bothering with the others.
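As a rough sketch of what that profiling step can look like with the standard library's cProfile and pstats modules: load_points, sort_points and process below are hypothetical stand-ins for your own functions, not part of your code.

```python
import cProfile
import io
import pstats

def load_points(n):
    # Hypothetical stand-in for loading one batch of points.
    return [((i * 7919) % 1000 / 10.0, (i * 104729) % 1000 / 10.0)
            for i in range(n)]

def sort_points(points):
    # Hypothetical stand-in for the sorting step.
    return sorted(points)

def process(n):
    return sort_points(load_points(n))

# Profile a single call and collect the statistics.
profiler = cProfile.Profile()
profiler.enable()
result = process(50_000)
profiler.disable()

# Show the five entries with the largest cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The functions at the top of the cumulative listing are the ones worth converting first; anything that barely registers can usually stay as plain Python.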