I made a C++ DLL and I'm trying to call a function inside it from Python.
I've done this successfully for several other functions, but for this one I just can't find my mistake.
dll_name = "..\\src\\x64\\Debug\\2019-3A-IBD-MLDLL.dll"
dllabspath = os.path.dirname(os.path.abspath(__file__)) + os.path.sep + dll_name
myDll = CDLL(dllabspath)
#fit_reg_RBF_naive
myDll.fit_reg_RBF_naive.argtypes = [ct.c_void_p, ct.c_double, ct.c_void_p, ct.c_int, ct.c_int]
myDll.fit_reg_RBF_naive.restypes = ct.c_void_p
#predict_reg_RBF_naive
myDll.predict_reg_RBF_naive.argtypes = [ct.c_void_p, ct.c_void_p, ct.c_void_p, ct.c_int, ct.c_double, ct.c_int]
myDll.predict_reg_RBF_naive.restypes = ct.c_double
def fit_reg_RBF_naive(pyXTrain, pyGamma, pyYTrain, pySampleCount, pyInputCountPerSample):
    XTrain = (ct.c_double * len(pyXTrain))(*pyXTrain)
    YTrain = (ct.c_double * len(pyYTrain))(*pyYTrain)
    inputCountPerSample = ct.c_int(pyInputCountPerSample)
    sampleCount = ct.c_int(pySampleCount)
    gamma = ct.c_double(pyGamma)
    return myDll.fit_reg_RBF_naive(XTrain, gamma, YTrain, sampleCount, inputCountPerSample)
def predict_reg_RBF_naive(pyW, pyXTrain, pyXpredict, pyInputCountPerSample, pyGamma, pySampleCount):
    XTrain = (ct.c_double * len(pyXTrain))(*pyXTrain)
    inputCountPerSample = ct.c_int(pyInputCountPerSample)
    sampleCount = ct.c_int(pySampleCount)
    gamma = ct.c_double(pyGamma)
    Xpredict = (ct.c_double * len(pyXpredict))(*pyXpredict)
    return myDll.predict_reg_RBF_naive(W, XTrain, Xpredict, inputCountPerSample, gamma, sampleCount)
Basically, I load my DLL and set the ctypes argument and result types for both of my functions. Then I write a Python wrapper so that the user does not have to retype every cast from Python to C++.
My types on the C++ side seem good too:
extern "C" {
SUPEREXPORT double predict_reg_RBF_naive(double* W, double* X, double* Xpredict, int inputCountPerSample, double gamma, int N);
SUPEREXPORT double* fit_reg_RBF_naive(double* XTrain, double gamma, double* YTrain, int sampleCount, int inputCountPerSample);
}
I get no warnings from the compiler for the C++ part. I've printed the memory address of W before the return inside fit_reg_RBF_naive in C++, and of W in Python after the call, and they look the same:
000002B358384980 // cpp address of W before return
0x58384980 # Python address of W after function call
To me these look like the same address. Maybe I'm wrong.
So when I try to call my second C++ function, it fails:
myDll.predict_reg_RBF_naive(W, XTrain, Xpredict,inputCountPerSample, gamma, sampleCount)
OSError: exception: access violation reading 0x000000007C7380A0
It crashes in the C++ code when it tries to read W. There is no free or delete in the C++ code, and the variable is properly allocated: double* W = new double[2];
Also, when I print W's type in Python, I get <class 'int'>.
How come my W seems to have the same address in both languages, but not the right type? Changing the result type of fit_reg_RBF_naive to POINTER(ct.c_double * 2) makes no difference.
EDIT:
Here is how I call my functions:
from dll_load import predict_reg_RBF_naive, fit_reg_RBF_naive
gamma = 50
sampleCount = 2
inputCountPerSample = 3
XTrain = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
YTrain = [-1.0, 1.0]
Xpredict = [1.0, 1.0, 1.0]
W = fit_reg_RBF_naive(XTrain, gamma, YTrain, sampleCount, inputCountPerSample)
print(predict_reg_RBF_naive(W, XTrain, Xpredict, inputCountPerSample, gamma, sampleCount))
[Python 3.Docs]: ctypes - A foreign function library for Python.
You misspelled restypes (it should be restype). By doing so, restype is not initialized and defaults to int (which wouldn't be a problem on 32-bit), and you ran into:
[SO]: Python ctypes cdll.LoadLibrary, instantiate an object, execute its method, private variable address truncated (#CristiFati's answer)
[SO]: python ctypes issue on different OSes (#CristiFati's answer)
Besides that, there are several problems in the code:
If the C function specifies a pointer (double* in this case), don't use ctypes.c_void_p (in argtypes or restype) to map it, as it might be too wide; use (for this case) ctypes.POINTER(ctypes.c_double) instead
For me, this doesn't even compile (I wonder how you were able to run that code). I'm going to exemplify on XTrain only, but it applies to YTrain and Xpredict as well. ctypes doesn't know how to convert a Python list to a ctypes.POINTER(ctypes.c_double) (or ctypes.c_void_p); the conversion must be made manually (to a ctypes.c_double array):
XTrain = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
xtrain_ctypes = (ctypes.c_double * len(XTrain))(*XTrain)
and pass xtrain_ctypes to the functions.
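Putting the fixes together, the corrected declarations might look like this (a sketch based on the signatures above; only the argtypes / restype lines change, the rest of the loading code stays as-is):
import ctypes as ct
DoubleP = ct.POINTER(ct.c_double)
# fit_reg_RBF_naive takes two double arrays and returns a double*
myDll.fit_reg_RBF_naive.argtypes = [DoubleP, ct.c_double, DoubleP, ct.c_int, ct.c_int]
myDll.fit_reg_RBF_naive.restype = DoubleP  # note: restype, without a trailing "s"
# predict_reg_RBF_naive takes three double arrays and returns a double
myDll.predict_reg_RBF_naive.argtypes = [DoubleP, DoubleP, DoubleP, ct.c_int, ct.c_double, ct.c_int]
myDll.predict_reg_RBF_naive.restype = ct.c_double
# Python lists still have to be converted to ctypes arrays before each call:
XTrain = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
xtrain_ctypes = (ct.c_double * len(XTrain))(*XTrain)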
I am attempting to figure out why calling a function in a dynamically loaded library crashes Python. I have a C++ function in a dynamic library file, which is loaded in Python using ctypes. I then call the function from Python:
lib = cdll.LoadLibrary(libPath)
# Note: using c_char_p instead of POINTER(c_char) does not yield any difference in result
# Export const char* GetSection(const char* TilesetID, int32_t X0, int32_t Y0, int32_t X1, int32_t Y1, uint8_t*& OutData, uint64_t& OutDataSize)
lib.GetSection.argtypes = [POINTER(c_char), c_int32, c_int32, c_int32, c_int32, POINTER(c_void_p), POINTER(c_uint64)]
lib.GetSection.restype = POINTER(c_char)
output_data = c_void_p()
output_size = c_uint64()
str_data = lib.GetSection(id.encode('ascii'), x0, y0, x1, y1, byref(output_data), byref(output_size))
On macOS, this works exactly as expected. Unfortunately, on Windows 11 it does not. I'm running from a Jupyter notebook, and the kernel crashes and restarts immediately after the lib.GetSection call.
I have attached the Visual Studio debugger to the process and can see that, on the C++ side of things, the function is being called correctly, all parameters are correct, and it returns without error. It is at this point that the Python kernel crashes, deep in a Python call stack that I don't have symbols for.
How do I even approach debugging this? Does anything look wrong with the way I am calling the function?
Having a toy C++ function to demonstrate your problem would help. Below is a best-guess C++ function with the same signature, and the Python code to call it:
test.cpp
#include <cstdint>
#define API __declspec(dllexport)
extern "C" {
API const char* GetSection(const char* TilesetID, int32_t X0, int32_t Y0, int32_t X1,
int32_t Y1, uint8_t*& OutData, uint64_t& OutDataSize) {
OutData = new uint8_t[5] { 1, 2, 3, 4, 5 };
OutDataSize = 5;
return "hello";
}
API void Delete(uint8_t* OutData) {
delete [] OutData;
}
}
test.py
import ctypes as ct
dll = ct.CDLL('./test')
# Note change to 2nd to last argument.
dll.GetSection.argtypes = (ct.c_char_p, ct.c_int32, ct.c_int32, ct.c_int32, ct.c_int32,
                           ct.POINTER(ct.POINTER(ct.c_uint8)), ct.POINTER(ct.c_uint64))
dll.GetSection.restype = ct.c_char_p

def GetSection(tileid, x0, y0, x1, y1):
    output_data = ct.POINTER(ct.c_uint8)()
    output_size = ct.c_uint64()
    str_data = dll.GetSection(tileid, x0, y0, x1, y1,
                              ct.byref(output_data), ct.byref(output_size))
    out_data = output_data[:output_size.value]  # create a Python list of the data
    dll.Delete(output_data)  # can delete the data now
    return str_data, out_data
print(GetSection(b'id', 1, 2, 3, 4))
Output:
(b'hello', [1, 2, 3, 4, 5])
Currently, I'm playing around with a script that generates Julia sets and the Mandelbrot set and then uses pygame to render the points.
Essentially, the screen is mapped to a smaller coordinate system, bounded by -2.5, 2.5 on the x axis and -1, 1 on the y axis. Each pixel in this mapped range is then passed to a function that checks whether its complex-number equivalent is in the given set. This function returns the number of iterations it took to decide whether the number is in the set or not (or the max iterations).
Then, for each pixel, I know what colour to give it based on this iteration score, and I render the pixels one by one. This part of the process is really intensive and takes ~30 seconds to render, but can take much longer depending on the complexity of the set.
Here is the code for finding out if a passed complex number and complex coordinate are in the Julia set; this doesn't take long to compute at all, even when checking 1920 * 1080 pixels:
max_iter = 45

def julia(z, c):
    n = 0
    while abs(z) <= 2 and n < max_iter:
        z = z * z + c
        n += 1
    return n
Here is the code I use for pygame rendering; this is definitely where the problem lies:
size_ = 1920, 1080
re_ = -2.5, 2.5
im_ = -1, 1
surf = pygame.Surface(size_)
colour_gradient1 = [c, c1, c2, c3, ...]  # This is some list of colours generated by a gradient function

for x in range(0, size_[0]):
    for y in range(0, size_[1]):
        z = complex(re_[0] + (x / size_[0]) * (re_[1] - re_[0]),
                    im_[0] + (y / size_[1]) * (im_[1] - im_[0]))
        m = julia(z, c)
        colour = colour_gradient1[m]
        pygame.draw.rect(surf,
                         colour,
                         (x, y, 1, 1))
I think I understand why this is performance-intensive: neither pygame nor Python is really optimised for rendering to the screen like this. I'm currently trying to learn C++, and I understand it's better for stuff like this.
I also experimented with a zoom function, where I could select a box with the mouse and the script would render the selected area, but this is where the problem really stood out: as the zoomed-in fractals got more complex, the script took too long for the function to be usable.
So my question: is there a better way to render something like this in close to real time using Python, and maybe pygame? I'm open to using a different package, but if it's possible through pygame, that would be ideal.
Attached below are a couple of pictures of the generated sets:
Fractal-generating algorithms always slow down the further in you zoom, because ever more iterations are needed per pixel the deeper you go (before the bail-out is reached).
This is never going to be particularly fast in an interpreted language. Sure, you can tweak it to increase the speed a little, but it will never be "real time" (say, < 1 second per image) for all zoom levels.
If you want to continue in Python, you will have to accept that it's never going to be fast.
However, you could split the generation into bands handled by separate processes, each running on its own CPU core; see the sketch below. That will give you roughly an N-cores speed-up.
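A rough, self-contained sketch of that idea (julia is redefined from the question; the constant c, the worker count, and the split into four 270-row bands are placeholders, not tuned values):
from multiprocessing import Pool

max_iter = 45

def julia(z, c):
    n = 0
    while abs(z) <= 2 and n < max_iter:
        z = z * z + c
        n += 1
    return n

def render_band(args):
    # Iteration counts for rows y0..y1 of a 1920x1080 grid.
    y0, y1, c = args
    size_, re_, im_ = (1920, 1080), (-2.5, 2.5), (-1, 1)
    return [[julia(complex(re_[0] + (x / size_[0]) * (re_[1] - re_[0]),
                           im_[0] + (y / size_[1]) * (im_[1] - im_[0])), c)
             for x in range(size_[0])]
            for y in range(y0, y1)]

if __name__ == '__main__':
    c = complex(-0.8, 0.156)  # placeholder Julia constant
    bands = [(i * 270, (i + 1) * 270, c) for i in range(4)]  # 4 horizontal bands
    with Pool(4) as pool:  # one process per band
        rows = [r for band in pool.map(render_band, bands) for r in band]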
There are some optimisations that can be performed by detecting symmetry in the image and only calculating, say, half the pixels, because the other side is a mirror of it (like the horizontal axis through a zoomed-out Mandelbrot set); a sketch follows below. You could also refer to the source of the venerable Fractint program for examples of this.
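A minimal sketch of that mirror trick for the Mandelbrot set (a standalone toy, assuming the imaginary range is symmetric about zero; the mirror is exact only up to half-pixel alignment at the axis):
import numpy as np

max_iter = 45
width, height = 1920, 1080
re_, im_ = (-2.5, 2.5), (-1.0, 1.0)

def iterations_at(x, y):
    # Mandelbrot: z starts at 0 and c is the mapped pixel coordinate.
    c = complex(re_[0] + (x / width) * (re_[1] - re_[0]),
                im_[0] + (y / height) * (im_[1] - im_[0]))
    z = 0j
    n = 0
    while abs(z) <= 2 and n < max_iter:
        z = z * z + c
        n += 1
    return n

half = height // 2
counts = np.zeros((height, width), dtype=np.uint16)
for y in range(half):  # compute the top half only...
    for x in range(width):
        counts[y, x] = iterations_at(x, y)
counts[half:] = counts[half - 1::-1]  # ...and mirror it onto the bottom half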
...
Aside: I wrote one of these (drawing a Mandelbrot set) in C using the nVidia CUDA library, which spreads the calculation over the 1200-ish "CPUs" on the video card (a mid-range 2018 laptop). While it was quite fast, for sufficiently large images or deeply zoomed-in fractals it still became slow. There's just so much number-crunching involved.
(This question finally made me install PyOpenGL. So thanks!)
As far as I've seen, iterating over each pixel individually will never give good performance (not even in C++/C/Assembly). Vectorization (on the CPU) will help. What will really help is using the GPU's ability to apply one operation (a kernel) to a whole multi-dimensional array of elements in parallel. Specifically: using a fragment shader to calculate the colour of each pixel. But that means using a graphics API like OpenGL (or Vulkan/Direct3D), or a GPGPU/compute API like OpenCL (or CUDA).
If the resulting image is used within the graphics pipeline, it can stay on the GPU and be displayed directly from there. If it needs to be used e.g. in a GUI, saved to disk, or similar, it has to be brought from the GPU to the CPU (maybe via render-to-texture, reading the framebuffer, off-screen buffers, or other options that I don't know of).
import numpy as np
from OpenGL.GL import *
from OpenGL.GL import shaders
from OpenGL.GLUT import *
# Vertex shader: Pass through (no model-view-projection).
vsSrc = '''
#version 300 es
layout (location = 0) in vec4 posIn;
void main()
{
    gl_Position = posIn;
}
'''
# Fragment shader: Compute fractal color, per-pixel.
# en.wikipedia.org/wiki/Mandelbrot_set#Computer_drawings
fsSrc = '''
#version 300 es
precision mediump float;

out vec4 colorOut;

vec2 mapLinear(
    vec2 val,
    vec2 srcMin, vec2 srcMax,
    vec2 dstMin, vec2 dstMax
) {
    vec2 valNorm = (val - srcMin) / (srcMax - srcMin);
    return valNorm * (dstMax - dstMin) + dstMin;
}

void main()
{
    // Debugging: Return fixed color; see which pixels get it.
    //colorOut = vec4(0.0, 0.5, 0.0, 1.0);
    //return;

    // Originally, origin is top-left. Convert to Cartesian.
    vec2 pixelMin = vec2(0.0f, 720.0f);
    vec2 pixelMax = vec2(1280.0f, 0.0f);
    vec2 mbMin = vec2(-2.5f, -1.0f);
    vec2 mbMax = vec2(1.0f, 1.0f);
    vec2 mbExtent = mbMax - mbMin;
    vec2 mbCenter = mbMin + (mbExtent / 2.0f);
    vec2 fragMapped = mapLinear(
        gl_FragCoord.xy, pixelMin, pixelMax, mbMin, mbMax
    );

    float real = 0.0f;
    float imag = 0.0f;
    int iter = 0;
    const int maxIter = 500;
    while (
        ((real*real + imag*imag) < 4.0f) &&
        (iter < maxIter)
    ) {
        float realTemp = real*real - imag*imag + fragMapped.x;
        imag = 2.0f*real*imag + fragMapped.y;
        real = realTemp;
        ++iter;
    }

    // Using generated colors, instead of indexing a palette.
    // (Don't remember anymore where this came from,
    // or if it was a heuristic.)
    vec3 chosenColor;
    float iterNorm = float(iter) / float(maxIter);
    if (iterNorm > 0.5f) {
        float iterNormInverse = 1.0f - iterNorm;
        chosenColor = vec3(
            0.0f, iterNormInverse, iterNormInverse
        );
    }
    else {
        chosenColor = vec3(0.0f, iterNorm, iterNorm);
    }
    colorOut = vec4(chosenColor.xyz, 1.0f);
}
'''
def compileFractalShader():
    vs = shaders.compileShader(vsSrc, GL_VERTEX_SHADER)
    fs = shaders.compileShader(fsSrc, GL_FRAGMENT_SHADER)
    return shaders.compileProgram(vs, fs)
# Geometry: Just 2 triangles, covering the display surface.
# (So that the fragment shader runs for all surface pixels.)
def drawTriangles():
    topLeftTriangle = (
        1.0, 1.0, 0.0,
        -1.0, -1.0, 0.0,
        -1.0, 1.0, 0.0
    )
    bottomRightTriangle = (
        1.0, 1.0, 0.0,
        -1.0, -1.0, 0.0,
        1.0, -1.0, 0.0
    )
    verts = np.array(
        topLeftTriangle + bottomRightTriangle,
        dtype=np.float32
    )
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, verts)
    glEnableVertexAttribArray(0)
    glDrawArrays(GL_TRIANGLES, 0, 6)
def printShaderException(e):
    errorMsg, shaderSrc, shaderType = e.args
    print('Shader error message:')
    for line in errorMsg.split('\\n'): print(line)
    print('--')
    #print('Shader source:')
    #for line in shaderSrc[0].split(b'\n'): print(line)
    #print('--')
    print('Shader type:', shaderType)
WIDTH = 1280
HEIGHT = 720
glutInit()
glutInitWindowSize(WIDTH, HEIGHT)
glutCreateWindow('Fractals with fragment shaders.')
# Create shaders, after creating a window / opengl-context:
try: fractalShader = compileFractalShader()
except RuntimeError as e:
    printShaderException(e)
    exit()
glViewport(0, 0, WIDTH, HEIGHT)
glClearColor(0.5, 0.0, 0.5, 1.0)
def display():
    glClear(GL_COLOR_BUFFER_BIT)
    with fractalShader: drawTriangles()
    glutSwapBuffers()
glutDisplayFunc(display)
glutMainLoop()
This is entirely unoptimized. Also, as Kingsley wrote, zooming (not shown here) slowed things down even on the GPU (but, again: unoptimized).
I am working on a C/C++ DLL that uses OpenCV to perform some operations. In this example, I change the contrast of an image: I read the image in Python, pass it to the DLL to perform the operation, and get the result back in Python to display it. I am doing this using pointers to the first pixel of each image, but in Python I can't find a way to recreate the image correctly from this pointer.
I have already verified that the Mat object in C++ is continuous, and I checked the result saved from the DLL, which is correct. For me the problem is in Python, but I don't see where I'm doing something wrong.
The C++ class and function:
#pragma once
#include <vector>
#include <string>
#include <fstream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <thread>
using namespace cv;
using namespace std;
class EpsImageProcessing
{
    // -------------- Methods --------------
public:
    EpsImageProcessing();
    ~EpsImageProcessing();
    unsigned short * imAdjustContrast(void * ptrImg, int width, int height, int contrastValue);

    // -------------- Attributes --------------
    Mat imgResult;
    unsigned short *imgAdress;
};
unsigned short * EpsImageProcessing::imAdjustContrast(void * ptrImg, int width, int height, int contrastValue)
{
    // Get image and reshape it as Mat object
    Mat imgTemp = Mat(height, width, CV_8UC1, (uchar*)ptrImg);

    // Convert to double to perform calculations
    imgTemp.convertTo(imgTemp, CV_32FC1);

    // Calculate the contrast coefficient
    float coeff = (259*((float)contrastValue+255)) / (255*(259 - (float)contrastValue));

    // Change contrast
    imgTemp = coeff * (imgTemp - 128) + 128;

    // Convert image back to original type
    imgTemp.convertTo(imgTemp, CV_8UC1);

    // Return result
    imgResult = imgTemp.clone(); // imgResult is an attribute of the class of my DLL
    imwrite("imgAfter.jpg", imgResult);
    bool test = imgResult.isContinuous(); // returns true
    imgAdress = imgResult.ptr<ushort>();
    return imgAdress; //imgResult.ptr<ushort>(); // (unsigned short *)imgResult.data;
}
Then the C wrapper that links C++ to other languages like Python:
__declspec(dllexport) unsigned short* __stdcall imAdjustContrast(void* handle, void* imgPtr, int width, int height, int contrastValue)
{
    if (handle)
    {
        EpsImageProcessing* data = (EpsImageProcessing*)handle;
        return data->imAdjustContrast(imgPtr, width, height, contrastValue);
    }
    return false;
}
And the Python code :
from ctypes import *
import numpy, os, cv2
import matplotlib.pyplot as plt
dirpath = os.environ['PATH']
os.environ['PATH'] = dirpath + ";C:/x64/Debug/" # include of opencv_world.dll
mydll = cdll.LoadLibrary("MyDll.dll")
class mydllClass(object):
    def __init__(self, width, height, nFrame, path, filename):
        mydll.AllocateHandleImg.argtypes = []
        mydll.AllocateHandleImg.restype = c_void_p
        mydll.imAdjustContrast.argtypes = [c_void_p, c_void_p, c_int, c_int, c_int]
        mydll.imAdjustContrast.restype = POINTER(c_ushort)
        self.obj = mydll.AllocateHandleImg()

    def imAdjustContrast(self, ptrImg, width, height, contrast):
        return mydll.imAdjustContrast(self.obj, ptrImg, width, height, contrast)
img0 = cv2.imread("C:\\Users\\mg\\Downloads\\imgInit.jpg", 0)
imgC = myclass.imAdjustContrast(img0.__array_interface__['data'][0], img0.shape[1], img0.shape[0], -127)
imgAfter = cv2.imread("C:\\Users\\mg\\Downloads\\imgAfter.jpg", 0)
image = numpy.zeros((img0.shape[0], img0.shape[1]), dtype=numpy.dtype(numpy.uint8))
for i in range(img0.shape[0]):
    for j in range(img0.shape[1]):
        indice = i*img0.shape[1]+j
        image[i,j] = numpy.uint8(imgC[indice])
newImg = numpy.ctypeslib.as_array(cast(imgC, POINTER(c_uint8)), shape=(img0.shape))
plt.figure()
plt.subplot(221)
plt.imshow(imgAfter)
plt.gray()
plt.colorbar()
plt.title('image saved from C++ DLL')
plt.subplot(222)
plt.imshow(image)
plt.gray()
plt.colorbar()
plt.title('image recreated in Python (for loop)')
plt.subplot(223)
plt.imshow(newImg)
plt.gray()
plt.colorbar()
plt.title('image recreated in Python (cast)')
plt.show()
And the final result in Python is:
I found that the small difference between the two "good" images (the image saved from C++ and the one recreated in Python with the cast method) comes from the image compression (.jpg), which differs between Python and C++. When using a PNG instead, the image created in Python from the C++ pointer matches with the cast method.
So the problem now is about the two for loops, which don't recreate the image from the pointer correctly. Any idea?
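One thing worth checking, as an assumption based on the code above rather than a confirmed fix: the DLL declares its return as unsigned short* while the Mat holds 8-bit pixels (CV_8UC1), so each imgC[indice] reads two packed pixels as one 16-bit value. Casting the pointer to 8-bit elements first, which is exactly what the working cast method already does, makes the loop read one pixel per index:
import ctypes
import numpy

# Assumption: the buffer really holds CV_8UC1 data, so read it byte by byte.
buf = ctypes.cast(imgC, ctypes.POINTER(ctypes.c_uint8))
image = numpy.empty((img0.shape[0], img0.shape[1]), dtype=numpy.uint8)
for i in range(img0.shape[0]):
    for j in range(img0.shape[1]):
        image[i, j] = buf[i * img0.shape[1] + j]  # one byte per pixel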
I'm trying to write two Cython functions to wrap external functions. The functions are the inverse of one another: one accepts a string and returns a struct with two fields, a void pointer to a 2D array (the second dimension always has two elements: [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], …]) and the array's length; the other accepts the same struct and returns a string. So far, I've got the following. It compiles, but the cast to and from the nested list is definitely incorrect.
My .pxd:
cdef extern from "header.h":
    struct _FFIArray:
        void* data
        size_t len

    cdef _FFIArray decode_polyline_ffi(char* polyline, int precision);
    cdef char* encode_coordinates_ffi(_FFIArray, int precision);
    cdef void drop_float_array(_FFIArray coords);
    cdef void drop_cstring(char* polyline)
My .pyx:
import numpy as np
from pypolyline_p cimport (
    _FFIArray,
    decode_polyline_ffi,
    encode_coordinates_ffi,
    drop_float_array,
    drop_cstring
)

def encode_coordinates(coords, int precision):
    """ coords looks like [[1.0, 2.0], [3.0, 4.0], …] """
    cdef double[::1] ncoords = np.array(coords, dtype=np.float64)
    cdef _FFIArray coords_ffi
    # Wrong
    coords_ffi.data = <void*>&ncoords[0]
    # Wrong
    coords_ffi.len = ncoords.shape[0]
    cdef char* result = encode_coordinates_ffi(coords_ffi, precision)
    cdef bytes polyline = result
    drop_cstring(result)
    return polyline

def decode_polyline(bytes polyline, int precision):
    cdef char* to_send = polyline
    cdef _FFIArray result = decode_polyline_ffi(to_send, precision)
    # Wrong
    cdef double* incoming_ptr = <double*>(result.data)
    # Wrong
    cdef double[::1] view = <double[:result.len:1]>incoming_ptr
    coords = np.copy(view)
    drop_float_array(result)
    return coords
I think the issue is that you're trying to use 2D arrays with 1D memoryviews.
In the encoding function
# the coords are a 2D, C contiguous array
cdef double[:,::1] ncoords = np.array(coords, dtype=np.float64)
# ...
coords_ffi.data = <void*>&ncoords[0,0] # take the 0,0 element
# the rest stays the same
In the decoding function
# specify it as a 2D, len by 2, C contiguous array
cdef double[:,::1] view = <double[:result.len,:2:1]>incoming_ptr
# the rest stays the same
(It's possible that your FFI functions expect Fortran-contiguous arrays, in which case the ::1 goes on the first dimension of the memoryview, and you also change incoming_ptr.)
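Assembled, the encoding function would look something like this sketch (the answer's two changes dropped into the question's code; everything else is unchanged):
import numpy as np

def encode_coordinates(coords, int precision):
    # 2D, C-contiguous view over the [[x, y], ...] pairs
    cdef double[:,::1] ncoords = np.array(coords, dtype=np.float64)
    cdef _FFIArray coords_ffi
    coords_ffi.data = <void*>&ncoords[0,0]
    coords_ffi.len = ncoords.shape[0]  # number of pairs, not of doubles
    cdef char* result = encode_coordinates_ffi(coords_ffi, precision)
    cdef bytes polyline = result
    drop_cstring(result)
    return polyline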
I wrote some Python code that manages a lot of data and thus takes a lot of time. So I found out about Cython and began to change my code.
Basically, all I did was change the functions' declarations (cdef type name(arguments with variable types)), declare cdef variables with their types, and declare cdef classes.
I'm writing all the .pyx files with Eclipse, compiling with the command python setup.py build_ext --inplace, and running the result from Eclipse.
My issue is that, comparing Python speed with Cython speed, there isn't any difference.
I ran the command cython -a <file> to generate an HTML file, and there are a lot of yellow lines.
I don't know if I'm doing something wrong or should include something else, and I don't know how to get rid of these yellow lines.
I'll just paste some code lines, the part I'd like to speed up, because the full code is very long.
main.pyx
'''there are a lot of ndarray objects stored in a file and in this step I get each of them until there are no more items '''
cdef ReadWavePoints (WavePointManagement wavePointManagement, ColumnManagement columnManagement):
    cdef int runReadWavePoints

    wavePointManagement.OpenWavePointFileLoad(wavePointsFile)
    runReadWavePoints = 1

    while runReadWavePoints == 1:
        try:
            wavePointManagement.LoadWavePointFile()
            wavePointManagement.RoundCoordinates()
            wavePointManagement.SortWavePointList()
            GroupColumnsVoxels(wavePointManagement.GetWavePointList(), columnManagement)
        except:
            wavePointManagement.CloseWavePointFile()
            columnManagement.CloseWriteColumnFile()
            break
'''I check which points are in the same XYZ (voxel) and in the same XY (column)'''
cdef GroupColumnsVoxels (object wavePointList, ColumnManagement columnManagement):
    cdef int indexWavePointRef, indexWavePoint
    cdef int saved
    cdef double voxelValue
    cdef int sizeWavePointList

    sizeWavePointList = len(wavePointList)
    indexWavePointRef = 0
    while indexWavePointRef < sizeWavePointList - 1:
        saved = 0
        voxelValue = (wavePointList[indexWavePointRef]).GetValue()
        for indexWavePoint in xrange(indexWavePointRef + 1, len(wavePointList)):
            if (wavePointList[indexWavePointRef]).GetX() == (wavePointList[indexWavePoint]).GetX() and (wavePointList[indexWavePointRef]).GetY() == (wavePointList[indexWavePoint]).GetY():
                if (wavePointList[indexWavePointRef]).GetZ() == (wavePointList[indexWavePoint]).GetZ():
                    if voxelValue < (wavePointList[indexWavePoint]).GetValue():
                        voxelValue = (wavePointList[indexWavePoint]).GetValue()
                else:
                    saved = 1
                    CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), voxelValue)
                    indexWavePointRef = indexWavePoint
                    if indexWavePointRef == sizeWavePointList - 1:
                        CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), (wavePointList[indexWavePointRef]).GetValue())
                    break
            else:
                saved = 1
                CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), voxelValue)
                columnObject = columnInstance.Column((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY())
                columnManagement.AddColumn(columnObject)
                MaximumHeightColumn((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ())
                indexWavePointRef = indexWavePoint
                break
        if saved == 0:
            CheckVoxel((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ(), voxelValue)
            indexWavePointRef = indexWavePoint
            columnObject = columnInstance.Column((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY())
            columnManagement.AddColumn(columnObject)
            MaximumHeightColumn((wavePointList[indexWavePointRef]).GetX(), (wavePointList[indexWavePointRef]).GetY(), (wavePointList[indexWavePointRef]).GetZ())
'''I check if the data stored in a voxel is lower than the new one; if its the case, I store it'''
cdef CheckVoxel (double X, double Y, double Z, double newValue):
    cdef object bandVoxel, structvalCheckVoxel, out_str
    cdef tuple valueCheckVoxel

    bandVoxel = datasetVoxels.GetRasterBand(int(math.floor(Z/0.3))+1)
    structvalCheckVoxel = bandVoxel.ReadRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, buf_type=gdal.GDT_Float32)
    valueCheckVoxel = struct.unpack('f', structvalCheckVoxel)

    if newValue > valueCheckVoxel[0]:
        out_str = struct.pack('f', newValue)
        bandVoxel.WriteRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, out_str)
'''I check if this point has the highest Z and I store this information'''
cdef MaximumHeightColumn(double X, double Y, double newZ):
    cdef object bandMetricMaximumHeightColumn, structvalMaximumHeightColumn, out_strMaximumHeightColumn
    cdef tuple valueMaximumHeightColumn

    bandMetricMaximumHeightColumn = datasetMetrics.GetRasterBand(10)
    structvalMaximumHeightColumn = bandMetricMaximumHeightColumn.ReadRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, buf_type=gdal.GDT_Float32)
    valueMaximumHeightColumn = struct.unpack('f', structvalMaximumHeightColumn)

    if newZ > round(valueMaximumHeightColumn[0], 1):
        out_strMaximumHeightColumn = struct.pack('f', newZ)
        bandMetricMaximumHeightColumn.WriteRaster(int(math.floor((X-Xmin)/0.25)), int(math.floor((Ymax-Y)/0.25)), 1, 1, out_strMaximumHeightColumn)
WavePointManagement.pyx
'''this class serializes, rounds and sorts the points of each ndarray'''
import cPickle as pickle
import numpy as np
cimport numpy as np
import math
cdef class WavePointManagement(object):
    '''
    This class manages all the points extracted from the waveform
    '''
    cdef object fileObject, wavePointList
    __slots__ = ('wavePointList', 'fileObject')

    def __cinit__(self):
        '''
        Constructor
        '''
        self.fileObject = None
        self.wavePointList = np.array([])

    cdef object GetWavePointList(self):
        return self.wavePointList

    cdef void OpenWavePointFileLoad (self, object fileName):
        self.fileObject = file(fileName, 'rb')

    cdef void LoadWavePointFile (self):
        self.wavePointList = None
        self.wavePointList = pickle.load(self.fileObject)

    cdef void SortWavePointList (self):
        self.wavePointList = sorted(self.wavePointList, key=lambda k: (k.x, k.y, k.z))

    cdef void RoundCoordinates (self):
        cdef int indexPointObject, sizeWavePointList
        for pointObject in self.GetWavePointList():
            pointObject.SetX(round(math.floor(pointObject.GetX()/0.25)*0.25, 2))
            pointObject.SetY(round(math.ceil(pointObject.GetY()/0.25)*0.25, 2))
            pointObject.SetZ(round(math.floor(pointObject.GetZ()/0.3)*0.3, 1))

    cdef void CloseWavePointFile(self):
        self.fileObject.close()
setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy
ext = Extension("main", ["main.pyx"], include_dirs = [numpy.get_include()])
setup (ext_modules=[ext],
cmdclass = {'build_ext' : build_ext}
)
test_cython.py
'''this is the file I run with eclipse after compiling'''
from main import main
main()
How could I speed up this code?
Your code jumps back and forth between using numpy arrays and lists, so there is virtually no difference in the code that Cython will produce.
The following code produces a Python list, and the key function is a pure Python function as well:
self.wavePointList = sorted(self.wavePointList, key=lambda k: (k.x, k.y, k.z))
You will want to use ndarray.sort (or numpy.sort if you don't want to sort in place). To do this, you will also need to change how your objects are stored in the array; that is, you will need to use a structured array. See numpy.sort for examples of how to sort structured arrays, particularly the last two examples on the page.
Once you have your data stored in a numpy array, you need to tell Cython how the data is stored in the array. This includes providing type information and the dimensions of the array. This page provides more information on how to work efficiently with numpy arrays.
An example of how to create and sort structured arrays:
import numpy as np
cimport numpy as np
DTYPE = [('name', 'S10'), ('height', np.float64), ('age', np.int32)]
cdef packed struct Person:
    char name[10]
    np.float64_t height
    np.int32_t age

ctypedef Person DTYPE_t

def create_array():
    values = [('Arthur', 1.8, 41), ('Lancelot', 1.9, 38),
              ('Galahad', 1.7, 38)]
    return np.array(values, dtype=DTYPE)

cpdef sort_by_age_then_height(np.ndarray[DTYPE_t, ndim=1] arr):
    arr.sort(order=['age', 'height'])
Finally, you will need to convert your code from using Python methods to using the standard C library functions for a further speed-up. Below is an example using RoundCoordinates. cpdef means the function is also exposed to Python via a wrapper function.
cimport cython
cimport numpy as np
from libc.math cimport floor, ceil, round
import numpy as np
DTYPE = [('x', np.float64), ('y', np.float64), ('z', np.float64)]
cdef packed struct Point3D:
    np.float64_t x, y, z

ctypedef Point3D DTYPE_t

# Caution should be used when turning the bounds check off, as it can lead to
# undefined behaviour if you use an invalid index.
#@cython.boundscheck(False)
cpdef RoundCoordinates_cy(np.ndarray[DTYPE_t] pointlist):
    cdef int i
    cdef DTYPE_t point
    for i in range(len(pointlist)):  # this line is optimised into a c loop
        point = pointlist[i]         # creates a copy of the point
        point.x = round(floor(point.x/0.25)*2.5) / 10
        point.y = round(ceil(point.y/0.25)*2.5) / 10
        point.z = round(floor(point.z/0.3)*3) / 10
        pointlist[i] = point         # overwrites the old point data with the new data
Finally, before rewriting your entire code base, you should profile your code to see in which functions the program spends most of its time, and optimise those before bothering about the rest; a minimal profiling sketch follows.
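For instance, with the standard library's cProfile (a minimal sketch, assuming main() is the entry point as in test_cython.py above):
import cProfile
import pstats

from main import main

cProfile.run('main()', 'profile_stats')          # run and save timing data
stats = pstats.Stats('profile_stats')
stats.sort_stats('cumulative').print_stats(10)   # the 10 most expensive functions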