I'm trying to speed up a piece of code that generates all possible splits of a string.
splits('foo') -> [('f', 'oo'), ('fo', 'o'), ('foo', '')]
The code for this in python is very simple:
def splits(text):
    return [(text[:i + 1], text[i + 1:])
            for i in range(len(text))]
Is there a way to speed this up via cython or some other means? For context, the greater purpose of this code is to find the split of a string with the highest probability.
This isn't the sort of problem that Cython tends to help with much. It's using slicing, which ends up largely the same speed as pure Python (i.e. actually pretty good).
Using a 100 character long byte string (b'0'*100) and 10000 iterations in timeit I get:
Your code as written - 0.37s
Your code as written but compiled in Cython - 0.21s
Your code with the line cdef int i and compiled in Cython - 0.20s (this is reproducibly a small improvement. It's more significant with longer strings)
Your cdef int i and the parameter typed to bytes text - 0.28s (i.e. worse).
The best speed comes from using the Python C API directly (see code below) - 0.11s. I've chosen to do this mostly in Cython (but calling the API functions myself) for convenience, but you could write very similar code in C directly with a little more manual error checking. I've written this for the Python 3 API assuming you're using bytes objects (i.e. PyBytes instead of PyString), so if you're using Python 2, or Unicode and Python 3, you'll have to change it a little.
from cpython cimport *
cdef extern from "Python.h":
    # This isn't included in the cpython definitions
    # using PyObject* rather than object lets us control refcounting
    PyObject* Py_BuildValue(const char*,...) except NULL
def split(text):
    cdef Py_ssize_t l,i
    cdef char* s
    # Cython automatically checks the return value and raises an error if
    # these fail. This provides a type-check on text
    PyBytes_AsStringAndSize(text,&s,&l)
    output = PyList_New(l)
    for i in range(l):
        # PyList_SET_ITEM steals a reference
        # the casting is necessary to ensure that Cython doesn't
        # decref the result of Py_BuildValue
        PyList_SET_ITEM(output,i,
                        <object>Py_BuildValue('y#y#',s,i+1,s+i+1,l-(i+1)))
    return output
If you don't want to go all the way with using the C API then a version that preallocates the list output = [None]*len(text) and does a for-loop rather than a list comprehension is marginally more efficient than your original version - 0.18s
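A minimal sketch of that preallocated variant, together with a timeit harness using the same 100-byte test string, might look like the following. The name splits_prealloc is just an illustrative choice, and the 0.18s figure above appears to be for Cython-compiled code, so the relative ordering in pure Python may well differ.

```python
import timeit

def splits(text):
    # Original list-comprehension version from the question.
    return [(text[:i + 1], text[i + 1:]) for i in range(len(text))]

def splits_prealloc(text):
    # Preallocate the output list, then fill it in with a plain for-loop.
    n = len(text)
    output = [None] * n
    for i in range(n):
        output[i] = (text[:i + 1], text[i + 1:])
    return output

if __name__ == "__main__":
    text = b"0" * 100  # same 100-byte test string as in the timings above
    assert splits(text) == splits_prealloc(text)
    for func in (splits, splits_prealloc):
        elapsed = timeit.timeit(lambda: func(text), number=10000)
        print(f"{func.__name__}: {elapsed:.3f}s")
```

Both functions return the same list, so either can be dropped into a .pyx file and compiled unchanged to compare against the numbers above.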
In summary, just compiling it in Cython gives you a decent speed up (a bit less than 2x) and setting the type of i helps a little. This is all you can really achieve with Cython conventionally. To get full speed you basically need to resort to using the Python C API directly. That gets you a little under a 4x speed up which I think is pretty decent.
I am trying to convert a list of objects (GeoJSON) to shapely objects using Cython, but I am running into an error.
This piece of code seems to be the issue: cdef object result[N]. How do I declare a list/array from a given list?
Here is my current code:
def convert_geoms(list array):
    cdef int i, N=len(array)
    cdef double x, s=0.0
    cdef object result[N] # ERROR HERE
    for i in range(N):
        g = build_geometry_objects2(array[i])
        result[i] = g
    return result
There are two issues with cdef object result[N]:
It creates a C array of Python objects - this doesn't really work because C arrays aren't easily integrated with the Python object reference counting (in this case you'd need to copy the whole array to something else when you returned it anyway, since it's a local variable that's only scoped to the function).
For C arrays of the form sometype result[N], N must be known at compile-time. In this case N is different for each function call, so the variable definition is invalid anyway.
There are multiple solutions - most of them involve just accepting that you're using Python objects, so not worrying about specifying the types and just writing valid Python code. I'd probably write it as a list comprehension. I suspect Cython will do surprisingly well at producing optimized code for that:
return [ build_geometry_objects2(array[i]) for i in range(len(array)) ]
# or
return [ build_geometry_objects2(a) for a in array ]
The second version is probably better, but if it matters you can time it.
If the performance really matters you can use Python C API calls which you can cimport from cpython.list. See Cythonize list of all splits of a string for an example of something similar where list creation is optimized this way. The advantage of PyList_New is that it creates an appropriately sized list at the start filled with NULL pointers, which you can then fill in.
I am working on learning Cython (See How to Cythonize a Python Class with an attribute that is a function?)
I have the following function that I want to speed up, but Cython says both lines are going through python:
cdef selectindex(float fitnesspref):
    r = np.random.rand()
    return int(log(r) / log(fitnesspref))
So I need to get a random number (I originally just used the built-in Python random() call, but switched to NumPy hoping it would be faster or would Cythonize better). Then I need to take the log of two floats, divide them, and return the result as an integer. Nothing too difficult, and it should be a great question that people can then use later. I'm struggling to find simple solutions to all of this via the Cython docs or Google. I would have thought this was in every tutorial, but I'm not having a lot of luck.
Looking around though, I can't find a nice easy solution. For example: Canonical way to generate random numbers in Cython
Is there a best, but simpler, way to Cythonize a function like this?
As a side note, I had a similar problem with abs() but then after I changed everything to be cdefs it seemed to automatically switch to using the C version.
Here are the results of the HTML generated by Cython:
Another common one I keep bumping into is:
start = time.time()
i.e. is there a simple way to get start and end times using C code to speed that up? I have that line there in an inner loop, so it's slowing things down. But it's really important to what I'm trying to do.
Update 1:
I'm trying to follow the suggestions in the comments and they don't seem to work out like I'd expect (which is why this all confuses me in the first place.) For example, here is a c version of random that I wrote and compiled. Why is it still yellow?
Update 2:
Okay, I researched it further and here is why it won't compile all the way to C:
It's doing a check to make sure I'm not dividing by zero, even though it's a constant and can't ever be zero. How do you get rid of that?
For anyone that follows me, here were the final answers:
Many common functions, including len(), are already built in. If you switch to using C arrays, the code automatically compiles to C. See this link.
For the rest, the following imports were required:
cimport cython
from libc.stdlib cimport rand, RAND_MAX
from libc.math cimport log
To replace calls to random.random():
@cython.cdivision(True)
cdef float crandom() except -1:
    return <float>rand() / <float>RAND_MAX
To replace calls to random.randint():
@cython.cdivision(True)
cdef int crandint(int lower, int upper) except -1:
    return (rand() % (upper - lower + 1)) + lower
My selectindex function got rewritten as:
cdef int selectindex(float fitnesspref, int popsize) except -1:
    cdef float r
    cdef int val
    r = crandom() + 0.00000000001
    return <int>(log(r) / log(fitnesspref))
That 0.00000000001 was necessary because C and Python behave slightly differently here. The Python version apparently never returns exactly zero from the random calls, but crandom does every so often, and the log of zero is undefined. This might be because my C version only works with a limited number of ints as its starting point.
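The guard can be illustrated in plain Python, where math.log(0.0) raises a ValueError. This is only a sketch: select_index_from is a hypothetical helper that takes the random draw as a parameter so the zero case can be exercised on purpose, and EPSILON mirrors the 0.00000000001 offset above.

```python
import math

EPSILON = 1e-11  # mirrors the 0.00000000001 offset used in selectindex

def select_index_from(r, fitnesspref):
    # Hypothetical helper: r is a uniform draw in [0, 1], passed in explicitly
    # so that the r == 0 edge case can be tested deterministically.
    return int(math.log(r + EPSILON) / math.log(fitnesspref))

def log_of_zero_fails():
    # In Python, log(0) is reported as a domain error.
    try:
        math.log(0.0)
    except ValueError:
        return True
    return False

if __name__ == "__main__":
    print(log_of_zero_fails())
    print(select_index_from(0.0, 0.5))  # finite thanks to EPSILON
    print(select_index_from(0.2, 0.5))
```

The offset slightly biases the draw, which is harmless for picking an index but worth keeping in mind if the exact distribution matters.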
I never did come up with a way to replace time.time().
I hope this helps some newbie following after me.
I would like to use something like a structarray in cython, and I would like this structarray as easily accessible in python as in cython.
Based on a whim I used a recarray using a dtype that looks like the struct that I would like to use. Curiously, it just works and allows me to use a c structarray that, over the hood ;), is a numpy recarray for the python user.
Here is my example
# This is a "structarray in cython with numpy recarrays" testfile
import numpy as np
cimport numpy as np
# My structarray has nodes with fields x and y
# This also works without packed, but I have seen packed used in other places where people asked similar questions
# I assume that for two doubles that is equivalent but is necessary for int8s in between
cdef packed struct node:
    double x
    double y
# I suppose that would be the equivalent numpy dtype?
# Note: During compilation it warns me about double to float downcasts, but I do not see where
nodetype = [('x' , np.float64),('y', np.float64)]
def fun():
    # Make 10 element recarray
    # (Just looked it up. A point where 1-based indexing would save a look in the docs)
    mynode1 = np.recarray(10,dtype=nodetype)
    # Recarray with cdef struct
    mynode1 = np.recarray(10,dtype=nodetype)
    # Fill it with non-garbage somewhere
    mynode1[2].x=1.0
    mynode1[2].y=2.0
    # Brave: give recarray element to a c function assuming it's equivalent to the struct
    ny = cfuny(mynode1[2])
    assert ny==2.0 # works!
    # Test memoryview, assuming type node
    cdef node [:] nview = mynode1
    ny = cfunyv(nview,2)
    assert ny==2.0 # works!
    # This sets the numpy recarray value with a c function that gets a memoryview
    cfunyv_set(nview,5,9.0)
    assert mynode1[5].y==9.0 # also works!
    return 0
# return node element y from c struct node
cdef double cfuny(node n):
    return n.y
# give element i from memoryview of recarray to c function expecting a c struct
cdef double cfunyv(node [:] n, int i):
    return cfuny(n[i])
# write into recarray with a function expecting a memoryview with type node
cdef int cfunyv_set(node [:] n,int i,double val):
    n[i].y = val
    return 0
Of course I am not the first to try this.
Here for example the same thing is done, and it even states that this usage would be part of the manual here, but I cannot find this on the page. I suspect it was there at some point. There are also several discussions involving the use of strings in such a custom type (e.g. here), and from the answers I gather that the possibility of casting a recarray on a cstruct is intended behaviour, as the discussion talks about incorporating a regression test about the given example and having fixed the string error at some point.
My question
I could not find any documentation that states that this should work besides forum answers. Can someone show me where that is documented?
And, for some additional curiosity
Will this likely break at any point during the development of numpy or cython?
From the other forum entries on the subject it seems that packed is necessary for this to work once more interesting datatypes are part of the struct. I am not a compiler expert and have never used structure packing myself, but I suspect that whether a structure gets packed or not depends on the compiler settings. Does that mean that someone who compiles numpy without packing structures needs to compile this cython code without the packed?
This doesn't seem to be directly documented. Best reference I can give you is the typed memoryview docs here.
Rather than specific cython support for numpy structured dtypes this instead seems a consequence of support for the PEP 3118 buffer protocol. numpy exposes a Py_buffer struct for its arrays, and cython knows how to cast those into structs.
The packing is necessary. My understanding is that on x86 each member of a C struct is aligned on a boundary that is a multiple of its own size, whereas a numpy structured dtype is packed into the minimum space possible. Probably clearest by example:
%%cython
import numpy as np
cdef struct Thing:
    char a
    # 7 bytes padding, double must be 8 byte aligned
    double b
thing_dtype = np.dtype([('a', np.byte), ('b', np.double)])
print('dtype size: ', thing_dtype.itemsize)
print('unpacked struct size', sizeof(Thing))
dtype size: 9
unpacked struct size 16
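The same padded-versus-packed difference can be reproduced without Cython or numpy using Python's standard struct module. This is a swapped-in illustration rather than what Cython itself does: the '@' prefix asks for the platform's native alignment (like an ordinary C struct), while '=' asks for standard sizes with no padding (like a packed struct).

```python
import struct

# 'c' is a char and 'd' is a double, matching the Thing struct above.
# '@' uses the platform's native alignment, like an ordinary C struct.
native_size = struct.calcsize('@cd')
# '=' uses standard sizes with no padding, like a packed struct.
packed_size = struct.calcsize('=cd')

if __name__ == "__main__":
    print('native (padded) size:', native_size)
    print('packed size:', packed_size)
```

The packed size is always 9 bytes; the native size depends on the platform's alignment rules for double.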
Just answering the final sub-question:
From the other forum entries on the subject it seems that packed is necessary for this to work once more interesting datatypes are part of the struct. I am not a compiler expert and have never used structure packing myself, but I suspect that whether a structure gets packed or not depends on the compiler settings. Does that mean that someone who compiles numpy without packing structures needs to compile this cython code without the packed?
Numpy's behaviour is decided at runtime rather than compile-time. It will calculate the minimum amount of space a structure can need and allocate blocks of that. It won't be changed by any compiler settings so should be reliable.
cdef packed struct is therefore always needed to match numpy. However, it does not generate standards compliant C code. Instead, it uses extensions to GCC, MSVC (and others). Therefore it works fine on the major C compilers that currently exist, but in principle might fail on a future compiler. It looks like it should be possible to use the C11 standard alignas to achieve the same thing in a standards compliant way, so Cython could hopefully be modified to do that if needed.
I have a C++ library and I want to wrap some of its functionality in python.
The function splits the given character array into 5 parts. It does not actually split anything: we pass it a pointer to a structure, and after the function returns that structure contains the information about the parts. It holds 5 sub-structures, each containing 2 integers: one denoting the beginning of the part, and the other the part's length.
The Python wrapper should accept a Python string and return a dictionary or tuple of the 5 parts (also as Python strings).
My current approach of calling the function and then splitting the python string based on the sub-part information using python slicing syntax has not yielded any significant speed gains. I realize that there are many similar questions, but none of those cases have been helpful to me.
The Cython definition code is -
cdef extern from "parse.h" namespace "util":
    ctypedef struct part:
        int begin
        int len
    ctypedef struct Parsed:
        part part1
        part part2
        part part3
        part part4
        part part5
    void ParseFunc(const char* url, int url_len, Parsed* parsed)
The Cython code is -
cimport parseDef
def parse(url, url_len):
    cdef parseDef.Parsed parsed
    parseDef.parseFunc(url, url_len, &parsed)
    part1 = url[parsed.part1.begin:parsed.part1.begin+parsed.part1.len]
    #similar code for other parts
    return (part1, part2, part3, part4, part5)
Typical string size for this wrapper will be 10-50 characters.
You can get a small benefit from doing the indexing on const char* instead of the string:
cimport parseDef
def parse(url, url_len):
    cdef const char* url_as_char_ptr = url # automatic conversion
    cdef parseDef.Parsed parsed
    parseDef.parseFunc(url, url_len, &parsed)
    part1 = url_as_char_ptr[parsed.part1.begin:parsed.part1.begin+parsed.part1.len]
    #similar code for other parts
    return (part1, part2, part3, part4, part5)
I don't think you can beat this by much, mostly because the C code generated is actually pretty efficient. The indexing line is translated to something like
__pyx_t_2 = __Pyx_PyBytes_FromStringAndSize(__pyx_v_url_as_char_ptr + idx1, idx2 - idx1)
(noting that I've replaced parsed.part1.begin with idx1 just for the sake of readability and because I'm testing this with slightly different code since I don't have parseFunc. You can check your exact code with cython -a yourfile.pyx and looking at the html output).
This is basically just calling the Python c-api string constructor function. That will necessarily make a copy of the string it is passed, but you can't avoid that (the Python string constructor always makes a copy). That doesn't leave a lot of overhead to remove.
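For reference, the slicing step itself can be sketched in plain Python. Here extract_parts is a hypothetical helper, and the (begin, length) pairs stand in for the part1..part5 fields of Parsed; the offsets are made up for illustration and do not come from the real parser.

```python
def extract_parts(url, parts):
    # parts is a sequence of (begin, length) pairs, standing in for the
    # part1..part5 fields of the Parsed struct.
    return tuple(url[begin:begin + length] for begin, length in parts)

if __name__ == "__main__":
    url = b"http://example.com/path"
    # Hypothetical offsets, as a parser might report them.
    parts = [(0, 4), (7, 11), (18, 5)]
    print(extract_parts(url, parts))
```

Each slice here builds a new bytes object, which is the same per-part copy that the generated C code makes.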
I have a dictionary:
my_dict = {'a':[1,2,3], 'b':[4,5] , 'c':[7,1,2]}
I want to use this dictionary inside a Cython nogil function, so I tried to declare it as
cdef dict cy_dict = my_dict
Up to this stage everything is fine.
Now I need to iterate over the keys of my_dict and, if a value is a list, iterate over it. In Python, it is quite easy as follows:
for key in my_dict:
    if isinstance(my_dict[key], (list, tuple)):
        ###### Iterate over the value of the list or tuple
        for value in my_dict[key]:
            ## Do some other operation.
            pass
But I want to implement the same thing in Cython, inside nogil. As Python objects are not allowed inside nogil, I am stuck here.
with nogil:
    #### same implementation of the same in Cython
Can anyone please help me out?
You can't use a Python dict without the GIL because everything you could do with it involves manipulating Python objects. Your most sensible option is to accept that you need the GIL. There's a less sensible option too involving C++ maps, but it may be hard to apply for your specific case.
You can use with gil: to reacquire the GIL. There is obviously an overhead here (parts using the GIL can't be executed in parallel, and there may be a delay while it waits for the GIL). However, if the dictionary manipulation is a small chunk of a larger piece of Cython code this may not be too bad:
with nogil:
    # some large chunk of computationally intensive code goes here
    with gil:
        # your dictionary code
    # more computationally intensive stuff here
The other less sensible option is to use C++ maps (alongside other C++ standard library data types). Cython can wrap these and automatically convert them. To give a trivial example based on your example data:
from libcpp.map cimport map
from libcpp.string cimport string
from libcpp.vector cimport vector
from cython.operator cimport dereference, preincrement
def f():
    my_dict = {'a':[1,2,3], 'b':[4,5] , 'c':[7,1,2]}
    # the following conversion has a computational cost to it
    # and must be done with the GIL. Depending on your design
    # you might be able to ensure it's only done once so that the
    # cost doesn't matter much
    cdef map[string,vector[int]] m = my_dict
    # cdef statements can't go inside no gil, but much of the work can
    cdef map[string,vector[int]].iterator end = m.end()
    cdef map[string,vector[int]].iterator it = m.begin()
    cdef int total_length = 0
    with nogil: # all this stuff can now go inside nogil
        while it != end:
            total_length += dereference(it).second.size()
            preincrement(it)
    print(total_length)
(you need to compile this with language='c++').
The obvious disadvantage to this is that the data-types inside the dict must be known in advance (it can't be an arbitrary Python object). However, since you can't manipulate arbitrary Python objects inside a nogil block you're pretty restricted anyway.
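For comparison, the same total computed on the ordinary dict (with the GIL held) is only a few lines of Python. The function name total_list_length is an illustrative choice; the isinstance check follows the one in the question.

```python
def total_list_length(d):
    # Sum the lengths of every list or tuple value; other values are skipped.
    total = 0
    for value in d.values():
        if isinstance(value, (list, tuple)):
            total += len(value)
    return total

if __name__ == "__main__":
    my_dict = {'a': [1, 2, 3], 'b': [4, 5], 'c': [7, 1, 2]}
    print(total_list_length(my_dict))  # prints 8
```

If this is all the dictionary work amounts to, timing this version first gives a baseline for deciding whether the C++ conversion is worth its cost.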
Addendum, 6 years later: I don't recommend the "use C++ objects everywhere" approach as a general approach. The Cython-C++ interface is a bit clunky and you can spend a lot of time working around it. The Python containers are actually better than you think. Everyone tends to forget about the cost of converting their C++ objects to/from Python objects. People rarely consider if they really need to release the GIL or if they just read an article on the internet somewhere saying that the GIL is bad.
It's good for some tasks, but think carefully before blindly replacing all your list with vector, dict with map, etc. As a rule, if your C++ types live entirely within your function it may be a good move (but think twice...). If they're being converted as input or output arguments then think a third time.