I'm trying to use Cython to speed up my code. Since I'm working with an array of strings, I want to use string and vector from C++. But I have problems compiling when I cimport C++ libraries. As an example, I tried to implement the primes example from here: https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html.
So, my code is
from libcpp.vector cimport vector

def primes(unsigned int nb_primes):
    cdef int n, i
    cdef vector[int] p
    p.reserve(nb_primes)  # allocate memory for 'nb_primes' elements.

    n = 2
    while p.size() < nb_primes:  # size() for vectors is similar to len()
        for i in p:
            if n % i == 0:
                break
        else:
            p.push_back(n)  # push_back is similar to append()
        n += 1

    # Vectors are automatically converted to Python
    # lists when converted to Python objects.
    return p
I save this code as 'test_char.pyx'. For compilation I use this setup.py:
from distutils.core import setup
from Cython.Build import cythonize

setup(name='test_char',
      ext_modules=cythonize('test_char.pyx')
)
After that I get test_char.c, but I don't get a compiled test_char module.
If I use this code instead (without the cimport):
def primes(int nb_primes):
    cdef int n, i, len_p
    cdef int p[1000]

    if nb_primes > 1000:
        nb_primes = 1000

    len_p = 0  # The current number of elements in p.
    n = 2
    while len_p < nb_primes:
        # Is n prime?
        for i in p[:len_p]:
            if n % i == 0:
                break
        # If no break occurred in the loop, we have a prime.
        else:
            p[len_p] = n
            len_p += 1
        n += 1

    # Let's return the result in a python list:
    result_as_list = [prime for prime in p[:len_p]]
    return result_as_list
everything works fine. So, please, any ideas?
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

extensions = [
    Extension("test_char", ["test_char.pyx"],
              language="c++")
]

setup(
    name="test_char",
    ext_modules=cythonize(extensions),
)

This solves the problem: language="c++" makes Cython generate C++ code instead of C, which the libcpp cimports require.
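For what it's worth, the same trial-division logic in plain Python can serve as a reference to check the compiled module against (a sketch; the name primes_py is mine, not from the tutorial):

```python
def primes_py(nb_primes):
    """Pure-Python reference for the Cython primes() above:
    collect the first nb_primes primes by trial division."""
    p = []
    n = 2
    while len(p) < nb_primes:
        if all(n % i != 0 for i in p):  # no smaller prime divides n
            p.append(n)
        n += 1
    return p

print(primes_py(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

After building the extension, comparing `primes(10)` from test_char against `primes_py(10)` confirms the compiled version behaves the same.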
This is for a school project. I need to create a function using recursion to convert an integer to a binary string. It must return a str, not an int. The base case is n == 0, and then '0' would need to be returned. There must be a base case like this, but this is where I think I am getting the extra 0 from (I could be wrong). I am using Python 3.6 with IDLE and the shell to execute it.
The function works just fine, except for this additional zero that I need gone.
Here is my function, dtobr:
def dtobr(n):
    """
    (int) -> (str)

    This function has the parameter n, which is a non-negative integer,
    and it will return the string of 0/1's
    which is the binary representation of n. No side effects.
    Returns binary string as mentioned. This is like the function
    dtob (decimal to binary) but this is using recursion.

    Examples:
    >>> dtob(27)
    '11011'
    >>> dtob(0)
    '0'
    >>> dtob(1)
    '1'
    >>> dtob(2)
    '10'
    """
    if n == 0:
        return str(0)
    return dtobr(n // 2) + str(n % 2)
This came from the function I already wrote, which converts just fine but without recursion. For reference, I will include that code as well; it is not what I need for this project, and there are no errors with it:
def dtob(n):
    """
    (int) -> (str)

    This function has the parameter n, which is a non-negative integer,
    and it will return the string of 0/1's
    which is the binary representation of n. No side effects.
    Returns binary string as mentioned.

    Examples:
    >>> dtob(27)
    '11011'
    >>> dtob(0)
    '0'
    >>> dtob(1)
    '1'
    >>> dtob(2)
    '10'
    """
    string = ""
    if n == 0:
        return str(0)
    while n > 0:
        remainder = n % 2
        string = str(remainder) + string
        n = n // 2
    return string
Hopefully someone can help me get rid of that additional left-hand zero. Thanks!
You need to change the base case so the recursion handles both the n // 2 and n % 2 parts:

if n <= 1:
    return str(n)  # per @pault's suggestion, only str(n) is needed instead of str(n % 2)
else:
    return dtobr(n // 2) + dtobr(n % 2)

Test case:

for i in [0, 1, 2, 27]:
    print(dtobr(i))

# 0
# 1
# 10
# 11011
FYI you can easily convert to binary format like so:

'{0:b}'.format(x)  # where x is your number
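For completeness, here is the corrected recursive function with the fix applied, runnable as-is (a sketch combining the pieces above):

```python
def dtobr(n):
    """Recursively convert a non-negative integer to its binary string."""
    if n <= 1:
        return str(n)                    # base case covers both 0 and 1
    return dtobr(n // 2) + dtobr(n % 2)  # high bits first, then the last bit

print(dtobr(27))        # 11011
print(format(27, 'b'))  # 11011, the built-in equivalent
```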
Since there is already an answer that points out and resolves the issue with the recursive approach, let's look at some other interesting ways to achieve the same goal.

Let's define a generator that gives us the bits of the number one at a time.

def to_binary(n):
    if n == 0: yield "0"
    while n > 0:
        yield str(n % 2)  # least-significant bit first
        n //= 2
Then you can use this iterable to get the decimal-to-binary conversion in multiple ways. Note that the generator yields the bits least-significant first, so they have to be assembled in reverse order.

Example 1.

reduce is used to concatenate the chars received from the to_binary iterable (generator); prepending each new char (y + x instead of x + y) restores the most-significant-bit-first order.

from functools import reduce

def to_binary(n):
    if n == 0: yield "0"
    while n > 0:
        yield str(n % 2)  # least-significant bit first
        n //= 2

print(reduce(lambda x, y: y + x, to_binary(0)))   # 0
print(reduce(lambda x, y: y + x, to_binary(15)))  # 1111
print(reduce(lambda x, y: y + x, to_binary(27)))  # 11011
Example 2.

join takes the iterable, unrolls it, and joins the pieces with ''; the [::-1] then reverses the string so the most significant bit comes first.

def to_binary(n):
    if n == 0: yield "0"
    while n > 0:
        yield str(n % 2)  # least-significant bit first
        n //= 2

print(''.join(to_binary(0))[::-1])   # 0
print(''.join(to_binary(1))[::-1])   # 1
print(''.join(to_binary(15))[::-1])  # 1111
print(''.join(to_binary(27))[::-1])  # 11011
I have implemented a Mersenne Twister in Python using the example code below, and while it works as intended, I am not clear on how to limit the results returned to a range of integers. For example, if I wanted to use this mechanism to determine the value of a dice roll, I would (right now, in an incredibly inefficient manner) iterate through potential results from the MT until something falls within the set. There has to be a much more efficient way to get a value to fall within a set range, like 1-20.

Currently, the algorithm returns a random set of numbers that don't come anywhere close to that range:
2840889030
2341262508
2626522481
893458501
1134227444
3424236607
4171927007
1414775506
318984778
811882651
1509520423
1796453323
571461449
2606098999
2100002233
202969379
2318195635
1583585513
863717092
1218132929
1044954980
2997947229
867650808
177016714
2532350044
2917724494
2789913671
2793703767
1477382755
2552234519
2230774266
956596469
1165204853
1261233074
1856099289
21274564
1867584221
200970721
2112891842
139474834
93227265
1919721548
1026587194
30693196
3114464709
2194502660
2235520335
1877205724
1093736467
3136329929
1838505684
1358237877
2394536120
1268347552
1222927042
2982839076
1155599683
1943346953
3778719619
1483759762
3227630028
2775862513
2991889829
4252811853
995611629
626323532
3895812866
4027023347
3778533921
3840271846
4289281429
2263887842
402963991
2957069652
238880521
3643974307
472466724
3309455978
3588191581
1390613042
290666747
1375502175
1172854301
2159248842
3279978887
2206149102
804187781
3811948116
4134597627
1556281173
2590972812
3291094915
1836658937
3721612785
365099684
3884686172
2966532828
3609464378
1672431128
3959413372
For testing purposes I implemented the following test logic (part of __main__), and so far it just runs forever.
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Based on the pseudocode in https://en.wikipedia.org/wiki/Mersenne_Twister.
Generates uniformly distributed 32-bit integers in the range [0, 2^32 - 1]
with the MT19937 algorithm.

Yaşar Arabacı <yasar11732 et gmail nokta com>
"""

# Create a length 624 list to store the state of the generator
MT = [0 for i in xrange(624)]
index = 0

# To get last 32 bits
bitmask_1 = (2 ** 32) - 1

# To get the 32nd bit
bitmask_2 = 2 ** 31

# To get last 31 bits
bitmask_3 = (2 ** 31) - 1

def initialize_generator(seed):
    "Initialize the generator from a seed"
    global MT
    global bitmask_1
    MT[0] = seed
    for i in xrange(1, 624):
        MT[i] = ((1812433253 * MT[i-1]) ^ ((MT[i-1] >> 30) + i)) & bitmask_1

def extract_number():
    """
    Extract a tempered pseudorandom number based on the index-th value,
    calling generate_numbers() every 624 numbers
    """
    global index
    global MT
    if index == 0:
        generate_numbers()
    y = MT[index]
    y ^= y >> 11
    y ^= (y << 7) & 2636928640
    y ^= (y << 15) & 4022730752
    y ^= y >> 18

    index = (index + 1) % 624
    return y

def generate_numbers():
    "Generate an array of 624 untempered numbers"
    global MT
    for i in xrange(624):
        y = (MT[i] & bitmask_2) + (MT[(i + 1) % 624] & bitmask_3)
        MT[i] = MT[(i + 397) % 624] ^ (y >> 1)
        if y % 2 != 0:
            MT[i] ^= 2567483615

if __name__ == "__main__":
    from datetime import datetime
    now = datetime.now()
    solved = False
    initialize_generator(now.microsecond)
    #for i in xrange(10):
    #    "Print 10 random numbers as an example"
    while solved != True:
        generated_number = extract_number()
        while generated_number <= 20 and generated_number >= 1:
            print generated_number
            solved = True
Any advice on how to implement this? It appears the loop may never even get a chance to produce a number within the predefined range.
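One common way to squeeze a 32-bit output into a small range like 1-20 is modulo reduction with rejection sampling, which avoids both waiting for a raw draw to land in the range and the slight bias a bare modulo introduces. A sketch (the names are mine; random.getrandbits(32) merely stands in for extract_number()):

```python
import random

def to_range(draw32, lo, hi):
    """Map draws from a 32-bit generator into [lo, hi] inclusive."""
    span = hi - lo + 1
    # Largest multiple of span representable in 32 bits; draws at or
    # above this limit would bias the result, so redraw instead.
    limit = (2 ** 32 // span) * span
    while True:
        x = draw32()
        if x < limit:
            return lo + x % span

# random.getrandbits(32) stands in for extract_number() here.
roll = to_range(lambda: random.getrandbits(32), 1, 20)
print(roll)  # some value between 1 and 20
```

For span = 20 the limit is 4294967280 out of 2^32, so redraws are extremely rare (about 4 in a billion).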
I want to pick a random integer between a and b, inclusive.
I know 3 ways of doing it. However, their performance seems very counter-intuitive:
import timeit
t1 = timeit.timeit("n=random.randint(0, 2)", setup="import random", number=100000)
t2 = timeit.timeit("n=random.choice([0, 1, 2])", setup="import random", number=100000)
t3 = timeit.timeit("n=random.choice(ar)", setup="import random; ar = [0, 1, 2]", number=100000)
[print(t) for t in [t1, t2, t3]]
On my machine, this gives:
0.29744589625620965
0.19716156798482648
0.17500512311108346
Using an online interpreter, this gives:
0.23830216699570883
0.16536146598809864
0.15081614299560897
Note how the most direct version (#1), which uses the dedicated function for exactly what I'm doing, is 50% slower than the strangest version (#3), which pre-defines an array and then chooses randomly from it.
What's going on?
It's just implementation details: randint delegates to randrange, so it has another layer of function-call overhead, and randrange goes through a lot of argument checking and other crud. In contrast, choice is a really simple one-liner.
Here's the code path randint goes through for this call, with comments and unexecuted code stripped out:
def randint(self, a, b):
    return self.randrange(a, b+1)

def randrange(self, start, stop=None, step=1, _int=int, _maxwidth=1L<<BPF):
    istart = _int(start)
    if istart != start:
        # not executed
    if stop is None:
        # not executed

    istop = _int(stop)
    if istop != stop:
        # not executed
    width = istop - istart
    if step == 1 and width > 0:
        if width >= _maxwidth:
            # not executed
        return _int(istart + _int(self.random()*width))
And here's the code path choice goes through:
def choice(self, seq):
    return seq[int(self.random() * len(seq))]
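A quick way to check this explanation (a sketch; absolute timings are machine-dependent, so only the relative order is interesting) is to inline the same arithmetic that choice performs, skipping the method layers entirely:

```python
import timeit

t_randint = timeit.timeit("random.randint(0, 2)", setup="import random", number=100000)
t_choice  = timeit.timeit("random.choice([0, 1, 2])", setup="import random", number=100000)
# The same arithmetic choice() performs internally, without the call layers:
t_inline  = timeit.timeit("int(random.random() * 3)", setup="import random", number=100000)

print(t_randint, t_choice, t_inline)
```

On a typical run the inline version comes out fastest and randint last, matching the layering described above.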
I rewrote the original radix sort algorithm for Python from Wikipedia using arrays from SciPy, to gain performance and to reduce code length, which I managed to accomplish. Then I took the classic (in-memory, pivot-based) quicksort algorithm from Literate Programming and compared their performance.

I had the expectation that radix sort would beat quicksort beyond a certain threshold, which it did not. Furthermore, I found Erik Gorset's blog post asking "Is radix sort faster than quick sort for integer arrays?". There the answer is that

.. the benchmark shows the MSB in-place radix sort to be consistently over 3 times faster than quicksort for large arrays.

Unfortunately, I could not reproduce the result; the differences are that (a) Erik chose Java, not Python, and (b) he uses an MSB in-place radix sort, whereas I just fill buckets inside a Python dictionary.

According to theory, radix sort should be faster (linear) than quicksort; but apparently it depends a lot on the implementation. So where is my mistake?
Here is the code comparing both algorithms:
from sys import argv
from time import clock

from pylab import array, vectorize
from pylab import absolute, log10, randint
from pylab import semilogy, grid, legend, title, show

###############################################################################
# radix sort
###############################################################################

def splitmerge0 (ls, digit): ## python (pure!)
    seq = map (lambda n: ((n // 10 ** digit) % 10, n), ls)
    buf = {0:[], 1:[], 2:[], 3:[], 4:[], 5:[], 6:[], 7:[], 8:[], 9:[]}
    return reduce (lambda acc, key: acc.extend(buf[key]) or acc,
        reduce (lambda _, (d,n): buf[d].append (n) or buf, seq, buf), [])

def splitmergeX (ls, digit): ## python & numpy
    seq = array (vectorize (lambda n: ((n // 10 ** digit) % 10, n)) (ls)).T
    buf = {0:[], 1:[], 2:[], 3:[], 4:[], 5:[], 6:[], 7:[], 8:[], 9:[]}
    return array (reduce (lambda acc, key: acc.extend(buf[key]) or acc,
        reduce (lambda _, (d,n): buf[d].append (n) or buf, seq, buf), []))

def radixsort (ls, fn = splitmergeX):
    return reduce (fn, xrange (int (log10 (absolute (ls).max ()) + 1)), ls)

###############################################################################
# quick sort
###############################################################################

def partition (ls, start, end, pivot_index):
    lower = start
    upper = end - 1
    pivot = ls[pivot_index]
    ls[pivot_index] = ls[end]
    while True:
        while lower <= upper and ls[lower] < pivot: lower += 1
        while lower <= upper and ls[upper] >= pivot: upper -= 1
        if lower > upper: break
        ls[lower], ls[upper] = ls[upper], ls[lower]
    ls[end] = ls[lower]
    ls[lower] = pivot
    return lower

def qsort_range (ls, start, end):
    if end - start + 1 < 32:
        insertion_sort(ls, start, end)
    else:
        pivot_index = partition (ls, start, end, randint (start, end))
        qsort_range (ls, start, pivot_index - 1)
        qsort_range (ls, pivot_index + 1, end)
    return ls

def insertion_sort (ls, start, end):
    for idx in xrange (start, end + 1):
        el = ls[idx]
        for jdx in reversed (xrange(0, idx)):
            if ls[jdx] <= el:
                ls[jdx + 1] = el
                break
            ls[jdx + 1] = ls[jdx]
        else:
            ls[0] = el
    return ls

def quicksort (ls):
    return qsort_range (ls, 0, len (ls) - 1)

###############################################################################
if __name__ == "__main__":
###############################################################################

    lower = int (argv [1]) ## requires: >= 2
    upper = int (argv [2]) ## requires: >= 2

    color = dict (enumerate (3*['r','g','b','c','m','k']))

    rslbl = "radix sort"
    qslbl = "quick sort"

    for value in xrange (lower, upper):

        #######################################################################

        ls = randint (1, value, size=value)

        t0 = clock ()
        rs = radixsort (ls)
        dt = clock () - t0

        print "%06d -- t0:%0.6e, dt:%0.6e" % (value, t0, dt)
        semilogy (value, dt, '%s.' % color[int (log10 (value))], label=rslbl)

        #######################################################################

        ls = randint (1, value, size=value)

        t0 = clock ()
        rs = quicksort (ls)
        dt = clock () - t0

        print "%06d -- t0:%0.6e, dt:%0.6e" % (value, t0, dt)
        semilogy (value, dt, '%sx' % color[int (log10 (value))], label=qslbl)

    grid ()
    legend ((rslbl,qslbl), numpoints=3, shadow=True, prop={'size':'small'})
    title ('radix & quick sort: #(integer) vs duration [s]')
    show ()

###############################################################################
###############################################################################
And here is the result, comparing sorting durations in seconds (logarithmic vertical axis) for integer arrays with sizes ranging from 2 to 1250 (horizontal axis); the lower curve belongs to quicksort:
Radix vs Quick Sort Comparison
Quicksort is smooth at the power-of-ten boundaries (e.g. at 10, 100 or 1000), while radix sort jumps a little there but otherwise follows qualitatively the same path as quicksort, just much slower!
You have several problems here.
First of all, as pointed out in the comments, your data set is far too small for the theoretical complexity to overcome the overheads in the code.
Next, your implementation, with all those unnecessary function calls and lists being copied around, is very inefficient. Writing the code in a straightforward procedural manner will almost always be faster than a functional solution (in Python, that is; other languages differ here). You have a procedural implementation of quicksort, so if you write your radix sort in the same style it may turn out faster even for small lists.

Finally, it may be that when you do try large lists, the overheads of memory management begin to dominate. That means you have a limited window between small lists, where the efficiency of the implementation is the dominant factor, and large lists, where memory management dominates.

Here's some code that uses your quicksort alongside a simple radix sort written procedurally, trying to avoid so much copying of data. You'll see that even for short lists it beats the quicksort; more interestingly, as the data size goes up, so does the ratio between quicksort and radix sort, and then it begins to drop again as memory management starts to dominate (simple things like freeing a list of 1,000,000 items take a significant time):
from random import randint
from math import log10
from time import clock
from itertools import chain

def splitmerge0 (ls, digit): ## python (pure!)
    seq = map (lambda n: ((n // 10 ** digit) % 10, n), ls)
    buf = {0:[], 1:[], 2:[], 3:[], 4:[], 5:[], 6:[], 7:[], 8:[], 9:[]}
    return reduce (lambda acc, key: acc.extend(buf[key]) or acc,
        reduce (lambda _, (d,n): buf[d].append (n) or buf, seq, buf), [])

def splitmerge1 (ls, digit): ## python (readable!)
    buf = [[] for i in range(10)]
    divisor = 10 ** digit
    for n in ls:
        buf[(n//divisor)%10].append(n)
    return chain(*buf)

def radixsort (ls, fn = splitmerge1):
    return list(reduce (fn, xrange (int (log10 (max(abs(val) for val in ls)) + 1)), ls))

###############################################################################
# quick sort
###############################################################################

def partition (ls, start, end, pivot_index):
    lower = start
    upper = end - 1
    pivot = ls[pivot_index]
    ls[pivot_index] = ls[end]
    while True:
        while lower <= upper and ls[lower] < pivot: lower += 1
        while lower <= upper and ls[upper] >= pivot: upper -= 1
        if lower > upper: break
        ls[lower], ls[upper] = ls[upper], ls[lower]
    ls[end] = ls[lower]
    ls[lower] = pivot
    return lower

def qsort_range (ls, start, end):
    if end - start + 1 < 32:
        insertion_sort(ls, start, end)
    else:
        pivot_index = partition (ls, start, end, randint (start, end))
        qsort_range (ls, start, pivot_index - 1)
        qsort_range (ls, pivot_index + 1, end)
    return ls

def insertion_sort (ls, start, end):
    for idx in xrange (start, end + 1):
        el = ls[idx]
        for jdx in reversed (xrange(0, idx)):
            if ls[jdx] <= el:
                ls[jdx + 1] = el
                break
            ls[jdx + 1] = ls[jdx]
        else:
            ls[0] = el
    return ls

def quicksort (ls):
    return qsort_range (ls, 0, len (ls) - 1)

if __name__=='__main__':
    for value in 1000, 10000, 100000, 1000000, 10000000:
        ls = [randint (1, value) for _ in range(value)]
        ls2 = list(ls)
        last = -1
        start = clock()
        ls = radixsort(ls)
        end = clock()
        for i in ls:
            assert last <= i
            last = i
        print("rs %d: %0.2fs" % (value, end-start))
        tdiff = end-start
        start = clock()
        ls2 = quicksort(ls2)
        end = clock()
        last = -1
        for i in ls2:
            assert last <= i
            last = i
        print("qs %d: %0.2fs %0.2f%%" % (value, end-start, ((end-start)/tdiff*100)))
The output when I run this is:
C:\temp>c:\python27\python radixsort.py
rs 1000: 0.00s
qs 1000: 0.00s 212.98%
rs 10000: 0.02s
qs 10000: 0.05s 291.28%
rs 100000: 0.19s
qs 100000: 0.58s 311.98%
rs 1000000: 2.47s
qs 1000000: 7.07s 286.33%
rs 10000000: 31.74s
qs 10000000: 86.04s 271.08%
Edit:
Just to clarify: the quicksort implementation here is very memory-friendly. It sorts in place, so no matter how large the list, it just shuffles data around rather than copying it. The original radix sort effectively copies the list twice for each digit: once into the smaller bucket lists and then again when you concatenate those lists. Using itertools.chain avoids the second copy, but there's still a lot of memory allocation/deallocation going on. (Also, 'twice' is approximate, since list appending does involve extra copying even if it is amortized O(1), so maybe I should say 'proportional to twice'.)
Your data representation is very expensive. Why do you use a hashmap for your buckets? Why use a base-10 representation, which forces you to compute logarithms (expensive) for it?

Avoid lambda expressions and the like; I don't think Python can optimize them very well yet.

Maybe start by sorting 10-byte strings for the benchmark instead. And: no hashmaps or similarly expensive data structures.
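To illustrate the advice (my own sketch in Python 3, not the poster's code): buckets in a plain list indexed by a base-256 digit extracted with shifts and masks, so no hashmaps, logarithms, or lambdas are involved.

```python
def radixsort256(ls, width=4):
    """LSD radix sort for non-negative integers, base 256.
    width is the number of bytes to cover (4 => values < 2**32)."""
    for shift in range(0, width * 8, 8):
        buckets = [[] for _ in range(256)]      # plain list, not a dict
        for n in ls:
            buckets[(n >> shift) & 0xFF].append(n)
        ls = [n for b in buckets for n in b]    # concatenate the buckets
    return ls

print(radixsort256([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]
```

Base 256 means at most four passes for 32-bit values, regardless of the magnitude of the data, which is exactly what the logarithm computation in the original was trying to work out for base 10.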