Why random.randint() is much slower than random.getrandbits() - python

I made a test that compares random.randint() and random.getrandbits() in Python. The result shows that getrandbits is way faster than randint.
from random import randint, getrandbits, seed
from datetime import datetime

def ts():
    return datetime.now().timestamp()

def diff(st, en):
    print(f'{round((en - st) * 1000, 3)} ms')

seed(111)
N = 6

rn = []
st = ts()
for _ in range(10**N):
    rn.append(randint(0, 1023))
en = ts()
diff(st, en)

rn = []
st = ts()
for _ in range(10**N):
    rn.append(getrandbits(10))
en = ts()
diff(st, en)
The range of numbers is exactly the same, because 10 random bits span 0 (0000000000) to 1023 (1111111111).
The output of the code:
590.509 ms
91.01 ms
As you can see, getrandbits under the same conditions is almost 6.5 times faster. But why? Can somebody explain that?

randint uses randrange, which uses _randbelow, which is _randbelow_with_getrandbits (if getrandbits is present), which in turn uses getrandbits. However, randrange adds overhead on top of the underlying call to getrandbits because it must check the start and step arguments. Even though there are fast paths for these checks, they are still there. There are certainly other things contributing to the 6.5× slowdown of randint, but this answer at least shows you that randint will always be slower than getrandbits.
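A quick way to see those layers is to time each level of the call chain directly; a rough sketch (absolute numbers will vary by machine and Python version):
from timeit import timeit
from random import randint, randrange, getrandbits

# Each level strips away one layer of Python-level overhead.
print(timeit(lambda: randint(0, 1023)))   # randint -> randrange -> _randbelow -> getrandbits
print(timeit(lambda: randrange(1024)))    # skips the b+1 wrapper call
print(timeit(lambda: getrandbits(10)))    # straight to the C implementation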

getrandbits() is a very short function going to native code almost immediately:
def getrandbits(self, k):
    """getrandbits(k) -> x.  Generates an int with k random bits."""
    if k < 0:
        raise ValueError('number of bits must be non-negative')
    numbytes = (k + 7) // 8                       # bits / 8 and rounded up
    x = int.from_bytes(_urandom(numbytes), 'big')
    return x >> (numbytes * 8 - k)                # trim excess bits
_urandom is os.urandom, which you can't even find in os.py in the same folder, as it's native code. See details here: Where can I find the source code of os.urandom()?
randint() is just a return self.randrange(a, b+1) call, where randrange() is a long and versatile function of almost 100 lines of Python code. Its "fast track" ends in a return istart + self._randbelow(width) after circa 50 lines of checking and initializing.
_randbelow() is actually chosen depending on the availability of getrandbits(), which exists in your case, so this is the one running:
def _randbelow_with_getrandbits(self, n):
    "Return a random int in the range [0,n).  Returns 0 if n==0."
    if not n:
        return 0
    getrandbits = self.getrandbits
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)  # 0 <= r < 2**k
    while r >= n:
        r = getrandbits(k)
    return r
so randint() ends up in the same getrandbits() after doing a lot of other things, and that's pretty much why it's slower.
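You can verify which _randbelow variant was picked without reading the source; on CPython, where the Mersenne Twister core supplies getrandbits, the class attribute is simply an alias:
import random

# Random defines getrandbits (in C), so at class-creation time
# _randbelow is bound to the getrandbits-based implementation.
print(random.Random._randbelow is random.Random._randbelow_with_getrandbits)  # True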

The underlying reason is that the function getrandbits calls is written in C; look at the source code in the CPython repository on GitHub.
You will find that getrandbits under the hood calls urandom from the os module, and if you look at the implementation of that function, you'll find that it is indeed written in C.
os_urandom_impl(PyObject *module, Py_ssize_t size)
/*[clinic end generated code: output=42c5cca9d18068e9 input=4067cdb1b6776c29]*/
{
    PyObject *bytes;
    int result;

    if (size < 0)
        return PyErr_Format(PyExc_ValueError,
                            "negative argument not allowed");
    bytes = PyBytes_FromStringAndSize(NULL, size);
    if (bytes == NULL)
        return NULL;

    result = _PyOS_URandom(PyBytes_AS_STRING(bytes), PyBytes_GET_SIZE(bytes));
    if (result == -1) {
        Py_DECREF(bytes);
        return NULL;
    }
    return bytes;
}
Talking about randint: if you look at the source code for this function, it calls randrange, which in turn calls _randbelow_with_getrandbits.
def randrange(self, start, stop=None, step=_ONE):
    """Choose a random item from range(start, stop[, step]).

    This fixes the problem with randint() which includes the
    endpoint; in Python this is usually not what you want.

    """

    # This code is a bit messy to make it fast for the
    # common case while still doing adequate error checking.
    try:
        istart = _index(start)
    except TypeError:
        istart = int(start)
        if istart != start:
            _warn('randrange() will raise TypeError in the future',
                  DeprecationWarning, 2)
            raise ValueError("non-integer arg 1 for randrange()")
        _warn('non-integer arguments to randrange() have been deprecated '
              'since Python 3.10 and will be removed in a subsequent '
              'version',
              DeprecationWarning, 2)
    if stop is None:
        # We don't check for "step != 1" because it hasn't been
        # type checked and converted to an integer yet.
        if step is not _ONE:
            raise TypeError('Missing a non-None stop argument')
        if istart > 0:
            return self._randbelow(istart)
        raise ValueError("empty range for randrange()")

    # stop argument supplied.
    try:
        istop = _index(stop)
    except TypeError:
        istop = int(stop)
        if istop != stop:
            _warn('randrange() will raise TypeError in the future',
                  DeprecationWarning, 2)
            raise ValueError("non-integer stop for randrange()")
        _warn('non-integer arguments to randrange() have been deprecated '
              'since Python 3.10 and will be removed in a subsequent '
              'version',
              DeprecationWarning, 2)
    width = istop - istart
    try:
        istep = _index(step)
    except TypeError:
        istep = int(step)
        if istep != step:
            _warn('randrange() will raise TypeError in the future',
                  DeprecationWarning, 2)
            raise ValueError("non-integer step for randrange()")
        _warn('non-integer arguments to randrange() have been deprecated '
              'since Python 3.10 and will be removed in a subsequent '
              'version',
              DeprecationWarning, 2)
    # Fast path.
    if istep == 1:
        if width > 0:
            return istart + self._randbelow(width)
        raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))

    # Non-unit step argument supplied.
    if istep > 0:
        n = (width + istep - 1) // istep
    elif istep < 0:
        n = (width + istep + 1) // istep
    else:
        raise ValueError("zero step for randrange()")
    if n <= 0:
        raise ValueError("empty range for randrange()")
    return istart + istep * self._randbelow(n)

def randint(self, a, b):
    """Return random integer in range [a, b], including both end points.
    """
    return self.randrange(a, b+1)
As you can see, there is a lot going on in the randrange function: try/except blocks, argument conversion, and deprecation checks, all written in Python. All of that overhead and control flow is what makes it slow.
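You can put a rough number on that argument-checking overhead by timing randrange against the (private, CPython-specific) _randbelow it eventually calls:
from timeit import timeit
import random

r = random.Random()
print(timeit(lambda: r.randrange(1024)))   # full argument checking
print(timeit(lambda: r._randbelow(1024)))  # skips straight to the sampling loop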

This is the code that gets executed when you call randint:
#!/usr/bin/python3
import random
random.randint(0, 1023)
$ python3 -m trace -t s.py
<frozen importlib._bootstrap>(194): s.py(4): random.randint(0, 1023)
--- modulename: random, funcname: randint
random.py(339): return self.randrange(a, b+1)
--- modulename: random, funcname: randrange
random.py(301): istart = int(start)
random.py(302): if istart != start:
random.py(304): if stop is None:
random.py(310): istop = int(stop)
random.py(311): if istop != stop:
random.py(313): width = istop - istart
random.py(314): if step == 1 and width > 0:
random.py(315): return istart + self._randbelow(width)
--- modulename: random, funcname: _randbelow_with_getrandbits
random.py(241): if not n:
random.py(243): getrandbits = self.getrandbits
random.py(244): k = n.bit_length() # don't use (n-1) here because n can be 1
random.py(245): r = getrandbits(k) # 0 <= r < 2**k
random.py(246): while r >= n:
random.py(248): return r
So the answer is that there's some extra boilerplate that ends up calling getrandbits, which you otherwise call directly. Note also that calling _randbelow_with_getrandbits with a power of two exhibits the worst-case behavior: it will sample getrandbits(11) until it gets a number that fits in 10 bits, so it discards half of the values (this is not the main reason it is slower, but the rejection-sampling loop alone makes those getrandbits calls about twice as slow on average).
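You can observe that worst case by counting the underlying getrandbits calls; a small sketch using a counting subclass (CountingRandom is a made-up name for illustration):
import random

class CountingRandom(random.Random):
    calls = 0
    def getrandbits(self, k):
        self.calls += 1
        return super().getrandbits(k)

r = CountingRandom()
N = 100_000
for _ in range(N):
    r.randint(0, 1023)  # _randbelow(1024) draws 11 bits, accepts with p = 1/2
print(r.calls / N)      # ~2.0: about two draws per returned value on average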
Some of the answers say that getrandbits ends up calling os.urandom. That is not true.
The regular getrandbits function is implemented directly in C in _random_Random_getrandbits_impl. The one that calls os.urandom is from SystemRandom.
You can check it by using the trace module:
#!/usr/bin/python3
import random
r = random.SystemRandom()
r.getrandbits(5)
random.getrandbits(5)
$ python3 -m trace -t s.py
...
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): s.py(4): r = random.SystemRandom()
--- modulename: random, funcname: __init__
random.py(123): self.seed(x)
--- modulename: random, funcname: seed
random.py(800): return None
random.py(124): self.gauss_next = None
s.py(6): r.getrandbits(5)
--- modulename: random, funcname: getrandbits
random.py(786): if k < 0:
random.py(788): numbytes = (k + 7) // 8 # bits / 8 and rounded up
random.py(789): x = int.from_bytes(_urandom(numbytes), 'big')
random.py(790): return x >> (numbytes * 8 - k) # trim excess bits
s.py(8): random.getrandbits(5)
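Another quick way to confirm which implementation you're getting, without tracing, is to inspect the method types:
import random

print(type(random.Random.getrandbits))        # <class 'method_descriptor'> -- C code
print(type(random.SystemRandom.getrandbits))  # <class 'function'> -- Python code in random.py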

Related

Cython problem: cimport libcpp.vector not compiling

I'm trying to use Cython to speed up my code. Since I'm working with an array of strings, I want to use string and vector from C++. But I have problems compiling when I import the C++ libraries. As an example, I tried to implement the example from here: https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html.
So, my code is
from libcpp.vector cimport vector

def primes(unsigned int nb_primes):
    cdef int n, i
    cdef vector[int] p
    p.reserve(nb_primes)  # allocate memory for 'nb_primes' elements.

    n = 2
    while p.size() < nb_primes:  # size() for vectors is similar to len()
        for i in p:
            if n % i == 0:
                break
        else:
            p.push_back(n)  # push_back is similar to append()
        n += 1

    # Vectors are automatically converted to Python
    # lists when converted to Python objects.
    return p
I saved this code as 'test_char.pyx'. For compilation I use this:
from distutils.core import setup
from Cython.Build import cythonize

setup(name='test_char',
      ext_modules = cythonize('test_char.pyx')
      )
After that I get test_char.c, but I don't get test_char.py.
If I use this code (without cimport):
def primes(int nb_primes):
    cdef int n, i, len_p
    cdef int p[1000]
    if nb_primes > 1000:
        nb_primes = 1000

    len_p = 0  # The current number of elements in p.
    n = 2
    while len_p < nb_primes:
        # Is n prime?
        for i in p[:len_p]:
            if n % i == 0:
                break
        # If no break occurred in the loop, we have a prime.
        else:
            p[len_p] = n
            len_p += 1
        n += 1

    # Let's return the result in a python list:
    result_as_list = [prime for prime in p[:len_p]]
    return result_as_list
everything works fine. So, please, any ideas?
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

extensions = [
    Extension("test_char", ["test_char.pyx"],
              language="c++"
              )
]

setup(
    name="test_char",
    ext_modules = cythonize(extensions),
)
This solves the problem: importing from libcpp requires the extension to be compiled as C++, which is what the language="c++" argument tells Cython to do.
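With the imports in place, a typical build-and-use sequence looks like this (assuming the usual Cython/distutils workflow). Note that the build emits test_char.cpp plus a native extension (test_char.so or .pyd, depending on the platform); a test_char.py never appears:
# Run from a shell:
#   python setup.py build_ext --inplace
# Afterwards the compiled extension imports like any other module:
from test_char import primes
print(primes(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]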

Decimal To Binary Python Getting an Extra Zero In Return String

This is for a school project. I need to create a function using recursion to convert an integer to a binary string. It must return a str, not an int. The base case is n == 0, and then '0' would need to be returned. There must be a base case like this, but this is where I think I am getting the extra 0 from (I could be wrong). I am using Python 3.6 with IDLE and the shell to execute it.
The function works just fine, except for this additional zero that I need gone.
Here is my function, dtobr:
def dtobr(n):
    """
    (int) -> (str)

    This function has the parameter n, which is a non-negative integer,
    and it will return the string of 0/1's
    which is the binary representation of n. No side effects.

    Returns binary string as mentioned. This is like the function
    dtob (decimal to binary) but this is using recursion.

    Examples:
    >>> dtob(27)
    '11011'
    >>> dtob(0)
    '0'
    >>> dtob(1)
    '1'
    >>> dtob(2)
    '10'
    """
    if n == 0:
        return str(0)
    return dtobr(n // 2) + str(n % 2)
This came from the function I already wrote which converted it just fine, but without recursion. For reference, I will include this code as well, but this is not what I need for this project, and there are no errors with this:
def dtob(n):
    """
    (int) -> (str)

    This function has the parameter n, which is a non-negative integer,
    and it will return the string of 0/1's
    which is the binary representation of n. No side effects.

    Returns binary string as mentioned.

    Examples:
    >>> dtob(27)
    '11011'
    >>> dtob(0)
    '0'
    >>> dtob(1)
    '1'
    >>> dtob(2)
    '10'
    """
    string = ""
    if n == 0:
        return str(0)
    while n > 0:
        remainder = n % 2
        string = str(remainder) + string
        n = n // 2
    return string
Hopefully someone can help me get rid of that additional left-hand zero. Thanks!
You need to change the condition to recursively handle both the n // 2 and n % 2:
if n <= 1:
    return str(n)  # per @pault's suggestion, only needed str(n) instead of str(n % 2)
else:
    return dtobr(n // 2) + dtobr(n % 2)
Test case:
for i in [0, 1, 2, 27]:
    print(dtobr(i))

# 0
# 1
# 10
# 11011
FYI you can easily convert to binary format like so:
'{0:b}'.format(x) # where x is your number
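Equivalent built-ins, for reference:
x = 27
print('{0:b}'.format(x))  # '11011'
print(format(x, 'b'))     # same, via the built-in format()
print(f'{x:b}')           # same, as an f-string (Python 3.6+)
print(bin(x)[2:])         # bin() works too, but adds a '0b' prefix to strip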
Since there is already an answer that points out and resolves the issue with the recursive way, let's look at some interesting ways to achieve the same goal.
Let's define a generator that gives us an iterative way of getting the binary digits. Note that it yields the bits least-significant first, so they have to be put back in order when assembling the string.
def to_binary(n):
    if n == 0: yield "0"
    while n > 0:
        yield str(n % 2)
        n //= 2
Then you can use this iterable to get the decimal-to-binary conversion in multiple ways.
Example 1.
The reduce function is used to concatenate the chars received from the to_binary generator; prepending each new char (y + x) restores most-significant-first order.
from functools import reduce

def to_binary(n):
    if n == 0: yield "0"
    while n > 0:
        yield str(n % 2)
        n //= 2

print(reduce(lambda x, y: y + x, to_binary(0)))   # 0
print(reduce(lambda x, y: y + x, to_binary(15)))  # 1111
print(reduce(lambda x, y: y + x, to_binary(27)))  # 11011
Example 2.
join takes the iterable, unrolls it, and joins the elements by ''; the result is then reversed with [::-1] to put the most significant bit first.
def to_binary(n):
    if n == 0: yield "0"
    while n > 0:
        yield str(n % 2)
        n //= 2

print(''.join(to_binary(0))[::-1])   # 0
print(''.join(to_binary(1))[::-1])   # 1
print(''.join(to_binary(15))[::-1])  # 1111
print(''.join(to_binary(27))[::-1])  # 11011

How to pull a random value from two numbers based on a Mersenne Twister returned value

I have implemented a Mersenne Twister in Python using the following example code and while it does work as intended, I am not clear on how to limit results returned to a range of integers. For example, if I wanted to use this mechanism to determine the value of a dice roll, I would (right now, in an incredibly inefficient manner) iterate through potential results from the MT until something falls within the set. There has to be a much more memory-efficient manner to get a value to fall within a set range, like 1-20.
Currently, the algorithm returns a series of numbers that don't fall anywhere close to the target range:
2840889030
2341262508
2626522481
893458501
1134227444
3424236607
4171927007
1414775506
318984778
811882651
1509520423
1796453323
571461449
2606098999
2100002233
202969379
2318195635
1583585513
863717092
1218132929
1044954980
2997947229
867650808
177016714
2532350044
2917724494
2789913671
2793703767
1477382755
2552234519
2230774266
956596469
1165204853
1261233074
1856099289
21274564
1867584221
200970721
2112891842
139474834
93227265
1919721548
1026587194
30693196
3114464709
2194502660
2235520335
1877205724
1093736467
3136329929
1838505684
1358237877
2394536120
1268347552
1222927042
2982839076
1155599683
1943346953
3778719619
1483759762
3227630028
2775862513
2991889829
4252811853
995611629
626323532
3895812866
4027023347
3778533921
3840271846
4289281429
2263887842
402963991
2957069652
238880521
3643974307
472466724
3309455978
3588191581
1390613042
290666747
1375502175
1172854301
2159248842
3279978887
2206149102
804187781
3811948116
4134597627
1556281173
2590972812
3291094915
1836658937
3721612785
365099684
3884686172
2966532828
3609464378
1672431128
3959413372
For testing purposes I implemented the current test logic (part of __main__) and so far it just runs infinitely.
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Based on the pseudocode in https://en.wikipedia.org/wiki/Mersenne_Twister.
Generates uniformly distributed 32-bit integers in the range [0, 2**32 - 1]
with the MT19937 algorithm.

Yaşar Arabacı <yasar11732 et gmail nokta com>
"""

# Create a length 624 list to store the state of the generator
MT = [0 for i in xrange(624)]
index = 0

# To get last 32 bits
bitmask_1 = (2 ** 32) - 1

# To get 32. bit
bitmask_2 = 2 ** 31

# To get last 31 bits
bitmask_3 = (2 ** 31) - 1

def initialize_generator(seed):
    "Initialize the generator from a seed"
    global MT
    global bitmask_1
    MT[0] = seed
    for i in xrange(1, 624):
        MT[i] = ((1812433253 * MT[i-1]) ^ ((MT[i-1] >> 30) + i)) & bitmask_1

def extract_number():
    """
    Extract a tempered pseudorandom number based on the index-th value,
    calling generate_numbers() every 624 numbers
    """
    global index
    global MT
    if index == 0:
        generate_numbers()
    y = MT[index]
    y ^= y >> 11
    y ^= (y << 7) & 2636928640
    y ^= (y << 15) & 4022730752
    y ^= y >> 18
    index = (index + 1) % 624
    return y

def generate_numbers():
    "Generate an array of 624 untempered numbers"
    global MT
    for i in xrange(624):
        y = (MT[i] & bitmask_2) + (MT[(i + 1) % 624] & bitmask_3)
        MT[i] = MT[(i + 397) % 624] ^ (y >> 1)
        if y % 2 != 0:
            MT[i] ^= 2567483615

if __name__ == "__main__":
    from datetime import datetime
    now = datetime.now()
    solved = False
    initialize_generator(now.microsecond)
    #for i in xrange(10):
    #    "Print 10 random numbers as an example"
    while(solved != True):
        generated_number = extract_number()
        while(generated_number <= 20 and generated_number >= 1):
            print generated_number
            solved = True
Any advice on how to implement? It appears that it may not even get a chance to drop down to a number within the predefined set.
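For reference, the usual fix (the same rejection-sampling idea CPython's _randbelow uses, discussed in the randint question above) is to take only as many bits as the range needs and redraw when the value falls outside it, rather than waiting for the raw 32-bit output to land in [1, 20]. A minimal sketch on top of extract_number() (randbelow and roll are made-up helper names):
def randbelow(n):
    """Uniform int in [0, n) by rejection-sampling extract_number()."""
    k = n.bit_length()
    while True:
        r = extract_number() >> (32 - k)  # keep only the top k bits
        if r < n:
            return r  # accepted; acceptance probability is n / 2**k >= 1/2

def roll(lo, hi):
    """Uniform int in [lo, hi]; e.g. roll(1, 20) for a d20."""
    return lo + randbelow(hi - lo + 1)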

Performance of choice vs randint

I want to pick a random integer between a and b, inclusive.
I know 3 ways of doing it. However, their performance seems very counter-intuitive:
import timeit
t1 = timeit.timeit("n=random.randint(0, 2)", setup="import random", number=100000)
t2 = timeit.timeit("n=random.choice([0, 1, 2])", setup="import random", number=100000)
t3 = timeit.timeit("n=random.choice(ar)", setup="import random; ar = [0, 1, 2]", number=100000)
[print(t) for t in [t1, t2, t3]]
On my machine, this gives:
0.29744589625620965
0.19716156798482648
0.17500512311108346
Using an online interpreter, this gives:
0.23830216699570883
0.16536146598809864
0.15081614299560897
Note how the most direct version (#1), which uses the dedicated function for doing exactly what I want, is 50% slower than the strangest version (#3), which pre-defines a list and then chooses randomly from it.
What's going on?
It's just implementation details. randint delegates to randrange, so it has another layer of function call overhead, and randrange goes through a lot of argument checking and other crud. In contrast, choice is a really simple one-liner.
Here's the code path randint goes through for this call, with comments and unexecuted code stripped out:
def randint(self, a, b):
    return self.randrange(a, b+1)

def randrange(self, start, stop=None, step=1, _int=int, _maxwidth=1L<<BPF):
    istart = _int(start)
    if istart != start:
        # not executed
    if stop is None:
        # not executed

    istop = _int(stop)
    if istop != stop:
        # not executed
    width = istop - istart
    if step == 1 and width > 0:
        if width >= _maxwidth:
            # not executed
        return _int(istart + _int(self.random()*width))
And here's the code path choice goes through:
def choice(self, seq):
    return seq[int(self.random() * len(seq))]
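The remaining gap between versions #2 and #3 is mostly the cost of building the list literal on every call, which you can time on its own:
import timeit

# Constructing [0, 1, 2] anew each call accounts for roughly the #2 vs #3 gap.
print(timeit.timeit("[0, 1, 2]", number=100000))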

Why is my radix sort python implementation slower than quick sort?

I rewrote the original radix sort algorithm for Python from Wikipedia using arrays from SciPy to gain performance and to reduce code length, which I managed to accomplish. Then I took the classic (in-memory, pivot based) quick sort algorithm from Literate Programming and compared their performance.
I had the expectation that radix sort would beat quick sort beyond a certain threshold, which it did not. Further, I found Erik Gorset's blog asking the question "Is radix sort faster than quick sort for integer arrays?". There the answer is that
.. the benchmark shows the MSB in-place radix sort to be consistently over 3 times faster than quicksort for large arrays.
Unfortunately, I could not reproduce the result; the differences are that (a) Erik chose Java and not Python and (b) he uses the MSB in-place radix sort, whereas I just fill buckets inside a Python dictionary.
According to theory radix sort should be faster (linear) compared to quick sort; but apparently it depends a lot on the implementation. So where is my mistake?
Here is the code comparing both algorithms:
from sys import argv
from time import clock

from pylab import array, vectorize
from pylab import absolute, log10, randint
from pylab import semilogy, grid, legend, title, show

###############################################################################
# radix sort
###############################################################################

def splitmerge0 (ls, digit): ## python (pure!)

    seq = map (lambda n: ((n // 10 ** digit) % 10, n), ls)
    buf = {0:[], 1:[], 2:[], 3:[], 4:[], 5:[], 6:[], 7:[], 8:[], 9:[]}

    return reduce (lambda acc, key: acc.extend(buf[key]) or acc,
        reduce (lambda _, (d,n): buf[d].append (n) or buf, seq, buf), [])

def splitmergeX (ls, digit): ## python & numpy

    seq = array (vectorize (lambda n: ((n // 10 ** digit) % 10, n)) (ls)).T
    buf = {0:[], 1:[], 2:[], 3:[], 4:[], 5:[], 6:[], 7:[], 8:[], 9:[]}

    return array (reduce (lambda acc, key: acc.extend(buf[key]) or acc,
        reduce (lambda _, (d,n): buf[d].append (n) or buf, seq, buf), []))

def radixsort (ls, fn = splitmergeX):

    return reduce (fn, xrange (int (log10 (absolute (ls).max ()) + 1)), ls)

###############################################################################
# quick sort
###############################################################################

def partition (ls, start, end, pivot_index):

    lower = start
    upper = end - 1

    pivot = ls[pivot_index]
    ls[pivot_index] = ls[end]

    while True:
        while lower <= upper and ls[lower] <  pivot: lower += 1
        while lower <= upper and ls[upper] >= pivot: upper -= 1
        if lower > upper: break

        ls[lower], ls[upper] = ls[upper], ls[lower]

    ls[end] = ls[lower]
    ls[lower] = pivot

    return lower

def qsort_range (ls, start, end):

    if end - start + 1 < 32:
        insertion_sort(ls, start, end)
    else:
        pivot_index = partition (ls, start, end, randint (start, end))
        qsort_range (ls, start, pivot_index - 1)
        qsort_range (ls, pivot_index + 1, end)

    return ls

def insertion_sort (ls, start, end):

    for idx in xrange (start, end + 1):
        el = ls[idx]
        for jdx in reversed (xrange(0, idx)):
            if ls[jdx] <= el:
                ls[jdx + 1] = el
                break
            ls[jdx + 1] = ls[jdx]
        else:
            ls[0] = el

    return ls

def quicksort (ls):

    return qsort_range (ls, 0, len (ls) - 1)

###############################################################################
if __name__ == "__main__":
###############################################################################

    lower = int (argv [1]) ## requires: >= 2
    upper = int (argv [2]) ## requires: >= 2

    color = dict (enumerate (3*['r','g','b','c','m','k']))

    rslbl = "radix sort"
    qslbl = "quick sort"

    for value in xrange (lower, upper):

        #######################################################################
        ls = randint (1, value, size=value)

        t0 = clock ()
        rs = radixsort (ls)
        dt = clock () - t0

        print "%06d -- t0:%0.6e, dt:%0.6e" % (value, t0, dt)
        semilogy (value, dt, '%s.' % color[int (log10 (value))], label=rslbl)

        #######################################################################
        ls = randint (1, value, size=value)

        t0 = clock ()
        rs = quicksort (ls)
        dt = clock () - t0

        print "%06d -- t0:%0.6e, dt:%0.6e" % (value, t0, dt)
        semilogy (value, dt, '%sx' % color[int (log10 (value))], label=qslbl)

    grid ()
    legend ((rslbl,qslbl), numpoints=3, shadow=True, prop={'size':'small'})
    title ('radix & quick sort: #(integer) vs duration [s]')
    show ()

###############################################################################
###############################################################################
And here is the result comparing sorting durations in seconds (logarithmic vertical axis) for integer arrays of size in range from 2 to 1250 (horizontal axis); lower curve belongs to quick sort:
Radix vs Quick Sort Comparison
Quick sort is smooth at the powers of ten (e.g. at 10, 100, or 1000), while radix sort jumps a little there; otherwise it follows qualitatively the same path as quick sort, only much slower!
You have several problems here.
First of all, as pointed out in the comments, your data set is far too small for the theoretical complexity to overcome the overheads in the code.
Next, your implementation is very inefficient, with all those unnecessary function calls and lists being copied around. Writing the code in a straightforward procedural manner will almost always be faster than a functional solution (for Python, that is; other languages will differ here). You have a procedural implementation of quicksort, so if you write your radix sort in the same style it may turn out faster even for small lists.
Finally, it may be that when you do try large lists the overheads of memory management begin to dominate. That means that you have a limited window between small lists where the efficiency of the implementation is the dominant factor and large lists where the memory management is the dominant factor.
Here's some code that keeps your quicksort but uses a simple radix sort written procedurally, trying to avoid so much copying of data. You'll see that even for short lists it beats the quicksort; more interestingly, as the data size goes up, so does the ratio between quicksort and radix sort, and then it begins to drop again as memory management starts to dominate (simple things like freeing a list of 1,000,000 items take a significant time):
from random import randint
from math import log10
from time import clock
from itertools import chain

def splitmerge0 (ls, digit): ## python (pure!)

    seq = map (lambda n: ((n // 10 ** digit) % 10, n), ls)
    buf = {0:[], 1:[], 2:[], 3:[], 4:[], 5:[], 6:[], 7:[], 8:[], 9:[]}

    return reduce (lambda acc, key: acc.extend(buf[key]) or acc,
        reduce (lambda _, (d,n): buf[d].append (n) or buf, seq, buf), [])

def splitmerge1 (ls, digit): ## python (readable!)
    buf = [[] for i in range(10)]
    divisor = 10 ** digit
    for n in ls:
        buf[(n//divisor)%10].append(n)
    return chain(*buf)

def radixsort (ls, fn = splitmerge1):
    return list(reduce (fn, xrange (int (log10 (max(abs(val) for val in ls)) + 1)), ls))

###############################################################################
# quick sort
###############################################################################

def partition (ls, start, end, pivot_index):

    lower = start
    upper = end - 1

    pivot = ls[pivot_index]
    ls[pivot_index] = ls[end]

    while True:
        while lower <= upper and ls[lower] <  pivot: lower += 1
        while lower <= upper and ls[upper] >= pivot: upper -= 1
        if lower > upper: break

        ls[lower], ls[upper] = ls[upper], ls[lower]

    ls[end] = ls[lower]
    ls[lower] = pivot

    return lower

def qsort_range (ls, start, end):

    if end - start + 1 < 32:
        insertion_sort(ls, start, end)
    else:
        pivot_index = partition (ls, start, end, randint (start, end))
        qsort_range (ls, start, pivot_index - 1)
        qsort_range (ls, pivot_index + 1, end)

    return ls

def insertion_sort (ls, start, end):

    for idx in xrange (start, end + 1):
        el = ls[idx]
        for jdx in reversed (xrange(0, idx)):
            if ls[jdx] <= el:
                ls[jdx + 1] = el
                break
            ls[jdx + 1] = ls[jdx]
        else:
            ls[0] = el

    return ls

def quicksort (ls):

    return qsort_range (ls, 0, len (ls) - 1)

if __name__=='__main__':
    for value in 1000, 10000, 100000, 1000000, 10000000:
        ls = [randint (1, value) for _ in range(value)]
        ls2 = list(ls)
        last = -1
        start = clock()
        ls = radixsort(ls)
        end = clock()
        for i in ls:
            assert last <= i
            last = i
        print("rs %d: %0.2fs" % (value, end-start))
        tdiff = end-start
        start = clock()
        ls2 = quicksort(ls2)
        end = clock()
        last = -1
        for i in ls2:
            assert last <= i
            last = i
        print("qs %d: %0.2fs %0.2f%%" % (value, end-start, ((end-start)/tdiff*100)))
The output when I run this is:
C:\temp>c:\python27\python radixsort.py
rs 1000: 0.00s
qs 1000: 0.00s 212.98%
rs 10000: 0.02s
qs 10000: 0.05s 291.28%
rs 100000: 0.19s
qs 100000: 0.58s 311.98%
rs 1000000: 2.47s
qs 1000000: 7.07s 286.33%
rs 10000000: 31.74s
qs 10000000: 86.04s 271.08%
Edit:
Just to clarify. The quicksort implementation here is very memory friendly, it sorts in-place so no matter how large the list it is just shuffling data around not copying it. The original radixsort effectively copies the list twice for each digit: once into the smaller lists and then again when you concatenate the lists. Using itertools.chain avoids that second copy but there's still a lot of memory allocation/deallocation going on. (Also 'twice' is approximate as list appending does involve extra copying even if it is amortized O(1) so I should maybe say 'proportional to twice'.)
Your data representation is very expensive. Why do you use a hashmap for your buckets? Why use a base-10 representation that forces you to compute logarithms (expensive) at all?
Avoid lambda expressions and the like; I don't think Python can optimize them very well yet.
Maybe start with sorting 10-byte strings for the benchmark instead. And: no hashmaps or similar expensive data structures (see the sketch below).
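To illustrate that advice: bucketing by bytes with bit shifts removes both the dictionary and the logarithm. A sketch (written for Python 3, unlike the Python 2 code above; radixsort256 is a made-up name):
def radixsort256(ls):
    """LSD radix sort for non-negative ints, one byte (base 256) per pass."""
    shift = 0
    max_val = max(ls) if ls else 0
    while max_val >> shift:
        buckets = [[] for _ in range(256)]
        for n in ls:
            buckets[(n >> shift) & 0xFF].append(n)  # bucket by the current byte
        ls = [n for bucket in buckets for n in bucket]
        shift += 8
    return ls

print(radixsort256([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]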
