Python not in dict condition sentence performance - python

Does anybody know which is better to use in terms of speed and resources? A link to some trusted sources would be much appreciated.
if key not in dictionary.keys():
or
if not dictionary.get(key):

Firstly, you'd do
if key not in dictionary:
since dicts are iterated over by keys.
Secondly, the two statements are not equivalent - the second condition would be true if the corresponding value is falsy (0, "", [] etc.), not only if the key doesn't exist.
Lastly, the first method is definitely faster and more Pythonic. Function/method calls are expensive. If you're unsure, measure with timeit.
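To see both points in one place, here is a minimal sketch (the dictionary contents are made up for illustration):
d = {'a': 0, 'b': 1}

# The truthiness test and the membership test disagree for falsy values:
print(not d.get('a'))   # True, even though 'a' IS in the dict
print('a' not in d)     # False, which is usually what was meant

# And a quick timeit comparison of the two idioms for a missing key:
import timeit
print(timeit.timeit("'c' not in d", globals=globals()))
print(timeit.timeit("not d.get('c')", globals=globals()))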

In my experience, using in is faster than using get, although the speed of get can be improved by caching the get method so it doesn't have to be looked up each time. Here are some timeit tests:
''' in vs get speed test

Comparing the speed of cache retrieval / update using `get` vs using `in`
http://stackoverflow.com/a/35451912/4014959

Written by PM 2Ring 2015.12.01
Updated for Python 3 2017.08.08
'''

from __future__ import print_function
from timeit import Timer
from random import randint
import dis

cache = {}

def get_cache(x):
    ''' retrieve / update cache using `get` '''
    res = cache.get(x)
    if res is None:
        res = cache[x] = x
    return res

def get_cache_defarg(x, get=cache.get):
    ''' retrieve / update cache using defarg `get` '''
    res = get(x)
    if res is None:
        res = cache[x] = x
    return res

def in_cache(x):
    ''' retrieve / update cache using `in` '''
    if x in cache:
        return cache[x]
    else:
        res = cache[x] = x
        return res

# slow to fast.
funcs = (
    get_cache,
    get_cache_defarg,
    in_cache,
)

def show_bytecode():
    for func in funcs:
        fname = func.__name__
        print('\n%s' % fname)
        dis.dis(func)

def time_test(reps, loops):
    ''' Print timing stats for all the functions '''
    for func in funcs:
        fname = func.__name__
        print('\n%s: %s' % (fname, func.__doc__))
        setup = 'from __main__ import data, ' + fname
        cmd = 'for v in data: %s(v)' % (fname,)
        times = []
        t = Timer(cmd, setup)
        for i in range(reps):
            r = 0
            for j in range(loops):
                r += t.timeit(1)
                cache.clear()
            times.append(r)
        times.sort()
        print(times)

datasize = 1024
maxdata = 32
data = [randint(1, maxdata) for i in range(datasize)]

#show_bytecode()
time_test(3, 500)
Typical output on my 2 GHz machine running Python 2.6.6:
get_cache: retrieve / update cache using `get`
[0.65624237060546875, 0.68499755859375, 0.76354193687438965]
get_cache_defarg: retrieve / update cache using defarg `get`
[0.54204297065734863, 0.55032730102539062, 0.56702113151550293]
in_cache: retrieve / update cache using `in`
[0.48754477500915527, 0.49125504493713379, 0.50087881088256836]

TLDR: Use if key not in dictionary. This is idiomatic, robust and fast.
There are four versions of relevance to this question: the two posed in the question, and the optimal variants of them:
key not in dictionary.keys() # inA
key not in dictionary # inB
not dictionary.get(key) # getA
sentinel = object()
dictionary.get(key, sentinel) is not sentinel # getB
Both A variants have shortcomings that mean you should not use them. inA needlessly creates a dict view on the keys - this adds an indirection step. getA looks at the truth of the value - this leads to incorrect results for values such as '' or 0.
As for using inB over getB: both do the same thing, namely looking at whether there is a value for key. However, getB also returns that value or default and has to compare it against the sentinel. Consequently, using get is considerably slower:
$ PREPARE="
> import random
> data = {a: True for a in range(0, 512, 2)}
> sentinel=object()"
$ python3 -m perf timeit -s "$PREPARE" '27 in data'
.....................
Mean +- std dev: 33.9 ns +- 0.8 ns
$ python3 -m perf timeit -s "$PREPARE" 'data.get(27, sentinel) is not sentinel'
.....................
Mean +- std dev: 105 ns +- 5 ns
Note that pypy3 has practically the same performance for both variants once the JIT has warmed up.
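When you need the looked-up value anyway, the sentinel variant at least avoids a second lookup; a minimal sketch (the names are illustrative, not from the question):
_missing = object()   # unique sentinel, never equal to any stored value

def describe(dictionary, key):
    value = dictionary.get(key, _missing)
    if value is _missing:
        # the key is genuinely absent; a stored 0 / '' / None would not land here
        return 'absent'
    return 'present with value %r' % (value,)

print(describe({'x': 0}, 'x'))   # present with value 0
print(describe({'x': 0}, 'y'))   # absent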

OK, I've tested it on Python 3.4.3, and all three ways give roughly the same result, around 0.00001 seconds.
import random
a = {}
for i in range(0, 1000000):
    a[str(random.random())] = random.random()
import time
t1 = time.time(); 1 in a.keys(); t2 = time.time(); print("Time=%s" % (t2 - t1))
t1 = time.time(); 1 in a; t2 = time.time(); print("Time=%s" % (t2 - t1))
t1 = time.time(); not a.get(1); t2 = time.time(); print("Time=%s" % (t2 - t1))

Related

Am I stupid or is Julia just insanely faster than python?

I'm trying to do a pretty simple task in Python that I have already done in Julia. It consists of taking an array of 3-element items and building a dictionary of the indexes of each unique value in that list (note the list is 6,000,000 elements long). I have done this in Julia and it is reasonably fast (6 seconds) - here is the code:
function unique_ids(itr)
    # create a dictionary whose keys have the type of whatever itr holds
    d = Dict{eltype(itr), Vector}()
    # iterate through the values in itr
    for (index, val) in enumerate(itr)
        # check if the dictionary already has this value as a key
        if haskey(d, val)
            push!(d[val], index)
        else
            # add the value of itr if it's not in d yet
            d[val] = [index]
        end
    end
    return collect(values(d))
end
So far so good. However, when I try doing this in Python, it seems to take forever, so long that I can't even tell you how long. So the question is, am I doing something dumb here, or is this just the reality of the differences between these two languages? Here is my Python code, a translation of the Julia code.
def unique_ids(row_list):
    d = {}
    for (index, val) in tqdm(enumerate(row_list)):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())
Note that I use strings for the keys of the dict in Python as it is not possible to have an array as a key in Python.
I think the bottom line is this type of function can definitely run in python in less than 6 seconds.
I think the main issue, as many people have pointed out, is tqdm and using a string as the dictionary key. If you take these out it gets a lot faster. Interestingly, swapping to collections.defaultdict really helps as well. If I get a moment I will write the equivalent function in C++ using the Python C API and I will append those results.
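(As an aside on the string keys: a list can't be a dict key because it is unhashable, but the tuple rows themselves can be used as keys directly, which is what the faster variants below do; a minimal sketch:)
d = {}
row = [1, 2, 3]
try:
    d[row] = 0            # a list cannot be a key ...
except TypeError as e:
    print(e)              # unhashable type: 'list'
d[tuple(row)] = 0         # ... but a tuple of the same values can
print(d)                  # {(1, 2, 3): 0}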
I have included the test code below; for 1 test iteration with 6,000,000 elements I get 4.9 secs best in Python; this is with an i9-9980XE processor, I don't know what your test is on. Note the key type is critical: if I swap the tuple to an int the time is 1.48 secs, so how the input data is presented makes a huge difference.
| Method                                   | Time               | Relative           |
|------------------------------------------|--------------------|--------------------|
| Original                                 | 16.01739319995977  | 10.804717667865178 |
| Removing tqdm                            | 12.82462279999163  | 8.650997501336496  |
| Removing to string and just using tuple  | 5.3935559000237845 | 3.6382854561971936 |
| Removing double dictionary lookup        | 4.682285099988803  | 3.1584895191283664 |
| Using collections defaultdict            | 4.493273599946406  | 3.0309896277014063 |
| Using defaultdict with int key           | 1.4824443999677896 | 1.0                |
Looking over a smaller dataset (1,000,000), but more iterations (100) I get a closer gap:
| Method                                   | Time               | Relative           |
|------------------------------------------|--------------------|--------------------|
| Original                                 | 253.63316379999742 | 4.078280268213264  |
| Removing tqdm                            | 195.89607029996114 | 3.1498999032904    |
| Removing to string and just using tuple  | 69.18050129996845  | 1.1123840004584065 |
| Removing double dictionary lookup        | 68.65376710001146  | 1.1039144073575153 |
| Using collections defaultdict            | 62.19120489998022  | 1.0                |
The Julia benchmarks do look very interesting. I haven't had a chance to look at these in detail but I do wonder how much the Python benchmarks leverage libraries like numpy and scipy.
With this test code:
from tqdm import tqdm
import timeit, random
from collections import defaultdict

random.seed(1)
rand_max = 100
data_size = 1000000
iterations = 100
data = [tuple(random.randint(0, rand_max) for i in range(3)) for j in range(data_size)]
data2 = [t[0] for t in data]

def method0(row_list):
    d = {}
    for (index, val) in tqdm(enumerate(row_list)):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

def method1(row_list):
    d = {}
    for index, val in enumerate(row_list):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

def method2(row_list):
    d = {}
    for index, val in enumerate(row_list):
        if val in d:
            d[val].extend([index])
        else:
            d[val] = [index]
    return list(d.values())

def method3(row_list):
    d = {}
    for index, val in enumerate(row_list):
        if (l := d.get(val)):
            l.append(index)
        else:
            d[val] = [index]
    return d.values()

def method4(row_list):
    d = defaultdict(list)
    for (index, val) in enumerate(row_list):
        d[val].append(index)
    return list(d.values())

assert (m0 := method0(data)) == method1(data)
assert m0 == method2(data)
assert (m0 := sorted(m0)) == sorted(method3(data))
assert m0 == sorted(method4(data))

t0 = timeit.timeit(lambda: method0(data), number=iterations)
t1 = timeit.timeit(lambda: method1(data), number=iterations)
t2 = timeit.timeit(lambda: method2(data), number=iterations)
t3 = timeit.timeit(lambda: method3(data), number=iterations)
t4 = timeit.timeit(lambda: method4(data), number=iterations)
tmin = min((t0, t1, t2, t3, t4))

print(f'| Method | Time | Relative |')
print(f'|------------------ |----------------------|')
print(f'| Original | {t0} | {t0 / tmin} |')
print(f'| Removing tqdm | {t1} | {t1 / tmin} |')
print(f'| Removing to string and just using tuple | {t2} | {t2 / tmin} |')
print(f'| Removing double dictionary lookup | {t3} | {t3 / tmin} |')
print(f'| Using collections defaultdict | {t4} | {t4 / tmin} |')
Not really, proper Python code that uses Python built-in objects takes roughly 5 seconds ... or 6 seconds if you increase rand_max
import random
from collections import defaultdict

input_list_len = 6_000_000
rand_max = 100
row_list = []
for _ in range(input_list_len):
    row_list.append(tuple(random.randint(0, rand_max) for x in range(3)))

def unique_ids(row_list):
    d = defaultdict(list)
    for (index, val) in enumerate(row_list):
        d[val].append(index)
    return list(d.values())

import time
t1 = time.time()
output = unique_ids(row_list)
t2 = time.time()
print(f"total time = {t2-t1}")
The thing is, Julia is a JIT-compiled language, so the compiler can optimize the code for speed by knowing the objects' types. Python is an interpreted language: you have to know the correct functions and containers to use so that the hot path runs in C, otherwise your code will be running in a slow op-code interpreter.
Edit: it is CPython that we are benchmarking. This implementation of Python is heavily dependent on C code for speedups, which makes it the most extensible version of Python. Other implementations that use a JIT compiler, like PyPy (and Julia), are not as extensible as CPython, but produce faster code by compiling their op-codes to machine code at runtime; CPython trades the speed of Python op-codes for extensibility via faster C code.
Edit 2: the real power of CPython is that, assuming your app is based around this operation, you can write this operation in C/C++ and add it to CPython very easily, and it will be faster than Julia at this operation most of the time. You also get access to many Python modules that are themselves written in C/C++ to speed up your development; in the end CPython is usually used as a high-level glue around a wide range of C/C++ modules.
Edit 3: to settle this for Python once and for all, using the same LLVM compiler that Julia uses to make Python code faster via numba, the resulting time is 3.3 seconds, which is as fast as Julia can get. I am really impressed by the maintainers of numba.
import random
import numba
from numba.typed.typedlist import List
from numba.typed.typeddict import Dict
import numba.types

input_list_len = 6_000_000
rand_max = 100

@numba.njit
def generate_list():
    row_list = List()
    for _ in range(input_list_len):
        a = random.randint(0, rand_max)
        b = random.randint(0, rand_max)
        c = random.randint(0, rand_max)
        row_list.append((a, b, c))
    return row_list

row_list = generate_list()

@numba.njit("ListType(ListType(int64))(ListType(UniTuple(int64,3)))")
def unique_ids(row_list):
    d = Dict()
    for (index, val) in enumerate(row_list):
        if val in d:
            d[val].append(index)
        else:
            a = List()
            a.append(index)
            d[val] = a
    return List(d.values())

import time
t1 = time.time()
output = unique_ids(row_list)
t2 = time.time()
print(f"total time = {t2-t1}")

python-measure function time

I am having a problem with measuring the time of a function.
My function is a "linear search":
def linear_search(obj, item):
    for i in range(0, len(obj)):
        if obj[i] == item:
            return i
    return -1
And I made another function that measures the time 100 times and adds all the results to a list:
def measureTime(a):
    nl = []
    import random
    import time
    for x in range(0, 100):  # calculating time
        start = time.time()
        a
        end = time.time()
        times = end - start
        nl.append(times)
    return nl
When I'm using measureTime(linear_search(list,random.choice(range(0,50)))), the function always returns [0.0].
What can cause this problem? Thanks.
You are actually passing the result of linear_search into the measureTime function; you need to pass in the function and its arguments instead, so that they can be executed inside measureTime, as in @martijnn2008's answer.
Or better yet, you can use the timeit module to do the job for you:
from functools import partial
import timeit

def measureTime(n, f, *args):
    # returns the total runtime for n executions (divide by n for the average);
    # use a for loop with number=1 to get all n individual runtimes
    return timeit.timeit(partial(f, *args), number=n)

# running within the module
measureTime(100, linear_search, list, random.choice(range(0, 50)))

# if running interactively outside the module, use the following, assuming your module is named mymodule
mymodule.measureTime(100, mymodule.linear_search, mymodule.list, mymodule.random.choice(range(0, 50)))
Take a look at the following example, don't know exactly what you are trying to achieve so I guessed it ;)
import random
import time

def measureTime(method, n, *args):
    start = time.time()
    for _ in xrange(n):
        method(*args)
    end = time.time()
    return (end - start) / n

def linear_search(lst, item):
    for i, o in enumerate(lst):
        if o == item:
            return i
    return -1

lst = [random.randint(0, 10**6) for _ in xrange(10**6)]
repetitions = 100
for _ in xrange(10):
    item = random.randint(0, 10**6)
    print 'average runtime =',
    print measureTime(linear_search, repetitions, lst, item) * 1000, 'ms'

binarySearch vs in, Unexpected Results (Python)

I am trying to compare the complexity of in and binarySearch in Python 2, expecting O(1) for in and O(log n) for binarySearch. However, the results are unexpected. Are the programs timed incorrectly, or is there another mistake?
Here is the code:
import time

x = [x for x in range(1000000)]

def Time_in(alist, item):
    t1 = time.time()
    found = item in alist
    t2 = time.time()
    timer = t2 - t1
    return found, timer

def Time_binarySearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False
    t1 = time.time()
    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    t2 = time.time()
    timer = t2 - t1
    return found, timer

print "binarySearch: ", Time_binarySearch(x, 600000)
print "in: ", Time_in(x, 600000)
The results are:
The binary search is going so fast that when you try to print the time it took, it just prints 0.0. Whereas using in takes long enough that you see the very small fraction of a second it took.
The reason that in does take longer is because this is a list, not a set or similar data structure; with a set, membership testing is O(1) on average, whereas in a list every element has to be checked in order until there's a match, or the list is exhausted.
Here's some benchmarking code:
from __future__ import print_function
import bisect
import timeit

def binarysearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return found

def bisect_index(alist, item):
    idx = bisect.bisect_left(alist, item)
    if idx != len(alist) and alist[idx] == item:
        found = True
    else:
        found = False
    return found

time_tests = [
    (' 600 in list(range(1000))',
     '600 in alist',
     'alist = list(range(1000))'),
    (' 600 in list(range(10000000))',
     '600 in alist',
     'alist = list(range(10000000))'),
    (' 600 in set(range(1000))',
     '600 in aset',
     'aset = set(range(1000))'),
    ('6000000 in set(range(10000000))',
     '6000000 in aset',
     'aset = set(range(10000000))'),
    ('binarysearch(list(range(1000)), 600)',
     'binarysearch(alist, 600)',
     'from __main__ import binarysearch; alist = list(range(1000))'),
    ('binarysearch(list(range(10000000)), 6000000)',
     'binarysearch(alist, 6000000)',
     'from __main__ import binarysearch; alist = list(range(10000000))'),
    ('bisect_index(list(range(1000)), 600)',
     'bisect_index(alist, 600)',
     'from __main__ import bisect_index; alist = list(range(1000))'),
    ('bisect_index(list(range(10000000)), 6000000)',
     'bisect_index(alist, 6000000)',
     'from __main__ import bisect_index; alist = list(range(10000000))'),
]

for display, statement, setup in time_tests:
    result = timeit.timeit(statement, setup, number=1000000)
    print('{0:<45}{1}'.format(display, result))
And the results:
# Python 2.7
600 in list(range(1000)) 5.29039907455
600 in list(range(10000000)) 5.22499394417
600 in set(range(1000)) 0.0402979850769
6000000 in set(range(10000000)) 0.0390179157257
binarysearch(list(range(1000)), 600) 0.961972951889
binarysearch(list(range(10000000)), 6000000) 3.014950037
bisect_index(list(range(1000)), 600) 0.421462059021
bisect_index(list(range(10000000)), 6000000) 0.634694814682
# Python 3.4
600 in list(range(1000)) 8.578510413994081
600 in list(range(10000000)) 8.578105041990057
600 in set(range(1000)) 0.04088461003266275
6000000 in set(range(10000000)) 0.043901249999180436
binarysearch(list(range(1000)), 600) 1.6799193460028619
binarysearch(list(range(10000000)), 6000000) 6.099467994994484
bisect_index(list(range(1000)), 600) 0.5168328559957445
bisect_index(list(range(10000000)), 6000000) 0.7694612839259207
# PyPy 2.6.0 (Python 2.7.9)
600 in list(range(1000)) 0.122292041779
600 in list(range(10000000)) 0.00196599960327
600 in set(range(1000)) 0.101480007172
6000000 in set(range(10000000)) 0.00759720802307
binarysearch(list(range(1000)), 600) 0.242530822754
binarysearch(list(range(10000000)), 6000000) 0.189949035645
bisect_index(list(range(1000)), 600) 0.132127046585
bisect_index(list(range(10000000)), 6000000) 0.197204828262
Why do you expect O(1) when testing if an element is contained in a list?
If you don't know anything about the list (like that it is sorted as in your example) then you have to go through each element and compare it.
So you get O(N).
Python lists cannot assume anything about what you store in them, so they have to use a naive implementation for list.__contains__.
If you want a faster test, then you can try to use a dictionary or set.
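For example, a minimal sketch of paying the conversion cost once and then testing membership against a set (the sizes here are illustrative):
import timeit

alist = list(range(10_000_000))
aset = set(alist)   # pay the O(n) conversion cost once

# each subsequent test is O(1) on average instead of O(n):
print(timeit.timeit('9_999_999 in alist', globals=globals(), number=10))
print(timeit.timeit('9_999_999 in aset', globals=globals(), number=10))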
Here are the time complexities of all list methods in Python (table omitted):
As can be seen, x in s is O(n), which is significantly slower than binary search's O(log n).

How to use timeit where each test requires random setup

I have a function f(x) that takes as input a list x of 100 random floats between 0 and 1. Different lists will result in different running times of f.
I want to find out how long f takes to run on average, over a large number of different random lists. What's the best way to do this? Should I use timeit and if so is there a way I can do this without including the time it takes to generate each random list in each trial?
This is how I would do it without timeit (pseudocode):
for i = 1 to 10000:
    x = random list
    start = current time
    f(x)
    end = current time
    results.append(end - start)
return mean(results)
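One way to keep the list generation out of the measured region is to pre-generate all the random inputs first and then time only the calls to f; a minimal sketch (the body of f here is just a stand-in for the real function):
import random
import time

def f(x):                       # stand-in for the real f(x)
    return sorted(x)

trials = 10000
inputs = [[random.random() for _ in range(100)] for _ in range(trials)]

start = time.perf_counter()     # only the calls to f are timed
for x in inputs:
    f(x)
elapsed = time.perf_counter() - start
print("mean seconds per call:", elapsed / trials)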
You can make a timer decorator:
Here is some example code:
from time import time

class Timer(object):
    def __init__(self, func):
        """
        Decorator that times a function
        @param func: Function being decorated
        @type func: callable
        """
        self.func = func

    def __call__(self, *args, **kwargs):
        start = time()
        self.func(*args, **kwargs)
        end = time()
        return end - start

@Timer
def cheese():
    for var in xrange(9999999):
        continue

for var in xrange(100):
    print cheese()
Working example, with fewer loops.
import timeit, random

def summer(myList):
    result = 0
    for num in myList:
        result += num
    return result

for i in range(10):
    x = [random.randint(0, 100) for i in range(100000)]
    print timeit.timeit("summer(x)", setup="from __main__ import x, summer", number=100)
You can import the variable using from __main__ import x
I think this does the trick. It will execute the setup once per repeat and then execute stmt number=1 time. However I don't think this is that much better than the simple loop you posted.
import timeit
stmt = '[x*x*x for x in xrange(n)]' # just an example
setup = 'import random; n = random.randint(10, 100)'
r = 10000
times = timeit.repeat(stmt, setup, repeat=r, number=1)
print min(times), max(times), sum(times)/r
There is also a "cell mode" that you can use with timeit in the IPython shell, but it only returns the fasted time and there is no easy way to change that (?).
import random
%%timeit -r 10000 -n 1 n = random.randint(10,100)
var = [x*x*x for x in xrange(n)]

Increasing speed of python code

I have some python code that has many classes. I used cProfile to find that the total time to run the program is 68 seconds. I found that the following function in a class called Buyers takes about 60 seconds of those 68 seconds. I have to run the program about 100 times, so any increase in speed will help. Can you suggest ways to increase the speed by modifying the code? If you need more information that will help, please let me know.
def qtyDemanded(self, timePd, priceVector):
    '''Returns quantity demanded in period timePd. In addition,
    also updates the list of customers and non-customers.

    Inputs: timePd and priceVector
    Output: count of people for whom priceVector[-1] < utility
    '''
    ## Initialize count of customers to zero
    ## Set self.customers and self.nonCustomers to empty lists
    price = priceVector[-1]
    count = 0
    self.customers = []
    self.nonCustomers = []

    for person in self.people:
        if person.utility >= price:
            person.customer = 1
            self.customers.append(person)
        else:
            person.customer = 0
            self.nonCustomers.append(person)
    return len(self.customers)
self.people is a list of person objects. Each person has customer and utility as its attributes.
EDIT - responses added
-------------------------------------
Thanks so much for the suggestions. Here is the response to some of the questions and suggestions people have kindly made. I have not tried them all, but will try the others and write back later.
(1) @amber - the function is accessed 80,000 times.
(2) @gnibbler and others - self.people is a list of Person objects in memory. Not connected to a database.
(3) @Hugh Bothwell
cumtime taken by the original function - 60.8 s (accessed 80000 times)
cumtime taken by the new function with local function aliases as suggested - 56.4 s (accessed 80000 times)
(4) @rotoglup and @Martin Thomas
I have not tried your solutions yet. I need to check the rest of the code to see the places where I use self.customers before I can make the change of not appending the customers to self.customers list. But I will try this and write back.
(5) @TryPyPy - thanks for your kind offer to check the code.
Let me first read a little on the suggestions you have made to see if those will be feasible to use.
EDIT 2
Some suggested that since I am flagging the customers and noncustomers in the self.people, I should try without creating separate lists of self.customers and self.noncustomers using append. Instead, I should loop over the self.people to find the number of customers. I tried the following code and timed both functions below f_w_append and f_wo_append. I did find that the latter takes less time, but it is still 96% of the time taken by the former. That is, it is a very small increase in the speed.
@TryPyPy - The following piece of code is complete enough to check the bottleneck function, in case your offer is still there to check it with other compilers.
Thanks again to everyone who replied.
import numpy

class person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

class population(object):
    def __init__(self, numpeople):
        self.people = []
        self.cus = []
        self.noncus = []
        numpy.random.seed(1)
        utils = numpy.random.uniform(0, 300, numpeople)
        for u in utils:
            per = person(u)
            self.people.append(per)

popn = population(300)

def f_w_append():
    '''Function with append'''
    P = 75
    cus = []
    noncus = []
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
            cus.append(per)
        else:
            per.customer = 0
            noncus.append(per)
    return len(cus)

def f_wo_append():
    '''Function without append'''
    P = 75
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0
    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1
    return numcustomers
EDIT 3: It seems numpy is the problem
This is in response to what John Machin said below. Below you see two ways of defining the Population class. I ran the program below twice, once with each way of creating the Population class. One uses numpy and one does not. The one without numpy takes a similar time to what John found in his runs. The one with numpy takes much longer. What is not clear to me is that the popn instance is created before time recording begins (at least that is how it appears from the code), so why is the numpy version taking longer? And I thought numpy was supposed to be more efficient. Anyhow, the problem seems to be with numpy and not so much with the append, even though append does slow things down a little. Can someone please confirm with the code below? Thanks.
import random  # instead of numpy
import numpy
import time
timer_func = time.time  # using Mac OS X 10.5.8

class Person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

class Population(object):
    def __init__(self, numpeople):
        random.seed(1)
        self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)]
        self.cus = []
        self.noncus = []

# Numpy based
# class Population(object):
#     def __init__(self, numpeople):
#         numpy.random.seed(1)
#         utils = numpy.random.uniform(0, 300, numpeople)
#         self.people = [Person(u) for u in utils]
#         self.cus = []
#         self.noncus = []

def f_wo_append(popn):
    '''Function without append'''
    P = 75
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0
    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1
    return numcustomers

t0 = timer_func()
for i in xrange(20000):
    x = f_wo_append(popn)
t1 = timer_func()
print t1-t0
Edit 4: See the answers by John Machin and TryPyPy
Since there have been so many edits and updates here, those who find themselves here for the first time may be a little confused. See the answers by John Machin and TryPyPy. Both of these can help in improving the speed of the code substantially. I am grateful to them and others who alerted me to slowness of append. Since, in this instance I am going to use John Machin's solution and not use numpy for generating utilities, I am accepting his response as an answer. However, I really appreciate the directions pointed out by TryPyPy also.
There are many things you can try after optimizing your Python code for speed. If this program doesn't need C extensions, you can run it under PyPy to benefit from its JIT compiler. You can try making a C extension for possibly huge speedups. Shed Skin will even allow you to convert your Python program to a standalone C++ binary.
I'm willing to time your program under these different optimization scenarios if you can provide enough code for benchmarking.
Edit: First of all, I have to agree with everyone else: are you sure you're measuring the time correctly? The example code runs 100 times in under 0.1 seconds here, so there is a good chance that either the timing is wrong or you have a bottleneck (IO?) that isn't present in the code sample.
That said, I made it 300000 people so times were consistent. Here's the adapted code, shared by CPython (2.5), PyPy and Shed Skin:
from time import time
import random
import sys

class person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

class population(object):
    def __init__(self, numpeople, util):
        self.people = []
        self.cus = []
        self.noncus = []
        for u in util:
            per = person(u)
            self.people.append(per)

def f_w_append(popn):
    '''Function with append'''
    P = 75
    cus = []
    noncus = []
    # Help CPython a bit
    # cus_append, noncus_append = cus.append, noncus.append
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
            cus.append(per)
        else:
            per.customer = 0
            noncus.append(per)
    return len(cus)

def f_wo_append(popn):
    '''Function without append'''
    P = 75
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0
    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1
    return numcustomers

def main():
    try:
        numpeople = int(sys.argv[1])
    except:
        numpeople = 300000
    print "Running for %s people, 100 times." % numpeople
    begin = time()
    random.seed(1)
    # Help CPython a bit
    uniform = random.uniform
    util = [uniform(0.0, 300.0) for _ in xrange(numpeople)]
    # util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)]

    popn1 = population(numpeople, util)
    start = time()
    for _ in xrange(100):
        r = f_wo_append(popn1)
    print r
    print "Without append: %s" % (time() - start)

    popn2 = population(numpeople, util)
    start = time()
    for _ in xrange(100):
        r = f_w_append(popn2)
    print r
    print "With append: %s" % (time() - start)
    print "\n\nTotal time: %s" % (time() - begin)

if __name__ == "__main__":
    main()
Running with PyPy is as simple as running with CPython, you just type 'pypy' instead of 'python'. For Shed Skin, you must convert to C++, compile and run:
shedskin -e makefaster.py && make
# Check that you're using the makefaster.so file and run test
python -c "import makefaster; print makefaster.__file__; makefaster.main()"
And here is the Cython-ized code:
from time import time
import random
import sys
cdef class person:
cdef readonly int utility
cdef public int customer
def __init__(self, util):
self.utility = util
self.customer = 0
class population(object):
def __init__(self, numpeople, util):
self.people = []
self.cus = []
self.noncus = []
for u in util:
per = person(u)
self.people.append(per)
cdef int f_w_append(popn):
'''Function with append'''
cdef int P = 75
cdef person per
cus = []
noncus = []
# Help CPython a bit
# cus_append, noncus_append = cus.append, noncus.append
for per in popn.people:
if per.utility >= P:
per.customer = 1
cus.append(per)
else:
per.customer = 0
noncus.append(per)
cdef int lcus = len(cus)
return lcus
cdef int f_wo_append(popn):
'''Function without append'''
cdef int P = 75
cdef person per
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
cdef int numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
def main():
cdef int i, r, numpeople
cdef double _0, _300
_0 = 0.0
_300 = 300.0
try:
numpeople = int(sys.argv[1])
except:
numpeople = 300000
print "Running for %s people, 100 times." % numpeople
begin = time()
random.seed(1)
# Help CPython a bit
uniform = random.uniform
util = [uniform(_0, _300) for i in xrange(numpeople)]
# util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)]
popn1 = population(numpeople, util)
start = time()
for i in xrange(100):
r = f_wo_append(popn1)
print r
print "Without append: %s" % (time() - start)
popn2 = population(numpeople, util)
start = time()
for i in xrange(100):
r = f_w_append(popn2)
print r
print "With append: %s" % (time() - start)
print "\n\nTotal time: %s" % (time() - begin)
if __name__ == "__main__":
main()
For building it, it's nice to have a setup.py like this one:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("cymakefaster", ["makefaster.pyx"])]
setup(
    name = 'Python code to speed up',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)
You build it with:
python setupfaster.py build_ext --inplace
Then test:
python -c "import cymakefaster; print cymakefaster.file; cymakefaster.main()"
Timings were run five times for each version, with Cython being the fastest and easiest of the code generators to use (Shed Skin aims to be simpler, but cryptic error messages and implicit static typing made it harder here). As for best value, PyPy gives impressive speedup in the counter version with no code changes.
Results (time in seconds for 300000 people, 100 calls for each function):
Mean Min Times
CPython 2.5.2
Without append: 35.037 34.518 35.124, 36.363, 34.518, 34.620, 34.559
With append: 29.251 29.126 29.339, 29.257, 29.259, 29.126, 29.272
Total time: 69.288 68.739 69.519, 70.614, 68.746, 68.739, 68.823
PyPy 1.4.1
Without append: 2.672 2.655 2.655, 2.670, 2.676, 2.690, 2.668
With append: 13.030 12.672 12.680, 12.725, 14.319, 12.755, 12.672
Total time: 16.551 16.194 16.196, 16.229, 17.840, 16.295, 16.194
Shed Skin 0.7 (gcc -O2)
Without append: 1.601 1.599 1.599, 1.605, 1.600, 1.602, 1.599
With append: 3.811 3.786 3.839, 3.795, 3.798, 3.786, 3.839
Total time: 5.704 5.677 5.715, 5.705, 5.699, 5.677, 5.726
Cython 0.14 (gcc -O2)
Without append: 1.692 1.673 1.673, 1.710, 1.678, 1.688, 1.711
With append: 3.087 3.067 3.079, 3.080, 3.119, 3.090, 3.067
Total time: 5.565 5.561 5.562, 5.561, 5.567, 5.562, 5.572
Edit: Aaaand more meaningful timings, for 80000 calls with 300 people each:
Results (time in seconds for 300 people, 80000 calls for each function):
Mean Min Times
CPython 2.5.2
Without append: 27.790 25.827 25.827, 27.315, 27.985, 28.211, 29.612
With append: 26.449 24.721 24.721, 27.017, 27.653, 25.576, 27.277
Total time: 54.243 50.550 50.550, 54.334, 55.652, 53.789, 56.892
Cython 0.14 (gcc -O2)
Without append: 1.819 1.760 1.760, 1.794, 1.843, 1.827, 1.871
With append: 2.089 2.063 2.100, 2.063, 2.098, 2.104, 2.078
Total time: 3.910 3.859 3.865, 3.859, 3.944, 3.934, 3.951
PyPy 1.4.1
Without append: 0.889 0.887 0.894, 0.888, 0.890, 0.888, 0.887
With append: 1.671 1.665 1.665, 1.666, 1.671, 1.673, 1.681
Total time: 2.561 2.555 2.560, 2.555, 2.561, 2.561, 2.569
Shed Skin 0.7 (g++ -O2)
Without append: 0.310 0.301 0.301, 0.308, 0.317, 0.320, 0.303
With append: 1.712 1.690 1.733, 1.700, 1.735, 1.690, 1.702
Total time: 2.027 2.008 2.035, 2.008, 2.052, 2.011, 2.029
Shed Skin becomes fastest, PyPy surpasses Cython. All three speed things up a lot compared to CPython.
Please consider trimming down your f_wo_append function:
def f_wo_append():
    '''Function without append'''
    P = 75
    numcustomers = 0
    for person in popn.people:
        person.customer = iscust = person.utility >= P
        numcustomers += iscust
    return numcustomers
Edit in response to OP's comment """This made it a lot worse! The trimmed version takes 4 times more time than the version I have posted above. """
There is no way that that could take "4 times more" (5 times?) ... here is my code, which demonstrates a significant reduction in the "without append" case, as I suggested, and also introduces a significant improvement in the "with append" case.
import random  # instead of numpy
import time
timer_func = time.clock  # better on Windows, use time.time on *x platform

class Person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

class Population(object):
    def __init__(self, numpeople):
        random.seed(1)
        self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)]
        self.cus = []
        self.noncus = []

def f_w_append(popn):
    '''Function with append'''
    P = 75
    cus = []
    noncus = []
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
            cus.append(per)
        else:
            per.customer = 0
            noncus.append(per)
    popn.cus = cus  # omitted from OP's code
    popn.noncus = noncus  # omitted from OP's code
    return len(cus)

def f_w_append2(popn):
    '''Function with append'''
    P = 75
    popn.cus = []
    popn.noncus = []
    cusapp = popn.cus.append
    noncusapp = popn.noncus.append
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
            cusapp(per)
        else:
            per.customer = 0
            noncusapp(per)
    return len(popn.cus)

def f_wo_append(popn):
    '''Function without append'''
    P = 75
    for per in popn.people:
        if per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0
    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1
    return numcustomers

def f_wo_append2(popn):
    '''Function without append'''
    P = 75
    numcustomers = 0
    for person in popn.people:
        person.customer = iscust = person.utility >= P
        numcustomers += iscust
    return numcustomers

if __name__ == "__main__":
    import sys
    popsize, which, niter = map(int, sys.argv[1:4])
    pop = Population(popsize)
    func = (f_w_append, f_w_append2, f_wo_append, f_wo_append2)[which]
    t0 = timer_func()
    for _unused in xrange(niter):
        nc = func(pop)
    t1 = timer_func()
    print "popsize=%d func=%s niter=%d nc=%d seconds=%.2f" % (
        popsize, func.__name__, niter, nc, t1 - t0)
and here are the results of running it (Python 2.7.1, Windows 7 Pro, "Intel Core i3 CPU 540 @ 3.07 GHz"):
C:\junk>\python27\python ncust.py 300 0 80000
popsize=300 func=f_w_append niter=80000 nc=218 seconds=5.48
C:\junk>\python27\python ncust.py 300 1 80000
popsize=300 func=f_w_append2 niter=80000 nc=218 seconds=4.62
C:\junk>\python27\python ncust.py 300 2 80000
popsize=300 func=f_wo_append niter=80000 nc=218 seconds=5.55
C:\junk>\python27\python ncust.py 300 3 80000
popsize=300 func=f_wo_append2 niter=80000 nc=218 seconds=4.29
Edit 3 Why numpy takes longer:
>>> import numpy
>>> utils = numpy.random.uniform(0, 300, 10)
>>> print repr(utils[0])
42.777972538362874
>>> type(utils[0])
<type 'numpy.float64'>
and here's why my f_wo_append2 function took 4 times longer:
>>> x = utils[0]
>>> type(x)
<type 'numpy.float64'>
>>> type(x >= 75)
<type 'numpy.bool_'> # iscust refers to a numpy.bool_
>>> type(0 + (x >= 75))
<type 'numpy.int32'> # numcustomers ends up referring to a numpy.int32
>>>
The empirical evidence is that these custom types aren't so fast when used as scalars ... perhaps because they need to reset the floating-point hardware each time they are used. OK for big arrays, not for scalars.
Are you using any other numpy functionality? If not, just use the random module. If you have other uses for numpy, you may wish to coerce the numpy.float64 to float during the population setup.
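A minimal sketch of that coercion (the Person class is repeated here only to keep the snippet self-contained, and the population size is illustrative):
import numpy

class Person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

numpy.random.seed(1)
utils = numpy.random.uniform(0, 300, 300)
# convert each numpy.float64 scalar to a plain Python float once, up front,
# so the comparisons in the hot loop work on plain Python floats
people = [Person(float(u)) for u in utils]
print(type(utils[0]), type(people[0].utility))   # numpy.float64 vs float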
You can eliminate some lookups by using local function aliases:
def qtyDemanded(self, timePd, priceVector):
    '''Returns quantity demanded in period timePd. In addition,
    also updates the list of customers and non-customers.

    Inputs: timePd and priceVector
    Output: count of people for whom priceVector[-1] < utility
    '''
    price = priceVector[-1]
    self.customers = []
    self.nonCustomers = []

    # local function aliases
    addCust = self.customers.append
    addNonCust = self.nonCustomers.append

    for person in self.people:
        if person.utility >= price:
            person.customer = 1
            addCust(person)
        else:
            person.customer = 0
            addNonCust(person)
    return len(self.customers)
This comment rings alarm bells:
'''Returns quantity demanded in period timePd. In addition,
also updates the list of customers and non-customers.
Aside from the fact that timePd is not used in the function, if you really want just to return the quantity, do just that in the function. Do the "in addition" stuff in a separate function.
Then profile again and see which of these two functions you are spending most of your time in.
I like to apply SRP to methods as well as classes: it makes them easier to test.
Depending on how often you add new elements to self.people or change person.utility, you could consider sorting self.people by the utility field.
Then you could use a bisect function to find the lower index i_pivot where the person[i_pivot].utility >= price condition is met. This would have a lower complexity ( O(log N) ) than your exhaustive loop ( O(N) )
With this information, you could then update your people list if needed :
Do you really need to update the customer flag each time? In the sorted case, you could easily deduce this value while iterating: for example, considering your list sorted in increasing order, customer = (index >= i_pivot)
Same question with customers and nonCustomers lists. Why do you need them? They could be replaced by slices of the original sorted list : for example, customers = self.people[0:i_pivot]
All this would allow you to reduce the complexity of your algorithm, and use more built-in (fast) Python functions, this could speedup your implementation.
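A minimal sketch of that idea, assuming self.people is kept sorted by ascending utility and that a parallel sorted list self.utilities of the utility values is maintained (both are assumptions, not part of the original code):
import bisect

def qtyDemanded_sorted(self, timePd, priceVector):
    '''Variant of qtyDemanded for a people list pre-sorted by utility.'''
    price = priceVector[-1]
    # O(log N): index of the first person whose utility >= price
    i_pivot = bisect.bisect_left(self.utilities, price)
    # slices of the sorted list replace the two appended lists
    self.nonCustomers = self.people[:i_pivot]
    self.customers = self.people[i_pivot:]
    return len(self.customers)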
Some curious things I noted:
timePd is passed as a parameter but never used
priceVector is a list but you only ever use the last entry - why not pass that value instead of passing the whole list?
count is initialized and never used
self.people contains multiple person objects which are then copied to either self.customers or self.noncustomers as well as having their customer flag set. Why not skip the copy operation and, on return, just iterate over the list, looking at the customer flag? This would save the expensive append.
Alternatively, try using psyco which can speed up pure Python, sometimes considerably.
It's surprising that the function shown is such a bottleneck because it's so relatively simple. For that reason, I'd double check my profiling procedure and results. However, if they're correct, the most time consuming part of your function has to be the for loop it contains, of course, so it makes sense to focus on speeding that up. One way to do this is by replacing the if/else with straight-line code. You can also reduce the attribute lookup for the append list method slightly. Here's how both of those things could be accomplished:
def qtyDemanded(self, timePd, priceVector):
    '''Returns quantity demanded in period timePd. In addition,
    also updates the list of customers and non-customers.

    Inputs: timePd and priceVector
    Output: count of people for whom priceVector[-1] < utility
    '''
    price = priceVector[-1]  # last price
    kinds = [[], []]  # initialize sublists of noncustomers and customers
    kindsAppend = [kinds[b].append for b in (False, True)]  # append methods

    for person in self.people:
        person.customer = person.utility >= price  # customer test
        kindsAppend[person.customer](person)  # add to proper list

    self.nonCustomers = kinds[False]
    self.customers = kinds[True]
    return len(self.customers)
That said, I must add that it seems a little redundant to have both a customer flag in each person object and also put each of them into a separate list depending on that attribute. Not creating these two lists would of course speed the loop up further.
You're asking for guesses, and mostly you're getting guesses.
There's no need to guess. Here's an example.
