Simple parallel process in Python

Simple parallel process in Python - python

I have a (very) simple process to carry out many times, along the lines of:
import time
class Shop:
def __init__(self):
time.sleep(1.)
self.sale_price = 0
def update(self, weight, price):
self.sale_price = weight * price
my_list = [Shop() for _ in range(20)] #20 shops
weight, price = 2, 3
for o in my_list:
o.update(weight, price)
(The function update changes the internal state of o, without any return value)
my question: What's a simple way to make this loop/process run in parallel?
I've seen many examples here for parallel mathematical operations (e.g. matrix operations), but I can't get them working for my simple example. I'm using Python 3.6.5

Found a simple solution:
import time
from joblib import Parallel, delayed
import multiprocessing
class Shop:
def __init__(self):
time.sleep(1.)
self.sale_price = 0
def update(self, weight, price):
self.sale_price = weight * price
my_list = [Shop() for _ in range(20)] #20 shops
weight, price = 2, 3
### Non-parallel
for o in my_list:
o.update(weight, price)
### Parallel
#Helper function (doesn't take weight/price parameters - assumes these are already defined)
def update2(o):
o.update(weight, price)
return o
num_cores = multiprocessing.cpu_count()
my_list = Parallel(n_jobs=num_cores)(delayed(update2)(o) for o in my_list)
I realise this reassigns my_list at the end, but this is fine for my purpose.

Related

Adjusting nested apply() functions for large datasets

I have two dataframes that I'm trying to compare, and am facing a volume issue.
I am passing one row of a new item description through a 4.5 million row inventory list and calculating similarity. I only need the top x recommendations and am realizing my current approach quickly gets overwhelmed with the volume of data and is crashing the kernel.
I have not dealt with this data size before, so I am unsure how to adjust my code.
Any advice is greatly appreciated. The current approach was to put the data into the dataframe first(holding_df) and then groupby to collect the top recommendations, but once this process is scaled to the full size of the data, it crashes.
> df.head()
item_desc
0 paintbrush
1 mop #2
2 red bucket
3 o-light flashlight
> df_inventory.head()
item_desc
0 broom
1 mop
2 bucket
3 flashlight
import pandas as pd
from fuzzywuzzy import fuzz
def calculate_similarity(x, y):
sample_list.append(
{
"New Item": x,
"Inventory Item": y,
"Similarity": fuzz.ratio(str(x).lower(), str(y).lower()),
}
)
return
sample_list = []
df = pd.DataFrame(
{"ITEM_DESC": ["paintbrush", "mop #2", "red bucket", "o-light flashlight"]}
)
df_inventory = pd.DataFrame({"ITEM_DESC": ["broom", "mop", "bucket", "flashlight"]})
temp = df["ITEM_DESC"].apply(
lambda x: df_inventory["ITEM_DESC"].apply(lambda y: calculate_similarity(x, y))
)
holding_df = pd.DataFrame(sample_list)

I implemented something in plain Python that won't break your kernel, but it won't be super fast.
It takes about 6-7 seconds to compare a single new product with the whole inventory. That will probably be too slow for 3.5k items (about 6h and 20 min if I'd run it on my machine). With some work, it can be parallelized though.
6.5s per new item
3500 * 6.5 / 3600 (s/h) -> 6h 20min
The main memory-saver is the FixedSizeLeaderboard class that I implemented to keep track of the top n most similar items for a new product. As the task is now CPU-Bound and not really memory-bound, you can benefit from rewriting it a bit to use the multiprocessing module.
I decided to just generate some test data that may or may not represent actual performance. I added a few comments where you'd plug in your data.
import bisect
import collections
import contextlib
import itertools
import time
import typing
import uuid
from fuzzywuzzy import fuzz
#contextlib.contextmanager
def log_runtime(task: str):
"""Contextmanager that logs the runtime of a piece of code."""
start = time.perf_counter()
yield
runtime = time.perf_counter() - start
print("Task '%s' took %.4f seconds" % (task, runtime))
def inventory_generator() -> typing.Iterable[str]:
"""Returns an iterable that yields product names."""
def string_generator() -> typing.Iterable[str]:
while True:
yield str(uuid.uuid4())
yield from ("aaa", "aba", "def", "dse", "asd")
yield from string_generator()
class FixedSizeLeaderboard:
size: int
_min_score: int
_items: typing.List[typing.Tuple[int, object]]
def __init__(self, size) -> None:
self.size = size
self._items = []
self._min_score = None
def add(self, score: int, item: object) -> None:
if len(self._items) < self.size or score > self._min_score:
self._eject_element_with_lowest_score()
bisect.insort(self._items, (score, item))
self._min_score = self._items[0][0]
def _eject_element_with_lowest_score(self) -> None:
if len(self._items) == self.size:
# The list is sorted, so we can pop the first one
self._items.pop(0)
def get(self) -> typing.List[typing.Tuple[int, object]]:
return sorted(self._items, reverse=True)
def main():
num_new_products = 2
num_products_in_inventory = 4_500_000
top_n_similarities = 3
with log_runtime("Generate dummy-products"):
# Convert everything to lowercase once.
# This is not really required for uuids, but it should happen ONCE
# Instead of the inventory_generator, you'd pass the content of your dataframe here.
new_products = list(
map(str.lower, itertools.islice(inventory_generator(), num_new_products))
)
inventoried_products = list(
map(
str.lower,
itertools.islice(inventory_generator(), num_products_in_inventory),
)
)
task_desc = (
f"{num_new_products} x {num_products_in_inventory}"
f" = {num_new_products * num_products_in_inventory} similarity computations"
)
product_to_leaderboard: typing.Dict[
str, FixedSizeLeaderboard
] = collections.defaultdict(lambda: FixedSizeLeaderboard(top_n_similarities))
with log_runtime(task_desc):
for new_product, existing_product in itertools.product(
new_products, inventoried_products
):
similarity = fuzz.ratio(new_product, existing_product)
product_to_leaderboard[new_product].add(similarity, existing_product)
# Sort of pretty output formatting
for product, similarities in product_to_leaderboard.items():
print("=" * 3, "New Product", product, "=" * 3)
for position, (score, product) in enumerate(similarities.get()):
print(f"{position + 1:02}. score: {score} product: {product}")
if __name__ == "__main__":
main()
If we execute it, we get something like this:
$ python apply_thingy.py
Task 'Generate dummy-products' took 1.6449 seconds
Task '2 x 4500000 = 9000000 similarity computations' took 12.0887 seconds
=== New Product 2d10f990-355e-42f6-b518-0a21a7fb8d5c ===
01. score: 56 product: f2100878-3c3e-4f86-b410-3c362184d195
02. score: 56 product: 5fc9b30c-35ed-4167-b997-1bf0a2af5b68
03. score: 56 product: 523210b2-e5e0-496a-b0b1-a1b2af49b0d5
=== New Product aaa ===
01. score: 100 product: aaa
02. score: 100 product: aaa
03. score: 100 product: aaa

Optimizing a parallel implementation of a list comprehension

I have a dataframe, where each row contains a list of integers. I also have a reference-list that I use to check what integers in the dataframe appear in this list.
I have made two implementations of this, one single-threaded and one multi-threaded. The single-threaded implementation is quite fast (takes roughly 0.1s on my machine), whereas the multithreaded takes roughly 5s.
My question is: Is this due to my implementation being poor, or is this merely a case where the overhead due to multithreading is so large that it doesn't make sense to use multiple threads?
The example is below:
import time
from random import randint
import pandas as pd
import multiprocessing
from functools import partial
class A:
def __init__(self, N):
self.ls = [[randint(0, 99) for i in range(20)] for j in range(N)]
self.ls = pd.DataFrame({'col': self.ls})
self.lst_nums = [randint(0, 99) for i in range(999)]
#classmethod
def helper(cls, lst_nums, col):
return any([s in lst_nums for s in col])
def get_idx_method1(self):
method1 = self.ls['col'].apply(lambda nums: any(x in self.lst_nums for x in nums))
return method1
def get_idx_method2(self):
pool = multiprocessing.Pool(processes=1)
method2 = pool.map(partial(A.helper, self.lst_nums), self.ls['col'])
pool.close()
return method2
if __name__ == "__main__":
a = A(50000)
start = time.time()
m1 = a.get_idx_method1()
end = time.time()
print(end-start)
start = time.time()
m2 = a.get_idx_method2()
end = time.time()
print(end - start)

First of all, multiprocessing is useful when the cost of data communication between the main process and the others is less comparable to the time cost of the function.
Another thing is that you made an error in your code:
def helper(cls, lst_nums, col):
return any([s in lst_nums for s in col])
VS
any(x in self.lst_nums for x in nums)
You have that list [] in the helper method, which will make the any() method to wait for the entire array to be computed, while the second any() will just stop at the first True value.
In conclusion if you remove list brackets from the helper method and maybe increase the randint range for lst_nums initializer, you will notice an increase in speed when using multiple processes.
self.lst_nums = [randint(0, 10000) for i in range(999)]
and
def helper(cls, lst_nums, col):
return any(s in lst_nums for s in col)

python genetic algorithm found optimum solution

I explain, I am trying to develop a program to optimize a system based on the parameters it receives. My program will have to vary these parameters to try to find the best possible combination.
here is a code to simplify my problem:
parameters=[["toto1","toto2","toto3"],["tutu1","tutu2","tutu3"],["titi1","titi2","titi3"],["tata1","tata2","tata3"]]
def MySysteme(param1,param2,param3,param4):
result=0
for i in range(0,len(param1)):
result+=ord(param1[i])
for i in range(0,len(param2)):
result+=ord(param1[i])
for i in range(0,len(param3)):
result+=ord(param1[i])
for i in range(0,len(param4)):
result+=ord(param1[i])
return result
print(MySysteme(parameters[0][0],parameters[1][2],parameters[2][2],parameters[3][0]))
print(MySysteme(parameters[1][0],parameters[2][2],parameters[3][2],parameters[0][0]))
print(MySysteme(parameters[3][1],parameters[1][2],parameters[2][2],parameters[0][0]))
#how to find the highest value?
I try to (try) find the highest number, without testing all the parameters naively. hence the use of a genetic algorithm. 1 parameter is a list contained in the list parameters, the contents of the list is a varariante of my parameter
knowing that in my function / my system, one should not have 2 times the same parameter, for example this should not happen: print (MySystem (parameters [1] [0], parameters [1] [0])) or this print (MySystem (parameters [2] [1], parameters [2] [0]))
on the other hand the number of parameters is included in 1 and 4 (there can be 1,2,3 or 4 parameters)
To solve my problem here is the data that I consider: Individual: it is a variant of parameter which carries a name ("toto1", "tata3", "toto2 = 12" ... etc.) Population: set of the variants of the parameters fitness : it is the result of the function according to the parameters a circuit: a set of parameters
but unlike the commercial traveler, I have no starting data => that is to say that I do not have GPS coordinates. and it is at this level that I am stuck for the resolution of my problem.
can anyone help me?
edit:
I have been looking some examples of how I could find the points at which a function achieves its maxium using a genetic algorithm approach in Python. I looked at this tutorial
https://lethain.com/genetic-algorithms-cool-name-damn-simple/
my objective is to found the smaller number to "mySysteme" function
i set a new code :
je re-explique mon probleme plus simplement. J’ai mets un code plus complet, plus clair avec un algo génétique.
from random import randint, random
from operator import add
from functools import reduce
parameters=[["toto123","toto27","toto3000"],["tu","tut","tutu378694245"],["t","choicezaert","titi3=78965"],["blabla","2","conjoncture_is_enable"]]
def individual(length, min, max):
return [ randint(min,max) for x in range(length) ]
def population(count, length, min, max):
return [ individual(length, min, max) for x in range(count) ]
def fitness(individual, target):
sum = reduce(add, individual, 0)
return abs(target-sum)
def grade(pop, target):
individu_number_parameters=randint(1, len(parameters)-1)
for j in range(0,individu_number_parameters):
position=randint(1, len(parameters)-1)
parameter=parameters[position]
if isinstance(parameter, list):
parameter=parameters[position][randint(1, len(parameters[position])-1)]
result=0
for i in range(0,len(parameter)):
result+=ord(parameter[i])
return result
def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
graded = [ (fitness(x, target), x) for x in pop]
graded = [ x[1] for x in sorted(graded)]
retain_length = int(len(graded)*retain)
parents = graded[:retain_length]
for individual in graded[retain_length:]:
if random_select > random():
parents.append(individual)
for individual in parents:
if mutate > random():
pos_to_mutate = randint(0, len(individual)-1)
individual[pos_to_mutate] = randint(
min(individual), max(individual))
parents_length = len(parents)
desired_length = len(pop) - parents_length
children = []
while len(children) < desired_length:
male = randint(0, parents_length-1)
female = randint(0, parents_length-1)
if male != female:
male = parents[male]
female = parents[female]
half = int(len(male) / 2)
child = male[:half] + female[half:]
children.append(child)
parents.extend(children)
return parents
target = 0
p_count = 100
i_length = 6
i_min = 0
i_max = 100
p = population(p_count, i_length, i_min, i_max)
fitness_history = [grade(p, target),]
for i in range(1000):
p = evolve(p, target)
fitness_history.append(grade(p, target))
for datum in fitness_history:
print(datum)
print(len(fitness_history))
I updated with new code. My ask : i want that my program found smaller number

single iteration sharing the iterator

I have a lot of data, usually in a file. I want to compute some quantities so I have this kind of functions:
def mean(iterator):
n = 0
sum = 0.
for i in iterator:
sum += i
n += 1
return sum / float(n)
I have also many other similar functions (var, size, ...)
Now I have an iterator iterating throught the data: iter_data. I can compute all the quantities I want: m = mean(iter_data); v = var(iter_data) and so on, but the problem is that I am iterating many times and this is expensive in my case. Actually the I/O is the most expensive part.
So the question is: can I compute my quantities m, v, ... iterating only one time over iter_data keeping separate the functions mean, var, ... so that it is easy to add new ones?
What I need is something similar to boost::accumulators

For example use objects and callbacks like:
class Counter():
def __init__(self):
self.n = 0
def __call__(self, i):
self.n += 1
class Summer():
def __init__(self):
self.sum = 0
def __call__(self, i):
self.sum += i
def process(iterator, callbacks):
for i in iterator:
for f in callbacks: f(i)
counter = Counter()
summer = Summer()
callbacks = [counter, summer]
iterator = xrange(10) # testdata
process(iterator, callbacks)
# process results from callbacks
n = counter.n
sum = summer.sum
This is easily extendible and iterates the data only once.

You can use itertools.tee and generator magic (I say magic because it's not exactly nice and readable):
import itertools
def mean(iterator):
n = 0
sum = 0.
for i in iterator:
sum += i
n += 1
yield
yield sum / float(n)
def multi_iterate(funcs, iter_data):
iterators = itertools.tee(iter_data, len(funcs))
result_iterators = [func(values) for func, values in zip(funcs, iterators)]
for results in itertools.izip(*result_iterators):
pass
return results
mean_result, var_result = multi_iterate([mean, var], iter([10, 20, 30]))
print(mean_result) # 20.0
By the way, you can write mean in a simpler way:
def mean(iterator):
total = 0.
for n, item in enumerate(iterator, 1):
total += i
yield
yield total / n
You shouldn't name variables sum because that shadows the built-in function with the same name.

Without classes, you could adapt the following:
def my_mean():
total = 0.
length = 0
while True:
val = (yield)
if val is not None:
total += val
length += 1
else:
yield total / length
def my_len():
length = 0
while True:
val = (yield)
if val is not None:
length += 1
else:
yield length
def my_sum():
total = 0.
while True:
val = (yield)
if val is not None:
total += val
else:
yield total
def process(iterable, **funcs):
fns = {name:func() for name, func in funcs.iteritems()}
for fn in fns.itervalues():
fn.send(None)
for item in iterable:
for fn in fns.itervalues():
fn.send(item)
return {name:next(func) for name, func in fns.iteritems()}
data = [1, 2, 3]
print process(data, items=my_len, some_other_value=my_mean, Total=my_sum)
# {'items': 3, 'some_other_value': 2.0, 'Total': 6.0}

What you want is to have a main Calc class that iterates over the data applying different calculation for mean, var, etc and then can return those values through an interface. You could make it more generic by letting calculations register themselves with this class before the main calculation and then have their results available through new accessors in the interface.

Increasing speed of python code

I have some python code that has many classes. I used cProfile to find that the total time to run the program is 68 seconds. I found that the following function in a class called Buyers takes about 60 seconds of those 68 seconds. I have to run the program about 100 times, so any increase in speed will help. Can you suggest ways to increase the speed by modifying the code? If you need more information that will help, please let me know.
def qtyDemanded(self, timePd, priceVector):
'''Returns quantity demanded in period timePd. In addition,
also updates the list of customers and non-customers.
Inputs: timePd and priceVector
Output: count of people for whom priceVector[-1] < utility
'''
## Initialize count of customers to zero
## Set self.customers and self.nonCustomers to empty lists
price = priceVector[-1]
count = 0
self.customers = []
self.nonCustomers = []
for person in self.people:
if person.utility >= price:
person.customer = 1
self.customers.append(person)
else:
person.customer = 0
self.nonCustomers.append(person)
return len(self.customers)
self.people is a list of person objects. Each person has customer and utility as its attributes.
EDIT - responsed added
-------------------------------------
Thanks so much for the suggestions. Here is the
response to some questions and suggestions people have kindly
made. I have not tried them all, but will try others and write back later.
(1) #amber - the function is accessed 80,000 times.
(2) #gnibbler and others - self.people is a list of Person objects in memory. Not connected to a database.
(3) #Hugh Bothwell
cumtime taken by the original function - 60.8 s (accessed 80000 times)
cumtime taken by the new function with local function aliases as suggested - 56.4 s (accessed 80000 times)
(4) #rotoglup and #Martin Thomas
I have not tried your solutions yet. I need to check the rest of the code to see the places where I use self.customers before I can make the change of not appending the customers to self.customers list. But I will try this and write back.
(5) #TryPyPy - thanks for your kind offer to check the code.
Let me first read a little on the suggestions you have made to see if those will be feasible to use.
EDIT 2
Some suggested that since I am flagging the customers and noncustomers in the self.people, I should try without creating separate lists of self.customers and self.noncustomers using append. Instead, I should loop over the self.people to find the number of customers. I tried the following code and timed both functions below f_w_append and f_wo_append. I did find that the latter takes less time, but it is still 96% of the time taken by the former. That is, it is a very small increase in the speed.
#TryPyPy - The following piece of code is complete enough to check the bottleneck function, in case your offer is still there to check it with other compilers.
Thanks again to everyone who replied.
import numpy
class person(object):
def __init__(self, util):
self.utility = util
self.customer = 0
class population(object):
def __init__(self, numpeople):
self.people = []
self.cus = []
self.noncus = []
numpy.random.seed(1)
utils = numpy.random.uniform(0, 300, numpeople)
for u in utils:
per = person(u)
self.people.append(per)
popn = population(300)
def f_w_append():
'''Function with append'''
P = 75
cus = []
noncus = []
for per in popn.people:
if per.utility >= P:
per.customer = 1
cus.append(per)
else:
per.customer = 0
noncus.append(per)
return len(cus)
def f_wo_append():
'''Function without append'''
P = 75
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
EDIT 3: It seems numpy is the problem
This is in response to what John Machin said below. Below you see two ways of defining Population class. I ran the program below twice, once with each way of creating Population class. One uses numpy and one does not use numpy. The one without numpy takes similar time as John found in his runs. One with numpy takes much longer. What is not clear to me is that the popn instance is created before time recording begins (at least that is what it appears from the code). Then, why is numpy version taking longer. And, I thought numpy was supposed to be more efficient. Anyhow, the problem seems to be with numpy and not so much with the append, even though it does slow down things a little. Can someone please confirm with the code below? Thanks.
import random # instead of numpy
import numpy
import time
timer_func = time.time # using Mac OS X 10.5.8
class Person(object):
def __init__(self, util):
self.utility = util
self.customer = 0
class Population(object):
def __init__(self, numpeople):
random.seed(1)
self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)]
self.cus = []
self.noncus = []
# Numpy based
# class Population(object):
# def __init__(self, numpeople):
# numpy.random.seed(1)
# utils = numpy.random.uniform(0, 300, numpeople)
# self.people = [Person(u) for u in utils]
# self.cus = []
# self.noncus = []
def f_wo_append(popn):
'''Function without append'''
P = 75
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
t0 = timer_func()
for i in xrange(20000):
x = f_wo_append(popn)
t1 = timer_func()
print t1-t0
Edit 4: See the answers by John Machin and TryPyPy
Since there have been so many edits and updates here, those who find themselves here for the first time may be a little confused. See the answers by John Machin and TryPyPy. Both of these can help in improving the speed of the code substantially. I am grateful to them and others who alerted me to slowness of append. Since, in this instance I am going to use John Machin's solution and not use numpy for generating utilities, I am accepting his response as an answer. However, I really appreciate the directions pointed out by TryPyPy also.

There are many things you can try after optimizing your Python code for speed. If this program doesn't need C extensions, you can run it under PyPy to benefit from its JIT compiler. You can try making a C extension for possibly huge speedups. Shed Skin will even allow you to convert your Python program to a standalone C++ binary.
I'm willing to time your program under these different optimization scenarios if you can provide enough code for benchmarking,
Edit: First of all, I have to agree with everyone else: are you sure you're measuring the time correctly? The example code runs 100 times in under 0.1 seconds here, so there is a good chance the either the time is wrong or you have a bottleneck (IO?) that isn't present in the code sample.
That said, I made it 300000 people so times were consistent. Here's the adapted code, shared by CPython (2.5), PyPy and Shed Skin:
from time import time
import random
import sys
class person(object):
def __init__(self, util):
self.utility = util
self.customer = 0
class population(object):
def __init__(self, numpeople, util):
self.people = []
self.cus = []
self.noncus = []
for u in util:
per = person(u)
self.people.append(per)
def f_w_append(popn):
'''Function with append'''
P = 75
cus = []
noncus = []
# Help CPython a bit
# cus_append, noncus_append = cus.append, noncus.append
for per in popn.people:
if per.utility >= P:
per.customer = 1
cus.append(per)
else:
per.customer = 0
noncus.append(per)
return len(cus)
def f_wo_append(popn):
'''Function without append'''
P = 75
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
def main():
try:
numpeople = int(sys.argv[1])
except:
numpeople = 300000
print "Running for %s people, 100 times." % numpeople
begin = time()
random.seed(1)
# Help CPython a bit
uniform = random.uniform
util = [uniform(0.0, 300.0) for _ in xrange(numpeople)]
# util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)]
popn1 = population(numpeople, util)
start = time()
for _ in xrange(100):
r = f_wo_append(popn1)
print r
print "Without append: %s" % (time() - start)
popn2 = population(numpeople, util)
start = time()
for _ in xrange(100):
r = f_w_append(popn2)
print r
print "With append: %s" % (time() - start)
print "\n\nTotal time: %s" % (time() - begin)
if __name__ == "__main__":
main()
Running with PyPy is as simple as running with CPython, you just type 'pypy' instead of 'python'. For Shed Skin, you must convert to C++, compile and run:
shedskin -e makefaster.py && make
# Check that you're using the makefaster.so file and run test
python -c "import makefaster; print makefaster.__file__; makefaster.main()"
And here is the Cython-ized code:
from time import time
import random
import sys
cdef class person:
cdef readonly int utility
cdef public int customer
def __init__(self, util):
self.utility = util
self.customer = 0
class population(object):
def __init__(self, numpeople, util):
self.people = []
self.cus = []
self.noncus = []
for u in util:
per = person(u)
self.people.append(per)
cdef int f_w_append(popn):
'''Function with append'''
cdef int P = 75
cdef person per
cus = []
noncus = []
# Help CPython a bit
# cus_append, noncus_append = cus.append, noncus.append
for per in popn.people:
if per.utility >= P:
per.customer = 1
cus.append(per)
else:
per.customer = 0
noncus.append(per)
cdef int lcus = len(cus)
return lcus
cdef int f_wo_append(popn):
'''Function without append'''
cdef int P = 75
cdef person per
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
cdef int numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
def main():
cdef int i, r, numpeople
cdef double _0, _300
_0 = 0.0
_300 = 300.0
try:
numpeople = int(sys.argv[1])
except:
numpeople = 300000
print "Running for %s people, 100 times." % numpeople
begin = time()
random.seed(1)
# Help CPython a bit
uniform = random.uniform
util = [uniform(_0, _300) for i in xrange(numpeople)]
# util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)]
popn1 = population(numpeople, util)
start = time()
for i in xrange(100):
r = f_wo_append(popn1)
print r
print "Without append: %s" % (time() - start)
popn2 = population(numpeople, util)
start = time()
for i in xrange(100):
r = f_w_append(popn2)
print r
print "With append: %s" % (time() - start)
print "\n\nTotal time: %s" % (time() - begin)
if __name__ == "__main__":
main()
For building it, it's nice to have a setup.py like this one:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("cymakefaster", ["makefaster.pyx"])]
setup(
name = 'Python code to speed up',
cmdclass = {'build_ext': build_ext},
ext_modules = ext_modules
)
You build it with:
python setupfaster.py build_ext --inplace
Then test:
python -c "import cymakefaster; print cymakefaster.file; cymakefaster.main()"
Timings were run five times for each version, with Cython being the fastest and easiest of the code generators to use (Shed Skin aims to be simpler, but cryptic error messages and implicit static typing made it harder here). As for best value, PyPy gives impressive speedup in the counter version with no code changes.
#Results (time in seconds for 30000 people, 100 calls for each function):
Mean Min Times
CPython 2.5.2
Without append: 35.037 34.518 35.124, 36.363, 34.518, 34.620, 34.559
With append: 29.251 29.126 29.339, 29.257, 29.259, 29.126, 29.272
Total time: 69.288 68.739 69.519, 70.614, 68.746, 68.739, 68.823
PyPy 1.4.1
Without append: 2.672 2.655 2.655, 2.670, 2.676, 2.690, 2.668
With append: 13.030 12.672 12.680, 12.725, 14.319, 12.755, 12.672
Total time: 16.551 16.194 16.196, 16.229, 17.840, 16.295, 16.194
Shed Skin 0.7 (gcc -O2)
Without append: 1.601 1.599 1.599, 1.605, 1.600, 1.602, 1.599
With append: 3.811 3.786 3.839, 3.795, 3.798, 3.786, 3.839
Total time: 5.704 5.677 5.715, 5.705, 5.699, 5.677, 5.726
Cython 0.14 (gcc -O2)
Without append: 1.692 1.673 1.673, 1.710, 1.678, 1.688, 1.711
With append: 3.087 3.067 3.079, 3.080, 3.119, 3.090, 3.067
Total time: 5.565 5.561 5.562, 5.561, 5.567, 5.562, 5.572
Edit: Aaaand more meaningful timings, for 80000 calls with 300 people each:
Results (time in seconds for 300 people, 80000 calls for each function):
Mean Min Times
CPython 2.5.2
Without append: 27.790 25.827 25.827, 27.315, 27.985, 28.211, 29.612
With append: 26.449 24.721 24.721, 27.017, 27.653, 25.576, 27.277
Total time: 54.243 50.550 50.550, 54.334, 55.652, 53.789, 56.892
Cython 0.14 (gcc -O2)
Without append: 1.819 1.760 1.760, 1.794, 1.843, 1.827, 1.871
With append: 2.089 2.063 2.100, 2.063, 2.098, 2.104, 2.078
Total time: 3.910 3.859 3.865, 3.859, 3.944, 3.934, 3.951
PyPy 1.4.1
Without append: 0.889 0.887 0.894, 0.888, 0.890, 0.888, 0.887
With append: 1.671 1.665 1.665, 1.666, 1.671, 1.673, 1.681
Total time: 2.561 2.555 2.560, 2.555, 2.561, 2.561, 2.569
Shed Skin 0.7 (g++ -O2)
Without append: 0.310 0.301 0.301, 0.308, 0.317, 0.320, 0.303
With append: 1.712 1.690 1.733, 1.700, 1.735, 1.690, 1.702
Total time: 2.027 2.008 2.035, 2.008, 2.052, 2.011, 2.029
Shed Skin becomes fastest, PyPy surpasses Cython. All three speed things up a lot compared to CPython.

Please consider trimming down your f_wo_append function:
def f_wo_append():
'''Function without append'''
P = 75
numcustomers = 0
for person in popn.people:
person.customer = iscust = person.utility >= P
numcustomers += iscust
return numcustomers
Edit in response to OP's comment """This made it a lot worse! The trimmed version takes 4 times more time than the version I have posted above. """
There is no way that that could take "4 times more" (5 times?) ... here is my code, which demonstrates a significant reduction in the "without append" case, as I suggested, and also introduces a significant improvement in the "with append" case.
import random # instead of numpy
import time
timer_func = time.clock # better on Windows, use time.time on *x platform
class Person(object):
def __init__(self, util):
self.utility = util
self.customer = 0
class Population(object):
def __init__(self, numpeople):
random.seed(1)
self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)]
self.cus = []
self.noncus = []
def f_w_append(popn):
'''Function with append'''
P = 75
cus = []
noncus = []
for per in popn.people:
if per.utility >= P:
per.customer = 1
cus.append(per)
else:
per.customer = 0
noncus.append(per)
popn.cus = cus # omitted from OP's code
popn.noncus = noncus # omitted from OP's code
return len(cus)
def f_w_append2(popn):
'''Function with append'''
P = 75
popn.cus = []
popn.noncus = []
cusapp = popn.cus.append
noncusapp = popn.noncus.append
for per in popn.people:
if per.utility >= P:
per.customer = 1
cusapp(per)
else:
per.customer = 0
noncusapp(per)
return len(popn.cus)
def f_wo_append(popn):
'''Function without append'''
P = 75
for per in popn.people:
if per.utility >= P:
per.customer = 1
else:
per.customer = 0
numcustomers = 0
for per in popn.people:
if per.customer == 1:
numcustomers += 1
return numcustomers
def f_wo_append2(popn):
'''Function without append'''
P = 75
numcustomers = 0
for person in popn.people:
person.customer = iscust = person.utility >= P
numcustomers += iscust
return numcustomers
if __name__ == "__main__":
import sys
popsize, which, niter = map(int, sys.argv[1:4])
pop = Population(popsize)
func = (f_w_append, f_w_append2, f_wo_append, f_wo_append2)[which]
t0 = timer_func()
for _unused in xrange(niter):
nc = func(pop)
t1 = timer_func()
print "popsize=%d func=%s niter=%d nc=%d seconds=%.2f" % (
popsize, func.__name__, niter, nc, t1 - t0)
and here are the results of running it (Python 2.7.1, Windows 7 Pro, "Intel Core i3 CPU 540 # 3.07 GHz"):
C:\junk>\python27\python ncust.py 300 0 80000
popsize=300 func=f_w_append niter=80000 nc=218 seconds=5.48
C:\junk>\python27\python ncust.py 300 1 80000
popsize=300 func=f_w_append2 niter=80000 nc=218 seconds=4.62
C:\junk>\python27\python ncust.py 300 2 80000
popsize=300 func=f_wo_append niter=80000 nc=218 seconds=5.55
C:\junk>\python27\python ncust.py 300 3 80000
popsize=300 func=f_wo_append2 niter=80000 nc=218 seconds=4.29
Edit 3 Why numpy takes longer:
>>> import numpy
>>> utils = numpy.random.uniform(0, 300, 10)
>>> print repr(utils[0])
42.777972538362874
>>> type(utils[0])
<type 'numpy.float64'>
and here's why my f_wo_append2 function took 4 times longer:
>>> x = utils[0]
>>> type(x)
<type 'numpy.float64'>
>>> type(x >= 75)
<type 'numpy.bool_'> # iscust refers to a numpy.bool_
>>> type(0 + (x >= 75))
<type 'numpy.int32'> # numcustomers ends up referring to a numpy.int32
>>>
The empirical evidence is that these custom types aren't so fast when used as scalars ... perhaps because they need to reset the floating-point hardware each time they are used. OK for big arrays, not for scalars.
Are you using any other numpy functionality? If not, just use the random module. If you have other uses for numpy, you may wish to coerce the numpy.float64 to float during the population setup.

You can eliminate some lookups by using local function aliases:
def qtyDemanded(self, timePd, priceVector):
'''Returns quantity demanded in period timePd. In addition,
also updates the list of customers and non-customers.
Inputs: timePd and priceVector
Output: count of people for whom priceVector[-1] < utility
'''
price = priceVector[-1]
self.customers = []
self.nonCustomers = []
# local function aliases
addCust = self.customers.append
addNonCust = self.nonCustomers.append
for person in self.people:
if person.utility >= price:
person.customer = 1
addCust(person)
else:
person.customer = 0
addNonCust(person)
return len(self.customers)

This comment rings alarm bells:
'''Returns quantity demanded in period timePd. In addition,
also updates the list of customers and non-customers.
Aside from the fact that timePd is not used in the function, if you really want just to return the quantity, do just that in the function. Do the "in addition" stuff in a separate function.
Then profile again and see which of these two functions you are spending most of your time in.
I like to apply SRP to methods as well as classes: it makes them easier to test.

Depending on how often you add new elements to self.people or change person.utility, you could consider sorting self.people by the utility field.
Then you could use a bisect function to find the lower index i_pivot where the person[i_pivot].utility >= price condition is met. This would have a lower complexity ( O(log N) ) than your exhaustive loop ( O(N) )
With this information, you could then update your people list if needed :
Do you really need to update the utility field each time ? In the sorted case, you could easily deduce this value while iterating : for example, considering your list sorted in incresing order, utility = (index >= i_pivot)
Same question with customers and nonCustomers lists. Why do you need them? They could be replaced by slices of the original sorted list : for example, customers = self.people[0:i_pivot]
All this would allow you to reduce the complexity of your algorithm, and use more built-in (fast) Python functions, this could speedup your implementation.

Some curious things I noted:
timePd is passed as a parameter but never used
price is an array but you only ever use the last entry - why not pass the value there instead of passing the list?
count is initialized and never used
self.people contains multiple person objects which are then copied to either self.customers or self.noncustomers as well as having their customer flag set. Why not skip the copy operation and, on return, just iterate over the list, looking at the customer flag? This would save the expensive append.
Alternatively, try using psyco which can speed up pure Python, sometimes considerably.

It's surprising that the function shown is such a bottleneck because it's so relatively simple. For that reason, I'd double check my profiling procedure and results. However, if they're correct, the most time consuming part of your function has to be the for loop it contains, of course, so it makes sense to focus on speeding that up. One way to do this is by replacing the if/else with straight-line code. You can also reduce the attribute lookup for the append list method slightly. Here's how both of those things could be accomplished:
def qtyDemanded(self, timePd, priceVector):
'''Returns quantity demanded in period timePd. In addition,
also updates the list of customers and non-customers.
Inputs: timePd and priceVector
Output: count of people for whom priceVector[-1] < utility
'''
price = priceVector[-1] # last price
kinds = [[], []] # initialize sublists of noncustomers and customers
kindsAppend = [kinds[b].append for b in (False, True)] # append methods
for person in self.people:
person.customer = person.utility >= price # customer test
kindsAppend[person.customer](person) # add to proper list
self.nonCustomers = kinds[False]
self.customers = kinds[True]
return len(self.customers)
That said, I must add that it seems a little redundant to have both a customer flag in each person object and also put each of them into a separate list depending on that attribute. Not creating these two lists would of course speed the loop up further.

You're asking for guesses, and mostly you're getting guesses.
There's no need to guess. Here's an example.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Simple parallel process in Python - python

Related

Adjusting nested apply() functions for large datasets

Optimizing a parallel implementation of a list comprehension

python genetic algorithm found optimum solution

single iteration sharing the iterator

Increasing speed of python code

Categories

Resources