Can I parallelize this nested for loop? (Python 3.5)

Can I parallelize this nested for loop? (Python 3.5) - python

I recently found out that the bottleneck of my code is the following block. N is of order 10,000, and L (10,000)^2. RQ_func is just a function that takes indices (tuples) and returns float V and dictionary sp_dist of {index : probability} format.
Is there a way I can parallelize this code? I have access to cluster computing from which I can use up to 20 cores at a time and would like to use the option.
R = np.empty((L,))
Q = scipy.sparse.lil_matrix((L, N))
traverser = 0 # Populate R and Q by traversing the array
for s_index in state_indices:
for a_index in action_indices:
V, sp_dist = RQ_func(s_index, a_index)
R[traverser] = V
for sp_index, prob in sp_dist.items():
Q[traverser, sp_index] = prob
traverser += 1

Related

Finding similar numbers in a list and getting the average

I currently have the numbers above in a list. How would you go about adding similar numbers (by nearest 850) and finding average to make the list smaller.
For example I have the list
l = [2000,2200,5000,2350]
In this list, i want to find numbers that are similar by n+500
So I want all the numbers similar by n+500 which are 2000,2200,2350 to be added and divided by the amount there which is 3 to find the mean. This will then replace the three numbers added. so the list will now be l = [2183,5000]
As the image above shows the numbers in the list. Here I would like the numbers close by n+850 to all be selected and the mean to be found

It seems that you look for a clustering algorithm - something like K-means.
This algorithm is implemented in scikit-learn package
After you find your K means, you can count how many of your data were clustered with that mean, and make your computations.
However, it's not clear in your case what is K. You can try and run the algorithm for several K values until you get your constraints (the n+500 distance between the means)

You can use:
import numpy as np
l = np.array([2000,2200,5000,2350])
# find similar numbers (that are within each 500 fold)
similar = l // 500
# for each similar group get the average and convert it to integer (as in the desired output)
new_list = [np.average(l[similar == num]).astype(int) for num in np.unique(similar)]
print(new_list)
Output:
[2183, 5000]

Step 1:
list = [5620.77978515625,
7388.43017578125,
7683.580078125,
8296.6513671875,
8320.82421875,
8557.51953125,
8743.5,
9163.220703125,
9804.7939453125,
9913.86328125,
9940.1396484375,
9951.74609375,
10074.23828125,
10947.0419921875,
11048.662109375,
11704.099609375,
11958.5,
11964.8232421875,
12335.70703125,
13103.0,
13129.529296875,
16463.177734375,
16930.900390625,
17712.400390625,
18353.400390625,
19390.96484375,
20089.0,
34592.15625,
36542.109375,
39478.953125,
40782.078125,
41295.26953125,
42541.6796875,
42893.58203125,
44578.27734375,
45077.578125,
48022.2890625,
52535.13671875,
58330.5703125,
61597.91796875,
62757.12890625,
64242.79296875,
64863.09765625,
66930.390625]
Step 2:
seen = [] #to log used indices pairs
diff_dic = {} #to record indices and diff
for i,a in enumerate(list):
for j,b in enumerate(list):
if i!=j and (i,j)[::-1] not in seen:
seen.append((i,j))
diff_dic[(i,j)] = abs(a-b)
keys = []
for ind, diff in diff_dic.items():
if diff <= 850:
keys.append(ind)
uniques_k = [] #to record unique indices
for pair in keys:
for key in pair:
if key not in uniques_k:
uniques_k.append(key)
import numpy as np
list_arr = np.array(list)
nearest_avg = np.mean(list_arr[uniques_k])
list_arr = np.delete(list_arr, uniques_k)
list_arr = np.append(list_arr, nearest_avg)
list_arr
output:
array([ 5620.77978516, 34592.15625, 36542.109375, 39478.953125, 48022.2890625, 52535.13671875, 58330.5703125 , 61597.91796875, 62757.12890625, 66930.390625 , 20566.00205365])

You just need a conditional list comprehension like this:
l = [2000,2200,5000,2350]
n = 2000
a = [ (x) for x in l if ((n -250) < x < (n + 250)) ]
Then you can average with
np.mean(a)
or whatever method you prefer.

optimization comparison between cvxpy and gurobi

There are number of jobs to be assigned to number of resources each with a score (performance indicator) and cost. The resource assignment problem (RAP) objective is to maximize assignment scores considering the budget. Constraints: Each resource can handle at most one job and each job if it is filled should be done by one resource. Also, there is a limited budget to spend.
I have tackled the problem in two ways: CVXPY using gurobi solver and gurobi packages. My challenge is I can't program it in a memory-efficient way with cvxpy. There are hundreds of constraint list comprehensions! How can I can improve efficiency of my code in cvxpy? For example, is there a better way to define dictionary variables in cvxpy similar to gurobi?
ms is dictionary of format {('firstName lastName', 'job'), score_value}
cst is dictionary of format {('firstName lastName', 'job'), cost_value}
job is set of jobs
res is set of resources {'firstName lastName'}
G (or g in gurobi implementation) is a dictionary with jobs as keys and values of 0 or 1 whether that job is filled due to budget limit (0 if filled and 1 if not)
thanks
github link including codes and memory profiling comparison
gurobi implementation:
m = gp.Model("RAP")
assign = m.addVars(ms.keys(), vtype=GRB.BINARY, name="assign")
g = m.addVars(job, name="gap")
m.addConstrs((assign.sum("*", j) + g[j] == 1 for j in job), name="demand")
m.addConstrs((assign.sum(r, "*") <= 1 for r in res), name="supply")
m.addConstr(assign.prod(cst) <= budget, name="Budget")
job_gap_penalty = 101 # penatly of not filling a job
m.setObjective(assign.prod(ms) -job_gap_penalty*g.sum(), GRB.MAXIMIZE)
m.optimize()
cvxpy implenentation:
X = {}
for a in ms.keys():
X[a] = cp.Variable(boolean=True, name="assign")
G = {}
for g in job:
G[g] = cp.Variable(boolean=True, name="gap")
constraints = []
for j in job:
X_r = 0
for r in res:
X_r += X[r, j]
constraints += [
X_r + G[j] == 1
]
for r in res:
X_j = 0
for j in job:
X_j += X[r, j]
constraints += [
X_j <= 1
]
constraints += [
np.array(list(cst.values())) # np.array(list(X.values())) <= budget,
]
obj = cp.Maximize(np.array(list(ms.values())) # np.array(list(X.values()))
- job_gap_penalty * cp.sum(list(G.values())))
prob = cp.Problem(obj, constraints)
prob.solve(solver=cp.GUROBI, verbose=False)
Here is the memory profiling comparison:
memeory profiling for cvxpy
memory profiling for gurobi

Previously, I tried to solve thru defining dictionary variables similar to gurobi but at is not available in cvxpy, the code was not efficient when scaling up. But now I solved it thru matrix variables and then converting to dictionary variables which super fast!
assign_scores = np.array(list(ms.values())).reshape(len(res), len(job))
assign_cost = np.array(list(cst.values())).reshape(len(res), len(job))
# make a bool matrix variable with the shape of number of resources and jobs
x = cp.Variable(shape=(len(res), len(job)), boolean=True, name="assign")
# make a bool vector variable with the shape of number of jobs
g = cp.Variable(shape=(len(job), ), boolean=True, name="gap")
constraints = []
# each job can be assigned to at most one resource or remains unfilled due to budget cap
constraints += [cp.sum(x[:, j]) + g[j] == 1 for j in range(len(job))]
# each resource can be assigned to at most one job
constraints += [cp.sum(x[r, :]) <= 1 for r in range(len(res))]
# budget cap
constraints += [cp.sum(cp.multiply(assign_cost, x)) <= budget]
# pentalty if a job is not filled
job_gap_penalty=101
# objective is to maiximize performance score
obj = cp.Maximize(cp.sum(cp.multiply(assign_scores, x) - job_gap_penalty * cp.sum(g)))
prob = cp.Problem(obj, constraints)
prob.solve(solver=cp.GUROBI, verbose=True)

Optimizing loop for millions of entry selections

I have a python anonymisation mechanism that rely on generating fake data from existing attributes.
Those attributes are accessible in the domain D which is an array of 16 sets, each set representing values possible for each attributes.
the attributes are ['uid', 'trans_id', 'trans_date', 'trans_type', 'operation', 'amount', 'balance', 'k_symbol', 'bank', 'acct_district_id', 'frequency', 'acct_date', 'disp_type', 'cli_district_id', 'gender', 'zip']
Some attributes have very few values (gener is M or F), some are unique (uid) and can have 1260000 different values.
The fake data is generated as tuples of randomly selected attributes inside the domain.
I have to generate nearly 2 million tuples.
The first implementation of this was:
def beta_step(I, V, beta, n, m, D):
r = approx_binomial(m - n, beta)
print("r = " + str(r))
i = 0
while i < r:
t = []
for attribute in D:
a_j = choice(list(attribute))
t.append(a_j)
if t not in V + I:
V.append(t)
i += 1
This took around 0,5s for each tuple.
Note that I and V are existing lists (with initialy respectively 1200000 and 800000 tuples)
I already found out that I could speed-up things by converting D to a 2D array once and for all, in order not to convert sets in list on each run
for attribute in D:
a_j = choice(attribute)
t.append(a_j)
This gets me down to 0.2s by tuple.
I also tried looping fewer times and generating multiple tuples at a time like so:
def beta_step(I, V, beta, n, m, D):
D = [list(attr) for attr in D ] #Convert D in 2D list
r = approx_binomial(m - n, beta)
print("r = " + str(r))
i = 0
NT = 1000 #Number of tuples generated at a time
while i < r:
T = [[] for j in range(NT)]
for attribute in D:
a_j = choices(attribute,k=min(NT,r-i))
for j in range(len(a_j)):
T[j].append(a_j[j])
for t in T:
if t not in V + I:
V.append(t)
i += 1
But this takes around 220s for 1000 tuples so it is not faster than before.
I have timed the different parts and it seems that it is the last for loop that takes most of the time (Around 217s).
Is there any way I could speed things up in order not to run it for 50 hours?
=======================
EDIT : I implemented #Larri suggestion like that :
def beta_step(I, V, beta, n, m, D):
D = [list(attr) for attr in D ] #Convert D in list of lists
I = set(tuple(t) for t in I)
V = set(tuple(t) for t in V)
r = approx_binomial(m - n, beta)
print("r = " + str(r))
i = 0
print('SIZE I', len(I))
print('SIZE V', len(V))
NT = 1000 #Number of tuples to generate at each pass
while i < r:
T = [[] for j in range(min(NT,r-i))]
for attribute in D:
a_j = choices(attribute,k=min(NT,r-i))
for j in range(len(a_j)):
T[j].append(a_j[j])
new_T = set(tuple(t) for t in T) - I
size_V_before = len(V)
V.update(new_T)
size_V_after = len(V)
delta_V = size_V_after-size_V_before
i += delta_V
return [list(t) for t in V]
it now takes about 0s to add elements to V
In total, adding the 1680000 tuples took 91s
However, converting back to a 2d array takes 200s, is there a way to make it faster that doesn't involve rewritting the whole program to work on sets ?

For the last for loop at least, consider converting to sets instead of using arrays. That allows you to use set.update() method without having to check if t is already included in V. This is assuming that you can incorporate the A in the logic somehow. From the given code I can't see any reference to A.
So you can change it to something like V.update(T). The i would then be the delta of len(V) before and after the operation.

Compute counts of set partitions with mulitplicity and without order

I do have a piece of code that compute partitions of a set of (potentialy duplicated) integers. But i am interested in the set of possible partition and there multiplicity.
You can for exemple launch the follwoing code :
import numpy as np
from collections import Counter
import pandas as pd
def _B(i):
# for a given multiindex i, we defined _B(i) as the set of integers containg i_j times the number j:
if len(i) != 1:
B = []
for j in range(len(i)):
B.extend(i[j]*[j])
else:
B = i*[0]
return B
def _partition(collection):
# from here: https://stackoverflow.com/a/62532969/8425270
if len(collection) == 1:
yield (collection,)
return
first = collection[0]
for smaller in _partition(collection[1:]):
# insert `first` in each of the subpartition's subsets
for n, subset in enumerate(smaller):
yield smaller[:n] + ((first,) + subset,) + smaller[n + 1 :]
# put `first` in its own subset
yield ((first,),) + smaller
def to_list(tpl):
# the final hierarchy is
return list(list(i) if isinstance(i, tuple) else i for i in tpl)
def _Pi(inst_B):
# inst_B must be a tuple
if type(inst_B) != tuple :
inst_B = tuple(inst_B)
pp = [tuple(sorted(p)) for p in _partition(inst_B)]
c = Counter(pp)
Pi = c.keys()
N = list()
for pi in Pi:
N.append(c[pi])
Pi = [to_list(pi) for pi in Pi]
return Pi, N
if __name__ == "__main__":
import cProfile
pr = cProfile.Profile()
pr.enable()
sh = (3, 3, 3)
rez = list()
rez_sorted= list()
rez_ref = list()
for idx in np.ndindex(sh):
if sum(idx) > 0:
print(idx)
Pi, N = _Pi(_B(idx))
print(pd.DataFrame({'Pi': Pi, 'N': N * np.array([np.math.factorial(len(pi) - 1) for pi in Pi])}))
pr.disable()
# after your program ends
pr.print_stats(sort="tottime")
This code computes, for several examples of tuples of integer numbers (generated by np.ndindex) the partitions and counts i need. Everything happens in the _partition and the _Pi functions, this is were you should look at.
If you look closely at how these two functions are working, you'll see that they comput eevery potential partition and THEN count up how many times they appeared. For small problems, this is fine, but if the size of the prolbme increase, this starts to take a looooot of time. Try setting sh = (5,5,5), you'll see what i mean;
So the problem is the following :
Is there a way to compute directly the partitions and there number of occurences instead ?
Edit: I cross-posted on mathoverflow there, and they propose a solution in this article, in corrolary 2.10 (page 10 of the pdf). The problem could be solved by implmenting the sets p(v,r) in this corrolary.
I was hoping, as in the univariate case, that those sets would have a nice recursive expression but i ould not find one yet.
More Edit : This problem is equivalent to finding all (multiset)-partitions of a multiset. If the solution for finding (set)-partitions of a set is given by Bell partial polynomials, here we need multivariate version of these polynomials.

Query long lists

I would like to query the value of an exponentially weighted moving average at particular points. An inefficient way to do this is as follows. l is the list of times of events and queries has the times at which I want the value of this average.
a=0.01
l = [3,7,10,20,200]
y = [0]*1000
for item in l:
y[int(item)]=1
s = [0]*1000
for i in xrange(1,1000):
s[i] = a*y[i-1]+(1-a)*s[i-1]
queries = [23,68,103]
for q in queries:
print s[q]
Outputs:
0.0355271185019
0.0226018371526
0.0158992102478
In practice l will be very large and the range of values in l will also be huge. How can you find the values at the times in queries more efficiently, and especially without computing the potentially huge lists y and s explicitly. I need it to be in pure python so I can use pypy.
Is it possible to solve the problem in time proportional to len(l)
and not max(l) (assuming len(queries) < len(l))?

Here is my code for doing this:
def ewma(l, queries, a=0.01):
def decay(t0, x, t1, a):
from math import pow
return pow((1-a), (t1-t0))*x
assert l == sorted(l)
assert queries == sorted(queries)
samples = []
try:
t0, x0 = (0.0, 0.0)
it = iter(queries)
q = it.next()-1.0
for t1 in l:
# new value is decayed previous value, plus a
x1 = decay(t0, x0, t1, a) + a
# take care of all queries between t0 and t1
while q < t1:
samples.append(decay(t0, x0, q, a))
q = it.next()-1.0
# take care of all queries equal to t1
while q == t1:
samples.append(x1)
q = it.next()-1.0
# update t0, x0
t0, x0 = t1, x1
# take care of any remaining queries
while True:
samples.append(decay(t0, x0, q, a))
q = it.next()-1.0
except StopIteration:
return samples
I've also uploaded a fuller version of this code with unit tests and some comments to pastebin: http://pastebin.com/shhaz710
EDIT: Note that this does the same thing as what Chris Pak suggests in his answer, which he must have posted as I was typing this. I haven't gone through the details of his code, but I think mine is a bit more general. This code supports non-integer values in l and queries. It also works for any kind of iterables, not just lists since I don't do any indexing.

I think you could do it in ln(l) time, if l is sorted. The basic idea is that the non recursive form of EMA is a*s_i + (1-a)^1 * s_(i-1) + (1-a)^2 * s_(i-2) ....
This means for query k, you find the greatest number in l less than k, and for a estimation limit, use the following, where v is the index in l, l[v] is the value
(1-a)^(k-v) *l[v] + ....
Then, you spend lg(len(l)) time in search + a constant multiple for the depth of your estimation. I'll provide a code sample in a little bit (after work) if you want it, just wanted to get my idea out there while I was thinking about it
here's the code -
v is the dictionary of values at a given time; replace with 1 if it's just a 1 every time...
import math
from bisect import bisect_right
a = .01
limit = 1000
l = [1,5,14,29...]
def find_nearest_lt(l, time):
i = bisect_right(a, x)
if i:
return i-1
raise ValueError
def find_ema(l, time):
i = find_nearest_lt(l, time)
if l[i] == time:
result = a * v[l[i]
i -= 1
else:
result = 0
while (time-l[i]) < limit:
result += math.pow(1-a, time-l[i]) * v[l[i]]
i -= 1
return result
if I'm thinking correctly, the find nearest is l(n), then the while loop is <= 1000 iterations, guaranteed, so it's technically a constant (though a kind of large one). find_nearest was stolen from the page on bisect - http://docs.python.org/2/library/bisect.html

It appears that y is a binary value -- either 0 or 1 -- depending on the values of l. Why not use y = set(int(item) for item in l)? That's the most efficient way to store and look up a list of numbers.
Your code will cause an error the first time through this loop:
s = [0]*1000
for i in xrange(1000):
s[i] = a*y[i-1]+(1-a)*s[i-1]
because i-1 is -1 when i=0 (first pass of loop) and both y[-1] and s[-1] are the last element of the list, not the previous. Maybe you want xrange(1,1000)?
How about this code:
a=0.01
l = [3.0,7.0,10.0,20.0,200.0]
y = set(int(item) for item in l)
queries = [23,68,103]
ewma = []
x = 1 if (0 in y) else 0
for i in xrange(1, queries[-1]):
x = (1-a)*x
if i in y:
x += a
if i == queries[0]:
ewma.append(x)
queries.pop(0)
When it's done, ewma should have the moving averages for each query point.
Edited to include SchighSchagh's improvements.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Can I parallelize this nested for loop? (Python 3.5) - python

Related

Finding similar numbers in a list and getting the average

optimization comparison between cvxpy and gurobi

Optimizing loop for millions of entry selections

Compute counts of set partitions with mulitplicity and without order

Query long lists

Categories

Resources