I have some code that calculates the value of a large number of discrete actions and outputs the best action and its value.
A_max = 0
for i in ...:
    A = f(i)
    if A > A_max:
        x = i
        A_max = A
I'd like to parallelize this code in order to save time. Now, my understanding is that since calculating f(i) doesn't depend on calculating f(j) first, I can just use joblib.Parallel for that part of the code and get something like:
results = Parallel(n_jobs=-1)(delayed(f)(i) for i in...)
A_max = max(results)
x = list.index(A_max)
Is this correct?
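For reference, a minimal sketch of that pattern, assuming the candidate actions live in a list called actions (not shown in the question) and keeping in mind that Parallel returns a plain Python list:

from joblib import Parallel, delayed

# Evaluate every candidate action in parallel; `actions` is a hypothetical list of inputs.
results = Parallel(n_jobs=-1)(delayed(f)(i) for i in actions)

A_max = max(results)
x = actions[results.index(A_max)]  # index into the original inputs, not into the built-in `list`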
My next issue is that my code contains a dictionary that the function f alters as it does its calculation. My understanding is that if the code is parallelized, each concurrent process will be altering the same dictionary. Is this correct, and if so, would creating a copy of the dictionary at the beginning of f solve the issue?
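A minimal sketch of the copy idea, assuming f receives the dictionary as an argument and that some_computation and evaluate stand in for whatever f actually does (neither appears in the question):

import copy

def f(i, shared_dict):
    # Work on a private copy so no two workers ever mutate the same object.
    local = copy.deepcopy(shared_dict)
    local[i] = some_computation(i)   # hypothetical per-action update
    return evaluate(local)           # hypothetical scoring step

Note that with the process-based backends the arguments are pickled and sent to each worker process, so mutations made inside f would not propagate back to the parent in any case; the copy mainly matters when using the threading backend.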
Finally, in the documentation I'm seeing references to backends called "loky" and "threading"; what is the difference between these backends?
One of the advantages of a generator is that it uses less memory and consumes fewer resources. That is, we do not produce all the data at once and we do not allocate memory for all of it; only one value is generated at a time. The state and values of the local variables are stored, so the code can effectively be suspended and resumed by calling it again to continue.
I wrote two pieces of code and I am comparing them; I see that the generator version can also be written normally, and now I do not see any point to the generator. Can anyone tell me what the advantage of this generator is compared to writing it normally? One value is produced per iteration in both of them.
The first code:
def gen(n):
    for i in range(n):
        i = i ** 2
        i += 1
        yield i

g = gen(3)
for i in g:
    print(i)
The second one:
def func(i):
    i = i ** 2
    i += 1
    return i

for i in range(3):
    print(func(i))
I know that the id of g is constant whereas the id of func(i) is changing.
Is that what the main advantage of a generator is supposed to mean?
To be specific about the code you have posted in the question: there is no difference in terms of memory between the two approaches you have shown, but the first one is preferable because everything you need is inside the same generator function. In the second case, the loop and the function live in two different places, and every time you want to use the second function you need to repeat the loop outside it, which adds unnecessary redundancy.
Actually, the two functions you have written, the generator and the normal function, are not equivalent.
In the generator, you produce all the values, i.e. the loop is inside the generator function:
def gen(n):
    for i in range(n):
        i = i ** 2
        i += 1
        yield i
But, in the second case, you are just returning one value, and the loop is outside the function:
def func(i):
    i = i ** 2
    i += 1
    return i
In order to make the second function equivalent to the first one, you need to have the loop inside the function:
def func(n):
    for i in range(n):
        i = i ** 2
        i += 1
        return i
Now, of course, the above function always returns a single value (the one computed for i = 0, assuming control enters the loop at all), so to fix this you need to return an entire sequence, which requires a list or a similar data structure that can store multiple values:
def func(n):
    result = []
    for i in range(n):
        i = i ** 2
        i += 1
        result.append(i)
    return result
for v in func(3):
    print(v)
1
2
5
Now you can clearly differentiate the two cases: in the first one, each value is produced and processed (printed) one at a time, whereas in the second case you end up holding the entire result in memory before you can actually process it.
The main advantage shows up when you have a large dataset. It is basically the idea of lazy loading, which means that a value is not produced until it is required. This saves resources, because with a list the entire thing is built and held at once, which can take up a lot of primary memory if the data is large enough.
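As a rough illustration of that point (the exact byte counts depend on the Python build), compare holding a million squares in a list with producing them on demand:

import sys

squares_list = [i * i for i in range(1_000_000)]   # all values materialised at once
squares_gen = (i * i for i in range(1_000_000))    # values produced one at a time

print(sys.getsizeof(squares_list))  # several megabytes for the list object alone
print(sys.getsizeof(squares_gen))   # a couple of hundred bytes, regardless of the range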
The advantage of the first code is with respect to something you did not show. What is meant is that generating and consuming one value at a time takes less memory than first generating all the values, collecting them in a list, and then consuming them from the list.
The second code with which to compare the first code should have been:
def gen2(n):
    result = []
    for i in range(n):
        i = i ** 2
        i += 1
        result.append(i)
    return result

g = gen2(3)
for i in g:
    print(i)
Note how the result of gen2 can be used exactly like the result of gen from your first example, but gen2 uses more memory as n gets larger, whereas gen uses the same amount of memory no matter how large n is.
I'm writing code (in Python) to simultaneously diagonalize two commuting matrices A and B, and I am hoping to use a dictionary to do this. Here, the 'keys' are the eigenvalues, and the 'values' are the eigenvectors (which may or may not be degenerate). Here's the program I've written. However, it cannot give me all of the eigenvectors that share an eigenvalue. I guess I need to change something in my code to accommodate degeneracy, but how can I do this? Thanks!
def simultaneous_eig(A, B):
    epsilon = 10**-10
    vals, vecs = la.eig(A)
    degen = {}
    for n in range(0, len(vals)):
        for m in range(0, n):
            # equality up to a certain precision
            if np.abs(vals[m] - vals[n]) < epsilon:
                degen.get(vals[m], vecs[:,n])
        degen[vals[n]] = np.array([vecs[:,n]])
    return degen
I found a few issues in your function. First, to achieve the pairwise comparison you are trying to make, the outer loop should go from 0 to len(vals)-1, and the inner loop should run from n+1 up to len(vals), as seen in my corrected snippet below.
Also, the “.get()” method for dictionaries does not modify anything in place. In other words, you must assign its output to a variable within the “if” clause (again, see the corrected code below).
Finally, to avoid extra singleton dimensions, avoid putting “[]” around the variable you wish to convert to a numpy array.
The revised function below should solve your problems. I wasn’t able to test it completely, so let me know if there are still issues.
Happy coding!
def simultaneous_eig(A, B):
    epsilon = 10**-10
    vals, vecs = la.eig(A)
    degen = {}
    for n in range(0, len(vals) - 1):
        for m in range(n + 1, len(vals)):
            # equality up to a certain precision
            if np.abs(vals[m] - vals[n]) < epsilon:
                vecs[:,n] = degen.get(vals[m])
        degen[vals[n]] = np.array(vecs[:,n])
    return degen
Depending on the size of the values in your matrices, you may be using the wrong criterion for "almost equal". For example, when comparing scalars, 1E-20 and 1.1E-20 would be "almost equal" according to your criterion, but 1E20 and 1.1E20 would not be, even though the relative error is the same.
Why not just use math.isclose?
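For instance, a relative-tolerance test treats both pairs the same way. The tolerances below are only illustrative; note that la.eig can return complex eigenvalues, for which math.isclose would raise an error but cmath.isclose and numpy.isclose both work:

import cmath
import numpy as np

print(cmath.isclose(1e-20, 1.1e-20, rel_tol=1e-6))  # False: relative error is about 10%
print(cmath.isclose(1e20, 1.1e20, rel_tol=1e-6))    # False, for the same reason

# numpy.isclose combines a relative and an absolute tolerance and works elementwise:
print(np.isclose(1e20, 1.1e20, rtol=1e-6, atol=0.0))           # False
print(np.isclose(1e20, 1.0000000001e20, rtol=1e-6, atol=0.0))  # True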
I have a function written in Python which performs two steps:
Preprocessing: read in data from an array and compute some values that I will later need to prevent repeated computation
Iterate and compute a 'summary' of the data at every stage and use this to solve an optimisation problem.
The code is as follows:
import numpy as np

def iterative_hessian(data, targets,
                      sketch_method, sketch_size, num_iters):
    '''
    Original problem is min 0.5*||Ax-b||_2^2
    iterative_hessian asks us to minimise 0.5*||S_A x||_2^2 - <A^T b, x>
    for a summary of the data S_A
    '''
    A = data
    y = targets
    n, d = A.shape
    x0 = np.zeros(shape=(d,))
    m = int(sketch_size)  # sketching dimension
    ATy = A.T@y
    covariance_mat = A.T.dot(A)
    for n_iter in range(int(num_iters)):
        S_A = m**(-0.5)*np.random.normal(size=(m, n))
        B = S_A.T.dot(S_A)
        z = ATy - covariance_mat@x0 + np.dot(S_A.T, np.dot(S_A, x0))
        x_new = np.linalg.solve(B, z)
        x0 = x_new
    return np.ravel(x0)
In practice I do not use the S_A = m**(-0.5)*np.random.normal(size=(m, n)) line but a different random transform which is faster to apply; in principle, though, it is sufficient for the question. This code works well for what I need, but I was wondering if there is a reasonable way to do the following:
Instead of repeating the line S_A = m**(-0.5)*np.random.normal(size=(m, n)) for every iteration, is there a way to specify the number of independent random copies of S_A that are needed (num_iters, which can be thought of as between 10 and 30) prior to the iteration, and to scan through the input only once to generate all such copies? I think this would store the S_A variables in some kind of multi-dimensional array, but I'm not sure how best to do this, or whether it is even practical. I have tried a basic example doing this in parallel, but it is slower than repeatedly passing through the matrix.
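One way to sketch that, assuming plain Gaussian sketches as in the line above (the helper name and sizes are illustrative), is to draw all num_iters sketching matrices in a single call and index into the resulting 3-D array:

import numpy as np

def pregenerate_sketches(num_iters, m, n):
    # Draw all sketching matrices at once; result has shape (num_iters, m, n).
    return m**(-0.5) * np.random.normal(size=(num_iters, m, n))

sketches = pregenerate_sketches(10, 50, 200)  # illustrative sizes
S_A = sketches[0]  # inside the loop this would be sketches[n_iter]

This costs num_iters * m * n floats of memory up front, so it only pays off if that fits comfortably alongside the data.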
Suppose that I want to endow this function with more properties; for instance, I want to return the average time taken on the line x_new = np.linalg.solve(B,z). Doing this is straightforward: import a time module and put the timing code in the function. However, this will always time the function, and perhaps I only want to do this when testing. An easy way around this is to create a parameter in the function definition, time_updates=False, and then have if time_updates == False: proceed as above, else: copy the exact same code but with some timing functionality added. Is there a better way to do this, perhaps using classes in Python?
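One lightweight alternative to duplicating the body is an optional timing decorator; this is only a sketch (it prints per-call times rather than returning an average, which would be a straightforward extension):

import time
from functools import wraps
import numpy as np

def timed(enabled=True):
    # Decorator factory: optionally wrap a function and report its wall-clock time.
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if not enabled:
                return fn(*args, **kwargs)
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            print(f"{fn.__name__} took {time.perf_counter() - start:.6f}s")
            return result
        return wrapper
    return decorator

# During testing, wrap just the solver and call solve(B, z) in place of np.linalg.solve(B, z):
solve = timed(enabled=True)(np.linalg.solve)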
My intention is to use this iteration on blocks of data read in from a file which doesn't fit into memory. Whilst it might be possible to store a block in memory, it would be convenient if the function only passed over that block once rather than num_iters times. Passing over the computed quantities (S_A, the covariance matrix, etc.) is fine, however.
I've recently "taught" myself Python in order to analyze data for my experiments. As such, I'm pretty clueless about many aspects. I've managed to make my analysis work for certain files, but in some cases it breaks down, and I imagine it is a result of faulty programming.
Currently I export a file containing 3 numpy arrays. One of these arrays is my signal (float values from -10 to 10). What I wish to do is to normalize every datum in this array to the range of values that precede it (i.e. the 30001st value must have the average of the preceding 3000 values subtracted from it, and then the difference must be divided by that very same average of the preceding 3000 values). My data is collected at a rate of 100 Hz, thus to get a normalization over the last 30 s I must use the preceding 3000 values.
As it stands, this is how I've managed to make it work:
This stores the signal in the variable photosignal:
photosignal = np.array(seg.analogsignals[0], ndmin=1)
Now this is the part I use to get the delta F/F over a moving window of 30 s:
normalizedphotosignal = [(uu-(np.mean(photosignal[uu-3000:uu])))/abs(np.mean(photosignal[uu-3000:uu])) for uu in photosignal[3000:]]
The following adds 3000 values to the beginning to keep the array the same length, since later on I must time-lock it to another list of the same length:
holder = list(range(3000))
normalizedphotosignal = holder + normalizedphotosignal
What I have noticed is that in certain files this code gives me an error because it says that the "slice" is empty and therefore it cannot compute a mean.
I think maybe there is a better way to program this that could avoid this problem altogether. Or is this a correct way to approach this problem?
So I tried the solution, but it is quite slow and it nevertheless still gives me the "empty slice" error.
I went over the moving average post and found this method:
def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N
However, I'm having trouble adapting it to my desired output, namely (x - running average) / running average.
Alright, so I finally figured it out thanks to your help and the posts you referred me to.
The calculation for my entire dataset (300,000+ points) takes about a second!
I used the following code:
def runningmean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N

photosignal = np.array(seg.analogsignal[0], ndmin=1)
photosignalaverage = runningmean(photosignal, 3000)
holder = np.zeros(2999)
photosignalaverage = np.append(holder, photosignalaverage)
deltafsignal = (photosignal - photosignalaverage) / abs(photosignalaverage)
Photosignal stores my raw signal in a numpy array.
Photosignalaverage uses cumsum to calculate the running average of every datapoint in photosignal. I then add the first 2999 values as 0, to maintain the same array length as my photosignal.
I then use basic numpy calculations to get my delta F/F signal.
Thank you once more for the feedback, it was truly helpful!
Your approach goes in the right direction. However, you made a mistake in your list comprehension: you are using uu as your index, whereas uu is actually an element of your input data photosignal.
You want something like this:
normalizedphotosignal2 = np.zeros((photosignal.shape[0] - 3000))
for i, uu in enumerate(photosignal[3000:], start=3000):
    # i indexes into the full photosignal array, so photosignal[i-3000:i]
    # is the window of 3000 values preceding uu
    normalizedphotosignal2[i - 3000] = (uu - np.mean(photosignal[i-3000:i])) / abs(np.mean(photosignal[i-3000:i]))
Keep in mind that for-loops are relatively slow in Python. If performance is an issue here, you could try avoiding the for loop and using numpy methods instead (e.g. have a look at Moving average or running mean).
Hope this helps.
Recently I read a problem to practice DP. I wasn't able to come up with one, so I tried a recursive solution which I later modified to use memoization. The problem statement is as follows :-
Making Change. You are given n types of coin denominations of values
v(1) < v(2) < ... < v(n) (all integers). Assume v(1) = 1, so you can
always make change for any amount of money C. Give an algorithm which
makes change for an amount of money C with as few coins as possible.
[on problem set 4]
I got the question from here
My solution was as follows :-
def memoized_make_change(L, index, cost, d):
    if index == 0:
        return cost
    if (index, cost) in d:
        return d[(index, cost)]
    count = cost // L[index]
    val1 = memoized_make_change(L, index-1, cost % L[index], d) + count
    val2 = memoized_make_change(L, index-1, cost, d)
    x = min(val1, val2)
    d[(index, cost)] = x
    return x
This is how I've understood my solution to the problem. Assume that the denominations are stored in L in ascending order. As I iterate from the end to the beginning, I have a choice to either choose a denomination or not choose it. If I choose it, I then recurse to satisfy the remaining amount with lower denominations. If I do not choose it, I recurse to satisfy the current amount with lower denominations.
Either way, at a given function call, I find the best (lowest-count) way to satisfy a given amount.
Could I have some help in bridging the thought process from here onward to reach a DP solution? I'm not doing this as any HW, this is just for fun and practice. I don't really need any code either, just some help in explaining the thought process would be perfect.
[EDIT]
I recall reading that function calls are expensive, which is the reason why bottom-up (iteration-based) solutions might be preferred. Is that possible for this problem?
Here is a general approach for converting memoized recursive solutions to "traditional" bottom-up DP ones, in cases where this is possible.
First, let's express our general "memoized recursive solution". Here, x represents all the parameters that change on each recursive call. We want this to be a tuple of positive integers - in your case, (index, cost). I omit anything that's constant across the recursion (in your case, L), and I suppose that I have a global cache. (But FWIW, in Python you should just use the lru_cache decorator from the standard library functools module rather than managing the cache yourself.)
To solve for(x):
    If x in cache: return cache[x]
    Handle base cases, i.e. where one or more components of x is zero
    Otherwise:
        Make one or more recursive calls
        Combine those results into `result`
        cache[x] = result
        return result
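(As a side note on the lru_cache suggestion above: for a two-argument recursion like the one in the question, the hand-managed dictionary d disappears entirely, with the denominations held as a tuple in a closure so that every cached argument stays hashable. This is only a sketch of the caching mechanics, mirroring the recursion from the question.)

from functools import lru_cache

def make_change(denominations, amount):
    L = tuple(denominations)  # ascending, with L[0] == 1 as in the question

    @lru_cache(maxsize=None)
    def solve(index, cost):
        if index == 0:
            return cost
        take = solve(index - 1, cost % L[index]) + cost // L[index]
        skip = solve(index - 1, cost)
        return min(take, skip)

    return solve(len(L) - 1, amount)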
The basic idea in dynamic programming is simply to evaluate the base cases first and work upward:
To solve for(x):
    For y starting at (0, 0, ...) and increasing towards x:
        Do all the stuff from above
However, two neat things happen when we arrange the code this way:
As long as the order of y values is chosen properly (this is trivial when there's only one vector component, of course), we can arrange that the results for the recursive call are always in cache (i.e. we already calculated them earlier, because y had that value on a previous iteration of the loop). So instead of actually making the recursive call, we replace it directly with a cache lookup.
Since every component of y will use consecutively increasing values, and will be placed in the cache in order, we can use a multidimensional array (nested lists, or else a Numpy array) to store the values instead of a dictionary.
So we get something like:
To solve for(x):
cache = multidimensional array sized according to x
for i in range(first component of x):
for j in ...:
(as many loops as needed; better yet use `itertools.product`)
If this is a base case, write the appropriate value to cache
Otherwise, compute "recursive" index values to use, look up
the values, perform the computation and store the result
return the appropriate ("last") value from cache
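To make the template concrete for this particular problem, here is the standard bottom-up formulation of unbounded coin change (a single array indexed by amount rather than a literal transcription of the recursion in the question; every lookup of best[c - v] plays the role of a "recursive call whose result is already in the cache"):

def min_coins(denominations, amount):
    # best[c] = fewest coins summing to c; v(1) = 1 guarantees every amount is reachable.
    INF = float("inf")
    best = [0] + [INF] * amount
    for c in range(1, amount + 1):
        for v in denominations:
            if v <= c and best[c - v] + 1 < best[c]:
                best[c] = best[c - v] + 1
    return best[amount]

print(min_coins([1, 4, 6], 14))  # 3 coins: 6 + 4 + 4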
I suggest considering the relationship between the value you are constructing and the values you need for it.
In this case you are constructing a value for index, cost based on:
index-1 and cost
index-1 and cost%L[index]
What you are searching for is a way of iterating over the choices such that you will always have precalculated everything you need.
In this case you can simply change the code to the iterative approach:
for each choice of index 0 upwards:
    for each choice of cost:
        compute value corresponding to index, cost
In practice, I find that the iterative approach can be significantly faster (perhaps 4x) for simple problems, as it avoids the overhead of function calls and of checking the cache for pre-existing values.