Passing float to a nested for loop and storing output - python

I have a strong background in Matlab, and I am trying to switch to python. I am trying to write a nested for loop over a numpy array and store the output values.
My code reads like:
import numpy as np
import math

# T parameter
kk = np.arange(0, 20, 0.1)
print(len(kk))
# V parameter
pp = np.arange(1, 5, 1)
print(len(pp))

a = len(kk)
b = len(pp)
P = np.zeros((a,b))

for T in kk:
    print(T)
    for V in pp:
        print(V)
        P = math.exp(-T*V/10)
        print(P)
Explanation/Question
kk and pp are the vectors. Inside the loops, the correct values of the T and V parameters are produced; however, the values of P are not being stored.
I tried the change P[T][V] = math.exp(-T*V/10), but I get the following error: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Any help will be appreciated. Thank you in advance.

In this code you define P as a 2d array. But in the loop you assign the scalar result of the math.exp expression to that variable. That replaces the original P array, and also replaces the value calculated in the previous iteration. This kind of loop doesn't work in MATLAB either, does it? Don't you have to assign the scalar value to some 'slot' in P?
P = np.zeros((a,b))
for T in kk:
    print(T)
    for V in pp:
        print(V)
        P = math.exp(-T*V/10)
A better way:
In [301]: kk = np.arange(0,20,0.1)
In [302]: kk.shape
Out[302]: (200,)
In [303]: pp = np.arange(1, 5,1)
In [304]: pp.shape
Out[304]: (4,)
In numpy we prefer to use fast whole-array methods. Here I use broadcasting to perform an outer-like calculation of kk with pp.
In [305]: P = np.exp(-kk[:,None]*pp/10)
In [306]: P.shape
Out[306]: (200, 4)
(I believe MATLAB added broadcasting in recent years; numpy has had it from the beginning.)
Comparing this with the iterative version:
In [309]: P1 = np.zeros((200,4))
     ...: for i in range(0,len(kk)):
     ...:     for j in range(0,len(pp)):
     ...:         T = kk[i]
     ...:         V = pp[j]
     ...:         P1[i,j] = math.exp(-T*V/10)
     ...:
In [310]: P1.shape
Out[310]: (200, 4)
In [311]: np.allclose(P,P1)
Out[311]: True
A cleaner way of writing indexed iteration in Python is with enumerate:
In [312]: P1 = np.zeros((200,4))
     ...: for i,T in enumerate(kk):
     ...:     for j,V in enumerate(pp):
     ...:         P1[i,j] = math.exp(-T*V/10)
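As an aside, np.outer expresses the same pairing explicitly; a quick sketch equivalent to the broadcasting one-liner above:

import numpy as np

kk = np.arange(0, 20, 0.1)
pp = np.arange(1, 5, 1)

# np.outer(kk, pp) builds the full (200, 4) table of T*V products at once.
P = np.exp(-np.outer(kk, pp) / 10)
print(P.shape)  # (200, 4)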

Based on the line where you mentioned trying P[T][V] = math.exp(-T*V/10), you might also be interested in this option:
import numpy as np
import math

# T parameter
kk = np.arange(0, 20, 0.1)
print(len(kk))
# V parameter
pp = np.arange(1, 5, 1)
print(len(pp))

a = len(kk)
b = len(pp)
P = np.zeros((a,b))

for i in range(0,len(kk)):
    for j in range(0,len(pp)):
        T = kk[i]
        V = pp[j]
        P[i][j] = math.exp(-T*V/10)
        # you can also simply do this:
        # P[i][j] = math.exp(-kk[i]*pp[j]/10)
Although it's straightforward, it's not particularly clean. Since you mentioned that you're switching to Python, I'd take a look at hpaulj's answer for a more thorough explanation as well as a nice alternative to iterating through arrays.

You can make a dictionary if you want to see the keys and values, per your comment. This might actually make more sense. I would recommend against a plethora of dynamically created variables: with a dictionary you can inspect the entire dictionary or look up specific values, which you could still store as variables later anyway. Obviously it depends on the scope of your project and what solution makes sense, but you could also turn the dictionary into a pandas DataFrame with pd.DataFrame() for analysis, so it gives you flexibility. You said you are new to Python, so you might want to check out pandas if you haven't heard of it; it is one of the most popular libraries.
import numpy as np
import math

P_dict = {}
# T parameter
kk = np.arange(0, 20, 0.1)
# print(len(kk))
# V parameter
pp = np.arange(1, 5, 1)
# print(len(pp))

a = len(kk)
b = len(pp)
P = np.zeros((a,b))

for T in kk:
    # print(T)
    for V in pp:
        # print(V)
        P = math.exp(-T*V/10)
        key = f'{T},{V}'
        value = P
        P_dict[key] = value

print(P_dict)
This is how you would call a value in the dict based on the key.
P_dict['19.900000000000002,3']
You can also edit this line of code to whatever format you want: key = f'{T},{V}', and call the key according to that format, as I have done in my example.
Output:
0.002554241418992996
Either way, a list or a dict prints some interesting python abstract art!
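For instance, a minimal sketch of that pandas route (my own illustration; the column names are arbitrary), building one record per (T, V) pair instead of string keys:

import numpy as np
import pandas as pd
import math

kk = np.arange(0, 20, 0.1)
pp = np.arange(1, 5, 1)

# One record per (T, V) pair; pd.DataFrame turns the list of dicts
# into a table with columns T, V and P.
records = [{'T': T, 'V': V, 'P': math.exp(-T*V/10)} for T in kk for V in pp]
df = pd.DataFrame(records)
print(df.head())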

Related

How to find the index corresponding to an array in another array? (python)

There are 2 arrays A and B:
import numpy as np
A = np.array([1,2,3,4])
B = np.array([2,5,2,1,1,6])
If the element in A exists in B, output their index in B. The ideal output C is:
C = np.array([3,4,0,2])
While a bit ugly, this should work. You want to use np.where and np.concatenate. I'm using a placeholder list to store values and then recombining them; there may be a smoother method, but this should do the trick until further reading of the docs turns up a better solution.
import numpy as np

A = np.array([1,2,3,4])
B = np.array([2,5,2,1,1,6])

preC = []
for i in A:
    if len(np.where(B == i)[0]) > 0:
        preC.append(np.where(B == i)[0])
C = np.concatenate(preC)
print(C)
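For what it's worth, here is a more compact sketch of the same idea (my variant), which keeps the same A-ordered output; the emptiness check becomes unnecessary because concatenating empty index arrays is harmless:

import numpy as np

A = np.array([1, 2, 3, 4])
B = np.array([2, 5, 2, 1, 1, 6])

# np.flatnonzero(B == a) returns the indices in B where the value a occurs;
# values absent from B contribute an empty index array.
C = np.concatenate([np.flatnonzero(B == a) for a in A])
print(C)  # [3 4 0 2]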

Optimize code for step function using only NumPy

I'm trying to optimize the function 'pw' in the following code using only NumPy functions (or perhaps list comprehensions).
from time import time
import numpy as np

def pw(x, udata):
    """
    Creates the step function

                     | 1, if d0 <= x < d1
                     | 2, if d1 <= x < d2
        pw(x,data) = | ...
                     | N, if d(N-1) <= x < dN
                     | 0, otherwise

    where di is the ith element in data.

    INPUT:  x     -- interval which the step function is defined over
            udata -- an ordered set of data (without repetitions)
    OUTPUT: pw_func -- an array of size x.shape[0]
    """
    vals = np.arange(1, udata.shape[0]+1).reshape(udata.shape[0], 1)
    pw_func = np.sum(np.where(np.greater_equal(x, udata)*np.less(x, np.roll(udata, -1)), vals, 0), axis=0)
    return pw_func

N = 50000
x = np.linspace(0, 10, N)
data = [1, 3, 4, 5, 5, 7]
udata = np.unique(data)

ti = time()
pw(x, udata.reshape(udata.shape[0], 1))  # udata as a column so it broadcasts against x
tf = time()
print(tf - ti)

import cProfile
cProfile.run('pw(x, udata.reshape(udata.shape[0], 1))')
The cProfile.run is telling me that most of the overhead is coming from np.where (about 1 ms) but I'd like to create faster code if possible. It seems that performing the operations row-wise versus column-wise makes some difference, unless I'm mistaken, but I think I've accounted for it. I know that sometimes list comprehensions can be faster but I couldn't figure out a faster way than what I'm doing using it.
Searchsorted seems to yield better performance but that 1 ms still remains on my computer:
(modified)
def pw(xx, uu):
    """
    Creates the step function

                     | 1, if d0 <= x < d1
                     | 2, if d1 <= x < d2
        pw(x,data) = | ...
                     | N, if d(N-1) <= x < dN
                     | 0, otherwise

    where di is the ith element in data.

    INPUT:  xx -- interval which the step function is defined over
            uu -- an ordered set of data (without repetitions)
    OUTPUT: pw_func -- an array of size xx.shape[0]
    """
    inds = np.searchsorted(uu, xx, side='right')
    vals = np.arange(1, uu.shape[0]+1)
    pw_func = vals[inds[inds != uu.shape[0]]]
    num_mins = np.sum(xx < np.min(uu))
    num_maxs = np.sum(xx > np.max(uu))
    pw_func = np.concatenate((np.zeros(num_mins), pw_func, np.zeros(xx.shape[0]-pw_func.shape[0]-num_mins)))
    return pw_func
This answer using piecewise seems pretty close, but that's on a scalar x0 and x1. How would I do it on arrays? And would it be more efficient?
Understandably, x may be pretty big but I'm trying to put it through a stress test.
I am still learning though so some hints or tricks that can help me out would be great.
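For concreteness, here is a sketch of what I imagine the array version with np.piecewise would look like (one boolean mask per interval; I haven't benchmarked it):

import numpy as np

x = np.linspace(0, 10, 50000)
udata = np.unique([1, 3, 4, 5, 5, 7])

# One boolean mask per interval [d_i, d_{i+1}); points matching no mask
# get np.piecewise's default fill value of 0.
condlist = [(x >= lo) & (x < hi) for lo, hi in zip(udata[:-1], udata[1:])]
funclist = list(range(1, len(udata)))  # interval i takes the value i+1
pw_pc = np.piecewise(x, condlist, funclist)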
EDIT
There seems to be a mistake in the second function, since the resulting array from the second function doesn't match the first one (which I'm confident works):
N1 = pw1(x,udata.reshape(udata.shape[0],1)).shape[0]
N2 = np.sum(pw1(x,udata.reshape(udata.shape[0],1)) == pw2(x,udata))
print(N1 - N2)
yields
15000
data points that are not the same. So it seems that I don't know how to use 'searchsorted'.
EDIT 2
Actually I fixed it:
pw_func = vals[inds[inds != uu.shape[0]]]
was changed to
pw_func = vals[inds[inds[(inds != uu.shape[0])*(inds != 0)]-1]]
so at least the resulting arrays match. But the question still remains on whether there's a more efficient way of going about doing this.
EDIT 3
Thanks Tin Lai for pointing out the mistake. This one should work
pw_func = vals[inds[(inds != uu.shape[0])*(inds != 0)]-1]
Maybe a more readable way of presenting it would be
non_endpts = (inds != uu.shape[0])*(inds != 0) # only consider the points in between the min/max data values
shift_inds = inds[non_endpts]-1 # searchsorted side='right' includes the left end point and not right end point so a shift is needed
pw_func = vals[shift_inds]
I think I got lost in all those brackets! I guess that's the importance of readability.
A very abstract yet interesting problem! Thanks for entertaining me, I had fun :)
p.s. I'm not sure about your pw2; I wasn't able to get it to output the same as pw1.
For reference, the original pws:
def pw1(x, udata):
    vals = np.arange(1, udata.shape[0]+1).reshape(udata.shape[0], 1)
    pw_func = np.sum(np.where(np.greater_equal(x, udata)*np.less(x, np.roll(udata, -1)), vals, 0), axis=0)
    return pw_func

def pw2(xx, uu):
    inds = np.searchsorted(uu, xx, side='right')
    vals = np.arange(1, uu.shape[0]+1)
    pw_func = vals[inds[inds[(inds != uu.shape[0])*(inds != 0)]-1]]
    num_mins = np.sum(xx < np.min(uu))
    num_maxs = np.sum(xx > np.max(uu))
    pw_func = np.concatenate((np.zeros(num_mins), pw_func, np.zeros(xx.shape[0]-pw_func.shape[0]-num_mins)))
    return pw_func
My first attempt utilised a lot of broadcasting operations from numpy:
def pw3(x, udata):
    # the None slice creates a new axis
    step_bool = x >= udata[None,:].T
    # we exploit the fact that bools have integer value 1,
    # skipping the last value in "data"
    step_vals = np.sum(step_bool[:-1], axis=0)
    # for the step_bool row that we skipped above (the last index),
    # set the step value to zero once we have passed the last value
    # in "data"
    step_vals[step_bool[-1]] = 0
    return step_vals
After looking at the searchsorted from your pw2, I had a new approach that utilises it with much higher performance:
def pw4(x, udata):
    inds = np.searchsorted(udata, x, side='right')
    # fix up the tail if x already goes out of range of data[-1]
    if x[-1] > udata[-1]:
        inds[inds == inds[-1]] = 0
    return inds
Plots with:
import matplotlib.pyplot as plt

plt.plot(pw1(x,udata.reshape(udata.shape[0],1)), label='pw1')
plt.plot(pw2(x,udata), label='pw2')
plt.plot(pw3(x,udata), label='pw3')
plt.plot(pw4(x,udata), label='pw4')
(plot omitted) with data = [1,3,4,5,5,7]
(plot omitted) with data = [1,3,4,5,5,7,11]
pw1, pw3 and pw4 are all identical:
print(np.all(pw1(x,udata.reshape(udata.shape[0],1)) == pw3(x,udata)))
>>> True
print(np.all(pw1(x,udata.reshape(udata.shape[0],1)) == pw4(x,udata)))
>>> True
Performance: (timeit's repeat() runs 3 trials here; each result is the total time for number=1000 executions)
print(timeit.Timer('pw1(x,udata.reshape(udata.shape[0],1))', "from __main__ import pw1, x, udata").repeat(number=1000))
>>> [3.1938983199979702, 1.6096494779994828, 1.962694135003403]
print(timeit.Timer('pw2(x,udata)', "from __main__ import pw2, x, udata").repeat(number=1000))
>>> [0.6884554479984217, 0.6075002400029916, 0.7799002879983163]
print(timeit.Timer('pw3(x,udata)', "from __main__ import pw3, x, udata").repeat(number=1000))
>>> [0.7369808239964186, 0.7557657590004965, 0.8088172269999632]
print(timeit.Timer('pw4(x,udata)', "from __main__ import pw4, x, udata").repeat(number=1000))
>>> [0.20514375300263055, 0.20203858999957447, 0.19906871100101853]
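As an aside, np.digitize computes the same bin indices as the searchsorted call inside pw4: for monotonically increasing bins, digitize(x, bins) matches searchsorted(bins, x, side='right'). A quick check:

import numpy as np

x = np.linspace(0, 10, 50000)
udata = np.unique([1, 3, 4, 5, 5, 7])

# For monotonically increasing bins these two index computations agree.
print(np.array_equal(np.digitize(x, udata),
                     np.searchsorted(udata, x, side='right')))
# True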

Rounding a list of values to the nearest value from another list in python

Suppose I have the following two arrays:
>>> a = np.random.normal(size=(5,))
>>> a
array([ 1.42185826, 1.85726088, -0.18968258, 0.55150255, -1.04356681])
>>> b = np.random.normal(size=(10,10))
>>> b
array([[ 0.64207828, -1.08930317, 0.22795289, 0.13990505, -0.9936441 ,
1.07150754, 0.1701072 , 0.83970818, -0.63938211, -0.76914925],
[ 0.07776129, -0.37606964, -0.54082077, 0.33910246, 0.79950839,
0.33353221, 0.00967273, 0.62224009, -0.2007335 , -0.3458876 ],
[ 2.08751603, -0.52128218, 1.54390634, 0.96715102, 0.799938 ,
0.03702108, 0.36095493, -0.13004965, -1.12163463, 0.32031951],
[-2.34856521, 0.11583369, -0.0056261 , 0.80155082, 0.33421475,
-1.23644508, -1.49667424, -1.01799365, -0.58232326, 0.404464 ],
[-0.6289335 , 0.63654201, -1.28064055, -1.01977467, 0.86871352,
0.84909353, 0.33036771, 0.2604609 , -0.21102014, 0.78748329],
[ 1.44763687, 0.84205291, 0.76841512, 1.05214051, 2.11847126,
-0.7389102 , 0.74964783, -1.78074088, -0.57582084, -0.67956203],
[-1.00599479, -0.93125754, 1.43709533, 1.39308038, 1.62793589,
-0.2744919 , -0.52720952, -0.40644809, 0.14809867, -1.49267633],
[-1.8240385 , -0.5416585 , 1.10750423, 0.56598464, 0.73927224,
-0.54362927, 0.84243497, -0.56753587, 0.70591902, -0.26271302],
[-1.19179547, -1.38993415, -1.99469983, -1.09749452, 1.28697997,
-0.74650318, 1.76384156, 0.33938808, 0.61647274, -0.42166111],
[-0.14147554, -0.96192206, 0.14434349, 1.28437894, -0.38865447,
-1.42540195, 0.93105528, 0.28993325, -1.16119916, -0.58244758]])
I have to find a way to round all values from b to the nearest value found in a.
Does anyone know of a good way to do this with python? I am at a total loss myself.
Here is something you can try
import numpy as np

def rounder(values):
    def f(x):
        idx = np.argmin(np.abs(values - x))
        return values[idx]
    return np.frompyfunc(f, 1, 1)

a = np.random.normal(size=(5,))
b = np.random.normal(size=(10,10))
rounded = rounder(a)(b)
print(rounded)
The rounder function takes the values which we want to round to. It creates a function which takes a scalar and returns the closest element from the values array. We then transform this function into a broadcastable function using numpy.frompyfunc. This way you are not limited to using this on 2d arrays; numpy automatically does the broadcasting for you without any loops.
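One caveat worth adding here: np.frompyfunc returns an object-dtype array, so you may want to cast the result back before doing further numeric work, e.g.

# frompyfunc produces dtype=object; cast back for ordinary float math.
rounded = rounder(a)(b).astype(float)
print(rounded.dtype)  # float64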
If you sort a, you can use bisect to find the index in array a where each element from the subarrays of array b would land:
import numpy as np
from bisect import bisect

a = np.random.normal(size=(5,))
b = np.random.normal(size=(10, 10))

a.sort()
size = a.size
for sub in b:
    for ind2, ele in enumerate(sub):
        i = bisect(a, ele, hi=size-1)
        i1, i2 = a[i], a[i-1]
        sub[ind2] = i1 if abs(i1 - ele) < abs(i2 - ele) else i2
This solution assumes a will always be 1-dimensional, while b can have any number of dimensions.
Create two temporary arrays tiling a and b into the dimensions of the other (here both will now have a shape of (5,10,10)).
at = np.tile(np.reshape(a, (-1, *list(np.ones(len(b.shape)).astype(int)))), (1, *b.shape))
bt = np.tile(b, (a.size, *list(np.ones(len(b.shape)).astype(int))))
For the nearest operation, you can take the absolute value of the difference between the two. The minimum value of that operation in the first dimension (dimension 0) gives the index in the a array.
idx = np.argmin(np.abs(at-bt),axis=0)
All that is left is to select the values from array a using the index, which will return an array in the shape of b with the nearest values from a.
ans = a[idx]
This method can also be used (modifying how the index is calculated) to do other operations, such as a floor, ceil, etc.
Note that this solution can be memory intensive, which is not much of an issue with small arrays. A looping solution could be less memory intensive at the cost of speed.
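As a variant (my own sketch, not part of the answer above), plain broadcasting avoids materialising the tiled copies entirely; only the (5, 10, 10) difference array is created:

import numpy as np

a = np.random.normal(size=(5,))
b = np.random.normal(size=(10, 10))

# Reshape a to (5, 1, 1) so the subtraction broadcasts against b's (10, 10);
# argmin over axis 0 picks, per element of b, the index of the closest a.
idx = np.argmin(np.abs(a.reshape(-1, *([1] * b.ndim)) - b), axis=0)
ans = a[idx]
print(ans.shape)  # (10, 10)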
I don't know NumPy, but I don't think knowledge of NumPy is needed to answer this question. Assuming that an array can be iterated and modified in the same way as a list, the following code solves your problem by using a nested loop to find the closest value.
for i in range(len(b)):
    for k in range(len(b[i])):
        closest = a[0]
        for j in range(1, len(a)):
            if abs(a[j] - b[i][k]) < abs(closest - b[i][k]):
                closest = a[j]
        b[i][k] = closest
Disclaimer: a more pythonic approach may exist.
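For what it's worth, a slightly more pythonic sketch of the same nested loop (my variant) uses min() with a key function:

import numpy as np

a = np.random.normal(size=(5,))
b = np.random.normal(size=(10, 10))

# min() with a key function picks the candidate closest to each value.
for row in b:
    for k, value in enumerate(row):
        row[k] = min(a, key=lambda c: abs(c - value))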

Evaluate array at specific subarray

I warn in advance: I may be utterly confused at the moment. I'll tell a short story about what I actually try to achieve, because that may clear things up. Say I have f(a,b,c,d,e), and I want to find arg max_(d,e) f(a,b,c,d,e). Consider a (trivial example of a) discretized grid F of f:
import numpy as np
from numpy import newaxis

F = np.tile(np.arange(0, 10, 0.1)[newaxis, newaxis, :, newaxis, newaxis], [10, 10, 1, 10, 10])
maxE = F.max(axis=-1)
argmaxD = maxE.argmax(axis=-1)
maxD = F.max(axis=-2)
argmaxE = maxD.argmax(axis=-1)
This is how I typically solve the discretized version. But now assume instead that I want to solve arg max_d f(a,b,c,d,e=X): instead of an optimally chosen e for every other input, e is fixed and given (of size AxBxCxD, which in this example would be 10x10x100x10). I have trouble solving this.
My naive approach was
X = np.tile(np.arange(0,10)[newaxis,newaxis,:,newaxis], [10,10,1,10])
maxX = F[X]
argmaxD = maxX.argmax(axis=-1)
However, the huge surge of memory that crashes my IDE implies that F[X] is apparently not what I was looking for.
Performance is key.
I believe you can do it like this, but maybe there's a better way..
n = 10
F = np.tile(np.arange(0,n,0.1)[None,None,:,None,None], [n, n, 1, n, n])
X = np.tile(np.arange(0,n)[None,None,:,None], [n, n, 1, n])
a,b,c,d = np.ogrid[:n,:n,:n,:n]
argmaxD = F[a,b,c,d,X].argmax(axis=-1)
Above X doesn't occupy the whole space, as we discussed in the comments. If you would like to choose e for all a,b,c and d you could do e.g.:
X = np.tile(np.arange(0,n,0.1).astype(int)[None,None,:,None], [n, n, 1, n])
a,b,c,d = np.ogrid[:n,:n,:100,:n]
argmaxD = F[a,b,c,d,X].argmax(axis=-1)
Also, notice that instead of tile you could make use of broadcasting. But then F[a,b,c,d,X] has a singleton dimension, so you should pass something like axis=3:
X = np.arange(0,n,0.1).astype(int)[None,None,:,None]
a,b,c,d = np.ogrid[:n,:n,:100,:n]
argmaxD = F[a,b,c,d,X].argmax(axis=3)
This would be my idea to solve this.
from itertools import product, starmap

f = lambda a, b, c, d, e: d / e
args_iterable = product([1], [2], [3], range(1, 1000), range(1, 1000))
max_val, max_args = max(starmap(lambda *args: (f(*args), args), args_iterable))
print(max_args)

NumPy: 1D interpolation of a 3D array

I'm rather new to NumPy. Anyone have an idea for making this code, especially the nested loops, more compact/efficient? BTW, dist and data are three-dimensional numpy arrays.
def interpolate_to_distance(self, distance):
    interpolated_data = np.ndarray(self.dist.shape[1:])
    for j in range(interpolated_data.shape[1]):
        for i in range(interpolated_data.shape[0]):
            interpolated_data[i,j] = np.interp(
                distance, self.dist[:,i,j], self.data[:,i,j])
    return interpolated_data
Thanks!
Alright, I'll take a swag with this:
def interpolate_to_distance(self, distance):
    dshape = self.dist.shape
    dist = self.dist.T.reshape(-1, dshape[-1])
    data = self.data.T.reshape(-1, dshape[-1])
    intdata = np.array([np.interp(distance, di, da)
                        for di, da in zip(dist, data)])
    return intdata.reshape(dshape[0:2]).T
It at least removes one loop (and those nested indices), but it's not much faster than the original, ~20% faster according to %timeit in IPython. On the other hand, there's a lot of (probably unnecessary, ultimately) transposing and reshaping going on.
For the record, I wrapped it up in a dummy class and filled some 3 x 3 x 3 arrays with random numbers to test:
import numpy as np

class TestClass(object):
    def interpolate_to_distance(self, distance):
        dshape = self.dist.shape
        dist = self.dist.T.reshape(-1, dshape[-1])
        data = self.data.T.reshape(-1, dshape[-1])
        intdata = np.array([np.interp(distance, di, da)
                            for di, da in zip(dist, data)])
        return intdata.reshape(dshape[0:2]).T

    def interpolate_to_distance_old(self, distance):
        interpolated_data = np.ndarray(self.dist.shape[1:])
        for j in range(interpolated_data.shape[1]):
            for i in range(interpolated_data.shape[0]):
                interpolated_data[i,j] = np.interp(
                    distance, self.dist[:,i,j], self.data[:,i,j])
        return interpolated_data

if __name__ == '__main__':
    testobj = TestClass()
    testobj.dist = np.random.randn(3, 3, 3)
    testobj.data = np.random.randn(3, 3, 3)
    distance = 0
    print('Old:\n', testobj.interpolate_to_distance_old(distance))
    print('New:\n', testobj.interpolate_to_distance(distance))
Which prints (for my particular set of randoms):
Old:
[[-0.59557042 -0.42706077 0.94629049]
[ 0.55509032 -0.67808257 -0.74214045]
[ 1.03779189 -1.17605275 0.00317679]]
New:
[[-0.59557042 -0.42706077 0.94629049]
[ 0.55509032 -0.67808257 -0.74214045]
[ 1.03779189 -1.17605275 0.00317679]]
I also tried np.vectorize(np.interp) but couldn't get that to work. I suspect that would be much faster if it did work.
I couldn't get np.fromfunction to work either, as it passed (2) 3 x 3 (in this case) arrays of indices to np.interp, the same arrays you get from np.mgrid.
One other note: according to the docs for np.interp,

    np.interp does not check that the x-coordinate sequence xp is increasing. If
    xp is not increasing, the results are nonsense. A simple check for
    increasingness is:

    np.all(np.diff(xp) > 0)
Obviously, my random numbers violate the 'always increasing' rule, but you'll have to be more careful.
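A small sketch (my addition) of one way to satisfy that requirement: sort the coordinate sequence and reorder the values to match before interpolating.

import numpy as np

xp = np.array([3.0, 1.0, 2.0])    # not increasing
fp = np.array([30.0, 10.0, 20.0])

# argsort yields the permutation that makes xp increasing; applying the
# same permutation to fp keeps the (xp, fp) pairs aligned.
order = np.argsort(xp)
y = np.interp(1.5, xp[order], fp[order])
print(y)  # 15.0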
