Suppose I have a matrix of size 100x100 and I would like to compare each pixel to its direct neighbor (left, upper, right, lower) and then do some operations on the current matrix or a new one of the same size.
Sample code in Python/NumPy could look like the following (the comparison > 0.5 has no particular meaning; I just want to give a working example of some operation that compares neighbors):
import numpy as np

my_matrix = np.random.rand(100, 100)
new_matrix = np.zeros((100, 100))  # np.array((100, 100)) would create a 2-element array, not a 100x100 one
my_range = np.arange(1, 99)
for i in my_range:
    for j in my_range:
        if my_matrix[i, j+1] > 0.5:
            new_matrix[i, j+1] = 1
        if my_matrix[i, j-1] > 0.5:
            new_matrix[i, j-1] = 1
        if my_matrix[i+1, j] > 0.5:
            new_matrix[i+1, j] = 1
        if my_matrix[i-1, j] > 0.5:
            new_matrix[i-1, j] = 1
        if my_matrix[i+1, j+1] > 0.5:
            new_matrix[i+1, j+1] = 1
        if my_matrix[i+1, j-1] > 0.5:
            new_matrix[i+1, j-1] = 1
        if my_matrix[i-1, j+1] > 0.5:
            new_matrix[i-1, j+1] = 1
This can get really nasty if I want to step into one neighboring cell and compare it, in turn, to its own neighbors ... Do you have any suggestions for how this can be done more efficiently? Is this even possible?
I'm not 100% sure what you're aiming for with your code, which, ignoring indexing issues at the boundaries, is equivalent to
new_matrix = my_matrix > 0.5
but you can do advanced versions of these calculations quickly with morphological operations:
import numpy as np
from scipy.ndimage import morphology

a = np.random.rand(5, 5)
b = a > 0.5
# cross-shaped structuring element: the 4-connected (left, upper, right, lower) neighbourhood
element = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
result = morphology.binary_dilation(b, element) * 1  # * 1 turns the boolean result into ints
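If you also want the diagonal neighbours that the loops in your question touch, the same call works with a full 3x3 structuring element (a sketch along the same lines):

element8 = np.ones((3, 3))  # 8-connected neighbourhood, diagonals included
result8 = morphology.binary_dilation(b, element8) * 1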
The way to keep this from "getting nasty" is to encapsulate the neighbor-checking code in a function; then you can just call it with the coordinates of the neighbor when necessary.
If you need to keep track of which cells you've already checked, so that you don't re-check the same ones, add some sort of memoization on top of that.
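A minimal sketch of that idea, reusing the > 0.5 test from the question (the helper name and the seen set are made up for illustration):

import numpy as np

def check_neighbor(matrix, out, i, j, seen):
    # skip coordinates we've already handled (simple memoization)
    if (i, j) in seen:
        return
    seen.add((i, j))
    if matrix[i, j] > 0.5:  # placeholder operation from the question
        out[i, j] = 1

my_matrix = np.random.rand(100, 100)
new_matrix = np.zeros((100, 100))
seen = set()
for i in range(1, 99):
    for j in range(1, 99):
        for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            check_neighbor(my_matrix, new_matrix, i + di, j + dj, seen)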
I am looking for an efficient way to do the following calculation on millions of arrays. For the values in each array, I want to calculate the mean of the values in the bin with the highest frequency, as demonstrated below. Some of the arrays might contain nan values; the other values are floats. The loop over my actual data takes too long to finish.
import numpy as np

array = np.array([np.random.uniform(0, 10) for i in range(800)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)

bin_values = np.linspace(0, 10, 21)
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + abs(bin_values[1] - bin_values[0])
values = np.zeros(array.shape[0])
for i in range(array.shape[0]):
    values[i] = np.nanmean(array[i][(array[i] >= bin_start[i]) * (array[i] < bin_end[i])])
Also, when I run the above code I get three warnings. The first is 'RuntimeWarning: Mean of empty slice' for the line where I calculate the values variable. I set a condition to skip this line in case a row is all nan values, but the warning did not go away; I was wondering what the reason is. The other two warnings come from the less and greater_equal comparisons, which makes sense to me since nan values may be involved.
The arrays that I want to run this algorithm on are independent, but I am already processing them with 12 separate scripts. Running the code in parallel would be an option; however, for now I am looking to improve the algorithm itself.
The reason I am using a lambda function is to run numpy.histogram over an axis, since it seems the histogram function does not take an axis as an option. I was able to use a mask and remove the loop from the code; the code is twice as fast now, but I think it can still be improved.
I can explain what I want to do in more detail with an example, if that clarifies it. Imagine I have 36 numbers which are greater than 0 and smaller than 20, and bins of equal width 0.5 over the same interval (0.0_0.5, 0.5_1.0, 1.0_1.5, …, 19.5_20.0). If I put the 36 numbers into their corresponding bins, what is the mean of the numbers inside the bin that contains the most of them?
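As a minimal sketch of that description on a single made-up array (bin width 0.5 over [0, 10], as in my code):

import numpy as np

a = np.array([0.2, 0.7, 0.6, 3.1, np.nan])
bins = np.linspace(0, 10, 21)                 # edges, width 0.5
counts, _ = np.histogram(a[~np.isnan(a)], bins=bins)
start = bins[np.argmax(counts)]               # left edge of the fullest bin
end = start + (bins[1] - bins[0])
mean_in_bin = np.nanmean(a[(a >= start) & (a < end)])  # mean of 0.7 and 0.6 -> 0.65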
Please post your solution if you can think of a faster algorithm.
import numpy as np

# creating an array to test the algorithm
array = np.array([np.random.uniform(0, 10) for i in range(800)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)

# the algorithm
bin_values = np.linspace(0, 10, 21)
# calculating the frequency of each bin
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + abs(bin_values[1] - bin_values[0])
# creating a mask to get the mean over the bin with maximum frequency
mask = (array >= bin_start) * (array < bin_end)
mask_nan = np.tile(np.nan, (mask.shape[0], mask.shape[1]))
mask_nan[mask] = 1
v = np.nanmean(array * mask_nan, axis=1)
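For what it's worth, here is one further sketch (not benchmarked) that removes both apply_along_axis calls by binning every element at once with np.digitize and counting per row with np.add.at; it assumes all finite values lie in [0, 10), so the extra last bin only collects the nans:

n_rows, n_cols = array.shape
width = bin_values[1] - bin_values[0]
# bin index of every element; nans sort past the last edge and land
# in a dummy bin at index len(bin_values) - 1
idx = np.digitize(array, bin_values) - 1
counts = np.zeros((n_rows, len(bin_values)), dtype=int)
np.add.at(counts, (np.repeat(np.arange(n_rows), n_cols), idx.ravel()), 1)
best = counts[:, :-1].argmax(1)               # fullest real bin per row
bin_start = bin_values[best][:, None]
in_bin = (array >= bin_start) & (array < bin_start + width)
v = np.nanmean(np.where(in_bin, array, np.nan), axis=1)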
I want to create a random diagonal matrix of size n such that each diagonal entry has a 50% chance of being -1 and a 50% chance of being 1. Is there any advice for this?
import numpy as np

n = 100  # example size
diagonal_entries = np.random.randint(low=-1, high=1, size=n)
D = np.diag(diagonal_entries)
However, the problem is that `np.random.randint` draws from the half-open interval [low, high), so this produces 0 as a value too (and never 1). I only want -1 and 1, excluding 0.
You can use np.random.choice to sample such a vector:
import numpy as np

n = 100
vec = np.random.choice([-1, 1], n)
mat = np.diag(vec)
You can combine a few NumPy routines into a concise function that does this:
import numpy as np

def random_diagonal(n, proba_minus=0):
    diagonal = np.ones(n)
    diagonal[np.random.random(size=n) < proba_minus] = -1
    return np.diagflat(diagonal)
The random routine lets you define the probability of getting "-1", and np.diagflat creates a diagonal matrix from its diagonal. Both operations are vectorized, but for large sizes be aware that a temporary array is created for the boolean mask.
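For example, a hypothetical call matching the 50/50 requirement from the question:

D = random_diagonal(5, proba_minus=0.5)  # 5x5 diagonal of 1s and -1s, each -1 with probability 0.5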
What about something like this:
import numpy as np

diagonal_entries = np.random.randint(low=0, high=2, size=4)
print(diagonal_entries)
# i*2-1 maps [0, 1] -> [-1, 1]: 2*0-1 == -1 and 2*1-1 == 1
modified = [i * 2 - 1 for i in diagonal_entries]
D = np.diag(modified)
print(D)
I used the same function with a little modification of the results to suit your [-1, 1] needs.
My second option would be this: modified = [1 if i == 1 else -1 for i in diagonal_entries]
I have the following numpy arrays:
import numpy as np
y2 = np.array([[0.2,0.1,0.8,0.4],[0.4,0.2,0.5,0.1],[0.4,0.2,0.5,0.1]])
y1 = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1]])
What I am trying to do is get the rank position of y1 within y2. To be clearer: y1 is the label data and y2 is the predicted data, and I want to see at which rank position the algorithm predicted the true label.
I am doing the following:
counter = 0
indexes2 = []
indexes = np.where(y1)[1]
sorted_values = np.argsort(-y2)
for value in sorted_values:
    indexes2.append(np.where(value == indexes[counter])[0][0] + 1)
    counter += 1
b = np.array(indexes2)
The output is correct:
>>> b
array([3, 3, 4], dtype=int64)
But I am pretty sure there is a more elegant and more optimized way of doing this. Any hints?
Vectorize the loop
We could get rid of the loop by making use of broadcasting -
b = (sorted_values == indexes[:,None]).argmax(1)+1
Some Improvement
For performance, we could optimize the computation of indexes, like so -
indexes = y1.argmax(1)
Bigger Improvement
Additionally, we could optimize the sorted_values computation by avoiding the negation of y2, by doing -
sorted_values2 = np.argsort(y2)
Then, compute b by using a broadcasted comparison as done earlier and subtract the argmax indices from the length of each row. This in effect performs the descending ordering along each row that the posted question achieved by negating the argsort input.
Thus, the final step would be -
b = y2.shape[1] - (sorted_values2 == indexes[:,None]).argmax(1)
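Putting the pieces together (a sketch; for the arrays in the question this reproduces the loop-based b):

indexes = y1.argmax(1)
sorted_values2 = np.argsort(y2)
b = y2.shape[1] - (sorted_values2 == indexes[:, None]).argmax(1)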
I am quite new to Python. I have an array of some parameter detections, and some of the values were detected incorrectly (like 4555555):
array = [1, 20, 55, 33, 4555555, 1]
And I want to somehow smooth it. Right now I'm doing that with a weighted mean:
def smoothify(array):
    # the range must run to len(array) - 1, otherwise the second-to-last element is never smoothed
    for i in range(1, len(array) - 1):
        array[i] = 0.7 * array[i] + 0.15 * (array[i - 1] + array[i + 1])
    return array
But it works pretty badly. Of course, we could take a weighted mean of more than 3 elements, but that results in copy-pasting... I tried to find some native functions for this, but I failed.
Could you please help me with that?
P.S. Sorry if it's a noob question :(
Thanks for your time,
Best regards, Anna
For weighted smoothing purposes, you are basically looking to perform convolution. Since we are dealing with 1D arrays here, we can simply use NumPy's 1D convolution function np.convolve for a vectorized solution. The only important thing to remember is that the weights are to be reversed, given that convolution uses a reversed version of the kernel that slides across the main input array. Thus, the solution would be -
import numpy as np

weights = [0.7, 0.15, 0.15]
out = np.convolve(array, np.array(weights)[::-1], 'same')
If you were looking to get the weighted mean, you could get it with out/sum(weights). In our case, since the sum of the given weights is already 1, the output stays the same as out.
Let's plot the output alongside the input for graphical debugging -
import numpy as np
import matplotlib.pyplot as plt

# Input array and weights
array = [1, 20, 55, 33, 455, 200, 100, 20]
weights = [0.7, 0.15, 0.15]
out = np.convolve(array, np.array(weights)[::-1], 'same')

x = np.arange(len(array))
f, axarr = plt.subplots(2, sharex=True, sharey=True)
axarr[0].plot(x, array)
axarr[0].set_title('Original and smoothened arrays')
axarr[1].plot(x, out)
plt.show()
Output - (image omitted: the top panel shows the original array, the bottom the smoothened one)
I would suggest numpy.average to help you with this. The trick is getting the weights calculated - below I zip up three lists: one the same as the original array, the next one step ahead, the next one step behind. Once we have the weights, we feed them into the np.average function.
import numpy as np
array = [1, 20, 55, 33, 4555555, 1]
arrayCompare = zip(array, array[1:] + [0], [0] + array)
weights = [.7 * x + .15 * (y + z) for x, y, z in arrayCompare]
avg = np.average(array, weights=weights)
Maybe you want to have a look at numpy and in particular at numpy.average.
Also, did you see this question Weighted moving average in python? Might be helpful, too.
Since you tagged this with numpy, here is how I would do it with numpy:
import numpy as np

def smoothify(thisarray):
    """
    Returns the moving average of the input using:
    out(n) = 0.7 * in(n) + 0.15 * (in(n-1) + in(n+1))
    """
    # make sure we got a numpy array, else make it one
    if isinstance(thisarray, list):
        thisarray = np.array(thisarray)
    # do the moving average by adding three slices of the original array
    # returns a numpy array,
    # could be modified to return whatever type we put in...
    return 0.7 * thisarray[1:-1] + 0.15 * (thisarray[2:] + thisarray[:-2])

myarray = [1, 20, 55, 33, 4555555, 1]
smootharray = smoothify(myarray)
Instead of looping through the original array, with numpy you can get "slices" by indexing. The output array will be two items shorter than the input array. The central points (n) are thisarray[1:-1]: "from item index 1 until the last item (not inclusive)". The other slices are "from index 2 until the end" and "everything except the last two".
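To make the slicing concrete on a small example:

a = np.arange(6)   # array([0, 1, 2, 3, 4, 5])
a[1:-1]            # centre points:    array([1, 2, 3, 4])
a[2:]              # right neighbours: array([2, 3, 4, 5])
a[:-2]             # left neighbours:  array([0, 1, 2, 3])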
Consider the following code
import numpy as np

X = np.matrix([[1, -1, 1], [-1, 0, 1]])
print(X.T)
'''
[[ 1 -1]
 [-1  0]
 [ 1  1]]
'''
I want to check whether there exists a vector y = (y1, y2) such that every component of X.T * y is negative. For this example, that would mean checking whether the following system has a solution:
1*y1 + -1*y2 < 0
-1*y1 + 0*y2 < 0
1*y1 + 1*y2 < 0
I tried reading http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve, but apparently no such luck.
It seems that your question is equivalent to asking whether the plane that contains the origin and the vectors U = np.r_[1, -1, 1] and V = np.r_[-1, 0, 1] extends into the octant of 3-d space where all coordinates are negative.
The cross product UxV (i.e., np.cross(U, V)) is normal to this plane. If this cross product has three nonzero components, all of the same sign, then no vector normal to it (that is, no vector in the plane) can lie in the dreaded octant. For your numbers I get all three components negative, so there is no solution.
[UPDATE]
In general, the tricky cases happen when the normal contains zeros:
Three zeros: your original vectors are parallel, or one of them is zero. Pick one that is not zero; if all of its components are nonzero and have the same sign, you have a solution.
Two zeros: your plane is one of X=0, Y=0 or Z=0, so one coordinate is always zero and never negative; there are no solutions.
One zero: your plane includes the X, Y or Z axis. There is a solution if and only if the remaining two components of the normal have differing signs.
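A quick sketch of that recipe in NumPy (using U and V from this question; the zero cases would need the extra handling described above):

import numpy as np

U = np.r_[1, -1, 1]
V = np.r_[-1, 0, 1]
normal = np.cross(U, V)          # array([-1, -2, -1]) here
nonzero = normal[normal != 0]
if nonzero.size == 3 and (np.all(nonzero > 0) or np.all(nonzero < 0)):
    print("no solution: the plane misses the all-negative octant")
else:
    print("handle the zero cases as described above")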
Here is the documentation you need: numpy.apply_along_axis.
import numpy as np

def func(b, y1, y2):
    a = b.T
    if a[0] * y1 + a[1] * y2 < 0:
        return True
    else:
        return False

np.apply_along_axis(func, 0, X, y1, y2)
So now let's say you want y1 = -1 and y2 = 3:
>>> np.apply_along_axis(func,0,X,-1,3)
array([ True, False, False], dtype=bool)
So this means that in the transpose the first row (which is the first column of the original matrix) satisfies your condition, while the second and third do not!
This is a function for an arbitrary number of ys, i.e. for as large a matrix as you want:
def func(b, *args):
    a = b.T
    # range(len(args) - 1) would drop the last y; we need every one of them
    total = [a[i] * args[i] for i in range(len(args))]
    if sum(total) < 0:
        return True
    else:
        return False
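A quick usage sketch (same X as above, with the hypothetical values y1 = -1 and y2 = 3 used earlier; it reproduces the two-variable result):

>>> np.apply_along_axis(func, 0, X, -1, 3)
array([ True, False, False], dtype=bool)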