I have a dictionary of 2D arrays and I would like to normalize each row of each 2D array by its mean.
I have:
for key, value in sorted(baseline.items()):
    for i in baseline[str(key)]:
        i = i / np.mean(i)
Where:
baseline is a dict
baseline[str(key)] is a 2D numpy array
i is a 1D array
print(i) results in the appropriately updated values, however the individual rows across baseline.items() do not get updated.
What am I missing?
First of all, here is a solution:
for i in baseline.values():
    i /= i.mean(axis=1, keepdims=True)
Now as to why. The loop for i in baseline[key]: binds a view into the row of a 2D array to the name i at each iteration. You don't need str(key) because the outer loop ensures that the keys are correct. In fact, avoid transforming the keys unnecessarily to avoid surprises, like if you accidentally get an integer key.
The line i = i / np.mean(i) does not do in-place division of the array by its mean. It computes the array i / np.mean(i), then rebinds the name i to the new array. The new array is then discarded on the next iteration.
You can fix this by re-assigning into the slice that i represents:
i[:] = i / np.mean(i)
Alternatively, you can perform the division in-place using the correct operator:
i /= np.mean(i)
As you can see in my solution, there is no need to iterate over the rows at all. np.mean is a vectorized function that can operate along any axis of an array. By setting keepdims=True, you ensure that the result has the right shape to be broadcasted right back over the original when you divide them.
A less flexible alternative to i.mean(axis=1, keepdims=True) specific for 2D arrays is
i.mean(axis=1)[:, None]
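The difference between rebinding a name and updating in place can be checked on a small dictionary. This is only a minimal sketch: the contents of `baseline` are made up for illustration, and the arrays are floats because in-place true division raises an error on integer arrays.

```python
import numpy as np

# Hypothetical sample data: a dict of 2D float arrays.
baseline = {
    "a": np.array([[1.0, 3.0], [2.0, 6.0]]),
    "b": np.array([[4.0, 4.0], [1.0, 9.0]]),
}

# Rebinding: `row` is pointed at a brand-new array, so the dict is untouched.
for arr in baseline.values():
    for row in arr:
        row = row / np.mean(row)
unchanged = baseline["a"].copy()

# In-place: divide every row by its own mean in one vectorized step.
for arr in baseline.values():
    arr /= arr.mean(axis=1, keepdims=True)

# After normalization, every row should have a mean of 1.
row_means = {key: arr.mean(axis=1) for key, arr in baseline.items()}
```

The first loop leaves the stored arrays as they were, while the second modifies them because `/=` writes into the existing buffer.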
Any efficient way to merge one tensor to another in Pytorch, but on specific indexes.
Here is my full problem.
I have a list of indexes into a tensor; in the code below, xy is the original tensor.
I need to preserve the rows of xy whose indexes are in the list and apply some function to the remaining rows (for simplicity, let's say the function is 'multiply them by two'),
xy = torch.rand(100,4)
indexes=[1,2,55,44,66,99,3,65,47,88,99,0]
Then merge them back into the original tensor.
This is what I have done so far:
I create a mask tensor
indexes=[1,2,55,44,66,99,3,65,47,88,99,0]
xy = torch.rand(100,4)
mask=[]
for i in range(0,xy.shape[0]):
    if i in indexes:
        mask.append(False)
    else:
        mask.append(True)
print(mask)
import numpy as np
target_mask = torch.from_numpy(np.array(mask, dtype=bool))
print(target_mask.sum()) #output is 89 as these are element other than preserved.
Apply the function on masked rows
zy = xy[target_mask]
print(zy)
zy=zy*2
print(zy)
Code above is working fine and posted here to clarify the problem
Now I want to merge tensor zy into xy on specified index saved in the list indexes.
Here is the pseudocode I made. As one can see, it is too complex and needs three for loops to complete the task, which would waste too many resources.
# pseudocode
for masked_row in indexes:
    for xy_rows_index in xy:
        if xy_rows_index == masked_row:
            pass
        else:
            take zy tensor row and replace here  # another loop to read zy
But I am not sure what is an efficient way to merge them, as I don't want to use NumPy or for loop etc. It will make the process slow, as the original tensor is too big and I am going to use GPU.
Any efficient way in Pytorch for this?
Once you have your mask you can assign updated values in place.
zy = 2 * xy[target_mask]
xy[target_mask] = zy
As for acquiring the mask I don't see a problem necessarily with your approach, though using the built-in set operations would probably be more efficient. This also gives an index tensor instead of a mask, which, depending on the number of indices being updated, may be more efficient.
i = list(set(range(len(xy)))-set(indexes))
zy = 2 * xy[i]
xy[i] = zy
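The mask itself can also be built without a Python loop or NumPy. The following is a minimal sketch, assuming the preserved row numbers fit in an index tensor (here called `idx`, a name introduced only for this example) and that all tensors live on the same device:

```python
import torch

torch.manual_seed(0)
xy = torch.rand(100, 4)
indexes = [1, 2, 55, 44, 66, 99, 3, 65, 47, 88, 99, 0]
original = xy.clone()  # kept only to check the result afterwards

# Start with every row selected, then switch off the rows to preserve.
idx = torch.tensor(indexes)
target_mask = torch.ones(xy.shape[0], dtype=torch.bool)
target_mask[idx] = False  # the repeated 99 is harmless here

# Apply the function to the selected rows and write the result back in place.
xy[target_mask] = 2 * xy[target_mask]
```

Rows listed in `indexes` keep their original values, and every other row is doubled, with no intermediate "merge" step.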
Edit:
To address the comment, specifically to find the complement of indices of i we can do
i_complement = list(set(range(len(xy)))-set(i))
However, assuming indexes contains only values between 0 and len(xy)-1 then we could equivalently use i_complement = list(set(indexes)), which just removes the repeated values in indexes.
Let's say I have a function (called numpyarrayfunction) that outputs an array every time I run it. I would like to run the function multiple times and store the resulting arrays. Obviously, the current method that I am using to do this -
numpyarray = np.zeros((5))
for i in range(5):
    numpyarray[i] = numpyarrayfunction()
generates an error message since I am trying to store an array within an array.
Eventually, what I would like to do is to take the average of the numbers that are in the arrays, and then take the average of these averages. But for the moment, it would be useful to just know how to store the arrays!
Thank you for your help!
As comments and other answers have already laid out, a good way to do this is to store the arrays being returned by numpyarrayfunction in a normal Python list.
If you want everything to be in a single numpy array (for, say, memory efficiency or computation speed), and the arrays returned by numpyarrayfunction are of a fixed length n, you could make numpyarray multidimensional:
numpyarray = np.empty((5, n))
for i in range(5):
    numpyarray[i, :] = numpyarrayfunction(args)
Then you could do np.average(numpyarray, axis = 1) to average over the second axis, which would give you back a one-dimensional array with the average of each array you got from numpyarrayfunction. np.average(numpyarray) would be the average over all the elements, or np.average(np.average(numpyarray, axis = 1)) if you really want the average value of the averages.
More on numpy array indexing.
I initially misread what was going on inside the for loop there. The reason you're getting an error is that each element of a numeric numpy array holds a single scalar, and numpyarrayfunction returns something that isn't one (from the name, probably another numpy array). If that function already returns a full numpy array, then you can do something more like this:
arrays = []
for i in range(5):
    arrays.append(numpyarrayfunction(args))
Then, you can take the average like so:
avgarray = np.zeros((len(arrays[0])))
for array in arrays:
    avgarray += array
avgarray = avgarray/len(arrays)
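If the returned arrays all have the same length, the list can be stacked into one 2D array and averaged without an explicit accumulation loop. This is a sketch only: the `numpyarrayfunction` below is a hypothetical stand-in that returns a fixed-length random array, and its `seed` argument is made up for the example.

```python
import numpy as np

def numpyarrayfunction(seed):
    # Hypothetical stand-in: returns a 1D array of fixed length 4.
    rng = np.random.default_rng(seed)
    return rng.random(4)

# Collect the results in a plain Python list first.
arrays = [numpyarrayfunction(seed) for seed in range(5)]

stacked = np.stack(arrays)               # shape (5, 4), one row per call
elementwise_avg = stacked.mean(axis=0)   # same result as the accumulation loop
per_array_avg = stacked.mean(axis=1)     # one average per returned array
avg_of_avgs = per_array_avg.mean()       # average of the averages
```

Because every row has the same length, the average of the averages equals the overall mean of `stacked`; that would not hold for arrays of different lengths, which could not be stacked this way in the first place.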
I have an array of 2d indices.
indices = [[2,4], [6,77], [102,554]]
Now, I have a different 4-dimensional array, arr, and I want to only extract an array (it is an array, since it is 4-dimensional) with corresponding index in the indices array. It is equivalent to the following code.
for i in range(len(indices)):
    output[i] = arr[indices[i][0], indices[i][1]]
However, I realized that using an explicit for-loop is slow. Is there any built-in numpy API that I can utilize? At this point, I tried using np.choose, np.put, np.take, but did not succeed in getting what I wanted. Thank you!
We need to index into the first two axes with the two columns from indices (thinking of it as an array).
Thus, simply convert to array and index, like so -
indices_arr = np.array(indices)
out = arr[indices_arr[:,0], indices_arr[:,1]]
Or we could extract those directly without converting to array and then index -
d0,d1 = [i[0] for i in indices], [i[1] for i in indices]
out = arr[d0,d1]
Another way to extract the elements would be with conversion to tuple, like so -
out = arr[tuple(indices_arr.T)]
If indices is already an array, skip the conversion process and use indices in places where we had indices_arr.
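To check that these indexing forms match the explicit loop, here is a small sketch. The array shape and index pairs are made up so the example stays small; the real `arr` and `indices` would be much larger.

```python
import numpy as np

# Hypothetical small 4D array and index pairs into its first two axes.
arr = np.arange(3 * 4 * 2 * 2).reshape(3, 4, 2, 2)
indices = [[2, 3], [0, 1], [1, 1]]

# Reference result: the explicit loop from the question.
expected = np.empty((len(indices),) + arr.shape[2:], dtype=arr.dtype)
for k, (i, j) in enumerate(indices):
    expected[k] = arr[i, j]

# Vectorized: one index array per leading axis.
indices_arr = np.array(indices)
out = arr[indices_arr[:, 0], indices_arr[:, 1]]

# Equivalent form using a tuple of index arrays.
out_tuple = arr[tuple(indices_arr.T)]
```

Both vectorized results have shape `(len(indices),) + arr.shape[2:]`, i.e. one 2D sub-array per index pair.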
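Try using the take function of numpy arrays. Note that np.take(arr, indices) without an axis argument indexes the flattened array, so it does not select arr[i, j] for each index pair. To use take here, merge the first two axes and convert each pair to a flat index. Your code should be something like:
flat = np.ravel_multi_index(np.array(indices).T, arr.shape[:2])
outputarray = np.take(arr.reshape((-1,) + arr.shape[2:]), flat, axis=0)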
So I have image data which I am iterating through in order to find the pixels which have useful data in them. I then need to find these coordinates subject to a conditional statement and put them into an array or DataFrame. The code I have so far is:
pix_coor = np.empty((0,2))
for (x,y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor.append([x,y])
where data_int is just an image array of shape (129,129). All the pixels that have a value larger than sigma3 are useful and the other ones I don't need.
Creating an empty array works fine but when I append this it doesn't seem to work, I need to end up with an array which has two columns of x and y values for the useful pixels. Any ideas?
You could simply use np.argwhere for a vectorized solution -
pix_coor = np.argwhere(data_int >= sigma3)
In numpy, arrays have no append method; the function is np.append, and it is not an in-place operation. Instead it copies the entire array into newly allocated memory (big enough to hold it along with the new values), and returns the new array. Therefore it should be used as such (pass axis=0 as well to append rows to a 2D array):
new_arr = np.append(arr, values)
Obviously, this is not an efficient way to add elements one by one.
You should probably use a regular python list for this.
Alternatively, preallocate the numpy array at its maximum possible size and then trim it to the number of rows actually filled:
pix_coor = np.empty((data_int.size, 2), int)
c = 0
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor[c] = (x, y)
        c += 1
# np.resize returns a new array, so the result must be assigned back
pix_coor = np.resize(pix_coor, (c, 2))
Note that I used np.empty((data_int.size, 2), int), since your coordinates are integral, while numpy defaults to floats.
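The list-based approach and the vectorized `np.argwhere` call can be compared side by side. This is a minimal sketch: the 3x3 `data_int` and the threshold value are stand-ins chosen for illustration, not the real image data.

```python
import numpy as np

# Hypothetical stand-ins for the image and the threshold.
data_int = np.array([[0, 5, 1],
                     [7, 0, 9],
                     [2, 8, 0]])
sigma3 = 5

# Collect coordinates in a Python list, then convert once at the end.
coords = []
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        coords.append((x, y))
# reshape keeps the (n, 2) layout even if no pixel passes the test
pix_coor_list = np.array(coords, dtype=int).reshape(-1, 2)

# Vectorized equivalent: one row of (x, y) per pixel that passes the test.
pix_coor_vec = np.argwhere(data_int >= sigma3)
```

Both results list the coordinates in the same row-major order, so they can be used interchangeably; the vectorized form avoids the Python-level loop entirely.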
I am very new to Python, and I am trying to get used to performing Python's array operations rather than looping through arrays. Below is an example of the kind of looping operation I am doing, but am unable to work out a suitable pure array operation that does not rely on loops:
import numpy as np
def f(arg1, arg2):
    # an arbitrary function
    pass
def myFunction(a1DNumpyArray):
    A = a1DNumpyArray
    # Create a square array with each dimension the size of the argument array.
    B = np.zeros((A.size, A.size))
    # Function f is a function of two elements of the 1D array. For each
    # element, i, I want to perform the function on it and every element
    # before it, and store the result in the square array, multiplied by
    # the difference between the ith and (i-1)th element.
    for i in range(A.size):
        B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
    # Sum through j and return full sums as 1D array.
    return np.sum(B, axis=0)
In short, I am integrating a function which takes two elements of the same array as arguments, returning an array of results of the integral.
Is there a more compact way to do this, without using loops?
The use of an arbitrary f function, and this [i, :i] business, complicates bypassing a loop.
Most of the fast compiled numpy operations work on the whole array, or whole rows and/or columns, and effectively do so in parallel. Loops that are inherently sequential (value from one loop depends on the previous) don't fit well. And different size lists or arrays in each loop are also a good indicator that 'vectorizing' will be difficult.
for i in range(A.size):
    B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
With a sample A and known f (as simple as arg1*arg2), I'd generate a B array, and look for patterns that treat B as a whole. At first glance it looks like your B is a lower triangle. There are functions to help index those. But that final sum might change the picture.
Sometimes I tackle these problems with a bottom up approach, trying to remove inner loops first. But in this case, I think some sort of big-picture approach is needed.
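As an illustration of the "treat B as a whole" idea, here is a sketch for the simple case where f is arg1*arg2. It assumes f broadcasts elementwise over arrays and is safe to evaluate on every pair of elements, including the pairs the loop never touches; for a function that does not satisfy this, the approach would not apply as written. The names `loop_version` and `vectorized_version` are introduced only for this example.

```python
import numpy as np

def f(arg1, arg2):
    # Hypothetical stand-in for the arbitrary function; it must broadcast.
    return arg1 * arg2

def loop_version(A):
    # The loop from the question, kept as a reference.
    B = np.zeros((A.size, A.size))
    for i in range(A.size):
        B[i, :i] = f(A[i], A[:i]) * (A[i] - A[i - 1])
    return B.sum(axis=0)

def vectorized_version(A):
    # full[i, j] = f(A[i], A[j]) for every pair, via broadcasting.
    full = f(A[:, None], A[None, :])
    # step[i] = A[i] - A[i-1]; np.roll wraps around for i = 0, just like
    # A[i-1] does in the loop, and row 0 is zeroed out below anyway.
    step = A - np.roll(A, 1)
    # Keep only the strictly lower triangle (j < i), matching B[i, :i].
    B = np.tril(full * step[:, None], k=-1)
    return B.sum(axis=0)

A = np.array([1.0, 2.0, 4.0, 7.0])
result_loop = loop_version(A)
result_vec = vectorized_version(A)
```

The trade-off is that the vectorized form computes f for all pairs and then discards roughly half of them, which costs extra memory and work but removes the Python-level loop.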