Explanation on Numpy Broadcasting Answer - python

I recently posted a question here which was answered exactly as I asked. However, I think I overestimated my ability to manipulate the answer further. I read the broadcasting doc, and followed a few links that led me way back to 2002 about numpy broadcasting.
I've used the second method of array creation using broadcasting:
N = 10
out = np.zeros((N**3,4),dtype=int)
out[:,:3] = (np.arange(N**3)[:,None]/[N**2,N,1])%N
which outputs:
[[0,0,0,0]
[0,0,1,0]
...
[0,1,0,0]
[0,1,1,0]
...
[9,9,8,0]
[9,9,9,0]]
but I do not understand via the docs how to manipulate that. I would ideally like to be able to set the increments in which each individual column changes.
ex. Column A changes by 0.5 up to 2, column B changes by 0.2 up to 1, and column C changes by 1 up to 10.
[[0,0,0,0]
[0,0,1,0]
...
[0,0,9,0]
[0,0.2,0,0]
...
[0,0.8,9,0]
[0.5,0,0,0]
...
[1.5,0.8,9,0]]
Thanks for any help.

You can adjust your current code just a little bit to make it work.
>>> out = np.zeros((4*5*10,4))
>>> out[:,:3] = (np.arange(4*5*10)[:,None]//(5*10, 10, 1)*(0.5, 0.2, 1)%(2, 1, 10))
>>> out
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 2. , 0. ],
...
[ 0. , 0. , 8. , 0. ],
[ 0. , 0. , 9. , 0. ],
[ 0. , 0.2, 0. , 0. ],
...
[ 0. , 0.8, 9. , 0. ],
[ 0.5, 0. , 0. , 0. ],
...
[ 1.5, 0.8, 9. , 0. ]])
The changes are:
No int dtype on the array, since we need it to hold floats in some columns. You could specify a float dtype if you want (or even something more complicated that only allows floats in the first two columns).
Rather than N**3 total values, figure out the number of distinct values for each column, and multiply them together to get our total size. This is used for both zeros and arange.
Use the floor division // operator in the first broadcast operation because we want integers at this point, but later we'll want floats.
The values to divide by are again based on the number of values for the later columns (e.g. for A,B,C numbers of values, divide by B*C, C, 1).
Add a new broadcast operation to multiply by various scale factors (how much each value increases at once).
Change the values in the broadcast mod % operation to match the bounds on each column.
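The changes listed above can be folded into a small helper. This is only a sketch: the name stepped_grid and its parameters (steps, stops, extra_cols) are made up for illustration, and it assumes every stop is a whole multiple of its step. Unlike the one-liner above, it takes the modulo on the integer counters before scaling, which sidesteps floating-point remainders.

```python
import numpy as np

def stepped_grid(steps, stops, extra_cols=1):
    """Hypothetical helper: every combination of per-column stepped values."""
    steps = np.asarray(steps, dtype=float)
    # Number of distinct values per column, e.g. stop 2 with step 0.5 -> 4.
    counts = np.rint(np.asarray(stops) / steps).astype(int)
    # Divisor for each column = product of the counts of all later columns.
    divisors = np.append(np.cumprod(counts[::-1])[::-1][1:], 1)
    total = counts.prod()
    out = np.zeros((total, len(steps) + extra_cols))
    # (total, 1) broadcast against (ncols,) gives integer counters per column.
    counters = np.arange(total)[:, None] // divisors % counts
    out[:, :len(steps)] = counters * steps
    return out

grid = stepped_grid(steps=(0.5, 0.2, 1), stops=(2, 1, 10))
print(grid.shape)
print(grid[-1])
```

With the values from the question, this should give a 200-row array whose last row is [1.5, 0.8, 9, 0], matching the output shown above.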

This small example helps me understand what is going on:
In [123]: N=2
In [124]: np.arange(N**3)[:,None]/[N**2, N, 1]
Out[124]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 2. ],
[ 0.75, 1.5 , 3. ],
[ 1. , 2. , 4. ],
[ 1.25, 2.5 , 5. ],
[ 1.5 , 3. , 6. ],
[ 1.75, 3.5 , 7. ]])
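So we generate a range of numbers (0 to 7) as a column and divide it by 4, 2, and 1; broadcasting turns the (8,1) column and the 3-element list into an (8,3) result.
The rest of the calculation changes each value element-wise, without changing the shape any further.
Apply %N to each element: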
In [126]: np.arange(N**3)[:,None]/[N**2, N, 1]%N
Out[126]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 0. ],
[ 0.75, 1.5 , 1. ],
[ 1. , 0. , 0. ],
[ 1.25, 0.5 , 1. ],
[ 1.5 , 1. , 0. ],
[ 1.75, 1.5 , 1. ]])
Assigning to an int array truncates the fractional part, which is the same as converting the floats to integers:
In [127]: (np.arange(N**3)[:,None]/[N**2, N, 1]%N).astype(int)
Out[127]:
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]])
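The shapes involved in that small example can be checked directly. The sketch below uses floor division instead of true division followed by truncation (a substitution, not what the session above ran), so the integer digits come out without an int cast; the variable names are only illustrative.

```python
import numpy as np

N = 2
col = np.arange(N**3)[:, None]        # shape (8, 1): one row per combination
divisors = np.array([N**2, N, 1])     # shape (3,): one entry per output column

# Broadcasting stretches (8, 1) against (3,) to an (8, 3) result.
print(np.broadcast(col, divisors).shape)

# Floor-divide, then wrap with % N to get the base-N digits of each row index.
digits = col // divisors % N
print(digits)
```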

Related

Iterate over rows, and perform addition

So, here I have a numpy array, array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]]). My plan is to add a vector (say [1,2,3]) to every row of this array and then append the result to the end of it, i.e. add another three rows. I want to repeat the same process, say 5 times, so that each time the vector is added only to the last three rows, which were the result of the previous addition. Any suggestions?
Just use np.append along the first axis:
import numpy as np
a = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
v = np.array([1, 2, 3])
new_a = np.append(a, a+v, axis=0)
For the addition part, just write something like a[0]+[1,2,3] (where a is your array), numpy will perform addition element-wise as expected.
For appending, a=np.append(a, [line], axis=0) is what you're looking for, where line is the new row you want to add, for example the result of the previous sum.
The iteration can easily be repeated by selecting the last three rows with negative indexing: a[-3:] (or a[-1], a[-2] and a[-3] individually) always picks the last three rows.
If you really need to keep results within a single array, a better option is to create it at the beginning and perform operations you need on it.
arr = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
vector = np.array([1,2,3])
N = 5
multiarr = np.tile(arr, (1,N))
>>> multiarr
array([[-1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. ],
[ 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. ],
[ 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. ]])
multivector = (vector * np.arange(N)[:, None]).ravel()
>>> multivector
array([ 0, 0, 0, 1, 2, 3, 2, 4, 6, 3, 6, 9, 4, 8, 12])
>>> multiarr + multivector
array([[-1.228, 0.709, 0. , -0.228, 2.709, 3. , 0.772, 4.709, 6. , 1.772, 6.709, 9. , 2.772, 8.709, 12. ],
[ 0. , 2.836, 0. , 1. , 4.836, 3. , 2. , 6.836, 6. , 3. , 8.836, 9. , 4. , 10.836, 12. ],
[ 1.228, 0.709, 0. , 2.228, 2.709, 3. , 3.228, 4.709, 6. , 4.228, 6.709, 9. , 5.228, 8.709, 12. ]])
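For the vertical layout the question describes (each new block of three rows appended below the previous one), a minimal sketch follows. The name repeats and the other variable names are hypothetical. It shows a plain loop and an equivalent broadcast form, assuming the vector is added once per repetition.

```python
import numpy as np

a = np.array([[-1.228, 0.709, 0.], [0., 2.836, 0.], [1.228, 0.709, 0.]])
v = np.array([1, 2, 3])
repeats = 5  # hypothetical: how many times to add the vector

# Loop version: always add v to the most recent block, then stack once.
blocks = [a]
for _ in range(repeats):
    blocks.append(blocks[-1] + v)
stacked = np.vstack(blocks)

# Broadcast version: block k is a + k*v, built in a single expression.
k = np.arange(repeats + 1)[:, None, None]          # shape (6, 1, 1)
broadcast = (a + k * v).reshape(-1, a.shape[1])    # (6, 3, 3) -> (18, 3)

print(stacked.shape)
```

Collecting the blocks in a list and stacking once avoids calling np.append inside the loop, which copies the whole array on every call.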

How to compute a spatial distance matrix from a given value

I've been looking for a way to (efficiently) compute a distance matrix from a target value and an input matrix.
If you consider an input array as:
[0 0 1 2 5 2 1]
[0 0 2 3 5 2 1]
[0 1 1 2 5 4 1]
[1 1 1 2 5 4 0]
How do you compute the spatial distance matrix associated with the target value 0?
i.e. what is the distance from each pixel to the closest 0 value?
Thanks in advance
You are looking for scipy.ndimage.distance_transform_edt (older code may refer to it as scipy.ndimage.morphology.distance_transform_edt). It operates on a binary array and computes the Euclidean distance from each TRUE position to the nearest background FALSE position. In our case we want the distance to the nearest 0, so the background is 0. Under the hood, it converts the input to a binary array, treating 0 as the background, so we can just use it with the default parameters. Hence, it is as simple as -
In [179]: a
Out[179]:
array([[0, 0, 1, 2, 5, 2, 1],
[0, 0, 2, 3, 5, 2, 1],
[0, 1, 1, 2, 5, 4, 1],
[1, 1, 1, 2, 5, 4, 0]])
In [180]: from scipy import ndimage
In [181]: ndimage.distance_transform_edt(a)
Out[181]:
array([[0. , 0. , 1. , 2. , 3. , 3.16, 3. ],
[0. , 0. , 1. , 2. , 2.83, 2.24, 2. ],
[0. , 1. , 1.41, 2.24, 2.24, 1.41, 1. ],
[1. , 1.41, 2.24, 2.83, 2. , 1. , 0. ]])
Solving for generic case
Now, let's say we want to find out distances from nearest 1s, then it would be -
In [183]: background = 1 # element from which distances are to be computed
# compare this with original array, a to verify
In [184]: ndimage.distance_transform_edt(a!=background)
Out[184]:
array([[2. , 1. , 0. , 1. , 2. , 1. , 0. ],
[1.41, 1. , 1. , 1.41, 2. , 1. , 0. ],
[1. , 0. , 0. , 1. , 2. , 1. , 0. ],
[0. , 0. , 0. , 1. , 2. , 1.41, 1. ]])
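To sanity-check outputs like the ones above without SciPy, the same distances can be computed by brute force with broadcasting. This is a sketch for small arrays only: the function name brute_force_distance is made up, it assumes the target value occurs at least once, and it builds an array with one entry per (cell, target) pair, so memory grows quickly.

```python
import numpy as np

def brute_force_distance(arr, value=0):
    """Hypothetical cross-check: Euclidean distance to the nearest `value`."""
    arr = np.asarray(arr)
    targets = np.argwhere(arr == value)                      # (T, ndim)
    cells = np.indices(arr.shape).reshape(arr.ndim, -1).T    # (P, ndim)
    # (P, 1, ndim) - (1, T, ndim) broadcasts to every cell/target pair.
    diff = cells[:, None, :] - targets[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))                 # (P, T)
    return dist.min(axis=1).reshape(arr.shape)

a = np.array([[0, 0, 1, 2, 5, 2, 1],
              [0, 0, 2, 3, 5, 2, 1],
              [0, 1, 1, 2, 5, 4, 1],
              [1, 1, 1, 2, 5, 4, 0]])

print(np.round(brute_force_distance(a, 0), 2))
print(np.round(brute_force_distance(a, 1), 2))
```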

How to scale each column of a matrix

This is how I scale a single vector:
vector = np.array([-4, -3, -2, -1, 0])
# pass the vector, current range of values, the desired range, and it returns the scaled vector
scaledVector = np.interp(vector, (vector.min(), vector.max()), (-1, +1)) # results in [-1. -0.5 0. 0.5 1. ]
How can I apply the above approach to each column of a given matrix?
matrix = np.array(
[[-4, -4, 0, 0, 0],
[-3, -3, 1, -15, 0],
[-2, -2, 8, -1, 0],
[-1, -1, 11, 12, 0],
[0, 0, 50, 69, 80]])
scaledMatrix = [insert code that scales each column of the matrix]
Note that the first two columns of the scaledMatrix should be equal to the scaledVector from the first example. For the matrix above, the correctly computed scaledMatrix is:
[[-1. -1. -1. -0.64285714 -1. ]
[-0.5 -0.5 -0.96 -1. -1. ]
[ 0. 0. -0.68 -0.66666667 -1. ]
[ 0.5 0.5 -0.56 -0.35714286 -1. ]
[ 1. 1. 1. 1. 1. ]]
My current approach (wrong):
np.interp(matrix, (np.min(matrix), np.max(matrix)), (-1, +1))
If you want to do it by hand and understand what's going on:
First subtract the column-wise mins so that each column has min 0.
Then divide by the column-wise amplitude (max - min) so that each column has max 1.
Now each column is between 0 and 1. If you want it to be between -1 and 1, multiply by 2 and subtract 1:
In [3]: mins = np.min(matrix, axis=0)
In [4]: maxs = np.max(matrix, axis=0)
In [5]: (matrix - mins[None, :]) / (maxs[None, :] - mins[None, :])
Out[5]:
array([[ 0. , 0. , 0. , 0.17857143, 0. ],
[ 0.25 , 0.25 , 0.02 , 0. , 0. ],
[ 0.5 , 0.5 , 0.16 , 0.16666667, 0. ],
[ 0.75 , 0.75 , 0.22 , 0.32142857, 0. ],
[ 1. , 1. , 1. , 1. , 1. ]])
In [6]: 2 * _ - 1
Out[6]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])
I use [None, :] to make it explicit that mins and maxs are "row vectors", not column ones (broadcasting would align a 1-D array with the last axis anyway).
Otherwise, use the wonderful sklearn package, whose preprocessing module has lots of useful transformers:
In [13]: from sklearn.preprocessing import MinMaxScaler
In [14]: scaler = MinMaxScaler(feature_range=(-1, 1))
In [15]: scaler.fit(matrix)
Out[15]: MinMaxScaler(copy=True, feature_range=(-1, 1))
In [16]: scaler.transform(matrix)
Out[16]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])
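The by-hand steps can also be wrapped in a small function that takes the target range as arguments. This is a sketch, not a replacement for MinMaxScaler: the name scale_columns and its lo/hi parameters are invented here, and it assumes no column is constant (a constant column would divide by zero).

```python
import numpy as np

def scale_columns(matrix, lo=-1.0, hi=1.0):
    """Hypothetical helper: rescale each column linearly to [lo, hi]."""
    matrix = np.asarray(matrix, dtype=float)
    mins = matrix.min(axis=0)              # one min per column, shape (ncols,)
    spans = matrix.max(axis=0) - mins      # column-wise max - min
    # (nrows, ncols) op (ncols,) broadcasts along the rows.
    return lo + (matrix - mins) / spans * (hi - lo)

matrix = np.array([[-4, -4, 0, 0, 0],
                   [-3, -3, 1, -15, 0],
                   [-2, -2, 8, -1, 0],
                   [-1, -1, 11, 12, 0],
                   [0, 0, 50, 69, 80]])

print(scale_columns(matrix))
```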

Numpy: placing values into an 1-of-n array based on indices in another array

Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices (let's say, array([0, 2, 1])). The expected output is the values placed into a bigger array, "addressed" by the indices, so we would get
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
>>> from numpy import array, zeros, r_
>>> a = zeros((3,3))
>>> b = array([0, 2, 1])
>>> vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
How do we modify this for higher dimensions? For example, if a is a 5x4x3 array and b and vals are 5x4 arrays, how do we modify the statement a[r_[:len(b)], b] = vals?
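One possible way to extend the same indexing idea to the 5x4x3 case is sketched below. It is an assumption about what the follow-up is asking for (b holds an index into the last axis for every position of the first two), and the random test data is only there to make the example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.integers(0, 3, size=(5, 4))    # index into the last axis, per (i, j)
vals = rng.random((5, 4))              # value to place at each (i, j, b[i, j])

# Open grids of shape (5, 1) and (1, 4) broadcast against b's (5, 4) shape.
a = np.zeros((5, 4, 3))
i, j = np.ogrid[:5, :4]
a[i, j, b] = vals

# Equivalent formulation with np.put_along_axis (NumPy 1.15 or later).
a2 = np.zeros((5, 4, 3))
np.put_along_axis(a2, b[..., None], vals[..., None], axis=2)

print(np.array_equal(a, a2))
```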

Sort a numpy matrix based on its diagonal

I have a matrix that should have ones on the diagonal but the columns are mixed up.
But I don't know how, without the obvious for loop, to efficiently interchange rows to get unity on the diagonals. I'm not even sure what key I would pass to sort on.
Any suggestions?
You can use numpy's argmax to determine the goal column ordering and reorder your matrix using the argmax results as column indices:
>>> z = numpy.array([[ 0.1 , 0.1 , 1. ],
... [ 1. , 0.1 , 0.09],
... [ 0.1 , 1. , 0.2 ]])
>>> numpy.argmax(z, axis=1)  # goal column indices
array([2, 0, 1])
>>> z[:, numpy.argmax(z, axis=1)]
array([[ 1. , 0.1 , 0.1 ],
[ 0.09, 1. , 0.1 ],
[ 0.2 , 0.1 , 1. ]])
>>> import numpy as np
>>> a = np.array([[ 1. , 0.5, 0.5, 0. ],
... [ 0.5, 0.5, 1. , 0. ],
... [ 0. , 1. , 0. , 0.5],
... [ 0. , 0.5, 0.5, 1. ]])
>>> # sorted() no longer accepts cmp= in Python 3, so sort with a key instead
>>> np.array(sorted(a, key=lambda row: list(row).index(1)))
array([[ 1. , 0.5, 0.5, 0. ],
[ 0. , 1. , 0. , 0.5],
[ 0.5, 0.5, 1. , 0. ],
[ 0. , 0.5, 0.5, 1. ]])
It actually sorts by rows, not columns (but the result is the same). It works by sorting by the index of the column the 1 is in.
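The same row ordering can be obtained without Python-level sorting by combining argmax with argsort. This is a sketch that assumes each row's largest entry is its 1 and that every column holds exactly one such 1; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1. , 0.5, 0.5, 0. ],
              [0.5, 0.5, 1. , 0. ],
              [0. , 1. , 0. , 0.5],
              [0. , 0.5, 0.5, 1. ]])

# Column position of the 1 in each row, then the row order that sorts them.
order = np.argsort(np.argmax(a, axis=1))
sorted_rows = a[order]

print(order)
print(np.diag(sorted_rows))
```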
