How to scale each column of a matrix - python

This is how I scale a single vector:
vector = np.array([-4, -3, -2, -1, 0])
# pass the vector, current range of values, the desired range, and it returns the scaled vector
scaledVector = np.interp(vector, (vector.min(), vector.max()), (-1, +1)) # results in [-1. -0.5 0. 0.5 1. ]
How can I apply the above approach to each column of a given matrix?
matrix = np.array(
[[-4, -4, 0, 0, 0],
[-3, -3, 1, -15, 0],
[-2, -2, 8, -1, 0],
[-1, -1, 11, 12, 0],
[0, 0, 50, 69, 80]])
scaledMatrix = [insert code that scales each column of the matrix]
Note that the first two columns of the scaledMatrix should be equal to the scaledVector from the first example. For the matrix above, the correctly computed scaledMatrix is:
[[-1. -1. -1. -0.64285714 -1. ]
[-0.5 -0.5 -0.96 -1. -1. ]
[ 0. 0. -0.68 -0.66666667 -1. ]
[ 0.5 0.5 -0.56 -0.35714286 -1. ]
[ 1. 1. 1. 1. 1. ]]
My current approach (wrong):
np.interp(matrix, (np.min(matrix), np.max(matrix)), (-1, +1))

If you want to do it by hand and understand what's going on:
First subtract the column-wise mins so that each column has min 0.
Then divide by the column-wise amplitude (max - min) so that each column has max 1.
Now each column is between 0 and 1. If you want it to be between -1 and 1, multiply by 2 and subtract 1:
In [3]: mins = np.min(matrix, axis=0)
In [4]: maxs = np.max(matrix, axis=0)
In [5]: (matrix - mins[None, :]) / (maxs[None, :] - mins[None, :])
Out[5]:
array([[ 0. , 0. , 0. , 0.17857143, 0. ],
[ 0.25 , 0.25 , 0.02 , 0. , 0. ],
[ 0.5 , 0.5 , 0.16 , 0.16666667, 0. ],
[ 0.75 , 0.75 , 0.22 , 0.32142857, 0. ],
[ 1. , 1. , 1. , 1. , 1. ]])
In [6]: 2 * _ - 1
Out[6]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])
I use [None, :] to make it explicit that mins and maxs are "row vectors", not column ones. NumPy broadcasting aligns a 1-D array with the last axis anyway, so the indexing is optional here.
Otherwise, use the wonderful sklearn package, whose preprocessing module has lots of useful transformers:
In [13]: from sklearn.preprocessing import MinMaxScaler
In [14]: scaler = MinMaxScaler(feature_range=(-1, 1))
In [15]: scaler.fit(matrix)
Out[15]: MinMaxScaler(copy=True, feature_range=(-1, 1))
In [16]: scaler.transform(matrix)
Out[16]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])
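For reuse, the by-hand steps above can be wrapped in a small function. This is only a sketch: the name scale_columns and its low/high parameters are made up for illustration, and mapping a constant column to low (rather than dividing by zero) is an assumption, not something either approach above specifies. The last statement shows the question's own np.interp call applied one column at a time, which should give the same result as long as no column is constant.

```python
import numpy as np

def scale_columns(matrix, low=-1.0, high=1.0):
    """Min-max scale every column of a 2D array into [low, high]."""
    matrix = np.asarray(matrix, dtype=float)
    mins = matrix.min(axis=0)
    spans = matrix.max(axis=0) - mins
    # Assumption: a constant column (span 0) is mapped to `low`
    # instead of triggering a division by zero.
    safe_spans = np.where(spans == 0, 1.0, spans)
    return (matrix - mins) / safe_spans * (high - low) + low

matrix = np.array([[-4, -4, 0, 0, 0],
                   [-3, -3, 1, -15, 0],
                   [-2, -2, 8, -1, 0],
                   [-1, -1, 11, 12, 0],
                   [0, 0, 50, 69, 80]])
scaled = scale_columns(matrix)

# The question's np.interp call, applied to each column in turn
# (needs col.min() < col.max() for every column).
per_column = np.column_stack(
    [np.interp(col, (col.min(), col.max()), (-1, 1)) for col in matrix.T]
)
```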

Related

Iterate over rows, and perform addition

So, here I have a numpy array, array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]]). My plan is to add a vector (say [1,2,3]) to every row of this array and then append the result to the end of it, i.e. add another three rows. I want to repeat this process, say 5 times, so that each time the vector is added only to the last three rows, which were the result of the previous addition. Any suggestions?
Just use np.append along the first axis:
import numpy as np
a = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
v = np.array([1, 2, 3])
new_a = np.append(a, a+v, axis=0)
For the addition part, just write something like a[0]+[1,2,3] (where a is your array); numpy performs the addition element-wise, as expected.
For appending, a=np.append(a, [line], axis=0) is what you're looking for, where line is the new row you want to add, for example the result of the previous sum.
The iteration is easy to repeat by selecting the last three rows with negative indexing: a[-3:] (or a[-1], a[-2] and a[-3] individually) always picks the last three rows.
If you really need to keep results within a single array, a better option is to create it at the beginning and perform operations you need on it.
arr = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
vector = np.array([1,2,3])
N = 5
multiarr = np.tile(arr, (1,N))
>>> multiarr
array([[-1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. ],
[ 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. ],
[ 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. ]])
multivector = (vector * np.arange(N)[:, None]).ravel()
>>> multivector
array([ 0, 0, 0, 1, 2, 3, 2, 4, 6, 3, 6, 9, 4, 8, 12])
>>> multiarr + multivector
array([[-1.228, 0.709, 0. , -0.228, 2.709, 3. , 0.772, 4.709, 6. , 1.772, 6.709, 9. , 2.772, 8.709, 12. ],
[ 0. , 2.836, 0. , 1. , 4.836, 3. , 2. , 6.836, 6. , 3. , 8.836, 9. , 4. , 10.836, 12. ],
[ 1.228, 0.709, 0. , 2.228, 2.709, 3. , 3.228, 4.709, 6. , 4.228, 6.709, 9. , 5.228, 8.709, 12. ]])
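If the blocks should end up stacked as extra rows, as the question describes, the repetition can be sketched as below. It assumes "5 times" means five appended blocks on top of the original three rows; the names looped and stacked are illustrative only. The loop follows the question literally, and the broadcast version relies on block k being the original array plus k times the vector.

```python
import numpy as np

a = np.array([[-1.228, 0.709, 0.],
              [0., 2.836, 0.],
              [1.228, 0.709, 0.]])
v = np.array([1, 2, 3])
N = 5  # assumed number of repetitions

# Literal version: keep adding v to the most recent 3-row block.
blocks = [a]
for _ in range(N):
    blocks.append(blocks[-1] + v)
looped = np.vstack(blocks)                   # shape (3 * (N + 1), 3)

# Broadcast version: block k equals a + k * v.
steps = np.arange(N + 1)[:, None, None]      # shape (N + 1, 1, 1)
stacked = (a[None, :, :] + steps * v).reshape(-1, a.shape[1])
```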

Make elements with value division by zero equal to zero in a 2D numpy array

I have a code snippet:
import numpy as np
x1 = [[1,4,2,1],
[1,1,4,5],
[0.5,0.3, 1,6],
[0.8,0.2,0.7,1]]
x2 = [[7,0,2,3],
[8,0,4,5],
[0.1,0, 2,6],
[0.1,0,0.16666667,6]]
np.true_divide(x1, x2)
The output is:
array([[0.14285714, inf, 1. , 0.33333333],
[0.125 , inf, 1. , 1. ],
[5. , inf, 0.5 , 1. ],
[8. , inf, 4.19999992, 0.16666667]])
I am aware that some elements involve division by zero, which shows up as 'inf' in the result.
How can I use 'try and except' to change all these 'inf' results into 0? Or is there a better method to convert all those 'inf's into 0?
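You can use numpy.where to choose, element by element, between 0 (where the divisor is zero) and the division result. Note that x1/x2 is still evaluated everywhere, so NumPy may emit a divide-by-zero RuntimeWarning: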
import numpy as np
x1 = np.array([[1,4,2,1],
[1,1,4,5],
[0.5,0.3, 1,6],
[0.8,0.2,0.7,1]])
x2 = np.array([[7,0,2,3],
[8,0,4,5],
[0.1,0, 2,6],
[0.1,0,0.16666667,6]])
np.where(x2==0, 0, x1/x2)
# or
# np.where(x2==0, x2, np.true_divide(x1, x2))
Output:
array([[0.14285714, 0. , 1. , 0.33333333],
[0.125 , 0. , 1. , 1. ],
[5. , 0. , 0.5 , 1. ],
[8. , 0. , 4.19999992, 0.16666667]])
To also handle 0/0, add invalid='ignore' to numpy.errstate() and use numpy.nan_to_num() to convert np.nan to 0:
with np.errstate(divide='ignore', invalid='ignore'):
    c = np.true_divide(x1, x2)
    c[c == np.inf] = 0
    c = np.nan_to_num(c)
print(c)
Output
[[0.14285714 0. 1. 0.33333333]
[0.125 0. 1. 1. ]
[5. 0. 0.5 1. ]
[8. 0. 4.19999992 0.16666667]]
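Another option, sketched here as an alternative to the answers above, is to skip the zero divisors altogether using the out and where arguments of np.divide. Positions where the condition is False keep whatever out already holds, so pre-filling out with zeros gives 0 there without computing inf or nan in the first place. The variable name result is illustrative.

```python
import numpy as np

x1 = np.array([[1, 4, 2, 1],
               [1, 1, 4, 5],
               [0.5, 0.3, 1, 6],
               [0.8, 0.2, 0.7, 1]], dtype=float)
x2 = np.array([[7, 0, 2, 3],
               [8, 0, 4, 5],
               [0.1, 0, 2, 6],
               [0.1, 0, 0.16666667, 6]], dtype=float)

# Divide only where x2 is non-zero; the other slots keep the 0 from `out`.
result = np.divide(x1, x2, out=np.zeros_like(x1), where=(x2 != 0))
```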

How to compute a spatial distance matrix from a given value

I've been looking for a way to (efficiently) compute a distance matrix from a target value and an input matrix.
If you consider an input array as:
[0 0 1 2 5 2 1]
[0 0 2 3 5 2 1]
[0 1 1 2 5 4 1]
[1 1 1 2 5 4 0]
How do you compute the spatial distance matrix associated with the target value 0?
i.e. what is the distance from each pixel to the closest 0 value?
Thanks in advance
You are looking for scipy.ndimage.distance_transform_edt (also reachable through the older scipy.ndimage.morphology namespace, which newer SciPy releases deprecate). It operates on a binary array and computes the Euclidean distance from each TRUE position to the nearest background (FALSE) position. Here we want distances to the nearest 0, so the background is 0. Under the hood, the function converts its input to a binary array, treating 0 as the background, so we can use it with the default parameters. Hence, it is as simple as:
In [179]: a
Out[179]:
array([[0, 0, 1, 2, 5, 2, 1],
[0, 0, 2, 3, 5, 2, 1],
[0, 1, 1, 2, 5, 4, 1],
[1, 1, 1, 2, 5, 4, 0]])
In [180]: from scipy import ndimage
In [181]: ndimage.distance_transform_edt(a)
Out[181]:
array([[0. , 0. , 1. , 2. , 3. , 3.16, 3. ],
[0. , 0. , 1. , 2. , 2.83, 2.24, 2. ],
[0. , 1. , 1.41, 2.24, 2.24, 1.41, 1. ],
[1. , 1.41, 2.24, 2.83, 2. , 1. , 0. ]])
Solving for generic case
Now, let's say we want to find out distances from nearest 1s, then it would be -
In [183]: background = 1 # element from which distances are to be computed
# compare this with original array, a to verify
In [184]: ndimage.distance_transform_edt(a!=background)
Out[184]:
array([[2. , 1. , 0. , 1. , 2. , 1. , 0. ],
[1.41, 1. , 1. , 1.41, 2. , 1. , 0. ],
[1. , 0. , 0. , 1. , 2. , 1. , 0. ],
[0. , 0. , 0. , 1. , 2. , 1.41, 1. ]])
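To sanity-check what the distance transform returns on a small array, the same distances can be computed by brute force with plain NumPy. This is a rough sketch, not a replacement for the SciPy function: brute_force_distance is a hypothetical helper, it assumes at least one pixel equals the target, and its memory use grows with (number of pixels) x (number of target pixels), so it only suits small inputs.

```python
import numpy as np

def brute_force_distance(a, target=0):
    """Euclidean distance from every pixel to the nearest pixel equal to `target`."""
    a = np.asarray(a)
    # (row, col) of every pixel, in row-major order.
    coords = np.argwhere(np.ones(a.shape, dtype=bool))
    # (row, col) of every pixel holding the target value (assumed non-empty).
    targets = np.argwhere(a == target)
    diffs = coords[:, None, :] - targets[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1).reshape(a.shape)

a = np.array([[0, 0, 1, 2, 5, 2, 1],
              [0, 0, 2, 3, 5, 2, 1],
              [0, 1, 1, 2, 5, 4, 1],
              [1, 1, 1, 2, 5, 4, 0]])
dist_to_zero = brute_force_distance(a)
dist_to_one = brute_force_distance(a, target=1)
```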

Explanation on Numpy Broadcasting Answer

I recently posted a question here which was answered exactly as I asked. However, I think I overestimated my ability to manipulate the answer further. I read the broadcasting doc, and followed a few links that led me way back to 2002 about numpy broadcasting.
I've used the second method of array creation using broadcasting:
N = 10
out = np.zeros((N**3,4),dtype=int)
out[:,:3] = (np.arange(N**3)[:,None]/[N**2,N,1])%N
which outputs:
[[0,0,0,0]
[0,0,1,0]
...
[0,1,0,0]
[0,1,1,0]
...
[9,9,8,0]
[9,9,9,0]]
but I do not understand via the docs how to manipulate that. I would ideally like to be able to set the increments in which each individual column changes.
ex. Column A changes by 0.5 up to 2, column B changes by 0.2 up to 1, and column C changes by 1 up to 10.
[[0,0,0,0]
[0,0,1,0]
...
[0,0,9,0]
[0,0.2,0,0]
...
[0,0.8,9,0]
[0.5,0,0,0]
...
[1.5,0.8,9,0]]
Thanks for any help.
You can adjust your current code just a little bit to make it work.
>>> out = np.zeros((4*5*10,4))
>>> out[:,:3] = (np.arange(4*5*10)[:,None]//(5*10, 10, 1)*(0.5, 0.2, 1)%(2, 1, 10))
>>> out
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 2. , 0. ],
...
[ 0. , 0. , 8. , 0. ],
[ 0. , 0. , 9. , 0. ],
[ 0. , 0.2, 0. , 0. ],
...
[ 0. , 0.8, 9. , 0. ],
[ 0.5, 0. , 0. , 0. ],
...
[ 1.5, 0.8, 9. , 0. ]])
The changes are:
No int dtype on the array, since we need it to hold floats in some columns. You could specify a float dtype if you want (or even something more complicated that only allows floats in the first two columns).
Rather than N**3 total values, figure out the number of distinct values for each column, and multiply them together to get our total size. This is used for both zeros and arange.
Use the floor division // operator in the first broadcast operation because we want integers at this point, but later we'll want floats.
The values to divide by are again based on the number of values for the later columns (e.g. for A,B,C numbers of values, divide by B*C, C, 1).
Add a new broadcast operation to multiply by various scale factors (how much each value increases at once).
Change the values in the broadcast mod % operation to match the bounds on each column.
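The changes listed above can also be written with the per-column counts and steps spelled out. The sketch below is a variation rather than the exact line from the answer: it takes the modulo on the integer indices first and multiplies by the step last, which avoids taking a floating-point remainder. The names counts, steps, divisors and idx are illustrative.

```python
import numpy as np

counts = np.array([4, 5, 10])       # distinct values in columns A, B, C
steps = np.array([0.5, 0.2, 1.0])   # increment of each column

# Divisor of a column = product of the counts of the columns to its right.
divisors = np.append(np.cumprod(counts[::-1])[::-1][1:], 1)    # [50, 10, 1]

# Integer index of each column's value for every row of the output.
idx = np.arange(counts.prod())[:, None] // divisors % counts

out = np.zeros((counts.prod(), 4))
out[:, :3] = idx * steps
```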
This small example helps me understand what is going on:
In [123]: N=2
In [124]: np.arange(N**3)[:,None]/[N**2, N, 1]
Out[124]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 2. ],
[ 0.75, 1.5 , 3. ],
[ 1. , 2. , 4. ],
[ 1.25, 2.5 , 5. ],
[ 1.5 , 3. , 6. ],
[ 1.75, 3.5 , 7. ]])
So we generate a range of numbers (0 to 7) and divide them by 4, 2, and 1.
The rest of the calculation just changes each value without further broadcasting.
Apply %N to each element:
In [126]: np.arange(N**3)[:,None]/[N**2, N, 1]%N
Out[126]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 0. ],
[ 0.75, 1.5 , 1. ],
[ 1. , 0. , 0. ],
[ 1.25, 0.5 , 1. ],
[ 1.5 , 1. , 0. ],
[ 1.75, 1.5 , 1. ]])
Assigning to an int array is the same as converting the floats to integers:
In [127]: (np.arange(N**3)[:,None]/[N**2, N, 1]%N).astype(int)
Out[127]:
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]])

Sort a numpy matrix based on its diagonal

I have a matrix that should have ones on the diagonal but the columns are mixed up.
But I don't know how, without the obvious for loop, to efficiently interchange rows to get unity on the diagonals. I'm not even sure what key I would pass to sort on.
Any suggestions?
You can use numpy's argmax to determine the goal column ordering and reorder your matrix using the argmax results as column indices:
>>> import numpy
>>> z = numpy.array([[ 0.1 , 0.1 , 1. ],
... [ 1. , 0.1 , 0.09],
... [ 0.1 , 1. , 0.2 ]])
>>> numpy.argmax(z, axis=1)  # goal column indices
array([2, 0, 1])
>>> z[:, numpy.argmax(z, axis=1)]
array([[ 1. , 0.1 , 0.1 ],
[ 0.09, 1. , 0.1 ],
[ 0.2 , 0.1 , 1. ]])
>>> import numpy as np
>>> a = np.array([[ 1. , 0.5, 0.5, 0. ],
... [ 0.5, 0.5, 1. , 0. ],
... [ 0. , 1. , 0. , 0.5],
... [ 0. , 0.5, 0.5, 1. ]])
>>> np.array(sorted(a, key=lambda row: list(row).index(1)))
array([[ 1. , 0.5, 0.5, 0. ],
[ 0. , 1. , 0. , 0.5],
[ 0.5, 0.5, 1. , 0. ],
[ 0. , 0.5, 0.5, 1. ]])
It actually sorts by rows, not columns (but the result is the same). It works by sorting by the index of the column the 1 is in.
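The same row sort can be done without a Python-level sort by combining argmax and argsort. This is a sketch under the assumption that the largest entry of each row is its 1 and that every column holds exactly one such 1; the names one_positions and sorted_rows are illustrative.

```python
import numpy as np

a = np.array([[1., 0.5, 0.5, 0.],
              [0.5, 0.5, 1., 0.],
              [0., 1., 0., 0.5],
              [0., 0.5, 0.5, 1.]])

# Column index of the largest entry in each row, i.e. where that row's 1 sits.
one_positions = np.argmax(a, axis=1)

# Reorder the rows so that those positions are increasing.
sorted_rows = a[np.argsort(one_positions)]
```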
