Sort a numpy matrix based on its diagonal - python

I have a matrix that should have ones on the diagonal but the columns are mixed up.
But I don't know how, without the obvious for loop, to efficiently interchange rows to get unity on the diagonals. I'm not even sure what key I would pass to sort on.
Any suggestions?

You can use numpy's argmax to determine the goal column ordering and reorder your matrix using the argmax results as column indices:
>>> import numpy
>>> z = numpy.array([[ 0.1 , 0.1 , 1.  ],
...                  [ 1.  , 0.1 , 0.09],
...                  [ 0.1 , 1.  , 0.2 ]])
>>> numpy.argmax(z, axis=1)  # goal column indices
array([2, 0, 1])
>>> z[:, numpy.argmax(z, axis=1)]
array([[ 1.  ,  0.1 ,  0.1 ],
       [ 0.09,  1.  ,  0.1 ],
       [ 0.2 ,  0.1 ,  1.  ]])

>>> import numpy as np
>>> a = np.array([[ 1. , 0.5, 0.5, 0. ],
... [ 0.5, 0.5, 1. , 0. ],
... [ 0. , 1. , 0. , 0.5],
... [ 0. , 0.5, 0.5, 1. ]])
>>> # sorted() has no cmp argument in Python 3; a key function does the same job
>>> np.array(sorted(a, key=lambda row: list(row).index(1)))
array([[ 1. , 0.5, 0.5, 0. ],
[ 0. , 1. , 0. , 0.5],
[ 0.5, 0.5, 1. , 0. ],
[ 0. , 0.5, 0.5, 1. ]])
This reorders rows rather than columns, but the result still has ones on the diagonal. It works by sorting the rows by the index of the column that holds the 1.
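The same reordering can be done without a Python-level sort. The following is a minimal sketch, assuming each row has its largest entry (the 1) in a distinct column; the permutation check is an addition for illustration and is not part of the answers above.

```python
import numpy as np

a = np.array([[1. , 0.5, 0.5, 0. ],
              [0.5, 0.5, 1. , 0. ],
              [0. , 1. , 0. , 0.5],
              [0. , 0.5, 0.5, 1. ]])

# Column index of the largest entry in each row (where the 1 sits).
# argmax returns the first maximum if a row has ties.
one_cols = np.argmax(a, axis=1)

# Assumption: every column is hit exactly once, i.e. one_cols is a permutation.
assert np.array_equal(np.sort(one_cols), np.arange(a.shape[0]))

# Reorder rows so that the row whose 1 is in column k ends up at position k.
by_rows = a[np.argsort(one_cols)]

# Alternatively, reorder columns so that each row's 1 moves onto the diagonal.
by_cols = a[:, one_cols]
```

Both results have a unit diagonal, although they are generally different matrices.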

Related

How do I convert numpy mgrid function as a function?

Here is the way how numpy.mgrid is used.
grid = np.mgrid[x1:y1:100j , x2:y2:100j, ..., xn:yn:100j]
However, I find this structure very irritating. Therefore, I would like to create function f which works as follows:
f([(x1,y1,100),...,(xn,yn,100)]) = np.mgrid[x1:y1:100j , x2:y2:100j, ..., xn:yn:100j]
How can I create f?
(Here is the source code for np.mgrid)
Just loop over each item passed to f and make a slice out of it with slice, and to get 100j from 100, multiply 100 by 1j:
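def f(items):
    # each item is (start, stop, count); count * 1j is the imaginary step mgrid expects
    slices = [slice(i[0], i[1], 1j * i[2]) for i in items]
    # a tuple of slices is what mgrid[a:b:nj, c:d:mj] receives
    return np.mgrid[tuple(slices)]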
Output:
>>> np.all( f([(1,2,5), (2,3,5)]) == np.mgrid[1:2:5j, 2:3:5j] )
True
You could make calling the function even simpler by using *items instead of items:
def f(*items):
    # each positional argument is one (start, stop, count) triple
    slices = [slice(i[0], i[1], 1j * i[2]) for i in items]
    return np.mgrid[tuple(slices)]
Output:
>>> np.all( f([1,2,5], [2,3,5]) == np.mgrid[1:2:5j, 2:3:5j] )
True
mgrid is an instance of a cute class that lets us use indexing notation. Under the covers it uses np.linspace (or np.arange) to generate the ranges.
In [29]: x1,y1 = 0,1; x2,y2 = 1,3
In [30]: np.mgrid[x1:y1:3j, x2:y2:4j]
Out[30]:
array([[[0. , 0. , 0. , 0. ],
[0.5 , 0.5 , 0.5 , 0.5 ],
[1. , 1. , 1. , 1. ]],
[[1. , 1.66666667, 2.33333333, 3. ],
[1. , 1.66666667, 2.33333333, 3. ],
[1. , 1.66666667, 2.33333333, 3. ]]])
meshgrid is the function equivalent. I suspect it was the original function, and mgrid (and ogrid) were secondary versions:
In [31]: np.meshgrid(np.linspace(x1,y1,3), np.linspace(x2,y2,4), indexing='ij')
Out[31]:
[array([[0. , 0. , 0. , 0. ],
[0.5, 0.5, 0.5, 0.5],
[1. , 1. , 1. , 1. ]]),
array([[1. , 1.66666667, 2.33333333, 3. ],
[1. , 1.66666667, 2.33333333, 3. ],
[1. , 1.66666667, 2.33333333, 3. ]])]
mgrid creates a single n-d array; meshgrid returns a list of separate arrays. Otherwise they are equivalent. np.array(Out[31]) converts that list into the same array that mgrid produces.
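Putting the two observations together, the function the question asks for can also be written on top of linspace and meshgrid. This is a rough sketch rather than a definitive version; the helper name `grid_from_ranges` is made up for illustration.

```python
import numpy as np

def grid_from_ranges(ranges):
    """Dense grid from (start, stop, count) triples, built with meshgrid.

    `grid_from_ranges` is a hypothetical helper name used for illustration.
    """
    axes = [np.linspace(start, stop, count) for start, stop, count in ranges]
    # indexing='ij' matches mgrid's axis order; stacking gives one n-d array.
    return np.stack(np.meshgrid(*axes, indexing='ij'))

grid = grid_from_ranges([(0, 1, 3), (1, 3, 4)])
```

Here `grid` should hold the same values as np.mgrid[0:1:3j, 1:3:4j] for the example ranges used above.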
sparse versions
ogrid produces a "sparse" pair of arrays that, with broadcasting, functions the same way:
In [37]: np.ogrid[x1:y1:3j, x2:y2:4j]
Out[37]:
[array([[0. ],
[0.5],
[1. ]]),
array([[1. , 1.66666667, 2.33333333, 3. ]])]
meshgrid has an equivalent sparse mode:
In [38]: np.meshgrid(np.linspace(x1,y1,3), np.linspace(x2,y2,4), indexing='ij',
...: sparse=True)
Out[38]:
[array([[0. ],
[0.5],
[1. ]]),
array([[1. , 1.66666667, 2.33333333, 3. ]])]
We can create the same pair of arrays with:
In [39]: np.ix_(np.linspace(x1,y1,3), np.linspace(x2,y2,4))
Out[39]:
(array([[0. ],
[0.5],
[1. ]]),
array([[1. , 1.66666667, 2.33333333, 3. ]]))
or even:
In [40]: (np.linspace(x1,y1,3)[:,None], np.linspace(x2,y2,4)[None,:])
Out[40]:
(array([[0. ],
[0.5],
[1. ]]),
array([[1. , 1.66666667, 2.33333333, 3. ]]))
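To see why the sparse form "functions the same way", it may help to broadcast the sparse pair explicitly. A small sketch using the same ranges as above (the variable names are illustrative only):

```python
import numpy as np

dense = np.mgrid[0:1:3j, 1:3:4j]
sparse_x, sparse_y = np.ogrid[0:1:3j, 1:3:4j]

# The sparse arrays have shapes (3, 1) and (1, 4);
# broadcasting them against each other yields the full (3, 4) grids.
full_x, full_y = np.broadcast_arrays(sparse_x, sparse_y)

# An element-wise expression gives the same result with either form.
dense_sum = dense[0] + dense[1]
sparse_sum = sparse_x + sparse_y
```

The sparse pair stores only 3 + 4 values instead of 2 * 3 * 4, which is the point of using it when the grid only feeds element-wise expressions.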

Make elements with value division by zero equal to zero in a 2D numpy array

I have a code snippet:
import numpy as np
x1 = [[1,4,2,1],
[1,1,4,5],
[0.5,0.3, 1,6],
[0.8,0.2,0.7,1]]
x2 = [[7,0,2,3],
[8,0,4,5],
[0.1,0, 2,6],
[0.1,0,0.16666667,6]]
np.true_divide(x1, x2)
The output is:
array([[0.14285714, inf, 1. , 0.33333333],
[0.125 , inf, 1. , 1. ],
[5. , inf, 0.5 , 1. ],
[8. , inf, 4.19999992, 0.16666667]])
I am aware that some elements will have zerodivision error which can be seen as 'inf'.
How can I use 'try and except' to change all these 'inf' results into 0? Or is there a better method to convert all those 'inf's into 0?
You can use numpy.where to return 0 wherever the divisor is zero and the division result everywhere else:
import numpy as np
x1 = np.array([[1,4,2,1],
[1,1,4,5],
[0.5,0.3, 1,6],
[0.8,0.2,0.7,1]])
x2 = np.array([[7,0,2,3],
[8,0,4,5],
[0.1,0, 2,6],
[0.1,0,0.16666667,6]])
np.where(x2==0, 0, x1/x2)
# or
# np.where(x2==0, x2, np.true_divide(x1, x2))
Output:
array([[0.14285714, 0. , 1. , 0.33333333],
[0.125 , 0. , 1. , 1. ],
[5. , 0. , 0.5 , 1. ],
[8. , 0. , 4.19999992, 0.16666667]])
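Note that np.where still evaluates x1/x2 for every element, so the division by zero is performed (and may trigger a RuntimeWarning) before the result is discarded. One way to skip those positions entirely is the `where` and `out` arguments of np.divide. A minimal sketch with the same data:

```python
import numpy as np

x1 = np.array([[1, 4, 2, 1],
               [1, 1, 4, 5],
               [0.5, 0.3, 1, 6],
               [0.8, 0.2, 0.7, 1]])
x2 = np.array([[7, 0, 2, 3],
               [8, 0, 4, 5],
               [0.1, 0, 2, 6],
               [0.1, 0, 0.16666667, 6]])

# Start from zeros and divide only where the divisor is nonzero;
# positions where the condition is False keep the value already in `out`.
result = np.divide(x1, x2, out=np.zeros_like(x1, dtype=float), where=(x2 != 0))
```

Passing `out` matters here: without it, the skipped positions would be left uninitialized rather than set to 0.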
Adding invalid='ignore' to numpy.errstate() also suppresses the warning for 0/0, and numpy.nan_to_num() converts the resulting np.nan to 0:
with np.errstate(divide='ignore', invalid='ignore'):
    c = np.true_divide(x1, x2)
    c[c == np.inf] = 0      # x/0 with x > 0 gives inf
    c = np.nan_to_num(c)    # 0/0 gives nan, which becomes 0
print(c)
Output
[[0.14285714 0. 1. 0.33333333]
[0.125 0. 1. 1. ]
[5. 0. 0.5 1. ]
[8. 0. 4.19999992 0.16666667]]
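One caveat: the comparison c == np.inf does not match -inf (a negative number divided by zero), and nan_to_num would turn -inf into a very large negative number rather than 0. If negative numerators are possible, a single np.isfinite mask covers inf, -inf and nan together. A sketch, with `safe_divide` as a made-up helper name:

```python
import numpy as np

def safe_divide(num, den):
    """Element-wise num / den, with every non-finite result replaced by 0.

    `safe_divide` is a hypothetical helper name used for illustration.
    It expects array-like inputs with at least one dimension.
    """
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        out = np.true_divide(num, den)
    # isfinite is False for inf, -inf and nan, so one mask covers all three.
    out[~np.isfinite(out)] = 0
    return out

quotient = safe_divide([1, -1, 0, 4], [0, 0, 0, 2])
```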

How can I multiply items in two arrays if there are zeros in python?

I have two arrays in python.
For a, it looks like
array([[0. , 0.08],
[0.12, 0. ],
[0.12, 0.08]])
For b, it looks like
array([[0.88, 0. ],
[0. , 0.92],
[0. , 0. ]])
I want to do the multiplication for these two arrays like below:
array([[0.08*0.88], ### 1st row of a multiplies 1st row of b without zeros
[0.12*0.92], ### 2nd row of a multiplies 2nd row of b without zeros
[0.12*0.08]]) ### 3rd row of b is all zeros, so multiply 0.12 and 0.08 from the 3rd row of a
And the final desired result is:
array([[0.0704],
[0.1104],
[0.0096]])
How can I achieve this? I could really use your help.
Just replace zero values by 1 on both the arrays, then pass a*b to np.prod with axis=1, and keepdims=True:
>>> a[a==0] = 1
>>> b[b==0] = 1
>>> np.prod(a*b, axis=1, keepdims=True)
#output:
array([[0.0704],
[0.1104],
[0.0096]])
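The assignments a[a==0] = 1 and b[b==0] = 1 overwrite the original arrays. If a and b are needed afterwards, np.where can build the filled copies instead. A minimal sketch of that variation:

```python
import numpy as np

a = np.array([[0.  , 0.08],
              [0.12, 0.  ],
              [0.12, 0.08]])
b = np.array([[0.88, 0.  ],
              [0.  , 0.92],
              [0.  , 0.  ]])

# np.where returns new arrays, so the zeros in a and b are left untouched.
a_filled = np.where(a == 0, 1, a)
b_filled = np.where(b == 0, 1, b)

result = np.prod(a_filled * b_filled, axis=1, keepdims=True)
```

With either version, a row that is all zeros in both arrays would come out as 1 rather than 0, so such rows need separate handling if they can occur.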
Consider the following strategy:
a = np.array([[0. , 0.08],
[0.12, 0. ],
[0.12, 0.08]])
b = np.array([[0.88, 0. ],
[0. , 0.92],
[0. , 0. ]])
c = np.hstack([a, b]) # stick a and b together along axis 1
d = np.where(c == 0, 1, c) # turn the 0s into 1s
result = np.prod(d, axis=1) # calculate the product along axis 1
# array([0.0704, 0.1104, 0.0096])
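You can do it like this (arr1 and arr2 are the arrays a and b from the question). This relies on every row of the combined array having exactly two nonzero values, both positive, so that they end up in the last two columns after sorting: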
# First concatenate both the arrays
temp = np.concatenate((arr1, arr2), axis=1)
'''
the result will be like this
array([[0. , 0.08, 0.88, 0. ],
[0.12, 0. , 0. , 0.92],
[0.12, 0.08, 0. , 0. ]])
'''
# Sort the arrays
temp.sort()
'''
result: array([[0. , 0. , 0.08, 0.88],
[0. , 0. , 0.12, 0.92],
[0. , 0. , 0.08, 0.12]])
'''
res = temp[:, -1] * temp[:, -2]
'''
result: array([0.0704, 0.1104, 0.0096])
'''

How to scale each column of a matrix

This is how I scale a single vector:
vector = np.array([-4, -3, -2, -1, 0])
# pass the vector, current range of values, the desired range, and it returns the scaled vector
scaledVector = np.interp(vector, (vector.min(), vector.max()), (-1, +1)) # results in [-1. -0.5 0. 0.5 1. ]
How can I apply the above approach to each column of a given matrix?
matrix = np.array(
[[-4, -4, 0, 0, 0],
[-3, -3, 1, -15, 0],
[-2, -2, 8, -1, 0],
[-1, -1, 11, 12, 0],
[0, 0, 50, 69, 80]])
scaledMatrix = [insert code that scales each column of the matrix]
Note that the first two columns of the scaledMatrix should be equal to the scaledVector from the first example. For the matrix above, the correctly computed scaledMatrix is:
[[-1. -1. -1. -0.64285714 -1. ]
[-0.5 -0.5 -0.96 -1. -1. ]
[ 0. 0. -0.68 -0.66666667 -1. ]
[ 0.5 0.5 -0.56 -0.35714286 -1. ]
[ 1. 1. 1. 1. 1. ]]
My current approach (wrong):
np.interp(matrix, (np.min(matrix), np.max(matrix)), (-1, +1))
If you want to do it by hand and understand what's going on:
First subtract the column-wise minimums so that each column has a minimum of 0.
Then divide by the column-wise range (max - min) so that each column has a maximum of 1.
Now each column lies between 0 and 1. To map it to between -1 and 1, multiply by 2 and subtract 1:
In [3]: mins = np.min(matrix, axis=0)
In [4]: maxs = np.max(matrix, axis=0)
In [5]: (matrix - mins[None, :]) / (maxs[None, :] - mins[None, :])
Out[5]:
array([[ 0. , 0. , 0. , 0.17857143, 0. ],
[ 0.25 , 0.25 , 0.02 , 0. , 0. ],
[ 0.5 , 0.5 , 0.16 , 0.16666667, 0. ],
[ 0.75 , 0.75 , 0.22 , 0.32142857, 0. ],
[ 1. , 1. , 1. , 1. , 1. ]])
In [6]: 2 * _ - 1
Out[6]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])
I use [None, :] for numpy to understand that I'm talking about "row vectors", not column ones.
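The manual steps can be wrapped into one function. This is only a sketch: the name `scale_columns` is invented here, and the handling of constant columns (where max equals min and the division would be 0/0) is an assumption that the steps above do not address.

```python
import numpy as np

def scale_columns(matrix, low=-1.0, high=1.0):
    """Rescale each column of `matrix` linearly to the range [low, high].

    `scale_columns` is a hypothetical helper name. Assumption: a constant
    column (max == min) is mapped to `low`.
    """
    matrix = np.asarray(matrix, dtype=float)
    mins = matrix.min(axis=0)
    spans = matrix.max(axis=0) - mins
    # Avoid 0/0 for constant columns by dividing by 1 there instead.
    safe_spans = np.where(spans == 0, 1, spans)
    unit = (matrix - mins) / safe_spans  # each column now lies in [0, 1]
    return unit * (high - low) + low

matrix = np.array([[-4, -4,  0,   0,  0],
                   [-3, -3,  1, -15,  0],
                   [-2, -2,  8,  -1,  0],
                   [-1, -1, 11,  12,  0],
                   [ 0,  0, 50,  69, 80]])
scaled = scale_columns(matrix)
```

Broadcasting aligns the 1-D `mins` and `spans` with the last axis of `matrix`, so the explicit [None, :] is optional here.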
Otherwise, use the wonderful sklearn package, whose preprocessing module has lots of useful transformers:
In [13]: from sklearn.preprocessing import MinMaxScaler
In [14]: scaler = MinMaxScaler(feature_range=(-1, 1))
In [15]: scaler.fit(matrix)
Out[15]: MinMaxScaler(copy=True, feature_range=(-1, 1))
In [16]: scaler.transform(matrix)
Out[16]:
array([[-1. , -1. , -1. , -0.64285714, -1. ],
[-0.5 , -0.5 , -0.96 , -1. , -1. ],
[ 0. , 0. , -0.68 , -0.66666667, -1. ],
[ 0.5 , 0.5 , -0.56 , -0.35714286, -1. ],
[ 1. , 1. , 1. , 1. , 1. ]])

Explanation on Numpy Broadcasting Answer

I recently posted a question here which was answered exactly as I asked. However, I think I overestimated my ability to manipulate the answer further. I read the broadcasting doc, and followed a few links that led me way back to 2002 about numpy broadcasting.
I've used the second method of array creation using broadcasting:
N = 10
out = np.zeros((N**3,4),dtype=int)
out[:,:3] = (np.arange(N**3)[:,None]/[N**2,N,1])%N
which outputs:
[[0,0,0,0]
[0,0,1,0]
...
[0,1,0,0]
[0,1,1,0]
...
[9,9,8,0]
[9,9,9,0]]
but I do not understand via the docs how to manipulate that. I would ideally like to be able to set the increments in which each individual column changes.
ex. Column A changes by 0.5 up to 2, column B changes by 0.2 up to 1, and column C changes by 1 up to 10.
[[0,0,0,0]
[0,0,1,0]
...
[0,0,9,0]
[0,0.2,0,0]
...
[0,0.8,9,0]
[0.5,0,0,0]
...
[1.5,0.8,9,0]]
Thanks for any help.
You can adjust your current code just a little bit to make it work.
>>> out = np.zeros((4*5*10,4))
>>> out[:,:3] = (np.arange(4*5*10)[:,None]//(5*10, 10, 1)*(0.5, 0.2, 1)%(2, 1, 10))
>>> out
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 2. , 0. ],
...
[ 0. , 0. , 8. , 0. ],
[ 0. , 0. , 9. , 0. ],
[ 0. , 0.2, 0. , 0. ],
...
[ 0. , 0.8, 9. , 0. ],
[ 0.5, 0. , 0. , 0. ],
...
[ 1.5, 0.8, 9. , 0. ]])
The changes are:
No int dtype on the array, since we need it to hold floats in some columns. You could specify a float dtype if you want (or even something more complicated that only allows floats in the first two columns).
Rather than N**3 total values, figure out the number of distinct values for each column, and multiply them together to get our total size. This is used for both zeros and arange.
Use the floor division // operator in the first broadcast operation because we want integers at this point, but later we'll want floats.
The values to divide by are again based on the number of values for the later columns (e.g. for A,B,C numbers of values, divide by B*C, C, 1).
Add a new broadcast operation to multiply by various scale factors (how much each value increases at once).
Change the values in the broadcast mod % operation to match the bounds on each column.
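The changes listed above can be generalized to arbitrary counts and step sizes. The sketch below is a variation on the one-liner, not the answer's exact expression: it takes the remainder on the integer digits first and scales to floats last, which avoids relying on floating-point modulo. The name `stepped_rows` is hypothetical.

```python
import numpy as np

def stepped_rows(counts, steps):
    """All combinations of k * step for k in range(count), one column per axis.

    `stepped_rows` is a hypothetical helper name used for illustration.
    """
    counts = np.asarray(counts)
    steps = np.asarray(steps, dtype=float)
    total = int(np.prod(counts))
    # Divisor for each column: the product of the counts of all later columns,
    # e.g. counts (A, B, C) give divisors (B*C, C, 1).
    divisors = np.append(np.cumprod(counts[::-1])[-2::-1], 1)
    # Integer digit of each row index in every column, then scale to floats.
    digits = np.arange(total)[:, None] // divisors % counts
    return digits * steps

out = np.zeros((4 * 5 * 10, 4))
out[:, :3] = stepped_rows(counts=(4, 5, 10), steps=(0.5, 0.2, 1))
```

Here the counts (4, 5, 10) come from the requested ranges: 2/0.5, 1/0.2 and 10/1 distinct values per column.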
This small example helps me understand what is going on:
In [123]: N=2
In [124]: np.arange(N**3)[:,None]/[N**2, N, 1]
Out[124]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 2. ],
[ 0.75, 1.5 , 3. ],
[ 1. , 2. , 4. ],
[ 1.25, 2.5 , 5. ],
[ 1.5 , 3. , 6. ],
[ 1.75, 3.5 , 7. ]])
So we generate a range of numbers (0 to 7) and divide them by 4,2, and 1.
The rest of the calculation changes each value element-wise, without further broadcasting between arrays.
First, apply %N to each element:
In [126]: np.arange(N**3)[:,None]/[N**2, N, 1]%N
Out[126]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 0. ],
[ 0.75, 1.5 , 1. ],
[ 1. , 0. , 0. ],
[ 1.25, 0.5 , 1. ],
[ 1.5 , 1. , 0. ],
[ 1.75, 1.5 , 1. ]])
Assigning to an int array is the same as converting the floats to integers:
In [127]: (np.arange(N**3)[:,None]/[N**2, N, 1]%N).astype(int)
Out[127]:
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]])
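The intermediate floats appear because / is true division in Python 3; the integer table only emerges once the values are truncated. As a sketch of two ways to get the integers directly, floor division can replace the division, or np.unravel_index can compute the coordinates from the flat index:

```python
import numpy as np

N = 2

# Integer arithmetic only: floor-divide, then take the remainder.
digits = np.arange(N**3)[:, None] // [N**2, N, 1] % N

# np.unravel_index converts flat indices to (i, j, k) coordinates directly;
# stacking along axis 1 puts one coordinate triple in each row.
coords = np.stack(np.unravel_index(np.arange(N**3), (N, N, N)), axis=1)
```

Both arrays should match the integer table shown above for N = 2.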
