How to index columns with a computed array? - python

Please have a look at this code:
import numpy as np
from scipy.spatial import distance
#1
X = [[0,0], [0,1], [0,2], [0,3], [0,4], [0,5]]
c = [[0,0], [0,1], [0,3]]
#2
dists = distance.cdist(X, c)
print(dists)
#3
dmini = np.argmin(dists, axis=1)
print(dmini)
#4
mindists = dists[:, dmini]
print(mindists)
(#1) So I have my data X, some other points (centroids) c, then (#2) I compute the distance from each point in X to all the centroids c, and store the result in dists.
(#3) Then I select the index of the minimum distances with argmin.
(#4) Now I only want to select the value of the minimum values, using the indexes computed in step #3.
However, I get a strange output.
# dists
[[ 0. 1. 3.]
[ 1. 0. 2.]
[ 2. 1. 1.]
[ 3. 2. 0.]
[ 4. 3. 1.]
[ 5. 4. 2.]]
#dmini
[0 1 1 2 2 2]
#mindists
[[ 0. 1. 1. 3. 3. 3.]
[ 1. 0. 0. 2. 2. 2.]
[ 2. 1. 1. 1. 1. 1.]
[ 3. 2. 2. 0. 0. 0.]
[ 4. 3. 3. 1. 1. 1.]
[ 5. 4. 4. 2. 2. 2.]]
Reading here and there, it seems possible to select specific columns by giving a list of integers (indexes). In this case I should use the dmini values for indexing columns along rows.
I was expecting mindists to be (6,) in shape. What am I doing wrong?
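A possible fix, sketched on the assumption that the goal is one minimum distance per row of X. `dists[:, dmini]` selects columns 0, 1, 1, 2, 2, 2 for every row, which is why the result has shape (6, 6). Pairing each row index with its entry of dmini gives the expected (6,) result, and so does taking the row-wise minimum directly. The distance matrix is rebuilt here with plain NumPy so that the snippet runs on its own; `rows` and `mindists_direct` are names introduced for illustration.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [0, 2], [0, 3], [0, 4], [0, 5]], dtype=float)
c = np.array([[0, 0], [0, 1], [0, 3]], dtype=float)

# Pairwise Euclidean distances, shape (6, 3); same values as distance.cdist(X, c)
dists = np.linalg.norm(X[:, np.newaxis, :] - c[np.newaxis, :, :], axis=2)
dmini = np.argmin(dists, axis=1)

# Pair row i with column dmini[i]: one element per row
rows = np.arange(dists.shape[0])
mindists = dists[rows, dmini]
print(mindists.shape)
print(mindists)

# If the indices themselves are not needed, the row-wise minimum is enough
mindists_direct = dists.min(axis=1)
```

If only the values are needed, `dists.min(axis=1)` is the shorter route; the indexed form is useful when dmini is also used elsewhere, for example as cluster labels.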

Related

Slicing 2D NumPy Array, removing first and last row and column

I have a 2D Numpy array of tile objects that serves as a map. The outer ring is all "wall" values to make a closed border. I want to make a copy of the inner values to iterate over without touching the outer rows and columns. I'm trying:
inner_values = map.tiles[1:-1][1:-1]
to cut off the top and bottom rows and left and right columns. My map is 100*70, and this keeps giving me an array of shape (96, 70) when I want (98, 68). How can I use slices correctly to get my inner values? Thanks!
You are just about there...you can put all the indices inside the brackets to get what you want:
import numpy as np
a = np.ones([5, 5])
print(a)
# [[1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]]
a[1:-1, 1:-1] = 0
print(a)
# [[1. 1. 1. 1. 1.]
# [1. 0. 0. 0. 1.]
# [1. 0. 0. 0. 1.]
# [1. 0. 0. 0. 1.]
# [1. 1. 1. 1. 1.]]
Or given your dimensions:
a = np.ones([100,70])
a[1:-1, 1:-1].shape
# (98, 68)
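To illustrate why the chained form gives (96, 70): the second `[1:-1]` is applied to the result of the first, so it trims rows a second time and never touches the columns. The sketch below uses a plain array of ones as a stand-in for the tile map; `tiles`, `chained`, `inner` and `inner_copy` are names chosen for this example.

```python
import numpy as np

tiles = np.ones((100, 70))

# Two separate indexing operations: both act on axis 0 (rows)
chained = tiles[1:-1][1:-1]
print(chained.shape)

# One indexing operation with a slice per axis
inner = tiles[1:-1, 1:-1]
print(inner.shape)

# Basic slicing returns a view; call .copy() if an independent copy is wanted
inner_copy = inner.copy()
```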

Modifying (keras/tensorflow) Tensors using numpy methods

I want to perform a specific operation. Namely, from a matrix:
A = np.array([[1,2],
[3,4]])
To the following
B = np.array([[1, 0, 0, 2, 0, 0],
[0, 1, 0, 0, 2, 0],
[0, 0, 1, 0, 0, 2],
[3, 0, 0, 4, 0, 0],
[0, 3, 0, 0, 4, 0],
[0, 0, 3, 0, 0, 4]])
Or in words: multiply every entry by the identity matrix and keep the same order.
Now I have accomplished this with NumPy, using the following code. Here N is the dimension of the starting matrix and M is the dimension of the identity matrix.
N = 2
M = 3
A = np.reshape(np.arange(1, 1+N ** 2), (N, N))
B = np.array([i * np.eye(M) for i in A.flatten()])
C = B.reshape(N, N, M, M).reshape(N, N * M, M).transpose([0, 2, 1]).reshape((N * M, N * M))
where C has my desired properties.
But now I want to do this modification in Keras/TensorFlow, where the matrix A is the output of one of my layers.
However, I am not sure yet if I will be able to properly create matrix B. Especially when batches are involved, I think I will somehow mess up the dimensions of my problem.
Can anyone with more Keras/Tensorflow experience comment on this 'reshape' and how he/she sees this happening within Keras/Tensorflow?
Here is a way to do that with TensorFlow:
import tensorflow as tf
data = tf.placeholder(tf.float32, [None, None])
n = tf.placeholder(tf.int32, [])
eye = tf.eye(n)
mult = data[:, tf.newaxis, :, tf.newaxis] * eye[tf.newaxis, :, tf.newaxis, :]
result = tf.reshape(mult, n * tf.shape(data))
with tf.Session() as sess:
    a = sess.run(result, feed_dict={data: [[1, 2], [3, 4]], n: 3})
    print(a)
Output:
[[1. 0. 0. 2. 0. 0.]
[0. 1. 0. 0. 2. 0.]
[0. 0. 1. 0. 0. 2.]
[3. 0. 0. 4. 0. 0.]
[0. 3. 0. 0. 4. 0.]
[0. 0. 3. 0. 0. 4.]]
By the way, you can do basically the same in NumPy, which should be faster than your current solution:
import numpy as np
data = np.array([[1, 2], [3, 4]])
n = 3
eye = np.eye(n)
mult = data[:, np.newaxis, :, np.newaxis] * eye[np.newaxis, :, np.newaxis, :]
result = np.reshape(mult, (n * data.shape[0], n * data.shape[1]))
print(result)
# The output is the same as above
EDIT:
I'll try to give some intuition about why/how this works, sorry if it's too long. It is not that hard but I think it's sort of tricky to explain. Maybe it is easier to see how the following multiplication works
import numpy as np
data = np.array([[1, 2], [3, 4]])
n = 3
eye = np.eye(n)
mult1 = data[:, :, np.newaxis, np.newaxis] * eye[np.newaxis, np.newaxis, :, :]
Now, mult1 is a sort of "matrix of matrices". If I give two indices, I will get the diagonal matrix for the corresponding element in the original one:
print(mult1[0, 0])
# [[1. 0. 0.]
# [0. 1. 0.]
# [0. 0. 1.]]
So you could say this matrix could be visualized like this:
| 1 0 0 | | 2 0 0 |
| 0 1 0 | | 0 2 0 |
| 0 0 1 | | 0 0 2 |
| 3 0 0 | | 4 0 0 |
| 0 3 0 | | 0 4 0 |
| 0 0 3 | | 0 0 4 |
However this is deceiving, because if you try to reshape this to the final shape the result is not the right one:
print(np.reshape(mult1, (n * data.shape[0], n * data.shape[1])))
# [[1. 0. 0. 0. 1. 0.]
# [0. 0. 1. 2. 0. 0.]
# [0. 2. 0. 0. 0. 2.]
# [3. 0. 0. 0. 3. 0.]
# [0. 0. 3. 4. 0. 0.]
# [0. 4. 0. 0. 0. 4.]]
The reason is that reshaping (conceptually) "flattens" the array first and then gives the new shape. But the flattened array in this case is not what you need:
print(mult1.ravel())
# [1. 0. 0. 0. 1. 0. 0. 0. 1. 2. 0. 0. 0. 2. 0. ...
You see, it first traverses the first submatrix, then the second, etc. What you want though is for it to traverse first the first row of the first submatrix, then the first row of the second submatrix, then second row of first submatrix, etc. So basically you want something like:
Take the first two submatrices (the ones with 1 and 2)
Take all the first rows ([1, 0, 0] and [2, 0, 0]).
Take the first of these ([1, 0, 0])
Take each of its elements (1, 0 and 0).
And then continue for the rest. So if you think about it, we are traversing first axis 0 (rows of the "matrix of matrices"), then 2 (rows of each submatrix), then 1 (columns of the "matrix of matrices") and finally 3 (columns of the submatrices). So we can just reorder the axes to do that:
mult2 = mult1.transpose((0, 2, 1, 3))
print(np.reshape(mult2, (n * data.shape[0], n * data.shape[1])))
# [[1. 0. 0. 2. 0. 0.]
# [0. 1. 0. 0. 2. 0.]
# [0. 0. 1. 0. 0. 2.]
# [3. 0. 0. 4. 0. 0.]
# [0. 3. 0. 0. 4. 0.]
# [0. 0. 3. 0. 0. 4.]]
And it works! So in the solution I posted, to avoid the transposing, I just do the multiplication so that the order of the axes is exactly that:
mult = data[
    :,           # Matrix-of-matrices rows
    np.newaxis,  # Submatrix rows
    :,           # Matrix-of-matrices columns
    np.newaxis   # Submatrix columns
] * eye[
    np.newaxis,  # Matrix-of-matrices rows
    :,           # Submatrix rows
    np.newaxis,  # Matrix-of-matrices columns
    :            # Submatrix columns
]
I hope that makes it slightly clearer. To be honest, in this particular case I could come up with the solution quickly because I had to solve a similar problem not too long ago, and I guess you end up building an intuition for these things.
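As a cross-check on the explanation above: the target matrix is the Kronecker product of data with the identity, so in NumPy it can also be obtained with np.kron. The sketch below only compares the two results for the 2x2 example; `result_kron` is a name introduced here, and this is not part of the original answer.

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])
n = 3
eye = np.eye(n)

# Broadcasting version: axes ordered as (block row, row in block, block column, column in block)
mult = data[:, np.newaxis, :, np.newaxis] * eye[np.newaxis, :, np.newaxis, :]
result = np.reshape(mult, (n * data.shape[0], n * data.shape[1]))

# Kronecker product: every entry data[i, j] becomes the block data[i, j] * eye
result_kron = np.kron(data, eye)
print(np.array_equal(result, result_kron))
```

The broadcasting form is the one that carries over to TensorFlow tensors, so np.kron is mostly useful here as a reference result to test against.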
Another way to achieve the same effect in numpy is to use the following:
A = np.array([[1,2],
[3,4]])
B = np.repeat(np.repeat(A, 3, axis=0), 3, axis=1) * np.tile(np.eye(3), (2,2))
Then, to replicate it in TensorFlow, we can use tf.tile. There was no tf.repeat at the time of writing; however, someone has provided this function on the TensorFlow issue tracker:
def tf_repeat(tensor, repeats):
    """
    Args:
        tensor: A Tensor. 1-D or higher.
        repeats: A list. Number of repeats for each dimension; length must be the same as the number of dimensions in tensor.
    Returns:
        A Tensor. Has the same type as tensor. Has the shape of tensor.shape * repeats.
    """
    with tf.variable_scope("repeat"):
        expanded_tensor = tf.expand_dims(tensor, -1)
        multiples = [1] + list(repeats)
        tiled_tensor = tf.tile(expanded_tensor, multiples=multiples)
        repeated_tensor = tf.reshape(tiled_tensor, tf.shape(tensor) * repeats)
    return repeated_tensor
and thus the tensorflow implementation will look like the following. Here I also consider that the first dimension represents batches, and thus we do not operate on it.
N = 2
M = 3
nbatch = 2
Ain = np.reshape(np.arange(1, 1 + N*N*nbatch), (nbatch, N, N))
A = tf.placeholder(tf.float32, shape=(nbatch, N, N))
B = tf.tile(tf.eye(M), [N, N]) * tf_repeat(A, [1, M, M])
with tf.Session() as sess:
    # B is the tensor built above (the original snippet referred to an undefined C)
    print(sess.run(B, feed_dict={A: Ain}))
and the result:
[[[1. 0. 0. 2. 0. 0.]
[0. 1. 0. 0. 2. 0.]
[0. 0. 1. 0. 0. 2.]
[3. 0. 0. 4. 0. 0.]
[0. 3. 0. 0. 4. 0.]
[0. 0. 3. 0. 0. 4.]]
[[5. 0. 0. 6. 0. 0.]
[0. 5. 0. 0. 6. 0.]
[0. 0. 5. 0. 0. 6.]
[7. 0. 0. 8. 0. 0.]
[0. 7. 0. 0. 8. 0.]
[0. 0. 7. 0. 0. 8.]]]
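For comparison, the batched result above can be reproduced in plain NumPy by combining the broadcasting idea from the first answer with a leading batch axis. This is a sketch under the same assumptions as the snippet above (N = 2, M = 3, two batches); `mult` and `out` are names chosen for this example, and each batch is checked against np.kron.

```python
import numpy as np

N, M, nbatch = 2, 3, 2
Ain = np.reshape(np.arange(1, 1 + N * N * nbatch), (nbatch, N, N))
eye = np.eye(M)

# Axes: (batch, block row, row in block, block column, column in block)
mult = Ain[:, :, np.newaxis, :, np.newaxis] * eye[np.newaxis, np.newaxis, :, np.newaxis, :]

# Merge (block row, row in block) and (block column, column in block)
out = mult.reshape(nbatch, N * M, N * M)
print(out[1])

# Each batch entry should equal the Kronecker product of that matrix with the identity
matches = all(np.array_equal(out[b], np.kron(Ain[b], eye)) for b in range(nbatch))
print(matches)
```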

Using slice notation to set 1 dimension of numpy array

From what I understand of python's slice notation, using the slice notation shallow-copies the array in question.
However, what happens if you set a slice of an array equal to a certain value?
For example:
import numpy as np
a = np.zeros(shape=(3, 2))
b = np.zeros(shape=(3, 2))
for i in range(0, 2):
    a[:, i] = i + 1
for i in range(0, 2):
    for x in range(0, 3):
        b[x, i] = i + 1
print(a)
print(b)
Here a and b are identical.
Is there a reason I should not use the slice notation in this way? (I have never seen anyone use the slice notation in this way, so I feel like there might be)
I see no reason you cannot use your code. Below is a speed test of the sample above, expanded a bit to ensure that it would take enough time to register.
Test of the NumPy code on https://www.tutorialspoint.com/online_python_ide.php
import numpy as np
import time
a = np.zeros(shape=(4, 3))
b = np.zeros(shape=(4, 3))
print("--before--")
print(a)
print("")
print(b)
start_time = time.time()
# Slice assignment: one column at a time
for i in range(0, 3):
    a[:, i] = i + 1
time1 = time.time()
# Element-by-element assignment
for i in range(0, 3):
    for x in range(0, 4):
        b[x, i] = i + 1
time2 = time.time()
print("--after--")
print(a)
print("this took %s seconds\n" % (time1 - start_time))
print(b)
print("this took %s seconds\n" % (time2 - time1))
print("--done--\n")
----- output ----
--before--
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
--after--
[[ 1. 2. 3.]
[ 1. 2. 3.]
[ 1. 2. 3.]
[ 1. 2. 3.]]
this took 1.21593475342e-05 seconds
[[ 1. 2. 3.]
[ 1. 2. 3.]
[ 1. 2. 3.]
[ 1. 2. 3.]]
this took 7.86781311035e-06 seconds
--done--
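A short sketch of what slice assignment does in NumPy, since the question assumes slicing makes a shallow copy. Assigning to `a[:, i]` writes into `a` in place, and a basic slice taken from a NumPy array is a view onto the same memory rather than a copy (unlike slicing a Python list). The names `same`, `col` and the broadcast assignment to `b` are additions for illustration.

```python
import numpy as np

a = np.zeros((4, 3))
b = np.zeros((4, 3))

# Slice assignment writes into the existing array, one column at a time
for i in range(3):
    a[:, i] = i + 1

# The same result in one step: the row [1, 2, 3] is broadcast to every row of b
b[:] = np.arange(1, 4)
same = np.array_equal(a, b)
print(same)

# A basic slice is a view: writing through it changes the original array
col = a[:, 0]
col[0] = 99.0
print(a[0, 0])
```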

Python creating matrix using if condition on indices : incorrect result

I have the following code where I have been trying to create a tridiagonal matrix x using if-conditions.
#!/usr/bin/env python
# import useful modules
import numpy as np
N = 5
x = np.identity(N)
#x = np.zeros((N, N))
print(x)
# Construct NxN matrix
for i in range(N):
    for j in range(N):
        if i-j==1:
            x[i][j]=1
        elif j-1==1:
            x[i][j]=-1
        else:
            x[i][j]=0
        print("i= ", i, " j= ", j)
print(x)
I desire to get
[[ 0. -1. 0. 0. 0.]
[ 1. 0. -1. 0. 0.]
[ 0. 1. 0. -1. 0.]
[ 0. 0. 1. 0. -1.]
[ 0. 0. 0. 1. 0.]]
However, I obtain
[[ 0. 0. -1. 0. 0.]
[ 1. 0. -1. 0. 0.]
[ 0. 1. -1. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 0. -1. 1. 0.]]
What's going wrong?
Bonus question : Can I forcefully index from 1 to 5 instead of 0 to 4 in this example, or Python never allows that?
elif j-1==1: should be elif j-i==1:.
And no, lists/arrays etc. are always indexed from 0.
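For reference, a minimal sketch of the loop with that one-character fix applied, next to a loop-free alternative built from np.eye with a diagonal offset. The offset construction is a suggestion added here, not part of the answer above; `x_loop` and `x_eye` are illustrative names.

```python
import numpy as np

N = 5

# Loop version with the corrected condition (j - i instead of j - 1)
x_loop = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if i - j == 1:
            x_loop[i, j] = 1      # subdiagonal
        elif j - i == 1:
            x_loop[i, j] = -1     # superdiagonal

# Same matrix without loops: k=-1 is the subdiagonal, k=1 the superdiagonal
x_eye = np.eye(N, k=-1) - np.eye(N, k=1)
print(x_eye)
```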
As for the bonus question, the first element of a sequence in Python has always the index 0. However, if for some particular reason (for example to prevent off-by-one errors) you wish to count the elements of a sequence from a value other than 0, you could use the built-in function enumerate() and set the value of the optional parameter start to fit your needs:
>>> seq = ['a', 'b', 'c']
>>> for count, item in enumerate(seq, start=1):
... print(count, item)
...
1 a
2 b
3 c

Numpy - Modal matrix and diagonal Eigenvalues

I wrote some simple linear algebra code in Python with NumPy to compute the diagonal matrix of eigenvalues as $M^{-1} A M$ (where M is the modal matrix), and it behaves strangely.
Here's the Code :
import numpy as np
array = np.arange(16)
array = array.reshape(4, -1)
print(array)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
eigenvalues, eigenvectors = np.linalg.eig(array)
print(eigenvalues)
[ 3.24642492e+01 -2.46424920e+00 1.92979794e-15 -4.09576009e-16]
print(eigenvectors)
[[-0.11417645 -0.7327781 0.54500164 0.00135151]
[-0.3300046 -0.28974835 -0.68602671 0.40644504]
[-0.54583275 0.15328139 -0.2629515 -0.8169446 ]
[-0.76166089 0.59631113 0.40397657 0.40914805]]
inverseEigenVectors = np.linalg.inv(eigenvectors) #M^(-1)
diagonal= inverseEigenVectors.dot(array).dot(eigenvectors) #M^(-1).A.M
print(diagonal)
[[ 3.24642492e+01 -1.06581410e-14 5.32907052e-15 0.00000000e+00]
[ 7.54951657e-15 -2.46424920e+00 -1.72084569e-15 -2.22044605e-16]
[ -2.80737213e-15 1.46768503e-15 2.33547852e-16 7.25592561e-16]
[ -6.22319863e-15 -9.69656080e-16 -1.38050658e-30 1.97215226e-31]]
The final 'diagonal' matrix should be a diagonal matrix with the eigenvalues on the main diagonal and zeros elsewhere, but it's not. The first two main-diagonal values are eigenvalues, but the last two aren't (although, just like the last two eigenvalues, they are nearly zero).
By the way, a number like $-1.06581410e-14$ is effectively zero, so how can I make NumPy show it as zero?
What am I doing wrong?
Thanks...
Just round the final result to the desired number of digits:
print(diagonal.round(5))
array([[ 32.46425, 0. , 0. , 0. ],
[ 0. , -2.46425, 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ]])
Don't confuse precision of computation and printing policies.
>>> diagonal[np.abs(diagonal)<0.0000000001]=0
>>> print diagonal
[[ 32.4642492 0. 0. 0. ]
[ 0. -2.4642492 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]
>>>
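The two answers above change the values (rounding, or overwriting small entries). As a complementary sketch, the display can also be changed without touching the data, and the comparison with the expected diagonal matrix can be done with a tolerance. The array below is copied from the printed `diagonal` in the question so that the snippet does not depend on a particular eigen-solver run; the tolerance 1e-10 and the names `expected` and `cleaned` are assumptions made for this example.

```python
import numpy as np

# Values copied from the printed `diagonal` in the question
diagonal = np.array([
    [3.24642492e+01, -1.06581410e-14, 5.32907052e-15, 0.00000000e+00],
    [7.54951657e-15, -2.46424920e+00, -1.72084569e-15, -2.22044605e-16],
    [-2.80737213e-15, 1.46768503e-15, 2.33547852e-16, 7.25592561e-16],
    [-6.22319863e-15, -9.69656080e-16, -1.38050658e-30, 1.97215226e-31],
])

# Display only: tiny values print as 0., the stored values are unchanged
with np.printoptions(precision=5, suppress=True):
    print(diagonal)

# Compare against the expected diagonal matrix with a tolerance
expected = np.diag([3.24642492e+01, -2.46424920e+00, 0.0, 0.0])
print(np.allclose(diagonal, expected, atol=1e-10))

# New array in which near-zero entries are exactly 0.0 (original left intact)
cleaned = np.where(np.isclose(diagonal, 0.0, atol=1e-10), 0.0, diagonal)
```

Entries of order 1e-15 are at the level of floating-point round-off for values of this size, which is why a tolerance-based comparison is usually more appropriate than expecting exact zeros.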
