Sparse arrays from tuples - python

I searched the net for a guide to SciPy sparse matrices and found none. I would be glad if anybody could share a source for it, but on to the question:
I have an array of tuples. I want to turn it into a sparse matrix in which the tuples appear on the main diagonal and the diagonal just beside it, as the following example shows. What is the fancy (efficient) way of doing it?
import numpy as np
A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
B = np.zeros((A.shape[0], A.shape[0]+1))
for i in range(A.shape[0]):
    B[i,i] = A[i,0]
    B[i,i+1] = A[i,1]
print(B)
Output being:
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]

You can build those really fast as a CSR matrix:
>>> A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
>>> rows = len(A)
>>> cols = rows + 1
>>> data = A.flatten() # we want a copy
>>> indptr = np.arange(0, len(data)+1, 2) # 2 non-zero entries per row
>>> indices = np.repeat(np.arange(cols), [1] + [2] * (cols-2) + [1])
>>> import scipy.sparse as sps
>>> a_sps = sps.csr_matrix((data, indices, indptr), shape=(rows, cols))
>>> a_sps.A
array([[1, 2, 0, 0, 0],
[0, 3, 4, 0, 0],
[0, 0, 5, 6, 0],
[0, 0, 0, 7, 8]])
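To see what the three arrays mean, here is a minimal sketch that decodes them by hand with plain NumPy. It is only an illustration of the CSR layout; the names dense, start and stop are mine, not part of the answer above. For row r, the slice indptr[r]:indptr[r+1] selects that row's values in data and their column positions in indices.

```python
import numpy as np

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
rows = len(A)
cols = rows + 1

# Same three arrays as in the answer above.
data = A.flatten()
indptr = np.arange(0, len(data) + 1, 2)
indices = np.repeat(np.arange(cols), [1] + [2] * (cols - 2) + [1])

# Decode the CSR triplet by hand (illustrative only).
dense = np.zeros((rows, cols), dtype=data.dtype)
for r in range(rows):
    start, stop = indptr[r], indptr[r + 1]
    # Row r owns data[start:stop], stored at columns indices[start:stop].
    dense[r, indices[start:stop]] = data[start:stop]

print(dense)
```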

Try diags from scipy
import numpy as np
import scipy.sparse
A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
B = scipy.sparse.diags([A[:,0], A[:,1]], [0, 1], [4, 5])
When I print B.todense(), it gives me
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
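Another option, not mentioned in the answers above, is to list the (row, column) coordinates explicitly and let scipy.sparse.coo_matrix assemble the matrix. The following is a sketch under that assumption; the names n, row_idx, col_idx and B_coo are illustrative.

```python
import numpy as np
import scipy.sparse as sps

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
n = len(A)

# Each tuple i contributes two entries, at (i, i) and (i, i + 1).
row_idx = np.repeat(np.arange(n), 2)
col_idx = row_idx + np.tile([0, 1], n)

B_coo = sps.coo_matrix((A.ravel(), (row_idx, col_idx)), shape=(n, n + 1))
print(B_coo.toarray())
```

COO is convenient for construction; calling B_coo.tocsr() afterwards converts it to CSR if that format is needed for arithmetic.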

Related

Create 4d numpy array from lists, where axis[i] holds the list[i] elements

I have three lists, and I can build a numpy array of objects with (3,2,4) shape from them.
However, what I want is a numpy array with shape (2, 3, 4), where each axis corresponds to a "variable" in the list.
I tried using np.reshape with no success.
import numpy as np
p1 = [[1, 1, 1, 1], [1, 1, 1, 1]]
p2 = [[0, 0, 0, 0], [0, 0, 0, 0]]
p3 = [[2, 2, 2, 2], [2, 2, 2, 2]]
points = [p1, p2, p3]
nparr = np.array(points, dtype=np.float32)
What I obtain is
[[[1. 1. 1. 1.]
[1. 1. 1. 1.]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[2. 2. 2. 2.]
[2. 2. 2. 2.]]]
But what I would like to obtain is
[[[1. 1. 1. 1.]
[0. 0. 0. 0.]
[2. 2. 2. 2.]]
[[1. 1. 1. 1.]
[0. 0. 0. 0.]
[2. 2. 2. 2.]]]
Is there a clean way to achieve this without having to change the original input lists?
nparr = np.array([a for a in zip(p1,p2,p3)], dtype=np.float32)
or
nparr = np.squeeze(np.array([list(zip(p1,p2,p3))], dtype=np.float32))
You can simply transpose the first two axes using:
nparr.transpose(1, 0, 2)
Since transpose returns a view of the array with modified strides, this operation is very cheap. However, on a big array, it may be better to copy the view using .copy() so that later operations work on contiguous memory.
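A small sketch of the view-versus-copy point, using the lists from the question. The names swapped, stacked and contiguous are illustrative, and np.stack with axis=1 is an alternative I am adding, not something the answers proposed.

```python
import numpy as np

p1 = [[1, 1, 1, 1], [1, 1, 1, 1]]
p2 = [[0, 0, 0, 0], [0, 0, 0, 0]]
p3 = [[2, 2, 2, 2], [2, 2, 2, 2]]

nparr = np.array([p1, p2, p3], dtype=np.float32)   # shape (3, 2, 4)

# Swapping the first two axes gives a view: no data is copied.
swapped = nparr.transpose(1, 0, 2)                 # shape (2, 3, 4)
print(np.shares_memory(swapped, nparr))

# The view is not C-contiguous; make a contiguous copy if that matters.
contiguous = np.ascontiguousarray(swapped)
print(swapped.flags['C_CONTIGUOUS'], contiguous.flags['C_CONTIGUOUS'])

# Alternative: stack the inputs along axis 1 directly.
stacked = np.stack([p1, p2, p3], axis=1).astype(np.float32)
print(np.array_equal(stacked, swapped))
```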

How to do an outer product of 3 vectors to create a 3d matrix in numpy? (and same for nd)

If I want to do an outer product of 2 vectors to create a 2d matrix, where each element is the product of the two respective elements in the original vectors:
b = np.arange(5).reshape((1, 5))
a = np.arange(5).reshape((5, 1))
a * b
array([[ 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4],
[ 0, 2, 4, 6, 8],
[ 0, 3, 6, 9, 12],
[ 0, 4, 8, 12, 16]])
I want the same for 3 (or for n) vectors.
An equivalent version with explicit loops:
a = np.arange(5)
b = np.arange(5)
c = np.arange(5)
res = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for ia in range(len(a)):
    for ib in range(len(b)):
        for ic in range(len(c)):
            res[ia, ib, ic] = a[ia] * b[ib] * c[ic]
print(res)
out:
[[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 1. 2. 3. 4.]
[ 0. 2. 4. 6. 8.]
[ 0. 3. 6. 9. 12.]
[ 0. 4. 8. 12. 16.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 2. 4. 6. 8.]
[ 0. 4. 8. 12. 16.]
[ 0. 6. 12. 18. 24.]
[ 0. 8. 16. 24. 32.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 3. 6. 9. 12.]
[ 0. 6. 12. 18. 24.]
[ 0. 9. 18. 27. 36.]
[ 0. 12. 24. 36. 48.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 4. 8. 12. 16.]
[ 0. 8. 16. 24. 32.]
[ 0. 12. 24. 36. 48.]
[ 0. 16. 32. 48. 64.]]]
How to do this with numpy [no for loops]?
Also, how to do this for a general function, not necessarily *?
NumPy provides you with np.outer() for computing the outer product.
This is a less powerful version of more versatile approaches:
ufunc.outer()
np.tensordot()
np.einsum()
np.einsum() is the only one capable of handling more than two input arrays:
import numpy as np
def prod(items, start=1):
    for item in items:
        start = start * item
    return start
a = np.arange(5)
b = np.arange(5)
c = np.arange(5)
r0 = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for ia in range(len(a)):
    for ib in range(len(b)):
        for ic in range(len(c)):
            r0[ia, ib, ic] = a[ia] * b[ib] * c[ic]
r1 = prod([a[:, None, None], b[None, :, None], c[None, None, :]])
# same as: r1 = a[:, None, None] * b[None, :, None] * c[None, None, :]
# same as: r1 = a.reshape(-1, 1, 1) * b.reshape(1, -1, 1) * c.reshape(1, 1, -1)
print(np.all(r0 == r1))
# True
r2 = np.einsum('i,j,k->ijk', a, b, c)
print(np.all(r0 == r2))
# True
# as per @hpaulj suggestion
r3 = prod(np.ix_(a, b, c))
print(np.all(r0 == r3))
# True
Of course, the broadcasting approach (the same one you used with the array.reshape() version of your code, only with a slightly different syntax for providing the correct shape) can be automated by explicitly building the slicing (or, equivalently, the array.reshape() parameters).
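One way that automation could look, as a minimal sketch: the helper outer_nd is hypothetical (my name, not a NumPy function). It gives the i-th vector a shape that is -1 on axis i and 1 everywhere else, then multiplies the reshaped arrays so that broadcasting fills in the full n-d result.

```python
from functools import reduce

import numpy as np


def outer_nd(*vectors):
    """Broadcasted outer product of any number of 1-D vectors (illustrative)."""
    n = len(vectors)
    expanded = []
    for axis, v in enumerate(vectors):
        shape = [1] * n
        shape[axis] = -1          # this vector varies along its own axis only
        expanded.append(np.asarray(v).reshape(shape))
    return reduce(np.multiply, expanded)


a = np.arange(2)
b = np.arange(3)
c = np.arange(4)
result = outer_nd(a, b, c)
print(result.shape)
print(np.array_equal(result, np.einsum('i,j,k->ijk', a, b, c)))
```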
In [166]: a = np.arange(2)
...: b = np.arange(3)
...: c = np.arange(4)
As shown in comments and answer:
In [167]: R = np.einsum('i,j,k',a,b,c)
We can also use np.ix_ to construct arrays that broadcast against each other. This is often used to construct block indexing arrays, but works here as well:
In [168]: A,B,C = np.ix_(a,b,c)
In [169]: A,B,C
Out[169]:
(array([[[0]],
[[1]]]),
array([[[0],
[1],
[2]]]),
array([[[0, 1, 2, 3]]]))
In [170]: R1 = A*B*C
testing:
In [171]: np.allclose(R,R1)
Out[171]: True
That broadcasted product can be done in one line with:
In [172]: np.prod(np.array(np.ix_(a,b,c),object)).shape
Out[172]: (2, 3, 4)
Without that explicit object dtype casting I get a future warning about creating a ragged array.
np.meshgrid(a,b,c, sparse=True, indexing='ij') is an alternative to ix_.
While these ix_ etc expressions are nice, you should become thoroughly comfortable using:
a[:, None, None] * b[None, :, None] * c[None, None, :]
This kind of dimension expansion gives you the most power and flexibility.
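As a quick check of the two alternatives just mentioned, this sketch compares the sparse meshgrid arrays with the explicit None indexing. The names grids, shapes, r_mesh and r_none are illustrative.

```python
import numpy as np

a = np.arange(2)
b = np.arange(3)
c = np.arange(4)

# sparse=True keeps each grid 1 wide on every axis except its own.
grids = np.meshgrid(a, b, c, sparse=True, indexing='ij')
shapes = [g.shape for g in grids]
print(shapes)

r_mesh = grids[0] * grids[1] * grids[2]
r_none = a[:, None, None] * b[None, :, None] * c[None, None, :]
print(np.array_equal(r_mesh, r_none))
```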
The simplest approach (from this answer) is to use:
functools.reduce(np.multiply.outer, (a, b, c))
This works for any number of dimensions, and unlike np.prod(np.ix_(...)) it does not result in numpy deprecation warnings about introducing jagged arrays.
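A sketch of that reduce pattern, which also touches on the "general function" part of the question: any binary ufunc has an .outer method, so np.add.outer can be swapped in for np.multiply.outer. The variable names product and total are illustrative.

```python
import functools

import numpy as np

a = np.arange(2)
b = np.arange(3)
c = np.arange(4)

# outer(outer(a, b), c): each step appends the next vector's axis.
product = functools.reduce(np.multiply.outer, (a, b, c))
print(product.shape)

# Same pattern with a different ufunc: total[i, j, k] = a[i] + b[j] + c[k].
total = functools.reduce(np.add.outer, (a, b, c))
print(total[1, 2, 3])
```

This only covers functions that are available as binary ufuncs; an arbitrary Python function would need a different approach.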

Python 2.7 appending column to 2d array

I have 2 arrays, x and y:
x = [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 0. 3. 6.]]
y = [[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
I want a z matrix, as: z = [y[0], x, y[1], y[2]]:
[[ 1. 1. 2. 3. 4. 0. 0.]
[ 0. 5. 6. 7. 8. 1. 0.]
[ 0. 9. 0. 3. 6. 0. 1.]]
So I made this code:
z = np.c_[y[0], x]
for j in range(n):
    z = np.c_[x, y[j]]
But it is not saving the matrix. My resulting z was just the last operation:
[[ 1. 2. 3. 4. 0.]
[ 5. 6. 7. 8. 0.]
[ 9. 0. 3. 6. 1.]]
How could I save the changes made on the matrix? I also tried to numpy.append() the same way, but it gives an error message:
ValueError: all the input arrays must have same number of dimensions
Using np.c_ to stack the columns of y and x:
np.c_[np.array(y)[0],np.asanyarray(x),np.array(y)[1],np.array(y)[2]]
Out[536]:
array([[1, 1, 2, ..., 4, 0, 0],
[0, 5, 6, ..., 8, 1, 0],
[0, 9, 0, ..., 6, 0, 1]])
Or you can use np.roll to shift the columns before stacking them and shift again afterwards.
np.roll(np.c_[np.array(x),np.roll(np.array(y),-1,axis=1)],1,axis=1)
Out[549]:
array([[1, 1, 2, ..., 4, 0, 0],
[0, 5, 6, ..., 8, 1, 0],
[0, 9, 0, ..., 6, 0, 1]])
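I think the command you are looking for is numpy.insert(a, pos, col, axis=1). With z = np.insert(y, [1], x, axis=1), the columns of x are inserted into y before column 1 and the result is saved in z. Passing the position as a one-element list matters here: with a scalar position, NumPy broadcasts a 2-D block of values differently and the shapes no longer match.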
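For comparison, here is a sketch of three ways to build z from the x and y in the question, assuming y is the 3x3 identity as shown. The names z_hstack, z_insert and z_loop are illustrative. The loop version also shows the fix for the original attempt: each step has to extend z itself rather than start again from x.

```python
import numpy as np

x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 0., 3., 6.]])
y = np.eye(3)

# 1. Slice y into its first column and the rest, then stack horizontally.
z_hstack = np.hstack([y[:, :1], x, y[:, 1:]])

# 2. Insert x as a block of columns before column 1 of y.
z_insert = np.insert(y, [1], x, axis=1)

# 3. Column-by-column loop that accumulates into z_loop.
z_loop = np.c_[y[:, 0], x]
for j in range(1, y.shape[1]):
    z_loop = np.c_[z_loop, y[:, j]]

print(z_hstack)
```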

Python Numpy Matrix Operations - matrix[a==b]?

I've been trying to create a watershed algorithm and as all the examples seem to be in Python I've run into a bit of a wall. I've been trying to find in numpy documentation what this line means:
matrixVariable[A==255] = 0
but have had no luck. Could anyone explain what that operation does?
For context, the line in action: label[lbl == -1] = 0
The expression A == 255 creates a boolean array that is True where an element of A equals 255 and False otherwise.
The expression matrixVariable[A==255] = 0 sets to 0 every element of matrixVariable at a position where A == 255 is True.
E.g.:
import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.zeros([3, 3])
print('before:')
print(B)
B[A>5] = 5
print('after:')
print(B)
OUT:
before:
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
after:
[[ 0. 0. 0.]
[ 0. 0. 5.]
[ 5. 5. 5.]]
I assumed that matrixVariable and A are NumPy arrays. If that assumption is correct, then the expression "matrixVariable[A==255] = 0" first finds the indices where the values of A equal 255, then sets the elements of matrixVariable at those indices to 0.
Example:
import numpy as np
matrixVariable = np.array([(1, 3),
                           (2, 2),
                           (3, 1)])
A = np.array([255, 1,255])
So A[0] and A[2] are equal to 255
matrixVariable[A==255]=0 #then sets matrixVariable[0] and matrixVariable[2] to zero
print(matrixVariable) # this would print
[[0 0]
[2 2]
[0 0]]
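Applied to the line from the question, a minimal sketch might look like this. The contents of lbl and label are made up for illustration, and np.where is shown as an equivalent that builds a new array instead of modifying one in place.

```python
import numpy as np

# Hypothetical marker image: -1 stands for boundary pixels.
lbl = np.array([[1, -1, 2],
                [-1, 2, 2],
                [1, 1, -1]])
label = np.full(lbl.shape, 9)

mask = lbl == -1          # boolean array with the same shape as lbl
label[mask] = 0           # in-place: zero every position where mask is True
print(label)

# Equivalent result without in-place assignment.
alt = np.where(lbl == -1, 0, 9)
print(np.array_equal(label, alt))
```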

np_utils.to_categorical Reverse

import numpy as np
from keras.utils import np_utils
nsample = 100
sample_space = ["HOME","DRAW","AWAY"]
array = np.random.choice(sample_space, nsample, )
uniques, coded_id = np.unique(array, return_inverse=True)
coded_array = np_utils.to_categorical(coded_id)
Example
Input
['AWAY', 'HOME', 'DRAW', 'AWAY', ...]
Output coded_array
[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 0. 0. 1.]
...,
[ 0. 0. 1.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
How to reverse process and get the original data from coded_array?
You can use np.argmax to recover those ids; indexing into uniques with them then gives you the original array. With y_code being the one-hot array (coded_array in the question), an implementation would look like this -
uniques[y_code.argmax(1)]
Sample run -
In [44]: arr
Out[44]: array([5, 7, 3, 2, 4, 3, 7])
In [45]: uniques, ids = np.unique(arr, return_inverse=True)
In [46]: y_code = np_utils.to_categorical(ids, len(uniques))
In [47]: uniques[y_code.argmax(1)]
Out[47]: array([5, 7, 3, 2, 4, 3, 7])
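The same round trip on string labels, as a sketch that runs without Keras: np.eye(len(uniques))[ids] is used here as a stand-in for np_utils.to_categorical, which is my assumption for illustration and not the library call itself.

```python
import numpy as np

labels = np.array(["AWAY", "HOME", "DRAW", "AWAY"])

# Encode: integer ids, then one-hot rows (stand-in for to_categorical).
uniques, ids = np.unique(labels, return_inverse=True)
one_hot = np.eye(len(uniques))[ids]

# Decode: the position of the 1 in each row is the id into uniques.
recovered = uniques[one_hot.argmax(axis=1)]
print(recovered)
```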
