import numpy as np
from keras.utils import np_utils
nsample = 100
sample_space = ["HOME","DRAW","AWAY"]
array = np.random.choice(sample_space, nsample)
uniques, coded_id = np.unique(array, return_inverse=True)
coded_array = np_utils.to_categorical(coded_id)
Example input:
['AWAY', 'HOME', 'DRAW', 'AWAY', ...]
Output coded_array:
[[ 0. 1. 0.]
[ 0. 0. 1.]
[ 0. 0. 1.]
...,
[ 0. 0. 1.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
How do I reverse the process and get the original data back from coded_array?
You can use np.argmax to recover those ids, and then indexing into uniques with them gives you back the original array. The implementation would look like this -
uniques[y_code.argmax(1)]
Sample run -
In [44]: arr
Out[44]: array([5, 7, 3, 2, 4, 3, 7])
In [45]: uniques, ids = np.unique(arr, return_inverse=True)
In [46]: y_code = np_utils.to_categorical(ids, len(uniques))
In [47]: uniques[y_code.argmax(1)]
Out[47]: array([5, 7, 3, 2, 4, 3, 7])
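Applied to the question's own variables, a minimal sketch (assuming the setup at the top of this post) would be:
decoded = uniques[coded_array.argmax(axis=1)]
# decoded is again an array of 'HOME' / 'DRAW' / 'AWAY' strings,
# in the same order as the original `array`
print((decoded == array).all())  # True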
Related
I have three lists, from which I can build a numpy array with shape (3, 2, 4).
However, what I want is a numpy array with shape (2, 3, 4), where each index along the second axis corresponds to one of the input lists ("variables").
I tried using np.reshape with no success.
import numpy as np
p1 = [[1, 1, 1, 1], [1, 1, 1, 1]]
p2 = [[0, 0, 0, 0], [0, 0, 0, 0]]
p3 = [[2, 2, 2, 2], [2, 2, 2, 2]]
points = [p1, p2, p3]
nparr = np.array(points, dtype=np.float32)
What I obtain is
[[[1. 1. 1. 1.]
[1. 1. 1. 1.]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[2. 2. 2. 2.]
[2. 2. 2. 2.]]]
But what I would like to obtain is
[[[1. 1. 1. 1.]
  [0. 0. 0. 0.]
  [2. 2. 2. 2.]]
 [[1. 1. 1. 1.]
  [0. 0. 0. 0.]
  [2. 2. 2. 2.]]]
Is there a clean way to achieve this without having to change the original input lists?
nparr = np.array(list(zip(p1, p2, p3)), dtype=np.float32)
or
nparr = np.squeeze(np.array([list(zip(p1, p2, p3))], dtype=np.float32))
You can just transpose the first two axes using:
nparr.transpose(1, 0, 2)
Since transpose returns a view of the array with modified strides, this operation is very cheap. However, on big arrays it may be better to copy the view with .copy() so as to work on a contiguous array afterwards.
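A quick check of the transpose approach on the nparr from the question (my own snippet, not part of the original answer):
nparr_t = nparr.transpose(1, 0, 2)   # view: swaps the list axis and the row axis
print(nparr_t.shape)                 # (2, 3, 4)
print(nparr_t[0])                    # rows come from p1, p2 and p3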
Given a nested list with an unequal number of elements per sublist, I would like to find the fastest way to calculate the product of the Cartesian product along the last axis. In other words: first take the Cartesian product of all sublists, then compute the multiplicative product of each combination. Finally, I want to insert those values into a matrix with the same dimensionality as the original input. As an added piece of complexity, I want to pad axes of shape (1,) with an extra 0. For example:
example1 = [[1, 2], [3, 4], [5], [6], [7]]
should result in
[[[[[ 630. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[ 840. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]
[[[[1260. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[1680. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]]
which has a shape (2, 2, 2, 2, 2), although it would be (2, 2, 1, 1, 1) without padding.
My initial function is:
import itertools
import numpy as np

def convert_nest_to_product_tensor(nest):
    # find indices to collect elements from
    combinations = list(itertools.product(*[range(len(l)) for l in nest]))
    # collect elements and calculate the product for every Cartesian combination
    products = np.array(
        [np.prod([nest[i][idx] for i, idx in enumerate(comb)]) for comb in combinations]
    )
    # pad tensor for axes of shape 1
    tensor_shape = [len(l) for l in nest]
    tensor_shape = tuple(axis_shape + 1 if axis_shape == 1 else axis_shape for axis_shape in tensor_shape)
    tensor = np.zeros(tensor_shape)
    # insert values
    for i, idx in enumerate(combinations):
        tensor[idx] = products[i]
    return tensor
However, it takes a while, specifically the part where I compute the product over each Cartesian combination. I tried replacing that component using np.meshgrid + np.stack:
products = np.stack(np.meshgrid(*nest), axis=-1).reshape(-1, len(nest))
products = np.prod(products, axis=-1)
and while I get the correct values much faster, they are not in the correct output order:
[[[[[ 630. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[1260. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]
[[[[ 840. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]
[[[1680. 0.]
[ 0. 0.]]
[[ 0. 0.]
[ 0. 0.]]]]]
Any feedback on how to make this work (quickly) is much appreciated!
A simple way of getting the Cartesian tuples and their products:
In [10]: alist = list(itertools.product(*example1))
In [11]: alist
Out[11]: [(1, 3, 5, 6, 7), (1, 4, 5, 6, 7), (2, 3, 5, 6, 7), (2, 4, 5, 6, 7)]
In [12]: [np.prod(x) for x in alist]
Out[12]: [630, 840, 1260, 1680]
Or use math.prod for a no-numpy solution.
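As a side note (my own addition, not part of this answer): the ordering problem in the question's meshgrid attempt comes from the default indexing='xy', which swaps the first two axes. With indexing='ij' the flattened products come out in the same order as itertools.product:
import numpy as np
example1 = [[1, 2], [3, 4], [5], [6], [7]]
grids = np.meshgrid(*example1, indexing='ij')               # 'ij' keeps the axis order
stacked = np.stack(grids, axis=-1).reshape(-1, len(example1))
print(np.prod(stacked, axis=-1))                            # [ 630  840 1260 1680]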
If I want to do an outer product of 2 vectors to create a 2D matrix, where each element is the product of the two respective elements in the original vectors:
b = np.arange(5).reshape((1, 5))
a = np.arange(5).reshape((5, 1))
a * b
array([[ 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4],
[ 0, 2, 4, 6, 8],
[ 0, 3, 6, 9, 12],
[ 0, 4, 8, 12, 16]])
I want the same for 3 (or for n) vectors.
An equivalent version with explicit loops (no broadcasting):
a = np.arange(5)
b = np.arange(5)
c = np.arange(5)
res = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for ia in range(len(a)):
    for ib in range(len(b)):
        for ic in range(len(c)):
            res[ia, ib, ic] = a[ia] * b[ib] * c[ic]
print(res)
out:
[[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 1. 2. 3. 4.]
[ 0. 2. 4. 6. 8.]
[ 0. 3. 6. 9. 12.]
[ 0. 4. 8. 12. 16.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 2. 4. 6. 8.]
[ 0. 4. 8. 12. 16.]
[ 0. 6. 12. 18. 24.]
[ 0. 8. 16. 24. 32.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 3. 6. 9. 12.]
[ 0. 6. 12. 18. 24.]
[ 0. 9. 18. 27. 36.]
[ 0. 12. 24. 36. 48.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 4. 8. 12. 16.]
[ 0. 8. 16. 24. 32.]
[ 0. 12. 24. 36. 48.]
[ 0. 16. 32. 48. 64.]]]
How to do this with numpy [no for loops]?
Also, how to do this for a general function, not necessarily *?
NumPy provides you with np.outer() for computing the outer product.
It is a less powerful version of the following, more versatile approaches:
ufunc.outer()
np.tensordot()
np.einsum()
np.einsum() is the only one capable of handling more than two input arrays:
import numpy as np


def prod(items, start=1):
    for item in items:
        start = start * item
    return start


a = np.arange(5)
b = np.arange(5)
c = np.arange(5)

r0 = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for ia in range(len(a)):
    for ib in range(len(b)):
        for ic in range(len(c)):
            r0[ia, ib, ic] = a[ia] * b[ib] * c[ic]

r1 = prod([a[:, None, None], b[None, :, None], c[None, None, :]])
# same as: r1 = a[:, None, None] * b[None, :, None] * c[None, None, :]
# same as: r1 = a.reshape(-1, 1, 1) * b.reshape(1, -1, 1) * c.reshape(1, 1, -1)
print(np.all(r0 == r1))
# True

r2 = np.einsum('i,j,k->ijk', a, b, c)
print(np.all(r0 == r2))
# True

# as per @hpaulj's suggestion
r3 = prod(np.ix_(a, b, c))
print(np.all(r0 == r3))
# True
Of course, the broadcasting approach (which is the same as the one you used with the array.reshape() version of your code, just with a slightly different syntax for providing the correct shape) can be automated by building the slicing (or, equivalently, the array.reshape() parameters) explicitly.
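A minimal sketch of that automation (the helper name outer_n is mine, not from the answer): build the reshape parameters programmatically for any number of 1-D inputs.
import numpy as np
from functools import reduce

def outer_n(*vectors):
    n = len(vectors)
    # give the i-th vector shape (1, ..., -1, ..., 1) so broadcasting does the rest
    shaped = [v.reshape([-1 if j == i else 1 for j in range(n)])
              for i, v in enumerate(vectors)]
    return reduce(np.multiply, shaped)

a = b = c = np.arange(5)
print(outer_n(a, b, c).shape)  # (5, 5, 5)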
In [166]: a = np.arange(2)
...: b = np.arange(3)
...: c = np.arange(4)
As shown in comments and answer:
In [167]: R = np.einsum('i,j,k',a,b,c)
We can also use np.ix_ to construct arrays that broadcast against each other. This is often used to construct block indexing arrays, but it works here as well:
In [168]: A,B,C = np.ix_(a,b,c)
In [169]: A,B,C
Out[169]:
(array([[[0]],
[[1]]]),
array([[[0],
[1],
[2]]]),
array([[[0, 1, 2, 3]]]))
In [170]: R1 = A*B*C
testing:
In [171]: np.allclose(R,R1)
Out[171]: True
That broadcasted product can be done in one line with:
In [172]: np.prod(np.array(np.ix_(a,b,c),object)).shape
Out[172]: (2, 3, 4)
Without that explicit object dtype casting I get a future warning about creating a ragged array.
np.meshgrid(a,b,c, sparse=True, indexing='ij') is an alternative to ix_.
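A quick hedged check of that meshgrid alternative (my own snippet, reusing a, b, c and R from above):
A2, B2, C2 = np.meshgrid(a, b, c, sparse=True, indexing='ij')
print(A2.shape, B2.shape, C2.shape)   # (2, 1, 1) (1, 3, 1) (1, 1, 4)
print(np.allclose(A2 * B2 * C2, R))   # True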
While these ix_-style expressions are nice, you should become thoroughly comfortable using:
a[:, None, None] * b[None, :, None] * c[None, None, :]
This kind of dimension expansion gives you the most power and flexibility.
The simplest approach (from this answer) is to use:
functools.reduce(np.multiply.outer, (a, b, c))
This works for any number of dimensions, and unlike np.prod(np.ix_(...)) it does not result in numpy deprecation warnings about introducing jagged arrays.
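A short hedged usage sketch of that reduce-based approach (and, since the question also asks about functions other than *, the same pattern with another ufunc's .outer):
import numpy as np
from functools import reduce

a = b = c = np.arange(5)
r_mul = reduce(np.multiply.outer, (a, b, c))   # outer product, shape (5, 5, 5)
r_add = reduce(np.add.outer, (a, b, c))        # outer sum instead of product
print(r_mul.shape, r_add.shape)                # (5, 5, 5) (5, 5, 5)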
I have 2 arrays, x and y:
x = [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 0. 3. 6.]]
y = [[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
I want a z matrix, as: z = [y[0], x, y[1], y[2]]:
[[ 1. 1. 2. 3. 4. 0. 0.]
[ 0. 5. 6. 7. 8. 1. 0.]
[ 0. 9. 0. 3. 6. 0. 1.]]
So I made this code:
z = np.c_[y[0], x]
for j in range(n):
    z = np.c_[x, y[j]]
But it is not saving the intermediate results; my resulting z is just the output of the last operation:
[[ 1. 2. 3. 4. 0.]
[ 5. 6. 7. 8. 0.]
[ 9. 0. 3. 6. 1.]]
How could I save the changes made to the matrix? I also tried numpy.append() in the same way, but it gives an error message:
ValueError: all the input arrays must have same number of dimensions
Using np.c_ to stack columns of y and x:
np.c_[np.array(y)[0],np.asanyarray(x),np.array(y)[1],np.array(y)[2]]
Out[536]:
array([[1, 1, 2, ..., 4, 0, 0],
[0, 5, 6, ..., 8, 1, 0],
[0, 9, 0, ..., 6, 0, 1]])
Or you can use np.roll to shift the columns before stacking them and shift again afterwards.
np.roll(np.c_[np.array(x),np.roll(np.array(y),-1,axis=1)],1,axis=1)
Out[549]:
array([[1, 1, 2, ..., 4, 0, 0],
[0, 5, 6, ..., 8, 1, 0],
[0, 9, 0, ..., 6, 0, 1]])
I think the function you are looking for is numpy.insert(arr, pos, values, axis=1). With z = np.insert(y, [1], x, axis=1) the columns of x are inserted into y starting at column index 1, and the result is returned as a new array z (note the position given as the list [1]; with a scalar position, NumPy broadcasts the values differently for a multi-column insert).
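For comparison, a small sketch that builds the same z in one call with np.hstack (assuming x and y are ndarrays as printed above):
import numpy as np

x = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.], [9., 0., 3., 6.]])
y = np.eye(3)
z = np.hstack([y[:, :1], x, y[:, 1:]])   # first column of y, then x, then the remaining columns of y
print(z)
# [[1. 1. 2. 3. 4. 0. 0.]
#  [0. 5. 6. 7. 8. 1. 0.]
#  [0. 9. 0. 3. 6. 0. 1.]]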
I searched the net for a guide to SciPy sparse matrices and failed to find one. I would be happy if anybody could share a source, but for now, on to the question:
I have an array of tuples. I want to turn it into a sparse matrix where the tuples sit on the main diagonal and the diagonal just beside it, as the following example shows. What is a fancy (efficient) way of doing it?
import numpy as np

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
B = np.zeros((A.shape[0], A.shape[0] + 1))
for i in range(A.shape[0]):
    B[i, i] = A[i, 0]
    B[i, i + 1] = A[i, 1]
print(B)
Output being:
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
You can build those really fast as a CSR matrix:
>>> A = np.asarray([[1,2],[3,4],[5,6],[7,8]])
>>> rows = len(A)
>>> cols = rows + 1
>>> data = A.flatten() # we want a copy
>>> indptr = np.arange(0, len(data)+1, 2) # 2 non-zero entries per row
>>> indices = np.repeat(np.arange(cols), [1] + [2] * (cols-2) + [1])
>>> import scipy.sparse as sps
>>> a_sps = sps.csr_matrix((data, indices, indptr), shape=(rows, cols))
>>> a_sps.A
array([[1, 2, 0, 0, 0],
[0, 3, 4, 0, 0],
[0, 0, 5, 6, 0],
[0, 0, 0, 7, 8]])
Try diags from scipy:
import numpy as np
import scipy.sparse

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
B = scipy.sparse.diags([A[:, 0], A[:, 1]], [0, 1], shape=(4, 5))
When I print B.todense(), it gives me
[[ 1. 2. 0. 0. 0.]
[ 0. 3. 4. 0. 0.]
[ 0. 0. 5. 6. 0.]
[ 0. 0. 0. 7. 8.]]
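A hedged generalization (my own wrapper, not from the answer; the helper name tuples_to_banded is hypothetical) for any (n, 2) array of tuples:
import numpy as np
import scipy.sparse

def tuples_to_banded(A):
    # column 0 of A goes on the main diagonal, column 1 on the first superdiagonal
    n = A.shape[0]
    return scipy.sparse.diags([A[:, 0], A[:, 1]], [0, 1], shape=(n, n + 1))

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
print(tuples_to_banded(A).toarray())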