Vectorised assignment to tensor - python

I'd like to assign multiple values to a tensor, but it seems that it's not supported at least in the way that is possible using numpy.
a = np.zeros((4, 4))
v = np.array([0, 2, 3, 1])
r = np.arange(4)
a[r, v] = 1
>>> a
array([[1., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.]])
The above works, but the tensorflow equivalent doesn't:
import tensorflow as tf
a = tf.zeros((4, 4))
v = tf.Variable([0, 2, 3, 1])
r = tf.range(4)
a[r, v].assign(1)
TypeError: Only integers, slices, ellipsis, tf.newaxis and scalar tensors are valid indices, got <tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3])>
How could this be achieved? Are loops the only option? In my case the resulting array is indeed only slices of an identity matrix rearranged, so maybe that could be taken advantage of somehow.

Your example, which sets selected entries of a zero tensor to a given value, is most commonly handled with tf.scatter_nd:
idx = tf.stack([r,v],axis=-1)
tf.scatter_nd(idx, updates=tf.ones(4), shape=(4,4))
For more complex cases, you can look at the following functions:
tf.tensor_scatter_nd_add: Adds sparse updates to an existing tensor according to indices.
tf.tensor_scatter_nd_sub: Subtracts sparse updates from an existing tensor according to indices.
tf.tensor_scatter_nd_max: to copy element-wise maximum values from one tensor to another.
tf.tensor_scatter_nd_min: to copy element-wise minimum values from one tensor to another.
tf.tensor_scatter_nd_update: Scatter updates into an existing tensor according to indices.
You can read more in the guide: Introduction to tensor slicing
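As a minimal sketch for the exact case in the question (assuming TensorFlow 2.x with eager execution; the names `updated` and `one_hot` are illustrative and not from the original post), tf.tensor_scatter_nd_update writes into an existing tensor. Because the target here is just an identity matrix with its rows rearranged, tf.one_hot should give the same result:

```python
import tensorflow as tf

a = tf.zeros((4, 4))
v = tf.constant([0, 2, 3, 1])  # column to set in each row (a constant here, by assumption)
r = tf.range(4)                # row indices 0..3

# One (row, column) pair per update, shape (4, 2)
idx = tf.stack([r, v], axis=-1)

# Returns a new tensor; `a` itself is left unchanged
updated = tf.tensor_scatter_nd_update(a, idx, tf.ones(4))

# Row i of the result is the one-hot encoding of v[i]
one_hot = tf.one_hot(v, depth=4)
```

Neither call needs a Python loop, and both return a new tensor rather than modifying `a` in place.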

Related

Using np.diag() to construct a 3D array [duplicate]

This question already has answers here:
What's the best way to create a "3D identity matrix" in Numpy?
(3 answers)
Closed 2 years ago.
The standard usage of the np.diag(a) function when given a 1D array a is to create a 2D array with the diagonal entries being the elements of a. In my case, a is a 2D array with size n x m. My goal is to generate an n x n x m array in a manner similar to the np.diag() function, where each of the m slices of shape n x n is a matrix of zeros with the corresponding column of a on its diagonal. What is the best way of doing this? Clearly it can be done with the np.diag() function and a for loop, but I am wondering whether a vectorized version of this exists with numpy.
One way to accomplish this is to use the function np.broadcast_to, which broadcasts a given array to a new shape. I had trouble broadcasting the m dimension to the end of the array, but broadcasting it as the first dimension and then transposing along the first and last dimensions also seemed to work just fine.
Please see the code snippet below:
import numpy as np
# Specify dimensions
n = 4
m = 3
# Create the n x n identity matrix
D = np.eye(n)
# Broadcast to shape (m, n, n), then swap the first and last axes
B = np.transpose(np.broadcast_to(D, (m,) + D.shape), (2, 1, 0))
# Verify shape
print(B.shape)
# (4, 4, 3)
# Verify a slice
print(B[:, :, 0])
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]
Hope this helps!
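The snippet above stacks copies of the identity matrix. To place the values of a 2D array a on the diagonals, as the question asks, one possible sketch uses np.einsum. The example input `a` and the name `out` are assumptions made for illustration:

```python
import numpy as np

n, m = 4, 3
a = np.arange(n * m, dtype=float).reshape(n, m)  # example n x m input (assumed)

# out[i, j, k] = eye[i, j] * a[i, k], so out[:, :, k] equals np.diag(a[:, k])
out = np.einsum("ij,ik->ijk", np.eye(n), a)

print(out.shape)  # (4, 4, 3)
```

Each slice out[:, :, k] then carries column k of a on its diagonal and zeros elsewhere.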

How to multiply diagonal elements by each other using numpy?

For the purpose of this exercise, let's consider a matrix where the element m_{i, j} is given by the rule m_{i, j} = i*j if i == j and 0 else.
Is there an easy "numpy" way of calculating such a matrix without having to resort to if statements checking for the indices?
You can use the numpy function diag to construct a diagonal matrix if you give it the intended diagonal as a 1D array as input.
So you just need to create that, like [i**2 for i in range (N)] with N the dimension of the matrix.
You could use the identity matrix given by numpy.identity(n) and then multiply it by a n dimensional vector.
Assuming you have a square matrix, you can do this:
import numpy as np
ary = np.zeros((4, 4))
# Set each diagonal element (i, i) to i**2
for i in range(ary.shape[0]):
    ary[i, i] = i**2
print(ary)
# [[0. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 4. 0.]
#  [0. 0. 0. 9.]]
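The np.diag and identity-matrix suggestions above can be sketched without any explicit loop. The names `N`, `m_diag` and `m_eye` are illustrative only:

```python
import numpy as np

N = 4
i = np.arange(N)

# np.diag places a 1D array on the diagonal of an otherwise zero matrix
m_diag = np.diag(i * i)

# Equivalent values: scale row i of the identity matrix by i*i
m_eye = np.identity(N) * (i * i)[:, None]

print(m_diag)
```

The np.diag version keeps the integer dtype of the input, while the identity-based version yields floats.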

Create 2D matrices from several csv files

I'm working with Python3 and I would like to load data from several CSV files.
Each CSV (one measurement) has 3 columns (3 different physical quantities). I want to load each quantity on 3 separate variables. For one CSV file this is quite simple, I used :
TIME,CH1,CH2 = loadtxt(file_path,usecols=(3,4,5),delimiter=',',skiprows=2,unpack=True)
and it worked fine. Now I would like to extend this procedure so I can load several CSV files. Each array would be 2D, each column representing one CSV file. Instead of having several CSV with three variables, I will have 3 2D arrays, which is much more convenient for data analysis.
I thought I could try something like this :
TIME = matrix(zeros((20480,len(file_path)))) # 20480 length of each column
CH1 = matrix(zeros((20480,len(file_path)))) # len(file_path) number of CSV files
CH2 = matrix(zeros((20480,len(file_path))))
for k in range(0,len(file_path)): # reading each CSV file
    TIME[:,k],CH1[:,k],CH2[:,k] = loadtxt(file_path[k],usecols=(3,4,5),delimiter=',',skiprows=2,unpack=True)
But it's telling me :
ValueError: could not broadcast input array from shape (20480) into shape (20480,1)
In the end I would like variables looking like this :
TIME = matrix([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
Each column is from one different CSV file.
I think this is quite a common problem, but I don't really get how arrays work in Python. I got this idea from Matlab, where it is quite straightforward, but here I don't know why indexing arrays with TIME[:][:] doesn't work.
Have you any idea how I could do this ?
Thanks.
Use np.array, not np.matrix
I can't emphasize this enough. np.matrix exists only for legacy reasons. See this answer for an explanation of the difference. An np.matrix is always 2-dimensional, so indexing one of its columns returns shape (m, 1), whereas indexing a column of an np.array returns a 1-dimensional array of shape (m,). This seems to be the source of your error.
Here's a minimal example exhibiting the behaviour you are seeing:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.matrix(A)
print(A[:, 0].shape) # (2,)
print(B[:, 0].shape) # (2, 1)
Therefore, define your resultant arrays as np.array objects:
m = 20480
n = len(file_path)
shape = (m, n)
TIME = np.zeros(shape)
CH1 = np.zeros(shape)
CH2 = np.zeros(shape)
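Putting the pieces together, the loading step might look like the sketch below. It assumes every file has the same number of data rows. The helper name `load_measurements` and the in-memory stand-in files are hypothetical, used only so the example runs on its own:

```python
import io
import numpy as np

def load_measurements(sources, usecols=(3, 4, 5), skiprows=2):
    """Return (TIME, CH1, CH2); column k of each array comes from sources[k]."""
    per_file = [
        np.loadtxt(src, usecols=usecols, delimiter=",", skiprows=skiprows, unpack=True)
        for src in sources
    ]
    # Each entry has shape (3, n_rows); stacking on a new last axis
    # gives (3, n_rows, n_files), which unpacks into three 2D arrays.
    TIME, CH1, CH2 = np.stack(per_file, axis=-1)
    return TIME, CH1, CH2

# Stand-in "files" for illustration; real code would pass the file_path list
fake_a = io.StringIO("header\nheader\n0,0,0,1,2,3\n0,0,0,4,5,6\n")
fake_b = io.StringIO("header\nheader\n0,0,0,7,8,9\n0,0,0,10,11,12\n")
TIME, CH1, CH2 = load_measurements([fake_a, fake_b])
print(TIME.shape)  # (2, 2)
```

Stacking once at the end avoids preallocating with a hard-coded row count such as 20480, at the cost of holding the per-file arrays in a list first.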

subsetting affects .view(np.float64) behaviour

I'm trying to use some sklearn estimators for classification on the coefficients of a fast Fourier transform (technically a Discrete Fourier Transform). I obtain a numpy array X_c as output of np.fft.fft(X) and I want to transform it into a real numpy array X_r, with each (complex) column of the original X_c transformed into two (real/float) columns in X_r, i.e. the shape goes from (r, c) to (r, 2c). So I use .view(np.float64) and it works at first.
The problem is that if I first decide to keep only some coefficients of the original complex array with X_c2 = X_c[:, range(3)] and then do the same thing as before, instead of having the number of columns doubled, I obtain the number of rows doubled (the imaginary part of each element is put in a new row below the original).
I really don't understand why this happens.
To make myself clearer, here is a toy example:
import numpy as np
# I create a complex array
X_c = np.arange(8, dtype = np.complex128).reshape(2, 4)
print(X_c.shape) # -> (2, 4)
# I use .view to transform it into something real and it works
# the way I want it.
X_r = X_c.view(np.float64)
print(X_r.shape) # -> (2, 8)
# Now I subset the array.
indices_coef = range(3)
X_c2 = X_c[:, indices_coef]
print(X_c2.shape) # -> (2, 3)
X_r2 = X_c2.view(np.float64)
# In the next line I obtain (4, 3), when I was expecting (2, 6)...
print(X_r2.shape) # -> (4, 3)
Does anyone see a reason for this difference of behavior?
I get a warning:
In [5]: X_c2 = X_c[:,range(3)]
In [6]: X_c2
Out[6]:
array([[ 0.+0.j, 1.+0.j, 2.+0.j],
[ 4.+0.j, 5.+0.j, 6.+0.j]])
In [7]: X_c2.view(np.float64)
/usr/local/bin/ipython3:1: DeprecationWarning: Changing the shape of non-C contiguous array by
descriptor assignment is deprecated. To maintain
the Fortran contiguity of a multidimensional Fortran
array, use 'a.T.view(...).T' instead
#!/usr/bin/python3
Out[7]:
array([[ 0., 1., 2.],
[ 0., 0., 0.],
[ 4., 5., 6.],
[ 0., 0., 0.]])
In [12]: X_c2.strides
Out[12]: (16, 32)
In [13]: X_c2.flags
Out[13]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
So this copy (or is it a view?) is in Fortran order. The recommended X_c2.T.view(float).T produces the same 4x3 array without the warning.
As your first view shows, a complex array has the same data layout as twice the number of floats.
I've seen funny shape behavior when trying to view a structured array. I'm wondering whether the complex dtype is behaving much like a dtype('f8,f8') array.
If I change your X_c2 so it is a copy, I get the expected behavior:
In [19]: X_c3 = X_c[:,range(3)].copy()
In [20]: X_c3.flags
Out[20]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [21]: X_c3.strides
Out[21]: (48, 16)
In [22]: X_c3.view(float)
Out[22]:
array([[ 0., 0., 1., 0., 2., 0.],
[ 4., 0., 5., 0., 6., 0.]])
That's reassuring. But I'm puzzled as to why the [:, range(3)] indexing creates an F-order view. That should be advanced indexing.
And indeed, a true slice does not allow this view:
In [28]: X_c[:,:3].view(np.float64)
---------------------------------------------------------------------------
ValueError: new type not compatible with array.
So the range indexing has created some sort of hybrid object.
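To sidestep the memory-layout question entirely, one option is to build the real array from the .real and .imag parts instead of reinterpreting bytes. This is a sketch, and the helper name `complex_to_real_columns` is an assumption, not something from the question:

```python
import numpy as np

def complex_to_real_columns(x):
    """Return a float array of shape (r, 2c) with real/imag parts interleaved."""
    # Stacking on a new last axis gives shape (r, c, 2); reshape merges the last two axes
    return np.stack((x.real, x.imag), axis=-1).reshape(x.shape[0], -1)

X_c = np.arange(8, dtype=np.complex128).reshape(2, 4)
X_c2 = X_c[:, range(3)]

X_r2 = complex_to_real_columns(X_c2)
print(X_r2.shape)  # (2, 6)

# Alternative: force C order first, then reinterpret the bytes
X_r2_view = np.ascontiguousarray(X_c2).view(np.float64)
```

Both variants should give the (2, 6) layout regardless of whether the subset came back in C or Fortran order. The first always copies, while the second copies only when the input is not already C-contiguous.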

How to keep numpy from broadcasting when creating an object array of different shaped arrays

I try to store a list of different shaped arrays as a dtype=object array using np.save (I'm aware I could just pickle the list but I'm really curious how to do this).
If I do this:
import numpy as np
np.save('test.npy', [np.zeros((2, 2)), np.zeros((3,3))])
it works.
But this:
np.save('test.npy', [np.zeros((2, 2)), np.zeros((2,3))])
Gives me an error:
ValueError: could not broadcast input array from shape (2,2) into shape (2)
I guess np.save converts the list into an array first, so I tried:
x=np.array([np.zeros((2, 2)), np.zeros((3,3))])
y=np.array([np.zeros((2, 2)), np.zeros((2,3))])
Which has the same effect (the first one works, the second one doesn't).
The resulting x behaves as expected:
>>> x.shape
(2,)
>>> x.dtype
dtype('O')
>>> x[0].shape
(2, 2)
>>> x[0].dtype
dtype('float64')
I also tried to force the 'object' dtype:
np.array([np.zeros((2, 2)), np.zeros((2,3))], dtype=object)
Without success. It seems numpy tries to broadcast the array with equal first dimension into the new array and realizes too late that their shape is different. Oddly it seems to have worked at one point - so I'm really curious what the difference is, and how to do this properly.
EDIT:
I figured out the case it worked before: The only difference seems to be that the numpy arrays in the list have another data type.
It works with dtype('<f8'), but it doesn't with dtype('float64'), I'm not even sure what the difference is.
EDIT 2:
I found a very non-pythonic way to solve my issue, I add it here, maybe it helps to understand what I wanted to do:
array_list = [np.zeros((2, 2)), np.zeros((2, 3))]
save_array = np.empty((len(array_list),), dtype=object)
for idx, arr in enumerate(array_list):
    save_array[idx] = arr
np.save('test.npy', save_array)
One of the first things that np.save does is
arr = np.asanyarray(arr)
So yes it is trying to turn your list into an array.
Constructing an object array from arbitrary sized arrays or lists is tricky. np.array(...) tries to create as high a dimensional array as it can, even attempting to concatenate the inputs if possible. The surest way is to do what you did - make the empty array and fill it.
A slightly more compact way of constructing the object array:
In [21]: alist = [np.zeros((2, 2)), np.zeros((2,3))]
In [22]: arr = np.empty(len(alist), dtype=object)
In [23]: arr[:] = alist
In [24]: arr
Out[24]:
array([array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.]])], dtype=object)
Here are 3 scenarios:
Arrays that match in shape, combine into a 3d array:
In [27]: np.array([np.zeros((2, 2)), np.zeros((2,2))])
Out[27]:
array([[[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.]]])
In [28]: _.shape
Out[28]: (2, 2, 2)
Arrays that don't match on the first dimension - create object array
In [29]: np.array([np.zeros((2, 2)), np.zeros((3,2))])
Out[29]:
array([array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])], dtype=object)
In [30]: _.shape
Out[30]: (2,)
And an awkward intermediate case (which may even be described as a bug): the first dimensions match, but the second ones don't:
In [31]: np.array([np.zeros((2, 2)), np.zeros((2,3))])
...
ValueError: could not broadcast input array from shape (2,2) into shape (2)
It's as though it initialized a (2,2,2) array, and then found that the (2,3) array wouldn't fit. And the current logic doesn't allow it to back up and create the object array as it did in the previous scenario.
If you wanted to put the two (2,2) arrays in an object array, you'd have to use the create-and-fill logic.
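The create-and-fill logic can be wrapped in a small helper. This is a sketch, and `to_object_array` is a hypothetical name rather than a NumPy function. It fills one element at a time, so it behaves the same whether or not the shapes happen to match:

```python
import os
import tempfile
import numpy as np

def to_object_array(arrays):
    """Wrap each array in `arrays` as one element of a 1D object array."""
    out = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        out[i] = arr  # integer indexing stores the array itself, no broadcasting
    return out

ragged = to_object_array([np.zeros((2, 2)), np.zeros((2, 3))])
same_shape = to_object_array([np.zeros((2, 2)), np.zeros((2, 2))])
print(ragged.shape, same_shape.shape)  # (2,) (2,)

# Object arrays are pickled by np.save, so loading needs allow_pickle=True
path = os.path.join(tempfile.mkdtemp(), "test.npy")
np.save(path, ragged)
loaded = np.load(path, allow_pickle=True)
```

Since the file is pickled anyway, this has the same trust caveats as pickling the list directly: only load such files from sources you trust.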
