I'd like to assign multiple values to a tensor, but it seems that it's not supported at least in the way that is possible using numpy.
a = np.zeros((4, 4))
v = np.array([0, 2, 3, 1])
r = np.arange(4)
a[r, v] = 1
>>> a
array([[1., 0., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 1., 0., 0.]])
The above works, but the tensorflow equivalent doesn't:
import tensorflow as tf
a = tf.zeros((4, 4))
v = tf.Variable([0, 2, 3, 1])
r = tf.range(4)
a[r, v].assign(1)
TypeError: Only integers, slices, ellipsis, tf.newaxis and scalar tensors are valid indices, got <tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3])>
How could this be achieved? Are loops the only option? In my case the resulting array is indeed only slices of an identity matrix rearranged, so maybe that could be taken advantage of somehow.
Your example, which updates a zero tensor at some indices to a certain value, is most often achieved through tf.scatter_nd:
idx = tf.stack([r, v], axis=-1)
tf.scatter_nd(idx, updates=tf.ones(4), shape=(4,4))
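As an aside, since the question notes that the result is just a rearranged identity matrix, tf.one_hot would also produce it directly - a minimal sketch:
import tensorflow as tf

v = tf.constant([0, 2, 3, 1])
a = tf.one_hot(v, depth=4)  # row i has a 1 at column v[i]; same matrix as above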
For more complex cases, you can look at the following functions (a sketch of the last one follows the list):
tf.tensor_scatter_nd_add: adds sparse updates to an existing tensor according to indices.
tf.tensor_scatter_nd_sub: subtracts sparse updates from an existing tensor according to indices.
tf.tensor_scatter_nd_max: copies element-wise maximum values from one tensor to another.
tf.tensor_scatter_nd_min: copies element-wise minimum values from one tensor to another.
tf.tensor_scatter_nd_update: scatters updates into an existing tensor according to indices.
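For instance, here is a minimal sketch of tf.tensor_scatter_nd_update (the values are made up for illustration):
import tensorflow as tf

base = tf.ones((4, 4))               # an existing, non-zero tensor
idx = tf.constant([[0, 1], [2, 3]])  # (row, col) pairs to overwrite
updates = tf.constant([5.0, 7.0])    # one new value per index pair
result = tf.tensor_scatter_nd_update(base, idx, updates)
# result has 5.0 at [0, 1] and 7.0 at [2, 3]; all other entries remain 1.0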
You can read more in the guide: Introduction to tensor slicing
I want to change a number of values in my pandas dataframe, where the index arrays indicating the columns may vary in size.
I need something that is faster than a for-loop, because it will be done on a lot of rows, and this turned out to be too slow.
As a simple example, consider this
df = pd.DataFrame(np.zeros((5,5)))
Now, I want to change some of the values in this dataframe to 1. If, for example, I want to change the values in the second and fifth row for the first two columns, but in the fourth row I want to change all the values, I want something like this to work:
col_indices = np.array([np.arange(2),np.arange(5),np.arange(2)])
row_indices = np.array([1,3,4])
df.loc[row_indices, col_indices] = 1
However, this does not work (I suspect it fails because the shape of the data you would select does not conform to a dataframe).
Is there any more flexible way of indexing without having to loop over rows etc.?
A solution that works only for range-like arrays (as above) would also solve my current problem - but a general answer would also be nice.
Thanks for any help!
IIUC, here's one approach. Define the column indices as the number of leading columns you want filled with 1s in each row, along with the rows where you want to insert them:
col_indices = np.array([2,5,2])
row_indices = np.array([1,3,4])
arr = df.values
And use advanced indexing with broadcasting to set the cells of interest to 1:
arr[row_indices] = np.arange(arr.shape[1]) < col_indices[:, None]
array([[0., 0., 0., 0., 0.],
       [1., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [1., 1., 0., 0., 0.]])
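Putting it together, a minimal end-to-end sketch (note that df.values/df.to_numpy() may return a copy, so the result is written back explicitly):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((5, 5)))
col_indices = np.array([2, 5, 2])  # number of leading columns to set per row
row_indices = np.array([1, 3, 4])  # rows to update

arr = df.to_numpy()  # may be a copy depending on dtype and pandas version
arr[row_indices] = np.arange(arr.shape[1]) < col_indices[:, None]
df.iloc[:, :] = arr  # write the modified values back into the DataFrame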
I have a bit of code that loads up a long (100k-1mil line) data set; each line has an index in the first column followed by 18 values, for a total of 19 floats per line. All of this is put into a numpy array.
I need to do some simple processing on the matrix: keep the index column and output 1s and 0s depending on whether values are positive or negative, where the criterion alternates because the columns are sequential pairs of values with different reference values.
The code below goes through columns 2-19, first the evens then the odds, checks the values, and then creates a temporary list to put into the array I want to have at the end.
I know there's a simpler way to do this, with list comprehension and possibly lambda, but I'm not proficient enough to figure it out. So I'm hoping someone can help me reduce this code into something more compact. More efficient would be great too, but I know that compact methods don't always increase efficiency. It will, however, help me better understand list comprehension, with and without numpy.
Sample values for reference:
0.000 72.250 -158.622 86.575 -151.153 85.807 -149.803 84.285 -143.701 77.723 -160.471 96.587 -144.020 75.827 -157.071 87.629 -148.856 100.814 -140.488
10.000 56.224 -174.351 108.309 -154.148 68.564 -155.721 83.634 -132.836 75.030 -177.971 100.623 -146.616 61.856 -150.885 92.147 -150.124 91.841 -153.112
20.000 53.357 -153.537 58.190 -160.235 77.575 176.257 93.771 -150.549 77.789 -161.534 103.589 -146.363 73.623 -159.441 99.315 -129.663 92.842 -138.736
And here is the code snippet:
datain = numpy.loadtxt("testfile.txt")  # load data
dataout = numpy.zeros(datain.shape)  # initialize empty processing array
dataout[:, 0] = datain[:, 0]  # assign time values from input data to processing array
dataarray = numpy.zeros(len(datain[0]))
phit = numpy.zeros((len(dataarray) - 1) // 2)
psit = numpy.zeros((len(dataarray) - 1) // 2)
for i in range(len(datain)):
    dataarray = numpy.copy(datain[i])
    phit[:] = dataarray[1::2]
    psit[:] = dataarray[2::2]
    temp = []
    for j in range(len(phit)):
        if phit[j] < 0:
            temp.append(1)
        else:
            temp.append(0)
        if psit[j] > 0:
            temp.append(1)
        else:
            temp.append(0)
    dataout[i][1:] = temp
Thanks in advance, I know there's a fair number of questions on these topics here; unfortunately I couldn't find one that helped me get to a solution.
As #abarnert mentioned, the solution here is not to write better loops, but (since you're using Numpy) to not loop in Python at all by understanding how to use Numpy in more advanced ways.
What you have is a matrix like
[ [idx, v0a, v0b, v1a, v1b, ... ], ... ]
And you want a matrix that's basically
[ [idx, 1 if v0a < 0 else 0, 1 if v0b > 0 else 0, ... ], ... ]
We're going to do this in two steps: first, we'll transform the matrix slightly so that the comparisons are all the same; second, we'll apply the comparison in-place.
The only difference between how we handle "even" and "odd" columns is that one is being checked for <0, the other >0. If we modify the second group of columns by multiplying them by -1, then these comparisons both become simply <0:
datain[:, 2::2] *= -1
Now we just want to know, for every value (besides the first column), is that value <0. This is super easy:
datain[:, 1:] < 0
This returns a matrix of boolean values, where each value represents whether or not the corresponding cell in datain[:, 1:] was less than 0. You want these as integers, 1 for True and 0 for False; it turns out, when we assign these boolean values back into our original array (which contains floats), numpy will cast the bools into floats automatically; True will get cast to 1.0, and False will get cast to 0.0.
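A tiny sketch of that casting behavior (made-up values):
import numpy as np

a = np.zeros(3)
a[:] = np.array([True, False, True])  # bools are cast to floats on assignment
# a is now array([1., 0., 1.])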
If you don't want to throw away your original data, simply copy it off first. Here's the complete code:
# If you want to preserve your old data, create a copy for us to modify
dataout = np.array(datain)
# Now assign your integer values into your data array
dataout[:, 2::2] *= -1
dataout[:, 1:] = dataout[:, 1:] < 0  # compare the (sign-flipped) copy, not the original
For the sample input you provided, the intermediate array after the sign flip looks like this:
array([[   0.   ,   72.25 ,  158.622,   86.575,  151.153,   85.807,
         149.803,   84.285,  143.701,   77.723,  160.471,   96.587,
         144.02 ,   75.827,  157.071,   87.629,  148.856,  100.814,
         140.488],
       [  10.   ,   56.224,  174.351,  108.309,  154.148,   68.564,
         155.721,   83.634,  132.836,   75.03 ,  177.971,  100.623,
         146.616,   61.856,  150.885,   92.147,  150.124,   91.841,
         153.112],
       [  20.   ,   53.357,  153.537,   58.19 ,  160.235,   77.575,
        -176.257,   93.771,  150.549,   77.789,  161.534,  103.589,
         146.363,   73.623,  159.441,   99.315,  129.663,   92.842,
         138.736]])
This code ends up with the following final result:
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.],
       [10.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.],
       [20.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.]])
Thanks to abarnert for pointing me in the right direction; the solution is pretty simple.
datain = numpy.loadtxt("testfile.txt")  # load data
dataout = numpy.empty(datain.shape, dtype=int)  # initialize empty processing array
dataout[:, 0] = datain[:, 0]  # assign time values from input data to processing array
dataout[:, 1::2] = datain[:, 1::2] < 0
dataout[:, 2::2] = datain[:, 2::2] > 0
That's it! Much shorter, much more readable, and gets me the values I want.
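For a self-contained sanity check, here is a sketch using io.StringIO and a shortened, adapted two-row sample (values picked from the data above just for illustration):
import io
import numpy

sample = "0.0 72.25 -158.622 86.575 -151.153\n20.0 53.357 -153.537 77.575 176.257"
datain = numpy.loadtxt(io.StringIO(sample))
dataout = numpy.empty(datain.shape, dtype=int)
dataout[:, 0] = datain[:, 0]
dataout[:, 1::2] = datain[:, 1::2] < 0  # odd columns: 1 where negative
dataout[:, 2::2] = datain[:, 2::2] > 0  # even columns: 1 where positive
# dataout -> [[ 0  0  0  0  0]
#             [20  0  0  0  1]]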
Just to give you some context: I have to translate some MATLAB code into Python 3, but I've run into a little problem.
Matlab:
for i = 1:num_nodes
    for j = 1:num_nodes
        K{i,j} = zeros(3,3);
Which I translated into:
k_topology = [[]]
for i in range(x):
    for i in range(x):
        k_topology[[i][j]].extend(np.zeros(3,3))
Also, further in the Matlab code there's a third loop:
for k = 1:3
    K{i,j}(k,k) = -1
Which also kind of... upsets me?
The fact is, I don't really see how to translate this kind of variable into Python. Also, I guess my Python code is kind of "broken" - and I'm not asking any of you to improve it - so I'm just asking: what is the best way to translate a Matlab cell into Python?
I finally found an apparently simple way to translate this, using a list comprehension - following kazemakase's answer. The actual Python code now looks like this:
k_topology = [[np.zeros((3, 3)) for j in range(self.get_nb_nodes_from_network())]
              for i in range(self.get_nb_nodes_from_network())]
And the output looks something like this:
[[array([[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]]),
  array([[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]]),
  array([[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]])], ..., [array(...)]]
(There are really too many values to paste here, but I think you get the idea.)
The first question you need to ask is "what is a Matlab cell and what could be a suitable corresponding Python type?"
If I remember correctly from my bad old Matlab days, a cell is sort of a container that holds content of mixed types. It is something like a dynamically typed array or matrix. It is multidimensionally indexed.
Python is dynamically typed, so any Python container can basically fulfill this function. Lists in Python are indexed, so nested lists could work - but they are somewhat awkward to set up and access:
K = [[None] * num_nodes for _ in range(num_nodes)]
K[i][j] # need two indices to access elements of a nested list.
For this particular scenario a dictionary better mirrors the Matlab syntax. Although a dictionary takes only one index, we can exploit the fact that tuples can be written without parentheses and that dictionaries can take tuples as keys:
K = {}
for i in range(num_nodes):
    for j in range(num_nodes):
        K[i, j] = np.zeros((3, 3))
        for k in range(3):
            K[i, j][k, k] = -1
While the dictionary is syntactically more concise, element access is potentially less performant than with nested lists, and nested lists look different from the Matlab code. The choice depends on whether you value performance or similarity to the original code; if performance is an issue, there are many more things to consider anyway. In summary: there is no one best way to do it.
Since the OP explicitly asked not to improve the code, I explicitly ask him/her to ignore this part of the answer.
A better way to build the diagonal matrices is to use np.eye instead of looping over the diagonal elements:
K = {}
for i in range(num_nodes):
    for j in range(num_nodes):
        K[i, j] = -np.eye(3)
Also, nested lists can be constructed without (much) prior initialization, if that is the preferred approach:
K = []
for i in range(num_nodes):
    K.append([])
    for j in range(num_nodes):
        K[-1].append(-np.eye(3))
Now, for the peace of my soul, let me provide some feedback on the OP's code (a corrected version follows this list of issues):
k_topology = [[]]
for i in range(x):
    for i in range(x):
        k_topology[[i][j]].extend(np.zeros(3,3))
This has nothing to do with the original Matlab code (different variable names).
Both loops use i; j is never defined.
[[i][j]] builds a list with one element i and tries to take the jth element. If j is ever something other than 0 this will cause an error.
list.extend appends all elements of the argument individually to the list - in this case the individual rows. list.append would be correct to use, as the whole 3x3 matrix should be appended as one element of K.
np.zeros(3, 3) should be np.zeros((3, 3)) (assuming np is an alias for numpy), because the function takes the shape as its first argument, not multiple arguments.
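Putting those fixes together, a corrected (though still loop-based) version of the snippet might look like this, with num_nodes standing in for the undefined x:
import numpy as np

num_nodes = 4  # stand-in for the undefined x in the original snippet
k_topology = []
for i in range(num_nodes):
    row = []
    for j in range(num_nodes):
        row.append(np.zeros((3, 3)))  # append each 3x3 block as one element
    k_topology.append(row)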
Using the Octave/scipy save/loadmat that I demonstrated in the linked post:
In an Octave session
>> num_nodes=3
num_nodes = 3
>> num_nodes=3;
>> K=cell(num_nodes, num_nodes);
>> for i = 1:num_nodes
     for j = 1:num_nodes
       K{i,j} = zeros(2,2);
     end
   end
>> K
K =
{
  [1,1] =

     0   0
     0   0

  [2,1] =

     0   0
     0   0

etc
Access one cell:
>> K{1,2}
ans =
0 0
0 0
Access one element of one cell:
>> K{1,2}(1,1)
ans = 0
>> save -7 kfile.mat K
In Python
In [31]: from scipy import io
In [32]: data = io.loadmat('kfile.mat')
In [34]: data
Out[34]:
{'K': array([[array([[ 0.,  0.],
                     [ 0.,  0.]]),
              array([[ 0.,  0.],
                     [ 0.,  0.]]),
              array([[ 0.,  0.],
                     [ 0.,  0.]])],
             [array([[ 0.,  0.],
                     [ 0.,  0.]]),
              array([[ 0.,  0.],
                     [ 0.,  0.]]),
              array([[ 0.,  0.],
                     [ 0.,  0.]])],
             [array([[ 0.,  0.],
                     [ 0.,  0.]]),
              array([[ 0.,  0.],
                     [ 0.,  0.]]),
              array([[ 0.,  0.],
                     [ 0.,  0.]])]], dtype=object),
 '__globals__': [],
 '__header__': b'MATLAB 5.0 MAT-file, written by Octave 4.0.0, 2017-02-15 19:05:44 UTC',
 '__version__': '1.0'}
In [35]: data['K'].shape
Out[35]: (3, 3)
In [36]: data['K'][0,0].shape
Out[36]: (2, 2)
In [37]: data['K'][0,0][0,0]
Out[37]: 0.0
loadmat treats a cell as a 2d object-dtype array, while regular matrices are loaded as 2d numeric arrays. Object arrays are, in many ways, like nested Python lists.
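For example, assuming data from the loadmat call above, the object array can be turned into actual nested lists with .tolist():
K_list = data['K'].tolist()  # nested Python lists whose leaves are 2x2 arrays
K_list[0][1]                 # corresponds to K{1,2} in Octave
K_list[0][1][0, 0]           # corresponds to K{1,2}(1,1)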
I wish to initialize a symmetric matrix in Python and populate it with zeros.
At the moment, I have initialized an array of known dimensions, but this is unsuitable for subsequent input into R as a distance matrix.
Are there any 'simple' methods in numpy to create a symmetric matrix?
Edit
I should clarify - creating the 'symmetric' matrix is fine. However, I am interested in generating only the lower triangular form, i.e.,
ar = numpy.zeros((3, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
I want:
array([[ 0],
       [ 0,  0],
       [ 0., 0., 0.]])
Is this possible?
I don't think it's feasible to try to work with that kind of triangular array.
So here, for example, is a straightforward implementation of (squared) pairwise Euclidean distances:
import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)
    q = np.diag(B)[:, None]
    return q + q.T - 2 * B
Performance-wise it's hard to beat (at the Python level). What would be the main advantage of not using this approach?
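A hypothetical usage sketch (X and the shapes here are made up; the lower-triangular form the question asks about can then be taken with np.tril):
X = np.random.rand(3, 5)  # coordinates of 5 points, one point per column
D = pdista(X)             # (5, 5) symmetric matrix of squared distances
L = np.tril(D)            # lower triangle, entries above the diagonal zeroed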