I'm trying to produce a color mapping of the convergence of a polynomial's roots in complex space. In order to do this, I have created a grid of points and applied Newton's method to those points, in order to find to which complex root they each converge. This gives me a 2d array of complex numbers, the elements of which denote the point to which they converge, within some tolerance. I want to be able to match the numbers in that matrix to an element-wise color mapping.
I have done this by iterating over the array and computing colors element by element, but it is very slow, and it seems it would benefit from vectorization. Here's my code so far:
def colorvec(rootsmatrix, known_roots):
    dim = len(known_roots)
    nx, ny = rootsmatrix.shape
    # one (nx, ny) distance map per known root
    dist = np.empty((dim, nx, ny))
    for i in range(dim):
        dist[i] = abs(rootsmatrix - known_roots[i])
    return dist
This creates a 3d array with the distances of each point's computed root to each of the actual roots. It looks something like this, except with 75 000 000 elements.
[ [ [6e-15 7e-15 0.5]
    [1.5 5e-15 0.5]        #submatrix 1
    [0.75 0.98 0.78] ]
  [ [1.5 0.75 0.5]
    [8e-15 5e-15 0.8]      #submatrix 2
    [0.75 0.98 0.78] ]
  [ [1.25 0.5 5e-15]
    [0.5 0.64 4e-15]       #submatrix 3
    [5e-15 4e-15 7e-15] ] ]
I want to take dist, and return the 1st dimension argument (i.e., 1, 2, or 3) for each 2nd- and 3rd-dimension argument, for which dist is minimum. That will be my color mapping. For example, comparing the element (0,0) of each of the 3 submatrices would yield that color(0,0) = 0. Similarly, color(1,1) = 0 and color (2,2) = 2. I want to be able to do this for the entire color matrix.
I haven't been able to find a way to do this using numpy.argmin, but I could be missing something. If there's another way to do this, I'd be happy to hear, especially if it doesn't involve loops. I'm making ~25MP images here, so looping takes fully 25 minutes to assign colors.
Thanks in advance for your advice!
You can pass an axis argument to argmin. You want to minimize along the first axis (what you're calling 'submatrices'), which is axis=0:
dist.argmin(0)
dist = array([[[ 6.00e-15, 7.00e-15, 5.00e-01],
[ 1.50e+00, 5.00e-15, 5.00e-01],
[ 7.50e-01, 9.80e-01, 7.80e-01]],
[[ 1.50e+00, 7.50e-01, 5.00e-01],
[ 8.00e-15, 5.00e-15, 8.00e-01],
[ 7.50e-01, 9.80e-01, 7.80e-01]],
[[ 1.25e+00, 5.00e-01, 5.00e-15],
[ 5.00e-01, 6.40e-01, 4.00e-15],
[ 5.00e-15, 4.00e-15, 7.00e-15]]])
dist.argmin(0)
#array([[0, 0, 2],
# [1, 0, 2],
# [2, 2, 2]])
This of course returns 0, 1, 2. If you want 1, 2, 3 as stated, use:
dist.argmin(0) + 1
#array([[1, 1, 3],
# [2, 1, 3],
# [3, 3, 3]])
Finally, if you actually want the minimum value itself (instead of which 'submatrix' it comes from), you can just use dist.min(0):
dist.min(0)
#array([[ 6.00e-15, 7.00e-15, 5.00e-15],
# [ 8.00e-15, 5.00e-15, 4.00e-15],
# [ 5.00e-15, 4.00e-15, 7.00e-15]])
If you want to use the minimum location from the dist matrix to pull a value out of another matrix, it's a little tricky, but you can use
minloc = dist.argmin(0)
other[minloc, np.arange(dist.shape[1])[:, None], np.arange(dist.shape[2])]
Note that if other=dist this gives the same output as just calling dist.min(0):
dist[dist.argmin(0), np.arange(dist.shape[1])[:, None], np.arange(dist.shape[2])]
#array([[ 6.00e-15, 7.00e-15, 5.00e-15],
# [ 8.00e-15, 5.00e-15, 4.00e-15],
# [ 5.00e-15, 4.00e-15, 7.00e-15]])
or if other just says which submatrix it is, you get the same thing back:
other = np.ones((3,3,3))*np.arange(1,4).reshape(3,1,1)
other
#array([[[ 1., 1., 1.],
# [ 1., 1., 1.],
# [ 1., 1., 1.]],
# [[ 2., 2., 2.],
# [ 2., 2., 2.],
# [ 2., 2., 2.]],
# [[ 3., 3., 3.],
# [ 3., 3., 3.],
# [ 3., 3., 3.]]])
other[dist.argmin(0), np.arange(dist.shape[1])[:, None], np.arange(dist.shape[2])]
#array([[ 1., 1., 3.],
# [ 2., 1., 3.],
# [ 3., 3., 3.]])
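For what it's worth, the same lookup can probably be written with np.take_along_axis (available in NumPy 1.15 and later), which some people find easier to read than the three-part fancy index. This is only a sketch on random stand-in data; the array sizes and the names picked_fancy and picked_take are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dist = rng.random((3, 4, 5))    # stand-in for the (dim, nx, ny) distance array
other = rng.random((3, 4, 5))   # any array with the same shape as dist

minloc = dist.argmin(axis=0)    # shape (4, 5)

# Fancy-indexing form, as in the answer above.
picked_fancy = other[minloc,
                     np.arange(dist.shape[1])[:, None],
                     np.arange(dist.shape[2])]

# take_along_axis wants an index array with the same number of
# dimensions as `other`, so add a leading axis and drop it afterwards.
picked_take = np.take_along_axis(other, minloc[None, :, :], axis=0)[0]
```

Both forms should give the same (nx, ny) array, and applying either one to dist itself reproduces dist.min(0).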
As an unrelated note, you can rewrite colorvec without that loop, assuming rootsmatrix.shape is (nx, ny) and known_roots.shape is (dim,):
def colorvec(rootsmatrix, known_roots):
    dist = abs(rootsmatrix - known_roots[:, None, None])
    return dist
where known_roots[:, None, None] is the same as known_roots.reshape(len(known_roots), 1, 1) and causes it to broadcast with rootsmatrix.
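Putting the two pieces together (broadcasting for the distances, argmin along axis 0 for the color index), a minimal sketch could look like the following. The cube roots of unity and the tiny 2x2 "image" are made-up test data, not the asker's polynomial, and returning the index array directly is my assumption about what the color step needs.

```python
import numpy as np

def colorvec(rootsmatrix, known_roots):
    """Return, for each grid point, the index of the nearest known root."""
    known_roots = np.asarray(known_roots)
    # (dim, 1, 1) broadcasts against (nx, ny) to give (dim, nx, ny).
    dist = np.abs(rootsmatrix - known_roots[:, None, None])
    return dist.argmin(axis=0)

# Hypothetical example: the three cube roots of unity.
roots = np.array([1.0, -0.5 + 0.8660254j, -0.5 - 0.8660254j])
converged = np.array([[roots[0] + 1e-15, roots[2]],
                      [roots[1], roots[0]]])
colors = colorvec(converged, roots)
print(colors)
```

Keep in mind that dist still holds dim * nx * ny values. For the 75 000 000 elements mentioned in the question that is about 600 MB as float64, so processing the image in row blocks may be worth considering if memory is tight.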
I have a tensor called my_tensor with the shape [batch_size, seq_length], and another tensor named idx with the shape [batch_size, 1], which consists of indices that start at 0 and finish at "seq_length".
I want to extract the values in each row of my_tensor using the indices defined in idx.
I tried to use tf.gather_nd and tf.gather but I was not successful.
Consider the following example:
batch_size = 3
seq_length = 5
idx = [2, 0, 4]
my_tensor = tf.random.uniform(shape=(batch_size, seq_length))
I want to get the values at
[[0, 2],
[1, 0],
[2, 4]]
from my_tensor.
I have to do further process over them, so I would like to have them at the same time (I don't know if it is even possible) and in an efficient way; however, I could not come up with any other methods.
I appreciate any help :)
The trick is to first convert your set of indices into a boolean mask which you can then use to reduce my_tensor as you have described using the boolean_mask operation.
You can accomplish this by one-hot encoding the idx tensor.
So, where idx = [2, 0, 4] we can do tf.one_hot(idx, seq_length) in order to convert it to something like this:
[ [0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1.] ]
Then, putting it all together for, say my_tensor:
[ [0.6413697 , 0.4079175 , 0.42499018, 0.3037368 , 0.8580252 ],
[0.8698617 , 0.29096508, 0.11531639, 0.25421357, 0.5844104 ],
[0.6442119 , 0.31816053, 0.6245482 , 0.7249261 , 0.7595779 ] ]
we can proceed as follows:
result = tf.boolean_mask(my_tensor, tf.one_hot(idx,seq_length))
to give:
[0.42499018, 0.8698617 , 0.7595779 ]
as expected
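If you want to sanity-check the one-hot masking idea without TensorFlow, the same logic can be sketched in plain NumPy. This is a NumPy stand-in for the technique, not the TensorFlow API; np.eye(seq_length, dtype=bool)[idx] plays the role of tf.one_hot here, and the names via_mask and via_index are just for illustration.

```python
import numpy as np

my_tensor = np.array([
    [0.6413697, 0.4079175, 0.42499018, 0.3037368, 0.8580252],
    [0.8698617, 0.29096508, 0.11531639, 0.25421357, 0.5844104],
    [0.6442119, 0.31816053, 0.6245482, 0.7249261, 0.7595779],
])
idx = np.array([2, 0, 4])
seq_length = my_tensor.shape[1]

# One-hot rows as a boolean mask: row r is True only at column idx[r].
mask = np.eye(seq_length, dtype=bool)[idx]
via_mask = my_tensor[mask]

# Pairing each row number with its column index picks the same values.
via_index = my_tensor[np.arange(len(idx)), idx]

print(via_mask)
```

Because the mask has exactly one True per row, the masked result comes out in row order, which is why it lines up with the direct (row, column) lookup.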
Given a 2D array, I'm looking for a pythonic way to get an array of same shape, with only the maximum element per each row.
See max_row_filter function below
def max_row_filter(mat2d):
    m = np.zeros(mat2d.shape)
    for r in range(mat2d.shape[0]):
        c = np.argmax(mat2d[r])
        m[r,c]=mat2d[r,c]
    return m
p = np.array([[1,2,3],[5,4,3,],[9,10,3]])
max_row_filter(p)
Out: array([[ 0., 0., 3.],
[ 5., 0., 0.],
[ 0., 10., 0.]])
I'm looking for an efficient way to do this, suitable to be done on big arrays.
Alternative answer (this will keep duplicates):
p * (p==p.max(axis=1, keepdims=True))
If there are no duplicates, you could use numpy.argmax:
import numpy as np
p = np.array([[1, 2, 3],
[5, 4, 3, ],
[9, 10, 3]])
result = np.zeros_like(p)
rows, cols = zip(*enumerate(np.argmax(p, axis=1)))
result[rows, cols] = p[rows, cols]
print(result)
Output
[[ 0 0 3]
[ 5 0 0]
[ 0 10 0]]
Note that, for multiple occurrences of the maximum, argmax returns the first occurrence.
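To make the difference between the two answers concrete, here is a small sketch that runs both on a matrix with a tie in the first row. The function names max_row_filter_all and max_row_filter_first and the test matrix q are made up for this illustration.

```python
import numpy as np

def max_row_filter_all(mat2d):
    """Keep every element equal to its row maximum (ties are all kept)."""
    return mat2d * (mat2d == mat2d.max(axis=1, keepdims=True))

def max_row_filter_first(mat2d):
    """Keep only the first maximum in each row."""
    result = np.zeros_like(mat2d)
    rows = np.arange(mat2d.shape[0])
    cols = mat2d.argmax(axis=1)
    result[rows, cols] = mat2d[rows, cols]
    return result

q = np.array([[1, 3, 3],
              [5, 4, 3],
              [9, 10, 3]])

print(max_row_filter_all(q))    # first row keeps both 3s
print(max_row_filter_first(q))  # first row keeps only the first 3
```

Using np.arange for the row indices avoids building Python tuples with zip, which may matter a little on very tall arrays.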
I'm having trouble understanding a basic concept with tensorflow. How does indexing work for tensor read/write operations? In order to make this specific, how can the following numpy examples be translated to tensorflow (using tensors for the arrays, indices and values being assigned):
x = np.zeros((3, 4))
row_indices = np.array([1, 1, 2])
col_indices = np.array([0, 2, 3])
x[row_indices, col_indices] = 2
x
with output:
array([[ 0., 0., 0., 0.],
[ 2., 0., 2., 0.],
[ 0., 0., 0., 2.]])
... and ...
x[row_indices, col_indices] = np.array([5, 4, 3])
x
with output:
array([[ 0., 0., 0., 0.],
[ 5., 0., 4., 0.],
[ 0., 0., 0., 3.]])
... and finally ...
y = x[row_indices, col_indices]
y
with output:
array([ 5., 4., 3.])
There's GitHub issue #206 to support this nicely; meanwhile, you have to resort to verbose workarounds.
The first example can be done with tf.select, which combines two same-shaped tensors by selecting each element from one or the other:
tf.reset_default_graph()
row_indices = tf.constant([1, 1, 2])
col_indices = tf.constant([0, 2, 3])
x = tf.zeros((3, 4))
sess = tf.InteractiveSession()
# get list of ((row1, col1), (row2, col2), ..)
coords = tf.transpose(tf.pack([row_indices, col_indices]))
# get tensor with 1's at positions (row1, col1),...
binary_mask = tf.sparse_to_dense(coords, x.get_shape(), 1)
# convert 1/0 to True/False
binary_mask = tf.cast(binary_mask, tf.bool)
twos = 2*tf.ones(x.get_shape())
# make new x out of old values or 2, depending on mask
x = tf.select(binary_mask, twos, x)
print x.eval()
gives
[[ 0. 0. 0. 0.]
[ 2. 0. 2. 0.]
[ 0. 0. 0. 2.]]
The second one could be done with scatter_update, except that scatter_update only supports linear indices and only works on variables. So you could create a temporary variable and use reshaping like this (to avoid variables you could use dynamic_stitch, see the end):
# get linear indices
linear_indices = row_indices*x.get_shape()[1]+col_indices
# turn 'x' into 1d variable since "scatter_update" supports linear indexing only
x_flat = tf.Variable(tf.reshape(x, [-1]))
# no automatic promotion, so make updates float32 to match x
updates = tf.constant([5, 4, 3], dtype=tf.float32)
sess.run(tf.initialize_all_variables())
sess.run(tf.scatter_update(x_flat, linear_indices, updates))
# convert back into original shape
x = tf.reshape(x_flat, x.get_shape())
print x.eval()
gives
[[ 0. 0. 0. 0.]
[ 5. 0. 4. 0.]
[ 0. 0. 0. 3.]]
Finally the third example is already supported with gather_nd, you write
print tf.gather_nd(x, coords).eval()
To get
[ 5. 4. 3.]
Edit, May 6
The update x[rows,cols]=newvals can be done without using Variables (which occupy memory between session run calls) by using select with sparse_to_dense, which takes a vector of sparse values, or by relying on dynamic_stitch:
sess = tf.InteractiveSession()
x = tf.zeros((3, 4))
row_indices = tf.constant([1, 1, 2])
col_indices = tf.constant([0, 2, 3])
# no automatic promotion, so specify float type
replacement_vals = tf.constant([5, 4, 3], dtype=tf.float32)
# convert to linear indexing in row-major form
linear_indices = row_indices*x.get_shape()[1]+col_indices
x_flat = tf.reshape(x, [-1])
# use dynamic stitch, it merges the array by taking value either
# from array1[index1] or array2[index2], if indices conflict,
# the later one is used
unchanged_indices = tf.range(tf.size(x_flat))
changed_indices = linear_indices
x_flat = tf.dynamic_stitch([unchanged_indices, changed_indices],
[x_flat, replacement_vals])
x = tf.reshape(x_flat, x.get_shape())
print x.eval()
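The row-major linear-index conversion used above (row * number_of_columns + col) is easy to check in plain NumPy. This sketch only illustrates the index arithmetic and is not TensorFlow code; x_new and expected are names introduced here.

```python
import numpy as np

x = np.zeros((3, 4))
row_indices = np.array([1, 1, 2])
col_indices = np.array([0, 2, 3])

# Row-major linear index: row * number_of_columns + col.
linear_indices = row_indices * x.shape[1] + col_indices

# Update a flattened copy, then restore the original shape.
x_flat = x.reshape(-1).copy()
x_flat[linear_indices] = [5, 4, 3]
x_new = x_flat.reshape(x.shape)

# The same update written with 2-D indices, for comparison.
expected = np.zeros((3, 4))
expected[row_indices, col_indices] = [5, 4, 3]

print(linear_indices)
print(x_new)
```

np.ravel_multi_index((row_indices, col_indices), x.shape) computes the same linear indices, if you'd rather not write the arithmetic by hand.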
So I'm trying to take the dot product of two arrays using numpy's dot product function.
import numpy as np
MWFrPos_Hydro1 = subPos1[submaskFirst1]
x = MWFrPos_Hydro1
MWFrVel_Hydro1 = subVel1[submaskFirst1]
y = MWFrVel_Hydro1
MWFrPosMag_Hydro1 = [np.linalg.norm(i) for i in MWFrPos_Hydro1]
np.dot(x, y)
returns
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-135-9ef41eb4235d> in <module>()
6
7
----> 8 np.dot(x, y)
ValueError: shapes (1220,3) and (1220,3) not aligned: 3 (dim 1) != 1220 (dim 0)
Am I using this function improperly?
The arrays look like this
print x
[[ 51.61872482 106.19775391 69.64765167]
[ 33.86419296 11.75729942 11.84990311]
[ 12.75009823 58.95491028 38.06708527]
...,
[ 99.00266266 96.0210495 18.79844856]
[ 27.18083954 74.35041809 78.07577515]
[ 19.29788399 82.16114044 1.20453501]]
print y
[[ 40.0402298 -162.62153625 -163.00158691]
[-359.41983032 -115.39328766 14.8419466 ]
[ 95.92044067 -359.26425171 234.57330322]
...,
[ 130.17840576 -7.00977898 42.09699249]
[ 37.37852478 -52.66002655 -318.15155029]
[ 126.1726532 121.3104248 -416.20855713]]
Would for looping np.vdot be more optimal in this circumstance?
You can't take the dot product of two n * m matrices unless m == n. When multiplying two matrices A and B, B needs to have as many rows as A has columns. (So you can multiply an n * m matrix with an m * n matrix.)
See this article on multiplying matrices.
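A quick way to see the shape rule in action, using small all-ones arrays as stand-ins (the names a, b, aligned and product are made up for this sketch):

```python
import numpy as np

a = np.ones((4, 3))
b = np.ones((4, 3))

try:
    np.dot(a, b)           # inner dimensions are 3 and 4, so this fails
    aligned = True
except ValueError:
    aligned = False

product = np.dot(a, b.T)   # (4, 3) times (3, 4) is fine

print(aligned, product.shape)
```

Note that the (n, n) result holds every row-against-row dot product, which is usually more than you want if you are only after one value per row.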
Some possible products for (n,3) arrays (here I'll just use one):
In [434]: x=np.arange(12.).reshape(4,3)
In [435]: x
Out[435]:
array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.],
[ 9., 10., 11.]])
Element-by-element product, summed across the columns, giving n values. This is a magnitude-like number (the squared norm of each row).
In [436]: (x*x).sum(axis=1)
Out[436]: array([ 5., 50., 149., 302.])
Same thing with einsum, which gives more control over which axes are multiplied, and which are summed.
In [437]: np.einsum('ij,ij->i',x,x)
Out[437]: array([ 5., 50., 149., 302.])
dot requires the last axis of the first argument and the second-to-last axis of the second argument to have the same size, so I have to use x.T (transpose). The diagonal matches the above.
In [438]: np.dot(x,x.T)
Out[438]:
array([[ 5., 14., 23., 32.],
[ 14., 50., 86., 122.],
[ 23., 86., 149., 212.],
[ 32., 122., 212., 302.]])
np.einsum('ij,kj',x,x) does the same thing.
There is a new matmul product, but with 2d arrays like this it is just dot. I have to turn them into 3d arrays to get the 4 values; and even with that I have to squeeze out excess dimensions:
In [450]: x[:,None,:]@x[:,:,None]
Out[450]:
array([[[ 5.]],
[[ 50.]],
[[ 149.]],
[[ 302.]]])
In [451]: np.squeeze(_)
Out[451]: array([ 5., 50., 149., 302.])
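Applied to two different (1220, 3) arrays like the position and velocity arrays in the question, the row-wise options above might look like this. The random arrays are stand-ins for the asker's data, which isn't available here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1220, 3))   # stand-in for the position array
y = rng.normal(size=(1220, 3))   # stand-in for the velocity array

# One dot product per row, three equivalent ways.
rowwise_sum = (x * y).sum(axis=1)
rowwise_einsum = np.einsum('ij,ij->i', x, y)
rowwise_matmul = (x[:, None, :] @ y[:, :, None]).squeeze()

print(rowwise_sum.shape)
```

All three avoid a Python-level loop over np.vdot, and none of them builds the full (1220, 1220) matrix that np.dot(x, y.T) would produce.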
For example, given:
import numpy as np
data = np.array(
[[0, 0, 0],
[0, 1, 1],
[1, 0, 1],
[1, 0, 1],
[0, 1, 1],
[0, 0, 0]])
I want to get a 3-dimensional array, looking like:
result = array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
One way is:
for row in data:
    newArray[row[0]][row[1]][row[2]] += 1
What I'm trying to do is the following:
for i in dimension1:
    for j in dimension2:
        for k in dimension3:
            result[i,j,k] = (data[data[data[:,0]==i, 1]==j, 2]==k).sum()
This doesn't seem to work and I would like to achieve the desired result by sticking to my implementation rather than the one mentioned in the beginning (or using any extra imports, eg counter).
Thanks.
You can also use numpy.histogramdd for this:
>>> np.histogramdd(data, bins=(2, 2, 2))[0]
array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
The problem is that data[data[data[:,0]==i, 1]==j, 2]==k is not what you expect it to be.
Let's take this apart for the case (i,j,k) == (0,0,0)
data[:,0]==0 is [True, True, False, False, True, True], and data[data[:,0]==0] correctly gives us the lines where the first number is 0.
Now from those lines we get the lines where the second number is 0: data[data[:,0]==0, 1]==0, which gives us [True, False, False, True]. And this is the problem. Because if we take those indices from data, i.e., data[data[data[:,0]==0, 1]==0] we do not get the rows where the first and second number are 0, but the 0th and 3rd row instead:
In [51]: data[data[data[:,0]==0, 1]==0]
Out[51]: array([[0, 0, 0],
[1, 0, 1]])
And if we now filter for the rows where the third number is 0, we get the wrong result w.r.t. the original data.
And that's why your approach does not work. For better methods, see the other answers.
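If you do want to stay close to the triple-loop idea, one way that should work is to build all three column tests against the original data and combine them with &, so every mask keeps the full length of data. This is a sketch under that assumption; the variable name match is made up.

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            # Each comparison is over all rows, so the masks line up.
            match = (data[:, 0] == i) & (data[:, 1] == j) & (data[:, 2] == k)
            result[i, j, k] = match.sum()

print(result)
```

The loops only run over the output cells (8 here), not over the rows of data, so this stays reasonable as long as the result array is small.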
You can do something like the following
#Get output dimension and construct output array.
>>> dshape = tuple(data.max(axis=0)+1)
>>> dshape
(2, 2, 2)
>>> out = np.zeros(dshape)
If you have numpy 1.8+:
# np.add.at counts repeated indices; a plain "+= 1" would apply each index only once
np.add.at(out, tuple(data.T), 1)
Else:
#Get indices and unique the resulting array
>>> inds = np.ravel_multi_index(data.T, dshape)
>>> inds, inverse = np.unique(inds, return_inverse=True)
>>> values = np.bincount(inverse)
>>> values
array([2, 2, 2])
>>> out.flat[inds] = values
>>> out
array([[[ 2., 0.],
[ 0., 2.]],
[[ 0., 2.],
[ 0., 0.]]])
NumPy versions before 1.8 do not have np.add.at, and the first snippet will not work without it. As ravel_multi_index may not be the fastest algorithm ever, you can look into taking the unique rows of a numpy array. In effect these two operations should be equivalent.
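A small check of why repeated indices matter when counting this way: with fancy indexing, an in-place += is buffered, so a repeated index is only incremented once, while np.add.at applies every occurrence. The names buffered, unbuffered and counts are made up for this sketch, and the bincount line is just one alternative I'd expect to give the same totals.

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])
dshape = tuple(data.max(axis=0) + 1)

buffered = np.zeros(dshape)
buffered[tuple(data.T)] += 1              # repeated indices applied once

unbuffered = np.zeros(dshape)
np.add.at(unbuffered, tuple(data.T), 1)   # every occurrence is counted

# Alternative: count the flattened indices, then restore the shape.
flat = np.ravel_multi_index(tuple(data.T), dshape)
counts = np.bincount(flat, minlength=int(np.prod(dshape))).reshape(dshape)

print(buffered.max(), unbuffered.max())
```

On large inputs the bincount route is often reported to be faster than np.add.at, but that is worth timing on your own data rather than taking on faith.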
Don't fear the imports. They're what make Python awesome.
This answer assumes that you already have the result matrix.
import numpy as np
data = np.array(
[[0, 0, 0],
[0, 1, 1],
[1, 0, 1],
[1, 0, 1],
[0, 1, 1],
[0, 0, 0]]
)
result = np.zeros((2,2,2))
# range of each dim, aka allowable values for each dim
dim_ranges = list(zip(np.zeros(result.ndim), np.array(result.shape) - 1))
dim_ranges
# Out[]:
# [(0.0, 1), (0.0, 1), (0.0, 1)]
# Multidimensional histogram will effectively "count" along each dim
sums,_ = np.histogramdd(data,bins=result.shape,range=dim_ranges)
result += sums
result
# Out[]:
# array([[[ 2., 0.],
# [ 0., 2.]],
#
# [[ 0., 2.],
# [ 0., 0.]]])
This solution solves for any "result" ndarray, no matter what the shape. Additionally, it works fine even if your "data" ndarray has indices which are out-of-bounds for your result matrix.
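To illustrate the out-of-bounds claim, here is a sketch that appends one row whose first index (2) does not fit a (2, 2, 2) result. With an explicit range, np.histogramdd treats values outside it as outliers and does not tally them. The extra row and the names result_shape and data_with_outlier are made up for this example.

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

result_shape = (2, 2, 2)
# Allowed index range along each dimension: 0 .. size - 1.
dim_ranges = [(0, n - 1) for n in result_shape]

# Hypothetical extra row with an index that is out of bounds.
data_with_outlier = np.vstack([data, [2, 0, 0]])

counts, _ = np.histogramdd(data_with_outlier,
                           bins=result_shape,
                           range=dim_ranges)
print(counts)
```

The out-of-range row is silently dropped, so if you need to know about bad indices, check for them separately before counting.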