Issue with true division with Numpy arrays - python

Suppose you have this array:
In [29]: a = array([[10, 20, 30, 40, 50], [14, 28, 42, 56, 70], [18, 36, 54, 72, 90]])
In [30]: a
Out[30]:
array([[10, 20, 30, 40, 50],
       [14, 28, 42, 56, 70],
       [18, 36, 54, 72, 90]])
Now divide the first row by the third one (using from __future__ import division):
In [32]: a[0]/a[2]
Out[32]: array([ 0.55555556, 0.55555556, 0.55555556, 0.55555556, 0.55555556])
Now do the same with each row in a loop:
In [33]: for i in range(3):
   ....:     print a[i]/a[2]
[ 0.55555556 0.55555556 0.55555556 0.55555556 0.55555556]
[ 0.77777778 0.77777778 0.77777778 0.77777778 0.77777778]
[ 1. 1. 1. 1. 1.]
Everything looks right. But now, assign the result a[i]/a[2] back to a[i]:
In [35]: for i in range(3):
   ....:     a[i]/=a[2]
   ....:
In [36]: a
Out[36]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1]])
Alright, no problem. Turns out this is by design. Instead, we should do:
In [38]: for i in range(3):
   ....:     a[i] = a[i]/a[2]
   ....:
In [39]: a
Out[39]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1]])
But that doesn't work either. Why, and how can I fix it?
Thanks in advance.

You can cast the whole array to a float array first:
a = a.astype('float')
a /= a[2]

"Why doesn't this work" -- It doesn't work because numpy arrays have a datatype that is fixed when they're created. Any value of a different type that you put into the array is cast to the array's type. In other words, when you try to put a float into your integer array, numpy casts the float to an int. The reasoning is that numpy arrays are designed to hold a homogeneous type so that they can have optimal performance. Put another way, they're implemented as arrays in C, and in C you can't have an array where one element is a float and the next is an int. (You can have structs which behave like that, but they're not arrays.)
Another solution (in addition to the one proposed by @nneonneo) is to specify the array as a float array from the beginning:
a = array([[10, 20, 30, 40, 50], [14, 28, 42, 56, 70], [18, 36, 54, 72, 90]], dtype=float)
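As a minimal sketch of the casting behaviour described above (written for Python 3, where / is already true division; the names ratio, int_copy, as_float and normalized are illustrative and not from the original posts), the following contrasts assigning into an integer array with dividing a float copy:

```python
import numpy as np

a = np.array([[10, 20, 30, 40, 50],
              [14, 28, 42, 56, 70],
              [18, 36, 54, 72, 90]])

# The division itself produces a new float64 array
ratio = a[0] / a[2]

# Assigning floats into an integer array casts them to the array's dtype,
# so the fractional part is dropped
int_copy = a.copy()
int_copy[0] = ratio

# Converting to float first keeps the fractional values
as_float = a.astype(float)
normalized = as_float / as_float[2]

print(ratio.dtype, int_copy.dtype, normalized.dtype)
print(int_copy[0])
print(normalized)
```

The division result is never the problem; the values are only truncated at the moment they are stored back into the integer array.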

It's not the division that's the issue, it's the assignment, i.e. a[i] = ... (which is also used behind the scenes when you do a /= ...). Try this:
>>> a = np.zeros(3, dtype='uint8')
>>> a[:] = [2, -3, 5.9]
>>> print a
[ 2 253 5]
When you do intarray[i] = floatarray[i] numpy has to truncate the floating point values to get them to fit into intarray.

Related

Alternatives for numpy.random generation with choice values and specific frequency of values

I am working on generating a (1109, 8) array with random values drawn from a fixed set of numbers [18, 24, 36, 0]. I need to ensure each row always contains 5 zeros, but that wasn't happening even after adjusting the probability weights.
My workaround code is below, but I wanted to know if there is an easier way with another function, or perhaps by adjusting some of the parameters of the generator.
https://numpy.org/doc/stable/reference/random/generator.html
# Random output using the new Generator API
import numpy as np
from numpy.random import default_rng

rng = default_rng(1)
# generate an array with random values of test duration
# (arr is the existing (1109, 8) array; only its shape is used here)
test_duration = rng.choice([18, 24, 36, 0], size=arr.shape, p=[0.075, 0.1, 0.2, 0.625])
# ensure the number of tests per row equals n_tests
n_tests = 3
non_tested = arr.shape[1] - n_tests
for row in range(len(test_duration)):
    # redraw the row until it has exactly n_tests nonzero entries
    while np.count_nonzero(test_duration[row, :]) != n_tests:
        new_test = rng.choice([18, 24, 36, 0], size=arr.shape[1], p=[0.075, 0.1, 0.2, 0.625])
        test_duration[row, :] = np.array(new_test)
    else:
        pass
print('There are no days exceeding n_tests')
# print(test_durations)
print(test_duration[:10, :])
If you need 5 zeros in every row, you can just randomly select 3 values from [18, 24, 36], pad the rest with zeros and then do a per-row random shuffle. The numpy shuffle happens in-place, so you don't need to reassign.
import numpy as np
c = [18, 24, 36]
p = np.array([0.075, 0.1, 0.2])
p = p / p.sum()  # normalize the probs
a = np.random.choice(c, size=(1109, 3), replace=True, p=p)
a = np.hstack([a, np.zeros((1109, 5), dtype=np.int32)])
list(map(np.random.shuffle, a))  # shuffles each row in place
a
# returns:
array([[ 0,  0,  0,  0, 36,  0, 36, 36],
       [ 0, 36,  0, 24, 24,  0,  0,  0],
       [ 0,  0,  0,  0, 36, 36, 36,  0],
       ...,
       [ 0,  0,  0, 24, 24, 36,  0,  0],
       [ 0, 24,  0,  0,  0, 36,  0, 18],
       [ 0,  0,  0, 36, 36, 24,  0,  0]])
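The same pad-and-shuffle idea can also be sketched with the newer Generator API that the question uses. This is only an illustration under the assumption that NumPy 1.20 or later is available, since it relies on Generator.permuted to shuffle every row independently; the names n_rows, n_cols, tests, padded and test_duration are chosen here for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

n_rows, n_cols, n_tests = 1109, 8, 3
values = np.array([18, 24, 36])
p = np.array([0.075, 0.1, 0.2])
p = p / p.sum()  # renormalize over the nonzero values only

# Draw exactly n_tests nonzero durations per row
tests = rng.choice(values, size=(n_rows, n_tests), p=p)

# Pad each row with zeros up to n_cols columns
padded = np.hstack([tests, np.zeros((n_rows, n_cols - n_tests), dtype=tests.dtype)])

# Shuffle the entries of every row independently (returns a new array)
test_duration = rng.permuted(padded, axis=1)

print(test_duration[:10, :])
```

Unlike np.random.shuffle, permuted with axis=1 treats each row as its own sequence, so no Python-level loop or map over rows is needed.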
You could simply draw a random choice for the 5 positions of the zeros in each row, which guarantees that there are exactly 5 zeros, and then sample the remaining entries from [18, 24, 36] with their normalized probabilities.
But by doing this you are no longer respecting the probability distribution that you specified in the first place. I don't know which application you're using this for, but it is a point to consider.
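A rough sketch of this position-based approach follows. It picks the nonzero positions rather than the zero positions, which is equivalent; the argsort-of-random-numbers trick and all variable names are assumptions made for the illustration, not something from the answer.

```python
import numpy as np

rng = np.random.default_rng(1)

n_rows, n_cols, n_tests = 1109, 8, 3
values = np.array([18, 24, 36])
p = np.array([0.075, 0.1, 0.2])
p = p / p.sum()  # normalized probabilities for the nonzero values

# Argsorting uniform random numbers gives an independent random permutation
# of the column indices per row; the first n_tests entries are the columns
# that will hold a nonzero value (the other 5 stay zero)
cols = np.argsort(rng.random((n_rows, n_cols)), axis=1)[:, :n_tests]

# Start from all zeros and write the sampled values at the chosen columns
test_duration = np.zeros((n_rows, n_cols), dtype=values.dtype)
sampled = rng.choice(values, size=(n_rows, n_tests), p=p)
np.put_along_axis(test_duration, cols, sampled, axis=1)

print(test_duration[:10, :])
```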

Efficiently zero out all but largest n elements for each image pixel

So I have an image I of size (H x W x C), where C is some number of channels. The challenge is to obtain a new image J, again of size (H x W x C), in which J[i, j] contains only the maximum n entries in I[i, j].
Equivalently, think about iterating through each image pixel in I and zeroing out all but the highest n entries.
What I've tried:
# NOTE: bone_weight_matrix is a matrix of size (256 x 256 x 43)
argsort_four = np.argsort(bone_weight_matrix, axis=2)[:, :, -4:]
# For each pixel, retain only the top four influencing bone weights
proc_matrix = np.zeros(bone_weight_matrix.shape)
for i in range(bone_weight_matrix.shape[0]):
    for j in range(bone_weight_matrix.shape[1]):
        proc_matrix[i, j, argsort_four[i, j]] = bone_weight_matrix[i, j, argsort_four[i, j]]
return proc_matrix
The problem is that this method seems to be very slow and doesn't feel very Pythonic. Any advice would be great.
Cheers.
Generic case: keeping the largest or smallest n elements along an axis
Basically, two steps are involved:
Get the n indices to be kept along the specified axis with np.argpartition.
Initialize a zeros array and use those indices with advanced indexing to select from the input array as well as assign into the zeros array.
Let's solve the generic problem: select n elements along a specified axis, keeping either the largest n or the smallest n elements.
The implementation would look like this -
def keep(ar, n, axis=-1, order='largest'):
    # Map a negative axis to its non-negative equivalent
    axis = axis % ar.ndim
    slice_l = [slice(None, None, None)]*ar.ndim
    if order=='largest':
        slice_l[axis] = slice(-n,None,None)
        # Index with a tuple; a list of slices is rejected by current NumPy
        idx = np.argpartition(ar, kth=-n, axis=axis)[tuple(slice_l)]
    elif order=='smallest':
        slice_l[axis] = slice(None,n,None)
        idx = np.argpartition(ar, kth=n, axis=axis)[tuple(slice_l)]
    else:
        raise ValueError('Invalid order value')
    # Open index grid for every axis, with the chosen axis replaced by idx
    grid = list(np.ogrid[tuple(map(slice, ar.shape))])
    grid[axis] = idx
    out = np.zeros_like(ar)
    out[tuple(grid)] = ar[tuple(grid)]
    return out
Sample runs
Input array :
In [208]: np.random.seed(0)
...: I = np.random.randint(11,99,(2,2,6))
In [209]: I
Out[209]:
array([[[55, 58, 75, 78, 78, 20],
[94, 32, 47, 98, 81, 23]],
[[69, 76, 50, 98, 57, 92],
[48, 36, 88, 83, 20, 31]]])
Keep largest 2 elements along last axis :
In [210]: keep(I, n=2, axis=-1, order='largest')
Out[210]:
array([[[ 0, 0, 0, 78, 78, 0],
[94, 0, 0, 98, 0, 0]],
[[ 0, 0, 0, 98, 0, 92],
[ 0, 0, 88, 83, 0, 0]]])
Keep largest 1 element along the second axis (axis=1) :
In [211]: keep(I, n=1, axis=1, order='largest')
Out[211]:
array([[[ 0, 58, 75, 0, 0, 0],
[94, 0, 0, 98, 81, 23]],
[[69, 76, 0, 98, 57, 92],
[ 0, 0, 88, 0, 0, 0]]])
Keep smallest 2 elements along last axis :
In [212]: keep(I, n=2, axis=-1, order='smallest')
Out[212]:
array([[[55, 0, 0, 0, 0, 20],
[ 0, 32, 0, 0, 0, 23]],
[[ 0, 0, 50, 0, 57, 0],
[ 0, 0, 0, 0, 20, 31]]])
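For the specific case in the question (keep the top four weights per pixel along the last axis), the same two steps can be sketched more compactly with np.take_along_axis and np.put_along_axis, assuming NumPy 1.15 or later. The function name keep_top_n and the random stand-in for bone_weight_matrix are illustrative only.

```python
import numpy as np

def keep_top_n(weights, n):
    """Zero out all but the n largest entries along the last axis."""
    # Indices of the n largest entries per pixel (in no particular order)
    idx = np.argpartition(weights, -n, axis=-1)[..., -n:]
    out = np.zeros_like(weights)
    # Copy only the selected entries into the zero-initialized output
    np.put_along_axis(out, idx, np.take_along_axis(weights, idx, axis=-1), axis=-1)
    return out

# Random stand-in data with the shape mentioned in the question
rng = np.random.default_rng(0)
bone_weight_matrix = rng.random((256, 256, 43))
proc_matrix = keep_top_n(bone_weight_matrix, 4)
print(proc_matrix.shape)
```

np.argpartition avoids the full sort that np.argsort performs, which matters when the channel axis is long and only a few entries are kept.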

Converting Matrix Definition to Zero-Indexed Notation - Numpy

I am trying to construct a 2-dimensional numpy array (i.e. a matrix) from a paper that uses non-standard indexing to define the matrix, i.e. the top-left element is $q_{1,2}$ instead of $q_{0,0}$.
Define the $n \times (n-2)$ matrix $Q$ by its elements $q_{i,j}$ for $i = 1, \dots, n$ and $j = 2, \dots, n-1$, given by
$q_{j-1,j} = h_{j-1}^{-1}$, $q_{j,j} = -h_{j-1}^{-1} - h_j^{-1}$ and $q_{j+1,j} = h_j^{-1}$. (I have posted this in Latex form here: http://www.texpaste.com/n/8vwds4fx)
I have tried to implement in python like this:
# n = u_s.size
# n = 299 for this example
n = 299
Q = np.zeros((n,n-2))
for i in range(0,n+1):
    for j in range(2,n):
        Q[j-1,j] = 1.0/h[j-1]
        Q[j,j] = -1.0/h[j-1] - 1.0/h[j]
        Q[j+1,j] = 1.0/h[j]
But I always get the error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-54-c07a3b1c81bb> in <module>()
1 for i in range(1,n+1):
2 for j in range(2,n-1):
----> 3 Q[j-1,j] = 1.0/h[j-1]
4 Q[j,j] = -1.0/h[j-1] - 1.0/h[j]
5 Q[j+1,j] = 1.0/h[j]
IndexError: index 297 is out of bounds for axis 1 with size 297
I initially thought I could decrement both i and j in my for loop to keep edge cases safe, as a quick way to move to zero-indexed notation, but this hasn't worked. I also tried incrementing and modifying the range().
Is there a way to convert this definition to one that python can handle? Is this a common issue?
Simplifying the problem to make the assignment pattern obvious:
In [228]: h=np.arange(10,15)
In [229]: Q=np.zeros((5,5),int)
In [230]: for j in range(1,5):
     ...:     Q[j-1:j+2,j] = h[j-1:j+2]
In [231]: Q
Out[231]:
array([[ 0, 10, 0, 0, 0],
[ 0, 11, 11, 0, 0],
[ 0, 12, 12, 12, 0],
[ 0, 0, 13, 13, 13],
[ 0, 0, 0, 14, 14]])
Assignment to the partial first and last columns may need tweaking. Here's the equivalent built from diagonals:
In [232]: np.diag(h,0)+np.diag(h[:-1],1)+np.diag(h[1:],-1)
Out[232]:
array([[10, 10, 0, 0, 0],
[11, 11, 11, 0, 0],
[ 0, 12, 12, 12, 0],
[ 0, 0, 13, 13, 13],
[ 0, 0, 0, 14, 14]])
With the h[j-1], h[j] indexing this diagonal assignment probably needs tweaking, but it should be a useful starting point.
Selecting h values more like what you use (skipping the 1/h for now):
In [238]: Q=np.zeros((5,5),int)
In [239]: for j in range(1,4):
     ...:     Q[j-1:j+2,j] =[h[j-1],h[j-1]+h[j], h[j]]
     ...:
In [240]: Q
Out[240]:
array([[ 0, 10, 0, 0, 0],
[ 0, 21, 11, 0, 0],
[ 0, 11, 23, 12, 0],
[ 0, 0, 12, 25, 0],
[ 0, 0, 0, 13, 0]])
I'm skipping the two partial end columns for now. The first slicing approach allowed me to be a bit sloppy, since it's ok to slice 'off the end'. The end columns, if set, will require their own expressions.
In [241]: j=0; Q[j:j+2,j] =[h[j], h[j]]
In [242]: j=4; Q[j-1:j+1,j] =[h[j-1],h[j-1]+h[j]]
In [243]: Q
Out[243]:
array([[10, 10, 0, 0, 0],
[10, 21, 11, 0, 0],
[ 0, 11, 23, 12, 0],
[ 0, 0, 12, 25, 13],
[ 0, 0, 0, 13, 27]])
The relevant diagonal pieces are still evident:
In [244]: h[1:]+h[:-1]
Out[244]: array([21, 23, 25, 27])
The equation doesn't contain any value for i; it refers only to j. Q should be a matrix of dimension (n+2) x (n+2). For j = 1, it refers to Q[0,1], Q[1,1] and Q[2,1]. For j = n, it refers to Q[n-1,n], Q[n,n] and Q[n+1,n]. So Q should have indices from 0 to n+1, which is n+2 entries along each axis.
I don't think you need the i loop. You can get your result with only a j loop from 1 to n, but Q should be indexed from 0 to n+1.
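One possible zero-indexed reading of the paper's definition is sketched below. It keeps the stated n x (n-2) shape by shifting rows by 1 and columns by 2. It assumes that h holds the n-1 values h_1, ..., h_{n-1} stored zero-based, so that h_j is h[j-1]; that layout and the function name build_Q are assumptions, since the question does not show how h is built.

```python
import numpy as np

def build_Q(h):
    """Build the n x (n-2) banded matrix Q from the n-1 spacings in h.

    Paper index (i, j) maps to array index (i-1, j-2), and h_j maps to h[j-1].
    """
    h = np.asarray(h, dtype=float)
    n = h.size + 1
    inv = 1.0 / h
    Q = np.zeros((n, n - 2))
    c = np.arange(n - 2)                  # zero-based column, c = j - 2
    Q[c, c] = inv[:-1]                    # q_{j-1,j} = 1/h_{j-1}
    Q[c + 1, c] = -inv[:-1] - inv[1:]     # q_{j,j}   = -1/h_{j-1} - 1/h_j
    Q[c + 2, c] = inv[1:]                 # q_{j+1,j} = 1/h_j
    return Q

# Small example with n = 5 points, i.e. 4 spacings
h = np.array([1.0, 2.0, 4.0, 5.0])
Q = build_Q(h)
print(Q)
```

With this mapping every index stays inside the array, and no loop over i is needed because each column is determined by j alone.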

Multiply NumPy ndarray with every element in another binary ndarray of different size

I have two ndarrays :
a = [[30,40],
[60,90]]
b = [[0,0,1],
[1,0,1],
[1,1,1]]
Please note that the shape of a might be larger, but it is always a square array, e.g. (50,50) or (100,100).
The wanted result is:
Result = [[a*0, a*0, a*1],
          [a*1, a*0, a*1],
          [a*1, a*1, a*1]]
I managed to get the right answer with this code, but I think there should be a built-in function in numpy that accomplishes this task quickly:
totalrows=[]
for row in range(b.shape[0]):
    cells=[]
    for column in range(b.shape[1]):
        print row,column
        cells.append(b[row,column]*a)
    totalrows.append(np.concatenate(cells,axis=1))
return np.concatenate(totalrows,axis=0)
Indeed there's a NumPy built-in np.kron for such block-based elementwise multiplication problems. To solve your case, it could be used like so -
np.kron(b,a)
Sample run -
In [50]: a
Out[50]:
array([[30, 40],
[60, 90]])
In [51]: b
Out[51]:
array([[0, 0, 1],
[1, 0, 1],
[1, 1, 1]])
In [52]: np.kron(b,a)
Out[52]:
array([[ 0, 0, 0, 0, 30, 40],
[ 0, 0, 0, 0, 60, 90],
[30, 40, 0, 0, 30, 40],
[60, 90, 0, 0, 60, 90],
[30, 40, 30, 40, 30, 40],
[60, 90, 60, 90, 60, 90]])
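As a quick check of the 2D case (a sketch in Python 3; the names loop_result, kron_result and bcast_result are illustrative), the block-by-block construction from the question, np.kron, and an equivalent broadcasting expression all give the same array:

```python
import numpy as np

a = np.array([[30, 40],
              [60, 90]])
b = np.array([[0, 0, 1],
              [1, 0, 1],
              [1, 1, 1]])

# Block-by-block construction, as in the question
loop_result = np.concatenate(
    [np.concatenate([b[r, c] * a for c in range(b.shape[1])], axis=1)
     for r in range(b.shape[0])],
    axis=0,
)

# Kronecker product: every element of b scales a full copy of a
kron_result = np.kron(b, a)

# Same thing via broadcasting: axes are (b row, a row, b col, a col)
bcast_result = (b[:, None, :, None] * a[None, :, None, :]).reshape(
    b.shape[0] * a.shape[0], b.shape[1] * a.shape[1]
)

print(np.array_equal(loop_result, kron_result))
print(np.array_equal(bcast_result, kron_result))
```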
3D array case
Now, let's say we are working with a as a 3D array (m,n,p) and b as (q,r) and assuming you are looking to perform such a block-wise multiplication iteratively along the last axis of a. Thus, the shapes are to be multiplied along the first two axes on the two inputs to get the output array. To achieve such an output, we need to extend the dimension of b by introducing a singleton dimension as the last axis. The final output would be of shape (m*q,n*r,p*1). The implementation would be simply -
np.kron(b[...,None],a)
Shape check -
In [161]: a = np.random.randint(0,99,(4,5,2))
...: b = np.random.randint(0,99,(6,7))
...:
In [162]: np.kron(b[...,None],a).shape
Out[162]: (24, 35, 2)

Numpy Dot Product of two 2-d arrays in numpy to get 3-d array

Sorry for the badly explained title. I am trying to parallelise a part of my code and got stuck on a dot product. I am looking for an efficient way of doing what the code below does. I'm sure there is a simple linear algebra solution, but I'm very stuck:
puy = np.arange(8).reshape(2,4)
puy2 = np.arange(12).reshape(3,4)
print puy, '\n'
print puy2.T
zz = np.zeros([4,2,3])
for i in range(4):
    zz[i,:,:] = np.dot(np.array([puy[:,i]]).T,
                       np.array([puy2.T[i,:]]))
One way would be to use np.einsum, which allows you to specify what you want to happen to the indices:
>>> np.einsum('ik,jk->kij', puy, puy2)
array([[[ 0, 0, 0],
[ 0, 16, 32]],
[[ 1, 5, 9],
[ 5, 25, 45]],
[[ 4, 12, 20],
[12, 36, 60]],
[[ 9, 21, 33],
[21, 49, 77]]])
>>> np.allclose(np.einsum('ik,jk->kij', puy, puy2), zz)
True
Here's another way with broadcasting -
(puy[None,...]*puy2[:,None,:]).T
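To make the equivalence concrete, here is a small Python 3 sketch that builds the per-column outer products explicitly and compares them with the einsum and broadcasting versions. The names loop_result, via_einsum, via_broadcast and via_broadcast_t are illustrative.

```python
import numpy as np

puy = np.arange(8).reshape(2, 4)
puy2 = np.arange(12).reshape(3, 4)

# Reference: one outer product per shared column index k
loop_result = np.stack([np.outer(puy[:, k], puy2[:, k]) for k in range(4)])

# einsum: out[k, i, j] = puy[i, k] * puy2[j, k]
via_einsum = np.einsum('ik,jk->kij', puy, puy2)

# Broadcasting: (4, 2, 1) * (4, 1, 3) -> (4, 2, 3)
via_broadcast = puy.T[:, :, None] * puy2.T[:, None, :]

# The transposed form used in the answer above
via_broadcast_t = (puy[None, ...] * puy2[:, None, :]).T

print(via_einsum.shape)
print(np.array_equal(loop_result, via_einsum))
print(np.array_equal(loop_result, via_broadcast))
```

No summation happens here: the index k appears in the output, so each slice is an outer product rather than a contraction.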
