I am trying to find the row and column indices in a 2D numpy array where the value lies in a range.
I can do this with the following code, but since the matrix is symmetric (a_ij = a_ji), I would like each match to be reported only once:
In [118]: test_arr = np.array([[1, 0.2, 0.04],
...: [0.2, 0.3, 0.06 ],
...: [0.04, 0.06, 0.09]
...: ])
...:
In [119]: test_arr
Out[119]:
array([[1. , 0.2 , 0.04],
[0.2 , 0.3 , 0.06],
[0.04, 0.06, 0.09]])
In [120]: np.argwhere((test_arr==0.06))
Out[120]:
array([[1, 2],
[2, 1]])
Is there a way in numpy to restrict the result to i < j, so that the output is only:
array([[1, 2]])
Any help is appreciated!
In [38]: test_arr = np.array([[1, 0.2, 0.04],
...: [0.2, 0.3, 0.06 ],
...: [0.04, 0.06, 0.09]
...: ])
In [39]: test_arr
Out[39]:
array([[1. , 0.2 , 0.04],
[0.2 , 0.3 , 0.06],
[0.04, 0.06, 0.09]])
In [40]: np.where(test_arr==0.06)
Out[40]: (array([1, 2]), array([2, 1]))
Let's explore using one of the triangle functions (np.tril, np.triu) to set the values in one triangle of the array to 0:
In [41]: np.tril(test_arr)
Out[41]:
array([[1. , 0. , 0. ],
[0.2 , 0.3 , 0. ],
[0.04, 0.06, 0.09]])
In [42]: np.triu(test_arr)
Out[42]:
array([[1. , 0.2 , 0.04],
[0. , 0.3 , 0.06],
[0. , 0. , 0.09]])
Now apply the equality test:
In [44]: np.triu(test_arr)==0.06
Out[44]:
array([[False, False, False],
[False, False, True],
[False, False, False]])
In [45]: np.argwhere(np.triu(test_arr)==0.06)
Out[45]: array([[1, 2]])
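One caveat: np.triu keeps the main diagonal, so a diagonal match would still be reported, and zeroing the lower triangle would create false hits if the value (or range) being searched for included 0. A minimal sketch that avoids both, assuming a closed range [lo, hi] (the bounds below are illustrative, not from the question): build a boolean mask of the strict upper triangle with k=1, which enforces i < j, and combine it with the range test. np.isclose is used for the exact-value case to sidestep floating-point equality issues.

```python
import numpy as np

test_arr = np.array([[1, 0.2, 0.04],
                     [0.2, 0.3, 0.06],
                     [0.04, 0.06, 0.09]])

# True only strictly above the diagonal (i < j); the diagonal itself is excluded.
upper = np.triu(np.ones(test_arr.shape, dtype=bool), k=1)

# Illustrative range bounds (assumed for this example).
lo, hi = 0.05, 0.1
in_range = (test_arr >= lo) & (test_arr <= hi)

# 0.09 on the diagonal is in range but dropped by the mask.
idx = np.argwhere(in_range & upper)
print(idx)  # [[1 2]]

# Exact-value search, tolerant of floating-point rounding.
exact = np.argwhere(np.isclose(test_arr, 0.06) & upper)
print(exact)  # [[1 2]]
```

Because the mask is applied rather than the values being overwritten, the original array is left untouched and searching for 0 behaves correctly.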
I have a numpy array, and I want to normalize each row based on this formula
x_norm = (x-x_min)/(x_max-x_min)
where x_min is the minimum of each row and x_max is the maximum of each row. Here is a simple example:
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
and desired output:
a = np.array([
[0, 0.5 ,1],
[0, 0.4 ,1],
[0.2, 1 ,0]
])
Thank you
IIUC, you can use plain vectorized numpy operations:
x = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
x_norm = ((x.T-x.min(1))/(x.max(1)-x.min(1))).T
# OR
x_norm = (x-x.min(1)[:,None])/(x.max(1)-x.min(1))[:,None]
output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])
NB: if efficiency matters, store the result of x.min(1) in a variable, since it is computed twice.
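A variant of the same idea, sketched with keepdims=True so the per-row results keep shape (n, 1) and broadcast directly, with no transpose or [:, None] needed. It also computes the minimum once, as the note suggests. The handling of constant rows (where x_max == x_min would divide by zero) is an assumption added here: such rows are simply set to 0.

```python
import numpy as np

x = np.array([[0, 1, 2],
              [2, 4, 7],
              [6, 10, 5]])

# keepdims=True keeps the reduced axis, so these have shape (3, 1)
# and broadcast against the (3, 3) array.
x_min = x.min(axis=1, keepdims=True)          # computed once, reused below
x_range = x.max(axis=1, keepdims=True) - x_min

# Divide only where the row range is non-zero; constant rows stay 0.0
# (an assumed convention, not part of the original question).
x_norm = np.divide(x - x_min, x_range,
                   out=np.zeros(x.shape, dtype=float),
                   where=x_range != 0)
print(x_norm)
```

Without the where/out guard, a constant row would produce nan values along with a runtime warning.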
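You could use np.apply_along_axis (note that it calls the Python function once per row, so it is convenient but usually slower than fully vectorized code):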
a = np.array(
[[0, 1 ,2],
[2, 4 ,7],
[6, 10,5]
])
def scaler(x):
    # min-max scale a single 1-D row
    return (x-x.min())/(x.max()-x.min())
np.apply_along_axis(scaler, axis=1, arr=a)
Output:
array([[0. , 0.5, 1. ],
[0. , 0.4, 1. ],
[0.2, 1. , 0. ]])
I have an array x=[2, 3, 4, 3, 2] which contains the states of a model, and another array prob=[.2, .1, .4, .1, .2] which gives the corresponding probabilities of these states. Some states are duplicated, and I need to sum their probabilities, so my desired outputs are unique_elems=[2, 3, 4] and reduced_prob=[.2+.2, .1+.1, .4]. Here is my approach:
x = tf.constant([2, 3, 4, 3, 2])
prob = tf.constant([.2, .1, .4, .1, .2])
unique_elems, _ = tf.unique(x) # [2, 3, 4]
unique_elems = tf.expand_dims(unique_elems, axis=1) # [[2], [3], [4]]
tiled_prob = tf.tile(tf.expand_dims(prob, axis=0), [3, 1])
# [[0.2, 0.1, 0.4, 0.1, 0.2],
# [0.2, 0.1, 0.4, 0.1, 0.2],
# [0.2, 0.1, 0.4, 0.1, 0.2]]
equal = tf.equal(x, unique_elems)
# [[ True, False, False, False, True],
# [False, True, False, True, False],
# [False, False, True, False, False]]
reduced_prob = tf.multiply(tiled_prob, tf.cast(equal, tf.float32))
# [[0.2, 0. , 0. , 0. , 0.2],
# [0. , 0.1, 0. , 0.1, 0. ],
# [0. , 0. , 0.4, 0. , 0. ]]
reduced_prob = tf.reduce_sum(reduced_prob, axis=1)
# [0.4, 0.2, 0.4]
But I am wondering whether there is a more efficient way to do this. In particular, I am using a tile operation, which I suspect is not very efficient for large arrays.
It can be done in two lines with tf.unsorted_segment_sum (tf.math.unsorted_segment_sum in TensorFlow 2.x):
unique_elems, idx = tf.unique(x) # [2, 3, 4]
reduced_prob = tf.unsorted_segment_sum(prob, idx, tf.size(unique_elems))
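For comparison, the same segment-sum idea can be sketched in plain NumPy (a swapped-in equivalent, not the TensorFlow API): np.unique with return_inverse=True plays the role of the idx returned by tf.unique, and np.bincount with weights sums the probabilities per group, so no tiled matrix is ever built. One difference to keep in mind: np.unique returns the values sorted, whereas tf.unique keeps first-occurrence order; for this input both give [2, 3, 4].

```python
import numpy as np

x = np.array([2, 3, 4, 3, 2])
prob = np.array([.2, .1, .4, .1, .2])

# idx[k] is the position of x[k] within unique_elems (sorted unique values).
unique_elems, idx = np.unique(x, return_inverse=True)

# Weighted bincount = sum of prob within each group of equal states.
reduced_prob = np.bincount(idx, weights=prob, minlength=unique_elems.size)
print(unique_elems, reduced_prob)
```

This runs in a single pass over the data, which is the same reason the segment-sum answer avoids the cost of tiling.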
I would like to calculate the sum of every pair of columns in a matrix (the sum of columns 0 and 1, of columns 2 and 3, and so on).
I tried nested "for" loops, but I never get the right result.
For example:
c = np.array([[0,0,0.25,0.5],[0,0.5,0.25,0],[0.5,0,0,0]],float)
freq=np.zeros(6,float).reshape((3, 2))
# I calculate the sum of the first and second columns, and of the third and fourth columns
for i in range(0,4,2):
    for j in range(1,4,2):
        for p in range(0,2):
            freq[:,p]=(c[:,i]+c[:,j])
But the result is:
print freq
array([[ 0.75, 0.75],
[ 0.25, 0.25],
[ 0. , 0. ]])
Normally the correct result should be (0., 0.5, 0.5) and (0.75, 0.25, 0). So I think the problem is in the nested "for" loops.
Does anyone know how I can calculate the sum of every two columns? I have a matrix with 400 columns.
You can simply reshape to split the last dimension into two, with the new last dimension of length 2, and then sum along it, like so:
freq = c.reshape(c.shape[0],-1,2).sum(2).T
For a contiguous array, reshaping only creates a view (no data is copied), so the only real work is the summation itself, which should be efficient.
Sample run -
In [17]: c
Out[17]:
array([[ 0. , 0. , 0.25, 0.5 ],
[ 0. , 0.5 , 0.25, 0. ],
[ 0.5 , 0. , 0. , 0. ]])
In [18]: c.reshape(c.shape[0],-1,2).sum(2).T
Out[18]:
array([[ 0. , 0.5 , 0.5 ],
[ 0.75, 0.25, 0. ]])
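A quick sketch to check the view claim with np.shares_memory, and to show the one limitation of this trick: the number of columns must be even, otherwise the reshape raises a ValueError.

```python
import numpy as np

c = np.array([[0, 0, 0.25, 0.5],
              [0, 0.5, 0.25, 0],
              [0.5, 0, 0, 0]])

# Split the column axis into (number_of_pairs, 2); for a C-contiguous array this is a view.
pairs = c.reshape(c.shape[0], -1, 2)
print(np.shares_memory(pairs, c))  # True: no data was copied

freq = pairs.sum(axis=2).T
print(freq)

# With an odd number of columns the reshape cannot split the axis into pairs.
try:
    np.zeros((3, 5)).reshape(3, -1, 2)
except ValueError as err:
    print("odd column count:", err)
```

For a 400-column matrix, freq would have shape (200, number_of_rows).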
Add the slices c[:, ::2] and c[:, 1::2]:
In [62]: c
Out[62]:
array([[ 0. , 0. , 0.25, 0.5 ],
[ 0. , 0.5 , 0.25, 0. ],
[ 0.5 , 0. , 0. , 0. ]])
In [63]: c[:, ::2] + c[:, 1::2]
Out[63]:
array([[ 0. , 0.75],
[ 0.5 , 0.25],
[ 0.5 , 0. ]])
Here is one way using np.split():
In [36]: np.array(np.split(c, np.arange(2, c.shape[1], 2), axis=1)).sum(axis=-1)
Out[36]:
array([[ 0. , 0.5 , 0.5 ],
[ 0.75, 0.25, 0. ]])
Or, more generally, a version that also works when the number of columns is odd:
In [87]: def vertical_adder(array):
....:     return np.column_stack([np.sum(arr, axis=1) for arr in np.array_split(array, np.arange(2, array.shape[1], 2), axis=1)])
....:
In [88]: vertical_adder(c)
Out[88]:
array([[ 0. , 0.75],
[ 0.5 , 0.25],
[ 0.5 , 0. ]])
In [94]: a
Out[94]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [95]: vertical_adder(a)
Out[95]:
array([[ 1, 5, 4],
[11, 15, 9],
[21, 25, 14]])
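Another option, not from the answers above, is np.add.reduceat, which sums the slices between consecutive start indices and treats the last index as running to the end, so an odd trailing column is handled without splitting into a list of arrays. The helper name pairwise_column_sums and its width parameter are hypothetical, introduced only for this sketch.

```python
import numpy as np

def pairwise_column_sums(arr, width=2):
    """Hypothetical helper: sum consecutive groups of `width` columns.

    A trailing partial group (e.g. one column left over) is summed as-is.
    """
    starts = np.arange(0, arr.shape[1], width)  # e.g. [0, 2, 4] for 5 columns
    return np.add.reduceat(arr, starts, axis=1)

a = np.arange(15).reshape(3, 5)
print(pairwise_column_sums(a))
# [[ 1  5  4]
#  [11 15  9]
#  [21 25 14]]

c = np.array([[0, 0, 0.25, 0.5],
              [0, 0.5, 0.25, 0],
              [0.5, 0, 0, 0]])
print(pairwise_column_sums(c))
```

The result has the same orientation as vertical_adder (one column per group); append .T if you want one row per group instead.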
I am porting some MATLAB code to Python using NumPy, and I have the following MATLAB command:
[xgrid,ygrid]=meshgrid(linspace(-0.5,0.5, GridSize-1), ...
linspace(-0.5,0.5, GridSize-1));
This works in 2D, but I would like to extend it to n dimensions: depending on the input data, GridSize can be a vector of length 2, 3 or 4. In 2D, the NumPy version would be:
xgrid, ygrid = np.meshgrid(np.linspace(-0.5, 0.5, GridSize[0]),
                           np.linspace(-0.5, 0.5, GridSize[1]))
However, I do not know the number of dimensions in advance. Is it possible to rewrite this expression so that it can generate grids with an arbitrary number of dimensions?
You could use a list comprehension to generate all the 1D arrays and then pass them to np.meshgrid with the * operator, which unpacks the list into separate arguments (the equivalent of MATLAB's comma-separated lists), like so:
allG = [np.linspace(-0.5,0.5, G) for G in GridSize]
out = np.meshgrid(*allG)
Sample runs
1) 2D Case :
In [27]: GridSize = [3,4]
In [28]: allG = [np.linspace(-0.5,0.5, G) for G in GridSize]
...: out = np.meshgrid(*allG)
...:
In [29]: out[0]
Out[29]:
array([[-0.5, 0. , 0.5],
[-0.5, 0. , 0.5],
[-0.5, 0. , 0.5],
[-0.5, 0. , 0.5]])
In [30]: out[1]
Out[30]:
array([[-0.5 , -0.5 , -0.5 ],
[-0.16666667, -0.16666667, -0.16666667],
[ 0.16666667, 0.16666667, 0.16666667],
[ 0.5 , 0.5 , 0.5 ]])
2) 3D Case :
In [51]: GridSize = [3,4,2]
In [52]: allG = [np.linspace(-0.5,0.5, G) for G in GridSize]
...: out = np.meshgrid(*allG)
...:
In [53]: out[0]
Out[53]:
array([[[-0.5, -0.5],
[ 0. , 0. ],
[ 0.5, 0.5]], ...
[[-0.5, -0.5],
[ 0. , 0. ],
[ 0.5, 0.5]]])
In [54]: out[1]
Out[54]:
array([[[-0.5 , -0.5 ], ...
[[ 0.16666667, 0.16666667],
[ 0.16666667, 0.16666667],
[ 0.16666667, 0.16666667]],
[[ 0.5 , 0.5 ],
[ 0.5 , 0.5 ],
[ 0.5 , 0.5 ]]])
In [55]: out[2]
Out[55]:
array([[[-0.5, 0.5], ....
[[-0.5, 0.5],
[-0.5, 0.5],
[-0.5, 0.5]]])
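One detail worth knowing when porting: np.meshgrid defaults to indexing='xy' (the MATLAB meshgrid convention), which swaps the first two output axes. That is why, with GridSize = [3,4,2], each array above has shape (4, 3, 2). If you want axis k to have length GridSize[k] (the MATLAB ndgrid convention), pass indexing='ij'. A small sketch; the helper name make_grid is hypothetical.

```python
import numpy as np

def make_grid(grid_size, indexing="xy"):
    """Hypothetical helper: one linspace per dimension, then an n-D meshgrid."""
    axes = [np.linspace(-0.5, 0.5, n) for n in grid_size]
    return np.meshgrid(*axes, indexing=indexing)

grid_size = [3, 4, 2]
xy = make_grid(grid_size)                 # MATLAB meshgrid-style ordering
ij = make_grid(grid_size, indexing="ij")  # MATLAB ndgrid-style ordering

print(xy[0].shape)  # (4, 3, 2)
print(ij[0].shape)  # (3, 4, 2)

# The two conventions hold the same values with the first two axes swapped.
print(np.array_equal(xy[0], ij[0].swapaxes(0, 1)))  # True
```

The 'ij' form is often simpler for n-dimensional code, since the shape of every output array is just tuple(grid_size).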
I am trying to do the following but with numpy arrays:
x = [(0.1, 1.), (0.1, 2.), (0.1, 3.), (0.1, 4.), (0.1, 5.)]
normal_result = zip(*x)
This should give a result of:
normal_result = [(0.1, 0.1, 0.1, 0.1, 0.1), (1., 2., 3., 4., 5.)]
But if the input vector is a numpy array:
y = np.array(x)
numpy_result = zip(*y)
print type(numpy_result)
It (expectedly) returns a:
<type 'list'>
The issue is that I will need to transform the result back into a numpy array after this.
What I would like to know is whether there is an efficient numpy function that avoids these back-and-forth conversions.
You can just transpose it:
>>> a = np.array([(0.1, 1.), (0.1, 2.), (0.1, 3.), (0.1, 4.), (0.1, 5.)])
>>> a
array([[ 0.1, 1. ],
[ 0.1, 2. ],
[ 0.1, 3. ],
[ 0.1, 4. ],
[ 0.1, 5. ]])
>>> a.T
array([[ 0.1, 0.1, 0.1, 0.1, 0.1],
[ 1. , 2. , 3. , 4. , 5. ]])
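A short sketch of why the transpose is cheap and how it maps onto zip(*x): a.T is a view that only swaps the strides, so no data is copied, and unpacking it yields one 1-D array per original column, just as zip(*x) yields one tuple per column.

```python
import numpy as np

a = np.array([(0.1, 1.), (0.1, 2.), (0.1, 3.), (0.1, 4.), (0.1, 5.)])

# a.T shares memory with a: no copy is made.
t = a.T
print(np.shares_memory(t, a))  # True

# Unpacking the transpose mirrors zip(*x): one 1-D array per column.
first, second = a.T
print(first)   # [0.1 0.1 0.1 0.1 0.1]
print(second)  # [1. 2. 3. 4. 5.]
```

Because it is a view, modifying t also modifies a; call a.T.copy() if you need an independent array.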
Try using dstack:
>>> import numpy as np
>>> a = np.array([[1,2],[3,4]]) # shapes of a and b can only differ in the 3rd dimension (if present)
>>> b = np.array([[5,6],[7,8]])
>>> np.dstack((a,b)) # stack arrays along a third axis (depth wise)
array([[[1, 5],
[2, 6]],
[[3, 7],
[4, 8]]])
So in your case it would be:
>>> x = [(0.1, 1.), (0.1, 2.), (0.1, 3.), (0.1, 4.), (0.1, 5.)]
>>> y = np.array(x)
>>> np.dstack(y)
array([[[ 0.1, 0.1, 0.1, 0.1, 0.1],
[ 1. , 2. , 3. , 4. , 5. ]]])
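Note the extra leading dimension in that output: np.dstack iterates over the rows of y, promotes each 1-D row of shape (2,) to (1, 2, 1), and concatenates them along the third axis, giving shape (1, 2, 5). A quick sketch confirming that dropping the leading axis gives the same values as the plain transpose:

```python
import numpy as np

x = [(0.1, 1.), (0.1, 2.), (0.1, 3.), (0.1, 4.), (0.1, 5.)]
y = np.array(x)

stacked = np.dstack(y)
print(stacked.shape)  # (1, 2, 5)

# Indexing away the leading axis recovers the 2 x 5 result of y.T.
print(np.array_equal(stacked[0], y.T))  # True
```

So for this particular task, y.T gives the 2 x 5 array directly, without the extra axis or the copy that dstack makes.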