numpy slice an array without copying it - python

I have a large data in matrix x and I need to analyze some some submatrices.
I am using the following code to select the submatrix:
>>> import numpy as np
>>> x = np.random.normal(0,1,(20,2))
>>> x
array([[-1.03266826, 0.04646684],
[ 0.05898304, 0.31834926],
[-0.1916809 , -0.97929025],
[-0.48837085, -0.62295003],
[-0.50731017, 0.50305894],
[ 0.06457385, -0.10670002],
[-0.72573604, 1.10026385],
[-0.90893845, 0.99827162],
[ 0.20714399, -0.56965615],
[ 0.8041371 , 0.21910274],
[-0.65882317, 0.2657183 ],
[-1.1214074 , -0.39886425],
[ 0.0784783 , -0.21630006],
[-0.91802557, -0.20178683],
[ 0.88268539, -0.66470235],
[-0.03652459, 1.49798484],
[ 1.76329838, -0.26554555],
[-0.97546845, -2.41823586],
[ 0.32335103, -1.35091711],
[-0.12981597, 0.27591674]])
>>> index = x[:,1] > 0
>>> index
array([ True, True, False, False, True, False, True, True, False,
True, True, False, False, False, False, True, False, False,
False, True], dtype=bool)
>>> x1 = x[index, :] #x1 is a copy of the submatrix
>>> x1
array([[-1.03266826, 0.04646684],
[ 0.05898304, 0.31834926],
[-0.50731017, 0.50305894],
[-0.72573604, 1.10026385],
[-0.90893845, 0.99827162],
[ 0.8041371 , 0.21910274],
[-0.65882317, 0.2657183 ],
[-0.03652459, 1.49798484],
[-0.12981597, 0.27591674]])
>>> x1[0,0] = 1000
>>> x1
array([[ 1.00000000e+03, 4.64668400e-02],
[ 5.89830401e-02, 3.18349259e-01],
[ -5.07310170e-01, 5.03058935e-01],
[ -7.25736045e-01, 1.10026385e+00],
[ -9.08938455e-01, 9.98271624e-01],
[ 8.04137104e-01, 2.19102741e-01],
[ -6.58823174e-01, 2.65718300e-01],
[ -3.65245877e-02, 1.49798484e+00],
[ -1.29815968e-01, 2.75916735e-01]])
>>> x
array([[-1.03266826, 0.04646684],
[ 0.05898304, 0.31834926],
[-0.1916809 , -0.97929025],
[-0.48837085, -0.62295003],
[-0.50731017, 0.50305894],
[ 0.06457385, -0.10670002],
[-0.72573604, 1.10026385],
[-0.90893845, 0.99827162],
[ 0.20714399, -0.56965615],
[ 0.8041371 , 0.21910274],
[-0.65882317, 0.2657183 ],
[-1.1214074 , -0.39886425],
[ 0.0784783 , -0.21630006],
[-0.91802557, -0.20178683],
[ 0.88268539, -0.66470235],
[-0.03652459, 1.49798484],
[ 1.76329838, -0.26554555],
[-0.97546845, -2.41823586],
[ 0.32335103, -1.35091711],
[-0.12981597, 0.27591674]])
>>>
but I would like x1 to be only a pointer or something like this. Copy the data every time that I need a submatrix is too expensive for me.
How can I do that?
EDIT:
Apparently there is not any solution with the numpy array. Are the pandas data frame better from this point of view?

The information for your array x is summarized in the .__array_interface__ property
In [433]: x.__array_interface__
Out[433]:
{'descr': [('', '<f8')],
'strides': None,
'data': (171396104, False),
'typestr': '<f8',
'version': 3,
'shape': (20, 2)}
It has the array shape, strides (default here), and pointer to the data buffer. A view can point to the same data buffer (possibly further along), and have its own shape and strides.
But indexing with your boolean can't be summarized in those few numbers. Either it has to carry the index array all the way through, or copy selected items from the x data buffer. numpy chooses to copy. You have choice of when to apply the index, now or further down the calling stack.

Since index is an array of type bool, you are doing advanced indexing. And the docs say: „Advanced indexing always returns a copy of the data.“
This makes a lot of sense. Compared to normal indexing where you only need to know the start, stop and step, advanced indexing can use any value from the original array without such a simple rule. This would mean having lots of extra meta information where referenced indices point to that might use more memory than a copy.

If you can manage with a traditional slice such as
x1 = x[3:8]
Then it will be just a pointer.
Have you looked at using masked arrays? You might be able to do exactly what you want.
x = np.array([0.12, 0.23],
[1.23, 3.32],
...
[0.75, 1.23]])
data = np.array([[False, False],
[True, True],
...
[True, True]])
x1 = np.ma.array(x, mask=data)
## x1 can be worked on and only includes elements of x where data==False

Related

Numpy remove a row from a multidimesional array

I have a array like this
k = np.array([[ 1. , -120.8, 39.5],
[ 0. , -120.5, 39.5],
[ 1. , -120.4, 39.5],
[ 1. , -120.3, 39.5]])
I am trying to remove the following row which is also at index 1 position.
b=np.array([ 0. , -120.5, 39.5])
I have tried the traditional methods like the following:
k==b #try to get all True values at index 1 but instead got this
array([[False, False, False],
[ True, False, False],
[False, False, False],
[False, False, False]])
Other thing I tried:
k[~(k[:,0]==0.) & (k[:,1]==-120.5) & (k[:,1]==39.5)]
Got the result like this:
array([], shape=(0, 3), dtype=float64)
I am really surprised why the above methods not working. By the way in the first method I am just trying to get the index so that i can use np.delete later. Also for this problem, I am assuming I don't know the index.
Both k and b are floats, so equality comparisons are subject to floating point inaccuracies. Use np.isclose instead:
k[~np.isclose(k, b).all(axis=1)]
# array([[ 1. , -120.8, 39.5],
# [ 1. , -120.4, 39.5],
# [ 1. , -120.3, 39.5]])
Where
np.isclose(k, b).all(axis=1)
# array([False, True, False, False])
Tells you which row of k matches b.

What is going on behind this numpy selection behavior?

Answering this question, some others and I were actually wrong by considering that the following would work:
Say one has
test = [ [ [0], 1 ],
[ [1], 1 ]
]
import numpy as np
nptest = np.array(test)
What is the reason behind
>>> nptest[:,0]==[1]
array([False, False], dtype=bool)
while one has
>>> nptest[0,0]==[1],nptest[1,0]==[1]
(False, True)
or
>>> nptest==[1]
array([[False, True],
[False, True]], dtype=bool)
or
>>> nptest==1
array([[False, True],
[False, True]], dtype=bool)
Is it the degeneracy in term of dimensions which causes this.
nptest is a 2D array of object dtype, and the first element of each row is a list.
nptest[:, 0] is a 1D array of object dtype, each of whose elements are lists.
When you do nptest[:,0]==[1], NumPy does not perform an elementwise comparison of each element of nptest[:,0] against the list [1]. It creates as high-dimensional an array as it can from [1], producing the 1D array np.array([1]), and then broadcasts the comparison, comparing each element of nptest[:,0] against the integer 1.
Since no list in nptest[:, 0] is equal to 1, all elements of the result are False.

Numpy Chain Indexing

I am trying to gain a better understanding of numpy and have come across something I can't quite understand when it comes to indexing.
Let's say we have this first array of random booleans
bools = np.random.choice([True, False],(7),p=[0.5,0.5])
array([False, True, False, False, True, False, False], dtype=bool)
Then let's also say we have this second array of random numbers selected from a normal distribution
data = np.random.randn(7,3)
array([[ 2.24116809, -0.41761776, -0.69026077],
[-0.85450123, 0.98218741, 0.0233551 ],
[-1.3157436 , -0.79753471, 1.77393444],
[-0.26672724, -0.9532758 , 0.67114247],
[-1.34177843, 1.220083 , -0.35341168],
[ 0.49629327, 1.73943962, 0.59050431],
[ 0.01609382, 0.91396293, 0.3754827 ]])
Using the numpy chain indexing I can do this
data[bools, 2:]
array([[ 0.0233551 ],
[-0.35341168]])
Now let's say I want to simply grab the first element, I can do this
data[bools, 2:][0]
array([ 0.0233551])
But why does this, data[bools, 2:, 0] not work?
But why does this, data[bools, 2:, 0] not work?
Because the input is a 2D array and as such you don't have three dimensions there to use something like : [bools, 2:, 0].
To achieve what you want you are trying to do, you could store the indices corresponding to the True ones in the mask bools and then use it as whole or one element from it for indexing.
A sample run to make things clear -
Inputs :
In [40]: data
Out[40]:
array([[ 1.02429045, 1.74104271, -0.54634826],
[-0.48451969, 0.83455196, 1.94444857],
[ 0.66504345, 0.41821317, 2.52517305],
[ 2.11428982, -0.05769528, 0.84432614],
[ 0.9251009 , -0.74646199, -0.93573164],
[ 0.07321257, -0.10708067, 1.78107884],
[-0.12961046, -0.5787856 , 0.2189466 ]])
In [41]: bools
Out[41]: array([ True, True, False, False, False, False, True], dtype=bool)
Store the valid indices :
In [42]: idx = np.flatnonzero(bools)
In [43]: idx
Out[43]: array([0, 1, 6])
Use as a whole or its first element :
In [44]: data[idx, 2:] # Same as data[bools, 2:]
Out[44]:
array([[-0.54634826],
[ 1.94444857],
[ 0.2189466 ]])
In [45]: data[idx[0], 2:]
Out[45]: array([-0.54634826])
I haven't seen 2d numpy indexing called 'chaining'
data is 2d, and thus can be indexed with a 2 element tuple
data[bools, 2:]
data([bools, slice(2,None,None))]
That can also be expressed as
data[bools,:][:,2:]
where it first selects from rows, and then from columns.
Notice that your indexing produces a (2,1) array; 2 from the number of True in bool, and 1 from the length of the 2: slice.
Your 2nd indexing with [0] is really a row selection:
data[bools, 2:][0]
data[bools, 2:][0,:]
The result is a (1,) array, the size of the 2nd dimension of the intermediate array.

Intersection of two numpy arrays of different dimensions by column

I have two different numpy arrays given. First one is two-dimensional array which looks like (first ten points):
[[ 0. 0. ]
[ 12.54901961 18.03921569]
[ 13.7254902 17.64705882]
[ 14.11764706 17.25490196]
[ 14.90196078 17.25490196]
[ 14.50980392 17.64705882]
[ 14.11764706 17.64705882]
[ 14.50980392 17.25490196]
[ 17.64705882 18.03921569]
[ 21.17647059 34.11764706]]
the second array is just one-dimensional which looks like (first ten points):
[ 18.03921569 17.64705882 17.25490196 17.25490196 17.64705882
17.64705882 17.25490196 17.64705882 21.17647059 22.35294118]
Values from the second (one-dimension) array could occur in first (two-dimension) one in the first column. F.e. 17.64705882
I want to get an array from the two-dimension one where values of the first column match values in the second (one-dimension) array. How to do that?
You can use np.in1d(array1, array2) to search in array1 each value of array2. In your case you just have to take the first column of the first array:
mask = np.in1d(a[:, 0], b)
#array([False, False, False, False, False, False, False, False, True, True], dtype=bool)
You can use this mask to obtain the encountered values:
a[:, 0][mask]
#array([ 17.64705882, 21.17647059])

numpy argmin elegant solution required.

In python to find the index of the minimum value of the array I usey = numpy.argmin(someMat)
Can i find the minimum value of this matrix such that it does not lie within a specified range in a neat way?
"Can i find the minimum value of this matrix such that it does not lie within a specified range in a neat way?"
If you only care about the minimum value satisfying some condition and not the location, then
>>> numpy.random.seed(1)
>>> m = numpy.random.randn(5.,5.)
>>> m
array([[ 1.62434536, -0.61175641, -0.52817175, -1.07296862, 0.86540763],
[-2.3015387 , 1.74481176, -0.7612069 , 0.3190391 , -0.24937038],
[ 1.46210794, -2.06014071, -0.3224172 , -0.38405435, 1.13376944],
[-1.09989127, -0.17242821, -0.87785842, 0.04221375, 0.58281521],
[-1.10061918, 1.14472371, 0.90159072, 0.50249434, 0.90085595]])
>>> m[~ ((m < 0.5) | (m > 0.8))].min()
0.50249433890186823
If you do want the location via argmin, then that's a bit trickier, but one way is to use masked arrays:
>>> numpy.ma.array(m,mask=((m<0.5) | (m > 0.8))).argmin()
23
>>> m.flat[23]
0.50249433890186823
Note that the condition here is flipped, as the mask is True for the excluded values, not the included ones.
Update: it appears that by "within a specified range" you don't mean the minimum value isn't within some bounds, but that you want to exclude portions of the matrix from the search based on the x,y coordinates. Here's one way (same matrix as before):
>>> xx, yy = numpy.indices(m.shape)
>>> points = ((xx == 0) & (yy == 0)) | ((xx > 2) & (yy < 3))
>>> points
array([[ True, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[ True, True, True, False, False],
[ True, True, True, False, False]], dtype=bool)
>>> m[points]
array([ 1.62434536, -1.09989127, -0.17242821, -0.87785842, -1.10061918,
1.14472371, 0.90159072])
>>> m[points].min()
-1.1006191772129212
with the corresponding masked array variant if you need the locations. [Edited to use indices instead of mgrid; I'd actually forgotten about it until it was used in another answer today!]
If I'm still wrong :^) and this also isn't what you're after, please edit your question to include a 3x3 example of your desired input and output.
I'm guessing this is what you are trying to achieve:
Argmin with arrays:
>>> from numpy import *
>>> a = array( [2,3,4] )
>>> argmin(a)
0
>>> print a[argmin(a)]
2
Argmin with matrices:
>>> b=array( [[6,5,4],[3,2,1]] )
>>> argmin(b)
5
>>> print b[argmin(b)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index out of bounds
Same approach for indexing doesn't work for arrays. The reason is that argmin (as well as argmax) returns index of the variable -- in case of a matrix, you need to convert your n-dimensional matrix to a 1-dimensional array of indices.
In order to do this, you need to call ravel :
>>> print b
[[6 5 4]
[3 2 1]]
>>> ravel(b)
array([6, 5, 4, 3, 2, 1])
When you combine ravel with argmin, you must write:
>>> print ravel(b)[argmin(b)]

Categories

Resources