Related
I am attempting to pass filtered values from a Numpy array into a function.
I need to pass values only above a certain value, and their index position with the Numpy array.
I am attempting to avoid iterating over the entire array within python by using Numpys own filtering systems, the arrays i am dealing with have 20k of values in them with potentially only very few being relevant.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = np.nonzero(somearray > 4)
for i in arrayindex:
somefunction(arrayindex[0], somearray[arrayindex[0]])
This threw up errors of logic not being able to handle multiple values,
this led me to testing it through print statement to see what was going on.
for cell in arrayindex:
print(f"index {cell}")
print(f"data {somearray[cell]}")
I expected an output of
index 4
data 5
index 5
data 6
But instead i get
index [4 5]
data [5 6]
I have looked through different methods to iterate through numpy arrays such and neditor, but none seem to still allow me to do the filtering of values outside of the for loop.
Is there a solution to my quandary?
Oh, i am aware that is is generally frowned upon to loop through a numpy array, however the function that i am passing these values to are complex, triggering certain events and involving data to be uploaded to a data base dependent on the data location within the array.
Thanks.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = [idx for idx, val in enumerate(somearray) if val > 4]
for i in range(0, len(arrayindex)):
somefunction(arrayindex[i], somearray[arrayindex[i]])
for i in range(0, len(arrayindex)):
print("index", arrayindex[i])
print("data", somearray[arrayindex[i]])
You need to have a clear idea of what nonzero produces, and pay attention to the difference between indexing with a list(s) and with a tuple.
===
In [110]: somearray = np.array([1,2,3,4,5,6])
...: arrayindex = np.nonzero(somearray > 4)
nonzero produces a tuple of arrays, one per dimension (this becomes more obvious with 2d arrays):
In [111]: arrayindex
Out[111]: (array([4, 5]),)
It can be used directly as an index:
In [113]: somearray[arrayindex]
Out[113]: array([5, 6])
In this 1d case you could take the array out of the tuple, and iterate on it:
In [114]: for i in arrayindex[0]:print(i, somearray[i])
4 5
5 6
argwhere does a 'transpose', which could also be used for iteration
In [115]: idxs = np.argwhere(somearray>4)
In [116]: idxs
Out[116]:
array([[4],
[5]])
In [117]: for i in idxs: print(i,somearray[i])
[4] [5]
[5] [6]
idxs is (2,1) shape, so i is (1,) shape array, resulting in the brackets in the display. Occasionally it's useful, but nonzero is used more (often by it's other name, np.where).
2d
argwhere has a 2d example:
In [119]: x=np.arange(6).reshape(2,3)
In [120]: np.argwhere(x>1)
Out[120]:
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
In [121]: np.nonzero(x>1)
Out[121]: (array([0, 1, 1, 1]), array([2, 0, 1, 2]))
In [122]: x[np.nonzero(x>1)]
Out[122]: array([2, 3, 4, 5])
While nonzero can be used to index the array, argwhere elements can't.
In [123]: for ij in np.argwhere(x>1):
...: print(ij,x[ij])
...:
...
IndexError: index 2 is out of bounds for axis 0 with size 2
Problem is that ij is a list, which is used to index on dimension. numpy distinguishes between lists and tuples when indexing. (Earlier versions fudged the difference, but current versions are taking a more rigorous approach.)
So we need to change the list into a tuple. One way is to unpack it:
In [124]: for i,j in np.argwhere(x>1):
...: print(i,j,x[i,j])
...:
...:
0 2 2
1 0 3
1 1 4
1 2 5
I could have used: print(ij,x[tuple(ij)]) in [123].
I should have used unpacking the [117] iteration:
In [125]: for i, in idxs: print(i,somearray[i])
4 5
5 6
or somearray[tuple(i)]
I am trying to create .mat data files using python. The matlab code expects the data to have a certain format, where two-dimensional ndarrays of non-uniform sizes are stored as objects in a column vector. So, in my case, there would be k numpy arrays of shape (m_i, n) - with different m_i for each array - stored in a numpy array with dtype=object of shape (k, 1). I then add this object array to a dictionary and pass it to scipy.io.savemat().
This works fine so long as the m_i are indeed different. If all k arrays happen to have the same number of rows m_i, the behaviour becomes strange. First of all, it requires very explicit assignment to a numpy array of dtype=object that has been initialised to the final size k, otherwise numpy simply creates a three-dimensional array. But even when I have the correct format in python and store it to a .mat file using savemat, there is some kind of problem in the translation to the matlab format.
When I reload the data from the .mat file using scipy.io.loadmat, I find that I still have an object array of shape (k, 1), which still has elements of shape (m, n). However, each element is no longer an int or a float but is instead a numpy array of shape (1, 1) that has to be further indexed to access the contained int or float. So an individual element of an object vector that was supposed to be a numpy array of shape (2, 4) would look something like this:
[array([[array([[0.82374894]]), array([[0.50730055]]),
array([[0.36721625]]), array([[0.45036349]])],
[array([[0.26119276]]), array([[0.16843872]]),
array([[0.28649524]]), array([[0.64239569]])]], dtype=object)]
This also poses a problem for the matlab code that I am trying to build my data files for. It runs fine for the arrays of objects that have different shapes but will break when there are arrays containing arrays of the same shape.
I know this is a rather obscure and possibly unavoidable issue but I figured I would see if anyone else has encountered it and found a fix. Thanks.
I'm not quite clear about the problem. Let me try to recreate your case:
In [58]: from scipy.io import loadmat, savemat
In [59]: A = np.empty((2,1), object)
In [61]: A[0,0]=np.arange(4).reshape(2,2)
In [62]: A[1,0]=np.arange(6).reshape(3,2)
In [63]: A
Out[63]:
array([[array([[0, 1],
[2, 3]])],
[array([[0, 1],
[2, 3],
[4, 5]])]], dtype=object)
In [64]: B=A[[0,0],:]
In [65]: B
Out[65]:
array([[array([[0, 1],
[2, 3]])],
[array([[0, 1],
[2, 3]])]], dtype=object)
As I explained earlier today, creating an object dtype array from arrays of matching size requires special handling. np.array(...) tries to create a higher dimensional array. https://stackoverflow.com/a/56243305/901925
Saving:
In [66]: savemat('foo.mat', {'A':A, 'B':B})
Loading:
In [74]: loadmat('foo.mat')
Out[74]:
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Tue May 21 11:20:42 2019',
'__version__': '1.0',
'__globals__': [],
'A': array([[array([[0, 1],
[2, 3]])],
[array([[0, 1],
[2, 3],
[4, 5]])]], dtype=object),
'B': array([[array([[0, 1],
[2, 3]])],
[array([[0, 1],
[2, 3]])]], dtype=object)}
In [75]: _74['A'][1,0]
Out[75]:
array([[0, 1],
[2, 3],
[4, 5]])
Your problem case looks like it's a object dtype array containing numbers:
In [89]: C = np.arange(4).reshape(2,2).astype(object)
In [90]: C
Out[90]:
array([[0, 1],
[2, 3]], dtype=object)
In [91]: savemat('foo1.mat', {'C': C})
In [92]: loadmat('foo1.mat')
Out[92]:
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Tue May 21 11:39:31 2019',
'__version__': '1.0',
'__globals__': [],
'C': array([[array([[0]]), array([[1]])],
[array([[2]]), array([[3]])]], dtype=object)}
Evidently savemat has converted the integer objects into 2d MATLAB compatible arrays. In MATLAB everything, even scalars, is at least 2d.
===
And in Octave, the object dtype arrays all produce cells, and the 2d numeric arrays produce matrices:
>> load foo.mat
>> A
A =
{
[1,1] =
0 1
2 3
[2,1] =
0 1
2 3
4 5
}
>> B
B =
{
[1,1] =
0 1
2 3
[2,1] =
0 1
2 3
}
>> load foo1.mat
>> C
C =
{
[1,1] = 0
[2,1] = 2
[1,2] = 1
[2,2] = 3
}
Python: Issue reading in str from MATLAB .mat file using h5py and NumPy
is a relatively recent SO that showed there's a difference between the Octave HDF5 and MATLAB.
I have a Pandas series and here are two first two rows:
X.head(2)
Which has 1D arrays for each row: the column header is mels_flatten
mels_flatten
0 [0.0171469795289, 0.0173154008662, 0.395695541...
1 [0.0471267533454, 0.0061760868171, 0.005647608...
I want to store the values in a single array to feed to a classifier model.
np.vstack(X.values)
or
np.array(X.values)
both returns following
array([[ array([ 1.71469795e-02, 1.73154009e-02, 3.95695542e-01, ...,
2.35955651e-04, 8.64118460e-04, 7.74663408e-04])],
[ array([ 0.04712675, 0.00617609, 0.00564761, ..., 0.00277199,
0.00205229, 0.00043118])],
I am not sure how to process array of array objects.
My expected result is:
array([[ 1.71469795e-02, 1.73154009e-02, 3.95695542e-01, ...,
2.35955651e-04, 8.64118460e-04, 7.74663408e-04]],
[ 0.04712675, 0.00617609, 0.00564761, ..., 0.00277199,
0.00205229, 0.00043118]],
Have tried np.concatenate and np.resize as some other posts suggested with no luck.
I find it likely that not all of your 1d arrays are the same length, i.e. your series is not compatible with a rectangular 2d array.
Consider the following dummy example:
import pandas as pd
import numpy as np
X = pd.Series([np.array([1,2,3]),np.array([4,5,6])])
# 0 [1, 2, 3]
# 1 [4, 5, 6]
# dtype: object
np.vstack(X.values)
# array([[1, 2, 3],
# [4, 5, 6]])
As the above demonstrate, a collection of 1d arrays (or lists) of the same size will be nicely stacked to a 2d array. Check the size of your arrays, and you'll probably find that there are some discrepancies:
>>> X.apply(len)
0 3
1 3
dtype: int64
If X.apply(len).unique() returns an array with more than 1 elements, you'll see the proof of the problem. In the above rectangular case:
>>> X.apply(len).unique()
array([3])
In a non-conforming example:
>>> Y = pd.Series([np.array([1,2,3]),np.array([4,5])])
>>> np.array(Y.values)
array([array([1, 2, 3]), array([4, 5])], dtype=object)
>>> Y.apply(len).unique()
array([3, 2])
As you can see, the nested array result is coupled to the non-unique length of items inside the original array.
Starting with a 2D Numpy array I would like to create a 1D array in which each value corresponds to the minimum value of each row in the 2D array.
For example if
dog=[[1,2],[4,3],[6,7]]
then I would like to create an array from
'dog':[1,3,6]
This seems like it should be easy to do, but I'm not getting it so far.
In [54]: dog=[[1,2],[4,3],[6,7]]
In [55]: np.min(dog, axis=1)
Out[55]: array([1, 3, 6])
or, if dog is a NumPy array, you could call its min method:
In [57]: dog = np.array([[1,2],[4,3],[6,7]])
In [58]: dog.min(axis=1)
Out[58]: array([1, 3, 6])
Since dog.shape is (3,2), (for 3 rows, 2 columns), the axis=1 refers to the second dimension in the shape -- the one with 2 elements. Putting axis=1 in the call to dog.min tells NumPy to take the min over the axis=1 direction, thus eliminating the axis of length 2. The result is thus of shape (3,).
Without numpy:
dog=[[1,2],[4,3],[6,7]]
mins = [min(x) for x in dog]
I'm pretty new in numpy and I am having a hard time understanding how to extract from a np.array a sub matrix with defined columns and rows:
Y = np.arange(16).reshape(4,4)
If I want to extract columns/rows 0 and 3, I should have:
[[0 3]
[12 15]]
I tried all the reshape functions...but cannot figure out how to do this. Any ideas?
Give np.ix_ a try:
Y[np.ix_([0,3],[0,3])]
This returns your desired result:
In [25]: Y = np.arange(16).reshape(4,4)
In [26]: Y[np.ix_([0,3],[0,3])]
Out[26]:
array([[ 0, 3],
[12, 15]])
One solution is to index the rows/columns by slicing/striding. Here's an example where you are extracting every third column/row from the first to last columns (i.e. the first and fourth columns)
In [1]: import numpy as np
In [2]: Y = np.arange(16).reshape(4, 4)
In [3]: Y[0:4:3, 0:4:3]
Out[1]: array([[ 0, 3],
[12, 15]])
This gives you the output you were looking for.
For more info, check out this page on indexing in NumPy.
print y[0:4:3,0:4:3]
is the shortest and most appropriate fix .
First of all, your Y only has 4 col and rows, so there is no col4 or row4, at most col3 or row3.
To get 0, 3 cols: Y[[0,3],:]
To get 0, 3 rows: Y[:,[0,3]]
So to get the array you request: Y[[0,3],:][:,[0,3]]
Note that if you just Y[[0,3],[0,3]] it is equivalent to [Y[0,0], Y[3,3]] and the result will be of two elements: array([ 0, 15])
You can also do this using:
Y[[[0],[3]],[0,3]]
which is equivalent to doing this using indexing arrays:
idx = np.array((0,3)).reshape(2,1)
Y[idx,idx.T]
To make the broadcasting work as desired, you need the non-singleton dimension of your indexing array to be aligned with the axis you're indexing into, e.g. for an n x m 2D subarray:
Y[<n x 1 array>,<1 x m array>]
This doesn't create an intermediate array, unlike CT Zhu's answer, which creates the intermediate array Y[(0,3),:], then indexes into it.
This can also be done by slicing: Y[[0,3],:][:,[0,3]]. More elegantly, it is possible to slice arrays (or even reorder them) by given sets of indices for rows, columns, pages, et cetera:
r=np.array([0,3])
c=np.array([0,3])
print(Y[r,:][:,c]) #>>[[ 0 3][12 15]]
for reordering try this:
r=np.array([0,3])
c=np.array([3,0])
print(Y[r,:][:,c])#>>[[ 3 0][15 12]]