I have a multi-dimensional array of objects. I want to iterate over the objects using the nditer iterator.
Here is a code example:
import numpy as np
class Test:
    def __init__(self, a):
        self.a = a

    def get_a(self):
        return self.a

b = np.empty((2, 3), dtype=object)
t_00 = Test(0)
t_01 = Test(1)
t_11 = Test(11)
b[0, 0] = t_00
b[0, 1] = t_01
b[1, 1] = t_11
for item in np.nditer(b, flags=["refs_ok"]):
    if item:
        print(item.get_a())
I would expect the "item" to contain the object reference that I can use to access data.
However, I am getting the following error: AttributeError: 'numpy.ndarray' object has no attribute 'get_a'
My question is: how can I go through the array to access the objects inside it?
Iteration via array.flat will work, and I can confirm that it behaves as you'd expect:
for item in b.flat:
    if item:
        print(item.get_a())
Iterating over an array with nditer gives you views of the original array's cells as 0-dimensional arrays. For non-object arrays, this is almost equivalent to producing scalars, since 0-dimensional arrays usually behave like scalars, but that doesn't work for object arrays.
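A quick sketch of what nditer actually yields here (assuming the Test class and the b array from the question are in scope):
import numpy as np

# nditer yields 0-dimensional object *arrays*, not the stored objects
# themselves, which is why calling .get_a() on them fails.
view = next(iter(np.nditer(b, flags=["refs_ok"])))
print(type(view), view.shape, view.dtype)  # <class 'numpy.ndarray'> () object
print(view.item().get_a())                 # 0 -- .item() unwraps the view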
If you were determined to go through nditer for this, you could extract the elements from the 0-dimensional views with the item() method:
for element in np.nditer(b, flags=["refs_ok"]):
    element = element.item()
    if element:
        print(element.get_a())
import pandas as pd

def answer_six():
    census_df = pd.read_csv('census.csv')
    copy = census_df.copy()
    states = copy['STNAME'].unique()
    counties = copy['CTYNAME']
    play = copy.copy()
    play = play.set_index(['STNAME', 'CTYNAME'])
    copy = copy.set_index(['STNAME'])

    def population_with_top3(state):
        counties = copy.loc[state]['CTYNAME'].values
        population_array = list(map(lambda county: int(play.loc[state, county]['CENSUS2010POP'].values), counties))
        population_array.sort(reverse=True)
        population = population_array[1] + population_array[2] + population_array[3]
        return {'STNAME': state, 'POP': population}

    states_with_pop = list(map(population_with_top3, states))
    return states_with_pop

answer_six()
When running the code I get:
TypeError: only length-1 arrays can be converted to Python scalars
Does anybody have any experience with this kind of issue?
Thanks!
This error happens when you try to use an array in place of a single value.
I think the error is in this part of your code:
int(play.loc[state, county]['CENSUS2010POP'].values)
int takes only a single value for typecasting, and .values returns an array. If the array has size one, int will ignore that it's an array and take its first element. In your case, I think play.loc[state, county]['CENSUS2010POP'].values is returning more than one value, which happens when there is more than one row with the same state name and county name.
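One possible workaround, sketched under the assumption that the duplicated rows are exact repeats and taking the first match is acceptable (play is the MultiIndexed frame from the question; county_population is a hypothetical helper name):
def county_population(state, county):
    # .values can hold more than one element when the (state, county) pair
    # is duplicated; index explicitly instead of relying on int() to
    # unwrap a length-1 array.
    values = play.loc[state, county]['CENSUS2010POP'].values
    return int(values[0])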
I was asking myself if it is possible to turn the output of a class into a np.array within the class itself.
I created the following class:
import numpy as np
import statistics as st

class stats:
    def __init__(self, x):
        self.age = x[:, 0]
        self.education = x[:, 1]
        self.married = x[:, 2]
        self.nodegree = x[:, 3]
        self.RE75 = x[:, 4]
        self.RE78 = x[:, 5]

    def Vector(self):
        age = [np.mean(self.age), st.stdev(self.age)]
        education = [np.mean(self.education), st.stdev(self.education)]
        married = [np.mean(self.married), st.stdev(self.married)]
        nodegree = [np.mean(self.nodegree), st.stdev(self.nodegree)]
        RE75 = [np.mean(self.RE75), st.stdev(self.RE75)]
        RE78 = [np.mean(self.RE78), st.stdev(self.RE78)]
        return [age, education, married, nodegree, RE75, RE78]
results1 is a numpy.ndarray of shape 156x6.
I basically want to compute the mean as well as the standard deviation for each column of results1 using a class. I use numpy for the mean and the statistics module (as st) for the std.
When I am printing the output I get the following:
results1_stats = stats(results1)
print(results1_stats.Vector())
Output:
[[25.98076923076923, 7.299554695959556], [10.314102564102564, 2.0597666237347005], [0.1858974358974359, 0.39027677820527085], [0.7243589743589743, 0.448275807219502], [1490.7220884615383, 3296.5535502409775], [6136.320646794872, 8143.4659725229685]]
Apparently, the class is working as wanted (although there is probably a more efficient way to code this up).
The problem is that I would like to get the output as a np.array of shape 6x2 (or transposed) directly from the class itself. However, since I just began using classes, I don't know if that is even possible.
Any help is appreciated :)
Thank you!
You can construct a numpy array with np.array(your_list_sequence). Additionally, you can use a list comprehension to convert a list of lists to a numpy array.
Try this:
def get_stats(results):
    return np.array([[np.mean(results[:, column]), st.stdev(results[:, column])]
                     for column in range(6)])

your_new_np_array = get_stats(results)
That said, if you only want the stats array, a plain function would be simpler than a class. But you can just as easily include this method in your class and get back the result you expect.
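To answer the original question directly, the conversion can also happen inside the class. A minimal sketch, reusing the names from the question:
import numpy as np
import statistics as st

class stats:
    def __init__(self, x):
        self.x = x  # the raw 156x6 data

    def Vector(self):
        # build the 6x2 array (mean, stdev per column) inside the class itself
        return np.array([[np.mean(self.x[:, col]), st.stdev(self.x[:, col])]
                         for col in range(self.x.shape[1])])
stats(results1).Vector() then returns a (6, 2) ndarray directly.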
I have the following array:
a = np.random.rand(5,2)
a
array([[0.98736372, 0.07639041],
       [0.45342928, 0.4932295 ],
       [0.75789786, 0.48546238],
       [0.85854235, 0.74868237],
       [0.13534155, 0.79317482]])
and I want to resize it so that it is divided into 2 batches with three elements (adding zeros as required):
array([[[0.98736372, 0.07639041],
        [0.45342928, 0.4932295 ],
        [0.75789786, 0.48546238]],

       [[0.85854235, 0.74868237],
        [0.13534155, 0.79317482],
        [0, 0]]])
I have tried this, but it returns None:
a = a.copy()
a.resize((2,3,2), refcheck = False)
I believe .reshape would not provide the solution, as it is not able to fill in with 0's to comply with the desired dimensions for the array.
Using numpy.resize as a function, you call it like this:
import numpy as np

a = np.random.rand(5, 2)
b = np.resize(a, (2, 3, 2))
Note, however, that the function pads the new cells by repeating copies of a, not with zeros. To get the zero-padded result you are after, use the ndarray.resize method, which works in place:
import numpy as np

a = np.random.rand(5, 2)
a.resize((2, 3, 2), refcheck=False)
b = a.copy()
The first one returns a new ndarray, while the method returns None because it changes the array object itself; that is why your call appeared to return nothing. For more info, see the numpy.resize doc.
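A quick sketch to see the difference between the two (small integers instead of random data, so the padding is visible):
import numpy as np

a = np.arange(10).reshape(5, 2)      # stand-in for the random data

repeated = np.resize(a, (2, 3, 2))   # function: pads by cycling a's values
print(repeated[1, 2])                # -> [0 1], a repeat of the first row

a.resize((2, 3, 2), refcheck=False)  # method: pads with zeros, in place
print(a[1, 2])                       # -> [0 0], genuine zero padding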
Me again... :)
I tried finding an answer to this question but again I was not fortunate enough. So here it is.
What is the difference between assigning to a numpy array directly (let's say "iris") and assigning to the whole group of data in this array (by using iris[:], for instance)?
I'm asking this because of the error that I get when I run the first example (below), while the second example works fine.
Here is the code:
In the first part I load the library and import the dataset from the internet.
import statsmodels.api as sm

iris = sm.datasets.get_rdataset(dataname='iris',
                                package='datasets')['data']
If I run this code I get an error:
iris.columns.values = [iris.columns.values[x].lower() for x in range(len(iris.columns.values))]
print(iris.columns.values)
Now if I run this code it works fine:
iris.columns.values[:] = [iris.columns.values[x].lower() for x in range(len(iris.columns.values))]
print(iris.columns.values)
Best regards,
The difference is that when you do iris.columns.values = ... you try to replace the reference of the values property in iris.columns, which is protected (see the pandas implementation of pandas.core.frame.DataFrame), whereas when you do iris.columns.values[:] = ... you access the data of the np.ndarray and replace it with new values. In the second assignment statement you do not overwrite the reference to the numpy object; the [:] is a slice object that is passed to the __setitem__ method of the numpy array.
EDIT:
The exact implementation (there are several; here is the pd.Series one) of this property is:
@property
def values(self):
    """ return the array """
    return self.block.values
Thus you try to overwrite a property that is constructed with the @property decorator followed by a getter function; it cannot be replaced, since it is provided only with a getter and not a setter. See Python's docs on the property() builtin.
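To make that concrete, here is a minimal toy class (not pandas' actual code) showing the same failure mode:
class Wrapper:
    def __init__(self):
        self._data = [1, 2, 3]

    @property
    def values(self):
        # getter only -- no setter is defined
        return self._data

w = Wrapper()
w.values[:] = [4, 5, 6]  # fine: mutates the list returned by the getter
w.values = [4, 5, 6]     # AttributeError: can't set attribute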
iris.columns.values = val
calls
type(iris.columns).__setattr__(iris.columns, 'values', val)
This is running pandas' code, because type(iris.columns) is a pandas Index.
iris.columns.values[:] = val
calls
type(iris.columns.values).__setitem__(iris.columns.values, slice(None), val)
This is running numpy's code, because type(iris.columns.values) is np.ndarray.
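A small illustration of the __setitem__ path, independent of pandas:
import numpy as np

arr = np.array([1, 2, 3])

arr[:] = [4, 5, 6]                                  # sugar for the call below
type(arr).__setitem__(arr, slice(None), [7, 8, 9])  # the explicit equivalent
print(arr)                                          # [7 8 9]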
Using the given routines (see: how to load Matlab .mat files with scipy), I could not access deeper nested structures to recover them into dictionaries.
To present the problem I run into in more detail, I give the following toy example:
import scipy.io as spio

a = {'b': {'c': {'d': 3}}}
# my dictionary: a['b']['c']['d'] = 3
spio.savemat('xy.mat', a)
Now I want to read the mat-file back into Python. I tried the following:
vig = spio.loadmat('xy.mat', squeeze_me=True)
If I now want to access the fields I get:
>> vig['b']
array(((array(3),),), dtype=[('c', '|O8')])
>> vig['b']['c']
array(array((3,), dtype=[('d', '|O8')]), dtype=object)
>> vig['b']['c']['d']
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/<ipython console> in <module>()
ValueError: field named d not found.
However, by using the option struct_as_record=False the field could be accessed:
v = spio.loadmat('xy.mat', squeeze_me=True, struct_as_record=False)
Now it was possible to access it by
>> v['b'].c.d
array(3)
Here are the functions which reconstruct the dictionaries. Just use this loadmat instead of scipy.io's loadmat:
import scipy.io as spio

def loadmat(filename):
    '''
    This function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function _check_keys to cure all entries
    which are still mat-objects.
    '''
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)

def _check_keys(d):
    '''
    Checks if entries in a dictionary are mat-objects. If yes,
    _todict is called to change them to nested dictionaries.
    '''
    for key in d:
        if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
            d[key] = _todict(d[key])
    return d

def _todict(matobj):
    '''
    A recursive function which constructs nested dictionaries from mat-objects.
    '''
    d = {}
    for strg in matobj._fieldnames:
        elem = matobj.__dict__[strg]
        if isinstance(elem, spio.matlab.mio5_params.mat_struct):
            d[strg] = _todict(elem)
        else:
            d[strg] = elem
    return d
Just an enhancement to mergen's answer, which unfortunately stops recursing when it reaches a cell array of objects. The following version makes lists of them instead, continuing the recursion into the cell array elements where possible.
import scipy.io as spio
import numpy as np

def loadmat(filename):
    '''
    This function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function _check_keys to cure all entries
    which are still mat-objects.
    '''
    def _check_keys(d):
        '''
        Checks if entries in a dictionary are mat-objects. If yes,
        _todict is called to change them to nested dictionaries.
        '''
        for key in d:
            if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
                d[key] = _todict(d[key])
        return d

    def _todict(matobj):
        '''
        A recursive function which constructs nested dictionaries from mat-objects.
        '''
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, spio.matlab.mio5_params.mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                d[strg] = _tolist(elem)
            else:
                d[strg] = elem
        return d

    def _tolist(ndarray):
        '''
        A recursive function which constructs lists from cellarrays
        (which are loaded as numpy ndarrays), recursing into the elements
        if they contain matobjects.
        '''
        elem_list = []
        for sub_elem in ndarray:
            if isinstance(sub_elem, spio.matlab.mio5_params.mat_struct):
                elem_list.append(_todict(sub_elem))
            elif isinstance(sub_elem, np.ndarray):
                elem_list.append(_tolist(sub_elem))
            else:
                elem_list.append(sub_elem)
        return elem_list

    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)
As of scipy >= 1.5.0 this functionality now comes built-in using the simplify_cells argument.
from scipy.io import loadmat
mat_dict = loadmat(file_name, simplify_cells=True)
I was advised on the scipy mailing list (https://mail.python.org/pipermail/scipy-user/) that there are two more ways to access this data.
This works:
import scipy.io as spio

vig = spio.loadmat('xy.mat')
print(vig['b'][0, 0]['c'][0, 0]['d'][0, 0])
Output on my machine:
3
The reason for this kind of access: "For historic reasons, in Matlab everything is at least a 2D array, even scalars."
So scipy.io.loadmat mimics the Matlab behavior by default.
Found a solution: the content of a scipy.io.matlab.mio5_params.mat_struct object can be investigated via:
v['b'].__dict__['c'].__dict__['d']
Another method that works:
import scipy.io as spio

vig = spio.loadmat('xy.mat', squeeze_me=True)
print(vig['b']['c'].item()['d'])
Output:
3
I learned this method on the scipy mailing list, too. I certainly don't understand (yet) why '.item()' has to be added in, and:
print(vig['b']['c']['d'])
will throw an error instead:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
but I'll be back to supplement the explanation when I know it. Explanation of numpy.ndarray.item (from the numpy reference):
Copy an element of an array to a standard Python scalar and return it.
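A short sketch of the difference (assuming vig was loaded with squeeze_me=True as above; the exact reprs may vary with the scipy version):
inner = vig['b']['c']
print(type(inner), inner.shape)  # <class 'numpy.ndarray'> () -- a 0-d object array
record = inner.item()            # unwrap the 0-d array to the structured element
print(record['d'])               # field access now works, printing 3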
(Please notice that this answer is basically the same as the comment of hpaulj to the initial question, but I felt that the comment is not 'visible' or understandable enough. I certainly did not notice it when I searched for a solution for the first time, some weeks ago).