In R, I can use str() to inspect objects, which are usually lists of things.
I recently switched to Python for statistics and don't know how to inspect the objects I encounter. For example:
import statsmodels.api as sm
heart = sm.datasets.heart.load_pandas().data
heart.groupby(['censors'])['age']
I want to find out what kind of object heart.groupby(['censors']) is, such that I can add ['age'] at the end. However, print heart.groupby(['censors']) only tells me the type of the object, not its structure or what I can do with it.
So how can I understand the structure of numpy / pandas objects, similar to str() in R?
If you're trying to get some insight into what you can do with a Python object, you can inspect it using a beefed-up Python console like IPython. In an IPython session, first put the object you want to look at into a variable:
import statsmodels.api as sm
heart = sm.datasets.heart.load_pandas().data
h_grouped = heart.groupby(['censors'])
Then type out the variable name and double-tap Tab to bring up a list of the object's methods:
In [5]: h_grouped.<Tab><Tab>
# Shows the object's methods
A further benefit of the IPython console is you can quickly check the
help for any individual method by adding a ?:
h_grouped.apply?
# Apply function and combine results
# together in an intelligent way.
If you don't have IPython or a similar console, you can achieve something similar using dir(), e.g. dir(h_grouped), although this will also list
the object's private methods which are generally not useful and shouldn't be
touched in regular use.
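As a rough sketch of the dir() route, the private names can be filtered out by their leading underscore, and inspect.getdoc returns the docstring of a single method. The helper name public_names is made up for illustration, and the example uses a plain list instead of h_grouped so that it runs without pandas:

```python
import inspect


def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


sample = [3, 1, 2]

# Roughly what <Tab><Tab> lists in IPython: the public methods and attributes.
names = public_names(sample)
print(names)

# The docstring of one method, similar to what `sample.sort?` displays.
print(inspect.getdoc(sample.sort))
```

The same call, public_names(h_grouped), should work on the grouped object from the answer above.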
type(heart.groupby(['censors'])['age'])
type will tell you what kind of object it is. At the moment you are grouping by a dimension and not telling pandas what to do with age. If you want the mean for example you could do:
heart.groupby(['censors'])['age'].mean()
This would take the mean of age by the group, and return a series.
The groupby is I think a red herring -- "age" is just a column name:
import statsmodels.api as sm
heart = sm.datasets.heart.load_pandas().data
heart
# survival censors age
# 0 15 1 54.3
# ...
heart.keys()
# Index([u'survival', u'censors', u'age'], dtype='object')
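To see why ['age'] can be appended, it may help to check the type at each step. The following is a minimal sketch on a small hand-made DataFrame (the values are invented for illustration and are not the heart dataset), assuming pandas is installed:

```python
import pandas as pd

# Invented stand-in for the heart data: same column names, made-up values.
toy = pd.DataFrame(
    {
        "survival": [15, 3, 624],
        "censors": [1, 1, 0],
        "age": [50.0, 60.0, 40.0],
    }
)

grouped = toy.groupby(["censors"])
print(type(grouped).__name__)         # DataFrameGroupBy
print(type(grouped["age"]).__name__)  # SeriesGroupBy

# Selecting a column does not compute anything yet; an aggregation does.
age_means = grouped["age"].mean()
print(age_means)

# One rough analogue of R's str() for a DataFrame: column names and dtypes.
print(toy.dtypes)
```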
I am looking to retrieve the name of an instance of DataFrame, that I pass as an argument to my function, to be able to use this name in the execution of the function.
Example in a script:
display(df_on_step_42)
I would like to retrieve the string "df_on_step_42" to use in the execution of the display function (which displays the content of the DataFrame).
As a last resort, I can pass both the DataFrame and its name as arguments:
display(df_on_step_42, "df_on_step_42")
But I would prefer to do without this second argument.
PySpark DataFrames are immutable, so in our data pipeline we cannot systematically attach a name attribute to all the new DataFrames that are derived from other DataFrames.
You can use the globals() dictionary to search for your variable by matching it using eval.
As @juanpa.arrivillaga mentions, this is fundamentally bad design, but if you need to, here is one way to do it, inspired by this old SO answer for Python 2:
import pandas as pd
df_on_step_42 = pd.DataFrame()
def get_var_name(var):
    # Scan the global names and return the first one bound to var.
    for k in globals().keys():
        try:
            if eval(k) is var:
                return k
        except Exception:
            pass
get_var_name(df_on_step_42)
'df_on_step_42'
Your display would then look like -
display(df_on_step_42, get_var_name(df_on_step_42))
Caution
This will fail for aliases of a variable, since both names point to the same object in memory. If the original variable occurs first in the globals dictionary when iterating over the keys, the function returns the name of the original variable.
a = 123
b = a
get_var_name(b)
'a'
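Because several names can refer to the same object, a variant that returns every matching name makes the ambiguity visible. This is only a sketch: the function name names_of and its namespace parameter are made up here, and it compares the stored values directly instead of calling eval:

```python
def names_of(obj, namespace):
    """Return every name in namespace that is bound to exactly this object."""
    return [name for name, value in namespace.items() if value is obj]


a = [1, 2, 3]
b = a  # b is a second name for the same list, not a copy

# An explicit dict keeps the output predictable; globals() could be passed instead.
print(names_of(b, {"a": a, "b": b}))  # ['a', 'b']
```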
I finally found a solution to my problem using the inspect and re libraries.
I use the following lines which correspond to the use of the display() function
import inspect
import re
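def display(df):
    # Frame record of the caller, i.e. the source line where display() was called.
    frame = inspect.getouterframes(inspect.currentframe())[1]
    # Extract the argument name from the calling line, e.g. "display(df_on_step_42)".
    name = re.search(r"display\(\s*(\w+)", frame.code_context[0])[1]
    print(name)
display(df_on_step_42)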
The inspect library allows me to get the call context of the function, in this context, the code_context attribute gives me the text of the line where the function is called, and finally the regex library allows me to isolate the name of the dataframe given as parameter.
It’s not optimal but it works.
I am trying to write a testing program for a python program that takes data, does calculations on it, then puts the output in a class instance object. This object contains several other objects, each with their own attributes. I'm trying to access all the attributes and sub-attributes dynamically with a one size fits all solution, corresponding to elements in a dictionary I wrote to cycle through and get all those attributes for printing onto a test output file.
Edit: this may not be clear from the above but I have a list of the attributes I want, so using something to actually get those attributes is not a problem, although I'm aware python has methods that accomplish this. What I need to do is to be able to get all of those attributes with the same function call, regardless of whether they are top level object attributes or attributes of object attributes.
Python is having some trouble with this - first I tried doing something like this:
for string in attr_dictionary:
    ...
    outputFile.print(outputclass.string)
    ...
But Python did not like this, and returned an AttributeError
After checking SE, I learned that this is a supposed solution:
for string in attr_dictionary:
    ...
    outputFile.print(getattr(outputclass, string))
    ...
The only problem is - I want to dynamically access the attributes of objects that are attributes of outputclass. So ideally it would be something like outputclass.objectAttribute.attribute, but this does not work in python. When I use getattr(outputclass, objectAttribute.string), python returns an AttributeError
Any good solution here?
One thing I have thought of trying is creating methods to return those sub-attributes, something like:
class outputObject:
    ...
    def attributeIWant(self,...):
        return self.subObject.attributeIWant
    ...
Even then, it seems like getattr() will return an error because attributeIWant() is supposed to be a function call, it's not actually an attribute. I'm not certain that this is even within the capabilities of Python to make this happen.
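On the concern about methods: getattr does not raise an error for a method, it returns the bound method, which can then be called. A minimal sketch, with class and attribute names invented to mirror the ones above:

```python
class SubObject:
    def __init__(self):
        self.attributeIWant = 42


class OutputObject:
    def __init__(self):
        self.subObject = SubObject()

    def attributeIWant(self):
        return self.subObject.attributeIWant


out = OutputObject()

value = getattr(out, "attributeIWant")
if callable(value):
    # getattr returned a bound method, so call it to get the actual data.
    value = value()

print(value)  # 42
```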
Thank you in advance for reading and/or responding, if anyone is familiar with a way to do this it would save me a bunch of refactoring or additional code.
edit: Additional Clarification
The class is, for example, outputData, and inside that class you could have an instance of the class furtherData, which has the attribute dataIWant:
class outputData:
    example: furtherData
    example = furtherData()
    example.dataIWant = someData
    ...
With Python's getattr, I can't access both the attributes directly on outputData and the attributes of example with a single call; an attribute of example needs two calls to getattr.
Edit2: I have found a solution I think works for this, see below
I was able to figure this out - I just wrote a quick function that splits the attribute string (for example outputObj.subObj.propertyIWant) then proceeds down the resultant array, calling getattr on each subobject until it reaches the end of the array and returns the actual attribute.
Code:
def obtainAttribute(sample, attributeString: str):
    # Walk down the dotted path one attribute at a time.
    baseObj = sample
    for name in attributeString.split("."):
        baseObj = getattr(baseObj, name)
    # After the last name, baseObj is the attribute that was asked for.
    return baseObj
sample is the object and attributeString is the dotted path relative to it, for example subObject.attributeYouWant
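The standard library also covers this case: operator.attrgetter accepts dotted names and walks the chain itself. A small sketch using types.SimpleNamespace as a stand-in for the real output classes (the object layout is invented for illustration):

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-in for an output object that holds another object as an attribute.
output = SimpleNamespace(
    topLevel=1,
    example=SimpleNamespace(dataIWant="some data"),
)

for path in ["topLevel", "example.dataIWant"]:
    # attrgetter("a.b") returns a callable that fetches obj.a.b
    print(path, "->", attrgetter(path)(output))
```

Top-level and nested attributes go through the same call, so the list of attribute paths can mix both.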
Assume a simple function:
def example(formula=""):
    ...
Where the argument "formula" should be one of 2400 possible chemical formulas.
Would there be a way to get code completion for choosing one string from a list of 2400 strings?
My first idea was to choose from a list of class attributes.
However, I can not use the dir() method.
These formula strings are often invalid for use as attribute names.
I don't see another method that would list a choice without using attributes of something.
Thanks
Use enums, which PyCharm for instance code completes just nicely, especially if you use type hinting.
import enum
class ChemFormula(enum.Enum):
    Chlorine = "cl2"
    Hydrogen = "h2"
    Water = "h2o"
def example(formula: ChemFormula) -> None:
    ...
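A short usage sketch for the enum approach: the member names must be valid identifiers, but the values can be arbitrary formula strings, and a member can be looked up from its string value. The members repeat the three from the answer; the body of example is a placeholder:

```python
import enum


class ChemFormula(enum.Enum):
    Chlorine = "cl2"
    Hydrogen = "h2"
    Water = "h2o"


def example(formula: ChemFormula) -> str:
    # Placeholder body: just hand back the underlying formula string.
    return formula.value


print(example(ChemFormula.Water))                 # h2o
print(ChemFormula("h2") is ChemFormula.Hydrogen)  # True
print([member.value for member in ChemFormula])   # ['cl2', 'h2', 'h2o']
```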
Are you trying to get a random string from a list of strings?
from random import choice
formulas = ['f1', 'f2']
formula = choice(formulas)
example(formula = formula)
If not, then simply indexing (formulas[int]) should do the trick.
I am new to chemical network modeling. I am currently converting a previous student's Python code to work with the new version of Cantera used in the lab.
First, a gas mixture is defined from a predefined mechanism:
gas_mix = ct.import_phases(mech,['gas'])
Then I want to get the number of species using Cantera's nSpecies:
nsp = gas_mix.nSpecies()
and I get the error message:
AttributeError: 'list' object has no attribute 'nSpecies'
Also I tried:
nsp = gas_mix.n_species
and it also shows
AttributeError: 'list' object has no attribute
Could you please help me with this?
Thank you and best regards,
YouBe
It looks like import_phases returns a list of objects--either a list of "gas mix" or just "gas" objects. I'm not really sure because this is very specific to the program you're working with.
Anyway, try looping over the values in the gas_mix and see if you can call the nSpecies() method or access the n_species attribute:
gas_mix = ct.import_phases(mech,['gas'])
for gm in gas_mix:
    print(gm.nSpecies())
    # or you can try this:
    print(gm.n_species)
Maybe that will get you closer to what you want.
The function import_phases returns a list, which is useful for the case where you want to import multiple phase definitions from the same file, e.g.
mixtures = ct.import_phases(mech, ['gas1', 'gas2'])
where both mixtures[0] and mixtures[1] will then be a single phase definition. If you only want to define a single phase, it is easier to write:
gas_mix = ct.Solution(mech,'gas')
Or, if the mechanism file only contains one phase definition, just
gas_mix = ct.Solution(mech)
From here, you should be able to access the number of species as
gas_mix.n_species
Many of the details of migrating from the old to new Cantera interfaces are described in the documentation page "Migrating from the Old Python Module".
I have a whole series of arrays with similar names mcmcdata.rho0, mcmcdata.rho1, ... and I want to be able to loop through them while updating their values. I can't figure out how this might be done or even what such a thing might be called.
I read my data in from file like this:
names1='l b rho0 rho1 rho2 rho3 rho4 rho5 rho6 rho7 rho8 rho9 rho10 rho11 rho12 rho13 rho14 rho15 rho16 rho17 rho18 rho19 rho20 rho21 rho22 rho23'.split()
mcmcdata=np.genfromtxt(filename,names=names1,dtype=None).view(np.recarray)
and I want to update the "rho" arrays later on after I do some calculations.
for jj in range(dbins):
    mcmc_x, mcmc_y, mcmc_z = wf.lbd_to_xyz(mcmcdata.l,mcmcdata.b,d[jj],R_sun)
    rho, thindisk, thickdisk, halo = wf.total_density_fithRthinhRthickhzthinhzthickhrfRiA( mcmc_x, mcmc_y, mcmc_z, R_sun,params)
    eval("mcmcdata."+names1[2+jj]) = copy.deepcopy(rho)
    eval("mcmcthin."+names1[2+jj]) = copy.deepcopy(thindisk)
    eval("mcmcthick."+names1[2+jj]) = copy.deepcopy(thickdisk)
    eval("mcmchalo."+names1[2+jj]) = copy.deepcopy(halo)
But the eval command is giving an error:
File "<ipython-input-133-30322c5e633d>", line 13
eval("mcmcdata."+names1[2+jj]) = copy.deepcopy(rho)
SyntaxError: can't assign to function call
How can I loop through my existing arrays and update their values?
or
How can I identify the arrays by name so I can update them?
The eval command doesn't work the way you seem to think it does. You appear to be using it like a text-replacement macro, hoping that Python will read the given string and then pretend you wrote that text in the original source code. Instead, it receives a string, and then it executes that code. You're giving it an expression that refers to an attribute of an object, which is fine, but the result of evaluating that expression does not yield a thing you can assign to. It yields the value of that attribute.
Although Python provides eval, it also provides many other things that often obviate the need for eval. In the case of your code, Python provides setattr. You give it an object, the name of an attribute on that object, and a value, and it assigns that object's attribute to refer to the given value.
setattr(mcmcdata, names1[2+jj], copy.deepcopy(rho))
It might make the code more readable to get rid of the names1 portion, too. I might write the code like this:
setattr(mcmcdata, 'rho' + str(jj), copy.deepcopy(rho))
That way, it's clear that I'm assigning the rho-related attributes of the object without having to go look at what's held in the names1 list; the name names1 doesn't offer much information about what's in it.
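A self-contained sketch of the same idea on a tiny record array (the field names and values are made up; the real data comes from genfromtxt). It assumes NumPy is installed, and it also shows field-name indexing, which updates a field by name without setattr:

```python
import numpy as np

# Tiny stand-in for mcmcdata: three rows with fields l, rho0 and rho1.
data = np.zeros(3, dtype=[("l", float), ("rho0", float), ("rho1", float)])
data = data.view(np.recarray)

for jj in range(2):
    name = "rho" + str(jj)
    new_values = np.full(3, jj + 1.0)  # placeholder for the computed density
    setattr(data, name, new_values)    # same effect as data.rho0 = ..., by name

# Field-name indexing is an alternative that also works on plain structured arrays.
data["rho1"] = data["rho1"] * 10

print(data.rho0)  # [1. 1. 1.]
print(data.rho1)  # [20. 20. 20.]
```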