I know that sounds generic, so I will try my best to explain the case.
I am getting some data back from a process. I would like to organize this data in a format that I can interact with and access.
So far I have thought of making a simple class, which is empty, and at creation time takes **kwargs and cycles through them, adding them to the class.
I am not sure whether this is the correct way to do so, though. Imagine the following data:
dict1 = {'param1': 'a', 'param2': 'b', 'param3': 'c', 'param4': 'd'}  # Operation1
dict2 = {'param2': 't', 'param1': 2, 'param3': 'r'}  # Operation2
dict3 = {'param1': 1, 'param7': 2, 'param2': 4, 'param4': 'b', 'param3': 'm'}  # Operation3
I would like to make a class that, when created, takes the parameters and turns each one into an attribute, with the attribute name taken from the parameter name and the attribute value from that parameter's value:
myclass1 = MyClass(dict1)
myclass1.param1 returns 'a', myclass1.param2 returns 'b', and so on.
But if I want to build an instance using dict2, I can also do so:
myclass2 = MyClass(dict2)
myclass2.param2 returns 't', myclass2.param1 returns 2, and so on.
In this way each parameter is stored under its name and can be retrieved later, and at the same time I do not have to worry about how many elements my data has, since the class will always take however many parameters there are and create attributes named after the keys in the dictionary.
Is it possible to achieve this in a simple way in Python? I could use a dictionary inside a dictionary, but that feels utterly complicated for big data sets.
You can do something like:
In [2]: dict1 = {'param1': 'a', 'param2': 'b', 'param3': 'c', 'param4': 'd'}

In [3]: class A(object):
   ...:     def __init__(self, params):
   ...:         for k, v in params.items():  # use iteritems() on Python 2
   ...:             setattr(self, k, v)
   ...:

In [4]: a = A(dict1)

In [5]: a.param1
Out[5]: 'a'

In [6]: a.param2
Out[6]: 'b'
Related
What is the difference between the two class definitions below?
class my_dict1(dict):
    def __init__(self, data):
        self = data.copy()
        self.N = sum(self.values())
The above code results in AttributeError: 'dict' object has no attribute 'N', while the code below runs fine:
class my_dict2(dict):
    def __init__(self, data):
        for k, v in data.items():
            self[k] = v
        self.N = sum(self.values())
For example,
d = {'a': 3, 'b': 5}
a = my_dict1(d)  # results in AttributeError
b = my_dict2(d)  # works fine
By assigning to self itself you bind the name to a completely different instance than the one you were originally dealing with, making it no longer the "self". This instance will be of the broader type dict (because data.copy() returns a dict), not of the narrower type my_dict1. You would need to write self["N"] in the first example for it to run without error, but note that even with this, in something like:
abc = my_dict1({})
abc will still not have the key "N", because a completely different instance in __init__ was given a value for the key "N". This shows you that there's no reasonable scenario where you want to assign self itself to something else.
Regarding my_dict2, prefer composition over inheritance if you want to use a particular dict as a representation of your domain. This means holding the data as an instance field. See the related C# question Why not inherit from List?; the core answer is still the same. It comes down to whether you want to extend the dict mechanism vs. having a business object based on it.
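For instance, a minimal composition-based sketch (the class and field names here are mine, for illustration only):

```python
class Totals:
    def __init__(self, data):
        self.data = dict(data)           # the dict is a field, not a base class
        self.N = sum(self.data.values())

    def __getitem__(self, key):          # expose only the dict behavior you need
        return self.data[key]

t = Totals({'a': 3, 'b': 5})
print(t.N)     # 8
print(t['a'])  # 3
```

The business object controls exactly which dict operations it exposes, instead of inheriting all of them.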
I'm trying to figure out how Pandas manages to create new object members on the fly. For example, if you do this:
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
You can immediately do this:
df.col1
and get the contents of col1. How does Pandas create the col1 member on the fly?
Thanks.
Relevant code in the repository that checks for a dictionary input:
class DataFrame(NDFrame):
    def __init__(self, data=None, index=None, columns=None, dtype=None,
                 copy=False):
        if data is None:
            data = {}
        # Some other if-statements that check data types...
        elif isinstance(data, dict):
            mgr = self._init_dict(data, index, columns, dtype=dtype)
Which uses the _init_dict method:
def _init_dict(self, data, index, columns, dtype=None):
    if columns is not None:
        ...  # Does some stuff - but this isn't your case
    else:
        keys = list(data.keys())
        if not isinstance(data, OrderedDict):
            # So this part is trying to sort the keys/cols into alphabetical
            # order; the _try_sort function is simple, lives in pandas.core.common
            keys = _try_sort(keys)
        columns = data_names = Index(keys)
So the real work comes from the Index class in pandas.core.indexes.base. From there things start to get really complicated (and my understanding of what it means to explain the "how" of anything, without regressing all the way down to machine code, started to melt away), but it's safe to say that if you give the pandas.Index class a one-dimensional array of data, it will create an object with a sliceable set of entries and an associated data type.
Which is exactly what you're observing: you essentially fed it a bunch of keys, and pandas understood that it needed to give you something back that you could access as an index (since df.col1 is just syntactic sugar for df['col1']), that you could slice (df[0:1]), and that knew its own data types.
And, of course, after asking the question, I found the answer myself.
It turns out you can use __getattr__ to achieve this. The easiest way (and the one I happen to want) is to use a dictionary, then have __getattr__ return values from the dictionary, like so:
class test():
    def __init__(self):
        # super().__init__()
        self.adict = {'spam': 'eggs'}

    def __getattr__(self, attr):
        return self.adict[attr]

nt = test()
print(nt.spam)
__getattr__ is called when an attribute isn't found through normal lookup, as is the case here: the interpreter can't find the spam attribute, so it defers to __getattr__. Things to keep in mind:
If the key doesn't exist in the dictionary, this will raise a KeyError, not an AttributeError.
Don't use __getattribute__, because it is called on every attribute access, which will mess up your entire class.
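If the KeyError bothers you (for example, it makes hasattr misbehave), one option is to translate it inside __getattr__; a sketch:

```python
class test():
    def __init__(self):
        self.adict = {'spam': 'eggs'}

    def __getattr__(self, attr):
        try:
            return self.adict[attr]
        except KeyError:
            # Re-raise as AttributeError so hasattr() and friends behave normally
            raise AttributeError(attr) from None

nt = test()
print(nt.spam)             # eggs
print(hasattr(nt, 'ham'))  # False, instead of a KeyError escaping
```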
Thanks to everyone for the input on this.
My goal here is to be able to create nested dictionaries that have attributes that hold lists of values. For example, I want to be able to do something like this:
mydict['Person 1']['height'].vals = [23, 25, 32]
mydict['Person 2']['weight'].vals = [100, 105, 110]
mydict['Person 2']['weight'].excel_locs = ['A1', 'A2', 'A3']
So, for each "person" I can keep track of multiple things I might have data on, such as height and weight. The attribute I'm calling 'vals' is just a list of values for heights or weights. Importantly, I want to be able to keep track of things like where the raw data came from, such as its location in an Excel spreadsheet.
Here's what I am currently working off of:
import collections
class Vals:
    def add(self, list_of_vals=[], attr_name=[]):
        setattr(self, attr_name, list_of_vals)

    def __str__(self):
        return str(self.__dict__)
mydict = collections.defaultdict(Vals)
So, I want to be able to add new keys as needed, such as mydict['Person 10']['test scores'], and then create a new attribute such as "vals" if it doesn't exist, but also append new values to it if it does.
Example of what I want to achieve:
mydict['Person 10']['test scores'].add([10, 20, 30], 'vals')
Which should allow mydict['Person 10']['test scores'].vals to return [10, 20, 30].
But then I also want to be able to append to this list later on if needed, such that using .add again appends to the existing list. For example, mydict['Person 10']['test scores'].add([1, 2, 3], 'vals') should then let me get [10, 20, 30, 1, 2, 3] back from mydict['Person 10']['test scores'].vals.
I'm still very much getting used to object oriented programming, classes, etc. I am very open to better strategies that might exist for achieving my goal, which is just a nested dictionary structure which I find convenient for holding data.
If we just modify the Vals class above, it needs a way to determine whether an attribute already exists: if not, create it and populate it with list_of_vals; otherwise, append to the existing list.
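For instance, a sketch of the add behavior I'm imagining, using hasattr, plus a nested defaultdict so both key levels auto-create:

```python
import collections

class Vals:
    def add(self, list_of_vals=None, attr_name='vals'):
        list_of_vals = list_of_vals or []
        if hasattr(self, attr_name):
            getattr(self, attr_name).extend(list_of_vals)  # append to existing list
        else:
            setattr(self, attr_name, list(list_of_vals))   # create it on first use

    def __str__(self):
        return str(self.__dict__)

# Two-level defaultdict so mydict['Person 10']['test scores'] springs into existence
mydict = collections.defaultdict(lambda: collections.defaultdict(Vals))

mydict['Person 10']['test scores'].add([10, 20, 30], 'vals')
mydict['Person 10']['test scores'].add([1, 2, 3], 'vals')
print(mydict['Person 10']['test scores'].vals)  # [10, 20, 30, 1, 2, 3]
```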
Thanks!
From what I understand, you want something that can conveniently hold data. I would actually build a class instead of a nested dictionary, because this makes it easier to see how everything works together (and it also helps organize everything!).
class Person(object):
    """__init__() functions as the class constructor"""
    def __init__(self, name=None, testScores=None):
        self.name = name
        self.testScores = testScores

# make a list of Person instances
personList = []
personList.append(Person("Person 1", [10, 25, 32]))
personList.append(Person("Person 2", [22, 37, 45]))

print("Show one particular item:")
print(personList[0].testScores)

personList[0].testScores.append(50)
print(personList[0].testScores)
print(personList[1].name)
Basically, the Person class is what holds all of the data for an instance of it. If you want to add different types of data, you would add a parameter to the __init__() function like this:
def __init__(self, name=None, testScores=None, weight=None):
    self.name = name
    self.testScores = testScores
    self.weight = weight
You can edit the values just like you would a variable.
If this isn't what you are looking for, or you are confused, I am willing to try to help you more.
I agree that using a Person class is a better solution here. It's a more abstract and intuitive way to represent the concept, which will make your code easier to work with.
Check this out:
class Person():
    # Define a custom method for retrieving attributes
    def __getattr__(self, attr_name):
        # If the attribute exists, setdefault will return it.
        # If it doesn't yet exist, it will set it to an empty
        # dictionary, and then return it.
        return self.__dict__.setdefault(attr_name, {})
carolyn = Person()
carolyn.name["value"] = "Carolyn"
carolyn.name["excel_loc"] = "A1"
print(carolyn.name)
# {"value": "Carolyn", "excel_loc": "A1"}
maria = Person()
print(maria.name)
# {}
Then collecting people into a dictionary is easy:
people = {
"carolyn": carolyn,
"maria": maria
}
people["Ralph"] = Person()
people["Ralph"].name["value"] = "Ralph"
You've also made a tricky mistake in defining the add method:
def add(self, list_of_vals=[], attr_name=[]):
...
In Python, you never want to use an empty list as a default argument value. Default values are evaluated once, when the function is defined, so the default will reference the same list on every call instead of creating a new, empty list each time.
Here's a common workaround:
def add(self, list_of_vals=None, attr_name=None):
    list_of_vals = list_of_vals or []
    attr_name = attr_name or []
    ...
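One caveat with the `or []` idiom: it also replaces any falsy value the caller passes explicitly (such as an empty list they intend to mutate later). The stricter idiom, shown here as a standalone sketch, compares against None instead:

```python
def add(list_of_vals=None):
    # Build the fresh list inside the body, so each call gets its own
    if list_of_vals is None:
        list_of_vals = []
    list_of_vals.append(1)
    return list_of_vals

print(add())  # [1]
print(add())  # [1] again -- calls don't share state
```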
I have a class
class MyClass():
    def __init__(self):
        self.a = 7
        self.b = 2

    @property
    def aAndB(self):
        return self.a + self.b
I would like a function that iterates over all properties and returns only class instances having a certain property.
My goal is a function like this:
def findInstances(listOfInstances, instanceVariable, instanceValue):
    # return all instances in listOfInstances where instanceVariable == instanceValue
Using instance.__dict__ only gives me a and b, but not aAndB. I would like to have a dict of all attributes/methods with their values to loop over, so I can search for instances where a certain attribute (or method decorated with @property) has a certain value.
Currently, calling the function like this
findInstances(someListOfInstances, 'aAndB', '23')
makes Python complain that aAndB is not in instance.__dict__.
Maybe all of you are right and the answers are out there, but I still don't get it. All the answers in the mentioned questions produce lists, not dictionaries. I want all the properties (including methods with the @property decorator) and their values. Is there a way to iterate over the values of the names in dir(myClass)? The dir command only gives the names of the attributes, not their values.
I need something like
for a in dir(myClass):
    print(a, myClass.(a))  # pseudocode: get the value for an attribute whose name is stored in a variable
To be even more clear: The following achieves exactly what I want but there is probably a better way that I don't know.
for a in dir(myClass):
    print(a, eval("myClass.{}".format(a)))
There's actually a very simple way to do this, using getattr in place of eval:
myClass = MyClass()
for a in dir(myClass):
    if not a.startswith("__"):  # don't print double-underscore methods
        print(a, getattr(myClass, a))
Output:
a 7
aAndB 9
b 2
This has the very big advantage of not needing to hard-code the name of your instance into a string, as is required with eval("myClass.{}".format(a)).
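Building on that, the findInstances function from the question could be a one-liner over getattr; this is my sketch, not code from the question (using a default of None so missing attributes simply don't match):

```python
class MyClass():
    def __init__(self):
        self.a = 7
        self.b = 2

    @property
    def aAndB(self):
        return self.a + self.b

def findInstances(listOfInstances, instanceVariable, instanceValue):
    # getattr works for plain attributes and @property methods alike
    return [inst for inst in listOfInstances
            if getattr(inst, instanceVariable, None) == instanceValue]

instances = [MyClass(), MyClass()]
instances[0].a = 21  # so aAndB == 23 for this one only
matches = findInstances(instances, 'aAndB', 23)
print(len(matches))  # 1
```

Note that the value must compare equal with ==, so searching for the string '23' would not match the integer 23.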
In Pandas, I've been using custom objects as column labels because they provide rich/flexible functionality for info/methods specific to the column. For example, you can set a custom fmt_fn to format each column (note this is just an example, my actual column label objects are more complex):
In [100]: class Col:
     ...:     def __init__(self, name, fmt_fn):
     ...:         self.name = name
     ...:         self.fmt_fn = fmt_fn
     ...:     def __str__(self):
     ...:         return self.name
     ...:

In [101]: sec_col = Col('time', lambda val: str(timedelta(seconds=val)).split('.')[0])

In [102]: dollar_col = Col('money', lambda val: '${:.2f}'.format(val))

In [103]: foo = pd.DataFrame(np.random.random((3, 2)) * 1000, columns=[sec_col, dollar_col])
In [104]: print(foo) # ugly
time money
0 773.181402 720.997051
1 33.779925 317.957813
2 590.750129 416.293245
In [105]: print(foo.to_string(formatters = [col.fmt_fn for col in foo.columns])) # pretty
time money
0 0:12:53 $721.00
1 0:00:33 $317.96
2 0:09:50 $416.29
Okay, so I've been happily doing this for a while, but I recently came across one part of Pandas that doesn't support it. Specifically, the to_hdf/read_hdf methods fail on DataFrames with custom column labels. This is not a dealbreaker for me; I can use pickle instead of HDF5 at the loss of some efficiency.
But the bigger question is, does Pandas in general support custom objects as column labels? In other words, should I continue to use Pandas this way, or will this break in other parts of Pandas (besides HDF5) in the future, causing me future pain?
PS. As a side note, I wouldn't mind if you also chime in on how you solve the problem of column-specific info such as the fmt_fn in the example above, if you're not currently using custom objects as column labels.
Fine-grained control of formatting of a DataFrame isn't really possible right now. E.g., see here or here for some discussion of possibilities. I'm sure a well thought out API (and PR!) would be well received.
In terms of using custom objects as columns, the two biggest issues are probably serialization, and indexing semantics (e.g. can no longer do df['time']).
One possible work-around would be to wrap your DataFrame in some kind of pretty-print structure, like this:
In [174]: class PrettyDF(object):
     ...:     def __init__(self, data, formatters):
     ...:         self.data = data
     ...:         self.formatters = formatters
     ...:     def __str__(self):
     ...:         return self.data.to_string(formatters=self.formatters)
     ...:     def __repr__(self):
     ...:         return self.__str__()
In [172]: foo = PrettyDF(df,
formatters={'money': '${:.2f}'.format,
'time': lambda val: str(timedelta(seconds=val)).split('.')[0]})
In [178]: foo
Out[178]:
time money
0 0:13:17 $399.29
1 0:08:48 $122.44
2 0:07:42 $491.72
In [180]: foo.data['time']
Out[180]:
0 797.699511
1 528.155876
2 462.999224
Name: time, dtype: float64
It's been five years since this was posted, so I hope this is still helpful to someone. I've managed to build an object that holds metadata for a pandas DataFrame column but is still accessible as a regular column (or so it seems to me). The code below is just the part of the whole class that involves this.
__repr__ presents the name of the object if the DataFrame is printed instead of the object.
__eq__ checks the requested name against the object's own name, and __hash__ is used in the same process: column names need to be hashable, because lookup works similarly to a dictionary.
That's probably not the Pythonic way of describing it, but that seems to me to be the way it works.
class ColumnDescriptor:
    def __init__(self, name, **kwargs):
        self.name = name
        for n, v in kwargs.items():
            setattr(self, n, v)

    def __repr__(self): return self.name
    def __str__(self): return self.name
    def __eq__(self, other): return self.name == other
    def __hash__(self): return hash(self.name)
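As a quick sanity check of the __eq__/__hash__ pairing, such a descriptor can be looked up by its plain-string name even in an ordinary dict, which is the same mechanism a pandas Index relies on. A sketch (class repeated so the snippet runs standalone; the metadata fields unit and excel_loc are hypothetical examples):

```python
class ColumnDescriptor:
    def __init__(self, name, **kwargs):
        self.name = name
        for n, v in kwargs.items():
            setattr(self, n, v)

    def __repr__(self): return self.name
    def __str__(self): return self.name
    def __eq__(self, other): return self.name == other
    def __hash__(self): return hash(self.name)

col = ColumnDescriptor('time', unit='s', excel_loc='A1')
d = {col: [1.5, 2.5]}
print(d['time'])  # [1.5, 2.5] -- a plain string finds the descriptor key
print(col.unit)   # 's' -- the metadata rides along with the label
```

Because the descriptor hashes and compares equal to its name string, d['time'] and d[col] hit the same entry.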