Python recursive setattr()-like function for working with nested dictionaries [duplicate]

Python recursive setattr()-like function for working with nested dictionaries [duplicate] - python

This question already has answers here:
Is it possible to index nested lists using tuples in python?
(7 answers)
Closed 7 months ago.
There are a lot of good getattr()-like functions for parsing nested dictionary structures, such as:
Finding a key recursively in a dictionary
Suppose I have a python dictionary , many nests
https://gist.github.com/mittenchops/5664038
I would like to make a parallel setattr(). Essentially, given:
cmd = 'f[0].a'
val = 'whatever'
x = {"a":"stuff"}
I'd like to produce a function such that I can assign:
x['f'][0]['a'] = val
More or less, this would work the same way as:
setattr(x,'f[0].a',val)
to yield:
>>> x
{"a":"stuff","f":[{"a":"whatever"}]}
I'm currently calling it setByDot():
setByDot(x,'f[0].a',val)
One problem with this is that if a key in the middle doesn't exist, you need to check for and make an intermediate key if it doesn't exist---ie, for the above:
>>> x = {"a":"stuff"}
>>> x['f'][0]['a'] = val
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'f'
So, you first have to make:
>>> x['f']=[{}]
>>> x
{'a': 'stuff', 'f': [{}]}
>>> x['f'][0]['a']=val
>>> x
{'a': 'stuff', 'f': [{'a': 'whatever'}]}
Another is that keying for when the next item is a lists will be different than the keying when the next item is a string, ie:
>>> x = {"a":"stuff"}
>>> x['f']=['']
>>> x['f'][0]['a']=val
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
...fails because the assignment was for a null string instead of a null dict. The null dict will be the right assignment for every non-list in dict until the very last one---which may be a list, or a value.
A second problem, pointed out in the comments below by #TokenMacGuy, is that when you have to create a list that does not exist, you may have to create an awful lot of blank values. So,
setattr(x,'f[10].a',val)
---may mean the algorithm will have to make an intermediate like:
>>> x['f']=[{},{},{},{},{},{},{},{},{},{},{}]
>>> x['f'][10]['a']=val
to yield
>>> x
{"a":"stuff","f":[{},{},{},{},{},{},{},{},{},{},{"a":"whatever"}]}
such that this is the setter associated with the getter...
>>> getByDot(x,"f[10].a")
"whatever"
More importantly, the intermediates should /not/ overwrite values that already exist.
Below is the junky idea I have so far---I can identify the lists versus dicts and other data types, and create them where they do not exist. However, I don't see (a) where to put the recursive call, or (b) how to 'build' the deep object as I iterate through the list, and (c) how to distinguish the /probing/ I'm doing as I construct the deep object from the /setting/ I have to do when I reach the end of the stack.
def setByDot(obj,ref,newval):
ref = ref.replace("[",".[")
cmd = ref.split('.')
numkeys = len(cmd)
count = 0
for c in cmd:
count = count+1
while count < numkeys:
if c.find("["):
idstart = c.find("[")
numend = c.find("]")
try:
deep = obj[int(idstart+1:numend-1)]
except:
obj[int(idstart+1:numend-1)] = []
deep = obj[int(idstart+1:numend-1)]
else:
try:
deep = obj[c]
except:
if obj[c] isinstance(dict):
obj[c] = {}
else:
obj[c] = ''
deep = obj[c]
setByDot(deep,c,newval)
This seems very tricky because you kind of have to look-ahead to check the type of the /next/ object if you're making place-holders, and you have to look-behind to build a path up as you go.
UPDATE
I recently had this question answered, too, which might be relevant or helpful.

I have separated this out into two steps. In the first step, the query string is broken down into a series of instructions. This way the problem is decoupled, we can view the instructions before running them, and there is no need for recursive calls.
def build_instructions(obj, q):
"""
Breaks down a query string into a series of actionable instructions.
Each instruction is a (_type, arg) tuple.
arg -- The key used for the __getitem__ or __setitem__ call on
the current object.
_type -- Used to determine the data type for the value of
obj.__getitem__(arg)
If a key/index is missing, _type is used to initialize an empty value.
In this way _type provides the ability to
"""
arg = []
_type = None
instructions = []
for i, ch in enumerate(q):
if ch == "[":
# Begin list query
if _type is not None:
arg = "".join(arg)
if _type == list and arg.isalpha():
_type = dict
instructions.append((_type, arg))
_type, arg = None, []
_type = list
elif ch == ".":
# Begin dict query
if _type is not None:
arg = "".join(arg)
if _type == list and arg.isalpha():
_type = dict
instructions.append((_type, arg))
_type, arg = None, []
_type = dict
elif ch.isalnum():
if i == 0:
# Query begins with alphanum, assume dict access
_type = type(obj)
# Fill out args
arg.append(ch)
else:
TypeError("Unrecognized character: {}".format(ch))
if _type is not None:
# Finish up last query
instructions.append((_type, "".join(arg)))
return instructions
For your example
>>> x = {"a": "stuff"}
>>> print(build_instructions(x, "f[0].a"))
[(<type 'dict'>, 'f'), (<type 'list'>, '0'), (<type 'dict'>, 'a')]
The expected return value is simply the _type (first item) of the next tuple in the instructions. This is very important because it allows us to correctly initialize/reconstruct missing keys.
This means that our first instruction operates on a dict, either sets or gets the key 'f', and is expected to return a list. Similarly, our second instruction operates on a list, either sets or gets the index 0 and is expected to return a dict.
Now let's create our _setattr function. This gets the proper instructions and goes through them, creating key-value pairs as necessary. Finally, it also sets the val we give it.
def _setattr(obj, query, val):
"""
This is a special setattr function that will take in a string query,
interpret it, add the appropriate data structure to obj, and set val.
We only define two actions that are available in our query string:
.x -- dict.__setitem__(x, ...)
[x] -- list.__setitem__(x, ...) OR dict.__setitem__(x, ...)
the calling context determines how this is interpreted.
"""
instructions = build_instructions(obj, query)
for i, (_, arg) in enumerate(instructions[:-1]):
_type = instructions[i + 1][0]
obj = _set(obj, _type, arg)
_type, arg = instructions[-1]
_set(obj, _type, arg, val)
def _set(obj, _type, arg, val=None):
"""
Helper function for calling obj.__setitem__(arg, val or _type()).
"""
if val is not None:
# Time to set our value
_type = type(val)
if isinstance(obj, dict):
if arg not in obj:
# If key isn't in obj, initialize it with _type()
# or set it with val
obj[arg] = (_type() if val is None else val)
obj = obj[arg]
elif isinstance(obj, list):
n = len(obj)
arg = int(arg)
if n > arg:
obj[arg] = (_type() if val is None else val)
else:
# Need to amplify our list, initialize empty values with _type()
obj.extend([_type() for x in range(arg - n + 1)])
obj = obj[arg]
return obj
And just because we can, here's a _getattr function.
def _getattr(obj, query):
"""
Very similar to _setattr. Instead of setting attributes they will be
returned. As expected, an error will be raised if a __getitem__ call
fails.
"""
instructions = build_instructions(obj, query)
for i, (_, arg) in enumerate(instructions[:-1]):
_type = instructions[i + 1][0]
obj = _get(obj, _type, arg)
_type, arg = instructions[-1]
return _get(obj, _type, arg)
def _get(obj, _type, arg):
"""
Helper function for calling obj.__getitem__(arg).
"""
if isinstance(obj, dict):
obj = obj[arg]
elif isinstance(obj, list):
arg = int(arg)
obj = obj[arg]
return obj
In action:
>>> x = {"a": "stuff"}
>>> _setattr(x, "f[0].a", "test")
>>> print x
{'a': 'stuff', 'f': [{'a': 'test'}]}
>>> print _getattr(x, "f[0].a")
"test"
>>> x = ["one", "two"]
>>> _setattr(x, "3[0].a", "test")
>>> print x
['one', 'two', [], [{'a': 'test'}]]
>>> print _getattr(x, "3[0].a")
"test"
Now for some cool stuff. Unlike python, our _setattr function can set unhashable dict keys.
x = []
_setattr(x, "1.4", "asdf")
print x
[{}, {'4': 'asdf'}] # A list, which isn't hashable
>>> y = {"a": "stuff"}
>>> _setattr(y, "f[1.4]", "test") # We're indexing f with 1.4, which is a list!
>>> print y
{'a': 'stuff', 'f': [{}, {'4': 'test'}]}
>>> print _getattr(y, "f[1.4]") # Works for _getattr too
"test"
We aren't really using unhashable dict keys, but it looks like we are in our query language so who cares, right!
Finally, you can run multiple _setattr calls on the same object, just give it a try yourself.

>>> class D(dict):
... def __missing__(self, k):
... ret = self[k] = D()
... return ret
...
>>> x=D()
>>> x['f'][0]['a'] = 'whatever'
>>> x
{'f': {0: {'a': 'whatever'}}}

You can hack something together by fixing two problems:
List that automatically grows when accessed out of bounds (PaddedList)
A way to delay the decision of what to create (list of dict) until you accessed it by the first time (DictOrList)
So the code will look like this:
import collections
class PaddedList(list):
""" List that grows automatically up to the max index ever passed"""
def __init__(self, padding):
self.padding = padding
def __getitem__(self, key):
if isinstance(key, int) and len(self) <= key:
self.extend(self.padding() for i in xrange(key + 1 - len(self)))
return super(PaddedList, self).__getitem__(key)
class DictOrList(object):
""" Object proxy that delays the decision of being a List or Dict """
def __init__(self, parent):
self.parent = parent
def __getitem__(self, key):
# Type of the structure depends on the type of the key
if isinstance(key, int):
obj = PaddedList(MyDict)
else:
obj = MyDict()
# Update parent references with the selected object
parent_seq = (self.parent if isinstance(self.parent, dict)
else xrange(len(self.parent)))
for i in parent_seq:
if self == parent_seq[i]:
parent_seq[i] = obj
break
return obj[key]
class MyDict(collections.defaultdict):
def __missing__(self, key):
ret = self[key] = DictOrList(self)
return ret
def pprint_mydict(d):
""" Helper to print MyDict as dicts """
print d.__str__().replace('defaultdict(None, {', '{').replace('})', '}')
x = MyDict()
x['f'][0]['a'] = 'whatever'
y = MyDict()
y['f'][10]['a'] = 'whatever'
pprint_mydict(x)
pprint_mydict(y)
And the output of x and y will be:
{'f': [{'a': 'whatever'}]}
{'f': [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {'a': 'whatever'}]}
The trick consist on creating a defaultdict of objects that can be either a dict or a list depending how you access it.
So when you have the assigment x['f'][10]['a'] = 'whatever' it will work the following way:
Get X['f']. It wont exist so it will return a DictOrList object for the index 'f'
Get X['f'][10]. DictOrList.getitem will be called with an integer index. The DictOrList object will replace itself in the parent collection by a PaddedList
Access the 11th element in the PaddedList will grow it by 11 elements and will return the MyDict element in that position
Assign "whatever" to x['f'][10]['a']
Both PaddedList and DictOrList are bit hacky, but after all the assignments there is no more magic, you have an structure of dicts and lists.

It is possible to synthesize recursively setting items/attributes by overriding __getitem__ to return a return a proxy that can set a value in the original function.
I happen to be working on a library that does a few things similar to this, so I was working on a class that can dynamically assign its own subclasses at instantiation. It makes working with this sort of thing easier, but if that kind of hacking makes you squeamish, you can get similar behavior by creating a ProxyObject similar to the one I create and by creating the individual classes used by the ProxyObject dynamically in the a function. Something like
class ProxyObject(object):
... #see below
def instanciateProxyObjcet(val):
class ProxyClassForVal(ProxyObject,val.__class__):
pass
return ProxyClassForVal(val)
You can use dictionary like I've used in FlexibleObject below would make that implementation significantly more efficient if this is the way you implement it. The code I will providing uses the FlexibleObject though. Right now it only supports classes that, like almost all of Python's builtin classes are capable of being generated by taking an instance of themselves as their sole argument to their __init__/__new__. In the next week or two, I'll add support for anything pickleable, and link to a github repository that contains it. Here's the code:
class FlexibleObject(object):
""" A FlexibleObject is a baseclass for allowing type to be declared
at instantiation rather than in the declaration of the class.
Usage:
class DoubleAppender(FlexibleObject):
def append(self,x):
super(self.__class__,self).append(x)
super(self.__class__,self).append(x)
instance1 = DoubleAppender(list)
instance2 = DoubleAppender(bytearray)
"""
classes = {}
def __new__(cls,supercls,*args,**kws):
if isinstance(supercls,type):
supercls = (supercls,)
else:
supercls = tuple(supercls)
if (cls,supercls) in FlexibleObject.classes:
return FlexibleObject.classes[(cls,supercls)](*args,**kws)
superclsnames = tuple([c.__name__ for c in supercls])
name = '%s%s' % (cls.__name__,superclsnames)
d = dict(cls.__dict__)
d['__class__'] = cls
if cls == FlexibleObject:
d.pop('__new__')
try:
d.pop('__weakref__')
except:
pass
d['__dict__'] = {}
newcls = type(name,supercls,d)
FlexibleObject.classes[(cls,supercls)] = newcls
return newcls(*args,**kws)
Then to use this to use this to synthesize looking up attributes and items of a dictionary-like object you can do something like this:
class ProxyObject(FlexibleObject):
#classmethod
def new(cls,obj,quickrecdict,path,attribute_marker):
self = ProxyObject(obj.__class__,obj)
self.__dict__['reference'] = quickrecdict
self.__dict__['path'] = path
self.__dict__['attr_mark'] = attribute_marker
return self
def __getitem__(self,item):
path = self.__dict__['path'] + [item]
ref = self.__dict__['reference']
return ref[tuple(path)]
def __setitem__(self,item,val):
path = self.__dict__['path'] + [item]
ref = self.__dict__['reference']
ref.dict[tuple(path)] = ProxyObject.new(val,ref,
path,self.__dict__['attr_mark'])
def __getattribute__(self,attr):
if attr == '__dict__':
return object.__getattribute__(self,'__dict__')
path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
ref = self.__dict__['reference']
return ref[tuple(path)]
def __setattr__(self,attr,val):
path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
ref = self.__dict__['reference']
ref.dict[tuple(path)] = ProxyObject.new(val,ref,
path,self.__dict__['attr_mark'])
class UniqueValue(object):
pass
class QuickRecursiveDict(object):
def __init__(self,dictionary={}):
self.dict = dictionary
self.internal_id = UniqueValue()
self.attr_marker = UniqueValue()
def __getitem__(self,item):
if item in self.dict:
val = self.dict[item]
try:
if val.__dict__['path'][0] == self.internal_id:
return val
else:
raise TypeError
except:
return ProxyObject.new(val,self,[self.internal_id,item],
self.attr_marker)
try:
if item[0] == self.internal_id:
return ProxyObject.new(KeyError(),self,list(item),
self.attr_marker)
except TypeError:
pass #Item isn't iterable
return ProxyObject.new(KeyError(),self,[self.internal_id,item],
self.attr_marker)
def __setitem__(self,item,val):
self.dict[item] = val
The particulars of the implementation will vary depending on what you want. It's obviously significantly easier to just override __getitem__ in the proxy than it is to override both __getitem__ and __getattribute__ or __getattr__. The syntax you are using in setbydot makes it look like you would be happiest with some solution that overrides a mixture of the two.
If you are just using the dictionary to compare values, using =,<=,>= etc. Overriding __getattribute__ works really nicely. If you are wanting to do something more sophisticated, you will probably be better off overriding __getattr__ and doing some checks in __setattr__ to determine whether you want to be synthesizing setting the attribute by setting a value in the dictionary or whether you want to be actually setting the attribute on the item you've obtained. Or you might want to handle it so that if your object has an attribute, __getattribute__ returns a proxy to that attribute and __setattr__ always just sets the attribute in the object (in which case, you can completely omit it). All of these things depend on exactly what you are trying to use the dictionary for.
You also may want to create __iter__ and the like. It takes a little bit of effort to make them, but the details should follow from the implementation of __getitem__ and __setitem__.
Finally, I'm going to briefly summarize the behavior of the QuickRecursiveDict in case it's not immediately clear from inspection. The try/excepts are just shorthand for checking to see whether the ifs can be performed. The one major defect of synthesizing the recursive setting rather than find a way to do it is that you can no longer be raising KeyErrors when you try to access a key that hasn't been set. However, you can come pretty close by returning a subclass of KeyError which is what I do in the example. I haven't tested it so I won't add it to the code, but you may want to pass in some human-readable representation of the key to KeyError.
But aside from all that it works rather nicely.
>>> qrd = QuickRecursiveDict
>>> qrd[0][13] # returns an instance of a subclass of KeyError
>>> qrd[0][13] = 9
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] = 'young'
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] # 'young'
>>> qrd[0] # returns an instance of a subclass of KeyError
>>> qrd[0] = 0
>>> qrd[0] # 0
>>> qrd[0][13]['forever'] # 'young'
One more caveat, the things being returned is not quite what it looks like. It's a proxy to what it looks like. If you want the int 9, you need int(qrd[0][13]) not qrd[0][13]. For ints this doesn't matter much since, +,-,= and all that bypass __getattribute__ but for lists, you would lose attributes like append if you didn't recast them. (You'd keep len and other builtin methods, just not attributes of list. You lose __len__.)
So that's it. The code's a little bit convoluted, so let me know if you have any questions. I probably can't answer them until tonight unless the answer's really brief. I wish I saw this question sooner, it's a really cool question, and I'll try to update a cleaner solution soon. I had fun trying to code a solution into the wee hours of last night. :)

Related

Derived AccumulatorParam in py-spark returns EOFException on addInPace function

I have a set of json objects of which one key is a nested object of list of keys, which are not common across all the json object. Example object is:
{
id: 'AJDFAKDF',
class: 'class1',
nested_object: {
'key1': 'value1',
'key2': 'value2',
'key3': 'value3'
}
}
The id is unique identifier of the object and key1, key2... will not be same across all objects i.e. 1 object may have key1 in nested_object and the other may not. Also, class will vary for each record.
I have 100s of millions of such records and I am interested in finding out the distribution of records by the keys of nested_object per class. Sample output would be.
{
class1: {
key1: 32,
key45: 45,
},
class2: {
key2: 39,
key45: 100
}
}
I am using pyspark accumulators to generate this distribution. I have derived from the dict class to create a CounterOfCounters class. This is on lines of the Counter class in collections library of python. Much code is copied from the the source code.
class CounterOfCounters(dict):
def __init__(*args, **kwds):
'''Create a new, empty CounterOfCounters object. Or, initialize
the count from another mapping of elements to their Counter.
>>> c = CounterOfCounters() # a new, empty CoC
>>> c = CounterOfCounters({'a': 4, 'b': 2}) # a new counter from a mapping
'''
if not args:
raise TypeError("descriptor '__init__' of 'Counter' object "
"needs an argument")
self, *args = args
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
super(CounterOfCounters, self).__init__()
self.update(*args, **kwds)
def __missing__(self, key):
# Needed so that self[missing_item] does not raise KeyError
return Counter()
def update(*args, **kwds):
'''Like dict.update() but add counts instead of replacing them.
Source can be a dictionary
'''
# The regular dict.update() operation makes no sense here because the
# replace behavior results in the some of original untouched counts
# being mixed-in with all of the other counts for a mismash that
# doesn't have a straight-forward interpretation in most counting
# contexts. Instead, we implement straight-addition. Both the inputs
# and outputs are allowed to contain zero and negative counts.
# print("In update. [0] ", args)
if not args:
raise TypeError("descriptor 'update' of 'CounterOfCounters' object "
"needs an argument")
self, *args = args
# print("In update. [1] ", self, args)
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
iterable = args[0] if args else None
if iterable is not None:
if isinstance(iterable, dict):
if self:
self_get = self.get
for elem, count in iterable.items():
if isinstance(count, Counter):
self[elem] = count + self_get(elem, Counter())
else:
raise TypeError("values can only be of type Counter")
else:
for elem, count in iterable.items():
if isinstance(count, Counter):
self[elem] = count
else:
raise TypeError("values can only be of type Counter")
else:
raise TypeError("values can only be of type Counter")
if kwds:
raise TypeError("Can not process **kwds as of now")
I then derive from AccumulatorParam my custom AccumulatorParam class.
class KeyAccumulatorParam(AccumulatorParam):
def zero(self, value):
c = CounterOfCounters(
dict.fromkeys(
value.keys(),
Counter()
)
)
return c
def addInPlace(self, ac1, ac2):
ac1.update(ac2)
return ac1
Then I define the accumulator variable and the function to add it.
keyAccum = sc.accumulator(
CounterOfCounters(
dict.fromkeys(
['class1', 'class2', 'class3', 'class4'],
Counter()
)
),
KeyAccumulatorParam()
)
def accumulate_keycounts(rec):
global keyAccum
c = Counter(list(rec['nested_object'].keys()))
keyAccum.add(CounterOfCounters({
rec['class']: c
}))
After which I call accumulate_keycounts foreach record in rdd.
test_rdd.foreach(accumulate_keycounts)
On doing this, I get the end of file exception:
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonRunner$$anon$1$$anonfun$read$1.apply$mcVI$sp(PythonRDD.scala:200)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:199)
... 25 more
Some more additional info. I am using spark 2.2.1 on my MacBook Pro Retina (2013 8GB) for testing my function. The test_rdd has only ~20 records. It is a sample from larger rdd of ~200000 records.
I am not sure why this error is happening. I have enough memory to finish the task. The accumulator is commutative and associative as per the requirements of accumulator.
What I have debugged till now and figured out is that the error happens after first record is processed i.e. first time addInPlace is called. The CounterOfCounters.update function returns normally. But after that the EOFException occurs.
Any tips to debug or pointers to the problem will be much appreciated.

Using overloaded methods for hashing

I have a list of JSON objects (around 30,000) and would like to remove duplicates from them. I consider them a duplicate as long as ModuleCode is the same. Below is an example of one object.
[{"AveragePoints": "4207",
"ModuleTitle": "Tool Engineering",
"Semester": "2",
"ModuleCode": "ME4261",
"StudentAcctType": "P",
"AcadYear": "2013/2014"}]
Planning to do so by hashing, following the example given here. After some experimentation I'm still unsure of how to correctly use the overloaded methods __eq__ and __hash__. Do I create a new class and contain the two methods inside?
Below is my attempt at a solution. It returns NameError: name 'obj' is not defined which I suspect is my incorrect usage of class.
import json
json_data = open('small.json')
data = json.load(json_data)
class Module(obj):
def __eq__(self, other):
return self.ModuleCode == other.ModuleCode
def __hash__(self):
return hash(('ModuleCode', self.ModuleCode))
hashtable = {} #python's dict is implemented as a hashtable
for item in data:
cur = Module(item)
if hashtable[hash(cur)] == item.ModuleCode:
print "duplicate" + item.ModuleCode
else:
hashtable[hash(cur)] = item.ModuleCode
json_data.close()

The problem is that you are referring to obj, which doesn't exist, instead of object. Also, you don't actually define Module.__init__, so never initialise the ModuleCode attribute. Here is one way you could do it:
class Module(object):
def __init__(self, ModuleCode, **data):
self.ModuleCode = ModuleCode
self.data = data
def __eq__(self, other):
return self.ModuleCode == other.ModuleCode
def __hash__(self):
return hash(('ModuleCode', self.ModuleCode))
Then when you create the instance:
cur = Module(**item)
(If the syntax is unfamiliar, see e.g. What does ** (double star) and * (star) do for parameters?)
Also, note that you can use a set rather than a dict for removing duplicates; storing the ModuleCode as the value is duplicating information (as that's the whole point of implementing __hash__ and __eq__):
unique = set()
for item in data:
cur = Module(**item)
if cur in unique:
print "duplicate" + cur.ModuleCode
else:
unique.add(cur)

Inverse of hasattr in Python

hasattr(obj, attribute) is used to check if an object has the specified attribute but given an attribute is there a way to know where (all) it is defined?
Assume that my code is getting the name of an attribute (or a classmethod) as string and I want to invoke classname.attribute but I don't have the classname.
One solution that comes to my mind is this
def finder(attr):
for obj in globals():
try:
if globals()[obj].__dict__[attr]:
return(globals()[obj])
except:
...
usage:
class Lime(object):
#classmethod
def lfunc(self):
print('Classic')
getattr(finder('lfunc'),'lfunc')() #Runs lfunc method of Lime class
I am quite sure that this is not the best (oe even proper way) to do it. Can someone please provide a better way.

It is always "possible". Wether it is desirable is another history.
A quick and dirty way to do it is to iterate linearly over all classes and check if any define the attribute you have. Of course, that is subject to conflicts, and it will yield the first class that has such a named attribute. If it exists in more than one, it is up to you to decide which you want:
def finder(attr):
for cls in object.__subclasses__():
if hasattr(cls, attr):
return cls
raise ValueError
Instead of searching in "globals" this searches all subclasses of "object" - thus the classes to be found don't need to be in the namespace of the module where the finder function is.
If your methods are unique in teh set of classes you are searching, though, maybe you could just assemble a mapping of all methods and use it to call them instead.
Let's suppose all your classes inehrit from a class named "Base":
mapper = {attr_name:getattr(cls, attr_name) for cls in base.__subclasses__() for attr_name, obj in cls.__dict__.items()
if isinstance(obj, classmethod) }
And you call them with mapper['attrname']()
This avoids a linear search at each method call and thus would be much better.
- EDIT -
__subclassess__ just find the direct subclasses of a class, not the inheritance tree - so it won't be usefull in "real life" - maybe it is in the specifc case the OP has in its hands.
If one needs to find things across a inheritance tree, one needs to recurse over the each subclass as well.
As for old-style classes: of course this won't work - that is one of the motives for which they are broken by default in new code.
As for non-class attributes: they can only be found inspecting instances anyway - so another method has to be thought of - does not seem to be the concern of the O.P. here.

This might help:
import gc
def checker(checkee, maxdepth = 3):
def onlyDict(ls):
return filter(lambda x: isinstance(x, dict), ls)
collection = []
toBeInspected = {}
tBI = toBeInspected
gc.collect()
for dic in onlyDict(gc.get_referrers(checkee)):
for item, value in dic.iteritems():
if value is checkee:
collection.append(item)
elif item != "checker":
tBI[item] = value
def _auxChecker(checkee, path, collection, checked, current, depth):
if current in checked: return
checked.append(current)
gc.collect()
for dic in onlyDict(gc.get_referents(current)):
for item, value in dic.iteritems():
currentPath = path + "." + item
if value is checkee:
collection.append(currentPath)
else:
try:
_auxChecker(checkee, currentPath, collection,
checked, value, depth + 1)
if depth < maxdepth else None
except TypeError:
continue
checked = []
for item, value in tBI.iteritems():
_auxChecker(checkee, item, collection, checked, value, 1)
return collection
How to use:
referrer = []
class Foo:
pass
noo = Foo()
bar = noo
import xml
import libxml2
import sys
import os
op = os.path
xml.foo = bar
foobar = noo
for x in checker(foobar, 5):
try:
y= eval(x)
referrer.append(x)
except:
continue
del x, y
ps: attributes of the checkee will not be further checked, for recursive or nested references to the checkee itself.

This should work in all circumstances, but still needs a lot of testing:
import inspect
import sys
def finder(attr, classes=None):
result = []
if classes is None:
# get all accessible classes
classes = [obj for name, obj in inspect.getmembers(
sys.modules[__name__])]
for a_class in classes:
if inspect.isclass(a_class):
if hasattr(a_class, attr):
result.append(a_class)
else:
# we check for instance attributes
if hasattr(a_class(), attr):
result.append(a_class)
try:
result += finder(attr, a_class.__subclasses__())
except:
# old style classes (that don't inherit from object) do not
# have __subclasses; not the best solution though
pass
return list(set(result)) # workaround duplicates
def main(attr):
print finder(attr)
return 0
if __name__ == "__main__":
sys.exit(main("some_attr"))

Python dictionary keys(which are class objects) comparison with multiple comparer

I am using custom objects as keys in python dictionary. These objects has some default hash and eq methods defined which are being used in default comparison
But in some function i need to use a different way to compare these objects.
So is there any way to override or pass a new comparer for these key comparison for this specific function only.
Updated: My class has following type of functionality ( here i can not edit hash method ,it will affect a lot at other places)
class test(object):
def __init__(self,name,city):
self.name=name
self.city=city
def __eq__(self,other):
hash_equality= (self.name==other.name)
if(not hash_equality):
#check with lower
return (self.name.lower()==other.name.lower())
def __hash__(self):
return self.name.__hash__()
my_dict={}
a=test("a","city1")
my_dict[a]="obj1"
b=test("a","city2")
print b in my_dict #prints true
c=test("A","city1")
print c in my_dict #prints false
print c in my_dict.keys() #prints true
# my_dict[c] throw error
This is the normal functionality. But in one specific method i want to override/or pass a new custom comparer where the new hash code is like
def __hash__(self):
return self.name.lower().__hash__()
so that c in my_dict returns ture
or my_dict[c] will return "obj1"
Sorry for so many updates.
Like in sorting we can pass custom method as comparer , is there any way to do the same here.

The only way to make this work is to create a copy of your dictionary using the new hash and comparison-function. The reason is that the dictionary needs to rehash every stored key with the new hash-function to make the lookup work as you desire. Since you cannot provide a custom hash-function to a dictionary (it always uses the one of the key-objects), your best bet is probably to wrap your objects in a type that uses your custom hash and comparison-functions.
class WrapKey(object):
__init__(self, wrapee):
self._wrapee = wrapee
__hash__(self):
return self._wrapee.name.lower().__hash__()
__eq__(self, other):
return self._wrapee.name == other._wrapee.name
def func(d):
d_copy = dict((WrapKey(key), value) for key, value in d.iteritems())
# d_copy will now ignore case

Have a look at the comparison methods you can define in an object.
Depending on what you want to do, __cmp__ might also be interesting.

A little hack for this situation:
class test(object):
def __init__(self,name,city,hash_func=None):
self.name=name
self.city=city
self.hash_func = hash_func
def __eq__(self,other):
return self.__hash__()==other.__hash__()
def __hash__(self):
if self.hash_func is None:
return self.name.__hash__()
else:
return self.hash_func(self)
my_dict={}
a=test("a","city1")
my_dict[a]="obj1"
b=test("a","city2")
print b in my_dict #prints true
c=test("A","city1")
print c in my_dict #Prints false
c.hash_func = lambda x: x.name.lower().__hash__()
print c in my_dict #Now it prints true
You can't change the hash stored in the dict, but you can change the hash use for looking up. Of course, this leads to something weird like this
my_dict={}
a=test("a","city1")
my_dict[a]="obj1"
a.hash_func = lambda x: 1
for key in my_dict:
print key in my_dict # False

now I am using custom dict(derived class of dict) which take comparer as parameter and i have overridden the contains and getitems() which checks and give value based on the comparer.

Steps : Implement a custom key class and override hash and equality function.
e.g.
class CustomDictKey(object):
def __init__(self,
param1,
param2):
self._param1 = param1
self._param2 = param2
# overriding hash and equality function does the trick
def __hash__(self):
return hash((self._param1,
self._param2))
def __eq__(self, other):
return ( ( self._param1,
self._param2 ) == ( other._param1,
other._param2) )
def __str__(self):
return "param 1: {0} param 2: {1} ".format(self._param1, self._param2)
main method
if name == 'main':
# create custom key
k1 = CustomDictKey(10,5)
k2 = CustomDictKey (2, 4)
dictionary = {}
#insert elements in dictionary with custom key
dictionary[k1] = 10
dictionary[k2] = 20
# access dictionary values with custom keys and print values
print "key: ", k1, "val :", dictionary[k1]
print "key: ", k2, "val :", dictionary[k2]
Refer the link Using custom class as key in Python dictionary for complete details.

Python: Pickling a dict with some unpicklable items

I have an object gui_project which has an attribute .namespace, which is a namespace dict. (i.e. a dict from strings to objects.)
(This is used in an IDE-like program to let the user define his own object in a Python shell.)
I want to pickle this gui_project, along with the namespace. Problem is, some objects in the namespace (i.e. values of the .namespace dict) are not picklable objects. For example, some of them refer to wxPython widgets.
I'd like to filter out the unpicklable objects, that is, exclude them from the pickled version.
How can I do this?
(One thing I tried is to go one by one on the values and try to pickle them, but some infinite recursion happened, and I need to be safe from that.)
(I do implement a GuiProject.__getstate__ method right now, to get rid of other unpicklable stuff besides namespace.)

I would use the pickler's documented support for persistent object references. Persistent object references are objects that are referenced by the pickle but not stored in the pickle.
http://docs.python.org/library/pickle.html#pickling-and-unpickling-external-objects
ZODB has used this API for years, so it's very stable. When unpickling, you can replace the object references with anything you like. In your case, you would want to replace the object references with markers indicating that the objects could not be pickled.
You could start with something like this (untested):
import cPickle
def persistent_id(obj):
if isinstance(obj, wxObject):
return "filtered:wxObject"
else:
return None
class FilteredObject:
def __init__(self, about):
self.about = about
def __repr__(self):
return 'FilteredObject(%s)' % repr(self.about)
def persistent_load(obj_id):
if obj_id.startswith('filtered:'):
return FilteredObject(obj_id[9:])
else:
raise cPickle.UnpicklingError('Invalid persistent id')
def dump_filtered(obj, file):
p = cPickle.Pickler(file)
p.persistent_id = persistent_id
p.dump(obj)
def load_filtered(file)
u = cPickle.Unpickler(file)
u.persistent_load = persistent_load
return u.load()
Then just call dump_filtered() and load_filtered() instead of pickle.dump() and pickle.load(). wxPython objects will be pickled as persistent IDs, to be replaced with FilteredObjects at unpickling time.
You could make the solution more generic by filtering out objects that are not of the built-in types and have no __getstate__ method.
Update (15 Nov 2010): Here is a way to achieve the same thing with wrapper classes. Using wrapper classes instead of subclasses, it's possible to stay within the documented API.
from cPickle import Pickler, Unpickler, UnpicklingError
class FilteredObject:
def __init__(self, about):
self.about = about
def __repr__(self):
return 'FilteredObject(%s)' % repr(self.about)
class MyPickler(object):
def __init__(self, file, protocol=0):
pickler = Pickler(file, protocol)
pickler.persistent_id = self.persistent_id
self.dump = pickler.dump
self.clear_memo = pickler.clear_memo
def persistent_id(self, obj):
if not hasattr(obj, '__getstate__') and not isinstance(obj,
(basestring, int, long, float, tuple, list, set, dict)):
return "filtered:%s" % type(obj)
else:
return None
class MyUnpickler(object):
def __init__(self, file):
unpickler = Unpickler(file)
unpickler.persistent_load = self.persistent_load
self.load = unpickler.load
self.noload = unpickler.noload
def persistent_load(self, obj_id):
if obj_id.startswith('filtered:'):
return FilteredObject(obj_id[9:])
else:
raise UnpicklingError('Invalid persistent id')
if __name__ == '__main__':
from cStringIO import StringIO
class UnpickleableThing(object):
pass
f = StringIO()
p = MyPickler(f)
p.dump({'a': 1, 'b': UnpickleableThing()})
f.seek(0)
u = MyUnpickler(f)
obj = u.load()
print obj
assert obj['a'] == 1
assert isinstance(obj['b'], FilteredObject)
assert obj['b'].about

This is how I would do this (I did something similar before and it worked):
Write a function that determines whether or not an object is pickleable
Make a list of all the pickleable variables, based on the above function
Make a new dictionary (called D) that stores all the non-pickleable variables
For each variable in D (this only works if you have very similar variables in d)
make a list of strings, where each string is legal python code, such that
when all these strings are executed in order, you get the desired variable
Now, when you unpickle, you get back all the variables that were originally pickleable. For all variables that were not pickleable, you now have a list of strings (legal python code) that when executed in order, gives you the desired variable.
Hope this helps

I ended up coding my own solution to this, using Shane Hathaway's approach.
Here's the code. (Look for CutePickler and CuteUnpickler.) Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.
If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.

One approach would be to inherit from pickle.Pickler, and override the save_dict() method. Copy it from the base class, which reads like this:
def save_dict(self, obj):
write = self.write
if self.bin:
write(EMPTY_DICT)
else: # proto 0 -- can't use EMPTY_DICT
write(MARK + DICT)
self.memoize(obj)
self._batch_setitems(obj.iteritems())
However, in the _batch_setitems, pass an iterator that filters out all items that you don't want to be dumped, e.g
def save_dict(self, obj):
write = self.write
if self.bin:
write(EMPTY_DICT)
else: # proto 0 -- can't use EMPTY_DICT
write(MARK + DICT)
self.memoize(obj)
self._batch_setitems(item for item in obj.iteritems()
if not isinstance(item[1], bad_type))
As save_dict isn't an official API, you need to check for each new Python version whether this override is still correct.

The filtering part is indeed tricky. Using simple tricks, you can easily get the pickle to work. However, you might end up filtering out too much and losing information that you could keep when the filter looks a little bit deeper. But the vast possibility of things that can end up in the .namespace makes building a good filter difficult.
However, we could leverage pieces that are already part of Python, such as deepcopy in the copy module.
I made a copy of the stock copy module, and did the following things:
create a new type named LostObject to represent object that will be lost in pickling.
change _deepcopy_atomic to make sure x is picklable. If it's not, return an instance of LostObject
objects can define methods __reduce__ and/or __reduce_ex__ to provide hint about whether and how to pickle it. We make sure these methods will not throw exception to provide hint that it cannot be pickled.
to avoid making unnecessary copy of big object (a la actual deepcopy), we recursively check whether an object is picklable, and only make unpicklable part. For instance, for a tuple of a picklable list and and an unpickable object, we will make a copy of the tuple - just the container - but not its member list.
The following is the diff:
[~/Development/scratch/] $ diff -uN /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py mcopy.py
--- /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py 2010-01-09 00:18:38.000000000 -0800
+++ mcopy.py 2010-11-10 08:50:26.000000000 -0800
## -157,6 +157,13 ##
cls = type(x)
+ # if x is picklable, there is no need to make a new copy, just ref it
+ try:
+ dumps(x)
+ return x
+ except TypeError:
+ pass
+
copier = _deepcopy_dispatch.get(cls)
if copier:
y = copier(x, memo)
## -179,10 +186,18 ##
reductor = getattr(x, "__reduce_ex__", None)
if reductor:
rv = reductor(2)
+ try:
+ x.__reduce_ex__()
+ except TypeError:
+ rv = LostObject, tuple()
else:
reductor = getattr(x, "__reduce__", None)
if reductor:
rv = reductor()
+ try:
+ x.__reduce__()
+ except TypeError:
+ rv = LostObject, tuple()
else:
raise Error(
"un(deep)copyable object of type %s" % cls)
## -194,7 +209,12 ##
_deepcopy_dispatch = d = {}
+from pickle import dumps
+class LostObject(object): pass
def _deepcopy_atomic(x, memo):
+ try:
+ dumps(x)
+ except TypeError: return LostObject()
return x
d[type(None)] = _deepcopy_atomic
d[type(Ellipsis)] = _deepcopy_atomic
Now back to the pickling part. You simply make a deepcopy using this new deepcopy function and then pickle the copy. The unpicklable parts have been removed during the copying process.
x = dict(a=1)
xx = dict(x=x)
x['xx'] = xx
x['f'] = file('/tmp/1', 'w')
class List():
def __init__(self, *args, **kwargs):
print 'making a copy of a list'
self.data = list(*args, **kwargs)
x['large'] = List(range(1000))
# now x contains a loop and a unpickable file object
# the following line will throw
from pickle import dumps, loads
try:
dumps(x)
except TypeError:
print 'yes, it throws'
def check_picklable(x):
try:
dumps(x)
except TypeError:
return False
return True
class LostObject(object): pass
from mcopy import deepcopy
# though x has a big List object, this deepcopy will not make a new copy of it
c = deepcopy(x)
dumps(c)
cc = loads(dumps(c))
# check loop refrence
if cc['xx']['x'] == cc:
print 'yes, loop reference is preserved'
# check unpickable part
if isinstance(cc['f'], LostObject):
print 'unpicklable part is now an instance of LostObject'
# check large object
if loads(dumps(c))['large'].data[999] == x['large'].data[999]:
print 'large object is ok'
Here is the output:
making a copy of a list
yes, it throws
yes, loop reference is preserved
unpicklable part is now an instance of LostObject
large object is ok
You see that 1) mutual pointers (between x and xx) are preserved and we do not run into infinite loop; 2) the unpicklable file object is converted to a LostObject instance; and 3) not new copy of the large object is created since it is picklable.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python recursive setattr()-like function for working with nested dictionaries [duplicate] - python

>>> class D(dict): ... def missing(self, k): ... ret = self[k] = D() ... return ret ... >>> x=D() >>> x['f'][0]['a'] = 'whatever' >>> x {'f': {0: {'a': 'whatever'}}}

Related

Derived AccumulatorParam in py-spark returns EOFException on addInPace function

Using overloaded methods for hashing

Inverse of hasattr in Python

Python dictionary keys(which are class objects) comparison with multiple comparer

Python: Pickling a dict with some unpicklable items

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python recursive setattr()-like function for working with nested dictionaries [duplicate] - python

>>> class D(dict): ... def __missing__(self, k): ... ret = self[k] = D() ... return ret ... >>> x=D() >>> x['f'][0]['a'] = 'whatever' >>> x {'f': {0: {'a': 'whatever'}}}

Related

Derived AccumulatorParam in py-spark returns EOFException on addInPace function

Using overloaded methods for hashing

Inverse of hasattr in Python

Python dictionary keys(which are class objects) comparison with multiple comparer

Python: Pickling a dict with some unpicklable items

Categories

Resources

>>> class D(dict): ... def missing(self, k): ... ret = self[k] = D() ... return ret ... >>> x=D() >>> x['f'][0]['a'] = 'whatever' >>> x {'f': {0: {'a': 'whatever'}}}