Python - Dictionary - Modify __getitem__? - python

OK, so I've built my own variable handler, which has a __getitem__ method for accessing data via data[key]. It works great except when trying to access a chain of items:
data["key"]["subkey"]
def __getitem__(self, key, **args):
    print key
    ...
    return self.dict[key]
When trying to access a subkey that doesn't exist, Python simply raises a KeyError without printing "subkey". Why is this, and how can I get Python to print out what I'm actually trying to get?
I know that I've probably misunderstood the mechanics but is there a way to emulate a dictionary AND follow the string of data that's being requested?
Mainly so I can dynamically log the missing variables in a dictionary flow...
This obviously works (but it's not the native syntax that I like):
data["key:subkey"]
def __getitem__(self, key, **args):
    for slice in key.split(':'):
        print key
        ...
The goal is to emulate the following,
Works:
data = {'key' : {'subkey' : 1}}
print data["key"]["subkey"]
Will not work, but I want to catch the exception within __getitem__ and then create the missing key automatically or just log the missing subkey:
data = {'key' : {}}
print data["key"]["subkey"]
Solution:
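class Var():
    def __init__(self, d=None):
        # Wrap the given dict, or start from the sample data
        self.dict = {'test' : {}} if d is None else d
    def __getitem__(self, var, **args):
        print ':',var
        if var in self.dict:
            # Wrap the value so the next [...] lookup is logged too
            v = Var(self.dict[var])
            return v
vHandle = Var()
print vHandle['test']['down']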
Output:
: test
: down
None

When Python encounters an expression such as data["key"]["subkey"], what it does internally is (data["key"])["subkey"]. That is, the first part of the expression is resolved first: the retrieval of the item "key" from the object "data". Then Python calls __getitem__ on the object that expression returned.
If that resulting object does not have a __getitem__ method itself, or is a plain dict whose own __getitem__ raises KeyError without ever calling yours, there is your error.
There are two possible workarounds: either work with "tuple indexes", like data["key", "subkey"] (and then test in your __getitem__ method whether you got a tuple instance as the key), or make __getitem__ return a specialized object that also features a __getitem__ method, even if all it does is log the requested keys.
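A minimal sketch of the tuple-index workaround, written for Python 3. The class name TupleKeyDict and its missing list are hypothetical, chosen only for illustration; returning None for an unresolved path is likewise an assumption, not something the question prescribes.

```python
class TupleKeyDict(object):
    """Hypothetical wrapper: data["key", "subkey"] walks nested dicts."""

    def __init__(self, data):
        self._data = data
        self.missing = []  # key paths that could not be resolved

    def __getitem__(self, key):
        # data["a", "b"] passes the tuple ("a", "b") as one single key.
        path = key if isinstance(key, tuple) else (key,)
        node = self._data
        for depth, part in enumerate(path):
            try:
                node = node[part]
            except (KeyError, TypeError):
                # Record the path up to and including the failing part.
                self.missing.append(path[:depth + 1])
                return None
        return node


data = TupleKeyDict({'key': {'subkey': 1}})
print(data['key', 'subkey'])
print(data['key', 'nope'])
print(data.missing)
```

Because the whole path arrives in one call, the handler sees every requested key before any lookup fails, which is what makes the logging possible.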

Remember: tmp = foo['bar']['baz'] is the same as tmp = foo['bar']; tmp = tmp['baz']
So to allow arbitrary depths your __getitem__ method must return a new object that also contains such a __getitem__ method.
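One way this could look, as a sketch for Python 3 rather than a definitive implementation: __getitem__ wraps every nested dict in the same class, so each further [...] lookup passes through the logging code. The name LoggingDict, the shared missing list and the choice to auto-create missing keys as empty dicts are all assumptions made for the example.

```python
class LoggingDict(object):
    """Hypothetical dict wrapper that records every missing key path."""

    def __init__(self, data=None, path=(), missing=None):
        self._data = {} if data is None else data
        self._path = path
        # All nested wrappers share one list so the root sees every miss.
        self.missing = [] if missing is None else missing

    def __getitem__(self, key):
        path = self._path + (key,)
        if key not in self._data:
            self.missing.append(path)
            self._data[key] = {}  # create the missing key automatically
        value = self._data[key]
        if isinstance(value, dict):
            # Wrap nested dicts so the next [...] lookup is logged as well.
            return LoggingDict(value, path, self.missing)
        return value


data = LoggingDict({'key': {'subkey': 1}})
print(data['key']['subkey'])
data['key']['other']['deeper']  # neither level exists yet
print(data.missing)
```

Leaf values (anything that is not a dict) are returned unwrapped, so existing data behaves as it would with a plain dictionary.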

Related

Static methods for recursive functions within a class?

I'm working with nested dictionaries on Python (2.7) obtained from YAML objects and I have a couple of questions that I've been trying to get an answer to by reading, but have not been successful. I'm somewhat new to Python.
One of the simplest functions is one that reads the whole dictionary and outputs a list of all the keys that exist in it. I use an underscore at the beginning since this function is later used by others within a class.
class Myclass(object):
    @staticmethod
    def _get_key_list(d,keylist):
        for key,value in d.iteritems():
            keylist.append(key)
            if isinstance(value,dict):
                Myclass._get_key_list(d.get(key),keylist)
        return list(set(keylist))
    def diff(self,dict2):
        keylist = []
        all_keys1 = self._get_key_list(self.d,keylist)
        all_keys2 = self._get_key_list(dict2,keylist)
        ... # More code
Question 1: Is this a correct way to do this? I am not sure whether it's good practice to use a static method for this reason. Since self._get_key_list(d,keylist) is recursive, I don't want "self" to be the first argument once the function is recursively called, which is what would happen for a regular instance method.
I have a bunch of static methods that I'm using, but I've read in a lot of places that they may not be good practice when used a lot. I also thought I could make them module functions, but I wanted them to be tied to the class.
Question 2: Instead of passing the argument keylist to self._get_key_list(d,keylist), how can I initialize an empty list inside the recursive function and update it? Initializing it inside would reset it to [] every time.
I would eliminate keylist as an explicit argument:
def _get_keys(d):
    keyset = set()
    for key, value in d.iteritems():
        keyset.add(key)
        if isinstance(value, dict):
            # Merge the keys found in the nested dict
            keyset.update(_get_keys(value))
    return keyset
Let the caller convert the set to a list if they really need a list, rather than an iterable.
Often, there is little reason to declare something as a static method rather than a function outside the class.
If you are concerned about efficiency (e.g., getting lots of repeat keys from a dict), you can go back to threading a single set/list through the calls as an explicit argument, but don't make it optional; just require that the initial caller supply the set/list to update. To emphasize that the second argument will be mutated, just return None when the function returns.
def _get_keys(d, result):
    for key, value in d.iteritems():
        result.add(key)
        if isinstance(value, dict):
            _get_keys(value, result)
result = set()
_get_keys(d1, result)
_get_keys(d2, result)
# etc
There's no good reason to make a recursive function in a class a static method unless it is meant to be invoked outside the context of an instance.
To initialize a parameter, we usually assign it a default value in the parameter list. If it needs to be a mutable object, such as the empty list in this case, default it to None and then initialize it inside the function, so that the same list object won't get reused in the next call:
class Myclass(object):
    def _get_key_list(self, d, keylist=None):
        if keylist is None:
            keylist = []
        for key, value in d.iteritems():
            keylist.append(key)
            if isinstance(value, dict):
                self._get_key_list(d.get(key), keylist)
        return list(set(keylist))
    def diff(self, dict2):
        all_keys1 = self._get_key_list(self.d)
        all_keys2 = self._get_key_list(dict2)
        ... # More code
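The reason for the None default can be seen in a small sketch (Python 3). The function names collect_shared and collect_fresh are hypothetical and exist only to contrast the two styles.

```python
def collect_shared(key, keylist=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits keylist appends to the same object.
    keylist.append(key)
    return keylist


def collect_fresh(key, keylist=None):
    if keylist is None:
        keylist = []  # a new list on every call that omits the argument
    keylist.append(key)
    return keylist


# Copy the results, because collect_shared keeps returning the same list.
first_shared = list(collect_shared('a'))
second_shared = list(collect_shared('b'))
print(first_shared, second_shared)

print(collect_fresh('a'), collect_fresh('b'))
```

The second call to collect_shared still contains 'a' from the first call, while each collect_fresh call starts from an empty list.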

python dictionaries and items()

My understanding is that .items() is only available for Python dictionaries.
However, in the following bit of code, which runs perfectly, it appears that .items() is available for a string. (This code is for the preprocessing stage of doc2vec.)
I have looked at this for a while and I can't figure out why the .items() seems to work in this piece of code.
In the code, 'sources' is just an attribute of an instance. Yet it is able to call .items().
What am I missing here?
class LabeledLineSentence(object):
    def __init__(self, sources):
        self.sources = sources
        flipped = {}
        # make sure that keys are unique
        for key, value in sources.items():
            if value not in flipped:
                flipped[value] = [key]
            else:
                raise Exception('Non-unique prefix encountered')
The given code only specifies that sources is an attribute of an instance. It doesn't specify its type. In fact it can be any type that is specified at the time of creating an instance of LabeledLineSentence.
i1 = LabeledLineSentence('sample text') # sources is a string: raises AttributeError, str has no .items()
i2 = LabeledLineSentence({}) # sources is a dictionary: no error
Note that LabeledLineSentence implementation expects the sources parameter to be a dictionary.
.items() is available for any class with an items method. For instance, I can define
class MyClass:
    def items(self):
        return [1,2,3,4]
and then run
mc = MyClass()
for i in mc.items(): print(i)
Presumably your sources object is of a class that has such an attribute. But we don't know what, since it's an argument to the constructor of LabeledLineSentence.
Can you point us to the full source code? Then we might be able to see what is being passed in.

python getattr() with multiple params

Construction getattr(obj, 'attr1.attr2', None) does not work.
What are the best practices to replace this construction?
Divide that into two getattr statements?
You can use operator.attrgetter() in order to get multiple attributes at once:
from operator import attrgetter
my_attrs = attrgetter('attr1', 'attr2')(obj)  # attribute names are passed as strings
As stated in this answer, the most straightforward solution would be to use operator.attrgetter (more info in this python docs page).
If for some reason, this solution doesn't make you happy, you could use this code snippet:
_MISSING = object()  # sentinel so that default=None counts as a real default

def multi_getattr(obj, attr, default=_MISSING):
    """
    Get a named attribute from an object; multi_getattr(x, 'a.b.c.d') is
    equivalent to x.a.b.c.d. When a default argument is given, it is
    returned when any attribute in the chain doesn't exist; without
    it, an exception is raised when a missing attribute is encountered.
    """
    attributes = attr.split(".")
    for i in attributes:
        try:
            obj = getattr(obj, i)
        except AttributeError:
            # A plain "if default:" would wrongly re-raise for None, 0 or ''
            if default is not _MISSING:
                return default
            raise
    return obj

# Example usage
obj = [1,2,3]
attr = "append.__doc__.capitalize.__doc__"
multi_getattr(obj, attr)  # Will return the docstring for the
                          # capitalize method of the builtin string
                          # object
from this page, which does work. I tested and used it.
I would suggest using something like this:
from operator import attrgetter
attrgetter('attr0.attr1.attr2.attr3')(obj)
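As a small illustration of how dotted names behave with operator.attrgetter (Python 3): the Node class and the nested_getattr helper below are hypothetical, introduced only for the example. attrgetter itself takes no default, so the fallback is supplied by catching AttributeError.

```python
from operator import attrgetter


class Node(object):
    """Hypothetical class used only for this illustration."""
    def __init__(self, name, child=None):
        self.name = name
        self.child = child


obj = Node('root', Node('leaf'))

# A dotted name is resolved one attribute at a time.
print(attrgetter('child.name')(obj))

# Several names return a tuple of values.
print(attrgetter('name', 'child.name')(obj))


def nested_getattr(obj, dotted, default=None):
    """Hypothetical helper: attrgetter with a fallback value."""
    try:
        return attrgetter(dotted)(obj)
    except AttributeError:
        return default


# obj.child.child is None, so looking up .name on it fails.
print(nested_getattr(obj, 'child.child.name', 'n/a'))
```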
If you have the attribute names you want to get in a list, you can do the following:
my_attrs = [getattr(obj, attr) for attr in attr_list]
A simple, but not very elegant, way to get multiple attributes would be to use a tuple, with or without brackets, something like
aval, bval = getattr(myObj,"a"), getattr(myObj,"b")
but judging by your dot notation, I think you may instead want an attribute of a contained object. In which case it would be something like
getattr(myObj.contained, "c")
where contained is an object contained within myObj and c is an attribute of contained. Let me know if this is not what you want.

what's the right way to put *arg in a tuple that can be sorted?

I want a dict or tuple I can sort based on attributes of the objects I'm using as arguments for *arg. The way I've been trying to do it just gives me AttributeErrors, which leads me to believe I'm doing it weird.
def function(*arg):
    items = {}
    for thing in arg:
        items.update({thing.name:thing})
    while True:
        for thing in items:
            ## lots of other code here, basically just a game loop.
            ## Problem is that the 'turn order' is based on whatever
            ## Python decides the order of arguments is inside "items".
            ## I'd like to be able to sort the dict based on each object's
            ## attributes (ie, highest 'thing.speed' goes first in the while loop)
The problem is when I try to sort "items" based on an attribute of the objects I put into function(), it gives me "AttributeError: 'str' object has no attribute 'attribute'". Which leads me to believe I'm either unpacking *arg in a lousy way, or I'm trying to do something the wrong way.
while True:
    for thing in sorted(items, key=attrgetter('attribute')):
...doesn't work either, keeps telling me I'm trying to manipulate a 'str' object. What am I not doing here?
arg already is a tuple you can sort by an attribute of each item:
def function(*args):
    for thing in sorted(args, key=attrgetter('attribute')):
When you iterate over a dict, as sorted is doing, you just get the keys, not the values. So, if you want to use a dict, you need to do:
def function(*args):
    # or use a dict comprehension on 2.7+
    items = dict((thing.name, thing) for thing in args)
    # or just items.values on 3+
    for thing in sorted(items.itervalues(), key=attrgetter('attribute')):
to actually sort the args by an attribute. If you want the keys of the dict available as well (not necessary here because the key is also an attribute of the item), use something like:
for name, thing in sorted(items.iteritems(), key=lambda item: item[1].attribute):
Your items is a dict, and a dict cannot be sorted in place. When you iterate over it, you get its keys, which here are strings. You also never use arg again after building the dict.
If you don't need dict lookup, as you just iterate through it, you can replace dict with list of 2-tuples (thing.name, thing), sort it by any attribute and iterate through it. You can also use collections.OrderedDict from Python 2.7 (it exists as a separate ordereddict package for earlier versions) if you really want both dict lookup and ordering.
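A rough sketch of the OrderedDict variant (Python 3), assuming each object carries name and speed attributes as the question describes. The Thing class and the turn_order function are hypothetical names used only here.

```python
from collections import OrderedDict
from operator import attrgetter


class Thing(object):
    """Hypothetical game object with the attributes the question mentions."""
    def __init__(self, name, speed):
        self.name = name
        self.speed = speed


def turn_order(*args):
    # Sort the objects themselves (not dict keys), fastest first,
    # then key them by name so lookup by name still works.
    ordered = sorted(args, key=attrgetter('speed'), reverse=True)
    return OrderedDict((thing.name, thing) for thing in ordered)


items = turn_order(Thing('slug', 1), Thing('hare', 9), Thing('fox', 5))
print(list(items))         # names in turn order
print(items['fox'].speed)  # dict-style lookup is still available
```

The sort key is applied to the objects before they go into the mapping, which avoids the 'str' object error that comes from sorting the dict's keys.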
{edit} Thanks to agf, I understood the problem. So what I wrote below is a good answer in itself, but not to the question above. I'm leaving it here for the record.
Looking at the other answers, I may not have understood the question. But here's my understanding: args is the tuple of arguments you give to your function, so it's possible that some of these arguments are not objects with a name attribute. Judging by the errors you report, you're passing string arguments.
Maybe some illustration will help my description:
>>> # defining a function using name attribute
>>> def f(*args):
...     for arg in args:
...         print arg.name
>>> # defining an object with a name attribute
>>> class o(object):
...     def __init__(self, name):
...         self.name = name
>>> # now applying the function on the previous object, and on a string
>>> f( o('arg 1'), 'arg 2' )
arg 1
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    f( o('arg 1'), 'arg 2' )
  File "<pyshell#3>", line 3, in f
    print arg.name
AttributeError: 'str' object has no attribute 'name'
This is failing as strings have no such attribute.
As I see it, the mistake in your code is that you use the name attribute on your inputs without ever verifying that they have such an attribute. Maybe you should test with hasattr first:
>>> if hasattr(arg, 'name'):
...     print arg.name
... else:
...     print arg
or with some inspection on the input, to verify if it's an instance of a given class, known to have the requested attribute.

Python getattr equivalent for dictionaries?

What's the most succinct way of saying, in Python, "Give me dict['foo'] if it exists, and if not, give me this other value bar"? If I were using an object rather than a dictionary, I'd use getattr:
getattr(obj, 'foo', bar)
but this raises a key error if I try using a dictionary instead (a distinction I find unfortunate coming from JavaScript/CoffeeScript). Likewise, in JavaScript/CoffeeScript I'd just write
dict['foo'] || bar
but, again, this yields a KeyError. What to do? Something succinct, please!
dict.get(key, default) returns dict[key] if key in dict, else returns default.
Note that the default for default is None so if you say dict.get(key) and key is not in dict then this will just return None rather than raising a KeyError as happens when you use the [] key access notation.
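A short illustration of the three cases (Python 3); the dictionary contents and the names settings and bar are made up for the example.

```python
settings = {'foo': 0}
bar = 'fallback'

print(settings.get('foo', bar))      # key exists, so the stored value is returned
print(settings.get('missing', bar))  # key absent, so bar is returned
print(settings.get('missing'))       # no default given, so None is returned
```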
Also take a look at collections module's defaultdict class. It's a dict for which you can specify what it must return when the key is not found. With it you can do things like:
class MyDefaultObj:
    def __init__(self):
        self.a = 1
from collections import defaultdict
d = defaultdict(MyDefaultObj)
i = d['NonExistentKey']
type(i)
<instance of class MyDefaultObj>
which allows you to use the familiar d[i] convention.
However, as mikej said, .get() also works, but here is the form closer to your JavaScript example:
d = {}
i = d.get('NonExistentKey') or MyDefaultObj()
# the reason this is slightly better than d.get('NonExistent', MyDefaultObj())
# is that instantiation of default value happens only when 'NonExistent' does not exist.
# With d.get('NonExistent', MyDefaultObj()) you spin up a default every time you .get()
type(i)
<instance of class MyDefaultObj>
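One caveat with the `or` form, sketched below for Python 3: it also replaces values that exist but are falsy (0, '', an empty list), which .get() with a default does not. The lookup helper is hypothetical; it is one possible way to keep the lazy construction of the default without that side effect.

```python
d = {'count': 0}

via_or = d.get('count') or 10   # 0 is falsy, so the stored value is lost
via_get = d.get('count', 10)    # the stored 0 is returned


def lookup(mapping, key, factory):
    """Hypothetical helper: build the default only when the key is absent."""
    if key in mapping:
        return mapping[key]
    return factory()


print(via_or, via_get)
print(lookup(d, 'count', lambda: 10))
print(lookup(d, 'missing', list))
```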
