I'm working with nested dictionaries on Python (2.7) obtained from YAML objects and I have a couple of questions that I've been trying to get an answer to by reading, but have not been successful. I'm somewhat new to Python.
One of the simplest functions is one that reads the whole dictionary and outputs a list of all the keys that exist in it. I use an underscore at the beginning since this function is later used by others within a class.
class Myclass(object):
    @staticmethod
    def _get_key_list(d, keylist):
        for key, value in d.iteritems():
            keylist.append(key)
            if isinstance(value, dict):
                Myclass._get_key_list(d.get(key), keylist)
        return list(set(keylist))

    def diff(self, dict2):
        keylist = []
        all_keys1 = self._get_key_list(self.d, keylist)
        all_keys2 = self._get_key_list(dict2, keylist)
        ... # More code
Question 1: Is this a correct way to do this? I am not sure whether it's good practice to use a static method for this reason. Since self._get_key_list(d, keylist) is recursive, I don't want "self" to be the first argument once the function is recursively called, which is what would happen for a regular instance method.
I have a bunch of static methods that I'm using, but I've read in a lot of places that they may not be good practice when used heavily. I also thought I could make them module functions, but I wanted them to be tied to the class.
Question 2: Instead of passing the argument keylist to self._get_key_list(d,keylist), how can I initialize an empty list inside the recursive function and update it? Initializing it inside would reset it to [] every time.
I would eliminate keylist as an explicit argument:
def _get_keys(d):
    keyset = set()
    for key, value in d.iteritems():
        keyset.add(key)
        if isinstance(value, dict):
            keyset.update(_get_keys(value))
    return keyset
Let the caller convert the set to a list if they really need a list, rather than an iterable.
Often, there is little reason to declare something as a static method rather than a function outside the class.
If you are concerned about efficiency (e.g., getting lots of repeat keys from a dict), you can go back to threading a single set/list through the calls as an explicit argument, but don't make it optional; just require that the initial caller supply the set/list to update. To emphasize that the second argument will be mutated, just return None when the function returns.
def _get_keys(d, result):
    for key, value in d.iteritems():
        result.add(key)
        if isinstance(value, dict):
            _get_keys(value, result)

result = set()
_get_keys(d1, result)
_get_keys(d2, result)
# etc
There's no good reason to make a recursive function in a class a static method unless it is meant to be invoked outside the context of an instance.
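To make that concrete, here is a sketch of the module-level alternative (the helper name and the diff logic are illustrative, not from the original code); the recursion never touches self, so the helper need not live on the class at all:

```python
# Module-level helper: plain recursion, no self threading required.
def get_keys(d):
    keys = set()
    for key, value in d.items():  # use d.iteritems() on Python 2.7
        keys.add(key)
        if isinstance(value, dict):
            keys.update(get_keys(value))
    return keys

class Myclass(object):
    def __init__(self, d):
        self.d = d

    def diff(self, dict2):
        # compare key sets directly, no accumulator argument needed
        return get_keys(self.d) ^ get_keys(dict2)
```

The class simply calls the function; nothing about the recursion forces it to be a method.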
To initialize a parameter, we usually assign it a default value in the parameter list, but when the default needs to be a mutable object, such as an empty list here, you should default it to None and then initialize it inside the function, so that the same list reference doesn't get reused across calls:
class Myclass(object):
    def _get_key_list(self, d, keylist=None):
        if keylist is None:
            keylist = []
        for key, value in d.iteritems():
            keylist.append(key)
            if isinstance(value, dict):
                self._get_key_list(d.get(key), keylist)
        return list(set(keylist))

    def diff(self, dict2):
        all_keys1 = self._get_key_list(self.d)
        all_keys2 = self._get_key_list(dict2)
        ... # More code
Related
I'm trying to remove null and empty keys from my python object by calling a method from a module.
from file2 import remove_nulls
# initialize object and set attributes
obj.remove_nulls()
In my remove_nulls() method, if I print the resultant object, I can observe that null and empty keys are removed, but after returning from the function, the final object still has the null and empty keys.
def remove_null(feature):
    return json.dumps(del_none(feature.__dict__.copy()))

def del_none(d):
    for key, value in list(d.items()):
        if value is None:
            del d[key]
        elif isinstance(value, dict):
            del_none(value)
    return d
Can someone help me find where it went wrong?
Just too many copies...:
def remove_null(feature):
    return json.dumps(del_none(feature.__dict__.copy()))
that applies del_none on a copy of your object, dumps the proper "cleaned" object, then returns, leaving your object untouched (since you created a copy). You probably need to do just:
def remove_null(feature):
    return json.dumps(del_none(feature.__dict__))
The confusion probably comes from the need to copy a dict's keys and values to avoid removing items from a dict while iterating over it. That concern was first handled at the wrong level (the .copy() in remove_null), then handled at the proper level (the list(d.items()) snapshot in del_none), but the first, now-redundant copy was never removed.
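A minimal demonstration of the pitfall (standalone sketch, variable names are illustrative): mutating a copy leaves the original untouched, which is why the null keys "came back" after the function returned.

```python
original = {'a': None, 'b': 1}
copied = original.copy()   # a shallow copy: a separate top-level dict
del copied['a']            # removes the key from the copy only
# original still contains 'a'; only the copy was cleaned
```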
My understanding is that .items() is only available for Python dictionaries.
However, in the following bit of code, which runs perfectly, it appears that the .items() method is available on a string. (This code is from the preprocessing stage of doc2vec.)
I have looked at this for a while and I can't figure out why the .items() seems to work in this piece of code.
In the code, 'sources' is just an attribute of an instance. Yet it is able to call .items().
What am I missing here?
class LabeledLineSentence(object):
    def __init__(self, sources):
        self.sources = sources
        flipped = {}
        # make sure that keys are unique
        for key, value in sources.items():
            if value not in flipped:
                flipped[value] = [key]
            else:
                raise Exception('Non-unique prefix encountered')
The given code only specifies that sources is an attribute of an instance. It doesn't specify its type. In fact it can be any type that is specified at the time of creating an instance of LabeledLineSentence.
i1 = LabeledLineSentence('sample text')  # sources is now a string: raises AttributeError
i2 = LabeledLineSentence({})             # sources is now a dictionary: no error
Note that LabeledLineSentence implementation expects the sources parameter to be a dictionary.
.items() is available for any class with an items method. For instance, I can define
class MyClass:
    def items(self):
        return [1, 2, 3, 4]
and then run
mc = MyClass()
for i in mc.items(): print(i)
Presumably your sources object is of a class that has such a method. But we don't know which, since it's an argument to the constructor of LabeledLineSentence.
Can you point us to the full source code? Then we might be able to see what is being passed in.
The error comes from publishDB = defaultdict(defaultdict({})). I want to make a database like:

{subject1: {student_id1: {assignment1: marks, assignment2: marks, finals: marks},
            student_id2: {assignment1: marks, assignment2: marks, finals: marks}},
 subject2: {student_id1: {assignment1: marks, assignment2: marks, finals: marks},
            student_id2: {assignment1: marks, assignment2: marks, finals: marks}}}

I was trying to populate it as DB[math][10001] = a dict and later read it out as d = DB[math][10001]. Since I am on my office computer, I cannot try different modules.
Am I on right track to do so?
Such a nested dict structure can be achieved using a recursive defaultdict "tree":
def tree():
return defaultdict(tree)
publishDB = tree()
At each level, the defaultdicts are instantiated with tree which is a zero-argument callable, as required.
Then you can simply assign marks:
publishDB[subject][student][assignment] = mark
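Putting the pieces together (the subject, student, and assignment names below are made up for illustration), the intermediate levels spring into existence on first access and you can read an inner dict back out directly:

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)   # each missing key creates another tree

publishDB = tree()
publishDB['math'][10001]['assignment1'] = 95
publishDB['math'][10001]['finals'] = 88

record = publishDB['math'][10001]   # read back the inner dict for one student
```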
defaultdict() requires that its first argument be callable: it must be a class that you want an instance of, or a function that returns an instance.
defaultdict({}) has an empty dictionary, which is not callable.
You likely want defaultdict(dict), as dict is a class that returns a dictionary when instantiated (called).
But that still doesn't solve the problem... just moves it to a different level. The outer defaultdict(...) in defaultdict(defaultdict(dict)) has the exact same issue because defaultdict(dict) isn't callable.
You can use a lambda expression to solve this, creating a one-line function that, when called, creates a defaultdict(dict):
defaultdict(lambda: defaultdict(dict))
You could also use the lambda at the lower level if you wanted:
defaultdict(lambda: defaultdict(lambda: {}))
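For the two-level structure in the question, the lambda form looks like this in use (subject and student IDs below are illustrative):

```python
from collections import defaultdict

# two levels of automatic creation, plain dicts at the bottom
DB = defaultdict(lambda: defaultdict(dict))
DB['math'][10001]['finals'] = 80    # both intermediate levels are created on demand
record = DB['math'][10001]
```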
I have an object subclass which implements a dynamically dispatched __iter__ using a caching generator (I also have a method for invalidating the iter cache), like so:
def __iter__(self):
    print("iter called")
    if self.__iter_cache is None:
        iter_seen = {}
        iter_cache = []
        for name in self.__slots:
            value = self.__slots[name]
            iter_seen[name] = True
            item = (name, value)
            iter_cache.append(item)
            yield item
        for d in self.__dc_list:
            for name, value in iter(d):
                if name not in iter_seen:
                    iter_seen[name] = True
                    item = (name, value)
                    iter_cache.append(item)
                    yield item
        self.__iter_cache = iter_cache
    else:
        print("iter cache hit")
        for item in self.__iter_cache:
            yield item
It seems to be working... Are there any gotchas I may not be aware of? Am I doing something ridiculous?
container.__iter__() returns an iterator object. The iterator objects themselves are required to support the two following methods, which together form the iterator protocol:
iterator.__iter__()
Returns the iterator object itself.
iterator.next()
Return the next item from the container.
That's exactly what every generator provides, so don't be afraid of any side effects.
It seems like a very fragile approach. Changing any of __slots, __dc_list, or __iter_cache during an active iteration is enough to put the object into an inconsistent state.
You need either to forbid changing the object during iteration or generate all cache items at once and return a copy of the list.
It might be better to separate the iteration of the object from the caching of the values it returns. That would simplify the iteration process and allow you to easily control how the caching is accomplished as well as whether it is enabled or not, for example.
Another possibly important consideration is the fact that your code would not predictively handle the situation where the object being iterated over gets changed between successive calls to the method. One simple way to deal with that would be to populate the cache's contents completely on the first call, and then just yield what it contains for each call -- and document the behavior.
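A sketch of that suggestion (the class and attribute names are hypothetical, and a dict stands in for the original __slots/__dc_list machinery): the cache is built completely on the first iteration, so a later mutation of the source, or an abandoned half-finished iteration, can never leave it partially populated.

```python
class CachedIterable(object):
    """Build the full cache on the first iteration, then serve
    every later iteration from that snapshot."""

    def __init__(self, source):
        self._source = source   # e.g. a dict of name -> value
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            # populate the cache completely up front
            self._cache = sorted(self._source.items())
        return iter(self._cache)

data = {'b': 2, 'a': 1}
c = CachedIterable(data)
first = list(c)
data['c'] = 3            # later mutation is ignored by design
second = list(c)         # served from the snapshot
```

Documenting that snapshot behavior is the important part; callers must know the first iteration freezes the contents.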
What you're doing is valid, albeit weird. What are __slots and __dc_list? Generally it's better to name an attribute after the contents of your object rather than its type (e.g., self.users rather than self.u_list).
You can use my LazyProperty decorator to simplify this substantially.
Just decorate your method with @LazyProperty. It will be called the first time, and the decorator will then replace the attribute with the result. The only requirement is that the value is repeatable; it doesn't depend on mutable state. You also have that requirement in your current code, with your self.__iter_cache.
def __iter__(self):
    return iter(self.__iter)

@LazyProperty
def __iter(self):
    def my_generator():
        yield whatever  # placeholder for the real iteration logic
    return tuple(my_generator())
What's the most succinct way of saying, in Python, "Give me dict['foo'] if it exists, and if not, give me this other value bar"? If I were using an object rather than a dictionary, I'd use getattr:
getattr(obj, 'foo', bar)
but this raises a KeyError if I try using a dictionary instead (a distinction I find unfortunate coming from JavaScript/CoffeeScript). Likewise, in JavaScript/CoffeeScript I'd just write
dict['foo'] || bar
but, again, this yields a KeyError. What to do? Something succinct, please!
dict.get(key, default) returns dict[key] if key in dict, else returns default.
Note that the default for default is None so if you say dict.get(key) and key is not in dict then this will just return None rather than raising a KeyError as happens when you use the [] key access notation.
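A quick illustration of both behaviours:

```python
d = {'foo': 42}
d.get('foo', 'bar')    # existing key: returns 42
d.get('baz', 'bar')    # missing key: returns the default, 'bar'
d.get('baz')           # missing key, no default: returns None, no KeyError
```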
Also take a look at collections module's defaultdict class. It's a dict for which you can specify what it must return when the key is not found. With it you can do things like:
class MyDefaultObj:
    def __init__(self):
        self.a = 1

from collections import defaultdict
d = defaultdict(MyDefaultObj)
i = d['NonExistentKey']
type(i)
<instance of MyDefaultObj>
which allows you to use the familiar d[i] convention.
However, as mikej said, .get() also works, but here is the form closer to your JavaScript example:
d = {}
i = d.get('NonExistentKey') or MyDefaultObj()
# the reason this is slightly better than d.get('NonExistentKey', MyDefaultObj())
# is that instantiation of the default value happens only when the key does not exist.
# With d.get('NonExistentKey', MyDefaultObj()) you spin up a default on every .get()
type(i)
<instance of MyDefaultObj>