I'm working with nested dictionaries in Python (2.7) obtained from YAML objects, and I have a couple of questions that I've been trying to get an answer to by reading, but have not been successful. I'm somewhat new to Python.
One of the simplest functions is one that reads the whole dictionary and outputs a list of all the keys that exist in it. I use an underscore at the beginning since this function is later used by others within a class.
class Myclass(object):
    @staticmethod
    def _get_key_list(d, keylist):
        for key, value in d.iteritems():
            keylist.append(key)
            if isinstance(value, dict):
                Myclass._get_key_list(d.get(key), keylist)
        return list(set(keylist))

    def diff(self, dict2):
        keylist = []
        all_keys1 = self._get_key_list(self.d, keylist)
        all_keys2 = self._get_key_list(dict2, keylist)
        ... # More code
Question 1: Is this a correct way to do this? I am not sure whether it's good practice to use a static method for this reason. Since self._get_key_list(d, keylist) is recursive, I don't want "self" to be the first argument once the function is recursively called, which is what would happen for a regular instance method.
I have a bunch of static methods that I'm using, but I've read in a lot of places that they could perhaps not be good practice when used a lot. I also thought I could make them module functions, but I wanted them to be tied to the class.
Question 2: Instead of passing the argument keylist to self._get_key_list(d,keylist), how can I initialize an empty list inside the recursive function and update it? Initializing it inside would reset it to [] every time.
I would eliminate keylist as an explicit argument:
def _get_keys(d):
    keyset = set()
    for key, value in d.iteritems():
        keyset.add(key)
        if isinstance(value, dict):
            keyset.update(_get_keys(value))
    return keyset
Let the caller convert the set to a list if they really need a list, rather than an iterable.
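For instance, with a made-up nested dict, the caller can do the conversion at the call site:
nested = {'a': 1, 'b': {'c': 2, 'd': {'e': 3}}}
keys = sorted(_get_keys(nested))  # ['a', 'b', 'c', 'd', 'e']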
Often, there is little reason to declare something as a static method rather than a function outside the class.
If you are concerned about efficiency (e.g., getting lots of repeat keys from a dict), you can go back to threading a single set/list through the calls as an explicit argument, but don't make it optional; just require that the initial caller supply the set/list to update. To emphasize that the second argument will be mutated, have the function return None (by not returning anything).
def _get_keys(d, result):
    for key, value in d.iteritems():
        result.add(key)
        if isinstance(value, dict):
            _get_keys(value, result)

result = set()
_get_keys(d1, result)
_get_keys(d2, result)
# etc.
There's no good reason to make a recursive function in a class a static method unless it is meant to be invoked outside the context of an instance.
To initialize a parameter, we usually assign it a default value in the parameter list, but when the default needs to be a mutable object, such as the empty list in this case, you should default it to None and then initialize it inside the function, so that the same list object doesn't get reused on the next call:
class Myclass(object):
    def _get_key_list(self, d, keylist=None):
        if keylist is None:
            keylist = []
        for key, value in d.iteritems():
            keylist.append(key)
            if isinstance(value, dict):
                self._get_key_list(d.get(key), keylist)
        return list(set(keylist))

    def diff(self, dict2):
        all_keys1 = self._get_key_list(self.d)
        all_keys2 = self._get_key_list(dict2)
        ... # More code
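To see why the None dance is needed, here is a minimal sketch of the classic pitfall with a mutable default; the default list is created once, when the def statement runs, and is shared by every call that doesn't supply its own:
def broken(item, acc=[]):   # the [] is evaluated once, at definition time
    acc.append(item)
    return acc

broken(1)   # [1]
broken(2)   # [1, 2] -- the "empty" default still holds the previous call's data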
I just started building a text based game yesterday as an exercise in learning Python (I'm using 3.3). I say "text based game," but I mean more of a MUD than a choose-your-own adventure. Anyway, I was really excited when I figured out how to handle inheritance and multiple inheritance using super() yesterday, but I found that the argument-passing really cluttered up the code, and required juggling lots of little loose variables. Also, creating save files seemed pretty nightmarish.
So, I thought, "What if certain class hierarchies just took one argument, a dictionary, and just passed the dictionary back?" To give you an example, here are two classes trimmed down to their init methods:
class Actor:
    def __init__(self, in_dict, **kwds):
        super().__init__(**kwds)
        self._everything = in_dict
        self._name = in_dict["name"]
        self._size = in_dict["size"]
        self._location = in_dict["location"]
        self._triggers = in_dict["triggers"]
        self._effects = in_dict["effects"]
        self._goals = in_dict["goals"]
        self._action_list = in_dict["action list"]
        self._last_action = ''
        self._current_action = ''  # both ._last_action and ._current_action get updated by .update_action()
class Item(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._can_contain = in_dict["can contain"]  # boolean entry
        self._inventory = in_dict["inventory"]      # either a list or dict entry
class Player(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._inventory = in_dict["inventory"]  # entry should be a Container object
        self._stats = in_dict["stats"]
Example dict that would be passed:
playerdict = {'name': '', 'size': '0', 'location': '', 'triggers': None, 'effects': None, 'goals': None, 'action list': None, 'inventory': Container(), 'stats': None}
(The Nones get replaced by {} once the dictionary has been passed.)
So, in_dict gets passed to the previous class instead of a huge payload of **kwds.
I like this because:
It makes my code a lot neater and more manageable.
As long as the dicts have at least some entry for the key called, it doesn't break the code. Also, it doesn't matter if a given argument never gets used.
It seems like file IO just got a lot easier (dictionaries of player data stored as dicts, dictionaries of item data stored as dicts, etc.)
I get the point of **kwds (EDIT: apparently I didn't), and it hasn't seemed cumbersome when passing fewer arguments. This just appears to be a comfortable way of dealing with a need for a large number of attributes at the creation of each instance.
That said, I'm still a major python noob. So, my question is this: Is there an underlying reason why passing the same dict repeatedly through super() to the base class would be a worse idea than just toughing it out with nasty (big and cluttered) **kwds passes? (e.g. issues with the interpreter that someone at my level would be ignorant of.)
EDIT:
Previously, creating a new Player might have looked like this, with an argument passed for each attribute.
bob = Player('bob', Location = 'here', ... etc.)
The number of arguments needed blew up, and I only included the attributes that really needed to be present to not break method calls from the Engine object.
This is the impression I'm getting from the answers and comments thus far:
There's nothing "wrong" with sending the same dictionary along, as long as nothing has the opportunity to modify its contents (Kirk Strauser) and the dictionary always has what it's supposed to have (goncalopp). The real answer is that the question was amiss, and using in_dict instead of **kwds is redundant.
Would this be correct? (Also, thanks for the great and varied feedback!)
I'm not sure I understand your question exactly, because I don't see how the code looked before you made the change to use in_dict. It sounds like you have been listing out dozens of keywords in the call to super (which is understandably not what you want), but this is not necessary. If your child class has a dict with all of this information, it can be turned into kwargs when you make the call with **in_dict. So:
class Actor:
    def __init__(self, **kwds):
        ...

class Item(Actor):
    def __init__(self, **kwds):
        self._everything = kwds
        super().__init__(**kwds)
I don't see a reason to add another dict for this, since you can just manipulate and pass the dict created for kwds anyway.
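As a rough sketch (assuming Player is reworked like Item above to accept **kwds), a saved dict then expands straight into keyword arguments at the call site:
# Hypothetical usage with a trimmed-down dict:
playerdict = {'name': 'bob', 'location': 'here', 'inventory': None, 'stats': None}
bob = Player(**playerdict)  # same as Player(name='bob', location='here', ...)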
Edit:
As for the question of the efficiency of using the ** expansion of the dict versus listing the arguments explicitly, I did a very unscientific timing test with this code:
import time

def some_func(**kwargs):
    for k, v in kwargs.items():
        pass

def main():
    name = 'felix'
    location = 'here'
    user_type = 'player'
    kwds = {'name': name,
            'location': location,
            'user_type': user_type}

    start = time.time()
    for i in range(10000000):
        some_func(**kwds)
    end = time.time()
    print 'Time using expansion:\t{0}s'.format(end - start)

    start = time.time()
    for i in range(10000000):
        some_func(name=name, location=location, user_type=user_type)
    end = time.time()
    print 'Time without expansion:\t{0}s'.format(end - start)

if __name__ == '__main__':
    main()
Running this 10,000,000 times gives a slight (and probably statistically meaningless) advantage to passing around a dict and using **.
Time using expansion:    7.9877269268s
Time without expansion:  8.06108212471s
If we print the ids of the dict objects (kwds outside and kwargs inside the function), you will see that Python creates a new dict for the function to use in either case, but in practice the function gets the same dict object over and over: after the first call (where the kwargs dict is created), subsequent calls just update the values of that dict, no matter how you call it. (See also this enlightening SO question about how mutable default parameters are handled in Python, which is somewhat related.)
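You can see the id behavior for yourself with a throwaway sketch like this (separate from the timing code):
def show_id(**kwargs):
    print id(kwargs)   # frequently the same id on every call: CPython tends to recycle the dict

kwds = {'name': 'felix'}
print id(kwds)         # the caller's dict is a distinct object
show_id(**kwds)        # a fresh kwargs dict is built for the function either way
show_id(name='felix')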
So from a performance perspective, you can pick whichever makes sense to you. It should not meaningfully impact how python operates behind the scenes.
I've done that myself where in_dict was a dict with lots of keys, or a settings object, or some other "blob" of something with lots of interesting attributes. That's perfectly OK if it makes your code cleaner, particularly if you name it clearly like settings_object or config_dict or similar.
That shouldn't be the usual case, though. Normally it's better to explicitly pass a small set of individual variables. It makes the code much cleaner and easier to reason about. It's possible that a client could pass in_dict = None by accident and you wouldn't know until some method tried to access it. Suppose Actor.__init__ didn't peel apart in_dict but just stored it like self.settings = in_dict. Sometime later, Actor.method comes along and tries to access it, then boom! Dead process. If you're calling Actor.__init__(var1, var2, ...), then the caller will raise an exception much earlier and provide you with more context about what actually went wrong.
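A minimal sketch of that failure mode (hypothetical class, trimmed to the relevant lines):
class Actor:
    def __init__(self, in_dict):
        self.settings = in_dict        # stored blindly; nothing checks it here

    def method(self):
        return self.settings['name']   # blows up here, long after construction

a = Actor(None)   # the bad value is accepted silently
a.method()        # TypeError: 'NoneType' object is not subscriptable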
So yes, by all means: feel free to do that when it's appropriate. Just be aware that it's not appropriate very often, and the desire to do it might be a smell telling you to restructure your code.
This is not python specific, but the greatest problem I can see with passing arguments like this is that it breaks encapsulation. Any class may modify the arguments, and it's much more difficult to tell which arguments are expected in each class - making your code difficult to understand, and harder to debug.
Consider explicitly consuming the arguments in each class, and calling the super's __init__ on the remaining. You don't need to make them explicit:
class ClassA(object):
    def __init__(self, arg1, arg2=""):
        pass

class ClassB(ClassA):
    def __init__(self, arg3, arg4="", *args, **kwargs):
        ClassA.__init__(self, *args, **kwargs)

ClassB(3, 4, 1, 2)
You can also leave the variables uninitialized and use methods to set them. You can then use different methods in the different classes, and all subclasses will have access to the superclass methods.
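A rough sketch of that setter-based alternative (hypothetical names):
class ClassA(object):
    def set_arg1(self, value):
        self.arg1 = value

class ClassB(ClassA):
    def set_arg3(self, value):
        self.arg3 = value

b = ClassB()
b.set_arg1(1)   # superclass setter, available to the subclass
b.set_arg3(3)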
I have been working at learning Python over the last week and it has been going really well; however, I have now been introduced to custom functions and I've sort of hit a wall. While I understand the basics of it, such as:
def helloworld():
    print("Hello World!")

helloworld()
I know this will print "Hello World!".
However, when it comes to getting information from one function to another, I find that confusing, e.g. when function1 and function2 have to work together to perform a task. I'm also unsure when to use the return statement.
Lastly, there's the case where I have a list or a dictionary inside of a function. I'll make something up just as an example:
def my_function():
    my_dict = {"Key1": "Value1",
               "Key2": "Value2",
               "Key3": "Value3",
               "Key4": "Value4"}
How would I access the keys/values and be able to change them from outside of the function? E.g., if I had a program that let you input/output player stats or character attributes in a video game.
I understand bits and pieces of this, it just confuses me when they have different functions calling on each other.
Also, since this was my first encounter with custom functions: is this really ambitious to pursue, and could that be the reason for all of my confusion? This is the most complex program I have seen yet.
Functions in Python can be both a regular procedure and a function with a return value. In fact, every Python function returns a value, which might be None.
If a return statement is not present, your function will execute completely and exit normally following the code flow, yielding None as its return value.
def foo():
    pass

foo() == None
>>> True
If you have a return statement inside your function, the return value will be the value of the expression following it. For example, you may have return None, explicitly returning None; you can have a bare return, implicitly returning None; or you can have return 3, returning the value 3. This can grow in complexity.
def foo():
    print('hello')
    return
    print('world')

foo()
>>> 'hello'

def add(a, b):
    return a + b

add(3, 4)
>>> 7
If you want a dictionary (or any object) you created inside a function, just return it:
def my_function():
    my_dict = {"Key1": "Value1",
               "Key2": "Value2",
               "Key3": "Value3",
               "Key4": "Value4"}
    return my_dict

d = my_function()
d['Key1']
>>> 'Value1'
Those are the basics of function calling. There's even more: there are functions that return functions (also used as decorators), you can even return multiple values (not really, you'll just be returning a tuple), and lots of other fun stuff :)
def two_values():
    return 3, 4

a, b = two_values()
print(a)
>>> 3
print(b)
>>> 4
Hope this helps!
The primary way to pass information between functions is with arguments and return values. Functions can't see each other's variables. You might think that after
def my_function():
    my_dict = {"Key1": "Value1",
               "Key2": "Value2",
               "Key3": "Value3",
               "Key4": "Value4"}

my_function()
my_dict would have a value that other functions would be able to see, but it turns out that's a really brittle way to design a language. Every time you call my_function, my_dict would lose its old value, even if you were still using it. Also, you'd have to know all the names used by every function in the system when picking the names to use when writing a new function, and the whole thing would rapidly become unmanageable. Python doesn't work that way; I can't think of any languages that do.
Instead, if a function needs to make information available to its caller, return the thing its caller needs to see:
def my_function():
    return {"Key1": "Value1",
            "Key2": "Value2",
            "Key3": "Value3",
            "Key4": "Value4"}
print(my_function()['Key1']) # Prints Value1
Note that a function ends when its execution hits a return statement (even if it's in the middle of a loop); you can't execute one return now, one return later, keep going, and return two things when you hit the end of the function. If you want to do that, keep a list of things you want to return and return the list when you're done.
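For example, here is a minimal sketch of gathering several values and returning them all at once:
def squares_up_to(n):
    results = []
    for i in range(n):
        results.append(i * i)   # can't return each one and keep looping...
    return results              # ...so return the whole list at the end

print(squares_up_to(4))   # [0, 1, 4, 9]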
You send information into and out of functions with arguments and return values, respectively. This function, for example:
def square(number):
    """Return the square of a number."""
    return number * number
... receives information through the number argument, and sends information back with the return statement. You can use it like this:
>>> x = square(7)
>>> print(x)
49
As you can see, we passed the value 7 to the function, and it returned the value 49 (which we stored in the variable x).
Now, let's say we have another function:
def halve(number):
    """Return half of a number."""
    return number / 2.0
We can send information between two functions in a couple of different ways.
Use a temporary variable:
>>> tmp = square(6)
>>> halve(tmp)
18.0
Or use the first function directly as an argument to the second:
>>> halve(square(8))
32.0
Which of those you use will depend partly on personal taste, and partly on how complicated the thing you're trying to do is.
Even though they have the same name, the number variables inside square() and halve() are completely separate from each other, and they're invisible outside those functions:
>>> number
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'number' is not defined
So, it's actually impossible to "see" the variable my_dict in your example function. What you would normally do is something like this:
def my_function(my_dict):
    # do something with my_dict
    return my_dict
... and define my_dict outside the function.
(It's actually a little bit more complicated than that - dict objects are mutable (which just means they can change), so often you don't actually need to return them. However, for the time being it's probably best to get used to returning everything, just to be safe).
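As a small sketch of that mutability point: the caller and the function share the same dict object, so a change made inside the function is visible outside even without a return:
def add_score(stats):
    stats['score'] = 100   # mutates the caller's dict in place

player = {'name': 'bob'}
add_score(player)
print(player)   # {'name': 'bob', 'score': 100}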
I have been doing a lot of searching, and I don't think I've really found what I have been looking for. I will try my best to explain what I am trying to do, and hopefully there is a simple solution, and I'll be glad to have learned something new.
This is ultimately what I am trying to accomplish: Using nosetests, decorate some test cases using the attribute selector plugin, then execute test cases that match a criteria by using the -a switch during commandline invocation. The attribute values for the tests that are executed are then stored in an external location. The command line call I'm using is like below:
nosetests \testpath\ -a attribute='someValue'
I have also created a customized nosetests plugin which stores the test cases' attributes and writes them to an external location. The idea is that I can select a batch of tests, and by storing the attributes of these tests, I can filter these results later for reporting purposes. I am accessing the method attributes in my plugin by overriding the "wantMethod" method with code similar to the following:
def set_attribs(self, method, attribute):
    if hasattr(method, attribute):
        if not self.method_attributes.has_key(method.__name__):
            self.method_attributes[method.__name__] = {}
        self.method_attributes[method.__name__][attribute] = getattr(method, attribute)

def wantMethod(self, method):
    self.set_attribs(method, "attribute1")
    self.set_attribs(method, "attribute2")
    pass
I have this working for pretty much all the tests, except for one case where the test is using the "yield" keyword. What is happening is that the generated methods execute fine, but the method attributes are empty for each of the generated functions.
Below is an example of what I am trying to achieve. The test below retrieves a list of values, and for each of those values, yields the results from another function:
@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield (lambda x: f(), key)

def _do_test(self, input1, input2):
    # Some code
From what I have read, and think I understand, when yield is called, it would create a new callable function which then gets executed. I have been trying to figure out how to retain the attribute values from my sample_test_generator method, but I have not been successful. I thought I could create a partial method, and then add the attribute to the method, but no luck. The tests execute without errors at all, it just seems that from my plugin's perspective, the method attributes aren't present, so they don't get recorded.
I realize this a pretty involved question, but I wanted to make sure that the context for what I am trying to achieve is clear. I have been trying to find information that could help me for this particular case, but I feel like I've reached a stumbling block now, so I would really like to ask the experts for some advice.
Thanks.
Update:
After reading through the feedback and playing around some more, it looks like if I modified the lambda expression, it would achieve what I am looking for. In fact, I didn't even need to create the partial function:
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        yield (lambda: self._do_test)
The only downside to this approach is that the test name will not change. As I played around more, it looked like when a test generator is used, nosetests actually changes the test name in the result based on the keywords it contains. The same thing was happening when I was using the lambda expression with a parameter.
For example:
Using a lambda expression with a parameter:
yield (lambda x: self._do_test, "value1")
In the nosetests plugin, when you access the test case name, it would be displayed as "sample_test_generator(value1)".
Using lambda expression without a parameter:
yield (lambda: self._do_test)
The test case name in this case would be "sample_test_generator". In my example above, if there are multiple values in the dictionary, then the yield call occurs multiple times, yet the test name always remains "sample_test_generator". This is not as bad as getting unique test names but not being able to store the attribute values at all. I will keep playing around, but thanks for the feedback so far!
EDIT
I forgot to come back and provide my final update on how I was able to get this to work in the end. There was a little confusion on my part at first, but after I looked through it some more, I figured out that it had to do with how the tests are recognized:
My original implementation assumed that every test that gets picked up for execution goes through the "wantMethod" call from the plugin's base class. This is not true when "yield" is used to generate the test, because at this point, the test method has already passed the "wantMethod" call.
However, once the test case is generated through the "yield" call, it does go through the "startTest" call from the plug-in base class, and this is where I was finally able to store the attribute successfully.
So in a nutshell, my test execution order looked like this:
nose -> wantMethod(method_name) -> yield -> startTest(yielded_test_name)
In my override of the startTest method, I have the following:
def startTest(self, test):
# If a test is spawned by using the 'yield' keyword, the test names would be the parent test name, appended by the '(' character
# example: If the parent test is "smoke_test", the generated test from yield would be "smoke_test('input')
parent_test_name = test_name.split('(')[0]
if self.method_attributes.has_key(test_name):
self._test_attrib = self.method_attributes[test_name]
elif self.method_attributes.has_key(parent_test_name):
self._test_attrib = self.method_attributes[parent_test_name]
else:
self._test_attrib = None
With this implementation, along with my override of wantMethod, each test spawned by the parent test case also inherits attributes from the parent method, which is what I needed.
Again, thanks to all who sent replies. This was quite a learning experience.
Would this fix your name issue?
def _actual_test(x, y):
    assert x == y

def test_yield():
    _actual_test.description = "test_yield_%s_%s" % (5, 5)
    yield _actual_test, 5, 5
    _actual_test.description = "test_yield_%s_%s" % (4, 8)  # fail
    yield _actual_test, 4, 8
    _actual_test.description = "test_yield_%s_%s" % (2, 2)
    yield _actual_test, 2, 2
The rename survives @attr too.
Does this work?
@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    def get_f(f, key):
        return lambda x: f(), key
    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield get_f(f, key)

def _do_test(self, input1, input2):
    # Some code
The problem is that the local variables change after you create the lambda.
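Here is a minimal demonstration of that late-binding behavior, independent of nose:
fs = [lambda: i for i in range(3)]
print([f() for f in fs])            # [2, 2, 2] -- every lambda sees the final i

fs = [lambda i=i: i for i in range(3)]
print([f() for f in fs])            # [0, 1, 2] -- a default argument captures each value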
I'm doing some collision detection and I would very much like to use the same function in two different contexts. In one context, I would like for it to be something like
def detect_collisions(item, others):
    return any(collides(item, other) for other in others)
and in another, I would like it to be
def get_collisions(item, others):
    return [other for other in others if collides(item, other)]
I really hate the idea of writing two functions here. Keeping their names straight is one turnoff, and complicating the interface to the collision detection is another. So I was thinking:
import itertools as it

def peek(gen):
    try:
        first = next(gen)
    except StopIteration:
        return False
    else:
        return it.chain((first,), gen)

def get_collisions(item, others):
    get_collisions.all = peek(other for other in others if collides(item, other))
    return get_collisions.all
Now when I just want to do a check, I can say:
if get_collisions(item, others):
    # aw snap
or
if not get_collisions(item, others):
    # w00t
and in the other context where I actually want to examine them, I can do:
if get_collisions(item, others):
    for collision in get_collisions.all:
        # fix it
and in both cases, I don't do any more processing than I need to.
I recognize that this is more code than the first two functions but it also has the advantage of:
Keeping my interface to the collision detection as a tree with the node at the top level instead of a mid level. This seems simpler.
Hooking myself up with a handy peek function. If I use it one other time, then I'm actually writing less code. (In response to YAGNI, if I have it, I will)
So. If you were the proverbial homicidal maniac that knows where I live, would I be expecting a visit from you if I wrote the above code? If so, how would you approach this situation?
Just make get_collisions return a generator:
def get_collisions(item, others):
    return (other for other in others if collides(item, other))
Then, if you want to do a check:
for collision in get_collisions(item, others):
    print 'Collision!'
    break
else:
    print 'No collisions!'
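And since detect_collisions in the question already wraps the check in any(), the generator version composes with it directly; any() stops at the first hit, so the boolean check stays lazy:
if any(get_collisions(item, others)):
    print 'Collision!'  # the generator is consumed only as far as the first hit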
This is very similar to what we were discussing in "pythonic way to rewrite an assignment in an if statement", but this version only handles one positional or one keyword argument per call, so one can always retrieve the actual value cached (not just whether it is a true value in the Pythonic boolean sense or not).
I never really cared for your proposal to accept multiple keywords and have differently named functions depending on whether you wanted all the results put through any() or all() -- but liked the idea of using keyword arguments which would allow a single function to be used in two or more spots simultaneously. Here's what I ended up with:
# can be called with a single unnamed value or a single named value
def cache(*args, **kwargs):
    if len(args) + len(kwargs) == 1:
        if args:
            name, value = 'value', args[0]  # default attr name 'value'
        else:
            name, value = kwargs.items()[0]
    else:
        raise NotImplementedError('"cache" calls require either a single value argument '
                                  'or a name=value argument identifying an attribute.')
    setattr(cache, name, value)
    return value

# add a sub-function to clear the cache attributes (optional and a little weird)
cache.clear = lambda: cache.func_dict.clear()
# you could then use it either of these two ways
if get_collisions(item, others):
    # no cached value

if cache(collisions=get_collisions(item, others)):
    for collision in cache.collisions:
        # fix them
By putting all the ugly details in a separate function, it doesn't affect the code in get_collisions() one way or the other, and is also available for use elsewhere.