Undo in Python with a very large state. Is it possible? - python

This appears simple, but I can't find a good solution.
It's the old 'pass by reference' / 'pass by value' / 'pass by object reference' problem. I understand what is happening, but I can't find a good workaround.
I am aware of solutions for small problems, but my state is very large and extremely expensive to save/recalculate. Given these constraints, I can't find a solution.
Here is some simple pseudocode to illustrate what I would like to do (if Python would let me pass by reference):
class A:
    def __init__(self, x):
        self.g = x
        self.changes = []
    def change_something(self, what, new): # I want to pass 'what' by reference
        old = what                         # and then de-reference it here to read the value
        self.changes.append([what, old])   # store a reference
        what = new                         # dereference and change the value
    def undo_changes(self):
        for c in self.changes:
            c[0] = c[1]                    # dereference and restore old value
Edit: Adding some more pseudocode to show how I would like to use the above
test=A(1) # initialise test.g as 1
print(test.g)
out: 1
test.change_something(test.g,2)
# if my imaginary code above functioned as described in its comments,
# this would change test.g to 2 and store the old value in the list of changes
print(test.g)
out: 2
test.undo_changes()
print(test.g)
out: 1
Obviously the above code doesn't work in Python due to being 'pass by object reference'. Also I'd like to be able to undo a single change, not just all of them as in the code above.
The thing is... I can't seem to find a good workaround. There are solutions out there like these:
Do/Undo using command pattern in Python
making undo in python
Which involve storing a stack of commands. 'Undo' then involves removing the last command and re-building the final state by taking the initial state and re-applying everything but the last command. My state is too large for this to be feasible; the issues are:
The state is very large. Saving it entirely is prohibitively expensive.
'Do' operations are costly (making recalculating from a saved state infeasible).
Do operations are also non-deterministic, relying on random input.
Undo operations are very frequent.
I have one idea, which is to ensure that EVERYTHING is stored in lists, and writing my code so that everything is stored, read from and written to these lists. Then in the code above I can pass the list name and list index every time I want to read/write a variable.
Essentially this amounts to building my own memory architecture and C-style pointer system within Python!
This works, but seems a little... ridiculous? Surely there is a better way?
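For concreteness, here is roughly what that list-and-index scheme amounts to (just a sketch; the names are arbitrary):
class A:
    def __init__(self, x):
        self.data = [x]           # everything lives in lists
        self.changes = []         # stack of (list, index, old_value)
    def change_something(self, container, index, new):
        self.changes.append((container, index, container[index]))
        container[index] = new
    def undo_last_change(self):
        container, index, old = self.changes.pop()
        container[index] = old
a = A(1)
a.change_something(a.data, 0, 2)  # "pointer" = (a.data, 0)
a.undo_last_change()              # a.data[0] is 1 again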

Please check if it helps....
class A:
    def __init__(self, x):
        self.g = x
        self.changes = {}
        self.changes[str(x)] = {'init': x, 'old': x, 'new': x}  # or make a key of your choice (immutable)
    def change_something(self, what, new):  # 'what' is the (immutable) key into the changes dict
        self.changes[what]['old'] = self.changes[what]['new']   # remember the current value
        self.changes[what]['new'] = new                         # record the changed value in your dict
    def undo_changes(self, what):
        self.changes[what]['new'] = self.changes[what]['old']   # revert the latest new value to the old one
For each change you can update the change dictionary. The only thing you have to figure out is how to create an entry for 'what' as a key in the self.changes dictionary; I just made it str(x). Check the type(what) and decide how to make it a key in your case.

Okay so I have come up with an answer... but it's ugly! I doubt it's the best solution. It uses exec() which I am told is bad practice and to be avoided if at all possible. EDIT: see below!
Old code using exec():
class A:
    def __init__(self, x):
        self.old = 0
        self.g = x
        self.h = x*10
        self.changes = []
    def change_something(self, what, new):
        whatstr = 'self.'+what
        exec('self.old='+whatstr)
        self.changes.append([what, self.old])
        exec(whatstr+'=new')
    def undo_changes(self):
        for c in self.changes:
            exec('self.'+c[0]+'=c[1]')
    def undo_last_change(self):
        c = self.changes[-1]
        exec('self.'+c[0]+'=c[1]')
        self.changes.pop()
Thanks to barny, here's a much nicer version using getattr and setattr:
class A:
    def __init__(self, x):
        self.g = x
        self.h = x*10
        self.changes = []
    def change_something(self, what, new):
        self.changes.append([what, getattr(self, what)])  # remember the attribute name and its old value
        setattr(self, what, new)
    def undo_changes(self):
        for c in reversed(self.changes):  # undo in reverse order, in case an attribute was changed twice
            setattr(self, c[0], c[1])
        self.changes = []                 # everything has been rolled back
    def undo_last_change(self):
        c = self.changes.pop()
        setattr(self, c[0], c[1])
To demonstrate, the input:
print("demonstrate changing one value")
b=A(1)
print('g=',b.g)
b.change_something('g',2)
print('g=',b.g)
b.undo_changes()
print('g=',b.g)
print("\ndemonstrate changing two values and undoing both")
b.change_something('h',3)
b.change_something('g',4)
print('g=', b.g, 'h=',b.h)
b.undo_changes()
print('g=', b.g, 'h=',b.h)
print("\ndemonstrate changing two values and undoing one")
b.change_something('h',30)
b.change_something('g',40)
print('g=', b.g, 'h=',b.h)
b.undo_last_change()
print('g=', b.g, 'h=',b.h)
returns:
demonstrate changing one value
g= 1
g= 2
g= 1
demonstrate changing two values and undoing both
g= 4 h= 3
g= 1 h= 10
demonstrate changing two values and undoing one
g= 40 h= 30
g= 1 h= 30
EDIT 2: Actually... after further testing, my initial version with exec() has some advantages over the second. If the class contains a second class, or a list, or whatever, the exec() version has no trouble updating a list within a class within a class; the getattr/setattr version, however, will fail.
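If anyone wants to avoid exec() for the nested case too, one possible middle ground (a sketch I haven't battle-tested; it handles dotted attribute paths such as 'inner.g', but not list indices) is to walk the path with getattr before the final setattr:
from functools import reduce

def resolve_path(obj, path):
    # split 'inner.g' into the owning object and the final attribute name
    *heads, last = path.split('.')
    return reduce(getattr, heads, obj), last

class A:
    def __init__(self, x):
        self.g = x
        self.changes = []
    def change_something(self, what, new):   # 'what' may now be a dotted path like 'inner.g'
        target, name = resolve_path(self, what)
        self.changes.append([what, getattr(target, name)])
        setattr(target, name, new)
    def undo_last_change(self):
        what, old = self.changes.pop()
        target, name = resolve_path(self, what)
        setattr(target, name, old)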

Related

Algorithms to achieve reasonable result with most economic efforts

Suppose that there are six kinds of problems to be handled when reading a book.
I illustrate the details as follows:
while True:
    if encounter A:
        handle A
        # during handling the problem, it might spawn new problems of
        # A, B, C, D, E
        produce(A, B..E or null)
        continue
    if B occurs:
        handle B
        # during handling the problem, it might spawn new problems
        # A, B, C, D, E
        produce(A, B..E or null)
        continue
    if C happens:
        handle C
        produce(A, B..E or null)
        continue
    ...
    if there are no problems:
        break
Assume I have 3 problems; the above program might loop endlessly on the first one and never touch the second.
Take an example: I am reading a book.
'Problem A' is defined as encountering a 'new word'; handling it means looking it up in a dictionary.
When looking it up, I might come across another new word, and another, and another.
In this case, I will never finish reading even one sentence of the book.
As a solution,
I introduce a container to collect problems, weight them by value, and then determine which one to execute.
def solve_problems(problems):
    problem_basket = list(problems)
    while True:
        if problem_basket is not null:
            # value-weight all the problems
            # and determine which one to implement
            value_weight problems
            problem = x
            if problem == A:
                handle A
                problem_basket.append(new_problem)
                continue
            if problem == B:
                handle B
                problem_basket.append(new_problem)
                continue
            ...
        if problem_basket is null:
            return
I tried alternatively to seek inspiration and improve efficiency.
def solve_problems(problems):
    global problem_basket
    problem_basket = list(problems)
    value_weight problems
    problem = x
    if problem == A:
        handle A
        problem_basket.append(new_problem)
        solve_problems(problem_basket)
    if problem == B:
        handle B
        problem_basket.append(new_problem)
        solve_problems(problem_basket)
    if problem == C:
        handle C
        problem_basket.append(new_problem)
        solve_problems(problem_basket)
    ...
    if problem_basket is null:
        return
Again, the value-weighting process consumes huge effort and time.
How can such a problem be solved with a proper algorithm?
'Problem A' is defined as encountering a 'new word'; handling it means looking it up in a dictionary.
When looking it up, I might come across another new word, and another, and another.
In this case, I will never finish reading even one sentence of the book.
Looks like it will eventually end up reading the sentence, since the number of new words is limited by the dictionary size. In general it sounds OK to me, unless there are some other restrictions not explicitly mentioned, like finishing the sentence within a limited time.
How can such a problem be solved with a proper algorithm?
Well, if there is no "limited time" restriction, your original algorithm is almost perfect. To make it even better in terms of overall performance, we might handle all problems A first, then move to B and so on. That will increase data locality and the overall performance of the algorithm.
But if there is a "limited time" restriction, we can end up reading the full sentence in that time (without full understanding), or reading part of the sentence (fully understanding that part), or something in between (as suggested by @Lauro Bravar).
From the example above it is not quite clear how we do the value_weight, but the proper name for this kind of problem is priority queueing. There is a variety of algorithms and implementations; please have a look at the Wikipedia page for the details: https://en.wikipedia.org/wiki/Priority_queue
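As a minimal illustration (assuming the weight is just a number, with bigger meaning more urgent), Python's heapq module can serve as the queue:
import heapq

problem_basket = []
heapq.heappush(problem_basket, (-80, 'A'))  # negate the weight: heapq always pops the smallest item first
heapq.heappush(problem_basket, (-10, 'C'))
heapq.heappush(problem_basket, (-50, 'B'))

while problem_basket:
    weight, problem = heapq.heappop(problem_basket)  # most urgent problem first
    # handle the problem; handling may push new problems onto the basket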
You can do several things to approach such a problem.
One of them is to set a value of "max-iterations" or "max-effort" in a Machine Learning style that you can invest into reading a book. Therefore you will execute (handle) only up to a number of actions, until the limit has been reached. This solution will look like:
while(effortRemaining > 0){
# Do your actions
}
The actions you do should be the ones that report more benefit/less effort according to some metric that you need to define.
When you perform a certain action (handle) you subtract the cost/effort of that action from effortRemaining and you continue with your flow.
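A rough Python sketch of that budget loop (the problems, costs, and selection rule here are made up purely for illustration):
effort_remaining = 100
problems = [('A', 5), ('B', 20), ('C', 10)]   # (name, cost) pairs

while effort_remaining > 0 and problems:
    problems.sort(key=lambda p: p[1])          # pick the cheapest action; swap in your own benefit/effort metric
    name, cost = problems.pop(0)
    if cost > effort_remaining:
        break                                  # not enough effort left for any remaining action
    # handle the problem here; handling may append new problems to the list
    effort_remaining -= cost                   # pay for the action just performed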
You already have the algorithm (with the help of Andriy for the priority queue), but you lack the design. When I see your multiple ifs that check the type of the problem, I think of polymorphism.
Why not try OOP? You have two objects to define: a problem and a priority queue. Fortunately, the priority queue is defined in the heapq module. Let's focus on the problem:
in its core definition, it is handled and may be compared to other problems (it is more or less urgent). Note that, guided by the OOP principles, I do not talk of the structure or implementation of a problem,
but only of the functions of a problem:
class Problem():
    def handle(self, some args): # we may need a dictionary, a database connection, ...
        ...
    def compare(self, other):
        ...
But you said that when a problem is handled, it may add new problems to the queue. So let's refine the definition of handle:
    def handle(self, queue, some args): # we still may need a dictionary, a database connection, ...
        ...
In Python, compare is a special method named __lt__, for "lower than". (You have other special comparison methods, but __lt__ will be sufficient here.)
Here's a basic implementation example:
import heapq
import random

class Problem():
    def __init__(self, name, weight):
        self.__name = name
        self.__weight = weight
    def handle(self, queue):
        print("handle the problem {}".format(self))
        while random.random() > 0.75: # randomly add new problems for the example
            new_problem = Problem(self.__name*2, random.randint(0, 100))
            print("-> Add the problem {} to the queue".format(new_problem))
            heapq.heappush(queue, new_problem) # add the problem to the queue
    def __lt__(self, other):
        return self.__weight > other.__weight # note the >
    def __repr__(self): # to show in lists
        return "Problem({}, {})".format(self.__name, self.__weight)
Wait! Why "lower than" and a >? That's because the module heapq is a min-heap: it returns the smallest element first. Thus, we define the big weights as smaller than the little weights.
Now, we can build an initial queue with fake data for the example:
queue = []
for name in ["foo", "bar", "baz"]:
    problem = Problem(name, random.randint(0, 100))
    heapq.heappush(queue, problem) # add the problem to the queue
And run the main loop:
while queue:
    print("Current queue", queue)
    problem = heapq.heappop(queue) # the problem with the max weight in O(lg n)
    problem.handle(queue)
I guess you will be able to subclass the Problem class to represent the various problems you might want to handle.

Getting checkbutton variables values on every notebook tabs Tkinter Python

On every Tkinter notebook tab, there is a list of checkbuttons, and their variables get saved to their corresponding v[ ] (i.e. cb.append(Checkbutton(..., variable = v[x], ...))).
For now, I am encountering this error:
File "/home/pass/OptionsInterface.py", line 27, in __init__
self.ntbk_render(f = self.f1, ntbkLabel="Options",cb = optsCb, msg = optMsg)
File "/home/pass/OptionsInterface.py", line 59, in ntbk_render
text = msg[x][1], command = self.cb_check(v, opt)))
File "/home/pass/OptionsInterface.py", line 46, in cb_check
opt[ix]=(v[ix].get())
IndexError: list assignment index out of range
And I think the error is coming from here. I don't know how to access the values of the checkbutton variables.
def cb_check(self, v = [], cb = [], opt = []):
    for ix in range(len(cb)):
        opt[ix]=(v[ix].get())
    print opt
Here are some snippets:
def cb_check(self, v = [], cb = [], opt = []):
    for ix in range(len(cb)):
        opt[ix]=(v[ix].get())
    print opt

def ntbk_render(self, f=None, ntbkLabel="", cb = [], msg = []):
    v = []
    opt = []
    msg = get_thug_args(word = ntbkLabel, argList = msg) # Allows to get the equivalent list (2d array)
                                                         # to serve as texts for their corresponding checkboxes
    for x in range(len(msg)):
        v.append(IntVar())
        off_value = 0
        on_value = 1
        cb.append(Checkbutton(f, variable = v[x], onvalue = on_value, offvalue = off_value,
                              text = msg[x][1], command = self.cb_check(v, opt)))
        cb[x].grid(row=self.rowTracker + x, column=0, sticky='w')
        opt.append(off_value)
        cb[-1].deselect()
After solving the error, I want to get all the values of the checkbutton variables of each tab after pressing the button Ok at the bottom. Any tips on how to do it will help!
Alright, so there’s a bit more (… alright, maybe a little more than a bit…) here than I intended, but I’ll leave it on the assumption that you’ll simply take away from it what you need or find of value.
The short answer is that when your Checkbutton calls cb_check, it’s passing the arguments like this:
cb_check(self = self, v = v, cb = opt, opt = [])
I think it's pretty obvious why you're getting an IndexError when we write it out like this: you're using the length of your opt list for indexes into the empty list that the function uses when opt is not supplied; in other words, if you have 5 options, then it will try accessing indices [0…4] on the empty list [] (obviously, it stops as soon as it fails to access index 0). Your function doesn't know that the things you're passing it are called v and opt: it simply takes the references you give it and places them in the order of the positional arguments, filling in keyword arguments in order after that, and then fills out the rest of the keyword arguments with whatever defaults you told it to use.
Semi-Quick Aside:
When trying to fix an error, if I have no idea what went wrong, I would start by inserting a print statement right before it breaks with all the references that are involved in the broken line, which will often tell you what references do not contain the values you thought they had. If this looks fine, then I would step in further and further, checking any lookups/function returns for errors. For example:
def cb_check(self, v = [], cb = [], opt = []):
    for ix in range(len(cb)):
        print(ix, opt, v)      ## First check, for sanity's sake
        print(v[ix])           ## Second check if I still can't figure it out, but
                               ## this is a lookup, not an assignment, so it
                               ## shouldn't be the problem
        print(v[ix].get())     ## Third check, again, not an assignment
        print(opt[ix])         ## "opt[ix]={something}" is an assignment, so this is
                               ## (logically) where it's breaking. Here we're only
                               ## doing a lookup, so we'll get a normal IndexError
                               ## instead (it won't say "assignment")
        opt[ix]=(v[ix].get())  ## point in code where IndexError was raised
The simple fix would be to change the Checkbutton command to “lambda: self.cb_check(v,cb,opt)” or more explicitly (so we can do a sanity check) “lambda: self.cb_check(v = v, cb = cb, opt = opt).” (I’ll further mention that you can change “lambda:” to “lambda v = v, cb = cb, opt = opt:” to further ensure that you’ll forever be referring to the same lists, but this should be irrelevant, especially because of changes I’ll suggest below)
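Applied to the Checkbutton line from your ntbk_render snippet, that fix would look something like this:
cb.append(Checkbutton(f, variable = v[x], onvalue = on_value, offvalue = off_value,
                      text = msg[x][1],
                      command = lambda: self.cb_check(v, cb, opt)))  # deferred: runs only when the button is clicked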
[The rest of this is: First Section- an explicit explanation of what your code is doing and a critique of it; second section- an alternative approach to how you have this laid out. As mentioned, the above fixes your problem, so the rest of this is simply an exercise in improvement]
In regards to your reference names-
There’s an old adage “Code is read much more often than it is written,” and part of the Zen of Python says: “Explicit is better than Implicit.[…] Readability counts.” So don’t be afraid to type a little bit more to make it easier to see what’s going on (same logic applies to explicitly passing variables to cb_check in the solution above). v can be varis; cb can be cbuttons; ix would be better (in my opinion) as ind or just plain index; f (in ntkb_render) should probably be parent or master.
Imports-
It looks like you’re either doing star (*) imports for tkinter, or explicitly importing parts of it. I’m going to discourage you from doing either of these things for two reasons. The first is the same reason as above: if only a few extra keystrokes makes it easier to see where everything came from, then it’s worth it in the long run. If you need to go through your code later to find every single tkinter Widget/Var/etc, then simply searching “tk” is a lot easier than searching “Frame” then “Checkbutton” then IntVar and so on. Secondly, imports occasionally clash: so if- for example- you do
import time ## or from time import time, even
from datetime import time
life may get kinda hairy for you. So it would be better to “import tkinter as tk” (for example) than the way you are currently doing it.
cb_check-
There are a few things I’ll point out about this function:
1) v, cb, and opt are all required for the function to work correctly; if the empty list reference is used instead, then it’s going to fail unless you created 0 Checkbuttons (because there wouldn’t be anything to iterate over in the “for loop”; regardless, this doesn’t seem like it should ever happen). What this means is that they’re better off simply being positional arguments (no default value). Had you written them this way, the function would have given you an error stating that you weren’t giving it enough information to work with rather than a semi-arbitrary “IndexError.”
2) Because you supply the function with all the information it needs, there is no practical reason (based on the code supplied, at any rate) as to why the function needs to be a method of some object.
3) This function is being called each time you select a Checkbutton, but reupdates the recorded values (in opt) of all the Checkbuttons (instead of just the one that was selected).
4) The opt list is technically redundant: you already have a reference to a list of all the IntVars (v), which are updated/maintained in real time without you having to do anything; it is basically just as easy to perform v[ix].get() as it is to do opt[ix]: in exchange for the “.get()” call when you eventually need the value you have to include a whole extra function and run it repeatedly to make sure your opt list is up to date. To complicate matters further, there’s an argument for v also being redundant, but we’ll get to that later.
And as an extra note: I’m not sure why you wrapped the integer value of your IntVar (v[ix].get()) with parentheses; they seem extraneous, but I don’t know if you’re trying to cast the value in the same manner as C/Java/etc.
ntbk_render-
Again, notice that this function is given nearly everything it needs to be executed, and therefore feels less like a method than a stand-alone function (at this moment; again, we'll get to this at the end). The way it's set up also means that it requires all of that information, so those would be better off as positional arguments, as above.
The cb reference-
Unlike v and opt, the cb reference can be supplied to the function. If we follow cb along its path through the code, we’ll find out that its length must always be equal to v and opt. Assumedly, the reason we may want to pass cb to this method but not v or opt is because we only care about the reference to cb in the rest of our code. However, notice that cb is always an empty iterable with an append method (seems safe to assume it will always be an empty list). So either we should be testing to make sure that it’s empty before we start doing anything with it (because it will break our code if it isn’t), or we should just create it at the same time that we’re creating v and opt. Not knowing how your code is set up, I personally would think it would be easiest to initialize it alongside the other two and then simply return it at the end of the method (putting “return cb” at the end of this function and “cb=[whatever].ntbk_render(f = someframe, ntbklabel = “somethug”, msg = argList)”). Getting back to the redundancy of opt and v (point 4 in cb_check), since we’re keeping all the Checkbuttons around in cb, we can use them to access their IntVars when we need to.
msg-
You pass msg to the function, then use it as the value of "argList" in get_thug_args and replace it with the result. I think it would make more sense to call the keyword that you pass to ntbk_render "argList", because that's what it is going to be used for, and then simply let msg be the returned value of get_thug_args. (The same line of thought applies to the keyword "ntbkLabel", for the record.)
Iterating-
I'm not sure if using an index reference (x) is just a habit picked up from more rigid programming languages like C and Java, but iterating is probably one of my favorite advantages (subjective, I know) that Python has over those types of languages. Instead of using x to get your option out of msg, you can simply step through each individual option inside msg. The only place that we run into insurmountable problems is when we use self.rowTracker (which, on the subject, is not updated in your code; we'll fix that for now, but as before, we'll be dealing with that later). What we can do to amend this is utilize the enumerate function built into Python; this will create a tuple containing the current index followed by the value at the iterated index.
Furthermore, because you’re keeping everything in lists, you have to keep going back to the index of the list to get the reference. Instead, simply create references to the things (datatypes/objects) you are creating and then add the references to the lists afterwards.
Below is an adjustment to the code thus far based on most of the things I noted above:
import tkinter as tk  ## tk now refers to the instance of the tkinter module that we imported

def ntbk_render(self, parent, word, argList):
    cbuttons = list()  ## The use of "list()" here is purely personal preference; feel free to
                       ## continue with brackets
    msg = get_thug_args(word=word, argList=argList)  ## returns a 2d array [ [{some value},
                                                     ## checkbutton text, ...], ...]
    for x, option in enumerate(msg):
        ## Each iteration automatically does x=current index, option=msg[current_index]
        variable = tk.IntVar()
        ## off and on values for Checkbuttons are 0 and 1 respectively by default, so it's
        ## redundant at the moment to assign them
        chbutton = tk.Checkbutton(parent, variable=variable, text=option[1])
        chbutton.variable = variable  ## rather than carrying the variable references around,
                                      ## I'm just going to tack them onto the checkbutton they
                                      ## belong to
        chbutton.grid(row=self.rowTracker + x, column=0, sticky='w')
        chbutton.deselect()
        cbuttons.append(chbutton)
    self.rowTracker += len(msg)  ## Updating the rowTracker
    return cbuttons

def get_options(self, cbuttons):
    ## I'm going to keep this new function fairly simple for clarity's sake
    values = []
    for chbutton in cbuttons:
        value = chbutton.variable.get()  ## It is for this purpose that we made
                                         ## chbutton.variable = variable above
        values.append(value)
    return values
Yes, parts of this are a bit more verbose, but any mistakes in the code are going to be much easier to spot because everything is explicit.
Further Refinement
The last thing I’ll touch on- without going into too much detail because I can’t be sure how much of this was new information for you- is my earlier complaints about how you were passing references around. Now, we already got rid of a lot of complexity by reducing the important parts down to just the list of Checkbuttons (cbuttons), but there are still a few references being passed that we may not need. Rather than dive into a lot more explanation, consider that each of these Notebook Tabs are their own objects and therefore could do their own work: so instead of having your program add options to each tab and carry around all the values to the options, you could relegate that work to the tab itself and then tell it how or what options to add and ask it for its options and values when you need them (instead of doing all that work in the main program).
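To make that last idea concrete, here is a rough sketch (the class name, layout, and button wiring are mine, not from your code) of a tab that owns its checkbuttons and hands back its values on request:
import tkinter as tk
from tkinter import ttk

class OptionsTab(ttk.Frame):
    ## A notebook tab that builds and owns its own checkbuttons
    def __init__(self, parent, options):
        super().__init__(parent)
        self.cbuttons = []
        for row, text in enumerate(options):
            variable = tk.IntVar(value=0)
            cb = tk.Checkbutton(self, text=text, variable=variable)
            cb.variable = variable             ## keep each IntVar with its button
            cb.grid(row=row, column=0, sticky='w')
            self.cbuttons.append(cb)

    def get_values(self):
        return [cb.variable.get() for cb in self.cbuttons]

## usage sketch: the Ok button just asks each tab for its values
root = tk.Tk()
notebook = ttk.Notebook(root)
tab = OptionsTab(notebook, ["option one", "option two"])
notebook.add(tab, text="Options")
notebook.pack()
ok = tk.Button(root, text="Ok", command=lambda: print(tab.get_values()))
ok.pack()
root.mainloop()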

Modify *existing* variable in `locals()` or `frame.f_locals`

I have found some vaguely related questions to this question, but not any clean and specific solution for CPython. And I assume that a "valid" solution is interpreter specific.
First the things I think I understand:
locals() gives a non-modifiable dictionary.
A function may (and indeed does) use some kind of optimization to access its local variables
frame.f_locals gives a locals() like dictionary, but less prone to hackish things through exec. Or at least I have been less able to do hackish undocumented things like the locals()['var'] = value ; exec ""
exec is capable of doing weird things to the local variables, but it is not reliable --e.g. I read somewhere that it doesn't work in Python 3. Haven't tested.
So I understand that, given those limitations, it will never be safe to add extra variables to the locals, because it breaks the interpreter structure.
However, it should be possible to change a variable that already exists, shouldn't it?
Things that I considered
In a function f, one can access the f.func_code.co_nlocals and f.func_code.co_varnames.
In a frame, the variables can be accessed / checked / read through the frame.f_locals. This is in the use case of setting a tracer through sys.settrace.
One can easily access the function in which a frame is --considering the use case of setting a trace and using it to "do things" with the local variables given a certain trigger or whatever.
The variables should be somewhere, preferably writeable... but I am not capable of finding them. Even if it is an array (for efficient interpreter access), or I need some extra C-specific wiring, I am ready to commit to it.
How can I achieve that modification of variables from a tracer function or from a decorated wrapped function or something like that?
A full solution will be of course appreciated, but even some pointers will help me greatly, because I'm stuck here with lots of non writeable dictionaries :-/
Edit: Hackish exec is doing things like this or this
There is an undocumented C-API call for doing things like this:
PyFrame_LocalsToFast
There is some more discussion in this PyDev blog post. The basic idea seems to be:
import ctypes
...

frame.f_locals.update({
    'a': 'newvalue',
    'b': other_local_value,
})
ctypes.pythonapi.PyFrame_LocalsToFast(
    ctypes.py_object(frame), ctypes.c_int(0))
I have yet to test if this works as expected.
Note that there might be some way to access the Fast directly, to avoid an indirection if the requirement is only modifying an existing variable. But, as this seems to be a mostly non-documented API, the source code is the documentation resource.
Based on the notes from MariusSiuram, I wrote a recipe that shows the behavior.
The conclusions are:
we can modify an existing variable
we can delete an existing variable
we can NOT add a new variable.
So, here is the code:
import inspect
import ctypes

def parent():
    a = 1
    z = 'foo'

    print('- Trying to add a new variable ---------------')
    hack(case=0)  # just try to add a new variable 'b'
    print(a)
    print(z)
    assert a == 1
    assert z == 'foo'
    try:
        print(b)
        assert False  # never is going to reach this point
    except NameError, why:
        print("ok, global name 'b' is not defined")

    print('- Trying to remove an existing variable ------')
    hack(case=1)
    print(a)
    assert a == 2
    try:
        print(z)
    except NameError, why:
        print("ok, we've removed the 'z' var")

    print('- Trying to update an existing variable ------')
    hack(case=2)
    print(a)
    assert a == 3

def hack(case=0):
    frame = inspect.stack()[1][0]
    if case == 0:
        frame.f_locals['b'] = "don't work"
    elif case == 1:
        frame.f_locals.pop('z')
        frame.f_locals['a'] += 1
    else:
        frame.f_locals['a'] += 1

    # passing c_int(1) will remove and update variables as well
    # passing c_int(0) will only update
    ctypes.pythonapi.PyFrame_LocalsToFast(
        ctypes.py_object(frame),
        ctypes.c_int(1))

if __name__ == '__main__':
    parent()
The output would be like:
- Trying to add a new variable ---------------
1
foo
ok, global name 'b' is not defined
- Trying to remove an existing variable ------
2
foo
- Trying to update an existing variable ------
3

short name for a string/dict/list index

In Python I seem to need to frequently make dicts/lists within dicts/lists within dicts/lists and then access these structures in complex if/elif/else trees. Is there some way that I could make a shorthand way of accessing a certain level of this data structure, to make the code more concise?
This is an example line of code now:
schema[exp][node]['properties']['date'] = i.partition('(')[2].rpartition(')')[0].strip()
which is followed by a whole heap of other lines starting with "schema[exp][node]['properties']['foo']"
What I would like is something like:
reference_maker(schema[exp][node]['properties']['date'], schema_props)
schema_props['date'] = i.partition('(')[2].rpartition(')')[0].strip()
but I can't even really think where to begin.
If you're not worried about it changing:
schema_props = schema[exp][node]['properties']
schema_props['date'] = ...
But if you want the reference to hang around and auto-update:
schema_props = lambda: schema[exp][node]['properties']
schema_props()['date'] = ...
node = node + 1
# this now uses the next node
schema_props()['date'] = ...
Or without the lambda:
def schema_props():
    return schema[exp][node]['properties']

schema_props()['date'] = ...
node = node + 1
# this now uses the next node
schema_props()['date'] = ...
Not sure I understand but what’s the problem with the following?
schema_props = schema[exp][node]['properties']
schema_props['date'] = i.partition('(')[2].rpartition(')')[0].strip()
Of course, you have to be careful that schema_props always points to a still valid entry in your dict. Ie. once you manually reset schema[exp][node]['properties'] your schema_props reference will not update the original dict anymore.
For more elaborate indirection handling, you could build your own collection types which may then always keep a reference to the base dict. (See also: http://docs.python.org/2/library/collections.html#collections-abstract-base-classes)
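For example (just a sketch of that idea, not from the linked docs), a tiny wrapper that re-resolves the path on every access keeps working even after node changes:
from collections.abc import MutableMapping

class PathProxy(MutableMapping):
    # dict-like view that looks up its target through a callable on every access
    def __init__(self, resolve):
        self._resolve = resolve          # e.g. lambda: schema[exp][node]['properties']
    def __getitem__(self, key):
        return self._resolve()[key]
    def __setitem__(self, key, value):
        self._resolve()[key] = value
    def __delitem__(self, key):
        del self._resolve()[key]
    def __iter__(self):
        return iter(self._resolve())
    def __len__(self):
        return len(self._resolve())

# usage sketch:
# schema_props = PathProxy(lambda: schema[exp][node]['properties'])
# schema_props['date'] = i.partition('(')[2].rpartition(')')[0].strip()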

Is there a reason not to send super().__init__() a dictionary instead of **kwds?

I just started building a text based game yesterday as an exercise in learning Python (I'm using 3.3). I say "text based game," but I mean more of a MUD than a choose-your-own adventure. Anyway, I was really excited when I figured out how to handle inheritance and multiple inheritance using super() yesterday, but I found that the argument-passing really cluttered up the code, and required juggling lots of little loose variables. Also, creating save files seemed pretty nightmarish.
So, I thought, "What if certain class hierarchies just took one argument, a dictionary, and just passed the dictionary back?" To give you an example, here are two classes trimmed down to their init methods:
class Actor:
    def __init__(self, in_dict, **kwds):
        super().__init__(**kwds)
        self._everything = in_dict
        self._name = in_dict["name"]
        self._size = in_dict["size"]
        self._location = in_dict["location"]
        self._triggers = in_dict["triggers"]
        self._effects = in_dict["effects"]
        self._goals = in_dict["goals"]
        self._action_list = in_dict["action list"]
        self._last_action = ''
        self._current_action = ''  # both ._last_action and ._current_action get updated by .update_action()

class Item(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._can_contain = in_dict["can contain"]  # boolean entry
        self._inventory = in_dict["inventory"]      # either a list or dict entry

class Player(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._inventory = in_dict["inventory"]  # entry should be a Container object
        self._stats = in_dict["stats"]
Example dict that would be passed:
playerdict = {'name': '', 'size': '0', 'location': '', 'triggers': None, 'effects': None, 'goals': None, 'action list': None, 'inventory': Container(), 'stats': None}
(The None's get replaced by {} once the dictionary has been passed.)
So, in_dict gets passed to the previous class instead of a huge payload of **kwds.
I like this because:
It makes my code a lot neater and more manageable.
As long as the dicts have at least some entry for the key called, it doesn't break the code. Also, it doesn't matter if a given argument never gets used.
It seems like file IO just got a lot easier (dictionaries of player data stored as dicts, dictionaries of item data stored as dicts, etc.)
I get the point of **kwds (EDIT: apparently I didn't), and it hasn't seemed cumbersome when passing fewer arguments. This just appears to be a comfortable way of dealing with a need for a large number of attributes at the creation of each instance.
That said, I'm still a major python noob. So, my question is this: Is there an underlying reason why passing the same dict repeatedly through super() to the base class would be a worse idea than just toughing it out with nasty (big and cluttered) **kwds passes? (e.g. issues with the interpreter that someone at my level would be ignorant of.)
EDIT:
Previously, creating a new Player might have looked like this, with an argument passed for each attribute.
bob = Player('bob', Location = 'here', ... etc.)
The number of arguments needed blew up, and I only included the attributes that really needed to be present to not break method calls from the Engine object.
This is the impression I'm getting from the answers and comments thus far:
There's nothing "wrong" with sending the same dictionary along, as long as nothing has the opportunity to modify its contents (Kirk Strauser) and the dictionary always has what it's supposed to have (goncalopp). The real answer is that the question was amiss, and using in_dict instead of **kwds is redundant.
Would this be correct? (Also, thanks for the great and varied feedback!)
I'm not sure I understand your question exactly, because I don't see how the code looked before you made the change to use in_dict. It sounds like you have been listing out dozens of keywords in the call to super (which is understandably not what you want), but this is not necessary. If your child class has a dict with all of this information, it can be turned into kwargs when you make the call with **in_dict. So:
class Actor:
    def __init__(self, **kwds):
        ...

class Item(Actor):
    def __init__(self, **kwds):
        self._everything = kwds
        super().__init__(**kwds)
I don't see a reason to add another dict for this, since you can just manipulate and pass the dict created for kwds anyway.
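With the classes written that way, constructing an instance from a stored dict is just a matter of unpacking it (a hypothetical example, reusing the playerdict idea from the question with the Item class sketched above):
playerdict = {'name': 'bob', 'size': '0', 'location': 'here',
              'triggers': None, 'effects': None, 'goals': None}

bob = Item(**playerdict)        # equivalent to Item(name='bob', size='0', ...)
print(bob._everything['name'])  # the child class still has the whole dict available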
Edit:
As for the question of the efficiency of using the ** expansion of the dict versus listing the arguments explicitly, I did a very unscientific timing test with this code:
import time

def some_func(**kwargs):
    for k, v in kwargs.items():
        pass

def main():
    name = 'felix'
    location = 'here'
    user_type = 'player'
    kwds = {'name': name,
            'location': location,
            'user_type': user_type}

    start = time.time()
    for i in range(10000000):
        some_func(**kwds)
    end = time.time()
    print 'Time using expansion:\t{0}s'.format(start - end)

    start = time.time()
    for i in range(10000000):
        some_func(name=name, location=location, user_type=user_type)
    end = time.time()
    print 'Time without expansion:\t{0}s'.format(start - end)

if __name__ == '__main__':
    main()
Running this 10,000,000 times gives a slight (and probably statistically meaningless) advantage passing around a dict and using **.
Time using expansion: -7.9877269268s
Time without expansion: -8.06108212471s
If we print the IDs of the dict objects (kwds outside and kwargs inside the function), you will see that python creates a new dict for the function to use in either case, but in fact the function only gets one dict forever. After the initial definition of the function (where the kwargs dict is created) all subsequent calls are just updating the values of that dict belonging to the function, no matter how you call it. (See also this enlightening SO question about how mutable default parameters are handled in python, which is somewhat related)
So from a performance perspective, you can pick whichever makes sense to you. It should not meaningfully impact how python operates behind the scenes.
I've done that myself where in_dict was a dict with lots of keys, or a settings object, or some other "blob" of something with lots of interesting attributes. That's perfectly OK if it makes your code cleaner, particularly if you name it clearly like settings_object or config_dict or similar.
That shouldn't be the usual case, though. Normally it's better to explicitly pass a small set of individual variables. It makes the code much cleaner and easier to reason about. It's possible that a client could pass in_dict = None by accident and you wouldn't know until some method tried to access it. Suppose Actor.__init__ didn't peel apart in_dict but just stored it like self.settings = in_dict. Sometime later, Actor.method comes along and tries to access it, then boom! Dead process. If you're calling Actor.__init__(var1, var2, ...), then the caller will raise an exception much earlier and provide you with more context about what actually went wrong.
So yes, by all means: feel free to do that when it's appropriate. Just be aware that it's not appropriate very often, and the desire to do it might be a smell telling you to restructure your code.
This is not python specific, but the greatest problem I can see with passing arguments like this is that it breaks encapsulation. Any class may modify the arguments, and it's much more difficult to tell which arguments are expected in each class - making your code difficult to understand, and harder to debug.
Consider explicitly consuming the arguments in each class, and calling the super's __init__ on the remaining. You don't need to make them explicit:
class ClassA(object):
    def __init__(self, arg1, arg2=""):
        pass

class ClassB(ClassA):
    def __init__(self, arg3, arg4="", *args, **kwargs):
        ClassA.__init__(self, *args, **kwargs)

ClassB(3, 4, 1, 2)
You can also leave the variables uninitialized and use methods to set them. You can then use different methods in the different classes, and all subclasses will have access to the superclass methods.
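A minimal sketch of that last suggestion (the class and method names here are only illustrative):
class Actor(object):
    def set_name(self, name):   # setter defined once on the superclass...
        self._name = name

class Player(Actor):
    def set_stats(self, stats): # ...while subclasses add their own setters
        self._stats = stats

p = Player()
p.set_name('bob')               # a Player can use both its own and the superclass methods
p.set_stats({'hp': 10})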
