I need a magic tool that helps me understand where one of my problem variables gets changed in the code.
I know about the perfect tool:
pdb.set_trace()
and I need something in a similar format, but for the change history of a single variable.
For example, my current problem is a strange value of the context['request'] variable inside a Django template tag definition method. The value is the string '<<request>>', and I don't understand where it was changed from Django's Request object. I can't debug it, because the problem appears only rarely, but persistently: I only see it in error emails and I can't reproduce it on demand. The perfect solution would be a log of the variable's assignment and every modification.
I'm not really familiar with Django, so your mileage may vary. In general, you can override the __setitem__ method on objects to capture item assignment. However, this doesn't work on plain dictionaries, only on user-defined classes, so first of all it depends on what this context object is.
From a short look at the Django docs, it is indeed not a regular dict, so you can try something like this:
def log_setitem(obj):
    class Logged(obj.__class__):
        def __setitem__(self, item, val):
            print("setting", item, "to", val, "on", self)
            super().__setitem__(item, val)
    obj.__class__ = Logged

d = {}
try:
    log_setitem(d)  # raises TypeError: a plain dict's __class__ can't be reassigned
except TypeError:
    print("doesn't work")

class Dict2(dict):
    pass

d2 = Dict2()
log_setitem(d2)         # this works
d2["hello"] = "world"   # prints the log message before assigning
Even if this works, it only works if the assignment actually happens through the "standard" route, i.e. somewhere in the code there is a call like context['request'] = "something".
Might be worth a try, but I can't promise you anything.
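Since the asker wants a persistent log rather than print output, a variant of the same idea could record each assignment together with a short stack trace showing where it came from. A sketch, assuming Python 3 (the log filename and the 'request' filter are illustrative):

import logging
import traceback

logging.basicConfig(filename="context_changes.log", level=logging.DEBUG)

def log_setitem_traced(obj):
    class Logged(obj.__class__):
        def __setitem__(self, item, val):
            if item == 'request':  # only log the key we care about
                logging.debug("setting %r to %r at:\n%s", item, val,
                              "".join(traceback.format_stack(limit=8)))
            super().__setitem__(item, val)
    obj.__class__ = Logged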
Something is bugging me.
I'm following along with this beginner tutorial for django (cs50) and at some point we receive a string back from a form submission and want to add it to a list:
https://www.youtube.com/watch?v=w8q0C-C1js4&list=PLhQjrBD2T380xvFSUmToMMzERZ3qB5Ueu&t=5777s
def add(request):
    if 'tasklist' not in request.session:
        request.session['tasklist'] = []

    if request.method == 'POST':
        form_data = NewTaskForm(request.POST)
        if form_data.is_valid():
            task = form_data.cleaned_data['task']
            request.session['tasklist'] += [task]
            return HttpResponseRedirect(reverse('tasks:index'))
I've checked the type of request.session['tasklist'] and Python shows it's a list.
The task variable is a string.
So why doesn't request.session['tasklist'].append(task) work properly? I can see the value being added to the list via some print statements, but then it is 'forgotten' again: it doesn't seem to be permanently added to the task list.
Why do we use request.session['tasklist'] += [task] instead?
The only thing I could find is https://ogirardot.wordpress.com/2010/09/17/append-objects-in-request-session-in-django/ but that refers to a site that no longer exists.
The code works fine, but I'm trying to understand why you need to use a different operation and can't / shouldn't use the append method.
Thanks.
The reason it does not work is that Django does not see that you have changed anything in the session when you use the append() method on a list stored in the session.
What you are doing here is essentially pulling out a reference to the list and making changes to it without the session backend knowing anything about it. Another way to explain it:
The append() method is on the list itself, not on the session object.
When you call append() on the list, you are only talking to the list; the list's parent (the session) has no idea what you two are doing.
When you do an assignment on the session itself, session['whatever'] = 'something', it knows that something is up and that changes were made.
So the key here is that you need to operate on the session object directly if you want your changes to be picked up automatically.
Django only thinks it needs to save a changed session item if the item got reassigned to the session. See here: the Django session base code, where the __setitem__ method contains a self.modified = True statement.
session['list'] += [new_element] mutates the list stored in the session (so the list reference stays the same) and then reassigns it to the session: first a __getitem__ call reads the value, then your += / __iadd__ runs on the value read, then a __setitem__ call is made (with the same list reference passed to it). You can see in the Django codebase that it marks the session as modified after each __setitem__ call.
session['list'] = session['list'] + [new_item] does the same thing, but creates a new list every time it runs, so it's a bit less efficient; you should not store hundreds of items in the session anyway, so you're probably fine. This also works exactly as above.
However, if you use sub-keys in the session, like session['list']['x'] = 'whatever', the session will not see itself as modified, so you need to mark it yourself with request.session.modified = True.
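To tie these cases together, here is a minimal sketch (the view name and the 'prefs' key are made up for illustration):

def add_task(request):
    # 1. In-place mutation: list.append() never goes through the session's
    #    __setitem__, so the session is NOT marked as modified.
    request.session['tasklist'].append('task')

    # 2. Augmented assignment: read via __getitem__, extend in place via
    #    __iadd__, write back via __setitem__ -> session IS marked modified.
    request.session['tasklist'] += ['task']

    # 3. Mutating a nested value also goes unnoticed; flag it by hand.
    request.session['prefs']['theme'] = 'dark'
    request.session.modified = True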
Short answer: It's about how Python chooses to implement the dict data structure.
Long answer:
Let's start by saying that request.session is a dictionary.
Quoting Django's documentation: "By default, Django only saves to the session database when the session has been modified – that is if any of its dictionary values have been assigned or deleted".
So, the problem is that the session database is not being modified by
request.session['tasklist'].append(task)
Looking at the relevant parts of Django's session base code (as posted by @Csaba K. in another answer), the self.modified attribute is set to True when the __setitem__ dunder method is called.
So at this point the problem looks like this: the __setitem__ dunder method is not called by request.session['tasklist'].append(task), but it is called by request.session['tasklist'] += [task]. This is not a matter of whether the reference of request.session['tasklist'] changes, as suggested in another answer; the reference to the underlying list remains the same.
To confirm, let's create a custom dictionary that extends the Python dict and prints something when its __setitem__ dunder method is called.
class MyDict(dict):
    def __init__(self, globalVar):
        super().__init__()
        self.globalVar = globalVar

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        print("Called Set item when: ", end="")

myDict = MyDict(0)
print("Creating Dict")
print("-----")
myDict["y"] = []
print("Adding a new key-value pair")
print("-----")
myDict["y"] += ["x"]
print(" using +=")
print("-----")
myDict["y"].append("x")
print("append")
print("-----")
myDict["y"].extend(["x"])
print("extend")
print("-----")
myDict["y"] = myDict["y"] + ["x"]
print(" using +")
print("-----")
It prints:
Creating Dict
-----
Called Set item when: Adding a new key-value pair
-----
Called Set item when: using +=
-----
append
-----
extend
-----
Called Set item when: using +
-----
As we can see, the __setitem__ dunder method is called (and in turn self.modified is set to True) only when adding a new key-value pair, or when using += or +, but not when initializing, appending, or extending the stored list. Now, the operators + and += do very different things in Python, as explained in the other answer, and += behaves more like the append method; but in this case I guess it is less about how +, += and append behave on lists and more about how Python desugars augmented assignment on a dict entry, which always ends with a write-back through __setitem__.
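Spelled out against the MyDict demo above, the augmented assignment roughly desugars to three steps, which is why the write-back always reaches __setitem__:

# Roughly what Python does for myDict["y"] += ["x"]:
tmp = myDict.__getitem__("y")   # read the stored list
tmp = tmp.__iadd__(["x"])       # list.__iadd__ extends in place, returns the same list
myDict.__setitem__("y", tmp)    # write-back: the call Django's session hooks into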
I found this while doing some more searching:
https://code.djangoproject.com/wiki/NewbieMistakes
Scroll to 'Appending to a list in session doesn't work'
Again, it is a very dated entry but still seems to hold true.
I'm not completely satisfied, because this does not answer the question of why it doesn't work, but at the very least it confirms that 'something's up', and you should probably still follow the recommendations there.
(if anyone out there can actually explain this in a more verbose manner then I'd be happy to hear it)
I want to use context vars for a purpose similar to the one in this question and its accepted answer: Context variables in Python
That corresponds to f3a() in this example:
import contextvars

user_id = contextvars.ContextVar("user_id_var")

def test():
    user_id.set("SOME-DATA")
    f2()

def f2():
    f3a()
    f3b()

def f3a():
    print(user_id.get())

def f3b():
    ctx = contextvars.copy_context()
    for key, value in ctx.items():
        if key.name == 'user_id_var':
            print(value)
            break

test()
However, the function needs the user_id global variable to get the value. If it were in a different module, it would have to import it.
My idea was that if a function knows a context exists and knows the variable's name, that should be all it needs. I wrote f3b, but as you can see, I have to search through all the variables, because context vars do not support lookup by name. Lookup by the ContextVar object is implemented, but if I already had the variable, I could get the value directly from it (the f3a case).
I'm afraid I don't understand why it was designed this way. Why is an agreed-upon name not the key? If a context is set up in some kind of framework and then used by application code, those two functions will live in different modules without a common module-level global. The examples I could find did not help me. Could somebody please explain the rationale behind the contextvars API?
This is the best I've worked out to make it actually make sense when I use it:

from contextvars import copy_context

ctx = {ctx_var.name: {"context_var": ctx_var, "value": value}
       for ctx_var, value in copy_context().items()}
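For example, assuming every ContextVar of interest has a unique name (duplicate names are legal, and a mapping like this would silently keep only one of them), lookup then works by name:

import contextvars

user_id = contextvars.ContextVar("user_id_var")
user_id.set("SOME-DATA")

# Build a name -> value mapping once, then look values up by name.
ctx_by_name = {var.name: value
               for var, value in contextvars.copy_context().items()}
print(ctx_by_name["user_id_var"])  # prints SOME-DATA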
You can get the value of the user_id variable this way:

import contextvars

user_id = contextvars.ContextVar("user_id_var")
user_id.set("SOME-DATA")  # the variable must be set before it shows up in the context

ctx = contextvars.copy_context()
ctx[user_id]
# or
ctx.get(user_id)

The official documentation mentions something that might relate to your concern:
The notion of "current value" deserves special consideration: different asynchronous tasks that exist and execute concurrently may have different values for the same key
and Making Context objects picklable
This is something I have been trying to solve for three days now, and I just can't get my head around why it is not working.
I have a method that creates a new version of an object. It used to work like this: you would pass in the sou object, and this would be the source from which a new version is created. You can also pass in a destination, which is not really important in this example. Now I wanted to add locking to this method, since we want to support multiple users, so I want to be sure that I always have the most current object from which to create a new one. So I added a line that simply fetches the newest object. If there is no newer object in the database, it is the same one anyway.
def createRevision(request, what, sou, destination=None, ignore=[], **args):
    ...
    if "initial" not in args.keys():
        source = get_object_or_404(BaseItem, ppk=sou.ppk,
                                   project=sou.project, current=True)
        print("------------")
        print(source == sou)  # this outputs True
        print("------------")
    else:
        source = sou
Further down in the method I do something like:
source.current = False
source.save()
Basically the idea is that I pass in a BaseItem, and if I don't specify the "initial" keyword, I get the current item of that project with the same ppk (which is a special random pk used in conjunction with current). I do this just to be on the safe side, so that I really have the most current object. And if it is the initial version, I just use the one passed in, as there cannot be another version yet.
So now the problem: everything works fine if I use sou in this method; I can save it, etc. But as soon as I use source, and "initial" is not in the args, it just doesn't save. The print statements tell me the two are equal, and everything I print after the save tells me it has been saved, but it just isn't.
source.current = False
source.save()
print("SAVED !!!!")
print(source.pk)
print(source.current)

rofl = get_object_or_404(BaseItem, pk=source.pk, project=sou.project)
print(rofl.pk)
print(source.current)
This outputs the same pk and the same current value, but somehow the change is not properly saved: as soon as I look into the Django admin or do a select on current = True, the object still shows up as current.
I really don't know what to do anymore.
Why does it work without a problem if I pass the object into the method, but start to fail when I fetch the exact same object inside the method?
Of course I call the method with the same object:
x = get_object_or_404(BaseItem, ppk=sou.ppk, project=sou.project, current=True)
createRevision(request, "", x)
Thank you pztrick for the hint about the caches. I finally solved it. The problem was that I was doing:
x = get_object_or_404(BaseItem, ppk=sou.ppk, project=sou.project, current=True)
createRevision(request, "", x)
# .... loads of lines of code
unlock(x)
unlock is a method I wrote that just sets a timestamp, so I know no other user is editing the object. So the problem was that I was saving x in createRevision with all the correct data, but unlock(x) still held a reference to the old, not-updated object and of course saved it again, thereby overwriting my changes from createRevision.
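A minimal sketch of one way to avoid such a stale write (the locked_at field name is made up, and refresh_from_db() requires Django 1.8+):

def unlock(item):
    item.refresh_from_db()  # reload the row so stale in-memory fields are discarded
    item.locked_at = None
    item.save(update_fields=['locked_at'])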
Thank you again to everyone who helped with this.
I think you may be running afoul of model manager caching, which is intended to limit database queries. However, by invoking the .all() method on the model manager you force it to hit the database again.
So, try this: replace your BaseItem class argument with the model manager's .all() QuerySet:
source = get_object_or_404(BaseItem.objects.all(), ppk=sou.ppk, project=sou.project, current=True)
# ...
rofl = get_object_or_404(BaseItem.objects.all(), pk=source.pk, project=sou.project)
get_object_or_404 supports model classes, model managers, or QuerySets as the first parameter, so this is valid.
I have been doing a lot of searching, and I don't think I've really found what I have been looking for. I will try my best to explain what I am trying to do, and hopefully there is a simple solution, and I'll be glad to have learned something new.
This is ultimately what I am trying to accomplish: using nosetests, decorate some test cases with the attribute selector plugin, then execute the test cases that match a criterion via the -a switch on the command line. The attribute values of the executed tests are then stored in an external location. The command-line call I'm using looks like this:
nosetests \testpath\ -a attribute='someValue'
I have also created a customized nosetests plugin, which stores the test cases' attributes and writes them to an external location. The idea is that I can select a batch of tests, and by storing the attributes of these tests, I can filter on the results later for reporting purposes. I am accessing the method attributes in my plugin by overriding the wantMethod method with code similar to the following:
def set_attribs(self, method, attribute):
    if hasattr(method, attribute):
        if method.__name__ not in self.method_attributes:
            self.method_attributes[method.__name__] = {}
        self.method_attributes[method.__name__][attribute] = getattr(method, attribute)

def wantMethod(self, method):
    self.set_attribs(method, "attribute1")
    self.set_attribs(method, "attribute2")
I have this working for pretty much all the tests, except for one case where the test uses the yield keyword. What happens is that the generated methods execute fine, but the method attributes are empty for each of the generated functions.
Below is an example of what I am trying to achieve. The test below retrieves a list of values and, for each of those values, yields the results from another function:
@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield (lambda x: f(), key)

def _do_test(self, input1, input2):
    # Some code
    pass
From what I have read, and think I understand, when yield is called it creates a new callable which then gets executed. I have been trying to figure out how to retain the attribute values from my sample_test_generator method, but I have not been successful. I thought I could create a partial function and then add the attribute to that, but no luck. The tests execute without any errors; it just seems that from my plugin's perspective the method attributes aren't present, so they don't get recorded.
I realize this is a pretty involved question, but I wanted to make sure that the context for what I am trying to achieve is clear. I have been trying to find information that could help me with this particular case, but I feel like I've reached a stumbling block, so I would really like to ask the experts for some advice.
Thanks.
Update:
After reading through the feedback and playing around some more, it looks like modifying the lambda expression achieves what I am looking for. In fact, I didn't even need to create the partial function:
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        yield (lambda: self._do_test)
The only downside to this approach is that the test name does not change. As I played around more, it turned out that in nosetests, when a test generator is used, the test name in the results actually changes based on the keywords it contains. The same thing was happening when I was using the lambda expression with a parameter.
For example:
Using a lambda expression with a parameter:
yield (lambda x: self._do_test, "value1")
In a nosetests plugin, when you access the test case name, it is displayed as "sample_test_generator(value1)".
Using a lambda expression without a parameter:
yield (lambda: self._do_test)
The test case name in this case is "sample_test_generator". In my example above, if there are multiple values in the dictionary, the yield call occurs multiple times, yet the test name always remains "sample_test_generator". This is not as bad as getting unique test names but then not being able to store the attribute values at all. I will keep playing around, but thanks for the feedback so far!
EDIT
I forgot to come back and provide my final update on how I got this to work in the end. There was a little confusion on my part at first, and after I looked through it some more, I figured out that it had to do with how the tests are recognized:
My original implementation assumed that every test that gets picked up for execution goes through the wantMethod call from the plugin's base class. This is not true when yield is used to generate the tests, because at that point the parent test method has already passed through the wantMethod call.
However, once a test case is generated through the yield call, it does go through the startTest call from the plugin base class, and this is where I was finally able to store the attributes successfully.
So in a nutshell, my test execution order looks like this:
nose -> wantMethod(method_name) -> yield -> startTest(yielded_test_name)
In my override of the startTest method, I have the following:
def startTest(self, test):
    # If a test is spawned by using the 'yield' keyword, the test name is the
    # parent test name with '(...)' appended, e.g. if the parent test is
    # "smoke_test", the generated test is "smoke_test('input')".
    test_name = test.id()  # assumption: the original snippet omitted how test_name is derived
    parent_test_name = test_name.split('(')[0]
    if test_name in self.method_attributes:
        self._test_attrib = self.method_attributes[test_name]
    elif parent_test_name in self.method_attributes:
        self._test_attrib = self.method_attributes[parent_test_name]
    else:
        self._test_attrib = None
With this implementation, along with my override of wantMethod, each test spawned by the parent test case also inherits the attributes of the parent method, which is what I needed.
Again, thanks to all who sent replies. This was quite a learning experience.
Would this fix your name issue?
def _actual_test(x, y):
    assert x == y

def test_yield():
    _actual_test.description = "test_yield_%s_%s" % (5, 5)
    yield _actual_test, 5, 5
    _actual_test.description = "test_yield_%s_%s" % (4, 8)  # fail
    yield _actual_test, 4, 8
    _actual_test.description = "test_yield_%s_%s" % (2, 2)
    yield _actual_test, 2, 2
The rename survives @attr too.
Does this work?

@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    def get_f(f, key):
        return lambda x: f(), key
    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield get_f(f, key)

def _do_test(self, input1, input2):
    # Some code
    pass
The problem is that the local variables change after you create the lambda; get_f captures their current values instead.
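A quick demonstration of that late-binding behaviour, and of capturing the current value the way get_f does:

# Every lambda sees the *final* value of i ...
fns = [lambda: i for i in range(3)]
print([f() for f in fns])    # [2, 2, 2]

# ... unless the current value is captured, e.g. with a default argument
# (a factory function like get_f above achieves the same thing).
fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])    # [0, 1, 2]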
I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement, I decided to use a whole bunch of if statements that call the appropriate method, as below:
if type == 'extractTitle':
    extractTitle(dom)
if type == 'extractMetaTags':
    extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML, and I thought about taking the dictionary approach I found elsewhere on this site, example below:
{
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}[type](dom)
I know that each time I run the script the dictionary will be rebuilt, while with the if statements the script would have to check through them one by one until it hits the correct one. What I am really wondering is: which one performs better, and which is generally better practice?
Update: @Brian - Thanks for the great reply. I have a question: what if one of the extract methods requires more than one object, e.g.

def handle_extractTag(self, dom, anotherObject):
    # Do something

How would you make the appropriate changes to the handle method to implement this? I hope you know what I mean :)
Cheers
To avoid specifying both the tag and the handler in the dict, you could just use a handler class with methods named to match the type, e.g.
class MyHandler(object):
    def handle_extractTitle(self, dom):
        # do something
        pass

    def handle_extractMetaTags(self, dom):
        # do something
        pass

    def handle(self, type, dom):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the handler. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when the argument signature changes), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
    func = getattr(self, 'handle_%s' % type, None)
    if func is None:
        raise Exception("No handler for type %r" % type)
    return func(*args, **kwargs)
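Usage then looks like this, with handle_extractTag and anotherObject being the asker's hypothetical names:

handler = MyHandler()
handler.handle('extractTag', dom, anotherObject)  # extra arguments are passed through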
With the code you're running, your functions all get called.

handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags,
}

handlers[type](dom)

would work like your original if code.
It depends on how many if statements we're talking about; if it's a very small number, the if chain will be more efficient than using a dictionary.
However, as always, I strongly advise you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
Your use of the dictionary is not quite correct. In your implementation, all methods will be called and all the useless results discarded. What is usually done is more like this:

switch_dict = {'extractTitle': extractTitle,
               'extractMetaTags': extractMetaTags}

switch_dict[type](dom)

That way is faster and more extensible if you have a large (or variable) number of items.
The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, while the if statements have to be evaluated one at a time. Dictionaries tend to be quicker.
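If you want to measure the claim on your own handlers, a rough timeit sketch like this works (absolute numbers will vary; kind is used here instead of shadowing the built-in type):

import timeit

setup = """
def extractTitle(dom): pass
def extractMetaTags(dom): pass
handlers = {'extractTitle': extractTitle, 'extractMetaTags': extractMetaTags}
kind, dom = 'extractMetaTags', None
"""

# Dict dispatch: one hash lookup, regardless of how many handlers exist.
print(timeit.timeit("handlers[kind](dom)", setup=setup))

# Chained tests: each comparison is evaluated until one matches.
print(timeit.timeit(
    "extractTitle(dom) if kind == 'extractTitle' else extractMetaTags(dom)",
    setup=setup))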
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle(object):
    def process(self, dom):
        return something

class ExtractMetaTags(object):
    def process(self, dom):
        return something
Instead of setting type = "extractTitle", you'd do this:

type = ExtractTitle()  # or ExtractMetaTags() or ExtractWhatever()
type.process(dom)
Then, you wouldn't be building this particular dictionary or if-statement.