Python: How to add/initialize new global vars in another module?

I looked up other posts on the topic and I couldn't find my situation exactly. It is in a Django app, although I believe it's purely a (newbie) Python question. Here's my situation:
Let's say I have mymodule.py where I have various constants and common functions, and at some point elsewhere in the program, I will want to add (and initialize) another attribute for mymodule (if it hasn't yet been added):
import mymodule

class UserView(View):
    # this method always gets called first..
    def get(self, request):
        try:
            # check if attribute exists
            mymodule.user_data
        except AttributeError:
            # add it if it doesn't
            mymodule.user_data = mymodule.get_user_data()
        # continue on..

    # sometime later, this method is called..
    def post(self, request):
        print(mymodule.user_data)
My assumption was that once mymodule.user_data is added, it would persist as a global variable, right? Even though I do set it in the get() method first, when I try to read it in the post() method later, I get Error: 'module' object has no attribute 'account'
Does it need to be pre-initialized in mymodule.py, as some empty object? I may not necessarily know what type of object it will be -- how would I do it in Python? (Sorry, coming from JS -- don't shoot!)

You should not do this. Your proposed solution is very dangerous, as now all users will share the same data. You almost certainly don't want that.
For per-user data shared between requests, you should use the session.
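A minimal sketch of the session version of the code from the question (it assumes get_user_data() returns something the session backend can serialize):

from django.views.generic import View
import mymodule

class UserView(View):
    def get(self, request):
        # computed once per user, stored server-side, shared across that user's requests
        if 'user_data' not in request.session:
            request.session['user_data'] = mymodule.get_user_data()
        # continue on..

    def post(self, request):
        print(request.session['user_data'])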
Edit
There's no way to know if they are separate processes or not. Your server software (Apache, or whatever) will determine the number of processes to run (based on your settings), and automatically route requests between them. Each process could serve any number of requests before being killed and restarted. So, in all likelihood, two consecutive requests could indeed be served by the same process, in which case the data will indeed collide.
Note that the session data is stored on the server (only a key is stored in the user's cookie), so size shouldn't be a consideration. See the sessions documentation.

You should not want to do that.
But it works as "expected": just do
mymodule.variable = value
anywhere in your code.
So, yes, your example code is setting the variable in the current running program -
but then you hit the part where I said: "you should not want to do that" :-)
Because Django, when running with production settings, will behave differently than a single-process, single-thread Python application.
In this case, if the variable is not set in mymodule when you try to access it later, it may be because this access is happening in another process entirely (thus, "global variables" - actually, in Python, "module" variables - won't work, since they are set per process).
In this particular case, since you have a function to retrieve your desired value, and you may be worried that it is expensive to compute, you should memoize it - check the documentation on django.utils.functional.memoize (which will change to django.utils.lru_cache.lru_cache in upcoming versions) - https://docs.djangoproject.com/en/dev/releases/1.7/ - this way it will be called once per process in your application, even when requests are served from separate processes.
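On current Python, the standard library's functools.lru_cache does the same job; a sketch assuming get_user_data takes no arguments (the returned payload here is a hypothetical stand-in):

from functools import lru_cache

@lru_cache(maxsize=1)
def get_user_data():
    # the expensive lookup runs once per process; later calls return the cached result
    return {"plan": "basic"}  # hypothetical stand-in for the real queryset evaluation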

My solution (for now):
In the module mymodule.py, I initialized a dictionary: data = {}
Then in my get() method:
if 'user' not in mymodule.data:
    mymodule.data['user'] = mymodule.get_user_data()
Subsequently, I'm able to retrieve the mymodule.data['user'] object in the post() method (and presumably elsewhere in my code). Seems to work but please let me know if it's an aberration!

Related

Storing and loading str:func Python dictionaries

Alright so I want to know if doing this would affect the code, and if I'm doing it correctly.
So basically, let's say in one file I have a dictionary called commands (inside a class), and in another an object of that class is made, and the dictionary is used. During run-time, I edit the dictionary and add new functions. Now I need to reload the dictionary without restarting the whole script (because that would affect a lot of people using my services), so suppose I send a signal to the script (it's a socket server) indicating that the dictionary should be reloaded. How would I re-import the module after it's already been imported mid-code? And would re-importing it affect the objects made from it, or do I have to somehow reload the objects? (Note that the objects contain an active socket, and I do not wish to kill that socket.)
It is better to store the data in a database like Redis, which supports dictionary-like data. This way you avoid the reloading problem altogether, as the database process ensures the fetched data is always up to date.
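A sketch with the redis-py client, keeping command names in a Redis hash and resolving them to functions at call time (the key and handler names are illustrative):

import redis  # assumes the redis-py package and a running Redis server

r = redis.Redis()

# register or update a command at any time, without restarting the server
r.hset('commands', 'greet', 'handlers.greet')

def dispatch(name):
    # fetched fresh on every call, so updates take effect immediately
    path = r.hget('commands', name)
    if path is None:
        raise KeyError(name)
    module_name, func_name = path.decode().rsplit('.', 1)
    module = __import__(module_name, fromlist=[func_name])
    return getattr(module, func_name)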

Memory leak in django when keeping a reference of all instances of forms

This is a followup to this thread.
I have implemented the method for keeping a reference to all my forms in an array, as mentioned by the selected answer there, but unfortunately I am getting a memory leak on each request to the Django host.
The code in question is as follows:
This is my custom form that I am extending, which has a mechanism to keep a reference to neighboring forms; whenever I instantiate a new form, it just keeps getting added to the _instances stack.
from django.forms import ModelForm

class StepForm(ModelForm):
    TAGS = []
    _instances = []

    def __new__(cls, *args, **kwargs):
        instance = object.__new__(cls)
        cls._instances.append(instance)
        return instance
Even though this is more of a Python issue than a Django one, I have decided that it's better to show you the full context in which I am encountering this problem.
As requested, I am posting what I am trying to accomplish with this:
I have a JS applet with steps, and for each step there is a form, but in order to load the contents of each step dynamically through JS I need to execute some calls on the next form in line, and on the previous one as well. Therefore the only solution I could come up with is to just keep a reference to all forms on each request and use the form functions I need.
Well it's not only a Python issue - the execution context (here a Django app) is important too. As Ludwik Trammer rightly comments, you're in a long running process, and as such anything at the module or class level will live for the duration of the process. Also, if more than one process is used to serve the app, you may (and will) get inconsistent results from one request to another, since two subsequent requests from the same user can (and will) end up being served by different processes.
To make a long story short: the way to safely keep per-user persistent state in a web application is to use sessions. Please explain what problem you're trying to solve, there's very probably a more appropriate (and possibly existing and tested) solution.
EDIT: ok, what you're looking for is a "wizard". There are a couple of available implementations for Django, but most of them don't handle going back - which, from experience, can get tricky when each step depends on the previous one (and that's one of the driving points for using a wizard). What one usually does is have a Wizard class (plain old Python object) with a set of forms; a rough sketch follows the list below.
The wizard takes care of
step to step navigation
instantiating forms
maintaining state (which includes storing and retrieving each step's form data, revalidating, etc.).
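A bare-bones sketch of the shape of such a class (purely illustrative; a real one would also handle validation errors and serialization of the stored data):

class Wizard(object):
    form_classes = []  # one form class per step, set by subclasses

    def __init__(self, session):
        self.session = session  # all per-user state lives in the session
        self.step = session.get('wizard_step', 0)

    def get_form(self, data=None):
        # rebuild the current step's form on every request
        return self.form_classes[self.step](data)

    def save_step(self, form):
        # store this step's data and advance to the next step
        self.session['wizard_data_%d' % self.step] = form.cleaned_data
        self.step += 1
        self.session['wizard_step'] = self.step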
FWIW I've had rather mixed success using Django's existing session-based wizard. We rolled our own for another project (with somewhat complex requirements) and while it works I wouldn't call it a success either. Having ajax and file uploads thrown into the mix doesn't help either. Anyway, you can try to start with an existing implementation, see how it fits your needs, and go for a custom solution if it doesn't - generic solutions sometimes make things harder than they have to be.
My 2 cents...
The leak is not just a side effect of your code - it's part of its core function. It is not possible to remove the leak without changing what the code does.
It does exactly what it is programmed to do - every time the form is displayed, a new instance is created and added to the _instances list. It is never removed from the list. As a consequence, after 100 requests you will have a list with 100 instances, after 1,000 requests there will be 1,000 instances in the list, and so on - until all memory is exhausted and the program crashes.
What did you want to accomplish by keeping all instances of your form? And what else did you expect to happen?
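For what it's worth, if a registry of live form instances really is needed, a weakref.WeakSet would at least keep the registry itself from holding them in memory (a sketch, not part of the original answer):

import weakref

from django.forms import ModelForm

class StepForm(ModelForm):
    # instances drop out of the set automatically once nothing else references them
    _instances = weakref.WeakSet()

    def __new__(cls, *args, **kwargs):
        instance = object.__new__(cls)
        cls._instances.add(instance)
        return instance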

Reloading global Python variables in a long running process

I have celery Python worker processes that are restarted every day or so. They execute Python/Django programs.
I have set certain quasi-global values that should persist in memory for the duration of the process. Namely, I have certain MySQL querysets that do not change often and are therefore evaluated one time and stored as a CONSTANT as soon as the process starts (a bad example being PROFILE = Profile.objects.get(user_id=5)).
Let's say that I want to reset this value in the celery process without exec-ing a whole new program.
This value is imported (and used) in a number of different modules. I'm assuming I'd have to go through each one in sys.modules that imports the CONSTANT and delete/reset the key? Is that right?
This seems very hacky. I usually use external services like Memcached for coordinating memory among multiple processes, but every once in a while, I figure local memory is preferable to over-the-network calls to a NoSQL store.
It's a bit hard to say without seeing some code, but importing just sets a reference, exactly as with variable assignment: that is, if the data changes, the reference sees the change too. Naturally, though, this only works if it's the parent context that you've imported (otherwise assignment rebinds the name, rather than updating the value).
In other words, if you do this:
from mypackage import mymodule
do_something_with(mymodule.MY_CONSTANT)
# elsewhere
mymodule.MY_CONSTANT = 'new_value'
then all references to mymodule.MY_CONSTANT will get the new value. But if you did this:
from mypackage.mymodule import MY_CONSTANT
# elsewhere
mymodule.MY_CONSTANT = 'new_value'
the original reference won't get the new value, because the import bound the local name MY_CONSTANT directly to the old object; rebinding mymodule.MY_CONSTANT elsewhere doesn't touch that name, so it still points at the old value.
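A self-contained way to see the difference (assuming mymodule.py contains MY_CONSTANT = 'old'):

import mymodule
from mymodule import MY_CONSTANT  # binds a local name to the current object

mymodule.MY_CONSTANT = 'new_value'  # rebinds the module attribute

print(mymodule.MY_CONSTANT)  # 'new_value' - looked up on the module at use time
print(MY_CONSTANT)           # 'old' - the local name still points at the old object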

Are module level variables safe when used with uWSGI?

I am seeing an odd error where a variable I create at the module scope -- as in, at the top of the file before any classes or functions are defined -- is behaving differently over time. This variable (let's call it _cache) gets pulled into my classes:
_cache = None

class XMLGenerator(object):
    def __init__(self, parms):
        global _cache
        if _cache is None:
            _cache = expensive_query(parms)
The results of this cache can be different depending on the context of the request coming into the web services, but I am seeing differing behavior in the resulting XML output between calls to the same service: I can restart the server and everything is great, but eventually the anomalous behavior begins again.
Is uWSGI preserving state between requests somehow?
I wanted to circle back and explain what happened here. Global variables are, in fact, not "refreshed" between requests to the same service in uWSGI. Thus, if you create a module-level variable, it will carry state between multiple requests. This, obviously, was not what I intended, so I ended up passing a caching object around between the different calls into XMLGenerator. It made the API quite ugly, but avoided the issue with module-level variables.
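A rough sketch of that shape (the names and constructor signature here are hypothetical, not the actual code):

class RequestCache(object):
    """Holds query results for the lifetime of a single request."""
    def __init__(self):
        self.results = {}

def handle_request(parms):
    cache = RequestCache()  # created fresh per request, so no cross-request state
    generator = XMLGenerator(parms, cache=cache)  # hypothetical extra parameter
    return generator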
If you are doing this with multiple workers then you probably want to use uWSGI's CachingFramework:
http://projects.unbit.it/uwsgi/wiki/CachingFramework
Otherwise I believe _cache can be different across the workers.
Also, you could test with uwsgi --processes 1 to see if the problem goes away.
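A sketch of what using the caching framework might look like (this assumes a cache was configured at startup, e.g. with --cache2 name=mycache,items=100; uWSGI cache values are byte strings, and the uwsgi module is only importable when running under uWSGI):

import uwsgi  # provided by the uWSGI runtime, not pip-installable as such

def cached_query(key, parms):
    value = uwsgi.cache_get(key, 'mycache')
    if value is None:
        value = expensive_query(parms)  # expensive_query from the question
        uwsgi.cache_set(key, value, 0, 'mycache')  # 0 means no expiry
    return value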

How do I disable history in python mechanize module?

I have a web scraping script that gets new data once every minute, but over the course of a couple of days, the script ends up using 200MB or more of memory, and I found out it's because mechanize is keeping an infinite browser history for the .back() function to use.
I have looked in the docstrings, and I found the clear_history() function of the browser class, and I invoke that each time I refresh, but I still get 2-3MB higher memory usage on each page refresh. Edit: Hmm, it seems as if it kept doing the same thing after I called clear_history, up until I got to about 30MB worth of memory usage, then it cleared back down to 10MB or so (which is the base amount of memory my program starts up with)... any way to force this behavior on a more regular basis?
How do I keep mechanize from storing all of this info? I don't need to keep any of it. I'd like to keep my Python script below 15MB of memory usage.
You can pass an argument history=whatever when you instantiate the Browser; the default value is None which means the browser actually instantiates the History class (to allow back and reload). The simplest approach (will give an attribute error exception if you ever do call back or reload):
import mechanize

class NoHistory(object):
    def add(self, *a, **k): pass
    def clear(self): pass

b = mechanize.Browser(history=NoHistory())
A cleaner approach would implement other methods in NoHistory to give clearer exceptions on erroneous use of the browser's back or reload, but this simple one should suffice otherwise.
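Such a variant might look like this (a sketch; it assumes, as in mechanize's own History class, that the browser calls back() on the history object):

class NoHistory(object):
    def add(self, *a, **k): pass
    def clear(self): pass
    def close(self): pass
    def back(self, *a, **k):
        raise RuntimeError("history is disabled, so back() is unsupported")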
Note that this is an elegant (though not well documented;-) use of the dependency injection design pattern: in a (bleah) "monkeypatching" world, the client code would be expected to overwrite b._history after the browser is instantiated, but with dependency injection you just pass in the "history" object you want to use. I've often maintained that Dependency Injection may be the most important DP that wasn't in the "gang of 4" book!-).
