Safest place for initialization code - Python

My application has a datastore entry that needs to be initialized with some default values when the app is first deployed. I have a page that lets administrators of the app edit those values later, so it's a problem if the initialization code runs again and overwrites those edits.
I initially tried putting code in appengine_config.py, but that's clearly not correct, as any new values for the entity were overwritten after a few page loads. I thought about putting it in main.py, before the call to run_wsgi_app(), but it's my understanding that main.py is run whenever App Engine creates a new instance of the application. Warmup requests seem to have the same problem as appengine_config.py.
Is there a way to do what I'm trying to do?

Typically you could use appengine_config.py or an explicit handler.
If you use appengine_config.py, your code should check for the value's existence, and only define a default when no value exists.
My main concern with one-time initialisation code in appengine_config.py is that the existence check for these initial values will be performed on every instance startup. If there is a lot to check, that's an overhead on warm starts that you may not want.
For any initialisation code for a new instance, you will have this problem of checking existence no matter what strategy you adopt: you must ensure that whatever process initialises the default values runs at most once.
Personally, I would have a specific handler that you call only once, and which checks that it hasn't already run before taking any action, in case it is called again.
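A minimal sketch of that idea (all names are illustrative, and a plain dict stands in for the datastore; on App Engine you would use something like ndb's `get_or_insert`, which performs the check-and-create atomically):

```python
# Run-at-most-once default initialization, sketched with a plain dict
# standing in for the datastore. DEFAULTS and ensure_defaults are
# illustrative names, not App Engine APIs.

DEFAULTS = {"site_title": "My App", "items_per_page": 20}

def ensure_defaults(store):
    """Create default settings only for keys that don't exist yet,
    so later administrator edits are never overwritten."""
    created = []
    for key, value in DEFAULTS.items():
        if key not in store:
            store[key] = value
            created.append(key)
    return sorted(created)
```

Calling this from a one-off admin handler (or even on every instance startup) is safe, because existing values are left untouched.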

Related

bokeh multiple live streaming graphs in different objects / register update routine located in other class

I use Python and Bokeh to implement streamed live graphing. I want to include several live graphs in a gridplot and have run into a kind of "deadlock".
The graphs (there are a lot of them) are created by different classes and the figure objects are returned and then used as input to the gridplot() function.
For live graphing, curdoc().add_periodic_callback(update1, 300) registers the update routine. I call the update routines of the other graphs directly from update1(). This works, but continuously gives me the following error:
`raise RuntimeError("_pending_writes should be non-None when we have a document lock, and we should have the lock when the document changes")`
This is expected behavior since data of the other graphs is altered from the 'outside' of their object and from an 'unregistered update routine'. I want to get rid of this error.
In my main object (where the layout is pieced together and curdoc().add_root() is called) I intend to register the other graphs' update routines (which have to be regular object methods, so that they can be referenced) via curdoc().add_periodic_callback(). The problem with this approach is that the objects' update functions take the self parameter, and Bokeh does not accept that.
Yet I cannot do it without self, because update() needs to reference the source.stream object.
I have no clue how to solve this or do it the 'correct' way. Suggestions are appreciated.
thanks
for clarification:
main object:

```python
def graph(self):
    # ... bokeh code ...

    @count()
    def update(t):
        # ... update code ...

    curdoc().add_root(gridplot([[l[0]], [l[1]]], toolbar_location="left", plot_width=1000))
    curdoc().add_periodic_callback(update, 300)
```

this works
generic other object:

```python
def graph(self):
    # ... bokeh code ...

    def update(self, t):
        # ... update code ...
```

main object:

```python
curdoc().add_periodic_callback(other_object.update, 300)
```

this does NOT work.
"_pending_writes should be non-None when we have a document lock, and we should have the lock when the document changes"
Disclaimer: I've been dealing with this error in my own work for two weeks now, and finally resolved the issue today. (: It's easy when every sample you see comes with a csv file that's read and pushed to the doc, all in the same thread, but when things get real and you have a streaming client doing the same thing, suddenly everything stops working.
The general issue is, Bokeh server needs to keep its version of the document model in sync with what Bokeh client has. This happens through a series of events and communication happening between the client (Javascript running in the browser) and the server (inside an event loop that we will get to later on).
So every time you need to change the document, which essentially affects the model, the document needs to be locked (the simplest reason I could think of is concurrency). The simplest way to get around this issue, is to tell the Bokeh document instance you hold, that you are in need of making a change - and request a callback, so Bokeh manages when to call your delegate and allow you to update the document.
Now, with that said, there are a few methods on Bokeh's Document class that help you request a callback.
Based on your use case, the method you would probably want, if you need a callback as soon as possible, is add_next_tick_callback.
One thing to remember is that, the reference/pointer to your doc must be correct.
In order to make sure of that, I did wrap all my charting application into a class, and kept an instance of doc internally to access its add_next_tick_callback when new data is received. The way I could point at the right instance, was to initialize my Bokeh app using bokeh.server.server.Server - when it initializes the app, you will receive a doc variable that it's created before starting the server - that would be the right reference to the doc you present in the app. One benefit for having this "chart initializer" in a class, is that you can instantiate it as many times as you may need to construct more charts/documents.
Now, if you are a fan of data pipelines, and streaming, and use something like StreamZ to stream the data to the Pipe or Buffer instance you may have, you must remember one thing:
Be aware of what happens asynchronously, in a thread, or outside of it. Bokeh relies on tornado.ioloop.IOLoop for the most part, and if you are anywhere near running things asynchronously, you must have come across asyncio.
The event loops on these two modules can conflict, and will affect how/when you can change the document.
If you are running your streaming in a thread (as the streaming client I wrote did), make sure that thread has a current event loop, otherwise you will face other, similar issues. Threads can conflict with internally created loops and affect how they interact with each other.
You can give a thread its own loop with something like the following:
asyncio.set_event_loop(asyncio.new_event_loop())
Finally, be aware of what @gen.coroutine does in Tornado. As I understood it, your streaming callbacks must be decorated with @gen.coroutine if you are doing things asynchronously.
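The safe pattern the answer describes can be shown with a pure-stdlib analogue: a worker thread never mutates loop-owned state directly, but schedules the mutation onto the owning event loop, which is the role doc.add_next_tick_callback plays for a Bokeh document under its lock (the names below are illustrative, and asyncio stands in for Bokeh's Tornado loop):

```python
import asyncio
import threading

received = []

def streaming_worker(loop):
    # Runs in a background thread. Instead of appending to `received`
    # directly, it asks the loop to do it on its next tick - the same
    # idea as calling doc.add_next_tick_callback from a stream client.
    loop.call_soon_threadsafe(received.append, "new point")
    loop.call_soon_threadsafe(loop.stop)

loop = asyncio.new_event_loop()
t = threading.Thread(target=streaming_worker, args=(loop,))
t.start()
loop.run_forever()  # executes the scheduled callbacks, then stops
t.join()
loop.close()
```

The mutation happens on the loop's thread, so there is no concurrent modification of loop-owned state, which is exactly what Bokeh's document lock enforces.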

Is django.db.reset_queries required for a (nonweb) script that uses Django when DEBUG is False?

I have a script running continuously (using a for loop and time.sleep). It performs queries on models after loading Django. Debug is set to False in Django settings. However, I have noticed that the process will eat more and more memory. Before my time.sleep(5), I have added a call to django.db.reset_queries().
The very small leak (a few K at a time) has come to an almost full stop, and the issue appears to be addressed. However, I still can't explain why this solves the issue, since when I look at what reset_queries does, it seems to clear a list of queries located in each of connections.all().queries. When I try to output the length of these, it turns out to be 0. So the reset_queries() method seems to clear lists that are already empty.
Is there any reason this would still work nevertheless? I understand reset_queries() is run when using mod_wsgi regardless of whether DEBUG is True or not.
Thanks,
After running a debugger: indeed, reset_queries() is required for a non-web Python script that uses Django to make queries. For every query made in the loop, I found its string representation appended to one of the queries lists in connections.all(), even when DEBUG was set to False.
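The mechanism can be mimicked without Django (the Connection class below is only a stand-in for Django's per-connection query log; in a real script you would call django.db.reset_queries() instead of the sketched helper):

```python
# Illustrative model of the leak reset_queries() fixes: each DB
# connection appends a record of every query to an in-memory list,
# so a long-running loop accumulates memory unless the list is cleared.

class Connection:
    """Stand-in for a Django DB connection with a query log."""
    def __init__(self):
        self.queries = []

    def execute(self, sql):
        self.queries.append({"sql": sql})  # grows forever if never cleared
        # ... real query execution would happen here ...

def reset_queries(connections):
    # What django.db.reset_queries() does across connections.all():
    # drop the accumulated log so the process doesn't leak.
    for conn in connections:
        conn.queries = []
```

In a script like the OP's, you would call the real django.db.reset_queries() once per loop iteration, e.g. just before time.sleep(5).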

Django : Call a method only once when the django starts up

I want to initialize some variables (from the database) when Django starts.
I am able to get the data from the database, but the problem is how I should call the initialization method, and ensure it is called only once.
Tried looking in other Pages, but couldn't find an answer to it.
The code currently looks something like this:

```python
def get_latest_dbx(request, ...):
    # get the data from the database
    ...

def get_latest_x(request):
    get_latest_dbx(request, x, ...)

def startup(request):
    get_latest_x(request)
```
Some people suggest (Execute code when Django starts ONCE only?) calling that initialization in the top-level urls.py (which looks unusual, since urls.py is supposed to handle URL patterns). There is another workaround, writing a middleware: Where to put Django startup code?
But I believe most people are waiting for the ticket to be resolved.
UPDATE:
Since the OP has updated the question, it seems the middleware way may be better, as he actually needs a request object in startup. All startup code could be put in a custom middleware's process_request method, where the request object is available as the first argument. After the startup code executes, a flag can be set to avoid rerunning it later (raising the MiddlewareNotUsed exception only works in __init__, which doesn't receive a request argument).
BTW, the OP's requirement looks a bit odd. On one hand, he needs to initialize some variables when Django starts; on the other hand, he needs the request object during that initialization. But when Django starts, there may be no incoming request at all, and even if there is one, it doesn't make much sense. I guess what he actually needs may be to do some initialization for each session or user.
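The middleware approach sketched above might look like this (a plain class, so it runs without Django installed; StartupMiddleware, run_startup, and startup_calls are illustrative names, not Django APIs):

```python
startup_calls = []  # records calls, purely for demonstration

def run_startup(request):
    # Illustrative placeholder for one-time init code that
    # needs the request object (the OP's get_latest_x(request)).
    startup_calls.append(request)

class StartupMiddleware(object):
    """Old-style Django middleware: runs one-time startup code on the
    first request it sees, then sets a flag so it never runs again
    (MiddlewareNotUsed can't be raised from process_request, hence
    the flag)."""
    done = False

    def process_request(self, request):
        if not StartupMiddleware.done:
            run_startup(request)
            StartupMiddleware.done = True
        return None  # continue normal request handling
```

Note the flag is per-process: under a multi-process server the init code runs once per worker, which is fine as long as it is idempotent.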
There are some cheats for this. The general idea is to put the initial code in special places, so that when the server starts up it will run those files, and the code along with them.
Have you ever tried to put print 'haha' in the settings.py file? :)
Note: be aware that settings.py runs twice during start-up

Django view alter global variable

My django app contains a loop, which is launched by the following code in urls.py:
```python
def start_serial():
    rfmon = threading.Thread(target=rf_clicker.RFMonitor().run)
    rfmon.daemon = True
    rfmon.start()

start_serial()
```
The loop inside this subthread references a global variable defined in global_vars.py. I would like to change the value of this variable in a view, but it doesn't seem to work.
From views.py:

```python
import global_vars

def my_view(request):
    global_vars.myvar = 2
    return HttpResponse(...)
```
How can a let the function inside the loop know that this view has been called?
The loop listens for a signal from a remote, and based on button presses may save data to the database. There are several views in the web interface, which change the settings for the remotes. While these settings are being changed the state inside the loop needs to be such that data will not be saved.
I agree with Ignacio Vazquez-Abrams, don't use globals.
Especially in your use case. The problem with this approach is that, when you deploy your app to a wsgi container or what have you, you will have multiple instances of your app running in different processes, so changing a global variable in one process won't change it in others.
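To make the per-process problem concrete: the loop should re-read a flag from storage that all processes share (a database row, or a cache key), rather than a module-level global. In the sketch below a dict stands in for that shared store, and all names are illustrative:

```python
# A module-level global is per-process; a flag in shared storage
# (DB row / cache key) is visible to every worker. The dict here is
# only a stand-in for that shared store.

def settings_view(store):
    # A view disables saving by writing to shared storage,
    # not to a Python global.
    store["saving_enabled"] = False

def monitor_iteration(store, datum, saved):
    # The monitor loop re-reads the flag on every pass, so a change
    # made by any process takes effect on the next iteration.
    if store["saving_enabled"]:
        saved.append(datum)
```

With Django specifically, the shared store could be a small settings model queried each pass, or the cache framework, so the RFMonitor loop and every web worker see the same state.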
And I would also not recommend using threads. If you need a long-running process that handles tasks asynchronously (which seems to be the case), consider looking at Celery (http://celeryproject.org/). It's really good at that.
I will admit to having no experience leveraging them, but if you haven't looked at Django's signaling capabilities, it seems like they would be a prime candidate for this kind of activity (and more appropriate than global variables).
https://docs.djangoproject.com/en/dev/ref/signals/

mod_python caching of variables

I'm using mod_python to run Trac in Apache. I'm developing a plugin and am not sure how global variables are stored/cached.
I am new to Python and have googled the subject and found that mod_python caches Python modules (I think). However, I would expect that cache to be reset when the web service is restarted, but it doesn't appear to be. I say this because I have a global variable that is a list; I test the list to see if a value exists and, if it doesn't, I add it. The first time I ran this, it added three entries to the list. Subsequently, the list has had three entries from the start.
For example:
```python
globalList = []

class globalTest:
    def addToTheList(self, itemToAdd):
        print(len(globalList))
        if itemToAdd not in globalList:
            globalList.append(itemToAdd)

    def doSomething(self):
        self.addToTheList("I am new entry one")
        self.addToTheList("I am new entry two")
        self.addToTheList("I am new entry three")
```

The code above is just an example of what I'm doing, not the actual code ;-). But essentially the doSomething() method is called by Trac. The first time it ran, it added all three entries. Now, even after restarting the web server, len(globalList) is always 3.
I suspect the answer may be that my session (and therefore the global variable) is being cached, because Trac remembers my login details when I refresh the page after the web server restart. If that's the case, how do I force the cache to be cleared? Note that I don't want to reset the globalList variable manually, i.e. globalList = [].
Can anyone offer any insight as to what is happening?
Thank you
Obligatory:
Switch to WSGI using mod_wsgi.
Don't use mod_python.
There is help available for configuring mod_wsgi with Trac.
Read the mod_python FAQ; it says:
"Global objects live inside mod_python for the life of the Apache process, which in general is much longer than the life of a single request. This means if you expect a global variable to be initialised every time, you will be surprised..."
See http://www.modpython.org/FAQ/faqw.py?req=show&file=faq03.005.htp
So the question is: why do you want to use a global variable?
