Alright, so I want to know whether doing this would affect the running code, and whether I'm going about it correctly.
Basically, in one file I have a class containing a dictionary called commands, and in another file an object of that class is created and the dictionary is used. At run-time I edit the dictionary and add new functions. I then need to reload the dictionary without restarting the whole script (a restart would affect a lot of people using my services). The plan is to send a signal to the script (it's a socket server) indicating that the dictionary should be reloaded. How do I re-import a module mid-code after it has already been imported? And would re-importing it affect the objects already created from it, or do I somehow have to reload those objects as well? (Note that the objects hold an active socket, and I do not want to kill that socket.)
It is better to store the data in a database such as Redis, which supports dictionary-like data (hashes). That way you avoid the reloading problem altogether, since the database process makes sure the fetched data is always up to date.
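A minimal sketch of that idea, assuming the redis-py package and a Redis server on localhost; the 'commands' hash name and the handlers.greet path are made up for illustration. Instead of storing functions, you store dotted paths and resolve them at call time, so commands added by another process are picked up without re-importing anything:

import importlib
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Another process (or an admin script) can register a new command at any time:
# r.hset('commands', 'greet', 'handlers.greet')

def dispatch(command_name, *args):
    dotted_path = r.hget('commands', command_name)  # always reads the current value
    if dotted_path is None:
        raise KeyError('unknown command: %s' % command_name)
    module_name, func_name = dotted_path.rsplit('.', 1)
    func = getattr(importlib.import_module(module_name), func_name)
    return func(*args)

The existing socket objects are never touched; only the command lookup goes through Redis.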
The Python documentation says this about the shelve Shelf.sync() method:
Write back all entries in the cache if the shelf was opened with
writeback set to True. Also empty the cache and synchronize the
persistent dictionary on disk, if feasible. This is called
automatically when the shelf is closed with close().
I am really having a hard time understanding this.
How does accessing data from the cache differ from accessing data from disk?
And does emptying the cache affect how we can access the data stored in a shelf?
For whoever is using the data in the Shelf object, it is transparent whether the data comes from the cache or from disk. If a value is not in the cache, the file is read, the cache is filled, and the value is returned. Otherwise, the value as it is in the cache is used.
If the cache is emptied when sync is called, that only means that the next value fetched from the same Shelf instance will be read from the file again. Since it is all automatic, there is no visible difference; the documentation is mostly describing how it is implemented.
If you try to open the same shelve file from two concurrent apps, or even from two Shelf instances in the same program, chances are you will run into big problems. Other than that, it just behaves as a "persistent dictionary", and that is it.
This pattern of writing to disk and re-reading from a single file makes no difference for a single-user workload in an interactive program. For a Python program running as a server with tens to thousands of clients, or even a single big-data processing script, where this could impact actual performance, shelve is hardly a usable choice anyway.
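Still, for concreteness, here is a minimal sketch of how the cache, writeback and sync() interact; the filename and keys are made up:

import shelve

# 'app_data' is just an example filename
with shelve.open('app_data', writeback=True) as db:
    db['settings'] = {'theme': 'dark'}   # goes into the in-memory cache
    db['settings']['theme'] = 'light'    # mutating a nested value is only picked up with writeback=True
    db.sync()                            # write cached entries back to disk and empty the cache
    print(db['settings'])                # the cache is empty, so this reads the file again
# close() (called when the with-block exits) also writes cached entries back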
I am trying to reduce SQLite3 DB look-ups in Python. The system I am implementing this on has only 1 GB of RAM. I want to store the current DB values somewhere I can retrieve them from without consulting the DB again and again. One thing to keep in mind is that each of my Python scripts (processes) has a different trigger point; there is no master script, i.e. I am not controlling all of my scripts from one place.
What I have already ruled out:
I don't want to save/retrieve the data from a file, because I don't want to make read/write operations. In a nutshell, I don't want to go through a file at all (so a plain "no" to the pickle and shelve Python modules).
I also cannot use in-memory cache modules like memcached or Beaker, because of the memory-size limitation and also because those modules are intended for server-side development, whereas I am working on standalone scripts (an IoT device).
I cannot use singleton classes because of namespace and scope limitations. As soon as one script's scope ends, the singleton instance vanishes, so I cannot persist a singleton instance across all of my Python scripts. I am not able to use static variables and static methods either, because the instance does not stick around between scripts; everything is volatile and goes back to its initialized value, instead of the current DB values, every time I import the singleton-class script into another script.
Since every Python script has a different trigger point, global variables are impossible to use as well: global variables must be initialized with some value, whereas I want them to hold the current DB values.
I also cannot do memory segmentation, as Python does not allow me to.
What else can I do?
Is there any Python library, or any other language's library, that lets me store the current DB values so that I can fetch them from there instead of looking them up in the SQLite3 DB, without doing any read/write operation? (By read/write operation I mean loading from the hard drive or SD card.)
Thanks in advance, any help is highly appreciated.
In Perl there is the idea of the tie operator, where writing to or modifying a variable can run arbitrary code (such as updating an underlying Berkeley DB file). I'm quite sure Python has a similar concept of overloading too.
I'm interested to know the most idiomatic way to treat a local JSON file as the canonical source of hierarchical information needed throughout the running of a Python script, so that changes to a local dictionary are automatically reflected in the JSON file. I'll leave it to the OS to optimise writes and caching (I don't mind if the file is rewritten dozens of times over the run of the script); ultimately this is only about a kilobyte of metadata that I'd like to keep around, and concurrent access does not need to be addressed. I'd just like to be able to access a hierarchical structure (like a nested dictionary) within the Python process and have reads from (and writes to) that structure automatically result in reads from (and changes to) a local JSON file.
Well, since Python itself has no signals and slots, I guess you could instead make your own dictionary class by inheriting from the built-in dict. It would behave exactly like a normal dict, except that every method that can change the dict's values also dumps your JSON file.
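A minimal sketch of that first suggestion, assuming a flat dictionary is enough; the class name and the metadata.json filename are made up, and nested containers are deliberately not tracked here:

import json

class JSONBackedDict(dict):
    """A dict that rewrites a JSON file after every top-level change."""

    def __init__(self, path, *args, **kwargs):
        self._path = path
        super().__init__(*args, **kwargs)
        self._dump()

    def _dump(self):
        with open(self._path, 'w') as f:
            json.dump(self, f, indent=2)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._dump()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._dump()

data = JSONBackedDict('metadata.json', {'version': 1})
data['user'] = 'alice'   # metadata.json is rewritten here

In a real version, update(), pop(), clear() and setdefault() would need the same treatment.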
You could also use something like PyQt4's QAbstractItemModel, which has signals. When its dataChanged signal is emitted, do your dumping; that way it happens in only one place, which is nice.
I know these two are sort of stupid ways, probably, yeah. :) If anyone knows better, go ahead and tell!
This is a development of aspect_mkn8rd's answer taking Gerrat's comments into account, but it is too long for a true comment.
You will need two special container classes emulating a list and a dictionary. In both, you keep a pointer to the top-level object and override the following methods:
__setitem__(self, key, value)
__delitem__(self, key)
__reversed__(self)
All of these methods are called on modification, and each of them should cause the top-level object to be written to disk.
In addition, __setitem__(self, key, value) should check whether the value is a list and wrap it in the special list class, or a dictionary and wrap it in the special dictionary class. In both cases, the method should point the new container at the top-level object. If the value is neither of these but still defines __setitem__, it should raise an exception saying the object is not supported; of course, you could then extend the method to take such a class into account.
Of course, there is a good deal of code to write and test, but it should work - left to the reader as an exercise :-)
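For what it's worth, here is a condensed sketch of that exercise, covering only a few of the mutating methods; the class names, the save callback and the metadata.json filename are made up:

import json

def _wrap(value, save):
    # wrap nested containers so that their mutations also trigger a save
    if isinstance(value, dict):
        return TrackedDict(save, value)
    if isinstance(value, list):
        return TrackedList(save, value)
    return value

class TrackedDict(dict):
    def __init__(self, save, data=()):
        self._save = save
        super().__init__()
        for key, value in dict(data).items():
            super().__setitem__(key, _wrap(value, save))

    def __setitem__(self, key, value):
        super().__setitem__(key, _wrap(value, self._save))
        self._save()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._save()

class TrackedList(list):
    def __init__(self, save, data=()):
        self._save = save
        super().__init__(_wrap(value, save) for value in data)

    def __setitem__(self, index, value):
        super().__setitem__(index, _wrap(value, self._save))
        self._save()

    def append(self, value):
        super().append(_wrap(value, self._save))
        self._save()

def json_backed(path):
    def save():
        with open(path, 'w') as f:
            json.dump(root, f, indent=2)
    root = TrackedDict(save)
    return root

cfg = json_backed('metadata.json')
cfg['servers'] = {'eu': ['host1']}    # the whole tree is written out
cfg['servers']['eu'].append('host2')  # nested changes write it out too

Covering the rest of the mutating methods (update, extend, insert, pop, and so on) is the part genuinely left as an exercise.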
If concurrency is not required, maybe consider writing two functions to read and write the data to a shelf file? Or is the idea to have the dictionary "aware" of changes so that it updates the file without this kind of thing?
I looked up other posts on the topic and I couldn't find my situation exactly. It is in a Django app, although I believe it's purely a (newbie) Python question. Here's my situation:
Let's say I have mymodule.py, where I keep various constants and common functions. At some point elsewhere in the program, I will want to add (and initialize) another attribute on mymodule (if it hasn't been added yet):
import mymodule

class UserView(View):
    # this method always gets called first..
    def get(self, request):
        try:
            # check if the attribute exists
            mymodule.user_data
        except AttributeError:
            # add it if it doesn't
            mymodule.user_data = mymodule.get_user_data()
        # continue on..

    # sometime later, this method is called..
    def post(self, request):
        print(mymodule.user_data)
My assumption was that once mymodule.user_data is added, it would persist as a global variable. But even though I set it in the get() method first, when I try to read it later in the post() method, I get: AttributeError: 'module' object has no attribute 'user_data'
Does it need to be pre-initialized in mymodule.py, as some empty object? I may not necessarily know what type of object it will be -- how would I do it in Python? (Sorry, coming from JS -- don't shoot!)
You should not do this. Your proposed solution is very dangerous, as now all users will share the same data. You almost certainly don't want that.
For per-user data shared between requests, you should use the session.
Edit
There's no way to know if they are separate processes or not. Your server software (Apache, or whatever) will determine the number of processes to run (based on your settings), and automatically route requests between them. Each process could serve any number of requests before being killed and restarted. So, in all likelihood, two consecutive requests could indeed be served by the same process, in which case the data will indeed collide.
Note that the session data is stored on the server (only a key is stored in the user's cookie), so size shouldn't be a consideration. See the sessions documentation.
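A rough sketch of what that looks like for the view in the question, mirroring its structure; the 'user_data' session key is made up, and session values have to be serializable since they are stored server-side:

from django.views.generic import View

import mymodule

class UserView(View):
    def get(self, request):
        # compute once per user and keep it in the session (stored server-side)
        if 'user_data' not in request.session:
            request.session['user_data'] = mymodule.get_user_data()
        # continue on..

    def post(self, request):
        user_data = request.session['user_data']
        print(user_data)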
You should not want to do that.
But it works as "expected": just do
mymodule.variable = value
anywhere in your code.
So, yes, your example code is setting the variable in the current running program -
but then you hit the part where I said: "you should not want to do that" :-)
Because Django, when running with production settings, behaves differently from a single-process, single-thread Python application.
In this case, if the variable is not set in mymodule when you try to access it later, it may be because that access is happening in an entirely different process (so "global variables", which in Python are really module-level variables, won't work, since they are set per process).
In this particular case, since you have a function to retrieve the desired value and you are worried that it is expensive to compute, you should memoize it. Check the documentation on django.utils.functional.memoize (which will change to django.utils.lru_cache.lru_cache in upcoming versions; see https://docs.djangoproject.com/en/dev/releases/1.7/ ). That way it will be called once per process in your application, even when requests are served by separate processes.
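A standard-library alternative with the same per-process behaviour is functools.lru_cache. A tiny sketch, with a made-up stand-in for the expensive lookup:

import time
from functools import lru_cache

@lru_cache(maxsize=None)
def get_user_data():
    # stand-in for the expensive queryset evaluation
    time.sleep(1)
    return {'user_id': 5}

get_user_data()               # slow the first time
get_user_data()               # instant afterwards, cached per process
get_user_data.cache_clear()   # force a re-fetch on the next call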
My solution (for now):
In the module mymodule.py, I initialized a dictionary: data = {}
Then in my get() method:
if 'user' not in mymodule.data:
    mymodule.data['user'] = mymodule.get_user_data()
Subsequently, I'm able to retrieve the mymodule.data['user'] object in the post() method (and presumably elsewhere in my code). Seems to work but please let me know if it's an aberration!
I have Celery Python worker processes that are restarted every day or so. They execute Python/Django code.
I have set certain quasi-global values that should persist in memory for the duration of the process. Namely, I have certain MySQL querysets that do not change often and are therefore evaluated one time and stored as a CONSTANT as soon as the process starts (a bad example being PROFILE = Profile.objects.get(user_id=5)).
Let's say that I want to reset this value in the celery process without exec-ing a whole new program.
This value is imported (and used) in a number of different modules. I'm assuming I'd have to go through each one in sys.modules that imports the CONSTANT and delete/reset the key? Is that right?
This seems very hacky. I usually use external services like Memcached to coordinate memory among multiple processes, but every once in a while I figure local memory is preferable to over-the-network calls to a NoSQL store.
It's a bit hard to say without seeing some code, but importing just binds a name to an object, exactly as variable assignment does: if the underlying data changes, every reference sees the change. Naturally, though, this only works if what you imported is the enclosing module (otherwise assignment elsewhere rebinds the module's attribute rather than updating the object your name points to).
In other words, if you do this:
from mypackage import mymodule
do_something_with(mymodule.MY_CONSTANT)
#elsewhere
mymodule.MY_CONSTANT = 'new_value'
then all references to mymodule.MY_CONSTANT will get the new value. But if you did this:
from mypackage.mymodule import MY_CONSTANT
# elsewhere
mymodule.MY_CONSTANT = 'new_value'
the original reference won't get the new value, because you've rebound mymodule.MY_CONSTANT to a new object while the MY_CONSTANT name you imported first still points at the old value.
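A two-file illustration of the difference (the config module and the values are made up):

# config.py (made up for the demo)
MY_CONSTANT = 'old_value'

# main.py
import config
from config import MY_CONSTANT

config.MY_CONSTANT = 'new_value'

print(config.MY_CONSTANT)   # 'new_value': attribute lookup sees the rebinding
print(MY_CONSTANT)          # 'old_value': this name still points at the original object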