Reduce sqlite3 DB lookups in Python

I am trying to reduce sqlite3 DB lookups in Python. The system I am implementing this on has only 1 GB of RAM. I want to store the current DB values somewhere I can retrieve them from without consulting the DB again and again. One thing to keep in mind is that the trigger point of each of my Python scripts (processes) is different; there is no master script, i.e. I am not controlling all of my scripts from one point.
What I have considered and ruled out so far:
I don't want to save/retrieve the data via a file, because I don't want to do read/write operations. In a nutshell, I don't want to go through a file at all (so a simple no to the pickle and shelve Python modules).
I also cannot use in-memory cache modules like memcached and Beaker, because of the memory size limitation and because those modules are intended for server-side development, whereas I am working on standalone scripts (an IoT device).
I cannot use singleton classes because of the limitations of namespaces and scope. As soon as the scope of one script ends, the singleton instance vanishes, so I cannot persist a singleton instance across all of my Python scripts. I am not able to use static variables and static methods either, because the instance does not stick around between scripts; everything is volatile and goes back to its initialized value, instead of the current DB values, every time I import the singleton class's module in any of my other scripts.
Since the trigger point of each of my Python scripts is different, global variables are also impossible to use. Global variables have to be initialized with some value, whereas I want them to hold the current DB values.
I also cannot do memory segmentation, as Python does not allow me to do so.
What else can I do?
Is there any Python library, or any other language's library, that allows me to store the current DB values so that I can get them from there instead of looking them up in the sqlite3 DB, without doing any read/write operation? (By read/write operation I mean loading from the hard drive or SD card.)
Thanks in advance; any help is highly appreciated.

Related

Storing and loading str:func Python dictionaries

Alright, so I want to know whether doing this would affect the code, and whether I'm doing it correctly.
Basically, let's say in one file I have a dictionary called commands (inside a class), and in another file an object of that class is made and the dictionary is used. During run-time, I edit the dictionary and add new functions. Now I need to reload the dictionary without restarting the whole script (because that would affect a lot of people using my services). Suppose I send a signal to the script (it's a socket server) indicating that the dictionary should be reloaded. How would I re-import the module after it's already been imported, mid-code? And would re-importing it affect the objects already made from it, or do I have to somehow reload the objects too? (Note that the objects contain an active socket, and I do not wish to kill that socket.)
It is better to store the data in a database like Redis, which supports dictionary-like data structures. This way you avoid the reloading problem altogether, as the database process makes sure the fetched data is always up to date.
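For illustration, a minimal sketch with the redis-py client (assuming a Redis server on localhost; the hash name 'commands' and the handler paths are made up):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# Writer side: publish the current command table as a Redis hash.
r.hset('commands', 'greet', 'handlers.greet')
r.hset('commands', 'quit', 'handlers.quit')

# Reader side: every lookup goes to Redis, so the data is always current
# and nothing needs to be re-imported in the running server.
commands = {k.decode(): v.decode() for k, v in r.hgetall('commands').items()}

Because the server process reads the table from Redis each time, updating a command no longer requires reloading any module.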

Why can't I change variables from cached modules in IronPython?

Disclaimer: I am new to Python and IronPython, so sorry if this is obvious.
We have a C# application that uses IronPython to execute scripts. There are a few common modules/scripts, and then a lot of small scripts that define parameters, do setup, and then call functions in the core modules. After some recent additions made the common modules larger, performance took a hit on the imports. I attempted to fix this by making sure we only created one engine and then created a scope for each script to run in. I've seen information that scripts are compiled on the engine, but this is apparently not so, as importing continued to take excessive time, so the compilation must be cached in the scope. I then used THIS blog entry to create a custom shared dictionary in which I could precompile the common modules at app load and then reuse them. Everything was working fine until I realized that variables were not changing on subsequent runs. After creating a scope in which to run a script, I would add a required variable...
currentScope.SetVariable("agr", aggregator);
The first time this runs, agr works fine in the scripts and is, say, instance A. On subsequent runs a new scope is created, a new aggregator is created (let's call it B) and set as agr, but when the underlying modules use agr it is not aggregator B; it is aggregator A, which is no longer valid. I have even tried to force it by adding this to the main script...
CommonModule.agr = agr
#Do Work
CommonModule.agr = None
to no avail. agr itself is not stored in the shared symbol dictionary, but CommonModule is, and it has a variable for agr. What do I have to do to change this variable, and why is it cached in this manner?
UPDATE FOR CLARIFICATION: Sorry about the confusion, but it's a combination of so much code across C# and Python that it would be hard to include. Let me see if I can clarify a little. Every time I run a script, I need to set the value of 'agr' to a new object, which is created in C# prior to Python execution using scope.SetVariable(). Some core modules are imported and compiled into a cached scope. On script execution, a new temporary scope is created using a SharedSymbolDictionary built on the shared scope (to avoid importing the core modules every time), and the script executes in it.
The problem is that 'agr' is set correctly the first time, both in the main script and in the core (precompiled) scripts. However, on subsequent script executions 'agr' is correct in the main script, but when the core scripts reference 'agr' it is still pointing to the 'agr' created on the first execution, NOT the new 'agr' object created for that execution, and most of its references are null by now.
All the comments without the code are a bit confusing.
But taking just the last paragraph: if you would like to modify the module-level variable from C#, you can:
// scope is the SharedSymbolDictionary
var module = scope.GetVariable("CommonModule") as PythonModule;
module.Get__dict__()["agr"] = "new value";
The second observation: the variables which are provided to the SharedSymbolDictionary as sharedScope can be changed within an individual run, but the changes disappear on subsequent runs. If you would like to make persistent changes during the script run, you need to change TrySetExtraValue to something like this:
protected override bool TrySetExtraValue(string key, object value) {
    lock (_sharedScope) {
        if (_sharedScope.ContainsVariable(key)) {
            _sharedScope.SetVariable(key, value);
            return true;
        }
        return false;
    }
}
Note: I work with IronPython 2.7 and .NET 4.0. The signature of TrySetExtraValue is a bit different from the one in the blog.
So I don't have a solid explanation, but I found a simple solution. Originally 'agr' was the variable name used everywhere: in scope.SetVariable(), in the top-level scripts, and in the precompiled core scripts.
For the fix, I changed the C# to use the variable name 'aggregator' for SetVariable(). I then created a module imported by all the top-level main scripts, e.g. sharedModule, and used...
sharedModule.agr = aggregator
Then I changed all the core scripts to use sharedModule.agr instead of just 'agr', and that works the way I want.
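For illustration, the indirection described in this fix might look like the following (module and variable names taken from the answer above; process() is just a placeholder for whatever the core scripts do with the aggregator):

# sharedModule.py -- the one module every script imports; it only holds
# the slot that the C# host fills in on each run.
agr = None

# top-level main script (runs once per execution)
import sharedModule
sharedModule.agr = aggregator        # 'aggregator' was injected via SetVariable()

# core (precompiled) scripts
import sharedModule

def do_work():
    # Always read through the module attribute, so each run sees the
    # aggregator assigned for that run rather than a stale copy.
    sharedModule.agr.process()       # placeholder call

Reading through the module attribute avoids binding the name once at compile/import time, which is what made the first aggregator stick around.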

Reloading global Python variables in a long running process

I have celery Python worker processes that are restarted every day or so. They execute Python/Django programs.
I have set certain quasi-global values that should persist in memory for the duration of the process. Namely, I have certain MySQL querysets that do not change often and are therefore evaluated one time and stored as a CONSTANT as soon as the process starts (a bad example being PROFILE = Profile.objects.get(user_id=5)).
Let's say that I want to reset this value in the celery process without exec-ing a whole new program.
This value is imported (and used) in a number of different modules. I'm assuming I'd have to go through each module in sys.modules that imports the CONSTANT and delete/reset the key? Is that right?
This seems very hacky. I usually use external services like Memcached for coordinating memory among multiple processes, but every once in a while I figure local memory is preferable to over-the-network calls to a NoSQL store.
It's a bit hard to say without seeing some code, but importing just sets a reference, exactly as with variable assignment: if the data changes, all references see the change. Naturally, this only works if you import the containing module rather than the name itself (otherwise assignment rebinds your local name instead of updating the shared value).
In other words, if you do this:
from mypackage import mymodule
do_something_with(mymodule.MY_CONSTANT)
#elsewhere
mymodule.MY_CONSTANT = 'new_value'
then all references to mymodule.MY_CONSTANT will get the new value. But if you did this:
from mypackage.mymodule import MY_CONSTANT
# elsewhere
mymodule.MY_CONSTANT = 'new_value'
the first reference won't see the new value: MY_CONSTANT was bound to the original object at import time, so rebinding mymodule.MY_CONSTANT elsewhere leaves the already-imported name pointing at the old value.
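A small self-contained demonstration of the difference, using a throwaway module created at runtime to stand in for mypackage.mymodule:

import sys, types

mymodule = types.ModuleType("mymodule")
mymodule.MY_CONSTANT = 'old_value'
sys.modules["mymodule"] = mymodule

import mymodule as mod               # keep a reference to the module itself
from mymodule import MY_CONSTANT     # copy the name out at import time

mymodule.MY_CONSTANT = 'new_value'   # "reload" by rebinding the module attribute

print(mod.MY_CONSTANT)   # new_value: attribute lookup happens at use time
print(MY_CONSTANT)       # old_value: the name was bound at import time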

How can I speed up a web-application? (Avoid rebuilding a structure.)

After having successfully built a static data structure (see here), I would like to avoid having to build it from scratch every time a user requests an operation on it. My naïve first idea was to dump the structure (using Python's pickle) into a file and load this file for each query. Needless to say (as I figured out), this turns out to be too time-consuming, as the file is rather large.
Any ideas how I can easily speed this up? Splitting the file into multiple files? Or a program running on the server? (How difficult would that be to implement?)
Thanks for your help!
You can dump it into a memory cache (such as memcached).
This method has the advantage of cache key invalidation: when the underlying data changes, you can invalidate your cached data.
EDIT
Here's a Python client for memcached: python-memcached. Thanks NicDumZ.
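As a rough sketch of that approach (assuming a memcached server on 127.0.0.1:11211 and the python-memcached client installed; build_structure() is a stand-in for your expensive build step):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def get_structure():
    structure = mc.get('my_structure')                # try the cache first
    if structure is None:
        structure = build_structure()                 # hypothetical expensive builder
        mc.set('my_structure', structure, time=3600)  # cache for an hour
    return structure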
If you can rebuild your Python runtime with the patches offered by the Unladen Swallow project, you should see speedups of 40% to 150% in pickling and 36% to 56% in unpickling, according to their benchmarks; maybe that helps.
My suggestion would be not to rely on having an object structure. Instead, have a byte array (or an mmap'd file, etc.) on which you can do random-access operations, and implement the cross-referencing using offsets ("pointers") inside that structure.
True, this introduces the concept of pointers into your code, but it means you don't need to unpickle the structure each time the handler process starts up, and it also uses a lot less memory (as there is no overhead from Python objects).
As your database is going to be fixed during the lifetime of a handler process (I imagine), you won't need to worry about concurrent modifications, locking, etc.
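A small sketch of that idea (the file name and record layout are made up): fixed-size records are read straight out of an mmap'd file, and a "pointer" is just the index of another record.

import mmap
import struct

# Hypothetical layout: fixed-size records of two 32-bit ints, where the
# second int is the index ("pointer") of a related record.
RECORD = struct.Struct('<ii')

def read_record(buf, index):
    # Read record number `index` directly out of the mapped file.
    offset = index * RECORD.size
    value, next_index = RECORD.unpack_from(buf, offset)
    return value, next_index

with open('structure.bin', 'rb') as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    value, next_index = read_record(buf, 0)
    # Follow the "pointer" to the related record without unpickling anything.
    related_value, _ = read_record(buf, next_index)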
Even if you did what you suggest, you shouldn't have to rebuild it on every user request; just keep an instance in memory in your worker process(es). That way it only gets built when a new worker process starts, which shouldn't take too long.
The number one way to speed up your web application, especially when you have lots of mostly-static modules, classes, and objects that need to be initialized, is to serve requests in a way that lets a single interpreter handle many of them: mod_wsgi, mod_python, SCGI, FastCGI, Google App Engine, a Python web server... basically anything except a standard CGI script that starts a new Python process for every request. With this approach, you can make your data structure a global object that only needs to be read from its serialized format once per process, which is much less frequent.
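A sketch of what that looks like with a plain WSGI application (the file name structure.pkl is made up): the structure is unpickled once, at module import time, and then shared by every request the process serves.

import pickle

# Loaded once per interpreter process, not once per request.
with open('structure.pkl', 'rb') as f:
    STRUCTURE = pickle.load(f)

def application(environ, start_response):
    body = ('structure has %d entries' % len(STRUCTURE)).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]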

How would one make Python objects persistent in a web-app?

I'm writing a reasonably complex web application. The Python backend runs an algorithm whose state depends on data stored in several interrelated database tables which does not change often, plus user-specific data which does change often. The algorithm's per-user state undergoes many small changes as a user works with the application, and the algorithm is used often during each user's work to make certain important decisions.
For performance reasons, re-initializing the state on every request from the (semi-normalized) database data quickly becomes infeasible. It would be highly preferable, for example, to cache the state's Python object in some way so that it can simply be used and/or updated whenever necessary. However, since this is a web application, there are several processes serving requests, so using a global variable is out of the question.
I've tried serializing the relevant object (via pickle) and saving the serialized data to the DB, and am now experimenting with caching the serialized data via memcached. However, this still has the significant overhead of serializing and deserializing the object often.
I've looked at shared memory solutions, but the only relevant thing I've found is POSH. However, POSH doesn't seem to be widely used, and I don't feel comfortable integrating such an experimental component into my application.
I need some advice! This is my first shot at developing a web application, so I'm hoping this is a common enough issue that there are well-known solutions to such problems. At this point solutions which assume the Python back-end is running on a single server would be sufficient, but extra points for solutions which scale to multiple servers as well :)
Notes:
I have this application working, currently live and with active users. I started out without doing any premature optimization and then optimized as needed. I've done the measuring and testing to make sure the above-mentioned issue is the actual bottleneck. I'm pretty sure I could squeeze more performance out of the current setup, but I wanted to ask if there's a better way.
The setup itself is still a work in progress; assume that the system's architecture can be whatever suits your solution.
Be cautious of premature optimization.
Addition: The "Python backend runs an algorithm whose state..." is the session in the web framework. That's it. Let the Django framework maintain session state in cache. Period.
"The algorithm's per-user state undergoes many small changes as a user works with the application." Most web frameworks offer a cached session object. Often it is very high performance. See Django's session documentation for this.
Advice. [Revised]
It appears you have something that works. Leverage it: learn your framework, learn the tools, and learn what knobs you can turn without breaking a sweat. Specifically, use session state.
Second, fiddle with caching, session management, and other things that are easy to adjust, and see if you have enough speed. Find out whether a MySQL socket or a named pipe is faster by trying them out. These are the no-programming optimizations.
Third, measure performance to find your actual bottleneck. Be prepared to provide (and defend) measurements that are fine-grained enough to be useful and stable enough to allow a meaningful comparison of alternatives.
For example, show the performance difference between persistent sessions and cached sessions.
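For reference, the cached-session knob mentioned above is just a settings change in Django; a sketch (the exact cache backend class name varies with the Django version, memcached is shown as an example):

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'        # cache-only sessions
# SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'  # write-through variant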
I think the multiprocessing framework has something applicable here, namely the shared ctypes module.
Multiprocessing is fairly new to Python, so it might have some oddities. I am not quite sure whether the solution works with processes not spawned via multiprocessing.
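A minimal sketch of the shared ctypes idea (this only covers processes that multiprocessing itself spawns, per the caveat above):

from multiprocessing import Process, Value

def work(shared):
    with shared.get_lock():          # a Value carries its own lock
        shared.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)          # a 32-bit int living in shared memory
    procs = [Process(target=work, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)             # 4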
I think you can give ZODB a shot.
"A major feature of ZODB is transparency. You do not need to write any code to explicitly read or write your objects to or from a database. You just put your persistent objects into a container that works just like a Python dictionary. Everything inside this dictionary is saved in the database. This dictionary is said to be the "root" of the database. It's like a magic bag; any Python object that you put inside it becomes persistent."
Initially it was an integral part of Zope, but lately a standalone package has also become available.
It has the following limitation:
"Actually there are a few restrictions on what you can store in the ZODB. You can store any objects that can be "pickled" into a standard, cross-platform serial format. Objects like lists, dictionaries, and numbers can be pickled. Objects like files, sockets, and Python code objects, cannot be stored in the database because they cannot be pickled."
I have read about it but haven't given it a shot myself.
Another possibility is an in-memory SQLite DB. Being in-memory, it may speed things up a bit, but you would still have to do the serialization work and all.
Note: an in-memory DB is expensive on resources.
Here is a link: http://www.zope.org/Documentation/Articles/ZODB1
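A rough sketch of the ZODB usage described above, assuming a recent ZODB release and its companion transaction package (the file name is made up):

import transaction
from ZODB import DB, FileStorage

storage = FileStorage.FileStorage('appdata.fs')
db = DB(storage)
connection = db.open()
root = connection.root()                     # the dictionary-like "root" of the database

root['state'] = {'user_5': {'score': 42}}    # any picklable object can go in
transaction.commit()                         # persist the change

connection.close()
db.close()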
First of all, your approach is not a common web development practice. Even when multithreading is used, web applications are designed to be able to run in multi-process environments, for both scalability and easier deployment.
If you just need to initialize a large object and do not need to change it later, you can do so easily with a global variable that is initialized while your WSGI application is being created, or when the module containing the object is loaded; multi-processing will work fine for you.
If you need to change the object and access it from every thread, you need to make sure your object is thread-safe; use locks to ensure that. And use a single server context, i.e. a single process. Any multithreaded Python server will serve you well; FCGI is also a good choice for this kind of design.
But if multiple threads are accessing and changing your object, the locking may hurt your performance badly enough to make all the benefits go away.
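As a sketch of that single-process, multithreaded design (all names here are illustrative), a shared state object guarded by a lock might look like:

import threading

_state = {}                      # the expensive per-user state, built once per process
_state_lock = threading.Lock()

def update_state(user_id, key, value):
    with _state_lock:            # serialize writers
        _state.setdefault(user_id, {})[key] = value

def read_state(user_id):
    with _state_lock:            # readers take the lock too, for a consistent snapshot
        return dict(_state.get(user_id, {}))

Whether this wins depends on how contended the lock is, which is exactly the trade-off described above.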
This is Durus, a persistent object system for applications written in the Python programming language. Durus offers an easy way to use and maintain a consistent collection of object instances used by one or more processes. Access to and changes of persistent instances are managed through a cached Connection instance, which includes commit() and abort() methods so that changes are transactional.
http://www.mems-exchange.org/software/durus/
I've used it before in some research code where I wanted to persist the results of certain computations. I eventually switched to PyTables, as it met my needs better.
Another option is to review the requirement for state: if serialization is the bottleneck, it sounds like the object is very large. Do you really need an object that large?
I know that in Stack Overflow podcast 27 the reddit guys discuss what they use for state, so that may be useful to listen to.
