Get the path to Django itself - python

I've got some code that runs on (nearly) every admin request but doesn't have access to the 'request' object.
I need to find the path to Django installation. I could do:
import django
django_path = django.__file__
but that seems rather wasteful in the middle of a request.
Does putting the import at the start of the module waste memory? I'm fairly sure I'm missing an obvious trick here.

So long as Django has already been imported in the Python process (which it has, if your code is, for example, in a view function), importing it again won't do "anything"* — so go nuts, use import django; django.__file__.
Now, if Django hasn't been imported by the current Python process (e.g., you're calling os.system("myscript.py") and myscript.py needs to determine Django's path), then import django will be a bit wasteful. But spawning a new process on each request is also fairly wasteful… so if efficiency is important, it might be better to import myscript anyway.
*: actually it will set a value in a dictionary… But that's "nothing".
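If all you need is the path, cheaply, on every request, compute it once at module import time; afterwards each request only reads a module-level name. A minimal sketch (DJANGO_PATH and my_admin_hook are hypothetical names):
import os
import django

# computed once, when this module is first imported
DJANGO_PATH = os.path.dirname(django.__file__)

def my_admin_hook():
    # runs on every admin request; reuses the module-level constant
    return DJANGO_PATH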

Related

Choose Python classes to instantiate at runtime based on either user input or on command line parameters

I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel requires a completely different set of classes than running sequentially. But there is so much overlap between the two code paths that it makes sense to have a unified codebase and defer the parallel/sequential behavior to a certain group of classes.
Coming from the C++ world, I would let the user set a Parallel or Serial class in the main file and use it as a template parameter to instantiate the other classes. In Python there is no compile time, so I'm looking for the most Pythonic way to accomplish this. Ideally, it would be great if the code could determine whether the user is running sequentially or in parallel and select the classes automatically. So if the user runs mpirun -np 4 python __main__.py, the code should behave entirely differently than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements that determine the type of an object at runtime; there has to be a much more elegant way to do this. In short, I would like to avoid:
if isinstance(a, Parallel):
    m = ParallelObject()
elif isinstance(a, Serial):
    m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.
wrt/ the "automagic" thing depending on execution context, if you decide to go for it, by all means make sure that 1/ both implementations can still be explicitely imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and 2/ the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you point out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about both conditional import of modules with same API but different implementations while still providing direct explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/__init__.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this mostly as an example of conditional imports, and possibly as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).
The point is that the end user should be able to either explicitly select an implementation (by explicitly importing the right submodule, serial or parallel, and using it directly) OR - still explicitly - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can be run either serial or parallel without the user having to care about it, but with still the ability to force using one or the other where it makes sense.
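For concreteness, a minimal sketch of what such an automagic bridge module might look like; the MYPACKAGE_MODE environment variable is a hypothetical selection mechanism (it could just as well be a config file or a command-line flag):
# myPackage/automagic.py (hypothetical sketch)
import os

# pick the implementation once, at import time
if os.environ.get("MYPACKAGE_MODE", "serial") == "parallel":
    from myPackage.parallel import *
else:
    from myPackage.serial import *
Client code then writes from myPackage.automagic import SomeClass (SomeClass being whatever both submodules define), while tests import myPackage.serial or myPackage.parallel explicitly.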
My fear is that py.test will end up with modules from both parallel and serial loaded while testing different files, and create a mess
Why and how would this happen? Remember that Python has no "process-global" namespace ("globals" are really module-level only) and that Python's import is absolutely nothing like C/C++ includes.
import loads a module object (which can be built directly from Python source code, or from compiled C code, or even created dynamically; remember, at runtime a module is an object, an instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are guaranteed (with a couple of caveats, but those are to be considered error cases) to be imported only once per process (and then cached), so importing the same module twice in the same process yields the same object (IOW, a module is a singleton).
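This caching is easy to observe with any stdlib module, e.g. json:
import sys
import json
import json as json_again

assert json is json_again           # both names are bound to the same module object
assert sys.modules["json"] is json  # the cache behind this lives in sys.modules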
All this means that given something like
# module A
def foo():
    return bar(42)

def bar(x):
    return x * 2
and
# module B
def foo():
    return bar(33)

def bar(x):
    return x / 2
It's guaranteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar, and B.foo will only ever call B.bar (unless you explicitly monkeypatch them, of course, but that's not the point).
Also, this means that within a module you cannot access the importing namespace (the module or function that is importing your module), so you cannot have a module depend on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as these are wildly different languages with wildly different object models, execution models and idioms. A couple of interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), i.e. the user should for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np 4 python main.py
or
python main.py serial
vs
mpirun -np 4 python main.py parallel
(whichever works best for your needs and is most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
I'm not really sure what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional:
if user_says_parallel:
    from myPackage.parallel import ParallelObject
    ObjectClass = ParallelObject
else:
    from myPackage.serial import SerialObject
    ObjectClass = SerialObject

my_abstract_object = ObjectClass()
Whether that's useful depends on your classes and the effort it takes to make sure they have the same API, so they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to hold all the shared code (a sketch follows below). But that's just the same as in C++.
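A minimal sketch of the shared-base-class idea (all class and method names here are hypothetical):
from abc import ABC, abstractmethod

class BaseObject(ABC):
    # everything shared between the two backends lives here
    def shared_helper(self):
        return "logic common to serial and parallel runs"

    @abstractmethod
    def run(self):
        ...

class SerialObject(BaseObject):
    def run(self):
        return "running sequentially"

class ParallelObject(BaseObject):
    def run(self):
        return "running across MPI ranks"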

Importing from different points in python package (flask project)

I'm trying to build my first web app in Flask, and I'm running into a problem. The Flask app pushes some pickled objects to a Redis queue. The objects are in the namespace app.mypackage.mymoduleA. That is, the objects have
object.__class__ == app.mypackage.mymoduleA.Myclass
Next, I have an external process running a daemon (located in a different module in the same package) that processes the objects as they appear in the Redis queue (that is, asynchronously). My problem is that when the objects are unpickled, the pickle module throws an ImportError because app.mypackage.mymoduleA.Myclass cannot be imported in the daemon process.
Although I understand the idea of namespaces (at least I think I do), I'm having some difficulties understanding how they work in reality. So anyway, here are my questions:
1) Is it possible to "fake" the namespace? Something like
import mymoduleA.Myclass as app.mypackage.mymoduleA.Myclass
2) Since faking namespaces is probably a hack, is there a "right" way to do this? Essentially the problem is that the objects are created and pickled in one program, which imports Myclass like so:
from mypackage.mymoduleA import Myclass
while they are unpickled in another program where Myclass is imported like this:
from mymoduleA import Myclass
I had some difficulties explaining what I mean, so I hope my questions make sense.
Thank you!
Bonus question: I know that the syntax in question 1 doesn't work, but could someone explain to me why this way of importing modules is not allowed?
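For what it's worth, a minimal sketch of the aliasing idea from question 1: since the import ... as ... syntax only binds plain names, the registration has to go through sys.modules directly. Purely illustrative, assuming mypackage is importable in the daemon process:
import sys
import types

import mypackage
import mypackage.mymoduleA

# register the modules under the dotted names the pickled data refers to;
# pickle resolves "app.mypackage.mymoduleA.Myclass" through sys.modules
sys.modules["app"] = types.ModuleType("app")  # empty stand-in package
sys.modules["app.mypackage"] = mypackage
sys.modules["app.mypackage.mymoduleA"] = mypackage.mymoduleA
The cleaner fix remains importing Myclass via the same dotted path in both programs, so the module name recorded in the pickle matches on both sides.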

Help choosing between reload or subprocess

Hello, I want to know the best way to re-import or re-execute a module. I have a web server with just one Apache instance for all my domains and applications, and if I need to make changes to one application, restarting the server will affect the others, so I'm looking for the best way to reload a module. If I choose subprocess, I will need to print the response, but I don't know whether that is the most secure way of communication. Please tell me, from your experience, which is the best way?
Thanks in advance!
Reloading a module is rarely a good idea in a production environment; it's a mechanism intended for debugging. When you reload a module, the module's contents (classes, functions, data) get replaced, but existing references to these items from other modules are not affected. This is particularly important for classes: existing objects in memory still refer to the old class, whereas objects created after the reload refer to the new class.
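A minimal demonstration of that pitfall; mymod and MyClass are hypothetical (assume a local mymod.py defining MyClass):
import importlib
import mymod

obj = mymod.MyClass()          # instance of the pre-reload class object
importlib.reload(mymod)        # re-executes mymod.py into the same module object
new_obj = mymod.MyClass()      # instance of the post-reload class object

print(isinstance(obj, mymod.MyClass))      # False: obj still references the old class
print(isinstance(new_obj, mymod.MyClass))  # True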
There is another alternative you might want to consider: load Python code from a file and exec it. Less overhead than a complete subprocess, and less tightly coupled to the rest of the program than a module. In principle the same caveats apply to re-exec-ing as to reloading a module, but you are much less tempted to keep references to exec'd code because it takes more work.
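A minimal sketch of that approach; plugin.py and handle_request are hypothetical names:
def load_plugin(path="plugin.py"):
    # read, compile and execute the file into a fresh namespace on every call
    namespace = {}
    with open(path) as f:
        code = compile(f.read(), path, "exec")
    exec(code, namespace)
    return namespace

plugin = load_plugin()
plugin["handle_request"]()  # call a function the plugin file is assumed to define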

Python - optimize by not importing at module level?

In a framework such as Django, I'd imagine that if a user lands on a page (running a view function called "some_page"), and you have 8 imports at the top of the module that are irrelevant to that view, you're wasting cycles on those imports. My questions are:
Is it a large enough amount of resources to make an impact on a high-traffic website?
Is it such a bad practice to import inside of a function for this purpose that it should be avoided at said impact?
Note: This could be considered premature optimization, but I'm not interested in that argument. Let's assume, for the sake of practical theory, that this is a completed site with loads of traffic, needing to be optimized in every way possible, and the application code, as well as DB have been fully optimized by 50 PhD database admins and developers, and these imports are the only thing left.
No, don't do this. In a normal Python execution environment on the web (mod_wsgi, gunicorn, etc.) the imports are executed once when your process starts, and subsequent requests do not re-execute the module. If you put the imports inside the functions, the import statement is processed every time the function is called (after the first call that is just a sys.modules lookup, but it's still avoidable overhead).
Yes, it is a bad practice to import at the function level. By using smarter imports at the top of the module, you create a one time, small cost. However, if you place an import in a function you will suffer the cost of the import each time that function is run. So, rather than import in the function, just import at the top of the module.
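If you want to put a number on the per-call overhead being described, a rough measurement with timeit (absolute figures will vary by machine):
import timeit
import json  # module-level import: paid once, at process start

def local_import():
    import json  # function-level: the import statement runs on every call
    return json

def module_level():
    return json

print(timeit.timeit(local_import, number=100000))  # pays a sys.modules lookup per call
print(timeit.timeit(module_level, number=100000))  # plain global name lookup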
A few things you can do to clean up and improve your imports:
Don't use wild imports e.g. from x import *
Where possible, just use a normal import e.g. import x
Try to split your code up into smaller modules that can be called separately, so that fewer imports are made
Also, placing imports at the top of the module is a matter of style: there's a reason PEP 8 says modules should be imported at the top. It's far more readable and maintainable that way.
Finally, some imports at function level will cause compatibility issues in the future, as from x import * is not valid Python 3.x at function level.
1) The answer is no. Django/Python is not like PHP. Your whole module will not be reinterpreted on each page view, as happens with PHP includes. The module will be in memory, and each page view will make a simple function call to your view.
2) Yes, it will be a counter-optimization to make imports at the view level.
No. Same reason as other answers.
Yes. Same reason as other answers.
BTW, you can also import lazily.
For example, the Importing toolkit can "import" a module in top-level code, but the module is not actually loaded until one of its attributes is accessed.
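Since Python 3.5, the stdlib can do something similar with importlib.util.LazyLoader; here is a minimal sketch following the pattern from the importlib documentation:
import importlib.util
import sys

def lazy_import(name):
    # build the module from its spec, but wrap the loader so execution is deferred
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # does not run the module body yet
    return module

json = lazy_import("json")   # nothing has been executed so far
print(json.dumps({"a": 1}))  # the module body runs here, on first attribute access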
Sometimes the following boilerplate makes sense:
foo = None

def foorify():
    global foo
    if not foo:
        from xxx import foo
    foo.bar()
This makes sense when foorification is conditional on something that rarely changes, e.g. one server foorifies while another never does, or if you don't want to or cannot safely import foo during application startup or in most tests.

Prevent Python from caching the imported modules

While developing a largeish project (split across several files and folders) in Python with IPython, I run into trouble with cached imported modules.
The problem is that an import module statement only reads the module once, even if that module has changed on disk since! So each time I change something in my package, I have to quit and restart IPython. Painful.
Is there any way to properly force reloading some modules? Or, better, to somehow prevent Python from caching them?
I tried several approaches, but none works. In particular I run into really, really weird bugs, like some modules or variables mysteriously becoming equal to None...
The only sensible resource I found is Reloading Python modules, from pyunit, but I have not checked it. I would like something like that.
A good alternative would be for IPython to restart, or restart the Python interpreter somehow.
So, if you develop in Python, what solution have you found to this problem?
Edit
To make things clear: obviously, I understand that some old variables depending on the previous state of the module may stick around. That's fine by me. But why is it so difficult in Python to force-reload a module without all sorts of strange errors happening?
More specifically, if I have my whole module in one file module.py then the following works fine:
import sys

try:
    del sys.modules['module']
except KeyError:
    pass

import module
obj = module.my_class()
This piece of code works beautifully and I can develop without quitting IPython for months.
However, whenever my module is made of several submodules, hell breaks loose:
import sys

for mod in ['module.submod1', 'module.submod2']:
    try:
        del sys.modules[mod]
    except KeyError:
        pass
# sometimes this works, sometimes not. WHY?
Why is that so different for Python whether I have my module in one big file or in several submodules? Why would that approach not work??
import checks to see if the module is in sys.modules, and if it is, it returns it. If you want import to load the module fresh from disk, you can delete the appropriate key in sys.modules first.
There is the reload builtin function which, given a module object, will reload it from disk, and the result gets placed in sys.modules. Edit -- actually, it will recompile the code from the file on disk and then re-evaluate it in the existing module's __dict__, which is potentially very different from making a new module object.
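One observable consequence of re-evaluating in place: the module object's identity is preserved across a reload (shown with the stdlib json module and the Python 3 spelling, importlib.reload):
import importlib
import json

before = json
importlib.reload(json)
assert json is before  # same module object; its __dict__ was re-populated in place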
Mike Graham is right, though; getting reloading right is hard if you have even a few live objects that reference the contents of the module you no longer want. That existing objects still reference the classes they were instantiated from is the obvious issue, but all references created by means of from module import symbol will also still point to the object from the old version of the module. Many subtly wrong things are possible.
Edit: I agree with the consensus that restarting the interpreter is by far the most reliable thing. But for debugging purposes, I guess you could try something like the following. I'm certain that there are corner cases for which this wouldn't work, but if you aren't doing anything too crazy (otherwise) with module loading in your package, it might be useful.
import sys
import types

def reload_package(root_module):
    package_name = root_module.__name__

    # get a reference to each loaded module
    loaded_package_modules = dict([
        (key, value) for key, value in sys.modules.items()
        if key.startswith(package_name) and isinstance(value, types.ModuleType)])

    # delete references to these loaded modules from sys.modules
    for key in loaded_package_modules:
        del sys.modules[key]

    # load each of the modules again;
    # make old modules share state with new modules
    for key in loaded_package_modules:
        print('reloading %s' % key)
        __import__(key)                # re-imports the (sub)module into sys.modules
        newmodule = sys.modules[key]   # __import__ returns the top package, so fetch the submodule
        oldmodule = loaded_package_modules[key]
        oldmodule.__dict__.clear()
        oldmodule.__dict__.update(newmodule.__dict__)
Which I very briefly tested like so:
import email, email.mime, email.mime.application
reload_package(email)
printing:
reloading email.iterators
reloading email.mime
reloading email.quoprimime
reloading email.encoders
reloading email.errors
reloading email
reloading email.charset
reloading email.mime.application
reloading email._parseaddr
reloading email.utils
reloading email.mime.base
reloading email.message
reloading email.mime.nonmultipart
reloading email.base64mime
Quitting and restarting the interpreter is the best solution. Any sort of live reloading or no-caching strategy will not work seamlessly, because objects from no-longer-existing modules can linger, because modules sometimes store state, and because even if your use case really does allow hot reloading, it's too complicated to reason about to be worth it.
With IPython comes the autoreload extension, which automatically reloads changed modules before executing code. It works at least in simple cases, but don't rely on it too much: in my experience, an interpreter restart is still required from time to time, especially when code changes occur only in indirectly imported code.
Usage example from the linked page:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43
For Python version 3.4 and above
import importlib
importlib.reload(<package_name>)
from <package_name> import <method_name>
Refer to the importlib documentation for details.
There are some really good answers here already, but it is worth knowing about dreload, a function available in IPython that does a "deep reload". From the documentation:
The IPython.lib.deepreload module allows you to recursively reload a module: changes made to any of its dependencies will be reloaded without having to exit. To start using it, do:
from IPython.lib.deepreload import reload as dreload
http://ipython.org/ipython-doc/dev/interactive/reference.html#dreload
It is available as a "global" in IPython notebook (at least my version, which is running v2.0).
HTH
You can use the import hook machinery described in PEP 302 to load, not the modules themselves, but some kind of proxy object that will allow you to do anything you want with the underlying module object: reload it, drop the reference to it, etc.
An additional benefit is that your currently existing code will not require changes, and this extra module functionality can be torn off from a single point in the code: where you actually add the finder to sys.meta_path.
Some thoughts on implementation: create a finder that will agree to find any module except builtins (you have nothing to do with builtin modules), then create a loader that will return a proxy object subclassed from types.ModuleType instead of a real module object. Note that loader objects are not forced to create explicit references to loaded modules in sys.modules, but it's strongly encouraged, because, as you have already seen, things may fail unexpectedly otherwise. The proxy object should catch and forward all __getattr__, __setattr__ and __delattr__ calls to the underlying real module it keeps a reference to. You probably don't need to define __getattribute__, because you would not be hiding the real module's contents behind your proxy methods. Now you need some way to communicate with the proxy: you can create a special method that drops the underlying reference, then import the module, extract the reference from the returned proxy, drop the proxy, and hold the reference to the reloaded module. Phew, looks scary, but it should fix your problem without restarting Python each time.
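A minimal sketch of just the proxy part, without the finder/loader wiring; ModuleProxy and _reload are hypothetical names:
import importlib
import types

class ModuleProxy(types.ModuleType):
    """Forwards attribute access to an underlying module that can be swapped out."""

    def __init__(self, name):
        super().__init__(name)
        object.__setattr__(self, "_real", importlib.import_module(name))

    def __getattr__(self, attr):
        # only called when normal lookup fails, so module attributes get forwarded
        return getattr(object.__getattribute__(self, "_real"), attr)

    def _reload(self):
        # swap in a freshly executed module object
        real = object.__getattribute__(self, "_real")
        object.__setattr__(self, "_real", importlib.reload(real))

mod = ModuleProxy("json")
print(mod.dumps({"a": 1}))  # forwarded to the real json module
mod._reload()               # replace the underlying module without touching callers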
I am using PythonNet in my project. Fortunately, I found there is a command which can perfectly solve this problem.
using (Py.GIL())
{
    dynamic mod = Py.Import(this.moduleName);
    if (mod == null)
        throw new Exception(string.Format(
            "Cannot find module {0}. Python script may not be compiled successfully or module name is illegal.",
            this.moduleName));

    // This command works perfectly for me!
    PythonEngine.ReloadModule(mod);

    dynamic instance = mod.ClassName();
}
Think twice before quitting and restarting in production.
The easy solution without quitting and restarting is to use reload from imp (on Python 3.4+, prefer importlib.reload, since imp is deprecated):
import moduleA, moduleB
from imp import reload

reload(moduleB)
