When developing a Python web app (Flask/uWSGI) and running it on my local machine, *.pyc files are generated by the interpreter. My understanding is that these compiled files can make things load faster, but not necessarily run faster.
When I deploy this same app to production, it runs under a user account that has no write permissions on the local file system. There are no *.pyc files committed to source control, and no effort is made to generate them during the deploy. Even if Python wanted to write a .pyc file at runtime, it would not be able to.
Recently I started wondering if this has any tangible effect on the performance of the app, either in terms of the very first pageview after the process starts, or consistently throughout its entire lifetime.
Should I throw a python -m compileall in as part of my deploy scripts?
Sure, go ahead and precompile to .pyc files; it won't hurt anything.
Will it affect the first or nth pageload? Assuming Flask/WSGI runs as a persistent process, not at all. By the time the first page has been requested, all of the Python modules will have already been loaded into memory (as bytecode). Thus, server startup time will be the only thing affected by not having the files pre-compiled.
However, if for some reason a new Python process is invoked for each page request, then yes, there would (probably) be a noticeable difference in performance and it would be better to pre-compile.
As Klaus said in the comments above, the only other time a pageload might be affected is if a function happens to import a module that hasn't already been imported. That module will have to be parsed and compiled to bytecode, then loaded into memory, before execution can continue.
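If you do decide to precompile as part of the deploy, a minimal sketch might look like this (the /srv/myapp path is just a placeholder for wherever your code lands; running python -m compileall /srv/myapp from the deploy script is equivalent):

# Hypothetical deploy step: write the .pyc caches ahead of time, so the
# read-only runtime user never needs write access to create them itself.
import compileall
import sys

ok = compileall.compile_dir("/srv/myapp", force=True, quiet=1)
if not ok:
    sys.exit("some modules failed to compile")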
I know there is a lot of debate around this topic.
I did some research and looked into some of the questions here, but none was exactly what I was after.
I'm developing my app in Django, using Python 3.7 and I'm not looking to convert my app into a single .exe file, actually it wouldn't be reasonable to do so, if even possible.
However, I have seen some apps developed in javascript that use bytenode to compile code to .jsc
Is there such a thing for Python? I know there is .pyc, but as far as I know those are just files compiled at runtime, not a bytecode script precompiled ahead of time.
I wanted to protect the source code in some files that could compromise the security of the app. After all, deploying my app means deploying a fully fledged Python installation with a web port open and an app running on it.
What do you think, is there a way to do it, does it even make sense to you?
Thank you
The precompiled (.pyc) files are what you are looking for. They contain compiled bytecode that the interpreter can run even when the original .py file is absent.
You can build the .pyc files directly using python -m py_compile <filename>. You can also strip assert statements and docstrings by compiling under python -OO; since Python 3.5 the result is still a .pyc file (tagged opt-2 in __pycache__) rather than the old separate .pyo format.
Note that it might still be possible to decompile the generated bytecode with enough effort, so don't use it as a security measure.
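As a rough sketch, assuming a hypothetical package directory mypackage/ that you want to ship without its .py sources (you would delete the .py files after compiling):

# legacy=True writes foo.pyc next to foo.py instead of into __pycache__/,
# which is what you need if you plan to remove the .py files afterwards;
# optimize=2 matches running the interpreter with -OO (strips asserts and docstrings).
import compileall

compileall.compile_dir("mypackage", legacy=True, optimize=2, quiet=1)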
I have a GitHub project available to others. One of the scripts, update.py, checks GitHub every day (via cron) to see if there is a newer version available.
Locally, the script is located at directory /home/user/.Project/update.py
If the version on GitHub is newer, update.py moves /home/user/.Project/ to /home/user/.OldProject/, clones the GitHub repo, and moves/renames the downloaded repo to /home/user/.Project/.
It has worked perfectly for me about five times, but I just realized that the script is moving itself while it is still running. Are there any unforeseen consequences to this approach, and is there a better way?
As long as all of the code used by the script has been compiled and loaded into the Python VM, there will be no issue with the source moving: it remains resident in memory until the process ends or is replaced (or is swapped out, but since it is considered dirty data it will be swapped back in exactly as it was). The operating system, though, may block the move operation if any of the files are still open.
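For reference, a rough sketch of the update flow you describe; the paths and repository URL are placeholders. The running script keeps executing from the bytecode already loaded in memory even after its own directory has been moved:

import shutil
import subprocess

PROJECT = "/home/user/.Project"
OLD = "/home/user/.OldProject"
REPO = "https://github.com/user/Project.git"   # placeholder URL

shutil.rmtree(OLD, ignore_errors=True)          # drop the previous backup
shutil.move(PROJECT, OLD)                       # update.py moves here too
subprocess.run(["git", "clone", REPO, PROJECT], check=True)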
Would it be possible to create a python module that lazily downloads and installs submodules as needed? I've worked with "subclassed" modules that mimic real modules, but I've never tried to do so with downloads involved. Is there a guaranteed directory that I can download source code and data to, that the module would then be able to use on subsequent runs?
To make this more concrete, here is the ideal behavior:
User runs pip install magic_module and the lightweight magic_module is installed to their system.
User runs the code import magic_module.alpha
The code goes to a predetermined URL, is told that there is an "alpha" subpackage, and is then given the URLs of the alpha.py and alpha.csv files.
The system downloads these files to somewhere that it knows about, and then loads the alpha module.
On subsequent runs, the user is able to take advantage of the downloaded files to skip the server trip.
At some point down the road, the user could run import magic_module.alpha; magic_module.alpha._upgrade() from the command line to clear the cache and get the latest version.
Is this possible? Is this reasonable? What kinds of problems will I run into with permissions?
Doable, certainly. The core feature will probably be import hooks. The relevant module would be importlib in python 3.
Extending the import mechanism is needed when you want to load modules that are stored in a non-standard way. Examples include [...] modules that are loaded from a database over a network.
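A minimal sketch of such a hook, assuming magic_module itself is installed normally and a hypothetical base URL serves plain .py files; real code would need caching, integrity checks and error handling, per the caveats below:

import importlib.abc
import importlib.util
import sys
import urllib.request

BASE_URL = "https://example.com/magic_module"   # hypothetical server

class RemoteFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith("magic_module."):
            return None          # let the normal machinery handle everything else
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None              # use the default module creation

    def exec_module(self, module):
        name = module.__name__.rsplit(".", 1)[-1]
        source = urllib.request.urlopen(BASE_URL + "/" + name + ".py").read()
        exec(compile(source, "<remote " + name + ">", "exec"), module.__dict__)

sys.meta_path.insert(0, RemoteFinder())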
Convenient, probably not. The import machinery is one of the parts of python that has seen several changes over releases. It's undergoing a full refactoring right now, with most of the existing things being deprecated.
Reasonable, well it's up to you. Here are some caveats I can think of:
Tricky to get right, especially if you have to support several python versions.
What about error handling? Should the application be prepared for imports to fail in normal circumstances? Should it degrade gracefully, or just crash and spew a traceback?
Security? Basically you're downloading code from someplace, how do you ensure the connection is not being hijacked?
How about versioning? If you update some of the remote modules, how can you make the application download the correct version?
Dependencies? Pushing of security updates? Permissions management?
Summing it up, you'll have to solve most of the problems of a package manager, along with securing downloads and managing permissions. All of those issues are tricky to begin with and easy to get wrong, with dire consequences.
So with all that in mind, it really comes down to how much resources you deem worth investing into that, and what value that adds over a regular use of readily available tools such as pip.
(the permission question cannot really be answered until you come up with a design for your package manager)
Here's the problem:
In one of Django views.py I have the following code:
from kml_generator import KML_generator

#login_required(login_url='/dev/login')
def search(request):
    if request.POST:
        result, SF = Validate(request, Activities)
        val = result.values('id')
        KML_generator(result1=val, user=request.user)
It basically imports the module kml_generator and calls the class KML_generator from it. That class generates a .kml file which is then shown on OpenLayers. It works as it should, but I want to change it.
And now:
Why, when I change code inside the module kml_generator, does it not affect the behaviour? I tried everything; I even put errors in there and it still works like a charm...
So here's the question:
How do I change it? Does Django have some kind of 'build' or 'compile' step? Do I need to run it for my changes to take effect?
PS. It's all running on Apache using wsgi.py.
PS2. OK, this is embarrassing, but an outside company developed a nice dynamic Django website for us, and now I do not know why it doesn't behave the way I thought it would.
You need to restart the Apache server for Django to pick up changes.
Python loads source files just once, when a module is imported; the compiled bytecode is then kept in memory. At import time Python also caches the bytecode in a .pyc file next to the original source file (under __pycache__/ in Python 3). You can verify that a fresh import has taken place by comparing the timestamps of the .py file and the corresponding .pyc file.
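For example, on Python 3 you can compare the two timestamps like this (using the kml_generator module from your question; the exact cache file name depends on the interpreter version):

import importlib.util
import os

import kml_generator   # the module from the question

src = kml_generator.__file__
pyc = importlib.util.cache_from_source(src)   # e.g. __pycache__/kml_generator.cpython-37.pyc
print("source last modified:", os.path.getmtime(src))
print("bytecode cache written:", os.path.getmtime(pyc))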
A graceful restart should suffice; run apache2ctl graceful as root on your server.
In future, you may want to get yourself a development setup; running the same code (from a VCS, of course), but using the built-in Django development server:
python manage.py runserver
The Django development server does its best to reload code when you change it. This is a development feature only (watching files for changes costs performance).
Last but not least, try to avoid altering third-party libraries. Use subclassing or monkeypatching instead, and perhaps the upstream author would be willing to implement new features for you or accept patches. That way you don't have to maintain those changes yourself across versions either.
I'm getting seriously frustrated at how slow python startup is. Just importing more or less basic modules takes a second, since python runs down the sys.path looking for matching files (and generating 4 stat() calls - ["foo", "foo.py", "foo.pyc", "foo.so"] - for each check). For a complicated project environment, with tons of different directories, this can take around 5 seconds -- all to run a script that might fail instantly.
Do folks have suggestions for how to speed up this process? For instance, one hack I've seen is to set the LD_PRELOAD_32 environment variable to a library that caches the result of ENOENT calls (e.g. failed stat() calls) between runs. Of course, this has all sorts of problems (potentially confusing non-python programs, negative caching, etc.).
Zipping up as many .pyc files as feasible (with the proper directory structure for packages) and putting that zipfile as the very first entry in sys.path (on the best available local disk, ideally) can speed up startup times a lot.
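A rough sketch of that approach, assuming a hypothetical deps/ directory holding the top-level packages you control:

# Compile deps/ to .pyc files and bundle them into deps.zip with their
# package-relative paths, then put deps.zip first on sys.path (or in
# PYTHONPATH) so each import is satisfied by a single lookup inside the zip.
import compileall
import os
import zipfile

compileall.compile_dir("deps", legacy=True, quiet=1)

with zipfile.ZipFile("deps.zip", "w") as zf:
    for root, _, files in os.walk("deps"):
        for name in files:
            if name.endswith(".pyc"):
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, "deps"))

# then run with, e.g.: PYTHONPATH=deps.zip python yourscript.py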
The first things that come to mind are:
Try a smaller path
Make sure your modules are pyc's so they'll load faster
Make sure you don't double import, or import too much
Other than that, are you sure that the disk operations are what's bogging you down? Is your disk/operating system really busy or old and slow?
Maybe a defrag is in order?
When trying to speed things up, profiling is key. Otherwise, how will you know which parts of your code are really the slow ones?
A while ago I created the runtime and import-profile visualizer tuna, and I think it may be useful here. Simply create an import profile (with Python 3.7+) and run tuna on it:
python3.7 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
If you run out of options, you can create a ramdisk to store your python packages. A ramdisk appears as a directory in your file system, but will actually be mapped directly to your computer's RAM. Here are some instructions for Linux/Redhat.
Beware: A ramdisk is volatile, so you'll also need to keep a backup of your files on your regular hard drive, otherwise you'll lose your data when your computer shuts down.
Something's missing from your premise -- I've never seen more-or-less basic modules take over a second to import, and I'm not running Python on what I would call cutting-edge hardware. Either you're running on some seriously old hardware, or on an overloaded machine, or your OS or Python installation is broken in some way. Or you're not really importing "basic" modules.
If it's any of the first three issues, you need to look at the root problem for a solution. If it's the last, we really need to know what the specific packages are to be of any help.