Are there any benefits, performance or otherwise, to avoiding .pyc files, apart from the convenience of not having a bunch of these files in the source folder?
I don't think there really are. .pyc files are cached bytecode files; they save startup time because Python does not have to recompile your Python files every time you start the interpreter.
At most, switching off bytecode compilation lets you measure how much time the interpreter spends on this step. If you want to compare how much time is saved, remove all the .pyc files in your project, and time Python by using the -B switch to the interpreter:
$ time python -B yourproject
then run again without the -B switch:
$ time python yourproject
It could be that the user under which you want to run your program does not have write access to the source code directories; for example, on a web server where you do not want remote users to have any chance of altering your source code through a vulnerability. In such cases I'd use the included compileall module to byte-compile everything as a privileged user rather than forgo writing .pyc files.
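A minimal sketch of that approach, with a placeholder project path:

import compileall

# Byte-compile the whole source tree once, running as a user that does have
# write access; force=True recompiles even where .pyc files already exist.
compileall.compile_dir('/path/to/yourproject', force=True)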
One reason I can think of: during development, if you remove or rename a .py file, the .pyc file will stay around in your local copy with the old name and the old bytecode, and that stale bytecode can still satisfy imports of the old module name. Since you don't normally commit .pyc files to version control, this can lead to ImportErrors that don't happen on your machine but do on everyone else's.
It is possible that .pyc files could encourage people to think they need not maintain or ship the original source. .pyc files are also tied to the Python version (and sometimes the build) that produced them, so they might not be portable between systems. When moving Python modules it is safer to leave the .pyc files behind and just copy or ship the source, letting the host Python generate new .pyc files.
By the way, from Python 3.2 onwards the .pyc files no longer sit next to the source files; they go into a __pycache__ subfolder inside the source folder.
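For example (assuming Python 3.4+, where importlib.util.cache_from_source is available; the module name is a placeholder):

import importlib.util

# Shows where the cached bytecode for a source file would live; the file name
# embeds a version tag such as 'cpython-312', so it varies per interpreter.
print(importlib.util.cache_from_source('yourmodule.py'))
# e.g. __pycache__/yourmodule.cpython-312.pyc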
Related
I know that similar questions have already been answered:
What will happen if I modify a Python script while it's running?
When are .pyc files refreshed?
Is it possible to replace a python file while its running
changing a python script while it is running
but I still can't find the clear answer to my question.
I have a main.py file and other *.py modules that I import from the main file. I run python.exe main.py from a (Win) console and the Python interpreter generates *.pyc files. Then I change some *.py source files and run python.exe main.py again in another console (while the first instance is still running). The Python interpreter regenerates *.pyc files only for the *.py source files I changed, while the other *.pyc files remain intact.
As I understand it, and as the answers to those questions suggest, the first instance of the running program loaded the first version of the *.pyc files into memory, and the second instance loaded the second version of the *.pyc files into memory.
My question is:
Are there any circumstances where the first instance will need or want to reload the *.pyc files from disk into memory again (some swap between memory and disk, or something similar), or are they loaded into memory for good (until the end of that first instance's run)? Because if there are, the first instance would then reload some of the new *.pyc files and could well crash.
I know I can deliberately reload modules in my Python source files:
How do I unload (reload) a module?
but I don't do that in my source files. Is there some other danger?
Why am I asking this question? I made a strategy to upgrade my Python GUI program by simply copying *.py files over the LAN to the clients' shared folders. On a client (Win) PC the user can have, for example, two or three instances of the GUI program open. While the user is running those instances on his/her client PC, I make the upgrade (just copy some changed *.py files to their PCs over the LAN). He/she closes one of those programs (aware of the upgrade or not, it doesn't matter), starts it again, and the Python interpreter regenerates some *.pyc files. Again, is there any danger that the first two instances will ever need to reload *.pyc files, or (as far as they are concerned) are they loaded into memory for good?
Just for fun, I did exactly that as a test: I even deleted all *.pyc files while all three instances were running, and they never needed any of the *.pyc files again (they never regenerated them) in those sessions.
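For what it's worth, that behaviour is consistent with how imports work: after the first import a module lives in the in-memory sys.modules cache, so the .pyc on disk is not consulted again unless something calls reload(). A quick, hedged illustration using a stdlib module in place of one of my own:

import sys

import json                           # first import: loaded from disk (source or .pyc)
first = sys.modules['json']

import json                           # later imports are answered from sys.modules only
assert sys.modules['json'] is first   # same object; nothing is re-read from disk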
I just need confirmation that it works that way, so that I can be sure that making upgrades this way is safe.
I have a Python application running on an embedded Linux system. I have realized that, for the imported modules, the Python interpreter is not saving the compiled .pyc files to the filesystem as it normally would.
How can I get the interpreter to save them? File system permissions are right.
There are a number of places where this enabled-by-default behavior could be turned off.
PYTHONDONTWRITEBYTECODE could be set in the environment.
sys.dont_write_bytecode could be set through an out-of-band mechanism (i.e. site-local initialization files, or a patched interpreter build).
File permissions could fail to permit it. This need not be obvious! Anything from filesystem mount flags to SELinux tags could have this result. I'd suggest using strace or a similar tool (as available for your platform) to determine whether any attempts to create these files are being made and why they fail.
On an embedded system, it makes much more sense to make this an explicit step rather than runtime behavior: this ensures that performance is consistent (rather than having some runs take longer than others to execute). Use py_compile or compileall to compile everything explicitly ahead of time.
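A minimal sketch of both checks, with a placeholder module path (py_compile and compileall are both in the standard library):

import sys
import py_compile

# True means the interpreter was told not to write .pyc files
# (via -B, PYTHONDONTWRITEBYTECODE, or site-local configuration).
print(sys.dont_write_bytecode)

# Explicit ahead-of-time compilation; with doraise=True a broken source
# file raises py_compile.PyCompileError instead of printing to stderr.
py_compile.compile('/path/to/yourmodule.py', doraise=True)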
I'm relatively new to Python and used to C, and I'm trying to find a way to compile several .py files in a directory together as one file. Some of the methods I've seen are to make an egg or to bundle them up as a zip file and run it with 'python foo.zip', but I'm pretty restricted in those options.
The .zip method is closest to what I'm after, but what I need is more along the lines of importing that file from the main script rather than passing it as an argument to the interpreter.
I have to run this code on several machines and would rather not have to copy a whole folder of modules with it, and would also rather not have to paste all of my code into one file.
Caveats: I'm running a pretty old version of Python (2.4.3) on machines that are cut off from the Internet and that I don't have physical access to, so I can't install other modules. I would have to pull this off with old vanilla Python.
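One approach that should work even on a vanilla 2.4 interpreter is zipimport: put the zip of modules on sys.path and import them normally from main.py. A rough sketch, where modules.zip and mymodule are placeholder names:

import sys

# A zip archive on sys.path is treated like a directory of modules
# (zipimport has been built into Python since 2.3), so main.py can stay
# a plain script sitting next to the archive.
sys.path.insert(0, 'modules.zip')

import mymodule              # loaded from inside modules.zip
mymodule.do_something()      # hypothetical function, just to show normal use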
I've created a virtualenv for my project and checked it into source control. I've installed a few projects into the virtualenv with pip: django, south, and pymysql. After the fact I realized that I had not set up source control for ignoring .pyc files. Could there be any subtle problems in simply removing all .pyc files from my project's repository and then putting in place the appropriate file ignore rules? Or is removing a .pyc file always a safe thing to do?
That is fine, just remove them!
Python auto-generates them from the corresponding .py file any time it wants to, so you needn't worry about simply deleting them all from your repository.
A couple of related tips: if you don't want them generated at all on your local dev machine, set the environment variable PYTHONDONTWRITEBYTECODE=1. Python 3.2 fixed the annoyance of source folders cluttered with .pyc files by introducing the __pycache__ subfolder.
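If you want to sweep the existing ones out of your working copy before adding the ignore rule, here is a small sketch; it assumes it is run from the repository root and it deletes files, so adjust with care:

import os

# Walk the checkout and delete every stale .pyc; Python regenerates them
# on demand the next time the modules are imported.
for root, dirs, files in os.walk('.'):
    for name in files:
        if name.endswith('.pyc'):
            os.remove(os.path.join(root, name))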
I am writing a python script for automating the backing up of the PostgreSQL data directory to enable "Continuous Archiving and Point-In-Time Recovery".
I would like to use Python's tarfile library to create an archive.
The complication is that the data files can change during the backup. To quote the PostgreSQL manual:
Some file system backup tools emit warnings or errors if the files they are trying to copy change while the copy proceeds. When taking a base backup of an active database, this situation is normal and not an error. However, you need to ensure that you can distinguish complaints of this sort from real errors.
... some versions of GNU tar return an error code indistinguishable from a fatal error if a file was truncated while tar was copying it. Fortunately, GNU tar versions 1.16 and later exit with 1 if a file was changed during the backup, and 2 for other errors.
When copying the files from the data directory, what exceptions should I be expecting in Python? And how can I determine whether an error is critical?
If you are using the tarfile module, the implementation of this is essentially up to you. GNU tar checks the file size and ctime before and after it processes a file and complains if they differ. It's not an actual error, it's just something that tar feels like it should mention. But in your use case, you don't care about that, so you can just forget about it. The tarfile module certainly won't complain about this by itself.
As far as other exceptions, the possibilities are endless. Pretty much anything that Python throws is probably fatal in this case.
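A hedged sketch of that idea, mirroring GNU tar's size/ctime comparison around tarfile.add; the paths are placeholders, and the decision about which OSError values count as fatal is still yours:

import os
import tarfile

def add_with_change_check(tar, path):
    # Compare size and ctime before and after adding, the same check GNU tar
    # performs; a change on a live database cluster is normal, not fatal.
    before = os.stat(path)
    tar.add(path)
    after = os.stat(path)
    if (before.st_size, before.st_ctime) != (after.st_size, after.st_ctime):
        print('warning: %s changed during backup' % path)

with tarfile.open('base_backup.tar.gz', 'w:gz') as tar:
    for root, dirs, files in os.walk('/path/to/pgdata'):
        for name in files:
            full = os.path.join(root, name)
            try:
                add_with_change_check(tar, full)
            except (OSError, IOError) as err:
                # A file that vanished or shrank mid-copy is expected on a
                # live cluster; anything else is probably a real error.
                print('error adding %s: %s' % (full, err))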
Continuous archiving is explained in the PostgreSQL documentation. It is already implemented, so there is no need to write an additional script, but you will end up with many files.
24.3.2. Making a Base Backup: follow the instructions and you can create the tar archive; the data files will not change.
24.3.1. Setting up WAL archiving: here you have to archive the new WAL files; they will not change either, but they are separate files.
I think it is not safe to add WAL files to the base backup. If the server crashes while you are adding a file to the base backup, the backup may become corrupt.