Should I store temporary files in the __pycache__ folder?

Should I store temporary files in the __pycache__ folder? - python

I am writing a memoizing decorator for Python 3. The plan is to pickle the cache in a temporary file to save time across multiple executions.
The cache will be deleted if the file containing the decorated function has been changed, just like the .pyc file in __pycache__. So then I started thinking about uncluttering the user's project directory by placing the pickled cache in __pycache__. Are there side effects of storing files in this directory? Is it misleading or confusing?

You should either...
Have all caching to happen at runtime: This beats your purpose. However, it is the only way not to touch the filesystem at all.
Dedicate a special folder like __memo__ or something: This will allow you to have caching across several executions. However, you'll "pollute" the application's file structure.
Reasons you should never mess up with __pycache__:
__pycache__ is CPython-specific. Although it's most likely the implementation you're using right now, it's not portable/safe/good practice to assume everyone does for no good reason at all.
__pycache__ is intended as a transparent optimization, that is, no one should care if it exists at all or not. This is a design decision from the CPython folks, and so you should not circumvent it.
Due to the above, the aforementioned directory may be disabled if the user wants to. For instance, in CPython, if you do python3 -B myapp.py, no __pycache__ will be created if it doesn't exist, and otherwise will be ignored.
Users often delete __pycache__ because of the above two points for several reasons. I do, at least.
Messing things inside __pycache__ because "it doesn't pollute the application's file structure" is an illusion. The directory can perfectly be considered as "pollution". If the Python interpreter already pollutes stuff, why can't you with __memo__ anyway?

Related

Should I add Python's pyc files to .dockerignore?

I've seen several examples of .dockerignore files for Python projects where *.pyc files and/or __pycache__ folders are ignored:
**/__pycache__
*.pyc
Since these files/folders are going to be recreated in the container anyway, I wonder if it's a good practice to do so.

Yes, it's a recommended practice. There are several reasons:
Reduce the size of the resulting image
In .dockerignore you specify files that won't go to the resulting image, it may be crucial when you're building the smallest image. Roughly speaking the size of bytecode files is equal to the size of actual files. Bytecode files aren't intended for distribution, that's why we usually put them into .gitignore as well.
Cache related problems
In earlier versions of Python 3.x there were several cached related issues:
Python’s scheme for caching bytecode in .pyc files did not work well
in environments with multiple Python interpreters. If one interpreter
encountered a cached file created by another interpreter, it would
recompile the source and overwrite the cached file, thus losing the
benefits of caching.
Since Python 3.2 all the cached files prefixed with interpreter version as mymodule.cpython-32.pyc and presented under __pychache__ directory. By the way, starting from Python 3.8 you can even control a directory where the cache will be stored. It may be useful when you're restricting write access to the directory but still want to get benefits of cache usage.
Usually, the cache system works perfectly, but someday something may go wrong. It worth to note that the cached .pyc (lives in the same directory) file will be used instead of the .py file if the .py the file is missing. In practice, it's not a common occurrence, but if some stuff keeps up being "there", thinking about remove cache files is a good point. It may be important when you're experimenting with the cache system in Python or executing scripts in different environments.
Security reasons
Most likely that you don't even need to think about it, but cache files can contain some sort of sensitive information. Due to the current implementation, in .pyc files presented an absolute path to the actual files. There are situations when you don't want to share such information.
It seems that interacting with bytecode files is a quite frequent necessity, for example, django-extensions have appropriate options compile_pyc and clean_pyc.

Pythonic way for magaging and using user-created shared libraries

tl;dr I have a directory of common files outside of my various project directories. What is the pythonic way of using/importing these common files inside my projects, and for building them into an output directory.
Background:
I'm in school and taking a data structures class that uses Python as the language. I'm learning the languages as I take the class but have having some issues trying to maintain a shared code base.
In all of the other languages I've used, both compiled and interpreted, there has been a fairly intuitive way of being able to keep shared modules separate from the code that is using them so that updating a shared module doesn't require updates to the calling code.
This is how I initially had my directory structure organized.
/.../Projects
Assignment_1
__init__.py
classA.py
classB.py
Assignment_2
__init__.pu
classC.py
(etc)
After realizing that much of the functionality of classA and classB would be required later on, I reorganized to this:
/.../Projects
Common
Sorters
__init__.py
BubbleSort.py
MergeSort.py
__init__.py
SimpleProfiler.py
Assignment_1
__init__.py
main.py
Assignment_2
__init__.py
main.py
My issue is that I can't find a good way of importing things like SimpleProfiler or MergeSort from main,py. Right now I'm manually coping all of the Common files into each assignment, which is bad.
I understand that one possible solution is to update the path to include the common folder form within each main.py file, but I've also read that this is very hacky and isn't encouraged.
Another Stackoverflow answer to a similar question suggested that the user structure everything under one large project. I tried this but still couldn't import modules from one sibling into another sibling.
My other issue is how to package everything together when submitting the assignment. In other languages it was easy to implement a build script that would scan the main project for any imports, then copy (flatten) those imported files into a single output directory which I could then compress and submit for grading. I'm using PyCharm, but can't seem to find a way to reference the imports as part of the build process. Is there any kind of script for this? Whatever the solution is, I need to be able to submit the project in such a way that all the instructor has to do is call a single python file (such as main.py)
This issue isn't unique to a school setting, but seems universal to most programming projects. So, what is the pythonic way of managing a shared code base and for building that shared code into a final project?

[Disclaimer: I think it is better to use PYTHONPATH environment variable]
I think of two very similar alternatives:
/.../Projects
Common
Sorters
__init__.py
BubbleSort.py
MergeSort.py
__init__.py
SimpleProfiler.py
assignment_1.py
assignment_2.py
If you use, from assignment_1.py, the following import: from Common.Sorters.BubbleSort import bubble_sort. This is because, by default, PYTHONPATH considers the current path as a valid PYTHONPATH. This is assuming that you are invoking the scripts assignment_* directly.
The other alternative would be:
/.../Projects
Common
Sorters
__init__.py
BubbleSort.py
MergeSort.py
__init__.py
SimpleProfiler.py
Assignment_1
__init__.py
__main__.py
Assignment_2
__init__.py
__main__.py
And invoking the assignments like so: python -m Assignment_1 (from the Projects folder). By default, "executing" a module like that will load its __main__.py code. (This is not a rigurous explanation, although the official one is a bit short).
It works for the same reasons as before: Python interpreter will consider the current path as a valid PYTHONPATH.

Try setting PYTHONPATH environment variable to your directory.
Python first searches for files being imported in sys.path, and the first directory in sys.path is the current directory. PYTHONPATH is the next where python will look for files.

On the minimum end, make a PyCharm run configuration that sets your PYTHONPATH before executing and include the other directory. That way you don't need to do a sys.path call in your code.
Closer to the "perfect" end, make your other directory into a Python package with a setup.py. Then, using the interpreter from your project, do "python path/to/other/dir/setup.py develop" to bring your separately-developed package into the consuming project.

Python init.py proved to be irrelevant in version 3.4?

I am running python 3.4 on the main.py file in the same directory.
/root directory is not in python path. It is simply the current directory that the script is executing in. All pycache folders were deleted after each test
So why exactly is __init__.py important? I thought it was necessary as stated in this post:
What is __init__.py for?
If you remove the init.py file, Python will no longer look for submodules inside that directory, so attempts to import the module will fail.
Right now, it seems to me that __init__.py is nothing more than an optional constructor where we do housekeeping and other optional things like specifying the "all" variable, etc. But not a critical item to have.
Image showing the results of the test:
Can someone explain the discrepancy or what is the cause of this issue?

As confusing as it may be, although the basics will work without __init__.py files, you should probably still use them. Many external tools, as well as package-related functions in the standard library, will not work as expected without them. More words of wisdom here (as well as a misleading accepted answer): Is __init__.py not required for packages in Python 3.3+.

Found Answer
In essence, init.py is not needed, and its purpose is for legacy and optional housekeeping tasks that you may or may not want or need in Python versions 2.7 vs 3.0+. However, it is important to take into account that they have slightly different behavior during more complex parsing if you are building something more complex.
Please refer to the following links for additional reading material:
https://www.python.org/dev/peps/pep-0420/#namespace-packages-today
How do I create a namespace package in Python?
What's the difference between a Python module and a Python package?
https://softwareengineering.stackexchange.com/questions/276888/python-namespace-vs-module-with-underscores

What is the argument for Python to seemingly frown on importing from different directories?

This might be a more broad question, and more related to understanding Python's nature and probably good programming practices in general.
I have a file, called util.py. It has a lot of different small functions I've collected over the past few months that are useful when doing various machine learning tasks.
My thinking is this: I'd like to continue adding important functions to this script as I go. As so, I will want to use import util.py often, now and in the future, in many unrelated projects.
But Python seems to feel like I should only be able to access the code in this file if it lives in my current directly, even if the functions in this file are useful for scripts in different directories. I sense some reason behind the way that works that I don't fully grasp; to me, it seems like I'll be forced to make unnecessary copies often.
If I should have to create a new copy of util.py every time I'm working from within a new directory, on a different project, it won't be long until I have many different version / iterations of this file, scattered all over my hard drive, in various states. I don't desire this degree of modularity in my programming -- for the sake of simplicity, repeatability, and clarity, I want only one file in only one location, accessible to many projects.
The question in a nutshell: What is the argument for Python to seemingly frown on importing from different directories?

If your util.py file contains functions you're using in a lot of different projects, then it's actually a library, and you should package it as such so you can install it in any Python environment with a single line (python setup.py install), and update it if required (Python's packaging ecosystem has several features to track and update library versions).
An added benefit is that right now, if you're doing what the other answers suggested, you have to remember to manually have put util.py in your PYTHONPATH (the "dirty" way). If you try to run one of your programs and you haven't done that, you'll get a cryptic ImportError that doesn't explain much: is it a missing dependency? A typo in the program?
Now think about what happens if someone other than you tries to run the program(s) and gets those error messages.
If you have a library, on the other hand, trying to set up your program will either complain in clear, understandable language that the library is missing or out of date, or (if you've taken the appropriate steps) automatically download and install it so things are ready to roll.
On a related topic, having a file/module/namespace called "util" is a sign of bad design. What are these utilities for? It's the programming equivalent of a "miscellaneous" folder: eventually, everything will end up in it and you'll have no way to know what it contains other than opening it and reading it all.

Another way, is adding the directory/you/want/to/import/from to the path from within the scripts that need it.
You should have a file __init__.py in the same folder where utils.py lives, to tell python to treat the folder as a package. The file __init__.py may be empty, or not, you can define other things in there.
Example:
/home/marcos/python/proj1/
__init__.py
utils.py
/home/marcos/school_projects/final_assignment/
my_scrpyt.py
And then inside my_script.py
import sys
sys.path.append('/home/marcos/python/')
from proj1 import utils
MAX_HEIGHT = utils.SOME_CONSTANT
a_value = utils.some_function()

First, define an environment variable. If you are using bash, for example, then put the following in the appropriate startup file:
export PYTHONPATH=/path/to/my/python/utilities
Now, put your util.py and any of your other common modules or packages in that directory. Now you can import util from anywhere and python will find it.

Dynamically import a package from a directory

I am making a server that listens on a socket. Whenever a new request comes in, the server spawns a new instance of handle_request. Each instance of handle_request.py imports the relevant handler from request_handlers.
server.py
handle_request.py
request_handlers
|_handler_beta
|_handler_1
|_handler_2
While handlerX is a module, request_handlers is not a package. The modules are self-contained and reloaded on each request. The modules may be added, modified or dropped while the program is running.
Question: What is the way to import a module from an arbitrary directory?
Doing my homework, I saw that most questions deal with packages, even one is titled "python: import a module from a folder". Hence I believe this question is distinct. The architecture has been simplified; and yes, I am considering pre-forking with reload on file modification.

Create __init__.py in the directory. That makes it a package. If you're scanning the directory for .py files, you'll probably then want to skip __init__.py.
Then, you can import them with __import__('request_handlers.' + module, fromlist=['']) (the fromlist is important, otherwise you'll get request_handlers rather than the appropriate module).
One way of doing it without an __init__.py would be to put the request_handlers directory in sys.path, but that then makes name clashes with other modules possible. Another way would be with execfile. You can research that more if you want to. I'd do it (and have done it before) the package/__import__ way.

As Chris said: add an __init__.py to your folder. You can use the __import__() function if you do not know the names beforehand or if you do not want to have their names fixed in your code.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.