In Python, how can I efficiently manage references between script files?

I have a fair number of Python scripts containing reusable code that other Python scripts use and reference. However, these scripts tend to be scattered across different directories, and I find it somewhat tedious to include (most often multiple) calls to sys.path.append in my top-level scripts. I just want to provide the 'import' statements without the additional file references in the same script.
Currently, I have this:
import sys
sys.path.append('..//shared1//reusable_foo')
import Foo
sys.path.append('..//shared2//reusable_bar')
import Bar
My preference would be the following:
import Foo
import Bar
My background is primarily in the .NET platform so I am accustomed to having meta files such as *.csproj, *.vbproj, *.sln, etc. to manage and contain the actual file path references outside of the source files. This allows me to just provide 'using' directives (equivalent to Python's import) without exposing all of the references and allowing for reuse of the path references themselves across multiple scripts.
Does Python have equivalent support for this and, if not, what are some techniques and approaches?

The simple answer is to put your reusable code in your site-packages directory, which is in your sys.path.
You can also extend the search path by adding .pth files somewhere in your path.
See https://docs.python.org/2/install/#modifying-python-s-search-path for more details
Oh, and Python 2.6/3.0 add support for PEP 370, the per-user site-packages directory.
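For illustration, a .pth file is just a plain-text list of directories, one per line; at interpreter startup, Python appends each entry that exists to sys.path. A minimal sketch, with hypothetical paths echoing the question — say, a file named shared.pth dropped into site-packages (or the per-user site directory):

/home/me/shared1/reusable_foo
/home/me/shared2/reusable_bar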

If your reusable files are packaged (that is, they include an __init__.py file) and the path to that package is part of your PYTHONPATH or sys.path, then you should be able to do just
import Foo
This question provides a few more details.
(Note: As Jim said, you could also drop your reusable code into your site-packages directory.)
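A minimal sketch of that arrangement, with hypothetical paths — each reusable directory gets an __init__.py, and their common parent goes on PYTHONPATH (or onto sys.path once, centrally):

shared/
    Foo/
        __init__.py
    Bar/
        __init__.py

import sys
sys.path.append('/path/to/shared')  # or set PYTHONPATH=/path/to/shared
import Foo
import Bar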

You can put the reusable stuff in site-packages. That's completely transparent, since it's in sys.path by default.
You can put someName.pth files in site-packages. These files contain the directories in which your actual reusable stuff lives. This is also completely transparent, and it doesn't involve the extra step of installing a change in site-packages.
You can put the directory of the reusable stuff on PYTHONPATH. That's a little less transparent, because you have to make sure it's set. Not rocket science, but not completely transparent.

In one project, I wanted to make sure that the user could put python scripts (that could basically be used as plugins) anywhere. My solution was to put the following in the config file for that project:
[server]
PYPATH_APPEND: /home/jason:/usr/share/some_directory
At program launch, this adds /home/jason and /usr/share/some_directory to the Python path.
Then it's just a simple matter of splitting the string on the colons and appending those directories to the end of sys.path. You may want to consider putting a module in the site-packages directory that contains a function to read that config file and add those directories to sys.path, as sketched below.
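A minimal sketch of such a helper, assuming an INI-style config file named server.conf (the filename and function name are hypothetical; only the [server] section and PYPATH_APPEND key come from the example above):

import sys
try:
    from configparser import ConfigParser   # Python 3
except ImportError:
    from ConfigParser import ConfigParser   # Python 2

def extend_sys_path(config_file='server.conf'):
    """Append each directory from the PYPATH_APPEND entry to sys.path."""
    cfg = ConfigParser()
    cfg.read(config_file)
    for directory in cfg.get('server', 'PYPATH_APPEND').split(':'):
        if directory and directory not in sys.path:
            sys.path.append(directory)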
As others have mentioned, it's a good idea to put as much in site-packages as possible, and to use .pth files as well. But this approach can be useful if you have a script that needs to import a bunch of stuff that's not in site-packages and that you wouldn't want to import from other scripts.
(there may also be a way to do this using .pth files, but I like being able to manipulate the python path in the same place as I put the rest of my configuration info)

The simplest way is to set (or add to) PYTHONPATH, and put (or symlink) your modules and packages into a path contained in PYTHONPATH.

My solution was to package up one utility that imports the module (my_util lives in site-packages):

import sys
import my_util

foo = my_util.import_script('..//shared1//reusable_foo')
if foo is None:
    sys.exit(1)
import os
import sys

def import_script(script_path, log_status=True):
    """Imports a module and returns the handle."""
    # log() and FormatExceptionInfo() below are this answer's own helpers.
    lpath = os.path.split(script_path)
    if lpath[1] == '':
        log('Error in script "%s" in import_script' % (script_path,))
        return None
    # Check if the path is already in sys.path so we don't repeat it.
    npath = None
    if lpath[0] == '':
        npath = '.'
    elif lpath[0] not in sys.path:
        npath = lpath[0]
    if npath is not None:
        try:
            sys.path.append(npath)
        except Exception:
            if log_status:
                log('Error adding path "%s" in import_script' % npath)
            return None
    try:
        mod = __import__(lpath[1])
    except Exception:
        error_trace, error_reason = FormatExceptionInfo()
        if log_status:
            log('Error importing "%s" module in import_script: %s'
                % (script_path, error_trace + error_reason))
        if npath is not None:  # only remove a path we actually added
            sys.path.remove(npath)
        return None
    return mod

Related

Is this the approved way to access data adjacent to/packaged with a Python script?

I have a Python script that needs some data that's stored in a file that will always be in the same location as the script. I have a setup.py for the script, and I want to make sure it's pip installable in a wide variety of environments, and can be turned into a standalone executable if necessary.
Currently the script runs with Python 2.7 and Python 3.3 or higher (though I don't have a test environment for 3.3 so I can't be sure about that).
I came up with this method to get the data. This script isn't part of a module directory with __init__.py or anything; it's just a standalone file that works if run with python directly, but it also has an entry point defined in the setup.py file. It's all one file. Is this the correct way?
def fetch_wordlist():
    wordlist = 'wordlist.txt'
    try:
        import importlib.resources as res
        return res.read_binary(__file__, wordlist)
    except ImportError:
        pass
    try:
        import pkg_resources as resources
        req = resources.Requirement.parse('makepw')
        wordlist = resources.resource_filename(req, wordlist)
    except ImportError:
        import os.path
        wordlist = os.path.join(os.path.dirname(__file__), wordlist)
    with open(wordlist, 'rb') as f:
        return f.read()
This seems ridiculously complex. Also, it seems to rely on the package management system in ways I'm uncomfortable with. The script no longer works unless it's been pip-installed, and that also doesn't seem desirable.
Resources living on the filesystem
The standard way to read a file adjacent to your python script would be:
a) If you've got Python >= 3.4, I'd suggest you use the pathlib module, like this:

from pathlib import Path

def fetch_wordlist(filename="wordlist.txt"):
    return (Path(__file__).parent / filename).read_text()

if __name__ == '__main__':
    print(fetch_wordlist())
b) And if you're still using a Python version < 3.4, or you want to use the good old os.path module, do something like this:

import os

def fetch_wordlist(filename="wordlist.txt"):
    with open(os.path.join(os.path.dirname(__file__), filename)) as f:
        return f.read()

if __name__ == '__main__':
    print(fetch_wordlist())
Also, I'd suggest you catch exceptions in the outer callers. The methods above are the standard way to read files in Python, so you don't need to wrap them in a function like fetch_wordlist; put differently, reading a file in Python is an "atomic" operation.
Now, it may happen that you've frozen your program using a freezer such as cx_Freeze, PyInstaller, or similar; in that case you'd need to detect it. Here's a simple way to check:
a) using os.path:

import os
import sys

if getattr(sys, 'frozen', False):
    app_path = os.path.dirname(sys.executable)
elif __file__:
    app_path = os.path.dirname(__file__)
b) using pathlib:

import sys
from pathlib import Path

if getattr(sys, 'frozen', False):
    app_path = Path(sys.executable).parent
elif __file__:
    app_path = Path(__file__).parent
Resources living inside a zip file
The above solutions work while the code lives on the file system, but not when the package lives inside a zip file. In that case you can use importlib.resources (new in version 3.7), or the pkg_resources combo you've already shown in the question (possibly wrapped up in some helpers), or a nice third-party library called importlib_resources that works with both old and modern Python versions:
pypi: https://pypi.org/project/importlib_resources/
documentation: https://importlib-resources.readthedocs.io/en/latest/
Specifically for your particular problem, I'd suggest you take a look at https://importlib-resources.readthedocs.io/en/latest/using.html#file-system-or-zip-file.
If you want to know what that library is doing behind the curtains because you're not willing to install any third-party library, you can find the code for py2 here and py3 here, in case you want to extract the relevant bits for your particular problem.
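A minimal sketch of that approach, under the assumption that the data file ships inside an importable package (the name makepw comes from the question; the package layout is hypothetical):

try:
    from importlib import resources          # stdlib, Python >= 3.7
except ImportError:
    import importlib_resources as resources  # third-party backport

def fetch_wordlist():
    # works whether the package sits on the filesystem or inside a zip
    return resources.read_binary('makepw', 'wordlist.txt')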
I'm going to go out on a limb and make an assumption, because it may drastically simplify your problem. The only way I can imagine you can claim that this data is "stored in a file that will always be in the same location as the script" is that you created this data once and put it in a file in the source code directory. Even though this data is binary, have you considered making it a literal byte-string in a Python file, and then simply importing it as you would anything else?
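A minimal sketch of that idea — the module and variable names are hypothetical. A small generator script embeds the data once, and the main script just imports it with no file access at runtime:

# generate_wordlist_module.py: run once to embed the data
with open('wordlist.txt', 'rb') as f:
    data = f.read()
with open('wordlist_data.py', 'w') as out:
    out.write('DATA = %r\n' % data)

# in the main script:
# from wordlist_data import DATA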
You're right that your method of reading a file is unnecessarily complex. Unless you have a really specific reason to use the importlib and pkg_resources modules, it's rather simple.
import os

def fetch_wordlist():
    if not os.path.exists('wordlist.txt'):
        raise FileNotFoundError
    with open('wordlist.txt', 'rb') as wordlist:
        return wordlist.read()
You haven't given much information about your script, so I can't comment on why it doesn't work unless it's been pip-installed. My best guess: your script is probably packed into a Python package.

Where to place absolute paths in a python project?

I have the following project structure:

ts_tools
    /bin
    /docs
    /lib
    /ts_tools
        /data_transfer
            /tests
            data_import.py
            __init__.py
        /data_manipulation
            /tests
            data_smoothing.py
            __init__.py
        __init__.py
    config.yaml.conf
    setup.py
    LICENSE
    README.md
    TODO.md
I would like to import data with the data_import.py file from an external source. I use the config.yaml.conf file to specify the absolute paths of the data with:
root_path:
  windows:
    data_fundamental: C:\\Users\\Malcom\\Data_Fundamental
    data_event: C:\\Users\\Malcom\\Data_Event
  linux:
    data_fundamental: /home/data/data_fundamental
    data_event: /home/data/data_event
The respective paths should be available for all tools in the ts_tools package (i.e. data_import.py and data_smoothing.py). Furthermore, the program should identify the os and choose the path structure accordingly.
I know how to set the paths with the yaml file using

import yaml

with open("config.yaml.conf", "r") as ymlfile:
    cfg = yaml.load(ymlfile)
and I know how to discriminate between the operating systems with

import platform

if platform.system().lower() == 'windows':
    ROOT_DATA_PATH = cfg['root_path']['windows']
else:
    ROOT_DATA_PATH = cfg['root_path']['linux']
but I don't know where to place these code snippets. I don't think it's appropriate to put them in the setup.py file, but I also consider it inappropriate to create a new .py file just for this. What is a good design for this problem? Where should I specify absolute file paths? Is my approach a step in the right direction?
Thank you in advance.
In this case, you can make it relative to the home directory, so you can have ~/data_fundamental and ~/data_event (which should be equivalent on both platforms). You can expand this with os.path.expanduser:
import os.path

def get_path(s):
    return os.path.normpath(os.path.normcase(
        os.path.expanduser(os.path.expandvars(s))
    ))

# get_path('~/data_fundamental') on Windows:
#     r'c:\users\malcom\data_fundamental'
# get_path('~/data_fundamental') on Linux:
#     '/home/data/data_fundamental'
# (assuming your username on Windows is Malcom and on Linux is data,
# and you haven't changed the default home path)
In any case, having two different setups might be overly confusing, and you should expand ~, %VARS%, and ${VARS} anyway to make setup easier and behave as users expect.
Your other alternatives include:
Reading from environment variables
Writing it in setup.py (you should probably allow some way to change where the config file is, as setup.py might put it in a write-protected location)
You could also have no default at all and, when none is given, either derive one from sys.platform or raise an error telling the user to set it.
Let's identify two different types of files/data:
Files/data written by the user or for the user during installation/deploy
Files/data written by the coder
It can be okay to have absolute paths in files/data defined by the user or generated by the program executing on the user machine.
Absolute paths are intrinsically more fragile than relative paths, but it's not that bad in the first case.
In the second case you should never use absolute paths. I see that you are even using two different paths for Windows and Linux. You don't have to do that, and you shouldn't.
In Python you have things such as os.path.expanduser('~') to find the user's home directory, or packages like appdirs. You want to be as cross-platform as possible, and with Python that is almost always possible.
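A minimal sketch of that idea, using the third-party appdirs package (pip install appdirs); the directory names mirror the question, but the exact layout is an assumption:

import os.path
from appdirs import user_data_dir

# resolves to a per-user data directory appropriate to the platform,
# e.g. ~/.local/share/ts_tools on Linux
DATA_ROOT = user_data_dir('ts_tools')
DATA_FUNDAMENTAL = os.path.join(DATA_ROOT, 'data_fundamental')
DATA_EVENT = os.path.join(DATA_ROOT, 'data_event')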

How do I use test resources (like a fixed yaml file) with pytest?

I've looked around the docs on the pytest website, but haven't found a clear example of working with 'test resources', such as reading in fixed files during unit tests. Something similar to what http://jlorenzen.blogspot.com/2007/06/proper-way-to-access-file-resources-in.html describes for Java.
For example, if I have a yaml file checked in to source control, what is the right way to write a test which loads from that file? I think this boils down to understanding the right way to access a 'resource file' on the python equivalent of the classpath (PYTHONPATH?).
This seems like it should be simple. Is there an easy solution?
Perhaps what you are looking for is pkg_resources or pkgutil. For example, if you have a module within your python source called "resources", you could read your "resourcefile" using:
import pkg_resources

with open(pkg_resources.resource_filename("resources", "resourcefile")) as infile:
    for line in infile:
        print(line)
or:
import pkgutil
import tempfile

with tempfile.TemporaryFile() as outfile:
    outfile.write(pkgutil.get_data("resources", "resourcefile"))
The latter even works when your "script" is an executable zip file. The former works without needing to unpack your resources from an egg.
Note that creating a subdirectory of your source does not make it a module. You need to add a file named __init__.py within the directory for it to be visible as a module for the purposes of pkg_resources and pkgutil. __init__.py can be empty.
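Tying this back to the yaml case from the question — a minimal sketch, where the tests.resources package and fixture.yaml filename are hypothetical:

import pkgutil
import yaml

def test_loads_fixture():
    # get_data works from source checkouts, installed packages, and zips
    raw = pkgutil.get_data("tests.resources", "fixture.yaml")
    data = yaml.safe_load(raw)
    assert data is not None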
I think "resource file" is whatever definition you give to it in python (in Java, resource files can be bundled into jar files with ordinary Java classes, and Java provides library functions to access this information).
An equivalent solution might be to read the PYTHONPATH environment variable, define your "resource file" as a relative path, and then walk the PYTHONPATH entries looking for it. Here's an example:
import os

def find_resource():
    pythonpath = os.environ['PYTHONPATH']
    file_relative_path = os.path.join('subdir', 'resourcefile')  # e.g. subdir/resourcefile
    for dir in pythonpath.split(os.pathsep):
        resource_path = os.path.join(dir, file_relative_path)
        if os.path.exists(resource_path):
            return resource_path

This function returns a full path for the first file that exists on the PYTHONPATH.

Open a file from PYTHONPATH

In a program, and obviously being influenced by the way Java does things, I want to read a static file (a log configuration file, actually) from a directory within the interpreter's PYTHONPATH. I know I could do something like:
import foo
a = foo.__path__
conf = open(a[0] + "/logging.conf")
but I don't know if this is the "Pythonic" way of doing things. How could I distribute the logging configuration file in a way that my application does not need to be externally configured to read it?
In general, that's fine, though I'm not sure you want a[0] above (that will just give you the first character of the path), and you should use os.path.join instead of just appending / to be cross-platform compatible. You might consider making the path canonical, i.e. os.path.abspath(os.path.dirname(foo.__path__)). Note that it won't work if __path__ is in a zip file or other import trickery is in use, but I wouldn't worry about that (it's not normal to do so for the main program in Python, unlike Java).
If you do want to support zipped files, there's pkg_resources, but that's somewhat deprecated at this point (there's no corresponding API I could see in the new packaging module).
Here's a snippet based on the link Nix posted upthread, but written in a more functional style:

import os
import sys

def search_path(pathname_suffix):
    cands = [os.path.join(d, pathname_suffix) for d in sys.path]
    try:
        # list() so this also works on Python 3, where filter() returns an iterator
        return list(filter(os.path.exists, cands))[0]
    except IndexError:
        return None

Is it possible to 'import * from DIRECTORY', then somehow (anyhow) iterate over the loaded modules?

Let me explain the use case...
In a simple python web application framework designed for Google App Engine, I'd like to have my models loaded automatically from a 'models' directory, so all that's needed to add a model to the application is place a file user.py (for example), which contains a class called 'User', in the 'models/' directory.
Being GAE, I can't read from the file system so I can't just read the filenames that way, but it seems to me that I must be able to 'import * from models' (or some equivalent), and retrieve at least a list of module names that were loaded, so I can subject them to further processing logic.
To be clear, I want this to be done WITHOUT having to maintain a separate list of these module names for the application to read from.
You can read from the filesystem in GAE just fine; you just can't write to the filesystem.
from models import * will only import modules listed in __all__ in models/__init__.py; there's no automatic way to import all modules in a package if they're not declared to be part of the package. You just need to read the directory (which you can do) and __import__() everything in it.
As explained in the Python tutorial, you cannot load all .py files from a directory unless you list them manually in the list named __all__ in the package's __init__.py. One of the reasons this is impossible is that it would not work well on case-insensitive file systems: Python would not know which case to use for the module names.
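For reference, such an opt-in list looks like this (the module names are hypothetical):

# models/__init__.py
# Only the modules listed here are pulled in by 'from models import *'.
__all__ = ['user', 'post']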
Let me start by saying that I'm not familiar with Google App Engine, but the following code demonstrates how to import all python files from a directory. In this case, I am importing files from my 'example' directory, which contains one file, 'dynamic_file.py'.
import os
import imp
import glob

def extract_module_names(python_files):
    """Strip the directory and the '.py' suffix from each file path."""
    module_names = []
    for py_file in python_files:
        module_name = os.path.basename(py_file)[:-3]
        module_names.append(module_name)
    return module_names

def load_modules(modules, py_files):
    """Load each file as a module and bind it to its name in globals()."""
    module_count = len(modules)
    for i in range(0, module_count):
        globals()[modules[i]] = imp.load_source(modules[i], py_files[i])

if __name__ == "__main__":
    python_files = glob.glob('example/*.py')
    module_names = extract_module_names(python_files)
    load_modules(module_names, python_files)
    dynamic_file.my_func()  # dynamic_file was bound into globals() above
Also, if you wish to iterate over these modules, you could modify the load_modules function to return a list of the loaded module objects by appending each imp.load_source(..) result to a list, as sketched below.
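A minimal sketch of that modification (same imp-based assumptions as the code above):

def load_modules(modules, py_files):
    loaded = []
    for name, path in zip(modules, py_files):
        mod = imp.load_source(name, path)
        globals()[name] = mod  # keep the name binding from the original
        loaded.append(mod)
    return loaded

# for mod in load_modules(module_names, python_files):
#     print(mod.__name__)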
Hope it helps.
