I/O in classes in other directories in python [duplicate] - python

I am writing a python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user's system.
I've tried a variety of methods, but so far I have had no luck. It seems that most of the "current directory" commands return the directory of the system's python interpreter, and not the directory of the module.
This seems like it ought to be a trivial, common problem. Yet I can't seem to figure it out. Part of the problem is that my data files are not .py files, so I can't use import functions and the like.
Any suggestions?
Right now my package directory looks like:
/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt
I am trying to access data.txt from module*.py!

The standard way to do this is with setuptools packages and pkg_resources.
You can lay out your package according to the following hierarchy, and configure the package setup file to point it to your data resources, as per this link:
http://docs.python.org/distutils/setupscript.html#installing-package-data
You can then re-find and use those files using pkg_resources, as per this link:
http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access
import pkg_resources
DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')

There is often no point in writing an answer that details code that does not work as is, but I believe this to be an exception. Python 3.7 added importlib.resources, which is supposed to replace pkg_resources. It works for accessing resources within a package as long as the resource name contains no slashes. Given the layout
foo/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt
    data2.txt
you could access data2.txt inside package foo with, for example,
importlib.resources.open_binary('foo', 'data2.txt')
but it would fail with an exception for
>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data.txt' must be only a file name
This cannot be fixed except by placing __init__.py in data and then using it as a package:
importlib.resources.open_binary('foo.data', 'data.txt')
The reason for this behaviour is "it is by design"; but the design might change...
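For readers on Python 3.9 or later, the files() API mentioned elsewhere on this page can descend into a plain subdirectory without an extra __init__.py. The following is a minimal sketch under that assumption; the package name foo_demo and the helper names are hypothetical, and the package is built in a temporary directory only so the example can run on its own.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path, name: str = "foo_demo") -> None:
    """Create a throwaway package with a data/ subdirectory (illustration only)."""
    pkg = root / name
    (pkg / "data").mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "data" / "data.txt").write_text("hello from data.txt\n", encoding="utf-8")


def read_package_text(package: str, *parts: str) -> str:
    """Read a text resource, descending one path segment at a time."""
    resource = importlib.resources.files(package)
    for part in parts:
        resource = resource / part
    return resource.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)          # make the demo package importable
    importlib.invalidate_caches()
    try:
        content = read_package_text("foo_demo", "data", "data.txt")
    finally:
        sys.path.remove(tmp)

print(content)
```

In a real project only read_package_text would be needed, called with the installed package's own name.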

You can use __file__ to get the path to the package, like this:
import os
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
with open(DATA_PATH) as data_file:
    print(data_file.read())
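The same idea can be written with pathlib. This is only a sketch: the helper name data_file is made up for illustration, and it takes the module's path as an argument so it can be tried outside a package. Inside module1.py one would pass __file__.

```python
from pathlib import Path


def data_file(module_file: str, *parts: str) -> Path:
    """Return the path of a file stored relative to the given module's directory."""
    return Path(module_file).resolve().parent.joinpath(*parts)


# Inside module1.py the call would be: data_file(__file__, "data", "data.txt")
# The path below is a placeholder and does not need to exist.
example = data_file("/opt/pkg/module1.py", "data", "data.txt")
print(example)
```

Note that this, like any __file__-based approach, assumes the package is installed as ordinary files on disk rather than inside a zip archive.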

To provide a solution that works today: definitely use the pkg_resources API rather than reinventing all those wheels.
When a true filesystem filename is needed, zipped eggs will be extracted to a cache directory:
from pkg_resources import resource_filename, Requirement
path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
Return a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.
from pkg_resources import resource_stream, Requirement
vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
Package Discovery and Resource Access using pkg_resources
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#basic-resource-access

You need a name for your whole package; your directory tree doesn't list that detail. For me, this worked:
import pkg_resources
print(
pkg_resources.resource_filename(__name__, 'data/data.txt')
)
Notably, setuptools does not appear to resolve files based on a name match with packed data files, so you will have to include the data/ prefix pretty much no matter what. You can use os.path.join('data', 'data.txt') if you need alternate directory separators. Generally, though, I find no compatibility problems with hard-coded Unix-style directory separators.

I think I hunted down an answer.
I made a module data_path.py, which I import into my other modules. It contains:
import os
data_path = os.path.join(os.path.dirname(__file__), 'data')
And then I open all my files with
open(os.path.join(data_path,'filename'), <param>)

Related

How do I ensure that a python package module saves results to a sub-directory of that package?

I'm creating a package with the following structure
/package
__init__.py
/sub_package_1
__init__.py
other_stuff.py
/sub_package_2
__init__.py
calc_stuff.py
/results_dir
I want to ensure that calc_stuff.py will save results to /results_dir, unless otherwise specified (yes, I'm not entirely certain having a results directory in my package is the best idea, but it should work well for now). However, since I don't know from where, or on which machine, calc_stuff will be run, I need the package, or at least calc_stuff.py, to know where it is saved.
So far the two approaches I have tried:
from os import path
saved_dir = path.join(path.dirname(__file__), 'results_dir')
and
from pkg_resources import resource_filename
filepath = resource_filename(__name__, 'results_dir')
have only given me paths relative to the root of the package.
What do I need to do to ensure a statement along the lines of:
pickle.dump(my_data, open(os.path.join(full_path,
                                       'results_dir',
                                       'results.pkl'), 'wb'))
Will result in a pickle file being saved into results_dir ?
"I'm not entirely certain having a results directory in my package is the best idea": me neither :)
But, if you were to put a function like the following inside a module in subpackage2, it should return a path consisting of (module path minus filename, 'results_dir', the filename you passed the function as an argument):
def get_save_path(filename):
    import os
    return os.path.join(os.path.dirname(__file__), "results_dir", filename)
C:\Users\me\workspaces\workspace-oxygen\test36\TestPackage\results_dir\foo.ext
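One thing the function above does not do is make sure results_dir exists before something is written into it. A possible variant is sketched below; the base_dir and results_dirname parameters are additions for illustration, and a temporary directory stands in for the package directory so the sketch can run anywhere.

```python
import os
import tempfile


def get_save_path(filename, base_dir, results_dirname="results_dir"):
    """Return base_dir/results_dirname/filename, creating the directory if needed."""
    results_dir = os.path.join(base_dir, results_dirname)
    os.makedirs(results_dir, exist_ok=True)
    return os.path.join(results_dir, filename)


# In calc_stuff.py, base_dir would be os.path.dirname(os.path.abspath(__file__)).
with tempfile.TemporaryDirectory() as tmp:
    save_path = get_save_path("results.pkl", tmp)
    with open(save_path, "wb") as fh:
        fh.write(b"demo")            # stand-in for pickle.dump(my_data, fh)
    saved_ok = os.path.isfile(save_path)

print(saved_ok)
```

Writing into the installed package directory can fail when the package lives in a read-only location, which is one more reason to let callers override the base directory.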

Adding and reading a config.ini file inside python package

I am writing my first python package which I want to upload on PyPI. I structured my code based on this blog post.
I want to store user settings in a config.ini file, read it once (every time the package is run) in a separate Python module in the same package, and save the user settings in global variables of that module. Other modules would then import those variables.
To recreate the error I just edited few lines of code, in the template described in the blog post. (Please refer to it since it would take too much typing to recreate entire thing here in question.)
The only difference is that my stuff.py reads from config file like this:
from ConfigParser import SafeConfigParser
config = SafeConfigParser()
config.read('config.ini')
TEST_KEY = config.get('main', 'test_key')
Here are the contents of config.ini (placed in same dir as stuff.py):
[main]
test_key=value
And my bootstrap.py just imports and print the TEST_KEY
from .stuff import TEST_KEY
def main():
    print(TEST_KEY)
But on executing the package, the import fails with this error:
Traceback (most recent call last):
File "D:\Coding\bootstrap\bootstrap-runner.py", line 8, in <module>
from bootstrap.bootstrap import main
File "D:\Coding\bootstrap\bootstrap\bootstrap.py", line 11, in <module>
from .stuff import TEST_KEY
File "D:\Coding\bootstrap\bootstrap\stuff.py", line 14, in <module>
TEST_KEY = config.get('main', 'test_key')
File "C:\Python27\Lib\ConfigParser.py", line 607, in get
raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'main'
Import keeps giving ConfigParser.NoSectionError, but if you build/run only stuff.py(I use sublime3), the module gives no errors and printing TEST_KEY gives value as output.
Also, this method of import does work when I just use 3 files(config, stuff, main) in a dir and just execute the main as a script. But there I had to import it like this
from stuff import TEST_KEY
I'm just using the explicit relative imports as described in that post but don't have enough understanding of those. I guess the error is due to project structure and import, since running stuff.py as standalone script raises no ConfigParser.NoSectionError.
Other methods of reading the config file once and then using the data in other modules would be really helpful as well.
There are two aspects to this question. First is the weird behavior of ConfigParser. When ConfigParser is unable to locate the .ini file; it never gives, for some annoying reason, an IOError or an error which indicates that it is unable to read the file.
In my case it keeps giving ConfigParser.NoSectionError when the section is clearly present. When I caught the ConfigParser.NoSectionError error it gave an ImportError! But it never tells you that it is simply unable to read the file.
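The silent behaviour can be seen in isolation with the Python 3 configparser module: read() skips files it cannot open and returns the list of files it actually parsed, whereas read_file() takes an already opened file, so a missing file fails at open(). A minimal sketch, with a path that is deliberately never created:

```python
import configparser
import os
import tempfile

config = configparser.ConfigParser()

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "config.ini")  # deliberately never created

    # read() silently skips files it cannot open and returns the ones it parsed.
    parsed = config.read(missing)

    # read_file() needs an open file, so the failure surfaces at open().
    try:
        with open(missing) as fh:
            config.read_file(fh)
        error_name = None
    except FileNotFoundError as exc:
        error_name = type(exc).__name__

print(parsed)
print(error_name)
```

Checking the return value of read(), or using read_file(), turns the confusing NoSectionError into an error that names the real problem.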
Second is how to safely read the data files that are included in your package. The only way I found to do this was to use the module's __file__ attribute. This is how you would safely read the config.ini in the above question, for Python 2.7 and Python 3:
import os

try:
    # Python 3.2+
    from configparser import ConfigParser
except ImportError:
    # Python 2.7
    # Refer to the older SafeConfigParser as ConfigParser
    from ConfigParser import SafeConfigParser as ConfigParser


class BadConfigError(Exception):
    """Custom exception (not a standard Python one) for a missing config.ini."""


config = ConfigParser()
# get the path to config.ini
config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.ini')
# check if the path is to a valid file
if not os.path.isfile(config_path):
    raise BadConfigError(config_path)
config.read(config_path)
TEST_KEY = config.get('main', 'test_key')  # value
This relies on the fact that config.ini is located inside our package bootstrap and is expected to be shipped with it.
The important bit is how you get the config_path:
config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.ini')
__file__ refers to the location of the module file in which it appears, not to the script that was launched. In my question that means the location of stuff.py, which is inside the bootstrap folder, and so is config.ini.
The above line of code then means; get the absolute path to stuff.py; from that get path to the directory containing it; and join that with config.ini (since it is in same directory) to give absolute path to the config.ini. Then you can proceed to read it and raise an exception just in case.
This will work even when you release your package on pip and a user installs it from there.
As a bonus, and digressing from the question a bit: if you are releasing your package on PyPI with data files inside it, you must tell setuptools to include them in your package when you build sdists and bdists. So to include the config.ini in the above package, add the following lines to the setup() call in setup.py:
include_package_data=True,
package_data={
    # If any package contains *.ini files, include them
    '': ['*.ini'],
},
But it still may not work in some cases, e.g. when building wheels. So you also do the same in your MANIFEST.in file:
include LICENSE
include bootstrap/*.ini
abhimanyuPathania : The issue is with path of config.ini in stuff.py. Change config.read('config.ini') to config.read('./bootstrap/config.ini') in stuff.py. I tried the solution. It works for me.
Enjoying Pythoning...

How do I use test resources (like a fixed yaml file) with pytest?

I've looked around the docs on the pytest website, but haven't found a clear example of working with 'test resources', such as reading in fixed files during unit tests. Something similar to what http://jlorenzen.blogspot.com/2007/06/proper-way-to-access-file-resources-in.html describes for Java.
For example, if I have a yaml file checked in to source control, what is the right way to write a test which loads from that file? I think this boils down to understanding the right way to access a 'resource file' on the python equivalent of the classpath (PYTHONPATH?).
This seems like it should be simple. Is there an easy solution?
Perhaps what you are looking for is pkg_resources or pkgutil. For example, if you have a module within your python source called "resources", you could read your "resourcefile" using:
import pkg_resources

with open(pkg_resources.resource_filename("resources", "resourcefile")) as infile:
    for line in infile:
        print(line)
or:
import pkgutil
import tempfile

with tempfile.TemporaryFile() as outfile:
    outfile.write(pkgutil.get_data("resources", "resourcefile"))
The latter even works when your "script" is an executable zip file. The former works without needing to unpack your resources from an egg.
Note that creating a subdirectory of your source does not make it a module. You need to add a file named __init__.py within the directory for it to be visible as a module for the purposes of pkg_resources and pkgutil. __init__.py can be empty.
I think "resource file" is whatever definition you give to it in python (in Java, resource files can be bundled into jar files with ordinary Java classes, and Java provides library functions to access this information).
An equivalent solution might be to access the PYTHONPATH environment variable, define your "resource file" as a relative path, and then trawl the PYTHONPATH looking for it. Here's an example:
import os

def find_on_pythonpath(file_relative_path):
    pythonpath = os.environ.get('PYTHONPATH', '')
    for directory in pythonpath.split(os.pathsep):
        resource_path = os.path.join(directory, file_relative_path)
        if os.path.exists(resource_path):
            return resource_path
    return None  # not found in any PYTHONPATH entry

resource_path = find_on_pythonpath(os.path.join('subdir', 'resourcefile'))  # e.g. subdir/resourcefile
This code snippet returns a full path for the first file that exists on the PYTHONPATH.

Way to access resource files in python

What is the proper way to access resources in Python programs?
Basically in many of my python modules I end up writing code like that:
DIRNAME = os.path.split(__file__)[0]
(...)
template_file = os.path.join(DIRNAME, "template.foo")
Which is OK, but:
It will break if I start to use Python zip packages
It is boilerplate code
In Java I had a function that did exactly the same, but it worked both when the code was lying in a bunch of folders and when it was packaged in a .jar file.
Is there such a function in Python, or is there any other pattern that I might use?
You'll want to look at using either pkgutil.get_data in the stdlib or pkg_resources from setuptools/distribute. Which one you use probably depends on whether you're already using distribute to package your code as an egg.
Since version 3.7 of Python, the proper way to access a file in resources is to use the importlib.resources library.
One can, for example, use the path function to access a particular file in a Python package:
import importlib.resources
with importlib.resources.path("your.package.templates", "template.foo") as template_file:
...
Starting with Python 3.9, this package introduced the files() API, to be preferred over the legacy API.
One can use the files function to access a particular file in a Python package:
template_res = importlib.resources.files("your.package.templates").joinpath("template.foo")
with importlib.resources.as_file(template_res) as template_file:
...
For older versions, I recommend installing and using the importlib-resources library. The documentation also explains in detail how to migrate your old implementation from pkg_resources to importlib-resources.
Trying to understand how we could combine the two aspects together:
Loading resources from the native filesystem
Loading resources packaged in zipped files
Reading through the quick tutorial on zipimport : http://www.doughellmann.com/PyMOTW/zipimport/
I see the following example:
import sys
sys.path.insert(0, 'zipimport_example.zip')
import os
import zipimport
importer = zipimport.zipimporter('zipimport_example.zip')
module = importer.load_module('example_package')
print(module.__file__)
print(module.__loader__.get_data('example_package/README.txt'))
I think that output of __file__ is "zipimport_example.zip/example_package/__init__.pyc"
Need to check how it looks from inside.
But then we could always do something like this:
if ".zip" in example_package.__file__:
    ...  # load using get_data
else:
    ...  # load by building the correct file path
[Edit:] I have tried to work out the example a bit better.
If the package gets imported from a zipped file, then two things happen:
__file__ contains ".zip" in its path.
__loader__ is available in the namespace.
If these two conditions are met then within the package you could do:
print(__loader__.get_data(os.path.join('package_name', 'README.txt')))
else the module was loaded normally and you can follow the regular approach to loading the file.
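A possible way to avoid the ".zip" check altogether is pkgutil.get_data, which asks the package's loader for the bytes and so behaves the same for a directory and for a zip archive. The sketch below builds a small archive in a temporary directory to demonstrate this; the package name zipped_demo_pkg and the archive name are hypothetical.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

PACKAGE = "zipped_demo_pkg"  # hypothetical package name used only for this demo

with tempfile.TemporaryDirectory() as tmp:
    # Build a zip archive containing a package and one data file.
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(PACKAGE + "/__init__.py", "")
        zf.writestr(PACKAGE + "/README.txt", "read from inside the zip\n")

    sys.path.insert(0, archive)      # zip archives on sys.path are importable
    importlib.invalidate_caches()
    try:
        # Resource names are relative to the package and always use "/".
        readme = pkgutil.get_data(PACKAGE, "README.txt")
    finally:
        sys.path.remove(archive)

print(readme.decode("utf-8"))
```

get_data returns bytes, so text resources need to be decoded by the caller.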
I guess the zipimport standard python module could be an answer...
EDIT: well, not the use of the module directly, but using sys.path as shown in the example could be a good way:
I have a zip file test.zip with one Python module test and a file test.foo inside.
To check that the zipped Python module test can be aware of test.foo, it contains this code:
import os
DIRNAME = os.path.dirname(__file__)
if os.path.exists(os.path.join(DIRNAME, 'test.foo')):
    print('OK')
else:
    print('KO')
Test looks ok:
>>> import sys
>>> sys.path.insert(0, r'D:\DATA\FP12210\My Documents\Outils\SVN\05_impl\2_tools\test.zip')
>>> import test
OK
>>>
So a solution could be to loop in your zip file to retrieve all python modules, and add them in sys.path; this piece of code would be ideally the 1st one loaded by your application.

Accessing resource files in Python unit tests & main code

I have a Python project with the following directory structure:
project/
project/src/
project/src/somecode.py
project/src/mypackage/mymodule.py
project/src/resources/
project/src/resources/datafile1.txt
In mymodule.py, I have a class (let's call it "MyClass") which needs to load datafile1.txt. This sort of works when I do:
open("../resources/datafile1.txt")
assuming the code that creates the MyClass instance is run from somecode.py.
The gotcha however is that I have unit tests for mymodule.py which are defined in that file, and if I leave the relative pathname as described above, the unittest code blows up as now the code is being run from project/src/mypackage instead of project/src and the relative filepath doesn't resolve correctly.
Any suggestions for a best practice type approach to resolve this problem? If I move my testcases into project/src that clutters the main source folder with testcases.
I usually use this to get a relative path from my module. I have never tried it in a unittest, though.
import os
print(os.path.join(os.path.dirname(__file__),
                   '..',
                   'resources',
                   'datafile1.txt'))
Note: The .. trick works pretty well, but if you change your directory structure you will need to update that part.
On top of the above answers, I'd like to add some Python 3 tricks to make your tests cleaner.
With the help of the pathlib library, you can make resource access in your tests explicit. It even handles the separator difference between Unix (/) and Windows (\).
Let's say we have a folder structure like this :
`-- tests
|-- test_1.py <-- You are here !
|-- test_2.py
`-- images
|-- fernando1.jpg <-- You want to import this image !
`-- fernando2.jpg
You are in the test_1.py file, and you want to import fernando1.jpg. With the help of the pathlib library, you can read your test resource in an object-oriented way as follows:
import os
from pathlib import Path

current_path = Path(os.path.dirname(os.path.realpath(__file__)))
image_path = current_path / "images" / "fernando1.jpg"
with image_path.open(mode='rb') as image:
    ...  # do what you want with your image object
But there are actually convenience methods that make your code more explicit than mode='rb':
image_path.read_bytes() # Which reads bytes of an object
text_file_path.read_text() # Which returns you text file content as a string
And there you go !
In each directory that contains Python scripts, put a Python module that knows the path to the root of the hierarchy. It can define a single global variable with the relative path. Import this module in each script. Python searches the script's own directory first, so it will always use the version of the module in that directory, which will have the relative path from there to the root. Then use this to find your other files. For example:
# rootpath.py
rootpath = "../../../"
# in your scripts
import os
from rootpath import rootpath
datapath = os.path.join(rootpath, "src/resources/datafile1.txt")
If you don't want to put additional modules in each directory, you could use this approach:
Put a sentinel file in the top level of the directory structure, e.g. thisisthetop.txt. Have your Python script move up the directory hierarchy until it finds this file. Write all your pathnames relative to that directory.
Possibly some file you already have in the project directory can be used for this purpose (e.g. keep moving up until you find a src directory), or you can name the project directory in such a way to make it apparent.
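The sentinel-file idea could look roughly like this. It is a sketch only: the function name find_root is made up, the sentinel name comes from the example above, and the directory tree is created in a temporary directory purely for demonstration.

```python
import tempfile
from pathlib import Path


def find_root(start, sentinel="thisisthetop.txt"):
    """Walk upward from start until a directory containing sentinel is found."""
    current = Path(start).resolve()
    if current.is_file():
        current = current.parent
    for directory in (current, *current.parents):
        if (directory / sentinel).exists():
            return directory
    raise FileNotFoundError(f"{sentinel} not found above {start}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "thisisthetop.txt").touch()          # mark the top of the tree
    nested = root / "src" / "mypackage"
    nested.mkdir(parents=True)
    found = find_root(nested)                    # in a script: find_root(__file__)
    found_matches = found == root

print(found_matches)
```

All other paths can then be written relative to the returned directory, for example find_root(__file__) / "src" / "resources" / "datafile1.txt".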
You can access files in a package using importlib.resources (mind Python version compatibility of the individual functions, there are backports available as importlib_resources), as described here. Thus, if you put your resources folder into your mypackage, like
project/src/mypackage/__init__.py
project/src/mypackage/mymodule.py
project/src/mypackage/resources/
project/src/mypackage/resources/datafile1.txt
you can access your resource file in code without having to rely on inferring file locations of your scripts:
import importlib.resources
file_path = importlib.resources.files('mypackage').joinpath('resources').joinpath('datafile1.txt')
with file_path.open() as f:
    do_something_with(f)
Note, if you distribute your package, don't forget to include the resources/ folder when creating the package.
A relative file path is resolved against the current working directory, which is usually the directory from which you invoked the initial script. I would suggest that you pass the relative path in as an argument to MyClass. This way, you can have different paths depending on which script is invoking MyClass.
