I have a question regarding one single module that is distributed over multiple directories.
Let's say I have these two directory trees:
~/lib/python
    xxx
        __init__.py
        util
            __init__.py
            module1.py
            module2.py
~/graphics/python
    xxx
        __init__.py
        misc
            __init__.py
            module3.py
            module4.py
So then in my Python modules, I did this:
import sys
pythonlibpath = '~/lib/python'
if pythonlibpath not in sys.path: sys.path.append(pythonlibpath)
import xxx.util.module1
which works.
Now, the problem is that I need xxx.misc.module3, so I did this:
import sys
graphicslibpath = '~/graphics/python'
if graphicslibpath not in sys.path: sys.path.append(graphicslibpath)
import xxx.misc.module3
but I get this error:
ImportError: No module named misc.module3
It seems like it somehow still remembers that there was a xxx package in ~/lib/python and then tries to find misc.module3 from there.
How do I get around this issue?
You can't without an extreme amount of trickery that pulls one package structure into the other. Python requires that all modules in a package be under a single subdirectory. See the os source to learn how it handles os.path.
Python does indeed remember that there was an xxx package. This is pretty much necessary to achieve acceptable performance: once modules and packages are loaded, they are cached. You can see which modules are loaded by looking at the dictionary sys.modules.
sys.modules is a normal dictionary so you can remove a package from it to force it to be reloaded like below:
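import sys
print(sys.modules)      # 'xml' is normally not in the cache yet
import xml
print(sys.modules)      # now there is an 'xml' entry
del sys.modules['xml']  # drop the cached package
print(sys.modules)      # the 'xml' entry is gone again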
Notice that after importing the xml package it is in the dictionary; however, it is possible to remove it from that dictionary too. This is a point I make for pedagogical purposes only; I would not recommend this approach in a real application. Also, if you need to use your misc and util packages together, this would not work well. If at all possible, rearrange your source code structure to better fit the normal Python module loading mechanism.
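The caching described above can be observed directly. The following is a minimal sketch, using the standard-library colorsys module purely as an arbitrary example (any importable module would do); it is for illustration only, not a recommended technique:

```python
import importlib
import sys

import colorsys  # arbitrary small stdlib module, chosen only for the demo

first = sys.modules["colorsys"]

# A second import is served from the sys.modules cache: same object.
import colorsys as again
same_object = again is first

# Removing the cache entry forces the next import to load the module afresh.
del sys.modules["colorsys"]
reloaded = importlib.import_module("colorsys")
is_new_object = reloaded is not first

print(same_object, is_new_object)
```

Code that still holds a reference to the old module object keeps using it, which is one reason this trick is fragile in real applications.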
This is addressed by Implicit Namespace Packages in Python 3.3. See PEP-420.
This is an adaptation of an answer to a similar question.
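As a rough illustration of implicit namespace packages, the sketch below builds the question's layout in a temporary directory, but without any __init__.py directly inside the shared top-level directory. The package name xxx_ns is a hypothetical stand-in for xxx, and the NAME constants exist only so there is something to print. It assumes Python 3.3 or later:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Two separate roots, each holding one "portion" of the xxx_ns package.
# Note: there is deliberately no xxx_ns/__init__.py in either root.
for top, sub, mod in [("lib", "util", "module1"), ("graphics", "misc", "module3")]:
    sub_pkg = root / top / "python" / "xxx_ns" / sub
    sub_pkg.mkdir(parents=True)
    (sub_pkg / "__init__.py").write_text("")
    (sub_pkg / (mod + ".py")).write_text("NAME = %r\n" % mod)
    sys.path.append(str(root / top / "python"))

importlib.invalidate_caches()

module1 = importlib.import_module("xxx_ns.util.module1")
module3 = importlib.import_module("xxx_ns.misc.module3")

# The namespace package's __path__ spans both directories.
portions = list(sys.modules["xxx_ns"].__path__)
print(module1.NAME, module3.NAME, len(portions))
```

If either root contained an xxx_ns/__init__.py, that directory would be treated as a regular package and would win over the namespace portions, which is the situation described in the question.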
Following up on @Gary's answer, the PEP 420 page says to use the following code in the __init__.py of a package that is shared across directories.
__init__.py:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
This code should be placed inside the xxx directory's __init__.py.
See the *s below
someroot/
├── graphics
│   └── python
│       └── xxx
│           ├── ****__init__.py****
│           └── misc
│               ├── __init__.py
│               ├── module3.py
│               └── module4.py
└── lib
    └── python
        └── xxx
            ├── ****__init__.py****
            └── util
                ├── __init__.py
                ├── module1.py
                └── module2.py
Some setup.sh file to add to the Python Path:
libPath=someroot/lib/python/
graphicsPath=someroot/graphics/python/
export PYTHONPATH=$PYTHONPATH:$libPath:$graphicsPath
Python test code (tested on Python versions 2.7.14 and 3.6.4 using pyenv):
import xxx.util.module1
import xxx.misc.module3 # No errors
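The setup above can be reproduced end to end in a temporary directory. This is only a sketch: the package name xxx_ext is a hypothetical stand-in for xxx, and sys.path is extended in code instead of through PYTHONPATH. Both roots must already be on sys.path when the package is first imported, because extend_path scans sys.path at that moment:

```python
import importlib
import sys
import tempfile
from pathlib import Path

INIT = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"

root = Path(tempfile.mkdtemp())
for top, sub, mod in [("lib", "util", "module1"), ("graphics", "misc", "module3")]:
    pkg = root / top / "python" / "xxx_ext"
    (pkg / sub).mkdir(parents=True)
    (pkg / "__init__.py").write_text(INIT)        # the shared __init__.py
    (pkg / sub / "__init__.py").write_text("")
    (pkg / sub / (mod + ".py")).write_text("NAME = %r\n" % mod)
    sys.path.append(str(root / top / "python"))

importlib.invalidate_caches()

module1 = importlib.import_module("xxx_ext.util.module1")
module3 = importlib.import_module("xxx_ext.misc.module3")

# extend_path has added the second xxx_ext directory to the package's __path__.
search_dirs = list(sys.modules["xxx_ext"].__path__)
print(module1.NAME, module3.NAME, len(search_dirs))
```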
Could you tell me how can I read a file that is inside my Python package?
My situation
A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such file?
Imagine I want to read a file from:
package\templates\temp_file
Some kind of path manipulation? Package base path tracking?
TL;DR: Use the standard library's importlib.resources module, as explained in method 2 below.
The traditional pkg_resources from setuptools is not recommended anymore because the new method:
it is significantly more performant;
it is safer since the use of packages (instead of path strings) raises compile-time errors;
it is more intuitive because you don't have to "join" paths;
it is faster when developing since you don't need an extra dependency (setuptools), but rely on Python's standard-library alone.
I kept the traditional listed first, to explain the differences with the new method when porting existing code (porting also explained here).
Let's assume your templates are located in a folder nested inside your module's package:
<your-package>
  +--<module-asking-the-file>
  +--templates/
       +--temp_file <-- We want this file.
Note 1: For sure, we should NOT fiddle with the __file__ attribute (e.g. code will break when served from a zip).
Note 2: If you are building this package, remember to declare your data files as package_data or data_files in your setup.py.
1) Using pkg_resources from setuptools (slow)
You may use pkg_resources package from setuptools distribution, but that comes with a cost, performance-wise:
import pkg_resources
# Could be any dot-separated package/module name or a "Requirement"
resource_package = __name__
resource_path = '/'.join(('templates', 'temp_file')) # Do not use os.path.join()
template = pkg_resources.resource_string(resource_package, resource_path)
# or for a file-like stream:
template = pkg_resources.resource_stream(resource_package, resource_path)
Tips:
This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from python-3.5 to create self-contained distributions.
Remember to add setuptools into your run-time requirements (e.g. in install_requires).
... and notice that according to the Setuptools/pkg_resources docs, you should not use os.path.join:
Basic Resource Access
Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like "..". Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.
2) Python >= 3.7, or using the backported importlib_resources library
Use the standard library's importlib.resources module which is more efficient than setuptools, above:
try:
    import importlib.resources as pkg_resources
except ImportError:
    # Try backported to PY<37 `importlib_resources`.
    import importlib_resources as pkg_resources
from . import templates  # relative-import the *package* containing the templates
template = pkg_resources.read_text(templates, 'temp_file')
# or for a file-like stream:
template = pkg_resources.open_text(templates, 'temp_file')
Attention:
Regarding the function read_text(package, resource):
The package can be either a string or a module.
The resource is NOT a path anymore, but just the filename of the resource to open, within an existing package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).
For the example asked in the question, we must now:
make the <your_package>/templates/ into a proper package, by creating an empty __init__.py file in it,
so now we can use a simple (possibly relative) import statement (no more parsing package/module names),
and simply ask for resource_name = "temp_file" (no path).
Tips:
To access a file inside the current module, set the package argument to __package__, e.g. pkg_resources.read_text(__package__, 'temp_file') (thanks to @ben-mares).
Things become interesting when an actual filename is asked with path(), since now context-managers are used for temporarily-created files (read this).
Add the backported library, conditionally for older Pythons, with install_requires=["importlib_resources; python_version<'3.7'"] (check this if you package your project with setuptools<36.2.1).
Remember to remove setuptools library from your runtime-requirements, if you migrated from the traditional method.
Remember to customize setup.py or MANIFEST to include any static files.
You may also set zip_safe=True in your setup.py.
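The context-manager behaviour mentioned in the tips can be sketched as follows. Note that this swaps in as_file() and files() (available in importlib.resources from Python 3.9) rather than the path() function named above; the package name path_demo_pkg and the file contents are hypothetical and created in a temporary directory just for the demonstration:

```python
import importlib
import sys
import tempfile
from importlib.resources import as_file, files  # Python 3.9+
from pathlib import Path

# Build a throwaway package with a templates sub-package holding one data file.
root = Path(tempfile.mkdtemp())
templates = root / "path_demo_pkg" / "templates"
templates.mkdir(parents=True)
(root / "path_demo_pkg" / "__init__.py").write_text("")
(templates / "__init__.py").write_text("")
(templates / "temp_file").write_text("template body")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

resource = files("path_demo_pkg.templates") / "temp_file"

# Inside the with-block, real_path is a genuine filesystem path. For a zipped
# package it would point at a temporary extracted copy that is cleaned up on exit.
with as_file(resource) as real_path:
    existed_inside = real_path.exists()
    content = real_path.read_text()

print(existed_inside, content)
```

The practical consequence is that the path should only be used inside the with-block; storing it for later use is not safe when the package is not installed as plain files.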
A packaging prelude:
Before you can even worry about reading resource files, the first step is to make sure that the data files are getting packaged into your distribution in the first place - it is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.
Structure your project like this, putting data files into a subdirectory within the package:
.
├── package
│   ├── __init__.py
│   ├── templates
│   │   └── temp_file
│   ├── mymodule1.py
│   └── mymodule2.py
├── README.rst
├── MANIFEST.in
└── setup.py
You should pass include_package_data=True in the setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure the templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:
recursive-include package *
Historical cruft note: Using a manifest file is not needed for modern build backends such as flit or poetry, which will include the package data files by default. So, if you're using pyproject.toml and you don't have a setup.py file, then you can ignore all the stuff about MANIFEST.in.
Now, with packaging out of the way, onto the reading part...
Recommendation:
Use standard library pkgutil APIs. It's going to look like this in library code:
# within package/mymodule1.py, for example
import pkgutil
data = pkgutil.get_data(__name__, "templates/temp_file")
It works in zips. It works on Python 2 and Python 3. It doesn't require third-party dependencies. I'm not really aware of any downsides (if you are, then please comment on the answer).
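To make the recommendation concrete, here is a small self-contained sketch that creates a throwaway package in a temporary directory and reads its data file with pkgutil.get_data. The package name demo_pkg and the file contents are assumptions made up for the demo:

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Layout: demo_pkg/__init__.py and demo_pkg/templates/temp_file
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
(pkg_dir / "templates").mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "templates" / "temp_file").write_text("Hello, $name!")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# get_data imports the package if needed and returns the file's raw bytes.
data = pkgutil.get_data("demo_pkg", "templates/temp_file")
text = data.decode("utf-8")
print(text)
```

Since the result is always bytes, text templates need an explicit decode step, as shown on the last lines.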
Bad ways to avoid:
Bad way #1: using relative paths from a source file
This is currently the accepted answer. At best, it looks something like this:
from pathlib import Path
resource_path = Path(__file__).parent / "templates"
data = resource_path.joinpath("temp_file").read_bytes()
What's wrong with that? The assumption that you have files and subdirectories available is not correct. This approach doesn't work if executing code which is packed in a zip or a wheel, and it may be entirely out of the user's control whether or not your package gets extracted to a filesystem at all.
Bad way #2: using pkg_resources APIs
This is described in the top-voted answer. It looks something like this:
from pkg_resources import resource_string
data = resource_string(__name__, "templates/temp_file")
What's wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package resources. That's not a big deal at install time (since installation is once-off), but it's ugly at runtime.
Bad way #3: using legacy importlib.resources APIs
This is currently the recommendation in the top-voted answer. It's in the standard library since Python 3.7. It looks like this:
from importlib.resources import read_binary
data = read_binary("package.templates", "temp_file")
What's wrong with that? Well, unfortunately, the implementation left some things to be desired, and it was deprecated in Python 3.11. Using importlib.resources.read_binary, importlib.resources.read_text and friends will require you to add an empty file templates/__init__.py so that data files reside within a sub-package rather than in a subdirectory. It will also expose the package/templates subdirectory as an importable package.templates sub-package in its own right. This won't work with many existing packages which are already published using resource subdirectories instead of resource sub-packages, and it's inconvenient to add the __init__.py files everywhere, muddying the boundary between data and code.
This approach was deprecated in upstream importlib_resources in 2021, and was deprecated in stdlib from version Python 3.11. bpo-45514 tracked the deprecation and migrating from legacy offers _legacy.py wrappers to aid with transition.
Honorable mention: using newer importlib_resources APIs
This has not been mentioned in any other answers yet, but importlib_resources is more than a simple backport of the Python 3.7+ importlib.resources code. It has traversable APIs which you can use like this:
import importlib_resources
my_resources = importlib_resources.files("package")
data = (my_resources / "templates" / "temp_file").read_bytes()
This works on Python 2 and 3, it works in zips, and it doesn't require spurious __init__.py files to be added in resource subdirectories. The only downside vs pkgutil that I can see is that these new APIs are only available in the stdlib for Python-3.9+, so there is still a third-party dependency needed to support older Python versions. If you only need to run on Python-3.9+ then use this approach, or you can add a compatibility layer and a conditional dependency on the backport for older Python versions:
# in your library code:
try:
    from importlib.resources import files
except ImportError:
    from importlib_resources import files
# in your setup.py or similar:
from setuptools import setup
setup(
    ...
    install_requires=[
        'importlib_resources; python_version < "3.9"',
    ]
)
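A runnable sketch of the traversable API, under the assumption of a throwaway package called files_demo_pkg (a hypothetical name) created in a temporary directory. Note that the templates directory here has no __init__.py:

```python
import importlib
import sys
import tempfile
from pathlib import Path

try:
    from importlib.resources import files  # stdlib, Python 3.9+
except ImportError:
    from importlib_resources import files  # third-party backport

# Layout: files_demo_pkg/__init__.py and files_demo_pkg/templates/temp_file
root = Path(tempfile.mkdtemp())
templates = root / "files_demo_pkg" / "templates"
templates.mkdir(parents=True)
(root / "files_demo_pkg" / "__init__.py").write_text("")
(templates / "temp_file").write_bytes(b"Hello from a data file")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# files() returns a traversable object; "/" descends into plain subdirectories.
resources = files("files_demo_pkg")
data = (resources / "templates" / "temp_file").read_bytes()

# Traversables can also be listed, much like pathlib paths.
names = sorted(entry.name for entry in (resources / "templates").iterdir())
print(data, names)
```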
Example project:
I've created an example project on GitHub and uploaded it to PyPI, which demonstrates all five approaches discussed above. Try it out with:
$ pip install resources-example
$ resources-example
See https://github.com/wimglenn/resources-example for more info.
Section 10.8, "Reading Datafiles Within a Package", of the Python Cookbook, Third Edition, by David Beazley and Brian K. Jones answers this.
I'll just quote it here:
Suppose you have a package with files organized as follows:
mypackage/
    __init__.py
    somedata.dat
    spam.py
Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do
it, use the following code:
import pkgutil
data = pkgutil.get_data(__package__, 'somedata.dat')
The resulting variable data will be a byte string containing the raw contents of the file.
The first argument to get_data() is a string containing the package name. You can
either supply it directly or use a special variable, such as __package__. The second
argument is the relative name of the file within the package. If necessary, you can navigate
into different directories using standard Unix filename conventions as long as the
final directory is still located within the package.
In this way, the package can be installed as a directory, .zip or .egg.
In case you have this structure
lidtk
├── bin
│   └── lidtk
├── lidtk
│   ├── analysis
│   │   ├── char_distribution.py
│   │   └── create_cm.py
│   ├── classifiers
│   │   ├── char_dist_metric_train_test.py
│   │   ├── char_features.py
│   │   ├── cld2
│   │   │   ├── cld2_preds.txt
│   │   │   └── cld2wili.py
│   │   ├── get_cld2.py
│   │   ├── text_cat
│   │   │   ├── __init__.py
│   │   │   ├── README.md <---------- say you want to get this
│   │   │   └── textcat_ngram.py
│   │   └── tfidf_features.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── create_ml_dataset.py
│   │   ├── download_documents.py
│   │   ├── language_utils.py
│   │   ├── pickle_to_txt.py
│   │   └── wili.py
│   ├── __init__.py
│   ├── get_predictions.py
│   ├── languages.csv
│   └── utils.py
├── README.md
├── setup.cfg
└── setup.py
you need this code:
import pkg_resources
# __name__ in case you're within the package
# - otherwise it would be 'lidtk' in this example as it is the package name
path = 'classifiers/text_cat/README.md' # always use slash
filepath = pkg_resources.resource_filename(__name__, path)
The strange "always use slash" part comes from setuptools APIs
Also notice that if you use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time
In case you wonder where the documentation is:
PEP 0365
https://packaging.python.org/guides/single-sourcing-package-version/
The accepted answer should be to use importlib.resources. pkgutil.get_data also requires the argument package be a non-namespace package (see pkgutil docs). Hence, the directory containing the resource must have an __init__.py file, making it have the exact same limitations as importlib.resources. If the overhead issue of pkg_resources is not a concern, this is also an acceptable alternative.
Pre-Python-3.3, all packages were required to have an __init__.py. Post-Python-3.3, a folder doesn't need an __init__.py to be a package. This is called a namespace package. Unfortunately, pkgutil does not work with namespace packages (see pkgutil docs).
For example, with the package structure:
+-- foo/
|   +-- __init__.py
|   +-- bar/
|   |   +-- hi.txt
where hi.txt just has Hi!, you get the following
>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
None
However, with an __init__.py in bar, you get
>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
b'Hi!'
Assuming you are using an egg file that is not extracted:
I "solved" this in a recent project by using a postinstall script that extracts my templates from the egg (zip file) to the proper directory in the filesystem. It was the quickest, most reliable solution I found, since working with __path__[0] can go wrong sometimes (I don't recall the name, but I came across at least one library that added something in front of that list!).
Also, egg files are usually extracted on the fly to a temporary location called the "egg cache". You can change that location using an environment variable, either before starting your script or even later, e.g.
os.environ['PYTHON_EGG_CACHE'] = path
However there is pkg_resources that might do the job properly.
I am struggling with nested __init__.py files in a Python package I am writing. The package has the following architecture:
module/
├── __init__.py
├── submodule1
│ ├── __init__.py
│ └── source.py
└── submodule2
├── __init__.py
├── source.py
└── subsubmodule2
├── __init__.py
└── source.py
My intent is to be able to access functions defined in submodule2/source.py through module.submodule2.function and in subsubmodule2/source.py through module.submodule2.subsubmodule2.function.
The first thing I tried was to define __init__.py in submodule2 this way:
from .subsubmodule2 import *
But doing so, I get the functions defined in subsubmodule2/source.py through module.submodule2.function (and module.function).
If I do:
from . import subsubmodule2
I get these functions through module.subsubmodule2.function.
I also tried to define the __all__ variable in __init__.py, with no more success. If I understand the Python documentation correctly, I guess I could leave the __init__.py files empty and it could work, but from my understanding that is not best practice either.
What would be the best way to access these functions as intended in my module?
In the module's __init__.py file, import the subpackages you want to expose:
from . import submodule1
from . import submodule2
__all__ = ['submodule1', 'submodule2']
Now, in submodule2's __init__.py file, write:
from . import source
from . import subsubmodule2
# if you want to expose functions from source, import them here, or define
# __all__ in source.py and list the functions you want to expose
__all__ = ['source', 'subsubmodule2']
Now, in subsubmodule2's __init__.py file, define or import the functions and classes you want to expose:
from .source import *
__all__ = ['source']
# if you use a * import, I suggest defining __all__ in source.py and listing every exposed function there
The __init__.py file represents its respective package. For example, module/submodule2/__init__.py represents module.submodule2.
In order to pull objects defined in submodules into their package namespace, import them:
# module/submodule2/__init__.py
from .source import *
Since __init__.py is a regular Python module, one can also forgo a separate .source module and define objects directly inside __init__.py:
# module/submodule2/__init__.py
def function():
    ...
Note that subpackages themselves are already available as their respective name. One does not have to – and in fact should not – import them in the parent module. They will be imported if code using the package imports them.
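Putting this together for the layout in the question, the following sketch writes the package to a temporary directory and checks that both access paths work. The function bodies (returning a string) are invented for the demo, and submodule1 is omitted for brevity:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Each package's __init__.py pulls in the names from its own source.py only.
layout = {
    "module/__init__.py": "",
    "module/submodule2/__init__.py": "from .source import *\n",
    "module/submodule2/source.py": "def function():\n    return 'submodule2'\n",
    "module/submodule2/subsubmodule2/__init__.py": "from .source import *\n",
    "module/submodule2/subsubmodule2/source.py":
        "def function():\n    return 'subsubmodule2'\n",
}

root = Path(tempfile.mkdtemp())
for rel_path, body in layout.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The user of the package imports the subpackage it needs; the parent
# __init__.py files do not import their subpackages themselves.
import module.submodule2.subsubmodule2

outer = module.submodule2.function()
inner = module.submodule2.subsubmodule2.function()
print(outer, inner)
```

Because submodule2/__init__.py does not star-import from subsubmodule2, the two functions named function stay in separate namespaces and do not shadow each other.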
I have a beeware project and also want to use my own modules in it like Models and Controllers. Also, a module which creates some objects I can test with.
But when I want to import the module to create the test objects and use the method it just throws an error:
ImportError: attempted relative import beyond top-level package
After some research, I know that the directory structure, i.e. where I put my modules and where the package is, is important. But wherever I put the modules, I get the same (or a similar) error. I can, however, import my Models to create objects of these classes. I also can't work out where the start point of the briefcase project is.
Here my structure currently:
/Project_Dir (own created)
/briefcase_project (created from briefcase)
/src
/Models (own created)
/app_directory (created from briefcase)
here is the __main__.py and the __init__.py (the start point I guess) and the app.py (where beeware code is, and also my module import from Test)
/Test (own created, here is a file with a method I want to call)
Sadly, there isn't much material about beeware, so I couldn't find a solution.
Please help. Thanks ^^
I did the following to workaround the issue. The example using the Beeware Tutorial 2 source code is on Github
.
├── __init__.py
├── __main__.py
├── app.py
├── mylib <--- # my lib.
│   ├── __init__.py
│   └── testlib.py
└── resources
    ├── __init__.py
    ├── beewarecustomlibexample.icns
    ├── beewarecustomlibexample.ico
    └── beewarecustomlibexample.png
2 directories, 9 files
The mylib/testlib.py
def test(text: str) -> str:
    return f"Hello: {text}"
In the app.py:
import toga
from toga.style import Pack
from toga.style.pack import COLUMN, ROW
from beewarecustomlibexample.mylib.testlib import test # Import custom lib
class BeewareCustomLibExample(toga.App):
    def startup(self):
        ...
    def say_hello(self, widget):
        # Calling my test method
        result = test(self.name_input.value)
        self.main_window.info_dialog("Test Dialog", result)
def main():
    return BeewareCustomLibExample()
The above is how I got it working. I've built it on macOS and it works fine.
Take your project folder name and then import from there, so if you're tinkering with the tutorial and you've set up a module folder called myModule in the same directory as your app.py and you have a file called file.py with a class called myClass, you might type:
from helloworld.myModule.file import myClass
I've attempted a few different techniques trying to do something that to me seems doable but I guess I am missing some gotchas about python (using 2.7 but would like this to work also for 3.* if possible).
I am not sure about terminology like package or module, but to me the following seems quite a "simple" doable scenario.
This is the directory structure:
.
├── job
│   └── the_script.py
└── modules
    ├── __init__.py
    └── print_module.py
The content of the_script.py:
# this does not work
import importlib
print_module = importlib.import_module('.print_module', '..modules')
# this also does not work
from ..modules import print_module
print_module.do_stuff()
The content of print_module:
def do_stuff():
    print("This should be in stdout")
I would like to run all this "relative paths" stuff as:
/job$ python2 the_script.py
But the importlib.import_module gives various errors:
if I just use 1 input parameter ..modules.print_module, then I get: TypeError("relative imports require the 'package' argument")
if I use 2 input parameters (as in the example above), then I get: ValueError: Empty module name
On the other hand using the from ..modules syntax I get: ValueError: Attempted relative import in non-package.
I think the __init__.py empty file should be enough to qualify that code as "packages" (or modules? not sure about the terminology), but it seems there's something I am missing about how to manage relative paths.
I read that in the past people were hacking this using the path and other functions from import os and import sys, but according to the official docs (python 2.7 and 3.*) this should not be needed anymore.
What am I doing wrong and how could I achieve the result of printing the content modules/print_module.do_stuff calling it from a script in the "relative directory" job/?
If you follow the structure of this guide here: http://docs.python-guide.org/en/latest/writing/structure/#test-suite (highly recommend reading it all, it is very helpful) you will see this:
To give the individual tests import context, create a tests/context.py file:
import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
import sample
Then, within the individual test modules, import the module like so:
from .context import sample
This will always work as expected, regardless of installation method.
Translated in your case this means:
root_folder
├── job
│   ├── context.py <- create this file
│   └── the_script.py
└── modules
    ├── __init__.py
    └── print_module.py
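In the context.py file, write the lines shown above, but with import modules instead of import sample.
Finally, in the_script.py, write from context import modules (a plain import rather than from .context, because the_script.py is run directly as a script and so has no parent package for a relative import), and you will be set to go!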
Good luck :)
I found a solution using sys and os.
The script the_script.py should be:
import sys
import os
lib_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '../modules'))
sys.path.append(lib_path)
# uncommenting the following shows the `modules` directory in the path
# print(sys.path)
import print_module
print_module.do_stuff()
Then I can run it via command line no matter where I am in the path e.g.:
/job$ python2 the_script.py
<...>/job$ python2 <...>/job/the_script.py
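The claim that this works from any working directory can be checked with a small harness. The sketch below recreates the job/ and modules/ layout in a temporary directory, writes the script shown above, and runs it with the current interpreter from an unrelated directory. It assumes Python 3.7 or later for the harness itself (subprocess.run with capture_output):

```python
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "job").mkdir()
(root / "modules").mkdir()
(root / "modules" / "__init__.py").write_text("")
(root / "modules" / "print_module.py").write_text(
    'def do_stuff():\n    print("This should be in stdout")\n'
)

# Same approach as the answer: locate modules/ relative to the script file,
# not relative to the current working directory.
script_lines = [
    "import sys",
    "import os",
    "lib_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '../modules'))",
    "sys.path.append(lib_path)",
    "import print_module",
    "print_module.do_stuff()",
]
script = root / "job" / "the_script.py"
script.write_text("\n".join(script_lines) + "\n")

# Run from a directory that is neither job/ nor its parent.
result = subprocess.run(
    [sys.executable, str(script)],
    cwd=tempfile.gettempdir(),
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
print(output)
```

Anchoring the path on __file__ is what makes the script independent of where it is launched from; a bare relative entry such as '..' on sys.path would instead be resolved against the working directory.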
If you are not sure about terminology go to very nice tutorials:
http://docs.python-guide.org/en/latest/writing/structure/#modules
and
http://docs.python-guide.org/en/latest/writing/structure/#packages
But for your structure:
.
├── job
│   └── the_script.py
└── modules
    ├── __init__.py
    └── print_module.py
just say in the the_script.py:
import sys
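sys.path.append('..')
import modules.print_module
This adds the parent directory to sys.path (the module search path), so Python will see the modules directory that sits 'parallel' to the job directory, and the import will work. Note that '..' is resolved relative to the current working directory, so this only works when the script is run from inside job/.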
I think that at the most basic level it is sufficient to know that:
a package is any directory with an __init__.py file
a module is a file with a .py extension, but when you import a module you omit the extension.
Could you tell me how can I read a file that is inside my Python package?
My situation
A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such file?
Imagine I want to read a file from:
package\templates\temp_file
Some kind of path manipulation? Package base path tracking?
TLDR; Use standard-library's importlib.resources module as explained in the method no 2, below.
The traditional pkg_resources from setuptools is not recommended anymore because the new method:
it is significantly more performant;
is is safer since the use of packages (instead of path-stings) raises compile-time errors;
it is more intuitive because you don't have to "join" paths;
it is faster when developing since you don't need an extra dependency (setuptools), but rely on Python's standard-library alone.
I kept the traditional listed first, to explain the differences with the new method when porting existing code (porting also explained here).
Let's assume your templates are located in a folder nested inside your module's package:
<your-package>
+--<module-asking-the-file>
+--templates/
+--temp_file <-- We want this file.
Note 1: For sure, we should NOT fiddle with the __file__ attribute (e.g. code will break when served from a zip).
Note 2: If you are building this package, remember to declatre your data files as package_data or data_files in your setup.py.
1) Using pkg_resources from setuptools(slow)
You may use pkg_resources package from setuptools distribution, but that comes with a cost, performance-wise:
import pkg_resources
# Could be any dot-separated package/module name or a "Requirement"
resource_package = __name__
resource_path = '/'.join(('templates', 'temp_file')) # Do not use os.path.join()
template = pkg_resources.resource_string(resource_package, resource_path)
# or for a file-like stream:
template = pkg_resources.resource_stream(resource_package, resource_path)
Tips:
This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from python-3.5 to create self-contained distributions.
Remember to add setuptools into your run-time requirements (e.g. in install_requires`).
... and notice that according to the Setuptools/pkg_resources docs, you should not use os.path.join:
Basic Resource Access
Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like "..". Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.
2) Python >= 3.7, or using the backported importlib_resources library
Use the standard library's importlib.resources module which is more efficient than setuptools, above:
try:
import importlib.resources as pkg_resources
except ImportError:
# Try backported to PY<37 `importlib_resources`.
import importlib_resources as pkg_resources
from . import templates # relative-import the *package* containing the templates
template = pkg_resources.read_text(templates, 'temp_file')
# or for a file-like stream:
template = pkg_resources.open_text(templates, 'temp_file')
Attention:
Regarding the function read_text(package, resource):
The package can be either a string or a module.
The resource is NOT a path anymore, but just the filename of the resource to open, within an existing package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).
For the example asked in the question, we must now:
make the <your_package>/templates/ into a proper package, by creating an empty __init__.py file in it,
so now we can use a simple (possibly relative) import statement (no more parsing package/module names),
and simply ask for resource_name = "temp_file" (no path).
Tips:
To access a file inside the current module, set the package argument to __package__, e.g. pkg_resources.read_text(__package__, 'temp_file') (thanks to #ben-mares).
Things become interesting when an actual filename is asked with path(), since now context-managers are used for temporarily-created files (read this).
Add the backported library, conditionally for older Pythons, with install_requires=[" importlib_resources ; python_version<'3.7'"] (check this if you package your project with setuptools<36.2.1).
Remember to remove setuptools library from your runtime-requirements, if you migrated from the traditional method.
Remember to customize setup.py or MANIFEST to include any static files.
You may also set zip_safe=True in your setup.py.
A packaging prelude:
Before you can even worry about reading resource files, the first step is to make sure that the data files are getting packaged into your distribution in the first place - it is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.
Structure your project like this, putting data files into a subdirectory within the package:
.
├── package
│ ├── __init__.py
│ ├── templates
│ │ └── temp_file
│ ├── mymodule1.py
│ └── mymodule2.py
├── README.rst
├── MANIFEST.in
└── setup.py
You should pass include_package_data=True in the setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure the templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:
recursive-include package *
Historical cruft note: Using a manifest file is not needed for modern build backends such as flit, poetry, which will include the package data files by default. So, if you're using pyproject.toml and you don't have a setup.py file then you can ignore all the stuff about MANIFEST.in.
Now, with packaging out of the way, onto the reading part...
Recommendation:
Use standard library pkgutil APIs. It's going to look like this in library code:
# within package/mymodule1.py, for example
import pkgutil
data = pkgutil.get_data(__name__, "templates/temp_file")
It works in zips. It works on Python 2 and Python 3. It doesn't require third-party dependencies. I'm not really aware of any downsides (if you are, then please comment on the answer).
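To back up the "it works in zips" claim, here is a small self-contained sketch. It packs a hypothetical package called demo_zip_pkg (same templates/temp_file layout as above) into a zip archive, puts the archive on sys.path, and reads the data file without extracting anything:

```python
import importlib
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path

# Pack a hypothetical package "demo_zip_pkg" into a zip archive, mirroring
# the package/templates/temp_file layout from the example project.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_zip_pkg/__init__.py", "")
    zf.writestr("demo_zip_pkg/templates/temp_file", "zipped template\n")

# Import straight from the archive: nothing is extracted to disk.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

# The resource name always uses "/" as separator, relative to the package.
data = pkgutil.get_data("demo_zip_pkg", "templates/temp_file")
print(data)
```

In real library code you would pass __name__ (or __package__) instead of the hard-coded package name, as in the snippet above.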
Bad ways to avoid:
Bad way #1: using relative paths from a source file
This is currently the accepted answer. At best, it looks something like this:
from pathlib import Path
resource_path = Path(__file__).parent / "templates"
data = resource_path.joinpath("temp_file").read_bytes()
What's wrong with that? The assumption that you have files and subdirectories available is not correct. This approach doesn't work if executing code which is packed in a zip or a wheel, and it may be entirely out of the user's control whether or not your package gets extracted to a filesystem at all.
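A quick way to see the failure for yourself, as a sketch: import a hypothetical package (demo_file_pkg, built here just for the demonstration) directly from a zip archive and try the __file__-relative read. The module's __file__ then points inside the archive, which is not a real directory, so the read raises an OSError subclass:

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Hypothetical package "demo_file_pkg", shipped only inside a zip archive.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_file_pkg/__init__.py", "")
    zf.writestr("demo_file_pkg/templates/temp_file", "zipped template\n")
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

module = importlib.import_module("demo_file_pkg")

# __file__ points *inside* the archive, so this path does not exist
# on the real filesystem.
resource_path = Path(module.__file__).parent / "templates" / "temp_file"
try:
    resource_path.read_bytes()
    read_failed = False
except OSError as exc:
    read_failed = True
    print(type(exc).__name__, "for", resource_path)
```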
Bad way #2: using pkg_resources APIs
This is described in the top-voted answer. It looks something like this:
from pkg_resources import resource_string
data = resource_string(__name__, "templates/temp_file")
What's wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package resources. That's not a big deal at install time (since installation is once-off), but it's ugly at runtime.
Bad way #3: using legacy importlib.resources APIs
This is currently the recommendation in the top-voted answer. It's in the standard library since Python 3.7. It looks like this:
from importlib.resources import read_binary
data = read_binary("package.templates", "temp_file")
What's wrong with that? Well, unfortunately, the implementation left some things to be desired and it was deprecated in Python 3.11. Using importlib.resources.read_binary, importlib.resources.read_text and friends will require you to add an empty file templates/__init__.py so that data files reside within a sub-package rather than in a subdirectory. It will also expose the package/templates subdirectory as an importable package.templates sub-package in its own right. This won't work with many existing packages which are already published using resource subdirectories instead of resource sub-packages, and it's inconvenient to add the __init__.py files everywhere, muddying the boundary between data and code.
This approach was deprecated in upstream importlib_resources in 2021, and in the standard library as of Python 3.11. bpo-45514 tracked the deprecation, and the "migrating from legacy" guide offers _legacy.py wrappers to aid with the transition.
Honorable mention: using newer importlib_resources APIs
This has not been mentioned in any other answers yet, but importlib_resources is more than a simple backport of the Python 3.7+ importlib.resources code. It has traversable APIs which you can use like this:
import importlib_resources
my_resources = importlib_resources.files("package")
data = (my_resources / "templates" / "temp_file").read_bytes()
This works on Python 2 and 3, it works in zips, and it doesn't require spurious __init__.py files to be added in resource subdirectories. The only downside vs pkgutil that I can see is that these new APIs are only available in the stdlib for Python-3.9+, so there is still a third-party dependency needed to support older Python versions. If you only need to run on Python-3.9+ then use this approach, or you can add a compatibility layer and a conditional dependency on the backport for older Python versions:
# in your library code:
try:
    from importlib.resources import files
except ImportError:
    from importlib_resources import files
# in your setup.py or similar:
from setuptools import setup
setup(
    ...
    install_requires=[
        'importlib_resources; python_version < "3.9"',
    ]
)
Example project:
I've created an example project on GitHub and uploaded it to PyPI, which demonstrates all five approaches discussed above. Try it out with:
$ pip install resources-example
$ resources-example
See https://github.com/wimglenn/resources-example for more info.
Section 10.8, "Reading Datafiles Within a Package", of Python Cookbook, Third Edition by David Beazley and Brian K. Jones gives the answer.
I'll just quote it here:
Suppose you have a package with files organized as follows:
mypackage/
__init__.py
somedata.dat
spam.py
Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do
it, use the following code:
import pkgutil
data = pkgutil.get_data(__package__, 'somedata.dat')
The resulting variable data will be a byte string containing the raw contents of the file.
The first argument to get_data() is a string containing the package name. You can
either supply it directly or use a special variable, such as __package__. The second
argument is the relative name of the file within the package. If necessary, you can navigate
into different directories using standard Unix filename conventions as long as the
final directory is still located within the package.
In this way, the package can be installed as a directory, .zip, or .egg.
In case you have this structure
lidtk
├── bin
│ └── lidtk
├── lidtk
│ ├── analysis
│ │ ├── char_distribution.py
│ │ └── create_cm.py
│ ├── classifiers
│ │ ├── char_dist_metric_train_test.py
│ │ ├── char_features.py
│ │ ├── cld2
│ │ │ ├── cld2_preds.txt
│ │ │ └── cld2wili.py
│ │ ├── get_cld2.py
│ │ ├── text_cat
│ │ │ ├── __init__.py
│ │ │ ├── README.md <---------- say you want to get this
│ │ │ └── textcat_ngram.py
│ │ └── tfidf_features.py
│ ├── data
│ │ ├── __init__.py
│ │ ├── create_ml_dataset.py
│ │ ├── download_documents.py
│ │ ├── language_utils.py
│ │ ├── pickle_to_txt.py
│ │ └── wili.py
│ ├── __init__.py
│ ├── get_predictions.py
│ ├── languages.csv
│ └── utils.py
├── README.md
├── setup.cfg
└── setup.py
you need this code:
import pkg_resources
# __name__ in case you're within the package
# - otherwise it would be 'lidtk' in this example as it is the package name
path = 'classifiers/text_cat/README.md' # always use slash
filepath = pkg_resources.resource_filename(__name__, path)
The strange "always use slash" part comes from the setuptools APIs:
Also notice that if you use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time.
In case you wonder where the documentation is:
PEP 0365
https://packaging.python.org/guides/single-sourcing-package-version/
The accepted answer should be to use importlib.resources. pkgutil.get_data also requires the argument package be a non-namespace package (see pkgutil docs). Hence, the directory containing the resource must have an __init__.py file, making it have the exact same limitations as importlib.resources. If the overhead issue of pkg_resources is not a concern, this is also an acceptable alternative.
Pre-Python-3.3, all packages were required to have an __init__.py. Post-Python-3.3, a folder doesn't need an __init__.py to be a package. This is called a namespace package. Unfortunately, pkgutil does not work with namespace packages (see pkgutil docs).
For example, with the package structure:
+-- foo/
| +-- __init__.py
| +-- bar/
| | +-- hi.txt
where hi.txt just has Hi!, you get the following
>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
None
However, with an __init__.py in bar, you get
>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
b'Hi!'
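The two sessions above can be reproduced in one script. This is only a sketch: it builds two throwaway copies of the layout under hypothetical names (foo_ns, where bar has no __init__.py, and foo_regular, where it does) so that both cases can run in the same process:

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())


def make_package(top, with_init):
    """Create <top>/bar/hi.txt, optionally making bar a regular package."""
    bar = root / top / "bar"
    bar.mkdir(parents=True)
    (root / top / "__init__.py").write_text("")
    if with_init:
        (bar / "__init__.py").write_text("")
    (bar / "hi.txt").write_text("Hi!")


# Hypothetical package names, used only for this demonstration.
make_package("foo_ns", with_init=False)      # bar is a namespace package
make_package("foo_regular", with_init=True)  # bar is a regular package
sys.path.insert(0, str(root))
importlib.invalidate_caches()

without_init = pkgutil.get_data("foo_ns.bar", "hi.txt")
with_init = pkgutil.get_data("foo_regular.bar", "hi.txt")
print(without_init, with_init)
```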
Assuming you are using an egg file that is not extracted:
I "solved" this in a recent project by using a postinstall script that extracts my templates from the egg (zip file) to the proper directory in the filesystem. It was the quickest, most reliable solution I found, since working with __path__[0] can go wrong sometimes (I don't recall the name, but I came across at least one library that added something in front of that list!).
Also, egg files are usually extracted on the fly to a temporary location called the "egg cache". You can change that location using an environment variable, either before starting your script or even later, e.g.
os.environ['PYTHON_EGG_CACHE'] = path
However there is pkg_resources that might do the job properly.