I am working on a python3 app with a fairly simple file structure, but I'm having issues reading a text file in a script, both of which are lower in the file structure than the script calling them. To be absolutely clear, the file structure is as follows:
app/
|- cli-script
|- app_core/
|- dictionary.txt
|- lib.py
cli-script calls lib.py, and lib.py requires dictionary.txt to do what I need it to, so it gets opened and read in lib.py.
The very basics of cli-script looks like this:
from app_core import lib
def cli_func():
x = lib.Lib_Class()
x.lib_func()
The problem area of lib is here:
class Lib_Class:
def __init__(self):
dictionary = open('dictionary.txt')
The problem I'm getting is that while I have this file structure, the lib file can't find the dictionary file, returning a FileNotFoundError. I would prefer to only use relative paths for portability reasons, but otherwise I just need to make the solution OS agnostic. Symlinks are a last resort option I've figured out, but I want to avoid it at all costs. What are my options?
When you run a Python script, calls involving paths are executed relative to where you run them from, not where the files are actually from.
The __file__ variable stores the path of the current file (no matter where it is), so relative files will be siblings to that.
In your structure, __file__ refers to the path app/app_core/lib.py, so to create app/app_core/dictionary.txt, you need to co up and then down again.
app/app_core/lib.py
import os.path
class Lib_Class:
def __init__(self):
path = os.path.join(os.path.dirname(__file__), 'dictionary.txt')
dictionary = open(path)
or using pathlib
path = pathlib.Path(__file__).parent / 'dictionary.txt'
Because you are expecting the dictionary.txt to be present in the same path as your lib.py file you can do the following.
Instead of dictionary = open('dictionary.txt') use
dictionary = open(Path(__file__).parent / 'dictionary.txt')
Related
So I'm developing a python package, and one of the classes pulls data from a .txt on instantiation. However when I instantiate that class in unit tests, python attempts to load a txt in the test directory rather than the class'. I'm assuming this will cause further problems when importing this package into other projects, so I want to figure out how consistently reach that target .txt (Kinda in a similar fashion to java's Class.getResource, if that helps)
This is how my project is currently set up:
rootdir
|
|----> module
| |
| |----> __init__.py
| |----> class.py
| |----> resource.txt (the resource I'm trying to target)
|
|----> tests
|
|----> __init__.py
|----> test_class.py
The inside of my class is set up like this:
class Foo:
def __init__(self, file_path='resource.txt'):
with open(file_path) as file:
**do stuff**
As of now, any attempts to provide relative pathing to my resource file causes python to search for the that relative path within /tests (i.e. file_path='module/resource.txt' leads to 'tests/module/resource.txt'). Is there any way to have python find the right resource no matter where this class is called up?
In each module, Python injects a __file__ variable at runtime with
the full path to the current module file.
So, you take this variable, reach for its parent, and append
the desired filename in the same directory.
The pathlib.Path class facilitates doing this:
from pathlib import Path
class Foo:
def __init__(self, file_name='resource.txt'):
file_path = Path(__file__).parent / file_name
with open(file_path) as file:
**do stuff**
This is the main way this is done. However, Python packages can also be run packaged in a ".zip" or ".egg" file - in that case, your "resource.txt" file won't exist on the filesystem. If you want your package to maintain the ability to run zipped, there is importlib.resources, and, in this case, its read_text call, which would work without resorting to the __file__ variable.
This question already has answers here:
How to read a (static) file from inside a Python package?
(6 answers)
Closed 2 years ago.
I am writing a python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user's system.
I've tried a variety of methods, but so far I have had no luck. It seems that most of the "current directory" commands return the directory of the system's python interpreter, and not the directory of the module.
This seems like it ought to be a trivial, common problem. Yet I can't seem to figure it out. Part of the problem is that my data files are not .py files, so I can't use import functions and the like.
Any suggestions?
Right now my package directory looks like:
/
__init__.py
module1.py
module2.py
data/
data.txt
I am trying to access data.txt from module*.py!
The standard way to do this is with setuptools packages and pkg_resources.
You can lay out your package according to the following hierarchy, and configure the package setup file to point it your data resources, as per this link:
http://docs.python.org/distutils/setupscript.html#installing-package-data
You can then re-find and use those files using pkg_resources, as per this link:
http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access
import pkg_resources
DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')
There is often not point in making an answer that details code that does not work as is, but I believe this to be an exception. Python 3.7 added importlib.resources that is supposed to replace pkg_resources. It would work for accessing files within packages that do not have slashes in their names, i.e.
foo/
__init__.py
module1.py
module2.py
data/
data.txt
data2.txt
i.e. you could access data2.txt inside package foo with for example
importlib.resources.open_binary('foo', 'data2.txt')
but it would fail with an exception for
>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
resource = _normalize_path(resource)
File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data2.txt' must be only a file name
This cannot be fixed except by placing __init__.py in data and then using it as a package:
importlib.resources.open_binary('foo.data', 'data.txt')
The reason for this behaviour is "it is by design"; but the design might change...
You can use __file__ to get the path to the package, like this:
import os
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
print open(DATA_PATH).read()
To provide a solution working today. Definitely use this API to not reinvent all those wheels.
A true filesystem filename is needed. Zipped eggs will be extracted to a cache directory:
from pkg_resources import resource_filename, Requirement
path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
Return a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.
from pkg_resources import resource_stream, Requirement
vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
Package Discovery and Resource Access using pkg_resources
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#basic-resource-access
You need a name for your whole module, you're given directory tree doesn't list that detail, for me this worked:
import pkg_resources
print(
pkg_resources.resource_filename(__name__, 'data/data.txt')
)
Notibly setuptools does not appear to resolve files based on a name match with packed data files, soo you're gunna have to include the data/ prefix pretty much no matter what. You can use os.path.join('data', 'data.txt) if you need alternate directory separators, Generally I find no compatibility problems with hard-coded unix style directory separators though.
I think I hunted down an answer.
I make a module data_path.py, which I import into my other modules containing:
data_path = os.path.join(os.path.dirname(__file__),'data')
And then I open all my files with
open(os.path.join(data_path,'filename'), <param>)
I'm creating a package with the following structure
/package
__init__.py
/sub_package_1
__init__.py
other_stuff.py
/sub_package_2
__init__.py
calc_stuff.py
/results_dir
I want to ensure that calc_stuff.py will save results to /results_dir, unless otherwise specified (yes, I'm not entirely certain having a results directory in my package is the best idea, but it should work well for now). However, since I don't know from where, or on which machine calc_stuff will be run, I need the package, or at least my_calc.py, to know where it is saved.
So far the two approaches I have tried:
from os import path
saved_dir = path.join(path.dirname(__file__), 'results_dir')
and
from pkg_resources import resource_filename
filepath = resource_filename(__name__, 'results_dir')
have only given me paths relative to the root of the package.
What do I need to do to ensure a statement along the lines of:
pickle.dump(my_data,open(os.path.join(full_path,
'results_dir',
'results.pkl'), 'wb')
Will result in a pickle file being saved into results_dir ?
I'm not entirely certain having a results directory in my package is the best idea, me either :)
But, if you were to put a function like the following inside a module in subpackage2, it should return a path consisting of (module path minus filename, 'results_dir', the filename you passed the function as an argument):
def get_save_path(filename):
import os
return os.path.join(os.path.dirname(__file__), "results_dir", filename)
C:\Users\me\workspaces\workspace-oxygen\test36\TestPackage\results_dir\foo.ext
I have the following directory structure for a program I'm writing in python:
\code\
main.py
config.py
\module_folder1\
script1.1.py
\data\
data_file1
data_file2
My config.py is a set of global variables that are set by the user, or generally fixed all the time. In particular config.py defines path variables to the 2 data files, something like path1 = os.path.abspath("../data/data_file1"). The primary use is to run main.py which imports config (and the other modules I wrote) and all is good.
But sometimes I need to run script1.1.py by itself. Ok, no problem. I can add to script1.1 the usual if __name__ == '__main__': and I can import config. But then I get path1 = "../code/data/data_file1" which doesn't exist. I thought that since the path is created in config.py the path would be relative to where config.py lives, but it's not.
So the question is, how can I have a central config file which defines relative paths, so I can import the config file to scripts in different directories and have the paths still be correct?
I should mention that the code repo will be shared among multiple machines, so hardcoding an absolute path is not an option.
You know the correct relative path to the file from the directory where config.py is located
You know the correct relative path to the directory where config.py is located (in your case, ..)
Both of this things are system-independent and do not change unless you change the structure of you project. Just add them together using os.path.join('..', config.path_repative_to_config)
(Not sure who posted this as a comment, then deleted it, but it seems to work so I'm posting as an answer.) The trick is to use os.path.dirname(__file__) in the config file, which gives the directory of the config file (/code/) regardless of where the script that imports config is.
Specifically to answer the question, in the config file define
path1 = os.path.abspath(os.path.join(os.path.join(os.path.join( os.path.dirname(__file__) , '..'), 'data' ), 'data_file1' ) )
I have a Python project with the following directory structure:
project/
project/src/
project/src/somecode.py
project/src/mypackage/mymodule.py
project/src/resources/
project/src/resources/datafile1.txt
In mymodule.py, I have a class (lets call it "MyClass") which needs to load datafile1.txt. This sort of works when I do:
open ("../resources/datafile1.txt")
Assuming the code that creates the MyClass instance created is run from somecode.py.
The gotcha however is that I have unit tests for mymodule.py which are defined in that file, and if I leave the relative pathname as described above, the unittest code blows up as now the code is being run from project/src/mypackage instead of project/src and the relative filepath doesn't resolve correctly.
Any suggestions for a best practice type approach to resolve this problem? If I move my testcases into project/src that clutters the main source folder with testcases.
I usually use this to get a relative path from my module. Never tried in a unittest tho.
import os
print(os.path.join(os.path.dirname(__file__),
'..',
'resources'
'datafile1.txt'))
Note: The .. tricks works pretty well, but if you change your directory structure you would need to update that part.
On top of the above answers, I'd like to add some Python 3 tricks to make your tests cleaner.
With the help of the pathlib library, you can explicit your ressources import in your tests. It even handles the separators difference between Unix (/) and Windows ().
Let's say we have a folder structure like this :
`-- tests
|-- test_1.py <-- You are here !
|-- test_2.py
`-- images
|-- fernando1.jpg <-- You want to import this image !
`-- fernando2.jpg
You are in the test_1.py file, and you want to import fernando1.jpg. With the help to the pathlib library, you can read your test resource with an object oriented logic as follows :
from pathlib import Path
current_path = Path(os.path.dirname(os.path.realpath(__file__)))
image_path = current_path / "images" / "fernando1.jpg"
with image_path.open(mode='rb') as image :
# do what you want with your image object
But there's actually convenience methods to make your code more explicit than mode='rb', as :
image_path.read_bytes() # Which reads bytes of an object
text_file_path.read_text() # Which returns you text file content as a string
And there you go !
in each directory that contains Python scripts, put a Python module that knows the path to the root of the hierarchy. It can define a single global variable with the relative path. Import this module in each script. Python searches the current directory first so it will always use the version of the module in the current directory, which will have the relative path to the root of the current directory. Then use this to find your other files. For example:
# rootpath.py
rootpath = "../../../"
# in your scripts
from rootpath import rootpath
datapath = os.path.join(rootpath, "src/resources/datafile1.txt")
If you don't want to put additional modules in each directory, you could use this approach:
Put a sentinel file in the top level of the directory structure, e.g. thisisthetop.txt. Have your Python script move up the directory hierarchy until it finds this file. Write all your pathnames relative to that directory.
Possibly some file you already have in the project directory can be used for this purpose (e.g. keep moving up until you find a src directory), or you can name the project directory in such a way to make it apparent.
You can access files in a package using importlib.resources (mind Python version compatibility of the individual functions, there are backports available as importlib_resources), as described here. Thus, if you put your resources folder into your mypackage, like
project/src/mypackage/__init__.py
project/src/mypackage/mymodule.py
project/src/mypackage/resources/
project/src/mypackage/resources/datafile1.txt
you can access your resource file in code without having to rely on inferring file locations of your scripts:
import importlib.resources
file_path = importlib.resources.files('mypackage').joinpath('resources/datafile1.txt')
with open(file_path) as f:
do_something_with(f)
Note, if you distribute your package, don't forget to include the resources/ folder when creating the package.
The filepath will be relative to the script that you initially invoked. I would suggest that you pass the relative path in as an argument to MyClass. This way, you can have different paths depending on which script is invoking MyClass.