Importing python file from other directory - python

I have the following directory structure:
              parentDirectory
                     |
     ---------------------------------
     |                               |
   Tasks                          Configs
    - pythonFile1.py               - Config1.py
                                   - Config2.py
The configuration files have a few configuration constants defined inside a class.
Now I want to import the configuration files from the Configs directory into the Python file under the Tasks directory and use the constants defined inside the class in each config file.
After reading through a few answers, I tried adding
sys.path.insert(0,'/home/MyName/parentDirectory/Tasks')
inside the config files.
Since I am new to Python, I don't know whether adding the above line is correct.
Please help!

I think you got it backward: if you want your "Tasks" to be able to import your "Configs", you need to add code to the Tasks to insert the Configs path into sys.path.
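As a minimal sketch of that suggestion, assuming the layout from the question (Tasks and Configs as sibling folders under parentDirectory), the task file can derive the Configs path from its own location instead of hard-coding /home/MyName. The helper name add_sibling_dir_to_path is hypothetical, and so is SomeConfigClass in the usage comment.

```python
import sys
from pathlib import Path


def add_sibling_dir_to_path(script_file, sibling_name="Configs"):
    """Insert <parent of the script's folder>/<sibling_name> at the front of sys.path.

    Hypothetical helper: script_file is normally __file__ of the importing script.
    """
    sibling = Path(script_file).resolve().parent.parent / sibling_name
    sibling_str = str(sibling)
    if sibling_str not in sys.path:  # avoid adding the same entry twice
        sys.path.insert(0, sibling_str)
    return sibling


# In Tasks/pythonFile1.py one would then write (class name is an assumption):
#   add_sibling_dir_to_path(__file__)
#   from Config1 import SomeConfigClass
```

Note that the path manipulation lives in the importing file under Tasks, not in the config files.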

Related

How to reference other .yaml files of hydra conf folder?

My conf directory looks like:
conf
  - hydra
    - run
      - dir
        - job_timestamp.yaml
  main.yaml
In main.yaml I am trying to override the Hydra output directory with a custom structure as defined in job_timestamp.yaml:
dir: ./outputs/${status}/${now:%Y-%m-%d}/${now:%H-%M-%S}
However, writing:
hydra:
  run:
    dir: job_timestamp
in main.yaml doesn't reference the hydra/run/dir/job_timestamp.yaml file I have; it just names the output folder "job_timestamp".
I was wondering how I would reference my job_timestamp.yaml file in main.yaml to overwrite hydra's output directory config.
EDIT:
I originally had hydra/run/dir: job_timestamp under defaults of main.yaml, however, it would return with hydra/run/dir not found in defaults list. I am planning to have different versions of main.yaml (e.g. main2.yaml, evaluate.yaml etc.) which all would need to override the output directory with the same format, is there a way to do this so that it is DRY?
You seem to be confusing basic concepts in Hydra like config values and config groups.
You do not "reference" a config file from a config value, only from the defaults list.
Go back to the basic tutorial and make sure you understand config groups.
Hydra-specific config groups are defined inside the Hydra package. You may want to override Hydra configs via config groups, but I only suggest you attempt that after you have a better understanding of config groups.
In the meantime, you can do it by simply overriding the value directly in your main.yaml file:
main.yaml:
hydra:
  run:
    dir: outputs/${status}/${now:%Y-%m-%d}/${now:%H-%M-%S}

Loading external resource in a python package

So I'm developing a Python package, and one of the classes pulls data from a .txt file on instantiation. However, when I instantiate that class in unit tests, Python attempts to load a .txt file from the test directory rather than from the class's directory. I'm assuming this will cause further problems when importing this package into other projects, so I want to figure out how to consistently reach that target .txt file (in a similar fashion to Java's Class.getResource, if that helps).
This is how my project is currently set up:
rootdir
|
|----> module
|       |
|       |----> __init__.py
|       |----> class.py
|       |----> resource.txt (the resource I'm trying to target)
|
|----> tests
        |
        |----> __init__.py
        |----> test_class.py
The inside of my class is set up like this:
class Foo:
    def __init__(self, file_path='resource.txt'):
        with open(file_path) as file:
            ...  # do stuff
As of now, any attempt to provide a relative path to my resource file causes Python to search for that relative path within /tests (i.e. file_path='module/resource.txt' leads to 'tests/module/resource.txt'). Is there any way to have Python find the right resource no matter where this class is called from?
In each module, Python injects a __file__ variable at runtime with
the full path to the current module file.
So, you take this variable, reach for its parent, and append
the desired filename in the same directory.
The pathlib.Path class facilitates doing this:
from pathlib import Path
class Foo:
    def __init__(self, file_name='resource.txt'):
        # Resolve the file relative to this module, not the working directory
        file_path = Path(__file__).parent / file_name
        with open(file_path) as file:
            ...  # do stuff
This is the main way this is done. However, Python packages can also be run packaged in a ".zip" or ".egg" file - in that case, your "resource.txt" file won't exist on the filesystem. If you want your package to maintain the ability to run zipped, there is importlib.resources, and, in this case, its read_text call, which would work without resorting to the __file__ variable.
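A minimal sketch of the importlib.resources route, assuming Python 3.9 or newer (where importlib.resources.files is available) and that resource.txt ships inside the package. The helper name read_package_text is made up for illustration, and it uses files() rather than the older read_text function mentioned above.

```python
from importlib import resources


def read_package_text(package, resource_name, encoding="utf-8"):
    """Return the text of a resource stored inside an importable package.

    Hypothetical helper. `package` is the import name (for example "module"
    in the layout from the question); it must be importable via sys.path.
    """
    # files() returns a Traversable, which also works for zipped packages
    resource = resources.files(package).joinpath(resource_name)
    return resource.read_text(encoding=encoding)
```

For the layout in the question, Foo.__init__ could call read_package_text("module", "resource.txt") instead of opening a path, so the lookup no longer depends on the working directory.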

Inside a Python Package / Module - how to define a relative path to a file?

I am working on a Python Toolbox running on Windows, that needs to know several paths:
Folders, e.g. /path/to/output, /path/to/calc/, /fixed/path/to/server
Programs, e.g. /path/to/program.exe
Some of these paths are fixed and could be hard coded. Some of them change depending on the user.
At the moment I solve that problem using a config file config.param. This file has to be created by each user.
The reason why I chose such a config file is, that inexperienced users can simply copy this file from someone else and change the content according to their system.
The directory looks like this
Toolbox
|
+-- config.param
|
+-- package
|   |
|   +-- __init__.py
|   |
|   +-- path_and_names.py
|
+-- example
    |
    +-- example.py
    |
    +-- example_sub
        |
        +-- example_sub.py
For internal use of all path-variables, I use a class called PathAndNames. The problem is now reduced to: This class PathAndNames needs to know the location of the file config.param. Since both are part of the Python package with a fixed relative path, I solved that by
class PathAndNames:
    PATH_TO_CONFIG = os.path.normpath("../config.param")
Problem:
Since this is a Toolbox the user should be able to use it from /where/he/likes, e.g. where his Calculation-Folder is located.
He will need to create an instance of PathAndNames, which then should know all correct paths.
The problem is, that ../config.param now refers to /where/he/likes/../config.param and not the correct path inside the Toolbox.
Example:
The user is at the moment forced to write his code in a subfolder of the Toolbox, e.g. like example.py in the subfolder Toolbox/example/.
If the folder is located not in a direct subfolder of Toolbox, e.g. like example_sub.py in the subsubfolder Toolbox/example/example_sub, then the Toolbox won't work. In this example the location of ../config.param that PathAndNames knows, would be Toolbox/example/config.param.
Is there a way to define a relative path inside a Python package / module?
Are there other possible ways to solve the path problem, in a way that inexperienced users can understand what they should do?
Any ideas are appreciated.
The __file__ attribute holds the path of the module file in which it is used (rather than the working directory of whichever script imports it). This allows you to keep two files, say config.param in the Toolbox folder and config.py inside the package, and put lines like the following in config.py:
import os
# Parent of the directory that contains this module (Toolbox/ for Toolbox/package/config.py)
__here__ = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
PATH_TO_CONFIG = os.path.join(__here__, 'config.param')
From the user script, then you can get the file path as
from my_module.config import PATH_TO_CONFIG

How to make python config file, in which relative paths are defined, but when scripts in other directories import config, paths are correct?

I have the following directory structure for a program I'm writing in python:
\code\
    main.py
    config.py
    \module_folder1\
        script1.1.py
\data\
    data_file1
    data_file2
My config.py is a set of global variables that are set by the user, or generally fixed all the time. In particular config.py defines path variables to the 2 data files, something like path1 = os.path.abspath("../data/data_file1"). The primary use is to run main.py which imports config (and the other modules I wrote) and all is good.
But sometimes I need to run script1.1.py by itself. Ok, no problem. I can add to script1.1 the usual if __name__ == '__main__': and I can import config. But then I get path1 = "../code/data/data_file1" which doesn't exist. I thought that since the path is created in config.py the path would be relative to where config.py lives, but it's not.
So the question is, how can I have a central config file which defines relative paths, so I can import the config file to scripts in different directories and have the paths still be correct?
I should mention that the code repo will be shared among multiple machines, so hardcoding an absolute path is not an option.
You know the correct relative path to the file from the directory where config.py is located.
You know the correct relative path to the directory where config.py is located (in your case, ..).
Both of these things are system-independent and do not change unless you change the structure of your project. Just add them together using os.path.join('..', config.path_relative_to_config)
(Not sure who posted this as a comment, then deleted it, but it seems to work so I'm posting as an answer.) The trick is to use os.path.dirname(__file__) in the config file, which gives the directory of the config file (/code/) regardless of where the script that imports config is.
Specifically to answer the question, in the config file define
path1 = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'data', 'data_file1'))
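To illustrate why the original os.path.abspath("../data/data_file1") moved around while the __file__-based version does not, here is a small sketch. The two function names are hypothetical: the first resolves a relative path the way os.path.abspath does (against the current working directory), the second resolves it against the directory of a given module file.

```python
import os


def resolve_against_cwd(relative_path):
    """What config.py did originally: the result depends on os.getcwd()."""
    return os.path.abspath(relative_path)


def resolve_against_file(module_file, relative_path):
    """Resolve relative_path against the folder that contains module_file."""
    base = os.path.dirname(os.path.abspath(module_file))
    return os.path.normpath(os.path.join(base, relative_path))
```

In config.py, resolve_against_file(__file__, "../data/data_file1") would give the same result whether the program is started from main.py or from script1.1.py.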

Accessing resource files in Python unit tests & main code

I have a Python project with the following directory structure:
project/
project/src/
project/src/somecode.py
project/src/mypackage/mymodule.py
project/src/resources/
project/src/resources/datafile1.txt
In mymodule.py, I have a class (let's call it "MyClass") which needs to load datafile1.txt. This sort of works when I do:
open ("../resources/datafile1.txt")
This assumes that the code creating the MyClass instance is run from somecode.py.
The gotcha, however, is that I have unit tests for mymodule.py which are defined in that file, and if I leave the relative pathname as described above, the unittest code blows up, because the code is now being run from project/src/mypackage instead of project/src and the relative file path doesn't resolve correctly.
Any suggestions for a best practice type approach to resolve this problem? If I move my testcases into project/src that clutters the main source folder with testcases.
I usually use this to get a relative path from my module. Never tried in a unittest tho.
import os
# Build the path relative to this module's directory, not the working directory
print(os.path.join(os.path.dirname(__file__),
                   '..',
                   'resources',
                   'datafile1.txt'))
Note: The .. trick works pretty well, but if you change your directory structure you will need to update that part.
On top of the above answers, I'd like to add some Python 3 tricks to make your tests cleaner.
With the help of the pathlib library, you can make your resource imports explicit in your tests. It even handles the path separator difference between Unix (/) and Windows (\).
Let's say we have a folder structure like this :
`-- tests
    |-- test_1.py          <-- You are here !
    |-- test_2.py
    `-- images
        |-- fernando1.jpg  <-- You want to import this image !
        `-- fernando2.jpg
You are in the test_1.py file, and you want to import fernando1.jpg. With the help of the pathlib library, you can read your test resource with object-oriented logic as follows:
import os
from pathlib import Path
current_path = Path(os.path.dirname(os.path.realpath(__file__)))
image_path = current_path / "images" / "fernando1.jpg"
with image_path.open(mode='rb') as image:
    data = image.read()  # do what you want with your image object
But there are also convenience methods that make your code more explicit than mode='rb', such as:
image_path.read_bytes() # Which reads bytes of an object
text_file_path.read_text() # Which returns you text file content as a string
And there you go !
In each directory that contains Python scripts, put a Python module that knows the path to the root of the hierarchy. It can define a single global variable with the relative path. Import this module in each script. Python searches the directory of the script being run first, so it will always pick up the copy of the module in that directory, which holds the relative path from that directory to the root. Then use this to find your other files. For example:
# rootpath.py
rootpath = "../../../"
# in your scripts
import os
from rootpath import rootpath
datapath = os.path.join(rootpath, "src/resources/datafile1.txt")
If you don't want to put additional modules in each directory, you could use this approach:
Put a sentinel file in the top level of the directory structure, e.g. thisisthetop.txt. Have your Python script move up the directory hierarchy until it finds this file. Write all your pathnames relative to that directory.
Possibly some file you already have in the project directory can be used for this purpose (e.g. keep moving up until you find a src directory), or you can name the project directory in such a way to make it apparent.
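The sentinel-file idea can be sketched as follows. This is only an illustration under stated assumptions: the function name find_project_root is hypothetical, and the default sentinel name is the thisisthetop.txt example from above.

```python
from pathlib import Path


def find_project_root(start, sentinel="thisisthetop.txt"):
    """Walk upwards from `start` until a directory containing `sentinel` is found.

    `start` is typically __file__ of the calling script (hypothetical helper).
    """
    start = Path(start).resolve()
    # Check the starting path itself, then each ancestor up to the filesystem root
    for candidate in (start, *start.parents):
        if (candidate / sentinel).exists():
            return candidate
    raise FileNotFoundError(f"no {sentinel!r} found above {start}")
```

A script could then build its paths as find_project_root(__file__) / "src" / "resources" / "datafile1.txt", independent of the directory it is started from.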
You can access files in a package using importlib.resources (mind the Python version compatibility of the individual functions; backports are available as importlib_resources). Thus, if you put your resources folder into mypackage, like
project/src/mypackage/__init__.py
project/src/mypackage/mymodule.py
project/src/mypackage/resources/
project/src/mypackage/resources/datafile1.txt
you can access your resource file in code without having to rely on inferring file locations of your scripts:
import importlib.resources
file_path = importlib.resources.files('mypackage').joinpath('resources/datafile1.txt')
# Open through the returned object so this also works when the package is not a plain directory
with file_path.open() as f:
    do_something_with(f)
Note, if you distribute your package, don't forget to include the resources/ folder when creating the package.
The filepath will be relative to the script that you initially invoked. I would suggest that you pass the relative path in as an argument to MyClass. This way, you can have different paths depending on which script is invoking MyClass.
