I have the following project structure:
ts_tools
    /bin
    /docs
    /lib
    /ts_tools
        /data_transfer
            /tests
            data_import.py
            __init__.py
        /data_manipulation
            /tests
            data_smoothing.py
            __init__.py
        __init__.py
    config.yaml.conf
    setup.py
    LICENSE
    README.md
    TODO.md
I would like to import data with the data_import.py file from an external source. I use the config.yaml.conf file to specify the absolute paths of the data with:
root_path:
    windows:
        data_fundamental: C:\\Users\\Malcom\\Data_Fundamental
        data_event: C:\\Users\\Malcom\\Data_Event
    linux:
        data_fundamental: /home/data/data_fundamental
        data_event: /home/data/data_event
The respective paths should be available for all tools in the ts_tools package (i.e. data_import.py and data_smoothing.py). Furthermore, the program should identify the os and choose the path structure accordingly.
I know how to set the paths with the yaml file using
import yaml
with open("config.yaml.conf", "r") as ymlfile:
    cfg = yaml.safe_load(ymlfile)
and I know how to discriminate between the os with
import platform
if platform.system().lower() == 'windows':
    ROOT_DATA_PATH = cfg['root_path']['windows']
else:
    ROOT_DATA_PATH = cfg['root_path']['linux']
but I don't know where to place these code snippets. I don't think that it is appropriate to use them in the setup.py file. On the other hand, I consider it inappropriate to create a new .py file just for this. What is a good design structure for this problem? Where should I specify absolute file paths? Is my approach a step in the right direction?
Thank you in advance.
In this case, you can make it relative to the home directory, so you can have ~/data_fundamental and ~/data_event (which should be equivalent on both platforms). You can expand this with os.path.expanduser:
import os.path

def get_path(s):
    return os.path.normpath(os.path.normcase(
        os.path.expanduser(os.path.expandvars(s))
    ))

# get_path('~/data_fundamental') on Windows:
#  r'c:\users\malcom\data_fundamental'
# get_path('~/data_fundamental') on Linux:
#  r'/home/data/data_fundamental'
# (Assuming your username on Windows is Malcom
#  and on Linux is data and you haven't changed
#  the default home path)
In any case, maintaining two separate per-OS settings might be overly confusing, and you should expand ~, %VARS% and ${VARS} anyway so that the configuration is easier to set up and behaves as expected.
Your other alternatives include:
Reading from environment variables
Writing it in setup.py (You should probably allow some way to change where the config file is as setup.py might put it in a write-protected location)
You could also not have a default at all, and when the path is not given either make a default based on sys.platform (a string attribute, not a function) or raise an error message telling the user to set it.
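As a rough sketch of the environment-variable alternative combined with the "raise an error" fallback, something like the following could work. The variable name TS_TOOLS_DATA_FUNDAMENTAL and the function get_data_dir are invented for illustration; nothing in the question defines them.

```python
import os

# Hypothetical variable name, chosen only for this example.
ENV_VAR = "TS_TOOLS_DATA_FUNDAMENTAL"


def get_data_dir(environ=None, default=None):
    """Return the data directory from the environment, or a default.

    Raises RuntimeError with a hint if neither is available.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, default)
    if raw is None:
        raise RuntimeError(
            "Data directory not configured; set the %s environment variable" % ENV_VAR
        )
    # Expand ~ and environment variable references, then normalise the result.
    return os.path.normpath(os.path.expanduser(os.path.expandvars(raw)))
```

Every tool in the package can then call get_data_dir() instead of reading the paths itself, so the lookup logic lives in one place.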
Let's identify two different types of files/data.
Files/data written by the user or for the user during installation/deploy
Files/data written by the coder
It can be okay to have absolute paths in files/data defined by the user or generated by the program executing on the user machine.
Absolute paths are intrinsically more fragile than relative paths, but it's not that bad in the first case.
In the second case you should never use absolute paths. I see that you are even using two different paths for windows and linux. You don't have to do that and you shouldn't.
In Python you have things such as os.path.expanduser('~') to find the user's home directory, or packages like appdirs. You want to be as cross-platform as possible, and with Python that is almost always possible.
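A minimal sketch of that idea, assuming the data simply lives under the user's home directory. The directory names are borrowed from the question; the function name default_data_dirs is made up for this example.

```python
from pathlib import Path


def default_data_dirs(home=None):
    """Build the data directories under the user's home directory."""
    # Path.home() gives the home directory on Windows, Linux and macOS alike.
    home = Path.home() if home is None else Path(home)
    return {
        "data_fundamental": home / "data_fundamental",
        "data_event": home / "data_event",
    }
```

With this, no per-OS branch is needed in the coder's files; a user who keeps the data elsewhere can still override the locations in their own configuration.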
Related
How can I get the path to the %APPDATA% directory in Python?
import os
print(os.getenv('APPDATA'))
You may use os.path.expandvars(path):
Return the argument with environment variables expanded. Substrings of the form $name or ${name} are replaced by the value of environment variable name. Malformed variable names and references to non-existing variables are left unchanged.
On Windows, %name% expansions are supported in addition to $name and ${name}.
This comes in handy when combining the expanded value with other path components.
Example:
from os import path
sendto_dir = path.expandvars(r'%APPDATA%\Microsoft\Windows\SendTo')
dumps_dir = path.expandvars(r'%LOCALAPPDATA%\CrashDumps')
Although the question clearly asks about the Windows-specific %APPDATA% directory, perhaps you have ended up here looking for a cross-platform solution for getting the application data directory for the current user, which varies by OS.
As of Python 3.11, somewhat surprisingly, there is no built-in function to find this directory. However, there are third-party packages, the most popular of which seems to be appdirs, which provides functions to retrieve paths such as:
user data dir (user_data_dir)
user config dir (user_config_dir)
user cache dir (user_cache_dir)
site data dir (site_data_dir)
site config dir (site_config_dir)
user log dir (user_log_dir)
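To give an idea of what such a lookup involves, here is a simplified standard-library sketch for the user data directory only. It is not how appdirs is implemented; it just follows the common conventions (%APPDATA% on Windows, ~/Library/Application Support on macOS, XDG_DATA_HOME or ~/.local/share elsewhere). The function name and its extra parameters are assumptions made so the example can be tried without changing the real environment.

```python
import os
import sys
from pathlib import Path


def user_data_dir(app_name, platform=None, environ=None, home=None):
    """Approximate the per-user data directory for app_name."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    if platform.startswith("win"):
        # Windows: roaming application data, i.e. %APPDATA%.
        base = Path(environ.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unix: XDG_DATA_HOME, defaulting to ~/.local/share.
        base = Path(environ.get("XDG_DATA_HOME") or home / ".local" / "share")
    return base / app_name
```

A dedicated package is still the safer choice in real projects, since it covers more platforms and directory kinds than this sketch does.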
You can try doing:
import os
path = os.getenv('APPDATA')
array = os.listdir(path)
print(array)
You can use module called appdata. It was developed to get access to different paths for your application, including app data folder. Install it:
pip install appdata
And after that you can use it this way:
from appdata import AppDataPaths
app_paths = AppDataPaths()
app_paths.app_data_path # for your app data path
app_paths.logs_path # for logs folder path for your application
It lets you get not only the app data folder and the logs folder, but also has other path-management features, such as managing config file paths. And it's customizable.
Links:
Read the Docs - documentation.
GitHub - source code.
PyPI - package manager (pip).
I printed this page and am grateful for it.
I also tried to configure "%APPDATA%" on this device
using Notepad, but I did not know how to achieve the suggested configuration. I also copied the contributions from sotoz and Aominé. Thanks.
efk14it.
I have the following directory structure for a program I'm writing in python:
\code\
    main.py
    config.py
    \module_folder1\
        script1.1.py
\data\
    data_file1
    data_file2
My config.py is a set of global variables that are set by the user, or generally fixed all the time. In particular config.py defines path variables to the 2 data files, something like path1 = os.path.abspath("../data/data_file1"). The primary use is to run main.py which imports config (and the other modules I wrote) and all is good.
But sometimes I need to run script1.1.py by itself. Ok, no problem. I can add to script1.1 the usual if __name__ == '__main__': and I can import config. But then I get path1 = "../code/data/data_file1" which doesn't exist. I thought that since the path is created in config.py the path would be relative to where config.py lives, but it's not.
So the question is, how can I have a central config file which defines relative paths, so I can import the config file to scripts in different directories and have the paths still be correct?
I should mention that the code repo will be shared among multiple machines, so hardcoding an absolute path is not an option.
You know the correct relative path to the file from the directory where config.py is located.
You know the correct relative path to the directory where config.py is located (in your case, ..).
Both of these things are system-independent and do not change unless you change the structure of your project. Just add them together using os.path.join('..', config.path_relative_to_config)
(Not sure who posted this as a comment, then deleted it, but it seems to work so I'm posting as an answer.) The trick is to use os.path.dirname(__file__) in the config file, which gives the directory of the config file (/code/) regardless of where the script that imports config is.
Specifically to answer the question, in the config file define
path1 = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'data', 'data_file1'))
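The same idea can be written with pathlib. This is only a sketch: the helper name path_next_to is invented here, and it takes the config file's location as an argument so that it can be tried outside of config.py.

```python
from pathlib import Path


def path_next_to(config_file, *parts):
    """Resolve parts relative to the directory that contains config_file."""
    return Path(config_file).resolve().parent.joinpath(*parts).resolve()


# In config.py this would be called with __file__, for example:
# path1 = path_next_to(__file__, "..", "data", "data_file1")
```

Because the path is anchored at the location of config.py rather than at the current working directory, it stays the same whether main.py or script1.1.py is the entry point.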
I've looked around the docs on the pytest website, but haven't found a clear example of working with 'test resources', such as reading in fixed files during unit tests. Something similar to what http://jlorenzen.blogspot.com/2007/06/proper-way-to-access-file-resources-in.html describes for Java.
For example, if I have a yaml file checked in to source control, what is the right way to write a test which loads from that file? I think this boils down to understanding the right way to access a 'resource file' on the python equivalent of the classpath (PYTHONPATH?).
This seems like it should be simple. Is there an easy solution?
Perhaps what you are looking for is pkg_resources or pkgutil. For example, if you have a module within your python source called "resources", you could read your "resourcefile" using:
import pkg_resources

with open(pkg_resources.resource_filename("resources", "resourcefile")) as infile:
    for line in infile:
        print(line)
or:
import pkgutil
import tempfile

with tempfile.TemporaryFile() as outfile:
    outfile.write(pkgutil.get_data("resources", "resourcefile"))
The latter even works when your "script" is an executable zip file. The former works without needing to unpack your resources from an egg.
Note that creating a subdirectory of your source does not make it a module. You need to add a file named __init__.py within the directory for it to be visible as a module for the purposes of pkg_resources and pkgutil. __init__.py can be empty.
I think "resource file" is whatever definition you give to it in python (in Java, resource files can be bundled into jar files with ordinary Java classes, and Java provides library functions to access this information).
An equivalent solution might be to read the PYTHONPATH environment variable, define your "resource file" as a relative path, and then search each PYTHONPATH entry for it. Here's an example:
import os

def find_resource():
    pythonpath = os.environ['PYTHONPATH']
    file_relative_path = os.path.join('subdir', 'resourcefile')  # e.g. subdir/resourcefile
    for directory in pythonpath.split(os.pathsep):
        resource_path = os.path.join(directory, file_relative_path)
        if os.path.exists(resource_path):
            return resource_path
    return None  # not found in any PYTHONPATH entry
This code snippet returns a full path for the first file that exists on the PYTHONPATH.
I have a Python project with the following directory structure:
project/
project/src/
project/src/somecode.py
project/src/mypackage/mymodule.py
project/src/resources/
project/src/resources/datafile1.txt
In mymodule.py, I have a class (let's call it "MyClass") which needs to load datafile1.txt. This sort of works when I do:
open ("../resources/datafile1.txt")
This assumes that the code creating the MyClass instance is run from somecode.py.
The gotcha however is that I have unit tests for mymodule.py which are defined in that file, and if I leave the relative pathname as described above, the unittest code blows up as now the code is being run from project/src/mypackage instead of project/src and the relative filepath doesn't resolve correctly.
Any suggestions for a best practice type approach to resolve this problem? If I move my testcases into project/src that clutters the main source folder with testcases.
I usually use this to get a relative path from my module. Never tried in a unittest tho.
import os
print(os.path.join(os.path.dirname(__file__),
                   '..',
                   'resources',
                   'datafile1.txt'))
Note: The .. trick works pretty well, but if you change your directory structure you would need to update that part.
On top of the above answers, I'd like to add some Python 3 tricks to make your tests cleaner.
With the help of the pathlib library, you can make the resource paths in your tests explicit. It even handles the difference between the Unix (/) and Windows (\) path separators.
Let's say we have a folder structure like this :
`-- tests
    |-- test_1.py          <-- You are here !
    |-- test_2.py
    `-- images
        |-- fernando1.jpg  <-- You want to import this image !
        `-- fernando2.jpg
You are in the test_1.py file, and you want to import fernando1.jpg. With the help of the pathlib library, you can read your test resource in an object-oriented way as follows:
from pathlib import Path

current_path = Path(__file__).resolve().parent
image_path = current_path / "images" / "fernando1.jpg"
with image_path.open(mode='rb') as image:
    data = image.read()  # do what you want with your image object
But there are also convenience methods that make your code more explicit than mode='rb', such as:
image_path.read_bytes() # Which reads bytes of an object
text_file_path.read_text() # Which returns you text file content as a string
And there you go !
In each directory that contains Python scripts, put a Python module that knows the path to the root of the hierarchy. It can define a single global variable with the relative path. Import this module in each script. Python searches the directory of the running script first, so it will always use the version of the module in that directory, which holds the relative path from there to the root. Then use this to find your other files. For example:
# rootpath.py
rootpath = "../../../"

# in your scripts
import os
from rootpath import rootpath
datapath = os.path.join(rootpath, "src/resources/datafile1.txt")
If you don't want to put additional modules in each directory, you could use this approach:
Put a sentinel file in the top level of the directory structure, e.g. thisisthetop.txt. Have your Python script move up the directory hierarchy until it finds this file. Write all your pathnames relative to that directory.
Possibly some file you already have in the project directory can be used for this purpose (e.g. keep moving up until you find a src directory), or you can name the project directory in such a way to make it apparent.
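The sentinel-file approach could look roughly like this. The function name find_project_root is made up for illustration; the marker name thisisthetop.txt is the one suggested above.

```python
from pathlib import Path


def find_project_root(start, marker="thisisthetop.txt"):
    """Walk upwards from start until a directory containing marker is found."""
    start = Path(start).resolve()
    # If start is a file, begin the search in its directory.
    current = start if start.is_dir() else start.parent
    for directory in [current, *current.parents]:
        if (directory / marker).exists():
            return directory
    raise FileNotFoundError("no %r found above %s" % (marker, start))
```

A script would typically call find_project_root(__file__) once and build every other path from the returned directory, for example root / "src" / "resources" / "datafile1.txt".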
You can access files in a package using importlib.resources (mind Python version compatibility of the individual functions, there are backports available as importlib_resources), as described here. Thus, if you put your resources folder into your mypackage, like
project/src/mypackage/__init__.py
project/src/mypackage/mymodule.py
project/src/mypackage/resources/
project/src/mypackage/resources/datafile1.txt
you can access your resource file in code without having to rely on inferring file locations of your scripts:
import importlib.resources

file_path = importlib.resources.files('mypackage').joinpath('resources/datafile1.txt')
with file_path.open() as f:
    do_something_with(f)
Note, if you distribute your package, don't forget to include the resources/ folder when creating the package.
A relative file path is resolved against the current working directory, which is usually the directory from which you invoked the initial script. I would suggest that you pass the path in as an argument to MyClass. This way, you can have different paths depending on which script is invoking MyClass.
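A minimal sketch of that suggestion follows. Only the class name MyClass comes from the question; the load method and the data_path parameter are assumptions made for the example.

```python
from pathlib import Path


class MyClass:
    """Loads its data from a path supplied by the caller."""

    def __init__(self, data_path):
        self.data_path = Path(data_path)

    def load(self):
        # Read the whole file as text; the caller decides where it lives.
        return self.data_path.read_text()
```

somecode.py could then create MyClass("resources/datafile1.txt"), while the unit tests pass whatever path is correct from their own location.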
I have a fair number of Python scripts that contain reusable code that are used and referenced by other Python scripts. However, these scripts tend to be scattered across different directories and I find it to be somewhat tedious to have to include (most often multiple) calls to sys.path.append on my top-level scripts. I just want to provide the 'import' statements without the additional file references in the same script.
Currently, I have this:
import sys
sys.path.append('..//shared1//reusable_foo')
import Foo
sys.path.append('..//shared2//reusable_bar')
import Bar
My preference would be the following:
import Foo
import Bar
My background is primarily in the .NET platform so I am accustomed to having meta files such as *.csproj, *.vbproj, *.sln, etc. to manage and contain the actual file path references outside of the source files. This allows me to just provide 'using' directives (equivalent to Python's import) without exposing all of the references and allowing for reuse of the path references themselves across multiple scripts.
Does Python have equivalent support for this and, if not, what are some techniques and approaches?
The simple answer is to put your reusable code in your site-packages directory, which is in your sys.path.
You can also extend the search path by adding .pth files somewhere in your path.
See https://docs.python.org/2/install/#modifying-python-s-search-path for more details
Oh, and Python 2.6/3.0 add support for PEP 370, the per-user site-packages directory.
If your reusable files are packaged (that is, they include an __init__.py file) and the path to that package is part of your PYTHONPATH or sys.path then you should be able to do just
import Foo
This question provides a few more details.
(Note: As Jim said, you could also drop your reusable code into your site-packages directory.)
You can put the reusable stuff in site-packages. That's completely transparent, since it's in sys.path by default.
You can put someName.pth files in site-packages. These files have the directory in which your actual reusable stuff lives. This is also completely transparent. And doesn't involve the extra step of installing a change in site-packages.
You can put the directory of the reusable stuff on PYTHONPATH. That's a little less transparent, because you have to make sure it's set. Not rocket science, but not completely transparent.
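To see what a .pth file does without touching the real site-packages, the sketch below builds a throwaway directory, writes a .pth file into it and asks the site module to process it. The directory and file names are invented for the demonstration; in a real setup the .pth file would sit in site-packages and be picked up automatically at interpreter startup.

```python
import os
import site
import sys
import tempfile

# Stand-ins for a site directory and a directory of reusable code.
base = tempfile.mkdtemp()
site_dir = os.path.abspath(os.path.join(base, "site_dir"))
shared_dir = os.path.abspath(os.path.join(base, "shared1"))
os.makedirs(site_dir)
os.makedirs(shared_dir)

# A .pth file lists one directory per line.
with open(os.path.join(site_dir, "shared_code.pth"), "w") as pth:
    pth.write(shared_dir + "\n")

# Process the .pth files found in site_dir; existing directories
# named in them are appended to sys.path.
site.addsitedir(site_dir)

print(shared_dir in sys.path)
```

After this, a module placed in shared_dir can be imported by name with a plain import statement.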
In one project, I wanted to make sure that the user could put python scripts (that could basically be used as plugins) anywhere. My solution was to put the following in the config file for that project:
[server]
PYPATH_APPEND: /home/jason:/usr/share/some_directory
That way, this would add /home/jason and /usr/share/some_directory to the python path at program launch.
Then, it's just a simple matter of splitting the string by the colons and adding those directories to the end of the sys.path. You may want to consider putting a module in the site-packages directory that contains a function to read in that config file and add those directories to the sys.path (unfortunately, I don't have time at the moment to write an example).
As others have mentioned, it's a good idea to put as much in site-packages as possible and also using .pth files. But this can be a good idea if you have a script that needs to import a bunch of stuff that's not in site-packages that you wouldn't want to import from other scripts.
(there may also be a way to do this using .pth files, but I like being able to manipulate the python path in the same place as I put the rest of my configuration info)
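Since the answer above leaves the example unwritten, here is one possible sketch using the standard configparser module. The function names are invented; the [server] section and the PYPATH_APPEND key are taken from the config snippet in the answer.

```python
import configparser
import sys

EXAMPLE_CONFIG = """
[server]
PYPATH_APPEND: /home/jason:/usr/share/some_directory
"""


def extra_paths(config_text):
    """Return the directories listed in [server] PYPATH_APPEND."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("server", "PYPATH_APPEND", fallback="")
    # Colon-separated as in the answer; os.pathsep would be more portable.
    return [entry for entry in raw.split(":") if entry]


def extend_sys_path(paths):
    """Append each directory to sys.path unless it is already there."""
    for entry in paths:
        if entry not in sys.path:
            sys.path.append(entry)
```

At program launch one would call extend_sys_path(extra_paths(text)) with the contents of the real config file before importing any of the plugin scripts.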
The simplest way is to set (or add to) PYTHONPATH, and put (or symlink) your modules and packages into a path contained in PYTHONPATH.
My solution was to package up one utility that would import the module:
# my_util is in site-packages
import sys
import my_util

foo = my_util.import_script('..//shared1//reusable_foo')
if foo is None:
    sys.exit(1)
# in my_util.py; log() and FormatExceptionInfo() are helpers not shown here
import os
import sys

def import_script(script_path, log_status=True):
    """
    imports a module and returns the handle
    """
    lpath = os.path.split(script_path)
    if lpath[1] == '':
        log('Error in script "%s" in import_script' % (script_path))
        return None

    # check if path is already in sys.path so we don't repeat
    npath = None
    if lpath[0] == '':
        npath = '.'
    else:
        if lpath[0] not in sys.path:
            npath = lpath[0]
    if npath is not None:
        try:
            sys.path.append(npath)
        except Exception:
            if log_status:
                log('Error adding path "%s" in import_script' % npath)
            return None

    try:
        mod = __import__(lpath[1])
    except Exception:
        error_trace, error_reason = FormatExceptionInfo()
        if log_status:
            log('Error importing "%s" module in import_script: %s' % (script_path, error_trace + error_reason))
        # only undo the sys.path change if one was made above
        if npath is not None:
            sys.path.remove(npath)
        return None
    return mod
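For comparison, on Python 3 a module can also be loaded straight from a file path with importlib.util, without modifying sys.path at all. This is a different technique from the helper above, shown only as a sketch; the function name load_module_from_file is made up.

```python
import importlib.util
import os


def load_module_from_file(file_path, module_name=None):
    """Load a .py file as a module without modifying sys.path."""
    if module_name is None:
        # Derive the module name from the file name, e.g. reusable_foo.py.
        module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %s" % file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Note that the module loaded this way is not registered in sys.modules, so a later plain import of the same name will not find it unless you add it there yourself.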