Python open filename from custom PATH

Similar to the system path, I want to offer some convenience in my code allowing a user to specify a file name that could be in one of a handful of paths.
Say I had two or more config paths:
['~/.foo-config/', '/usr/local/myapp/foo-config/']
And my user wants to open bar (AKA bar.baz).
Is there a convenient built-in way to let open('bar') or open('bar.baz') automatically search these paths for that file in left-to-right order of precedence? E.g., will temporarily adjusting my sys.path to only these directories do this for me?
Else, how would you suggest implementing a PATH-like searching open-wrapper?

As other people already mentioned: sys.path only affects the module search path, i.e. it's relevant for importing Python modules, but not at all for open().
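A quick way to check this is to put a file in a directory that is on sys.path but is not the working directory, and then try to open it by bare name. The following is a small Python 3 sketch; open_ignores_sys_path is a hypothetical helper, and the directories are throwaway temporaries.

```python
import os
import sys
import tempfile

def open_ignores_sys_path():
    """Return True if open() fails to find a file that only lives on sys.path."""
    with tempfile.TemporaryDirectory() as config_dir, \
            tempfile.TemporaryDirectory() as work_dir:
        # Put bar.baz in a directory that is NOT the working directory.
        with open(os.path.join(config_dir, 'bar.baz'), 'w') as f:
            f.write('setting = 1\n')
        old_cwd = os.getcwd()
        sys.path.insert(0, config_dir)  # affects imports only
        os.chdir(work_dir)              # the cwd does not contain bar.baz
        try:
            open('bar.baz').close()
            return False                # would mean open() searched sys.path
        except FileNotFoundError:
            return True
        finally:
            os.chdir(old_cwd)
            sys.path.remove(config_dir)

print(open_ignores_sys_path())
```

Relative names passed to open() are resolved against the current working directory only, so the call fails even though config_dir is first on sys.path.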
I would suggest separating the logic for searching the paths in order of precedence from the logic for opening the file, because that way it's easier to test and read.
I would do something like this:
import os
PATHS = ['~/.foo-config/', '/usr/local/myapp/foo-config/']
def find_first(filename, paths):
    """Return the first existing path for filename, or None."""
    for directory in paths:
        # expanduser turns '~/.foo-config/' into a path under the home directory
        full_path = os.path.join(os.path.expanduser(directory), filename)
        if os.path.isfile(full_path):
            return full_path
    return None
def main():
    filename = 'file.txt'
    path = find_first(filename, PATHS)
    if path:
        with open(path) as f:
            print(f.read())
    else:
        print("File {} not found in any of the directories".format(filename))
if __name__ == '__main__':
    main()
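On Python 3.4+, a pathlib version of the same split between searching and opening could look like this. It is a sketch: find_all is a hypothetical helper that yields every match in precedence order, which may be handy if you later want to merge layered configs, and find_first simply takes the first one.

```python
from pathlib import Path

def find_all(filename, paths):
    """Yield every existing match for filename, in precedence order."""
    for directory in paths:
        # expanduser() handles entries such as '~/.foo-config/'
        candidate = Path(directory).expanduser() / filename
        if candidate.is_file():
            yield candidate

def find_first(filename, paths):
    """Return the first match, or None if the file is in none of the paths."""
    return next(find_all(filename, paths), None)
```

Keeping the search as a generator means the caller decides whether it needs one hit or all of them, and open() stays a plain call on the returned path.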

open doesn't get into that kind of logic. If you want, write a wrapper function that uses os.path.join to join each member of sys.path to the parameter filename, and tries to open them in order, handling the error that occurs when no such file is found.
I'll add that, as another user stated, this is kind of a misuse of sys.path, but this function would work for any list of paths. Indeed, maybe the nicest option is to use the environment variables suggested by another user to specify a colon-delimited list of config directories, which you then parse and use within your search function.
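A sketch of such a wrapper, trying open() directly in each directory (EAFP style) and splitting a PATH-style environment variable with os.pathsep instead of a hard-coded ':' or ';'. The names open_on_path, paths_from_env and FOO_PATH are hypothetical, and Python 3 is assumed for FileNotFoundError.

```python
import os

def open_on_path(filename, paths, *args, **kwargs):
    """Try open() in each directory in order; return the first file that opens."""
    for directory in paths:
        full_path = os.path.join(os.path.expanduser(directory), filename)
        try:
            return open(full_path, *args, **kwargs)
        except FileNotFoundError:
            continue  # not in this directory; try the next one
    raise FileNotFoundError('%r not found in any search directory' % filename)

def paths_from_env(var='FOO_PATH', default='.'):
    """Split a PATH-style variable on os.pathsep (':' on POSIX, ';' on Windows)."""
    value = os.environ.get(var, default)
    return [p for p in value.split(os.pathsep) if p]  # drop empty entries
```

Usage would be something like `with open_on_path('bar.baz', paths_from_env()) as f: ...`. Catching only FileNotFoundError means a permission problem in a higher-precedence directory is reported instead of silently skipped.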

Environment variables
Say your app is named foo. In the README, tell the user to set the FOO_PATH environment variable to specify the extra paths.
Then, inside your app, do something like:
for path in os.environ.get("FOO_PATH", ".").split(os.pathsep):
    lookfor(os.path.join(path, "somefile.txt"))
You could wrap it into a generic function:
import os
def open_foo(fname):
    # os.pathsep is ':' on POSIX and ';' on Windows
    for path in os.environ.get("FOO_PATH", ".").split(os.pathsep):
        path_to_test = os.path.join(path, fname)
        if os.path.exists(path_to_test):
            return open(path_to_test)
    raise IOError("No file named %r found on FOO_PATH" % fname)
Then you could use it just like the normal open:
with open_foo("my_config.txt") as f:
    print(f.read())

Extract from the Python Standard Library documentation for the built-in open function:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
...file is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened ...
Explicitly, open does nothing to find a file automatically: if the path is not absolute, it is resolved relative to the current working directory only.
So you will have to use a custom function or a custom class for that. For example:
import os
class path_opener(object):
    def __init__(self, path=None):
        # default to the current directory; avoid a shared mutable default list
        self.path = list(path) if path is not None else ['.']
    def set(self, path):
        self.path = list(path)
    def append(self, path):
        self.path.append(path)
    def extend(self, path):
        self.path.extend(path)
    def find(self, file):
        for folder in self.path:
            path = os.path.join(os.path.expanduser(folder), file)
            if os.path.isfile(path):
                return path
        raise FileNotFoundError(file)
    def open(self, file, *args, **kwargs):
        # inside the method, the bare name open still refers to the built-in
        return open(self.find(file), *args, **kwargs)
That means that a file opener keeps its own path, is initialized by default with the current directory, has methods to set, append to, or extend its path, and raises a FileNotFoundError if a file is not found in any of the directories listed in its path.
Usage:
o = path_opener(['~/.foo-config/', '/usr/local/myapp/foo-config/'])
with o.open('foo') as fd:
    ...

Related

Python equivalent of Matlab addpath

Does Python have an equivalent to MATLAB's addpath? I know about sys.path.append, but that only seems to work for Python files/modules, not for general files.
Suppose I have a file config.txt in C:\Data and the current working directory is something else, say D:\my_project.
I would like to have code similar to:
def foo():
    with open('config.txt') as f:
        print(f.read())
def main():
    addpath(r'C:\Data')
    foo()
Obviously I could pass the path to foo here, but that is very difficult in the actual use case.
You can't add multiple paths like you would in MATLAB.
You can use os.chdir to change directories, and access files and sub-directories from that directory:
import os
def foo():
    with open('config.txt') as f:
        print(f.read())
def main():
    os.chdir(r'C:\Data')
    foo()
To manage multiple directories, using a context manager that returns to the previous directory after the expiration of the context works:
import contextlib
import os
@contextlib.contextmanager
def working_directory(path):
    prev_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # restore the previous directory even if the body raises
        os.chdir(prev_cwd)
def foo():
    with open('config.txt') as f:
        print(f.read())
def main():
    with working_directory(r'C:\Data'):
        foo()
    # previous working directory
No, it doesn't. Python doesn't work this way. Files are loaded from the current working directory or a specific, resolved path. There is no such thing as a set of pre-defined paths for loading arbitrary files.
Keeping data and program logic separate is an important concept in Python (and most other programming languages besides MATLAB). A key principle of Python is "Explicit is better than implicit." Making sure the data file you want to load is defined explicitly is much safer, more reliable, and less error-prone.
So although others have shown how you can hack some workarounds, I would very, very strongly advise you to not use this approach. It is going to make maintaining your code much harder.
You can use the os.chdir functionality along with your own open function to check all the paths you want to.
import os
class FileOpener(object):
    def __init__(self):
        self.paths = []
    def add_path(self, path):
        self.paths.append(path)
    def __open_path(self, path, *args, **kwargs):
        old_path = os.getcwd()
        try:
            os.chdir(path)
            return open(*args, **kwargs)
        except (IOError, OSError):
            # missing directory or missing file: signal "not found here"
            return None
        finally:
            os.chdir(old_path)
    def open(self, *args, **kwargs):
        for path in self.paths + [os.getcwd()]:
            f = self.__open_path(path, *args, **kwargs)
            if f is not None:
                return f
        raise IOError("no such file")
my_file_opener = FileOpener()
my_file_opener.add_path("C:/Data")
my_file_opener.add_path("C:/Blah")
# checks C:/Data, C:/Blah and then the current working directory; returns the first
# file named "some_file" that it finds, and raises an IOError otherwise
my_file_opener.open("some_file")
Sorry, I made a mistake.
I guess you can use this to solve it (as a Stack Overflow newbie, I hope it helps):
import os
import sys
sys.path.append(r'path')  # note: this only affects imports, not open()
def foo(prefix):
    path = os.path.join(prefix, 'config.txt')
    with open(path) as f:
        print(f.read())
def main():
    foo(r'C:\Data')
Update:
import os
class changeFileDir:
    def __init__(self, path):
        self.path = os.path.expanduser(path)
    def __enter__(self):
        self.savedPath = os.getcwd()
        os.chdir(self.path)
    def __exit__(self, etype, value, traceback):
        os.chdir(self.savedPath)
with changeFileDir(r'C:\Data'):
    foo()

Python zip a sub folder and not the entire folder path

I have a program to zip all the contents in a folder. I did not write this code; I found it somewhere online and I am using it. I intend to zip a folder, for example C:/folder1/folder2/folder3/. I want to zip folder3 and all its contents into a file, say folder3.zip. With the code below, once I zip it, the contents of folder3.zip will be folder1/folder2/folder3/ and the files. I do not want the entire path to be zipped; I only want the subfolder I'm interested in (folder3 in this case). I tried os.chdir etc., but no luck.
import os
import zipfile
def makeArchive(fileList, archive):
    """
    'fileList' is a list of file names - full path each name
    'archive' is the file name for the archive with a full path
    """
    try:
        a = zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED)
        for f in fileList:
            print "archiving file %s" % (f)
            a.write(f)
        a.close()
        return True
    except: return False
def dirEntries(dir_name, subdir, *args):
    # Creates a list of all files in the folder
    '''Return a list of file names found in directory 'dir_name'
    If 'subdir' is True, recursively access subdirectories under 'dir_name'.
    Additional arguments, if any, are file extensions to match filenames. Matched
    file names are added to the list.
    If there are no additional arguments, all files found in the directory are
    added to the list.
    Example usage: fileList = dirEntries(r'H:\TEMP', False, 'txt', 'py')
    Only files with 'txt' and 'py' extensions will be added to the list.
    Example usage: fileList = dirEntries(r'H:\TEMP', True)
    All files and all the files in subdirectories under H:\TEMP will be added
    to the list. '''
    fileList = []
    for file in os.listdir(dir_name):
        dirfile = os.path.join(dir_name, file)
        if os.path.isfile(dirfile):
            if not args:
                fileList.append(dirfile)
            else:
                if os.path.splitext(dirfile)[1][1:] in args:
                    fileList.append(dirfile)
        # recursively access file names in subdirectories
        elif os.path.isdir(dirfile) and subdir:
            print "Accessing directory:", dirfile
            fileList.extend(dirEntries(dirfile, subdir, *args))
    return fileList
You can call this by makeArchive(dirEntries(folder, True), zipname).
Any ideas as to how to solve this problem? I am using Windows and Python 2.5. I know that Python 2.7 has shutil.make_archive, which would help, but since I am working on 2.5 I need another solution :-/
You'll have to give an arcname argument to ZipFile.write() that uses a relative path. Do this by giving the root path to remove to makeArchive():
def makeArchive(fileList, archive, root):
    """
    'fileList' is a list of file names - full path each name
    'archive' is the file name for the archive with a full path
    """
    a = zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED)
    for f in fileList:
        print "archiving file %s" % (f)
        a.write(f, os.path.relpath(f, root))
    a.close()
and call this with:
makeArchive(dirEntries(folder, True), zipname, folder)
I've removed the blanket try: / except:; there is no use for it here, and it only serves to hide problems you want to know about.
The os.path.relpath() function returns a path relative to root, effectively removing that root path from the archive entry.
On Python 2.5, the relpath function is not available; for this specific use case the following replacement would work:
def relpath(filename, root):
    # os.path.altsep is None on POSIX, and lstrip(None) would strip whitespace instead
    return filename[len(root):].lstrip(os.path.sep + (os.path.altsep or ''))
and use:
a.write(f, relpath(f, root))
Note that the above relpath() function only works for your specific case, where filename is guaranteed to start with root; on Windows the general case for relpath() is a lot more complex. You really want to upgrade to Python 2.6 or newer if at all possible.
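The answers above strip root=folder, so the archive entries appear without a folder3/ prefix. If you want folder3/ itself as the top-level entry, as the question describes, a sketch is to make each arcname relative to the folder's parent instead. This assumes Python 3 (or 2.7+) and os.walk instead of dirEntries; zip_subfolder is a hypothetical name. On 2.7+, shutil.make_archive(base_name, 'zip', root_dir=parent, base_dir='folder3') should give a similar layout without custom code.

```python
import os
import zipfile

def zip_subfolder(folder, archive):
    """Zip folder so that entries start at its last component, e.g. 'folder3/...'."""
    folder = os.path.abspath(folder)
    parent = os.path.dirname(folder)  # everything above folder3 is stripped
    with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # arcname is the name stored inside the archive
                zf.write(full_path, os.path.relpath(full_path, parent))
```

For example, zip_subfolder('C:/folder1/folder2/folder3', 'folder3.zip') would store entries such as folder3/a.txt rather than folder1/folder2/folder3/a.txt.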
ZipFile.write has an optional argument arcname. Use this to remove parts of the path.
You could change your method to be:
def makeArchive(fileList, archive, path_prefix=None):
    """
    'fileList' is a list of file names - full path each name
    'archive' is the file name for the archive with a full path
    """
    try:
        a = zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED)
        for f in fileList:
            print "archiving file %s" % (f)
            if path_prefix is None:
                a.write(f)
            else:
                a.write(f, f[len(path_prefix):] if f.startswith(path_prefix) else f)
        a.close()
        return True
    except: return False
Martijn's approach using os.path is much more elegant, though.

Is it enough to test for '/../' in a file path to prevent escaping a subdirectory?

I have some code in Python that checks that a file path accesses a file in a subdirectory.
It is for a web server to access files in the static folder.
I use the following code snippet:
path = 'static/' + path
try:
    if '/../' in path:
        raise RuntimeError('/../ in static file path')
    f = open(path)
except (RuntimeError, IOError):
    app.abort(404)
    return
If the path is clean it would be enough.
Could there be ways to write a path accessing parent directories that would not be detected by this simple test?
I would suggest using os.path.relpath: it takes a path and works out the most concise relative path from a given start directory (the current directory by default). That way you only need to test whether the result starts with "..".
E.g.:
path = ...
relativePath = os.path.relpath(path)
if relativePath.startswith(".."):
    raise RuntimeError("path escapes static directory")
completePath = os.path.join("static", relativePath)
You could also use os.path.realpath to resolve symbolic links into a real path if symlinks are something you have to worry about (os.readlink only reads a single link).
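A sketch that combines both suggestions: compute the path relative to the static directory itself (not the current directory), after realpath has collapsed ".." and followed symlinks, and reject only a result that is exactly ".." or starts with "../", so names like "..foo" are still allowed. resolve_static and STATIC_DIR are hypothetical names.

```python
import os

STATIC_DIR = 'static'  # hypothetical static root, as in the question

def resolve_static(user_path, static_dir=STATIC_DIR):
    """Return an absolute path under static_dir, or raise ValueError if it escapes."""
    base = os.path.realpath(static_dir)
    # realpath collapses '..' components and follows symlinks before comparing
    target = os.path.realpath(os.path.join(base, user_path))
    rel = os.path.relpath(target, base)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError('path escapes static directory: %r' % user_path)
    return target
```

An absolute user_path such as /etc/passwd is rejected too, because os.path.join discards base when the second argument is absolute, and the resulting relative path then starts with "..".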
Flask has a few helper functions that I think you can copy over to your code without problems. The recommended syntax is:
filename = secure_filename(dirty_filename)
path = os.path.join(upload_folder, filename)
Werkzeug implements secure_filename and uses this code to clean filenames:
import os
import re
_filename_ascii_strip_re = re.compile(r'[^A-Za-z0-9_.-]')
_windows_device_files = ('CON', 'AUX', 'COM1', 'COM2', 'COM3', 'COM4', 'LPT1',
                         'LPT2', 'LPT3', 'PRN', 'NUL')
def secure_filename(filename):
    r"""Pass it a filename and it will return a secure version of it. This
    filename can then safely be stored on a regular file system and passed
    to :func:`os.path.join`. The filename returned is an ASCII only string
    for maximum portability.
    On windows system the function also makes sure that the file is not
    named after one of the special device files.
    >>> secure_filename("My cool movie.mov")
    'My_cool_movie.mov'
    >>> secure_filename("../../../etc/passwd")
    'etc_passwd'
    >>> secure_filename(u'i contain cool \xfcml\xe4uts.txt')
    'i_contain_cool_umlauts.txt'
    The function might return an empty filename. It's your responsibility
    to ensure that the filename is unique and that you generate random
    filename if the function returned an empty one.
    .. versionadded:: 0.5
    :param filename: the filename to secure
    """
    if isinstance(filename, unicode):
        from unicodedata import normalize
        filename = normalize('NFKD', filename).encode('ascii', 'ignore')
    for sep in os.path.sep, os.path.altsep:
        if sep:
            filename = filename.replace(sep, ' ')
    filename = str(_filename_ascii_strip_re.sub('', '_'.join(
                   filename.split()))).strip('._')
    # on nt a couple of special files are present in each folder. We
    # have to ensure that the target file is not such a filename. In
    # this case we prepend an underline
    if os.name == 'nt' and filename and \
       filename.split('.')[0].upper() in _windows_device_files:
        filename = '_' + filename
    return filename
..//
This is essentially the same thing; however, since you are matching the literal string /../, the one I added would go undetected and would reach the parent directory.
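Running a few candidate inputs through the question's check is a cheap way to see what it actually catches. On a literal reading, ..//etc/passwd still contains /../ once static/ is prepended, so it is rejected; a bare .. and Windows-style ..\ separators are what slip through here. The input list is illustrative, not exhaustive, and naive_check is a hypothetical wrapper around the question's test.

```python
def naive_check(path):
    """The question's test: reject only if the literal string '/../' appears."""
    return '/../' not in 'static/' + path

# Inputs a client could plausibly send
candidates = ['../etc/passwd', '..//etc/passwd', '..', '..\\secret.txt', 'css/../../app.py']
passed = [p for p in candidates if naive_check(p)]
print(passed)  # ['..', '..\\secret.txt']
```

Backslashes matter because Windows treats them as directory separators, so static/..\secret.txt would resolve outside the static folder there.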

How to determine whether specified file is placed inside of the specified folder?

Let's say I have two paths: the first one, file_path, may be a file or folder path, and the second one, folder_path, may only be a folder path. I want to determine whether the object located at file_path is inside the object located at folder_path.
I have an idea of doing this:
import os
...
def is_inside(file_path, folder_path):
    full_file_path = os.path.realpath(file_path)
    full_folder_path = os.path.realpath(folder_path)
    return full_file_path.startswith(full_folder_path)
but I'm afraid there are some pitfalls in this approach. Also I think there must be a prettier way to do this.
The solution must work on Linux, but it would be great if you could propose a cross-platform trick.
Use os.path.commonprefix. Here's an example based on your idea.
import os.path as _osp
def is_inside(file_path, folder_path):
    full_file_path = _osp.realpath(file_path)
    full_folder_path = _osp.realpath(folder_path)
    return _osp.commonprefix([full_file_path, full_folder_path]) == \
        full_folder_path
Parse the file name from the file path and do
os.path.exists(full_folder_path + '/' + file_name)

Directory Walker for Python

I am currently using the directory walker from here:
import os
class DirectoryWalker:
    # a forward iterator that traverses a directory tree
    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0
    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname
for file in DirectoryWalker(os.path.abspath('.')):
    print file
This minor change (passing os.path.abspath('.')) gives you the full path in file.
Can anyone help me how to find just the filename as well using this? I need both the full path, and just the filename.
Why do you want to do such boring thing yourself?
for path, directories, files in os.walk('.'):
    print 'ls %r' % path
    for directory in directories:
        print ' d%r' % directory
    for filename in files:
        print ' -%r' % filename
Output:
'.'
d'finction'
d'.hg'
-'setup.py'
-'.hgignore'
'./finction'
-'finction'
-'cdg.pyc'
-'util.pyc'
-'cdg.py'
-'util.py'
-'__init__.pyc'
-'__init__.py'
'./.hg'
d'store'
-'hgrc'
-'requires'
-'00changelog.i'
-'undo.branch'
-'dirstate'
-'undo.dirstate'
-'branch'
'./.hg/store'
d'data'
-'undo'
-'00changelog.i'
-'00manifest.i'
'./.hg/store/data'
d'finction'
-'.hgignore.i'
-'setup.py.i'
'./.hg/store/data/finction'
-'util.py.i'
-'cdg.py.i'
-'finction.i'
-'____init____.py.i'
But if you insist, there are path-related tools in os.path; os.path.basename is what you are looking for.
>>> import os.path
>>> os.path.basename('/hello/world.h')
'world.h'
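Putting the two together answers the original question directly: os.walk already hands you the directory and the bare file names, so joining them gives the full path alongside the filename without calling basename at all. walk_files is a hypothetical helper name; Python 3 syntax is assumed.

```python
import os

def walk_files(top):
    """Yield (full_path, filename) for every file under top."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # dirpath already includes top, so this is the full path
            yield os.path.join(dirpath, name), name

for full_path, name in walk_files(os.path.abspath('.')):
    print(full_path, name)
```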
Rather than using '.' as your directory, refer to its absolute path:
for file in DirectoryWalker(os.path.abspath('.')):
    print file
Also, I'd recommend using a name other than 'file', because file is a built-in name in Python 2. It's not a keyword, though, so the code still runs.
As an aside, when dealing with filenames, I find the os.path module to be incredibly useful - I'd recommend having a look through that, especially
os.path.normpath
Normalises paths (gets rid of redundant '.'s and 'theFolderYouWereJustIn/../'s)
os.path.join
Joins two paths
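The two functions above can be tried out with posixpath, the POSIX implementation behind os.path on Linux and macOS, so the results below do not depend on the OS running the example. Note the last case: join discards everything before a component that is itself absolute.

```python
import posixpath

normalized = posixpath.normpath('theFolderYouWereJustIn/../config/./app.ini')
joined = posixpath.join('/usr/local', 'myapp', 'foo-config')
reset = posixpath.join('/usr/local', '/etc', 'hosts')  # '/etc' restarts the path

print(normalized)  # config/app.ini
print(joined)      # /usr/local/myapp/foo-config
print(reset)       # /etc/hosts
```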
os.path.dirname()? os.path.normpath()? os.path.abspath()?
This would also be a lovely place to think about recursion.
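A recursive version might look like the sketch below. It assumes Python 3.6+ (os.scandir as a context manager, yield from), and walk_recursive is a hypothetical name; like the islink check in DirectoryWalker, it does not descend into symlinked directories.

```python
import os

def walk_recursive(directory):
    """Recursively yield (full_path, filename) for each file under directory."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # descend into real subdirectories only, skipping directory symlinks
                yield from walk_recursive(entry.path)
            elif entry.is_file():
                yield entry.path, entry.name

for full_path, name in walk_recursive(os.path.abspath('.')):
    print(full_path, name)
```

Each DirEntry already carries both path and name, so no extra join or basename call is needed.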
Just prepend the current directory path to the "./foo" path returned:
print os.path.join(os.getcwd(), file)
