I need to build a Windows (only) file path from a combination of a UNC path passed to my script as a parameter, an API call, and a calculated date.
I'm having a horrible time of it, mostly caused by Windows' use of a backslash character to delimit file paths. I've read that the "pathlib" module should be able to sort this mess out, BUT it apparently doesn't support concatenation when building out file paths.
The UNC path is being passed to the script as a dictionary from another application (PRTG Network Monitor):
{"fileshare": "//server02/logs/"}
I read that in and then need to append a hostname derived from an API call:
logPath = Path(params["fileshare"] + "/" + apiHostname + "/")
I then calculate a date which needs to be appended to the logpath, along with a delimiter "-" and a filename suffix:
filePath = Path(logPath, + apiHostname + "-", + past_day + ".log" )
The problem arises during the concatenation:
{"text": "Python Script execution error: unsupported operand type(s) for +: 'WindowsPath' and 'str'", "error": 1}}
Can someone please explain how I can build a path so that the calculated filename, which should look like this:
\\server02\logs\log01.rhmgmt.lan\log01.rhmgmt.lan-2021-07-28.log
Can be opened for processing?
Yes, the "pathlib" module should be able to sort this mess out.
Input:
from datetime import date, timedelta
from pathlib import Path
params = {"fileshare": "//server02/logs/"}
apiHostname = 'log01.rhmgmt.lan'
past_day = str((date.today() - timedelta(days=1)))
Create the initial path and append all parts:
fileshare = Path(params['fileshare'])
filepath = fileshare / apiHostname / f"{apiHostname}-{past_day}.log"
Output:
>>> filepath
PosixPath('//server02/logs/log01.rhmgmt.lan/log01.rhmgmt.lan-2021-07-28.log')
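The output above was evidently produced on a non-Windows machine, hence PosixPath. As a rough sketch of what the same join should give under Windows rules, PureWindowsPath can be used on any platform, since pure paths never touch the file system. The date is hard-coded here (an assumption, purely for reproducibility) rather than calculated:

```python
from pathlib import PureWindowsPath

# Values taken from the question; the date is fixed for illustration only.
params = {"fileshare": "//server02/logs/"}
apiHostname = "log01.rhmgmt.lan"
past_day = "2021-07-28"

# PureWindowsPath applies Windows path rules regardless of the host OS.
fileshare = PureWindowsPath(params["fileshare"])
filepath = fileshare / apiHostname / f"{apiHostname}-{past_day}.log"

print(filepath)  # \\server02\logs\log01.rhmgmt.lan\log01.rhmgmt.lan-2021-07-28.log
```

On an actual Windows machine, plain Path behaves the same way and can additionally open the file.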
Yes, pathlib can easily sort things out.
You can use the joinpath() method, as I suggested in a comment, to concatenate the components as you're building the Path. It does the equivalent of what os.path.join() does.
The only slightly tricky part is that you'll have to initially create an empty Path in order to use the method it inherits from PurePath.
from datetime import date, timedelta
from pathlib import Path
params = {"fileshare": "//server02/logs/"}
apiHostname = 'log01.rhmgmt.lan'
past_day = str((date.today() - timedelta(days=1)))
filePath = Path().joinpath(params["fileshare"], apiHostname,
                           apiHostname + '-' + past_day + '.log')
print(filePath) # -> \\server02\logs\log01.rhmgmt.lan\log01.rhmgmt.lan-2021-07-29.log
I'm connected to Azure Storage and need to get the name of a file. I have written a function which gets the list of contents in a particular directory.
datepath = np.datetime64('today').astype(str).replace('-', '/')
def list_directory_contents():
    try:
        file_system_client = service_client.get_file_system_client(file_system="container_name")
        paths = file_system_client.get_paths(path = "folder_name/" + str(datepath))
        for path in paths:
            print(path.name + '\n')
    except Exception as e:
        print(e)
And then I'm calling the function
list_directory_contents()
This gives me something like
folder_name/2020/10/28/file_2020_10_28.csv
Now I want to extract just the file name from the above i.e. "file_2020_10_28.csv"
You're looking for os.path.basename. It's a cross-platform and robust way to do what you want:
>>> os.path.basename("folder_name/2020/10/28/file_2020_10_28.csv")
'file_2020_10_28.csv'
In case it wasn't obvious, the part of list_directory_contents to change so that it prints only the basename is this:
for path in paths:
    print(os.path.basename(path.name) + '\n')
This assumes path.name is a string akin to folder_name/2020/10/28/file_2020_10_28.csv
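If the names always use forward slashes regardless of the local operating system (an assumption based on the sample output in the question), posixpath.basename or PurePosixPath(...).name should give the same result while making that assumption explicit. A minimal sketch:

```python
import posixpath
from pathlib import PurePosixPath

# Sample name copied from the question's output.
name = "folder_name/2020/10/28/file_2020_10_28.csv"

# Both treat "/" as the only separator, whatever OS the script runs on.
via_posixpath = posixpath.basename(name)
via_pathlib = PurePosixPath(name).name

print(via_posixpath)  # file_2020_10_28.csv
print(via_pathlib)    # file_2020_10_28.csv
```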
Given a file path
/path/to/some/file.jpg
How would I get
/path/to/some
I'm currently doing
fullpath = '/path/to/some/file.jpg'
filepath = '/'.join(fullpath.split('/')[:-1])
But I think it is open to errors
With os.path.split:
dirname, fname = os.path.split(fullpath)
Per the docs:
Split the pathname path into a pair, (head, tail) where tail is the
last pathname component and head is everything leading up to that. The
tail part will never contain a slash; if path ends in a slash, tail
will be empty. If there is no slash in path, head will be empty.
os.path is always the module suitable for the platform that the code is running on.
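To illustrate the three cases the documentation describes, here is a small sketch; the variable names other than fullpath, dirname and fname are made up for the example:

```python
import os

fullpath = '/path/to/some/file.jpg'

# Ordinary case: head is the directory, tail is the file name.
dirname, fname = os.path.split(fullpath)

# Path ending in a slash: tail is empty.
head_trailing, tail_trailing = os.path.split('/path/to/some/')

# No slash at all: head is empty.
head_bare, tail_bare = os.path.split('file.jpg')

print(dirname, fname)
```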
Please try this:
import os
fullpath = '/path/to/some/file.jpg'
os.path.dirname(fullpath)
Using pathlib you can get the path without the file name using the .parent attribute:
from pathlib import Path
fullpath = Path("/path/to/some/file.jpg")
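filepath = str(fullpath.parent) # /path/to/some (no trailing slash)
Path follows the conventions of the platform the code is running on, so this handles UNIX paths on UNIX and Windows paths on Windows.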
With the string rfind method:
fullpath = '/path/to/some/file.jpg'
index = fullpath.rfind('/')
fullpath[0:index]
I have a file path like this:
file_name = full_path + env + '/filename.txt'
in which:
full_path is '/home/louis/key-files/'
env is 'prod'
=> file name is '/home/louis/key-files/prod/filename.txt'
I want to use os.path.join
file_name = os.path.abspath(os.path.join(full_path, env, '/filename.txt'))
But the returned result is only: file_name = '/filename.txt'
How can I get the expected result like above?
Thanks
Since your last component begins with a slash, it is taken as starting from the root, so os.path.join just removes everything else. Try without the leading slash instead:
os.path.join(full_path, env, 'filename.txt')
Note you probably don't need abspath here.
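A quick sketch of the difference, using the values from the question. posixpath is used instead of os.path only so that the result is the same on every operating system; the two result names are invented for the example:

```python
import posixpath

full_path = '/home/louis/key-files/'
env = 'prod'

# A component starting with "/" is absolute, so everything before it is dropped.
with_leading_slash = posixpath.join(full_path, env, '/filename.txt')

# Without the leading slash the components are joined as expected.
without_leading_slash = posixpath.join(full_path, env, 'filename.txt')

print(with_leading_slash)     # /filename.txt
print(without_leading_slash)  # /home/louis/key-files/prod/filename.txt
```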
I have a file browser application that exposes a directory and its contents to the users.
I want to sanitize the user input, which is a file path, so that it does not allow absolute paths such as '/tmp/' and relative paths such as '../../etc'
Is there a python function that does this across platforms?
Also for people searching for a way to get rid of A/./B -> A/B and A/B/../C -> A/C in paths.
You can use os.path.normpath for that.
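For instance, the following sketch shows both collapses, and also that normpath on its own does not stop a path from climbing out of a directory. posixpath is used so the results do not depend on the OS; the variable names are made up:

```python
import posixpath

# Redundant "." and "name/.." pairs are collapsed.
collapsed_dot = posixpath.normpath('A/./B')
collapsed_dotdot = posixpath.normpath('A/B/../C')

# Leading ".." components survive normalisation, so this is not a sandbox.
still_escapes = posixpath.normpath('../../etc')

print(collapsed_dot)     # A/B
print(collapsed_dotdot)  # A/C
print(still_escapes)     # ../../etc
```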
A comprehensive filepath sanitiser for python
I wasn't really satisfied with any of the available methods for sanitising a path, so I wrote my own, relatively comprehensive path sanitiser. This is suitable* for taking input from a public endpoint (http upload, REST endpoint, etc) and ensuring that if you save data at the resulting file path, it will not damage your system**. (Note: this code targets Python 3+, you'll probably need to make some changes to make it work on 2.x)
* No guarantees! Please don't rely on this code without checking it thoroughly yourself.
** Again, no guarantees! You could still do something crazy and set your root path on a *nix system to /dev/ or /bin/ or something like that. Don't do that. There are also some edge cases on Windows that could cause damage (device file names, for example), you could check the secure_filename method from werkzeug's utils for a good start on dealing with these if you're targeting Windows.
How it works
You need to specify a root path, the sanitiser will ensure that all paths returned are under this root. Check the get_root_path function for where to do this. Make sure the value for the root path is from your own configuration, not input from the user!
There is a file name sanitiser which:
Converts unicode to ASCII
Converts path separators to underscores
Only allows certain characters from a whitelist in the file name. The whitelist includes all lower and uppercase letters, all digits, the hyphen, the underscore, the space, opening and closing round brackets and the full stop character (period). You can customise this whitelist if you want to.
Ensures all names have at least one letter or number (to avoid names like '..')
To get a valid file path, you should call make_valid_file_path. You can optionally pass it a subdirectory path in the path parameter. This is the path underneath the root path, and can come from user input. You can optionally pass it a file name in the filename parameter, this can also come from user input. Any path information in the file name you pass will not be used to determine the path of the file, instead it will be flattened into valid, safe components of the file's name.
If there is no path or filename, it will return the root path, correctly formatted for the host file system, with a trailing path separator (/).
If there is a subdirectory path, it will split it into its component parts, sanitising each with the file name sanitiser and rebuilding the path without a leading path separator.
If there is a file name, it will sanitise the name with the sanitiser.
It will os.path.join the path components to get a final path to your file.
As a final double-check that the resulting path is valid and safe, it checks that the resulting path is somewhere under the root path. This check is done properly by splitting up and comparing the component parts of the path, rather than just ensuring one string starts with another.
OK, enough warnings and description, here's the code:
import os
import string
from unicodedata import normalize
def ensure_directory_exists(path_directory):
    if not os.path.exists(path_directory):
        os.makedirs(path_directory)
def os_path_separators():
    seps = []
    for sep in os.path.sep, os.path.altsep:
        if sep:
            seps.append(sep)
    return seps
def sanitise_filesystem_name(potential_file_path_name):
    # Sort out unicode characters
    valid_filename = normalize('NFKD', potential_file_path_name).encode('ascii', 'ignore').decode('ascii')
    # Replace path separators with underscores
    for sep in os_path_separators():
        valid_filename = valid_filename.replace(sep, '_')
    # Ensure only valid characters
    valid_chars = "-_.() {0}{1}".format(string.ascii_letters, string.digits)
    valid_filename = "".join(ch for ch in valid_filename if ch in valid_chars)
    # Ensure at least one letter or number to ignore names such as '..'
    valid_chars = "{0}{1}".format(string.ascii_letters, string.digits)
    test_filename = "".join(ch for ch in potential_file_path_name if ch in valid_chars)
    if len(test_filename) == 0:
        # Replace empty file name or file path part with the following
        valid_filename = "(Empty Name)"
    return valid_filename
def get_root_path():
    # Replace with your own root file path, e.g. '/place/to/save/files/'
    filepath = get_file_root_from_config()
    filepath = os.path.abspath(filepath)
    # ensure trailing path separator (/)
    if not any(filepath[-1] == sep for sep in os_path_separators()):
        filepath = '{0}{1}'.format(filepath, os.path.sep)
    ensure_directory_exists(filepath)
    return filepath
def path_split_into_list(path):
    # Gets all parts of the path as a list, excluding path separators
    parts = []
    while True:
        newpath, tail = os.path.split(path)
        if newpath == path:
            assert not tail
            if path and path not in os_path_separators():
                parts.append(path)
            break
        if tail and tail not in os_path_separators():
            parts.append(tail)
        path = newpath
    parts.reverse()
    return parts
def sanitise_filesystem_path(potential_file_path):
    # Splits up a path and sanitises the name of each part separately
    path_parts_list = path_split_into_list(potential_file_path)
    sanitised_path = ''
    for path_component in path_parts_list:
        sanitised_path = '{0}{1}{2}'.format(sanitised_path, sanitise_filesystem_name(path_component), os.path.sep)
    return sanitised_path
def check_if_path_is_under(parent_path, child_path):
    # Using the function to split paths into lists of component parts, check that one path is underneath another
    child_parts = path_split_into_list(child_path)
    parent_parts = path_split_into_list(parent_path)
    if len(parent_parts) > len(child_parts):
        return False
    return all(part1 == part2 for part1, part2 in zip(child_parts, parent_parts))
def make_valid_file_path(path=None, filename=None):
    root_path = get_root_path()
    if path:
        sanitised_path = sanitise_filesystem_path(path)
        if filename:
            sanitised_filename = sanitise_filesystem_name(filename)
            complete_path = os.path.join(root_path, sanitised_path, sanitised_filename)
        else:
            complete_path = os.path.join(root_path, sanitised_path)
    else:
        if filename:
            sanitised_filename = sanitise_filesystem_name(filename)
            complete_path = os.path.join(root_path, sanitised_filename)
        else:
            # No path and no file name: fall back to the root path itself
            complete_path = root_path
    complete_path = os.path.abspath(complete_path)
    if check_if_path_is_under(root_path, complete_path):
        return complete_path
    else:
        return None
This will prevent the user inputting filenames like ../../../../etc/shadow but will also not allow files in subdirs below basedir (i.e. basedir/subdir/moredir is blocked):
from pathlib import Path
test_path = (Path(basedir) / user_input).resolve()
if test_path.parent != Path(basedir).resolve():
    raise Exception(f"Filename {test_path} is not in {Path(basedir)} directory")
If you want to allow subdirs below basedir:
if Path(basedir).resolve() not in test_path.resolve().parents:
    raise Exception(f"Filename {test_path} is not in {Path(basedir)} directory")
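The subdirectory-friendly variant could be wrapped up roughly like this. The function name resolve_under and the use of ValueError are my own choices for the sketch, and a temporary directory stands in for basedir:

```python
import tempfile
from pathlib import Path

def resolve_under(basedir, user_input):
    """Return the resolved path if it lies below basedir, else raise ValueError."""
    base = Path(basedir).resolve()
    candidate = (base / user_input).resolve()
    # base must be one of the ancestors of the resolved candidate.
    if base not in candidate.parents:
        raise ValueError(f"{candidate} is not inside {base}")
    return candidate

with tempfile.TemporaryDirectory() as tmp:
    # A nested path below the base directory is accepted.
    inside = resolve_under(tmp, "subdir/moredir/file.txt")
    # A traversal attempt is rejected.
    try:
        resolve_under(tmp, "../../../../etc/shadow")
        escaped = True
    except ValueError:
        escaped = False

print(inside.name, escaped)
```

Note that with this check the base directory itself is also rejected, because a path is never one of its own parents.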
I ended up here looking for a quick way to handle my use case and ultimately wrote my own. What I needed was a way to take in a path and force it to be in the CWD. This is for a CI system working on mounted files.
def relative_path(the_path: str) -> str:
    '''
    Force the spec path to be relative to the CI workspace
    Sandboxes the path so that you can't escape out of CWD
    '''
    # Make the path absolute
    the_path = os.path.abspath(the_path)
    # If it started with a . it'll now be /${PWD}/
    # We'll get the path relative to cwd
    if the_path.startswith(os.getcwd()):
        the_path = '{}{}'.format(os.sep, os.path.relpath(the_path))
    # Prepend the path with . and it'll now be ./the/path
    the_path = '.{}'.format(the_path)
    return the_path
In my case I didn't want to raise an exception. I just want to force that any path given will become an absolute path in the CWD.
Tests:
def test_relative_path():
    assert relative_path('../test') == './test'
    assert relative_path('../../test') == './test'
    assert relative_path('../../abc/../test') == './test'
    assert relative_path('../../abc/../test/fixtures') == './test/fixtures'
    assert relative_path('../../abc/../.test/fixtures') == './.test/fixtures'
    assert relative_path('/test/foo') == './test/foo'
    assert relative_path('./test/bar') == './test/bar'
    assert relative_path('.test/baz') == './.test/baz'
    assert relative_path('qux') == './qux'
This is an improvement on @mneil's solution, using relpath's little-known second argument:
import os.path
def sanitize_path(path):
    """
    Sanitize a path against directory traversals
    >>> sanitize_path('../test')
    'test'
    >>> sanitize_path('../../test')
    'test'
    >>> sanitize_path('../../abc/../test')
    'test'
    >>> sanitize_path('../../abc/../test/fixtures')
    'test/fixtures'
    >>> sanitize_path('../../abc/../.test/fixtures')
    '.test/fixtures'
    >>> sanitize_path('/test/foo')
    'test/foo'
    >>> sanitize_path('./test/bar')
    'test/bar'
    >>> sanitize_path('.test/baz')
    '.test/baz'
    >>> sanitize_path('qux')
    'qux'
    """
    # - pretending to chroot to the current directory
    # - cancelling all redundant paths (/.. = /)
    # - making the path relative
    return os.path.relpath(os.path.normpath(os.path.join("/", path)), "/")
if __name__ == '__main__':
    import doctest
    doctest.testmod()
To be very specific to the question asked but raise an exception rather than converting the path to relative:
path = 'your/path/../../to/reach/root'
if '../' in path or path[:1] == '/':
    raise Exception
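A substring test like this lets through a path that ends in '..' without a trailing slash (for example 'uploads/..'). A component-wise variant might look like the sketch below; reject_traversal and is_rejected are hypothetical names, and POSIX-style input with forward slashes is assumed:

```python
from pathlib import PurePosixPath

def reject_traversal(path):
    """Raise ValueError for absolute paths or any '..' component."""
    pure = PurePosixPath(path)
    # pathlib does not collapse '..', so it shows up as its own component.
    if pure.is_absolute() or '..' in pure.parts:
        raise ValueError(f"unsafe path: {path!r}")
    return path

def is_rejected(path):
    """Small helper for the examples: True if reject_traversal raises."""
    try:
        reject_traversal(path)
    except ValueError:
        return True
    return False

print(is_rejected('your/path/../../to/reach/root'))  # True
print(is_rejected('uploads/..'))                     # True
print(is_rejected('uploads/file.txt'))               # False
```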