Sanitizing a file path in python - python

I have a file browser application that exposes a directory and its contents to the users.
I want to sanitize the user input, which is a file path, so that it does not allow absolute paths such as '/tmp/' and relative paths such as '../../etc'
Is there a python function that does this across platforms?

Also for people searching for a way to get rid of A/./B -> A/B and A/B/../C -> A/C in paths.
You can use os.path.normpath for that.

A comprehensive filepath sanitiser for python
I wasn't really satisfied with any of the available methods for sanitising a path, so I wrote my own, relatively comprehensive path sanitiser. This is suitable* for taking input from a public endpoint (http upload, REST endpoint, etc) and ensuring that if you save data at the resulting file path, it will not damage your system**. (Note: this code targets Python 3+, you'll probably need to make some changes to make it work on 2.x)
* No guarantees! Please don't rely on this code without checking it thoroughly yourself.
** Again, no guarantees! You could still do something crazy and set your root path on a *nix system to /dev/ or /bin/ or something like that. Don't do that. There are also some edge cases on Windows that could cause damage (device file names, for example), you could check the secure_filename method from werkzeug's utils for a good start on dealing with these if you're targeting Windows.
How it works
You need to specify a root path, the sanitiser will ensure that all paths returned are under this root. Check the get_root_path function for where to do this. Make sure the value for the root path is from your own configuration, not input from the user!
There is a file name sanitiser which:
Converts unicode to ASCII
Converts path separators to underscores
Only allows certain characters from a whitelist in the file name. The whitelist includes all lower and uppercase letters, all digits, the hyphen, the underscore, the space, opening and closing round brackets and the full stop character (period). You can customise this whitelist if you want to.
Ensures all names have at least one letter or number (to avoid names like '..')
To get a valid file path, you should call make_valid_file_path. You can optionally pass it a subdirectory path in the path parameter. This is the path underneath the root path, and can come from user input. You can optionally pass it a file name in the filename parameter, this can also come from user input. Any path information in the file name you pass will not be used to determine the path of the file, instead it will be flattened into valid, safe components of the file's name.
If there is no path or filename, it will return the root path, correctly formatted for the host file system, with a trailing path separator (/).
If there is a subdirectory path, it will split it into its component parts, sanitising each with the file name sanitiser and rebuilding the path without a leading path separator.
If there is a file name, it will sanitise the name with the sanitiser.
It will os.path.join the path components to get a final path to your file.
As a final double-check that the resulting path is valid and safe, it checks that the resulting path is somewhere under the root path. This check is done properly by splitting up and comparing the component parts of the path, rather than just ensuring one string starts with another.
OK, enough warnings and description, here's the code:
import os
def ensure_directory_exists(path_directory):
if not os.path.exists(path_directory):
os.makedirs(path_directory)
def os_path_separators():
seps = []
for sep in os.path.sep, os.path.altsep:
if sep:
seps.append(sep)
return seps
def sanitise_filesystem_name(potential_file_path_name):
# Sort out unicode characters
valid_filename = normalize('NFKD', potential_file_path_name).encode('ascii', 'ignore').decode('ascii')
# Replace path separators with underscores
for sep in os_path_separators():
valid_filename = valid_filename.replace(sep, '_')
# Ensure only valid characters
valid_chars = "-_.() {0}{1}".format(string.ascii_letters, string.digits)
valid_filename = "".join(ch for ch in valid_filename if ch in valid_chars)
# Ensure at least one letter or number to ignore names such as '..'
valid_chars = "{0}{1}".format(string.ascii_letters, string.digits)
test_filename = "".join(ch for ch in potential_file_path_name if ch in valid_chars)
if len(test_filename) == 0:
# Replace empty file name or file path part with the following
valid_filename = "(Empty Name)"
return valid_filename
def get_root_path():
# Replace with your own root file path, e.g. '/place/to/save/files/'
filepath = get_file_root_from_config()
filepath = os.path.abspath(filepath)
# ensure trailing path separator (/)
if not any(filepath[-1] == sep for sep in os_path_separators()):
filepath = '{0}{1}'.format(filepath, os.path.sep)
ensure_directory_exists(filepath)
return filepath
def path_split_into_list(path):
# Gets all parts of the path as a list, excluding path separators
parts = []
while True:
newpath, tail = os.path.split(path)
if newpath == path:
assert not tail
if path and path not in os_path_separators():
parts.append(path)
break
if tail and tail not in os_path_separators():
parts.append(tail)
path = newpath
parts.reverse()
return parts
def sanitise_filesystem_path(potential_file_path):
# Splits up a path and sanitises the name of each part separately
path_parts_list = path_split_into_list(potential_file_path)
sanitised_path = ''
for path_component in path_parts_list:
sanitised_path = '{0}{1}{2}'.format(sanitised_path, sanitise_filesystem_name(path_component), os.path.sep)
return sanitised_path
def check_if_path_is_under(parent_path, child_path):
# Using the function to split paths into lists of component parts, check that one path is underneath another
child_parts = path_split_into_list(child_path)
parent_parts = path_split_into_list(parent_path)
if len(parent_parts) > len(child_parts):
return False
return all(part1==part2 for part1, part2 in zip(child_parts, parent_parts))
def make_valid_file_path(path=None, filename=None):
root_path = get_root_path()
if path:
sanitised_path = sanitise_filesystem_path(path)
if filename:
sanitised_filename = sanitise_filesystem_name(filename)
complete_path = os.path.join(root_path, sanitised_path, sanitised_filename)
else:
complete_path = os.path.join(root_path, sanitised_path)
else:
if filename:
sanitised_filename = sanitise_filesystem_name(filename)
complete_path = os.path.join(root_path, sanitised_filename)
else:
complete_path = complete_path
complete_path = os.path.abspath(complete_path)
if check_if_path_is_under(root_path, complete_path):
return complete_path
else:
return None

This will prevent the user inputting filenames like ../../../../etc/shadow but will also not allow files in subdirs below basedir (i.e. basedir/subdir/moredir is blocked):
from pathlib import Path
test_path = (Path(basedir) / user_input).resolve()
if test_path.parent != Path(basedir).resolve():
raise Exception(f"Filename {test_path} is not in {Path(basedir)} directory")
If you want to allow subdirs below basedir:
if not Path(basedir).resolve() in test_path.resolve().parents:
raise Exception(f"Filename {test_path} is not in {Path(basedir)} directory")

I ended up here looking for a quick way to handle my use case and ultimately wrote my own. What I needed was a way to take in a path and force it to be in the CWD. This is for a CI system working on mounted files.
def relative_path(the_path: str) -> str:
'''
Force the spec path to be relative to the CI workspace
Sandboxes the path so that you can't escape out of CWD
'''
# Make the path absolute
the_path = os.path.abspath(the_path)
# If it started with a . it'll now be /${PWD}/
# We'll get the path relative to cwd
if the_path.startswith(os.getcwd()):
the_path = '{}{}'.format(os.sep, os.path.relpath(the_path))
# Prepend the path with . and it'll now be ./the/path
the_path = '.{}'.format(the_path)
return the_path
In my case I didn't want to raise an exception. I just want to force that any path given will become an absolute path in the CWD.
Tests:
def test_relative_path():
assert relative_path('../test') == './test'
assert relative_path('../../test') == './test'
assert relative_path('../../abc/../test') == './test'
assert relative_path('../../abc/../test/fixtures') == './test/fixtures'
assert relative_path('../../abc/../.test/fixtures') == './.test/fixtures'
assert relative_path('/test/foo') == './test/foo'
assert relative_path('./test/bar') == './test/bar'
assert relative_path('.test/baz') == './.test/baz'
assert relative_path('qux') == './qux'

This is an improvement on #mneil's solution, using relpath's secret second argument:
import os.path
def sanitize_path(path):
"""
Sanitize a path against directory traversals
>>> sanitize_path('../test')
'test'
>>> sanitize_path('../../test')
'test'
>>> sanitize_path('../../abc/../test')
'test'
>>> sanitize_path('../../abc/../test/fixtures')
'test/fixtures'
>>> sanitize_path('../../abc/../.test/fixtures')
'.test/fixtures'
>>> sanitize_path('/test/foo')
'test/foo'
>>> sanitize_path('./test/bar')
'test/bar'
>>> sanitize_path('.test/baz')
'.test/baz'
>>> sanitize_path('qux')
'qux'
"""
# - pretending to chroot to the current directory
# - cancelling all redundant paths (/.. = /)
# - making the path relative
return os.path.relpath(os.path.normpath(os.path.join("/", path)), "/")
if __name__ == '__main__':
import doctest
doctest.testmod()

To be very specific to the question asked but raise an exception rather than converting the path to relative:
path = 'your/path/../../to/reach/root'
if '../' in path or path[:1] == '/':
raise Exception

Related

python check folder path with *

checking folder path with "*"
here is what i have tried.
import os
checkdir = "/lib/modules/*/kernel/drivers/char"
path = os.path.exists(checkIPMI_dir)
print path
False
this will always print False, is there other way that i can print it to true?
i know when i put static path, it can
import os
checkdir = "/lib/modules"
path = os.path.exists(checkIPMI_dir)
print path
i cant put static, because i have a few linux OS, that uses different kernel
version, there for the number will not be the same.
This may not solve your problem, but a while ago I needed to be able to recurse an unknown number of times through a directory structure to find a file, so I wrote this function:
import os
import fnmatch
def get_files(root, patterns='*', depth=1, yield_folders=False):
"""
Return all of the files matching patterns, in pattern.
Only search to teh specified depth
Arguments:
root - Top level directory for search
patterns - comma separated list of Unix style
wildcards to match NOTE: This is not
a regular expression.
depth - level of subdirectory search, default
1 level down
yield_folders - True folder names should be returned
default False
Returns:
generator of files meeting search criteria for all
patterns in argument patterns
"""
# Determine the depth of the root directory
root_depth = len(root.split(os.sep))
# Figure out what patterns we are matching
patterns = patterns.split(';')
for path, subdirs, files in os.walk(root):
# Are we including directories in search?
if yield_folders:
files.extend(subdirs)
files.sort()
for name in files:
for pattern in patterns:
# Determine if we've exceeded the depth of the
# search?
cur_depth = len(path.split(os.sep))
if (cur_depth - root_depth) > depth:
break
if fnmatch.fnmatch(name, pattern):
yield os.path.join(path, name)
break
With this function you could check to see if a file exists with the following:
checkdir = "/lib/modules/*/kernel/drivers/char"
matches = get_files(checkdir, depth=100, yield_folders=True)
found = True if matches else False
All this may be overkill, but it should work!

How to check parameters for os.makedirs?

I use this code for create folder for users:
work_path = '/tmp/'
os.umask(0000)
for i in user:
if not os.path.exists(work_path + i):
try:
os.makedirs(work_path + i, 0777)
When I set work_path = '/tmp/' - my code work perfect.
But when I type for mistake work_path = '/tmp' I got not what expected )))
Question: how can I check my path have backslash, or maybe how can I create folders for another way ?
Use os.path.join:
Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
os.makedirs(os.path.join(work_path,i))
So in your code join the path once then use the joined path:
for i in user:
pth = os.path.join(work_path, i)
if not os.path.exists(pth):
try:
os.makedirs(pth, 0777)
You can use just the expression
work_path[-1] == '/'
If it is true you have a backslash

Python open filename from custom PATH

Similar to the system path, I want to offer some convenience in my code allowing a user to specify a file name that could be in one of a handful of paths.
Say I had two or more config paths
['~/.foo-config/', '/usr/local/myapp/foo-config/']
And my user wants to open bar, (AKA bar.baz)
Is there a convenient build in way to let open('bar') or open('bar.baz') automatically search these paths for that file in LTR order of precedence? Eg, will temporary adjusting my sys.path to only be these directories do this for me?
Else, how would you suggest implementing a PATH-like searching open-wrapper?
As other people already mentioned: sys.path only affects the module search path, i.e. it's relevant for importing Python modules, but not at all for open().
I would suggest separating the logic for searching the paths in order of precedence and opening the file, because that way it's easier to test and read.
I would do something like this:
import os
PATHS = ['~/.foo-config/', '/usr/local/myapp/foo-config/']
def find_first(filename, paths):
for directory in paths:
full_path = os.path.join(directory, filename)
if os.path.isfile(full_path):
return full_path
def main():
filename = 'file.txt'
path = find_first(filename, PATHS)
if path:
with open(path) as f:
print f
else:
print "File {} not found in any of the directories".format(filename)
if __name__ == '__main__':
main()
open doesn't get into that kind of logic. If you want, write a wrapper function that uses os.path.join to join each member of sys.path to the parameter filename, and tries to open them in order, handling the error that occurs when no such file is found.
I'll add that, as another user stated, this is kind of a misuse of sys.path, but this function would work for any list of paths. Indeed, maybe the nicest option is to use the environment variables suggested by another user to specify a colon-delimited list of config directories, which you then parse and use within your search function.
environmental variables
say your app is named foo ... in the readme tell the user to use the FOO_PATH environmental variable to specify the extra paths
then inside your app do something like
for path in os.environ.get("FOO_PATH",".").split(";"):
lookfor(os.path.join(path,"somefile.txt"))
you could wrap it into a generic function
def open_foo(fname):
for path in os.environ.get("FOO_PATH",".").split(";"):
path_to_test = os.path.join(path,"somefile.txt")
if os.path.exists(path_to_test):
return open(path_to_test)
raise Exception("No File Found On FOOPATH")
then you could use it just like normal open
with open_foo("my_config.txt") as f:
print f.read()
Extract from Python Standard Library documentation for open built-in function:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
...file is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened ...
Explicitely, open does not bring anything to automagically find a file : if path is not absolute, it is only searched in current directory.
So you will have to use a custom function or a custom class for that. For example:
class path_opener(object):
def __init__(self, path = [.]):
self.path = path
def set(self, path):
self.path = path
def append(self, path):
self.path.append(path)
def extent(self, path):
self.path.extend(path)
def find(self, file):
for folder in self.path:
path = os.path.join(folder, file)
if os.path.isfile(path):
return path
raise FileNotFoundError()
def open(self, file, *args, **kwargs):
return open(self.find(file), *args, **kwargs)
That means that a file opener will keep its own path, will be initialized by default with current path, will have methods to set, append to or extend its path, and will normaly raise a FileNotFoundError is a file is not found in any of the directories listed in its path.
Usage :
o = path_opener(['~/.foo-config/', '/usr/local/myapp/foo-config/'])
with o.open('foo') as fd:
...

Is it enough to test for '/../' in a file path to prevent escaping a subdirectory?

I have some code in python that checks that a file path access a file in a subdirectory.
It is for a web server to access files in the static folder.
I use the following code snippet :
path = 'static/' + path
try:
if '/../' in path:
raise RuntimeError('/../ in static file path')
f = open(path)
except (RuntimeError, IOError):
app.abort(404)
return
If the path is clean it would be enough.
Could there be ways to write a path accessing parent directories that would not be detected by this simple test ?
I would suggest using os.path.relpath, it takes a path and works out the most concise relative path from the given directory. That way you only need to test if the path starts with ".."
eg.
path = ...
relativePath = os.path.relpath(path)
if relativePath.startswith(".."):
raise RuntimeError("path escapes static directory")
completePath = "static/" + relativePath
You could also use os.readlink to replace symbolic links with a real path if sym links are something you have to worry about.
Flask has a few helper functions that I think you can copy over to your code without problems. The recommended syntax is:
filename = secure_filename(dirty_filename)
path = os.path.join(upload_folder, filename)
Werkzeug implements secure_filename and uses this code to clean filenames:
_filename_ascii_strip_re = re.compile(r'[^A-Za-z0-9_.-]')
_windows_device_files = ('CON', 'AUX', 'COM1', 'COM2', 'COM3', 'COM4', 'LPT1',
'LPT2', 'LPT3', 'PRN', 'NUL')
def secure_filename(filename):
r"""Pass it a filename and it will return a secure version of it. This
filename can then safely be stored on a regular file system and passed
to :func:`os.path.join`. The filename returned is an ASCII only string
for maximum portability.
On windows system the function also makes sure that the file is not
named after one of the special device files.
>>> secure_filename("My cool movie.mov")
'My_cool_movie.mov'
>>> secure_filename("../../../etc/passwd")
'etc_passwd'
>>> secure_filename(u'i contain cool \xfcml\xe4uts.txt')
'i_contain_cool_umlauts.txt'
The function might return an empty filename. It's your responsibility
to ensure that the filename is unique and that you generate random
filename if the function returned an empty one.
.. versionadded:: 0.5
:param filename: the filename to secure
"""
if isinstance(filename, unicode):
from unicodedata import normalize
filename = normalize('NFKD', filename).encode('ascii', 'ignore')
for sep in os.path.sep, os.path.altsep:
if sep:
filename = filename.replace(sep, ' ')
filename = str(_filename_ascii_strip_re.sub('', '_'.join(
filename.split()))).strip('._')
# on nt a couple of special files are present in each folder. We
# have to ensure that the target file is not such a filename. In
# this case we prepend an underline
if os.name == 'nt' and filename and \
filename.split('.')[0].upper() in _windows_device_files:
filename = '_' + filename
return filename
..//
This essentially is the same thing, however since you are straight away matching strings for /../
the, one I added would go undetected and would get the parent directory.

Extract files from zip without keep the top-level folder with python zipfile

I'm using the current code to extract the files from a zip file while keeping the directory structure:
zip_file = zipfile.ZipFile('archive.zip', 'r')
zip_file.extractall('/dir/to/extract/files/')
zip_file.close()
Here is a structure for an example zip file:
/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
At the end I want this:
/dir/to/extract/file.jpg
/dir/to/extract/file1.jpg
/dir/to/extract/file2.jpg
But it should ignore only if the zip file has a top-level folder with all files inside it, so when I extract a zip with this structure:
/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
/dir2/file.txt
/file.mp3
It should stay like this:
/dir/to/extract/dir1/file.jpg
/dir/to/extract/dir1/file1.jpg
/dir/to/extract/dir1/file2.jpg
/dir/to/extract/dir2/file.txt
/dir/to/extract/file.mp3
Any ideas?
If I understand your question correctly, you want to strip any common prefix directories from the items in the zip before extracting them.
If so, then the following script should do what you want:
import sys, os
from zipfile import ZipFile
def get_members(zip):
parts = []
# get all the path prefixes
for name in zip.namelist():
# only check files (not directories)
if not name.endswith('/'):
# keep list of path elements (minus filename)
parts.append(name.split('/')[:-1])
# now find the common path prefix (if any)
prefix = os.path.commonprefix(parts)
if prefix:
# re-join the path elements
prefix = '/'.join(prefix) + '/'
# get the length of the common prefix
offset = len(prefix)
# now re-set the filenames
for zipinfo in zip.infolist():
name = zipinfo.filename
# only check files (not directories)
if len(name) > offset:
# remove the common prefix
zipinfo.filename = name[offset:]
yield zipinfo
args = sys.argv[1:]
if len(args):
zip = ZipFile(args[0])
path = args[1] if len(args) > 1 else '.'
zip.extractall(path, get_members(zip))
Read the entries returned by ZipFile.namelist() to see if they're in the same directory, and then open/read each entry and write it to a file opened with open().
This might be a problem with the zip archive itself. In a python prompt try this to see if the files are in the correct directories in the zip file itself.
import zipfile
zf = zipfile.ZipFile("my_file.zip",'r')
first_file = zf.filelist[0]
print file_list.filename
This should say something like "dir1"
repeat the steps above substituting and index of 1 into filelist like so first_file = zf.filelist[1] This time the output should look like 'dir1/file1.jpg' if this is not the case then the zip file does not contain directories and will be unzipped all to one single directory.
Based on the #ekhumoro's answer I come up with a simpler funciton to extract everything on the same level, it is not exactly what you are asking but I think can help someone.
def _basename_members(self, zip_file: ZipFile):
for zipinfo in zip_file.infolist():
zipinfo.filename = os.path.basename(zipinfo.filename)
yield zipinfo
from_zip="some.zip"
to_folder="some_destination/"
with ZipFile(file=from_zip, mode="r") as zip_file:
os.makedirs(to_folder, exist_ok=True)
zip_infos = self._basename_members(zip_file)
zip_file.extractall(path=to_folder, members=zip_infos)
Basically you need to do two things:
Identify the root directory in the zip.
Remove the root directory from the paths of other items in the zip.
The following should retain the overall structure of the zip while removing the root directory:
import typing, zipfile
def _is_root(info: zipfile.ZipInfo) -> bool:
if info.is_dir():
parts = info.filename.split("/")
# Handle directory names with and without trailing slashes.
if len(parts) == 1 or (len(parts) == 2 and parts[1] == ""):
return True
return False
def _members_without_root(archive: zipfile.ZipFile, root_filename: str) -> typing.Generator:
for info in archive.infolist():
parts = info.filename.split(root_filename)
if len(parts) > 1 and parts[1]:
# We join using the root filename, because there might be a subdirectory with the same name.
info.filename = root_filename.join(parts[1:])
yield info
with zipfile.ZipFile("archive.zip", mode="r") as archive:
# We will use the first directory with no more than one path segment as the root.
root = next(info for info in archive.infolist() if _is_root(info))
if root:
archive.extractall(path="/dir/to/extract/", members=_members_without_root(archive, root.filename))
else:
print("No root directory found in zip.")

Categories

Resources