I am writing a Python script in which I write output to a temporary file and then move that file to its final destination once it is finished and closed. When the script finishes, I want the output file to have the same permissions as if it had been created normally through open(filename,"w"). As it is, the file will have the restrictive set of permissions used by the tempfile module for temp files.
Is there a way for me to figure out what the "default" file permissions for the output file would be if I created it in place, so that I can apply them to the temp file before moving it?
For the record, I had a similar issue, here is the code I have used:
import os
from tempfile import NamedTemporaryFile

def UmaskNamedTemporaryFile(*args, **kargs):
    fdesc = NamedTemporaryFile(*args, **kargs)
    # we need to set umask to get its current value. As noted
    # by Florian Brucker (comment), this is a potential security
    # issue, as it affects all the threads. Considering that it is
    # less a problem to create a file with permissions 000 than 666,
    # we use 666 as the umask temporary value.
    umask = os.umask(0o666)
    os.umask(umask)
    os.chmod(fdesc.name, 0o666 & ~umask)
    return fdesc
There is a function umask in the os module. You cannot read the current umask directly; you have to set it, and the function returns the previous setting.
The umask is inherited from the parent process. It describes which permission bits are not to be set when creating a file or directory.
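As a minimal sketch of how that applies to the question: a file created with open(filename, "w") asks for mode 0o666, and the umask bits are then cleared from it. The helper names default_file_mode and write_then_move are made up for this illustration, the umask value is passed in by the caller, and the final mode check assumes a POSIX system.

```python
import os
import tempfile


def default_file_mode(umask: int) -> int:
    """Mode that open(path, "w") would give a new file under this umask."""
    return 0o666 & ~umask


def write_then_move(path: str, data: str, umask: int) -> None:
    """Write data to a temp file next to path, fix its mode, then move it."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False so the file survives being closed and can be renamed.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.write(data)
    # The temp file starts out as 0o600; widen it to the "normal" default.
    os.chmod(tmp.name, default_file_mode(umask))
    os.replace(tmp.name, path)
```

With the common umask 0o022 this yields 0o644, and with 0o077 it yields 0o600.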
This way is slow but safe and will work on any system that has a 'umask' shell command:
from typing import Optional

def current_umask() -> int:
    """Makes a best attempt to determine the current umask value of the calling process in a safe way.
    Unfortunately, os.umask() is not threadsafe and poses a security risk, since there is no way to read
    the current umask without temporarily setting it to a new value, then restoring it, which will affect
    permissions on files created by other threads in this process during the time the umask is changed.
    On recent Linux kernels (>= 4.1), the current umask can be read from /proc/self/status.
    On older systems, the safest way is to spawn a shell and execute the 'umask' command. The shell will
    inherit the current process's umask, and will use the unsafe call, but it does so in a separate,
    single-threaded process, which makes it safe.
    Returns:
        int: The current process's umask value
    """
    mask: Optional[int] = None
    try:
        with open('/proc/self/status') as fd:
            for line in fd:
                if line.startswith('Umask:'):
                    mask = int(line[6:].strip(), 8)
                    break
    except FileNotFoundError:
        pass
    except ValueError:
        pass
    if mask is None:
        import subprocess
        mask = int(subprocess.check_output('umask', shell=True).decode('utf-8').strip(), 8)
    return mask
import os

def current_umask() -> int:
    tmp = os.umask(0o022)
    os.umask(tmp)
    return tmp
This function is implemented in some python packages, e.g. pip and setuptools.
Related
Can joblib.Memory be used to write in a thread-safe manner to a common cache across multiple processes? In what situations, if any, will this fail or cause an error?
The library first writes to a temporary file and then moves the temporary file to the destination. Source code:
def _concurrency_safe_write(self, to_write, filename, write_func):
    """Writes an object into a file in a concurrency-safe way."""
    temporary_filename = concurrency_safe_write(to_write,
                                                filename, write_func)
    self._move_item(temporary_filename, filename)
Writing to the temporary file seems safe among processes in the same operating system because it includes the pid in the file name. Additionally, it seems safe among threads in the same process because it includes the thread id. Source:
def concurrency_safe_write(object_to_write, filename, write_func):
    """Writes an object into a unique file in a concurrency-safe way."""
    thread_id = id(threading.current_thread())
    temporary_filename = '{}.thread-{}-pid-{}'.format(
        filename, thread_id, os.getpid())
    write_func(object_to_write, temporary_filename)
    return temporary_filename
Moving the temporary file to the destination has shown problems on Windows. Source:
if os.name == 'nt':
    # https://github.com/joblib/joblib/issues/540
    access_denied_errors = (5, 13)
    from os import replace

    def concurrency_safe_rename(src, dst):
        """Renames ``src`` into ``dst`` overwriting ``dst`` if it exists.
        On Windows os.replace can yield permission errors if executed by two
        different processes.
        """
        max_sleep_time = 1
        total_sleep_time = 0
        sleep_time = 0.001
        while total_sleep_time < max_sleep_time:
            try:
                replace(src, dst)
                break
            except Exception as exc:
                if getattr(exc, 'winerror', None) in access_denied_errors:
                    time.sleep(sleep_time)
                    total_sleep_time += sleep_time
                    sleep_time *= 2
                else:
                    raise
        else:
            raise
else:
    from os import replace as concurrency_safe_rename  # noqa
From this source code you can see that on Windows the move can still fail: the rename is retried with exponential backoff for as long as access denied errors occur, and it gives up once the total sleep time reaches 1 s.
The same source code has a link to the issue #540 that describes the Windows errors and was closed with the comment:
Fixed by #541 (hopefully).
The "(hopefully)" in the comment seems to indicate that the author could not guarantee that the fix was definitive, but the issue has not been reopened, so it probably has not happened again.
For other operating systems there is no special logic or retries and just the standard os.replace() is used. The description mentions cases where it "may fail" and also that it "will be an atomic operation":
Rename the file or directory src to dst. If dst is a directory, OSError will be raised. If dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement).
If no one is changing permissions in the destination directories, you should be less worried about the probability of failure of this operation. The scenario of "if src and dst are on different filesystems" seems not feasible because the source path (temporary file) is built just by adding a suffix to the destination path, so they should be in the same directory.
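If you want to check the "different filesystems" condition rather than assume it, one rough approach is to compare the device IDs that os.stat() reports for the two directories. This is only a sketch: on_same_filesystem is a name made up here, it is not part of joblib, and both paths must already exist.

```python
import os


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For a pending move you would call it on the directory holding the temporary file and the directory of the destination.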
Other questions that deal with the atomicity of rename:
Is rename() atomic?
Is mv atomic on my fs?
Is rename required by standard to be atomic?
Using portalocker we can lock a file for access through the following way:
f=open("M99","r+")
portalocker.lock(f,portalocker.LOCK_EX)
The lock over the file can be removed using
f.close() #or
portalocker.unlock(file) #needs `file` ie reference to file it locked ..pretty obvious too
Can this same thing be done by any other way in python wherein
We can lock the file for access
Restart Python (so no longer have the original Python file object or file number).
Unlock the file for access in the new process.
I cannot save f or file object so can't use pickle or something either. Is there a way using the Python standard library or some win32api call?
Any windows utility will also do...any command line from windows?
It appears you want to lock access to resources where the lock persists between program invocations. You need a different strategy for that.
Create a lock file using exclusive create mode; in Python 2 this requires using the os.open() call (followed by os.fdopen() to produce a Python file object), in Python 3 you can use the 'x' mode when using the built-in open().
In Python 2:
import os

LOCKFILE = r'some\path\to\lockfile'

class AlreadyLocked(Exception):
    pass

def lock():
    try:
        fd = os.open(LOCKFILE, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError:
        # file already exists (os.open raises OSError, not IOError)
        raise AlreadyLocked()
    with os.fdopen(fd, 'w') as lockfile:
        # write the PID of the current process so you can debug
        # later if a lockfile can be deleted after a program crash
        lockfile.write(str(os.getpid()))

def unlock():
    os.remove(LOCKFILE)
In Python 3 the lock() function would be:
def lock():
    try:
        with open(LOCKFILE, 'x') as lockfile:
            # write the PID of the current process so you can debug
            # later if a lockfile can be deleted after a program crash
            lockfile.write(str(os.getpid()))
    except FileExistsError:
        # file already exists
        raise AlreadyLocked()
You need to use exclusive create mode to avoid race conditions; in exclusive create mode the file can only be created if it doesn't yet exist, a condition checked by the Operating System, rather than by a separate step in Python which would open a window for another program to create the lock as well.
Now you can lock and unlock without tracking the file descriptor. The lockfile is now a signal file; if it is present something has claimed a lock, and deleting the file means something is unlocked.
This does mean that access to the files or directories you are trying to protect is only protected because all your code honours this lock system, not because the OS is enforcing locks on those files or directories.
This all means that this only works if all access to the shared resource is handled by processes that cooperate in this strategy. It cannot be used if another process doesn't honour this scheme. In that case your only option is to use OS level locking and you have to keep your process running for the full duration of the lock.
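To illustrate how such a signal file survives a restart, here is a small Python 3 sketch. The function names acquire, owner and release are invented for this example. A later process, with no file object carried over, can see who claimed the lock and release it by path alone.

```python
import os
from typing import Optional


def acquire(path: str) -> bool:
    """Try to claim the lock; return False if someone already holds it."""
    try:
        # 'x' makes the existence check and the creation a single OS-level step.
        with open(path, "x") as lockfile:
            lockfile.write(str(os.getpid()))
    except FileExistsError:
        return False
    return True


def owner(path: str) -> Optional[int]:
    """Return the PID recorded in the lock file, or None if there is none."""
    try:
        with open(path) as lockfile:
            return int(lockfile.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def release(path: str) -> None:
    """Drop the lock; this works from any process that knows the path."""
    os.remove(path)
```

Because release() only needs the path, a freshly started process can unlock what an earlier one locked, which is the property the question asks for.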
There is a method in win32api to set file attributes; have a read of the following:
python SetFileAttributes
MSDN file attributes
These give you the Python method to set file attributes:
win32api.SetFileAttributes(file, win32con.FILE_ATTRIBUTE_NORMAL)
where file is the name/path of the file, and not a file object,
and the second argument is an attribute mask. If you want to set several attributes at once, you can use bitwise OR to combine them:
win32con.FILE_ATTRIBUTE_HIDDEN | win32con.FILE_ATTRIBUTE_READONLY
There are more constants named in the MSDN page.
EDIT:
For file locking you can also look at the win32file.LockFileEx method.
I haven't used this before so it may take some playing around, but it appears to need an open file handle (not a path) and then certain constants to set the access permissions; more info on the constants can be found on MSDN.
You could use subprocess to open the file in notepad or excel:
import subprocess, time
subprocess.call('start excel.exe "\lockThisFile.txt\"', shell = True)
time.sleep(10) # if you need the file locked before executing the next commands, you may need to sleep it for a few seconds
or
subprocess.call('notepad > lockThisFile.txt', shell = True)
As written you need shell = True, otherwise windows will give you a syntax error.
(subprocess.Popen() works as well)
You can then close the process later using:
subprocess.call('taskkill /f /im notepad.exe') # or excel.exe
Other options include
-write some C++ code and call it from python (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx)
-call 3rd party programs with subprocess.call():
FileLocker http://www.jensscheffler.de/filelocker (https://superuser.com/questions/294826/how-to-purposefully-exclusively-lock-a-file)
Easy File Locker http://www.xoslab.com/efl.html and Dispatch (from win32com.client import Dispatch), although last choice is the most complex
I wish to write to a file based on whether that file already exists or not, only writing if it doesn't already exist (in practice, I wish to keep trying files until I find one that doesn't exist).
The following code shows a way in which a potential attacker could insert a symlink, as suggested in this post, in between the test for the file and the file being written. If the code is run with high enough permissions, this could overwrite an arbitrary file.
Is there a way to solve this problem?
import os
import errno

file_to_be_attacked = 'important_file'
with open(file_to_be_attacked, 'w') as f:
    f.write('Some important content!\n')

test_file = 'testfile'
try:
    with open(test_file) as f: pass
except IOError as e:
    # Symlink created here
    os.symlink(file_to_be_attacked, test_file)
    if e.errno != errno.ENOENT:
        raise
    else:
        with open(test_file, 'w') as f:
            f.write('Hello, kthxbye!\n')
Edit: See also Dave Jones' answer: from Python 3.3, you can use the x flag to open() to provide this function.
Original answer below
Yes, but not using Python's standard open() call. You'll need to use os.open() instead, which allows you to specify flags to the underlying C code.
In particular, you want to use O_CREAT | O_EXCL. From the man page for open(2) under O_EXCL on my Unix system:
Ensure that this call creates the file: if this flag is specified in conjunction with O_CREAT, and pathname already exists, then open() will fail. The behavior of O_EXCL is undefined if O_CREAT is not specified.
When these two flags are specified, symbolic links are not followed: if pathname is a symbolic link, then open() fails regardless of where the symbolic link points to.
O_EXCL is only supported on NFS when using NFSv3 or later on kernel 2.6 or later. In environments where NFS O_EXCL support is not provided, programs that rely on it for performing locking tasks will contain a race condition.
So it's not perfect, but AFAIK it's the closest you can get to avoiding this race condition.
Edit: the other rules of using os.open() instead of open() still apply. In particular, if you want use the returned file descriptor for reading or writing, you'll need one of the O_RDONLY, O_WRONLY or O_RDWR flags as well.
All the O_* flags are in Python's os module, so you'll need to import os and use os.O_CREAT etc.
Example:
import os
import errno

flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
try:
    file_handle = os.open('filename', flags)
except OSError as e:
    if e.errno == errno.EEXIST:  # Failed as the file already exists.
        pass
    else:  # Something unexpected went wrong so reraise the exception.
        raise
else:  # No exception, so the file must have been created successfully.
    with os.fdopen(file_handle, 'w') as file_obj:
        # Using `os.fdopen` converts the handle to an object that acts like a
        # regular Python file object, and the `with` context manager means the
        # file will be automatically closed when we're done with it.
        file_obj.write("Look, ma, I'm writing to a new file!")
For reference, Python 3.3 implements a new 'x' mode in the open() function to cover this use-case (create only, fail if file exists). Note that the 'x' mode is specified on its own. Using 'wx' results in a ValueError as the 'w' is redundant (the only thing you can do if the call succeeds is write to the file anyway; it can't have existed if the call succeeds):
>>> f1 = open('new_binary_file', 'xb')
>>> f2 = open('new_text_file', 'x')
For Python 3.2 and below (including Python 2.x) please refer to the accepted answer.
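For the "keep trying files until I find one that doesn't exist" part of the question, the 'x' mode can be put in a loop. This is just a sketch: create_first_free and its naming scheme (stem, counter, suffix) are invented for the example.

```python
import itertools


def create_first_free(stem: str, suffix: str = ".txt"):
    """Open and return the first of stem0, stem1, ... that does not exist yet."""
    for n in itertools.count():
        name = f"{stem}{n}{suffix}"
        try:
            # Creation and the existence check happen in one step, so there is
            # no gap in which another process could slip in a file or symlink.
            return open(name, "x")
        except FileExistsError:
            continue  # name is taken, try the next one
```

The caller gets back an open file object and is responsible for closing it, for example with a with statement.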
This code will easily create a file if one does not exist.
import os

if not os.path.exists('file'):
    open('file', 'w').close()
In Python (tried this in 2.7 and below) it looks like a file created using tempfile.NamedTemporaryFile doesn't seem to obey the umask directive:
import os, tempfile
os.umask(022)
f1 = open ("goodfile", "w")
f2 = tempfile.NamedTemporaryFile(dir='.')
f2.name
Out[33]: '/Users/foo/tmp4zK9Fe'
ls -l
-rw------- 1 foo foo 0 May 10 13:29 /Users/foo/tmp4zK9Fe
-rw-r--r-- 1 foo foo 0 May 10 13:28 /Users/foo/goodfile
Any idea why NamedTemporaryFile won't pick up the umask? Is there any way to do this during file creation?
I can always workaround this with os.chmod(), but I was hoping for something that did the right thing during file creation.
This is a security feature. The NamedTemporaryFile is always created with mode 0600, hardcoded at tempfile.py, line 235, because it is private to your process until you open it up with chmod. There is no constructor argument to change this behavior.
In case it might help someone, I wanted to do more or less the same thing, here is the code I have used:
import os
from tempfile import NamedTemporaryFile

def UmaskNamedTemporaryFile(*args, **kargs):
    fdesc = NamedTemporaryFile(*args, **kargs)
    # we need to set umask to get its current value. As noted
    # by Florian Brucker (comment), this is a potential security
    # issue, as it affects all the threads. Considering that it is
    # less a problem to create a file with permissions 000 than 666,
    # we use 666 as the umask temporary value.
    umask = os.umask(0o666)
    os.umask(umask)
    os.chmod(fdesc.name, 0o666 & ~umask)
    return fdesc
I'm trying to create a file that is only user-readable and -writable (0600).
Is the only way to do so by using os.open() as follows?
import os
fd = os.open('/path/to/file', os.O_WRONLY, 0o600)
myFileObject = os.fdopen(fd)
myFileObject.write(...)
myFileObject.close()
Ideally, I'd like to be able to use the with keyword so I can close the object automatically. Is there a better way to do what I'm doing above?
What's the problem? file.close() will close the file even though it was open with os.open().
with os.fdopen(os.open('/path/to/file', os.O_WRONLY | os.O_CREAT, 0o600), 'w') as handle:
    handle.write(...)
This answer addresses multiple concerns with the answer by vartec, especially the umask concern.
import os
import stat

# Define file params
fname = '/tmp/myfile'
flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL  # Refer to "man 2 open".
mode = stat.S_IRUSR | stat.S_IWUSR  # This is 0o600.
umask = 0o777 ^ mode  # Prevents always downgrading umask to 0.

# For security, remove file with potentially elevated mode
try:
    os.remove(fname)
except OSError:
    pass

# Open file descriptor
umask_original = os.umask(umask)
try:
    fdesc = os.open(fname, flags, mode)
finally:
    os.umask(umask_original)

# Open file handle and write to file
with os.fdopen(fdesc, 'w') as fout:
    fout.write('something\n')
If the desired mode is 0600, it can more clearly be specified as the octal number 0o600. Even better, just use the stat module.
Even though the old file is first deleted, a race condition is still possible. Including os.O_EXCL with os.O_CREAT in the flags will prevent the file from being created if it exists due to a race condition. This is a necessary secondary security measure to prevent opening a file that may already exist with a potentially elevated mode. In Python 3, FileExistsError with [Errno 17] is raised if the file exists.
Failing to first set the umask to 0 or to 0o777 ^ mode can lead to an incorrect mode (permission) being set by os.open. This is because the default umask is usually not 0, and it will be applied to the specified mode. For example, if my original umask is 2 i.e. 0o002, and my specified mode is 0o222, if I fail to first set the umask, the resulting file can instead have a mode of 0o220, which is not what I wanted. Per man 2 open, the mode of the created file is mode & ~umask.
The umask is restored to its original value as soon as possible. This getting and setting is not thread safe, and a threading.Lock must be used in a multithreaded application.
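A rough sketch of what guarding the umask change with a threading.Lock could look like. The names _umask_lock and create_private_file are made up for this illustration, and the lock only helps if every thread that creates files or touches the umask goes through it.

```python
import os
import stat
import threading

# Hypothetical module-level lock; all umask changes must go through it.
_umask_lock = threading.Lock()


def create_private_file(path: str) -> int:
    """Create path with mode 0o600 and return its file descriptor."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    mode = stat.S_IRUSR | stat.S_IWUSR  # 0o600
    with _umask_lock:
        original = os.umask(0o777 ^ mode)
        try:
            return os.open(path, flags, mode)
        finally:
            # Restore the umask before releasing the lock.
            os.umask(original)
```

The returned descriptor can then be wrapped with os.fdopen(fd, 'w') inside a with block, as in the code above.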
For more info about umask, refer to this thread.
update
Folks, while I thank you for the upvotes here, I myself have to argue against my originally proposed solution below. The reason is that, doing things this way, there will be an amount of time, however small, where the file exists and does not have the proper permissions in place. This leaves the door wide open to attacks, and even to buggy behavior.
Of course creating the file with the correct permissions in the first place is the way to go - against the correctness of that, using Python's with is just some candy.
So please, take this answer as an example of "what not to do";
original post
You can use os.chmod instead:
>>> import os
>>> name = "eek.txt"
>>> with open(name, "wt") as myfile:
... os.chmod(name, 0o600)
... myfile.write("eeek")
...
>>> os.system("ls -lh " + name)
-rw------- 1 gwidion gwidion 4 2011-04-11 13:47 eek.txt
0
>>>
(Note that the way to use octals in Python is by being explicit - by prefixing it with "0o" like in "0o600". In Python 2.x it would work writing just 0600 - but that is both misleading and deprecated.)
However, if your security is critical, you probably should resort to creating it with os.open, as you do and use os.fdopen to retrieve a Python File object from the file descriptor returned by os.open.
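In Python 3.3 and later there is also a way to keep the plain with open(...) form while still creating the file through os.open: the opener argument of the built-in open(). A minimal sketch follows; the names private_opener and write_private are invented for this example, and it uses 'x' mode so an existing file with looser permissions is never reused.

```python
import os


def private_opener(path, flags):
    # Ask for 0o600; the umask can only remove bits from this, never add any.
    return os.open(path, flags, 0o600)


def write_private(path: str, text: str) -> None:
    """Create path (failing if it already exists) readable by the owner only."""
    with open(path, "x", opener=private_opener) as f:
        f.write(text)
```

Since group and world bits are never requested, the file has the restrictive mode from the moment it is created, so there is no window in which it exists with wider permissions.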
The question is about setting the permissions to be sure the file will not be world-readable (only read/write for the current user).
Unfortunately, on its own, the code:
fd = os.open('/path/to/file', os.O_WRONLY, 0o600)
does not guarantee that permissions will be denied to the world. It does try to set r/w for the current user (provided that umask allows it), that's it!
On two very different test systems, this code creates a file with -rw-r--r-- with my default umask, and -rw-rw-rw- with umask(0) which is definitely not what is desired (and poses a serious security risk).
If you want to make sure that the file has no bits set for group and world, you have to umask these bits first (remember - umask is denial of permissions):
os.umask(0o177)
Besides, to be 100% sure that the file doesn't already exist with different permissions, you have to chmod/delete it first (delete is safer, since you may not have write permissions in the target directory - and if you have security concerns, you don't want to write some file where you're not allowed to!), otherwise you may have a security issue if a hacker created the file before you with world-wide r/w permissions in anticipation of your move. In that case, os.open will open the file without setting its permissions at all and you're left with a world r/w secret file...
So you need:
import os

if os.path.isfile(file):
    os.remove(file)
original_umask = os.umask(0o177)  # 0o777 ^ 0o600
try:
    handle = os.fdopen(os.open(file, os.O_WRONLY | os.O_CREAT, 0o600), 'w')
finally:
    os.umask(original_umask)
This is the safe way to ensure the creation of a -rw------- file regardless of your environment and configuration. And of course you can catch and deal with the IOErrors as needed. If you don't have write permissions in the target directory, you shouldn't be able to create the file, and if it already existed the delete will fail.
I would like to suggest a modification of A-B-B's excellent answer that separates the concerns a bit more clearly. The main advantage would be that you can handle exceptions that occur during opening the file descriptor separately from other problems during actual writing to the file.
The outer try ... finally block takes care of handling the permission and umask issues while opening the file descriptor. The inner with block deals with possible exceptions while working with the Python file object (as this was the OP's wish):
oldumask = os.umask(0)
try:
    fdesc = os.open(outfname, os.O_WRONLY | os.O_CREAT, 0o600)
    with os.fdopen(fdesc, "w") as outf:
        # ...write to outf, closes on success or on exceptions automatically...
        pass
except OSError:
    # ...handle possible os.open() errors here...
    raise
finally:
    os.umask(oldumask)
If you want to append to the file instead of writing, then the file descriptor should be opened like this:
fdesc = os.open(outfname, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
and the file object like this:
with os.fdopen(fdesc, "a") as outf:
Of course all other usual combinations are possible.
I'd do differently.
import os
from contextlib import contextmanager

@contextmanager
def umask_helper(desired_umask):
    """ A little helper to safely set and restore umask(2). """
    prev_umask = os.umask(desired_umask)
    try:
        yield
    finally:
        os.umask(prev_umask)

# ---------------------------------- […] ---------------------------------- #
[…]
with umask_helper(0o077):
    os.mkdir(os.path.dirname(MY_FILE))
    with open(MY_FILE, 'wt') as f:
        […]
File-manipulating code tends to be already try-except-heavy; making it even worse with os.umask's finally isn't going to bring your eyes any more joy. Meanwhile, rolling your own context manager is that easy, and results in somewhat neater indentation nesting.