Python os.rename if file system is full

I'm asking this because there's no way for me to try it myself (if there is one, please share it (:).
I'm doing some file handling with Python's os library, specifically moving/renaming files with os.rename().
The Python docs explain some of the exceptions this function might raise here, but do not say anything about the case of a full file system. My guess is that it raises an IOError; is this right?
Cheers.

In practice this should rarely come up, but if you want to test I'd recommend creating a small file system (I don't know what OS you are on, but this could be on a virtual partition, a RAM disk, a flash drive, etc.) and loading it up with garbage files to see what happens. Something like this maybe:
chunk = b"0" * (1024 * 1024)  # 1 MiB of filler per file
counter = 0
while True:
    counter += 1
    # keep creating files until the file system is full and an exception is raised
    with open(str(counter) + ".txt", "wb") as another_file:
        another_file.write(chunk)
When you get an exception, you should be able to verify that the disk is full and then you'll know what kind of error to expect.

You can test it by filling up a small partition and then trying the file operations on the full filesystem. On *nix systems you can mount a tmpfs; on Windows, maybe use a USB stick.
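For what it's worth, os.rename() reports failures by raising OSError (in Python 3, IOError is just an alias of OSError), so a full file system would normally show up as an OSError whose errno is errno.ENOSPC. A minimal sketch, assuming Python 3; the helper name try_rename is made up for illustration:

```python
import errno
import os


def try_rename(src, dst):
    """Rename src to dst; return None on success or the errno on failure."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # This is the code the OS uses for "No space left on device".
            print("No space left on device:", dst)
        return exc.errno
    return None
```

Checking exc.errno rather than the exception type keeps the handler specific: other failures (missing source, permission denied) also arrive as OSError.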

Related

How do I access a file for reading/writing in a different (non-current) directory?

I am working on the listener portion of a backdoor program (for an ETHICAL hacking course) and I would like to be able to read files from any part of my linux system and not just from within the directory where my listener python script is located - however, this has not proven to be as simple as specifying a typical absolute path such as "~/Desktop/test.txt"
So far my code is able to read files and upload them to the virtual machine where my reverse backdoor script is actively running. But this is only when I read and upload files that are in the same directory as my listener script (aptly named listener.py). Code shown below.
def read_file(self, path):
    with open(path, "rb") as file:
        return base64.b64encode(file.read())
As I've mentioned previously, the above function only works if I try to open and read a file that is in the same directory as the script that the above code belongs to, meaning that path in the above content is a simple file name such as "picture.jpg"
I would like to be able to read a file from any part of my filesystem while maintaining the same functionality.
For example, I would love to be able to specify "~/Desktop/another_picture.jpg" as the path so that the contents of "another_picture.jpg" from my "~/Desktop" directory are base64 encoded for further processing and eventual upload.
Any and all help is much appreciated.
Edit 1:
My script where all the code is contained, "listener.py", is located in /root/PycharmProjects/virus_related/reverse_backdoor/. Within this directory is a file that, for simplicity's sake, we can call "picture.jpg". The same file, "picture.jpg", is also located on my desktop, at the absolute path "/root/Desktop/picture.jpg".
When I try read_file("picture.jpg"), there are no problems, the file is read.
When I try read_file("/root/Desktop/picture.jpg"), the file is not read and my terminal becomes stuck.
Edit 2:
I forgot to note that I am using the latest version of Kali Linux and Pycharm.
I have run "realpath picture.jpg" and it has yielded the path "/root/Desktop/picture.jpg"
Upon running read_file("/root/Desktop/picture.jpg"), I encounter the same problem where my terminal becomes stuck.
[FINAL EDIT aka Problem solved]:
Based on the answer suggesting trying to read a file like "../file", I realized that the code was fully functional because read_file("../file") worked without any flaws, indicating that my Python script had no trouble locating the given path. Once the file was read, it was uploaded to the machine running my backdoor where, curiously, it ended up in the parent directory of the backdoor script. It was then that I realized that the problem lay in the handling of paths in the backdoor script rather than in my listener.py.
Credit is also due to the commenter who pointed out that "~" does not count as a valid path element. Once I reached the conclusion mentioned just above, I attempted read_file("~/Desktop/picture.jpg"), which failed. But with a quick modification, read_file("/root/Desktop/picture.jpg") was successfully executed, and the file was uploaded to the same directory as my backdoor script on my target machine once I implemented some quick-fix code.
My apologies for not being so specific; efforts to aid were certainly confounded by the unmentioned complexity of my situation and I would like to personally thank everyone who chipped in.
This was my first whole-hearted attempt to reach out to the stackoverflow community for help and I have not been disappointed. Cheers!
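As a side note on the "~" point: open() does not expand "~" itself (that is normally done by the shell), but os.path.expanduser() does. A minimal sketch, assuming Python 3 and written as a standalone function rather than the method shown in the question:

```python
import base64
import os


def read_file(path):
    """Return the base64-encoded contents of path, expanding a leading "~"."""
    # expanduser turns "~/Desktop/x.jpg" into e.g. "/root/Desktop/x.jpg";
    # abspath then resolves anything relative against the current directory.
    full_path = os.path.abspath(os.path.expanduser(path))
    with open(full_path, "rb") as file:
        return base64.b64encode(file.read())
```

With this, read_file("~/Desktop/picture.jpg") and read_file("/root/Desktop/picture.jpg") should refer to the same file when the script runs as root.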

A solution I found is putting "../" before the filename if the file is in the directory directly above the script's directory. Note that the relative path is resolved against the current working directory, so this works when the script is run from its own directory.
test.py (in a directory directly inside the "Desktop" directory, i.e. /Desktop/test):
with open("../test.txt", "r") as test:
    print(test.readlines())
test.txt (in the directory "/Desktop"):
Hi!
Hello!
Result:
['Hi!\n', 'Hello!\n']
This is likely the simplest solution. I found it because I always use "cd ../" in the terminal.

This lets you work not only with the current file, but with all the other files in the same directory as the script.
import os
from PIL import Image  # assumes the Pillow package is installed
path = os.path.dirname(os.path.abspath(__file__))
for filename in os.listdir(path):
    full_path = os.path.join(path, filename)
    if not os.path.isfile(full_path):
        continue  # skip subdirectories
    with open(full_path, "rb") as f:
        content = f.read()
    print(filename, len(content))
    try:
        im = Image.open(full_path)
        im.show()
    except IOError:
        print('The following file is not an image type:', filename)

Get full path of currently open files

I'm trying to code a simple application that must read all currently open files within a certain directory.
More specifically, I want to get a list of files open anywhere inside my Documents folder,
but I don't want only the process IDs or process names; I want the full path of each open file.
The thing is, I haven't found anything that does that.
I couldn't do it either in the Linux shell (using the ps and lsof commands) or with Python's psutil library. Neither gives me the information I need, which is only the paths of currently open files in a directory.
Any advice?
P.S: I'm tagging this as python question (besides os related tags) because it would be a plus if it could be done using some python library.

This seems to work (on Linux):
import subprocess
import shlex
cmd = shlex.split('lsof -F n +d .')
try:
    # text=True decodes the output so the string comparison below works on Python 3
    output = subprocess.check_output(cmd, text=True).splitlines()
except subprocess.CalledProcessError as err:
    # if lsof exits with a non-zero status, use whatever output it produced
    output = err.output.splitlines()
# keep only the name fields ("n./<file>") and strip the "n./" prefix
output = [line[3:] for line in output if line.startswith('n./')]
# Out[3]: ['file.tmp']
It reads the open files in the current directory, non-recursively.
For a recursive search, use the +D option. Keep in mind that this is vulnerable to a race condition: by the time you get your output, the situation might have changed already. It is always best to try to do something (e.g. open the file) and check for failure, i.e. open the file and catch the exception, or check for a null FILE value in C.
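If calling lsof is not an option, roughly the same information can be read on Linux from the /proc/<pid>/fd symlinks. The following is only a sketch under that assumption (Linux with /proc mounted); the function name open_files_under is made up, and processes owned by other users are silently skipped unless the script runs with sufficient privileges:

```python
import os


def open_files_under(directory):
    """Return sorted paths of files under directory that some process has open."""
    root = os.path.realpath(directory)
    found = set()
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            if target.startswith(root + os.sep):
                found.add(target)
    return sorted(found)
```

The same race condition applies here: the result is a snapshot that may be stale by the time it is used.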

A safe, atomic file-copy operation

I need to copy a file from one location to another, and I need to throw an exception (or at least somehow recognise) if the file already exists at the destination (no overwriting).
I can check first with os.path.exists() but it's extremely important that the file cannot be created in the small amount of time between checking and copying.
Is there a built-in way of doing this, or is there a way to define an action as atomic?

There is in fact a way to do this, atomically and safely, provided all actors do it the same way. It's an adaptation of the lock-free whack-a-mole algorithm, and not entirely trivial, so feel free to go with "no" as the general answer ;)
What to do
1. Check whether the file already exists. Stop if it does.
2. Generate a unique ID.
3. Copy the source file to the target folder with a temporary name, say, <target>.<UUID>.tmp.
4. Rename† the copy to <target>-<UUID>.mole.tmp.
5. Look for any other files matching the pattern <target>-*.mole.tmp.
   - If their UUID compares greater than yours, attempt to delete it. (Don't worry if it's gone.)
   - If their UUID compares less than yours, attempt to delete your own. (Again, don't worry if it's gone.) From now on, treat their UUID as if it were your own.
6. Check again to see if the destination file already exists. If so, attempt to delete your temporary file. (Don't worry if it's gone. Remember your UUID may have changed in step 5.)
7. If you didn't already attempt to delete it in step 6, attempt to rename your temporary file to its final name, <target>. (Don't worry if it's gone, just jump back to step 5.)
8. You're done!
How it works
Imagine each candidate source file is a mole coming out of its hole. Half-way out, it pauses and whacks any competing moles back into the ground, before checking no other mole has fully emerged. If you run this through in your head, you should see that only one mole will ever make it all the way out. To prevent this system from livelocking, we add a total ordering on which mole can whack which. Bam! A PhD thesis lock-free algorithm.
† Step 4 may look unnecessary—why not just use that name in the first place? However, another process may "adopt" your mole file in step 5, and make it the winner in step 7, so it's very important that you're not still writing out the contents! Renames on the same file system are atomic, so step 4 is safe.

There is no way to do this; file copy operations are never atomic and there is no way to make them.
But you can write the file under a random, temporary name and then rename it. Rename operations have to be atomic. If the file already exists, the rename will fail and you'll get an error.
[EDIT2] rename() is only atomic if you do it in the same file system. The safe way is to create the new file in the same folder as the destination.
[EDIT] There is a lot of discussion whether rename is always atomic or not and about the overwrite behavior. So I dug up some resources.
On Linux, if the destination exists and both source and destination are files, then the destination is silently overwritten (man page). So I was wrong there.
But rename(2) still guarantees that either the original file or the new file remain valid if something goes wrong, so the operation is atomic in the sense that it can't corrupt data. It's not atomic in the sense that it prevents two processes from doing the same rename at the same time and you can predict the result. One will win but you can't tell which.
On Windows, if another process is currently writing the file, you get an error if you try to open it for writing, so one advantage for Windows, here.
If your computer crashes while the operation is written to disk, the implementation of the file system will decide how much data gets corrupted. There is nothing an application could do about this. So stop whining already :-)
There is also no other approach that works better or even just as well as this one.
You could use file locking instead. But that would just make everything more complex and yield no additional advantages (besides being more complicated which some people do see as a huge advantage for some reason). And you'd add a lot of nice corner cases when your file is on a network drive.
You could use open(2) with the flags O_CREAT | O_EXCL, which would make the function fail if the file already exists. But that wouldn't prevent a second process from deleting the file and writing its own copy.
Or you could create a lock directory since creating directories has to be atomic as well. But that would not buy you much, either. You'd have to write the locking code yourself and make absolutely, 100% sure that you really, really always delete the lock directory in case of disaster - which you can't.
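Since a plain rename silently overwrites the destination on Linux, one way to get "fail if it already exists" is to publish the finished temporary copy with a hard link instead of a rename. This is only a sketch, assuming Python 3 and a file system that supports hard links; the function name copy_no_overwrite is made up for illustration:

```python
import os
import shutil
import tempfile


def copy_no_overwrite(src, dst):
    """Copy src to dst, raising FileExistsError if dst already exists."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Create the temporary copy next to dst so both names share a file system.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp_path)
        # os.link() fails with FileExistsError instead of overwriting dst,
        # and dst only ever appears with its complete contents.
        os.link(tmp_path, dst)
    finally:
        # The temporary name is no longer needed, whether or not the link worked.
        os.unlink(tmp_path)
```

As noted above, this does not stop another process from deleting or replacing dst afterwards; it only guarantees that this copy never overwrites an existing file.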

A while back my team needed a mechanism for atomic writes in Python and we came up with the following code (also available in a gist):
import os
import shutil
import stat
import tempfile
def copy_with_metadata(source, target):
    """Copy file with all its permissions and metadata.
    Lifted from https://stackoverflow.com/a/43761127/2860309
    :param source: source file name
    :param target: target file name
    """
    # copy content, stat-info (mode too), timestamps...
    shutil.copy2(source, target)
    # copy owner and group
    st = os.stat(source)
    os.chown(target, st[stat.ST_UID], st[stat.ST_GID])
def atomic_write(file_contents, target_file_path, mode="w"):
    """Write to a temporary file and rename it to avoid file corruption.
    Attribution: #therightstuff, #deichrenner, #hrudham
    :param file_contents: contents to be written to file
    :param target_file_path: the file to be created or replaced
    :param mode: the file mode defaults to "w", only "w" and "a" are supported
    """
    # Use the same directory as the destination file so that moving it across
    # file systems does not pose a problem.
    temp_file = tempfile.NamedTemporaryFile(
        delete=False,
        dir=os.path.dirname(target_file_path))
    # close the handle right away; the file is reopened by name below
    temp_file.close()
    try:
        # preserve file metadata if it already exists
        if os.path.exists(target_file_path):
            copy_with_metadata(target_file_path, temp_file.name)
        with open(temp_file.name, mode) as f:
            f.write(file_contents)
            f.flush()
            os.fsync(f.fileno())
        os.replace(temp_file.name, target_file_path)
    finally:
        # clean up the temporary file if the replace did not happen
        if os.path.exists(temp_file.name):
            try:
                os.unlink(temp_file.name)
            except OSError:
                pass
With this code, copying a file atomically is as simple as reading it into a variable and then sending it to atomic_write.
The comments should provide a good idea of what's going on but I also wrote up this more complete explanation on Medium for anyone interested.

Why is tempfile using DOS 8.3 directory names on my XP box?

>>> import tempfile
>>> tempfile.mkstemp()
(3, 'c:\\docume~1\\k0811260\\locals~1\\temp\\tmpk6tpd3')
It works, but looks a bit strange, and the actual temporary file name is more than 8 letters anyway.
Why doesn't it use long file names instead?

mkstemp uses the environment variables TMPDIR, TEMP or TMP (the first one that is set) to determine where to put your temporary file. One of these is probably set to c:\docume~1\k0811260\locals~1\temp on your system. Run
echo %tmp%
(and likewise for the other variables) in a command window ("DOS box") to find out for sure.
The short names are, in fact, a good thing, because some naïve modules/programs (e.g., those that call external OS commands) may get confused when a directory name contains a space, due to quoting issues.
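The same check can be done from Python itself. A small sketch, assuming Python 3; the helper name temp_dir_settings is made up for illustration:

```python
import os
import tempfile


def temp_dir_settings():
    """Return (name, value) pairs for the variables tempfile checks, in order."""
    return [(name, os.environ.get(name)) for name in ("TMPDIR", "TEMP", "TMP")]


for name, value in temp_dir_settings():
    print(name, "=", value)
# gettempdir() reports the directory that mkstemp() will use by default.
print("tempfile will use:", tempfile.gettempdir())
```

If one of the variables holds the 8.3 form of the path, that is where the short directory names in the mkstemp() result come from.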

How to safely write to a file?

Imagine you have a library for working with some sort of XML file or configuration file. The library reads the whole file into memory and provides methods for editing the content. When you are done manipulating the content you can call a write to save the content back to file. The question is how to do this in a safe way.
Overwriting the existing file (starting to write to the original file) is obviously not safe. If the write method fails before it is done you end up with a half written file and you have lost data.
A better option would be to write to a temporary file somewhere, and when the write method has finished, you copy the temporary file to the original file.
Now, if the copy somehow fails, you still have correctly saved data in the temporary file. And if the copy succeeds, you can remove the temporary file.
On POSIX systems I guess you can use the rename system call which is an atomic operation. But how would you do this best on a Windows system? In particular, how do you handle this best using Python?
Also, is there another scheme for safely writing to files?

Python's documentation notes that, on POSIX systems, a successful os.rename() is an atomic operation. On Windows, os.rename() raises an error if the destination already exists; os.replace() (Python 3.3 and later) overwrites the destination on both platforms. So in your case, writing the data to a temporary file and then renaming it over the original file would be quite safe.
Another way could work like this. Let the original file be abc.xml:
1. Create abc.xml.tmp and write the new data to it.
2. Rename abc.xml to abc.xml.bak.
3. Rename abc.xml.tmp to abc.xml.
4. After the new abc.xml is properly in place, remove abc.xml.bak.
This way you always have abc.xml.bak, which you can use to restore the original if anything goes wrong with the temporary file or with moving it into place.
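The backup scheme could be sketched in Python roughly as follows. This is an illustration only, assuming Python 3.3 or later for os.replace(); the function name save_with_backup and the fixed ".tmp"/".bak" suffixes are choices made for the example:

```python
import os


def save_with_backup(path, data):
    """Write data to path via a .tmp file, keeping a .bak copy until done."""
    tmp_path = path + ".tmp"
    bak_path = path + ".bak"
    # Write the new data to the temporary file and push it to disk.
    with open(tmp_path, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    had_original = os.path.exists(path)
    # Move the original out of the way, keeping it as the backup.
    if had_original:
        os.replace(path, bak_path)
    # Move the new file into place.
    os.replace(tmp_path, path)
    # Only now is it safe to drop the backup.
    if had_original:
        os.remove(bak_path)
```

If the process dies between the two renames, path is briefly missing but the .bak file still holds the old contents, which is the recovery property the scheme relies on.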

If you want to be POSIXly correct and safe, you have to:
1. Write to a temporary file.
2. Flush and fsync the file (or fdatasync).
3. Rename it over the original file.
Note that calling fsync has unpredictable effects on performance -- Linux on ext3 may stall on disk I/O for whole seconds as a result, depending on other outstanding I/O.
Notice that rename is not an atomic operation in POSIX -- at least not in relation to file data, as you might expect. However, most operating systems and filesystems will work this way. But it seems you missed the very large Linux discussion about ext4 and filesystem guarantees about atomicity. I don't know exactly where to link, but here is a start: ext4 and data loss.
Notice, however, that on many systems rename will be as safe in practice as you expect. It is, in a way, not possible to get both performance and reliability across all possible Linux configurations!
With a write to a temporary file, then a rename of the temporary file, one would expect that the operations are dependent and would be executed in order.
The issue, however, is that most, if not all, filesystems separate metadata and data. A rename is only metadata. It may sound horrible to you, but filesystems value metadata over data (take journaling in HFS+ or ext3/ext4, for example)! The reason is that metadata is lighter, and if the metadata is corrupt, the whole filesystem is corrupt -- the filesystem must of course preserve itself first, then the user's data, in that order.
Ext4 did break the rename expectation when it first came out; however, heuristics were added to resolve it. The issue is not a failed rename, but a successful rename. Ext4 might successfully register the rename but fail to write out the file data if a crash comes shortly thereafter. The result is then a 0-length file and neither the original nor the new data.
So in short, POSIX makes no such guarantee. Read the linked Ext4 article for more information!

In the Win API I found a quite nice function, ReplaceFile, that does what its name suggests, even with an optional backup. There is always the DeleteFile/MoveFile combo as well.
In general what you want to do is really good. And I cannot think of any better write scheme.

A simplistic solution: use tempfile to create a temporary file and, if writing succeeds, just rename the file to your original configuration file.
Note that rename is not atomic across filesystems. You'll have to resort to a slight workaround (e.g. tempfile on target filesystem, followed by a rename) in order to be really safe.
For locking a file, see portalocker.

The standard solution is this.
1. Write a new file with a similar name, X.ext# for example.
2. When that file has been closed (and perhaps even read and checksummed), do two renames:
   - X.ext (the original) to X.ext~
   - X.ext# (the new one) to X.ext
3. (Only for the crazy paranoids) Call the OS sync function to force dirty buffer writes.
At no time is anything lost or corruptible. The only glitch can happen during the renames, but even then you haven't lost or corrupted anything: the original is recoverable right up until the final rename.

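Per RedGlyph's suggestion, I've added an implementation of ReplaceFile that uses ctypes to access the Windows APIs. I first added this to jaraco.windows.api.filesystem.
from ctypes import windll  # only available on Windows
from ctypes.wintypes import BOOL, DWORD, LPVOID, LPWSTR
ReplaceFile = windll.kernel32.ReplaceFileW
ReplaceFile.restype = BOOL
ReplaceFile.argtypes = [
    LPWSTR,  # file to be replaced
    LPWSTR,  # replacement file
    LPWSTR,  # backup file name
    DWORD,   # replace flags
    LPVOID,
    LPVOID,
]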
REPLACEFILE_WRITE_THROUGH = 0x1
REPLACEFILE_IGNORE_MERGE_ERRORS = 0x2
REPLACEFILE_IGNORE_ACL_ERRORS = 0x4
I then tested the behavior using this script.
from jaraco.windows.api.filesystem import ReplaceFile
import os
open('orig-file', 'w').write('some content')
open('replacing-file', 'w').write('new content')
ReplaceFile('orig-file', 'replacing-file', 'orig-backup', 0, 0, 0)
assert open('orig-file').read() == 'new content'
assert open('orig-backup').read() == 'some content'
assert not os.path.exists('replacing-file')
While this only works in Windows, it appears to have a lot of nice features that other replace routines would lack. See the API docs for details.

There's now a codified, pure-Python, and I dare say Pythonic solution to this in the boltons utility library: boltons.fileutils.atomic_save.
Just pip install boltons, then:
from boltons.fileutils import atomic_save
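# atomic_save opens the file in binary mode by default; text_mode=True allows writing str
with atomic_save('/path/to/file.txt', text_mode=True) as f:
    f.write('this will only overwrite if it succeeds!\n')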
There are a lot of practical options, all well-documented. Full disclosure, I am the author of boltons, but this particular part was built with a lot of community help. Don't hesitate to drop a note if something is unclear!

You could use the fileinput module to handle the backing-up and in-place writing for you:
import fileinput
for line in fileinput.input(filename, inplace=True, backup='.bak'):
    # inplace=True causes the original file to be moved to a backup
    # standard output is redirected to the original file.
    # backup='.bak' specifies the extension for the backup file.
    # manipulate line
    newline = process(line)
    print(newline)
If you need to read in the entire contents before you can write the new lines,
then you can do that first, then print the entire new contents with
newcontents = process(contents)
for line in fileinput.input(filename, inplace=True, backup='.bak'):
    print(newcontents)
    break
If the script ends abruptly, you will still have the backup.
