Are open(file, "wt") and open(file, "rt") different objects? - python

When you do:
file = open("my file","wt")
and
file = open("my file" , "rt")
These both create file objects that we call file methods on. But are they creating different file objects? And if they are creating different file objects, would it be fair to say that the "wt" one is mutable, while the "rt" one is immutable?

No, that would not be fair to say. You are creating instances of the same standard file type, which proxies file manipulation calls to the operating system. The mode defines what the operating system will let you do.
It doesn't matter if you use the same filename or different filenames; the OS doesn't care, and neither does Python; the open file objects are distinct.
The Python object itself is immutable; you cannot change the mode, filename or other attributes after the fact.
Note that by adding + to the mode, you can both read and write to the file object; w+ will truncate the file first, while r+ would not.
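For example (a minimal sketch; demo.txt is a hypothetical file):
with open("demo.txt", "w") as f:
    f.write("hello\n")
with open("demo.txt", "r+") as f:   # r+: read/write, keeps existing contents
    print(f.read())                  # prints 'hello'
with open("demo.txt", "w+") as f:   # w+: read/write, but truncates on open
    print(f.read())                  # prints nothing: the file was emptied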

At the OS level, they would be created as two distinct file descriptors. They would (likely) point to the same data in the VFS/cache, but can be operated independently.
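A quick sketch showing the two handles are distinct (the file name comes from the question; run it in a writable directory):
a = open("my file", "wt")   # creates/truncates the file
b = open("my file", "rt")   # an independent handle on the same path
print(a is b)                    # False: two distinct Python objects
print(a.fileno() == b.fileno())  # False: two distinct OS-level descriptors
print(a.mode, b.mode)            # 'wt' 'rt', fixed for the life of each object
a.close()
b.close()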

Related

How to extract the full path from a file while using the "with" statement?

I'm trying, just for fun, to understand if I can extract the full path of my file while using the with statement (python 3.8)
I have this simple code:
with open('tmp.txt', 'r') as file:
    print(os.path.basename(file))
But I keep getting an error that it's not a suitable type format.
I've been trying also with the relpath, abspath, and so on.
It says that the input should be a string, but even after casting it to a string, I get something that I can't manipulate.
Perhaps there isn't an actual way to extract that full path name, but I think there is. I just can't find it, yet.
You could try:
import os
with open("tmp.txt", "r") as file_handle:
    print(os.path.abspath(file_handle.name))
The functions in os.path accept strings or path-like objects. You are attempting to pass in a file object instead. There are lots of reasons the types aren't interchangeable.
Since you opened the file for text reading, file is an instance of io.TextIOWrapper. This class is just an interface that provides text encoding and decoding for some underlying data. It is not associated with a path in general: the underlying stream can be a file on disk, but also a pipe, a network socket, or an in-memory buffer (like io.StringIO). None of the latter are associated with a path or filename in the way that you are thinking, even though you would interface with them as through normal file objects.
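To illustrate (a minimal sketch; tmp.txt comes from the question, and io.StringIO stands in for a stream with no path):
import io
import os
with open("tmp.txt", "w") as fh:
    print(os.path.abspath(fh.name))   # an on-disk file has a usable name
buf = io.StringIO("hello")
print(hasattr(buf, "name"))           # False: an in-memory stream has no filename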
If your file-like object is backed by an io.FileIO, it will have a name attribute that keeps track of this information for you. Other sources of data will not. Since the example in your question is backed by FileIO, you can do
with open('tmp.txt', 'r') as file:
    print(os.path.abspath(file.name))
The full file path is given by os.path.abspath.
That being said, since file objects don't generally care about file names, it is probably better for you to keep track of that info yourself, in case one day you decide to use something else as input. Python 3.8+ allows you to do this without changing your line count using the walrus operator:
with open((filename := 'tmp.txt'), 'r') as file:
    print(os.path.abspath(filename))

Do I need to close the OS-level handle returned to me by tempfile.mkstemp if not using it?

I need to create a temporary file to write some data out to in Python 3. The file will be written to via a separate module which deals with opening the file from a path given as a string.
I'm using tempfile.mkstemp() to create this temporary file and according to the docs:
mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) and the absolute pathname of that file, in that order.
Given I'm not going to be using the open OS-level file handle given to me, do I need to close it? I understand about regular Python file handles and closing them but I'm not familiar with OS-level file descriptors/handles.
So is this better:
fd, filename = tempfile.mkstemp()
os.close(fd)
Or can I simply just do this:
_, output_filename = tempfile.mkstemp()
The returned file descriptor is not a file object; the garbage collector will not close it for you.
You should use:
fd, filename = tempfile.mkstemp()
os.close(fd)
The returned file descriptor is also useful for avoiding race conditions in which the filename is replaced with a symbolic link, for example one pointing at a file that the attacker cannot read but you can, which can result in data exposure.
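If you do intend to write through the handle rather than discard it, a common pattern is to wrap the descriptor in a regular file object with os.fdopen (a sketch; the payload and the cleanup are illustrative):
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # os.fdopen wraps the raw descriptor; closing the file object closes the fd
    with os.fdopen(fd, "w") as tmp:
        tmp.write("some data\n")
finally:
    os.remove(path)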

Dill deletes object when using "load"

I'm having an error that is driving me nuts. I generate some numerical simulation data sim_data.dill and save it to a directory on my computer using
with open(os.path.join(original_directory, 'sim_data.dill'), 'w') as f:
    dill.dump(outputs, f)
This data is about 1 GB and takes a while to generate. Now, I copied that file from original_directory to new_directory, and when I try to load it from a different program using
simfile = '/new_directory/sim_data.dill'
with open(simfile, 'r') as f:
    outputs = dill.load(f)
One of two things happens:
1. The program says the file is missing, with UnpicklingError: [Errno 2] No such file or directory: .../original_directory/sim_data.dill. This means dill puts original_directory in the file's metadata and refuses to open the file once it has been moved; truly appalling behavior.
2. When I copy the file back to original_directory, trying to open it gives an EOFError, and dill changes the file to zero bytes, essentially deleting it. This is even worse.
I can read the file just fine using a standard with open(simfile, 'r') as f: print(f.readlines()), but obviously this does not help when trying to recover the internal class structure of the files.
Apparently this is normal behavior for dill; please see:
https://github.com/uqfoundation/dill/issues/296
Paraphrasing: the file location is part of the file handle to be pickled, and so unpickling it without that information is impossible. This means, apparently, that if you save a .dill file in one location, move the file manually (for example to a more convenient directory), and then try to open it again, it won't work.
In terms of the deletion issue, the author of the post above recommends using fmode=FMODE_PRESERVEDATA or one of the other file modes listed at
https://github.com/matsjoyce/dill/blob/087c00899ef55f31d36e7aee51a958b17daf8c91/dill/dill.py#L136-L145
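In mainline dill the analogous knob is the fmode argument to dill.dump (a sketch, assuming outputs is the object from the question and that your dill version accepts the fmode keyword and exposes dill.FILE_FMODE; note also that dill pickles are binary, so the file should be opened in 'wb'/'rb'):
import dill

# FILE_FMODE pickles a file handle together with the file's contents, so the
# unpickled handle no longer depends on the original path still existing.
with open('sim_data.dill', 'wb') as f:
    dill.dump(outputs, f, fmode=dill.FILE_FMODE)

with open('sim_data.dill', 'rb') as f:
    outputs = dill.load(f)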

Change open file access mode

Is it possible to change file access mode after the file has been opened?
f=open(my_file, 'r')
change f to be able to write to it, or to declare that the file should be opened in universal newline mode?
Changing a file descriptor's access mode is not supported on Linux or Windows (there is no POSIX function to change the open mode, on Linux at least), so it is not possible to change it once the file descriptor has been created. (Some OS-specific tricks exist, but I wouldn't recommend them.)
You will need to reopen the file with the other permissions.
While there doesn't seem to be any way of changing the access mode on the underlying descriptor, you could do some of the work at the Python object level if you want to restrict access (if you want to make a read-only file writable, you're out of luck). Something like this:
f=open(my_file, 'w+')
f.write = None
f.writelines = None
# etc...
If you're using Python 2, you would need to wrap the file object to be able to disable the writing methods.
While you could restore such a modified file object to a writable state (and thereby circumvent the block, which by the way is almost always possible in Python), it could be made to emulate the behaviour of a read-only file, which would be good enough for many cases.
You can open the file as follows to be able to read and write:
f = open(my_file, 'r+')
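For instance (a minimal sketch, assuming my_file is the path from the question and the file already exists):
with open(my_file, 'r+') as f:
    data = f.read()          # read the existing contents
    f.write('appended\n')    # the position is now at the end, so this appends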
Assuming you've closed the file, just reassign to a new file object:
f = open(my_file, 'w')
Given that you have a file object f_r that was opened only for reading, you can use os.fdopen() to get file object f_w that is associated with the same file, but has different mode:
f_r = open(filename, "r")
f_w = os.fdopen(f_r.fileno(), "a+")
f_w.write("Here I come\n")
However, this path can lead to misery and suffering when misused. Since file objects do some buffering (unless it is disabled), simultaneous use of both f_r and f_w can cause unexpected results. Also, reopening <stdin> or <stdout> may or may not do what you need or expect.
Here's how I solved this problem. For context, in my case, the file was only stored in memory, not on the disk, so I wasn't able to just reopen it from there.
from io import StringIO
...
bytes = file.read()               # 'file' here is the original binary file object
string = bytes.decode("utf-8")    # or whatever encoding you want to use
file = StringIO(string)           # an in-memory stream that is both readable and writable
If you do not want to reopen it, use:
f.mode = "mode-to-change-to"  # w, a, r, etc.
for the mode,
f.name = "file_name"
for the name, and:
f.encoding = "encoding"  # the default is UTF-8
for the encoding.
Edit
You should use:
with open("filename", "mode") as f:
    # do something
    f.mode = "another-mode"
    # do something else
so that the file closes automatically when you are finished.

Is it possible to modify lines in a file in-place?

Is it possible to parse a file line by line, and edit a line in-place while going through the lines?
It can be simulated using a backup file as stdlib's fileinput module does.
Here's an example script that removes lines that do not satisfy some_condition from files given on the command line or stdin:
#!/usr/bin/env python
# grep_some_condition.py
import fileinput
for line in fileinput.input(inplace=True, backup='.bak'):
    if some_condition(line):
        print(line, end='')  # this goes to the current file
Example:
$ python grep_some_condition.py first_file.txt second_file.txt
On completion first_file.txt and second_file.txt files will contain only lines that satisfy some_condition() predicate.
The fileinput module has a very ugly API; I found a nicer module for this task, in_place. Example for Python 3:
import in_place
with in_place.InPlace('data.txt') as file:
    for line in file:
        line = line.replace('test', 'testZ')
        file.write(line)
The main differences from fileinput:
Instead of hijacking sys.stdout, a new filehandle is returned for writing.
The filehandle supports all of the standard I/O methods, not just readline().
Important Notes:
This solution deletes any line in the file that you do not write back with the file.write() call.
Also, if the process is interrupted, you lose any line in the file that has not already been re-written.
No. You cannot safely write to a file you are also reading, as any changes you make to the file could overwrite content you have not read yet. To do it safely you'd have to read the file into a buffer, updating any lines as required, and then re-write the file.
If you're replacing the content of the file byte-for-byte (i.e. if the text you are replacing is the same length as the new string you are replacing it with), then you can get away with it, but it's a hornet's nest, so I'd save yourself the hassle and just read the full file, replace the content in memory (or via a temporary file), and write it out again.
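A minimal sketch of that read-transform-rewrite approach (the file name and the 'old'/'new' strings are placeholders):
with open("data.txt", "r") as f:
    lines = f.readlines()

with open("data.txt", "w") as f:
    for line in lines:
        f.write(line.replace("old", "new"))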
If you only intend to perform localized changes that do not change the length of the part of the file that is modified (e.g. changing all characters to lower case), then you can actually overwrite the old contents of the file dynamically.
To do that, you can use random file access with the seek() method of a file object.
Alternatively, you may be able to use an mmap object to treat the whole file as a mutable string. Keep in mind that mmap objects may impose a maximum file-size limit in the 2-4 GB range on a 32-bit CPU, depending on your operating system and its configuration.
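Here is a sketch of the mmap approach, assuming data.txt exists and the replacement is exactly the same length as the text it replaces:
import mmap

with open("data.txt", "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        idx = mm.find(b"foo")
        if idx != -1:
            mm[idx:idx + 3] = b"bar"   # 'foo' -> 'bar': same length, file size unchanged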
You have to back up by the size of the line in bytes (note that Python 3 does not allow relative seeks on text-mode files, so open the file in binary mode). Assuming you used readline, you can take the length of the line and back up using:
file.seek(offset[, whence])
Set whence to os.SEEK_CUR and offset to -length.
See Python Docs or look at the manpage for seek.
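Putting that together (a sketch, assuming a binary-mode file so that the relative seek is allowed, and a same-length replacement):
import os

with open("data.txt", "r+b") as f:
    line = f.readline()
    f.seek(-len(line), os.SEEK_CUR)   # back up to the start of the line just read
    f.write(line.upper())             # overwrite with content of identical length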
