Python open() vs. .close()

Regarding syntax in Python
Why do we use open("file") to open a file, but not "file".close() to close it?
Why isn't it "file".open(), or conversely close("file")?

It's because open() is a function, and .close() is an object method. "file".open() doesn't make sense, because it implies that open() is a class or instance method of the string "file". Not all strings are valid files or devices to be opened, so it would be ambiguous how the interpreter should handle "not a file-like device".open(). We don't use "file".close() for the same reason.
close("file") would require a lookup of the file name, then another lookup to see if there are file handles, owned by the current process, attached to that file. That would be very inefficient and probably has hidden pitfalls that would make it unreliable (for example, what if it's not a file, but a TTY device instead?). It's much faster and simpler to just keep a reference to the opened file or device and close the file through that reference (also called a handle).
Many languages take this approach:
f = open("file") # open a file and return a file object or handle
# stuff...
close(f) # close the file, using the file handle or object as a reference
This looks similar to your close("file") construct, but don't be fooled: it's closing the file through a direct reference to it, not the file name as stored in a string.
The Python developers have chosen to do the same thing, but it looks different because they have implemented it with an object-oriented approach instead. Part of the reason for this is that Python file objects have a lot of methods available to them, such as read(), flush(), seek(), etc. If we used close(f), then we would have to either change all of the rest of the file object methods to functions, or let it be one random function that behaves differently from the rest for no good reason.
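In Python that pattern looks roughly like this (a minimal sketch; "file" is just a placeholder name, as above):
f = open("file")    # factory-like function returns a file object
data = f.read()     # every further operation is a method on that object
f.close()           # including closing it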
TL;DR
The design of open() and file.close() is consistent with OOP principles and good file reference practices. open() is a factory-like function that creates objects that reference files or other devices. Once the object is created, all other operations on that object are done through class or instance methods.

Normally you shouldn't call close() explicitly at all; use open(file) as a context manager (as in the sketch below), so the file handle is also closed if an exception happens. End of problem :-)
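A minimal sketch of that pattern ("example.txt" is just an illustrative name):
with open("example.txt") as f:   # the context manager closes f for us
    data = f.read()
# f is closed here, even if f.read() raised an exception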
But to actually answer your question: I assume the reason is that open supports many options, and the class it returns differs depending on those options (see also the io module). So it would simply be much more complicated for the end user to remember which class he wants and then call the open method of the right class himself. Note that you can also pass an "integer file descriptor of the file to be wrapped" to open; that would mean that besides a str.open() method you would also need an int.open(). That would be really bad OO design, and confusing too. I wouldn't care to guess what kind of questions would be asked on StackOverflow about that ("door".open(), (1).open())...
However, I must admit that there is a pathlib.Path.open method. But if you have a Path, it isn't ambiguous anymore.
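For illustration (the file name is hypothetical):
from pathlib import Path

p = Path("example.txt")
with p.open("w") as f:   # unambiguous: a Path always denotes a filesystem path
    f.write("hello")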
As to a close() function: Each instance will have a close() method already and there are no differences between the different classes, so why create an additional function? There is simply no advantage.

Only slightly less new than you, but I'll give this one a go. Opening and closing are pretty different actions in a language like Python. When you open a file, what you are really doing is creating an object within your application that represents the file: the open() function informs the OS that the file has been opened and creates an object that Python can use to read from and write to the file. When it comes time to close the file, your app needs to tell the OS that it is done with the file and dispose of the object that represented the file from memory, and the easiest way to do that is with a method on the object itself. Also note that a syntax like "file".open would require the string type to include methods for opening files, which would be a very strange design and would require a lot of extensions on the string type for anything else you wanted to implement with that syntax. close(file) would make a bit more sense, but it would still be a clunky way of releasing that object and letting the OS know the file is no longer open, and you would be passing a variable file representing the object created when you opened the file rather than a string pointing to the file's path.

In addition to what has been said, I quote the Python changelog entry that removed the built-in file type. It (briefly) explains why the class-constructor approach using the file type (available in Python 2) was removed in Python 3:
Removed the file type. Use open(). There are now several different kinds of streams that open can return in the io module.
Basically, while file("filename") would create an instance of file, open("filename") can return instances of different stream classes, depending on the mode.
https://docs.python.org/3.4/whatsnew/3.0.html#builtins
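You can see this yourself; the concrete class returned depends on the mode and buffering (the file names here are only illustrative):
open("example.bin", "wb").close()       # make sure the file exists first
with open("example.bin", "rb") as f:
    print(type(f))                      # <class '_io.BufferedReader'>
with open("example.bin", "rb", buffering=0) as f:
    print(type(f))                      # <class '_io.FileIO'> (unbuffered raw stream)
with open("example.txt", "w") as f:
    print(type(f))                      # <class '_io.TextIOWrapper'>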

Related

How can I save dynamically generated modules and reimport them from files?

I have an application that dynamically generates a lot of Python modules with class factories, to eliminate redundant boilerplate that makes the code hard to debug across similar implementations. It works well, except that the dynamic generation of the classes across the modules (hundreds of them) takes more time than simply importing from a file. So I would like to find a way to save the modules to files after generation (unless reset), then load from those files to cut down on bootstrap time for the platform.
Does anyone know how I can save/export auto-generated Python modules to a file for re-import later? I already know that pickling and exporting as a JSON object won't work, because the classes make use of thread locks and other dynamic state variables, and the classes must already be defined before they can be pickled. I need to save the actual class definitions, not instances. The classes are defined with the type() function.
If you have ideas or knowledge on how to do this, I would really appreciate your input.
You’re basically asking how to write a compiler whose input is a module object and whose output is a .pyc file. (One plausible strategy is of course to generate a .py and then byte-compile that in the usual fashion; the following could even be adapted to do so.) It’s fairly easy to do this for simple cases: the .pyc format is very simple (but note the comments there), and the marshal module does all of the heavy lifting for it. One point of warning that might be obvious: if you’ve already evaluated, say, os.getcwd() when you generate the code, that’s not at all the same as evaluating it when loading it in a new process.
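As a rough sketch of that idea, assuming the CPython 3.7+ .pyc header layout (the source string and file names here are made up):
import importlib.util, marshal, struct, time

source = "x = 1\ndef f():\n    return x\n"          # stand-in for generated code
code = compile(source, "generated_mod.py", "exec")   # a module-level code object

with open("generated_mod.pyc", "wb") as out:
    out.write(importlib.util.MAGIC_NUMBER)            # 4-byte magic number
    out.write(struct.pack("<I", 0))                   # flags: timestamp-based pyc
    out.write(struct.pack("<I", int(time.time())))    # source mtime
    out.write(struct.pack("<I", len(source)))         # source size
    marshal.dump(code, out)                           # the marshalled code object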
The “only” other task is constructing the code objects for the module and each class: this requires concatenating a large number of boring values from the dis module, and will fail if any object encountered is non-trivial. These might be global/static variables/constants or default argument values: if you can alter your generator to produce modules directly, you can probably wrap all of these (along with anything else you want to defer) in function calls by compiling something like
my_global = (lambda: open(os.devnull, 'w'))()
so that you actually emit the function and then a call to it. If you can’t so alter it, you’ll have to have rules to recognize values that need to be constructed in this fashion so that you can replace them with such calls.
Another detail that may be important is closures: if your generator uses local functions/classes, you’ll need to create the cell objects, perhaps via “fake” closures of your own:
def cell(x): return (lambda: x).__closure__[0]
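For illustration, such cells can be fed to types.FunctionType when rebuilding a closure (the helper functions and values here are made up):
import types

def cell(x):
    return (lambda: x).__closure__[0]

def outer():
    y = 10
    def inner():
        return y
    return inner

inner_code = outer().__code__                    # code object with free variable y
rebuilt = types.FunctionType(inner_code, globals(), "rebuilt",
                             None, (cell(42),))  # supply our own cell for y
print(rebuilt())                                 # prints 42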

Use case for low-level os.open, os.fdopen, and friends?

In Python 3.2 (and other versions), the documentation for os.open states:
This function is intended for low-level I/O. For normal usage, use the built-in function open(), which returns a file object with read() and write() methods (and many more). To wrap a file descriptor in a file object, use fdopen().
And for fdopen():
Return an open file object connected to the file descriptor fd. This is an alias of open() and accepts the same arguments. The only difference is that the first argument of fdopen() must always be an integer.
This comment in a question on the difference between io.open and os.open (that difference is entirely clear to me; I always use io.open, never os.open) asks why anyone would choose Python for low-level I/O, but doesn't really get an answer.
My question is very similar to the comment-question: in Python, what is the use case of low-level I/O through os.open, os.fdopen, os.close, os.read, etc.? I used to think it was needed to daemonise a process, but I'm not so sure anymore. Is there any task that can only be performed using low-level I/O, and not with the higher-level wrappers?
I use it when I need to use O_CREAT | O_EXCL to atomically create a file, failing if the file exists. You can't check for file existence then create the file if your test found that it does not exist, because that will create a race condition where the file could be created in the interim period between your check and creation.
Briefly looking at the link you provided, I do believe pidfile creation has a race condition.
In Python 3.3, there is a new 'x' mode added to open() that seems to do this. I haven't tried it, though.
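A sketch of both approaches (the file names and mode bits are only illustrative):
import os

# Low-level: fails with FileExistsError if the pid file already exists.
fd = os.open("app.pid", os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
with os.fdopen(fd, "w") as f:
    f.write(str(os.getpid()))

# Python 3.3+: the 'x' mode of the built-in open() does the same check atomically.
with open("app.lock", "x") as f:
    f.write("locked")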
Major Differences:
Low-level access to files is unbuffered
Low-level access is not portable
Low-level access allows more fine-grained control, e.g. whether or not to block upon read
Use cases for low level io:
The file is a block device
The file is a socket
The file is a tty
...
In all these cases you might wish to have that more fine grained control (over buffering and blocking behavior).
You probably never will need the low-level functions for regular files. I think most of the time the use case will be some device driver stuff. However, that would be better done in C. But I can see the use case for Python as well, e.g. for fast prototyping of device drivers.
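For example, a non-blocking, unbuffered read through the low-level calls on a POSIX system (the device path is only illustrative):
import os

fd = os.open("/dev/tty", os.O_RDONLY | os.O_NONBLOCK)   # unbuffered, non-blocking
try:
    data = os.read(fd, 1024)        # returns at most 1024 bytes immediately
except BlockingIOError:
    data = b""                      # nothing available right now
finally:
    os.close(fd)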

Make an object a "character buffer object"

I have a custom class which has some string data. I wish to be able to save this string data to a file, using a file handle's write() method. I have implemented __str__(), so I can do str(myobject); what is the equivalent method for making Python consider my object to be a character buffer object?
If you are trying to use your object with library code that expects to be able to write what you give it to a file, then you may have to resort to implementing a "duck file" class that acts like a file but supports your stringable object. Unfortunately, file is not a type that you can subclass easily, at least as of Python 2.6. You will have to implement enough of the file protocol (write, writelines, tell, etc.) to allow the library code to work as expected.
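A minimal sketch of such a "duck file" (the method set here is just what a typical consumer might call; extend it as needed):
class DuckFile(object):
    """Accepts any str()-able object and writes it to a real file."""
    def __init__(self, path):
        self._f = open(path, 'w')

    def write(self, obj):
        self._f.write(str(obj))

    def writelines(self, seq):
        for item in seq:
            self.write(item)

    def flush(self):
        self._f.flush()

    def close(self):
        self._f.close()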
There isn't a single function, but a whole range of them - read, seek, etc.
Why don't you subclass StringIO.StringIO, which is already a string buffer?

What is the minimal subset of file methods I need to implement to get the full python file interface?

Python has the marvelous collections module that has tools to allow you to implement a full dict (for example) from a minimal set of methods. Is there a similar thing for the file interface in Python? If not, what would you recommend as a minimal set of methods to implement for a file-like object for duck-typing purposes?
And how do you deal with things that would like to use your file-like object in a with statement (like you can with a regular file), or that want to iterate over it (like you can with a regular file), or that want to be able to call readline or readlines and have it do something intelligent and useful (like you can with a regular file)? Do you have to implement them all yourself? Or are there better options?
I know I can implement each and every single one of these myself, by hand. But the collections interface allows me to implement a dict by implementing just __len__, __iter__, __setitem__, and __getitem__. I get pop, popitem, clear, update, setdefault, __contains__, keys, items, values, get, __eq__, and __ne__ all for free. There is a minimal interface for dict defined, and if I implement it, I get the full dict interface, with all of the extra methods implemented in terms of the minimal interface.
Similarly, I would like to know what the minimal interface for file is that I have to implement in order to get the full interface. Is there a way to get __enter__, __exit__, readline, readlines, __iter__ and next if I just implement read, write and close, or do I have to implement everything myself by hand each and every time I want the full file interface?
The with statement requires a context manager:
http://docs.python.org/library/stdtypes.html#typecontextmanager
The file type is fully defined:
http://docs.python.org/library/stdtypes.html#file-objects
Seems pretty simple.
The documentation lists the methods and attributes of a file and a context manager. Implement those.
What more information do you need?
http://docs.python.org/library/contextlib.html?highlight=context%20manager
If you want all the methods to work, you have to implement all the methods. Unlike the collections, there is no abstract base class for files.
I would look at io.IOBase[1] and io.RawIOBase for >2.6 compatibility. This will keep you moving forward with 3.x (io implements the 3.x file interface).
[1] http://docs.python.org/library/io.html#i-o-base-classes
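As a sketch of that approach: implement readinto() and readable() on an io.RawIOBase subclass and the base classes supply read(), readline(), readlines(), iteration, and the context-manager protocol (the class and data here are made up):
import io

class RepeatReader(io.RawIOBase):
    """Pretends to be a binary file containing `data` repeated `count` times."""
    def __init__(self, data=b"hello\n", count=3):
        self._buf = data * count

    def readable(self):
        return True

    def readinto(self, b):
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n

with RepeatReader() as f:       # __enter__/__exit__ come from IOBase
    for line in f:              # iteration and readline() come for free too
        print(line)             # prints b'hello\n' three times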
You kind of answered it yourself. While there is no set of "special" methods you need to implement for the file interface, you can do it just by providing a couple of methods normally associated with files. Duck typing takes care of the rest.
You only really need a read and/or a write method (depending on whether you want it to be readable and/or writable) that behaves the same as on a normal file object. You can have a look at the Python file object reference to see all of the methods of a file object. Basically, the more you implement, the more situations your class will work in as a stand-in for a file. (For example, if you implement seek, then it will work in any function that performs seeking on a file.) Note that there is a continuum here: there is no absolute "it supports the file protocol or it doesn't." In fact, there is no way to work 100% in all the places that accept file-like objects, because some code will access low-level details of the real file type, and yours won't work there.
In summary, any class that implements read and write will work in most situations that require a "file-like object".
(Note that the special method names like __getitem__ for dicts are really not special, except that they are used by special syntax like [key] -- that's why dict has special method names and file does not.)

Overhead of passing around a file object instead of file name?

I have a method that detects what file it should open based on input, opens the file, then returns the file object.
def find_and_open(search_term):
    # ... logic to find file
    return open(filename, 'r')
I like this way because it hides the most implementation from the caller. You give it your criteria, it spits out the file object. Why bother with string paths if you're just going to open the file anyway?
However, in other Python projects I tend to see such a method return a string of the filepath, instead of the fileobject itself. The file is then opened at the very last minute, read/edited, and closed.
My questions are:
from a performance standpoint, does passing around file objects carry more overhead? I suppose a reference is a reference no matter what it points to, but perhaps there's something going on in the interpreter that makes a string reference faster to pass around than a file reference?
From a purely "Pythonic" standpoint, does it make more sense to return the file object, or the String path (and then open the file as late as possible)?
Performance is unlikely to be an issue. Reading and writing to disk are orders of magnitude slower than reading from RAM, so passing a pointer around is very unlikely to be the performance bottleneck.
From the python docs:
It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent try-finally blocks ...
Note that you can both use with to open a file and pass the file object around to other functions, either by nesting functions or using yield. Though I would consider this less pythonic than passing a file string around in most cases.
Simple is better than complex.
Flat is better than nested.
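A sketch of the first option mentioned above (the path and helper function are made up):
def count_lines(f):
    # Works on any readable file-like object; the caller owns the lifetime.
    return sum(1 for _ in f)

with open("data.txt") as f:
    n = count_lines(f)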
You might also be interested in pathlib.
