receive file object or file path - python

I know that in statically typed languages it's always better (from a software design standpoint) to receive a file object rather than a string representing a path. However, in a dynamic language like Python, where you can't see the type of a variable, what's the "correct" way to pass a file?
Isn't it problematic to pass the file object, since you need to remember to close it afterwards (which you probably won't, since you can't see the type)?

Ideally you would be using the with statement whenever you open a file, so closing will be handled by that.
with open('filepath', 'r') as f:
    myfunc(f)
otherstuff()  # f is now closed
From the documentation:
It is good practice to use the with keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way.
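If your function should accept either a path or an already-open file, one common pattern is a small context-manager wrapper. This is a minimal sketch (the helper name as_file and the hasattr(source, 'read') probe are illustrative, not from the original answer):
import contextlib

@contextlib.contextmanager
def as_file(source, mode='r'):
    # Already file-like: pass it through and let the caller
    # keep responsibility for closing it.
    if hasattr(source, 'read'):
        yield source
    # Otherwise treat it as a path: open it and close it here.
    else:
        with open(source, mode) as f:
            yield f

def myfunc_from(source):
    with as_file(source) as f:
        return f.read()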

Pass the file object just like any other type.
def f(myfile):
    myfile.write('asdf')

ofs = open(filepath, 'w')  # ofs is a file object
f(ofs)  # passes the file object
ofs.close()
You can also use a function that creates and returns the file object:
def f():
    return open(filepath, 'w')  # returns a file object

ofs = f()
ofs.write('something')
ofs.close()

However, in a dynamic language like python where you can't see the
type of the variable, what's the "correct" way to pass a file?
The short answer is - you don't.
In most object oriented languages, there is an object contract which guarantees that if the object has a method quack, it knows how to quack. Some languages are very strict in enforcing this contract (Java, for example) and others not so much.
In the end it comes down to one of Python's principles EAFP:
Easier to ask forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
LBYL = Look Before You Leap
What this means is that if your method is expecting a "file" (and you state this in your documentation), assume you are being passed a "file-like object". Try to execute a file operation on the object (like read() or close()) and catch the exception if it's raised.
One of the main points of the EAFP approach is that you may be passed an object that works like a file; in other words, the caller knows what they are doing. So if you spend time checking for exact types, you'll have code that doesn't work when it should. The burden is then on the caller to meet your "object contract". But what if they are working not with files but with an in-memory buffer (which has the same methods as a file)? Or a request object (which, again, has the same file-like methods)? You can't possibly check for all these variations in your code.
This is the preferred approach, as opposed to the LBYL approach, which would type-check first.
So, if your method's documentation states that it expects a file object, it should work with any object that is "file-like", but when someone passes it a string containing a file path, your method should raise an appropriate exception.
Also, and more importantly, you should avoid closing the object in your method, because it may not be a "file", as explained earlier. If you absolutely must, make sure the documentation for your method states this very clearly.
Here is an example:
def my_method(fobj):
    '''Writes to fobj, which is any file-like object,
    and returns the object.'''
    try:
        fobj.write('The answer is: {}\n'.format(42))
    except (AttributeError, TypeError):
        raise TypeError('Expected file-like object')
    return fobj
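A quick usage sketch (not part of the original answer) showing why the duck-typed version pays off: the same method accepts both a real file and an in-memory buffer.
import io

# Works with a real file; the caller opens and closes it.
with open('out.txt', 'w') as f:
    my_method(f)

# Works equally well with an in-memory buffer.
buf = io.StringIO()
my_method(buf)
print(buf.getvalue())  # The answer is: 42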

You can simply use file objects: when they are (automatically) garbage collected, the file will be closed for you.
File objects are implemented using C’s stdio package and can be created with the built-in open() function.

Related

Python: store expected Exceptions in function attributes

Is it Pythonic to store the expected exceptions of a function as attributes of the function itself, or is it just plain bad practice?
Something like this:
class MyCoolError(Exception):
    pass

def function(*args):
    """
    :raises: MyCoolError
    """
    # do something here
    if some_condition:
        raise MyCoolError

function.MyCoolError = MyCoolError
And then, in another module:
try:
    function(...)
except function.MyCoolError:
    # ...
Pro: anywhere I have a reference to my function, I also have a reference to the exception it can raise, and I don't have to import it explicitly.
Con: I "have" to repeat the name of the exception to bind it to the function. This could be done with a decorator (see the sketch below), but that adds complexity of its own.
EDIT
The reason I am doing this is that I attach some methods to some classes in an irregular way, where I think a mixin is not worth it. Let's call it "tailored added functionality". For instance, let's say:
Class A uses method fn1 and fn2
Class B uses method fn2 and fn3
Class C uses fn4 ...
And like this for about 15 classes.
So when I call obj_a.fn2(), I have to explicitly import the exception it may raise (which lives not in the module where classes A, B, or C are defined, but in another one where the shared methods live)... which I find a little annoying. Apart from that, the standard style in the project I'm working on mandates one import per line, so it gets pretty verbose.
In some code I have seen exceptions stored as class attributes, and I have found it pretty useful, like:
try:
    obj.fn()
except obj.MyCoolError:
    ...
I think it is not Pythonic. I also think that it does not provide much advantage over the standard way, which is to just import the exception along with the function.
There is a reason (besides helping the interpreter) why Python programs use import statements to state where their code comes from: it helps you find the code of the facilities (e.g. your exception in this case) you are using.
The whole idea has the smell of the declaration of exceptions as it is possible in C++ and partly mandatory in Java. There are discussions amongst the language lawyers whether this is a good idea or a bad one, and in the Python world the designers decided against it, so it is not Pythonic.
It also raises a whole bunch of further questions. What happens if your function A uses another function B which, later, is changed so that it can throw an exception (a valid thing in Python)? Are you willing to change your function A to reflect that (or catch it in A)? Where would you draw the line: is using int(text) to convert a string to an int reason enough to "declare" that a ValueError can be thrown?
All in all I think it is not Pythonic, no.

Accept different types in python function?

I have a Python function that does a lot of major work on an XML file.
When using this function, I want two options: either pass it the name of an XML file, or pass it a pre-parsed ElementTree instance.
I'd like the function to be able to determine what it was given.
Example:
def doLotsOfXmlStuff(xmlData):
    if xmlData is not an ET instance:  # pseudocode
        xmlData = ET.parse(xmlData)
    # do a bunch of stuff
    return stuff
The app calling this function may need to call it just once, or it may need to call it several times. Calling it several times and parsing the XML each time is hugely inefficient and unnecessary. Creating a whole class just to wrap this one function seems a bit overkill and would end up requiring some code refactoring. For example:
ourResults = doLotsOfXmlStuff(myObject)
would have to become:
xmlObject = XMLProcessingObjectThatHasOneFunction("data.xml")
ourResult = xmlObject.doLotsOfXmlStuff()
And if I had to run this on lots of small files, a class would be created each time, which seems inefficient.
Is there a simple way to detect the type of the variable coming in? I know a lot of Pythoners will say "you shouldn't have to check", but here's one good instance where you would.
In other, strongly typed languages I could do this with method overloading, but that's obviously not the Pythonic way of things...
The principle of "duck typing" is that you shouldn't care so much about the specific type of an object but rather you should check whether is supports the APIs in which you're interested.
In other words, if the object passed to your function through the xmlData argument has some method or attribute which is indicative of an ElementTree that's already been parsed, then just use those methods or attributes; if it doesn't have the necessary attribute, then you are free to pass it through some parsing.
So which functions/methods/attributes of the resulting ET are you looking to use? You can use hasattr() to check for them. Alternatively, you can wrap your call to any such functionality in a try: ... except AttributeError: block.
Personally, I think if not hasattr(...): is a bit cleaner. (If it doesn't have the attribute I want, then rebind the name to something which has been prepared, parsed, whatever, as I need it.)
This approach has advantages over isinstance() because it allows users of your functionality to pass references to objects from their own classes which have extended ET through composition rather than inheritance. In other words, if I wrap an ET-like object in my own class and expose the necessary functionality, then I should be able to pass a reference to your function and have you treat my object as if it were a "duck", even if it isn't a descendant of a duck. If you need feathers, a bill, and webbed feet, then just check for one of those and try to use the rest. I may be a black box containing a duck, with holes through which the feet, duck-bill, and feathers are accessible.
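Here is a minimal sketch of that duck-typed approach (assuming the standard xml.etree.ElementTree module; probing for getroot() is just one reasonable choice of distinguishing attribute):
import xml.etree.ElementTree as ET

def doLotsOfXmlStuff(xmlData):
    # Anything with a getroot() method is treated as an already-parsed
    # tree; everything else is assumed to be a path and parsed here.
    if not hasattr(xmlData, 'getroot'):
        xmlData = ET.parse(xmlData)
    root = xmlData.getroot()
    # ... do a bunch of stuff with root ...
    return root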
This is a fairly normal pattern (e.g. Python function that accepts file object or path). Just use isinstance (note that the check is against the ElementTree class, not the module):
import xml.etree.ElementTree as ET

def doLotsOfXmlStuff(xmlData):
    if not isinstance(xmlData, ET.ElementTree):
        xmlData = ET.parse(xmlData)
    ...
If you need to do cleanup (e.g. closing files) then calling your function recursively is OK:
def doLotsOfXmlStuff(xmlData):
    if not isinstance(xmlData, ET.ElementTree):
        xmlData = ET.parse(xmlData)
        ret = doLotsOfXmlStuff(xmlData)
        ...  # cleanup (or use a context manager)
        return ret
    ...
You can use isinstance to determine the type of a variable.
You can also put in an if statement to check the type and decide what to run from there:
if type(xmlData).__name__ == 'ElementTree':
    # do stuff
else:
    # do some other stuff
I think you can just compare the data types:
if xmlData.dtype == something:
    function1()
else:
    function2()

How to test if a file has been created by pickle?

Is there any way of checking if a file has been created by pickle? I could just catch exceptions thrown by pickle.load but there is no specific "not a pickle file" exception.
Pickle files don't have a header, so there's no standard way of identifying them short of trying to unpickle one and seeing if any exceptions are raised while doing so.
You could define your own enhanced protocol that included some kind of header by subclassing the Pickler() and Unpickler() classes in the pickle module. However this can't be done with the much faster cPickle module because, in it, they're factory functions, which can't be subclassed [1].
A more flexible approach would be to define your own independent classes that use corresponding Pickler() and Unpickler() instances from either one of these modules in their implementation.
Update
The last byte of all pickle files should be the pickle.STOP opcode, so while there isn't a header, there is effectively a very minimal trailer which would be a relatively simple thing to check.
Depending on your exact usage, you might be able to get away with supplementing that with something more elaborate (and longer than one byte), since any data past the STOP opcode in a pickled object's representation is ignored [2].
[1] Footnote [2] in the Python 2 documentation.
[2] Documentation for pickle.loads(), which also applies to pickle.load() since the latter is currently implemented in terms of the former.
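As a sketch of that trailer check (not from the original answer; pickle.STOP is the pickle module's constant for the b'.' opcode):
import pickle

def ends_with_stop_opcode(path):
    # Heuristic only: a complete pickle stream ends with the STOP
    # opcode (b'.'), but so can plenty of non-pickle files.
    with open(path, 'rb') as f:
        f.seek(0, 2)           # jump to end of file
        if f.tell() == 0:
            return False       # an empty file can't be a pickle
        f.seek(-1, 2)          # back up one byte
        return f.read(1) == pickle.STOP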
There is no sure way other than to try to unpickle it, and catch exceptions.
I was running into this issue and found a fairly decent way of doing it. You can use the built-in pickletools module to deconstruct a pickle file and get its pickle opcodes. With pickle protocol v2 and higher, the first opcode will be PROTO, and the last one, as #martineau mentioned, is STOP. The following code displays these two opcodes. Note that the output in this example can be iterated over, but the opcodes cannot be accessed directly, hence the for loop.
import pickletools

with open("file.pickle", "rb") as f:
    data = f.read()  # named "data" to avoid shadowing the pickle module

opcodes = []
for opcode in pickletools.genops(data):
    opcodes.append(opcode[0])

print(opcodes[0].name)
print(opcodes[-1].name)

Python: determining if an object is file-like? [duplicate]

This question already has answers here: Check if object is file-like in Python (9 answers). Closed 10 years ago.
I'm writing some unit tests (using the unittest module) for my application, and want to write something which can verify that a method I'm calling returns a "file-like" object. Since this isn't a simple isinstance call, I wonder what the best practice would be for determining this.
So, in outline:
possible_file = self.dao.get_file("anotherfile.pdf")
self.assertTrue(possible_file is file-like)
Perhaps I need to decide which specific interface this file object should implement, or which of the methods that make it file-like I want to support?
Thanks,
R
There is no "official definition" of what objects are "sufficiently file-like", because the various uses of file-like objects have such different requirements -- e.g., some only require read or write methods, other require some subset of the various line-reading methods... all the ways to some requiring the fileno method, which can't even be supplied by the "very file-like objects" offered by StringIO and cStringIO modules in the standard library. It's definitely a question of "shades of gray", not a black-and-white taxonomy!
So, you need to determine which methods you need. To check for them, I recommend defining your own FileLikeEnoughForMe abstract base class with abstractmethod decorators, and checking the object with an isinstance against that class, if you're on Python 2.6 or better: this is the recommended idiom these days, rather than a bunch of hasattr checks, which would be less readable and more complex (once properly beefed up with checks that those attributes are actually methods, etc. ;-).
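A minimal sketch of that ABC idea in modern Python (the class name FileLikeEnoughForMe comes from the answer above; requiring read and write is an illustrative choice):
from abc import ABC, abstractmethod

class FileLikeEnoughForMe(ABC):
    @abstractmethod
    def read(self, size=-1): ...

    @abstractmethod
    def write(self, data): ...

    @classmethod
    def __subclasshook__(cls, C):
        # Any class with callable read and write methods counts as a
        # virtual subclass, so isinstance() works without registration.
        if cls is FileLikeEnoughForMe:
            return all(callable(getattr(C, name, None))
                       for name in ('read', 'write'))
        return NotImplemented

# In the unit test:
# self.assertTrue(isinstance(possible_file, FileLikeEnoughForMe))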
The classical Python mentality is that it's easier to ask forgiveness than permission. In other words, don't check; catch the exception caused by writing to it.
The new way is to use an IO abstract base class in an isinstance check. This was introduced when people realised that duck typing is awesome, but sometimes you really do want an instance check.
In your case (unit testing), you probably want to try it and see:
thingy = ...
try:
    thingy.write( ... )
    thingy.writelines( ... )
    ...
    thingy.read( )
except AttributeError:
    ...
Check if the returned object provides the interface you are looking for. For example:
self.assert_(hasattr(possible_file, 'write'))
self.assert_(hasattr(possible_file, 'read'))

Python - When to use file vs open

What's the difference between file and open in Python? When should I use which one? (Say I'm in 2.5)
You should always use open().
As the documentation states:
When opening a file, it's preferable to use open() instead of invoking this constructor directly. file is more suited to type testing (for example, writing isinstance(f, file)).
Also, file() was removed in Python 3.0.
Two reasons: the Python philosophy of "there ought to be one way to do it", and the fact that file is going away.
file is the actual type (using e.g. file('myfile.txt') is calling its constructor). open is a factory function that will return a file object.
In Python 3.0, file is going to move from being a built-in to being implemented by multiple classes in the io library (somewhat similar to Java with buffered readers, etc.).
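That is indeed what happened: in Python 3, open() is a factory that returns different io classes depending on the mode. A quick illustration (a sketch; assumes data.bin and data.txt exist):
import io

with open('data.bin', 'rb') as fb, open('data.txt', 'r') as ft:
    print(isinstance(fb, io.BufferedReader))  # True: binary read mode
    print(isinstance(ft, io.TextIOWrapper))   # True: text mode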
file() is a type, like an int or a list. open() is a function for opening files, and will return a file object.
This is an example of when you should use open:
f = open(filename, 'r')
for line in f:
    process(line)
f.close()
This is an example of when you should use file:
import sys

class LoggingFile(file):
    def write(self, data):
        sys.stderr.write("Wrote %d bytes\n" % len(data))
        super(LoggingFile, self).write(data)
As you can see, there's a good reason for both to exist, and a clear use-case for both.
Functionally, the two are the same; open will call file anyway, so currently the difference is a matter of style. The Python docs recommend using open.
When opening a file, it's preferable to use open() instead of invoking the file constructor directly.
The reason is that in future versions they are not guaranteed to be the same (open will become a factory function that returns objects of different types depending on the path it's opening).
Only ever use open() for opening files. file() is actually being removed in 3.0, and it's deprecated at the moment. They've had a sort of strange relationship, but file() is going now, so there's no need to worry anymore.
The following is from the Python 2.6 docs. [bracket stuff] added by me.
When opening a file, it's preferable to use open() instead of invoking this [file()] constructor directly. file is more suited to type testing (for example, writing isinstance(f, file)).
According to Mr Van Rossum, although open() is currently an alias for file() you should use open() because this might change in the future.
