This is probably a really dumb question, but I honestly can't find documentation for the file object's API in Python 3.
The Python docs for things that use or return file objects, like open or sys.stdin, link to a glossary entry with a high-level introduction. It doesn't list the methods such objects expose, and I don't know what I can do with them. I've tried googling for file object docs, but search engines don't seem to understand what I'm looking for.
I'm new to Python, but not to programming in general. Until now, my approach to using objects has been to find the complete API reference, see what it can do, and then pick the methods to use in my code. Is this the wrong mindset in the Python world? What are the alternatives?
open returns a file object that differs depending on the mode. From the open docs:
The type of file object returned by the open() function depends on the mode. When open() is used to open a file in a text mode ('w', 'r', 'wt', 'rt', etc.), it returns a subclass of io.TextIOBase (specifically io.TextIOWrapper). When used to open a file in a binary mode with buffering, the returned class is a subclass of io.BufferedIOBase. The exact class varies: in read binary mode, it returns an io.BufferedReader; in write binary and append binary modes, it returns an io.BufferedWriter, and in read/write mode, it returns an io.BufferedRandom. When buffering is disabled, the raw stream, a subclass of io.RawIOBase, io.FileIO, is returned.
Since it varies, open a file object with the mode you want help for and ask it for help:
>>> f = open('xx','w')
>>> help(f)
Help on TextIOWrapper object:
class TextIOWrapper(_TextIOBase)
| Character and line based layer over a BufferedIOBase object, buffer.
|
: etc...
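As a rough sketch of the same idea without the interactive prompt, the snippet below opens a scratch file in several modes and records which io class comes back, then lists the public methods of one of them. The temporary file and the variable names are only for illustration.

```python
import io
import os
import tempfile

# Scratch file so the example does not touch anything real.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "w") as f:                 # text mode
    text_type = type(f)
with open(path, "rb") as f:                # buffered binary read
    read_binary_type = type(f)
with open(path, "wb") as f:                # buffered binary write
    write_binary_type = type(f)
with open(path, "r+b") as f:               # buffered binary read/write
    random_access_type = type(f)
with open(path, "rb", buffering=0) as f:   # unbuffered raw stream
    raw_type = type(f)

os.remove(path)

# dir() gives the method names; help() on the class gives the full text.
public_methods = [name for name in dir(io.TextIOWrapper) if not name.startswith("_")]
print(text_type.__name__, read_binary_type.__name__, raw_type.__name__)
print(public_methods)
```

The io module documentation describes each of these classes and their base classes, so it serves as the complete API reference the question asks for.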
Question:
What is the difference between open(<name>, "w", encoding=<encoding>) and open(<name>, "wb") + str.encode(<encoding>)? They seem to (sometimes) produce different outputs.
Context:
While using PyFPDF (version 1.7.2), I subclassed the FPDF class, and, among other things, added my own output method (taking pathlib.Path objects). While looking at the source of the original FPDF.output() method, I noticed almost all of it is argument parsing - the only relevant bits are
#Finish document if necessary
if(self.state < 3):
    self.close()
[...]
f=open(name,'wb')
if(not f):
    self.error('Unable to create output file: '+name)
if PY3K:
    # manage binary data as latin1 until PEP461 or similar is implemented
    f.write(self.buffer.encode("latin1"))
else:
    f.write(self.buffer)
f.close()
Seeing that, my own implementation looked like this:
def write_file(self, file: Path) -> None:
    if self.state < 3:
        # See FPDF.output()
        self.close()
    file.write_text(self.buffer, "latin1", "strict")
This seemed to work: a .pdf file was created at the specified path, and Chrome opened it. But it was completely blank, even though I had added images and text. After hours of experimenting, I finally found a version that worked (produced a non-empty PDF file):
def write_file(self, file: Path) -> None:
    if self.state < 3:
        # See FPDF.output()
        self.close()
    # using .write_text(self.buffer, "latin1", "strict") DOES NOT WORK AND I DON'T KNOW WHY
    file.write_bytes(self.buffer.encode("latin1", "strict"))
Looking at the pathlib.Path source, it uses io.open for Path.write_text(). As all of this is Python 3.8, io.open and the built-in open() are the same.
Note:
FPDF.buffer is of type str, but holds binary data (a PDF file), probably because the library was originally written for Python 2.
Both should produce the same result, apart from minor differences.
I like the open way because it is explicit and shorter. On the other hand, if you want to handle encoding errors yourself (for example, to give the user a much better error message), you should use encode/decode directly, perhaps after s.split('\n') so you can keep track of line numbers.
Note: if you use the first method (open with an encoding), you should use just r or w, without b. Judging by your question's title you did that correctly, but the library code you quoted keeps the b, which is probably why it calls encode itself. On the other hand, that code looks old, and I think the explicit .encode() was done simply because it felt more natural with a Python 2 mindset.
Note: I would also replace strict with backslashreplace for debugging. You may also want to print the first few characters of self.buffer (or just their ord values) in both versions, to see whether there are substantial differences before the write.
I would add a file.flush() in both functions. This is one of the differences: the buffering differs, and I would make sure the file is closed. Python will do it eventually, but when debugging it is important to see the content of the file as soon as possible (and also after an exception), and the garbage collector cannot guarantee that. Maybe you are reading a file that has not been flushed yet.
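To illustrate the suggestion about swapping strict for backslashreplace while debugging, here is a small sketch. The sample string is made up for the example; it contains one character that latin1 can encode and one that it cannot.

```python
# Made-up sample: "é" exists in latin1, the euro sign does not.
text = "caf\u00e9 costs 3\u20ac"

try:
    text.encode("latin1", "strict")
    strict_failed = False
except UnicodeEncodeError as exc:
    # strict stops at the first character that cannot be encoded
    strict_failed = True
    bad_position = exc.start

# backslashreplace keeps going and leaves a visible escape in the output
debug_bytes = text.encode("latin1", "backslashreplace")
print(strict_failed, bad_position, debug_bytes)
```

With strict you learn where the first problem is; with backslashreplace you get the whole output and can see every offending character at once.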
Aaaand found it: Path.write_bytes() saves the bytes object as is, and str.encode() doesn't touch the line endings.
Path.write_text() encodes the str just like str.encode() does, BUT: because the file is opened in text mode, the line endings are translated on write. In my case \n became \r\n, because I'm on Windows. And PDFs have to use \n, no matter what platform you're on.
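The difference can be reproduced on any platform with an in-memory sketch. Passing newline="\r\n" to io.TextIOWrapper here stands in for what text mode does by default on Windows; the sample text and variable names are invented for the illustration.

```python
import io

text = "line1\nline2\n"

# Binary path: encode yourself, and the bytes are stored unchanged.
binary_buffer = io.BytesIO()
binary_buffer.write(text.encode("latin1"))
binary_result = binary_buffer.getvalue()

# Text path: every "\n" written is translated to the given newline string.
# newline="\r\n" imitates the default behaviour of text mode on Windows.
translated_buffer = io.BytesIO()
wrapper = io.TextIOWrapper(translated_buffer, encoding="latin1", newline="\r\n")
wrapper.write(text)
wrapper.flush()
translated_result = translated_buffer.getvalue()

# Text path with newline="": translation is switched off.
untouched_buffer = io.BytesIO()
wrapper = io.TextIOWrapper(untouched_buffer, encoding="latin1", newline="")
wrapper.write(text)
wrapper.flush()
untouched_result = untouched_buffer.getvalue()

print(binary_result, translated_result, untouched_result)
```

So if text mode is wanted for some reason, open(path, "w", encoding="latin1", newline="") avoids the translation; for data that is really binary, writing bytes remains the simpler choice.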
In Python2, is it safe to have multiple threads read from a single unchanging disk file using code such as:
with open( pathname, 'rb' ) as f:
    f.seek( file_position )
    data = f.read( number_of_bytes )
No process has, or will have, write-permission for the file.
Obviously, reading files in this way is not atomic. The Python 2 documentation says nothing (that I could find) about file objects and threads. Here is the documentation for the seek method:
https://docs.python.org/2/library/stdtypes.html?highlight=seek#file-objects
This is a critical issue for my system, so if pointers into the documentation could be provided, that would be reassuring.
Thank you.
If each thread executes the code you've given, each one opens the file separately, and this is safe. I'm not sure which documentation to refer you to; this is just a consequence of a process being allowed to have the same file open more than once. You may not be on a POSIX system, but for reference, POSIX describes an open file description as the thing created by open() (in C, but wrapped by Python) that holds the file offset and other information relevant to accessing the file.
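A minimal sketch of that pattern follows, written for Python 3 rather than the Python 2 of the question. Each thread calls open() itself, so each has its own file object and its own offset. The scratch file, the read_chunk helper, and the chosen offsets are all invented for the example.

```python
import os
import tempfile
import threading

def read_chunk(pathname, file_position, number_of_bytes, results, index):
    # Every call opens the file again, so no offset is shared between threads.
    with open(pathname, "rb") as f:
        f.seek(file_position)
        results[index] = f.read(number_of_bytes)

# Create an unchanging scratch file with known content.
content = bytes(range(256)) * 4
fd, pathname = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(content)

results = [None] * 8
threads = [
    threading.Thread(target=read_chunk, args=(pathname, i * 100, 16, results, i))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.remove(pathname)

all_correct = all(results[i] == content[i * 100:i * 100 + 16] for i in range(8))
print(all_correct)
```

What would not be safe is sharing a single file object between threads and calling seek() then read() on it, because another thread could move the offset between the two calls.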
I have this Python tool, written by someone else, to flash a certain microcontroller, but it was written for Python 2.6 and I am using Python 3.3.
So, I got most of it ported, but this line is causing problems:
data = map(lambda c: ord(c), file(args[0], 'rb').read())
The file function does not exist in Python 3 and has to be replaced with open. But then, a function which gets data as an argument causes an exception:
TypeError: object of type 'map' has no len()
But from what I have seen in the documentation so far, map just has to combine iterables into one big iterable. Am I missing something?
What do I have to do to port this to Python 3?
In Python 3, map returns an iterator. If your function expects a list, the iterator has to be explicitly converted, like this:
data = list(map(...))
We can also do it more simply, like this:
with open(args[0], "rb") as input_file:
    data = list(input_file.read())
rb means read in binary mode, so read() returns a bytes object. Iterating over bytes in Python 3 yields integers, so converting it to a list gives the same values the ord() calls produced.
Quoting from the open docs:
Python distinguishes between binary and text I/O. Files opened in
binary mode (including 'b' in the mode argument) return contents as
bytes objects without any decoding.
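Both points can be checked without the original tool. In this sketch the literal b"\x01\x02A" stands in for the contents of the file being flashed.

```python
# Stand-in for file contents read in 'rb' mode.
raw = b"\x01\x02A"

# map() returns a lazy iterator in Python 3, so len() does not work on it.
lazy = map(lambda b: b, raw)
try:
    len(lazy)
    map_has_len = True
except TypeError:
    map_has_len = False

# Iterating over bytes already yields integers, so no ord() is needed.
data = list(raw)
print(map_has_len, data, len(data))
```

If the rest of the tool only indexes into data and takes its length, the bytes object itself may be enough, since it supports both.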
I want to send a file with Python's ftplib from one FTP site to another, avoiding intermediate file reads and writes.
I create a BytesIO stream:
myfile=BytesIO()
And I successfully retrieve an image file from FTP site one with retrbinary:
ftp_one.retrbinary('RETR P1090080.JPG', myfile.write)
I can save this memory object to a regular file:
fot=open('casab.jpg', 'wb')
fot.write(myfile.getvalue())
But I am not able to send this stream via FTP with storbinary. I thought this would work:
ftp_two.storbinary('STOR magnafoto.jpg', myfile.getvalue())
But it doesn't. I get a long error message ending with:
buf = fp.read(blocksize)
AttributeError: 'str' object has no attribute 'read'
I also tried many absurd combinations, but with no success. As an aside, I am also quite puzzled with what I am really doing with myfoto.write. Shouldnt it be myfoto.write() ?
I am also quite clueless about what this buffer thing does or requires. Is what I want too complicated to achieve? Should I just ping-pong the files with an intermediate write/read on my system? Thanks, all.
EDIT: thanks to abanert I got things straight. For the record, storbinary arguments were wrong and a myfile.seek(0) was needed to 'rewind' the stream before sending it. This is a working snippet that moves a file between two ftp addresses without intermediate physical file writes:
import ftplib as ftp
from io import BytesIO
ftp_one=ftp.FTP(address1, user1, pass1)
ftp_two=ftp.FTP(address2, user2, pass2)
myfile=BytesIO()
ftp_one.retrbinary('RETR imageoldname.jpg', myfile.write)
myfile.seek(0)
ftp_two.storbinary('STOR imagenewname.jpg', myfile)
ftp_one.close()
ftp_two.close()
myfile.close()
The problem is that you're calling getvalue(). Just don't do that:
ftp_two.storbinary('STOR magnafoto.jpg', myfile)
storbinary requires a file-like object that it can call read on.
Fortunately, you have just such an object, myfile, a BytesIO. (It's not entirely clear from your code what the sequence of things is here—if this doesn't work as-is, you may need to myfile.seek(0) or create it in a different mode or something. But a BytesIO will work with storbinary unless you do something wrong.)
But instead of passing myfile, you pass myfile.getvalue(). And getvalue "Returns bytes containing the entire contents of the buffer."
So, instead of giving storbinary a file-like object that it can call read on, you're giving it a bytes object, which is of course the same as str in Python 2.x, and you can't call read on that.
For your aside:
As an aside, I am also quite puzzled with what I am really doing with myfoto.write. Shouldnt it be myfoto.write() ?
Look at the docs. The second parameter isn't a file, it's a callback function.
The callback function is called for each block of data received, with a single string argument giving the data block.
What you want is a function that appends each block of data to the end of myfoto. While you could write your own function to do that:
def callback(block_of_data):
    myfoto.write(block_of_data)
… it should be pretty obvious that this function does exactly the same thing as the myfoto.write method. So, you can just pass that method itself.
If you don't understand about bound methods, see Method Objects in the tutorial.
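The callback idea, and the need to rewind before reading, can be tried without any FTP server. In this sketch fake_retrbinary is a hypothetical stand-in for FTP.retrbinary, not part of ftplib; it just calls the callback once per block.

```python
from io import BytesIO

def fake_retrbinary(blocks, callback):
    """Hypothetical stand-in for FTP.retrbinary: one callback call per block."""
    for block in blocks:
        callback(block)

myfile = BytesIO()

# Pass the bound method itself (no parentheses); it is called later, per block.
fake_retrbinary([b"abc", b"def"], myfile.write)

position_after_write = myfile.tell()   # the stream position is now at the end
read_without_rewind = myfile.read()    # so reading from here returns nothing

myfile.seek(0)                         # rewind, as a reader like storbinary needs
read_after_rewind = myfile.read()
print(position_after_write, read_without_rewind, read_after_rewind)
```

This is why a myfile.seek(0) is needed between retrbinary and storbinary: storbinary reads from the current position, which is the end of the buffer right after the download.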
This flexibility, as weird as it seems, lets you do something even better than downloading the whole file into a buffer to send to another server. You can actually open the two connections at the same time, and use callbacks to send each buffer from the source server to the destination server as it's received, without ever storing anything more than one buffer.
But, unless you really need that, you probably don't want to go through all that complexity.
In fact, in general, ftplib is kind of low-level. And it has some designs (like the fact that storbinary takes a file, while retrbinary takes a callback) that make total sense at that low level but seem very odd from a higher level. So, you may want to look at some of the higher-level libraries by doing a search at PyPI.
I need to write some methods for loading and saving some classes to and from a binary file. However, I also want to be able to accept the binary data from other places, such as a binary string.
In C++ I could do this by simply making my class methods use std::istream and std::ostream, which could be a file, a stringstream, the console, whatever.
Does Python have a similar input/output class which can be made to represent almost any form of I/O, or at least files and memory?
The Python way to do this is to accept an object that implements read() or write(). If you have a string, you can make this happen with StringIO:
from cStringIO import StringIO
s = "My very long string I want to read like a file"
file_like_string = StringIO(s)
data = file_like_string.read(10)
Remember that Python uses duck-typing: you don't have to involve a common base class. So long as your object implements read(), it can be read like a file.
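The cStringIO module above exists only in Python 2; in Python 3 the equivalents live in the io module (io.StringIO for text, io.BytesIO for binary data). As a hedged Python 3 sketch of the duck-typing approach for the binary case, the hypothetical save_point and load_point functions below accept anything with write() or read() and are used with both an in-memory buffer and a real file. The record layout (two little-endian 32-bit integers) is invented for the example.

```python
import io
import os
import struct
import tempfile

def save_point(stream, x, y):
    """Hypothetical saver: works with any object that has write()."""
    stream.write(struct.pack("<ii", x, y))

def load_point(stream):
    """Hypothetical loader: works with any object that has read()."""
    return struct.unpack("<ii", stream.read(8))

# In memory: io.BytesIO plays the role of std::stringstream.
memory_stream = io.BytesIO()
save_point(memory_stream, 3, -4)
memory_stream.seek(0)
from_memory = load_point(memory_stream)

# On disk: the very same functions, given a real binary file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    save_point(f, 3, -4)
with open(path, "rb") as f:
    from_file = load_point(f)
os.remove(path)

print(from_memory, from_file)
```

Neither function checks the type of its argument; any object with the right method will do, which is the point of the duck-typing remark.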
The pickle and cPickle modules may also be helpful to you.