Python file read buffer boundary

When we use the open() function to open a file, we can set the buffer size for performance. But I wonder: if we set it to 1024 and the data in the file looks like this:
1999999999 3232344 54354364576 2343243254 6453623453245r3245235 5342453245233333333333333333 534545454364536 4355545...
will this cut off one number? For example, the first read's buffer might be 1999999999 3232344 54354364576 2343243254 6453623453245r3245235 53424532,
and the next read's buffer would be 45233333333333333333 534545454364536 4355545, and so on.
Or does Python's buffering implementation already handle this? Can anyone give me some pointers? Thanks.

If you use the read() method without any arguments, it returns the entire file content. You can pass the size parameter if you only want to read part of the file at a time.
See the documentation for more info: http://docs.python.org/2.7/tutorial/inputoutput.html#reading-and-writing-files
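The buffering argument to open() is transparent, by the way: Python's internal buffer never changes what read() hands back, so nothing gets cut off unless you explicitly read a fixed number of bytes yourself. If you do read in fixed-size chunks, stitching split tokens back together is your job. A minimal sketch of that idea (the file name numbers.txt is hypothetical, and print stands in for your own processing):
leftover = ''
with open('numbers.txt') as f:          # hypothetical file of space-separated numbers
    while True:
        chunk = f.read(1024)            # explicit chunked read: this CAN split a number
        if not chunk:
            break
        data = leftover + chunk
        tokens = data.split(' ')
        leftover = tokens.pop()         # the last piece may be incomplete, carry it over
        for token in tokens:
            if token:
                print(token)            # stand-in for whatever processing you need
if leftover:
    print(leftover)                     # flush the final token after end of file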

Related

Python 2.7 - How to programmatically read binary data from stdin

I'd like to be able to read binary data from stdin with python.
However, when I use input = sys.stdin.buffer.read(), I get the error AttributeError: 'file' object has no attribute 'buffer'. This seems strange because the docs say that I should be able to use the underlying buffer object. How can I fix or work around this?
Notes: I've checked out the last time this was asked, but the answers there are all either "use -u", "use buffer" (which I'm trying), or something about reading from files. The first and last don't help because I have no control over the users of this program (so I can't tell them to use particular arguments) and because this is stdin - not files.
Just remove .buffer for Python 2:
import sys
input = sys.stdin.read()
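If the same script also has to run under Python 3, a small feature-detecting sketch like the following should cover both versions (the Windows note only matters when binary data has to pass through a console stdin):
import sys

if hasattr(sys.stdin, 'buffer'):
    # Python 3: the raw byte stream lives behind sys.stdin.buffer
    data = sys.stdin.buffer.read()
else:
    # Python 2: sys.stdin is already byte-oriented, so read() is enough.
    # On Windows you may also need to put stdin into binary mode first:
    #   import msvcrt, os
    #   msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
    data = sys.stdin.read()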

ftp sending python bytesio stream

I want to send a file with Python's ftplib from one FTP site to another, to avoid intermediate file read/write steps.
I create a BytesIO stream:
myfile=BytesIO()
And I successfully retrieve an image file from FTP site one with retrbinary:
ftp_one.retrbinary('RETR P1090080.JPG', myfile.write)
I can save this memory object to a regular file:
fot = open('casab.jpg', 'wb')
fot.write(myfile.getvalue())
But I am not able to send this stream via FTP with storbinary. I thought this would work:
ftp_two.storbinary('STOR magnafoto.jpg', myfile.getvalue())
But it doesn't. I get a long error message ending with:
buf = fp.read(blocksize)
AttributeError: 'str' object has no attribute 'read'
I also tried many absurd combinations, but with no success. As an aside, I am also quite puzzled by what I am really doing with myfoto.write. Shouldn't it be myfoto.write()?
I am also quite clueless as to what this buffer thing does or requires. Is what I want too complicated to achieve? Should I just ping-pong the files with an intermediate write/read on my system? Thanks, all.
EDIT: thanks to abanert I got things straight. For the record, the storbinary arguments were wrong and a myfile.seek(0) was needed to 'rewind' the stream before sending it. This is a working snippet that moves a file between two FTP addresses without intermediate physical file writes:
import ftplib as ftp
from io import BytesIO
ftp_one = ftp.FTP(address1, user1, pass1)
ftp_two = ftp.FTP(address2, user2, pass2)
myfile = BytesIO()
ftp_one.retrbinary('RETR imageoldname.jpg', myfile.write)
myfile.seek(0)
ftp_two.storbinary('STOR imagenewname.jpg', myfile)
ftp_one.close()
ftp_two.close()
myfile.close()
The problem is that you're calling getvalue(). Just don't do that:
ftp_two.storbinary('STOR magnafoto.jpg', myfile)
storbinary requires a file-like object that it can call read on.
Fortunately, you have just such an object, myfile, a BytesIO. (It's not entirely clear from your code what the sequence of things is here—if this doesn't work as-is, you may need to myfile.seek(0) or create it in a different mode or something. But a BytesIO will work with storbinary unless you do something wrong.)
But instead of passing myfile, you pass myfile.getvalue(). And getvalue "Returns bytes containing the entire contents of the buffer."
So, instead of giving storbinary a file-like object that it can call read on, you're giving it a bytes object, which is of course the same as str in Python 2.x, and you can't call read on that.
For your aside:
As an aside, I am also quite puzzled by what I am really doing with myfoto.write. Shouldn't it be myfoto.write()?
Look at the docs. The second parameter isn't a file, it's a callback function.
The callback function is called for each block of data received, with a single string argument giving the data block.
What you want is a function that appends each block of data to the end of myfoto. While you could write your own function to do that:
def callback(block_of_data):
    myfoto.write(block_of_data)
… it should be pretty obvious that this function does exactly the same thing as the myfoto.write method. So, you can just pass that method itself.
If you don't understand about bound methods, see Method Objects in the tutorial.
This flexibility, as weird as it seems, lets you do something even better than downloading the whole file into a buffer to send to another server. You can actually open the two connections at the same time, and use callbacks to send each buffer from the source server to the destination server as it's received, without ever storing anything more than one buffer.
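A rough sketch of that streaming idea, in case it is useful (the hosts, credentials and file names are placeholders, and voidresp() is the undocumented call that storbinary() itself uses internally to collect the final reply):
import ftplib

ftp_src = ftplib.FTP('ftp.source.example.com', 'user', 'password')
ftp_dst = ftplib.FTP('ftp.dest.example.com', 'user', 'password')

# Open the destination's data connection ourselves instead of calling storbinary.
dst_sock = ftp_dst.transfercmd('STOR imagenewname.jpg')

def relay(block):
    # retrbinary calls this for every block received from the source,
    # and we forward it straight to the destination's data socket.
    dst_sock.sendall(block)

ftp_src.retrbinary('RETR imageoldname.jpg', relay)

dst_sock.close()
ftp_dst.voidresp()   # read the destination's end-of-transfer reply
ftp_src.quit()
ftp_dst.quit()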
But, unless you really need that, you probably don't want to go through all that complexity.
In fact, in general, ftplib is kind of low-level. And it has some designs (like the fact that storbinary takes a file, while retrbinary takes a callback) that make total sense at that low level but seem very odd from a higher level. So, you may want to look at some of the higher-level libraries by doing a search at PyPI.

What's the best way to get the file size?

There are actually three ways I have in mind to determine a file's size:
open and read it, and get the size of the string with len()
using os.stat and getting it via st_size, which should be the "right" way because it's handled by the underlying OS
os.path.getsize, which should be the same as above
So what is the actual right way to determine the file size? And which is the worst way?
Or doesn't it even matter, because in the end it's all the same?
(I can imagine the first method having a problem with really large files, while the other two do not.)
The first method would be a waste if you don't need the contents of the file anyway. Either of your other two options is fine. os.path.getsize() simply uses os.stat().
From genericpath.py
def getsize(filename):
    """Return the size of a file, reported by os.stat()."""
    return os.stat(filename).st_size
Edit:
In case it isn't obvious, os.path.getsize() comes from genericpath.py.
>>> os.path.getsize.__code__
<code object getsize at 0x1d457b0, file "/usr/lib/python2.7/genericpath.py", line 47>
Method 1 is the slowest way possible. Don't use it unless you will need the entire contents of the file as a string later.
Methods 2 and 3 are the fastest, since they don't even have to open the file.
Using f.seek(0, os.SEEK_END) and f.tell() requires opening the file, and might be a bit slower than methods 2 and 3 unless you're going to open the file anyway.
All methods will give the same result when no other program is writing to the file. If the file is in the middle of being modified when your code runs, seek+tell can sometimes give you a more up-to-date answer than methods 2 and 3.
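For example, a quick check (assuming some file example.bin exists, the name is hypothetical) that os.stat(), os.path.getsize() and the seek/tell variant all agree:
import os

path = 'example.bin'                    # hypothetical file

size_stat = os.stat(path).st_size       # method 2: ask the OS directly
size_getsize = os.path.getsize(path)    # method 3: thin wrapper around os.stat()

with open(path, 'rb') as f:             # seek/tell variant: has to open the file
    f.seek(0, os.SEEK_END)
    size_seek = f.tell()

assert size_stat == size_getsize == size_seek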
No. 1 is definitely the worst. If anything, it's better to seek() and tell(), but that's not as good as the other two.
No. 2 and no. 3 are equally OK IMO. I think no. 3 is a bit clearer to read, but that's negligible.

Python parsing xml directly from web address

Hey. I tried to find a way but I can't. I have set up an xml.sax parser in Python and it works perfectly when I read a local file (for example calendar.xml), but I need to read an XML file from a web address.
I figured it would work if I do this:
toursxml='http://api.songkick.com/api/3.0/artists/mbid:'+mbid+'/calendar.xml?apikey=---------'
toursurl=urllib2.urlopen(toursxml)
toursurl=toursurl.read()
parser.parse(toursurl)
But it doesn't. I'm sure there's an easy way but I can't find it.
So yeah, I can easily go to the URL, download the file, and open it by doing
parser.parse("calendar.xml")
As a workaround I've set it up to read the file, create the file locally, close the file, and then read it. But as you can guess it's slow as hell.
Is there any way to directly read the XML? Also note that the URL does not end in ".xml", so that may be a problem later.
First, your example is mixed up. Please don't reuse variables.
toursurl = urllib2.urlopen(toursxml)
toursurl_string = toursurl.read()
parser.parseString(toursurl_string)
Reads the entire file into a string, named toursurl_string.
To parse a string, you use the parseString(toursurl_string) method.
http://docs.python.org/library/xml.sax.html#xml.sax.parseString
If you want to combine reading and parsing, you have to pass the "stream" or filename to parse.
toursurl = urllib2.urlopen(toursxml)
parser.parse(toursurl)
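A minimal self-contained sketch of that, assuming Python 2 (the handler class and URL are placeholders; parse() accepts any file-like object, and what urllib2.urlopen() returns qualifies, so nothing has to touch the disk):
import urllib2
import xml.sax

class CalendarHandler(xml.sax.ContentHandler):   # placeholder handler
    def startElement(self, name, attrs):
        print name                               # replace with real handling

parser = xml.sax.make_parser()
parser.setContentHandler(CalendarHandler())

toursurl = urllib2.urlopen('http://example.com/calendar.xml')   # placeholder URL
parser.parse(toursurl)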
parser.parse(xyz)
expects xyz to be a file; you are looking for
parser.parseString(xyz)
which expects xyz to be a string containing XML.

Emacs collaborative buffers open in the wrong mode

I am using Emacs and Rudel to collaborate with a remote programmer. Rudel has a concept of published buffers. When my partner publishes a buffer, I can subscribe to it and then we can both edit it simultaneously.
My problem is that when he publishes a Python file with a *.py extension and I subscribe to it, my buffer is not set to python-mode automatically (it is in fundamental mode). How can I get it so that the buffer opens with the correct language mode?
I don't know Rudel well enough to give a 100% solution, but what you want to do is something like this:
(add-hook 'rudel-document-attach-hook 'my-rudel-set-mode-appropriately)
(defun my-rudel-set-mode-appropriately (document buffer)
  "try to set the mode appropriately"
  (set-buffer buffer)
  (let ((buffer-file-name ...get-name-from-document...))
    (set-auto-mode)))
Only, you need to replace the ...get-name-from-document... portion of the code with something that evaluates to the file name that you want, for example, if the buffer is named myfile.py, then you can change that to (buffer-name). But, if the buffers get odd names, perhaps you need to extract the name from the document object (Rudel internally uses a document object to represent the thing you are sharing). So, if (buffer-name) doesn't work, you can try (rudel-suggested-buffer-name document).
i.e. try the above code but using one of these lines:
(let ((buffer-file-name (buffer-name)))
and
(let ((buffer-file-name (rudel-suggested-buffer-name document)))
set-auto-mode will use the value of buffer-file-name to determine the major mode via the general Emacs mechanisms.
I know absolutely nothing about how Rudel works. However, have you tried explicitly setting the mode in the text file? Try adding something like this as the first line of the file:
# -*- mode: python; fill-column: 75; comment-column: 50; -*-
Putting a line like this first in the file will cause Emacs to ignore the file's extension and open it in the given mode.
