Gunzipping Contents of a URL - Python

I'm back. :) Again trying to get the gzipped contents of a URL and gunzip them. This time in Python. The #SERVER section of code is the script I'm using to generate the gzipped data. The data is known good, as it works with Java. The #CLIENT section of code is the bit of code I'm using client-side to try and read that data (for eventual JSON parsing). However, somewhere in this transfer, the gzip module forgets how to read the data it created.
#SERVER
outbuf = StringIO.StringIO()
# Compress `data` into the in-memory buffer
outfile = gzip.GzipFile(fileobj=outbuf, mode='wb')
outfile.write(data)
outfile.close()
print "Content-Encoding: gzip\n"  # the extra \n ends the CGI header block
print outbuf.getvalue()
#CLIENT
urlReq = urllib2.Request(url)
urlReq.add_header('Accept-Encoding', '*')
urlConn = urllib2.build_opener().open(urlReq)
# Buffer the body so GzipFile has a seekable file object to read from
urlConnObj = StringIO.StringIO(urlConn.read())
gzin = gzip.GzipFile(fileobj=urlConnObj)
return gzin.read() # IOError: Not a gzipped file.
Other Notes:
outbuf.getvalue() is the same as urlConnObj.getvalue() is the same as urlConn.read()

This StackOverflow question seemed to help me out.
Apparently, it was just wise to bypass the gzip module entirely, opting for zlib instead. Also, changing "*" to "gzip" in the "Accept-Encoding" header may have helped.
#CLIENT
urlReq = urllib2.Request(url)
urlReq.add_header('Accept-Encoding', 'gzip')
urlConn = urllib2.urlopen(urlReq)
return zlib.decompress(urlConn.read(), 16+zlib.MAX_WBITS)
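For anyone wondering about the magic second argument: that is zlib's wbits parameter. Adding 16 to MAX_WBITS tells the decompressor to expect a gzip header and trailer rather than a bare zlib stream (adding 32 instead auto-detects either format). A rough Python 3 sketch of the same client, where fetch_gzipped is my own name, not anything from the question:
import urllib.request
import zlib

def fetch_gzipped(url):
    # Ask for gzip explicitly, mirroring the fix above
    req = urllib.request.Request(url, headers={'Accept-Encoding': 'gzip'})
    with urllib.request.urlopen(req) as conn:
        raw = conn.read()
    # 16 + MAX_WBITS: expect a gzip header and trailer;
    # 32 + MAX_WBITS would auto-detect zlib vs. gzip instead
    return zlib.decompress(raw, 16 + zlib.MAX_WBITS)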

Related

Sending an image from a server to client using HTTP in python

I was creating a web server that processes client requests and sends data over HTTP. I used Python, and it works perfectly for text, PDF and HTML files. When I tried to send a jpg image with this server, the client reports that the image can't be displayed because it contains errors. I used different approaches given on this site, but failed. Can someone help me? The image-sending part of the code is given below. Thanks in advance...
req = clnt.recv(102400)
a = req.split('\n')[0].split()[1].split('/')[1]
if a.split('.')[1] == 'jpg':
    path = os.path.abspath(a)
    size = os.path.getsize(path)
    img_file = open(a, 'rb')
    bytes_read = 0
    while bytes_read < size:
        strng = img_file.read(1024)
        if not strng:
            break
        bytes_read += len(strng)
    clnt.sendall('HTTP/1.0 200 OK\n\n' + 'Content-type: image/jpeg"\n\n' + strng)
    clnt.close()
    time.sleep(30)
You are overwriting the string each time you perform a read on the file. If the file is greater than 1024 bytes you will lose the previously read chunk. Eventually the last read will return an empty string at EOF so strng will end up being the empty string.
strng = img_file.read(1024)
I think you meant to use +=:
strng += img_file.read(1024)
There is not really any advantage to reading the file in chunks like this. Reading all the file contents in one read will consume the same amount of memory. You could do this instead:
if a.split('.')[1] == 'jpg':
    path = os.path.abspath(a)
    with open(a, 'rb') as img_file:
        clnt.sendall('HTTP/1.0 200 OK\n\n' + 'Content-type: image/jpeg"\n\n' + img_file.read())
    clnt.close()
    time.sleep(30)
Also, strictly speaking those \n characters should be \r\n for HTTP.
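For what it's worth, here is a sketch of a better-formed response (Python 2, reusing clnt and a from the question; untested). Note that the original also sends a blank line right after the status line, which terminates the header block before the Content-type header is ever seen, and it leaves a stray quote in the header value:
if a.split('.')[1] == 'jpg':
    path = os.path.abspath(a)
    with open(path, 'rb') as img_file:
        body = img_file.read()
    # One header block, CRLF line endings, a single blank line, then the body
    headers = ('HTTP/1.0 200 OK\r\n'
               'Content-Type: image/jpeg\r\n'
               'Content-Length: %d\r\n'
               '\r\n') % len(body)
    clnt.sendall(headers + body)
    clnt.close()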

eBay LMS uploadFile - Cannot specify file with valid format

I am attempting to use the eBay Large Merchant Services API to upload calls in bulk. This API and its documentation are notoriously bad (I found a blog post that explains it and rails against it here). The closest I can find to a working Python solution is based on this Question/Answer and the related Github version, which is no longer maintained.
I have followed these examples very closely, and as best I can tell they never fully worked in the first place. I emailed the original author, and he says it was never put into production. However, it should be very close to working.
All of my attempts result in an error 11 Please specify a File with Valid Format message.
I have tried various ways to construct this call: the Github method, a method via the email.mime package, the Requests library, and the http.client library. The output below is what I believe is the closest I have to what it should be. The attached file is the Github repo's sample XML file, read and gzipped using the same methods as in the repo.
Any help on this would be appreciated. I feel like I have exhausted my possibilities.
POST https://storage.sandbox.ebay.com/FileTransferService
X-EBAY-SOA-SERVICE-VERSION: 1.1.0
Content-Type: multipart/related; boundary=MIME_boundary; type="application/xop+xml"; start="<0.urn:uuid:86f4bbc4-cfde-4bf5-a884-cbdf3c230bf2>"; start-info="text/xml"
User-Agent: python-requests/2.5.1 CPython/3.3.4 Darwin/14.1.0
Accept: */*
Accept-Encoding: gzip, deflate
X-EBAY-SOA-SECURITY-TOKEN: **MYSANDBOXTOKEN**
X-EBAY-SOA-SERVICE-NAME: FileTransferService
Connection: keep-alive
X-EBAY-SOA-OPERATION-NAME: uploadFile
Content-Length: 4863
--MIME_boundary
Content-Type: application/xop+xml; charset=UTF-8; type="text/xml; charset=UTF-8"
Content-Transfer-Encoding: binary
Content-ID: <0.urn:uuid:86f4bbc4-cfde-4bf5-a884-cbdf3c230bf2>
<uploadFileRequest xmlns:sct="http://www.ebay.com/soaframework/common/types" xmlns="http://www.ebay.com/marketplace/services">
<taskReferenceId>50009042491</taskReferenceId>
<fileReferenceId>50009194541</fileReferenceId>
<fileFormat>gzip</fileFormat>
<fileAttachment>
<Size>1399</Size>
<Data><xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="cid:urn:uuid:1b449099-a434-4466-b8f6-3cc2d59797da"/></Data>
</fileAttachment>
</uploadFileRequest>
--MIME_boundary
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
Content-ID: <urn:uuid:1b449099-a434-4466-b8f6-3cc2d59797da>
b'\x1f\x8b\x08\x08\x1a\x98\xdaT\x02\xffuploadcompression.xml\x00\xcdW[W\xdb8\x10~\x0e\xbfB\x9b}\x858\xd0\xb4\x14\x8e\xebnn\x94l\xe3\x96M\xccv\xfb\xd4#\xec!\xd1V\x96\\I\x06\xfc\xefw$\xdbI\x9cK\x0f\xec\xbe,\xe5P{f\xbe\xb9i\xe6\xb3\xed\xbf\x7fJ9y\x00\xa5\x99\x14\xef\xda\xa7\x9dn\x9b\x80\x88e\xc2\xc4\xe2]\xfb6\xba:y\xdb~\x1f\x1c\xf9\x83\x9c\x7f\x1fQC\xc7O\xf1\x92\x8a\x05\xcc\xe0G\x0e\xdah\x82p\xa1\xdf\xb5s%.\xe1\x8e\x16\x974c\xfa\x12\x06\xd3\x01\xd50\x94i&\x05\x08\xa3\xdb\xc1Q\xcb\xbf\x06\x9a\x80\xc2\xab\x96\xffg\x1908\xef\x9d\xfb^}c\x15sf`2\nN\xbb]\xdf\xab\xae\x11\xe9\xad\xa1-\xbf\x9f$\x13\x03i\x95\xc1\x0b\x12 \x04\xd1c\xa5\xa4\x9ab\t9]#\x00\xe2\xdb\xed\xdc\xf7\x9a\xc2\xca\xf2\x0bU\x02\xbb0\x85\x07\xe0\xc15[,}\xaf!\xaa\xcc\x0e\x95\xd2\xf2C\xd0\x1a\xfda\t\x11&\x8a8\xf2\t\x1e\xc9\xdc({y%UJ\x8d\xef\xad\x8d\x10\x83\x0e}[\x9b\xc3\xb7\xfc\x88\x19\x0e\xc1\xf5\xefC2\xffz\x12\xf6\xff\xc2\xff\xec\xdf\xc9\x84\x9c\x91\xeb\xf14\x1cG\xa4\xff)\xba\x9e\xf5\x87\x93hLB/\x1c\x8f&\xb7\xa1\xef\x95\xb8\xd2\xc7\x08t\xacXflV\xd1\x92i\x82\xbf\x94\xc4Rr\xb2\x04\x9e\x829&\xccX\xe1\x9d\xa2"!\x023\xbc+\x88Y\x02y\xa4\xc5/\xbe\xb7\x89/=\xde(\x96RU\x0c\xa9\x81\x85TE)m\xf9\xf5=V\xf2\xe6\xbcw\xe1{\x1b\x82\x12\xe8\xedE\xfasC\x95AU\x0c\xc1\xc5E\xe7\x02\x91\x1b\x92\xd2d(E\xc2l\n\xe5h\xe0llJ\x8e\x1a\xf1C\x9ae\xd8\xe0>\xe7\xf2\x11\x92 R9\xacs\xd9R\xd6\xdesa0\x1d;\t\xf5u\xa5\xc9\x95\xc2m\xb0\xaa\x11\xea\xea\xbb\xaa\xb3Lg\xd4\xc4\xcb\x88\xa5\x10\xd2\xa7\xe0\x156kKT\x1aN\x99;\xfdQ\xae\xa8k\xe3\x88\x16\xfa\xdb)\x16\xb1\xadh\x98GE\x06\xc1\x15{\x82\xc4u\xc2\x8e\xc5\n\xe1t\xd5i\xd0"\xc5\x01\x0f\xc1,e\xa2\x03\xbc\xbd\xa1\x1c[\xdd\x14\xaflQ9N)\xe3\xb8D\n\'/\xb0+\xf3\x9bb\xb8\\:a:\xb6\xd5wb\x99:\x07\xdb\xb6\x95\x13\x16\x9b\\\xc1\x08\x0c\xaat}\xfa\x95\xf4v6\r\x96\xc6d\x97\x9e\x07\x9d]\xb7\x1e\x9e\xff\x02\xb4\x87#\xedu\xcf\xce\xce\xbbo\xbd\xd7\xe7oN^\xbf9\xfd\xa6\xd3\xce\xdf\xd9\x02\xe3\xae\x1d\xd5S\xb3\'\xa0\x7f#\xb5\xa1|(\x13\x08z\x17\xbd\xb3\x1e\x9a\xad%\xa5\xc9\x1f9\x15\x86\x99"\xb0\xad^\xdd\x94\xba\x19\xa0Kq#9\x8bW\x03<\x83\xfb\\$\x9f\xcbQ\xafy\xce\xf7\x1a\xe2\x86\xe9\x8e\xd1Zm\xbd\xeb/\xcc,\x99\xa8\x90\xe5\xa1\xf7\xac\xe9\xaer\x1f.8\xed\x11\x0b\xdaBl\xd9\xf6\xe3\x182\x03u~[\xd2\x15v\xcbl\xbf\x8f\x1aM\x0e\xc2k\xe0\x0e)\xb4\xeaV \xb9( 
\x19\xa8\x94\x19\x04\x1c\x13x\xb2P"\x05\x01\x0e\xb1QR\xb0\x18\x19\x07R}L,\xe1\xa49r\xf8\x1d\x10&p\x9dqK\x13\xf2\xe8\xea$X~\x82\xe5\x13yO\x14\xc4\x80\xd1:$\x04e\xc3\xe0H\xc1\x06\xd0\x91V\\\x13\x82\xc3\x13\xca9\xc9h\xfc\x9d.p]\x8eIJENy\x152\xa3\x98\xdf\xa3T\xdf\x11khl\x9c0\x17\x94\x1bP\x90tH$m\xd6\xae\x9cc\x92q\xc0\x07\x89u\xefLc\x8c*SPD\x83z\xc0\xb5$F\x96\xe9=\x00\xd2\xea,\xec\xff\xea\xbc\xc9\\\xad|\x90{\xa4\xfa\x0e\x19O\xc7\xc3h\xf6\xf9\xd3dH\x90\xad\xc3\xf91Ir\x07G\xaee\xe8/\x83\x98QN\x04\xb5\xc3Nb*\x84t\xf5\xd5n\x12\xeb\x07\x9d\x17\x18\x8fj\xac\xd3\xc6\xb1\xcd\xd6\x92\x03/0\xc3\x07\x9b>I\x18\xe6c\xb8\xe5p%\xf3\xc5\xb2\xf2\x8f\x1b\x8c\x11\x8c\xcd\xd36\xe3\x9e\xba\xa5R\xbaS\x1d\xe9.\xd1#3/\x99\xa3\xcb!\xae\xd6\re\xc9\xa0\xa8\xe6g\x90\x17\xa0\x90\xa7\x0f\xe9\x0f\xe2\x0f#\xebm\xdf\xdd\xcc\x95\x9b\r\x06X\xc9\xe6\xe51\x94qKr\xd8\xd6\x05\xf5}I\x86\xf8p\x11\tU\xc9:\x89\xda\xae\x19\xad\x92\xda\x0c\x83n\xa7\xbbc\xee\x14{!\xc8\x97n\x12-\x1b\x1d\x00o\x99\xecu\x83\xb4/\x95\xe3\xaf\x1d\xf8JU\x02\xc7O\x19\xa0?H\xeaJ\xeeq\xd6\x91\x95v\xe4\xae=W\n\xa0\xf6\x17\x18\xf7xl\x88l{\xbd\xc3\xfd\xf5\'\x02\xf7D\xd02\xfd\xbdv\x07\x8e\xa1j|\x03\xbf\xff\x14\xf6\x1d\x02\xae^\xf9\xf8\x9d\x8c\xf0\xc5t>j\x07g\xf8\xb6\xf0\xf6\xf0\xb9\x1c\xec\xe7\xd9\xcf\xfb\xe9p\x91\x9c\xca\xb8|*\x0f\xfb\xa5\xfd\x86\xc8\xb5\xe8y}\xf8\xff\xb4\xeb\xd5\xbfl\xd7\xab\x97\xb5\xab\x8f\xec\xc8b\xaa\xf75m\xc7x\x9c+\x99\xc1\xb3L\xfb\x9a\xd1\xe7\x19\xde\xfe\xa7\xf3\xaa5\xe5\xfb\x17\xb7\xef\xe8\r\xd1\x91[\xb8\x98\xe7\tlE\x99~\xb4+\xb7Os\x183\x19\xbd\x1c\x13~}9\xe6\xc3\xe0\xe5\x98\xf9\x87\x9fa\xe6\xc09\xa8\xbdz}\xa3\xe0\x1e\xec\xf4AE0\xcf4>\xda`\x9e\xe6\xfb\x9e\xfd\x16\x0c`#\x8bP\x1a\xa9t\xf9qX\xeb>\xde\xba\xaf\x02\xfc9\xb1\xbb\x8d\xb7n.\xbc\xfaS\xca\xf7\x9a\xdf\x8cVv\xe4{\x87\xbeiQ]\xff\xfb\x07\xe0\xf2E>\x1f\x0f\x00\x00'
--MIME_boundary--
EDIT: Showing the methods I have tried to actually generate the binary data:
with open('testpayload.xml', mode='rt') as myfile:
    full_xml = myfile.read()  # replacing my full_xml with testpayload.xml, from Github repo

# METHOD 1: Simple gzip.compress()
payload = gzip.compress(bytes(full_xml, 'UTF-8'))

# METHOD 2: Implementing https://stackoverflow.com/a/8507012/1281743 which should produce a gzip-compatible string with header
import io
out = io.BytesIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(bytes(full_xml, 'UTF-8'))
payload = out.getvalue()

# METHOD 3: Simply reading in a pre-compressed version of the same testpayload.xml file. Compressed on the command line with 'gzip testpayload.xml'
with open('testpayload.xml.gz', mode='rb') as myfile:
    payload = myfile.read()

# METHOD 4: Adopting the _generate_date() function from the github eBay-LMS-API repo http://goo.gl/YgFyBi
mybuffer = io.BytesIO()
fp = open('testpayload.xml', 'rb')  # This file is from https://github.com/wrhansen/eBay-LMS-API/
# Create a gzip object that writes the compressed data to the BytesIO buffer
gzipbuffer = gzip.GzipFile('uploadcompression.xml.gz', 'wb', 9, mybuffer)
gzipbuffer.writelines(fp)
gzipbuffer.close()
fp.close()
mybuffer.seek(0)
payload = mybuffer.read()
mybuffer.close()

# DONE: send payload to API call
r = ebay.fileTransfer_upload_file(file_id='50009194541', task_id='50009042491', gzip_data=payload)
EDIT2, THE QUASI-SOLUTION: Python was not actually sending binary data, but rather a string representation of it, which did not work. I ended up having to base64-encode the file first. I had tried that before, but had been waved off of it by a suggestion on the eBay forums. I just changed one of the last headers to Content-Transfer-Encoding: base64 and got a success response.
EDIT3, THE REAL SOLUTION: Base64 encoding only solved my problems with the example XML file. I found that my own XML payload was NOT sending successfully (same error as before), but that the script on Github could send that same file just fine. The difference was that the Github script is Python 2 and can mix string and binary data without differentiation. To send binary data in Python 3 I just had to make a few simple changes, turning all the strings in the request_part and binary_part into bytes so that the bytes of the gzip payload would concatenate with them. It looks like this:
binary_part = b'\r\n'
binary_part += b'--MIME_boundary\r\n'
binary_part += b'Content-Type: application/octet-stream\r\n'
binary_part += b'Content-Transfer-Encoding: binary\r\n'
binary_part += b'Content-ID: <' + URN_UUID_ATTACHMENT.encode('utf-8') + b'>\r\n\r\n'
binary_part += gzip_data + b'\r\n'
binary_part += b'--MIME_boundary--'
So, no base64 encoding needed; I just had to figure out how to make Python send real binary data. And the original Github repo does work as advertised, in Python 2.
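For completeness, the XML part of the multipart body needs the same bytes treatment. A sketch under the same assumptions, where URN_UUID_REQUEST and xml_request stand in for the Github script's equivalents (my names, not necessarily the repo's):
request_part = b'--MIME_boundary\r\n'
request_part += b'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml; charset=UTF-8"\r\n'
request_part += b'Content-Transfer-Encoding: binary\r\n'
request_part += b'Content-ID: <' + URN_UUID_REQUEST.encode('utf-8') + b'>\r\n\r\n'
request_part += xml_request.encode('utf-8') + b'\r\n'
# Everything is bytes now, so the gzip payload concatenates cleanly
body = request_part + binary_part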
So after scanning all my code about this, it looks like it's set up right. So if your error is a good error, it means your binary data is wrong. Can you show how you're gzipping and then reading that data back in? Here's what I'm doing (in PHP):
$dir = '/.../ebayUpload.gz';
if(is_file($dir)) unlink($dir);
$gz = gzopen($dir,'w9');
gzwrite($gz, $xmlFile);
chmod($dir, 0777);
gzclose($gz);
// open that file as data;
$handle = fopen($dir, 'r');
$fileData = fread($handle, filesize($dir));
fclose($handle);
And in your case the $xmlFile should be the string found in https://github.com/wrhansen/eBay-LMS-API/blob/master/examples/AddItem.xml
I should add that this is how I'm using $fileData
$binaryPart = '';
$binaryPart .= "--" . $boundry . $CRLF;
$binaryPart .= 'Content-Type: application/octet-stream' . $CRLF;
$binaryPart .= 'Content-Transfer-Encoding: binary' . $CRLF;
$binaryPart .= 'Content-ID: <urn:uuid:'.$uuid_attachment.'>' . $CRLF . $CRLF;
$binaryPart .= $fileData . $CRLF;
$binaryPart .= "--" . $boundry . "--";

How can I read exactly one response chunk with python's http.client?

Using http.client in Python 3.3+ (or any other builtin python HTTP client library), how can I read a chunked HTTP response exactly one HTTP chunk at a time?
I'm extending an existing test fixture (written in python using http.client) for a server which writes its response using HTTP's chunked transfer encoding. For the sake of simplicity, let's say that I'd like to be able to print a message whenever an HTTP chunk is received by the client.
My code follows a fairly standard pattern for reading a large response:
conn = http.client.HTTPConnection(...)
conn.request(...)
response = conn.getresponse()
resbody = []
while True:
    chunk = response.read(1024)
    if len(chunk):
        resbody.append(chunk)
    else:
        break
conn.close()
But this reads 1024-byte chunks regardless of whether the server is sending 10-byte chunks or 10 MiB chunks.
What I'm looking for would be something like the following:
while True:
    chunk = response.readchunk()
    if len(chunk):
        resbody.append(chunk)
    else:
        break
If this is not possible with http.client, is it possible with another builtin HTTP client library? If it's not possible with a builtin client lib, is it possible with a pip-installable module?
I found it easier to use the requests library, like so:
r = requests.post(url, data=foo, headers=bar, stream=True)
for chunk in r.raw.read_chunked():
    print(chunk)
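(r.raw here is the underlying urllib3 response object; as far as I can tell, read_chunked() raises ResponseNotChunked if the server didn't actually use chunked transfer encoding, so this only works for genuinely chunked responses.)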
Update:
The benefit of chunked transfer encoding is to allow the transmission of dynamically generated content. Whether an HTTP library lets you read individual chunks or not is a separate issue (see RFC 2616 - Section 3.6.1).
I can see how what you are trying to do would be useful, but the standard Python HTTP client libraries don't do what you want without some hackery (see http.client and httplib).
What you are trying to do may be fine for use in your test fixture, but in the wild there are no guarantees. It is possible for the chunking of the data read by your client to be different from the chunking of the data sent by your server. E.g. the data could have been "re-chunked" by a proxy server before it arrived (see RFC 2616 - Section 3.2 - Framing Techniques).
The trick is to tell the response object that it isn't chunked (resp.chunked = False) so that it returns the raw bytes. This allows you to parse the size and data of each chunk as it is returned.
import http.client

conn = http.client.HTTPConnection("localhost")
conn.request('GET', "/")
resp = conn.getresponse()
resp.chunked = False  # hand us the raw chunked stream instead of decoding it

def get_chunk_size():
    # Read up to and including the CRLF that ends the chunk-size line
    size_str = resp.read(2)
    while size_str[-2:] != b"\r\n":
        size_str += resp.read(1)
    return int(size_str[:-2], 16)  # chunk sizes are hexadecimal

def get_chunk_data(chunk_size):
    data = resp.read(chunk_size)
    resp.read(2)  # consume the CRLF that trails each chunk
    return data

respbody = ""
while True:
    chunk_size = get_chunk_size()
    if chunk_size == 0:
        break  # a zero-size chunk marks the end of the response
    else:
        chunk_data = get_chunk_data(chunk_size)
        print("Chunk Received: " + chunk_data.decode())
        respbody += chunk_data.decode()

conn.close()
print(respbody)

Python3 progress bar and download with gzip

I'm having a little problem with the answer given at Python progress bar and downloads.
If the downloaded data is gzip-encoded, the content length differs from the total length of the data joined together in the for data in response.iter_content(): loop; it is bigger, because requests automatically decompresses gzip-encoded responses.
So the bar gets longer and longer, and once it becomes too long for a single line it starts flooding the terminal.
A working example of the problem (the site is the first one I found on Google that has both a Content-Length header and gzip encoding):
import requests, sys

def test(link):
    print("starting")
    response = requests.get(link, stream=True)
    total_length = response.headers.get('content-length')
    if total_length is None:  # no content length header
        data = response.content
    else:
        dl = 0
        data = b""
        total_length = int(total_length)
        for byte in response.iter_content():
            dl += len(byte)
            data += byte
            done = int(50 * dl / total_length)
            sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
            sys.stdout.flush()
    print("total data size: %s, content length: %s" % (len(data), total_length))

test("http://www.pontikis.net/")
P.S. I'm on Linux, but it should affect other OSes too (except Windows, since \r doesn't work there, IIRC).
And I'm using requests.Session for cookie (and gzip) handling, so a solution with urllib or another module isn't what I'm looking for.
Perhaps you should try disabling gzip compression or otherwise accounting for it.
The way to turn it off for requests (when using a session as you say you are):
import requests
s = requests.Session()
del s.headers['Accept-Encoding']
The header sent will now be Accept-Encoding: Identity, and the server should not attempt to use gzip compression. If instead you're trying to download a gzip-encoded file, you should not run into this problem: you will receive a Content-Type of application/x-gzip-compressed. If the website itself is gzip-compressed, you'll receive a Content-Type of, for example, text/html and a Content-Encoding of gzip.
If the server always serves compressed content then you're out of luck, but no server should do that.
If you want to do something with the functional API of requests:
import requests
r = requests.get('url', headers={'Accept-Encoding': None})
Setting the header value to None via the functional API (or even in a call to session.get) removes that header from the requests.
You could replace...
dl += len(byte)
...with:
dl = response.raw.tell()
From the documentation:
tell(): Obtain the number of bytes pulled over the wire so far. May differ from the amount of content returned by :meth:`HTTPResponse.read` if bytes are encoded on the wire (e.g., compressed).
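Putting that together with the loop from the question, a minimal sketch (untested, same URL as above): tell() counts compressed bytes off the wire, so the bar stays in step with Content-Length.
import requests, sys

def test(link):
    response = requests.get(link, stream=True)
    total_length = response.headers.get('content-length')
    if total_length is not None:
        total_length = int(total_length)
        data = b""
        for byte in response.iter_content():
            data += byte
            # Wire bytes, not decompressed bytes, drive the bar
            done = int(50 * response.raw.tell() / total_length)
            sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
            sys.stdout.flush()

test("http://www.pontikis.net/")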
Here is a simple progress bar implemented with tqdm:
import gzip

def _reader_generator(reader):
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024 * 1024)

def raw_newline_count_gzip(fname):
    f = gzip.open(fname, 'rb')
    f_gen = _reader_generator(f.read)
    return sum(buf.count(b'\n') for buf in f_gen)

num = raw_newline_count_gzip(fname)
Then, to loop over the gzip file with a bar:
from tqdm import tqdm

with tqdm(total=num) as pbar:
    # do whatever you want with each line, then tick the bar
    pbar.update(1)
The bar looks like:
35%|███▌ | 26288/74418 [00:05<00:09, 5089.45it/s]

Python seek on remote file using HTTP

How do I seek to a particular position in a remote (HTTP) file, so I can download only that part?
Let's say the bytes of a remote file were: 1234567890
I want to seek to 4 and download 3 bytes from there, so I would have: 456
Also, how do I check whether a remote file exists?
I tried os.path.isfile(), but it returns False when I pass it a remote file URL.
If you are downloading the remote file through HTTP, you need to set the Range header.
Check in this example how it can be done; it looks like this:
myUrlclass.addheader("Range","bytes=%s-" % (existSize))
EDIT: I just found a better implementation. This class is very simple to use, as can be seen in the docstring.
class HTTPRangeHandler(urllib2.BaseHandler):
    """Handler that enables HTTP Range headers.

    This was extremely simple. The Range header is an HTTP feature to
    begin with so all this class does is tell urllib2 that the
    "206 Partial Content" response from the HTTP server is what we
    expected.

    Example:
        import urllib2
        import byterange

        range_handler = byterange.HTTPRangeHandler()
        opener = urllib2.build_opener(range_handler)

        # install it
        urllib2.install_opener(opener)

        # create Request and set Range header
        req = urllib2.Request('http://www.python.org/')
        req.add_header('Range', 'bytes=30-50')
        f = urllib2.urlopen(req)
    """

    def http_error_206(self, req, fp, code, msg, hdrs):
        # 206 Partial Content Response
        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
        r.code = code
        r.msg = msg
        return r

    def http_error_416(self, req, fp, code, msg, hdrs):
        # HTTP's Range Not Satisfiable error
        raise RangeError('Requested Range Not Satisfiable')
Update: The "better implementation" has moved to github: excid3/urlgrabber in the byterange.py file.
I highly recommend using the requests library. It is easily the best HTTP library I have ever used. In particular, to accomplish what you have described, you would do something like:
import requests
url = "http://www.sffaudio.com/podcasts/ShellGameByPhilipK.Dick.pdf"
# Retrieve bytes between offsets 3 and 5 (inclusive).
r = requests.get(url, headers={"range": "bytes=3-5"})
# If a 4XX client error or a 5XX server error is encountered, we raise it.
r.raise_for_status()
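r.content will then hold just the requested bytes (b'456' for the hypothetical 1234567890 file above), and r.status_code should be 206 (Partial Content) if the server honored the range.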
AFAIK, this is not possible using fseek() or similar. You need to use the HTTP Range header to achieve this. This header may or may not be supported by the server, so your mileage may vary.
import urllib2
myHeaders = {'Range': 'bytes=0-9'}
req = urllib2.Request('http://www.promotionalpromos.com/mirrors/gnu/gnu/bash/bash-1.14.3-1.14.4.diff.gz', headers=myHeaders)
partialFile = urllib2.urlopen(req)
s2 = partialFile.read()
EDIT: This is of course assuming that by remote file you mean a file stored on an HTTP server...
If the file you want is on an FTP server, FTP only allows you to specify a start offset, not a range. If this is what you want, then the following code should do it (not tested!):
import ftplib
fileToRetrieve = 'somefile.zip'
fromByte = 15
ftp = ftplib.FTP('ftp.someplace.net')
outFile = open('partialFile', 'wb')
ftp.retrbinary('RETR '+ fileToRetrieve, outFile.write, rest=str(fromByte))
outFile.close()
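As for the existence-check half of the question: os.path.isfile() only knows about the local filesystem, so over HTTP you have to ask the server. A sketch using a HEAD request with requests (assuming the server answers HEAD; a few only support GET):
import requests

def remote_file_exists(url):
    # remote_file_exists is my own helper name, not a library function.
    # HEAD fetches only the headers; a 200 means the resource is there.
    r = requests.head(url, allow_redirects=True)
    return r.status_code == requests.codes.ok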
You can use httpio to access remote HTTP files as if they were local:
pip install httpio
import zipfile
import httpio

url = "http://some/large/file.zip"
with httpio.open(url) as fp:
    zf = zipfile.ZipFile(fp)
    print(zf.namelist())
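As far as I can tell, httpio does this by translating seeks and reads into HTTP Range requests behind the scenes, so like the other answers it depends on the server supporting the Range header.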
