I'm trying to upload files with the following code:
url = "/folder/sub/interface?"
connection = httplib.HTTPConnection('www.mydomain.com')
def sendUpload(self):
fields = []
file1 = ['file1', '/home/me/Desktop/sometextfile.txt']
f = open(file1[1], 'r')
file1.append(f.read())
files = [file1]
content_type, body = self.encode_multipart_formdata(fields, files)
myheaders['content-type'] = content_type
myheaders['content-length'] = str(len(body))
upload_data = urllib.urlencode({'command':'upload'})
self.connection.request("POST", self.url + upload_data, {}, myheaders)
response = self.connection.getresponse()
if response.status == 200:
data = response.read()
self.connection.close()
print data
The encode_multipart_formdata() comes from http://code.activestate.com/recipes/146306/
When I execute the method it takes a long time to complete; in fact, I don't think it ever finishes. On the network monitor I can see that data is being transferred, but the method never returns.
Why is that? Should I set a timeout somewhere?
You don't seem to be sending the body of your request to the server, so it's probably stuck waiting for Content-Length bytes to arrive, which they never do.
Are you sure that
self.connection.request("POST", self.url + upload_data, {}, myheaders)
shouldn't read
self.connection.request("POST", self.url + upload_data, body, myheaders)
?
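For reference, here is a minimal sketch of the corrected call, reusing the names from the question (encode_multipart_formdata() from the linked recipe, myheaders assumed to be a plain dict); it illustrates the fix above and is not tested against your server:

# Build the multipart body, then actually pass it to request() so it gets sent.
content_type, body = self.encode_multipart_formdata(fields, files)
myheaders['content-type'] = content_type
myheaders['content-length'] = str(len(body))

upload_data = urllib.urlencode({'command': 'upload'})
# The third argument is the request body; passing an empty dict means nothing is transmitted.
self.connection.request("POST", self.url + upload_data, body, myheaders)
response = self.connection.getresponse()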
In the browser I'm using the Vue element-ui el-upload component to upload a file, with aiohttp as the backend receiving the form data and saving it. But aiohttp's request.multipart() is always blank, while request.post() works fine.
vue:
<el-upload class="image-uploader"
           :data="dataObj"
           drag
           name="aaa"
           :multiple="false"
           :show-file-list="false"
           :action="action"          -> upload url, passed from the outer component
           :on-success="handleImageScucess">
  <i class="el-icon-upload"></i>
</el-upload>
export default {
  name: 'singleImageUpload3',
  props: {
    value: String,
    action: String
  },
  methods: {
    handleImageScucess(file) {
      this.emitInput(file.files.file)
    }
  }
}
aiohttp: not working
async def post_image(self, request):
    reader = await request.multipart()
    image = await reader.next()
    print(image.text())
    filename = image.filename
    print(filename)
    size = 0
    with open(os.path.join('', 'aaa.jpg'), 'wb') as f:
        while True:
            chunk = await image.read_chunk()
            print("chunk", chunk)
            if not chunk:
                break
            size += len(chunk)
            f.write(chunk)
    return await self.reply_ok([])
aiohttp: working
async def post_image(self, request):
    data = await request.post()
    print(data)
    mp3 = data['aaa']
    filename = mp3.filename
    mp3_file = data['aaa'].file
    content = mp3_file.read()
    with open('aaa.jpg', 'wb') as f:
        f.write(content)
    return await self.reply_ok([])
Is this a bug, or did I miss something? Please help me solve it. Thanks in advance.
I think you may have checked the example in the aiohttp documentation for a file upload server. But that snippet is ambiguous, and the documentation doesn't explain it very well.
After digging around its source code for a while, I found that request.multipart() actually yields a MultipartReader instance, which processes one field of the multipart/form-data request per call to .next(), yielding a BodyPartReader instance each time.
In your non-working code, the line image = await reader.next() reads out exactly one field from the form data, and you can't be sure which field that is. It might be the token field, the key field, the filename field, the aaa field... or any of them. So in your non-working example, the post_image coroutine only processes a single field of the request data, and you can't be at all sure that it is the aaa file field.
Here's my code snippet:
async def post_image(self, request):
    # Iterate through each field of the MultipartReader
    async for field in (await request.multipart()):
        if field.name == 'token':
            # Do something about token
            token = (await field.read()).decode()
            pass
        if field.name == 'key':
            # Do something about key
            pass
        if field.name == 'filename':
            # Do something about filename
            pass
        if field.name == 'aaa':
            # Process any files you uploaded
            filename = field.filename
            # In your example, filename should be "2C80...jpg"

            # Deal with actual file data
            size = 0
            with open(os.path.join('', filename), 'wb') as fd:
                while True:
                    chunk = await field.read_chunk()
                    if not chunk:
                        break
                    size += len(chunk)
                    fd.write(chunk)

    # Reply ok, all fields processed successfully
    return await self.reply_ok([])
The snippet above can also deal with multiple files in a single request that share a duplicate field name, 'aaa' in your example. The filename in the Content-Disposition header is filled in automatically by the browser itself, so there is no need to worry about the filename.
BTW, when dealing with file uploads in requests, data = await request.post() will eat up a considerable amount of memory to load the file data. So request.post() should be avoided when file uploads are involved; use request.multipart() instead.
I am moving to Python from another language and I am not sure how to properly tackle this. Using the urllib2 library it is quite easy to set up a proxy and get data from a site:
import urllib2
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()
The problem I have is that the text file that is retrieved is very large (hundreds of MB) and the connection is often problematic. The code also needs to catch connection, server and transfer errors (it will be part of a small, extensively used pipeline).
Could anyone suggest how to modify the code above to make sure the code automatically reconnects n times (for example 100 times), and perhaps split the response into chunks so the data is downloaded faster and more reliably?
I have already split the requests as much as I could, so now I have to make sure that the retrieval code is as good as it can be. Solutions based on core Python libraries are ideal.
Perhaps the library is already doing the above, in which case: is there any way to improve downloading large files? I am using UNIX and need to deal with a proxy.
Thanks for your help.
I'm putting up an example of how you might want to do this with the python-requests library. The script below checks whether the destination file already exists. If it does, it is assumed to be a partially downloaded file and the script tries to resume the download. If the server claims to support HTTP partial requests (i.e. the response to a HEAD request contains an Accept-Ranges header), the script resumes based on the size of the partially downloaded file; otherwise it just does a regular download and discards the parts that are already downloaded. I think it should be fairly straightforward to convert this to use just urllib2 if you don't want to use python-requests; it will probably just be much more verbose.
Note that resuming a download may corrupt the file if the file on the server was modified between the initial download and the resume. This can be detected if the server supports a strong HTTP ETag header, so the downloader can check whether it is resuming the same file.
I make no claim that it is bug-free.
You should probably add checksum logic around this script to detect download errors and retry from scratch if the checksum doesn't match; rough sketches of the checksum and ETag ideas follow after the script.
import logging
import os
import re
import requests

CHUNK_SIZE = 5 * 1024  # 5KB
logging.basicConfig(level=logging.INFO)


def stream_download(input_iterator, output_stream):
    for chunk in input_iterator:
        output_stream.write(chunk)


def skip(input_iterator, output_stream, bytes_to_skip):
    total_read = 0
    while total_read <= bytes_to_skip:
        chunk = next(input_iterator)
        total_read += len(chunk)
    output_stream.write(chunk[bytes_to_skip - total_read:])
    assert total_read == output_stream.tell()
    return input_iterator


def resume_with_range(url, output_stream):
    dest_size = output_stream.tell()
    headers = {'Range': 'bytes=%s-' % dest_size}
    resp = requests.get(url, stream=True, headers=headers)
    input_iterator = resp.iter_content(CHUNK_SIZE)
    if resp.status_code != requests.codes.partial_content:
        logging.warn('server does not agree to do partial request, skipping instead')
        input_iterator = skip(input_iterator, output_stream, output_stream.tell())
        return input_iterator
    rng_unit, rng_start, rng_end, rng_size = re.match(
        r'(\w+) (\d+)-(\d+)/(\d+|\*)', resp.headers['Content-Range']).groups()
    rng_start, rng_end, rng_size = map(int, [rng_start, rng_end, rng_size])
    assert rng_start <= dest_size
    if rng_start != dest_size:
        logging.warn('server returned different Range than requested')
        output_stream.seek(rng_start)
    return input_iterator


def download(url, dest):
    ''' Download `url` to `dest`, resuming if `dest` already exists

    If `dest` already exists it is assumed to be a partially
    downloaded file for the url.
    '''
    output_stream = open(dest, 'ab+')
    output_stream.seek(0, os.SEEK_END)
    dest_size = output_stream.tell()

    if dest_size == 0:
        logging.info('STARTING download from %s to %s', url, dest)
        resp = requests.get(url, stream=True)
        input_iterator = resp.iter_content(CHUNK_SIZE)
        stream_download(input_iterator, output_stream)
        logging.info('FINISHED download from %s to %s', url, dest)
        return

    remote_headers = requests.head(url).headers
    remote_size = int(remote_headers['Content-Length'])
    if dest_size < remote_size:
        logging.info('RESUMING download from %s to %s', url, dest)
        support_range = 'bytes' in [s.strip() for s in remote_headers['Accept-Ranges'].split(',')]
        if support_range:
            logging.debug('server supports Range request')
            logging.debug('downloading "Range: bytes=%s-"', dest_size)
            input_iterator = resume_with_range(url, output_stream)
        else:
            logging.debug('skipping %s bytes', dest_size)
            resp = requests.get(url, stream=True)
            input_iterator = resp.iter_content(CHUNK_SIZE)
            input_iterator = skip(input_iterator, output_stream, bytes_to_skip=dest_size)
        stream_download(input_iterator, output_stream)
        logging.info('FINISHED download from %s to %s', url, dest)
        return

    logging.debug('NOTHING TO DO')
    return


def main():
    TEST_URL = 'http://mirror.internode.on.net/pub/test/1meg.test'
    DEST = TEST_URL.split('/')[-1]
    download(TEST_URL, DEST)

main()
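As a rough sketch of the checksum suggestion above (the expected_sha256 argument, the retry count and the helper name are illustrative assumptions, not part of the original script):

import hashlib
import os

def download_with_checksum(url, dest, expected_sha256, retries=3):
    # Wrap download() from the script above: verify the result and retry from scratch on mismatch.
    for _ in range(retries):
        download(url, dest)
        digest = hashlib.sha256()
        with open(dest, 'rb') as f:
            for block in iter(lambda: f.read(8192), b''):
                digest.update(block)
        if digest.hexdigest() == expected_sha256:
            return True
        os.remove(dest)  # corrupted or mismatched download: start again from scratch
    return False

And a similarly hedged sketch of the ETag check mentioned earlier, assuming the server returns a strong ETag (the .etag side-file convention is only an assumption for illustration):

import os
import requests

def can_resume_same_file(url, dest):
    # Compare the server's current ETag with the one saved when the download started.
    etag_path = dest + '.etag'
    if not os.path.exists(etag_path):
        return False
    with open(etag_path) as f:
        stored_etag = f.read().strip()
    current_etag = requests.head(url).headers.get('ETag', '')
    # Weak validators (prefixed with W/) are not reliable for byte-range resumes.
    if not current_etag or current_etag.startswith('W/'):
        return False
    return current_etag == stored_etag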
You can try something like this. It reads the file line by line and appends each line to an output file, checking that the same line isn't written twice. I'll write another script that does it by chunks as well.
import urllib2

file_checker = None
print("Please Wait...")
while True:
    try:
        req = urllib2.Request('http://www.voidspace.org.uk')
        response = urllib2.urlopen(req, timeout=20)
        print("Connected")
        with open("outfile.html", 'w+') as out_data:
            for data in response.readlines():
                file_checker = open("outfile.html")
                if data not in file_checker.readlines():
                    out_data.write(str(data))
        break
    except urllib2.URLError:
        print("Connection Error!")
        print("Connecting again...please wait")
        file_checker.close()
print("done")
Here's how to read the data in chunks instead of by lines
import urllib2

CHUNK = 16 * 1024
file_checker = None
print("Please Wait...")
while True:
    try:
        req = urllib2.Request('http://www.voidspace.org.uk')
        response = urllib2.urlopen(req, timeout=1)
        print("Connected")
        with open("outdata", 'wb+') as out_data:
            while True:
                chunk = response.read(CHUNK)
                file_checker = open("outdata")
                if chunk and chunk not in file_checker.readlines():
                    out_data.write(chunk)
                else:
                    break
        break
    except urllib2.URLError:
        print("Connection Error!")
        print("Connecting again...please wait")
        file_checker.close()
print("done")
I have a small flask application which takes some images for upload and converts them into a multipage tiff. Nothing special.
But how do I test the upload of multiple files and the file download?
My test client:
import imp
import os
import unittest
from StringIO import StringIO  # Python 2; use io.BytesIO on Python 3


class RestTestCase(unittest.TestCase):
    def setUp(self):
        self.dir = os.path.dirname(__file__)
        rest = imp.load_source('rest', self.dir + '/../rest.py')
        rest.app.config['TESTING'] = True
        self.app = rest.app.test_client()

    def runTest(self):
        with open(self.dir + '/img/img1.jpg', 'rb') as img1:
            img1StringIO = StringIO(img1.read())
            response = self.app.post('/convert',
                                     content_type='multipart/form-data',
                                     data={'photo': (img1StringIO, 'img1.jpg')},
                                     follow_redirects=True)
        assert True

if __name__ == "__main__":
    unittest.main()
The application sends back the file with
return send_file(result, mimetype='image/tiff',
                 as_attachment=True)
I want to read the file sent in the response and compare it with another file. How do I get the file from the response object?
I think maybe the confusion here is that response is a Response object and not the data downloaded by the post request. This is because an HTTP response has other attributes that are often useful to know, for example the HTTP status code returned, the mime-type of the response, and so on. The attribute names for accessing these are listed in the Flask/Werkzeug Response documentation.
The response object has an attribute called data, so response.data will contain the data downloaded from the server. Those docs indicate that data is soon to be deprecated and that the get_data() method should be used instead, but the testing tutorial still uses data. Test on your own system to see what works. Assuming you want to test a round trip of the data:
def runTest(self):
    with open(self.dir + '/img/img1.jpg', 'rb') as img1:
        img1StringIO = StringIO(img1.read())
        response = self.app.post('/convert',
                                 content_type='multipart/form-data',
                                 data={'photo': (img1StringIO, 'img1.jpg')},
                                 follow_redirects=True)
    img1StringIO.seek(0)
    assert response.data == img1StringIO.read()
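Since get_data() is the non-deprecated accessor mentioned above, the same comparison can also be written as follows (a small, untested variation on the snippet above):

img1StringIO.seek(0)
assert response.get_data() == img1StringIO.read()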
I have written a Python script which POSTs data to an Apache web server, and the data arrives nicely in the $_POST and $_FILES arrays. Now I want to implement the same thing in Lua, but I can't get it going yet.
My code in Python looks something like this:
try:
    wakeup()
    socket.setdefaulttimeout(TIMEOUT)
    opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
    host = HOST
    func = "post_img"
    url = "http://{0}{1}?f={2}&nodemac={3}&time={4}".format(host, URI, func, nodemac, timestamp)
    if os.path.isfile(filename):
        data = {"data": open(filename, "rb")}
        print "POST time " + str(time.time())
        response = opener.open(url, data, timeout=TIMEOUT)
        retval = response.read()
        if "SUCCESS" in retval:
            return 0
        else:
            print "RETVAL: " + retval
            return 99
except Exception as e:
    print "EXCEPTION time " + str(time.time()) + " - " + str(e)
    return 99
The Lua code I have come up with thus far:
#! /usr/bin/lua
http = require("socket.http")
ltn12 = require("ltn12")

http.request{
    url = "localhost/test.php?test=SEMIOS",
    method = "POST",
    headers = {
        ["Content-Type"] = "multipart/form-data; boundary=127.0.1.1.1000.17560.1375897994.242.1",
        ["Content-Length"] = 7333
    },
    source = ltn12.source.file(io.open("test.gif")),
    sink = ltn12.sink.table(response_body)
}
print(response_body[1]) --response to request
but this code keeps giving me the following error when I run it:
$ ./post.lua
/usr/bin/lua: ./post.lua:17: attempt to index global 'response_body' (a nil value)
stack traceback:
./post.lua:17: in main chunk
[C]: ?
reg#DesktopOffice:~$
There are several examples of sending POST data using Lua: from the author of luasocket and SO. This example works directly with files, which is very close to what you are using.
Your description of this question doesn't match the comment you provided.
I am trying to unshorten a lot of URLs which I have in a urlSet. The following code works most of the time, but sometimes it takes a very long time to finish. For example, I have 2950 URLs in urlSet; stderr tells me that 2900 are done, but getUrlMapping does not finish.
import sys

import grequests


def getUrlMapping(urlSet):
    # get the url mapping
    urlMapping = {}
    # rs = (grequests.get(u) for u in urlSet)
    rs = (grequests.head(u) for u in urlSet)
    res = grequests.imap(rs, size=100)
    counter = 0
    for x in res:
        counter += 1
        if counter % 50 == 0:
            sys.stderr.write('Doing %d url_mapping length %d \n' % (counter, len(urlMapping)))
        urlMapping[getOriginalUrl(x)] = getGoalUrl(x)
    return urlMapping


def getGoalUrl(resp):
    url = ''
    try:
        url = resp.url
    except:
        url = 'NULL'
    return url


def getOriginalUrl(resp):
    url = ''
    try:
        url = resp.history[0].url
    except IndexError:
        url = resp.url
    except:
        url = 'NULL'
    return url
This probably won't help you, as a lot of time has passed, but still...
I was having some issues with Requests similar to the ones you are having. To me the problem was that Requests took ages to download some pages, while with any other software (browsers, curl, wget, Python's urllib) everything worked fine...
After a LOT of wasted time, I noticed that the server was sending some invalid headers; for example, on one of the "slow" pages, after Content-type: text/html it began to send headers in the form Header-name : header-value (notice the space before the colon). This somehow breaks the email.header functionality that Requests uses to parse HTTP headers, so the Transfer-encoding: chunked header wasn't being parsed.
Long story short: manually setting the chunked property on the raw response to True before asking for the content solved the issue. For example:
response = requests.get('http://my-slow-url')
print(response.text)
took ages but
response = requests.get('http://my-slow-url')
response.raw.chunked = True
print(response.text)
worked great!