I am trying to upload a zip file directly to S3 using a Python script, but running into some Unicode Decode Errors.
What I do is generate a Pre-Signed S3 Link and then upload data to it. I know the link works fine because the upload works when I use curl to do it like this:
curl -v -H "Content-Type: application/zip" -T /Path/To/Local/File.zip https://MySignedAWSS3Link
However, when I attempt this in Python using the code below, I get an error.
infile2 = open('/Path/To/Local/File.zip', 'rb')
filedata2 = infile2.read()
request2 = urllib2.Request("https://MySignedAWSS3Link",data=filedata2)
request2.add_header('Content-Type', 'application/zip')
request2.get_method = lambda: 'PUT'
url2 = opener.open(request2)
I get the following error/traceback in Python:
> Traceback (most recent call last):
File "putFiles.py", line 44, in <module>
url2 = opener.open(request2)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1222, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1181, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1007, in _send_request
self.endheaders(body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 827, in _send_output
msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xca in position 10: ordinal not in range(128)
What am I doing wrong here?
According to your traceback, one of the strings that goes into your request is unicode. However, according to your script example, all strings are plain byte strings (since you are using Python 2.7).
Something odd is happening on your end, unless you have set Python's default encoding to UTF-8 (or the OS sets it for you).
I hope this works for you (it converts all strings to byte strings):
infile2 = open('/Path/To/Local/File.zip', 'rb')
filedata2 = infile2.read()
request2 = urllib2.Request("https://MySignedAWSS3Link".encode('utf-8'),data=filedata2)
request2.add_header(str('Content-Type'), str('application/zip'))
request2.get_method = lambda: str('PUT')
url2 = opener.open(request2)
Update: This q/a might help you too
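For comparison, here is a minimal sketch of the same PUT upload, assuming Python 3 is an option. In Python 3, text and bytes are separate types and never get mixed implicitly, so the body is passed as bytes on purpose. The helper name build_put_request is made up for illustration; the URL and path in the usage comment are the placeholders from the question.

```python
import urllib.request


def build_put_request(url, payload, content_type="application/zip"):
    """Build (but do not send) a PUT request whose body is raw bytes."""
    if not isinstance(payload, bytes):
        # Files opened with 'rb' already yield bytes; anything else is a mistake here.
        raise TypeError("payload must be bytes, e.g. read from a file opened with 'rb'")
    request = urllib.request.Request(url, data=payload, method="PUT")
    request.add_header("Content-Type", content_type)
    return request


# Usage sketch (placeholder path and URL, not executed here):
# with open('/Path/To/Local/File.zip', 'rb') as infile:
#     req = build_put_request("https://MySignedAWSS3Link", infile.read())
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the method, headers and body before anything goes over the network.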
Problem: I have a text file with names written in Russian. I take each name from the text file and form a request to Wikipedia, using the line from the text file as the page title. Then I want to get information about all existing images on that page.
Program:
with open('names-video.txt', "r", encoding='Windows-1251') as file:
    for line in file.readlines():
        print(line)
        name = "_".join(line.split())
        print(name)
        html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
        bs = BeautifulSoup(html, 'html.parser')
        images = bs.findAll('img', {'src': re.compile('.jpg')})
        print(images[0])
names-video.txt:
Алимпиев, Виктор Гелиевич
Андреев, Алексей Викторович (художник)
Баевер, Антонина
Булдаков, Алексей Александрович
Жестков, Максим Евгеньевич
Канис, Полина Владимировна
Мустафин, Денис Рафаилович
Преображенский, Кирилл Александрович
Селезнёв, Владимир Викторович
Сяйлев, Андрей Фёдорович
Шерстюк, Татьяна Александровна
Error message:
error from callback <bound method SocketHandler.handle_message of <amino.socket.SocketHandler object at 0x0000018B92600FA0>>: 'ascii' codec can't encode characters in position 10-17: ordinal not in range(128)
File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\websocket\_app.py", line 344, in _callback
callback(*args)
File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 80, in handle_message
self.client.handle_socket_message(data)
File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\client.py", line 345, in handle_socket_message
return self.callbacks.resolve(data)
File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 204, in resolve
return self.methods.get(data["t"], self.default)(data)
File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 192, in _resolve_chat_message
return self.chat_methods.get(key, self.default)(data)
File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 221, in on_text_message
def on_text_message(self, data): self.call(getframe(0).f_code.co_name, objects.Event(data["o"]).Event)
File "C:\Users\1\Desktop\ИНФА\pycharm\venv\lib\site-packages\amino\socket.py", line 209, in call
handler(data)
File "C:\Users\1\Desktop\python-bots\music_bot\bot.py", line 56, in on_text_message
html = urlopen(f'https://ru.wikipedia.org/wiki/{name}')
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
response = self._open(req, data)
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1385, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1342, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1266, in _send_request
self.putrequest(method, url, **skips)
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1104, in putrequest
self._output(self._encode_request(request))
File "C:\Users\1\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1184, in _encode_request
return request.encode('ascii')
Question: For some reason the code breaks on urlopen(). print(line) and print(name) work just fine. What can be the problem here? I've been trying to tackle this issue for quite a while and I will appreciate any solution, thanks in advance.
You'll need to percent encode the non-ASCII characters to make it a proper URI:
from urllib.parse import quote
...
name = "_".join(line.split())
# Percent encode the UTF-8 characters
name = quote(name)
print(name)
...
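As a self-contained illustration of the snippet above, here is a small sketch. The helper name wiki_url is made up for this example, and the title is one line from names-video.txt. It only builds the URL and does not contact Wikipedia.

```python
from urllib.parse import quote


def wiki_url(title):
    """Build an ASCII-only Wikipedia URL from a page title."""
    # Join the words with underscores, as in the question.
    name = "_".join(title.split())
    # quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
    return "https://ru.wikipedia.org/wiki/" + quote(name)


url = wiki_url("Баевер, Антонина")
# The result is pure ASCII, so the encode step inside http.client no longer fails.
url.encode("ascii")
print(url)
```

Only the title is quoted, not the whole URL, so the scheme and host stay intact.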
Note: the link listed in the comment is for Python 2.7, but this question pertains to Python 3.7.
I'm using Python 3.7 and Django. I want to read from a URL that has special characters in its string, but get errors when I try the traditional way ...
>>> url = "https://www.supergaming.com/f/gaming/article/pvmqe/was_browsing_the_steam_app_reviews_and_ಠ_ಠ/"
...
>>> html = urllib2.urlopen(req, 5000).read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1240, in _send_request
self.putrequest(method, url, **skips)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1107, in putrequest
self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u0ca0' in position 69: ordinal not in range(128)
So I tried the solution recommended in "How to convert a url string to safe characters with python?", but I'm still unable to read the URL:
>>> urllib.parse.quote_plus(url)
'https%3A%2F%2Fwww.supergaming.com%2Ff%2Fgaming%2Farticle%2Fpvmqe%2Fwas_browsing_the_steam_app_reviews_and_%E0%B2%A0_%E0%B2%A0%2F'
>>> req = urllib2.Request(urllib.parse.quote_plus(url))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 328, in __init__
self.full_url = url
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 354, in full_url
self._parse()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 383, in _parse
raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'https%3A%2F%2Fwww.supergaming.com%2Fr%2Fgaming%2Farticle%2Fpvmqe%2Fwas_browsing_the_steam_app_reviews_and_%E0%B2%A0_%E0%B2%A0%2F'
What's the proper way to read from a URL if it contains special characters?
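The ValueError above comes from quote_plus() being applied to the whole URL, which also encodes the "://" after the scheme. One possible approach, shown here as a sketch rather than a definitive answer, is to split the URL and percent-encode only the path. The helper name quote_url_path is made up for illustration, and the sketch assumes the path is not already percent-encoded (otherwise the "%" signs would be encoded a second time) and that the query string is plain ASCII.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode only the path component of a URL."""
    parts = urlsplit(url)
    # quote() leaves '/' alone by default, so the path structure survives.
    safe_path = quote(parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


url = "https://www.supergaming.com/f/gaming/article/pvmqe/was_browsing_the_steam_app_reviews_and_ಠ_ಠ/"
safe_url = quote_url_path(url)
print(safe_url)
```

The resulting string still starts with "https://", so urllib.request.Request can recognise the URL type, and it contains only ASCII characters.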
I'm trying to pull some JSON data from an API using urllib in Python 3.6. It requires header information to be passed for authorization. Here is my code:
import urllib.request, json
headers = {"authorization" : "Bearer {authorization_token}"}
with urllib.request.urlopen("{api_url}", data=headers) as url:
    data = json.loads(url.read().decode())
    print(data)
And the error message I get:
Traceback (most recent call last):
File "getter.py", line 5, in <module>
with urllib.request.urlopen("{url}", data=headers) as url:
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 526, in open
response = self._open(req, data)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 544, in _open
'_open', req)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1361, in https_open
context=self._context, check_hostname=self._check_hostname)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1318, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1064, in _send_output
+ b'\r\n'
TypeError: can't concat bytes to str
Process finished with exit code 1
I'm not sure what's going wrong here. I'm not passing in any bytes, so I don't understand why the error says I can't concat bytes to str.
The data argument is expected to be a bytes-like object. You need to do the following:
urllib.request.urlopen("{api_url}", data=bytes(json.dumps(headers), encoding="utf-8"))
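Note that data is the request body, so the call above sends the dictionary as a POST payload rather than as HTTP headers. If the goal is to send an Authorization header, one option is to build a urllib.request.Request with its headers argument. This is a sketch only: the helper name build_authorized_request is made up, and "{api_url}" and "{authorization_token}" remain the placeholders from the question.

```python
import json
import urllib.request


def build_authorized_request(api_url, token):
    """Build (but do not send) a GET request carrying a bearer token."""
    headers = {"Authorization": "Bearer " + token}
    # With no data argument the request has no body and defaults to GET.
    return urllib.request.Request(api_url, headers=headers)


# Usage sketch (placeholders from the question, not executed here):
# req = build_authorized_request("{api_url}", "{authorization_token}")
# with urllib.request.urlopen(req) as response:
#     data = json.loads(response.read().decode())
#     print(data)
```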
So I have a little script that I would like to use to upload some PDFs to my citation-site-of-choice (citeulike.org).
The thing is, it's not working. It does this:
so want to upload /Users/willwade/Dropbox/Papers/price_promoting_643127.pdf to 12589610
Traceback (most recent call last):
File "citeuupload.py", line 167, in <module>
cureader.parseUserBibTex()
File "citeuupload.py", line 160, in parseUserBibTex
self.uploadFileToCitation(b['citeulike-article-id'],self.localpapers+fileorfalse)
File "citeuupload.py", line 138, in uploadFileToCitation
resp = self.browser.submit()
File "build/bdist.macosx-10.8-intel/egg/mechanize/_mechanize.py", line 541, in submit
File "build/bdist.macosx-10.8-intel/egg/mechanize/_mechanize.py", line 203, in open
File "build/bdist.macosx-10.8-intel/egg/mechanize/_mechanize.py", line 230, in _mech_open
File "build/bdist.macosx-10.8-intel/egg/mechanize/_opener.py", line 193, in open
File "build/bdist.macosx-10.8-intel/egg/mechanize/_urllib2_fork.py", line 344, in _open
File "build/bdist.macosx-10.8-intel/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
File "build/bdist.macosx-10.8-intel/egg/mechanize/_urllib2_fork.py", line 1142, in http_open
File "build/bdist.macosx-10.8-intel/egg/mechanize/_urllib2_fork.py", line 1115, in do_open
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 955, in request
self._send_request(method, url, body, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 989, in _send_request
self.endheaders(body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 951, in endheaders
self._send_output(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 809, in _send_output
msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 544: ordinal not in range(128)
and the code:
def uploadFileToCitation(self,artid,file):
    print 'so want to upload', file, ' to ', artid
    self.browser.open('http://www.citeulike.org/user/'+cUser+'/article/'+artid)
    self.browser.select_form(name="fileupload_frm")
    self.browser.form.add_file(open(file, 'rb'), 'application/pdf', file, name='file')
    try:
        resp = self.browser.submit()
        self.wait_for_api_limit()
    except mechanize.HTTPError, e:
        print 'error'
        print e.getcode()
        print resp.read()
        exit()
NB: I can see it's reading in the file correctly (and it does exist). Also note that I'm doing this elsewhere
self.browser = mechanize.Browser()
self.browser.set_handle_robots(False)
self.browser.addheaders = [
    ("User-agent", 'me@me.com citeusyncpy/1.0'),
]
Full code is here
Try to check this similar question.
To clarify, the message is constructed in httplib from the method, URL, headers, etc. If any of these is Unicode, the whole string gets converted to Unicode (I presume this is normal Python behavior). Then if you try to append a UTF-8 string you get the error I described in the original question...
From the looks of it, this is an encoding problem that a proper header can fix.
Also you can check this issue.
I'm completely new to Flask. I have some code that will copy a file over to a virtual machine using the pysphere library. This works fine on its own, but when I try using a Flask app, I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 2: ordinal not in range(128)
At first I figured it is just because the web form is passing something it doesn't like. However, I decided to hard code the values and it still fails. Here is the code:
@app.route('/begin_install', methods=['POST'])
def begin_install():
    source_installer_path = app.root_path + '/installers'
    installer_file = str(request.form['installer'])
    option_file_path = app.root_path + '/installers/options'
    option_file = 'testing.options'
    vmserver.start_install(request.form['vm'],
                           source_installer_path,
                           installer_file,
                           option_file_path,
                           option_file)
    return render_template('results.html')
Then, in my pysphere related file:
def start_install(self, vmpath, installer_path, installer_file, options_path, options_file):
    vm.revert_to_named_snapshot('python_install')
    vm.power_on()
    while vm.get_tools_status() != 'RUNNING':
        sleep(3)
    vm.login_in_guest(self.guest_user, self.guest_password)
    vm.send_file('C:\\folder\\filetosend.exe', 'c:\\installer\\filename.exe')
Everything up to the "vm.send_file" works perfectly fine. If I call the same code from a non-Flask app, it also works perfectly fine. I'm very confused about why I'm getting errors from Flask when this part of the code is all pysphere.
EDIT: Here is the traceback
Traceback (most recent call last):
File "C:\Users\username\Flask\lib\site-packages\flask\app.py", line 1701, in __call__
return self.wsgi_app(environ, start_response)
File "C:\Users\username\Flask\lib\site-packages\flask\app.py", line 1689, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "C:\Users\username\Flask\lib\site-packages\flask\app.py", line 1687, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\username\Flask\lib\site-packages\flask\app.py", line 1360, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\username\Flask\lib\site-packages\flask\app.py", line 1358, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\username\Flask\lib\site-packages\flask\app.py", line 1344, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "C:\python\PycharmProjects\installtest\installtest.py", line 51, in begin_install
option_file)
File "C:\python\PycharmProjects\installtest\testmachines.py", line 56, in start_install
'c:\\installer\\filename.exe')
File "C:\Users\username\Flask\lib\site-packages\pysphere\vi_virtual_machine.py", line 1282, in send_file
resp = opener.open(request)
File "C:\Python27\Lib\urllib2.py", line 400, in open
response = self._open(req, data)
File "C:\Python27\Lib\urllib2.py", line 418, in _open
'_open', req)
File "C:\Python27\Lib\urllib2.py", line 378, in _call_chain
result = func(*args)
File "C:\Python27\Lib\urllib2.py", line 1215, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "C:\Python27\Lib\urllib2.py", line 1174, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "C:\Python27\Lib\httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "C:\Python27\Lib\httplib.py", line 992, in _send_request
self.endheaders(body)
File "C:\Python27\Lib\httplib.py", line 954, in endheaders
self._send_output(message_body)
File "C:\Python27\Lib\httplib.py", line 812, in _send_output
msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 2: ordinal not in range(128)
As you can see in the traceback, pysphere uses urllib2, which in turn uses httplib.
As far as I know, httplib concatenates the whole HTTP request without checking for unicode strings, and in Python 2 the result of a concatenation is always unicode if one of the parts is unicode.
I think your pysphere call somehow gets polluted with unicode values, either in the request or in the opener used by the send_file method:
File "C:\Users\username\Flask\lib\site-packages\pysphere\vi_virtual_machine.py", line 1282, in send_file
resp = opener.open(request)
You should check how the request and the opener are created, how your vm instance is configured, and in particular which header is unicode and how that affects the request or the opener instantiation.