The system cannot find the file specified in `edx-dl` module - python

I am trying to use this Python module: https://github.com/coursera-dl/edx-dl
Please excuse my basic knowledge.
I installed Anaconda 3 on Windows 10, then:
pip install edx-dl
pip install --upgrade youtube-dl
Then, to get the courses, I ran:
edx-dl -u user@user.com --list-courses
edx-dl -u user@user.com COURSE_URL
This all worked; however, once downloads actually started, I was getting:
Got SSL/Connection error: HTTP Error 403: Forbidden
Fiddler showed that it was being blocked by Cloudflare, I suspect due to the User-Agent header.
I then installed fake-useragent (https://pypi.python.org/pypi/fake-useragent) and added:
from fake_useragent import UserAgent  # added this

def edx_get_headers():
    """
    Build the Open edX headers to create future requests.
    """
    logging.info('Building initial headers for future requests.')

    headers = {
        'User-Agent': 'edX-downloader/0.01',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8',
        'Referer': EDX_HOMEPAGE,
        'X-Requested-With': 'XMLHttpRequest',
        'X-CSRFToken': _get_initial_token(EDX_HOMEPAGE),
    }

    ua = UserAgent()  # added this
    headers['User-Agent'] = ua.ie  # added this
It then downloaded a PDF and an XLS but hit another error because request.py adds its own header, so I added fake-useragent to request.py as well and commented out the default header, as below.
from fake_useragent import UserAgent

ub = UserAgent()
self.addheaders = [('User-Agent', ub.ie)]
# self.addheaders = [('User-Agent', self.version), ('Accept', '*/*')]
The new error is below. I can't work out how to troubleshoot further; I suspect it can't find a file or path, possibly due to Windows.
[download] https://youtube.com/watch?v=bKkrDLwDnDE => Downloaded\Implementing_ETL_with_SQL_Server_Integration_Services\02-Module_1__ETL_Processing\01-%(title)s-%(id)s.%(ext)s
Downloading video with URL https://youtube.com/watch?v=bKkrDLwDnDE from YouTube.
Traceback (most recent call last):
  File "edx-dl.py", line 6, in <module>
    edx_dl.main()
  File "c:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 1080, in main
    download(args, selections, filtered_units, headers)
  File "c:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 857, in download
    headers)
  File "c:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 819, in download_unit
    headers)
  File "c:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 801, in download_video
    skip_or_download(youtube_downloads, headers, args)
  File "c:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 788, in skip_or_download
    f(url, filename, headers, args)
  File "c:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 721, in download_url
    download_youtube_url(url, filename, headers, args)
  File "c:\edx-dl-master\edx-dl-master\edx_dl\edx_dl.py", line 761, in download_youtube_url
    execute_command(cmd, args)
  File "c:\edx-dl-master\edx-dl-master\edx_dl\utils.py", line 37, in execute_command
    subprocess.check_call(cmd)
  File "C:\Users\anton\Anaconda3\lib\subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "C:\Users\anton\Anaconda3\lib\subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\Users\anton\Anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\anton\Anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
This is the same issue as here; however, no resolution or assistance had been provided, so I thought I would try here instead.
https://github.com/coursera-dl/edx-dl/issues/368
Advice on how to learn to troubleshoot this would be appreciated.

I debugged the code and found that it couldn't find youtube-dl.
I checked echo %PATH% and realised I had a path to
C:...\Anaconda3\ but not to C:...\Anaconda3\Scripts\ (the location of youtube-dl.exe).
I had added this path but not rebooted.
After rebooting, the issue is resolved.
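For anyone hitting the same FileNotFoundError, a minimal sketch of the check that would have found this (it assumes Python 3, where shutil.which is in the standard library; youtube-dl is the executable edx-dl shells out to):

import shutil
import subprocess

# FileNotFoundError: [WinError 2] from subprocess means the executable
# itself was not found; shutil.which does the same PATH lookup Popen does.
exe = shutil.which('youtube-dl')
if exe is None:
    print('youtube-dl is not on PATH; add the Anaconda3\\Scripts folder and restart the shell')
else:
    print('Found:', exe)
    subprocess.check_call([exe, '--version'])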

There is another easy solution with no need for fake-useragent: just use another downloader, such as wget.
Install a fresh edx-dl.
If you are on Windows, download wget and save it, for example, on the H: drive.
Change the download_url function like this:
def download_url(url, filename, headers, args):
    """
    Downloads the given url in filename.
    """
    if is_youtube_url(url):
        download_youtube_url(url, filename, headers, args)
    else:
        # jcline
        # Raw string so the backslash in the Windows path is not treated
        # as an escape sequence.
        cmd = [r"h:\wget.exe", url, '-c', '-O', filename,
               '--keep-session-cookies', '--no-check-certificate']
        execute_command(cmd, args)
(Source)
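A slightly more portable variant of the same patch, as a sketch: it assumes Python 3 (for shutil.which) and falls back to the hard-coded H: copy from the answer above when wget is not on PATH. The surrounding helpers (is_youtube_url, download_youtube_url, execute_command) are edx-dl's own.

def download_url(url, filename, headers, args):
    """
    Downloads the given url in filename.
    """
    import shutil
    if is_youtube_url(url):
        download_youtube_url(url, filename, headers, args)
    else:
        # Prefer a wget found on PATH; fall back to the fixed location.
        wget = shutil.which('wget') or r'h:\wget.exe'
        cmd = [wget, url, '-c', '-O', filename,
               '--keep-session-cookies', '--no-check-certificate']
        execute_command(cmd, args)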

Related

When I use conn.getresponse(), there are some errors

import sys
import pdb
import http.client

def PassParse():
    headers = {"Accept": " application/json, text/plain, */*",
               "Authorization": " Basic YWRtaW46YXNkZg==",
               "Referer": " http://192.168.1.113:8080/#/apps",
               "Accept-Language": " zh-CN",
               "Accept-Encoding": " gzip, deflate",
               "User-Agent": " Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko LBBROWSER",
               "Host": " 192.168.1.113:8080",
               "DNT": " 1",
               "Connection": " Keep-Alive"}
    conn = http.client.HTTPConnection("192.168.1.113:8080")
    conn.request(method="Get", url="/api/v1/login", body=None, headers=headers)
    response = conn.getresponse()
    responseText = response.getheaders("content-lentgh")
    print("succ!^_^!")
    #print(response.status)
    print(responseText)
    conn.close()
The error when run:
Traceback (most recent call last):
  File "F:\Python\test1-3.4.py", line 32, in <module>
    PassParse();
  File "F:\Python\test1-3.4.py", line 24, in PassParse
    response = conn.getresponse();
  File "E:\program files\Python 3.4.3\lib\http\client.py", line 1171, in getresponse
    response.begin()
  File "E:\program files\Python 3.4.3\lib\http\client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "E:\program files\Python 3.4.3\lib\http\client.py", line 333, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <html>
It seems the HTTP server didn't return a valid HTTP response; you can use telnet to check it:
telnet 192.168.1.113 8080
then send:
GET /api/v1/login HTTP/1.1
reference: https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
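If telnet is not to hand, here is a minimal sketch of the same check in Python, reading the raw status line with a socket (host, port, and path are taken from the question):

import socket

# Fetch the raw response bytes, bypassing http.client's parsing.
with socket.create_connection(("192.168.1.113", 8080), timeout=5) as s:
    s.sendall(b"GET /api/v1/login HTTP/1.1\r\n"
              b"Host: 192.168.1.113:8080\r\n"
              b"Connection: close\r\n\r\n")
    raw = s.recv(4096)

# A valid reply starts with a status line such as b'HTTP/1.1 200 OK';
# the BadStatusLine above means the first line was b'<html>...' instead.
print(raw.splitlines()[:1])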
From the Python docs for httplib:
exception httplib.BadStatusLine
A subclass of HTTPException. Raised if a server responds with an HTTP status code that we don't understand.
Sounds like your API might not be returning something that can be parsed as a valid HTTP response with a valid HTTP status code. You might want to check that the code for your API endpoint is working as expected and is not failing.
Besides that, your code runs fine except for one thing: response.getheader() takes an argument, while response.getheaders() takes no argument, so Python will complain about that once you resolve the BadStatusLine exception.
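For illustration, a corrected sketch of those two calls (host and path are from the question; note the question also misspells content-length as "content-lentgh"):

import http.client

conn = http.client.HTTPConnection("192.168.1.113:8080")
conn.request("GET", "/api/v1/login")
response = conn.getresponse()

length = response.getheader("content-length")  # one header: takes a name
pairs = response.getheaders()                  # all headers: takes no argument
print(length, pairs)
conn.close()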
I have solved the problem using the following code:
import requests
from requests.auth import HTTPBasicAuth  # auth=(user, pass) is shorthand for this

res = requests.get('http://192.168.1.113:8080/api/v1/login', auth=(username, password))
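To confirm it worked and read the header the original code was after:

print(res.status_code)                    # expect 200 on success
print(res.headers.get('content-length'))  # the lookup the original code attempted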

Python traceback error using urllib2

I am really confused. I am new to Python and am working on a script that scrapes a website for products on Python 2.7. I am trying to use urllib2 to do this, and when I run the script it prints multiple traceback errors. Suggestions?
Script:
import urllib2, zlib, json

url = 'https://launches.endclothing.com/api/products'
req = urllib2.Request(url)
req.add_header(':host', 'launches.endclothing.com')
req.add_header(':method', 'GET')
req.add_header(':path', '/api/products')
req.add_header(':scheme', 'https')
req.add_header(':version', 'HTTP/1.1')
req.add_header('accept', 'application/json, text/plain, */*')
req.add_header('accept-encoding', 'gzip,deflate')
req.add_header('accept-language', 'en-US,en;q=0.8')
req.add_header('cache-control', 'max-age=0')
req.add_header('cookie', '__/')
req.add_header('user-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36')

resp = urllib2.urlopen(req).read()
resp = zlib.decompress(bytes(bytearray(resp)), 15 + 32)
data = json.loads(resp)

for product in data:
    for attrib in product.keys():
        print str(attrib) + ' :: ' + str(product[attrib])
    print '\n'
Error(s):
C:\Users\Luke>py C:\Users\Luke\Documents\EndBot2.py
Traceback (most recent call last):
  File "C:\Users\Luke\Documents\EndBot2.py", line 5, in <module>
    resp = urllib2.urlopen(req).read()
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 391, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 409, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1181, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1148, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:499: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error>
You're running into issues with the SSL configuration of your request. I'm sorry, but I won't correct your code, because we're in 2016 and there's a wonderful library that you should use instead: requests.
So its usage is pretty simple:
>>> user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'
>>> result = requests.get('https://launches.endclothing.com/api/products', headers={'user-agent': user_agent})
>>> result
<Response [200]>
>>> result.json()
[{u'name': u'Adidas Consortium x HighSnobiety Ultraboost', u'colour': u'Grey', u'id': 30, u'releaseDate': u'2016-04-09T00:01:00+0100', …
You'll notice that I changed the user-agent in the previous query to get it working because, weirdly enough, the website is refusing API access to requests:
>>> result = requests.get('https://launches.endclothing.com/api/products')
>>> result
<Response [403]>
>>> result.text
This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p></div><div class="error-right"><h3>What can I do to resolve this?</h3><p>If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p><p>If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.
Otherwise, now that you've tried requests and your life has changed, you might still run into this issue again. As you might read in many places on the internet, this is related to SNI and outdated libraries, and you might get headaches trying to figure it out. My best advice would be to upgrade to Python 3, as the problem is likely to be solved by installing a new vanilla version of Python and the libs involved.
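As a quick sanity check before upgrading anything, a sketch that reports whether the interpreter's ssl module supports SNI at all (ssl.HAS_SNI exists on Python 2.7.9+ and 3.2+; getattr covers older builds):

import ssl

# False here would explain SNI-related failures against modern HTTPS hosts.
print(getattr(ssl, 'HAS_SNI', False))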
HTH

Opening a page with urllib2 handshake failure

I am simply trying to open a webpage: https://close5.com/home/
And I keep getting differing errors concerning my SSL. Here are a couple of my attempts and their errors. I am open to a fix that works in either framework. My end goal is to turn this page into a beautifulsoup4 soup.
The error:
Traceback (most recent call last):
  File "test.py", line 54, in <module>
    print soup_maker_two(url)
  File "test.py", line 45, in soup_maker_two
    response = br.open(url)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1170, in https_open
    return self.do_open(conn_factory, req)
  File "/usr/local/lib/python2.7/dist-packages/mechanize/_urllib2_fork.py", line 1118, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure>
The code:
import mechanize
import ssl
from functools import wraps

def sslwrap(func):
    @wraps(func)
    def bar(*args, **kw):
        kw['ssl_version'] = ssl.PROTOCOL_TLSv1
        return func(*args, **kw)
    return bar

def soup_maker_two(url):
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.set_handle_equiv(False)
    br.set_handle_refresh(False)
    br.addheaders = [('User-agent', 'Firefox')]
    ssl.wrap_socket = sslwrap(ssl.wrap_socket)
    response = br.open(url)
    for f in br.forms():
        print f
    return 'hi'

if __name__ == "__main__":
    url = 'https://close5.com/'
    print soup_maker_two(url)
I have also tried the following error and code combo.
2nd Attempt
The error:
Traceback (most recent call last):
  File "test.py", line 29, in <module>
    print str(soup_maker(url))[0:1000]
  File "test.py", line 22, in soup_maker
    webpage = opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>
The code:
from bs4 import BeautifulSoup
import urllib2

def soup_maker(url):
    class RedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_302(self, req, fp, code, msg, headers):
            result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
            result.status = code
            return result

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}

    req = urllib2.Request(url, headers=hdr)
    opener = urllib2.build_opener(RedirectHandler())
    webpage = opener.open(req)
    soup = BeautifulSoup(webpage, "html5lib")
    return soup

if __name__ == "__main__":
    url = 'https://close5.com/home/'
    print str(soup_maker(url))[0:1000]
EDIT 1
It was suggested that I use:
from bs4 import BeautifulSoup

def soup_maker(url):
    soup = BeautifulSoup(requests.get(url).content, "html5lib")
    return soup

if __name__ == "__main__":
    import requests
    url = 'https://close5.com/home/'
    print str(soup_maker(url))[:1000]
This code worked for Padraic, but does not work for me. I get the error:
Traceback (most recent call last):
  File "test_3.py", line 10, in <module>
    print str(soup_maker(url))[:1000]
  File "test_3.py", line 4, in soup_maker
    soup = BeautifulSoup(requests.get(url).content, "html5lib")
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 385, in send
    raise SSLError(e)
requests.exceptions.SSLError: [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
This is the same error as before. I am going to guess it may have something to do with the fact that I am using Python 2.7.6, but I am uncertain. Also, I am unsure how to use that information to solve my issue.
EDIT 2
The issue may lie in an incorrect version of requests. I currently have requests==2.2.1 in my pip freeze.
sudo pip install -U requests
returns
Downloading/unpacking requests from https://pypi.python.org/packages/2.7/r/requests/requests-2.9.1-py2.py3-none-any.whl#md5=58a444aaa02780ad01983f5f540e67b2
  Downloading requests-2.9.1-py2.py3-none-any.whl (501kB): 501kB downloaded
Installing collected packages: requests
  Found existing installation: requests 2.2.1
    Not uninstalling requests at /usr/lib/python2.7/dist-packages, owned by OS
Successfully installed requests
Cleaning up...
sudo pip2 install -U requests returns the same thing
sudo pip uninstall requests returns
Not uninstalling requests at /usr/lib/python2.7/dist-packages, owned by OS
I am running Ubuntu 14.04, Python 2.7.6, and requests 2.2.1.
Edit 3
sudo pip install --ignore-installed requests
gives
Downloading/unpacking requests
  Downloading requests-2.9.1-py2.py3-none-any.whl (501kB): 501kB downloaded
Installing collected packages: requests
Successfully installed requests
Cleaning up...
but sudo pip freeze still gives requests==2.2.1
Edit 4
After going through many suggestions, I now have:
$python
Python 2.7.6 (default, Jun 22 2015, 18:00:18)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests;requests.__version__
'2.9.1'
>>> url = 'https://close5.com/home/'
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(requests.get(url).content, "html5lib")
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
>>>
I would recommend using requests:
from bs4 import BeautifulSoup  # needed for BeautifulSoup below

def soup_maker(url):
    soup = BeautifulSoup(requests.get(url).content)
    return soup

if __name__ == "__main__":
    import requests
    url = 'https://close5.com/home/'
    print str(soup_maker(url))[:1000]
Which will give you what you require:
<html><head><title>Buy & Sell Locally with Close5</title><meta content="Close5 provides a safe and easy environment to list your items and sell them fast. Shop cars, home goods and Children's items locally with Close5" name="description"/><meta content="index, follow" name="robots"/><!--link(rel="canonical" href="https://www.close5.com")-->
<link href="https://www.close5.com/images/favicons/favicon-160x160.png" rel="image_src"/><meta content="index, follow" name="robots"/><!-- Facebook Item Tags--><meta content="Buy & Sell Locally with Close5" property="og:title"/><meta content="Close5" property="og:site_name"/><!-- meta(property="og:url" content='https://www.close5.com/images/app-icon.png')--><meta content="Close5 provides a safe and easy environment to list your items and sell them fast. Shop cars, home goods and Children's items locally with Close5" property="og:description"/><meta content="1470902013158927" property="fb:app_id"/><meta content="100000228184034" property="fb:
Edit1:
Your version of requests is ancient; upgrade with pip install -U requests
Edit2:
You installed requests with apt-get so you need to:
apt-get remove python-requests
pip install --ignore-installed requests # pip install -U requests should also work
I would remove pip altogether, download get-pip.py, run python get-pip.py, and stick to using pip to install packages. Most likely pip did successfully install requests; the newer version is probably further down in your path.
Edit3:
You installed requests with apt-get, so you cannot remove it with pip; use apt-get remove python-requests as suggested in Edit2.
Edit4:
The link in the output explains what is happening and suggests:
pip install pyopenssl ndg-httpsclient pyasn1
You can also:
pip install requests[security]
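Once those packages are installed, requests' vendored urllib3 picks them up automatically at import time; this sketch (assuming requests 2.9.x, where urllib3 lives under requests.packages) makes the injection explicit:

# Explicitly route urllib3's SSL handling through pyOpenSSL
# (requests does this automatically when pyOpenSSL is available).
from requests.packages.urllib3.contrib import pyopenssl
pyopenssl.inject_into_urllib3()

import requests
print requests.get('https://close5.com/home/').status_code  # expect 200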

Problems running Dashku python script on Amazon Linux

I am trying to run a Dashku python script on my Amazon EC2 instance running Amazon Linux.
This is the error I get:
[ec2-user@ip-10-231-47-166 ~]$ python dashku_53dc123bc0f9ac740b009af9.py
Traceback (most recent call last):
  File "dashku_53dc123bc0f9ac740b009af9.py", line 16, in <module>
    requests.post('https://dashku.com/api/transmission', data=json.dumps(payload),headers=headers)
  File "/usr/lib/python2.6/site-packages/requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "/usr/lib/python2.6/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.6/site-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.6/site-packages/requests/sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.6/site-packages/requests/adapters.py", line 331, in send
    raise SSLError(e)
requests.exceptions.SSLError: [Errno 1] _ssl.c:493: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Note that calling sudo yum install requests did not do much, but calling sudo yum upgrade python-requests did install it.
The script works fine on my local machine and has not been modified at all:
# Instructions
#
# easy_install requests
# python dashku_53dc123bc0f9ac740b009af9.py
#
import requests
import json

payload = {
    "bigNumber": 500,
    "_id": "XXXXXX",
    "apiKey": "XXXXXX"
}

headers = {'content-type': 'application/json'}
requests.post('https://dashku.com/api/transmission', data=json.dumps(payload), headers=headers)
Running it on my local machine works fine and Dashku is updated accordingly.
Any ideas? Thanks.
Not sure if this is the appropriate solution, but adding verify=False to the requests.post call "fixes" the problem.
requests.post('https://dashku.com/api/transmission', data=json.dumps(payload), headers=headers, verify=False)
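Bear in mind that verify=False disables certificate verification entirely. A safer sketch, assuming the certifi package is installed, keeps verification on but points it at a current CA bundle (a stale system bundle on the old Python 2.6 image is the likely cause of the verify failure):

import json
import requests
import certifi  # pip install certifi; ships an up-to-date CA bundle

payload = {"bigNumber": 500, "_id": "XXXXXX", "apiKey": "XXXXXX"}
headers = {'content-type': 'application/json'}

# Same POST as above, with verification pointed at certifi's bundle.
requests.post('https://dashku.com/api/transmission',
              data=json.dumps(payload), headers=headers,
              verify=certifi.where())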

tornado curl http client CANNOT fetch binary file

I want to fetch an image (GIF format) from a website, so I use Tornado's built-in asynchronous HTTP client to do it. My code is like the following:
import tornado.httpclient
import tornado.ioloop
import tornado.gen
import tornado.web

tornado.httpclient.AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")
http_client = tornado.httpclient.AsyncHTTPClient()

class test(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        content = yield http_client.fetch('http://www.baidu.com/img/bdlogo.gif')
        print('=====', type(content.body))

application = tornado.web.Application([
    (r'/', test)
])

application.listen(80)
tornado.ioloop.IOLoop.instance().start()
So when I visit the server, it should fetch a GIF file. However, it catches an exception.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 8: invalid start byte
ERROR:tornado.application:Uncaught exception GET / (127.0.0.1)
HTTPRequest(protocol='http', host='127.0.0.1', method='GET', uri='/', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Language': 'zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3', 'Accept-Encoding': 'gzip, deflate', 'Host': '127.0.0.1', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130922 Firefox/17.0', 'Connection': 'keep-alive', 'Cache-Control': 'max-age=0', 'If-None-Match': '"da39a3ee5e6b4b0d3255bfef95601890afd80709"'})
Traceback (most recent call last):
  File "/usr/lib/python3.2/site-packages/tornado/web.py", line 1144, in _when_complete
    if result.result() is not None:
  File "/usr/lib/python3.2/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/usr/lib/python3.2/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/lib/python3.2/site-packages/tornado/gen.py", line 550, in inner
    self.set_result(key, result)
  File "/usr/lib/python3.2/site-packages/tornado/gen.py", line 476, in set_result
    self.run()
  File "/usr/lib/python3.2/site-packages/tornado/gen.py", line 505, in run
    yielded = self.gen.throw(*exc_info)
  File "test.py", line 12, in get
    content = yield http_client.fetch('http://www.baidu.com/img/bdlogo.gif')
  File "/usr/lib/python3.2/site-packages/tornado/gen.py", line 496, in run
    next = self.yield_point.get_result()
  File "/usr/lib/python3.2/site-packages/tornado/gen.py", line 395, in get_result
    return self.runner.pop_result(self.key).result()
  File "/usr/lib/python3.2/concurrent/futures/_base.py", line 393, in result
    return self.__get_result()
  File "/usr/lib/python3.2/concurrent/futures/_base.py", line 352, in __get_result
    raise self._exception
tornado.curl_httpclient.CurlError: HTTP 599: Failed writing body (0 != 1024)
ERROR:tornado.access:500 GET / (127.0.0.1) 131.53ms
It seems to attempt to decode my binary file as UTF-8 text, which is unnecessary. If I comment
tornado.httpclient.AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")
out, which makes it use the simple HTTP client instead of pycurl, it works well (it tells me that the type of "content" is bytes).
So if it returns a bytes object, why does it try to decode it? I think the problem is pycurl, or the wrapper around pycurl in Tornado, right?
My Python version is 3.2.5, Tornado 3.1.1, pycurl 7.19.
Thanks!
pycurl 7.19 doesn't support Python 3. Ubuntu (and possibly other Linux distributions) ship a modified version of pycurl that partially works with Python 3, but it doesn't work with Tornado (https://github.com/facebook/tornado/issues/671), and fails with an exception that looks like the one you're seeing here.
Until there's a new version of pycurl that officially supports Python 3 (or you use the change suggested in that Tornado bug report), I'm afraid you'll need to either go back to Python 2.7 or use Tornado's simple_httpclient instead.
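Concretely, the workaround is a one-line change to the code in the question: configure the pure-Python client instead of the curl one. A minimal sketch:

import tornado.httpclient

# Use Tornado's pure-Python client, which handles binary bodies on Python 3.
tornado.httpclient.AsyncHTTPClient.configure(
    "tornado.simple_httpclient.SimpleAsyncHTTPClient")
http_client = tornado.httpclient.AsyncHTTPClient()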
