Python broken urllib2 request - python

I get this error when trying to run my program:
Traceback (most recent call last):
File "C:/Users/Adasli199/Desktop/Minetek-Testy-Stuff/tools/soldering-iron.py", line 400, in <module>
MCVersionRegEx = cacheMCVersions()
File "C:/Users/Adasli199/Desktop/Minetek-Testy-Stuff/tools/soldering-iron.py", line 141, in cacheMCVersions
feeddata = opener.open(request).read()
File "C:\Python27\lib\urllib2.py", line 404, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 422, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1222, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "C:\Python27\lib\urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error>
I am trying to get some data to help with processing my RegEx command in another part of the code. Here is the function the traceback refers to, which throws the error:
import httplib
import urllib2
def cacheMCVersions():
    request = urllib2.Request('https://mcversions.net/')
    request.add_header('User-Agent', 'SolderingIron/1.0 +http://tetrarch.co/') # identify ourselves so we don't get blocked
    opener = urllib2.build_opener()
    feeddata = opener.open(request).read()
There is some more regex code after this to interpret the data but I feel this is all that is needed to resolve the issues. At the time of running, https://mcversions.net/ was up and still is as far as I am aware, which is what makes this error even more strange.

urllib2 is broken for HTTPS requests on Python 2.7.9 and lower.
Install Python 2.7.10 inside a virtual environment and it should work.
You may need to install libssl-dev or openssl, depending on your Linux platform.
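Before reinstalling, it can help to see what the running interpreter's ssl module actually supports. The sketch below is only a diagnostic aid. The describe_tls_support name is hypothetical, and treating SNI and TLS 1.2 as the relevant features is an assumption that the answer above does not confirm.

```python
import ssl
import sys


def describe_tls_support():
    """Summarise the TLS-related features of the running interpreter."""
    return {
        "python_version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "openssl_version": ssl.OPENSSL_VERSION,
        # HAS_SNI is missing from older ssl modules, so fall back to False.
        "has_sni": getattr(ssl, "HAS_SNI", False),
        # PROTOCOL_TLSv1_2 is only defined when the ssl module offers TLS 1.2.
        "has_tls_1_2": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_tls_support().items()):
        print("%s: %s" % (name, value))
```

Running it with the same interpreter that runs the failing script shows whether the script is really using the Python installation you expect.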

Related

Web Scraping on Mac with VS Code OS X Yosemite Error

I am trying to do some web scraping using Python and BeautifulSoup. I am using a fairly old MacBook on OS X Yosemite 10.10.5 and can't update the OS further. I am using VS Code to write and execute the code.
I have used Homebrew to update Python to the latest version (I think), and the same with pip. However, when I try to run the code I keep getting the error messages shown below.
Code I enter into VS Code:
# import libraries
import urllib2
from bs4 import BeautifulSoup
# specify the url
quote_page = 'http://www.bloomberg.com/quote/SPX:IND'
# query the website and store the returned html in the variable `page`
page = urllib2.urlopen(quote_page)
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page,'html.parser')
# Take out the <div> of name and get its value
name_box = soup.find('h1', attrs={'class': 'name'})
name = name_box.text.strip() # strip() is used to remove leading and trailing whitespace
print name
# get the index price
price_box = soup.find('div', attrs={'class':'price'})
price = price_box.text
print price
Output when I try to execute code:
[Running] python -u "/Users/TheChef/Desktop/# import libraries.py"
Traceback (most recent call last):
File "/Users/TheChef/Desktop/# import libraries.py", line 7, in <module>
page = urllib2.urlopen(quote_page)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 469, in error
result = self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 656, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1240, in https_open
context=self._context)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1197, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:590)>
[Done] exited with code=1 in 1.165 seconds
The instructions I'm following for this code are at: https://www.freecodecamp.org/news/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe/
I have tried a number of solutions that I've seen on other forums, to no avail. Does anyone know how to solve this?

Jython: urllib2 request throws "General SSLEngine problem"

I'm trying to use Jython to make an AJAX call. The server needs a 'session.id' for authorization.
url = 'https://somewebsite.com:8443/executor?pa=p1&pb=p2&session.id=ac8884bc-33f2-46e9-9893-5c7b92de5d5e'
urllib2.urlopen(url).read()
When I run this on Python, it works fine.
But when I run it on Jython, it throws an exception. The traceback is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/zsun/jython2.7.0/Lib/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/Users/zsun/jython2.7.0/Lib/urllib2.py", line 404, in open
response = self._open(req, data)
File "/Users/zsun/jython2.7.0/Lib/urllib2.py", line 421, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/Users/zsun/jython2.7.0/Lib/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/Users/zsun/jython2.7.0/Lib/urllib2.py", line 1222, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/Users/zsun/jython2.7.0/Lib/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] General SSLEngine problem (javax.net.ssl.SSLHandshakeException: General SSLEngine problem)>
Any suggestions?
More than likely you are dealing with a self-signed or otherwise untrusted certificate.
Since Java always checks certificates by default, regardless of any verify parameter on the request, look at this for a Jython-only solution:
http://tech.pedersen-live.com/2010/10/trusting-all-certificates-in-jython/

how to get raw html text of a given url using python

I'm using html2text in Python to get the raw text (tags included) of an HTML page from a given URL, but I'm getting an error.
My code:
import html2text
import urllib2
proxy = urllib2.ProxyHandler({'http': 'http://<proxy>:<pass>#<ip>:<port>'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
print html2text.html2text(html)
The error -
Traceback (most recent call last):
File "t.py", line 8, in <module>
html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
Can anyone explain what I'm doing wrong?
If you don't require SSL, this script in Python 2.7.x should work:
import urllib
url = "http://stackoverflow.com"
f = urllib.urlopen(url)
print f.read()
In Python 3.x, use urllib.request instead of urllib, because Python 2's urllib2 was merged into the urllib package in Python 3.
The http:// prefix in the URL is required.
EDIT: In 2020, you should use the 3rd party module requests. requests can be installed with pip.
import requests
print(requests.get("http://stackoverflow.com").text)
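As a rough sketch of the Python 2 versus Python 3 import difference described above, the following picks whichever urlopen is available. The fetch helper name and its 10-second default timeout are illustrative assumptions, not part of the original answer.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch(url, timeout=10):
    """Return the raw response body for url as bytes."""
    response = urlopen(url, timeout=timeout)
    try:
        return response.read()
    finally:
        # Release the connection whether or not read() succeeded.
        response.close()
```

Calling fetch("http://stackoverflow.com") would return the page body. Passing an explicit timeout makes a stalled connection fail sooner than it would with the operating system's default.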

urllib2.urlopen raises urllib2.URLError

I'm trying to do a simple task: fetch the page at "http://search.jd.com/Search?keyword=%E5%A5%87%E7%9F%B3&enc=utf-8"
My Python code is:
# -*- coding: utf-8 -*-
import sys, codecs
import urllib, urllib2
url = "http://search.jd.com/Search?keyword=%E5%A5%87%E7%9F%B3&enc=utf-8"
print url
page=urllib2.urlopen(url).read()
print page
however I get
Traceback (most recent call last):
File "tmp.py", line 15, in <module>
page=urllib2.urlopen(url).read()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
Can anyone tell me what's going on?
Many thanks!
Sounds like it might be a network issue. Check that you have a consistent internet connection (e.g. by pinging an appropriate server continuously as you run the tests). I just ran the code you posted and it worked perfectly fine for me.
Your code works fine for me too.
But the error could occur if the URL contains characters such as "+" or "#" that are not yet percent-encoded; it would then need quoting:
s = "http://search.jd.com/Search?keyword=%E5%A5%87%E7%9F%B3&enc=utf-8"
# keep the URL delimiters and existing %XX escapes intact; quote everything else
my_url = urllib2.quote(s, safe=":/?&=%")
page = urllib2.urlopen(my_url).read()
print page
Alternatively, you could use requests:
import requests
response = requests.get(url)
print response.content
or
print response.text
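To illustrate the percent-encoding point, here is a minimal sketch that encodes only the query value rather than the whole URL. The quote_keyword helper is a hypothetical name, and building the URL from a separate base string is an assumption made for the example.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def quote_keyword(keyword):
    """Percent-encode one query value as UTF-8, escaping every reserved character."""
    if not isinstance(keyword, bytes):
        keyword = keyword.encode("utf-8")
    # safe="" means even "/" is escaped, which is what a query value needs.
    return quote(keyword, safe="")


base = "http://search.jd.com/Search"
url = base + "?keyword=" + quote_keyword(u"\u5947\u77f3") + "&enc=utf-8"
print(url)
```

Encoding each value separately leaves the scheme, host, and the "?", "&" and "=" delimiters untouched, so the result is still a valid URL.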
It's a network issue. Please make sure you are on a working internet connection.
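The "nodename nor servname provided, or not known" message comes from the hostname lookup, so one way to test the network diagnosis is to try the lookup on its own. This is a minimal sketch; can_resolve is a hypothetical helper name.

```python
import socket


def can_resolve(hostname):
    """Return True if the system resolver finds an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Raised when the name cannot be resolved (for example, no DNS access).
        return False
    return True
```

If can_resolve("search.jd.com") returns False while other hosts resolve, the problem lies with DNS or the connection rather than with the urllib2 code.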

How to crawl Twitter pages using Python?

When I try to crawl Twitter using this code:
import urllib2
s = "https://mobile.twitter.com/bing/"
html = urllib2.urlopen(s).read()
print html
... I get the following error:
Traceback (most recent call last):
File "C:\Users\arpit\Downloads\Desktop\Wiki Code\final Crawler_wiki.py", line 14, in <module>
html = urllib2.urlopen(s).read()
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 400, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 418, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1215, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "C:\Python27\lib\urllib2.py", line 1177, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
If I replace mobile.twitter.com with twitter.com then it works, but I want it to work with mobile.twitter.com.
The Twitter site is probably looking for a User-Agent header, which you don't set when you make the request through the urllib2 API.
You will likely need to use something like mechanize to fake your user agent.
But I highly suggest you use the Twitter API, which provides a lot of easy ways to work with the data.
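For reference, a User-Agent header can also be attached with the standard library's Request object, without mechanize. The sketch below only builds the request and does not send it. The build_request helper and the "ExampleCrawler/1.0" string are hypothetical, and whether a different User-Agent resolves the refused connection is not confirmed here.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2


def build_request(url, user_agent):
    """Create a request for url that identifies itself with user_agent."""
    request = Request(url)
    request.add_header("User-Agent", user_agent)
    return request


request = build_request("https://mobile.twitter.com/bing/", "ExampleCrawler/1.0")
# Request stores header names capitalised as "User-agent".
print(request.get_header("User-agent"))
```

The resulting object can be passed to urlopen in place of the plain URL string.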
