Python 3 Requests errors - python

I'm working through Python Crash Course 2nd Ed. and in the text is some code for accessing APIs. My code is copied from the text and is as follows:
import requests
import json
from operator import itemgetter
#Fetch top stories and store in variable r
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print(f"Status code: {r.status_code}")
# #Explore data structure
# response_dict = r.json()
# readable_file = 'hn_readable.json'
# with open(readable_file, 'w') as f:
# json.dump(response_dict, f, indent=4)
submission_ids = r.json()
submission_dicts = []
for submission_id in submission_ids[:30]:
    #Make API call for each article
    url = f"https://hacker-news.firebasio.com/v0/item/{submission_id}.json"
    r = requests.get(url)
    print(f"id: {submission_id}\tstatus code: {r.status_code}")
    response_dict = r.json()
    #Store dictionary of each article
    submission_dict = {
        'title': response_dict['title'],
        'score': response_dict['score'],
        'comments': response_dict['descendants'],
        'link': response_dict['url'],
    }
    submission_dicts.append(submission_dict)
#Sort article by score
submission_dicts = sorted(submission_dicts, key=itemgetter('score'), reverse = True)
#Display information about each article, ranked by score
for submission_dict in submission_dicts:
    print(f"Article title: {submission_dict['title']}")
    print(f"Article link: {submission_dict['url']}")
    print(f"Score: {submission_dict['score']}")
However, this is now returning the following error messages:
Status code: 200
Traceback (most recent call last):
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\urllib3\connectionpool.py", line 677, in urlopen
chunked=chunked,
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\urllib3\connectionpool.py", line 976, in _validate_conn
conn.connect()
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\urllib3\connection.py", line 370, in connect
ssl_context=context,
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\urllib3\util\ssl_.py", line 377, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\snack\Python\lib\ssl.py", line 423, in wrap_socket
session=session
File "C:\Users\snack\Python\lib\ssl.py", line 870, in _create
self.do_handshake()
File "C:\Users\snack\Python\lib\ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\urllib3\connectionpool.py", line 725, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='hacker-news.firebasio.com', port=443): Max retries exceeded with url: /v0/item/23273247.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\snack\Python\proj_2\hn_submissions.py", line 24, in <module>
r = requests.get(url)
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\requests\sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\requests\sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "C:\Users\snack\AppData\Roaming\Python\Python37\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='hacker-news.firebasio.com', port=443): Max retries exceeded with url: /v0/item/23273247.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1076)')))
[Finished in 3.6s]
I have almost no experience with this, but from what I can tell, some authentication is failing and not letting my program access the API, but I have no idea why. I've tried limiting the number of API calls by removing the loop, but it doesn't seem to help. I also tried adding the verify=False parameter into the requests.get lines, but that just kicked up different errors.

There is nothing wrong with the API call itself.
If you visit https://hacker-news.firebaseio.com/v0/topstories.json in a browser, you can see the expected list (your first, working API call).
The first number in this list is 23277594, so the script starts with a request to https://hacker-news.firebasio.com/v0/item/23277594.json. Visiting this URL in a browser also results in warnings (your second, failing API call).

Alright, it was a typo (of course). The URL in my code was https...firebasio....json instead of https...firebaseio....json. One of the results is still not working, but I'm assuming that's because the article has no comments (i.e. no descendants), so a try/except should fix that.
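A minimal sketch of that follow-up, under some assumptions: `BASE_URL`, `item_url` and `build_submission` are hypothetical names, not from the book, and the sketch uses `dict.get` with defaults rather than try/except. It assumes an item may come back without `descendants` or without `url`, which I have not verified for every item type. Building both URLs from one constant also means the hostname only has to be typed correctly once.

```python
# Hypothetical helpers; the names are illustrative, not from the book.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(submission_id):
    """Build the item URL from the shared base so the host is typed once."""
    return f"{BASE_URL}/item/{submission_id}.json"


def build_submission(response_dict):
    """Summarise one item, tolerating missing optional keys."""
    return {
        'title': response_dict.get('title', '(no title)'),
        'score': response_dict.get('score', 0),
        # Assumption: items without comments may omit 'descendants'.
        'comments': response_dict.get('descendants', 0),
        # Assumption: some items carry no external 'url'.
        'link': response_dict.get('url'),
    }
```

In the loop from the question, `submission_dicts.append(build_submission(r.json()))` would then replace the hand-built dictionary.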

Related

Python Requests Mount Not Working on Linux But Works Fine on Windows

I have the following code. When I run it on Windows, I can make requests through a specific NIC, as described in this answer, but when I run it on Arch Linux the request times out.
import requests
from requests_toolbelt.adapters import source
source = source.SourceAddressAdapter('10.100.89.75')
with requests.Session() as session:
    session.mount('http://', source)
    r = session.get("http://ifconfig.me")
    print(r.text)
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.10/site-packages/requests/adapters.py", line 553, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='ifconfig.me', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f8e0ab379a0>, 'Connection to ifconfig.me timed out. (connect timeout=None)'))

Handling errors in Python requests [duplicate]

This question already has answers here:
Python requests.exceptions.SSLError: EOF occurred in violation of protocol
(9 answers)
Closed 1 year ago.
I am learning to use requests in Python and I need a way to get a meaningful output if the site does not exist at all.
I looked at this question, but it is unclear whether the OP actually wants to check if the site exists, or only whether it returns an error. The problem with all of the answers to that question is that if the site does not exist at all, we cannot really use HTTP response headers, because no response is returned from a server that does not exist.
Here is an example.
If I use this code I will not get any errors because the site exists.
import requests
r = requests.get('https://duckduckgo.com')
However, if I enter a web page I know does not exist I will get an error
import requests
r = requests.get('https://thissitedoesnotexist.com')
if r.status_code == requests.codes.ok:
    print('Site good')
else:
    print('Site bad')
This error is super long and I would prefer to have a more meaningful and short error if the site does not exist.
Traceback (most recent call last):
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connection.py", line 416, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 512, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1070, in _create
self.do_handshake()
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1341, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:997)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\ADMIN\Desktop\tetst.py", line 2, in <module>
r = requests.get('https://234876.com')
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
Is it possible to make a function that returns, for example print('The site probably does not exist') or at least does not give an EOF error?
Normally the desirable thing to do is to trap exceptions from requests.
You can also call .raise_for_status() on the Response to get a meaningful exception for non-OK responses.
However, think about where you want to handle an exception:
immediately? Can your program handle it meaningfully, or should it exit?
should the caller handle a specific exception (such as requests.exceptions.Timeout) or a more general one?
do you have many functions which call each other? Should any of them handle some subset of the possible exceptions, and if so, which?
See the Python exception hierarchy for how the built-in exceptions inherit from one another.
import sys
import requests
def some_function_which_makes_requests():
    r = requests.get("https://example.com", timeout=(2,10))
    r.raise_for_status() # raise for non-OK
    return r.json() # interpret response via some method (for example as JSON)
def main():
    ...
    try:
        result_json = some_function_which_makes_requests()
    except requests.exceptions.Timeout:
        print("WARNING: request timed out")
        result_json = None # still effectively handled for later program?
    except requests.exceptions.RequestException as ex:
        sys.exit(f"something wrong with Request: {repr(ex)}")
    except Exception as ex:
        sys.exit(f"something wrong around Request: {repr(ex)}")
    # now you can use result_json
I did some more research and learned that I need to use a Python try/except, as mentioned by @Anand Sowmithiran. Here is a video explaining it for beginners: https://www.youtube.com/watch?v=NIWwJbo-9_8
import requests
try:
    r = requests.get("http://www.duckduckgo.com")
except requests.exceptions.ConnectionError:
    print('\n\tSorry. There was a network problem getting the URL. Perhaps it does not exist?\n\tCheck the URL, DNS issues or if you are being rejected by the server.')
else:
    print(r)
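Putting the two answers together, a function that returns a short message instead of a long traceback might look like the sketch below. `check_site` and its `get` parameter are hypothetical names; `get` is only there so the function can be tried without a network. The order of the `except` clauses matters because `requests.exceptions.SSLError` is a subclass of `ConnectionError`, and the traceback in the question is in fact an `SSLError`.

```python
import requests


def check_site(url, get=requests.get, timeout=10):
    """Return a short, human-readable status line for url.

    `get` is injectable so the function can be exercised without a network.
    """
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.SSLError:
        # SSLError subclasses ConnectionError, so it must be caught first.
        return "TLS/SSL handshake failed"
    except requests.exceptions.Timeout:
        return "Request timed out"
    except requests.exceptions.ConnectionError:
        return "Could not connect; the site probably does not exist"
    except requests.exceptions.HTTPError as ex:
        return f"Server answered with an error: {ex}"
    except requests.exceptions.RequestException as ex:
        return f"Request failed: {ex!r}"
    return "Site good"
```

Calling `print(check_site('https://thissitedoesnotexist.com'))` would then print one line whichever way the request fails.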

SSL problems using Twilio

I'm trying to use Twilio to send SMS. I'm using their templates to send my first test message:
import os
from twilio.rest import Client
client = Client(my_SID, my_TOKEN)
message = client.messages \
    .create(
        body="Join Earth's mightiest heroes. Like Kevin Bacon.",
        from_=number1,
        to=number2
    )
print(message.sid)
I've manually replaced the SID and the TOKEN with their respective values from Twilio's console (the os.environ[] lookup doesn't work). The thing is, this error appears when I try to run the code:
PS C:\Users\USER> & C:/Users/USER/anaconda3/python.exe "d:/Escritorio/amigo secreto/send_sms.py"
Traceback (most recent call last):
File "C:\Users\USER\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 688, in urlopen
conn = self._get_conn(timeout=pool_timeout)
File "C:\Users\USER\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 280, in _get_conn
return conn or self._new_conn()
File "C:\Users\USER\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 979, in _new_conn
raise SSLError(
urllib3.exceptions.SSLError: Can't connect to HTTPS URL because the SSL module is not available.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\USER\anaconda3\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "C:\Users\USER\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "C:\Users\USER\anaconda3\lib\site-packages\urllib3\util\retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.twilio.com', port=443): Max retries exceeded with url: /2010-04-01/Accounts/ACfd9e165c0a6ba1760d5671ccbfc5dbc6/Messages.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:/Escritorio/amigo secreto/send_sms.py", line 10, in <module>
message = client.messages \
File "C:\Users\USER\anaconda3\lib\site-packages\twilio\rest\api\v2010\account\message\__init__.py", line 88, in create
payload = self._version.create(method='POST', uri=self._uri, data=data, )
File "C:\Users\USER\anaconda3\lib\site-packages\twilio\base\version.py", line 193, in create
response = self.request(
File "C:\Users\USER\anaconda3\lib\site-packages\twilio\base\version.py", line 39, in request
return self.domain.request(
File "C:\Users\USER\anaconda3\lib\site-packages\twilio\base\domain.py", line 38, in request
return self.twilio.request(
File "C:\Users\USER\anaconda3\lib\site-packages\twilio\rest\__init__.py", line 131, in request
return self.http_client.request(
File "C:\Users\USER\anaconda3\lib\site-packages\twilio\http\http_client.py", line 91, in request
response = session.send(
File "C:\Users\USER\anaconda3\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\USER\anaconda3\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.twilio.com', port=443): Max retries exceeded with url: /2010-04-01/Accounts/ACfd9e165c0a6ba1760d5671ccbfc5dbc6/Messages.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
I've never used an API before, I could really use somebody's guidance. Thanks in advance
Twilio developer evangelist here.
That error looks as though you have issues between your installation of Anaconda and the SSL module. There's a potential fix from this GitHub issue, run:
execstack -c anaconda3/lib/libcrypto.so.1.0.0
Alternatively, other comments suggest installing OpenSSL for your environment.
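As far as I know, urllib3 reports "the SSL module is not available" when `import ssl` fails in the interpreter that runs the script. A quick check along these lines may confirm that before you reinstall anything. This is only a sketch, and `module_importable` is a hypothetical helper name.

```python
import importlib
import sys


def module_importable(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Run this with the same python.exe that runs send_sms.py.
    print("interpreter:", sys.executable)
    print("ssl importable:", module_importable("ssl"))
```

If it prints False for ssl, the problem lies in the Python installation rather than in the Twilio code.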

python requests: (SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)'))

I have a python script that gets me some information from basketball-reference.com
It stopped working today due to this error:
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 978, in _validate_conn
conn.connect()
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connection.py", line 362, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\util\ssl_.py", line 386, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\pphotsauce\anaconda3\lib\ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\pphotsauce\anaconda3\lib\ssl.py", line 1040, in _create
self.do_handshake()
File "C:\Users\pphotsauce\anaconda3\lib\ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\adapters.py", line 439, in send
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\util\retry.py", line 446, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
MaxRetryError: HTTPSConnectionPool(host='www.basketball-reference.com', port=443): Max retries exceeded with url: /search/search.fcgi?search=RJ+Barrett (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "G:\fan_duel\april_version\main_2.py", line 13, in <module>
test = [f.get_bref_id_tester(x) for x in full_names]
File "G:\fan_duel\april_version\main_2.py", line 13, in <listcomp>
test = [f.get_bref_id_tester(x) for x in full_names]
File "G:\fan_duel\april_version\functions.py", line 30, in get_bref_id_tester
search_results_page = requests.get(url, headers=headers, allow_redirects='False')
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\api.py", line 76, in get
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\api.py", line 61, in request
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\sessions.py", line 542, in request
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\sessions.py", line 655, in send
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\adapters.py", line 514, in send
SSLError: HTTPSConnectionPool(host='www.basketball-reference.com', port=443): Max retries exceeded with url: /search/search.fcgi?search=RJ+Barrett (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)')))
runfile('G:/fan_duel/april_version/main_2.py', wdir='G:/fan_duel/april_version')
Reloaded modules: functions
Traceback (most recent call last):
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 978, in _validate_conn
conn.connect()
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connection.py", line 362, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\util\ssl_.py", line 386, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\pphotsauce\anaconda3\lib\ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\pphotsauce\anaconda3\lib\ssl.py", line 1040, in _create
self.do_handshake()
File "C:\Users\pphotsauce\anaconda3\lib\ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\adapters.py", line 439, in send
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\urllib3\util\retry.py", line 446, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
MaxRetryError: HTTPSConnectionPool(host='www.basketball-reference.com', port=443): Max retries exceeded with url: /search/search.fcgi?search=Chris+Boucher (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "G:\fan_duel\april_version\main_2.py", line 13, in <module>
test = [f.get_bref_id_tester(x) for x in full_names]
File "G:\fan_duel\april_version\main_2.py", line 13, in <listcomp>
test = [f.get_bref_id_tester(x) for x in full_names]
File "G:\fan_duel\april_version\functions.py", line 33, in get_bref_id_tester
search_results_page = requests.get(url, headers=headers, allow_redirects='False')
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\api.py", line 76, in get
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\api.py", line 61, in request
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\sessions.py", line 542, in request
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\sessions.py", line 655, in send
File "C:\Users\pphotsauce\anaconda3\lib\site-packages\requests\adapters.py", line 514, in send
SSLError: HTTPSConnectionPool(host='www.basketball-reference.com', port=443): Max retries exceeded with url: /search/search.fcgi?search=Chris+Boucher (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1123)')))
I have a function that uses an NBA player's full name and returns a basketball reference id. If I apply this function to a pandas series or iterate through a list of names, only some of the names (different names each time) will cause the error. ('nightmare' is a dictionary with troublesome names)
nightmare = {
    'Bol Bol': 'bolbo01',
    'Nicolo Melli': 'mellini01',
    'Davis Bertans': 'bertada01',
    'Tomas Satoransky': 'satorto01',
    'Theo Maledon': 'maledth01',
    'Bogdan Bogdanovic': 'bogdabo01',
}
def get_bref_id(full_name):
    sleep(0.2)
    if full_name not in nightmare:
        try:
            first_name, last_name = full_name.split(' ')
            url = 'https://www.basketball-reference.com/search/search.fcgi?search='+first_name+'+'+last_name
            search_results_page = requests.get(url, allow_redirects = "False")
            soup = bs(search_results_page.content, 'html.parser')
            potential_links = soup.find_all('div', class_="search-item-name")
            bucket = [element for element in potential_links if '202' in element.text if first_name in element.text]
            bref_id = str(bucket[0]).replace('.', '/').split('/')[3]
        except IndexError: # b-ref search occasionally takes you straight to the page
            bref_id = search_results_page.url.split('/')[5].split('.')[0]
        finally:
            return bref_id
    else:
        return nightmare[full_name]
I don't understand anything about SSL or what could be causing this issue. If you could point me in the right direction to learn more, I would be grateful.
The server www.basketball-reference.com requires at least TLS 1.2. It looks like your Python is linked against a version of OpenSSL that is too old to support TLS 1.2. Use the following code to check which OpenSSL version is used. Support for TLS 1.2 was added in OpenSSL 1.0.1 long ago, but macOS, for example, shipped with the old OpenSSL 0.9.8 for a long time.
import ssl
print(ssl.OPENSSL_VERSION)
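A slightly fuller version of the same check, as a sketch: besides the version string it reports the `ssl.HAS_TLSv1_2` and `ssl.HAS_TLSv1_3` flags, which exist on Python 3.7 and later (hence the `getattr` fallback). `tls_report` is a hypothetical helper name.

```python
import ssl


def tls_report():
    """Collect what the linked OpenSSL build says about itself."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
        # None means this Python version does not expose the flag.
        "has_tls_1_2": getattr(ssl, "HAS_TLSv1_2", None),
        "has_tls_1_3": getattr(ssl, "HAS_TLSv1_3", None),
    }


if __name__ == "__main__":
    for key, value in tls_report().items():
        print(f"{key}: {value}")
```

If `has_tls_1_2` is True, the OpenSSL build itself is probably not what is blocking the handshake, and the cause is more likely somewhere else.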
Try adding a user agent in the headers parameter. I also added a small input option, so that if the search returns more than one result you can choose:
Code:
from bs4 import BeautifulSoup as bs
import requests
from time import sleep
#pip install choice
import choice
nightmare = {
    'Bol Bol': 'bolbo01',
    'Nicolo Melli': 'mellini01',
    'Davis Bertans': 'bertada01',
    'Tomas Satoransky': 'satorto01',
    'Theo Maledon': 'maledth01',
    'Bogdan Bogdanovic': 'bogdabo01',
}
def get_bref_id(full_name):
    sleep(0.2)
    if full_name not in nightmare:
        first_name, last_name = full_name.split(' ')
        url = 'https://www.basketball-reference.com/search/search.fcgi?search='+first_name+'+'+last_name
        headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
        search_results_page = requests.get(url, headers=headers, allow_redirects = "False")
        soup = bs(search_results_page.content, 'html.parser')
        potential_links = soup.find_all('div', class_="search-item-name")
        if len(potential_links) > 1:
            bucket = {}
            for element in potential_links:
                player = element.find('a').text
                playerId = element.find('a')['href'].split('/')[-1].replace('.html','')
                statsType = element.find('a')['href'].split('/')[1]
                bucket[player + ' (%s)' %statsType] = playerId
            print('Which player id did you want?')
            player_choice = choice.Menu(bucket.keys()).ask()
            bref_id = bucket[player_choice]
        else:
            try:
                element = soup.find_all('div', class_="search-item-name")[0]
                bref_id = element.find('a')['href'].split('/')[-1].replace('.html','')
            except IndexError: # b-ref search occasionally takes you straight to the page
                bref_id = search_results_page.url.split('/')[5].split('.')[0]
        return bref_id
    else:
        return nightmare[full_name]
Output:
Which player id did you want?
Make a choice:
0: Michael Jordan (1985-2003) (players)
1: Michael Jordan (1984-1992) (international)
2: Michael-Hakim Jordan (2006-2009) (international)
3: Michael Jordan (2000-2003) (executives)
Enter number or name; return for next page
? 0
jordami01

Expired certificate, not working with verify=True; requests.exceptions.SSLError certificate verify failed

I am a real beginner in Python and have learnt basically everything from the internet, so please excuse me if I have not grasped all the concepts properly.
My problem is that I am trying to write a web scraper with requests and BeautifulSoup. For the past two days I have been getting an error saying that the certificate has expired, and I see the same thing when I open this website directly. I cannot even add it as an exception in my browser.
This is my code:
def project_spider(max_pages):
    global page
    page = 1
    #for i in range(1, max_pages+1):
    while page <= max_pages:
        # for i in range(1, page + 1)
        page += 1
        url = 'https://hubbub.org/projects/?page=' + str(page)
        # Collect list of urls
        try:
            source_code = requests.get(url, allow_redirects=False, timeout=15, verify=False)
        except Exception or AttributeError or ConnectionError or IOError:
            print('Failed to open url.')
            pass
        # Turn urls to text
        plain_text = source_code.text.encode('utf-8')
        # define object with all text on website
        soup = BeautifulSoup(plain_text, 'html.parser')
        # define variable that finds in the text data everything that is in the html code considered "diverse" and has the attributes 'col...' class
        data = soup.findAll('div', attrs={'class': 'col-xs-12 col-sm-6 col-md-4 col-lg-3'})
        # for every found diverse in the data variable
        for div in data:
            #search all diverse for links (a)
            links = div.findAll('a', href=True)
            global names
            names = div.find('h4').contents[0]
            print(names)
            for a in links:
                global links2
                links2 = a['href']
                print(links2)
                get_single_item_data(links2)
An expert would probably program this differently. I tried to fix it with verify=False and with session(), but it doesn't work. I also tried to skip the page the problem occurs on (page 5), but I couldn't. I'm really desperate at this point, as all I get is this error:
https://rabbitraisers.org/p/fantasticfloats/
Traceback (most recent call last):
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 849, in _validate_conn
conn.connect()
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 356, in connect
ssl_context=context)
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\ssl_.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 412, in wrap_socket
session=session
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 850, in _create
self.do_handshake()
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 1108, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1045)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 445, in send
timeout=timeout
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='rabbitraisers.org', port=443): Max retries exceeded with url: /p/fantasticfloats/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1045)')))
Import this at the top of your source code
from requests.packages.urllib3.exceptions import InsecureRequestWarning
Then put this as one of the first lines of your project_spider function
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
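Note that disabling the warning only hides the InsecureRequestWarning; it does not turn verification off. The traceback in the question is for rabbitraisers.org, a link that is followed after the listing page, so my guess is that `get_single_item_data` (whose body is not shown) makes its own request without verify=False. Under that assumption, a small helper like the sketch below could be used for every request. `fetch_unverified` and its `get` parameter are hypothetical names, and skipping verification is insecure, so it only makes sense for a site whose expired certificate you have decided to tolerate.

```python
import requests
import urllib3

# Silence only the warning that verify=False triggers.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def fetch_unverified(url, get=requests.get, timeout=15):
    """Fetch url with certificate verification turned off (insecure).

    `get` is injectable so the helper can be tried without a network.
    """
    return get(url, allow_redirects=False, timeout=timeout, verify=False)
```

Both `project_spider` and the function that opens each project link would then call `fetch_unverified(url)` instead of `requests.get(url, ...)`.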
