Python: requests_html html method does not work - python

This is probably (hopefully) a simple beginner's mistake, yet I cannot find any explanation for what I'm doing wrong.
I want to scrape some content for a YouTube video with bs4 and requests_html. Both libraries are installed, but when I run the code it does not work. PyCharm tells me that the 'html' attribute, as used in r.html.render(sleep=1) and soup = bs(r.html.html, "html.parser"), is the problem. Could anyone please take a quick look and tell me what I might be doing wrong?
from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession
video_id = "v8Yh_4oE-Fs"
video_root = "https://www.youtube.com/watch?v="
video_url = "".join((video_root, video_id))
session = HTMLSession()
r = session.get(video_url)
r.html.render(sleep=1)
soup = bs(r.html.html, "html.parser")
etc
edit: This is the full error message.
[W:pyppeteer.chromium_downloader] start chromium download. Download may take a few minutes.
Traceback (most recent call last):
File "C:\Program Files\Python\lib\site-packages\urllib3\contrib\pyopenssl.py", line 488, in wrap_socket
cnx.do_handshake()
File "C:\Program Files\Python\lib\site-packages\OpenSSL\SSL.py", line 1934, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "C:\Program Files\Python\lib\site-packages\OpenSSL\SSL.py", line 1671, in _raise_ssl_error
_raise_current_error()
File "C:\Program Files\Python\lib\site-packages\OpenSSL\_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen
chunked=chunked,
File "C:\Program Files\Python\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Program Files\Python\lib\site-packages\urllib3\connectionpool.py", line 976, in _validate_conn
conn.connect()
File "C:\Program Files\Python\lib\site-packages\urllib3\connection.py", line 370, in connect
ssl_context=context,
File "C:\Program Files\Python\lib\site-packages\urllib3\util\ssl_.py", line 377, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Program Files\Python\lib\site-packages\urllib3\contrib\pyopenssl.py", line 494, in wrap_socket
raise ssl.SSLError("bad handshake: %r" % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Administrator.WSW41/Documents/PycharmProjects/YoutubeData/try_out.py", line 10, in <module>
r.html.render(sleep=1)
File "C:\Program Files\Python\lib\site-packages\requests_html.py", line 586, in render
self.browser = self.session.browser # Automatically create a event loop and browser
File "C:\Program Files\Python\lib\site-packages\requests_html.py", line 730, in browser
self._browser = self.loop.run_until_complete(super().browser)
File "C:\Program Files\Python\lib\asyncio\base_events.py", line 584, in run_until_complete
return future.result()
File "C:\Program Files\Python\lib\site-packages\requests_html.py", line 714, in browser
self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
File "C:\Program Files\Python\lib\site-packages\pyppeteer\launcher.py", line 306, in launch
return await Launcher(options, **kwargs).launch()
File "C:\Program Files\Python\lib\site-packages\pyppeteer\launcher.py", line 119, in __init__
download_chromium()
File "C:\Program Files\Python\lib\site-packages\pyppeteer\chromium_downloader.py", line 146, in download_chromium
extract_zip(download_zip(get_url()), DOWNLOADS_FOLDER / REVISION)
File "C:\Program Files\Python\lib\site-packages\pyppeteer\chromium_downloader.py", line 85, in download_zip
data = http.request('GET', url, preload_content=False)
File "C:\Program Files\Python\lib\site-packages\urllib3\request.py", line 76, in request
method, url, fields=fields, headers=headers, **urlopen_kw
File "C:\Program Files\Python\lib\site-packages\urllib3\request.py", line 97, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "C:\Program Files\Python\lib\site-packages\urllib3\poolmanager.py", line 336, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "C:\Program Files\Python\lib\site-packages\urllib3\connectionpool.py", line 765, in urlopen
**response_kw
File "C:\Program Files\Python\lib\site-packages\urllib3\connectionpool.py", line 765, in urlopen
**response_kw
File "C:\Program Files\Python\lib\site-packages\urllib3\connectionpool.py", line 765, in urlopen
**response_kw
File "C:\Program Files\Python\lib\site-packages\urllib3\connectionpool.py", line 725, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "C:\Program Files\Python\lib\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Win_x64/588429/chrome-win32.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
Process finished with exit code 1
Thank you in advance!

Related

Pandas_datareader issue

When I try to read data from stooq through pandas_datareader.data this error keeps coming through:
Traceback (most recent call last):
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 700, in urlopen
self._prepare_proxy(conn)
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 996, in _prepare_proxy
conn.connect()
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/urllib3/connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1075, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1346, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:992)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='stooq.com', port=443): Max retries exceeded with url: /q/d/l/?s=%5EDJI&i=d&d1=20180216&d2=20230215 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:992)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/matteo/PycharmProjects/pythonProject/stock_data.py", line 3, in <module>
f = web.DataReader('^DJI', 'stooq')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/pandas_datareader/data.py", line 432, in DataReader
).read()
^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/pandas_datareader/base.py", line 253, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/pandas_datareader/base.py", line 108, in _read_one_data
out = self._read_url_as_StringIO(url, params=params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/pandas_datareader/base.py", line 119, in _read_url_as_StringIO
response = self._get_response(url, params=params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/pandas_datareader/base.py", line 155, in _get_response
response = self.session.get(
^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/requests/sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/matteo/PycharmProjects/pythonProject/venv/lib/python3.11/site-packages/requests/adapters.py", line 563, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='stooq.com', port=443): Max retries exceeded with url: /q/d/l/?s=%5EDJI&i=d&d1=20180216&d2=20230215 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:992)')))
I ran the same code in JupyterLab and got the same error, whereas in Google Colab I get the correct result. Here is the code from Google Colab:
import pandas_datareader.data as web
f = web.DataReader('^DJI', 'stooq')
f[:10]

Error when Running Basic Appium Script. Is it urllib3 related?

I'm getting an error from this basic Appium script when I try to run it on a real device. Everything works fine in the Appium Inspector, but when I run the same code in PyCharm I get an error. Below are the capabilities in JSON format; I'm pretty new to this library, so I don't know whether I've set them up correctly.
from appium import webdriver
desired_cap = {
"deviceName": "R9AN60B4CCJ",
"platformName": "Android",
"app": "C:\\Users\\John Doe\\AppData\\Local\\Android\\Sdk\\platform-tools\\airmirror2.apk"
} #The Capabilities to install an app from the apk file on your Computer
driver = webdriver.Remote('https://localhost:4723/wd/hub', desired_cap)
#Similar to Selenium; Declaring the Driver. The Appium Server should already be started
The error is below:
Traceback (most recent call last):
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connectionpool.py", line 849, in _validate_conn
conn.connect()
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connection.py", line 356, in connect
ssl_context=context)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\util\ssl_.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\John Doe\Miniconda3\lib\ssl.py", line 423, in wrap_socket
session=session
File "C:\Users\John Doe\Miniconda3\lib\ssl.py", line 870, in _create
self.do_handshake()
File "C:\Users\John Doe\Miniconda3\lib\ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/John Doe/PycharmProjects/SelTest/venv/FirstTest.py", line 11, in <module>
driver = webdriver.Remote('https://localhost:4723/wd/hub', desired_cap)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\appium\webdriver\webdriver.py", line 275, in __init__
AppiumConnection(command_executor, keep_alive=keep_alive), desired_capabilities, browser_profile, proxy
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 269, in __init__
self.start_session(capabilities, browser_profile)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\appium\webdriver\webdriver.py", line 369, in start_session
response = self.execute(RemoteCommand.NEW_SESSION, parameters)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 423, in execute
response = self.command_executor.execute(driver_command, params)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 333, in execute
return self._request(command_info[0], url, body=data)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 355, in _request
resp = self._conn.request(method, url, body=body, headers=headers)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\request.py", line 72, in request
**urlopen_kw)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\request.py", line 150, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\poolmanager.py", line 322, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
**response_kw)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
**response_kw)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
**response_kw)
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\John Doe\PycharmProjects\SelTest\venv\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='localhost', port=4723): Max retries exceeded with url: /wd/hub/session (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076)')))
I don't know whether I need a specific version of urllib3 in this venv.

Problem downloading website with Python Requests.get() method on a AWS Ubuntu EC2 instance

I have a web scraping project that is stuck.
I am using the Requests package and its get() method to download the website's HTML, after which I want to parse it with Beautiful Soup.
It works fine on my laptop, but when I upload the program to my AWS Ubuntu EC2 instance, I run into an error. I have tried other websites and they all work; I only run into these problems with this site.
Does anyone know why this is happening?
Based on the error messages, I suspected SSL issues, but even with the verify=False parameter, it still won't work.
The code:
import requests
url = "http://www.kino.dk"
r = requests.get(url, verify=False)
print(r.text)
The error message:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 485, in wrap_socket
cnx.do_handshake()
File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1915, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1647, in _raise_ssl_error
_raise_current_error()
File "/usr/lib/python3/dist-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'tls12_check_peer_sigalg', 'wrong signature type')]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
httplib_response = self._make_request(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 376, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 996, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 352, in connect
self.sock = ssl_wrap_socket(
File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 370, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 491, in wrap_socket
raise ssl.SSLError("bad handshake: %r" % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'tls12_check_peer_sigalg', 'wrong signature type')])",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 719, in urlopen
retries = retries.increment(
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 436, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.kino.dk', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls12_check_peer_sigalg', 'wrong signature type')])")))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 6, in <module>
response = requests.get(url, verify=False)
File "/usr/lib/python3/dist-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 668, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 668, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 239, in resolve_redirects
resp = self.send(
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.kino.dk', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls12_check_peer_sigalg', 'wrong signature type')])")))

PyTube, URLlib, Http packages giving error while trying to download youtube video

I am trying to download a YouTube video using the PyTube package, but I am unable to do so.
I am using Python 3.5 and pytube 9.2.2.
Urllib doesn't give me an error in other programs but it is giving me errors here.
The code I am using is:
#importing the module
from pytube import YouTube
#where to save
SAVE_PATH = "E:/" #to_do
#link of the video to be downloaded
link="https://www.youtube.com/watch?v=xWOoBJUqlbI"
yt = YouTube(link)
#filters out all the files with "mp4" extension
mp4files = yt.filter('mp4')
yt.set_filename('GeeksforGeeks Video') #to set the name of the file
#get the video with the extension and resolution passed in the get() function
d_video = yt.get(mp4files[-1].extension,mp4files[-1].resolution)
try:
    #downloading the video
    d_video.download(SAVE_PATH)
except:
    print("Some Error!")
print('Task Completed!')
The errors that I am getting are:
Traceback (most recent call last):
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 1240, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1083, in request
self._send_request(method, url, body, headers)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1128, in _send_request
self.endheaders(body)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1079, in endheaders
self._send_output(message_body)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 911, in _send_output
self.send(msg)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 854, in send
self.connect()
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1237, in connect
server_hostname=server_hostname)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\ssl.py", line 376, in wrap_socket
_context=self)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\ssl.py", line 747, in __init__
self.do_handshake()
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\ssl.py", line 983, in do_handshake
self._sslobj.do_handshake()
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\ssl.py", line 628, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "2TryingPyTube.py", line 10, in <module>
yt = YouTube(link)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\pytube\__main__.py", line 87, in __init__
self.prefetch_init()
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\pytube\__main__.py", line 95, in prefetch_init
self.prefetch()
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\pytube\__main__.py", line 158, in prefetch
self.watch_html = request.get(url=self.watch_url)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\site-packages\pytube\request.py", line 21, in get
response = urlopen(url)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 465, in open
response = self._open(req, data)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 483, in _open
'_open', req)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 1283, in https_open
context=self._context, check_hostname=self._check_hostname)
File "C:\Users\hp\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 1242, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)>
I wish someone could help me.
PS - I am only a beginner in using urllib, http packages. Please don't write harsh comments, thank you :0

Unable To Get HTTPS URLs (requests package)

When I try to follow the guide here: https://automatetheboringstuff.com/chapter11/ my script fails:
import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
type(res)
res.raise_for_status()
requests is installed.
I am given the following error messages after a very long wait, which only appear when using HTTPS URLs; the same thing occurs on two Windows 10 64bit machines with Python 3.6.3 64bit and Python 3.6.4 64bit:
"C:\Program Files\Python36\python.exe" "C:/Users/user.name/Google Drive/Automation/RoHSWebScraper/main.py"
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\urllib3\contrib\pyopenssl.py", line 441, in wrap_socket
cnx.do_handshake()
File "C:\Program Files\Python36\lib\site-packages\OpenSSL\SSL.py", line 1716, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "C:\Program Files\Python36\lib\site-packages\OpenSSL\SSL.py", line 1449, in _raise_ssl_error
raise SysCallError(-1, "Unexpected EOF")
OpenSSL.SSL.SysCallError: (-1, 'Unexpected EOF')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
conn.connect()
File "C:\Program Files\Python36\lib\site-packages\urllib3\connection.py", line 326, in connect
ssl_context=context)
File "C:\Program Files\Python36\lib\site-packages\urllib3\util\ssl_.py", line 329, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Program Files\Python36\lib\site-packages\urllib3\contrib\pyopenssl.py", line 448, in wrap_socket
raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "C:\Program Files\Python36\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Program Files\Python36\lib\site-packages\urllib3\util\retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='automatetheboringstuff.com', port=443): Max retries exceeded with url: /files/rj.txt (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/user.name/Google Drive/Automation/RoHSWebScraper/main.py", line 3, in <module>
res = requests.get('https://automatetheboringstuff.com/files/rj.txt', verify=False)
File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "C:\Program Files\Python36\lib\site-packages\requests\adapters.py", line 506, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='automatetheboringstuff.com', port=443): Max retries exceeded with url: /files/rj.txt (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
Process finished with exit code 1
Can anyone help me with this infuriating problem!!?
You can try urllib:
Python2:
import urllib
data = urllib.urlopen('https://automatetheboringstuff.com/files/rj.txt').read()
Python3:
import urllib.request
data = urllib.request.urlopen('https://automatetheboringstuff.com/files/rj.txt').read()
So it turns out the computers on my corporate network are using proxy servers, which was preventing my HTTP and HTTPS requests from connecting properly.
I followed the answer from Lelouchzqy here to determine what my HTTP and HTTPS proxy servers were.
I then followed the answer from Roland Smith here to tell requests which proxies to use.
Hopefully this will help someone in the future if they have the same issue!
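For anyone who lands here with the same symptoms, here is a rough sketch of the two steps described above: finding out which proxies the machine uses, then passing them to requests. It assumes the proxies are visible to urllib.request.getproxies() (environment variables or the OS settings); the build_proxies and fetch helper names and the proxy.example.com address are made up for illustration, so substitute whatever your network actually uses.

```python
import urllib.request


def build_proxies(http_proxy=None, https_proxy=None):
    """Return a requests-style proxies mapping.

    Explicit arguments win; otherwise fall back to whatever the OS or
    environment variables report through urllib.request.getproxies().
    """
    detected = urllib.request.getproxies()
    proxies = {}
    for scheme, explicit in (("http", http_proxy), ("https", https_proxy)):
        value = explicit or detected.get(scheme)
        if value:
            proxies[scheme] = value
    return proxies


def fetch(url, proxies):
    # requests is imported here so build_proxies() stays usable without it.
    import requests

    response = requests.get(url, proxies=proxies, timeout=30)
    response.raise_for_status()
    return response


# Hypothetical proxy address; replace it with your real proxy host and port.
example = build_proxies(
    http_proxy="http://proxy.example.com:8080",
    https_proxy="http://proxy.example.com:8080",
)
print(example)
```

Calling fetch('https://automatetheboringstuff.com/files/rj.txt', example) would then send the request through the configured proxy. If your proxy needs credentials, they usually go in the proxy URL, but check with your network admin first.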
