Python Requests Library timing out under Linux - python

I am attempting to use the requests.py library for calls to a REST web service. I wrote a quick prototype for my usage under Windows and everything worked fine, but when I attempt to run the same prototype under Linux I get a "requests.exceptions.Timeout: Request timed out" error. Does anyone know why this might be happening? If I use the library to access a non-HTTPS URL, it works fine under both Windows and Linux.
import requests
url = 'https://path.to.rest/svc/?options'
r = requests.get(url, auth=('uid','passwd'), verify=False)
print(r.content)
I did notice that if I leave off the verify=False parameter from the get call, I get a different exception, namely "requests.exceptions.SSLError: Can't connect to HTTPS URL because the SSL module is not available". This appears to be a possible underlying cause, though I don't know why the error would change, and I can't find any reference to an ssl module; I verified that certifi is installed. Interestingly, if I leave off the verify parameter on Windows I get a different exception: "requests.exceptions.SSLError: [Errno 1] _ssl.c:503: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
EDIT:
Tracebacks for all cases/scenarios mentioned
Full code as shown above:
Traceback (most recent call last):
File "testRequests.py", line 15, in <module>
r = requests.get(url, auth=('uid','passwd'), verify=False)
File "build/bdist.linux-x86_64/egg/requests/api.py", line 52, in get
File "build/bdist.linux-x86_64/egg/requests/api.py", line 40, in request
File "build/bdist.linux-x86_64/egg/requests/sessions.py", line 208, in request
File "build/bdist.linux-x86_64/egg/requests/models.py", line 586, in send
requests.exceptions.Timeout: Request timed out
Code as shown above minus the "verify=False" parameter:
Traceback (most recent call last):
File "testRequests.py", line 15, in <module>
r = requests.get(url, auth=('uid','passwd'))
File "build/bdist.linux-x86_64/egg/requests/api.py", line 52, in get
File "build/bdist.linux-x86_64/egg/requests/api.py", line 40, in request
File "build/bdist.linux-x86_64/egg/requests/sessions.py", line 208, in request
File "build/bdist.linux-x86_64/egg/requests/models.py", line 584, in send
requests.exceptions.SSLError: Can't connect to HTTPS URL because the SSL module is not available
Code as shown above minus the "verify=False" parameter, run under Windows:
Traceback (most recent call last):
File "testRequests.py", line 59, in <module>
r = requests.get(url, auth=('uid','passwd'))
File "c:\Python27\lib\site-packages\requests\api.py", line 52, in get
return request('get', url, **kwargs)
File "c:\Python27\lib\site-packages\requests\api.py", line 40, in request
return s.request(method=method, url=url, **kwargs)
File "c:\Python27\lib\site-packages\requests\sessions.py", line 208, in request
r.send(prefetch=prefetch)
File "c:\Python27\lib\site-packages\requests\models.py", line 584, in send
raise SSLError(e)
requests.exceptions.SSLError: [Errno 1] _ssl.c:503: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

I'm not an expert on the matter, but it looks like the certificate from the server can't be verified correctly. I don't know exactly how Python and ssl handle certificate verification, but the first option is to try ignoring the exception, or perhaps to change https to http to see whether the web service allows non-secure calls.
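A rough sketch of that fallback, assuming the service even exposes a plain-HTTP endpoint at the same path (the URL and credentials are the placeholders from the question):
import requests

url = 'https://path.to.rest/svc/?options'
try:
    r = requests.get(url, auth=('uid', 'passwd'), verify=False)
except (requests.exceptions.SSLError, requests.exceptions.Timeout):
    # Purely as a diagnostic: retry over plain HTTP to see whether the service answers at all.
    r = requests.get(url.replace('https://', 'http://', 1), auth=('uid', 'passwd'))
print(r.status_code)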
If the issue revolves around an import error for ssl, note that the module is part of CPython, and you may need to ensure that your Python interpreter was compiled with SSL support (from OpenSSL). One option is to remove the system Python package (be careful) and compile Python yourself with OpenSSL support; personally, I would strongly advise looking into a virtualenv before removing anything. Compiling Python is not too difficult, and it gives you a finer grain of control over what you aim to do.
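As a quick check, you can ask the interpreter directly whether it was built with SSL support; a failing import here would match the "SSL module is not available" message:
try:
    import ssl
    print(ssl.OPENSSL_VERSION)  # which OpenSSL this interpreter was built against
except ImportError:
    print('this Python build has no ssl module')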

Related

Unable to programmatically download all files in a remote maven repository

Background
I am trying to bring up my grails project, which stopped building all of a sudden (probably due to me playing around with my local maven repository). Now when I run the grails command, I get the following errors -
org.eclipse.aether.resolution.ArtifactDescriptorException: Failed to read artifact descriptor for xalan:serializer:jar:2.7.1
at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:335)
at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.readArtifactDescriptor(DefaultArtifactDescriptorReader.java:217)
at org.eclipse.aether.internal.impl.DefaultDependencyCollector.resolveCachedArtifactDescriptor(DefaultDependencyCollector.java:537)
at org.eclipse.aether.internal.impl.DefaultDependencyCollector.getArtifactDescriptorResult(DefaultDependencyCollector.java:521)
at org.eclipse.aether.internal.impl.DefaultDependencyCollector.processDependency(DefaultDependencyCollector.java:421)
at org.eclipse.aether.internal.impl.DefaultDependencyCollector.processDependency(DefaultDependencyCollector.java:375)
at org.eclipse.aether.internal.impl.DefaultDependencyCollector.process(DefaultDependencyCollector.java:363)
at org.eclipse.aether.internal.impl.DefaultDependencyCollector.collectDependencies(DefaultDependencyCollector.java:266)
at org.eclipse.aether.internal.impl.DefaultRepositorySystem.collectDependencies(DefaultRepositorySystem.java:337)
at grails.util.BuildSettings.doResolve(BuildSettings.groovy:514)
at grails.util.BuildSettings.doResolve(BuildSettings.groovy)
at grails.util.BuildSettings$_getDefaultBuildDependencies_closure19.doCall(BuildSettings.groovy:775)
at grails.util.BuildSettings$_getDefaultBuildDependencies_closure19.doCall(BuildSettings.groovy)
at grails.util.BuildSettings.getDefaultBuildDependencies(BuildSettings.groovy:769)
at grails.util.BuildSettings.getBuildDependencies(BuildSettings.groovy:674)
Caused by: org.eclipse.aether.resolution.ArtifactResolutionException: Could not transfer artifact xalan:serializer:pom:2.7.1 from/to repo1_maven_org_maven2 (https://repo1.maven.org/maven2): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:462)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:264)
at org.eclipse.aether.internal.impl.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:241)
at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:320)
... 14 more
I followed this answer https://stackoverflow.com/a/36427118/351903 and added the remote server's certificate to my cacerts, but am now getting this error -
sun.security.validator.ValidatorException: KeyUsage does not allow digital signatures
I tried manually downloading the artefacts and observed that the error clears for that artefact and then shows up for the next one. So I thought that if I could list all the file names under a URL like https://repo1.maven.org/maven2/org/grails/grails-bootstrap/2.4.0/ and then curl each file and create it locally, I could resolve my dependencies one by one, like this -
curl https://repo1.maven.org/maven2/org/grails/grails-bootstrap/2.4.0/grails-bootstrap-2.4.0-javadoc.jar -o 'grails-bootstrap-2.4.0-javadoc.jar'
However, I am unable to list all the files for an artefact -
SandeepanNath:Desktop sandeepan.nath$ ssh https://repo1.maven.org ls -l /maven2/org/grails/grails-bootstrap/2.4.0/
ssh: Could not resolve hostname https://repo1.maven.org: nodename nor servname provided, or not known
SandeepanNath:Desktop sandeepan.nath$
Trying with the IP -
SandeepanNath:Desktop sandeepan.nath$ ssh 151.101.36.209 ls -l /maven2/org/grails/grails-bootstrap/2.4.0/
ssh: connect to host 151.101.36.209 port 22: Operation timed out
Then I understood that the only option I have is to scrape the URL for links and then curl each of them. But I am unable to scrape either, due to SSL errors. I tried following this Python example - https://www.geeksforgeeks.org/implementing-web-scraping-python-beautiful-soup/?ref=lbp
#This will not run on online IDE
import requests
from bs4 import BeautifulSoup
import ssl
URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
print(soup.prettify())
But I get this error -
Traceback (most recent call last):
File "parse_remote.py", line 7, in <module>
r = requests.get(URL)
File "/Users/sandeepan.nath/Library/Python/2.7/lib/python/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/Users/sandeepan.nath/Library/Python/2.7/lib/python/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/sandeepan.nath/Library/Python/2.7/lib/python/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/Users/sandeepan.nath/Library/Python/2.7/lib/python/site-packages/requests/sessions.py", line 665, in send
history = [resp for resp in gen] if allow_redirects else []
File "/Users/sandeepan.nath/Library/Python/2.7/lib/python/site-packages/requests/sessions.py", line 245, in resolve_redirects
**adapter_kwargs
File "/Users/sandeepan.nath/Library/Python/2.7/lib/python/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/Users/sandeepan.nath/Library/Python/2.7/lib/python/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.values.com', port=443): Max retries exceeded with url: /inspirational-quotes (Caused by SSLError(SSLEOFError(8, u'EOF occurred in violation of protocol (_ssl.c:590)'),))
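For reference, the list-then-download approach described above would look roughly like this; this is only a sketch, and it assumes the SSL problem is fixed first (the same handshake failure would otherwise hit this script too) and that the repository page is plain HTML with one link per file:
import requests
from bs4 import BeautifulSoup

BASE = 'https://repo1.maven.org/maven2/org/grails/grails-bootstrap/2.4.0/'

resp = requests.get(BASE)
resp.raise_for_status()
soup = BeautifulSoup(resp.content, 'html.parser')

for link in soup.find_all('a'):
    name = link.get('href', '')
    # Skip the parent-directory link and anything that is not a plain file name.
    if not name or name.endswith('/'):
        continue
    print('downloading', name)
    with open(name, 'wb') as fh:
        fh.write(requests.get(BASE + name).content)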

from urllib3 HTTPSConnectionPool request_encode_body certificate issue

I'm trying to send a request to an HTTPS URL to get data; the domain prompts for a security certificate when I open it in the browser.
My issue is how to call the URL from my Python code and get the response data.
I've written the following code:
from urllib3 import HTTPSConnectionPool

# BETTING_CONFG, service_uri, headers and body are defined elsewhere in the application.
conn = HTTPSConnectionPool(BETTING_CONFG['api_url'],
                           maxsize=BETTING_CONFG['connection_max_size'])
response = conn.request_encode_body('POST', service_uri, headers=headers,
                                    encode_multipart=False, body=body)
and I get the following response:
Response: status = 200, payload = {"_status":"error","payload":{"_code":"0-2","_message":"invalid_app_key"}} .
and this warning on the terminal:
/usr/local/lib/python2.7/dist-packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
InsecureRequestWarning)
[555WIN] 2016-05-30 14:02:06,043 - INFO - Betting Response: status = 200, payload = {"_status":"error","payload":{"_code":"0-2","_message":"invalid_app_key"}} .
Traceback (most recent call last):
File "/usr/lib/python2.7/logging/handlers.py", line 76, in emit
if self.shouldRollover(record):
File "/usr/lib/python2.7/logging/handlers.py", line 156, in shouldRollover
msg = "%s\n" % self.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 724, in format
return fmt.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 464, in format
record.message = record.getMessage()
File "/usr/lib/python2.7/logging/__init__.py", line 324, in getMessage
msg = str(self.msg)
TypeError: __str__ returned non-string (type dict)
Logged from file jsonapi.py, line 137
Traceback (most recent call last):
File "/usr/lib/python2.7/logging/__init__.py", line 851, in emit
msg = self.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 724, in format
return fmt.format(record)
File "/usr/lib/python2.7/logging/__init__.py", line 464, in format
record.message = record.getMessage()
File "/usr/lib/python2.7/logging/__init__.py", line 324, in getMessage
msg = str(self.msg)
TypeError: __str__ returned non-string (type dict)
When I added the certificate in Chrome and tried to send the request from Postman, it worked fine.
Any help on how to fix this?
Please understand that your Chrome certificate store is not the same certificate store that is used by your Python application.
It would be much easier if you could simply get a valid SSL certificate instead of trying to make self-signed ones work.
Also, be sure to upgrade your Python and urllib3. Those warning messages are not to be ignored; resolve them first!
SSL certificates used to be expensive, but now you can get valid, fully supported certificates for free from Let's Encrypt. I run my own website using their certificates and I can assure you that Python has no problem loading them.
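Those SNIMissingWarning / InsecurePlatformWarning messages generally mean the interpreter's ssl module is too old to do SNI; one commonly suggested fix on old Python 2.7 builds is to install the pyOpenSSL-based TLS backend, for example with pip install "requests[security]". A quick way to see what your interpreter currently supports:
import ssl
print(ssl.OPENSSL_VERSION)             # which OpenSSL this interpreter links against
print(getattr(ssl, 'HAS_SNI', False))  # False lines up with the SNIMissingWarning above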

SSLError using requests for python

I tried to do the first command in the Quickstart for requests:
>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
But I get the following error message:
Traceback (most recent call last):
File "./main.py", line 16, in <module>
requests.get('https://github.com/timeline.json')
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/adapters.py", line 385, in send
raise SSLError(e)
requests.exceptions.SSLError: [Errno 1] _ssl.c:499: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
I am totally new to SSL certificates, but I suspect it has something to do with Python looking in the wrong place. I downloaded Python 2.7 and am using it as my default Python (I am running Mac OS X 10.6 (Snow Leopard), which came with Python 2.6). I had a lot of trouble with my Mac looking in the wrong place for Python until I fixed the paths and made symbolic links, but I wonder if something else related to the upgrade is causing this SSL error, or it could be something entirely unrelated.
I have tried searching for similar questions and read suggestions to just add the argument verify=False to requests.get(), but I don't want to do that, since I think it just avoids the real problem. Thanks for helping out a complete newbie.
You can try this.
Verify the path to the cert:
>>> requests.get('https://whatever.com', verify='/path/to/certfile')
Or
>>> requests.get('https://whatever.com', cert=('/path/server.crt', '/path/key'))
http://docs.python-requests.org/en/latest/user/advanced/
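If the default CA bundle is the problem, a less drastic check than verify=False is to point verify explicitly at the certifi bundle (or at your own CA file); just a sketch:
import certifi
import requests

# Use the certifi CA bundle explicitly instead of whatever bundle requests found by default.
r = requests.get('https://github.com/timeline.json', verify=certifi.where())
print(r.status_code)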

Python Requests library can't handle redirects for HTTPS URLs when behind a proxy

I think I've discovered a problem with the Requests library's handling of redirects when using HTTPS. As far as I can tell, this is only a problem when the server redirects the Requests client to another HTTPS resource.
I can assure you that the proxy I'm using supports HTTPS and the CONNECT method because I can use it with a browser just fine. I'm using version 2.1.0 of the Requests library which is using 1.7.1 of the urllib3 library.
I watched the transactions in wireshark and I can see the first transaction for https://www.paypal.com/ but I don't see anything for https://www.paypal.com/home. I keep getting timeouts when debugging any deeper in the stack with my debugger so I don't know where to go from here. I'm definitely not seeing the request for /home as a result of the redirect. So it must be erroring out in the code before it gets sent to the proxy.
I want to know if this truly is a bug or if I am doing something wrong. It is really easy to reproduce so long as you have access to a proxy that you can send traffic through. See the code below:
import requests
proxiesDict = {
'http': "http://127.0.0.1:8080",
'https': "http://127.0.0.1:8080"
}
# This fails with "requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused." when it tries to follow the redirect to /home
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
# This succeeds.
r = requests.get("https://www.paypal.com/home", proxies=proxiesDict)
This also happens when using urllib3 directly. It is probably a bug in urllib3, which Requests uses under the hood, but I'm using the higher-level Requests library. See below:
import urllib3

proxy = urllib3.proxy_from_url('http://127.0.0.1:8080/')
# This fails with the same error as above.
res = proxy.urlopen('GET', 'https://www.paypal.com/')
# This succeeds
res = proxy.urlopen('GET', 'https://www.paypal.com/home')
Here is the traceback when using Requests:
Traceback (most recent call last):
File "tests/downloader_tests.py", line 22, in test_proxy_https_request
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 382, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 505, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 167, in resolve_redirects
allow_redirects=False,
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 485, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 375, in send
raise ProxyError(e)
requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused.
Update:
The problem only seems to happen with a 302 (Found) redirect, not with normal 301 (Moved Permanently) redirects. Also, I noticed that with the Chrome browser, PayPal doesn't return a redirect. I do see the redirect when using Requests, even though I'm borrowing Chrome's User-Agent for this experiment. I'm looking for more URLs that return a 302 in order to get more data points.
I need this to work for all URLs or at least understand why I'm seeing this behavior.
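One quick way to collect those data points is to look at the first response without following the redirect; this is only a sketch of the check, using the same URL as above:
import requests

# Fetch the page without following redirects, so the raw 301/302 status is visible.
r = requests.get('https://www.paypal.com/', allow_redirects=False)
print(r.status_code, r.headers.get('Location'))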
This is a bug in urllib3. We're tracking it as urllib3 issue #295.

urllib.urlopen isn't working. Is there a workaround?

I'm getting a getaddrinfo error, and after doing some sleuthing it looks like it might be my corporate intranet not allowing the connection (I'm assuming due to security, although it is strange that IE works but Python isn't allowed to open a URL). Is there a safe way to get around this?
Here's the exact error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
b = urllib.urlopen('http://www.google.com')
File "C:\Python26\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python26\lib\urllib.py", line 203, in open
return getattr(self, name)(url)
File "C:\Python26\lib\urllib.py", line 342, in open_http
h.endheaders()
File "C:\Python26\lib\httplib.py", line 868, in endheaders
self._send_output()
File "C:\Python26\lib\httplib.py", line 740, in _send_output
self.send(msg)
File "C:\Python26\lib\httplib.py", line 699, in send
self.connect()
File "C:\Python26\lib\httplib.py", line 683, in connect
self.timeout)
File "C:\Python26\lib\socket.py", line 498, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11001] getaddrinfo failed
More info: I also get this error with urllib2.urlopen
You probably need to fill in proxy information.
import urllib2

# Point the handler at your corporate proxy (placeholder URL).
proxy_handler = urllib2.ProxyHandler({'http': 'http://yourcorporateproxy:12345/'})
# Use the proxy-specific auth handler so the credentials are sent to the proxy itself.
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
opener.open('http://www.stackoverflow.com')
Check you are using the correct proxy.
You can get the proxy information by using urllib.getproxies (note: getproxies does not work with dynamic proxy configuration, like when using PAC).
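For example (Python 2 here, matching the rest of this answer):
import urllib
print(urllib.getproxies())  # an empty dict means no proxy was picked up from the environment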
Update: As per the information about the empty proxy list, I would suggest using a urlopener with the proxy name and information.
Some good information about how to use proxies with urlopeners:
Urllib manual
Michael Foord's introduction to urllib
Possibly this is a DNS issue; try urlopen with the IP address of the web server you're accessing, e.g.:
import urllib
URL="http://66.102.11.99" # www.google.com
f = urllib.urlopen(URL)
f.read()
If this succeeds, then it's probably a DNS issue rather than a proxy issue (but you should also check your proxy setup).
Looks like a DNS problem.
Since you are using Windows, you can try running this command:
nslookup www.google.com
to check whether the web address can be resolved successfully.
If not, it is a network settings issue.
If it resolves OK, then we have to look at other possible causes.
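A rough Python equivalent of that nslookup check, if you prefer to test from the same interpreter that is failing:
import socket
try:
    print(socket.gethostbyname('www.google.com'))
except socket.gaierror as e:
    print('name resolution failed:', e)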
I was facing the same issue.
In my system the proxy configuration is done through a .PAC file.
So I opened that file and took out the default proxy URL; for me it was http://168.219.61.250:8080/
The following test code worked for me:
import urllib2
proxy_support = urllib2.ProxyHandler({'http': 'http://168.219.61.250:8080/'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://python.org/')
html = response.read()
print html
You might need to add some more code if your proxy requires authentication.
Hope this helps!
