I am on my company's network and need to access a webpage to scrape data from it. I am using the Python Requests module. The page is reached through a POST request. My company has a proxy, and I can get through it using the proxies argument of requests.post(). However, there is an authentication step that uses cookies, and I can't seem to get past it. How should I handle the authentication when using a POST request?
I am trying to use the authentication process as described in this thread, but it's not working:
Authentication and python Requests
The code is set up this way:
import ssl
from MyHtmlParser import MyHTMLParser
from lxml import html
import requests
from bs4 import BeautifulSoup as bs
def authenticate(s, url):
    headers = {'USER_NAME': 'me', 'PASSWORD': 'mypassword', '_Id': 'submit'}
    page = s.get(url)
    soup = bs(page.content)
    value = soup.form.find_all('input')[2]['value']
    headers.update({'value_name': value})
    auth = s.post(url, params=headers, cookies=page.cookies)
post_url_finance = 'https://opsdata<company>com/scripts/finance/finance.exe'
values_finance = {'EMPLOYEE_TOTAL': 'employeeId'}
proxies = {'http': 'http://proxy-<company>.com'}
page = requests.post(post_url_finance, data=values_finance, proxies=proxies)
print page.content
However, I am getting this error back:
$ python postUsingRequests.py
Traceback (most recent call last):
File "postUsingRequests.py", line 53, in <module>
page = requests.post(post_url_finance, data=values_finance, proxies=proxies)
File "C:\Python27\lib\site-packages\requests\api.py", line 109, in post
return request('post', url, data=data, json=json, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests\adapters.py", line 431, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
The problem you are having seems to be caused by an untrusted SSL certificate.
The quickest fix is to pass verify=False. Note that this disables certificate verification and exposes your application to security risks such as man-in-the-middle attacks. Since, as you mentioned, the script runs inside a trusted network, this is a less serious problem.
s = requests.session()
s.auth = {'USER_NAME': '----', 'PASSWORD': '----'}
pageCert = requests.post(post_url_finance, proxies=proxies, verify=False)
I used s.auth with verify=False. This gave me a response back instead of giving me the SSL error.
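A minimal sketch of keeping the proxy, the certificate setting, and the login cookies on one Session object, so that every request made through it shares them. The proxy URL, the CA bundle path, and the build_session helper name are placeholders, not taken from the original post. Passing False instead of a bundle path would reproduce the quick fix above.

```python
import requests


def build_session(proxy_url, ca_bundle):
    """Return a Session that routes through the proxy and trusts ca_bundle.

    Both arguments are hypothetical placeholders for illustration.
    """
    session = requests.Session()
    # Use the same proxy for plain HTTP and for HTTPS (CONNECT) traffic.
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    # Path to a PEM file holding the CA that signed the server certificate.
    # Setting this to False would skip verification entirely.
    session.verify = ca_bundle
    return session


session = build_session("http://proxy.example.com:8080", "/path/to/company-ca.pem")
# Cookies set by a login response are stored on session.cookies and are sent
# automatically on later calls such as session.post(url, data=...).
```

Because the cookies live on the session, the login request and the later data request need to go through the same session object rather than through requests.post().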
I have an API from which I need to get a bearer token. When I use the Postman application, I get the bearer token correctly. I have written the Python code below to do the same, but I get the error shown below. Please help. I need to send the username and password in the body as form data.
import requests
url = "https://322.286.24.01/ach/ach_api/login"
payload = {'username': 'test',
           'password': 'test12'}
response = requests.post(url, data=payload)
print(response.text)
ERROR:
Traceback (most recent call last):
File "/tmp/test.py", line 13, in <module>
response = requests.post(url,data=payload)
File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='322.286.24.01', port=443): Max retries exceeded with url: /ach/ach_api/login (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1123)')))
The error occurs because requests verifies the server's certificate against its trusted CA certificates, and a self-signed certificate fails that check. Try the following:
response = requests.post(url, data=payload, verify=False)
Or, if you have a ca.crt somewhere, usually pem format, you can try:
response = requests.post(url, data=payload, verify='/path/to/pem')
Also the IP address looks funny, although since the client connects I suspect you just changed that to anonymise your post?
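As a rough illustration of the two points above, the sketch below prepares the login request without sending it, to show that data= produces a form-encoded body, and then wraps the send step so that verification uses a CA file rather than being switched off. The URL, the CA path, and the login helper are hypothetical.

```python
import requests

# Hypothetical endpoint and credentials, used only for illustration.
url = "https://api.example.com/ach/ach_api/login"
payload = {"username": "test", "password": "test12"}

# Preparing the request shows what would go on the wire without connecting.
prepared = requests.Request("POST", url, data=payload).prepare()
print(prepared.headers["Content-Type"])
print(prepared.body)


def login(ca_path):
    """Send the prepared request, verifying the server against ca_path."""
    with requests.Session() as session:
        # verify accepts a path to a PEM bundle containing the signing CA.
        return session.send(prepared, verify=ca_path, timeout=10)
```

Calling login('/path/to/pem') would then perform the actual POST, with the server certificate checked against that file.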
Here I am using the requests module to download API information from GitHub. The code is shown below.
# Creation of Github request
# Import requests
import requests
r = requests.get('https://api.github.com/user', auth=('user','pass'))
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())
Running this script produces an error:
python github.py
Traceback (most recent call last):
File "github.py", line 6, in <module>
r = requests.get('https://api.github.com/user', auth=('user','pass'))
File "/home/ubuntu/.local/lib/python2.7/site-packages/requests/api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "/home/ubuntu/.local/lib/python2.7/site-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/home/ubuntu/.local/lib/python2.7/site-packages/requests/sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.local/lib/python2.7/site-packages/requests/sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.local/lib/python2.7/site-packages/requests/adapters.py", line 497, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)
What I have tried:
I replaced the username and password with my own authentication details, and it gave the following output:
401
application/json; charset=utf-8
utf-8
{"message":"Bad credentials","documentation_url":"https://developer.github.com/v3"}
{u'documentation_url': u'https://developer.github.com/v3', u'message': u'Bad credentials'}
I also tried using an access token from the settings tab of GitHub, but that did not help.
Any help? Please help me solve this problem.
The requests module is installed on my system.
import requests
r = requests.get('https://api.github.com/user', auth=('user','pass'), verify=False)
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())
This might work. If it does, the problem is that the certificate sent by the server cannot be verified (you are using HTTPS).
The Requests documentation has a whole section about this:
Requests verifies SSL certificates for HTTPS requests, just like a web browser. By default, SSL verification is enabled, and Requests will throw a SSLError if it's unable to verify the certificate:
Note that you shouldn't disable TLS/SSL verification in production code. Instead, investigate why the certificate isn't valid and follow the guidelines in the official documentation.
Manual verification
You can always export the GitHub certificate from, say, a browser and place it in the same directory as your script. This should not normally be needed, but as a test, this should work:
requests.get('https://api.github.com', verify='./github.crt')
And again, make sure you've exported the certificate and placed it as github.crt in the same directory as your script.
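If the exported certificate belongs to a private or corporate CA, one option is to append it to a copy of the default bundle, so that public sites keep working too. This is a sketch under assumptions: it relies on the certifi package (which Requests normally uses for its default bundle), and the build_ca_bundle helper and the file names are made up for illustration.

```python
import certifi


def build_ca_bundle(extra_pem_path, out_path, base_bundle=None):
    """Write base_bundle followed by the extra PEM certificate to out_path."""
    # Default to the CA bundle shipped with certifi.
    base_bundle = base_bundle or certifi.where()
    with open(base_bundle, "rb") as base, open(extra_pem_path, "rb") as extra:
        combined = base.read().rstrip() + b"\n" + extra.read().strip() + b"\n"
    with open(out_path, "wb") as out:
        out.write(combined)
    return out_path
```

The returned path can be passed as verify=..., for example requests.get('https://api.github.com', verify=build_ca_bundle('github.crt', 'combined.pem')). Requests also reads the REQUESTS_CA_BUNDLE environment variable, which can point at the same file.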
I'm trying to use the requests module to make a post request to an endpoint that requires ssl auth. My pem file is in the specified path and contains the client cert, and private key. However, I keep getting the Certificate Verified Failed exception. I see in the nginx logs that the request never even made it there. Anyone have any ideas why? I know the certs should work.
import json
import requests

params = {
    "param_2": "32100",
    "param_1": "abc"
}
headers = {
    "Content-Type": "application/json"
}
body = json.dumps(params)
r = requests.post(
    "https://somesite.com/somepath",
    data=body,
    headers=headers,
    timeout=10,
    verify="/path/to/cert.pem"
)
Traceback (most recent call last):
File "./somefile.py", line 264, in <module>
start()
File "./somefile.py", line 149, in start
verify="/path/to/cert.pem"
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 88, in post
return request('post', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 448, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 554, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 417, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
From the requests docs:
Requests can also ignore verifying the SSL certificate if you set
verify to False.
requests.get('https://kennethreitz.com', verify=False)
By default, verify is set to True. Option verify only applies to host
certs.
You can also specify a local cert to use as client side certificate,
as a single file (containing the private key and the certificate) or
as a tuple of both file’s path:
requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
So it seems like you've just got the args wrong. Try it with 'cert' instead of verify.
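To make the distinction concrete, here is a small sketch that sets both options on a Session: cert identifies the client to the server, while verify names the CA used to check the server. All file paths and the mutual_tls_session helper name are placeholders.

```python
import requests


def mutual_tls_session(client_cert, client_key, ca_bundle=True):
    """Return a Session configured for client-certificate authentication."""
    session = requests.Session()
    # cert: the certificate and private key this client presents.
    session.cert = (client_cert, client_key)
    # verify: True uses the default CA bundle; a path selects a custom one.
    session.verify = ca_bundle
    return session


session = mutual_tls_session("/path/client.crt", "/path/client.key")
# A JSON body can then be sent with, for example:
# session.post("https://somesite.com/somepath", json={"param_1": "abc"}, timeout=10)
```

Note that Requests expects the private key file to be unencrypted, so a password-protected key has to be decrypted first.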
I think I've discovered a problem with the Requests library's handling of redirects when using HTTPS. As far as I can tell, this is only a problem when the server redirects the Requests client to another HTTPS resource.
I can assure you that the proxy I'm using supports HTTPS and the CONNECT method because I can use it with a browser just fine. I'm using version 2.1.0 of the Requests library which is using 1.7.1 of the urllib3 library.
I watched the transactions in wireshark and I can see the first transaction for https://www.paypal.com/ but I don't see anything for https://www.paypal.com/home. I keep getting timeouts when debugging any deeper in the stack with my debugger so I don't know where to go from here. I'm definitely not seeing the request for /home as a result of the redirect. So it must be erroring out in the code before it gets sent to the proxy.
I want to know if this truly is a bug or if I am doing something wrong. It is really easy to reproduce so long as you have access to a proxy that you can send traffic through. See the code below:
import requests
proxiesDict = {
    'http': "http://127.0.0.1:8080",
    'https': "http://127.0.0.1:8080"
}
# This fails with "requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused." when it tries to follow the redirect to /home
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
# This succeeds.
r = requests.get("https://www.paypal.com/home", proxies=proxiesDict)
This also happens when using urllib3 directly. It is probably mainly a bug in urllib3, which Requests uses under the hood, but I'm using the higher level requests library. See below:
import urllib3
proxy = urllib3.proxy_from_url('http://127.0.0.1:8080/')
# This fails with the same error as above.
res = proxy.urlopen('GET', 'https://www.paypal.com/')
# This succeeds.
res = proxy.urlopen('GET', 'https://www.paypal.com/home')
Here is the traceback when using Requests:
Traceback (most recent call last):
File "tests/downloader_tests.py", line 22, in test_proxy_https_request
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 382, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 505, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 167, in resolve_redirects
allow_redirects=False,
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 485, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 375, in send
raise ProxyError(e)
requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused.
Update:
The problem only seems to happen with a 302 (Found) redirect not with the normal 301 redirects (Moved Permanently). Also, I noticed that with the Chrome browser, Paypal doesn't return a redirect. I do see the redirect when using Requests - even though I'm borrowing Chrome's User Agent for this experiment. I'm looking for more URLs that return a 302 in order to get more data points.
I need this to work for all URLs or at least understand why I'm seeing this behavior.
This is a bug in urllib3. We're tracking it as urllib3 issue #295.
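One way to narrow down which hop fails is to disable automatic redirects and follow them by hand. The sketch below is only a diagnostic aid, not a fix for the urllib3 bug. trace_redirects is a hypothetical helper, and it assumes the object passed in behaves like a requests.Session.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(session, url, max_hops=5, **kwargs):
    """Follow redirects one hop at a time and return the (status, url) pairs.

    session is expected to behave like requests.Session; extra keyword
    arguments (for example proxies=...) are passed through to session.get.
    """
    hops = []
    for _ in range(max_hops):
        response = session.get(url, allow_redirects=False, **kwargs)
        hops.append((response.status_code, url))
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

With Requests installed, it could be called as trace_redirects(requests.Session(), "https://www.paypal.com/", proxies=proxiesDict). An exception raised partway through then points at the specific request that could not reach the proxy.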
I'm calling a REST API with requests in python and so far have been successful when I set verify=False.
Now I have to use a client-side certificate for authentication, and I get this error every time I use the certificate (.pfx). cert.pfx is password protected.
r = requests.post(url, params=payload, headers=headers,
                  data=payload, verify='cert.pfx')
This is the error I'm getting:
Traceback (most recent call last):
File "C:\Users\me\Desktop\test.py", line 65, in <module>
r = requests.post(url, params=payload, headers=headers, data=payload, verify=cafile)
File "C:\Python33\lib\site-packages\requests\api.py", line 88, in post
return request('post', url, data=data, **kwargs)
File "C:\Python33\lib\site-packages\requests\api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python33\lib\site-packages\requests\sessions.py", line 346, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python33\lib\site-packages\requests\sessions.py", line 449, in send
r = adapter.send(request, **kwargs)
File "C:\Python33\lib\site-packages\requests\adapters.py", line 322, in send
raise SSLError(e)
requests.exceptions.SSLError: unknown error (_ssl.c:2158)
I've also tried using openssl to extract a .pem and a key, but with the .pem I get SSL: CERTIFICATE_VERIFY_FAILED.
Can someone please direct me on how to import the certs and where to place it? I tried searching but still faced with the same issue.
I had this same problem. The verify parameter refers to the server's certificate. You want the cert parameter to specify your client certificate.
import requests
cert_file_path = "cert.pem"
key_file_path = "key.pem"
url = "https://example.com/resource"
params = {"param_1": "value_1", "param_2": "value_2"}
cert = (cert_file_path, key_file_path)
r = requests.get(url, params=params, cert=cert)
I had the same problem. To resolve it, I learned that the root CA has to be passed along with the client certificate and its key, as shown below:
response = requests.post(url, data=your_data, cert=('path_client_certificate_file', 'path_certificate_key_file'), verify='path_rootCA')