How can I use SSPI to negotiate requests handled by external libraries? - python

I'll set expectations with the fact that I've been pushed well outside my area of expertise here. I'm behind a corporate firewall, and it's interfering with a lot of external code I use.
For example, I'm trying to use HuggingFace's from_pretrained method. Behind the scenes, this eventually makes a request similar to this:
import requests
requests.get('https://huggingface.co/distilbert-base-uncased-distilled-squad/resolve/main/tokenizer_config.json')
This request fails with an error from my proxy telling me my credentials are missing, but it can be fixed with the following, using the excellent requests_negotiate_sspi library:
import requests
from requests_negotiate_sspi import HttpNegotiateAuth
s = requests.Session()
s.auth = HttpNegotiateAuth()
s.get('https://huggingface.co/distilbert-base-uncased-distilled-squad/resolve/main/tokenizer_config.json', verify='/path/to/cert.pem')
Unfortunately though, that request is made behind the scenes in the HuggingFace library. I can set env variables to save the path to the cert, but I can't use the SSPI negotiation unless I control that code directly (so far as I can tell). Is there any way around this problem?
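One possible workaround, offered as a sketch rather than a confirmed fix: if the external library really does go through requests (as the call above suggests), every requests.get() creates a requests.Session internally, so patching the session class before the library runs should attach the auth handler to those hidden requests too. The helper name install_default_auth is made up for illustration, and whether a given HuggingFace version builds its sessions this way is an assumption you would need to check.

```python
import functools


def install_default_auth(session_cls, auth_factory):
    """Patch session_cls so every new instance gets auth_factory() as .auth.

    Returns the original __init__ so the patch can be undone later.
    (Hypothetical helper, not part of requests or requests_negotiate_sspi.)
    """
    original_init = session_cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        # Attach a fresh auth handler to each session that gets created.
        self.auth = auth_factory()

    session_cls.__init__ = patched_init
    return original_init


# Intended usage (Windows only, run before calling the external library):
#
#   import requests
#   from requests_negotiate_sspi import HttpNegotiateAuth
#   install_default_auth(requests.Session, HttpNegotiateAuth)
```

The certificate path can still come from an environment variable as described above; the patch only covers the authentication part.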

Related

How to post request to API using only code?

I am developing a DAG to be scheduled on Apache Airflow whose main purpose will be to post survey data (in JSON format) to an API and then get a response (the answers to the surveys). Since this whole process is going to be automated, every part of it has to be programmed in the DAG, so I can't use Postman or any similar app (unless there is a way to automate their usage, but I don't know if this is possible).
I was thinking of using the requests library for Python, and the function I've written for posting the json to the API looks like this:
import requests

def postFileToAPI(**context):
    print('uploadFileToAPI() ------ ')
    # Pull the JSON file produced by the previous task.
    json_file = context['ti'].xcom_pull(task_ids='toJson')
    print('--------------- Posting survey request to API')
    r = requests.post('https://[request]', data=json_file)
(I haven't finished defining the http link for the request because my source data is incomplete.)
However, since this is my first time working with APIs and the requests library, I don't know if this is enough. For example, I'm unsure if I need to provide a token from the API to perform the request.
I also don't know if there are other libraries that are better suited for this or that could be a good support.
In short: I don't know if what I'm doing will work as intended, what other information I need to provide to my DAG, or if there are any libraries to make my work easier.
The Python requests package that you're using is all you need, except if you're making a request that needs extra authorisation - then you should also import for example requests_jwt (then from requests_jwt import JWTAuth) if you're using JSON web tokens, or whatever relevant requests package corresponds for your authorisation style.
You make each POST or GET request as a separate call.
Include the URL and data arguments as you have done and that should work!
You may also need headers and/or auth arguments to get through security,
e.g. for the GitLab API for a private repository you would include these extra arguments, where GITLAB_TOKEN is a GitLab web token.
```
headers={'PRIVATE-TOKEN': GITLAB_TOKEN},
auth=JWTAuth(GITLAB_TOKEN)
```
Just try it and it should work. If it doesn't, test the API with curl requests directly in the terminal, or let us know :)
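To make the advice above concrete, here is a minimal sketch of a JSON POST with an optional token header. The function names, the example URL and the "Bearer" scheme are assumptions for illustration only; the real API's documentation decides which header or auth class it expects. Preparing the request separately lets you inspect what would be sent before anything goes over the network.

```python
import requests


def build_survey_request(url, payload, token=None):
    """Prepare (but do not send) a JSON POST request."""
    headers = {}
    if token is not None:
        # Hypothetical bearer-token scheme; the real API may differ.
        headers['Authorization'] = 'Bearer ' + token
    # json= serializes the payload and sets Content-Type: application/json.
    request = requests.Request('POST', url, json=payload, headers=headers)
    return request.prepare()


def send(prepared, timeout=30):
    """Send a prepared request and raise on 4xx/5xx responses."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()
    return response


prepared = build_survey_request(
    'https://api.example.invalid/surveys',  # placeholder URL
    {'survey_id': 1},
    token='secret',
)
print(prepared.method, prepared.headers['Content-Type'])
```

Inside the DAG task you would call send(prepared) and read response.json() for the survey answers.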

Generate the AWS HTTP signature from boto3

I am working with the AWS Transcribe streaming service, which boto3 does not support yet, so to make HTTP/2 requests I need to manually set up the authorization header with the "AWS Signature Version 4".
I've found some example implementation, but I was hoping to just call whatever function boto3/botocore have implemented using the same configuration object.
Something like
session = boto3.Session(...)
auth = session.generate_signature('POST', '/stream-transcription', ...)
Any pointers in that direction?
Contrary to the AWS SDKs for most other programming languages, boto3/botocore don't offer the functionality to sign arbitrary requests using "AWS Signature Version 4" yet. However there is at least already an open feature request for that: https://github.com/boto/botocore/issues/1784
In this feature request, existing alternatives are discussed as well. One is the third-party Python library aws-requests-auth, which provides a thin wrapper around botocore and requests to sign HTTP-requests. That looks like the following:
import requests
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

auth = BotoAWSRequestsAuth(aws_host="your-service.domain.tld",
                           aws_region="us-east-1",
                           aws_service="execute-api")
response = requests.get("https://your-service.domain.tld",
                        auth=auth)
Another alternative presented in the feature request is to implement the necessary glue-code on your own, as shown in the following gist: https://gist.github.com/rhboyd/1e01190a6b27ca4ba817bf272d5a5f9a.
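For a sense of what such glue code involves, here is a minimal sketch of one piece of Signature Version 4, the signing-key derivation, using only the standard library. The function names and the placeholder credentials are assumptions for illustration; a complete signer also has to build the canonical request and the string to sign, which are not shown here.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode('utf-8'), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key by chaining four HMAC operations."""
    k_date = _hmac_sha256(('AWS4' + secret_key).encode('utf-8'), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, 'aws4_request')


def sign(signing_key, string_to_sign):
    """Return the hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode('utf-8'),
                    hashlib.sha256).hexdigest()


# Placeholder values only; date_stamp uses the YYYYMMDD format.
key = derive_signing_key('EXAMPLE-SECRET', '20200101', 'us-east-1', 'transcribe')
print(len(key))  # HMAC-SHA256 digests are 32 bytes long
```

The service name 'transcribe' above is a guess; check the service documentation for the exact signing name.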
Did you check this SDK? Seems very recent but might do what you need.
https://github.com/awslabs/amazon-transcribe-streaming-sdk/tree/master
It looks like it handles the signing: https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/master/amazon_transcribe/signer.py
I have not tested this, but you can likely accomplish this by following along with this SigV4 unit test:
https://github.com/boto/botocore/blob/master/tests/unit/test_auth_sigv4.py
Note, this constructs a request using the botocore.awsrequest.AWSRequest helper. You'll likely need to dig around to figure out how to send the actual HTTP request (perhaps with httpsession.py)

Python requests call fails with HTTPS

I am running a Flask restful API behind an NGINX web server on AWS. I am hitting that with a python module from my Pi.
Everything worked fine when I was using HTTP to make calls to the API. But I just locked down my API so only HTTPS is possible. I changed the URL used by my Python module but it now fails. The code is quite simple; here is an extract:
jsonpkg = {'subscriberID': self.api_login, 'token': self.api_token,
           'content': speech_content}
headers = {'Content-Type': 'application/json'}
r = requests.post(self.api_apiurl, data=json.dumps(jsonpkg), headers=headers)
The values are being set correctly by the class init section, and I am importing the requests module at the top. Error messages indicate it is using Python 2.7. However, when I monitor the API I can see it's not even hitting the server. I can point a browser to the API and it's working fine.
Am I to understand the requests module in python 2.7 does not support https?
Are there additional parameters I need to send for https?
Aha! With a little more digging into the request module docs I found the answer. If I use the following
r = requests.post(self.api_apiurl, data=json.dumps(jsonpkg), headers=headers, verify=False)
then it works. So the issue is with verifying the cert. I am not quite sure why the browser gets by without this...but perhaps it does the extra stuff automatically. So I either need to NOT verify the cert or have a local copy(?) that can be verified.
Final Update:
I finally worked out how to concatenate my site certificate with the chain certificate (and understand why). This site here was a great help. Also, once they are concatenated you will probably get a second error, which if you google it you will find is caused by the need for a carriage return after the first certificate and before the second (edit the resulting concatenated file with notepad). I then was able to return the post to using "verify=True" which made the warnings about no verification go away.
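The concatenation step, including the line break between certificates, can be sketched in a few lines of Python instead of editing the file by hand. The function name and file names here are hypothetical, and which file goes first depends on what consumes the bundle (for a server certificate file, the site certificate normally comes before the chain).

```python
def concatenate_pem(cert_paths, bundle_path):
    """Join PEM files into one bundle, one line break after each certificate."""
    with open(bundle_path, 'w', newline='\n') as bundle:
        for path in cert_paths:
            with open(path) as cert_file:
                # Strip stray whitespace so every certificate ends with
                # exactly one newline, whatever the input file looked like.
                pem = cert_file.read().strip()
            bundle.write(pem + '\n')
    return bundle_path


# Hypothetical usage:
#   concatenate_pem(['site.crt', 'chain.crt'], 'bundle.crt')
```

On the client side, requests also accepts a path to a PEM bundle through verify='/path/to/bundle.pem' in place of verify=True.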

difference between urllibx and requests? [duplicate]

In Python, what are the differences between the urllib, urllib2, urllib3 and requests modules? Why are there three? They seem to do the same thing...
I know it's been said already, but I'd highly recommend the requests Python package.
If you've used languages other than python, you're probably thinking urllib and urllib2 are easy to use, not much code, and highly capable, that's how I used to think. But the requests package is so unbelievably useful and short that everyone should be using it.
First, it supports a fully restful API, and is as easy as:
import requests
resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')
Regardless of whether GET / POST, you never have to encode parameters again, it simply takes a dictionary as an argument and is good to go:
userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)
Plus it even has a built in JSON decoder (again, I know json.loads() isn't a lot more to write, but this sure is convenient):
resp.json()
Or if your response data is just text, use:
resp.text
This is just the tip of the iceberg. This is the list of features from the requests site:
International Domains and URLs
Keep-Alive & Connection Pooling
Sessions with Cookie Persistence
Browser-style SSL Verification
Basic/Digest Authentication
Elegant Key/Value Cookies
Automatic Decompression
Unicode Response Bodies
Multipart File Uploads
Connection Timeouts
.netrc support
Python 2.7, 3.6—3.9
Thread-safe.
urllib2 provides some extra functionality, namely the urlopen() function can allow you to specify headers (normally you'd have had to use httplib in the past, which is far more verbose.) More importantly though, urllib2 provides the Request class, which allows for a more declarative approach to doing a request:
import urllib
from urllib2 import Request, urlopen

r = Request(url='http://www.mysite.com')
r.add_header('User-Agent', 'awesome fetcher')
r.add_data(urllib.urlencode({'foo': 'bar'}))
response = urlopen(r)
Note that urlencode() is only in urllib, not urllib2.
There are also handlers for implementing more advanced URL support in urllib2. The short answer is, unless you're working with legacy code, you probably want to use the URL opener from urllib2, but you still need to import into urllib for some of the utility functions.
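For readers on Python 3, where urllib2 no longer exists, roughly the same declarative style can be sketched with urllib.request and urllib.parse. The helper name build_request is made up for illustration; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, params, user_agent):
    """Build a form-encoded POST request without sending it."""
    # In Python 3 the body must be bytes, so encode the query string.
    data = urlencode(params).encode('ascii')
    return Request(url, data=data, headers={'User-Agent': user_agent})


req = build_request('http://www.mysite.com', {'foo': 'bar'}, 'awesome fetcher')
print(req.get_method())  # a request with a body defaults to POST
print(req.data)
```

Passing req to urllib.request.urlopen() would then perform the request.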
Bonus answer
With Google App Engine, you can use any of httplib, urllib or urllib2, but all of them are just wrappers for Google's URL Fetch API. That is, you are still subject to the same limitations such as ports, protocols, and the length of the response allowed. You can use the core of the libraries as you would expect for retrieving HTTP URLs, though.
In the Python 2 standard library there were two HTTP libraries that existed side-by-side. Despite the similar name, they were unrelated: they had a different design and a different implementation.
urllib was the original Python HTTP client, added to the standard library in Python 1.2. Earlier documentation for urllib can be found in Python 1.4.
urllib2 was a more capable HTTP client, added in Python 1.6, intended as a replacement for urllib:
urllib2 - new and improved but incompatible version of urllib (still experimental).
Earlier documentation for urllib2 can be found in Python 2.1.
The Python 3 standard library has a new urllib which is a merged/refactored/rewritten version of the older modules.
urllib3 is a third-party package (i.e., not in CPython's standard library). Despite the name, it is unrelated to the standard library packages, and there is no intention to include it in the standard library in the future.
Finally, requests internally uses urllib3, but it aims for an easier-to-use API.
urllib and urllib2 are both Python modules that do URL request related stuff but offer different functionalities.
1) urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL.
2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn't have such a function. This is one of the reasons why urllib is often used along with urllib2.
Requests - Requests is a simple, easy-to-use HTTP library written in Python.
1) Python Requests encodes the parameters automatically so you just pass them as simple arguments, unlike in the case of urllib, where you need to use the method urllib.urlencode() to encode the parameters before passing them.
2) It automatically decodes the response into Unicode.
3) Requests also has far more convenient error handling. If your authentication failed, urllib2 would raise a urllib2.URLError, while Requests would return a normal response object, as expected. All you have to do to see whether the request was successful is check the boolean response.ok.
Just to add to the existing answers, I don't see anyone mentioning that python requests is not a native library. If you are ok with adding dependencies, then requests is fine. However, if you are trying to avoid adding dependencies, urllib is a native python library that is already available to you.
One considerable difference concerns porting from Python 2 to Python 3. urllib2 does not exist in Python 3; its methods were ported to urllib.
So if you use it heavily and want to migrate to Python 3 in the future, consider using urllib.
However, the 2to3 tool will automatically do most of the work for you.
All of the answers are pretty good, but they give few details about urllib3. urllib3 is a very powerful HTTP client for Python.
urllib3
Either of the following will install it. Using pip:
pip install urllib3
Or you can get the latest code from GitHub and install it with:
$ git clone https://github.com/urllib3/urllib3.git
$ cd urllib3
$ python setup.py install
Then you are ready to go. Just import urllib3 with:
import urllib3
Instead of creating a connection directly, you'll need a PoolManager instance to make requests. This handles connection pooling and thread safety for you. There is also a ProxyManager object for routing requests through an HTTP/HTTPS proxy.
You can refer to the documentation for details.
Example usage:
>>> from urllib3 import PoolManager
>>> manager = PoolManager(10)
>>> r = manager.request('GET', 'http://google.com/')
>>> r.headers['server']
'gws'
>>> r = manager.request('GET', 'http://yahoo.com/')
>>> r.headers['server']
'YTS/1.20.0'
>>> r = manager.request('POST', 'http://google.com/mail')
>>> r = manager.request('HEAD', 'http://google.com/calendar')
>>> len(manager.pools)
2
>>> conn = manager.connection_from_host('google.com')
>>> conn.num_requests
3
As mentioned in the urllib3 documentation, urllib3 brings many critical features that are missing from the Python standard libraries:
Thread safety.
Connection pooling.
Client-side SSL/TLS verification.
File uploads with multipart encoding.
Helpers for retrying requests and dealing with HTTP redirects.
Support for gzip and deflate encoding.
Proxy support for HTTP and SOCKS.
100% test coverage.
Follow the user guide for more details.
Response content (The HTTPResponse object provides status, data,
and header attributes)
Using io Wrappers with Response content
Creating a query parameter
Advanced usage of urllib3
requests
requests uses urllib3 under the hood and makes it even simpler to make requests and retrieve data.
For one thing, keep-alive is 100% automatic, compared to urllib3 where it's not. It also has event hooks which call a callback function when an event is triggered, like receiving a response
In requests, each request type has its own function. So instead of creating a connection or a pool, you directly GET a URL.
To install requests using pip, just run:
pip install requests
Or you can install from source:
$ git clone https://github.com/psf/requests.git
$ cd requests
$ python setup.py install
Then, import requests.
You can refer to the official documentation for details.
For advanced usage such as Session objects, SSL verification, and event hooks, see the advanced usage section of the documentation.
I like the urllib.urlencode function, and it doesn't appear to exist in urllib2.
>>> urllib.urlencode({'abc':'d f', 'def': '-!2'})
'abc=d+f&def=-%212'
To get the content of a url:
try:  # Prefer requests if it is installed.
    import requests
except ImportError:
    requests = None
    try:  # Python 3 urllib
        from urllib.request import urlopen
    except ImportError:  # Python 2 urllib
        from urllib import urlopen

def get_content(url):
    if requests is not None:
        # .content is the raw body of the requests.models.Response.
        return requests.get(url).content
    # Standard library fallback: read the body from the urlopen() result.
    response = urlopen(url)
    try:
        return response.read()
    finally:
        response.close()
It's hard to write code that handles responses the same way across Python 2, Python 3 and requests, because the urlopen() functions and requests.get() return different types:
Python 3 urllib.request.urlopen() returns a http.client.HTTPResponse
Python 2 urllib.urlopen(url) returns an instance
Requests requests.get(url) returns a requests.models.Response
You should generally use urllib2, since this makes things a bit easier at times by accepting Request objects and will also raise a URLError on protocol errors. With Google App Engine though, you can't use either. You have to use the URL Fetch API that Google provides in its sandboxed Python environment.
A key point that I find missing in the above answers is that urllib returns an object of type <class http.client.HTTPResponse> whereas requests returns <class 'requests.models.Response'>.
Due to this, read() method can be used with urllib but not with requests.
P.S. : requests is already rich with so many methods that it hardly needs one more as read() ;>

