How to "keep-alive" with cookielib and httplib in python? - python

In Python, I'm using httplib because it keeps the HTTP connection alive (as opposed to urllib(2)). Now I want to use cookielib with httplib, but they seem to hate each other! (There is no way to interface them together.)
Does anyone know of a solution to this problem?

HTTP handler for urllib2 that supports keep-alive

You should consider using the Requests library instead at the earliest chance you have to refactor your code. In the meantime:
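For comparison, a requests.Session gives you both keep-alive and automatic cookie handling in one object; this sketch (header and cookie values are made up) shows the session-level state:

```python
import requests

# A Session reuses the underlying TCP connection (keep-alive) and keeps
# a cookie jar that is sent and updated automatically on every request
# made through it.
session = requests.Session()
session.headers.update({"User-Agent": "my-client"})

# Cookies received from a server land in session.cookies; the jar can
# also be manipulated directly:
session.cookies.set("sessionid", "abc123", domain="example.com")
print(session.cookies.get("sessionid"))  # abc123
```

Every session.get()/session.post() call after this sends the stored cookies and keeps the connection open, which is the httplib + cookielib combination being hacked together below.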
HACK ALERT! :)
I'd go the other suggested way, but I've written a hack (done for different reasons, though) that does create an interface between httplib and cookielib.
What I did was create a fake HTTPRequest with the minimal required set of methods, so that CookieJar would recognize it and process cookies as needed. I used that fake request object, setting all the data needed by cookielib.
Here is the code of the class:
class HTTPRequest(object):
    """
    Data container for an HTTP request (used for cookie processing).
    """
    def __init__(self, host, url, headers={}, secure=False):
        self._host = host
        self._url = url
        self._secure = secure
        self._headers = {}
        for key, value in headers.items():
            self.add_header(key, value)

    def has_header(self, name):
        return name in self._headers

    def add_header(self, key, val):
        self._headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        self._headers[key.capitalize()] = val

    def is_unverifiable(self):
        return True

    def get_type(self):
        return 'https' if self._secure else 'http'

    def get_full_url(self):
        port_str = ""
        # compare as int, not str, so the default-port check works
        port = int(self._host[1])
        if self._secure:
            if port != 443:
                port_str = ":" + str(port)
        else:
            if port != 80:
                port_str = ":" + str(port)
        return self.get_type() + '://' + self._host[0] + port_str + self._url

    def get_header(self, header_name, default=None):
        return self._headers.get(header_name, default)

    def get_host(self):
        return self._host[0]

    get_origin_req_host = get_host

    def get_headers(self):
        return self._headers
Please note, the class only has support for the HTTPS protocol (it was all I needed at the time).
The code that used this class was as follows (note another hack to make the response compatible with cookielib):
cookies = CookieJar()

headers = {
    # headers that you wish to set
}

# construct fake request
fake_request = HTTPRequest(host, request_url, headers)

# add cookies to fake request
cookies.add_cookie_header(fake_request)

# issue an httplib.HTTPConnection-based request using cookies and headers from the fake request
http_connection.request(type, request_url, body, fake_request.get_headers())
response = http_connection.getresponse()

if response.status == httplib.OK:
    # HACK: pretend we're a urllib2 response
    response.info = lambda: response.msg

    # read and store cookies from response
    cookies.extract_cookies(response, fake_request)

    # process response...

Related

Issue with singleton pattern in python

I am using a Python class with the Singleton pattern to get several benefits, e.g.:
To limit concurrent access to a shared resource.
To create a global point of access for a resource.
To create just one instance of a class throughout the lifetime of a program.
But now I am having an issue with it as well, so let me know how I can fix this issue while keeping the benefits of the Singleton pattern mentioned above. Note: I am using python zeep for SOAP calls.
Sample Code:
from zeep.plugins import HistoryPlugin
from shared import Singleton
import zeep

class MySoapClient(metaclass=Singleton):
    """MyFancySoapClient"""
    def __init__(
        self,
        user: str = "username",
        password: str = "password",
        wsdl: str = "wsdl_url_goes_here",
    ):
        self._history = HistoryPlugin()
        self._soap_client = zeep.Client(wsdl, plugins=[self._history])

    def methodA(self):
        resp = self._soap_client.ServiceA(body)
        return resp

    def methodB(self):
        resp = self._soap_client.ServiceB(body)
        return resp

    def methodC(self, request_proxy_url):
        self._soap_client.transport.session.proxies = {"https": request_proxy_url}
        resp = self._soap_client.ServiceE(body)
        return resp

    def methodD(self):
        resp = self._soap_client.ServiceC(body)
        return resp

    def methodE(self):
        resp = self._soap_client.ServiceD(body)
        return resp

client = MySoapClient()
client.methodA()
client.methodB()
client.methodC("https://example.com")  # <---- after this call it modifies the '_soap_client' attribute
client.methodD()
client.methodE()
That's why methodD() and methodE() get affected by the added self._soap_client.transport.session.proxies. I actually need to set the proxy URL only for methodC(), but due to the singleton the updated attribute value propagates to methodD() and methodE() as well, which finally makes methodD() and methodE() fail, because the SOAP calls inside those methods must not use the proxy.
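For reference, the shared.Singleton metaclass imported above is not shown in the question; a typical implementation (an assumption, not the asker's actual code) looks like this:

```python
class Singleton(type):
    """Metaclass: every instantiation of a class that uses it returns
    the same cached instance (a common implementation, assumed here)."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Config(metaclass=Singleton):  # hypothetical example class
    def __init__(self):
        self.value = 0


a = Config()
b = Config()
print(a is b)  # True: both names refer to the single cached instance
```

This caching is exactly why the proxy mutation described above leaks between methods: there is only one _soap_client, shared by every caller.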
You can rewrite your methodC as:
def methodC(self, request_proxy_url):
    original_proxies = self._soap_client.transport.session.proxies
    self._soap_client.transport.session.proxies = {"https": request_proxy_url}
    resp = self._soap_client.ServiceE(body)
    self._soap_client.transport.session.proxies = original_proxies
    return resp
(and do similarly for any other methods that need to modify the setup of the self._soap_client instance before making the request)
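A further refinement of that save-and-restore idea (a sketch with invented stand-in classes, since the real zeep client is not needed to show it): do the restore in a finally block, so the proxies come back even when the SOAP call raises:

```python
class FakeSession:            # stand-in for the requests session zeep uses
    def __init__(self):
        self.proxies = {}

class FakeTransport:          # stand-in for zeep's transport
    def __init__(self):
        self.session = FakeSession()

class FakeSoapClient:         # stand-in for zeep.Client
    def __init__(self):
        self.transport = FakeTransport()

    def ServiceE(self, body):
        return "ok"

def call_with_proxy(client, request_proxy_url, body=None):
    """Temporarily set a proxy, restoring the old value even on error."""
    original = client.transport.session.proxies
    client.transport.session.proxies = {"https": request_proxy_url}
    try:
        return client.ServiceE(body)
    finally:
        # runs whether ServiceE returned or raised
        client.transport.session.proxies = original

client = FakeSoapClient()
result = call_with_proxy(client, "https://proxy.example.com")
print(result)                            # ok
print(client.transport.session.proxies)  # {} -- restored after the call
```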
I am a bit skeptical of enforcing the singleton pattern rather than just creating a global var in a module and importing it everywhere... but that is just personal taste and has no relation to your issue.
Knowing the nature of SOAP APIs, I expect the zeep.Client instance is quite a heavy object, so it totally makes sense to avoid having multiple instances if you can.
If you use a multi-threaded platform (e.g. Python with gevent) then you have to be careful to avoid global vars that mutate their shared state, like this MySoapClient now does.
An alternative would be for it to maintain a small number of distinct zeep.Client instances, and for each of methodA, methodC, etc. to use the appropriate _soap_client instance. Something like:
class MySoapClient(metaclass=Singleton):
    """MyFancySoapClient"""
    def __init__(
        self,
        user: str = "username",
        password: str = "password",
        wsdl: str = "wsdl_url_goes_here",
        request_proxy_url: str = "default value",
    ):
        self._history = HistoryPlugin()
        self._soap_client = zeep.Client(wsdl, plugins=[self._history])
        self._soap_client_proxied = zeep.Client(wsdl, plugins=[self._history])
        self._soap_client_proxied.transport.session.proxies = {"https": request_proxy_url}

    def methodB(self):
        resp = self._soap_client.ServiceB(body)
        return resp

    def methodC(self):
        resp = self._soap_client_proxied.ServiceE(body)
        return resp

    # etc

requests.auth.AuthBase TypeError on call

From the docs at https://docs.python-requests.org/en/master/user/authentication/
I gathered that the __call__ method in my own Auth class should take an r argument.
However, when I go to use this class in requests.get(auth=MyClass), I get the error TypeError: __call__() missing 1 required positional argument: 'r'.
The code for my class can be found here https://pastebin.com/YDZ2DeaT
import requests
import time
import base64
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    """Refreshes SkyKick token, for use with all Skykick requests"""
    def __init__(self, Username: str, SubKey: str):
        self.Username = Username
        self.SubKey = SubKey
        # Initialise with no token and instant expiry
        self.Token = None
        self.TokenExpiry = time.time()
        self.Headers = {
            # Request headers
            'Content-Type': 'application/x-www-form-urlencoded',
            'Ocp-Apim-Subscription-Key': self.SubKey,
        }
        self.Body = {
            # Request body
            'grant_type': 'client_credentials',
            'scope': 'Partner'
        }

    def regenToken(self):
        # Sends request to regenerate token
        try:
            # Get key from API
            response = requests.post("https://apis.skykick.com/auth/token",
                                     headers=self.Headers,
                                     auth=(self.Username, self.SubKey),
                                     data=self.Body,
                                     ).json()
        except:
            raise Exception("Sending request failed, check connection.")
        # API errors are inconsistent, easiest way to catch them
        if "error" in response or "statusCode" in response:
            raise Exception(
                "Token requesting failed, cannot proceed with any Skykick actions, exiting.\n"
                f"Error raised was {response}")
        # Get token from response and set expiry
        self.Token = response["access_token"]
        self.TokenExpiry = time.time() + 82800

    def __call__(self, r):
        # If token expiry is now or in past, call regenToken
        if self.TokenExpiry <= time.time():
            self.regenToken()
        # Set headers and return complete requests.Request object
        r.headers["Authorization"] = f"Bearer {self.Token}"
        return r

# Initialise our token class, so it is ready to call
TokenClass = TokenAuth("test", "1234")

# Send request with class as auth method.
requests.get("https://apis.skykick.com/whoami", auth=TokenClass())
I've tried using the example code, which works, but I can't figure out why mine won't work.
python-requests version is 2.25.1
I think I know what is going on.
This line instantiates an object called TokenClass:
TokenClass = TokenAuth("test", "1234")
Then here,
requests.get("https://apis.skykick.com/whoami", auth=TokenClass())
you are calling that object like a function.
When you call an object like a function, Python looks for the __call__ method of the object.
And you are not passing in any arguments here. What you have is roughly the same as this, I think:
requests.get("https://apis.skykick.com/whoami", auth=TokenClass.__call__())
and so it complains that you are missing the r argument.
This is their example:
import requests

class MyAuth(requests.auth.AuthBase):
    def __call__(self, r):
        # Implement my authentication
        return r

url = 'https://httpbin.org/get'
requests.get(url, auth=MyAuth())
MyAuth is a class that they define, and then MyAuth() creates an instance of it that they pass in to get.
Yours is more like this
import requests

class MyAuth(requests.auth.AuthBase):
    def __call__(self, r):
        # Implement my authentication
        return r

url = 'https://httpbin.org/get'
myAuth = MyAuth()  # create an instance of the class
requests.get(url, auth=myAuth())  # call the instance and pass in the result
It could also be written like this
import requests

class MyAuth(requests.auth.AuthBase):
    def __call__(self, r):
        # Implement my authentication
        return r

url = 'https://httpbin.org/get'
requests.get(url, auth=MyAuth()())
This program will produce the same error you are getting:
import requests

class MyAuth(requests.auth.AuthBase):
    def __call__(self, r):
        # Implement my authentication
        return r

url = 'https://httpbin.org/get'
MyAuth()()
because when you put () after a class, you get an instance, and when you put () after an instance, you call the __call__ method
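The same two-step behaviour can be seen with a toy class (all names here are invented for illustration):

```python
class Shouter:
    def __call__(self, text):
        return text.upper()

instance = Shouter()       # () after the class: creates an instance
print(instance("hi"))      # () after the instance: runs __call__ -> HI
print(Shouter()("hi"))     # both at once: instantiate, then call -> HI

# Calling the instance with no argument reproduces the asker's error:
try:
    Shouter()()
except TypeError as exc:
    print(exc)  # a TypeError about the missing 'text' argument
```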

Setting tweepy useragent universally?

I've been playing with tweepy for a while, but I keep having rate-limiting issues, getting 429 errors. I know you can set the headers on individual calls like
api.get_user('twitter', headers={'User-Agent': 'MyUserAgent'})
but is there a way to set the header in one place and not have to do it on every API call?
Hacky way:
import functools

class NewAPI(object):
    def __init__(self, api):
        self.api = api

    def __getattr__(self, key):
        call = getattr(self.api, key)

        @functools.wraps(call)
        def wrapped_call(*args, **kwargs):
            headers = kwargs.pop('headers', {})
            headers['User-Agent'] = 'MyUserAgent'  # or make this a class variable/instance variable
            kwargs['headers'] = headers
            return call(*args, **kwargs)
        return wrapped_call

api = NewAPI(api)
print(api.get_user('twitter'))
Disclaimer: untested as I don't have tweepy.
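The wrapping logic itself can be checked without tweepy by pointing the proxy at any stand-in object (everything below is invented for the demonstration):

```python
import functools

class HeaderInjector:
    """Proxy that adds a default 'headers' kwarg to every method call."""
    def __init__(self, wrapped, user_agent):
        self._wrapped = wrapped
        self._user_agent = user_agent

    def __getattr__(self, name):
        # only called for attributes not found on the proxy itself
        attr = getattr(self._wrapped, name)

        @functools.wraps(attr)
        def wrapped_call(*args, **kwargs):
            headers = kwargs.pop('headers', {})
            headers['User-Agent'] = self._user_agent
            kwargs['headers'] = headers
            return attr(*args, **kwargs)
        return wrapped_call

class FakeAPI:                 # stand-in for tweepy.API
    def get_user(self, name, headers=None):
        return (name, headers)

api = HeaderInjector(FakeAPI(), 'MyUserAgent')
print(api.get_user('twitter'))
# ('twitter', {'User-Agent': 'MyUserAgent'})
```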

How to test twisted web resource with trial?

I'm developing a twisted.web server - it consists of some resources that, apart from rendering stuff, use adbapi to fetch some data and write some data to a postgresql database. I'm trying to figure out how to write a trial unit test that would test resource rendering without using the network (in other words: one that would initialize a resource, produce a dummy request for it, etc.).
Let's assume the View resource is a simple leaf that in render_GET returns NOT_DONE_YET and tinkers with adbapi to produce simple text as a result. Now, I've written this useless code and I can't work out how to make it actually initialize the resource and produce some sensible response:
from twisted.trial import unittest
from myserv.views import View
from twisted.web.test.test_web import DummyRequest

class ExistingView(unittest.TestCase):
    def test_rendering(self):
        slug = "hello_world"
        view = View(slug)
        request = DummyRequest([''])
        output = view.render_GET(request)
        self.assertEqual(request.responseCode, 200)
The output is... 1. I've also tried this approach: output = request.render(view), but got the same output = 1. Why? I'd be very grateful for an example of how to write such a unit test!
Here's a function that will render a request and convert the result into a Deferred that fires when rendering is complete:
def _render(resource, request):
    result = resource.render(request)
    if isinstance(result, str):
        request.write(result)
        request.finish()
        return succeed(None)
    elif result is server.NOT_DONE_YET:
        if request.finished:
            return succeed(None)
        else:
            return request.notifyFinish()
    else:
        raise ValueError("Unexpected return value: %r" % (result,))
It's actually used in Twisted Web's test suite, but it's private because it has no unit tests itself. ;)
You can use it to write a test like this:
def test_rendering(self):
    slug = "hello_world"
    view = View(slug)
    request = DummyRequest([''])
    d = _render(view, request)
    def rendered(ignored):
        self.assertEquals(request.responseCode, 200)
        self.assertEquals("".join(request.written), "...")
        ...
    d.addCallback(rendered)
    return d
Here is a DummierRequest class that fixes almost all my problems. The only thing left is that it does not set any response code! Why?
from twisted.web.test.test_web import DummyRequest
from twisted.web import server
from twisted.internet.defer import succeed
from twisted.internet import interfaces, reactor, protocol, address
from twisted.web.http_headers import _DictHeaders, Headers

class DummierRequest(DummyRequest):
    def __init__(self, postpath, session=None):
        DummyRequest.__init__(self, postpath, session)
        self.notifications = []
        self.received_cookies = {}
        self.requestHeaders = Headers()
        self.responseHeaders = Headers()
        self.cookies = []  # outgoing cookies

    def setHost(self, host, port, ssl=0):
        self._forceSSL = ssl
        self.requestHeaders.setRawHeaders("host", [host])
        self.host = address.IPv4Address("TCP", host, port)

    def addCookie(self, k, v, expires=None, domain=None, path=None, max_age=None, comment=None, secure=None):
        """
        Set an outgoing HTTP cookie.

        In general, you should consider using sessions instead of cookies, see
        L{twisted.web.server.Request.getSession} and the
        L{twisted.web.server.Session} class for details.
        """
        cookie = '%s=%s' % (k, v)
        if expires is not None:
            cookie = cookie + "; Expires=%s" % expires
        if domain is not None:
            cookie = cookie + "; Domain=%s" % domain
        if path is not None:
            cookie = cookie + "; Path=%s" % path
        if max_age is not None:
            cookie = cookie + "; Max-Age=%s" % max_age
        if comment is not None:
            cookie = cookie + "; Comment=%s" % comment
        if secure:
            cookie = cookie + "; Secure"
        self.cookies.append(cookie)

    def getCookie(self, key):
        """
        Get a cookie that was sent from the network.
        """
        return self.received_cookies.get(key)

    def getClientIP(self):
        """
        Return the IPv4 address of the client which made this request, if there
        is one, otherwise C{None}.
        """
        return "192.168.1.199"

using cookies with twisted.web.client

I'm trying to make a web client application using Twisted, but I'm having some trouble with cookies. Does anyone have an example I can look at?
While it's true that getPage doesn't easily allow direct access to the request or response headers (just one example of how getPage isn't a super awesome API), cookies are actually supported.
cookies = {'cookie_name': 'value_to_send'}
d = getPage(url, cookies=cookies)

def cbPage(result):
    print 'Look at my cookies:', cookies

d.addCallback(cbPage)
Any cookies in the dictionary when it is passed to getPage will be sent. Any new cookies the server sets in response to the request will be added to the dictionary.
You might have missed this feature when looking at getPage because the getPage signature doesn't have a cookies parameter anywhere in it! However, it does take **kwargs, and this is how cookies is supported: any extra arguments passed to getPage that it doesn't know about itself, it passes on to HTTPClientFactory.__init__. Take a look at that method's signature to see all of the things you can pass to getPage.
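That forwarding mechanism is plain **kwargs passing, which can be seen with stand-in functions (the names below are invented, not Twisted's):

```python
def factory(url, method='GET', cookies=None):
    # stand-in for HTTPClientFactory.__init__'s keyword parameters
    return {'url': url, 'method': method, 'cookies': cookies}

def get_page(url, **kwargs):
    # stand-in for getPage: it has no explicit 'cookies' parameter, but
    # every unknown keyword argument is forwarded to the factory
    return factory(url, **kwargs)

result = get_page('http://example.com', cookies={'some': 'cookie'})
print(result)
# {'url': 'http://example.com', 'method': 'GET', 'cookies': {'some': 'cookie'}}
```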
Turns out there is no easy way, as far as I can tell.
The headers are stored in twisted.web.client.HTTPClientFactory but are not made available by twisted.web.client.getPage(), which is the function designed for pulling back a web page. I ended up rewriting the function:
from twisted.web import client

def getPage(url, contextFactory=None, *args, **kwargs):
    fact = client._makeGetterFactory(
        url,
        client.HTTPClientFactory,
        contextFactory=contextFactory,
        *args, **kwargs)
    return fact.deferred.addCallback(lambda data: (data, fact.response_headers))
from twisted.internet import reactor
from twisted.web import client

def getPage(url, contextFactory=None, *args, **kwargs):
    return client._makeGetterFactory(
        url,
        CustomHTTPClientFactory,
        contextFactory=contextFactory,
        *args, **kwargs).deferred

class CustomHTTPClientFactory(client.HTTPClientFactory):
    def __init__(self, url, method='GET', postdata=None, headers=None,
                 agent="Twisted PageGetter", timeout=0, cookies=None,
                 followRedirect=1, redirectLimit=20):
        client.HTTPClientFactory.__init__(self, url, method, postdata,
                                          headers, agent, timeout, cookies,
                                          followRedirect, redirectLimit)

    def page(self, page):
        if self.waiting:
            self.waiting = 0
            res = {}
            res['page'] = page
            res['headers'] = self.response_headers
            res['cookies'] = self.cookies
            self.deferred.callback(res)

if __name__ == '__main__':
    def cback(result):
        for k in result:
            print k, '==>', result[k]
        reactor.stop()

    def eback(error):
        print error.getTraceback()
        reactor.stop()

    d = getPage('http://example.com', agent='example web client',
                cookies={'some': 'cookie'})
    d.addCallback(cback)
    d.addErrback(eback)
    reactor.run()
