How can I use Twisted's ThrottlingFactory with their web client? - python

Problem
I need to execute HTTP requests and simulate high latency at the same time. I have encountered the Twisted package in Python which includes both an HTTP client and a ThrottlingFactory. The issue I am encountering is that the documentation is not clear for a newcomer and I am having trouble understanding how I could utilize the ThrottlingFactory within API calls using the HTTP client.
I am currently utilizing the following example code to test things out. Nothing has worked so far.
from sys import argv
from pprint import pformat
from twisted.internet.task import react
from twisted.web.client import Agent, readBody
from twisted.web.http_headers import Headers
def cbRequest(response):
print("Response version:", response.version)
print("Response code:", response.code)
print("Response phrase:", response.phrase)
print("Response headers:")
print(pformat(list(response.headers.getAllRawHeaders())))
d = readBody(response)
d.addCallback(cbBody)
return d
def cbBody(body):
print("Response body:")
print(body)
def main(reactor, url=b"http://httpbin.org/get"):
agent = Agent(reactor)
d = agent.request(
b"GET", url, Headers({"User-Agent": ["Twisted Web Client Example"]}), None
)
d.addCallback(cbRequest)
return d
react(main, argv[1:])
How can I use the ThrottlingFactory in this example?

You're right - this composition is awkward, and it should be better documented, and arguably have a nicer API!
Still, you can accomplish this by putting a proxy between your application and the reactor.
It would look like this:
from sys import argv
from pprint import pformat
from dataclasses import dataclass
from twisted.internet.task import react
from twisted.internet.interfaces import IReactorTCP
from twisted.web.client import Agent, readBody
from twisted.web.http_headers import Headers
from twisted.protocols.policies import ThrottlingFactory
def cbRequest(response):
print("Response version:", response.version)
print("Response code:", response.code)
print("Response phrase:", response.phrase)
print("Response headers:")
print(pformat(list(response.headers.getAllRawHeaders())))
d = readBody(response)
d.addCallback(cbBody)
return d
def cbBody(body):
print("Response body:")
print(len(body))
#dataclass
class SlowReactorProxy:
original: IReactorTCP
def __getattr__(self, name):
return getattr(self.original, name)
def connectTCP(self, host, port, factory, timeout=30, bindAddress=None):
return self.original.connectTCP(
host, port, ThrottlingFactory(factory, readLimit=0.1), timeout, bindAddress
)
def main(reactor, url=b"http://httpbin.org/bytes/10485760000"):
agent = Agent(SlowReactorProxy(reactor))
d = agent.request(
b"GET", url, Headers({"User-Agent": ["Twisted Web Client Example"]}), None
)
d.addCallback(cbRequest)
return d
react(main, argv[1:])
However, unfortunately, ThrottlingFactory's algorithm for throttling traffic is quite primitive; there's just a timer that fires once per second and pauses everyone if too much data has been consumed. This means that you will be reading at maximum speed with zero throttling for an entire second at a time, then, having exhausted that quota, pause for a commensurately long period of time. On my (gigabit) network, I cannot get a large enough entity-body out of httpbin (the max size seems to be 102400) in order to be producing data for longer than a second, so no throttling will ever take place in this scenario.
Hopefully this will help you accomplish your task, but I'd encourage you to file a bug on twisted in order to make the composition of HTTP and throttling somewhat more graceful and responsive.

Related

requests - Gateway Timeout

this is a test script to request data from Rovi API, provided by the API itself.
test.py
import requests
import time
import hashlib
import urllib
class AllMusicGuide(object):
api_url = 'http://api.rovicorp.com/data/v1.1/descriptor/musicmoods'
key = 'my key'
secret = 'secret'
def _sig(self):
timestamp = int(time.time())
m = hashlib.md5()
m.update(self.key)
m.update(self.secret)
m.update(str(timestamp))
return m.hexdigest()
def get(self, resource, params=None):
"""Take a dict of params, and return what we get from the api"""
if not params:
params = {}
params = urllib.urlencode(params)
sig = self._sig()
url = "%s/%s?apikey=%s&sig=%s&%s" % (self.api_url, resource, self.key, sig, params)
resp = requests.get(url)
if resp.status_code != 200:
# THROW APPROPRIATE ERROR
print ('unknown err')
return resp.content
from another script I import the module:
from roviclient.test import AllMusicGuide
and create an instance of the class inside a mood function:
def mood():
test = AllMusicGuide()
print (test.get('[moodids=moodids]'))
according to documentation, the following is the syntax for requests:
descriptor/musicmoods?apikey=apikey&sig=sig [&moodids=moodids] [&format=format] [&country=country] [&language=language]
but running the script I get the following error:
unknown err
<h1>Gateway Timeout</h1>:
what is wrong?
"504, try once more. 502, it went through."
Your code is fine, this is a network issue. "Gateway Timeout" is a 504. The intermediate host handling your request was unable to complete it. It made its own request to another server on your behalf in order to handle yours, but this request took too long and timed out. Usually this is because of network congestion in the backend; if you try a few more times, does it sometimes work?
In any case, I would talk to your network administrator. There could be any number of reasons for this and they should be able to help fix it for you.

Python requests module multithreading

Is there a possible way to speed up my code using multiprocessing interface? The problem is that this interface uses map function, which works only with 1 function. But my code has 3 functions. I tried to combine my functions into one, but didn't get success. My script reads the URL of site from file and performs 3 functions over it. For Loop makes it very slow, because I got a lot of URLs
import requests
def Login(url): #Log in
payload = {
'UserName_Text' : 'user',
'UserPW_Password' : 'pass',
'submit_ButtonOK' : 'return buttonClick;'
}
try:
p = session.post(url+'/login.jsp', data = payload, timeout=10)
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
print "site is DOWN! :", url[8:]
session.cookies.clear()
session.close()
else:
print 'OK: ', p.url
def Timer(url): #Measure request time
try:
timer = requests.get(url+'/login.jsp').elapsed.total_seconds()
except (requests.exceptions.ConnectionError):
print 'Request time: None'
print '-----------------------------------------------------------------'
else:
print 'Request time:', round(timer, 2), 'sec'
def Logout(url): # Log out
try:
logout = requests.get(url+'/logout.jsp', params={'submit_ButtonOK' : 'true'}, cookies = session.cookies)
except(requests.exceptions.ConnectionError):
pass
else:
print 'Logout '#, logout.url
print '-----------------------------------------------------------------'
session.cookies.clear()
session.close()
for line in open('text.txt').read().splitlines():
session = requests.session()
Login(line)
Timer(line)
Logout(line)
Yes, you can use multiprocessing.
from multiprocessing import Pool
def f(line):
session = requests.session()
Login(session, line)
Timer(session, line)
Logout(session, line)
if __name__ == '__main__':
urls = open('text.txt').read().splitlines()
p = Pool(5)
print(p.map(f, urls))
The requests session cannot be global and shared between workers, every worker should use its own session.
You write that you already "tried to combine my functions into one, but didn't get success". What exactly didn't work?
There are many ways to accomplish your task, but multiprocessing is not needed at that level, it will just add complexity, imho.
Take a look at gevent, greenlets and monkey patching, instead!
Once your code is ready, you can wrap a main function into a gevent loop, and if you applied the monkey patches, the gevent framework will run N jobs concurrently (you can create a jobs pool, set the limits of concurrency, etc.)
This example should help:
#!/usr/bin/python
# Copyright (c) 2009 Denis Bilenko. See LICENSE for details.
"""Spawn multiple workers and wait for them to complete"""
from __future__ import print_function
import sys
urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']
import gevent
from gevent import monkey
# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()
if sys.version_info[0] == 3:
from urllib.request import urlopen
else:
from urllib2 import urlopen
def print_head(url):
print('Starting %s' % url)
data = urlopen(url).read()
print('%s: %s bytes: %r' % (url, len(data), data[:50]))
jobs = [gevent.spawn(print_head, url) for url in urls]
gevent.wait(jobs)
You can find more here and in the Github repository, from where this example comes from
P.S.
Greenlets will works with requests as well, you don't need to change your code.

Twisted: Advise on using txredisapi library required

Below I provided a code example which simply respond to HTTP GET request with the data from Redis:
Request: http://example.com:8888/?auth=zefDWDd5mS7mcbfoDbDDf4eVAKb1nlDmzLwcmhDOeUc
Response: get: u'"True"'
The purpose of this code is to serve as a REST server (that's why I'm using lazyConnectionPool) responding to the requests, and using data from Redis (read/ write).
What I need to do:
Run multiple requests to Redis inside render_GET of the IndexHandler (like GET, HMGET, SET, etc)
Run multiple requests in a transaction inside render_GET of the IndexHandler
I've tried multiple ways to do that (including examples from the txredisapi library), but due to lack of experience failed to do that. Could you please advise on questions 1) and 2).
Thanks in advance.
import txredisapi as redis
from twisted.application import internet
from twisted.application import service
from twisted.web import server
from twisted.web.resource import Resource
class Root(Resource):
isLeaf = False
class BaseHandler(object):
isLeaf = True
def __init__(self, db):
self.db = db
Resource.__init__(self)
class IndexHandler(BaseHandler, Resource):
def _success(self, value, request, message):
request.write(message % repr(value))
request.finish()
def _failure(self, error, request, message):
request.write(message % str(error))
request.finish()
def render_GET(self, request):
try:
auth = request.args["auth"][0]
except:
request.setResponseCode(404, "not found")
return ""
d = self.db.hget(auth, 'user_add')
d.addCallback(self._success, request, "get: %s\n")
d.addErrback(self._failure, request, "get failed: %s\n")
return server.NOT_DONE_YET
# Redis connection parameters
REDIS_HOST = '10.10.0.110'
REDIS_PORT = 6379
REDIS_DB = 1
REDIS_POOL_SIZE = 1
REDIS_RECONNECT = True
# redis connection
_db = redis.lazyConnectionPool(REDIS_HOST, REDIS_PORT, REDIS_DB, REDIS_POOL_SIZE)
# http resources
root = Root()
root.putChild("", IndexHandler(_db))
application = service.Application("web")
srv = internet.TCPServer(8888, server.Site(root), interface="127.0.0.1")
srv.setServiceParent(application)
Regarding first question:
There is a few ways to generalize to making multiple database requests in a single HTTP request.
For example you can make multiple requests:
d1 = self.db.hget(auth, 'user_add')
d2 = self.db.get('foo')
Then you can get a callback to trigger when all of these simultaneous requests are finished (see twisted.internet.defer.DeferredList).
Or you can use inlineCallbacks if you need sequential requests. For example:
#inlineCallbacks
def do_redis(self):
foo = yield self.db.get('somekey')
bar = yield self.db.hget(foo, 'bar') # Get 'bar' field of hash foo
But you will need to read more about combining inlineCallbacks with twisted.web (there are SO questions on that topic you should look up).
Regarding question 2:
Transactions are really ugly to do without using inlineCallbacks. There is an example at txredisapi homepage that shows it using inlineCallbacks.

Nested web service calls with tornado (async?)

I am implementing a SOAP web service using tornado (and the third party tornadows module). One of the operations in my service needs to call another so I have the chain:
External request in (via SOAPUI) to operation A
Internal request (via requests module) in to operation B
Internal response from operation B
External response from operation A
Because it is all running in one service it is being blocked somewhere though. I'm not familiar with tornado's async functionality.
There is only one request handling method (post) because everything comes in on the single url and then the specific operation (method doing processing) is called based on the SOAPAction request header value. I have decorated my post method with #tornado.web.asynchronous and called self.finish() at the end but no dice.
Can tornado handle this scenario and if so how can I implement it?
EDIT (added code):
class SoapHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
def post(self):
""" Method post() to process of requests and responses SOAP messages """
try:
self._request = self._parseSoap(self.request.body)
soapaction = self.request.headers['SOAPAction'].replace('"','')
self.set_header('Content-Type','text/xml')
for operations in dir(self):
operation = getattr(self,operations)
method = ''
if callable(operation) and hasattr(operation,'_is_operation'):
num_methods = self._countOperations()
if hasattr(operation,'_operation') and soapaction.endswith(getattr(operation,'_operation')) and num_methods > 1:
method = getattr(operation,'_operation')
self._response = self._executeOperation(operation,method=method)
break
elif num_methods == 1:
self._response = self._executeOperation(operation,method='')
break
soapmsg = self._response.getSoap().toprettyxml()
self.write(soapmsg)
self.finish()
except Exception as detail:
#traceback.print_exc(file=sys.stdout)
wsdl_nameservice = self.request.uri.replace('/','').replace('?wsdl','').replace('?WSDL','')
fault = soapfault('Error in web service : {fault}'.format(fault=detail), wsdl_nameservice)
self.write(fault.getSoap().toxml())
self.finish()
This is the post method from the request handler. It's from the web services module I'm using (so not my code) but I added the async decorator and self.finish(). All it basically does is call the correct operation (as dictated in the SOAPAction of the request).
class CountryService(soaphandler.SoapHandler):
#webservice(_params=GetCurrencyRequest, _returns=GetCurrencyResponse)
def get_currency(self, input):
result = db_query(input.country, 'currency')
get_currency_response = GetCurrencyResponse()
get_currency_response.currency = result
headers = None
return headers, get_currency_response
#webservice(_params=GetTempRequest, _returns=GetTempResponse)
def get_temp(self, input):
get_temp_response = GetTempResponse()
curr = self.make_curr_request(input.country)
get_temp_response.temp = curr
headers = None
return headers, get_temp_response
def make_curr_request(self, country):
soap_request = """<soapenv:Envelope xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/' xmlns:coun='CountryService'>
<soapenv:Header/>
<soapenv:Body>
<coun:GetCurrencyRequestget_currency>
<country>{0}</country>
</coun:GetCurrencyRequestget_currency>
</soapenv:Body>
</soapenv:Envelope>""".format(country)
headers = {'Content-Type': 'text/xml;charset=UTF-8', 'SOAPAction': '"http://localhost:8080/CountryService/get_currency"'}
r = requests.post('http://localhost:8080/CountryService', data=soap_request, headers=headers)
try:
tree = etree.fromstring(r.content)
currency = tree.xpath('//currency')
message = currency[0].text
except:
message = "Failure"
return message
These are two of the operations of the web service (get_currency & get_temp). So SOAPUI hits get_temp, which makes a SOAP request to get_currency (via make_curr_request and the requests module). Then the results should just chain back and be sent back to SOAPUI.
The actual operation of the service makes no sense (returning the currency when asked for the temperature) but i'm just trying to get the functionality working and these are the operations I have.
I don't think that your soap module, or requests is asyncronous.
I believe adding the #asyncronous decorator is only half the battle. Right now you aren't making any async requests inside of your function (every request is blocking, which ties up the server until your method finishes)
You can switch this up by using tornados AsynHttpClient. This can be used pretty much as an exact replacement for requests. From the docoumentation example:
class MainHandler(tornado.web.RequestHandler):
#tornado.web.asynchronous
def get(self):
http = tornado.httpclient.AsyncHTTPClient()
http.fetch("http://friendfeed-api.com/v2/feed/bret",
callback=self.on_response)
def on_response(self, response):
if response.error: raise tornado.web.HTTPError(500)
json = tornado.escape.json_decode(response.body)
self.write("Fetched " + str(len(json["entries"])) + " entries "
"from the FriendFeed API")
self.finish()
Their method is decorated with async AND they are making asyn http requests. This is where the flow gets a little strange. When you use the AsyncHttpClient it doesn't lock up the event loop (PLease I just started using tornado this week, take it easy if all of my terminology isn't correct yet). This allows the server to freely processs incoming requests. When your asynchttp request is finished the callback method will be executed, in this case on_response.
Here you can replace requests with the tornado asynchttp client realtively easily. For your soap service, though, things might be more complicated. You could make a local webserivce around your soap client and make async requests to it using the tornado asyn http client???
This will create some complex callback logic which can be fixed using the gen decorator
This issue was fixed since yesterday.
Pull request:
https://github.com/rancavil/tornado-webservices/pull/23
Example: here a simple webservice that doesn't take arguments and returns the version.
Notice you should:
Method declaration: decorate the method with #gen.coroutine
Returning results: use raise gen.Return(data)
Code:
from tornado import gen
from tornadows.soaphandler import SoapHandler
...
class Example(SoapHandler):
#gen.coroutine
#webservice(_params=None, _returns=Version)
def Version(self):
_version = Version()
# async stuff here, let's suppose you ask other rest service or resource for the version details.
# ...
# returns the result.
raise gen.Return(_version)
Cheers!

How do I unit test a module that relies on urllib2?

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (twitter, flickr, youtube, etc.) with urllib2. Here's some pseudo-code for it:
params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
#check headers, content-length, etc...
#parse the response XML with lxml...
My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception).
Just saving the XML from the response body isn't ideal, because my code uses the header information too. It's designed to act on a response object.
And of course, relying on an external source for data in a unit test is a horrible idea.
So how do I write a unit test for this?
urllib2 has a functions called build_opener() and install_opener() which you should use to mock the behaviour of urlopen()
import urllib2
from StringIO import StringIO
def mock_response(req):
if req.get_full_url() == "http://example.com":
resp = urllib2.addinfourl(StringIO("mock file"), "mock message", req.get_full_url())
resp.code = 200
resp.msg = "OK"
return resp
class MyHTTPHandler(urllib2.HTTPHandler):
def http_open(self, req):
print "mock opener"
return mock_response(req)
my_opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(my_opener)
response=urllib2.urlopen("http://example.com")
print response.read()
print response.code
print response.msg
It would be best if you could write a mock urlopen (and possibly Request) which provides the minimum required interface to behave like urllib2's version. You'd then need to have your function/method which uses it able to accept this mock urlopen somehow, and use urllib2.urlopen otherwise.
This is a fair amount of work, but worthwhile. Remember that python is very friendly to ducktyping, so you just need to provide some semblance of the response object's properties to mock it.
For example:
class MockResponse(object):
def __init__(self, resp_data, code=200, msg='OK'):
self.resp_data = resp_data
self.code = code
self.msg = msg
self.headers = {'content-type': 'text/xml; charset=utf-8'}
def read(self):
return self.resp_data
def getcode(self):
return self.code
# Define other members and properties you want
def mock_urlopen(request):
return MockResponse(r'<xml document>')
Granted, some of these are difficult to mock, because for example I believe the normal "headers" is an HTTPMessage which implements fun stuff like case-insensitive header names. But, you might be able to simply construct an HTTPMessage with your response data.
Build a separate class or module responsible for communicating with your external feeds.
Make this class able to be a test double. You're using python, so you're pretty golden there; if you were using C#, I'd suggest either in interface or virtual methods.
In your unit test, insert a test double of the external feed class. Test that your code uses the class correctly, assuming that the class does the work of communicating with your external resources correctly. Have your test double return fake data rather than live data; test various combinations of the data and of course the possible exceptions urllib2 could throw.
Aand... that's it.
You can't effectively automate unit tests that rely on external sources, so you're best off not doing it. Run an occasional integration test on your communication module, but don't include those tests as part of your automated tests.
Edit:
Just a note on the difference between my answer and #Crast's answer. Both are essentially correct, but they involve different approaches. In Crast's approach, you use a test double on the library itself. In my approach, you abstract the use of the library away into a separate module and test double that module.
Which approach you use is entirely subjective; there's no "correct" answer there. I prefer my approach because it allows me to build more modular, flexible code, something I value. But it comes at a cost in terms of additional code to write, something that may not be valued in many agile situations.
You can use pymox to mock the behavior of anything and everything in the urllib2 (or any other) package. It's 2010, you shouldn't be writing your own mock classes.
I think the easiest thing to do is to actually create a simple web server in your unit test. When you start the test, create a new thread that listens on some arbitrary port and when a client connects just returns a known set of headers and XML, then terminates.
I can elaborate if you need more info.
Here's some code:
import threading, SocketServer, time
# a request handler
class SimpleRequestHandler(SocketServer.BaseRequestHandler):
def handle(self):
data = self.request.recv(102400) # token receive
senddata = file(self.server.datafile).read() # read data from unit test file
self.request.send(senddata)
time.sleep(0.1) # make sure it finishes receiving request before closing
self.request.close()
def serve_data(datafile):
server = SocketServer.TCPServer(('127.0.0.1', 12345), SimpleRequestHandler)
server.datafile = datafile
http_server_thread = threading.Thread(target=server.handle_request())
To run your unit test, call serve_data() then call your code that requests a URL that looks like http://localhost:12345/anythingyouwant.
Why not just mock a website that returns the response you expect? then start the server in a thread in setup and kill it in the teardown. I ended up doing this for testing code that would send email by mocking an smtp server and it works great. Surely something more trivial could be done for http...
from smtpd import SMTPServer
from time import sleep
import asyncore
SMTP_PORT = 6544
class MockSMTPServer(SMTPServer):
def __init__(self, localaddr, remoteaddr, cb = None):
self.cb = cb
SMTPServer.__init__(self, localaddr, remoteaddr)
def process_message(self, peer, mailfrom, rcpttos, data):
print (peer, mailfrom, rcpttos, data)
if self.cb:
self.cb(peer, mailfrom, rcpttos, data)
self.close()
def start_smtp(cb, port=SMTP_PORT):
def smtp_thread():
_smtp = MockSMTPServer(("127.0.0.1", port), (None, 0), cb)
asyncore.loop()
return Thread(None, smtp_thread)
def test_stuff():
#.......snip noise
email_result = None
def email_back(*args):
email_result = args
t = start_smtp(email_back)
t.start()
sleep(1)
res.form["email"]= self.admin_email
res = res.form.submit()
assert res.status_int == 302,"should've redirected"
sleep(1)
assert email_result is not None, "didn't get an email"
Trying to improve a bit on #john-la-rooy answer, I've made a small class allowing simple mocking for unit tests
Should work with python 2 and 3
try:
import urllib.request as urllib
except ImportError:
import urllib2 as urllib
from io import BytesIO
class MockHTTPHandler(urllib.HTTPHandler):
def mock_response(self, req):
url = req.get_full_url()
print("incomming request:", url)
if url.endswith('.json'):
resdata = b'[{"hello": "world"}]'
headers = {'Content-Type': 'application/json'}
resp = urllib.addinfourl(BytesIO(resdata), header, url, 200)
resp.msg = "OK"
return resp
raise RuntimeError('Unhandled URL', url)
http_open = mock_response
#classmethod
def install(cls):
previous = urllib._opener
urllib.install_opener(urllib.build_opener(cls))
return previous
#classmethod
def remove(cls, previous=None):
urllib.install_opener(previous)
Used like this:
class TestOther(unittest.TestCase):
def setUp(self):
previous = MockHTTPHandler.install()
self.addCleanup(MockHTTPHandler.remove, previous)

Categories

Resources