I have several Flask servers which handle POST requests and return some values. I need to run an infinite process that sends requests to all servers, waits for the first response, updates internal state based on that response, and sends a new request to that server again. Here is pseudocode for this:
obj = SomeObject()
requests = [obj.make_request() for _ in range(10)]
responses = grequests.imap(requests, size=10)
for response in responses:
    obj.update_state(response)
    requests.append(obj.make_request())
What is the proper implementation of such logic in Python?
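A minimal sketch of one way to implement this with plain gevent (the library grequests is built on): give each server its own greenlet that loops forever, so whichever server responds first immediately receives its next request. SomeObject is the placeholder from the question, and server_urls is an assumed list of the Flask endpoints:

from gevent import monkey
monkey.patch_all()  # must happen before importing requests

import gevent
import requests

obj = SomeObject()  # placeholder for your state object

def worker(url):
    while True:
        payload = obj.make_request()      # build the next request body
        response = requests.post(url, json=payload)
        obj.update_state(response)        # runs as soon as this server answers

# one greenlet per server; greenlets are cooperative, so the two
# calls into obj are never preempted mid-update
gevent.joinall([gevent.spawn(worker, url) for url in server_urls])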
Good day,
I am currently trying to figure out a way to make non-blocking requests inside a simple mitmproxy script, but the documentation isn't clear to me at first glance.
I think it's probably easiest if I show my current code and describe my issue below:
from copy import copy
from mitmproxy import http

def request(flow: http.HTTPFlow):
    headers = copy(flow.request.headers)
    headers.update({"Authorization": "<removed>", "Requested-URI": flow.request.pretty_url})
    req = http.HTTPRequest(
        first_line_format="origin_form",
        scheme=flow.request.scheme,
        port=443,
        path="/",
        http_version=flow.request.http_version,
        content=flow.request.content,
        host="my.api.xyz",
        headers=headers,
        method=flow.request.method
    )
    print(req.get_text())
    flow.response = http.HTTPResponse.make(
        200, req.content,
    )
Basically I would like to intercept any HTTP(S) request made and send a non-blocking request to an API endpoint at https://my.api.xyz/, which should take all original headers and return a PNG screenshot of the originally requested URL.
However, the code above produces empty content, and the print returns nothing either.
My issue seems to be related to "mitmproxy http get request in script" and "Resubmitting a request from a response in mitmproxy", but I still couldn't figure out a proper way of sending requests inside mitmproxy.
The following piece of code probably does what you are looking for:
from copy import copy
from mitmproxy import http
from mitmproxy import ctx
from mitmproxy.addons import clientplayback

def request(flow: http.HTTPFlow):
    ctx.log.info("Inside request")
    if hasattr(flow.request, 'is_custom'):
        return
    headers = copy(flow.request.headers)
    headers.update({"Authorization": "<removed>", "Requested-URI": flow.request.pretty_url})
    req = http.HTTPRequest(
        first_line_format="origin_form",
        scheme='http',
        port=8000,
        path="/",
        http_version=flow.request.http_version,
        content=flow.request.content,
        host="localhost",
        headers=headers,
        method=flow.request.method
    )
    req.is_custom = True
    playback = ctx.master.addons.get('clientplayback')
    f = flow.copy()
    f.request = req
    playback.start_replay([f])
It uses the clientplayback addon to send out the request. When the new request is sent, it triggers another request event, which would cause an infinite loop. That is the reason for the is_custom attribute I added: if the request that generated the event is one we created ourselves, we return early instead of creating yet another request.
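If you save this as a script, you can load it with mitmproxy -s script.py so that the request hook runs on every intercepted flow.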
I'm trying to fetch a resource from a URL inside a route on a web server without blocking it, since fetching sometimes takes 11+ seconds.
I switched from Flask to aiohttp for this.
import urllib.request

async def process(request):
    data = await request.json()
    req = urllib.request.Request(
        data["resource_url"],  # the URL from the posted JSON
        data=None,
        headers=hdrs
    )
    # Do processing on the resource
But I'm not sure how to make the call, and would it allow other calls to be made to this route while the resource is being fetched?
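A minimal sketch of one way to do this, using aiohttp's own client so the fetch yields to the event loop instead of blocking it. data["resource_url"] and hdrs are taken from the question's code; the response handling at the end is an assumption:

import aiohttp
from aiohttp import web

async def process(request):
    data = await request.json()
    async with aiohttp.ClientSession() as session:
        # while this await is pending, the event loop is free to
        # serve other requests to this route
        async with session.get(data["resource_url"], headers=hdrs) as resp:
            body = await resp.read()
    # Do processing on the resource
    return web.Response(body=body)

A blocking client such as urllib.request would hold up the whole event loop for those 11 seconds, which is exactly what the await here avoids.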
I was trying to develop a web application using Flask; below is my code:
import json

from flask import Flask, Response
from sample import APIAccessor

app = Flask(__name__)

# API
@app.route('/test/triggerSecCall', methods=['GET'])
def triggerMain():
    resp = APIAccessor().trigger2()
    return Response(json.dumps(resp), mimetype='application/json')

@app.route('/test/seccall', methods=['GET'])
def triggerSub():
    resp = {'data':'called second method'}
    return Response(json.dumps(resp), mimetype='application/json')
And my trigger method contains the following code,
def trigger2(self):
    url = 'http://127.0.0.1:5000/test/seccall'
    response = requests.get(url)
    response.raise_for_status()
    responseJson = response.json()
    if self.track:
        print('Response:::%s' % str(responseJson))
    return responseJson
When I hit http://127.0.0.1:5000/test/seccall, I get the expected output. When I hit /test/triggerSecCall, the server stops responding and the request waits forever.
At this stage I am not able to access any APIs from any other REST client. When I force-stop the server (Ctrl+C), I get the response in the second REST client.
Why is Flask not able to serve the internal service call?
I guess you are using the single-threaded development server and not a WSGI setup for production.
Since the server has only one thread, it can handle one request at a time. The first request executes until it reaches the requests.get(...), which opens a second request that cannot be handled until the first one completes: a deadlock.
The best solution would be to just call triggerSub() to get the result instead of using an HTTP request.
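A minimal sketch of that suggestion, assuming the same app, Response, and json as in the question, with the shared logic pulled out into a plain helper both routes can call:

def seccall_data():
    # the logic formerly behind /test/seccall
    return {'data': 'called second method'}

@app.route('/test/seccall', methods=['GET'])
def triggerSub():
    return Response(json.dumps(seccall_data()), mimetype='application/json')

@app.route('/test/triggerSecCall', methods=['GET'])
def triggerMain():
    resp = seccall_data()  # direct call, no second HTTP round trip
    return Response(json.dumps(resp), mimetype='application/json')

Alternatively, app.run(threaded=True) lets the development server handle both requests at once, but the direct call avoids the extra round trip entirely.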
I'm currently taking a web scraping class with other students, and we are supposed to make ‘get’ requests to a dummy site, parse it, and visit another site.
The problem is that the dummy site's content is only up for several minutes at a time, reappearing at a certain interval. While the content is available, everyone tries to make ‘get’ requests at once, so mine just hangs until the others clear up, by which point the content has disappeared again. So I end up not being able to successfully make the ‘get’ request:
import requests
from splinter import Browser
browser = Browser('chrome')
# Hangs here
requests.get('http://dummysite.ca').text
# Even if get is successful hangs here as well
browser.visit(parsed_url)
So my question is, what's the fastest/best way to make endless concurrent 'get' requests until I get a response?
Decide to use either requests or splinter
Read about Requests: HTTP for Humans
Read about Splinter
Related
Read about keep-alive
Read about blocking-or-non-blocking
Read about timeouts
Read about errors-and-exceptions
If you can get requests that don't hang, you can simply retry in a loop, for instance:

import time
import requests

while True:
    try:
        response = requests.get('http://dummysite.ca', timeout=10)
        if response.status_code == 200:
            break  # success, stop retrying
    except requests.exceptions.RequestException:
        pass  # connection error or timeout; retry below
    time.sleep(1)
Gevent provides a framework for running asynchronous network requests.
It can patch Python's standard library so that existing libraries like requests and splinter work out of the box.
Here is a short example of how to make 10 concurrent requests, based on the code above, and read their responses.
from gevent import monkey
monkey.patch_all()

import gevent.pool
import requests

pool = gevent.pool.Pool(size=10)
greenlets = [pool.spawn(requests.get, 'http://dummysite.ca')
             for _ in range(10)]

# Wait for all requests to complete
pool.join()

for greenlet in greenlets:
    # get() returns the response, re-raising any exception raised by
    # the request; either catch those errors here or inspect
    # greenlet.exception instead
    response = greenlet.get()
    text_response = response.text
You could also use map with a response-handling function instead of get, as sketched below.
See gevent documentation for more information.
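A minimal sketch of that map variant, reusing the pool from above; pool.map blocks until every URL has been fetched and returns the results in order:

def handle(url):
    # fetch one URL and immediately process its response
    return requests.get(url).text

responses = pool.map(handle, ['http://dummysite.ca'] * 10)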
In this situation, concurrency will not help much since the server seems to be the limiting factor. One solution is to send a request with a timeout; if the timeout is exceeded, try the request again after a few seconds, gradually increasing the time between retries until you get the data that you want. For instance, your code might look like this:
import time
import requests

def get_content(url, timeout):
    # raises a Timeout exception if more than `timeout` seconds have passed
    resp = requests.get(url, timeout=timeout)
    # raise a generic exception if the request is unsuccessful
    if resp.status_code != 200:
        raise LookupError('status is not 200')
    return resp.content

timeout = 5  # seconds
retry_interval = 0
max_retry_interval = 120

while True:
    try:
        response = get_content('https://example.com', timeout=timeout)
        retry_interval = 0  # reset retry interval after success
        break
    except (LookupError, requests.exceptions.Timeout):
        retry_interval += 10
        if retry_interval > max_retry_interval:
            retry_interval = max_retry_interval
        time.sleep(retry_interval)

# process response
If concurrency is required, consider the Scrapy project, which uses the Twisted framework. In Scrapy you can replace time.sleep with reactor.callLater(delay, fn, *args, **kw) or use one of hundreds of middleware plugins.
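A minimal sketch of the callLater idea, with fetch as a hypothetical retry function whose download logic is elided:

from twisted.internet import reactor

def fetch(url, retry_interval):
    # ... attempt the download here ...
    # if it failed, schedule another attempt `retry_interval` seconds
    # from now; unlike time.sleep this does not block the reactor
    reactor.callLater(retry_interval, fetch, url, min(retry_interval + 10, 120))

reactor.callLater(0, fetch, 'https://example.com', 10)
reactor.run()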
From the documentation for requests:
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
import requests

# Wait potentially forever
r = requests.get('http://dummysite.ca', timeout=None)

# Check the status code to see how the server is handling the request
print(r.status_code)
Status codes beginning with 2 mean the request was received, understood, and accepted. 200 means the request was a success and the information was returned. A 503, on the other hand, means the server is overloaded or undergoing maintenance.
Requests used to include a module called async which could send concurrent requests. It is now an independent module named grequests, which you can use to make concurrent requests endlessly until a 200 response:
import grequests

urls = [
    'http://python-requests.org',  # just include one URL if you want
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

def keep_going():
    rs = (grequests.get(u) for u in urls)  # make a set of unsent Requests
    out = grequests.map(rs)                # send them all at the same time
    for url, resp in zip(list(urls), out):
        if resp is not None and resp.status_code == 200:
            print(resp.text)
            urls.remove(url)               # if we have the content, delete the URL

while urls:
    keep_going()
I'm working on a chat app for mobile that needs to handle 10k+ messages/s.
I want to save every message to the database by sending POST requests from Tornado to Django REST.
I don't know the best way to write my POST request so that it doesn't slow down the server.
This is my function:
def SaveToDatabase(endpoint, data):
    # data = {"user_id": msg['username'], "room_id": 1, "message": msg['payload']}
    req = urllib2.Request(endpoint)
    req.add_header('Content-Type', 'application/json')
    urllib2.urlopen(req, json.dumps(data))
Thanks!
Tornado is an asynchronous server, so you should send that POST request in an asynchronous way as well. urllib2 will block the whole worker waiting for the response, and that worker won't take any other requests until the POST is done. You should use httpclient from Tornado:
http_client = httpclient.AsyncHTTPClient()
request = httpclient.HTTPRequest(endpoint, body=json.dumps(data), method="POST",
                                 headers={"content-type": "application/json"})
response = yield http_client.fetch(request)
print(response)
All of that code should live inside a coroutine. You can also use AsyncHTTPClient with a callback that fires when the request finishes, instead of writing a proper coroutine for it.
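A minimal self-contained sketch of the coroutine version, with save_to_database as an assumed name for the rewritten function:

import json

from tornado import gen, httpclient

@gen.coroutine
def save_to_database(endpoint, data):
    http_client = httpclient.AsyncHTTPClient()
    request = httpclient.HTTPRequest(
        endpoint,
        body=json.dumps(data),
        method="POST",
        headers={"content-type": "application/json"},
    )
    # yields control to the IOLoop while the POST is in flight,
    # so the server keeps handling other messages
    response = yield http_client.fetch(request)
    raise gen.Return(response)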