Is Python's HTTP request synchronous?

I am new to Python. I want to understand whether the HTTP request is synchronous or asynchronous. Do I need to implement callbacks?
I am using the urllib2 module, and below is the syntax:
content = urllib2.urlopen(urlnew).read()
On my server there are more than 30,000 records; for each one there will be an HTTP call, and the received response will be stored.
Any help appreciated.

Like most Python stuff, unless explicitly mentioned otherwise, urllib2 is synchronous. So execution will block until the server responds.
So if you want to make 30,000 requests, you will have to make them one after another. An alternative would be to launch the requests in multiple processes (using multiprocessing) to parallelize them.
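A minimal sketch of the multiprocessing route, assuming Python 2 (to match urllib2) and a made-up list of record URLs:

import urllib2
from multiprocessing import Pool

def fetch(url):
    # Each call blocks, but only inside its own worker process.
    return urllib2.urlopen(url).read()

if __name__ == '__main__':
    # Hypothetical URL list; substitute your 30,000 record URLs.
    urls = ['http://example.com/record/%d' % i for i in range(30000)]
    pool = Pool(processes=10)         # tune the worker count to your setup
    contents = pool.map(fetch, urls)  # blocks until every response is in
    pool.close()
    pool.join()

Note that pool.map returns the results in the same order as the input URLs.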
But the better option, especially since you seem to be in control of the server, would be to have it provide some kind of batch request that allows you to query multiple (or all) records at once.

Related

Flask: making a non-blocking requests call

Flask==2.0 (+uwsgi processes=5, threads=20)
Python: 3.9
I have a route that will accept a message and redirect it to other APIs based on the type of message. Let's say there are 4 types of messages with 4 matching downstream APIs. The request is made using the requests library, and the downstream API returns a response that my route needs to return to the client (so I need to wait for the response).
The issue is that sometimes one of the downstream APIs exhibits high latency. This has an undesired result: it makes my whole application slow and impacts the other message types.
Is there a way to make these requests calls non-blocking so the slow downstream API responses don't slow down the whole app?
I read the Flask async guide, and from my understanding you get real benefit when making multiple requests calls within a single route, which isn't the case for me; it's always a single request.
Async requests to your downstream API could provide some relief in your case, I believe:
Async is beneficial when performing concurrent IO-bound tasks, but will probably not improve CPU-bound tasks.
Consider also using a queueing mechanism (such as background tasks in flask, or something more fully fledged such as RabbitMQ).
Final words: also consider defining a timeout for your requests.
https://flask.palletsprojects.com/en/2.1.x/async-await/
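For illustration, a rough sketch of an async Flask 2.x route that forwards a message with a timeout; it assumes the flask[async] extra and httpx are installed, and the downstream URL map and message types are made up:

import httpx
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical mapping of message type -> downstream API.
DOWNSTREAM = {
    "type_a": "http://api-a.internal/handle",
    "type_b": "http://api-b.internal/handle",
}

@app.route("/message", methods=["POST"])
async def forward_message():
    payload = request.get_json()
    url = DOWNSTREAM[payload["type"]]
    # The timeout caps how long a slow downstream API can hold the worker.
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.post(url, json=payload)
    return jsonify(resp.json()), resp.status_code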

Embed an HTTP server in a long-running Python process

Using Python 3. Let's say I am processing a loop around a large database query. Effectively, my loop around the result-set cursor can run a billion iterations.
I'd like to give a user an ability to call out to an http server embedded in the same process that would give some statistics on the progress of the query.
So far I have tried this with IOLoop, using the Tornado HTTP server. The problem is that I basically have to transfer control back to the IOLoop every so many rows to get the HTTP request serviced. That seems wasteful; transferring that control has a price. Tornado would let me support multiple connections, but I don't actually need that; one connection is fine.
What I would prefer would be to simply interrupt the loop, service the HTTP request and resume.
I guess this is probably open to too many possibilities... but using Tornado, I've just started an instance on a thread. Threading issues with Python aside, it basically does what I want.
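A rough sketch of that thread approach, with a made-up progress counter and port; Tornado 5+ runs on asyncio, so the background thread needs its own event loop:

import asyncio
import threading

import tornado.ioloop
import tornado.web

progress = {"rows": 0}  # shared state read by the handler

class StatsHandler(tornado.web.RequestHandler):
    def get(self):
        self.write(progress)  # dicts are serialized to JSON

def serve():
    asyncio.set_event_loop(asyncio.new_event_loop())  # loop for this thread
    app = tornado.web.Application([(r"/stats", StatsHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

threading.Thread(target=serve, daemon=True).start()

for i in range(10 ** 9):  # stand-in for the billion-row cursor loop
    progress["rows"] = i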

Performing a blocking request in a Django view

In one of the views in my django application, I need to perform a relatively lengthy network IO operation. The problem is other requests must wait for this request to be completed even though they have nothing to do with it.
I did some research and stumbled upon Celery, but as I understand it, it is used to perform background tasks independent of the request (so I cannot use the result of the task for the response to the request).
Is there a way to process views asynchronously in django so while the network request is pending other requests can be processed?
Edit: What I forgot to mention is that my application is a web service using django rest framework. So the result of a view is a json response not a page that I can later modify using AJAX.
The usual solution here is to offload the task to celery, and return a "please wait" response in your view. If you want, you can then use an Ajax call to periodically hit a view that will report whether the response is ready, and redirect when it is.
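A rough sketch of that pattern with Django REST Framework; the task name and views are made up for illustration:

from celery.result import AsyncResult
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .tasks import long_network_task  # hypothetical Celery task

@api_view(["POST"])
def start(request):
    result = long_network_task.delay(request.data)
    return Response({"task_id": result.id}, status=202)

@api_view(["GET"])
def status(request, task_id):
    result = AsyncResult(task_id)
    if not result.ready():
        return Response({"state": result.state}, status=202)  # keep polling
    return Response({"result": result.get()})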
You want to maintain that HTTP connection for an extended period of time but still allow other requests to be managed, right? There's no simple solution to this problem. Also, any solution will be a level away from Django as it depends on how you process requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past... I was using uwsgi to provide the WSGI interface between my Python application and nginx. In uwsgi I used the asynchronous functions to suspend my long-running connection while it waited on the I/O connections. The methods allow you to ask it to suspend things until there is something to read or write, and then allow other connections to be serviced.
The above-mentioned async calls use "green threads". They are much lighter weight than regular threads, and you have control over when you move from one thread to another.
I am not saying that it is a good solution for your scenario[1], but the simple answer is to use the following pattern:
async_result = some_task.delay(arg1)
result = async_result.get()
Check the documentation for the get method. And instead of using the delay method you can use anything that returns an AsyncResult (like the apply_async method).
[1] Why might it be a bad idea? Keeping a connection open and waiting for a long time is bad for Django (it is not built for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere), and may be treated as a timeout by the browser. So... it seems a Bad Idea[TM] to use this pattern for a Django REST Framework view.

Consume REST API from Python: do I need an async library?

I have a REST API and now I want to create a web site that will use this API as its only and primary data source. The system is distributed: the REST API is on one group of machines and the site will be on another.
I'm expecting quite a lot of load, so I'd like to make requests as efficient as possible.
Do I need some async HTTP requests library or any HTTP client library will work?
The API is built with Flask; the web site will also be built with Flask, using Jinja as the template engine.
You could use gevent with Flask to get asynchronous I/O from normally synchronous libraries. See this question for an example of someone getting help with doing that.
You could also run Flask behind gunicorn, which has support for spawning multiple workers (threads, processes, or greenlets) for handling concurrent requests. If you were to take that approach, Flask would remain completely synchronous, and gunicorn would handle creating multiple Flask instances to handle concurrent requests.
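A minimal sketch of the gevent route; the API URL is a placeholder, and the monkey-patching must happen before anything else is imported:

from gevent import monkey
monkey.patch_all()  # makes blocking sockets cooperative

import requests
from flask import Flask

app = Flask(__name__)

@app.route("/records")
def records():
    # While this call waits on the API, gevent serves other greenlets.
    resp = requests.get("http://api.internal/records")  # placeholder URL
    return resp.text

Run it under gunicorn with gevent workers (for example, gunicorn -k gevent myapp:app) and the synchronous requests calls stop blocking the whole worker.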
Start simple and use whatever seems easiest for you. Consider optimizing later, only if needed.
Async libraries would become helpful once you have thousands of requests a second. Long before that, you are likely to hit performance problems related to the database (if you use one), which async magic will not resolve.
Actually, your API is on a separate machine. Even if you make your client calls asynchronous, it will not have any impact on the server; your client thread simply will not wait for the response. The server reacts the same whether the call is sync or async.
And if you want to make your calls async, please check http://stackandqueue.com/?p=57 . It uses unirest to make both GET and POST calls asynchronously.

Python Socket and Thread pooling, how to get more performance?

I am trying to implement a basic lib to issue HTTP GET requests. My goal is to receive data through socket connections: a minimalistic design to improve performance, used with threads and thread pool(s).
I have a bunch of links which I group by their hostnames, so here's a simple demonstration of input URLs:
hostname1.com - 500 links
hostname2.org - 350 links
hostname3.co.uk - 100 links
...
I intend to use sockets for performance reasons: a number of sockets that stay connected (if possible, and it usually is) and issue HTTP GET requests. The idea came from urllib's low performance on continuous requests; then I met urllib3, then I realized it uses httplib, and then I decided to try sockets. So here's what I have accomplished so far:
GETSocket class, SocketPool class, ThreadPool and Worker classes
The GETSocket class is a minified, "HTTP GET only" version of Python's httplib.
So, I use these classes like that:
sp = Comm.SocketPool(host, size=self.poolsize, timeout=5)
for link in linklist:
    pool.add_task(self.__get_url_by_sp, self.count, sp, link, results)
    self.count += 1
pool.wait_completion()
The __get_url_by_sp function is a wrapper that calls sp.urlopen and saves the result to the results list. I am using a pool of 5 threads which has a socket pool of 5 GETSocket instances.
What I wonder is, is there any other possible way that I can improve performance of this system?
I've read about asyncore here, but I couldn't figure out how to use the same socket connection with the provided class HTTPClient(asyncore.dispatcher).
Another point: I don't know whether I'm using a blocking or a non-blocking socket, which one would be better for performance, or how to implement either.
Please be specific about your experiences; I don't intend to import another library just to do HTTP GET, so I want to code my own tiny library.
Any help appreciated, thanks.
Do this.
Use multiprocessing. http://docs.python.org/library/multiprocessing.html.
Write a worker Process which puts all of the URL's into a Queue.
Write a worker Process which gets a URL from a Queue and does a GET, saving a file and putting the File information into another Queue. You'll probably want multiple copies of this Process. You'll have to experiment to find how many is the correct number.
Write a worker Process which reads file information from a Queue and does whatever it is that you're trying do.
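A condensed sketch of that pipeline (Python 2, matching the question's httplib era); the URL list and the final processing step are placeholders:

import urllib2
from multiprocessing import Process, Queue

N_FETCHERS = 4

def produce(url_queue, urls):
    for url in urls:
        url_queue.put(url)
    for _ in range(N_FETCHERS):  # one stop sentinel per fetcher
        url_queue.put(None)

def fetch(url_queue, info_queue):
    while True:
        url = url_queue.get()
        if url is None:
            info_queue.put(None)
            break
        body = urllib2.urlopen(url).read()
        info_queue.put((url, len(body)))  # stand-in for "file information"

def consume(info_queue):
    done = 0
    while done < N_FETCHERS:
        item = info_queue.get()
        if item is None:
            done += 1
            continue
        print item  # whatever processing you actually need

if __name__ == '__main__':
    urls = ['http://hostname1.com/page/%d' % i for i in range(500)]  # demo
    url_q, info_q = Queue(), Queue()
    fetchers = [Process(target=fetch, args=(url_q, info_q))
                for _ in range(N_FETCHERS)]
    for p in fetchers:
        p.start()
    Process(target=produce, args=(url_q, urls)).start()
    consume(info_q)
    for p in fetchers:
        p.join()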
I finally found a good path to solve my problems. I was using Python 3 for my project and my only option was to use pycurl, so this forced me to port my project back to the Python 2.7 series.
Using pycurl, I gained:
- Consistent responses to my requests (my script actually has to deal with a minimum of 10k URLs)
- With the ThreadPool class I am receiving responses as fast as my system can take them (the received data is processed later, so multiprocessing is not much of a possibility here)
I tried httplib2 first, but realized that it does not act as solid on Python 3 as it does on Python 2; by switching to pycurl I lost caching support.
Final conclusion: when it comes to HTTP communication, one may need a tool like (py)curl at one's disposal. It is a lifesaver, especially when dealing with loads of URLs (try it sometimes for fun: you will get lots of weird responses).
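For reference, a minimal pycurl GET looks roughly like this (Python 2, matching the answer; the URL is a placeholder):

import pycurl
from StringIO import StringIO

buf = StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://example.com/')  # placeholder URL
c.setopt(pycurl.WRITEFUNCTION, buf.write)    # collect the response body
c.setopt(pycurl.CONNECTTIMEOUT, 5)
c.setopt(pycurl.TIMEOUT, 10)
c.perform()
c.close()
print buf.getvalue()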
Thanks for the replies, folks.
