How to speed up the process of multiple requests in Python? [duplicate] - python

I've constructed the following little program for getting phone numbers using Google's Places API, but it's pretty slow. When I'm testing with 6 items it takes anywhere from 1.99s to 4.86s, and I'm not sure why the time varies so much. I'm very new to APIs, so I'm not even sure what sort of things can or cannot be sped up, which things are left to the web server servicing the API, and what I can change myself.
import requests, json, time

searchTerms = input("input places separated by comma")
start_time = time.time()  # timer
searchTerms = searchTerms.split(',')

for i in searchTerms:
    r1 = requests.get('https://maps.googleapis.com/maps/api/place/textsearch/json?query=' + i + '&key=MY_KEY')
    a = r1.json()
    pid = a['results'][0]['place_id']
    r2 = requests.get('https://maps.googleapis.com/maps/api/place/details/json?placeid=' + pid + '&key=MY_KEY')
    b = r2.json()
    phone = b['result']['formatted_phone_number']
    name = b['result']['name']
    website = b['result']['website']
    print(phone + ' ' + name + ' ' + website)

print("--- %s seconds ---" % (time.time() - start_time))

You may want to send the requests in parallel. Python provides the multiprocessing module, which is suitable for a task like this.
Sample code:
import requests
from multiprocessing import Pool

def get_data(i):
    r1 = requests.get('https://maps.googleapis.com/maps/api/place/textsearch/json?query=' + i + '&key=MY_KEY')
    a = r1.json()
    pid = a['results'][0]['place_id']
    r2 = requests.get('https://maps.googleapis.com/maps/api/place/details/json?placeid=' + pid + '&key=MY_KEY')
    b = r2.json()
    phone = b['result']['formatted_phone_number']
    name = b['result']['name']
    website = b['result']['website']
    return ' '.join((phone, name, website))

if __name__ == '__main__':
    terms = input("input places separated by comma").split(",")
    with Pool(5) as p:  # 5 worker processes
        print(p.map(get_data, terms))

Use sessions to enable persistent HTTP connections (so you don't have to establish a new connection every time)
Docs: Requests Advanced Usage - Session Objects
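A minimal sketch of what that could look like for the code above, assuming the same MY_KEY placeholder and the textsearch/details endpoints from the question; the single Session reuses the underlying connection to maps.googleapis.com across calls:

import requests

session = requests.Session()  # keeps the TCP connection to the host alive between requests

def lookup(term):
    r1 = session.get('https://maps.googleapis.com/maps/api/place/textsearch/json',
                     params={'query': term, 'key': 'MY_KEY'})
    pid = r1.json()['results'][0]['place_id']
    r2 = session.get('https://maps.googleapis.com/maps/api/place/details/json',
                     params={'placeid': pid, 'key': 'MY_KEY'})
    return r2.json()['result']

for term in input("input places separated by comma").split(','):
    result = lookup(term)
    print(result['formatted_phone_number'] + ' ' + result['name'] + ' ' + result['website'])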

Most of the time isn't spent computing your request. The time is spent in communication with the server. That is a thing you cannot control.
However, you may be able to speed things up using parallelization. As a start, create a separate thread for each request.
from threading import Thread

def request_search_terms(*args):
    # your logic for a single request goes here
    pass

#...

threads = []
for st in searchTerms:
    threads.append(Thread(target=request_search_terms, args=(st,)))
    threads[-1].start()

for t in threads:
    t.join()
Then switch to a thread pool as the number of requests grows; this avoids the overhead of repeated thread creation (see the sketch below).
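A minimal sketch of that thread-pool variant using concurrent.futures (an assumption on my part; any thread pool works); request_search_terms here stands in for the request logic above, and the pool size of 10 is arbitrary:

from concurrent.futures import ThreadPoolExecutor

def request_search_terms(term):
    # your logic for a single request goes here; return whatever you need to collect
    return term

searchTerms = ['first place', 'second place']  # placeholder input

# the pool keeps a fixed number of worker threads alive and reuses them,
# avoiding the cost of creating one thread per request
with ThreadPoolExecutor(max_workers=10) as executor:
    for result in executor.map(request_search_terms, searchTerms):
        print(result)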

It's a matter of latency between the client and the servers; you can't change anything about that unless you use multiple server locations (so that the server nearest to the client receives the request).
In terms of performance, you can build a multithreading system that can handle multiple requests at once.

There is no need to do multithreading yourself. grequests provides a quick drop-in replacement for requests.
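A minimal sketch of how that might look here, reusing the text-search URL from the question (the search terms are placeholders); grequests.map sends all requests concurrently and returns the responses in order:

import grequests

searchTerms = ['first place', 'second place']  # placeholders
urls = ['https://maps.googleapis.com/maps/api/place/textsearch/json?query=' + term + '&key=MY_KEY'
        for term in searchTerms]

# build the (unsent) requests, then send them all concurrently
reqs = (grequests.get(u) for u in urls)
responses = grequests.map(reqs)

for r in responses:
    if r is not None:  # failed requests come back as None
        print(r.json()['results'][0]['place_id'])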

Related

How to measure the number and response time of GET requests a backend can handle in a given time using Python?

I'm trying to measure how fast a backend I'm running locally can deal with GET requests.
To measure this, I'm planning to use a Python script that sends requests. Ideally, I would like to send as many requests as fast as possible for a given time (say 10 seconds) and then count all the responses that returned within that time, but not any that arrived later. Additionally, I would like to measure the response time for each individual request, i.e. the time between sending it and a response arriving.
My first attempt looks like this:
import time
import datetime
import aiohttp

async def scalability_test(seconds):
    serviced = 0
    total_response_time_micro = 0
    timeout = time.time() + seconds
    async with aiohttp.ClientSession() as session:
        while time.time() < timeout:
            async with session.get(url=BASE_URL + str(serviced + 1)) as resp:
                time_before = datetime.datetime.now()
                dummy = await resp.json()
                print(dummy)
                response_time_micro = (datetime.datetime.now().microsecond - time_before.microsecond)
                print("This took " + str(response_time_micro) + " microseconds.")
                total_response_time_micro += response_time_micro
                serviced += 1
    print("Number of requests serviced in " + str(seconds) + " seconds: " + str(serviced) + ".")
    print("In total, the response time was " + str(total_response_time_micro) + " microseconds.")
    print("On average, responses took " + str(total_response_time_micro / serviced) + " microseconds.")
This gives me a realistic number of serviced requests, but I'm not sure if that's all it managed to send, or only the ones that got back in time. Additionally, the response time for each individual request seems very low, so I think I'm doing something wrong when it comes to timing it.
My issue is that running it completely asynchronously seems to make measuring the time hard (impossible?), but if I await everything, it just turns into a synchronous function.
Is what I'm asking even possible? Any help would be greatly appreciated.
Maybe another way to increase the load would be to use a multiprocessing.Pool to run several parallel instances of this same function.
from multiprocessing import Pool

if __name__ == '__main__':
    # 5 is the number of workers to launch; prefer a number close to your actual number of CPU cores
    with Pool(5) as p:
        # this launches one job per member of the list, so you can use at least one per worker
        p.map(scalability_test, [seconds_1, seconds_2, seconds_3])
However, to measure time, I think you have to use callbacks so that you don't block the thread on each call. I don't know how to do that using aiohttp.
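For what it's worth, a minimal sketch of one possible way to time each request with plain asyncio.gather instead of callbacks (my own variation, not part of the answer above); BASE_URL stands in for the question's placeholder, and time.perf_counter() is read immediately before each request is sent:

import asyncio
import time
import aiohttp

BASE_URL = 'http://localhost:8080/items/'  # placeholder, as in the question

async def timed_get(session, i):
    start = time.perf_counter()                 # taken just before this request is sent
    async with session.get(BASE_URL + str(i)) as resp:
        await resp.json()
    return time.perf_counter() - start          # elapsed seconds for this one request

async def scalability_test(n_requests):
    async with aiohttp.ClientSession() as session:
        tasks = [timed_get(session, i) for i in range(1, n_requests + 1)]
        latencies = await asyncio.gather(*tasks)  # all requests are in flight concurrently
    print("average latency: %.4f s over %d requests" % (sum(latencies) / len(latencies), len(latencies)))

asyncio.run(scalability_test(100))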

How to quickly send messages to Azure Queue Storage using Python?

I am trying to send a large number of messages (tens of millions) to Azure using the Python azure.storage.queue library; however, it is taking a very long time to do so. The code I am using is below:
from azure.storage.queue import (
    QueueClient,
    BinaryBase64EncodePolicy,
    BinaryBase64DecodePolicy
)

messages = [example list of messages]
connectionString = "example connection string"
queueName = "example-queue-name"

queueClient = QueueClient.from_connection_string(connectionString, queueName)

for message in messages:
    queueClient.send_message(message)
Currently it is taking in the region of 3 hours to send around 70,000 messages, which is far too slow considering the potential number of messages that need to be sent.
I have looked through the documentation to try and find a batch option, but none seem to exist: https://learn.microsoft.com/en-us/python/api/azure-storage-queue/azure.storage.queue.queueclient?view=azure-python
I also wondered if anyone had any experience using the asyncio library to speed this process up and could suggest how to use it?
Try this:
from azure.storage.queue import (
    QueueClient,
    BinaryBase64EncodePolicy,
    BinaryBase64DecodePolicy
)
from concurrent.futures import ProcessPoolExecutor
import time

messages = []
messagesP1 = messages[:len(messages)//2]
messagesP2 = messages[len(messages)//2:]
print(len(messagesP1))
print(len(messagesP2))

connectionString = "<conn str>"
queueName = "<queue name>"
queueClient = QueueClient.from_connection_string(connectionString, queueName)

def pushThread(messages):
    for message in messages:
        queueClient.send_message(message)

def callback_function(future):
    print('Callback with the following result', future.result())

tic = time.perf_counter()

def main():
    with ProcessPoolExecutor(max_workers=2) as executor:
        future = executor.submit(pushThread, messagesP1)
        future.add_done_callback(callback_function)
        future2 = executor.submit(pushThread, messagesP2)
        while True:
            if future.running():
                print("Task 1 running")
            if future2.running():
                print("Task 2 running")
            if future.done() and future2.done():
                print(future.result(), future2.result())
                break

if __name__ == '__main__':
    main()
    toc = time.perf_counter()
    print(f"spent {toc - tic:0.4f} seconds")
As you can see, I split the message array into 2 parts and use 2 tasks to push data into the queue concurrently. In my test with about 800 messages, pushing them sequentially took 94s, but with the approach above it took 48s.

Waiting for API response in python3

(background)
I have an ERP application which is managed from a WebLogic console. Recently we noticed that the same activities that we perform from the console can be performed using the vendor-provided REST API calls, so we wanted to utilize this approach programmatically and try to build some automation.
This is the page from where we can control one of the instances (console screenshot omitted).
The same button acts as Stop and Start to manage the start and stop instance.
Both the start and stop have different API calls which makes sense.
The complete API doc is at : https://docs.oracle.com/cd/E61420_01/doc.92/e80710/smcrestapis.htm#BABFHBJI
(Now)
I wrote a program in Python using the requests library to call these APIs, and it works fine.
The API response can take anywhere between 20 and 30 seconds when I use the stopInstance API,
and it normally takes 60 to 90 seconds when I use the startInstance API; but if there is an issue when starting the instance, it takes more than 300 seconds and goes into an indefinite wait.
My problem is: while starting an instance, I want to wait a maximum of 100 seconds for the response. If it takes more than 100 seconds, the program should display a message like "Instance was not able to start in 100 seconds".
This is my program. I am taking input from a text file and all the values present there have been verified.
import requests
import json
import importlib.machinery
import importlib.util
import numpy
import time
import sys

loader = importlib.machinery.SourceFileLoader('SM', 'sm_details.txt')
spec = importlib.util.spec_from_loader(loader.name, loader)
mod = importlib.util.module_from_spec(spec)
loader.exec_module(mod)

username = str(mod.username)
password = str(mod.password)
hostname = str(mod.servermanagerHostname)
portnum = str(mod.servermanagerPort)
instanceDetails = numpy.array(mod.instanceName)

authenticationAPI = "http://" + hostname + ":" + portnum + "/manage/mgmtrestservice/authenticate"
startInstanceAPI = "http://" + hostname + ":" + portnum + "/manage/mgmtrestservice/startinstance"

headers = {
    'Content-Type': 'application/json',
    'Cache-Control': 'no-cache',
}

data = {}
data['username'] = username
data['password'] = password
instanceNameDict = {'instanceName': ''}

# Authentication request and storing token
response = requests.post(authenticationAPI, data=json.dumps(data), headers=headers)
token = response.headers['TOKEN']
head2 = {}
head2['TOKEN'] = token

def start(instance):
    print(f'\nTrying to start instance : ' + instance['instanceName'])
    # this is where the program is stuck; it does not move on to the time.sleep step
    startInstanceResponse = requests.post(startInstanceAPI, data=json.dumps(instance), headers=head2)
    time.sleep(100)
    if startInstanceResponse.status_code == 200:
        print('Instance ' + instance['instanceName'] + ' started.')
    else:
        print('Could not start instance in 100 seconds')
        sys.exit(1)
I would suggest you use the timeout parameter in requests:
requests.post(startInstanceAPI,data=json.dumps(instance), headers=head2, timeout=100.0)
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely.
Source
Here's the requests timeout documentation; you will also find more details and exception handling in there.
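A minimal sketch of how the start() function from the question could combine the timeout with exception handling (using the question's startInstanceAPI, head2 and instance variables); requests raises requests.exceptions.Timeout when no response arrives within the limit:

import json
import sys
import requests

def start(instance):
    try:
        # note: timeout applies to connecting and to each read, not to total wall-clock time
        startInstanceResponse = requests.post(startInstanceAPI, data=json.dumps(instance),
                                              headers=head2, timeout=100.0)
    except requests.exceptions.Timeout:
        print('Instance ' + instance['instanceName'] + ' was not able to start in 100 seconds')
        sys.exit(1)
    if startInstanceResponse.status_code == 200:
        print('Instance ' + instance['instanceName'] + ' started.')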

How to send multiple requests and make sure the response comes back within a second in Python

I am trying to validate the throttle limit for an endpoint using Python code.
Basically, the throttle limit set on the endpoint I am testing is 3 calls/sec. The test does 4 calls and checks the status codes for at least one 429 response.
The validation I have sometimes fails because it looks like the requests take more than a second to come back. The code I tried:
Method 1:
request = requests.Request(method='GET', url=GLOBALS["url"], params=context.payload, headers=context.headers)
context.upperlimit = int(GLOBALS["ThrottleLimit"]) + 1
reqs = [request for i in range(0, context.upperlimit)]

with BaseThrottler(name='base-throttler', reqs_over_time=(context.upperlimit, 1)) as bt:
    throttled_requests = bt.multi_submit(reqs)

context.responses = [tr.response for tr in throttled_requests]
assert(429 in [i.status_code for i in context.responses])
Method 2:
request = requests.get(url=GLOBALS["url"], params=context.payload, headers=context.headers)
url = request.url
urls = set([])
for i in range(0, context.upperlimit):
    urls.add(grequests.get(url))

context.responses = grequests.map(urls)
assert(429 in [i.status_code for i in context.responses])
Is there a way I can make sure all the responses came back within the same second, and, if not, have the test try again before failing?
I suppose you are using the requests and grequests libraries. You can set a timeout as explained in the docs, and likewise for grequests.
Plain requests
requests.get(url, timeout=1)
Using grequests
grequests.get(url, timeout=1)
The timeout value is the number of seconds.
Using timeout won't necessarily ensure the condition that you are looking for, which is that all 4 requests were received by the endpoint within one second (not that each individual response was received within one second of sending the request).
One quick and dirty way to solve this is to simply time the execution of the code, and ensure that all responses were received in less than a second (using the timeit module)
import timeit

start_time = timeit.default_timer()
context.responses = grequests.map(urls)
elapsed = timeit.default_timer() - start_time

if elapsed < 1:
    assert(429 in [i.status_code for i in context.responses])
This is crude because it is checking round trip time, but will ensure that all requests were received within a second. If you need more specificity, or find that the condition is not met often enough, you could add a header to the response with the exact time the request was received by the endpoint, and then verify that all requests hit the endpoint within one second of each other.
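A rough sketch of that last idea, assuming a hypothetical X-Received-At header (an epoch timestamp in seconds) that the endpoint would have to be modified to return; responses is the list produced by grequests.map(urls) above:

# 'X-Received-At' is a hypothetical header; the endpoint must add it itself
received_times = [float(r.headers['X-Received-At']) for r in responses]

if max(received_times) - min(received_times) < 1:
    # all requests hit the endpoint within one second of each other,
    # so the throttle limit should have been triggered
    assert 429 in [r.status_code for r in responses]
else:
    print("Requests were spread over more than a second; retry before failing the test.")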

How to open POST URLs in multiple threads in Python

I am using Python 2.7 on a Windows machine. I have an array of URLs accompanied by data and headers, so the POST method is required.
In simple execution it works well:
import urllib
import urllib2

rescodeinvalid = []
success = []

for i in range(0, len(HostArray)):
    data = urllib.urlencode(post_data)
    req = urllib2.Request(HostArray[i], data)
    response = urllib2.urlopen(req)
    rescode = response.getcode()
    if rescode == 400:
        rescodeinvalid.append(HostArray[i])
    if rescode == 200:
        success.append(HostArray[i])
My question is: if HostArray is very large, the loop takes a lot of time.
So, how do I check each URL of HostArray in multiple threads? If the response code of a URL is 200, I do a different operation. I have arrays to store the 200 and 400 responses.
So, how do I do this with multithreading in Python?
If you want to do each one in a separate thread you could do something like:
import threading
import urllib
import urllib2

rescodeinvalid = []
success = []

def post_and_handle(url, post_data):
    data = urllib.urlencode(post_data)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    rescode = response.getcode()
    if rescode == 400:
        rescodeinvalid.append(url)  # append is thread safe
    elif rescode == 200:
        success.append(url)  # append is thread safe

workers = []
for i in range(0, len(HostArray)):
    t = threading.Thread(target=post_and_handle, args=(HostArray[i], post_data))
    t.start()
    workers.append(t)

# Wait for all of the requests to complete
for t in workers:
    t.join()
I'd also suggest using requests: http://docs.python-requests.org/en/latest/
as well as a thread pool:
Threading pool similar to the multiprocessing Pool?
Thread pool usage:
from multiprocessing.pool import ThreadPool

# Done here because this must be done in the main thread
pool = ThreadPool(processes=50)  # use a max of 50 threads

# do this instead of Thread(target=func, args=args, kwargs=kwargs)
pool.apply_async(func, args, kwargs)

pool.close()  # I think
pool.join()
Scrapy uses the Twisted library to call multiple URLs in parallel without the overhead of opening a new thread per request. It also manages an internal queue to accumulate, and even prioritize, the requests. As a bonus, you can restrict the number of parallel requests via the maximum-concurrent-requests setting. You can launch a Scrapy spider as an external process or from your code; just set the spider's start_urls = HostArray.
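A minimal sketch of such a spider (my own illustration; HostArray, post_data and the result lists mirror the question's names, and the CONCURRENT_REQUESTS value is arbitrary):

import scrapy

HostArray = ['http://example.com/endpoint1', 'http://example.com/endpoint2']  # placeholders
post_data = {'key': 'value'}                                                  # placeholder
success, rescodeinvalid = [], []

class PostSpider(scrapy.Spider):
    name = 'post_spider'
    handle_httpstatus_list = [400]                 # let 400 responses reach parse() instead of being filtered
    custom_settings = {'CONCURRENT_REQUESTS': 16}  # cap on parallel requests

    def start_requests(self):
        # issue a form-encoded POST to every host instead of the default GET
        for url in HostArray:
            yield scrapy.FormRequest(url, formdata=post_data, callback=self.parse)

    def parse(self, response):
        if response.status == 200:
            success.append(response.url)
        elif response.status == 400:
            rescodeinvalid.append(response.url)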
Your case (basically processing one list into another list) looks like an ideal candidate for concurrent.futures (see for example this answer), or you may go all the way to Executor.map. And of course use ThreadPoolExecutor to limit the number of concurrently running threads to something reasonable.
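A minimal sketch of that Executor.map approach, reusing the urllib2-style request from the question (on Python 2.7 this needs the concurrent.futures backport, the futures package on PyPI); the worker returns the status code so no shared lists are touched from the threads:

import urllib
import urllib2
from concurrent.futures import ThreadPoolExecutor

HostArray = ['http://example.com/a', 'http://example.com/b']  # placeholders
post_data = {'key': 'value'}                                  # placeholder

def check(url):
    # send the POST and return (url, status code); urllib2 raises HTTPError for 4xx/5xx
    data = urllib.urlencode(post_data)
    try:
        response = urllib2.urlopen(urllib2.Request(url, data))
        return url, response.getcode()
    except urllib2.HTTPError as e:
        return url, e.code

with ThreadPoolExecutor(max_workers=50) as executor:  # keep the number of threads reasonable
    results = list(executor.map(check, HostArray))

success = [url for url, code in results if code == 200]
rescodeinvalid = [url for url, code in results if code == 400]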
