aiohttp set number of requests per second - python

I'm writing an API in Flask that makes 1000+ requests to fetch data, and I'd like to limit the number of requests per second. I tried:
conn = aiohttp.TCPConnector(limit_per_host=20)
and
conn = aiohttp.TCPConnector(limit=20)
But neither seems to work.
My code looks like this:
import logging
import asyncio
import aiohttp

logging.basicConfig(filename="logfilename.log", level=logging.INFO,
                    format='%(asctime)s %(levelname)s:%(message)s')

async def fetch(session, url):
    async with session.get(url, headers=headers) as response:
        if response.status == 200:
            data = await response.json()
            json = data['args']
            return json

async def fetch_all(urls, loop):
    conn = aiohttp.TCPConnector(limit=20)
    async with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        results = await asyncio.gather(*[fetch(session, url) for url in urls],
                                       return_exceptions=True)
        return results

def main():
    loop = asyncio.new_event_loop()
    url_list = []
    args = ['a', 'b', 'c']  # + 1000 others
    urls = url_list
    for i in args:
        base_url = 'http://httpbin.org/anything?key=%s' % i
        url_list.append(base_url)
    htmls = loop.run_until_complete(fetch_all(urls, loop))
    for j in htmls:
        key = j['key']
        # save to database
        logging.info(' %s was added', key)
If I run the code, it sends more than 200 requests within one second. Is there any way to limit the request rate?

The code above works as expected (apart from a small error regarding headers being undefined).
Tested on my machine, the httpbin URL responds in around 100 ms, which means that with a concurrency of 20 it will serve around 200 requests in one second (which is what you're seeing as well):
100 ms per request means 10 requests are completed in a second
10 requests per second with a concurrency of 20 means 200 requests in one second
The limit option (aiohttp.TCPConnector) limits the number of concurrent requests and does not have any time dimension.
To see the limit in action, try different values such as 10, 20, 50:
# time to complete 1000 requests with different keys
aiohttp.TCPConnector(limit=10): 12.58 seconds
aiohttp.TCPConnector(limit=20): 6.57 seconds
aiohttp.TCPConnector(limit=50): 3.1 seconds
If you want a requests-per-second limit, send a batch of requests (20, for example), use asyncio.sleep(1.0) to pause for a second, then send the next batch, and so on.
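For illustration, here is a minimal sketch of that batching idea. It reuses the fetch() coroutine from the question; the helper name fetch_in_batches, the batch size of 20 and the one-second pause are illustrative assumptions you would tune to your own limit:

import asyncio
import aiohttp

async def fetch_in_batches(urls, per_second=20):
    # illustrative helper, not part of the original code
    results = []
    async with aiohttp.ClientSession() as session:
        for i in range(0, len(urls), per_second):
            batch = urls[i:i + per_second]
            # fire one batch of requests concurrently
            results += await asyncio.gather(
                *[fetch(session, url) for url in batch],
                return_exceptions=True)
            # pause before starting the next batch
            await asyncio.sleep(1.0)
    return results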

Related

How to send many requests in minimal time with Python 3

My task is to send 30-100 POST requests to one URL at one exact moment in time, for example at 13:00:00.550, with an accuracy of a few milliseconds.
The requests differ from each other (there are several types, say 10), and each type must be sent 5 times.
My problem is sending the HTTP requests quickly enough. What is the fastest way to send 30-100 POST requests in minimal time?
I tried to use asyncio and httpx.AsyncClient to do it.
Here is the part of the code where I do it:
from datetime import datetime
import asyncio
import logging

import httpx

logger = logging.getLogger(__name__)

async def async_post(request_data):
    time_to_sleep = 0.005
    action_time = '13:00:00'
    time_microseconds = 550000
    async with httpx.AsyncClient(cookies=request_data['cookies']) as client:
        # wait (with short sleeps) until the target second
        while True:
            now_time_second = datetime.now().strftime('%H:%M:%S')
            if action_time == now_time_second:
                break
            await asyncio.sleep(0.05)
        # then wait for the target microsecond offset within that second
        while True:
            now_time_microsecond = int(datetime.now().strftime('%f'))
            if now_time_microsecond >= time_microseconds:
                break
            await asyncio.sleep(0.003)
        for _ in range(5):
            response = await client.post(request_data['url'],
                                         headers=request_data['headers'],
                                         params=request_data['params'],
                                         data=request_data['data'],
                                         timeout=60)
            logger.info('Time: ' + str(datetime.now().strftime('%H:%M:%S.%f')))
            logger.info('Text: ' + str(response.text))
            logger.info('Response time: ' + str(response.headers['Date']))
            await asyncio.sleep(time_to_sleep)

def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(
        asyncio.gather(*[async_post(request_data) for request_data in all_requests_data]))
all_requests_data is the list of all request types.
request_data is a dict containing the data for one request.
As a result, the time between requests can reach 70-200 ms. That's a lot, and it doesn't work for me.
It's not server lag either: I tried another application and could see that the server answers within a few milliseconds, so the delay is not on the server side.
How can I send the requests faster?
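One detail worth noting in the snippet above (a sketch under the question's own setup, not a verified fix): the five POSTs are awaited one after another, so each gap includes a full network round trip. Handing them to the event loop together with asyncio.gather removes that serialization. client, request_data and logger are assumed to be the ones from the code above, and the helper name fire_posts is purely illustrative:

async def fire_posts(client, request_data):
    # illustrative helper, not part of the original code
    async def one_post():
        response = await client.post(request_data['url'],
                                     headers=request_data['headers'],
                                     params=request_data['params'],
                                     data=request_data['data'],
                                     timeout=60)
        logger.info('Time: ' + str(datetime.now().strftime('%H:%M:%S.%f')))
        return response

    # all five requests are started back-to-back; responses arrive as the
    # server answers rather than one round trip at a time
    return await asyncio.gather(*[one_post() for _ in range(5)])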

Using multithreading for api requests in python

For my project I need to call an API and store the results in a list, but I need to make more than 5000 requests with different body values, so it takes a huge amount of time to complete. Is there any way to send the requests in parallel so the process finishes quickly? I tried some threading code for this but couldn't figure out how to solve it.
import requests

res_list = []
l = [19821, 29674, 41983, 40234, .....]  # Nearly 5000 items for now and the count may increase in future
for i in l:
    URL = "https://api.something.com/?key=xxx-xxx-xxx&job_id={0}".format(i)
    res = requests.get(url=URL)
    res_list.append(res.text)
Probably you just need to make your queries asynchronous. Something like this:
import asyncio
import aiohttp

NUMBERS = [1, 2, 3]

async def call():
    async with aiohttp.ClientSession() as session:
        for num in NUMBERS:
            async with session.get(f'http://httpbin.org/get?{num}') as resp:
                print(resp.status)
                print(await resp.text())

if __name__ == '__main__':
    loop = asyncio.new_event_loop()
    loop.run_until_complete(call())
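Note that the loop in this snippet still awaits each request before starting the next one, so the calls do not overlap. To actually run the 5000 calls concurrently you would typically create one coroutine per URL and hand them to asyncio.gather together. The sketch below is one way to do that; the job IDs and URL pattern are taken from the question, while the helper names and the concurrency cap of 50 are illustrative assumptions:

import asyncio
import aiohttp

job_ids = [19821, 29674, 41983, 40234]  # the ~5000 ids from the question

async def fetch_one(session, sem, job_id):
    # illustrative helper, not part of the original answer
    url = "https://api.something.com/?key=xxx-xxx-xxx&job_id={0}".format(job_id)
    async with sem:  # keep at most 50 requests in flight at once
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    sem = asyncio.Semaphore(50)  # assumed concurrency cap, tune as needed
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*[fetch_one(session, sem, j) for j in job_ids])

res_list = asyncio.run(main())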

Asynchronous requests backoff/throttling best practice

Scenario: I need to gather paginated data from a web app's API, which has a call limit of 100 per minute. The object I need to retrieve contains 100 items per page across 105 (and growing) pages, about 10,500 items in total. Synchronous code was taking approximately 15 minutes to retrieve all the pages, so there was no worry about hitting the call limit then. However, I wanted to speed up the data retrieval, so I implemented asynchronous calls using asyncio and aiohttp. Data now downloads in 15 seconds - nice.
Problem: I'm now hitting the call limit thus receiving 403 errors for the last 5 or so calls.
Proposed solution: I implemented the try/except found in the get_data() function. I make the calls, and when a call fails with 403: Exceeded call limit, I back off for back_off seconds and retry up to retries times:
async def get_data(session, url):
    retries = 3
    back_off = 60  # seconds to try again
    for _ in range(retries):
        try:
            async with session.get(url, headers=headers) as response:
                if response.status != 200:
                    response.raise_for_status()
                print(retries, response.status, url)
                return await response.json()
        except aiohttp.client_exceptions.ClientResponseError as e:
            retries -= 1
            await asyncio.sleep(back_off)
            continue

async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        attendee_data = await asyncio.gather(*[get_data(session, attendee_url) for attendee_url in attendee_urls])
        return attendee_data

if __name__ == '__main__':
    data = asyncio.run(main())
Question: How do I limit the aiohttp calls so that they stay under the 100 calls/minute threshold without making a 403 request to back off? I've tried the following modules and none of them appeared to do anything: ratelimiter, ratelimit and asyncio-throttle.
Goal: To make 100 async calls per minute, but backing off and retrying if necessary (403: Exceeded call limit).
You can achieve "at most 100 requests/min" by adding a delay before every request.
100 requests/min is equivalent to 1 request/0.6s.
async def main():
    async with aiohttp.ClientSession() as session:
        attendee_urls = get_urls('attendee')  # returns list of URLs to call asynchronously in get_data()
        coroutines = []
        for attendee_url in attendee_urls:
            # schedule each call as a task so it starts now, then wait 0.6 s
            # before starting the next one
            coroutines.append(asyncio.create_task(get_data(session, attendee_url)))
            await asyncio.sleep(0.6)
        attendee_data = await asyncio.gather(*coroutines)
        return attendee_data
Apart from the request rate limit, APIs often also limit the number of simultaneous requests. If so, you can use a BoundedSemaphore.
async def main():
    sema = asyncio.BoundedSemaphore(50)  # Assuming a concurrent requests limit of 50
    ...
    coroutines.append(get_data(sema, session, attendee_url))
    ...

async def get_data(sema, session, attendee_url):
    ...
    for _ in range(retries):
        try:
            async with sema:
                async with session.get(url, headers=headers) as response:
                    if response.status != 200:
                        response.raise_for_status()
                    ...
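For reference, here is one way the two pieces might fit together as a self-contained sketch. get_urls(), the 0.6 s spacing and the concurrency cap of 50 are carried over from the snippets above; the rest is an illustrative assumption rather than the original code:

import asyncio
import aiohttp

async def get_data(sema, session, url):
    retries = 3
    back_off = 60  # seconds to wait after a failed call before retrying
    for _ in range(retries):
        try:
            async with sema:  # cap the number of simultaneous requests
                async with session.get(url) as response:
                    response.raise_for_status()
                    return await response.json()
        except aiohttp.ClientResponseError:
            await asyncio.sleep(back_off)
    return None  # all retries exhausted

async def main(urls):
    sema = asyncio.BoundedSemaphore(50)  # assumed simultaneous-request limit
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in urls:
            # start each request as a task, spacing the starts 0.6 s apart
            # (roughly 100 new requests per minute)
            tasks.append(asyncio.create_task(get_data(sema, session, url)))
            await asyncio.sleep(0.6)
        return await asyncio.gather(*tasks)

# data = asyncio.run(main(get_urls('attendee')))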

Run Parallel Request session in python

I am trying to open multiple web sessions and save the data into CSV. I have written my code using a for loop and requests.get, but it takes too long to access the 90 web locations. Can anyone let me know how to run the whole process in parallel over loc_var?
The code works fine; the only issue is that it runs one by one over loc_var, which takes a long time.
I want to access all the loc_var URLs from the for loop in parallel and write the CSVs.
Below is the code:
import pandas as pd
import numpy as np
import os
import requests
import datetime
import zipfile

t = datetime.date.today() - datetime.timedelta(2)
server = [("A", "web1", ":5000", "username=usr&password=p7Tdfr")]

'''List of all web_ips'''
web_1 = ["Web1","Web2","Web3","Web4","Web5","Web6","Web7","Web8","Web9","Web10","Web11","Web12","Web13","Web14","Web15"]

'''List of all locations'''
loc_var = ["post1","post2","post3","post4","post5","post6","post7","post8","post9","post10","post11","post12","post13","post14","post15","post16","post17","post18"]

for s, web, port, usr in server:
    login_url = 'http://'+web+port+'/api/v1/system/login/?'+usr
    print(login_url)
    s = requests.session()
    login_response = s.post(login_url)
    print("login Response", login_response)
    # Start accessing the web for each loc_var value
    for mkt in loc_var:
        # output is a CSV file
        com_actions_url = 'http://'+web+port+'/api/v1/3E+date(%5C%22'+str(t)+'%5C%22)and+location+%3D%3D+%27'+mkt+'%27%22&page_size=-1&format=%22csv%22'
        print("com_action_url", com_actions_url)
        r = s.get(com_actions_url)
        print("action", r)
        if r.ok == True:
            with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                f.write(r.content)
        # If the location is not accessible, try with another entry from the web_1 list
        if r.ok == False:
            while r.ok == False:
                for web_2 in web_1:
                    login_url = 'http://'+web_2+port+'/api/v1/system/login/?'+usr
                    com_actions_url = 'http://'+web_2+port+'/api/v1/3E+date(%5C%22'+str(t)+'%5C%22)and+location+%3D%3D+%27'+mkt+'%27%22&page_size=-1&format=%22csv%22'
                    login_response = s.post(login_url)
                    print("login Response", login_response)
                    print("com_action_url", com_actions_url)
                    r = s.get(com_actions_url)
                    if r.ok == True:
                        with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                            f.write(r.content)
                        break
There are multiple approaches that you can take to make concurrent HTTP requests. Two that I've used are (1) multiple threads with concurrent.futures.ThreadPoolExecutor or (2) sending the requests asynchronously using asyncio/aiohttp.
To use a thread pool to send your requests in parallel, you would first generate a list of URLs that you want to fetch in parallel (in your case generate a list of login_urls and com_action_urls), and then you would request all of the URLs concurrently as follows:
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch(url):
    page = requests.get(url)
    return page.text
    # Catch HTTP errors/exceptions here

pool = ThreadPoolExecutor(max_workers=5)
urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.bing.com']  # Create a list of urls

for page in pool.map(fetch, urls):
    # Do whatever you want with the results ...
    print(page[0:100])
Using asyncio/aiohttp is generally faster than the threaded approach above, but the learning curve is more complicated. Here is a simple example (Python 3.7+):
import asyncio
import aiohttp

urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.bing.com']

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()
        # Catch HTTP errors/exceptions here

async def fetch_concurrent(urls):
    loop = asyncio.get_event_loop()
    async with aiohttp.ClientSession() as session:
        tasks = []
        for u in urls:
            tasks.append(loop.create_task(fetch(session, u)))
        for result in asyncio.as_completed(tasks):
            page = await result
            # Do whatever you want with the results
            print(page[0:100])

asyncio.run(fetch_concurrent(urls))
But unless you are going to be making a huge number of requests, the threaded approach will likely be sufficient (and way easier to implement).

Computer hangs on large number of Python async aiohttp requests

I have a text file with over 20 million lines in the below format:
ABC123456|fname1 lname1|fname2 lname2
.
.
.
.
My task is to read the file line by line, send both names to the Google transliteration API, and print the results on the terminal (Linux). Below is my code:
import asyncio
import urllib.parse
from aiohttp import ClientSession

async def getResponse(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            response = await response.read()
            print(response)

loop = asyncio.get_event_loop()
tasks = []
# I'm using test server localhost, but you can use any url
url = "https://www.google.com/inputtools/request?{}"

for line in open('tg.txt'):
    vdata = line.split("|")
    if len(vdata) == 3:
        names = vdata[1] + "_" + vdata[2]
        tdata = {"text": names, "ime": "transliteration_en_te"}
        qstring = urllib.parse.urlencode(tdata)
        task = asyncio.ensure_future(getResponse(url.format(qstring)))
        tasks.append(task)

loop.run_until_complete(asyncio.wait(tasks))
In the above code, my file tg.txt contains 20+ million lines. When I run it, my laptop freezes and I have to hard restart it. But this code works fine when I use another file tg1.txt which has only 10 lines. What am I missing?
You can try to use asyncio.gather(*futures) instead of asyncio.wait.
Also try to do this in batches of a fixed size (for example 10 lines per batch) and add a print after each processed batch; that should help you debug your app.
Also, your futures could finish in a different order, so it's better to store the result of gather and print it once processing of the batch is finished.
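For illustration, here is a minimal sketch of that batched approach. It reuses getResponse() and the url template from the question; the batch size of 10 follows the suggestion above (you would normally tune it), and process_file() is just an illustrative helper name:

import asyncio
import urllib.parse

async def process_file(path, batch_size=10):
    batch = []
    batch_no = 0
    for line in open(path):
        vdata = line.split("|")
        if len(vdata) != 3:
            continue
        tdata = {"text": vdata[1] + "_" + vdata[2], "ime": "transliteration_en_te"}
        qstring = urllib.parse.urlencode(tdata)
        batch.append(getResponse(url.format(qstring)))
        if len(batch) >= batch_size:
            await asyncio.gather(*batch)  # only batch_size requests in flight
            batch_no += 1
            print("finished batch", batch_no)
            batch = []
    if batch:
        await asyncio.gather(*batch)  # flush the final partial batch
        print("finished batch", batch_no + 1)

asyncio.run(process_file('tg.txt'))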
