I'm facing difficulties trying to execute around 10 functions simultaneously using asyncio in Django. I could not find any clear documentation on how to use asyncio with Django.
I make around 10 HTTP requests to different sites using TOR, which is slow by default. Instead of making these requests one by one, which usually takes around 2 minutes, I wanted to perform all 10 requests simultaneously. There are 10 distinct functions, each making an HTTP request to a different URL, scraping data, and returning JSON.
URLscrape.py:
import requests

async def scrape1(username):
    response = requests.get('http://example.com' + username)
    return response.json()

async def scrape2(username):
    response = requests.get('http://something.com' + username)
    return response.json()
I have 10 individual functions like the above, each with a different URL, that perform the scraping and return JSON data. In my Django view, I did this:
views.py
from URLscrape import scrape1, scrape2, ..., scrape10

def scrapper(request, username):
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, scrape1, username)
    future2 = loop.run_in_executor(None, scrape2, username)
    ....
    future10 = loop.run_in_executor(None, scrape10, username)
    response1 = await future1
    response2 = await future2
    .....
    response10 = await future10
    response1 = response1.text
    ......
    response10 = response10.text
    return render(request, 'index.html', {'scrape1': response1, 'scrape10': response10})
But I don't know how to use loop.run_until_complete() to complete the script. I'm restricted to using 10 individual functions for the scraping. I want to run all 10 simultaneously, but I don't know how; I could not understand the concept and syntax of asyncio. Please help!
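For context, a minimal sketch (not from the original post) of one way this could be wired up: the ten scrape functions are assumed to be plain synchronous functions using requests, each one is pushed onto the default thread-pool executor, and asyncio.gather plus run_until_complete waits for all of them at once:

import asyncio
from django.shortcuts import render

from URLscrape import scrape1, scrape2  # ... and the remaining scrape functions

def scrapper(request, username):
    # All ten scrape functions, in the order their results should appear.
    scrapers = [scrape1, scrape2]  # extend with scrape3 .. scrape10

    async def run_all(loop):
        # Each blocking scrape call runs in the default thread pool,
        # so the ten HTTP requests overlap instead of running in sequence.
        futures = [loop.run_in_executor(None, s, username) for s in scrapers]
        return await asyncio.gather(*futures)

    loop = asyncio.new_event_loop()
    try:
        results = loop.run_until_complete(run_all(loop))
    finally:
        loop.close()

    return render(request, 'index.html',
                  {'scrape1': results[0], 'scrape10': results[-1]})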
I have written a script where draft orders are created in Shopify, and the id from the first request is then used in the URL of the second request, which completes the order. Below is just a portion of the script that I wrote. However, there are delays while the response is being fetched, and because of this the draft order only sometimes gets completed:
url=f"https://{shop_url}/admin/api/{api_version}/draft_orders.json"
headers = {"X-Shopify-Access-Token": private_app_password}
counter = count(start=1)
for _ in range(number_of_orders):
order = get_order(
line_items_list, locale="en_US", country="United States"
)
response = requests.post(url, json=order, headers=headers)
data = response.json()
# complete order
url2 = f"https://{shop_url}/admin/api/{api_version}/draft_orders/{data['draft_order']['id']}/complete.json"
requests.put(url2,headers=headers)
The problem seems to have to do with the delay that happens while the first response is being fetched, which is why I tried to wrap my API calls in async fetch functions, but the same thing is still occurring. That portion of the script is given below:
async def fetch(session, url, order, headers):
    async with session.post(url, headers=headers, json=order) as response:
        return await response.json()

async def get_draft_order(url, order, headers):
    async with aiohttp.ClientSession() as session:
        data = await fetch(session, url, order, headers)
        url2 = f"https://{shop_url}/admin/api/{api_version}/draft_orders/{data['draft_order']['id']}/complete.json"
        await session.put(url2, headers=headers, json=data)

def create_orders():
    # POST request
    url = f"https://{shop_url}/admin/api/{api_version}/draft_orders.json"
    headers = {"X-Shopify-Access-Token": private_app_password}
    counter = count(start=1)
    for _ in range(number_of_orders):
        order = get_order(
            line_items_list, locale="en_US", country="United States"
        )
        asyncio.run(get_draft_order(url, order, headers))
Could someone help me understand what is wrong with the way I have implemented it? The second request depends on the id from the first request.
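For what it's worth, one likely issue is that asyncio.run() is called once per loop iteration, so each order still finishes before the next one starts. A minimal sketch (reusing shop_url, api_version, private_app_password, get_order, line_items_list, and number_of_orders from the snippets above, with a single asyncio.run at the end) of how the orders could run concurrently while each create-then-complete pair stays sequential:

import asyncio
import aiohttp

async def create_and_complete(session, url, order, headers):
    # Create the draft order, then complete it with the id from the first response.
    async with session.post(url, headers=headers, json=order) as response:
        data = await response.json()
    order_id = data['draft_order']['id']
    url2 = f"https://{shop_url}/admin/api/{api_version}/draft_orders/{order_id}/complete.json"
    async with session.put(url2, headers=headers) as response:
        return await response.json()

async def create_orders_async():
    url = f"https://{shop_url}/admin/api/{api_version}/draft_orders.json"
    headers = {"X-Shopify-Access-Token": private_app_password}
    orders = [
        get_order(line_items_list, locale="en_US", country="United States")
        for _ in range(number_of_orders)
    ]
    async with aiohttp.ClientSession() as session:
        # Each order keeps its create -> complete dependency internally,
        # but the orders themselves run concurrently.
        return await asyncio.gather(
            *(create_and_complete(session, url, o, headers) for o in orders)
        )

# results = asyncio.run(create_orders_async())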
For my project I need to call an API and store the results in a list, but the number of requests I need to make is more than 5000, each with a different value, so it takes a huge amount of time to complete. Is there any way to send the requests in parallel to finish the process quickly? I tried some threading code for this, but I couldn't figure out the way to solve it.
import requests

res_list = []
l = [19821, 29674, 41983, 40234, .....]  # Nearly 5000 items for now and the count may increase in future

for i in l:
    URL = "https://api.something.com/?key=xxx-xxx-xxx&job_id={0}".format(i)
    res = requests.get(url=URL)
    res_list.append(res.text)
Probably you just need to make your queries asynchronously. Something like this:
import asyncio
import aiohttp

NUMBERS = [1, 2, 3]

async def call():
    async with aiohttp.ClientSession() as session:
        for num in NUMBERS:
            async with session.get(f'http://httpbin.org/get?{num}') as resp:
                print(resp.status)
                print(await resp.text())

if __name__ == '__main__':
    loop = asyncio.new_event_loop()
    loop.run_until_complete(call())
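Note that the loop above still awaits each request before starting the next one. To actually overlap the requests, a variant along these lines (same URLs, just restructured around asyncio.gather) could be used:

import asyncio
import aiohttp

NUMBERS = [1, 2, 3]

async def fetch_one(session, num):
    async with session.get(f'http://httpbin.org/get?{num}') as resp:
        print(resp.status)
        return await resp.text()

async def call():
    async with aiohttp.ClientSession() as session:
        # gather schedules all the requests at once instead of one after another
        return await asyncio.gather(*(fetch_one(session, n) for n in NUMBERS))

if __name__ == '__main__':
    results = asyncio.new_event_loop().run_until_complete(call())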
First time trying asyncio and aiohttp.
I have the following code that gets URLs from a MySQL database for GET requests, fetches the responses, and pushes them back into the MySQL database.
if __name__ == "__main__":
    database_name = 'db_name'
    company_name = 'company_name'
    my_db = Db(database=database_name)  # wrapper class for mysql.connector
    urls_dict = my_db.get_rest_api_urls_for_specific_company(company_name=company_name)
    update_id = my_db.get_updateid()
    my_db.get_connection(dictionary=True)
    for url in urls_dict:
        url_id = url['id']
        url = url['url']
        table_name = my_db.make_sql_table_name_by_url(url)
        insert_query = my_db.get_sql_for_insert(table_name)
        r = requests.get(url=url).json()  # make the request
        args = [json.dumps(r), update_id, url_id]
        my_db.db_execute_one(insert_query, args, close_conn=False)
    my_db.close_conn()
This works fine, but how can I run it asynchronously to speed it up?
I have looked here, here and here but can't seem to get my head around it.
Here is what I have tried based on #Raphael Medaer's answer.
async def fetch(url):
    async with ClientSession() as session:
        async with session.request(method='GET', url=url) as response:
            json = await response.json()
            return json

async def process(url, update_id):
    table_name = await db.make_sql_table_name_by_url(url)
    result = await fetch(url)
    print(url, result)

if __name__ == "__main__":
    """Get urls from DB"""
    db = Db(database="fuse_src")
    urls = db.get_rest_api_urls()  # This returns a list of dictionaries
    update_id = db.get_updateid()
    url_list = []
    for url in urls:
        url_list.append(url['url'])
    print(update_id)
    asyncio.get_event_loop().run_until_complete(
        asyncio.gather(*[process(url, update_id) for url in url_list]))
I get an error in the process method:
TypeError: object str can't be used in 'await' expression
Not sure what the problem is.
Any code example specific to this would be highly appreciated.
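For what it's worth, the traceback points at the await on db.make_sql_table_name_by_url: if that wrapper method is an ordinary synchronous function returning a str, awaiting its return value raises exactly this TypeError, because only awaitables (coroutines, tasks, futures) can be awaited. A minimal sketch of the guessed fix, assuming the rest of the code stays as above:

async def process(url, update_id):
    table_name = db.make_sql_table_name_by_url(url)  # no await: regular function
    result = await fetch(url)                        # await: coroutine
    print(url, table_name, result)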
Making this code asynchronous will not speed it up at all by itself. The benefit comes from running parts of your code in "parallel": for instance, you can run multiple (SQL or HTTP) queries at the "same time". Asynchronous programming does not literally execute code at the "same time"; rather, while you are waiting on long IO operations, other parts of your code get to run.
First of all, you'll have to use asynchronous libraries (instead of synchronous ones):
mysql.connector could be replaced by aiomysql from aio-libs.
requests could be replaced by aiohttp.
To execute multiple asynchronous tasks in "parallel" (for instance, to replace your for url in urls_dict: loop), you have to read carefully about asyncio tasks and the gather function.
I will not (re)write your code in an asynchronous way; however, here are a few lines of pseudo-code which could help you:
async def process(url):
    result = await fetch(url)
    await db.commit(result)

if __name__ == "__main__":
    db = MyDbConnection()
    urls = await db.fetch_all_urls()
    asyncio.get_event_loop().run_until_complete(
        asyncio.gather(*[process(url) for url in urls]))
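A slightly more concrete sketch of the same idea, with the HTTP side filled in using aiohttp; the db object remains a placeholder for whatever async wrapper (for example, one built on aiomysql) ends up being used:

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.json()

async def process(session, db, url):
    result = await fetch(session, url)
    await db.commit(result)  # placeholder for the asynchronous DB write

async def main(db, urls):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[process(session, db, url) for url in urls])

# asyncio.get_event_loop().run_until_complete(main(db, urls))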
I am downloading some information from webpages of the form
http://example.com?p=10
http://example.com?p=20
...
The point is that I don't know how many there are. At some point I will receive an error from the server, or maybe at some point I will want to stop processing since I have enough. I want to run them in parallel.
def generator_query(step=10):
    i = 0
    while True:
        yield "http://example.com?p=%d" % i
        i += step

def task(url):
    t = requests.get(url).text
    if not t:  # after the last one
        return None
    return t
I could implement it with a producer/consumer pattern and queues, but I am wondering whether it is possible to have a higher-level implementation, for example with the concurrent.futures module.
Non-concurrent example:
results = []
for url in generator_query():
    results.append(task(url))
You could use concurrent.futures' ThreadPoolExecutor. An example of how to use it is provided here.
You'll need to break out of the example's for-loop when you're getting invalid answers from the server (the except section) or whenever you feel you have enough data (you could count valid responses in the else section, for example).
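A rough sketch of what that could look like with the question's generator_query and task functions (the batch size, the stop threshold, and the error handling are assumptions, not part of the original answer):

import itertools
import requests
from concurrent.futures import ThreadPoolExecutor

ENOUGH = 100        # stop once we have this many results (arbitrary threshold)
BATCH_SIZE = 10     # how many URLs to fetch at a time

def generator_query(step=10):
    for i in itertools.count(0, step):
        yield "http://example.com?p=%d" % i

def task(url):
    try:
        return requests.get(url).text
    except requests.RequestException:
        return None  # treat request failures like an empty/last page

results = []
url_gen = generator_query()
with ThreadPoolExecutor(max_workers=BATCH_SIZE) as executor:
    while len(results) < ENOUGH:
        batch = list(itertools.islice(url_gen, BATCH_SIZE))
        texts = list(executor.map(task, batch))
        valid = [t for t in texts if t]
        results.extend(valid)
        if len(valid) < len(texts):
            break  # we hit an error or an empty page: stop asking for more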
You could use aiohttp for this purpose:
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def coro(step):
    url = 'https://example.com?p={}'.format(step)
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, url)
        print(html)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    tasks = [coro(i * 10) for i in range(10)]
    loop.run_until_complete(asyncio.wait(tasks))
As for the page error, you might have to figure it out yourself since I don't know what website you're dealing with. Maybe try...except?
Note: if your Python version is higher than 3.5, you might run into an SSL certificate verification error.
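A rough illustration of the try...except idea (what counts as "past the last page" is a guess, since it depends on the site):

async def fetch(session, url):
    # raise_for_status turns HTTP error codes (4xx/5xx) into exceptions
    async with session.get(url, raise_for_status=True) as response:
        return await response.text()

async def coro(step):
    url = 'https://example.com?p={}'.format(step)
    async with aiohttp.ClientSession() as session:
        try:
            html = await fetch(session, url)
        except aiohttp.ClientError:
            return None  # treat an HTTP or connection error as "past the last page"
        print(html)
        return html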
I'm trying to send about 70 requests to the Slack API but can't find a way to implement it asynchronously; I have about 3 seconds for it, otherwise I get a timeout error.
Here is how I've tried to implement it:
import asyncio

def send_msg_to_all(sc, request, msg):
    user_list = sc.api_call(
        "users.list"
    )
    members_array = user_list["members"]
    ids_array = []
    for member in members_array:
        ids_array.append(member['id'])

    real_users = []
    for user_id in ids_array:
        user_channel = sc.api_call(
            "im.open",
            user=user_id,
        )
        if user_channel['ok'] == True:
            real_users.append(User(user_id, user_channel['channel']['id']))

    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(send_msg(sc, real_users, request, msg))
    loop.close()
    return HttpResponse()

async def send_msg(sc, real_users, req, msg):
    for user in real_users:
        send_ephemeral_msg(sc, user.user_id, user.dm_channel, msg)

def send_ephemeral_msg(sc, user, channel, text):
    sc.api_call(
        "chat.postEphemeral",
        channel=channel,
        user=user,
        text=text
    )
But it looks like I'm still doing it in a synchronous way.
Any ideas, guys?
Slack's API has a rate limit of 1 query per second (QPS) as documented here.
Even if you get this working, you'll be well over the limit and will start to see HTTP 429 Too Many Requests errors. Your API token may even get revoked or cancelled if you continue at that rate.
I think you'll need to find a different way.
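For illustration only (names reused from the question's snippet): if the documented 1 QPS limit applies, roughly 70 sends cannot fit into 3 seconds no matter how they are scheduled, and pacing the calls is about the best you can do:

import time

def send_msg_to_all_throttled(sc, real_users, msg):
    for user in real_users:
        sc.api_call(
            "chat.postEphemeral",
            channel=user.dm_channel,
            user=user.user_id,
            text=msg,
        )
        time.sleep(1)  # stay at or under the documented 1 query per second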