Is it possible to send many requests asynchronously with Python - python

I'm trying to send about 70 requests to the Slack API but can't find a way to do it asynchronously. I have about 3 seconds for it, otherwise I get a timeout error.
Here is how I've tried to implement it:
import asyncio
def send_msg_to_all(sc,request,msg):
    user_list = sc.api_call(
        "users.list"
    )
    members_array = user_list["members"]
    ids_array = []
    for member in members_array:
        ids_array.append(member['id'])
    real_users = []
    for user_id in ids_array:
        user_channel = sc.api_call(
            "im.open",
            user=user_id,
        )
        if user_channel['ok'] == True:
            real_users.append(User(user_id, user_channel['channel']['id']) )
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(send_msg(sc, real_users, request, msg))
    loop.close()
    return HttpResponse()
async def send_msg(sc, real_users, req, msg):
    for user in real_users:
        send_ephemeral_msg(sc,user.user_id,user.dm_channel, msg)
def send_ephemeral_msg(sc, user, channel, text):
    sc.api_call(
        "chat.postEphemeral",
        channel=channel,
        user=user,
        text=text
    )
But it looks like I'm still doing it in a synchronous way
Any ideas guys?

Slack's API has a rate limit of 1 query per second (QPS) as documented here.
Even if you get this working you'll be well exceeding the limits and you will start to see HTTP 429 Too Many Requests errors. Your API token may even get revoked / cancelled if you continue at that rate.
I think you'll need to find a different way.
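One possible "different way", sketched under assumptions: keep the blocking client, but pace the calls so they stay within the one-call-per-second limit quoted above. The helper name `send_throttled` and its `send` parameter are hypothetical; `send` stands for any blocking callable, such as a small wrapper around `sc.api_call`. At that pace 70 messages take a little over a minute, so this cannot fit inside a 3-second request window and would have to run after the HTTP response has been returned.

```python
import asyncio


async def send_throttled(send, items, min_interval=1.0):
    """Call the blocking `send(item)` for each item, pausing between calls.

    `send` is a hypothetical stand-in for a blocking API call.
    """
    loop = asyncio.get_running_loop()
    results = []
    for index, item in enumerate(items):
        if index:
            # Wait between consecutive calls to respect the rate limit.
            await asyncio.sleep(min_interval)
        # Run the blocking call in the default thread pool so the
        # event loop stays free for other work in the meantime.
        results.append(await loop.run_in_executor(None, send, item))
    return results


if __name__ == "__main__":
    def fake_send(user_id):
        # Placeholder for a real API call.
        return "sent to %s" % user_id

    print(asyncio.run(send_throttled(fake_send, ["U1", "U2", "U3"], min_interval=0.1)))
```

The calls are deliberately sequential: the limit is on the request rate, so issuing them concurrently would only trigger the HTTP 429 responses described above sooner.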

Related

Tornado 6.1 non-blocking request

Using Tornado, I have a POST request that takes a long time as it makes many requests to another API service and processes the data. This can take minutes to fully complete. I don't want this to block the entire web server from responding to other requests, which it currently does.
I looked at multiple threads here on SO, but they are often 8 years old and the code no longer works, as Tornado removed the "engine" component from tornado.gen.
Is there an easy way to kick off this long get call and not have it block the entire web server in the process? Is there anything I can put in the code to say.. "submit the POST response and work on this one function without blocking any concurrent server requests from getting an immediate response"?
Example:
main.py
def make_app():
    return tornado.web.Application([
        (r"/v1", MainHandler),
        (r"/v1/addfile", AddHandler, dict(folderpaths = folderpaths)),
        (r"/v1/getfiles", GetHandler, dict(folderpaths = folderpaths)),
        (r"/v1/getfile", GetFileHandler, dict(folderpaths = folderpaths)),
    ])
if __name__ == "__main__":
    app = make_app()
    sockets = tornado.netutil.bind_sockets(8888)
    tornado.process.fork_processes(0)
    tornado.process.task_id()
    server = tornado.httpserver.HTTPServer(app)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.current().start()
addHandler.py
class AddHandler(tornado.web.RequestHandler):
    def initialize(self, folderpaths):
        self.folderpaths = folderpaths
    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")
    def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')
        status_code, status_text = validateInFolder(folderpath)
        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()
            #At this point the path is validated.
            #POST response should be send out. Internal process should continue, new
            #requests should not be blocked
            self.blockingFunction()
The idea is that once the input parameters are validated, the POST response should be sent out.
Then the internal process (blockingFunction()) should start without blocking the Tornado server from processing other API POST requests.
I tried defining blockingFunction() as async, which lets me process multiple concurrent user requests, but there was a warning about a missing "await" on the async method.
Any help welcome. Thank you
class AddHandler(tornado.web.RequestHandler):
    def initialize(self, folderpaths):
        self.folderpaths = folderpaths
    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")
    async def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')
        status_code, status_text = validateInFolder(folderpath)
        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()
            #At this point the path is validated.
            #POST response should be send out. Internal process should continue, new
            #requests should not be blocked
            #the blocking function runs in a worker thread, so the IOLoop stays free
            loop = tornado.ioloop.IOLoop.current()
            await loop.run_in_executor(None, self.blockingFunction)
            #if this had multiple parameters it would be
            #await loop.run_in_executor(None, self.blockingFunction, param1, param2)
Thank you @xyres
Further read: https://www.tornadoweb.org/en/stable/faq.html
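The mechanism can be tried without a web server. Tornado 6 runs on the asyncio event loop, so the following standard-library sketch shows the same idea: a blocking function handed to `run_in_executor` runs in a worker thread while the loop keeps serving other coroutines. The names `blocking_job`, `heartbeat` and `handler_like` are hypothetical, and the heartbeat only simulates other requests being handled.

```python
import asyncio
import time


def blocking_job(seconds):
    # Stand-in for the long-running work (the time.sleep(320) above).
    time.sleep(seconds)
    return "AWAKE"


async def heartbeat(ticks, interval):
    # Simulates other requests being served while the job runs.
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def handler_like():
    loop = asyncio.get_running_loop()
    # Start the blocking job in the default thread pool; this returns at once.
    job = loop.run_in_executor(None, blocking_job, 0.5)
    # The event loop is still responsive while the job sleeps in its thread.
    beats = await heartbeat(5, 0.02)
    still_running = not job.done()
    result = await job
    return beats, still_running, result


if __name__ == "__main__":
    print(asyncio.run(handler_like()))
```

Calling `blocking_job(0.5)` directly inside the coroutine instead would freeze the loop for the whole half second, which is the behaviour described in the question.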

Running 10 functions asynchronously in Django Views using Asyncio

I'm facing difficulties while trying to execute around 10 functions simultaneously using asyncio in Django. I could not find any clear documentation on how to use asyncio with Django.
I make around 10 HTTP requests to different URLs over Tor, which is slow by default. Instead of making these requests one by one, which usually takes around 2 minutes, I want to perform all 10 requests simultaneously. There are 10 distinct functions, each making an HTTP request to a different URL, scraping data and returning JSON.
URLscrape.py :
async def scrape1(username):
    response = request.get('http://example.com'+username)
    return response.json()
async def scrape2(username):
    response = request.get('http://something.com'+username)
    return response.json()
I have 10 individual functions like the above, each with a different URL, that perform the scraping and return JSON data. In the Django views, I did this:
Views.py
from URLscrape import scrape1, scrape2........scrape10
def scrapper():
    loop = asyncio.get_event_loop()
    feature1 = loop.run_in_executor(None, scrape1, username)
    feature2 = loop.run_in_executor(None, scrape2, username)
    ....
    feature10 = loop.run_in_executor(None, scrape10, username)
    response1 = await future1
    response2 = await future2
    .....
    response10 = await future10
    response1 = response1.text
    ......
    response10 = response2.text
    return render(request, 'index.html', {'scrape1':response1,'scrape10':response10})
But I don't know how to use loop.run_until_complete() to complete the script. I'm restricted to using 10 individual functions for scraping. I want to run all 10 simultaneously but I don't know how. I could not understand the concept and syntax of asyncio. Please help!
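A minimal sketch of one way this could be wired up, under stated assumptions: the scrape functions are plain `def` functions (a blocking HTTP call gains nothing from `async def`), each one is handed to a worker thread with `run_in_executor`, and `asyncio.gather` waits for all of them. The bodies below are hypothetical placeholders that sleep instead of making real requests, and `scrapper` takes `username` as an argument only for illustration.

```python
import asyncio
import time


def scrape1(username):
    # Placeholder for a blocking HTTP request to the first site.
    time.sleep(0.2)
    return {"source": "example.com", "user": username}


def scrape2(username):
    # Placeholder for a blocking HTTP request to the second site.
    time.sleep(0.2)
    return {"source": "something.com", "user": username}


async def scrape_all(username, scrapers):
    loop = asyncio.get_running_loop()
    # One worker-thread future per blocking function.
    futures = [loop.run_in_executor(None, scraper, username) for scraper in scrapers]
    # gather() returns the results in the same order as `scrapers`.
    return await asyncio.gather(*futures)


def scrapper(username):
    # Synchronous entry point, as a regular (non-async) view would need.
    return asyncio.run(scrape_all(username, [scrape1, scrape2]))


if __name__ == "__main__":
    print(scrapper("alice"))
```

With this shape the total time is roughly that of the slowest request rather than the sum of all of them, as long as the thread pool has enough workers.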

Concurrent HTTP and SQL requests using async Python 3

First time trying asyncio and aiohttp.
I have the following code that gets URLs for GET requests from a MySQL database, fetches the responses and pushes them back to the MySQL database.
if __name__ == "__main__":
    database_name = 'db_name'
    company_name = 'company_name'
    my_db = Db(database=database_name) # wrapper class for mysql.connector
    urls_dict = my_db.get_rest_api_urls_for_specific_company(company_name=company_name)
    update_id = my_db.get_updateid()
    my_db.get_connection(dictionary=True)
    for url in urls_dict:
        url_id = url['id']
        url = url['url']
        table_name = my_db.make_sql_table_name_by_url(url)
        insert_query = my_db.get_sql_for_insert(table_name)
        r = requests.get(url=url).json() # make the request
        args = [json.dumps(r), update_id, url_id]
        my_db.db_execute_one(insert_query, args, close_conn=False)
    my_db.close_conn()
This works fine, but how can I run it asynchronously to speed it up?
I have looked here, here and here but can't seem to get my head around it.
Here is what I have tried based on @Raphael Medaer's answer.
async def fetch(url):
    async with ClientSession() as session:
        async with session.request(method='GET', url=url) as response:
            json = await response.json()
            return json
async def process(url, update_id):
    table_name = await db.make_sql_table_name_by_url(url)
    result = await fetch(url)
    print(url, result)
if __name__ == "__main__":
    """Get urls from DB"""
    db = Db(database="fuse_src")
    urls = db.get_rest_api_urls() # This returns list of dictionary
    update_id = db.get_updateid()
    url_list = []
    for url in urls:
        url_list.append(url['url'])
    print(update_id)
    asyncio.get_event_loop().run_until_complete(
        asyncio.gather(*[process(url, update_id) for url in url_list]))
I get an error in the process method:
TypeError: object str can't be used in 'await' expression
I'm not sure what the problem is.
Any code example specific to this would be highly appreciated.
Making this code asynchronous will not speed it up by itself. It only helps when parts of the work can overlap, for instance several SQL or HTTP queries in flight at the "same time". Asynchronous programming does not execute your code simultaneously; rather, while one task waits on a long I/O operation, the event loop runs other parts of your code.
First of all, you'll have to use asynchronous libraries instead of synchronous ones:
mysql.connector could be replaced by aiomysql from aio-libs.
requests could be replaced by aiohttp.
To execute multiple asynchronous tasks in "parallel" (for instance to replace your loop for url in urls_dict:), read carefully about asyncio tasks and the gather function.
I will not (re)write your code in an asynchronous way, but here are a few lines of pseudocode which could help you:
async def process(db, url):
    result = await fetch(url)
    await db.commit(result)
async def main():
    db = MyDbConnection()
    # await is only valid inside a coroutine, so the setup lives in main()
    urls = await db.fetch_all_urls()
    await asyncio.gather(*[process(db, url) for url in urls])
if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
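As for the TypeError in the question: `await` only works on awaitables such as coroutines, and `db.make_sql_table_name_by_url(url)` is an ordinary method that returns a `str`, so it should be called without `await`. The sketch below illustrates the distinction together with `asyncio.gather`. It is a simulation under assumptions: `make_table_name` and the sleeping `fetch` are hypothetical stand-ins for the question's helper and for an aiohttp request.

```python
import asyncio


def make_table_name(url):
    # Ordinary function returning a str: call it directly, never await it.
    return url.rsplit("/", 1)[-1].replace("-", "_")


async def fetch(url):
    # Stand-in for an aiohttp request; the sleep simulates network latency.
    await asyncio.sleep(0.05)
    return {"url": url}


async def process(url):
    table_name = make_table_name(url)  # no await: not a coroutine
    payload = await fetch(url)         # await: fetch() is a coroutine
    return table_name, payload


async def main(urls):
    # All process() coroutines wait on their I/O concurrently;
    # results come back in the order of `urls`.
    return await asyncio.gather(*(process(url) for url in urls))


if __name__ == "__main__":
    print(asyncio.run(main(["http://api.test/v1/user-list", "http://api.test/v1/orders"])))
```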

use concurrent futures with generator input

I am downloading some information from webpages in the form
http://example.com?p=10
http://example.com?p=20
...
The point is that I don't know how many there are. At some point I will receive an error from the server, or I may want to stop processing because I have enough. I want to run them in parallel.
def generator_query(step=10):
    i = 0
    while True:  # endless: the number of pages is not known in advance
        yield "http://example.com?p=%d" % i
        i += step
def task(url):
    t = request.get(url).text
    if not t: # after the last one
        return None
    return t
I can implement it with a consumer/producer pattern using queues, but I am wondering whether a higher-level implementation is possible, for example with the concurrent module.
Non-concurrent example:
results = []
for url in generator_query():
    results.append(task(url))
You could use ThreadPoolExecutor from concurrent.futures. An example of how to use it is provided here.
You'll need to break out of the example's for-loop, when you're getting invalid answers from the server (the except section) or whenever you feel like you got enough data (you could count valid responses in the else section for example).
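A minimal sketch of that idea, assuming an endless URL generator and a `task` that returns `None` once the pages run out: pull the generator in small batches with `itertools.islice`, run each batch on the pool, and stop at the first empty result. The function name `fetch_until_empty` and the batching strategy are illustrative choices, not something taken from the linked example.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def generator_query(step=10):
    # Endless stream of page URLs.
    i = 0
    while True:
        yield "http://example.com?p=%d" % i
        i += step


def fetch_until_empty(task, urls, workers=4):
    """Run task(url) in batches of `workers`; stop at the first None result."""
    urls = iter(urls)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Take only as many URLs as there are workers, so an endless
            # generator is never consumed further than necessary.
            batch = list(itertools.islice(urls, workers))
            if not batch:
                return results
            # pool.map() yields results in submission order.
            for text in pool.map(task, batch):
                if text is None:
                    return results
                results.append(text)


if __name__ == "__main__":
    def fake_task(url):
        # Placeholder for the real download: pages from p=50 on are "empty".
        page = int(url.split("=")[1])
        return None if page >= 50 else "page %d" % page

    print(fetch_until_empty(fake_task, generator_query()))
```

The trade-off is that a few requests past the last valid page may still be sent, because the rest of the final batch is already in flight when the empty response is seen.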
You could use aiohttp for this purpose:
import asyncio
import aiohttp
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()
async def coro(step):
    url = 'https://example.com?p={}'.format(step)
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, url)
        print(html)
async def main():
    # asyncio.wait() no longer accepts bare coroutines (Python 3.11+), so gather them
    await asyncio.gather(*[coro(i*10) for i in range(10)])
if __name__ == '__main__':
    asyncio.run(main())
As for the page error, you might have to figure it out yourself since I don't know what website you're dealing with. Maybe try...except?
Notice: if your python version is higher than 3.5, it might cause an ssl certificate verification error.

Am I using aiohttp together with psycopg2 correctly?

I'm very new to using asyncio/aiohttp, but I have a Python script that reads a batch of URLs from a Postgres table, downloads the URLs, runs a processing function on each download (not relevant for the question), and saves the result of the processing back to the table.
In simplified form it looks like this:
import asyncio
import psycopg2
from aiohttp import ClientSession, TCPConnector
BATCH_SIZE = 100
def _get_pgconn():
    return psycopg2.connect()
def db_conn(func):
    def _db_conn(*args, **kwargs):
        with _get_pgconn() as conn:
            with conn.cursor() as cur:
                return func(cur, *args, **kwargs)
            conn.commit()
    return _db_conn
async def run():
    async with ClientSession(connector=TCPConnector(ssl=False, limit=100)) as session:
        while True:
            count = await run_batch(session)
            if count == 0:
                break
async def run_batch(session):
    tasks = []
    for url in get_batch():
        task = asyncio.ensure_future(process_url(url, session))
        tasks.append(task)
    await asyncio.gather(*tasks)
    results = [task.result() for task in tasks]
    save_batch_result(results)
    return len(results)
async def process_url(url, session):
    try:
        async with session.get(url, timeout=15) as response:
            body = await response.read()
            return process_body(body)
    except:
        return {...}
@db_conn
def get_batch(cur):
    sql = "SELECT id, url FROM db.urls WHERE processed IS NULL LIMIT %s"
    cur.execute(sql, (BATCH_SIZE,))
    return cur.fetchall()
@db_conn
def save_batch_result(cur, results):
    sql = "UPDATE db.urls SET a = %(a)s, processed = true WHERE id = %(id)s"
    cur.executemany(sql, tuple(results))
loop = asyncio.get_event_loop()
loop.run_until_complete(run())
But I have the feeling that I must be missing something here. The script runs, but it seems to become slower and slower with each batch. Specifically, the call to the process_url function seems to become slower over time. The memory used also keeps growing, so I'm guessing there is something that I fail to clean up properly between runs.
I also have problems increasing the batch size much: if I go much over 200, I seem to get a much higher proportion of exceptions from the call to session.get. I have tried playing with the limit argument to the TCPConnector, setting it both higher and lower, but I can't see that it helps much. I have also tried running it on a few different servers, but the result seems to be the same. Is there some way to think about how to set these values more effectively?
I would be grateful for some pointers to what I might be doing wrong here!
The problem with your code is that it mixes the asynchronous aiohttp library with the synchronous psycopg2 client.
As a consequence, each call to the DB blocks the event loop entirely, which stalls all other concurrent tasks.
To solve it you need to use an asynchronous DB client: aiopg (a wrapper around psycopg2's async mode) or asyncpg (it has a different API but works faster).
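If switching drivers is not immediately possible, a stopgap (a different technique from the one recommended above) is to push each synchronous DB call into a worker thread so it no longer blocks the event loop. The sketch below reshapes the question's `db_conn` decorator that way. It rests on several assumptions: `sqlite3` stands in for psycopg2 so the example is self-contained, `DB_PATH` and the table layout are hypothetical, and `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
import functools
import os
import sqlite3
import tempfile

# Hypothetical database location; sqlite3 is only a stand-in for psycopg2.
DB_PATH = os.path.join(tempfile.mkdtemp(), "urls.db")


def db_conn(func):
    """Run the synchronous func(cur, ...) in a worker thread."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        def call():
            # The connection is opened, used and closed in the same thread.
            conn = sqlite3.connect(DB_PATH)
            try:
                cur = conn.cursor()
                result = func(cur, *args, **kwargs)
                conn.commit()  # commit before returning the result
                return result
            finally:
                conn.close()
        return await asyncio.to_thread(call)
    return wrapper


@db_conn
def create_table(cur):
    cur.execute(
        "CREATE TABLE IF NOT EXISTS urls "
        "(id INTEGER PRIMARY KEY, url TEXT, processed INTEGER)"
    )


@db_conn
def add_urls(cur, urls):
    cur.executemany("INSERT INTO urls (url) VALUES (?)", [(u,) for u in urls])


@db_conn
def get_batch(cur, size):
    cur.execute(
        "SELECT id, url FROM urls WHERE processed IS NULL ORDER BY id LIMIT ?",
        (size,),
    )
    return cur.fetchall()


async def demo():
    await create_table()
    await add_urls(["http://a.test", "http://b.test"])
    return await get_batch(10)
```

Running `asyncio.run(demo())` exercises all three helpers. Inside a coroutine such as `run_batch`, the decorated functions are then awaited (`await get_batch(...)`) instead of being called directly, so downloads in flight keep progressing while the database work happens in the thread.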
