How to download images with aiohttp?

So I have a Discord bot that I'm playing with to learn Python. I have a command that downloads images, edits/merges them, and then sends the edited image to chat. I was using requests to do this before, but I was told by one of the library devs for discord.py that I should be using aiohttp instead of requests. I can't find how to download images with aiohttp; I've tried a bunch of stuff, but none of it works.
if message.content.startswith("!async"):
    import aiohttp
    import random
    import time
    import shutil

    start = time.time()
    notr = 0
    imagemake = Image.new("RGBA", (2048, 2160))
    imgsave = "H:\Documents\PyCharmProjects\ChatBot\Images"
    imagesend = os.path.join(imgsave, "merged.png")
    imgmergedsend = os.path.join(imgsave, "merged2.png")
    with aiohttp.ClientSession() as session:
        async with session.get("http://schoolido.lu/api/cards/788/") as resp:
            data = await resp.json()
            cardsave = session.get(data["card_image"])
            with open(os.path.join(imgsave, "card.png"), "wb") as out_file:
                shutil.copyfileobj(cardsave, out_file)
That's what I have right now, but it still doesn't work.
So, is there a way to download images?

You block the event loop when you write the file. You need to use aiofiles.
import aiohttp
import aiofiles

async with aiohttp.ClientSession() as session:
    url = "http://host/file.img"
    async with session.get(url) as resp:
        if resp.status == 200:
            f = await aiofiles.open('/some/file.img', mode='wb')
            await f.write(await resp.read())
            await f.close()
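
For reference, a minimal sketch of how that snippet could be wrapped in a coroutine and run on its own (the URL and file path are placeholders):

import asyncio

import aiofiles
import aiohttp

async def download(url, dest):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            # aiofiles writes the file without blocking the event loop
            async with aiofiles.open(dest, mode='wb') as f:
                await f.write(await resp.read())

asyncio.run(download("http://host/file.img", "/some/file.img"))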

So I figured it out a while ago:
if message.content.startswith("!async2"):
import aiohttp
with aiohttp.ClientSession() as session:
async with session.get("http://schoolido.lu/api/cards/788/") as resp:
data = await resp.json()
card = data["card_image"]
async with session.get(card) as resp2:
test = await resp2.read()
with open("cardtest2.png", "wb") as f:
f.write(test)
Before, I was getting a response object back, not the image data itself.
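
For anyone copying this outside a bot command, here is a rough sketch of the same two-step fetch (the API JSON first, then the image URL it points to) wrapped in a coroutine, under the assumption that card_image holds a full URL:

import asyncio

import aiohttp

async def fetch_card():
    async with aiohttp.ClientSession() as session:
        # first request: the API returns JSON containing the image URL
        async with session.get("http://schoolido.lu/api/cards/788/") as resp:
            data = await resp.json()
        # second request: fetch the actual image bytes
        async with session.get(data["card_image"]) as resp2:
            image_bytes = await resp2.read()
    with open("cardtest2.png", "wb") as f:
        f.write(image_bytes)

asyncio.run(fetch_card())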

pdf_url = 'https://example.com/file.pdf'

async with aiohttp.ClientSession() as session:
    async with session.get(pdf_url) as resp:
        if resp.status == 200:
            with open('file.pdf', 'wb') as fd:
                async for chunk in resp.content.iter_chunked(10):
                    fd.write(chunk)
While methods read(), json() and text() are very convenient you should
use them carefully. All these methods load the whole response in
memory. For example if you want to download several gigabyte sized
files, these methods will load all the data in memory. Instead you can
use the content attribute. It is an instance of the
aiohttp.StreamReader class. The gzip and deflate transfer-encodings
are automatically decoded for you:
https://docs.aiohttp.org/en/stable/client_quickstart.html#streaming-response-content
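
If iter_chunked() is not a good fit, the StreamReader can also be read in fixed-size pieces with content.read(); a sketch (the URL and chunk size are placeholders):

import asyncio

import aiohttp

async def stream_to_file(url, dest):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            with open(dest, 'wb') as fd:
                while True:
                    # read() returns b"" once the body is exhausted
                    chunk = await resp.content.read(64 * 1024)
                    if not chunk:
                        break
                    fd.write(chunk)

asyncio.run(stream_to_file("https://example.com/file.pdf", "file.pdf"))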

Related

Python Asyncio file.write after request.get(file) not working

I'm using asyncio and aiohttp to asynchronously get files from an endpoint. The status codes for the requests are successful, but when I try to write the files, they always come out empty for some reason.
This is what my code looks like right now:
async def download_link(url: str, my_session: ClientSession, filename: Path):
    async with my_session.get(url, allow_redirects=True) as response:
        with filename.open(mode='wb') as f:  # Line 3
            await f.write(response.content)

async def download_all(urls: list, filenames: list):
    my_conn = aiohttp.TCPConnector(limit=10)
    async with aiohttp.ClientSession(connector=my_conn) as session:
        tasks = []
        for item in zip(urls, file_names):
            task = asyncio.ensure_future(download_link(url=item[0], my_session=session, filename=item[1]))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)
I've also tried putting async in front of the with on line 3, inside the download_link function. And I've also tried making the code that opens the file and writes into it a separate async function, like so:
async def store_response(response, filename: Path):
    async with filename.open(model='wb') as f:
        f.write(response.content)
I know the files I'm fetching do have data; when I use multi-threading I'm able to get data back. Does anyone know why this is happening?
I have used this code to download files asynchronously with no problem and good speed.
import asyncio
import aiohttp
import aiofile

async def download_file(url: str):
    filename = url.split('?')[0].split('/')[-1]
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            if not resp.ok:
                print(f"Invalid status code: {resp.status}")
            else:
                try:
                    async with aiofile.async_open(filename, "wb+") as afp:
                        async for chunk in resp.content.iter_chunked(1024 * 512):  # 512 KB
                            await afp.write(chunk)
                except asyncio.TimeoutError:
                    print(f"A timeout occurred while downloading '{filename}'")

asyncio.run(download_file("https://www.python.org/static/img/python-logo.png"))
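
As a side note, the files in the question most likely come out empty because response.content is an aiohttp StreamReader object, not the body bytes, so the plain-file write fails, and gather(..., return_exceptions=True) silently swallows the error. A minimal sketch of the original function with just that part changed (an assumption about the cause, not a tested fix):

from pathlib import Path

from aiohttp import ClientSession

async def download_link(url: str, my_session: ClientSession, filename: Path):
    async with my_session.get(url, allow_redirects=True) as response:
        # response.content is a StreamReader; read() returns the body as bytes
        data = await response.read()
        with filename.open(mode='wb') as f:
            f.write(data)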

How to save JSON responses with asynchronous requests?

I have a question regarding asynchronous requests:
How do I save response.json() to a file, on the fly?
I want to make a request and save the response to a .json file, without keeping it in memory.
import asyncio
import aiohttp

async def fetch(sem, session, url):
    async with sem:
        async with session.get(url) as response:
            return await response.json()  # here

async def fetch_all(urls, loop):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession(loop=loop) as session:
        results = await asyncio.gather(
            *[fetch(sem, session, url) for url in urls]
        )
        return results

if __name__ == '__main__':
    urls = (
        "https://public.api.openprocurement.org/api/2.5/tenders/6a0585fcfb05471796bb2b6a1d379f9b",
        "https://public.api.openprocurement.org/api/2.5/tenders/d1c74ec8bb9143d5b49e7ef32202f51c",
        "https://public.api.openprocurement.org/api/2.5/tenders/a3ec49c5b3e847fca2a1c215a2b69f8d",
        "https://public.api.openprocurement.org/api/2.5/tenders/52d8a15c55dd4f2ca9232f40c89bfa82",
        "https://public.api.openprocurement.org/api/2.5/tenders/b3af1cc6554440acbfe1d29103fe0c6a",
        "https://public.api.openprocurement.org/api/2.5/tenders/1d1c6560baac4a968f2c82c004a35c90",
    )
    loop = asyncio.get_event_loop()
    data = loop.run_until_complete(fetch_all(urls, loop))
    print(data)
For now, the script just prints the JSON responses, and I can save them once they're all scraped:
data = loop.run_until_complete(fetch_all(urls, loop))

for i, resp in enumerate(data):
    with open(f"{i}.json", "w") as f:
        json.dump(resp, f)
But it doesn't feel right to me, as it will definitely fail once I run out of memory, for example.
Any suggestions?
Edit
Limited my post to only one question
How do I save response.json() to a file, on the fly?
Don't use response.json() in the first place, use the streaming API instead:
async def fetch(sem, session, url):
    async with sem, session.get(url) as response:
        with open("some_file_name.json", "wb") as out:
            async for chunk in response.content.iter_chunked(4096):
                out.write(chunk)
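
Since the original script fetches several URLs, the output path probably needs to be passed in rather than hard-coded. A sketch that reuses the question's {i}.json naming (the naming scheme is just an assumption, anything unique per URL works):

import asyncio
import aiohttp

async def fetch(sem, session, url, out_path):
    # stream the body straight to disk so the full JSON never sits in memory
    async with sem, session.get(url) as response:
        with open(out_path, "wb") as out:
            async for chunk in response.content.iter_chunked(4096):
                out.write(chunk)

async def fetch_all(urls, loop):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession(loop=loop) as session:
        await asyncio.gather(
            *[fetch(sem, session, url, f"{i}.json") for i, url in enumerate(urls)]
        )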

Instagram API start returning loading page after some calls

I am using the code below to get the account information of one thousand Instagram accounts using asyncio. In the initial requests the output is correct, but after 10-20 calls Instagram starts returning the HTML of a loading page. What could I be doing wrong here? Below is the Python code.
import random
import asyncio
from aiohttp import ClientSession
import urllib.request
import aiohttp

async def fetch(url, session, sem):
    print("------")
    print(url)
    async with session.get(url=url) as response:
        print(await response.text())
        await response.text()
        # exit()
        if response.status == 200:
            await sem.acquire()
            fname = url[22:]
            fname = fname.split('/')
            fname = fname[0] + '.txt'
            f = open(fname, 'w')
            f.write(str(await response.text()))
            sem.release()
            # return (await response.text())

async def run(url_list):
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(2)
    # Create client session that will ensure we dont open new connection
    # per each request.
    async with ClientSession() as session:
        for url in url_list:
            task = asyncio.ensure_future(fetch(url, session, sem))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses

# making the url list here
url_list = []
file = open('url.txt', 'r')
for url in file:
    url_list.append(url)
print(url_list)

import time
old = time.time()
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(url_list))
loop.run_until_complete(future)
print(time.time() - old)
Here are some of the URLs from the url.txt file:
https://instagram.com/johanna_kre/?__a=1
https://instagram.com/channie_f/?__a=1
https://instagram.com/lilakuh68/?__a=1
https://instagram.com/nataliacallisto/?__a=1
https://instagram.com/edbastian/?__a=1
https://instagram.com/sylvana.h/?__a=1
https://instagram.com/munich_bombon/?__a=1
https://instagram.com/younotus/?__a=1
https://instagram.com/meet.herbert/?__a=1
https://instagram.com/inaaogo/?__a=1
https://instagram.com/dennisaogo/?__a=1
https://instagram.com/mrslight__/?__a=1
https://instagram.com/reneturrek/?__a=1
https://instagram.com/_eeasyyy/?__a=1
https://instagram.com/sentinobln/?__a=1
https://instagram.com/eri.ka_g/?__a=1
Your semaphore is not limiting the requests as you want it to; you should acquire it before making the request, not before processing the content.
With your current implementation you are making 100 concurrent requests (aiohttp's default client limit) but only processing the responses two at a time; at that point, from the server's perspective, the requests have already been handled.
Use:
async def fetch(url, session, sem):
    print("------")
    print(url)
    await sem.acquire()
    async with session.get(url=url) as response:
        print(await response.text())
        await response.text()
        ...
        sem.release()
...
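
An equivalent variant, if it helps, is to use the semaphore as an async context manager so it is released even when the request raises. This is only a sketch that keeps the question's file-naming scheme:

async def fetch(url, session, sem):
    print("------")
    print(url)
    # "async with sem" acquires before the request and always releases afterwards
    async with sem:
        async with session.get(url=url) as response:
            text = await response.text()
            if response.status == 200:
                fname = url[22:].split('/')[0] + '.txt'
                with open(fname, 'w') as f:
                    f.write(text)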

Can someone help explain why Python aiohttp returns more response content than requests.get?

Recently, I've been looking at the Python aiohttp lib, playing around with it and comparing it with Python requests. Here is the code:
import aiohttp
import asyncio
import requests

request_url = 'http://www.baidu.com'
requests_resp = requests.get(request_url)

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        aio_resp = await fetch(session, request_url)
        print('aio_resp_length =', len(aio_resp))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
print('requests_resp_length = ', len(requests_resp.text))
The response lengths differ hugely:
aio_resp_length = 152576
requests_resp_length = 2381
I'm not sure what happens in aiohttp's session.get, but the result is not always like this. When you change the request_url to http://www.example.com, the response lengths are the same. Can someone tell me what happened here?
Cheers
Because aiohttp has newlines in its response and requests doesn't.
You can check their responses like this:
print('requests_resp_length = ', requests_resp.text[0:100])
print('aio_resp_length =', aio_resp[0:100])
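
One quick way to check whether the gap is only whitespace is to compare the two bodies with all whitespace stripped out. This is just a diagnostic sketch; it assumes aio_resp has been made available at module level (e.g. returned from main()):

# compare lengths with all whitespace removed; if they still differ,
# the servers are returning genuinely different content
print('requests stripped length =', len("".join(requests_resp.text.split())))
print('aiohttp stripped length  =', len("".join(aio_resp.split())))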

RuntimeError Session is closed when trying to make async requests

First of all, here's the code:
import random
import asyncio
from aiohttp import ClientSession
import csv

headers = []

def extractsites(file):
    sites = []
    readfile = open(file, "r")
    reader = csv.reader(readfile, delimiter=",")
    raw = list(reader)
    for a in raw:
        sites.append((a[1]))
    return sites

async def fetchheaders(url, session):
    async with session.get(url) as response:
        responseheader = await response.headers
        print(responseheader)
        return responseheader

async def bound_fetch(sem, url, session):
    async with sem:
        print("doing request for " + url)
        await fetchheaders(url, session)

async def run():
    urls = extractsites("cisco-umbrella.csv")
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(100)
    async with ClientSession() as session:
        for i in urls:
            task = asyncio.ensure_future(bound_fetch(sem, "http://" + i, session))
            tasks.append(task)
        return tasks

def main():
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(run())
    loop.run_until_complete(future)

if __name__ == '__main__':
    main()
Most of this code was taken from this blog post:
https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
Here is the problem I'm facing: I am trying to read a million URLs from a file and then make an async request for each of them.
But when I try to execute the code above, I get the Session is closed error.
This is my line of thought:
I am relatively new to async programming, so bear with me.
My thought process was to create a long task list (allowing only 100 parallel requests) that I build in the run function and then pass as a future to the event loop to execute.
I have included a debug print in bound_fetch (which I copied from the blog post), and it looks like it loops over all the URLs I have; as soon as it should start making requests in the fetchheaders function, I get the runtime errors.
How do I fix my code?
A couple of things here.
First, in your run function you actually want to gather the tasks and await them there to fix your session issue, like so:
async def run():
    urls = ['google.com', 'amazon.com']
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(100)
    async with ClientSession() as session:
        for i in urls:
            task = asyncio.ensure_future(bound_fetch(sem, "http://" + i, session))
            tasks.append(task)
        await asyncio.gather(*tasks)
Second, the aiohttp API is a little odd in dealing with headers in that you can't await them. I worked around this by awaiting the body so that the headers are populated, and then returning them:
async def fetchheaders(url, session):
    async with session.get(url) as response:
        data = await response.read()
        responseheader = response.headers
        print(responseheader)
        return responseheader
There is some additional overhead in pulling the body, however; I couldn't find another way to load the headers without doing a body read.
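
If only the headers matter, another option that might avoid the body read entirely is a HEAD request. A sketch of a drop-in variant of fetchheaders; some servers answer HEAD differently from GET, so it's worth verifying against your targets:

async def fetchheaders(url, session):
    # HEAD asks the server for headers only, so there is no body to download
    async with session.head(url, allow_redirects=True) as response:
        print(response.headers)
        return response.headers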
