Python Telethon, how to resume a download media - python

Currently, to download files I'm using:
from telethon import TelegramClient

client = TelegramClient('anon', api_id, api_hash)

async def main():
    async for dialog in client.iter_messages(entity=peer_channel):
        await dialog.download_media("file....")

def bot():
    with client:
        client.loop.run_until_complete(main())
        client.start()
        client.run_until_disconnected()

if __name__ == "__main__":
    bot()
But sometimes I lose the connection, because of my network or a Telegram disconnection, or because I need to restart the script. Here is a log:
INFO:root:Download total: 99% 1048.50 mb/1057.82 mb tmp/Evil/2x12 Evil.rar
INFO:telethon.network.mtprotosender:Disconnecting from xxx.xxx.xxx.xxx:443/TcpFull...
INFO:telethon.network.mtprotosender:Disconnection from xxx.xxx.xxx.xxx:443/TcpFull complete!
INFO:telethon.network.mtprotosender:Connecting to xxx.xxx.xxx.xxx:443/TcpFull...
INFO:telethon.network.mtprotosender:Connection to xxx.xxx.xxx.xxx:443/TcpFull complete!
INFO:root:[DEBUG] DOWNLOADING /export/nasty/tmp/Evil/2x12 Evil.rar
INFO:telethon.client.downloads:Starting direct file download in chunks of 524288 at 0, stride 524288
The "INFO:root" lines are messages written by me using logging.info(....).
It's frustrating: the file was at 99%, but the connection failed and the download had to restart from zero.
Reading the documentation, I found client.iter_download.
I tried:
async def main():
    async for dialog in client.iter_messages(entity=peer_channel):
        # The filename should be generated from dialog.media; this is just an example
        with open('file.rar', 'wb') as fd:
            async for chunk in client.iter_download(dialog.media):
                fd.write(chunk)
But the result is the same: if I stop the script, the download starts from zero.

iter_download is the correct way; however, you need to specify the resume offset manually (and you should open the file in append mode):
import os

file = 'file.rar'
try:
    offset = os.path.getsize(file)  # bytes already downloaded by a previous run
except OSError:
    offset = 0  # file does not exist yet

# Inside your async main(), where `dialog` is the current message:
with open(file, 'ab') as fd:  # 'ab' = append, keep what is already on disk
    async for chunk in client.iter_download(dialog.media, offset=offset):  # resume from offset
        fd.write(chunk)
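The snippet above resumes only when you rerun the script. A minimal sketch that also retries within the same run, recomputing the offset from the file size before each attempt, could look like this. `resume_download`, `max_attempts` and `retry_delay` are hypothetical names; the only Telethon call used is `iter_download(media, offset=...)` from the answer. Catching `ConnectionError` and `asyncio.TimeoutError` is an assumption about how a dropped connection surfaces, so adjust it to the errors you actually see in your logs.

```python
import asyncio
import os


async def resume_download(client, media, path, max_attempts=5, retry_delay=1.0):
    """Download `media` to `path`, continuing from whatever is already on disk.

    `client` is assumed to be a connected Telethon TelegramClient; anything with
    an async `iter_download(media, offset=...)` generator works the same way.
    Returns the final file size in bytes.
    """
    for attempt in range(1, max_attempts + 1):
        # Resume point = bytes written by earlier runs or earlier attempts.
        offset = os.path.getsize(path) if os.path.exists(path) else 0
        try:
            with open(path, 'ab') as fd:  # append, never truncate
                async for chunk in client.iter_download(media, offset=offset):
                    fd.write(chunk)
            return os.path.getsize(path)
        except (ConnectionError, asyncio.TimeoutError):
            # Assumption: these are the exceptions a dropped connection raises.
            if attempt == max_attempts:
                raise
            await asyncio.sleep(retry_delay)
```

Inside the question's `main()`, the loop body would become `await resume_download(client, dialog.media, 'file.rar')`. Because the file is closed (and flushed) when an attempt fails, the next `os.path.getsize()` sees every chunk that was written.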

Related

How to pass a video uploaded via FastAPI to OpenCV VideoCapture?

I am trying to upload an mp4 video file using UploadFile in FastAPI.
However, the uploaded format is not readable by OpenCV (cv2).
This is my endpoint:
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()

@app.post("/video/test", response_class=PlainTextResponse)
async def detect_faces_in_video(video_file: UploadFile):
    contents = await video_file.read()
    print(type(video_file))  # <class 'starlette.datastructures.UploadFile'>
    print(type(contents))  # <class 'bytes'>
    return ""
Neither of the two formats (i.e., bytes and UploadFile) is readable by OpenCV.
You are trying to pass either the file contents (bytes) or the UploadFile object; however, VideoCapture() accepts only a video filename, a capturing device, or an IP video stream.
UploadFile is basically a SpooledTemporaryFile (a file-like object) that operates similarly to a TemporaryFile. However, it does not have a visible name in the file system. As you mentioned that you wouldn't be keeping the files on the server after processing them, you could copy the file contents to a NamedTemporaryFile that "has a visible name in the file system, which can be used to open the file" (using the name attribute), as described here and here. As per the documentation:
Whether the name can be used to open the file a second time, while the
named temporary file is still open, varies across platforms (it can be
so used on Unix; it cannot on Windows). If delete is true (the
default), the file is deleted as soon as it is closed.
Hence, on Windows you need to set the delete argument to False when instantiating a NamedTemporaryFile, and once you are done with it, you can manually delete it, using the os.remove() or os.unlink() method.
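As a sketch of that approach, the copy can be streamed with `shutil.copyfileobj` so a large upload is never read into memory in one go, and the temporary file is closed before its name is handed out, which sidesteps the Windows reopen restriction quoted above. The helper `spooled_to_named_file` and its parameters are hypothetical, not part of FastAPI; it only needs a readable binary file object such as `UploadFile.file`.

```python
import contextlib
import os
import shutil
from tempfile import NamedTemporaryFile


@contextlib.contextmanager
def spooled_to_named_file(fileobj, suffix=".mp4", chunk_size=1024 * 1024):
    """Copy a binary file-like object to a named temp file and yield its path.

    delete=False lets the path be reopened (e.g. by cv2.VideoCapture) on every
    platform; the file is removed explicitly when the block exits.
    """
    temp = NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with temp:
            # Copy chunk by chunk instead of calling fileobj.read() once.
            shutil.copyfileobj(fileobj, temp, chunk_size)
        # The temp file is closed at this point, so its name can be opened again.
        yield temp.name
    finally:
        os.remove(temp.name)
```

In a `def` endpoint this would be used as `with spooled_to_named_file(file.file) as path: res = process_video(path)`, where `process_video` is the function from the options that passes the path to `VideoCapture()`.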
Below are two options for doing that. Option 1 implements a solution using a def endpoint, while Option 2 uses an async def endpoint (utilising the aiofiles library). For the difference between def and async def, please have a look at this answer. If you expect users to upload large files that wouldn't fit into memory, have a look at this and this answer on how to read the uploaded video file in chunks instead.
Option 1 - Using def endpoint
from fastapi import FastAPI, File, UploadFile
from tempfile import NamedTemporaryFile
import os

app = FastAPI()

# process_video() is your own function that opens the path with cv2.VideoCapture()
@app.post("/video/detect-faces")
def detect_faces(file: UploadFile = File(...)):
    temp = NamedTemporaryFile(delete=False)
    try:
        try:
            contents = file.file.read()
            with temp as f:
                f.write(contents)
        except Exception:
            return {"message": "There was an error uploading the file"}
        finally:
            file.file.close()

        res = process_video(temp.name)  # Pass temp.name to VideoCapture()
    except Exception:
        return {"message": "There was an error processing the file"}
    finally:
        #temp.close()  # the `with` statement above takes care of closing the file
        os.remove(temp.name)

    return res
Option 2 - Using async def endpoint
from fastapi import FastAPI, File, UploadFile
from fastapi.concurrency import run_in_threadpool
import aiofiles
import os

app = FastAPI()

@app.post("/video/detect-faces")
async def detect_faces(file: UploadFile = File(...)):
    try:
        async with aiofiles.tempfile.NamedTemporaryFile("wb", delete=False) as temp:
            try:
                contents = await file.read()
                await temp.write(contents)
            except Exception:
                return {"message": "There was an error uploading the file"}
            finally:
                await file.close()

        # The temp file is closed here; process_video() runs in a worker thread
        res = await run_in_threadpool(process_video, temp.name)  # Pass temp.name to VideoCapture()
    except Exception:
        return {"message": "There was an error processing the file"}
    finally:
        os.remove(temp.name)

    return res

open external file through wave.open in python

I'm using wave.open to open a file. If I give a local path:
async def run_test(uri):
    async with websockets.connect(uri) as websocket:
        wf = wave.open('test.wav', "rb")
then it works, but if I give an external path, it does not:
async def run_test(uri):
    async with websockets.connect(uri) as websocket:
        wf = wave.open('http://localhost:8000/storage/uploads/test.wav', "rb")
I get this error:
OSError: [Errno 22] Invalid argument: 'http://localhost:8000/storage/uploads/test.wav'
Yeah, wave.open() doesn't know anything about HTTP.
You'll need to download the file first, e.g. with requests (or aiohttp or httpx since you're in async land).
import io, requests, wave
resp = requests.get('http://localhost:8000/storage/uploads/test.wav')
resp.raise_for_status()
bio = io.BytesIO() # file-like memory buffer
bio.write(resp.content) # todo: if file can be large, maybe use streaming
bio.seek(0) # seek to the start of the file
wf = wave.open(bio, "rb") # wave.open accepts file-like objects
This assumes the file is small enough to fit in memory; if it's not, you'd want to use tempfile.NamedTemporaryFile instead.
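A sketch of that streaming variant, swapping in the standard library's `urllib.request` for `requests` so it has no third-party dependency. Since `wave.open()` accepts file objects, an anonymous `tempfile.TemporaryFile` is enough; `NamedTemporaryFile` is only needed when something insists on a path. `open_remote_wav` is a hypothetical helper name.

```python
import contextlib
import shutil
import tempfile
import urllib.request
import wave


@contextlib.contextmanager
def open_remote_wav(url, chunk_size=64 * 1024):
    """Stream a WAV file from `url` to disk and yield a wave.Wave_read for it."""
    with tempfile.TemporaryFile() as tmp:
        # urlopen raises HTTPError for error statuses, like raise_for_status().
        with urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp, chunk_size)  # chunked, not all in memory
        tmp.seek(0)  # rewind so wave reads the header from the start
        wf = wave.open(tmp, "rb")
        try:
            yield wf
        finally:
            wf.close()  # closes the reader; the temp file is closed by `with`
```

Note that `urlopen()` blocks the event loop while it downloads, so inside the question's `async` code this is only acceptable for small files; otherwise use aiohttp or httpx as the answer suggests.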

Azure python asyncio memory exhaustion in upload_blob function

I have an asyncio python azure script that uses multiple tasks to upload files to blobs from an asyncio queue. It works fine, at least up until the point where it uses up all available memory on the system. I can't figure out where the memory leak is. Normally I use memory-profiler, but this doesn't seem to work with async functions.
Can someone tell me either what I'm doing wrong here, or else what the best way would be to find out where the issue lies? Thanks. It's not clear to me what is not being cleaned up, if anything.
I put anywhere from a few hundred to a few thousand files on the work queue, and usually run with 3-5 tasks. Within the space of a couple of minutes this program uses up anywhere from 3 to 6GB of resident memory and then starts eating into swap until, if it runs long enough, it gets killed from memory starvation. This is on a linux box with 8GB memory using Python 3.6.8 and the following azure libraries:
azure-common 1.1.25
azure-core 1.3.0
azure-identity 1.3.0
azure-nspkg 3.0.2
azure-storage-blob 12.3.0
import asyncio
import os
import traceback

from azure.identity.aio import ClientSecretCredential
from azure.storage.blob.aio import BlobClient

async def uploadBlobsTask(taskName, args, workQueue):
    while not workQueue.empty():
        fileName = await workQueue.get()
        blobName = fileName.replace(args.sourceDirPrefix, '')
        blobClient = BlobClient(
            "https://{}.blob.core.windows.net".format(args.accountName),
            credential = args.creds,
            container_name = args.container,
            blob_name = blobName,
        )
        async with blobClient:
            args.logger.info("Task {}: uploading {} as {}".format(taskName, fileName, blobName))
            try:
                with open(fileName, "rb") as data:
                    await blobClient.upload_blob(data, overwrite=True)
                fileNameMoved = fileName + '.moved'
                with open(fileNameMoved, "w") as fm:
                    fm.write("")
            except KeyboardInterrupt:
                raise
            except:
                args.logger.error("Task {}: {}".format(taskName, traceback.format_exc()))
                await workQueue.put(fileName)
            finally:
                workQueue.task_done()

async def processFiles(args):
    workQueue = asyncio.Queue()
    for (path, dirs, files) in os.walk(args.sourceDir):
        for f in files:
            fileName = os.path.join(path, f)
            await workQueue.put(fileName)
    creds = ClientSecretCredential(args.tenant, args.appId, args.password)
    args.creds = creds
    tasks = [ args.loop.create_task(uploadBlobsTask(str(i), args, workQueue)) for i in range(1, args.tasks+1) ]
    await asyncio.gather(*tasks)
    await creds.close()

# `args` (argparse options plus a logger) is built elsewhere; not shown here
loop = asyncio.get_event_loop()
args.loop = loop
loop.run_until_complete(processFiles(args))
loop.close()
For what it's worth, I seem to have managed to fix this so that it works without memory leaks. I did this by obtaining a containerClient and then obtaining blobClients from it (i.e., containerClient.get_blob_client()) instead of creating BlobClient objects directly. Now the overall memory usage tops out at a very low level rather than growing continuously as before.
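A sketch of that fix, also swapping the `while not workQueue.empty()` loop for long-lived workers that the caller cancels after `queue.join()`. One plausible reason the fix works (the thread does not confirm it) is that blob clients obtained from `ContainerClient.get_blob_client()` reuse the container client's HTTP pipeline, instead of each `BlobClient` opening its own. `upload_worker`, `upload_all` and the `failed` list are hypothetical names; the code only calls `get_blob_client()` and `upload_blob()`, so it does not import Azure itself. `asyncio.ensure_future` is used because the question runs Python 3.6, which lacks `asyncio.create_task`.

```python
import asyncio
import logging

log = logging.getLogger(__name__)


async def upload_worker(container_client, queue, prefix, failed):
    """Upload files taken from `queue` until cancelled, via one container client."""
    while True:
        file_name = await queue.get()
        try:
            blob_name = file_name.replace(prefix, "", 1)
            # Assumption: child clients share the container client's pipeline.
            blob_client = container_client.get_blob_client(blob_name)
            with open(file_name, "rb") as data:
                await blob_client.upload_blob(data, overwrite=True)
        except Exception:
            log.exception("upload failed: %s", file_name)
            failed.append(file_name)  # collected, not re-queued forever
        finally:
            queue.task_done()


async def upload_all(container_client, file_names, prefix, n_tasks=4):
    """Upload every file with `n_tasks` workers; return the files that failed."""
    queue = asyncio.Queue()
    for file_name in file_names:
        queue.put_nowait(file_name)
    failed = []
    workers = [
        asyncio.ensure_future(upload_worker(container_client, queue, prefix, failed))
        for _ in range(n_tasks)
    ]
    await queue.join()  # returns once every file has been handled
    for worker in workers:
        worker.cancel()  # workers would otherwise wait on queue.get() forever
    await asyncio.gather(*workers, return_exceptions=True)
    return failed
```

With the real SDK the caller would open the container client once, e.g. `async with ContainerClient(account_url, container_name=args.container, credential=creds) as container_client:` followed by `failed = await upload_all(container_client, files, args.sourceDirPrefix)`.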

Salesforce Bulk API batch download with Python

I have an async Python script that creates a bulk API job/batch in Salesforce. After the batch is complete I then download the csv file for processing.
Here's my problem: A streaming download for a ~300 MB csv file using Python can take 3+ minutes using this asynchronous code:
If you're familiar with Salesforce bulk jobs, you can enter your information into the variables below and download your batch results for testing. This is a working example, provided you enter the necessary information.
import asyncio, aiohttp, aiofiles
from simple_salesforce import Salesforce
from credentials import credentials as cred

sf_data_path = 'C:/Users/[USER NAME]/Desktop/'
job_id = '[18 DIGIT JOB ID]'
batch_id = '[18 DIGIT BATCH ID]'
result_id = '[18 DIGIT RESULT ID]'
instance_name = '[INSTANCE NAME]'
result_url = f'https://{instance_name}.salesforce.com/services/async/45.0/job/{job_id}/batch/{batch_id}/result/{result_id}'

sf = Salesforce(username='SALESFORCE USERNAME',
                password='SALESFORCE PASSWORD',
                security_token='SALESFORCE SECURITY TOKEN',
                organizationId='SALESFORCE ORGANIZATION ID')

async def download_results():
    err = None
    retries = 3
    status = 'Not Downloaded'
    for _ in range(retries):
        try:
            async with aiohttp.ClientSession() as session:
                async with session.get(url=result_url,
                                       headers={"X-SFDC-Session": sf.session_id, 'Content-Encoding': 'gzip'},
                                       timeout=300) as resp:
                    async with aiofiles.open(f'{sf_data_path}_DOWNLOAD_.csv', 'wb') as outfile:
                        while True:
                            chunk = await resp.content.read(10485760)  # = 10 MB
                            if not chunk:
                                break
                            await outfile.write(chunk)
                        status = 'Downloaded'
        except Exception as e:
            err = e
            retries -= 1
            status = 'Retrying'
            continue
        else:
            break  # success: leave the retry loop
    else:
        status = 'Failed'  # every attempt raised
    return err, status, retries

asyncio.run(download_results())
However, if I download the result of the batch in the Developer Workbench: https://workbench.developerforce.com/asyncStatus.php?jobId='[18 DIGIT JOB ID]' the same file might download in 5 seconds.
There is obviously something going on here that I'm missing. I know that the Workbench uses PHP, is this functionality even available with Python? I figured the async calls would make this download quickly, but that doesn't seem to make it download as fast as the functionality in the browser. Any ideas?
Thanks!
You can try a curl request to get the CSV. This method is as fast as what you see in the Workbench.
You can read more here:
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/query_walkthrough.htm
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/query_get_job_results.htm#example_locator
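A sketch of the curl route driven from Python via `subprocess`. The helper names are hypothetical; the flags are standard curl options: `-sS` hides the progress bar but still prints errors, `--fail` makes curl exit non-zero on HTTP errors, and `--compressed` requests a compressed response and decompresses it transparently. The `X-SFDC-Session` header is the one from the question. The thread does not establish why the Workbench is faster, so treat this as a way to try the suggestion, not a diagnosis.

```python
import subprocess


def build_curl_command(url, session_id, out_path):
    """Return the curl argv list for downloading one Bulk API result."""
    return [
        "curl", "-sS", "--fail", "--compressed",
        "-H", f"X-SFDC-Session: {session_id}",
        "-o", out_path,
        url,
    ]


def download_with_curl(url, session_id, out_path):
    # check=True raises CalledProcessError if curl exits with a non-zero status.
    subprocess.run(build_curl_command(url, session_id, out_path), check=True)
```

With the question's variables this would be `download_with_curl(result_url, sf.session_id, f'{sf_data_path}_DOWNLOAD_.csv')`. Inside async code, the same argv list can be passed to `asyncio.create_subprocess_exec(*cmd)` so the event loop is not blocked.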

Kinesis Video stream async get frames

I would like to capture a video stream from AWS Kinesis and use asyncio. My goal is to extract frames from the stream and pass them to processing queue.
Below is a working example of what I have so far. It receives chunks of data of various lengths (anywhere from, say, 76 up to 8192 bytes), which do not seem to be complete frames.
Is there a cheap way to split the stream into complete frames with asyncio support, preferably without threads?
I want one application to handle at least 10-20 streams and to have one application per CPU running on the server.
I was thinking about ffmpeg and OpenCV, but they seem too heavyweight and do not seem to be very compatible with asyncio.
import asyncio
import aiobotocore

VIDEO_STREAM_NAME = 'bc-test1'

async def get_data2(loop):
    chunk_size = 1024 * 1024 * 500
    session = aiobotocore.get_session(loop=loop)
    async with session.create_client('kinesisvideo', region_name='us-west-2', ) as client:
        resp = await client.get_data_endpoint(
            StreamName=VIDEO_STREAM_NAME,
            APIName='GET_MEDIA',
        )
        data_url = resp['DataEndpoint']

    async with session.create_client('kinesis-video-media', endpoint_url=data_url) as client:
        resp = await client.get_media(
            StreamName=VIDEO_STREAM_NAME,
            StartSelector={"StartSelectorType": "NOW", },
        )
        print(resp)
        while True:
            data = await resp['Payload'].read(1024 * 8)
            if data:
                print("frame len: %s" % len(data))
            else:
                print("No data")
                break

def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(get_data2(loop))

if __name__ == '__main__':
    main()
I would also consider advice to not use asyncio, if another solution satisfies the requirements above.
Does it have to be python?
KinesisVideoStream has examples in Java to consume the stream, parse the data and decode the video: https://github.com/aws/amazon-kinesis-video-streams-parser-library
KinesisVideoStream also has KIT (the Kinesis Video Streams Inference Template for Amazon SageMaker): https://aws.amazon.com/blogs/machine-learning/analyze-live-video-at-scale-in-real-time-using-amazon-kinesis-video-streams-and-amazon-sagemaker/
It can be used to process video streams in scale.
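If you stay in Python with asyncio, a cheap first step is reassembling the arbitrarily sized `read()` results into whole MKV fragments; decoding those fragments into individual video frames still needs a codec library. The sketch below is a naive delimiter-based splitter, not an EBML parser, and it assumes that every fragment in the GetMedia payload starts with its own EBML header (element ID `0x1A45DFA3`) and that those four bytes never occur inside frame data. `iter_mkv_fragments` is a hypothetical name.

```python
import asyncio

# EBML header element ID: the first four bytes of a Matroska/MKV document.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"


async def iter_mkv_fragments(payload, read_size=8 * 1024):
    """Yield byte blobs that each start at EBML_MAGIC.

    `payload` is anything with an async read(n) returning b"" at end of stream,
    such as resp['Payload'] from get_media in the question.
    """
    buf = bytearray()
    eof = False
    while not eof:
        data = await payload.read(read_size)
        eof = not data
        buf += data
        first = buf.find(EBML_MAGIC)
        if first == -1:
            # No header yet: keep 3 bytes in case the ID is split across reads.
            del buf[:-3]
            continue
        del buf[:first]  # drop bytes before the first header (joined mid-fragment)
        # Each later header closes the fragment that precedes it.
        while True:
            nxt = buf.find(EBML_MAGIC, len(EBML_MAGIC))
            if nxt == -1:
                break
            yield bytes(buf[:nxt])
            del buf[:nxt]
    if buf.startswith(EBML_MAGIC):
        yield bytes(buf)  # the last fragment ends at end of stream
```

In the question's loop this would replace the manual `read()` calls: `async for fragment in iter_mkv_fragments(resp['Payload']): ...`, with each fragment pushed onto the processing queue. A robust version would parse EBML element sizes instead of searching for the magic bytes.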
