Resume download with pySmartDL - python

I would like to know whether, if the program stops while downloading a file with pySmartDL, it is possible to resume the download from where it stopped.

Is pause/unpause what you are looking for?
https://github.com/iTaybb/pySmartDL/blob/master/test/test_pySmartDL.py#L76
def test_pause_unpause(self, testfile=None):
    obj = pySmartDL.SmartDL(testfile if testfile else self.res_7za920_mirrors, dest=self.dl_dir, progress_bar=False, connect_default_logger=self.enable_logging)
    obj.start(blocking=False)

    while not obj.get_dl_size():
        time.sleep(0.1)

    # pause
    obj.pause()
    time.sleep(0.5)
    if obj.get_status() == "finished":
        # too bad, the file was too small and was downloaded completely before we stopped it.
        # We should download a bigger file
        if self.res_testfile_100mb == testfile:
            self.fail("The download got completed before we could stop it, even though we've used a big file. Are we on a 100GB/s internet connection or somethin'?")
        return self.test_pause_unpause(testfile=self.res_testfile_100mb)
    dl_size = obj.get_dl_size()

    # verify download has really stopped
    time.sleep(2.5)
    self.assertEqual(dl_size, obj.get_dl_size())

    # continue
    obj.unpause()
    time.sleep(2.5)
    self.assertNotEqual(dl_size, obj.get_dl_size())

    obj.wait()
    self.assertTrue(obj.isSuccessful())
More likely you want to resume a partially downloaded file, which has this issue open: https://github.com/iTaybb/pySmartDL/issues/14
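Since pySmartDL itself does not support resuming a partial download yet (that is what the issue above tracks), here is a minimal sketch of generic HTTP range resumption using the requests library, assuming the server honors Range headers; the URL and filenames are placeholders:

import os
import requests

def resume_download(url, dest, chunk_size=64 * 1024):
    # start from however many bytes are already on disk
    pos = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {'Range': 'bytes=%d-' % pos} if pos else {}
    r = requests.get(url, headers=headers, stream=True)
    r.raise_for_status()
    # 206 Partial Content means the server honored the Range header;
    # a plain 200 means it ignored it and is resending the whole file
    mode = 'ab' if r.status_code == 206 else 'wb'
    with open(dest, mode) as f:
        for chunk in r.iter_content(chunk_size=chunk_size):
            f.write(chunk)

resume_download('https://example.com/bigfile.zip', 'bigfile.zip')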

Related

Selenium (Python): How do I get the file path of each download process after the download completes, when using Pool?

I looked at other topics on almost the same issue, but they basically implement a function for downloading and waiting for one file and getting one path. That's all fine.
But what to do when the files are downloaded almost simultaneously through a Pool, and the same file in the directory is returned for each one?
Important: Chromium works in headless mode.
Right now I have the following functions working together:
import os
import time

def download_wait(path_to_downloads):
    seconds = 0
    download_waiting = True
    while download_waiting and seconds < 300:
        time.sleep(1)
        download_waiting = False
        for filename in os.listdir(path_to_downloads):
            if filename.endswith('.crdownload'):
                download_waiting = True
        seconds += 1
    print(f'File downloaded in {seconds} seconds')

def last_downloaded_file(download_dir):
    filename = max([f for f in os.listdir(download_dir)], key=lambda xa: os.path.getctime(os.path.join(download_dir, xa)))
    return filename
Any ideas for upgrading these functions? Each Pool worker needs to receive its own file.
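One common workaround, sketched below under the assumption that you can configure Chromium's download directory per worker (the make_driver helper and its names are hypothetical): give each Pool worker its own download directory, so last_downloaded_file can never pick up another worker's file.

import tempfile
from selenium import webdriver

def make_driver():
    # hypothetical helper: one throwaway download dir per worker
    download_dir = tempfile.mkdtemp(prefix='dl_worker_')
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_experimental_option('prefs', {'download.default_directory': download_dir})
    return webdriver.Chrome(options=options), download_dir

Each worker then runs download_wait and last_downloaded_file against its own download_dir, so the returned path is unambiguous.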

Why is os.path.getmtime() always running twice? It does not make any sense

Here is the code:
import os
import asyncio

async def func_placing_sell_orders():
    prev_final_stocks_list_state = os.path.getmtime('stock_data//final_stocks_list.json')
    print('i run once')
    while True:
        if (prev_final_stocks_list_state != os.path.getmtime('stock_data//final_stocks_list.json')):
            prev_final_stocks_list_state = os.path.getmtime('stock_data//final_stocks_list.json')
            print('here')

asyncio.get_event_loop().run_until_complete(func_placing_sell_orders())
Simplified version:
import os

def simple():
    state = os.path.getmtime('file.json')
    print('i run once')
    while True:
        if (state != os.path.getmtime('file.json')):
            state = os.path.getmtime('file.json')
            print('here')

simple()
This is the printout:
i run once
here
here
"here" gets printed twice every time I save the file. I checked the time between the previous and current modified times, and it is always different, which implies it should only run once per save.
This is so basic that I don't understand why I'm getting this result. Please send help.
If the file is large enough, maybe the first "here" fires while the file is still being written and the last "here" after the saving is done. Also, if you're using something like open("file", "w") to write the edits, the file is first truncated (first "here") and then written with the new data (second "here").
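A minimal sketch to observe that two-step write, assuming 'file.json' exists and is non-empty so the truncation registers as a modification:

import os
import time

before = os.path.getmtime('file.json')
with open('file.json', 'w') as f:  # opening with 'w' truncates: first mtime change
    time.sleep(1)
    f.write('{"new": "data"}')     # buffered; flushed when the block exits: second mtime change
print(before, os.path.getmtime('file.json'))  # the two timestamps differ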
You can ignore reports that come too fast (<1 s) with a simple timer:
import os
import time

state = os.path.getmtime('file.json')
lastEdit = time.time()
while True:
    if (state != os.path.getmtime('file.json')):
        state = os.path.getmtime('file.json')
        if time.time() - lastEdit > 1:
            print('here')
        lastEdit = time.time()

Reuse an opened connection in another script

I'm trying to build a project consisting of multiple Python files. The first file is called "startup.py" and is only responsible for opening connections to multiple routers and switches (each device allows only one connection at a time) and saving them to a list. This script should be running all the time so other files can use it.
#startup.py
def validate_connections_to_leaves():
    leaves = yaml_utils.load_yaml_file_from_directory("inventory", topology)["fabric_leaves"]
    leaves_connections = []
    for leaf in leaves:
        leaf_ip = leaf["ansible_host"]
        leaf_user = leaf["ansible_user"]
        leaf_pass = leaf["ansible_pass"]
        leaf_cnx = junos_utils.open_fabric_connection(host=leaf_ip, user=leaf_user, password=leaf_pass)
        if leaf_cnx:
            leaves_connections.append(leaf_cnx)
        else:
            log.script_logger(severity="ERROR", message="Unable to connect to Leaf", data=leaf_ip, debug=debug, indent=0)
    return leaves_connections

if __name__ == '__main__':
    leaves = validate_connections_to_leaves()
    pprint(leaves)
    # Keep script running
    while True:
        time.sleep(10)
Now I want to re-use these opened connections in other Python files without having to establish the connections again. If I just import startup.py into another file, it re-executes the startup script one more time.
Can anyone help me identify which part I'm missing here?
You should consider your startup.py file as your entry point where all the logic is. Your other files should be imported and used inside this file:
import otherfile1
import otherfile2
# import other files here

def validate_connections_to_leaves():
    # ...

if __name__ == '__main__':
    leaves = validate_connections_to_leaves()
    otherfile1.do_something_with_the_connection(leaves)
    # Keep script running
    while True:
        time.sleep(10)
And in your other file it will be simply:
def do_something_with_the_connection(leaves):
    # do something with the connections
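An alternative sketch, if you would rather keep the import direction as it was: cache the connections at module level behind an accessor, so importing the module never re-runs the connection logic and every importer shares the same list. The module and function names here are hypothetical:

# connections.py (hypothetical module name)
_leaves = None

def get_leaves():
    global _leaves
    if _leaves is None:
        # runs at most once per process, on first use rather than at import time
        _leaves = validate_connections_to_leaves()
    return _leaves

Note this only shares connections within a single process; separate scripts run as separate processes and cannot share live sockets this way.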

Python watchdog windows wait till copy finishes

I am using the Python watchdog module on a Windows 2012 server to monitor new files appearing on a shared drive. When watchdog notices the new file it kicks off a database restore process.
However, it seems that watchdog will attempt to restore the file the second it is created and not wait till the file has finished copying to the shared drive. So I changed the event to on_modified but there are two on_modified events, one when the file is initially being copied and one when it is finished being copied.
How can I handle the two on_modified events to only fire when the file being copied to the shared drive has finished?
What happens when multiple files are copied to the shared drive at the same time?
Here is my code:
import time
import subprocess
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewFile(FileSystemEventHandler):
    def process(self, event):
        if event.is_directory:
            return
        if event.event_type == 'modified':
            if getext(event.src_path) == 'gz':
                load_pgdump(event.src_path)

    def on_modified(self, event):
        self.process(event)

def getext(filename):
    "Get the file extension"
    file_ext = filename.split(".", 1)[1]
    return file_ext

def load_pgdump(src_path):
    restore = 'pg_restore command ' + src_path
    subprocess.call(restore, shell=True)

def main():
    event_handler = NewFile()
    observer = Observer()
    observer.schedule(event_handler, path='Y:\\', recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

if __name__ == '__main__':
    main()
In your on_modified event, just wait until the file is finished being copied by watching the file size.
Offering a simpler loop:
import os
import time

historicalSize = -1
while (historicalSize != os.path.getsize(filename)):
    historicalSize = os.path.getsize(filename)
    time.sleep(1)
print("file copy has now finished")
I'm using the following code to wait until the file is copied (Windows only):
from ctypes import windll
import time

def is_file_copy_finished(file_path):
    finished = False
    GENERIC_WRITE = 1 << 30
    FILE_SHARE_READ = 0x00000001
    OPEN_EXISTING = 3
    FILE_ATTRIBUTE_NORMAL = 0x80

    if isinstance(file_path, bytes):
        file_path_unicode = file_path.decode('utf-8')
    else:
        file_path_unicode = file_path

    h_file = windll.Kernel32.CreateFileW(file_path_unicode, GENERIC_WRITE, FILE_SHARE_READ, None, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None)
    if h_file != -1:
        windll.Kernel32.CloseHandle(h_file)
        finished = True

    print('is_file_copy_finished: ' + str(finished))
    return finished

def wait_for_file_copy_finish(file_path):
    while not is_file_copy_finished(file_path):
        time.sleep(0.2)

wait_for_file_copy_finish(r'C:\testfile.txt')
The idea is to try to open the file for writing in share-read mode. That will fail if someone else is still writing to it.
Enjoy ;)
I would add this as a comment, since it isn't an answer to your question but a different approach, but I don't have enough rep yet. You could try monitoring the file size; if it stops changing, you can assume the copy has finished:
import os
import time

copying = True
size2 = -1
while copying:
    size = os.path.getsize('name of file being copied')
    if size == size2:
        break
    else:
        size2 = os.path.getsize('name of file being copied')
        time.sleep(2)
On Linux you also get a close event, so the solution would be to wait to process the file until it gets closed.
My approach would be to add on_closed handling:
from watchdog.events import FileSystemEventHandler

class Handler(FileSystemEventHandler):
    def __init__(self):
        self.files_to_process = set()

    def dispatch(self, event):
        _method_map = {
            'created': self.on_created,
            'closed': self.on_closed,
        }
        handler = _method_map.get(event.event_type)
        if handler is not None:
            handler(event)

    def on_created(self, event):
        self.files_to_process.add(event.src_path)

    def on_closed(self, event):
        self.files_to_process.remove(event.src_path)
        actual_processing(event.src_path)
I had a similar issue recently with watchdog. A rather simple but not very smart workaround for me was to check the change of file size in a while loop, using a two-element list: one slot for 'past', one for 'now'. Once the values are equal, the copying is finished.
Edit: something like this:
import os
import time

value = [-1, os.path.getsize(file_path)]  # [past, now]
while True:
    time.sleep(1)
    # change: shift 'now' into 'past' and read the current size
    value = [value[1], os.path.getsize(file_path)]
    # test: an unchanged size between two checks means copying has finished
    if value[0] == value[1]:
        break
This works for me. Tested on Windows as well, with Python 3.7:
import os
import time

size_past = -1
while True:
    size_now = os.path.getsize(event.src_path)
    if size_now == size_past:
        log.debug("file has copied completely now size: %s", size_now)
        break
        # TODO: why sleep is not working here ?
    else:
        size_past = os.path.getsize(event.src_path)
        log.debug("file copying size: %s", size_past)
Old, I know, but I recently came up with a solution for this exact problem. In my case, I was only concerned with wav and mp3 files. This function ensures that only files that are completely copied get sent to makerCore(), because the placeholder files created during the copy have no extension and always end up in 'not ready'. Once the file is complete, it triggers the watchdog module again, this time with an extension. This works on multiple files simultaneously as well.
def on_created(event):
    # print(event)
    if str(event.src_path).endswith('.mp3') or str(event.src_path).endswith('.wav'):
        makerCore(event)
    else:
        print('not ready')
I am using a different approach that might not be the most elegant, but it is easy to do on any platform if you have control over the side copying the file.
Just add 'in-progress' to the name of the file until the copying is complete, then rename the file. You can then have a while loop waiting for the file with the name without 'in-progress' to exist, and you're good.
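A minimal sketch of that idea, assuming you control the copying side; copy_with_marker and wait_for_file are hypothetical names:

import os
import shutil
import time

def copy_with_marker(src, dst):
    # copy under a temporary name, then rename; the final name only
    # appears once all the bytes are on disk
    tmp = dst + '.in-progress'
    shutil.copy2(src, tmp)
    os.rename(tmp, dst)

def wait_for_file(dst, poll=1.0):
    # consumer side: block until the final name exists
    while not os.path.exists(dst):
        time.sleep(poll)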
I've tried the check-filesize, wait, check-again routine many have suggested above, but it's not very reliable. To make it work better, I added a check for whether the file is still locked:
import os
import time

file_done = False
file_size = -1

while file_size != os.path.getsize(file_path):
    file_size = os.path.getsize(file_path)
    time.sleep(1)

while not file_done:
    try:
        # renaming a file to itself fails while another process holds it open
        os.rename(file_path, file_path)
        file_done = True
    except OSError:
        time.sleep(1)
Following up on ravenwing's answer, more details about on_closed in watchdog can be found here.
As mentioned in the linked issue, there is no documentation available for on_closed yet, and it can only be used on Unix.

Webdriver open a file as soon as it finishes downloading

I'm writing some tests, and I'm using the Firefox webdriver with a FirefoxProfile to download a file from an external URL, but I need to read the file as soon as it finishes downloading to retrieve some specific data.
I set my profile and driver like this:
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", '/some/path/')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
ff = webdriver.Firefox(firefox_profile=fp)
Is there some way to know when the file finishes downloading, so that I know when to call the reader function without having to poll the download directory, waiting with time.sleep or using any Firefox add-on?
Thanks for any help :)
You could try hooking the file up to a file object as it downloads, using it like a stream buffer and polling it as it downloads to get the data you need, while monitoring for download completion yourself directly (either by waiting for the file to reach the expected size, or by assuming it is complete once no new data has been added for a certain amount of time).
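A minimal sketch of that last option, with hypothetical names and thresholds: treat the download as finished when the file reaches a known size, or when no new bytes have arrived for a few seconds:

import os
import time

def wait_until_complete(path, expected_size=None, quiet_secs=3, poll=0.5):
    last_size, last_change = -1, time.time()
    while True:
        size = os.path.getsize(path) if os.path.exists(path) else -1
        if expected_size is not None and size >= expected_size:
            return  # reached the size we were told to expect
        if size != last_size:
            last_size, last_change = size, time.time()  # still growing
        elif size >= 0 and time.time() - last_change >= quiet_secs:
            return  # no new bytes for quiet_secs; assume it is done
        time.sleep(poll)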
Edit:
You could try looking at the download-tracking database in the profile folder, as referenced here. It looks like you can wait for your file to reach status 1.
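For older Firefox versions that kept a downloads.sqlite database in the profile folder, that polling might look roughly like the sketch below; treat the database name, the moz_downloads table, and the meaning of state 1 as assumptions that depend on your Firefox version:

import os
import sqlite3
import time

def wait_for_download(profile_dir, filename, poll=0.5):
    db = os.path.join(profile_dir, 'downloads.sqlite')  # legacy location (assumption)
    while True:
        con = sqlite3.connect(db)
        row = con.execute(
            "SELECT state FROM moz_downloads WHERE target LIKE ?",
            ('%' + filename,)).fetchone()
        con.close()
        if row and row[0] == 1:  # 1 = finished in the legacy schema (assumption)
            return
        time.sleep(poll)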
I like to use inotify to watch for these kinds of events. Some example code:
import os

from pyinotify import (
    EventsCodes,
    ProcessEvent,
    Notifier,
    WatchManager,
)

class EventManager(ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        file_path = os.path.join(event.path, event.name)
        # do something to file, you might want to wait a second here and
        # also test for existence because ff might be making temp files

wm = WatchManager()
notifier = Notifier(wm, EventManager())
wdd = wm.add_watch('/some/path', EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE'], rec=True)

while True:
    try:
        notifier.process_events()
        if notifier.check_events():
            notifier.read_events()
    except:
        notifier.stop()
        raise
The notifier decides which method to call on the event manager based on the name of the event, so in this case we are only watching for IN_CLOSE_WRITE events.
It's far from ideal; however, with Firefox you can check the target folder for the presence of a .part file, which exists while the download is still in progress (with other browsers you can do something similar).
A while loop will then halt everything while waiting for the download to complete:
import os
import time

def test_for_partfile():
    part_file = False
    dir = "C:\\Downloads"
    filelist = (os.listdir(dir))
    for partfile in filelist:
        if partfile.endswith('.part'):
            part_file = True
    return part_file

while test_for_partfile():
    time.sleep(15)
