tqdm: extract time passed + time remaining? - python

I have been going over the tqdm docs, but no matter where I look, I cannot find a method by which to extract the time passed and estimated time remaining fields (basically the center of the progress bar on each line: 00:00<00:02).
0%| | 0/200 [00:00<?, ?it/s]
4%|▎ | 7/200 [00:00<00:02, 68.64it/s]
8%|▊ | 16/200 [00:00<00:02, 72.87it/s]
12%|█▎ | 25/200 [00:00<00:02, 77.15it/s]
17%|█▋ | 34/200 [00:00<00:02, 79.79it/s]
22%|██▏ | 43/200 [00:00<00:01, 79.91it/s]
26%|██▌ | 52/200 [00:00<00:01, 80.23it/s]
30%|███ | 61/200 [00:00<00:01, 82.13it/s]
....
100%|██████████| 200/200 [00:02<00:00, 81.22it/s]
tqdm essentially works by reprinting a dynamic progress bar whenever an update occurs, but is there a way to extract "just" the 00:01 and 00:02 portions, so I could use them elsewhere in my Python program, such as in automatic stopping code that halts the process if it is taking too long?

tqdm objects expose some information via the public property format_dict.
from tqdm import tqdm

# for i in tqdm(iterable):
with tqdm(iterable) as t:
    for i in t:
        ...
    elapsed = t.format_dict['elapsed']
    elapsed_str = t.format_interval(elapsed)
Otherwise you could parse str(t).split()
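For instance, a rough sketch of parsing the rendered bar string (this relies on tqdm's default display format, so treat it as fragile):
bar = str(t)  # e.g. " 26%|██▌       | 52/200 [00:00<00:01, 80.23it/s]"
times = bar[bar.index("[") + 1 : bar.index(",")]  # -> "00:00<00:01"
elapsed_str, remaining_str = times.split("<")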

You can get elapsed and remaining time from format_dict and some calculations.
t = tqdm(total=100)
...
elapsed = t.format_dict["elapsed"]
rate = t.format_dict["rate"]
remaining = (t.total - t.n) / rate if rate and t.total else 0  # in seconds
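As a minimal sketch of how this could feed the automatic-stopping idea from the question (the range(200) loop and the 5-second budget are just placeholders):
from tqdm import tqdm
import time

with tqdm(total=200) as t:
    for i in range(200):
        time.sleep(0.01)  # stand-in for the real work
        t.update(1)
        rate = t.format_dict["rate"]
        remaining = (t.total - t.n) / rate if rate and t.total else 0
        if remaining > 5:  # stop early if the estimate exceeds the budget
            break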

Here's the answer to the time remaining and time elapsed question:
from tqdm import tqdm
from time import sleep

with tqdm(total=100, bar_format="{l_bar}{bar} [ time left: {remaining}, time spent: {elapsed}]") as pbar:
    for i in range(100):
        pbar.update(1)
        sleep(0.01)
If the values need to be used or printed elsewhere:
elapsed = pbar.format_dict["elapsed"]
rate = pbar.format_dict["rate"]
remaining = (pbar.total - pbar.n) / rate if rate else 0  # format_dict has no "remaining" key; derive it from the rate

Edit: see the library maintainer's answer above (the one using format_dict); it turns out this information is available through the public API.
tqdm does not expose that information as part of its public API, and I don't recommend trying to hack your own into it. Then you would be depending on implementation details of tqdm that might change at any time.
However, that shouldn't stop you from writing your own. It's easy enough to instrument a loop with a timer, and you can then abort the loop if it takes too long. Here's a quick, rough example that still uses tqdm to provide visual feedback:
import time
from tqdm import tqdm

def long_running_function(n, timeout=5):
    start_time = time.time()
    for _ in tqdm(list(range(n))):
        time.sleep(1)  # doing some expensive work...
        elapsed_time = time.time() - start_time
        if elapsed_time > timeout:
            raise TimeoutError("long_running_function took too long!")

long_running_function(100, timeout=10)
If you run this, the function will stop its own execution after 10 seconds by raising an exception. You could catch this exception at the call site and respond to it in whatever way you deem appropriate.
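For example, catching it at the call site could look like this (a small sketch using the function defined above):
try:
    long_running_function(100, timeout=10)
except TimeoutError as exc:
    print(f"Aborted: {exc}")  # or clean up, retry, log, ...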
If you want to be clever, you could even factor this out into a tqdm-like wrapper:
def timed_loop(iterator, timeout):
    start_time = time.time()
    iterator = iter(iterator)
    while True:
        elapsed_time = time.time() - start_time
        if elapsed_time > timeout:
            raise TimeoutError("long_running_function took too long!")
        try:
            yield next(iterator)
        except StopIteration:
            return

def long_running_function(n, timeout=5):
    for _ in timed_loop(tqdm(list(range(n))), timeout=timeout):
        time.sleep(0.1)

long_running_function(100, timeout=5)

Related

Check every X-milliseconds if process/application is running on Win

I want to check every 500 milliseconds whether a process/application is running (Windows 10). The code should be very fast and resource efficient!
My code is below, but how do I build in the 500 millisecond interval? Is psutil the fastest and best way? Thank you.
import psutil

for p in psutil.process_iter(attrs=['pid', 'name']):
    if "excel.exe" in (p.info['name']).lower():
        print("Application is running", (p.info['name']).lower())
    else:
        print("Application is not Running")
How about doing it like this:
import psutil
import time

def running(pname):
    pname = pname.lower()
    for p in psutil.process_iter(attrs=['name']):
        if pname in p.info['name'].lower():
            print(f'{pname} is running')
            return  # early return
    print(f'{pname} is not running')

while True:
    running('excel.exe')
    time.sleep(0.5)
First of all, psutil is a pretty good library. It has C bindings, so you won't get much faster than that.
import psutil
import time

def print_app():
    present = False
    for p in psutil.process_iter(attrs=['pid', 'name']):
        if "excel.exe" in (p.info['name']).lower():
            present = True
    print(f"Application is {'' if present else 'not '}present")

start_time = time.time()
print_app()
print("--- %s seconds ---" % (time.time() - start_time))
This tells you how long the check itself takes: about 0.06 s for me.
If you want to run it every 0.5 s, you can simply add a time.sleep, because 0.5 s is far longer than 0.06 s.
You can then write this kind of code:
import psutil
import time

def print_app():
    present = False
    for p in psutil.process_iter(attrs=['pid', 'name']):
        if "excel.exe" in (p.info['name']).lower():
            present = True
    print(f"Application is {'' if present else 'not '}present")

while True:
    print_app()
    time.sleep(0.5)
PS: I changed your code to check if your app was running without printing it. This makes the code faster because print takes a bit of time.

Trying to add throttle control to parallel API calls in Python

I am using the Google Places API, which has a query-per-second limit of 10. This means I cannot make more than 10 requests within a second. With serial execution this wouldn't be an issue, since the API's average response time is 250 ms, so I would only manage about 4 calls per second.
To utilize the entire 10 QPS limit I used multithreading and made parallel API calls. But now I need to control the number of calls that can happen in a second; it should not go beyond 10 (the Google API starts throwing errors if I cross the limit).
Below is the code that I have so far. I am not able to figure out why the program sometimes just gets stuck, or takes a lot longer than required.
import time
from datetime import datetime
import random
from threading import Lock
from concurrent.futures import ThreadPoolExecutor as pool
import concurrent.futures
import requests
import matplotlib.pyplot as plt
from statistics import mean
from ratelimiter import RateLimiter

def make_parallel(func, qps=10):
    lock = Lock()
    threads_execution_que = []
    limit_hit = False

    def qps_manager(arg):
        current_second = time.time()
        lock.acquire()
        if len(threads_execution_que) >= qps or limit_hit:
            limit_hit = True
            if current_second - threads_execution_que[0] <= 1:
                time.sleep(current_second - threads_execution_que[0])
        current_time = time.time()
        threads_execution_que.append(current_time)
        lock.release()

        res = func(arg)

        lock.acquire()
        threads_execution_que.remove(current_time)
        lock.release()
        return res

    def wrapper(iterable, number_of_workers=12):
        result = []
        with pool(max_workers=number_of_workers) as executer:
            bag = {executer.submit(func, i): i for i in iterable}
            for future in concurrent.futures.as_completed(bag):
                result.append(future.result())
        return result

    return wrapper

@make_parallel
def api_call(i):
    min_func_time = random.uniform(.25, .3)
    start_time = time.time()
    try:
        response = requests.get('https://jsonplaceholder.typicode.com/posts', timeout=1)
    except Exception as e:
        response = e
    if (time.time() - start_time) - min_func_time < 0:
        time.sleep(min_func_time - (time.time() - start_time))
    return response

api_call([1]*50)
Ideally the code should take no more than 1.5 seconds, but currently it takes about 12-14 seconds.
The script speeds up to its expected speed as soon as I remove the QPS manager logic.
Please suggest what I am doing wrong, and also whether there is a package available that already provides this mechanism out of the box.
Looks like ratelimit does just that:
from ratelimit import limits, sleep_and_retry

@make_parallel
@sleep_and_retry
@limits(calls=10, period=1)
def api_call(i):
    try:
        response = requests.get("https://jsonplaceholder.typicode.com/posts", timeout=1)
    except Exception as e:
        response = e
    return response
EDIT: I did some testing and it looks like @sleep_and_retry is a little too optimistic, so just increase the period a little, to 1.2 seconds:
from datetime import datetime, timedelta

s = datetime.now()
api_call([1] * 50)
elapsed_time = datetime.now() - s
print(elapsed_time > timedelta(seconds=50 / 10))

Python requests lib is taking way longer than it should to do a get request

So I have this code. Whenever I run the code, and it gets to line 3, it takes about 20 whole seconds to do the get request. There is no reason it should be taking this long, and it's consistently taking long every time. Any help?
def get_balance(addr):
    try:
        r = requests.get("http://blockexplorer.com/api/addr/"+addr+"/balance")
        return int(r.text)/10000000
    except:
        return "e"
It works for me most of the time.
>>> def get_balance(addr):
...     try:
...         start = time.time()
...         r = requests.get("http://blockexplorer.com/api/addr/"+addr+"/balance")
...         end = time.time()
...         print(f"took {end - start} seconds")
...         print(r.text, "satoshis")
...         return int(r.text)/100000000
...     except:
...         return "e"
...
>>>
>>> get_balance("1HB5XMLmzFVj8ALj6mfBsbifRoD4miY36v")
took 0.7754228115081787 seconds
151881086 satoshis
15.1881086
But if I do this enough times in a row, I'll occasionally get the error "Bitcoin JSON-RPC: Work queue depth exceeded. Code:429"
Print out r.text like I did, and that might show you an error message from Block Explorer. It might be that they have started rate-limiting you.
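If that turns out to be the case, a minimal sketch of surfacing the failure instead of swallowing it (same endpoint as above; the timeout value and the error handling are just illustrative):
import requests

def get_balance(addr):
    r = requests.get("http://blockexplorer.com/api/addr/" + addr + "/balance", timeout=30)
    if r.status_code == 429:
        # rate limited; r.text carries the server's explanation
        raise RuntimeError("Rate limited by Block Explorer: " + r.text)
    r.raise_for_status()
    return int(r.text) / 100000000  # satoshis -> BTC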

Slow brute force program in python

So here's the problem: our security teacher made a site that requires authentication and then asks for a code (4 characters) so that you can access a file. He told us to write a brute force program in Python (any library we want) that can find the password. To start, I wanted to write a program that tries random combinations in that code field just to get an idea of the time each request takes (I'm using the requests library), and the result was disappointing: each request takes around 8 seconds.
With some calculations: 36^4 = 1,679,616 possible combinations, which at 8 seconds each is about 13,436,928 seconds, or roughly 155.52 days.
I would really appreciate it if anyone could help me make this faster. (He told us that it is possible to make around 1200 combinations per second.)
Here's my code:
import requests
import time
import random

def gen():
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    pw_length = 4
    mypw = ""
    for i in range(pw_length):
        next_index = random.randrange(len(alphabet))
        mypw = mypw + alphabet[next_index]
    return mypw

t0 = time.clock()
t1 = time.time()
cookie = {'ig': 'b0b5294376ef12a219147211fc33d7bb'}
for i in range(0,5):
    t2 = time.clock()
    t3 = time.time()
    values = {'RECALL':gen()}
    r = requests.post('http://www.example.com/verif.php', stream=True, cookies=cookie, data=values)
    print("##################################")
    print("cpu time for req ",i,":", time.clock()-t2)
    print("wall time for req ",i,":", time.time()-t3)
    print("##################################")
print("##################################")
print("Total cpu time:", time.clock()-t0)
print("Total wall time:", time.time()-t1)
Thank you
One thing you could try is to use a Pool of workers to do multiple requests in parallel, passing a password to each worker. Something like:
import itertools
from multiprocessing import Pool

def pass_generator():
    for pass_tuple in itertools.product(alphabet, repeat=4):
        yield ''.join(pass_tuple)

def check_password(password):
    values = {'RECALL': password}
    r = requests.post('http://www.example.com/verif.php', stream=True, cookies=cookie, data=values)
    # Check response here.

pool = Pool(processes=NUMBER_OF_PROCESSES)
pool.map(check_password, pass_generator())
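What the response check should look like depends entirely on how the site answers a wrong code, so this part is purely hypothetical (the "wrong code" marker is an assumption), but it might be along these lines:
def check_password(password):
    values = {'RECALL': password}
    r = requests.post('http://www.example.com/verif.php', stream=True, cookies=cookie, data=values)
    # Hypothetical check: replace with whatever the site actually returns
    # for an incorrect code.
    if "wrong code" not in r.text.lower():
        print("Possible hit:", password)
        return password
    return None
Since Pool.map collects the return values, you could then filter the resulting list for non-None entries.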

Multiprocessing pool returning wrong results

Another confused parallel coder here!
Our internal Hive database has an API layer which we need to use to access the data. There is a 300-second query timeout limit, so I wanted to use multiprocessing to execute multiple queries in parallel:
from multiprocessing import Pool
import pandas as pd
import time
from hive2pandas_anxpy import Hive2Pandas  # custom module for querying our Hive db and converting the results to a Pandas dataframe
import datetime

def run_query(hour):
    start_time = time.time()
    start_datetime = datetime.datetime.now()
    query = """SELECT id, revenue from table where date='2014-05-20 %s' limit 50""" % hour
    h2p = Hive2Pandas(query, 'username')
    h2p.run()
    elapsed_time = int(time.time() - start_time)
    end_datetime = datetime.datetime.now()
    return {'query':query, 'start_time':start_datetime, 'end_time':end_datetime, 'elapsed_time':elapsed_time, 'data':h2p.data_df}

if __name__ == '__main__':
    start_time = time.time()
    pool = Pool(4)
    hours = ['17','18','19']
    results = pool.map_async(run_query, hours)
    pool.close()
    pool.join()
    print int(time.time() - start_time)
The issue I'm having is that one of the queries always returns no data, but when I run the same query in the usual fashion, it returns data. Since I'm new to multiprocessing, I'm wondering if there are any obvious issues with how I'm using it above?
I think the issue you are having is that the results object is not ready by the time you want to use it. Also if you have a known amount of time for a timeout, I would suggest using that to your advantage in the code.
This code shows an example of how you can force a timeout after 300 seconds if the results from all of them are not collected by then.
if __name__ == '__main__':
    start_time = time.time()
    hours = ['17','18','19']
    with Pool(processes=4) as pool:
        results = pool.map_async(run_query, hours)
        print(results.get(timeout=300))
    print(int(time.time() - start_time))
Otherwise you should still be using results.get() to return your data, or specify a callback function for map_async.
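For instance, a minimal sketch of the callback variant (reusing run_query from above; the callback receives the full list of results once every worker has finished):
from multiprocessing import Pool

def collect(result_list):
    # called once, with one result dict per hour
    for r in result_list:
        print(r['query'], r['elapsed_time'])

if __name__ == '__main__':
    hours = ['17', '18', '19']
    with Pool(processes=4) as pool:
        pool.map_async(run_query, hours, callback=collect)
        pool.close()  # no more work will be submitted
        pool.join()   # wait for the workers and the callback to finish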
