timeit eats return value - python

I want to measure execution time of a function on the cheap, something like this:
import time

def my_timeit(func, *args, **kwargs):
    t0 = time.time()
    result = func(*args, **kwargs)
    delta = time.time() - t0
    return delta, result

def foo():
    time.sleep(1.23)
    return 'potato'

delta, result = my_timeit(foo)
But I would rather use timeit, profile, or another built-in that handles the common pitfalls caused by platform differences. It would probably also be better to measure the actual execution (CPU) time rather than the wall-clock time.
I tried using timeit.Timer(foo).timeit(number=1) but the interface seems to obscure the return value.
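To illustrate the difference between wall time and execution time mentioned above, here is a minimal sketch that reads two standard-library clocks around a single call: `time.perf_counter()` for elapsed wall-clock time and `time.process_time()` for CPU time used by the current process. The helper name `timed_call` is hypothetical, and a single measurement like this is no substitute for repeated timeit runs.

```python
import time

def timed_call(func, *args, **kwargs):
    """Run func once; return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # monotonic wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process, excludes sleep
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu

def foo():
    time.sleep(0.05)  # sleeping costs wall time but almost no CPU time
    return 'potato'

result, wall, cpu = timed_call(foo)
```

Because `foo` only sleeps, `wall` should be at least the sleep duration while `cpu` stays close to zero.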

This is my current attempt. But I would welcome any suggestions, because this feels too hacky and could probably do with improvement.
import time
from timeit import Timer

def my_timeit(func, *args, **kwargs):
    output_container = []

    def wrapper():
        output_container.append(func(*args, **kwargs))

    timer = Timer(wrapper)
    delta = timer.timeit(1)
    return delta, output_container.pop()

def foo():
    time.sleep(1.111)
    return 'potato'

delta, result = my_timeit(foo)
edit: adapted to work as a decorator below:
import functools
from timeit import Timer

def timeit_decorator(the_func):
    @functools.wraps(the_func)
    def my_timeit(*args, **kwargs):
        output_container = []

        def wrapper():
            output_container.append(the_func(*args, **kwargs))

        timer = Timer(wrapper)
        delta = timer.timeit(1)
        # Expose the most recent timing as an attribute of the decorated function.
        my_timeit.last_execution_time = delta
        return output_container.pop()
    return my_timeit

How about
>>time python yourprogram.py < input.txt
This is the output for a python script I ran
[20:13:29] praveen:jan$ time python mtrick.py < input_mtrick.txt
3 3 9
1 2 3 4
real 0m0.067s
user 0m0.016s
sys 0m0.012s

Related

Python decorator to time recursive functions

I have a simple decorator to track the runtime of a function call:
import time

def timed(f):
    def caller(*args):
        start = time.time()
        res = f(*args)
        end = time.time()
        return res, end - start
    return caller
This can be used as follows, and returns a tuple of the function result and the execution time.
@timed
def test(n):
    for _ in range(n):
        pass
    return 0

print(test(900))  # prints (0, 2.69e-05)
Simple enough. But now I want to apply this to recursive functions. Applying the above wrapper to a recursive function results in nested tuples with the times of each recursive call, as is expected.
@timed
def rec(n):
    if n:
        return rec(n - 1)
    else:
        return 0

print(rec(3))  # Prints ((((0, 1.90e-06), 8.10e-06), 1.28e-05), 1.90e-05)
What's an elegant way to write the decorator so that it handles recursion properly? Obviously, you could wrap the call in a timed function:
@timed
def wrapper():
    return rec(3)
This will give a tuple of the result and the time, but I want all of it to be handled by the decorator so that the caller does not need to worry about defining a new function for every call. Ideas?
The problem here isn't really the decorator. The problem is that rec needs rec to be a function that behaves one way, but you want rec to be a function that behaves differently. There's no clean way to reconcile that with a single rec function.
The cleanest option is to stop requiring rec to be two things at once. Instead of using decorator notation, assign timed(rec) to a different name:
def rec(n):
    ...

timed_rec = timed(rec)
If you don't want two names, then rec needs to be written to understand the actual value that the decorated rec will return. For example,
@timed
def rec(n):
    if n:
        val, runtime = rec(n - 1)
        return val
    else:
        return 0
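The first option from this answer (keeping `rec` undecorated and binding the timed version to a second name) can be sketched as follows. The recursive calls inside `rec` still resolve to the plain function, so only the outermost call is timed. The use of `time.perf_counter()` instead of `time.time()` is a substitution made here, not part of the original decorator.

```python
import time

def timed(f):
    def caller(*args):
        start = time.perf_counter()
        res = f(*args)
        return res, time.perf_counter() - start
    return caller

def rec(n):
    # The name `rec` still refers to the undecorated function,
    # so inner calls return plain values, not tuples.
    return rec(n - 1) if n else 0

timed_rec = timed(rec)
result, elapsed = timed_rec(3)
```

Here `result` is the plain return value of `rec(3)` and `elapsed` is a single float for the whole recursion.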
I prefer the other answers so far (particularly user2357112's answer), but you can also make a class-based decorator that detects whether the function has been activated, and if so, bypasses the timing:
import time

class fancy_timed(object):
    def __init__(self, f):
        self.f = f
        self.active = False

    def __call__(self, *args):
        if self.active:
            return self.f(*args)
        start = time.time()
        self.active = True
        res = self.f(*args)
        end = time.time()
        self.active = False
        return res, end - start

@fancy_timed
def rec(n):
    if n:
        time.sleep(0.01)
        return rec(n - 1)
    else:
        return 0

print(rec(3))
(class written with (object) so that this is compatible with py2k and py3k).
Note that to really work properly, the outermost call should use try and finally. Here's the fancied up fancy version of __call__:
def __call__(self, *args):
    if self.active:
        return self.f(*args)
    try:
        start = time.time()
        self.active = True
        res = self.f(*args)
        end = time.time()
        return res, end - start
    finally:
        self.active = False
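To see why the try/finally matters, here is a small self-contained sketch of the same class in which the decorated function can raise. The function `countdown` and its error case are made up for illustration. Without the `finally`, an exception would leave `active` stuck at True and every later call would silently skip the timing.

```python
import time

class fancy_timed:
    def __init__(self, f):
        self.f = f
        self.active = False

    def __call__(self, *args):
        if self.active:
            # Already inside a timed call: run the function untimed.
            return self.f(*args)
        self.active = True
        start = time.perf_counter()
        try:
            res = self.f(*args)
            return res, time.perf_counter() - start
        finally:
            # Runs on normal return and on exceptions alike.
            self.active = False

@fancy_timed
def countdown(n):
    if n < 0:
        raise ValueError("n must be non-negative")
    return countdown(n - 1) if n else 0

try:
    countdown(-1)
except ValueError:
    pass  # the flag has been reset by the finally block

res, elapsed = countdown(3)
```

Note that a single `active` flag is shared by all callers, so this sketch is not safe to use from several threads at once.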
You could structure your timer in a different way by *ahem* abusing the contextmanager and function attribute a little...
from contextlib import contextmanager
import time

@contextmanager
def timed(func):
    timed.start = time.time()
    try:
        yield func
    finally:
        timed.duration = time.time() - timed.start

def test(n):
    for _ in range(n):
        pass
    return n

def rec(n):
    if n:
        time.sleep(0.05)  # extra delay to notice the difference
        return rec(n - 1)
    else:
        return n

with timed(rec) as r:
    print(r(10))
    print(r(20))
print(timed.duration)

with timed(test) as t:
    print(t(555555))
    print(t(666666))
print(timed.duration)
Results:
# recursive
0
0
1.5130000114440918
# non-recursive
555555
666666
0.053999900817871094
If this is deemed a bad hack I'll gladly accept your criticism.
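A variation on the same idea that avoids storing state on the function object is to yield a small mutable record from the context manager and fill it in on exit. This is only a sketch; the name `stopwatch` and the `"elapsed"` key are hypothetical choices.

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch():
    timing = {}                       # filled in when the block exits
    start = time.perf_counter()
    try:
        yield timing
    finally:
        timing["elapsed"] = time.perf_counter() - start

def rec(n):
    return rec(n - 1) if n else n

with stopwatch() as timing:
    value = rec(10)

elapsed = timing["elapsed"]
```

Each `with` block gets its own record, so nested or concurrent measurements do not overwrite each other the way a shared function attribute would.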
Although it is not an overall solution to the problem of integrating recursion with decorators, for the problem of timing only, I have verified that the last element of the tuple of the times is the overall run time, as this is the time from the upper-most recursive call. Thus if you had
@timed
def rec():
    ...
to get the overall runtime given the original function definitions you could simply do
rec()[1]
Getting the result of the call, on the other hand, would then require recursing through the nested tuple:
def get(tup):
    if isinstance(tup, tuple):
        return get(tup[0])
    else:
        return tup
This might be too complicated to simply get the result of your function.
I encountered the same issue when trying to profile a simple quicksort implementation.
The main issue is that decorators are executed on each function call and we need something that can keep state, so we can sum all calls at the end. Decorators are not the right tool for the job.
However, one idea is to abuse the fact that functions are objects and can have attributes. This is explored below with a simple decorator. Something that must be understood is that, by using the decorator's syntactic sugar (@), the function will always be accumulating its timings.
from typing import Any, Callable
from time import perf_counter

class timeit:
    def __init__(self, func: Callable) -> None:
        self.func = func
        self.timed = []

    def __call__(self, *args: Any, **kwds: Any) -> Any:
        start = perf_counter()
        res = self.func(*args, **kwds)
        end = perf_counter()
        self.timed.append(end - start)
        return res

# usage
@timeit
def rec(n):
    ...

if __name__ == "__main__":
    result = rec(4)  # rec result
    # `timed` is a list of per-call durations, so sum it before formatting
    print(f"Took {sum(rec.timed):.2f} seconds")
    # Out: Took 3.39 seconds
    result = rec(4)  # rec result
    # timings between calls are accumulated
    # Out: Took 6.78 seconds
Which brings us to a solution inspired by @r.ook: below is a simple context manager that stores the timing of each run and prints the sum at the end (__exit__). Notice that, because each timing requires its own with statement, this will not accumulate different runs.
from typing import Any, Callable
from time import perf_counter

class timeit:
    def __init__(self, func: Callable) -> None:
        self.func = func
        self.timed = []

    def __call__(self, *args: Any, **kwds: Any) -> Any:
        start = perf_counter()
        res = self.func(*args, **kwds)
        end = perf_counter()
        self.timed.append(end - start)
        return res

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # TODO: report `exc_*` if an exception gets raised
        print(f"Took {sum(self.timed):.2f} seconds")
        return

# usage
def rec(n):
    ...

if __name__ == "__main__":
    with timeit(rec) as f:
        result = f(4)  # rec result
    # Out: Took 3.39 seconds
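If decorator syntax is still wanted, one way to avoid counting nested calls several times is to track the recursion depth and record a timing only when the outermost call returns. The sketch below is an assumption-laden illustration rather than a drop-in replacement: the names `timed_outermost` and `timings` are hypothetical, and the shared depth counter is not thread-safe.

```python
import functools
import time

def timed_outermost(func):
    depth = 0  # how many calls of func are currently on the stack

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal depth
        outermost = depth == 0
        if outermost:
            start = time.perf_counter()
        depth += 1
        try:
            return func(*args, **kwargs)
        finally:
            depth -= 1
            if outermost:
                # One entry per top-level call, covering all its recursion.
                wrapper.timings.append(time.perf_counter() - start)

    wrapper.timings = []
    return wrapper

@timed_outermost
def rec(n):
    return rec(n - 1) if n else 0

first = rec(5)
second = rec(2)
```

After these two top-level calls, `rec.timings` holds two durations, and the function still returns its plain result.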

How to slow down asynchronous API calls to match API limits?

I have a list of ~300K URLs for an API I need to get data from.
The API limit is 100 calls per second.
I have made a class for the asynchronous requests, but it is working too fast and I am hitting an error on the API.
How do I slow down the asynchronous requests so that I make at most 100 calls per second?
import grequests

lst = ['url.com', 'url2.com']

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    # Renamed from `async`, which is a reserved keyword since Python 3.7.
    def run_async(self):
        return grequests.map((grequests.get(u) for u in self.urls), exception_handler=self.exception, size=5)

    def collate_responses(self, results):
        return [x.text for x in results]

test = Test()
# here we collect the results returned by the asynchronous call
results = test.run_async()
response_text = test.collate_responses(results)
The first step that I took was to create an object that can distribute a maximum of n coins every t ms.
import time

class CoinsDistribution:
    """Object that distributes a maximum of maxCoins every timeLimit ms"""
    def __init__(self, maxCoins, timeLimit):
        self.maxCoins = maxCoins
        self.timeLimit = timeLimit
        self.coin = maxCoins
        self.time = time.perf_counter()

    def getCoin(self):
        if self.coin <= 0 and not self.restock():
            return False
        self.coin -= 1
        return True

    def restock(self):
        t = time.perf_counter()
        if (t - self.time) * 1000 < self.timeLimit:
            return False
        self.coin = self.maxCoins
        self.time = t
        return True
Now we need a way of forcing functions to be called only if they can get a coin.
To do that we can write a decorator that we could use like this:
@limitCalls(callLimit=1, timeLimit=1000)
def uniqFunctionRequestingServer1():
    return 'response from s1'
But sometimes multiple functions call the same server, so we want them to get coins from the same CoinsDistribution object.
Therefore, another use of the decorator is to supply the CoinsDistribution object:
server_2_limit = CoinsDistribution(3, 1000)

@limitCalls(server_2_limit)
def sendRequestToServer2():
    return 'it worked !!'

@limitCalls(server_2_limit)
def sendAnOtherRequestToServer2():
    return 'it worked too !!'
We now have to create the decorator; it can take either a CoinsDistribution object or enough data to create a new one.
import functools

def limitCalls(obj=None, *, callLimit=100, timeLimit=1000):
    if obj is None:
        obj = CoinsDistribution(callLimit, timeLimit)

    def limit_decorator(func):
        @functools.wraps(func)
        def limit_wrapper(*args, **kwargs):
            if obj.getCoin():
                return func(*args, **kwargs)
            return 'limit reached, please wait'
        return limit_wrapper
    return limit_decorator
And it's done! Now you can limit the number of calls to any API that you use, and you can build a dictionary to keep track of your CoinsDistribution objects if you have to manage a lot of them (for different API endpoints or different APIs).
Note: Here I have chosen to return an error message if there are no coins available. You should adapt this behaviour to your needs.
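As one possible adaptation of that behaviour, the wrapper can wait until a coin becomes available instead of returning an error message. The sketch below restates the coin object in compact form so that it is self-contained; the snake_case names, `limit_calls_blocking`, and the `poll_interval` parameter are hypothetical and not part of the answer's code.

```python
import functools
import time

class CoinsDistribution:
    """Hands out at most max_coins every time_limit_ms milliseconds."""
    def __init__(self, max_coins, time_limit_ms):
        self.max_coins = max_coins
        self.time_limit_ms = time_limit_ms
        self.coins = max_coins
        self.last_restock = time.perf_counter()

    def get_coin(self):
        if self.coins <= 0 and not self._restock():
            return False
        self.coins -= 1
        return True

    def _restock(self):
        now = time.perf_counter()
        if (now - self.last_restock) * 1000 < self.time_limit_ms:
            return False
        self.coins = self.max_coins
        self.last_restock = now
        return True

def limit_calls_blocking(dist, poll_interval=0.001):
    """Decorator that sleeps until `dist` hands out a coin, then calls func."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while not dist.get_coin():
                time.sleep(poll_interval)  # wait for the next restock
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Polling with a short sleep keeps the sketch simple; code that already runs on an event loop would more likely await a delay than block the thread.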
You can just keep track of how much time has passed and decide if you want to do more requests or not.
This will print 100 numbers per second, for example:
from datetime import datetime
import time

start = datetime.now()
time.sleep(1)
counter = 0
while True:
    end = datetime.now()
    s = (end - start).seconds
    if counter >= 100:
        if s <= 1:
            time.sleep(1)  # You can keep track of the time and sleep less, actually
        start = datetime.now()
        counter = 0
    print(counter)
    counter += 1
This other question on SO shows exactly how to do this. By the way, what you need is usually called throttling.
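A minimal sketch of throttling in this sense, using only the standard library: a generator that spaces out the items it yields so that no more than a given number are released per second. The name `throttle` and its parameter are hypothetical, and the sketch paces items evenly rather than allowing bursts of 100.

```python
import time

def throttle(items, max_per_second):
    """Yield items no faster than max_per_second, evenly spaced."""
    interval = 1.0 / max_per_second
    next_slot = time.monotonic()          # earliest time the next item may go out
    for item in items:
        delay = next_slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # Schedule the following item one interval after this one.
        next_slot = max(next_slot, time.monotonic()) + interval
        yield item

urls = ['url.com', 'url2.com', 'url3.com']  # placeholder values
paced = list(throttle(urls, max_per_second=100))
```

In the question's setting, the loop body that consumes `throttle(urls, 100)` would be the place to issue each request.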

Understanding this Python decorator code

I have a problem understanding this code, which I got from the decorators section of the book "Learning Python".
Why does this code return the value of the result variable once instead of twice? We return result twice, once in "max_result" and once in "measure"; here is the code:
from time import sleep, time
from functools import wraps

def measure(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        t = time()
        result = func(*args, **kwargs)
        print(func.__name__, 'took:', time() - t)
        return result
    return wrapper

def max_result(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if result > 100:
            print('Result is too big ({0}). Max allowed is 100.'
                  .format(result))
        return result
    return wrapper

@measure
@max_result
def cube(n):
    return n ** 3

print(cube(2))
print(cube(5))
Here is the output, why don't we get two 8 or two 125?
>>> print(cube(2))
cube took: 8.106231689453125e-06
8
>>> print(cube(5))
Result is too big (125). Max allowed is 100.
cube took: 5.91278076171875e-05
125
>>>
The decorators are chained. The original cube() function was wrapped by the max_result decorator, and the result of that decoration was decorated by measure.
So the return value of cube() is taken by wrapper() in max_result(), and the result of that function is taken by wrapper() in measure() before being returned to the caller.
Unraveling all the decorators would give you:
def measure_wrapper(*args, **kwargs):
    t = time()
    result = max_result_wrapper(*args, **kwargs)
    # func.__name__ in the original is 'cube' because of @wraps
    print('cube', 'took:', time() - t)
    return result

def max_result_wrapper(*args, **kwargs):
    result = original_cube(*args, **kwargs)
    if result > 100:
        print('Result is too big ({0}). Max allowed is 100.'
              .format(result))
    return result

def original_cube(n):
    return n ** 3

cube = measure_wrapper
So calling cube(2) produces:
measure_wrapper(2), which records t = time() and calls
    max_result_wrapper(2), which directly calls
        original_cube(2), which
            returns 2 ** 3, which is 8
        tests 8 > 100, which is false, so
        returns 8
    prints the time the max_result_wrapper() call took and
    returns 8
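The call order described above can be checked with two tiny decorators that only record when they run. The names `outer`, `inner`, and `calls` are made up for this sketch. Each wrapper passes the single return value through, which is why the caller sees it once.

```python
from functools import wraps

calls = []  # records the order in which the wrappers run

def outer(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append('outer: before')
        result = func(*args, **kwargs)
        calls.append('outer: after')
        return result          # passes the same value on, does not print it
    return wrapper

def inner(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append('inner: before')
        result = func(*args, **kwargs)
        calls.append('inner: after')
        return result
    return wrapper

@outer
@inner
def cube(n):
    return n ** 3

value = cube(2)
```

Stacking `@outer` over `@inner` is equivalent to `cube = outer(inner(cube))`, so the outer wrapper starts first and finishes last.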

How can I capture return value with Python timeit module?

I'm running several machine learning algorithms with sklearn in a for loop and want to see how long each of them takes. The problem is that I also need to return a value and don't want to run anything more than once, because each algorithm takes so long. Is there a way to capture the return value 'clf' using Python's timeit module, or a similar one, with a function like this...
def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    return clf
when I call the function like this
t = Timer(lambda: RandomForest(trainX, trainy))
print(t.timeit(number=1))
P.S. I also don't want to set a global 'clf' because I might want to do multithreading or multiprocessing later.
For Python 3.5 you can override the value of timeit.template
import timeit

timeit.template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""
unutbu's answer works for python 3.4 but not 3.5 as the _template_func function appears to have been removed in 3.5
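The template override can be confined to a helper so that the change does not leak into other uses of timeit. This is a sketch under the assumption that `timeit.template` remains a module-level string read when a `Timer` is constructed, which is an implementation detail rather than documented API. The helper name `timeit_with_result` is hypothetical.

```python
import timeit

_RETVAL_TEMPLATE = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""

def timeit_with_result(func, number=1):
    """Return (elapsed_seconds, last_return_value) for `number` calls of func."""
    original = timeit.template
    timeit.template = _RETVAL_TEMPLATE
    try:
        # The template is compiled when the Timer is created.
        timer = timeit.Timer(func)
    finally:
        timeit.template = original   # restore the stock template
    return timer.timeit(number=number)

elapsed, value = timeit_with_result(lambda: 6 * 7)
```

With `number=0` the loop body never runs and `retval` is never assigned, so the helper should be called with `number` of at least 1.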
The problem boils down to timeit._template_func not returning the function's return value:
def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            _func()
        _t1 = _timer()
        return _t1 - _t0
    return inner
We can bend timeit to our will with a bit of monkey-patching:
import timeit
import time

def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            retval = _func()
        _t1 = _timer()
        return _t1 - _t0, retval
    return inner

timeit._template_func = _template_func

def foo():
    time.sleep(1)
    return 42

t = timeit.Timer(foo)
print(t.timeit(number=1))
returns
(1.0010340213775635, 42)
The first value is the timeit result (in seconds), the second value is the function's return value.
Note that the monkey-patch above only affects the behavior of timeit when a callable is passed to timeit.Timer. If you pass a string statement, then you'd have to (similarly) monkey-patch the timeit.template string.
Funnily enough, I'm also doing machine-learning, and have a similar requirement ;-)
I solved it as follows, by writing a function that:
- runs your function
- prints the running time, along with the name of your function
- returns the results
Let's say you want to time:
clf = RandomForest(train_input, train_output)
Then do:
clf = time_fn( RandomForest, train_input, train_output )
Stdout will show something like:
mymodule.RandomForest: 0.421609s
Code for time_fn:
import time

def time_fn(fn, *args, **kwargs):
    # time.clock() was removed in Python 3.8; perf_counter() is its replacement
    start = time.perf_counter()
    results = fn(*args, **kwargs)
    end = time.perf_counter()
    fn_name = fn.__module__ + "." + fn.__name__
    print(fn_name + ": " + str(end - start) + "s")
    return results
If I understand it correctly, since Python 3.5 you can pass globals to each Timer instance without having to define them in your block of code. I am not sure whether it would have the same issues with parallelization.
My approach would be something like:
clf = ensemble.RandomForestClassifier(n_estimators=10)
myGlobals = globals()
myGlobals.update({'clf': clf})
t = Timer(stmt='clf.fit(trainX,trainy)', globals=myGlobals)
print(t.timeit(number=1))
print(clf)
As of 2020, in ipython or jupyter notebook it is
t = %timeit -n1 -r1 -o RandomForest(trainX, trainy)
t.best
If you don't want to monkey-patch timeit, you could try using a global list, as below. This will also work in python 2.7, which doesn't have globals argument in timeit():
from timeit import timeit
import time

# Function to time - plagiarised from the answer above :-)
def foo():
    time.sleep(1)
    return 42

result = []
print(timeit('result.append(foo())', setup='from __main__ import result, foo', number=1))
print(result[0])
will print the time and then the result.
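On Python 3.5 and later, the same list trick can be combined with the `globals` argument so that nothing has to live in the module namespace, which also fits the question's wish to avoid a global. This is a sketch; the helper name `time_and_capture` and the namespace keys are hypothetical.

```python
import timeit

def time_and_capture(func, *args, **kwargs):
    """Time one call of func with timeit and return (elapsed, result)."""
    captured = []
    namespace = {
        "captured": captured,   # local list, not a module-level global
        "func": func,
        "args": args,
        "kwargs": kwargs,
    }
    elapsed = timeit.timeit(
        "captured.append(func(*args, **kwargs))",
        globals=namespace,
        number=1,
    )
    return elapsed, captured[0]

elapsed, value = time_and_capture(pow, 2, 10)
```

Because each call builds its own namespace dictionary, separate calls do not share state with each other.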
An approach I'm using is to "append" the running time to the results of the timed function. So I wrote a very simple decorator using the "time" module:
def timed(func):
    def func_wrapper(*args, **kwargs):
        import time
        # time.clock() was removed in Python 3.8; perf_counter() replaces it
        s = time.perf_counter()
        result = func(*args, **kwargs)
        e = time.perf_counter()
        # assumes the timed function returns a tuple
        return result + (e - s,)
    return func_wrapper
And then I use the decorator for the function I want to time.
For Python 3.X I use this approach:
import timeit

# Redefining default Timer template to make 'timeit' return
# test's execution timing and the function return value
new_template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        ret_val = {stmt}
    _t1 = _timer()
    return _t1 - _t0, ret_val
"""
timeit.template = new_template

Python, function quit if it has been run the last 5 minutes

I have a Python script that gets data from a USB weather station; currently it puts the data into MySQL whenever data is received from the station.
I have a MySQL class with an insert function. What I want is for the function to check whether it has been run in the last 5 minutes and, if it has, quit.
Could not find any code on the internet that does this.
Maybe I need to have a sub-process, but I am not familiar with that at all.
Does anyone have an example that I can use?
Use this timeout decorator.
import signal

class TimeoutError(Exception):
    def __init__(self, value="Timed Out"):
        self.value = value

    def __str__(self):
        return repr(self.value)

def timeout(seconds_before_timeout):
    def decorate(f):
        def handler(signum, frame):
            raise TimeoutError()

        def new_f(*args, **kwargs):
            old = signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds_before_timeout)
            try:
                result = f(*args, **kwargs)
            finally:
                signal.signal(signal.SIGALRM, old)
                signal.alarm(0)
            return result
        new_f.__name__ = f.__name__  # func_name exists only in Python 2
        return new_f
    return decorate
Usage:
import time

@timeout(5)
def mytest():
    print("Start")
    for i in range(1, 10):
        time.sleep(1)
        print("%d seconds have passed" % i)

if __name__ == '__main__':
    mytest()
Probably the most straight-forward approach (you can put this into a decorator if you like, but that's just cosmetics I think):
import time
import datetime

class MySQLWrapper:
    def __init__(self, min_period_seconds):
        self.min_period = datetime.timedelta(seconds=min_period_seconds)
        self.last_calltime = datetime.datetime.now() - self.min_period

    def insert(self, item):
        now = datetime.datetime.now()
        if now - self.last_calltime < self.min_period:
            print("not insert")
        else:
            self.last_calltime = now
            print("insert", item)

m = MySQLWrapper(5)
m.insert(1)  # insert 1
m.insert(2)  # not insert
time.sleep(5)
m.insert(3)  # insert 3
As a side note: did you come across RRDTool during your web search for related stuff? It apparently does what you want to achieve, i.e.
- a database to store the most recent values at arbitrary resolution/update frequency,
- extrapolation/interpolation of values if updates are too frequent or missing,
- graph generation from the data.
One approach could be to store all the data you can get in your MySQL database and forward a subset to such an RRDTool database to generate a nice time-series visualization of it, depending on what you need.
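The remark earlier in this answer that the check could be put into a decorator can be sketched as follows. The names `min_interval` and `clock` are hypothetical. The clock is injectable only so that the behaviour can be demonstrated without actually waiting five minutes.

```python
import functools
import time

def min_interval(seconds, clock=time.monotonic):
    """Skip calls made less than `seconds` after the last accepted call."""
    def decorator(func):
        last_call = [None]  # time of the last call that was let through

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = clock()
            if last_call[0] is not None and now - last_call[0] < seconds:
                return None  # called too soon: do nothing
            last_call[0] = now
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Demonstration with a fake clock that reports 0 s, 10 s and 400 s.
fake_times = iter([0.0, 10.0, 400.0])

@min_interval(300, clock=lambda: next(fake_times))
def insert(item):
    return "inserted %s" % item

results = [insert(1), insert(2), insert(3)]
```

With the fake clock, the second call falls inside the 300-second window and is skipped, while the first and third go through.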
import time

def timeout(f, k, n):
    last_time = [time.time()]
    count = [0]

    def inner(*args, **kwargs):
        distance = time.time() - last_time[0]
        if distance > k:
            last_time[0] = time.time()
            count[0] = 0
            return f(*args, **kwargs)
        elif distance < k and (count[0] + 1) == n:
            return False
        else:
            count[0] += 1
            return f(*args, **kwargs)
    return inner

timed = timeout(lambda x, y: x + y, 300, 1)
print(timed(2, 4))
First argument is the function you want run, second is the time interval, and the third is the number of times it's allowed to run in that time interval.
Each time the function is run, save a file with the current time. When the function is run again, check the time stored in the file and make sure it is old enough.
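A minimal sketch of the file-based idea, which has the advantage of surviving restarts of the script. The function name `should_run` and the file name are hypothetical, and the sketch does no locking, so it assumes only one process touches the file.

```python
import os
import tempfile
import time

def should_run(stamp_path, min_interval_seconds):
    """Return True and record the current time if enough time has passed."""
    now = time.time()
    try:
        with open(stamp_path) as fh:
            last_run = float(fh.read())
    except (OSError, ValueError):
        last_run = None  # no usable timestamp stored yet
    if last_run is not None and now - last_run < min_interval_seconds:
        return False
    with open(stamp_path, "w") as fh:
        fh.write(repr(now))
    return True

# Demonstration in a throwaway directory.
stamp = os.path.join(tempfile.mkdtemp(), "last_insert.txt")
first = should_run(stamp, 300)    # no file yet, so the call is allowed
second = should_run(stamp, 300)   # timestamp is fresh, so the call is refused
```

The insert function would call `should_run(...)` first and return early when it gets False.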
Just derive a new class and override the insert function. In the overriding function, check the last insert time and call the parent's insert method if more than five minutes have passed, and of course update the most recent insert time.
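The subclassing approach might look like the sketch below. The `Database` class is a hypothetical stand-in for the asker's MySQL class, which is not shown, and it stores rows in a list instead of talking to a server.

```python
import time

class Database:
    """Hypothetical stand-in for the existing MySQL class."""
    def __init__(self):
        self.rows = []

    def insert(self, item):
        self.rows.append(item)
        return True

class ThrottledDatabase(Database):
    def __init__(self, min_period_seconds=300):
        super().__init__()
        self.min_period = min_period_seconds
        self._last_insert = None  # monotonic time of the last accepted insert

    def insert(self, item):
        now = time.monotonic()
        if self._last_insert is not None and now - self._last_insert < self.min_period:
            return False  # ran less than min_period ago: quit
        self._last_insert = now
        return super().insert(item)

db = ThrottledDatabase(min_period_seconds=300)
accepted = db.insert("reading 1")
rejected = db.insert("reading 2")
```

Only the first reading ends up in `db.rows`; the second call returns False because it arrives within the five-minute window.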
