I'm currently doing some work at uni that requires generating multiple benchmarks for multiple short C programs. I've written a Python script to automate this process. Up until now I've been using the time module and essentially computing the benchmark like this:
start = time.time()
successful = run_program(path)
end = time.time()
runtime = end - start
where the run_program function just uses the subprocess module to run the C program:
def run_program(path):
    p = subprocess.Popen(path, shell=True, stdout=subprocess.PIPE)
    p.communicate()[0]
    if p.returncode > 1:
        return False
    return True
However, I've recently discovered that this measures elapsed time and not CPU time, i.e. this sort of measurement is sensitive to noise from the OS. Similar questions on SO suggest that the timeit module is better for measuring CPU time, so I've adapted the run method as follows:
def run_program(path):
    command = 'p = subprocess.Popen(\'time ' + path + '\', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE); out, err = p.communicate()'
    result = timeit.Timer(command, setup='import subprocess').repeat(1, 10)
    return numpy.median(result)
But from looking at the timeit documentation it seems that the timeit module is only meant for small snippets of Python code passed in as a string, so I'm not sure whether timeit is giving me accurate results for this computation. So my question is: will timeit measure the CPU time for every step of the process that it runs, or will it only measure the CPU time for the actual Python (i.e. subprocess module) code to run? Is this an accurate way to benchmark a set of C programs?
timeit only measures time from within the Python process in which it runs. CPU time consumed by external processes will not be "credited" to that process, so it does not show up in those timings.
A more accurate way would be to do the measurement in C itself, for example with clock() or clock_gettime() inside the programs, where you can read the CPU time directly.
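If you want to stay in Python, one alternative (a minimal sketch, assuming a POSIX system; run_program_cpu is a hypothetical replacement for the run_program above) is to read the CPU time accumulated by child processes via the resource module before and after the run:

import resource
import subprocess

def run_program_cpu(path):
    # CPU time already accumulated by previously finished child processes
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(path, shell=True, stdout=subprocess.PIPE)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # user + system CPU time consumed by the C program itself
    cpu_time = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return completed.returncode == 0, cpu_time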
I'm relatively new to Dask. I'm trying to parallelize a "custom" function that doesn't use Dask containers; I would just like to speed up the computation. But my result is that parallelizing with dask.delayed performs significantly worse than running the serial version. Here is a minimal implementation demonstrating the issue (the code I actually want to do this with is significantly more involved :) )
import dask, time

def mysum(rng):
    # CPU intensive
    z = 0
    for i in rng:
        z += i
    return z
# serial
b = time.time(); zz = mysum(range(1, 1_000_000_000)); t = time.time() - b
print(f'time to run in serial {t}')

# parallel
ms_parallel = dask.delayed(mysum)
ss = []
ncores = 10
m = 100_000_000
for i in range(ncores):
    lower = m*i
    upper = (i+1) * m
    r = range(lower, upper)
    s = ms_parallel(r)
    ss.append(s)

j = dask.delayed(ss)
b = time.time(); yy = j.compute(); t = time.time() - b
print(f'time to run in parallel {t}')
Typical results are:
time to run in serial 55.682398080825806
time to run in parallel 135.2043571472168
It seems I'm missing something basic here.
You are running a pure CPU-bound computation, and by default dask.delayed uses the threaded scheduler. Because of Python's Global Interpreter Lock (GIL), only one thread actually executes Python code at a time. In short, you are only adding overhead (thread switching and task scheduling) to your original computation.
To actually get faster for this workload, you should use dask-distributed. Just adding
import dask.distributed
client = dask.distributed.Client(threads_per_worker=1)
at the start of your script may well give you a decent speed-up, since this starts a set of worker processes, each with its own GIL. This scheduler becomes the default one just by creating it.
EDIT: ignore the following, I see you are already doing it :). Leaving it here for others, unless people want it gone... The second problem, for Dask, is the sheer number of tasks. Any task-execution system has an overhead associated with each task (and it is actually higher for distributed than for the default threaded scheduler). You can get around it by batching several function calls into each task. This is, in practice, what dask.array and dask.dataframe do: they operate on largeish pieces of the overall problem, so that the per-task overhead becomes small compared to the useful CPU execution time.
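For reference, a minimal sketch of the process-based approach applied to the example above (assuming mysum, m and ncores are defined as in the question; the __main__ guard matters because worker processes get spawned):

import dask
from dask.distributed import Client

if __name__ == '__main__':
    client = Client(threads_per_worker=1)  # process-backed workers, one thread each
    ms_parallel = dask.delayed(mysum)
    tasks = [ms_parallel(range(i * m, (i + 1) * m)) for i in range(ncores)]
    results = dask.compute(*tasks)  # runs on the distributed scheduler created above
    total = sum(results)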
How do I make a simple CPython program from scratch and measure its performance gain over the corresponding Python program?
For example, if I write a program that computes 10! in Python and then in CPython, how would I know how much it improves the performance of the computation?
Take a look at: How to use the timeit module
Simple code to test the time taken by a function:
def time_taken(func):
    from time import time
    start = time()
    func()
    end = time()
    return end - start

def your_func():
    # your code or logic
    pass

# To test
time_taken(your_func)
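For instance, using the time_taken helper above (compute_factorials is just a hypothetical stand-in for your own function):

import math

def compute_factorials():
    # hypothetical workload: compute 10! repeatedly so the timing is measurable
    for _ in range(100000):
        math.factorial(10)

print(time_taken(compute_factorials))  # elapsed seconds for the Python version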
Use this mechanism to test time taken in both situations.
Let's say CPython took c seconds and Python took p seconds; then CPython is faster than Python by (p - c) * 100 / c percent.
If p is 2 secs and c is 1 sec, CPython is faster than Python by (2 - 1) * 100 / 1 = 100%.
However, the performance keeps changing depending on the code & problem statement.
Share the output. Good luck!
Pretty simple, I'd like to run an external command/program from within a Python script, once it is finished I would also want to know how much CPU time it consumed.
Hard mode: running multiple commands in parallel must not cause inaccuracies in the reported CPU consumption.
On UNIX: either (a) use resource module (also see answer by icktoofay), or (b) use the time command and parse the results, or (c) use /proc filesystem, parse /proc/[pid]/stat and parse out utime and stime fields. The last of these is Linux-specific.
Example of using resource:
import subprocess, resource
usage_start = resource.getrusage(resource.RUSAGE_CHILDREN)
subprocess.call(["yourcommand"])
usage_end = resource.getrusage(resource.RUSAGE_CHILDREN)
cpu_time = usage_end.ru_utime - usage_start.ru_utime
Note: it is not necessary to do fork/execvp; subprocess.call() or the other subprocess methods are fine here and much easier to use.
Note: you can run multiple commands from the same Python script simultaneously, either using subprocess.Popen or subprocess.call with threads, but resource won't return their correct individual CPU times; it returns the sum of the times accumulated between the calls to getrusage. To get the individual times, run one small Python wrapper per command to time it as above (you could launch those wrappers from your main script; a sketch follows), or use the time method below, which works correctly with multiple simultaneous commands (time is basically just such a wrapper).
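A minimal sketch of such a wrapper (cpu_wrapper.py is a hypothetical file name; it runs exactly one command and prints the CPU time that command consumed):

# cpu_wrapper.py (hypothetical): usage: python cpu_wrapper.py yourcommand youroptions
import resource
import subprocess
import sys

subprocess.call(sys.argv[1:])
usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print(usage.ru_utime + usage.ru_stime)  # user + system CPU time of the command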
Example of using time:
import subprocess

# `time` prints its report to stderr, so capture stderr rather than stdout
p = subprocess.Popen(["time", "yourcommand", "youroptions"], stderr=subprocess.PIPE)
_, time_output = p.communicate()
# parse time_output
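One way the parsing could look (a sketch, assuming a POSIX-style time -p report whose last three lines look like "real 1.23", "user 1.20", "sys 0.03"; it has to skip over any stderr output from the command itself):

import subprocess

p = subprocess.Popen(["time", "-p", "yourcommand", "youroptions"],
                     stderr=subprocess.PIPE, universal_newlines=True)
_, time_output = p.communicate()
# the time report is the last three lines of stderr; anything before it is the command's own stderr
report = dict(line.split() for line in time_output.strip().splitlines()[-3:])
cpu_time = float(report["user"]) + float(report["sys"])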
On Windows: You need to use performance counters (aka "performance data helpers") somehow. Here is a C example of the underlying API. To get it from python, you can use one of two modules: win32pdh (part of pywin32; sample code) or pyrfcon (cross-platform, also works on Unix; sample code).
All of these methods meet the "hard mode" requirements above: they should be accurate even with multiple instances of different processes running on a busy system. They may not produce exactly the same results in that case as running just one process on an idle system, because process switching has some overhead, but they will be very close, because they ultimately get their data from the OS scheduler.
On platforms where it's available, the resource module may provide what you need. If you need to time multiple commands simultaneously, you may want to (for each command you want to run) fork and then create the subprocess so you get information for only that process. Here's one way you might do this:
import os
import resource
import struct
import sys

def start_running(command):
    time_read_pipe, time_write_pipe = os.pipe()
    want_read_pipe, want_write_pipe = os.pipe()
    runner_pid = os.fork()
    if runner_pid != 0:
        # parent process: hand back a function that asks the runner for the time
        os.close(time_write_pipe)
        os.close(want_read_pipe)
        def finish_running():
            os.write(want_write_pipe, 'x')
            os.close(want_write_pipe)
            time = os.read(time_read_pipe, struct.calcsize('f'))
            os.close(time_read_pipe)
            time = struct.unpack('f', time)[0]
            return time
        return finish_running
    # runner process: fork again and exec the command in the grandchild
    os.close(time_read_pipe)
    os.close(want_write_pipe)
    sub_pid = os.fork()
    if sub_pid == 0:
        os.close(time_write_pipe)
        os.close(want_read_pipe)
        os.execvp(command[0], command)
    os.wait()
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    os.read(want_read_pipe, 1)
    os.write(time_write_pipe, struct.pack('f', usage.ru_utime))
    sys.exit(0)
You can then use it to run a few commands:
get_ls_time = start_running(['ls'])
get_work_time = start_running(['python', '-c', 'print (2 ** 512) ** 200'])
After that code has executed, both of those commands should be running in parallel. When you want to wait for them to finish and get the time they took to execute, call the function returned by start_running:
ls_time = get_ls_time()
work_time = get_work_time()
Now ls_time will contain the time ls took to execute and work_time will contain the time python -c "print (2 ** 512) ** 200" took to execute.
You can do timings within Python, but if you want to know the overall CPU consumption of your program, that is kind of silly to do. The best thing to do is to just use the GNU time program. It even comes standard in most operating systems.
The timeit module of Python is very useful for benchmarking/profiling purposes. In addition, you can even call it from the command-line interface. To benchmark an external command, you would go like this:
>>> import timeit
>>> timeit.timeit("call(['ls','-l'])",setup="from subprocess import call",number=1) #number defaults to 1 million
total 16
-rw-rw-r-- 1 nilanjan nilanjan 3675 Dec 17 08:23 icon.png
-rw-rw-r-- 1 nilanjan nilanjan 279 Dec 17 08:24 manifest.json
-rw-rw-r-- 1 nilanjan nilanjan 476 Dec 17 08:25 popup.html
-rw-rw-r-- 1 nilanjan nilanjan 1218 Dec 17 08:25 popup.js
0.02114391326904297
The last line is the returned execution time. Here, the first argument to timeit.timeit() is the code that calls the external command, and the setup argument specifies the code to run before timing starts. The number argument is how many times to run the specified code; you can then divide the returned time by number to get the average time per run.
You can also use the timeit.repeat() method, which takes the same arguments as timeit.timeit() plus an additional repeat argument specifying how many times timeit.timeit() should be called; it returns a list with the execution time of each run.
Note: the execution time returned by timeit.timeit() is wall-clock time, not CPU time, so other processes may interfere with the timing. When using timeit.repeat(), you should therefore take the minimum value rather than the average or the standard deviation.
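A short sketch of the repeat-and-take-the-minimum approach described above:

import timeit

times = timeit.repeat("call(['ls', '-l'])",
                      setup="from subprocess import call",
                      repeat=5, number=1)
print(min(times))  # the least-disturbed of the five runs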
You can do this using IPython's %time magic function:
In [1]: time 2**128
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00
Out[1]: 340282366920938463463374607431768211456L
In [2]: n = 1000000
In [3]: time sum(range(n))
CPU times: user 1.20 s, sys: 0.05 s, total: 1.25 s
Wall time: 1.37
Out[3]: 499999500000L
Is there a simple way to time a Python program's execution?
clarification: Entire programs
Use timeit:
This module provides a simple way to time small bits of Python code. It has both command line as well as callable interfaces. It avoids a number of common traps for measuring execution times.
You'll need a python statement in a string; if you have a main function in your code, you could use it like this:
>>> from timeit import Timer
>>> timer = Timer('main()', 'from yourmodule import main')
>>> print timer.timeit()
The second string provides the setup, the environment for the first statement to be timed in. The second part is not timed, and is intended for setting the stage, as it were. The first string is then run through its paces, by default a million times, to get accurate timings.
If you need more detail as to where things are slow, use one of the python profilers:
A profiler is a program that describes the run time performance of a program, providing a variety of statistics.
The easiest way to run this is by using the cProfile module from the command line:
$ python -m cProfile yourprogram.py
You might want to use the built-in profiler.
You might also want to measure a function's running time by using the following simple decorator:
import time

def myprof(func):
    def wrapping_fun(*args):
        start = time.clock()
        result = func(*args)
        end = time.clock()
        print 'Run time of %s is %4.2fs' % (func.__name__, (end - start))
        return result
    return wrapping_fun
Usage:
@myprof
def myfun():
    # function body
    pass
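For example (busy_sum is just a hypothetical function to show the decorator in action):

@myprof
def busy_sum():
    return sum(range(10 ** 6))

busy_sum()  # prints something like: Run time of busy_sum is 0.03s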
If you're on a Linux/Unix/POSIX-compatible platform, just use time. This way you won't interfere with your script and won't slow it down with unnecessarily detailed (for you) profiling. Naturally, you can use it for pretty much anything, not just Python scripts.
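For instance, to time a whole script:
$ time python yourscript.py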
For snippets use the timeit module.
For entire programs use the cProfile module.
Use timeit
>>> import timeit
>>> t = timeit.Timer(stmt="lst = ['c'] * 100")
>>> print t.timeit()
1.10580182076
>>> t = timeit.Timer(stmt="lst = ['c' for x in xrange(100)]")
>>> print t.timeit()
7.66900897026
I would like to know how much time a particular function has spent during the run of a program that involves recursion. What is the best way of doing it?
Thank you
The best way would be to run some benchmark tests (to test individual functions) or Profiling (to test an entire application/program). Python comes with built-in Profilers.
Alternatively, you could go back to the very basics by simply setting a start time at the beginning of the program, and, at the end of the program, subtracting the current time from the start time. This is basically very simple Benchmarking.
Here is an implementation from an answer to the linked question:
import time
start = time.time()
do_long_code()
print "it took", time.time() - start, "seconds."
Python has something for benchmarking included in its standard library, as well.
From the example given on the page:
def test():
    "Time me"
    L = []
    for i in range(100):
        L.append(i)

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()
Use the profiler!
python -m cProfile -o prof yourscript.py
runsnake prof
runsnake is a nice tool for looking at the profiling output. You can of course use other tools.
More on the Profiler here: http://docs.python.org/library/profile.html
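Since the question is about a recursive function in particular, here is a minimal sketch of reading one function's cumulative time out of cProfile programmatically (fib is just a placeholder for your recursive function):

import cProfile
import pstats

def fib(n):  # placeholder recursive function
    return n if n < 2 else fib(n - 1) + fib(n - 2)

profiler = cProfile.Profile()
profiler.enable()
fib(28)
profiler.disable()

# 'cumulative' time includes the time spent in the function's recursive sub-calls
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats('fib')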