How can I make Python busy-wait for an exact duration?

Mostly out of curiosity, I was wondering: how might I make a Python script sleep for 1 second without using the time module?
Is there a computation I could run in a while loop that would take a machine of known processing power a predictable, tunable amount of time?

As mentioned in the comments, regarding the second part of your question:
The processing time depends on the machine (the computer and its configuration) you are working with and on the other processes active on it. There is no fixed amount of time for a given operation.
It's been a long time since you could get a reliable delay just by executing code that takes roughly a known time to complete. Computers don't work like that any more.
But to answer your first question: you can use a system call and spawn an OS process that sleeps for 1 second, like:
import subprocess

# hand the waiting off to the external "sleep" command (POSIX systems)
subprocess.run(["sleep", "1"])
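If it really has to be a busy-wait rather than a sleep, one way to avoid the time module is to spin on the clock from datetime instead of trying to calibrate a loop count (which, as noted above, cannot be done reliably). A minimal sketch; it pegs one core for the whole duration and its accuracy is still at the mercy of the scheduler:
from datetime import datetime, timedelta

def busy_wait(seconds):
    # spin until the wall clock passes the target instant; no time module involved
    end = datetime.now() + timedelta(seconds=seconds)
    while datetime.now() < end:
        pass

busy_wait(1)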

Related

How to check cpu consumption by a python script

I have implemented a Kalman filter. I want to find out how much CPU energy is being consumed by my script. I have checked other posts on Stack Overflow and, following them, I downloaded the psutil library. Now I am unsure where to put the statements to get the correct answer. Here is my code:
import os
import psutil

if __name__ == "__main__":
    # kalman code
    pid = os.getpid()
    py = psutil.Process(pid)
    current_process = psutil.Process()
    memoryUse = py.memory_info()[0] / 2.**30  # memory use in GB...I think
    print('memory use:', memoryUse)
    print(current_process.cpu_percent())
    print(psutil.virtual_memory())  # physical memory usage
Please let me know whether I am headed in the right direction or not.
The above code generated the following results:
('memory use:', 0.1001129150390625)
0.0
svmem(total=6123679744, available=4229349376, percent=30.9, used=1334358016, free=3152703488, active=1790803968, inactive=956125184, buffers=82894848, cached=1553723392, shared=289931264, slab=132927488)
Edit: the goal is to find out the energy consumed by the CPU while running this script.
I'm not sure whether you're trying to find the peak CPU usage during execution of that #kalman code, or the average, or the total, or something else?
But you can't get any of those by calling cpu_percent() after it's done. You're just measuring the CPU usage of calling cpu_percent().
Some of these, you can get by calling cpu_times(). This will tell you how many seconds were spent doing various things, including total "user CPU time" and "system CPU time". The docs indicate exactly which values you get on each platform, and what they mean.
In fact, if this is all you want (including only needing those two values), you don't even need psutil; you can just use os.times() in the standard library.
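As a rough sketch of that approach (the Kalman code is stood in for by a placeholder function here, and psutil is assumed to be installed):
import os
import psutil

def kalman_stub():
    # placeholder for the actual Kalman filter code
    total = 0
    for i in range(10 ** 7):
        total += i * i
    return total

if __name__ == "__main__":
    kalman_stub()
    cpu = psutil.Process().cpu_times()
    print('user CPU time:   %.3f s' % cpu.user)
    print('system CPU time: %.3f s' % cpu.system)
    # the same two numbers, without psutil, from the standard library:
    t = os.times()
    print('os.times(): user=%.3f sys=%.3f' % (t[0], t[1]))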
Or you can just use the time command from outside your program, by running time python myscript.py instead of python myscript.py.
For others, you need to call cpu_percent while your code is running. One simple solution is to run a background process with multiprocessing or concurrent.futures that just calls cpu_percent on your main process. To get anything useful, the child process may need to call it periodically, aggregating the results (e.g., to find the maximum), until it's told to stop, at which point it can return the aggregate.
Since this is not quite trivial to write, and definitely not easy to explain without knowing how much familiarity you have with multiprocessing, and there's a good chance cpu_times() is actually what you want here, I won't go into details unless asked.

Comparing two different processes based on the real, user and sys times

I have been through other answers on SO about real,user and sys times. In this question, apart from the theory, I am interested in understanding the practical implications of the times being reported by two different processes, achieving the same task.
I have a Python program and a NodeJS program https://github.com/rnanwani/vips_performance. Both work on a set of input images and process them to obtain different outputs, and both use libvips implementations.
Here are the times for the two
Python
real 1m17.253s
user 1m54.766s
sys 0m2.988s
NodeJS
real 1m3.616s
user 3m25.097s
sys 0m8.494s
The real time (the wall-clock time, as per other answers) is lower for NodeJS, which as per my understanding means that the entire process, from input to output, finishes much quicker with NodeJS. But the user and sys times are very high compared to Python. Also, using the htop utility, I see that the NodeJS process has a CPU usage of about 360% during the entire run, maxing out the 4 cores. Python, on the other hand, has a CPU usage ranging from 250% down to 120% during the entire run.
I want to understand a couple of things
Does a smaller real time and a higher user+sys time mean that the process (in this case Node) utilizes the CPU more efficiently to complete the task sooner?
What is the practical implication of these times - which is faster/better/would scale well as the number of requests increase?
My guess would be that node is running more than one vips pipeline at once, whereas python is strictly running one after the other. Pipeline startup and shutdown is mostly single-threaded, so if node starts several pipelines at once, it can probably save some time, as you observed.
You load your JPEG images in random access mode, so the whole image will be decompressed to memory with libjpeg. This is a single-threaded library, so you will never see more than 100% CPU use there.
Next, you do resize/rotate/crop/jpegsave. Running through these operations: resize will thread well, with the CPU load increasing as the square of the reduction; the rotate is too simple to have much effect on runtime; and the crop is instant. Although the jpegsave is single-threaded (of course), vips runs it in a separate background thread fed from a write-behind buffer, so you effectively get it for free.
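For reference, the kind of pipeline being described might look like this in pyvips (an illustrative sketch with placeholder file names and sizes, not the code from the linked repository):
import pyvips

# random access mode: the whole JPEG is decoded up front by single-threaded libjpeg
image = pyvips.Image.new_from_file('input.jpg', access='random')

image = image.resize(0.5)           # threads well; load grows with the reduction
image = image.rot('d90')            # too cheap to affect runtime much
image = image.crop(0, 0, 400, 300)  # effectively instant
image.write_to_file('output.jpg')   # jpegsave runs via a write-behind background thread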
I tried your program on my desktop PC (six hyperthreaded cores, so 12 hardware threads). I see:
$ time ./rahul.py indir outdir
clearing output directory - outdir
real 0m2.907s
user 0m9.744s
sys 0m0.784s
That looks like we're seeing 9.7 / 2.9, or about a 3.4x speedup from threading, but that's very misleading. If I set the vips threadpool size to 1, you see something closer to the true single-threaded performance (though it still uses the jpegsave write-behind thread):
$ export VIPS_CONCURRENCY=1
$ time ./rahul.py indir outdir
clearing output directory - outdir
real 0m18.160s
user 0m18.364s
sys 0m0.204s
So we're really getting 18.1 / 2.9, or about a 6.2x speedup.
Benchmarking is difficult and real/user/sys can be hard to interpret. You need to consider a lot of factors:
Number of cores and number of hardware threads
CPU features like SpeedStep and TurboBoost, which will clock cores up and down depending on thermal load
Which parts of the program are single-threaded
IO load
Kernel scheduler settings
And I'm sure many others I've forgotten.
If you're curious, libvips has its own profiler which can help give more insight into the runtime behaviour. It can show you graphs of the various worker threads: how long they spend in synchronisation, how long in housekeeping, how long actually processing your pixels, when memory is allocated, and when it finally gets freed again. There's a blog post about it here:
http://libvips.blogspot.co.uk/2013/11/profiling-libvips.html
Does a smaller real time and a higher user+sys time mean that the process (in this case Node) utilizes the CPU more efficiently to complete the task sooner?
It doesn't necessarily mean they utilise the processor(s) more efficiently.
The higher user time means that Node is using more user-space processor time and, in turn, completes the task more quickly. As stated by Luke Exton, the CPU is spending more time on "Code you wrote/might look at".
The higher sys time means there is more context switching happening, which makes sense given your htop utilisation numbers. This means the scheduler (a kernel process) is jumping between operating-system actions and user-space actions. This is the time spent finding a CPU to schedule the task onto.
What is the practical implication of these times - which is faster/better/would scale well as the number of requests increase?
The question of implementation is a long one, and has many caveats. I would assume from the Python vs Node numbers that the Python threads are longer-running and in turn doing more processing inline. Another thing to note is the GIL in Python: essentially, Python code runs effectively single-threaded, and you can't easily break out of this. This could be a contributing factor to the Node implementation being quicker (using real threads).
The Node version appears to be written to be correctly threaded and to split many tasks out. The advantages of a highly threaded application have a tipping point where you spend MORE time trying to find a free CPU for a new thread than actually doing the work. When this happens, your Python implementation might start being faster again.
The higher user+sys time means that the process had more running threads and, as you noticed from the roughly 360% figure, it used almost all the available CPU resources of your 4 cores. That means the NodeJS process is already limited by the available CPU resources and unable to process more requests. Also, any other CPU-intensive processes you eventually run on that machine will hurt your NodeJS process. The Python process, on the other hand, doesn't take all the available CPU resources and probably could scale with the number of requests.
So these times are not reliable in and of themselves; they say how long the process took to perform an action on the CPU. This is coupled very tightly to whatever else was happening at the same time on that machine, and can fluctuate wildly based entirely on physical resources.
In terms of these times specifically:
real = Wall Clock time (Start to finish time)
user = Userspace CPU time (i.e. Code you wrote/might look at) e.g. node/python libs/your code
sys = Kernel CPU time (i.e. Syscalls, e.g Open a file from the OS.)
Specifically, a small real time means it actually finished faster. Does that mean it necessarily did it better? No. There could simply have been less happening on the machine at the same time, for instance.
In terms of scale, these numbers are a little irrelevant, and it depends on the architecture/bottlenecks. For instance, in terms of scale and specifically, cloud compute, it's about efficiently allocating resources and the relevant IO for each, generally (compute, disk, network). Does processing this image as fast as possible help with scale? Maybe? You need to examine bottlenecks and specifics to be sure. It could for instance overwhelm your network link and then you are constrained there, before you hit compute limits. Or you might be constrained by how quickly you can write to the disk.
One potentially important aspect of this which no one has mentioned is the fact that your library (vips) will itself launch threads:
http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/using-threads.html
When libvips calculates an image, by default it will use as many threads as you have CPU cores. Use vips_concurrency_set() to change this.
This explains what initially surprised me the most: NodeJS should (to my understanding) be pretty much single-threaded, just like Python with its GIL, since it is all about asynchronous processing.
So perhaps Python and Node bindings for vips just use different threading settings. That's worth investigating.
(that said, a quick look doesn't find any evidence of changes to the default concurrency levels in either library)

Issue in trying to execute a certain number of python scripts at certain intervals

I am trying to execute a certain number of Python scripts at certain intervals. Each script takes a long time to execute and hence I do not want to waste time waiting to run them sequentially. I tried this code, but it is not executing them simultaneously; it executes them one by one:
Main_file.py
import time

def func(argument):
    print 'Starting the execution for argument:', argument
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    arg = ['01', '02', '03', '04', '05']
    for val in arg:
        func(val)
        time.sleep(60)
What I want is to kick off by starting the execution of the first file (test_01.py). This will keep executing for some time. After 1 minute has passed, I want to start the simultaneous execution of the second file (test_02.py). This will also keep executing for some time. In this way I want to start the execution of all the scripts at gaps of 1 minute.
With the above code, I notice that the execution happens one file after the other and not simultaneously, as the print statements in these files appear one after the other and not mixed up.
How can I achieve above needed functionality?
Using Python 2.7 on my computer, threading seems to work with small Python scripts as test_01.py, test_02.py, etc., using the following code:
import time
import thread

def func(argument):
    print('Starting the execution for argument:', argument)
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    arg = ['01', '02', '03']
    for val in arg:
        thread.start_new_thread(func, (val,))
        time.sleep(10)
However, you indicated that you kept getting a memory exception error. This is likely due to your scripts using more stack memory than was allocated to them, as each thread is allocated 8 kb by default (on Linux). You could attempt to give them more memory by calling
thread.stack_size([size])
which is outlined here: https://docs.python.org/2/library/thread.html
Without knowing the number of threads that you're attempting to create or how memory-intensive they are, it's difficult to say whether a better solution should be sought. Since you seem to be looking into executing multiple scripts essentially independently of one another (no shared data), you could also look into the multiprocessing module here:
https://docs.python.org/2/library/multiprocessing.html
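For example, a multiprocessing variant of the script above might look like this (a sketch using Python 2 syntax to match the question; the test_01.py through test_05.py scripts are assumed to exist alongside it):
import time
from multiprocessing import Process

def func(argument):
    print 'Starting the execution for argument:', argument
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    procs = []
    for val in ['01', '02', '03', '04', '05']:
        p = Process(target=func, args=(val,))
        p.start()              # each script runs in its own process
        procs.append(p)
        time.sleep(60)         # stagger the starts by one minute
    for p in procs:
        p.join()               # wait for all of them to finish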
If you need them to run in parallel, you will need to look into threading. Take a look at https://docs.python.org/3/library/threading.html or https://docs.python.org/2/library/threading.html depending on the version of Python you are using.

Test speed of two scripts

I'd like to test the speed of a bash script and a Python script. How would I get the time it took to run them?
If you're on Linux (or another UN*X), try time:
The time command runs the specified program command with the given arguments. When command finishes, time writes a message to standard error giving timing statistics about this program run. These statistics consist of (i) the elapsed real time between invocation and termination, (ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and (iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2)).
Note that you need to eliminate outside effects; for example, other processes using the same resources can skew the measurement.
I guess that you can use
time ./script.sh
time python script.py
At the beginning of each script output the start time and at the end of each script output the end time. Subtract the times and compare. Or use the time command if it is available as others have answered.
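If you take the in-script route, a minimal sketch of the Python side (the actual work is only a placeholder comment) could be:
from datetime import datetime

start = datetime.now()

# ... the work you want to time goes here ...

end = datetime.now()
print('started: ', start)
print('finished:', end)
print('elapsed: ', end - start)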

Python/PySerial and CPU usage

I've created a script to monitor the output of a serial port that receives 3-4 lines of data every half hour - the script runs fine and grabs everything that comes off the port which at the end of the day is what matters...
What bugs me, however, is that the CPU usage seems rather high for a program that's just monitoring a single serial port: one core is always at 100% usage while this script is running.
I'm basically running a modified version of the code in this question: pyserial - How to Read Last Line Sent from Serial Device
I've tried polling the inWaiting() function at regular intervals and having it sleep when inWaiting() is 0 - I've tried intervals from 1 second down to 0.001 seconds (basically, as often as I can without driving up the cpu usage) - this will succeed in grabbing the first line but seems to miss the rest of the data.
Adjusting the timeout of the serial port doesn't seem to have any effect on CPU usage, nor does putting the listening function into its own thread (not that I really expected a difference, but it was worth trying).
Should python/pyserial be using this much cpu? (this seems like overkill)
Am I wasting my time on this quest / Should I just bite the bullet and schedule the script to sleep for the periods that I know no data will be coming?
Maybe you could issue a blocking read(1) call, and when it succeeds use read(inWaiting()) to get the right number of remaining bytes.
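A rough sketch of that idea, assuming the older pyserial API used in the linked question (the port name and baud rate are placeholders):
import serial

ser = serial.Serial('/dev/ttyUSB0', 9600, timeout=None)  # timeout=None makes read() block

while True:
    first = ser.read(1)               # blocks until a byte arrives; no busy polling
    rest = ser.read(ser.inWaiting())  # then drain whatever else is already buffered
    data = first + rest
    print(repr(data))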
Would a system style solution be better? Create the python script and have it executed via Cron/Scheduled Task?
pySerial shouldn't be using that much CPU, but if it's just sitting there polling for an hour I can see how that may happen. Sleeping may be a better option, in conjunction with periodic wake-ups and polls.
