How to use psutil.get_cpu_percent()? - python

How exactly do you use the function get_cpu_percent()?
My code is:
SDKTestSuite.DijSDK_CalculateFps(int(timeForFPS),int(index),cameraName)
cpuUsage = process.get_cpu_percent()
Here I am calling a function, SDKTestSuite.DijSDK_CalculateFps(), and then calling get_cpu_percent() to get the CPU usage of that call. I am calling this function with different inputs, and sometimes the reported CPU usage is 0.0%, which is not what I expect.
So am I using get_cpu_percent() correctly? How exactly should this function be used? Does the interval parameter matter here?
In its actual definition this function just sleeps for the given interval and compares CPU times, so how does it measure my function call here?

If you read the docs, psutil.cpu_percent will:
Return a float representing the current system-wide CPU utilization as a percentage… When interval is 0.0 or None compares system CPU times elapsed since last call or module import, returning immediately…
I'm pretty sure that's not what you want.
First, if you want to know the CPU usage during a specific call, you have to either (a) call it before and after the function, or (b) call it from another thread or process running in parallel with the call, and for the same time as the call (by passing an interval).
Second, if you want to know how much CPU time that call is using, as opposed to how much that call plus everything else being done by every program on your computer are using, you're not even calling the right function, so there's no way to get it to do what you want.
And there's nothing in psutil that's designed to do that. The whole point of psutil is that it provides information about your system, and about your program from the "outside" perspective. It doesn't know anything about what functions you ran when.
However, there are things that come with the stdlib that do things like that, like resource.getrusage or the cProfile module. Without knowing exactly what you're trying to accomplish, it's hard to tell you exactly what to do, but maybe if you read those linked docs they'll give you some ideas.
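For example, a minimal sketch of option (a) using resource.getrusage (Unix only; busy_work is just a hypothetical stand-in for whatever call you actually want to measure):

import resource

def busy_work(n=10**7):
    # stand-in for the call being measured, e.g. SDKTestSuite.DijSDK_CalculateFps(...)
    return sum(i * i for i in range(n))

def cpu_seconds():
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime + ru.ru_stime  # user + system CPU time of this process, in seconds

before = cpu_seconds()
busy_work()
after = cpu_seconds()
print("CPU seconds consumed by the call: %.3f" % (after - before))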

You need psutil:
pip install psutil
then:
import psutil,os
p = psutil.Process(os.getpid())
print(p.cpu_percent())

os.getpid() returns the PID of the program running this code. For example, when you run it under pythonw.exe, it will return the PID of pythonw.exe.
Hence p.cpu_percent() will return the CPU usage of that process.
If you want system-wide CPU usage, psutil.cpu_percent() can do the job.
Hope this helps, Cheers!

Thanks all...
I got the solution for my query.
>>> p = psutil.Process(os.getpid())
>>> # blocking
>>> p.get_cpu_percent(interval=1)
2.0
>>> # non-blocking (percentage since last call)
>>> p.get_cpu_percent(interval=0)
2.9
Here the interval value matters a lot. The final, non-blocking call reports the CPU usage in percent since the previous call, which is what covers my actual function call.
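For reference, a minimal sketch of that non-blocking pattern around a specific call (note that in current psutil releases the method is spelled cpu_percent() rather than get_cpu_percent(); busy_work is just a placeholder for the real call):

import os
import psutil

def busy_work(n=10**7):
    # placeholder for the function whose CPU usage you want to measure
    return sum(i * i for i in range(n))

p = psutil.Process(os.getpid())
p.cpu_percent(interval=None)          # prime the counter; the very first call returns 0.0
busy_work()
usage = p.cpu_percent(interval=None)  # CPU usage of this process since the priming call
print("CPU usage during the call: %s%%" % usage)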

Related

Why does multiprocessing.Pool run but never terminate?

I'm trying to use mulitprocessing.Pool to speed up the execution of a function across a range of inputs. The processes seem to have been called, since my task manager indicates a substantial increase in my CPU's utilization, but the task never terminates. No exceptions are ever raised, runtime or otherwise.
from multiprocessing import Pool

def f(x):
    print(x)
    return x**2

class Klass:
    def __init__(self):
        pass

    def foo(self):
        X = list(range(1, 1000))
        with Pool(15) as p:
            result = p.map(f, X)

if __name__ == "__main__":
    obj = Klass()
    obj.foo()
    print("All Done!")
Interestingly, despite the uptick in CPU utilization, print(x) never prints anything to the console.
I have moved the function f outside of the class, as was suggested here, to no avail. I have tried adding p.close() and p.join() as well, with no success. Using other Pool class methods like imap leads to TypeError: can't pickle _thread.lock objects errors and seems to take a step away from the example usage in the introduction of the Python multiprocessing documentation.
Adding to the confusion, if I try running the code above enough times (killing the hung kernel after each attempt), the code begins consistently working as expected. It usually takes about twenty attempts before this "clicks" into place. Restarting my IDE reverts the now-functional code back to the former broken state. For reference, I am using the Anaconda Python distribution (Python 3.7) with the Spyder IDE on Windows 10. My CPU has 16 cores, so Pool(15) is not asking for more processes than I have CPU cores. However, running the code in a different IDE, like Jupyter Lab, yields the same broken results.
Others have suggested that this may be a flaw in Spyder itself, but the suggestion to use multiprocessing.Pool instead of multiprocessing.Process doesn't seem to work either.
Could be related to this note from the Python docs:
Note: Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines; however, it is worth pointing out here. This means that some examples, such as the multiprocessing.pool.Pool examples, will not work in the interactive interpreter.
and then this comment on their example:
If you try this it will actually output three full tracebacks interleaved in a semi-random fashion, and then you may have to stop the master process somehow.
UPDATE:
The info found here seems to confirm that using the pool from an interactive interpreter will have varying success. This guidance is also shared:
...guidance [is] to always use functions/classes whose definitions are importable.
This is the solution outlined here and which works for me (every time) using your code.
This seems like it might be a problem with both Spyder and Jupyter. If you run the above code in the console directly, everything works as intended.
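A minimal sketch of that guidance, assuming you move the worker function into its own file (here hypothetically named workers.py) so that child processes can import it:

# workers.py
def f(x):
    return x ** 2

# main.py
from multiprocessing import Pool
from workers import f  # imported from a module, not defined in the interactive session

if __name__ == "__main__":
    with Pool(15) as p:
        result = p.map(f, range(1, 1000))
    print("All Done!")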

How to do a parallel computation in Python, specifying exactly that CPU no. i does task i?

Define the following function, which adds up the natural numbers below the bound you ask for.
def f(x):
    lo = 0
    for i in range(x):
        lo += i
    return lo
To parallelize it using multiprocessing.dummy, I wrote the following:
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)

def f_parallel(x1, x2, x3, x4):
    listo_parallel = [x1, x2, x3, x4]
    resulto_parallel = pool.map(f, listo_parallel)
    return resulto_parallel
It works, but I don't see any reduction in computation time. To check, I defined the following functions, which report the computation time as well.
import time

def f_t(x):
    st = time.time()
    lob = f(x)
    st = time.time() - st
    return lob, st

def f_parallel_t(x1, x2, x3, x4):
    listo_parallel = [x1, x2, x3, x4]
    st = time.time()
    resulto_parallel = pool.map(f, listo_parallel)
    st = time.time() - st
    return resulto_parallel, st
Now let's test it. For x = 10**7, 9**7, 10**7-2, 10**6, the plain f takes 0.53, 0.24, 0.53 and 0.04 seconds respectively, yet f_parallel on all four of them takes 1.39 seconds! I expected to see about 0.53 seconds, because the computer I used has 4 CPUs and I chose 4 in the pool. So why does it behave like this?
I also tried to read the documentation of the multiprocessing library for Python 3.7, but its examples only work if I type them exactly the way they are written there. For example, consider the first example in that document. If I type
from multiprocessing import Pool
Pool(4).map(f,[10**7,9**7,10**7-2,10**6])
Then nothing happens and I have to restart the shell (Ctrl+F6).
And this pool.map approach is not really what I want anyway: I want to tell Python to do f(x_i) exactly on CPU no. i, so that I know which part of my computation is being done on which CPU at every step of my program.
Any help or guidance will be appreciated.
In case it isn't clear what I really want to do with Python, I am uploading a screenshot of the Maple file I just made, which does exactly what I want to do with Python and am asking about in this question.
In CPython, more or less the "standard" implementation, only one thread at a time can be executing Python bytecode.
So using threads to speed up computations won't work in CPython.
You could use multiprocessing.Pool instead. In general I would recommend using the Pool's imap_unordered method instead of plain map. The former will start yielding values as soon as they become available, while the latter returns a list after all calculations are done.
Getting to the core of your question, Python does not have a platform-independent way to specify on which CPU a process it starts will run. How so-called processor affinity works is very operating-system dependent, as you can see on the linked page. Of course you could use subprocess to run one of the mentioned utility programs, or you could use ctypes to execute the relevant system calls directly.
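For example, on Linux you can pin each worker process to a specific core with os.sched_setaffinity (psutil also has a Process.cpu_affinity() method on some platforms). A rough sketch, not a guarantee of anything beyond restricting each worker to one CPU:

import os
from multiprocessing import Pool

def f(x):
    lo = 0
    for i in range(x):
        lo += i
    return lo

def pinned_f(args):
    cpu, x = args
    os.sched_setaffinity(0, {cpu})  # Linux only: restrict this worker to CPU number `cpu`
    return f(x)

if __name__ == '__main__':
    inputs = [10**7, 9**7, 10**7 - 2, 10**6]
    with Pool(processes=4) as pool:
        print(pool.map(pinned_f, list(enumerate(inputs))))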
Thanks to #FlyingTeller and #quamrana, who answered my other question, I now know how to implement the Python program so that the four computations run in parallel and take only about as long as the longest of the four separate computations. Here is the corrected code:
def f(x):
    lo = 0
    for i in range(x):
        lo += i
    return lo

from multiprocessing import Pool

def f_parallel(x1, x2, x3, x4):
    with Pool(processes=4) as pool:
        resulto_parallel = pool.map(f, [x1, x2, x3, x4])
    return resulto_parallel

import time

def f_parallel_t(x1, x2, x3, x4):
    st = time.time()
    ans = f_parallel(x1, x2, x3, x4)
    st = time.time() - st
    return ans, st

if __name__ == '__main__':
    print(f_parallel_t(10**7, 10**6, 10**7-2, 9**7))
And the screenshot of the result when I run it:

Millisecond-precise Python timer

I'm working on a Python script designed to control multiple actuators at the same time. To simplify, let's say I need to control 2 electric motors.
With the multiprocessing module I create a process for each motor, and a process to save the data to a sheet.
This part of the script works fine; however, I need to command my motors at precise times, every millisecond, and the time.time() and time.clock() functions seem to be unreliable (triggering anywhere between 0.05 and 30 milliseconds!).
Is it "normal" for these functions to be so erratic, or is this caused by another part of my script?
EDIT: I used the datetime function (see below) to improve the precision, but I still get several discrete levels of error. For example, if I ask for 1 ms I also get 1.25, 0.75, 1.5...
So IMO this is due to the computer hardware (as Serge Ballesta said).
As I "only" need relative time (1 ms between each command), do you know a way to do that precisely?
The best you can hope for is datetime.datetime.now(), which will give you microsecond resolution:
>>> import datetime
>>> datetime.datetime.now()
datetime.datetime(2014, 7, 15, 14, 31, 1, 521368)
>>> i = datetime.datetime.now()
>>> q = datetime.datetime.now()
>>> (i-q).seconds
86389
>>> (i-q).microseconds
648299
>>> i.microsecond
513160
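If all you need is relative spacing between commands, a rough sketch along these lines measures the gap between successive commands with datetime (the OS scheduler can still stretch any individual iteration, as the next answer explains):

import datetime

target = datetime.timedelta(milliseconds=1)  # desired spacing between commands
last = datetime.datetime.now()

for _ in range(10):
    while datetime.datetime.now() - last < target:
        pass  # busy-wait until 1 ms has elapsed since the previous command
    now = datetime.datetime.now()
    print("gap: %.3f ms" % ((now - last).total_seconds() * 1000))
    last = now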
IMHO the problem is not in your script but more probably in your machine. I assume you are using a standard computer running Linux or Windows, and even if the computer is able to do many things in a single millisecond, it is constantly busy with:
network
mouse or keyboard events
screen (including screen savers ...)
antivirus software (mainly on Windows)
system or daemon processes (and there are plenty of them)
multi-task management
You cannot reliably hope for one-millisecond accuracy without dedicated hardware, or without using a real-time system.
Edit:
As a complement, here is a quote from an answer by Ulrich Eckhardt to this other post, Inconsistent Python Performance on Windows:
You have code that performs serial IO, which will block your process and possibly invoke the scheduler. If that happens, it will first give control to a different process and only when that one yield or exceeds its timeslice it will re-schedule your process.
The question for you is: What is the size of the scheduler timeslice of the systems you are running? I believe that this will give you an insight into what is happening.
I can't comment yet (reputation), so I'll give you an answer.
It's just a clue: try the cProfile module.
https://docs.python.org/2/library/profile.html
It lets you check how long your script takes to execute, and how long every function in the script takes. The run() function of the cProfile module returns precise statistics about your script.
Maybe it can help you.
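A minimal sketch of that suggestion, assuming your timing-critical code lives in a function (main_loop here is just a hypothetical name):

import cProfile

def main_loop():
    total = 0
    for i in range(10**6):  # stand-in for the real control loop
        total += i
    return total

# prints call counts and cumulative times per function, sorted by cumulative time
cProfile.run("main_loop()", sort="cumulative")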

Python - Working around memory leaks

I have a Python program that runs a series of experiments, with no data intended to be stored from one test to another. My code contains a memory leak which I am completely unable to find (I've looked at the other threads on memory leaks). Due to time constraints, I have had to give up on finding the leak, but if I were able to isolate each experiment, the program would probably run long enough to produce the results I need.
Would running each test in a separate thread help?
Are there any other methods of isolating the effects of a leak?
Detail on the specific situation
My code has two parts: an experiment runner and the actual experiment code.
Although no globals are shared between the code for running all the experiments and the code used by each experiment, some classes/functions are necessarily shared.
The experiment runner isn't just a simple for loop that can be easily put into a shell script. It first decides on the tests which need to be run given the configuration parameters, then runs the tests then outputs the data in a particular way.
I tried manually calling the garbage collector in case the issue was simply that garbage collection wasn't being run, but this did not work.
Update
Gnibbler's answer allowed me to find out that my ClosenessCalculation objects, which store all of the data used during each calculation, were not being freed. I then used that to manually delete some references, which seems to have fixed the memory issue.
You can use something like this to help track down memory leaks
>>> from collections import defaultdict
>>> from gc import get_objects
>>> before = defaultdict(int)
>>> after = defaultdict(int)
>>> for i in get_objects():
...     before[type(i)] += 1
...
now suppose the tests leaks some memory
>>> leaked_things = [[x] for x in range(10)]
>>> for i in get_objects():
...     after[type(i)] += 1
...
>>> print [(k, after[k] - before[k]) for k in after if after[k] - before[k]]
[(<type 'list'>, 11)]
11 because we have leaked one list containing 10 more lists
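The same idea can be wrapped into a small helper so you can drop it between experiments; a sketch (the names are only illustrative):

from collections import defaultdict
from gc import get_objects

def snapshot():
    counts = defaultdict(int)
    for obj in get_objects():
        counts[type(obj)] += 1
    return counts

def grown_types(before, after):
    # types whose instance count increased between the two snapshots
    return {k: after[k] - before[k] for k in after if after[k] - before[k] > 0}

before = snapshot()
leaked_things = [[x] for x in range(10)]  # simulate a leak
print(grown_types(before, snapshot()))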
Threads would not help. If you must give up on finding the leak, then the only solution to contain its effect is running a new process once in a while (e.g., when a test has left overall memory consumption too high for your liking -- you can determine VM size easily by reading /proc/self/status in Linux, and other similar approaches on other OS's).
Make sure the overall script takes an optional parameter to tell it what test number (or other test identification) to start from, so that when one instance of the script decides it's taking up too much memory, it can tell its successor where to restart from.
Or, more solidly, make sure that as each test is completed its identification is appended to some file with a well-known name. When the program starts it begins by reading that file and thus knows what tests have already been run. This architecture is more solid because it also covers the case where the program crashes during a test; of course, to fully automate recovery from such crashes, you'll want a separate watchdog program and process to be in charge of starting a fresh instance of the test program when it determines the previous one has crashed (it could use subprocess for the purpose -- it also needs a way to tell when the sequence is finished, e.g. a normal exit from the test program could mean that while any crash or exit with a status != 0 signify the need to start a new fresh instance).
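As a rough sketch of that "well-known file" variant (run_test, completed.txt and the test names are all hypothetical, just to show the shape of the idea):

import os

DONE_FILE = "completed.txt"
ALL_TESTS = ["test_%d" % i for i in range(100)]

def run_test(name):
    pass  # placeholder for one real experiment

def already_done():
    if not os.path.exists(DONE_FILE):
        return set()
    with open(DONE_FILE) as f:
        return set(line.strip() for line in f)

if __name__ == "__main__":
    done = already_done()
    for name in ALL_TESTS:
        if name in done:
            continue
        run_test(name)
        with open(DONE_FILE, "a") as f:
            f.write(name + "\n")  # record completion immediately, so a crash loses at most one test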
If these architectures appeal but you need further help implementing them, just comment to this answer and I'll be happy to supply example code -- I don't want to do it "preemptively" in case there are as-yet-unexpressed issues that make the architectures unsuitable for you. (It might also help to know what platforms you need to run on).
I had the same problem with a third-party C library which was leaking. The cleanest work-around I could think of was to fork and wait. The advantage is that you don't even have to create a separate process after each run; you can define the size of your batch.
Here's a general solution (if you ever find the leak, the only change you need to make is to change run() to call run_single_process() instead of run_forked() and you'll be done):
import os, sys

batchSize = 20

class Runner(object):
    def __init__(self, dataFeedGenerator, dataProcessor):
        self._dataFeed = dataFeedGenerator
        self._caller = dataProcessor
        self._child_pid = None

    def run(self):
        self.run_forked()

    def run_forked(self):
        dataFeed = self._dataFeed
        dataSubFeed = []
        for i, dataMorsel in enumerate(dataFeed, 1):
            dataSubFeed.append(dataMorsel)
            if i % batchSize == 0:
                # hand the completed batch to a forked child
                self._dataFeed = dataSubFeed
                self.fork()
                dataSubFeed = []
                if self._child_pid == 0:
                    self.run_single_process()
                self.endBatch()
        if dataSubFeed:
            # process whatever is left over after the last full batch
            self._dataFeed = dataSubFeed
            self.fork()
            if self._child_pid == 0:
                self.run_single_process()
            self.endBatch()

    def run_single_process(self):
        for dataMorsel in self._dataFeed:
            self._caller(dataMorsel)

    def fork(self):
        self._child_pid = os.fork()

    def endBatch(self):
        if self._child_pid != 0:
            os.waitpid(self._child_pid, 0)  # parent waits for the child to finish its batch
        else:
            sys.exit()  # exit from the child when done
This isolates the memory leak to the child processes, and no child ever processes more than batchSize items before exiting, which bounds how much memory can leak.
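For example, a hypothetical usage might look like this (a generator of inputs plus a processing callable):

def data_feed():
    for i in range(100):  # whatever produces your experiment inputs
        yield i

def process(item):
    _ = [x * x for x in range(item * 1000)]  # stand-in for the leaky work

if __name__ == "__main__":
    Runner(data_feed(), process).run()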
I would simply refactor the experiments into individual functions (if they aren't already), then accept an experiment number from the command line and call the single experiment function.
Then just bodge up a shell script as follows:
#!/bin/bash
for expnum in 1 2 3 4 5 6 7 8 9 10 11 ; do
python youProgram ${expnum} otherParams
done
That way, you can leave most of your code as-is and this will clear out any memory leaks you think you have in between each experiment.
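On the Python side, the script above only assumes an entry point roughly like this (the module name and experiment functions are hypothetical):

# youProgram.py
import sys

def experiment_1(params):
    pass  # one self-contained experiment

def experiment_2(params):
    pass

EXPERIMENTS = {1: experiment_1, 2: experiment_2}  # extend as needed

if __name__ == "__main__":
    expnum = int(sys.argv[1])
    EXPERIMENTS[expnum](sys.argv[2:])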
Of course, the best solution is always to find and fix the root cause of a problem but, as you've already stated, that's not an option for you.
Although it's hard to imagine a memory leak in Python, I'll take your word on that one - you may want to at least consider the possibility that you're mistaken there, however. Consider raising that in a separate question, something that we can work on at low priority (as opposed to this quick-fix version).
Update: Making this community wiki, since the question has changed somewhat from the original. I'd delete the answer but for the fact I still think it's useful: you could do the same to your experiment runner as I proposed with the bash script; you just need to ensure that the experiments run as separate processes so that memory leaks don't accumulate (if the memory leaks are in the runner, you're going to have to do root-cause analysis and fix the bug properly).

cProfile and Python: Finding the specific line number that code spends most time on

I'm using cProfile, pstats and Gprof2dot to profile a rather long python script.
The results tell me that the most time is spent calling a method in an object I've defined. However, what I would really like is to know exactly what line number within that function is eating up the time.
Any ideas on how to get this additional information?
(By the way, I'm using Python 2.6 on OSX snow leopard if that helps...)
There is a line profiler for Python written by Robert Kern.
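If you install that package (line_profiler), the usual workflow is to decorate the hot function with @profile and run the script under kernprof; a minimal sketch, with hot_method as a hypothetical name:

# profile_me.py  --  run with:  kernprof -l -v profile_me.py
@profile  # injected by kernprof at runtime, no import needed
def hot_method():
    total = 0
    for i in range(10**6):  # per-line timings will be reported for these lines
        total += i * i
    return total

if __name__ == "__main__":
    hot_method()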
cProfile does not track line numbers within a function; it only tracks the line number of where the function was defined.
cProfile attempts to duplicate the behavior of profile (which is pure Python). profile uses pstats to store the data from running, and pstats only stores line numbers for function definitions, not for individual Python statements.
If you need to figure out with finer granularity what is eating all your time, then you need to refactor your big function into several, smaller functions.
Suppose the amount of time being "eaten up" is some number, like 40%. Then if you just interrupt the program or pause it at a random time, the probability is 40% that you will see it, precisely exposed on the call stack. Do this 10 times, and on 4 samples, +/-, you will see it.
This tells why it works. This is an example.
