I have searched the internet but found nothing related to this.
I have two large Python dictionaries that contain more than 2 million key-value pairs. My computer shows 100% CPU utilization whenever I do any kind of computation on this data.
Because of this, I am not able to perform other tasks on my system, as it hangs frequently.
Is there any way to restrict the maximum CPU allocation for a Python program by writing some code in the program itself? I do not want to allow this program to use 100% of the CPU time.
PS: I am currently using a sleep() call to throttle it, but that feels like a hack. I am using Windows 7.
If you are on a Linux/Unix platform you can use nice to reduce the priority of your process.
This only helps if it is the CPU that is maxed out. If you are waiting on disk/swap I/O, for example, nice really won't help.
NICE(1) User Commands NICE(1)
NAME
nice - run a program with modified scheduling priority
SYNOPSIS
nice [OPTION] [COMMAND [ARG]...]
DESCRIPTION
Run COMMAND with an adjusted niceness, which affects process scheduling. With no COMMAND, print the current niceness. Nicenesses range from -20 (most favorable scheduling) to 19 (least favorable).
For Windows try the START command
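If you want to do this from inside the Python program itself, as the question asks, one option is to lower the process priority programmatically. Below is a minimal sketch, assuming the third-party psutil package is installed (the Windows priority constants only exist in psutil on Windows); note that lowering priority does not cap CPU usage, it just lets other programs run first, which is usually what keeps the desktop responsive.

import sys

import psutil  # third-party: pip install psutil

def lower_own_priority():
    # Ask the OS to deprioritise this process so other programs stay responsive.
    p = psutil.Process()  # current process
    if sys.platform.startswith('win'):
        # Windows exposes discrete priority classes.
        p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
    else:
        # On Unix, a higher niceness means a lower priority (like the nice command).
        p.nice(10)

if __name__ == '__main__':
    lower_own_priority()
    # ... run the heavy dictionary computation here ...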
I work for a digital marketing agency with multiple clients. In one of the projects, I have a very resource-intensive Python script (it fetches data for Facebook ads) that has to be run for all those clients (500+ of them) on an Ubuntu 16.04 server.
Originally the script took around 2 minutes to complete for 1 client, using 300 MB RES and 1000 MB VM (as per htop). I then optimized it with ThreadPoolExecutor (max_workers=10) so that the script can run for roughly 4 clients concurrently.
Then I found that sometimes the script froze during a run (basically it sat in a "comatose state"). I debugged and profiled it and found that it was not the script causing the issue, but the system.
Then I batched the runs: if there are 20 input clients, I ran 5 instances of the script (4*5=20). Sometimes this went fine, but sometimes the last instance froze.
Then I found that the 2 GB of RAM was being exhausted, so I increased swap space from 0 to 1 GB. That did the trick, but if a few clients are heavy on memory, the same thing happens.
I have attached a screenshot of the latest run, where after starting 8 instances the last 2 froze. They can stay frozen for days.
I am thinking of increasing the server RAM from 2 GB to 4 GB, but I am not sure that is a permanent solution. Has anyone faced a similar issue?
You need to fix the RAM consumption of your script.
If your script allocates more memory than your system can provide, it gets memory errors; if that happens inside thread pools or similar constructs, the threads may never return under some circumstances.
You can address this by using async functions with timeouts and implementing automatic restart handlers for the case where a process does not yield the expected result.
The best way to do that depends heavily on the script and will probably require altering existing code.
The issue is definitely with your script and not with the OS.
The fastest workaround would be to increase the system memory or to reduce the number of threads.
If just adding 1 GB of swap area "almost" did the trick, then increasing the physical memory is definitely a good way to go. By the way, remember that swapping means you are using disk storage, whose speed is measured in milliseconds, while RAM speed is measured in nanoseconds, so avoiding swap guarantees a performance boost.
And then, reboot your system every now and then. Although Linux is far better than Windows in this respect, memory leaks do occur in Linux too, and a reboot every few months will surely help.
As Gornoka stated, you need to reduce the memory consumption of the script. As an added detail, this can also be done by removing declared variables once they have been used, with the keyword
del
You can also make sure that if the script is processing massive files, it does so line by line, saving the output as each line is finished.
I have had this happen, and it is usually an indicator of working with too much data at once in RAM. It is always better to work with the data in parts whenever possible; if that is not possible, get more RAM.
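For illustration, a minimal sketch of both ideas; the file names and the per-line transformation are made up, only the pattern matters:

def process_large_file(path, out_path):
    # Stream the input line by line instead of loading the whole file into RAM.
    with open(path) as src, open(out_path, 'w') as dst:
        for line in src:
            dst.write(line.strip().upper() + '\n')  # hypothetical per-line work, saved immediately

if __name__ == '__main__':
    process_large_file('clients.txt', 'clients_out.txt')  # hypothetical file names

    # Drop large intermediate objects as soon as they are no longer needed.
    big_lookup = dict((i, str(i)) for i in range(10 ** 6))
    total = sum(len(v) for v in big_lookup.values())
    del big_lookup  # release the reference so the memory can be reclaimed
    print(total)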
I have implemented a Kalman filter. I want to find out how much CPU energy is being consumed by my script. I have checked other posts on Stack Overflow and, following them, I downloaded the psutil library. Now I am unsure where to put the statements to get the correct answer. Here is my code:
import os

import psutil

if __name__ == "__main__":
    # kalman code

    pid = os.getpid()
    py = psutil.Process(pid)
    current_process = psutil.Process()
    memoryUse = py.memory_info()[0] / 2. ** 30  # memory use in GB...I think
    print('memory use:', memoryUse)
    print(current_process.cpu_percent())
    print(psutil.virtual_memory())  # physical memory usage
Please let me know whether I am headed in the right direction or not.
The above code generated the following results.
('memory use:', 0.1001129150390625)
0.0
svmem(total=6123679744, available=4229349376, percent=30.9, used=1334358016, free=3152703488, active=1790803968, inactive=956125184, buffers=82894848, cached=1553723392, shared=289931264, slab=132927488)
Edit: Goal: find out the energy consumed by the CPU while running this script.
I'm not sure whether you're trying to find the peak CPU usage during execution of that #kalman code, or the average, or the total, or something else?
But you can't get any of those by calling cpu_percent() after it's done. You're just measuring the CPU usage of calling cpu_percent().
Some of these, you can get by calling cpu_times(). This will tell you how many seconds were spent doing various things, including total "user CPU time" and "system CPU time". The docs indicate exactly which values you get on each platform, and what they mean.
In fact, if this is all you want (including only needing those two values), you don't even need psutil; you can just use os.times() in the standard library.
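For example, a minimal sketch of measuring CPU time around a block of work with os.times() (the squaring loop is just a stand-in for the Kalman code):

import os

start = os.times()
total = sum(i * i for i in range(10 ** 7))  # stand-in for the real work
end = os.times()

print('user CPU seconds:', end[0] - start[0])    # time spent in your own Python code
print('system CPU seconds:', end[1] - start[1])  # time spent in the kernel on your behalf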
Or you can just use the time command from outside your program, by running time python myscript.py instead of python myscript.py.
For others, you need to call cpu_percent while your code is running. One simple solution is to run a background process with multiprocessing or concurrent.futures that just calls cpu_percent on your main process. To get anything useful, the child process may need to call it periodically, aggregating the results (e.g., to find the maximum), until it's told to stop, at which point it can return the aggregate.
Since this is not quite trivial to write, and definitely not easy to explain without knowing how much familiarity you have with multiprocessing, and there's a good chance cpu_times() is actually what you want here, I won't go into details unless asked.
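For reference, one rough sketch of that background-sampler idea, assuming psutil is installed; the helper name sample_cpu and the 0.1-second polling interval are arbitrary choices:

import multiprocessing as mp
import os
import time

import psutil

def sample_cpu(target_pid, stop_event, result_queue):
    # Child process: poll the parent's CPU usage and keep the peak value.
    proc = psutil.Process(target_pid)
    proc.cpu_percent()                  # the first call just primes the measurement
    peak = 0.0
    while not stop_event.is_set():
        time.sleep(0.1)
        peak = max(peak, proc.cpu_percent())
    result_queue.put(peak)

if __name__ == '__main__':
    stop = mp.Event()
    results = mp.Queue()
    sampler = mp.Process(target=sample_cpu, args=(os.getpid(), stop, results))
    sampler.start()

    total = sum(i * i for i in range(10 ** 7))   # stand-in for the Kalman code

    stop.set()
    sampler.join()
    print('peak CPU percent:', results.get())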
I have been through other answers on SO about real, user and sys times. In this question, apart from the theory, I am interested in understanding the practical implications of the times reported for two different processes achieving the same task.
I have a Python program and a NodeJS program, https://github.com/rnanwani/vips_performance. Both work on a set of input images and process them to obtain different outputs, and both use libvips implementations.
Here are the times for the two
Python
real 1m17.253s
user 1m54.766s
sys 0m2.988s
NodeJS
real 1m3.616s
user 3m25.097s
sys 0m8.494s
The real time (the wall-clock time, as per other answers) is lower for NodeJS, which as per my understanding means that the entire process from input to output finishes more quickly with NodeJS. But the user and sys times are very high compared to Python. Also, using the htop utility, I see that the NodeJS process has a CPU usage of about 360% during the entire run, maxing out the 4 cores. Python, on the other hand, has a CPU usage from 250% down to 120% during the run.
I want to understand a couple of things
Does a smaller real time and a higher user+sys time mean that the process (in this case Node) utilizes the CPU more efficiently to complete the task sooner?
What is the practical implication of these times - which is faster/better/would scale well as the number of requests increase?
My guess would be that node is running more than one vips pipeline at once, whereas python is strictly running one after the other. Pipeline startup and shutdown is mostly single-threaded, so if node starts several pipelines at once, it can probably save some time, as you observed.
You load your JPEG images in random access mode, so the whole image will be decompressed to memory with libjpeg. This is a single-threaded library, so you will never see more than 100% CPU use there.
Next, you do resize/rotate/crop/jpegsave. Running through these operations, resize will thread well, with the CPU load increasing as the square of the reduction, the rotate is too simple to have much effect on runtime, and the crop is instant. Although the jpegsave is single-threaded (of course) vips runs this in a separate background thread from a write-behind buffer, so you effectively get it for free.
I tried your program on my desktop PC (six hyperthreaded cores, so 12 hardware threads). I see:
$ time ./rahul.py indir outdir
clearing output directory - outdir
real 0m2.907s
user 0m9.744s
sys 0m0.784s
That looks like we're seeing 9.7 / 2.9, or about a 3.4x speedup from threading, but that's very misleading. If I set the vips threadpool size to 1, you see something closer to the true single-threaded performance (though it still uses the jpegsave write-behind thread):
$ export VIPS_CONCURRENCY=1
$ time ./rahul.py indir outdir
clearing output directory - outdir
real 0m18.160s
user 0m18.364s
sys 0m0.204s
So we're really getting 18.1 / 2.97, or a 6.1x speedup.
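If you want to repeat that single-threaded experiment from Python rather than the shell, one option is to set the variable before libvips is loaded. This is only a sketch and assumes the script uses the pyvips binding; the file names are placeholders:

import os

# Must be set before libvips is initialised, or it has no effect.
os.environ['VIPS_CONCURRENCY'] = '1'

import pyvips

image = pyvips.Image.new_from_file('input.jpg', access='random')
image = image.resize(0.5)
image.write_to_file('output.jpg')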
Benchmarking is difficult and real/user/sys can be hard to interpret. You need to consider a lot of factors:
Number of cores and number of hardware threads
CPU features like SpeedStep and TurboBoost, which will clock cores up and down depending on thermal load
Which parts of the program are single-threaded
IO load
Kernel scheduler settings
And I'm sure many others I've forgotten.
If you're curious, libvips has its own profiler which can help give more insight into the runtime behaviour. It can show you graphs of the various worker threads, how long they are spending in synchronisation, how long in housekeeping, how long actually processing your pixels, when memory is allocated, and when it finally gets freed again. There's a blog post about it here:
http://libvips.blogspot.co.uk/2013/11/profiling-libvips.html
Does a smaller real time and a higher user+sys time mean that the process (in this case Node) utilizes the CPU more efficiently to complete the task sooner?
It doesn't necessarily mean they utilise the processor(s) more efficiently.
The higher user time means that Node is using more user-space processor time, and in turn completing the task more quickly. As stated by Luke Exton, the CPU is spending more time on "Code you wrote/might look at".
The higher sys time means there is more context switching happening, which makes sense given your htop utilisation numbers. This means the scheduler (kernel process) is jumping between operating system actions and user-space actions. This is the time spent finding a CPU to schedule the task onto.
What is the practical implication of these times - which is faster/better/would scale well as the number of requests increase?
The question of implementation is a long one and has many caveats. I would assume from the Python vs Node numbers that the Python threads run longer and in turn do more processing inline. Another thing to note is the GIL in Python: essentially Python is a single-threaded application, and you can't easily break out of this. This could be a contributing factor to the Node implementation being quicker (using real threads).
The Node program appears to be written to be correctly threaded and to split many tasks out. The advantages of a highly threaded application have a tipping point where you spend MORE time trying to find a free CPU for a new thread than actually doing the work. When that happens, your Python implementation might start being faster again.
The higher user+sys time means that the process had more running threads, and as you noticed from the 360% figure, it used almost all the available CPU resources of your 4 cores. That means the NodeJS process is already limited by available CPU resources and unable to process more requests. Also, any other CPU-intensive process that you eventually run on that machine will hurt your NodeJS process. On the other hand, the Python process does not take all the available CPU resources and could probably scale with the number of requests.
So these times are not reliable in and of themselves; they say how long the process took to perform an action on the CPU. This is coupled very tightly to whatever else was happening at the same time on that machine and can fluctuate wildly based entirely on physical resources.
In terms of these times specifically:
real = Wall Clock time (Start to finish time)
user = Userspace CPU time (i.e. Code you wrote/might look at) e.g. node/python libs/your code
sys = Kernel CPU time (i.e. Syscalls, e.g Open a file from the OS.)
Specifically, a small real time means it actually finished faster. Does that mean it did the job better? Not necessarily; there could simply have been less happening on the machine at the same time, for instance.
In terms of scale, these numbers are a little irrelevant, and it depends on the architecture/bottlenecks. For instance, in terms of scale and specifically, cloud compute, it's about efficiently allocating resources and the relevant IO for each, generally (compute, disk, network). Does processing this image as fast as possible help with scale? Maybe? You need to examine bottlenecks and specifics to be sure. It could for instance overwhelm your network link and then you are constrained there, before you hit compute limits. Or you might be constrained by how quickly you can write to the disk.
One potentially important aspect of this which no one has mentioned is the fact that your library (vips) will itself launch threads:
http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/using-threads.html
When libvips calculates an image, by default it will use as many
threads as you have CPU cores. Use vips_concurrency_set() to change
this.
This explains the thing that initially surprised me the most -- NodeJS should (to my understanding) be pretty much single-threaded, just like Python with its GIL, it being all about asynchronous processing.
So perhaps Python and Node bindings for vips just use different threading settings. That's worth investigating.
(that said, a quick look doesn't find any evidence of changes to the default concurrency levels in either library)
I have a project I am working on for developing a distributed computing model for one of my other projects. One of my scripts is multiprocessed, monitoring for changes in directories, decoding pickles, creating authentication keys for them based on the node to which they are headed, and repickling them. (Edit: These processes all operate in loops.)
My question is: when I look at the Activity Monitor in OS X, the %CPU column displays 100% for the primary processes that run the scripts. The three showing 100% are the manager script and the two nodes (I am simulating the model on one machine; the intent is to move the model to a live cluster network in the future). Is this bad? My overall usage shows 27.50% system, 12.50% user, and 65% idle.
I've attempted to research this myself first, and my only thought is that these numbers indicate the process is utilizing the CPU for the entire time it is alive and is never idle.
Can I please get some clarification?
Update based on comments:
My processes run in an endless loop, monitoring for changes to files in their respective directories, in order to receive new 'jobs' from the manager process/script (a separate computer in the cluster in the project's final implementation). Maybe there is a better way to be waiting for I/O in this form that doesn't require so much processor time?
Solution (if somewhat sub-optimal):
I implemented a time.sleep(0.1) pause at the end of each loop iteration. That brought the CPU usage down to no more than 0.4%.
Still, I am looking for a better way to reduce CPU time without using modules outside the Python standard library. I'd like to avoid a time.sleep(n) pause, because I want the system to be able to respond at any moment, and if the load of input files gets very high I do not want it to waste time sleeping when it could be processing the files.
My processes operate by busy waiting, which is what causes them to use excessive CPU time:
import os

while True:
    files = os.listdir('./sub_directory/')  # poll the directory for new job files
    if files != []:
        do_something()                      # placeholder for the actual processing
This is the basis for each script/process I am executing.
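One standard-library-only alternative to polling (a sketch, not the project's actual code) is to have the manager hand jobs to each worker over a multiprocessing queue. A blocking queue.get() uses essentially no CPU while idle, but the worker still reacts the moment a job arrives:

import multiprocessing as mp

def worker(job_queue):
    while True:
        job = job_queue.get()       # blocks without burning CPU until a job arrives
        if job is None:             # sentinel value tells the worker to shut down
            break
        print('processing', job)    # stand-in for the decode/auth/re-pickle work

if __name__ == '__main__':
    jobs = mp.Queue()
    w = mp.Process(target=worker, args=(jobs,))
    w.start()

    for path in ['job_001.pkl', 'job_002.pkl']:   # hypothetical job file names
        jobs.put(path)
    jobs.put(None)    # ask the worker to exit
    w.join()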
I want to monitor different parts of my project to find out which part is consuming the most CPU. I think it is possible in two ways:
1- Get the CPU usage before each command. Of course this cannot work well, because a command may run for a long time and I can't check the CPU usage during its execution.
2- Create a monitoring daemon which samples that specific process's CPU usage every few milliseconds and logs it somewhere, while simultaneously logging timestamps in my project, and then compare the two.
1- Please let me know if there is any other way to do this.
2- Please tell me how to get a specific process's CPU usage.
I'm using Python 2.6 on Debian Linux.
Gathering data on which parts of a program use the most resources is called profiling. Python has tools for this task in the standard library; see:
http://docs.python.org/2/library/profile.html
If this is not enough, you can search for 'python profiler' to find other tools that better suit your needs.
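For example, a minimal cProfile session (the function names are just placeholders) that prints which functions accumulate the most CPU time:

import cProfile

def slow_part():
    return sum(i * i for i in range(10 ** 6))

def fast_part():
    return sum(range(10 ** 4))

def main():
    slow_part()
    fast_part()

# Prints a table of call counts and CPU time per function,
# so the expensive part of the program stands out.
cProfile.run('main()', sort='cumulative')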
I think it would be useful to log information about your process at regular intervals from the
/proc/PID/
directory. Linux-based systems maintain information about each running process in the virtual "proc" filesystem, including a great deal about its resource utilization. You may want to explore it and find where the kernel stores the information that suits your requirement. Some time back I wrote a shell script that fetches this information about a process (the input is its PID). You can find it on my blog:
http://mantoshopensource.blogspot.in/2011/02/proc-direcory-information.html
Now if you log this information and add some logging to your own program as well, you can verify which part of your program is using the most CPU and memory. Once you have broadly identified the module in your program that is causing the problem, you can go for a dynamic tool or do static code analysis.
Hope you find it of some use.
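For illustration, a minimal sketch of sampling a process's CPU fields from /proc; fields 14 and 15 of /proc/PID/stat are utime and stime in clock ticks, per proc(5), and the indexing below assumes the process name contains no spaces:

import os

def cpu_ticks(pid):
    # utime + stime for the process, in clock ticks.
    with open('/proc/%d/stat' % pid) as f:
        fields = f.read().split()
    return int(fields[13]) + int(fields[14])

pid = os.getpid()
ticks_per_sec = os.sysconf('SC_CLK_TCK')

before = cpu_ticks(pid)
total = sum(i * i for i in range(10 ** 6))   # stand-in for one part of the program
after = cpu_ticks(pid)

print('CPU seconds used by this part:', (after - before) / float(ticks_per_sec))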