I have spent a long time profiling some of my Django code (trying to find the root cause of a performance issue), which converts a queryset to a list of HTML tags.
The queryset in question has around 8000 records, but profiling has shown that the database/SQL is not the issue, as the actual query executes in around 1 millisecond.
After many hours of trying to find the problem I accidentally stumbled upon the fact that running the specific code via Apache/WSGI seems to play a big role in the performance problem.
To narrow the issue down to a single line of Python code, I wrapped it in a "clock" to measure its performance, like so:
import time
start_time = time.clock()
records = list(query_set) # <--- THIS LINE IS THE CULPRIT
total_time = time.clock() - start_time
with open('some_path/debug.txt', 'a') as f:
    f.write('%f\n' % total_time)
Ignoring the fact that I am converting an 8000+ record queryset to a Python list (I have reasons for doing so), the part I want to focus on is the timing differences:
Apache/WSGI: 1.184240000000000403446165365 (seconds)
From a shell: 0.6034849999999999381472548521 (seconds)
The code in question lives in a function that I can conveniently call from a Django management shell and run manually, which allows me to get the second timing metric.
So can someone please tell me what the heck is causing the doubling in execution time of that single line? Obviously a fair bit of built-in Django code is invoked by that line, but how can the way the code is executed make such a massive difference?
How can I get the same performance in Apache/WSGI as I get in the shell?
Note that I am running the Django application in daemon mode (not embedded mode) and the rest of my application is not suffering from performance issues. It's just that I never thought the difference between shell and Apache/WSGI could cause a performance difference for this particular code, let alone one this big.
Update 1
I forgot to mention that I tried running the same code in nginx/uWSGI and the same problem occurred, so it doesn't seem to be Apache that's at fault. Possibly the way WSGI works itself?
Update 2
So the plot thickens...
That list(query_set) line of code is not multi-threaded by me, but if I alter my implementation slightly and do the actual iteration of the queryset in multi-threaded code:
def threaded_func(query_set_slice):  # gets invoked by multiple threads created by me
    start_time = time.clock()
    for record in query_set_slice:
        ... do stuff ...
    total_time = time.clock() - start_time
    with open('some_path/debug.txt', 'a') as f:
        f.write('%f\n' % total_time)
... there is still an execution time difference (between shell and Apache/WSGI), but this time the ratio is reversed.
I.e. The numbers are:
Apache/WSGI: 0.824394 (seconds)
Shell: 1.890437 (seconds)
Funnily enough, as the ratio of the threaded function is the reverse of the previous ratio, the execution time of the function that invokes the threaded function is the same for the two types of invocation (Apache/WSGI and Shell).
Update 3
After doing some research on time.clock() it seems that it can't always be trusted, so I tried time.time() and that produced roughly a 1:1 ratio.
I also tried LineProfiler and that gave a 1:1 ratio as well, so my guess is that time.clock() (which measures CPU time rather than wall-clock time on Unix) was not accounting for context switches and the like.
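For reference, a wall-clock version of the measurement above might look like the following; time.time() measures elapsed time rather than CPU time (on Python 3.3+, time.perf_counter() would be the usual choice, though that is not something tried above):

import time

start_time = time.time()  # wall-clock time, unlike time.clock()'s CPU time on Unix
records = list(query_set)
total_time = time.time() - start_time

with open('some_path/debug.txt', 'a') as f:
    f.write('%f\n' % total_time)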
Related
I've seen this Stack Overflow question, as well as this one. The first says that white space in the task stream comes from being blocked by local work, but stepping through my program, the ~20-second delay occurs right when I call dask.compute() and not in the surrounding code. The asker of that question said their issue was resolved by disabling garbage collection, but this did nothing for me. The second says to check the scheduler profiler, but that doesn't seem to be taking a long time either.
My task graph is dead simple - I'm calling a function on 500 objects with no task dependencies (and I repeat this 3 times; I'll link the functions once I figure out this issue). Here is my dask performance report HTML, and here is the section of code that calls dask.compute().
Any suggestions as to what could be causing this? Any suggestions as to how I can better profile to figure this out?
This doesn't seem to be the main problem, but lines 585/587 will result in the transfer of the computed results to the local machine, which could slow things down or introduce a bottleneck. If the results are not used locally downstream, then one option is to leave the computations on the remote machines by calling client.compute (assuming the client is named client):
# changing 587: preprocessedcases = dask.compute(*preprocessedcases)
preprocessedcases = client.compute(preprocessedcases)  # takes the list itself, no * unpacking
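Note that client.compute returns Future objects rather than concrete results; if the results do turn out to be needed locally later, they can be retrieved with client.gather (standard dask.distributed API, not part of the original suggestion).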
More out of curiosity than anything, I was wondering how I might make a Python script sleep for 1 second without using the time module?
Is there a computation that could be run in a while loop which takes a machine of n processing power a predictable, tunable amount of time?
As mentioned in the comments, regarding the second part of your question:
The processing time depends on the machine (the computer and its configuration) you are working with and the processes active on it. There is no fixed amount of time that an operation takes.
It's been a long time since you could get a reliable delay by executing code that takes a known amount of time to complete. Computers don't work like that any more.
But to answer your first question: you can use a system call and open an OS process to sleep for 1 second, like:
import subprocess

# relies on the external "sleep" command, so this is Unix-only
subprocess.run(["sleep", "1"])
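Not part of the answer above, but another standard-library option that avoids spawning a process (assuming only the time module itself is off-limits) is to wait on a threading event that is never set:

import threading

# an Event that nothing ever sets simply blocks until the timeout expires
threading.Event().wait(1)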
I have implemented a Kalman filter. I want to find out how much CPU energy is being consumed by my script. I have checked other posts on Stack Overflow and, following them, I downloaded the psutil library. Now I am unsure where to put the statements to get the correct answer. Here is my code:
import os
import psutil

if __name__ == "__main__":
    # kalman code
    pid = os.getpid()
    py = psutil.Process(pid)
    current_process = psutil.Process()
    memoryUse = py.memory_info()[0] / 2.**30  # memory use in GB...I think
    print('memory use:', memoryUse)
    print(current_process.cpu_percent())
    print(psutil.virtual_memory())  # physical memory usage
Please let me know whether I am headed in the right direction or not.
The above code generated the following results.
('memory use:', 0.1001129150390625)
0.0
svmem(total=6123679744, available=4229349376, percent=30.9, used=1334358016, free=3152703488, active=1790803968, inactive=956125184, buffers=82894848, cached=1553723392, shared=289931264, slab=132927488)
Edit: Goal: find out energy consumed by CPU while running this script
I'm not sure whether you're trying to find the peak CPU usage during execution of that #kalman code, or the average, or the total, or something else?
But you can't get any of those by calling cpu_percent() after it's done. You're just measuring the CPU usage of calling cpu_percent().
Some of these, you can get by calling cpu_times(). This will tell you how many seconds were spent doing various things, including total "user CPU time" and "system CPU time". The docs indicate exactly which values you get on each platform, and what they mean.
In fact, if this is all you want (including only needing those two values), you don't even need psutil; you can just use os.times() in the standard library.
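For example, a minimal standard-library-only sketch (the comment stands in for your actual Kalman filter code):

import os

# ... run the Kalman filter code here ...

t = os.times()
print('user CPU time:   %.3f s' % t[0])
print('system CPU time: %.3f s' % t[1])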
Or you can just use the time command from outside your program, by running time python myscript.py instead of python myscript.py.
For others, you need to call cpu_percent while your code is running. One simple solution is to run a background process with multiprocessing or concurrent.futures that just calls cpu_percent on your main process. To get anything useful, the child process may need to call it periodically, aggregating the results (e.g., to find the maximum), until it's told to stop, at which point it can return the aggregate.
Since this is not quite trivial to write, and not easy to explain without knowing how familiar you are with multiprocessing (and there's a good chance cpu_times() is actually what you want here), I'll only give a rough sketch below rather than full details.
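A rough sketch of the background-sampler idea, assuming psutil is available; the monitor function, the 0.1 s sampling interval, and the peak aggregation are illustrative choices, not a definitive implementation:

import multiprocessing
import os

import psutil

def monitor(target_pid, stop_event, result_queue):
    # sample the target process's CPU usage until told to stop,
    # keeping only the peak value as the aggregate
    proc = psutil.Process(target_pid)
    proc.cpu_percent()  # the first call just primes the measurement
    peak = 0.0
    while not stop_event.is_set():
        peak = max(peak, proc.cpu_percent(interval=0.1))
    result_queue.put(peak)

if __name__ == '__main__':
    stop_event = multiprocessing.Event()
    result_queue = multiprocessing.Queue()
    sampler = multiprocessing.Process(
        target=monitor, args=(os.getpid(), stop_event, result_queue))
    sampler.start()

    # ... run the Kalman filter code here ...

    stop_event.set()
    print('peak CPU usage: %.1f%%' % result_queue.get())
    sampler.join()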
When I use Ansible's Python API to run a script on remote machines (thousands of them), the code is:
runner = ansible.runner.Runner(
    module_name='script',
    module_args='/tmp/get_config.py',
    pattern='*',
    forks=30
)
then, I use
datastructure = runner.run()
This takes too long. I want to insert each machine's stdout from the datastructure into MySQL. What I want is: as soon as a machine has returned data, insert that data into MySQL, then move on to the next, until all the machines have returned.
Is this a good idea, or is there a better way?
The runner call will not complete until all machines have returned data, couldn't be contacted, or had their SSH sessions time out. Given that this targets thousands of machines and you're only running 30 in parallel (forks=30), it's going to take roughly Time_to_run_script * Num_Machines / 30 to complete. Does this align with your expectations?
You could raise the number of forks to a much higher value to have the runner complete sooner. I've pushed this into the hundreds without much issue.
If you want maximum visibility into what's going on, and aren't sure whether one machine is holding you up, you could run through each host serially in your Python code, as sketched below.
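A rough sketch of that serial approach, inserting into MySQL as each host returns; the MySQLdb driver, the connection details, the results table, and the hosts list are all assumptions here, not part of your code:

import ansible.runner
import MySQLdb  # assumed driver; any DB-API module works the same way

conn = MySQLdb.connect(host='localhost', user='me', passwd='secret', db='configs')
cursor = conn.cursor()

for host in hosts:  # hosts: whatever list your inventory gives you
    result = ansible.runner.Runner(
        module_name='script',
        module_args='/tmp/get_config.py',
        pattern=host,
        forks=1,
    ).run()
    # Runner.run() returns a dict with 'contacted' and 'dark' host results
    contacted = result.get('contacted', {})
    if host in contacted:
        cursor.execute(
            'INSERT INTO results (host, output) VALUES (%s, %s)',
            (host, contacted[host].get('stdout', '')))
        conn.commit()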
FYI - this module and class are completely gone in Ansible 2.0, so you might want to make the jump now to avoid having to rewrite code later.
I am trying to execute a certain number of Python scripts at certain intervals. Each script takes a long time to execute, so I do not want to waste time running them sequentially. I tried this code, but it is not executing them simultaneously; it executes them one by one:
Main_file.py
import time

def func(argument):
    print 'Starting the execution for argument:', argument
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    arg = ['01', '02', '03', '04', '05']
    for val in arg:
        func(val)
        time.sleep(60)
What I want is to kick off by starting the execution of the first file (test_01.py). It will keep executing for some time. After 1 minute has passed, I want to start the simultaneous execution of the second file (test_02.py), which will also keep executing for some time. In this way I want to start all the scripts at gaps of 1 minute.
With the above code, I notice that the execution happens one file after the other and not simultaneously, since the print statements in those files appear one after the other and not interleaved.
How can I achieve the functionality described above?
Using Python 2.7 on my computer, the following threaded version seems to work with small Python scripts such as test_01.py, test_02.py, etc.:
import time
import thread

def func(argument):
    print('Starting the execution for argument:', argument)
    execfile('test_' + argument + '.py')

if __name__ == '__main__':
    arg = ['01', '02', '03']
    for val in arg:
        thread.start_new_thread(func, (val,))
        time.sleep(10)
However, you indicated that you kept getting a memory exception error. This is likely a stack-memory issue: each thread is allocated a fixed-size stack by default (typically 8 MB on Linux), so many threads can exhaust memory, or a script can outgrow a reduced stack. You can adjust the per-thread allocation by calling
thread.stack_size(size)
which is outlined here: https://docs.python.org/2/library/thread.html
Without knowing how many threads you're attempting to create or how memory-intensive they are, it's difficult to say whether a better solution should be sought. Since you seem to be looking at executing multiple scripts essentially independently of one another (no shared data), you could also look into the multiprocessing module, sketched after the link below:
https://docs.python.org/2/library/multiprocessing.html
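A minimal sketch of that multiprocessing variant, reusing the code from the question (processes sidestep both the per-thread stack limits and the GIL):

import time
from multiprocessing import Process

def func(argument):
    print('Starting the execution for argument: %s' % argument)
    execfile('test_' + argument + '.py')  # Python 2; exec(open(...).read()) on 3

if __name__ == '__main__':
    processes = []
    for val in ['01', '02', '03', '04', '05']:
        p = Process(target=func, args=(val,))
        p.start()
        processes.append(p)
        time.sleep(60)  # one-minute gap between starts
    for p in processes:
        p.join()  # wait for all scripts to finish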
If you need them to run in parallel, you will need to look into threading. Take a look at https://docs.python.org/3/library/threading.html or https://docs.python.org/2/library/threading.html, depending on the version of Python you are using.
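For completeness, a minimal sketch using the higher-level threading module, again reusing the code from the question (keep in mind that threads share the GIL, so CPU-bound scripts will not truly run in parallel):

import threading
import time

def func(argument):
    print('Starting the execution for argument: %s' % argument)
    execfile('test_' + argument + '.py')  # Python 2

if __name__ == '__main__':
    for val in ['01', '02', '03', '04', '05']:
        threading.Thread(target=func, args=(val,)).start()
        time.sleep(60)  # one-minute gap between starts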