Parallel Python-C++ program freezes (memory?)

I have a Python program built around a Python-wrapped C++ core. It is written to run in parallel, as it is computationally very expensive, and I'm currently running it remotely on a server under Ubuntu 16.04.
The problem I'm experiencing is that, at a certain number of cycles (let's say 2000 for my test case), it freezes abruptly without any error message. I located the part of the code where it stops: it is a Python function that doesn't contain any for loop (so I assume it is not stuck in a loop). I tried simply commenting that function out of the code, since it only does minor calculations, and now, at the exact same number of cycles, it gets stuck a little further ahead, this time inside the C++ part. I'm starting to suspect that this may be some memory problem related to the server.
Running htop from the terminal while the code is stuck, I can see that the cores involved in the computation are fully loaded, as if they were busy with some unknown calculation. Moreover, the memory used by the process (at least once it is already stuck) is not exhausted, so it may not be a RAM problem either.
I also tried to drastically reduce the number of outputs written at every cycle (which, I admit, were considerable in size), but nothing changed. With the optimal number of processors it takes about 20 minutes to reach the critical point of 2000 cycles, so the problem is not easy to reproduce.
This is the first time I'm experiencing this sort of problem. Is there anything else I can do to pin down the issue?
Thanks for any answers.

Here is something you could try.
Write some code that checks which iteration is taking place and stores all the variables at the start of the 2000th iteration.
Then use the same variable set to run that iteration again.
It won't solve your problem, but it will help in reducing the testing time, and consequently the time it takes for you to find the problem.
If it is definitely a memory issue, the code will not get stuck at 2000 (that's now where you start) but at 4000 instead.
Then you can monitor the memory at the 2000th iteration and replicate the conditions.
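To make that concrete, here is a minimal checkpoint-and-resume sketch using pickle (every name, the state layout and the per-cycle stand-in are placeholders, not the asker's actual code):

import pickle

TOTAL_CYCLES = 4000
CHECKPOINT_CYCLE = 2000                  # the cycle just before the freeze
STATE_FILE = "checkpoint.pkl"            # placeholder path

def do_one_cycle(state):                 # stand-in for the real computation
    state["x"] += 1
    return state

def run(start_cycle=0, state=None):
    state = state if state is not None else {"x": 0}
    for cycle in range(start_cycle, TOTAL_CYCLES):
        if cycle == CHECKPOINT_CYCLE:
            with open(STATE_FILE, "wb") as f:
                pickle.dump({"cycle": cycle, "state": state}, f)
        state = do_one_cycle(state)
    return state

run()                                    # first run: writes the checkpoint at cycle 2000
with open(STATE_FILE, "rb") as f:
    saved = pickle.load(f)
run(start_cycle=saved["cycle"], state=saved["state"])    # replay directly from the checkpoint

If part of the state lives inside the C++ core, pickle may not be able to serialize it directly; in that case the same idea can be applied with whatever native save/restore mechanism the core offers.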

Related

Why does a Python program get faster the more times it is run?

This isn't a problem, I'm just curious. In Atom, after running numerous tests, I realized that each Python program I created ran faster the more times it was run (they did reach a certain equilibrium after a few runs), and I was just wondering why this happens. The programs weren't huge (not more than 100 lines), so my best guess is that the time change can be explained by the initial construction of the variables and general setup, but I'm not sure.
For background, the way I'm getting my times is by using the "script" package by rgbkrk, which handles the output and records the time. I think the code is irrelevant, since numerous different types of tests all yield the same result, so here are just some example times:
Finished in 0.641s
Finished in 0.257s
Finished in 0.06s
Finished in 0.049s
Finished in 0.049s
Finished in 0.058s
I'm not entirely sure why this happens, so an explanation would be helpful. Thanks.
EDIT: Code isn't required; the same thing happens even without any code. I opened Atom and ran an empty file a few times, and the same thing happened (the run time got faster after the first run).
A surprising amount of apparent performance can be traced to how the particular operating system you're running on decides to cache blocks from (relatively slow) disk in memory. On the first run, nothing will be cached. On subsequent runs, depending on what has contended for the operating system's disk cache in the interim, more will be cached. When you run again, not having to go to disk is a big performance win.
What you'll probably find is that if you wait a while, and run some other programs that are disk hungry, the next run of your Python code will be slower.
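You can see the effect directly with a small sketch like this (the file path is a placeholder; pick any reasonably large file). The second read is usually much faster because the blocks are already in the OS disk cache:

import time

PATH = "some_large_file.bin"             # placeholder: any file of a few hundred MB

for attempt in (1, 2):
    start = time.time()
    with open(PATH, "rb") as f:
        while f.read(1024 * 1024):       # read in 1 MB chunks, discarding the data
            pass
    print("read %d finished in %.3fs" % (attempt, time.time() - start))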

Reduce stacktrace of executing frame

I'm doing my best not to be vague here, and this could all be solved with a while loop, but as an exercise I've gone against the best practice of using a while loop in the hope of learning something new.
I'm re-learning some basics of CPU architecture, and thought it'd be a fun project to implement CPU emulation with "actual" JMP logic, or as close to it as possible in software.
However, I'm getting stuck on the rendering part of said logic.
The code is (according to my best judgement) irrelevant to the problem, but to avoid a back and forth, the logic is as follows:
.LDA 0220
.ASL
.BCC FA
All this does is check a register, execute a bit shift to the left, and jump to memory address FA if a status flag is set correctly; if not, it jumps back to .LDA and checks the register again.
The Python implementation does the same with a recursive function call for each step in the code. Needless to say, this is beyond best practices, but I thought it'd be a fun experiment in recursion and call order.
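To illustrate the shape of such a design (the names and the state dictionary below are invented for this example, not the asker's actual code), each handler ends by calling the next one, so every executed instruction costs one extra Python stack frame:

def lda(state):
    state["a"] = state["memory"][0x0220]
    return asl(state)

def asl(state):
    state["carry"] = bool(state["a"] & 0x80)    # bit 7 falls into the carry flag
    state["a"] = (state["a"] << 1) & 0xFF
    return bcc(state)

def bcc(state):
    if not state["carry"]:
        return jump_to_fa(state)                # branch if the carry flag is clear
    return lda(state)                           # otherwise go back and re-check the register

def jump_to_fa(state):
    return state                                # stand-in for the real jump target

state = {"memory": {0x0220: 0x40}, "a": 0, "carry": False}
lda(state)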
If my math checks out, I end up with 16,280 recursive calls before Python simply halts and, after 3 seconds or so, just quits back out to the command prompt.
I've done sys.setrecursionlimit(self.dotcount*self.linecount) in a dirty attempt to increase the recursion limit; the goal here was to be able to execute 81,600 recursions (340x240 pixels, roughly one recursive call per pixel).
According to What is the maximum recursion depth in Python, and how to increase it? this is a bad idea, since the frames are pretty big, so my attempt to remedy this was:
for tb in inspect.stack():
    tb.frame.clear()
I've also tried (to no avail) to use traceback.clear_frames(tb).
The dead end hits me with RuntimeError: cannot clear an executing frame.
My last resort/question is: is it possible to reduce the executing frame to allow for more recursive calls, which I know have a happy ending? I can't see that I'm even close to running out of RAM, and the application isn't running noticeably slowly (I was expecting a slowdown at some point).
If so, how do I free up the stack trace or increase the recursion depth further?
So Python itself doesn't have a mechanism to do tail recursion directly. However, there are some interesting tail-recursion decorators which should help quite a bit. They work by editing the call stack before entering a recursive call again. Very clever!
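If restructuring the recursion is an option, a trampoline achieves a similar effect without touching the call stack at all. Here is a minimal sketch of the idea (a general technique, not one of the decorators mentioned above): the recursive step returns a thunk instead of calling itself, so the Python stack never grows.

def trampoline(fn, *args):
    result = fn(*args)
    while callable(result):           # keep bouncing until a non-callable value comes back
        result = result()
    return result

def countdown(n):                     # toy example standing in for one emulation step
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)   # return a thunk rather than recursing directly

print(trampoline(countdown, 100000))  # far beyond the default recursion limit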

The way pyglet swaps front and back buffers (`flip()`, wrapper for OpenGL's wglSwapLayerBuffers) on Windows can be 100 times too slow

Out of the blue (although I might have missed some automated update), the flip() method of pyglet on my PC became about 100 times slower (my script goes from about 20 to 0.2 FPS, and profiling shows that flip() is to blame).
I don't fully understand this, but since my OS is Windows 10, the method seems to just be a way to run the wglSwapLayerBuffers OpenGL double-buffering cycle from Python. Everything else seems to run at normal speed, including programs that use OpenGL. This has happened before and fixed itself after a restart, so I didn't look into it further at the time.
Now, restarting doesn't change anything. I updated my GPU driver, tried disabling vsync, and looked for unrelated processes that might be using a lot of memory and/or GPU memory. I re-installed the latest stable version of pyglet.
Now I have no idea how to even begin troubleshooting this...
Here's a minimal example that prints about 0.2 for me instead of 20 (frames per second).
import pyglet
from pyglet.gl import *

def timing(dt):
    print(1/dt)

game_window = pyglet.window.Window(1, 1)

if __name__ == '__main__':
    pyglet.clock.schedule_interval(timing, 1/20.0)
    pyglet.app.run()
(Within pyglet.app.run(), profiling shows me that it's the flip() method that takes basically all the time).
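For reference, this is roughly how such a profile can be collected, as a sketch wrapping the minimal example above (the stats filename is arbitrary):

import cProfile
import pstats
import pyglet

def timing(dt):
    print(1/dt)

game_window = pyglet.window.Window(1, 1)
pyglet.clock.schedule_interval(timing, 1/20.0)

# Profile the main loop, then sort by cumulative time to see which calls dominate the run.
cProfile.run('pyglet.app.run()', 'flip_profile.stats')
pstats.Stats('flip_profile.stats').sort_stats('cumulative').print_stats(10)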
Edit: my real script, which displays frequently updated images using pyglet, causes no increase in GPU usage whatsoever (I also checked the effect of a random program (namely Minecraft) to make sure the GPU monitoring tool I use works, and it does cause an increase). I think this rules out the possibility that I somehow don't have enough computing power available due to some unrelated issue.
OK, I found a way to solve my issue in this Google Groups conversation about a different problem with the same method: https://groups.google.com/forum/#!topic/pyglet-users/7yQ9viOu75Y (the changes suggested in claudio canepa's reply, namely making flip() call the GDI version of the same function instead of wglSwapLayerBuffers, bring things back to normal).
I'm still not sure why wglSwapLayerBuffers behaved so oddly in my case. I guess problems like mine are part of the reason why the GDI version is "recommended". Still, understanding why my problem is even possible would be nice, if someone knows what's going on. And having to meddle with a relatively reliable and respected library just to perform one of its most basic tasks feels really dirty; there must be a more sensible solution.

How to trace random MemoryError in python script?

I have a python script, which is used to perform a lab measurement using several devices. The whole setup is rather involved, including communication over serial devices, API calls as well as the use of self-written and commercial drivers. In the end, however, everything boils down to two nested loops, which vary some parameters, collect data and write it to a file.
My problem is that I observe random occurrences of a MemoryError, typically after about 10 hours, equivalent to ~15k runs of the loops. At the moment I have no idea where it comes from or how to trace it further, so I would be happy for suggestions on how to approach the problem. My observations up to this point are as follows.
The error occurs at random states of the program. Different runs will throw the MemoryError at different lines of my script.
There is never any helpful error message. Python only says MemoryError, without any error string. The traceback leads me to some point in the script where memory is needed (e.g. when building a list), but no specific instruction appears to be the problem.
My RAM is far from full. The Python process in question typically consumes a few tens of MB of RAM when viewed in the task manager. In addition, the RAM usage appears to be stable for hours. Usually it increases slowly for some time, only to drop back down to the previous level quickly, which I interpret as the garbage collector kicking in periodically.
So far I have not found any indication of a memory leak. I used memory_profiler to trace the memory usage of my functions and found it to be stable. In addition, I followed this blog entry to observe what the garbage collector does in detail. Again, I could not find any hints of undeleted objects.
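A simplified sketch of that kind of periodic memory monitoring, using tracemalloc rather than memory_profiler (tracemalloc needs Python 3.4+, which may or may not match the 32-bit setup described here; the loop body is a stand-in for the real measurement):

import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

results = []
for run in range(15000):                      # stand-in for the real nested loops
    results.append([run] * 10)                # stand-in for collecting measurement data
    if run % 5000 == 0:
        snapshot = tracemalloc.take_snapshot()
        for stat in snapshot.compare_to(baseline, 'lineno')[:5]:
            print("run", run, stat)           # the source lines whose allocations grew the most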
I am stuck on Win7 x86 due to a driver which will only work on a 32-bit system, so I cannot follow suggestions like this one to move to a 64-bit version of Windows. In any case, I do not see how this would help in my situation.
The IPython console from which the script is launched often behaves strangely after the error has occurred. Sometimes a new MemoryError is thrown even for very simple operations. Often, the console is marked by Windows as "not responding" after some time. A menu pops up where, besides the usual options to wait for the process or to terminate it, there is a third option to "restore" the program (whatever that means). Doing so usually makes the console work normally again.
At this point I am somewhat out of ideas on how to proceed. The general recipe of commenting out parts of the script until it works is highly undesirable in my case. As stated above, each test run takes several hours, meaning a potential downtime of weeks for my lab equipment, so going that direction appears unfeasible to me. Is there any more direct approach to learn what is crashing behind the scenes? How can I understand why Python apparently fails to malloc?

Why does this simple loop produce a "jittery" print?

I'm just learning Python and I have tried this simple loop based on Learn Python The Hard Way. With my basic understanding, this should keep printing "Hello", one letter at a time, at the same position. That seems to be the case, but the printing is not fluid: it doesn't spend the same amount of time on each character; some go very fast, and then it seems to get stuck for one or two seconds on one.
Can you explain why?
while True:
    for i in ["H","e","l","l","o"]:
        print "%s\r" % i,
You are running an infinite loop with very little work done in it, and most of that work is printing. The bottleneck of such an application is how fast your output can be pushed to your running environment (your console).
There are various buffers involved, and the system can also schedule other processes and therefore pause your app for a few cycles.
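One way to see the buffering effect in isolation is to flush stdout explicitly and pace the loop yourself. This is just an illustrative sketch, not a fix prescribed by the answer:

import sys
import time

while True:
    for c in ["H", "e", "l", "l", "o"]:
        sys.stdout.write("%s\r" % c)
        sys.stdout.flush()          # push the character out instead of leaving it buffered
        time.sleep(0.2)             # fixed pacing makes any remaining jitter much less visible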
