I'm trying to discover the cause of delays in Django 1.8 startup, especially, but not only, when run in a debugger (WingIDE 5 and 6 in my case).
Minimal test case: the Django 1.8 tutorial "poll" example, completed just to the first point where 'manage.py runserver' works. All default configuration, using SQLite. Python 3.5.2 with Django 1.8.14, in a fresh venv.
From the command line, on Linux (Mint 18) and Windows (7-64), this may run as fast as 2 seconds to reach the "Starting development server" message. But on Windows it sometimes takes 10+ secs. And in the debugger on both machines, it can take 40 secs.
One specific issue: by placing print statements at the beginning and end of setup() in django/__init__.py, I note that this function is called twice before the "Starting..." message, and again after that message; the first two calls each contribute half the delay. This suggests that Django is getting started three times. What is the purpose of that, or does it indicate a problem?
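A hypothetical equivalent of those print statements, wrapping django.setup() from the top of manage.py instead of editing django/__init__.py directly, would look something like this (names and output format are illustrative only):

import time
import django

_original_setup = django.setup

def timed_setup(*args, **kwargs):
    # timing wrapper standing in for the print statements described above
    t0 = time.time()
    print("django.setup() starting")
    result = _original_setup(*args, **kwargs)
    print("django.setup() finished in %.2f s" % (time.time() - t0))
    return result

django.setup = timed_setup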
(I did find that I could get rid of one of the first two setup() calls by using the runserver --noreload option. But why does it happen in the first place? And there's still a setup() call after the "Starting..." message.)
To summarize the question:
-- Any insights into what might be responsible for the delay?
-- Why does Django need to start three times? (Or twice, even with --noreload.)
A partial answer.
After some time with WingIDE IDE's debugger, and some profiling with cProfile, I have located the main CPU hogging issue.
During initial Django startup there's a cascade of imports, in which the module validators.py prepares some compiled regular expressions for later use. One in particular, URLValidator.regex, is complicated and also involves five instances of the unicode character set (variable ul). This causes re.compile to perform a large amount of processing, notably in sre_compile.py _optimize_charset() and in a large number of calls to the fixup() function.
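A quick way to see this cost in isolation (assuming Django 1.8, where URLValidator.regex is compiled as a class attribute at import time) is to time the import itself:

import time

t0 = time.time()
from django.core import validators   # triggers re.compile of URLValidator.regex
print("importing django.core.validators took %.2f s" % (time.time() - t0))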
As it happens, this particular combination of calls and data structures apparently triggers a specific slowdown in the WingIDE 6.0b2 debugger. It's considerably faster in the WingIDE 5.1 debugger (though still much slower than when run from the command line). I'm not sure why yet, but Wingware is looking into it.
This doesn't explain the occasional slowness when launched from the command line on Windows; there's an outside chance it was waiting for a sleeping drive to wake up. Still observing.
Related
I would like to know what the Python interpreter is doing in my production environments.
Some time ago I wrote a simple tool called live-trace, which runs a daemon thread that collects stack traces every N milliseconds.
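The core idea is small; a minimal Python 3 sketch of such a sampler (not the actual live-trace code) looks roughly like this:

import sys
import threading
import time
import traceback

def sampler(interval=0.03):
    # wake up every `interval` seconds and dump the stack of every other thread
    while True:
        time.sleep(interval)
        for thread_id, frame in sys._current_frames().items():
            if thread_id == threading.get_ident():
                continue  # skip the sampler thread itself
            print("--- thread %s ---" % thread_id, file=sys.stderr)
            traceback.print_stack(frame, file=sys.stderr)

threading.Thread(target=sampler, daemon=True).start()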
But signal handling in the interpreter itself has one disadvantage:
Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time.
Source: https://docs.python.org/2/library/signal.html
How could I work around the above constraint and get a stack trace, even if the interpreter is executing C code for several seconds?
Related: https://github.com/23andMe/djdt-flamegraph/issues/5
I use py-spy with speedscope now. It is a very cool combination.
py-spy works on Windows/Linux/macOS, can output flame graphs on its own, and is actively developed; for example, subprocess profiling support was added in October 2019.
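Typical usage looks like this (the PID is a placeholder): record a running process and open the resulting file in speedscope:

$ py-spy record --pid 12345 --format speedscope -o profile.speedscope.json

Adding --subprocesses includes child processes in the profile as well.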
Have you tried Pyflame? It's based on ptrace, so it shouldn't be affected by CPython's signal handling subtleties.
Maybe perf, together with Brendan Gregg's flame graph tools, can help.
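For example, the usual flame graph workflow samples a running process with perf and folds the output with the scripts from the FlameGraph repository (PID and duration are placeholders; note that plain perf only sees the interpreter's C-level stacks, not Python function names):

$ perf record -F 99 -p 12345 -g -- sleep 30
$ perf script | stackcollapse-perf.pl | flamegraph.pl > profile.svg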
I have a Python script which is used to perform a lab measurement using several devices. The whole setup is rather involved, including communication over serial devices, API calls, as well as the use of self-written and commercial drivers. In the end, however, everything boils down to two nested loops, which vary some parameters, collect data and write it to a file.
My problem is that I observe random occurrences of a MemoryError, typically after 10 hours, equivalent to ~15k runs of the loops. At the moment I have no idea where it comes from or how I can trace it further, so I would be happy for suggestions on how to work on my problem. My observations up to this point are as follows.
The error occurs at random states of the program. Different runs will throw the MemoryError at different lines of my script.
There is never any helpful error message. Python only says MemoryError without any error string. The traceback leads me to some point in the script where memory is needed (e.g. when building a list), but no specific instruction appears to be the problem.
My RAM is far from full. The Python process in question typically consumes around ten MB of RAM when viewed in the task manager. In addition, the RAM usage appears to be stable for hours. Usually it increases slowly for some time, only to drop back to the previous level quickly, which I interpret as the garbage collector kicking in periodically.
So far I have not found any indications of a memory leak. I used memory_profiler to trace the memory usage of my functions and found it to be stable. In addition, I followed this blog entry to observe what the garbage collector does in detail. Again, I could not find any hints of undeleted objects.
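A rough sketch of periodic RSS logging from inside the loop (assuming psutil is available; the file name is arbitrary), which would make any slow creep toward the 32-bit address-space limit visible alongside the measurement data:

import os
import time
import psutil

proc = psutil.Process(os.getpid())

def log_memory(logfile="memory_log.txt"):
    # append a timestamped resident-set-size sample to a text file
    rss_mb = proc.memory_info().rss / (1024.0 * 1024.0)
    with open(logfile, "a") as f:
        f.write("%s  RSS = %.1f MB\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), rss_mb))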
I am stuck on Win7 x86 due to a driver that only works on a 32-bit system. So I cannot follow suggestions like this to move to a 64-bit version of Windows. Anyway, I do not see how that would help in my situation.
The iPython console from which the script is launched often behaves strangely after the error has occurred. Sometimes a new MemoryError is thrown even for very simple operations. Often the console is marked by Windows as "not responding" after some time. A menu pops up where, besides the usual options to wait for the process or to terminate it, there is a third option to "restore" the program (whatever that means). Doing so usually makes the console work normally again.
At this point I am somewhat out of ideas on how to proceed. The general recipe of commenting out parts of the script until it works is highly undesirable in my case. As stated above, each test run takes several hours, meaning a potential downtime of weeks for my lab equipment, so going in that direction seems unfeasible to me. Is there a more direct approach to learn what is crashing behind the scenes? How can I understand why Python apparently fails to malloc?
I need to trace program execution, so I decided to make an infinite loop that reads the pc register and single-steps.
Platform: iOS
In this way I want to trace the program's execution flow.
The question is: how should I get the $pc register through the LLDB Python API?
Your program will likely have more than one thread, and each thread will have a different PC. So you would start with your SBProcess object; it has a "threads" property for iterating over threads, each represented by an SBThread object. The SBThread has a "frames" property which is an array of all the SBFrames, and frames[0] is the bottom-most frame. The SBFrame has a "pc" property which is the pc. This table of the Python SB APIs might help you out:
LLDB Python APIs
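Put together, reading the pc for every thread looks roughly like this (assuming you already have the SBProcess object in a variable called process):

for thread in process.threads:
    frame = thread.frames[0]  # bottom-most (youngest) frame
    print("thread %d: pc = 0x%x" % (thread.GetIndexID(), frame.pc))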
However, what you are trying to do won't work under Xcode - which is generally the only way to do debugging on iOS. Xcode and Python currently fight over who gets to control process execution, and at some point the wrong actor wins and execution stalls.
You can do this sort of thing using a stand-alone Python driver, an example of which is:
Process Events Example
But since you can't really attach to an iOS process from stand-alone lldb, this is hard to use for iOS development.
BTW, I've occasionally done what you are describing on Mac OS X, and it is also really really slow. You would only want to do this when you are desperate.
You can sometimes get the same effect by putting breakpoints on every function entry point, which you can do on the lldb command line using:
(lldb) break set -r .
and if you only care about tracing through some given modules, you can add the --shlib option one or more times to the "break set" line to restrict the breakpoints to those libraries. Then write a breakpoint command (which you can do in Python) to gather the requisite information. This will still be slow, but is closer to usable.
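The callback itself can be a small Python function; a hypothetical sketch (the module and function names are placeholders; after "command script import tracer.py", register it with "breakpoint command add -F tracer.log_stop <breakpoint id>"):

# tracer.py -- hypothetical breakpoint callback that logs each stop and keeps going
def log_stop(frame, bp_loc, internal_dict):
    print("hit %s at 0x%x" % (frame.GetFunctionName(), frame.pc))
    return False  # returning False tells lldb to auto-continue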
So I am working on a Matlab application that has to do some communication with a Python script. The script that is called is a simple piece of client software. As a side note, if it were possible to have a Matlab client and a Python server communicating, that would solve this issue completely, but I haven't found a way to make that work.
Anyhow, after searching the web I have found two ways to call Python scripts: either via the system() command or by editing the perl.m file to call Python scripts instead. Both ways are too slow, though (tic/toc puts them at > 20 ms, and the call must run in under 6 ms), as this call will be in a loop that is very time sensitive.
As a solution I figured I could instead save a file at a certain location and have my Python script continuously check for this file, executing the command when it finds it. After timing each of these steps and summing them up, I found this to be much faster (almost 100x, so certainly fast enough), and I can't really believe that, or rather I can't understand why calling Python scripts is so slow (not that I have more than a superficial knowledge of the subject). I also find this solution really messy and ugly, so I just wanted to check: first, is it a good idea, and second, is there a better one?
Finally, I realize that Python's time.time() and Matlab's tic, toc might not be precise enough to measure time correctly on that scale, which is also a reason why I ask.
Spinning up new instances of the Python interpreter takes a while. If you spin up the interpreter once, and reuse it, this cost is paid only once, rather than for every run.
This is normal (expected) behaviour, since startup includes large numbers of allocations and imports. For example, on my machine, the startup time is:
$ time python -c 'import sys'
real 0m0.034s
user 0m0.022s
sys 0m0.011s
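One way to reuse the interpreter (a minimal, hypothetical sketch; the protocol and file name are made up) is a long-running worker that reads commands line by line from stdin, so the caller only pays the startup cost once:

# worker.py -- hypothetical long-running worker; the caller writes one command per line
import sys

for line in sys.stdin:
    command = line.strip()
    if command == "quit":
        break
    # ... do the real work for `command` here ...
    sys.stdout.write("done: %s\n" % command)
    sys.stdout.flush()  # flush so the caller sees the reply immediately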
I have a relatively simple (no classes) Python 2.7 program. The first thing the program does is read an SQLite database into a dictionary. The database is large, but not huge, around 90 MB on disk. It takes about 20 seconds to read in. After reading in the database I initialize some variables, e.g.
localMax = 0
localMin = 0
firstTime = True
When I debug this program in Eclipse-3.7.0/pydev - even on these simple lines - each single-step in the debugger eats up 100% of a core and takes between 5 and 10 seconds. I can see the Python process go to 100% CPU for 10 seconds. Single-step... wait 10 seconds... single-step... wait 10 seconds... If I debug at the command line just using pdb, no problems. If I'm not debugging at all, the program runs at "normal" speed, nothing strange like in Eclipse.
I've reproduced this on a dual core Win7 PC w/ 4G memory, my 8 core Ubuntu box w/ 8G of memory, and even my Mac Air. How's that for multi-platform development! I kept thinking it would work somewhere. I'm never even close to running out of memory at any time.
On each Eclipse single-step, why does the python process jump to 100% CPU, and take 10 seconds?
Here is a good enough workaround, based on Mikko Ohtamaa's hint. I just verified the following on my Mac Air:
If I simply close the 'Variables' window in the Eclipse GUI, I can single step through the code at normal speed. Which is great, but, uh, I don't have the Variables window.
For any variable I want to see, I can hover my cursor over the variable and see the value. I didn't attempt to hover over my large dictionary that is the culprit here.
I can also right-click on any variable and add a 'Watch', which brings up an 'Expressions' window. In this case the variable is just a degenerate (very simple) case of an 'expression'.
So, the workaround for me is to close the Eclipse Variable window, and use the Expressions window to selectively view variables. A pain, but for the debugging I'm doing it is better than pdb.
I simply commented this line out:
np.set_printoptions(threshold = 'nan')
It seems Eclipse is trying to keep up with too much information.