I have a 32-bit Windows application, written primarily in Delphi, which performs floating-point numerical simulations using the 8087 FPU. I have recently added the ability to link in external Python code using the Python API through python2x.dll. This recent change has led to some very strange behaviour.
The application has a batch mode of operation where it performs multiple simulations in parallel to take advantage of multi-core architectures. As soon as Python code has been executed in the process, I start to see changes to the 8087 control word on different threads. I've checked this very carefully and I have observed the control word having changed even in a thread which has never called into the Python DLL.
I know this sounds fantastical, but, as I have discovered, there are mechanisms for this behaviour to manifest. I have learnt about signals. I first hypothesised that the Python DLL was setting process wide signal handlers (by calling signal()) and these signal handlers were responsible for changing the control word. For example, a thread, unrelated to the Python code, could perhaps cause SIGFPE and that may, in turn, modify the control word.
I have rather come to the conclusion that signal() is not the mechanism. I arranged to execute the Python code at startup. Then I set all of the signal handlers back to SIG_DFL. Then I started the simulations. But the control word changes still occurred.
My question (finally) is whether anyone knows of another mechanism by which the control word could be changed in such a manner. I'm looking for interrupts, APCs etc., I think!
Update
The control word is being changed to 0x037F, which is the Intel default value. This differs from the MSVC/Windows default of 0x027F. I hypothesise that something is calling FNINIT.
I also discovered Py_InitializeEx which allows the caller to stop Python setting signal handlers. The control word changes occur even if I use this approach to initialisation so I'm even more convinced that is not the mechanism.
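In case it helps anyone reproduce the observation from the Python side, here is a minimal sketch using ctypes to query the CRT's _control87 (note that _control87 reports MSVC's abstracted flag encoding, not the raw x87 bits, so you won't literally see 0x027F or 0x037F):

import ctypes

msvcrt = ctypes.cdll.msvcrt
msvcrt._control87.restype = ctypes.c_uint
msvcrt._control87.argtypes = [ctypes.c_uint, ctypes.c_uint]

# a zero mask reads the current control state without modifying it
cw = msvcrt._control87(0, 0)
print(hex(cw))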
For example, a DllMain call with the DLL_THREAD_ATTACH flag could do this; see MSDN.
Update
I have found a link to a similar problem: DLL Load "Poisons" FPU Control Word for New Threads. Note, though, that it concerns threads created after the DLL load.
If I remember correctly, that's Delphi's problem. There are some discussions of the issue here and here. I remember bumping into it when trying to write some VST plugins in Delphi.
I have seen a case like this where the printer driver of the default printer changed the control word behind my back. When I changed the default printer, the problem went away.
To circumvent this problem I set the control word to the default value at appropriate places with:
_control87(_CW_DEFAULT, _CW_DEFAULT);
I have also seen the same problem on all machines of a customer that had Norton Security 2011 installed, but the problem went away with the fix for the printer driver, so I'm not really sure if Norton was really the cause.
Related
This problem involves the collision of several problems, all of which I understand only somewhat well, but I include them together because they could all be the entry point for a solution. Here is the best description I can give.
I have an app, in python. (I imagine I could theoretically solve all of these problems by learning Cocoa and ObjectiveC, but that seems like QUITE a lift, for this problem -- AND, as noted below, this problem may not actually be related to python, really, at all. I just don't know.) A CORE feature of this app is to trigger a minigame, with a hotkey -- meaning, the hotkey itself is fundamental to the desired functionality. And furthermore, I would really like to package this app, to let other people use it. (Locally, it works great! Hey!)
The problem starts with the fact that adding the hotkey -- which I am doing with
import keyboard
keyboard.add_hotkey('windows+shift+y', trigger_minigame)
-- requires root access. Due to DIRE WARNINGS in another SO post, Forcing a GUI application to run as root (which, honestly, I only vaguely understand), I would like to grant that access to ONLY this part of the program. I IMAGINE such an approach would look something like this:
# needs_root.py
import keyboard
from shouldnt_have_root import trigger_minigame
keyboard.add_hotkey('windows+shift+y', trigger_minigame)
# shouldnt_have_root.py
import pygame

def minigame():
    # buncha pygame, GUI stuff (which is dangerous???)
    ...

def trigger_minigame():
    # adds an event to minigame's event queue
    pygame.event.post(pygame.event.Event(pygame.USEREVENT))
# bash script
sudo python needs_root.py
HOWEVER -- there are several major challenges!
The biggest is that I don't even know if THAT is safe, since I don't know how security and permissions (especially with imports) work at all! And more generally, how dangerous are the imports? It appears that I may in fact have to import substantially more, to make it clear what event queue the trigger is adding an event TO -- and I don't know how to have that communication happen while still isolating the GUI parts (or generally dangerous ones) from unnecessary and hazardous access.
There's another layer too, though: packaging it through pyinstaller means that I can't target the scripts directly, because they'll have been turned into binaries. But according to THIS answer, Packaging multiple scripts in PyInstaller, it appears I can just target the binaries instead, i.e. have the first binary call
osascript -e 'do shell script "python needs_root_binary" with administrator privileges'
to get the user to bless only the necessary part, but I don't know if that will put OTHER obstacles, or vulnerabilities (or inter-file communication difficulties), in the way.
LAST, I could try STARTING as root, and then switching away from it, as soon as the hotkey is set (and before anything else happens) -- but would that be safe? I'm still worried about the fact that it involves running sudo on the whole app.
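For concreteness, I imagine the start-as-root-then-drop variant would look roughly like this (a sketch only, assuming the script is launched via sudo so that SUDO_UID/SUDO_GID are set):

# needs_root.py -- register the hotkey as root, then drop to the invoking user
import os
import keyboard
from shouldnt_have_root import trigger_minigame

keyboard.add_hotkey('windows+shift+y', trigger_minigame)  # requires root

# drop privileges: groups and gid first, then uid, or setuid locks us out
uid = int(os.environ['SUDO_UID'])
gid = int(os.environ['SUDO_GID'])
os.setgroups([])
os.setgid(gid)
os.setuid(uid)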
In any event --
is this as big a mess as it feels?
How do I give root access to only a piece of a packaged .app, that I've written in python?
I'd advise you to:
enable root access,
write the script,
disable root access,
as described in more detail here.
PyInstaller is another chapter. When I was making software that required hotkeys, I was forced to use something other than keyboard, because it wasn't working properly on PCs without Python installed, so I made a hotkey with tkinter's built-in canvas.bind() function (more info here).
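A minimal sketch of that tkinter approach (note the trade-off: unlike keyboard's global hook, tkinter only sees key events while its own window has focus, but it needs no root access):

import tkinter as tk

def trigger_minigame(event=None):
    print("hotkey pressed")

root = tk.Tk()
canvas = tk.Canvas(root)
canvas.pack()
canvas.focus_set()  # the canvas must have keyboard focus to receive key events
canvas.bind('<Shift-Y>', trigger_minigame)
root.mainloop()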
Hopefully I helped.
You cannot run a specific Python function as root; only the Python process executing your script can be run with elevated permissions.
So my answer is: your problem as described is unsolvable.
I would like to know what the Python interpreter is doing in my production environments.
Some time ago I wrote a simple tool called live-trace, which runs a daemon thread that collects stack traces every N milliseconds.
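The core of that approach is only a few lines; here is a simplified sketch (not the actual live-trace code):

import sys
import threading
import time
import traceback

def sample_stacks(interval=0.01, out=sys.stderr):
    # sys._current_frames() maps each thread id to its topmost frame
    while True:
        for thread_id, frame in sys._current_frames().items():
            out.write('--- thread %s ---\n' % thread_id)
            traceback.print_stack(frame, file=out)
        time.sleep(interval)

threading.Thread(target=sample_stacks, daemon=True).start()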
But signal handling in the interpreter itself has one disadvantage:
Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time.
Source: https://docs.python.org/2/library/signal.html
How could I work around the above constraint and get a stacktrace, even if the interpreter is in some C code for several seconds?
Related: https://github.com/23andMe/djdt-flamegraph/issues/5
I use py-spy with speedscope now. It is a very cool combination.
py-spy works on Windows/Linux/macOS, can output flame graphs on its own, and is actively developed; e.g. subprocess profiling support was added in October 2019.
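Typical usage looks like this (the pid is hypothetical; py-spy attaches to an already-running process):

pip install py-spy
# sample a running process and write a profile that speedscope can open
py-spy record --format speedscope -o profile.speedscope.json --pid 12345
# or take a one-off snapshot of all thread stacks
py-spy dump --pid 12345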
Have you tried Pyflame? It's based on ptrace, so it shouldn't be affected by CPython's signal handling subtleties.
Maybe perf-tools from Brendan Gregg can help.
I'm writing a set of programs that have to operate on a common database, possibly concurrently. For the sake of simplicity (for the user), I didn't want to require the setup of a database server. Therefore I settled on Berkeley DB, where one can just fire up a program and let it create the DB if it doesn't exist.
In order to let programs work concurrently on a database, one has to use the transactional features present in the 5.x release (here I use python3-bsddb3 6.1.0-1+b2 with libdb5.3 5.3.28-12): the documentation clearly says that it can be done. However I quickly ran into trouble, even with some basic tasks:
Program 1 initializes records in a table
Program 2 has to scan the records previously added by program 1 and update them with additional data.
To speed things up, there is an index for said additional data. When program 1 creates the records, the additional data isn't present, so the pointer to that record is added to the index under an empty key. Program 2 can then just quickly seek to the not-yet-updated records.
Even when not run concurrently, the record updating program crashes after a few updates. First it complained about insufficient space in the mutex area. I had to resolve this with an obscure DB_CONFIG file and then run db_recover.
Next, again after a few updates it complained 'Cannot allocate memory -- BDB3017 unable to allocate space from the buffer cache'. db_recover and relaunching the program did the trick, only for it to crash again with the same error a few records later.
I'm not even mentioning concurrent use: when one of the programs is launched while the other is running, they almost instantly crash with deadlocks, panic about corrupted segments and ask to run recovery. I made many changes, so I went through a wide spectrum of errors which often yield irrelevant matches when searched for. I even rewrote the db calls to use lmdb, which in fact works quite well and is really quick, which tends to indicate my program logic isn't at fault. Unfortunately the datafile produced by lmdb seems quite sparse and quickly grew to unacceptable sizes.
From what I said, it seems that maybe some resources are being leaked somewhere. I'm hesitant to rewrite all this directly in C to check if the problem can come from the Python binding.
I can and will update the question with code, but for the moment it is long enough. I'm looking for people who have used the transactional features of BDB for similar purposes and who could point me to some of the gotchas.
Thanks
RPM (see http://rpm5.org) uses Berkeley DB in transactional mode. There's a fair number of gotchas, depending on what you are attempting.
You have already found DB_CONFIG: you MUST configure the sizes for mutexes and locks; the defaults are invariably too small.
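Something along these lines in the environment's DB_CONFIG file is a typical starting point (the numbers here are hypothetical; size them to your workload):

set_lk_max_locks 10000
set_lk_max_lockers 10000
set_lk_max_objects 10000
mutex_set_max 100000
set_cachesize 0 268435456 1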
Needing to run db_recover while developing is quite painful too. The best fix (imho) is to automate recovery while opening by checking the return code for DB_RUNRECOVERY, and then reopening the dbenv with DB_RECOVER.
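With the Python bsddb3 binding, that pattern might look roughly like this (a sketch, assuming a transactional environment):

from bsddb3 import db

FLAGS = (db.DB_CREATE | db.DB_INIT_TXN | db.DB_INIT_LOCK |
         db.DB_INIT_LOG | db.DB_INIT_MPOOL)

def open_env(home):
    env = db.DBEnv()
    try:
        env.open(home, FLAGS)
    except db.DBRunRecoveryError:
        # the environment is marked for recovery: reopen with DB_RECOVER
        env = db.DBEnv()
        env.open(home, FLAGS | db.DB_RECOVER)
    return env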
Deadlocks are usually design/coding errors: run db_stat -CA to see what is deadlocked (or what locks are held) and adjust your program. "Works with lmdb" isn't sufficient to claim working code ;-)
Leaks can be seen with either valgrind and/or a BDB compilation with -fsanitize=address. Note that valgrind will report false uninitializations unless you use overrides and/or compile BDB to initialize.
I have a python script, which is used to perform a lab measurement using several devices. The whole setup is rather involved, including communication over serial devices, API calls as well as the use of self-written and commercial drivers. In the end, however, everything boils down to two nested loops, which vary some parameters, collect data and write it to a file.
My problem is that I observe random occurrences of a MemoryError, typically after 10 hours, equivalent to ~15k runs of the loops. At the moment I have no idea where it comes from or how I can trace it further, so I would be happy for suggestions on how to work on my problem. My observations up to this moment are as follows.
The error occurs at random states of the program. Different runs will throw the MemoryError at different lines of my script.
There is never any helpful error message. Python only says MemoryError without any error string. The traceback leads me to some point in the script where memory is needed (e.g. when building a list), but no specific instruction appears to be the problem.
My RAM is far from full. The python process in question typically consumes some tens of MB of RAM when viewed in the task manager. In addition, the RAM usage appears to be stable for hours. Usually, it increases slowly for some time, just to drop down to the previous level quickly, which I interpret as the garbage collector kicking in periodically.
So far I have not found any indications of a memory leak. I used memory_profiler to trace the memory usage of my functions and found it to be stable. In addition, I followed this blog entry to observe what the garbage collector does in detail. Again, I could not find any hints of undeleted objects.
I am stuck on Win7 x86 due to a driver which will only work on a 32-bit system. So I cannot follow suggestions like this to move to a 64-bit version of Windows. Anyway, I do not see how this would help in my situation.
The iPython console, from which the script is being launched, often behaves strangely after the error has occurred. Sometimes a new MemoryError is thrown even for very simple operations. Often, the console is marked by Windows as "not responding" after some time. A menu pops up where, besides the usual options to wait for the process or to terminate it, there is a third option to "restore" the program (whatever that means). Doing so usually causes the console to work normally again.
At this point, I am somewhat out of ideas on how to proceed. The general recipe of commenting out parts of the script until it works is highly undesirable in my case. As stated above, each test run takes several hours, meaning a potential downtime of weeks for my lab equipment, so going in that direction appears infeasible to me. Is there any more direct approach to learn what is crashing behind the scenes? How can I understand why python apparently fails to malloc?
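For reference, the kind of periodic logging I could add next would be something like this (a sketch, assuming psutil is available; on 32-bit Windows the number to watch is the virtual size, since a MemoryError can come from address-space fragmentation while the working set still looks small):

import os
import threading
import time
import psutil

def log_memory(interval=60):
    proc = psutil.Process(os.getpid())
    while True:
        info = proc.memory_info()
        print('rss=%d MB vms=%d MB' % (info.rss // 2**20, info.vms // 2**20))
        time.sleep(interval)

threading.Thread(target=log_memory, daemon=True).start()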
I would like to launch an untrusted application programmatically, so I want to remove the program's ability to access files, network, etc. Essentially, I want to restrict it so its only interface to the rest of the computer is stdin and stdout.
Can I do that? Preferably in a cross-platform way but I sort of expect to have to do it differently for each OS. I'm using Python, but I'm willing to write this part in a lower level or more platform integrated language if necessary.
The reason I need to do this is to write a distributed computing infrastructure. It needs to download a program, execute it, piping data to stdin, and returning data that it receives on stdout to the central server. But since the program it downloads is untrusted, I want to restrict it to only using stdin and stdout.
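The piping part itself is straightforward -- roughly the following, with the sandboxing being the open question (the program name is just a placeholder):

import subprocess

proc = subprocess.Popen(['./untrusted_program'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
out, _ = proc.communicate(b'data from the central server')
# 'out' is what gets returned to the central server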
The short answer is no.
The long answer is not really. Consider a C program, in which the program opens a log file by grabbing the next available file descriptor. Your program, in order to stop this, would need to somehow monitor this, and block it. Depending on the robustness of the untrusted program, this could cause a fatal crash, or inhibit harmless functionality. There are many other similar issues to this one that make what you are trying to do hard.
I would recommend looking into sandboxing solutions already available. In particular, a virtual machine can be very useful for testing out untrusted code. If you can't find anything that meets your needs, your best bet is to probably deal with this at the kernel level, or with something a bit closer to the hardware such as C.
Yes, you can do this. You can run an inferior process through ptrace (essentially you act as a debugger) and you hook on system calls and determine whether they should be allowed or not.
codepad.org does this for instance, see: about codepad. It uses the geordi supervisor to execute the untrusted code.
You can run untrusted apps in a chroot and block them from using the network with an iptables rule (for example, an owner --uid-owner match, as sketched below).
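For example, with the app running under a dedicated user (the user name is hypothetical):

iptables -A OUTPUT -m owner --uid-owner sandboxuser -j DROP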
But really, a virtual machine is more reliable, and on modern hardware the performance impact is negligible.