Odd Python search-path behavior, what's going wrong here?

We have an application based on Excel 2003 and Python 2.4 on Windows XP 32bit. The application consists of a large collection of Python functions which can be called from a number of excel worksheets.
We've noticed an anomalous behavior: sometimes, in the middle of one of these calls, the Python interpreter starts hunting around for modules which are almost certainly already loaded and in memory.
We know this because we were able to hook up Sysinternals Process Monitor to the process and observe that, from time to time, the process (when called) starts hunting around a bunch of directories and eggs for certain .py files.
The obvious thing to check is whether the Python search path had become modified; however, we found this not to be the case. It's exactly what we'd expect. The odd things are that:
The occasions on which this searching behavior was triggered appear to be random, i.e. it did not happen every time or with any noticeable pattern.
The behavior did not affect the result of the function. It returned the same value irrespective of whether this file searching behavior was triggered.
The folders that were being scanned were non-existent (e.g. J:/python-eggs on a machine where the J: drive contained no such folder). Naturally, procmon reports that this generated a file-not-found error.
It's all very mysterious, so I don't expect anybody to be able to provide a definitive answer as to what might be going wrong. I would appreciate any suggestions about how this problem might be debugged.
Thanks!
Answers to comments
All the things being searched for are actual, known Python files which exist in the main project .egg file. The odd thing is that at the time they are being searched for, those particular modules have already been imported. They must be in memory in order for the process to work.
Yes, this affects performance, because sometimes this searching behavior tries to hit network drives. Also, by searching eggs which couldn't possibly contain certain modules, the process gets interrupted by the corporate-mandated virus scanner. That slows down what would normally be a harmless and instant operation.
This is stock python 2.4.4. No modifications.

Python programs can import modules at any time, not just during program load. Try searching the modules you are using for import statements.
If this doesn't work, you can write an import hook to catch and report all attempted imports before they occur. For example, if you run this before everything else, you will get a dump of every attempted import and its source:
import sys, traceback

class ImportDebugger:
    def find_module(self, fullname, path=None):
        print "Attempting to import %s:" % fullname
        traceback.print_stack()

sys.meta_path.insert(0, ImportDebugger())
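A variant of the hook above that also records sys.path at the moment of each attempted import can help confirm whether the path really is unchanged at the instant the searching happens (a sketch; the log file name is arbitrary):

import sys, traceback

class ImportDebugger:
    def find_module(self, fullname, path=None):
        # Log the attempt and the search path as it is *right now*,
        # since sys.path may differ from what it was at startup.
        log = open('import_debug.log', 'a')
        log.write("Attempting to import %s\n" % fullname)
        log.write("sys.path is currently: %r\n" % (sys.path,))
        traceback.print_stack(file=log)
        log.close()
        return None  # decline the import so the normal machinery proceeds

sys.meta_path.insert(0, ImportDebugger())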

"Python functions which can be called from a number of excel worksheets"
And you're not blaming Excel for randomly running Python modules? Why not? How have you proven that Excel is behaving properly?

Related

Compile-time checking, and where is the main script's byte-code stored in Python?

I'm new to Python and following multiple online tutorials to learn. One of them is Google for Education.
In the Google's tutorial there is a section:
Code Checked at Runtime
Python does very little checking at compile time, deferring almost all type, name, etc. checks on each line until that line runs. Suppose the above main() calls repeat() like this:
def main():
    if name == 'Guido':
        print repeeeet(name) + '!!!'
    else:
        print repeat(name)
The if-statement contains an obvious error, where the repeat() function is accidentally typed in as repeeeet().
The funny thing in Python ... this code compiles and runs fine so long as the name at runtime is not 'Guido'. Only when a run actually tries to execute the repeeeet() will it notice that there is no such function and raise an error. This just means that when you first run a Python program, some of the first errors you see will be simple typos like this. This is one area where languages with a more verbose type system, like Java, have an advantage ... they can catch such errors at compile time (but of course you have to maintain all that type information ... it's a tradeoff).
There is a good example of a runtime check in that section, but no compile-time check example.
And I'm interested in knowing about that little checking at compile time.
I can't find anything on the internet regarding that line; every possible search just returns results about compiling Python scripts and modules.
Edit:
python myscript.py is compiled (otherwise we wouldn't be getting syntax errors) and then interpreted to execute. The compilation process should definitely produce some code (it might be byte-code). Is that code stored in memory instead of being stored as a .pyc in the filesystem?
Edit 2:
For more on why the main script's byte-code is stored in memory and why modules are compiled, see here.
I'm not sure about the exact compiler procedure in Python, but the "little checking" here means that when a Python file is compiled, essentially only the syntax is checked. It won't check whether names exist or what types variables have, because in Python we don't declare variables with types. So all such errors pass the compile step and are encountered only during execution.
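A short illustration of the difference, as a sketch using the built-in compile() function (Python 2 syntax, to match the tutorial):

# Compile time: a syntax error is caught before any line runs.
try:
    compile("def broken(:", "<string>", "exec")
except SyntaxError, e:
    print "caught at compile time:", e

# Run time: an undefined name compiles fine and only fails when executed.
code = compile("print undefined_name", "<string>", "exec")
try:
    exec code
except NameError, e:
    print "caught only at run time:", e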
A .pyc file is created for imported modules, and it is placed in the same directory as the .py file. However... no .pyc file is created for the main script of your program. In other words, if you call "python myscript.py" on the command line, there will be no .pyc file for myscript.py. Since it is the main script, the compiled .pyc wouldn't be reusable; but if it is a module (imported rather than run directly), then the same .pyc can be reused whenever it is imported.
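If you do want a .pyc for a script, you can produce one explicitly with the standard py_compile module (a minimal sketch; myscript.py is a placeholder name):

import py_compile

# Writes myscript.pyc next to the source file, just as an import would.
py_compile.compile('myscript.py')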
Hope it is useful!

Python crashes in rare cases when running code - how to debug?

I have a problem that I have seriously spent months on now!
Essentially I am running code that needs to read from and save to HDF5 files. I am using h5py for this.
It's very hard to debug because the problem (whatever it is) only occurs in about 5% of the cases (each run takes several hours), and when it happens it crashes Python completely, so debugging with Python itself is impossible. Using simple logs it's also impossible to pinpoint the exact crashing situation: it appears to be very random, crashing at different points within the code, or with a lag.
I tried using OllyDbg to figure out what's happening and can safely conclude that it consistently crashes at the following location: http://i.imgur.com/c4X5W.png
It seems to be shortly after calling the native Python function PyObject_ClearWeakRefs, with an access violation error message. The weird thing is that the file is successfully written to. What would cause the access violation error? Or is that internal to Python (e.g. the stack?) and not related to the file (i.e. my code)?
Does anyone have an idea what's happening here? If not, is there a smarter way of finding out what exactly is happening? Maybe some hidden Python logs or something I don't know about?
Thank you
PyObject_ClearWeakRefs is in the Python interpreter itself. But if it only happens in a small number of runs, it could be hardware related. Things you could try (see also the sketch after this list):
Run your program on a different machine. If it doesn't crash there, it is probably a hardware issue.
Reinstall python, in case the installed version has somehow become corrupted.
Run a memory test program.
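For getting a Python-level traceback out of a hard crash, the faulthandler module can also help; it is built into Python 3.3+ and available as a backport package on PyPI for Python 2. A minimal sketch:

import faulthandler

# Dump the Python traceback of all threads to this file if the process
# receives a fatal signal such as a segfault / access violation.
log = open('crash_trace.log', 'w')
faulthandler.enable(file=log, all_threads=True)

# ... run the long h5py job here ...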
Thanks for all the answers. I ran two versions this time: one with a fresh Python install and my same program, and another on my original computer/install, but replacing all HDF5 read/write procedures with numpy read/write procedures.
The program continued to crash on my second computer at odd times, but on my primary computer I had zero crashes with the changed code. I think it is thus safe to conclude that the problems were HDF5-related, or more specifically h5py-related. It appears that other people have encountered issues with h5py in that respect. Given that any error in my application translates to potentially large financial losses, I decided to dump HDF5 completely in favor of other, stable solutions.
Use a try/except statement. This can be put into the program to stop it from crashing when erroneous data is encountered.
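A sketch of that idea around the HDF5 calls (the file and dataset names are hypothetical). Note that try/except only catches Python-level exceptions; it will not stop a hard access violation inside the interpreter or a C extension:

import h5py

try:
    f = h5py.File('results.h5', 'r')  # hypothetical file name
    data = f['series'][:]             # hypothetical dataset name
    f.close()
except (IOError, KeyError), e:
    print "HDF5 read failed:", e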

python modules missing in sage

I have Sage 4.7.1 installed and have run into an odd problem. Many of my older scripts that use functions like deepcopy() and uniq() no longer recognize them as global names. I have been able to fix this by importing the python modules one by one, but this is quite tedious. But when I start the command-line Sage interface, I can type "list2=deepcopy(list1)" without importing the copy module, and this works fine. How is it possible that the command line Sage can recognize global name 'deepcopy' but if I load my script that uses the same name it doesn't recognize it?
Oops, sorry, not familiar with Stack Overflow yet. I type "sage_4.7.1/sage" to start the command-line interface; then I type "load jbom.py" to load up all the functions I defined in a Python script. When I use one of the functions from the script, it runs for a few seconds (it's a complex function), then hits a spot where I use some function that Sage normally has as a global name (deepcopy, uniq, etc.), but for some reason the script I loaded does not know what the function is. And to reiterate, my script jbom.py worked the last time I was working on this particular research, just as I described.
It also makes no difference if I use 'load jbom.py' or 'import jbom'. Both methods get the functions I defined in my script (but I have to use jbom. in the second case) and both get the same error about 'deepcopy' not being a global name.
REPLY TO DSM: I have been sloppy about describing the problem, for which I am sorry. I have created a new script 'experiment.py' that has "import jbom" as its first line. Executing the function in experiment.py recognizes the functions in jbom.py but deepcopy is not recognized. I tried loading jbom.py as "load jbom.py" and I can use the functions just like I did months ago. So, is this all just a problem of layering scripts without proper usage of import/load etc?
SOLVED: I added "from sage.all import *" to the beginning of jbom.py and now I can load experiment.py and execute the functions calling jbom.py functions without any problems. From the Sage doc on import/load I can't really tell what I was doing wrong exactly.
Okay, here's what's going on:
You can only import files ending with .py (ignoring .py[co]). These are standard Python files and aren't preparsed, so 1/3 == int(0), not QQ(1)/QQ(3), and you don't have the equivalent of a from sage.all import * to play with.
You can load and attach both .py and .sage files (as well as .pyx and .spyx and .m). Both have access to Sage definitions but the .py files aren't preparsed (so y=17 makes y a Python int) while the .sage files are (so y=17 makes y a Sage Integer).
So import jbom here works just like it would in Python, and you don't get the access to what Sage has put in scope. load etc. are handy but they don't scale up to larger programs so well. I've proposed improving this in the past and making .sage scripts less second-class citizens, but there hasn't yet been the mix of agreement on what to do and energy to do it. In the meantime your best bet is to import from sage.all.
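So a plain .py module meant to be imported into Sage needs that import at the top, as the asker found. A sketch of what the hypothetical jbom.py would look like:

# jbom.py -- a plain Python module, so Sage does NOT preparse it
from sage.all import *   # brings deepcopy, uniq, QQ, etc. into scope

def copy_of(lst):
    # deepcopy resolves now thanks to the import above
    return deepcopy(lst)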

Dangerous Python Keywords?

I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
Before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported).
Yet, it would still be possible for the user (if desired) to write code to read/write files etc. (say, using open).
Then here comes the question:
(1) where can I get a 'global' list of python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know that there are some Python sandboxing solutions, but please try to answer this question, as I feel this is the more relevant approach for my needs.
EDIT: Suppose that I make sure that no import is in the file, and that no possibly harmful methods (such as open, eval, etc.) are in it. Can I conclude that the file is SAFE? (Can you think of any other 'dangerous' ways that built-in methods can be run?)
This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.
You can still obfuscate import without using eval:
s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]
** BOOM **
Built-in functions.
Keywords.
Note that you'll need to do things like look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malicious code.
An approach that should work better than string matching is to use the ast module: parse the Python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.
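A rough sketch of that idea using only the standard ast module (Python 2.6+). The banned-name list is illustrative, and, per the rest of this thread, this is still not a real sandbox:

import ast

BANNED_NAMES = set(['open', 'file', 'eval', 'execfile', '__import__',
                    'input', 'globals', 'locals'])

def compile_checked(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Reject import and exec statements outright.
        if isinstance(node, (ast.Import, ast.ImportFrom, ast.Exec)):
            raise ValueError("import/exec is not allowed")
        # Reject references to known-dangerous builtins.
        if isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            raise ValueError("forbidden name: %s" % node.id)
        # Reject double-underscore attribute access (e.g. __subclasses__).
        if isinstance(node, ast.Attribute) and node.attr.startswith('__'):
            raise ValueError("dunder attribute access is not allowed")
    return compile(tree, '<untrusted>', 'exec')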
Built-in functions/keywords:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
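In answer to the second question, one blunt way to make such names invalid is to execute the code with a stripped-down __builtins__ (a sketch in Python 2 syntax; untrusted_source is a placeholder). As the next comment shows, however, even this can be escaped:

# Run untrusted code against an explicit whitelist of builtins.
safe_builtins = {'len': len, 'range': range, 'abs': abs,
                 'min': min, 'max': max, 'sum': sum}
namespace = {'__builtins__': safe_builtins}
untrusted_source = "result = max([1, 2, 3])"   # placeholder
exec untrusted_source in namespace
print namespace['result']   # prints 3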
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out that keywords, import restrictions, and in-scope-by-default symbols alone are not enough; you need to verify the entire object graph...
Use a Virtual Machine instead of running it on a system that you are concerned about.
Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.

Embedded Python - Blocking operations in time module

I'm developing my own Python code interpreter using the Python C API, as described in the Python documentation. I've taken a look at the Python source code and tried to follow the same steps that are carried out in the standard interpreter when executing a .py file. These steps (the sequence of C API function calls) are basically:
PyRun_AnyFileExFlags()
PyRun_SimpleFileExFlags()
PyRun_FileExFlags()
PyArena_New()
PyParser_ASTFromFile()
run_mod()
PyAST_Compile()
PyEval_EvalCode()
PyEval_EvalCodeEx()
PyThreadState_GET()
PyFrame_New()
PyEval_EvalFrameEx()
The only difference in my code is that I do the AST compilation, frame creation, etc. manually, and then I call PyEval_EvalFrame.
With this, I am able to execute an arbitrary .py file with my program, as if it were the normal Python interpreter. My problem comes when the code that my program is executing makes use of the time module: all time module operations get blocked on the GIL! For example, if the Python code calls time.sleep(1), this call blocks and never gets executed.
Obviously I am doing something wrong that blocks the GIL (and therefore blocks the time module), but I don't know how to correct it. The last statement in my code where I have control is PyEval_EvalFrameEx, and from that point on everything runs "as in the regular Python interpreter", I think.
Has anybody had a similar problem? What am I doing wrong that blocks the time module?
Hope somebody can help me...
Thanks for your time. Best regards,
R.
You need to provide more detail.
How does your interpreter's behavior differ from the standard interpreter?
If you just want to run arbitrary source files, why are you not calling one of the higher-level interfaces, like PyRun_SimpleFile? Did your code call Py_Initialize?
