Embedded Python - Blocking operations in time module

Embedded Python - Blocking operations in time module - python

I'm developing my own Python code interpreter using the Python C API, as described in the Python documentation. I've taken a look on the Python source code and I tried to follow the same steps that are carried out in the standard interpreter when executing a py file. These steps (sequence of C API function calls) are basically:
PyRun_AnyFileExFlags()
PyRun_SimpleFileExFlags()
PyRun_FileExFlags()
PyArena_New()
PyParser_ASTFromFile()
run_mod()
PyAST_Compile()
PyEval_EvalCode()
PyEval_EvalCodeEx()
PyThreadState_GET()
PyFrame_New()
PyEval_EvalFrameEx()
The only difference in my code is that I do manually the AST compilation, frame creation, etc. and then I call PyEval_EvalFrame.
With this, I am able to execute an arbitrary .py file with my program, as if it were the normal Python interpreter. My problem comes when the code that my program is executing makes use of the time module: all time module operations get blocked in the GIL! For example, if the Python code calls time.sleep(1), this call is blocked and never gets executed.
Obviously I am doing something wrong that blocks the GIL (and therefore blocks the time module) but I dont know how to correct it. The last statement in my code where I have control is in PyEval_EvalFrameEx, and from that point on, everything runs "as in regular Python interpreter", I think.
Anybody had a similar problem? What am I doing wrong, so that I block the time module?
Hope somebody can help me...
Thanks for your time. Best regards,
R.

You need to provide more detail.
How does your interpreter's behavior differ from the standard interpreter?
If you just want to run arbitrary source files, why are you not calling one of the higher level interfaces, like PyRun_SimpleFile? Did your code call Py_Initialize?

Related

How to protect Python source code, while making the file available for running?

So, I recently made a Python program that I want to send to someone with them being able to execute it, but not read the code I have typed in it. Any ideas how to do it?
BTW, I want it to be irreversible
In short, here are my Parameters:
Should remain a Python file
Can't be reversed
Code should not be readable
Should still have the ability to be run

The criteria you've posted are inconsistent.
Python is an interpreted language. The entity running the language (i.e. Python interpreter) is reading your code and executing it, line by line. If you wrap it up to send to someone, their Python interpreter must have read permissions on the file, whether it's source code or "compiled" Python (which is easily decompiled into equivalent source code).
If we take a wider interpretation of "send to someone", there may be a business solution that serves your needs. You would provide your functionality, rather than the code: deploy it as a service from some available server: your own, or rented space. To do this, you instead provide an interface to your functionality.
If this fulfills your needs, you now have your next research topic.

How to run c code within python

How can I run c/c++ code within python in the form:
def run_c_code(code):
#Do something to run the code
code = """
Arbitrary code
"""
run_c_code(code)
It would be great if someone could provide an easy solution which does not involve installing packages. I know that C is not a scripting language but it would be great if it could do a 'mini'-compile that is able to run the code into the console. The code should run as it would compiled normally but this needs to be able to work on the fly as the rest of the code runs it and if possible, run as fast as normal and be able to create and edit variables so that python can use it. If necessary, the code can be pre-compiled into the code = """something""".
Sorry for all the requirements but if you can make the c code run in python then that would be great. Thanks in advance for all the answers..

As somebody else already pointed out, to run C/C++ code from "within" Python, you'd have to write said C/C++ code into an own file, compile it correctly, and then execute that program from your Python code.
You can't just type one command, compile it, and execute it. You always have to have the whole "framework" set up. You can't compile a program when you haven't yet written the } that ends the class/function/statement 20 lines later on. At this point you'd already have to write the whole C/C++ program for it to work. It's simply not meant to be interpreted on the run, line by line. You can do that with python, bash/dash/batch, and a few others. But C/C++ definitely isn't one of them.
With those come several issues. Firstly, the C/C++ part probably needs data from the Python part. I don't know of any way of doing it in RAM alone (maybe there is one, but I don't know), so the Python part would have to write it into a file, the C/C++ part would read and process it, then put the processed data into another file, and then the Python part would have to read that and continue.
Which brings another point up. Here we're already getting into multi-threading territory, because the moment you execute that C/C++ program you're dealing with a second thread. So, somehow, you'd have to coordinate those programs so that the Python part only continues once the C/C++ part is done. Shouldn't be a huge problem to get running, but it can be a nightmare to performance and RAM if done wrongly.
Without knowing to what extent you use that program, I also like to add that C/C++ isn't platform-independent like Python. You'll have to compile that program for every single different OS that you run it on. That may come with minor changes to the code and in general just a lot of work because you have to debug and test it for every single system.
To sum up, I think it may be better to find another solution. I don't know why you'd want to run this specific part in C/C++, but I'd recommend trying to get it done in one language. If there's absolutely no way you can get it done in Python (which I doubt, there's libraries for almost everything), you should get your Python to C/C++ instead.

If you want to run C/C++ code - you'll need either a C/C++ compiler, or a C/C++ interpreter.
The former is quite easy to arrange (though probably not suitable for an end user product) and you can just compile the code and run as required.
The latter requires that you attempt to process the code yourself and generate python code that you can then import. I'm not sure this one is worth the effort at all given that even websites that offer compilation tools wrap gcc/g++ rather than implement it in javascript.
I suspect that this is an XY problem; you may wish to take a couple of steps back and try to explain why you want to run c++ code from within a python script.

refactoring code to keep large objects/models in memory in iPython to be reused in python scripts

My script depends on loading lots of variables in a minute and uses them globally in many functions. Every time I call that script in iPython, it loads them again, taking time.
I tried to take these calls to load and populate functions out of that script, but then these global variables are not available to the functions in the script.
It gives NameError: name 'clf' is not defined error message.
Is there a best way to refactor this code to keep these globals in memory and make the script use them? The script loads many variables like these, and uses them in other functions as globals.
vectorizer_title, vectorizer_desc, clf,
df_instance, vocab, all_tokens, df_dist_all,
df_soc2class_proba, dict_p2s,
dict_f2m, token_pattern, cleanup_pattern,
excluded_words = load_data_and_model(lang)
dict_token2idx_all, dict_token2idx_instance,
dist_array, token_dist_to_instance_min,
dict_bigram_by_instance, denominate,
similar_threshold = populate_data(1)

I had asked this question after trying
from depended_library import *
it had not worked in iPython.
But used with python and used in a Flask Web API it works.
Importing library using the "from" statement executes also the codes out of functions in the depended_library in addition to defining functions.
(If someone explains the problem with iPython and suggest a solution, I shall select it as answer.)

python modules missing in sage

I have Sage 4.7.1 installed and have run into an odd problem. Many of my older scripts that use functions like deepcopy() and uniq() no longer recognize them as global names. I have been able to fix this by importing the python modules one by one, but this is quite tedious. But when I start the command-line Sage interface, I can type "list2=deepcopy(list1)" without importing the copy module, and this works fine. How is it possible that the command line Sage can recognize global name 'deepcopy' but if I load my script that uses the same name it doesn't recognize it?
oops, sorry, not familiar with stackoverflow yet. I type: 'sage_4.7.1/sage" to start the command line interface; then, I type "load jbom.py" to load up all the functions I defined in a python script. When I use one of the functions from the script, it runs for a few seconds (complex function) then hits a spot where I use some function that Sage normally has as a global name (deepcopy, uniq, etc) but for some reason the script I loaded does not know what the function is. And to reiterate, my script jbom.py used to work the last time I was working on this particular research, just as I described.
It also makes no difference if I use 'load jbom.py' or 'import jbom'. Both methods get the functions I defined in my script (but I have to use jbom. in the second case) and both get the same error about 'deepcopy' not being a global name.
REPLY TO DSM: I have been sloppy about describing the problem, for which I am sorry. I have created a new script 'experiment.py' that has "import jbom" as its first line. Executing the function in experiment.py recognizes the functions in jbom.py but deepcopy is not recognized. I tried loading jbom.py as "load jbom.py" and I can use the functions just like I did months ago. So, is this all just a problem of layering scripts without proper usage of import/load etc?
SOLVED: I added "from sage.all import *" to the beginning of jbom.py and now I can load experiment.py and execute the functions calling jbom.py functions without any problems. From the Sage doc on import/load I can't really tell what I was doing wrong exactly.

Okay, here's what's going on:
You can only import files ending with .py (ignoring .py[co]) These are standard Python files and aren't preparsed, so 1/3 == int(0), not QQ(1)/QQ(3), and you don't have the equivalent of a from sage.all import * to play with.
You can load and attach both .py and .sage files (as well as .pyx and .spyx and .m). Both have access to Sage definitions but the .py files aren't preparsed (so y=17 makes y a Python int) while the .sage files are (so y=17 makes y a Sage Integer).
So import jbom here works just like it would in Python, and you don't get the access to what Sage has put in scope. load etc. are handy but they don't scale up to larger programs so well. I've proposed improving this in the past and making .sage scripts less second-class citizens, but there hasn't yet been the mix of agreement on what to do and energy to do it. In the meantime your best bet is to import from sage.all.

Dangerous Python Keywords?

I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported)
yet, it would still be possible for the user (if desired) to write code to read/write files etc (say, using open).
Then here comes the question:
(1) where can I get a 'global' list of python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know that there are some solutions of python sandboxing. but please try to answer this question as I feel this is the more relevant approach for my needs.
EDIT: suppose that I make sure that no import is in the file, and that no possible hurtful methods (such as open, eval, etc) are in it. can I conclude that the file is SAFE? (can you think of any other 'dangerous' ways that built-in methods can be run?)

This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.

You can still obfuscate import without using eval:
s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]
** BOOM **

Built-in functions.
Keywords.
Note that you'll need to do things like look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malacious code.

An approach that should work better than string matching us to use module ast, parse the python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.

built in functions/keywords:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out keywords, import restrictions, and in-scope by default symbols alone are not enough to cover, you need to verify the entire graph...

Use a Virtual Machine instead of running it on a system that you are concerned about.

Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.