Does executing a python script load it into memory?

Does executing a python script load it into memory? - python

I'm running a python script using python3 myscript.py on Ubuntu 16.04. Is the script loaded into memory or read and interpreted line by line from the hdd? If it's not loaded all at once, is there any way of knowing or controlling how big the chunks are, that are loaded into Memory?

It is loaded into memory in its entirety. This must be the case, because a syntax error near the end will abort the program straight away. Try it and see.
There does not need to be any way to control or configure this. It is surely an implementation detail best left alone. If you have a problem related to this (e.g. your script is larger than your RAM), it can be solved some other way.

The "script" you use is only the human friendly representation you see. Python opens that script, reads lines, tokenizes them, creates a parse and ast tree for it and then emits bytecode which you can see using the dis module.
The "script" isn't loaded, it's code object (the object that contains the instructions generated for it) is. There's no direct way to affect that process. I have never heard of a script being so big that you need to read it in chunks, I'd be surprised if you accomplished it.

Related

How to run c code within python

How can I run c/c++ code within python in the form:
def run_c_code(code):
#Do something to run the code
code = """
Arbitrary code
"""
run_c_code(code)
It would be great if someone could provide an easy solution which does not involve installing packages. I know that C is not a scripting language but it would be great if it could do a 'mini'-compile that is able to run the code into the console. The code should run as it would compiled normally but this needs to be able to work on the fly as the rest of the code runs it and if possible, run as fast as normal and be able to create and edit variables so that python can use it. If necessary, the code can be pre-compiled into the code = """something""".
Sorry for all the requirements but if you can make the c code run in python then that would be great. Thanks in advance for all the answers..

As somebody else already pointed out, to run C/C++ code from "within" Python, you'd have to write said C/C++ code into an own file, compile it correctly, and then execute that program from your Python code.
You can't just type one command, compile it, and execute it. You always have to have the whole "framework" set up. You can't compile a program when you haven't yet written the } that ends the class/function/statement 20 lines later on. At this point you'd already have to write the whole C/C++ program for it to work. It's simply not meant to be interpreted on the run, line by line. You can do that with python, bash/dash/batch, and a few others. But C/C++ definitely isn't one of them.
With those come several issues. Firstly, the C/C++ part probably needs data from the Python part. I don't know of any way of doing it in RAM alone (maybe there is one, but I don't know), so the Python part would have to write it into a file, the C/C++ part would read and process it, then put the processed data into another file, and then the Python part would have to read that and continue.
Which brings another point up. Here we're already getting into multi-threading territory, because the moment you execute that C/C++ program you're dealing with a second thread. So, somehow, you'd have to coordinate those programs so that the Python part only continues once the C/C++ part is done. Shouldn't be a huge problem to get running, but it can be a nightmare to performance and RAM if done wrongly.
Without knowing to what extent you use that program, I also like to add that C/C++ isn't platform-independent like Python. You'll have to compile that program for every single different OS that you run it on. That may come with minor changes to the code and in general just a lot of work because you have to debug and test it for every single system.
To sum up, I think it may be better to find another solution. I don't know why you'd want to run this specific part in C/C++, but I'd recommend trying to get it done in one language. If there's absolutely no way you can get it done in Python (which I doubt, there's libraries for almost everything), you should get your Python to C/C++ instead.

If you want to run C/C++ code - you'll need either a C/C++ compiler, or a C/C++ interpreter.
The former is quite easy to arrange (though probably not suitable for an end user product) and you can just compile the code and run as required.
The latter requires that you attempt to process the code yourself and generate python code that you can then import. I'm not sure this one is worth the effort at all given that even websites that offer compilation tools wrap gcc/g++ rather than implement it in javascript.
I suspect that this is an XY problem; you may wish to take a couple of steps back and try to explain why you want to run c++ code from within a python script.

How can I call a python script with arguments from within Processing?

I have a python script which outputs a JSON when called with different arguments. I am looking for a way to call that script from within Processing and load the output using something like loadJSONObject()
The problem is that I don't know how to call the python script with arguments from within Processing.
Any tip will be appreciated, thanks!

One option, as pointed out in the comments, is to use open, and then load the file that generates the normal way.
Another -arguably much better- way is to not do this and to run your python script as services with a web interface instead, so that your python scripts sits listening on http://localhost:1234, for instance, and your Processing sketch can simply load a file "http://localhost:1234/somefile?input=whatever" and not even care what is actually generating the content.
The upside there is also that you can run your script anywhere that can be reached via URLs, and those things don't need to rely on python being available as an executable.

Dangerous Python Keywords?

I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported)
yet, it would still be possible for the user (if desired) to write code to read/write files etc (say, using open).
Then here comes the question:
(1) where can I get a 'global' list of python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know that there are some solutions of python sandboxing. but please try to answer this question as I feel this is the more relevant approach for my needs.
EDIT: suppose that I make sure that no import is in the file, and that no possible hurtful methods (such as open, eval, etc) are in it. can I conclude that the file is SAFE? (can you think of any other 'dangerous' ways that built-in methods can be run?)

This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.

You can still obfuscate import without using eval:
s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]
** BOOM **

Built-in functions.
Keywords.
Note that you'll need to do things like look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malacious code.

An approach that should work better than string matching us to use module ast, parse the python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.

built in functions/keywords:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out keywords, import restrictions, and in-scope by default symbols alone are not enough to cover, you need to verify the entire graph...

Use a Virtual Machine instead of running it on a system that you are concerned about.

Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.

How to access a data structure from a currently running Python process on Linux?

I have a long-running Python process that is generating more data than I planned for. My results are stored in a list that will be serialized (pickled) and written to disk when the program completes -- if it gets that far. But at this rate, it's more likely that the list will exhaust all 1+ GB free RAM and the process will crash, losing all my results in the process.
I plan to modify my script to write results to disk periodically, but I'd like to save the results of the currently-running process if possible. Is there some way I can grab an in-memory data structure from a running process and write it to disk?
I found code.interact(), but since I don't have this hook in my code already, it doesn't seem useful to me (Method to peek at a Python program running right now).
I'm running Python 2.5 on Fedora 8. Any thoughts?
Thanks a lot.
Shahin

There is not much you can do for a running program. The only thing I can think of is to attach the gdb debugger, stop the process and examine the memory. Alternatively make sure that your system is set up to save core dumps then kill the process with kill --sigsegv <pid>. You should then be able to open the core dump with gdb and examine it at your leisure.
There are some gdb macros that will let you examine python data structures and execute python code from within gdb, but for these to work you need to have compiled python with debug symbols enabled and I doubt that is your case. Creating a core dump first then recompiling python with symbols will NOT work, since all the addresses will have changed from the values in the dump.
Here are some links for introspecting python from gdb:
http://wiki.python.org/moin/DebuggingWithGdb
http://chrismiles.livejournal.com/20226.html
or google for 'python gdb'
N.B. to set linux to create coredumps use the ulimit command.
ulimit -a will show you what the current limits are set to.
ulimit -c unlimited will enable core dumps of any size.

While certainly not very pretty you could try to access data of your process through the proc filesystem.. /proc/[pid-of-your-process]. The proc filesystem stores a lot of per process information such as currently open file pointers, memory maps and what not. With a bit of digging you might be able to access the data you need though.
Still i suspect you should rather look at this from within python and do some runtime logging&debugging.

+1 Very interesting question.
I don't know how well this might work for you (especially since I don't know if you'll reuse the pickled list in the program), but I would suggest this: as you write to disk, print out the list to STDOUT. When you run your python script (I'm guessing also from command line), redirect the output to append to a file like so:
python myScript.py >> logFile.
This should store all the lists in logFile.
This way, you can always take a look at what's in logFile and you should have the most up to date data structures in there (depending on where you call print).
Hope this helps

This answer has info on attaching gdb to a python process, with macros that will get you into a pdb session in that process. I haven't tried it myself but it got 20 votes. Sounds like you might end up hanging the app, but also seems to be worth the risk in your case.

Embedded Python - Blocking operations in time module

I'm developing my own Python code interpreter using the Python C API, as described in the Python documentation. I've taken a look on the Python source code and I tried to follow the same steps that are carried out in the standard interpreter when executing a py file. These steps (sequence of C API function calls) are basically:
PyRun_AnyFileExFlags()
PyRun_SimpleFileExFlags()
PyRun_FileExFlags()
PyArena_New()
PyParser_ASTFromFile()
run_mod()
PyAST_Compile()
PyEval_EvalCode()
PyEval_EvalCodeEx()
PyThreadState_GET()
PyFrame_New()
PyEval_EvalFrameEx()
The only difference in my code is that I do manually the AST compilation, frame creation, etc. and then I call PyEval_EvalFrame.
With this, I am able to execute an arbitrary .py file with my program, as if it were the normal Python interpreter. My problem comes when the code that my program is executing makes use of the time module: all time module operations get blocked in the GIL! For example, if the Python code calls time.sleep(1), this call is blocked and never gets executed.
Obviously I am doing something wrong that blocks the GIL (and therefore blocks the time module) but I dont know how to correct it. The last statement in my code where I have control is in PyEval_EvalFrameEx, and from that point on, everything runs "as in regular Python interpreter", I think.
Anybody had a similar problem? What am I doing wrong, so that I block the time module?
Hope somebody can help me...
Thanks for your time. Best regards,
R.

You need to provide more detail.
How does your interpreter's behavior differ from the standard interpreter?
If you just want to run arbitrary source files, why are you not calling one of the higher level interfaces, like PyRun_SimpleFile? Did your code call Py_Initialize?

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.