I plan to use these functions in a web environment, so my concern is whether they can be exploited to execute malicious software on the server.
Edit: I don't execute the result. I parse the AST tree and/or catch SyntaxError.
This is the code in question:
try:
    # compile the code and check for syntax errors
    compile(code_string, filename, "exec")
except SyntaxError as value:
    msg = value.args[0]
    (lineno, offset, text) = value.lineno, value.offset, value.text
    if text is None:
        return [{"line": 0, "offset": 0,
                 "message": u"Problem decoding source"}]
    else:
        line = text.splitlines()[-1]
        if offset is not None:
            offset = offset - (len(text) - len(line))
        else:
            offset = 0
        return [{"line": lineno, "offset": offset, "message": msg}]
else:
    # no syntax errors, check it with pyflakes
    tree = compiler.parse(code_string)
    w = checker.Checker(tree, filename)
    w.messages.sort(lambda a, b: cmp(a.lineno, b.lineno))
checker.Checker is the pyflakes class that walks the AST.
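On Python 3 the old compiler module is gone, but the same parse-only check can be done with the stdlib ast module. A minimal sketch (the function name is mine, and the pyflakes step is omitted):

```python
import ast

def check_syntax(code_string, filename="<submitted>"):
    """Parse only; the submitted code is never executed."""
    try:
        ast.parse(code_string, filename)
    except SyntaxError as err:
        return [{"line": err.lineno or 0,
                 "offset": err.offset or 0,
                 "message": err.msg}]
    return []  # no syntax errors

print(check_syntax("x = 1"))      # empty list: code is syntactically valid
print(check_syntax("def f(:"))    # one error dict describing the problem
```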
I think the more interesting question is what are you doing with the compiled functions? Running them is definitely unsafe.
I've tested the few exploits I could think of. Seeing as it's just a syntax check (you can't redefine classes/functions, etc.), I don't think there is any way to get Python to execute arbitrary code at compile time.
If the resulting code or AST object is never evaluated, I think you are only subject to denial-of-service attacks.
If you are evaluating user-supplied code, it is the same as giving shell access as the webserver user to every user.
They are not, but it's not too hard to find a subset of Python that can be sandboxed to a point. If you want to go down that road you need to parse that subset of Python yourself and intercept all calls, attribute lookups and everything else involved. You also don't want to give users access to language constructs such as non-terminating loops.
Still interested? Head over to jinja2.sandbox :)
compiler.parse and compile could most definitely be used for an attack if the attacker can control their input and the output is executed. In most cases, you are going to either eval or exec their output to make it run so those are still the usual suspects and compile and compiler.parse (deprecated BTW) are just adding another step between the malicious input and the execution.
EDIT: Just saw that you left a comment indicating that you are actually planning on using these on USER INPUT. Don't do that. Or at least, don't actually execute the result. That's a huge security hole for whoever ends up running that code. And if nobody's going to run it, why compile it? Since you clarified that you only want to check syntax, this should be fine. I would not store the output though as there's no reason to make anything easier for a potential attacker and being able to get arbitrary code onto your system is a first step.
If you do need to store it, I would probably favor a scheme similar to that commonly used for images where they are renamed in a non-predictable manner with the added step of making sure that it is not stored on the import path.
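That renaming scheme could be sketched like this (the function and directory names are mine, purely illustrative):

```python
import os
import secrets

def store_submission(code_bytes, upload_dir):
    """Store submitted code under an unpredictable name. Using a .txt
    extension keeps it un-importable even if upload_dir ever ends up
    on the import path."""
    name = secrets.token_hex(16) + ".txt"
    path = os.path.join(upload_dir, name)
    with open(path, "wb") as f:
        f.write(code_bytes)
    return path
```

The unpredictable name prevents an attacker from knowing where their payload landed, and keeping it off the import path prevents it from ever being loaded as a module.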
Yes, they can be maliciously exploited.
If you really want safe sandboxing, you could look at PyPy's sandboxing features, but be aware that sandboxing is not easy, and there may be better ways to accomplish whatever you are seeking.
Correction
Since you've updated your question to clarify that you're only parsing the untrusted input to AST, there is no need to sandbox anything: sandboxing is specifically about executing untrusted code (which most people probably assumed your goal was, by asking about sandboxing).
Using compile / compiler only for parsing this way should be safe: Python source parsing does not have any hooks into code execution. (Note that this is not necessarily true of all languages: for example, Perl cannot be (completely) parsed without code execution.)
The only other remaining risk is that someone may be able to craft some pathological Python source code that makes one of the parsers use runaway amounts of memory / processor time, but resource exhaustion attacks affect everything, so you'll just want to manage this as it becomes necessary. (For example, if your deployment is mission-critical and cannot afford a denial of service by an attacker armed with pathological source code, you can execute the parsing in a resource-limited subprocess).
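One way to sketch that resource-limited subprocess on a Unix system (function name and limits are mine, not a hardened implementation):

```python
import resource
import subprocess
import sys

def parse_untrusted(source, cpu_seconds=5, mem_bytes=512 * 1024 * 1024):
    """Parse untrusted source in a child process with capped CPU time and
    memory. Returns True if it parsed cleanly, False on a syntax error,
    timeout, or resource blowup. Unix-only (uses setrlimit)."""
    def limits():  # runs in the child just before exec
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    try:
        proc = subprocess.run(
            [sys.executable, "-c",
             "import ast, sys; ast.parse(sys.stdin.read())"],
            input=source.encode(), capture_output=True,
            preexec_fn=limits, timeout=cpu_seconds + 1)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

This way a pathological input can only exhaust the child's budget, not take down the web process.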
Related
I would like users of my program to be able to define custom scripts in python without breaking the program. I am looking at something like this:
def call(script):
    code = "access modification code" + script
    exec(code)
Where "access modification code" defines a scope such that script can only access variables it instantiates itself. Is it possible to do this or something with similar functionality, such as creating a new python environment with its own scope and then receiving output from it?
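One way to sketch that "access modification code" idea is to run the script in a fresh namespace with no builtins. Note this only hides names; as the answers below explain, it is not a real security boundary:

```python
def call(script):
    # Fresh namespace with empty builtins: the script only sees names
    # it instantiates itself. WARNING: name-hiding, not a sandbox.
    namespace = {"__builtins__": {}}
    exec(script, namespace)
    namespace.pop("__builtins__", None)
    return namespace  # whatever the script defined

result = call("x = 40 + 2")
print(result)  # {'x': 42}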
Thank you for your time :)
Clarification Edit
"I want to prevent both active attacks and accidental interaction with the program variables outside the users script (hence hiding all globals). The user scripts are intended to be small and inputted as text. The return of the user script needs to be immediate, as though it were native to the program."
There are two separate problems you want to prevent in this scenario:
Prevent an outside attacker from running arbitrary code in the context of the OS user executing your program. This means preventing arbitrary code execution and privilege escalation.
Preventing a legitimate user of your program from changing the program behavior in unintended ways.
For the first problem, you need to make sure that the source of the python code you execute is the user. You shouldn't accept input from a socket, or from a file that other users can write to. You need to make sure, somehow, that it was the user who is running the program that provided the input.
Actual solutions will depend on your OS, but you may want to consider setting up restrictive file permissions, if you're storing the code in a file.
Don't ignore or downplay this problem, or your users will fall victim to virus/malware/hackers thanks to your program.
The way you solve the second problem depends on what exactly constitutes intended behaviour in your program. If you're happy with outputting simple data structures, you can run your user-inputted code in a separate process, and pass the result over a pipe in a serialization format such as JSON or YAML.
Here's a very simple example:
# remember to set restrictive file permissions on this file (OS-dependent)
USER_CODE_FILE = "/home/user/user_code_file.py"
# absolute path to the python binary (executable)
PYTHON_PATH = "/usr/bin/python"

import subprocess
import json

user_code = '''
import json
my_data = {"a": [1, 2, 3]}
print json.dumps(my_data)
'''

with open(USER_CODE_FILE, "wb") as f:
    f.write(user_code)

user_result_str = subprocess.check_output([PYTHON_PATH, USER_CODE_FILE])
user_result = json.loads(user_result_str)
print user_result
This is a fairly simple solution, and it has significant overhead. Don't use this if you need to run the user code many times in a short period of time.
Ultimately, this solution is only effective against unsophisticated attackers (users). Generally speaking, there's no way to protect any user process from the user itself - nor would it make much sense.
If you really want more assurance, and want to mitigate the first problem too, you should run the process as a separate ("guest") user. This, again, is OS-dependent.
Finally, a warning: avoid exec and eval to the best of your ability. They don't protect either your program or the user against the inputted code. There's a lot of information about this on the web; just search for "python secure eval".
I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported)
Yet it would still be possible for the user (if desired) to write code that reads/writes files, etc. (say, using open).
Then here comes the question:
(1) where can I get a 'global' list of python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know that there are some solutions of python sandboxing. but please try to answer this question as I feel this is the more relevant approach for my needs.
EDIT: suppose that I make sure that no import is in the file, and that no possible hurtful methods (such as open, eval, etc) are in it. can I conclude that the file is SAFE? (can you think of any other 'dangerous' ways that built-in methods can be run?)
This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.
You can still obfuscate import without using eval:
s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]
** BOOM **
For a "global" list, see the Python documentation's pages on built-in functions and on keywords. Note that you'll need to look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malicious code.
An approach that should work better than string matching is to use the ast module: parse the Python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.
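A minimal sketch of that whitelist-filtering approach, restricted to simple arithmetic expressions (the allowed node set is deliberately tiny and the function name is mine):

```python
import ast

# Whitelist of AST node types for plain arithmetic expressions only.
ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
           ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd)

def safe_eval(expr):
    """Evaluate an arithmetic expression after vetting every AST node."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("disallowed construct: %s" % type(node).__name__)
    return eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})

print(safe_eval("2 * (3 + 4)"))   # 14
# safe_eval("__import__('os')") raises ValueError: disallowed construct: Call
```

Anything not on the whitelist (names, calls, attribute access, subscripts) is rejected before evaluation, which is the default-deny posture the other answers recommend.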
built in functions/keywords:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked, otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out that keywords, import restrictions, and in-scope-by-default symbols alone are not enough; you need to verify the entire object graph...
Use a Virtual Machine instead of running it on a system that you are concerned about.
Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.
I'm looking for a way to run tests on command-line utilities written in bash, or any other language.
I'd like to find a testing framework that would have statements like
setup:
    command = 'do_awesome_thing'
    filename = 'testfile'
    args = ['--with', 'extra_win', '--file', filename]
    run_command command args

test_output_was_correct:
    assert_output_was 'Creating awesome file "' + filename + '" with extra win.'

test_file_contains_extra_win:
    assert_file_contains filename 'extra win'
Presumably the base test case would set up a temp directory in which to run these commands, and remove it at teardown.
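That base case can be sketched with nothing but the stdlib (the class and command names here are illustrative, using echo as a stand-in for the tool under test):

```python
import shutil
import subprocess
import tempfile
import unittest

class CommandTestCase(unittest.TestCase):
    """Base case: each test runs its command inside a throwaway temp
    directory, which is removed at teardown."""

    def setUp(self):
        self.tmpdir = tempfile.mkdtemp()

    def tearDown(self):
        shutil.rmtree(self.tmpdir)

    def run_command(self, *args):
        return subprocess.run(args, cwd=self.tmpdir,
                              capture_output=True, text=True)

class TestEcho(CommandTestCase):
    def test_output_was_correct(self):
        result = self.run_command("echo", "extra win")
        self.assertIn("extra win", result.stdout)
```

Since it is plain unittest underneath, the usual tricks for generating tests programmatically still work.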
I would prefer to use something in Python, since I'm much more familiar with it than with other plausible candidate languages.
I imagine that there could be something using a DSL that would make it effectively language-agnostic (or its own language, depending on how you look at it); however this might be less than ideal, since my testing techniques usually involve writing code that generates tests.
It's a bit difficult to google for this, as there is a lot of information on utilities which run tests, which is sort of the converse of what I'm looking for.
Support for doctests embedded in the output of command --help would be an extra bonus :)
Check out ScriptTest :
from scripttest import TestFileEnvironment

env = TestFileEnvironment('./scratch')

def test_script():
    env.reset()
    filename = 'testfile'
    result = env.run('do_awesome_thing testfile --with extra_win --file %s' % filename)
    # or use a list like ['do_awesome_thing', 'testfile', ...]
    assert result.stdout.startswith('Creating awesome file')
    assert filename in result.files_created
It's reasonably doctest-usable as well.
Well... what we usually do (and one of the wonders of O.O. languages) is to write all the components of an application before actually making the application. Every component might have a standalone way to be executed for testing purposes (usually from the command line), which also lets you think of each one as a complete program and reuse them in future projects. If what you want is to test the integrity of an existing program... well, I think the best way is to learn in depth how it works, or even deeper: read the source. Or even deeper: develop a bot to force-test it :3
Sorry that's what I have .-.
I know that this question is old, but since I was looking for an answer, I figured I would add my own for anyone else who happens along.
Full disclaimer: The project I am mentioning is my own, but it is completely free and open source.
I ran into a very similar problem, and ended up rolling my own solution. The test code will look like this:
from CLITest import CLITest, TestSuite
from subprocess import CalledProcessError

class TestEchoPrintsToScreen(CLITest):
    '''Tests whether the string passed in is the string
    passed out'''

    def test_output_contains_input(self):
        self.assertNotIsInstance(self.output, CalledProcessError)
        self.assertIn("test", self.output)

    def test_output_equals_input(self):
        self.assertNotIsInstance(self.output, CalledProcessError)
        self.assertEqual("test", self.output)

suite = TestSuite()
suite.add_test(TestEchoPrintsToScreen("echo test"))
suite.run_tests()
This worked well enough to get me through my issues, but I know it could use some more work to make it as robust as possible (test discovery springs to mind). It may help, and I always love a good pull request.
Outside of any prepackaged testing framework that may exist but I am unaware of, I would just point out that expect is an awesome and much underutilized tool for this kind of automation, especially if you want to support multistage interaction, which is to say not just sending a command and checking output but responding to output with more input. If you wind up building your own system, it's worth looking into.
There is also a Python reimplementation of expect called pexpect. There may be some direct interfaces to the expect library available as well. I'm not a Python guy so I couldn't tell you much about them.
I would like to enable students to submit Python code solutions to a few simple Python problems. My application will be running on GAE. How can I limit the risk from malicious code that is submitted? I realize that this is a hard problem and I have read related Stack Overflow and other posts on the subject. I am curious whether the restrictions already in place in the GAE environment make it simpler to limit the damage that untrusted code could inflict. Is it possible to simply scan the submitted code for a few restricted keywords (exec, import, etc.) and then ensure the code only runs for less than a fixed amount of time, or is it still difficult to sandbox untrusted code even in the restricted GAE environment? For example:
# Import and execute untrusted code in GAE
untrustedCode = """#Untrusted code from students."""

class TestSpace(object):
    pass

testspace = TestSpace()

try:
    # Check the untrusted code somehow and raise an exception if unsafe.
    pass
except:
    print "Code attempted to import or access network"

try:
    # exec code in a new namespace (Thanks Alex Martelli)
    # limit runtime somehow
    exec untrustedCode in vars(testspace)
except:
    print "Code took more than x seconds to run"
#mjv's smiley comment is actually spot-on: make sure the submitter IS identified and associated with the code in question (which presumably is going to be sent to a task queue), and log any diagnostics caused by an individual's submissions.
Beyond that, you can indeed prepare a test-space that's more restrictive (thanks for the acknowledgment;-) including a special 'builtin' that has all you want the students to be able to use and redefines __import__ &c. That, plus a token pass to forbid exec, eval, import, __subclasses__, __bases__, __mro__, ..., gets you closer. A totally secure sandbox in a GAE environment however is a real challenge, unless you can whitelist a tiny subset of the language that the students are allowed.
So I would suggest a layered approach: the sandbox GAE app in which the students upload and execute their code has essentially no persistent layer to worry about; rather, it "persists" by sending urlfetch requests to ANOTHER app, which never runs any untrusted code and is able to vet each request very critically. Default-denial with whitelisting is still the holy grail, but with such an extra layer for security you may be able to afford a default-acceptance with blacklisting...
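The "token pass" mentioned above could be sketched with the stdlib tokenize module. The blacklist here is illustrative and far from complete; as the rest of this answer says, a whitelist is the real goal:

```python
import io
import tokenize

# Illustrative blacklist; a real deployment needs a much longer list
# (or, better, a whitelist of what IS allowed).
FORBIDDEN = {"exec", "eval", "import", "__subclasses__", "__bases__", "__mro__"}

def token_pass(source):
    """Reject the submission if any forbidden name appears as a token.
    Unlike naive string matching, this won't trip on 'import' inside a
    string literal or comment."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.string in FORBIDDEN:
            raise ValueError("forbidden token: %r" % tok.string)
    return True
```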
You really can't sandbox Python code inside App Engine with any degree of certainty. Alex's idea of logging who's running what is a good one, but if the user manages to break out of the sandbox, they can erase the event logs. The only place this information would be safe is in the per-request logging, since users can't erase that.
For a good example of what a rathole trying to sandbox Python turns into, see this post. For Guido's take on securing Python, see this post.
There are another couple of options: If you're free to choose the language, you could run Rhino (a Javascript interpreter) on the Java runtime; Rhino is nicely sandboxed. You may also be able to use Jython; I don't know if it's practical to sandbox it, but it seems likely.
Alex's suggestion of using a separate app is also a good one. This is pretty much the approach that shell.appspot.com takes: It can't prevent you from doing malicious things, but the app itself stores nothing of value, so there's no harm if you do.
Here's an idea. Instead of running the code server-side, run it client-side with Skulpt:
http://www.skulpt.org/
This is both safer, and easier to implement.
Say I want to manipulate some files on a floppy drive or a USB card reader. How do I check to see if the drive in question is ready? (That is, has a disk physically inserted.)
The drive letter exists, so os.path.exists() will always return True in this case. Also, at this point in the process I don't yet know any file names, so checking to see if a given file exists also won't work.
Some clarification: the issue here is exception handling. Most of the win32 API calls in question just throw an exception when you try to access a drive that isn't ready. Normally, this would work fine - look for something like the free space, and then catch the raised exception and assume that means there isn't a disk present. However, even when I catch any and all exceptions, I still get an angry exception dialog box from Windows telling me the floppy / card reader isn't ready. So, I guess the real question is - how do I suppress the windows error box?
And the answer, as with so many things, turns out to be in an article about C++/Win32 programming from a decade ago.
The issue, in a nutshell, is that Windows handles floppy disk errors slightly differently than other kinds of drive errors. By default, no matter what you program does, or thinks it's doing, Windows will intercept any errors thrown by the device and present a dialog box to the user rather than letting the program handle it - the exact issue I was having.
But, as it turns out, there's a Win32 API call to solve this issue, namely SetErrorMode().
In a nutshell (and I'm handwaving away a lot of the details here), we can use SetErrorMode() to get Windows to stop being quite so paranoid, do our thing and let the program handle the situation, and then reset the Windows error mode back to what it was before, as if we had never been there. (There's probably a Keyser Soze joke here, but I've had the wrong amount of caffeine today to be able to find it.)
Adapting the C++ sample code from the linked article, that looks about like this:
int OldMode; //a place to store the old error mode
//save the old error mode and set the new mode to let us do the work:
OldMode = SetErrorMode(SEM_FAILCRITICALERRORS);
// Do whatever we need to do that might cause an error
SetErrorMode(OldMode); //put things back the way they were
Under C++, detecting errors the right way requires the GetLastError() function, which we fortunately don't need to worry about here, since this is a Python question: Python's exception handling works fine. This, then, is the function I knocked together to check a drive letter for "readiness", ready for copy-pasting if anyone else needs it:
import win32api
import win32file

def testDrive(currentLetter):
    """
    Tests a given drive letter to see if the drive in question is ready for
    access. This is to handle things like floppy drives and USB card readers
    which have to have physical media inserted in order to be accessed.
    Returns True if the drive is ready, False if not.
    """
    # This prevents Windows from showing an error to the user, and allows
    # python to handle the exception on its own.
    oldError = win32api.SetErrorMode(1)  # note that SEM_FAILCRITICALERRORS = 1
    try:
        win32file.GetDiskFreeSpaceEx(currentLetter)
    except:
        returnValue = False
    else:
        returnValue = True
    # Restore the Windows error handling state to whatever it was before we
    # started messing with it:
    win32api.SetErrorMode(oldError)
    return returnValue
I've been using this quite a bit the last few days, and it's been working beautifully for both floppies and USB card readers.
A few notes: pretty much any function needing disk access will work in the try block; all we're looking for is an exception due to the media not being present.
Also, while the python win32api package exposes all the functions we need, it doesn't seem to have any of the flag constants. After a trip to the ancient bowels of MSDN, it turns out that SEM_FAILCRITICALERRORS is equal to 1, which makes our life awfully easy.
I hope this helps someone else with a similar problem!
You can compare len(os.listdir("path")) to zero to see if there are any files in the directory.
If you have pythonwin, does any of the information in this recipe help?
At a guess, "Availability" and "Status" may be worth looking at. Or you could test volume name, which I guess will be either 'X:' or '' if there is nothing in the drive. Or, heck, look for free space or total number of blocks.
You can use win32 functions via the excellent pywin32 (http://sourceforge.net/projects/pywin32/) for this purpose.
I suggest looking at the GetDiskFreeSpace function. You can check the free space on the target drive and continue based on that information.
As an alternative you can watch the changes of a directory or file with the ReadDirectoryChangesW function. You'll get notifications about file changes, etc., but you have to check whether this works for you or not. You can look at this example for yourself:
http://timgolden.me.uk/python/downloads/watch_directory.py
Not sure about your platform, but SNMP might be the answer for you.
I hope this will solve your problem.
import os

cmd = 'Powershell "Get-PhysicalDisk"'
drive_details = []
for each_line in os.popen(cmd).readlines():
    try:
        disk_info = str(each_line)
        if int(disk_info[0]) >= 0:
            each_line = each_line.split()
            drive_details.append({
                'sequence': each_line[0],
                'disk_type': str(each_line[-7]),
                'health_status': str(each_line[-4]),
                'disk_size': str(each_line[-2]) + str(each_line[-1])
            })
    except:
        pass
print(drive_details)