I'm trying to create a directory and then delete it (for testing purposes, which I will omit, but can give details if needed).
Like this:
>>> import os
>>> os.makedirs('C:\\ProgramData\\dir\\test')
>>> os.remove('C:\\ProgramData\\dir\\test')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
PermissionError: [WinError 5] Access is denied: 'C:\\ProgramData\\dir\\test'
I always get access denied, although I'm running the interpreter as an administrator. Also, I have no problem deleting the directory manually.
Use os.rmdir to delete a folder.
os.remove is for deleting files.
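As a quick illustration, here is the same create-then-delete sequence using os.rmdir (shown with a temporary directory instead of the question's C:\ProgramData path, so it runs on any platform):

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                  # stand-in for C:\ProgramData
target = os.path.join(base, 'dir', 'test')
os.makedirs(target)

os.rmdir(target)                           # os.rmdir removes an (empty) directory
print(os.path.exists(target))              # False

shutil.rmtree(base)                        # removes a whole (possibly non-empty) tree
```

Note that os.rmdir only works on empty directories; for a tree with contents, shutil.rmtree is the usual choice.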
Use os.rmdir to remove a directory. On Windows this is implemented by calling the WinAPI function RemoveDirectory. os.remove is implemented by calling DeleteFile, which is only meant for deleting files. If the filename argument is a directory, the call fails and it sets the last error code to ERROR_ACCESS_DENIED, for which Python 3 raises a PermissionError.
In this case the access denied error is based on the NTSTATUS code STATUS_FILE_IS_A_DIRECTORY, i.e. RtlNtStatusToDosError(0xC00000BA) == 5. Often the kernel's status code is more informative than the corresponding WinAPI error, but not always, and there isn't always a simple mapping from one to the other, depending on the division of labor between the kernel, system processes, services, environment subsystems, and the application. In this case I think the kernel status code is undeniably more informative than a generic access denied error.
At a lower level the cause of the error when trying to delete a directory via DeleteFile is that it calls the system service NtOpenFile with the FILE_NON_DIRECTORY_FILE flag set in OpenOptions, whereas RemoveDirectory specifies FILE_DIRECTORY_FILE. Subsequently both functions call NtSetInformationFile to set the FileDispositionInformation to delete the file or directory.
Just to be a contrarian, let's implement the entire sequence using only file operations on an NTFS file system.
>>> import os, pathlib
>>> base = pathlib.Path(os.environ['ProgramData'])
Create the 'dir' directory:
>>> dirp = base / 'dir::$INDEX_ALLOCATION'
>>> open(str(dirp), 'w').close()
>>> os.path.isdir(str(dirp))
True
By manually specifying the stream type as $INDEX_ALLOCATION, opening this 'file' actually creates an NTFS directory. Incidentally, you can also add multiple named $DATA streams to a directory. Refer to the file streams topic.
Next create the 'test' subdirectory and call os.remove to delete it:
>>> test = base / 'dir' / 'test::$INDEX_ALLOCATION'
>>> open(str(test), 'w').close()
>>> os.path.isdir(str(test))
True
>>> os.remove(str(test))
>>> os.path.exists(str(test))
False
You may be surprised that this worked. Remember the filename in this case explicitly specifies the $INDEX_ALLOCATION stream. This overrules the FILE_NON_DIRECTORY_FILE flag. You get what you ask for. But don't rely on this since these streams are an implementation detail of NTFS, which isn't the only file system in use on Windows.
I have a large system written in Python. When I run it, it reads all sorts of data from many different files on my filesystem. There are thousands of lines of code and hundreds of files, most of which are not actually being used. I want to see which files are actually being accessed by the system (Ubuntu), and hopefully where in the code they are being opened. Filenames are decided dynamically using variables etc., so the actual filenames cannot be determined just by looking at the code.
I have access to the code, of course, and can change it.
I'm trying to figure out how to do this efficiently, with minimal changes in the code:
Is there a Linux way to determine which files are accessed, and at what times? This might be useful, although it won't tell me where in the code it happens.
Is there a simple way to make an "open file" command also log the file name, time, etc. of the opened file? Hopefully without having to go into the code and change every open command; there are many of them, and some are not being used at runtime.
Thanks
You can trace file accesses without modifying your code, using strace.
Either you start your program under strace, like this:
strace -f -e trace=file python your_program.py
or you attach strace to a running program, like this:
strace -f -e trace=file -p <PID>
For 1 - You can use
ls -la /proc/<PID>/fd
Replacing <PID> with your process id.
Note that it will give you all the open file descriptors, some of them are stdin stdout stderr, and often other things, such as open websockets (which use a file descriptor), however filtering it for files should be easy.
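If you would rather inspect the descriptors from inside Python, the same information is available by reading the /proc symlinks (Linux-specific; /proc/self/fd is the current process, /proc/<PID>/fd another one):

```python
import os

# List this process's open file descriptors via /proc (Linux only),
# resolving each symlink to the underlying path or resource.
fd_dir = '/proc/self/fd'
for fd in os.listdir(fd_dir):
    try:
        print(fd, '->', os.readlink(os.path.join(fd_dir, fd)))
    except OSError:
        pass  # a descriptor may have closed while we were iterating
```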
For 2- See the great solution proposed here -
Override python open function when used with the 'as' keyword to print anything
e.g. overriding the open function with your own, which could include the additional logging.
One possible method is to "overload" the open function. This will have many effects that depend on the code, so I would do that very carefully if needed, but basically here's an example:
>>> _open = open
>>> def open(filename):
... print(filename)
... return _open(filename)
...
>>> open('somefile.txt')
somefile.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in open
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'
As you can see, my new open function first prints the argument (the filename) and then calls the original open (renamed _open). This can be done with more sophistication to log the filename if needed, but the most important thing is that this needs to run before any use of open in your code.
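A slightly more robust variant (a sketch, not from the answer above) patches builtins.open and forwards all arguments, so calls that pass a mode or encoding keep working, and the override also applies to modules that resolve open through the builtins module:

```python
import builtins

_original_open = builtins.open

def logging_open(file, *args, **kwargs):
    # Log the filename, then delegate to the real open with all arguments.
    print('open:', file)                 # swap in real logging as needed
    return _original_open(file, *args, **kwargs)

builtins.open = logging_open
```

As with the original suggestion, this must run before the code under observation starts opening files.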
I'm trying to connect to another computer on the local network via Python (the subprocess module) with these commands from CMD.exe:
net use \\\\ip\C$ password /user:username
copy D:\file.txt \\ip\C$
Then in Python it looks like the code below.
But when I try the second command, I get:
"FileNotFoundError: [WinError 2]"
Have you met same problem?
Is there any way to fix it?
import subprocess as sp
code = sp.call(r'net use \\<ip>\C$ <pass> /user:<username>')
print(code)
sp.call(r'copy D:\file.txt \\<ip>\C$')
The issue is that copy is a cmd.exe built-in, not a real executable on Windows.
Those Windows messages are awful, but "FileNotFoundError: [WinError 2]" doesn't mean one of the source & destination files can't be accessed (if copy had failed, you'd get a normal Windows message with explicit file names).
Here, it means that the command itself could not be found.
So you'd need to add shell=True to your subprocess call to gain access to built-ins.
But don't do that (security issues, non-portability); use shutil.copy instead.
Aside: use check_call instead of call for your first command, because if net use fails, the rest will fail too. Better to have an early failure.
To sum it up, here's what I would do:
import shutil
import subprocess as sp
sp.check_call(['net','use',r'\\<ip>\C$','password','/user:<username>'])
shutil.copy(r'D:\file.txt', r'\\<ip>\C$')
You need to make sure you have the right to add a file.
I tested successfully after I corrected the shared directory's permissions.
LYX_EXE = r'"c:\Program Files (x86)\LyX 2.3\bin\LyX2.3.exe"'
process = subprocess.Popen(LYX_EXE)
This works - the program loads.
LYX_EXE = r'"c:\Program Files (x86)\LyX 2.3\bin\LyX2.3.exe"'
process = subprocess.Popen([LYX_EXE])
This fails: I get "PermissionError: [WinError 5] Access is denied".
What did I do wrong? I need the second call type since I want to use parameters.
I think in the second call type you have to avoid the quoting (since the command is already a list, the executable and arguments are already separated):
LYX_EXE = r"c:\Program Files (x86)\LyX 2.3\bin\LyX2.3.exe"
process = subprocess.Popen([LYX_EXE])
See also: https://docs.python.org/2/library/subprocess.html#converting-argument-sequence
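For completeness, here is the list form with a portable interpreter path standing in for the LyX executable, showing how extra parameters are appended as additional list elements (with the question's path, a hypothetical document argument would be [LYX_EXE, r'C:\docs\file.lyx']):

```python
import subprocess
import sys

# Each list element is one argument; no manual quoting is needed,
# even when the executable's path contains spaces.
process = subprocess.Popen([sys.executable, '-c', 'print("hello")'])
process.wait()
```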
Quite simply, I am cycling through all sub folders in a specific location, and collecting a few numbers from three different files.
def GrepData():
    import glob as glob
    import os as os
    os.chdir('RUNS')
    RUNSDir = os.getcwd()
    Directories = glob.glob('*.*')
    ObjVal = []
    ParVal = []
    AADVal = []
    for dir in Directories:
        os.chdir(dir)
        (X,Y) = dir.split(sep='+')
        AADPath = glob.glob('Aad.out')
        ObjPath = glob.glob('fobj.out')
        ParPath = glob.glob('Par.out')
        try:
            with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
                for line in ObjFile:
                    ObjVal.append(list([X,Y,line.split()[0]]))
            ObjFile.close()
        except(IndexError):
            ObjFile.close()
        try:
            with open(os.path.join(os.getcwd(),ParPath[0])) as ParFile:
                for line in ParFile:
                    ParVal.append(list([X,Y,line.split()[0]]))
            ParFile.close()
        except(IndexError):
            ParFile.close()
        try:
            with open(os.path.join(os.getcwd(),AADPath[0])) as AADFile:
                for line in AADFile:
                    AADVal.append(list([X,Y,line.split()[0]]))
            AADFile.close()
        except(IndexError):
            AADFile.close()
        os.chdir(RUNSDir)
Each file open is placed in a try-except block, as in a few cases the file that is opened will be empty, and appending line.split()[0] would then lead to an IndexError since the list is empty.
However, when running this script I get the following error: "OSError: [Errno 24] Too many open files".
I was under the impression that the idea of the "with open..." statement was that it took care of closing the file after use? Clearly that is not happening.
So what I am asking for is two things:
The answer to: "Is my understanding of with open correct?"
How can I correct whatever error is inducing this problem?
(And yes, I know the code is not exactly elegant. The whole try-except ought to be a single, reusable piece, but I will fix that after figuring out this error.)
Try moving your try-except inside the with like so:
with open(os.path.join(os.getcwd(),ObjPath[0])) as ObjFile:
    for line in ObjFile:
        try:
            ObjVal.append(list([X,Y,line.split()[0]]))
        except(IndexError):
            pass
Notes: there is no need to close your file manually; this is what with is for. Also, there is no need to use "as os" in your imports if you are using the same name.
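Since the question mentions wanting to fold the three near-identical try-except blocks into one reusable piece, here is one possible helper (a sketch; the function and variable names are mine, not from the question):

```python
def read_first_tokens(path, X, Y):
    # Collect [X, Y, first_token] for each non-empty line. Empty lines
    # (and empty files) are simply skipped instead of raising IndexError.
    values = []
    with open(path) as handle:      # `with` closes the file even on error
        for line in handle:
            parts = line.split()
            if parts:
                values.append([X, Y, parts[0]])
    return values
```

Each of the three blocks in the loop then collapses to something like: if ObjPath: ObjVal.extend(read_first_tokens(ObjPath[0], X, Y)).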
"Too many open files" has nothing to do with writing semantically incorrect python code, and you are using with correctly. The key is the part of your error that says "OSError," which refers to the underlying operating system.
When you call open(), the python interpreter will execute a system call. The details of the system call vary a bit by which OS you are using, but on linux this call is open(2). The operating system kernel will handle the system call. While the file is open, it has an entry in the system file table and takes up OS resources -- this means effectively it is "taking up space" whilst it is open. As such the OS has a limit to the number of files that can be opened at any one time.
Your problem is that while you call open(), you don't call close() quickly enough. In the event that your directory structure requires you to have many thousands of files open at once and you approach this cap, it can be temporarily raised (at least on Linux; I'm less familiar with other OSes, so I don't want to go into too many details about how to do this across platforms).
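On Linux you can inspect (and, up to the hard limit, raise) this cap from Python with the resource module (POSIX-only, so this will not run on Windows):

```python
import resource

# Query the per-process limit on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft, 'hard limit:', hard)

# An unprivileged process may raise its soft limit up to the hard limit:
# resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```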
I want to disallow access to the file system from client code, so I thought I could override the open function:
env = {
'open': lambda *a: StringIO("you can't use open")
}
exec(open('user_code.py').read(), env)
but I got this
unqualified exec is not allowed in function 'my function' it contains a
nested function with free variables
I also tried
def open_exception(*a):
raise Exception("you can't use open")
env = {
'open': open_exception
}
but got the same exception (not "you can't use open").
I want to prevent:
executing this:
"""def foo():
return open('some_file').read()
print foo()"""
and evaluating this
"open('some_file').write('some text')"
I also use a session to store code that was evaluated previously, so I need to prevent executing this:
"""def foo(s):
return open(s)"""
and then evaluating this
"foo('some').write('some text')"
I can't use a regex because someone could use eval inside a string:
"eval(\"opxx('some file').write('some text')\".replace('xx', 'en'))"
Is there any way to prevent access to file system inside exec/eval? (I need both)
There's no way to prevent access to the file system inside exec/eval. Here's an example code that demonstrates a way for the user code to call otherwise restricted classes that always works:
import subprocess
code = """[x for x in ().__class__.__bases__[0].__subclasses__()
if x.__name__ == 'Popen'][0](['ls', '-la']).wait()"""
# Executing the `code` will always run `ls`...
exec code in dict(__builtins__=None)
And don't think about filtering the input, especially with regex.
You might consider a few alternatives:
ast.literal_eval if you could limit yourself only to simple expressions
Using another language for user code. You might look at Lua or JavaScript - both are sometimes used to run unsafe code inside sandboxes.
There's the pysandbox project, though I can't guarantee you that the sandboxed code is really safe. Python wasn't designed to be sandboxed, and in particular the CPython implementation wasn't written with sandboxing in mind. Even the author seems to doubt the possibility to implement such sandbox safely.
You can't turn exec() and eval() into a safe sandbox. You can always get access to the builtin module, as long as the sys module is available:
sys.modules[().__class__.__bases__[0].__module__].open
And even if sys is unavailable, you can still get access to any new-style class defined in any imported module by basically the same way. This includes all the IO classes in io.
This actually can be done.
Practically just what you describe can be accomplished on Linux, contrary to other answers here: you can achieve a setup with an exec-like call that runs untrusted code under security which is reasonably difficult to penetrate, and which allows output of the result. The untrusted code is not allowed to access the filesystem at all, except for reading specifically allowed parts of the Python VM and standard library.
If that's close enough to what you wanted, read on.
I'm envisioning a system where your exec-like function spawns a subprocess under a very strict AppArmor profile, such as the one used by Straitjacket (see here and here). This will limit all filesystem access at the kernel level, other than files specifically allowed to be read. This will also limit the process's stack size, max data segment size, max resident set size, CPU time, the number of signals that can be queued, and the address space size. The process will have locked memory, cores, flock/fcntl locks, POSIX message queues, etc, wholly disallowed. If you want to allow using size-limited temporary files in a scratch area, you can mkstemp it and make it available to the subprocess, and allow writes there under certain conditions (make sure that hard links are absolutely disallowed). You'd want to make sure to clear out anything interesting from the subprocess environment and put it in a new session and process group, and close all FDs in the subprocess except for the stdin/stdout/stderr, if you want to allow communication with those.
If you want to be able to get a Python object back out from the untrusted code, you could wrap it in something which prints the result's repr to stdout, and after you check its size, you evaluate it with ast.literal_eval(). That pretty severely limits the possible types of object that can be returned, but really, anything more complicated than those basic types probably carries the possibility of sekrit maliciousness intended to be triggered within your process. Under no circumstances should you use pickle for the communication protocol between the processes.
As #Brian suggests, overriding open doesn't work:
def raise_exception(*a):
raise Exception("you can't use open")
open = raise_exception
print eval("open('test.py').read()", {})
this displays the content of the file, but this (merging #Brian's and #lunaryorn's answers)
import sys
def raise_exception(*a):
raise Exception("you can't use open")
__open = sys.modules['__builtin__'].open
sys.modules['__builtin__'].open = raise_exception
print eval("open('test.py').read()", {})
will throw this:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
if not enabled():
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 23, in enabled
conf = open(CONFIG).read()
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Original exception was:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
and you can still access the original open outside the user code via __open.
"Nested function" refers to the fact that it's declared inside another function, not that it's a lambda. Declare your open override at the top level of your module and it should work the way you want.
Also, I don't think this is totally safe. Preventing open is just one of the things you need to worry about if you want to sandbox Python.