Python module to keep a file location variable updated? - python

I have a variable in my python script that holds the path to a file as a string. Is there a way, through a python module, etc, to keep the file path updated if the file were to be moved to another destination?

Short answer: no, not with a string.
Long answer: If you want to use only a string to record the location of this file, you're probably out of luck unless you use other tools to find the location of the file, or record whenever it moves - I don't know what those tools are.
You don't give a lot of info in your question about what you want this variable for; as #DeepSpace says in his comment, if you're trying to make this String follow the file between different runs of this program, then you'd be better off making an argument for the script. If, however, you expect the file to move sometime during the execution of your program, you might be able to use a file object to keep track of it. Or, rather - instead of keeping the filepath in memory, keep a file descriptor in memory instead (the kind you would generate by using the open() function, and just never close that file until the program terminates. You can use seek to return to the start of the file if you needed to read it multiple times. Problems with this include that it's not memory-safe, and it's absolutely not a best practice.

TL;DR
Your best bet is probably to go with a solution like #DeepSpace mentioned where you could go and call your script with a parameter in command-line which will then force the user to input a valid path.
This is actually a really good question, but unfortunately purely Pythonly speaking, this is impossible.
No python module will be able to dynamically linked a variable to a path on the file-system. You will need an external script or routine which will update any kind of data structure that will hold the path value.
Even then, the name of the file could change, but not it's location. here is what I mean by that.
Let's say you were to wrapped that file in a folder only containing that specific file. Since you now know that it's location is fixed (theoretically speaking), you can have another python script/routine that will read the filename and store it in a textfile. Your other script could go and get that file name (considering your routine would sync that file on a regular basis). But, as soon as the location of the file changes, how can you possibly know where it is now. It has to be manually hard coded somewhere to have something close to the behavior your expecting.
Note that my example is not in any way a solution to go-to for your problem. I'm actually trying to underline the shortcomings of such a feature.

Related

Get current directory - 'os' and 'subprocess' library are banned

I'm stuck in a rock and a hard place.
I have been given a very restricted Python 2/3 environment where both os and subprocess library are banned. I am testing file upload functionality using Python selenium; creating the file is straight forward, however, when I use the method driver.send_keys(xyz), the send_keys is expecting a full path rather than just a file name. Any suggestions on how to work around this?
I do not know if it would work in very restricted Python 2/3, but you might try following:
create empty file, let say empty.py like so:
with open('empty.py','w') as f:
pass
then do:
import empty
and:
pathtofile = empty.__file__
print(pathtofile)
This exploits fact empty text file is legal python module (remember to set name which is not use) and that generally import machinery set __file__ dunder, though this is not required, so you might end with None. Keep in mind that if it is not None then it is path to said file (empty.py) so you would need to further process it to get catalog itself.
with no way of using the os module it would seem you're SOL unless the placement of your script is static (i.e. you define the current working dirrectory as a constant) and you then handle all the path/string operations within that directory yourself.
you won't be able to explore what's in the directory but you can keep tabs on any files you create and just store the paths yourself, it will be tedious but there's no reason why it shouldn't work.
you won't be able to delete files i don't think but you should be able to clear their contents

Tracking changes of the file being appended in Python

I would like to track the changes of a file being appended by another program.
My plan of approach is this. I read the file's contents first. At a later time, the file contents would be appended from another application. I want to read the appended data only rather than re-reading everything from the top. In order to do this, I'm going to check the file's modification time. I would then seek() to the previous size of the file, and start reading from there.
Is this a proper approach? Or there is a known idiom for this?
Well, you have to make quite some assumptions about both the other program writing to file as well as the file system, but in generally it should work. Personally I would rather write the current seek position or line number (if reading simple text files) to another file and check it from there. This will also allow you to revert back in the file if some part is rewritten and the file size stays the same (or even gets smaller).
If you have some very important/unique data, besides making backups you should maybe think about appending the new data to new file and later rejoining the files (if needed) when you have checked that the data is fine in your other program. This way you could just read any new file as a whole after certain time. (Also remember that in a larger picture, system time and creation/modification times are not 100% trustworthy).

Locate the Containing Package or Module **Before** Importing

I often find myself needing to import something, but not quite sure of its fully qualified name. I usually end up opening a browser, performing an internet search like python [target_of_import], and scanning a page or two until I find it.
This works, but causes a relatively long break in my workflow, especially if I have to search for a few in a row. How do other people address this?
Is there something like Haskell's Hoogle for Python?
[Note: I currently use vim, in case anyone suggests an IDE-based solution.]
EDIT: For answers concerning autocomplete, please specify this. In general, autocomplete is probably a non-starter solution since in the particular case I am asking about the leftmost characters of the string to be autocompleted are not known.
EDIT 2: While I will not categorically rule out suggestions concerning switching to/learning a new IDE, I'm pretty unlikely to completely change the way I work to accomplish this (e.g., switching from vim on the command line to something like Eclipse + plugins).
You can do this in vim using the Unite.vim
Enable fuzzy file searching by adding the following to your .vimrc:
call unite#filters#matcher_default#use(['matcher_fuzzy'])
Search for file:
:UniteWithInput file_rec/async:/base/path:!<cr>
Search within files:
:UniteWithInut grep:/base/path<cr>
Search file names and within files
:UniteWithInput file_rec/async:/base/path:! grep:/base/path<cr>
(Use to change between sources)
See also :h :UniteWithCursorWord
This will open a buffer with the file matches. You can open the file by pressing enter but since you only want copy the file name simply use y$ to yank the line, q to close the buffer and the p to paste the yanked line.

Dangerous Python Keywords?

I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported)
yet, it would still be possible for the user (if desired) to write code to read/write files etc (say, using open).
Then here comes the question:
(1) where can I get a 'global' list of python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know that there are some solutions of python sandboxing. but please try to answer this question as I feel this is the more relevant approach for my needs.
EDIT: suppose that I make sure that no import is in the file, and that no possible hurtful methods (such as open, eval, etc) are in it. can I conclude that the file is SAFE? (can you think of any other 'dangerous' ways that built-in methods can be run?)
This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.
You can still obfuscate import without using eval:
s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]
** BOOM **
Built-in functions.
Keywords.
Note that you'll need to do things like look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malacious code.
An approach that should work better than string matching us to use module ast, parse the python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.
built in functions/keywords:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out keywords, import restrictions, and in-scope by default symbols alone are not enough to cover, you need to verify the entire graph...
Use a Virtual Machine instead of running it on a system that you are concerned about.
Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.

adding space in an output file with out having to read the entire thing first

Question: How do you write data to an already existing file at the beginning of the file with out writing over what's already there and with out reading the entire file into memory? (e.g. prepend)
Info:
I'm working on a project right now where the program frequently dumps data into a file. this file will very quickly balloon up to 3-4gb. I'm running this simulation on a computer with only 768mb of ram. pulling all that data to the ram over and over will be a great pain and a huge waste of time. The simulation already takes long enough to run as it is.
The file is structured such that the number of dumps it makes is listed at the beginning with just a simple value, like 6. each time the program makes a new dump I want that to be incremented, so now it's 7. the problem lies with the 10th, 100th, 1000th, and so dump. the program will enter the 10 just fine, but remove the first letter of the next line:
"9\n580,2995,2083,028\n..."
"10\n80,2995,2083,028\n..."
obviously, the difference between 580 and 80 in this case is significant. I can't lose these values. so i need a way to add a little space in there so that I can add in this new data without losing my data or having to pull the entire file up and then rewrite it.
Basically what I'm looking for is a kind of prepend function. something to add data to the beginning of a file instead of the end.
Programmed in Python
~n
See the answers to this question:
How do I modify a text file in Python?
Summary: you can't do it without reading the file in (this is due to how the operating system works, rather than a Python limitation)
It's not addressing your original question, but here are some possible workarounds:
Use SQLite (it's bundled with your Python)
Use a fancier database, either RDBMS or NoSQL
Just track the number of dumps in a different text file
The first couple of options are a little more work up front, but provide more flexibility. The last option is the easiest solution to your current problem.
You could quite easily create an new file, output the data you wish to prepend to that file and then copy the content of the existing file and append it to the new one, then rename.
This would prevent having to read the whole file if that is the primary issue.

Categories

Resources