Methods to avoid hard-coding file paths in Python - python

Working with scientific data, specifically climate data, I am constantly hard-coding paths to data directories in my Python code. Even if I were to write the most extensible code in the world, the hard-coded file paths prevent it from ever being truly portable. I also feel like having information about the file system of your machine coded in your programs could be security issue.
What solutions are out there for handling the configuration of paths in Python to avoid having to code them out explicitly?

One of the solution rely on using configuration files.
You can store all your path in a json file like so :
{
"base_path" : "/home/bob/base_folder",
"low_temp_area_path" : "/home/bob/base/folder/low_temp"
}
and then in your python code, you could just do :
import json
with open("conf.json") as json_conf :
CONF = json.load(json_conf)
and then you can use your path (or any configuration variable you like) like so :
print "The base path is {}".format(CONF["base_path"])

First off its always good practise to add a main function to go with each class to test that class or functions in the file. Along with this you determine the current working directory. This becomes incredibly important when running python from a cron job or from a directory that is not the current working directory. No JSON files or environment variables are then needed and you will obtain interoperation across Mac, RHEL and Debian distributions.
This is how you do it, and it will work on windows also if you use '\' instead of '/' (if that is even necessary, in your case).
if "__main__" == __name__:
workingDirectory = os.path.realpath(sys.argv[0])
As you can see when you run your command, the working directory is calculated if you provide a full path or relative path, meaning it will work in a cron job automatically.
After that if you want to work with data that is stored in the current directory use:
fileName = os.path.join( workingDirectory, './sub-folder-of-current-directory/filename.csv' )
fp = open( fileName,'r')
or in the case of the above working directory (parallel to your project directory):
fileName = os.path.join( workingDirectory, '../folder-at-same-level-as-my-project/filename.csv' )
fp = open( fileName,'r')

I believe there are many ways around this, but here is what I would do:
Create a JSON config file with all the paths I need defined.
For even more portability, I'd have a default path where I look for this config file but also have a command line input to change it.

In my opinion passing arguments from command line would be best solution. You should take a look at argparse . This allows you to create nice way to handle arguments from the command line. for example:
myDataScript.py /home/userName/datasource1/

Related

Running Fortran executable within Python script

I am writing a Python script with the following objectives:
Starting from current working directory, change directory to child directory 'A'
Make slight adjustments to a fort.4 file
Run a Fortran binary file (the syntax of which is ../../../../ continuing until I hit the folder containing the binary); return to 2. until my particular objective is complete, then
Back out of child directory to parent, then enter another child directory and return to 2. until I have iterated through all the folders in question.
The code is coming along well. I am having to rely heavily upon Python's OS module for the directory work. However, I have never had any experience a) making minor adjustments of a file using python and b) running an executable. Could you guys give me some ideas on Python modules, direct me to a similar stack source etc, or perhaps give ways that this can be accomplished? I understand this is a vague question, so please ask if you do not understand what I am asking and I will elaborate. Also, the changes I have to make to this fort.4 file are repetitive in nature; they all happen at the same position in the file.
Cheers
EDIT::
entire fort.4 file:
file_name
movie1.dat !name of a general file the binary reads
nbr_box ! line3-8 is general info
2
this_box
1
lrdf_bead
.true.
beadid1
C1 !this is the line I must change
beadid2
F4 !This is a second line I must change
lrdf_com
.false.
bin_width
0.04
rcut
7
So really, I need to change "C1" to "C2" for example. The changes are very insignificant to make, but I must emphasize the fact that the main fortran executable reads this fort.4, as well as this movie1.dat file that I have already created. Hope this helps
Ok so there is a few important things here, first we need to be able to manage our cwd, for that we will use the os module
import os
whenever a method operates on a folder it is important to change directories into the folder and back to the parent folder. This can also be achieved with the os module.
def operateOnFolder(folder):
os.chdir(folder)
...
os.chdir("..")
Now we need to do some method for each directory, that comes with this,
for k in os.listdir(".") if os.path.isdir(k):
operateOnFolder(k)
Finally in order to operate on some preexisting FORTRAN file we can use the builtin file operators.
fileSource = open("someFile.f","r")
fileText = fileSource.read()
fileSource.close()
fileLines = fileText.split("\n")
# change a line in the file with -> fileLines[42] = "the 42nd line"
fileText = "\n".join(fileLines)
fileOutput = open("someFile.f","w")
fileOutput.write(fileText)
You can create and run your executable output.fx from source.f90::
subprocess.call(["gfortran","-o","output.fx","source.f90"])#create
subprocess.call(["output.fx"]) #execute

%USERPROFILE% env variable for python

I am writing a script in Python 2.7.
It needs to be able to go whoever the current users profile in Windows.
This is the variable and function I currently have:
import os
desired_paths = os.path.expanduser('HOME'\"My Documents")
I do have doubts that this expanduser will work though. I tried looking for Windows Env Variables to in Python to hopefully find a list and know what to convert it to. Either such tool doesn't exist or I am just not using the right search terms since I am still pretty new and learning.
You can access environment variables via the os.environ mapping:
import os
print(os.environ['USERPROFILE'])
This will work in Windows. For another OS, you'd need the appropriate environment variable.
Also, the way to concatenate strings in Python is with + signs, so this:
os.path.expanduser('HOME'\"My Documents")
^^^^^^^^^^^^^^^^^^^^^
should probably be something else. But to concatenate paths you should be more careful, and probably want to use something like:
os.sep.join(<your path parts>)
# or
os.path.join(<your path parts>)
(There is a slight distinction between the two)
If you want the My Documents directory of the current user, you might try something like:
docs = os.path.join(os.environ['USERPROFILE'], "My Documents")
Alternatively, using expanduser:
docs = os.path.expanduser(os.sep.join(["~","My Documents"]))
Lastly, to see what environment variables are set, you can do something like:
print(os.environ.keys())
(In reference to finding a list of what environment vars are set)
Going by os.path.expanduser , using a ~ would seem more reliable than using 'HOME'.

Python temporary directory to execute other processes?

I have a string of Java source code in Python that I want to compile, execute, and collect the output (stdout and stderr). Unfortunately, as far as I can tell, javac and java require real files, so I have to create a temporary directory.
What is the best way to do this? The tempfile module seems to be oriented towards creating files and directories that are only visible to the Python process. But in this case, I need Java to be able to see them too. However, I also want the other stuff to be handled intelligently if possible (such as deleting the folder when done or using the appropriate system temp folder)
tempfile.NamedTemporaryFile and tempfile.TemporaryDirectory work perfectly fine for your purposes. The resulting objects have a .name attribute that provides a file system visible name that java/javac can handle just fine, just make sure to:
Set the suffix appropriately if the compiler insists on files being named with a .java extension
Always call .flush() on the file handle before handing the .name of a NamedTemporaryFile to an external process or it may (usually will) see an incomplete file
If you don't want Python cleaning up the files when you close the objects, either pass delete=False to NamedTemporaryFile's constructor, or use the mkstemp and mkdtemp functions (which create the objects, but don't clean them up for you).
So for example, you might do:
# Create temporary directory for source and class files
with tempfile.TemporaryDirectory() as d:
# Write source code
srcpath = os.path.join(d.name, "myclass.java")
with open(srcpath, "w") as srcfile:
srcfile.write('source code goes here')
# Compile source code
subprocess.check_call(['javac', srcpath])
# Run source code
# Been a while since I've java-ed; you don't include .java or .class
# when running, right?
invokename = os.path.splitext(srcpath)[0]
subprocess.check_call(['java', invokename])
... with block for TemporaryDirectory done, temp directory cleaned up ...
tempfile.mkstemp creates a file that is normally visible in the filesystem and returns you the path as well. You should be able to use this to create your input and output files - assuming javac will atomically overwrite the output file if it exists there should be no race condition if other processes on your system don't misbehave.

Python: execfile from other file's working directory?

I have some code that loads a default configuration file and then allows users to supply their own Python files as additional supplemental configuration or overrides of the defaults:
# foo.py
def load(cfg_path=None):
# load default configuration
exec(default_config)
# load user-specific configuration
if cfg_path:
execfile(cfg_path)
There is a problem, though: execfile() executes directives in the file specified by cfg_path as if it were in the working directory of foo.py, not its own working directory. Thus, import directives might fail if the cfg_path file does, say, from m import x where m is a module in the same directory as cfg_path.
How do I execfile() from the working directory of its argument, or otherwise achieve an equivalent result? Also, I've been told that execfile is deprecated in Python 3 and that I should be using exec, so if there's a better way that I should be doing this, I'm all ears.
Note: I don't think solutions which merely change the working directory are correct. That won't put those modules on the interpreter's module-lookup path, as far as I can tell.
os.chdir lets you change the working directory as you wish (you can extract the working directory of cfg_path with os.path.dirname); be sure to first get the current directory with os.getcwd if you want to restore it when you're done exec'ing cfg_path.
Python 3 does indeed remove execfile (in favor of a sequence where you read the file, compile the contents, then exec them), but you need not worry about that, if you're currently coding in Python 2.6, since the 2to3 source to source translation deals with all this smoothly and seamlessly.
Edit: the OP says, in a comment, that execfile launches a separate process and does not respect the current working directory. This is false, and here's an example showing that it is:
import os
def makeascript(where):
f = open(where, 'w')
f.write('import os\nprint "Dir in file:", os.getcwd()\n')
f.close()
def main():
where = '/tmp/bah.py'
makeascript(where)
execfile(where)
os.chdir('/tmp')
execfile(where)
if __name__ == '__main__':
main()
Running this on my machine produces output such as:
Dir in file: /Users/aleax/stko
Dir in file: /private/tmp
clearly showing that execfile does keep using the working directory that's set at the time execfile executes. (If the file executed changes the working directory, that will be reflected after execfile returns -- exactly because everything is taking place in the same process!).
So, whatever problems the OP is still observing are not tied to the current working directory (it's hard to diagnose what they may actually be, without seeing the code and the exact details of the observed problems;-).

OS X: Determine Trash location for a given path

Simply moving the file to ~/.Trash/ will not work, as if the file os on an external drive, it will move the file to the main system drive..
Also, there are other conditions, like files on external drives get moved to /Volumes/.Trash/501/ (or whatever the current user's ID is)
Given a file or folder path, what is the correct way to determine the trash folder? I imagine the language is pretty irrelevant, but I intend to use Python
Based upon code from http://www.cocoadev.com/index.pl?MoveToTrash I have came up with the following:
def get_trash_path(input_file):
path, file = os.path.split(input_file)
if path.startswith("/Volumes/"):
# /Volumes/driveName/.Trashes/<uid>
s = path.split(os.path.sep)
# s[2] is drive name ([0] is empty, [1] is Volumes)
trash_path = os.path.join("/Volumes", s[2], ".Trashes", str(os.getuid()))
if not os.path.isdir(trash_path):
raise IOError("Volume appears to be a network drive (%s could not be found)" % (trash_path))
else:
trash_path = os.path.join(os.getenv("HOME"), ".Trash")
return trash_path
Fairly basic, and there's a few things that have to be done seperatly, particularly checking if the filename already exist in trash (to avoid overwriting) and the actual moving to trash, but it seems to cover most things (internal, external and network drives)
Update: I wanted to trash a file in a Python script, so I re-implemented Dave Dribin's solution in Python:
from AppKit import NSURL
from ScriptingBridge import SBApplication
def trashPath(path):
"""Trashes a path using the Finder, via OS X's Scripting Bridge.
"""
targetfile = NSURL.fileURLWithPath_(path)
finder = SBApplication.applicationWithBundleIdentifier_("com.apple.Finder")
items = finder.items().objectAtLocation_(targetfile)
items.delete()
Usage is simple:
trashPath("/tmp/examplefile")
Alternatively, if you're on OS X 10.5, you could use Scripting Bridge to delete files via the Finder. I've done this in Ruby code here via RubyCocoa. The the gist of it is:
url = NSURL.fileURLWithPath(path)
finder = SBApplication.applicationWithBundleIdentifier("com.apple.Finder")
item = finder.items.objectAtLocation(url)
item.delete
You could easily do something similar with PyObjC.
A better way is NSWorkspaceRecycleOperation, which is one of the operations you can use with -[NSWorkspace performFileOperation:source:destination:files:tag:]. The constant's name is another artifact of Cocoa's NeXT heritage; its function is to move the item to the Trash.
Since it's part of Cocoa, it should be available to both Python and Ruby.
In Python, without using the scripting bridge, you can do this:
from AppKit import NSWorkspace, NSWorkspaceRecycleOperation
source = "path holding files"
files = ["file1", "file2"]
ws = NSWorkspace.sharedWorkspace()
ws.performFileOperation_source_destination_files_tag_(NSWorkspaceRecycleOperation, source, "", files, None)
The File Manager API has a pair of functions called FSMoveObjectToTrashAsync and FSPathMoveObjectToTrashSync.
Not sure if that is exposed to Python or not.
Another one in ruby:
Appscript.app('Finder').items[MacTypes::Alias.path(path)].delete
You will need rb-appscript gem, you can read about it here

Categories

Resources