WAF: recursive find_program()

I am very new to WAF. I have a configuration function like this:
def configure(ctx):
    ########################################################################
    # **/myexe does not work either; an abs path in path_list does not work either!
    ctx.find_program('myexe', var='MYEXE', path_list=['mydir/here'])
It does not find the myexe binary unless I pass 'mydir/here/this_dir'. It seems that find_program() is not recursive. How can I make it search recursively? Is there another method for this?

find_program is not recursive, meaning that it doesn't search subdirectories of the ones you provide. This is for efficiency and security reasons. It's the same when your OS looks for binaries: it searches a list of paths (usually taken from the PATH environment variable) but does not descend into subdirectories. Otherwise an attacker could put a modified command in a subdirectory, and it would be used instead of the real one. That's also why the current directory is normally not in PATH :)
As waf is Python, if you absolutely want that behavior, you can implement it yourself :)
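One way to implement it, as a minimal sketch: expand the starting directory into a flat list of every directory below it and pass that list as path_list. The helper name all_dirs is hypothetical, and the sketch assumes that ctx.path.abspath() gives the directory of the wscript and that 'mydir/here' sits below it. The security caveat above still applies, since any binary named myexe anywhere in the tree can now match.

```python
import os

def all_dirs(root):
    """Return root and every directory below it (hypothetical helper)."""
    return [dirpath for dirpath, _dirnames, _filenames in os.walk(root)]

def configure(ctx):
    # find_program() checks each path_list entry without descending into it,
    # so give it the already expanded list of directories.
    # Assumption: 'mydir/here' is located next to the wscript.
    start = os.path.join(ctx.path.abspath(), 'mydir', 'here')
    ctx.find_program('myexe', var='MYEXE', path_list=all_dirs(start))
```

os.walk() yields nothing for a directory that does not exist, so a wrong starting path shows up as an ordinary "program not found" error from find_program().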

Related

What's the use case for Python's os.path.join dropping all preceding arguments when one is an absolute path? [duplicate]

I'm learning Python and I noticed something strange with one of my scripts. Doing a little testing I discovered the problem stemmed from this behavior:
>>> import os
>>> os.path.join('a','b')
'a/b'
>>> os.path.join('a','/b')
'/b'
Checking the documentation, this is, in fact, the design of the function:
os.path.join(path1[, path2[, ...]])
Join one or more path components intelligently. If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues. ...
My question isn't why my script failed, but rather why the function was designed this way. I mean, on Unix at least, a//b is a perfectly acceptable way to designate a path, if not elegant. Why was the function designed this way? Is there any way to tell if one or more path elements have been discarded short of testing each path string with os.path.isabs()?
Out of curiosity, I also checked the case where a path component ends in an os.sep character:
>>> os.path.join('a/','b')
'a/b'
That works as expected.

One case where it is useful for os.path.join('a', '/b') to return /b would be if you ask a user for a filename.
The user can enter either a path relative to the current directory, or a full path, and your program could handle both cases like this:
os.path.join(os.getcwd(), filename)
In [54]: os.getcwd()
Out[54]: '/tmp'
In [55]: os.path.join(os.getcwd(), 'foo')
Out[55]: '/tmp/foo'
In [56]: os.path.join(os.getcwd(), '/foo/bar')
Out[56]: '/foo/bar'

Suppose you're writing a utility like cd that resolves the new directory. You would use
os.path.join(currdir, newdir)
If the user enters /b, you would expect it to discard the first argument. This holds for plenty of things that work relative to the current directory.
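On the question of detecting discarded components: os.path.join itself gives no signal, so the check has to happen before the join. A minimal sketch, assuming you would rather fail loudly than silently lose the base path (strict_join is a hypothetical name, not a standard library function):

```python
import os

def strict_join(base, *parts):
    """Join like os.path.join, but refuse absolute later components."""
    for part in parts:
        if os.path.isabs(part):
            # os.path.join would silently drop everything before this part
            raise ValueError("absolute component would discard the base: %r" % (part,))
    return os.path.join(base, *parts)
```

This only catches what os.path.isabs() recognizes. On Windows, a component that carries just a drive letter can also replace earlier components, and this sketch does not cover that case.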

Methods to avoid hard-coding file paths in Python

Working with scientific data, specifically climate data, I am constantly hard-coding paths to data directories in my Python code. Even if I were to write the most extensible code in the world, the hard-coded file paths prevent it from ever being truly portable. I also feel that having information about your machine's file system coded into your programs could be a security issue.
What solutions are out there for handling the configuration of paths in Python to avoid having to code them out explicitly?

One solution is to use configuration files.
You can store all your paths in a JSON file like so:
{
    "base_path" : "/home/bob/base_folder",
    "low_temp_area_path" : "/home/bob/base/folder/low_temp"
}
and then in your Python code, you could just do:
import json
with open("conf.json") as json_conf:
    CONF = json.load(json_conf)
and then you can use your paths (or any configuration variable you like) like so:
print("The base path is {}".format(CONF["base_path"]))

First off, it's always good practice to add a main function to each file so you can test the classes or functions in it. Along with this, determine the directory the script lives in. This becomes incredibly important when running Python from a cron job or from any directory other than the script's own. No JSON files or environment variables are then needed, and it works the same across Mac, RHEL and Debian distributions.
This is how you do it, and it will also work on Windows if you use '\' instead of '/' (if that is even necessary in your case).
import os
import sys
if "__main__" == __name__:
    # directory that contains the script, not the shell's current directory
    workingDirectory = os.path.dirname(os.path.realpath(sys.argv[0]))
As you can see, the directory is computed from the script's path whether you invoke it with a full path or a relative one, so it works in a cron job automatically.
After that, if you want to work with data stored under that directory, use:
fileName = os.path.join( workingDirectory, './sub-folder-of-current-directory/filename.csv' )
fp = open( fileName,'r')
or, for a folder one level up (parallel to your project directory):
fileName = os.path.join( workingDirectory, '../folder-at-same-level-as-my-project/filename.csv' )
fp = open( fileName,'r')

I believe there are many ways around this, but here is what I would do:
Create a JSON config file with all the paths I need defined.
For even more portability, I'd have a default path where I look for this config file but also have a command line input to change it.
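The two steps above could look roughly like this. It is a sketch under assumptions: the default file name conf.json, the --config option name, and the load_config helper are all made up for illustration.

```python
import argparse
import json

DEFAULT_CONFIG = "conf.json"  # hypothetical default location of the config file

def load_config(argv=None):
    """Load path settings from a JSON file; --config overrides the default."""
    parser = argparse.ArgumentParser(description="Load path settings from JSON.")
    parser.add_argument("--config", default=DEFAULT_CONFIG,
                        help="path to the JSON file holding the data paths")
    args = parser.parse_args(argv)
    with open(args.config) as json_conf:
        return json.load(json_conf)
```

Calling load_config() with no argument reads the options from the real command line, while passing a list such as ["--config", "other.json"] makes the function easy to test.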

In my opinion, passing arguments on the command line would be the best solution. You should take a look at argparse. It gives you a nice way to handle command-line arguments. For example:
myDataScript.py /home/userName/datasource1/
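A minimal sketch of what myDataScript.py might contain to accept that directory. The argument name data_dir and the helper parse_args are illustrative choices, not part of the original answer.

```python
import argparse

def parse_args(argv=None):
    """Parse the single positional argument: the data directory."""
    parser = argparse.ArgumentParser(description="Process a data directory.")
    parser.add_argument("data_dir", help="directory that holds the input data")
    return parser.parse_args(argv)
```

In the script itself you would call parse_args() with no argument and read the directory from the data_dir attribute of the result. argparse prints a usage message and exits if the argument is missing.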

%USERPROFILE% env variable for python

I am writing a script in Python 2.7.
It needs to be able to go to the current user's profile directory in Windows, whoever that user is.
This is the variable and function I currently have:
import os
desired_paths = os.path.expanduser('HOME'\"My Documents")
I doubt that this expanduser call will work, though. I tried looking for Windows environment variables in Python, hoping to find a list and learn what to convert it to. Either such a tool doesn't exist, or I am just not using the right search terms, since I am still pretty new and learning.

You can access environment variables via the os.environ mapping:
import os
print(os.environ['USERPROFILE'])
This will work in Windows. For another OS, you'd need the appropriate environment variable.
Also, the way to concatenate strings in Python is with + signs, so this:
os.path.expanduser('HOME'\"My Documents")
^^^^^^^^^^^^^^^^^^^^^
should probably be something else. But to concatenate paths you should be more careful, and probably want to use something like:
os.sep.join(<your path parts>)
# or
os.path.join(<your path parts>)
(There is a slight distinction between the two)
If you want the My Documents directory of the current user, you might try something like:
docs = os.path.join(os.environ['USERPROFILE'], "My Documents")
Alternatively, using expanduser:
docs = os.path.expanduser(os.sep.join(["~","My Documents"]))
Lastly, to see what environment variables are set, you can do something like:
print(os.environ.keys())
(In reference to finding a list of what environment vars are set)
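Putting the two approaches together, here is a hedged sketch that prefers USERPROFILE and falls back to expanduser("~") when the variable is not set. The function name documents_dir and the environ parameter are illustrative, and the folder name "My Documents" is taken from the question; the actual folder name may differ between Windows versions.

```python
import os

def documents_dir(environ=os.environ, folder="My Documents"):
    """Return the given folder inside the current user's profile directory."""
    # Prefer USERPROFILE (set on Windows); otherwise let Python expand "~".
    profile = environ.get("USERPROFILE") or os.path.expanduser("~")
    return os.path.join(profile, folder)
```

Passing the environment as a parameter keeps the function easy to exercise on any OS.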

Going by the documentation for os.path.expanduser, using ~ would seem more reliable than using 'HOME'.

Does os.walk take advantage of the file type returned by the OS for efficiency?

The os.walk function returns separate lists for directories and files. The underlying OS calls on many common operating systems such as Windows and Linux return a file type or flag specifying whether each directory entry is a file or a directory; without this flag it's necessary to query the OS again for each returned filename. Does the code for os.walk make use of this information or does it throw it away as os.listdir does?

Nope, it does not.
Under the hood, os.walk() uses os.listdir() and os.path.isdir() to list files and directories separately. See the source code of walk().
Specifically:
    try:
        # Note that listdir and error are globals in this module due
        # to earlier import-*.
        names = listdir(top)
    except error, err:
        if onerror is not None:
            onerror(err)
        return
    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
where listdir and isdir are module globals for the os.listdir() and os.path.isdir() functions. It calls itself recursively for subdirs.

As Martijn Pieters's answer explains, os.walk just uses os.listdir and os.path.isdir.
There's been some discussion on this a few times on the mailing lists, but no concrete suggestion for the stdlib has ever come out of it. There are various edge cases that make this less trivial than it seems. Also, if Python 3.4 or later grows a new path module, there's a good chance os.walk will just be replaced/deprecated rather than improved in place.
However, there are a number of third-party modules that you can use.
The simplest is probably Ben Hoyt's betterwalk. I believe he intends to get this on PyPI, and maybe even submit it for Python 3.4 or later, but at present you have to install it from GitHub. betterwalk provides an os.listdir replacement called iterdir_stat, and a 90%-complete os.walk replacement built on top of it. On most POSIX systems, and Win32, it can usually avoid unnecessary stat calls. (There are some cases where it can't do as good a job as fts(3)/nftw(3)/find(1), but at worst it just does some unnecessary calls, rather than failing. The parts that may not be complete, last I checked, are dealing with symlinks, and maybe error handling.)
There's also a nice wrapper around fts for POSIX systems, which is obviously ideal as far as performance goes on modern POSIX systems—but it has a different (better, in my opinion, but still different) interface, and doesn't support Windows or other platforms (or even older POSIX systems).
There are also about 30-odd "everything under the sun to do with paths" modules on PyPI and elsewhere, some of which have new walk-like functions.
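For readers on Python 3.5 or later, the standard library has since gained os.scandir(), which returns directory entries carrying the file type reported by the OS, so is_dir() usually needs no extra system call. The following is a minimal sketch of the same dirs/nondirs split shown in the walk() source above. The function name split_entries is hypothetical, and the with form of scandir needs Python 3.6 or later.

```python
import os

def split_entries(top):
    """List one directory, separating subdirectories from everything else."""
    dirs, nondirs = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() can usually answer from the cached directory entry
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                nondirs.append(entry.name)
    return dirs, nondirs
```

Like os.path.isdir(), entry.is_dir() follows symbolic links by default, so a link to a directory lands in dirs.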

Find a path in Windows relative to another

This problem should be a no-brainer, but I haven't yet been able to nail it.
I need a function that takes two parameters, each a file path, relative or absolute, and returns a filepath which is the first path (target) resolved relative to the second path (start). The resolved path may be relative to the current directory or may be absolute (I don't care).
Here is an attempted implementation, complete with several doctests that exercise some sample use cases (and demonstrate where it fails). A runnable script is also available in my source code repository, but it may change. The runnable script will run the doctests if no parameters are supplied, or will pass one or two parameters to findpath if supplied.
def findpath(target, start=os.path.curdir):
    r"""
    Find a path from start to target where target is relative to start.

    >>> orig_wd = os.getcwd()
    >>> os.chdir('c:\\windows') # so we know what the working directory is
    >>> findpath('d:\\')
    'd:\\'
    >>> findpath('d:\\', 'c:\\windows')
    'd:\\'
    >>> findpath('\\bar', 'd:\\')
    'd:\\bar'
    >>> findpath('\\bar', 'd:\\foo') # fails with '\\bar'
    'd:\\bar'
    >>> findpath('bar', 'd:\\foo')
    'd:\\foo\\bar'
    >>> findpath('bar\\baz', 'd:\\foo')
    'd:\\foo\\bar\\baz'
    >>> findpath('\\baz', 'd:\\foo\\bar') # fails with '\\baz'
    'd:\\baz'

    Since we're on the C drive, findpath may be allowed to return
    relative paths for targets on the same drive. I use abspath to
    confirm that the ultimate target is what we expect.

    >>> os.path.abspath(findpath('\\bar'))
    'c:\\bar'
    >>> os.path.abspath(findpath('bar'))
    'c:\\windows\\bar'
    >>> findpath('..', 'd:\\foo\\bar')
    'd:\\foo'
    >>> findpath('..\\bar', 'd:\\foo')
    'd:\\bar'

    The parent of the root directory is the root directory.

    >>> findpath('..', 'd:\\')
    'd:\\'

    restore the original working directory

    >>> os.chdir(orig_wd)
    """
    return os.path.normpath(os.path.join(start, target))
As you can see from the comments in the doctest, this implementation fails when the start specifies a drive letter and the target is relative to the root of the drive.
This brings up a few questions:
Is this behavior a limitation of os.path.join? In other words, should os.path.join('d:\foo', '\bar') resolve to 'd:\bar'? As a Windows user, I tend to think so, but I hate to think that a mature function like path.join would need alteration to handle this use case.
Is there an example of an existing target path resolver such as findpath that will work in all of these test cases?
If 'no' to the above questions, how would you implement this desired behavior?

I agree with you: this seems like a deficiency in os.path.join. Looks like you have to deal with the drives separately. This code passes all your tests:
def findpath(target, start=os.path.curdir):
    sdrive, start = os.path.splitdrive(start)
    tdrive, target = os.path.splitdrive(target)
    rdrive = tdrive or sdrive
    return os.path.normpath(os.path.join(rdrive, os.path.join(start, target)))
(and yes, I had to nest two os.path.join's to get it to work...)
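To try the function above on a non-Windows machine, one option is to write it against ntpath, the standard library module that implements os.path on Windows. This is a sketch of the same logic, with ntpath swapped in as an assumption for portability. It leaves out the doctests that change the working directory.

```python
import ntpath

def findpath(target, start=ntpath.curdir):
    """Resolve target relative to start using Windows path rules."""
    # Handle the drive letters separately from the rest of each path.
    sdrive, start = ntpath.splitdrive(start)
    tdrive, target = ntpath.splitdrive(target)
    # A drive on the target wins; otherwise inherit the drive of start.
    rdrive = tdrive or sdrive
    return ntpath.normpath(ntpath.join(rdrive, ntpath.join(start, target)))
```

With this version, findpath('\\bar', 'd:\\foo') yields 'd:\\bar', the case marked as failing in the question.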
