Clean handling of file path with spaces in Makefile - python

I have a script that pulls in data from a server drive where I do not have the ability to alter the directory names and they all have spaces in them. I am using a Makefile to run the script (in Windows) and it is presenting a problem.
My initial workaround is having a python script run before make is called to copy the data from the server into my local folder, and it looks like this:
# grab_data.py
import argparse
import shutil
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("-o", "--output", help="output filepath")
args = parser.parse_args()
output_path = Path(args.output)
src = 'S:/Server Path/To Data I Need/File I Need.xlsx'
dst = output_path
shutil.copyfile(src, dst)
And I run my Makefile like this:
.PHONY : runall
runall : data/file_i_need.xlsx final_output.csv
	python grab_data.py -o data/file_i_need.xlsx
final_output.csv : data/file_i_need.xlsx processing_script.py
	python processing_script.py -i $< -o $@
I want to find some way to include the file 'S:/Server Path/To Data I Need/File I Need.xlsx' directly in the Makefile but cannot figure out what will work. Is there some other workaround that would allow me to do this?

Not sure what version of make you're using, but with gmake (which likely is one of the options around Windows) escaping spaces would work:
x\ yz: ab\ c
	cp "$<" "$@"
Then you get:
$ make
cp "ab c" "x yz"
As pointed out in the comments, I'm spelling this out rather than skipping it as too obvious. I've also double-quoted the automatic variables used in the recipe so that each path is passed through as a single string. If a rule has multiple prerequisites and only one of them contains spaces, the same approach works: make it the first prerequisite so that you can refer to it as "$<". If several prerequisites contain spaces and you need to refer to all of them, I fear you may be out of luck: even expanding $^ with $(foreach ...) won't help.
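If the escaped path has to be produced by a script (for example, to generate a Makefile fragment), the escaping rule above can be sketched in Python. This is only an illustration: escape_for_make is a hypothetical helper, and it handles spaces only, not other characters that are special to make.

```python
def escape_for_make(path):
    """Backslash-escape spaces so GNU make treats the path as a single word."""
    return path.replace(" ", "\\ ")


# Path taken from the question; the result is what would go in the rule.
print(escape_for_make("S:/Server Path/To Data I Need/File I Need.xlsx"))
```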

Related

"Sort by random" in OSX Finder setting kMDItemFinderComment to a random hash for every file in a folder?

Finder allows you to sort files by many different attributes.
In the OSX filesystem, there is an attribute on every file called "comments" (com.apple.metadata:kMDItemFinderComment) which allows you to add any arbitrary string data as metadata for that file.
Finder exposes this "comment" attribute in the GUI and you can "sort" by it. I thought I could abuse this attribute by filling each file's "comments" with random data and then sorting by those random comments.
tldr; I'm trying to create "sort by random" functionality (in Finder) with the help of a BASH script and some python.
this does work to achieve that (sort of):
find "$1" -type f -print0 | while IFS= read -r -d $'\0' file; #get a list of files in the dir
do
    if [[ $file == *.wav ]]
    then
        hash=$(openssl rand -hex 12); #generate a random hash
        osxmetadata --set findercomment "$hash" "$file"; #set the comment
    fi
done
here i'm using the osxmetadata python utility to do the heavy lifting.
it works as intended, but it's really slow:
https://i.stack.imgur.com/d7exk.gif
i'm trying to do this operation on folders with many items, and would frequently be "re-seeding" the files with random comments.
can anyone suggest an optimization i can try to make this faster? i tried using xattrs but that doesn't seem to reindex the comments in finder when they update.
I'd wrap the then-clause in a (...)& and add a wait after the loop. Then it will do every file in parallel.
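The same idea (run the per-file commands concurrently instead of one after another) could also be sketched in Python. Assumptions: the osxmetadata command line is copied from the question, secrets.token_hex(12) stands in for openssl rand -hex 12, the function names and the worker count of 8 are made up for illustration, and the run parameter exists only so the command runner can be swapped out.

```python
import secrets
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def set_random_comment(path, run=subprocess.run):
    """Set a random 24-character hex string as the Finder comment of one file."""
    token = secrets.token_hex(12)
    # Arguments are passed as a list, so spaces in the path need no quoting.
    run(["osxmetadata", "--set", "findercomment", token, str(path)], check=True)
    return token


def reseed(folder, suffix=".wav", max_workers=8, run=subprocess.run):
    """Re-seed every file under `folder` whose name ends in `suffix`."""
    files = [p for p in Path(folder).rglob("*") if p.is_file() and p.suffix == suffix]
    # Threads are enough here: each worker mostly waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tokens = list(pool.map(lambda p: set_random_comment(p, run), files))
    return dict(zip(files, tokens))
```

Calling reseed("/path/to/folder") would then play the role of the loop in the question. Capping the number of workers avoids starting one process per file at the same moment, which an unbounded (...)& loop would do.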

parse_args all .png files from a parser argument

I would like to get an args.pics which returns something like ['pic1.png', 'pic2.png', 'pic3.png'] (to arbitrarily parse all files of .png format) after running the following (test.py):
import argparse
import os
def parser_arg():
    par = argparse.ArgumentParser()
    parser = par.add_argument_group('pictures')
    parser.add_argument("-p", "--pics", nargs="+", help="picture files", required=True)
    arguments = par.parse_args()
    return arguments
args = parser_arg()
And upon running the script via command line, and inputting
python test.py -p ../User/Desktop/Data/*.png
then args.pics returns ['../User/Desktop/Data/*.png'] instead..
Am I using the right approach? I heard using *.png will be expanded into the .png files once inputted but it doesn't seem to be the case on my end.
Edits: I'm using Anaconda Prompt on Windows 10 if it helps.
There are a couple of things that could be going on. One possibility is that ../User/Desktop/Data/*.png does not match any files, so does not get expanded. This would happen on a UNIX-like shell only (or PowerShell I suppose). The other possibility is that you are using cmd.exe on Windows, which simply does not do wildcard expansion at all. Given that you are using Anaconda prompt on Windows, I would lean towards the latter possibility as the explanation.
Since you are looking for a list of all the PNGs in a folder, you don't need to rely on the shell at all. There are lots of ways of doing the same thing in Python, with and without integrating in argparse.
Let's start by implementing the listing functionality. Given a directory, here are some ways to get a list of all the PNGs in it:
Use glob.glob (recommended option). This can either recurse into subdirectories or not, depending on what you want:
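import glob
import os
mydir = '../User/Desktop/Data/'
pngs = glob.glob(os.path.join(mydir, '*.png'))
To recurse into subfolders, put a '**' component in the pattern (for example os.path.join(mydir, '**', '*.png')) and add the recursive=True keyword-only option.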
Use os.walk. This is much more flexible (and therefore requires more work), but also lets you have recursive or non-recursive solutions:
mydir = '../User/Desktop/Data/'
pngs = []
for path, dirs, files in os.walk(mydir):
    pngs.extend(f for f in files if f.lower().endswith('.png'))
    del dirs[:]
To enable recursion, just delete the line del dirs[:], which suppresses subdirectory search.
A related method that is always non-recursive is to use os.listdir, which is Python's rough equivalent to the ls or dir commands:
mydir = '../User/Desktop/Data/'
pngs = [f for f in os.listdir(mydir) if f.lower().endswith('.png')]
This version does not check if something is actually a file. It assumes you don't have folder names ending in .png.
Let's say you've picked one of these methods and created a function that accepts a folder and returns a file list:
def list_pngs(directory):
    return glob.glob(os.path.join(directory, '*.png'))
Now that you know how to list files in a folder, you can easily plug this into argparse at any level. Here are a couple of examples:
Just get all your directories from the argument and list them out afterwards:
parser.add_argument("-p", "--pics", action='store', help="picture files", required=True)
Once you've processed the arguments:
print(list_pngs(args.pics))
Integrate directly into argparse with the type argument:
parser.add_argument("-p", "--pics", action='store', type=list_pngs, help="picture files", required=True)
Now you can use the argument directly, since it will be converted into a list directly:
print(args.pics)
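Put together, the type= variant might look like the following self-contained sketch. The build_parser name and the sorted() call are additions for illustration; the argument list is passed explicitly so the example runs without depending on the shell.

```python
import argparse
import glob
import os


def list_pngs(directory):
    """Return the .png files directly inside `directory`, sorted by name."""
    return sorted(glob.glob(os.path.join(directory, "*.png")))


def build_parser():
    parser = argparse.ArgumentParser()
    # argparse calls list_pngs on the raw string, so args.pics is already a list.
    parser.add_argument("-p", "--pics", type=list_pngs, required=True,
                        help="folder containing the .png files")
    return parser


# Equivalent to running: python test.py -p .
args = build_parser().parse_args(["-p", "."])
print(args.pics)
```

Note that with this design the user passes a folder, not a wildcard pattern, so the result no longer depends on which shell launched the script.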
Your approach is correct. However, your script will only receive the expanded list of files as parameters if your shell supports globbing and the pattern actually matches any files. Otherwise, it will be the pattern itself in most cases.
The Anaconda Command Prompt uses cmd.exe by default, which doesn't support wildcard expansion. You could use PowerShell instead, which does understand wildcards. Alternatively, you can do the expansion in your application as described in Mad Physicist's answer.
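If you want to keep the original nargs="+" interface, expanding the patterns inside the application could look roughly like this. expand_patterns is a hypothetical helper; keeping a pattern that matches nothing mirrors what most UNIX-like shells do, and is a design choice rather than a requirement.

```python
import argparse
import glob


def expand_patterns(patterns):
    """Expand wildcard patterns that the shell passed through unexpanded."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # Keep the argument unchanged when nothing matches it.
        files.extend(matches if matches else [pattern])
    return files


parser = argparse.ArgumentParser()
parser.add_argument("-p", "--pics", nargs="+", required=True, help="picture files")

# Simulates cmd.exe handing over the literal pattern from the question.
args = parser.parse_args(["-p", "../User/Desktop/Data/*.png"])
print(expand_patterns(args.pics))
```

Arguments that a shell has already expanded pass through unchanged as long as the files exist, so the same script works under cmd.exe, PowerShell, and UNIX-like shells.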

Sphinx apidoc section titles for Python module/package names

When I run sphinx-apidoc and then make html it produces doc pages that have "Subpackages" and "Submodules" sections and "module" and "package" at the end of each module/package name in the table of contents (TOC). How might I prevent these extra titles from being written without editing the Sphinx source?
Here's an example of the doc pages I would like to produce (notice the TOC):
http://selenium.googlecode.com/svn/trunk/docs/api/py/index.html#documentation
I understand it is due to the apidoc.py file in the sphinx source (line 88):
https://bitbucket.org/birkenfeld/sphinx/src/ef3092d458cc00c4b74dd342ea05ba1059a5da70/sphinx/apidoc.py?at=default
I could manually edit each individual .rst file to delete these titles or just remove those lines of code from the script but then I'd have to compile the Sphinx source code. Is there an automatic way of doing this without manually editing the Sphinx source?
I was struggling with this myself when I found this question... The answers given didn't quite do what I wanted so I vowed to come back when I figured it out. :)
In order to remove 'package' and 'module' from the auto-generated headings and have docs that are truly automatic, you need to make changes in several places so bear with me.
First, you need to handle your sphinx-apidoc options. What I use is:
sphinx-apidoc -fMeET ../yourpackage -o api
Assuming you are running this from inside the docs directory, this will source yourpackage for documentation and put the resulting files at docs/api. The options I'm using here will overwrite existing files, put module docs before submodule docs, put documentation for each module on its own page, abstain from creating module/package headings if your docstrings already have them, and it won't create a table of contents file.
That's a lot of options to remember, so I just add this to the end of my Makefile:
buildapi:
	sphinx-apidoc -fMeET ../yourpackage -o api
	@echo "Auto-generation of API documentation finished. " \
	      "The generated files are in 'api/'"
With this in place, you can just run make buildapi to build your docs.
Next, create an api.rst file at the root of your docs with the following contents:
API Documentation
=================

Information on specific functions, classes, and methods.

.. toctree::
   :glob:

   api/*
This will create a table of contents with everything in the api folder.
Unfortunately, sphinx-apidoc will still generate a yourpackage.rst file with an ugly 'yourpackage package' heading, so we need one final piece of configuration. In your conf.py file, find the exclude_patterns option and add this file to the list. It should look something like this:
exclude_patterns = ['_build', 'api/yourpackage.rst']
Now your documentation should look exactly like you designed it in the module docstrings, and you never have to worry about your Sphinx docs and your in-code documentation being out of sync!
It's probably late, but the options maxdepth or titlesonly should do the trick.
More details :
http://sphinx-doc.org/latest/markup/toctree.html
The answer by Jen Garcia helped a lot, but it requires repeating package names in docstrings. I used a Perl one-liner in my Makefile to remove the "module" or "package" suffix:
docs:
	rm -rf docs/api docs/_build
	sphinx-apidoc -MeT -o docs/api wdmapper
	for f in docs/api/*.rst; do\
		perl -pi -e 's/(module|package)$$// if $$. == 1' $$f ;\
	done
	$(MAKE) -C docs html
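For anyone without Perl at hand, the same first-line clean-up could be approximated in Python. This is a sketch, not a drop-in replacement: strip_title_suffix and process_directory are hypothetical names, the docs/api location is taken from the Makefile above, and unlike the one-liner it also shortens the title underline to match.

```python
import re
from pathlib import Path


def strip_title_suffix(text):
    """Drop a trailing ' module' or ' package' from the first line of an .rst file."""
    lines = text.split("\n")
    title = re.sub(r"\s+(module|package)$", "", lines[0])
    if title != lines[0]:
        lines[0] = title
        underline = lines[1] if len(lines) > 1 else ""
        # Shorten the underline only if line 2 is one repeated punctuation character.
        if underline and not underline[0].isalnum() and set(underline) == {underline[0]}:
            lines[1] = underline[0] * len(title)
    return "\n".join(lines)


def process_directory(api_dir):
    """Rewrite every .rst file in `api_dir` in place."""
    for rst in sorted(Path(api_dir).glob("*.rst")):
        cleaned = strip_title_suffix(rst.read_text(encoding="utf-8"))
        rst.write_text(cleaned, encoding="utf-8")
```

In the Makefile, the for loop would then be replaced by a call to a small script that runs process_directory("docs/api").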
I didn't want to use the titles within my docstrings as I was following numpy style guidelines. So I first generate the rst files and then run the following python script as a post-processing step.
from pathlib import Path
src_dir = Path("source/api")
for file in src_dir.iterdir():
    print("Processed RST file:", file)
    with open(file, "r") as f:
        lines = f.read()
    junk_strs = ["Submodules\n----------", "Subpackages\n-----------"]
    for junk in junk_strs:
        lines = lines.replace(junk, "")
    lines = lines.replace(" module\n=", "\n")
    with open(file, "w") as f:
        f.write(lines)
This script is kept in the same directory as the makefile. I also add the following lines to the makefile.
html:
	# rm -r "$(BUILDDIR)"
	python rst_process.py
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Now running make html builds the documentation in the way I want.
I'm not sure I'm 100% answering your question, but I had a similar experience and I realized I was running sphinx-apidoc with the -f flag each time, which created the .rst files fresh each time.
Now I'm allowing sphinx-apidoc to generate the .rst files once, but not overwriting them, so I can modify them to change titles/etc. and then run make html to propagate the changes. If I want to freshly generate .rst files I can just remove the files I want to regenerate or pass the -f flag in.
So you do have to change the rst files but only once.
In newer versions of Apidoc, you can use a custom Jinja template to control the generated output.
The default templates are here: https://github.com/sphinx-doc/sphinx/tree/5.x/sphinx/templates/apidoc
You can make a local copy of each template using the same names (e.g. source/_templates/toc.rst_t) and invoke sphinx-apidoc with the --templatedir option (e.g. sphinx-apidoc --templatedir source/_templates).
Once you are using your own template file, you can customize it however you want. For example, you can remove the ugly "package" and "module" suffix, which is added at this stage.

File deletion using rm command

I want to make sure that I delete required files.
I have code something like
dir="/some/path/"
file = "somefile.txt"
cmd_rm= "rm -rf "+dir + file
os.system(cmd_rm)
The dir and file values are fetched from a database. How can I make sure I never end up running rm -rf /?
What things should I check before doing rm -rf?
Don't use the -r switch if you just want to remove a single file. Also, there could be spaces in the file name.
Better use the functions in Python's os module instead:
import os
dirname = "/some/path/"
filename = "somefile.txt"
pathname = os.path.abspath(os.path.join(dirname, filename))
if pathname.startswith(dirname):
    os.remove(pathname)
Normalizing the path with abspath and comparing it against the target directory avoids file names like "../../../etc/passwd" or similar.
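A slightly stricter variant of this check is sketched below. It assumes you also want to guard against symlinks and against a base directory given without a trailing slash (where a plain startswith would accept a sibling such as /some/pathology); safe_remove is a hypothetical name.

```python
import os


def safe_remove(base_dir, filename):
    """Delete base_dir/filename only if it really lives inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, filename))
    # commonpath compares whole path components, not raw string prefixes.
    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError(f"refusing to delete {target!r}: not inside {base!r}")
    os.remove(target)
    return target
```

Because os.remove refuses to delete directories, an empty or unexpected value coming from the database cannot turn this call into a recursive delete.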
You might consider using os.remove() instead since it's a great deal less dangerous than what you're attempting.
First, I suggest you use the os.remove() and os.rmdir() functions for this kind of thing. You will end up with more portable code and fewer headaches checking the command's return value.
To check what you are actually about to remove (you may not want to check only for "/"), you can run some regular expressions on the generated path, or just prepend a base path to every path returned from your database (depending on what you are doing ...).
Use shutil.rmtree as Dave Kirby says. If you want to delete just the file, use os.remove (shutil.rmtree only accepts directories):
dir = "/some/path/"
file = "somefile.txt"
cmd = os.path.join(dir, file)
os.remove(cmd)
If you want to delete the directory use:
dir = "/some/path/"
file = "somefile.txt"
shutil.rmtree(dir)
If the files are write protected make sure you have write permissions before you run this.
There is a module called shutil that provides shell-like file manipulation. If you want to delete a directory and all files and directories in it then use shutil.rmtree.
However it is implemented in python so if you are deleting a huge number of files then spawning rm may be faster, but will fail if the path has a space in it.
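The problem with spaces comes from building one command string for the shell to split. A minimal sketch of spawning rm without that problem, assuming a POSIX system with rm on the PATH (remove_with_rm is a hypothetical name):

```python
import subprocess


def remove_with_rm(path):
    """Run `rm -rf` on one path without going through a shell."""
    # Each list element reaches rm as exactly one argument, spaces included.
    # "--" ends option parsing, so a path starting with "-" is not read as a flag.
    subprocess.run(["rm", "-rf", "--", path], check=True)
```

This removes the quoting issue only; checking that the path is one you actually intend to delete is still up to the caller.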
Assuming you mentioned rm -rf deliberately and it is exactly the command you need, why not just call it? There is a library called sh that allows closer integration with the shell.
import os
from sh import rm
path_to_delete = '/some/path'
if os.path.exists(path_to_delete):
    rm('-rf', path_to_delete)
PS Make sure you are not root and/or ask for user input to be extra cautious.
And, yes, smoke the man to avoid deleting a single file recursively ;)

Problem with subprocess.call

In my current working directory I have the dir ROOT/ with some files inside.
I know I can exec cp -r ROOT/* /dst and I have no problems.
But if I open my Python console and I write this:
import subprocess
subprocess.call(['cp', '-r', 'ROOT/*', '/dst'])
It doesn't work!
I have this error: cp: cannot stat ROOT/*: No such file or directory
Can you help me?
Just came across this while trying to do something similar.
The * will not be expanded to filenames
Exactly. If you look at the man page of cp you can call it with any number of source arguments and you can easily change the order of the arguments with the -t switch.
import glob
import subprocess
subprocess.call(['cp', '-rt', '/dst'] + glob.glob('ROOT/*'))
Try
subprocess.call('cp -r ROOT/* /dst', shell=True)
Note the use of a single string rather than an array here.
Or build up your own implementation with listdir and copy
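A rough sketch of that listdir-and-copy approach follows. Assumptions: Python 3.8+ for dirs_exist_ok, and copy_contents is a made-up name. Unlike the shell's ROOT/*, os.listdir also returns hidden dot-files.

```python
import os
import shutil


def copy_contents(src_dir, dst_dir):
    """Copy everything inside src_dir into dst_dir, similar to `cp -r src_dir/* dst_dir`."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isdir(src):
            # Recurse into subdirectories, merging with any existing target.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

With this, copy_contents("ROOT", "/dst") replaces the subprocess call entirely and does not depend on cp being available.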
The * will not be expanded to filenames. This is a function of the shell. Here you actually want to copy a file named *. Use subprocess.call() with the parameter shell=True.
Provide the command as a single list instead of a string plus a list.
The following two commands are the same:-
First Command:-
test=subprocess.Popen(['rm','aa','bb'])
Second command:-
list1=['rm','aa','bb']
test=subprocess.Popen(list1)
So to copy multiple files, one needs to get the list of files using glob, then add 'cp' to the front of the list and the destination to the end, and pass the list to subprocess.Popen().
Like:-
list1=glob.glob("*.py")
list1=['cp']+list1+['/home/rahul']
xx=subprocess.Popen(list1)
It will do the work.
