Handling quotes and spaces in filenames - python

I want to create a Python (3) script that passes files to a Linux shell program. Straightforward enough to do, but I'm not sure how to pass filenames that could contain single- or double-quotes and spaces to the shell. I would presumably need to delimit filenames in case they contain spaces.
I might consider a command string something like f"wc -c '{filename}'", but that would break down if I encounter a filename containing a single quote. Likewise if I delimit with double-quotes and encounter a file containing those.
As something like Bob's "special" file would be a valid ext4 filename, how do I cope with all the possibilities?

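As Tim Roberts mentioned in the comments, you can use the subprocess module to sidestep this problem: when the command is given as a list, each element is passed to the program as a separate argument and no shell parses it, so no quoting is needed. Here is a short example (assuming you have a list of filenames) that passes each filename to wc -c: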
from subprocess import run
# assuming you have got a list of filenames
filenames = ['test.py', "Bob's special file", 'test space.py']
for filename in filenames:
    run(['wc', '-c', filename])
By the way, if you want to use Python to get all filenames under one specific directory,
you might consider os.listdir.
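If a single command string really does have to go through a shell (for example with shell=True), the standard library's shlex.quote can escape each filename for a POSIX shell. A minimal sketch, assuming the command is wc -c as in the question; build_command is a hypothetical helper name, not something from the answer above:

```python
import shlex

def build_command(filename):
    # Hypothetical helper: shlex.quote escapes the name for POSIX shells only.
    return f"wc -c {shlex.quote(filename)}"

filenames = ['test.py', 'Bob\'s "special" file', 'test space.py']
commands = [build_command(name) for name in filenames]
for command in commands:
    # Each string could be passed to subprocess.run(command, shell=True).
    print(command)
```

The list form of subprocess.run shown above is still the simpler choice whenever no shell features (pipes, redirection, globbing) are needed.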

Related

How do I pass user-input filenames to ImageMagick safely?

I am generating an ImageMagick bash command using Python. Something like
import subprocess
input_file = "hello.png"
output_file = "world.jpg"
subprocess.run(["convert", input_file, output_file])
where there might be more arguments before input_file or output_file. My question is, if either of the filenames is user provided and the user provides a filename that can be parsed as a command line option for ImageMagick, isn't that unsafe?
If the filename starts with a dash, ImageMagick could indeed treat it as an option instead of a filename. Most programs (including, AFAIK, the ImageMagick command line tools) follow the convention that a double dash (--) marks the end of the options. If you do a
subprocess.run(["convert", "--", input_file, output_file])
you should be safe in this respect.
From the man page (and a few tests), convert requires an input file and an output file. If you only allow two tokens and if a file name is interpreted as an option then convert is going to miss at least one of the files, so you'll get an ugly message but you should be fine.
Otherwise you can prefix any file name that starts with - with ./ (except - itself, which is stdin or stdout depending on position), so that it becomes an unambiguous file path to the same file.
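The ./ prefix rule from the last paragraph could be sketched as follows. The helper name as_unambiguous_path is hypothetical, and the sketch only covers relative names that start with a dash:

```python
def as_unambiguous_path(filename):
    # Hypothetical helper: "-" on its own is left untouched because it
    # conventionally means stdin or stdout.
    if filename.startswith("-") and filename != "-":
        return "./" + filename
    return filename

for name in ["hello.png", "-resize", "-"]:
    print(repr(name), "->", repr(as_unambiguous_path(name)))
```

The result can then be placed in the argument list passed to subprocess.run in place of the raw user input.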

Ghostscript destination name with blank space returns error [duplicate]

I have a main script that sources a properties file containing variables that point to paths.
The properties file looks like this:
TMP_PATH=/$COMPANY/someProject/tmp
OUTPUT_PATH=/$COMPANY/someProject/output
SOME_PATH=/$COMPANY/someProject/some path
The problem is SOME_PATH, I must use a path with spaces (I can't change it).
I tried escaping the whitespace, with quotes, but no solution so far.
I edited the paths, the problem with single quotes is I'm using another variable $COMPANY in the path
Use one of these three variants:
SOME_PATH="/mnt/someProject/some path"
SOME_PATH='/mnt/someProject/some path'
SOME_PATH=/mnt/someProject/some\ path
I see, Federico, that you've found a solution yourself.
The problem was in two places. Assignments need proper quoting; in your case
SOME_PATH="/$COMPANY/someProject/some path"
is one of possible solutions.
But in the shell those quotes are not stored in memory along with the value,
so when you want to use this variable, you need to quote it again, for example:
NEW_VAR="$SOME_PATH"
because without the quotes the space ends up at the command level, as if you had written:
NEW_VAR=/YourCompany/someProject/some path
which is not what you want.
For more info you can check out my article about it http://www.cofoh.com/white-shell
You can escape the "space" char by putting a \ right before it.
SOME_PATH=/mnt/someProject/some\ path
should work
If the file contains only parameter assignments, you can use the following loop in place of sourcing it:
# Instead of source file.txt
while IFS="=" read name value; do
    declare "$name=$value"
done < file.txt
This saves you having to quote anything in the file, and is also more secure, as you don't risk executing arbitrary code from file.txt.
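When the consumer of the file is a Python script rather than a shell script, the same idea (read the assignments as data, never execute them) might look like the sketch below. parse_assignments and the sample text are hypothetical, and references such as $COMPANY are left unexpanded here:

```python
def parse_assignments(text):
    # Hypothetical helper: split each line at the first "=" and keep the
    # rest verbatim, spaces included. Nothing in the file is executed.
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            continue  # skip blank lines and lines without an assignment
        values[name.strip()] = value
    return values

sample = "TMP_PATH=/mnt/someProject/tmp\nSOME_PATH=/mnt/someProject/some path\n"
settings = parse_assignments(sample)
print(settings["SOME_PATH"])
```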
If the path in Ubuntu is "/home/ec2-user/Name of Directory", then do this:
1) Java's build.properties file:
build_path='/home/ec2-user/Name\\ of\\ Directory'
Where ~/ is equal to /home/ec2-user
2) Jenkinsfile:
build_path=buildprops['build_path']
echo "Build path= ${build_path}"
sh "cd ${build_path}"

Python string concatenation and equivalent of bash parameter expansion

I'm kind of new to python, but something I find myself doing in bash a lot is prepending and appending strings to filenames with parameter expansion.
e.g.
for file in *.txt ; do mkdir ${file%.*} ; mv $file ${file%.*}/ ; done
Would be an example for stripping off the extension of a load of files, making directories based on those names, and then moving the files inside their namesake folders now.
If I want to achieve a similar thing, such as rename the output of a function based on the input file name (below is an example of a Biopython function), I've seen a few ways to do it with string concatenation etc, but without bracketing and so on, it looks confusing and like it might create parsing errors with spaces, quotes and so on being all over the place potentially.
SeqIO.convert(genbank, 'genbank', genbank[:-3]+'tmp', 'fasta')
There are other threads on here about using rsplit, string concatenation and so on, but is one of these more 'correct' than another?
String concatenation is really nice and works great in simple commands like print(), but when adding to commands that are expecting separated values, it strikes me as a little messy?
You can use os.path.splitext, which is built especially for file names:
>>> import os
>>>
>>> fname = '/tmp/foo/bar.baz'
>>> sp = os.path.splitext(fname)
>>> sp
('/tmp/foo/bar', '.baz')
Extracting the name of the file without extension:
>>> os.path.basename(sp[0])
'bar'
And formatting a new file name:
>>> "{}.txt".format(os.path.basename(sp[0]))
'bar.txt'
In general, when manipulating file names and paths I try to just use os.path, since it already handles edge cases, can normalize paths from things like /..//./././, etc.
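On Python 3, pathlib offers a close counterpart to the ${file%.*} expansion from the question. A minimal sketch using PurePosixPath so that it behaves the same on any platform; for real files one would likely use Path together with its mkdir and rename methods:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/tmp/foo/bar.baz")

stem = path.stem                    # 'bar', like ${file%.*} on the bare name
renamed = path.with_suffix(".txt")  # /tmp/foo/bar.txt
target_dir = path.parent / stem     # /tmp/foo/bar, the namesake folder
moved = target_dir / path.name      # /tmp/foo/bar/bar.baz

print(stem, renamed, target_dir, moved)
```

Because the parts are joined as path objects rather than by string concatenation, spaces and quotes in the names need no special treatment.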

sys.argv arguments with spaces

I'm trying to input folder names as sys.argv arguments, but am having problem with folder names that have spaces, which become multiple variables.
For example, from the command line below, "Folder Name" becomes two variables.
Program.py D:\Users\Erick\Desktop\Folder Name
Any solutions?
Space is the delimiter for command line arguments. You'll be better off not using spaces in your directory and file names if possible. To enter an argument that has a space in it, you'll have to enclose it in quotes, as in "folder with space".
Program.py "D:\Users\Erick\Desktop\Folder Name"
Assuming input is always going to be a single file/folder path:
path = " ".join(sys.argv[1:])
To extend the simplicity of Arshiyan's answer for a case involving multiple paths, you could join the paths with a delimiter such as a hash, and then split the resulting string when it gets to python...
paths = " ".join(sys.argv[1:]).split("#")
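When the caller quotes each path, every path arrives as a single argv element, so no joining or splitting is needed; joining with " " would also collapse any run of several spaces inside a name into one. A sketch using argparse with a simulated argument list standing in for sys.argv[1:] (the second path is made up for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("paths", nargs="+", help="one or more folder paths")

# Simulated arguments, as the shell would deliver them for:
#   Program.py "D:\Users\Erick\Desktop\Folder Name" "D:\Other Folder"
simulated_argv = [r"D:\Users\Erick\Desktop\Folder Name", r"D:\Other Folder"]

args = parser.parse_args(simulated_argv)
for path in args.paths:
    print(path)
```

In a real script, calling parser.parse_args() with no argument reads sys.argv[1:] instead.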

Why is glob ignoring some directories?

I'm trying to find all *.txt files in a directory with glob(). In some cases, glob.glob('some\path\*.txt') gives an empty list, even though files exist in the given directories. This happens especially if the path is all lower-case or numeric.
As a minimal example I have two folders a and A on my C: drive both holding one Test.txt file.
import glob
files1 = glob.glob('C:\a\*.txt')
files2 = glob.glob('C:\A\*.txt')
yields
files1 = []
files2 = ['C:\\A\\Test.txt']
If this is by design, is there any other directory name, that leads to such unexpected behaviour?
(I'm working on win 7, with Python 2.7.10 (32bit))
EDIT: (2019) Added an answer for Python 3 using pathlib.
The problem is that \a has a special meaning in string literals (bell char).
Just double backslashes when inserting paths in string literals (i.e. use "C:\\a\\*.txt").
Python is different from C because when you use backslash with a character that doesn't have a special meaning (e.g. "\s") Python keeps both the backslash and the letter (in C instead you would get just the "s").
This sometimes hides the issue because things just work anyway even with a single backslash (depending on what is the first letter of the directory name) ...
I personally avoid using double-backslashes in Windows and just use Python's handy raw-string format. Just change your code to the following and you won't have to escape the backslashes:
import glob
files1 = glob.glob(r'C:\a\*.txt')
files2 = glob.glob(r'C:\A\*.txt')
Notice the r at the beginning of the string.
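The difference between the three spellings can be checked directly, without touching the file system. A small sketch:

```python
plain = "C:\a"     # \a is interpreted as the bell character (0x07)
doubled = "C:\\a"  # escaped backslash followed by the letter a
raw = r"C:\a"      # raw string: the backslash is kept as typed

print(len(plain), len(doubled), len(raw))  # 3 4 4
print(plain == "C:\x07")                   # True
print(doubled == raw)                      # True
```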
As already mentioned, the \a is a special character in Python. Here's a link to a list of Python's string literals:
https://docs.python.org/2/reference/lexical_analysis.html#string-literals
As my original answer attracted more views than expected and some time has passed, I wanted to add an answer that reliably solves this kind of problem and is also cross-platform compatible. It was written with Python 3 on Windows 10, but should also work on *nix systems.
from pathlib import Path
filepath = Path(r'C:\a')
filelist = list(filepath.glob('*.txt'))
--> [WindowsPath('C:/a/Test.txt')]
I like this solution better, as I can copy and paste paths directly from windows explorer, without the need to add or double backslashes etc.
