find file with variable name in python script - python

I am trying to execute the find command in a Python script, using a for loop to pass a variable index that determines the specific file name to find. I am using the following syntax, which in Python returns an empty result but works in the terminal:
for j in [1, 2, 3, 5]:
    file_name = cmd.getoutput('find . -type f -name "*${j}-xyz.stc" -printf "%f\n"')
Obviously, the variable is not being passed to the find expression in my python code, but how can I remedy that? Any suggestions are appreciated.

Variables aren't expanded in Python the same way they are in bash. You probably want:
command = r'find . -type f -name "*{0}-xyz.stc" -printf "%f\n"'.format(j)
file_name = cmd.getoutput(command)
Also note that the commands module is deprecated in favor of subprocess. Finally, it should probably be pointed out that you could write this function in Python without relying on find at all if you used os.walk in conjunction with glob.glob.
Untested, but something like this should be close:
import os
import glob

def find_files(glob_expr):
    # walk the tree rooted at the current directory, globbing in each subdirectory
    for root, _, _ in os.walk(os.curdir):
        for fname in glob.iglob(os.path.join(root, glob_expr)):
            yield fname

for i in (1, 2, 3, 4):
    print(list(find_files('*{0}-xyz.stc'.format(i))))

file_name = cmd.getoutput('find . -type f -name "*%i-xyz.stc" -printf "%%f\n"' % (j))

Passing filenames in a string to the shell is unsafe (leads to potentially security-impacting bugs). Best practice is to pass an explicit argv list:
import subprocess

for j in range(1, 6):
    file_name = subprocess.check_output(['find', '.', '-type', 'f', '-name',
                                         '*%s-xyz.stc' % (j,),
                                         '-printf', '%f\\n'])
If you really care about correctness (and you should!), use '%f\\0' as your format string, and expect your outputs to be NUL-separated. Otherwise, you can't tell the difference between a file with a newline in its name and two files returned.
To appreciate the importance, consider the case where an attacker can persuade software running on your system to create a file named like so:
/your/top/dir/$'\n'/etc/passwd$'\n'/1-xyz.stc
If you treat each line returned by find as a filename, you would consider /etc/passwd to be part of your returned values -- a very bad thing if you then present this data to the user, delete it, etc.
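As a rough sketch of how that NUL-separated output could be consumed (Python 3 assumed; the loop values just mirror the question):
import subprocess

for j in (1, 2, 3, 5):
    # ask find for NUL-terminated names so embedded newlines can't split a name
    out = subprocess.check_output(['find', '.', '-type', 'f', '-name',
                                   '*%s-xyz.stc' % (j,),
                                   '-printf', '%f\\0'])
    # split on the NUL byte; empty trailing pieces are dropped
    names = [n.decode() for n in out.split(b'\0') if n]
    print(names)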

Related

Need to pass argument from sys.argv[1] as input to glob() in python

I have a situation where I need to take an argument from the command line and use that string (expression) to print files matching that pattern.
I want to use glob to expand my string, since I can pass a wildcard pattern as a filter.
Excerpt from the Python file:
dated = sys.argv[1]
files = glob.glob(dated)
This gives me an empty list:
> python analysis.py <some_expression>
[]
However, if I give any value manually:
dated = '*.xlsx' # example sake
files = glob.glob(dated)
print(files)
it prints:
[<list of files conforming to the required filter>]
It's obvious that the hard-coded pattern works, but I want the CLI argument to behave the same way.
I checked manually that the argument is actually being passed, and it is, so sys.argv[1] is working, but the result is not being expanded by glob.glob().
Any ideas what I am missing?
The issue here is not in Python, but in the shell that invokes it. Most shells I know (definitely all Linux shells) perform glob expansion before passing arguments to the executable they spawn (your Python script, in this case). This means that, at most, sys.argv[1] would contain the first file matching the glob expression you pass, and applying glob to it would not do any good.
For example, if your working directory has files a.xlsx, b.xlsx and c.xlsx, and you invoke your code using:
python mycode.py *.xlsx
Then the shell will actually glob the argument you specified, and will pass the results to your script, making the following true:
sys.argv[1:] == ['a.xlsx', 'b.xlsx', 'c.xlsx']
In fact, instead of explicitly invoking glob, you can simply iterate on sys.argv[1:].
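If you do want Python itself to do the expansion, quote the pattern on the command line so the shell passes it through untouched, e.g. python analysis.py '*.xlsx'. A minimal sketch that copes with both cases (a quoted pattern, or a list the shell already expanded):
import glob
import sys

files = []
for arg in sys.argv[1:]:
    # a quoted pattern is expanded here; an already-expanded filename matches itself
    files.extend(glob.glob(arg))
print(files)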

Python, searching scripts I have written

I have written a number of scripts, and they cover different topics: some about document handling, some about text extraction, some about automation.
Sometimes I forget how I did something, for example how to create a .xls file, so I want to search the scripts for a line that shows how to do it.
What I do now is convert all the .py files into .txt, combine all the .txt files together, and then open this aggregated .txt file in Word and use Find.
What's a better way to search for specific lines in my own code?
Converting .py to .txt:
folder = "C:\\Python27\\"
for a in os.listdir(folder):
root, ext = os.path.splitext(a)
if ext == ".py":
os.rename(folder + a, folder + root + ".txt")
Putting all the .txt files together:
base_folder = "C:\\TXTs\\"
all_files = []
for each in os.listdir(base_folder):
    if each.endswith('.txt'):
        kk = os.path.join(base_folder, each)
        all_files.append(kk)
with open(base_folder + " final.txt", 'w') as outfile:
    for fname in all_files:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
Keep all of your code in a single directory tree, e.g. code.
If your operating system doesn't have a decent search tool like grep, install "the silver searcher". (You will also need a decent terminal emulator.)
For example (I'm using FreeBSD here), I tend to keep all my source code under src/. If I want to know in which scripts I use xls files, I type:
ag xls src/
which returns:
src/progs/uren/project-uren.py
24: print("Usage: {} urenbriefjesNNNN.xlsm project".format(binary),
This tells me to look at line 24 of the file src/progs/uren/project-uren.py.
If I e.g. search for Counter:
ag Counter src/
I get multiple hits:
src/scripts/public/csv2tbl.py
13:from collections import Counter
45: letters, sep = Counter(), Counter()
src/scripts/public/dvd2webm.py
15:from collections import Counter
94: rv = Counter(re.findall('crop=(\d+:\d+:\d+:\d+)', proc.stderr))
src/scripts/public/scripts-tests.py
14:from collections import Counter
26: v = Counter(rndcaps(100000)).values()
You can install RStudio, an open-source IDE for the R language. In the Edit menu there is a Find in Files... feature you can use just like find-and-replace in a word document. It will go through the files in the directory you point it to... I have not yet had problems searching the scripts as they are, without converting them to .txt. It will search for terms or regex expressions... it is pretty useful!
As is R!
cat *.py | grep xls if you're on Linux.
Otherwise it may be helpful to keep some sort of README file with your python scripts. I, personally, prefer Markdown:
## Scripts
They do stuff
### Script A
Does stuff A, call `script_a.py -h` for more info
### Script B
Does stuff B, call `script_b.py -h` for more info
It compiles to this:
Scripts
They do stuff
Script A
Does stuff A, call script_a.py -h for more info
Script B
Does stuff B, call script_b.py -h for more info
It takes basically no time to write and Markdown can be easily used on sites such as SO, Github, Reddit and others. This very answer, in fact, is written in Markdown. But if you can't be bothered with Markdown, a simple README.txt is still much better than nothing.
The technical term for what you're trying to do is a "full text file search". Googling this together with your operating system name will give you many methods. Here is one for Windows: https://www.howtogeek.com/99406/how-to-search-for-text-inside-of-any-file-using-windows-search/.
If you're on MacOS I recommend looking into BASH command line syntax to do a bit more complex automation tasks (although what you need is also perfectly covered in Spotlight search). On Windows 10 you could check out the new Linux Subsystem that gives you the same syntax [1]. Composing small commands together using pipes and xargs in command line is a very powerful automation tool. For what you're asking I still think a full text search is the most straightforward solution, but since you're already into programming I thought I bring this up.
To demonstrate, the task you describe would be something like
find . -name "*.py" | xargs -I {} grep -H "xls" {}
This would search your working directory (and all subdirectories) for python files (using . as its first argument to find, which refers to the directory you're currently in, shown by pwd), and then search each of those python files for the string "xls". xargs takes all lines from standard input (which the pipe | gets from the last command) and converts them into command line parameters. grep -H searches files for the specified string and prints the occurrences together with the file name.
[1] I'm assuming you're not on Linux already since you like to use MS Office.
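If you would rather stay inside Python, a rough equivalent of that find/xargs/grep pipeline (Python 3; the top directory and search string are just placeholders) could look like this:
import os

def search_scripts(top, needle):
    # walk the tree and print every line of every .py file containing the needle
    for root, _, files in os.walk(top):
        for name in files:
            if not name.endswith('.py'):
                continue
            path = os.path.join(root, name)
            with open(path, errors='ignore') as fh:
                for lineno, line in enumerate(fh, 1):
                    if needle in line:
                        print('%s:%d: %s' % (path, lineno, line.rstrip()))

search_scripts('.', 'xls')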

How to escape a space in a path name with subprocess?

I'm trying to convert a file from .m4a to .mp3 using ffmpeg, and I need to access the music folder.
The path name of this folder is: C:\\Users\A B\Desktop\Music
I can't access it with subprocess.call() because only C:\\Users\A gets recognized. The white space is not handled.
Here's my Python script:
import constants
import os
import subprocess

path = 'C:\\Users\A B\Desktop\Music'

def main():
    files = sorted(os.listdir(path), key=lambda x: os.path.getctime(os.path.join(path, x)))
    if "Thumbs.db" in files: files.remove("Thumbs.db")
    for f in files:
        if f.lower()[-3:] == "m4a":
            process(f)

def process(f):
    inFile = f
    outFile = f[:-3] + "mp3"
    subprocess.call('ffmpeg -i {} {} {}'.format('C:\\Users\A B\Desktop\Music', inFile, outFile))

main()
When I run it I get an error that states :
C:\Users\A: No such file or directory
I wonder if someone knows how to put my full path name (C:\Users\A B\Desktop\Music) in subprocess.call()?
Beforehand edit: spaces or not, the following command line -i <directory> <infilename> <outfilename> is not correct for ffmpeg since it expects the -i option, then input file and output file, not a directory first. So you have more than one problem here (which explains the "permission denied" message you had, because ffmpeg was trying to open a directory as a file!)
I suppose that you want to:
read all the files from the directory
convert each of them to a file located in the same directory
In that case, you could add quotes to your both input & output absolute files like this:
subprocess.call('ffmpeg -i "{0}\{1}" "{0}\{2}"'.format('C:\\Users\A B\Desktop\Music', inFile, outFile))
That would work, but it's not the best thing to do: it's not very performant, you're using format when you already have all the arguments, you may not know about other characters that need escaping, etc. Don't reinvent the wheel.
The best way to do it is to pass the arguments in a list so subprocess module handles the quoting/escaping when necessary:
path = r'C:\Users\A B\Desktop\Music' # use raw prefix to avoid backslash escaping
subprocess.call(['ffmpeg','-i',os.path.join(path,inFile), os.path.join(path,outFile)])
Aside: if you're the user in question, it's even better to do:
path = os.path.join(os.getenv("USERPROFILE"), 'Desktop', 'Music')
and you could even run the process in the path directory with cwd option:
subprocess.call(['ffmpeg','-i',inFile, outFile],cwd=path)
and if you're not, be sure to run the script with elevated privileges or you won't get access to another user's directory (read-protected).
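Putting those pieces together, the process() function from the question might end up looking something like this (a sketch, not tested against ffmpeg):
import os
import subprocess

path = r'C:\Users\A B\Desktop\Music'

def process(f):
    inFile = os.path.join(path, f)
    outFile = os.path.join(path, f[:-3] + "mp3")
    # pass an argument list so subprocess handles the space in the path; no manual quoting needed
    subprocess.call(['ffmpeg', '-i', inFile, outFile])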

Trouble extracting zip in python over ftp

I'm trying to unzip a file from an FTP site. I've tried it using 7z in a subprocess as well as using 7z in the older os.system style. I get closest, however, when I'm using the zipfile module in Python, so I've decided to stick with that. No matter how I edit this I seem to get one of two errors, so here are both of them so y'all can see where I'm banging my head against the wall:
z = zipfile.ZipFile(r"\\svr-dc\ftp site\%s\daily\data1.zip" % item)
z.extractall()
NotImplementedError: compression type 6 (implode)
(I think this one is totally wrong, but figured I'd include.)
I seem to get the closest with the following:
z = zipfile.ZipFile(r"\\svr-dc\ftp site\%s\daily\data1.zip" % item)
z.extractall(r"\\svr-dc\ftp site\%s\daily\data1.zip" % item)
IOError: [Errno 2] No such file or directory: '\\\\svr-dc...'
The catch with this is that it is actually giving me the first file name in the zip. I can see the file AJ07242013.PRN at the end of the error so I feel closer because it's at least getting to the point of reading the contents of the zip file.
Pretty much any iteration of this that I try gets me one of those two errors, or a syntax error but that's easily addressed and not my primary concern.
Sorry for being so long winded. I'd love to get this working, so let me know what you think I need to do.
EDIT:
So 7z has finally been added to the path and runs through without any errors with both subprocess and os.system. However, I still can't seem to get anything to unpack. It looks to me, from all I've read in the Python documentation, that I should be using subprocess.communicate() to extract this file, but it just won't unpack. When I use os.system it keeps telling me that it cannot find the archive.
import subprocess
cmd = ['7z', 'e']
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
sp.communicate('r"\C:\Users\boster\Desktop\Data1.zip"')
I don't think that sp.communicate is right but if I add anything else to it I have too many arguments.
Python's zipfile doesn't support compression type 6 (imploded), so it's simply not going to work. In the first case, that's obvious from the error. In the second case, things are worse: the parameter to extractall is an alternate extraction directory. Since you gave it the name of your zip file, a directory of that name can't be found and zipfile gives up before even getting to the not-supported problem.
Make sure you can do this with 7z on the command line, try implementing subprocess again and ask for help on that technique if you need it.
Here's a script that will look for 7z in the usual places:
import os
import sys
import subprocess
from glob import glob

print 'python version:', sys.version
subprocess.call('ver', shell=True)
print

if os.path.exists(r'C:\Program Files\7-Zip'):
    print 'have standard 7z install'
    if '7-zip' in os.environ['PATH'].lower():
        print '...and its in the path'
    else:
        print '...but its not in the path'
print

print 'find in path...'
found = 0
for p in os.environ['PATH'].split(os.path.pathsep):
    candidate = os.path.join(p, '7z.*')
    for fn in glob(candidate):
        print ' found', fn
        found += 1
print

if found:
    print '7z located, attempt run'
    subprocess.call(['7z'])
else:
    print '7z not found'
According to the ZipFile documentation, you might be better off copying the zip first to your working directory. (http://docs.python.org/2/library/zipfile#zipfile.ZipFile.extract)
If you have problems copying, you might want to store the zip in a path with no spaces or protect your code against spaces by using os.path.
I made a small test in which I used os.path.abspath to make sure I had the proper path to my zip and it worked properly.
Also make sure that the path you pass to extractall is the path where the zip content should be extracted. (If the specified folder does not exist, it will be created automatically.) Your files will be extracted into your current working directory (CWD) if no parameter is passed to extractall.
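As a minimal sketch of that destination parameter (the paths are examples only, and this still assumes a compression type zipfile supports):
import zipfile

with zipfile.ZipFile(r'C:\temp\data1.zip') as z:
    # extract every member into an explicit folder instead of the CWD
    z.extractall(r'C:\temp\extracted')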
Cheers!
Managed to get this to work without using the PIPE functionality as subprocess.communicate wouldn't unpack the files. Here was the solution using subprocess.call. Hope this can help someone in the future.
def extract_data_one():
    for item in sites:
        os.chdir(r"\\svr-dc\ftp site\%s\Daily" % item)
        subprocess.call(['7z', 'e', 'data1.zip', '*.*'])

python behaviour when passed an asterisk

I'm writing a small script that should be able to handle multiple files. So I've added that files can be passed comma-separated, and I do an arg.split(',') and then handle each one.
Now I want to add the asterisk as an input possibility, like:
python myPythonScript.py -i folder/*
If I print the argument to option -i right when I first access it, I get:
folder/firstFileInFolder.txt
But if I call my script with
python myPythonScript.py -i someFolder/someFile,folder/*
it works just fine. Does anyone have an idea why Python might behave that way?
Try running this script:
import sys

for arg in sys.argv:
    print arg
python script.py *
Your shell expands the asterisk before Python sees it.
As mentioned in the comments, your shell is expanding the asterisk for the non-comma separated case. If you know that the user may specify an asterisk as part of a file name as in your second example, you can have Python do the path expansion by using the glob module.
from glob import glob
glob('*')
Code which would allow either the shell or Python to do asterisk expansion may look something like this:
import glob
import sys

file_list = []
for pattern in sys.argv[1:]:
    file_list.extend(glob.glob(pattern))
In your case, using a comma as a separator would then prevent you from using a comma as part of a filename.
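If you want to keep the comma-separated interface anyway, a rough sketch (using argparse; the -i option just mirrors the question) that splits on commas and lets Python expand any pattern the shell left alone:
import argparse
import glob

parser = argparse.ArgumentParser()
parser.add_argument('-i', dest='inputs', required=True)
args = parser.parse_args()

file_list = []
for chunk in args.inputs.split(','):
    # each chunk may be a literal file name or an unexpanded pattern
    file_list.extend(glob.glob(chunk) or [chunk])
print(file_list)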
