The program txt.py takes a directory as a command line parameter and prints all the files with a particular extension:
import glob
import sys

def printAll(path):
    for txtFile in glob.glob(path):
        print txtFile

printAll(sys.argv[1])
At the command line I type: python txt.py /home/Documents/*.txt
It prints only the first txt file.
How do I print all txt files in that directory given we pass the directory from command line?
At the command line the * is a shell wildcard; your shell expands it and passes the resulting list of files to the script, e.g.
python txt.py "/home/Documents/1.txt" "/home/Documents/2.txt" "/home/Documents/3.txt"
Your script only looks at the first argument, so it prints only one file.
You need to stop the shell from expanding the * so it gets through to Python as a single argument containing the asterisk. In bash, quoting is enough:
python txt.py "/home/Documents/*.txt"
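Alternatively, the script itself can be made robust to both cases by globbing every argument it receives: a plain existing filename globs to itself, while an unexpanded pattern is expanded by Python. A sketch, not the original script:

```python
import glob
import sys

def print_all(patterns):
    # Works whether the shell expanded the pattern or not: a plain
    # filename globs to itself, a quoted pattern is expanded here.
    for pattern in patterns:
        for name in glob.glob(pattern):
            print(name)

if __name__ == '__main__':
    print_all(sys.argv[1:])
```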
Edit: As an alternative you could just take the directory path on the command line, and add the *.txt part in your program, e.g.
import glob
import sys

def printAll(path):
    path = path + "/*.txt"
    for txtFile in glob.glob(path):
        print txtFile

printAll(sys.argv[1])
call it with:
$ python txt.py "/home/Documents"
Edit 2: Or you could pass the directory and the extension as two separate arguments and let the script build the pattern:
import glob
import os
import sys

def printAll(path, fileext):
    query = os.path.join(path, '*.' + fileext)
    for someFile in glob.glob(query):
        print someFile

printAll(sys.argv[1], sys.argv[2])
call it with:
$ python txt.py /home/Documents txt
(os.path.join is a utility that joins path components, adding a / only if it's needed)
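A quick illustration of that behavior on a POSIX system (the paths are just examples):

```python
import os

# The separator is inserted only when the first part doesn't already end with one
print(os.path.join('/home/Documents', '*.txt'))   # /home/Documents/*.txt
print(os.path.join('/home/Documents/', '*.txt'))  # still /home/Documents/*.txt
```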
Or how about passing an example file in the path, and having the script find all files with the same extension as that file? It doesn't even have to exist. That sounds fun:
import glob
import os
import sys

def printAll(path):
    searchdir, filename = os.path.split(path)
    tmp, fileext = os.path.splitext(filename)
    path = os.path.join(searchdir, '*' + fileext)
    for someFile in glob.glob(path):
        print someFile

printAll(sys.argv[1])
call it with:
$ python txt.py "/home/Documents/example.txt"
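To see how the path gets taken apart and reassembled, here is a trace of the same standard library calls on the example path:

```python
import os

path = '/home/Documents/example.txt'
searchdir, filename = os.path.split(path)   # '/home/Documents', 'example.txt'
root, fileext = os.path.splitext(filename)  # 'example', '.txt' (dot included)
pattern = os.path.join(searchdir, '*' + fileext)
print(pattern)  # /home/Documents/*.txt
```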
Thanks to how the shell works, you are actually passing in the entire list of files but only using the first one. The shell expands the * character, so all of the *.txt files are passed to your script as separate arguments.
If you simply do this in your script:
for i in sys.argv[1:]:
    print(i)
You will see the list your program is supposed to print. To avoid this, you have a few options:
Quote your argument "/home/Documents/*.txt"
Pass only the extension part, python txt.py /home/Documents/.txt, and in your script:
def print_all(path):
    path = path[:path.rfind('/')+1] + '*' + path[path.rfind('/')+1:]
    for i in glob.glob(path):
        print(i)
Pass two arguments, the path, and then the extension python txt.py /home/Documents/ .txt and then join them together:
print_all(sys.argv[1]+'*'+sys.argv[2])
Related
I want to run a bash script from a python program. The script has a command like this:
find . -type d -exec bash -c 'cd "$0" && gunzip -c *.gz | cut -f 3 >> ../mydoc.txt' {} \;
Normally I would run a subprocess call like:
subprocess.call('ls | wc -l', shell=True)
But that's not possible here because of the quoting signs. Any suggestions?
Thanks!
While the question is answered already, I'll still jump in, because I assume you want to execute that bash script only because you do not have the functionally equivalent Python code (which is less than 40 lines, basically; see below).
Why do this instead of the bash script?
Your script now is able to run on any OS that has a Python interpreter
The functionality is a lot easier to read and understand
If you need anything special, it is always easier to adapt your own code
More Pythonic :-)
Please bear in mind that this, like your bash script, has no error checking, and the output file is a global variable, but that can be changed easily.
import gzip
import os

# create our output file
outfile = open('/tmp/output.txt', mode='w', encoding='utf-8')

def process_line(line):
    """
    get the third column (delimiter is tab char) and write to output file
    """
    columns = line.split('\t')
    if len(columns) >= 3:
        outfile.write(columns[2] + '\n')

def process_zipfile(filename):
    """
    read zip file content (we assume text) and split into lines for processing
    """
    print('Reading {0} ...'.format(filename))
    with gzip.open(filename, mode='rb') as f:
        lines = f.read().decode('utf-8').split('\n')
        for line in lines:
            process_line(line.strip())

def process_directory(dirtuple):
    """
    loop thru the list of files in that directory and process any .gz file
    """
    print('Processing {0} ...'.format(dirtuple[0]))
    for filename in dirtuple[2]:
        if filename.endswith('.gz'):
            process_zipfile(os.path.join(dirtuple[0], filename))

# walk the directory tree from current directory downward
for dirtuple in os.walk('.'):
    process_directory(dirtuple)

outfile.close()
Escape the ' marks with a \.
i.e. for every ' in the command string, write \' instead.
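Applied to the command from the question, that gives a string like this (shown here only being built and printed, not executed, since running it needs the .gz files in place):

```python
# Each ' inside the command is written as \' so the whole thing fits in a
# single-quoted Python string; the trailing \; for find is written as \\;
cmd = 'find . -type d -exec bash -c \'cd "$0" && gunzip -c *.gz | cut -f 3 >> ../mydoc.txt\' {} \\;'
print(cmd)
# subprocess.call(cmd, shell=True)  # would run it exactly as typed in the shell
```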
Triple quotes or triple double quotes ('''some string''' or """some other string""") are handy as well. See here (yes, it's the Python 3 documentation, but it all works the same in Python 2).
mystring = """how many 'cakes' can you "deliver"?"""
print(mystring)
how many 'cakes' can you "deliver"?
I'm trying to convert a file from .m4a to .mp3 using ffmpeg and I need to access to the music folder.
The path name of this folder is : C:\\Users\A B\Desktop\Music
I can't access it with subprocess.call() because only C:\\Users\A gets recognized. The white space is not processed.
Here's my Python script:
import constants
import os
import subprocess

path = 'C:\\Users\A B\Desktop\Music'

def main():
    files = sorted(os.listdir(path), key=lambda x: os.path.getctime(os.path.join(path, x)))
    if "Thumbs.db" in files: files.remove("Thumbs.db")
    for f in files:
        if f.lower()[-3:] == "m4a":
            process(f)

def process(f):
    inFile = f
    outFile = f[:-3] + "mp3"
    subprocess.call('ffmpeg -i {} {} {}'.format('C:\\Users\A B\Desktop\Music', inFile, outFile))

main()
When I run it I get an error that states :
C:\Users\A: No such file or directory
I wonder if someones knows how to put my full path name (C:\Users\A B\Desktop\Music) in subprocess.call() ?
Beforehand edit: spaces or not, the command line -i <directory> <infilename> <outfilename> is not correct for ffmpeg, since it expects the -i option, then the input file, then the output file, not a directory first. So you have more than one problem here (which also explains the error you got: ffmpeg was trying to open a directory as a file!)
I suppose that you want to:
read all files from directory
convert them all to a file located in the same directory
In that case, you could add quotes to your both input & output absolute files like this:
subprocess.call('ffmpeg -i "{0}\{1}" "{0}\{2}"'.format('C:\\Users\A B\Desktop\Music', inFile, outFile))
That would work, but it's not the best thing to do: it's fragile, it uses format when you already have all the arguments, you may not know every other character that needs escaping, etc. Don't reinvent the wheel.
The best way to do it is to pass the arguments in a list so subprocess module handles the quoting/escaping when necessary:
path = r'C:\Users\A B\Desktop\Music' # use raw prefix to avoid backslash escaping
subprocess.call(['ffmpeg', '-i', os.path.join(path, inFile), os.path.join(path, outFile)])
Aside: if you're the user in question, it's even better to do:
path = os.path.join(os.getenv("USERPROFILE"), 'Desktop', 'Music')
and you could even run the process in the path directory with cwd option:
subprocess.call(['ffmpeg', '-i', inFile, outFile], cwd=path)
and if you're not, be sure to run the script with elevated privileges or you won't get access to the other user's directory (it's read-protected)
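A small self-contained check that list arguments survive spaces intact (it uses the Python interpreter itself as the child process, since ffmpeg may not be installed):

```python
import subprocess
import sys

# A path containing a space is passed as ONE list element; subprocess
# delivers it to the child unchanged, with no quoting needed on our side.
out = subprocess.check_output(
    [sys.executable, '-c', 'import sys; print(sys.argv[1])', 'A B/Desktop/Music'])
print(out.decode().strip())  # A B/Desktop/Music
```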
I have multiple text files in a certain subdirectory. All the text files are the same size, same amount of content, etc.
I do not know how to write a python script that takes an input file and can run from the Terminal. For text files 'file1.txt','file2.txt','file3.txt','file4.txt' in \subdirectory, there should be a way to run the script with
python script.py --inputfile file1.txt, file2.txt, file3.txt, file4.txt
or something like
python script.py (something) \subdirectory
should input all text files into the python script and run. How does one do this?
I usually just go to the local subdirectory and run the file from there, i.e.
import os
path = "/Users/name/desktop"
os.chdir(path)
filename = "file.txt"
f = open(filename, 'r')
output = f.read()
And 'output' will be the text file's contents. I'm not sure how to write this so that it runs from the command line.
quick and dirty:
import sys

files = sys.argv[1:]
for f in files:
    print f  # or read the files or whatever
If you call this program (say, script.py) like so:
python script.py file1.txt file2.txt file3.txt
the output will be
file1.txt
file2.txt
file3.txt
Now, a much nicer way (but with slightly more code) can be achieved with
import argparse
You can read about that module in the documentation.
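A minimal argparse sketch of the `--inputfile file1.txt file2.txt ...` invocation the question describes (the argument list is passed explicitly here for demonstration; a real script would call parse_args() with no arguments to read sys.argv):

```python
import argparse

# A hypothetical script.py accepting one or more files after --inputfile
parser = argparse.ArgumentParser(description='Print the given files')
parser.add_argument('--inputfile', nargs='+', help='one or more input files')

# Parsed from an explicit list here so the example is self-contained
args = parser.parse_args(['--inputfile', 'file1.txt', 'file2.txt'])
for name in args.inputfile:
    print(name)
```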
If you want semantics with some sugar you could use argparse,
but you can also process both files and directories given on the command line
import sys
import os
import glob

def handle_file(filename):
    # whatever you want to do with the file named filename, e.g.
    print(filename)

for name in sys.argv[1:]:
    if name.endswith('.txt'):
        handle_file(name)
    else:
        for filename in glob.glob(os.path.join(name, '*.txt')):
            handle_file(filename)
Given a subdirectory 'files' with files 'foo1.txt' 'foo2.txt', both
python script.py files/foo1.txt files/foo2.txt
and
python script.py files
call the handler for all respective .txt files.
Take a look at the 'argparse' module, the first example in the docs has a solution for your use case: https://docs.python.org/2/library/argparse.html
How to do write a Python script that inputs all files from a certain
subdirectory from command line?
You can use the wildcard * on the command line to get all the files in a directory:
$ python prog.py ./path/to/subdir/*.*
In a python program, sys.argv is a list of all the arguments passed on the command line:
import sys

for fname in sys.argv[1:]:
    with open(fname) as f:
        print(f.read())
I am a novice at Python scripting. I have a script that I am hoping to run on all files in a directory. I found very helpful advice in this thread. However, I am having difficulty determining how to format the script so that it retrieves the filename to run on from the command prompt, i.e. "python script.py filename.*". I've tried my best at looking through the Python documentation and the forums on this site and have come up empty (I probably just don't know what keywords I should be searching for).
I am currently able to run my script on one file at a time, and output it with a new file extension using the following code, but this way I can only do one file at a time. I'd like to be able to iterate over the whole directory using 'GENE.*':
InFileName = 'GENE.303'
InFile = open(InFileName, 'r')  # opens a pipeline to the file to be read line by line
OutFileName = InFileName + '.phy'
OutFile = open(OutFileName, 'w')
What can I do to the code to allow myself to use an iteration through the directory similar to what is done in this case? Thank you!
You are looking for:
import sys
InFileName = sys.argv[1]
See the documentation.
For something more sophisticated, take a look at the optparse and argparse modules (the latter is preferable but is only available in newer versions of Python).
You have quite a few options to process a list of files using Python:
You can use the shell expansion facilities of your command line to pass more filenames to your script and then iterate the command line arguments:
import sys

def process_file(fname):
    with open(fname) as f:
        for line in f:
            # TODO: implement
            print line

for fname in sys.argv[1:]:
    process_file(fname)
and call it like:
python my_script.py * # expands to all files in the directory
You can also use the glob module to do this expansion:
import glob

for fname in glob.glob('*'):
    process_file(fname)
I am running 32-bit Windows 7 and Python 2.7.
I am trying to write a command line Python script that can run from CMD. I am trying to assign a value to sys.argv[1]. The aim of my script is to calculate the MD5 hash value of a file. This file will be inputted when the script is invoked in the command line and so, sys.argv[1] should represent the file to be hashed.
Here's my code below:
import sys
import hashlib

filename = sys.argv[1]

def md5Checksum(filePath):
    fh = open(filePath, 'rb')
    m = hashlib.md5()
    while True:
        data = fh.read(8192)
        if not data:
            break
        m.update(data)
    return m.hexdigest()

# print len(sys.argv)
print 'The MD5 checksum of text.txt is', md5Checksum(filename)
Whenever I run this script, I receive an error:
filename = sys.argv[1]
IndexError: list index out of range
To call my script, I have been writing "script.py test.txt" for example. Both the script and the source file are in the same directory. I have tested len(sys.argv) and it only comes back as containing one value, that being the python script name.
Any suggestions? I can only assume it is how I am invoking the code through CMD
You should check in your registry that the file association is correct, for example:
[HKEY_CLASSES_ROOT\Applications\python.exe\shell\open\command]
#="\"C:\\Python27\\python.exe\" \"%1\" %*"
The problem is in the registry. Calling python script.py test.txt works, but this is not the solution, especially if you decide to add the script to your PATH and want to use it inside other directories as well.
Open RegEdit and navigate to HKEY_CLASSES_ROOT\Applications\python.exe\shell\open\command. Right click on name (Default) and Modify. Enter:
"C:\Python27\python.exe" "%1" %*
Click OK, restart your CMD and try again.
Try running the script using python script.py test.txt; you might have a broken association of the interpreter with the .py extension.
Did you try sys.argv[0]? If len(sys.argv) == 1, then sys.argv[1] would try to access a second, nonexistent item.
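Whatever the cause, the script can guard against a missing argument instead of dying with an IndexError; a sketch, with the check split into a function just so it is visible:

```python
import sys

def get_filename(argv):
    # Fail with a usage message instead of an IndexError when no file is given
    if len(argv) < 2:
        raise SystemExit('usage: {0} FILE'.format(argv[0]))
    return argv[1]

# In the real script: filename = get_filename(sys.argv)
print(get_filename(['script.py', 'test.txt']))  # test.txt
```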