I'm learning how to use argparse and it's a labyrinth for me.
I have code that works: if I run python Test.py . it prints all the files in the hierarchy, using this code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import argparse
import sys
import glob
#parser = argparse.ArgumentParser()
#parser.add_argument('-input', dest='input',help="input one or more files",nargs='+',metavar=None
#args = parser.parse_args()
def dirlist(path, c = 1):
    for i in glob.glob(os.path.join(path, "*")):
        if os.path.isfile(i):
            filepath, filename = os.path.split(i)
            print ('----' *c + filename)
        elif os.path.isdir(i):
            dirname = os.path.basename(i)
            print ('----' *c + dirname)
            c+=1
            dirlist(i,c)
            c-=1
#path = os.path.normpath(args.input)
path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)
But since I want to understand how argparse works, I want to run the code using python Test.py -input .
But nothing works.
I know I'm close, I have written a sort of Frankenstein code which is commented.
Where am I wrong? I feel I'm so close to the solution...
Thank you #match for the right tips.
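The problem was that I was using nargs='+' in the add_argument call. With nargs='+', args.input is a list, while os.path.normpath expects a single path string. Without nargs it is a plain string: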
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import argparse
import sys
import glob
parser = argparse.ArgumentParser()
parser.add_argument('-input', dest='input',help="input one or more files",metavar=None)
args = parser.parse_args()
def dirlist(path, c = 1):
    for i in glob.glob(os.path.join(path, "*")):
        if os.path.isfile(i):
            filepath, filename = os.path.split(i)
            print ('----' *c + filename)
        elif os.path.isdir(i):
            dirname = os.path.basename(i)
            print ('----' *c + dirname)
            c+=1
            dirlist(i,c)
            c-=1
path = os.path.normpath(args.input)
print(os.path.basename(path))
dirlist(path)
The code works now!
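If several paths should be accepted after all, nargs='+' can stay as long as each list entry is handled separately. The following is only a sketch under that assumption; build_parser and normalized_inputs are hypothetical helper names, and required=True is added so that args.input is never None.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # nargs='+' collects one or more values into a list
    parser.add_argument('-input', dest='input', nargs='+', required=True,
                        help="input one or more files or directories")
    return parser


def normalized_inputs(argv):
    """Parse argv and return every -input value as a normalized path."""
    args = build_parser().parse_args(argv)
    # args.input is a list here, so normalize each entry on its own
    return [os.path.normpath(p) for p in args.input]


# The explicit list stands in for: python Test.py -input . some/dir/
print(normalized_inputs(['-input', '.', 'some/dir/']))
```

In the real script, parse_args() without arguments would read sys.argv instead, and dirlist could be called once per returned path.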
I have a huge directory of pdf files that I need to parse into xml files. Those xml files would then need to be converted into an xlsx (using a pandas df). I have written the code for the latter and it is working, but I am stuck on figuring out this for-loop.
Here is the loop:
import io
from xml.etree import ElementTree
from pprint import pprint
import os
from os.path import isfile, join
import pandas as pd
from os import listdir
directory = '/jupyter/pdf_script_test/pdf_files_test'
i = 1
for filename in os.listdir(directory):
    print(filename)
    if filename.endswith('.pdf'):
        pathname = os.path.join(directory, filename)
        # attempt to assign variable name
        filename = 'new_output%s' %i
        # note the spaces around the path: without them the command is malformed
        os.system('dumppdf.py -a ' + pathname + ' > new_output.xml')
        i = i + 1
    else:
        print('invalid pdf file')
I can see pretty quickly that each time the loop iterates, it overwrites "new_output.xml", so the output for the previous pdf file is lost. I was trying to find a way to assign a different file name on each pass, or maybe create a nested loop that would fix the problem. My biggest hang-up is how to incorporate dumppdf.py into this loop.
Maybe a nested loop that looks something like this:
# code from above here...
data = os.system('dumppdf.py -a' + pathname) # etc..
with open('data' + str(i) + '.xml', 'w') as outfile:
    f.write()
Here is how I ended up solving my problem:
import io
from xml.etree import ElementTree
from pprint import pprint
import os
from os.path import isfile, join
import pandas as pd
from os import listdir
directory = '/jupyter/pdf_script_test/pdf_files_test/'
patient_number = 1 #variable name
#loop over pdf files and write a new .xml for each pdf file.
for filename in os.listdir(directory):
    print(filename)
    if filename.endswith(".pdf"):
        pathname = os.path.join(directory, filename)
        #Run dumppdf.py on the pdf file and write the .xml using the assigned variable
        data = os.system('dumppdf.py -a ' + pathname + ' > ' + str(patient_number) + '.xml')
        patient_number = patient_number + 1
    else:
        print("invalid pdf file")
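Note that os.system returns the command's exit status, not its output, so data above holds a number rather than the XML. An alternative that avoids the shell is subprocess.run with stdout pointed at the output file. This is only a sketch: dump_all is a hypothetical helper, and naming each .xml after its pdf (instead of a counter) is my assumption, not part of the original solution.

```python
import subprocess
from pathlib import Path


def dump_all(directory, command):
    """Run `command <pdf>` for each PDF in `directory`, one output file per PDF.

    `command` is a list such as ['dumppdf.py', '-a']; the PDF path is appended.
    Returns the list of output files that were written.
    """
    written = []
    for pdf in sorted(Path(directory).glob('*.pdf')):
        out_path = pdf.with_suffix('.xml')  # report.pdf -> report.xml
        with open(out_path, 'w') as outfile:
            # stdout goes straight to the file; no shell is involved,
            # so spaces in file names need no quoting
            subprocess.run(command + [str(pdf)], stdout=outfile, check=True)
        written.append(out_path)
    return written
```

It would be called as dump_all('/jupyter/pdf_script_test/pdf_files_test', ['dumppdf.py', '-a']), assuming dumppdf.py is executable and on the PATH as in the code above. With check=True, a failing command raises CalledProcessError instead of silently leaving an empty file.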
This is an easy question.
I'm using glob to print the full hierarchy of a folder using this code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import argparse
import sys
import glob
parser = argparse.ArgumentParser()
parser.add_argument('-input', dest='input',help="input one or more files",metavar=None)
args = parser.parse_args()
def dirlist(path):
    for i in glob.glob(os.path.join(path, "*")):
        if os.path.isfile(i):
            print (i)
        elif os.path.isdir(i):
            dirname = os.path.basename(i)
            dirlist(i)
path = os.path.normpath(args.input)
dirlist(path)
It works pretty well; you just need to run python3 Test.py -input .
As you can see from the argparse help description, I would like to accept directories but also single files.
I don't think this can be done using glob, is that right?
Can you suggest a library that could help me print the full hierarchy for both directories and files?
I found a long list of glob examples, but they all seem to work for directories, not when you pass a single file.
It is nearly midnight, but at least I found the solution after 2 days:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import argparse
import sys
import glob
parser = argparse.ArgumentParser()
parser.add_argument('-input', dest='input',help="input one or more files",metavar=None)
args = parser.parse_args()
def dirlist(path):
    if os.path.isfile(path):
        print (path)
    elif os.path.isdir(path):
        for i in glob.glob(os.path.join(path, "*")):
            if os.path.isfile(i):
                print (i)
            elif os.path.isdir(i):
                dirname = os.path.basename(i)
                dirlist(i)
path = os.path.normpath(args.input)
dirlist(path)
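For comparison, the standard library's os.walk already does the recursion, so only the single-file case needs special handling. A minimal sketch, with list_files as a hypothetical name; unlike glob's "*" pattern, os.walk also reports hidden files (names starting with a dot).

```python
import os


def list_files(path):
    """Return every file at or below `path`, which may be a file or a directory."""
    if os.path.isfile(path):
        return [path]
    found = []
    # os.walk visits the tree top-down and yields the files of each directory
    for root, dirs, files in os.walk(path):
        for name in sorted(files):
            found.append(os.path.join(root, name))
    return found
```

In the script above, the last line would become a loop such as: for entry in list_files(path): print(entry).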
I am trying to rename files whose names contain question marks, removing the question marks, in Python.
Would anyone have some guidance or experience to share?
My files are named like this:
??test.txt
?test21.txt
test??1.txt
The result that I am trying to get is:
test.txt
test21.txt
test1.txt
Thank you for your help.
AL
Below is the code I attempted, based on the suggestion below:
#!/usr/bin/python
import sys, os, glob
for iFiles in glob.glob('*.txt'):
    print (iFiles)
    os.rename(iFiles, iFiles.replace("?",''))
This should do what you require.
import os
import sys
import argparse
parser = argparse.ArgumentParser(description='Rename files by replacing all instances of a character from filename')
parser.add_argument('--dir', help='Target DIR containing files to rename', required=True)
parser.add_argument('--value', help='Value to search for and replace', required=True)
args = vars(parser.parse_args())
def rename(target_dir, rep_value):
    try:
        for root, dirs, files in os.walk(target_dir):
            for filename in files:
                if rep_value in filename:
                    filename_new = str(filename).replace(rep_value, '')
                    os.rename(os.path.join(root, filename), os.path.join(root, filename_new))
                    print('{} renamed to {}'.format(filename, os.path.join(root, filename_new)))
    except Exception as e:
        print(e)
target_dir = args['dir']
rep_value = args['value']
rename(target_dir, rep_value)
Example usage:
rename.py --dir /root/Python/ --value '?'
Output
?test.txt renamed to /root/Python/test.txt
?test21.txt renamed to /root/Python/test21.txt
test1?.txt renamed to /root/Python/test1.txt
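One thing to watch for: two names can collapse to the same result (?test.txt and ??test.txt both become test.txt), and on POSIX systems os.rename silently replaces an existing file. A sketch of a variant that skips such clashes instead; strip_char is a hypothetical name, and it only looks at a single directory rather than walking the tree.

```python
import os


def strip_char(directory, char):
    """Remove `char` from entry names in `directory`, skipping any name clash.

    Returns a list of (old_name, new_name) pairs that were actually renamed.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        new_name = name.replace(char, '')
        if new_name == name or not new_name:
            continue  # nothing to strip, or the name would become empty
        target = os.path.join(directory, new_name)
        if os.path.exists(target):
            continue  # another entry already has this name; leave it alone
        os.rename(os.path.join(directory, name), target)
        renamed.append((name, new_name))
    return renamed
```

For the files in the question it would be called as strip_char('/root/Python/', '?').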
I'm trying to delete some archives in a folder.
Here is what I've written to do that:
import sys
import os
from os import listdir
from os.path import join
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in dir_path:
    if (file.endswith(".gz")) or (file.endswith(".bz2")):
        os.remove(join((dir_path), file))
        print("Removed file.")
print("Done.")
When I run the module, it just prints "Done." but deletes no files, even though there are files with that extension in the same directory as the module.
Can't figure out what I'm doing wrong, help?
It looks like you missed os.listdir(dir_path) in the for-loop.
This seems to have worked:
import sys
import os
from os import listdir
from os.path import join
dirdir = "/Users/kosay.jabre/Desktop/Programming/Password List"
dir_path = os.listdir(dirdir)
for file in dir_path:
    if (file.endswith(".gz")) or (file.endswith(".bz2")):
        # os.listdir returns bare names, so join them with the directory
        os.remove(join(dirdir, file))
print("Done.")
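The original loop iterated over dir_path itself, which is a string, so file was one character at a time and never matched an extension. With pathlib the directory entries already carry their full path, which avoids both that mistake and the manual join. A minimal sketch; remove_archives is a hypothetical name.

```python
from pathlib import Path


def remove_archives(directory, suffixes=('.gz', '.bz2')):
    """Delete files in `directory` whose names end with one of `suffixes`."""
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        # entry is a full path object, so no os.path.join is needed
        if entry.is_file() and entry.name.endswith(suffixes):
            entry.unlink()
            removed.append(entry.name)
    return removed
```

It returns the names it deleted, so the caller can print them or check that anything matched at all.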
Can anyone guide me on how to get the file path when a file is passed as a command-line argument, and extract the file as well? I also need to check whether the file exists in a particular directory.
python.py /home/abhishek/test.txt
Get the file path and check that test.txt exists in the abhishek folder.
I know it may be very easy, but I am a bit new to Python.
import os
import sys
fn = sys.argv[1]
if os.path.exists(fn):
    print(os.path.basename(fn))
    # file exists
Starting with Python 3.4, you can use argparse together with pathlib:
import argparse
from pathlib import Path
parser = argparse.ArgumentParser()
parser.add_argument("file_path", type=Path)
p = parser.parse_args()
print(p.file_path, type(p.file_path), p.file_path.exists())
I think the most elegant way is to use ArgumentParser. This way you even get the -h option, which helps the user figure out how to pass the arguments. I have also included an optional argument (--outputDirectory).
Now you can simply execute with python3 test.py /home/test.txt --outputDirectory /home/testDir/
import argparse
import sys
import os
def create_arg_parser():
    # Creates and returns the ArgumentParser object
    parser = argparse.ArgumentParser(description='Description of your app.')
    parser.add_argument('inputDirectory',
                        help='Path to the input directory.')
    parser.add_argument('--outputDirectory',
                        help='Path to the output that contains the resumes.')
    return parser
if __name__ == "__main__":
    arg_parser = create_arg_parser()
    parsed_args = arg_parser.parse_args(sys.argv[1:])
    if os.path.exists(parsed_args.inputDirectory):
        print("File exist")
Use this:
import sys
import os
path = sys.argv[1]
# Check if path exists
if os.path.exists(path):
    print("File exists")
    # Get filename
    print("filename : " + path.split("/")[-1])
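The existence check can also be moved into argparse itself by passing a function as the type, so a missing file produces the usual usage error. This is a sketch combining the argparse and pathlib ideas from the answers above; existing_file and parse are hypothetical names.

```python
import argparse
from pathlib import Path


def existing_file(value):
    """argparse type: convert to Path and reject paths that are not files."""
    path = Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"{value!r} is not an existing file")
    return path


def parse(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("file_path", type=existing_file)
    return parser.parse_args(argv)
```

In a script this would be used as args = parse(sys.argv[1:]); args.file_path.name then gives the file name (test.txt) and args.file_path.parent the containing folder (/home/abhishek).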