I have a folder that contains .odt files and I would like to convert them to PDF. My current function works, but it converts a file again even if it has already been converted, and I do not want that.
Is there a way to check if the name.odt and name.pdf files already exist?
import sys
import os
import comtypes.client
import glob
def convert():
    for file in glob.glob("*.odt"): # Listing all files
        wdFormatPDF = 17
        in_file = os.path.abspath(file)
        name, ext = os.path.splitext(file)
        suffix = '.pdf'
        os.path.join(name + suffix)
        if not os.path.exists(name): # Pdf file doesn't exist
            out_file = os.path.abspath(name)
            word = comtypes.client.CreateObject('Word.Application')
            doc = word.Documents.Open(in_file)
            doc.SaveAs(out_file, FileFormat=wdFormatPDF)
            print('the file ' + name +' has been converted')
        else :
            print('all the file are converted')
        doc.Close()
        word.Quit()
There are a few things that are not right with your code. Here are the minimal modifications I made to make it work:
import sys
import os
import win32com.client
import glob
def convert():
    word = win32com.client.Dispatch('Word.Application')
    for input_file in glob.glob("*.odt"): # Listing all files
        wdFormatPDF = 17
        in_file = os.path.abspath(input_file)
        name, ext = os.path.splitext(input_file)
        suffix = '.pdf'
        name = name + suffix
        if not os.path.exists(name): # Pdf file doesn't exist
            out_file = os.path.abspath(name)
            doc = word.Documents.Open(in_file)
            doc.SaveAs(out_file, FileFormat=wdFormatPDF)
            print('the file ' + name +' has been converted')
            doc.Close()
        else:
            print('The file ' + name + ' already exists')
    print('all the file are converted')
    word.Quit()
os.chdir(r"C:\Users\evensf\Documents\Question-48733924\Source")
convert()
Here are my comments about my modifications:
For some reason I couldn't figure out, I wasn't able to install the comtypes module, so I used the win32com module that comes with the Python for Win32 (pywin32) extensions. I think it is pretty similar.
I created the Word application object outside of the loop. You don't need to open and close it every time you want to open a document. I couldn't make your code work without doing that, and it should speed up execution.
I changed your variable name from file to input_file because file used to be a built-in name in Python and that could spell disaster, if I remember correctly. I think this isn't as relevant today, but it's always a good habit to use descriptive names for your variables.
Your code seemed to print that all the files are converted whenever it found an already existing PDF file. I couldn't understand why you would want to do that, so I print a message when the PDF file already exists and moved your message outside the loop.
Since you seem to be working with files in the local directory, I added a command to change the working directory.
But we can go further and simplify your code:
import win32com.client
import pathlib
source_directory = pathlib.Path(r"C:\Users\evensf\Documents\Question-48733924\Source")
wdFormatPDF = 17
destination_suffix = '.pdf'
word_application = win32com.client.Dispatch('Word.Application')
for current_file in source_directory.glob("*.odt"): # Listing all files
    absolute_current_file = current_file.resolve()
    destination_name = absolute_current_file.with_suffix(destination_suffix)
    if destination_name.exists():
        print('The file', destination_name, 'already exists. Not converting.')
    else:
        current_document = word_application.Documents.Open(str(absolute_current_file))
        current_document.SaveAs(str(destination_name), FileFormat=wdFormatPDF)
        print('A file has been converted to', destination_name)
        current_document.Close()
print('Finished converting files')
word_application.Quit()
I used the pathlib module, which has a lot of provisions to simplify your code.
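The existence check can also be separated from the Word automation, which makes it easy to try out without Word installed. The following is only a minimal sketch: the helper name files_needing_conversion and its parameters are hypothetical and not part of the answer above. It only decides which .odt files still lack a .pdf with the same name; it does not convert anything.

```python
import pathlib


def files_needing_conversion(directory, source_suffix=".odt", target_suffix=".pdf"):
    """Return the source files in `directory` that have no converted counterpart yet.

    Hypothetical helper for illustration; it does not perform any conversion.
    """
    directory = pathlib.Path(directory)
    pending = []
    for source in sorted(directory.glob("*" + source_suffix)):
        # name.odt counts as converted when name.pdf sits next to it
        if not source.with_suffix(target_suffix).exists():
            pending.append(source)
    return pending
```

The returned paths could then be passed one by one to the Word conversion loop, so that Word is only started when the list is not empty.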
Newbie in Python here. I'm trying to get a list of file names (.wav) from an Excel file, find the files with those names under some directory, and rename those wav files by their index in the list.
Here is the simple version of my code:
import glob
import pandas as pd
import os
path_before = ""
path_after = ""
path_excel = ""
# the excel file with the names ("Title") of the wav files, I want to save the file name + extension
data = pd.read_excel(path_excel+"data.xlsx")
data['Title_temp'] = data['Title']+'.wav'
filelist = data['Title_temp']
# finding the wav files to be renamed
file_names_all = glob.glob(path_before+'\*')
# and get rid of the directory to just keep the names
get_name = []
for file in file_names_all:
    temp_name = file.split('\\')
    get_name.append(temp_name[-1])
# for the wav files to be renamed, go through the filelist from the excel file
# and if the name is on filelist, rename it to the index of filelist,
# so the new name should be a number.wav
# Also, all the file names are in the list and they all should be renamed,
# but I couldn't find a better way to do this, so the code below
for filename in get_name:
    if filename in filelist:
        try:
            os.rename(filename, filelist.index(filename))
        except:
            print ('File ' + filename + ' could not be renamed!')
    else: print ('File ' + filename + ' could not be found!')
I printed the file names for both the directory and the Excel list and they match (with the .wav extension and everything), but when I run the code I always get the message that the file could not be found. Could somebody tell me what's wrong? (The code was written in a Jupyter notebook on Windows.)
Assuming your code is working and you need to reduce its size and complexity, here is another version.
import pandas as pd
import os
path_before = "path to the files to be renamed"
path_excel = "path to the excel file"
data = pd.read_excel('{}/data.xlsx'.format(path_excel))
files_to_rename = os.listdir(path_before)
for index, file_title in enumerate(data['Title']):
    try:
        # as per your code, the excel file has file names without extension, hence adding the .wav extension in the formatted string
        if '{}.wav'.format(file_title) in files_to_rename:
            os.rename(os.path.join(path_before, '{}.wav'.format(file_title)), os.path.join(path_before, '{}.wav'.format(index)))
        else:
            print("File {}.wav not found in {}".format(file_title, path_before))
    except Exception as ex:
        print("Cannot rename file: {}".format(file_title))
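A likely reason for the "could not be found" message in the question is that the in operator on a pandas Series tests the index labels, not the values, so filename in filelist is False even when the name is in the column. The question's code also passes bare file names and an integer to os.rename. The sketch below avoids both problems by working on a plain list of titles (for example data['Title'].tolist()) and on full paths. The function name rename_by_position and its parameters are hypothetical, and pandas is deliberately left out so the example runs with the standard library only.

```python
import os


def rename_by_position(directory, titles, extension=".wav"):
    """Rename '<title><extension>' to '<position><extension>' inside `directory`.

    `titles` is a plain list of names without extension.
    Returns the titles whose files were not found.
    Hypothetical helper for illustration only.
    """
    existing = set(os.listdir(directory))
    missing = []
    for position, title in enumerate(titles):
        old_name = "{}{}".format(title, extension)
        if old_name not in existing:
            missing.append(title)
            continue
        new_name = "{}{}".format(position, extension)
        # Build full paths so the call does not depend on the working directory
        os.rename(os.path.join(directory, old_name), os.path.join(directory, new_name))
    return missing
```

This sketch assumes that none of the original file names already looks like a number plus the extension; if they can, a collision check before os.rename would be needed.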
I have an issue regarding changing a .doc or .docx filename according to a certain text inside the document.
I have been able to get this working with .txt files, using the following code:
import os
import re
pat = "ID number(\\d\\d\\d\\d\\d)" #This is for the text to be found in the file
ext = '.txt' #Type of file the python is searching for
mydir = '' #Path or directory where python is doing its magic
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    if s is None:
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name)
    if not os.path.exists(newpath):
        os.rename(archpath, newpath + ext)
Anyone have any takes on this?
You will need python-docx (pat and mydir are the variables from your code):
import os
import re
from docx import Document
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    document = Document(archpath)
    for para in document.paragraphs:
        s = re.search(pat, para.text)
        if s is None:
            continue
        name = s.group(1)
        newpath = os.path.join(mydir, name)
        if not os.path.exists(newpath):
            os.rename(archpath, newpath + '.docx')
        break  # stop at the first match; the file may already have been renamed
The answer was found. The issue was on my end: I was trying to find a value in the paragraphs, but what I needed was to specify a cell, since the value was in a table.
Here is the result:
import os
import re
import sys
pat = r"(\d+)" #Type of string/value that is being renamed
ext = '.docx' #Type of file the python is searching for
mydir = '' #Path or directory where python is doing its magic
from docx import Document
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    document = Document(archpath)
    table = document.tables[0]
    s = re.search(pat,table.cell(1,2).text)
    if s is None:
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name)
    if not os.path.exists(newpath):
        os.rename(archpath, newpath + ext)
        print (newpath + ext)
input("Press Enter to exit")
Note that this method only works with .docx files, i.e. files from Word 2007 and later, since python-docx does not work with earlier versions or .doc files.
So my next project is to implement a converter from .doc to .docx.
Thank you for everyone's participation.
import os
def rename_files():
    # (1) get file names from a folder
    file_list = os.listdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
    # print(file_list)
    saved_path = os.getcwd()
    print("Current Working Directory is " + saved_path)
    os.chdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
    # (2) for each file, rename file name
    for file_name in file_list:
        print("Old Name - " + file_name)
        print("New Name - " + file_name.translate("0123456789"))
        os.rename(file_name, file_name.translate("0123456789"))
    os.chdir(saved_path)
rename_files()
The code above doesn't rename the files by removing the digits. Can anyone help? (Python 3.x)
import re
new_name = re.sub('[0-9]', '', file_name)
In Python 3 the string.translate function is gone, so you need to use the str.translate method instead. It takes a translation table built by str.maketrans, whose first two arguments define character replacements (not needed in this example) and whose third argument lists the characters to be deleted.
This line should have the desired effect:
os.rename(file_name, file_name.translate(str.maketrans('','','0123456789')))
Previous suggestions used .strip(). However, the numbers here are mixed in with the file names (not only before or after), so I believe it would not work. Another used regular expressions, which is perfectly valid, but within the context of this particular Udacity course translate was the suggested solution.
Here are the docs for maketrans:
https://docs.python.org/3/library/stdtypes.html#str.maketrans
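To make the difference between translate and strip concrete, here is a small sketch that can be run without touching any files. The names DIGIT_REMOVER, remove_digits and strip_digits are made up for this illustration.

```python
# Translation table that replaces nothing and deletes every digit (third argument)
DIGIT_REMOVER = str.maketrans("", "", "0123456789")


def remove_digits(file_name):
    """Return file_name with every digit removed, wherever it appears."""
    return file_name.translate(DIGIT_REMOVER)


def strip_digits(file_name):
    """Return file_name with digits removed from both ends only."""
    return file_name.strip("0123456789")


print(remove_digits("48athens.jpg"))  # athens.jpg
print(remove_digits("a1b2c.jpg"))     # abc.jpg
print(strip_digits("a1b2c.jpg"))      # a1b2c.jpg (digits in the middle are kept)
```

Keep in mind that translate also removes digits from the extension, so a name like song1.mp3 would become song.mp; splitting the name with os.path.splitext first avoids that.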
The problem is that your translate call doesn't do anything. There are better options available, but if you want to use translate, the proper syntax (in Python 2) is:
#!/usr/bin/env python2
import string
new_name = string.translate(file_name, None, "0123456789")
Here is one way of renaming files.
Use os.renames to rename the files.
Use file_name.strip("0123456789") to remove digits from the start and end of the name (digits in the middle are kept).
Code is given below:
import os
def file_rename():
    name_list=os.listdir(r"C:\python\prank")
    print(name_list)
    saved_path=os.getcwd()
    print("Current working directory is"+saved_path)
    os.chdir(r"C:\python\prank")
    for file_name in name_list:
        print("old name"+file_name)
        print("new name"+file_name.strip("0123456789"))
        os.renames(file_name,file_name.strip("0123456789"))
    os.chdir(saved_path)
file_rename()
To read more, see the Python documentation for os.renames and str.strip.
Another way to do this without importing re: instead of .translate, use .strip. Note that strip only removes digits at the start and end of the name.
os.rename(file_name, file_name.strip('0123456789'))
Another observation is that your code won't read the new file names after changing them. At the top of your code you read the file names and save them in file_list:
# (1) get file names from a folder
file_list = os.listdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
In the for loop, where you change the name of each file, you are not reading the new file names. You need to do something like this:
# (2) for each file, rename file name
for file_name in file_list:
    print("Old Name - " + file_name)
    os.rename(file_name, file_name.strip("0123456789"))
# (3) read file's name again... 'file_list' has old names
new_file_list = os.listdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
for file_name in new_file_list:
    print("New file's name: " + file_name)
os.chdir(saved_path)
import os
def rename_files():
    #1 Get file names from the folder
    file = os.listdir(r"C:\Web\Python\prank")
    print(file)
    saved_path = os.getcwd()
    print("Current Working Directory is"+saved_path)
    os.chdir(r"C:\Web\Python\prank")
    #2 For each file name rename file names
    for file_name in file:
        print("Old Name - " + file_name)
        new_name = file_name.strip("0123456789")
        os.rename(file_name, new_name)
        print("New Name - " + new_name)
    os.chdir(saved_path)
rename_files()
import os
dir="/home/lucidvis/myPythonHome/prank/"
def rename_files():
    # get file names from a folder
    filenames = os.listdir(dir)
    # print(filenames)
    for file in filenames:
        #print(file)
        try:
            #with os.open(filename for filename in filenames,"r+"):
            #read_file = filename.read()
            # new_name = file.translate(str.maketrans('','', "0123456789"))
            new_file_name = (file.translate(str.maketrans('','', "0123456789"))).replace(" ","")
            #print(dir+new_file_name)
            os.rename(dir+file,dir+new_file_name)
        except OSError as e:
            print(e)
            continue
# for each file, rename filename
rename_files()
import os
dir="/home/lucidvis/myPythonHome/prank/"
def rename_files():
    # get file names from a folder
    filenames = os.listdir(dir)
    # iterate through the list of filenames
    for file in filenames:
        #print(file)
        try:
            #assign a variable to new names for easy manipulation
            new_file_name = (file.translate(str.maketrans('','', "0123456789"))).replace(" ","")
            #concatenating the directory name to old and new file names
            os.rename(dir+file,dir+new_file_name)
        #just to manage errors
        except OSError as e:
            print(e)
            continue
#file renaming function call
rename_files()
Well guys, I was trying to solve this problem because I am following Udacity's online courses, which require renaming files without numbers. Thanks to Simon's reply, I figured it out.
This is my code to rename files without numbers. I hope it helps anyone who is stuck.
import os
import re
def rename():
    #get the list of the photo name
    plist = os.listdir(r"D:\pay\prank")
    print(plist)
    #removing the numbers from the photo names
    os.chdir(r"D:\pay\prank")
    for pname in plist :
        os.rename(pname, re.sub('[0-9]', '' , pname))
        print(pname)
rename()
import os
import re
from string import digits
#Get file names
file_list = os.listdir(r"C:\Users\703305981\Downloads\prank\prank")
print(file_list)
#Change directory.
os.chdir(r"C:\Users\703305981\Downloads\prank\prank")
print (os.getcwd())
#Change the Name.
for file_name in file_list:
    os.rename(file_name, re.sub(r'[0-9]+', '', file_name))
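One thing none of the snippets above handle is what happens when two files end up with the same name after the digits are removed (for example 1paris.jpg and 2paris.jpg). The following is a rough sketch of one way to guard against that. The function name remove_digits_from_names is hypothetical, and skipping the second file is just one possible policy.

```python
import os
import re


def remove_digits_from_names(directory):
    """Rename every entry in `directory` so its name contains no digits.

    Entries whose new name would be empty or is already taken are left alone.
    Returns a list of (old_name, new_name) pairs that were renamed.
    Hypothetical helper for illustration only.
    """
    renamed = []
    for old_name in sorted(os.listdir(directory)):
        new_name = re.sub(r"[0-9]+", "", old_name)
        if new_name == old_name or not new_name:
            continue  # nothing to change, or nothing would be left
        old_path = os.path.join(directory, old_name)
        new_path = os.path.join(directory, new_name)
        if os.path.exists(new_path):
            print("Skipping {}: {} already exists".format(old_name, new_name))
            continue
        os.rename(old_path, new_path)
        renamed.append((old_name, new_name))
    return renamed
```

Because it builds full paths with os.path.join, this version does not need os.chdir and leaves the working directory untouched.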
I would like to save the output in the same location as the input but using a different, but still related, name.
Minimal Example:
This script searches for lines containing the string "NFS" in the input file. Then, it prints the results to another file.
I would like to save the print result to an output file in the same location as the input file but using a different name like "inputfilename.out.csv".
Here is the code :
from __future__ import print_function
import sys
fname = sys.argv[1]
out = open(fname.out.csv, "w") #this doesn't work but this is the idea
with open(fname) as file:
    reader = file.readlines()
    for line in reader:
        if "NFS" in line:
            print(line, file = out)
Any suggestions ?
You could use os.path.splitext() to extract extension:
import os
name, ext = os.path.splitext(fname)
output_filename = name + ".out" + ext
Or if you want to change the name completely, you could use os.path.dirname() to get the parent directory:
import os
dirpath = os.path.dirname(fname)
output_filename = os.path.join(dirpath, "outputfile.txt")
Use concatenation to add ".out.csv" to the end of your string.
out = open(fname + ".out.csv", "w")
If the input filename is "inputfilename", then the output will be "inputfilename.out.csv". This may be undesirable behavior if the input filename already has an extension. Then "inputfilename.csv" will become "inputfilename.csv.out.csv". In which case, you may wish to use os.path to construct the new name.
import os.path
filename = "inputfilename.csv"
root, extension = os.path.splitext(filename)
output_filename = root + ".out" + extension
print(output_filename)
Result:
inputfilename.out.csv
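Putting the naming idea together with the filtering from the question, a complete version could look roughly like the sketch below. The function name filter_lines and the needle parameter are made up for this illustration; the output name follows the root + ".out" + extension scheme shown above.

```python
import os


def filter_lines(fname, needle="NFS"):
    """Copy the lines of `fname` that contain `needle` to '<root>.out<ext>'.

    Returns the path of the output file, which sits next to the input file.
    Hypothetical helper for illustration only.
    """
    root, extension = os.path.splitext(fname)
    output_filename = root + ".out" + extension
    with open(fname) as source, open(output_filename, "w") as out:
        for line in source:
            if needle in line:
                # write() is used instead of print() because each line
                # already ends with its own newline
                out.write(line)
    return output_filename
```

In a script it could be called as filter_lines(sys.argv[1]). Because the output name is derived from the full input path, the result lands in the same folder as the input.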
I am trying to take a folder which contains 9 files, each containing FASTA records of separate genes, and remove duplicate records. I want to set it up so that the script is called with the folder that contains the genes as the first parameter, and a new folder name to rewrite the new files without duplicates to. However, if the files are stored in a folder called results within the current directory it is not letting me open any of the gene files within that folder to process them for duplicates. I have searched around and it seems that I should be able to call python's open() function with a string of the file name like this:
input_handle = open(f, "r")
This line is not allowing me to open the file to read its contents, and I think it may have something to do with the type of f, which turns out to be 'str' when I call type(f).
Also, if I use the full path:
input_handle = open('~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")
It says that no such file exists. I have checked my spelling and I am sure that the file does exist. I also get that file does not exist if I try to call its name as a raw string:
input_handle = open(r'~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")
Or if I try to call it as the following it says that no global results exists:
input_handle = open(os.path.join(os.curdir,results/f), "r")
Here is the full code. If anybody knows what the problem is I would really appreciate any help that you could offer.
#!/usr/bin/python
import os
import os.path
import sys
import re
from Bio import SeqIO
def processFiles(files) :
    for f in files:
        process(f)
def process(f):
    input_handle = open(f, "r")
    records = list(SeqIO.parse(input_handle, "fasta"))
    print records
    i = 0
    while i < len(records)-1:
        temp = records[i]
        next = records[i+1]
        if (next.id == temp.id) :
            print "duplicate found at " + next.id
            if (len(next.seq) < len(temp.seq)) :
                records.pop(i+1)
            else :
                records.pop(i)
        i = i + 1
    output_handle = open("out.fa", "w")
    for record in records:
        SeqIO.write(records, output_handle, "fasta")
    input_handle.close()
def main():
    input_folder = sys.argv[1]
    out_folder = sys.argv[2]
    if os.path.exists(out_folder):
        print("Folder %s exists; please specify empty folder or new one" % out_folder)
        sys.exit(1)
    os.makedirs(out_folder)
    files = os.listdir(input_folder)
    print files
    processFiles(files)
main()
Try input_handle = open(os.path.join(os.getcwd(), "results", f), "r"). Note that results has to be a quoted string and that os.getcwd must be called; os.curdir is just the string ".". See mail.python.org/pipermail/python-list/2012-September/631864.html.
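Two more details probably matter here. os.listdir returns bare file names, so open(f) looks for them in the current working directory rather than in the folder that was listed. And a leading ~ is expanded by the shell, not by open(), so it has to be expanded with os.path.expanduser. A minimal sketch of both points follows; the helper name list_file_paths is hypothetical.

```python
import os


def list_file_paths(folder):
    """Return the paths of the regular files in `folder`, joined with the folder name.

    A leading '~' is expanded first, because open() and os.listdir() do not
    expand it themselves. Hypothetical helper for illustration only.
    """
    folder = os.path.expanduser(folder)
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)  # os.listdir() returns bare names
        if os.path.isfile(path):
            paths.append(path)
    return paths
```

In the question's main(), passing list_file_paths(input_folder) to processFiles instead of os.listdir(input_folder) would give process() paths that open() can find regardless of the working directory.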