Python script not working as intended when called by task scheduler - python

I'm a beginner at Python and this site. Sorry if this might be simple.
I have modified a Python script that counts the number of words in a PDF file, "Master.pdf", and writes the time and date plus the word count to a .txt file.
I have Python 2.7 installed, I have installed Anaconda, and I am using the PyCharm editor. When I open PyCharm and run this script, no problems arise: the script executes and everything works.
As I would like this script to run every 15 minutes, I have made it a task using Task Scheduler. The task is "Start a program" the program is:
- C:\Users\alkare\AppData\Local\Continuum\anaconda2\python.exe - and the argument is - "C:/Users/alkare/Desktop/Report/WordCount.py" -.
Whenever it runs, I see the command prompt open, some text fly across my screen, and then the terminal closes, BUT no changes are made to my .txt file.
Here is the code I am using, saved as "WordCount.py":
#!/usr/bin/env python2.7
import os
import sys
import re
import datetime
import PyPDF2
def getPageCount(pdf_file):
    pdfFileObj = open(pdf_file, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    pages = pdfReader.numPages
    return pages
def extractData(pdf_file, page):
    pdfFileObj = open(pdf_file, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    pageObj = pdfReader.getPage(page)
    data = pageObj.extractText()
    return data
def getWordCount(data):
    data = data.split()
    return len(data)
def main():
    pdfFile = 'Master.pdf'
    # get the word count in the pdf file
    totalWords = 0
    numPages = getPageCount(pdfFile)
    for i in range(numPages):
        text = extractData(pdfFile, i)
        totalWords += getWordCount(text)
    Now = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    f = open("TrackingTimeData.txt", "a")
    f.write(Now[0:4] + "\t" + Now[4:6] + "/" + Now[6:8] + "\t" + Now[9:11] + ":" + Now[11:13] + "\t" + str(totalWords) + "\n")
    f.close()
if __name__ == '__main__':
    main()

The problem is that you are allowing the program to fail without providing you any meaningful output (it sounds like it hits an exception and closes).
Instead of just calling main() without guarding it in a try block:
if __name__ == '__main__':
    main()
give yourself some slack here to gather information:
if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        print("Error {}".format(e))
        # drop into a command-prompt debugger:
        import pdb
        pdb.set_trace()
        # slightly more old-school, pause the window to read the exception:
        import time
        time.sleep(15)
        # throwback to DOS windows
        import os
        os.system('pause')
        # read the error, come back to stackoverflow and describe the problem more, etc.
For example, mixing this with task scheduler, you'd want to right-click on your python.exe in Windows, go to properties, set "Run as Administrator" because maybe you're getting an access denied trying to read/write to a .PDF in some special directory. This is just an example of the many guesses people could throw in to randomly help you solve an issue versus knowing exactly what the error is.
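Because the console window closes on its own when the script is launched by Task Scheduler, it can also help to write the traceback to a file instead of relying on the screen. The following is only a minimal sketch of that idea: the helper name `run_logged` and the log file name in the usage note are hypothetical and not part of the original script.

```python
import traceback


def run_logged(func, log_path):
    """Call func(); if it raises, append the full traceback to log_path.

    Returns True on success and False if an exception was logged.
    run_logged is a hypothetical helper, not part of the original script.
    """
    try:
        func()
        return True
    except Exception:
        # format_exc() returns the same text the console would have shown.
        with open(log_path, "a") as log:
            log.write(traceback.format_exc())
            log.write("\n")
        return False
```

In WordCount.py this could be called as `run_logged(main, r"C:\Users\alkare\Desktop\Report\WordCount_error.log")`, where the log file name is an assumption. An absolute path is used so the log does not depend on the working directory the task starts in.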

Related

how to do docx to pdf conversion using python library without subprocess in linux? [duplicate]

I'm tasked with converting tons of .doc files to .pdf. The only way my supervisor wants me to do this is through MS Word 2010. I know I should be able to automate this with Python COM automation. The only problem is that I don't know how or where to start. I tried searching for tutorials but was not able to find any (maybe I did, but I don't know what I'm looking for).
Right now I'm reading through this. I don't know how useful it is going to be.
A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:
import sys
import os
import comtypes.client
wdFormatPDF = 17
in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])
word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
You could also use pywin32, which would be the same except for:
import win32com.client
and then:
word = win32com.client.Dispatch('Word.Application')
You can use the docx2pdf python package to bulk convert docx to pdf. It can be used as both a CLI and a python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.
from docx2pdf import convert
convert("input.docx")
convert("input.docx", "output.pdf")
convert("my_docx_folder/")
pip install docx2pdf
docx2pdf input.docx output.pdf
Disclaimer: I wrote the docx2pdf package. https://github.com/AlJohri/docx2pdf
I have tested many solutions, but none of them works efficiently on Linux distributions.
I recommend this solution:
import sys
import subprocess
import re
def convert_to(folder, source, timeout=None):
    args = [libreoffice_exec(), '--headless', '--convert-to', 'pdf', '--outdir', folder, source]
    process = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout)
    filename = re.search('-> (.*?) using filter', process.stdout.decode())
    return filename.group(1)
def libreoffice_exec():
    # TODO: Provide support for more platforms
    if sys.platform == 'darwin':
        return '/Applications/LibreOffice.app/Contents/MacOS/soffice'
    return 'libreoffice'
and you call your function:
result = convert_to('TEMP Directory', 'Your File', timeout=15)
All resources:
https://michalzalecki.com/converting-docx-to-pdf-using-python/
I have worked on this problem for half a day, so I think I should share some of my experience. Steven's answer is right, but it fails on my computer. There are two key points to fix it:
(1). When I first create the 'Word.Application' object, I make it (the Word app) visible before opening any documents. (I cannot fully explain why this works. If I do not do this on my computer, the program crashes when I try to open a document in invisible mode, and the 'Word.Application' object is then deleted by the OS.)
(2). After doing (1), the program sometimes works but still fails often. The error "COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))" means that the COM server may not be able to respond quickly enough. So I add a delay before trying to open a document.
After these two steps, the program works reliably for me. The demo code is below. If you encounter the same problems, try these two steps. Hope it helps.
import os
import comtypes.client
import time
wdFormatPDF = 17
# absolute path is needed
# be careful about the slash '\', use '\\' or '/' or raw string r"..."
in_file=r'absolute path of input docx file 1'
out_file=r'absolute path of output pdf file 1'
in_file2=r'absolute path of input docx file 2'
out_file2=r'absolute path of outputpdf file 2'
# print out filenames
print(in_file)
print(out_file)
print(in_file2)
print(out_file2)
# create COM object
word = comtypes.client.CreateObject('Word.Application')
# key point 1: make word visible before open a new document
word.Visible = True
# key point 2: wait for the COM Server to prepare well.
time.sleep(3)
# convert docx file 1 to pdf file 1
doc=word.Documents.Open(in_file) # open docx file 1
doc.SaveAs(out_file, FileFormat=wdFormatPDF) # conversion
doc.Close() # close docx file 1
word.Visible = False
# convert docx file 2 to pdf file 2
doc = word.Documents.Open(in_file2) # open docx file 2
doc.SaveAs(out_file2, FileFormat=wdFormatPDF) # conversion
doc.Close() # close docx file 2
word.Quit() # close Word Application
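Instead of a fixed `time.sleep(3)`, the "Call was rejected by callee" case can also be handled by retrying the call a few times. The helper below is a generic sketch, not part of the answer above: `call_with_retry` and its parameters are hypothetical names, and it assumes `attempts` is at least 1.

```python
import time


def call_with_retry(action, attempts=5, delay=1.0, retry_on=(Exception,)):
    """Call action() until it succeeds or `attempts` tries have failed.

    Sleeps `delay` seconds between failed tries and re-raises the last
    error if every try fails. Hypothetical helper; assumes attempts >= 1.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except retry_on as error:
            last_error = error
            # No point sleeping after the final failed try.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

With the demo above, opening a document might then look like `doc = call_with_retry(lambda: word.Documents.Open(in_file))`. Passing a narrower exception type via `retry_on` (for example the COM error class of the library in use) avoids retrying on unrelated failures.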
unoconv (written in Python) and OpenOffice running as a headless daemon.
https://github.com/unoconv/unoconv
http://dag.wiee.rs/home-made/unoconv/
Works very nicely for doc, docx, ppt, pptx, xls, xlsx.
Very useful if you need to convert docs or save/convert to certain formats on a server.
As an alternative to the SaveAs function, you could also use ExportAsFixedFormat which gives you access to the PDF options dialog you would normally see in Word. With this you can specify bookmarks and other document properties.
doc.ExportAsFixedFormat(OutputFileName=pdf_file,
    ExportFormat=17,  # 17 = PDF output, 18 = XPS output
    OpenAfterExport=False,
    OptimizeFor=0,  # 0 = Print (higher res), 1 = Screen (lower res)
    CreateBookmarks=1,  # 0 = No bookmarks, 1 = Heading bookmarks only, 2 = bookmarks match Word bookmarks
    DocStructureTags=True
)
The full list of function arguments is: 'OutputFileName', 'ExportFormat', 'OpenAfterExport', 'OptimizeFor', 'Range', 'From', 'To', 'Item', 'IncludeDocProps', 'KeepIRM', 'CreateBookmarks', 'DocStructureTags', 'BitmapMissingFonts', 'UseISO19005_1', 'FixedFormatExtClassPtr'
It's worth noting that Steven's answer works, but if you use a for loop to export multiple files, make sure to place the CreateObject or Dispatch statement before the loop. The application object only needs to be created once. See my problem: Python win32com.client.Dispatch looping through Word documents and export to PDF; fails when next loop occurs
If you don't mind using PowerShell, have a look at this Hey, Scripting Guy! article. The code presented could be adapted to use the wdFormatPDF enumeration value of WdSaveFormat (see here).
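A sketch of that loop structure, assuming pywin32 and Microsoft Word on Windows. The function names `pdf_name` and `convert_all` are hypothetical; only `Dispatch`, `Documents.Open`, `SaveAs`, `Close` and `Quit` come from the answers above.

```python
import os


def pdf_name(path):
    """Return path with its extension replaced by .pdf (hypothetical helper)."""
    return os.path.splitext(path)[0] + ".pdf"


def convert_all(paths):
    """Convert every document in paths to PDF using one shared Word instance."""
    # Imported here so the module loads without pywin32 installed;
    # the conversion itself needs Windows and Microsoft Word.
    import win32com.client

    wdFormatPDF = 17
    word = win32com.client.Dispatch("Word.Application")  # created once, before the loop
    try:
        for path in paths:
            in_file = os.path.abspath(path)
            doc = word.Documents.Open(in_file)
            try:
                doc.SaveAs(pdf_name(in_file), FileFormat=wdFormatPDF)
            finally:
                doc.Close()
    finally:
        word.Quit()  # quit once, after the loop
```

The `try`/`finally` blocks are a design choice of this sketch, so that Word is closed even if one document fails to convert.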
This blog article presents a different implementation of the same idea.
I have modified it to support ppt as well. My solution supports all the extensions listed below.
word_extensions = [".doc", ".odt", ".rtf", ".docx", ".dotm", ".docm"]
ppt_extensions = [".ppt", ".pptx"]
My Solution: Github Link
I have modified code from Docx2PDF
I tried the accepted answer but wasn't particularly keen on the bloated PDFs Word was producing which was usually an order of magnitude bigger than expected. After looking how to disable the dialogs when using a virtual PDF printer I came across Bullzip PDF Printer and I've been rather impressed with its features. It's now replaced the other virtual printers I used previously. You'll find a "free community edition" on their download page.
The COM API can be found here and a list of the usable settings can be found here. The settings are written to a "runonce" file which is used for one print job only and then removed automatically. When printing multiple PDFs we need to make sure one print job completes before starting another to ensure the settings are used correctly for each file.
import os, re, time, datetime, win32com.client
def print_to_Bullzip(file):
    util = win32com.client.Dispatch("Bullzip.PDFUtil")
    settings = win32com.client.Dispatch("Bullzip.PDFSettings")
    settings.PrinterName = util.DefaultPrinterName  # make sure we're controlling the right PDF printer
    outputFile = re.sub(r"\.[^.]+$", ".pdf", file)
    statusFile = re.sub(r"\.[^.]+$", ".status", file)
    settings.SetValue("Output", outputFile)
    settings.SetValue("ConfirmOverwrite", "no")
    settings.SetValue("ShowSaveAS", "never")
    settings.SetValue("ShowSettings", "never")
    settings.SetValue("ShowPDF", "no")
    settings.SetValue("ShowProgress", "no")
    settings.SetValue("ShowProgressFinished", "no")  # disable balloon tip
    settings.SetValue("StatusFile", statusFile)  # created after print job
    settings.WriteSettings(True)  # write settings to the runonce.ini
    util.PrintFile(file, util.DefaultPrinterName)  # send to Bullzip virtual printer
    # wait until print job completes before continuing
    # otherwise settings for the next job may not be used
    timestamp = datetime.datetime.now()
    while (datetime.datetime.now() - timestamp).seconds < 10:
        if os.path.exists(statusFile) and os.path.isfile(statusFile):
            error = util.ReadIniString(statusFile, "Status", "Errors", '')
            if error != "0":
                raise IOError("PDF was created with errors")
            os.remove(statusFile)
            return
        time.sleep(0.1)
    raise IOError("PDF creation timed out")
I was working with this solution but I needed to search all .docx, .dotm, .docm, .odt, .doc or .rtf and then turn them all to .pdf (python 3.7.5). Hope it works...
import os
import win32com.client
wdFormatPDF = 17
for root, dirs, files in os.walk(r'your directory here'):
    for f in files:
        if f.endswith(".doc") or f.endswith(".odt") or f.endswith(".rtf"):
            try:
                print(f)
                in_file=os.path.join(root,f)
                word = win32com.client.Dispatch('Word.Application')
                word.Visible = False
                doc = word.Documents.Open(in_file)
                doc.SaveAs(os.path.join(root,f[:-4]), FileFormat=wdFormatPDF)
                doc.Close()
                word.Quit()
                word.Visible = True
                print ('done')
                os.remove(os.path.join(root,f))
                pass
            except:
                print('could not open')
                # os.remove(os.path.join(root,f))
        elif f.endswith(".docx") or f.endswith(".dotm") or f.endswith(".docm"):
            try:
                print(f)
                in_file=os.path.join(root,f)
                word = win32com.client.Dispatch('Word.Application')
                word.Visible = False
                doc = word.Documents.Open(in_file)
                doc.SaveAs(os.path.join(root,f[:-5]), FileFormat=wdFormatPDF)
                doc.Close()
                word.Quit()
                word.Visible = True
                print ('done')
                os.remove(os.path.join(root,f))
                pass
            except:
                print('could not open')
                # os.remove(os.path.join(root,f))
        else:
            pass
The try/except is there for documents that could not be read, so that the script does not exit before it reaches the last document.
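The two branches above differ only in how many characters are sliced off the file name. As a hedged alternative, the extension handling can be done in one place with `str.endswith` (which accepts a tuple) and `os.path.splitext`. The names `WORD_EXTENSIONS` and `find_word_files` are hypothetical, and matching extensions case-insensitively is an assumption of this sketch rather than behaviour of the original loop.

```python
import os

# Extensions taken from the answer above.
WORD_EXTENSIONS = (".doc", ".docx", ".dotm", ".docm", ".odt", ".rtf")


def find_word_files(top):
    """Yield (source, target_pdf) path pairs for matching files under top."""
    for root, dirs, files in os.walk(top):
        for name in files:
            # endswith accepts a tuple, so one test covers every extension.
            if name.lower().endswith(WORD_EXTENSIONS):
                source = os.path.join(root, name)
                # splitext strips the extension whatever its length.
                yield source, os.path.splitext(source)[0] + ".pdf"
```

Each yielded pair could then be passed to `word.Documents.Open(source)` and `doc.SaveAs(target, FileFormat=wdFormatPDF)` inside the existing try/except.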
You should start by investigating so-called virtual PDF print drivers.
Once you find one, you should be able to write a batch file that prints your DOC files into PDF files. You can probably do this in Python too (set up the printer driver output and issue a document/print command in MS Word; the latter can be done from the command line, AFAIR).
import docx2txt
from win32com import client
import os
files_from_folder = r"c:\\doc"
directory = os.fsencode(files_from_folder)
amount = 1
word = client.DispatchEx("Word.Application")
word.Visible = True
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    print(filename)
    if filename.endswith('docx'):
        text = docx2txt.process(os.path.join(files_from_folder, filename))
        print(f'{filename} transfered ({amount})')
        amount += 1
        new_filename = filename.split('.')[0] + '.txt'
        try:
            with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
                t.write(text)
        except:
            os.mkdir(files_from_folder + r'\txt_files')
            with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
                t.write(text)
    elif filename.endswith('doc'):
        doc = word.Documents.Open(os.path.join(files_from_folder, filename))
        text = doc.Range().Text
        doc.Close()
        print(f'{filename} transfered ({amount})')
        amount += 1
        new_filename = filename.split('.')[0] + '.txt'
        try:
            with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
                t.write(text)
        except:
            os.mkdir(files_from_folder + r'\txt_files')
            with open(os.path.join(files_from_folder + r'\txt_files', new_filename), 'w', encoding='utf-8') as t:
                t.write(text)
word.Quit()
The Source Code, see here:
https://neculaifantanaru.com/en/python-full-code-how-to-convert-doc-and-docx-files-to-pdf-from-the-folder.html
I would suggest ignoring your supervisor and using OpenOffice, which has a Python API. OpenOffice has built-in support for Python, and someone created a library specifically for this purpose (PyODConverter).
If he isn't happy with the output, tell him it could take you weeks to do it with Word.

How do I open a specific webpage in a new tab before a Python function returns?

In my main.py I have a Python function called def loop(request) and it returns a string. At the end of the function, before I return the value, I try to open a webpage
def loop (request):
    text = "BLAH"
    #THESE DIDN'T WORK
    webbrowser.open('http://net-informations.com', new=2)
    urllib.request.urlopen('https://www.google.com/')
    return text
and the rest of the script is as follows:
with open('abc.json', 'r') as f:
    dt = json.load(f)
    f.close()
ret = loop(dt)
# print('RET ' + str(ret))
#THESE DIDNT OPEN THE PAGE AS WELL :(
try:
    print("?????????????")
    urllib.request.urlopen('https://www.google.com/')
except Exception as e:
    print(str(e))
I run my script in the terminal as python3 main.py. However, if I start python3 interactively in the terminal, the same call launches the webpage:
>>> import webbrowser
>>> webbrowser.open('http://net-informations.com', new=2)

user created log files

I am getting a TypeError: object of type 'file' has no len()
I have traced the issue down to the path established upon execution.
What am I missing to correct this error, found in the "savePath" declaration or in its use in "temp = os.path.join(savePath, files)"?
def printTime(time):
    savePath = "C:\Users\Nicholas\Documents"
    files = open("LogInLog.txt", "a")
    temp = os.path.join(savePath, files)
    files.write("A LogIn occured.")
    files.write(time)
    print files.read
    files.close
main()
The whole program is below for reference:
from time import strftime
import os.path
def main():
    getTime()
def getTime():
    time = strftime("%Y-%m-%d %I:%M:%S")
    printTime(time)
def printTime(time):
    savePath = "C:\Users\Nicholas\Documents"
    files = open("LogInLog.txt", "a")
    temp = os.path.join(savePath, files)
    files.write("A LogIn occured.")
    files.write(time)
    print files.read
    files.close
main()
Here's a working version:
from time import strftime
import os.path
def main():
    getTime()
def getTime():
    time = strftime("%Y-%m-%d %I:%M:%S")
    printTime(time)
def printTime(time):
    # raw string, so the backslashes are not read as escape sequences
    savePath = r"C:\Users\Nicholas\Documents"
    logFile = "LogInLog.txt"
    files = open(os.path.join(savePath, logFile), "a+")
    openPosition = files.tell()
    files.write("A LogIn occured.")
    files.write(time)
    files.seek(openPosition)
    print(files.read())
    files.close()
if __name__ == '__main__':
    main()
There were a few problems with the code snippet posted in the question:
Two import statements were concatenated together. Each should be on a separate line.
The os.path.join function doesn't work on an open filehandle.
The read() and close() methods were missing parens.
If the intent is to read what is written in append mode, it's necessary to get the current file position via tell() and seek() to that position after writing to the file.
While it's legal to call main() without any conditional check, it's usually best to make sure the module is being called as a script as opposed to being imported.
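The tell()/seek() point can be tried in isolation with a small sketch like the one below. The function name `append_and_read_back` is hypothetical, and the explicit seek to the end before `tell()` is an extra precaution added here, not something the working version above does.

```python
import os


def append_and_read_back(path, text):
    """Append text to path and return only what this call wrote."""
    with open(path, "a+") as handle:
        handle.seek(0, os.SEEK_END)  # make the position explicit before tell()
        start = handle.tell()        # remember where this write begins
        handle.write(text)
        handle.seek(start)           # rewind to the start of the new text
        return handle.read()
```

Without the `seek(start)` call, `read()` would start at the end of the file and return an empty string.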

Force a script to continue running despite bringing up an error box

I have a script that reads information from two different files, and writes output to a third file. I have an error catch (the goal of the error catch is to display any IDs that did not get processed by the script) at the very end that uses a ctypes windows message box. Currently, the script does not actually finish writing to the output file until I click "OK" on the error message box. I would like the program to, instead, finish writing to the output file regardless of me pressing "OK". Is this possible to do?
The script:
'''select newest reference file'''
directory = 'C:\User\Python test\Folder1'
newest = max(glob.iglob(os.path.join(directory, '*.txt')), key=os.path.getctime)
timestamp = time.strftime("%Y_%m_%d - %H_%M_%S")
'''Get IDS to be read'''
idlist = open('C:\User\Python test\ID List.txt').read().splitlines()
'''Print or write lines associated with selected IDS'''
output = open('C:\User\Python test\%s.txt' % timestamp, 'w')
with open(newest, 'r') as f:
    head = f.readline().strip()
    output.writelines(head + "\n")
    for referenceline in f.read().strip().split("\n"):
        for ids in idlist:
            if ids in referenceline:
                output.writelines(referenceline.replace(" ", "") + "\n")
                idlist.remove(ids)
text = '\n'.join(idlist)
ctypes.windll.user32.MessageBoxW(0, u"%s" %text, u"IDs not found:", 0)
This one was pretty simple. I just needed to close the file before the error catch.
Code:
output.close()
text = '\n'.join(idlist)
ctypes.windll.user32.MessageBoxW(0, u"%s" %text, u"IDs not found:", 0)
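The same effect can be had without an explicit `close()` by opening the output file in a `with` block, which closes (and therefore flushes) the file as soon as the block ends. A minimal sketch, with `write_matches` as a hypothetical helper name:

```python
def write_matches(out_path, lines):
    """Write each line to out_path; the file is closed when the with block ends."""
    with open(out_path, "w") as output:
        for line in lines:
            output.write(line + "\n")
    # Here the data is already flushed and the file closed, so a blocking
    # call such as MessageBoxW can follow without holding back the output.
    return out_path
```

In the script above, the message box call would then come after the `with` block rather than inside it.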

Python 2.7 to exe using py2exe issue

I successfully created an .exe using py2exe with a simple test script I found on a tutorials website. The script I am using, however, does not seem to work. My code uses the csv module and dict reader with two .csv inputs.
I run the python setup.py p2exe command, and I get a flash of a command prompt, but that disappears before I can read anything on it. And once it disappears, I do not have the correct .csv file output that I would get if I just ran the script in python.
Can anyone offer any advice or things to try? Or is there a way I could get that pesky cmd window to stay open long enough for me to see what it says?
Thanks. My script is below.
import csv
def main():
    iFileName = 'DonorsPlayTesting.csv'
    oFileName = iFileName[:-4] + '-Output' + iFileName[-4:]
    iFile = csv.DictReader(open(iFileName))
    oFile = csv.writer(open(oFileName, 'w'), lineterminator = '\n')
    iDirectory = csv.DictReader(open("DonorsDirectory.csv"))
    oNames = {}
    directory = {}
    for line in iDirectory:
        directory[line['Number']] = line['Name']
    for key in directory.keys():
        oNames[directory[key]] = 0
    out_header = ['Name', 'Plays']
    oFile.writerow(out_header)
    for line in iFile:
        if line['Type'] == "Test Completion":
            if line['Number'] in directory:
                oNames[directory[line['Number']]] += 1
            elif line['Number'] not in directory:
                oNames[line['Number']] = 'Need Name'
    oFile.writerows(oNames.items())
main()
