Converting all Word documents in a folder to txt using Python

I am trying to convert Word documents to txt. Changing the extension doesn't work; I need to open each document in Word and then save it in .txt format.
I am using the code from here: http://code.activestate.com/recipes/279003-converting-word-documents-to-text/
import fnmatch, os, pythoncom, sys, win32com.client
wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")
try:
    for path, dirs, files in os.walk(sys.argv[1]):
        for doc in [os.path.abspath(os.path.join(path, filename)) for filename in files if fnmatch.fnmatch(filename, '*.doc')]:
            print("processing %s" % doc)
            wordapp.Documents.Open(doc)
            docastxt = doc.rstrip('doc') + 'txt'
            wordapp.ActiveDocument.SaveAs(docastxt, FileFormat=win32com.client.constants.wdFormatTextLineBreaks)
            wordapp.ActiveWindow.Close()
finally:
    wordapp.Quit()
However, when I run it, the first file is converted correctly, but the second file raises an error.
Here is the error message:
Traceback (most recent call last):
File "recipe-279003-1.py", line 11, in <module>
wordapp.ActiveDocument.SaveAs(docastxt, FileFormat=win32com.client.constants.wdFormatTextLineBreaks)
File "C:\Users\moxzhang\AppData\Local\Continuum\anaconda3\lib\site-packages\win32com\client\__init__.py", line 474, in __getattr__
return self._ApplyTypes_(*args)
File "C:\Users\moxzhang\AppData\Local\Continuum\anaconda3\lib\site-packages\win32com\client\__init__.py", line 467, in _ApplyTypes_
self._oleobj_.InvokeTypes(dispid, 0, wFlags, retType, argTypes, *args),
pywintypes.com_error: (-2147023179, 'The interface is unknown.', None, None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "recipe-279003-1.py", line 14, in <module>
wordapp.Quit()
File "C:\Users\moxzhang\AppData\Local\Temp\gen_py\3.7\00020905-0000-0000-C000-000000000046x0x8x7\_Application.py", line 353, in Quit
, OriginalFormat, RouteDocument)
pywintypes.com_error: (-2147023179, 'The interface is unknown.', None, None)
I have tried putting only one Word file in the directory, and it works fine. But once there are two files, the procedure fails on the second file (each file converts successfully if it is the only file in the folder).
Could you please let me know what this error message means and what I can do to fix it?
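One variation that may be worth trying is sketched here. It keeps a reference to the Document object returned by Documents.Open and saves and closes that object directly, instead of going through ActiveDocument and ActiveWindow. Whether this removes the 'The interface is unknown.' error on the second file is an assumption, not something confirmed by the traceback. The helper names find_doc_files, txt_path_for, and convert_all are hypothetical, and convert_all only runs on Windows with pywin32 and Word installed.

```python
import fnmatch
import os


def find_doc_files(root):
    """Return the absolute paths of all *.doc files under root, sorted."""
    found = []
    for path, _dirs, files in os.walk(root):
        for filename in files:
            if fnmatch.fnmatch(filename, "*.doc"):
                found.append(os.path.abspath(os.path.join(path, filename)))
    return sorted(found)


def txt_path_for(doc_path):
    """Swap the file extension for .txt, leaving the rest of the path alone."""
    base, _ext = os.path.splitext(doc_path)
    return base + ".txt"


def convert_all(root):
    """Convert every *.doc under root to text (Windows + Word + pywin32 only)."""
    # Imported here so the helpers above stay usable on any platform.
    import win32com.client

    wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")
    try:
        for doc_path in find_doc_files(root):
            print("processing %s" % doc_path)
            # Hold on to the opened document instead of using ActiveDocument.
            document = wordapp.Documents.Open(doc_path)
            try:
                document.SaveAs(
                    txt_path_for(doc_path),
                    FileFormat=win32com.client.constants.wdFormatTextLineBreaks,
                )
            finally:
                # Close this document only; False means "do not save changes".
                document.Close(False)
    finally:
        wordapp.Quit()
```

Calling convert_all(sys.argv[1]) would mirror the original script. Collecting the file list before touching Word also makes it easy to print the list first and confirm which files will be processed.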

Related

Python Exiftool get metadata from livestream

I am trying to read geotagging data from an online live stream. Here is my code:
import exiftool
def getVideo(url):
    with exiftool.ExifToolHelper() as et:
        metadata = et.getmetadata(url)
        print(metadata)
getVideo("url/to/stream")
However, I got this error:
Traceback (most recent call last):
File "C:\Users\alexa\Documents\vtest2.py", line 9, in <module>
getVideo("url/to/stream")
File "C:\Users\alexa\Documents\vtest2.py", line 4, in getVideo
with exiftool.ExifToolHelper() as et:
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\exiftool\helper.py", line 101, in __init__
super().__init__(**kwargs)
File "C:\Python311\Lib\site-packages\exiftool\exiftool.py", line 300, in __init__
self.executable = executable or constants.DEFAULT_EXECUTABLE
^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\exiftool\exiftool.py", line 374, in executable
raise FileNotFoundError(f'"{new_executable}" is not found, on path or as absolute path')
FileNotFoundError: "exiftool.exe" is not found, on path or as absolute path
Is there a better way to read metadata from a live stream?
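The traceback shows that the exiftool executable itself could not be found, so the failure happens before any stream is read. As a minimal sketch, the lookup can be done explicitly before the helper is created. The function names find_exiftool and open_helper are hypothetical; the executable keyword is the one visible in the traceback's __init__ frame.

```python
import shutil


def find_exiftool(candidates=("exiftool", "exiftool.exe")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def open_helper():
    """Create an ExifToolHelper bound to an explicitly located executable."""
    # Third-party import kept local so find_exiftool works without PyExifTool.
    import exiftool

    path = find_exiftool()
    if path is None:
        raise FileNotFoundError(
            "exiftool was not found on PATH; install it or pass its full path"
        )
    return exiftool.ExifToolHelper(executable=path)
```

If find_exiftool() returns None, the ExifTool command-line program is either not installed or not on PATH, which matches the FileNotFoundError above. This sketch does not address whether exiftool can read a live stream URL.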

Why can't I create a folder when it doesn't exist?

I'm trying to write a for loop that browses the files in a specific directory and creates a folder if it doesn't already exist. Here is the code:
import ftputil
host = ftputil.FTPHost('x.x.x.x', "x", "x")  # connect to the FTP server
mypathexist = './CameraOld'  # it is located at /opt/Camera/CameraOld
mypath = '.'  # this puts you in /opt/Camera (the configured default path)
host.chdir(mypath)
files = host.listdir(host.curdir)
for f in files:  # browse the entries in the folder
    if f == mypathexist:  # if an entry is named CameraOld (it's a folder)
        isExist = True
        break
else:
    isExist = False  # no entry has that name
print(isExist)
if isExist == False:  # if the folder doesn't exist
    host.mkdir(mypathexist)  # create the folder
else:
    print("ok")
The problem is that isExist is always False, so the script tries to create a folder that already exists, and I don't understand why.
Here's the output:
False  # this is the print(isExist)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ftputil/host.py", line 695, in command
self._session.mkd(path)
File "/usr/lib/python3.10/ftplib.py", line 637, in mkd
resp = self.voidcmd('MKD ' + dirname)
File "/usr/lib/python3.10/ftplib.py", line 286, in voidcmd
return self.voidresp()
File "/usr/lib/python3.10/ftplib.py", line 259, in voidresp
resp = self.getresp()
File "/usr/lib/python3.10/ftplib.py", line 254, in getresp
raise error_perm(resp)
ftplib.error_perm: 550 CameraOld: file exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/Bureau/try.py", line 16, in <module>
host.mkdir(mypathexist)
File "/usr/local/lib/python3.10/dist-packages/ftputil/host.py", line 697, in mkdir
self._robust_ftp_command(command, path)
File "/usr/local/lib/python3.10/dist-packages/ftputil/host.py", line 656, in _robust_ftp_command
return command(self, tail)
File "/usr/local/lib/python3.10/dist-packages/ftputil/host.py", line 694, in command
with ftputil.error.ftplib_error_to_ftp_os_error:
File "/usr/local/lib/python3.10/dist-packages/ftputil/error.py", line 195, in __exit__
raise PermanentError(
ftputil.error.PermanentError: 550 CameraOld: file exist
Debugging info: ftputil 5.0.4, Python 3.10.4 (linux)

I would bet that your mypathexist is not correct. Or, the other way around, your file list doesn't hold the strings in the form you assume it does.
Check your condition by hand: print f inside your loop. Is it what you expect?
In the end, Python is simply comparing strings.
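To illustrate the string comparison, here is a small sketch that runs without an FTP server. It assumes that host.listdir returns bare entry names such as 'CameraOld' (as os.listdir does), which is consistent with the server's "550 CameraOld" message but is not shown in the question. The function name name_in_listing is hypothetical.

```python
import posixpath


def name_in_listing(entries, target):
    """Check whether target names one of the entries of a directory listing.

    The target is normalized first, so './CameraOld' becomes 'CameraOld'
    and can match the bare names that a listing is assumed to contain.
    """
    wanted = posixpath.normpath(target)
    return wanted in entries


# Simulated listing, standing in for host.listdir(host.curdir).
listing = ["CameraOld", "snapshot.jpg"]

# The comparison from the question: the './' prefix prevents any match.
print("./CameraOld" in listing)

# The normalized comparison.
print(name_in_listing(listing, "./CameraOld"))
```

With the simulated listing, the first check fails and the second succeeds, which would explain why isExist stays False even though the folder is present.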

Download file google drive python

How do I download a file from Google Drive?
I am using pydrive with the following link:
#https://drive.google.com/open?id=DWADADDSASWADSCDAW
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
gdrive_file = drive.CreateFile({'id': 'id=DWADADDSASWADSCDAW'})
gdrive_file.GetContentFile('DWADSDCXZCDWA.zip') # Download content file.
Error:
Traceback (most recent call last):
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\oauth2client\clientsecrets.py", line 121, in _loadfile
with open(filename, 'r') as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'client_secrets.json'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\auth.py", line 386, in LoadClientConfigFile
client_type, client_info = clientsecrets.loadfile(client_config_file)
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\oauth2client\clientsecrets.py", line 165, in loadfile
return _loadfile(filename)
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\oauth2client\clientsecrets.py", line 125, in _loadfile
exc.strerror, exc.errno)
oauth2client.clientsecrets.InvalidClientSecretsError: ('Error opening file', 'client_secrets.json', 'No such file or directory', 2)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Hoxton/123/pyu_test.py", line 8, in <module>
gdrive_file.GetContentFile('PyUpdater+App-win-1.0.zip') # Download content file.
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\files.py", line 210, in GetContentFile
self.FetchContent(mimetype, remove_bom)
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\files.py", line 42, in _decorated
self.FetchMetadata()
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\auth.py", line 57, in _decorated
self.auth.LocalWebserverAuth()
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\auth.py", line 113, in _decorated
self.GetFlow()
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\auth.py", line 443, in GetFlow
self.LoadClientConfig()
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\auth.py", line 366, in LoadClientConfig
self.LoadClientConfigFile()
File "C:\Users\Hoxton\AppData\Local\Continuum\miniconda3\lib\site-packages\pydrive\auth.py", line 388, in LoadClientConfigFile
raise InvalidConfigError('Invalid client secrets file %s' % error)
pydrive.settings.InvalidConfigError: Invalid client secrets file ('Error opening file', 'client_secrets.json', 'No such file or directory', 2)
Process finished with exit code 1

Try the sample code provided in the documentation.
The Drive API allows you to download files that are stored in Google
Drive. Also, you can download exported versions of Google Documents
(Documents, Spreadsheets, Presentations, etc.) in formats that your
app can handle. Drive also supports providing users direct access to a
file via the URL in the webViewLink property.
Here is the code snippet:
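import io
from googleapiclient.http import MediaIoBaseDownload

# drive_service is assumed to be an already authorized Drive API service object
file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    # Fetch the next chunk and report progress as a percentage
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))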

This works for me:
from google_drive_downloader import GoogleDriveDownloader as gdd
gdd.download_file_from_google_drive(file_id='1z8e2CnvrX8ZSu2kk0QgiFWurOKMr0', dest_path='E:/model.h5')
Source: https://newbedev.com/python-download-files-from-google-drive-using-url

Hey I know it's a bit late to answer, but it might still be helpful to someone.
I had a similar problem with Google Sheets. The problem here is that the file may be downloadable in multiple formats and you're not specifying which one you want. To do this, you need to add the mimetype parameter to the GetContentFile method, like so:
gdrive_file.GetContentFile('DWADSDCXZCDWA.zip', mimetype = 'application/zip')
Note that there are multiple mimetypes for zip files and that the mimetype and extension need to agree. So you need to know which one to use, or just try out different ones if you don't. Here's a handy list:
application/x-compressed
application/x-zip-compressed
application/zip
multipart/x-zip
Furthermore, if you actually access the metadata of the file you can have a peek at all the types of formats you can export it in under 'exportLinks'. There will be a dict with mimetypes and the associated links.
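Separately from the missing client_secrets.json shown in the traceback, the question's code passes 'id=DWADADDSASWADSCDAW' to CreateFile, including the literal "id=" prefix from the link. Assuming the file ID is only the value of the id query parameter, a small sketch for extracting it from such a link could look like this. The function name drive_file_id is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(url):
    """Return the value of the 'id' query parameter of a Drive link."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no 'id' parameter found in %r" % url)
    return ids[0]


# The placeholder link from the question.
print(drive_file_id("https://drive.google.com/open?id=DWADADDSASWADSCDAW"))
```

The returned string is what would go into drive.CreateFile({'id': ...}). This only handles links of the open?id=... form used in the question.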

Reading password protected Word Documents with zipfile

I am trying to read a password-protected Word document in Python using zipfile.
The following code works with a document that is not password-protected, but gives an error when used with a password-protected file.
try:
    from xml.etree.cElementTree import XML
except ImportError:
    from xml.etree.ElementTree import XML
import zipfile
psw = "1234"
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'
def get_docx_text(path):
    document = zipfile.ZipFile(path, "r")
    document.setpassword(psw)
    document.extractall()
    xml_content = document.read('word/document.xml')
    document.close()
    tree = XML(xml_content)
    paragraphs = []
    for paragraph in tree.getiterator(PARA):
        texts = [node.text
                 for node in paragraph.getiterator(TEXT)
                 if node.text]
        if texts:
            paragraphs.append(''.join(texts))
    return '\n\n'.join(paragraphs)
When running get_docx_text() with a password protected file, I received the following error:
Traceback (most recent call last):
File "<ipython-input-15-d2783899bfe5>", line 1, in <module>
runfile('/Users/username/Workspace/Python/docx2txt.py', wdir='/Users/username/Workspace/Python')
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 680, in runfile
execfile(filename, namespace)
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/Users/username/Workspace/Python/docx2txt.py", line 41, in <module>
x = get_docx_text("/Users/username/Desktop/file.docx")
File "/Users/username/Workspace/Python/docx2txt.py", line 23, in get_docx_text
document = zipfile.ZipFile(path, "r")
File "zipfile.pyc", line 770, in __init__
File "zipfile.pyc", line 811, in _RealGetContents
BadZipfile: File is not a zip file
Does anyone have any advice to get this code to work?

I don't think this is an encryption problem, for two reasons:
Decryption is not attempted when the ZipFile object is created. Methods like ZipFile.extractall, extract, open, and read take an optional pwd parameter containing the password, but the object constructor / initializer does not.
Your stack trace indicates that the BadZipFile is being raised when you create the ZipFile object, before you call setpassword:
document = zipfile.ZipFile(path, "r")
I'd look carefully for other differences between the two files you're testing: ownership, permissions, security context (if you have that on your OS), ... even filename differences can cause a framework to "not see" the file you're working on.
Also --- the obvious one --- try opening the encrypted zip file with your zip-compatible command of choice. See if it really is a zip file.
I tested this by opening an encrypted zip file in Python 3.1, while "forgetting" to provide a password. I could create the ZipFile object (the variable zfile below) without any error, but got a RuntimeError --- not a BadZipFile exception --- when I tried to read a file without providing a password:
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 13, in checksum_contents
inner_file = zfile.open(inner_filename, "r")
File "/usr/lib64/python3.1/zipfile.py", line 903, in open
"password required for extraction" % name)
RuntimeError: File apache.log is encrypted, password required for extraction
I was also able to raise a BadZipfile exception, once by trying to open an empty file and once by trying to open some random logfile text that I'd renamed to a ".zip" extension. The two test files produced identical stack traces, down to the line numbers.
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 10, in checksum_contents
zfile = zipfile.ZipFile(zipfile_name, "r")
File "/usr/lib64/python3.1/zipfile.py", line 706, in __init__
self._GetContents()
File "/usr/lib64/python3.1/zipfile.py", line 726, in _GetContents
self._RealGetContents()
File "/usr/lib64/python3.1/zipfile.py", line 738, in _RealGetContents
raise BadZipfile("File is not a zip file")
zipfile.BadZipfile: File is not a zip file
This stack trace isn't exactly the same as yours (mine has a call to _GetContents, and the pre-3.2 "small f" spelling of BadZipfile), but they're close enough that I think this is the kind of problem you're dealing with.
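The "see if it really is a zip file" check can also be done from Python by looking at the first bytes of the file. The sketch here assumes that a document encrypted with a password by Word is stored in an OLE2 compound file container rather than a zip archive, which would explain the BadZipfile error. The function name sniff_container is hypothetical.

```python
import zipfile

# Well-known file signatures (magic numbers).
ZIP_MAGIC = b"PK\x03\x04"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_container(path):
    """Classify a file as 'ole2', 'zip', or 'unknown' from its signature."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    # is_zipfile also recognizes archives that do not start with ZIP_MAGIC,
    # such as an empty archive.
    if head.startswith(ZIP_MAGIC) or zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

If sniff_container reports "ole2" for the protected document and "zip" for the unprotected one, then zipfile cannot open the protected file no matter which password is set, and a different tool is needed to decrypt it first.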

How to open files, web browsers, and URLs in Python, not in IDLE.

I know you can open files, browsers, and URLs in the Python GUI. However, I don't know how to apply this to programs. For example, none of the below work. (The below are snippets from my growing chat bot program):
def browser():
    print('OPENING FIREFOX...')
    handle = webbroswer.get() # webbrowser is imported at the top of the file
    handle.open('http://youtube.com')
    handle.open_new_tab('http://google.com')
and
def file():
    file = str(input('ENTER THE FILE\'S NAME AND EXTENSION:'))
    action = open(file, 'r')
    actionTwo = action.read()
    print (actionTwo)
These errors occur, in respect to the above order, but in individual runs:
OPENING FIREFOX...
Traceback (most recent call last):
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 202, in <module>
askForQuestions()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 64, in askForQuestions
browser()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 38, in browser
handle = webbroswer.get()
NameError: global name 'webbroswer' is not defined
>>>
ENTER THE FILE'S NAME AND EXTENSION:file.txt
Traceback (most recent call last):
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 202, in <module>
askForQuestions()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 66, in askForQuestions
file()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 51, in file
action = open(file, 'r')
IOError: [Errno 2] No such file or directory: 'file.txt'
>>>
Am I handling this wrong, or can I just not use open() and webbrowser in a program?

You should read the errors and try to understand them; they are very helpful in this case, as they often are:
The first one says NameError: global name 'webbroswer' is not defined.
You can see here that webbrowser is misspelled in the code. It also tells you the line where the error occurs (line 38).
The second one, IOError: [Errno 2] No such file or directory: 'file.txt', tells you that you're trying to open a file that doesn't exist. This does not work because you specified
action = open(file, 'r')
which means that you're trying to read a file. Python does not allow reading from a file that does not exist.
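Both points can be sketched as follows: the module name spelled correctly, and a check that the file exists before it is read. The function names open_pages and read_text_file are hypothetical. A relative name such as 'file.txt' is looked up in the current working directory, which may differ from the folder that contains the script.

```python
import os
import webbrowser


def open_pages(urls):
    """Open each URL in a new tab of the default browser."""
    handle = webbrowser.get()  # note the spelling: webbrowser
    for url in urls:
        handle.open_new_tab(url)


def read_text_file(name):
    """Return the file's contents, or None if the file does not exist."""
    if not os.path.isfile(name):
        # Show where the lookup happened to make a wrong folder obvious.
        print("No such file: %s (looked in %s)" % (name, os.getcwd()))
        return None
    with open(name, "r") as handle:
        return handle.read()
```

For example, open_pages(['http://youtube.com', 'http://google.com']) corresponds to the browser() function in the question, and read_text_file('file.txt') returns None instead of raising IOError when the file is missing.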
