Manipulating docm files with Python


I was looking for a way to manipulate docm files with Python and found a library named python-docx-docm.
I followed the documentation and tried a simple program:
import docx
# Path to the macro-enabled Word file (same path as in the traceback)
doc = docx.Document(r'C:\Users\clemdcz\Desktop\my_file.docm')
all_paras = doc.paragraphs
for para in all_paras:
    print(para.text)
    print("-----------")
This gives me the following error:
Traceback (most recent call last):
File "c:\Users\clemdcz\Desktop\Projet\intoWORD.py", line 3, in <module>
doc = docx.Document(r'C:\Users\clemdcz\Desktop\my_file.docm')
File "C:\Users\clemdcz\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\api.py", line 36, in Document
return document_part.document
AttributeError: 'Part' object has no attribute 'document'
If I try a docx file instead, it works fine and shows the correct data. How can I fix this error?
The documentation doesn't seem to give any information on docm files, but I read that it was supposed to work the same for both docm and docx. I couldn't find any other library that can manipulate docm files with Python.
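
This does not fix python-docx-docm itself, but as a possible workaround: a .docm file is a ZIP package just like a .docx, so the paragraph text can be read with the standard library alone. The sketch below assumes the main document part sits at the conventional word/document.xml path inside the package. The function name read_docm_paragraphs is hypothetical, and only plain text is extracted (no styles, no macros, and paragraphs nested in tables or text boxes may be reported more than once).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by both .docx and .docm files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_P = "{%s}p" % W_NS  # paragraph element
W_T = "{%s}t" % W_NS  # text element inside a run


def read_docm_paragraphs(path):
    """Return the plain text of every paragraph in a .docm/.docx file.

    Assumption: the main document part is stored at word/document.xml.
    """
    with zipfile.ZipFile(path) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_P):
        # Join the text of all runs that belong to this paragraph.
        pieces = [node.text or "" for node in para.iter(W_T)]
        paragraphs.append("".join(pieces))
    return paragraphs
```

It would be called as read_docm_paragraphs(r'C:\Users\clemdcz\Desktop\my_file.docm') and returns a list of strings that can be printed in a loop, as in the question.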

Related

OpenAI retrieve file content

I am unable to retrieve the content of a file that has already been uploaded.
What is going wrong? I have tried every file purpose: search, classification, answers, and fine-tune. The files upload successfully, but retrieving their content raises an error.
import openai
openai.api_key = "sk-bbjsjdjsdksbndsndksbdksbknsndksd" # this is wrong key
# Replace file_id with the file's id whose file content is required
content = openai.File.download("file-5Xs86wEDO5gx8fOitMYArV8r")
print(content)
Error:
Traceback (most recent call last):
File "main.py", line 6, in <module>
content = openai.File.download("file-5Xs86wEDO5gx8fOitMYArV8r")
File "/usr/local/lib/python3.8/dist-packages/openai/api_resources/file.py", line 61, in download
raise requestor.handle_error_response(
openai.error.InvalidRequestError: Not allowed to download files of purpose: classifications

Answer from OpenAI community
Currently, we only allow downloads on the results of fine-tuning runs and not the input files to the fine-tuning run. We also don’t allow downloads for search-related files.

Simple PyPDF exercise - AttributeError: 'NullObject' object has no attribute 'get'

I'm working on a simple PyPDF exercise: I need to take a PDF file and apply a watermark to it.
Here's my code:
# We need to build a program that will watermark all of our PDF files
# Use the wtr.pdf and apply it to all of the pages of our PDF file
import PyPDF2
# Open the file we want to add the watermark to
with open("combined.pdf", mode="rb") as file:
    reader = PyPDF2.PdfFileReader(file)
    # Open the watermark file and get the watermark
    with open("wtr.pdf", mode="rb") as watermark_file:
        watermark_reader = PyPDF2.PdfFileReader(watermark_file)
        # Create a writer object for the output file
        writer = PyPDF2.PdfFileWriter()
        for i in range(reader.numPages):
            page = reader.getPage(i)
            # Merge the watermark page object into our current page
            page.mergePage(watermark_reader.getPage(0))
            # Append this new page into our writer object
            writer.addPage(page)
        with open("watermarked.pdf", mode="wb") as output_file:
            writer.write(output_file)
I am unclear as to why I get this error:
$ python watermark.py
Traceback (most recent call last):
File "watermark.py", line 20, in <module>
page.mergePage(watermark_reader.getPage(0))
File "C:\Python38\lib\site-packages\PyPDF2\pdf.py", line 2239, in mergePage
self._mergePage(page2)
File "C:\Python38\lib\site-packages\PyPDF2\pdf.py", line 2260, in _mergePage
new, newrename = PageObject._mergeResources(originalResources, page2Resources, res)
File "C:\Python38\lib\site-packages\PyPDF2\pdf.py", line 2170, in _mergeResources
newRes.update(res1.get(resource, DictionaryObject()).getObject())
AttributeError: 'NullObject' object has no attribute 'get'
I would appreciate any insights. I have been staring at this for a while.

For some reason your PDF file doesn't contain "/Resources". PyPDF2 tries to get it at line 2314 of https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pdf.py#L2314
You can try another PDF file to check whether the error persists. Maybe it is a bug in the library, or the library doesn't support such files.
Another thing I noticed is that the line numbers in the master branch of the library do not match the line numbers in your stack trace, so you may need to get a more recent version of the library and hope that the problem is fixed there.
From a brief look at the PDF file structure, it seems that /Resources is optional. If that is the case, PyPDF2 doesn't handle it, and it should probably be reported as a bug at https://github.com/mstamy2/PyPDF2/issues

python-pptx: Dealing with password-protected PowerPoint files

I'm using a slightly modified version of the "Extract all text from slides in presentation" example at https://python-pptx.readthedocs.io/en/latest/user/quickstart.html to extract text from some PowerPoint slides.
I'm getting a PackageNotFoundError when I try to use the Presentation() method to open some of the PowerPoint files to read the text.
This appears to be due to the fact that, unbeknownst to me before I started this project, a few of the PowerPoint files are password protected.
I obviously don't expect to be able to read text from a password-protected file but is there a recommended way of dealing with password-protected PowerPoint files? Having my Python script die every time it runs into one is annoying.
I'd be fine with something that basically went: "Hi! The file you're trying to read may be password-protected. Skipping."
I tried using a try/except block to catch the PackageNotFoundError but then I got "NameError: name 'PackageNotFoundError' is not defined".
EDIT1: Here's a minimal case that generates the error:
EDIT2: See below for a working try/except block, thanks to TheGamer007's suggestion.
import pptx
from pptx import Presentation
password_protected_file = r"C:\Users\J69401\Documents\password_protected_file.pptx"
prs = Presentation(password_protected_file)
And here's the error that is generated:
Traceback (most recent call last):
File "T:/W/Wintermute/50 Sandbox/Pownall/Python/copy files/minimal_case_opening_file.py", line 6, in <module>
prs = Presentation(password_protected_file)
File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\api.py", line 28, in Presentation
presentation_part = Package.open(pptx).main_document_part
File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\package.py", line 125, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\pkgreader.py", line 33, in from_file
phys_reader = PhysPkgReader(pkg_file)
File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\phys_pkg.py", line 32, in __new__
raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
pptx.exc.PackageNotFoundError: Package not found at 'C:\Users\J69401\Documents\password_protected_file.pptx'
Here's the minimal case again but with a working try/except block.
import pptx
from pptx import Presentation
import pptx.exc
from pptx.exc import PackageNotFoundError
password_protected_file = r"C:\Users\J69401\Documents\password_protected_file.pptx"
try:
    prs = Presentation(password_protected_file)
except PackageNotFoundError:
    print("PackageNotFoundError generated - possible password-protected file.")

How to compress a text file?

I have a text file created and I want to compress it.
How would I accomplish this?
I did some research around the forum and found a similar question, but when I tried its code it did not work for me, because it compresses text typed into the script rather than a file. For example:
import zlib, base64
text = 'STACK OVERFLOW'
code = base64.b64encode(zlib.compress(text,9))
print code
Source: Compressing a file in python and keep the grammar exact when opening it again
When I tried it out with a file, this error came up:
Traceback (most recent call last):
File "C:\Users\Shahid\Desktop\Suhail\Task 3.py", line 3, in <module>
code = base64.b64encode(zlib.compress(text,9))
TypeError: must be string or read-only buffer, not file
Here is the code that I have used:
import zlib, base64
text = open('Suitable.txt','r')
code = base64.b64encode(zlib.compress(text,9))
print code
But what I want is to compress a text file.

There is a section entitled "Example of how to GZIP compress an existing file" at the bottom of https://docs.python.org/2/library/gzip.html
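
A minimal Python 3 sketch along the lines of that documentation section, assuming the goal is a .gz file written next to the original. The helper name gzip_file is hypothetical; gzip.open and shutil.copyfileobj are standard-library calls.

```python
import gzip
import shutil


def gzip_file(source_path, target_path=None):
    """Compress an existing file with gzip and return the output path.

    If target_path is omitted, ".gz" is appended to source_path.
    """
    if target_path is None:
        target_path = source_path + ".gz"
    # Read and write in binary mode so the bytes are copied unchanged.
    with open(source_path, "rb") as source, gzip.open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return target_path
```

Calling gzip_file('Suitable.txt') would write Suitable.txt.gz; reading it back with gzip.open(..., 'rb') returns the original bytes.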

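You should use this code to do what you tried:
import zlib, base64
# Read the whole file into a string; the with block closes the file for us
with open('Suitable.txt', 'r') as file:
    text = file.read()
# zlib works on bytes, so encode the text before compressing it
code = base64.b64encode(zlib.compress(text.encode('utf-8'), 9))
code = code.decode('utf-8')
print(code)
But the text won't actually end up compressed if code is longer than text, which is what happens with a short input.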
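
To see when the zlib plus Base64 approach pays off, here is a small sketch that compares lengths and checks that the text survives a round trip. The helper names pack_text and unpack_text and the sample strings are made up for illustration.

```python
import base64
import zlib


def pack_text(text):
    """Compress text with zlib, then Base64-encode it into an ASCII string."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return base64.b64encode(compressed).decode("ascii")


def unpack_text(code):
    """Reverse pack_text: Base64-decode, then decompress back to text."""
    return zlib.decompress(base64.b64decode(code)).decode("utf-8")


short_text = "STACK OVERFLOW"
long_text = "STACK OVERFLOW " * 200

# Short input: zlib's header and checksum outweigh any savings,
# and Base64 adds roughly another third on top.
print(len(short_text), len(pack_text(short_text)))

# Long, repetitive input: the packed form is much smaller than the original.
print(len(long_text), len(pack_text(long_text)))
```

If the result does not have to be printable text, skipping the Base64 step and writing the compressed bytes to a file in binary mode avoids that extra overhead.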

Importing data from JSON using python

Hi, I am new to Python and I am trying to import a dataset from a JSON file in the repository:
import json
with open('dataforms.json', 'r') as f:
    data = json.load(f)
for row in data:
    print(row[Flood])
This code throws the following error:
Traceback (most recent call last):
File "C:\Users\Ayush\Desktop\js2.py", line 5, in <module>
print (row[Flood])
NameError: name 'Flood' is not defined

I'm assuming Flood is a string? In which case you need to put quotes around it, or Python thinks it is a variable name.
print (row['Flood'])
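
A slightly fuller sketch, assuming the file holds a list of objects. The contents of dataforms.json are not shown in the question, so the sample string and the "Year" key below are invented, and the helper name column is hypothetical. Using dict.get instead of square brackets also avoids a KeyError when a row has no "Flood" entry.

```python
import json

# Hypothetical sample standing in for the contents of dataforms.json.
sample = '[{"Flood": "yes", "Year": 2019}, {"Year": 2020}]'


def column(rows, key, default=None):
    """Return the value stored under key in each row, or default if absent."""
    return [row.get(key, default) for row in rows]


data = json.loads(sample)
print(column(data, "Flood"))  # the key is a string, so it is quoted
```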
