OpenAI retrieve file content - python

I am unable to retrieve the content of a file that has already been uploaded.
What is going wrong? I have tried each type of file: search, classification, answers, and fine-tune. The files upload successfully, but retrieving their content raises an error.
import openai
openai.api_key = "sk-bbjsjdjsdksbndsndksbdksbknsndksd" # this is wrong key
# Replace file_id with the file's id whose file content is required
content = openai.File.download("file-5Xs86wEDO5gx8fOitMYArV8r")
print(content)
Error:
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    content = openai.File.download("file-5Xs86wEDO5gx8fOitMYArV8r")
  File "/usr/local/lib/python3.8/dist-packages/openai/api_resources/file.py", line 61, in download
    raise requestor.handle_error_response(
openai.error.InvalidRequestError: Not allowed to download files of purpose: classifications

Answer from OpenAI community
Currently, we only allow downloads on the results of fine-tuning runs and not the input files to the fine-tuning run. We also don’t allow downloads for search-related files.

Related

Manipulating docm files with Python

I was looking for a way to manipulate docm files with Python and found a library named python-docx-docm.
I followed the documentation and tried a simple program:
import docx
doc = docx.Document(r'C:\Users\clemdcz\Desktop\my_file.docm')
all_paras = doc.paragraphs
for para in all_paras:
    print(para.text)
    print("-----------")
This gives me the following error:
Traceback (most recent call last):
  File "c:\Users\clemdcz\Desktop\Projet\intoWORD.py", line 3, in <module>
    doc = docx.Document(r'C:\Users\clemdcz\Desktop\my_file.docm')
  File "C:\Users\clemdcz\AppData\Local\Programs\Python\Python310\lib\site-packages\docx\api.py", line 36, in Document
    return document_part.document
AttributeError: 'Part' object has no attribute 'document'
If I then try with a docx file, it works fine and shows me the correct data. How can I fix this error?
The documentation doesn't seem to give any information on docm files, but I read that it is supposed to work the same for both docm and docx. I couldn't find any other libraries that can manipulate docx files with Python.
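As a possible fallback that is not from the original thread: a .docm file, like a .docx file, is a ZIP package containing XML parts, so the paragraph text can be read with the standard library alone. The sketch below assumes the main document part lives at the conventional path word/document.xml (strictly, the package relationships decide this), and the function name docm_paragraphs is made up for illustration. It only extracts text and does not replace a full document API.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docm_paragraphs(path_or_file):
    """Return the text of every paragraph in the main document part.

    Assumption: the main part is stored at 'word/document.xml'.
    """
    with zipfile.ZipFile(path_or_file) as package:
        xml_bytes = package.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # <w:p> is a paragraph; its text is spread over <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


# Usage (hypothetical path):
# for text in docm_paragraphs(r"C:\Users\clemdcz\Desktop\my_file.docm"):
#     print(text)
#     print("-----------")
```

Because this walks every paragraph element in the part, paragraphs inside tables are included too, which may or may not match what doc.paragraphs would list.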

Download File From Shared Dropbox Using Dropbox ApiError

I'm looking to download a CSV file that is in a shared Dropbox folder. The code that I currently have gives me an ApiError. Full code and error below:
My Code:
import dropbox
ACCESS_TOKEN = '***MY ACCESS_TOKEN***'
dbx = dropbox.Dropbox(ACCESS_TOKEN)
url = "https://www.dropbox.com/sh/s8vwbg46zjsg3rw/AAC0T1BhIgfp5BfH_sJ_Vnb1a?dl=0"
file = "https://www.dropbox.com/sh/s8vwbg46zjsg3rw/AAC0T1BhIgfp5BfH_sJ_Vnb1a?dl=0&preview=Stock+List+2021-03-08.csv"
md, res = dbx.sharing_get_shared_link_file(url=file)
print(md)
print(res)
Error:
Traceback (most recent call last):
  File "D:\***\PyCharm\Furniture\Test 1\dropbox_test.py", line 10, in <module>
    md, res = dbx.sharing_get_shared_link_file(url=file)
  File "C:\Python39\lib\site-packages\dropbox\base.py", line 4181, in sharing_get_shared_link_file
    r = self.request(
  File "C:\Python39\lib\site-packages\dropbox\dropbox_client.py", line 346, in request
    raise ApiError(res.request_id,
dropbox.exceptions.ApiError: ApiError('90075839f9f94c53a112a48692314d4f', GetSharedLinkFileError('shared_link_is_directory', None))
Any help would be great. I have also tried files_download and I also get an error.
The "https://www.dropbox.com/sh/s8vwbg46zjsg3rw/AAC0T1BhIgfp5BfH_sJ_Vnb1a?dl=0..." link itself points to a folder, not a particular file (whether or not you have the preview parameter on it).
Here are two ways you can make this work:
1. Supply the path parameter on sharing_get_shared_link_file to specify the file in the folder you want:
   md, res = dbx.sharing_get_shared_link_file(url=url, path="/Stock List 2021-03-08.csv")
2. Use the actual link to the specific file (which I retrieved manually via the shared link page):
   file = "https://www.dropbox.com/sh/s8vwbg46zjsg3rw/AABhEIN92e98iufhllgVuIvga/Stock%20List%202021-03-08.csv?dl=0"
   md, res = dbx.sharing_get_shared_link_file(url=file)
Also, if the file is in the connected account for the access token you're using, you should certainly be able to use files_download to download it. Feel free to open another question with the details of that issue if you wish.
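If you only have the folder link with a preview parameter, as in the question, the folder URL and the path argument can be derived from it with the standard library. This is a minimal sketch that assumes the preview parameter holds the name of a file at the top level of the shared folder; split_preview_link is a hypothetical helper, not part of the Dropbox SDK.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def split_preview_link(link):
    """Split a shared-folder link into (folder_url, path_in_folder).

    Assumption: 'preview' names a file directly inside the shared folder.
    """
    parts = urlsplit(link)
    query = parse_qs(parts.query)  # also decodes '+' and %XX escapes
    names = query.pop("preview", [])
    if not names:
        raise ValueError("link has no 'preview' parameter")
    # Rebuild the folder URL without the preview parameter.
    folder_url = urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
    return folder_url, "/" + names[0]


# Usage with the client from the question:
# folder_url, path = split_preview_link(file)
# md, res = dbx.sharing_get_shared_link_file(url=folder_url, path=path)
```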

Simple PyPDF exercise - AttributeError: 'NullObject' object has no attribute 'get'

I'm working on a simple PyPDF-related exercise: I need to take a PDF file and apply a watermark to it.
Here's my code:
# We need to build a program that will watermark all of our PDF files
# Use the wtr.pdf and apply it to all of the pages of our PDF file
import PyPDF2
# Open the file we want to add the watermark to
with open("combined.pdf", mode="rb") as file:
    reader = PyPDF2.PdfFileReader(file)
    # Open the watermark file and get the watermark
    with open("wtr.pdf", mode="rb") as watermark_file:
        watermark_reader = PyPDF2.PdfFileReader(watermark_file)
        # Create a writer object for the output file
        writer = PyPDF2.PdfFileWriter()
        for i in range(reader.numPages):
            page = reader.getPage(i)
            # Merge the watermark page object into our current page
            page.mergePage(watermark_reader.getPage(0))
            # Append this new page into our writer object
            writer.addPage(page)
        with open("watermarked.pdf", mode="wb") as output_file:
            writer.write(output_file)
I am unclear as to why I get this error:
$ python watermark.py
Traceback (most recent call last):
  File "watermark.py", line 20, in <module>
    page.mergePage(watermark_reader.getPage(0))
  File "C:\Python38\lib\site-packages\PyPDF2\pdf.py", line 2239, in mergePage
    self._mergePage(page2)
  File "C:\Python38\lib\site-packages\PyPDF2\pdf.py", line 2260, in _mergePage
    new, newrename = PageObject._mergeResources(originalResources, page2Resources, res)
  File "C:\Python38\lib\site-packages\PyPDF2\pdf.py", line 2170, in _mergeResources
    newRes.update(res1.get(resource, DictionaryObject()).getObject())
AttributeError: 'NullObject' object has no attribute 'get'
I would appreciate any insights. I have been staring at this for a while.
For some reason, your PDF file doesn't contain "/Resources". PyPDF2 tries to read it at line 2314 of https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pdf.py#L2314
You can try another PDF file to check whether the error persists. It may be a bug in the library, or the library may not support such files.
I also noticed that the line numbers in the master branch of the library do not match the line numbers in your stack trace, so you may need a more recent version of the library, in the hope that the problem is fixed there.
From a brief look at the PDF file structure, /Resources appears to be optional. If that is the case, PyPDF2 doesn't handle it, and it should probably be reported as a bug at https://github.com/mstamy2/PyPDF2/issues

Check file before calling SimpleITK.SimpleITK.ImageFileReader.ReadImageInformation()

I am processing a set of DICOM files, some of which have image information and some of which don't. If a file has image information, the following code works fine.
import SimpleITK as sitk
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(fileName)
file_reader.ReadImageInformation()
However, if the file does not have image information, I get the following error.
Traceback (most recent call last):
  File "<ipython-input-61-d187aed107ed>", line 5, in <module>
    file_reader.ReadImageInformation()
  File "/home/peter/anaconda3/lib/python3.7/site-packages/SimpleITK/SimpleITK.py", line 8673, in ReadImageInformation
    return _SimpleITK.ImageFileReader_ReadImageInformation(self)
RuntimeError: Exception thrown in SimpleITK ImageFileReader_ReadImageInformation: /tmp/SimpleITK/Code/IO/src/sitkImageReaderBase.cxx:107:
sitk::ERROR: Unable to determine ImageIO reader for "/path/115.dcm"
If the DICOM file has no image information, I would like to just skip the file rather than call ReadImageInformation(). Is there a way to check whether ReadImageInformation() will work before it is called? I tried the following calls, but their results are no different between files where ReadImageInformation() works and files where it does not.
file_reader.GetImageIO()
file_reader.GetMetaDataKeys() # Crashes
file_reader.GetDimension()
I would just put an exception handler around it to catch the error. It would look something like this:
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(fileName)
try:
    file_reader.ReadImageInformation()
except RuntimeError:
    # The traceback above shows SimpleITK raising RuntimeError for such files
    print(fileName, "has no image information")
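For a whole set of DICOM files, the same idea can be wrapped in a small filter. The sketch below is only illustrative: split_by_image_info and sitk_read_info are hypothetical names, and the reader function is passed in as a parameter so the filtering logic can be exercised without SimpleITK installed.

```python
def split_by_image_info(file_names, read_info):
    """Partition file names by whether read_info(name) succeeds.

    read_info is any callable that raises RuntimeError for files
    without image information (as ReadImageInformation() does above).
    """
    readable, skipped = [], []
    for name in file_names:
        try:
            read_info(name)
        except RuntimeError:
            skipped.append(name)
        else:
            readable.append(name)
    return readable, skipped


def sitk_read_info(file_name):
    """Read only the header of one file with SimpleITK."""
    import SimpleITK as sitk  # imported lazily; requires SimpleITK

    file_reader = sitk.ImageFileReader()
    file_reader.SetFileName(file_name)
    file_reader.ReadImageInformation()


# Usage (dicom_files is a hypothetical list of paths):
# readable, skipped = split_by_image_info(dicom_files, sitk_read_info)
```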

python-pptx: Dealing with password-protected PowerPoint files

I'm using a slightly modified version of the "Extract all text from slides in presentation" example at https://python-pptx.readthedocs.io/en/latest/user/quickstart.html to extract text from some PowerPoint slides.
I'm getting a PackageNotFoundError when I try to use the Presentation() method to open some of the PowerPoint files to read the text.
This appears to be due to the fact that, unbeknownst to me before I started this project, a few of the PowerPoint files are password protected.
I obviously don't expect to be able to read text from a password-protected file but is there a recommended way of dealing with password-protected PowerPoint files? Having my Python script die every time it runs into one is annoying.
I'd be fine with something that basically went: "Hi! The file you're trying to read may be password-protected. Skipping."
I tried using a try/except block to catch the PackageNotFoundError but then I got "NameError: name 'PackageNotFoundError' is not defined".
EDIT1: Here's a minimal case that generates the error:
EDIT2: See below for a working try/catch block, thanks to TheGamer007's suggestion.
import pptx
from pptx import Presentation
password_protected_file = r"C:\Users\J69401\Documents\password_protected_file.pptx"
prs = Presentation(password_protected_file)
And here's the error that is generated:
Traceback (most recent call last):
  File "T:/W/Wintermute/50 Sandbox/Pownall/Python/copy files/minimal_case_opening_file.py", line 6, in <module>
    prs = Presentation(password_protected_file)
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\api.py", line 28, in Presentation
    presentation_part = Package.open(pptx).main_document_part
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\package.py", line 125, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\pkgreader.py", line 33, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\phys_pkg.py", line 32, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
pptx.exc.PackageNotFoundError: Package not found at 'C:\Users\J69401\Documents\password_protected_file.pptx'
Here's the minimal case again, but with a working try/except block.
from pptx import Presentation
# The exception class lives in pptx.exc and must be imported before it can be caught
from pptx.exc import PackageNotFoundError
password_protected_file = r"C:\Users\J69401\Documents\password_protected_file.pptx"
try:
    prs = Presentation(password_protected_file)
except PackageNotFoundError:
    print("PackageNotFoundError generated - possible password-protected file.")
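To tell a likely password-protected file apart from other causes of the error, one option is to look at the container format first. A normal .pptx is a ZIP package, whereas password-protected Office files are stored in an OLE compound file, which starts with a fixed 8-byte signature. The sketch below uses only the standard library; classify_office_file is a hypothetical helper, and an "ole" result can also mean a legacy binary file (such as a .ppt renamed to .pptx), so treat it as a hint rather than proof.

```python
import zipfile

# Signature at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_office_file(path):
    """Return 'ole', 'zip', or 'unknown' for the file at path."""
    with open(path, "rb") as handle:
        header = handle.read(len(OLE_MAGIC))
    if header == OLE_MAGIC:
        return "ole"  # encrypted or legacy binary Office file
    if zipfile.is_zipfile(path):
        return "zip"  # ordinary Open XML package
    return "unknown"


# Usage with the variable from the example above:
# if classify_office_file(password_protected_file) == "ole":
#     print("The file may be password-protected. Skipping.")
```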
