Converting .xlsx and .xls (latest versions) to PDF using Python

Following the approach from the linked ".doc to pdf using python" question, I am trying to do the same for Excel files (.xlsx and .xls formats).
This is the modified code for Excel:
import os
from win32com import client
folder = "C:\\Oprance\\Excel\\XlsxWriter-0.5.1"
file_type = 'xlsx'
out_folder = folder + "\\PDF_excel"
os.chdir(folder)
if not os.path.exists(out_folder):
    print 'Creating output folder...'
    os.makedirs(out_folder)
    print out_folder, 'created.'
else:
    print out_folder, 'already exists.\n'
for files in os.listdir("."):
    if files.endswith(".xlsx"):
        print files
print '\n\n'
word = client.DispatchEx("Excel.Application")
for files in os.listdir("."):
    if files.endswith(".xlsx") or files.endswith('xls'):
        out_name = files.replace(file_type, r"pdf")
        in_file = os.path.abspath(folder + "\\" + files)
        out_file = os.path.abspath(out_folder + "\\" + out_name)
        doc = word.Workbooks.Open(in_file)
        print 'Exporting', out_file
        doc.SaveAs(out_file, FileFormat=56)
        doc.Close()
It shows the following error:
>>> execfile('excel_to_pdf.py')
Creating output folder...
C:\Excel\XlsxWriter-0.5.1\PDF_excel created.
apms_trial.xlsx
~$apms_trial.xlsx
Exporting C:\Excel\XlsxWriter-0.5.1\PDF_excel\apms_trial.pdf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "excel_to_pdf.py", line 30, in <module>
doc = word.Workbooks.Open(in_file)
File "<COMObject <unknown>>", line 8, in Open
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, u'Microsoft Excel
', u"Excel cannot open the file '~$apms_trial.xlsx' because the file format or f
ile extension is not valid. Verify that the file has not been corrupted and that
the file extension matches the format of the file.", u'xlmain11.chm', 0, -21468
27284), None)
>>>
The problem seems to be in this line:
    doc.SaveAs(out_file, FileFormat=56)
What should the FileFormat value be for PDF?
Please help.

XlsxWriter documentation:
https://xlsxwriter.readthedocs.org/en/latest/contents.html
With it you can generate an Excel file (note that XlsxWriter itself writes only the .xlsx format).
For example, say the generated Excel file is named trial.xls.
If you then want to generate a PDF of that Excel file, do the following:
from win32com import client
xlApp = client.Dispatch("Excel.Application")
books = xlApp.Workbooks.Open('C:\\excel\\trial.xls')
ws = books.Worksheets[0]
ws.Visible = 1
ws.ExportAsFixedFormat(0, 'C:\\excel\\trial.pdf')

I hit the same problem and the same error. The answer is FileFormat=57; see below.
from win32com import client
import win32api
def exceltopdf(doc):
    excel = client.DispatchEx("Excel.Application")
    excel.Visible = 0
    wb = excel.Workbooks.Open(doc)
    ws = wb.Worksheets[1]
    try:
        wb.SaveAs('c:\\targetfolder\\result.pdf', FileFormat=57)
    except Exception as e:
        print("Failed to convert")
        print(str(e))
    finally:
        wb.Close()
        excel.Quit()
... as an alternative to the fragile ExportAsFixedFormat...
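The traceback in the question names `~$apms_trial.xlsx`, the temporary lock file Excel creates while a workbook is open, which the directory listing picked up along with the real workbook. A minimal sketch of filtering such files out and deriving the PDF name from the extension (rather than a string replace, which leaves `.xls` names unchanged) might look like this. The helper names `is_convertible` and `pdf_name` are hypothetical and not part of any library:

```python
import os

EXCEL_EXTENSIONS = ('.xlsx', '.xls')


def is_convertible(file_name):
    """Return True for Excel workbooks, skipping '~$' lock files."""
    if file_name.startswith('~$'):
        return False
    return file_name.lower().endswith(EXCEL_EXTENSIONS)


def pdf_name(file_name):
    """Swap the extension for .pdf without touching the rest of the name."""
    base, _ext = os.path.splitext(file_name)
    return base + '.pdf'


if __name__ == '__main__':
    names = ['apms_trial.xlsx', '~$apms_trial.xlsx', 'old_report.xls', 'notes.txt']
    for name in names:
        if is_convertible(name):
            print(name, '->', pdf_name(name))
```

Only the names that pass `is_convertible` would then be handed to `Workbooks.Open`.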

You can print an Excel sheet to PDF on Linux using Python.
You need to run OpenOffice as a headless server and use unoconv; it takes a bit of configuring but is doable.
You run OpenOffice as a (service) daemon and use it to convert xls, xlsx, doc and docx files.
http://dag.wiee.rs/home-made/unoconv/
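A minimal sketch of driving unoconv from Python with `subprocess`, assuming the `unoconv` executable is installed and on the PATH. The function names and the example paths are hypothetical:

```python
import subprocess


def build_unoconv_command(input_path, output_dir):
    """Build the unoconv command line: -f picks the format, -o the output location."""
    return ['unoconv', '-f', 'pdf', '-o', output_dir, input_path]


def convert_to_pdf(input_path, output_dir):
    """Run unoconv and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_unoconv_command(input_path, output_dir), check=True)


if __name__ == '__main__':
    # Hypothetical paths; this only prints the command without running it.
    print(' '.join(build_unoconv_command('/tmp/trial.xlsx', '/tmp/pdf_out')))
```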

Another solution is to start a Gotenberg Docker container locally:
https://github.com/gotenberg/gotenberg
Then pass any file supported by LibreOffice from Python via HTTP to the container and get the result back as a PDF.
import logging
from time import sleep
import requests
LIBREOFFICE_URL = 'http://localhost:3000/forms/libreoffice/convert'
LIBREOFFICE_LANDSCAPE_URL = 'http://localhost:3000/forms/libreoffice/convert?landscape=1'
def _retry_gotenberg(url, io_bytes, post_file_name='index.html'):
    response = None
    for _ in range(5):
        response = requests.post(url, files={post_file_name: io_bytes})
        if response.status_code == 200:
            break
        logging.info('Will sleep and retry: %s %s', response.status_code, response.content)
        sleep(3)
    if response is None or response.status_code != 200:
        raise RuntimeError(f'Bad response from doc-to-pdf: {response.status_code} {response.content}')
    return response
def process_libreoffice(io_bytes, ext: str):
    # ext is expected to include the leading dot, e.g. '.xlsx'
    if ext in ('.doc', '.docx'):
        url = LIBREOFFICE_URL
    else:
        url = LIBREOFFICE_LANDSCAPE_URL
    response = _retry_gotenberg(url, io_bytes, post_file_name=f'file{ext}')
    return response.content

The GroupDocs.Conversion Cloud SDK for Python is another option for converting Excel to PDF. It is a paid API, but it provides 150 free API calls per month.
P.S.: I'm a developer evangelist at GroupDocs.
# Import module
import groupdocs_conversion_cloud
from shutil import copyfile
# Get your client_id and client_key at https://dashboard.groupdocs.cloud (free registration is required).
client_id = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx"
client_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Create instance of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(client_id, client_key)
try:
    # Convert XLSX to PDF
    # Prepare request
    request = groupdocs_conversion_cloud.ConvertDocumentDirectRequest("pdf", "C:/Temp/Book1.xlsx")
    # Convert
    result = convert_api.convert_document_direct(request)
    copyfile(result, 'C:/Temp/Book1_output.pdf')
    print("Result {}".format(result))
except groupdocs_conversion_cloud.ApiException as e:
    print("Exception when calling convert_document_direct: {0}".format(e.message))

Related

SharePoint Excel URL to Python Pandas DataFrame -Streamlit

Issue: uploading a large file to Streamlit runs into file-size limits, so I need a workaround.
Is there a way to create a pandas DataFrame from just a SharePoint file URL?
I solved it for a Google Drive URL but cannot figure out SharePoint.
Potential solution: create a URL link from SharePoint and load the Excel/CSV file as a pandas DataFrame. This is what works for Google Drive:
import pandas as pd
url = 'google drive url'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path)
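The Google Drive rewrite relies on the file ID being the second-to-last path segment of the share link. A small sketch of that step on its own, assuming a share link of the form `https://drive.google.com/file/d/<id>/view?...`; the function name and the sample ID are made up for illustration:

```python
def drive_download_url(share_url):
    """Turn a Google Drive share link into a direct-download URL."""
    file_id = share_url.split('/')[-2]
    return 'https://drive.google.com/uc?export=download&id=' + file_id


if __name__ == '__main__':
    print(drive_download_url('https://drive.google.com/file/d/abc123XYZ/view?usp=sharing'))
```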

Yes, you can use https://github.com/vgrem/Office365-REST-Python-Client
import os
import tempfile
download_path = os.path.join(tempfile.mkdtemp(), os.path.basename(FILE_URL))
with open(download_path, "wb") as local_file:
    ctx.web.get_file_by_server_relative_url(FILE_URL).download(local_file).execute_query()
Then read from download_path:
df = pd.read_csv(download_path)
Don't forget to delete the temp file afterwards!
The library is great; you can also read the SharePoint file directly as bytes.
For example:
from io import StringIO
from office365.sharepoint.files.file import File
def read_csv(ctx, relative_url, pandas=False):
    # relative_url = "/sites/myLib/Folder/test.csv" #TEST
    # ctx = auth()
    response = File.open_binary(ctx, relative_url)
    bytes_data = response.content
    try:
        s = str(bytes_data, 'utf8')
    except Exception as e:
        print('utf8 encoding error')
        print(relative_url, e)
        try:
            s = str(bytes_data, 'cp1252')
        except Exception as e:
            print('CRITICAL ERROR cp1252 encoding error')
            print(relative_url, e)
            raise  # neither encoding worked, so there is nothing to return
    if not pandas:
        return s
    return StringIO(s)
I use the pandas flag because my final code looks like this:
df= pd.read_csv(read_csv(ctx=ctx, relative_url=FILE_URL, pandas=True), dtype=str, keep_default_na=False) # read master qrd db
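The decoding step can also be written as a loop over candidate encodings. This is a sketch of the same idea in isolation, without the SharePoint call; the function names are hypothetical, and the list of encodings is simply the two used above:

```python
from io import StringIO


def decode_with_fallback(data, encodings=('utf-8', 'cp1252')):
    """Try each encoding in turn and return the first successful decode."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError('could not decode data with any of: %s' % ', '.join(encodings))


def as_text_buffer(data):
    """Wrap the decoded text in StringIO so pandas.read_csv can consume it."""
    return StringIO(decode_with_fallback(data))


if __name__ == '__main__':
    # These bytes are not valid UTF-8, so the cp1252 fallback is used.
    print(decode_with_fallback('café'.encode('cp1252')))
```

Raising when every encoding fails avoids returning an undefined value to the caller.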

Django: How do I download .xls file through a django view

I have a button that downloads an Excel file with the .xls extension. I am using the xlrd module to parse the file and return it to the user. However, the downloaded file contains the object's name instead of the data.
How can I return the file to the user with the data rather than the object's name?
View
def download_file(self, testname):
    import csv, socket, os, xlrd
    extension = '.xls'
    path = r"C:\tests\{}_Report{}".format(testname, extension)
    try:
        f = xlrd.open_workbook(path)
        response = HttpResponse(f, content_type='application/ms-excel')
        response['Content-Disposition'] = 'attachment; filename={}_Report{}'.format(testname, extension)
        return response
    except Exception as Error:
        return HttpResponse(Error)
    return redirect('emissions_dashboard:overview_view_record')
Excel result (screenshots not preserved): the download succeeds, but the file content is the object's name.
Note: I understand this is an old file format but is required for this particular project.

You are trying to send an xlrd.book.Book object, not a file.
Use xlrd for whatever you need to do with the workbook. Note that xlrd only reads; a Book object has no save method, so any changes have to be written out with another library such as xlwt or xlutils.copy.
workbook = xlrd.open_workbook(path)
#... do something
Then send the file on disk like any other file:
with open(path, 'rb') as f:
    response = HttpResponse(f.read(), content_type="application/ms-excel")
response['Content-Disposition'] = 'attachment; filename={}_Report{}'.format(testname, extension)
return response
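The distinction can be sketched without Django: what goes into the response body is the raw bytes read from disk, not the parsed workbook object. The helper below is hypothetical. It returns the body plus headers that a view could hand to `HttpResponse`, and it uses `application/vnd.ms-excel`, the registered MIME type for .xls, rather than the `application/ms-excel` from the original:

```python
import os
import tempfile


def build_xls_download(path, download_name):
    """Read the file as bytes and build headers for an attachment download."""
    with open(path, 'rb') as f:
        body = f.read()
    headers = {
        'Content-Type': 'application/vnd.ms-excel',
        'Content-Disposition': 'attachment; filename={}'.format(download_name),
    }
    return body, headers


if __name__ == '__main__':
    # Demo with a throwaway file standing in for a real .xls report.
    with tempfile.NamedTemporaryFile(suffix='.xls', delete=False) as tmp:
        tmp.write(b'fake xls bytes')
    body, headers = build_xls_download(tmp.name, 'demo_Report.xls')
    print(len(body), headers['Content-Disposition'])
    os.remove(tmp.name)
```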

Error when converting Excel document to pdf using comtypes in Python

I am trying to convert an Excel spreadsheet to PDF using Python and the comtypes package using this code:
import os
import comtypes.client
FORMAT_PDF = 17
SOURCE_DIR = 'C:/Users/IEUser/Documents/jscript/test/resources/root3'
TARGET_DIR = 'C:/Users/IEUser/Documents/jscript'
app = comtypes.client.CreateObject('Excel.Application')
app.Visible = False
infile = os.path.join(os.path.abspath(SOURCE_DIR), 'spreadsheet1.xlsx')
outfile = os.path.join(os.path.abspath(TARGET_DIR), 'spreadsheet1.pdf')
doc = app.Workbooks.Open(infile)
doc.SaveAs(outfile, FileFormat=FORMAT_PDF)
doc.Close()
app.Quit()
This script above runs fine and the pdf file is created, but when I try to open it I get the error "The file cannot be opened - there is a problem with the file format" (but after closing this error dialog it is actually possible to preview the pdf file). I have tried a similar script to convert Word documents to pdfs and this worked just fine.
Any ideas on how I can resolve this problem with the file format error?

Found a solution - this seems to be working:
import os
import comtypes.client
SOURCE_DIR = 'C:/Users/IEUser/Documents/jscript/test/resources/root3'
TARGET_DIR = 'C:/Users/IEUser/Documents/jscript'
app = comtypes.client.CreateObject('Excel.Application')
app.Visible = False
infile = os.path.join(os.path.abspath(SOURCE_DIR), 'spreadsheet1.xlsx')
outfile = os.path.join(os.path.abspath(TARGET_DIR), 'spreadsheet1.pdf')
doc = app.Workbooks.Open(infile)
doc.ExportAsFixedFormat(0, outfile, 1, 0)
doc.Close()
app.Quit()
This link may also be helpful as inspiration for the arguments to the ExportAsFixedFormat function: Document.ExportAsFixedFormat Method (although some of the argument values have to be modified a bit).
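For reference, the positional arguments of Excel's `Workbook.ExportAsFixedFormat` start with `Type, Filename, Quality, IncludeDocProperties`. The sketch below spells out the enum values, assuming Windows with Excel and comtypes installed. The constant and function names are illustrative; the numeric values are those of Excel's `XlFixedFormatType` and `XlFixedFormatQuality` enumerations:

```python
XL_TYPE_PDF = 0          # XlFixedFormatType.xlTypePDF
XL_TYPE_XPS = 1          # XlFixedFormatType.xlTypeXPS
XL_QUALITY_STANDARD = 0  # XlFixedFormatQuality.xlQualityStandard
XL_QUALITY_MINIMUM = 1   # XlFixedFormatQuality.xlQualityMinimum


def export_args(outfile, quality=XL_QUALITY_STANDARD, include_doc_properties=True):
    """Positional arguments for ExportAsFixedFormat, in the order Excel expects."""
    return (XL_TYPE_PDF, outfile, quality, include_doc_properties)


def excel_to_pdf(infile, outfile):
    """Convert one workbook to PDF via COM (Windows with Excel installed only)."""
    import comtypes.client  # imported lazily so the helpers above work anywhere
    app = comtypes.client.CreateObject('Excel.Application')
    app.Visible = False
    try:
        doc = app.Workbooks.Open(infile)
        try:
            doc.ExportAsFixedFormat(*export_args(outfile))
        finally:
            doc.Close()
    finally:
        app.Quit()
```

Read this way, the call `ExportAsFixedFormat(0, outfile, 1, 0)` selects PDF output at minimum quality and leaves the document properties out.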

You need to call ExportAsFixedFormat(0, output_file) to save the workbook in PDF format. The solution from http://thequickblog.com/convert-an-excel-filexlsx-to-pdf-python/ works for me.
from win32com import client
import win32api
# input file name with a valid path
input_file = r'C:\Users\thequickblog\Desktop\Python session 2\tqb_sample.xlsx'
# output file name with a valid path
output_file = r'C:\Users\thequickblog\Desktop\Python session 2\tqb_sample_output.pdf'
app = client.DispatchEx("Excel.Application")
app.Interactive = False
app.Visible = False
Workbook = app.Workbooks.Open(input_file)
try:
    Workbook.ActiveSheet.ExportAsFixedFormat(0, output_file)
except Exception as e:
    print("Failed to convert to PDF format. Please confirm the environment meets all the requirements and try again.")
    print(str(e))
finally:
    Workbook.Close()
    app.Quit()  # the Excel Application object is shut down with Quit()

Downloading a file using the Dropbox Python library

Environment: Windows 7, Python Tools for Visual Studio, Python 2.7, Python Package dropbox(6.9.0), Access Token from my Dropbox account
The following code is run:
import dropbox
access_token = '<token value here>'
dbx = dropbox.Dropbox(access_token)
with open("C:\Test.txt", "w") as f:
    metadata, res = dbx.files_download(path="/Test.txt")
    f.write(res.content)
It errors on the last line with the following:
"No disassembly available"
Not being a Python developer, I don't understand the error. The file is created on the local machine, but nothing is downloaded into it from the Dropbox file.
Any help would be greatly appreciated. Thanks

Python code for a Dropbox download with the Business API:
import dropbox
def dropbox_file_download(access_token, dropbox_file_path, local_folder_name):
    try:
        dropbox_file_name = dropbox_file_path.split('/')[-1]
        dropbox_file_path = '/'.join(dropbox_file_path.split('/')[:-1])
        dbx = dropbox.DropboxTeam(access_token)
        # get the team member id for common user
        # (logged_user_name is not defined in this snippet and must be set elsewhere)
        members = dbx.team_members_list()
        for i in range(0, len(members.members)):
            if members.members[i].profile.name.display_name == logged_user_name:
                member_id = members.members[i].profile.team_member_id
                break
        # connect to dropbox with member id
        dbx = dropbox.DropboxTeam(access_token).as_user(member_id)
        # list all the files from the folder
        result = dbx.files_list_folder(dropbox_file_path, recursive=False)
        # download given file from dropbox
        for entry in result.entries:
            if isinstance(entry, dropbox.files.FileMetadata):
                if entry.name == dropbox_file_name:
                    dbx.files_download_to_file(local_folder_name + entry.name, entry.path_lower)
                    return True
        return False
    except Exception as e:
        print(e)
        return False

import dropbox
access_token = '**********************'
dbx = dropbox.Dropbox(access_token)
# res.content is bytes, so open the local file in binary mode
with open("ABC.txt", "wb") as f:
    metadata, res = dbx.files_download("/abc.txt")  # Dropbox file path, must start with "/"
    f.write(res.content)
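A sketch of the same download wrapped in a function. It takes the client as a parameter and relies only on `files_download` returning a `(metadata, response)` pair, as in the snippets in this thread; the function name is hypothetical. The local file is opened in binary mode because `res.content` is bytes:

```python
def download_dropbox_file(dbx, dropbox_path, local_path):
    """Download dropbox_path via an existing client and write it to local_path."""
    if not dropbox_path.startswith('/'):
        dropbox_path = '/' + dropbox_path  # Dropbox API paths are rooted at "/"
    metadata, res = dbx.files_download(dropbox_path)
    with open(local_path, 'wb') as f:
        f.write(res.content)
    return metadata
```

With the real SDK this would be called as `download_dropbox_file(dropbox.Dropbox(access_token), '/Test.txt', 'Test.txt')`.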

Python - create txt file on the fly and send it via FTP?

I am currently creating a text file from a Jinja2 template on the fly and having it downloaded by the user's browser. However, I want to add an option to send it somewhere via FTP (all the FTP details are predefined and won't change).
How do I create the file to be sent?
Thanks
Code:
...
device_config.stream(
    STR = hostname,
    IP = subnet,
    BGPASNO = bgp_as,
    LOIP = lo1,
    DSLUSER = dsl_user,
    DSLPASS = dsl_pass,
    Date = install_date,
).dump(config_file)
content = config_file.getvalue()
content_type = 'text/plain'
content_disposition = 'attachment; filename=%s' % (file_name)
response = None
if type == 'FILE':
    response = HttpResponse(content, content_type=content_type)
    response['Content-Disposition'] = content_disposition
elif type == 'FTP':
    with tempfile.NamedTemporaryFile() as temp:
        temp.write(content)
        temp.seek(0)
        filename = temp.name
        session = ftplib.FTP('192.168.1.1','test','password')
        session.storbinary('STOR {0}'.format(file_name), temp)
        session.quit()
        temp.flush()
return response
EDIT: I needed to add temp.seek(0) before sending the file.

You can use the tempfile module to create a named temporary file.
import tempfile
with tempfile.NamedTemporaryFile() as temp:
    temp.write(content)
    temp.flush()
    temp.seek(0)  # rewind so storbinary reads from the start of the file
    filename = temp.name
    session.storbinary('STOR {0}'.format(file_name), temp)

Here is a working example using BytesIO from the io module. The code is tested and works.
import ftplib
import io
session = ftplib.FTP('192.168.1.1','USERNAME','PASSWORD')
# session.set_debuglevel(2)
buf=io.BytesIO()
# pass bytes (b'...') to buf.write(); a plain str raises an error in Python 3.7
buf.write(b"test string")
buf.seek(0)
session.storbinary("STOR testfile.txt",buf)
session.quit()
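Applied to the original question, the rendered template text can be encoded and uploaded straight from memory. A minimal sketch, assuming `session` is an already-connected `ftplib.FTP` object and `text` is the rendered configuration; the function name is hypothetical:

```python
import io


def upload_text(session, remote_name, text, encoding='utf-8'):
    """Send text to the FTP server as remote_name without touching the disk."""
    buf = io.BytesIO(text.encode(encoding))  # a fresh BytesIO starts at offset 0
    session.storbinary('STOR {0}'.format(remote_name), buf)
```

Because the buffer is created from the encoded bytes rather than written to, no `seek(0)` is needed before the upload.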
