Convert .pages to .doc or .pdf in Python

How does one convert a .pages file to a .doc or .pdf file using Python? My use case is basically:
User uploads a .pages file to my service
My service converts the .pages to a .pdf
The .pdf is rendered in browser using a browser-based .pdf viewer

I've never done it, but it appears that a .pages file already contains a PDF version, which you can get at by unzipping the file: http://blog.cleverly.com/
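If your files are zip archives, the embedded PDF can be pulled out with the standard zipfile module. This is a minimal sketch under two assumptions: that the .pages file is a zip archive (some are saved as directory bundles, which zipfile cannot open), and that the preview lives at QuickLook/Preview.pdf, as older Pages '09 files reportedly do. Newer Pages files may contain only JPEG previews, in which case the function returns False. The name extract_embedded_pdf is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_embedded_pdf(pages_path, out_path):
    """Copy a PDF stored inside a zipped .pages file to out_path.

    Returns True if a PDF member was found and written, False otherwise.
    """
    with zipfile.ZipFile(pages_path) as archive:
        pdf_names = [n for n in archive.namelist() if n.lower().endswith(".pdf")]
        if not pdf_names:
            return False
        # Prefer the QuickLook preview if present (assumed location in
        # older Pages files); otherwise fall back to the first PDF member.
        preferred = [n for n in pdf_names if n.lower() == "quicklook/preview.pdf"]
        name = (preferred or pdf_names)[0]
        Path(out_path).write_bytes(archive.read(name))
    return True
```

In the upload flow from the question, the service would call extract_embedded_pdf on the uploaded file and hand the resulting PDF to the browser viewer, falling back to another conversion route when it returns False.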

A complete native solution in Python will be difficult.
A more practical approach is to automate Pages itself so that it exports the file as PDF or MS Word.
For that, there seems to be an available solution:
pyobjc
There is an example that automates Pages using pyobjc: http://www.mugginsoft.com/kosmictask/help/automation-python

Related

How to save a pandas DataFrame to SharePoint as a CSV file?

I have a DataFrame that I would like to store as a CSV file in SharePoint.
It seems that the only way is to first save the CSV file locally and then upload it to SharePoint using SharePlum.
Is there a way to save the DataFrame directly to SharePoint as a CSV file, without creating a local file?
Thanks a lot for your help.
It should be possible to write the CSV content to an in-memory buffer (e.g. StringIO or BytesIO) rather than to a local file; here is an example (last section of the page).
After that, you could use a library to write the content directly to SharePoint: this discussion shows several approaches, including the Office365-REST-Python-Client and SharePlum, which you have already mentioned.
Here are two more sources (Microsoft technical docs) that you might find useful:
How can I upload a file to Sharepoint using Python?
How to get and upload files from sharepoint with python?
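A minimal sketch of the in-memory step: DataFrame.to_csv writes into an io.StringIO buffer, and the text is encoded to bytes, which is what upload calls typically accept. The upload itself is deliberately left out, because the exact call depends on the library you pick (SharePlum or Office365-REST-Python-Client); the helper name dataframe_to_csv_bytes is hypothetical.

```python
import io

import pandas as pd


def dataframe_to_csv_bytes(df: pd.DataFrame, encoding: str = "utf-8") -> bytes:
    """Serialise df to CSV entirely in memory and return the encoded bytes."""
    buffer = io.StringIO()  # in-memory text buffer instead of a local file
    df.to_csv(buffer, index=False)
    return buffer.getvalue().encode(encoding)
```

With a library that accepts a bytes payload, you would pass dataframe_to_csv_bytes(df) where you would otherwise read a local file's contents, so nothing is written to disk.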

Can't read .docx file which I got after converting PDF using the soffice command

I am trying to convert a PDF to DOCX using soffice. It produces a .docx, but the content ends up in textboxes, which I am unable to read using the python-docx API. Is there a better way to read the file, or a better way to convert PDF to DOCX so that I do not get textboxes?
soffice --infilter="writer_pdf_import" --convert-to docx "convert_this.pdf"
You can try using Aspose.Words for Cloud to convert PDF to Word documents.
https://docs.aspose.cloud/display/wordscloud/Convert+PDF+Document+to+Word
It converts PDF from fixed form to flow form so it is editable in MS Word.
Disclosure: I work on the Aspose.Words team.
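If you keep the soffice output instead, the textbox text is still in the file: it sits in w:txbxContent elements of word/document.xml, which python-docx's document.paragraphs does not walk. A minimal sketch that reads those elements directly with the standard library (swapped in for python-docx), assuming the standard WordprocessingML namespace; the function name textbox_paragraphs is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Standard WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def textbox_paragraphs(docx_path):
    """Return the text of each non-empty paragraph found inside textboxes."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for box in root.iter(f"{{{W_NS}}}txbxContent"):
        for para in box.iter(f"{{{W_NS}}}p"):
            # A paragraph's text is split across runs; join all w:t nodes.
            text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
            if text:
                paragraphs.append(text)
    return paragraphs
```

Word often stores each textbox twice (a DrawingML version plus a VML fallback inside mc:AlternateContent), so the result may contain duplicated paragraphs that you need to de-duplicate.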

downloading csv files from a specific site using python

Goal: automate the download of various .csv files from https://wyniki.tge.pl/en/wyniki/archiwum/2/?date_to=2018-03-21&date_from=2018-02-19&data_scope=contract&market=rtee&data_period=3 using Python (this is not the main issue, though)
Specifics: in particular, I am trying to download the CSV file for the "Settlement price" and "BASE Year"
Problem: when I look at the source code for this web page, I see references to the "Upload" button, but I don't see references to the CSV file (to be fair, I am not very good at reading source code). Since I am using Python (urllib), I need to know the URL of the CSV file, but I don't know how to get it.
This is not a question of Python per se, but about how to find the URL of some .csv that can be downloaded from a web page. Hence, no code is provided.
If you inspect the source code from that webpage in particular, you will see that the form to obtain the csv file has 3 main inputs:
file_type
fields
contracts
So, to obtain the csv file for the "Settlement price" and "BASE Year", you would simply do a POST request to that same URL, passing these as the payload:
file_type=2&fields=4&contracts=4
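A minimal sketch of that POST request with the standard library's urllib, matching the question's setup. The field names and values come from the answer above; whether the site still accepts them is an assumption, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Page URL from the question; the answer says the CSV comes from a POST to it.
TGE_URL = (
    "https://wyniki.tge.pl/en/wyniki/archiwum/2/"
    "?date_to=2018-03-21&date_from=2018-02-19"
    "&data_scope=contract&market=rtee&data_period=3"
)


def build_csv_request(url=TGE_URL, file_type=2, fields=4, contracts=4):
    """Build a form-encoded POST request for the CSV export."""
    payload = urllib.parse.urlencode(
        {"file_type": file_type, "fields": fields, "contracts": contracts}
    ).encode("ascii")
    return urllib.request.Request(url, data=payload, method="POST")


def download_tge_csv(out_path, **form_values):
    """Send the POST request and save the response body to out_path."""
    with urllib.request.urlopen(build_csv_request(**form_values)) as response:
        data = response.read()
    with open(out_path, "wb") as f:
        f.write(data)
```

Calling download_tge_csv("settlement_base_year.csv") would save the response; other columns or contracts would need the corresponding form values, which you can read from the form's options in the page source.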
I would recommend the wget command with Python. wget is a command-line tool for downloading files. Once you have downloaded the file with wget, you can process the CSV with another library.
I found this wget library for python.
https://pypi.python.org/pypi/wget
Regards.
Eduardo Estevez.
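For reference, the wget command-line tool (rather than the PyPI wget package linked above) can send form data with its --post-data option, which matters here because a plain GET of the page URL would presumably return the HTML page rather than the CSV. A minimal sketch that builds and runs the command through subprocess, assuming wget is installed and on PATH; the helper names are hypothetical.

```python
import subprocess


def build_wget_command(url, out_path, post_data=None):
    """Return the wget argument list; post_data switches the request to POST."""
    cmd = ["wget", "-O", out_path]  # -O sets the output file name
    if post_data is not None:
        cmd.append(f"--post-data={post_data}")
    cmd.append(url)
    return cmd


def wget_download(url, out_path, post_data=None):
    """Run wget and raise CalledProcessError if it fails."""
    subprocess.run(build_wget_command(url, out_path, post_data), check=True)
```

For the TGE page, post_data would be the payload from the previous answer, "file_type=2&fields=4&contracts=4".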

Downloading public files in Google Drive (Python)

Suppose that someone gives me a link that enables me to download a public file in Google Drive.
I want to write a program that can read the link and then download it as a text file.
For example, https://docs.google.com/document/d/1yJVXtabsP7KrJXSu3XyOh-F2cFoP8Lftr14PtXCLEVU/edit is one of files in my Google Drive.
Everyone can access this file.
But how can I write a Python program that downloads the text file given the above link?
Could someone share some sample code?
It seems that some Google Drive SDK could be useful(?), but is there any way to do it without using SDK?
First, you need to extract the file ID from the link of the file you have uploaded.
for example in the link that you gave:
https://docs.google.com/document/d/1yJVXtabsP7KrJXSu3XyOh-F2cFoP8Lftr14PtXCLEVU/edit
id is 1yJVXtabsP7KrJXSu3XyOh-F2cFoP8Lftr14PtXCLEVU
Save it in a variable, say file_id.
Now build the download link:
https://docs.google.com/uc?export=download&id=<file_id>
This link will download the file.
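A minimal sketch of those steps using only the standard library, assuming the ID is the path segment after /d/ as in the example link and that the file is publicly shared (no authentication is sent). The function names are hypothetical. The uc?export=download form may not work for native Google Docs files, which is where the export links in the next answer come in.

```python
import re
import urllib.request
from pathlib import Path


def extract_file_id(link):
    """Return the file ID that follows /d/ in a Google Docs/Drive link."""
    match = re.search(r"/d/([A-Za-z0-9_-]+)", link)
    if match is None:
        raise ValueError(f"no file ID found in {link!r}")
    return match.group(1)


def direct_download_url(link):
    """Build the uc?export=download URL described in the answer."""
    return "https://docs.google.com/uc?export=download&id=" + extract_file_id(link)


def download_file(link, out_path):
    """Fetch the file anonymously and save the raw bytes to out_path."""
    with urllib.request.urlopen(direct_download_url(link)) as response:
        Path(out_path).write_bytes(response.read())
```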
If the above answer doesn't work for you, use the following links:
To save as a .txt file:
https://docs.google.com/document/d/1yJVXtabsP7KrJXSu3XyOh-F2cFoP8Lftr14PtXCLEVU/export?format=txt
To save as a .docx file:
https://docs.google.com/document/d/1yJVXtabsP7KrJXSu3XyOh-F2cFoP8Lftr14PtXCLEVU/export?format=docx
In general, the trick is to replace "edit" at the end of the link with "export?format=txt". Hope it helps.
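A minimal sketch of the export trick, standard library only: it keeps the link up to the document ID and appends /export?format=... in place of /edit. It assumes the document is publicly shared and that the txt export is UTF-8; decoding with utf-8-sig strips a leading byte-order mark if one is present. The function names are hypothetical.

```python
import re
import urllib.request


def export_url(link, fmt="txt"):
    """Swap the trailing /edit... of a Google Docs link for /export?format=fmt."""
    match = re.search(r"^(.*/document/d/[A-Za-z0-9_-]+)", link)
    if match is None:
        raise ValueError(f"not a Google Docs document link: {link!r}")
    return f"{match.group(1)}/export?format={fmt}"


def fetch_doc_text(link):
    """Download a publicly shared Google Doc as plain text."""
    with urllib.request.urlopen(export_url(link, "txt")) as response:
        return response.read().decode("utf-8-sig")
```

Using fmt="docx" with urlopen and writing the bytes to a file would give the Word version instead.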

Read contents of a pdf file

Is there a command-line tool to read a PDF file on Linux? Please point me to the appropriate URLs.
Thanks.
Xpdf and Poppler contain the command-line utility pdftotext, which converts PDF files to plain text.
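A minimal sketch that calls pdftotext from Python via subprocess, assuming the Poppler (or Xpdf) pdftotext binary is installed and on PATH. Passing - as the output file makes pdftotext write to standard output, and -layout asks it to keep the physical page layout; the helper names are hypothetical.

```python
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the pdftotext argument list; '-' sends the text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # preserve the original physical layout
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, keep_layout=False):
    """Run pdftotext and return the extracted text."""
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```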
There is PyODConverter. It uses OpenOffice running as a service and can convert between various document formats, including PDF and plain text.
Not a command-line tool, but a PDF generation framework:
http://www.reportlab.com/software/opensource/
You should also be able to write a simple reader using:
https://pypi.org/project/pypdf/
http://code.activestate.com/recipes/511465-pure-python-pdf-to-text-converter/
You can also look at:
http://www.unixuser.org/~euske/python/pdfminer/index.html
