I need to convert .doc and .docx files to .pdf using python - python

I need to convert .doc and .docx files to .pdf using python . I have seen some answers already available but that are using comtypes and opening WordApplication. I can not do that.
What I seek is a way of doing it using some python libraries that preserves font , tables , heading size and images etc , without opening MS Word or LibreOffice or anything like that
Converting .doc and .docx files to some intermediate format(and later converting that format to pdf) would be fine too , if needed . Please help me with the code or guided instructions(I am not a pro in python) I should follow.

I have been in the similar problem earlier,
My suggestion:
sorry there is no such direct python library to handle Microsoft office formats specially (.doc)
So try to use LibreOffice as a service in Ubuntu its "libreoffice"
if windows its "soffice.exe" use this in command line to convert the document to .PDF without opening LibreOffice
and its easy and fast too and more over gives almost perfect conversion of the file.
A sample:
For Windows:
C:\Program Files (x86)\LibreOffice 4\program\soffice.exe" --headless --convert-to pdf "input_file_path" --outdir "output_dir_path"
This will convert the input file into pdf in the given output directory without opening the LibreOffice ans just using it as a service.
To run this command from python you can use "subprocess" like libraries.

Related

JPG files copied by python script being downloaded with underscore character in place of pipe char

I have created a python script in Google colab that copies and renames a couple of jpg file as well as creating an xml file. It works fine apart from the fact that the jpg files need to be renamed with a pipe '|' towards the end e.g. 1211415|1.jpg. In colab I can see the created files and they seem to be named as expected, but I am auto copying and downloading them with ..
shutil.copy('imageOne.jpg', imageZeroName)
shutil.copy('imageTwo.jpg', imageOneName)
files.download(imageZeroName)
files.download(imageOneName)
They then appear in my downloaded folder as 1211414_1.jpg. The pipe has been replaced by an underscore.
I noticed that the original files could not be located when they had pipes in the name. So I changed the names to imageOne.jpg and imageTwo.jpg i.e. I was using a sample with the correct naming convention like 589946681l0.jpg and received error like file does not exist.
Why is this happening and is there anyway to prevent it from doing this?
Thanks.
If this is on Windows, Windows doesn't support | in file paths and the module is probably replacing it to be able to create the file.
Answers to the What characters are forbidden in Windows and Linux directory names? question expand on what characters and names are invalid on windows

Convert PDF to .ipynb (recover Jupyter notebook from PDF)

I have a PDF file that was created from a Jupyter notebook, but the original .ipynb file is lost.
Is there some tool that would help to convert PDF to .ipynb?
that may not be possible since .ipynb file contains pieces of code that requires for it to execute in jupyter notebook ..so the best option is to try to copy the contents from the pdf on to new .ipynb file and execute it.
PDF to Python is straightforward, but it takes several steps. Essentially you must extract the code to text format and then parse and clean it up to get it back into an executable format.
Save the PDF as text file. Adobe Acrobat does this well, but there are several Python PDF libraries to extract text from any PDF.
parse the text to identify and capture the Python code (as text strings)
Convert the Python text strings to Python tokens.
Clean or lint the Python code to format it so that it will run without errors due to indentation. You can use the Python "black" module or PEP8 linter to clean up indentation.
There are numerous examples of parsing Python in HTML format to Jupyter Notebook format. Spyder and VSCode linters work well to fix indentation.
Not possible to convert pdf into ipynd. But you can use google lens it will help you in copy pasting.

Python3: How do I import an excel spreadsheet into python project? (Im using repl.it website for learning python3)

Python3: How do I import an excel spreadsheet into python project? (I'm using repl.it website for learning python3). I want to automate the entering of data into several connected spreadsheets. I'm trying to automate my work so that I don't have to do it manually anymore.
You can process Excel files directly if you use a local install of Python and a library like OpenPyXL or others.
However, as a workaround, in Excel save the file as a .csv (comma separated values). Open that .csv in a text editor. Copy the contents. In repl.it, click the new file button and create a new file called something like input.csv and paste in the contents.
Repl.it does have the csv library since that's a native library. Details are at https://docs.python.org/3.6/library/csv.html. That should let you read in the data fairly easily.
Since you are dealing with several connected spreadsheets, you may have to do this step with each one and create a new .csv file as your result which you can then open in Excel and save as an Excel file. However, you really are pushing the bounds of what can be done in an online repl.
If you are dealing with large files or want to skip the "save as csv" step, you'll need a local installation of Python.

Openin exe binary files and editing

I'm working on a project to automatize a process. My task is to open an .exe file and edit the binary. I was researching for a possible solution to this task without any success. Does anyone know if there is a Python library or java class that can help? Or any other solution to edit an exe file.
If you just need to edit the binary data contained in the file, then it is just a matter of opening the file as binary and seeking/reading/writing as you would any other binary file.
See the Python docs about reading and writing files: Reading and Writing Files
You'll do something like:
f = open('filename.exe', 'r+b') //'r+b' means read and write binary
Then proceed to seek through the file, read and write where needed.
Depending on your needs, you could treat the .exe file as an "ordinary" binary file as suggested in an other answer.
In other hand, if you need to "decode" Windows portable executable files (accessing header, copying sections), there are some dedicated Python module specialized in that task. I don't know which work the best or has the most features, but you should take a look for example to:
pefile
PELP

How to convert text files to pdf files in python script?

I have a report text file, which is created by my python script. I want to create pdf file from mentioned text file in python script. After searching, I came across reportlab library, but in tutorials of same library it shows creating pdf having manually written contents.However I want to convert my text file to pdf file.
Is there any other option, any script?
thank you in advance.

Categories

Resources