Creating a PDF in Python with data from an .xlsx file and a PNG image

I want to create a PDF using Python 3.x.
The PDF should contain text data stored in an .xlsx file; that is, the program should read the data from the .xlsx file and write it into the PDF.
The PDF should also contain a passport-size PNG image.
I have come up with two basic ideas:
The first is to write a program that creates a text file containing all the data required for the PDF, along with the PNG image, and then converts that file into a PDF.
The second is to write a program that creates the PDF directly, writes the data from the .xlsx file into it, and inserts the image as well.
I don't know whether these ideas can work or how to implement them. After researching on GFG, Stack Overflow, etc., I got totally confused and ended up asking here.
I have tried modules like PIL, FPDF, and reportlab, and I can create a PDF containing either text or images, but I can't combine both in the same file.
I am also unsure which idea I should implement.
My questions are:
Are the ideas above (especially the second one) practically possible?
Can I write a program that imports both the spreadsheet data and a PNG image into the same PDF? Which modules and functions would I use, and how?
Please provide code with comments, or explain what each function used does.
I hope to get a solution soon; meanwhile I will keep trying to solve it myself.
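The second idea is workable, and no intermediate text file is needed: the text and the image can be drawn onto the same PDF page before it is saved. A minimal sketch, assuming the third-party packages openpyxl (to read the workbook) and ReportLab (to draw the PDF) are installed. The helper names read_rows, format_rows and build_pdf, the A4 page and the 35 x 45 mm photo size are illustrative choices, not anything from the question. Each spreadsheet row becomes one line of text below a photo in the top-right corner; long rows are not wrapped and long sheets are not split across pages.

```python
def read_rows(xlsx_path):
    """Return every row of the active worksheet as a list of strings."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    # data_only=True returns cached formula results instead of formula text.
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        ws = wb.active
        return [
            ["" if value is None else str(value) for value in row]
            for row in ws.iter_rows(values_only=True)
        ]
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


def format_rows(rows, sep="   "):
    """Join each row's cells into one printable line (trailing blanks dropped)."""
    return [sep.join(row).rstrip() for row in rows]


def build_pdf(xlsx_path, png_path, pdf_path):
    """Draw a passport-size photo and the spreadsheet text on one A4 page."""
    from reportlab.lib.pagesizes import A4  # third-party: pip install reportlab
    from reportlab.lib.units import mm
    from reportlab.pdfgen import canvas

    page_w, page_h = A4
    margin = 20 * mm
    c = canvas.Canvas(str(pdf_path), pagesize=A4)

    # Photo in the top-right corner. ReportLab coordinates start at the
    # bottom-left of the page, so "top" means a large y value.
    photo_w, photo_h = 35 * mm, 45 * mm
    c.drawImage(str(png_path), page_w - margin - photo_w,
                page_h - margin - photo_h, width=photo_w, height=photo_h)

    # Text block starting below the photo so the two never overlap.
    text = c.beginText(margin, page_h - margin - photo_h - 10 * mm)
    text.setFont("Helvetica", 11)
    for line in format_rows(read_rows(xlsx_path)):
        text.textLine(line)
    c.drawText(text)

    c.showPage()  # finish the page
    c.save()      # write the PDF file
```

A call would look like build_pdf("data.xlsx", "photo.png", "output.pdf"). Both drawImage and drawText target the same canvas page before save(), which is what puts text and image into one PDF.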

Related

Unable to print pdf created by PyPDF2

I created a modified PDF using the Python library PyPDF2, but when I try to print it, all I get is a set of blank pages. Adobe Reader also cannot read the newly created PDF, which makes me think the printer is most probably reading the PDF but its format is inappropriate for printing. The problem is that I don't know much about PDFs and their formats.
For example, a PDF that does print has this format:
And the PDF that I created, which doesn't print, has this format:
So I would like to know what is wrong with the PDF I created using PyPDF2 and how I can fix it so that it prints.
Edit:
The code I used can be seen here in my previous post.

HDF5 file data to JPG image conversion

I have a number of HDF5 files, and I want to convert the data inside them to JPG or PNG format.
Here is the link to my HDF5 files.
These files are from the PICMUS challenge, and the only way to open them is through the MATLAB code provided at this link.
However, I need to open these files in Python, or at least convert the data inside them into PNG or JPG images that I can use in my Python project.
I have tried different approaches, but none of them worked. Can anyone convert these files for me, or at least tell me how to open them?
Please provide a detailed solution.
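HDF5 is an open, general-purpose format, so MATLAB should not be required to read it: the h5py package exposes each file as a tree of groups and datasets. A minimal sketch, assuming h5py, NumPy and Pillow are installed. The helper names are hypothetical, and no particular PICMUS dataset layout is assumed, so run list_datasets first and pass one of the printed paths to dataset_to_png. The linear min-max rescaling is just one simple choice. If a dataset holds raw signals rather than a finished image, dumping it directly will not look like the MATLAB output, and whatever processing the provided MATLAB code performs would have to be reproduced too.

```python
import numpy as np


def list_datasets(h5_path):
    """Print the path, shape and dtype of every dataset in the file."""
    import h5py  # third-party: pip install h5py

    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)

    with h5py.File(h5_path, "r") as f:
        f.visititems(show)  # walks every group and dataset recursively


def to_uint8(array):
    """Linearly rescale a numeric array to 0..255 so it can be saved as an image."""
    a = np.asarray(array, dtype=np.float64)
    lo, hi = np.nanmin(a), np.nanmax(a)
    if hi == lo:  # constant data: avoid dividing by zero
        return np.zeros(a.shape, dtype=np.uint8)
    scaled = (a - lo) / (hi - lo) * 255.0
    return np.nan_to_num(scaled).round().astype(np.uint8)  # NaN -> 0


def dataset_to_png(h5_path, dataset_name, png_path):
    """Save one 2-D dataset (a path printed by list_datasets) as a grayscale PNG."""
    import h5py
    from PIL import Image  # third-party: pip install Pillow

    with h5py.File(h5_path, "r") as f:
        data = f[dataset_name][()]  # read the whole dataset into memory
    if data.ndim != 2:
        raise ValueError(f"expected a 2-D dataset, got shape {data.shape}")
    Image.fromarray(to_uint8(data)).save(png_path)  # 2-D uint8 -> grayscale
```

For higher-dimensional data (for example a stack of frames), one option is to slice out a 2-D plane, such as data[0], before calling to_uint8.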

Python - Split PDF or PowerPoint by pixel location?

I will explain my dilemma first: I have several thousand PowerPoint files (.ppt) from which I need to extract the text. The problem is that the text is disorganized in the file, and when read as a complete page it makes no sense for what I need (in the example it would read: line 1, line 3, line 2, line 4, line 5).
I was initially using tika to read the files. I then thought that converting them to PDF with glob and win32com.client would give me better luck, but the result is basically the same. The picture here is an example of what the text looks like.
So my idea now is that if I can section the PDF or PPT by pixel location (saving to separate temp files if needed, then opening and reading them that way), I can keep things in order and get what I need. Although the text moves around within each box, the black outlined boxes are always in roughly the same location.
However, I cannot find anything that splits an individual PDF page, only tools that split multiple pages into single pages. Does anyone have an idea how to go about doing this?
I need to read the text in box one together (line 1 and line 2) and load it into a dictionary or some other container, and the same for the second box. For reference, there is only one slide in each PowerPoint file.
Allow me to provide the answer as a general guideline:
A .pptx file is a glorified .zip file; the older .ppt format is binary, so convert the .ppt files into .pptx files first.
Use 7-Zip or WinZip to open a .pptx and understand its structure.
Each slide is stored as an .xml file full of tags you can parse.
For example, you will find a tag for each text box, with that box's text nested inside.
Also: python-pptx
Mass-convert by tweaking this VBA code: Link for VBA
Or using PowerShell: Link for [PowerShell]
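Once the files are .pptx, the slide-XML approach needs only the standard library. A minimal sketch, assuming the usual Office Open XML layout: the slide lives at ppt/slides/slide1.xml (the only slide in a one-slide deck), each text box is a p:sp element, its text sits in a:t runs inside a:p paragraphs, and its position is the a:off element. The function names are hypothetical. Because each box comes back as its own list of paragraphs, lines inside a box stay together no matter how the file orders the boxes, and sorting by the box offsets recovers the reading order without splitting anything by pixels.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by PresentationML (p:) and DrawingML (a:) in slide XML.
NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}


def slide_text_boxes(pptx_file, slide_number=1):
    """Return [(x, y, [paragraph, ...]), ...], one entry per text shape.

    x and y are the shape's top-left offset in EMUs (914400 per inch);
    the list is sorted top-to-bottom, then left-to-right.
    pptx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(pptx_file) as z:
        root = ET.fromstring(z.read(f"ppt/slides/slide{slide_number}.xml"))

    boxes = []
    for sp in root.iter(f"{{{NS['p']}}}sp"):  # every shape on the slide
        body = sp.find("p:txBody", NS)
        if body is None:  # shapes without a text body are skipped
            continue
        off = sp.find("p:spPr/a:xfrm/a:off", NS)
        # Shapes that inherit their position from the layout have no a:off.
        x = int(off.get("x")) if off is not None else 0
        y = int(off.get("y")) if off is not None else 0
        # Each a:p is one paragraph; its a:t elements hold the text runs.
        paragraphs = [
            "".join(t.text or "" for t in p.iter(f"{{{NS['a']}}}t"))
            for p in body.findall("a:p", NS)
        ]
        boxes.append((x, y, paragraphs))

    boxes.sort(key=lambda box: (box[1], box[0]))
    return boxes


def boxes_to_dict(boxes):
    """Number the boxes in reading order: {"box1": [...], "box2": [...]}."""
    return {f"box{i}": paras for i, (_, _, paras) in enumerate(boxes, start=1)}
```

For example, boxes_to_dict(slide_text_boxes("deck.pptx")) would give one entry per box. Shapes inside groups report offsets relative to their group, so grouped layouts may need extra handling. python-pptx exposes the same information through shape.left, shape.top and shape.text_frame if a third-party dependency is acceptable.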

How to Convert a CSV file into a PDF

I'm currently writing a program in Python that generates data and stores it in a text file. The data is laid out in columns, and when I change the file format to CSV, it opens in LibreOffice Calc (the Raspberry Pi's equivalent of Excel), formatted exactly how I wanted.
But I want to take it one step further and convert my CSV data into a PDF. I've looked on the web, but it only explains how to convert a PDF into a CSV, which isn't what I want. I also saw something called pyPDF, but I'm not sure whether it would be of any use.
This is the string of data that is built in a loop that runs 10 times:
resultStr = 'Test,{},InNum,{},stats,{},Duration(ms),{} \n'.format("OFF",inPin, result, round(duration*1000))
Once the loop finishes, a text file is opened and resultStr is stored in it.
Thanks everyone for your help,
~Neamus
Using ReportLab, you can programmatically generate PDF documents from your data. There are plenty of examples that demonstrate the framework and how to use it. In your case, you can simply append to your document's story in a loop, once for each of your CSV result strings.
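A minimal sketch of that idea, assuming ReportLab is installed and that the text file holds one resultStr line per test run. The helper names parse_result_line and results_to_pdf are hypothetical. Each line is parsed as label,value pairs; instead of appending one Paragraph per line, this variant collects all lines into a single Table flowable, which keeps the columns aligned the way they appear in LibreOffice Calc. The story then holds a title followed by the table.

```python
import csv


def parse_result_line(line):
    """Turn 'Test,OFF,InNum,3,...' (alternating label,value fields) into a dict."""
    fields = [f.strip() for f in next(csv.reader([line.strip()]))]
    return dict(zip(fields[0::2], fields[1::2]))


def results_to_pdf(txt_path, pdf_path):
    """Render every result line in txt_path as one row of a PDF table."""
    from reportlab.lib import colors  # third-party: pip install reportlab
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate, Table, TableStyle

    with open(txt_path, newline="") as f:
        rows = [parse_result_line(line) for line in f if line.strip()]
    if not rows:
        raise ValueError(f"no result lines found in {txt_path}")

    header = list(rows[0].keys())  # column labels taken from the first line
    data = [header] + [[row.get(key, "") for key in header] for row in rows]

    table = Table(data)
    table.setStyle(TableStyle([
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),            # cell borders
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),       # header row
    ]))

    # The story is the list of flowables ReportLab lays out page by page.
    story = [Paragraph("Test results", getSampleStyleSheet()["Title"]), table]
    SimpleDocTemplate(str(pdf_path), pagesize=A4).build(story)
```

A call would look like results_to_pdf("results.txt", "results.pdf"). To follow the one-flowable-per-line approach literally, the loop could instead append Paragraph(line, style) for each result string.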

What does the Preview app in OS X do that helps extraction from a PDF?

When I extracted content from a 12-page PDF file with my program based on pdfminer, I got a wrong result with only 11 pages. I tested other files and got correct results in most cases.
By accident, I opened the file with the Preview app in OS X Yosemite (v10.10.4) and saved it without any other changes. After that, my program gave the correct result. I noticed that Preview changed the file size from 2 MB to 300 KB, but I have no idea what it did.
I tried searching for an answer, but most topics are about using Preview's export function to compress PDF files, and it seems nobody else has come across the same problem with pdfminer.
1. What does Preview do to a PDF file when saving it?
2. How can I deal with this problem?
Thanks in advance!
PDF is a complex file format that supports many different features and ways of doing things. Your pdfminer-based app apparently has problems with some of those features, which causes it to misinterpret certain files. Preview, on the other hand, seems to support everything correctly and was able to read the file into its internal representation. When you then re-saved the file, Preview wrote the same information out in its own way. Again, because there are many ways to do the same thing, different programs do things differently.
Preview apparently has a better, more compatible, more streamlined way to express the same content, and your pdfminer code can handle that better.
