Python - Convert JPG to text file - python

Good morning all,
I've made a Python script that adds text on top of images, based on a preset template. I'm now developing a template editor that will let the user edit the template in GUI, then save the template as a config file. The idea is that one user can create a template, export it, send it to a new user on a separate computer, who can import it into their config file. The second user will retain full edit abilities on the template (if any changes needs to be made).
Now, in addition to the text, I also want the ability to add up to two images (company logos, ect.) to the template/stills. Now, my question: Is there a way to convert a JPG to pure text data, that can be saved to a config file, and that can be reinterpreted to a JPG at the receiving system. And if not, what would be the best way to achieve this? What I'm hoping to avoid is the user having to send the image files separately.

Sounds questionable that you want to ship an image as text file (it's easy, base64 is supplied with python, but it drastically increases the amount of bytes. I'd strongly recommend not doing that).
I'd rather take the text and embed it in the image metadata! That way, you would still have a valid image file, but if loaded with your application, that application could read the metadata, interpret it as text config.
There's EXIF and XMP metadata, for both there's python modules.
Alternatively, would make more sense to simply put images and config files into one archive file (you know .docx word documents? They do exactly that, just like .odt; java jar files? Same. Android APK files? All archive files with multiple files inside) python brings a zip module to enable you to do that easily.
Instead of an archive, you could also build a PDF file. That way, you could simply have the images embedded in the PDF, the text editable on top of it, any browser can display it, and the text stays editable. Operating on pdf files can be done in many ways, but I like Fitz from the PyMuPDF package. Just make a document the size of your image, add the image file, put the text on top. On the reader side, find the image and text elements. It's relatively ok to do!
PDF is a very flexible format, if you need more config that just text information, you can add arbitrary text streams to the document that are not displayed.

If I understand properly, you want to use the config file as a settings file that stores the preferences of a user, you could store such data as JSON/XML/YAML or similar, such files are used to store data in pure readable text than binary can be parsed into a Python dict object. As for storing the images, you can have the generated images uploaded to a server then use their URL when they are needed to re-download them, unless if I didn’t understand the question?

Related

Using temporary files and folders in Web2py app

I am relatively new to web development and very new to using Web2py. The application I am currently working on is intended to take in a CSV upload from a user, then generate a PDF file based on the contents of the CSV, then allow the user to download that PDF. As part of this process I need to generate and access several intermediate files that are specific to each individual user (these files would be images, other pdfs, and some text files). I don't need to store these files in a database since they can be deleted after the session ends, but I am not sure the best way or place to store these files and keep them separate based on each session. I thought that maybe the subfolders in the sessions folder would make sense, but I do not know how to dynamically get the path to the correct folder for the current session. Any suggestions pointing me in the right direction are appreciated!
I was having this error "TypeError: expected string or Unicode object, NoneType found" and I had to store just a link in the session to the uploaded document in the db or maybe the upload folder in your case. I would store it to upload to proceed normally, and then clear out the values and the file if not 'approved'?
If the information is not confidential in similar circumstances, I directly write the temporary files under /tmp.

pdf in python which consist data from .xlsx file and png image

I wanted to create a pdf using Python 3x.
The pdf should have some text data which is stored in a .xlsx file i.e.., it should read data from .xlsx file and write into the .pdf file.
Along with that, the pdf should have a png image of passport size.
I have come up with two basic ideas which are:-
First one is by writing a program which create a text file in which all required data from the pdf will be written along with the png image. After that the program will convert it into a pdf file.
Second one is by writing a program which will create the pdf file and write the data from .xlsx file as well as insert the image too into the pdf file.
I don't know whether these ideas can be used or not and how it can be used but after going through some researches on GFG, Stack overflow..., I have got totally confused and ended up asking this problem on this platform.
I have tried some modules like PIL, FPDF, reportlab,.. and am successfully able to create a pdf file with either texts or images but unable to combine both in the same text file.
Also I am confused in deciding which idea I should implement.
What I need from you guys is the answer of few of my questions which are:-
Are the ideas I mentioned above(second one specially) practically possible?
Can I make a program which imports data from file as well as png image into the same pdf. What modules and functions will be used there and how.
Please provide the code with comments or defining/elaborating the work of function used.
I hope I will get the desired result soon. Meanwhile I will try to solve it out by myself.

Python and Visio 365: automated saving to .svg

I would like to create a script that opens Visio files (.vsd), save it to vsdx, pdf and svg (with every page of vsd being seperate file), close the file, opens the next until end of files.
So far i was successful with saving it to .pdf using: Python Visio to pdf
import win32com.client
#change later to dynamic current path
path= r"C:/automation_visio/"
visio = win32com.client.Dispatch("Visio.Application")
doc = visio.Documents.Open(path+'test.vsd')
doc.ExportAsFixedFormat( 1, path+'test.pdf', 1, 0 ) #exports as pdf only XD
I looked in lots of places (most relevant: https://learn.microsoft.com/en-us/office/vba/api/visio.document.saveas) but to no avail - I don't know how to save to other filetypes which are available with manual "SaveAs".
EDIT: I need to know also how to navigate trough the pages (to get list of pages and iterate trough them and save to svg files) and (shamefully) how to correctly close the file after files are exported.
You need to use page.Export method instead of ExportAsFixedFormat. Just give the target file the .svg extension, and you are good to go.
BTW, I have a Visio Add-in (check the profile) that adds some useful stuff to the export, like connections, properties, etc, to be used from JavaScript. And it is also callable programmatically.
I wanted to add that there are two options for exporting SVG from Visio. Normally, Visio adds a bunch of extra data, such as User-defined cells, Layers, and Shape Data fields. This can be useful if you want to program against the export, or re-import to Visio at some time in the future.
However, if you want small, clean SVG, you won't want all of that extra stuff. So you can fiddle with Visio.Application.ApplicationSettings.SVGExportFormat, setting it equal to one of the following:
// (0) Include both SVG elements and Visio elements. This is the default.
Visio.VisSVGExportFormat.visSVGIncludeVisioElements
// (1) Include SVG elements only.
Visio.VisSVGExportFormat.visSVGExcludeVisioElements
The extra Visio info that is added to the SVG export is easy to find, just look for elements with the "v:" prefix.

Do .py Python files contain metadata?

.doc files, .pdf files, and some image formats all contain metadata about the file, such as the author.
Is a .py file just a plain text file whose contents are all visible once opened with a code editor like Sublime, or does it also contain metadata? If so, how does one access this metadata?
On Linux and most Unixes, .py's are just text (sometimes unicode text).
On Windows and Mac, there are cubbyholes where you can stash data, but I doubt Python uses them.
.pyc's, on the other hand, have at least a little metadata stuff in them - or so I've heard. Specifically: there's supposed to be a timestamp in them, so that if you copy a filesystem hierarchy, python won't automatically recreate all the .pyc's on import. There may or may not be more.

Generate ODT/DOC(X) and convert to PDF, without OO.o/MS

I have a WSGI application that generates invoices and stores them as PDF.
So far I have solved similar problems with FPDF (or equivalents), generating the PDF from scratch like a GUI. Sadly this means the entire formatting logic (positioning headers, footers and content, styling) is in the application, where it really shouldn't be.
As the templates already exist in Office formats (ODT, DOC, DOCX), I would prefer to simply use those as a basis and fill in the actual content. I've found the Appy framework, which does pretty much that with annotated ODT files.
That still leaves the bigger problem open, tho: converting ODT (or DOC, or DOCX) to PDF. On a server. Running Linux. Without GUI libraries. And thus, without OO.o or MS Office.
Is this at all possible or am I better off keeping the styling in my code?
The actual content that would be filled in is actually quite restricted: a few paragraphs, some of which may be optional, a headline or two, always at the same place, and a few rows of a table. In HTML this would be trivial.
EDIT: Basically, I want a library that can generate ODT files from ODF files acting as templates and a library that can convert the result into PDF (which is probably the crux).
I don't know how to go about automatic ODT -> PDF conversion, but a simpler route might be to generate your invoices as HTML and convert them to PDF using http://www.xhtml2pdf.com/. I haven't tried the library myself, but it definitely seems promising.
You can use QTextDocument, QTextCursor and QTextDocumentWriter in PyQt4. A simple example to show how to write to an odt file:
>>>from pyqt4 import QtGui
# Create a document object
>>>doc = QtGui.QTextDocument()
# Create a cursor pointing to the beginning of the document
>>>cursor = QtGui.QTextCursor(doc)
# Insert some text
>>>cursor.insertText('Hello world')
# Create a writer to save the document
>>>writer = QtGui.QTextDocumentWriter()
>>>writer.supportedDocumentFormats()
[PyQt4.QtCore.QByteArray(b'HTML'), PyQt4.QtCore.QByteArray(b'ODF'), PyQt4.QtCore.QByteArray(b'plaintext')]
>>>odf_format = writer.supportedDocumentFormats()[1]
>>>writer.setFormat(odf_format)
>>>writer.setFileName('hello_world.odt')
>>>writer.write(doc) # Return True if successful
True
If not sure the difference between odt and odf in this case. I checked the file type and it said 'application/vnd.oasis.opendocument.text'. So I assume it is odt. You can print to a pdf file by using QPrinter.
More information at:
http://qt-project.org/doc/qt-4.8/

Categories

Resources