Simple one I think but essentially I need to know what the syntax is for the save function on the PIL. The help is really vague and I can't find anything online. Any help'd be great, thanks :).
From the PIL Handbook:
im.save(outfile, options...)
im.save(outfile, format, options...)
Simplest case:
im.save('my_image.png')
or whatever. In this case, the type of the image will be determined from the extension. Is there a particular problem you're having? Or specific saving option that you'd like to use but aren't sure how to do so?
You may be able to find additional information in the documentation on each filetype. The PIL Handbook appendices list the different file types that are supported. In some cases, options are given for save. For example, on the JPEG file format page, we're told that save supports
quality
optimize, and
progressive
with notes about each option.
Image.save(filename[, format[, options]]). You can usually just use Image.save(filename) since it automatically figures out the file type for you from the extension.
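To make the two call forms above concrete, here is a minimal sketch. It assumes the Pillow fork of PIL is installed; the image, the temporary directory, and the file names are made up for illustration, and the three keyword arguments are the JPEG options listed in the handbook.

```python
import os
import tempfile

from PIL import Image

# A small in-memory image so the example does not depend on an existing file.
im = Image.new("RGB", (64, 64), color=(200, 30, 30))
out_dir = tempfile.mkdtemp()

# Form 1: only a filename; the format is inferred from the ".png" extension.
png_path = os.path.join(out_dir, "my_image.png")
im.save(png_path)

# Form 2: explicit format plus format-specific options as keyword arguments.
jpg_path = os.path.join(out_dir, "my_image.jpg")
im.save(jpg_path, "JPEG", quality=85, optimize=True, progressive=True)

with Image.open(png_path) as a, Image.open(jpg_path) as b:
    print(a.format, b.format)  # PNG JPEG
```

Options that a format does not understand are generally ignored by its writer, so it is worth checking the per-format page before relying on one.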
How can I read pdf in python? I know one way of converting it to text, but I want to read the content directly from pdf.
Can anyone explain which module in python is best for pdf extraction?
I tried to use the PyPDF2 package but it gives me inconsistent results. I would also very much like a way to get the tables and the images, and to remove the headers and footers, at least fairly consistently; it doesn't need to work 100% of the time. Thanks for your answers, I just need to find the right library. Thanks!
From another post that asked pretty much the same:
The answer depends if the question is general or specific to a single form. Your approach is reasonable for the general case, but there will be variability. If you have a pdf form that is a single form or report that has been created with different data at each iteration consider converting the form from pdf to postscript then see if you can parse the postscript.
Two utilities do this: pdf2ps and pdftops. Try each. This approach may benefit if you know some postscript. With some luck the needed fields may be simple text strings. Worth a try.
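If you want to drive either utility from Python, something like the sketch below might do. It assumes pdftops (from Poppler) or pdf2ps (from Ghostscript) is on your PATH and that both accept an input file followed by an output file; the function names are made up for illustration.

```python
import shutil
import subprocess


def pick_converter(candidates=("pdftops", "pdf2ps")):
    """Return the first converter found on PATH, or None if none is installed."""
    for tool in candidates:
        if shutil.which(tool):
            return tool
    return None


def pdf_to_postscript(pdf_path, ps_path):
    """Convert pdf_path to PostScript at ps_path using whichever tool exists."""
    tool = pick_converter()
    if tool is None:
        raise FileNotFoundError("neither pdftops nor pdf2ps was found on PATH")
    # Both tools take the input file followed by the output file.
    subprocess.run([tool, pdf_path, ps_path], check=True)
    return ps_path
```

The resulting .ps file is plain text, so you can then search it for the strings you expect in the form.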
I want to create my own .ttf font.
It should contain emojis.
I have some images (emojis) and I want to put these in a new font (I don't want to edit an existing font and I don't have an empty .ttf template).
I googled and found out that it is possible with Python (I am happy about this because Python is my favorite programming language and, in my opinion, I am good at it) and fontforge.
I already installed fontforge but I can't import it in Python,
and I don't know how to continue after the import.
Can someone give me an example, please?
Or do you know another way to do this? It doesn't have to be Python and fontforge,
but please include an example.
Thank you soooo much 🤗
Since you like using Python, FontTools might be useful for you. See https://fonttools.readthedocs.io/en/latest/colorLib/index.html for documentation regarding building fonts with a COLR table. Also, https://github.com/googlefonts/picosvg and https://github.com/googlefonts/nanoemoji might be of interest.
You didn't actually mention which colour format you want to use for your emoji: bitmaps (CBDT or sbix tables), layered outlines (COLR/CPAL tables), or embedded SVG documents (SVG table). I know the above will work for COLR/CPAL; not sure about CBDT, sbix or SVG.
I've been looking for a fast and relatively easy way of searching (grep-ish) for user-defined strings in files of varying formats, i.e. xlsx, docx, pptx, pdf, using Python.
My research has led me to believe that there might not be a convenient way of doing this with a single module or similar. Am I forced to use a separate module for each file type? And if so, are these appropriate?
docx
openpyxl
pptx
slate
I also looked at forms of decompression to get to the xml-files containing actual text but it seems unwieldy. I just want to be sure that there is no simple, uniform way of handling all of these different filetypes.
Well, I've mostly figured it out. In the end I decided to use powershell combined with "itextsharp.dll" to process the files. It turned out to be simpler than using portable python. Thanks for the answers:-)
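For anyone who still wants to try the decompression route mentioned in the question: docx, xlsx and pptx are all zip archives of XML parts, so a rough, uniform search needs only the standard library. This is a sketch, not a full parser. The function name is made up, the tag stripping is crude, and PDF is not covered at all.

```python
import html
import io
import re
import zipfile


def ooxml_contains(source, needle):
    """Return the XML parts of a docx/xlsx/pptx archive whose text contains needle."""
    hits = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue
            xml = archive.read(name).decode("utf-8", errors="ignore")
            # Crude: drop every tag so text split across runs is joined again.
            text = html.unescape(re.sub(r"<[^>]+>", "", xml))
            if needle.lower() in text.lower():
                hits.append(name)
    return hits


# Demo with a hand-built stand-in for a .docx (a real file path works too).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr(
        "word/document.xml",
        "<w:document><w:p><w:r><w:t>Quarterly re</w:t></w:r>"
        "<w:r><w:t>port</w:t></w:r></w:p></w:document>",
    )
    z.writestr("docProps/app.xml", "<Properties><Application>Demo</Application></Properties>")
buffer.seek(0)
print(ooxml_contains(buffer, "report"))  # ['word/document.xml']
```

Because all tags are removed without inserting whitespace, words at paragraph boundaries run together, which can produce false matches; the per-format modules listed above avoid that.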
I've been trying for about a week to automate image extraction from a pdf. Unfortunately, the answers I found here were of no help. I've seen multiple variations on the same code using pypdf2, all with ['/XObject'] in them, which results in a KeyError.
What I'm looking for seems to be hiding in streams, which I can't find in pypdf2's dictionary (even after recursively exploring the whole structure, calling .getObject() on every indirect object I can find).
Using pypdf2 I've written one page off the pdf and opened it using Notepad++, to find some streams with the /FlateDecode filter.
pdfrw was slightly more helpful, allowing me to use PdfReader(path).pages[page].Contents.stream to get A stream (no clue how to get the others).
Using zlib, I decompressed it, and got something starting with:
/Part <</MCID 0 >>BDC
(It also contains a lot of floating-point numbers, both positive and negative)
From what I could find, BDC has something to do with ghostscript.
At this point I gave up and decided to ask for help.
Is there a python tool to, at least, extract all streams (and identify FlateDecode tag?)
And is there a way for me to identify what's hidden in there? I expected the start tag of some image format, which this clearly isn't. How do I further parse this result to find any image that could be hidden in there?
I'm looking for something I can apply to any PDF that's displayed properly. Some tool to further parse, or at least help me make sense of the streams, or even a reference that will help me understand what's going on.
Edit: it seems, as noted by Patrick, that I was barking up the wrong tree. I went to streams since I couldn't find any xObjects when opening the PDF in Notepad++, or when running the various python scripts used to parse PDFs. I managed to find what I suspect are the images, with no xObject tags, but with what seems like a stream tag - though the information is not compressed.
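On the narrower question of listing streams and spotting the /FlateDecode filter, a naive standard-library scan can look like the sketch below. It assumes an unencrypted file, ignores /Length, object streams and filter chains, and can be fooled by the word "stream" inside data, so treat it as a way to poke around rather than a real parser. The sample bytes are hand-built, not a valid PDF.

```python
import re
import zlib

# Naive: everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def iter_streams(pdf_bytes):
    """Yield (header, data, was_inflated) for each stream found by a naive scan."""
    for match in STREAM_RE.finditer(pdf_bytes):
        # The stream dictionary sits between the preceding "obj" and "stream".
        start = pdf_bytes.rfind(b"obj", 0, match.start())
        header = pdf_bytes[max(start, 0):match.start()]
        data, inflated = match.group(1), False
        if b"/FlateDecode" in header:
            try:
                # decompressobj tolerates the trailing newline before "endstream".
                data = zlib.decompressobj().decompress(data)
                inflated = True
            except zlib.error:
                pass  # leave the raw bytes if they are not plain zlib data
        yield header, data, inflated


# Hand-built sample: one plain object and one Flate-compressed stream.
payload = zlib.compress(b"/Part <</MCID 0 >>BDC")
sample = (
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload)
    + payload
    + b"\nendstream\nendobj\n"
)
for header, data, inflated in iter_streams(sample):
    print(inflated, data)  # True b'/Part <</MCID 0 >>BDC'
```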
Unless you are looking to extract inline images, which aren't that common, the content stream is not the place to look for images. The more common case is a stream of type XObject and subtype Image, usually found in a page's Resource->XObject dictionary (see sections 7.3.3, 7.8.3, and 8.9.5 of the PDF Reference indicated by @mkl).
Alternately, Image XObjects can also be found in Form XObjects (subtype Form, which indicates they have their own content streams) in their own Resource->XObject dictionary, so the search for Image XObjects can be recursive.
An Image XObject can also have a softMask, which is itself its own Image XObject. Form XObjects are also used in Tiling Patterns, and so could conceivably contain Image XObjects (but they aren't that common either), or used in an Annotation's Normal Appearance (but Image XObjects are less commonly used within such Annotations, except maybe 3D or multimedia annotations).
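The recursive search described above can be sketched roughly as follows. The nested dictionary in the demo is a hand-built stand-in for a page's resources, not output from any library, and the function names are made up. The helper calls get_object() when an object offers it, on the assumption that your PDF library exposes dict-like objects with indirect references resolved that way. Tiling patterns and annotation appearances are not covered.

```python
def resolve(obj):
    """Follow an indirect reference if the object offers get_object()."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def find_images(resources, path="", seen=None):
    """Return names of Image XObjects, descending into Form XObjects."""
    seen = set() if seen is None else seen
    found = []
    resources = resolve(resources)
    if not isinstance(resources, dict) or id(resources) in seen:
        return found
    seen.add(id(resources))  # guard against resource dictionaries that loop
    xobjects = resolve(resources.get("/XObject", {}))
    if not isinstance(xobjects, dict):
        return found
    for name, xobj in xobjects.items():
        xobj = resolve(xobj)
        here = f"{path}{name}"
        subtype = xobj.get("/Subtype")
        if subtype == "/Image":
            found.append(here)
            if "/SMask" in xobj:  # a soft mask is itself an Image XObject
                found.append(here + "/SMask")
        elif subtype == "/Form":
            # Form XObjects carry their own resources, so recurse into them.
            found.extend(find_images(xobj.get("/Resources", {}), here, seen))
    return found


page_resources = {
    "/XObject": {
        "/Im0": {"/Subtype": "/Image", "/SMask": {"/Subtype": "/Image"}},
        "/Fm0": {
            "/Subtype": "/Form",
            "/Resources": {"/XObject": {"/Im1": {"/Subtype": "/Image"}}},
        },
    }
}
print(find_images(page_resources))  # ['/Im0', '/Im0/SMask', '/Fm0/Im1']
```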
I'm writing a program that requires input in the form of a document, it needs to replace a few values, insert a table, and convert it to PDF. It's written in Python + Qt (PyQt). Is there any well known document standard which can be easily used programmatically? It must be cross platform, and preferably open.
I have looked into Microsoft Doc and Docx, which are binary formats and I can't edit them. Python has bindings for it, but they're only on Windows.
Open Office's ODT/ODF is XML inside a zip archive, so I can edit that one, but there are no command line utilities or any other way to programmatically convert the file to a PDF. Open Office provides bindings, but you need to run Open Office from the command line, start a server, etc. And my clients may not have Open Office installed.
RTF is readable from Python, but I couldn't find any way/libraries to convert RTF documents to PDF.
At the moment I'm exporting from Microsoft Word to HTML, replacing the values and using PyQt to convert it to a PDF. However it loses formatting features and looks awful. I'm surprised there isn't a well known library which lets you edit a variety of document formats and convert them into other formats, am I missing something?
Update: Thanks for the advice, I'll have a look at using Latex.
Thanks,
Jackson
Have you looked into using LaTeX documents?
They are perfect to use programmatically (compiling documents? You gotta love that...), and you have several Python frameworks you can use such as plasTeX and PyTex.
Exporting a LaTeX document to PDF is almost immediate.
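As a rough illustration of the "replace a few values, insert a table" part, plain string templating from the standard library is often enough; no framework is strictly needed. Everything in this sketch (the invoice layout, the field names, the "@" placeholder delimiter) is invented for the example, and the escaping helper covers only the most common special characters, not backslash, tilde or caret.

```python
import string


class TexTemplate(string.Template):
    # "@" avoids clashes with LaTeX's own "$", "{" and "}".
    delimiter = "@"


TEMPLATE = TexTemplate(r"""\documentclass{article}
\begin{document}
Invoice for @customer, total @total.

\begin{tabular}{lr}
Item & Price \\
\hline
@rows
\end{tabular}
\end{document}
""")

_TEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                 "_": r"\_", "{": r"\{", "}": r"\}"}


def tex_escape(text):
    """Escape the most common LaTeX special characters in user data."""
    return "".join(_TEX_SPECIALS.get(ch, ch) for ch in text)


def render(customer, items):
    """Fill the template with a customer name and (name, price) table rows."""
    rows = "\n".join(f"{tex_escape(name)} & {price:.2f} \\\\" for name, price in items)
    total = sum(price for _, price in items)
    return TEMPLATE.substitute(customer=tex_escape(customer),
                               total=f"{total:.2f}", rows=rows)


tex_source = render("Smith & Co", [("Widget", 12.5), ("Gadget_2", 7.25)])
print(tex_source)
```

Writing tex_source to a file such as invoice.tex and running pdflatex on it (for example via subprocess.run(["pdflatex", "-interaction=nonstopmode", "invoice.tex"], check=True)) should then produce the PDF, assuming a TeX distribution is installed on the client machine.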
Since you're already using PyQt anyway, it might be worth looking at Qt's built-in RTF processing module which looks decent. Here's the documentation on detailed content manipulation including inserting tables. Also the QPrinter module's default print-to-file format happens to be PDF.
Without knowing more about your particular needs it's hard to say if these would do what you want, but since your application already has PyQt as a dependency, seems silly to introduce any more without evaluating the functionality you've already got available.
The non-GUI parts of the Qt framework are often overlooked though.
edit: included more links.
You might want to try ReportLab. The open source version can write PDFs, and the commercial version has a lot of really nice abstractions to allow output to a variety of different formats from a single input.
I don't know what kind of audience your program has; TeX is good and I would go with it.
Another possible choice is the Excel format, parsing it with xlrd.
I've used it a couple of times and it's pretty straightforward.
An Excel file is a good choice for the following reasons:
It's a well known format that is easy to edit
You could prepare a predefined template with constraints and tables
Another route: create XML documents, transform them to XSL-FO, and render with FOP or RenderX. If you use docbook as the primary input, there are toolchains freely available for converting that to PDF, RTF, HTML and so forth.
It is rather quirky to use and not my idea of fun, but it does deliver and can be embedded in an application, AFAICT.
Creating docbook is very straightforward as it has a wide range of semantic tags, table support etc to give a "meaningful" markup which can be reliably formatted. The XSL stylesheets are modular and allow parts to be customized or replaced to generate your own look and feel.
It works well for relatively free flow documents with lots of text.
For fill-in-the-blanks kinds of documents, a regular reporting engine may be a better fit, or some straightforward XSL stylesheets spitting out the XSL-FO directly.