Google App Engine (So "Pure Python"): Convert PDF to Image


In Google App Engine, I need to be able to take an uploaded PDF and convert it to an image (or maybe one day a number of tiled images) for storing and serving back out. Is there a library that reads PDF files and is also 100% Python (so it can be uploaded with my app)?
From what I've gathered so far:
PIL does not read PDF files; it only writes them.
Ghostscript is the standard FOSS PDF reader, but I don't believe I can upload it with my app to GAE, since it isn't pure Python.
Is there anything else I might be able to use? Or maybe even a web service that I can call?

You may want to look into using the GAE Conversion API (not yet fully released). There's a tester signup form here, with a link to further details.
From the doc:
Conversions can be performed in any direction between PDF, HTML, TXT, and image formats, and OCR will be employed if necessary. Note that while PNG, GIF, JPEG, and BMP image formats are supported as input formats, only PNG is available for output.
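If the uploaded PDFs are scans, a partial pure-Python workaround exists that needs no library at all: scanned pages are often stored as JPEG streams (the PDF /DCTDecode filter), and such a stream is a complete JPEG file sitting inside the PDF bytes. The sketch below is a heuristic, not a PDF renderer: it simply slices out everything between the JPEG start-of-image marker (FF D8 FF) and end-of-image marker (FF D9). It assumes the images are not additionally Flate-compressed, it renders no text or vector content, and a JPEG carrying an embedded EXIF thumbnail may be cut short at the thumbnail's own end marker.

```python
# Heuristic sketch: pull raw JPEG streams out of a PDF's bytes.
# Assumes images are stored as plain /DCTDecode streams (no extra Flate
# compression); text and vector content are not rendered at all.
JPEG_START = b"\xff\xd8\xff"  # SOI marker plus the 0xFF of the next marker
JPEG_END = b"\xff\xd9"        # EOI marker


def extract_jpegs(pdf_bytes):
    """Return a list of byte strings, each one an embedded JPEG file."""
    images = []
    pos = 0
    while True:
        start = pdf_bytes.find(JPEG_START, pos)
        if start < 0:
            break
        end = pdf_bytes.find(JPEG_END, start)
        if end < 0:
            break  # truncated stream: stop rather than return a partial image
        end += len(JPEG_END)
        images.append(pdf_bytes[start:end])
        pos = end
    return images
```

Each returned byte string can be stored as-is and served back with an image/jpeg content type.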

Related

How do I extract data from a blueprint file with JS?

I've been looking around to see if I can find a JS library for extracting data from a blueprint in a PDF or PNG file.
Blueprint file sample
I haven't actually found any library that could be used to solve this problem. I'd appreciate it if someone out there could help.

For extracting text from PDF files:
pdf.js
pdfminer
PyPDF2 (Python)
For analyzing images (PNG files, or PDF pages once they have been rendered to images):
OpenCV (Python)
Pillow (Python)
These libraries can extract the data from the files, but you would have to write additional code to specifically extract information from a blueprint. It may be a complex process, as blueprints can have a variety of different formats and layouts, making it challenging to extract information in a consistent and automated way.
If you are looking to extract information from blueprint files, it may be helpful to consult with a software engineer or computer vision specialist to determine the best approach.
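As a toy illustration of the "additional code" mentioned above, the sketch below finds long horizontal strokes (candidate walls or dimension lines) in a blueprint that has already been binarized into rows of 0/1 pixels. The function name and the min_length threshold are hypothetical; on real scans you would more likely threshold the image and use OpenCV (for example cv2.HoughLinesP), but the pure-Python version shows the idea.

```python
def find_horizontal_lines(pixels, min_length=5):
    """Return (row, start_col, end_col) for every run of 1s at least min_length long.

    pixels: list of rows, each a list of 0/1 values (1 = ink).
    end_col is inclusive.
    """
    lines = []
    for y, row in enumerate(pixels):
        run_start = None
        # Append a sentinel 0 so a run touching the right edge is still closed
        for x, value in enumerate(list(row) + [0]):
            if value and run_start is None:
                run_start = x
            elif not value and run_start is not None:
                if x - run_start >= min_length:
                    lines.append((y, run_start, x - 1))
                run_start = None
    return lines
```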

Python web crawling/scraping - Download diagram (PDF or TIFF) from webpage and save to local machine

I have a website with a search box: I enter a numeric value and press Enter. It then goes to another page that displays some content containing URLs; if I click one of those URLs, it asks me to save a diagram, which is either in TIFF or PDF format.
To download TIFF diagrams, I am using the swift plugin in Internet Explorer and saving them to my machine.
I am currently doing this work manually and want to automate the whole process.
Steps:
Use the Python requests module and pass the URL with the numeric value to the post method
Save the response content to a variable
Perform pattern matching and fetch the URL
Open the URL: this is where I am stuck, since I need to save the diagram locally and it is a TIFF.
Is there any module to download a TIFF diagram and save it to the local machine?

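I just want to share how I resolved the issue, in case it is useful to others.
Since the TIFF image has to be downloaded from the web, I used the Python requests module together with Pillow, as below:
from io import BytesIO
import requests
from PIL import Image
tiffURL = 'https://***.tif'
response = requests.get(tiffURL)
response.raise_for_status()  # fail loudly on 404/500 instead of parsing an error page
img = Image.open(BytesIO(response.content))
# JPEG cannot store every TIFF mode (e.g. RGBA or palette images), so convert to RGB first
img.convert('RGB').save('imagename.jpg')
#img.convert('RGB').save('imagename.jpg', quality=95)
Note:
TIFF images can't be opened in many ordinary image viewers, so I converted it to JPEG.
For higher JPEG quality (less compression), you can pass quality=95 to the save method.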
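The pattern-matching step from the question (finding the diagram links on the results page) is usually more robust with an HTML parser than with a regular expression. A minimal sketch using only the standard library's html.parser, assuming the diagrams are linked with ordinary <a href> tags whose URLs end in .tif, .tiff or .pdf; the class and function names are hypothetical. Relative links are resolved against the page URL with urllib.parse.urljoin, so each result can be passed straight to requests.get() as in the answer above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

DIAGRAM_EXTENSIONS = (".tif", ".tiff", ".pdf")


class DiagramLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at TIFF or PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension
        if href and href.split("?", 1)[0].lower().endswith(DIAGRAM_EXTENSIONS):
            self.links.append(urljoin(self.base_url, href))


def find_diagram_links(html, base_url):
    """Return absolute URLs of all TIFF/PDF links found in html."""
    parser = DiagramLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```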

Is there a Python library to create thumbnails for various document file formats?

I'd like to generate thumbnails from various "document" file formats such as odt, doc(x) and ppt(x) but also mp4, psd, tiff (and possibly others) from a Python application. As far as I know, for each of these formats there is at least one open source application which can generate preview images/thumbnails (e.g. LibreOffice, ffmpeg) or at least extract embedded thumbnails (e.g. ImageMagick).
My main problem is that each of these applications/libraries uses different command-line options, so I'm looking for a Python library (or a unified CLI tool) that provides a high-level API to generate a thumbnail with specified dimensions and quality level, given a filename, and that calls the appropriate external tool (ideally including catching exceptions, segfaults and timeouts). Bonus points if it can generate multiple thumbnails if requested (e.g. one per page, pages X-Y, every Z seconds but at most N images).
Does anyone know such a library/utility? (Boundary condition: The files may contain sensitive material or might be quite big so this must work without any network communication, using an external web service is not possible.)
If there is no such thing in Python, a locally installable web service would be fine as well.

I ended up writing my own library (named anythumbnailer, MIT license), which worked well enough for my immediate needs. The library is not what I envisioned (only basic thumbnailing, no support for dimensions, …), but it can generate thumbnails for doc(x), xls(x), ppt(x), videos and PDF on Linux with the help of ffmpeg and LibreOffice.
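The dispatching layer the question asks for can be sketched in a few lines on top of subprocess. This is not anythumbnailer's code: it is a minimal sketch, assuming ffmpeg, ImageMagick's convert and LibreOffice's soffice are on the PATH, and the extension-to-tool mapping and function names are hypothetical. Failures (missing tool, non-zero exit, termination by a signal such as a segfault, timeout) come back as return values instead of exceptions.

```python
import os
import subprocess
import sys

VIDEO = {".mp4", ".mkv", ".avi", ".mov"}
RASTER = {".psd", ".tif", ".tiff", ".pdf"}  # ImageMagick; PDF also needs Ghostscript
OFFICE = {".odt", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}


def build_thumbnail_command(src, dest, width=256):
    """Return the external command (as an argv list) that thumbnails src into dest."""
    ext = os.path.splitext(src)[1].lower()
    if ext in VIDEO:
        # Grab one frame one second in, scaled to `width` keeping the aspect ratio
        return ["ffmpeg", "-y", "-ss", "1", "-i", src,
                "-frames:v", "1", "-vf", "scale=%d:-1" % width, dest]
    if ext in RASTER:
        # "[0]" makes ImageMagick read only the first page/layer
        return ["convert", src + "[0]", "-thumbnail", "%dx%d" % (width, width), dest]
    if ext in OFFICE:
        # LibreOffice names the output <basename>.png inside --outdir, not `dest`
        out_dir = os.path.dirname(dest) or "."
        return ["soffice", "--headless", "--convert-to", "png", "--outdir", out_dir, src]
    raise ValueError("no thumbnailer configured for %r" % ext)


def run_thumbnailer(cmd, timeout=60):
    """Run cmd and return (ok, message); tool failures never raise."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except FileNotFoundError:
        return False, "tool not installed: %s" % cmd[0]
    except subprocess.TimeoutExpired:
        return False, "timed out"  # subprocess.run kills the child on timeout
    if result.returncode < 0:
        # POSIX: a negative return code means "killed by that signal" (11 = SIGSEGV)
        return False, "killed by signal %d" % -result.returncode
    if result.returncode != 0:
        return False, result.stderr.decode(errors="replace").strip()
    return True, "ok"


if __name__ == "__main__" and len(sys.argv) == 3:
    print(run_thumbnailer(build_thumbnail_command(sys.argv[1], sys.argv[2])))
```

Keeping command construction separate from execution makes the mapping easy to test and extend without invoking any of the tools.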

You can look at preview-generator, a library for generating previews (thumbnails, PDF, text and JSON overviews) for all your file-based content. It gives you access to JPEG, PDF, text, HTML and JSON previews of virtually any kind of file, and it includes a cache mechanism so you do not have to manage preview storage yourself.

Approaches to embedding vector images/charts into PDF

How have people from the Linux world embedded vector images into PDF?
I am attempting to create automated reports from data that I currently render as SVG images. Ideally, I would like to use the same XML in PostScript, PDF or DjVu format. To what degree are those formats able to handle SVG natively?
More broadly, what have people's experiences been? Should I
reuse the native SVG XML?
rasterise SVGs that have already been created?
or use another format?
I'm restricted to formats that are accessible from Ubuntu 10.04 & Python. This will probably exclude me from utilising Adobe Illustrator files.

Investigate Apache FOP; its main purpose is to convert XML to PDF.
Upsides (for this project):
full Apache project (=> reliable)
Downsides (for this project):
Will need to learn XSL-FO
Not Python
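FOP also fits the "reuse the native SVG XML" option: XSL-FO can carry an SVG document inline inside fo:instream-foreign-object, and FOP renders it as vector graphics in the PDF. A minimal sketch, assuming you generate the FO file from Python with the standard library's ElementTree instead of writing an XSLT stylesheet; the page size, margins and function names are illustrative choices, not anything FOP requires.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("fo", FO_NS)
ET.register_namespace("svg", SVG_NS)


def fo_element(tag, parent=None, **attrs):
    """Create an fo:<tag> element; underscores in keyword names become hyphens."""
    attrs = {key.replace("_", "-"): value for key, value in attrs.items()}
    name = "{%s}%s" % (FO_NS, tag)
    if parent is None:
        return ET.Element(name, attrs)
    return ET.SubElement(parent, name, attrs)


def build_report_fo(title, svg_markup):
    """Return an XSL-FO document (as a string) with a title and one inline SVG chart."""
    root = fo_element("root")
    masters = fo_element("layout-master-set", root)
    page = fo_element("simple-page-master", masters, master_name="A4",
                      page_height="29.7cm", page_width="21cm", margin="2cm")
    fo_element("region-body", page)

    sequence = fo_element("page-sequence", root, master_reference="A4")
    flow = fo_element("flow", sequence, flow_name="xsl-region-body")
    fo_element("block", flow, font_size="16pt").text = title

    # The SVG is embedded as XML, so it stays vector data all the way into the PDF
    holder = fo_element("instream-foreign-object", fo_element("block", flow))
    holder.append(ET.fromstring(svg_markup))
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to report.fo and running something like `fop -fo report.fo -pdf report.pdf` would then produce the PDF.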

Batik is a nice Java SVG library. It ships a utility, batik-rasterizer.jar, which can convert SVG into several useful formats: PDF, TIFF, PNG, and GIF.
You could use Jython to call this library from Python.
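Without Jython, the rasterizer can also be driven from ordinary CPython as a separate Java process. A sketch, assuming java is on the PATH and batik-rasterizer.jar is at the given path; -m selects the output MIME type and -d the destination file or directory. PDF output relies on Batik's PDF transcoder, which may need extra jars depending on how your Batik distribution is packaged. The function names are hypothetical.

```python
import subprocess

MIME_TYPES = {
    "pdf": "application/pdf",
    "png": "image/png",
    "tiff": "image/tiff",
}


def batik_command(svg_path, dest, fmt="pdf", jar="batik-rasterizer.jar"):
    """Build the java command that converts svg_path to fmt, writing to dest."""
    return ["java", "-jar", jar, "-m", MIME_TYPES[fmt], "-d", dest, svg_path]


def convert_svg(svg_path, dest, fmt="pdf", jar="batik-rasterizer.jar", timeout=120):
    """Run Batik; check=True raises CalledProcessError if it exits with an error."""
    subprocess.run(batik_command(svg_path, dest, fmt, jar), check=True, timeout=timeout)
```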

Text to a PNG on App Engine (Python)

Note: I am cross-posting this from App Engine group because I got no answers there.
As part of my site about Japan, I have a feature where the user can
get a large PNG for use as desktop background that shows the user's
name in Japanese. After switching my site hosting entirely to App
Engine, I removed this particular feature because I could not find any
way to render text to a PNG using the image API.
In other words, how would you go about outputting a Unicode string on
top of an image of known dimensions (1024x768 for example), so that
the text is as large as possible horizontally and centered
vertically? Is there a way to do this in App Engine, or is there some
external service besides App Engine that could make this easier for
me, that you could recommend (besides running ImageMagick on your own
server)?

Solution #1. Pure Python image library.
You can try to bundle PyPNG with your application. PyPNG is a pure Python library to create PNG images. It depends on the zlib module, which is allowed on App Engine, so PyPNG should work on App Engine. Just use in-memory file objects (StringIO on the Python 2 runtime, io.BytesIO on Python 3) instead of real files and write the PNG data to them.
Shamelessly adapting the PyPNG example of how to make a bitmap PNG image (written here for Python 3):
import io
import png  # PyPNG, bundled with the app
# bitmap data: one string per row; with greyscale bitdepth=1, 0 = black and 1 = white
s = ['110010010011',
     '101011010100',
     '110010110101',
     '100010010011']
s = [[int(c) for c in row] for row in s]
f = io.BytesIO()
w = png.Writer(len(s[0]), len(s), greyscale=True, bitdepth=1)
w.write(f, s)
# binary PNG data
png_data = f.getvalue()
I suspect suboptimal performance, but as far as I know there is no other way to generate images on GAE.
And you still need to figure out how to rasterize text to produce bitmap data. The easiest way, probably, is just to keep bitmaps of all the symbols around (essentially, using a bitmap font).
To render ASCII text with PyPNG take a look at texttopng script.
So, limitations are:
Probably slow (needs to be checked)
Glyph rasterization is to be addressed
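The bitmap-font approach can be sketched end to end with nothing but the standard library: zlib and struct are enough to write a valid 8-bit greyscale PNG, so this version does not even need PyPNG. The 3x5 glyphs below are hypothetical placeholders covering three characters; a real bitmap font (for Japanese, thousands of glyphs) would replace the GLYPHS table. Integer scaling keeps pixels crisp but blocky, the usual limitation of an unsmoothed bitmap font. Written for Python 3.

```python
import struct
import zlib

# Hypothetical 3x5 glyphs ('1' = ink); a real bitmap font would be much larger.
GLYPHS = {
    "H": ["101", "101", "111", "101", "101"],
    "I": ["111", "010", "010", "010", "111"],
    " ": ["000", "000", "000", "000", "000"],
}


def render_text(text, scale=1):
    """Return rows of 0/1 pixels for text, one blank column between glyphs."""
    height = len(next(iter(GLYPHS.values())))
    rows = []
    for y in range(height):
        line = "0".join(GLYPHS[ch][y] for ch in text)
        pixels = [int(bit) for bit in line for _ in range(scale)]
        rows.extend([pixels] * scale)  # repeated row objects are fine: rows are only read
    return rows


def _chunk(kind, data):
    """Length + type + data + CRC32(type + data), as the PNG format requires."""
    body = kind + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


def encode_png(rows, ink=0, paper=255):
    """Encode 0/1 rows as an 8-bit greyscale PNG (ink = 1 pixels, paper = 0 pixels)."""
    height, width = len(rows), len(rows[0])
    # Each scanline starts with filter type 0 (None)
    raw = b"".join(b"\x00" + bytes(ink if p else paper for p in row) for row in rows)
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)  # bit depth 8, greyscale
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))
```

On App Engine the returned bytes can be written straight into the HTTP response with an image/png content type.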
Solution #2. Off-site text-to-image rendering.
Google AppEngine does not provide tools to render text as raster images, but Google Charts does. With a proper choice of parameters, the outline text chart just renders simple text to PNG images.
For example, http://chart.apis.google.com/chart?chst=d_text_outline&chld=000000|32|h|FFFFFF|_|Render text to image|with Google Charts.|Some Unicode too:|Здра́вствуйте|こんにちは|नमस्ते|你好|שלו produces this:
Limitations:
You cannot generate images bigger than 300000 pixels
Style and font customizations are limited
Some Unicode scripts are not available
White background only
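Building such a URL from Python mostly comes down to percent-encoding the text. A sketch that reuses the fixed style fields from the example URL above (000000|32|h|FFFFFF|_) unchanged and only substitutes the text lines; it assumes the lines contain no "|" characters, since "|" separates the chld fields. The Google Charts image API has since been deprecated, so the service may no longer answer such requests.

```python
from urllib.parse import quote

CHART_BASE = "http://chart.apis.google.com/chart"
STYLE_FIELDS = ["000000", "32", "h", "FFFFFF", "_"]  # copied verbatim from the example URL


def outline_text_url(lines):
    """Return a d_text_outline chart URL rendering each string on its own line."""
    if any("|" in line for line in lines):
        raise ValueError("'|' separates chld fields and cannot appear in the text")
    chld = "|".join(STYLE_FIELDS + list(lines))
    # quote() UTF-8 encodes non-ASCII text; '|' is kept literal as in the example
    return "%s?chst=d_text_outline&chld=%s" % (CHART_BASE, quote(chld, safe="|"))
```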

I ran into this same problem with writing text to an image. The issue at hand is that any imaging library used on Google App Engine must be pure Python, which rules out PIL.
PyBMP
PyBMP is a pure-Python library that can do simple text rendering. From there you can use Google's Images API to composite the resulting bitmap onto your other pictures. There's some sample code below. The downside is that the library lacks nicer features like anti-aliasing and fine control over fonts, so the text it renders looks kind of crappy. It also may or may not handle Unicode well.
import bmp                   # PyBMP module, bundled with the app
import bmpfont_Tw_Cen_MT_30  # font module generated with PyBMP's font tool
name = 'Your name'  # placeholder: the text to render
# Create a 300x35 image with a white background
text_img = bmp.BitMap(300, 35, bmp.Color.WHITE)
text_img.setFont(bmpfont_Tw_Cen_MT_30.font_data)
text_img.setPenColor(bmp.Color.BLACK)
text_img.drawText(name, 0, 0)
After this you can use the App Engine Images API's composite function on text_img.getBitmap(), as you would with any other image.
External Image Processing
If the text isn't good enough (it wasn't for my project), an alternative is to set up an external server on a service like Rackspace purely for image processing. Set up an HTTP handler that does your image processing with PIL and returns the resulting image. From there you can either
upload the result straight to your static file hosting server (like S3), or
fetch the generated text image with App Engine's urlfetch library and do the rest of your compositing in App Engine.
Not pretty, but it gets the job done.

It's a bit too late but I was looking for the same. I managed to draw unicode string (here Devanagari) onto an image and save it as a '.png' file by doing the following:
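# -*- coding: utf-8 -*-
from PIL import Image, ImageDraw, ImageFont
img = Image.new('L', (16, 16), 255)  # 16x16 greyscale canvas, white background
draw = ImageDraw.Draw(img)
text_to_draw = 'क'  # a Python 3 str is already Unicode (Python 2: u'क')
# the font file must contain Devanagari glyphs
font = ImageFont.truetype('Path/to/font/file', 12)
draw.text((2, 2), text_to_draw, font=font, fill=0)  # fill=0: black text
del draw
img.save('image.png')
P.S. I got help from other posts on Stack Overflow.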

[Stop press: as a comment points out, this answer doesn't work in Google App Engine.]
The Python Imaging Library (PIL) can accomplish this.
You can load the image and draw Unicode text on it with the ImageDraw.text() function.
You may need to call ImageDraw.textsize() a few times with different font sizes to find the largest font that will fit (in current Pillow, use ImageDraw.textbbox() instead, since textsize() was removed in Pillow 10).
Finally, you can save the .png image to a file (or serve it back directly).
If you are running it within a web server, test with large images to make sure you can allocate enough memory to process large PNG files.
(Have I answered your question appropriately? I don't know if PIL is an option from within the Google App Engine.)
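The "try several font sizes" step can be made systematic with a binary search, because rendered text width grows with font size. In this sketch the measuring function is passed in: with Pillow it could be something like `lambda size: ImageFont.truetype(font_path, size).getlength(text)` (font_path being your own font file), but any non-decreasing width function works, which keeps the search testable without fonts. The centring helper covers the question's "centered vertically" requirement; both function names are hypothetical.

```python
def largest_fitting_size(measure, max_width, lo=1, hi=1000):
    """Largest integer size in [lo, hi] with measure(size) <= max_width, or None.

    Assumes measure() is non-decreasing in size (true for text width vs. font size).
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure(mid) <= max_width:
            best = mid      # fits: remember it and try larger sizes
            lo = mid + 1
        else:
            hi = mid - 1    # too wide: try smaller sizes
    return best


def centered_origin(canvas_size, text_size):
    """Top-left (x, y) that centres a text box of text_size on the canvas."""
    (cw, ch), (tw, th) = canvas_size, text_size
    return (cw - tw) // 2, (ch - th) // 2
```

This needs about ten measurements for sizes up to 1000 instead of trying every size in turn.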

Develop Reference
