Get a preview JPEG of a PDF on Windows? - python

I have a cross-platform (Python) application which needs to generate a JPEG preview of the first page of a PDF.
On the Mac I am spawning sips. Is there something similarly simple I can do on Windows?

ImageMagick delegates the PDF->bitmap conversion to GhostScript anyway, so here's a command you can use (it's based on the actual command listed by the ps:alpha delegate in ImageMagick, just adjusted to use JPEG as output):
gs -q -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
-dMaxBitmap=500000000 -dLastPage=1 -dAlignToPixels=0 -dGridFitTT=0 \
-sDEVICE=jpeg -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 \
-sOutputFile=$OUTPUT -f$INPUT
where $OUTPUT and $INPUT are the output and input filenames. Adjust the 72x72 to whatever resolution you need. (Obviously, strip out the backslashes if you're writing out the whole command as one line.)
This is good for two reasons:
You don't need to have ImageMagick installed anymore. Not that I have anything against ImageMagick (I love it to bits), but I believe in simple solutions.
ImageMagick does a two-step conversion. First PDF->PPM, then PPM->JPEG. This way, the conversion is one-step.
Other things to consider: with the files I've tested, PNG compresses better than JPEG. If you want to use PNG, change the -sDEVICE=jpeg to -sDEVICE=png16m.

You can use ImageMagick's convert utility for this, see some examples in http://studio.imagemagick.org/pipermail/magick-users/2002-May/002636.html
:
Convert taxes.pdf taxes.jpg
Will convert a two page PDF file into [2] jpeg files: taxes.jpg.0,
taxes.jpg.1
I can also convert these JPEGS to a thumbnail as follows:
convert -size 120x120 taxes.jpg.0 -geometry 120x120 +profile '*' thumbnail.jpg
I can even convert the PDF directly to a jpeg thumbnail as follows:
convert -size 120x120 taxes.pdf -geometry 120x120 +profile '*' thumbnail.jpg
This will result in a thumbnail.jpg.0 and thumbnail.jpg.1 for the two
pages.

Is the PC likely to have Acrobat installed? I think Acrobat installs a shell extension so previews of the first page of a PDF document appear in Windows Explorer's thumbnail view. You can get thumbnails yourself via the IExtractImage COM API, which you'll need to wrap. VBAccelerator has an example in C# that you could port to Python.

Related

ImageMagick is splitting the NASA's [.IMG] file by a black line upon converting to JPG

I have some raw .IMG format files which I'm converting to .jpg using ImageMagick to apply a CNN Classifier. The converted images, however have a black vertical line splitting the image into two. The part on the left side of the line should have actually been on the right side of the right part of the image. I've posted a sample image:
I used the command magick convert input_filename.IMG output_filename.jpg
Raw .IMG File
Here is how the image is supposed to look (converted manually using numpy):
How the image is actually looking (with the vertical black line using ImageMagick):
Version Details:
harshitjindal#Harshits-MacBook-Pro ~ % magick identify -version
Version: ImageMagick 7.0.10-0 Q16 x86_64 2020-03-08
https://imagemagick.org Copyright: © 1999-2020 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php Features: Cipher DPC
HDRI Modules OpenMP(3.1) Delegates (built-in): bzlib freetype heic jng
jp2 jpeg lcms ltdl lzma openexr png tiff webp xml zlib
I don't know why ImageMagick is failing to interpret the file correctly, but I can show you how to make it work.
You need to search in your file for the height, width and data type of your image, you can do that like this:
grep -E "LINES|LINE_SAMPLES|BITS" EW0220149939G.IMG
LINES = 1024
LINE_SAMPLES = 1024
SAMPLE_BITS = 8
That means your image is 1024x1024 and 8 bits/sample (1 byte). Then you need to take that number of bytes from the tail end of the file and feed them into ImageMagick. So, you need the final 1024x1024 bytes which you can get with tail or gtail (GNU tail) as you are on a Mac.
gtail -c $((1024*1024*1)) EW0220149939G.IMG | convert -depth 8 -size 1024x1024 gray:- result.jpg
If your image is 16-bit, like in your other question, you need to use:
gtail -c $((1024*1024*2)) 16-BIT-IMAGE.IMG | convert -depth 16 -size 1024x1024 gray:- result.jpg
If you dislike using gtail to get the last megabyte, you can alternatively specify an offset from the start of the file that tells ImageMagick where the pixel data starts. So, first you need the size of the header:
grep -E "RECORD_BYTES|LABEL_RECORDS" EW*IMG
RECORD_BYTES = 1024
LABEL_RECORDS = 0007
That means we need to skip 1024*7 bytes to get to the image, so the command is:
convert -size 1024x1024+$((1024*7)) -depth 8 gray:EW0220149939G.IMG result.jpg

gdal_merge overlaying pngs over one another

I have a lot of PNGs that were tiled with gdal2tiles.py. I have done some processing on these tiles and now I would like combine them back into one large TIF.
For example I have folder 13-20 for different zoom levels, let's say I want all the PNGs from zoom level 20 to be a single mosaic, how would I do with gdal_merge? I'm using gdal_merge now trying this but I end up getting the last .PNG that it processes so my TIF is just a 256x256 TIF of the last processed PNG. Here is my current code,
python gdal_merge.py -o mos.tif -of GTiff -v --optfile tif_list.txt
tif_list.txt contains the list of all my PNGs
I'm assuming I might need to add a -co option but I cannot find any documentation on what I can use in -co. If this is needed my coordinate system is EPSG 3857, and the tiles were generated as mercator. Any help would be appreciated.
Update:
format of tif_list.txt,
C:\Users\Administrator\Desktop\19\195953\226590.png
C:\Users\Administrator\Desktop\19\195954\226581.png
C:\Users\Administrator\Desktop\19\195954\226582.png
C:\Users\Administrator\Desktop\19\195954\226583.png
C:\Users\Administrator\Desktop\19\195954\226584.png
C:\Users\Administrator\Desktop\19\195954\226585.png
C:\Users\Administrator\Desktop\19\195954\226586.png
C:\Users\Administrator\Desktop\19\195954\226587.png
C:\Users\Administrator\Desktop\19\195954\226588.png
C:\Users\Administrator\Desktop\19\195954\226589.png
C:\Users\Administrator\Desktop\19\195954\226590.png
C:\Users\Administrator\Desktop\19\195955\226581.png
C:\Users\Administrator\Desktop\19\195955\226582.png
C:\Users\Administrator\Desktop\19\195955\226583.png
C:\Users\Administrator\Desktop\19\195955\226584.png
C:\Users\Administrator\Desktop\19\195955\226585.png
C:\Users\Administrator\Desktop\19\195955\226586.png
C:\Users\Administrator\Desktop\19\195955\226587.png
C:\Users\Administrator\Desktop\19\195955\226588.png
C:\Users\Administrator\Desktop\19\195955\226589.png
C:\Users\Administrator\Desktop\19\195955\226590.png
examples of the PNGs,

Stitch images together from html

This website has a large image combrised of 132 images, I want to find a way to stitch them together into a single image. I know some python if anyone has a clue where to start.
http://www.mapytatr.net/PRODUKTY/MAPY_TAT/WYSOKIE/SLICES/wys_ii.htm
Thanks!
Matt
Forget Python - use ImageMagic (http://www.imagemagick.org/)
+append to create row
convert tpn_1.jpg tpn_2.jpg tpn_3.jpg +append row_1.jpg
-append to create column
convert row_1.jpg row_2.jpg row_3.jpg -append final.jpg
You can try also montage (from ImageMagic too) to get all in one command
montage -mode concatenate -tile 3x3 tpn_* final.jpg
BTW: on Linux you can download all images using wget and for in bash
for i in $(seq 132); do echo "http://www.mapytatr.net/PRODUKTY/MAPY_TAT/WYSOKIE/SLICES/tpn_$i.jpg"; done | wget -i -
kochane tatry :)

Saving thumbnails as fits files

Most of my code takes a .fits file and creates small thumbnail images that are based upon certain parameters (they're images of galaxies, and all this is extraneous information . . .)
Anyways, I managed to figure out a way to save the images as a .pdf, but I don't know how to save them as .fits files instead. The solution needs to be something within the "for" loop, so that it can just save the files en masse, because there are way too many thumbnails to iterate through one by one.
The last two lines are the most relevant ones.
for i in range(0,len(ra_new)):
ra_new2=cat['ra'][z&lmass&ra&dec][i]
dec_new2=cat['dec'][z&lmass&ra&dec][i]
target_pixel_x = ((ra_new2-ra_ref)/(pixel_size_x))+reference_pixel_x
target_pixel_y = ((dec_new2-dec_ref)/(pixel_size_y))+reference_pixel_y
value=img[target_pixel_x,target_pixel_y]>0
ra_new3=cat['ra'][z&lmass&ra&dec&value][i]
dec_new_3=cat['dec'][z&lmass&ra&dec&value][i]
new_target_pixel_x = ((ra_new3-ra_ref)/(pixel_size_x))+reference_pixel_x
new_target_pixel_y = ((dec_new3-dec_ref)/(pixel_size_y))+reference_pixel_y
fig = plt.figure(figsize=(5.,5.))
plt.imshow(img[new_target_pixel_x-200:new_target_pixel_x+200, new_target_pixel_y-200:new_target_pixel_y+200], vmin=-0.01, vmax=0.1, cmap='Greys')
fig.savefig(image+"PHOTO"+str(i)+'.pdf')
Any ideas SO?
For converting FITS images to thumbnails, I recommend using the mJPEG tool from the "Montage" software package, available here: http://montage.ipac.caltech.edu/docs/mJPEG.html
For example, to convert a directory of FITS images to JPEG files, and then resize them to thumbnails, I would use a shell script like this:
#!/bin/bash
for FILE in `ls /path/to/images/*.fits`; do
mJPEG -gray $FILE 5% 90% log -out $FILE.jpg
convert $FILE.jpg -resize 64x64 $FILE.thumbnail.jpg
done
You can, of course, call these commands from Python instead of a shell script.
As noted in a comment, the astropy package (if not yet installed) will be useful:
http://astropy.readthedocs.org. You can import the required module at the beginning.
from astropy.io import fits
At the last line, you can save a thumbnail FITS file.
thumb = img[new_target_pixel_x-200:new_target_pixel_x+200,
new_target_pixel_y-200:new_target_pixel_y+200]
fits.writeto(image+str(i).zfill(3)+'.fits',thumb)

Ghostscript PDF to PNG: Character spacing of words gets messed up in the resulting image

From a pdf file, I am successfully generating 1 png image for each page in the pdf.
The problem is that no matter what setting I use, for some pages GhostScript messes up the font spacing such that in some pngs, one word looks like it is 2 or 3 words.
This is a problem as I am using these files in evernote which messes up expected search results. So a search for "Providers" returns nothing because in the png file, it appears as "Pro vid e rs" (or 'Users' appears as "Use rs").
Dropbox link to a screenshot showing the original text of the source pdf on the left and generated png on the right --> http://dl.dropbox.com/u/13267240/ScreenClip.png
I am new to Ghostscript and am at a loss as to why this is happening.
Here is the command line I am using (in Python):
cmd = "gswin%sc " % (SYS_PROCESSOR_ARCH) + "-q -dNOPAUSE -dBATCH -dPDFFitPage=true -sDEVICE=png16m -r%s " % (PNG_RES) + "-sOutputFile=" + '"%s\%s-pg-%%d.%s" "%s"' % (outputdir, outputFileNamePrefix, suffix, pdfSourceFile)
OR evaluated at runtime:
gswin64c -q -dNOPAUSE -dBATCH -dPDFFitPage=true -sDEVICE=png16m -r300X300 -sOutputFile="C:\EPTK-TMP\02-01-Introduction-pg-%d.png" "C:\EPTK-TMP\02-01-Introduction.pdf"
The font in your PDF sample is a sans-serif font (without the little ornamental endings of lines etc...), the font in your PNG sample is a serif font (with the little ornamental...).
So GhostScript is for some reason not using the correct font while doing the PDF to PNG conversion. This might have several reasons:
1) The fonts might not be embedded in the PDF file, so GhostScript has to figure out something else.
2) The fonts might also not be available on your system, so GhostScript simply replaces them with some default. This changes how the letters look and probably also the width of the letters, which gives you the resulting spacing issues.
So the question is whether you are generating the PDF in the first place. If so you might need to do it better so that GhostScript can use the embedded font. If you are not generating the PDF you could try to figure out what fonts are used in these PDF files you have and make sure they are available to GhostScript on your system.
I'm not that well known with GhostScript, but perhaps the fonts are already on your system and it's just a matter of GhostScript not finding them. In that case look whether there is a command line argument to point it to the correct font folder(s) on your system.

Categories

Resources