Saving thumbnails as fits files - python

Most of my code takes a .fits file and creates small thumbnail images that are based upon certain parameters (they're images of galaxies, and all this is extraneous information . . .)
Anyways, I managed to figure out a way to save the images as a .pdf, but I don't know how to save them as .fits files instead. The solution needs to be something within the "for" loop, so that it can just save the files en masse, because there are way too many thumbnails to iterate through one by one.
The last two lines are the most relevant ones.
for i in range(0,len(ra_new)):
    ra_new2=cat['ra'][z&lmass&ra&dec][i]
    dec_new2=cat['dec'][z&lmass&ra&dec][i]
    target_pixel_x = ((ra_new2-ra_ref)/(pixel_size_x))+reference_pixel_x
    target_pixel_y = ((dec_new2-dec_ref)/(pixel_size_y))+reference_pixel_y
    value=img[target_pixel_x,target_pixel_y]>0
    ra_new3=cat['ra'][z&lmass&ra&dec&value][i]
    dec_new3=cat['dec'][z&lmass&ra&dec&value][i]
    new_target_pixel_x = ((ra_new3-ra_ref)/(pixel_size_x))+reference_pixel_x
    new_target_pixel_y = ((dec_new3-dec_ref)/(pixel_size_y))+reference_pixel_y
    fig = plt.figure(figsize=(5.,5.))
    plt.imshow(img[new_target_pixel_x-200:new_target_pixel_x+200, new_target_pixel_y-200:new_target_pixel_y+200], vmin=-0.01, vmax=0.1, cmap='Greys')
    fig.savefig(image+"PHOTO"+str(i)+'.pdf')
Any ideas SO?

For converting FITS images to thumbnails, I recommend using the mJPEG tool from the "Montage" software package, available here: http://montage.ipac.caltech.edu/docs/mJPEG.html
For example, to convert a directory of FITS images to JPEG files, and then resize them to thumbnails, I would use a shell script like this:
#!/bin/bash
# Glob directly instead of parsing `ls`, and quote paths so spaces don't break the loop
for FILE in /path/to/images/*.fits; do
    mJPEG -gray "$FILE" 5% 90% log -out "$FILE.jpg"
    convert "$FILE.jpg" -resize 64x64 "$FILE.thumbnail.jpg"
done
You can, of course, call these commands from Python instead of a shell script.

As noted in a comment, the astropy package (if not yet installed) will be useful:
http://astropy.readthedocs.org. You can import the required module at the beginning.
from astropy.io import fits
At the last line, you can save a thumbnail FITS file.
thumb = img[new_target_pixel_x-200:new_target_pixel_x+200,
            new_target_pixel_y-200:new_target_pixel_y+200]
fits.writeto(image+str(i).zfill(3)+'.fits',thumb)
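For anyone who wants to try this outside the original script, here is a minimal, self-contained sketch of the same idea. The helper name save_thumbnail, the synthetic image and the pixel positions are all made up for illustration; only fits.writeto comes from the answer above. It assumes the image array is indexed as [row, col] (i.e. [y, x], which is how astropy returns FITS data) and clips the cutout at the array edges.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits


def save_thumbnail(img, row, col, half_size, out_path):
    """Cut a square region centred on (row, col) and write it as a FITS file."""
    # Array indices must be integers, so round the computed pixel position.
    row, col = int(round(row)), int(round(col))
    # Clip to the array bounds so a source near an edge does not produce
    # negative indices (which would silently wrap around).
    r0, r1 = max(row - half_size, 0), min(row + half_size, img.shape[0])
    c0, c1 = max(col - half_size, 0), min(col + half_size, img.shape[1])
    thumb = img[r0:r1, c0:c1]
    fits.writeto(out_path, thumb, overwrite=True)
    return thumb.shape


# Synthetic stand-in for the real image and catalogue positions (hypothetical values).
img = np.arange(100 * 120, dtype=np.float32).reshape(100, 120)
centres = [(50, 60), (5, 115)]  # (row, col) pixel positions
out_dir = tempfile.mkdtemp()
shapes = []
for i, (row, col) in enumerate(centres):
    out_path = os.path.join(out_dir, "PHOTO" + str(i).zfill(3) + ".fits")
    shapes.append(save_thumbnail(img, row, col, 20, out_path))
```

Note that this writes only the pixel data with a minimal header. If the thumbnails need to keep their sky coordinates, astropy.nddata.Cutout2D may be worth a look, since it can carry an updated WCS along with the cutout.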

Related

How do I modify TIFF physical resolution metadata

I have several pyramidal, tiled TIFF images that were converted from a different format. The converter program wrote incorrect data to the XResolution and YResolution TIFF metadata. How can I modify these fields?
tiff.ResolutionUnit: 'centimeter'
tiff.XResolution: '0.34703996762331574'
tiff.YResolution: '0.34704136833246829'
Ideally I would like to use Python or a command-line tool.
One can use tifftools.tiff_set from Tiff Tools.
import tifftools
tifftools.tiff_set(
    PATH_TO_ORIG_IMAGE,
    PATH_TO_NEW_IMAGE,
    overwrite=False,
    setlist=[
        (
            tifftools.Tag.RESOLUTIONUNIT,
            tifftools.constants.ResolutionUnit.CENTIMETER.value,
        ),
        (tifftools.Tag.XRESOLUTION, xresolution),
        (tifftools.Tag.YRESOLUTION, yresolution),
    ],
)
Replace xresolution and yresolution with the desired values. These values must be floats. In this example, the resolution unit is centimeter.
This is also possible with the excellent tifffile package. In fact there is an example of this use case in the README.
from tifffile import TiffFile
with TiffFile('temp.tif', mode='r+') as tif:
    _ = tif.pages[0].tags['XResolution'].overwrite((96000, 1000))
Be aware that this will overwrite the original image. If this is not desired, make a copy of the image first and then overwrite the tags.

Is there a proper way to convert common picture file extensions into a .PGM "P2" using PIL or cv2?

Edit: Problem solved and code updated.
I apologize in advance for the long post. I wanted to bring as much as I could to the table. My question consists of two parts.
Background: I needed a simple Python script that would convert common picture file extensions into a .PGM ASCII file. I had no issues coming up with a naive solution, as PGM seems pretty straightforward.
# convert-to-pgm.py is a script for converting image types supported by PIL into their .pgm
# ascii counterparts, as well as resizing the image to have a width of 909 and keeping the
# aspect ratio. Its main purpose will be to feed NOAA style images into an APT-encoder
# program.
from PIL import Image, ImageOps, ImageEnhance
import numpy as np
# Open image, convert to greyscale, check width and resize if necessary
im = Image.open(r"pics/NEKO.JPG").convert("L")
image_array = np.array(im)
# data is stored differently depending on im.mode (RGB vs L vs P)
print(f"Original 2D Picture Array:\n{image_array}")
image_width, image_height = im.size
print(f"Size: {im.size}")  # Mode: {im.mode}")
# im.show()
if image_width != 909:
    print("Resizing to width of 909 keeping aspect ratio...")
    new_width = 909
    ratio = (new_width / float(image_width))
    new_height = int((float(image_height) * float(ratio)))
    im = im.resize((new_width, new_height))
    print(f"New Size: {im.size}")
# im.show()
# Save image data in a numpy array and make it 1D.
image_array1 = np.array(im).ravel()
print(f"Picture Array: {image_array1}")
# create file w .pgm ext to store data in, first 4 lines are: pgm type, comment, image size,
# maxVal (255=white, 0=black)
file = open("output.pgm", "w+")
file.write("P2\n# Created by convert-to-pgm.py \n%d %d\n255\n" % im.size)
# Storing greyscale data in file with \n delimiter
for number in image_array1:
    # file.write(str(image_array1[number]) + '\n') #### This was the culprit of the hindered image quality...changed to line below. Thanks to Mark in comments.
    file.write(str(number) + '\n')
file.close()
# save() returns None, so don't assign its result back to im
im.save(r"pics/NEKO-greyscale.jpg")
# Strings to replace the newline characters
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'
with open('output.pgm', 'rb') as open_file:
    content = open_file.read()
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
# the with block closes the file, so no explicit close() is needed
with open('output.pgm', 'wb') as open_file:
    open_file.write(content)
This produces a .PGM file that, when opened with a text editor, looks similar to the same image that was exported as a .PGM using GIMP (My prior solution was to use the GIMP export tool to manually convert the pictures and I couldn't find any other converters that supported the "P2" format). However, the quality of the resulting picture is severely diminished compared to what is produced using the GIMP export tool. I have tried a few methods of image enhancement (brightness, equalize, posterize, autocontrast, etc.) to get a better result, but none have been entirely successful. So my first question: what can I do differently to obtain a result that looks more like what GIMP produces? I am not looking for perfection, just a little clarity and a learning experience. How can I automatically adjust {insert whatever} for the best picture?
Below is the .PGM file produced by my script next to GIMP's version, both open in a text editor and generated from the same input .jpg.
[Image: my version vs. GIMP's version]
Below are comparisons of various enhancements applied before creating the .pgm file, alongside the original .jpg and the original .jpg converted to greyscale ("L"). All photos were opened in GIMP.
[Image: original .jpg]
[Image: greyscale .jpg, after the .convert("L") command]
This is ideally what I want my .PGM to look like. Why is the numpy array data close to, yet different from, the data in the GIMP .PGM file, even though the produced greyscale image looks identical to what GIMP produces?
Answer: Because it wasn't saving the correct data. :D
[Image: GIMP's resulting .PGM]
[Image: my resulting .PGM]
[Image: my resulting .PGM with lower brightness, with Brightness.enhance(0.5)]
[Image: resulting .PGM with posterize, ImageOps.posterize(im, 4)]
SECOND PROBLEM:
My last issue comes when viewing the .PGM picture using various PGM viewers, such as these online tools (here and here). The .PGM file is not viewable through one of the above links, but works "fine" when viewing with the other link or with GIMP. Likewise, the .PGM file I produce with my script is also not currently compatible with the program that I intend to use it for. This is most important to me, since its purpose is to feed the properly formatted PGM image into the program. I'm certain that something in the first four lines of the .PGM file is altering the program's ability to sense that it is indeed a PGM, and I'm pretty sure that it's something trivial, since some other viewers are also not capable of reading my PGM. So my main question is: Is there a proper way to do this conversion or, with the proper adjustments, is my script suitable? Am I missing something entirely obvious? I have minimal knowledge on image processing.
GitHub link to the program that I'm feeding the .PGM images into: here
More info on this particular issue: The program throws a fault when run with one of my .PGM images, but works perfectly with the .PGM images produced with GIMP. The program is in C++ and the line "ASSERT(buf[2] == '\n')" fails, implying that my .PGM file is not in the correct format. If I comment this line out and recompile, another "ASSERT(width == 909)..." fails, implying that my .PGM does not have a width of 909 pixels. If I comment this line out as well and recompile, I am left with the infamous "segmentation fault (core dumped)." I compiled this on Windows, with cygwin64. Everything seems to be in place, so the program is having trouble reading the contents of the file (or understanding '\n'?). How could this be if both my version and GIMP's version are essentially identical in format, when viewed with a text editor?
Terminal output:
Thanks to all for the help, any and all insight/criticism is acceptable.
The first part of my question was answered in the comments, it was a silly mistake on my end as I'm still learning syntax. The above code now works as intended.
I was able to do a little more research on the second part of my problems and I noticed something very important, and also feel quite silly for missing it yesterday.
So of course the reason why my program was having a problem reading the '\n' character was simply because Windows encodes newline characters as CRLF aka '\r\n' as opposed to the Unix way of LF aka '\n'. So in my script at the very end I just add the simple code [taken from here]:
# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'
with open('output.pgm', 'rb') as open_file:
    content = open_file.read()
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
with open('output.pgm', 'wb') as open_file:
    open_file.write(content)
Now, regardless of whether the text file is written with CRLF or LF line endings, the script will work properly.
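An alternative to the post-hoc replacement is to stop Python from producing CRLF in the first place. Below is a minimal sketch, assuming the pixel data is available as a 2-D sequence of ints (a list of lists, or the 2-D numpy array before ravel()). The function name write_pgm_p2 and the demo values are made up for illustration. Passing newline="\n" to open() disables the "\n" to "\r\n" translation that text mode performs on Windows.

```python
import os
import tempfile


def write_pgm_p2(path, pixels, maxval=255):
    """Write a 2-D sequence of greyscale ints as an ASCII (P2) PGM with LF line endings."""
    height = len(pixels)
    width = len(pixels[0])
    # newline="\n" stops Python translating "\n" into "\r\n" on Windows.
    with open(path, "w", newline="\n") as f:
        f.write("P2\n")
        f.write("%d %d\n" % (width, height))
        f.write("%d\n" % maxval)
        # One value per line, as in the script above.
        for row in pixels:
            for value in row:
                f.write(str(int(value)) + "\n")


# Tiny 3x2 demo image (hypothetical values).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pgm")
write_pgm_p2(demo_path, [[0, 128, 255], [255, 128, 0]])
```

This sketch leaves out the comment line that the script above writes after "P2"; whether a given reader needs or tolerates that line depends on the reader.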

gdal_merge overlaying pngs over one another

I have a lot of PNGs that were tiled with gdal2tiles.py. I have done some processing on these tiles and now I would like to combine them back into one large TIF.
For example, I have folders 13-20 for different zoom levels. Say I want all the PNGs from zoom level 20 to be a single mosaic: how would I do that with gdal_merge? I'm trying gdal_merge now, but I end up with only the last .PNG that it processes, so my TIF is just a 256x256 TIF of the last processed PNG. Here is my current command:
python gdal_merge.py -o mos.tif -of GTiff -v --optfile tif_list.txt
tif_list.txt contains the list of all my PNGs
I'm assuming I might need to add a -co option but I cannot find any documentation on what I can use in -co. If this is needed my coordinate system is EPSG 3857, and the tiles were generated as mercator. Any help would be appreciated.
Update:
format of tif_list.txt,
C:\Users\Administrator\Desktop\19\195953\226590.png
C:\Users\Administrator\Desktop\19\195954\226581.png
C:\Users\Administrator\Desktop\19\195954\226582.png
C:\Users\Administrator\Desktop\19\195954\226583.png
C:\Users\Administrator\Desktop\19\195954\226584.png
C:\Users\Administrator\Desktop\19\195954\226585.png
C:\Users\Administrator\Desktop\19\195954\226586.png
C:\Users\Administrator\Desktop\19\195954\226587.png
C:\Users\Administrator\Desktop\19\195954\226588.png
C:\Users\Administrator\Desktop\19\195954\226589.png
C:\Users\Administrator\Desktop\19\195954\226590.png
C:\Users\Administrator\Desktop\19\195955\226581.png
C:\Users\Administrator\Desktop\19\195955\226582.png
C:\Users\Administrator\Desktop\19\195955\226583.png
C:\Users\Administrator\Desktop\19\195955\226584.png
C:\Users\Administrator\Desktop\19\195955\226585.png
C:\Users\Administrator\Desktop\19\195955\226586.png
C:\Users\Administrator\Desktop\19\195955\226587.png
C:\Users\Administrator\Desktop\19\195955\226588.png
C:\Users\Administrator\Desktop\19\195955\226589.png
C:\Users\Administrator\Desktop\19\195955\226590.png
examples of the PNGs,

having cv2.imread reading images from file objects or memory-stream-like data (here non-extracted tar)

I have a .tar file containing several hundreds of pictures (.png). I need to process them via opencv.
I am wondering whether, for efficiency reasons, it is possible to process them without going through the disk. In other words, I want to read the pictures from the memory stream related to the tar file.
Consider for instance
import tarfile
import cv2
tar0 = tarfile.open('mytar.tar')
im = cv2.imread( tar0.extractfile('fname.png').read() )
The last line doesn't work as imread expects a file name rather than a stream.
Consider that this way of reading directly from the tar stream can be achieved e.g. for text (see e.g. this SO question).
Any suggestion to open the stream with the correct png encoding?
Untarring to ramdisk is of course an option, although I was looking for something more cachable.
Thanks to the suggestion of @abarry and this SO answer I managed to find the answer.
Consider the following
import tarfile
import cv2
import numpy as np
def get_np_array_from_tar_object(tar_extractfl):
    '''converts a buffer from a tar file into a 1-D uint8 np.array'''
    return np.asarray(bytearray(tar_extractfl.read()), dtype=np.uint8)
tar0 = tarfile.open('mytar.tar')
# flag 0 makes imdecode return a greyscale image
im0 = cv2.imdecode(get_np_array_from_tar_object(tar0.extractfile('fname.png')), 0)
Perhaps use imdecode with a buffer coming out of the tar file? I haven't tried it but seems promising.
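Since the archive holds several hundred pictures, the same idea can be wrapped in a loop over the members. The sketch below uses only the standard library to pull each member's raw bytes out of the tar without extracting to disk; the generator name iter_member_bytes and the demo archive are made up for illustration. The decode step is left as a comment and assumes cv2 and numpy are installed.

```python
import io
import os
import tarfile
import tempfile


def iter_member_bytes(tar_path, suffix=".png"):
    """Yield (member_name, raw_bytes) for each regular file in the tar ending in suffix."""
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile() and member.name.lower().endswith(suffix):
                # extractfile() returns a file-like object backed by the archive,
                # so nothing is written to disk.
                yield member.name, tar.extractfile(member).read()


# Build a small demo archive with fake payloads (not real PNG data).
demo_tar = os.path.join(tempfile.mkdtemp(), "demo.tar")
with tarfile.open(demo_tar, "w") as tar:
    for name, payload in [("a.png", b"first"), ("notes.txt", b"skip me"), ("b.png", b"second")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

found = list(iter_member_bytes(demo_tar))

# With real PNGs, each buffer could then be decoded along these lines:
#     im = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_GRAYSCALE)
```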

How to get the diff of two PDF files using Python?

I need to find the difference between two PDF files. Does anybody know of any Python-related tool which has a feature that directly gives the diff of the two PDFs?
What do you mean by "difference"? A difference in the text of the PDF, or some layout change (e.g. an embedded graphic was resized)? The first is easy to detect; the second is almost impossible to get (PDF is a VERY complicated file format that offers endless formatting capabilities).
If you want to get the text diff, just run a pdf to text utility on the two PDFs and then use Python's built-in diff library to get the difference of the converted texts.
This question deals with pdf to text conversion in python: Python module for converting PDF to text.
The reliability of this method depends on the PDF Generators you are using. If you use e.g. Adobe Acrobat and some Ghostscript-based PDF-Creator to make two PDFs from the SAME word document, you might still get a diff although the source document was identical.
This is because there are dozens of ways to encode the information of the source document to a PDF and each converter uses a different approach. Often the pdf to text converter can't figure out the correct text flow, especially with complex layouts or tables.
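Once both PDFs have been converted to text, the diff step itself is short with difflib from the standard library. A minimal sketch, where the function name text_diff and the sample strings are made up for illustration and stand in for the output of whichever PDF-to-text tool is used:

```python
import difflib


def text_diff(old_text, new_text, old_name="f1.txt", new_name="f2.txt"):
    """Return the unified-diff lines between two extracted-text strings."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    ))


# Stand-in strings; in practice these would come from a PDF-to-text tool.
old_text = "Invoice 42\nTotal: 10 EUR\n"
new_text = "Invoice 42\nTotal: 12 EUR\n"
diff_lines = text_diff(old_text, new_text)
print("\n".join(diff_lines))
```

An empty list means the extracted texts are identical, which, as noted above, does not guarantee that the PDFs look the same.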
I do not know your use case, but for regression tests of a script which generates PDFs using reportlab, I diff PDFs by
Converting each page to an image using Ghostscript
Diffing each page against the page image of a standard PDF, using PIL
e.g.
from PIL import Image, ImageChops
im1 = Image.open(imagePath1)
im2 = Image.open(imagePath2)
imDiff = ImageChops.difference(im1, im2)
This works in my case for flagging any changes introduced due to code changes.
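To turn the difference image into a pass/fail check, one option is getbbox(), which returns None when every pixel of the difference is zero. A minimal sketch with synthetic images standing in for the rendered pages; the function name diff_bbox is made up for illustration:

```python
from PIL import Image, ImageChops


def diff_bbox(im1, im2):
    """Return the bounding box of the differing region, or None if the images match."""
    if im1.size != im2.size:
        raise ValueError("page images must have the same size")
    # Convert to a common mode so the comparison does not depend on how each file was saved.
    diff = ImageChops.difference(im1.convert("RGB"), im2.convert("RGB"))
    return diff.getbbox()


# Synthetic stand-ins for two rendered page images.
page_a = Image.new("RGB", (100, 100), "white")
page_b = page_a.copy()
page_b.putpixel((10, 20), (0, 0, 0))

same = diff_bbox(page_a, page_a.copy())
changed = diff_bbox(page_a, page_b)
```

Here same is None and changed is the box around the single altered pixel, which also tells you where on the page to look.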
I ran into the same problem in a unit test for encrypted PDFs; neither pdfminer nor pyPdf worked well for me.
Two commands, pdftocairo and pdftotext, worked perfectly in my tests. (Install on Ubuntu: apt-get install poppler-utils)
You can get pdf content by:
from subprocess import Popen, PIPE
def get_formatted_content(pdf_content):
    cmd = 'pdftocairo -pdf - -' # you can replace "pdftocairo -pdf" with "pdftotext" if you want to get diff info
    ps = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    stdout, stderr = ps.communicate(input=pdf_content)
    if ps.returncode != 0:
        raise OSError(ps.returncode, cmd, stderr)
    return stdout
It seems pdftocairo can redraw PDF files, while pdftotext can extract all the text.
And then you can compare two pdf files:
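# open in binary mode: the content is piped to the command as bytes
c1 = get_formatted_content(open('f1.pdf', 'rb').read())
c2 = get_formatted_content(open('f2.pdf', 'rb').read())
print(c1 == c2) # for binary compare (cmp() no longer exists in Python 3)
# import difflib
# print(list(difflib.unified_diff(c1.decode().splitlines(), c2.decode().splitlines()))) # for text compare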
Even though this question is quite old, my guess is that I can contribute to the topic.
We have several applications generating tons of PDFs. One of these apps is written in Python and recently I wanted to write integration tests to check if the PDF generation was working correctly.
Testing PDF generation is HARD, because the specs for PDF files are very complicated and non-deterministic. Two PDFs, generated with the same exact input data, will generate different files, so direct file comparison is discarded.
The solution: we have to test the way they look (because THAT should be deterministic!).
In our case, the PDFs are being generated with the reportlab package, but this doesn't matter from the test perspective, we just need a filename or the PDF blob (bytes) from the generator. We also need an expectation file containing a "good" PDF to compare with the one coming from the generator.
The PDFs are converted to images and then compared. This can be done in multiple ways, but we decided to use ImageMagick, because it is extremely versatile and very mature, with bindings for almost every programming language out there. For Python 3, the bindings are offered by the Wand package.
The test looks something like the following. Specific details of our implementation were removed and the example was simplified:
import os
from unittest import TestCase
from wand.image import Image
from app.generators.pdf import PdfGenerator
DIR = os.path.dirname(__file__)
class PdfGeneratorTest(TestCase):
    def test_generated_pdf_should_match_expectation(self):
        # `pdf` is the blob of the generated PDF
        # If using reportlab, this is what you get calling `getpdfdata()`
        # on a Canvas instance, after all the drawing is complete
        pdf = PdfGenerator().generate()
        # PDFs are vectorial, so we need to set a resolution when
        # converting to an image
        actual_img = Image(blob=pdf, resolution=150)
        filename = os.path.join(DIR, 'expected.pdf')
        # Make sure to use the same resolution as above
        with Image(filename=filename, resolution=150) as expected:
            diff = actual_img.compare(expected, metric='root_mean_square')
            self.assertLess(diff[1], 0.01)
The 0.01 threshold is the largest difference we are willing to tolerate. Since diff[1] ranges from 0 to 1 with the root_mean_square metric, we are accepting a difference of up to 1% on all channels compared with the expected sample file.
Check this out, it can be useful: pypdf
