display FITS file content - python

I have a bunch of data in FITS file format that I need to visualize.
Some details:
I use python with astropy to manipulate and preview the data;
The data stored in the FITS file is basically a numpy array with 70 lines ("orders") of 8096 pixels each (an echelle spectrum)
the data is saved as a multipage pdf where each page corresponds to one specific order of the FITS observations
I want to display the data as per figure 1:
figure 1 corresponds to one single order from each FITS file;
the "grey" region on the top panel corresponds to regions with no data/observations;
each "line" on the top panel corresponds to a different observation (x-axis: wavelength; y-axis: date of observation; z-axis: flux)
the red line is optional
the bottom panel is the same data as above, but with all observations overlapping (x-axis: wavelength; y-axis: flux)
the flux is normalised to the median on the panels, but the FITS files will sometimes have values well above 10^7
Now, I am facing the following problem. If I save as PDF (or even PNG, etc.), I am limited by the dpi I use. The higher the dpi, the better I can preview the data, but the file becomes impossible to work with due to its size; with a low dpi, the data appears blurry. When I preview the data with matplotlib's show, I can zoom in and out with no problems, but I generate my images on a server at work, so that approach is impossible to use remotely.
So, my question is: is there a file format I could use to store my data similarly to figure 1 (ideally a multipage format that I could create with python, but not limited to that), which would allow me to work with "infinite" resolution similar to what I have with matplotlib's show? There are several FITS file viewers "in the wild", but to my knowledge they do not allow previewing the data the way I want.
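For reference, a minimal sketch of how one such order could be drawn with matplotlib (the data here is synthetic stand-in for an order read with astropy.io.fits; the array shape of observations × pixels, the wavelength range, and all variable names are assumptions):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, usable on a remote server
import matplotlib.pyplot as plt

# Hypothetical stand-in for one echelle order: n_obs observations x n_pix pixels.
n_obs, n_pix = 20, 8096
wave = np.linspace(5000.0, 5050.0, n_pix)               # assumed wavelength axis
flux = np.random.default_rng(0).normal(1e7, 1e5, (n_obs, n_pix))
flux /= np.median(flux, axis=1, keepdims=True)          # normalise each obs to its median

fig, (ax_top, ax_bot) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

# Top panel: one row per observation (y-axis = observation/date index, z = flux).
mesh = ax_top.pcolormesh(wave, np.arange(n_obs), flux, shading="auto")
fig.colorbar(mesh, ax=ax_top, label="normalised flux")
ax_top.set_ylabel("observation")

# Bottom panel: all observations overplotted.
for row in flux:
    ax_bot.plot(wave, row, lw=0.3, alpha=0.5)
ax_bot.set_xlabel("wavelength")
ax_bot.set_ylabel("normalised flux")

fig.savefig("order_preview.png", dpi=150)
```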

Related

Extract a label from several single page PDF files and align them to fill an A4 page (to save paper)

I receive a single-page A4 PDF file from my shipping courier. It contains a small label in the top left corner, and the rest of the page is blank. As I usually have to print several a day, I end up wasting a lot of paper. I could drop all the PDFs into inkscape (it imports the label as a grouped object) and manually align them to fill an A4 page, but that would become tedious really fast and waste time.
Can you point me in the right direction as to what to look for in order to write a python script to do this?
You would need to determine the size of the label in PDF units (1 point = 1/72 inch).
Then determine how many labels you can fit onto one page, i.e. how many columns and rows you can have (taking the needed printing margin into account).
The script could take the PDF pages as command line arguments, import each page as a Form XObject, and place the labels into the row/column raster.
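As a sketch of the raster arithmetic only (the 10 pt margin and the example label sizes are assumptions; the actual placement of each page as a Form XObject would be done with a PDF library such as pikepdf or PyMuPDF):

```python
# A4 page dimensions in PDF points (1 point = 1/72 inch).
A4_W, A4_H = 595.0, 842.0
MARGIN = 10.0  # assumed printing margin in points

def label_raster(label_w, label_h, page_w=A4_W, page_h=A4_H, margin=MARGIN):
    """Return the (x, y) origins, in points, of every label slot on one page."""
    cols = int((page_w - 2 * margin) // label_w)
    rows = int((page_h - 2 * margin) // label_h)
    return [(margin + c * label_w, margin + r * label_h)
            for r in range(rows) for c in range(cols)]

# Example: an assumed 180 x 260 pt label gives a 3 x 3 raster on A4.
slots = label_raster(180.0, 260.0)
print(len(slots))  # -> 9
```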
I would do the following:
Once:
Create a file with a different linked image for each label that fits on the page (with correct proportions)
For each label file (Inkscape command line):
Open with Inkscape
Create an object of the size of the label and clip the contents to that object (if necessary, e.g. because there's more in the file than just that single group object)
Resize page to contents
Save
Then:
create a CSV file from your current set of label file names, suitable for https://gitlab.com/Moini/nextgenerator - as many as fit on one page per line
use the extension (see documentation)
Note that the extension can also be used from the command line, if needed.
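Generating that CSV from a directory of label files could look like the sketch below (the per-page capacity of 4 and the `image1..imageN` column headers are assumptions; check the nextgenerator documentation for the exact header names your template expects):

```python
import csv
import pathlib

def write_label_csv(label_dir, out_csv, per_page=4):
    """Write one CSV row per output page, with as many label filenames as fit."""
    files = sorted(str(p) for p in pathlib.Path(label_dir).glob("*.pdf"))
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Assumed header: one image column per slot on the page.
        writer.writerow([f"image{i + 1}" for i in range(per_page)])
        for i in range(0, len(files), per_page):
            chunk = files[i:i + per_page]
            writer.writerow(chunk + [""] * (per_page - len(chunk)))  # pad last row
```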

how to locate and extract coordinates/data/sub-components of charts/map image data?

I'm working on creating a tile server from some raster nautical charts (maps) I've paid for access to, and I'm trying to post-process the raw image data that these charts are distributed as, prior to geo-referencing them and slicing them up into tiles.
I've got two sets of tasks and would greatly appreciate any help or even sample code on how to get these done in an automated way. I'm no stranger to python/jupyter notebooks, but I have zero experience with this type of data science for image analysis/processing using things like opencv/machine learning (or a better toolkit library that I'm not even yet aware of).
I have some sample images (the originals are PNG but too big to upload, so I encoded them as high-quality JPEGs to follow along/provide sample data). Here's what I'm trying to get done:
validation of all image data: the first chart (as well as the last four) demonstrates what properly formatted chart images should look like (I manually added a few colored rectangles to the first, to highlight different parts of the image in the bonus section below)
some images will have missing tile data, as in the 2nd sample image; these are ALWAYS chunks of 256x256 image data, so it should be straightforward to identify black boxes of this exact size
some images will have corrupt/misplaced tiles, as in the 3rd image (notice the large colorful semi-circle/arcs in the center/upper half of the image: they are slightly duplicated beneath, and if you look along horizontally you can see the image data is shifted, so these tiles have been corrupted somehow)
extraction of information: ultimately, once all image data is verified to be valid (the above steps are ensured), there are a few bits of data I really need pulled out of the image, the most important of which are:
the 4 coordinates (upper left, upper right, lower left, lower right) of the internal chart frame; in the first image they are highlighted by a small pink box at each corner (the other images don't have them, but they are located in a similar way) - NOTE: because these are geographic coordinates and involve projections, they are NOT always 100% horizontal/vertical of each other.
the critical bit is that SOME images contain more than one "chartlet"; I really need to obtain the above 4 coordinates for EACH chartlet (some charts have no chartlets, some have two to several, and they are not always simple rectangular shapes). I may be able to provide the number of chartlets as input if that helps.
if possible, what would also help is extracting each chartlet as a separate image (each of these has a single capital letter, A, B, C, in a circle; it would be good if that letter appeared in the filename)
as a bonus, it would help if there were a way to also extract the sections sampled in the first sample image (in the lower left corner); this would probably involve recognizing where/if this appears in the image (it would probably only appear once per file, but I'm not certain) and then extracting based on its coordinates
the most important part is inside a green box and represents a pair of tables (the left table is an example and I believe would always be the same, while the right one has a variable number of columns)
the table in the orange box would also be good to get the text from, as it's related
as would the small overview map in the blue box, which can be left as an image
I have been looking at tutorials on opencv and image recognition processes, but the content so far has been highly elementary, not to mention an overwhelming, endless list of algorithms for different operations (and, again, I don't know which of those I'd even need), so I'm not sure how it relates to what I'm trying to do. Really, I don't even know where to begin to structure the steps needed for undertaking all these tasks, or how each should be broken down further to ease the processing.
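As a starting point for the missing-tile check described above, here is a minimal sketch using NumPy alone (no OpenCV needed for this step), assuming missing tiles are exactly 256x256 blocks of pure black and the image dimensions are multiples of 256:

```python
import numpy as np

TILE = 256

def find_black_tiles(img):
    """Return (row, col) tile indices whose 256x256 block is entirely black.

    img: 2-D (grayscale) or 3-D (H, W, channels) uint8 array; height and
    width are assumed to be multiples of 256.
    """
    h, w = img.shape[:2]
    missing = []
    for r in range(0, h, TILE):
        for c in range(0, w, TILE):
            if not img[r:r + TILE, c:c + TILE].any():  # all zeros = pure black
                missing.append((r // TILE, c // TILE))
    return missing

# Synthetic check: a white 512x512 "chart" with one blacked-out tile.
chart = np.full((512, 512), 255, dtype=np.uint8)
chart[256:512, 0:256] = 0
print(find_black_tiles(chart))  # -> [(1, 0)]
```

Real scans may have near-black rather than pure-black tiles, in which case a threshold (e.g. `(block < 5).all()`) would be more robust.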

Python matplotlib reducing the image resolution [duplicate]

In matplotlib, I am using LineCollection to draw and color the counties, where the boundaries of the counties are given. When I save the figure as a pdf file:
fig.savefig('filename.pdf',dpi=300)
the file sizes are quite big. However, on saving them as png files:
fig.savefig('filename.png',dpi=300)
and then converting them to pdf using the linux convert command, the files are small. I tried reducing the dpi; however, that does not change the pdf file size. Is there a way the figures can be saved directly as smaller PDF files from matplotlib?
The PDF is larger, since it contains all the vector information. By saving a PNG, you produce a rasterized image. It seems that in your case, you can produce a smaller PDF by rasterizing the plot directly:
plt.plot(x, y, 'r-', rasterized=True)
Here, x and y are some plot coordinates. You basically have to use the additional keyword argument rasterized to achieve the effect.
I think using rasterized=True effectively saves the image similarly to the png format. When you zoom in, you will see blurred pixels.
If you want the figures to be high quality, my suggestion is to sample from the data and make a plot. The PDF file size scales roughly with the number of data points it needs to remember.
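A self-contained version of the rasterized suggestion (the coordinates are made-up sample data; only the rasterized artist is affected by dpi, while axes, labels and text stay vector):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100_000)  # many points: worth rasterizing
y = np.sin(x)

fig, ax = plt.subplots()
line, = ax.plot(x, y, "r-", rasterized=True)  # only this artist is rasterized
ax.set_xlabel("x")                            # labels/axes remain vector text
fig.savefig("plot.pdf", dpi=150)              # dpi now applies to the line only
```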

How to save Python plots with entire information like in interactive Plots (output of plt.show())?

When I use Matplotlib's plt.show() I get a nice plot which can be zoomed to very high precision (practically infinite). But when I save it as an image, it loses all this information and only retains detail up to the chosen resolution.
Is there any way I can save the plot with the entire information? i.e Like those interactive plots which can rescaled at any time?
P.S. - I know I can set the dpi to get high-quality images. This is not what I want. I want an image similar to the plot which python shows when I run the program. What format is that? Or is it just a very high resolution image?
Note - I am plotting .csv files which include data varying from 10^(-10) to the 100's. Thus, when I save the plot as a .png file, I lose all the information/kinks of the graph at very small scales and only retain features from 1-100.
Maybe the interactive graphics library bokeh is an option for you. See here. Its API is just a little different from what you know from matplotlib.
Bokeh creates plots as html files that you can view in your browser. For each graphic you can select wheel zoom to zoom interactively into your graphic. You can interactively change the range that you want plotted, so you don't lose information in your graphic.
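A minimal Bokeh sketch along those lines (the data and filename are made up; the resulting standalone HTML file opens in any browser with pan and wheel-zoom tools enabled, so zooming never loses precision):

```python
from bokeh.plotting import figure, output_file, save

# Made-up data spanning 1e-10 .. 1e2, like the question's .csv values.
x = [10 ** (-p) for p in range(10, -3, -1)]
y = list(range(len(x)))

output_file("interactive.html")
p = figure(title="interactive preview", x_axis_type="log",
           tools="pan,wheel_zoom,box_zoom,reset")
p.line(x, y)
save(p)  # writes a self-contained HTML file you can copy off the server
```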

.png file not saving correctly matplotlib

While saving a multiple-grid figure as png with 300 dpi, I lose quality.
However, this problem does not occur when saving the figure as a pdf.
Here is the small portion of the code that saves the image generated:
fig.savefig(filepath, format='pdf', bbox_inches='tight', dpi=300)
fig.savefig(filepath, format='png', bbox_inches='tight', dpi=300)
Is there a way to obtain a good resolution png of an image such as the above without having to resort to using pdf?
.pdf images are vector graphics, and thus preserve all information. In other words, setting dpi=300 for the pdf creation doesn't do anything (unless you have set specific objects to be rasterized using rasterized=True).
.png images are raster graphics (e.g. check this out). Therefore you have to adjust the dpi to get the balance of filesize vs. quality that you want. In other words, the image is saving correctly, it's just lower quality than the 'perfect' pdf.
The choice of image output format depends on how you will use it. Vector graphics (.pdf, .svg) are great if you have simple plots that you want to scale (zoom) perfectly. However, if you are plotting many points (>10,000 or so), this can lead to very large filesizes. In this case it may be better to rasterize the figure because a person can't see that many data points anyway.
"Which raster format should you use?" .png and .jpg are the most common. The former has better compression for images with large patches of the same color, while the latter has better compression for high pixel variability (e.g. photographs). Check this out for more info.
Note that while .png is considered 'lossless', it is only so in the sense that it preserves the rasterized information. Information is still lost when saving/converting to rasterized format.
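To see the trade-off concretely, here is a sketch that saves the same many-point figure both ways (the point count is arbitrary; on plots this dense the vector PDF typically ends up the larger file, while the PNG size depends only on dpi):

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 200_000))  # dense scatter: raster-friendly

fig, ax = plt.subplots()
ax.plot(x, y, ",")                    # pixel markers, 200k points
fig.savefig("dense.pdf")              # vector: stores every point
fig.savefig("dense.png", dpi=150)     # raster: size set by figure size x dpi

print(os.path.getsize("dense.pdf"), os.path.getsize("dense.png"))
```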
