Write a new fits file after modification in pixel values - python

I have a small problem. I know there are some similar questions but I do not know that I am doing wrong because they are not working for me, I would appreciate any help.
I want to change some pixel values in a fits file. They are basically empty spots and I want to fill them with ~the mean pixels value of the image.
I do like this:
from __future__ import division
import pyfits as fits
import numpy as np
obj1 = fits.open(raw_input('Name of the image to be improved? '))
data_obj1 = obj1[0].data
meanpix = np.mean(data_obj1)
noise = np.linspace(-meanpix,meanpix,100000)
shape = data_obj1.shape
result = np.zeros(shape)
for x in range(0,shape[0]):
for y in range(0,shape[1]):
if data_obj1[x,y] > -5.48e-14 and data_obj1[x,y] < -5.46e-14:
random_noise = np.random.choice(noise,1)
result[x,y] = random_noise
else:
result[x,y] = data_obj1[x,y]
out = obj1
out[0].data = result
out.writeto(raw_input('Name of the output file? '), clobber=True)
I know it is doing the operation I want to do, because if I print result[x,y] it is how it is supposed to be. Nevertheless, when I open the generated fits file, it is exactly the same as it was at the beginning. So probably I do not understand i) how to properly save the fits file or ii) how to build my new image correctly. Can someone help me?

Apart from typos explained by #MSeifert, it is just a visualization problem. See the comments for clarifications!

Related

StandardScaler.inverse_transform() return the same array as input :/ Is sklearn broken or am I?

Good evening,
I'm currently pursuing a PhD in chemistry and in this framework I'm trying to apply my few knowledge in python and stats to discriminate sample based on their IR spectrum.
After a few of weeks of data acquisition I'm finally able to build my data set and was about to see what PCA can offer (this was the easy part).
I was able to build my script and get the loadings, scores and everything else that I could possibly need or want. However I used the StandardScaler from sklearn.preprocessing to scale down my data so (correct my if i'm wrong) I should get back loadings in this "standard scaled" space.
As my data are actual IR spectra those loadings have a chemical meanings (even thought there are not real spectrum) e.g. if my PC1 loadings have a peak at XX cm-1 i know that samples with high PC1 are likely to contain compounds that absorb at this wavenumber .
So i want to reverse the StandardScaler transformation. I've tried to used StandardScaler.inverse_transform() however it appears to return me the same array that I gave him... which is very frustrating...
I'm trying to do the same thing with my samples spectrum but it gave me the same result again : here is the portion of my script where I tried this :
Wavenumbers = DFF.columns
#in fact this is a little more complicated but that's the spirit
Spectre = DFF.values.tolist()
#btw DFF is my pandas.dataframe containing spectrum with features = wavenumber
SS = StandardScaler(copy=True)
DFF = SS.fit_transform(DFF) #at this point I use SS for preprocessing before PCA
#I'm then trying to inverse SS and get back the 1rst spectrum of the dataset
D = SS.inverse_transform(DFF[0])
#However at this point DFF[0] and D are almost-exactly the same I'm sure because :
plt.plot(Wavenumbers,D)
plt.plot(Wavenumbers,DFF[0]) #the curves are the sames, and :
for i,j in enumerate(D) :
if j==DFF[0][i] : pass
else : print("{}".format(j-DFF[0][i] )) #return nothing bigger than 10e-16
The problem is more than likely syntax or how i used StandardScaler, however i have no one around me to search for help with that . Can anyone tell me what i did wrong ? or give me an hint on how i could get back my loadings in the "actual real IR spectra" space ?
PS: sorry for the wacky English and i hope to be understandable
Good evening,
After putting the problem aside for a few days I finally re-coded the function I needed (as suggested by Robert Dodier).
For reminder, I wanted to have a function that could take my data from a pandas dataframe and mean-centered it in order to do PCA, but also that could reverse the preprocessing for latter uses.
Here is the code I ended up with :
import pandas as pd
import numpy as np
class Scaler:
std =[]
mean = []
def fit(self,DF):
self.std=[]
self.mean=[]
for c in DF.columns:
self.std.append(DF[c].std())
self.mean.append(DF[c].mean())
def transform(self,DF):
X = np.zeros(shape=DF.shape)
for i,c in enumerate(DF.columns):
for j in range(len(DF.index)):
X[j][i] = (DF[c][j] - self.mean[i]) / self.std[i]
return X
def reverse(self,X):
Y = np.zeros(shape=X.shape)
for i in range(len(X[0])):
for j in range(len(X)):
Y[j][i] = X[j][i] * self.std[i] + self.mean[i]
return Y
def fit_transform(self,DF):
self.fit(DF)
X = self.transform(DF)
return X
It's pretty slow and surely very low-tech but it seems to do the job just fine. Hope it will save some time to other python beginners.
I designed it to be as close as I think sklearn.preprocessing.StandardScaler does it.
example :
S = Scaler() #create scaler object
S.fit(DF) #fit the scaler to the dataframe (calculate mean and std for every columns in DF /!\ DF must be a pd.dataframe)
X=S.transform(DF) # return a np.array with mean centered data
Y = S.reverse(X) # reverse the transformation to get back original data
Again sorry for the fast tipped English. And thanks to Robert for taking the time to answer.

Replacing masked data with previous values of same dataset

I am working on filling in missing data in a large (4GB) netcdf datafile (3 dimensions: time, longitude and latitude). The method is to fill in the masked values in data1 either with:
1) previous values from data1 or
2) with data from another (also masked dataset, data2) if the found value from data1 < the found value from data2.
So fare I have tried a couple of things, one is to make a very complex script with long for loops which never finished running after 24 hours. I have tried to reduce it, but i think it is still very much to complicated. I believe there is a much more simple procedure to do it than the way I am doing it now I just can't see how.
I have made a script where masked data is first replaced with zeroes in order to use the function np.where to get the index of my masked data (i did not find a function that returns the coordinates of masked data, so this is my work arround it). My problem is that my code is very long and i think time consuming for large datasets to run through. I believe there is a more simple way of doing it, but I haven't found another work arround it.
Here is what I have so fare: : (the first part is just to generate some matrices that are easy to work with):
if __name__ == '__main__':
import numpy as np
import numpy.ma as ma
from sortdata_helpers import decision_tree
# Generating some (easy) test data to try the algorithm on:
# data1
rand1 = np.random.randint(10, size=(10, 10, 10))
rand1 = ma.masked_where(rand1 > 5, rand1)
rand1 = ma.filled(rand1, fill_value=0)
rand1[0,:,:] = 1
#data2
rand2 = np.random.randint(10, size=(10, 10, 10))
rand2[0, :, :] = 1
coordinates1 = np.asarray(np.where(rand1 == 0)) # gives the locations of where in the data there are zeros
filled_data = decision_tree(rand1, rand2, coordinates1)
print(filled_data)
The functions that I defined to be called in the main script are these, in the same order as they are used:
def decision_tree(data1, data2, coordinates):
# This is the main function,
# where the decision between data1 or data2 is chosen.
import numpy as np
from sortdata_helpers import generate_vector
from sortdata_helpers import find_value
for i in range(coordinates.shape[1]):
coordinate = [coordinates[0, i], coordinates[1,i], coordinates[2,i]]
AET_vec = generate_vector(data1, coordinate) # makes vector to go back in time
AET_value = find_value(AET_vec) # Takes the vector and find closest day with data
PET_vec = generate_vector(data2, coordinate)
PET_value = find_value(PET_vec)
if PET_value > AET_value:
data1[coordinate[0], coordinate[1], coordinate[2]] = AET_value
else:
data1[coordinate[0], coordinate[1], coordinate[2]] = PET_value
return(data1)
def generate_vector(data, coordinate):
# This one generates the vector to go back in time.
vector = data[0:coordinate[0], coordinate[1], coordinate[2]]
return(vector)
def find_value(vector):
# Here the fist value in vector that is not zero is chosen as "value"
from itertools import dropwhile
value = list(dropwhile(lambda x: x == 0, reversed(vector)))[0]
return(value)
Hope someone has a good idea or suggestions on how to improve my code. I am still struggling with understanding indexing in python, and I think this can definately be done in a more smooth way than I have done here.
Thanks for any suggestions or comments,

Creating an ASCII art world map

I'd like to render an ASCII art world map given this GeoJSON file.
My basic approach is to load the GeoJSON into Shapely, transform the points using pyproj to Mercator, and then do a hit test on the geometries for each character of my ASCII art grid.
It looks (edit: mostly) OK when centered one the prime meridian:
But centered on New York City (lon_0=-74), and it suddenly goes haywire:
I'm fairly sure I'm doing something wrong with the projections here. (And it would probably be more efficient to transform the ASCII map coordinates to lat/lon than transform the whole geometry, but I am not sure how.)
import functools
import json
import shutil
import sys
import pyproj
import shapely.geometry
import shapely.ops
# Load the map
with open('world-countries.json') as f:
countries = []
for feature in json.load(f)['features']:
# buffer(0) is a trick for fixing polygons with overlapping coordinates
country = shapely.geometry.shape(feature['geometry']).buffer(0)
countries.append(country)
mapgeom = shapely.geometry.MultiPolygon(countries)
# Apply a projection
tform = functools.partial(
pyproj.transform,
pyproj.Proj(proj='longlat'), # input: WGS84
pyproj.Proj(proj='webmerc', lon_0=0), # output: Web Mercator
)
mapgeom = shapely.ops.transform(tform, mapgeom)
# Convert to ASCII art
minx, miny, maxx, maxy = mapgeom.bounds
srcw = maxx - minx
srch = maxy - miny
dstw, dsth = shutil.get_terminal_size((80, 20))
for y in range(dsth):
for x in range(dstw):
pt = shapely.geometry.Point(
(srcw*x/dstw) + minx,
(srch*(dsth-y-1)/dsth) + miny # flip vertically
)
if any(country.contains(pt) for country in mapgeom):
sys.stdout.write('*')
else:
sys.stdout.write(' ')
sys.stdout.write('\n')
I made edit at the bottom, discovering new problem (why there is no Canada and unreliability of Shapely and Pyproj)
Even though its not exactly solving the problem, I believe this attitude has more potential than using pyproc and Shapely and in future, if you will do more Ascii art, will give you more possibilites and flexibility. Firstly I will write pros and cons.
PS: Initialy I wanted to find problem in your code, but I had problems with running it, because pyproj was returning me some error.
PROS
1) I was able to extract all points (Canada is really missing) and rotate image
2) The processing is very fast and therefore you can create Animated Ascii art.
3) Printing is done all at once without need to loop
CONS (known Issues, solvable)
1) This attitude is definetly not translating geo-coordinates correctly - too plane, it should look more spherical
2) I didnt take time to try to find out solution to filling the borders, so only borders has '*'. Therefore this attitude needs to find algorithm to fill the countries. I think it shouldnt be problem since the JSON file contains countries separated
3) You need 2 extra libs beside numpy - opencv(you can use PIL instead) and Colorama, because my example is animated and I needed to 'clean' terminal by moving cursor to (0,0) instead of using os.system('cls')
4) I made it run only in python 3. In python 2 it works too but I am getting error with sys.stdout.buffer
Change font size on terminal to lowest point so the the printed chars fit in terminal. Smaller the font, better resolution
The animation should look like the map is 'rotating'
I used little bit of your code to extract the data. Steps are in the commentaries
import json
import sys
import numpy as np
import colorama
import sys
import time
import cv2
#understand terminal_size as how many letters in X axis and how many in Y axis. Sorry not good name
if len(sys.argv)>1:
terminal_size = (int(sys.argv[1]),int(sys.argv[2]))
else:
terminal_size=(230,175)
with open('world-countries.json') as f:
countries = []
minimal = 0 # This can be dangerous. Expecting negative values
maximal = 0 # Expecting bigger values than 0
for feature in json.load(f)['features']: # getting data - I pretend here, that geo coordinates are actually indexes of my numpy array
indexes = np.int16(np.array(feature['geometry']['coordinates'][0])*2)
if indexes.min()<minimal:
minimal = indexes.min()
if indexes.max()>maximal:
maximal = indexes.max()
countries.append(indexes)
countries = (np.array(countries)+np.abs(minimal)) # Transform geo-coordinates to image coordinates
correction = np.abs(minimal) # because geo-coordinates has negative values, I need to move it to 0 - xaxis
colorama.init()
def move_cursor(x,y):
print ("\x1b[{};{}H".format(y+1,x+1))
move = 0 # 'rotate' the globe
for i in range(1000):
image = np.zeros(shape=[maximal+correction+1,maximal+correction+1]) #creating clean image
move -=1 # you need to rotate with negative values
# because negative one are by numpy understood. Positive one will end up with error
for i in countries: # VERY STRANGE,because parsing the json, some countries has different JSON structure
if len(i.shape)==2:
image[i[:,1],i[:,0]+move]=255 # indexes that once were geocoordinates now serves to position the countries in the image
if len(i.shape)==3:
image[i[0][:,1],i[0][:,0]+move]=255
cut = np.where(image==255) # Bounding box
if move == -1: # creating here bounding box - removing empty edges - from sides and top and bottom - we need space. This needs to be done only once
max_x,min_x = cut[0].max(),cut[0].min()
max_y,min_y = cut[1].max(),cut[1].min()
new_image = image[min_x:max_x,min_y:max_y] # the bounding box
new_image= new_image[::-1] # reverse, because map is upside down
new_image = cv2.resize(new_image,terminal_size) # resize so it fits inside terminal
ascii = np.chararray(shape = new_image.shape).astype('|S4') #create container for asci image
ascii[:,:]='' #chararray contains some random letters - dunno why... cleaning it
ascii[:,-1]='\n' #because I pring everything all at once, I am creating new lines at the end of the image
new_image[:,-1]=0 # at the end of the image can be country borders which would overwrite '\n' created one step above
ascii[np.where(new_image>0)]='*' # transforming image array to chararray. Better to say, anything that has pixel value higher than 0 will be star in chararray mask
move_cursor(0,0) # 'cleaning' the terminal for new animation
sys.stdout.buffer.write(ascii) # print into terminal
time.sleep(0.025) # FPS
Maybe it would be good to explain what is the main algorithm in the code. I like to use numpy whereever I can. The whole thing is that I pretend that coordinates in the image, or whatever it may be (in your case geo-coordinates) are matrix indexes. I have then 2 Matrixes - Real Image and Charray as Mask. I then take indexes of interesting pixels in Real image and for the same indexes in Charray Mask I assign any letter I want. Thanks to this, the whole algorithm doesnt need a single loop.
About Future posibilities
Imagine you will also have information about terrain(altitude). Let say you somehow create grayscale image of world map where gray shades expresses altitude. Such grayscale image would have shape x,y. You will prepare 3Dmatrix with shape = [x,y,256]. For each layer out of 256 in the 3D matrix, you assign one letter ' ....;;;;### and so on' that will express shade.
When you have this prepared, you can take your grayscale image where any pixel will actually have 3 coordinates: x,y and shade value. So you will have 3 arrays of indexes out of your grascale map image -> x,y,shade. Your new charray will simply be extraction of your 3Dmatrix with layer letters, because:
#Preparation phase
x,y = grayscale.shape
3Dmatrix = np.chararray(shape = [x,y,256])
table = ' ......;;;;;;;###### ...'
for i in range(256):
3Dmatrix[:,:,i] = table[i]
x_indexes = np.arange(x*y)
y_indexes = np.arange(x*y)
chararray_image = np.chararray(shape=[x,y])
# Ready to print
...
shades = grayscale.reshape(x*y)
chararray_image[:,:] = 3Dmatrix[(x_indexes ,y_indexes ,shades)].reshape(x,y)
Because there is no loop in this process and you can print chararray all at once, you can actually print movie into terminal with huge FPS
For example if you have footage of rotating earth, you can make something like this - (250*70 letters), render time 0.03658s
You can ofcourse take it into extreme and make super-resolution in your terminal, but resulting FPS is not that good: 0.23157s, that is approximately 4-5 FPS. Interesting to note is, that this attitude FPS is enourmous, but terminal simply cannot handle printing, so this low FPS is due to limitations of terminal and not of calculation as calculation of this high resolution took 0.00693s, that is 144 FPS.
BIG EDIT - contradicting some of above statements
I accidentaly opened raw json file and find out, there is CANADA and RUSSIA with full correct coordinates. I made mistake to rely on the fact that we both didnt have canada in the result, so I expected my code is ok. Inside JSON, the data has different NOT-UNIFIED structure. Russia and Canada has 'Multipolygon', so you need to iterate over it.
What does it mean? Dont rely on Shapely and pyproj. Obviously they cant extract some countries and if they cant do it reliably, you cant expect them to do anything more complicated.
After modifying the code, everything is allright
CODE: This is how to load the file correctly
...
with open('world-countries.json') as f:
countries = []
minimal = 0
maximal = 0
for feature in json.load(f)['features']: # getting data - I pretend here, that geo coordinates are actually indexes of my numpy array
for k in range((len(feature['geometry']['coordinates']))):
indexes = np.int64(np.array(feature['geometry']['coordinates'][k]))
if indexes.min()<minimal:
minimal = indexes.min()
if indexes.max()>maximal:
maximal = indexes.max()
countries.append(indexes)
...

When plotting Mandelbrot using NumPy & Pillow, program outputs apparent noise

Previously, I had created a Mandelbrot generator in python using turtle. Now, I am re-writing the program to use the Python Imaging Library in order to increase speed and reduce limits on size of images.
However, the program below only outputs RGB nonsense, almost noise. I think it is something to do with a difference in the way NumPy and PIL deal with arrays, since saying l[x,y] = [1,1,1] where l = np.zeros((height,width,3)) doesn't just make 1 pixel white when img = Image.fromarray(l) and img.show() are performed.
def imagebrot(mina=-1.25, maxa=1.25, minb=-1.25, maxb=1.25, width=100, height=100, maxit=300, inf=2):
l,b = np.zeros((height,width,3), dtype=np.float64), minb
for y in range(0, height):
a = mina
for x in range(0, width):
ab = mandel(a, b, maxit, inf)
if ab[0] == maxit:
l[x,y:] = [1,1,1]
#if ab[0] < maxit:
#smoothit = mandelc(ab[0], ab[1], ab[2])
#l[x, y] = colorsys.hsv_to_rgb(smoothit, 1, 1)
a += abs(mina-maxa)/width
b += abs(minb-maxb)/height
img = Image.fromarray(l, "RGB")
img.show()
def mandel(re, im, maxit, inf):
z = complex(re, im)
c,it = z,0
for i in range(0, maxit):
if abs(z) > inf:
break
z,it = z*z+c,it+1
return it,z,inf
def mandelc(it,z,inf):
return (it+1-log(log(abs(z)))/log(2))
UPDATE 1:
I realised that one of the major errors in this program (I'm sure there are many) is the fact that I was using the x,y coords as the complex coefficients! So, 0 to 100 instead of -1.25 to 1.25! I have changed this so that the code now uses variables a,b to describe them, incremented in a manner I've stolen from some of my code in the turtle version. The code above has been updated accordingly. Since the Smooth Colouring Algorithm code is currently commented out for debugging, the inf variable has been reduced to 2 in size.
UPDATE 2:
I have edited the numpy index with help from a great user. The program now outputs this when set to 200 by 200:
As you can see, it definitely shows some mathematical shape and yet is filled with all these strange red, green and blue pixels! Why could these be here? My program can only set RGB values to [1,1,1] or leave it as a default [0,0,0]. It can't be [1,0,0] or anything like that - this must be a serious flaw...
UPDATE 3:
I think there is an error with NumPy and PIL's integration. If I make l = np.zeros((100, 100, 3)) and then state l[0,0,:] = 1 and finally img = Image.fromarray(l) & img.show(), this is what we get:
Here we get a series of coloured pixels. This calls for another question.
UPDATE 4:
I have no idea what was happening previously, but it seems with a np.uint8 array, Image.fromarray() uses colour values from 0-255. With this piece of wisdom, I move one step closer to understanding this Mandelbug!
Now, I do get something vaguely mathematical, however it still outputs strange things.
This dot is all there is... I get even stranger things if I change to np.uint16, I presume due to the different byte-shape and encoding scheme.
You are indexing the 3D array l incorrectly, try
l[x,y,:] = [1,1,1]
instead. For more details on how to access and modify numpy arrays have a look at numpy indexing
As a side note: the quickstart documentation of numpy actually has an implementation of the mandelbrot set generation and plotting.

Collapsing / Flattening a FITS data cube in python

I've looked all over the place and am not finding a solution to this issue. I feel like it should be fairly straightforward, but we'll see.
I have a .FITS format data cube and I need to collapse it into a 2D FITS image. The data cube has two spacial dimensions and one spectral/velocity dimension.
Just looking for a simple python routine to load in the cube and flatten all these layers (i.e. integrate them along the spectral/velocity axis). Thanks for any help.
This tutorial on pyfits is a little old, but still basically correct. The key is that the output of opening a FITS cube with pyfits (or astropy.io.fits) is that you have a 3 dimensional numpy array.
import pyfits
# if you are using astropy then for this example
# from astropy.io import fits as pyfits
data_cube, header_data_cube = pyfits.getdata("data_cube.fits", 0, header=True)
data_cube.shape
# (Z, X, Y)
You then have to decided how to flatten/integrate cube along the Z axis, and there are plenty of resources out there to help you decide the right (hopefully based in some analysis framework) to do that.
OK, this seems to work:
import pyfits
import numpy as np
hdulist = pyfits.open(filename)
header = hdulist[0].header
data = hdulist[0].data
data = np.nan_to_num(data)
new_data = data[0]
for i in range(1,84): #this depends on number of layers or pages
new_data += data[i]
hdu = pyfits.PrimaryHDU(new_data)
hdu.writeto(new_filename)
One problem with this routine is that WCS coordinates (which are attached to the original data cube) are lost during this conversion.
This is a bit of an old question, but spectral-cube now provides a better solution for this.
Example, based on Teachey's answer:
from spectral_cube import SpectralCube
cube = SpectralCube.read(filename)
summed_image = cube.sum(axis=0)
summed_image.hdu.writeto(new_filename)

Categories

Resources