I'd like to smooth a map that doesn't cover the full sky. This map is not Gaussian nor has it mean zero, so the default behavior of healpy in which it fills missing values with 0 leads to a bias towards lower values at the edges of this mask:
import healpy as hp
nside = 128
npix = hp.nside2npix(nside)
arr = np.ones(npix)
mask = np.zeros(npix, dtype=bool)
mask[:mask.size//2] = True
arr[~mask] = hp.UNSEEN
arr_sm = hp.smoothing(arr, fwhm=np.radians(5.))
hp.mollview(arr, title='Input array')
hp.mollview(arr_sm, title='Smoothed array')
I would like to preserve the sharp edge by setting the weight of the masked values to zero, instead of setting the values to zero. This appears to be difficult because healpy performs the smoothing in harmonic space.
To be more specific, I'd like to mimic the mode keyword in scipy.gaussian_filter(). healpy.smoothing() implicitly uses mode=constant with cval=0, but I'd require something like mode=reflect.
Is there any reasonable way to overcome this issue?
The easiest way to handle this is to remove the mean of the map, perform the smoothing with hp.smoothing, then add the offset back.
This works around the issue because now the map is zero-mean so zero-filling does not create a border effect.
def masked_smoothing(m, fwhm_deg=5.0):
#make sure m is a masked healpy array
m = hp.ma(m)
offset = m.mean()
smoothed=hp.smoothing(m - offset, fwhm=np.radians(fwhm_deg))
return smoothed + offset
The other option I can think of is some iterative algorithm to fill the map in "reflect" mode before smoothing, possibly to be implemented in cythonor numba , the main issue is how complex is your boundary. If it is easy like a latitude cut then all of this is easy, for a general case is very complex and could have a lot of corner cases you need to handle:
Identify "border layers"
get all the missing pixels
find the neighbors and find which one has a valid neighbor and mark it as the "first border"
repeat this algorithm and find pixels that have a "first border" pixel neighbor and mark it as "second border"
repeat until you have all the layers you need
Fill reflected values
loop on border layers
loop on each layer pixel
find the valid neighbors, compute their barycenter, now assume that the line between the border pixel center and the barycenter is going perpendicular through the mask boundary and the mask boundary is halfway
now extend this line by doubling it in the direction inside the mask, take the interpolated value of the map at that location and assign it to the current missing pixel
repeat this for the other layers by playing with the length of the line.
This problem is related to the following question and answer (disclaimer: from me):
https://stackoverflow.com/a/36307291/5350621
It can be transferred to your case as follows:
import numpy as np
import healpy as hp
nside = 128
npix = hp.nside2npix(nside)
# using random numbers here to see the actual smoothing
arr = np.random.rand(npix)
mask = np.zeros(npix, dtype=bool)
mask[:mask.size//2] = True
def masked_smoothing(U, rad=5.0):
V=U.copy()
V[U!=U]=0
VV=hp.smoothing(V, fwhm=np.radians(rad))
W=0*U.copy()+1
W[U!=U]=0
WW=hp.smoothing(W, fwhm=np.radians(rad))
return VV/WW
# setting array to np.nan to exclude the pixels in the smoothing
arr[~mask] = np.nan
arr_sm = masked_smoothing(arr)
arr_sm[~mask] = hp.UNSEEN
hp.mollview(arr, title='Input array')
hp.mollview(arr_sm, title='Smoothed array')
I have a collection of points:
[[2000,3000], [2000,12000], [10000,120000], [10000,3000], [2000,3000]]
and it has a center at coordinates [6000, 7500]
What is a way to shift all the coordinates around a new center [x_new, y_new]? Example, if I wanted to shift all the x/y's around [0,0] instead of the current center but I want to retain the shape.
The shapes vertices are not always rectangles, I am just using that for a simple example.
I want to limit 3rd party modules to numpy and the standard python library.
Thanks!
Shifting a group of points in lockstep is achieved by adding the same displacement vector to each of them.
This is easy using numpy
import numpy as np
points = np.array([[2000,3000], [2000,12000], [10000,120000], [10000,3000], [2000,3000]])
com = np.mean(points, axis=0)
delta = np.array((0, 0)) - com
shifted_points = points + delta
I have a boolean array with one connected component of True values, the border of which I would like to convert to a polygon, e.g. in shapely.
Assuming my array is img, I can get the border indices like this
import numpy as np
from skimage.morphology binary_erosion
border_indices = np.transpose(np.nonzero(np.logical_xor(binary_erosion(img), img)))
but just feeding those into a shapely.Polygon object does not work because the points are not ordered along the boundary, but in increasing x and y values.
It may be possible to use alpha shapes to solve this (note that I'm not looking for the convex hull), but maybe someone can suggest a simpler way of getting to the bounding polygon, ideally directly operating on the original array.
It sounds like rasterio.features.shapes is what you are looking for. A simple example that should illustrate the procedure:
import rasterio.features
import shapely.geometry
import numpy as np
im = np.zeros([5, 5], dtype=np.uint8)
im[1:-1, 1:-1] = 1
shapes = rasterio.features.shapes(im)
shapes is a generator with pairs of (geometry, value). To get the geometry corresponding to where the value is equal to 1:
polygons = [shapely.geometry.Polygon(shape[0]["coordinates"][0]) for shape in shapes if shape[1] == 1]
This creates a list of shapely polygons corresponding to the areas in the array where the value is equal to 1.
print(polygons)
[<shapely.geometry.polygon.Polygon object at 0x7f64bf9ac9e8>]
I'm trying to get python to return, as close as possible, the center of the most obvious clustering in an image like the one below:
In my previous question I asked how to get the global maximum and the local maximums of a 2d array, and the answers given worked perfectly. The issue is that the center estimation I can get by averaging the global maximum obtained with different bin sizes is always slightly off than the one I would set by eye, because I'm only accounting for the biggest bin instead of a group of biggest bins (like one does by eye).
I tried adapting the answer to this question to my problem, but it turns out my image is too noisy for that algorithm to work. Here's my code implementing that answer:
import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp
from os import getcwd
from os.path import join, realpath, dirname
# Save path to dir where this code exists.
mypath = realpath(join(getcwd(), dirname(__file__)))
myfile = 'data_file.dat'
x, y = np.loadtxt(join(mypath,myfile), usecols=(1, 2), unpack=True)
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
rang = [[xmin, xmax], [ymin, ymax]]
paws = []
for d_b in range(25, 110, 25):
# Number of bins in x,y given the bin width 'd_b'
binsxy = [int((xmax - xmin) / d_b), int((ymax - ymin) / d_b)]
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
paws.append(H)
def detect_peaks(image):
"""
Takes an image and detect the peaks usingthe local maximum filter.
Returns a boolean mask of the peaks (i.e. 1 when
the pixel's value is the neighborhood maximum, 0 otherwise)
"""
# define an 8-connected neighborhood
neighborhood = generate_binary_structure(2,2)
#apply the local maximum filter; all pixel of maximal value
#in their neighborhood are set to 1
local_max = maximum_filter(image, footprint=neighborhood)==image
#local_max is a mask that contains the peaks we are
#looking for, but also the background.
#In order to isolate the peaks we must remove the background from the mask.
#we create the mask of the background
background = (image==0)
#a little technicality: we must erode the background in order to
#successfully subtract it form local_max, otherwise a line will
#appear along the background border (artifact of the local maximum filter)
eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)
#we obtain the final mask, containing only peaks,
#by removing the background from the local_max mask
detected_peaks = local_max - eroded_background
return detected_peaks
#applying the detection and plotting results
for i, paw in enumerate(paws):
detected_peaks = detect_peaks(paw)
pp.subplot(4,2,(2*i+1))
pp.imshow(paw)
pp.subplot(4,2,(2*i+2) )
pp.imshow(detected_peaks)
pp.show()
and here's the result of that (varying the bin size):
Clearly my background is too noisy for that algorithm to work, so the question is: how can I make that algorithm less sensitive? If an alternative solution exists then please let me know.
EDIT
Following Bi Rico advise I attempted smoothing my 2d array before passing it on to the local maximum finder, like so:
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
H1 = gaussian_filter(H, 2, mode='nearest')
paws.append(H1)
These were the results with a sigma of 2, 4 and 8:
EDIT 2
A mode ='constant' seems to work much better than nearest. It converges to the right center with a sigma=2 for the largest bin size:
So, how do I get the coordinates of the maximum that shows in the last image?
Answering the last part of your question, always you have points in an image, you can find their coordinates by searching, in some order, the local maximums of the image. In case your data is not a point source, you can apply a mask to each peak in order to avoid the peak neighborhood from being a maximum while performing a future search. I propose the following code:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import copy
def get_std(image):
return np.std(image)
def get_max(image,sigma,alpha=20,size=10):
i_out = []
j_out = []
image_temp = copy.deepcopy(image)
while True:
k = np.argmax(image_temp)
j,i = np.unravel_index(k, image_temp.shape)
if(image_temp[j,i] >= alpha*sigma):
i_out.append(i)
j_out.append(j)
x = np.arange(i-size, i+size)
y = np.arange(j-size, j+size)
xv,yv = np.meshgrid(x,y)
image_temp[yv.clip(0,image_temp.shape[0]-1),
xv.clip(0,image_temp.shape[1]-1) ] = 0
print xv
else:
break
return i_out,j_out
#reading the image
image = mpimg.imread('ggd4.jpg')
#computing the standard deviation of the image
sigma = get_std(image)
#getting the peaks
i,j = get_max(image[:,:,0],sigma, alpha=10, size=10)
#let's see the results
plt.imshow(image, origin='lower')
plt.plot(i,j,'ro', markersize=10, alpha=0.5)
plt.show()
The image ggd4 for the test can be downloaded from:
http://www.ipac.caltech.edu/2mass/gallery/spr99/ggd4.jpg
The first part is to get some information about the noise in the image. I did it by computing the standard deviation of the full image (actually is better to select an small rectangle without signal). This is telling us how much noise is present in the image.
The idea to get the peaks is to ask for successive maximums, which are above of certain threshold (let's say, 3, 4, 5, 10, or 20 times the noise). This is what the function get_max is actually doing. It performs the search of maximums until one of them is below the threshold imposed by the noise. In order to avoid finding the same maximum many times it is necessary to remove the peaks from the image. In the general way, the shape of the mask to do so depends strongly on the problem that one want to solve. for the case of stars, it should be good to remove the star by using a Gaussian function, or something similar. I have chosen for simplicity a square function, and the size of the function (in pixels) is the variable "size".
I think that from this example, anybody can improve the code by adding more general things.
EDIT:
The original image looks like:
While the image after identifying the luminous points looks like this:
Too much of a n00b on Stack Overflow to comment on Alejandro's answer elsewhere here. I would refine his code a bit to use a preallocated numpy array for output:
def get_max(image,sigma,alpha=3,size=10):
from copy import deepcopy
import numpy as np
# preallocate a lot of peak storage
k_arr = np.zeros((10000,2))
image_temp = deepcopy(image)
peak_ct=0
while True:
k = np.argmax(image_temp)
j,i = np.unravel_index(k, image_temp.shape)
if(image_temp[j,i] >= alpha*sigma):
k_arr[peak_ct]=[j,i]
# this is the part that masks already-found peaks.
x = np.arange(i-size, i+size)
y = np.arange(j-size, j+size)
xv,yv = np.meshgrid(x,y)
# the clip here handles edge cases where the peak is near the
# image edge
image_temp[yv.clip(0,image_temp.shape[0]-1),
xv.clip(0,image_temp.shape[1]-1) ] = 0
peak_ct+=1
else:
break
# trim the output for only what we've actually found
return k_arr[:peak_ct]
In profiling this and Alejandro's code using his example image, this code about 33% faster (0.03 sec for Alejandro's code, 0.02 sec for mine.) I expect on images with larger numbers of peaks, it would be even faster - appending the output to a list will get slower and slower for more peaks.
I think the first step needed here is to express the values in H in terms of the standard deviation of the field:
import numpy as np
H = H / np.std(H)
Now you can put a threshold on the values of this H. If the noise is assumed to be Gaussian, picking a threshold of 3 you can be quite sure (99.7%) that this pixel can be associated with a real peak and not noise. See here.
Now the further selection can start. It is not exactly clear to me what exactly you want to find. Do you want the exact location of peak values? Or do you want one location for a cluster of peaks which is in the middle of this cluster?
Anyway, starting from this point with all pixel values expressed in standard deviations of the field, you should be able to get what you want. If you want to find clusters you could perform a nearest neighbour search on the >3-sigma gridpoints and put a threshold on the distance. I.e. only connect them when they are close enough to each other. If several gridpoints are connected you can define this as a group/cluster and calculate some (sigma-weighted?) center of the cluster.
Hope my first contribution on Stackoverflow is useful for you!
The way I would do it:
1) normalize H between 0 and 1.
2) pick a threshold value, as tcaswell suggests. It could be between .9 and .99 for example
3) use masked arrays to keep only the x,y coordinates with H above threshold:
import numpy.ma as ma
x_masked=ma.masked_array(x, mask= H < thresold)
y_masked=ma.masked_array(y, mask= H < thresold)
4) now you can weight-average on the masked coordinates, with weight something like (H-threshold)^2, or any other power greater or equal to one, depending on your taste/tests.
Comment:
1) This is not robust with respect to the type of peaks you have, since you may have to adapt the thresold. This is the minor problem;
2) This DOES NOT work with two peaks as it is, and will give wrong results if the 2nd peak is above threshold.
Nonetheless, it will always give you an answer without crashing (with pros and cons of the thing..)
I'm adding this answer because it's the solution I ended up using. It's a combination of Bi Rico's comment here (May 30 at 18:54) and the answer given in this question: Find peak of 2d histogram.
As it turns out using the peak detection algorithm from this question Peak detection in a 2D array only complicates matters. After applying the Gaussian filter to the image all that needs to be done is to ask for the maximum bin (as Bi Rico pointed out) and then obtain the maximum in coordinates.
So instead of using the detect-peaks function as I did above, I simply add the following code after the Gaussian 2D histogram is obtained:
# Get 2D histogram.
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
# Get Gaussian filtered 2D histogram.
H1 = gaussian_filter(H, 2, mode='nearest')
# Get center of maximum in bin coordinates.
x_cent_bin, y_cent_bin = np.unravel_index(H1.argmax(), H1.shape)
# Get center in x,y coordinates.
x_cent_coor , y_cent_coord = np.average(xedges[x_cent_bin:x_cent_bin + 2]), np.average(yedges[y_cent_g:y_cent_g + 2])
For spatial analysis purposes, I am trying to set up a filter that would, for a pixel in a given neighbourhood, give the percentile of this pixel in its neighbourhood (defined by a structuring element).
Below is my best shot so far:
import numpy as np
import scipy.ndimage as ndimage
import scipy.stats as sp
def get_percentile(values, radius=3):
# Retrieve central pixel and neighbours values
cur_value = values[4]
other_values = np.delete(values, 4)
return sp.percentileofscore(other_values, cur_value)/100
def percentiles(image):
# definition of the neighbourhood (structuring element)
footprint = np.array([[1,1,1],
[1,1,1],
[1,1,1]])
# Using generic_filter to apply sequentially a my own user-defined
# function (`get_percentile`) in the filter
results = ndimage.generic_filter(
image,
get_percentile,
footprint=footprint,
mode='constant',
cval=np.nan)
return results
# Pick dimensions for a dummy example
dims = [12,15]
# Generate dummy example
df = np.random.randn(np.product(dims)).reshape(dims[0], dims[1])
percentiles(df)
It sort of work, but:
1. I'm sure the code is not really optimal, and could run faster
2. The dimension of my neighbourhood is hard coded. Something I would like is to better identify the central pixel on which I'm applying the filter (footprint) from its neighbours according to this filter.