I'm trying to get python to return, as close as possible, the center of the most obvious clustering in an image like the one below:
In my previous question I asked how to get the global maximum and the local maxima of a 2D array, and the answers given worked perfectly. The issue is that the center estimate I get by averaging the global maximum obtained with different bin sizes is always slightly off from the one I would set by eye, because I'm only accounting for the biggest bin instead of a group of the biggest bins (as one does by eye).
I tried adapting the answer to this question to my problem, but it turns out my image is too noisy for that algorithm to work. Here's my code implementing that answer:
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.ndimage import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp
from os import getcwd
from os.path import join, realpath, dirname
# Save path to dir where this code exists.
mypath = realpath(join(getcwd(), dirname(__file__)))
myfile = 'data_file.dat'
x, y = np.loadtxt(join(mypath,myfile), usecols=(1, 2), unpack=True)
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
rang = [[xmin, xmax], [ymin, ymax]]
paws = []
for d_b in range(25, 110, 25):
# Number of bins in x,y given the bin width 'd_b'
binsxy = [int((xmax - xmin) / d_b), int((ymax - ymin) / d_b)]
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
paws.append(H)
def detect_peaks(image):
"""
Takes an image and detects the peaks using the local maximum filter.
Returns a boolean mask of the peaks (i.e. 1 when
the pixel's value is the neighborhood maximum, 0 otherwise)
"""
# define an 8-connected neighborhood
neighborhood = generate_binary_structure(2,2)
#apply the local maximum filter; all pixel of maximal value
#in their neighborhood are set to 1
local_max = maximum_filter(image, footprint=neighborhood)==image
#local_max is a mask that contains the peaks we are
#looking for, but also the background.
#In order to isolate the peaks we must remove the background from the mask.
#we create the mask of the background
background = (image==0)
#a little technicality: we must erode the background in order to
#successfully subtract it from local_max, otherwise a line will
#appear along the background border (artifact of the local maximum filter)
eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)
#we obtain the final mask, containing only peaks,
#by removing the background from the local_max mask
detected_peaks = local_max ^ eroded_background  # boolean XOR; '-' on boolean arrays is rejected by recent numpy
return detected_peaks
#applying the detection and plotting results
for i, paw in enumerate(paws):
detected_peaks = detect_peaks(paw)
pp.subplot(4,2,(2*i+1))
pp.imshow(paw)
pp.subplot(4,2,(2*i+2) )
pp.imshow(detected_peaks)
pp.show()
and here's the result of that (varying the bin size):
Clearly my background is too noisy for that algorithm to work, so the question is: how can I make that algorithm less sensitive? If an alternative solution exists then please let me know.
EDIT
Following Bi Rico's advice I attempted smoothing my 2D array before passing it on to the local maximum finder, like so:
from scipy.ndimage import gaussian_filter

H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
H1 = gaussian_filter(H, 2, mode='nearest')
paws.append(H1)
These were the results with a sigma of 2, 4 and 8:
EDIT 2
A mode='constant' seems to work much better than 'nearest'. It converges to the right center with sigma=2 for the largest bin size:
So, how do I get the coordinates of the maximum that shows in the last image?
Answering the last part of your question: whenever you have point sources in an image, you can find their coordinates by searching, in some order, for the local maxima of the image. If your data is not a point source, you can apply a mask to each peak so that its neighborhood is not picked up as a maximum in a later search. I propose the following code:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import copy
def get_std(image):
return np.std(image)
def get_max(image,sigma,alpha=20,size=10):
i_out = []
j_out = []
image_temp = copy.deepcopy(image)
while True:
k = np.argmax(image_temp)
j,i = np.unravel_index(k, image_temp.shape)
if(image_temp[j,i] >= alpha*sigma):
i_out.append(i)
j_out.append(j)
x = np.arange(i-size, i+size)
y = np.arange(j-size, j+size)
xv,yv = np.meshgrid(x,y)
image_temp[yv.clip(0,image_temp.shape[0]-1),
xv.clip(0,image_temp.shape[1]-1) ] = 0
else:
break
return i_out,j_out
#reading the image
image = mpimg.imread('ggd4.jpg')
#computing the standard deviation of the image
sigma = get_std(image)
#getting the peaks
i,j = get_max(image[:,:,0],sigma, alpha=10, size=10)
#let's see the results
plt.imshow(image, origin='lower')
plt.plot(i,j,'ro', markersize=10, alpha=0.5)
plt.show()
The image ggd4 for the test can be downloaded from:
http://www.ipac.caltech.edu/2mass/gallery/spr99/ggd4.jpg
The first part is to get some information about the noise in the image. I did it by computing the standard deviation of the full image (it is actually better to select a small rectangle without signal). This tells us how much noise is present in the image.
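For instance, a hypothetical way of doing the latter (the 50x50 corner is just an illustrative choice of a signal-free region):
sigma = np.std(image[0:50, 0:50])  # noise estimated from a corner without sources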
The idea for getting the peaks is to ask for successive maxima which are above a certain threshold (let's say, 3, 4, 5, 10, or 20 times the noise). This is what the function get_max is actually doing. It performs the search for maxima until one of them is below the threshold imposed by the noise. In order to avoid finding the same maximum many times, it is necessary to remove each peak from the image. In general, the shape of the mask used to do so depends strongly on the problem one wants to solve. For the case of stars, it would be good to remove the star using a Gaussian function or something similar. For simplicity I have chosen a square function, whose size (in pixels) is the variable "size".
I think that from this example, anybody can improve the code by adding more general things.
EDIT:
The original image looks like:
While the image after identifying the luminous points looks like this:
Too much of a n00b on Stack Overflow to comment on Alejandro's answer elsewhere here. I would refine his code a bit to use a preallocated numpy array for output:
def get_max(image,sigma,alpha=3,size=10):
from copy import deepcopy
import numpy as np
# preallocate a lot of peak storage
k_arr = np.zeros((10000,2))
image_temp = deepcopy(image)
peak_ct=0
while True:
k = np.argmax(image_temp)
j,i = np.unravel_index(k, image_temp.shape)
if(image_temp[j,i] >= alpha*sigma):
k_arr[peak_ct]=[j,i]
# this is the part that masks already-found peaks.
x = np.arange(i-size, i+size)
y = np.arange(j-size, j+size)
xv,yv = np.meshgrid(x,y)
# the clip here handles edge cases where the peak is near the
# image edge
image_temp[yv.clip(0,image_temp.shape[0]-1),
xv.clip(0,image_temp.shape[1]-1) ] = 0
peak_ct+=1
else:
break
# trim the output for only what we've actually found
return k_arr[:peak_ct]
Profiling this and Alejandro's code on his example image, this code is about 33% faster (0.03 s for Alejandro's code, 0.02 s for mine). I expect that on images with larger numbers of peaks it would be even faster: appending the output to a list gets slower and slower as more peaks are found.
I think the first step needed here is to express the values in H in terms of the standard deviation of the field:
import numpy as np
H = H / np.std(H)
Now you can put a threshold on the values of this H. If the noise is assumed to be Gaussian, picking a threshold of 3 you can be quite sure (99.7%) that this pixel can be associated with a real peak and not noise. See here.
Now the further selection can start. It is not entirely clear to me what exactly you want to find. Do you want the exact locations of the peak values? Or do you want one location for a cluster of peaks, somewhere in the middle of that cluster?
Anyway, starting from this point with all pixel values expressed in standard deviations of the field, you should be able to get what you want. If you want to find clusters, you could perform a nearest-neighbour search on the >3-sigma gridpoints and put a threshold on the distance, i.e. only connect them when they are close enough to each other. If several gridpoints are connected you can define this as a group/cluster and calculate a (sigma-weighted?) center of the cluster.
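A rough sketch of that clustering idea, assuming H has already been divided by its standard deviation as above (using scipy.ndimage to connect neighbouring bins is my own choice here):
from scipy import ndimage

mask = H > 3                                      # keep only the significant bins
labels, n_clusters = ndimage.label(mask)          # group adjacent >3-sigma bins into clusters
# sigma-weighted center of each connected cluster, in bin coordinates
centers = ndimage.center_of_mass(H, labels, range(1, n_clusters + 1))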
Hope my first contribution on Stackoverflow is useful for you!
The way I would do it:
1) normalize H between 0 and 1.
2) pick a threshold value, as tcaswell suggests. It could be between .9 and .99 for example
3) use masked arrays to keep only the x,y coordinates with H above threshold:
import numpy.ma as ma
x_masked = ma.masked_array(x, mask=(H < threshold))
y_masked = ma.masked_array(y, mask=(H < threshold))
4) now you can take a weighted average of the masked coordinates, with a weight like (H - threshold)^2, or any other power greater than or equal to one, depending on your taste/tests (a rough sketch follows below).
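A minimal sketch of those four steps, working on the histogram grid rather than on the raw x,y samples (xedges and yedges are assumed to come from np.histogram2d as in the question):
Hn = (H - H.min()) / (H.max() - H.min())               # 1) normalize H to [0, 1]
threshold = 0.95                                       # 2) pick a threshold
w = np.where(Hn > threshold, (Hn - threshold)**2, 0)   # 3) + 4) weights, zero below threshold
xcenters = 0.5 * (xedges[:-1] + xedges[1:])            # bin centers along x
ycenters = 0.5 * (yedges[:-1] + yedges[1:])            # bin centers along y
Xc, Yc = np.meshgrid(xcenters, ycenters, indexing='ij')
x_peak = (Xc * w).sum() / w.sum()                      # weighted average of the coordinates
y_peak = (Yc * w).sum() / w.sum()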
Comment:
1) This is not robust with respect to the type of peaks you have, since you may have to adapt the threshold. This is the minor problem;
2) This DOES NOT work with two peaks as it is, and will give wrong results if the 2nd peak is above threshold.
Nonetheless, it will always give you an answer without crashing (with pros and cons of the thing..)
I'm adding this answer because it's the solution I ended up using. It's a combination of Bi Rico's comment here (May 30 at 18:54) and the answer given in this question: Find peak of 2d histogram.
As it turns out, using the peak detection algorithm from this question Peak detection in a 2D array only complicates matters. After applying the Gaussian filter to the image, all that needs to be done is to ask for the maximum bin (as Bi Rico pointed out) and then obtain the maximum in x,y coordinates.
So instead of using the detect_peaks function as I did above, I simply add the following code after the Gaussian-filtered 2D histogram is obtained:
# Get 2D histogram.
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
# Get Gaussian filtered 2D histogram.
H1 = gaussian_filter(H, 2, mode='nearest')
# Get center of maximum in bin coordinates.
x_cent_bin, y_cent_bin = np.unravel_index(H1.argmax(), H1.shape)
# Get center in x,y coordinates.
x_cent_coord, y_cent_coord = np.average(xedges[x_cent_bin:x_cent_bin + 2]), np.average(yedges[y_cent_bin:y_cent_bin + 2])
I need to write a python function or class with the following Input/Output
Input :
The position of the X-rays source (still not sure why it's needed)
The position of the board (still not sure why it's needed)
A three dimensional CT-Scan
Output :
A 2D X-ray Scan (simulate an X-Ray Scan which is a scan that goes through the whole body)
A few important remarks to what I'm trying to achieve:
You don’t need additional information from the real world or any advanced knowledge.
You can add any input parameter that you see fit.
If your method produces artifacts, you are expected to fix them.
Please explain every step of your method.
What I've done until now: (.py file added)
I've read the .dicom files, which are located in "Case2" folder.
These .dicom files can be downloaded from my Google Drive:
https://drive.google.com/file/d/1lHoMJgj_8Dt62JaR2mMlK9FDnfkesH5F/view?usp=sharing
I've sorted the files by their position.
Finally, I've created a 3D array and added all the images to it in order to plot the results (you can see them in the added image), which are slices of the CT scans. (reference: https://pydicom.github.io/pydicom/stable/auto_examples/image_processing/reslice.html#sphx-glr-auto-examples-image-processing-reslice-py)
Here's the full code:
import pydicom as dicom
import os
import matplotlib.pyplot as plt
import sys
import glob
import numpy as np
path = "./Case2"
ct_images = os.listdir(path)
slices = [dicom.read_file(path + '/' + s, force=True) for s in ct_images]
slices[0].ImagePositionPatient[2]
slices = sorted(slices, key = lambda x: x.ImagePositionPatient[2])
#print(slices)
# Read a dicom file with a ctx manager
with dicom.dcmread(path + '/' + ct_images[0]) as ds:
# plt.imshow(ds.pixel_array, cmap=plt.cm.bone)
print(ds)
#plt.show()
fig = plt.figure()
for num, each_slice in enumerate(slices[:12]):
y= fig.add_subplot(3,4,num+1)
#print(each_slice)
y.imshow(each_slice.pixel_array)
plt.show()
for i in range(len(ct_images)):
with dicom.dcmread(path + '/' + ct_images[i], force=True) as ds:
plt.imshow(ds.pixel_array, cmap=plt.cm.bone)
plt.show()
# pixel aspects, assuming all slices are the same
ps = slices[0].PixelSpacing
ss = slices[0].SliceThickness
ax_aspect = ps[1]/ps[0]
sag_aspect = ps[1]/ss
cor_aspect = ss/ps[0]
# create 3D array
img_shape = list(slices[0].pixel_array.shape)
img_shape.append(len(slices))
img3d = np.zeros(img_shape)
# fill 3D array with the images from the files
for i, s in enumerate(slices):
img2d = s.pixel_array
img3d[:, :, i] = img2d
# plot 3 orthogonal slices
a1 = plt.subplot(2, 2, 1)
plt.imshow(img3d[:, :, img_shape[2]//2])
a1.set_aspect(ax_aspect)
a2 = plt.subplot(2, 2, 2)
plt.imshow(img3d[:, img_shape[1]//2, :])
a2.set_aspect(sag_aspect)
a3 = plt.subplot(2, 2, 3)
plt.imshow(img3d[img_shape[0]//2, :, :].T)
a3.set_aspect(cor_aspect)
plt.show()
The result isn't what I wanted because:
These are slices of the CT scans. I need to simulate an X-ray scan, which is a scan that goes through the whole body.
Would love your help to simulate an X-Ray scan that goes through the body.
I've read that it could be done in the following way: "A normal 2D X-ray image is a sum projection through the volume. Send parallel rays through the volume and add up the densities." I'm not sure how to accomplish that in code.
References that may help: https://pydicom.github.io/pydicom/stable/index.html
EDIT: as further answers noted, this solution yields a parallel projection, not a perspective projection.
From what I understand of the definition of "a normal 2D X-ray image", this can be done by summing the densities along the projection direction, i.e. over all slices for each pixel of the output.
With your 3D volume, this means performing a sum over a given axis, which can be done with ndarray.sum(axis) in numpy.
# plot 3 orthogonal slices
a1 = plt.subplot(2, 2, 1)
plt.imshow(img3d.sum(2), cmap=plt.cm.bone)
a1.set_aspect(ax_aspect)
a2 = plt.subplot(2, 2, 2)
plt.imshow(img3d.sum(1), cmap=plt.cm.bone)
a2.set_aspect(sag_aspect)
a3 = plt.subplot(2, 2, 3)
plt.imshow(img3d.sum(0).T, cmap=plt.cm.bone)
a3.set_aspect(cor_aspect)
plt.show()
This yields the following result:
Which, to me, looks like an X-ray image.
EDIT : the result is a bit too "bright", so you may want to apply gamma correction. With matplotlib, import matplotlib.colors as colors and add a colors.PowerNorm(gamma_value) as the norm parameter in plt.imshow:
plt.imshow(img3d.sum(0).T, norm=colors.PowerNorm(gamma=3), cmap=plt.cm.bone)
Result:
The way I understand the task, you are expected to write a ray tracer that follows the X-rays from the source (that's why you need its position) to the projection plane (that's why you need its position).
Sum up the values as you go and do a mapping to the allowed grey-values in the end.
Take a look at line drawing algorithms to see how you can do this.
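As an illustration of summing along one such line, here is a tiny sketch using scikit-image's line_nd (available in recent versions); img3d is the volume built in the question, and the start/stop voxels are made up:
from skimage.draw import line_nd

start = (0, 0, 0)                       # hypothetical source voxel
stop = (255, 255, 100)                  # hypothetical voxel on the projection plane
rr, cc, ss = line_nd(start, stop)       # voxel indices along the straight line
ray_value = img3d[rr, cc, ss].sum()     # accumulated density along that ray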
It is really no black magic, I have done this kind of stuff more than 30 years ago. Damn, I'm old...
What you want is a perspective projection instead of a parallel projection. In order to obtain this, you need to know which values to sum for each point on the projection plane. There are multiple considerations to keep in mind:
We are talking about voxels, so you need a method to determine whether a certain point in space belongs to a certain voxel in your volume.
A line between two points is straight, but because voxels are a discrete representation of space, different methods of determining the above can lead to different (mostly minor) results. This difference will ultimately also lead to slightly different images depending on the algorithms used. This is expected.
Let's say you have a CT scan volume consisting of 256 slices of 512x512 pixels. This gives you a volume of 512x512x256 voxels. For each of these voxels you need to know its position in x,y,z coordinates. You can do this as follows:
- Use the ImagePositionPatient attribute to find out the x,y,z coordinate of the upper left hand corner pixel in mm for a given slice.
- Use the PixelSpacing attribute to calculate the x,y,z coordinates of the other pixels in your slice. Repeat for all slices
Edit: I just found a counterexample to the method below; the rest is still helpful. Will update.
Now to find out for a given point (Xa, Ya, Za) what voxel values need to be summed if the source is at (Xb, Yb, Zb):
Find the voxel that belongs to (Xa,Ya, Za). Keep pixel/voxel data.
Calculate (you can do this with NumPy) the distance between the voxel at (Xa, Ya, Za) and (Xb, Yb, Zb). There is an optimization possible here :)
For all directly surrounding voxels (that will be 3x3x3-1 voxels) also calculate this distance. This can also be optimized :)
Take the voxel with the shortest distance as the starting point for a next iteration of the above. Add pixel/voxel data.
Repeat until you are out of the bounds of your CT volume.
In order to obtain a projection repeat these steps for all points on your projection plane and visualize the result. Good luck with your assignment! :)
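As a rough sketch only (not the exact voxel-walk above): a simpler variant is to march along each source-to-detector ray at fixed steps and sum nearest-voxel densities. Every name below (source, detector geometry, step count) is an assumption; img3d is the volume from the question:
import numpy as np

def perspective_projection(volume, source, det_origin, det_u, det_v, n_u, n_v, n_steps=512):
    """Sum nearest-voxel densities along rays from `source` to each detector pixel."""
    image = np.zeros((n_v, n_u))
    shape = np.array(volume.shape)
    ts = np.linspace(0.0, 1.0, n_steps)                   # sample positions along each ray
    for iv in range(n_v):
        for iu in range(n_u):
            pixel = det_origin + iu * det_u + iv * det_v  # position of this detector pixel
            pts = source + ts[:, None] * (pixel - source) # points on the source->pixel ray
            idx = np.rint(pts).astype(int)                # nearest voxel for each sample point
            ok = np.all((idx >= 0) & (idx < shape), axis=1)
            image[iv, iu] = volume[idx[ok, 0], idx[ok, 1], idx[ok, 2]].sum()
    return image

# example call with entirely made-up geometry (voxel units):
# proj = perspective_projection(img3d, source=np.array([256., -400., 128.]),
#                               det_origin=np.array([0., 600., 0.]),
#                               det_u=np.array([1., 0., 0.]),
#                               det_v=np.array([0., 0., 1.]),
#                               n_u=512, n_v=256)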
I have several points (x,y,z coordinates) in a 3D box with associated masses. I want to draw an histogram of the mass-density that is found in spheres of a given radius R.
I have written a code that, providing I did not make any errors which I think I may have, works in the following way:
My "real" data is something huge thus I wrote a little code to generate non overlapping points randomly with arbitrary mass in a box.
I compute a 3D histogram (weighted by mass) with a binning about 10 times smaller than the radius of my spheres.
I take the FFT of my histogram, compute the wave-modes (kx, ky and kz) and use them to multiply my histogram in Fourier space by the analytic expression of the 3D top-hat window (sphere filtering) function in Fourier space.
I inverse FFT my newly computed grid.
Thus drawing a 1D-histogram of the values on each bin would give me what I want.
My issue is the following: given what I do, there should not be any negative values in my inverse-FFT grid (step 4), but I get some, with values much larger than the numerical error.
If I run my code on a small box (300x300x300 cm3 with the points separated by at least 1 cm) I do not get the issue. I do get it for 600x600x600 cm3 though.
If I set all the masses to 0, thus working on an empty grid, I do get back 0 everywhere without any issues.
I here give my code in a full block so that it is easily copied.
import numpy as np
import matplotlib.pyplot as plt
import random
from numba import njit
# 1. Generate a bunch of points with masses from 1 to 3 separated by a radius of 1 cm
radius = 1
rangeX = (0, 100)
rangeY = (0, 100)
rangeZ = (0, 100)
rangem = (1,3)
qty = 20000 # or however many points you want
# Generate a set of all points within 1 of the origin, to be used as offsets later
deltas = set()
for x in range(-radius, radius+1):
for y in range(-radius, radius+1):
for z in range(-radius, radius+1):
if x*x + y*y + z*z<= radius*radius:
deltas.add((x,y,z))
X = []
Y = []
Z = []
M = []
excluded = set()
for i in range(qty):
x = random.randrange(*rangeX)
y = random.randrange(*rangeY)
z = random.randrange(*rangeZ)
m = random.uniform(*rangem)
if (x,y,z) in excluded: continue
X.append(x)
Y.append(y)
Z.append(z)
M.append(m)
excluded.update((x+dx, y+dy, z+dz) for (dx,dy,dz) in deltas)
print("There is ",len(X)," points in the box")
# Compute the 3D histogram
a = np.vstack((X, Y, Z)).T
b = 200
H, edges = np.histogramdd(a, weights=M, bins = b)
# Compute the FFT of the grid
Fh = np.fft.fftn(H, axes=(-3,-2, -1))
# Compute the different wave-modes
kx = 2*np.pi*np.fft.fftfreq(len(edges[0][:-1]))*len(edges[0][:-1])/(np.amax(X)-np.amin(X))
ky = 2*np.pi*np.fft.fftfreq(len(edges[1][:-1]))*len(edges[1][:-1])/(np.amax(Y)-np.amin(Y))
kz = 2*np.pi*np.fft.fftfreq(len(edges[2][:-1]))*len(edges[2][:-1])/(np.amax(Z)-np.amin(Z))
# I create a matrix containing the values of the filter in each point of the grid in Fourier space
R = 5
Kh = np.empty((len(kx),len(ky),len(kz)))
#njit(parallel=True)
def func_njit(kx, ky, kz, Kh):
for i in range(len(kx)):
for j in range(len(ky)):
for k in range(len(kz)):
if np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2) != 0:
Kh[i][j][k] = (np.sin((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)-(np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R*np.cos((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R))*3/((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)**3
else:
Kh[i][j][k] = 1
return Kh
Kh = func_njit(kx, ky, kz, Kh)
# I multiply each point of my grid by the associated value of the filter (multiplication in Fourier space = convolution in real space)
Gh = np.multiply(Fh, Kh)
# I take the inverse FFT of my filtered grid. I take the real part to get back floats but there should only be zeros for the imaginary part.
Density = np.real(np.fft.ifftn(Gh,axes=(-3,-2, -1)))
# Here it shows if there are negative values the magnitude of the error
print(np.min(Density))
D = Density.flatten()
N = np.mean(D)
# I then compute the histogram I want
hist, bins = np.histogram(D/N, bins='auto', density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)
plt.xlabel('rho/rhom')
plt.ylabel('P(rho)')
plt.show()
Do you know why I'm getting these negative values? Do you think there is a simpler way to proceed?
Sorry if this is a very long post, I tried to make it very clear and will edit it with your comments, thanks a lot!
-EDIT-
A follow-up question on the issue can be found here.
The filter you create in the frequency domain is only an approximation to the filter you want to create. The problem is that we are dealing with the DFT here, not the continuous-domain FT (with its infinite frequencies). The Fourier transform of a ball is indeed the function you describe, however this function has infinite extent -- it is not band-limited!
By sampling this function only within a window, you are effectively multiplying it with an ideal low-pass filter (the rectangle of the domain). This low-pass filter, in the spatial domain, has negative values. Therefore, the filter you create also has negative values in the spatial domain.
This is a slice through the origin of the inverse transform of Kh (after I applied fftshift to move the origin to the middle of the image, for better display):
As you can tell here, there is some ringing that leads to negative values.
One way to overcome this ringing is to apply a windowing function in the frequency domain. Another option is to generate a ball in the spatial domain, and compute its Fourier transform. This second option would be the simplest to achieve. Do remember that the kernel in the spatial domain must also have the origin at the top-left pixel to obtain a correct FFT.
A windowing function is typically applied in the spatial domain to avoid issues with the image border when computing the FFT. Here, I propose to apply such a window in the frequency domain to avoid similar issues when computing the IFFT. Note, however, that this will always further reduce the bandwidth of the kernel (the windowing function would work as a low-pass filter after all), and therefore yield a smoother transition of foreground to background in the spatial domain (i.e. the spatial domain kernel will not have as sharp a transition as you might like). The best known windowing functions are Hamming and Hann windows, but there are many others worth trying out.
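A rough sketch of that second option (a discrete ball built directly in the spatial domain, centred on the [0,0,0] voxel via wrap-around so the FFT is correct). R_bins, the sphere radius expressed in histogram bins, is an assumed value:
n = H.shape[0]                          # grid size of the histogram
z, y, x = np.ogrid[:n, :n, :n]
# wrap-around distances, so the ball is centred on the corner voxel
d2 = np.minimum(x, n - x)**2 + np.minimum(y, n - y)**2 + np.minimum(z, n - z)**2
R_bins = 10                             # sphere radius in bins (assumption)
ball = (d2 <= R_bins**2).astype(float)
ball /= ball.sum()                      # normalize so the filter averages over the sphere
Kh_ball = np.fft.fftn(ball)             # frequency response of the discrete ball
Density = np.real(np.fft.ifftn(np.fft.fftn(H) * Kh_ball))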
Unsolicited advice:
I simplified your code to compute Kh to the following:
kr = np.sqrt(kx[:,None,None]**2 + ky[None,:,None]**2 + kz[None,None,:]**2)
kr *= R
Kh = (np.sin(kr)-kr*np.cos(kr))*3/(kr)**3
Kh[0,0,0] = 1
I find this easier to read than the nested loops. It should also be significantly faster, and avoid the need for njit. Note that you were computing the same distance (what I call kr here) 5 times. Factoring out such computation is not only faster, but yields more readable code.
Just a guess:
Where do you get the idea that the imaginary part MUST be zero? Have you ever tried taking the absolute values (sqrt(re^2 + im^2)) and forgetting about the phase, instead of just taking the real part? Just something that came to my mind.
I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
10.jpg:
24.jpg:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import scipy
def rgb2gray(rgb):
r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
return gray
def fractal_dimension(Z, threshold=0.9):
# Only for 2d image
assert(len(Z.shape) == 2)
# From https://github.com/rougier/numpy-100 (#87)
def boxcount(Z, k):
S = np.add.reduceat(
np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
np.arange(0, Z.shape[1], k), axis=1)
# We count non-empty (0) and non-full boxes (k*k)
return len(np.where((S > 0) & (S < k*k))[0])
# Transform Z into a binary array
Z = (Z < threshold)
# Minimal dimension of image
p = min(Z.shape)
# Greatest power of 2 less than or equal to p
n = 2**np.floor(np.log(p)/np.log(2))
# Extract the exponent
n = int(np.log(n)/np.log(2))
# Build successive box sizes (from 2**n down to 2**1)
sizes = 2**np.arange(n, 1, -1)
# Actual box counting with decreasing size
counts = []
for size in sizes:
counts.append(boxcount(Z, size))
# Fit the successive log(sizes) with log (counts)
coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
return -coeffs[0]
I = rgb2gray(scipy.misc.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value.
The results it gives me point in the opposite direction to what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension of the natural image to be larger than that of the urban one.
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?
With the fractal dimension of something physical, the dimension might converge at different stages to different values. For example, a very thin line (but of finite width) would initially seem one-dimensional, then eventually two-dimensional as its width becomes comparable to the size of the boxes used.
Let's see the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is heading towards a value of two.
To diagnose this, let's take a look at the grey-scale images produced with the threshold that you used (that is, 0.9):
The nature picture has almost become an ink blob. Its dimension would go to a value of 2 very soon, as the graphs told us, because we have pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for the urban and nature pictures respectively. Keep in mind that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In the future, good threshold values would be ones closer to the mean of the grey-scale image; that way your image does not turn into a blob of ink!
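For example, reusing the question's functions (just a hypothetical call, taking the threshold from the image itself rather than hard-coding it):
I = rgb2gray(scipy.misc.imread("24.jpg"))
print("Dimension with mean-based threshold:", fractal_dimension(I, threshold=I.mean()))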
A good text book on this is "Fractals everywhere" by Michael F. Barnsley.
I've interpolated a spline to fit pixel data from an image with a curve that I would like to straighten. I'm not sure what tools are appropriate to solve this problem. Can someone recommend an approach?
Here's how I'm getting my spline:
import numpy as np
from skimage import io
from scipy import interpolate
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import networkx as nx
# Read a skeletonized image, return an array of points on the skeleton, and divide them into x and y coordinates
skeleton = io.imread('skeleton.png')
curvepoints = np.where(skeleton==False)
xpoints = curvepoints[1]
ypoints = -curvepoints[0]
# reformats x and y coordinates into a 2-dimensional array
inputarray = np.c_[xpoints, ypoints]
# runs a nearest neighbors algorithm on the coordinate array
clf = NearestNeighbors(n_neighbors=2).fit(inputarray)
G = clf.kneighbors_graph()
T = nx.from_scipy_sparse_matrix(G)
# sorts coordinates according to their nearest neighbors order
order = list(nx.dfs_preorder_nodes(T, 0))
xx = xpoints[order]
yy = ypoints[order]
# Loops over all points in the coordinate array as origin, determining which results in the shortest path
paths = [list(nx.dfs_preorder_nodes(T, i)) for i in range(len(inputarray))]
mindist = np.inf
minidx = 0
for i in range(len(inputarray)):
p = paths[i] # order of nodes
ordered = inputarray[p] # ordered nodes
# find cost of that order by the sum of euclidean distances between points (i) and (i+1)
cost = (((ordered[:-1] - ordered[1:])**2).sum(1)).sum()
if cost < mindist:
mindist = cost
minidx = i
opt_order = paths[minidx]
xxx = xpoints[opt_order]
yyy = ypoints[opt_order]
# fits a spline to the ordered coordinates
tckp, u = interpolate.splprep([xxx, yyy], s=3, k=2, nest=-1)
xpointsnew, ypointsnew = interpolate.splev(np.linspace(0,1,270), tckp)
# prints spline variables
print(tckp)
# plots the spline
plt.plot(xpointsnew, ypointsnew, 'r-')
plt.show()
My broader project is to follow the approach outlined in A novel method for straightening curved text-lines in stylistic documents. That article is reasonably detailed in finding the line that describes curved text, but much less so where straightening the curve is concerned. I have trouble visualizing it; the only reference to straightening that I see is in the abstract:
find the angle between the normal at a point on the curve and the vertical line, and finally visit each point on the text and rotate by their corresponding angles.
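A small sketch of how those angles could be obtained from the spline fitted above (tckp comes from the question's splprep call; the number of sample points is arbitrary):
u_fine = np.linspace(0, 1, 270)
dxdu, dydu = interpolate.splev(u_fine, tckp, der=1)   # first derivative of the spline
angles = np.arctan2(dydu, dxdu)                       # tangent angle at each sampled point
# the normal deviates from the vertical by the same angle as the tangent deviates
# from the horizontal, so `angles` gives the per-point rotation described above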
I also found Geometric warp of image in python, which seems promising. If I could rectify the spline, I think that would allow me to set a range of target points for the affine transform to map to. Unfortunately, I haven't found an approach to rectify my spline and test it.
Finally, this program implements an algorithm to straighten splines, but the paper on the algorithm is behind a pay wall and I can't make sense of the javascript.
Basically, I'm lost and in need of pointers.
Update
The affine transformation was the only approach I had any idea how to start exploring, so I've been working on that since I posted. I generated a set of destination coordinates by performing an approximate rectification of the curve based on the euclidean distance between points on my b-spline.
From where the last code block left off:
from scipy import spatial
from skimage.transform import PiecewiseAffineTransform

# calculate euclidean distances between adjacent points on the curve
newcoordinates = np.c_[xpointsnew, ypointsnew]
l = len(newcoordinates) - 1
pointsteps = []
for index, obj in enumerate(newcoordinates):
if index < l:
ord1 = np.c_[newcoordinates[index][0], newcoordinates[index][1]]
ord2 = np.c_[newcoordinates[index + 1][0], newcoordinates[index + 1][1]]
length = spatial.distance.cdist(ord1, ord2)
pointsteps.append(length)
# calculate euclidian distance between first point and each consecutive point
xpositions = np.asarray(pointsteps).cumsum()
# compose target coordinates for the line after the transform
targetcoordinates = [(0,0),]
for element in xpositions:
targetcoordinates.append((element, 0))
# perform affine transformation with newcoordinates as control points and targetcoordinates as target coordinates
tform = PiecewiseAffineTransform()
tform.estimate(newcoordinates, targetcoordinates)
I'm presently hung up on errors with the affine transform (scipy.spatial.qhull.QhullError: QH6154 Qhull precision error: Initial simplex is flat (facet 1 is coplanar with the interior point)), but I'm not sure whether it's because of a problem with how I'm feeding the data in, or because I'm abusing the transform to do my projection.
I got the same error as you when using scipy.spatial.ConvexHull.
First, let me explain my project: what I wanted to do is to segment a person from the background (image matting). In my code, I first read an image and a trimap; then, according to the trimap, I segment the original image into foreground, background and unknown pixels. Here is part of the code:
img = scipy.misc.imread('sweater_black.png') #color_image
trimap = scipy.misc.imread('sw_trimap.png', flatten='True') #trimap
bg = trimap == 0 #background
fg = trimap == 255 #foreground
unknown = True ^ np.logical_or(fg,bg) #unknown pixels
fg_px = img[fg] #here i got the rgb value of the foreground pixels,then send them to the ConvexHull
fg_hull = scipy.spatial.ConvexHull(fg_px)
But I got an error here. So I checked the array fg_px and found that it is n*4, which means every element I send to ConvexHull has four values. However, the input to ConvexHull should be 3-dimensional.
I traced the error and found that the input color image was 32-bit (RGB channels plus an alpha channel). After converting the image to 24-bit (RGB channels only), the code works.
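A minimal way to do that conversion in code instead of re-saving the image (assuming img comes from the imread call above):
import scipy.spatial

if img.shape[-1] == 4:          # RGBA image: drop the alpha channel
    img = img[:, :, :3]
fg_px = img[fg]                 # now n*3, which ConvexHull accepts
fg_hull = scipy.spatial.ConvexHull(fg_px)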
In one sentence: the input to ConvexHull should not be n*4, so check your input data! Hope this works for you~
I am trying to develop a fast algorithm in python for finding peaks in an image and then finding the centroid of those peaks. I have written the following code using scipy.ndimage.label and ndimage.find_objects for locating the objects. This seems to be the bottleneck in the code: it takes about 7 ms to locate 20 objects in a 500x500 image. I would like to scale this up to larger (2000x2000) images, but then the time increases to almost 100 ms. So, I'm wondering if there is a faster option.
Here is the code that I have so far, which works, but is slow. First I simulate my data using some gaussian peaks. This part is slow, but in practice I will be using real data, so I don't care too much about speeding that part up. I would like to be able to find the peaks very quickly.
import time
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
import matplotlib.patches
plt.figure(figsize=(10,10))
ax1 = plt.subplot(221)
ax2 = plt.subplot(222)
ax3 = plt.subplot(223)
ax4 = plt.subplot(224)
size = 500 #width and height of image in pixels
peak_height = 100 # define the height of the peaks
num_peaks = 20
noise_level = 50
threshold = 60
np.random.seed(3)
#set up a simple, blank image (Z)
x = np.linspace(0,size,size)
y = np.linspace(0,size,size)
X,Y = np.meshgrid(x,y)
Z = X*0
#now add some peaks
def gaussian(X,Y,xo,yo,amp=100,sigmax=4,sigmay=4):
return amp*np.exp(-(X-xo)**2/(2*sigmax**2) - (Y-yo)**2/(2*sigmay**2))
for xo,yo in size*np.random.rand(num_peaks,2):
widthx = 5 + np.random.randn(1)
widthy = 5 + np.random.randn(1)
Z += gaussian(X,Y,xo,yo,amp=peak_height,sigmax=widthx,sigmay=widthy)
#of course, add some noise:
Z = Z + scipy.ndimage.gaussian_filter(0.5*noise_level*np.random.rand(size,size),sigma=5)
Z = Z + scipy.ndimage.gaussian_filter(0.5*noise_level*np.random.rand(size,size),sigma=1)
t = time.time() #Start timing the peak-finding algorithm
#Set everything below the threshold to zero:
Z_thresh = np.copy(Z)
Z_thresh[Z_thresh<threshold] = 0
print('Time after thresholding: %.5f seconds' % (time.time()-t))
#now find the objects
labeled_image, number_of_objects = scipy.ndimage.label(Z_thresh)
print('Time after labeling: %.5f seconds' % (time.time()-t))
peak_slices = scipy.ndimage.find_objects(labeled_image)
print('Time after finding objects: %.5f seconds' % (time.time()-t))
def centroid(data):
h,w = np.shape(data)
x = np.arange(0,w)
y = np.arange(0,h)
X,Y = np.meshgrid(x,y)
cx = np.sum(X*data)/np.sum(data)
cy = np.sum(Y*data)/np.sum(data)
return cx,cy
centroids = []
for peak_slice in peak_slices:
dy,dx = peak_slice
x,y = dx.start, dy.start
cx,cy = centroid(Z_thresh[peak_slice])
centroids.append((x+cx,y+cy))
print('Total time: %.5f seconds\n' % (time.time()-t))
###########################################
#Now make the plots:
for ax in (ax1,ax2,ax3,ax4): ax.clear()
ax1.set_title('Original image')
ax1.imshow(Z,origin='lower')
ax2.set_title('Thresholded image')
ax2.imshow(Z_thresh,origin='lower')
ax3.set_title('Labeled image')
ax3.imshow(labeled_image,origin='lower') #display the color-coded regions
for peak_slice in peak_slices: #Draw some rectangles around the objects
dy,dx = peak_slice
xy = (dx.start, dy.start)
width = (dx.stop - dx.start + 1)
height = (dy.stop - dy.start + 1)
rect = matplotlib.patches.Rectangle(xy,width,height,fc='none',ec='red')
ax3.add_patch(rect,)
ax4.set_title('Centroids on original image')
ax4.imshow(Z,origin='lower')
for x,y in centroids:
ax4.plot(x,y,'kx',ms=10)
ax4.set_xlim(0,size)
ax4.set_ylim(0,size)
plt.tight_layout()
plt.show()
The results for size=500:
EDIT: If the number of peaks is large (~100) and the size of the image is small, then the bottleneck is actually the centroiding part. So, perhaps the speed of this part also needs to be optimized.
Your method for finding the peaks (simple thresholding) is of course very sensitive to the choice of threshold: set it too low and you'll "detect" things that are not peaks; set it too high and you'll miss valid peaks.
There are more robust alternatives that will detect all the local maxima in the image intensity regardless of their intensity value. My preferred one is applying a dilation with a small (5x5 or 7x7) structuring element, then finding the pixels where the original image and its dilated version have the same value. This works because, by definition, dilation(x, y, E, img) = { max of img within E centered at pixel (x,y) }, and therefore dilation(x, y, E, img) = img(x, y) whenever (x,y) is the location of a local maximum at the scale of E.
With a fast implementation of the morphological operators (e.g. the one in OpenCV) this algorithm is linear in the size of the image in both space and time (one extra image-sized buffer for the dilated image, and one pass over both). In a pinch, it can also be implemented on-line without the extra buffer and with a little extra complexity, and it's still linear time.
To further robustify it in the presence of salt-and-pepper or similar noise, which may introduce many false maxima, you can apply the method twice, with structuring elements of different size (say, 5x5 and 7x7), then retain only the stable maxima, where stability can be defined by unchanging position of the maxima, or by position not changing by more than one pixel, etc. Additionally, you may want to suppress low nearby maxima when you have reason to believe they are due to noise. An efficient way to do this is to first detect all the local maxima as above, sort them descending by height, then go down the sorted list and keep them if their value in the image has not changed and, if they are kept, set to zero all the pixels in a (2d+1) x (2d+1) neighborhood of them, where d is the min distance between nearby maxima that you are willing to tolerate.
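A small sketch of the dilation-based detection using SciPy's maximum_filter, which performs exactly this flat-structuring-element dilation; Z and threshold are the arrays/values from the question, and the intensity floor is optional:
import numpy as np
import scipy.ndimage

dilated = scipy.ndimage.maximum_filter(Z, size=5)   # 5x5 flat structuring element
local_max = (Z == dilated) & (Z > threshold)        # optional floor to discard the flat background
ys, xs = np.nonzero(local_max)                      # pixel coordinates of the detected maxima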
If you have many peaks, it is faster to use scipy.ndimage.center_of_mass. You can replace your code, starting from the definition of peak_slices up to the printing of the total time, with the following two lines:
centroids = scipy.ndimage.center_of_mass(Z_thresh, labeled_image,
np.arange(1, number_of_objects + 1))
centroids = [(j, i) for i, j in centroids]
For num_peaks = 20 this runs about 3x slower than your approach, but for num_peaks = 100 it runs about 10x faster. So your best option will depend on your actual data.
Another approach is to avoid all the sum(), meshgrid() and similar calls, and replace everything with straight linear algebra.
def centroid2(data):
h,w=data.shape
x=np.arange(h)
y=np.arange(w)
x1=np.ones((1,h))
y1=np.ones((w,1))
return ((np.dot(np.dot(x1, data), y))/(np.dot(np.dot(x1, data), y1)),
(np.dot(np.dot(x, data), y1))/(np.dot(np.dot(x1, data), y1)))
#be careful, it returns two arrays
This can be extended to higher dimensions as well. It gives about a 60% speedup compared to centroid().
The following centroid calculation is faster than both, especially for large data:
def centroidnp(data):
h,w = data.shape
x = np.arange(w)
y = np.arange(h)
vx = data.sum(axis=0)
vx /= vx.sum()
vy = data.sum(axis=1)
vy /= vy.sum()
return np.dot(vx,x),np.dot(vy,y)