I would like to find the minimum distance of each voxel to a boundary element in a binary image in which the z voxel size differs from the xy voxel size. That is to say, a single voxel represents a 225x110x110 (zyx) nm volume.
Normally, I would do something with scipy.ndimage.morphology.distance_transform_edt (https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.ndimage.morphology.distance_transform_edt.html), but this assumes isotropic voxel sizes:
dtrans_stack = np.zeros_like(segm_stack)  # empty array to add to

### iterate over the t dimension and get distance transform
for t_iter in range(dtrans_stack.shape[0]):
    segm_ = segm_stack[t_iter, ...]  # segmented image at a single t
    neg_segm = np.ones_like(segm_) - segm_  # negative of the segmented image
    # get a distance transform with isotropic voxel sizes
    dtrans_stack_iso = distance_transform_edt(segm_)
    dtrans_neg_stack_iso = -distance_transform_edt(neg_segm)  # make distance in the segmented image negative
    dtrans_stack[t_iter, ...] = dtrans_stack_iso + dtrans_neg_stack_iso
I can do this by brute force using scipy.spatial.distance.cdist (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html), but this takes ages and I'd rather avoid it if I can:
vox_multiplier = np.array([z_voxelsize, xy_voxelsize, xy_voxelsize])  # array of voxel sizes

## get a subset of coordinates so I'm not wasting time in empty space
disk_size = 5  # size of disk for binary dilation
mip_tz = np.max(np.max(decon_stack, axis=1), axis=0)
thresh_li = threshold_li(mip_tz)  # from skimage.filters
mip_mask = mip_tz >= thresh_li
mip_mask = remove_small_objects(mip_mask)  # from skimage.morphology
mip_dilated = binary_dilation(mip_mask, disk(disk_size))  # from skimage.morphology
# get the coordinates of the mask
coords = np.argwhere(mip_dilated == 1)
ycoords = coords[:, 0]
xcoords = coords[:, 1]
# get the lower and upper bounds of the xyz coordinates
ylb = np.min(ycoords)
yub = np.max(ycoords)
xlb = np.min(xcoords)
xub = np.max(xcoords)
zlb = 0
zub = zdims - 1
# make zero arrays of the proper size
dtrans_stack = np.zeros_like(segm_stack)
dtrans_stack_neg = np.zeros_like(segm_stack)  # this will be the distance transform into the low-intensity area

for t_iter in range(dtrans_stack.shape[0]):
    segm_ = segm_stack[t_iter, ...]
    neg_segm_ = np.ones_like(segm_) - segm_  # negative of the segmented image
    # get the coordinates of the segmented image and convert to nm
    segm_coords = np.argwhere(segm_ == 1)
    segm_coords_nm = vox_multiplier * segm_coords
    neg_segm_coords = np.argwhere(neg_segm_ == 1)
    neg_segm_coords_nm = vox_multiplier * neg_segm_coords
    # make empty arrays for the xy and z distance transforms
    dtrans_stack_x = np.zeros_like(segm_)
    dtrans_stack_y = np.zeros_like(segm_)
    dtrans_stack_z = np.zeros_like(segm_)
    dtrans_stack_neg_x = np.zeros_like(segm_)
    dtrans_stack_neg_y = np.zeros_like(segm_)
    dtrans_stack_neg_z = np.zeros_like(segm_)
    # iterate over zyx and determine the minimum distance in nm from the segmented image
    for z_iter in range(zlb, zub):
        for y_iter in range(ylb, yub):
            for x_iter in range(xlb, xub):
                coord_nm = vox_multiplier * np.array([z_iter, y_iter, x_iter])  # change coords from pixel to nm
                coord_nm = coord_nm.reshape(1, 3)  # reshape for the distance calculation
                dists_segm = distance.cdist(coord_nm, segm_coords_nm)  # distances to the segmented image
                dists_neg_segm = distance.cdist(coord_nm, neg_segm_coords_nm)  # distances to the negative segmented image
                dtrans_stack[t_iter, z_iter, y_iter, x_iter] = np.min(dists_segm)  # add minimum distance to the distance transform stack
                dtrans_stack_neg[t_iter, z_iter, y_iter, x_iter] = np.min(dists_neg_segm)
Here is an image of a single z-slice of the segmented image, if that helps clear things up:
single z-slice of segmented image
Normally, I would do something with scipy.ndimage.morphology.distance_transform_edt but this assumes isotropic voxel sizes:
It does no such thing! You are looking for the sampling= parameter. From the latest version of the docs:
Spacing of elements along each dimension. If a sequence, must be of length equal to the input rank; if a single number, this is used for all axes. If not specified, a grid spacing of unity is implied.
The wording "sampling" or "spacing" is probably a bit mysterious if you think of pixels as little squares/cubes, and that is probably why you missed it. In most situations, it is better to think of pixels as point samples on a grid, with fixed spacing between samples. I recommend Alvy Ray's a pixel is not a little square for a better understanding of this terminology.
I have a mosaic tif file (gdalinfo below) that I made (with some additional info on the tiles here), and I have looked extensively for a function that simply returns the elevation (the z value of this mosaic) for a given lat/long. The functions I've seen want me to input the coordinates in the coordinate system of the mosaic, but I want to use lat/long. Is there something about GetGeoTransform() that I'm missing to achieve this?
This example, for instance (shown below):
from osgeo import gdal
import affine
import numpy as np

def retrieve_pixel_value(geo_coord, data_source):
    """Return floating-point value that corresponds to given point."""
    x, y = geo_coord[0], geo_coord[1]
    forward_transform = \
        affine.Affine.from_gdal(*data_source.GetGeoTransform())
    reverse_transform = ~forward_transform
    px, py = reverse_transform * (x, y)
    px, py = int(px + 0.5), int(py + 0.5)
    pixel_coord = px, py
    data_array = np.array(data_source.GetRasterBand(1).ReadAsArray())
    return data_array[pixel_coord[0]][pixel_coord[1]]
This gives me an out-of-bounds error, as it's likely expecting x/y coordinates (e.g. retrieve_pixel_value([153.023499, -27.468968], dataset)). I've also tried the following from here:
import rasterio

dat = rasterio.open(fname)
z = dat.read()[0]

def getval(lon, lat):
    idx = dat.index(lon, lat, precision=1E-6)
    return dat.xy(*idx), z[idx]
Is there a simple adjustment I can make so my function can query the mosaic in lat/long coords?
Much appreciated.
Driver: GTiff/GeoTIFF
Files: mosaic.tif
Size is 25000, 29460
Coordinate System is:
PROJCRS["GDA94 / MGA zone 56",
BASEGEOGCRS["GDA94",
DATUM["Geocentric Datum of Australia 1994",
ELLIPSOID["GRS 1980",6378137,298.257222101004,
LENGTHUNIT["metre",1]],
ID["EPSG",6283]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
CONVERSION["UTM zone 56S",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",153,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",10000000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]],
ID["EPSG",17056]],
CS[Cartesian,2],
AXIS["easting",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["northing",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (491000.000000000000000,6977000.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 491000.000, 6977000.000) (152d54'32.48"E, 27d19'48.33"S)
Lower Left ( 491000.000, 6947540.000) (152d54'31.69"E, 27d35'45.80"S)
Upper Right ( 516000.000, 6977000.000) (153d 9'42.27"E, 27d19'48.10"S)
Lower Right ( 516000.000, 6947540.000) (153d 9'43.66"E, 27d35'45.57"S)
Center ( 503500.000, 6962270.000) (153d 2' 7.52"E, 27d27'47.16"S)
Band 1 Block=25000x1 Type=Float32, ColorInterp=Gray
NoData Value=-999
Update 1 - I tried the following:
tif = r"mosaic.tif"
dataset = rio.open(tif)
d = dataset.read()[0]
def get_xy_coords(latlng):
transformer = Transformer.from_crs("epsg:4326", dataset.crs)
coords = [transformer.transform(x, y) for x,y in latlng][0]
#idx = dataset.index(coords[1], coords[0])
return coords #.xy(*idx), z[idx]
longx,laty = 153.023499,-27.468968
coords = get_elevation([(laty,longx)])
print(coords[0],coords[1])
print(dataset.width,dataset.height)
(502321.11181384244, 6961618.891167777)
25000 29460
So something is still not right. Maybe I need to subtract the coordinates of the bottom left of the image, e.g.
coords[0]-dataset.bounds.left,coords[1]-dataset.bounds.bottom
where
In [78]: dataset.bounds
Out[78]: BoundingBox(left=491000.0, bottom=6947540.0, right=516000.0, top=6977000.0)
Update 2 - Indeed, subtracting the corners of my box seems to get closer... though I'm sure there is a much nicer way of getting what I want just using the tif metadata.
longx,laty = 152.94646, -27.463175
coords = get_xy_coords([(laty,longx)])
elevation = d[int(coords[1]-dataset.bounds.bottom),int(coords[0]-dataset.bounds.left)]
fig,ax = plt.subplots(figsize=(12,12))
ax.imshow(d,vmin=0,vmax=400,cmap='terrain',extent=[dataset.bounds.left,dataset.bounds.right,dataset.bounds.bottom,dataset.bounds.top])
ax.plot(coords[0],coords[1],'ko')
plt.show()
You basically have two distinct steps:

1. Convert lon/lat coordinates to map coordinates; this is only necessary if your input raster is not already in lon/lat. Map coordinates are the coordinates in the projection that the raster itself uses.
2. Convert the map coordinates to pixel coordinates.

There are all kinds of tools you might use, perhaps to make things simpler (like pyproj, rasterio etc.). But for such a simple case it's probably nice to start with doing it all in GDAL; that probably also enhances your understanding of what steps are needed.
Inputs
from osgeo import gdal, osr
raster_file = r'D:\somefile.tif'
lon = 153.023499
lat = -27.468968
lon/lat to map coordinates
# fetch metadata required for transformation
ds = gdal.OpenEx(raster_file)
raster_proj = ds.GetProjection()
gt = ds.GetGeoTransform()
ds = None # close file, could also keep it open till after reading
# coordinate transformation (lon/lat to map)
# define source projection
# this definition ensures the order is always lon/lat compared
# to EPSG:4326 for which it depends on the GDAL version (2 vs 3)
source_srs = osr.SpatialReference()
source_srs.ImportFromWkt(osr.GetUserInputAsWKT("urn:ogc:def:crs:OGC:1.3:CRS84"))
# define target projection based on the file
target_srs = osr.SpatialReference()
target_srs.ImportFromWkt(raster_proj)
# convert
ct = osr.CoordinateTransformation(source_srs, target_srs)
mapx, mapy, *_ = ct.TransformPoint(lon, lat)
You could verify this intermediate result by, for example, adding it as Point WKT in something like QGIS (using the QuickWKT plugin, and making sure the viewer has the same projection as the raster).
map coordinates to pixel
# apply affine transformation to get pixel coordinates
gt_inv = gdal.InvGeoTransform(gt) # invert for map -> pixel
px, py = gdal.ApplyGeoTransform(gt_inv, mapx, mapy)
# it will return fractional pixel coordinates, so convert to int
# before using them to read. Round to nearest with +0.5
py = int(py + 0.5)
px = int(px + 0.5)
# read pixel data
ds = gdal.OpenEx(raster_file) # open file again
elevation_value = ds.ReadAsArray(px, py, 1, 1)
ds = None
The elevation_value variable should be the value you're after. I would definitely verify the result independently; try a few points in QGIS or the gdallocationinfo utility:
gdallocationinfo -l_srs "urn:ogc:def:crs:OGC:1.3:CRS84" filename.tif 153.023499 -27.468968
# Report:
# Location: (4228P,4840L)
# Band 1:
# Value: 1804.51879882812
If you're reading a lot of points, there will be some threshold at which it would be faster to read a large chunk and extract the values from that array, compared to reading every point individually.
edit:
For applying the same workflow to multiple points at once, a few things change.
So for example having the inputs:
lats = np.array([-27.468968, -27.468968, -27.468968])
lons = np.array([153.023499, 153.023499, 153.023499])
The coordinate transformation needs to use ct.TransformPoints instead of ct.TransformPoint, which also requires the coordinates to be stacked in a single array of shape [n_points, 2]:
coords = np.stack([lons.ravel(), lats.ravel()], axis=1)
mapx, mapy, *_ = np.asarray(ct.TransformPoints(coords)).T
# reshape in case of non-1D inputs
mapx = mapx.reshape(lons.shape)
mapy = mapy.reshape(lons.shape)
Converting from map to pixel coordinates changes because the GDAL method for this only takes a single point. Doing it manually on the arrays would be:
px = gt_inv[0] + mapx * gt_inv[1] + mapy * gt_inv[2]
py = gt_inv[3] + mapx * gt_inv[4] + mapy * gt_inv[5]
And rounding the arrays to integer changes to:
px = (px + 0.5).astype(np.int32)
py = (py + 0.5).astype(np.int32)
If the raster (easily) fits in memory, reading all points would become:
ds = gdal.OpenEx(raster_file)
all_elevation_data = ds.ReadAsArray()
ds = None
elevation_values = all_elevation_data[py, px]
That last step could be optimized by checking highest/lowest pixel coordinates in both dimensions and only read that subset for example, but it would require normalizing the coordinates again to be valid for that subset.
The py and px arrays might also need to be clipped (eg np.clip) if the input coordinates fall outside the raster. In that case the pixel coordinates will be < 0 or >= xsize/ysize.
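A small sketch of that guard, continuing from the snippets above (reading the raster size from the file; note that clipping pins out-of-range points to the nearest edge pixel, so you may instead want to mask them out):

ds = gdal.OpenEx(raster_file)
xsize, ysize = ds.RasterXSize, ds.RasterYSize
ds = None

# keep pixel indices inside the raster before using them to index the array
px = np.clip(px, 0, xsize - 1)
py = np.clip(py, 0, ysize - 1)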
I have the following code, which looks for feature points in a binary skeletonized image. I need to find end points, branch points and intersection points separately and display their coordinates as (x, y, point type), for example (147, 45, 3), where 3 is the number of adjacent pixels (a branch point).
import cv2 as cv
import numpy as np

def extraction(img):
    # Find row and column locations that are non-zero
    (rows, cols) = np.nonzero(img)
    # Initialize empty list of co-ordinates
    skel_coords = []
    # For each non-zero pixel
    for (r, c) in zip(rows, cols):
        # Extract an 8-connected neighbourhood
        (col_neigh, row_neigh) = np.meshgrid(np.array([c - 1, c, c + 1]), np.array([r - 1, r, r + 1]))
        # Cast to int to index into image
        col_neigh = col_neigh.astype('int')
        row_neigh = row_neigh.astype('int')
        # Convert into a single 1D array and check for non-zero locations
        pix_neighbourhood = img[row_neigh, col_neigh].ravel() != 0
        # If the number of non-zero locations matches a feature pattern, add this to our list of co-ordinates
        if np.sum(pix_neighbourhood) == 2:
            skel_coords.append((c, r, 1))
        elif np.sum(pix_neighbourhood) == 4:
            skel_coords.append((c, r, 3))
        elif np.sum(pix_neighbourhood) == 5:
            skel_coords.append((c, r, 4))
    return skel_coords

img = cv.imread('abc.png', 0)
coord = extraction(img)
for element in coord:
    print(element)
The code correctly finds the number of neighbouring pixels, but these are not branch and intersection points. You can see this in the picture below (the found point is marked in gray):
An enlarged image of a 3x3 pixel matrix (below, two white pixels are in a row):
I need to find points of the following kind for branch points (so that neighboring pixels alternate):
Does anyone have any ideas how to implement this? I would be very grateful for your help!
I have done a lot of searching but have yet to find an answer. I am currently working on some data of a crop field. I have PLY files for multiple fields which I have successfully read into, filtered, and visualised using Python and VTK. My main goal is to eventually segment and run analysis on individual crop plots.
However, to make that task easier I first want to "normalize" my point cloud so that all plots are essentially "on the same level". From the image I have attached you can see that the point cloud slopes from one corner to its opposite. So I want to flatten out the image so that the ground points all lie on the same plane/level, with the rest of the points adjusted accordingly.
Point Cloud
I've also included my code to show how I got to this point. If anyone has any advice on how I can achieve this normalising to one plane, I would be very appreciative. Sadly I cannot include my data as it is work related.
Thanks.
Josh
import vtk
from vtk.util import numpy_support
import numpy as np
filename = 'File.ply'
# Reader
r = vtk.vtkPLYReader()
r.SetFileName(filename)
r.Update()  # update the reader so GetOutput() below returns valid data/bounds
# Filters
vgf = vtk.vtkVertexGlyphFilter()
vgf.SetInputConnection(r.GetOutputPort())
# Elevation
pc = r.GetOutput()
bounds = pc.GetBounds()
#print(bounds)
minz = bounds[4]
maxz = bounds[5]
#print(bounds[4], bounds[5])
evgf = vtk.vtkElevationFilter()
evgf.SetInputConnection(vgf.GetOutputPort())
evgf.SetLowPoint(0, 0, minz)
evgf.SetHighPoint(0, 0, maxz)
#pc.GetNumberOfPoints()
# Look up table
lut = vtk.vtkLookupTable()
lut.SetHueRange(0.667, 0)
lut.SetSaturationRange(1, 1)
lut.SetValueRange(1, 1)
lut.Build()
# Renderer
mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(evgf.GetOutputPort())
mapper.SetLookupTable(lut)
actor = vtk.vtkActor()
actor.SetMapper(mapper)
renderer = vtk.vtkRenderer()
renWin = vtk.vtkRenderWindow()
renWin.AddRenderer(renderer)
iren = vtk.vtkRenderWindowInteractor()
iren.SetRenderWindow(renWin)
renderer.AddActor(actor)
renderer.SetBackground(0, 0, 0)
renWin.Render()
iren.Start()
I once solved a similar problem. Find below some code that I used back then. It uses two functions, fitPlane and findTransformFromVectors, that you could replace with your own implementations.
Note that there are many ways to fit a plane through a set of points. This SO post compares scipy.optimize.minimize with scipy.linalg.lstsq. In another SO post, the use of PCA or RANSAC and other methods is suggested. You probably want to use methods provided by sklearn, numpy or other modules. My solution simply (and non-robustly) computes an ordinary least squares regression.
import vtk
import numpy as np
# Convert vtk to numpy arrays
from vtk.util.numpy_support import vtk_to_numpy as vtk2np
# Create a random point cloud.
center = [3.0, 2.0, 1.0]
source = vtk.vtkPointSource()
source.SetCenter(center)
source.SetNumberOfPoints(50)
source.SetRadius(1.)
source.Update()
source = source.GetOutput()
# Extract the points from the point cloud.
points = vtk2np(source.GetPoints().GetData())
points = points.transpose()
# Fit a plane. nRegression contains the normal vector of the
# regression surface.
nRegression = fitPlane(points)
# Compute a transform that maps the source center to the origin and
# plane normal to the z-axis.
trafo = findTransformFromVectors(originFrom=center,
                                 axisFrom=nRegression.transpose(),
                                 originTo=(0, 0, 0),
                                 axisTo=(0., 0., 1.))
# Apply transform to source.
sourceTransformed = vtk.vtkTransformFilter()
sourceTransformed.SetInputData(source)
sourceTransformed.SetTransform(trafo)
sourceTransformed.Update()
# Visualize output...
Here are my implementations of fitPlane and findTransformFromVectors:
# The following code has been written by normanius under the CC BY-SA 4.0
# license.
# License: https://creativecommons.org/licenses/by-sa/4.0/
# Author: normanius: https://stackoverflow.com/users/3388962/normanius
# Date: October 2018
# Reference: https://stackoverflow.com/questions/52716438
def fitPlane(X, tolerance=1e-10):
    '''
    Estimate the plane normal by means of ordinary least squares.
    Requirement: points X span the full column rank. If the points lie in a
    perfect plane, the regression problem is ill-conditioned!

    Formulas:
        a = (XX^T)^(-1)*X*z
    Surface normal:
        n = [a[0], a[1], -1]
        n = n/norm(n)
    Plane intercept:
        c = a[2]/norm(n)

    NOTE: The condition number for the pseudo-inverse improves if the
    formulation is changed to homogenous notation.

    Formulas (homogenous):
        a = (XX^T)^(-1)*[1,1,1]^T
        n = a[:-1]
        n = n/norm(n)
        c = a[-1]/norm(n)

    Arguments:
        X:          A matrix with 3 rows and n columns (one point per column)
        tolerance:  Minimal condition number accepted. If the condition
                    number is lower, the algorithm returns None.

    Returns:
        If the computation was successful, a numpy array of length three is
        returned that represents the estimated plane normal. On failure,
        None is returned.
    '''
    X = np.asarray(X)
    d, N = X.shape
    X = np.vstack([X, np.ones([1, N])])
    z = np.ones([d + 1, 1])
    XXT = np.dot(X, np.transpose(X))  # XXT = X*X^T
    if np.linalg.det(XXT) < tolerance:
        # The test covers the case where n < 3
        return None
    n = np.dot(np.linalg.inv(XXT), z)
    intercept = n[-1]
    n = n[:-1]
    scale = np.linalg.norm(n)
    n /= scale
    intercept /= scale
    return n
def findTransformFromVectors(originFrom=None, axisFrom=None,
                             originTo=None, axisTo=None,
                             origin=None,
                             scale=1):
    '''
    Compute a transformation that maps originFrom and axisFrom to originTo
    and axisTo respectively. If scale is set to 'auto', the scale will be
    determined such that the axes will also match in length:
        scale = norm(axisTo)/norm(axisFrom)

    Arguments:  originFrom: sequence with 3 elements, or None
                axisFrom:   sequence with 3 elements, or None
                originTo:   sequence with 3 elements, or None
                axisTo:     sequence with 3 elements, or None
                origin:     sequence with 3 elements, or None,
                            overrides originFrom and originTo if set
                scale:      - scalar (isotropic scaling)
                            - sequence with 3 elements (anisotropic scaling),
                            - 'auto' (sets scale such that input axes match
                              in length after transforming axisFrom)
                            - None (no scaling)

    Align two axes alone, assuming that we sit on (0,0,0):
        findTransformFromVectors(axisFrom=a0, axisTo=a1)
    Align two axes in one point (all calls are equivalent):
        findTransformFromVectors(origin=o, axisFrom=a0, axisTo=a1)
        findTransformFromVectors(originFrom=o, axisFrom=a0, axisTo=a1)
        findTransformFromVectors(axisFrom=a0, originTo=o, axisTo=a1)
    Move between two points:
        findTransformFromVectors(origin=o0, originTo=o1)
    Move from one position to the other and align axes:
        findTransformFromVectors(origin=o0, axisFrom=a0, originTo=o1, axisTo=a1)
    '''
    # Prelude with trickle-down logic.
    # Infer the origins if the information is not set.
    if origin is not None:
        # Check for ambiguous input.
        assert(originFrom is None and originTo is None)
        originFrom = origin
        originTo = origin
    if originFrom is None:
        originFrom = originTo
    if originTo is None:
        originTo = originFrom
    if originTo is None:
        # We arrive here only if no origin information was set.
        originTo = [0., 0., 0.]
        originFrom = [0., 0., 0.]
    originFrom = np.asarray(originFrom)
    originTo = np.asarray(originTo)
    # Check if any rotation will be involved.
    # (Guard against None axes before taking norms.)
    if axisFrom is None or axisTo is None:
        axisFromL2 = axisToL2 = 0.
        rotate = False
    else:
        axisFrom = np.asarray(axisFrom)
        axisTo = np.asarray(axisTo)
        axisFromL2 = np.linalg.norm(axisFrom)
        axisToL2 = np.linalg.norm(axisTo)
        rotate = (axisFromL2 != 0 and axisToL2 != 0
                  and not np.array_equal(axisFrom, axisTo))
    # Scale.
    if scale is None:
        scale = 1.
    if scale == 'auto':
        scale = axisToL2/axisFromL2 if axisFromL2 != 0. else 1.
    if np.isscalar(scale):
        scale = scale*np.ones(3)
    if rotate:
        rAxis = np.cross(axisFrom.ravel(), axisTo.ravel())  # rotation axis
        angle = np.dot(axisFrom, axisTo) / axisFromL2 / axisToL2
        angle = np.arccos(angle)
    # Here we finally compute the transform.
    trafo = vtk.vtkTransform()
    trafo.Translate(originTo)
    if rotate:
        trafo.RotateWXYZ(angle / np.pi * 180, rAxis[0], rAxis[1], rAxis[2])
    trafo.Scale(scale[0], scale[1], scale[2])
    trafo.Translate(-originFrom)
    return trafo
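As a side note on the plane fit: a minimal sketch of the np.linalg.lstsq alternative mentioned above, assuming points is the 3xN array from the example (fit_plane_lstsq is a hypothetical stand-in for fitPlane, and it fails for vertical planes):

import numpy as np

# minimal, non-robust ordinary least squares fit of z = a*x + b*y + c
def fit_plane_lstsq(points):
    x, y, z = points  # unpack the three coordinate rows
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    n = np.array([a, b, -1.0])
    return n / np.linalg.norm(n)  # unit normal of the fitted plane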
I have an image that is of size 50000x50000. It has around 25000 different connected components. I'm using ndimage.label to label each of them, then I find the non-zero points and finally get the min x, max x, min y and max y values. However, I have to find these coordinates for each of the 25000 connected components. This is expensive, as I have to run np.nonzero on the 50000x50000 image 25000 times. Here is a snippet of the code doing what I just mentioned.
im, _ = ndimage.label(im)
num_instances = np.max(im)
for instance_id in range(1, num_instances + 1):
    im_inst = im == instance_id
    points = np.nonzero(im_inst)  # running this is expensive as im is 50000x50000
    cropped_min_x_1 = np.min(points[0])
    cropped_min_y_1 = np.min(points[1])
    cropped_max_x_1 = np.max(points[0]) + 1
    cropped_max_y_1 = np.max(points[1]) + 1
Does anyone know what I can do to significantly speed up this process?
If the fraction of labelled pixels is not too large:
nz = np.flatnonzero(im)
order = np.argsort(im.ravel()[nz])
nz = nz[order]
blocks = np.searchsorted(im.ravel()[nz], np.arange(2, num_instances+1))
# or (which one is faster will depend on the numbers)
blocks = 1 + np.where(np.diff(im.ravel()[nz]))[0]
coords = np.array(np.unravel_index(nz, (50000, 50000)))
groups = np.split(coords, blocks, axis=-1)
groups will be a list of 2xn_i coordinates where n_i is the size of component i.
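For example, a hypothetical usage sketch recovering the question's bounding-box values for one component (labels start at 1, so component instance_id is groups[instance_id - 1]):

pts = groups[instance_id - 1]  # 2 x n_i array of coordinates
cropped_min_x, cropped_min_y = pts.min(axis=1)
cropped_max_x, cropped_max_y = pts.max(axis=1) + 1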
I am trying to convert some code from Matlab to Python, but I am unfamiliar with a considerable amount of the Matlab syntax and functionality. I have managed to do some of the conversion using the PIL and Numpy Python packages, but I was hoping someone would be able to explain what is going on with some elements of this code.
clear all;close all;clc;
% Set gray scale to 0 for color images. Will need more memory
GRAY_SCALE = 1
% The physical mask placed close to the sensor has 4 harmonics, therefore
% we will have 9 angular samples in the light field
nAngles = 9;
cAngles = (nAngles+1)/2;
% The fundamental frequency of the cosine in the mask in pixels
F1Y = 238; F1X = 191; %Cosine Frequency in Pixels from Calibration Image
F12X = floor(F1X/2);
F12Y = floor(F1Y/2);
%PhaseShift due to Mask In-Plane Translation wrt Sensor
phi1 = 300; phi2 = 150;
%read 2D image
disp('Reading Input Image...');
I = double(imread('InputCones.png'));
if(GRAY_SCALE)
    %take green channel only
    I = I(:,:,2);
end
%make image odd size
I = I(1:end,1:end-1,:);
%find size of image
[m,n,CH] = size(I);
%Compute Spectral Tile Centers, Peak Strengths and Phase
for i = 1:nAngles
    for j = 1:nAngles
        CentY(i,j) = (m+1)/2 + (i-cAngles)*F1Y;
        CentX(i,j) = (n+1)/2 + (j-cAngles)*F1X;
        %Mat(i,j) = exp(-sqrt(-1)*((phi1*pi/180)*(i-cAngles) + (phi2*pi/180)*(j-cAngles)));
    end
end

Mat = ones(nAngles,nAngles);
% 20 is because we cannot have negative values in the mask. So the strength
% of the DC component is 20 times that of the harmonics
Mat(cAngles,cAngles) = Mat(cAngles,cAngles) * 20;
% Beginning of 4D light field computation
% do for all color channels
for ch = 1:CH
    disp('=================================');
    disp(sprintf('Processing channel %d',ch));
    % Find FFT of image
    disp('Computing FFT of 2D image');
    f = fftshift(fft2(I(:,:,ch)));
    %If you want to visualize the FFT of the input 2D image (Figure 8 of
    %the paper), uncomment the next 2 lines
    % figure;imshow(log10(abs(f)),[]);colormap gray;
    % title('2D FFT of captured image (Figure 8 of paper). Note the spectral replicas');
    %Rearrange Tiles of 2D FFT into 4D Planes to obtain FFT of 4D Light-Field
    disp('Rearranging 2D FFT into 4D');
    for i = 1:nAngles
        for j = 1:nAngles
            FFT_LF(:,:,i,j) = f( CentY(i,j)-F12Y:CentY(i,j)+F12Y, CentX(i,j)-F12X:CentX(i,j)+F12X)/Mat(i,j);
        end
    end
    clear f
    k = sqrt(-1);
    for i = 1:nAngles
        for j = 1:nAngles
            shift = (phi1*pi/180)*(i-cAngles) + (phi2*pi/180)*(j-cAngles);
            FFT_LF(:,:,i,j) = FFT_LF(:,:,i,j)*exp(k*shift);
        end
    end
    disp('Computing inverse 4D FFT');
    LF = ifftn(ifftshift(FFT_LF)); %Compute Light-Field by 4D Inverse FFT
    clear FFT_LF
    if(ch==1)
        LF_R = LF;
    elseif(ch==2)
        LF_G = LF;
    elseif(ch==3)
        LF_B = LF;
    end
    clear LF
end
clear I
%Now we have the 4D light field
disp('Light Field Computed. Done...');
disp('==========================================');
% Digital Refocusing Code
% Take a 2D slice of 4D light field
% For refocusing, we only need the FFT of the light field, not the light field itself
disp('Synthesizing Refocused Images by taking 2D slice of 4D Light Field');
if(GRAY_SCALE)
    FFT_LF_R = fftshift(fftn(LF_R));
    clear LF_R
else
    FFT_LF_R = fftshift(fftn(LF_R));
    clear LF_R
    FFT_LF_G = fftshift(fftn(LF_G));
    clear LF_G
    FFT_LF_B = fftshift(fftn(LF_B));
    clear LF_B
end
% height and width of refocused image
H = size(FFT_LF_R,1);
W = size(FFT_LF_R,2);
count = 0;
for theta = -14:14
    count = count + 1;
    disp('===============================================');
    disp(sprintf('Calculating New ReFocused Image: theta = %d',theta));
    if(GRAY_SCALE)
        RefocusedImage = Refocus2D(FFT_LF_R,[theta,theta]);
    else
        RefocusedImage = zeros(H,W,3);
        RefocusedImage(:,:,1) = Refocus2D(FFT_LF_R,[theta,theta]);
        RefocusedImage(:,:,2) = Refocus2D(FFT_LF_G,[theta,theta]);
        RefocusedImage(:,:,3) = Refocus2D(FFT_LF_B,[theta,theta]);
    end
    str = sprintf('RefocusedImage%03d.png',count);
    %Scale RefocusedImage to [0,255]
    RefocusedImage = RefocusedImage - min(RefocusedImage(:));
    RefocusedImage = 255*RefocusedImage/max(RefocusedImage(:));
    %write as png image
    clear tt
    for ii = 1:CH
        tt(:,:,ii) = fliplr(RefocusedImage(:,:,ii)');
    end
    imwrite(uint8(tt),str);
    disp(sprintf('Refocused image written as %s',str));
end
Here is the Refocus2d function:
function IOut = Refocus2D(FFTLF,theta)
[m,n,p,q] = size(FFTLF);
Theta1 = theta(1);
Theta2 = theta(2);
cTem = floor(size(FFTLF)/2) + 1;
% find the coordinates of 2D slice
[XX,YY] = meshgrid(1:n,1:m);
cc = (XX - cTem(2))/size(FFTLF,2);
cc = Theta2*cc + cTem(4);
dd = (YY - cTem(1))/size(FFTLF,1);
dd = Theta1*dd + cTem(3);
% Resample 4D light field along the 2D slice
v = interpn(FFTLF,YY,XX,dd,cc,'cubic');
%set nan values to zero
idx = find(isnan(v)==1);
disp(sprintf('Number of Nans in sampling = %d',size(idx,1)))
v(isnan(v)) = 0;
% take inverse 2D FFT to get the image
IOut = real(ifft2(ifftshift(v)));
If anyone could help it would be greatly appreciated.
Thanks in advance
Apologies: Here is a brief description of what the code does:
The code reads in an image of a light field and, with prior knowledge of the plenoptic mask, stores the relevant nAngles, the fundamental frequencies of the mask and the phase shift; these are used to find multiple spectral replicas of the image.
Once the image is read in and the green channel is extracted, we perform a Fast Fourier Transform on the image and start taking slices from the image matrix, each of which represents one of the spectral replicas.
We then take the Inverse Fourier Transform of all the spectral replicas to produce the light field.
The Refocus2D function then takes a 2-dimensional slice of the 4D data to recreate a refocused image.
The things I am struggling with specifically are:
FFT_LF(:,:,i,j) = f( CentY(i,j)-F12Y:CentY(i,j)+F12Y, CentX(i,j)-F12X:CentX(i,j)+F12X)/Mat(i,j);
We are taking a slice from the matrix f, but where does that data end up in FFT_LF? What does (:,:,i,j) mean? Is it a multidimensional array?
And what does the size function return:
[m,n,p,q] = size(FFTLF);
Just a brief explanation of how this translates to python would be a great help.
Thanks everyone so far :)
How about getting started with this page: http://www.scipy.org/NumPy_for_Matlab_Users? Also, if you have a brief description of what this is supposed to do, that would be good.
You're correct: FFT_LF(:,:,i,j) refers to a multidimensional array. In this case, FFT_LF is a 4-D array, but the calculations result in a 2-D array. The (:,:,i,j) tells MATLAB exactly how to place the 2-D results into the 4-D variable.
In effect, it is storing one MxN array for each pair of indices (i,j). The colons (:) effectively mean "get every element in that dimension."
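A rough NumPy analogue of that assignment, in case it helps the translation (a sketch; cy and cx stand in for CentY(i,j) and CentX(i,j), and note NumPy's zero-based indices and exclusive slice ends):

import numpy as np

# 4-D complex array playing the role of MATLAB's FFT_LF
FFT_LF = np.zeros((2*F12Y + 1, 2*F12X + 1, nAngles, nAngles), dtype=complex)

# store one 2-D tile of f in the (i, j) plane of the 4-D array
FFT_LF[:, :, i, j] = f[cy - F12Y:cy + F12Y + 1,
                       cx - F12X:cx + F12X + 1] / Mat[i, j]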
What [m,n,p,q] = size(FFTLF) will do is return the length of each dimension in your array. So, if FFTLF ends up being a 5x5x3x2 array, you get:
m=5, n=5, p=3, q=2.
If you have MATLAB available, typing "help size" should give a good explanation of what it does. The same can be said for most MATLAB functions: I've always been quite impressed with their documentation.
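The NumPy near-equivalent, assuming FFTLF is a 4-D ndarray, is unpacking its shape tuple:

m, n, p, q = FFTLF.shape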
Hope that Helps