Summing one array in terms of another - python

I have two corresponding 2D arrays, one of velocity and one of intensity; each intensity value matches the velocity element at the same position.
I have created another 1D array that goes from the minimum to the maximum velocity in even bin widths.
How would I sum the intensity values from my 2D array that correspond to the velocity bins in my 1D array?
For example: if I have I = 5 corresponding to velocity = 101 km/s, then it is added to the bin 100 - 105 km/s.
Here's my input:
rad = np.linspace(0, 3, 100) # polar coordinates
phi = np.linspace(0, np.pi, 100)
r, theta = np.meshgrid(rad, phi) # 2d arrays of r and theta coordinates
V0 = 225 # Velocity function w/ constants.
rpe = 0.149
alpha = 0.003
Vr = V0 * (1 - np.exp(-r / rpe)) * (1 + (alpha * np.abs(r) / rpe))  # returns a 100x100 array of velocities
Vlos = Vr * np.cos(theta)  # line-of-sight velocity, assuming the observer is in the plane of the polar disk
a = (r**2) # intensity as a function of radius
b = (r**2 / 0.23)
I = (3.* np.exp(-1. * a)) - (1.8 * np.exp(-1. * b))
I wish to first create velocity bins from Vmin to Vmax and then sum the intensities over each bin.
My desired output would be something along the lines of
V_bins = [0, 5, 10, ..., Vlos.max()]
I_sum = [1.4, 1.1, 1.8, ..., 1.2]
plot(V_bins, I_sum)
EDIT: I have come up with a temporary solution, but perhaps there is a more elegant/efficient method of achieving it?
The two arrays Vlos and I are both 100 by 100 matrices.
Vlos = array([[ 0., 8.9, 17.44, ..., 238.5],...,
[-0., -8.9, -17.44, ..., -238.5]])
I = np.random.random((100, 100))
V = np.arange(Vlos.min(), Vlos.max() + 5, 5)
bins = np.zeros(len(V))
for i in range(0, len(V) - 1):
    for j in range(0, len(Vlos)):         # horizontal coordinate in matrix
        for k in range(0, len(Vlos[0])):  # vertical coordinate
            if Vlos[j, k] >= V[i] and Vlos[j, k] < V[i + 1]:
                bins[i] = bins[i] + I[j, k]
The result is plotted below.
The overall shape of the histogram is as expected; however, I don't understand the spike in the curve at V = 0. As far as I can tell this isn't there in the data, which leads me to question my method.
Any further help would be appreciated.

You can do this in one call with np.histogram, whose weights argument sums the intensities that fall into each velocity bin:
import numpy as np
bins = np.arange(100, 120, 5)
velocities = np.array([101, 111, 102, 112])
intensities = np.array([1, 2, 3, 4])
h = np.histogram(velocities, bins, weights=intensities)
print(h)
Output:
(array([4, 0, 6]), array([100, 105, 110, 115]))
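Applied to the 100x100 arrays from the question, the same call replaces the triple loop; a minimal sketch, assuming Vlos and I as defined above:
V_bins = np.arange(Vlos.min(), Vlos.max() + 5, 5)  # 5 km/s wide bins
I_sum, edges = np.histogram(Vlos.ravel(), bins=V_bins, weights=I.ravel())
# I_sum[n] is the summed intensity of all pixels whose velocity falls in
# [edges[n], edges[n+1]); plot it against the bin centers:
centers = 0.5 * (edges[:-1] + edges[1:])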

Related

data generator of a regular polygon

I don't know how to do it.
Write a function C = make_means(k, radius) that outputs a k x 2 data matrix C containing the 2D coordinates of k means (of the data to be generated later). The means of the data generator must lie on the vertices of a regular polygon (if k=3 the polygon is an equilateral triangle, if k=6 it is a regular hexagon, etc.). Write your own code to determine the positions of the vertices of a regular polygon given a radius value (input radius) of the circle centered at the origin and inscribing the regular polygon. The first point of the polygon lies on the x-axis.
For example make_means(3, radius=1) would yield:
[[ 1. , 0. ],
[-0.5 , 0.8660254],
[-0.5 , -0.8660254]]
Here is my code:
for i in range(0, 360, 72):
    theta = np.radians(i)
    mean = np.array([1., 0.])
    c, s = np.cos(theta), np.sin(theta)
    cov = np.array(((c, -s), (s, c)))
    C = np.random.multivariate_normal(mean, cov, size=1000)
    #C
    plt.scatter(C[:, 0], C[:, 1])
but it does not appear to rotate.
x_values = []
y_values = []
for i in range(k):
    angle = 2 * np.pi / k  # angular step between vertices
    x = radius * np.cos(i * angle)
    y = radius * np.sin(i * angle)
    x_values.append(x)
    y_values.append(y)
A = np.c_[x_values, y_values]
It's something as simple as this.
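For completeness, here is a sketch wrapping the same idea into the requested make_means(k, radius) signature (the function is written for this assignment, not taken from a library):
import numpy as np

def make_means(k, radius):
    """Return a k x 2 matrix of the vertices of a regular k-gon inscribed in
    a circle of the given radius, with the first vertex on the x-axis."""
    angles = 2 * np.pi * np.arange(k) / k  # vertex angles, starting at 0
    return np.c_[radius * np.cos(angles), radius * np.sin(angles)]

print(make_means(3, radius=1))
# [[ 1.         0.       ]
#  [-0.5        0.8660254]
#  [-0.5       -0.8660254]]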

Convolve array with kernel of variable standard deviation

Good day to you, fellow programmer!
Today I would like to do something that I believe is tricky. I have a very large 2D array called tac that basically contains time curve values, and a file containing a tuple of coordinates called coor which contains information on where to place these curves in a 3D array. What this set of variables represents is actually a 4D array: the first 3 dimensions represent space and the fourth is time. The whole thing is stored like this to avoid storing an immense number of zeros.
I would like to apply, for each time (in other words, each value in the 4th dimension), a Gaussian kernel to this set of data. I was able to generate this kernel and to perform the convolution quite easily for a fixed standard deviation for the whole array using scipy.ndimage.convolve. The kernel was created using scipy.signal.gaussian. Here is a brief example of the principle, where tac_4d contains the 4D array (it stores a lot of data, I know... but one problem at a time):
def gaussian_kernel_3d(radius, sigma):
    num = 2 * radius + 1
    kernel_1d = signal.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    kernel_3d = np.outer(kernel_1d, kernel_2d).reshape(num, num, num)
    kernel_3d = np.expand_dims(kernel_3d, -1)
    return kernel_3d

g = gaussian_kernel_3d(1, .5)
cag = nd.convolve(tac_4d, g, mode='constant', cval=0.0)
The trick is now to convolve the array with a kernel whose standard deviation is different for each SPACE coordinate. In other words, I would have a 3D array std containing standard deviations for each coordinate of the array.
It seems https://github.com/sheliak/varconvolve is the code needed to take care of this problem. However, I don't really understand how to use it and, quite frankly, I would prefer to come up with a genuine solution. Do you guys see a way to solve this problem?
Thanks in advance!
EDIT
Here is what I hope can be considered an MCVE:
import numpy as np
from scipy import signal
from scipy import ndimage as nd

def gaussian_kernel_2d(radius, sigma):
    num = 2 * radius + 1
    kernel_1d = signal.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    return kernel_2d

def gaussian_kernel_3d(radius, sigma):
    num = 2 * radius + 1
    kernel_1d = signal.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    kernel_3d = np.outer(kernel_1d, kernel_2d).reshape(num, num, num)
    kernel_3d = np.expand_dims(kernel_3d, -1)
    return kernel_3d

np.random.seed(0)

number_of_tac = 150
time_samples = 915
z, y, x = 100, 150, 100
voxel_number = x * y * z

# TACs in the right order
tac = np.random.uniform(0, 4, time_samples * number_of_tac).reshape(number_of_tac, time_samples)

arr = np.array([0] * (voxel_number - number_of_tac) + [1] * number_of_tac)
np.random.shuffle(arr)
arr = arr.reshape(z, y, x)
coor = np.where(arr != 0)  # non-empty voxels

# Algorithm to place TACs in 3D space
nnz = np.zeros(arr.shape)
nnz[coor] = 1
tac_4d = np.zeros((x, y, z, time_samples))
tac_4d[np.where(nnz == 1)] = tac

# 3D convolution for all times
# TODO: find a way to make the standard deviation change for each voxel
g = gaussian_kernel_3d(1, 1)  # 3D kernel of std = 1
v = np.random.uniform(0, 1, x * y * z).reshape(z, y, x)  # 3D array of std
cag = nd.convolve(tac_4d, g, mode='constant', cval=0.0)  # convolution
Essentially, you have a 4D dataset, shape (nx, ny, nz, nt) that is sparse in (nx, ny, nz) and dense in the nt axis. If (i, j, k) are coordinates of nonzero points in the sparse dimensions, you want to convolve with a Gaussian 3D kernel that has a sigma that depends on (i, j, k).
For example, if there are nonzero points at [1, 2, 5] and [1, 4, 5] with corresponding sigmas 0.1 and 1.0, then the output at coordinates [1, 3, 5] is affected mostly by the [1, 4, 5] point because that one has the largest point spread.
Your question is ambiguous; it could also mean that point [1, 3, 5] has its own associated sigma, for example 0.5, and pulls data from the two adjacent points with equal weight. I will assume the first definition (sigma values associated with input points, not with output points).
Because the operation is not a true convolution, there is no fast FFT-based method to do the whole job in one go. Instead, you have to loop over the sigma values. Fortunately, your example has only 150 nonzero points, so the loop is not too expensive.
Here is an implementation. I keep the data in sparse representation as long as possible.
import scipy.signal
import numpy as np

def kernel3d(mm, sigma):
    """Return (mm, mm, mm) shaped, normalized kernel."""
    g1 = scipy.signal.gaussian(mm, std=sigma)
    g3 = g1.reshape(mm, 1, 1) * g1.reshape(1, mm, 1) * g1.reshape(1, 1, mm)
    return g3 * (1/g3.sum())

np.random.seed(1)
s = 2  # scaling factor (original problem: s=10)
nx, ny, nz, nt, nnz = 10*s, 11*s, 12*s, 91*s, 15*s

# select nnz random voxels to fill with time series data
randint = np.random.randint
tseries = {}  # key: (i, j, k) tuple; value: time series data, shape (nt,)
for _ in range(nnz):
    while True:
        ijk = (randint(nx), randint(ny), randint(nz))
        if ijk not in tseries:
            tseries[ijk] = np.random.uniform(0, 1, size=nt)
            break

ijks = np.array(list(tseries.keys()))  # shape (nnz, 3)

# sigmas: key: (i, j, k) tuple; value: standard deviation
sigmas = {k: np.random.uniform(0, 2) for k in tseries.keys()}

# output will be stored as a dense array, padded to avoid edge issues
# with the convolution.
m = 5  # padding size
cag_4dp = np.zeros((nx+2*m, ny+2*m, nz+2*m, nt))
mm = 2*m + 1  # kernel width
for (i, j, k), tdata in tseries.items():
    kernel = kernel3d(mm, sigmas[(i, j, k)]).reshape(mm, mm, mm, 1)
    # convolution of one voxel by the kernel is trivial.
    # slice4d_c has shape (mm, mm, mm, nt).
    slice4d_c = kernel * tdata
    cag_4dp[i:i+mm, j:j+mm, k:k+mm, :] += slice4d_c
cag_4d = cag_4dp[m:-m, m:-m, m:-m, :]

#%%
import matplotlib.pyplot as plt
plt.close('all')  # close leftover figures before creating the new one
fig, axs = plt.subplots(2, 2, tight_layout=True)

# find a few planes
#ks = np.where(np.any(cag_4d != 0, axis=(0, 1, 3)))[0]
ks = ijks[:4, 2]
for ax, k in zip(axs.ravel(), ks):
    ax.imshow(cag_4d[:, :, k, nt//2].T)
    ax.set_title(f'Voxel [:, :, {k}] at time {nt//2}')
fig.show()

for ijk, sigma in sigmas.items():
    print(f'{ijk}: sigma={sigma:.2f}')
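As a quick sanity check of the approach (assuming the variables from the snippet above): each kernel is normalized to sum to 1, so the total intensity in the padded output should match the total of the input time series:
total_in = sum(t.sum() for t in tseries.values())
total_out = cag_4dp.sum()   # use the padded array so no kernel mass is clipped
print(total_in, total_out)  # the two totals should agree closely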

Get angle into range 0 - 2*pi - python

In my simulation I compute multiple values for a phase, for example
phi = np.linspace(-N, N, 1000)
where N can be large.
Is there an easy way to map the values to the interval [0, 2pi)?
Does that work?
import numpy as np
import math
N=10
phi = np.linspace(-N,N,1000)
phi = phi % (2 * math.pi)
print(phi)
Output
[2.56637061 2.58639063 ... 3.69679467 3.71681469]
It sounds like you are looking for np.interp. Scipy offers an interpolate function too.
For a usage example, to map the values of phi (which are between -N and N) to [0, 2π] try
np.interp(phi, (-N, N), (0, 2*np.pi))
To exclude 2π you could either change the upper bound so that no value maps onto 2π:
np.interp(phi, (-N, N + 1), (0, 2*np.pi))
Or reduce the largest value you include in phi:
phi = np.linspace(-N, N, 1000, endpoint=False)
I believe it would be easier to just ask for the values directly.
For example, 1000 points over the range [0, 2π] can be given by
np.linspace(0, 2*np.pi, 1000)
And for the range [0, 2π) which excludes the value 2π
np.linspace(0, 2*np.pi, 1000, endpoint=False)
Another possible way:
First, convert to the [-pi, pi] interval using np.arctan2(np.sin(angle), np.cos(angle)). Then, you still need to transform the negative values. A final function like this would work:
def convert_angle_to_0_2pi_interval(angle):
    new_angle = np.arctan2(np.sin(angle), np.cos(angle))
    if new_angle < 0:
        new_angle = abs(new_angle) + 2 * (np.pi - abs(new_angle))  # same as new_angle + 2*np.pi
    return new_angle
To confirm, run:
angles = [10, 200, 365, -10, -180]
print(np.rad2deg([convert_angle_to_0_2pi_interval(np.deg2rad(a)) for a in angles]))
...which prints: [ 10., 200., 5., 350., 180.]
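Note that the function above only works on scalars; the if test fails on arrays. A vectorized sketch of the same idea, using np.where:
import numpy as np

def convert_angles_to_0_2pi_interval(angles):
    """Wrap an array of angles into [0, 2*pi)."""
    new_angles = np.arctan2(np.sin(angles), np.cos(angles))  # in (-pi, pi]
    return np.where(new_angles < 0, new_angles + 2 * np.pi, new_angles)

angles = np.deg2rad([10, 200, 365, -10, -180])
print(np.rad2deg(convert_angles_to_0_2pi_interval(angles)))
# [ 10. 200.   5. 350. 180.]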

Calculate distance between neighbors efficiently

I have data scattered geographically without any kind of pattern, and I need to create an image where the value of each pixel is an average of the neighbors of that pixel that are less than X meters away.
For this I use the library scipy.spatial to generate a KDTree with the data (cKDTree). Once the data structure is generated, I locate the pixel geographically and find the geographic points that are closest.
# Generate scattered data points
coord_cart = [
    [
        feat.geometry().GetY(),
        feat.geometry().GetX(),
        feat.GetField(feature),
    ] for feat in layer
]

# Create KDTree structure
tree = cKDTree(coord_cart)

# Get raster image dimensions
pixel_size = 5
source_layer = shapefile.GetLayer()
x_min, x_max, y_min, y_max = source_layer.GetExtent()
x_res = int((x_max - x_min) / pixel_size)
y_res = int((y_max - y_min) / pixel_size)

# Create grid
x = np.linspace(x_min, x_max, x_res)
y = np.linspace(y_min, y_max, y_res)
X, Y = np.meshgrid(x, y)
grid = np.array(list(zip(Y.ravel(), X.ravel())))

# Get points that are less than 10 meters away
inds = tree.query_ball_point(grid, 10)

# inds is an array of lists of different lengths, so I need to convert it
# into an array of n_points x maximum number of neighbors
ll = np.array([len(l) for l in inds])
maxlen = max(ll)
arr = np.zeros((len(ll), maxlen), int)

# I don't know why but inds is an array of lists, so I convert it
# into an array of arrays to use grid[inds]
# I THINK THIS IS A LITTLE INEFFICIENT
for i in range(len(inds)):
    inds[i].extend([i] * (maxlen - len(inds[i])))
    arr[i] = np.array(inds[i], dtype=int)

# AND THIS DOESN'T WORK
d = np.linalg.norm(grid - grid[inds])
Is there a better way to do this? I'm trying to use IDW to perform the interpolation between the points. I found this snippet that uses a function that gets the N nearest points, but it does not work for me because I need the pixel value to be 0 if there is no point within a radius R.
d, inds = tree.query(list(zip(xt, yt, zt)), k=10)
w = 1.0 / d**2
air_idw = np.sum(w * air.flatten()[inds], axis=1) / np.sum(w, axis=1)
air_idw.shape = lon_curv.shape
Thanks in advance!
This may be one of the cases where KDTrees are not a good solution. This is because you are mapping to a grid, which is a very simple structure, meaning there is nothing to gain from the KDTree's sophistication. The nearest grid point and the distance to it can be found by simple arithmetic.
Below is a simple example implementation. I'm using a Gaussian kernel, but changing that to IDW if you prefer should be straightforward.
import numpy as np
from scipy import stats

def rasterize(coords, feature, gu, cutoff, kernel=stats.norm(0, 2.5).pdf):
    # compute overlap (filter size / grid unit)
    ovlp = int(np.ceil(cutoff/gu))
    # compute raster dimensions
    mn, mx = coords.min(axis=0), coords.max(axis=0)
    reso = np.ceil((mx - mn) / gu).astype(int)
    base = (mx + mn - reso * gu) / 2
    # map coordinates to the raster; the residual is the distance
    grid_res = coords - base
    grid_coords = np.rint(grid_res / gu).astype(int)
    grid_res -= gu * grid_coords
    # because of the overlap we must add neighboring grid points to the nearest
    gcovlp = np.c_[-ovlp:ovlp+1, np.zeros(2*ovlp+1, dtype=int)]
    grid_coords = (gcovlp[:, None, None, :] + gcovlp[None, :, None, ::-1]
                   + grid_coords).reshape(-1, 2)
    # the corresponding residuals have the same offset with opposite sign
    gdovlp = -gu * (gcovlp+1/2)
    grid_res = (gdovlp[:, None, None, :] + gdovlp[None, :, None, ::-1]
                + grid_res).reshape(-1, 2)
    # discard out-of-fov grid points and points outside the cutoff
    valid, = np.where(((grid_coords>=0) & (grid_coords<=reso)).all(axis=1) & (
        np.einsum('ij,ij->i', grid_res, grid_res) <= cutoff*cutoff))
    grid_res = grid_res[valid]
    feature = feature[valid // (2*ovlp+1)**2]
    # flatten the grid so we can use bincount
    grid_flat = np.ravel_multi_index(grid_coords[valid].T, reso+1)
    return np.bincount(
        grid_flat,
        feature * kernel(np.sqrt(np.einsum('ij,ij->i', grid_res, grid_res))),
        (reso + 1).prod()).reshape(reso+1)
gu = 5
cutoff = 10
coords = np.random.randn(10_000, 2) * (100, 20)
coords[:, 1] += 80 * np.sin(coords[:, 0] / 40)
feature = np.random.uniform(0, 1000, (10_000,))
from timeit import timeit
print(timeit("rasterize(coords, feature, gu, cutoff)", globals=globals(), number=100)*10, 'ms')
pic = rasterize(coords, feature, gu, cutoff)
import pylab
pylab.pcolor(pic, cmap=pylab.cm.jet)
pylab.colorbar()
pylab.show()
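To swap in IDW, pass a different kernel callable; rasterize hands it the distances, so inverse-square weights could look like the sketch below. Note this yields an IDW-weighted sum; a true IDW average would also need a second bincount pass to divide by the summed weights per pixel:
# hypothetical IDW kernel: weight = 1/d^2, regularized to avoid
# division by zero when a data point lands exactly on a grid point
pic_idw = rasterize(coords, feature, gu, cutoff,
                    kernel=lambda d: 1.0 / (d**2 + 1e-12))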

How to generate 2D gaussian with Python?

I can generate Gaussian data with random.gauss(mu, sigma) function, but how can I generate 2D gaussian? Is there any function like that?
If you can use numpy, there is numpy.random.multivariate_normal(mean, cov[, size]).
For example, to get 10,000 2D samples:
np.random.multivariate_normal(mean, cov, 10000)
where mean.shape==(2,) and cov.shape==(2,2).
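For instance, a minimal runnable example with an uncorrelated, unit-variance Gaussian centered at the origin:
import numpy as np

mean = np.array([0.0, 0.0])   # center of the distribution
cov = np.array([[1.0, 0.0],
                [0.0, 1.0]])  # identity covariance: no correlation
samples = np.random.multivariate_normal(mean, cov, 10000)
print(samples.shape)          # (10000, 2)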
I'd like to add an approximation using exponential functions. This directly generates a 2d matrix which contains a movable, symmetric 2d gaussian.
I should note that I found this code on the scipy mailing list archives and modified it a little.
import numpy as np

def makeGaussian(size, fwhm=3, center=None):
    """Make a square gaussian kernel.

    size is the length of a side of the square
    fwhm is full-width-half-maximum, which
    can be thought of as an effective radius.
    """
    x = np.arange(0, size, 1, float)
    y = x[:, np.newaxis]
    if center is None:
        x0 = y0 = size // 2
    else:
        x0 = center[0]
        y0 = center[1]
    return np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / fwhm**2)
For reference and enhancements, it is hosted as a gist here. Pull requests welcome!
Since the standard 2D Gaussian distribution is just the product of two 1D Gaussian distributions, if there is no correlation between the two axes (i.e. the covariance matrix is diagonal), just call random.gauss twice.
import random

def gauss_2d(mu, sigma):
    x = random.gauss(mu, sigma)
    y = random.gauss(mu, sigma)
    return (x, y)
import numpy as np

# define a normalized 2D gaussian
def gaus2d(x=0, y=0, mx=0, my=0, sx=1, sy=1):
    return 1. / (2. * np.pi * sx * sy) * np.exp(-((x - mx)**2. / (2. * sx**2.) + (y - my)**2. / (2. * sy**2.)))

x = np.linspace(-5, 5)
y = np.linspace(-5, 5)
x, y = np.meshgrid(x, y)  # get 2D variables instead of 1D
z = gaus2d(x, y)
Straightforward implementation and example of the 2D Gaussian function. Here sx and sy are the spreads in x and y direction, mx and my are the center coordinates.
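To visualize the result (assuming matplotlib is available):
import matplotlib.pyplot as plt

plt.imshow(z, extent=(-5, 5, -5, 5), origin='lower')
plt.colorbar()
plt.show()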
Numpy has a function to do this. It is documented here. In addition to the method proposed above, it allows you to draw samples with arbitrary covariance.
Here is a small example, assuming ipython -pylab is started:
samples = multivariate_normal([-0.5, -0.5], [[1, 0],[0, 1]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
samples = multivariate_normal([0.5, 0.5], [[0.1, 0.5],[0.5, 0.6]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
In case someone finds this thread and is looking for something a little more versatile (like I did), I have modified the code from @giessel. The code below will allow for asymmetry and rotation.
import numpy as np

def makeGaussian2(x_center=0, y_center=0, theta=0, sigma_x=10, sigma_y=10, x_size=640, y_size=480):
    # x_center and y_center are the center of the gaussian, theta is the rotation angle in degrees
    # sigma_x and sigma_y are the stdevs in the x and y axis before rotation
    # x_size and y_size give the size of the frame
    theta = 2*np.pi*theta/360
    x = np.arange(0, x_size, 1, float)
    y = np.arange(0, y_size, 1, float)
    y = y[:, np.newaxis]
    sx = sigma_x
    sy = sigma_y
    x0 = x_center
    y0 = y_center
    # rotation
    a = np.cos(theta)*x - np.sin(theta)*y
    b = np.sin(theta)*x + np.cos(theta)*y
    a0 = np.cos(theta)*x0 - np.sin(theta)*y0
    b0 = np.sin(theta)*x0 + np.cos(theta)*y0
    return np.exp(-(((a-a0)**2)/(2*(sx**2)) + ((b-b0)**2)/(2*(sy**2))))
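For example, an elongated Gaussian rotated by 30 degrees in the default 640x480 frame (the parameter values here are just illustrative):
import matplotlib.pyplot as plt

frame = makeGaussian2(x_center=320, y_center=240, theta=30,
                      sigma_x=80, sigma_y=20)
plt.imshow(frame)
plt.show()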
We can try just using the numpy method np.random.normal to generate samples from a 2D Gaussian distribution.
The sample code is np.random.normal(mean, sigma, (num_samples, 2)).
A sample run with mean = 0 and sigma = 20 is shown below:
np.random.normal(0, 20, (10,2))
array([[ 11.62158316, 3.30702215],
[-18.49936277, -11.23592946],
[ -7.54555371, 14.42238838],
[-14.61531423, -9.2881661 ],
[-30.36890026, -6.2562164 ],
[-27.77763286, -23.56723819],
[-18.18876597, 41.83504042],
[-23.62068377, 21.10615509],
[ 15.48830184, -15.42140269],
[ 19.91510876, 26.88563983]])
Hence we get 10 samples in a 2D array with mean = 0 and sigma = 20.
