According to the original paper by Huang
https://arxiv.org/pdf/1401.4211.pdf
the marginal Hilbert spectrum is given by:
h(ω) = ∫ p(ω, A) A² dA
where A = A(ω, t) (i.e., a function of time and frequency) and p(ω, A) is the joint probability density function of the frequency [ωi] and amplitude [Ai].
I am trying to estimate 1) the joint probability density using plt.hist2d and 2) the integral above using a sum.
The code I am using is the following:
IA_flat1 = np.ravel(IA) ### Turn matrix to 1 D array
IF_flat1 = np.ravel(IF) ### Here IA corresponds to A
IF_flat = IF_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep only desired frequencies
IA_flat = IA_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep IA that correspond to desired frequencies
### return the Joint probability density
Pjoint,f_edges, A_edges,_ = plt.hist2d(IF_flat,IA_flat,bins=[bins_F,bins_A], density=True)
plt.close()
n1 = np.digitize(IA_flat, A_edges).astype(int) ### Return the indices of the bins to which
n2 = np.digitize(IF_flat, f_edges).astype(int) ### each value in input array belongs.
### define integration function
from numba import jit, prange ### Numba is added for speed

# @jit(nopython=True, parallel=True)
def get_int(A_edges, Pjoint, IA_flat, n1, n2):
    dA = np.diff(A_edges)[0]                  ### Find dA for integration
    sum_h = np.zeros(np.shape(Pjoint)[0])     ### Initialize array
    for j in prange(np.shape(Pjoint)[0]):
        h = np.zeros(np.shape(Pjoint)[1])     ### Initialize array
        for k in prange(np.shape(Pjoint)[1]):
            needed = IA_flat[(n1==k) & (n2==j)]  ### Keep only the elements of the array
                                                 ### that are related to Pjoint[j,k]
            h[k] = Pjoint[j,k]*np.nanmean(needed**2)*dA  ### Pjoint*A^2*dA
        sum_h[j] = np.nansum(h)               ### Sum_{i=0}^{N}(Pjoint*A^2*dA)
    return sum_h
### Now run previously defined function
sum_h = get_int(A_edges, Pjoint ,IA_flat, n1, n2)
1) I am not sure that everything is correct though. Any suggestions or comments on what I might be doing wrong?
2) Is there a way to do the same using a scipy integration scheme?
You can extract the probability from the 2D histogram and use it for the integration:
# Added some numbers to have something to run
import numpy as np
import matplotlib.pyplot as plt
IA = np.random.rand(100,100)
IF = np.random.rand(100,100)
bins_F = np.linspace(0,1,20)
bins_A = np.linspace(0,1,100)
min_f = 0
fs = 1.0
IA_flat1 = np.ravel(IA) ### Turn matrix to 1 D array
IF_flat1 = np.ravel(IF) ### Here IA corresponds to A
IF_flat = IF_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep only desired frequencies
IA_flat = IA_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep IA that correspond to desired frequencies
### return the Joint probability density
Pjoint,f_edges, A_edges,_ = plt.hist2d(IF_flat,IA_flat,bins=[bins_F,bins_A], density=True)
f_values = (f_edges[1:]+f_edges[:-1])/2
A_values = (A_edges[1:]+A_edges[:-1])/2
dA = A_values[1]-A_values[0] # for the integral
#Pjoint.shape (19,99)
h = np.zeros(f_values.shape)
for i in range(len(f_values)):
    f = f_values[i]
    # slice of the histogram at frequency f (probability over the amplitude bins)
    p = Pjoint[i]
    # summation equivalent to the integral
    integral_result = np.sum(p*A_values**2*dA)
    h[i] = integral_result
plt.figure()
plt.plot(f_values,h)
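For the second question, the same integral can be evaluated with a scipy quadrature rule along the amplitude axis; a minimal sketch using scipy.integrate.simpson (simps in older SciPy versions), where Pjoint, A_values, f_values and h are the arrays from the code above:
from scipy.integrate import simpson

# Integrate Pjoint(f, A) * A^2 over A for each frequency bin.
# axis=1 is the amplitude axis of Pjoint.
h_scipy = simpson(Pjoint * A_values**2, x=A_values, axis=1)

plt.figure()
plt.plot(f_values, h, label='Riemann sum')
plt.plot(f_values, h_scipy, '--', label='scipy simpson')
plt.legend()
plt.show()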
Short version:
Is it possible to create a new scipy.spatial.Delaunay object with a subset of the triangles (2D data) from an existing object?
The goal would be to use the find_simplex method on the new object with filtered out simplices.
Similar but not quite the same
matplotlib contour/contourf of **concave** non-gridded data
How to deal with the (undesired) triangles that form between the edges of my geometry when using Triangulation in matplotlib
Long version:
I am looking at lat-lon data that I regrid with scipy.interpolate.griddata like in the pseudo-code below:
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import Delaunay
from scipy.interpolate.interpnd import _ndim_coords_from_arrays
#lat shape (a,b): 2D array of latitude values
#lon shape (a,b): 2D array of longitude values
#x shape (a,b): 2D array of variable of interest at lat and lon
# lat-lon data
nonan = ~np.isnan(lat)
flat_lat = lat[nonan]
flat_lon = lon[nonan]
flat_x = x[nonan]
# regular lat-lon grid for regridding
lon_ar = np.arange(loni,lonf,resolution)
lat_ar = np.arange(lati,latf,resolution)
lon_grid, lat_grid = np.meshgrid(lon_ar,lat_ar)
# regrid
x_grid = griddata((flat_lon,flat_lat),flat_x,(lon_grid,lat_grid), method='nearest')
# filter out extrapolated values
cloud_points = _ndim_coords_from_arrays((flat_lon,flat_lat))
regrid_points = _ndim_coords_from_arrays((lon_grid.ravel(),lat_grid.ravel()))
tri = Delaunay(cloud_points)
outside_hull = tri.find_simplex(regrid_points) < 0
x_grid[outside_hull.reshape(x_grid.shape)] = np.nan
# filter out large triangles ??
# it would be easy if I could "subset" tri into a new scipy.spatial.Delaunay object
# new_tri = ??
# outside_hull = new_tri.find_simplex(regrid_points) < 0
The problem is that the convex hull contains low-quality (very large, shown in blue in the example below) triangles that I would like to filter out, as they don't represent the data well. I know how to filter them out of the input points, but not in the regridded output. Here is the filter function:
from typing import Optional

def filter_large_triangles(
    points: np.ndarray, tri: Optional[Delaunay] = None, coeff: float = 2.0
):
    """
    Filter out triangles that have an edge > coeff * median(edge)
    Inputs:
        points: point cloud used to build the triangulation
        tri: scipy.spatial.Delaunay object
        coeff: triangles with an edge > coeff * median(edge) will be filtered out
    Outputs:
        valid_slice: boolean array that selects "normal" triangles
    """
    if tri is None:
        tri = Delaunay(points)
    edge_lengths = np.zeros(tri.vertices.shape)
    seen = {}
    # loop over triangles
    for i, vertex in enumerate(tri.vertices):
        # loop over edges
        for j in range(3):
            id0 = vertex[j]
            id1 = vertex[(j + 1) % 3]
            # avoid calculating twice for non-border edges
            if (id0,id1) in seen:
                edge_lengths[i, j] = seen[(id0,id1)]
            else:
                edge_lengths[i, j] = np.linalg.norm(points[id1] - points[id0])
                seen[(id0,id1)] = edge_lengths[i, j]
    median_edge = np.median(edge_lengths.flatten())
    valid_slice = np.all(edge_lengths < coeff * median_edge, axis=1)
    return valid_slice
The bad triangles are shown in blue below:
import matplotlib.pyplot as plt
no_large_triangles = filter_large_triangles(cloud_points,tri)
fig,ax = plt.subplots()
ax.triplot(cloud_points[:,0],cloud_points[:,1],tri.simplices,c='blue')
ax.triplot(cloud_points[:,0],cloud_points[:,1],tri.simplices[no_large_triangles],c='green')
plt.show()
Is it possible to create a new scipy.spatial.Delaunay object with only the no_large_triangles simplices? The goal would be to use the find_simplex method on that new object to easily filter out points.
As an alternative how could I find the indices of points in regrid_points that fall inside the blue triangles? (tri.simplices[~no_large_triangles])
So it is possible to modify the Delaunay object for the purpose of using find_simplex on a subset of simplices, but it seems only with the bruteforce algorithm.
# filter out extrapolated values
cloud_points = _ndim_coords_from_arrays((flat_lon,flat_lat))
regrid_points = _ndim_coords_from_arrays((lon_grid.ravel(),lat_grid.ravel()))
tri = Delaunay(cloud_points)
outside_hull = tri.find_simplex(regrid_points) < 0
# filter out large triangles
large_triangles = ~filter_large_triangles(cloud_points,tri)
large_triangle_ids = np.where(large_triangles)[0]
subset_tri = tri # this doesn't preserve tri, effectively just a renaming
# the _find_simplex_bruteforce method only needs the simplices and neighbors
subset_tri.nsimplex = large_triangle_ids.size
subset_tri.simplices = tri.simplices[large_triangles]
subset_tri.neighbors = tri.neighbors[large_triangles]
# update neighbors
for i,triangle in enumerate(subset_tri.neighbors):
    for j,neighbor_id in enumerate(triangle):
        if neighbor_id in large_triangle_ids:
            # reindex the neighbors to match the size of the subset
            subset_tri.neighbors[i,j] = np.where(large_triangle_ids==neighbor_id)[0]
        elif neighbor_id>=0 and (neighbor_id not in large_triangle_ids):
            # that neighbor was a "normal" triangle that should not exist in the subset
            subset_tri.neighbors[i,j] = -1
inside_large_triangles = subset_tri.find_simplex(regrid_points,bruteforce=True) >= 0
invalid_slice = np.logical_or(outside_hull,inside_large_triangles)
x_grid[invalid_slice.reshape(x_grid.shape)] = np.nan
Showing that the new Delaunay object has only the subset of large triangles
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.triplot(cloud_points[:,0],cloud_points[:,1],subset_tri.simplices,color='red')
plt.show()
Plotting x_grid with pcolormesh before the filtering for large triangles (zoomed in the blue circle above):
After the filtering:
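As for the alternative asked at the end of the question, there is a simpler sketch that avoids modifying the Delaunay object at all (run it on the original, unmodified triangulation, since the code above alters tri in place): find_simplex already tells you which simplex contains each regridded point, so you can just test membership in the list of large triangles:
# Hedged alternative: use the containing-simplex index from the original
# triangulation and test membership in the set of large triangles.
simplex_ids = tri.find_simplex(regrid_points)            # -1 means outside the hull
inside_large_triangles = np.isin(simplex_ids, large_triangle_ids)
invalid_slice = np.logical_or(simplex_ids < 0, inside_large_triangles)
x_grid[invalid_slice.reshape(x_grid.shape)] = np.nan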
I am trying to estimate the parameters of my set of ODEs in my program by minimizing the error between my experimental data and the predicted data.
The problem is that, although I can obtain a good prediction and a very good fit, the prediction only contains the same number of points as my experimental data, which makes the output look strange.
Can you please give me more information on how I can obtain a finer set of predicted points?
Code can be found below.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
xm = np.array([0,1,2,3,4,5])
ym = np.array([2.0,1.5,np.nan,2.2,3.0,5.0])
m = GEKKO(remote=False)
m.time = xm
a = m.FV(lb=0.1,ub=2.0)
a.STATUS=1
y = m.CV(value=ym,name='y',fixed_initial=False)
y.FSTATUS=1
m.Equation(y.dt()==a*y)
m.options.IMODE = 5
m.options.SOLVER = 1
m.solve(disp=True)
print('Optimized, a = ' + str(a.value[0]))
plt.figure(figsize=(6,2))
plt.plot(xm,ym,'bo',label='Meas')
plt.plot(xm,y.value,'r-',label='Pred')
plt.ylabel('y')
plt.ylim([0,6])
plt.legend()
plt.show()
If I replace m.time to obtain more predicted points with:
m.time = np.linspace(0,5,30)
I get the error:
Exception: Data arrays must have the same length, and match time discretization in dynamic problems
There are two options (Methods 1 and 2) that I've shown below. You can either plot the interpolating nodes to give you more resolution or create a new model for simulation.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
xm = np.array([0,1,2,3,4,5])
ym = np.array([2.0,1.5,np.nan,2.2,3.0,5.0])
m = GEKKO(remote=False)
m.time = xm
a = m.FV(lb=0.1,ub=2.0)
a.STATUS=1
y = m.CV(value=ym,name='y',fixed_initial=False)
y.FSTATUS=1
m.Equation(y.dt()==a*y)
m.options.IMODE = 5
m.options.SOLVER = 1
m.options.CSV_WRITE = 2 # For Method 1
m.options.NODES = 3 # For Method 1 (options 3-6)
m.solve(disp=True)
print('Optimized, a = ' + str(a.value[0]))
# Method 1: Plot interpolating nodes
import json
with open(m.path+'//results_all.json') as f:
    results = json.load(f)
# Method 2: Re-simulate with more points
sim = GEKKO(remote=False)
ap = a.value[0]
xp = np.linspace(0,7); sim.time=xp
yp = sim.Var(y.value[0])
sim.Equation(yp.dt()==ap*yp)
sim.options.NODES = 3
sim.options.IMODE=4; sim.solve()
plt.figure(figsize=(6,2))
plt.plot(xm,ym,'bo',label='Meas')
plt.plot(xm,y.value,'gs-.',label='Pred Original')
plt.plot(results['time'],results['y'],'kx-',\
         markersize=10,label='Pred Method 1')
plt.plot(xp,yp,'r.--',label='Pred Method 2')
plt.ylabel('y')
plt.ylim([0,10])
plt.legend()
plt.show()
A third option is to reset the .value of the original model, but that can be tedious. Instead, you can also create both the estimation and simulation models in a loop, as is done in this example of Moving Horizon Estimation and Model Predictive Control, which uses the same model but transfers parameters between the two applications:
# use remote=True for MacOS
mhe = GEKKO(name='tclab-mhe',remote=False)
mpc = GEKKO(name='tclab-mpc',remote=False)
# create 2 models (MHE and MPC) in one loop
for m in [mhe,mpc]:
    # Parameters with bounds
    m.K1 = m.FV(value=0.607,lb=0.1,ub=1.0)
    m.K2 = m.FV(value=0.293,lb=0.1,ub=1.0)
    m.K3 = m.FV(value=0.24,lb=0.1,ub=1.0)
    m.tau12 = m.FV(value=192,lb=100,ub=200)
    m.tau3 = m.FV(value=15,lb=10,ub=20)
    m.Ta = m.Param(value=23.0) # degC
    m.Q1 = m.MV(value=0,lb=0,ub=100,name='q1')
    m.Q2 = m.MV(value=0,lb=0,ub=100,name='q2')
    # Heater temperatures
    m.TH1 = m.SV(value=T1m[0])
    m.TH2 = m.SV(value=T2m[0])
    # Sensor temperatures
    m.TC1 = m.CV(value=T1m[0],name='tc1')
    m.TC2 = m.CV(value=T2m[0],name='tc2')
    # Temperature difference between two heaters
    m.DT = m.Intermediate(m.TH2-m.TH1)
    # Equations
    m.Equation(m.tau12*m.TH1.dt()+(m.TH1-m.Ta)==m.K1*m.Q1+m.K3*m.DT)
    m.Equation(m.tau12*m.TH2.dt()+(m.TH2-m.Ta)==m.K2*m.Q2-m.K3*m.DT)
    m.Equation(m.tau3*m.TC1.dt()+m.TC1==m.TH1)
    m.Equation(m.tau3*m.TC2.dt()+m.TC2==m.TH2)
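The transfer step itself isn't shown in the excerpt above; one simple way (my own sketch, not necessarily how the full TCLab example does it) is to copy the estimated FV values from the MHE model into the MPC model after each mhe.solve() and before mpc.solve():
# Hedged sketch (not from the linked example): push the newly estimated
# parameters from the MHE model into the MPC model between solves.
mpc.K1.value = mhe.K1.value[0]
mpc.K2.value = mhe.K2.value[0]
mpc.K3.value = mhe.K3.value[0]
mpc.tau12.value = mhe.tau12.value[0]
mpc.tau3.value = mhe.tau3.value[0]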
I have created code that returns the output I am after: 2 graphs with multiple lines on each. However, the code is slow and quite long (in terms of how many lines it takes). I am interested in any improvements that will help me get such graphs faster and make my code more presentable.
Additionally, I would like to add more to my graphs (axis names and titles are what I am after). Normally I would use plt.xlabel, plt.ylabel and plt.title to do so, but I couldn't quite see how to use them here. The aim is to add a line to each graph after each loop (I have adapted this piece of code to do so).
I should note that I need to use Python for this task (so I cannot change to anything else), and I do need the SymPy library to find the values that are plotted in my graphs.
My code so far is as follows:
import matplotlib.pyplot as plt
import sympy as sym
import numpy as np
sym.init_printing()
x, y = sym.symbols('x, y') # defining our unknown probabilities
al = np.arange(20,1000,5).reshape((196,1)) # values of alpha/beta
prob_of_strA = []
prob_of_strB = []
colours=['r','g','b','k','y']
pen_values = [[0,-5,-10,-25,-50],[0,-25,-50,-125,-250]]
fig1, ax1 = plt.subplots()
fig2, ax2 = plt.subplots()
for j in range(0,len(pen_values[1])):
    for i in range(0,len(al)): # choosing the value of beta
        A = sym.Matrix([[10, 50], [int(al[i]), pen_values[0][j]]]) # defining matrix A
        B = sym.Matrix([[pen_values[1][j], 50], [int(al[i]), 10]]) # defining matrix B
        sigma_r = sym.Matrix([[x, 1-x]]) # defining the vector of probabilities
        sigma_c = sym.Matrix([y, 1-y]) # defining the vector of probabilities
        ts1 = A * sigma_c ; ts2 = sigma_r * B # defining our utilities
        y_sol = sym.solvers.solve(ts1[0] - ts1[1],y,dict = True) # solving for y
        x_sol = sym.solvers.solve(ts2[0] - ts2[1],x,dict = True) # solving for x
        prob_of_strA.append(y_sol[0][y]) # adding the value of y to the vector
        prob_of_strB.append(x_sol[0][x]) # adding the value of x to the vector
    ax1.plot(al,prob_of_strA,colours[j],label = ["penalty = " + str(pen_values[0][j])]) # plotting value of y for a given penalty value
    ax2.plot(al,prob_of_strB,colours[j],label = ["penalty = " + str(pen_values[1][j])]) # plotting value of x for a given penalty value
    ax1.legend() # showing the legend
    ax2.legend() # showing the legend
    prob_of_strA = [] # emptying the vector for the next round
    prob_of_strB = [] # emptying the vector for the next round
You can save a couple of lines by initializing your empty vectors inside the loop, so you don't have to bother emptying them at the end.
for j in range(0,len(pen_values[1])):
    prob_of_strA = []
    prob_of_strB = []
    for i in range(0,len(al)): # choosing the value of beta
        A = sym.Matrix([[10, 50], [int(al[i]), pen_values[0][j]]]) # defining matrix A
        B = sym.Matrix([[pen_values[1][j], 50], [int(al[i]), 10]]) # defining matrix B
        sigma_r = sym.Matrix([[x, 1-x]]) # defining the vector of probabilities
        sigma_c = sym.Matrix([y, 1-y]) # defining the vector of probabilities
        ts1 = A * sigma_c ; ts2 = sigma_r * B # defining our utilities
        y_sol = sym.solvers.solve(ts1[0] - ts1[1],y,dict = True) # solving for y
        x_sol = sym.solvers.solve(ts2[0] - ts2[1],x,dict = True) # solving for x
        prob_of_strA.append(y_sol[0][y]) # adding the value of y to the vector
        prob_of_strB.append(x_sol[0][x]) # adding the value of x to the vector
    ax1.plot(al,prob_of_strA,colours[j],label = ["penalty = " + str(pen_values[0][j])]) # plotting value of y for a given penalty value
    ax2.plot(al,prob_of_strB,colours[j],label = ["penalty = " + str(pen_values[1][j])]) # plotting value of x for a given penalty value
    ax1.legend() # showing the legend
    ax2.legend() # showing the legend
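For the axis names and titles you asked about, use the Axes objects you already created instead of the plt.* functions; a short example (the label text is just illustrative):
# Object-oriented equivalents of plt.xlabel / plt.ylabel / plt.title
ax1.set_xlabel('alpha')
ax1.set_ylabel('probability of strategy A')
ax1.set_title('Strategy A')
ax2.set_xlabel('alpha')
ax2.set_ylabel('probability of strategy B')
ax2.set_title('Strategy B')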
I've been trying to create a 2D map of blobs of matter (a Gaussian random field) using a variance I have calculated. This variance is a 2D array. I have tried using numpy.random.normal, since it allows a 2D input for the variance, but it doesn't really create a map with the trend I expect from the input parameters. One of the important input constants, lambda_c, should manifest itself as the physical size (diameter) of the blobs. However, when I change lambda_c, the size of the blobs barely changes, if at all. For example, if I set lambda_c = 40 parsecs, the map should have blobs that are 40 parsecs in diameter. A MWE to produce the map using my variance:
import numpy as np
import random
import matplotlib.pyplot as plt
from matplotlib.pyplot import show, plot
import scipy.integrate as integrate
from scipy.interpolate import RectBivariateSpline
n = 300
c = 3e8
G = 6.67e-11
M_sun = 1.989e30
pc = 3.086e16 # parsec
Dds = 1097.07889283e6*pc
Ds = 1726.62069147e6*pc
Dd = 1259e6*pc
FOV_arcsec_original = 5.
FOV_arcmin = FOV_arcsec_original/60.
pix2rad = ((FOV_arcmin/60.)/float(n))*np.pi/180.
rad2pix = 1./pix2rad
x_pix = np.linspace(-FOV_arcsec_original/2/pix2rad/180.*np.pi/3600.,FOV_arcsec_original/2/pix2rad/180.*np.pi/3600.,n)
y_pix = np.linspace(-FOV_arcsec_original/2/pix2rad/180.*np.pi/3600.,FOV_arcsec_original/2/pix2rad/180.*np.pi/3600.,n)
X_pix,Y_pix = np.meshgrid(x_pix,y_pix)
conc = 10.
M = 1e13*M_sun
r_s = 18*1e3*pc
lambda_c = 40*pc ### The important parameter that doesn't seem to manifest itself in the map when changed
rho_s = M/((4*np.pi*r_s**3)*(np.log(1+conc) - (conc/(1+conc))))
sigma_crit = (c**2*Ds)/(4*np.pi*G*Dd*Dds)
k_s = rho_s*r_s/sigma_crit
theta_s = r_s/Dd
Renorm = (4*G/c**2)*(Dds/(Dd*Ds))
#### Here I just interpolate and zoom into my field of view to get better resolutions
A = np.sqrt(X_pix**2 + Y_pix**2)*pix2rad/theta_s
A_1 = A[100:200,0:100]
n_x = n_y = 100
FOV_arcsec_x = FOV_arcsec_original*(100./300)
FOV_arcmin_x = FOV_arcsec_x/60.
pix2rad_x = ((FOV_arcmin_x/60.)/float(n_x))*np.pi/180.
rad2pix_x = 1./pix2rad_x
FOV_arcsec_y = FOV_arcsec_original*(100./300)
FOV_arcmin_y = FOV_arcsec_y/60.
pix2rad_y = ((FOV_arcmin_y/60.)/float(n_y))*np.pi/180.
rad2pix_y = 1./pix2rad_y
x1 = np.linspace(-FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,n_x)
y1 = np.linspace(-FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,n_y)
X1,Y1 = np.meshgrid(x1,y1)
n_x_2 = 500
n_y_2 = 500
x2 = np.linspace(-FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,n_x_2)
y2 = np.linspace(-FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,n_y_2)
X2,Y2 = np.meshgrid(x2,y2)
interp_spline = RectBivariateSpline(y1,x1,A_1)
A_2 = interp_spline(y2,x2)
A_3 = A_2[50:450,0:400]
n_x_3 = n_y_3 = 400
FOV_arcsec_x = FOV_arcsec_original*(100./300)*400./500.
FOV_arcmin_x = FOV_arcsec_x/60.
pix2rad_x = ((FOV_arcmin_x/60.)/float(n_x_3))*np.pi/180.
rad2pix_x = 1./pix2rad_x
FOV_arcsec_y = FOV_arcsec_original*(100./300)*400./500.
FOV_arcmin_y = FOV_arcsec_y/60.
pix2rad_y = ((FOV_arcmin_y/60.)/float(n_y_3))*np.pi/180.
rad2pix_y = 1./pix2rad_y
x3 = np.linspace(-FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,n_x_3)
y3 = np.linspace(-FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,n_y_3)
X3,Y3 = np.meshgrid(x3,y3)
n_x_4 = 1000
n_y_4 = 1000
x4 = np.linspace(-FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,FOV_arcsec_x/2/pix2rad_x/180.*np.pi/3600.,n_x_4)
y4 = np.linspace(-FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,FOV_arcsec_y/2/pix2rad_y/180.*np.pi/3600.,n_y_4)
X4,Y4 = np.meshgrid(x4,y4)
interp_spline = RectBivariateSpline(y3,x3,A_3)
A_4 = interp_spline(y4,x4)
############### Function to calculate variance
variance = np.zeros((len(A_4),len(A_4)))
def variance_fluctuations(x):
    for i in range(len(x)):
        for j in range(len(x)):
            if x[j][i] < 1.:
                variance[j][i] = (k_s**2)*(lambda_c/r_s)*((np.pi/x[j][i]) - (1./(x[j][i]**2 -1)**3.)*(((6.*x[j][i]**4. - 17.*x[j][i]**2. + 26)/3.)+ (((2.*x[j][i]**6. - 7.*x[j][i]**4. + 8.*x[j][i]**2. - 8)*np.arccosh(1./x[j][i]))/(np.sqrt(1-x[j][i]**2.)))))
            elif x[j][i] > 1.:
                variance[j][i] = (k_s**2)*(lambda_c/r_s)*((np.pi/x[j][i]) - (1./(x[j][i]**2 -1)**3.)*(((6.*x[j][i]**4. - 17.*x[j][i]**2. + 26)/3.)+ (((2.*x[j][i]**6. - 7.*x[j][i]**4. + 8.*x[j][i]**2. - 8)*np.arccos(1./x[j][i]))/(np.sqrt(x[j][i]**2.-1)))))

variance_fluctuations(A_4)
#### Creating the map
mean = 0
delta_kappa = np.random.normal(0,variance,A_4.shape)
xfinal = np.linspace(-FOV_arcsec_x*np.pi/180./3600.*Dd/pc/2,FOV_arcsec_x*np.pi/180./3600.*Dd/pc/2,1000)
yfinal = np.linspace(-FOV_arcsec_x*np.pi/180./3600.*Dd/pc/2,FOV_arcsec_x*np.pi/180./3600.*Dd/pc/2,1000)
Xfinal, Yfinal = np.meshgrid(xfinal,yfinal)
plt.contourf(Xfinal,Yfinal,delta_kappa,100)
plt.show()
The map looks like this, with the density of blobs increasing towards the right. However, the size of the blobs doesn't change, and the map looks virtually the same whether I use lambda_c = 40*pc or lambda_c = 400*pc.
I'm wondering whether np.random.normal is really doing what I expect it to do. I feel like the pixel scale of the map and the way the samples are drawn have no link to the size of the blobs. Maybe there is a better way to create the map using the variance; I would appreciate any insight.
I expect the map to look something like this, with the blob sizes changing based on the input parameters of my variance:
This is quite a well-visited problem in (surprise, surprise) astronomy and cosmology.
You could use lenstools: https://lenstools.readthedocs.io/en/latest/examples/gaussian_random_field.html
You could also try here:
https://andrewwalker.github.io/statefultransitions/post/gaussian-fields
Not to mention:
https://github.com/bsciolla/gaussian-random-fields
I am not reproducing code here because all credit goes to the above authors. However, they all did just come right out of a Google search :/
Easiest of all is probably the Python module FyeldGenerator, apparently designed for this exact purpose:
https://github.com/cphyc/FyeldGenerator
So (adapted from the GitHub example):
pip install FyeldGenerator
from FyeldGenerator import generate_field
from matplotlib import use
use('Agg')
import matplotlib.pyplot as plt
import numpy as np
plt.figure()
# Helper that generates power-law power spectrum
def Pkgen(n):
def Pk(k):
return np.power(k, -n)
return Pk
# Draw samples from a normal distribution
def distrib(shape):
a = np.random.normal(loc=0, scale=1, size=shape)
b = np.random.normal(loc=0, scale=1, size=shape)
return a + 1j * b
shape = (512, 512)
field = generate_field(distrib, Pkgen(2), shape)
plt.imshow(field, cmap='jet')
plt.savefig('field.png',dpi=400)
plt.close()
This gives:
Looks pretty straightforward to me :)
PS: FoV implied a telescope observation of the gaussian random field :)
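If you want lambda_c to control the blob size with this route, one option (a sketch of my own, not something from the FyeldGenerator docs) is to build a cutoff scale into the power spectrum so that modes smaller than lambda_c are suppressed. Here lambda_c_pix is in pixels, which you would have to convert from parsecs using your pixel scale:
# Hedged variation of Pkgen: power-law spectrum with a Gaussian cutoff that
# suppresses wavelengths shorter than lambda_c_pix (in pixels).
def Pkgen_cutoff(n, lambda_c_pix):
    def Pk(k):
        return np.power(k, -n) * np.exp(-(k * lambda_c_pix)**2)
    return Pk

field_c = generate_field(distrib, Pkgen_cutoff(2, lambda_c_pix=40), shape)
plt.imshow(field_c, cmap='jet')
plt.savefig('field_cutoff.png', dpi=400)
Larger lambda_c_pix should then give visibly larger blobs, which is the behaviour you were expecting from lambda_c.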
A completely different and much quicker way may be just to blur the delta_kappa array with a Gaussian filter. Try adjusting the sigma parameter to alter the blob size.
from scipy.ndimage import gaussian_filter
dk_gf = gaussian_filter(delta_kappa, sigma=20)
Xfinal, Yfinal = np.meshgrid(xfinal,yfinal)
plt.contourf(Xfinal,Yfinal,dk_gf,100, cmap='jet')
plt.show()
This is the image with sigma=20:
This is the image with sigma=2.5:
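If you want sigma to track lambda_c rather than being tuned by eye, here is a rough sketch (an assumption on my part) using the pixel scale of the final 1000x1000 grid from your MWE, where xfinal is in parsecs:
# Hedged sketch: convert lambda_c to pixels of the final grid and use roughly
# the blob radius as the filter width.
pixel_pc = xfinal[1] - xfinal[0]              # physical size of one pixel in pc
sigma_pix = (lambda_c / pc) / pixel_pc / 2.0  # ~ half of lambda_c, in pixels
dk_gf = gaussian_filter(delta_kappa, sigma=sigma_pix)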
ThunderFlash, try this code to draw the map:
# function to produce blobs:
from scipy.stats import multivariate_normal
def blob (positions, mean=(0,0), var=1):
    cov = [[var,0],[0,var]]
    return multivariate_normal(mean, cov).pdf(positions)
"""
now prepare for blobs generation.
note that I use less dense grid to pick blobs centers (regulated by `step`)
this makes blobs more pronounced and saves calculation time.
use this part instead of your code section below comment #### Creating the map
"""
delta_kappa = np.random.normal(0,variance,A_4.shape) # same
step = 10 #
dk2 = delta_kappa[::step,::step] # taking every 10th element
x2, y2 = xfinal[::step],yfinal[::step]
field = np.dstack((Xfinal,Yfinal))
print (field.shape, dk2.shape, x2.shape, y2.shape)
# >> (1000, 1000, 2), (100, 100), (100,), (100,)
result = np.zeros(field.shape[:2])
for x in range (len(x2)):
    for y in range (len(y2)):
        res2 = blob(field, mean = (x2[x], y2[y]), var=10000)*dk2[x,y]
        result += res2
# the cycle above took over 20 minutes on Ryzen 2700X. It could be accelerated by vectorization presumably.
plt.contourf(Xfinal,Yfinal,result,100)
plt.show()
You may want to play with the var parameter in blob() to smooth the image, and with step to make it more compressed.
Here is the image that I got using your code (somehow the axes are flipped and the denser areas are at the top):
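On the vectorization note: the double loop is a sum of identical Gaussians weighted by dk2, i.e. a convolution, so a hedged sketch of a much faster equivalent is to place the dk2 values on the full grid as spikes and smooth them with a Gaussian kernel (the sigma conversion assumes Xfinal/Yfinal are in the same units as var, and the overall amplitude scale may differ from the loop version):
# Hedged vectorized sketch: spikes at the blob centres, smoothed with a Gaussian.
from scipy.ndimage import gaussian_filter
spikes = np.zeros(delta_kappa.shape)
spikes[::step, ::step] = dk2                           # weights at the blob centres
sigma_pix = np.sqrt(10000) / (xfinal[1] - xfinal[0])   # blob width converted to pixels
result_fast = gaussian_filter(spikes, sigma=sigma_pix)
plt.contourf(Xfinal, Yfinal, result_fast, 100)
plt.show()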
I have figured out a method to cluster dispersed point data into a structured 2-D array (like a rasterize function), and I hope there are better ways to achieve that target.
My work
1. Intro
1000 data points with three properties (lon, lat, emission), where each point represents one factory located at (x, y) emitting a certain amount of CO2 into the atmosphere
grid network: a predefined 2-D array with shape 20x20
http://i4.tietuku.com/02fbaf32d2f09fff.png
The code is reproduced here:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from scipy.spatial import KDTree

#### define the map area
xc1,xc2,yc1,yc2 = 113.49805889531724,115.5030664238035,37.39995194888143,38.789235929357105
map = Basemap(llcrnrlon=xc1,llcrnrlat=yc1,urcrnrlon=xc2,urcrnrlat=yc2)
#### reading the point data and scatter plot by their position
df = pd.read_csv("xxxxx.csv")
px,py = map(df.lon, df.lat)
map.scatter(px, py, color = "red", s= 5,zorder =3)
#### predefine the grid networks
lon_grid,lat_grid = np.linspace(xc1,xc2,21), np.linspace(yc1,yc2,21)
lon_x,lat_y = np.meshgrid(lon_grid,lat_grid)
grids = np.zeros(20*20).reshape(20,20)
plt.pcolormesh(lon_x,lat_y,grids,cmap = 'gray', facecolor = 'none',edgecolor = 'k',zorder=3)
2. My target
Finding the nearest grid point for each factory
Add the emission data to that grid point
3. Algorithm realization
3.1 Raster grid
note: the 20x20 grid points distributed in this area are represented by blue dots.
http://i4.tietuku.com/8548554587b0cb3a.png
3.2 KD-tree
Find the nearest blue dot of each red point
sh = (20*20,2)
grids = np.zeros(20*20*2).reshape(*sh)
sh_emission = (20*20)
grids_em = np.zeros(20*20).reshape(sh_emission)
k = 0
for j in range(0,yy.shape[0],1):
    for i in range(0,xx.shape[0],1):
        grids[k] = np.array([lon_grid[i],lat_grid[j]])
        k+=1
T = KDTree(grids)
x_delta = (lon_grid[2] - lon_grid[1])
y_delta = (lat_grid[2] - lat_grid[1])
R = np.sqrt(x_delta**2 + y_delta**2)
for i in range(0,len(df.lon),1):
    idx = T.query_ball_point([df.lon.iloc[i],df.lat.iloc[i]], r=R)
    # sometimes more than one blue dot is found,
    # so I calculate the distances between the factory (red point)
    # and all the blue dots that were returned
    if len(idx) > 1:
        distance = []
        for k in range(0,len(idx),1):
            distance.append(np.sqrt((df.lon.iloc[i] - grids[idx[k]][0])**2 + (df.lat.iloc[i] - grids[idx[k]][1])**2))
        pos_index = distance.index(min(distance))
        pos = idx[pos_index]
    # Only 1 point found
    else:
        pos = idx
    grids_em[pos] += df.so2[i]
4. Result
co2 = grids_em.reshape(20,20)
plt.pcolormesh(lon_x,lat_y,co2,cmap =plt.cm.Spectral_r,zorder=3)
http://i4.tietuku.com/6ded65c4ac301294.png
5. My question
Can someone point out some drawbacks or errors of this method?
Are there algorithms more aligned with my target?
Thanks a lot!
There are many for-loops in your code; that's not the numpy way.
Make some sample data first:
import numpy as np
import pandas as pd
from scipy.spatial import KDTree
import pylab as pl
xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
N = 1000
GSIZE = 20
x, y = np.random.multivariate_normal([(xc1 + xc2)*0.5, (yc1 + yc2)*0.5], [[0.1, 0.02], [0.02, 0.1]], size=N).T
value = np.ones(N)
df_points = pd.DataFrame({"x":x, "y":y, "v":value})
For equally spaced grids you can use hist2d():
pl.hist2d(df_points.x, df_points.y, weights=df_points.v, bins=20, cmap="viridis");
Here is the output:
Here is the code to use KdTree:
X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]
grid = np.c_[X.ravel(), Y.ravel()]
points = np.c_[df_points.x, df_points.y]
tree = KDTree(grid)
dist, indices = tree.query(points)
grid_values = df_points.groupby(indices).v.sum()
df_grid = pd.DataFrame(grid, columns=["x", "y"])
df_grid["v"] = grid_values
fig, ax = pl.subplots(figsize=(10, 8))
ax.plot(df_points.x, df_points.y, "kx", alpha=0.2)
mapper = ax.scatter(df_grid.x, df_grid.y, c=df_grid.v,
cmap="viridis",
linewidths=0,
s=100, marker="o")
pl.colorbar(mapper, ax=ax);
the output is:
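If you want the result back as a structured 20x20 array (the stated target), you can reshape the grid values to match X and Y; a short sketch using the names defined above (grid points with no factories become NaN and are filled with 0 here):
# Reshape the per-grid-point sums back into a GSIZE x GSIZE array for pcolormesh.
co2 = df_grid.v.fillna(0).to_numpy().reshape(GSIZE, GSIZE)
fig, ax = pl.subplots(figsize=(10, 8))
mesh = ax.pcolormesh(X, Y, co2, cmap="viridis", shading="nearest")
pl.colorbar(mesh, ax=ax)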