I have a function that returns the density estimate for points (x, y). I would like to iterate over all (x, y) points for a given 2-D grid and have the density function compute the estimate for each point so that I can have a matrix of density values which I can then plot.
Say the function is called density(x, y), and that it takes any point (x, y) and returns the density estimate (z) for that (x, y). I would like to apply the function to each point within a 2-dimensional grid and store the density estimates in a matrix, so that I could use, say, plt.pcolormesh() to view the density.
How can I do this?
I think you want something along these lines.
First, define a density function. For simplicity, I am taking the function |x| + |y|.
import numpy as np

def density(x, y):
    return np.abs(x) + np.abs(y)
Now let's define the points along the x and y dimensions and populate the arrays. In the following example, x and y are 1D arrays holding n_x and n_y evenly spaced points in [-1, 1].
n_x = 100
n_y = 100
x = np.linspace(-1, 1, n_x)
y = np.linspace(-1, 1, n_y)
Build the grid of coordinate pairs and evaluate the density D at each point of the grid.
xx, yy = np.meshgrid(x, y)
D = density(xx, yy)
Note that you don't need to explicitly iterate over the meshgrid; the seemingly scalar density() function can be applied to the arrays xx and yy directly, since NumPy evaluates it elementwise. For details about meshgrid, see this page.
Next simply use pcolormesh() to display or save.
import matplotlib.pyplot as plt

plt.pcolormesh(x, y, D)
plt.title('Density function = |x| + |y|')
plt.savefig('density.png')
The output is:
Related
I am experimenting with gradient descent and want to plot a contour of the gradient given independent variables x and y.
The optimization objective is to estimate a point given only a list of points and the distances to each of those points. I have a list of vectors of form [(x_1, y_1, d_1), ..., (x_n, y_n, d_n)] where d_i is the measured distance from the point to be estimated to the point (x_i, y_i), and I have a function g(x, y) that returns the gradient at the point (x, y). (The function g(x, y) uses the training vectors to calculate the gradient.)
The gradient descent algorithm works fine and arrives at a close estimate to the actual point coordinates. I want now to visualize the gradient as a contour map. I have the following for x and y values:
xlist = np.linspace(min([v[0] for v in vectors])-1, max([v[0] for v in vectors])+1, 100)
ylist = np.linspace(min([v[1] for v in vectors])-1, max([v[1] for v in vectors])+1, 100)
X, Y = np.meshgrid(xlist, ylist)
But now I need a Z value that maps each pair of coordinates in the grid mesh to g(x, y), and it needs to be the correct shape for the matplotlib contour plot. The examples I have seen have been useless because they all simply multiplied the x and y arrays to generate z values (which obviously will not work in this case), and all the tips, tricks, and SO answers I have encountered ultimately did not help.
How do I use my custom function g(x, y) to create the 2D Z array necessary for constructing a valid contour plot?
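A minimal sketch of one common approach (not from the original post), assuming g(x, y) can be reduced to one scalar per point, e.g. the norm of the gradient it returns: vectorize that scalar wrapper and evaluate it directly on the meshgrid.
import numpy as np
import matplotlib.pyplot as plt

# Sketch: g is the user's gradient function; gnorm is a hypothetical scalar
# wrapper (here, the magnitude of the gradient) so each grid point maps to one Z value.
gnorm = np.vectorize(lambda xi, yi: np.linalg.norm(g(xi, yi)))
Z = gnorm(X, Y)          # same shape as X and Y
plt.contour(X, Y, Z)     # Z now has the shape contour() expects
plt.show()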
I have strangely arranged spatial data. It describes something like temperature. There are two arrays of spatial coordinates, x and y. Here is how those data are arranged:
plt.scatter(x,y, c=x.index, cmap=plt.cm.rainbow)
for label, xi, yi in zip([str(s) for s in range(x.size)], x, y):
    plt.annotate(label, xy=(xi, yi))
plt.show()
I also have a vector u describing the temperature field. Each entry u[i] corresponds to the position x[i], y[i]. How can I reshape u into a meshgrid, where each mesh position corresponds to the positions described by x and y? For example, this would allow me to pass the meshgrid of u to plt.contourf so that it looks similar to plt.scatter(x, y, c=u). I don't want to assume that the grid is square.
Here's what I've tried so far:
nx = len(np.unique(x))
ny = len(np.unique(y))
U = np.reshape(u, (nx, ny))
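A minimal sketch of an alternative (not from the original post), assuming the points don't actually lie on a regular grid: matplotlib's triangulation-based contouring works directly on scattered points, so no reshaping into a meshgrid is needed.
import matplotlib.pyplot as plt

# x, y, u are equal-length 1D arrays of coordinates and values.
# tricontourf triangulates the scattered (x, y) points and fills contours of u.
plt.tricontourf(x, y, u, levels=20)
plt.scatter(x, y, c=u, edgecolor='k')  # overlay the raw points for comparison
plt.colorbar(label='u')
plt.show()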
I have a data set comprising a long array of x-values and an equally long array of y-values. For each (x,y) pair, I want to find the nearest points on a known function y(x).
I could in principle loop over each pair and perform a minimization such as scipy.optimize.cobyla, but looping in python is slow. Scipy's odr package looks interesting, but I can't figure out how to make it simply return the orthogonal vectors without also minimizing the whole thing (setting the maximum iterations "maxit" to zero doesn't give me what I want).
Is there a simple way to get this done using the speed of numpy arrays?
The answer is simple:
Don't loop over the points in your list; loop over the points on your function curve.
I take the liberty of renaming your function y(x) to f(z) to avoid confusion.
import numpy as np

# x and y are your numpy arrays of point coords
x = np.array([1, 2])
y = np.array([3, 4])

# this is your "y(x)" function
def f(z):
    return z**2

xmin = x.min()
xmax = x.max()
step = 0.01  # choose your step at the precision you want

# find distances to every point
zpoints = np.arange(xmin, xmax, step)
distances_squared = np.array([(y - f(z))**2 + (x - z)**2 for z in zpoints])

# find z coords of closest points
zmin = zpoints[distances_squared.argmin(axis=0)]
fmin = np.array([f(z) for z in zmin])

for i in range(len(x)):
    print("point on the curve {},{} is closest to {},{}".format(zmin[i], fmin[i], x[i], y[i]))
point on the curve 1.6700000000000006,2.788900000000002 is closest to 1,3
point on the curve 1.9900000000000009,3.9601000000000033 is closest to 2,4
There is a way to speed up Hennadii Madan's approach, by asking numpy to do the looping instead of python. As usual, this comes at the expense of additional RAM.
Below is the function I am now using for 2d. A nice feature is it's symmetric -- one can swap the data sets and the computation time will be the same.
import numpy as _n  # this code uses numpy under the alias _n

def find_nearests_2d(x1, y1, x2, y2):
    """
    Given two data sets d1 = (x1, y1) and d2 = (x2, y2), return the x,y pairs
    from d2 that are closest to each pair from d1, the difference vectors, and
    the d2 indices of these closest points.

    Parameters
    ----------
    x1
        1D array of x-values for data set 1.
    y1
        1D array of y-values for data set 1 (must match size of x1).
    x2
        1D array of x-values for data set 2.
    y2
        1D array of y-values for data set 2 (must match size of x2).

    Returns
    -------
    x2mins
        1D array of minimum-distance x-values from data set 2. One value for each x1.
    y2mins
        1D array of minimum-distance y-values from data set 2. One value for each y1.
    xdiffs
        1D array of differences in x. One value for each x1.
    ydiffs
        1D array of differences in y. One value for each y1.
    d2s
        1D array of squared distances to the nearest point. One value for each x1.
    indices
        Indices of each minimum-distance point in data set 2. One for each point in
        data set 1.
    """
    # Generate every combination of points for subtracting
    x1s, x2s = _n.meshgrid(x1, x2)
    y1s, y2s = _n.meshgrid(y1, y2)

    # Calculate all the differences
    dx = x1s - x2s
    dy = y1s - y2s
    d2 = dx**2 + dy**2

    # Find the index of the minimum for each data point
    n = _n.argmin(d2, 0)

    # Index for extracting from the meshgrids
    m = range(len(n))

    return x2s[n,m], y2s[n,m], dx[n,m], dy[n,m], d2[n,m], n
One can also then use this to quickly estimate the distance between x,y pairs and a function:
def find_nearests_function(x, y, f, *args, fpoints=1000):
    """
    Takes a data set (arrays of x and y values), and a function f(x, *args),
    then estimates the points on the curve f(x) that are closest to each of
    the data set's x,y pairs.

    Parameters
    ----------
    x
        1D array of x-values for data set 1.
    y
        1D array of y-values for data set 1 (must match size of x).
    f
        A function of the form f(x, *args) with optional additional arguments.
    *args
        Optional additional arguments to send to f (after argument x).
    fpoints=1000
        Number of evenly-spaced points to search in the x-domain (automatically
        the maximum possible range).
    """
    # Make sure everything is a numpy array
    x = _n.array(x)
    y = _n.array(y)

    # First figure out the range we need for f. Since the function is single-
    # valued, we can put bounds on the x-range: for each point, calculate the
    # y-distance, and subtract / add this to the x-values
    dys = _n.abs(f(x, *args) - y)
    xmin = min(x - dys)
    xmax = max(x + dys)

    # Get "dense" function arrays
    xf = _n.linspace(xmin, xmax, fpoints)
    yf = f(xf, *args)

    # Find all the minima
    xfs, yfs, dxs, dys, d2s, n = find_nearests_2d(x, y, xf, yf)

    # Return this info plus the function arrays used
    return xfs, yfs, dxs, dys, d2s, n, xf, yf
If this is part of an orthogonal distance regression (like in my case), the differences dx and dy can be readily scaled by error bar data sets without much overhead, such that the returned distances are studentized (unitless) residuals.
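For example, a minimal sketch, assuming hypothetical per-point error-bar arrays ex and ey (not part of the code above):
# ex, ey: hypothetical 1D arrays of x and y uncertainties, one per data point.
# Dividing each difference by its error bar makes the residuals unitless, so the
# resulting quantity behaves like a studentized residual.
studentized_d2 = (dxs / ex)**2 + (dys / ey)**2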
Ultimately, this "search everywhere uniformly" technique will only get you close, and will fail if the function isn't particularly smooth over the range of x data.
Quick test code:
x = [1,2,5]
y = [1,-1,1]
def f(x): return _n.cos(x)
fxmin, fymin, dxmin, dymin, d2min, n, xf, yf = find_nearests_function(x, y, f)
import pylab
pylab.plot(x,y, marker='o', ls='', color='m', label='input points')
pylab.plot(xf,yf, color='b', label='function')
pylab.plot(fxmin,fymin, marker='o', ls='', color='r', label='nearest points')
pylab.legend()
pylab.show()
produces
I have a 3D dataset (X, Y, Z). I would like to perform KDE, plot the data and its estimate, then get the zero crossings and plot them with the KDE. My attempt is below. I have the following questions:
The lines X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j] and positions = np.vstack([X.ravel(), Y.ravel(), Z.ravel()]), as here (kde documentation): will they have any effect on visualising the real estimate for the original data? I don't really understand why I have to use my min and max to perform KDE and then use ravel().
Why do I have to transpose the data in f = np.reshape(kernel(positions).T, X.shape)?
Is the code correct?
I failed to plot the original data with the KDE estimate, and the KDE estimate / original data with the zero crossings:
Should the zero crossings be a vector? In the code below they are a tuple.
import numpy as np
import pandas as pd
import scipy.stats
import matplotlib.pyplot as plt

df = pd.read_csv(file, delimiter=',')

# Convert series from the data frame into arrays
X = np.array(df['x'])
Y = np.array(df['y'])
Z = np.array(df['z'])
data = np.vstack([X, Y, Z])

# perform KDE
kernel = scipy.stats.kde.gaussian_kde(data)
density = kernel(data)
fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
x, y, z = data
scatter = ax.scatter(x, y, z, c=density)
xmin = data[0].min()
xmax = data[0].max()
ymin = data[1].min()
ymax = data[1].max()
zmin = data[2].min()
zmax = data[2].max()
X, Y, Z = np.mgrid[xmin:xmax:100j, ymin:ymax:100j, zmin:zmax:100j]
positions = np.vstack([X.ravel(), Y.ravel(), Z.ravel()])
f = np.reshape(kernel(positions).T, X.shape)
derivative = np.gradient(f)
dz, dy, dx = derivative
xdiff = np.sign(dx) # along X-axis
ydiff = np.sign(dy) # along Y-axis
zdiff = np.sign(dz) # along Z-axis
xcross = np.where(xdiff[:-1] != xdiff[1:])
ycross = np.where([ydiff[:-1] != ydiff[1:]])
zcross = np.where([zdiff[:-1] != zdiff[1:]])
Zerocross = xcross + ycross + zcross
The lines X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j] and positions = np.vstack([X.ravel(), Y.ravel(), Z.ravel()]), as here (kde documentation): will they have any effect on visualising the real estimate for the original data? I don't really understand why I have to use my min and max to perform KDE and then use ravel().
Those two lines set up a grid of x, y, z locations where the KDE will be evaluated. In the code above they are only being used to estimate the derivative of the kernel density function. Since they aren't currently being used for anything related to plotting, they won't affect the visualisation.
xmin, xmax etc. are used to ensure that the grid covers the full range of x, y, z values in your data. The syntax xmin:xmax:100j does the equivalent of np.linspace(xmin, xmax, 100), i.e. np.mgrid returns 100 evenly spaced points between xmin and xmax.
The X, Y and Z arrays returned by np.mgrid will each have shape (100, 100, 100), whereas the positions argument to kernel(positions) needs to be (n_dimensions, n_points). The line np.vstack([X.ravel(), Y.ravel(), Z.ravel()]) just reshapes the output of np.mgrid into this form. .ravel() flattens each (100, 100, 100) array into a (1000000,) vector, and np.vstack stacks them as rows to make a (3, 1000000) array of points.
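A minimal sketch of those shapes, using a smaller grid for readability (not from the original answer):
import numpy as np

# np.mgrid with a complex step 4j gives 4 evenly spaced points including the endpoint,
# equivalent to np.linspace(0, 1, 4) along each axis.
X, Y, Z = np.mgrid[0:1:4j, 0:1:4j, 0:1:4j]
print(X.shape)                                     # (4, 4, 4)
positions = np.vstack([X.ravel(), Y.ravel(), Z.ravel()])
print(positions.shape)                             # (3, 64)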
Why do I have to transpose the data in f = np.reshape(kernel(positions).T, X.shape)?
You don't :-). The output of kernel(positions) is a 1D vector, so transposing it will have no effect.
I failed to plot the original data with the KDE estimate, and the KDE estimate / original data with the zero crossings:
What did you try? The code above seems to estimate zero-crossings of the gradient of the kernel density function, but doesn't include any code to plot them. What sort of a plot do you want to make?
Should the zero crossings be a vector? In the code below they are a tuple.
When you call np.where(x) where x is a multidimensional array, you get back a tuple containing the indices where x is non-zero. Since xdiff[:-1] != xdiff[1:] is a 3D array, you will get back a tuple containing three 1D arrays of indices, one per dimension.
You probably don't want the extra set of square brackets in np.where([ydiff[:-1] != ydiff[1:]]), since in that case [ydiff[:-1] != ydiff[1:]] will be treated as a (1, 99, 100, 100) array rather than (99, 100, 100), and you'll therefore get a tuple containing 4 arrays of indices rather than 3 (the first one will be all zeros, since the size in the first dimension is 1).
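For example, dropping the extra brackets makes all three calls consistent (a sketch of the corrected lines):
# Each call returns a tuple of three 1D index arrays, one per dimension
xcross = np.where(xdiff[:-1] != xdiff[1:])
ycross = np.where(ydiff[:-1] != ydiff[1:])
zcross = np.where(zdiff[:-1] != zdiff[1:])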
I would like to know how numpy.gradient works.
I used gradient to try to calculate the group velocity (the group velocity of a wave packet is the derivative of the frequencies with respect to the wavenumbers, not a group of velocities). I fed it a 3-column array: the first 2 columns are the x and y coordinates, and the third column is the frequency at that point (x, y). I need to calculate the gradient, and I expected a 2D vector, the gradient by definition being
df/dx*i+df/dy*j+df/dz*k
and since my function is a function of x and y only, I expected something like
df/dx*i+df/dy*j
But I got 2 arrays with 3 columns each, i.e. two 3D vectors; at first I thought that the sum of the two would give me the vector I was searching for, but the z component doesn't vanish. I hope I've been sufficiently clear in my explanation. I would like to know how numpy.gradient works and whether it's the right choice for my problem. Otherwise, I would like to know if there's any other Python function I can use.
What I mean is: I want to calculate the gradient of an array of values:
data=[[x1,x2,x3]...[x1,x2,x3]]
where x1, x2 are point coordinates on a uniform grid (my points in the Brillouin zone) and x3 is the frequency value at that point. I also pass in the derivation steps for the two directions:
stepx = abs(max(unique(data[:,0])) - min(unique(data[:,0]))) / (len(unique(data[:,0])) - 1)
and the same for the y direction.
I didn't build my data on a grid; I already have a grid, and this is why the kind examples given here in the answers do not help me.
A more fitting example would have a grid of points and values like the one I have:
data = []
for i in range(10):
    for j in range(10):
        data.append([i, j, i**2 + j**2])
data = array(data, dtype=float)

gx, gy = gradient(data)
Another thing I can add is that my grid is not square but has the shape of a polygon, being the Brillouin zone of a 2D crystal.
I've understood that numpy.gradient works properly only on a square grid of values, which is not what I'm searching for. Even if I put my data on a grid with lots of zeroes outside the polygon of my original data, that would add really large vectors to my gradient, negatively affecting the precision of the calculation. This module seems to me more a toy than a tool; it has severe limitations, imho.
Problem solved using dictionaries.
You need to give gradient a matrix that describes your angular frequency values for your (x,y) points. e.g.
import numpy as np

def f(x, y):
    return np.sin(x + y)

x = y = np.arange(-5, 5, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x, y) for x, y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
gy, gx = np.gradient(Z, 0.05, 0.05)  # axis 0 varies with y, axis 1 with x
You can see that plotting Z as a surface gives:
Here is how to interpret your gradient:
gx is a matrix that gives the change dz/dx at all points. e.g. gx[0][0] is dz/dx at (x0,y0). Visualizing gx helps in understanding:
Since my data was generated from f(x,y) = sin(x+y), gy looks the same.
Here is a more obvious example using f(x,y) = sin(x)...
f(x,y)
and the gradients
Update: let's take a look at the xy pairs.
This is the code I used:
def f(x, y):
    return np.sin(x)

x = y = np.arange(-3, 3, .05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x, y) for x, y in zip(np.ravel(X), np.ravel(Y))])
xy_pairs = np.array([str(x) + ',' + str(y) for x, y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
xy_pairs = xy_pairs.reshape(X.shape)
gy, gx = np.gradient(Z, .05, .05)
Now we can look and see exactly what is happening. Say we want to know which point is associated with the value at Z[20][30]. Then...
>>> Z[20][30]
-0.99749498660405478
And the point is
>>> xy_pairs[20][30]
'-1.5,-2.0'
Is that right? Let's check.
>>> np.sin(-1.5)
-0.99749498660405445
Yes.
And what are our gradient components at that point?
>>> gy[20][30]
0.0
>>> gx[20][30]
0.070707731517679617
Do those check out?
dz/dy is always 0: check.
dz/dx = cos(x), and...
>>> np.cos(-1.5)
0.070737201667702906
Looks good.
You'll notice they aren't exactly correct; that is because my Z data isn't continuous, there is a step size of 0.05, and gradient can only approximate the rate of change.
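A quick way to see the size of that approximation error (a sketch, not from the original answer): compare the finite-difference gradient of sin(x) against the exact derivative cos(x) on the same 0.05 grid.
import numpy as np

x = np.arange(-3, 3, 0.05)
numeric = np.gradient(np.sin(x), 0.05)   # finite-difference approximation
exact = np.cos(x)
print(np.max(np.abs(numeric - exact)))   # small but nonzero, set by the 0.05 step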