Transform polynomial graph - Multiply by a list of values - python

I'm using matplotlib. I have a list of 600 values. I also have a polynomial function that I'm graphing with values between 0 and 600. I'm trying to multiply every point by the corresponding value in the list.
I could evaluate the polynomial in a loop, and do the multiplication there, but I would end up with a graph of points instead of a line.
I think I might need to use the Transformations framework, but I'm not sure how to apply it to the graph.
Edit:
a = [5, 2, 3 ... 0, 2, 8] # 600 values
poly_a = polyfit(a)
deriv_a = polyder(poly_a)
b = [232, 342 ... 346, 183] # 600 values
I need to multiply deriv_a by b.

I think you're misunderstanding things a bit. This is what numpy is for (if you're using matplotlib, it's already converting your data to numpy arrays when you plot, regardless).
Just convert your "list of 600 values" to a numpy array and then evaluate the polynomial.
As an example:
import numpy as np
import matplotlib.pyplot as plt
# Your "list of 600 values"...
x = np.linspace(0, 10, 600)
# Evaluate a polynomial at each location in `x`
y = -1.3 * x**3 + 10 * x**2 - 3 * x + 10
plt.plot(x, y)
plt.show()
Edit:
Based on your edit, it sounds like you're asking how to use numpy.polyder?
Basically, you just want to use numpy.polyval to evaluate the polynomial returned by polyder at your point locations.
To build on the example above:
import numpy as np
import matplotlib.pyplot as plt
# Your "list of 600 values"...
x = np.linspace(0, 10, 600)
coeffs = [-1.3, 10, -3, 10]  # coefficients of the polynomial above
# Evaluate a polynomial at each location in `x`
y = np.polyval(coeffs, x)
# Calculate the derivative
der_coeffs = np.polyder(coeffs)
# Evaluate the derivative on the same points...
y_prime = np.polyval(der_coeffs, x)
# Plot the two...
fig, (ax1, ax2) = plt.subplots(nrows=2)
ax1.plot(x, y)
ax1.set_title('Original Function')
ax2.plot(x, y_prime)
ax2.set_title('Derivative')
plt.show()
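And to close the loop on the original question (multiplying the evaluated derivative by the second list), here is a minimal sketch that builds on the block above; `b` is just a placeholder for the OP's 600-value list:
# `b` stands in for the OP's second list of 600 values
b = np.random.rand(600) * 300
# elementwise product with the evaluated derivative -- still a smooth 600-point curve
y_scaled = y_prime * np.asarray(b)
plt.plot(x, y_scaled)
plt.show()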

Related

Estimate joint density with 2d Gaussian kernel

I have the following data set where I have to estimate the joint density of 'bwt' and 'age' using kernel density estimation with a 2-dimensional Gaussian kernel and width h=5. I can't use modules such as scipy, where there are ready-made functions to do this, and have to build the functions myself. Here's what I've gotten so far.
import numpy as np
import pandas as pd
babies_full = pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
x=np.array(babies_full1,'int')
#2d Gaussian kernel
def k_2dgauss(x):
    return np.exp(-np.sum(x**2, 1)/2) / np.sqrt(2*np.pi)
#Multivariate kernel density
def mv_kernel_density(t, x, h):
    d = x.shape[1]
    return np.mean(k_2dgauss((t - x)/h))/h**d
t = np.linspace(1.0, 5.0, 50)
h=5
print(mv_kernel_density(t, x, h))
However, I get a value error 'ValueError: operands could not be broadcast together with shapes (50,) (1173,2)', which I think is because of the different shapes of the matrices. I also don't understand why k_2dgauss(x) returns an array of zeros for me, since it should only return one value. In general, I am new to the concept of kernel density estimation and don't really know if I've written the functions right, so any hints would help!
Following on from my comments on your original post, I think this is what you want to do, but if not then come back to me and we can try again.
# info supplied by OP
import numpy as np
import pandas as pd
babies_full = pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
x=np.array(babies_full1,'int')
# my contributions
from math import floor, ceil
def binMaker(arr, base):
    """Function I already use for this sort of thing.
    arr is the array I want to make bins for;
    base is the bin separation (requires floor and ceil from math),
    otherwise you can make these bins manually yourself."""
    binMin = floor(arr.min() / base) * base
    binMax = ceil(arr.max() / base) * base
    return np.arange(binMin, binMax + base, base)
bins1 = binMaker(x[:,0], 20.) # bins from 140. to 360. spaced 20 apart
bins2 = binMaker(x[:,1], 5.) # bins from 15. to 45. spaced 5. apart
counts = np.zeros((len(bins1)-1, len(bins2)-1)) # empty array for counts to go in
for i in range(0, len(bins1)-1): # loop over the intervals, hence the -1
    boo = (x[:,0] >= bins1[i]) * (x[:,0] < bins1[i+1])
    for j in range(0, len(bins2)-1): # loop over the intervals, hence the -1
        counts[i,j] = np.count_nonzero((x[boo,1] >= bins2[j]) *
                                       (x[boo,1] < bins2[j+1]))
# if you want your PDF to be a fraction of the total
# rather than the number of counts, do the next line
counts /= x.shape[0]
# plotting
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# setting the levels so that each number in counts has its own colour
levels = np.linspace(-0.5, counts.max()+0.5, int(counts.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, counts, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('Manually making a 2D (joint) PDF')
If this is what you wanted, then there is an easier way with np.histogram2d, although I think you specified it had to be done with your own methods and not built-in functions. I've included it anyway for completeness' sake.
pdf = np.histogram2d(x[:,0], x[:,1], bins=(bins1,bins2))[0]
pdf /= x.shape[0] # again for normalising and making a percentage
levels = np.linspace(-0.5, pdf.max()+0.5, int(pdf.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, pdf, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('using np.histogram2d to make a 2D (joint) PDF')
Final note - in this example, the only place where counts doesn't equal pdf is the bin between 40 <= age < 45 and 280 <= gestation < 300, which I think is due to how, in my manual case, I've used <= and <, and I'm a little unsure how np.histogram2d handles values outside the bin ranges, or on the bin edges, etc. We can see the element of x that is responsible:
>>> print(x[1011])
[280 45]
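For what it's worth, a quick self-contained check (the bins are just rebuilt here to match the ones above) suggests the difference comes from np.histogram2d closing its last bin on the right: a point with age exactly 45 falls inside the last age bin, while the manual loop's strict `<` upper bound drops it.
import numpy as np
bins1 = np.arange(140., 380., 20.)   # gestation bins, 140 to 360
bins2 = np.arange(15., 50., 5.)      # age bins, 15 to 45
h, _, _ = np.histogram2d([280.], [45.], bins=(bins1, bins2))
print(h.sum())   # 1.0 -> the point is counted in the last age bin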

Efficiency of Code for Matrix operations on Wave equation

I've written a wave superposition program that overlaps wave equations of multiple wave sources and then gives a single wave which contains all the constructive and destructive interferences and plots the intensities of superpositions. This code works but is very inefficient.
If I give the points here as a 1000x1000 grid, the whole program takes a while to run.
Is there any way I can make this code more efficient and cleaner using one or all of the following (functions, lambda functions, mappables, defining 2D numpy arrays directly, or similar)?
If so, is there a way to measure the time it takes to run the operation? This isn't homework; I am trying to build something on my own for my optics research. Thanks so much for your help in advance, I really appreciate it.
import numpy as np
from matplotlib import pyplot as plt
from sklearn import mixture
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import scipy
import scipy.ndimage
import scipy.misc
xmin, xmax = 0,25
ymin, ymax = -12.500,12.500
xpoints, ypoints = 500,500
points,amp,distance,wave=[],[],[],[]
numsource=11
x=np.linspace(xmin, xmax, xpoints)
y=np.linspace(ymin, ymax, ypoints)
xx,yy=np.meshgrid(x,y, sparse=False)
pointer = np.concatenate([xx.reshape(-1,1),yy.reshape(-1, 1)], axis=-1)
counter=len(pointer)
A=[0]*counter #for storing amplitudes later
# Arrays of point source locations
source=tuple([0,(((numsource-1)/2)-i)*2.5] for i in range(0,numsource-1))
# Arrays of Subtraction of Coordinates of Sources from Point sources (For Distance from source)
points=[(pointer-source[p]) for p in range(0,numsource-1)]
# distance of each point in map from its source (Sqrt of Coordinate difference sum)
distance=[(points[i][:,0]**2 + points[i][:,1]**2)**0.5 for i in range(0,numsource-1)]
# Amplitudes of each wave defined arbitrarily
amp= np.array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
k=20
# wave equation for each wave based on defined amplitude and distance from source
wave=([amp[i] * np.sin (k*distance[i]) for i in range(0,numsource-1)])
#superposition
for i in range(0,numsource-1):
    A=A+wave[i]
A=np.asarray(A)
print(A)
intensity = A**2
#constructive, destructive superposition plot
plt.figure(figsize=(10,10))
plt.xlim(xmin,xmax)
plt.ylim(ymin,ymax)
plt.scatter(pointer[:,0], pointer[:,1],c=intensity, cmap='viridis')
plt.colorbar()
I managed to reduce the computation time from 166 to 146 ms on my machine.
You want to get rid of the part where A is initialized as a 250,000x1 list then converted to an array. You can initialize it as an array directly. The parts I disabled are left as comments in your code.
#import matplotlib as mpl
from matplotlib import pyplot as plt
#from mpl_toolkits.mplot3d import Axes3D
#from matplotlib import cm
import numpy as np
#import scipy
#import scipy.ndimage
#import scipy.misc
#from sklearn import mixture
import time
class time_this_scope():
    """A handy context manager to measure timings."""
    def __init__(self):
        self.t0 = None
        self.dt = None
    def __enter__(self):
        self.t0 = time.perf_counter()
        return self
    def __exit__(self, *args):
        self.dt = time.perf_counter() - self.t0
        print(f"This scope took {int(self.dt*1000)} ms.")
def main():
    xmin, xmax = 0, 25
    ymin, ymax = -12.500, 12.500
    xpoints, ypoints = 500, 500
    points, amp, distance, wave = [], [], [], []
    numsource = 11
    x = np.linspace(xmin, xmax, xpoints)
    y = np.linspace(ymin, ymax, ypoints)
    xx, yy = np.meshgrid(x, y, sparse=False)
    pointer = np.concatenate([xx.reshape(-1, 1), yy.reshape(-1, 1)], axis=-1)
    # counter = len(pointer)
    # A = [0]*counter #for storing amplitudes later
    A = np.zeros(len(pointer))
    # Arrays of point source locations
    source = tuple((0, (((numsource-1)/2)-i)*2.5) for i in range(0, numsource-1))
    # Arrays of Subtraction of Coordinates of Sources from Point sources (For Distance from source)
    points = [(pointer-source[i]) for i in range(0, numsource-1)]
    # distance of each point in map from its source (Sqrt of Coordinate difference sum)
    distance = [(points[i][:,0]**2 + points[i][:,1]**2)**0.5 for i in range(0, numsource-1)]
    # Amplitudes of each wave defined arbitrarily
    # amp = np.array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
    amp = np.array([4]*numsource)
    k = 20
    # wave equation for each wave based on defined amplitude and distance from source
    wave = [amp[i] * np.sin(k*distance[i]) for i in range(0, numsource-1)]
    #superposition
    for i in range(0, numsource-1):
        A = A + wave[i]
    # A = np.asarray(A)
    print(A)
    intensity = A**2
    #constructive, destructive superposition plot
    # plt.figure(figsize=(10, 10))
    # plt.xlim(xmin, xmax)
    # plt.ylim(ymin, ymax)
    # plt.scatter(pointer[:,0], pointer[:,1], c=intensity, cmap='viridis')
    # plt.colorbar()
# Timing.
total_time = 0
N = 20
for _ in range(N):
    with time_this_scope() as timer:
        main()
    total_time += timer.dt
print(total_time/N)
Now the plotting itself still takes 500 ms, so I think imshow would be a better option than scatter. I doubt matplotlib is very happy about having 250,000 markers to render.
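As a rough illustration of what I mean (a sketch, assuming `intensity`, `xpoints`, `ypoints` and the axis limits from the original script are in scope), reshaping the flat intensity array back onto the grid lets imshow draw a single image instead of a quarter-million markers:
img = intensity.reshape(ypoints, xpoints)   # back onto the 500x500 grid
plt.figure(figsize=(10, 10))
plt.imshow(img, extent=[xmin, xmax, ymin, ymax], origin='lower', cmap='viridis')
plt.colorbar()
plt.show()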

How do I use numpy's polyfit when applied to two lists?

y_boxes_1 = [y[i:i + divisor_1] for i in range(0, len(y), divisor_1)]
x_boxes_1 = [x[i:i + divisor_1] for i in range(0, len(x), divisor_1)]
The above code divides a list by a divisor. What I want to do is use numpy polyfit to create a new list of polynomial coefficients for each individual box of y and box of x.
If:
x_boxes_1 = [[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16]]
y_boxes_1 = [[3,5,2,3,1,2,3,4],[2,3,4,1,5,6,7,10]]
Then polyfit would use the values from x_boxes_1[0] with y_boxes_1[0], and x_boxes_1[1] with y_boxes_1[1], and produce a new list with the coefficients from each individual fit.
How would I accomplish this?
As far as I understand, x_boxes_1 holds the x coordinates and y_boxes_1 holds the values you want to fit a polynomial to, and you want to call polyfit once for each corresponding pair. If that is the case, this should work:
import numpy as np
import matplotlib.pyplot as plt
x_boxes_1 = [[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16]]
y_boxes_1 = [[3,5,2,3,1,2,3,4],[2,3,4,1,5,6,7,10]]
zipped = zip(x_boxes_1, y_boxes_1)
z_boxes_1 = [np.polyfit(x,y,6) for x,y in zipped]
Note that the degree of the fitting polynomial is 6.
You can plot to verify:
xp_boxes_1 = [np.linspace(1, 8, 100), np.linspace(9, 16, 100)]
for i in [0, 1]:
    x = x_boxes_1[i]
    y = y_boxes_1[i]
    z = z_boxes_1[i]
    xp = xp_boxes_1[i]
    p = np.poly1d(z)
    plt.subplot(1, 2, i+1)
    plt.plot(x, y, '.', xp, p(xp), '-')
plt.show()

Spline in 3D can not be differentiated due to an AttributeError

I am trying to fit a smoothing B-spline to some data and I found this very helpful post on here. However, I not only need the spline, but also its derivatives, so I tried to add the following code to the example:
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
For some reason this does not seem to work due to some data type issues. I get the following traceback:
Traceback (most recent call last):
File "interpolate_point_trace.py", line 31, in spline_example
tck_der = interpolate.splder(tck, n=1)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/fitpack.py", line 657, in splder
return _impl.splder(tck, n)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/_fitpack_impl.py", line 1206, in splder
sh = (slice(None),) + ((None,)*len(c.shape[1:]))
AttributeError: 'list' object has no attribute 'shape'
The reason for this seems to be that the second argument of the tck tuple contains a list of numpy arrays. I thought turning the input data to be a numpy array as well would help, but it does not change the data types of tck.
Does this behavior reflect an error in scipy, or is the input malformed?
I tried manually turning the list into an array:
tck[1] = np.array(tck[1])
but this (which didn't surprise me) also gave an error:
ValueError: operands could not be broadcast together with shapes (0,8) (7,1)
Any ideas of what the problem could be? I have used scipy before and on 1D splines the splder function works just fine, so I assume it has something to do with the spline being a line in 3D.
------- edit --------
Here is a minimum working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from mpl_toolkits.mplot3d import Axes3D
total_rad = 10
z_factor = 3
noise = 0.1
num_true_pts = 200
s_true = np.linspace(0, total_rad, num_true_pts)
x_true = np.cos(s_true)
y_true = np.sin(s_true)
z_true = s_true / z_factor
num_sample_pts = 80
s_sample = np.linspace(0, total_rad, num_sample_pts)
x_sample = np.cos(s_sample) + noise * np.random.randn(num_sample_pts)
y_sample = np.sin(s_sample) + noise * np.random.randn(num_sample_pts)
z_sample = s_sample / z_factor + noise * np.random.randn(num_sample_pts)
tck, u = interpolate.splprep([x_sample, y_sample, z_sample], s=2)
x_knots, y_knots, z_knots = interpolate.splev(tck[0], tck)
u_fine = np.linspace(0, 1, num_true_pts)
x_fine, y_fine, z_fine = interpolate.splev(u_fine, tck)
# this is the part of the code I inserted: the line under this causes the crash
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
# end of the inserted code
fig2 = plt.figure(2)
ax3d = fig2.add_subplot(111, projection='3d')
ax3d.plot(x_true, y_true, z_true, 'b')
ax3d.plot(x_sample, y_sample, z_sample, 'r*')
ax3d.plot(x_knots, y_knots, z_knots, 'go')
ax3d.plot(x_fine, y_fine, z_fine, 'g')
fig2.show()
plt.show()
Stumbled into the same problem...
I worked around the error by not using interpolate.splder(tck, n=1) and instead calling interpolate.splev(spline_ev, tck, der=1), which returns the derivatives at the points spline_ev (see the SciPy docs).
If you need the derivative as a spline, I think you can then use interpolate.splprep() on those points again.
In total something like:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
spline_ev = np.linspace(0.0, 1.0, 100, endpoint=True)
spline_points = interpolate.splev(spline_ev, tck)
# Calculate derivative
spline_der_points = interpolate.splev(spline_ev, tck, der=1)
spline_der = interpolate.splprep(spline_der_points, s=0, k=3, full_output=True)  # splev returns a list of arrays, which splprep accepts directly
# Plot the data and derivative
fig = plt.figure()
plt.plot(points[:,0], points[:,1], '.-', label="points")
plt.plot(spline_points[0], spline_points[1], '.-', label="tck")
plt.plot(spline_der_points[0], spline_der_points[1], '.-', label="tck_der")
# Show tangent
plt.arrow(spline_points[0][23]-spline_der_points[0][23], spline_points[1][23]-spline_der_points[1][23], 2.0*spline_der_points[0][23], 2.0*spline_der_points[1][23])
plt.legend()
plt.show()
EDIT:
I also opened an issue on GitHub and, according to ev-br, the usage of interpolate.splprep is deprecated and one should use make_interp_spline / BSpline instead.
As noted in other answers, splprep output is incompatible with splder, but is compatible with splev. And the latter can evaluate the derivatives.
However, for interpolation, there is an alternative approach, which avoids splprep altogether. I'm basically copying a reply on the SciPy issue tracker (https://github.com/scipy/scipy/issues/10389):
Here's an example of replicating the splprep outputs. First let's make sense out of the splprep output:
# start with the OP example
import numpy as np
from scipy import interpolate
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
# check the meaning of the `u` array: evaluation of the spline at `u`
# gives back the original points (up to a list/transpose)
xy = interpolate.splev(u, tck)
xy = np.asarray(xy)
np.allclose(xy.T, points)
Next, let's replicate it without splprep. First, build the u array: the curve is represented parametrically, and u is essentially an approximation for the arc length. Other parametrizations are possible, but here let's stick to what splprep does. Translating the pseudocode from the doc page, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splprep.html
vv = np.sum((points[1:, :] - points[:-1, :])**2, axis=1)
vv = np.sqrt(vv).cumsum()
vv /= vv[-1]
vv = np.r_[0, vv]
# check:
np.allclose(u, vv)
Now, interpolate along the parametric curve: points vs vv:
spl = interpolate.make_interp_spline(vv, points)
# check spl.t vs knots from splprep
spl.t - tck[0]
The result, spl, is a BSpline object which you can evaluate, differentiate etc in a usual way:
np.allclose(points, spl(vv))
# differentiate: `derivative()` returns a new BSpline, which we then evaluate at vv
spl_derivative = spl.derivative()(vv)
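The same recipe carries over to the 3D curve from the question. Here is a sketch reusing the x_sample, y_sample, z_sample arrays from the OP's example (note that make_interp_spline interpolates the samples exactly, whereas the OP's splprep call used s=2 smoothing):
# stack the 3D sample points column-wise and parametrise by normalised arc length
xyz = np.column_stack([x_sample, y_sample, z_sample])
vv = np.sqrt(np.sum(np.diff(xyz, axis=0)**2, axis=1)).cumsum()
vv = np.r_[0, vv / vv[-1]]
spl3d = interpolate.make_interp_spline(vv, xyz)
u_fine = np.linspace(0, 1, 200)
x_f, y_f, z_f = spl3d(u_fine).T       # the smooth curve
dx, dy, dz = spl3d(u_fine, nu=1).T    # its first derivative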

Gaussian Kernel Density Estimation (KDE) of large numbers in Python

I have 1000 large numbers, randomly distributed in range 37231 to 56661.
I am trying to use the stats.gaussian_kde but something does not work.
(maybe because of my poor knowledge of statistics?).
Here is the code:
from scipy.stats import gaussian_kde
from numpy import linspace
import matplotlib.pyplot as plt
# 'data' is a 1D array that contains the initial numbers 37231 to 56661
xmin = min(data)
xmax = max(data)
# get evenly distributed numbers for X axis.
x = linspace(xmin, xmax, 1000) # get 1000 points on x axis
nPoints = len(x)
# get actual kernel density.
density = gaussian_kde(data)
y = density(x)
# print the output data
for i in range(nPoints):
    print "%s %s" % (x[i], y[i])
plt.plot(x, density(x))
plt.show()
In the printout, I get x values in the column 1, and zeros in the column 2.
The plot shows a flat line.
I simply cannot find the solution.
I tried a very wide range of x values, with the same result.
What is the problem? What am I doing wrong?
Could the large numbers be the cause?
I think what's happening is that your data array is made up of integers, which leads to problems:
>>> import numpy, scipy.stats
>>>
>>> data = numpy.random.randint(37231, 56661,size=10)
>>> xmin, xmax = min(data), max(data)
>>> x = numpy.linspace(xmin, xmax, 10)
>>>
>>> density = scipy.stats.gaussian_kde(data)
>>> density.dataset
array([[52605, 45451, 46029, 40379, 48885, 41262, 39248, 38247, 55987,
44019]])
>>> density(x)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
but if we use floats:
>>> density = scipy.stats.gaussian_kde(data*1.0)
>>> density.dataset
array([[ 52605., 45451., 46029., 40379., 48885., 41262., 39248.,
38247., 55987., 44019.]])
>>> density(x)
array([ 4.42201513e-05, 5.51130237e-05, 5.94470211e-05,
5.78485526e-05, 5.21379448e-05, 4.43176188e-05,
3.66725694e-05, 3.06297511e-05, 2.56191024e-05,
2.01305127e-05])
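Put differently, the flat line in the question should go away once the data are floats before building the KDE (at least with the versions shown above). A small self-contained sketch, where random integers stand in for the OP's actual 1000 values:
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
# random integers stand in for the OP's actual data
data = np.random.randint(37231, 56661, size=1000).astype(float)
density = gaussian_kde(data)
x = np.linspace(data.min(), data.max(), 1000)
plt.plot(x, density(x))   # no longer a flat line of zeros
plt.show()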
I have made a function to do this. You can vary the bandwidth as a parameter of the function; a smaller number gives a pointier curve and a larger number gives a smoother one. The default is 0.3.
It works in an IPython notebook with --pylab=inline.
The number of bins is computed automatically, so it will vary with the amount of data you have.
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np
def hist_with_kde(data, bandwidth=0.3):
    # set number of bins using the Freedman-Diaconis rule
    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    n = len(data)**(.1/.3)
    rng = max(data) - min(data)
    iqr = 2*(q3-q1)
    bins = int((n*rng)/iqr)
    x = np.linspace(min(data), max(data), 200)
    kde = stats.gaussian_kde(data)
    kde.covariance_factor = lambda: bandwidth
    kde._compute_covariance()
    plt.plot(x, kde(x), 'r')  # distribution function
    plt.hist(data, bins=bins, density=True)  # histogram (density=True replaces the old normed=True)
data = np.random.randn(500)
hist_with_kde(data,0.25)
