Interpolation and Root Finding - python

For an assignment I had to interpolate data (linear and cubic interpolation) and create two graphs that show the data as points and the interpolation as a line. I also had to plot the maximum value of the data. I got all of this to work, but was assigned one last part for extra credit and could not figure it out. V represents the voltage and I represents the current.
Extra assignment: Expand your program so that it also determines the maximum current, and the corresponding voltage, using a different method known as “root finding”. To accomplish this, you will need to numerically differentiate the function. (The numpy function named “diff” will be useful.) Then, to carry out the root finding, you can use the “brentq” function from the scipy.optimize library.
from pylab import *
from scipy.interpolate import interp1d
from scipy.optimize import brentq
#load text into two variables V[Voltage] and I[Current]
V, I = loadtxt('data1(1).txt', unpack = True, skiprows = 1)
#voltage and current interpolated[linear]
f_line = interp1d(V, I, 'linear')
new_V = linspace(0, 12, 1000) #array of new voltages created
new_I = f_line(new_V) #array of new currents created from interpolated data
#voltage, current, new voltage, and new current plotted.
plot(V, I, 'ro', new_V, new_I, 'b-')
index = argmax(new_I) #index of max current
plot(new_V[index], new_I[index], 'go') #max current plotted at its voltage
title("Linear Interpolation")
xlabel("Voltage (V)")
xlim(0,12)
ylabel("Current (mA)")
legend(['Data', 'Linear Interp', 'Max Current'], loc = 'best')
show()
print "The maximum current is", max(new_I), "mA"
#voltage and current interpolated[cubic]
f_cube = interp1d(V, I, 'cubic')
new_V = linspace(0, 12, 1000) #array of new voltages created
new_I = f_cube(new_V) #array of new currents created from interpolated data
index = argmax(new_I) #index of max current
#voltage, current, new voltage, and new current plotted.
plot(V, I, 'ro', new_V, new_I, 'b-')
plot(new_V[index], max(new_I), 'go') #max current and voltage plotted
title("Cubic Interpolation")
xlabel("Voltage (V)")
xlim(0,12)
ylabel("Current (mA)")
ylim(0,1.4)
legend(['Data', 'Cubic Interp', 'Max Current'], loc = 'best')
show()
print "The maximum current is", max(new_I), "mA"
print "The corresponding maxium voltage is", new_V[index], "V"
All of the above code works; it's the last part I'm unsure how to start on. I did attempt it, but it threw an error I was unsure how to handle, as I don't know much about the functions being used (diff, brentq) or how to use them to find the max current and voltage.
#load text into two variables V[Voltage] and I[Current]
V, I = loadtxt('data1(1).txt', unpack = True, skiprows = 1)
f_line = interp1d(V, I, 'linear')
new_V = linspace(0, 12, 1000) #array of new voltages created
d = diff(new_V, 1)
r = brentq(d)
print r
10 d = diff(new_V, 1)
11
---> 12 r = brentq(d)
13 print r
TypeError: brentq() takes at least 3 arguments (1 given)
I understand what the error is saying, but I don't really know the correct arguments to give it in order to find the max current and voltage by "root finding", as was asked. Any suggestions are appreciated.
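A minimal sketch of one way diff and brentq could be combined (untested against the actual data file; in particular, the bracket passed to brentq is an assumption and must straddle the sign change of the derivative, i.e. the peak of the I-V curve):
from numpy import loadtxt, linspace, diff
from scipy.interpolate import interp1d
from scipy.optimize import brentq
V, I = loadtxt('data1(1).txt', unpack = True, skiprows = 1)
f_cube = interp1d(V, I, 'cubic')
new_V = linspace(0, 12, 1000)
new_I = f_cube(new_V)
#numerical derivative dI/dV, defined at the midpoints between the new_V samples
dI_dV = diff(new_I)/diff(new_V)
mid_V = (new_V[:-1] + new_V[1:])/2
#brentq needs a callable, so wrap the derivative samples in another interpolant
dfunc = interp1d(mid_V, dI_dV)
#brentq(f, a, b) finds a root of f between a and b; f(a) and f(b) must have opposite signs
V_peak = brentq(dfunc, mid_V[0], mid_V[-1])
I_peak = float(f_cube(V_peak))
print "The maximum current is", I_peak, "mA"
print "The corresponding voltage is", V_peak, "V"
At the peak of the current the derivative dI/dV crosses zero, so the root of the derivative is exactly the voltage of the maximum current.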

Related

How to find the pH at the equivalence point using python

Create a function that finds the largest derivative in the derivative
list. Feel free to compare with the numpy function max. Let the
program print out what volume this corresponds to. This is the volume
of strong base added at the equivalence point. Also find the pH at the
equivalence point using your program.
I was able to find the first part of the question by making a function to find the max and got the correct answer from that, but I'm stuck on how to use that information to find the pH at the equivalence point.
My code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fil = pd.read_csv('https://raw.githubusercontent.com/andreasdh/programmering-i-kjemi/master/docs/datafiler/titreringsdata.txt', delimiter = ",")
volum = fil['volum']
pH = fil['pH']
print(pH, volum)
plt.plot(volum, pH, color = "#B00B69", label = "Tilpasset modell")
plt.scatter(volum, pH, color = "hotpink", label = "Datapunkter")
plt.xlabel("volum")
plt.ylabel("pH")
plt.grid()
plt.show()
d = []
for i in range(len(volum)-1):
    dery = pH[i+1] - pH[i]
    dert = volum[i+1] - volum[i]
    dydt = dery/dert
    d.append(dydt)
print(d)
def fmax(list):
    max = list[0]
    for x in list:
        if x > max:
            max = x
    return max
print('the biggest element in the derivative is', fmax(d))
I believe that at some point I will have to use matplotlib.pyplot to make a graph and scatter the data, but I still can't understand what I'm supposed to do.
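A minimal sketch of one way to go from the largest derivative to the pH at the equivalence point, reusing the variables from the code above (using np.interp here is my own suggestion, not something stated in the assignment):
i_eq = int(np.argmax(d))              # index of the largest derivative
# d[i] belongs to the interval between volum[i] and volum[i+1],
# so take the midpoint of that interval as the equivalence-point volume
v_eq = (volum[i_eq] + volum[i_eq + 1]) / 2
# read the pH off the measured curve at that volume
pH_eq = np.interp(v_eq, volum, pH)
print('equivalence point at volume', v_eq, 'with pH', pH_eq)
Since the pH data are already loaded, np.interp simply interpolates the measured pH curve at the volume where the derivative is largest.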

Using np.interp to find x value for a given y gives wrong answer

I want to find the x value for a given y (I want to know at what t the conversion X reaches 0.9). There are questions like this all over SO and they say to use np.interp, but I did that in two ways and both were wrong. The code is:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
# Create time domain
t = np.linspace(0,4000,100)
# Parameters
A = 1.5*10**(-3) # Arrhenius constant
T = 300 # Temperature [K]
R = 8.31 # Ideal gas constant [J/molK]
E_a= 1000 # Activation energy [J/mol]
V = 5 # Reactor volume [m3]
# Initial condition
C_A0 = 0.1 # Initial concentration [mol/m3]
def dNdt(C_A,t):
    r_A = (-k*C_A)/V
    dNdt = r_A*V
    return dNdt
k=A*np.exp(-E_a/(R*T))
C_A = odeint(dNdt,C_A0,t)
N_A0 = C_A0*V
N_A = C_A*V
X = (N_A0 - N_A)/N_A0
# Plot
plt.figure()
plt.plot(t,X,'b-',label='Conversion')
plt.plot(t,C_A,'r--',label='Concentration')
plt.legend(loc='best')
plt.grid(True)
plt.xlabel('Time [s]')
plt.ylabel('Conversion')
Looking at the graph, at roughly t=2300, the conversion is 0.9.
Method 1:
I wrote this function so I can ask for any given point and get the x-value:
def find(x_val,f):
    f = np.reshape(f,len(f))
    global t
    t = np.reshape(t,len(t))
    return np.interp(x_val,t,f)
print('Conversion of 0.9 is reached at: ',int(find(0.9,X)),'s')
When I call the function at 0.9 I get 0.0008858, which gets rounded to 0, which is wrong. I thought maybe something was going wrong when I declare global t?
Method 2:
When I do it outside the function, i.e. I manually reshape X and t and use np.interp(0.9, t, X), the output is 0.9.
X = np.reshape(X,len(X))
t = np.reshape(t,len(t))
print(np.interp(0.9,t,X))
I thought I made a mistake in the order of the variables so I did np.interp(0.9,X,t), and again it surprised me with 0.9.
I'm unsure as to where I'm going wrong. Any help would be appreciated. Many thanks :)
On your plot, t is horizontal and X is vertical. You want to find the horizontal coordinate where the vertical one is 0.9. That is, find t for a given X. Saying "find x value for a given y" is bound to lead to confusion, as it did here.
The problem is solved with
print(np.interp(0.9, X.ravel(), t)) # prints 2292.765497278863
(It's better to use ravel for flattening, instead of the reshape as you did). There is no need to reshape t, which is already one-dimensional.
I did np.interp(0.9,X,t), and again it surprised me with 0.9.
That sounds unlikely; you probably mistyped. np.interp(0.9, X, t) was the correct argument order.
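To make the argument order concrete, here is a tiny illustration with made-up numbers (not the ODE output above):
import numpy as np
X = np.array([0.0, 0.5, 0.9, 1.0])            # conversion, monotonically increasing
t = np.array([0.0, 1000.0, 2300.0, 4000.0])   # corresponding times
print(np.interp(0.9, X, t))   # 2300.0 -- the time at which X reaches 0.9
print(np.interp(0.9, t, X))   # ~0.00045 -- X evaluated at t = 0.9 s, not what you want
Note that np.interp expects its second argument to be increasing, which holds for X here because the conversion grows monotonically with time.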

Plot sparsely populated 2d numpy array

From an iterative image pattern search with decreasing step size I have a 'quality' array. Due to the nature of the search pattern the array is not fully filled. In the first iteration I go with step size 10, find the best spot, and then search a ±10 XY range around it to find the true best spot. So most of the array has only every 10th slot filled, and there is a small 'best' region that is densely filled. Now I want to plot this array, and I would like the plot to be 'interpolated' where needed, using the data in every 10th slot. For the search I initialize the array with a huge value; all my measurements are smaller, and later I use np.argmin(q). That works fine for searching, but it is bad for plotting: the dynamic range of the plot is lost.
Here is an example from an older version of the code that does an exhaustive but unnecessarily long search:
And here is what I get with the optimized search:
Here is the piece of code that does the plots. (q is the quality array to plot)
fig= plt.figure(1)
im= plt.imshow(q[::-1], cmap='rainbow', interpolation='none', extent=[-search_size,search_size,-search_size,search_size])
fig.savefig(pfn(img_fn), bbox_inches='tight')
The issue may point back to the initialization of the array. Again, as I do a minimum search, I do this:
q = np.empty(shape=(2*search_size, 2*search_size))
q.fill(+1e20)
q_min = 1e20
for xs in range(-search_size, +search_size, search_step):
    for ys in range(-search_size, +search_size, search_step):
        img_shift = np.zeros_like(img)
        img_shift[mom(ys):non(ys), mom(xs):non(xs)] = img[mom(-ys):non(-ys), mom(-xs):non(-xs)]
        d = np.absolute(img_shift - prev_img)[search_size:-search_size, search_size:-search_size]
        q[ys+search_size, xs+search_size] = np.sum(d)
        if q[ys+search_size, xs+search_size] < q_min: q_min = q[ys+search_size, xs+search_size]
        #print '1st iter try : %+3d %+3d %6.3f %6.3f' % (xs, ys, q[ys+search_size, xs+search_size], q_min)
idxmin = np.argmin(q)
dy, dx = np.unravel_index(idxmin, q.shape)
dx = dx - search_size
dy = dy - search_size
print '1st iter best : dx= %+3d dy= %+3d' % (dx, dy)
Then follows another loop with search_step = 1.
Is it possible to initialize the array with, say, NaN? Would that still allow the minimum search? And/or would it allow the plotter to jump across undefined entries?
So what's the best way to initialize / plot so that the search works and the plots look good?
Thanks,
Gert
Update @Nix G-D
The averaging fails. I first tried code following the recommendation.
q_int = pd.DataFrame(q).interpolate(method='linear', axis=0).values
fig= plt.figure(1)
im= plt.imshow(q_int[::-1], cmap='rainbow', interpolation='none', extent=[-search_size,search_size,-search_size,search_size])
However, the 2D interpolation failed (at least as indicated by the plot).
I tried to add code to perform X and Y interpolation.
q_int = pd.DataFrame(q).interpolate(method='linear', axis=0).values
q_int = pd.DataFrame(q_int).interpolate(method='linear', axis=1).values
fig= plt.figure(1)
im= plt.imshow(q_int[::-1], cmap='rainbow', interpolation='none', extent=[-search_size,search_size,-search_size,search_size])
But the results were still corrupted.
Best,
Gert
You can initialize the array with NaN easily:
shape = (2*search_size, 2*search_size)
q = np.full(shape, np.nan)
This can then be searched as normal. To find the minimum indices ignoring NaNs, you can use np.nanargmin()
In [12]: np.nanargmin([1,-1,4,float('nan')])
Out[12]: 1
To get rid of these NaN values, we can use pandas.DataFrame.interpolate():
q_interpolated = pd.DataFrame(q).interpolate(method='linear', axis=0).values
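A minimal sketch of how these pieces could fit together (the array shape and fill pattern below are made up purely for illustration; limit_direction='both' is passed so that leading and trailing NaNs get filled as well):
import numpy as np
import pandas as pd
search_size = 20
q = np.full((2*search_size, 2*search_size), np.nan)
# pretend search: only every 10th slot gets a measurement
q[::10, ::10] = np.random.rand(4, 4)
# the minimum search simply ignores the NaN entries
dy, dx = np.unravel_index(np.nanargmin(q), q.shape)
# for plotting only: fill the gaps by interpolating along rows, then along columns
q_plot = pd.DataFrame(q).interpolate(method='linear', axis=0, limit_direction='both')
q_plot = q_plot.interpolate(method='linear', axis=1, limit_direction='both').values
q_plot can then be handed to imshow in place of q, while the search keeps using the NaN-initialized array.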

Python - optimizing plot code

I am working on a live plot. I am getting data from a spectrum analyser which gives me the value at a certain frequency, but the program becomes slower the longer it runs.
So I hope you have some ideas. I also looked at my activity monitor while it was running, and the RAM isn't full at all.
I tried commenting out ctf = ax.contourf(a, b, B, cmap=cma), which is responsible for the plotting, and when it doesn't need to draw it is very fast. But I need the plot, so not drawing is not a solution.
For extra information: ax = plt.subplot(111, polar=True).
Here is my code:
while True:
    trace = inst.query(':TRACe:DATA? TRACE1').partition(' ')[2][:-2].split(', ')  # the first & last 2 entries are cut off, are random numbers
    for value in trace:  # write to file
        f.write(value)
        f.write('\n')
    try:  # looking if data is alright
        trace = np.array(trace, np.float)
    except ValueError:  # if a ValueError is raised this message is displayed but the loop won't break and the piece is plotted in one color (green)
        print 'Some wrong data at the', i+1, 'th measurement'
        longzeroarray = np.zeros(801)
        a = np.linspace(i*np.pi/8-np.pi/16, i*np.pi/8+np.pi/16, 2)  # angle, circle is divided into 16 pieces
        b = np.linspace(start-scaleplot, stop, 801)  # points of the frequency + 200 more points to gain the inner circle
        A, B = np.meshgrid(a, longzeroarray)
        cma = ListedColormap(['w'])
        # actual plotting
        ctf = ax.contourf(a, b, B, cmap=cma)
        xCooPoint = i*np.pi/8 + np.pi/16  # shows the user the position of the plot
        yCooPoint = stop
        ax.plot(xCooPoint, yCooPoint, 'or', markersize=15)
        xCooWhitePoint = (i-1)*np.pi/8 + np.pi/16  # this erases the old red points
        yCooWhitePoint = stop
        ax.plot(xCooWhitePoint, yCooWhitePoint, 'ow', markersize=15)
        plt.draw()
        time.sleep(60)  # delaying the time to give the analyser time to give us new correct data in the next step
        i += 1
        continue
    maximasearch(trace, searchrange)
    trace = np.insert(trace, 0, zeroarray)
    a = np.linspace(i*np.pi/8+np.pi/16-np.pi/8, i*np.pi/8+np.pi/16, 2)  # angle, circle is divided into 16 pieces
    b = np.linspace(start-scaleplot, stop, 801)  # points of the frequency + 200 more points to gain the inner circle
    A, B = np.meshgrid(a, trace)
    # actual plotting
    ctf = ax.contourf(a, b, B, cmap=cm.jet, vmin=-100, vmax=100)
    xCooPoint = i*np.pi/8 + np.pi/16  # shows the user the position of the plot
    yCooPoint = stop
    ax.plot(xCooPoint, yCooPoint, 'or', markersize=15)
    xCooWhitePoint = (i-1)*np.pi/8 + np.pi/16  # this erases the old red points
    yCooWhitePoint = stop
    ax.plot(xCooWhitePoint, yCooWhitePoint, 'ow', markersize=15)
    plt.draw()
    i += 1
That's what the plot looks like; with every new step a new piece of the circle is drawn.
EDIT
I found following question here on stack overflow: real-time plotting in while loop with matplotlib
I think the answer with 22 upvotes could be helpful. Has anyone ever used blit? I have no idea yet how to combine it with my code.
http://wiki.scipy.org/Cookbook/Matplotlib/Animations
I want to answer my own question again.
The best way to optimize the code is to take the angular values modulo 2*pi.
I changed my code a bit:
a = np.linspace((i*np.pi/8+np.pi/16-np.pi/8)%(np.pi*2), (i*np.pi/8+np.pi/16)%(np.pi*2), 2)
The problem before was that Python also plotted all the old pieces, because obviously they were still there, just underneath a layer of newly plotted data pieces. So although you didn't see the old plotted data, it was still drawn. Now only the circle from 0 to 2*pi is redrawn.

scipy.interpolate.UnivariateSpline not smoothing regardless of parameters

I'm having trouble getting scipy.interpolate.UnivariateSpline to use any smoothing when interpolating. Based on the function's page as well as some previous posts, I believe it should provide smoothing with the s parameter.
Here is my code:
# Imports
import scipy.interpolate
import pylab
# Set up and plot actual data
x = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
y = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
pylab.plot(x, y, "o", label="Actual")
# Plot estimates using splines with a range of degrees
for k in range(1, 4):
    mySpline = scipy.interpolate.UnivariateSpline(x=x, y=y, k=k, s=2)
    xi = range(0, 15100, 20)
    yi = mySpline(xi)
    pylab.plot(xi, yi, label="Predicted k=%d" % k)
# Show the plot
pylab.grid(True)
pylab.xticks(rotation=45)
pylab.legend( loc="lower right" )
pylab.show()
Here is the result:
I have tried this with a range of s values (0.01, 0.1, 1, 2, 5, 50), as well as explicit weights, set to either the same thing (1.0) or randomized. I still can't get any smoothing, and the number of knots is always the same as the number of data points. In particular, I'm looking for outliers like that 4th point (7990.4664106277542, 5851.6866463790966) to be smoothed over.
Is it because I don't have enough data? If so, is there a similar spline function or cluster technique I can apply to achieve smoothing with this few datapoints?
Short answer: you need to choose the value for s more carefully.
The documentation for UnivariateSpline states that:
Positive smoothing factor used to choose the number of knots. Number of
knots will be increased until the smoothing condition is satisfied:
sum((w[i]*(y[i]-s(x[i])))**2,axis=0) <= s
From this one can deduce that "reasonable" values for smoothing, if you don't pass in explicit weights, are around s = m * v where m is the number of data points and v the variance of the data. In this case, s_good ~ 5e7.
EDIT: sensible values for s depend of course also on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2 where std is the standard deviation associated with the "noise" you want to smooth over.
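For concreteness, here is a small sketch of that rule of thumb applied to the data from the question (treating the overall variance of y as the noise estimate is a coarse assumption, but it shows the order of magnitude):
import numpy as np
import scipy.interpolate
x = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
     9879.9717114947653, 13738.60563208926, 15113.277958924193]
y = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
     6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
m = len(x)
s_good = m * np.var(y)          # roughly 5e7 for this data, as estimated above
spline = scipy.interpolate.UnivariateSpline(x, y, k=3, s=s_good)
xi = np.linspace(min(x), max(x), 200)
yi = spline(xi)                 # noticeably smoother than with s=2
With s of this size the spline uses far fewer knots than data points, which is exactly the smoothing the question was after.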
@Zhenya's answer of manually setting knots in between datapoints was too rough to deliver good results in noisy data without being selective about how this technique is applied. However, inspired by that suggestion, I have had success with Mean-Shift clustering from the scikit-learn package. It performs auto-determination of the cluster count and seems to do a fairly good smoothing job (very smooth, in fact).
# Imports
import numpy
import pylab
import scipy.interpolate
import sklearn.cluster
# Set up original data - note that it's monotonically increasing by X value!
data = {}
data['original'] = {}
data['original']['x'] = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
data['original']['y'] = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
# Cluster data, sort it and save
inputNumpy = numpy.array([[data['original']['x'][i], data['original']['y'][i]] for i in range(0, len(data['original']['x']))])
meanShift = sklearn.cluster.MeanShift()
meanShift.fit(inputNumpy)
clusteredData = [[pair[0], pair[1]] for pair in meanShift.cluster_centers_]
clusteredData.sort(lambda pair1, pair2: cmp(pair1[0],pair2[0]))
data['clustered'] = {}
data['clustered']['x'] = [pair[0] for pair in clusteredData]
data['clustered']['y'] = [pair[1] for pair in clusteredData]
# Build a spline using the clustered data and predict
mySpline = scipy.interpolate.UnivariateSpline(x=data['clustered']['x'], y=data['clustered']['y'], k=1)
xi = range(0, round(max(data['original']['x']), -3) + 3000, 20)
yi = mySpline(xi)
# Plot the datapoints
pylab.plot(data['clustered']['x'], data['clustered']['y'], "D", label="Datapoints (%s)" % 'clustered')
pylab.plot(xi, yi, label="Predicted (%s)" % 'clustered')
pylab.plot(data['original']['x'], data['original']['y'], "o", label="Datapoints (%s)" % 'original')
# Show the plot
pylab.grid(True)
pylab.xticks(rotation=45)
pylab.legend( loc="lower right" )
pylab.show()
While I'm not aware of any library which will do it for you off-hand, I'd try a slightly more DIY approach: I'd start by making a spline with knots in between the raw data points, in both x and y. In your particular example, having a single knot in between the 4th and 5th points should do the trick, since it would remove the huge derivative at around x=8000.
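One concrete way to place such a knot is scipy.interpolate.LSQUnivariateSpline, which takes explicit interior knots; the choice of that function (and of the knot position) is mine, for illustration only:
import numpy as np
import scipy.interpolate
x = np.array([0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542,
              9879.9717114947653, 13738.60563208926, 15113.277958924193])
y = np.array([0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966,
              6056.3852496014106, 7895.2332350173638, 9154.2956175610598])
# a single interior knot halfway between the 4th and 5th data points
knots = [(x[3] + x[4]) / 2]
spline = scipy.interpolate.LSQUnivariateSpline(x, y, knots, k=3)
xi = np.linspace(x.min(), x.max(), 200)
yi = spline(xi)   # a least-squares fit that smooths over the outlier at x ~ 7990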
I had trouble getting BigChef's answer running; here is a variation that works on Python 3.6:
# Imports
import pylab
import scipy.interpolate
import sklearn.cluster
# Set up original data - note that it's monotonically increasing by X value!
data = {}
data['original'] = {}
data['original']['x'] = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
data['original']['y'] = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
# Cluster data, sort it and save
import numpy
inputNumpy = numpy.array([[data['original']['x'][i], data['original']['y'][i]] for i in range(0, len(data['original']['x']))])
meanShift = sklearn.cluster.MeanShift()
meanShift.fit(inputNumpy)
clusteredData = [[pair[0], pair[1]] for pair in meanShift.cluster_centers_]
clusteredData.sort(key=lambda li: li[0])
data['clustered'] = {}
data['clustered']['x'] = [pair[0] for pair in clusteredData]
data['clustered']['y'] = [pair[1] for pair in clusteredData]
# Build a spline using the clustered data and predict
mySpline = scipy.interpolate.UnivariateSpline(x=data['clustered']['x'], y=data['clustered']['y'], k=1)
xi = range(0, int(round(max(data['original']['x']), -3)) + 3000, 20)
yi = mySpline(xi)
# Plot the datapoints
pylab.plot(data['clustered']['x'], data['clustered']['y'], "D", label="Datapoints (%s)" % 'clustered')
pylab.plot(xi, yi, label="Predicted (%s)" % 'clustered')
pylab.plot(data['original']['x'], data['original']['y'], "o", label="Datapoints (%s)" % 'original')
# Show the plot
pylab.grid(True)
pylab.xticks(rotation=45)
pylab.show()
