How to find the pH at the equivalence point using Python

Create a function that finds the largest derivative in the derivative
list. Feel free to compare with the numpy function max. Let the
program print out what volume this corresponds to. This is the volume
of strong base added at the equivalence point. Also find the pH at the
equivalence point using your program.
I was able to solve the first part of the question by writing a function that finds the maximum, and it gave the correct answer, but I'm stuck on how to use that information to find the pH at the equivalence point.
My code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fil = pd.read_csv('https://raw.githubusercontent.com/andreasdh/programmering-i-kjemi/master/docs/datafiler/titreringsdata.txt', delimiter = ",")
volum = fil['volum']
pH = fil['pH']
print(pH, volum)
plt.plot(volum, pH, color = "#B00B69", label = "Fitted model")
plt.scatter(volum, pH, color = "hotpink", label = "Data points")
plt.xlabel("volum")
plt.ylabel("pH")
plt.grid()
plt.show()
d = []
for i in range(len(volum)-1):
    dery = pH[i+1] - pH[i]
    dert = volum[i+1] - volum[i]
    dydt = dery/dert
    d.append(dydt)
print(d)

def fmax(list):
    max = list[0]
    for x in list:
        if x > max:
            max = x
    return max
print('the biggest element in the derivative is', fmax(d))
I believe that at some point I will have to use matplotlib.pyplot to make a graph and scatter the data, but I still can't understand what I'm supposed to do.
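Since each derivative d[i] is computed from the points i and i+1, the index of the largest derivative also tells you which rows of the data straddle the equivalence point, so the pH can be read (or averaged) from those same rows. A minimal sketch building on the arrays above, using numpy's argmax instead of the hand-written fmax (taking the midpoint of the two rows is just one reasonable convention):
i = int(np.argmax(d))  # index of the steepest point on the titration curve
V_eq = (volum[i] + volum[i+1]) / 2   # volume of strong base at the equivalence point
pH_eq = (pH[i] + pH[i+1]) / 2        # pH at the equivalence point
print("equivalence point at volume", V_eq, "with pH", pH_eq)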

Related

How can I find the x-corresponding value of a function?

I have a function (the Morse potential, in case anybody cares), and I want to find the coordinates of the minimum value in x. I can very easily find the minimum value in y with min(y), but how can I find the x value associated with the minimum y coordinate?
Copy of my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
from math import e
# for nickel
D = 0.4205 # well depth
f = 2.7540 # equilibrium distance
a = 1.4199 # potential width
x = np.arange(-100,100,0.01) # interatomic distance / x-axis
y = -D + D*(1-e**(-a*(x-f)))**2
plt.xlabel('Interatomic distance [$\AA$]')
plt.ylabel('Energy [eV]')
plt.plot(x, y, color = 'mediumturquoise')
plt.xlim(2,5.5)
plt.ylim(-0.5, 0.75)
#plt.annotate('a$_{0}$', xy = (X, min(y)) ) HERE'S WHERE I'D NEED THE X COORDINATE!!
plt.show()
Thanks in advance!
You can use an argmin to find the minimum index of y, then based on how you generated the range, you can work backwards to get the actual value. You can do something like
x[np.argmin(y)]
Note that this might not be equivalent to
np.argmin(y) * 0.01 + -100
due to floating point rounding issues.
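Applied to the Morse-potential code from the question, a minimal sketch (the annotation text is only illustrative):
import numpy as np
import matplotlib.pyplot as plt

D, f, a = 0.4205, 2.7540, 1.4199
x = np.arange(-100, 100, 0.01)
y = -D + D*(1 - np.exp(-a*(x - f)))**2

x_min = x[np.argmin(y)]   # x value at the minimum of y
plt.plot(x, y, color='mediumturquoise')
plt.annotate('a$_{0}$', xy=(x_min, y.min()))
plt.xlim(2, 5.5)
plt.ylim(-0.5, 0.75)
plt.show()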
With some algebra (or a graphing calculator...) you can resolve the Morse Potential equation in terms of y and get an accurate and quick answer.
I went ahead and did this (using symbolab, you can confirm here) given your constants, and then plotted the graphs here to confirm they looked right. There are two mathematical functions due to the two cases arising when taking the square root.
Here's some code that should compute the x value based on the y value:
import math
def getXfromY(y):
    if y <= 2.754:
        return -(math.log(-math.sqrt(y/0.4205) + 1))/1.4199
    elif y > 2.754:
        return -(math.log(math.sqrt(y/0.4205) + 1))/1.4199

Simulate the compound random variable S

Let S = X_1 + X_2 + ... + X_N, where N is a nonnegative integer-valued random variable and X_1, X_2, ... are i.i.d. random variables (if N = 0, we set S = 0).
Simulate S in the case where N ~ Poi(100) and X_i ~ Exp(0.5) (draw histograms and use the numpy or scipy built-in functions), and check the equations E(S) = E(N)*E(X_1) and Var(S) = E(N)*Var(X_1) + E(X_1)^2*Var(N).
I was trying to solve it, but I'm not sure about everything yet, and I also got stuck on the histogram part. Note: I'm new to Python or, more generally, new to programming.
My work:
import scipy.stats as stats
import matplotlib as plt
N = stats.poisson(100)
X = stats.expon(0.5)
arr = X.rvs(N.rvs())
S = 0
for i in arr:
    S = S + i
print(arr)
print("S=",S)
expected_S = (N.mean())*(X.mean())
variance_S = (N.mean()*X.var()) + (X.mean()*X.mean()*N.var())
print("E(X)=",expected_S)
print("Var(S)=",variance_S)
Your existing code mostly looks sensible, but I'd simplify:
arr = X.rvs(N.rvs())
S = 0
for i in arr:
    S = S + i
down to:
S = X.rvs(N.rvs()).sum()
To draw a histogram, you need many samples from this distribution, which is now easily accomplished via:
arr = []
for _ in range(10_000):
    arr.append(X.rvs(N.rvs()).sum())
or, equivalently, using a list comprehension:
arr = [X.rvs(N.rvs()).sum() for _ in range(10_000)]
to plot these in a histogram, you need the pyplot module from Matplotlib, so your import should be:
from matplotlib import pyplot as plt
plt.hist(arr, 50)
The 50 above says to use that number of "bins" when drawing the histogram. We can also compare these to the mean and variance you calculated by assuming the distribution is well approximated by a normal:
import numpy as np  # needed for np.sqrt below

approx = stats.norm(expected_S, np.sqrt(variance_S))
_, x, _ = plt.hist(arr, 50, density=True)
plt.plot(x, approx.pdf(x))
This works because the second value returned by matplotlib's hist method is the array of bin edges. I used density=True so I could work with probability densities, but another option is to multiply the densities by the number of samples to get expected counts, like the previous histogram.
Running this gives me:
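Putting the pieces together, a complete sketch might look like the following. Note that the question's X = stats.expon(0.5) sets scipy's loc parameter; here Exp(0.5) is assumed to mean rate 0.5, i.e. stats.expon(scale=2) with mean 2, so adjust if your convention differs:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

N = stats.poisson(100)
X = stats.expon(scale=2)   # assumption: Exp(0.5) means rate 0.5, i.e. mean 2

# many samples of S = X_1 + ... + X_N  (X.rvs(0).sum() correctly gives S = 0)
samples = np.array([X.rvs(N.rvs()).sum() for _ in range(10_000)])

# theoretical moments from the formulas in the question
expected_S = N.mean() * X.mean()
variance_S = N.mean() * X.var() + X.mean()**2 * N.var()
print("E(S) theory:", expected_S, " simulation:", samples.mean())
print("Var(S) theory:", variance_S, " simulation:", samples.var())

# histogram with the normal approximation overlaid
approx = stats.norm(expected_S, np.sqrt(variance_S))
_, bins, _ = plt.hist(samples, 50, density=True)
plt.plot(bins, approx.pdf(bins))
plt.show()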

Using np.interp to find x value for a given y gives wrong answer

I want to find the x value for a given y (I want to know at what t, X, the conversion, reaches 0.9). There are questions like this all over SO and they say use np.interp but I did that in two ways and both were wrong. The code is:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
# Create time domain
t = np.linspace(0,4000,100)
# Parameters
A = 1.5*10**(-3) # Arrhenius constant
T = 300 # Temperature [K]
R = 8.31 # Ideal gas constant [J/molK]
E_a= 1000 # Activation energy [J/mol]
V = 5 # Reactor volume [m3]
# Initial condition
C_A0 = 0.1 # Initial concentration [mol/m3]
def dNdt(C_A,t):
    r_A = (-k*C_A)/V
    dNdt = r_A*V
    return dNdt
k=A*np.exp(-E_a/(R*T))
C_A = odeint(dNdt,C_A0,t)
N_A0 = C_A0*V
N_A = C_A*V
X = (N_A0 - N_A)/N_A0
# Plot
plt.figure()
plt.plot(t,X,'b-',label='Conversion')
plt.plot(t,C_A,'r--',label='Concentration')
plt.legend(loc='best')
plt.grid(True)
plt.xlabel('Time [s]')
plt.ylabel('Conversion')
Looking at the graph, at roughly t=2300, the conversion is 0.9.
Method 1:
I wrote this function so I can ask for any given point and get the x-value:
def find(x_val,f):
    f = np.reshape(f,len(f))
    global t
    t = np.reshape(t,len(t))
    return np.interp(x_val,t,f)
print('Conversion of 0.9 is reached at: ',int(find(0.9,X)),'s')
When I call the function at 0.9 I get 0.0008858, which gets rounded to 0, which is wrong. I thought maybe something was going wrong when I declared global t?
Method 2:
When I do it outside the function, manually reshaping X and t and using np.interp(0.9,t,X), the output is 0.9.
X = np.reshape(X,len(X))
t = np.reshape(t,len(t))
print(np.interp(0.9,t,X))
I thought I made a mistake in the order of the variables so I did np.interp(0.9,X,t), and again it surprised me with 0.9.
I'm unsure as to where I'm going wrong. Any help would be appreciated. Many thanks :)
On your plot, t is horizontal and X is vertical. You want to find the horizontal coordinate where the vertical one is 0.9. That is, find t for a given X. Saying
find x value for a given y
is bound to lead to confusion, as it did here.
The problem is solved with
print(np.interp(0.9, X.ravel(), t)) # prints 2292.765497278863
(It's better to use ravel for flattening, instead of the reshape as you did). There is no need to reshape t, which is already one-dimensional.
I did np.interp(0.9,X,t), and again it surprised me with 0.9.
That sounds unlikely; you probably mistyped. This was the correct order.
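A minimal sketch of the corrected lookup, reusing the t and X arrays from the question (np.interp expects its second argument to be increasing, which holds here because the conversion rises monotonically with time):
X_flat = X.ravel()                 # odeint returns shape (100, 1); flatten it
t_90 = np.interp(0.9, X_flat, t)   # interpolate t as a function of X
print("Conversion of 0.9 is reached at about", t_90, "s")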

FFT results Matlab VS Numpy (Python) : not the same results

I have a Matlab script to compute the DFT of a signal and plot it:
(data can be found here)
clc; clear; close all;
fid = fopen('s.txt');
txt = textscan(fid,'%f');
s = cell2mat(txt);
nFFT = 100;
fs = 24000;
deltaF = fs/nFFT;
FFFT = [0:nFFT/2-1]*deltaF;
win = hann(length(s));
sw = s.*win;
FFT = fft(sw, nFFT)/length(s);
FFT = [FFT(1); 2*FFT(2:nFFT/2)];
absFFT = 20*log10(abs(FFT));
plot(FFFT, absFFT)
grid on
I am trying to translate it to Python and can't get the same result.
import numpy as np
from matplotlib import pyplot as plt
x = np.genfromtxt("s.txt", delimiter=' ')
nfft = 100
fs = 24000
deltaF = fs/nfft;
ffft = [n * deltaF for n in range(nfft/2-1)]
ffft = np.array(ffft)
window = np.hanning(len(x))
xw = np.multiply(x, window)
fft = np.fft.fft(xw, nfft)/len(x)
fft = fft[0]+ [2*fft[1:nfft/2]]
fftabs = 20*np.log10(np.absolute(fft))
plt.figure()
plt.plot(ffft, np.transpose(fftabs))
plt.grid()
The plots I get (Matlab on the left, Python on the right):
What am I doing wrong?
The two codes differ: in the Matlab code you concatenate two vectors
FFT = [FFT(1); 2*FFT(2:nFFT/2)];
whereas in the Python code you add the first value of fft to the rest of the vector
fft = fft[0]+ [2*fft[1:nfft/2]]
'+' does not concatenate here because you have numpy arrays.
In python, it should be:
fft = fft[0:nfft//2]          # integer division so the slice bounds are ints
fft[1:nfft//2] = 2*fft[1:nfft//2]
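If you want to mirror the Matlab concatenation more literally, numpy's concatenate does the same job; a self-contained sketch with a placeholder spectrum (note the integer division nfft//2 for slicing):
import numpy as np

nfft = 100
fft = np.fft.fft(np.random.randn(nfft), nfft) / nfft   # placeholder spectrum
# equivalent of Matlab's  FFT = [FFT(1); 2*FFT(2:nFFT/2)]
fft = np.concatenate(([fft[0]], 2*fft[1:nfft//2]))
print(fft.shape)   # (50,): the DC bin plus 49 doubled bins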
I am not a Matlab user, so I am not sure, but there are a few things I'd check to see if I can help you.
You called np.array after the list has already been built (ffft). That probably will not change the array the way you hoped; perhaps it would be better to build it directly as np.array(n * deltaF for n in range(nfft/2-1)) — I am not sure of the exact formatting, but you get the idea. The other thing is that the range doesn't seem right to me: do you really want only 49 values?
Another one is fft = fft[0]+ [2*fft[1:nfft/2]] compared to FFT = [FFT(1); 2*FFT(2:nFFT/2)]; I am not sure whether the comparison is accurate. It looks like a different kind of operation to me.
Also, when I do these types of calculations, I print out the intermediate steps so I can compare the numbers and see where it breaks.
Hope this helps.
I found out that using np.fft.rfft instead of np.fft.fft and modifying the code as follows does the job:
import numpy as np
from matplotlib import pyplot as pl
x = np.genfromtxt("../Matlab/s.txt", delimiter=' ')
nfft = 100
fs = 24000
deltaF = fs/nfft;
ffft = np.array([n * deltaF for n in range(nfft//2+1)])
window = np.hanning(len(x))
xw = np.multiply(x, window)
fft = np.fft.rfft(xw, nfft)/len(x)
fftabs = 20*np.log10(np.absolute(fft))
pl.figure()
pl.plot(np.transpose(ffft), fftabs)
pl.grid()
The resulting plot shows the right result with Python.
I can see that the first and the last points, as well as the amplitudes are not the same. It isn't a problem for me (I am more interested in the general shape), but if someone can explain, I'd be happy.

get bins coordinates with hexbin in matplotlib

I use matplotlib's method hexbin to compute 2d histograms on my data.
But I would like to get the coordinates of the centers of the hexagons in order to further process the results.
I got the values using the get_array() method on the result, but I cannot figure out how to get the bin coordinates.
I tried to compute them from the number of bins and the extent of my data, but I don't know the exact number of bins in each direction. gridsize=(10,2) should do the trick, but it does not seem to work.
Any idea?
I think this works.
from __future__ import division
import numpy as np
import math
import matplotlib.pyplot as plt

def generate_data(n):
    """Make random, correlated x & y arrays"""
    points = np.random.multivariate_normal(mean=(0,0),
                                           cov=[[0.4,9],[9,10]], size=int(n))
    return points

if __name__ == '__main__':
    color_map = plt.cm.Spectral_r
    n = 1e4
    points = generate_data(n)
    xbnds = np.array([-20.0,20.0])
    ybnds = np.array([-20.0,20.0])
    extent = [xbnds[0],xbnds[1],ybnds[0],ybnds[1]]
    fig = plt.figure(figsize=(10,9))
    ax = fig.add_subplot(111)
    x, y = points.T
    # Set gridsize just to make them visually large
    image = plt.hexbin(x,y,cmap=color_map,gridsize=20,extent=extent,mincnt=1,bins='log')
    # Note that mincnt=1 adds 1 to each count
    counts = image.get_array()
    ncnts = np.count_nonzero(np.power(10,counts))
    verts = image.get_offsets()
    for offc in range(verts.shape[0]):
        binx, biny = verts[offc][0], verts[offc][1]
        if counts[offc]:
            plt.plot(binx,biny,'k.',zorder=100)
    ax.set_xlim(xbnds)
    ax.set_ylim(ybnds)
    plt.grid(True)
    cb = plt.colorbar(image,spacing='uniform',extend='max')
    plt.show()
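Depending on the Matplotlib version, get_offsets() may come back empty (as the next answer reports); a small defensive helper (hypothetical, not from either answer) could fall back to averaging the path vertices:
import numpy as np

def hexbin_centers(polycollection):
    # polycollection is the object returned by plt.hexbin
    offsets = np.asarray(polycollection.get_offsets())
    if offsets.size:   # versions where the hexagon centers are stored as offsets
        return offsets
    # fallback: average the six corner vertices of each hexagon path
    return np.array([p.vertices[:6].mean(axis=0) for p in polycollection.get_paths()])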
I would love to confirm that the code by Hooked using get_offsets() works, but I tried several iterations of the code mentioned above to retrieve the center positions and, as Dave mentioned, get_offsets() remains empty. The workaround that I found is to use the non-empty image.get_paths() option. My code takes the mean to find the centers, which makes it just a smidge longer, but it does work.
The get_paths() option returns a set of embedded x,y coordinates that can be looped over and then averaged to return the center position for each hexagon.
The code that I have is as follows:
counts = image.get_array()   # counts in each hexagon, works great
verts = image.get_offsets()  # empty, don't use this
b = image.get_paths()        # this does work, gives Path objects which can be plotted
for x in range(len(b)):
    xav = np.mean(b[x].vertices[0:6,0])  # center in x (RA)
    yav = np.mean(b[x].vertices[0:6,1])  # center in y (DEC)
    plt.plot(xav, yav, 'k.', zorder=100)
I had this same problem. I think what needs to be developed is a framework to have a HexagonalGrid object which can then be applied to many different data sets (and it would be awesome to do it for N dimensions). This is possible, and it surprises me that neither Scipy nor Numpy has anything for it (furthermore there seems to be nothing else like it except perhaps binify).
That said, I assume you want to use hexbinning to compare multiple binned data sets. This requires some common base. I got this to work using matplotlib's hexbin the following way:
import numpy as np
import matplotlib.pyplot as plt

def get_data(mean, cov, n=1e3):
    """
    Quick fake data builder
    """
    np.random.seed(101)
    points = np.random.multivariate_normal(mean=mean, cov=cov, size=int(n))
    x, y = points.T
    return x, y

def get_centers(hexbin_output):
    """
    about 40% faster than previous post only cause you're not calculating the
    min/max every time
    """
    paths = hexbin_output.get_paths()
    v = paths[0].vertices[:-1]  # adds a value [0,0] to the end
    vx, vy = v.T
    idx = [3, 0, 5, 2]  # index for [xmin,xmax,ymin,ymax]
    xmin, xmax, ymin, ymax = vx[idx[0]], vx[idx[1]], vy[idx[2]], vy[idx[3]]
    half_width_x = abs(xmax - xmin) / 2.0
    half_width_y = abs(ymax - ymin) / 2.0
    centers = []
    for i in range(len(paths)):
        cx = paths[i].vertices[idx[0], 0] + half_width_x
        cy = paths[i].vertices[idx[2], 1] + half_width_y
        centers.append((cx, cy))
    return np.asarray(centers)

# important parts ==>
class Hexagonal2DGrid(object):
    """
    Used to fix the gridsize, extent, and bins
    """
    def __init__(self, gridsize, extent, bins=None):
        self.gridsize = gridsize
        self.extent = extent
        self.bins = bins

def hexbin(x, y, hexgrid):
    """
    To hexagonally bin the data in 2 dimensions
    """
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Note mincnt=0 so that it will return a value for every point in the
    # hexgrid, not just those with count>mincnt
    # Basically you fix the gridsize, extent, and bins to keep them the same
    # then the resulting count array is the same
    hexbin = plt.hexbin(x, y, mincnt=0,
                        gridsize=hexgrid.gridsize,
                        extent=hexgrid.extent,
                        bins=hexgrid.bins)
    # you could close the figure if you don't want it
    # plt.close(fig.number)
    counts = hexbin.get_array().copy()
    return counts, hexbin

# Example ===>
if __name__ == "__main__":
    hexgrid = Hexagonal2DGrid((21, 5), [-70, 70, -20, 20])
    x_data, y_data = get_data((0, 0), [[-40, 95], [90, 10]])
    x_model, y_model = get_data((0, 10), [[100, 30], [3, 30]])
    counts_data, hexbin_data = hexbin(x_data, y_data, hexgrid)
    counts_model, hexbin_model = hexbin(x_model, y_model, hexgrid)
    # if you want the centers, they will be the same for both
    centers = get_centers(hexbin_data)
    # if you want to ignore the cells with zeros then use the following mask.
    # But if you want zeros for some bins and not others I'm not sure of an
    # elegant way to do this without using the centers
    nonzero = counts_data != 0
    # now you can compare the two data sets
    variance_data = counts_data[nonzero]
    square_diffs = (counts_data[nonzero] - counts_model[nonzero])**2
    chi2 = np.sum(square_diffs / variance_data)
    print(" chi2={}".format(chi2))
