Nan values when using np.linspace() as input - python

I am trying to calculate the probability of transmission for an electron through a series of potential wells. When looping through energy values generated by np.linspace(), I get nan for any value under 15. I understand this for the values 0 and 15, since they produce a zero in the denominator of the k and q values. If I simply call getT(5), for example, I get a real value. However, when getT(5) gets called from the loop over np.linspace(0,30,2001), it returns nan. Shouldn't it return either nan or a value in both cases?
import numpy as np
import matplotlib.pyplot as plt

def getT(Ein):
    #constants
    hbar=1.055e-34 #J-s
    m=9.109e-31 #mass of electron kg
    N=10 #number of cells
    a=1e-10 #meters
    b=2e-10 #meters
    #convert energy and potential to Joules
    conv_J=1.602e-19
    E_eV=Ein
    V_eV=15
    E=conv_J*E_eV
    V=conv_J*V_eV
    #calculate values for k and q
    k=(2*m*E/hbar**2)**.5
    q=(2*m*(E-V)/hbar**2)**.5
    #create M1, M2 matrices
    M1=np.matrix([[((q+k)/(2*q))*np.exp(1j*k*b),((q-k)/(2*q))*np.exp(-1j*k*b)], \
                  [((q-k)/(2*q))*np.exp(1j*k*b),((q+k)/(2*q))*np.exp(-1j*k*b)]])
    M2=np.matrix([[((q+k)/(2*k))*np.exp(1j*q*a),((k-q)/(2*k))*np.exp(-1j*q*a)], \
                  [((k-q)/(2*k))*np.exp(1j*q*a),((q+k)/(2*k))*np.exp(-1j*q*a)]])
    #calculate M_Cell
    M_Cell=M1*M2
    #calculate M for N cells
    M=M_Cell**N
    #get items in M_Cell
    M11=M.item(0,0)
    M12=M.item(0,1)
    M21=M.item(1,0)
    M22=M.item(1,1)
    #calculate r and t values
    r=-M21/M22
    t=M11-M12*M21/M22
    #calculate final T value
    T=abs(t)**2
    return Ein,T

#create empty list for data to plot
data=[]
#calculate T for 2001 values of E between 0 and 30 eV
for i in np.linspace(0,30,2001):
    data.append(getT(i))
data=np.transpose(data)

#generate plot
fig, (ax1)=plt.subplots(1)
ax1.set_xlim([0,30])
ax1.set_xlabel('Energy (eV)',fontsize=32)
ax1.set_ylabel('T',fontsize=32)
ax1.grid()
plt.tick_params(labelsize=32)
plt.plot(data[0],data[1],lw=6)
plt.draw()
plt.show()

I think the difference comes from the line
q=(2*m*(E-V)/hbar**2)**.5
When testing with single values between 0 and 15, you're basically taking the root of a negative plain Python float (because E-V is negative), which gives a complex number, for example:
(-2)**0.5
>> (8.659560562354934e-17+1.4142135623730951j)
But when using np.linspace, you take the root of a negative NumPy float, which results in nan (and a warning):
np.array(-2)**0.5
>> RuntimeWarning: invalid value encountered in power
>> nan
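If you want the linspace loop to behave like the single-value call, one option (a minimal sketch, not the only fix) is to force the quantity under the root to be complex, so NumPy returns an imaginary wavenumber instead of nan:
import numpy as np

hbar=1.055e-34
m=9.109e-31
conv_J=1.602e-19
V=15*conv_J

E=np.float64(5)*conv_J            # this is the type the loop over np.linspace passes in
q_nan=(2*m*(E-V)/hbar**2)**.5     # float64 power of a negative number -> nan + warning
q_ok=(2*m*(E-V)/hbar**2+0j)**.5   # adding 0j forces a complex result instead
print(q_nan, q_ok)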

Related

Optimization function yields wrong results

I'm trying to replicate some code from Yuxing Yan's Python for Finance.
I am at a roadblock because I am getting very high minimized figures (in this case stock weights, which can be both + (long) and - (short)) after optimization with fmin().
Can anyone help me with a fresh pair of eyes? I have seen some suggestions about avoiding passing negative or complex figures to fmin(), but I can't avoid it as it's vital to my code.
#Let's import our modules
from scipy.optimize import fmin #to minimise our negative Sharpe ratio
import numpy as np #deals with numbers in python
from datetime import datetime #handles date objects
import pandas_datareader.data as pdr #to read/download equity data
import pandas as pd #for reading and accessing tables etc
import scipy as sp
from scipy.stats import norm
import scipy.stats as stats
from scipy.optimize import fminbound

assets=('AAPL',
        'IBM',
        'GOOG',
        'BP',
        'XOM',
        'COST',
        'GS')
#start and end date to be downloaded
startdate='2016-01-01'
enddate='2016-01-31'
rf_rate=0.0003
n=len(assets)

#_______________________________________________
#This function takes the assets, start and end dates and
#returns the portfolio returns
#__________________________________________________
def port_returns(assets,startdate,enddate):
    #We use adjusted closing prices of the specified dates of the assets
    #as we will only be interested in returns
    data = pdr.get_data_yahoo(assets, start=startdate, end=enddate)['Adj Close']
    #We calculate the percentage change of our returns
    #using the pct_change function
    returns=data.pct_change()
    return returns

def portfolio_variance(returns,weight):
    #finding the correlation of our returns by
    #dropping the nan values and transposing
    correlation_coefficient = np.corrcoef(returns.dropna().T)
    #standard deviation of our returns
    std=np.std(returns,axis=0)
    #initialising our variance
    port_var = 0.0
    #creating a nested loop to calculate our portfolio variance
    #where the variance is w1^2*sigma1^2 + w2^2*sigma2^2 + 2*w1*w2*Cov(1,2)
    #and the correlation coefficient is given by the covariance between two assets
    #divided by the product of their standard deviations
    for i in range(n):
        for j in range(n):
            #we calculate the variance by continuously summing up the variance between two
            #assets using i as the base loop, multiplying by std and corrcoef
            port_var += weight[i]*weight[j]*std[i]*std[j]*correlation_coefficient[i, j]
    return port_var

def sharpe_ratio(returns,weights):
    #call our variance function
    variance=portfolio_variance(returns,weights)
    avg_return=np.mean(returns,axis=0)
    #turn our returns into an array
    returns_array = np.array(avg_return)
    #Our Sharpe ratio uses the expected return obtained by multiplying weights and returns,
    #and the standard deviation obtained by square-rooting our variance
    #https://en.wikipedia.org/wiki/Sharpe_ratio
    return (np.dot(weights,returns_array) - rf_rate)/np.sqrt(variance)

def negate_sharpe_ratio(weights):
    #returns=port_returns (assets,startdate,enddate)
    #creating an array with our weights by
    #appending 1 minus the sum of the n-1 inserted weights as the last weight
    weights_new=np.append(weights,1-sum(weights))
    #returning a negative Sharpe ratio
    return -(sharpe_ratio(returns_data,weights_new))

returns_data=port_returns(assets,startdate,enddate)
# for n stocks, we only need to choose n-1 weights
ones_weights_array= (np.ones(n-1, dtype=float) * 1.0 )/n
weight_1 = fmin(negate_sharpe_ratio,ones_weights_array)
final_weight = np.append(weight_1, 1 - sum(weight_1))
final_sharpe_ratio = sharpe_ratio(returns_data,final_weight)
print ('Optimal weights are ')
print (final_weight)
print ('final Sharpe ratio is ')
print(final_sharpe_ratio)
A few things are causing your code not to work as written:
Is assets the list of items in ticker?
Should startdate be set equal to begdate?
Your call to port_returns() is looking for both assets and startdate, which are never defined.
The function sharpe_ratio() is looking for a variable called rf_rate, which is never defined. I assume this is the risk-free rate and the value assigned to rf at the beginning of the script. So should rf be called rf_rate instead?
After changing rf to rf_rate, begdate to startdate, and setting assets = list(ticker), it appears that this will work as written.
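As an aside, the nested loop in portfolio_variance() is computing the quadratic form w'*Cov*w, where Cov[i,j] = std_i*std_j*corr_ij. A vectorized equivalent (just a sketch under the same assumptions) makes that formula easier to verify:
import numpy as np

def portfolio_variance_vectorized(returns, weight):
    #same quantity as the nested loop: w' * Cov * w,
    #where Cov[i,j] = std_i * std_j * corr_ij
    weight = np.asarray(weight)
    corr = np.corrcoef(returns.dropna().T)               #correlation matrix of asset returns
    std = np.asarray(np.std(returns.dropna(), axis=0))   #per-asset standard deviation
    cov = np.outer(std, std) * corr                      #covariance matrix
    return float(weight @ cov @ weight)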

Python: plotting across time with a single column file

I must write a function that finds the local maxima and minima in a series of values.
The data for the function are the x, y values of each point.
The output is 4 vectors that contain the x, y of the max and min "peaks".
To find max peaks, I must "stand" on each data point and check whether it is more or less than its neighbors on both sides in order to decide if it is a peak (and save it as a max/min peak).
Points on both ends only have 1 neighbor, so do not consider those for this analysis.
Then I must write a program to read a data file and invoke the function to calculate the peaks. The program must generate a graph showing the entered data with the calculated peaks.
The first file is an array of float64 of shape (2001,). All data is in column 0. This file represents the amplitude of a signal in time; the sampling frequency is 200 Hz. Assume the initial time is 0.
The graph should look like this: (image)
The program must also generate an .xls file that shows 2 tables: one with min peaks and another with max peaks. Each table must be titled and consist of 2 columns, one with the time at which each peak occurs and the other with the amplitude of each peak.
No pandas allowed.
The first file is a .txt file, single column, 2001 rows total:
0
0.0188425
0.0376428
0.0563589
0.0749497
0.0933749
0.111596
0.129575
0.147277
0.164669
0.18172
...
Current attempt:
import numpy as np
import matplotlib.pyplot as plt
filename = 'location/file_name.txt'
T = np.loadtxt(filename,comments='#',delimiter='\n')
x = T[::1] # all the values of column 0 are x values
a = np.empty(x, dtype=array)
y = np.linspace[::1/200]
X, Y = np.meshgrid(x,y)
This does what you ask. I had to generate random data, since you didn't share yours. You can surely build your spreadsheet from the minima and maxima values.
import numpy as np
import matplotlib.pyplot as plt

#filename = 'location/file_name.txt'
#T = np.loadtxt(filename,comments='#',delimiter='\n')
#
#y = T[::1] # all the values of column 0 are the y values
y = np.random.random(200) * 2.0

minima = []
maxima = []
# skip the first and last points, which only have one neighbor
for i in range(1, y.shape[0]-1):
    if y[i-1] < y[i] and y[i+1] < y[i]:
        maxima.append( (i/200, y[i]) )
    if y[i-1] > y[i] and y[i+1] > y[i]:
        minima.append( (i/200, y[i]) )

minima = np.array(minima)
maxima = np.array(maxima)
print(minima)
print(maxima)

x = np.linspace(0, 1, 200)
plt.plot( x, y )
plt.scatter( maxima[:,0], maxima[:,1] )
plt.show()
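For the .xls part of the assignment, one way to write the two tables without pandas (a sketch assuming the xlwt package is installed; it reuses the minima and maxima arrays built above) would be:
import xlwt

wb = xlwt.Workbook()
for title, peaks in (('Max peaks', maxima), ('Min peaks', minima)):
    ws = wb.add_sheet(title)
    ws.write(0, 0, 'time (s)')
    ws.write(0, 1, 'amplitude')
    for row, (t_peak, amp) in enumerate(peaks, start=1):
        ws.write(row, 0, float(t_peak))
        ws.write(row, 1, float(amp))
wb.save('peaks.xls')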

Apply a function to a numpy array

I have a NumPy array with data from Yahoo Finance that I got like this:
!pip install yfinance
import yfinance
tickers = yfinance.Tickers('GCV22.CMX CLV22.NYM')
So for each symbol I have open, high, low, and close prices, as well as volume, on a daily basis:
Open High Low Close Volume Dividends Stock Splits
Date
2021-09-20 1752.000000 1766.000000 1740.500000 1761.800049 3656 0 0
2021-09-21 1763.400024 1780.800049 1756.300049 1776.099976 11490 0 0
2021-09-22 1773.099976 1785.900024 1762.800049 1776.699951 6343 0 0
2021-09-23 1766.900024 1774.500000 1736.300049 1747.699951 10630 0 0
2021-09-24 1741.300049 1755.599976 1738.300049 1749.699951 10630 0 0
I found this function in a paper (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2422183) and I would like to apply it to my dataset, but I can't understand how to apply it:
import numpy as np
from pykalman import KalmanFilter

def fitKCA(t,z,q,fwd=0):
    '''
    Inputs:
        t: Iterable with time indices
        z: Iterable with measurements
        q: Scalar that multiplies the seed states covariance
        fwd: number of steps to forecast (optional, default=0)
    Output:
        x[0]: smoothed state means of position velocity and acceleration
        x[1]: smoothed state covar of position velocity and acceleration
    Dependencies: numpy, pykalman
    '''
    #1) Set up matrices A,H and a seed for Q
    h=(t[-1]-t[0])/t.shape[0]
    A=np.array([[1,h,.5*h**2],
                [0,1,h],
                [0,0,1]])
    Q=q*np.eye(A.shape[0])
    #2) Apply the filter
    kf=KalmanFilter(transition_matrices=A,transition_covariance=Q)
    #3) EM estimates
    kf=kf.em(z)
    #4) Smooth
    x_mean,x_covar=kf.smooth(z)
    #5) Forecast
    for fwd_ in range(fwd):
        x_mean_,x_covar_=kf.filter_update(filtered_state_mean=x_mean[-1], \
                                          filtered_state_covariance=x_covar[-1])
        x_mean=np.append(x_mean,x_mean_.reshape(1,-1),axis=0)
        x_covar_=np.expand_dims(x_covar_,axis=0)
        x_covar=np.append(x_covar,x_covar_,axis=0)
    #6) Std series
    x_std=(x_covar[:,0,0]**.5).reshape(-1,1)
    for i in range(1,x_covar.shape[1]):
        x_std_=x_covar[:,i,i]**.5
        x_std=np.append(x_std,x_std_.reshape(-1,1),axis=1)
    return x_mean,x_std,x_covar
In the paper they say: NumPy array t conveys the index of observations. NumPy array z passes the observations. Scalar q provides a seed value for initializing the EM estimation of the states covariance. How can I call this function with my data? I understand t should be the index column of each symbol, that is the data column, z the close price for each symbol of my NumPy array, and q a random seed, but I can't make it work.
The function in the paper states that you need:
t: Iterable with time indices
z: Iterable with measurements
q: Scalar that multiplies the seed states covariance
Here is how you could compute them:
import numpy as np
import yfinance
from random import random

tickers = yfinance.Ticker('MSFT')
history = tickers.history()
# t is a numeric time index for each row (the function calls t.shape, so use an array)
t = np.arange(len(history), dtype=float)
# z is the measurement, here choosing the open price
z = history['Open'].values
# q is a scalar seed for the states covariance
q = random()
# finally call the function
x_mean, x_std, x_covar = fitKCA(t, z, q)
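To see what the call returns, a minimal follow-up sketch (it assumes the variables from the snippet above and that pykalman is installed; per the docstring the first state column is the smoothed position) could plot the smoothed series against the raw prices:
import matplotlib.pyplot as plt

plt.plot(t, z, label='Open price')
plt.plot(t, x_mean[:, 0], label='smoothed position (KCA)')
# shade one standard deviation around the smoothed position
plt.fill_between(t, x_mean[:, 0] - x_std[:, 0], x_mean[:, 0] + x_std[:, 0], alpha=0.3)
plt.legend()
plt.show()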

General way to quantize floating point numbers into arbitrary number of bins?

I want to quantize a series of numbers, which have a maximum and minimum value of X and Y respectively, into an arbitrary number of bins. For instance, if the maximum value of my array is 65535 and the minimum is 0 (do not assume these are all integers) and I want to quantize the values into 2 bins, all values more than floor(65535/2) would become 65535 and the rest would become 0. The same applies if I want to quantize the array into any number of bins between 1 and 65535. I wonder, is there an efficient and easy way to do this? If not, how can I do this efficiently when the number of bins is a power of 2? Pseudocode would be fine, but Python + NumPy is preferred.
It's not the most elegant solution, but:
import numpy as np

MIN_VALUE = 0
MAX_VALUE = 65535
NO_BINS = 2
# Create random dataset from the [0,65535] interval
numbers = np.random.randint(0,65535+1,100)
# Create bin edges
bins = np.arange(0,65535, (MAX_VALUE-MIN_VALUE)/NO_BINS)
# Assign each number to a bin (1-based bin indices)
digits = np.digitize(numbers, bins)
# Get bin values
_, bin_val = np.histogram(numbers, NO_BINS-1, range=(MIN_VALUE, MAX_VALUE))
# Change the values to the bin value
for iter_bin in range(1, NO_BINS+1):
    numbers[np.where(digits == iter_bin)] = bin_val[iter_bin-1]
UPDATE
Does the same job:
import pandas as pd
import numpy as np
# or bin_labels = [i*((MAX_VALUE - MIN_VALUE) / (NO_BINS-1)) for i in range(NO_BINS)]
_, bin_labels = np.histogram(numbers, NO_BINS-1, range=(MIN_VALUE, MAX_VALUE))
pd.cut(numbers, NO_BINS, right=False, labels=bin_labels)
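A more direct variant without pandas (just a sketch; quantize, n_bins, vmin, and vmax are names introduced here for illustration) maps each value to the nearest of n_bins evenly spaced levels:
import numpy as np

def quantize(values, n_bins, vmin=None, vmax=None):
    # map each value to the nearest of n_bins levels spread evenly over [vmin, vmax]
    values = np.asarray(values, dtype=float)
    vmin = values.min() if vmin is None else vmin
    vmax = values.max() if vmax is None else vmax
    levels = np.linspace(vmin, vmax, n_bins)   # representative level for each bin
    idx = np.round((values - vmin) / (vmax - vmin) * (n_bins - 1)).astype(int)
    return levels[idx]

# e.g. quantize([0, 1000, 40000, 65535], 2) gives [0., 0., 65535., 65535.]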

plotting high precision data

I have an array which contains error values as a function of two different quantities (alpha and eigRange).
I fill my array like this:
for j in range(n):
    for i in range(alphaLen):
        alpha = alpha_list[i]
        c = train.eig(xt_, yt_, m-j, m, alpha, "cpu")
        costListTrain[j, i] = cost.err(xt_, xt_, yt_, c)
normedValues = costListTrain/np.max(costListTrain.ravel())
where
n = 20
alpha_list = [0.0001,0.0003,0.0008,0.001,0.003,0.006,0.01,0.03,0.05]
My costListTrain array contains some values that have very small differences, e.g.:
2.809458902485728 2.809458905776425 2.809458913576337 2.809459011062461
2.030326752376704 2.030329906064879 2.030337351188699 2.030428976282031
1.919840839066182 1.919846470077076 1.919859731440199 1.920021453630778
1.858436351617677 1.858444223016128 1.858462730482461 1.858687054377165
1.475871326997542 1.475901926855846 1.475973476249240 1.476822830933632
1.475775410801635 1.475806023102173 1.475877601316863 1.476727286424228
1.475774284270633 1.475804896751524 1.475876475382906 1.476726165223209
1.463578292548192 1.463611627166494 1.463689466240788 1.464609083309240
1.462859608038034 1.462893157900139 1.462971489632478 1.463896516033939
1.461912706143012 1.461954067956570 1.462047793798572 1.463079574605320
1.450581041157659 1.452770209885761 1.454835202839513 1.459676311335618
1.450581041157643 1.452770209885764 1.454835202839484 1.459676311335624
1.450581041157651 1.452770209885735 1.454835202839484 1.459676311335610
1.450581041157597 1.452770209885784 1.454835202839503 1.459676311335620
1.450581041157575 1.452770209885757 1.454835202839496 1.459676311335619
1.450581041157716 1.452770209885711 1.454835202839499 1.459676311335613
1.450581041157667 1.452770209885744 1.454835202839509 1.459676311335625
1.450581041157649 1.452770209885750 1.454835202839476 1.459676311335617
1.450581041157655 1.452770209885708 1.454835202839442 1.459676311335622
1.450581041157571 1.452770209885700 1.454835202839498 1.459676311335622
As you can see here, the values are very, very close together!
I am trying to plot this data in a way where I have the two quantities on the x and y axes and the error value is represented by the dot color.
This is how I'm plotting my data:
alpha_list = np.log(alpha_list)
eigenvalues, alphaa = np.meshgrid(eigRange, alpha_list)
vMin = np.min(costListTrain)
vMax = np.max(costListTrain)
plt.scatter(x, y, s=70, c=normedValues, vmin=vMin, vmax=vMax, alpha=0.50)
but the result is not correct.
I tried to normalize my error values by dividing all of them by the max, but it didn't work!
The only way that I could make it work (which is incorrect) is to normalize my data in two different ways. One is based on each column (which means factor 1 is constant and factor 2 changes), and the other is based on each row (factor 2 is constant and factor 1 changes). But it doesn't really make sense, because I need a single plot to show the tradeoff between the two quantities on the error values.
UPDATE
This is what I mean by the last paragraph.
Normalizing values based on the max of each row, which corresponds to the eigenvalues:
maxsEigBasedTrain= np.amax(costListTrain.T,1)[:,np.newaxis]
maxsEigBasedTest= np.amax(costListTest.T,1)[:,np.newaxis]
normEigCostTrain=costListTrain.T/maxsEigBasedTrain
normEigCostTest=costListTest.T/maxsEigBasedTest
Normalizing values based on the max of each column, which corresponds to the alphas:
maxsAlphaBasedTrain= np.amax(costListTrain,1)[:,np.newaxis]
maxsAlphaBasedTest= np.amax(costListTest,1)[:,np.newaxis]
normAlphaCostTrain=costListTrain/maxsAlphaBasedTrain
normAlphaCostTest=costListTest/maxsAlphaBasedTest
Plot 1: (image)
Plot 2 (image): no. of eigenvalues = 10 and alpha changes (should correspond to column 10 of plot 1).
Plot 3 (image): alpha = 0.0001 and the eigenvalues change (should correspond to the first row of plot 1).
But as you can see, the results are different from plot 1!
UPDATE:
Just to clarify things further, this is how I read my data:
import numpy as np
from sklearn import datasets
from sklearn.datasets.samples_generator import make_regression

rng = np.random.RandomState(0)
diabetes = datasets.load_diabetes()
X_diabetes, y_diabetes = diabetes.data, diabetes.target
X_diabetes=np.c_[np.ones(len(X_diabetes)),X_diabetes]
ind = np.arange(X_diabetes.shape[0])
rng.shuffle(ind)
#===============================================================================
# Split Data
#===============================================================================
import math
cross= math.ceil(0.7*len(X_diabetes))
ind_train = ind[:cross]
X_train, y_train = X_diabetes[ind_train], y_diabetes[ind_train]
ind_val=ind[cross:]
X_val,y_val= X_diabetes[ind_val], y_diabetes[ind_val]
I also uploaded the .csv files HERE:
log.csv contains the original values before normalization for plot 1
normalizedLog.csv for plot 1
eigenConst.csv for plot 2
alphaConst.csv for plot 3
I think I found the answer. First of all, there was one problem in my code: I was expecting the "no. of eigenvalues" to correspond to rows, but in my for loop they fill the columns. The correct version is this:
for i in range(alphaLen):
    for j in range(n):
        alpha = alpha_list[i]
        c = train.eig(xt_, yt_, m-j, m, alpha, "cpu")
        costListTrain[i,j] = cost.err(xt_, xt_, yt_, c)
        costListTest[i,j] = cost.err(xt_, xv_, yv_, c)
After asking questions of friends and colleagues, I got this answer:
I would assume that, by default, imshow and the other plotting commands you might want to use do equally sized intervals on the values you are plotting. If you can set that to logarithmic, you should be fine. Ideally, equally "populated" bins would prove most effective, I guess.
For plotting, I just subtract the min value from the errors, then add a small number, and at the end take the log:
temp = costListTrain - costListTrain.min()
temp += 0.00000001
extent = [0, 20, alpha_list[0], alpha_list[-1]]
plt.imshow(np.log(temp), interpolation="nearest", cmap=plt.get_cmap('spectral'), extent=extent, origin="lower")
plt.colorbar()
and the result is: (image)
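An equivalent way to get the same effect (a sketch; it assumes matplotlib's LogNorm and the shifted temp array from above) is to let the color normalization do the log scaling, which keeps the colorbar in the original units:
from matplotlib.colors import LogNorm
import matplotlib.pyplot as plt

plt.imshow(temp, interpolation="nearest", origin="lower", extent=extent,
           norm=LogNorm(vmin=temp.min(), vmax=temp.max()))
plt.colorbar()
plt.show()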
