Storing data with Python arrays (HAR-RV Credit Risk model implementation) - python

This is my first time here on Stack Overflow, and I'm in my first days with Python.
I'm working with the HAR-RV model and trying to compute the equation below, but I can't manage to store the results of my operations in an array.
Here is what I'm trying to calculate:
r_[t,i] = Y_[t,i] - Y_[t,i-1]
https://cdn1.imggmi.com/uploads/2019/8/30/299a4ab026de7db33c4222b30f3ed70a-full.png
I'm using the first relation here, where "r" is the return and "Y" is the stock price.
t = 10 # daily intervals
i = 30 # number of days
s = 1
# (Here I've just created some fake numbers, intending to simulate some stock prices)
Y = abs(np.random.random((10,30)) + 1)
# initializing my return array
return = np.array([])
# (I also tried to initialize it as a Matrix before..) :
return = np.zeros((10,30))
# here is my for loop to store each daily return at its position on the "return" Array. I wanted an Array but got just "() size"
for t in range(0,9):
    for i in range(1,29):
        return = np.array( Y.item((t,i)) - Y.item((t,i-1)) )
... so, I was expecting something like this:
return = [first difference, second difference, third difference...]
how can I do that ?

First, don't use return as a variable name in Python: it is a reserved keyword (it hands a value back from a function), so an assignment such as return = ... is a syntax error. I have renamed your variable return to ret_val.
You want to assign to each position in your array, so change your for loop as follows:
ret_val = np.zeros((10,30)) # same shape as Y
for t in range(0,10):
    for i in range(1,30):
        ret_val[t][i] = Y[t][i] - Y[t][i-1]
print(ret_val)
This assigns to ret_val[t][i] the difference between the values at positions i and i-1 in row t of Y. When you print ret_val you should see an array of the same shape as Y (if ret_val was initialized with np.zeros, column 0 stays at zero because it has no previous value).
Also, the range function in Python does not include the upper bound. So for t in range(0,9) covers only 0 through 8. To cover every row of your array, use for t in range(0,10); likewise, use for i in range(1,30) to reach the last column.
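For what it's worth, the same differences can probably be computed without any explicit loop. The sketch below is not from the original answer: it uses np.diff along axis 1 and compares it with the loop version on fake prices. Note that the result has one column fewer than Y (29 instead of 30), because the first price in each row has no predecessor.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((10, 30)) + 1          # fake prices, as in the question

# Loop version, as in the answer above
ret_loop = np.zeros(Y.shape)
for t in range(Y.shape[0]):
    for i in range(1, Y.shape[1]):
        ret_loop[t, i] = Y[t, i] - Y[t, i - 1]

# Vectorized version: differences between neighbours along each row
ret_val = np.diff(Y, axis=1)          # shape (10, 29)
```

Here ret_val[t, i-1] holds the same number as ret_loop[t, i].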

Related

how to change a array of a single column to a row of values instead of arrays in python

I am trying to convert an array of arrays that each contain only one integer to a single array with just the integers.
My code is below. After the first for loop k is 1, so the np.delete call removes every row except the first one, and the result is then transposed.
handles.Background = np.zeros(((len(imgY) * len(imgX)),len(imgZ)))
WhereIsBackground = np.zeros((len(imgY), len(imgX)))
k = 0
for i in range(len(imgY)):
    for j in range(len(imgX)):
        if img[i,j,handles.PS_Index] < (handles.PS_Mean_Intensity / 8):
            handles.Background[k,:] = img[i,j,:]
            WhereIsBackground[i,j] = 1
            k = k+1
handles.Background = np.delete(handles.Background,np.s_[k:(len(imgY)*len(imgX))+1],0).T
At this point, I can access data by using handles.Background[n] but this returns an array that contains a single integer. I was trying to convert the handles.Background so that when I do handles.Background[n], it just returns a single integer instead of an array containing that value.
So, I'm getting array([0.]) when I run handles.Background[0], but I want to get just 0 when I run handles.Background[0]
I've observed that int(handles.Background[i]) returns an integer and tried to reassign them using a for loop but the result didn't really change. What would be the best option for me?
for i in range(len(handles.Background)):
    handles.Background[i] = int(handles.Background[i])
If handles.Background[n] returns an array, you can index into that as well, using the same bracket notation.
So you are looking for
handles.Background[n][0]
If you want to unpack the whole array at once, you can use this:
handles.Background = [bg[0] for bg in handles.Background]
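If the result should stay a NumPy array rather than a Python list, flattening is another option. This is only a sketch, assuming handles.Background ends up with shape (n, 1); the name background and its values are made up for illustration. It also hints at why the reassignment loop in the question changed nothing: writing a number into a row of a 2-D array cannot change the array's shape, so each row is still a 1-element array afterwards.

```python
import numpy as np

# Stand-in for handles.Background after the delete/transpose step:
# shape (n, 1), so indexing a row yields a 1-element array.
background = np.array([[0.0], [3.0], [7.0]])

row = background[0]            # array([0.]), still an array
flat = background.ravel()      # shape (3,), one scalar per index
first = flat[0]                # a NumPy float scalar, not an array
```

After this, flat[n] returns a plain scalar for every n.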

iteration over every element of 1d array in python

From an nc file I've read variables that come in the form of arrays. I've performed a calculation with the first element of each of these variables and created a new variable from it. Now I want to repeat the same set of calculations for each element of the initial arrays, without changing the calculation code, which I wrote with a single point in mind.
I've tried zip and nditer, but in both cases the if statement on variable a would have to be changed to use .any() or .all(). I can't do either, because I want the if statement to consider only a single point and not the entire array.
T = AD06_ALL_OMNI.variables['A_TEMP'][:][0]
REL_HUM = AD06_ALL_OMNI.variables['HUMIDITY'][:][0]
AIR_PRES = AD06_ALL_OMNI.variables['A_PRES'][:][0]
a = T-29.65
#masking of values so that division by 0 is avoided
if a!=0.0:
    exponent1 = math.exp(17.67*T-0.16/a)
    q = REL_HUM*exponent1/(26.3*AIR_PRES)
    deltaq = 0.98*qs-q
    print (deltaq)
I need a to be computed for each point so that deltaq is found out for the same point taking values of T, REL_HUM, and AIR_PRES from the corresponding points. All variables are of same size (1d arrays). Please help!
# read the full 1-D arrays once (no [0] index, so every point is kept)
T = AD06_ALL_OMNI.variables['A_TEMP'][:]
REL_HUM = AD06_ALL_OMNI.variables['HUMIDITY'][:]
AIR_PRES = AD06_ALL_OMNI.variables['A_PRES'][:]
a = T-29.65
#masking of values so that division by 0 is avoided
count = 0
for element in a:
    if element!=0.0:
        exponent1 = math.exp(17.67*T[count] -0.16/element)
        q = REL_HUM[count]*exponent1/(26.3*AIR_PRES[count] )
        deltaq = 0.98*qs-q
        print (deltaq)
    count = count + 1
Assuming all the arrays have the same length (it wouldn't make sense to have air pressure, air temperature and humidity series of different lengths), you can loop over all the values of a, check each one for 0, and calculate and print deltaq for that point. I hope this helps.
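An alternative that avoids the explicit loop is a boolean mask, which applies the "a is not zero" test per point rather than to the whole array. The sketch below is not from the original answer. The sample arrays are made up (and kept small only so that exp stays in range with the formula exactly as written in the question), and qs is a hypothetical value because the question never defines it.

```python
import numpy as np

# Made-up numbers standing in for the netCDF variables
T = np.array([1.0, 29.65, 2.0])
REL_HUM = np.array([80.0, 75.0, 60.0])
AIR_PRES = np.array([1010.0, 1005.0, 1000.0])
qs = 0.02                      # hypothetical; not defined in the question

a = T - 29.65
valid = a != 0.0               # True where the division is safe

deltaq = np.full(T.shape, np.nan)          # NaN marks the skipped points
exponent1 = np.exp(17.67 * T[valid] - 0.16 / a[valid])
q = REL_HUM[valid] * exponent1 / (26.3 * AIR_PRES[valid])
deltaq[valid] = 0.98 * qs - q
```

Each entry of deltaq lines up with the same index in T, REL_HUM and AIR_PRES, and the point where a is zero is left as NaN instead of raising a division error.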

create loop using values from an array

I have an array D of variable length.
I want to create a loop that performs a sum using the value of D that corresponds to the current iteration,
i.e. the 5th run through the loop would use the 5th value in my array.
My code is:
period = 63 # can be edited to an input() command for variable periods.
Mrgn_dec = .10 # decimal value of 10%, can be manipulated to produce a 10% increase/decrease
rtn_annual = np.arange(0.00,0.15,0.05) # creates an array ??? not sure if helpful
sig_annual = np.arange(0.01,0.31,0.01) #use .31 as python doesnt include the upper range value.
#functions for variables of daily return and risk.
rtn_daily = (1/252)*rtn_annual
sig_daily = (1/(np.sqrt(252)))*sig_annual
D=np.random.normal(size=period) # unsure of range to use for standard distribution
for i in range(period):
    r=(rtn_daily+sig_daily*D)
I want each pass of the for loop to use the value of D for that step.
D holds one random value for every value of period, where each period represents a day.
So for the 8th day I want r to be computed with the 8th value in my array. Is there a way to select that specific value?
Would numpy.cumprod help here? I'm not sure how it works, but it has been suggested as a way to solve the problem.
You can select an element of an indexable object (such as D in your code) simply by its index. For example:
for i in range(period):
    print(D[i])
But in your code, rtn_daily and sig_daily do not have the same shape. I assume that you want to add sig_daily multiplied by D[i] at each position of rtn_daily, so try this:
# -*- coding:utf-8 -*-
import numpy as np
period = 63 # can be edited to an input() command for variable periods.
Mrgn_dec = .10 # decimal value of 10%, can be manipulated to produce a 10% increase/decrease
rtn_annual = np.repeat(np.arange(0.00,0.15,0.05), 31) # creates an array ??? not sure if helpful
sig_annual = np.repeat(np.arange(0.01,0.31,0.01), 3) #use .31 as python doesnt include the upper range value.
#functions for variables of daily return and risk.
rtn_daily = (float(1)/252)*rtn_annual
sig_daily = (1/(np.sqrt(252)))*sig_annual
D=np.random.normal(size=period) # unsure of range to use for standard distribution
print(D)
for i in range(period):
    r=(rtn_daily[i]+sig_daily[i]*D[i])
    print(r)
Finally, if you are using Python 2, / between two integers performs integer division, so 1/252 evaluates to 0:
a = 1/252  # --> 0
To avoid this, make one of the operands a float:
rtn_daily = (float(1)/252)*rtn_annual
Right now, D is just a scalar.
I'd suggest reading https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.normal.html to learn about the parameters.
If you change it to:
D=np.random.normal(mean,stdev,period)
you will get a 1D array with period number of samples, where mean and stdev are your mean and standard deviation of the distribution. Then you change the loop to:
for i in range(period):
    r=(rtn_daily+sig_daily*D[i])
EDIT: I don't know what I was thinking when I read the code the first time. It was a horribly bad read on my part.
Looking back at the code, a few things need to happen to make it work.
First:
rtn_annual = np.arange(0.00,0.15,0.05)
sig_annual = np.arange(0.01,0.31,0.01)
These two lines need to be fixed so that the dimensions of the resulting arrays are the same.
Then:
rtn_daily = (1/252)*rtn_annual
Needs to be changed so it doesn't zero everything out (in Python 2, 1/252 is integer division): either change 1 to 1.0 or use float(1)
Finally:
r=(rtn_daily+sig_daily*D)
needs to be changed to:
r=(rtn_daily+sig_daily*D[i])
I'm not really sure of the intent of the original code, but it appears as though the loop is unnecessary and you could just change the loop to:
r=(rtn_daily+sig_daily*D[day])
where day is the day you're trying to isolate.
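Since the question also asks about numpy.cumprod, here is a rough sketch of how the pieces might fit together without a loop. It makes two assumptions that are not in the original post: a single (return, volatility) pair of 5% and 20% per year is picked purely for illustration, and the goal is taken to be compounding the daily returns over the period.

```python
import numpy as np

period = 63
rng = np.random.default_rng(0)
D = rng.normal(size=period)            # one standard-normal draw per day

# One hypothetical (return, volatility) pair, chosen only for illustration
rtn_daily = 0.05 / 252
sig_daily = 0.20 / np.sqrt(252)

r = rtn_daily + sig_daily * D          # r[i] uses D[i]; shape (63,)
growth = np.cumprod(1.0 + r)           # compounded growth factor after each day
```

r[7] is then the return for the 8th day, and growth[7] is the product of (1 + r) over the first eight days.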

ValueError: could not broadcast input array when assigning values to numpy array

I'm pretty new with python and numpy in general so I don't exactly know what's going on. I have a lot of data stored in variable X where X.shape = (N,1).
What I have looks like this:
X = np.genfromtxt("data.txt",delimiter=',');
# Running X.shape gives me (45000L,1L)
Yhat = np.zeros((X.size,25))
# Running Yhat.shape gives me (45000L,25L)
learner = [None] * 25
for i in range(0,25):
    learner[i] = treeRegress(X,dY)
    Yhat[:,i] = learner[i].predict(X)
    dY = dY * Yhat[:,i] * i
So basically, dY changes on each iteration of the loop, and each index of learner gets a new result. My predict function returns one prediction for each value in X, and I want those predictions stored in column i of Yhat.
Currently, this gives me an error:
Yhat[:,i] = learner[i].predict(X)
ValueError: could not broadcast input array from shape (45000,1) into shape (45000)
Doing learner[i].predict(X).shape returns (45000,1) and doing Yhat[:,i].shape returns (45000,). So I understand why I can't do this. Unfortunately, if I use:
learner[i] = treeRegress(X,dY)
Yhat[:,i] = np.ravel(learner[i].predict(X))
This actually uses all of my memory and crashes my computer, so I can't exactly do that.
Also, the reason I need to have it stored in a column is because eventually I want to compute the sum in intervals like so:
for i in [1,5,10,25]:
    ysum = sum(Yhat[:,0:i])
    print(ysum)
So essentially I want to be able to iterate through certain intervals of columns. So, what am I doing wrong? Or is there some other way I need to structure these arrays? How should I be assigning the data into the column I want without everything around me crashing and burning?
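One possible explanation, offered only as a guess because the shape of dY is not shown: if dY has shape (N, 1) like X, then dY * Yhat[:,i] multiplies an (N, 1) array by an (N,) array, and NumPy broadcasting turns that into an (N, N) result. With N = 45000 that is 45000 x 45000 x 8 bytes, roughly 16 GB of float64, which would exhaust memory regardless of the np.ravel call. The sketch below reproduces the shapes with a small N; all arrays in it are stand-ins.

```python
import numpy as np

N = 5                                   # small stand-in for 45000
pred = np.arange(N, dtype=float).reshape(N, 1)   # like predict(X): shape (N, 1)
Yhat = np.zeros((N, 25))

Yhat[:, 0] = pred[:, 0]                 # take column 0: shape (N,), fits the slot

dY = np.ones((N, 1))                    # assumed shape, mirroring X
blown_up = dY * Yhat[:, 0]              # (N, 1) * (N,) broadcasts to (N, N)
kept_flat = dY[:, 0] * Yhat[:, 0]       # (N,) * (N,) stays (N,)
```

If that is what is happening, keeping dY one-dimensional (or reshaping Yhat[:,i] to (N, 1) before multiplying) should keep the product at N elements.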

Making histogram out of matrix entries?

Today my task is to make a histogram to represent the operation of A^n where A is a matrix, but only for specific entries in the matrix.
For example, say I have a matrix where the rows sum to one. The first entry is some specific decimal number. However, if I raise that matrix to the 2nd power, that first entry becomes something else, and if I raise that matrix to the 3rd power, it changes again, and so on ad nauseam. That sequence of values is what I need to plot.
Right now my attempt is to create an empty list, and then use a for loop to add the entries that result from matrix multiplication to the list. However, the list only ever contains the result of the final matrix multiplication, rather than the value at each iteration.
Here's the specific bit of code that I'm talking about:
print("The initial probability matrix.")
print(tabulate(matrix))
baseprob = []
for i in range(1000):
    matrix_n = numpy.linalg.matrix_power(matrix, s)
    baseprob.append(matrix_n.item(0))
print(baseprob)
print("The final probability matrix.")
print(tabulate(matrix_n))
Here is the full code, as well as the output I got.
http://pastebin.com/EkfQX2Hu
Of course it only prints the final value: you are doing the same operation, matrix^s, 1000 times. You need the power to change on each of those iterations.
If you want the value at location matrix(0) of matrix^i for every i from 1 to s (your final power), do:
baseprob = []
for i in range(1, s+1): # powers 1..s instead of 1000 repeats (range excludes its upper bound)
    # must use the loop variable here, not s (s is always the same)
    matrix_n = numpy.linalg.matrix_power(matrix, i)
    baseprob.append(matrix_n.item(0))
Then baseprob will hold matrix(0) for matrix^1, matrix^2, etc. all the way to matrix^s.
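A variation on the same idea, sketched here with a made-up 2x2 matrix and s = 5 (neither is from the original post): instead of recomputing matrix^i from scratch on every pass, keep a running product and multiply by the matrix once per step.

```python
import numpy as np

# Hypothetical 2x2 transition matrix whose rows sum to one
matrix = np.array([[0.9, 0.1],
                   [0.5, 0.5]])
s = 5                                   # highest power to record

baseprob = []
matrix_n = np.eye(2)                    # matrix^0, the identity
for _ in range(s):
    matrix_n = matrix_n @ matrix        # now matrix^1, matrix^2, ...
    baseprob.append(matrix_n[0, 0])     # first entry at this power
```

baseprob then has s entries, one per power, and can be passed straight to a plotting call.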
