Changing the values of a matrix above a threshold in Python

I have a matrix:
matrix = np.array([[[0,0.5,0.6],[0.9,1.2,0]],[[0,0.5,0.6],[0.9,1.2,0]]])
I want to replace every value x with 0.55 < x < 0.95 by 0.55.
PS: My question is similar to this question, but the answer there does not work in my case.

You can use np.where:
matrix = np.array([[[0,0.5,0.6],[0.9,1.2,0]],[[0,0.5,0.6],[0.9,1.2,0]]])
matrix[np.where((matrix > 0.55) & (matrix < 0.95))] = 0.55
# Or
# matrix[(matrix > 0.55) & (matrix < 0.95)] = 0.55
Output:
>>> matrix
array([[[0.  , 0.5 , 0.55],
        [0.55, 1.2 , 0.  ]],

       [[0.  , 0.5 , 0.55],
        [0.55, 1.2 , 0.  ]]])
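If the original array should stay untouched, the same mask can also be passed to the three-argument form of np.where, which returns a new array instead of assigning in place. A minimal sketch, where the names mask and result are only illustrative:

```python
import numpy as np

matrix = np.array([[[0, 0.5, 0.6], [0.9, 1.2, 0]],
                   [[0, 0.5, 0.6], [0.9, 1.2, 0]]])

# Boolean mask of the entries strictly between the two thresholds.
mask = (matrix > 0.55) & (matrix < 0.95)

# Pick 0.55 where the mask is True and the original value elsewhere.
# `matrix` itself is not modified.
result = np.where(mask, 0.55, matrix)
```

The in-place version in the answer is the better fit when a copy is not needed.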

Related

How to generate stepped distribution values based on min, max and likely values in Python?

The probability density function is defined by three parameters: Minimum, Median, and Maximum [Codling et al].
I need to generate distribution values (PDF) y = f(x) based on these three parameters. I know about numpy.heaviside, but couldn't use it properly.
Example: 7.5 is the min, 11.4 is the likely (median) value and 21.7 is the max.
What I expect:
The split point is 0.5, based on random samples drawn from 0 to 1: a sample below 0.5 has to map to a value in the range from min to likely, and a sample above 0.5 has to map to a value in the range from likely to max.
For instance, if sample = 0.35, then the value has to be between 7.5 and 11.4.
My attempt:
x = random.rand(size)
sample = []
for s in x:
    if s > 0.5:
        y = 2*(s-0.5)*(max-med)
        sample.append(y)
    else:
        y = 2*s*(med-min)
        sample.append(y)
But the result never reaches the min value or the max value.
Codling et al., Probabilistic Well Time Estimation Using Operations Reporting Data
Are you looking to define a piecewise constant function?
You could do this by combining several np.heaviside functions:
def pdf1(x, minimum, median, maximum):
    h = np.heaviside
    return (
        h(x - minimum, 0) * h(-(x - median), 0) / (median - minimum) / 2
        + h(x - median, 0) * h(-(x - maximum), 0) / (maximum - median) / 2
    )
You could also use np.piecewise:
def pdf2(x, minimum, median, maximum):
    return np.piecewise(
        x,
        [(minimum <= x) * (x < median), (median <= x) * (x < maximum)],
        [1 / (median - minimum) / 2, 1 / (maximum - median) / 2]
    )
Example:
>>> x = np.linspace(-3, 3, 20)
>>> minimum = -2
>>> median = -1
>>> maximum = 2
>>> pdf1(x, minimum, median, maximum)
[0. 0. 0. 0. 0.5 0.5
0.5 0.16666667 0.16666667 0.16666667 0.16666667 0.16666667
0.16666667 0.16666667 0.16666667 0.16666667 0. 0.
0. 0. ]
>>> pdf2(x, minimum, median, maximum)
[0. 0. 0. 0. 0.5 0.5
0.5 0.16666667 0.16666667 0.16666667 0.16666667 0.16666667
0.16666667 0.16666667 0.16666667 0.16666667 0. 0.
0. 0. ]
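The functions above evaluate the density. The sampling loop in the question is a separate issue: it scales each uniform sample by the width of its segment but never adds the segment's starting point, so the results begin at 0 rather than at the minimum. A minimal sketch of the same idea with the offsets added and the loop vectorized (the function name sample_stepped and its rng argument are assumptions, not part of the original post):

```python
import numpy as np

def sample_stepped(size, minimum, median, maximum, rng=None):
    """Draw samples whose lower half lies in [minimum, median)
    and whose upper half lies in [median, maximum)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(size)  # uniform samples in [0, 1)

    # u in [0, 0.5) maps linearly onto [minimum, median)
    lower = minimum + 2.0 * u * (median - minimum)
    # u in [0.5, 1) maps linearly onto [median, maximum)
    upper = median + 2.0 * (u - 0.5) * (maximum - median)

    return np.where(u < 0.5, lower, upper)

samples = sample_stepped(100_000, 7.5, 11.4, 21.7, rng=np.random.default_rng(0))
```

With the numbers from the question, every sample should fall between 7.5 and 21.7, with roughly half of them below 11.4.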

Numpy dot product for group of rows

I am trying to calculate a dot product between two matrices, one pair of rows at a time.
I have matrix D with (u x 2) dimensions and matrix R with (u*2 x c) dimensions.
An example is below:
D = np.array([[0.02747092, 0.11233295],
              [0.02747092, 0.07295284],
              [0.01245856, 0.19935923],
              [0.01245856, 0.13520913],
              [0.11233295, 0.07295284]])
R = np.array([[-3.        ,  0.        ,  1.        , -1.        ],
              [-1.25      ,  0.75      ,  1.75      , -1.25      ],
              [-2.33333333, -0.33333333,  1.66666667, -1.33333333],
              [-1.25      ,  0.75      ,  1.75      , -1.25      ],
              [ 0.        , -2.        ,  2.        , -4.        ],
              [-1.25      ,  0.75      ,  1.75      , -1.25      ],
              [ 0.66666667, -3.33333333,  2.66666667, -4.33333333],
              [-1.25      ,  0.75      ,  1.75      , -1.25      ],
              [-2.33333333, -0.33333333,  1.66666667, -1.33333333],
              [-3.        ,  0.        ,  1.        , -1.        ]])
The result should be matrix M with dimensions (u x c) as follows (example of first row):
M = np.array([[-0.2185, 0.0825, 0.2195, -0.1645],
[...]])
Which is result of dot product between the first row of D and first two rows of matrix R as such:
D_ = np.array([[0.027, 0.11]])
R_ = np.array([[-3., 0., 1., -1.],
[-1.25, 0.75, 1.75, -1.25]])
D_.dot(R_)
I tried various ways of using np.tensordot after reshaping the D matrix into a tensor, but without any luck. I am looking for a vectorized solution that avoids loops (my current solution uses them and is quite slow).
Reshape R to 3D and use np.einsum -
np.einsum('ijk,ij->ik',R.reshape(len(D),2,-1),D)
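To see why the reshape works: R.reshape(len(D), 2, -1) groups every two consecutive rows of R into one (2 x c) block, and the subscripts 'ijk,ij->ik' sum over the shared axis j for each block i. A small self-contained check against the explicit loop, using random data of the same shapes as in the question (the names M_einsum and M_loop are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
u, c = 5, 4
D = rng.random((u, 2))      # one weight pair per output row
R = rng.random((2 * u, c))  # two consecutive rows of R belong to each row of D

# Vectorized: block i of R (shape 2 x c) is contracted with row i of D.
M_einsum = np.einsum('ijk,ij->ik', R.reshape(len(D), 2, -1), D)

# Reference: the explicit per-row dot product the question describes.
M_loop = np.stack([D[i] @ R[2 * i:2 * i + 2] for i in range(u)])

print(np.allclose(M_einsum, M_loop))
```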

Drop NaN in a for loop for each column (Longstaff Schwartz Monte Carlo)

I have two DataFrames, Df1 and Df2.
Each of them has 3 columns and 4 rows.
For each column I fit a quadratic function with np.polyfit:
M=3
for t in range(M-1,0,-1):
    regs = np.polyfit(Df1[:,t],Df2[:,t+1],2)
    C = np.polyval(regs,Df1[:,t])
But I want to use only the values that are smaller than 1.1:
Df1[Df1 < 1.1]
After this filter, Df1 looks like this:
[1. , 1.09, 1.08, NaN]
[1. , 1., 1.07, 1.04]
[1. , NaN, 1.01, NaN]
[1. , 0.78, NaN,0.95]
And my Df2 looks like
[0.1 , 0., 0.08, 0.]
[0.1 , 0.11, 0., 0.09]
[0.1 , 0.33, 0.22, 0.]
[0.1 , 0.09, 0.108, 0.]
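For each column of Df1, I want to leave out the rows where Df1 is NaN, together with the matching rows of Df2, before fitting.
For example, for the third column of Df1 (paired with the fourth column of Df2), the fit should use only:
X =[1.08,1.07,1.01]
Y =[0.,0.09,0]
Here is what I tried: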
S = [[1.,1.09,1.08,1.34],[1.,1.16,1.26,1.54],[1.,1.22,1.07,1.03],[1.,0.93,0.97,0.92],[1.,1.11,1.56,1.52],
     [1.,0.76,0.77,0.9],[1.,0.92,0.84,1.01],[1.,0.88,1.22,1.34]]
K= 1.1
Sn = np.asarray(S)
r = 0.06
T=1
M=3
dt = T/M
h= np.maximum(K-Sn,0)
V = np.copy(h)
disk = np.exp(-r*dt)
for i in range(M-1,0,-1):
    reg = np.polyfit(Sn[:,i],V[:,i+1]*disk,2)
    C = np.polyval(reg,Sn[:,i])
    V[:,i] = np.where(C > h[:,i],V[:,i+1]*disk,h[:,i])
C0 = disk* 1/8 * np.sum(V[:,1])
My result for C0 is 0.11973...
This is the Longstaff-Schwartz Monte Carlo algorithm for pricing American options.
But in the paper by Longstaff and Schwartz, they get a slightly different result:
https://people.math.ethz.ch/~hjfurrer/teaching/LongstaffSchwartzAmericanOptionsLeastSquareMonteCarlo.pdf
(page 120)
They get 0.114, but I can't find my mistake.
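One way to do the per-column filtering without creating NaNs is a boolean mask: pick the in-the-money rows of the current column and pass only those rows to np.polyfit. The sketch below applies that to the code above. It rests on two assumptions about the paper's worked example that are worth checking against the PDF: the regression uses only the in-the-money paths, and each of the three steps is discounted by exp(-0.06) (so dt = 1 rather than T/M = 1/3). The variable names itm, cont and idx are illustrative.

```python
import numpy as np

# Sample paths copied from the question (8 paths, times 0..3).
S = np.array([[1.00, 1.09, 1.08, 1.34],
              [1.00, 1.16, 1.26, 1.54],
              [1.00, 1.22, 1.07, 1.03],
              [1.00, 0.93, 0.97, 0.92],
              [1.00, 1.11, 1.56, 1.52],
              [1.00, 0.76, 0.77, 0.90],
              [1.00, 0.92, 0.84, 1.01],
              [1.00, 0.88, 1.22, 1.34]])
K = 1.1
r = 0.06
dt = 1.0                      # assumption: one full period per step
M = S.shape[1] - 1
disc = np.exp(-r * dt)

h = np.maximum(K - S, 0.0)    # immediate exercise value of the put
V = h.copy()                  # V[:, M] is the payoff at expiry

for i in range(M - 1, 0, -1):
    Y = V[:, i + 1] * disc    # discounted value of holding, per path
    V[:, i] = Y               # default decision: keep holding
    itm = S[:, i] < K         # mask instead of NaN filtering
    if itm.sum() >= 3:        # a quadratic fit needs at least 3 points
        coeffs = np.polyfit(S[itm, i], Y[itm], 2)
        cont = np.polyval(coeffs, S[itm, i])
        # Exercise only on in-the-money paths where exercising beats continuing.
        idx = np.flatnonzero(itm)[h[itm, i] > cont]
        V[idx, i] = h[idx, i]

C0 = disc * V[:, 1].mean()
print(C0)
```

Under these two assumptions the sketch should land close to the 0.114 quoted from the paper. Which of the two differences matters more for the 0.11973 result is not something the question settles.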

Python: appending to numpy array at certain indexes and change shape

I have a numpy array like this:
print(pred_galactic_prob.shape)
print(pred_galactic_prob[0:3])
(465, 5)
[[0.05 0.94 0.3 0.01 0.5 ]
[0.01 0.02 0.01 0.85 0.11]
[0.03 0.95 0.3 0.3 0.02]]
I want to append to this and change the shape so there are 13 columns and it would look like this:
[[0.05 0. 0.94 0. 0. 0.3 0. 0. 0.01 0. 0. 0. 0.5 ]
[0.01 0. 0.02 0. 0. 0.01 0. 0. 0.85 0. 0. 0. 0.11]
[0.03 0. 0.95 0. 0. 0.3 0. 0. 0.3 0. 0. 0. 0.02]]
That is, one column of zeros is added after the first column, two columns of zeros are added after the second column, and so on, as shown above.
I have tried the following:
pred_galactic_prob2 = np.array
for i in pred_galactic_prob:
    pred_galactic_prob2 = np.append(pred_galactic_prob2, [i[0], 0.0, i[1], 0.0, 0.0, i[2], 0.0, 0.0, i[3], 0.0, 0.0, 0.0, i[4]])
but this just turns it into a 1D array.
A "one-line" solution would be
np.concatenate((a[:,:1],
                np.lib.stride_tricks.as_strided(0,[len(a),1],[0,0]),
                a[:,1:2],
                np.lib.stride_tricks.as_strided(0,[len(a),2],[0,0]),
                a[:,2:3],
                np.lib.stride_tricks.as_strided(0,[len(a),2],[0,0]),
                a[:,3:4],
                np.lib.stride_tricks.as_strided(0,[len(a),3],[0,0]),
                a[:,4:]), -1)
Though it's weird in every sense. Using append would need even more as_strided calls. I believe there should be an append-like function that automatically broadcasts its input, but I'm not sure what it is. Anyway, a better solution is definitely the one @hpaulj mentioned:
b = np.zeros((len(a), 13), a.dtype)
b[:,[0,2,5,8,12]] = a
Here a is the input array and b is the output.
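A self-contained version of that approach, using the three sample rows from the question (the name cols is only illustrative):

```python
import numpy as np

a = np.array([[0.05, 0.94, 0.3, 0.01, 0.5],
              [0.01, 0.02, 0.01, 0.85, 0.11],
              [0.03, 0.95, 0.3, 0.3, 0.02]])

# Target positions of the five original columns in the 13-column result.
cols = [0, 2, 5, 8, 12]

# Start from all zeros, then drop the original columns into place.
b = np.zeros((len(a), 13), dtype=a.dtype)
b[:, cols] = a

print(b.shape)
```

Allocating the full result once and assigning into it avoids the repeated copying that np.append does on every call.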

Pandas Multi-Index DataFrame to Numpy Ndarray

I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
I would like the resulting numpy.ndarray to be the following with np.shape() = (2,2,4):
[[[ 0.0 0.0 0.8 0.2 ]
[ 0.1 0.0 0.9 0.0 ]]
[[ 0.0 0.0 0.9 0.1 ]
[ 0.0 0.0 1.0 0.0]]]
I have tried df.as_matrix() but this returns:
[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]
[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]
How do I return a list of lists for the first level, with each list representing the records of one Action?
You could use the following:
dim1 = len(df.index.get_level_values(0).unique())
# -1 lets NumPy infer the size of the second level
result = df.values.reshape((dim1, -1, df.shape[1]))
print(result)
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
The first line just finds the number of groups in the first index level, i.e. the groups you would otherwise get with groupby.
Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
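A runnable sketch of passing that dimensionality back to NumPy, with the DataFrame from the question rebuilt by hand. It assumes every combination of index levels is present, and it uses MultiIndex.levshape to supply the leading dimensions:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: 2 actions x 2 states, 4 value columns.
index = pd.MultiIndex.from_product([[1, 2], ["s1", "s2"]],
                                   names=["Action", "State"])
df = pd.DataFrame([[0.0, 0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.9, 0.0],
                   [0.0, 0.0, 0.9, 0.1],
                   [0.0, 0.0, 1.0, 0.0]],
                  index=index, columns=["s1", "s2", "s3", "s4"])

# levshape is (2, 2) here; append the number of columns as the last axis.
result = df.to_numpy().reshape(df.index.levshape + (df.shape[1],))

print(result.shape)
```

result[0] then holds the rows for Action 1 and result[1] the rows for Action 2.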
One way
In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2],
[0.1, 0.0, 0.9, 0.0]],
[[0.0, 0.0, 0.9, 0.1],
[0.0, 0.0, 1.0, 0.0]]], dtype=object)
Using Divakar's suggestion, np.reshape() worked:
>>> print(P)
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
>>> A = np.reshape(P.values, (2, 2, -1))
>>> print(A)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]
>>> np.shape(A)
(2, 2, 4)
Elaborating on Brad Solomon's answer: to get a slightly more generic solution that handles index levels of different sizes and an arbitrary number of levels, one could do something like this:
def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)
If df has missing sub-indexes reshape will not work. One way to add them would be (maybe there are better solutions):
def enforce_df_shape(df):
    try:
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        return df
    fulldf = pd.DataFrame(-1, columns=df.columns, index=ind)  # remove -1 to fill fulldf with nan
    fulldf.update(df)
    return fulldf
If you are just trying to pull out one column, say s1, and get an array with shape (2,2) you can use the .index.levshape like this:
x = df.s1.to_numpy().reshape(df.index.levshape)
This will give you a (2, 2) array containing the values of s1.
