How to calculate sigma_1 and sigma_2 with Covariance Matrix

How to calculate sigma_1 and sigma_2 with Covariance Matrix - python

I'm reading this article.
In the "Covariance matrix & SVD" section,
there are two \sigmas, which are \sigma_1 and \sigma_2.
Those values are 14.4 and 0.19, respectively.
How can I get these values?
I already calculated the covariance matrix with Numpy:
import numpy as np
a = np.array([[2.9, -1.5, 0.1, -1.0, 2.1, -4.0, -2.0, 2.2, 0.2, 2.0, 1.5, -2.5],
[4.0, -0.9, 0.0, -1.0, 3.0, -5.0, -3.5, 2.6, 1.0, 3.5, 1.0, -4.7]])
cov_mat = (a.shape[1] - 1) * np.cov(a)
print(cov_mat)
# b = np.std(a, axis=1)**0.5
b = (a.shape[1] - 1) * np.std(a, axis=1)**0.5
# b = np.std(cov_mat, axis=1)
# b = np.std(cov_mat, axis=1)**0.5
print(b)
The result is:
[[ 53.46 73.42]
[ 73.42 107.16]]
[15.98102431 19.0154037 ]
No matter what I do, I can't get 14.4 and 0.19.
Are they just wrong values?
Please help me. Thank you in advance.

Don't know why you "un-sampled' your covariance, but the original np.cov output is what you want to get eigenvalues of:
np.linalg.eigvalsh(np.cov(a))
Out[]: array([ 0.19403958, 14.4077786 ])

Related

Printing between two number ranges with a given step value

I am new to loops, and I am trying to iterate over all items in a list, and I need to generate the values between 0 and 2 with a given step value. I have tried to use the "range" function, but cannot get it to work.
The end result should look something like this (doesn't have to be in a pandas dataframe, just for illustrative purposes):
import pandas as pd
import numpy as np
data = {'range_0.5' : [0,0.5,1,1.5,2, np.nan, np.nan, np.nan, np.nan],
'range_0.25' : [0,0.25,0.5,0.75,1,1.25,1.5,1.75,2]}
df = pd.DataFrame(data)
df
Here is what I have tried:
import numpy
x = []
seq = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
for i in seq:
x = range(0, 2, i)
The following error is thrown:
TypeError Traceback (most recent call last)
Input In [10], in <cell line: 1>()
1 for i in seq:
----> 2 x = range(0, 2, i)
TypeError: 'float' object cannot be interpreted as an integer
How can I properly create my loop?

np.arange()
You can use numpy.arange() which supports floats as step values.
import numpy as np
for step in [0.5, 0.25]:
print([i for i in np.arange(0, 2, step))
Expected output:
[0.0, 0.5, 1.0, 1.5]
[0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
To include 2 just add the step value once again:
for step in [0.5, 0.25]:
print([i for i in np.arange(0, 2 + step, step)])
Expected output:
[0.0, 0.5, 1.0, 1.5, 2.0]
[0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
np.linspace()
Alternatively you can use np.linspace():
This has the ability to include the endpoint using endpoint=True;
for step in [0.5, 0.25]:
print([i for i in np.linspace(0, 2, int(2 // step) + 1, endpoint=True)])
Expected output:
[0.0, 0.5, 1.0, 1.5, 2.0]
[0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]

How to remove rows from numpy array if certain number of an element is present

I have a 2d numpy array that contains some numbers like:
data =
[[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, -1.0],
[-1.0, 3.2, 3.3, -1.0],
[-1.0, -1.0. -1.0, -1.0]]
I want to remove every row that contains the value -1.0 2 or more times, so I'm left with
data =
[[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, -1.0]]
I found this question which looks like it's very close to what I'm trying to do, but I can't quite figure out how I can rewrite that to fit my use case.

You can easily do it with this piece of code:
new_data = data[(data == -1).sum(axis=1) < 2]
Result:
>>> new_data
array([[ 1.1, 1.2, 1.3, 1.4],
[ 2.1, 2.2, 2.3, -1. ]])

def remove_rows(data, threshold):
mask = np.array([np.sum(row == -1) < threshold for row in data])
return data[mask]
This function will return a new array with no rows having -1's more than or equal to the threshold
You need to pass in a Numpy array for it to work.

Creating lists from data file

I have a pre-defined list that gives data in the form of (min, max, increment). for example:
[[0.0 1.0 0.1 #mass
1.0 5.0 1.0 #velocity
45.0 47.0 1.0 #angle in degrees
0.05 0.07 0.1 #drag coeff.
0.0 0.0 0.0 #x-position
0.0 0.0 0.0]] #y-postion
and this goes on a for a few more variables. Ideally I want to take each one in as an individual variable declaration and create a finite list of each value in the given range.
For example, mass would be:
m = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
this way I can utilize itertools.combinations((m, x, b,...), r) to create all possible combinations given the various possibilities of each variable.
Any suggestions?

Not sure about you list structure, if you do need to take slices you can use itertools.islice and store all lists in a dict:
from itertools import islice
l = iter([0.0, 1.0, 0.1, #mass
1.0, 5.0, 1.0,#velocity
45.0 ,47.0, 1.0, #angle in degrees
0.05, 0.07, 0.1, #drag coeff.
0.0, 0.0 ,0.0 ,#x-position
0.0 ,0.0, 0.0])#y-postion
d = {}
import numpy as np
for v in ("m","v","and","drg","x-p","y-p"): # put all "variable" names in order
start, stop , step = islice(l, None, 3)
# or use next()
# start, stop , step = next(l), next(l), next(l)
if stop > start: # make sure we have a step to take
# create key/value pairing
d[v] = np.arange(start, stop + 1,step)
else:
# add empty list for zero values
d[v] = []
print(d)
{'x-p': [], 'drg': array([ 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85,
0.95, 1.05]), 'and': array([ 45., 46., 47.]), 'v': array([ 1., 2., 3., 4., 5.]), 'y-p': [], 'm': array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ,
1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9])}
You can also create your own range that will take a float as a step:
def float_range(start=0, stop=None, step=1):
while start <= stop:
yield start
start += step
Then call it with list(start, stop,step), but you need to be careful when dealing with floats because of Floating Point Arithmetic: Issues and Limitations

You wrote the list as a flat list, with all numbers on the same level
[[0.0 1.0 0.1 1.0 5.0 1.0 45.0 47.0 1.0 ...]]
but it's possible you meant to write it as a nested list
[[0.0, 1.0, 0.1], [1.0, 5.0, 1.0], [45.0, 47.0, 1.0], ...]
so I'll show both solutions. Please let me know how your data/list is actually structured.
Python's range function doesn't support floats, but you can use NumPy's arange.
The try ... except part is for your unchanging values like 0.0 0.0 0.0 #x-position.
Flat list solution:
flat_list = [0.0, 1.0, 0.1,
1.0, 5.0, 1.0,
45.0, 47.0, 1.0,
0.05, 0.07, 0.1,
0.0, 0.0, 0.0,
0.0, 0.0, 0.0]
import numpy as np
incremented_lists = []
for i in range(0, len(flat_list), 3): # Step in threes
minimum, maximum, increment = flat_list[i:i+3]
try:
incremented_list = list(np.arange(minimum, maximum + increment, increment))
except ZeroDivisionError:
incremented_list = [minimum]
incremented_lists.append(incremented_list)
Nested list solution:
nested_list = [[0.0, 1.0, 0.1],
[1.0, 5.0, 1.0],
[45.0, 47.0, 1.0],
[0.05, 0.07, 0.1],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]]
import numpy as np
incremented_lists = []
for sub_list in nested_list:
minimum, maximum, increment = sub_list
try:
incremented_list = list(np.arange(minimum, maximum + increment, increment))
except ZeroDivisionError:
incremented_list = [minimum]
incremented_lists.append(incremented_list)
Running either of these with Python 2.7 or Python 3.3 gets this:
incremented_lists: [[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
[1.0, 2.0, 3.0, 4.0, 5.0],
[45.0, 46.0, 47.0],
[0.05, 0.15],
[0.0],
[0.0]]
The [0.05, 0.15] is probably undesirable, but I think your huge 0.1 increment for the drag coefficient is more likely a typo than something I should make the code handle. Please let me know if you would like the code to handle unnatural increments and avoid overshooting the maximum. One way to handle that would be to add incremented_list = [x for x in incremented_list if x <= maximum] right before incremented_lists.append(incremented_list), though I'm sure there's a cleaner way to do it.

I can't think of any existing format supporting your desired input -- with spaces as separator, newlines breaking sub-lists, and comments actually meaningful as you appear to desire the to define the sub-lists' names. So, I think you'll have to code your own parser, e.g:
import re, numpy as np
res_dict = {}
with open('thefile.txt') as f:
for line in f:
mo = re.match(r'[?[(\S+)\s*(\S+)\s*(\S+)\s*#(\w)', line)
keybase = mo.group(4)
keyadd = 0
key = keybase
while key in res_dict:
key = '{}{}'.format(keybase, keyadd)
keyadd += 1
res_dict[key] = np.arange(
float(mo.group(1)),
float(mo.group(2)),
float(mo.group(3)),
)
This won't give you a top-level variable m as you mention -- but rather a better-structured, more robust res_dict['m'] instead. If you insist on making your code brittle and fragile, you can globals().update(res_dict) to make it so:-)...

Multiply row in numpy array of fields with a list

Following on from this question:
Unexpectedly large array created with numpy.ones when setting names
When I multiply
a = np.ones([len(sectors),len(columns)])
a[0,:] *= [1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
It works fine.
When I try
columns = ["Port Wt", "Bench Wt", "Port Retn", "Bench Retn",
"Attrib", "Select", "Inter", "Total"]
a = np.ones((10,), dtype={"names":columns, "formats":["f8"]*len(columns)})
a[0] *= [1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
I get the error
TypeError: cannot convert to an int; scalar object is not a number
I would like to use field-names if possible. What am I doing wrong here?
Many thanks.

A element (row) of this a can be modified by assigning it a tuple. We can take advantage of the fact that lists easily convert to and from arrays, to write:
In [162]: a = np.ones((10,), dtype={"names":columns, "formats":["f8"]*len(columns)})
In [163]: x=[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
In [164]: a[0]=tuple(np.array(x)*list(a[0]))
In [165]: a
Out[165]:
array([(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8),
...], dtype=[('Port Wt', '<f8'), ('Bench Wt', '<f8'),...
More generally you could write
a[i] = tuple(foo(list(a[i]))
Multiple values ('rows') of a can be set with a list of tuples.
An earlier SO structure array question (https://stackoverflow.com/a/26183332/901925) suggests another solution - create a partner 2d array that shares the same data buffer.
In [311]: a1 = np.empty((10,8)) # conventional 2d array
In [312]: a1.data = a.data # share the a's data buffer
In [313]: a1[0] *= x # do math on a1
In [314]: a1
Out[314]:
array([[ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8],
...
[ 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ]])
In [315]: a
Out[315]:
array([(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8),
...
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)],
dtype=[('Port Wt', '<f8'), ('Bench Wt', '<f8'), ('Port Retn', '<f8'), ('Bench Retn', '<f8'), ('Attrib', '<f8'), ('Select', '<f8'), ('Inter', '<f8'), ('Total', '<f8')])
By sharing the data buffer, changes made to a1 affect a as well.
It might be better to view 2d a1 as the primary array, and a as a structured view. a could be constructed on the fly, as needed to display the data, access columns by name, or write to a csv file.

The rows of your array a are not numpy's arrays, the closest things to them are possibly tuples
>>> import numpy as np
>>> columns = ["Port Wt", "Bench Wt", "Port Retn", "Bench Retn",
... "Attrib", "Select", "Inter", "Total"]
>>> a = np.ones((10,), dtype={"names":columns, "formats":["f8"]*len(columns)})
>>> type(a[0,0])
IndexError: too many indices
>>> type(a[0][0])
numpy.float64
>>> type(a[0])
numpy.void
>>>
on the contrary the columns of a are ndarray's and you can multiply them by a list of floats of the correct length (not the nuber of columns but the number of rows)
>>> type(a['Select'])
numpy.ndarray
>>> a['Select']*[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-fc8dc4596098> in <module>()
----> 1 a['Select']*[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
ValueError: operands could not be broadcast together with shapes (10,) (8,)
>>> a['Select']*[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8, 0,0]
array([ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 0. , 0. ])
>>>
Edit
In response to a comment from OP: «is it not possible to apply a function to a row in a named array of fields (or tuple) in numpy?»
The only way that I know of is
>>> a[0] = tuple(b*a[c][0] for b, c in zip([1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8],columns))
>>> print a
[(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)]
>>>
but I'm not the most skilled numpy expert around... maybe one of the least skilled indeed

Standard deviation/error of linear regression

So I have:
t = [0.0, 3.0, 5.0, 7.2, 10.0, 13.0, 15.0, 20.0, 25.0, 30.0, 35.0]
U = [12.5, 10.0, 7.6, 6.0, 4.4, 3.1, 2.5, 1.5, 1.0, 0.5, 0.3]
U_0 = 12.5
y = []
for number in U:
y.append(math.log(number/U_0, math.e))
(m, b) = np.polyfit(t, y, 1)
yp = np.polyval([m, b], t)
plt.plot(t, yp)
plt.show()
So by doing this I get linear regression fit with m=-0.1071 and b=0.0347.
How do I get deviation or error for m value?
I would like m = -0.1071*(1+ plus/minus error)
m is k and b is n in y=kx+n

import numpy as np
import pandas as pd
import statsmodels.api as sm
import math
U = [12.5, 10.0, 7.6, 6.0, 4.4, 3.1, 2.5, 1.5, 1.0, 0.5, 0.3]
U_0 = 12.5
y = []
for number in U:
y.append(math.log(number/U_0, math.e))
y = np.array(y)
t = np.array([0.0, 3.0, 5.0, 7.2, 10.0, 13.0, 15.0, 20.0, 25.0, 30.0, 35.0])
t = sm.add_constant(t, prepend=False)
model = sm.OLS(y,t)
result = model.fit()
result.summary()

You can use scipy.stats.linregress :
m, b, r_value, p_value, std_err = stats.linregress(t, yp)
The quality of the linear regression is given by the correlation coefficient in r_value, being r_value = 1.0 for a perfect correlation.
Note that, std_err is the standard error of the estimated gradient, and not from the linear regression.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to calculate sigma_1 and sigma_2 with Covariance Matrix - python

Don't know why you "un-sampled' your covariance, but the original np.cov output is what you want to get eigenvalues of: np.linalg.eigvalsh(np.cov(a)) Out[]: array([ 0.19403958, 14.4077786 ])

Related

Printing between two number ranges with a given step value

How to remove rows from numpy array if certain number of an element is present

Creating lists from data file

Multiply row in numpy array of fields with a list

Standard deviation/error of linear regression

Categories

Resources