pandas standard deviation with bell curve graph using stats norm - python

My data frame is
I want the standard deviation of the above data frame, along with a bell-curve plot of it.
I used the code below:
import numpy as np
import scipy.stats as stats
import pylab as pl
import pandas as pd
h=pd.read_excel(r"C:\Users\monthlyReports\standard_deviation\stan_rawdata.xlsx")
fit = stats.norm.pdf(h, np.mean(h), np.std(h))
pl.plot(h,fit,'-o')
pl.hist(h,normed=True)
pl.show()
But I am getting a TypeError:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-830c3a5f6c7c> in <module>()
7
8
----> 9 fit = stats.norm.pdf(h, np.mean(h), np.std(h)) #this is a fitting indeed
10
11 pl.plot(h,fit,'-o')
~\AppData\Local\Continuum\anaconda3\lib\sitepackages\scipy\stats\_distn_infrastructure.py in pdf(self, x, *args, **kwds)
1650 args = tuple(map(asarray, args))
1651 dtyp = np.find_common_type([x.dtype, np.float64], [])
-> 1652 x = np.asarray((x - loc)/scale, dtype=dtyp)
1653 cond0 = self._argcheck(*args) & (scale > 0)
1654 cond1 = self._support_mask(x) & (scale > 0)
TypeError: unsupported operand type(s) for -: 'str' and 'float'
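The message 'str' and 'float' suggests that the data frame read from Excel contains text (for example a label column, or numbers stored as strings), so subtracting the mean fails. A minimal sketch of one way around it, assuming a single numeric column; the column name "value" and the sample data are hypothetical stand-ins for the spreadsheet:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the Excel sheet: one numeric column read in as text.
h = pd.DataFrame({"value": ["10", "12", "n/a", "15", "11", "13"]})

# Coerce to numbers; anything unparseable becomes NaN and is dropped.
# Sorting keeps the fitted curve from zig-zagging when it is plotted.
values = pd.to_numeric(h["value"], errors="coerce").dropna().sort_values().to_numpy()

mu = values.mean()
sigma = values.std(ddof=0)  # population standard deviation, the same default as np.std
fit = stats.norm.pdf(values, mu, sigma)
```

To draw the result, pass `values` and `fit` to `pl.plot` and call `pl.hist(values, density=True)`; the `normed` argument used in the question has been removed from newer Matplotlib releases in favour of `density`.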

Related

RAPIDS cuml KNeighbors: number of landmark samples must be >= k

Minimum reproducible example:
import cudf
from cuml.neighbors import KNeighborsRegressor
d = {
'id':['a','b','c','d','e','f'],
'latitude':[50,-22,13,37,43,14],
'longitude':[3,-43,100,27,-4,121],
}
df = cudf.DataFrame(d)
knn = KNeighborsRegressor(n_neighbors = 4, metric = 'haversine')
knn.fit(df[['latitude','longitude']],df.index)
dists, nears = knn.kneighbors(df[['latitude','longitude']], return_distance = True)
This throws the error "number of landmark samples must be >= k".
The full traceback is:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_33/1073358290.py in <module>
10 knn = KNeighborsRegressor(n_neighbors = 4, metric = 'haversine')
11 knn.fit(df[['latitude','longitude']],df.index)
---> 12 dists, nears = knn.kneighbors(df[['latitude','longitude']], return_distance = True)
/opt/conda/lib/python3.7/site-packages/cuml/internals/api_decorators.py in inner_get(*args, **kwargs)
584
585 # Call the function
--> 586 ret_val = func(*args, **kwargs)
587
588 return cm.process_return(ret_val)
cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors.kneighbors()
cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors._kneighbors()
cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors._kneighbors_dense()
RuntimeError: exception occured! file=_deps/raft-src/cpp/include/raft/spatial/knn/detail/ball_cover.cuh line=326: number of landmark samples must be >= k
Obtained 64 stack frames
...
I have been trying for days to get around this error, but the only way I know is to convert the cudf DataFrame to a pandas DataFrame and use sklearn, which works perfectly:
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
d = {
'id':['a','b','c','d','e','f'],
'latitude':[50,-22,13,37,43,14],
'longitude':[3,-43,100,27,-4,121],
}
df = pd.DataFrame(d)
knn = KNeighborsRegressor(n_neighbors = 4, metric = 'haversine')
knn.fit(df[['latitude','longitude']],df.index)
dists, nears = knn.kneighbors(df[['latitude','longitude']], return_distance = True)
dists
This gives us the distances array.
Can you help me find a pure RAPIDS solution?
UPDATE: I found out that it works when the number of neighbors is <= len(data) // 2.
UPDATE: It is a bug, and an issue has been opened for it. We can pass algorithm='brute' as a workaround until the issue is resolved.
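To illustrate what a brute-force haversine search computes, here is a small NumPy sketch. It is not cuML's implementation, and the function name `haversine_knn` is hypothetical. It assumes the coordinates are in degrees and converts them to radians, which the haversine formula requires; distances are returned in radians on the unit sphere.

```python
import numpy as np

def haversine_knn(lat_deg, lon_deg, k):
    """Brute-force k nearest neighbours under the haversine metric (unit sphere)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Pairwise differences between every pair of points.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    dist = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    # Indices of the k closest points in each row (each point is its own nearest).
    nears = np.argsort(dist, axis=1, kind="stable")[:, :k]
    dists = np.take_along_axis(dist, nears, axis=1)
    return dists, nears

dists, nears = haversine_knn([50, -22, 13, 37, 43, 14],
                             [3, -43, 100, 27, -4, 121], k=4)
```

Because every pairwise distance is computed, there is no landmark sampling step and therefore no restriction on k relative to the data size, at the cost of quadratic work.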

Cannot use print function and mean function from numpy

I am taking a Udacity course called Intro to Data Analysis and I am trying to run this code, but I keep getting an error. I am using Python 3. In the tutorial videos that explain the code, everything works fine (I assume because they use a different version of Python). I have tried many things but still cannot make it work. Thanks in advance.
%pylab inline
import matplotlib.pyplot as plt
import numpy as np
def describe_data(data):
    print('Mean:', np.mean(data))
    print('Standard deviation:', np.std(data))
    print('Minimum:', np.min(data))
    print('Maximum:', np.max(data))
    plt.hist(data)
describe_data(total_minutes_by_account.values())
This is the error:
Populating the interactive namespace from numpy and matplotlib
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-669ffc75246c> in <module>
14 plt.hist(data)
15
---> 16 describe_data(total_minutes_by_account.values())
<ipython-input-34-669ffc75246c> in describe_data(data)
8 # Summarize the given data
9 def describe_data(data):
---> 10 print ('Mean:', np.mean(data))
11 print ('Standard deviation:', np.std(data))
12 print ('Minimum:', np.min(data))
<__array_function__ internals> in mean(*args, **kwargs)
~\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in mean(a, axis, dtype, out, keepdims, where)
3417 return mean(axis=axis, dtype=dtype, out=out, **kwargs)
3418
-> 3419 return _methods._mean(a, axis=axis, dtype=dtype,
3420 out=out, **kwargs)
3421
~\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims, where)
188 ret = ret.dtype.type(ret / rcount)
189 else:
--> 190 ret = ret / rcount
191
192 return ret
TypeError: unsupported operand type(s) for /: 'dict_values' and 'int'
The traceback shows that total_minutes_by_account.values() returns a dict_values object, so total_minutes_by_account is a dictionary, not a DataFrame. In Python 3, dict.values() returns a view rather than a list, and NumPy cannot average it directly. Wrap it in list() first:
import matplotlib.pyplot as plt
import numpy as np
def describe_data(data):
    print('Mean:', np.mean(data))
    print('Standard deviation:', np.std(data))
    print('Minimum:', np.min(data))
    print('Maximum:', np.max(data))
    plt.hist(data)
describe_data(list(total_minutes_by_account.values()))
You need to convert the dictionary values to a list before performing any NumPy operations on them.
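Since the error message names 'dict_values', a small self-contained check may help. This sketch uses a made-up dictionary in place of the course data and shows two conversions that NumPy accepts: a plain list, and `np.fromiter`, which builds the array without an intermediate list.

```python
import numpy as np

# Hypothetical stand-in for the course data: account id -> total minutes.
total_minutes_by_account = {"a": 30.0, "b": 90.0, "c": 60.0}

# Option 1: materialise the dict view as a list.
as_list = list(total_minutes_by_account.values())
mean_from_list = np.mean(as_list)

# Option 2: build a float array straight from the view.
as_array = np.fromiter(total_minutes_by_account.values(), dtype=float)
mean_from_array = as_array.mean()
```

Either result can be passed to describe_data in place of the raw .values() call.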

statsmodels.tsa.api-0.9.0 ZeroDivisionError: division by zero

Code:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels as sm
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
print('statsmodels.__version__', sm.__version__)
df = pd.DataFrame([
[547.184518, 256.990247, 237.709566, 465.214791, 1479.401737],
], columns=['point_4', 'point_5', 'point_6', 'point_7', 'point_8'], index=['000001.XSHE'])
fit2 = SimpleExpSmoothing(df.loc['000001.XSHE']).fit(smoothing_level=0.6, optimized=False)
fcast1 = fit2.forecast(1)
Error:
statsmodels.__version__ 0.9.0
/opt/conda/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py:221: ValueWarning: An unsupported index was provided and will be ignored when e.g. forecasting.
' ignored when e.g. forecasting.', ValueWarning)
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-4-a742c2be4f46> in <module>
12 ], columns=['point_4', 'point_5', 'point_6', 'point_7', 'point_8'], index=['000001.XSHE'])
13
---> 14 fit2 = SimpleExpSmoothing(df.loc['000001.XSHE']).fit(smoothing_level=0.6, optimized=False)
15 fcast1 = fit2.forecast(1)
/opt/conda/lib/python3.6/site-packages/statsmodels/tsa/holtwinters.py in fit(self, smoothing_level, optimized)
814 [1] Hyndman, Rob J., and George Athanasopoulos. Forecasting: principles and practice. OTexts, 2014.
815 """
--> 816 return super(SimpleExpSmoothing, self).fit(smoothing_level=smoothing_level, optimized=optimized)
817
818
/opt/conda/lib/python3.6/site-packages/statsmodels/tsa/holtwinters.py in fit(self, smoothing_level, smoothing_slope, smoothing_seasonal, damping_slope, optimized, use_boxcox, remove_bias, use_basinhopping)
592 smoothing_seasonal=gamma, damping_slope=phi,
593 initial_level=l0, initial_slope=b0, initial_seasons=s0,
--> 594 use_boxcox=use_boxcox, lamda=lamda, remove_bias=remove_bias)
595 hwfit._results.mle_retvals = opt
596 return hwfit
/opt/conda/lib/python3.6/site-packages/statsmodels/tsa/holtwinters.py in _predict(self, h, smoothing_level, smoothing_slope, smoothing_seasonal, initial_level, initial_slope, damping_slope, initial_seasons, use_boxcox, lamda, remove_bias)
733 k = m * seasoning + 2 * trending + 2 + 1 * damped
734 aic = self.nobs * np.log(sse / self.nobs) + (k) * 2
--> 735 aicc = aic + (2 * (k + 2) * (k + 3)) / (self.nobs - k - 3)
736 bic = self.nobs * np.log(sse / self.nobs) + (k) * np.log(self.nobs)
737 resid = data - fitted[:-h - 1]
ZeroDivisionError: division by zero
SimpleExpSmoothing is used for forecasting time series data. My input data is valid, so it should produce a forecast without error.
If I remove the point_8 column from the DataFrame, the error disappears.
Do you know why it throws ZeroDivisionError?
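The traceback itself hints at the cause: the failing line divides by self.nobs - k - 3, where k = m * seasoning + 2 * trending + 2 + 1 * damped. Assuming simple exponential smoothing has no seasonal, trend, or damping component (so k = 2), a series of exactly five observations makes that denominator zero. The helper below is hypothetical and only reproduces the arithmetic from the traceback:

```python
def aicc_denominator(nobs, m=0, seasoning=False, trending=False, damped=False):
    """Denominator of the AICc correction term as written in the traceback."""
    k = m * seasoning + 2 * trending + 2 + 1 * damped
    return nobs - k - 3

five_points = aicc_denominator(5)  # point_4 .. point_8
four_points = aicc_denominator(4)  # after dropping point_8
```

With five points the denominator is 0, which raises ZeroDivisionError; with four it is -1, which explains why removing point_8 makes the error disappear. Under this reading, any series length other than five avoids the division by zero in this version.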

Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'

I'm trying to implement a Black-Scholes model, but I get the error below.
from scipy import stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
def time_to_maturity(t0, T, y=252):
    t0 = pd.to_datetime(t0)
    T = pd.to_datetime(T)
    return np.busday_count(t0, T) / y
time_to_maturity('2018-08-01', '2018-12-14')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-c71b621d9d71> in <module>
10 return ( np.busday_count(t0, T) / y )
11
---> 12 time_to_maturity('2018-08-01', '2018-12-14')
<ipython-input-20-c71b621d9d71> in time_to_maturity(t0, T, y)
8 t0 = pd.to_datetime(t0)
9 T = pd.to_datetime(T)
---> 10 return ( np.busday_count(t0, T) / y )
11
12 time_to_maturity('2018-08-01', '2018-12-14')
<__array_function__ internals> in busday_count(*args, **kwargs)
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
I can't understand what the problem is. How can I fix this one?
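After checking, I found that np.busday_count() needs day-resolution dates (datetime64[D]). pd.to_datetime() returns a Timestamp with sub-day resolution, which cannot be cast safely to days. So I used np.datetime64() on the date string instead, which produces datetime64[D].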
from scipy import stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
def time_to_maturity(t0, T, y=252):
    t0 = np.datetime64(t0)
    T = np.datetime64(T)
    return np.busday_count(t0, T) / y
time_to_maturity('2018-08-01', '2018-12-14')
0.38492063492063494
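If the inputs should still go through pd.to_datetime (for example because they may arrive as Timestamps or in other date formats), one option is to truncate explicitly to day resolution before counting. This is a sketch of that variant, not the only way to do it:

```python
import numpy as np
import pandas as pd

def time_to_maturity(t0, T, y=252):
    # Parse with pandas, then drop everything below day resolution so that
    # np.busday_count receives datetime64[D] values.
    t0 = pd.to_datetime(t0).to_datetime64().astype("datetime64[D]")
    T = pd.to_datetime(T).to_datetime64().astype("datetime64[D]")
    return np.busday_count(t0, T) / y

ttm = time_to_maturity("2018-08-01", "2018-12-14")
```

The explicit astype makes the loss of time-of-day information a deliberate choice rather than something NumPy has to refuse under its 'safe' casting rule.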

Match filtering in Python

I'm trying to do a simple matched filtering operation on a data set in Python (conjugation followed by convolution). However, the convolution call fails with the error "object too deep for desired array". Below is the code I'm using:
import numpy as np
import cPickle
import matplotlib.pyplot as plt
with open('meteor2.pkl', 'rb') as f:
    data = cPickle.load(f)
vlt = data['vlt']
mfilt=np.conjugate(vlt)
mfilt1=np.convolve(vlt,mfilt,mode='full')
#mfilt=np.conjugate(vlt)
#mfilt1=np.convolve(vlt,mfilt,'same')
r = data['r']
t = data['t']
codes = data['codes']
freqs = data['freqs']
ch0_db = 10*np.log10(np.abs(mfilt1[:, 0, :])**2)
plt.figure()
plt.imshow(ch0_db.T, vmin=0, origin='lower', cmap=plt.cm.coolwarm,aspect='auto')
plt.title('All pulses')
plt.figure()
plt.imshow(ch0_db[3::5, :].T, vmin=0, origin='lower', cmap=plt.cm.coolwarm,aspect='auto')
plt.title('Minimum sidelobe coded-pulses')
plt.show()
np.convolve does one-dimensional convolution, so in this line:
mfilt1=np.convolve(vlt,mfilt,mode='full')
you'll get that error if either vlt or mfilt is not 1-D. For example,
In [12]: x = np.array([[1,2,3]]) # x is 2-D
In [13]: y = np.array([1,2,3])
In [14]: np.convolve(x, y, mode='full')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-9bf37a14877a> in <module>()
----> 1 np.convolve(x, y, mode='full')
/home/warren/anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in convolve(a, v, mode)
822 raise ValueError('v cannot be empty')
823 mode = _mode_from_name(mode)
--> 824 return multiarray.correlate(a, v[::-1], mode)
825
826 def outer(a,b):
ValueError: object too deep for desired array
It looks like you want 2-D (or higher) convolution. scipy has a few options:
scipy.ndimage.convolve
scipy.signal.convolve
scipy.signal.convolve2d
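As a rough illustration of the scipy.signal.convolve route, the sketch below filters every row of a 2-D array with the same 1-D template. The code sequence, array shapes, and offsets are made up for the example and are not taken from the question's data file. Note that a matched filter convolves with the time-reversed conjugate of the template, not with the conjugate alone.

```python
import numpy as np
from scipy import signal

# Hypothetical transmitted code (a 5-element Barker sequence).
code = np.array([1, 1, 1, -1, 1], dtype=complex)

# Three noise-free "pulses" of 40 samples, each containing the code
# at a different offset.
offsets = [10, 12, 20]
echoes = np.zeros((3, 40), dtype=complex)
for row, start in enumerate(offsets):
    echoes[row, start:start + len(code)] = code

# Matched filter: time-reversed conjugate, given a leading axis of length 1
# so that both inputs have the same number of dimensions.
template = np.conj(code[::-1])[None, :]
filtered = signal.convolve(echoes, template, mode="full")

# The peak of each row marks where the code ends in that pulse.
peaks = np.abs(filtered).argmax(axis=1)
```

Here each peak lands at offset + len(code) - 1 in the 'full' output. For a 3-D array like the one indexed in the question, the template would need two leading length-1 axes instead of one.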
