I am trying to replicate a simple technical-analysis indicator (an EMA) using xlwings. However, the function does not seem to be able to handle the list of values coming from Excel. Below is the code:
import xlwings as xw
import pandas as pd
import datetime as dt
import numpy as np

@xw.func
def EMA(df, n):
    EMA = pd.Series(pd.ewma(df['Close'], span=n, min_periods=n - 1), name='EMA_' + str(n))
    df = df.join(EMA)
    return df
When I enter a list of Excel data, =EMA({1,2,3,4,5}, 5), I get the following error message:
TypeError: list indices must be integers, not str
    EMA = pd.Series(pd.ewma(df['Close'], span=n, min_periods=n - 1), name='EMA_' + str(n))
(Expert) help much appreciated! Thanks.
EMA() expects a DataFrame df and a scalar n, and it returns the EMA as an extra column joined to the source DataFrame. You are passing a plain list of values, so this is not supposed to work.
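For reference, the TypeError comes from indexing a plain Python list with a string label, which is exactly what happens when the raw Excel range reaches df['Close']:

values = [1, 2, 3, 4, 5]
values['Close']  # TypeError: list indices must be integers, not str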
Construct a DataFrame and assign the values to the Close column:
v = range(100) # use your list of values instead
df = pd.DataFrame(v, columns=['Close'])
Call EMA() with this DataFrame:
EMA(df, 5)
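If the goal is to call this from Excel through xlwings, one option (a minimal sketch, assuming xlwings is imported as xw, with the hypothetical UDF name EMA_xl and the modern .ewm() API in place of the deprecated pd.ewma) is to build the DataFrame from the incoming range inside the UDF itself:

import xlwings as xw
import pandas as pd

@xw.func
@xw.arg('n', numbers=int, doc='EMA span')
@xw.ret(expand='table')
def EMA_xl(v, n):
    # v arrives from Excel as a list of closing prices
    df = pd.DataFrame(v, columns=['Close'])
    df['EMA_' + str(n)] = df['Close'].ewm(span=n, min_periods=n - 1).mean()
    return df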
Related
I have the following dataframe:
import pandas as pd
import numpy as np
import yfinance as yf
from pandas_datareader import data as pdr
from datetime import date, timedelta

yf.pdr_override()

end = date.today()
start = end - timedelta(days=7300)

# download the dataframe
data = pdr.get_data_yahoo('^GSPC', start=start, end=end)
Now that I have the dataframe, I want to create a function that adds the logarithmic return, based on a column, to the dataframe called 'data'. The calculation is:
data['log_return'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))
I think the function should look something like this:
def add_log_return(df):
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1))
    added["log_return"] = added["log_return"].apply(lambda x: x * 100)
    return added
How can I pass a specific column as an input to the function, e.g. add_log_return(df['Adj Close']), so that the function adds the logarithmic return to my 'data' dataframe?
data = add_log_return(df['Adj Close'])
Just add a column argument to your function!
def add_log_return(df, column):
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1)) * 100
    return added
new_df = add_log_return(old_df, 'Adj_Close')
Note that I removed the line in your function that applied a lambda just to multiply by 100; it's much faster to do this in a vectorized manner by folding it into the np.log(...) line.
However, if I were you, I'd just return the Series object instead of copying the dataframe and modifying and returning the copy.
def log_return(col: pd.Series) -> pd.Series:
    return np.log(col / col.shift(1)) * 100
Now, the caller can do what they want with it:
df['log_ret'] = log_return(df['Adj_Close'])
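Tying this back to the original question, where the downloaded column is named 'Adj Close' (with a space), the call would look like:

data = add_log_return(data, 'Adj Close')
# or, with the Series-returning version:
data['log_return'] = log_return(data['Adj Close'])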
How can I reproduce in Python this calculation that I made in Excel?
I want to take the 'Acumulado' value from the previous row, multiply it by that row's value in the 'Selic por dia' column, store the result in that row's 'Acumulado' cell, and repeat this successively down the column (starting from 1000000).
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"Data": ['06/03/2006', '07/03/2006', '08/03/2006', '09/03/2006', '10/03/2006',
                            '13/03/2006', '14/03/2006', '15/03/2006', '16/03/2006', '17/03/2006'],
                   "Taxa SELIC": [17.29, 17.29, 17.29, 16.54, 16.54, 16.54, 16.54, 16.54, 16.54, 16.54]})
df['Taxa Selic %'] = df['Taxa SELIC'] / 100
df['Selic por dia'] = (1 + df['Taxa SELIC'])**(1/252)
[Screenshot: the example DataFrame]
[Screenshot: the calculation done in Excel]
[Screenshot: how I would like the result to look]
Not an efficient method, but you can try this:
import numpy as np

selic_per_dia = list(df['Selic por dia'].values)
accumulado = [1000000 * selic_per_dia[0]]
for i, value in enumerate(selic_per_dia):
    if i == 0:
        continue
    else:
        accumulado.append(accumulado[i - 1] * value)
df['Acumulado'] = accumulado
df.loc[-1] = [np.nan, np.nan, np.nan, np.nan, 1000000]
df.index = df.index + 1
df = df.sort_index()
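A more idiomatic sketch of the same running product uses cumprod() instead of the explicit loop (assuming the same starting capital of 1000000); the extra starting row can still be prepended exactly as above if it is needed:

df['Acumulado'] = 1000000 * df['Selic por dia'].cumprod()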
I am trying to create a simple time series of different rolling types. One specific example is a rolling mean of N periods using the pandas Python package.
I get the following error: ValueError: DataFrame constructor not properly called!
Below is my code:
def py_TA_MA(v, n, AscendType):
    df = pd.DataFrame(v, columns=['Close'])
    df = df.sort_index(ascending=AscendType)  # ascending/descending flag
    M = pd.Series(df['Close'].rolling(n), name='MovingAverage_' + str(n))
    df = df.join(M)
    df = df.sort_index(ascending=True)  # need to double-check this
    return df
Would anyone be able to advise?
Kind regards
Found the correction! It was erroring out (with a new error) until I explicitly declared n as an integer. The code below works:
@xw.func
@xw.arg('n', numbers=int, doc='this is the rolling window')
@xw.ret(expand='table')
def py_TA_MA(v, n, AscendType):
    df = pd.DataFrame(v, columns=['Close'])
    df = df.sort_index(ascending=AscendType)  # ascending/descending flag
    M = pd.Series(df['Close'], name='Moving Average').rolling(window=n).mean()
    #df = pd.Series(df['Close']).rolling(window=n).mean()
    df = df.join(M)
    df = df.sort_index(ascending=True)  # need to double-check this
    return df
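As a quick sanity check outside Excel (a sketch, with a plain Python list standing in for the range of closes; the xlwings decorators only attach metadata, so the function can still be called directly from Python):

closes = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(py_TA_MA(closes, 3, True))  # the 'Moving Average' column holds the 3-period rolling mean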
I have a pandas dataframe which has a column 'INTENSITY', and a numpy array of the same length containing the error for each intensity. I would like to generate columns with randomly generated numbers within the error range.
So far I use two nested for loops to create the new columns, but I feel like this is inefficient:
import random
from math import sqrt
import numpy as np
from pandas import Series

# INTENSITY holds the column name, e.g. INTENSITY = 'INTENSITY'
theor_err = [sqrt(abs(x)) for x in theor_df[str(INTENSITY)]]
theor_err = np.asarray(theor_err)

for nr_sample in range(2):
    sample = np.zeros(len(theor_df[str(INTENSITY)]))
    for i, error in enumerate(theor_err):
        sample[i] = theor_df[str(INTENSITY)][i] + random.uniform(-error, error)
    theor_df['gen_{}'.format(nr_sample)] = Series(sample, index=theor_df.index)
theor_df.head()
Is there a more efficient way of approaching a problem like this?
NumPy can handle whole arrays for you, so you can do it like this:
import pandas as pd
import numpy as np

a = pd.DataFrame([10, 20, 15, 30], columns=['INTENSITY'])
a['theor_err'] = np.sqrt(np.abs(a.INTENSITY))
a['sample'] = np.random.uniform(-a['theor_err'], a['theor_err'])
Suppose you want to generate 6 samples. You can try the code below; tune the number of samples by setting the value of k.
df = pd.DataFrame([[1], [2], [3], [4], [-5]], columns=["intensity"])
k = 6
sample_names = ["sample" + str(i + 1) for i in range(k)]
df["err"] = np.sqrt(np.abs(df["intensity"]))
df[sample_names] = pd.DataFrame(
    df["err"].map(lambda x: np.random.uniform(-x, x, k)).values.tolist())
df.loc[:, sample_names] = df.loc[:, sample_names].add(df.intensity, axis=0)
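A further option is to draw all k samples in a single call and add the intensities by NumPy broadcasting; a sketch, assuming the same df, k, and sample_names as above:

noise = np.random.uniform(-1.0, 1.0, size=(len(df), k)) * df["err"].to_numpy()[:, None]
samples = df["intensity"].to_numpy()[:, None] + noise
for j, name in enumerate(sample_names):
    df[name] = samples[:, j]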
import pandas as pd

census_df = pd.read_csv('census.csv')
#census_df.head()

def answer_seven():
    census_df_1 = census_df[(census_df['SUMLEV'] == 50)].set_index('CTYNAME')
    census_df_1['highest'] = census_df_1[['POPESTIAMTE2010', 'POPESTIAMTE2011', 'POPESTIAMTE2012', 'POPESTIAMTE2013', 'POPESTIAMTE2014', 'POPESTIAMTE2015']].max()
    census_df_1['lowest'] = census_df_1[['POPESTIAMTE2010', 'POPESTIAMTE2011', 'POPESTIAMTE2012', 'POPESTIAMTE2013', 'POPESTIAMTE2014', 'POPESTIAMTE2015']].min()
    x = abs(census_df_1['highest'] - census_df_1['lowest']).tolist()
    return x[0]

answer_seven()
This is trying to use the data from census.csv to find the county with the largest absolute change in population within 2010-2015 (the POPESTIMATE columns). I wanted to simply find the absolute difference between the max and min values across those year columns. The function must return a string. Also, [(census_df['SUMLEV'] == 50)] means only counties are taken, as their SUMLEV is set to 50. But the code gives an error that ends with:
KeyError: "['POPESTIAMTE2010' 'POPESTIAMTE2011' 'POPESTIAMTE2012'
'POPESTIAMTE2013'\n 'POPESTIAMTE2014' 'POPESTIAMTE2015'] not in index"
Am I indexing the wrong data structure? I'm really new to data science and coding.
I think the column names in the code have a typo: the pattern is 'POPESTIMATE201?', not 'POPESTIAMTE201?'.
Any help with shortening the code will be appreciated. Here is the code that works:
census_df = pd.read_csv('census.csv')

def answer_seven():
    cdf = census_df[(census_df['SUMLEV'] == 50)].set_index('CTYNAME')
    columns = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012',
               'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
    cdf['big'] = cdf[columns].max(axis=1)
    cdf['sml'] = cdf[columns].min(axis=1)
    cdf['change'] = cdf[['big']].sub(cdf['sml'], axis=0)
    return cdf['change'].idxmax()
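For an even shorter variant, the intermediate 'big', 'sml', and 'change' columns aren't strictly needed; a sketch of the same computation:

def answer_seven():
    cdf = census_df[census_df['SUMLEV'] == 50].set_index('CTYNAME')
    columns = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012',
               'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
    return (cdf[columns].max(axis=1) - cdf[columns].min(axis=1)).idxmax()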