I want to find the modulus of two timedelta columns in pandas. My DataFrame looks something like this:
import pandas as pd
d = {'date': ['01/01/2018', '05/02/2018', '01/01/2018', '05/01/2018'],
     'fct date': ['01/06/2019', '01/06/2019', '01/06/2019', '01/06/2019'],
     'tenor': [1, 2, 3, 4]}
df=pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'],format='%d/%m/%Y')
df['fct date'] = pd.to_datetime(df['fct date'],format='%d/%m/%Y')
df['tenor']=pd.to_timedelta(df['tenor'],unit="d")
df
When I try to apply the modulus operator, I get an unsupported operand TimedeltaIndex error. Any idea why this is throwing an error?
a = (df['date'] - df['fct date']) % df['tenor']
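For reference, a version-independent workaround (a sketch, assuming the goal is the remainder of the date difference with respect to the tenor) is to do the modulus on total seconds and convert the result back to timedeltas:
import pandas as pd

d = {'date': ['01/01/2018', '05/02/2018', '01/01/2018', '05/01/2018'],
     'fct date': ['01/06/2019', '01/06/2019', '01/06/2019', '01/06/2019'],
     'tenor': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['fct date'] = pd.to_datetime(df['fct date'], format='%d/%m/%Y')
df['tenor'] = pd.to_timedelta(df['tenor'], unit='d')

# % is always defined for plain floats, so work in seconds and convert back
diff_s = (df['date'] - df['fct date']).dt.total_seconds()
tenor_s = df['tenor'].dt.total_seconds()
a = pd.to_timedelta(diff_s % tenor_s, unit='s')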
I have yearly average closing values for an asset in a DataFrame, and I need to find the structural breaks in the time series. I intended to do this using the statsmodels seasonal_decompose function, but I am having trouble implementing it.
Example data below
from statsmodels.tsa.seasonal import seasonal_decompose
import pandas as pd
import matplotlib.pyplot as plt

data = {'Year': ['1991','1992','1993','1994','1995','1996','1997','1998','1999','2000','2001','2002','2003','2004'],
        'Close': [11,22,55,34,447,85,99,86,83,82,81,34,33,36]}
df = pd.DataFrame(data)
df['Year'] = df['Year'].astype(str)
sd = seasonal_decompose(df)
plt.show()
ValueError: You must specify a period or x must be a pandas object with a DatetimeIndex with a freq not set to None
When I change the 'Year' column to datetime, I get the following issue:
TypeError: float() argument must be a string or a number, not 'Timestamp'
I do not know what the issue is; I have no missing values. Secondary to this, does anybody know of a more efficient method to identify structural breaks in time series data?
Thanks
The problem is that you need to set the Year column as the index after converting its string values to datetime (as the ValueError message says: a pandas object with a DatetimeIndex).
So, e.g.:
from statsmodels.tsa.seasonal import seasonal_decompose
import pandas as pd
data = {'Year': ['1991','1992','1993','1994','1995','1996','1997','1998','1999','2000','2001','2002','2003','2004'],
        'Close': [11,22,55,34,447,85,99,86,83,82,81,34,33,36]}
df = pd.DataFrame(data)
df['Year'] = pd.to_datetime(df['Year'])
df.set_index('Year', drop=True, inplace=True)
sd = seasonal_decompose(df)
sd.plot()
Plot: (the decomposition figure with observed, trend, seasonal, and residual panels)
I am trying to create a DataFrame (df) containing the sample sums, means, and standard deviations, by month, of 12 monthly return series read from a CSV file called QUESTION1DATA.csv. The head of the performance data was shown as an image.
So far I have written the following code to get what I am looking for:
import time
df['Timestamp'] = portlist.to_datetime(df['Year'],format='%Y')
new_df = df.groupby([df.index.year, df.index.month]).agg(['sum', 'mean', 'std'])
new_df.index.set_names(['Year', 'Month'], inplace = True)
new_df.reset_index(inplace = True)
However, when I run this code I get the following error and don't know where to go from there:
'list' object has no attribute 'to_datetime'
to_datetime is available in pandas and should be used as follows:
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
In your case portlist is a list, hence the code is throwing an error. It should be as follows:
df['Timestamp'] = pd.to_datetime(df['Year'],format='%Y')
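Putting it together, a minimal sketch of the full flow (the column names here are assumptions based on your snippet; the year/month groupby also needs the parsed timestamps as the index):
import pandas as pd

df = pd.read_csv('QUESTION1DATA.csv')
df['Timestamp'] = pd.to_datetime(df['Year'], format='%Y')

# grouping by df.index.year / df.index.month requires a DatetimeIndex
df = df.set_index('Timestamp')
df = df.drop(columns=['Year'])  # keep only the numeric return columns

new_df = df.groupby([df.index.year, df.index.month]).agg(['sum', 'mean', 'std'])
new_df.index.set_names(['Year', 'Month'], inplace=True)
new_df.reset_index(inplace=True)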
This is my code:
import pandas as pd
data = pd.read_csv("temp.data",sep=';')
data['Date'] = pd.to_datetime(data['Date']+' '+data['Time'])
del data['Time']
data.rename(columns={'Date':'TimeStamp'}, inplace=True)
data = data.reset_index()
data['TimeStamp'] = pd.to_datetime(data['TimeStamp'], format='%d/%m/%Y %H:%M:%S')
data['num'] = pd.to_numeric(data['num'])
data = data.resample('1M')
Error:
"TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'".
Sample data (Original data has 200 thousand rows, with no missing values):
17/12/2006;01:58:00;3.600
17/12/2006;01:59:00;2.098
17/12/2006;02:00:00;1.334
17/12/2006;02:01:00;4.362
17/12/2006;02:02:00;1.258
17/12/2006;02:03:00;2.448
17/12/2006;02:04:00;5.426
17/12/2006;02:05:00;9.704
As mentioned in the error, when trying to resample, you have a RangeIndex, which is just consecutive integers, while you need one that represents points in time.
Adding
data.set_index('TimeStamp', inplace=True)
before you resample will set your 'TimeStamp' column as the index. You can also do away with
data = data.reset_index()
unless you want an extra column labeled 'index' floating around.
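For completeness, a sketch of the corrected pipeline (assuming the real file has Date, Time and num header columns, as the code implies; for the headerless sample above you would also pass header=None and names=['Date', 'Time', 'num'] to read_csv, and the monthly mean is just one possible aggregation):
import pandas as pd

data = pd.read_csv("temp.data", sep=';')
data['Date'] = pd.to_datetime(data['Date'] + ' ' + data['Time'],
                              format='%d/%m/%Y %H:%M:%S')
del data['Time']
data.rename(columns={'Date': 'TimeStamp'}, inplace=True)
data['num'] = pd.to_numeric(data['num'])

# resampling needs a DatetimeIndex, so make the timestamps the index
data.set_index('TimeStamp', inplace=True)

# resample('1M') alone only builds a resampler; pick an aggregation
monthly = data.resample('1M').mean()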
I have a DataFrame that looks like this:
raw_data = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SeriesDate':['2017-03-10','2017-03-13','2017-03-14','2017-03-15']}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['Series_Date','SeriesDate'])
print(df)
However, on running the following commands:
from pandas.tseries.offsets import BDay
df['SeriesDate'] = pd.to_datetime(df['SeriesDate'])
df['Start_Date'] = df['SeriesDate'] - BDay(10)
I get the following error:
TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('O')
How can I work around this?
The code works fine for me, so I'm guessing there is some problem in your environment. You can read a similar answer here: pandas date column subtraction
Would have commented instead of an answer, but I do not have enough rep to do so.
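If fixing the environment is not an option, a possible workaround (a sketch; it sidesteps the vectorized subtraction by applying the offset row by row, at the cost of speed) is:
import pandas as pd
from pandas.tseries.offsets import BDay

raw_data = {'Series_Date': ['2017-03-10', '2017-03-13', '2017-03-14', '2017-03-15'],
            'SeriesDate': ['2017-03-10', '2017-03-13', '2017-03-14', '2017-03-15']}
df = pd.DataFrame(raw_data, columns=['Series_Date', 'SeriesDate'])

df['SeriesDate'] = pd.to_datetime(df['SeriesDate'])
# subtract the business-day offset element by element instead of as a ufunc
df['Start_Date'] = df['SeriesDate'].apply(lambda d: d - BDay(10))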
I am using pandas to deal with monthly data that have some missing values. I would like to use the resample method to compute annual statistics, but only for years with no missing data.
Here is some code and output to demonstrate:
import pandas as pd
import numpy as np
dates = pd.date_range(start = '1980-01', periods = 24,freq='M')
df = pd.DataFrame([np.nan] * 10 + list(range(14)), index=dates)
Here is what I obtain if I resample:
In [18]: df.resample('A')
Out[18]:
0
1980-12-31 0.5
1981-12-31 7.5
I would like to have np.nan for the 1980-12-31 index, since that year does not have monthly values for every month. I tried to play with the 'how' argument but had no luck.
How can I accomplish this?
I'm sure there's a better way, but in this case you can use:
df.resample('A', how=[np.mean, pd.Series.count, len])
and then drop all rows where count != len
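In modern pandas the how argument is gone; a sketch of the same idea with the current API (assuming the desired statistic is the annual mean, set to NaN for any year that is missing a month):
import numpy as np
import pandas as pd

dates = pd.date_range(start='1980-01', periods=24, freq='M')
df = pd.DataFrame([np.nan] * 10 + list(range(14)), index=dates)

mean = df.resample('A').mean()    # NaNs are skipped here
count = df.resample('A').count()  # non-NaN months per year
size = df.resample('A').size()    # total months per year

# keep the annual mean only for years where every month has a value
annual = mean.where(count.eq(size, axis=0))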