Wrong index after monthly resampling of Dataframe, Pandas - python

My DataFrame has the following format:
I resampled the values to monthly frequency, but although the datetime index starts at 2017-07-08, the Date column after grouping by month and taking the mean starts at 2017-01-31. (There is no data at all in my DataFrame from January 2017 to August 2017.) Data recording started in August 2017.
Could you please give me some insights to understand what is happening?
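One possible explanation, offered as a guess since the data itself is not shown: resample builds its bins from the earliest timestamp in the index, not from the first row, so a few dates parsed with day and month swapped are enough to pull the result back to January. The sketch below uses made-up date strings, and passes pd.offsets.MonthEnd() instead of a frequency alias because the month-end alias is "M" in older pandas and "ME" from pandas 2.2.

```python
import pandas as pd

# Hypothetical raw dates written day-first: 7 August and 1 August 2017.
raw = ["07/08/2017", "01/08/2017"]
values = [1.0, 3.0]

# Default parsing reads these month-first: 8 July and 8 January 2017.
naive = pd.Series(values, index=pd.to_datetime(raw))
naive_monthly = naive.resample(pd.offsets.MonthEnd()).mean()

# Declaring the strings day-first keeps both dates in August.
fixed = pd.Series(values, index=pd.to_datetime(raw, dayfirst=True))
fixed_monthly = fixed.resample(pd.offsets.MonthEnd()).mean()

print(naive.index[0].date(), naive_monthly.index[0].date())
print(fixed_monthly.index[0].date())
```

Checking df.index.min() against the first displayed row is a quick way to see whether this applies to the real data.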

Related

Using pandas resample monthly data to yearly data but start from a certain month

How do I resample monthly data to yearly data, but starting from 1st October?
I tried the following, since I know base works for starting at a certain hour of the day, but it doesn't appear to work for the month of the year.
df = (df.resample(rule='Y', base=10).sum().reset_index())
Here is how you do it:
offset = pd.DateOffset(months=9)
df.shift(freq=-offset).resample('YS').sum().shift(freq=offset)
Pandas has anchored offsets available for annual resamples starting on the first of a given month.
The anchored offset for annual resampling starting in October is AS-OCT (spelled YS-OCT from pandas 2.2, where the AS alias is deprecated). Resampling and summing can be done like this:
df.resample("AS-OCT").sum()
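A small check that the two answers agree, as a sketch on made-up data (one unit per month for 2019 and 2020). The offset objects pd.offsets.YearBegin(month=10) and pd.offsets.YearBegin() stand in for the AS-OCT and YS aliases so the example does not depend on which alias spelling the installed pandas accepts.

```python
import pandas as pd

# Made-up monthly data: one unit per month for 2019 and 2020.
idx = pd.date_range("2019-01-01", "2020-12-01", freq="MS")
df = pd.DataFrame({"value": 1}, index=idx)

# Anchored annual offset: each bin starts on 1 October.
anchored = df.resample(pd.offsets.YearBegin(month=10)).sum()

# Shift approach: move October back to January, resample by calendar
# year, then move the labels forward again.
offset = pd.DateOffset(months=9)
shifted = (
    df.shift(freq=-offset)
    .resample(pd.offsets.YearBegin())
    .sum()
    .shift(freq=offset)
)

print(anchored)
print(shifted)
```

Both results are labelled with 1 October; the first and last bins are partial because the sample data starts in January and ends in December.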

Python Pandas: Resample: Adjust start and end date

I have monthly data from 1989/09 to 2020/12 and want to convert it to weekly data (Friday to Friday) running from 1989/09/29 to 2020/12/25, keeping each month's value for every week in that month (e.g. all weeks in September get the September value, all weeks in October get the October value, and so on).
That's how I did it:
df.set_index(pd.DatetimeIndex(df_k["Date"]), inplace=True) #set DatetimeIndex to resample
[![First: Monthly Data ][1]][1]
[1]: https://i.stack.imgur.com/9e5DW.png
df.resample("W-FRI",axis = 0).ffill() #upsample to get weekly data
[![Output: Weekly data with correct values but wrong timespan][2]][2]
[2]: https://i.stack.imgur.com/QAOMX.png
I get a dataframe with correct values, but its range is 1989/09/01 to 2020/12/04. I want to adjust the range to
1989/09/29 to 2020/12/25 but can't find the correct input to the function.
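No answer is quoted for this question. One way to get the requested range, sketched on made-up data over a shorter span: build the Friday index explicitly with pd.date_range and look up each Friday's month in the monthly series. The names monthly, fridays and weekly are illustrative, and the sketch assumes exactly one value per month.

```python
import pandas as pd

# Made-up monthly values, stamped on the first day of each month.
monthly = pd.Series(
    [10, 20, 30, 40],
    index=pd.date_range("2020-09-01", periods=4, freq="MS"),
)

# Every Friday in the wanted range (both end points are Fridays).
fridays = pd.date_range("2020-09-25", "2020-12-25", freq="W-FRI")

# Re-key the monthly values by calendar month, then pick the value
# belonging to each Friday's month.
by_month = monthly.copy()
by_month.index = monthly.index.to_period("M")
weekly = pd.Series(
    by_month.reindex(fridays.to_period("M")).to_numpy(),
    index=fridays,
)

print(weekly)
```

For the data in the question the same steps would use "1989-09-29" and "2020-12-25" as the end points of the Friday range.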

Pandas dataframe delete rows by date

I have two pandas dataframes named dataset and startdate. The dataset dataframe contains rows dated from (1961 - February - 1) to (1961 - December - 31), and the same for many other years.
The startdate dataframe contains the start day for each year; for 1961, for example, the start date is 1961-February-8. So I need to remove the rows of dataset dated before the start date (1961-February-8) in 1961. That means removing the rows dated from (1961-February-1) to (1961-February-7).
I need to do the same for all the other years.
For 1961 I can do:
dataset[dataset['date']>='1961-02-08']
But the problem is that the start date in startdate is different for each year.
Reindex the start dates by each row's year, so that every row is compared with its own year's start date. This assumes startdate is indexed by year and dataset has a year column:
s=startdate.date.reindex(dataset['year'])
s.index=dataset.index
df=dataset[dataset['date']>=s].copy()
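To make the reindex step concrete, here is a sketch on made-up data. The layout is an assumption, since the question does not show it: startdate is indexed by year with a date column, and dataset gets its year column from the date.

```python
import pandas as pd

# Made-up data: ten February days in each of two years.
dates = pd.date_range("1961-02-01", periods=10).append(
    pd.date_range("1962-02-01", periods=10)
)
dataset = pd.DataFrame({"date": dates})
dataset["year"] = dataset["date"].dt.year

# Assumed layout: one start date per year, indexed by year.
startdate = pd.DataFrame(
    {"date": pd.to_datetime(["1961-02-08", "1962-02-05"])},
    index=[1961, 1962],
)

# Look up the start date for every row's year, align it with the
# rows of dataset, and keep only rows on or after that date.
s = startdate["date"].reindex(dataset["year"])
s.index = dataset.index
df = dataset[dataset["date"] >= s].copy()

print(df)
```

If startdate stores the year as an ordinary column instead, calling set_index on that column first gives the layout assumed here.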

How can I group monthly over years in Python with pandas?

I have a dataset ranging from 2009 to 2019. The dates include years, months and days. I have two columns: one with dates and the other with values. I need to group my DataFrame by month, summing the values in the other column. At the moment I am setting the date column as the index and using "df.resample('M').sum()".
The problem is that this is grouping my Dataframe monthly but for each different year (so I have 128 values in the "date" column). How can I group my data only for the 12 months without taking into consideration years?
Thank you very much in advance
I attached two images as example of the Dataset I have and the one I want to obtain.
Dataframe I have
Dataframe I want to obtain
Use dt.month on your date column to group by calendar month regardless of year.
For example:
df.groupby(df['date'].dt.month).agg({'value':'sum'})
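A runnable version of the same idea on made-up data (one unit per day for 2009 and 2010), assuming the columns are called date and value as in the answer:

```python
import pandas as pd

# Made-up daily data covering two full years.
df = pd.DataFrame({"date": pd.date_range("2009-01-01", "2010-12-31", freq="D")})
df["value"] = 1

# Group on the month number only, so January 2009 and January 2010
# land in the same group.
by_month = df.groupby(df["date"].dt.month).agg({"value": "sum"})

print(by_month)
```

The result has twelve rows indexed 1 to 12, whereas resample keeps one row per month per year.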

Resample by Date in Pandas — messes up a date in index

I have a multi-index dataFrame in Pandas, with data indexed by building, and then by date. The different columns represent different kinds of energy, and the values represent how much energy was used for a given month.
Image of the dataframe's head is here.
I'd like to turn this into yearly data. I currently have the line
df.unstack(level=0).resample('BAS-JUL').sum()
and this works almost perfectly. Here is the issue: all the dates are given as the 1st of the month, but for some reason, as it does resample, it picks July 2nd as the cut-off for 2012. So the number for July 1, 2012 ends up being counted in the 2011 data. It ends up looking like this. You can see that the second value in the Usage Month column is July 2. Other than that, the resample appears to work perfectly.
If I run df.index.get_level_values(1)[:20], the output is:
DatetimeIndex(['2011-07-01', '2011-08-01', '2011-09-01', '2011-10-01',
'2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01', '2012-04-01', '2012-05-01', '2012-06-01',
'2012-07-01', '2012-08-01', '2012-09-01', '2012-10-01',
'2012-11-01', '2012-12-01', '2013-01-01', '2013-02-01'],
dtype='datetime64[ns]', name='Usage Month', freq=None)
So the index is July 1 2012 in the original dataframe.
Any ideas on how to fix this mini-bug would be appreciated!
Use 'AS-JUL':
df.unstack(level=0).resample('AS-JUL').sum()
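The B in BAS stands for business: BAS-JUL anchors each year on the first business day of July. July 1, 2012 was a Sunday, so the cut-off moved to Monday, July 2.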
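The difference can be reproduced on made-up data stamped on the 1st of each month. This sketch uses the offset objects pd.offsets.BYearBegin(month=7) and pd.offsets.YearBegin(month=7) in place of the BAS-JUL and AS-JUL aliases, so it does not depend on the alias spelling of the installed pandas version.

```python
import pandas as pd

# Made-up monthly data stamped on the 1st, like the index in the question.
idx = pd.date_range("2011-07-01", "2013-06-01", freq="MS")
usage = pd.Series(1, index=idx)

july_first = pd.Timestamp("2012-07-01")
print(july_first.day_name())  # weekday of the intended cut-off

# A business year start rolls a weekend date forward to the next business day.
business_cutoff = pd.offsets.BYearBegin(month=7).rollforward(july_first)
print(business_cutoff.date())

# Business anchoring puts July 2012 into the first bin; calendar
# anchoring splits the data into two 12-month bins.
business = usage.resample(pd.offsets.BYearBegin(month=7)).sum()
calendar = usage.resample(pd.offsets.YearBegin(month=7)).sum()
print(business)
print(calendar)
```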
