Resample pandas DataFrame spanning several years - python

I have a Series that looks like this:
df.index[0:10]
DatetimeIndex(['1881-12-01', '1882-01-01', '1882-02-01', '1882-12-01',
'1883-01-01', '1883-02-01', '1883-12-01', '1884-01-01',
'1884-02-01', '1884-12-01'],
dtype='datetime64[ns]', name='date', freq=None)
Now I'd like to resample it so that every December, January and February is grouped together. More generally: I'd like to resample a dataframe to contain yearly periods, ignoring NaNs, so that the first index is taken into consideration:
assert(df[0:3].mean() == df.resample(something).mean().iloc[0])
df.resample('Y') treats the first index (December 1881) as a separate year. How can I get the grouping I want instead? I wrote a partition function that groups an iterable into equally sized chunks, but I feel like there's a more idiomatic (and potentially faster) way that I'm missing.

I could solve this by using an anchored yearly offset as described here.
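A minimal sketch of what an anchored yearly offset might look like here, assuming the "year" should start on 1 December. The sample values are made up, and passing pd.offsets.YearBegin(month=12) as an offset object is one possible spelling, not necessarily the one the linked answer used.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only December, January and February of each winter.
index = pd.DatetimeIndex(
    ["1881-12-01", "1882-01-01", "1882-02-01",
     "1882-12-01", "1883-01-01", "1883-02-01"],
    name="date",
)
s = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan, 6.0], index=index)

# A year-start offset anchored on December: each bin runs from 1 December
# up to (but not including) the next 1 December. mean() skips NaNs by default.
winter_means = s.resample(pd.offsets.YearBegin(month=12)).mean()
```

With this anchoring the first December is grouped with the following January and February, so the first bin's mean matches s[0:3].mean().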

Related

Resampling a pandas series (but not a time series)

Suppose I have a series, and I want to do the sort of thing pandas does with resample: say, compute the mean (or some other aggregation) of rows 0-14, 15-29, and so on. Of course this can be done with rolling, but that does (in the example case) 15 times as much work as necessary.
(So, if s is the series, then s.rolling(15).mean().iloc[14::15].) One can, of course, introduce a DateTime index and then resample, but this seems like a kludge. What's the canonical way?
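One common idiom, sketched here under the assumption that fixed-size chunks by row position are wanted, is to group on the row position integer-divided by the chunk size. The data and the names chunk_size, labels and chunk_means are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(45, dtype=float))  # made-up data: values 0.0 .. 44.0

chunk_size = 15
# Integer-divide each row position by the chunk size:
# rows 0-14 get label 0, rows 15-29 get label 1, and so on.
labels = np.arange(len(s)) // chunk_size
chunk_means = s.groupby(labels).mean()
```

Each row is visited once, and the labels are built from positions rather than from the index, so this should also work when the index is not a plain 0..n-1 range.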

Grouping date ranges in pandas

I was trying to get output grouped by weekly date ranges. I am able to group by date and sum the values, but how do I get the output shown in the image below? I tried pd.Grouper with a frequency and resample, with no luck. Are there any other methods that can help?
resample works on time series data. If you want to resample a DataFrame, it should either have a DateTime index or you need to pass the on parameter to resample.
This should work
df.resample('W', on='Date').sum()
W is the weekly frequency (see the pandas offset aliases).
Another option you might explore is cut, but IMO resample will be better for what you need.
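A small sketch of the on= variant, assuming a frame with a datetime column called Date and a numeric column called value (both names and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame: the column names 'Date' and 'value' are assumptions.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-11"]),
    "value": [1, 2, 4],
})

# 'W' bins end on Sunday by default, and each bin is labelled by that Sunday.
weekly = df.resample("W", on="Date").sum()
```

The first two rows fall in the week ending Sunday 2021-01-10 and the third in the week ending 2021-01-17.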

Pandas Python Dataframe

I have a dataset with dates in YYYY-MM format. I want to find the mean temperature for each year, so I need to add up the 12 months of a year and take the average. How do I do that using Pandas?
An example of my data: (I have more than one year of data; I tried to reshape it, but that doesn't seem to work)
Let us do a string slice, then groupby + sum (use .mean() instead of .sum() for the yearly average):
s=df.groupby(df['month'].str[:4]).sum()
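For the yearly mean the question asks about, the same string-slice idea can be sketched as follows. The column name temperature and the sample values are assumptions, since the original data is not shown.

```python
import pandas as pd

# Hypothetical data: 'month' holds YYYY-MM strings, 'temperature' is made up.
df = pd.DataFrame({
    "month": ["2020-01", "2020-02", "2021-01"],
    "temperature": [10.0, 20.0, 30.0],
})

# The first four characters of each YYYY-MM string are the year.
year = df["month"].str[:4].rename("year")
yearly_mean = df.groupby(year)["temperature"].mean()
```

Selecting the temperature column before aggregating keeps the string column out of the calculation.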

Add future date-time to pandas df

This is, I think, a rather simple question for which I have not been able to find a proper answer.
I have a pandas dataframe with the following characteristics
shape(frame)
Out[117]: (3652, 2)
Here 3652 refers to days within a decade (3652 since we have 2 leap years)
I would like to add a third column that shows the dates from 2035-01-01 to 2044-12-31
Many thanks
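A possible sketch, assuming one daily date per row is wanted. The stand-in frame and its column names are hypothetical. Note that the inclusive range 2035-01-01 to 2044-12-31 contains three leap years (2036, 2040, 2044) and therefore 3653 days, one more than the 3652 rows, so the sketch builds the dates from the row count instead of the end date.

```python
import pandas as pd

# Stand-in for the 3652-row, two-column frame; column names are assumptions.
frame = pd.DataFrame({"a": range(3652), "b": range(3652)})

# Build one date per row, starting at 2035-01-01.
frame["date"] = pd.date_range(start="2035-01-01", periods=len(frame), freq="D")

# For comparison, the full inclusive range named in the question.
full_range = pd.date_range(start="2035-01-01", end="2044-12-31", freq="D")
```

If the column must end exactly on 2044-12-31, the frame would need 3653 rows; assigning full_range to a 3652-row frame raises a length-mismatch error.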

Resample by Date in Pandas — messes up a date in index

I have a multi-index dataFrame in Pandas, with data indexed by building, and then by date. The different columns represent different kinds of energy, and the values represent how much energy was used for a given month.
Image of the dataframe's head is here.
I'd like to turn this into yearly data. I currently have the line
df.unstack(level=0).resample('BAS-JUL').sum()
and this works almost perfectly. Here is the issue: all the dates are given as the 1st of the month, but for some reason, when it resamples, it picks July 2nd as the cut-off for 2012. So the number for July 1, 2012 ends up being counted in the 2011 data. It ends up looking like this. You can see that the second value in the Usage Month column is July 2. Other than that, the resample appears to work perfectly.
If I run df.index.get_level_values(1)[:20], the output is:
DatetimeIndex(['2011-07-01', '2011-08-01', '2011-09-01', '2011-10-01',
'2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01', '2012-04-01', '2012-05-01', '2012-06-01',
'2012-07-01', '2012-08-01', '2012-09-01', '2012-10-01',
'2012-11-01', '2012-12-01', '2013-01-01', '2013-02-01'],
dtype='datetime64[ns]', name='Usage Month', freq=None)
So the index is July 1 2012 in the original dataframe.
Any ideas on how to fix this mini-bug would be appreciated!
Use 'AS-JUL':
df.unstack(level=0).resample('AS-JUL').sum()
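The B in 'BAS-JUL' stands for business: it is the Business Annual Start offset, which moves each cut-off to the first business day of July. July 1, 2012 was a Sunday, so the cut-off became Monday, July 2.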
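The difference can be sketched on a simplified single series (the original frame is multi-indexed and unstacked first; the data below is made up). Offset objects are used instead of the string aliases because, as far as I know, recent pandas releases prefer the spelling 'YS-JUL' over 'AS-JUL'.

```python
import pandas as pd

# Hypothetical monthly usage: one unit on the 1st of each month for two years.
months = pd.date_range("2011-07-01", "2013-06-01", freq="MS")
usage = pd.Series(1, index=months, name="usage")

# Calendar year start anchored on July: cut-offs fall on 1 July.
calendar_years = usage.resample(pd.offsets.YearBegin(month=7)).sum()

# Business year start anchored on July: cut-offs fall on the first business
# day of July, which was 2 July in 2012 because 1 July was a Sunday.
business_years = usage.resample(pd.offsets.BYearBegin(month=7)).sum()
```

With the business offset, the 2012-07-01 row lands in the bin that started in 2011, which is the behaviour described in the question.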
