Resample by Date in Pandas messes up a date in the index

I have a MultiIndex DataFrame in pandas, with data indexed by building and then by date. The columns represent different kinds of energy, and the values are how much energy was used in a given month.
[Image: head of the DataFrame]
I'd like to turn this into yearly data. I currently have the line
df.unstack(level=0).resample('BAS-JUL').sum()
and this works almost perfectly. Here is the issue: all the dates are the 1st of the month, but when it resamples, it picks July 2 as the cut-off for 2012, so the value for July 1, 2012 ends up counted in the 2011 data. It ends up looking like this: [Image: resampled result]. You can see that the second value in the Usage Month column is July 2. Other than that, the resample appears to work perfectly.
If I run df.index.get_level_values(1)[:20], the output is:
DatetimeIndex(['2011-07-01', '2011-08-01', '2011-09-01', '2011-10-01',
'2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01', '2012-04-01', '2012-05-01', '2012-06-01',
'2012-07-01', '2012-08-01', '2012-09-01', '2012-10-01',
'2012-11-01', '2012-12-01', '2013-01-01', '2013-02-01'],
dtype='datetime64[ns]', name='Usage Month', freq=None)
So the original DataFrame's index does contain July 1, 2012.
Any ideas on how to fix this mini-bug would be appreciated!

Use 'AS-JUL':
df.unstack(level=0).resample('AS-JUL').sum()
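The B stands for business: 'BAS-JUL' (business annual start, anchored in July) starts each year on the first business day of July. July 1, 2012 was a Sunday, so the 2012 bin starts on Monday, July 2, and the July 1, 2012 row falls into the 2011 bin. 'AS-JUL' starts each year on July 1 regardless of the weekday.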

Related

Get sum of business days in dataframe python with resample

I have a time series and want the sum of the business-day values for each week. A snapshot of the dataframe (df) is shown below. Note that 2017-06-01 is a Friday, so the missing days are the weekend.
I use resample to group the data by week, and my aim is to get the sum. When I apply it, however, I get results I can't explain. I expected the first row to be 0, the sum of the values in the first week, then 15 for the next week, and so on.
df_resampled = df.resample('W', label='left').sum()
df_resampled.head()
Can someone explain what I am missing? It seems I have not understood the resample function correctly.
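The snapshot isn't available here, so this is only a sketch on made-up values showing what the rule and the label argument do. 'W' is shorthand for 'W-SUN': each bin ends on a Sunday and is closed on the right, so label='left' only moves the label to the previous Sunday and does not change which days are summed. Anchoring on Monday with closed='left' is one way to get bins labelled by the Monday that starts each week; whether that matches the intended output is an assumption. As a side check, 2017-06-01 read as year-month-day is a Thursday, while 6 January 2017 is a Friday, so it may be worth confirming that the dates were parsed as intended.

```python
import pandas as pd

# Hypothetical data: ten consecutive business days starting Thursday 2017-06-01,
# with values 0..9 so each weekly sum is easy to check by hand.
days = pd.bdate_range("2017-06-01", periods=10)
s = pd.Series(range(10), index=days)

# 'W' is an alias for 'W-SUN': bins end on Sunday, are closed on the right and,
# by default, are labelled with that closing Sunday.
right_labels = s.resample("W").sum()

# label='left' only relabels each bin with the previous Sunday; the bins are the same.
left_labels = s.resample("W", label="left").sum()

# Anchoring on Monday with closed='left' gives bins [Mon, next Mon) labelled
# with the Monday that starts each week.
monday_weeks = s.resample("W-MON", closed="left", label="left").sum()
```

All three calls produce the sums 1, 20 and 24. Only the labels differ: 2017-06-04, 06-11 and 06-18 for the default; 2017-05-28, 06-04 and 06-11 with label='left'; and 2017-05-29, 06-05 and 06-12 for the Monday-anchored version.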

Create equidistant data frame with time ranged data with Python

I have a .csv file in which each row stores data for a date range, given by from and to date columns. I would like to turn it into a daily data frame with Python.
The time can be ignored, as a gas day always starts at 6am and ends at 6am.
My idea is to end up with a data frame indexed by date (for example from March 1st, 2019 to December 31st, 2019) at daily granularity.
I would create one column per unique value of the identifier and fill in the respective values, or NaN where there is none.
The latter I can easily do with pd.pivot_table, but the problem with the time ranges remains.
Any ideas of how to cope with that?
[Image: time-ranged data frame]
It should look like this, but with one row per day, taking the to column into account as well. Maybe with a range?
[Image: desired output, similar to this but with a different period]
You can use pandas and group by the column you want:
import pandas as pd
df = pd.read_csv("yourfile.csv")
groups = df.groupby("periodFrom")
groups.get_group("2019-03-09 06:00")
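Grouping by periodFrom pulls out the rows that start at a given time, but it does not spread each range over its days. Below is a minimal sketch of that expansion. It assumes hypothetical column names periodTo, id and value alongside the periodFrom column used above. It also assumes each period ends at 06:00 on the day after its last gas day.

```python
import pandas as pd

# Hypothetical input: "periodFrom" comes from the answer above; "periodTo",
# "id" and "value" are assumed names for the to date, the identifier and the value.
ranges = pd.DataFrame({
    "periodFrom": pd.to_datetime(["2019-03-09 06:00", "2019-03-10 06:00"]),
    "periodTo": pd.to_datetime(["2019-03-11 06:00", "2019-03-11 06:00"]),
    "id": ["A", "B"],
    "value": [5, 7],
})

pieces = []
for row in ranges.itertuples(index=False):
    # A gas day runs 06:00 to 06:00, so a period ending at 06:00 on day N
    # covers gas days up to and including day N - 1.
    first_day = row.periodFrom.normalize()
    last_day = row.periodTo.normalize() - pd.Timedelta(days=1)
    days = pd.date_range(first_day, last_day, freq="D")
    pieces.append(pd.DataFrame({"day": days, "id": row.id, "value": row.value}))

long = pd.concat(pieces, ignore_index=True)

# One column per identifier; aggfunc="first" assumes ranges for the same
# identifier do not overlap.
daily = long.pivot_table(index="day", columns="id", values="value", aggfunc="first")

# Reindex onto the full daily calendar so days without data become NaN.
calendar = pd.date_range("2019-03-01", "2019-12-31", freq="D", name="day")
daily = daily.reindex(calendar)
```

Each input row becomes one row per gas day. pivot_table then turns the identifiers into columns, and reindex fills the rest of the March to December 2019 calendar with NaN. If ranges for the same identifier can overlap, a different aggfunc (for example "sum") is needed.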

Resample pandas DataFrame spanning several years

I have a Series that looks like this:
df.index[0:10]
DatetimeIndex(['1881-12-01', '1882-01-01', '1882-02-01', '1882-12-01',
'1883-01-01', '1883-02-01', '1883-12-01', '1884-01-01',
'1884-02-01', '1884-12-01'],
dtype='datetime64[ns]', name='date', freq=None)
Now I'd like to resample it so that every December, January and February is grouped together. More generally: I'd like to resample a dataframe to contain yearly periods, ignoring NaNs, so that the first index is taken into consideration:
assert(df[0:3].mean() == df.resample(something).mean().iloc[0])
df.resample('Y') treats the first index as a separate year. How do I do that? I wrote a partition function that groups an iterable into equally sized chunks, but I feel like there's a more idiomatic (and potentially faster) way that I'm missing.
I could solve this by using an anchored yearly offset as described here.

Wrong index after monthly resampling of Dataframe, Pandas

My DataFrame has the following format:
[Image: sample of the DataFrame]
I resampled the values on a monthly basis. The problem is that even though the datetime index starts from 2017-07-08, the Date column after grouping by month and taking the mean starts from 2017-01-31. (There is no data at all in my DataFrame from January 2017 to August 2017.) Data recording started in August 2017.
Could you please give me some insights to understand what is happening?
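The data isn't shown, so this is only a hedged guess at one common cause. resample covers every period from the earliest to the latest timestamp and fills empty months with NaN. A single early date, for example from a day/month swap when parsing, therefore pulls the first monthly bin back. The sketch below parses hypothetical day-first strings both ways with explicit formats. 'MS' labels bins by month start, whereas 'M' (renamed 'ME' in pandas 2.2) labels them by month end, which gives labels such as 2017-01-31.

```python
import pandas as pd

# Hypothetical raw strings written day-first (dd/mm/yyyy), all in Aug/Sep 2017.
raw = ["01/08/2017", "05/08/2017", "03/09/2017"]
values = [1, 3, 5]

# Parsed month-first by mistake: 1 Aug becomes 8 Jan, 5 Aug becomes 8 May,
# and 3 Sep becomes 9 Mar.
wrong = pd.Series(values, index=pd.to_datetime(raw, format="%m/%d/%Y")).sort_index()

# Parsed with the intended day-first format.
right = pd.Series(values, index=pd.to_datetime(raw, format="%d/%m/%Y")).sort_index()

# 'MS' labels each monthly bin with its first day. resample covers every month
# from the earliest to the latest timestamp, so empty months show up as NaN.
wrong_monthly = wrong.resample("MS").mean()
right_monthly = right.resample("MS").mean()
```

The month-first parse yields bins from 2017-01-01 to 2017-05-01, two of them NaN. The day-first parse yields only August and September 2017. Checking df.index.min() before resampling is a quick way to spot a stray early date.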

Python: Date conversion to year-weeknumber, issue at switch of year

I am trying to convert a dataframe column with a date and timestamp to a year-weeknumber format, i.e., 01-05-2017 03:44 = 2017-1. This is easy in general, but I am stuck on dates that fall in a new year while their week number is still the last week of the previous year (the same thing that happens here).
I did the following:
df['WEEK_NUMBER'] = df.date.dt.year.astype(str).str.cat(df.date.dt.week.astype(str), sep='-')
Here df['date'] is a very large column of dates and times spanning multiple years.
A date which gives a problem is for example:
Timestamp('2017-01-01 02:11:27')
The output for my code will be 2017-52, while it should be 2016-52. Since the data covers multiple years, and weeknumbers and their corresponding dates change every year, I cannot simply subtract a few days.
Does anybody have an idea of how to fix this? Thanks!
Replace df.date.dt.year by this:
(df.date.dt.year- ((df.date.dt.week>50) & (df.date.dt.month==1)))
It subtracts 1 from the year whenever the week number is greater than 50 and the month is January.
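The year adjustment above fixes early-January dates but not the opposite case. A late-December date can fall in week 1 of the next year: 2018-12-31 is in ISO week 1 of 2019, and the expression above would label it 2018-1. Also, Series.dt.week was deprecated in pandas 1.1 and removed in 2.0. The sketch below uses Series.dt.isocalendar() (pandas 1.1+), which returns the ISO year and week together. The sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: the problem date from the question, the example from the
# first paragraph, and a late-December date that belongs to week 1 of the next year.
df = pd.DataFrame({"date": pd.to_datetime([
    "2017-01-01 02:11:27",  # Sunday, ISO week 52 of 2016
    "2017-01-05 03:44:00",  # Thursday, ISO week 1 of 2017
    "2018-12-31 12:00:00",  # Monday, ISO week 1 of 2019
])})

# isocalendar() returns the ISO year, week and weekday together, so the year
# always matches the week number.
iso = df["date"].dt.isocalendar()
df["WEEK_NUMBER"] = iso["year"].astype(str).str.cat(iso["week"].astype(str), sep="-")
```

For these dates the column holds 2016-52, 2017-1 and 2019-1.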
