Grouping date ranges in pandas - python

I was trying to get output grouped into weekly date ranges. I am able to group by date and sum the values, but how do I get the output shown in the image below? I tried pd.Grouper with a frequency and resample, with no luck. Are there any other methods that can help?
I am looking for the desired output as per the image.

resample works on time series data. If you want to resample a DataFrame, it should either have a DatetimeIndex or you need to pass the on parameter to resample.
This should work
df.resample('W', on='Date').sum()
W is the weekly frequency, see here.
Another option you might explore is cut, but in my opinion resample is better suited to what you need.
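To illustrate, a minimal sketch with invented column names (Date, Value) resembling the question:

```python
import pandas as pd

# Toy frame: one value per day across two full weeks (column names are illustrative)
df = pd.DataFrame({
    "Date": pd.date_range("2021-01-04", periods=14, freq="D"),
    "Value": range(14),
})

# 'W' buckets end on Sundays by default; on='Date' tells resample
# which column holds the timestamps
weekly = df.resample("W", on="Date").sum()
print(weekly)
```

Each row of the result is one week's total, indexed by the week-ending date.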

Related

pandas: computing a new column as an average under two grouping conditions

So I have this dataset of temperatures. Each line describes the temperature in Celsius measured by hour in a day.
I need to compute a new variable called avg_temp_ar_mensal, which represents the average temperature of a city in a month. In this dataset the city is represented as estacao and the month as mes.
I'm trying to do this using pandas. The following line of code is the one I'm trying to use to solve this problem:
df2['avg_temp_ar_mensal'] = df2['temp_ar'].groupby(df2['mes', 'estacao']).mean()
The goal of this code is to store in a new column the average temperature per city and month, but it doesn't work. If I try the following line of code:
df2['avg_temp_ar_mensal'] = df2['temp_ar'].groupby(df2['mes']).mean()
It works, but it is wrong: it calculates the average across every city in the dataset, which I don't want because it introduces noise into my data. I need to separate the temperatures by month and city and then calculate the mean.
The dataframe after groupby is smaller than the initial dataframe; that is why your code runs into an error.
There are two ways to solve this problem. The first is to use transform:
df['avg_temp_ar_mensal'] = df.groupby(['mes', 'estacao'])['temp_ar'].transform('mean')
The second is to create a new dataframe dfn from the groupby and merge it back into df:
dfn = df.groupby(['mes', 'estacao'])['temp_ar'].mean().reset_index(name='average')
df = pd.merge(df, dfn, on=['mes', 'estacao'], how='left')
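A minimal sketch of the transform approach on a toy frame (the data here is invented for illustration):

```python
import pandas as pd

# Hypothetical mini-dataset: temperatures per city ('estacao') and month ('mes')
df = pd.DataFrame({
    "mes": [1, 1, 1, 1, 2, 2],
    "estacao": ["A", "A", "B", "B", "A", "A"],
    "temp_ar": [10.0, 20.0, 30.0, 40.0, 15.0, 25.0],
})

# transform broadcasts each group's mean back onto every original row,
# so the result has the same length as df and can be assigned directly
df["avg_temp_ar_mensal"] = df.groupby(["mes", "estacao"])["temp_ar"].transform("mean")
print(df)
```

Every row now carries the mean of its own (month, city) group while the frame keeps its original shape.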
You are calling groupby on a single column when you do df2['temp_ar'].groupby(...). This doesn't make much sense, since within a single column there is nothing to group by.
Instead, perform the groupby on all the columns you need, and use transform so the result is a Series that aligns with the original rows:
df['new_column'] = df.groupby(['city_column', 'month_column'])['temp_column'].transform('mean')
This should do the trick if I understand your dataset correctly. If not, please provide a reproducible version of your df.

Resampling with percentiles

I have a data frame with some numerical values and a date-timestamp.
What I would like to do is aggregate the data into monthly intervals outputting a max percentile value for each month.
What I have been doing so far is just using:
df = df.resample('M', on='ds').max()
This gives me the max value for each month. However, from what I can see in my data, there are usually one or two spikes per month, so using max() returns that spike value, which is not correct. As a way to filter out the few high peaks, I was wondering whether I could use a percentile function instead of max(), e.g.:
np.percentile(df['y'], 99)
As far as I can see, the resample function does not provide an option to use your own functions, but I might be wrong. In any case, how can this be accomplished?
Use a custom lambda function with GroupBy.agg:
df = df.resample('M', on='ds')['y'].agg(lambda x: np.percentile(x, 99))
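A toy sketch of the idea; the data is synthetic, with one spike per month, and the 90th percentile is used instead of the 99th so the effect is visible on this tiny sample:

```python
import numpy as np
import pandas as pd

# Two months of daily values, flat at 1.0 except for one spike per month
df = pd.DataFrame({
    "ds": pd.date_range("2021-01-01", periods=59, freq="D"),
    "y": [1.0] * 59,
})
df.loc[5, "y"] = 100.0   # spike in January
df.loc[40, "y"] = 200.0  # spike in February

# agg accepts any callable, so a percentile works where max() would grab the spike
monthly = df.resample("M", on="ds")["y"].agg(lambda x: np.percentile(x, 90))
print(monthly)
```

Where max() would have returned 100 and 200, the percentile aggregation ignores the single outlier in each month.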

Calculating daily averages in pd.Series

I have a DataFrame series with a 30 s frequency.
df.head()
I want to calculate the daily averages for all signals in that series, but it doesn't seem to work. I tried both
df_average = df.to_period('D')
df.resample('D')
And I get:
I want to have only one line per day. Why do I get more?
Thank you
If there is DatetimeIndex only add an aggregate function, here mean, to resample:
df1 = df.resample('D').mean()
@jezrael's answer is the sure way to go. You could also try:
df.groupby(df.index.date).mean()
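A quick sketch with a synthetic 30-second series (values are made up so the daily means are obvious):

```python
import pandas as pd

# Two days of readings at 30 s frequency: 2880 samples per day
idx = pd.date_range("2021-01-01", periods=2 * 2880, freq="30s")
df = pd.DataFrame({"signal": [1.0] * 2880 + [3.0] * 2880}, index=idx)

# resample on its own only defines the bins; an aggregate such as mean()
# is what collapses each day to a single row
daily = df.resample("D").mean()
print(daily)
```

The result has exactly one row per calendar day.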

How can I calculate the number of days between two dates with different format in Python?

I have a pandas dataframe with a column of orderdates formatted like this: 2019-12-26.
However, when I take the max of this date it gives 2019-12-12, while it is actually 2019-12-26. It makes sense because my date format is Dutch and the max() function uses the 'American' (correct me if I'm wrong) format.
This means that my calculations aren't correct.
How can I change the way the function calculates? Or, if that's not possible, how can I change the format of my date column so the calculations are correct?
[In] df['orderdate'] = df['orderdate'].astype('datetime64[ns]')
print(df["orderdate"].max())
[Out] 2019-12-12 00:00:00
Thank you!
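One likely cause, assuming the raw strings are day-first (Dutch style), is that pandas parses ambiguous dates month-first, silently swapping day and month whenever both fit. Being explicit about the format removes the ambiguity; a sketch with invented sample strings:

```python
import pandas as pd

# Hypothetical day-first strings: '26-12-2019' means 26 December 2019.
# A plain astype('datetime64[ns]') may read ambiguous values month-first.
raw = pd.Series(["12-12-2019", "26-12-2019"])

# Spelling out the format forces day-first interpretation
orderdate = pd.to_datetime(raw, format="%d-%m-%Y")
print(orderdate.max())
```

With the format made explicit, max() compares real timestamps and returns the genuinely latest date.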

Using a function to do a %change on a dataset

I am using pandas in Python and running into this issue. I have a dataset with countries as the columns and dates (my months) as the rows. The data consists of the population of an item.
I need to calculate the month-by-month % change of the population. Is there a function I can use to get the data into a dataset with the month-by-month % change, in the format attached?
I tried applying a function to the dataset, but getting the function to retrieve the previous month's population for the % change is the issue.
Does anyone have any good ideas to get this done? Thanks
You can use pct_change:
df.pct_change()
First order the data by month (if it isn't already), then use the .shift() function for pandas DataFrames:
df['pct_change'] = (df.US - df.US.shift(1)) / df.US.shift(1)
.shift() allows you to shift rows up or down depending on the argument.
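A small sketch of pct_change on an invented population table (the country columns here are hypothetical):

```python
import pandas as pd

# Toy population table: months as rows, countries as columns
df = pd.DataFrame(
    {"US": [100.0, 110.0, 99.0], "NL": [50.0, 50.0, 75.0]},
    index=["Jan", "Feb", "Mar"],
)

# pct_change compares each row with the previous one:
# (current - previous) / previous, column by column
change = df.pct_change()
print(change)
```

The first row is NaN because there is no earlier month to compare against; every other cell is the fractional change from the month above it.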
