I have a DataFrame of time series signals sampled at a 30-second frequency.
df.head()
I want to calculate the daily averages for all signals in that series, but it doesn't seem to work. I tried both
df_average = df.to_period('D')
df.resample('D')
And I get:
I want to have only one line per day. Why do I get more?
Thank you
If there is a DatetimeIndex, just add an aggregate function, here mean, to resample:
df1 = df.resample('D').mean()
@jezrael's answer is the sure way to go. You could also try:
df.groupby(df.index.date).mean()
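A minimal sketch of both suggestions, assuming a DataFrame with a 30-second DatetimeIndex and two made-up signal columns (signal_a, signal_b):
import numpy as np
import pandas as pd

# Hypothetical 30-second data for two signals over two days
idx = pd.date_range('2023-01-01', periods=2 * 2880, freq='30s')
df = pd.DataFrame({'signal_a': np.random.rand(len(idx)),
                   'signal_b': np.random.rand(len(idx))}, index=idx)

daily = df.resample('D').mean()             # one row per day
by_date = df.groupby(df.index.date).mean()  # same daily means, indexed by date objects
print(daily)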
So I have this dataset of temperatures. Each line describes the temperature in Celsius measured for one hour of a day.
So, I need to compute a new variable called avg_temp_ar_mensal, which represents the average temperature of a city in a month. In this dataset the city is represented as estacao and the month as mes.
I'm trying to do this using pandas. The following line of code is the one I'm trying to use to solve this problem:
df2['avg_temp_ar_mensal'] = df2['temp_ar'].groupby(df2['mes', 'estacao']).mean()
The goal of this code is to store in a new column the average temperature for each city and month. But it doesn't work. If I try the following line of code:
df2['avg_temp_ar_mensal'] = df2['temp_ar'].groupby(df2['mes']).mean()
It works, but it is wrong: it calculates the average over every city in the dataset, and I don't want that because it will add noise to my data. I need to separate each temperature by month and city and then calculate the mean.
The DataFrame returned by groupby is smaller than the initial DataFrame; that is why your code runs into an error.
There are two ways to solve this problem. The first is to use transform:
df.groupby(['mes', 'estacao'])['temp_ar'].transform('mean')
The second is to create a new DataFrame dfn from the groupby and then merge it back into df:
dfn = df.groupby(['mes', 'estacao'])['temp_ar'].mean().reset_index(name='average')
df = pd.merge(df, dfn, on=['mes', 'estacao'], how='left')
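A minimal sketch of both options, assuming a small frame with the column names from the question (mes, estacao, temp_ar); the values are made up:
import pandas as pd

# Hypothetical data: two cities (estacao) across two months (mes)
df = pd.DataFrame({'estacao': ['A', 'A', 'B', 'B', 'A'],
                   'mes':     [1, 1, 1, 2, 2],
                   'temp_ar': [20.0, 22.0, 18.0, 25.0, 27.0]})

# Option 1: transform keeps one value per original row
df['avg_temp_ar_mensal'] = df.groupby(['mes', 'estacao'])['temp_ar'].transform('mean')

# Option 2: aggregate to one row per group, then merge back
dfn = df.groupby(['mes', 'estacao'])['temp_ar'].mean().reset_index(name='average')
df = pd.merge(df, dfn, on=['mes', 'estacao'], how='left')
print(df)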
You are calling groupby on a single column when you do df2['temp_ar'].groupby(...). That doesn't make much sense here, since a single column on its own gives you nothing to group by.
Instead, perform the groupby on all the columns you need. Also, make sure that the final output is a Series and not a DataFrame:
df['new_column'] = df[['city_column', 'month_column', 'temp_column']].groupby(['city_column', 'month_column'])['temp_column'].transform('mean')
This should do the trick if I understand your dataset correctly. If not, please provide a reproducible version of your df.
I have some data recorded every 3 hours, and I am trying to resample it using
df = df.groupby(df.index.date).resample('1h').pad()
However, it stops at the last data point at 21:00 every day, so the last three hours are not there. How should I solve this?
You could use DataFrame.asfreq:
df1 = df.asfreq('H')
df1.groupby(df1.index.date).resample('H').pad()
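A minimal sketch on hypothetical 3-hourly data; the column name and dates are made up. Because an upsampling asfreq or resample only runs up to the last existing timestamp (21:00 here), this variant reindexes to an explicit hourly range that ends at 23:00 of the last day, so the trailing hours are filled as well:
import pandas as pd

# Hypothetical 3-hourly data that stops at 21:00 each day
idx = pd.date_range('2023-01-01 00:00', '2023-01-02 21:00', freq='3h')
df = pd.DataFrame({'value': range(len(idx))}, index=idx)

# Build an hourly index that runs through 23:00 of the last day, then forward fill
end = df.index.max().normalize() + pd.Timedelta(hours=23)
hourly = df.reindex(pd.date_range(df.index.min(), end, freq='h')).ffill()
print(hourly.tail())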
I was trying to get output with a weekly date range. I am able to group by date and sum the values, but how do I get the output shown in the image below? I tried pd.Grouper with a frequency and resample, with no luck. Can any other method help?
I am looking for the desired output as per the image.
resample works on time series data. If you want to resample a DataFrame, it should either have a DatetimeIndex or you need to pass the on parameter to resample.
This should work:
df.resample('W', on='Date').sum()
W is the weekly frequency, see here.
Another option you might explore is cut, but IMO resample will be better for what you need.
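A minimal sketch, assuming a frame with a Date column and a single numeric Value column (both names are placeholders):
import pandas as pd

# Hypothetical daily data with 'Date' as a plain column rather than the index
df = pd.DataFrame({'Date': pd.date_range('2023-01-02', periods=14, freq='D'),
                   'Value': range(14)})

weekly = df.resample('W', on='Date').sum()  # one row per week, labelled by the week-end date
print(weekly)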
I currently have a dataframe that looks like:
I am trying to figure out how to do the following, and just don't know how to start....
1. For each day, cumsum the volume.
2. After this, group the data by time of day (i.e., 10-minute intervals). If a day doesn't have that interval (there are sometimes gaps), then it should just be treated as 0.
Any help would really be appreciated!
For number 1:
Let's use resample with D:
df.resample('D')['volume'].cumsum()
For number 2:
Let's use resample('10T') with asfreq and replace:
df.resample('10T').asfreq().replace(np.nan,0)
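A minimal end-to-end sketch of both steps on hypothetical intraday data with a DatetimeIndex and a made-up volume column; the per-day cumulative sum below uses groupby with pd.Grouper, which buckets the rows by calendar day:
import numpy as np
import pandas as pd

# Hypothetical intraday volume with a missing 10-minute interval on the first day
idx = pd.to_datetime(['2023-01-02 09:00', '2023-01-02 09:10', '2023-01-02 09:30',
                      '2023-01-03 09:00', '2023-01-03 09:10'])
df = pd.DataFrame({'volume': [100, 50, 25, 80, 40]}, index=idx)

# 1) cumulative volume within each day
df['cum_volume'] = df.groupby(pd.Grouper(freq='D'))['volume'].cumsum()

# 2) regular 10-minute buckets; intervals with no data become 0
buckets = df.resample('10T').asfreq().replace(np.nan, 0)
print(buckets)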
I am using Pandas to structure and process data. This is my DataFrame:
I grouped many datetimes by minute and I did an aggregation in order to have the sum of 'bitrate' scores by minute.
This was my code to produce this DataFrame:
import datetime
import numpy as np

def aggregate_data(data):
    # Round each timestamp string down to the minute
    def delete_seconds(time):
        return datetime.datetime.strptime(time, '%Y-%m-%d %H:%M:%S').replace(second=0)

    data['new_time'] = data['beginning_time'].apply(delete_seconds)
    # Sum the bitrate values within each minute bucket
    df = data[['new_time', 'bitrate']].groupby(['new_time']).aggregate(np.sum)
    return df
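A minimal usage sketch for the helper above, assuming beginning_time holds timestamp strings in the format the function expects and bitrate is numeric; the values are made up:
import pandas as pd

# Hypothetical input: three events, two of them in the same minute
data = pd.DataFrame({'beginning_time': ['2017-01-01 00:00:10',
                                        '2017-01-01 00:00:40',
                                        '2017-01-01 00:01:05'],
                     'bitrate': [100, 150, 200]})
print(aggregate_data(data))  # one row per minute with the summed bitrate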
Now I want to do a similar thing with 5-minute buckets: I want to group my datetimes by 5 minutes and take the mean.
Something like this (this doesn't work, of course!):
df.groupby([df.index.map(lambda t: t.5minute)]).aggregate(np.mean)
Ideas? Thanks!
Use resample:
df.resample('5Min').sum()
This assumes your index is properly set as a DatetimeIndex.
You can also use pd.Grouper (the replacement for the older TimeGrouper), as resampling is a groupby operation on time buckets:
df.groupby(pd.Grouper(freq='5Min')).sum()
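A minimal sketch, assuming a bitrate series indexed by timestamps (the values are made up); the question asks for a mean, so that is shown alongside the sum:
import pandas as pd

# Hypothetical per-event bitrate values
idx = pd.to_datetime(['2017-01-01 00:00:10', '2017-01-01 00:01:30',
                      '2017-01-01 00:04:50', '2017-01-01 00:06:20'])
s = pd.Series([100, 200, 300, 400], index=idx, name='bitrate')

print(s.resample('5Min').sum())   # total bitrate per 5-minute bucket
print(s.resample('5Min').mean())  # or the mean, as asked in the question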