Average pandas dataframe on time index for a particular time interval - python

I have a dataframe where for each timestamp there are some points earned by the user. It looks like the following, i.e. the data was collected every few seconds:
>> df.head()
points
timestamp
2017-05-29 17:40:45 5
2017-05-29 17:41:53 7
2017-05-29 17:42:34 3
2017-05-29 17:42:36 8
2017-05-29 17:42:37 6
Then I wanted to resample it to an interval of 5 minutes so I did this
>> df.resample("5min").mean()
points
timestamp
5/29/2017 17:40 8
5/29/2017 17:45 1
5/29/2017 17:50 4
5/29/2017 17:55 3
5/29/2017 18:00 8
5/30/2017 17:30 3
5/30/2017 17:35 3
5/30/2017 17:40 7
5/30/2017 17:45 8
5/30/2017 17:50 5
5/30/2017 17:55 7
5/30/2017 18:00 1
Now I want to give an input like input_time = "17:00-18:00" and divide that window into 5-minute intervals, e.g. [17:00, 17:05, ... 17:55, 18:00]. Then, for each interval, I want the average points earned in that interval across all days. The result should look like the following:
interval points
17:00 -
17:05 -
….
17:30 3
17:35 3
17:40 7.5
17:45 4.5
17:50 4.5
17:55 5
18:00 4.5
Need your help. Thanks

Create a DatetimeIndex with date_range and change its format with strftime:
input_time = "17:00-18:00"
s,e = input_time.split('-')
r = pd.date_range(s, e, freq='5min').strftime('%H:%M')
print (r)
['17:00' '17:05' '17:10' '17:15' '17:20' '17:25' '17:30' '17:35' '17:40'
'17:45' '17:50' '17:55' '18:00']
Also convert the original index the same way for groupby with aggregate mean, and finally reindex by the range:
df = df.groupby(df.index.strftime('%H:%M'))['points'].mean().reindex(r)
print (df)
17:00 NaN
17:05 NaN
17:10 NaN
17:15 NaN
17:20 NaN
17:25 NaN
17:30 3.0
17:35 3.0
17:40 7.5
17:45 4.5
17:50 4.5
17:55 5.0
18:00 4.5
Name: points, dtype: float64
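Putting the pieces together, here is a self-contained sketch of the approach, using a small invented sample that spans two days (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical resampled data covering the same clock times on two days.
idx = pd.to_datetime([
    "2017-05-29 17:40", "2017-05-29 17:45",
    "2017-05-30 17:40", "2017-05-30 17:45",
])
df = pd.DataFrame({"points": [8, 1, 7, 8]}, index=idx)

# Build the 5-minute grid for the requested window.
input_time = "17:00-18:00"
s, e = input_time.split("-")
r = pd.date_range(s, e, freq="5min").strftime("%H:%M")

# Average by time-of-day across days, then align to the grid.
out = df.groupby(df.index.strftime("%H:%M"))["points"].mean().reindex(r)
print(out)
```

Times in the window with no data come out as NaN, matching the `-` rows in the desired output.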

Related

Calculation of the daily average

How can I transform this data so that the Pm 2.5 and Pm 10 columns hold the average over the whole day? The data I collected (example below) arrives every 15 minutes.
Pm 2.5 Pm 10 Created At
0 6.00 19.20 2021-06-21 19:00
1 4.70 17.00 2021-06-21 19:15
2 4.80 16.70 2021-06-21 19:30
3 5.10 12.10 2021-06-21 19:45
4 7.90 19.10 2021-06-21 20:00
Let's resample the dataframe:
df['Created At'] = pd.to_datetime(df['Created At'])
df.resample('D', on='Created At').mean()
Pm 2.5 Pm 10
Created At
2021-06-21 5.7 16.82
You can use pd.Grouper and then transform if you want to preserve the dataframe shape:
df['Created At'] = pd.to_datetime(df['Created At'])
df[['Pm 2.5', 'Pm 10']] = df.groupby(pd.Grouper(key='Created At', freq='D'))\
[['Pm 2.5', 'Pm 10']].transform('mean')
Output:
Pm 2.5 Pm 10 Created At
0 5.7 16.82 2021-06-21 19:00:00
1 5.7 16.82 2021-06-21 19:15:00
2 5.7 16.82 2021-06-21 19:30:00
3 5.7 16.82 2021-06-21 19:45:00
4 5.7 16.82 2021-06-21 20:00:00
Here is one way to do it: convert the column with to_datetime, take the date part, and compute the mean (selecting the numeric columns explicitly, since mean() over a non-numeric column raises in recent pandas):
df.groupby(pd.to_datetime(df['Created At']).dt.date)[['Pm 2.5', 'Pm 10']].mean()
            Pm 2.5  Pm 10
Created At
2021-06-21     5.7  16.82
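For reference, a self-contained sketch of the resample route using the five sample rows from the question, which reproduces the daily means above:

```python
import pandas as pd

df = pd.DataFrame({
    "Pm 2.5": [6.00, 4.70, 4.80, 5.10, 7.90],
    "Pm 10": [19.20, 17.00, 16.70, 12.10, 19.10],
    "Created At": ["2021-06-21 19:00", "2021-06-21 19:15",
                   "2021-06-21 19:30", "2021-06-21 19:45",
                   "2021-06-21 20:00"],
})
df["Created At"] = pd.to_datetime(df["Created At"])

# One row per calendar day, averaging all readings in that day.
daily = df.resample("D", on="Created At").mean()
print(daily)
```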

Code required to aggregate dataframe group total as well as retrieve first and last date by group

I am using Python for some supply chain/manufacturing purposes, and I am trying to write code that compiles the relevant information I need.
I am trying to aggregate the total 'UnitsProduced' by 'Lot' and also grab only the first occurring 'Date/StartTime' and last occurring 'Date/EndTime'.
Right now the (simplified) dataframe is as follows:
Lot   UnitsProduced   Date/StartTime    Date/EndTime
1     5               1/1/2021 8:00     1/1/2021 13:00
1     13              1/2/2021 10:00    1/2/2021 14:00
2     20              1/3/2021 7:00     1/3/2021 11:00
3     15              1/4/2021 14:30    1/4/2021 19:00
3     6               1/4/2021 20:00    1/4/2021 22:00
3     28              1/5/2021 7:00     1/5/2021 13:00
The end result should look something like:
Lot   Units Produced   Date/StartTime    Date/EndTime
1     18               1/1/2021 8:00     1/2/2021 14:00
2     20               1/3/2021 7:00     1/3/2021 11:00
3     49               1/4/2021 14:30    1/5/2021 13:00
Thank you for the help. If there is any other information I can provide please let me know
You could use groupby.agg with a dictionary of aggregate functions, just make sure that the date columns are in datetime format:
# df['Date/StartTime'] = pd.to_datetime(df['Date/StartTime'])
# df['Date/EndTime'] = pd.to_datetime(df['Date/EndTime'])
df.groupby('Lot', as_index=False).agg({'UnitsProduced': 'sum',
                                       'Date/StartTime': 'min',
                                       'Date/EndTime': 'max'})
Lot UnitsProduced Date/StartTime Date/EndTime
0 1 18 2021-01-01 08:00:00 2021-01-02 14:00:00
1 2 20 2021-01-03 07:00:00 2021-01-03 11:00:00
2 3 49 2021-01-04 14:30:00 2021-01-05 13:00:00
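As a variant, the same aggregation can be written with named aggregation (available since pandas 0.25). The output column names Start and End below are my own choice, since keyword names cannot contain a slash:

```python
import pandas as pd

df = pd.DataFrame({
    "Lot": [1, 1, 2, 3, 3, 3],
    "UnitsProduced": [5, 13, 20, 15, 6, 28],
    "Date/StartTime": ["1/1/2021 8:00", "1/2/2021 10:00", "1/3/2021 7:00",
                       "1/4/2021 14:30", "1/4/2021 20:00", "1/5/2021 7:00"],
    "Date/EndTime": ["1/1/2021 13:00", "1/2/2021 14:00", "1/3/2021 11:00",
                     "1/4/2021 19:00", "1/4/2021 22:00", "1/5/2021 13:00"],
})
df["Date/StartTime"] = pd.to_datetime(df["Date/StartTime"])
df["Date/EndTime"] = pd.to_datetime(df["Date/EndTime"])

# One (column, function) pair per output column.
out = df.groupby("Lot", as_index=False).agg(
    UnitsProduced=("UnitsProduced", "sum"),
    Start=("Date/StartTime", "min"),
    End=("Date/EndTime", "max"),
)
print(out)
```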

Splitting Pandas Dataframe into chunks by Timestamp

Let's say I have a pandas dataframe df
Timestamp Value
Jan 1 12:32 10
Jan 1 12:50 15
Jan 1 13:01 5
Jan 1 16:05 17
Jan 1 16:10 17
Jan 1 16:22 20
The result I want back, is a dataframe with per-hour (or any user specified time-segment, really) averages. Let's say my specified timesegment is 1 hour here. I want back something like
Jan 1 12:00 12.5
Jan 1 13:00 5
Jan 1 14:00 0
Jan 1 15:00 0
Jan 1 16:00 18
Is there a simple way built into pandas to segment like this? It feels like there should be, but my googling of "splitting pandas dataframe" in a variety of ways is failing me.
We need to convert to datetime first, then resample:
df.Timestamp=pd.to_datetime('2020 '+df.Timestamp)
df.set_index('Timestamp').Value.resample('1H').mean().fillna(0)
Timestamp
2020-01-01 12:00:00 12.5
2020-01-01 13:00:00 5.0
2020-01-01 14:00:00 0.0
2020-01-01 15:00:00 0.0
2020-01-01 16:00:00 18.0
Freq: H, Name: Value, dtype: float64
Convert the index
newdf.index=newdf.index.strftime('%B %d %H:%M')
newdf
Timestamp
January 01 12:00 12.5
January 01 13:00 5.0
January 01 14:00 0.0
January 01 15:00 0.0
January 01 16:00 18.0
Name: Value, dtype: float64
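A runnable version of the above, assuming the year 2020 for the year-less timestamps (any placeholder year works, since only the hourly buckets matter here):

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["Jan 1 12:32", "Jan 1 12:50", "Jan 1 13:01",
                  "Jan 1 16:05", "Jan 1 16:10", "Jan 1 16:22"],
    "Value": [10, 15, 5, 17, 17, 20],
})

# Prepend a year so the strings parse to full timestamps.
df["Timestamp"] = pd.to_datetime("2020 " + df["Timestamp"])

# Hourly means; hours with no rows become NaN, which fillna turns into 0.
hourly = df.set_index("Timestamp")["Value"].resample("1h").mean().fillna(0)
print(hourly)
```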

How to plot Month name in correct order, using strftime?

Here's a quick peek of my dataframe:
local_date amount
0 2017-08-16 10.00
1 2017-10-26 21.70
2 2017-11-04 5.00
3 2017-11-12 37.20
4 2017-11-13 10.00
5 2017-11-18 31.00
6 2017-11-27 14.00
7 2017-11-29 10.00
8 2017-11-30 37.20
9 2017-12-16 8.00
10 2017-12-17 43.20
11 2017-12-17 49.60
12 2017-12-19 102.50
13 2017-12-19 28.80
14 2017-12-22 72.55
15 2017-12-23 24.80
16 2017-12-24 62.00
17 2017-12-26 12.40
18 2017-12-26 15.50
19 2017-12-26 40.00
20 2017-12-28 57.60
21 2017-12-31 37.20
22 2018-01-01 18.60
23 2018-01-02 12.40
24 2018-01-04 32.40
25 2018-01-05 17.00
26 2018-01-06 28.80
27 2018-01-11 20.80
28 2018-01-12 10.00
29 2018-01-12 26.00
I am trying to plot monthly sum of transactions, which is fine, except for ugly x-ticks:
I would like to change it to the name of the month and year (e.g. Jan 2019). So I sort the dates, change them using strftime, and plot again, but the order of the dates is completely messed up.
The code I used to sort the dates and convert them is:
transactions = transactions.sort_values(by='local_date')
transactions['month_year'] = transactions['local_date'].dt.strftime('%B %Y')
#And then groupby that column:
transactions.groupby('month_year').amount.sum().plot(kind='bar')
When doing this, the month_year values with the same month name are grouped together: January 2019 comes right after January 2018, and so on.
I thought sorting by date would fix this, but it doesn't. What's the best way to approach this?
You can convert the column to month periods with Series.dt.to_period and then change the PeriodIndex to a custom format in rename:
transactions = transactions.sort_values(by='local_date')
(transactions.groupby(transactions['local_date'].dt.to_period('m'))
.amount.sum()
.rename(lambda x: x.strftime('%B %Y'))
.plot(kind='bar'))
Alternative solution:
transactions = transactions.sort_values(by='local_date')
s = transactions.groupby(transactions['local_date'].dt.to_period('m')).amount.sum()
s.index = s.index.strftime('%B %Y')
s.plot(kind='bar')
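A minimal, self-contained sketch of the second solution, with a few invented rows that straddle the year boundary to show the ordering stays chronological:

```python
import pandas as pd

transactions = pd.DataFrame({
    "local_date": pd.to_datetime(["2017-12-16", "2018-01-01",
                                  "2017-12-31", "2018-01-12"]),
    "amount": [8.0, 18.6, 37.2, 26.0],
})
transactions = transactions.sort_values(by="local_date")

# Group by month period (which sorts chronologically), then relabel for display.
s = transactions.groupby(transactions["local_date"].dt.to_period("M")).amount.sum()
s.index = s.index.strftime("%B %Y")
print(s)
```

December 2017 stays ahead of January 2018 because the ordering was fixed while the index was still a PeriodIndex, before the string labels were applied.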

Plot line plot per weekday and week number

I have the following data. This represents the number of occurrences in January:
date value WeekDay WeekNo Year Month
2018-01-01 214.0 Monday 1 2018 1
2018-01-02 232.0 Tuesday 1 2018 1
2018-01-03 147.0 Wed 1 2018 1
2018-01-04 257.0 Thursd 1 2018 1
2018-01-05 164.0 Friday 1 2018 1
2018-01-06 187.0 Saturd 1 2018 1
2018-01-07 201.0 Sunday 1 2018 1
2018-01-08 141.0 Monday 2 2018 1
2018-01-09 152.0 Tuesday 2 2018 1
2018-01-10 167.0 Wednesd 2 2018 1
2018-01-15 113.0 Monday 3 2018 1
2018-01-16 139.0 Tuesday 3 2018 1
2018-01-17 159.0 Wednesd 3 2018 1
2018-01-18 202.0 Thursd 3 2018 1
2018-01-19 207.0 Friday 3 2018 1
... ... ... ... ...
WeekNo is the number of the week in a year.
My goal is to have a line plot showing the evolution of occurrences, for this particular month, per week number. Therefore, I'd like to have the weekday in the x-axis, the occurrences on the y-axis and different lines, each with a different color, for each week (and a legend with the color that corresponds to each week).
Does anyone have any idea how this could be done? Thanks a lot!
You can first reshape your dataframe to a format where the columns are the week number and one row per weekday. Then, use the plot pandas method:
reshaped = (df
            .assign(date=lambda f: pd.to_datetime(f.date))
            .assign(dayofweek=lambda f: f.date.dt.dayofweek,
                    dayname=lambda f: f.date.dt.day_name())  # weekday_name was removed in pandas 1.0
            .set_index(['dayofweek', 'dayname', 'WeekNo'])
            .value
            .unstack()
            .reset_index(0, drop=True))
print(reshaped)
reshaped.plot(marker='x')
WeekNo 1 2 3
dayname
Monday 214.0 141.0 113.0
Tuesday 232.0 152.0 139.0
Wednesday 147.0 167.0 159.0
Thursday 257.0 NaN 202.0
Friday 164.0 NaN 207.0
Saturday 187.0 NaN NaN
Sunday 201.0 NaN NaN
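An alternative sketch of the reshape using pivot_table plus an explicit weekday ordering, with the Monday/Tuesday rows of weeks 1 and 2 from the question as sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02",
                            "2018-01-08", "2018-01-09"]),
    "value": [214.0, 232.0, 141.0, 152.0],
    "WeekNo": [1, 1, 2, 2],
})
df["dayname"] = df["date"].dt.day_name()

# One column per week; reindex forces Monday-to-Sunday order on the rows,
# and dropna removes weekdays with no data in any week.
wide = (df.pivot_table(index="dayname", columns="WeekNo", values="value")
          .reindex(["Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday"])
          .dropna(how="all"))
print(wide)
# wide.plot(marker="x") would then draw one line per week.
```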
