I have a dataframe where each timestamp has some points earned by the user. It looks like the following, i.e. the data was collected every few seconds:
>> df.head()
points
timestamp
2017-05-29 17:40:45 5
2017-05-29 17:41:53 7
2017-05-29 17:42:34 3
2017-05-29 17:42:36 8
2017-05-29 17:42:37 6
Then I wanted to resample it to an interval of 5 minutes so I did this
>> df.resample("5min").mean()
points
timestamp
5/29/2017 17:40 8
5/29/2017 17:45 1
5/29/2017 17:50 4
5/29/2017 17:55 3
5/29/2017 18:00 8
5/30/2017 17:30 3
5/30/2017 17:35 3
5/30/2017 17:40 7
5/30/2017 17:45 8
5/30/2017 17:50 5
5/30/2017 17:55 7
5/30/2017 18:00 1
Now I want to give an input like input_time = "17:00-18:00", divide that range into 5-minute intervals (e.g. [17:00, 17:05, ..., 17:55, 18:00]), and for each interval get the average points earned at that time of day across all days. The result should look like the following:
interval points
17:00 -
17:05 -
….
17:30 3
17:35 3
17:40 7.5
17:45 4.5
17:50 4.5
17:55 5
18:00 4.5
Need your help. Thanks
Create a DatetimeIndex with date_range and format it with strftime:
input_time = "17:00-18:00"
s,e = input_time.split('-')
r = pd.date_range(s, e, freq='5min').strftime('%H:%M')  # 'min' replaces the deprecated 'T' alias
print(r)
['17:00' '17:05' '17:10' '17:15' '17:20' '17:25' '17:30' '17:35' '17:40'
'17:45' '17:50' '17:55' '18:00']
Then group the original data by its wall-clock time (the index formatted as '%H:%M'), aggregate with mean, and finally reindex by the range:
df = df.groupby(df.index.strftime('%H:%M'))['points'].mean().reindex(r)
print(df)
17:00 NaN
17:05 NaN
17:10 NaN
17:15 NaN
17:20 NaN
17:25 NaN
17:30 3.0
17:35 3.0
17:40 7.5
17:45 4.5
17:50 4.5
17:55 5.0
18:00 4.5
Name: points, dtype: float64
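For reference, here is a self-contained sketch of the whole approach on a tiny hypothetical two-day sample (the data below is made up just to show the mechanics):

```python
import pandas as pd

# Hypothetical two-day sample: the same clock times appear on both days,
# so averaging by time-of-day is meaningful.
idx = pd.to_datetime([
    "2017-05-29 17:40", "2017-05-29 17:45",
    "2017-05-30 17:40", "2017-05-30 17:45",
])
df = pd.DataFrame({"points": [8, 1, 7, 8]}, index=idx)

input_time = "17:30-17:50"
s, e = input_time.split("-")
r = pd.date_range(s, e, freq="5min").strftime("%H:%M")

# Group by wall-clock time, average across days, then align to the grid;
# slots with no observations come back as NaN.
out = df.groupby(df.index.strftime("%H:%M"))["points"].mean().reindex(r)
print(out)
```

Intervals with no data (17:30, 17:35, 17:50 here) are NaN in the result; chain `.fillna(0)` if zeros are preferred.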
How can I transform this data so that the Pm 2.5 and Pm 10 columns hold the average for the whole day? The data I collected (example below) arrives every 15 minutes.
Pm 2.5 Pm 10 Created At
0 6.00 19.20 2021-06-21 19:00
1 4.70 17.00 2021-06-21 19:15
2 4.80 16.70 2021-06-21 19:30
3 5.10 12.10 2021-06-21 19:45
4 7.90 19.10 2021-06-21 20:00
Let's resample the dataframe:
df['Created At'] = pd.to_datetime(df['Created At'])
df.resample('D', on='Created At').mean()
Pm 2.5 Pm 10
Created At
2021-06-21 5.7 16.82
You can use pd.Grouper and then transform if you want to preserve the dataframe shape:
df['Created At'] = pd.to_datetime(df['Created At'])
df[['Pm 2.5', 'Pm 10']] = df.groupby(pd.Grouper(key='Created At', freq='D'))\
[['Pm 2.5', 'Pm 10']].transform('mean')
Output:
Pm 2.5 Pm 10 Created At
0 5.7 16.82 2021-06-21 19:00:00
1 5.7 16.82 2021-06-21 19:15:00
2 5.7 16.82 2021-06-21 19:30:00
3 5.7 16.82 2021-06-21 19:45:00
4 5.7 16.82 2021-06-21 20:00:00
Here is one way to do it: convert the dates with to_datetime, take the date part, and group on it to compute the mean (selecting the numeric columns explicitly, so string columns don't break the aggregation):
df.groupby(pd.to_datetime(df['Created At']).dt.date)[['Pm 2.5', 'Pm 10']].mean()
            Pm 2.5  Pm 10
Created At
2021-06-21     5.7  16.82
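As a sanity check, the resample and groupby approaches give the same daily means. A small runnable sketch using the question's sample, hard-coded here:

```python
import pandas as pd

# Sample data from the question, hard-coded for the comparison
df = pd.DataFrame({
    "Pm 2.5": [6.00, 4.70, 4.80, 5.10, 7.90],
    "Pm 10": [19.20, 17.00, 16.70, 12.10, 19.10],
    "Created At": pd.to_datetime([
        "2021-06-21 19:00", "2021-06-21 19:15", "2021-06-21 19:30",
        "2021-06-21 19:45", "2021-06-21 20:00",
    ]),
})

# Daily mean via resample (calendar-day bins on the 'Created At' column)
by_resample = df.resample("D", on="Created At")[["Pm 2.5", "Pm 10"]].mean()

# Daily mean via groupby on the date part
by_date = df.groupby(df["Created At"].dt.date)[["Pm 2.5", "Pm 10"]].mean()

print(by_resample)
print(by_date)
```

The only difference is the index type: resample yields a DatetimeIndex (including empty days, if any), while grouping on `.dt.date` yields plain `date` objects for observed days only.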
I am using Python for some supply chain/manufacturing purposes and am trying to figure out code that will compile the relevant information I need.
I am trying to aggregate the total 'UnitsProduced' by 'Lot', and also grab only the first occurring 'Date/StartTime' and the last occurring 'Date/EndTime'.
Right now the (simplified) dataframe is as follows:
Lot  UnitsProduced  Date/StartTime   Date/EndTime
1    5              1/1/2021 8:00    1/1/2021 13:00
1    13             1/2/2021 10:00   1/2/2021 14:00
2    20             1/3/2021 7:00    1/3/2021 11:00
3    15             1/4/2021 14:30   1/4/2021 19:00
3    6              1/4/2021 20:00   1/4/2021 22:00
3    28             1/5/2021 7:00    1/5/2021 13:00
The end result should look something like:
Lot  UnitsProduced  Date/StartTime   Date/EndTime
1    18             1/1/2021 8:00    1/2/2021 14:00
2    20             1/3/2021 7:00    1/3/2021 11:00
3    49             1/4/2021 14:30   1/5/2021 13:00
Thank you for the help. If there is any other information I can provide please let me know
You could use groupby.agg with a dictionary of aggregate functions, just make sure that the date columns are in datetime format:
# df['Date/StartTime'] = pd.to_datetime(df['Date/StartTime'])
# df['Date/EndTime'] = pd.to_datetime(df['Date/EndTime'])
df.groupby('Lot', as_index=False).agg({'UnitsProduced':'sum',
'Date/StartTime':'min',
'Date/EndTime':'max'})
Lot UnitsProduced Date/StartTime Date/EndTime
0 1 18 2021-01-01 08:00:00 2021-01-02 14:00:00
1 2 20 2021-01-03 07:00:00 2021-01-03 11:00:00
2 3 49 2021-01-04 14:30:00 2021-01-05 13:00:00
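An equivalent sketch using named aggregation, which makes the output column names explicit (the StartTime/EndTime output names below are my own choice, not from the question):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Lot": [1, 1, 2, 3, 3, 3],
    "UnitsProduced": [5, 13, 20, 15, 6, 28],
    "Date/StartTime": pd.to_datetime([
        "1/1/2021 8:00", "1/2/2021 10:00", "1/3/2021 7:00",
        "1/4/2021 14:30", "1/4/2021 20:00", "1/5/2021 7:00"]),
    "Date/EndTime": pd.to_datetime([
        "1/1/2021 13:00", "1/2/2021 14:00", "1/3/2021 11:00",
        "1/4/2021 19:00", "1/4/2021 22:00", "1/5/2021 13:00"]),
})

# Named aggregation: each output column is (source column, aggregate function)
out = df.groupby("Lot", as_index=False).agg(
    UnitsProduced=("UnitsProduced", "sum"),
    StartTime=("Date/StartTime", "min"),
    EndTime=("Date/EndTime", "max"),
)
print(out)
```

Named aggregation also lets you rename columns in the same step, which the dictionary form does not.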
Let's say I have a pandas dataframe df:
Timestamp Value
Jan 1 12:32 10
Jan 1 12:50 15
Jan 1 13:01 5
Jan 1 16:05 17
Jan 1 16:10 17
Jan 1 16:22 20
The result I want back, is a dataframe with per-hour (or any user specified time-segment, really) averages. Let's say my specified timesegment is 1 hour here. I want back something like
Jan 1 12:00 12.5
Jan 1 13:00 5
Jan 1 14:00 0
Jan 1 15:00 0
Jan 1 16:00 18
Is there a simple way built into pandas to segment like this? It feels like there should be, but my googling of "splitting pandas dataframe" in a variety of ways is failing me.
We need to convert to datetime first, then resample (a year is prepended only because the timestamps lack one):
df.Timestamp = pd.to_datetime('2020 ' + df.Timestamp)
newdf = df.set_index('Timestamp').Value.resample('1H').mean().fillna(0)
newdf
Timestamp
2020-01-01 12:00:00    12.5
2020-01-01 13:00:00     5.0
2020-01-01 14:00:00     0.0
2020-01-01 15:00:00     0.0
2020-01-01 16:00:00    18.0
Freq: H, Name: Value, dtype: float64
Convert the index to the desired display format:
newdf.index = newdf.index.strftime('%B %d %H:%M')
newdf
Timestamp
January 01 12:00    12.5
January 01 13:00     5.0
January 01 14:00     0.0
January 01 15:00     0.0
January 01 16:00    18.0
Name: Value, dtype: float64
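Putting it together, a self-contained sketch of the approach (the year 2020 is arbitrary, prepended only because the timestamps lack one; any offset alias such as '30min' or '2h' can replace 'h' for a different segment size):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Timestamp": ["Jan 1 12:32", "Jan 1 12:50", "Jan 1 13:01",
                  "Jan 1 16:05", "Jan 1 16:10", "Jan 1 16:22"],
    "Value": [10, 15, 5, 17, 17, 20],
})
# Prepend a year so the strings parse as full datetimes
df["Timestamp"] = pd.to_datetime("2020 " + df["Timestamp"])

# Resample into hourly bins; empty bins become NaN, which fillna turns into 0
hourly = df.set_index("Timestamp")["Value"].resample("h").mean().fillna(0)
print(hourly)
```

Note that resample fills in the empty 14:00 and 15:00 bins automatically, which a plain groupby on the hour would not do.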
Here's a quick peek of my dataframe:
local_date amount
0 2017-08-16 10.00
1 2017-10-26 21.70
2 2017-11-04 5.00
3 2017-11-12 37.20
4 2017-11-13 10.00
5 2017-11-18 31.00
6 2017-11-27 14.00
7 2017-11-29 10.00
8 2017-11-30 37.20
9 2017-12-16 8.00
10 2017-12-17 43.20
11 2017-12-17 49.60
12 2017-12-19 102.50
13 2017-12-19 28.80
14 2017-12-22 72.55
15 2017-12-23 24.80
16 2017-12-24 62.00
17 2017-12-26 12.40
18 2017-12-26 15.50
19 2017-12-26 40.00
20 2017-12-28 57.60
21 2017-12-31 37.20
22 2018-01-01 18.60
23 2018-01-02 12.40
24 2018-01-04 32.40
25 2018-01-05 17.00
26 2018-01-06 28.80
27 2018-01-11 20.80
28 2018-01-12 10.00
29 2018-01-12 26.00
I am trying to plot the monthly sum of transactions, which is fine, except for ugly x-ticks. I would like to change them to the name of the month and year (e.g. Jan 2019). So I sort the dates, change them using strftime and plot again, but the order of the dates is completely messed up.
The code I used to sort the dates and convert them is:
transactions = transactions.sort_values(by='local_date')
transactions['month_year'] = transactions['local_date'].dt.strftime('%B %Y')
# And then group by that column:
transactions.groupby('month_year').amount.sum().plot(kind='bar')
When doing this, the month_year labels are sorted alphabetically, so the years get paired together: January 2019 comes right after January 2018, etc.
I thought sorting by date would fix this, but it doesn't. What's the best way to approach this?
You can convert the column to month periods with Series.dt.to_period, which sort chronologically, and then change the PeriodIndex to a custom format in rename:
transactions = transactions.sort_values(by='local_date')
(transactions.groupby(transactions['local_date'].dt.to_period('m'))
.amount.sum()
.rename(lambda x: x.strftime('%B %Y'))
.plot(kind='bar'))
Alternative solution:
transactions = transactions.sort_values(by='local_date')
s = transactions.groupby(transactions['local_date'].dt.to_period('m')).amount.sum()
s.index = s.index.strftime('%B %Y')
s.plot(kind='bar')
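To see why grouping by periods fixes the ordering, here's a minimal runnable sketch on hypothetical data spanning a year boundary, where alphabetical sorting of the '%B %Y' labels would misplace the months:

```python
import pandas as pd

# Made-up sample crossing a year boundary: sorting the strings
# "December 2017" / "January 2018" alphabetically would be wrong.
transactions = pd.DataFrame({
    "local_date": pd.to_datetime([
        "2017-12-17", "2018-01-02", "2017-12-28", "2018-01-05"]),
    "amount": [43.20, 12.40, 57.60, 17.00],
})

# Group by month periods: the PeriodIndex sorts chronologically
s = transactions.groupby(transactions["local_date"].dt.to_period("M")).amount.sum()

# Relabel only AFTER aggregating, so the order is already fixed
s.index = s.index.strftime("%B %Y")
print(s)
```

The key point is that the index is relabeled after the groupby has ordered it, so the bar plot inherits the chronological order.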
I have the following data. This represents the number of occurrences in January:
date value WeekDay WeekNo Year Month
2018-01-01 214.0 Monday 1 2018 1
2018-01-02 232.0 Tuesday 1 2018 1
2018-01-03 147.0 Wed 1 2018 1
2018-01-04 257.0 Thursd 1 2018 1
2018-01-05 164.0 Friday 1 2018 1
2018-01-06 187.0 Saturd 1 2018 1
2018-01-07 201.0 Sunday 1 2018 1
2018-01-08 141.0 Monday 2 2018 1
2018-01-09 152.0 Tuesday 2 2018 1
2018-01-10 167.0 Wednesd 2 2018 1
2018-01-15 113.0 Monday 3 2018 1
2018-01-16 139.0 Tuesday 3 2018 1
2018-01-17 159.0 Wednesd 3 2018 1
2018-01-18 202.0 Thursd 3 2018 1
2018-01-19 207.0 Friday 3 2018 1
... ... ... ... ...
WeekNo is the number of the week in a year.
My goal is to have a line plot showing the evolution of occurrences, for this particular month, per week number. Therefore, I'd like to have the weekday in the x-axis, the occurrences on the y-axis and different lines, each with a different color, for each week (and a legend with the color that corresponds to each week).
Does anyone have any idea how this could be done? Thanks a lot!
You can first reshape your dataframe to a format where the columns are the week numbers, with one row per weekday. Then use the pandas plot method:
reshaped = (df
 .assign(date=lambda f: pd.to_datetime(f.date))
 .assign(dayofweek=lambda f: f.date.dt.dayofweek,
         dayname=lambda f: f.date.dt.day_name())  # day_name() replaces weekday_name, removed in pandas 1.0
 .set_index(['dayofweek', 'dayname', 'WeekNo'])
 .value
 .unstack()
 .reset_index(0, drop=True))
print(reshaped)
reshaped.plot(marker='x')
WeekNo 1 2 3
dayname
Monday 214.0 141.0 113.0
Tuesday 232.0 152.0 139.0
Wednesday 147.0 167.0 159.0
Thursday 257.0 NaN 202.0
Friday 164.0 NaN 207.0
Saturday 187.0 NaN NaN
Sunday 201.0 NaN NaN
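An equivalent sketch using pivot_table, which does the reshape in one call (the data below is a subset of the question's; sorting by dayofweek in the index keeps weekday order rather than alphabetical):

```python
import pandas as pd

# Subset of the question's data, enough to show the reshape
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-08",
                            "2018-01-09", "2018-01-15", "2018-01-16"]),
    "value": [214.0, 232.0, 141.0, 152.0, 113.0, 139.0],
    "WeekNo": [1, 1, 2, 2, 3, 3],
})

# pivot_table spreads WeekNo into columns; indexing by (dayofweek, dayname)
# and then dropping dayofweek keeps the rows in weekday order
reshaped = (df.assign(dayofweek=df.date.dt.dayofweek,
                      dayname=df.date.dt.day_name())
              .pivot_table(index=["dayofweek", "dayname"],
                           columns="WeekNo", values="value")
              .reset_index(0, drop=True))
print(reshaped)
# reshaped.plot(marker='x')  # one line per week, weekday on the x-axis
```

Each WeekNo column becomes one line in the plot, and the legend labels come from the column names for free.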