How to change this time data into H:M in Python

I have a dataset with a duration column containing time data stored as an object, as shown below:
df['duration'].head(10)
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
3 52 min.
4 1 hr. 25 min.
5 45 min.
6 45 min.
7 60 min.
8 45 min.
9 45 min.
Name: duration, dtype: object
How do I change this to an appropriate numerical value, like below?
0 00:60
1 01:13
2 01:10
3 00:52
4 01:25
5 00:45

Here is a way to get a string version in %H:%M format and a timedelta version:
import pandas as pd

df = pd.DataFrame({'duration': ['60 min.', '1 hr. 13 min.', '1 hr. 10 min.']})
print(df)
# Extract every run of digits, e.g. '1 hr. 13 min.' -> ['1', '13'], '60 min.' -> ['60']
df['parts'] = df.duration.str.findall(r'\d+')
# The first number is hours only when two are present; the last is always minutes
df['timedelta'] = df.parts.apply(lambda x: pd.to_timedelta((0 if len(x) < 2 else int(x[0])) * 3600 + int(x[-1]) * 60, unit='s'))
df['hours and minutes'] = df.parts.apply(lambda x: f"{0 if len(x) < 2 else int(x[0]):02}:{int(x[-1]):02}")
df = df.drop(columns=['duration', 'parts'])
print(df)
Input:
duration
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
Output:
timedelta hours and minutes
0 0 days 01:00:00 00:60
1 0 days 01:13:00 01:13
2 0 days 01:10:00 01:10
If we do this:
print(df.timedelta.dtypes)
... we see that the timedelta column indeed contains numerical values (of timedelta data type):
timedelta64[ns]
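As a quick illustration (a small addition, not part of the original snippet), that timedelta column can be used directly for arithmetic, for example to get each duration in whole minutes:
# Assumes the df built in the snippet above, with the 'timedelta' column present
df['minutes'] = df['timedelta'].dt.total_seconds().div(60).astype(int)
print(df)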

You could apply a lambda function on your duration column like this:
import pandas as pd
import datetime as dt

def transform(t):
    # '%I' parses the hour (1-12) and '%M' the minutes (00-59)
    if 'hr.' in t:
        return dt.datetime.strptime(t, '%I hr. %M min.').strftime('%I:%M')
    return dt.datetime.strptime(t, '%M min.').strftime('00:%M')

df = pd.DataFrame(['45 min.', '1 hr. 13 min.'], columns=['duration'])
print(df)
df['duration'] = df['duration'].apply(lambda x: transform(x))
print(df)
Outputs:
duration
0 45 min.
1 1 hr. 13 min.
and then
duration
0 00:45
1 01:13
Note that if you want "60 min." mapped to "00:60", you'll need some additional logic in the transform function, since the minutes directive %M only accepts values between 00 and 59.
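A minimal sketch of such additional logic (a hypothetical variant, not from the original answer): extract the numbers yourself instead of going through strptime, so minute values of 60 and above pass through unchanged:
import re

def transform_keep_60(t):
    # Hypothetical variant: pull the numbers out directly so '60 min.' stays '00:60'
    nums = re.findall(r'\d+', t)
    hours = int(nums[0]) if len(nums) == 2 else 0
    minutes = int(nums[-1])
    return f"{hours:02}:{minutes:02}"

print(transform_keep_60('60 min.'))        # 00:60
print(transform_keep_60('1 hr. 13 min.'))  # 01:13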

Related

From unix timestamps to relative date based on a condition from another column in pandas

I have a column of dates as unix timestamps and I need to convert them into relative times measured from the starting activity.
The final output should be column D, which expresses the time elapsed since the activity with index = 1; the relative time always refers back to the most recent activity with index = 1.
A index timestamp D
activity1 1 1.612946e+09 0
activity2 2 1.614255e+09 80 hours
activity3 1 1.612181e+09 0
activity4 2 1.613045e+09 50 hours
activity5 3 1.637668e+09 430 hours
Any idea?
Use to_datetime with unit='s', then create groups that start wherever index equals 1, take the first timestamp of each group with transform, subtract it, and convert the difference to hours:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
# A new group starts on every row where 'index' equals 1
s = df.groupby(df['index'].eq(1).cumsum())['timestamp'].transform('first')
df['D1'] = df['timestamp'].sub(s).dt.total_seconds().div(3600)
print(df)
A index timestamp D D1
0 activity1 1 2021-02-10 08:33:20 0 0.000000
1 activity2 2 2021-02-25 12:10:00 80 hours 363.611111
2 activity3 1 2021-02-01 12:03:20 0 0.000000
3 activity4 2 2021-02-11 12:03:20 50 hours 240.000000
4 activity5 3 2021-11-23 11:46:40 430 hours 7079.722222
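For reference, a self-contained sketch built from the sample rows above that shows what the df['index'].eq(1).cumsum() grouping key looks like:
import pandas as pd

df = pd.DataFrame({
    'A': ['activity1', 'activity2', 'activity3', 'activity4', 'activity5'],
    'index': [1, 2, 1, 2, 3],
    'timestamp': [1.612946e9, 1.614255e9, 1.612181e9, 1.613045e9, 1.637668e9],
})
# Rows are assigned to a new group each time 'index' equals 1
print(df['index'].eq(1).cumsum().tolist())  # [1, 1, 2, 2, 2]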

How to convert X min, Y sec string to timestamp

I have a dataframe with a duration column of strings in a format like:
index  duration
0      26 s
1      24 s
2      4 min, 37 s
3      7 s
4      1 min, 1 s
Is there a pandas or strftime() / strptime() way to convert the duration column to a min/sec timestamp?
I've attempted converting the strings this way, but I run into multiple scenarios after replacing the strings:
for row in df['index']:
    if "min, " in df['duration'][row]:
        df['duration'][row] = df['duration'][row].replace(' min, ', ':').replace(' s', '')
    else:
        pass
Thanks in advance
Try:
pd.to_timedelta(df['duration'])
Output:
0 0 days 00:00:26
1 0 days 00:00:24
2 0 days 00:04:37
3 0 days 00:00:07
4 0 days 00:01:01
Name: duration, dtype: timedelta64[ns]
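If a mm:ss style string is what you ultimately want, one possible follow-up (an assumption about the desired output format, not part of the original answer) is to format the resulting timedeltas yourself:
td = pd.to_timedelta(df['duration'])
secs = td.dt.total_seconds().astype(int)
# Zero-padded minutes:seconds, e.g. '4 min, 37 s' -> '04:37'
df['mm_ss'] = (secs // 60).map('{:02d}'.format) + ':' + (secs % 60).map('{:02d}'.format)
print(df)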

Creating columns in df with day and hour of week based on value

I am trying to create 2 columns based on a column that contains numerical values.
Value
0
4
10
24
null
49
Expected Output:
Value Day Hour
0 Sunday 12:00am
4 Sunday 4:00am
10 Sunday 10:00am
24 Monday 12:00am
null No Day No Time
49 Tuesday 1:00am
Continued.....
Code I am trying out:
Value = df.Value.unique()
Sunday_Starting_Point = pd.to_datetime('Sunday 2015')
(Sunday_Starting_Point + pd.to_timedelta(Value, 'h')).dt.strftime('%A %I:%M%p')
Thanks for looking!
I think the unique values are not necessary; you can use dt.strftime twice for the 2 columns, replacing the NaT values:
Sunday_Starting_Point = pd.to_datetime('Sunday 2015')
# Coerce the null entries to NaN so they become NaT after the timedelta addition
x = pd.to_numeric(df.Value, errors='coerce')
s = Sunday_Starting_Point + pd.to_timedelta(x, unit='h')
df['Day'] = s.dt.strftime('%A').replace('NaT', 'No Day')
df['Hour'] = s.dt.strftime('%I:%M%p').replace('NaT', 'No Time')
print(df)
Value Day Hour
0 0.0 Sunday 12:00AM
1 4.0 Sunday 04:00AM
2 10.0 Sunday 10:00AM
3 24.0 Monday 12:00AM
4 NaN No Day No Time
5 49.0 Tuesday 01:00AM
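An alternative sketch that skips the date anchor entirely and treats the values purely as hours counted from Sunday 00:00 (an assumption based on the expected output above; the list and column names are just for illustration):
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
df = pd.DataFrame({'Value': [0, 4, 10, 24, None, 49]})
x = pd.to_numeric(df['Value'], errors='coerce')

# Whole days since Sunday pick the weekday; the remainder is the hour of day
df['Day'] = (x // 24).map(lambda d: days[int(d) % 7] if pd.notna(d) else 'No Day')
df['Hour'] = (x % 24).map(lambda h: f"{int(h) % 12 or 12}:00{'AM' if h < 12 else 'PM'}" if pd.notna(h) else 'No Time')
print(df)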

Pandas How to group by month and year using dt

I am just wondering how to group by both year and month using pandas.series.dt.
The code below groups by just year, but how would I add a further level to group by month as well?
import pandas as pd

Data = {'Date': ['21.10.1999', '30.10.1999', '02.11.1999', '17.08.2000', '09.10.2001', '14.07.2000'],
        'X': [10, 20, 30, 40, 50, 60],
        'Y': [5, 10, 15, 20, 25, 30]}
df = pd.DataFrame(Data)
# Convert to pandas datetime
df['Date'] = pd.to_datetime(df['Date'])
# Obtain dataframe dtypes
print(df.dtypes)
print(df)
print(df.groupby(df['Date'].dt.year).sum())
You can pass Series.dt.year and Series.dt.month (renamed) to groupby; new columns are not necessary:
print(df.groupby([df['Date'].dt.year.rename('y'), df['Date'].dt.month.rename('m')]).sum())
X Y
y m
1999 2 30 15
10 30 15
2000 7 60 30
8 40 20
2001 9 50 25
Other solutions:
If you use DataFrame.resample or Grouper, all missing months in between are added as well (which may or may not be what you want):
print(df.resample('MS', on='Date').sum())
print(df.groupby(pd.Grouper(freq='MS', key='Date')).sum())
Or convert the datetimes to month periods with Series.dt.to_period:
print(df.groupby(df['Date'].dt.to_period('m')).sum())
X Y
Date
1999-02 30 15
1999-10 30 15
2000-07 60 30
2000-08 40 20
2001-09 50 25
df.assign(yr = df['Date'].dt.year, mnth = df['Date'].dt.month).groupby(['yr', 'mnth']).sum()
Out[1]:
X Y
yr mnth
1999 2 30 15
10 30 15
2000 7 60 30
8 40 20
2001 9 50 25
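A quick way to see the difference mentioned above (a small check, not from the original answers; the columns are selected explicitly so it behaves the same on recent pandas versions):
# Grouper/resample fill in the months that have no data; to_period keeps only the observed months
by_grouper = df.groupby(pd.Grouper(freq='MS', key='Date'))[['X', 'Y']].sum()
by_period = df.groupby(df['Date'].dt.to_period('m'))[['X', 'Y']].sum()
print(len(by_grouper), len(by_period))  # the Grouper result has extra, all-zero month rows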

Count String Values in Column across 30 Minute Time Bins using Pandas

I am looking to determine the count of string values in a column across a 3 month data sample. Samples were taken at random times throughout each day. I can group the data by hour, but I need the fidelity of 30 minute intervals (e.g. 0500-0530, 0530-0600) on roughly 10k rows of data.
An example of the data:
datetime stringvalues
2018-06-06 17:00 A
2018-06-07 17:30 B
2018-06-07 17:33 A
2018-06-08 19:00 B
2018-06-09 05:27 A
I have tried setting the datetime column as the index, but I cannot figure out how to group the data on anything other than 'hour', and I don't get the fidelity I need on the string value count:
df['datetime'] = pd.to_datetime(df['datetime'])
df.index = df['datetime']
df.groupby(df.index.hour).count()
Which returns an output similar to:
datetime stringvalues
datetime
5 0 0
6 2 2
7 5 5
8 1 1
...
I researched multi-indexing and resampling to some length the past two days but I have been unable to find a similar question. The desired result would look something like this:
datetime A B
0500 1 2
0530 3 5
0600 4 6
0630 2 0
....
There is no straightforward way to do a TimeGrouper on the time component, so we do this in two steps:
v = (df.groupby([pd.Grouper(key='datetime', freq='30min'), 'stringvalues'])
       .size()
       .unstack(fill_value=0))
v.groupby(v.index.time).sum()
stringvalues A B
05:00:00 1 0
17:00:00 1 0
17:30:00 1 1
19:00:00 0 1
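For reference, a self-contained version using the sample rows from the question (a minimal sketch; the variable names are just for illustration):
import pandas as pd

df = pd.DataFrame({
    'datetime': ['2018-06-06 17:00', '2018-06-07 17:30', '2018-06-07 17:33',
                 '2018-06-08 19:00', '2018-06-09 05:27'],
    'stringvalues': ['A', 'B', 'A', 'B', 'A'],
})
df['datetime'] = pd.to_datetime(df['datetime'])

# Step 1: count each string value per 30-minute calendar bin
v = (df.groupby([pd.Grouper(key='datetime', freq='30min'), 'stringvalues'])
       .size()
       .unstack(fill_value=0))

# Step 2: collapse the calendar bins onto the time of day
print(v.groupby(v.index.time).sum())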
