How to change this time data into H:M in Python

I have a dataset with a duration column containing time data stored as an object, as shown below:
df['duration'].head(10)
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
3 52 min.
4 1 hr. 25 min.
5 45 min.
6 45 min.
7 60 min.
8 45 min.
9 45 min.
Name: duration, dtype: object
How do I change this to an appropriate numerical value, like below?
0 00:60
1 01:13
2 01:10
3 00:52
4 01:25
5 00:45

Here is a way to get a string version in %H:%M format and a timedelta version:
import pandas as pd

df = pd.DataFrame({'duration': ['60 min.', '1 hr. 13 min.', '1 hr. 10 min.']})
print(df)
# Extract every run of digits, e.g. '1 hr. 13 min.' -> ['1', '13'], '60 min.' -> ['60']
df['parts'] = df.duration.str.findall(r'\d+')
# The first number is hours only when two are present; the last is always minutes
df['timedelta'] = df.parts.apply(lambda x: pd.to_timedelta((0 if len(x) < 2 else int(x[0])) * 3600 + int(x[-1]) * 60, unit='s'))
df['hours and minutes'] = df.parts.apply(lambda x: f"{0 if len(x) < 2 else int(x[0]):02}:{int(x[-1]):02}")
df = df.drop(columns=['duration', 'parts'])
print(df)
Input:
duration
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
Output:
timedelta hours and minutes
0 0 days 01:00:00 00:60
1 0 days 01:13:00 01:13
2 0 days 01:10:00 01:10
If we do this:
print(df.timedelta.dtypes)
... we see that the timedelta column indeed contains numerical values (of timedelta data type):
timedelta64[ns]
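As a quick illustration (a small addition, not part of the original snippet), that timedelta column can be used directly for arithmetic, for example to get each duration in whole minutes:
# Assumes the df built in the snippet above, with the 'timedelta' column present
df['minutes'] = df['timedelta'].dt.total_seconds().div(60).astype(int)
print(df)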

You could apply a lambda function on your duration column like this:
import pandas as pd
import datetime as dt

def transform(t):
    # '%I' parses the hour (1-12) and '%M' the minutes (00-59)
    if 'hr.' in t:
        return dt.datetime.strptime(t, '%I hr. %M min.').strftime('%I:%M')
    return dt.datetime.strptime(t, '%M min.').strftime('00:%M')

df = pd.DataFrame(['45 min.', '1 hr. 13 min.'], columns=['duration'])
print(df)
df['duration'] = df['duration'].apply(lambda x: transform(x))
print(df)
Outputs:
duration
0 45 min.
1 1 hr. 13 min.
and then
duration
0 00:45
1 01:13
Note that if you want "60 min." mapped to "00:60", you'll need some additional logic in the transform function, since the minutes directive %M only accepts values between 00 and 59.
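A minimal sketch of such additional logic (a hypothetical variant, not from the original answer): extract the numbers yourself instead of going through strptime, so minute values of 60 and above pass through unchanged:
import re

def transform_keep_60(t):
    # Hypothetical variant: pull the numbers out directly so '60 min.' stays '00:60'
    nums = re.findall(r'\d+', t)
    hours = int(nums[0]) if len(nums) == 2 else 0
    minutes = int(nums[-1])
    return f"{hours:02}:{minutes:02}"

print(transform_keep_60('60 min.'))        # 00:60
print(transform_keep_60('1 hr. 13 min.'))  # 01:13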

Related

From unix timestamps to relative date based on a condition from another column in pandas

I have a column of dates as unix timestamps and I need to convert them into relative times measured from the starting activity.
The final output should be column D, which expresses the time elapsed since the activity with index = 1; the relative time always refers back to the most recent activity with index = 1.
A index timestamp D
activity1 1 1.612946e+09 0
activity2 2 1.614255e+09 80 hours
activity3 1 1.612181e+09 0
activity4 2 1.613045e+09 50 hours
activity5 3 1.637668e+09 430 hours
Any idea?
Use to_datetime with unit='s', then create groups that start wherever index equals 1, take the first timestamp of each group with transform, subtract it, and convert the difference to hours:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
# A new group starts on every row where 'index' equals 1
s = df.groupby(df['index'].eq(1).cumsum())['timestamp'].transform('first')
df['D1'] = df['timestamp'].sub(s).dt.total_seconds().div(3600)
print(df)
A index timestamp D D1
0 activity1 1 2021-02-10 08:33:20 0 0.000000
1 activity2 2 2021-02-25 12:10:00 80 hours 363.611111
2 activity3 1 2021-02-01 12:03:20 0 0.000000
3 activity4 2 2021-02-11 12:03:20 50 hours 240.000000
4 activity5 3 2021-11-23 11:46:40 430 hours 7079.722222
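For reference, a self-contained sketch built from the sample rows above that shows what the df['index'].eq(1).cumsum() grouping key looks like:
import pandas as pd

df = pd.DataFrame({
    'A': ['activity1', 'activity2', 'activity3', 'activity4', 'activity5'],
    'index': [1, 2, 1, 2, 3],
    'timestamp': [1.612946e9, 1.614255e9, 1.612181e9, 1.613045e9, 1.637668e9],
})
# Rows are assigned to a new group each time 'index' equals 1
print(df['index'].eq(1).cumsum().tolist())  # [1, 1, 2, 2, 2]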

How to convert X min, Y sec string to timestamp

I have a dataframe with a duration column of strings in a format like:
index  duration
0      26 s
1      24 s
2      4 min, 37 s
3      7 s
4      1 min, 1 s
Is there a pandas or strftime() / strptime() way to convert the duration column to a min/sec timestamp?
I've attempted converting the strings this way, but I run into multiple scenarios after replacing the strings:
for row in df['index']:
    if "min, " in df['duration'][row]:
        df['duration'][row] = df['duration'][row].replace(' min, ', ':').replace(' s', '')
    else:
        pass
Thanks in advance
Try:
pd.to_timedelta(df['duration'])
Output:
0 0 days 00:00:26
1 0 days 00:00:24
2 0 days 00:04:37
3 0 days 00:00:07
4 0 days 00:01:01
Name: duration, dtype: timedelta64[ns]
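If a mm:ss style string is what you ultimately want, one possible follow-up (an assumption about the desired output format, not part of the original answer) is to format the resulting timedeltas yourself:
td = pd.to_timedelta(df['duration'])
secs = td.dt.total_seconds().astype(int)
# Zero-padded minutes:seconds, e.g. '4 min, 37 s' -> '04:37'
df['mm_ss'] = (secs // 60).map('{:02d}'.format) + ':' + (secs % 60).map('{:02d}'.format)
print(df)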

Creating columns in df with day and hour of week based on value

I am trying to create 2 columns based on a column that contains numerical values.
Value
0
4
10
24
null
49
Expected Output:
Value Day Hour
0 Sunday 12:00am
4 Sunday 4:00am
10 Sunday 10:00am
24 Monday 12:00am
null No Day No Time
49 Tuesday 1:00am
Continued.....
Code I am trying out:
Value = df.Value.unique()
Sunday_Starting_Point = pd.to_datetime('Sunday 2015')
(Sunday_Starting_Point + pd.to_timedelta(Value, 'h')).dt.strftime('%A %I:%M%p')
Thanks for looking!
I think the unique values are not necessary; you can use dt.strftime twice for the 2 columns, replacing the NaT values:
Sunday_Starting_Point = pd.to_datetime('Sunday 2015')
# Coerce the null entries to NaN so they become NaT after the timedelta addition
x = pd.to_numeric(df.Value, errors='coerce')
s = Sunday_Starting_Point + pd.to_timedelta(x, unit='h')
df['Day'] = s.dt.strftime('%A').replace('NaT', 'No Day')
df['Hour'] = s.dt.strftime('%I:%M%p').replace('NaT', 'No Time')
print(df)
Value Day Hour
0 0.0 Sunday 12:00AM
1 4.0 Sunday 04:00AM
2 10.0 Sunday 10:00AM
3 24.0 Monday 12:00AM
4 NaN No Day No Time
5 49.0 Tuesday 01:00AM
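An alternative sketch that skips the date anchor entirely and treats the values purely as hours counted from Sunday 00:00 (an assumption based on the expected output above; the list and column names are just for illustration):
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
df = pd.DataFrame({'Value': [0, 4, 10, 24, None, 49]})
x = pd.to_numeric(df['Value'], errors='coerce')

# Whole days since Sunday pick the weekday; the remainder is the hour of day
df['Day'] = (x // 24).map(lambda d: days[int(d) % 7] if pd.notna(d) else 'No Day')
df['Hour'] = (x % 24).map(lambda h: f"{int(h) % 12 or 12}:00{'AM' if h < 12 else 'PM'}" if pd.notna(h) else 'No Time')
print(df)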

Pandas How to group by month and year using dt

I am just wondering how to group by both year and month using pandas.series.dt.
The code below groups by just year, but how would I add a further level to group by month as well?
import pandas as pd

Data = {'Date': ['21.10.1999', '30.10.1999', '02.11.1999', '17.08.2000', '09.10.2001', '14.07.2000'],
        'X': [10, 20, 30, 40, 50, 60],
        'Y': [5, 10, 15, 20, 25, 30]}
df = pd.DataFrame(Data)
# Convert to pandas datetime
df['Date'] = pd.to_datetime(df['Date'])
# Obtain dataframe dtypes
print(df.dtypes)
print(df)
print(df.groupby(df['Date'].dt.year).sum())
You can pass Series.dt.year and Series.dt.month (renamed) to groupby; new columns are not necessary:
print(df.groupby([df['Date'].dt.year.rename('y'), df['Date'].dt.month.rename('m')]).sum())
X Y
y m
1999 2 30 15
10 30 15
2000 7 60 30
8 40 20
2001 9 50 25
Other solutions:
If you use DataFrame.resample or Grouper, all missing months in between are added as well (which may or may not be what you want):
print(df.resample('MS', on='Date').sum())
print(df.groupby(pd.Grouper(freq='MS', key='Date')).sum())
Or convert the datetimes to month periods with Series.dt.to_period:
print(df.groupby(df['Date'].dt.to_period('m')).sum())
X Y
Date
1999-02 30 15
1999-10 30 15
2000-07 60 30
2000-08 40 20
2001-09 50 25
df.assign(yr = df['Date'].dt.year, mnth = df['Date'].dt.month).groupby(['yr', 'mnth']).sum()
Out[1]:
X Y
yr mnth
1999 2 30 15
10 30 15
2000 7 60 30
8 40 20
2001 9 50 25
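A quick way to see the difference mentioned above (a small check, not from the original answers; the columns are selected explicitly so it behaves the same on recent pandas versions):
# Grouper/resample fill in the months that have no data; to_period keeps only the observed months
by_grouper = df.groupby(pd.Grouper(freq='MS', key='Date'))[['X', 'Y']].sum()
by_period = df.groupby(df['Date'].dt.to_period('m'))[['X', 'Y']].sum()
print(len(by_grouper), len(by_period))  # the Grouper result has extra, all-zero month rows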

Count String Values in Column across 30 Minute Time Bins using Pandas

I am looking to determine the count of string values in a column across a 3 month data sample. Samples were taken at random times throughout each day. I can group the data by hour, but I need the fidelity of 30 minute intervals (e.g. 0500-0530, 0530-0600) on roughly 10k rows of data.
An example of the data:
datetime stringvalues
2018-06-06 17:00 A
2018-06-07 17:30 B
2018-06-07 17:33 A
2018-06-08 19:00 B
2018-06-09 05:27 A
I have tried setting the datetime column as the index, but I cannot figure out how to group the data on anything other than 'hour', and I don't get the fidelity I need on the string value count:
df['datetime'] = pd.to_datetime(df['datetime'])
df.index = df['datetime']
df.groupby(df.index.hour).count()
Which returns an output similar to:
datetime stringvalues
datetime
5 0 0
6 2 2
7 5 5
8 1 1
...
I researched multi-indexing and resampling to some length the past two days but I have been unable to find a similar question. The desired result would look something like this:
datetime A B
0500 1 2
0530 3 5
0600 4 6
0630 2 0
....
There is no straightforward way to do a TimeGrouper on the time component, so we do this in two steps:
v = (df.groupby([pd.Grouper(key='datetime', freq='30min'), 'stringvalues'])
       .size()
       .unstack(fill_value=0))
v.groupby(v.index.time).sum()
stringvalues A B
05:00:00 1 0
17:00:00 1 0
17:30:00 1 1
19:00:00 0 1
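For reference, a self-contained version using the sample rows from the question (a minimal sketch; the variable names are just for illustration):
import pandas as pd

df = pd.DataFrame({
    'datetime': ['2018-06-06 17:00', '2018-06-07 17:30', '2018-06-07 17:33',
                 '2018-06-08 19:00', '2018-06-09 05:27'],
    'stringvalues': ['A', 'B', 'A', 'B', 'A'],
})
df['datetime'] = pd.to_datetime(df['datetime'])

# Step 1: count each string value per 30-minute calendar bin
v = (df.groupby([pd.Grouper(key='datetime', freq='30min'), 'stringvalues'])
       .size()
       .unstack(fill_value=0))

# Step 2: collapse the calendar bins onto the time of day
print(v.groupby(v.index.time).sum())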
