I've subtracted two datetimes from each other, like so:
df['Time Difference'] = df['Time 1'] - df['Time 2']
resulting in a timedelta object. I need the total number of minutes from this object, but I can't for the life of me figure it out. Currently, the "Time Difference" column looks like this:
1 0 days 00:01:00.000000000
2 0 days 00:04:00.000000000
3 0 days 00:03:00.000000000
4 0 days 00:01:00.000000000
5 0 days 00:03:00.000000000
I've tried dividing by a numpy timedelta (which seems to be the most common suggestion) as well as by pandas timedelta, as well as a few other things. Operations such as df['Time Difference'].seconds, or .seconds(), or .total_seconds, (all suggestions I've seen for this), all give errors. I'm really at a loss for what to do here. I need this in minutes in order to make graphs in matplotlib, and I'm kind of stuck until I figure this out, so any suggestions are very much appreciated. Thanks!
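Use dt.total_seconds() and divide by 60 to get the minutes. On a Series, the timedelta methods are reached through the .dt accessor, which is why calling .seconds or .total_seconds() directly on the column raises an error: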
import pandas as pd
df = pd.DataFrame({'td': pd.to_timedelta(['0 days 00:01:00.000000000',
'0 days 00:04:00.000000000',
'0 days 00:03:00.000000000',
'1 days 00:01:00.000000000',
'0 days 00:03:00.000000000'])})
df['delta_min'] = df['td'].dt.total_seconds() / 60
# df['delta_min']
# 0 1.0
# 1 4.0
# 2 3.0
# 3 1441.0
# 4 3.0
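Dividing by a one-minute Timedelta, the approach the question mentions trying, should give the same numbers. A minimal sketch reusing the sample values above (the column name delta_min_alt is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'td': pd.to_timedelta(['0 days 00:01:00',
                                          '0 days 00:04:00',
                                          '0 days 00:03:00',
                                          '1 days 00:01:00',
                                          '0 days 00:03:00'])})

# timedelta / timedelta yields plain floats, here in units of one minute
df['delta_min_alt'] = df['td'] / pd.Timedelta(minutes=1)

# note: .dt.seconds is not a substitute, because it only returns the
# seconds component (0 to 86399) and ignores whole days
minutes_list = df['delta_min_alt'].tolist()
print(minutes_list)
```

The resulting column is an ordinary float column, so it can be passed straight to matplotlib.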
Related
I'm trying to subtract two columns on a dataset which have string times in order to get a time value for statistical analysis.
Basically, TOC is start time and IA is end time.
Something is slightly wrong:
dfc = pd.DataFrame(zip(*[TOC,IA]),columns=['TOC','IA'])
print (dfc)
dfc.['TOC']= dfc.['TOC'].astype(dt.datetime)
dfc['TOC'] = pd.to_datetime(dfc['TOC'])
dfc['TOC'] = [time.time() for time in dfc['TOC']]
Convert the columns to datetime before subtracting:
>>> pd.to_datetime(dfc["IA"], format="%H:%M:%S")-pd.to_datetime(dfc["TOC"], format="%H:%M:%S")
0 0 days 00:08:07
1 0 days 00:15:29
2 0 days 00:11:14
3 0 days 00:27:50
dtype: timedelta64[ns]
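Since the strings hold only a time of day, they can also be parsed as durations since midnight with pd.to_timedelta. The sketch below uses made-up sample values, and the midnight handling is an assumption about the data (a negative difference is taken to mean the end time falls on the next day):

```python
import pandas as pd

# made-up sample values; TOC is the start time, IA the end time
dfc = pd.DataFrame({'TOC': ['12:01:10', '23:55:00'],
                    'IA':  ['12:09:17', '00:10:29']})

# 'HH:MM:SS' strings parse directly as durations since midnight
elapsed = pd.to_timedelta(dfc['IA']) - pd.to_timedelta(dfc['TOC'])

# assumption: a negative difference means the end time is on the next day
one_day = pd.Timedelta(days=1)
elapsed = elapsed.where(elapsed >= pd.Timedelta(0), elapsed + one_day)

# plain floats (minutes) are easier to feed into statistics functions
dfc['elapsed_min'] = elapsed.dt.total_seconds() / 60
print(dfc)
```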
I need to import a .xlsx sheet into pandas which has a column for the processing time of an associated activity. All entries in this column look somewhat like this:
01:20:34
12:22:30
25:01:02
155:20:56
This says how many hours, minutes and seconds were needed. When I use pd.read_excel, pandas correctly interprets each of the timestamps with less than 24 hours and reads them as above in the first two cases. The timestamps with more than 24 hours (the last two), on the other hand, are converted into a datetime object, which in turn looks like this: 1900-01-02T14:58:03 instead of 62:58:03.
Is there a simple solution?
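I think part of the problem lies in Excel rather than in Python/pandas. Excel stores dates as serial numbers counted from a base date: the number '1' represents '1900-01-01'. You can check this by writing '0' in a cell and formatting that cell as a date, which gives '1900-01-00', while '1' gives '1900-01-01'. A duration of more than 24 hours is stored as a number greater than 1, so it is read back as a date in January 1900.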
So, try to export your Excel file to a CSV file before importing to pandas and then import this way:
import pandas as pd
df1 = pd.read_csv('sample_data.csv')
In this case, you can get this DataFrame with the column Duration as a string (I added a column id for reference).
duration id
0 01:20:34 1
1 12:22:30 2
2 25:01:02 3
3 155:20:56 4
Then, for your purpose, I suggest that you do not try to convert those values to a datetime type, but to a timedelta. One strategy is to split the strings on the colons and then build a timedelta instance from the three fields: hours, minutes, and seconds.
import datetime as dt
def converter1(x):
    vals = x.split(':')
    vals = [int(val) for val in vals]
    out = dt.timedelta(hours=vals[0], minutes=vals[1], seconds=vals[2])
    return out
df1['deltat'] = df1['duration'].apply(converter1)
duration id deltat
0 01:20:34 1 0 days 01:20:34
1 12:22:30 2 0 days 12:22:30
2 25:01:02 3 1 days 01:01:02
3 155:20:56 4 6 days 11:20:56
If you need to convert those values to decimal hours or to other new fields, use the total_seconds() method of timedelta:
df1['deltat_hr'] = df1['deltat'].apply(lambda x: x.total_seconds()/3600)
duration id deltat deltat_hr
0 01:20:34 1 0 days 01:20:34 1.342778
1 12:22:30 2 0 days 12:22:30 12.375000
2 25:01:02 3 1 days 01:01:02 25.017222
3 155:20:56 4 6 days 11:20:56 155.348889
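The same conversion can be sketched without a Python-level function by splitting the whole column at once. This assumes every entry has exactly three colon-separated integer fields; the sample values are the four durations from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'duration': ['01:20:34', '12:22:30',
                                 '25:01:02', '155:20:56']})

# split 'H:M:S' into three integer columns (0 = hours, 1 = minutes, 2 = seconds)
parts = df1['duration'].str.split(':', expand=True).astype(int)

# build the timedelta from the total number of seconds
total_seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]
df1['deltat'] = pd.to_timedelta(total_seconds, unit='s')

# decimal hours, computed through the .dt accessor instead of apply()
df1['deltat_hr'] = df1['deltat'].dt.total_seconds() / 3600
print(df1)
```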
I am trying to convert a column df["time_ro_reply"]
which contains only days in decimal to a timedelta format where it contains days, hours, minutes. This makes it more human readable.
I am reading about pd.to_timedelta, but I am struggling implementing it:
pd.to_timedelta(df["time_to_reply"]) This returns me only 0.
Sample input:
df["time_ro_reply"]
1.881551
0.903264
2.931560
2.931560
Expected output:
df["time_ro_reply"]
1 days 19 hours 4 minutes
0 days 23 hours 2 minutes
2 days 2 hours 23 minutes
2 days 2 hours 23 minutes
I suggest using a custom function as follows:
import numpy as np
import pandas as pd
# creating the provided dataframe
df = pd.DataFrame([1.881551, 0.903264, 2.931560, 2.931560],
                  columns=["time_ro_reply"])
# this function converts a time as a decimal of days into the desired format
def convert_time(time):
    # calculate the days and remaining time
    days, remaining = divmod(time, 1)
    # calculate the hours and remaining time
    hours, remaining = divmod(remaining * 24, 1)
    # calculate the minutes
    minutes = divmod(remaining * 60, 1)[0]
    # a list of the strings, rounding the time values
    strings = [str(round(days)), 'days',
               str(round(hours)), 'hours',
               str(round(minutes)), 'minutes']
    # return the strings concatenated to a single string
    return ' '.join(strings)
# add a new column to the dataframe by applying the function
# to all values of the column 'time_ro_reply' using .apply()
df["desired_output"] = df["time_ro_reply"].apply(convert_time)
This yields the following dataframe:
time_ro_reply desired_output
0 1.881551 1 days 21 hours 9 minutes
1 0.903264 0 days 21 hours 40 minutes
2 2.931560 2 days 22 hours 21 minutes
3 2.931560 2 days 22 hours 21 minutes
However, this yields different outputs than the ones you described. If the 'time_ro_reply' values are indeed to be interpreted as pure decimals, I don't see how you got your expected results. Do you mind sharing how you got them?
I hope the comments explain the code well enough. If not and you are unfamiliar with syntax such as e.g. divmod(), apply(), I suggest looking them up in the Python / Pandas documentations.
Let me know if this helps.
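As for pd.to_timedelta itself: without a unit argument it treats plain numbers as nanoseconds, which is probably why the attempt in the question came back as (almost) zero. A minimal sketch, assuming the values are decimal days and using a made-up column name 'readable':

```python
import pandas as pd

df = pd.DataFrame({'time_ro_reply': [1.881551, 0.903264, 2.931560, 2.931560]})

# interpret the floats as days rather than the default nanoseconds
td = pd.to_timedelta(df['time_ro_reply'], unit='D')

# .dt.components splits each value into days, hours, minutes, ...
comp = td.dt.components
df['readable'] = (comp['days'].astype(str) + ' days '
                  + comp['hours'].astype(str) + ' hours '
                  + comp['minutes'].astype(str) + ' minutes')
print(df)
```

Keeping the timedelta column td around, rather than only the string, leaves the values usable for sorting and arithmetic.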
We have some ready available sales data for certain periods, like 1week, 1month...1year:
time_pillars = pd.Series(['1W', '1M', '3M', '1Y'])
sales = pd.Series([4.75, 5.00, 5.10, 5.75])
data = {'time_pillar': time_pillars, 'sales': sales}
df = pd.DataFrame(data)
I would like to do two operations.
Firstly, create a new column of date type, df['date'], that corresponds to the actual date of 1week, 1month..1year from now.
Then, I'd like to create another column df['days_from_now'], taking how many days are on these pillars (1week would be 7days, 1month would be around 30days..1year around 365days).
The goal is then to use any day as input for a simple linear_interpolation_method() to obtain sales data for that day (e.g., what are the sales for 4 October 2018? ---> We would interpolate between 3 months and 1 year).
Many thanks.
I'm not exactly sure what you mean regarding your interpolation, but here is a way to make your dataframe in pandas (starting from your original df you provided in your post):
from datetime import datetime
from dateutil.relativedelta import relativedelta
def create_dates(df):
    df['date'] = [i.date() for i in
                  [d+delt for d,delt in zip([datetime.now()] * 4 ,
                   [relativedelta(weeks=1), relativedelta(months=1),
                    relativedelta(months=3), relativedelta(years=1)])]]
    df['days_from_now'] = df['date'] - datetime.now().date()
    return df
create_dates(df)
sales time_pillar date days_from_now
0 4.75 1W 2018-04-11 7 days
1 5.00 1M 2018-05-04 30 days
2 5.10 3M 2018-07-04 91 days
3 5.75 1Y 2019-04-04 365 days
I wrapped it in a function, so that you can call it on any given day and get your results for 1 week, 1 month, etc. from that exact day.
Note: if you want your days_from_now to simply be an integer of the number of days, use df['days_from_now'] = [i.days for i in df['date'] - datetime.now().date()] in the function, instead of df['days_from_now'] = df['date'] - datetime.now().date()
Explanation:
df['date'] = [i.date() for i in
              [d+delt for d,delt in zip([datetime.now()] * 4 ,
               [relativedelta(weeks=1), relativedelta(months=1),
                relativedelta(months=3), relativedelta(years=1)])]]
Takes a list of the date today (datetime.now()) repeated 4 times, and adds a relativedelta (a time difference) of 1 week, 1 month, 3 months, and 1 year, respectively, extracts the date (i.date() for ...), finally creating a new column using the resulting list.
df['days_from_now'] = df['date'] - datetime.now().date()
is much more straightforward: it simply subtracts today's date from the new dates that you got above. The result is a timedelta object, which pandas conveniently formats as "n days".
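For the interpolation step mentioned in the question, one possible reading is a straight linear interpolation over days_from_now with numpy.interp. The sketch below is only that, a sketch: the offsets lookup table and the interpolate_sales helper are hypothetical names, pandas DateOffset is swapped in for relativedelta, and the reference date is pinned to 2018-04-04 so the numbers match the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time_pillar': ['1W', '1M', '3M', '1Y'],
                   'sales': [4.75, 5.00, 5.10, 5.75]})

# hypothetical lookup table from pillar label to calendar offset
offsets = {'1W': pd.DateOffset(weeks=1), '1M': pd.DateOffset(months=1),
           '3M': pd.DateOffset(months=3), '1Y': pd.DateOffset(years=1)}

# fixed reference date for reproducibility; in practice this could be
# pd.Timestamp.today().normalize()
today = pd.Timestamp('2018-04-04')
df['date'] = [today + offsets[p] for p in df['time_pillar']]
df['days_from_now'] = (df['date'] - today).dt.days

def interpolate_sales(day):
    """Linearly interpolate sales for a date between the pillars.

    np.interp clamps to the first/last sales value outside the pillar range.
    """
    x = (pd.Timestamp(day) - today).days
    return float(np.interp(x, df['days_from_now'].to_numpy(),
                           df['sales'].to_numpy()))

sales_oct = interpolate_sales('2018-10-04')
print(df)
print(sales_oct)
```

Here 4 October 2018 falls between the 3M and 1Y pillars, so the result lies between 5.10 and 5.75.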
This is my dataframe:
issue,time_taken
aa,2 days 20:00:00.95
bb,2 days 19:12:48.276000
I just want to convert the time_taken column format into hours. I only need the total number of hours.
For example, I have to display output like
issue,time_taken,time_taken_hours
aa,2 days 20:00:00.95,68
I think you can use:
import numpy as np
df['time_taken_hours'] = df['time_taken'] / np.timedelta64(1, 'h')
print(df)
issue time_taken time_taken_hours
0 aa 2 days 20:07:49.958000 68.130544
1 bb 2 days 19:12:13.383000 67.203717
See the frequency conversion section of the pandas time deltas documentation.
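If only the whole number of hours is wanted, as in the expected output of the question (68), the fractional part can be dropped afterwards. A minimal sketch, assuming time_taken is parsed from the strings shown in the question and that all durations are non-negative:

```python
import pandas as pd

df = pd.DataFrame({'issue': ['aa', 'bb'],
                   'time_taken': pd.to_timedelta(['2 days 20:00:00.95',
                                                  '2 days 19:12:48.276000'])})

# total hours as a float, then truncate to whole hours
hours = df['time_taken'].dt.total_seconds() / 3600
df['time_taken_hours'] = hours.astype(int)
print(df)
```

Note that astype(int) truncates toward zero, which only matches rounding down for non-negative durations.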