How to get a humanized time difference without external libraries - python

I have two aware datetime objects. How can I get the time difference between them in a human-readable format? By human-readable I mean something like 1 year 3 months 2 weeks 4 days 1 hour 2 minutes and 19 seconds.
However, if the time difference is shorter, it should give a shorter string such as 2 minutes and 3 seconds (it shouldn't say 0 years 0 months 0 weeks 0 days 0 hours 2 minutes and 3 seconds). If only seconds are left, it should be just 15 seconds.
A year is defined as 365.25 days.
A month is defined as 30.4375 days.
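One way to approach this with only the standard library is sketched below. It assumes the fixed unit lengths given above (365.25-day years, 30.4375-day months), ignores the sign of the difference, and drops fractions of a second. The names UNITS and humanize are hypothetical helpers chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# (name, length in seconds), largest first. Year and month use the fixed
# lengths from the question: 365.25 days and 30.4375 days.
UNITS = [
    ("year", 31_557_600),
    ("month", 2_629_800),
    ("week", 604_800),
    ("day", 86_400),
    ("hour", 3_600),
    ("minute", 60),
    ("second", 1),
]


def humanize(start, end):
    """Describe the gap between two aware datetimes, skipping zero units."""
    remaining = int(abs((end - start).total_seconds()))  # whole seconds only
    parts = []
    for name, length in UNITS:
        count, remaining = divmod(remaining, length)
        if count:
            parts.append(f"{count} {name}{'' if count == 1 else 's'}")
    if not parts:
        return "0 seconds"
    if len(parts) == 1:
        return parts[0]
    # Join with spaces and put "and" before the last unit, as in the question.
    return " ".join(parts[:-1]) + " and " + parts[-1]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(humanize(start, start + timedelta(minutes=2, seconds=3)))
print(humanize(start, start + timedelta(seconds=15)))
```

Both arguments must be aware (or both naive), otherwise the subtraction raises TypeError.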

Related

Calculating values from time series in pandas multi-indexed pivot tables

I've got a dataframe in pandas that stores the Id of a person, the quality of interaction, and the date of the interaction. A person can have multiple interactions across multiple dates, so to help visualise and plot this I converted it into a pivot table grouping first by Id then by date to analyse the pattern over time.
e.g.
import pandas as pd
df = pd.DataFrame({'Id': ['A4G8','A4G8','A4G8','P9N3','P9N3','P9N3','P9N3','C7R5','L4U7'],
                   'Date': ['2016-1-1','2016-1-15','2016-1-30','2017-2-12','2017-2-28','2017-3-10','2019-1-1','2018-6-1','2019-8-6'],
                   'Quality': [2,3,6,1,5,10,10,2,2]})
pt = df.pivot_table(values='Quality', index=['Id','Date'])
print(pt)
Leads to this:
Id    Date       Quality
A4G8  2016-1-1   2
      2016-1-15  4
      2016-1-30  6
P9N3  2017-2-12  1
      2017-2-28  5
      2017-3-10  10
      2019-1-1   10
C7R5  2018-6-1   2
L4U7  2019-8-6   2
However, I'd also like to...
Measure the time from the first interaction for each interaction per Id
Measure the time from the previous interaction with the same Id
So I'd get a table similar to the one below
Id    Date       Quality  Time From First  Time To Prev
A4G8  2016-1-1   2        0 days           NA days
      2016-1-15  4        14 days          14 days
      2016-1-30  6        29 days          14 days
P9N3  2017-2-12  1        0 days           NA days
      2017-2-28  5        15 days          15 days
      2017-3-10  10       24 days          9 days
The Id column is a string type, and I've converted the Date column to datetime and the Quality column to an integer.
The data set is rather large (>10,000 unique Ids), so for performance reasons I'm trying to avoid using for loops. I'm guessing the solution somehow uses pd.eval, but I'm stuck on how to apply it correctly.
Apologies, I'm a Python, pandas, and Stack Overflow noob and I haven't found the answer anywhere yet, so even some pointers on where to look would be great :-).
Many thanks in advance.
Convert the Date column to datetimes, then subtract the minimal datetime per group (obtained with GroupBy.transform) from the Date column. For the second new column, use DataFrameGroupBy.diff:
df['Date'] = pd.to_datetime(df['Date'])
df['Time From First'] = df['Date'].sub(df.groupby('Id')['Date'].transform('min'))
df['Time To Prev'] = df.groupby('Id')['Date'].diff()
print(df)
     Id       Date  Quality Time From First Time To Prev
0  A4G8 2016-01-01        2          0 days          NaT
1  A4G8 2016-01-15        3         14 days      14 days
2  A4G8 2016-01-30        6         29 days      15 days
3  P9N3 2017-02-12        1          0 days          NaT
4  P9N3 2017-02-28        5         16 days      16 days
5  P9N3 2017-03-10       10         26 days      10 days
6  P9N3 2019-01-01       10        688 days     662 days
7  C7R5 2018-06-01        2          0 days          NaT
8  L4U7 2019-08-06        2          0 days          NaT
df["Date"] = pd.to_datetime(df.Date)
# Attach each Id's first date as a new column, Date_first
df = df.merge(
    df.groupby(["Id"]).Date.first(),
    on="Id",
    how="left",
    suffixes=["", "_first"]
)
df["Time From First"] = df.Date - df.Date_first
df["Time To Prev"] = df.groupby("Id").Date.diff()
df.set_index(["Id", "Date"], inplace=True)
df

Can't convert Timedelta Object to numeric value in Pandas

I have a data frame whose deltas column has type string, and I want to convert that column into total hours.
deltas
0 2 days 12:19:00
1 04:45:00
2 3 days 06:41:00
3 5 days 01:55:00
4 13:57:00
Desired Output:
deltas
0 60 hours
1 4 hours
I tried pd.to_timedelta(), but I get the error "only leading negative signs are allowed" and I am totally stuck on this.
To get the number of hours as an int, run:
import numpy as np
import pandas as pd
(pd.to_timedelta(df['deltas']) / np.timedelta64(1, 'h')).astype(int)
The first step converts the string representation of each Timedelta to an actual Timedelta.
The result is then divided by 1 hour and converted to int, which truncates any fractional hour.
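The error in the question suggests that at least one string in the column cannot be parsed. A minimal sketch of one way to locate such rows is shown below, using errors="coerce" so that unparseable values become NaT instead of raising. The "unknown" entry is a made-up bad value added for illustration; the real offending strings are not shown in the question.

```python
import pandas as pd

# Sample values from the question plus one made-up unparseable entry.
df = pd.DataFrame({"deltas": ["2 days 12:19:00", "04:45:00", "3 days 06:41:00",
                              "5 days 01:55:00", "13:57:00", "unknown"]})

# Unparseable strings become NaT instead of raising an error.
parsed = pd.to_timedelta(df["deltas"], errors="coerce")

# Rows whose original string could not be converted.
bad_rows = df.loc[parsed.isna(), "deltas"]
print(bad_rows)

# Whole hours for the rows that did parse (fractional hours are truncated).
hours = (parsed.dropna() / pd.Timedelta(hours=1)).astype(int)
print(hours)
```

Inspecting bad_rows shows which inputs need cleaning before the conversion can run on the full column.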

Python Datetime round timedelta to nearest biggest unit

I've been making a forum as a learning experience. I have a timestamp for every post, which I convert to a timedelta (how much time ago it was). I want to output the time like so:
If it's < 1 minute display it in seconds
If it's >= 1 minute and < 1 hour display it in minutes
If it's >= 1 hour and < 1 day display it in hours
If it's >= 1 day and < 1 week display it in days
If it's >= 1 week and < 1 month display it in weeks
If it's >= 1 month and < 1 year display it in months
If it's >= 1 year display it in years
What is the best way to do this in python and datetime?
Use a third-party library. For example, readabledelta is a timedelta subclass that prints in a human-readable form.
>>> from readabledelta import readabledelta
>>> from datetime import timedelta
>>> print(readabledelta(timedelta(seconds=1)))
1 second
>>> print(readabledelta(timedelta(seconds=60)))
1 minute
>>> print(readabledelta(timedelta(seconds=60*60)))
1 hour
>>> print(readabledelta(timedelta(seconds=60*60*24)))
1 day
>>> print(readabledelta(timedelta(seconds=60*60*24*7)))
1 week
You cannot easily use months or years, because the length of those units is not well defined (a month can be 28 to 31 days, and a year 365 or 366 days).
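If approximate months and years are acceptable, the thresholds listed in the question can also be sketched with the standard library alone. The version below assumes average lengths of 30.4375 days per month and 365.25 days per year, which is a simplification rather than calendar-exact arithmetic, and it rounds down to whole units. UNIT_SECONDS and describe_age are hypothetical names.

```python
from datetime import timedelta

# (name, length in seconds), largest first. Month and year lengths are
# assumed averages (30.4375 and 365.25 days), not calendar-exact values.
UNIT_SECONDS = [
    ("year", 31_557_600),
    ("month", 2_629_800),
    ("week", 604_800),
    ("day", 86_400),
    ("hour", 3_600),
    ("minute", 60),
    ("second", 1),
]


def describe_age(delta):
    """Express a timedelta in the largest unit that fits at least once."""
    seconds = int(abs(delta.total_seconds()))
    for name, length in UNIT_SECONDS:
        if seconds >= length:
            count = seconds // length  # rounds down to a whole unit
            return f"{count} {name}{'' if count == 1 else 's'}"
    return "0 seconds"  # only reached when the delta is under one second


print(describe_age(timedelta(seconds=59)))
print(describe_age(timedelta(minutes=90)))
print(describe_age(timedelta(days=10)))
```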

"<=" not giving expected results in Python

I am teaching myself Python. I have gone through some tutorials and thought I'd write a little program for counting the candles for each of the respective 8 nights of Hanukkah.
days = 0
candles = 1
while days <= 8:
    days = days + 1
    candles = candles + 1
    print("Day", days, ":", candles, "Candles")
But the results for this (Python 3.4) are:
Day 1 : 2 Candles
Day 2 : 3 Candles
Day 3 : 4 Candles
Day 4 : 5 Candles
Day 5 : 6 Candles
Day 6 : 7 Candles
Day 7 : 8 Candles
Day 8 : 9 Candles
Day 9 : 10 Candles
Why didn't it stop at day 8?
Because days <= 8 is still true when the last iteration starts, and then you add one to days inside the loop. A while loop doesn't stop the moment the value changes; it finishes executing the block, then returns to the condition and checks whether it should keep going.
You are incrementing the value of the variable days after the test. When days is 8, you increment it to 9 and then you print it.
I would do something like this:
days = 1
candles = 2
while days <= 8:
    print("Day", days, ":", candles, "Candles")
    days = days + 1
    candles = candles + 1
If you increase your variables at the end you will get what you want.
days = 1
candles = 2
while days <= 8:
    print("Day", days, ":", candles, "Candles")
    days = days + 1
    candles = candles + 1
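As an alternative that is not taken from the answers above, a for loop over range avoids the manual increments entirely, so there is no question of whether the test runs before or after the update. This sketch assumes, as in the question's output, that each day has one more candle than the day number.

```python
lines = []
for day in range(1, 9):   # range(1, 9) yields 1 through 8
    candles = day + 1     # one more candle than the day number, as in the question
    lines.append(f"Day {day} : {candles} Candles")

print("\n".join(lines))
```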

datetime: subtracting date from itself yields 3288 days

I have a bunch of dates in a pandas dataframe, mostly observed for July of each year, of type datetime64[ns].
In [126]:
e6.To.head()
Out[122]:
14 1991-07-01
15 1992-07-01
16 1993-07-01
17 1994-07-01
18 1995-07-01
Name: To, dtype: datetime64[ns]
I ultimately want to store in a separate variable the rolling difference from one row to the next using shift(), but I found that subtracting dates produces odd results. Here, I subtract a series of dates from itself (reprinting the first five results). Some of them are, as expected, 0, but others are obviously not.
In [127]:
(e6.To-e6.To).head()
Out[127]:
1 0 days
1 -3288 days
1 3288 days
1 0 days
2 0 days
Name: To, dtype: timedelta64[ns]
If I take just the top five observations and then subtract, I do not get this result, and get all 0's as expected:
In [128]:
e6.To.head()-e6.To.head()
Out[119]:
14 0 days
15 0 days
16 0 days
17 0 days
18 0 days
Name: To, dtype: timedelta64[ns]
I can't reproduce it if I 'enter' the data directly, like so:
In [128]:
test=pd.DataFrame(data=['1991-07-01','1992-07-01','1993-07-01','1994-07-01','1995-07-01','1996-07-01'],columns=['date'])
test['date']=test['date'].astype('datetime64[ns]')
test.date - test.date
Out[128]:
0 0 days
1 0 days
2 0 days
3 0 days
4 0 days
5 0 days
Name: date, dtype: timedelta64[ns]
Any ideas what I am doing wrong here?
Not quite an answer, but I need some space to show something. My guess is that something weird is going on with indexing (I have no idea why, though). Note my comment about indexing above, and also note @ASGM's comment about the difference being very close to 9 years.
I'm using your code to create the sample data above, but adding a few years and sticking to the name 'e6' for the dataframe and 'To' for the variable in the event that matters (I really doubt it, but you know...).
In [10]: e6
Out[10]:
To
0 1991-07-01
1 1992-07-01
2 1993-07-01
3 1994-07-01
4 1995-07-01
5 1996-07-01
6 1997-07-01
7 1998-07-01
8 1999-07-01
9 2000-07-01
10 2001-07-01
11 2002-07-01
In [11]: e6.To - e6.To[9]
Out[11]:
0 -3288 days
1 -2922 days
2 -2557 days
3 -2192 days
4 -1827 days
5 -1461 days
6 -1096 days
7 -731 days
8 -366 days
9 0 days
10 365 days
11 730 days
Name: To, dtype: timedelta64[ns]
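To illustrate why indexing could matter here: pandas pairs rows by index label, not by position, when subtracting two Series whose indexes differ. The sketch below uses made-up labels to show that effect. It does not reproduce the asker's data, and whether their index really contains repeated or reordered labels is only a guess (checking e6.To.index.is_unique would be one way to test it).

```python
import pandas as pd

raw = ["1991-07-01", "1992-07-01", "2000-07-01"]

# Same dates in the same order, but the second Series has its labels reversed.
a = pd.to_datetime(pd.Series(raw, index=[0, 1, 2]))
b = pd.to_datetime(pd.Series(raw, index=[2, 1, 0]))

# Subtraction pairs rows by label: label 0 is 1991 in a but 2000 in b.
by_label = a - b
print(by_label)

# Subtracting the underlying arrays ignores labels and pairs rows by position.
by_position = pd.Series(a.to_numpy() - b.to_numpy())
print(by_position)
```

The label-based result contains the same -3288 and 3288 day gaps seen in the question, while the position-based result is all zeros.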
