I have a column in a pandas dataframe that is created after subtracting two times. I now have a timedelta object like this -1 days +02:45:00. I just need to remove the -1 days and want it to be 02:45:00. Is there a way to do this?
I think you can subtract days converted to timedeltas:
td = pd.to_timedelta(['-1 days +02:45:00','1 days +02:45:00','0 days +02:45:00'])
df = pd.DataFrame({'td': td})
df['td'] = df['td'] - pd.to_timedelta(df['td'].dt.days, unit='d')
print (df.head())
td
0 02:45:00
1 02:45:00
2 02:45:00
print (type(df.loc[0, 'td']))
<class 'pandas._libs.tslibs.timedeltas.Timedelta'>
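Or convert the timedeltas to strings and extract the text between "days " and the first "." (this assumes the string form includes fractional seconds, which may depend on the pandas version):
df['td'] = df['td'].astype(str).str.extract(r'days (.*?)\.', expand=False)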
print (df.head())
td
0 +02:45:00
1 02:45:00
2 02:45:00
print (type(df.loc[0, 'td']))
<class 'str'>
I found this method easy; the others didn't work for me:
df['column'] = df['column'].astype(str).map(lambda x: x[7:])
It slices off the days part so that only the time part remains.
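A fixed slice like x[7:] depends on how wide the days prefix is, so it can leave a leading space or "+" behind. As a hedged alternative sketch (not one of the answers above), the time part can be rebuilt from the .dt.components accessor. The column name td and the helper name time_part are made up for illustration.

```python
import pandas as pd

def time_part(td_series):
    """Format the hours/minutes/seconds of each timedelta as HH:MM:SS, ignoring days."""
    comp = td_series.dt.components  # DataFrame with days, hours, minutes, seconds, ...
    return (
        comp["hours"].astype(str).str.zfill(2)
        + ":" + comp["minutes"].astype(str).str.zfill(2)
        + ":" + comp["seconds"].astype(str).str.zfill(2)
    )

# Hypothetical data: one negative timedelta and one with a two-digit day count
df = pd.DataFrame({"td": pd.to_timedelta(["-1 days +02:45:00", "10 days 02:45:00"])})
df["time_str"] = time_part(df["td"])
print(df["time_str"].tolist())
```

The result is a string column, so it is suited to display rather than further arithmetic.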
If your column is named time1, you can do it like this:
import pandas as pd
# convert each value to str and keep the trailing HH:MM:SS (this slice can be adjusted)
df['time1'] = pd.to_datetime(df['time1'].astype(str).str[-8:], format='%H:%M:%S')
df['time1'] = df['time1'].dt.time
This converts each timedelta to a string, slices the time part from it, converts that to a datetime, and extracts the time from it.
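I found a very easy solution for other people who may encounter this problem (it assumes the offset is exactly -1 day):
import datetime
if timedelta_obj.days < 0:
    # timedelta attributes are read-only, so rebind the name to a new object
    timedelta_obj = datetime.timedelta(
        seconds=timedelta_obj.total_seconds() + 3600*24)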
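For a whole pandas column, the same "add 24 hours when negative" idea could be sketched with Series.where. This is an illustration under the assumption that negative values are never below -1 day; the names td, zero, and fixed are made up.

```python
import pandas as pd

# Hypothetical column with one negative and one non-negative timedelta
td = pd.Series(pd.to_timedelta(["-1 days +02:45:00", "0 days 02:45:00"]))

zero = pd.Timedelta(0)
# Keep non-negative values as they are; shift negative ones forward by one day
fixed = td.where(td >= zero, td + pd.Timedelta(days=1))
print(fixed)
```

Unlike the string-based answers, the result keeps the timedelta64 dtype.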
Related
How to remove T00:00:00+05:30 after year, month and date values in pandas? I tried converting the column into datetime but also it's showing the same results, I'm using pandas in streamlit. I tried the below code
df['Date'] = pd.to_datetime(df['Date'])
The output is same as below :
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me how to remove T00:00:00+05:30 from the above rows?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
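Don't bother with applying Python date conversions or string manipulation. The former leaves you with an object-dtype column and the latter is slow. Just round to the day frequency using the library function. Note that round('D') goes to the nearest day, so a time after noon moves to the next date; dt.floor('D') always truncates instead.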
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.round('D')
0 2000-01-06
dtype: datetime64[ns]
If you have a timezone aware timestamp, convert to UTC with no time zone then round:
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_convert(None) \
.dt.round('D')
0 2019-07-01
dtype: datetime64[ns]
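To make the difference between rounding and truncating concrete, here is a small sketch comparing dt.round, dt.floor, and dt.normalize on made-up timestamps just before and just after noon. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical timestamps on the same day, one after noon and one before
s = pd.Series(pd.to_datetime(["2000-01-05 12:01", "2000-01-05 11:59"]))

rounded = s.dt.round("D")      # nearest midnight: 12:01 goes to the next day
floored = s.dt.floor("D")      # always the midnight at the start of the same day
normalized = s.dt.normalize()  # same effect as floor("D") for these values

print(pd.DataFrame({"rounded": rounded, "floored": floored, "normalized": normalized}))
```

All three results keep the datetime64 dtype, which is the reason for preferring them over .dt.date.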
Pandas offers .dt.date for datetime columns, but if the column still holds ISO strings you can also use .apply with datetime.datetime.fromisoformat to get date objects instead of strings:
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)
import pandas as pd
import datetime
dictt={'s_time': ["06:30:00", "07:30:00","16:30:00"], 'f_time': ["10:30:00", "23:30:00","23:30:00"]}
df=pd.DataFrame(dictt)
In this case I want to convert the times into datetime objects so I can use them later for calculations.
When I run df['s_time']=pd.to_datetime(df['s_time'],format='%H:%M:%S').dt.time
it gives this error:
time data '24:00:00' does not match format '%H:%M:%S' (match)
I don't know how to fix this.
"24:00:00" means "00:00:00"
If it's just "24:00:00" that's causing trouble, you can replace the "24:" prefix with "00:":
import pandas as pd
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# replace prefix "24:" with "00:"
df['time'] = df['time'].str.replace('^24:', '00:', regex=True)
# now to_datetime
df['time'] = pd.to_datetime(df['time'])
df['time']
0 2021-04-17 06:30:24
1 2021-04-17 07:24:00
2 2021-04-17 00:00:00
Name: time, dtype: datetime64[ns]
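If only time objects are needed (no date), the same "24 means 00" rule can be sketched with the standard library alone. The helper name parse_clock is hypothetical, and the sketch assumes every value has the form HH:MM:SS.

```python
import datetime

def parse_clock(text):
    """Parse 'HH:MM:SS', treating hour 24 as hour 0."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return datetime.time(hours % 24, minutes, seconds)

for value in ["06:30:24", "07:24:00", "24:00:00"]:
    print(value, "->", parse_clock(value))
```

On a DataFrame this could be applied with df['time'].map(parse_clock), which yields an object column of datetime.time values.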
1 to 24 hour clock (instead of 0 to 23)
If however your time notation goes from 1 to 24 hours (instead of 0 to 23), you can parse string to timedelta, subtract one hour and then cast to datetime:
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# to timedelta and subtract one hour
df['time'] = pd.to_timedelta(df['time']) - pd.Timedelta(hours=1)
# to string and then datetime:
df['time'] = pd.to_datetime(df['time'].astype(str).str.split(' ').str[-1])
df['time']
0 2021-04-17 05:30:24
1 2021-04-17 06:24:00
2 2021-04-17 23:00:00
Name: time, dtype: datetime64[ns]
Note: the underlying assumption here is that the date is irrelevant. If there also is a date, see the related question I linked in the comments section.
I have this column in pandas df:
'''
full_date
2020-12-02T08:11:30-0600
2020-12-02T02:11:50-0600
2020-12-03T08:56:29-0600
'''
I only need the date, hoping to have this column:
'''
date
2020-12-02
2020-12-02
2020-12-03
'''
I have tried to find the solution from previous questions, but still failed. If anyone can help, I will appreciate that a lot. thanks.
In case your column is not a datetime type, you can convert it to that and then use the .dt accessor to get just the date:
>>> df["date"] = df["full_date"].pipe(pd.to_datetime, utc=True).dt.date
>>> print(df)
full_date date
0 2020-12-02T08:11:30-0600 2020-12-02
1 2020-12-02T02:11:50-0600 2020-12-02
2 2020-12-03T08:56:29-0600 2020-12-03
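One thing to keep in mind with utc=True is that the date is taken after conversion to UTC, so a late-evening local time can land on the next calendar day. The sketch below shows this with the standard library on a made-up evening timestamp (not from the question's data).

```python
import datetime

stamp = "2020-12-02T20:11:30-0600"  # hypothetical value, 8 pm local time

# %z accepts offsets written without a colon, such as -0600
parsed = datetime.datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")

local_date = parsed.date()                                  # date on the local clock
utc_date = parsed.astimezone(datetime.timezone.utc).date()  # date after shifting to UTC

print(local_date, utc_date)
```

If the local calendar date is what matters and the strings are uniformly formatted, taking the first ten characters with df["full_date"].str[:10] keeps it unchanged.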
You can convert the datetime very easily using this python code, if suitable.
from dateutil.parser import parse
var = "2020-12-02T08:11:30-0600"
parseddate = parse(var).date()
i'm relatively new to Python
I have a column of data which represents time of the day - but in an integer format hhmm - i.e. 1230, 1559.
I understand that this should be converted to a correct time format so that it can be used correctly.
I've spent a while googling for an answer but I haven't found a definitive solution.
Thank you
If you need datetimes, to_datetime also attaches a date (1900-01-01 when only a time is parsed); for times only, add .dt.time.
Another solution is to convert the values to timedeltas, but that requires the HH:MM:SS format:
df = pd.DataFrame({'col':[1230,1559]})
df['date'] = pd.to_datetime(df['col'], format='%H%M')
df['time'] = pd.to_datetime(df['col'], format='%H%M').dt.time
s = df['col'].astype(str)
df['td'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
print (df)
col date time td
0 1230 1900-01-01 12:30:00 12:30:00 12:30:00
1 1559 1900-01-01 15:59:00 15:59:00 15:59:00
print (df.dtypes)
col int64
date datetime64[ns]
time object
td timedelta64[ns]
dtype: object
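The slicing s.str[:2] assumes every value has four digits. If the column can contain values such as 930 or 5 (an assumption, since the question only shows four-digit values), zero-padding first keeps the hour and minute positions fixed. A minimal sketch with made-up data:

```python
import datetime
import pandas as pd

# Hypothetical hhmm integers, including values with fewer than four digits
df = pd.DataFrame({"col": [930, 1230, 5]})

# Pad to four characters so that "930" becomes "0930" and "5" becomes "0005"
s = df["col"].astype(str).str.zfill(4)

df["time"] = pd.to_datetime(s, format="%H%M").dt.time
df["td"] = pd.to_timedelta(s.str[:2] + ":" + s.str[2:] + ":00")
print(df)
```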
I have a data frame that contains 2 columns, one is Date and other is float number.
I would like to add those 2 to get the following:
Index Date Days NewDate
0 20-04-2016 5 25-04-2016
1 16-03-2015 3.7 20-03-2015
As you can see if there is decimal it is converted as int as 3.1--> 4 (days).
I have some weird questions so I appreciate any help.
Thank you !
First, ensure that the Date column is a datetime object:
df['Date'] = pd.to_datetime(df['Date'])
Then, we can round the Days column up with np.ceil (this needs import numpy as np) and convert it to a pandas Timedelta:
temp = df['Days'].apply(np.ceil).apply(lambda x: pd.Timedelta(x, unit='D'))
Datetime objects and timedeltas can be added:
df['NewDate'] = df['Date'] + temp
You can convert the Days column to timedelta and add it to the Date column:
import numpy as np
import pandas as pd
# pd.np was removed in pandas 2.0, so call numpy directly
df['NewDate'] = pd.to_datetime(df.Date) + pd.to_timedelta(np.ceil(df.Days), unit="D")
df
Using combine for the two-column calculation and pd.DateOffset for adding days (np is numpy):
df['NewDate'] = df['Date'].combine(df['Days'], lambda x,y: x + pd.DateOffset(days=int(np.ceil(y))))
output:
Date Days NewDate
0 2016-04-20 5.0 2016-04-25
1 2016-03-16 3.7 2016-03-20
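For reference, here is a self-contained sketch of the ceil-then-add approach using the dates from the question. It assumes the strings are in day-month-year order, which is why an explicit format is passed.

```python
import numpy as np
import pandas as pd

# Data as given in the question (day-month-year strings, fractional days)
df = pd.DataFrame({"Date": ["20-04-2016", "16-03-2015"], "Days": [5, 3.7]})

# An explicit format avoids any day/month ambiguity
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Round fractional days up to whole days, then add them as timedeltas
df["NewDate"] = df["Date"] + pd.to_timedelta(np.ceil(df["Days"]), unit="D")
print(df)
```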