I'm relatively new to Python.
I have a column of data which represents time of the day - but in an integer format hhmm - i.e. 1230, 1559.
I understand that this should be converted to a correct time format so that it can be used correctly.
I've spent a while googling for an answer but I haven't found a definitive solution.
Thank you
If you need datetimes, to_datetime also attaches a date (1900-01-01 by default); to get times only, add dt.time.
Another solution is to convert the values to timedeltas, which requires strings in HH:MM:SS format:
df = pd.DataFrame({'col':[1230,1559]})
df['date'] = pd.to_datetime(df['col'], format='%H%M')
df['time'] = pd.to_datetime(df['col'], format='%H%M').dt.time
s = df['col'].astype(str).str.zfill(4)  # pad e.g. 930 to '0930' so the slices below line up
df['td'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
print (df)
col date time td
0 1230 1900-01-01 12:30:00 12:30:00 12:30:00
1 1559 1900-01-01 15:59:00 15:59:00 15:59:00
print (df.dtypes)
col int64
date datetime64[ns]
time object
td timedelta64[ns]
dtype: object
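The string slicing above depends on every value having four digits. As a minimal sketch, assuming the column may also hold shorter integers such as 930 (09:30) or 5 (00:05), the hours and minutes can be split arithmetically instead. The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: 930 means 09:30 and 5 means 00:05.
df = pd.DataFrame({'col': [1230, 1559, 930, 5]})

# Split the integer into hours and minutes without going through strings.
hours = df['col'] // 100
minutes = df['col'] % 100
df['td'] = pd.to_timedelta(hours, unit='h') + pd.to_timedelta(minutes, unit='min')

# Alternative: pad to four digits, then parse with the same format as above.
padded = df['col'].astype(str).str.zfill(4)
df['time'] = pd.to_datetime(padded, format='%H%M').dt.time

print(df)
```

The timedelta column keeps a proper dtype, so it can be added directly to a date column later on.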
How do I remove T00:00:00+05:30 after the year, month and date values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is the same as below:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
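The three steps above can be put together as one runnable sketch. The column names day and pydate are only illustrative; normalize() is shown next to .dt.date because it keeps the column as a datetime dtype.

```python
import pandas as pd

df = pd.DataFrame(
    {'date': ["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]}
)

# Step 1: parse the ISO strings into timezone-aware timestamps.
df['date'] = pd.to_datetime(df['date'])

# Step 2: drop the UTC offset but keep the local wall-clock time.
naive = df['date'].dt.tz_localize(None)

# Step 3a: set the time to midnight; the column stays a datetime dtype.
df['day'] = naive.dt.normalize()

# Step 3b: Python date objects; the column becomes object dtype.
df['pydate'] = naive.dt.date

print(df[['day', 'pydate']])
print(df.dtypes)
```

Which of the two to keep depends on whether later code needs the .dt accessor (use day) or plain date objects (use pydate).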
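Don't bother with apply to Python dates or string changes. The former will leave you with an object type column and the latter is slow. Just round to the day frequency using the library function. Note that round('D') goes to the nearest day, so a time after noon moves to the next date; dt.floor('D') or dt.normalize() truncates to the same day instead.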
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.round('D')
0 2000-01-06
dtype: datetime64[ns]
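If you have a timezone-aware timestamp, convert to UTC with no time zone, then round. Note that the conversion to UTC can shift the calendar date; dt.tz_localize(None) drops the offset while keeping the local wall-clock time: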
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_convert(None) \
.dt.round('D')
0 2019-07-01
dtype: datetime64[ns]
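To make the difference between rounding and truncating concrete, here is a small sketch comparing round('D') with floor('D'), and tz_convert(None) with tz_localize(None). The sample timestamps are chosen for illustration only.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2000-01-05 11:59', '2000-01-05 12:01']))

# round('D') picks the nearest midnight, so 12:01 moves to the next day.
rounded = s.dt.round('D')
# floor('D') always truncates to the start of the same day.
floored = s.dt.floor('D')
print(pd.DataFrame({'original': s, 'rounded': rounded, 'floored': floored}))

aware = pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')])

# tz_convert(None) converts to UTC first: midnight +05:30 is 18:30 the day before.
via_utc = aware.dt.tz_convert(None).dt.floor('D')
# tz_localize(None) just drops the offset and keeps the local date.
via_local = aware.dt.tz_localize(None).dt.floor('D')
print(via_utc.iloc[0], via_local.iloc[0])
```

If the goal is simply "keep the date that is printed in the string", the tz_localize(None) plus floor (or normalize) combination is the one that preserves it.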
You can also build datetime.date objects straight from the ISO strings with .apply, if you want date objects instead of strings (for a column that is already datetime, .dt.date does the same):
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)
I have a column in a dataframe which has timestamps and their datatype is object (string):
data_log = pd.read_csv(DATA_LOG_PATH)
print(data_log['LocalTime'])
0 09:38:49
1 09:38:50
2 09:38:51
3 09:38:52
4 09:38:53
...
Name: LocalTime, Length: 872, dtype: object
Now I try to convert to datetime:
data_log['LocalTime'] = pd.to_datetime(data_log['LocalTime'], format='%H:%M:%S')
print(data_log['LocalTime'])
0 1900-01-01 09:38:49
1 1900-01-01 09:38:50
2 1900-01-01 09:38:51
3 1900-01-01 09:38:52
4 1900-01-01 09:38:53
...
Name: LocalTime, Length: 872, dtype: datetime64[ns]
How do I remove that date there? I just want the time in the format that I specified, but it adds the 1900-01-01 to every row.
You can get the time part of a datetime series with Series.dt.time
print(data_log['LocalTime'].dt.time)
This series will consist of Python standard library datetime.time objects.
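As a sketch of the options for a time-only column, the example below shows .dt.time next to two alternatives: formatted strings via .dt.strftime and durations via pd.to_timedelta. The two-row frame and the as_* column names are assumptions standing in for the CSV data.

```python
import pandas as pd

# Small stand-in for the CSV column in the question.
data_log = pd.DataFrame({'LocalTime': ['09:38:49', '09:38:50']})

parsed = pd.to_datetime(data_log['LocalTime'], format='%H:%M:%S')

# datetime.time objects (object dtype, no date attached).
data_log['as_time'] = parsed.dt.time
# Plain strings in the requested format.
data_log['as_text'] = parsed.dt.strftime('%H:%M:%S')
# Durations since midnight (timedelta dtype, supports arithmetic).
data_log['as_delta'] = pd.to_timedelta(data_log['LocalTime'])

print(data_log)
print(data_log.dtypes)
```

The timedelta variant is often the most convenient when the times need to be subtracted or added to a date later.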
Starting from the datetime column that carries the 1900-01-01 date, you can do it in different ways:
data_log['LocalTime'] = pd.Series([lt.time() for lt in data_log['LocalTime']], index=data_log.index)
or using a lambda function:
data_log['LocalTime'] = data_log.LocalTime.apply(lambda x: x.time())
To check the type of a specific column:
print(data_log['LocalTime'].dtypes)
You can use the to_datetime function from pandas:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
df['LocalTime'] = pd.to_datetime(df['timestamp'], unit='s')
where unit='s' defines the unit of the timestamp (seconds in this case).
To take time zones into account:
df['LocalTime'].dt.tz_localize('UTC').dt.tz_convert('Europe/Brussels')
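A minimal sketch of the epoch conversion, assuming the timestamp column holds seconds since 1970-01-01 UTC. The sample values and the utc and local column names are made up for illustration.

```python
import pandas as pd

# Hypothetical epoch values in seconds.
df = pd.DataFrame({'timestamp': [0, 1588068000]})

# utc=True marks the result as UTC instead of leaving it timezone-naive.
df['utc'] = pd.to_datetime(df['timestamp'], unit='s', utc=True)

# Convert the same instants to local time in Brussels.
df['local'] = df['utc'].dt.tz_convert('Europe/Brussels')

print(df)
```

Passing utc=True up front avoids the separate tz_localize('UTC') step.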
I have a data frame with a field time of timestamps with dates, and another column period. How can I add a number of days to time based on period?
Current Output:
time period
------------------------------
2020-04-28 10:00:00 1
2020-04-27 12:34:56 3
Expected Output
time
---------------
2020-04-29 10:00:00
2020-04-30 12:34:56
If I try df['time'] = df['time'] + pd.DateOffset(df['period']) I get the error TypeError: `n` argument must be an integer, got <class 'pandas.core.series.Series'>, because it is trying to pass the whole column into a function that expects an integer. How can this be accomplished?
Because days can be converted to timedeltas with to_timedelta, it is possible to use:
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='d')
print (df)
time period
0 2020-04-29 10:00:00 1
1 2020-04-30 12:34:56 3
But if you want to add months, it is necessary to use:
df['time'] = df['time'] + df['period'].apply(lambda x: pd.DateOffset(months=x))
print (df)
time period
0 2020-05-28 10:00:00 1
1 2020-07-27 12:34:56 3
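If you use month timedeltas instead, pandas works with an average ('default') month length, so the results drift from the calendar dates. Newer pandas releases reject unit='M' in to_timedelta for this reason, so the following may raise an error: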
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='M')
print (df)
time period
0 2020-05-28 20:29:06 1
1 2020-07-27 20:02:14 3
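For comparison, here is a self-contained sketch of the day and month variants applied to the same starting frame. The month version builds one DateOffset per row, which is calendar-aware; the plus_days and plus_months names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['2020-04-28 10:00:00', '2020-04-27 12:34:56']),
    'period': [1, 3],
})

# Days have a fixed length, so a vectorised timedelta works.
plus_days = df['time'] + pd.to_timedelta(df['period'], unit='D')

# Months vary in length, so apply one calendar-aware offset per row.
plus_months = pd.Series(
    [t + pd.DateOffset(months=int(n)) for t, n in zip(df['time'], df['period'])],
    index=df.index,
)

print(plus_days)
print(plus_months)

# DateOffset clips to the end of a shorter month instead of overflowing.
print(pd.Timestamp('2020-01-31') + pd.DateOffset(months=1))
```

The row-wise loop is slower than the vectorised day arithmetic, which is usually acceptable unless the frame is very large.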
I have a column in a pandas dataframe that is created after subtracting two times. I now have a timedelta object like this -1 days +02:45:00. I just need to remove the -1 days and want it to be 02:45:00. Is there a way to do this?
I think you can subtract days converted to timedeltas:
td = pd.to_timedelta(['-1 days +02:45:00','1 days +02:45:00','0 days +02:45:00'])
df = pd.DataFrame({'td': td})
df['td'] = df['td'] - pd.to_timedelta(df['td'].dt.days, unit='d')
print (df.head())
td
0 02:45:00
1 02:45:00
2 02:45:00
print (type(df.loc[0, 'td']))
<class 'pandas._libs.tslibs.timedeltas.Timedelta'>
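A variation on the same idea, sketched here as an alternative rather than the answer's own method: subtract the timedelta floored to whole days. The result stays a timedelta dtype and the sample values are the same three cases as above.

```python
import pandas as pd

td = pd.Series(pd.to_timedelta(
    ['-1 days +02:45:00', '1 days 02:45:00', '0 days 02:45:00']
))

# floor('D') rounds each value down to a whole number of days
# (-1 days, 1 days, 0 days); the difference is the time-of-day part.
time_of_day = td - td.dt.floor('D')

print(time_of_day)
```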
Or convert the timedeltas to strings and extract the text between days and .:
df['td'] = df['td'].astype(str).str.extract(r'days (.*?)\.', expand=False)
print (df.head())
td
0 +02:45:00
1 02:45:00
2 02:45:00
print (type(df.loc[0, 'td']))
<class 'str'>
I found this method easy; the others didn't work for me:
df['column'] = df['column'].astype(str).map(lambda x: x[7:])
It slices off the days part so you only get the time part.
If your column is named time1, you can do it like this:
import pandas as pd
import datetime as dt
# Convert each timedelta to text and keep the trailing HH:MM:SS (this slice can be adjusted)
df['time1'] = pd.to_datetime(df['time1'].astype(str).str[-8:], format='%H:%M:%S')
df['time1'] = df.time1.dt.time
This converts the timedelta to a string, slices the time part from it, converts it to datetime and extracts the time from that.
I found a very easy solution for other people who may encounter this problem:
if timedelta_obj.days < 0:
    # timedelta attributes are read-only, so rebind the name to a new object
    timedelta_obj = datetime.timedelta(
        seconds=timedelta_obj.total_seconds() + 3600*24)
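For plain datetime.timedelta objects, the same wrap-around can be sketched with the modulo operator instead of adding 24 hours by hand. Note that this is a different technique from the snippet above, and wrap_to_day is a hypothetical helper name.

```python
import datetime


def wrap_to_day(delta):
    """Shift a timedelta into the range [0, 24h) (hypothetical helper)."""
    one_day = datetime.timedelta(days=1)
    # For a positive divisor, % always returns a non-negative timedelta.
    return delta % one_day


negative = datetime.timedelta(days=-1, hours=2, minutes=45)
print(wrap_to_day(negative))
```

Unlike the single "+ 24 hours" correction, the modulo also handles values that are several days off in either direction.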
I have a dataset with 2 columns in a data frame: Date in YYYY-MM-DD format and another column, Hour, in the format 0100 (for 1am) until 2300 (for 11pm).
Date Hour
2017-01-01 0200
2017-01-01 0400
etc
In order to get it ready for Time series mode, I want to convert these into datetime objects and concatenate these columns. Example output desired: 2017-01-01 01:00:00, etc
I have tried df['Date'] = pd.to_datetime(df['Date']) and converted this into a datetime object, but I'm struggling with the Hour column. Please help.
This is one way. The trick is to note that pd.to_datetime is actually quite flexible: it accepts strings of the format "YYYY-MM-DD HHMM".
I assume here that your Hour is given as a string (otherwise leading zeros are not possible).
import pandas as pd
df = pd.DataFrame({'Date': ['2017-01-01', '2017-01-01'],
'Hour': ['0200', '0400']})
# as per #COLDSPEED's suggestion
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Hour'])
print(df)
# Date Hour DateTime
# 0 2017-01-01 0200 2017-01-01 02:00:00
# 1 2017-01-01 0400 2017-01-01 04:00:00
print(df.dtypes)
# Date object
# Hour object
# DateTime datetime64[ns]
# dtype: object
Previous version with pd.DataFrame.apply is possible but inefficient:
df['DateTime'] = df.apply(lambda x: x['Date'] + ' ' + x['Hour'], axis=1)\
.apply(pd.to_datetime)
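If the Hour column was read as integers, so the leading zeros are gone, a sketch along these lines restores them before concatenating. The integer sample values are an assumption, and the explicit format avoids relying on pandas guessing the layout.

```python
import pandas as pd

# Hypothetical case: Hour was parsed as integers, so 0200 became 200.
df = pd.DataFrame({'Date': ['2017-01-01', '2017-01-01'],
                   'Hour': [200, 1400]})

# Restore the four-digit HHMM text, then parse date and time together.
hour_text = df['Hour'].astype(str).str.zfill(4)
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + hour_text,
                                format='%Y-%m-%d %H%M')

print(df)
```

Reading the CSV with dtype={'Hour': str} would avoid the problem at the source, assuming the file itself still contains the leading zeros.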