Using Pandas 1.0.0, how can I change the time of a datetime dataframe column to midnight in one line of code?
e.g.:
from
START_DATETIME
2017-02-13 09:13:33
2017-03-11 23:11:35
2017-03-12 00:44:32
...
to
START_DATETIME
2017-02-13 00:00:00
2017-03-11 00:00:00
2017-03-12 00:00:00
...
My attempt:
df['START_DATETIME'] = df['START_DATETIME'].apply(lambda x: pd.Timestamp(x).replace(hour=0, minute=0, second=0))
but this produces
START_DATETIME
2017-02-13
2017-03-11
2017-03-12
...
Your method already sets the datetime values to midnight, i.e. their time component is 00:00:00. Pandas simply omits the time part from the display when every value in the column is at 00:00:00. After you assign the result back to START_DATETIME, printing a single cell shows the full timestamp:
print(df.loc[0, 'START_DATETIME'])
Output:
2017-02-13 00:00:00
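To check that the time part is really midnight and only hidden in the display, you can inspect the dtype and the values directly. This is a minimal sketch that assumes the three sample values from the question; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"START_DATETIME": pd.to_datetime(
    ["2017-02-13 09:13:33", "2017-03-11 23:11:35", "2017-03-12 00:44:32"])})

# The approach from the question: zero out hour, minute and second per element
df["START_DATETIME"] = df["START_DATETIME"].apply(
    lambda x: pd.Timestamp(x).replace(hour=0, minute=0, second=0))

# Still a datetime column, even though the repr only shows the date
is_datetime = pd.api.types.is_datetime64_any_dtype(df["START_DATETIME"])
# Every value equals its own midnight-normalized version
all_midnight = (df["START_DATETIME"] == df["START_DATETIME"].dt.normalize()).all()
# A single cell prints with its full time component
first_value = str(df.loc[0, "START_DATETIME"])

print(is_datetime, all_midnight, first_value)
```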
Besides, to set the time to 00:00:00 it is better to use the vectorized dt.normalize or dt.floor instead of apply:
df['START_DATETIME'] = pd.to_datetime(df['START_DATETIME']).dt.normalize()
or
df['START_DATETIME'] = pd.to_datetime(df['START_DATETIME']).dt.floor('D')
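As a quick sanity check that the two calls are interchangeable here, the sketch below runs both on the sample values from the question (timezone-naive data assumed) and compares the results.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(
    ["2017-02-13 09:13:33", "2017-03-11 23:11:35", "2017-03-12 00:44:32"]),
    name="START_DATETIME")

normalized = s.dt.normalize()   # set the time of each value to midnight
floored = s.dt.floor("D")       # round each value down to the start of its day

# For this timezone-naive sample the two results are identical
same = normalized.equals(floored)
as_text = normalized.dt.strftime("%Y-%m-%d %H:%M:%S").tolist()

print(same)
print(as_text)
```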
If you want to force pandas to show 00:00:00 in the series output, you need to convert START_DATETIME to str after flooring (note that the result then holds strings, not datetimes):
pd.to_datetime(df['START_DATETIME']).dt.floor('D').dt.strftime('%Y-%m-%d %H:%M:%S')
Out[513]:
0 2017-02-13 00:00:00
1 2017-03-11 00:00:00
2 2017-03-12 00:00:00
Name: START_DATETIME, dtype: object
You can do:
import pandas as pd
df=pd.DataFrame({"START_DATETIME":
["2017-02-13 09:13:33","2017-03-11 23:11:35","2017-03-12 00:44:32"]})
# Convert to datetime first, in case the column isn't already:
df["START_DATETIME"] = pd.to_datetime(df["START_DATETIME"])
# strftime returns strings, so START_DATETIME_DT holds text with a fixed 00:00:00 time
df["START_DATETIME_DT"] = df["START_DATETIME"].dt.strftime("%Y-%m-%d 00:00:00")
Outputs:
START_DATETIME START_DATETIME_DT
0 2017-02-13 09:13:33 2017-02-13 00:00:00
1 2017-03-11 23:11:35 2017-03-11 00:00:00
2 2017-03-12 00:44:32 2017-03-12 00:00:00
I've got some date and time data as a string that is formatted like this, in UTC:
,utc_date_and_time, api_calls
0,2022-10-20 00:00:00,12
1,2022-10-20 00:05:00,14
2,2022-10-20 00:10:00,17
Is there a way to create another column here that always represents that time, but so it is for London/Europe?
,utc_date_and_time, api_calls, london_date_and_time
0,2022-10-20 00:00:00,12,2022-10-20 01:00:00
1,2022-10-20 00:05:00,14,2022-10-20 01:05:00
2,2022-10-20 00:10:00,17,2022-10-20 01:10:00
I want to write some code that, for any time of the year, will display the time in London - but I'm worried that when the timezone changes in London/UK that my code will break.
With pandas, you'd convert to datetime, specify UTC and then call tz_convert:
df
Out[9]:
utc_date_and_time api_calls
0 2022-10-20 00:00:00 12
1 2022-10-20 00:05:00 14
2 2022-10-20 00:10:00 17
df["utc_date_and_time"] = pd.to_datetime(df["utc_date_and_time"], utc=True)
df["london_date_and_time"] = df["utc_date_and_time"].dt.tz_convert("Europe/London")
df
Out[12]:
utc_date_and_time api_calls london_date_and_time
0 2022-10-20 00:00:00+00:00 12 2022-10-20 01:00:00+01:00
1 2022-10-20 00:05:00+00:00 14 2022-10-20 01:05:00+01:00
2 2022-10-20 00:10:00+00:00 17 2022-10-20 01:10:00+01:00
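Regarding the worry about the clock change: tz_convert looks up the UTC offset for each timestamp individually, so winter and summer dates can live in the same column. A small sketch with two made-up timestamps (one in January, one in July) to illustrate this:

```python
import pandas as pd

# Hypothetical sample: one UTC timestamp in winter (GMT) and one in summer (BST)
df = pd.DataFrame({"utc_date_and_time": ["2022-01-20 00:00:00", "2022-07-20 00:00:00"]})

df["utc_date_and_time"] = pd.to_datetime(df["utc_date_and_time"], utc=True)
df["london_date_and_time"] = df["utc_date_and_time"].dt.tz_convert("Europe/London")

# The local hour differs because the offset is looked up per timestamp
london_hours = df["london_date_and_time"].dt.hour.tolist()
london_offsets = df["london_date_and_time"].dt.strftime("%z").tolist()

print(london_hours, london_offsets)
```

In this sample the January row stays at hour 0 and the July row moves to hour 1, without any special handling in the code.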
In vanilla Python >= 3.9, you'd let zoneinfo handle the conversion:
from datetime import datetime
from zoneinfo import ZoneInfo
t = "2022-10-20 00:00:00"
# to datetime, set UTC
dt = datetime.fromisoformat(t).replace(tzinfo=ZoneInfo("UTC"))
# to london time
dt_london = dt.astimezone(ZoneInfo("Europe/London"))
print(dt_london)
2022-10-20 01:00:00+01:00
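The same conversion also behaves sensibly on the day the clocks change. The sketch below wraps the steps above in a helper (the name utc_string_to_london is made up for this example) and feeds it two UTC times on 30 October 2022, the night British Summer Time ended.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_string_to_london(t: str) -> datetime:
    """Parse a naive ISO-like string as UTC and convert it to London time."""
    dt_utc = datetime.fromisoformat(t).replace(tzinfo=ZoneInfo("UTC"))
    return dt_utc.astimezone(ZoneInfo("Europe/London"))

# One hour apart in UTC, on either side of the switch from BST to GMT
before_switch = utc_string_to_london("2022-10-30 00:30:00")
after_switch = utc_string_to_london("2022-10-30 01:30:00")

print(before_switch.isoformat())
print(after_switch.isoformat())
```

Both results show the wall-clock time 01:30, but with different UTC offsets, so the two instants remain distinguishable.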
You should use the UTC timezone:
from datetime import datetime, timezone
datetime.now(timezone.utc).isoformat()
Outputs:
2022-10-25T15:27:08.874057+00:00
Trying to convert object type variable to datetime type
pd.to_datetime(df['Time'])
0 13:08:00
1 10:29:00
2 13:23:00
3 20:33:00
4 10:37:00
Error: <class 'datetime.time'> is not convertible to datetime
Please help: how can I convert this object column to datetime and merge it with a date variable?
What you have are datetime.time objects, as the error tells you. You can use their string representation and parse it to a pandas datetime or timedelta, depending on your needs. Here are three options, for example:
import datetime
import pandas as pd
df = pd.DataFrame({'Time': [datetime.time(13,8), datetime.time(10,29), datetime.time(13,23)]})
# 1)
# use string representation and parse to datetime:
pd.to_datetime(df['Time'].astype(str))
# 0 2022-01-19 13:08:00
# 1 2022-01-19 10:29:00
# 2 2022-01-19 13:23:00
# Name: Time, dtype: datetime64[ns]
# 2)
# add as timedelta to a certain date:
pd.Timestamp('2020-1-1') + pd.to_timedelta(df['Time'].astype(str))
# 0 2020-01-01 13:08:00
# 1 2020-01-01 10:29:00
# 2 2020-01-01 13:23:00
# Name: Time, dtype: datetime64[ns]
# 3)
# add the cumulated sum of the timedelta to a starting date:
pd.Timestamp('2020-1-1') + pd.to_timedelta(df['Time'].astype(str)).cumsum()
# 0 2020-01-01 13:08:00
# 1 2020-01-01 23:37:00
# 2 2020-01-02 13:00:00
# Name: Time, dtype: datetime64[ns]
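To merge the times with a date variable, as the question asks, option 2 can be applied per row. This is only a sketch: the 'Date' column and its values are assumptions, since the question does not show what the date variable looks like.

```python
import datetime
import pandas as pd

# Hypothetical frame: a 'Date' column next to the 'Time' column of datetime.time objects
df = pd.DataFrame({
    "Date": ["2022-01-19", "2022-01-20", "2022-01-21"],
    "Time": [datetime.time(13, 8), datetime.time(10, 29), datetime.time(13, 23)],
})

# Date part as datetime, time of day as a timedelta since midnight
date_part = pd.to_datetime(df["Date"])
time_part = pd.to_timedelta(df["Time"].astype(str))

# Adding the two gives one full timestamp per row
df["DateTime"] = date_part + time_part
combined = df["DateTime"].dt.strftime("%Y-%m-%d %H:%M:%S").tolist()

print(combined)
```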
df['col'] = df['col'].astype('datetime64[ns]')
This worked for me.
This is my first time posting a question here; if I don't explain it clearly, please give me a chance to improve it. Thank you!
I have a dataset that contains dates and times like this:
TIME COL1 COL2 COL3 ...
2018/12/31 23:50:23 34 DC 23
2018/12/31 23:50:23 32 NC 23
2018/12/31 23:50:19 12 AL 33
2018/12/31 23:50:19 56 CA 23
2018/12/31 23:50:19 98 CA 33
I want to create a new column formatted like '2018-12-31 11:00:00 PM' instead of '2018/12/31 23:10:23', with the time rounded to the hour (e.g. 17:40 rounded up to 6:00).
I have tried to use .dt.strftime("%Y-%m-%d %H:%M:%S") to change the format, but I got stuck when trying to convert the time from 24h to 12h.
Name: TIME, Length: 3195450, dtype: datetime64[ns]
I found out the type of df['TIME'] is pandas.core.series.Series
Now I have no idea about how to continue. Please give me some ideas, hints or any instructions. Thank you very much!
From your example it seems you want to floor to the hour, instead of round? In any case, first make sure your TIME column is of datetime dtype.
df['TIME'] = pd.to_datetime(df['TIME'])
Now floor (or round) using the dt accessor and an offset alias:
df['newTIME'] = df['TIME'].dt.floor('h') # could use round instead of floor here; older pandas code spells the alias 'H'
# df['newTIME']
# 0 2018-12-31 23:00:00
# 1 2018-12-31 23:00:00
# 2 2018-12-31 23:00:00
# 3 2018-12-31 23:00:00
# 4 2018-12-31 23:00:00
# Name: newTIME, dtype: datetime64[ns]
After that, you can format to a string in the desired format, again using the dt accessor to access properties of a datetime series:
df['timestring'] = df['newTIME'].dt.strftime("%Y-%m-%d %I:%M:%S %p")
# df['timestring']
# 0 2018-12-31 11:00:00 PM
# 1 2018-12-31 11:00:00 PM
# 2 2018-12-31 11:00:00 PM
# 3 2018-12-31 11:00:00 PM
# 4 2018-12-31 11:00:00 PM
# Name: timestring, dtype: object
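Since the question mixes flooring (23:50 to 11 PM) and rounding up (17:40 to 6:00), here is a small side-by-side sketch of dt.floor and dt.round. The two sample values are taken from the question text; the lowercase 'h' alias is used because newer pandas releases prefer it over 'H'.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2018/12/31 23:50:23", "2018/12/31 17:40:00"],
                             format="%Y/%m/%d %H:%M:%S"))

floored = s.dt.floor("h")   # always down to the start of the hour
rounded = s.dt.round("h")   # to the nearest hour, which can roll over into the next day

# 12-hour clock with AM/PM marker, as asked for in the question
fmt = "%Y-%m-%d %I:%M:%S %p"
print(floored.dt.strftime(fmt).tolist())
print(rounded.dt.strftime(fmt).tolist())
```

Note that rounding 23:50:23 lands on midnight of the following day, so the date part changes as well; whether that is wanted depends on the use case.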
I would like to subtract datetimes in pandas, but in steps of two rows (each pair of rows gives one difference). I don't know which function to use.
Timestamp
2020-11-26 20:00:00
2020-11-26 21:00:00
2020-11-26 22:00:00
2020-11-26 23:30:00
Explanation:
(2020-11-26 21:00:00) - (2020-11-26 20:00:00)
(2020-11-26 23:30:00) - (2020-11-26 22:00:00)
The result must be:
01:00:00
01:30:00
First, check that the column has a datetime dtype.
If not, convert it with pd.to_datetime():
demo = pd.DataFrame(columns=['Timestamps'])
demotime = ['20:00:00','21:00:00','22:00:00','23:30:00']
demo['Timestamps'] = demotime
demo['Timestamps'] = pd.to_datetime(demo['Timestamps'])
Your dataframe would look like:
Timestamps
0 2020-11-29 20:00:00
1 2020-11-29 21:00:00
2 2020-11-29 22:00:00
3 2020-11-29 23:30:00
After that you can use either a for loop or a while loop, stepping through the rows two at a time, and compute:
demo.iloc[i+1,0]-demo.iloc[i,0]
IIUC, you want to iterate over chunks of two rows and take the difference within each chunk. One approach is:
import numpy as np
res = df.groupby(np.arange(len(df)) // 2).diff().dropna()
print(res)
Output
Timestamp
1 0 days 01:00:00
3 0 days 01:30:00
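For reference, a self-contained version of the pairwise difference, using the four timestamps from the question. The slicing variant at the end is an alternative sketch that assumes an even number of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Timestamp": pd.to_datetime([
    "2020-11-26 20:00:00", "2020-11-26 21:00:00",
    "2020-11-26 22:00:00", "2020-11-26 23:30:00"])})

# Rows 0,1 get label 0 and rows 2,3 get label 1; diff is then taken within each pair
pair_id = np.arange(len(df)) // 2
res = df.groupby(pair_id)["Timestamp"].diff().dropna()

# Alternative without groupby: odd-positioned rows minus even-positioned rows
alt = df["Timestamp"].iloc[1::2].to_numpy() - df["Timestamp"].iloc[::2].to_numpy()

print(res.tolist())
```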
I have a data set with a column date like this:
cod date value
0 1O8 2015-01-01 00:00:00 2.1
1 1O8 2015-01-01 01:00:00 2.3
2 1O8 2015-01-01 02:00:00 3.5
3 1O8 2015-01-01 03:00:00 4.5
4 1O8 2015-01-01 04:00:00 4.4
5 1O8 2015-01-01 05:00:00 3.2
6 1O9 2015-01-01 00:00:00 1.4
7 1O9 2015-01-01 01:00:00 8.6
8 1O9 2015-01-01 02:00:00 3.3
10 1O9 2015-01-01 03:00:00 1.5
11 1O9 2015-01-01 04:00:00 2.4
12 1O9 2015-01-01 05:00:00 7.2
The dtype of the date column is object; to apply some functions later I need to change the column type to datetime. I tried different solutions, such as:
pd.to_datetime(df['date'], errors='raise', format ='%Y-%m-%d HH:mm:ss')
pd.to_datetime(df['date'], errors='coerce', format ='%Y-%m-%d HH:mm:ss')
df['date'].apply(pd.to_datetime, format ='%Y-%m-%d HH:mm:ss')
But the errors are always the same:
TypeError: Unrecognized value type: <class 'str'>
ValueError: Unknown string format
The strange thing is that if I apply the function to a sample of the data set, it works correctly, but if I apply it to the whole data set, it raises the error. There are no missing values in the data, and the dtype is the same for all values.
How can I fix this error?
There are three issues:
1. pd.to_datetime and pd.Series.apply don't work in place, so your solutions won't modify your series. Assign back after conversion.
2. Your third solution needs errors='coerce' to guarantee no errors.
3. For the time component you need strftime-style directives beginning with %, i.e. %H:%M:%S rather than HH:mm:ss.
So you can use:
df = pd.DataFrame({'date': ['2015-01-01 00:00:00', '2016-12-20 15:00:20',
'2017-08-05 00:05:00', '2018-05-11 00:10:00']})
df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d %H:%M:%S')
print(df)
date
0 2015-01-01 00:00:00
1 2016-12-20 15:00:20
2 2017-08-05 00:05:00
3 2018-05-11 00:10:00
In this particular instance, the format is standard and can be omitted:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
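To illustrate the format issue, the sketch below parses two sample strings once with the 'HH:mm:ss' pattern from the question and once with strptime-style directives. With errors='coerce', values that do not match the format become NaT instead of raising.

```python
import pandas as pd

dates = pd.Series(["2015-01-01 00:00:00", "2015-01-01 01:00:00"])

# 'HH:mm:ss' is not strptime syntax; it is matched as literal text, so nothing parses
wrong = pd.to_datetime(dates, errors="coerce", format="%Y-%m-%d HH:mm:ss")

# strptime directives: %H hour (24h clock), %M minute, %S second
right = pd.to_datetime(dates, errors="coerce", format="%Y-%m-%d %H:%M:%S")

print(wrong.isna().all(), right.notna().all())
```

This also shows why errors='coerce' should be used with care: a wrong format silently turns the whole column into NaT.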
I understand you read this data from a CSV file, for example:
df=pd.read_csv('c:/1/comptagevelo2012.csv', index_col=0, parse_dates=True)
To check:
print(df.index)
This works better than pd.to_datetime; I checked it:
> DatetimeIndex(['2012-01-01', '2012-02-01', '2012-03-01', '2012-04-01',
> '2012-05-01', '2012-06-01', '2012-07-01', '2012-08-01',
> '2012-09-01', '2012-10-01',
> ...
> '2012-12-22', '2012-12-23', '2012-12-24', '2012-12-25',
> '2012-12-26', '2012-12-27', '2012-12-28', '2012-12-29',
> '2012-12-30', '2012-12-31'],
> dtype='datetime64[ns]', length=366, freq=None)
The other method doesn't work for this file:
df=pd.read_csv('c:/1/comptagevelo2012.csv',index_col=0)
pd.to_datetime(df['Date'], errors='coerce', format ='%d/%m/%Y')
print(df.index)
Index(['01/01/2012', '02/01/2012', '03/01/2012', '04/01/2012', '05/01/2012',
'06/01/2012', '07/01/2012', '08/01/2012', '09/01/2012', '10/01/2012',
...
'22/12/2012', '23/12/2012', '24/12/2012', '25/12/2012', '26/12/2012',
'27/12/2012', '28/12/2012', '29/12/2012', '30/12/2012', '31/12/2012'],
dtype='object', length=366)
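One likely reason the second attempt leaves the index unchanged is that the result of pd.to_datetime is never assigned back, and with index_col=0 the dates sit in the index rather than in a column. It is also worth noting that the DatetimeIndex printed earlier starts 2012-01-01, 2012-02-01, 2012-03-01, which suggests that day-first strings such as 02/01/2012 were read month-first. A sketch with a made-up three-row stand-in for the CSV (the column names and values are assumptions):

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV: day-first dates in the first column
csv_text = "Date,count\n01/01/2012,35\n02/01/2012,83\n13/01/2012,10\n"

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# 'Date' is now the index, and to_datetime returns a new object,
# so the parsed result has to be assigned back to the index.
# The explicit day-first format makes 02/01/2012 mean 2 January.
df.index = pd.to_datetime(df.index, format="%d/%m/%Y")

print(df.index)
```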
source: https://keyrus-gitlab.ml/gfeuillen/keyrus-training/blob/5f0076e3c61ad64336efc9bc3fd862bfed53125c/docker/data/python/Exercises/02%20pandas/comptagevelo2012.csv