How to correctly format datetime - python

We are struggling with formatting datetime in Python 3, and we can't seem to figure it out on our own. So far, we have converted our dataframe column to datetime, so that it should be in the format '%Y-%m-%d %H:%M:%S':
before
02-01-2011 22:00:00
after
2011-01-02 22:00:00
For some very odd reason, when datetime is
13-01-2011 00:00:00
it is changed to this
2011-13-01 00:00:00
And from there it's mixing months with days and is therefore counting months instead of days.
This is all of our code for this datetime formatting:
df['local_date']=df['local_date'] + ':00'
df['local_date'] = pd.to_datetime(df.local_date)
df['local_date']=df['local_date'].dt.strftime('%Y-%m-%d %H:%M:%S')
UPDATED CODE WHICH WORKS:
df['local_date']=df['local_date'] + ':00'
df['local_date'] = pd.to_datetime(df.local_date.str.strip(), format='%d-%m-%Y %H:%M:%S')
df['local_date']=df['local_date'].dt.strftime('%Y-%m-%d %H:%M:%S')

Can't say for sure, but I believe this has to do with the warning mentioned in the documentation of to_datetime:
dayfirst : boolean, default False
Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).
I think the way to get around this is by explicitly passing a format string to to_datetime:
df['local_date'] = pd.to_datetime(df.local_date, format='%d-%m-%Y %H:%M:%S')
This way it won't accidentally mix months and days (but it will raise an error if any line has a different format)
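As a minimal sketch of the explicit-format approach (the two sample strings are taken from the question; the Series name `raw` and the slash-separated row are made up for illustration), the following shows that every row is read as day-month-year and that a row in a different layout is rejected rather than silently misread:

```python
import pandas as pd

# Sample values from the question, without seconds.
raw = pd.Series(["02-01-2011 22:00", "13-01-2011 00:00"])

# Explicit day-first format: every row is read as day-month-year.
parsed = pd.to_datetime(raw.str.strip() + ":00", format="%d-%m-%Y %H:%M:%S")
formatted = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted.tolist())

# A row in another layout (hypothetical) raises instead of being misread.
rejected = False
try:
    pd.to_datetime(pd.Series(["2011/01/13 00:00:00"]), format="%d-%m-%Y %H:%M:%S")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

If rows that do not match should become NaT instead of raising, `errors="coerce"` can be passed to `pd.to_datetime`.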

import pandas as pd
local_date = "13-01-2011 00:00"
local_date = local_date + ":00"
local_date = pd.to_datetime(local_date, format='%d-%m-%Y %H:%M:%S')
local_date = local_date.strftime('%Y-%m-%d %H:%M:%S')
print(local_date)
The output is:
2011-01-13 00:00:00

Related

extract date, month and year from string in python

I have this column where the string has date, month, year and also time information. I need to take the date, month and year only.
There is no space in the string.
The string is in this format:
date
Tuesday,August22022-03:30PMWIB
Monday,July252022-09:33PMWIB
Friday,January82022-09:33PMWIB
and I expect to get:
date
2022-08-02
2022-07-25
2022-01-08
How can I get the date, month and year only and change the format into yyyy-mm-dd in python?
thanks in advance
Use strptime from datetime library
from datetime import datetime
var = "Tuesday,August22022-03:30PMWIB"
date = var.split('-')[0]  # keep only the part before the time
formatted_date = datetime.strptime(date, "%A,%B%d%Y")
print(formatted_date.date()) #this will get your output
Output:
2022-08-02
You can use the standard datetime library
from datetime import datetime
dates = [
"Tuesday,August22022-03:30PMWIB",
"Monday,July252022-09:33PMWIB",
"Friday,January82022-09:33PMWIB"
]
for text in dates:
    text = text.split(",")[1].split("-")[0]
    dt = datetime.strptime(text, '%B%d%Y')
    print(dt.strftime("%Y-%m-%d"))
An alternative/shorter way would be like this (if you want the other date parts):
for text in dates:
    dt = datetime.strptime(text[:-3], '%A,%B%d%Y-%I:%M%p')
    print(dt.strftime("%Y-%m-%d"))
The timezone part is tricky and works only for UTC, GMT and local.
You can read more about the format codes here.
strptime() only accepts certain values for %Z:
any value in time.tzname for your machine’s locale
the hard-coded values UTC and GMT
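A small sketch of what this means in practice, assuming the strings from the question (the variant ending in UTC is invented for illustration): "UTC" is accepted by %Z on any machine, whereas "WIB" is only accepted where it appears in time.tzname, so the portable route is to strip the abbreviation before parsing.

```python
from datetime import datetime

fmt = "%A,%B%d%Y-%I:%M%p"

# "UTC" is one of the hard-coded names, so %Z matches it everywhere.
# (Hypothetical input: the question's data ends in WIB, not UTC.)
dt_utc = datetime.strptime("Tuesday,August22022-03:30PMUTC", fmt + "%Z")

# "WIB" is only recognised if it is in time.tzname on this machine,
# so drop the three-letter abbreviation and parse the rest.
wib_text = "Monday,July252022-09:33PMWIB"
dt_wib = datetime.strptime(wib_text[:-3], fmt)

print(dt_utc, dt_wib)
```

Both results are naive datetime objects; no offset is attached in either case.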
You can convert to datetime object then get string back.
from datetime import datetime
datetime_object = datetime.strptime('Tuesday,August22022-03:30PM', '%A,%B%d%Y-%I:%M%p')
s = datetime_object.strftime("%Y-%m-%d")
print(s)
You can use the datetime library to parse the date and print it in your format. In your examples the day might not be zero padded so I added that and then parsed the date.
import datetime
date = 'Tuesday,August22022-03:30PMWIB'
date = date.split('-')[0]
if not date[-6].isnumeric():
    date = date[:-5] + "0" + date[-5:]
newdate = datetime.datetime.strptime(date, '%A,%B%d%Y').strftime('%Y-%m-%d')
print(newdate)
# prints 2022-08-02
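Since the question is about a whole column, here is a sketch of applying the same strptime idea to a DataFrame. The column name `date` comes from the question; the helper name `to_iso` is made up for illustration.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    "date": [
        "Tuesday,August22022-03:30PMWIB",
        "Monday,July252022-09:33PMWIB",
        "Friday,January82022-09:33PMWIB",
    ]
})

def to_iso(text):
    # Keep the part before the time, e.g. "Tuesday,August22022".
    day_part = text.split("-")[0]
    return datetime.strptime(day_part, "%A,%B%d%Y").strftime("%Y-%m-%d")

df["date"] = df["date"].apply(to_iso)
print(df)
```

This relies on the year always having four digits, which lets strptime separate a one-digit or two-digit day from the year.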

CET timezone strings to datetime

date_cet col1
---------------------------------------
2021-10-31 02:00:00+02:00 7.0
2021-10-31 02:00:00+02:00 7.0
2021-10-31 02:00:00+02:00 8.0
2021-10-31 02:00:00+01:00 10.0
2021-10-31 02:00:00+01:00 11.0
I have a data frame that has columns looking similar to this. This data is imported from SQL into a Pandas data frame, and when I print out the dtypes I can see that the date_cet column is object. Since I need it further on, I want to convert it to a datetime object. However, the stuff I've tried just doesn't work, and I think it might have something to do with 1) the timezone difference and 2) the fact that this date is where DST changes (i.e. the +01:00 and +02:00).
I've tried to do stuff like this:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S %z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
and a handful of other things.
The first gives an error of:
ValueError: time data '2021-10-31 02:00:00+02:00' does not match format '%Y-%m-%d %H:%M:%S %z'
And the last:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Basically, I have no idea how to fix this. I just need this column to become a datetime[ns, Europe/Copenhagen] type of column, but everything I've done so far doesn't work.
In the datetime string ('2021-10-31 02:00:00+02:00') there is no space between the seconds and the offset, so the format must not have a space between %S and %z either.
Try changing the format to "%Y-%m-%d %H:%M:%S%z":
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'])
Update:
To fix the remaining error, add utc=True:
import datetime as dt
df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]]
df['new_date']= pd.to_datetime(df['date_cet'], utc=True)
You can do this in one line:
df['new_date']= pd.to_datetime(df['date_cet'], format="%Y-%m-%d %H:%M:%S%z", utc=True)
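The question asks for a column in Europe/Copenhagen time, while utc=True yields a UTC column. One possible follow-up step, sketched here with two rows of the sample data, is to convert the UTC result with `dt.tz_convert`. The intermediate name `as_utc` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date_cet": [
        "2021-10-31 02:00:00+02:00",
        "2021-10-31 02:00:00+01:00",
    ],
    "col1": [7.0, 10.0],
})

# Parse to UTC first so the mixed +02:00 / +01:00 offsets fit one dtype.
as_utc = pd.to_datetime(df["date_cet"], format="%Y-%m-%d %H:%M:%S%z", utc=True)

# Convert the UTC instants back to local Copenhagen time.
df["new_date"] = as_utc.dt.tz_convert("Europe/Copenhagen")
print(df["new_date"].dtype)
print(df["new_date"].tolist())
```

Going through UTC keeps the two 02:00 rows distinct across the DST change, because each one maps to a different UTC instant.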

How to format datetime in a dataframe the way I want?

I cannot find the correct format for this datetime. I have tried several formats, %Y/%m/%d%I:%M:%S%p is the closest format I can find for the example below.
df['datetime'] = '2019-11-13 16:28:05.779'
df['datetime'] = pd.to_datetime(df['datetime'], format="%Y/%m/%d%I:%M:%S%p")
Result:
ValueError: time data '2019-11-13 16:28:05.779' does not match format '%Y/%m/%d%I:%M:%S%p' (match)
Before guessing yourself have pandas make the first guess
df['datetime'] = pd.to_datetime(df['datetime'], infer_datetime_format=True)
0 2019-11-13 16:28:05.779
Name: datetime, dtype: datetime64[ns]
You can probably solve this with the parameter infer_datetime_format=True (deprecated since pandas 2.0, where this kind of inference is the default). Here's an example:
import pandas as pd
df = {}
df['datetime'] = '2019-11-13 16:28:05.779'
df['datetime'] = pd.to_datetime(df['datetime'], infer_datetime_format=True)
print(df['datetime'])
print(type(df['datetime']))
Output:
2019-11-13 16:28:05.779000
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
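Here is the pandas.to_datetime() call with the correct format string: pd.to_datetime(df['datetime'], format="%Y-%m-%d %H:%M:%S.%f")
Your format used slashes where the data has dashes, was missing the space between date and time, and had no .%f for the fractional seconds. %I is for 12-hour time (the example time you gave is 16:28, so you need %H), and %p is, to quote the docs, the "Locale's equivalent of either AM or PM", which your data does not contain.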
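A short sketch of an explicit format for the sample value from the question, using dashes, a space, 24-hour %H and .%f for the fractional seconds. The last line is only an illustration that %I and %p are useful when formatting output in 12-hour style; the variable name `label` is made up.

```python
import pandas as pd

df = pd.DataFrame({"datetime": ["2019-11-13 16:28:05.779"]})

# Dashes, a space, 24-hour %H and .%f for the fractional seconds.
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%d %H:%M:%S.%f")
print(df["datetime"].iloc[0])

# %I and %p belong in a 12-hour representation, e.g. when formatting output.
label = df["datetime"].dt.strftime("%Y/%m/%d %I:%M:%S %p").iloc[0]
print(label)
```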

ValueError: time data does not match format, optional milliseconds [duplicate]

Right now I have:
timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
This works great unless I'm converting a string that doesn't have the microseconds. How can I specify that the microseconds are optional (and should be considered 0 if they aren't in the string)?
You could use a try/except block:
try:
    timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
except ValueError:
    timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
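The try/except idea generalises to any number of accepted layouts. The following is a sketch only; the names `FORMATS` and `parse_optional_fraction` are hypothetical and not part of the original answer.

```python
from datetime import datetime

# Candidate layouts, tried in order.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def parse_optional_fraction(date_string):
    """Return a datetime for the first layout that matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp: {date_string!r}")

print(parse_optional_fraction("2021-06-08 14:36:09.5"))
print(parse_optional_fraction("2021-06-08 14:36:09"))
```

When the fraction is absent, the second layout matches and the microseconds default to 0, which is what the question asks for.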
What about just appending it if it doesn't exist?
if '.' not in date_string:
    date_string = date_string + '.0'
timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
I'm late to the party but I found if you don't care about the optional bits this will lop off the .%f for you.
date_string.split('.')[0]
I prefer using regex matches instead of try and except. This allows for many fallbacks of acceptable formats.
import re
from datetime import datetime

def parse_timestamp(date_string):
    # full timestamp with milliseconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%S.%fZ")
    # timestamp missing milliseconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%SZ")
    # timestamp missing milliseconds & seconds
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z", date_string)
    if match:
        return datetime.strptime(date_string, "%Y-%m-%dT%H:%MZ")
    # unknown timestamp format
    return None
Don't forget to import "re" as well as "datetime" for this method.
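datetime(*map(int, re.findall(r'\d+', date_string)))
can parse both '%Y-%m-%d %H:%M:%S.%f' and '%Y-%m-%d %H:%M:%S'. It is too permissive if your input is not filtered, and the fractional digits are passed straight through as the microsecond argument, so the result is only correct when the fraction has exactly six digits.
It is quick-and-dirty but sometimes strptime() is too slow. It can be used if you know that the input has the expected date format.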
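One thing to watch with the digit-extraction approach: the seventh number becomes the microsecond argument as-is, so ".5" would be read as 5 microseconds rather than half a second. A possible variant that pads the fraction to six digits is sketched below; the function name `parse_digits` is made up for illustration.

```python
import re
from datetime import datetime

def parse_digits(date_string):
    parts = re.findall(r"\d+", date_string)
    if len(parts) == 7:
        # Right-pad (or truncate) the fraction to six digits so that
        # ".5" means 500000 microseconds.
        parts[6] = parts[6][:6].ljust(6, "0")
    return datetime(*map(int, parts))

# Unpadded version for comparison: the fraction is taken literally.
naive = datetime(*map(int, re.findall(r"\d+", "2021-06-08 14:36:09.5")))

print(naive.microsecond)
print(parse_digits("2021-06-08 14:36:09.5").microsecond)
```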
If you are using Pandas you can also filter the Series by format, parse each part separately, and concatenate the results. The index is automatically joined.
import pandas as pd
# Every other row has a different format
df = pd.DataFrame({"datetime_string": ["21-06-08 14:36:09", "21-06-08 14:36:09.50", "21-06-08 14:36:10", "21-06-08 14:36:10.50"]})
df["datetime"] = pd.concat([
    pd.to_datetime(df["datetime_string"].iloc[1::2], format="%y-%m-%d %H:%M:%S.%f"),
    pd.to_datetime(df["datetime_string"].iloc[::2], format="%y-%m-%d %H:%M:%S"),
])
   datetime_string       datetime
-----------------------------------------------------
0  21-06-08 14:36:09     2021-06-08 14:36:09
1  21-06-08 14:36:09.50  2021-06-08 14:36:09.500000
2  21-06-08 14:36:10     2021-06-08 14:36:10
3  21-06-08 14:36:10.50  2021-06-08 14:36:10.500000
Using one regular expression and a list comprehension (this converts a time string to seconds and needs import re):
time_str = "12:34.567"
# time format is [HH:]MM:SS[.FFF]
sum([a*b for a,b in zip(map(lambda x: int(x) if x else 0, re.match(r"(?:(\d{2}):)?(\d{2}):(\d{2})(?:\.(\d{3}))?", time_str).groups()), [3600, 60, 1, 1/1000])])
# result = 754.567
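The same conversion can be written as a small function, which may be easier to read and lets unexpected input raise a clear error. This is a sketch; the names `TIME_RE` and `to_seconds` are hypothetical.

```python
import re

# time format is [HH:]MM:SS[.FFF]
TIME_RE = re.compile(r"(?:(\d{2}):)?(\d{2}):(\d{2})(?:\.(\d{3}))?")

def to_seconds(time_str):
    match = TIME_RE.fullmatch(time_str)
    if match is None:
        raise ValueError(f"unexpected time format: {time_str!r}")
    # Missing optional groups (hours, milliseconds) count as 0.
    hours, minutes, seconds, millis = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000

print(to_seconds("12:34.567"))
print(to_seconds("01:00:00"))
```

Unlike re.match, fullmatch also rejects trailing characters after the time.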
For a similar problem in jq, I used the following to sort my list by time properly:
|split("Z")[0]|split(".")[0]|strptime("%Y-%m-%dT%H:%M:%S")|mktime

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

I tried:
df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))
but got this error:
ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'
How to solve it correctly?
The reason why this does not work is because the %H parameter only accepts values in the range of 00 to 23 (both inclusive). This thus means that 24:00 is - like the error says - not a valid time string.
I therefore think we have few options other than converting the string to a valid format. We can do this by first replacing 24:00 with 00:00, and then incrementing the day for these timestamps.
Like:
from datetime import timedelta
import pandas as pd
df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)
The last line thus adds one day to the rows that contain 24:00, such that '10/11/2006 24:00' gets converted to '11/11/2006 00:00'. Note however that the above is rather unsafe, since whether it works depends on the format of the timestamp. For the above it will (probably) work, since there is only one colon. But if, for example, the datetimes have seconds as well, the filter could get triggered for 00:24:00, so it might require some extra work to get it working.
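One way to make the replacement stricter, sketched under the assumption that 24:00 only ever appears at the very end of the string: match " 24:00" at the end only, and build the one-day shift with pd.to_timedelta. The function name `parse_with_24h` and the last sample row are made up for illustration.

```python
import pandas as pd

def parse_with_24h(strings, fmt="%d/%m/%Y %H:%M"):
    # Only a trailing " 24:00" counts; something like "00:24:00" is left alone.
    is_24 = strings.str.endswith(" 24:00")
    cleaned = strings.str.replace(r" 24:00$", " 00:00", regex=True)
    parsed = pd.to_datetime(cleaned, format=fmt)
    # Rows that were 24:00 belong to the start of the following day.
    return parsed + pd.to_timedelta(is_24.astype(int), unit="D")

sample = pd.Series([
    "10/11/2006 24:00",
    "11/11/2006 00:00",
    "12/11/2006 15:00",
    "31/12/2006 24:00",
])
result = parse_with_24h(sample)
print(result)
```

Adding the day after parsing lets pandas handle month and year rollovers, as in the last row.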
Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.
Here's one way to approach the problem:
# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'
# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')
# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)
Here's some data to test:
dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})
Result after transformations described above:
print(df['datetime'])
0 2006-11-11 00:00:00
1 2006-11-11 00:00:00
2 2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]
As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23. 24:00 is then an error.
