Datetime string with whitespace, local date - python

I am trying to convert a German datetime string that comes from an MS Project Excel export:
02 Februar 2022 17:00
I read it from the Excel export of MS Project into a pandas DataFrame.
When I convert it with
pd.to_datetime(df["Anfang"], format='%d %B %Y %H:%M').dt.date
I get the error
ValueError: time data '07 Januar 2019 07:00' does not match format '%d %B %Y %H:%M' (match)
From https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior:
%B Month as locale’s full name. September
What am I doing wrong here?
Do I have to check some locale settings?
I am using German (Swiss):
import locale
locale.getdefaultlocale()
('de_CH', 'cp1252')
df in:
0 10 April 2019 08:00
1 07 Januar 2019 07:00
2 07 Januar 2019 07:00
3 07 Januar 2019 07:00
4 09 Oktober 2019 17:00
5 04 Dezember 2020 17:00
Name: Anfang, dtype: object
df out (wanted):
0 10-04-2019
1 07-01-2019
...
EDIT:
I changed my locale to ('de_DE', 'cp1252'), but I get the same error.
SOLVED:
By using matJ's answer, I got the error that "Die 15.06.21" did not match the format, which led me to investigate the data. There I found two different date formats (thanks, Microsoft!). After cleaning the data, the code above worked well!
So the error message from to_datetime wasn't as precise as the one from datetime.strptime.
Thanks for helping.
Johannes
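A minimal sketch of how such mixed formats can be located up front, assuming English month names so that it runs under the default C locale (the first and third sample values are made up; only "Die 15.06.21" comes from the post). With errors="coerce", pd.to_datetime turns every value that does not match the format into NaT instead of raising, so the offending rows can be listed directly:

```python
import pandas as pd

# Hypothetical sample: mostly "%d %B %Y %H:%M", plus one row in MS Project's short format
df = pd.DataFrame({"Anfang": ["07 January 2019 07:00", "Die 15.06.21", "09 October 2019 17:00"]})

# errors="coerce" turns every non-matching value into NaT instead of raising
parsed = pd.to_datetime(df["Anfang"], format="%d %B %Y %H:%M", errors="coerce")

# Rows that had a value but could not be parsed with the expected format
bad_rows = df[parsed.isna() & df["Anfang"].notna()]
print(bad_rows)
```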

One possible solution is to use the dateparser module:
import dateparser
df['Anfang'] = df['Anfang'].apply(dateparser.parse)
print(df)
Anfang
0 2019-04-10 08:00:00
1 2019-01-07 07:00:00
2 2019-01-07 07:00:00
3 2019-01-07 07:00:00
4 2019-10-09 17:00:00
5 2020-12-04 17:00:00
To keep only the date, add .dt.date:
import dateparser
df['Anfang'] = df['Anfang'].apply(dateparser.parse).dt.date
print(df)
Anfang
0 2019-04-10
1 2019-01-07
2 2019-01-07
3 2019-01-07
4 2019-10-09
5 2020-12-04
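The question asks for day-month-year output (10-04-2019) rather than date objects. Assuming the column has already been parsed into timestamps, as in the dateparser answer, a sketch using .dt.strftime (note that the result is a column of strings, not dates):

```python
import pandas as pd

# Already-parsed timestamps (e.g. the result of the dateparser answer)
anfang = pd.Series(pd.to_datetime(["2019-04-10 08:00", "2019-01-07 07:00"]), name="Anfang")

# Format as day-month-year strings; the result has object (string) dtype
anfang_str = anfang.dt.strftime("%d-%m-%Y")
print(anfang_str.tolist())  # ['10-04-2019', '07-01-2019']
```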

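I'd change the locale in a different way. A Python process keeps the LC_TIME setting (which %B uses) at the C locale until you call locale.setlocale(), so %B only matches English month names; locale.getdefaultlocale() merely reports the system default and changes nothing. Once the locale is set, your code should work.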
The following works for me:
import locale
from datetime import datetime
locale.setlocale(locale.LC_ALL, 'de_DE') # changing locale to german
datetime.strptime('07 Januar 2019 07:00', '%d %B %Y %H:%M') # returns a datetime obj which you can format as you like
Let me know if that works for you as well.
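locale.setlocale changes a process-wide setting, and %B depends only on the LC_TIME category. Below is a sketch of a hypothetical helper, time_locale, that switches only LC_TIME and restores the previous value afterwards. Which locale names are available (for example 'de_DE.UTF-8' or 'de_CH.UTF-8') depends on the operating system and on which locales are installed, so the runnable usage below sticks to "C":

```python
import locale
from contextlib import contextmanager
from datetime import datetime

@contextmanager
def time_locale(name):
    """Hypothetical helper: temporarily switch LC_TIME (month/day names), then restore it."""
    saved = locale.setlocale(locale.LC_TIME)  # calling without a name only queries the current setting
    try:
        locale.setlocale(locale.LC_TIME, name)  # raises locale.Error if the locale is not installed
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)

# With German data this would be time_locale("de_DE.UTF-8") and "07 Januar 2019 07:00",
# provided that locale is installed; "C" is used here so the example runs anywhere
with time_locale("C"):
    ts = datetime.strptime("07 January 2019 07:00", "%d %B %Y %H:%M")
print(ts)  # 2019-01-07 07:00:00
```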

Related

Parsing dates in pandas.to_datetime when date is 'DD-MMM' [duplicate]

I have a column in the following format:
Date
June 22
June 23
June 24
June 25
I am trying to convert this column to datetime within a pandas DataFrame, with the format YYYY-mm-dd.
How can I accomplish this? I was able to parse the date and convert it to mm-dd, but I'm not sure how to add the current year, since it's not present in my Date column.
df['Date'] = pd.to_datetime(df['Date'], format='%B %d')
Results:
Date
1900-07-22
1900-07-21
1900-07-20
1900-07-19
Desired results:
Date
2021-07-22
2021-07-21
2021-07-20
2021-07-19
Try:
>>> pd.to_datetime(df['Date'].add(' 2021'), format="%B %d %Y")
0 2021-06-22
1 2021-06-23
2 2021-06-24
3 2021-06-25
Name: Date, dtype: datetime64[ns]
As suggested by @HenryEcker, to use the current year instead of hard-coding 2021:
pd.to_datetime(df['Date'].add(f' {pd.Timestamp.now().year}'), format="%B %d %Y")
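A sketch wrapping that idea in a hypothetical helper, parse_month_day, with an optional year parameter so the result is reproducible in tests. Appending the year before parsing also means a value such as "February 29" is checked against the real year rather than against Python's default year 1900, which is not a leap year:

```python
import pandas as pd

def parse_month_day(dates, year=None):
    """Hypothetical helper: parse 'June 22'-style strings, attaching `year` (default: current year)."""
    if year is None:
        year = pd.Timestamp.now().year
    # Append the year before parsing, so the 1900 default is never involved
    return pd.to_datetime(dates + f" {year}", format="%B %d %Y")

df = pd.DataFrame({"Date": ["June 22", "June 23", "June 24", "June 25"]})
df["Date"] = parse_month_day(df["Date"], year=2021)
print(df)
```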

Convert date strings with Italian month names to %Y-%m-%d

I would like to convert the dates in a column (Before) into date format (After):
Before After
23 Ottobre 2020 2020-10-23
24 Ottobre 2020 2020-10-24
27 Ottobre 2020 2020-10-27
30 Ottobre 2020 2020-10-30
22 Luglio 2020 2020-07-22
I tried the following:
from datetime import datetime
date = df.Before.tolist()
dtObject = datetime.strptime(date,"%d %m, %y")
dtConverted = dtObject.strftime("%y-%m-%d")
But it does not work.
Can you explain how to do it?
Similar to this question, you can set the locale to Italian before parsing:
import pandas as pd
import locale
locale.setlocale(locale.LC_ALL, 'it_IT')
df = pd.DataFrame({'Before': ['30 Ottobre 2020', '22 Luglio 2020']})
df['After'] = pd.to_datetime(df['Before'], format='%d %B %Y')
# df
# Before After
# 0 30 Ottobre 2020 2020-10-30
# 1 22 Luglio 2020 2020-07-22
If you want the "After" column as dtype string, use df['After'].dt.strftime('%Y-%m-%d').
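If the it_IT locale is not installed (setlocale then raises locale.Error), a locale-independent alternative is to replace the month name with its number before parsing. This is a minimal sketch that assumes every value has the form "day month-name year"; the MESI dictionary is written out by hand here and is not part of any library:

```python
import pandas as pd

# Hand-written mapping of Italian month names (lower case) to month numbers
MESI = {
    "gennaio": "01", "febbraio": "02", "marzo": "03", "aprile": "04",
    "maggio": "05", "giugno": "06", "luglio": "07", "agosto": "08",
    "settembre": "09", "ottobre": "10", "novembre": "11", "dicembre": "12",
}

df = pd.DataFrame({"Before": ["23 Ottobre 2020", "22 Luglio 2020"]})

# Split "23 Ottobre 2020" into day, month name, year and swap the name for its number
parts = df["Before"].str.split(" ", expand=True)
numeric = parts[0] + " " + parts[1].str.lower().map(MESI) + " " + parts[2]

df["After"] = pd.to_datetime(numeric, format="%d %m %Y")
print(df)
```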

How to extract date from string in Python [duplicate]

This question already has answers here:
Convert string "Jun 1 2005 1:33PM" into datetime
(26 answers)
Closed 3 years ago.
I have strings in the following format:
Friday January 3 2020 16:40:57
Thursday January 2 2020 19:26:19
Sunday January 5 2020 01:24:55
Tuesday December 31 2019 17:31:42
What is the best way to convert them into python date and time?
You can use datetime.strptime:
from datetime import datetime
d = "Friday January 3 2020 16:40:57"
datetime_object = datetime.strptime(d, '%A %B %d %Y %H:%M:%S')
print(datetime_object)
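For a whole pandas column, the same directives can be passed to pd.to_datetime, which parses the Series in one call instead of looping. A sketch, assuming English weekday and month names (the default C locale):

```python
import pandas as pd

s = pd.Series([
    "Friday January 3 2020 16:40:57",
    "Thursday January 2 2020 19:26:19",
    "Sunday January 5 2020 01:24:55",
    "Tuesday December 31 2019 17:31:42",
])

# Same directives as the strptime call; %d also accepts a single-digit day such as "3"
parsed = pd.to_datetime(s, format="%A %B %d %Y %H:%M:%S")
print(parsed)
```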
Alternatively, you can use dateparser.
Install:
$ pip install dateparser
Sample Code:
import dateparser
t1 = 'Friday January 3 2020 16:40:57'
t2 = 'Thursday January 2 2020 19:26:19'
t3 = 'Sunday January 5 2020 01:24:55'
t4 = 'Tuesday December 31 2019 17:31:42'
dt1 = dateparser.parse(t1)
dt2 = dateparser.parse(t2)
dt3 = dateparser.parse(t3)
dt4 = dateparser.parse(t4)
for dt in [dt1, dt2, dt3, dt4]:
    print(dt)
Output:
2020-01-03 16:40:57
2020-01-02 19:26:19
2020-01-05 01:24:55
2019-12-31 17:31:42

error reading date time from csv using pandas

I am using pandas to read and process a CSV file. My CSV file has a date/time column that looks like this:
11:59:50:322 02 10 2015 -0400 EDT
11:11:55:051 16 10 2015 -0400 EDT
00:38:37:106 02 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
13:43:28:540 28 11 2015 -0500 EST
09:24:12:723 14 12 2015 -0500 EST
13:28:12:346 28 12 2015 -0500 EST
How can I read this using Python/pandas? So far I have this:
pd.to_datetime(pd.Series(df['senseStartTime']),format='%H:%M:%S:%f %d %m %Y %z %Z')
But this is not working, though previously I was able to use the same code for another format (with a different format specifier). Any suggestions?
The issue you're having is likely that versions of Python before 3.2 had poor time zone support: Python 2's strptime does not support the %z directive at all, so your format string fails on the %z and %Z parts. For example, in Python 2.7:
In [187]: import datetime
In [188]: datetime.datetime.strptime('11:59:50:322 02 10 2015 -0400 EDT', '%H:%M:%S:%f %d %m %Y %z %Z')
ValueError: 'z' is a bad directive in format '%H:%M:%S:%f %d %m %Y %z %Z'
You're using pd.to_datetime instead of datetime.datetime.strptime, but the underlying issues are the same; you can refer to this thread for help. Instead of using pd.to_datetime, I would suggest something like:
In [191]: import dateutil.parser
In [192]: dateutil.parser.parse('11:59:50.322 02 10 2015 -0400', dayfirst=True)
Out[192]: datetime.datetime(2015, 10, 2, 11, 59, 50, 322000, tzinfo=tzoffset(None, -14400))
It should be pretty simple to chop off the time zone abbreviation at the end (which is redundant since you have the offset) and change the ":" between the seconds and milliseconds to ".".
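The two cleanup steps just described can be sketched as follows. parse_sense_time is a hypothetical helper name, and dayfirst=True is added because the date part is day month year:

```python
import re
from dateutil import parser

def parse_sense_time(raw):
    """Hypothetical helper: '11:59:50:322 02 10 2015 -0400 EDT' -> tz-aware datetime."""
    # Drop the trailing zone abbreviation (EDT/EST); the numeric offset already carries that information
    without_abbr = raw.rsplit(" ", 1)[0]
    # Turn the ':' before the three millisecond digits into '.', so dateutil reads them as a fraction
    cleaned = re.sub(r":(\d{3}) ", r".\1 ", without_abbr, count=1)
    # The date part is day month year, so tell dateutil to read the day first
    return parser.parse(cleaned, dayfirst=True)

dt = parse_sense_time("11:59:50:322 02 10 2015 -0400 EDT")
print(dt)  # 2015-10-02 11:59:50.322000-04:00
```

For a DataFrame, the helper can be applied element-wise, e.g. df['senseStartTime'].map(parse_sense_time).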
Since datetime.timezone has become available in Python 3.2, you can use %z with .strptime() (see docs). Starting with:
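from datetime import datetime  # pd.datetime no longer exists in pandas 2.0
# %Z only matches UTC, GMT or the machine's local zone names (time.tzname), so EDT/EST parse only where they are local
dateparse = lambda x: datetime.strptime(x, '%H:%M:%S:%f %d %m %Y %z %Z')
df = pd.read_csv(path, parse_dates=['time_col'], date_parser=dateparse)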
to get:
time_col
0 2015-10-02 11:59:50.322000-04:00
1 2015-10-16 11:11:55.051000-04:00
2 2015-11-02 00:38:37.106000-05:00
3 2015-11-14 04:15:51.600000-05:00
4 2015-11-14 04:15:51.600000-05:00
5 2015-11-28 13:43:28.540000-05:00
6 2015-12-14 09:24:12.723000-05:00
7 2015-12-28 13:28:12.346000-05:00
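Alternatively, the column can be parsed with pd.to_datetime directly once the trailing abbreviation is removed, which avoids both the row-by-row lambda and the %Z restriction. A sketch, assuming every value ends in a numeric offset followed by exactly one abbreviation; utc=True converts the mixed -04:00/-05:00 offsets into a single UTC column:

```python
import pandas as pd

s = pd.Series([
    "11:59:50:322 02 10 2015 -0400 EDT",
    "00:38:37:106 02 11 2015 -0500 EST",
])

# Drop the trailing abbreviation; the numeric offset before it carries the same information
no_abbr = s.str.rsplit(" ", n=1).str[0]

# %z reads the numeric offset; utc=True gives one tz-aware column despite the mixed offsets
parsed = pd.to_datetime(no_abbr, format="%H:%M:%S:%f %d %m %Y %z", utc=True)
print(parsed)
```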
