Converting legacy string dates to dates - python

We have some legacy string dates that I need to convert to actual dates that can be used to perform some date logic. Converting to a date object wouldn't be a problem if I knew what the format was! That is, some people wrote 'dd month yy', others 'mon d, yyyy', etc.
So, I was wondering if anybody knew of a py module that attempts to guess date formats and rewrites them in a uniform way?
Any other suggestions?

Using dateutil:
In [25]: import dateutil.parser as parser
In [26]: parser.parse('25 December 2010')
Out[26]: datetime.datetime(2010, 12, 25, 0, 0)
In [27]: parser.parse('Dec 25, 2010')
Out[27]: datetime.datetime(2010, 12, 25, 0, 0)

The typical approach is to define a list of formats (strptime formats, specifically), and try them in turn, until one works. As an example, see this recipe:
http://code.activestate.com/recipes/577135-parse-a-datetime-string-to-a-datetime-instance/
strptime accepts quite a number of formats. If you can enumerate all the possibilities you'll see, you should be able to hack together a variant of that recipe that'll do what you want.
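A minimal sketch of that "try each format in turn" idea, using only the standard library. The format list and the function name parse_legacy_date are assumptions made for illustration; they are not taken from the linked recipe, and a real list would have to cover whatever the legacy data actually contains.

```python
from datetime import datetime

# Assumed formats for illustration; extend this list to match the legacy data.
CANDIDATE_FORMATS = [
    "%d %B %Y",   # 25 December 2010
    "%d %B %y",   # 25 December 10
    "%b %d, %Y",  # Dec 25, 2010
    "%Y-%m-%d",   # 2010-12-25
]


def parse_legacy_date(text, formats=CANDIDATE_FORMATS):
    """Return a datetime for the first format that matches, else raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"no known format matches {text!r}")


print(parse_legacy_date("25 December 2010").date())  # 2010-12-25
print(parse_legacy_date("Dec 25, 2010").date())      # 2010-12-25
```

The order of the list matters when a string could match more than one format, so put the formats you trust most first.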

Related

Convert date into a specific format in Python irrespective of the input format

I have a problem statement to convert the dates into a specific format. However, the input can be of any format.
For example,
Input         Desired_output
2020/11/20    2020-11-20
20201120      2020-11-20
20202011      2020-11-20
11/20/2020    2020-11-20
202020Nov     2020-11-20
I'm able to handle the cases where there is a delimiter between the year, month and day using the dateutil package, but it cannot handle the cases where there is no clear delimiter.
Is there any way to solve this problem?
I don't think there is an easy way to universally parse dates that don't follow the standard formats. You'll have to specify the format in which the date is written for it to be parsed, which you can easily do using strptime (see here for a reference on datetime formatting).
A package like dateutil can be helpful as well. You should take a look, however, at the accepted date formats. In general, dateutil is more convenient, but strptime gives you more control and ability to parse more date formats.
Four of the five examples you mentioned can be parsed using dateutil. Two of them work with the default settings:
from dateutil import parser
parser.parse("2020/11/20") # Output: datetime.datetime(2020, 11, 20, 0, 0)
parser.parse("11/20/2020") # Output: datetime.datetime(2020, 11, 20, 0, 0)
For the two other examples, the dayfirst and yearfirst arguments are needed to let dateutil parse them correctly:
parser.parse("20201120", yearfirst=True)
parser.parse("20202011", yearfirst=True, dayfirst=True)
# Output for both: datetime.datetime(2020, 11, 20, 0, 0)
Finally, I'm not aware of a way to parse the date "202020Nov" using dateutil; however, it can be parsed using strptime as follows:
from datetime import datetime
datetime.strptime("202020Nov", "%Y%d%b")
# Output: datetime.datetime(2020, 11, 20, 0, 0)
All the best.

Identify Format of date in python

How do I get the date format for the given date input in python?
Note:
The input is given by the user and is not in a predefined format. They may give any kind of input format.
The example below works for the ddmmyyyy format, but that is not my case: the date format is not predefined.
datetime.datetime.strptime('24052010', "%d%m%Y").date()
Expected:
Input 1: 21-02-2019, Output: DD-MM-YYYY
Input 2: 02/21/2019, Output: MM/DD/YYYY
I don't think such a function is possible, because some dates (for example 01/01/2019) cannot be interpreted in only one way: this can be both MM/DD/YYYY and DD/MM/YYYY. So you can only check whether the date is in a given format or not (you can use the answers to this question: How do I validate a date string format in python?).
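To make the ambiguity concrete, here is a rough sketch that reports every candidate format a string is valid in. The CANDIDATES table and the matching_formats name are assumptions for illustration; only the four layouts listed are checked.

```python
from datetime import datetime

# Assumed candidate formats, labelled the way the question writes them.
CANDIDATES = {
    "DD-MM-YYYY": "%d-%m-%Y",
    "MM-DD-YYYY": "%m-%d-%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "MM/DD/YYYY": "%m/%d/%Y",
}


def matching_formats(text):
    """Return the labels of every candidate format that parses text."""
    matches = []
    for label, fmt in CANDIDATES.items():
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # text is not valid in this format
        matches.append(label)
    return matches


print(matching_formats("02/21/2019"))  # ['MM/DD/YYYY']
print(matching_formats("21-02-2019"))  # ['DD-MM-YYYY']
print(matching_formats("01/01/2019"))  # ['DD/MM/YYYY', 'MM/DD/YYYY']
```

When more than one label comes back, the input alone cannot tell you which format the user meant.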
You can use the module dateutil. It has a robust parser that will try to make sense of any date.
>>> from dateutil import parser
>>> parser.parse("21-02-2019")
datetime.datetime(2019, 2, 21, 0, 0)
>>> parser.parse("02/21/2019")
datetime.datetime(2019, 2, 21, 0, 0)
This isn't exactly what you wanted: you get the date not the format. But if you have the date, do you really need the format?
To meet J Kluseczka's point about some dates being ambiguous (like "01/10/2019") you can specify your assumption:
>>> parser.parse("01/10/2019")
datetime.datetime(2019, 1, 10, 0, 0)
>>> parser.parse("01/10/2019",dayfirst=True)
datetime.datetime(2019, 10, 1, 0, 0)
dateutil isn't part of the standard library but it is well worth the trouble of downloading.

Getting today's date in YYYY-MM-DD in Python?

Is there a nicer way than the following to return today's date in the YYYY-MM-DD format?
str(datetime.datetime.today()).split()[0]
Use strftime:
>>> from datetime import datetime
>>> datetime.today().strftime('%Y-%m-%d')
'2021-01-26'
To also include a zero-padded Hour:Minute:Second at the end:
>>> datetime.today().strftime('%Y-%m-%d %H:%M:%S')
'2021-01-26 16:50:03'
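To get the UTC date and time (datetime.utcnow() is deprecated since Python 3.12, so use a timezone-aware now() instead):
>>> from datetime import timezone
>>> datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
'2021-01-27 00:50:03'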
You can use datetime.date.today() and convert the resulting datetime.date object to a string:
from datetime import date
today = str(date.today())
print(today) # '2017-12-26'
I always use the isoformat() method for this.
from datetime import date
today = date.today().isoformat()
print(today) # '2018-12-05'
Note that this also works on datetime objects if you need the time in the standard ISO 8601 format as well.
from datetime import datetime
now = datetime.today().isoformat()
print(now) # '2018-12-05T11:15:55.126382'
Very late answer, but you can simply use:
import time
today = time.strftime("%Y-%m-%d")
# 2023-02-08
Datetime is just lovely if you like remembering funny codes. Wouldn't you prefer simplicity?
>>> import arrow
>>> arrow.now().format('YYYY-MM-DD')
'2017-02-17'
This module is clever enough to understand what you mean.
Just do pip install arrow.
Addendum: In answer to those who become exercised over this answer let me just say that arrow represents one of the alternative approaches to dealing with dates in Python. That's mostly what I meant to suggest.
Are you working with Pandas?
You can use pd.to_datetime from the pandas library. Here are various options, depending on what you want returned.
import pandas as pd
pd.to_datetime('today') # pd.to_datetime('now')
# Timestamp('2019-03-27 00:00:10.958567')
As a python datetime object,
pd.to_datetime('today').to_pydatetime()
# datetime.datetime(2019, 4, 18, 3, 50, 42, 587629)
As a formatted date string,
pd.to_datetime('today').isoformat()
# '2019-04-18T04:03:32.493337'
# Or, `strftime` for custom formats.
pd.to_datetime('today').strftime('%Y-%m-%d')
# '2019-03-27'
To get just the date from the timestamp, call Timestamp.date.
pd.to_datetime('today').date()
# datetime.date(2019, 3, 27)
Aside from to_datetime, you can directly instantiate a Timestamp object using,
pd.Timestamp('today') # pd.Timestamp('now')
# Timestamp('2019-04-18 03:43:33.233093')
pd.Timestamp('today').to_pydatetime()
# datetime.datetime(2019, 4, 18, 3, 53, 46, 220068)
If you want to make your Timestamp timezone aware, pass a timezone to the tz argument.
pd.Timestamp('now', tz='America/Los_Angeles')
# Timestamp('2019-04-18 03:59:02.647819-0700', tz='America/Los_Angeles')
Yet another date parser library: Pendulum
This one's good, I promise.
If you're working with pendulum, there are some interesting choices. You can get the current timestamp using now() or today's date using today().
import pendulum
pendulum.now()
# DateTime(2019, 3, 27, 0, 2, 41, 452264, tzinfo=Timezone('America/Los_Angeles'))
pendulum.today()
# DateTime(2019, 3, 27, 0, 0, 0, tzinfo=Timezone('America/Los_Angeles'))
Additionally, you can also get tomorrow() or yesterday()'s date directly without having to do any additional timedelta arithmetic.
pendulum.yesterday()
# DateTime(2019, 3, 26, 0, 0, 0, tzinfo=Timezone('America/Los_Angeles'))
pendulum.tomorrow()
# DateTime(2019, 3, 28, 0, 0, 0, tzinfo=Timezone('America/Los_Angeles'))
There are various formatting options available.
pendulum.now().to_date_string()
# '2019-03-27'
pendulum.now().to_formatted_date_string()
# 'Mar 27, 2019'
pendulum.now().to_day_datetime_string()
# 'Wed, Mar 27, 2019 12:04 AM'
Rationale for this answer
A lot of pandas users stumble upon this question because they believe it is a python question more than a pandas one. This answer aims to be useful to folks who are already using these libraries and would be interested to know that there are ways to achieve these results within the scope of the library itself.
If you are not working with pandas or pendulum already, I definitely do not recommend installing them just for the sake of running this code! These libraries are heavy and come with a lot of plumbing under the hood. It is not worth the trouble when you can use the standard library instead.
from datetime import datetime
date = datetime.today().date()
print(date)
Use f-strings, they are usually the best choice for any text-variable mix:
from datetime import date
print(f'{date.today():%Y-%m-%d}')
Taken from Python f-string formatting not working with strftime inline which has the official links as well.
If you need a specific time zone, e.g. US Pacific time (PST/PDT), you can do
from datetime import datetime
import pytz
tz = pytz.timezone('US/Pacific')
datetime.now(tz).strftime('%Y-%m-%d %H:%M:%S')
# '2021-09-02 10:21:41'
My code is a little more complicated, but I use it a lot:
from time import localtime, strftime, time
strftime("%y_%m_%d", localtime(time()))
Reference: https://strftime.org/
You can use the reference to build any format you want.
For YYYY-MM-DD, just change the code to:
strftime("%Y-%m-%d", localtime(time()))
This works:
from datetime import date
today = date.today()
Output at the time of writing: 2020-08-29
Additional:
this_year = date.today().year
this_month = date.today().month
this_day = date.today().day
print(today)
print(this_year)
print(this_month)
print(this_day)
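To get the day number from a date in Python, for example 19-12-2020 (dd-mm-yyyy) in an Order_Date column, where we need 19 as output
(this assumes order is a pandas DataFrame whose Order_Date column already holds datetime values):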
order['day'] = order['Order_Date'].apply(lambda x: x.day)
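If the date is still a plain string rather than a datetime value, a standard-library sketch of the same idea might look like this. The variable name order_date is only illustrative, and the dd-mm-yyyy layout is assumed from the example.

```python
from datetime import datetime

order_date = "19-12-2020"  # dd-mm-yyyy, as in the example above

# Parse the string first, then read the day attribute of the result.
day = datetime.strptime(order_date, "%d-%m-%Y").day
print(day)  # 19
```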

Guessing date format for many identically-formatted dates in Python

I have a large set of datetime strings and it can be safely assumed that they're all identically formatted. For example, I might have the set of dates "7/1/13 0:45", "5/2/13 6:21", "7/15/13 1:24", "7/9/13 12:41", "4/30/13 3:12". The idea would be to get their common format with a reasonable amount of reliability so that they can be parsed using strptime or similar.
Is there any easy way to guess the format? Ideally a library that does this?
Check out https://github.com/jeffreystarr/dateinfer
It seems a little abandoned, but maybe it will fit your needs.
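Without a dedicated library, the underlying idea can be sketched with the standard library: keep only the candidate formats that parse every string in the set. This is not how dateinfer works internally; the candidate list and the function names below are assumptions for illustration.

```python
from datetime import datetime

# Assumed candidate formats; a real list would cover whatever the source might emit.
CANDIDATE_FORMATS = [
    "%d/%m/%y %H:%M",
    "%m/%d/%y %H:%M",
    "%Y-%m-%d %H:%M",
]


def parses(text, fmt):
    """Return True if text is valid under the strptime format fmt."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def common_formats(strings, formats=CANDIDATE_FORMATS):
    """Return the candidate formats that parse every string in the set."""
    return [fmt for fmt in formats if all(parses(s, fmt) for s in strings)]


dates = ["7/1/13 0:45", "5/2/13 6:21", "7/15/13 1:24", "7/9/13 12:41", "4/30/13 3:12"]
print(common_formats(dates))  # ['%m/%d/%y %H:%M']
```

Here "7/15/13" and "4/30/13" rule out the day-first reading. If more than one format survives, the set is still ambiguous and you need more samples or outside knowledge.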
Have you tried using dateutil.parser.parse on the tokenized time strings from the set?
It's often very robust to a wide range of formats, or from errors you get it becomes obvious how to slightly massage your data into a format that it works with.
In [11]: dateutil.parser.parse("7/1/13 0:45")
Out[11]: datetime.datetime(2013, 7, 1, 0, 45)
Do take care of ambiguities in the data. For example, it doesn't look like your time stamps use 24 hours, but instead would report "3:00 pm" and "3:00 am" identically on the same date. Unless you have some way of assigning am / pm to the data, no parser can help you out of that issue.
If your date strings are stored in an iterable then you can use map to apply the parse function to all of the strings:
In [12]: the_dates = ["7/1/13 0:45", "12/2/14 1:38", "4/30/13 12:12"]
In [13]: list(map(dateutil.parser.parse, the_dates))  # list() is needed on Python 3, where map returns an iterator
Out[13]:
[datetime.datetime(2013, 7, 1, 0, 45),
datetime.datetime(2014, 12, 2, 1, 38),
datetime.datetime(2013, 4, 30, 12, 12)]
And if you are in need of some of the extra arguments to dateutil.parser.parse that will indicate the formatting to use, you can use functools.partial to first bind those keyword arguments, and then use map as above to apply the partial function.
For example, suppose you wanted to be extra careful that DAY is treated as the first number. You could always call parse with the extra argument dayfirst=True, or you could pre-bind this argument and treat it like a new function that always had this property.
In [42]: import functools
In [43]: new_parse = functools.partial(dateutil.parser.parse, dayfirst=True)
In [44]: list(map(new_parse, the_dates))  # list() is needed on Python 3, where map returns an iterator
Out[44]:
[datetime.datetime(2013, 1, 7, 0, 45),
datetime.datetime(2014, 2, 12, 1, 38),
datetime.datetime(2013, 4, 30, 12, 12)]
In [45]: new_parse.keywords
Out[45]: {'dayfirst': True}
In [46]: new_parse.func
Out[46]: <function dateutil.parser.parse>
(Note that in this example, the third date cannot be parsed with day-first, since neither 30 nor 13 can be a month... so it falls back to the default format in that case).

How to convert the date string "2011-02-15T12:00+00:00" to a Python datetime object

How do I convert the date string "2011-02-15T12:00+00:00" to a Python datetime object, and then output it in the following format: "Wed, Feb, 15, 2011 15:00"?
It looks like ISO 8601 format. Try using the iso8601 package; you can install it through pip or easy_install.
Many file formats and standards use the ISO 8601 date format (e.g. 2007-01-14T20:34:22+00:00) to store dates in a neutral, unambiguous manner. This simple module parses the most common forms encountered and returns datetime objects.
>>> import iso8601
>>> iso8601.parse_date("2007-06-20T12:34:40+03:00")
datetime.datetime(2007, 6, 20, 12, 34, 40, tzinfo=<FixedOffset '+03:00'>)
>>> iso8601.parse_date("2007-06-20T12:34:40Z")
datetime.datetime(2007, 6, 20, 12, 34, 40, tzinfo=<iso8601.iso8601.Utc object at 0x100ebf0>)
Since you know the exact format of the date string, you can slice it to extract each value.
I'm not sure what the +00:00 part means, so I'll ignore that for now.
from datetime import datetime
s = "2011-02-15T12:00+00:00"  # named s rather than str, which would shadow the built-in
year = int(s[:4])
month = int(s[5:7])
day = int(s[8:10])
hour = int(s[11:13])
minute = int(s[14:16])
date = datetime(year, month, day, hour, minute)
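On Python 3.7 or later, a possible alternative is datetime.fromisoformat, which also keeps the +00:00 suffix (a UTC offset in ISO 8601) instead of discarding it. This is a sketch under that version assumption; the output pattern is copied from the question, and the day and month names come from the default C locale.

```python
from datetime import datetime

text = "2011-02-15T12:00+00:00"

# fromisoformat (Python 3.7+) reads the +00:00 suffix as a UTC offset.
dt = datetime.fromisoformat(text)
print(dt.utcoffset())  # 0:00:00

# Format it the way the question asks for.
formatted = dt.strftime("%a, %b, %d, %Y %H:%M")
print(formatted)  # Tue, Feb, 15, 2011 12:00
```

The result is timezone-aware, so it can later be converted to another zone with astimezone() if the 15:00 in the question refers to local time.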
