I am using Python 2.7.
I have an Adobe PDF form that has a date field, and I extract the values using pdfminer. The problem is that Adobe Acrobat Reader lets the user type in strings like april 3rd 2017, 3rd April 2017, Apr 3rd 2017, 04/04/2017 or 4 3 2017. The date field is set to mm/dd/yyyy format, so Adobe displays the value as 04/03/2017, but when you click on the field it shows the text that was actually typed, like the examples above, and that raw text is what pdfminer pulls. Adobe accepts this and, I think, does its own conversion to display the date as mm/dd/yyyy. JavaScript can be used with Adobe for more control, but I can't do that: the users can only have and use the PDF form, without any accompanying JavaScript file.
So I am looking for a method with datetime in Python that can accept a written date such as the examples above as a string and convert it into a true mm/dd/yyyy format. I saw methods for converting long and short month names, but nothing that would handle ordinal day numbers like 1st, 2nd, 3rd, 4th.
You could just try each possible format in turn. First remove any st nd rd specifiers to make the testing easier:
import re
from datetime import datetime

formats = ["%B %d %Y", "%d %B %Y", "%b %d %Y", "%m/%d/%Y", "%m %d %Y"]
dates = ["april 3rd 2017", "3rd April 2017", "Apr 3rd 2017", "04/04/2017", "4 3 2017"]

for date in dates:
    # Strip st/nd/rd/th only when it follows a digit, so "august" is left intact
    date = re.sub(r"(\d)(st|nd|rd|th)\b", r"\1", date.lower())
    for format in formats:
        try:
            print(datetime.strptime(date, format).strftime("%m/%d/%Y"))
            break  # stop at the first format that matches
        except ValueError:
            pass
Which would display:
04/03/2017
04/03/2017
04/03/2017
04/04/2017
04/03/2017
This approach has the benefit of validating each date: a month greater than 12, for example, is rejected with a ValueError. You could flag any dates that fail all of the allowed formats.
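As a rough sketch of that flagging idea, the loop can be wrapped in a function that returns None when nothing matches. The names normalize_date, FORMATS and raw_values are made up for this illustration, and the regex-based suffix stripping is my own assumption about how to clean the input, not something the form or pdfminer provides.

```python
import re
from datetime import datetime

# Candidate input formats, tried in order (same list as in the answer).
FORMATS = ["%B %d %Y", "%d %B %Y", "%b %d %Y", "%m/%d/%Y", "%m %d %Y"]

def normalize_date(text):
    """Return the date as mm/dd/yyyy, or None if no format matches."""
    # Assumption: ordinal suffixes only ever follow a digit (3rd, 21st, ...).
    cleaned = re.sub(r"(\d)(st|nd|rd|th)\b", r"\1", text.strip().lower())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    return None

# Hypothetical raw field values; the second one has no valid reading.
raw_values = ["april 3rd 2017", "13 40 2017", "Apr 3rd 2017"]
failed = [value for value in raw_values if normalize_date(value) is None]
print(failed)
```

Returning None instead of raising keeps the caller free to decide whether a bad value should be logged, rejected, or sent back to the user.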
Just write a regular expression to get the number out of the string.
import re
s = '30Apr'
n = s[:re.match(r'[0-9]+', s).span()[1]]
print(n) # Will print 30
The other things should be easy.
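To make "the other things" a little more concrete, here is one possible sketch that pulls out both the day number and the month text in a single match. The helper name split_day_month is hypothetical, and it assumes the day always comes first in the string.

```python
import re
from datetime import datetime

def split_day_month(text):
    """Split a string like '30Apr' or '3rd April' into (day, month name)."""
    match = re.match(r"\s*(\d{1,2})(?:st|nd|rd|th)?\s*([A-Za-z]+)", text)
    if match is None:
        raise ValueError("no leading day number in %r" % text)
    return int(match.group(1)), match.group(2)

day, month_name = split_day_month("30Apr")
# %b parses an abbreviated month name; use %B for a full name like "April".
month = datetime.strptime(month_name, "%b").month
print(day, month)
```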
Based on @MartinEvans's answer, but using the arrow library, because it handles more cases than datetime, so you don't have to use replace() or lower():
First install arrow:
pip install arrow
Then try each possible format:
import arrow

dates = ['april 3rd 2017', '3rd April 2017', 'Apr 3rd 2017', '04/04/2017', '4 3 2017']
formats = ['MMMM Do YYYY', 'Do MMMM YYYY', 'MMM Do YYYY', 'MM/DD/YYYY', 'M D YYYY']

def convert_datetime(date):
    for format in formats:
        try:
            print(arrow.get(date, format).format('MM/DD/YYYY'))
        except arrow.parser.ParserError:
            pass

for date in dates:
    convert_datetime(date)
Will output:
04/03/2017
04/03/2017
04/03/2017
04/04/2017
04/03/2017
If you are unsure what could be wrong with a date, you can also print a helpful error message when none of the formats match:
def convert_datetime(date):
    error = None
    for format in formats:
        try:
            print(arrow.get(date, format).format('MM/DD/YYYY'))
            break
        except (arrow.parser.ParserError, ValueError) as e:
            error = e  # keep the last error; Python 3 unbinds e after the except block
    else:
        # Runs only if the loop finished without a break, i.e. no format matched
        print('For date: "{0}", {1}'.format(date, error))

convert_datetime('124 5 2017')  # test invalid date
Will output the following error message:
'For date: "124 5 2017", month must be in 1..12'
I need to parse a few dates that are roughly in the format (1 or 2-digit year)-(Month abbreviation), for example:
5-Jun (June 2005)
13-Jan (January 2013)
I tried using strptime with the format %b-%y but it did not consistently produce the desired date. Per the documentation, this is because some years in my dataset are not zero-padded.
Further, when I tested the dateutil module (see my code below) on the string "5-Jun", I got "2019-06-05" instead of the desired result (June 2005), even with yearfirst=True passed to parse.
from dateutil.parser import parse
parsed = parse("5-Jun",yearfirst=True)
print(parsed)
It is easier if single-digit years are padded with a 0, since the string can then be converted directly using a format string. A regular expression is used here to replace any single-digit number with its zero-padded value. I've used the regex from here.
Sample code:
import re
from datetime import datetime

match_condn = r'\b([0-9])\b'
replace_str = r'0\1'
print(datetime.strptime(re.sub(match_condn, replace_str, '15-Jun'), '%y-%b').strftime("%B %Y"))
Output:
June 2015
One approach is to use str.zfill
Ex:
import datetime

d = ["5-Jun", "13-Jan"]
for date in d:
    year, month = date.split("-")
    year = year.zfill(2)  # "5" -> "05", "13" stays "13"
    print(datetime.datetime.strptime(year + "-" + month, "%y-%b").strftime("%B %Y"))
Output:
June 2005
January 2013
Ah. I see from @Rakesh's answer what your data is about. I thought you needed to parse the full name of the month. So you had your two terms %b and %y backwards, but then you had the problem with the single-digit years. I get it now. Here's a much simpler way to get what you want if you can assume your dates are always in one of the two formats you indicate:
import time

inp = "5-Jun"
# Prepend "0" and keep the last six characters: "5-Jun" -> "05-Jun", "13-Jan" is unchanged
t = time.strptime(("0" + inp)[-6:], "%y-%b")
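For completeness, a small sketch of how the resulting struct_time could be turned back into text like "June 2005". The function name parse_year_month is made up here, and it relies on the same assumption as above: the input is always a one- or two-digit year, a hyphen, and a three-letter month.

```python
import time

def parse_year_month(text):
    """Parse '5-Jun' or '13-Jan' (year-month) into a time.struct_time."""
    padded = ("0" + text)[-6:]  # '5-Jun' -> '05-Jun', '13-Jan' stays as is
    return time.strptime(padded, "%y-%b")

for raw in ["5-Jun", "13-Jan"]:
    parsed = parse_year_month(raw)
    print(time.strftime("%B %Y", parsed))
```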
I want to check if the format of the date input by user matches the below:
Jan 5 2018 6:10 PM
Month: first letter capitalized, followed by two lowercase letters (three letters in total)
<Space>: single space, must exist
Date: For single digit it should not be 05, but 5
<Space>: single space, must exist
Hour: 0-12, for single digit it should not be 06, but 6
Minute: 00-59
AM/PM
I'm using the below regex and trying to match:
import re,sys
usr_date = str(input("Please enter the older date until which you want to scan ? \n[Date Format Example: Jan 5 2018 6:10 PM] : "))
valid_usr_date = re.search("^(\s+)*[A-Z]{1}[a-z]{2}\s{1}[1-31]{1}\s{1}[1-2]{1}[0-9]{1}[0-9]{1}[0-9]{1}\s{1}[0-12]{1}:[0-5]{1}[0-9]{1}\s{1}(A|P)M$",usr_date,re.M)
if not valid_usr_date:
    print ("The date format is incorrect. Please follow the exact date format as shown in example. Exiting Program!")
    sys.exit()
But even for input in the correct format, it reports an error. What am I doing wrong?
I would not use regex for that, as you have no way to actually validate the date itself (eg, a regex will happily accept Abc 99 9876 9:99 PM).
Instead, use strptime:
from datetime import datetime
string = 'Jan 5 2018 6:10 PM'
datetime.strptime(string, '%b %d %Y %I:%M %p')
If the string would be in the "wrong" format you'd get a ValueError.
The only apparent "problem" with this approach is that for some reason you require the day and hour not to be zero-padded and strptime doesn't seem to have such directives.
A table with all available directives is here.
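If the no-zero-padding rule really matters, one option is to combine both ideas: a regex that checks only the shape, followed by strptime to check that the date exists. This is just a sketch; parse_strict and STRICT_SHAPE are names invented here, and it assumes hours run from 1 to 12 (which is what %I accepts) rather than 0 to 12.

```python
import re
from datetime import datetime

# Shape check: no leading zero on the day or the hour, single spaces only.
STRICT_SHAPE = re.compile(
    r"[A-Z][a-z]{2} ([1-9]|[12][0-9]|3[01]) [0-9]{4} "
    r"([1-9]|1[0-2]):[0-5][0-9] (AM|PM)\Z"
)

def parse_strict(text):
    """Return a datetime, or None if the shape or the date itself is invalid."""
    if STRICT_SHAPE.match(text) is None:
        return None
    try:
        # strptime rejects things the regex cannot, e.g. "Abc" or Feb 30
        return datetime.strptime(text, "%b %d %Y %I:%M %p")
    except ValueError:
        return None

print(parse_strict("Jan 5 2018 6:10 PM"))
print(parse_strict("Jan 05 2018 6:10 PM"))
```

The regex stays simple because it no longer has to know which month names or day counts are valid; strptime takes care of that part.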
You could use a function that parses the input string and tries to return a datetime object; if it can't, it raises an argparse.ArgumentTypeError:
import argparse
from datetime import datetime

def valid_date(s):
    try:
        return datetime.strptime(s, '%Y-%m-%d %H:%M')
    except ValueError:
        msg = "Not a valid date: '{0}'.".format(s)
        raise argparse.ArgumentTypeError(msg)
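A function like this is typically passed to argparse as the type of an argument, so the conversion and the error message happen during parsing. A minimal sketch, where the option name --since is purely hypothetical:

```python
import argparse
from datetime import datetime

def valid_date(s):
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M")
    except ValueError:
        raise argparse.ArgumentTypeError("Not a valid date: '{0}'.".format(s))

parser = argparse.ArgumentParser()
parser.add_argument("--since", type=valid_date)  # hypothetical option name

# Passing the argument list explicitly keeps the example self-contained.
args = parser.parse_args(["--since", "2018-01-05 18:10"])
print(args.since)
```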
How would I convert the timestamp '20141031131429Z' to 31 October 2014 in Python?
>>> datetime.datetime.strptime("20141031131429Z", "%Y%m%d%H%M%S%Z")
The above code gives me the error shown below:
ValueError: time data '20141031131429Z' does not match format '%Y%m%d%H%M%S%Z'
Remove the % in front of the Z, so that Z is matched as a literal character instead of as a %Z time zone name:
import datetime

d = datetime.datetime.strptime("20141031131429Z", "%Y%m%d%H%M%SZ")
print(d.strftime("%d %B %Y"))
Output:
31 October 2014
See the documentation for the strftime() and strptime() behavior.
That's not a unix timestamp (which are parsed with %s in strftime/strptime) - it looks like iCalendar form #2 (RFC 2445). A module like iCalendar might help you parse that without having to hardcode which form is used.
Once you have a datetime object, it can be used to retrieve any other format:
>>> dt=datetime.datetime.strptime( "20141031131429Z", "%Y%m%d%H%M%SZ" )
>>> dt.strftime('%d %B %Y')
'31 October 2014'
>>> dt.strftime('%x')
'10/31/14'
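Note that matching Z as a literal gives a naive datetime with no time zone attached. If the trailing Z is meant as UTC (an assumption about the data source), the result can be marked accordingly. The sketch below uses datetime.timezone, so it assumes Python 3; parse_utc_stamp is a made-up name.

```python
from datetime import datetime, timezone

def parse_utc_stamp(stamp):
    """Parse 'YYYYmmddHHMMSSZ' and mark the result as UTC."""
    naive = datetime.strptime(stamp, "%Y%m%d%H%M%SZ")
    # Assumption: the trailing Z means UTC, so attach that zone explicitly.
    return naive.replace(tzinfo=timezone.utc)

dt = parse_utc_stamp("20141031131429Z")
print(dt.strftime("%d %B %Y"))
print(dt.isoformat())
```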
A variety of programs output dates in the format of their Unix platform's date command. For example: Tue Nov 5 12:38:00 EST 2013.
How can I easily convert this into a Python date object?
The answer is actually pretty simple. Use the datetime.strptime() method, which converts a string representation of a date (first parameter) into a datetime object according to a format string that describes that representation (second parameter).
In this case, this is the code you would use:
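import datetime

unix_date_format = '%a %b %d %H:%M:%S %Z %Y'
# Matches strings like Tue Nov 5 12:38:00 EST 2013
# Note: %Z only recognizes UTC, GMT and the machine's local time zone names,
# so EST parses only where the local zone is US Eastern time.
date_in_string_format = 'Tue Nov 5 12:38:00 EST 2013'
my_date = datetime.datetime.strptime(
    date_in_string_format, unix_date_format)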
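Because strptime() accepts only a few values for %Z (UTC, GMT and the local zone names), a string with EST can raise ValueError on machines in other time zones. One possible workaround, sketched below, is to drop the zone abbreviation before parsing. This discards the zone information, and parse_unix_date is a name invented for the example; it assumes the six-field layout shown above.

```python
from datetime import datetime

def parse_unix_date(text):
    """Parse `date`-style output, ignoring the time zone abbreviation."""
    parts = text.split()
    if len(parts) != 6:
        raise ValueError("unexpected date layout: %r" % text)
    # parts: weekday, month, day, time, zone, year -> drop the zone (index 4)
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")

print(parse_unix_date("Tue Nov  5 12:38:00 EST 2013"))
```

If the zone matters, it would have to be mapped to an offset separately, since abbreviations like EST are not unique worldwide.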
Further Reading
datetime.strptime() method
I have to write a program that takes stocks from Yahoo Finance and prints out certain information from the site. One of the pieces of data is the date. I need to take a date such as 3/21/2012 and convert it to the following format: Mar 21, 2012.
Here is my code for the entire project.
def getStockData(company="GOOG"):
    baseurl ="http://quote.yahoo.com/d/quotes.csv?s={0}&f=sl1d1t1c1ohgvj1pp2owern&e=.csv"
    url = baseurl.format(company)
    conn = u.urlopen(url)
    content = conn.readlines()
    data = content[0].decode("utf-8")
    data = data.split(",")
    date = data[2][1:-1]
    date_new = datetime.strptime(date, "%m/%d/%Y").strftime("%B[0:3] %d, %Y")
    print("The last trade for",company, "was", data[1],"and the change was", data[4],"on", date_new)

company = input("What company would you like to look up?")
getStockData(company)

co = ["VOD.L", "AAPL", "YHOO", "S", "T"]
for company in co:
    getStockData(company)
You should really specify what about your code is not working (i.e., what output are you getting that you don't expect? What error message are you getting, if any?). However, I suspect your problem is with this part:
strftime('%B[0:3] %d, %Y')
Since Python won't do what you think with that attempt to slice '%B'. You should instead use '%b', which as noted in the documentation for strftime(), corresponds to the locale-abbreviated month name.
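A quick way to see the difference is to format the same date both ways. In this sketch the date string is the one from the question, the variable names are just for illustration, and the month names shown assume the default English/C locale.

```python
from datetime import datetime

raw = "3/21/2012"
parsed = datetime.strptime(raw, "%m/%d/%Y")

short = parsed.strftime("%b %d, %Y")       # %b gives the abbreviated month name
wrong = parsed.strftime("%B[0:3] %d, %Y")  # "[0:3]" is copied through as literal text

print(short)
print(wrong)
```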
EDIT
Here is a fully functional script based on what you posted above with my suggested modifications:
from __future__ import print_function  # needed on Python 2 for print(a, b, ...)
import urllib2 as u
from datetime import datetime

def getStockData(company="GOOG"):
    baseurl ="http://quote.yahoo.com/d/quotes.csv?s={0}&f=sl1d1t1c1ohgvj1pp2owern&e=.csv"
    url = baseurl.format(company)
    conn = u.urlopen(url)
    content = conn.readlines()
    data = content[0].decode("utf-8")
    data = data.split(",")
    date = data[2][1:-1]  # strip the surrounding quotes from the date field
    date_new = datetime.strptime(date, "%m/%d/%Y").strftime("%b %d, %Y")
    print("The last trade for",company, "was", data[1],"and the change was", data[4],"on", date_new)

for company in ["VOD.L", "AAPL", "YHOO", "S", "T"]:
    getStockData(company)
The output of this script is:
The last trade for VOD.L was 170.00 and the change was -1.05 on Mar 06, 2012
The last trade for AAPL was 530.26 and the change was -2.90 on Mar 06, 2012
The last trade for YHOO was 14.415 and the change was -0.205 on Mar 06, 2012
The last trade for S was 2.39 and the change was -0.04 on Mar 06, 2012
The last trade for T was 30.725 and the change was -0.265 on Mar 06, 2012
For what it's worth, I'm running this on Python 2.7.1. I also had the line from __future__ import print_function to make this compatible with the Python3 print function you appear to be using.
Check out Dateutil. You can use it to parse a string into python datetime object and then print that object using strftime.
I've since come to a conclusion that auto detection of datetime value is not always a good idea. It's much better to use strptime and specify what format you want.
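A small illustration of why guessing can go wrong: the same string is a valid date under two different conventions, and only an explicit format says which one is meant. The variable names here are just for the example.

```python
from datetime import datetime

raw = "04/03/2017"

as_month_first = datetime.strptime(raw, "%m/%d/%Y")  # April 3rd
as_day_first = datetime.strptime(raw, "%d/%m/%Y")    # March 4th

print(as_month_first.strftime("%d %B %Y"))
print(as_day_first.strftime("%d %B %Y"))
```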