ValueError: Unknown string format in Python?

I'm trying to parse a basic ISO-formatted datetime string in Python, but I'm having a hard time doing it. Consider the following example:
>>> import json
>>> from datetime import datetime, date
>>> import dateutil.parser
>>> date_handler = lambda obj: obj.isoformat()
>>> the_date = json.dumps(datetime.now(), default=date_handler)
>>> print the_date
"2017-02-18T22:14:09.915727"
>>> print dateutil.parser.parse(the_date)
Traceback (most recent call last):
File "<input>", line 1, in <module>
print dateutil.parser.parse(the_date)
File "/usr/local/lib/python2.7/site-packages/dateutil/parser.py", line 1168, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/usr/local/lib/python2.7/site-packages/dateutil/parser.py", line 559, in parse
raise ValueError("Unknown string format")
ValueError: Unknown string format
I've also tried parsing this using the regular strptime:
>>> print datetime.strptime(the_date, '%Y-%m-%dT%H:%M:%S')
# removed rest of the error output
ValueError: time data '"2017-02-18T22:11:58.125703"' does not match format '%Y-%m-%dT%H:%M:%S'
>>> print datetime.strptime(the_date, '%Y-%m-%dT%H:%M:%S.%f')
# removed rest of the error output
ValueError: time data '"2017-02-18T22:11:58.125703"' does not match format '%Y-%m-%dT%H:%M:%S.%f'
Does anybody know how on earth I can parse this fairly simple datetime format?

Note the error message:
ValueError: time data '"2017-02-18T22:11:58.125703"'
The value is shown inside single quotes and then double quotes, which means the string itself contains double-quote characters. That's because JSON serialization wraps strings in double quotes.
You may want to strip the quotes around your string:
datetime.strptime(the_date.strip('"'), '%Y-%m-%dT%H:%M:%S.%f')
or, less hackily, deserialize it using json.loads:
datetime.strptime(json.loads(the_date), '%Y-%m-%dT%H:%M:%S.%f')
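A minimal Python 3 sketch of the full round trip, assuming a fixed datetime instead of datetime.now() so the result is reproducible. It also uses datetime.fromisoformat (Python 3.7+), a standard-library alternative to writing the strptime format string by hand.

```python
import json
from datetime import datetime

# A fixed value instead of datetime.now(), so the output is reproducible.
original = datetime(2017, 2, 18, 22, 14, 9, 915727)

# Serialize as in the question: default= turns the datetime into an ISO string,
# and json.dumps then wraps that string in double quotes.
the_date = json.dumps(original, default=lambda obj: obj.isoformat())
print(the_date)  # "2017-02-18T22:14:09.915727"  (the quotes are part of the string)

# json.loads removes the JSON quoting and gives back the plain ISO string.
raw = json.loads(the_date)
parsed = datetime.strptime(raw, '%Y-%m-%dT%H:%M:%S.%f')

# Python 3.7+: fromisoformat() reads isoformat() output without a format string.
parsed_iso = datetime.fromisoformat(raw)
print(parsed == original, parsed_iso == original)  # True True
```

One reason to prefer fromisoformat here: isoformat() leaves out the fractional part when microsecond is 0, so a fixed '%Y-%m-%dT%H:%M:%S.%f' pattern would reject such values, while fromisoformat accepts both forms.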

Related

How to extract multiple time from same string in Python?

I'm trying to extract times from single strings that may also contain text other than the times. An example is s = 'Dates : 12/Jul/2019 12/Aug/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58'.
I've tried using the datefinder module like this:
from datetime import datetime as dt
import datefinder as dfn
for m in dfn.find_dates(s):
    print(dt.strftime(m, "%H:%M:%S"))
Which gives me:
17:58:00
In this case the time "06:00" is missed. If I try without datefinder, using only the datetime module, like this:
dt.strftime(s, "%H:%M")
it complains that the input must already be a datetime object, not a string, with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'str'
So I tried to use the dateutil module to parse s into a datetime object:
from dateutil.parser import parse
parse(s)
but now it says my string is not in a recognized format (in most cases it will not be in any fixed format), with this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/michael/anaconda3/envs/sec_img/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1358, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/home/michael/anaconda3/envs/sec_img/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 649, in parse
raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '12/Jul/2019 12/Aug/2019 MEISHAN BRIDGE 06:00 17:58')
I have thought of extracting the times with a regex like this:
import re
p = r"\d{2}\:\d{2}"
times = [i.group() for i in re.finditer(p, s)]
# Gives me ['06:00', '17:58']
But with this approach I would still have to check whether each match is actually a valid time, because something like "99:99" also matches the pattern even though it is not a time. Is there any workaround without regex to get all the times from a single string?
Please note that the string may or may not contain a date, but it will always contain a time. Even when it contains a date, the date format could be anything, and the string may or may not contain other irrelevant text.

I don't see many options here, so I would go with a heuristic. I would run the following against the whole dataset and extend the config/regexes until it covers all/most of the cases:
import re
import logging
from datetime import datetime as dt
s = 'Dates : 12/Jul/2019 12/08/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58:59'
SUPPORTED_DATE_FMTS = {
    re.compile(r"(\d{2}/\w{3}/\d{4})"): "%d/%b/%Y",
    re.compile(r"(\d{2}/\d{2}/\d{4})"): "%d/%m/%Y",
    re.compile(r"(\d{2}/\w{3}\w+/\d{4})"): "%d/%B/%Y",
    # Capture more here
}
SUPPORTED_TIME_FMTS = {
    # HH:MM not followed by ':' (so the HH:MM prefix of HH:MM:SS is skipped);
    # the lookahead, unlike [^:], also allows a match at the very end of the string
    re.compile(r"\b((?:[0-1][0-9]|2[0-3]):[0-5][0-9])(?!:)"): "%H:%M",
    re.compile(r"\b((?:[0-1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])"): "%H:%M:%S",
    # Capture more here
}
def extract_supported_dt(config, s):
    """
    Loop through the given config (keys are regexes, values are date/time formats)
    and attempt to gather all valid data.
    """
    valid_data = []
    for regex, fmt in config.items():
        # Extract what looks like a date/time
        valid_ish_data = regex.findall(s)
        if not valid_ish_data:
            continue
        print("Checking " + str(valid_ish_data))
        # Validate it: strptime rejects impossible values such as 99:99
        for d in valid_ish_data:
            try:
                valid_data.append(dt.strptime(d, fmt))
            except ValueError:
                pass
    return valid_data
# Handle dates
dates = extract_supported_dt(SUPPORTED_DATE_FMTS, s)
# Handle times
times = extract_supported_dt(SUPPORTED_TIME_FMTS, s)
print("Found dates: ")
for date in dates:
    print("\t" + str(date.date()))
print("Found times: ")
for t in times:
    print("\t" + str(t.time()))
Example output:
Checking ['12/Jul/2019']
Checking ['12/08/2019']
Checking ['06:00']
Checking ['17:58:59']
Found dates:
2019-07-12
2019-08-12
Found times:
06:00:00
17:58:59
This is a trial-and-error approach, but I don't think there is an alternative in your case. My goal is therefore to make it as easy as possible to add support for more date/time formats, rather than to find a solution that covers 100% of the data on day one. The more data you run it against, the more complete your config becomes.
One thing to note is that you will have to detect strings that appear to have no dates or times and log them somewhere, then review them manually later to see whether anything that was missed could be captured.
If your data are generated by another system, sooner or later you will be able to match 100% of it. If the input comes from humans, you will probably never get to 100% (people make spelling mistakes and sometimes input random stuff... date=today :) )
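For the "log them somewhere" step, one option is the standard logging module (which the code above imports but does not use). This is a minimal sketch with a hypothetical extract_times helper that only handles HH:MM, a simplification of the config-driven function; the logger name is also made up.

```python
import logging
import re
from datetime import datetime

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("time_extraction")  # hypothetical logger name

# Candidate HH:MM tokens; strptime below does the real validation.
HHMM = re.compile(r"\b(\d{2}:\d{2})\b")


def extract_times(s):
    """Hypothetical helper: return valid HH:MM times in s, logging s if none validate."""
    times = []
    for candidate in HHMM.findall(s):
        try:
            times.append(datetime.strptime(candidate, "%H:%M").time())
        except ValueError:
            pass  # e.g. '99:99' looks like a time but strptime rejects it
    if not times:
        # Keep the raw input so it can be reviewed by hand later.
        log.warning("No valid time found, needs manual review: %r", s)
    return times


rows = [
    'Dates : 12/Jul/2019 12/Aug/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58',
    'Time : 99:99',
]
for row in rows:
    print(extract_times(row))
```

The second row produces an empty list and a warning on stderr, which is the kind of record you would later inspect to extend the regexes.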

If you only need the times, this regex should work (note that it still accepts impossible hours such as 24 to 29, so validate the matches if that matters):
r"[0-2][0-9]\:[0-5][0-9]"
If the time may contain spaces, like 23 : 59, use this:
r"[0-2][0-9]\s*\:\s*[0-5][0-9]"

Use a regex, but something like this:
(?=[0-1])[0-1][0-9]\:[0-5][0-9]|(?=2)[2][0-3]\:[0-5][0-9]
This matches:
00:00 00:59 01:00 01:59 02:00 02:59
09:00 10:00 11:59 20:00 21:59 23:59
It does not match:
99:99 23:99 01:99
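A compact equivalent of that pattern, as a sketch: the lookaheads (?=[0-1]) and (?=2) are redundant because the very next token already checks the first digit, so they can be dropped. The \b word boundaries are an extra assumption, added so the pattern does not match inside longer digit runs such as 123:45.

```python
import re

# Hours 00-23, minutes 00-59; \b stops matches inside longer digit runs.
TIME_RE = re.compile(r"\b(?:[01][0-9]|2[0-3]):[0-5][0-9]\b")

s = 'Dates : 12/Jul/2019 12/Aug/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58'
print(TIME_RE.findall(s))                     # ['06:00', '17:58']
print(TIME_RE.findall("99:99 23:99 01:99"))   # []
```

On a value like 17:58:59 this returns the 17:58 prefix, so add an optional seconds group if you need them.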

You could use a dictionary:
my_dict = {}
for i in s.split(', '):
    m = i.strip().split(' : ', 1)
    my_dict[m[0]] = m[1].split()
my_dict
Out:
{'Dates': ['12/Jul/2019', '12/Aug/2019'],
'Loc': ['MEISHAN', 'BRIDGE'],
'Time': ['06:00', '17:58']}
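This relies on every string following the same "Key : values" layout with ", " between fields, which is an assumption about the data. Since the question worries about values like '99:99', here is a sketch that also validates the Time entries with datetime.strptime. Note that an invalid entry raises ValueError here instead of being skipped.

```python
from datetime import datetime

s = 'Dates : 12/Jul/2019 12/Aug/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58'

# Split "Key : value value, Key : value" into {key: [values]}, as in the answer.
my_dict = {}
for field in s.split(', '):
    key, value = field.strip().split(' : ', 1)
    my_dict[key] = value.split()

# Validate the 'Time' entries; strptime raises ValueError for values like '99:99'.
times = [datetime.strptime(t, '%H:%M').time() for t in my_dict.get('Time', [])]
print(times)  # [datetime.time(6, 0), datetime.time(17, 58)]
```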

How to parse JSON files with double-quotes inside strings in Python?

I'm trying to read a JSON file in Python. Some of the lines have strings with double quotes inside:
{"Height__c": "8' 0\"", "Width__c": "2' 8\""}
Using a raw string literal produces the right output:
json.loads(r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}""")
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
But my string comes from a file, i.e.:
s = f.readline()
Where:
>>> print repr(s)
'{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
And json throws the following exception:
json.loads(s) # s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
ValueError: Expecting ',' delimiter: line 1 column 21 (char 20)
Also,
>>> s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
Fails, but assigning the raw literal works:
>>> s = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
Do I need to write a custom Decoder?

The data file you have does not escape the nested quotes correctly; this can be hard to repair.
If the nested quotes follow a pattern; e.g. always follow a digit and are the last character in each string you can use a regular expression to fix these up. Given your sample data, if all you have is measurements in feet and inches, that's certainly doable:
import json
import re
from functools import partial
repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
# s is one line read from the file
json.loads(repair_nested(s))
Demo:
>>> import json
>>> import re
>>> from functools import partial
>>> s = '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
>>> repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
>>> json.loads(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 21 (char 20)
>>> json.loads(repair_nested(s))
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}
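If you also control the code that writes the file, the sturdier fix is on that side: json.dumps escapes embedded double quotes itself, so lines it produces load back without any repair. A minimal Python 3 sketch, assuming the writer has the data as a dict:

```python
import json

record = {"Height__c": "8' 0\"", "Width__c": "2' 8\""}

# json.dumps escapes the inner double quotes as \" automatically.
line = json.dumps(record)
print(line)  # {"Height__c": "8' 0\"", "Width__c": "2' 8\""}

# Reading the line back round-trips without any regex repair step.
print(json.loads(line) == record)  # True
```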

How do you check a python string to see if it's in HH:MM time format?

What is a regex for matching the HH:MM time format in Python?
I was using
def is_time(self, str):
    reg = re.compile(r'[1-9]|1[0-2]:[0-9]{2}')
    if re.match(reg, str):
        return True
    else:
        return False
I've also tried:
reg = re.compile(r'^([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]$')
but I keep getting
TypeError: %d format: a number is required, not str
It makes sense that I'm getting that error because I'm checking numbers in strings, but I'm not sure how to fix it. Any help would be greatly appreciated.

You should probably use datetime.strptime, which rejects impossible times:
>>> import datetime
>>> datetime.datetime.strptime('12:34', '%H:%M')
datetime.datetime(1900, 1, 1, 12, 34)
>>> datetime.datetime.strptime('99:99', '%H:%M')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data '99:99' does not match format '%H:%M'
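Wrapped into a boolean check, this could look like the sketch below (a plain function rather than the question's method, an assumption for brevity). The ValueError from strptime is the "no" answer.

```python
from datetime import datetime


def is_time(value):
    """Return True if value is a valid 24-hour HH:MM time, e.g. '12:34'."""
    try:
        datetime.strptime(value, '%H:%M')
    except ValueError:
        return False
    return True


print(is_time('12:34'), is_time('99:99'), is_time('12:34:56'))  # True False False
```

Be aware that strptime's %H also accepts a single-digit hour such as '9:05'; if exactly two digits are required, combine this with a regex check.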

You should just use:
if reg.search(strs):
or simply:
return bool(reg.search(strs))
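The first pattern in the question also has a precedence problem: | binds more loosely than everything else, so [1-9]|1[0-2]:[0-9]{2} means "a single digit 1-9" or "1[0-2]:[0-9]{2}", and re.match accepts any string that starts with a digit from 1 to 9. A sketch with the hour alternatives grouped, assuming 12-hour times from 1:00 to 12:59 were intended (inferred from the [1-9]|1[0-2] hour range) and using re.fullmatch (Python 3.4+) to anchor both ends:

```python
import re

# Original pattern from the question: '|' splits the WHOLE expression in two.
BROKEN = re.compile(r'[1-9]|1[0-2]:[0-9]{2}')

# Hour alternatives grouped, minutes limited to 00-59 (assumed intent).
TWELVE_HOUR = re.compile(r'(?:[1-9]|1[0-2]):[0-5][0-9]')


def is_time(value):
    """Return True if the whole string is an h:mm / hh:mm time from 1:00 to 12:59."""
    return TWELVE_HOUR.fullmatch(value) is not None


print(bool(BROKEN.match('5 apples')))  # True  (only the '5' is matched)
print(is_time('5 apples'), is_time('9:30'), is_time('12:59'), is_time('13:00'))
# False True True False
```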

Python date parsing from Amazon EC2 EBS

I'm having an issue trying to parse a date that's being returned by an EC2 script that checks the last backup of a volume.
The date comes back as a string and I want to parse it into a datetime object, but because of the extra characters in the returned string, datetime.strptime does not work properly. Is there a way to get the string into a datetime object without using dateutil? I'm having issues with that as well.
This is the date string being returned:
2013-06-26T02:01:05.000Z
This is my code trying to parse it:
startTime = datetime.strptime(s.start_time, '%Y-%m-%dT%H:&M:%S.%fZ')
Obviously this isn't working: when I try to print startTime, nothing happens.

I think it's a typo: you used & instead of %.
'%Y-%m-%dT%H:&M:%S.%fZ'
             ^
             |
        this is wrong
Demo:
>>> strs = "2013-06-26T02:01:05.000Z"
>>> datetime.strptime(strs, '%Y-%m-%dT%H:%M:%S.%fZ')
datetime.datetime(2013, 6, 26, 2, 1, 5)

You have an error in your format; it's not &M but %M:
datetime.strptime(t, '%Y-%m-%dT%H:%M:%S.%fZ')
The corrected format works just fine:
>>> t = '2013-06-26T02:01:05.000Z'
>>> from datetime import datetime
>>> datetime.strptime(t, '%Y-%m-%dT%H:&M:%S.%fZ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python2.7/_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data '2013-06-26T02:01:05.000Z' does not match format '%Y-%m-%dT%H:&M:%S.%fZ'
>>> datetime.strptime(t, '%Y-%m-%dT%H:%M:%S.%fZ')
datetime.datetime(2013, 6, 26, 2, 1, 5)
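Both answers match the trailing Z as a literal character, which gives a naive datetime. In ISO 8601 the Z means UTC, so if the result will be compared with other timestamps it may help to attach the time zone explicitly. A minimal sketch, assuming the returned value is always UTC with a Z suffix:

```python
from datetime import datetime, timezone

s = '2013-06-26T02:01:05.000Z'

# The trailing 'Z' means UTC. Matching it as a literal yields a naive
# datetime, so attach the UTC tzinfo explicitly afterwards.
start_time = datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%fZ').replace(tzinfo=timezone.utc)
print(start_time)  # 2013-06-26 02:01:05+00:00
```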

Formatting Time as %d-%m-%y

I am trying to print orig_time as 6/9/2013 and am running into the following error. Can anyone explain what is wrong here?
Code:
orig_time="2013-06-09 00:00:00"
Time=(orig_time.strftime('%m-%d-%Y'))
print Time
Error:
Traceback (most recent call last):
File "date.py", line 2, in <module>
Time=(orig_time.strftime('%m-%d-%Y'))
AttributeError: 'str' object has no attribute 'strftime'

You cannot call strftime on a string because it is not a str method. One way to do this is to parse the string into a datetime object first, using the datetime module:
>>> from datetime import datetime
>>> orig_time="2013-06-09 00:00:00"
#d is a datetime object
>>> d = datetime.strptime(orig_time, '%Y-%m-%d %H:%M:%S')
Now you can use either string formatting:
>>> "{}/{}/{}".format(d.month,d.day,d.year)
'6/9/2013'
or datetime.datetime.strftime:
>>> d.strftime('%m-%d-%Y')
'06-09-2013'
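Since the title, the format string in the code, and the expected output in the question all differ, here is a Python 3 sketch (f-strings need 3.6+) that produces each of the three variants from the same parsed value:

```python
from datetime import datetime

orig_time = "2013-06-09 00:00:00"

# strftime is a datetime method, so parse the string first.
d = datetime.strptime(orig_time, '%Y-%m-%d %H:%M:%S')

print(f"{d.month}/{d.day}/{d.year}")  # 6/9/2013    (expected output, no leading zeros)
print(d.strftime('%m-%d-%Y'))         # 06-09-2013  (format used in the question's code)
print(d.strftime('%d-%m-%y'))         # 09-06-13    (format in the title)
```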

>>> import time
>>> time.strftime("%m-%d-%y",time.strptime("2013-06-09 00:00:00","%Y-%m-%d %H:%M:%S"))
'06-09-13'
but it's more of a pain to remove the leading zeros; if you need that, use the other answer.
Also, it's unclear what you want: your title (and your format string) say one thing, but that does not match the expected output you give in the question.

import time
now1 = time.strftime("%d-%m-%y", time.localtime())
print(now1)
You can change the "%d-%m-%y" format string as needed, and pass a fixed time instead of time.localtime().

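import time
orig_time = "2013-06-09 00:00:00"
# time.strftime takes the format first and a struct_time second, so parse the string with time.strptime
Time = time.strftime('%m-%d-%Y', time.strptime(orig_time, '%Y-%m-%d %H:%M:%S'))
print(Time)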
