How to let datetime.strptime parse zero-padded decimal number only [duplicate] - python

I've been using the datetime module to do some checking of dates to see if they are in mm/dd/yyyy or mm/dd/yy formats. The problem is that the %d and %m directives aren't sensitive enough to detect when the month or day is a single digit, which is a requirement of mine.
datetime.strptime('01/01/2001', '%m/%d/%Y')
works like I want it to, but
datetime.strptime('1/1/2001', '%m/%d/%Y')
also produces a valid datetime, when I really want it to throw a ValueError unless the month and day are 0-padded. Does anyone know how to set a required precision for datetime formats? Is this possible or should I just go with regex instead?

The datetime function you're using isn't intended to validate input, only to convert strings to datetime objects. Both of your examples are legitimate string inputs as far as datetime is concerned.
If you want to enforce user input to be a specific format, I would go with a regex - see this example from my REPL:
>>> import re
>>> pattern = re.compile(r"^[0-9]{2}/[0-9]{2}/[0-9]{4}$")
>>> def valid_datestring(datestring):
...     if pattern.match(datestring):
...         return True
...     return False
...
>>> valid_datestring('1/1/2001')
False
>>> valid_datestring('01/01/2001')
True
If you want to define a function that returns a parsed datetime or raises a ValueError, you can do something like this:
from datetime import datetime

def format_datestring(datestring):
    if not valid_datestring(datestring):
        raise ValueError('Date input must be in the form mm/dd/yyyy!')
    return datetime.strptime(datestring, '%m/%d/%Y')
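The question also mentions the two-digit-year form mm/dd/yy. A minimal sketch that covers both forms is shown here; the name parse_padded_date and the choice of format by string length are assumptions for illustration, not part of the answer above.

```python
import re
from datetime import datetime

# Two digits, slash, two digits, slash, then a four- or two-digit year.
_PADDED_DATE = re.compile(r"[0-9]{2}/[0-9]{2}/(?:[0-9]{4}|[0-9]{2})")

def parse_padded_date(datestring):
    """Parse mm/dd/yyyy or mm/dd/yy, rejecting unpadded months and days."""
    if not _PADDED_DATE.fullmatch(datestring):
        raise ValueError("Date input must be in the form mm/dd/yyyy or mm/dd/yy")
    # A 10-character string can only be the four-digit-year form.
    fmt = "%m/%d/%Y" if len(datestring) == 10 else "%m/%d/%y"
    # strptime still rejects impossible dates such as 13/01/2001.
    return datetime.strptime(datestring, fmt)
```

The regex only checks the shape of the string; strptime remains responsible for checking that the month and day are real calendar values.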

You can verify that a string conforms exactly to a formatting string by re-formatting the datetime and comparing to the original string:
datetime_ = datetime.strptime(datetime_string, DATETIME_FORMAT)
if datetime_.strftime(DATETIME_FORMAT) != datetime_string:
    [fail]
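The round-trip check above can be wrapped in a small helper. This is a sketch: the name strict_strptime is hypothetical, and raising ValueError is just one way to handle the mismatch. The comparison is only as strict as the output strftime produces for each directive in the format.

```python
from datetime import datetime

def strict_strptime(date_string, fmt):
    """Parse date_string, then require that re-formatting reproduces it exactly."""
    parsed = datetime.strptime(date_string, fmt)
    if parsed.strftime(fmt) != date_string:
        raise ValueError(f"{date_string!r} does not match format {fmt!r} exactly")
    return parsed
```

With the format '%m/%d/%Y', '01/01/2001' passes, while '1/1/2001' parses but re-formats as '01/01/2001' and is therefore rejected.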

Related

What does colon mean in Python string formatting?

I am learning the Python string format() method. Though I understand that {} is a placeholder for arguments, I am not sure what : represents in the following code snippet from the Programiz tutorial:
import datetime
# datetime formatting
date = datetime.datetime.now()
print("It's now: {:%Y/%m/%d %H:%M:%S}".format(date))
# custom __format__() method
class Person:
    def __format__(self, format):
        if(format == 'age'):
            return '23'
        return 'None'
print("Adam's age is: {:age}".format(Person()))
Why is there a : in front of %Y in print("It's now: {:%Y/%m/%d...? The code outputs It's now: 2021, and there is no : in front of 2021.
Why is there a : in front of age in print("Adam's age is: {:age}...?
Thanks in advance for your valuable input!!
Everything after : is a parameter to the __format__() method of the class of the corresponding argument. For instance, for a number you can write {:.2f} to format it as a decimal number with 2 digits of precision after the decimal point.
For a datetime value, it's a format string that could be used with datetime.strftime().
And in your Person class, it will be passed as the format argument to Person.__format__(). So if you don't put :age there, the if condition will fail and it will print None instead of 23.
Python objects decide for themselves how they should be formatted using the __format__ method. Mostly we just use the defaults that come with the basic types, but much like __str__ and __repr__ we can customize. The stuff after the colon : is the parameter to __format__.
>>> class Foo:
...     def __format__(self, spec):
...         print(repr(spec))
...         return "I will stubbornly refuse your format"
...
>>> f = Foo()
>>> print("Its now {:myformat}".format(f))
'myformat'
Its now I will stubbornly refuse your format
We can also call the formatter ourselves. datetime uses the strftime format rules.
>>> import datetime
>>> # datetime formatting
>>> date = datetime.datetime.now()
>>> print("It's now: {:%Y/%m/%d %H:%M:%S}".format(date))
It's now: 2021/10/04 11:12:23
>>> date.__format__("%Y/%m/%d %H:%M:%S")
'2021/10/04 11:12:23'
Your custom Person class implemented __format__ and used the format specifier after the colon to return a value.
Try f-strings, where the colon reads more naturally. It separates the expression from its formatting options:
import datetime
# datetime formatting
date = datetime.datetime.now()
print(f"It's now: {date:%Y/%m/%d %H:%M:%S}")
# custom __format__() method
class Person:
    def __format__(self, format):
        if(format == 'age'):
            return '23'
        return 'None'
print(f"Adam's age is: {Person():age}")
Btw you can have similar functionality with keyword arguments to format():
print("It's now: {d:%Y/%m/%d %H:%M:%S}".format(d=date))
print("Adam's age is: {adam:age}".format(adam=Person()))
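To make the point concrete, here is a small sketch showing that str.format(), f-strings, the built-in format() function, and a direct __format__() call all hand the same text after the colon to the object. The fixed date and the shortened Person class are assumptions chosen so the results are reproducible.

```python
import datetime

class Person:
    def __format__(self, spec):
        # spec is whatever followed the colon; it is '' when there is no colon.
        return "23" if spec == "age" else "None"

date = datetime.datetime(2021, 10, 4, 11, 12, 23)

# Four spellings of the same call; each passes "%Y/%m/%d" as the format spec.
via_str_format = "{:%Y/%m/%d}".format(date)
via_f_string = f"{date:%Y/%m/%d}"
via_builtin = format(date, "%Y/%m/%d")
via_dunder = date.__format__("%Y/%m/%d")

with_spec = "{:age}".format(Person())   # spec is "age"
without_spec = "{}".format(Person())    # spec is ""
```

Note that the colon itself is never passed along; it only separates the field from its spec.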

How to print only if specific time starts with x in Python

Hi, I'm a newbie learning Python and I want to print something only if the current time starts with x (for example, if the current time is 4:18 and so starts with 4, print "hi"). This is the code I wrote; it raises an AttributeError:
import datetime
local = datetime.datetime.now().time().replace(microsecond=0)
if local.startswith('16'):
    print("Hi! It's ", local)
The .replace() call returns a time object. time objects don't have a .startswith() method; that method exists only on str.
Try converting your time to a string first:
if str(local).startswith('16'):
    print("Hi! It's ", local)
The documentation lists all of the methods available on a time object.
You need to first convert it to a string, as datetime objects have no startswith() method. Use strftime, example:
import datetime
t = datetime.datetime(2012, 2, 23, 0, 0)
t2 = t.strftime('%m/%d/%Y')
will yield:
'02/23/2012'. Once it's converted, you can use t2.startswith().
https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
You can get the hour of the time and check if it is 16:
if local.hour == 16:
    print("Hi! It's ", local)
If you need to use startswith() then you can convert it to a string like this:
if str(local).startswith('16'):
    print("Hi! It's ", local)
That's not a good approach, though. Checking the hour as an int is the better solution here.
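Note that str.replace() takes two required string arguments, but the replace() called here is time.replace(), which accepts keyword arguments such as microsecond=0. The AttributeError comes from startswith(), not from replace().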
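A short sketch of why the integer comparison is the more robust choice. It uses a fixed time instead of now() so the values are reproducible; the variable names are illustrative.

```python
from datetime import time

local = time(4, 18, 0)

# str() zero-pads the hour, so a string prefix test has to account for that.
as_text = str(local)                      # '04:18:00'
starts_with_4 = as_text.startswith("4")   # False, because of the leading zero
starts_with_04 = as_text.startswith("04") # True

# Comparing the hour attribute avoids the padding question entirely.
hour_is_4 = local.hour == 4               # True
```

The string test also depends on how the time happens to be rendered, whereas hour is always a plain int in the range 0 to 23.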

Creating new conversion specifier in Python

In Python we have conversion specifiers like
'{0!s}'.format(10)
which prints
'10'
How can I make my own conversion specifiers like
'{0!d}'.format(4561321)
which prints integers in the following format
4,561,321
Or converts it into binary like
'{0!b}'.format(2)
which prints
10
What are the classes I need to inherit and which functions I need to modify? If possible please provide a small example.
Thanks!!
What you want to do is impossible, because built-in types cannot be modified and literals always refer to built-in types.
There is a special method to handle the formatting of values, __format__. However, it only handles the format spec, not the conversion specifier: you can customize how {0:d} is handled but not how {0!d} is. The only conversions that work with ! are s and r (plus a in Python 3).
Note that d and b already exist as format specifiers:
>>> '{0:b}'.format(2)
'10'
In any case you could implement your own class that handles formatting:
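class MyInt:
    def __init__(self, value):
        self.value = value

    def __format__(self, fmt):
        if fmt == 'd':
            text = list(str(self.value))
        elif fmt == 'b':
            text = list(bin(self.value)[2:])
        else:
            # Any other spec is delegated to int's own formatting.
            return format(self.value, fmt)
        # Insert a comma before every group of three digits, counting from the right.
        for i in range(len(text)-3, 0, -3):
            text.insert(i, ',')
        return ''.join(text)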
Used as:
>>> '{0:d}'.format(MyInt(5000000))
'5,000,000'
>>> '{0:b}'.format(MyInt(8))
'1,000'
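The restriction on ! applies to str.format() itself. If calling a different entry point is acceptable, the standard library's string.Formatter class exposes a convert_field() hook that receives the character after !. The sketch below is one possible approach, not part of the answer above; the class name GroupingFormatter and the meanings given to d and b are assumptions taken from the question.

```python
import string

class GroupingFormatter(string.Formatter):
    """Formatter that understands two extra conversions, !d and !b."""

    def convert_field(self, value, conversion):
        if conversion == "d":
            # Decimal with thousands separators.
            return format(value, ",")
        if conversion == "b":
            # Plain binary digits.
            return format(value, "b")
        # Fall back to the standard conversions (s, r, a, or none).
        return super().convert_field(value, conversion)

fmt = GroupingFormatter()
grouped = fmt.format("{0!d}", 4561321)
binary = fmt.format("{0!b}", 2)
```

The trade-off is that templates must go through fmt.format(...) rather than the str.format() method, so string literals elsewhere in a program are unaffected.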
Rather than writing your own, try to use the format specifiers already built into Python. You can use:
'{0:b}'.format(2) # for binary
'{0:d}'.format(2) # for integer
'{0:x}'.format(2) # for hexadecimal
'{0:f}'.format(2) # for float
'{0:e}'.format(2) # for exponential
Please refer to https://docs.python.org/2/library/string.html#formatspec for more.
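For the specific outputs asked about in the question, the built-in format spec already has a grouping option. A brief sketch follows; the underscore option assumes Python 3.6 or later.

```python
n = 4561321

with_commas = "{0:,}".format(n)        # thousands separated by commas
with_underscores = "{0:_}".format(n)   # same grouping, with underscores
binary = "{0:b}".format(2)             # binary digits
grouped_binary = "{0:_b}".format(255)  # binary, grouped in fours
```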

How to check if a datetime object is localized with pytz?

I want to store a datetime object with a localized UTC timezone. The method that stores the datetime object can be given a non-localized datetime (naive) object or an object that already has been localized. How do I determine if localization is needed?
Code with missing if condition:
class MyClass:
    def set_date(self, d):
        # what do i check here?
        # if(d.tzinfo):
        self.date = d.astimezone(pytz.utc)
        # else:
        self.date = pytz.utc.localize(d)
How do I determine if localization is needed?
From datetime docs:
a datetime object d is aware iff:
d.tzinfo is not None and d.tzinfo.utcoffset(d) is not None
d is naive iff:
d.tzinfo is None or d.tzinfo.utcoffset(d) is None
However, if d is a datetime object that represents a time in UTC, then you can use the following in both cases:
self.date = d.replace(tzinfo=pytz.utc)
It works regardless of whether d is timezone-aware or naive.
Note: don't use the datetime.replace() method with a timezone that has a non-fixed UTC offset (it is OK with UTC, but otherwise you should use the tz.localize() method).
If you want to check whether a datetime object d is localized, check d.tzinfo; if it is None, the object is not localized.
Here is a function wrapping up the top answer.
def tz_aware(dt):
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None
Here's a more complete function to convert or coerce a timestamp obj to utc. If it reaches the exception this means the timestamp is not localized. Since it's good practice to always work in UTC within the code, this function is very useful at the entry level from persistence.
def convert_or_coerce_timestamp_to_utc(timeobj):
    out = timeobj
    try:
        out = timeobj.astimezone(pytz.utc)  # aware object can be in any timezone
    except (ValueError, TypeError) as exc:  # naive
        out = timeobj.replace(tzinfo=pytz.utc)
    return out
The small addition from the 'try catch' in the answer by J.F. Sebastian is the additional catch condition, without which not all naive cases will be caught by the function.
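One caveat with relying on the exception: since Python 3.6, astimezone() no longer raises for a naive datetime and instead treats it as system local time, so an explicit awareness check may be safer there. The sketch below uses that check; it swaps pytz.utc for the standard library's datetime.timezone.utc so that it runs without third-party packages, and the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

def is_aware(dt):
    """Awareness test as defined in the datetime documentation."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None

def to_utc(dt):
    """Convert an aware datetime to UTC; treat a naive one as already being UTC."""
    if is_aware(dt):
        return dt.astimezone(timezone.utc)
    return dt.replace(tzinfo=timezone.utc)

naive = datetime(2020, 1, 1, 12, 0)
aware = datetime(2020, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))

naive_as_utc = to_utc(naive)   # same wall-clock time, now tagged as UTC
aware_as_utc = to_utc(aware)   # shifted by two hours to 10:00 UTC
```

Treating naive input as UTC is an assumption that has to match how the data was produced; if naive values are actually local times, they need to be localized with the right zone first.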

Method for guessing type of data currently represented as strings

I'm currently parsing CSV tables and need to discover the "data types" of the columns. I don't know the exact format of the values. Obviously, everything that the CSV parser outputs is a string. The data types I am currently interested in are:
integer
floating point
date
boolean
string
My current thoughts are to test a sample of rows (maybe several hundred?) in order to determine the types of data present through pattern matching.
I am particularly concerned about the date data type - is there a Python module for parsing common date idioms (obviously I will not be able to detect them all)?
What about integers and floats?
ast.literal_eval() can get the easy ones.
Dateutil comes to mind for parsing dates.
For integers and floats you could always try a cast in a try/except section
>>> f = "2.5"
>>> i = "9"
>>> ci = int(i)
>>> ci
9
>>> cf = float(f)
>>> cf
2.5
>>> g = "dsa"
>>> cg = float(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'dsa'
>>> try:
...     cg = float(g)
... except ValueError:
...     print("g is not a float")
...
g is not a float
The data types I am currently interested in are...
These do not exist in a CSV file. The data is only strings. Only. Nothing more.
test a sample of rows
Tells you nothing except what you saw in the sample. The next row after your sample can be a string which looks entirely different from the sampled strings.
The only way you can process CSV files is to write CSV-processing applications that assume specific data types and attempt conversion. You cannot "discover" much about a CSV file.
If column 1 is supposed to be a date, you'll have to look at the string and work out the format. It could be anything: a number, or a typical Gregorian date in US or European format (there's no way to know whether 1/1/10 is US or European).
try:
    x = datetime.datetime.strptime(row[0], some_format)  # some_format: the format you expect for this column
except ValueError:
    pass  # column is not valid.
If column 2 is supposed to be a float, you can only do this.
try:
    y = float(row[1])
except ValueError:
    pass  # column is not valid.
If column 3 is supposed to be an int, you can only do this.
try:
    z = int(row[2])
except ValueError:
    pass  # column is not valid.
There is no way to "discover" if the CSV has floating-point digit strings except by doing float on each row. If a row fails, then someone prepared the file improperly.
Since you have to do the conversion to see if the conversion is possible, you might as well simply process the row. It's simpler and gets you the results in one pass.
Don't waste time analyzing the data. Ask the folks who created it what's supposed to be there.
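If the creators of the file cannot be asked, the try-each-conversion idea above can still be applied to a whole column at once. The following is only a sketch under stated assumptions: the list of date formats, the ordering of converters, and the names parse_date and column_type are all hypothetical, and the result says nothing about rows that were not examined.

```python
from datetime import datetime

# Hypothetical set of accepted date formats; adjust to the data at hand.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(s):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"not a recognised date: {s!r}")

# Ordered from most to least specific; str accepts everything.
CONVERTERS = (("int", int), ("float", float), ("date", parse_date), ("str", str))

def column_type(values):
    """Return the name of the first converter that accepts every value."""
    for name, convert in CONVERTERS:
        try:
            for v in values:
                convert(v)
        except ValueError:
            continue  # at least one value failed; try the next converter
        return name
    return "str"
```

For example, column_type(["1", "2.5"]) falls through int and settles on "float", while a column containing any free text ends up as "str".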
You may be interested in this python library which does exactly this kind of type guessing on both general python data and CSVs and XLS files:
https://github.com/okfn/messytables
https://messytables.readthedocs.org/ - docs
It happily scales to very large files, to streaming data off the internet etc.
There is also an even simpler wrapper library that includes a command line tool named dataconverters: http://okfnlabs.org/dataconverters/ (and an online service: https://github.com/okfn/dataproxy!)
The core algorithm that does the type guessing is here: https://github.com/okfn/messytables/blob/7e4f12abef257a4d70a8020e0d024df6fbb02976/messytables/types.py#L164
We tested ast.literal_eval(), but recovering from its errors is pretty slow. If you want to cast data that you receive entirely as strings, I think a regex would be faster.
Something like the following worked very well for us.
import datetime
import re

def guess_type(s):
    """Helper function to detect the appropriate type for a given string."""
    if s == "":
        return None
    elif re.match(r"\A[0-9]+\.[0-9]+\Z", s):
        return float
    elif re.match(r"\A[0-9]+\Z", s):
        return int
    # 2019-01-01 or 01/01/2019 or 01/01/19
    elif re.match(r"\A[0-9]{4}-[0-9]{2}-[0-9]{2}\Z", s) or \
         re.match(r"\A[0-9]{2}/[0-9]{2}/([0-9]{2}|[0-9]{4})\Z", s):
        return datetime.date
    elif re.match(r"\A(true|false)\Z", s):
        return bool
    else:
        return str
Tests:
assert guess_type("") is None
assert guess_type("this is a string") == str
assert guess_type("0.1") == float
assert guess_type("true") == bool
assert guess_type("1") == int
assert guess_type("2019-01-01") == datetime.date
assert guess_type("01/01/2019") == datetime.date
assert guess_type("01/01/19") == datetime.date
