Avoiding the mistake of comparing number and string - python

We've all made this kind of mistake in Python:
if ( number < string ):
Python 2 silently accepts this and just gives incorrect output.
Thank goodness Python 3 finally raises a TypeError for it. But in some cases Python 2.7 is needed. Is there any way in Python 2.7 to guard against this mistake other than "just be careful" (which we all know doesn't work 100% of the time)?

You could explicitly convert both values to int. The string will get converted, and the number won't be affected (it's already an int). This saves you from having to remember what type each value holds:
a = 11
b = "2"
print a > b # prints False, which isn't what you intended
print int(a) > int(b) # prints True
EDIT:
As noted in the comments, you cannot assume a number is an integer. However, applying the same train of thought with the proper function, float, should work just fine:
a = 11
b = "2"
print a > b # prints False, which isn't what you intended
print float(a) > float(b) # prints True
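If silently converting is not what you want, another option is a small helper that refuses mixed comparisons outright. The sketch below is only an illustration: the name checked_less_than is made up here, and it assumes that anything which is not a numbers.Real should be rejected rather than converted.

```python
import numbers


def checked_less_than(a, b):
    """Return a < b, but raise TypeError unless both operands are real numbers.

    Hypothetical helper, not part of any library.
    """
    for value in (a, b):
        if not isinstance(value, numbers.Real):
            raise TypeError("expected a number, got %r" % (value,))
    return a < b


print(checked_less_than(2, 11))    # True
try:
    checked_less_than(11, "2")     # a str operand is rejected
except TypeError as exc:
    print("refused: %s" % exc)
```

The check has to be called explicitly at every comparison, so it reduces the risk rather than removing it.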

If you really, really want to be 100% sure that comparing strings and ints is impossible, you can overload the __builtin__.int (and __builtin__.float, etc. as necessary) method to disallow comparing ints (and floats, etc) with strings. It would look like this:
import __builtin__
_real_int = int  # keep a reference to the original type before it is replaced
class no_str_cmp_int(_real_int):
    def __lt__(self, other):
        if type(other) is str:
            raise TypeError
        # Python 2's int has no __lt__ to delegate to, so compare the plain int value
        return _real_int(self) < other
    def __gt__(self, other):
        if type(other) is str:
            raise TypeError
        return _real_int(self) > other
    # implement __ge__, __le__ and others as necessary
# replace the builtin int type to disallow string comparisons
__builtin__.int = no_str_cmp_int
x = int(10)
Then, if you attempted to do something like this, you'd receive this error:
>>> print x < '15'
Traceback (most recent call last):
File "<pyshell#15>", line 1, in <module>
print x < '15'
File "tmp.py", line 7, in __lt__
raise TypeError
TypeError
There is a major caveat to this approach, though. It only replaces the int function, so every time you created an int, you'd have to pass it through the function, as I do in the declaration of x above. Literals will continue to be the original int type, and as far as I am aware there is no way to change this. However, if you properly create these objects, they will continue to work with the 100% assurance you desire.

Just convert the string (or any other data type) to float first.
Values can only be compared meaningfully when they have the same data type.
Suppose,
a = "10"
b = 9.3
c = 9
We want to add a, b and c. The correct way is to convert all three to the same data type and then add:
a = float(a)
b = float(b)
c = float(c)
print a+b+c

You can check if each variable is an int like this:
if isinstance(number, int) and isinstance(string, int):
    if number < string:
        pass  # do something
    else:
        pass  # do something else
else:
    print "NaN"
Edit:
To check for a float too, the condition should be:
if isinstance(number, (int, float)) and isinstance(string, (int, float)):

Related

Strange behaviour when comparing Timestamp and datetime64 in Python2.7

Has anyone encountered a case like the one below, where a is a Timestamp and b is a datetime64, and comparing a < b works but b < a raises an error?
If a can be compared to b, I thought we should be able to compare the other way around?
For example (Python 2.7):
>>> a
Timestamp('2013-03-24 05:32:00')
>>> b
numpy.datetime64('2013-03-23T05:33:00.000000000')
>>> a < b
False
>>> b < a
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "pandas\_libs\tslib.pyx", line 1080, in pandas._libs.tslib._Timestamp.__richcmp__ (pandas\_libs\tslib.c:20281)
TypeError: Cannot compare type 'Timestamp' with type 'long'
Many thanks in advance!
That's an interesting question. I've done some digging around and did my best to explain some of this, although one thing I still don't get is why pandas throws the error instead of numpy when we do b < a.
Regarding your question:
If a can be compared to b, I thought we should be able to compare the other way around?
That's not necessarily true. It just depends on the implementation of the comparison operators.
Take this test class for example:
class TestCom(int):
    def __init__(self, a):
        self.value = a
    def __gt__(self, other):
        print('TestComp __gt__ called')
        return True
    def __eq__(self, other):
        return self.value == other
Here I have defined my __gt__ (>) method to always return True no matter what the other value is, while __eq__ (==) behaves as usual.
Now check the following comparisons out:
a = TestCom(9)
print(a)
# Output: 9
# my def of __gt__
a > 100
# Output: TestComp __gt__ called
# True
a > '100'
# Output: TestComp __gt__ called
# True
'100' > a
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-486-8aee1b1d2500> in <module>()
1 # this will not use my def of __gt__
----> 2 '100' > a
TypeError: '>' not supported between instances of 'str' and 'TestCom'
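To make the asymmetry concrete, here is a minimal sketch, assuming Python 3 (Python 2 falls back to an arbitrary default ordering instead of raising). The class name AlwaysGreater is made up for this illustration. When the left operand does not know how to compare, Python tries the reflected method on the right operand, so '100' < a ends up calling a.__gt__, while '100' > a needs a.__lt__, which is not defined.

```python
class AlwaysGreater(object):
    """Hypothetical class that defines only __gt__."""

    def __gt__(self, other):
        return True


a = AlwaysGreater()

# Direct call to AlwaysGreater.__gt__.
forward = a > "100"

# str.__lt__ returns NotImplemented, so Python tries the reflected a.__gt__("100").
reflected = "100" < a

# str.__gt__ returns NotImplemented and the reflected a.__lt__ is not implemented either.
try:
    "100" > a
    raised = False
except TypeError:
    raised = True

print(forward, reflected, raised)
```

So whether a comparison works in both directions depends on which side implements which method, not on the operand types alone.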
So, going back to your case: looking at the timestamps_sourceCode, the only thing I can think of is that pandas.Timestamp does some type checking and conversion where possible.
When we compare a with b (pd.Timestamp against np.datetime64), the Timestamp.__richcmp__ function does the comparison: if the other operand is of type np.datetime64, it converts it to pd.Timestamp and then compares.
# we can do the following to have a comparison of say b > a
# this converts a to np.datetime64 - .asm8 is equivalent to .to_datetime64()
b > a.asm8
# or we can convert b to datetime64[ms]
b.astype('datetime64[ms]') > a
# or convert to timestamp
pd.to_datetime(b) > a
I thought the issue was that the Timestamp had no nanoseconds. What I found surprising is that even if you give it nanoseconds, as below, the comparison between np.datetime64 and pd.Timestamp still fails.
a = pd.Timestamp('2013-03-24 05:32:00.00000001')
a.nanosecond # returns 10
# doing the comparison again where they're both ns still fails
b < a
Looking at the source code, it seems like we can use the == and != operators. But even they don't work as expected. Take a look at the following example:
a = pd.Timestamp('2013-03-24 05:32:00.00000000')
b = np.datetime64('2013-03-24 05:32:00.00000000', 'ns')
b == a # returns False
a == b # returns True
I think this is the result of lines 149-152 or 163-166, where they return False if you're using == and True for !=, without actually comparing the values.
Edit:
The nanosecond feature was added in version 0.23.0, so you can do something like pd.Timestamp('2013-03-23T05:33:00.000000022', unit='ns'). So yes, when you compare against an np.datetime64 it will be converted to a pd.Timestamp with ns precision.
Just note that pd.Timestamp is supposed to be a replacement for Python's datetime:
Timestamp is the pandas equivalent of python's Datetime
and is interchangeable with it in most cases.
But Python's datetime doesn't support nanoseconds; there is a good answer explaining why here: SO_Datetime. pd.Timestamp supports comparison between the two even if your Timestamp has nanoseconds in it. When you compare a datetime object against a pd.Timestamp object with ns, they have _compare_outside_nanorange, which will do the comparison.
Going back to np.datetime64, one thing to note, as explained nicely in this post SO, is that it's a wrapper on an int64 type. So it's not surprising what happens if I do the following:
1 > a
a > 1
Both will throw the error Cannot compare type 'Timestamp' with type 'int'.
So under the hood, when you do b > a, the comparison must be done at the int level. This comparison will be done by the np.greater() function (np.greater); also take a look at ufunc_docs.
Note: I'm unable to confirm this, the numpy docs are too complex to go through. If any numpy experts can comment on this, that'll be helpful.
If this is the case, and the comparison of np.datetime64 is based on int, then the example above with a == b and b == a makes sense. When we do b == a, we compare the int value of b against a pd.Timestamp, which will always return False for == and True for !=.
It's the same as doing, say, 123 == '123': this operation will not fail, it will just return False.

python casting variables with type as argument

Is there a casting function which takes both the variable and type to cast to? Such as:
cast_var = cast_fn(original_var, type_to_cast_to)
I want to use it as an efficient way to cycle through a list of input parameters parsed from a string, some will need to be cast as int, bool, float etc...
All Python types are callable
new_val = int(old_val)
so no special function is needed. Indeed, what you are asking for is effectively just an apply function
new_val = apply(int, (old_val,))
which exists in Python 2, but was removed from Python 3 as it was never necessary; any expression that could be passed as the first argument to apply can always be used with the call "operator":
apply(expr, args) == expr(*args)
Short answer:
This works:
>>> t = int
>>> s = "9"
>>> t(s)
9
Or a full example:
def convert(type_id, value):
    return type_id(value)
convert(int, "3") # --> 3
convert(str, 3.0) # --> '3.0'
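For the use case in the question (cycling through a list of parameters parsed from a string), the same idea can be sketched by pairing each value with the callable that should convert it. The names cast_all and parse_bool are made up for this illustration. Note that bool is the one type where calling the type directly is a trap: bool("false") is True, because any non-empty string is truthy, so the sketch assumes a small custom parser for booleans.

```python
def parse_bool(text):
    """Hypothetical helper: bool("false") is True, so booleans need their own parser."""
    return text.strip().lower() in ("1", "true", "yes")


def cast_all(values, casters):
    """Apply the i-th callable in casters to the i-th value."""
    return [caster(value) for caster, value in zip(casters, values)]


raw_params = "3,2.5,true,hello".split(",")
print(cast_all(raw_params, [int, float, parse_bool, str]))
# [3, 2.5, True, 'hello']
```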

Change Letters to numbers (ints) in python

This might be python 101, but I am having a hard time changing letters into a valid integer.
To put what I am trying to do simply:
char >> [ ] >> int
I created a case statement to give me a number depending on certain characters, so what I tried doing was
def char_to_int(sometext):
    return {
        'Z':1,
        'Y':17,
        'X':8,
        'w':4,
    }.get(sometext, '')
Which converts the letter into a number, but when I try using that number into any argument that takes ints it doesn't work.
I've tried
text_number = int(sometext)
But I get the message TypeError: int() argument must be a string or a number, not 'function'
So from there I returned the type of sometext using
print(type(sometext))
And the return type is a function.
So my question is: is there a better way to convert letters into numbers, or a better way to set up my switch/def statement?
Here's the full code where it's called:
if sometext:
    for i in range ( 0, len(sometext)):
        char_to_int(sometext[i])
I've managed to get it working. Ultimately what I changed was the default of the definition; I now assign the function's result to a variable first instead of calling it inside another function call, and I recoded the section where I was using it.
Originally my definition looked like this
def char_to_int(sometext):
    return {
        ...
    }.get(sometext, '')
But I changed the default to 0, so now it looks like
def char_to_int(sometext):
    return {
        ...
    }.get(sometext, 0)
The old code that called the definition looked like this
if sometext:
    for i in range ( 0, len(sometext)):
        C_T_I = int(char_to_int(sometext[i]))
I changed it to this.
if sometext:
    for i in range ( 0, len(sometext)):
        C_T_I = char_to_int(sometext[i])
        TEXTNUM = int(C_T_I)
Hopefully this clarifies the changes. Thanks for everyone's assistance.
In the Python console:
>>> type({ 'Z':1, 'Y':17, 'X':8, 'w':4, }.get('X', ''))
<class 'int'>
so as cdarke suggested, you should look at how you are calling the function.
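As a rough sketch of the lookup approach discussed above, the table can live in a module-level dictionary so it is built only once, and the default can be an integer so the result never needs a further int() call. The names CHAR_VALUES and text_to_ints are made up for this illustration; the letter values are the ones from the question.

```python
# Letter values taken from the question; the constant name is hypothetical.
CHAR_VALUES = {'Z': 1, 'Y': 17, 'X': 8, 'w': 4}


def char_to_int(char, default=0):
    """Look up a single character, falling back to an integer default."""
    return CHAR_VALUES.get(char, default)


def text_to_ints(text):
    """Convert every character of text, in order."""
    return [char_to_int(char) for char in text]


print(text_to_ints("ZXq"))   # [1, 8, 0]
```

Iterating over the string directly also avoids the range(0, len(sometext)) indexing.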

writing odd number in a list python

This is part of my homework assignment and I'm close to the final answer, but not quite there yet. I need to write a function that writes the odd numbers between positions 1 and 5 in a list.
I wrote something like this:
-in a file named domain I wrote the condition for an odd number:
def oddNumber(x):
    """
    this function helps us write the odd numbers from the specified positions
    input: x - number
    output: - True if the number is odd
            - False otherwise
    """
    if x % 2==1:
        return True
    else:
        return False
-then the tests:
def testOdd_Number():
    testOdd_Number=[0,1,2,3,4,5,6,7,8]
    oddNumber(testOdd_Number,0,6)
    assert (testOdd_Number==[1,3,5])
    oddNumber(testOdd_Number,0,3)
    assert (testOdd_Number==[3])
-and in the other file, named userinterface, I wrote this:
elif(cmd.startswith("odd from ", "")):
    try:
        cmd=cmd.replace("odd from ", "")
        cmd=cmd.replace("to ", "")
        i=int(cmd[:cmd.find(" ")])
        j=int(cmd[cmd.find(" "):])
        if (i>=len(NumberList) or i>j or j>=len(NumberList) or i<0 or j<0):
            print("Invalid value(s).")
        else:
            for m in range(i-1,j):
                if oddNumber(NumberList[m]):
                    print (NumberList[m])
    except:
        print("Error!")
-when I run the entire project (I have more requirements, but the others work fine) and type odd from [pos] to [pos], it gives me
Traceback (most recent call last):
File "C:\Users\Adina\My Documents\LiClipse Workspace\P1\userinterface.py", line 94, in <module>
run()
File "C:\Users\Adina\My Documents\LiClipse Workspace\P1\userinterface.py", line 77, in run
elif(cmd.startswith("odd from ", "")):
TypeError: slice indices must be integers or None or have an __index__ method
I forgot to say that I also have a function main() where I print the requirements. Where am I going wrong?
Python's string startswith method, described here:
https://docs.python.org/2/library/stdtypes.html
states that the arguments are
some_string.startswith(prefix, beginning, end) #where beginning and end are optional integers
and you have provided a prefix and an empty string ( cmd.startswith("odd from ", "") )
Some things I noticed:
1) You can shorten your oddNumber function to
def oddNumber(x):
    return x%2
2) In your tests, you rebind the function's name testOdd_Number to a list, then pass that list to your oddNumber function. Is that the same function described above? Then it won't work, as this function expects a single integer to be passed.
Using the same name to refer to two different things is discouraged.
Actually, I have no idea what your test code does or should do. Are you passing a list and expecting oddNumber to modify it in place?
3) Your custom command parser looks... odd, and fragile. Maybe invest in a real parser?
You should decouple command parsing and actual computation.
As brainovergrow pointed out, that is also where your error is, since .startswith does not accept a string as its second argument.
Some general hints:
You can use list(range(9)) instead of hardcoding [0,1,2,3,4,5,6,7,8].
You can use filter to filter the odd numbers of a given list: list(filter(oddNumber, range(9))) yields [1, 3, 5, 7].
You can also use list comprehensions: [x for x in range(9) if x%2] yields the same.
You might find any() and all() useful. Take a look at them.
Your naming scheme is neither consistent nor Pythonic. Read PEP8 for a style guide.
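Putting the hints above together, one possible way to decouple the parsing from the computation is sketched below. The function names are made up for this illustration, and it assumes the command has the form "odd from I to J" with 1-based, inclusive positions, which is how the loop in the question (range(i-1, j)) treats them.

```python
def is_odd(x):
    return x % 2 == 1


def odd_between(values, first, last):
    """Return the odd values at positions first..last (1-based, inclusive)."""
    if first < 1 or last > len(values) or first > last:
        raise ValueError("invalid positions")
    return [x for x in values[first - 1:last] if is_odd(x)]


def parse_odd_command(cmd):
    """Parse 'odd from I to J' into (I, J); return None for other commands."""
    prefix = "odd from "
    if not cmd.startswith(prefix):
        return None
    # Raises ValueError if the rest is not exactly 'I to J' with integer I and J.
    first_text, last_text = cmd[len(prefix):].split(" to ")
    return int(first_text), int(last_text)


positions = parse_odd_command("odd from 1 to 5")
print(odd_between(list(range(9)), *positions))   # [1, 3]
```

Because odd_between returns a new list instead of printing, it can be tested with plain assertions, independently of the user interface.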

Method for guessing type of data currently represented as strings

I'm currently parsing CSV tables and need to discover the "data types" of the columns. I don't know the exact format of the values. Obviously, everything that the CSV parser outputs is a string. The data types I am currently interested in are:
integer
floating point
date
boolean
string
My current thoughts are to test a sample of rows (maybe several hundred?) in order to determine the types of data present through pattern matching.
I am particularly concerned about the date data type - is there a Python module for parsing common date idioms (obviously I will not be able to detect them all)?
What about integers and floats?
ast.literal_eval() can get the easy ones.
Dateutil comes to mind for parsing dates.
For integers and floats you could always try a cast in a try/except section
>>> f = "2.5"
>>> i = "9"
>>> ci = int(i)
>>> ci
9
>>> cf = float(f)
>>> cf
2.5
>>> g = "dsa"
>>> cg = float(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for float(): dsa
>>> try:
...     cg = float(g)
... except ValueError:
...     print "g is not a float"
...
g is not a float
>>>
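The try/except idea from the session above can be wrapped in a small function that tries the narrowest type first. This is a minimal sketch with a made-up name, guess_numeric_type; it assumes that anything int() rejects but float() accepts should count as a float, which also includes strings such as "1e5", "inf" and "nan".

```python
def guess_numeric_type(text):
    """Return int, float or str depending on which conversion accepts text."""
    for candidate in (int, float):
        try:
            candidate(text)
        except ValueError:
            continue
        return candidate
    return str


for sample in ("9", "2.5", "dsa"):
    print(sample, guess_numeric_type(sample).__name__)
```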
The data types I am currently interested in are...
These do not exist in a CSV file. The data is only strings. Only. Nothing more.
test a sample of rows
Tells you nothing except what you saw in the sample. The next row after your sample can be a string which looks entirely different from the sampled strings.
The only way you can process CSV files is to write CSV-processing applications that assume specific data types and attempt conversion. You cannot "discover" much about a CSV file.
If column 1 is supposed to be a date, you'll have to look at the string and work out the format. It could be anything: a number, or a typical Gregorian date in US or European format (there's no way to know whether 1/1/10 is US or European).
try:
    x = datetime.datetime.strptime(row[0], some_format)  # some_format: the format string you worked out
except ValueError:
    pass  # column is not valid.
If column 2 is supposed to be a float, you can only do this.
try:
    y = float(row[1])
except ValueError:
    pass  # column is not valid.
If column 3 is supposed to be an int, you can only do this.
try:
    z = int(row[2])
except ValueError:
    pass  # column is not valid.
There is no way to "discover" if the CSV has floating-point digit strings except by doing float on each row. If a row fails, then someone prepared the file improperly.
Since you have to do the conversion to see if the conversion is possible, you might as well simply process the row. It's simpler and gets you the results in one pass.
Don't waste time analyzing the data. Ask the folks who created it what's supposed to be there.
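If the format really cannot be confirmed with whoever produced the file, the strptime approach above can be extended to try a short list of candidate formats in order. This is only a sketch: the name parse_date and the list of formats are assumptions made for this illustration, and the order of the list silently decides ambiguous strings such as 01/02/2019, which is exactly the US-versus-European problem described above.

```python
import datetime

# Hypothetical candidate list; day-first is tried before month-first.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def parse_date(text, formats=CANDIDATE_FORMATS):
    """Return a datetime.date using the first format that accepts text."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError("no known date format matches %r" % (text,))


print(parse_date("2019-01-31"))
print(parse_date("01/02/2019"))   # read as 1 February, because day-first comes first
```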
You may be interested in this python library which does exactly this kind of type guessing on both general python data and CSVs and XLS files:
https://github.com/okfn/messytables
https://messytables.readthedocs.org/ - docs
It happily scales to very large files, to streaming data off the internet etc.
There is also an even simpler wrapper library that includes a command line tool named dataconverters: http://okfnlabs.org/dataconverters/ (and an online service: https://github.com/okfn/dataproxy!)
The core algorithm that does the type guessing is here: https://github.com/okfn/messytables/blob/7e4f12abef257a4d70a8020e0d024df6fbb02976/messytables/types.py#L164
We tested ast.literal_eval(), but rescuing from errors is pretty slow. If you want to cast data that you receive entirely as strings, I think a regex would be faster.
Something like the following worked very well for us.
import datetime
import re
def guess_type(s):
    """Helper function to detect the appropriate type for a given string."""
    if s == "":
        return None
    elif re.match(r"\A[0-9]+\.[0-9]+\Z", s):
        return float
    elif re.match(r"\A[0-9]+\Z", s):
        return int
    # 2019-01-01 or 01/01/2019 or 01/01/19
    elif re.match(r"\A[0-9]{4}-[0-9]{2}-[0-9]{2}\Z", s) or \
            re.match(r"\A[0-9]{2}/[0-9]{2}/([0-9]{2}|[0-9]{4})\Z", s):
        return datetime.date
    elif re.match(r"\A(true|false)\Z", s):
        return bool
    else:
        return str
Tests:
assert guess_type("") is None
assert guess_type("this is a string") == str
assert guess_type("0.1") == float
assert guess_type("true") == bool
assert guess_type("1") == int
assert guess_type("2019-01-01") == datetime.date
assert guess_type("01/01/2019") == datetime.date
assert guess_type("01/01/19") == datetime.date
