python: string comparison do not work as expected - python

My program is like this:
filename=sys.argv[1]
print "filename is default?", (filename is "default")
if (filename is "default"):
filename="..."
readfile(filename)
I type python ....py default in the command line. Then the output is:
filename is default? False
IOError:...No such file or directory 'default'.
I use pdb, and before the if statement excutes, p filename returns: 'default'.

Use == two compare whether two strings are equal.
Use is to test whether it is the same string.

This is what you're looking for:
if filename == "default" :
The == operator is used for comparison, whilst the is operator tests if two variables point to the same object, not if two variables have the same value.

Short answer:
if filename == "default" :
Long answer:
is checks for object identity. To check for equality, use ==. Check the Python documentation on comparisons. In your case:
Note that comparing two string constants with is will actually return true.
def f():
a = "foo"
b = "foo"
print(a is b) # True, because a and b refer to the same constant
x = "f" + "oo"
print(a is x) # True, because the addition is optimized away
y = "f"
z = y + "oo" #
print(a is z) # False, because z is actually a different object
You can see what happens under the hood by disassembling the CPython byte code:
>>> import dis
>>> dis.dis(f)
2 0 LOAD_CONST 1 ('foo')
3 STORE_FAST 0 (a)
3 6 LOAD_CONST 1 ('foo')
9 STORE_FAST 1 (b)
4 ...
5 28 LOAD_CONST 4 ('foo')
31 STORE_FAST 2 (x)
6 ...
7 50 LOAD_CONST 2 ('f')
53 STORE_FAST 3 (y)
8 56 LOAD_FAST 3 (y)
59 LOAD_CONST 3 ('oo')
62 BINARY_ADD
63 STORE_FAST 4 (z)
9 ...

Related

Why are the identities of two diffrent Integer Pyobjects same in VSCode? [duplicate]

This question already has answers here:
"is" operator behaves unexpectedly with integers
(11 answers)
Closed 6 months ago.
I have this code
l = 40000
m = 40000
print(type(l),type(m))
if l is m:
print("same")
else:
print("nope")
Taking reference from here I was hoping id to be different since the value are not falling in range of (-2,256). Please let me know if I am missing out on something
Myself using Python-3.8.3(32bit) on windows Platform
This is due to a simple optimization at the bytecode compilation stage; if the same constant appears in the same code more than once, it is created only once, and each use of the constant will refer to the same instance.
We can investigate the bytecode with the dis module:
>>> import dis
>>> dis.dis('l = 40000\nm=40000')
1 0 LOAD_CONST 0 (40000)
2 STORE_NAME 0 (l)
2 4 LOAD_CONST 0 (40000)
6 STORE_NAME 1 (m)
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
>>> dis.dis('l = 40000\nm=40001')
1 0 LOAD_CONST 0 (40000)
2 STORE_NAME 0 (l)
2 4 LOAD_CONST 1 (40001)
6 STORE_NAME 1 (m)
8 LOAD_CONST 2 (None)
10 RETURN_VALUE
Note that in the first case where both constants are 40000, the two LOAD_CONST operations both load constant #0, but in the second case, 40000 is constant #0 and 40001 is constant #1.
Also, you will usually get different results if you do this on separate lines in the REPL, since each line in the REPL is compiled and executed as a separate code object, so they cannot share constants:
>>> l = 50000
>>> m = 50000
>>> id(l)
140545755966480
>>> id(m)
140545755966448
But if you do both in one line in the REPL, the same instance of the constant is used again, because it's just one line compiled to one code object, so the constant can be shared:
>>> p = 60000; q = 60000
>>> id(p)
140545755966576
>>> id(q)
140545755966576

How is the __contains__ method of the list class in Python implemented?

Suppose I define the following variables:
mode = "access"
allowed_modes = ["access", "read", "write"]
I currently have a type checking statement which is
assert any(mode == allowed_mode for allowed_mode in allowed_modes)
However, it seems that I can replace this simply with
assert mode in allowed_modes
According to ThiefMaster's answer in Python List Class __contains__ Method Functionality, these two should be equivalent. Is this indeed the case? And how could I easily verify this by looking up Python's source code?
No, they're not equivalent. For example:
>>> mode = float('nan')
>>> allowed_modes = [mode]
>>> any(mode == allowed_mode for allowed_mode in allowed_modes)
False
>>> mode in allowed_modes
True
See Membership test operations for more details, including this statement:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
Python lists are defined in C code.
You may verify it by looking at the code in the repository:
static int
list_contains(PyListObject *a, PyObject *el)
{
Py_ssize_t i;
int cmp;
for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
Py_EQ);
return cmp;
}
It's fairly straight forward to see that this code loops over items in list and stop when first equality (Py_EQ) comparison between el and PyList_GET_ITEM(a, i) returns 1.
Not equivalent since the any requires an extra function call, a generator expression and things.
>>> mode = "access"
>>> allowed_modes =["access", "read", "write"]
>>>
>>> def f1():
... mode in allowed_modes
...
>>> def f2():
... any(mode == x for x in allowed_modes)
...
>>>
>>>
>>> import dis
>>> dis.dis
dis.dis( dis.disassemble( dis.disco( dis.distb(
>>> dis.dis(f1)
2 0 LOAD_GLOBAL 0 (mode)
3 LOAD_GLOBAL 1 (allowed_modes)
6 COMPARE_OP 6 (in)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> dis.dis(f2)
2 0 LOAD_GLOBAL 0 (any)
3 LOAD_CONST 1 (<code object <genexpr> at 0x7fb24a957540, file "<stdin>", line 2>)
6 LOAD_CONST 2 ('f2.<locals>.<genexpr>')
9 MAKE_FUNCTION 0
12 LOAD_GLOBAL 1 (allowed_modes)
15 GET_ITER
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>>
This is more instructive than the python source for the methods themselves but here is the source of __contains__ for lists and the loop is in C which will probably be faster than a Python loop.
Some timing numbers confirm this.
>>> import timeit
>>> timeit.timeit(f1)
0.18974408798385412
>>> timeit.timeit(f2)
0.7702703149989247
>>>

difference between 'if x not in dict' and 'if not x in dict' [duplicate]

This question already has answers here:
"x not in y" or "not x in y"
(6 answers)
Closed 9 years ago.
I've noticed that both of these work the same:
if x not in list and if not x in list.
Is there some sort of difference between the two in certain cases? Is there a reason for having both, or is it just because it's more natural for some people to write one or the other?
Which one am I more likely to see in other people's code?
The two forms make identical bytecode, as you can clearly verify:
>>> import dis
>>> dis.dis(compile('if x not in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>> dis.dis(compile('if not x in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
so obviously they're semantically identical.
As a matter of style, PEP 8 does not mention the issue.
Personally, I strongly prefer the if x not in y form -- that makes it immediately clear that not in is a single operator, and "reads like English". if not x in y may mislead some readers into thinking it means if (not x) in y, reads a bit less like English, and has absolutely no compensating advantages.
>>> dis.dis(lambda: a not in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
>>> dis.dis(lambda: not a in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
when you do "not a in b" it will need be converted for (not in)
so, the right way is "a not in b".
not x in L isn't explicitly disallowed because that would be silly. x not in L is explicitly allowed (though it compiles to the same bytecode) because it's more readable.
x not in L is what everyone uses, though.
When you write a not in b it is using the not in operator, whereas not a in b uses the in operator and then negates the result. But the not in operator is defined to return the same as not a in b so they do exactly the same thing. From the documentation:
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s.
Similarly there is a is not b versus not a is b.
The extra syntax was added because it makes it easier for a human to read it naturally.
It just personal preference. You could also compare if x != 3 and if not x == 3. There's no difference between the two expressions you've shown.

Python operator precedence - and vs greater than

I have a line of code in my script that has both these operators chained together. From the documentation reference BOOLEAN AND has a lower precedence than COMPARISON GREATER THAN. I am getting unexpected results here in this code:
>>> def test(msg, value):
... print(msg)
... return value
>>> test("First", 10) and test("Second", 15) > test("Third", 5)
First
Second
Third
True
I was expecting Second or Third test to happen before the fist one, since > operator has a higher precedence. What am I doing wrong here?
https://docs.python.org/3/reference/expressions.html#operator-precedence
Because you are looking at the wrong thing. call (or function call) takes higher precendence over both and as well as > (greater than) . So first function calls occur from left to right.
Python will get the results for all function calls before either comparison happens. The only thing that takes precendence over here would be short circuiting , so if test("First",10) returned False, it would short circuit and return False.
The comparisons and and still occur in the same precendence , that is first the result of test("Second", 15) is compared against test("Third", 5) (please note only the return values (the function call already occured before)) . Then the result of test("Second", 15) > test("Third", 5) is used in the and operation.
From the documentation on operator precedence -
One way to see what's happening is to look at exactly how Python is interpreting this result:
>>> x = lambda: test("First", 10) and test("Second", 15) > test("Third", 5)
>>> dis.dis(x)
1 0 LOAD_GLOBAL 0 (test)
3 LOAD_CONST 1 ('First')
6 LOAD_CONST 2 (10)
9 CALL_FUNCTION 2
12 JUMP_IF_FALSE_OR_POP 42
15 LOAD_GLOBAL 0 (test)
18 LOAD_CONST 3 ('Second')
21 LOAD_CONST 4 (15)
24 CALL_FUNCTION 2
27 LOAD_GLOBAL 0 (test)
30 LOAD_CONST 5 ('Third')
33 LOAD_CONST 6 (5)
36 CALL_FUNCTION 2
39 COMPARE_OP 4 (>)
>> 42 RETURN_VALUE
If you do the same for 10 and 15 > 5, you get:
>>> x = lambda: 10 and 15 > 5
>>> dis.dis(x)
1 0 LOAD_CONST 1 (10)
3 JUMP_IF_FALSE_OR_POP 15
6 LOAD_CONST 2 (15)
9 LOAD_CONST 3 (5)
12 COMPARE_OP 4 (>)
>> 15 RETURN_VALUE

"x not in" vs. "not x in" [duplicate]

This question already has answers here:
"x not in y" or "not x in y"
(6 answers)
Closed 9 years ago.
I've noticed that both of these work the same:
if x not in list and if not x in list.
Is there some sort of difference between the two in certain cases? Is there a reason for having both, or is it just because it's more natural for some people to write one or the other?
Which one am I more likely to see in other people's code?
The two forms make identical bytecode, as you can clearly verify:
>>> import dis
>>> dis.dis(compile('if x not in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>> dis.dis(compile('if not x in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
so obviously they're semantically identical.
As a matter of style, PEP 8 does not mention the issue.
Personally, I strongly prefer the if x not in y form -- that makes it immediately clear that not in is a single operator, and "reads like English". if not x in y may mislead some readers into thinking it means if (not x) in y, reads a bit less like English, and has absolutely no compensating advantages.
>>> dis.dis(lambda: a not in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
>>> dis.dis(lambda: not a in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
when you do "not a in b" it will need be converted for (not in)
so, the right way is "a not in b".
not x in L isn't explicitly disallowed because that would be silly. x not in L is explicitly allowed (though it compiles to the same bytecode) because it's more readable.
x not in L is what everyone uses, though.
When you write a not in b it is using the not in operator, whereas not a in b uses the in operator and then negates the result. But the not in operator is defined to return the same as not a in b so they do exactly the same thing. From the documentation:
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s.
Similarly there is a is not b versus not a is b.
The extra syntax was added because it makes it easier for a human to read it naturally.
It just personal preference. You could also compare if x != 3 and if not x == 3. There's no difference between the two expressions you've shown.

Categories

Resources