Why is eval('"\x27"') == eval('"\\x27"')? - python

I'm very confused by Python's eval():
I tried eval('"\x27"') == eval('"\\x27"') and it evaluates to True. Can somebody explain why this is the case? Both expressions evaluate to "'". I understand why eval('"\x27"') does (the string evaluated has a single character, which is an escaped hexadecimal representing an apostrophe), but shouldn't eval('"\\x27"') be equal to "\\x27"?
Secondly, adding to the confusion, if I set the following variables,
s = "\x27"
t = "\\x27"
then eval('s') is again "'", but eval('t') is "\\x27". Why is that?

According to the docs, eval "parses and evaluates the argument as a python expression". In other words, it applies the same processing that is applied if you write x = "foobar \n" inside a program or in IDLE. In this example, \n gets turned into a newline character (which, note, is not the same as the two characters \ and n).
If you typed x = "\x27" into IDLE, you'd get x == "'". The x27 is escaped because of the backslash and is thus changed during evaluation. If you escape the backslash, then x27 is not changed during evaluation. Instead, you simply get a string with a backslash followed by x27.
If you now evaluate that string again, only one backslash is left, and this time it does escape x27. Thus, it is changed to '.
Another way to look at this: in eval('"\x27"') the text is processed twice (once when the literal is parsed, once by eval()), but it only changes the first time, to "'". In eval('"\\x27"') it is also processed twice: first to "\x27", then to "'".
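A small sketch of the two rounds described above: printing the strings before they reach eval() shows what the first round (parsing the literal in the source) has already done. The variable names first and second are just for illustration.

```python
# Round 1: the Python parser processes the escapes in these literals.
first = '"\x27"'    # \x27 is expanded here: the text is "'" (3 characters)
second = '"\\x27"'  # \\ collapses to one backslash: the text is "\x27" (6 characters)

print(first)   # "'"
print(second)  # "\x27"

# Round 2: eval() parses each text again as a Python expression.
# The second text still contains the escape \x27, which is expanded now.
assert eval(first) == "'"
assert eval(second) == "'"
assert second == r'"\x27"'
```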
Here's an easier example to demonstrate how this works:
>>> x = "\"foobar\""
>>> x == "foobar"
False
>>> x == "\"foobar\""
True
>>> x = eval(x) # changes the value of x from "foobar" (with quotes) to just foobar; it is still a str
>>> x == "foobar"
True
>>> x == "\"foobar\""
False
Look at it like this: the right-hand side of y = "2" contains two components: the information that y should be of type string, expressed using the two ", and the value of that string, expressed by the character 2. The separation of these two aspects is done during evaluation of the code you write. The string object itself never sees the " during initialization.
So in the above example, after the first line, x is a str whose value is "foobar", quotes included. If you evaluate that again, the " are interpreted this time not as part of the value of x but as the marker of its type (string delimiters). So eval("\"foobar\"") transforms the string "foobar" (with quotes) into foobar (without quotes); written as Python source, those two values are "\"foobar\"" and "foobar" respectively.

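For '"\x27"', the backslash escape is expanded during parsing, so the string is literally '"\'"', i.e. an apostrophe between double quotes. In '"\\x27"', parsing only collapses the escaped backslash \\ into a single backslash, i.e. it is equal to r'"\x27"'.
A direct eval() on those strings adds another round of special-character expansion: in the first case there is nothing left to expand, and the quoted apostrophe evaluates to '. In the second case, eval() now sees the escape sequence \x27 and expands it to ', so both results are equal.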
When you use variable names, only one round of unescaping is performed, when you assign the values. eval('s') simply evaluates the name s and returns its value without further unescaping. If you want to emulate the first case, you need to evaluate the value of a variable rather than its name, and that value must itself be a complete string literal, quotes included: for example, with u = '"\\x27"', eval(u) == "'".
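A minimal sketch of the variable case from the question: eval('s') and eval('t') only look up the names, so no second round of unescaping happens. To get the second round, the string handed to eval() must itself be a complete string literal, quotes included; wrapping t in quotes, as done here, is just one way to build such a string.

```python
s = "\x27"   # parsed once: s is a single apostrophe
t = "\\x27"  # parsed once: t is backslash, 'x', '2', '7'

# Evaluating the *name* just returns the stored value unchanged.
assert eval('s') == "'"
assert eval('t') == "\\x27"

# Evaluating the *value* of t as a literal triggers a second round of unescaping.
literal = '"' + t + '"'   # the text "\x27", quotes included
assert eval(literal) == "'"
```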

Related

I want to distinguish between a true digit and a string digit

I want to check in advance whether a string x is a digit or not.
'1' is naturally a digit.
But I very often use ①, what is called a string number.
I don't know which of these string numbers Python judges to be digits.
'①'.isdigit() returns True.
'⑴'.isdigit() returns True.
'ⅰ'.isdigit() and 'Ⅰ'.isdigit() return False.
'㈠'.isdigit() returns False (the kanji version of (1)).
'❶' returns True.
I want to do something like this:
for s in data:
    if s.isdigit():
        int_ = int(s)
If I accept '①', int() raises a ValueError. For now, I use try/except for it.
Because I'm Japanese, I often use '①' or '⑴'.
How can I tell in advance whether a string that passes isdigit() will also be accepted by int()?
Should I rely on try/except, enumerate all such characters in advance, or use a regular expression?
The main problem is that I don't know what is judged to be a digit.
data = ["1", "23", "345", "①", "(1)", "(2)"]
This data is dynamic and changes every time.
Moreover, the variety of strings like these may grow in the future.
I would like every string for which isdigit() is True to be accepted by int().
It's not an urgent problem, since try/except works for now.
I believe that the str.isdecimal method fits your requirements. It excludes strings like '①', but includes other strings like '١' which are accepted by int.
>>> int('١')
1
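A sketch comparing the two checks on the characters from the question. The helper name to_int_or_none is hypothetical; it relies on str.isdecimal() accepting only Unicode decimal digits (category Nd), which int() can convert. ascii() is used for printing only so the output stays readable in any terminal encoding.

```python
samples = ["1", "23", "345", "①", "⑴", "❶", "ⅰ", "㈠", "١"]

for s in samples:
    # isdigit() also accepts digit-like symbols such as circled numbers;
    # isdecimal() accepts only characters usable in base-10 numbers.
    # ascii() escapes non-ASCII characters, e.g. '\u2460' for ①.
    print(ascii(s), s.isdigit(), s.isdecimal())

def to_int_or_none(s):
    """Convert s with int() only if every character is a decimal digit."""
    return int(s) if s.isdecimal() else None

print([to_int_or_none(s) for s in samples])
```

With this check, '①' and '⑴' are rejected up front instead of raising inside int(), while digits from other scripts such as '١' still convert.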

Wrong boolean result in simple expression

I am getting a strange result from this boolean expression in Python. I keep getting the wrong result.
string = '94070'
string[0:2] is '95' or string[0:2] is '94'
returns False, but when I hardcode in the value '94', it works
'94' is '95' or '94' is '94'
returns True. I've checked the data types and they are both of type 'str' so I'm not sure what is going on here.
Use == instead of is. In Python, the is operator does an object identity check. The == operator checks two objects (which may be different objects) to see whether they contain the same contents.
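Applied to the question's code, a corrected sketch compares values with ==; str.startswith() with a tuple of prefixes is an equivalent, arguably tidier, alternative.

```python
string = '94070'

# Value comparison: works regardless of whether the slice is a new object.
print(string[0:2] == '95' or string[0:2] == '94')  # True

# Same test without slicing: startswith accepts a tuple of candidate prefixes.
print(string.startswith(('95', '94')))  # True
```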
is is an identity test (is this the exact same object?), not an equality test. While is sometimes happens to work, as an implementation detail, for values that aren't logically singletons, it shouldn't be used like this; use value equality testing with ==.
Your test of '94' is '94' can work due to a couple of related possibilities:
Python often coalesces constant literals in a function (sometimes only on a single line)
String literals are often interned by Python, so the same string literal expressed anywhere in the code references a common copy of that string
When you slice off bits of a string, interning isn't involved, so the identity test fails.
Use is to see if two arguments refer to the same object and == to see if they have the same value.
>>> a = 'this is some text.'
>>> b = 'this is some text.'
>>> a == b
True
>>> a is b
False
>>> a = 'this is some text.'
>>> b = a
>>> a == b
True
>>> a is b
True

How to convert hexadecimal string to character with that code point?

I have the string x = '0x32' and would like to turn it into y = '\x32'.
Note that len(x) == 4 and len(y) == 1.
I've tried to use z = x.replace("0", "\\"), but that causes z = '\\x32' and len(z) == 4. How can I achieve this?
You do not have to make it that hard: you can use int(.., 16) to parse a hex string of the form 0x.... Next, simply use chr(..) to convert that number into the character with that Unicode code point (which, for values below 128, is the same as the ASCII code):
y = chr(int(x,16))
This results in:
>>> chr(int(x,16))
'2'
But \x32 is equal to '2' (you can look it up in the ASCII table):
>>> chr(int(x,16)) == '\x32'
True
and:
>>> len(chr(int(x,16)))
1
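In Python 2, you can also try this:
z = x[2:].decode('hex')
(In Python 3, str has no decode() method; for ASCII values, bytes.fromhex(x[2:]).decode() is the closest equivalent.)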
The ability to include code points like '\x32' inside a quoted string is a convenience for the programmer that only works in literal values inside the source code. Once you're manipulating strings in memory, that option is no longer available to you, but there are other ways of getting a character into a string based on its code point value.
Also note that '\x32' results in exactly the same string as '2'; it's just typed out differently.
Given a string containing a hexadecimal literal, you can convert it to its numeric value with int(str,16). Once you have a numeric value, you can convert it to the character with that code point via chr(). So putting it all together:
x = '0x32'
print(chr(int(x,16)))
#=> 2
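A small sketch wrapping the int()/chr() approach in a function, plus the reverse direction with ord() and hex(). The name hex_to_char is hypothetical.

```python
def hex_to_char(x: str) -> str:
    """Turn a hex literal such as '0x32' into the character with that code point."""
    # int() with base 16 accepts an optional '0x' prefix.
    return chr(int(x, 16))

y = hex_to_char('0x32')
print(y, len(y))      # 2 1
print(y == '\x32')    # True

# Reverse direction: character -> code point -> hex string.
print(hex(ord(y)))    # 0x32
```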

'is' operator behaves differently when comparing strings with spaces

I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.
Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea.
Well, at least for CPython 3.4/2.7.3, the answer is "no, it is not just the whitespace". The rules are:
Two string literals will share memory if they are either alphanumeric (strictly, consisting only of ASCII letters, digits and underscores) or reside in the same block (file, function, class or single interpreter command).
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
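The same-block rule can be reproduced without a separate file by compiling one block of source with compile() and running it with exec(). This is a sketch for CPython; whether x and y end up as the same object is exactly the implementation detail discussed here, so only equality is guaranteed.

```python
# Two identical non-alphanumeric literals inside one compiled block.
src = "x = '`!##$%^&*() >:?<a'\ny = '`!##$%^&*() >:?<a'\n"

namespace = {}
exec(compile(src, '<block>', 'exec'), namespace)

x, y = namespace['x'], namespace['y']
print(x == y)   # always True
print(x is y)   # implementation detail: typically True in CPython, which shares the constant
```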
For simple binary operations, the compiler does very simple constant folding (see peephole.c), but for strings it does so only if the resulting string is shorter than 21 characters. If that is the case, the rules mentioned earlier apply:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
To expand on Ignacio’s answer a bit: the is operator is the identity operator; it compares object identity. If you construct two objects with the same contents, object identity is usually not true. It works for some small strings because CPython, the reference implementation of Python, interns them: it keeps a single shared object for such strings, so the names all refer to the same object. So the is operator returns True for those.
However, this is an implementation detail of CPython and is generally not guaranteed, either for CPython or for any other implementation. So relying on it is a bad idea, as it can break at any time.
To compare strings, use the == operator, which compares objects for equality. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should generally be avoided unless you explicitly want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
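If you actually need equal strings to share one object, Python exposes interning explicitly through sys.intern(). A sketch: the runtime-built string b is equal to a but typically a distinct object; interning both returns one shared object.

```python
import sys

a = 'is it the space?'
b = ''.join(['is it ', 'the space?'])  # built at runtime, so usually a separate object

print(a == b)   # True: same characters
print(a is b)   # implementation detail; do not rely on it

# sys.intern() returns the canonical copy of an equal string,
# so interned equal strings are the same object.
a_i = sys.intern(a)
b_i = sys.intern(b)
print(a_i is b_i)  # True
```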
The is operator compares object identity, the same value the id function returns; ids are guaranteed to be unique among simultaneously existing objects. In CPython, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that CPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running Python 2.7.3 on a MacBook. Others might get entirely different results.
In fact, your code amounts to comparing the objects' ids (in CPython, their memory addresses). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But note that if the two literals appear directly in the same comparison, it works:
>>> id('is it the space?') == id('is it the space?')
True
In fact, within one expression, identical static strings are shared. But at the program scale, there's only sharing for word-like strings (so no spaces or punctuation).
You should not rely on this behavior as it's not documented anywhere and is a detail of implementation.
Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, so they share their memory reference. There have been posts about this phenomenon all over the internet since the 1990s. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case; I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add the same non-alphanumeric character to both strings, and they instantly become equal copies rather than shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.
The is operator compares the actual objects.
c is d should also be False. My guess is that Python makes some optimization, and in that case it is the same object.

Python: How to refer to a digit in a string by its index?

I feel like this is a simple question, but it keeps escaping me...
If I had a string, say, "1010101", how would I refer to the first digit in the string by its index?
You can get the first element of any sequence with [0]. Since a string is a sequence of characters, you're looking for s[0]:
>>> s = "1010101"
>>> s[0]
'1'
For a detailed explanation, refer to the Python tutorial on strings.
Negative indexes count from the right side.
digit = mystring[-1]
In Python, a string is subscriptable. That means that you can access its individual characters using square brackets, just like you can with a list.
If you want to get the first character of the string, then you can simply use my_string[0].
If you need to get the last (character) in a string (the final 1 in the string you provided), then use my_string[-1].
If you originally have an int (or, in Python 2, a long) and you are looking for the last digit, you are best off using % (modulo): 10101 % 10 => 1.
If you have a float, on the other hand, you are best off with str(my_float)[-1].
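A short sketch of the options above. The helper last_digit is hypothetical; it uses abs() because Python's % takes the sign of the divisor, so -123 % 10 is 7, not 3.

```python
s = "1010101"
print(s[0], s[-1], s[1])   # 1 1 0  (first, last, second character)
print(int(s[1]))           # 0: convert the character to an int if needed

def last_digit(n: int) -> int:
    """Last decimal digit of an integer, ignoring the sign."""
    return abs(n) % 10

print(last_digit(10101))   # 1
print(last_digit(-123))    # 3
print(-123 % 10)           # 7, because % follows the sign of the divisor

# The k-th digit from the right (k = 0 is the last digit).
n = 10101
k = 2
print((n // 10**k) % 10)   # 1
```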
