Python set constructor syntax - python

Does anyone know the justification for this confusing set construction syntax? I spent a day unable to find this bug because I missed a comma in constructing a set.
> {1 2}
SyntaxError: invalid syntax # This makes sense.
> {'a' 'b'} = set(['ab']) # This does not.

That's not a set construction syntax thing. You're running into implicit string literal concatenation, a confusing and surprising corner of the language:
>>> 'a' 'b'
'ab'
If you write two string literals next to each other, they're implicitly combined into one string. (This only works with literals; str(3) str([]) is a syntax error, not '3[]'.)
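To make the missing-comma bug from the question concrete, here is a minimal sketch. The names colors_ok and colors_bug are made up for illustration; the only thing it relies on is the implicit concatenation described above.

```python
# A missing comma between two string literals silently merges them.
colors_ok = {'red', 'green', 'blue'}
colors_bug = {'red', 'green' 'blue'}   # comma missing after 'green'

print(colors_ok == colors_bug)          # False
print(len(colors_ok), len(colors_bug))  # 3 2
print('greenblue' in colors_bug)        # True
```

Comparing the length of the container against the number of items you meant to write is a cheap way to catch this kind of slip.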

This has nothing to do with sets.
Two string literals separated by whitespace are considered one string literal.
rationale = ('This is quite useful when you need to construct '
'a long literal without useless "+" and without '
'the indentation and newlines which triple-quotes bring.')

Do you mean
>>> {'a' 'b'} == set(['ab'])
True
?
That's just because 2 strings are concatenated to 1 string:
>>> type('a' 'b')
<class 'str'>
>>> len('a' 'b')
2
>>> print('a' 'b')
ab


Python's re.findall seems to work when not in function, but fails when in a function [duplicate]

Trying to understand regular expressions and I am on the repetitions part: {m, n}.
I have this code:
>>> p = re.compile('a{1}b{1, 3}')
>>> p.match('ab')
>>> p.match('abbb')
As you can see, neither string matches the pattern. Why is this happening?
You shouldn't put a space after the comma, and the {1} is redundant.
Try
p = re.compile('a{1}b{1,3}')
...and mind the space.
Remove the extra whitespace in b.
Change:
p = re.compile('a{1}b{1, 3}')
to:
p = re.compile('a{1}b{1,3}')
                        ^ # no whitespace
and all should be well.
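A quick side-by-side check of the two patterns, as a sketch. The variable names broken and fixed are only for illustration; the patterns and test strings are the ones from the question.

```python
import re

broken = re.compile(r'a{1}b{1, 3}')   # space after the comma
fixed = re.compile(r'a{1}b{1,3}')     # no space inside the braces

for text in ('ab', 'abbb'):
    # Prints the text, then whether each pattern matched it.
    print(text, bool(broken.match(text)), bool(fixed.match(text)))
```

With the space present, the braces are no longer read as a repetition count, so neither input matches the first pattern.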
You are seeing some re behaviour that is very "dark corner", nigh on a bug (or two).
# Python 2.7.1
>>> import re
>>> pat = r"b{1, 3}\Z"
>>> bool(re.match(pat, "bb"))
False
>>> bool(re.match(pat, "b{1, 3}"))
True
>>> bool(re.match(pat, "bb", re.VERBOSE))
False
>>> bool(re.match(pat, "b{1, 3}", re.VERBOSE))
False
>>> bool(re.match(pat, "b{1,3}", re.VERBOSE))
True
>>>
In other words, the pattern "b{1, 3}" matches the literal text "b{1, 3}" in normal mode, and the literal text "b{1,3}" in VERBOSE mode.
The "Law of Least Astonishment" would suggest either (1) the space in front of the 3 was ignored and it matched "b", "bb", or "bbb" as appropriate [preferable] or (2) an exception at compile time.
Looking at it another way: Two possibilities: (a) The person who writes "{1, 3}" is imbued with the spirit of PEP8 and believes it is prescriptive and applies everywhere (b) The person who writes that has tested re undocumented behaviour and actually wants to match the literal text "b{1, 3}" and perversely wants to use r"b{1, 3}" instead of explicitly escaping: r"b\{1, 3}". Seems to me that (a) is much more probable than (b), and re should act accordingly.
Yet another perspective: When the space is reached, it has already parsed {, a string of digits, and a comma i.e. well into the {m,n} "operator" ... to silently ignore an unexpected character and treat it as though it was literal text is mind-boggling, perlish, etc.
Update Bug report lodged.
Do not insert spaces between { and }.
p = re.compile('a{1}b{1,3}')
You can compile the regex with VERBOSE flag, this means most whitespace in the regex would be ignored. I think this is a very good practice to describe complex regular expressions in a more readable manner.
See here for details...
Hope this helps...
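For what it's worth, here is a small sketch of what a VERBOSE version of the pattern might look like. The comments and the trailing \Z are my own additions for illustration. Note that the quantifier itself is still written without spaces, since (as shown in the answer above) a space inside the braces is not ignored even in VERBOSE mode.

```python
import re

pattern = re.compile(r"""
    a{1}     # exactly one 'a' (the {1} is redundant, kept to mirror the question)
    b{1,3}   # one to three 'b'; no spaces inside the braces
    \Z       # end of input
""", re.VERBOSE)

for text in ('ab', 'abbb', 'abbbb'):
    print(text, bool(pattern.match(text)))
```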

Backward slash added when assigning in dictionary. how to avoid it

When I assign a Windows path as a value in a dictionary, an extra backslash gets added.
I also tried using a raw string.
p = "c:\windows\pat.exe"
print p
c:\windows\pat.exe
d = {"p": p}
print d
{'p': 'c:\\windows\\pat.exe'}
Tried it as raw string
d = {"p": r"%s" % p}
print d
{'p': 'c:\\windows\\pat.exe'}
I don't want the backslash to be added when the path is assigned as a value in the dictionary.
This is a mistake that's very common among people new to Python.
TL;DR:
>>> print "c:\windows\pat.exe" == 'c:\\windows\\pat.exe'
True
Explanation:
In the first instance, where you're assigning a value to the string p and then printing p, Python gets the string to print itself and it does so by outputting its literal value. In your example:
>>> p = "c:\windows\pat.exe"
>>> print p
c:\windows\pat.exe
In Python 3, the same:
>>> p = "c:\windows\pat.exe"
>>> print(p)
c:\windows\pat.exe
In the second instance, since you're creating and then printing a dictionary, Python asks the dictionary to print itself. It does so by printing a short Python code representation of itself, since there is no standard simple way of printing a dictionary, like there is for variables with simple types like strings or numbers.
In your example (slightly modified to work by itself):
>>> d = {"p": "c:\windows\pat.exe"}
>>> print d
{'p': 'c:\\windows\\pat.exe'}
So, why does the value of p in the Python code representation have the double backslashes? Because a single backslash in a string literal has an ambiguous meaning. In your example, it just so happens that \w and \p don't have special meanings in Python. However, you've maybe seen things like \n and perhaps \t used in strings to represent a new line or a tab character.
For example:
>>> print "Hello\nworld!"
Hello
world!
So how does Python know when to print a new line and when to print \n literally, when you want to? It doesn't. It just assumes that if the character after the \ doesn't make for a special character, you probably wanted to write a \ and if it is, you wanted to write the special character. If you want to literally write a \, regardless of what follows, you need to follow up the escape character (that's what the \ is called in this context) with another one.
For example:
>>> print "I can see \\n"
I can see \n
That way, there is no ambiguity and Python knows exactly what is intended. You should always write backslashes as double backslashes in normal string literals, instead of relying on luck in avoiding control characters like \n or \t. And that's why Python, when printing its code version of your string "c:\windows\pat.exe", prefers to write it as 'c:\\windows\\pat.exe': it uses single quotes (its default, even though double quotes are fine too) and double backslashes.
It's just how it is written in code, "really" your string has single backslashes and the quotes are of course not part of it at all.
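If you want to convince yourself that the string really holds single backslashes, a sketch like the following may help. It uses Python 3 print() and a raw literal (recent Python 3 releases warn about unrecognized escapes such as \w in ordinary literals); the path is the one from the question.

```python
p = r"c:\windows\pat.exe"   # raw literal: each backslash is one character

print(p)                      # c:\windows\pat.exe          (str form)
print(repr(p))                # 'c:\\windows\\pat.exe'      (code form)
print({"p": p})               # {'p': 'c:\\windows\\pat.exe'}
print(len(p), p.count("\\"))  # 18 2
```

The dictionary shows the repr() of its values, which is why the doubled backslashes appear there and nowhere else.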
If you don't like having to write double backslashes, you can consider using 'raw strings', which is prefixing a string with r or R, telling Python to ignore special characters and take the string exactly as written in code:
>>> print r"This won't have \n a line break"
This won't have \n a line break
But watch out! This doesn't work if you want your last characters in the string to be an odd number of \, for reasons not worth getting into. In that case, you have no other recourse than writing the string with double backslashes:
>>> print r"Too bad\"
File "<stdin>", line 1
print r"Too bad\"
^
SyntaxError: EOL while scanning string literal
>>> print r"Too bad\\"
Too bad\\
>>> print "Too bad\\"
Too bad\
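One possible workaround, sketched here as a suggestion rather than the only option: keep the raw literal for the body of the path and put the trailing backslash in an ordinary literal right next to it, so the two are joined by implicit literal concatenation. The name folder is just for illustration.

```python
# Raw literal for the body, ordinary literal for the final backslash.
folder = r"c:\windows" "\\"

print(folder)              # c:\windows\
print(folder + "pat.exe")  # c:\windows\pat.exe
```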
Maybe it is not a problem, because when you print the values (not the whole dictionary) the string will have one backslash
p = "c:\windows\pat.exe"
d = {"p": p}
print (d)
{'p': 'c:\\windows\\pat.exe'}
for i in d:
    print("key:", i, " value:", d[i])
Output
{'p': 'c:\\windows\\pat.exe'}
key: p value: c:\windows\pat.exe
>>>

Variants of string concatenation?

Out of the following two variants (with or without plus-sign between) of string literal concatenation:
What's the preferred way?
What's the difference?
When should one or the other be used?
Should none of them ever be used, and if so, why?
Is join preferred?
Code:
>>> # variant 1. Plus
>>> 'A'+'B'
'AB'
>>> # variant 2. Just a blank space
>>> 'A' 'B'
'AB'
>>> # They seem to be equal
>>> 'A'+'B' == 'A' 'B'
True
Juxtaposing works only for string literals:
>>> 'A' 'B'
'AB'
If you work with string objects:
>>> a = 'A'
>>> b = 'B'
you need to use a different method:
>>> a b
a b
^
SyntaxError: invalid syntax
>>> a + b
'AB'
The + is a bit more obvious than just putting literals next to each other.
One use of the first method is to split long texts over several lines, keeping
indentation in the source code:
>>> a = 5
>>> if a == 5:
...     text = ('This is a long string'
...             ' that I can continue on the next line.')
...
>>> text
'This is a long string that I can continue on the next line.'
''.join() is the preferred way to concatenate more strings, for example in a list:
>>> ''.join(['A', 'B', 'C', 'D'])
'ABCD'
The variant without + is done during the syntax parsing of the code. I guess it was done to let you write multiple line strings nicer in your code, so you can do:
test = "This is a line that is " \
"too long to fit nicely on the screen."
I guess that when it's possible, you should use the non-+ version, because in the byte code there will be only the resulting string, no sign of concatenation left.
When you use +, you have two strings in your code and you execute the concatenation during runtime (unless interpreters are smart and optimize it, but I don't know if they do).
Obviously, you cannot do:
a = 'A'
ba = 'B' a
Which one is faster? The no-+ version, because it is done before even executing the script.
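A small addendum on the speed question, offered as a sketch: on CPython you can inspect the constants of the compiled code to see what is left after compilation. As far as I know, CPython folds 'A' + 'B' between two literals at compile time as well, but that is an implementation detail of the interpreter and not a language guarantee. The file names passed to compile() are arbitrary labels.

```python
# Compile both variants and look at the constants stored in the code object.
plus_code = compile("s = 'A' + 'B'", "<plus>", "exec")
adjacent_code = compile("s = 'A' 'B'", "<adjacent>", "exec")

print(plus_code.co_consts)      # on CPython this includes 'AB'
print(adjacent_code.co_consts)  # always includes 'AB'
```

So for plain literals the two variants tend to end up the same in practice; the difference only matters once variables are involved, where only + (or join) works.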
+ vs join -> If you have a lot of elements, join is preferred because it is optimised to handle many elements. Using + to concat multiple strings creates a lot of partial results in the process memory, while using join doesn't.
If you're going to concat just a couple of elements I guess + is better as it's more readable.
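To illustrate the "many elements" case, here is a minimal sketch comparing the two approaches on a list of pieces. The list contents and variable names are made up for the example; both approaches give the same result, join just does it in one call.

```python
parts = [str(n) for n in range(5)]   # ['0', '1', '2', '3', '4']

# One call, one result string.
joined = ''.join(parts)

# Repeated +, building an intermediate string on every step.
looped = ''
for part in parts:
    looped += part

print(joined)            # 01234
print(joined == looped)  # True
print(', '.join(parts))  # 0, 1, 2, 3, 4
```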

Using tuple efficiently with strip()

Consider a basic tuple used with the built-in method str.startswith():
>>> t = 'x','y','z'
>>> s = 'x marks the spot'
>>> s.startswith( t )
True
It seems a loop is required when using a tuple with str.strip():
>>> for i in t: s.strip(i)
...
' marks the spot'
'x marks the spot'
'x marks the spot'
This seems wasteful perhaps; is there a more efficient way to use tuple items with str.strip()? s.strip(t) seemed logical, however, no bueno.
Stripping a string is a very different use case from testing whether a string starts with a given text, so the methods differ materially in how they treat their input.
str.strip() takes one string, which is treated as a set of characters. As long as the string starts or ends with a character that is a member of the set, that starting or ending character is removed, until the start and end are free of any characters from the given set.
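The set-of-characters behaviour can be sketched by hand. The function strip_chars below is a hypothetical re-implementation for illustration only, not how CPython implements it, and it ignores the no-argument (whitespace) case.

```python
def strip_chars(s, chars):
    """Illustrative stand-in for s.strip(chars): chars is a set of characters."""
    charset = set(chars)
    start, end = 0, len(s)
    # Advance past leading characters that are in the set.
    while start < end and s[start] in charset:
        start += 1
    # Back up over trailing characters that are in the set.
    while end > start and s[end - 1] in charset:
        end -= 1
    return s[start:end]

print(strip_chars('yxz_middle_zxy', 'xyz'))  # _middle_
```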
If you have a tuple join it into one string:
s.strip(''.join(t))
or pass it in as a string literal:
s.strip('xyz')
Note that this means something different from using str.strip() with each individual tuple element!
Compare:
>>> s = 'yxz_middle_zxy'
>>> for t in ('x', 'y', 'z'):
... print(s.strip(t))
...
yxz_middle_zxy
xz_middle_zx
yxz_middle_zxy
>>> print(s.strip('xyz'))
_middle_
Even if you chained the str.strip() calls with individual characters, it would still not produce the same output:
>>> for t in ('x', 'y', 'z'):
... s = s.strip(t)
...
>>> print(s)
xz_middle_zx
because the string never started or ended in x or z when those characters were stripped; x only ended up at the start and end because the y character was stripped in a later step.
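One more caveat worth a sketch: joining the tuple only does what you expect when every item is a single character. With longer items (the tuple affixes below is a made-up example), the joined string is still treated as a set of characters, not as whole substrings.

```python
t = ('x', 'y', 'z')
s = 'yxz_middle_zxy'
print(s.strip(''.join(t)))   # _middle_

# Multi-character items: 'ab' and 'cd' become the character set {a, b, c, d}.
affixes = ('ab', 'cd')
print('ba_x_dc'.strip(''.join(affixes)))   # _x_
```

Here 'ba' and 'dc' are removed even though neither 'ab' nor 'cd' appears in the string as written.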

How to convert an integer to hexadecimal without the extra '0x' leading and 'L' trailing characters in Python?

I am trying to convert a big integer to hexadecimal, but in the result I get an extra "0x" at the beginning and "L" at the end. Is there any way to remove them? Thanks.
The number is:
44199528911754184119951207843369973680110397865530452125410391627149413347233422
34022212251821456884124472887618492329254364432818044014624401131830518339656484
40715571509533543461663355144401169142245599341189968078513301836094272490476436
03241723155291875985122856369808620004482511813588136695132933174030714932470268
09981252011612514384959816764532268676171324293234703159707742021429539550603471
00313840833815860718888322205486842202237569406420900108504810
In hex I get:
0x2ef1c78d2b66b31edec83f695809d2f86e5d135fb08f91b865675684e27e16c2faba5fcea548f3
b1f3a4139942584d90f8b2a64f48e698c1321eee4b431d81ae049e11a5aa85ff85adc2c891db9126
1f7f2c1a4d12403688002266798ddd053c2e2670ef2e3a506e41acd8cd346a79c091183febdda3ca
a852ce9ee2e126ca8ac66d3b196567ebd58d615955ed7c17fec5cca53ce1b1d84a323dc03e4fea63
461089e91b29e3834a60020437db8a76ea85ec75b4c07b3829597cfed185a70eeaL
The 0x prefix is the literal notation for hexadecimal numbers, and the L at the end means the value is a long integer.
If you just want a hex representation of the number as a string without 0x and L, you can use string formatting with %x.
>>> a = 44199528911754184119951207843369973680110397
>>> hex(a)
'0x1fb62bdc9e54b041e61857943271b44aafb3dL'
>>> b = '%x' % a
>>> b
'1fb62bdc9e54b041e61857943271b44aafb3d'
Sure, go ahead and remove them.
hex(bignum).rstrip("L").lstrip("0x") or "0"
(Went the strip() route so it'll still work if those extra characters happen to not be there.)
Similar to Praveen's answer, you can also directly use built-in format().
>>> a = 44199528911754184119951207843369973680110397
>>> format(a, 'x')
'1fb62bdc9e54b041e61857943271b44aafb3d'
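For comparison, here is a short sketch of the formatting-based options side by side, using small made-up values so the output is easy to check by eye. None of them ever produce the 0x prefix or the L suffix, so there is nothing to strip afterwards.

```python
n = 255

print(format(n, 'x'))     # ff
print('{:x}'.format(n))   # ff
print('%x' % n)           # ff
print(format(n, 'X'))     # FF  (uppercase digits)
print(format(0, 'x'))     # 0   (zero stays a digit, not an empty string)

big = 2 ** 100
print(format(big, 'x'))   # a 1 followed by 25 zeros
```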
I think it's a dangerous idea to use strip, because lstrip and rstrip treat their argument as a set of characters and will also strip any 0 digits.
For example:
>>> a = '0x0'
>>> a.lstrip('0x')
''
The result is '', not '0'.
In your case, you can simply use replace to prevent the above situation.
Here's sample code.
hex(bignum).replace("L","").replace("0x","")
Be careful when using the accepted answer as lstrip('0x') will also remove any leading zeros, which may not be what you want, see below:
>>> account = '0x000067'
>>> account.lstrip('0x')
'67'
>>>
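If you do want to work on the string that hex() returns, a safer route is to remove the prefix and suffix by position and not by character set. The helper strip_hex_prefix below is a hypothetical name for a minimal sketch of that idea.

```python
def strip_hex_prefix(text):
    """Hypothetical helper: drop one leading '0x' and one trailing 'L', nothing else."""
    if text.startswith('0x'):
        text = text[2:]
    if text.endswith('L'):
        text = text[:-1]
    return text

print('0x000067'.lstrip('0x'))       # 67      (leading zeros lost)
print(strip_hex_prefix('0x000067'))  # 000067  (leading zeros kept)
print(strip_hex_prefix('0x0'))       # 0
print(strip_hex_prefix('0x2aL'))     # 2a
```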
If you are sure that the '0x' prefix will always be there, it can be removed simply as follows:
>>> hex(42)
'0x2a'
>>> hex(42)[2:]
'2a'
>>>
[2:] will get every character in the string except for the first two.
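A more elegant way would be
hex(_number)[2:-1]
but this only works when the trailing 'L' is actually present; otherwise the slice cuts off the last hex digit. You have to be careful if you're working with gmpy mpz types:
then the 'L' doesn't exist at the end and you can just use
hex(mpz(_number))[2:]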
