PEP 8 doesn't mention the slice operator. From my understanding, unlike other operators, it should not be surrounded with whitespace
spam[3:5] # OK
spam[3 : 5] # NOT OK
Does this hold for complex expressions as well? That is, which of the following is considered better style:
1. spam[ham(66)//3:44+eggs()]
2. spam[ham(66) // 3: 44 + eggs()]
3. spam[ham(66) // 3 : 44 + eggs()]
4. something else?
As you already mentioned, PEP8 doesn't explicitly mention the slice operator in that format, but spam[3:5] is definitely more common and IMHO more readable.
If the pep8 checker is anything to go by, a space before : will be flagged:
[me@home]$ pep8 <(echo "spam[3:44]") # no warnings
[me@home]$ pep8 <(echo "spam[3 : 44]")
/dev/fd/63:1:7: E203 whitespace before ':'
... but that's only because it assumes : to be the operator for defining a dict literal, where no space is expected before the operator. spam[3: 44] passes for the same reason, but that just doesn't seem right.
On that count, I'd stick to spam[3:44].
Nested arithmetic operations are a little trickier. Of your three examples, only the second passes PEP8 validation:
[me@home]$ pep8 <(echo "spam[ham(66)//3:44+eggs()]")
/dev/fd/63:1:13: E225 missing whitespace around operator
[me@home]$ pep8 <(echo "spam[ham(66) // 3:44 + eggs()]") # OK
[me@home]$ pep8 <(echo "spam[ham(66) // 3 : 44 + eggs()]")
/dev/fd/63:1:18: E203 whitespace before ':'
However, I find all of the above difficult to parse by eye at first glance.
For readability and compliance with PEP8, I'd personally go for:
spam[(ham(66) // 3):(44 + eggs())]
Or, for more complicated operations:
s_from = ham(66) // 3
s_to = 44 + eggs()
spam[s_from:s_to]
I do see slicing used in PEP8:
- Use ''.startswith() and ''.endswith() instead of string slicing to check
for prefixes or suffixes.
startswith() and endswith() are cleaner and less error prone. For
example:
Yes: if foo.startswith('bar'):
No: if foo[:3] == 'bar':
I wouldn't call that definitive but it backs up your (and my) understanding:
spam[3:5] # OK
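As an aside on why PEP 8 prefers startswith there, the sliced version hard-codes the prefix length (foo and the prefix below are made-up values for illustration):

```python
foo = 'barbecue'

assert foo.startswith('bar')   # robust: no hard-coded length
assert foo[:3] == 'bar'        # works, but silently breaks if the prefix
                               # and the hard-coded length 3 get out of sync
```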
As far as which to use in the more complex situation, I'd use #3. I don't think the no-spaces-around-the-: method looks good in that case:
spam[ham(66) / 3:44 + eggs()] # looks like it has a time in the middle. Bad.
If you want the : to stand out more, don't sacrifice operator spacing; add extra spaces around the : instead:
spam[ham(66) / 3 : 44 + eggs()] # Wow, it's easy to read!
I would not use #1 because I like operator spacing, and #2 looks too much like the dictionary key: value syntax.
I also wouldn't call it an operator. It's special syntax for constructing a slice object -- you could also do
spam[slice(3, 5)]
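A quick sketch of that equivalence (spam here is a made-up sample list):

```python
# A slice expression is just sugar for indexing with a slice object.
spam = list(range(10))

assert spam[3:5] == spam[slice(3, 5)] == [3, 4]
# The same holds for extended slices with omitted parameters:
assert spam[::2] == spam[slice(None, None, 2)]
```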
I agree with your first example. For the latter one: PEP 20, "Readability counts." The semantically most important part of your complex slice expression is the slice operator itself; it divides the expression into two parts that should be parsed separately, both by the human reader and by the interpreter. My intuition is therefore that consistency with PEP 8 should be sacrificed in order to highlight the : operator, i.e. by surrounding it with whitespace as in example 3. The question is whether omitting the whitespace within the two sides of the expression increases readability or not:
1. spam[ham(66)/3 : 44+eggs()]
vs.
2. spam[ham(66) / 3 : 44 + eggs()]
I find 1. quicker to parse.
I want to know why the following happens.
The code below evaluates the right side, 1**3, first, then 2**1:
2**1**3 has the value of 2
However, for the code below, the left side, 7//3, is evaluated first, then 2*3; finally, 1+6-1 = 6.
1+7//3*3-1 has the value of 6
Take a look at the documentation of operator precedence. Although multiplication * and floor division // have the same precedence, you should take note of this part:
Operators in the same box group left to right (except for exponentiation, which groups from right to left).
For the convention of 2**1**3 being evaluated right-associatively, see the cross-site dupe on the Math Stack Exchange site: What is the order when doing x^y^z and why?
The TL;DR is this: since the left-associative version (x^y)^z would just equal x^(y*z), it's not useful to have another (worse) notation for the same thing, so exponentiation should be right-associative.
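Both groupings are easy to check directly:

```python
# Exponentiation groups right to left; most other operators group left to right.
assert 2 ** 1 ** 3 == 2 ** (1 ** 3) == 2   # right-associative grouping
assert (2 ** 1) ** 3 == 8                  # the other grouping gives 8
assert 1 / 2 / 3 == (1 / 2) / 3            # division is left-associative
```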
Almost all operators in Python (that share the same precedence) have left-to-right associativity. For example:
1 / 2 / 3 ≡ (1 / 2) / 3
One exception is the exponent operator which is right-to-left associativity:
2 ** 3 ** 4 ≡ 2 ** (3 ** 4)
That's just the way the language is defined, matching mathematical notation where a**b**c ≡ a**(b**c).
If it were (a**b)**c, that would just be a**(b*c).
Per Operator Precedence, the ** operator is right-associative: a**b**c**d == a**(b**(c**d)).
So, if you do this:
a,b,c,d = 2,3,5,7
a**b**c**d == a**(b**(c**d))
you should get True, in principle. In practice these values are infeasible: c**d is 78125, so b**(c**d) = 3**78125 already has over 37,000 digits, and raising 2 to that power is far beyond any machine's memory. Test with smaller values instead, e.g. a, b, c, d = 2, 3, 2, 2, and the equality holds instantly.
The exponent operator in Python has right-to-left associativity. That is, of all its occurrences in an expression, the calculation is done from the rightmost to the leftmost. The exponent operator is an exception here, as most other operators follow a left-to-right associativity rule.
2**1**3 = 2
The expression
1+7//3*3-1
is a simple case of left-to-right associativity. As the // and * operators share the same precedence, associativity (here, left-to-right) is taken into account.
This is just how math typically works:
2**1**3
This is the same as the first expression you used. To evaluate this with math, you'd work your way down, so 1**3 = 1 and then 2**1, which equals 2.
You can make sense of this just by thinking about the classic PEMDAS (or Please Excuse My Dear Aunt Sally) order of operations from mathematics. In your first one, 2**1**3 is equivalent to 2^(1^3), which is really read as 2**(1**3). Looking at it this way, you see that you do the parenthesis (P) first (the 1**3).
In the second one, 1+7//3*3-1 == 6, you have to note that the MD and AS of PEMDAS are actually done in order of whichever comes first, reading from left to right. It's simply a fault of language that we have to write one letter before another (that is, we could write it as PEDMAS and it would still be correct if we treat the D and M appropriately).
All that to say, Python is treating the math exactly the same way as we should even if this were written with pen and paper.
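The left-to-right evaluation described above can be checked step by step:

```python
# Walking through 1 + 7 // 3 * 3 - 1, with // and * at equal
# precedence and grouping left to right:
assert 7 // 3 == 2           # the leftmost of the //,* pair goes first
assert 2 * 3 == 6            # then the multiplication
assert 1 + 6 - 1 == 6        # finally + and -, again left to right
assert 1 + 7 // 3 * 3 - 1 == 6
```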
This problem asks you to sum 100 numbers, each 50 digits long. http://code.jasonbhill.com/python/project-euler-problem-13/
We can replace \n with "\n+" in Notepad++ yielding
a=37107287533902102798797998220837590246510135740250
+46376937677490009712648124896970078050417018260538
...
+20849603980134001723930671666823555245252804609722
+53503534226472524250874054075591789781264330331690
print(a)
>>37107287533902102798797998220837590246510135740250 (incorrect)
We can as well replace \n with \na+= yielding
a=37107287533902102798797998220837590246510135740250
a+=46376937677490009712648124896970078050417018260538
...
a+=20849603980134001723930671666823555245252804609722
a+=53503534226472524250874054075591789781264330331690
print(a)
>>553... (correct)
This seems to be a feature of big-integer arithmetic. Under which conditions does a sum of all the numbers (Method 1) yield a different result from an iterative increment (Method 2)?
As you can see in the result, the first set of instructions is not computing the sum; it preserves only the first assignment. Since +N is on its own a valid statement, the lines after the assignment do nothing. Thus
a=42
+1
print(a)
prints 42
To write an instruction over two lines, you need to escape the ending newline with a backslash:
a=42\
+1
print(a)
prints 43
Python source code lines are terminated by newline characters. The subsequent lines in the first example are separate expression statements consisting of a single integer with a unary plus operator in front, but they don't do anything. They evaluate the expression (resulting in the integer constant itself), and then ignore the result. If you put all numbers on a single line, or use parentheses around the addition, the simple sum will work as well.
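A sketch of the parenthesized fix, using just the first two of the 50-digit numbers from the problem:

```python
# Parentheses let the sum continue across lines without backslashes,
# so the + is no longer parsed as a unary plus on a separate statement.
a = (37107287533902102798797998220837590246510135740250
     + 46376937677490009712648124896970078050417018260538)

assert len(str(a)) == 50   # the sum of two 50-digit numbers, still 50 digits here
```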
Let's take a look at the simplest arithmetic example in the pyparsing doc, here.
More specifically, I'm looking at the "+" operation that is defined as left associative and the first example test where we're parsing "9 + 2 + 3".
The outcome of the parsing I would have expected is ((9+2)+3), that is, first compute the infix binary operator on 9 and 2, then compute the infix binary operator on the result and 3. What I get, however, is (9+2+3), all on the same level, which is really not all that helpful; after all, I now have to decide the order of evaluation myself, and yet the operator was defined to be left-associative. Why am I forced to parenthesize myself? What am I missing?
Examples of slicing in documentation only show integer literals and variables used as indices, not more complex expressions (e.g. myarray[x/3+2:x/2+3:2]). PEP-8 also doesn't cover this case. What is the usual usage of whitespace here: myarray[x/3+2:x/2+3:2], myarray[x/3+2 : x/2+3 : 2], or myarray[x/3+2: x/2+3: 2] (there don't seem to be other reasonable options)?
I have never seen spaces used in slicing operations, so would err on the side of avoiding them. Then again, unless it's performance critical I'd be inclined to move the expressions outside of the slicing operation altogether. After all, your goal is readability:
lower = x / 3 + 2
upper = x / 2 + 3
myarray[lower:upper:2]
I believe the most relevant extract of PEP8 on this subject is:
The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code.
In this case, my personal choice would probably be either Steve Mayne's answer, or perhaps:
myarray[slice(x / 3 + 2, x / 2 + 3, 2)]
Rule 1. Pet Peeves
However, in a slice the colon acts like a binary operator, and should
have equal amounts on either side (treating it as the operator with
the lowest priority). In an extended slice, both colons must have the
same amount of spacing applied. Exception: when a slice parameter is
omitted, the space is omitted:
Rule 2. Other Recommendations
If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:
The following fail rule 2:
myarray[x/3+2:x/2+3:2]
myarray[x/3+2 : x/2+3 : 2]
myarray[x/3+2: x/2+3: 2]
So the answer is,
myarray[x/3 + 2 : x/2 + 3 : 2]
I'm having trouble getting a replace() to work
I've tried my_string.replace('\\', '') and re.sub('\\', '', my_string), but neither one works.
I thought \\ was the escape code for a backslash; am I wrong?
The string in question looks like
'<2011315123.04C6DACE618A7C2763810@\x82\xb1\x82\xea\x82\xa9\x82\xe7\x8c\xa9\x82\xa6\x82\xe9\x82\xbe\x82\xeb\x82\xa4>'
or print my_string
<2011315123.04C6DACE618A7C2763810@???ꂩ?猩???邾?낤>
Yes, it's supposed to look like garbage, but I'd rather get
'<2011315123.04C6DACE618A7C2763810@82b182ea82a982e78ca982a682e982be82eb82a4>'
You don't have any backslashes in your string. What you don't have, you can't remove.
Consider what you are showing as '\x82' ... this is a one-byte string.
>>> s = '\x82'
>>> len(s)
1
>>> ord(s)
130
>>> hex(ord(s))
'0x82'
>>> print s
é # my sys.stdout.encoding is 'cp850'
>>> print repr(s)
'\x82'
>>>
What you'd "rather get" ('x82') is meaningless.
Update The "non-ascii" part of the string (bounded by # and >) is actually Japanese text written mostly in Hiragana and encoded using shift_jis. Transcript of IDLE session:
>>> y = '\x82\xb1\x82\xea\x82\xa9\x82\xe7\x8c\xa9\x82\xa6\x82\xe9\x82\xbe\x82\xeb\x82\xa4'
>>> print y.decode('shift_jis')
これから見えるだろう
Google Translate produces "Can not you see the future" as the English translation.
In a comment on another answer, you say:
I just need ascii
and
What I'm doing with it is seeing how
far apart the two strings are using
nltk.edit_distance(), so this will
give me a multiple of the true
distance. Which is good enough for me.
Why do you think you need ASCII? Edit distance is defined quite independently of any alphabet.
For a start, doing nonsensical transformations of your strings won't give you a consistent or predictable multiple of the true distance. Secondly, out of the following:
x
repr(x)
repr(x).replace('\\', '')
repr(x).replace('\\x', '') # if \ is noise, so is x
x.decode(whatever_the_encoding_is)
why do you choose the third?
Update 2 in response to comments:
(1) You still haven't said why you think you need "ascii". nltk.edit_distance doesn't require "ascii" -- the args are said to be "strings" (whatever that means) but the code will work with any 2 sequences of objects for which != works. In other words, why not just use the first of the above 5 options?
(2) Accepting up to 100% inflation of the edit distance is somewhat astonishing. Note that your currently chosen method will use 4 symbols (hex digits) per Japanese character. repr(x) uses 8 symbols per character. x (the first option) uses 2.
(3) You can mitigate the inflation effect by normalising your edit distance. Instead of comparing distance(s1, s2) with a number_of_symbols threshold, compare distance(s1, s2) / float(max(len(s1), len(s2))) with a fraction threshold. Note normalisation is usually used anyway ... the rationale being that the dissimilarity between 20-symbol strings with an edit distance of 4 is about the same as that between 10-symbol strings with an edit distance of 2, not twice as much.
(4) nltk.edit_distance is the most shockingly inefficient pure-Python implementation of edit_distance that I've ever seen. This implementation by Magnus Lie Hetland is much better, but still capable of improvement.
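The normalisation from point (3) can be sketched like this, paired with a plain dynamic-programming Levenshtein distance (an illustrative sketch, not nltk's actual implementation):

```python
def edit_distance(s1, s2):
    """Plain dynamic-programming Levenshtein distance (illustrative)."""
    prev = list(range(len(s2) + 1))          # distances from empty prefix of s1
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # substitution
        prev = curr
    return prev[-1]

def normalized_distance(s1, s2):
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(s1), len(s2))
    return edit_distance(s1, s2) / float(longest) if longest else 0.0
```

With this, you compare normalized_distance(s1, s2) against a fraction threshold rather than comparing the raw distance against a symbol-count threshold, which makes the comparison independent of string length.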
This works, I think, if you really want to just strip the "\":
>>> a = '<2011315123.04C6DACE618A7C2763810@\x82\xb1\x82\xea\x82\xa9\x82\xe7\x8c\xa9\x82\xa6\x82\xe9\x82\xbe\x82\xeb\x82\xa4>'
>>> repr(a).replace("\\","")[1:-1]
'<2011315123.04C6DACE618A7C2763810@x82xb1x82xeax82xa9x82xe7x8cxa9x82xa6x82xe9x82xbex82xebx82xa4>'
>>>
But like the answer above, what you get is pretty much meaningless.