Let's say I need a slice from the end of a sequence seq to the first occurrence of a given item x (inclusive). The naive attempt to write seq[-1:seq.index(x)-1:-1] creates a subtle bug:
seq = 'abc'
seq[-1:seq.index('b')-1:-1] # 'cb' as expected
seq[-1:seq.index('a')-1:-1] # '' because -1 is interpreted as end of seq
Is there any idiomatic way to write this?
seq[seq.index(x):][::-1] works fine, but it is presumably inefficient for large sequences since it creates an extra copy. (I do need a sequence in the end, so one copy is necessary; I just don't want to create a second copy.)
On a side note, this is a really easy bug to introduce, it can pass many tests, and is undetectable to any static analyzer (unless it warns about every slice with a negative step).
Update
There seems to be no perfect / idiomatic solution. I agree that it may not be the bottleneck as often as I thought, so I'll use [pos:][::-1] in most cases. When performance is important, I'd use the normal if check. However, I'll accept the solution that I found interesting even though it's hard to read; it's probably usable in certain rare cases (where I really need to fit the whole thing into an expression, and I don't want to define a new function).
Also, I tried timing this. For lists it seems there's always a 2x penalty for an extra slice even if they are as short as 2 items. For strings, the results are extremely inconsistent, to the point that I can't say anything:
import timeit
for n in (2, 5, 10, 100, 1000, 10000, 100000, 1000000):
c = list(range(n))
# c = 'x' * n
pos = n // 2 # pretend the item was found in the middle
exprs = 'c[pos:][::-1]', 'c[:pos:-1] if pos else c[::-1]'
results = [timeit.Timer(expr, globals=globals()).autorange() for expr in exprs]
times = [t/loops for loops, t in results]
print(n, times[0]/times[1])
Results for lists (ratio of extra slice / no extra slice times):
2 2.667782437753884
5 2.2672817613246914
10 1.4275235266754878
100 1.6167102119737584
1000 1.7309116253903338
10000 3.606259720606781
100000 2.636049703318956
1000000 1.9915776615090277
Of course, this ignores the fact that whatever it is we're doing with the resulting slice is much more costly, in relative terms, when the slice is short. So still, I agree that for sequences of small size, [::-1] is usually perfectly fine.
If an iterator result is okay, use a forward slice and call reversed on it:
reversed(seq[seq.index(whatever):])
If it isn't, subtract an extra len(seq) from the endpoint:
seq[:seq.index(whatever)-len(seq)-1:-1]
Or just take a forward slice, slice it again to reverse it, and eat the cost of the extra copy. It's probably not your bottleneck.
Whatever you do, leave a comment explaining it so people don't reintroduce the bug when editing, and write a unit test for this case.
IMHO, seq[seq.index(x):][::-1] is the most readable solution, but here's a way that's a little more efficient.
def sliceback(seq, key):
pos = seq.index(key)
return seq[:pos-1 if pos else None:-1]
seq = 'abc'
for k in seq:
print(k, sliceback(seq, k))
output
a cba
b cb
c c
As Budo Zindovic mentions in the comments, .index will raise an exception if the char isn't found in the string. Depending on the context, the code may not ever be called with a char that's not in seq, but if it's possible we need to handle it. The simplest way to do that is to catch the exception:
def sliceback(seq, key):
try:
pos = seq.index(key)
except ValueError:
return ''
return seq[:pos-1 if pos else None:-1]
seq = 'abc'
for k in 'abcd':
print(k, sliceback(seq, k))
output
a cba
b cb
c c
d
Python exception handling is very efficient. When the exception isn't actually raised it's faster than equivalent if-based code, but if the exception is raised more than 5-10% of the time it's faster to use an if.
Rather than testing for key before calling seq.index, it's more efficient to use find. Of course, that will only work if seq is a string; it won't work if seq is a list because (annoyingly) lists don't have a .find method.
def sliceback(seq, key):
pos = seq.find(key)
return '' if pos < 0 else seq[:pos-1 if pos else None:-1]
You can check for pos while assigning the string, for example:
result = seq[-1:pos-1:-1] if pos > 0 else seq[::-1]
input:
pos = seq.index('a')
output:
cba
Related
I recently started looking into recursion to clean up my code and "up my game" as it were. As such, I'm trying to do things which could normally be accomplished rather simply with loops, etc., but practicing them with recursive algorithms instead.
Currently, I am attempting to generate a two-dimensional array which should theoretically resemble a sort of right-triangle in an NxN formation given some height n and the value which will get returned into the 2D-array.
As an example, say I call: my_function(3, 'a');, n = 3 and value = 'a'
My output returned should be: [['a'], ['a', 'a'], ['a', 'a', 'a']]
[['a'],
['a', 'a'],
['a', 'a', 'a']]
Wherein n determines both how many lists will be within the outermost list, as well as how many elements should successively appear within those inner-lists in ascending order.
As it stands, my code currently looks as follows:
def my_function(n, value):
base_val = [value]
if n == 0:
return [base_val]
else:
return [base_val] + [my_function(n-1, value)]
Unfortunately, using my above example n = 3 and value = 'a', this currently outputs: [['a'], [['a'], [['a'], [['a']]]]]
Now, this doesn't have to get formatted or printed the way I showed above in a literal right-triangle formation (that was just a visualization of what I want to accomplish).
I will answer any clarifying questions you need, of course!
return [base_val]
Okay, for n == 0 we get [[value]]. Solid. Er, sort of. That's the result with one row in it, right? So, our condition for the base case should be n == 1 instead.
Now, let's try the recursive case:
return [base_val] + [my_function(n-1, value)]
We had [[value]], and we want to end up with [[value], [value, value]]. Similarly, when we have [[value], [value, value]], we want to produce [[value], [value, value], [value, value, value]] from it. And so on.
The plan is that we get one row at the moment, and all the rest of the rows by recursing, yes?
Which rows will we get by recursing? Answer: the ones at the beginning, because those are the ones that still look like a triangle in isolation.
Therefore, which row do we produce locally? Answer: the one at the end.
Therefore, how do we order the results? Answer: we need to get the result from the recursive call, and add a row to the end of it.
Do we need to wrap the result of the recursive call? Answer: No. It is already a list of lists. We're just going to add one more list to the end of it.
How do we produce the last row? Answer: we need to repeat the value, n times, in a list. Well, that's easy enough.
Do we need to wrap the local row? Answer: Yes, because we want to append it as a single item to the recursive result - not concatenate all its elements.
Okay, let's re-examine the base case. Can we properly handle n == 0? Yes, and it makes perfect sense as a request, so we should handle it. What does our triangle look like with no rows in it? Well, it's still a list of rows, but it doesn't have any rows in it. So that's just []. And we can still append the first row to that, and proceed recursively. Great.
Let's put it all together:
if n == 0:
return []
else:
return my_function(n-1, value) + [[value] * n]
Looks like base_val isn't really useful any more. Oh well.
We can condense that a little further, with a ternary expression:
return [] if n == 0 else (my_function(n-1, value) + [[value] * n])
You have a couple logic errors: off-by-1 with n, growing the wrong side (critically, the non-base implementation should not use a base-sized array), growing by an array of the wrong size. A fixed version:
#!/usr/bin/env python3
def my_function(n, value):
if n <= 0:
return []
return my_function(n-1, value) + [[value]*n]
def main():
print(my_function(3, 'a'))
if __name__ == '__main__':
main()
Since you're returning mutable, you can get some more efficiency by using .append rather than +, which would make it no longer functional. Also note that the inner mutable objects don't get copied (but since the recursion is internal this doesn't really matter in this case).
It would be possible to write a tail-recursive version of this instead, by adding a parameter.
But python is a weird language for using unnecessary recursion.
The easiest way for me to think about recursive algorithms is in terms of the base case and how to build on that.
The base case (case where no recursion is necessary) is when n = 1 (or n = 0, but I'm going to ignore that case). A 1x1 "triangle" is just a 1x1 list: [[a]].
So how do we build on that? Well, if n = 2, we can assume we already have that base case value (from calling f(1)) of [[a]]. So we need to add [a, a] to that list.
We can generalize this as:
f(1) = [[a]]
f(n > 1) = f(n - 1) + [[a] * n]
, or, in Python:
def my_function(n, value):
if n == 1:
return [[value]]
else:
return my_function(n - 1, value) + [[value] * n]
While the other answers proposed another algorithm for solving your Problem, it could have been solved by correcting your solution:
Using a helper function such as:
def indent(x, lst):
new_lst = []
for val in lst:
new_lst += [x] + val
return new_lst
You can implement the return in the original function as:
return [base_val] + indent(value, [my_function(n-1, value)])
The other solutions are more elegant though so feel free to accept them.
Here is an image explaining this solution.
The red part is your current function call and the green one the previous function call.
As you can see, we also need to add the yellow part in order to complete the triangle.
These are the other solutions.
In these solutions you only need to add a new row, so that it's more elegant overall.
So I just came across what seems to me like a strange Python feature and wanted some clarification about it.
The following array manipulation somewhat makes sense:
p = [1,2,3]
p[3:] = [4]
p = [1,2,3,4]
I imagine it is actually just appending this value to the end, correct?
Why can I do this, however?
p[20:22] = [5,6]
p = [1,2,3,4,5,6]
And even more so this:
p[20:100] = [7,8]
p = [1,2,3,4,5,6,7,8]
This just seems like wrong logic. It seems like this should throw an error!
Any explanation?
-Is it just a weird thing Python does?
-Is there a purpose to it?
-Or am I thinking about this the wrong way?
Part of question regarding out-of-range indices
Slice logic automatically clips the indices to the length of the sequence.
Allowing slice indices to extend past end points was done for convenience. It would be a pain to have to range check every expression and then adjust the limits manually, so Python does it for you.
Consider the use case of wanting to display no more than the first 50 characters of a text message.
The easy way (what Python does now):
preview = msg[:50]
Or the hard way (do the limit checks yourself):
n = len(msg)
preview = msg[:50] if n > 50 else msg
Manually implementing that logic for adjustment of end points would be easy to forget, would be easy to get wrong (updating the 50 in two places), would be wordy, and would be slow. Python moves that logic to its internals where it is succint, automatic, fast, and correct. This is one of the reasons I love Python :-)
Part of question regarding assignments length mismatch from input length
The OP also wanted to know the rationale for allowing assignments such as p[20:100] = [7,8] where the assignment target has a different length (80) than the replacement data length (2).
It's easiest to see the motivation by an analogy with strings. Consider, "five little monkeys".replace("little", "humongous"). Note that the target "little" has only six letters and "humongous" has nine. We can do the same with lists:
>>> s = list("five little monkeys")
>>> i = s.index('l')
>>> n = len('little')
>>> s[i : i+n ] = list("humongous")
>>> ''.join(s)
'five humongous monkeys'
This all comes down to convenience.
Prior to the introduction of the copy() and clear() methods, these used to be popular idioms:
s[:] = [] # clear a list
t = u[:] # copy a list
Even now, we use this to update lists when filtering:
s[:] = [x for x in s if not math.isnan(x)] # filter-out NaN values
Hope these practical examples give a good perspective on why slicing works as it does.
The documentation has your answer:
s[i:j]: slice of s from i to j (note (4))
(4) The slice of s from i to j is defined as the sequence of items
with index k such that i <= k < j. If i or j is greater than
len(s), use len(s). If i is omitted or None, use 0. If j
is omitted or None, use len(s). If i is greater than or equal to
j, the slice is empty.
The documentation of IndexError confirms this behavior:
exception IndexError
Raised when a sequence subscript is out of range. (Slice indices are silently truncated to fall in the allowed range; if an index is
not an integer, TypeError is raised.)
Essentially, stuff like p[20:100] is being reduced to p[len(p):len(p]. p[len(p):len(p] is an empty slice at the end of the list, and assigning a list to it will modify the end of the list to contain said list. Thus, it works like appending/extending the original list.
This behavior is the same as what happens when you assign a list to an empty slice anywhere in the original list. For example:
In [1]: p = [1, 2, 3, 4]
In [2]: p[2:2] = [42, 42, 42]
In [3]: p
Out[3]: [1, 2, 42, 42, 42, 3, 4]
I have a list of numbers:
a = [1,2,3,4,5,19,22,25,17,6,73,72,71,77,899,887,44,124, ...]
#this is an abbreviated version of the list
I need to determine if there are duplicates in the list or not using the XOR ("^") operator.
Can anyone give me any tips? I'm a newbie and have never encountered this problem or used the XOR operator before.
I've tried several approaches (which have amounted to blind stabs in the dark). The last one was this:
MyDuplicatesList = [1,5,12,156,166,2656,6,4,5,9] #changed the list to make it easer
for x in MyDuplicatesList:
if x^x:
print("True")
I realize I'm probably violating a protocol by asking such an open-ended question, but I'm completely stumped.
Why XOR?
# true if there are duplicates
print len(set(a)) != len(a)
Ok, THIS is pythonic. It finds all the duplicates and makes a list of them.
a = [1,2,3,4,5,19,22,25,17,6,73,72,71,77,899,887,44,124,1]
b = [a[i] for i in range(len(a)) for j in range(i+1,len(a)) if i ^ j > 0 if a[i] ^ a[j] < 1]
print b
Simplified version of Dalen's idea:
a = [1,2,3,4,5,19,22,25,17,6,73,72,71,77,899,887,44,124,1]
def simplifyDalen(source):
dup = 0
for x in source:
for y in source:
dup += x ^ y == 0
return dup ^ len(source) > 0
result = simplifyDalen(a)
if result:
print "There are duplicates!"
else:
print "There are no duplicates!"
So far my bit index idea is the fastest (because it's one pass algorithm, I guess, not many to many)
When you xor two same numbers, you get 0. As you should know.
from operator import xor
def check (lst):
dup = 0
for x in lst:
for y in lst:
dup += xor(x, y)!=0
l = len(lst)
return dup!=(l**2 -l)
c = check([0,1,2,3,4,5,4,3,5])
if c:
print "There are duplicates!"
else:
print "There are no duplicates!"
BTW, this is extremely stupid way to do it. XORing is fast, but O(n**2) (always through whole set) is unnecessary loss.
For one thing, the iteration should be stopped when first duplicate is encountered.
For another, this really should be done using set() or dict().
But you can get the idea.
Also, using xor() function instead of bitwise operator '^' slows the thing down a bit. I did it for clarity as I complicated the rest of the code. And so that people know that it exists as an alternative.
Here is an example on how to do it better.
It's a slight modification of code suggested by Organis in comments.
def check (lst):
l = len(lst)
# I skip comparing first with first and last with last
# to prevent 4 unnecessary iterations. It's not much, but it makes sense.
for x in xrange(1, l):
for y in xrange(l-1):
# Skip elements on same position as they will always xor to 0 :D
if x!=y: # Can be (if you insist): if x^y != 0:...
if (lst[x] ^ lst[y])==0:
return 1 # Duplicate found
return 0 # No duplicates
c = check([0,1,2,3,4,5,4,3,5])
if c:
print "There are duplicates!"
else:
print "There are no duplicates!"
To repeat, XOR is not to be used for comparison, at least not in high-level programming languages. It's possible you would need similar thing in assembly for some reason, but eq works nicely there as anywhere. If we simply used == here instead, we would be able to check for duplicates in lists containing anything, not just integers.
XOR is fancy for other uses, as for ordinary bitwising (masking, unmasking, changing bits...), so in encryption systems and similar stuff.
XOR takes the binary repersentation of a number and then compares it bit-by-bit with another and outputs a 1 if the two bits are different and a 0 otherwise. Example: 1 ^ 2 = 3 because in binary 1 is 01 and 2 is 10 so comparing bit-by-bit we get 11 or 3. XORing a number with itself always gives 0, so we can use this property to check if two numbers are the same. There are myriad better ways to check for duplicates in a list than using XOR, but if you want / need to do it this way, hopefully the above information gives you an idea of where to start.
Well, lets use list items as bit indices:
def dupliXor(source):
bigs = 0
for x in source:
b = 1<<x
nexts = bigs ^ b
# if xor removes the bit instead
# of adding it then it is duplicate
if nexts < bigs:
print True
return
bigs = nexts
print False
a = [1,2,3,4,5,19,22,25,17,6,73,72,71,77,899,887,44,124,1]
dupliXor(a) # True
a = [1,2,3,4,5,19,22,25,17,6,73,72,71,77,899,887,44,124]
dupliXor(a) # False
In python you can do list.pop(i) which removes and returns the element in index i, but is there a built in function like list.remove(e) where it removes and returns the first element equal to e?
Thanks
I mean, there is list.remove, yes.
>>> x = [1,2,3]
>>> x.remove(1)
>>> x
[2, 3]
I don't know why you need it to return the removed element, though. You've already passed it to list.remove, so you know what it is... I guess if you've overloaded __eq__ on the objects in the list so that it doesn't actually correspond to some reasonable notion of equality, you could have problems. But don't do that, because that would be terrible.
If you have done that terrible thing, it's not difficult to roll your own function that does this:
def remove_and_return(lst, item):
return lst.pop(lst.index(item))
Is there a builtin? No. Probably because if you already know the element you want to remove, then why bother returning it?1
The best you can do is get the index, and then pop it. Ultimately, this isn't such a big deal -- Chaining 2 O(n) algorithms is still O(n), so you still scale roughly the same ...
def extract(lst, item):
idx = lst.index(item)
return lst.pop(idx)
1Sure, there are pathological cases where the item returned might not be the item you already know... but they aren't important enough to warrant a new method which takes only 3 lines to write yourself :-)
Strictly speaking, you would need something like:
def remove(lst, e):
i = lst.index(e)
# error if e not in lst
a = lst[i]
lst.pop(i)
return a
Which would make sense only if e == a is true, but e is a is false, and you really need a instead of e.
In most case, though, I would say that this suggest something suspicious in your code.
A short version would be :
a = lst.pop(lst.index(e))
The first part of the question is to check if input A and input B are anagrams, which I can do easily enough.
s = input ("Word 1?")
b = sorted(s)
c = ''.join(b)
t = input("Word 2?")
a = sorted(t)
d = ''.join(b)
if d == c:
print("Anagram!")
else:
print("Not Anagram!")
The problem is the second part of the question - I need to check if two words are anagrams if all of the punctuation is removed, the upper case letters turned to lower case, but the question assumes no spaces are used. So, for example, (ACdB;,.Eo,."kl) and (oadcbE,LK) are anagrams. The question also asks for loops to be used.
s = input ("Word 1?")
s = s.lower()
for i in range (0, len(s)):
if ord(s[i]) < 97 or ord(s[i]) >122:
s = s.replace(s[i], '')
b = sorted(s)
c = ''.join(b)
print(c)
Currently, the above code is saying the string index is out of range.
Here's the loop you need to add, in psuedocode:
s = input ("Word 1?")
s_letters = ''
for letter in s:
if it's punctuation: skip it
else if it's uppercase: add the lowercase version to s_letters
else: add it to s_letters
b = sorted(s_letters)
Except of course that you need to add the same thing for t as well. If you've learned about functions, you will want to write this as a function, and call it twice, instead of copying and pasting it with minor changes.
There are three big problems with your loop. You need to solve all three of these, not just one.
First, s = s.replace(s[i], '') doesn't replace the ith character with a space, it replaces the ith character and every other copy of the same character with a space. That's going to screw up the rest of your loop if there are any duplicates. It's also very slow, because you have to search the entire string over and over again.
The right way to replace the character at a specific index is to use slicing: s = s[:i] + s[i+1:].
Or, you could make this a lot simpler by turning the string into a list of characters (s = list(s)), you can mutate it in-place (del s[i]).
Next, we're going through the loop 6 times, checking s[0], s[1], s[2], s[3], s[4], and s[5]. But somewhere along the way, we're going to remove some of the characters (ideally three of them). So some of those indices will be past the end of the string, which will raise an IndexError. I won't explain how to fix this yet, because it ties directly into the next problem.
Modifying a sequence while you loop over it always breaks your loop.* Imagine starting with s = '123abc'. Let's step through the loop.
i = 0, so you check s[0], which is 1, so you remove it, leaving s = '23abc'.
i = 1, so you check s[1], which is 3, so you remove it, leaving s = '2abc'.
i = 2, so you check s[2], which is b, so you leave it, leaving s = '2abc'.
And so on.
The 2 got moved to s[0] by removing the 1. But you're never going to come back to i = 0 once you've passed it. So, you're never going to check the 2. You can solve this in a few different ways—iterating backward, doing a while instead of an if each time through the for, etc.—but most of those solutions will just exacerbate the previous problem.
The easy way to solve both problems is to just not modify the string while you loop over it. You could do this by, e.g., building up a list of indexes to remove as you go along, then applying that in reverse order.
But a much easier way to do it is to just build up the characters you want to keep as you go along. And that also solves the first problem for your automatically.
So:
new_s = []
for i in range (0, len(s)):
if ord(s[i]) < 97 or ord(s[i]) >122:
pass
else:
new_s.append(s[i])
b = sorted(new_s)
And with that relative minor change, your code works.
While we're at it, there are a few ways you're overcomplicating things.
First, you don't need to do ord(s[i]) < 97; you can just do s[i] < 'a'. This makes things a lot more readable.
But, even more simply, you can just use the isalpha or islower method. (Since you've already converted to lower, and you're only dealing with one character at a time, it doesn't really matter which.) Besides being more readable, and harder to get wrong, this has the advantage of working with non-ASCII characters, like é.
Finally, you almost never want to write a loop like this:
for i in range(len(s)):
That forces you to write s[i] all over the place, when you could have just looped over s in the first place:
for ch in s:
So, putting it all together, here's your code, with the two simple fixes, and the cleanup:
s = input ("Word 1?")
s = s.lower()
new_s = []
for ch in s:
if ch.isalpha():
new_s.append(ch)
b = sorted(new_s)
c = ''.join(b)
print(c)
If you know about comprehensions or higher-order functions, you'll recognize this pattern as exactly what a list comprehension does. So, you can turn the whole 4 lines of code that build new_s into either of these one-liners, which are more readable as well as being shorter:
new_s = (ch for ch in s if ch.isalpha)
new_s = filter(str.isalpha, s)
And in fact, the whole thing can become a one-liner:
b = sorted(ch for ch in s.lower() if ch.isalpha)
But your teacher asked you to use a for statement, so you'd better keep it as a for statement.
* This isn't quite true. If you only modify the part of the sequence after the current index, and you make sure the sequence aways has the right length by the time you get to each index even though it may have had a different length before you did (using a while loop instead of a for loop, to reevaluate len(seq) each time, makes this part trivial instead of hard), then it works. But it's easier to just never do it to than learn the rules and carefully analyze your code to see if you're getting away with it this time.