How to understand python slicing with a negative k index? - python

Can someone explain why a[:5:-1] != a[:5][::-1]?
>>> a = list(range(10))
>>> a[:5][::-1]
[4, 3, 2, 1, 0]
>>> a[:5:-1]
[9, 8, 7, 6]

The general syntax of slicings is
a[start:stop:step]
You can omit any of the three values start, stop, or step. If you omit step, it always defaults to 1. The default values of start and stop, by contrast, depend on the sign of step: if step is positive, start defaults to 0 and stop to len(a). If step is negative, start defaults to len(a) - 1 and stop to "beginning of the list".
So a[:5:-1] is the same as a[9:5:-1] here,
while a[:5][::-1] is the same as a[0:5][4::-1].
(Note that the default value for stop cannot be written as a non-negative index if step is negative. The stop value is non-inclusive, so 0 would be different from "beginning of the list". Using None would be equivalent to giving no value at all.)
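The defaults described above can be inspected with the built-in slice.indices method, which reports the concrete (start, stop, step) that a slice resolves to for a given length. The snippet below is only a small sketch with illustrative variable names. Note that the stop of -1 reported for a negative step is meant to be passed to range(), where it means "one before index 0", not the usual negative index.

```python
a = list(range(10))

# slice.indices(length) resolves omitted values to concrete numbers.
print(slice(None, 5, -1).indices(len(a)))     # (9, 5, -1)
print(slice(None, 5, None).indices(len(a)))   # (0, 5, 1)
print(slice(None, None, -1).indices(len(a)))  # (9, -1, -1)

# The resolved triple can be fed to range() to list the selected indices.
picked = [a[x] for x in range(*slice(None, 5, -1).indices(len(a)))]
print(picked)  # [9, 8, 7, 6]
```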

a[:5][::-1] tells Python to first take the elements up to (but not including) index 5 and then reverse that result (step through it one element at a time, starting from its last element).
By contrast, a[:5:-1] tells Python to start from the last element of the whole list and step backwards one element at a time until it reaches index 5, which is excluded.

a[:5] returns a new list holding indexes 0 through 4, which you then traverse with a negative step as a second operation. a[:5:-1] applies the negative step to the original list.

Related

How do we use minus numbers for step in range

I tried to use -1 for the step in range to reverse the list, in the first code it gave an empty list and in the second, I got what I wanted.
print(list(range(0, 5, -1)))
# Output: []
print(list(range(5, -1, -1)))
# Output: [5, 4, 3, 2, 1, 0]
How do we understand this?
The range generator does a sanity check. If start is lower than end and step is negative then that's impossible - hence the empty list. In your case, you can never get to 5 by decrementing from zero.
In the second case, the range generator will stop generating when end has been reached - i.e. it does not generate the value of end
range(0, 5, -1) -> starting at 0, you cannot reach 5 by successively adding -1. No numbers are encountered along the way.
range(5, -1, -1) -> starting at 5, you can reach -1 by successively adding -1, and it yields the numbers that it'll encounter along the way.
The range function works like this when you pass 3 arguments:
range(begin, end, step)
In this case you begin from the begin value and keep adding step; the sequence stops before end is reached, so end itself is never produced. Remember that in Python the end value is excluded (which differs from other languages such as Matlab, for example).
You must make sure the sign of step is consistent with the values you pass as begin and end, as @Kache was saying.
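To make the "keep adding step until end would be reached" description concrete, here is a rough pure-Python sketch. The name simple_range is made up for this illustration and it returns a list, whereas the real range is a lazy sequence object.

```python
def simple_range(begin, end, step):
    """Hypothetical re-implementation of range(begin, end, step) as a list."""
    if step == 0:
        raise ValueError("step must not be zero")
    values = []
    current = begin
    # Moving up: continue while below end. Moving down: continue while above end.
    while (step > 0 and current < end) or (step < 0 and current > end):
        values.append(current)
        current += step
    return values

print(simple_range(0, 5, -1))   # []
print(simple_range(5, -1, -1))  # [5, 4, 3, 2, 1, 0]
print(simple_range(10, 0, -3))  # [10, 7, 4, 1]
```

With begin=0, end=5 and step=-1 the loop condition is false on the first check, which is why nothing is produced.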

Why does a[1:-1:-1] with a=[1,2,3] return []?

I am observing that if a is a list (or a numpy array) with elements [1,2,3] and I ask for a[1:-1:-1], then I get the empty list. I would expect to get [2,1] assuming that the slicing spans the indexes obtainable decrementing from 1 to -1 excluding the last value (that is excluding -1), that is indexes 1 and 0.
The actual behavior may have some justification but makes things more complex than expected when one needs to take a subarray of an array a starting from some generic index i to index i+m (excluded) in reverse order. One would tend to write a[i+m-1:i-1:-1] but this suddenly breaks if i is set to 0. The fact that it works for all i but zero looks like a nasty inconsistency. Obviously, there are workarounds:
one could write a[i+m-1-n:i-1-n:-1] offsetting everything by -n where n is the array length; or
one could write a[i:i+m][::-1].
However, in case 1 the need to know the array length appears rather unnatural and in case 2 the double indexing appears as a not very justified overhead if the slicing is done in a tight loop.
Is there any important reason that I am missing for which it is important that the behavior is as it is?
Has this issue been considered by the NumPy community?
Is there some better workaround than those I came up with?
Numpy has adopted this behavior from Python's sequence indexing for which the rules are explained here (for some history see below). Specifically footnote (5) reads:
The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n < (j-i)/k. In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). When k is positive, i and j are reduced to len(s) if they are greater. When k is negative, i and j are reduced to len(s) - 1 if they are greater. If i or j are omitted or None, they become “end” values (which end depends on the sign of k). Note, k cannot be zero. If k is None, it is treated like 1.
So the indices are generated from multipliers n subject to 0 <= n < (j-i)/k. For your specific example (j-i)/k < 0 and hence no indices are computed.
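The footnote quoted above can be turned into a few lines of code. This is only a sketch: the function name indices_from_formula is hypothetical, and normalising i and j (None, negative values, clamping) is delegated to the built-in slice.indices. For a[1:-1:-1] with len(a) == 3, the stop -1 is normalised to 2, so (j-i)/k = (2-1)/(-1) = -1 and no n satisfies 0 <= n < -1.

```python
def indices_from_formula(length, i, j, k):
    """Indices x = i + n*k with 0 <= n < (j - i) / k, after normalising i and j."""
    # slice.indices handles None, negative values and clamping to the length.
    i, j, k = slice(i, j, k).indices(length)
    result = []
    n = 0
    # n < (j - i) / k, written without division: the inequality flips for k < 0.
    while (n * k < j - i) if k > 0 else (n * k > j - i):
        result.append(i + n * k)
        n += 1
    return result

a = [1, 2, 3]
print(indices_from_formula(len(a), 1, -1, -1))    # []
print(indices_from_formula(len(a), 1, None, -1))  # [1, 0]
```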
For Numpy arrays a[i:i+m][::-1] generates a view of the underlying array, i.e. it has negligible overhead and thus appears to be a valid solution. It clearly conveys the intent, namely "take a subarray of an array a starting from some generic index i to index i+m (excluded) in reverse order".
Alternatively, you can use None as the stop argument if i is zero:
a[i+m-1:(None if i==0 else i-1):-1]
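The None-based workaround can be wrapped in a small helper and compared against the two-step form. The helper name reversed_window is invented for this sketch, and it assumes 0 <= i, m >= 1 and i + m <= len(a).

```python
def reversed_window(a, i, m):
    """Return a[i:i+m] in reverse order using a single slice (hypothetical helper).

    Assumes 0 <= i, m >= 1 and i + m <= len(a).
    """
    # A stop of i - 1 would be -1 for i == 0, which means "last element".
    stop = None if i == 0 else i - 1
    return a[i + m - 1:stop:-1]

a = [1, 2, 3, 4, 5]
print(reversed_window(a, 0, 2))  # [2, 1]
print(reversed_window(a, 1, 3))  # [4, 3, 2]
```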
History
Originally, Python implemented slicing syntax via __getslice__ (see also here) which didn't allow a step argument, i.e. it only used the 2-argument form: a[i:j]. This was implemented by built-in sequences such as list. Back then, around 1995, the predecessor of Numpy, Numerical Python, was developed and discussed within the MATRIX-SIG (special interest group). This predecessor implemented a specific Slice type which could be used to also specify a so called stride (now step) in a form very similar to today's slice: e.g. a[Slice(None, None, 2)]. It was asked to extend Python's syntax to allow for the 3-form slicing known today: a[::2] (see e.g. this thread). This got implemented in form of the slice type and would be passed to __getitem__ instead of __getslice__. So back then, a[i:j] was resolved as a.__getslice__(i, j) while a[i:j:k] was resolved as a.__getitem__(slice(i, j, k)). Back then, Numerical Python even allowed "reverse" slicing with the 2-form, interpreting the second argument as the stride (see the docs; e.g. a[i:-1] was equivalent to a[i::-1] for an array object a). Indexing of arrays was oriented at how indexing for Python sequences worked: including the start index, excluding the stop index (see here). This applied to negative stride (step) as well, hence providing the behavior that can be observed today. The decision was probably based on the principle of least surprise (for "standard" Python users).
It took a long time until Python 2.3 where the extended slicing feature including a step was implemented for the built-in types (see what's new and the docs; note that the 2.3 version of the docs contained a wrong description of slicing with step which was fixed for the 2.4 release).
-1 as an index has a special meaning [1], it's replaced with the highest possible = last index of a list.
So a[1:-1:-1] becomes a[1:2:-1] which is empty.
[1] Actually, all negative indices in Python work like that. -1 means the last element of a list, -2 the second-to-last, -3 the one before that and so on.
List[1:-1:-1] means List[start index : end index : jump].
Indexing in a list:
Number           1   2   3
Index            0   1   2
Negative index  -3  -2  -1
So, for the list a = [1, 2, 3], a[1:-1:-1] means start index = 1, end index = -1, jump = -1.
The slice would have to run from index 1 (the number 2) to index -1 (the number 3), but a jump of -1 moves backwards, away from the end index.
So it returns an empty list, i.e. [].
As others noted -1 as end point has special meaning
In [66]: a=[1,2,3]
Slice back to the beginning is best done with None:
In [68]: a[1::-1]
Out[68]: [2, 1]
In [69]: a[1:None:-1]
Out[69]: [2, 1]
Working with slices that could cross boundaries, either side can be tricky:
In [75]: [a[i+2-1:i-1:-1] for i in range(4)]
Out[75]: [[], [3, 2], [3], []]
simplify a bit:
In [77]: [a[i+2:i:-1] for i in range(-1,3)]
Out[77]: [[], [3, 2], [3], []]
We can correct the lower boundary by using a conditional expression:
In [78]: [a[i+2:None if i<0 else i:-1] for i in range(-1,3)]
Out[78]: [[2, 1], [3, 2], [3], []]

Non-tail recursion within a for loop

Given an array of numbers, find the length of the longest increasing subsequence in the array. The subsequence does not necessarily have to be contiguous.
For example, given the array [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15], the longest increasing subsequence has length 6: it is 0, 2, 6, 9, 11, 15.
One of the solutions to the above problem uses non-tail recursion within a for loop, and I am having trouble making sense of it. I don't understand when the code after the recursive call in the for loop is executed, and I can't visualize the entire execution process of the whole solution.
def longest_increasing_subsequence(arr):
    if not arr:
        return 0
    if len(arr) == 1:
        return 1
    max_ending_here = 0
    for i in range(len(arr)):
        ending_at_i = longest_increasing_subsequence(arr[:i])
        if arr[-1] > arr[i - 1] and ending_at_i + 1 > max_ending_here:
            max_ending_here = ending_at_i + 1
    return max_ending_here
The description of the solution is as follows:
Assume that we already have a function that gives us the length of the longest increasing subsequence. Then we’ll try to feed some part of our input array back to it and try to extend the result. Our base cases are: the empty list, returning 0, and an array with one element, returning 1.
Then,
For every index i up until the second to last element, calculate longest_increasing_subsequence up to there.
We can only extend the result with the last element if our last element is greater than arr[i] (since otherwise, it’s not increasing).
Keep track of the largest result.
Source: https://www.dailycodingproblem.com/blog/longest-increasing-subsequence/
**EDITS**:
What I mean by I don't understand when the code after the recursive call in the for loop is executed. Here is my understanding:
Some code calls lis([0, 8, 4, 12, 2]).
arr = [0, 8, 4, 12, 2] doesn't meet either of the two base cases.
The for loop makes the first call when i = 0 in the line ending_at_i = lis([]). This is the first base case, so it returns 0. I can't understand why control doesn't return to the for loop so that ending_at_i is set to 0, and the if condition is executed (because it surely isn't checked else [][-1] would throw an error), after which we can move on to the for loop making the second call when i = 1, third call when i = 2 which would branch into two calls, and so on.
Here's how this function works. First, it handles the degenerate cases where the list length is 0 or 1.
It then looks for the solution when the list length is >= 2. There are two possibilities for the longest sequence: (1) It may contain the last number in the list, or (2) It may not contain the last number in the list.
For case (1), if the last number in the list is in the longest sequence, then the number before it in the longest sequence must be one of the earlier numbers. Suppose the number before it in the sequence is at position x. Then the longest sequence is the longest sequence taken from the numbers in the list up to and including x, plus the last number in the list. So it recurses on all of the possible positions of x, which are 0 through the list length minus 2. It iterates i over range(len(arr)), which is 0 through len(arr)-1. But it then uses i as the upper bound in the slice, so the last element in the slice corresponds to indices -1 through len(arr)-2. In the case of -1, this is an empty slice, which handles the case where all values in the list before the last are >= the last element.
This handles case (1). For case (2), we just need to find the largest sequence from the sublist that excludes the last element. However, this check is missing from the posted code, which is why the wrong answer is given for a list like [1, 2, 3, 0]:
>>> longest_increasing_subsequence([1, 2, 3, 0])
0
>>>
Obviously the correct answer in this case is 3, not 0. This is fairly easy to fix, but somehow was left out of the posted version.
Also, as others have pointed out, creating a new slice each time it recurses is unnecessary and inefficient. All that's needed is to pass the length of the sublist to achieve the same result.
Here is a (hopefully good enough) explanation:
ending_at_i = the length of the LIS when you clip arr at the i-th index (that is, considering elements arr[0], arr[1], ..., arr[i-1]).
if arr[-1] > arr[i - 1] and ending_at_i + 1 > max_ending_here
if arr[-1] > arr[i - 1] = if the last element of arr is greater than the last element of the part of arr corresponding to ending_at_i
if ending_at_i + 1 > max_ending_here = if appending the last element of arr to the LIS found during computing ending_at_i is larger than the current best LIS
The recursive step is then:
1. Let an oracle tell you the length of the LIS in arr[:i] (= arr[0], arr[1], ..., arr[i-1]).
2. Realize that, if the last element of arr, that is, arr[-1], is larger than the last element of arr[:i], then whatever the LIS inside arr[:i] was, if you take it and append arr[-1], it will still be an LIS, except that it will be one element larger.
3. Check whether arr[-1] is actually larger than arr[i-1] (= arr[:i][-1]).
4. Check whether appending arr[-1] to the LIS of arr[:i] creates the new optimal solution.
Repeat the steps above for each i in range(len(arr)).
The result will be the knowledge of the length of the LIS inside arr.
All that being said, since the recursive substep of this algorithm runs in O(n), there are very few worse feasible solutions to the problem.
You tagged dynamic programming, however, this is precisely the anti-example of such. Dynamic programming lets you reuse the solutions to subproblems, which is precisely what this algorithm doesn't do, hence wasting time. Check out a DP solution instead.
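For reference, one common dynamic-programming formulation is sketched below. It is the standard textbook O(n^2) table approach, not a repair of the posted code, and the function name lis_length_dp is made up for this example. Each table entry is computed once and reused, which is the property the recursive version lacks.

```python
def lis_length_dp(arr):
    """Length of the longest strictly increasing subsequence, O(n^2) time."""
    if not arr:
        return 0
    # best[i] = length of the longest increasing subsequence that ends at arr[i].
    best = [1] * len(arr)
    for i in range(1, len(arr)):
        for j in range(i):
            # arr[i] can extend any increasing subsequence ending at a smaller arr[j].
            if arr[j] < arr[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best)

example = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print(lis_length_dp(example))       # 6
print(lis_length_dp([1, 2, 3, 0]))  # 3
```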

starting range() as len(<some_list>) VS starting range() as len(<some_list>)-1 [Inclusive/Exclusive end]

I can't seem to understand when the end value is included and when it is not, depending on whether range() starts at the length of the list or at the length minus one (the last position), without changing the end or the step value.
I wanted to pop elements from a list within a loop
(Note: I know that setting range(0,len(colors_list)) like this will do the trick too)
colors_list = ["green","blue","yellow","pink","violet","black"]
I tried this snippet of code, and the end was included :-
for color in range(len(colors_list),0,-1):
    colors_list.pop()
print(colors_list)
Output : []
and I tried this too, but here the end was excluded :-
for color in range(len(colors_list)-1,0,-1):
    colors_list.pop()
print(colors_list)
Output : ["green"]
I understand the second trial, where the 0th index is not popped, but the first trial is what I don't understand. Isn't it supposed to behave the same as the second, considering that it stops before the 0th index? Instead, the first element is popped as well.
The end was excluded in both cases. You ran the loop one fewer times in the second case, because you added the -1 to the start condition. The end value is always exclusive, the start value is always inclusive.
Just listify and print the ranges and you'll see:
>>> colors_list = ["green","blue","yellow","pink","violet","black"]
>>> print(list(range(len(colors_list),0,-1)))
[6, 5, 4, 3, 2, 1]
>>> print(list(range(len(colors_list)-1,0,-1)))
[5, 4, 3, 2, 1]
No 0 on either, you just began the loop from 6 on one, and 5 on the other.
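The same point can be checked by counting iterations: the loop variable is never used as an index, so the number of pops simply equals the number of values in the range. The sketch below reruns both loops on copies of the list; the variable names are illustrative only.

```python
colors = ["green", "blue", "yellow", "pink", "violet", "black"]

for start in (len(colors), len(colors) - 1):
    remaining = list(colors)           # work on a fresh copy each time
    loop_values = range(start, 0, -1)  # the stop value 0 is never produced
    for _ in loop_values:
        remaining.pop()                # one pop per value in the range
    print(start, len(loop_values), remaining)
# 6 6 []
# 5 5 ['green']
```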

How to explain the reverse of a sequence by slice notation a[::-1]

From the python.org tutorial
Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to the size of the string being sliced.
>>> a = "hello"
>>> print(a[::-1])
olleh
According to the tutorial, a[::-1] should be equal to a[0:5:-1],
but a[0:5:-1] is empty as follows:
>>> print(len(a[0:5:-1]))
0
The question is not a duplicate of explain-slice-notation. That question is about the general use of slicing in python.
I think the docs are perhaps a little misleading on this, but the optional arguments of slicing if omitted are the same as using None:
>>> a = "hello"
>>> a[::-1]
'olleh'
>>> a[None:None:-1]
'olleh'
You can see that these 2 above slices are identical from the CPython bytecode:
>>> import dis
>>> dis.dis('a[::-1]') # or dis.dis('a[None:None:-1]')
  1           0 LOAD_NAME                0 (a)
              3 LOAD_CONST               0 (None)
              6 LOAD_CONST               0 (None)
              9 LOAD_CONST               2 (-1)
             12 BUILD_SLICE              3
             15 BINARY_SUBSCR
             16 RETURN_VALUE
For a negative step, the substituted values for None are len(a) - 1 for the start and -len(a) - 1 for the end:
>>> a[len(a)-1:-len(a)-1:-1]
'olleh'
>>> a[4:-6:-1]
'olleh'
>>> a[-1:-6:-1]
'olleh'
This may help you visualize it:
    h  e  l  l  o
    0  1  2  3  4  5
-6 -5 -4 -3 -2 -1
You are confused with the behavior of the stepping. To get the same result, what you can do is:
a[0:5][::-1]
'olleh'
Indeed, stepping wants to 'circle' around backwards in your case, but you are limiting its movement by calling a[0:5:-1].
All it does is slice. You pick start, stop and step, so with a[0:5:-1] you're basically saying it should start at the beginning and run to index 5, but going backwards (-1).
If you do it with -2 it will skip letters:
>>> a[::-2]
'olh'
When doing [0:5:-1] you're starting at the first letter and trying to go backwards to 5, so it stops immediately. Only if you try [-1::-1] will it be able to go back to the beginning in steps of negative 1.
Edit to answer comments
As pointed out the documentation says
an omitted second index defaults to the size of the string being sliced.
Let's assume we have str with len(str) = 5. When you slice the string and omit the second number, it defaults to the length of the string being sliced, in this case 5.
i.e str[1:] == str[1:5], str[2:] == str[2:5]. The sentence refers to the length of the original object and not the newly sliced object.
Also, this answer is great
a[0:5:-1] does not make much sense, since when you use this notation the indices mean: a[start:end:step]. When you use a negative step your end value needs to be at an "earlier" position than your start value.
You'll notice that the third slice argument, the step, is not presented in the part of the tutorial you quoted. That particular snippet assumes a positive step.
When you add in the possibility of a negative step, the behavior is actually pretty intuitive. An empty start parameter refers to whichever end of the sequence one would start at to step through the whole sequence in the direction indicated by the step value. In other words it refers to the lowest index (to count up) if you have a positive step, and the highest index (to count down) if you have a negative step. Likewise, an empty end parameter refers to whichever end of the sequence one would end up at after stepping through in the appropriate direction.
The docs simply aren't correct about the default values as you've pointed out. However, they're consistent other than that minor error. You can view the docs I am referring to here: https://docs.python.org/3/library/stdtypes.html#common-sequence-operations
Note that the behavior is correct by definition according to the docs:
The slice of s from i to j with step k is defined as the sequence of
items with index x = i + n*k such that 0 <= n < (j-i)/k. In other
words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j
is reached (but never including j).
When you do:
>>> a = "hello"
>>> y = a[0:5:-1]
we have that i == 0, j == 5, and k == -1. So we are grabbing items at index x = i + n*k for n starting at 0 and going up to (j-i)/k. However, observe that (j-i)/k == (5-0)/-1 == -5. There are no n such that 0 <= n < -5, so you get the empty string:
>>> y
''
Do a[start:stop][::step] when in doubt (it's almost always what we want)
It's almost always the case that when you pass a negative step to something like x[start:stop:step], what you want to happen is for the sub selection to happen first, and then just go backwards by step (i.e. we usually want x[start:stop][::step]).
Furthermore, to add to the confusion, it happens to be the case that
x[start:stop:step] == x[start:stop][::step]
if step > 0. For example:
>>> x = list(range(10))
>>> x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x[2:6:2]
[2, 4]
>>> x[2:6][::2]
[2, 4]
>>> x[1:10][::3]
[1, 4, 7]
>>> x[1:10:3]
[1, 4, 7]
Unfortunately, this doesn't hold when step < 0, even though it's tempting to think that it should.
After being burned by this a couple times, I realized it's just safer to always do the step clause after you perform the start:stop slice. So I almost always start with y = x[start:stop][::step], at least when prototyping or creating a new module where correctness/readability is the primary concern. This is less performant than doing a single slice, but if performance is an issue, then you can do the less readable single slice below. Note that for a negative step the bounds have to be shifted by one, not just swapped, and that this form assumes 0 <= start < stop <= len(x):
y = x[start:stop:step] if step > 0 else x[stop - 1:(start - 1 if start > 0 else None):step]
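As a cross-check of how the two forms relate for negative steps, the sketch below compares x[start:stop][::step] with a single slice whose bounds are shifted by one. Simply swapping start and stop is off by one, as the last line shows. The helper names two_step and one_slice are hypothetical, and one_slice assumes 0 <= start < stop <= len(x).

```python
def two_step(x, start, stop, step):
    """Select the window first, then apply the (possibly negative) step."""
    return x[start:stop][::step]

def one_slice(x, start, stop, step):
    """Hypothetical single-slice equivalent; assumes 0 <= start < stop <= len(x)."""
    if step > 0:
        return x[start:stop:step]
    # Walk backwards from the last item of the window down to start (inclusive).
    lower = start - 1 if start > 0 else None
    return x[stop - 1:lower:step]

x = list(range(10))
print(two_step(x, 2, 6, -1))   # [5, 4, 3, 2]
print(one_slice(x, 2, 6, -1))  # [5, 4, 3, 2]
print(x[6:2:-1])               # [6, 5, 4, 3]
```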
HTH.
For Python slicing of the form sequence[start:stop:step], I have derived these rules:
start:stop is the same as start:stop:1
start:stop:(+ or -)step means moving step positions at a time while traversing the sequence; a negative step indicates backward traversal
Remember, the position of the last item in a sequence is -1, the one before that is -2, and so on.
# start:stop:+step rules
Always traverse forward
The default starting point is the beginning of the sequence, as it's a positive (forward) step
Start at the requested position and stop at the requested position, but exclude the item at the stop position
Default start: if start is not provided, start at 0
Default stop: if stop is not provided, traverse until the end of the sequence, including the last value
If the stop position cannot be reached by moving forward from start, the slice is empty
# start:stop:-step rules
Always traverse in reverse
If a start position is provided, start from there, but traverse in reverse (each step goes backwards)
If stop is provided, stop traversing there, but exclude the item at the stop position
Default start: if start is not provided, the start position is the last position of the sequence (since the traversal is backwards)
Default stop: if stop is not provided, traverse to the beginning of the sequence, including position 0
If the stop position cannot be reached by moving backwards from start, the slice is empty
