So I just came across what seems to me like a strange Python feature and wanted some clarification about it.
The following array manipulation somewhat makes sense:
p = [1,2,3]
p[3:] = [4]
p = [1,2,3,4]
I imagine it is actually just appending this value to the end, correct?
Why can I do this, however?
p[20:22] = [5,6]
p = [1,2,3,4,5,6]
And even more so this:
p[20:100] = [7,8]
p = [1,2,3,4,5,6,7,8]
This just seems like wrong logic. It seems like this should throw an error!
Any explanation?
-Is it just a weird thing Python does?
-Is there a purpose to it?
-Or am I thinking about this the wrong way?
Part of the question regarding out-of-range indices
Slice logic automatically clips the indices to the length of the sequence.
Allowing slice indices to extend past end points was done for convenience. It would be a pain to have to range check every expression and then adjust the limits manually, so Python does it for you.
Consider the use case of wanting to display no more than the first 50 characters of a text message.
The easy way (what Python does now):
preview = msg[:50]
Or the hard way (do the limit checks yourself):
n = len(msg)
preview = msg[:50] if n > 50 else msg
Manually implementing that logic for adjustment of end points would be easy to forget, would be easy to get wrong (updating the 50 in two places), would be wordy, and would be slow. Python moves that logic to its internals where it is succinct, automatic, fast, and correct. This is one of the reasons I love Python :-)
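To see the clipping directly, here is a minimal sketch. The sample message is made up for illustration; slice.indices() is the built-in method that reports the bounds a slice resolves to for a given length.

```python
# Hypothetical sample message; any string shorter than 50 characters works.
msg = "hello"

# A slice bound past the end is clipped to len(msg), so no manual check is needed.
preview = msg[:50]

# slice.indices(length) returns the clipped (start, stop, step) triple.
clipped = slice(None, 50).indices(len(msg))

print(preview)  # hello
print(clipped)  # (0, 5, 1)
```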
Part of the question regarding a length mismatch between the assignment target and the replacement data
The OP also wanted to know the rationale for allowing assignments such as p[20:100] = [7,8] where the assignment target has a different length (80) than the replacement data length (2).
It's easiest to see the motivation by an analogy with strings. Consider, "five little monkeys".replace("little", "humongous"). Note that the target "little" has only six letters and "humongous" has nine. We can do the same with lists:
>>> s = list("five little monkeys")
>>> i = s.index('l')
>>> n = len('little')
>>> s[i : i+n] = list("humongous")
>>> ''.join(s)
'five humongous monkeys'
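The same thing with plain numbers, as a small sketch (the values are arbitrary): a slice assignment with the default step can grow or shrink the list, because the replacement does not have to match the length of the target.

```python
p = [1, 2, 3, 4, 5]

# Replace two items with four: the list grows by two.
p[1:3] = ["a", "b", "c", "d"]
grown = list(p)

# Replace four items with none: the list shrinks by four.
p[1:5] = []
shrunk = list(p)

print(grown)   # [1, 'a', 'b', 'c', 'd', 4, 5]
print(shrunk)  # [1, 4, 5]
```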
This all comes down to convenience.
Prior to the introduction of the copy() and clear() methods, these used to be popular idioms:
s[:] = [] # clear a list
t = u[:] # copy a list
Even now, we use this to update lists when filtering:
s[:] = [x for x in s if not math.isnan(x)] # filter-out NaN values
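A short sketch of why the slice form still matters for filtering: it changes the existing list object, so every other name bound to that list sees the update, whereas plain assignment only rebinds one name. The name alias and the sample values are illustrative assumptions.

```python
import math

s = [1.0, float("nan"), 2.0]
alias = s  # a second name for the same list object

# Slice assignment mutates the existing list, so the alias sees the change.
s[:] = [x for x in s if not math.isnan(x)]
same_after_slice = alias is s
seen_by_alias = list(alias)

# Plain assignment rebinds s to a new list; the alias keeps the old object.
s = [x for x in s if x > 1.0]
same_after_rebind = alias is s

print(seen_by_alias)      # [1.0, 2.0]
print(same_after_slice)   # True
print(same_after_rebind)  # False
```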
Hope these practical examples give a good perspective on why slicing works as it does.
The documentation has your answer:
s[i:j]: slice of s from i to j (note (4))
(4) The slice of s from i to j is defined as the sequence of items
with index k such that i <= k < j. If i or j is greater than
len(s), use len(s). If i is omitted or None, use 0. If j
is omitted or None, use len(s). If i is greater than or equal to
j, the slice is empty.
The documentation of IndexError confirms this behavior:
exception IndexError
Raised when a sequence subscript is out of range. (Slice indices are silently truncated to fall in the allowed range; if an index is
not an integer, TypeError is raised.)
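Note (4) can be sketched in a few lines of Python. The helper clip_bounds below is hypothetical, written only to mirror the quoted rule; it is not how the interpreter implements slicing, and it assumes non-negative bounds (negative indices are covered by a different note in the documentation).

```python
def clip_bounds(i, j, length):
    """Hypothetical helper mirroring note (4); non-negative bounds only."""
    i = 0 if i is None else min(i, length)
    j = length if j is None else min(j, length)
    return i, j

s = "example"
cases = [(2, 5), (999, 9999), (None, 50), (5, 2)]

# Each clipped pair should select the same items as the original slice.
matches = [s[i:j] == s[slice(*clip_bounds(i, j, len(s)))] for i, j in cases]

print(clip_bounds(999, 9999, len(s)))  # (7, 7)
print(matches)                         # [True, True, True, True]
```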
Essentially, stuff like p[20:100] is being reduced to p[len(p):len(p)]. p[len(p):len(p)] is an empty slice at the end of the list, and assigning a list to it will modify the end of the list to contain said list. Thus, it works like appending/extending the original list.
This behavior is the same as what happens when you assign a list to an empty slice anywhere in the original list. For example:
In [1]: p = [1, 2, 3, 4]
In [2]: p[2:2] = [42, 42, 42]
In [3]: p
Out[3]: [1, 2, 42, 42, 42, 3, 4]
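As a quick check of that equivalence, this sketch applies the out-of-range slice assignment from the question to one list and extend() to a copy, then compares them. The name q is just an illustrative copy.

```python
p = [1, 2, 3, 4, 5, 6]
q = list(p)  # independent copy to compare against

# Both bounds clip to len(p), so the target is an empty slice at the end.
p[20:100] = [7, 8]
q.extend([7, 8])

print(p)       # [1, 2, 3, 4, 5, 6, 7, 8]
print(p == q)  # True
```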
Related
Why doesn't 'example'[999:9999] result in error? Since 'example'[9] does, what is the motivation behind it?
From this behavior I can assume that 'example'[3] is, essentially/internally, not the same as 'example'[3:4], even though both result in the same 'm' string.
You're correct! 'example'[3:4] and 'example'[3] are fundamentally different, and slicing outside the bounds of a sequence (at least for built-ins) doesn't cause an error.
It might be surprising at first, but it makes sense when you think about it. Indexing returns a single item, but slicing returns a subsequence of items. So when you try to index a nonexistent value, there's nothing to return. But when you slice a sequence outside of bounds, you can still return an empty sequence.
Part of what's confusing here is that strings behave a little differently from lists. Look what happens when you do the same thing to a list:
>>> [0, 1, 2, 3, 4, 5][3]
3
>>> [0, 1, 2, 3, 4, 5][3:4]
[3]
Here the difference is obvious. In the case of strings, the results appear to be identical because in Python, there's no such thing as an individual character outside of a string. A single character is just a 1-character string.
(For the exact semantics of slicing outside the range of a sequence, see mgilson's answer.)
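Putting the comparison side by side, as a small sketch using the values from the question: indexing and slicing agree for the string only because a single character is itself a string, and only indexing raises when out of range.

```python
word = "example"
items = [0, 1, 2, 3, 4, 5]

str_same = word[3] == word[3:4]     # both are the str 'm'
list_same = items[3] == items[3:4]  # 3 versus [3]

empty = word[999:9999]  # out-of-range slice: empty string, no error

try:
    word[9]  # out-of-range index: raises
    raised = False
except IndexError:
    raised = True

print(str_same, list_same)  # True False
print(repr(empty), raised)  # '' True
```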
For the sake of adding an answer that points to a robust section in the documentation:
Given a slice expression like s[i:j:k],
The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n < (j-i)/k. In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). When k is positive, i and j are reduced to len(s) if they are greater.
If you write s[999:9999], Python returns s[len(s):len(s)], since len(s) < 999 and your step is positive (1, the default).
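You can ask a slice object what it resolves to with the built-in slice.indices() method. A minimal sketch follows; the negative-step case goes slightly beyond the quoted passage, and there an oversized start is reduced to len(s) - 1 instead.

```python
s = "example"  # len(s) == 7

# Positive step: both oversized bounds are reduced to len(s).
forward = slice(999, 9999).indices(len(s))

# Negative step: an oversized start is reduced to len(s) - 1.
backward = slice(999, None, -1).indices(len(s))

print(forward)            # (7, 7, 1)
print(backward)           # (6, -1, -1)
print(repr(s[999:9999]))  # ''
print(s[999::-1])         # elpmaxe
```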
Slicing is not bounds-checked by the built-in types. And although both of your examples appear to have the same result, they work differently; try them with a list instead.