What are the default slice indices *really*? - python

From the python documentation docs.python.org/tutorial/introduction.html#strings:
Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to the size of the string being sliced.
For the standard case, this makes a lot of sense:
>>> s = 'mystring'
>>> s[1:]
'ystring'
>>> s[:3]
'mys'
>>> s[:-2]
'mystri'
>>> s[-1:]
'g'
>>>
So far, so good. However, using a negative step value seems to suggest slightly different defaults:
>>> s[:3:-1]
'gnir'
>>> s[0:3:-1]
''
>>> s[2::-1]
'sym'
Fine, perhaps if the step is negative, the defaults reverse. An omitted first index defaults to the size of the string being sliced, and an omitted second index defaults to zero:
>>> s[len(s):3:-1]
'gnir'
Looking good!
>>> s[2:0:-1]
'sy'
Whoops. Missed that 'm'.
Then there is everyone's favorite string reverse statement. And sweet it is:
>>> s[::-1]
'gnirtsym'
However:
>>> s[len(s):0:-1]
'gnirtsy'
The slice never includes the value of the second index in the slice. I can see the consistency of doing it that way.
So I think I am beginning to understand the behavior of slice in its various permutations. However, I get the feeling that the second index is somewhat special, and that the default value of the second index for a negative step cannot actually be defined in terms of a number.
Can anyone concisely define the default slice indices that can account for the provided examples? Documentation would be a huge plus.

There actually aren't any defaults; omitted values are treated specially.
However, in every case, omitted values happen to be treated in exactly the same way as None. This means that, unless you're hacking the interpreter (or using the parser, ast, etc. modules), you can just pretend that the defaults are None (as recursive's answer says), and you'll always get the right answers.
The informal documentation cited isn't quite accurate—which is reasonable for something that's meant to be part of a tutorial. For the real answers, you have to turn to the reference documentation.
For 2.7.3, Sequence Types describes slicing in notes 3, 4, and 5.
For [i:j]:
… If i is omitted or None, use 0. If j is omitted or None, use len(s).
And for [i:j:k]:
If i or j are omitted or None, they become “end” values (which end depends on the sign of k). Note, k cannot be zero. If k is None, it is treated like 1.
For 3.3, Sequence Types has the exact same wording as 2.7.3.
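One way to see the concrete values that omitted (or None) indices turn into is the built-in slice.indices(length) method, which returns the start, stop, and step a slice resolves to for a sequence of the given length. A minimal sketch, assuming Python 3 and the question's s = 'mystring':

```python
s = 'mystring'

# slice.indices(n) resolves omitted/None values for a sequence of length n.
forward = slice(None, None, None).indices(len(s))
backward = slice(None, None, -1).indices(len(s))

print(forward)   # (0, 8, 1)
print(backward)  # (7, -1, -1)

# The resolved values describe a range of positions, not a new slice.
reversed_s = ''.join(s[pos] for pos in range(*backward))
print(reversed_s)         # gnirtsym

# Written literally in a slice, a stop of -1 means "the last element",
# so this slice is empty.
print(repr(s[7:-1:-1]))   # ''
```

This is why the default stop for a negative step cannot be spelled as a number inside the brackets: the position "one before index 0" resolves to -1, and a literal -1 in a slice already means something else.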

The end value is always exclusive, thus the 0 end value means include index 1 but not 0. Use None instead (since negative numbers have a different meaning):
>>> s[len(s)-1:None:-1]
'gnirtsym'
Note the start value as well; the last character index is at len(s) - 1; you may as well spell that as -1 (as negative numbers are interpreted relative to the length):
>>> s[-1:None:-1]
'gnirtsym'

I don't have any documentation, but I think the default is [None:None:None]
>>> "asdf"[None:None:None]
'asdf'
>>> "asdf"[None:None:-1]
'fdsa'

The notes in the reference documentation for sequence types explain this in some detail:
(5.) The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n < (j-i)/k. In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). If i or j is greater than len(s), use len(s). If i or j are omitted or None, they become “end” values (which end depends on the sign of k). Note, k cannot be zero. If k is None, it is treated like 1.
So you can get the following behaviour:
>>> s = "mystring"
>>> s[2:None:-1]
'sym'
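The formula in note 5 can be followed literally once i and j have been resolved to concrete positions. The sketch below is only an illustration: the helper name items_by_formula is made up here, and it assumes Python 3 so that / is true division. It leans on slice.indices to resolve omitted, negative, and out-of-range values, then applies x = i + n*k for 0 <= n < (j-i)/k.

```python
def items_by_formula(seq, i=None, j=None, k=None):
    """Collect seq[x] for x = i + n*k while 0 <= n < (j - i) / k."""
    # Resolve None, negative, and out-of-range values to concrete positions.
    i, j, k = slice(i, j, k).indices(len(seq))
    picked = []
    n = 0
    while n < (j - i) / k:
        picked.append(seq[i + n * k])
        n += 1
    return picked


s = "mystring"
print(''.join(items_by_formula(s, 2, None, -1)))  # sym
print(''.join(items_by_formula(s, 2, 0, -1)))     # sy
```

With a stop of None the resolved j is -1, so n runs over 0, 1, 2 and index 0 is included; with a stop of 0, n only runs over 0 and 1.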

Actually, it is logical.
The end value always points one position past the last index that is included.
So with a negative step, an end value of 0 means the slice runs down to the element at index 1 and stops. To include the element at index 0 as well, you need to omit the end value:
>>> s = '0123456789'
>>> s[0], s[:0]
('0', '')
>>> s[1], s[:1]
('1', '0')
>>> s[2], s[:2]
('2', '01')
>>> s[3], s[:3]
('3', '012')
>>> s[0], s[:0:-1]
('0', '987654321')

Useful to know if you are implementing __getslice__: j defaults to sys.maxsize (https://docs.python.org/2/reference/datamodel.html#object.getslice)
>>> class x(str):
...     def __getslice__(self, i, j):
...         print i
...         print j
...
...     def __getitem__(self, key):
...         print repr(key)
...
>>> x()[:]
0
9223372036854775807
>>> x()[::]
slice(None, None, None)
>>> x()[::1]
slice(None, None, 1)
>>> x()[:1:]
slice(None, 1, None)
>>> import sys
>>> sys.maxsize
9223372036854775807L
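Note that __getslice__ exists only in Python 2; it was removed in Python 3, where every slice reaches __getitem__ as a slice object. A small sketch for Python 3 (the class name SliceProbe is just for illustration) that shows omitted values arriving as None:

```python
class SliceProbe:
    """Return whatever key the subscript syntax passes to __getitem__."""

    def __getitem__(self, key):
        return key


probe = SliceProbe()
print(probe[:])      # slice(None, None, None)
print(probe[::-1])   # slice(None, None, -1)
print(probe[:1:])    # slice(None, 1, None)
print(probe[2])      # 2
```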

There are excellent answers here already, and the best one has been accepted. But if you are looking for a way to wrap your head around the default values for a slice, it helps to imagine the list as having two ends: a HEAD end before the first element, and a TAIL end after the last element.
Now answering the actual question:
There are two sets of defaults for slices.
Defaults when step is +ve:
0:TAIL:+ve step
Defaults when step is -ve:
-1:HEAD:-ve step

Great question. I thought I knew how slicing worked until I read this post. While your question title asks about "default slice indices" and that's been answered by abarnet, Martijn, and others, the body of your post suggests your real question is "How does slicing work?" So, I'll take a stab at that.
Explanation
Given your example, s = 'mystring', you can imagine a set of positive and negative indices.
 m  y  s  t  r  i  n  g
 0  1  2  3  4  5  6  7   <- positive indices
-8 -7 -6 -5 -4 -3 -2 -1   <- negative indices
We select slices of the form s[i:j:k]. The logic changes depending on whether k is positive or negative. I would describe the algorithm as follows.
if k is empty, set k = 1
if k is positive:
    move right, from i (inclusive) to j (exclusive), stepping by abs(k)
    if i is empty, start from the left edge
    if j is empty, go until the right edge
if k is negative:
    move left, from i (inclusive) to j (exclusive), stepping by abs(k)
    if i is empty, start from the right edge
    if j is empty, go until the left edge
(Note that this isn't exact pseudocode; I intended it to be more comprehensible.)
Examples
>>> s[:3:]
'mys'
Here, k is empty so we set it equal to 1. Then since k is positive, we move right from i to j. Since i is empty, we start from the left edge and select everything up to but excluding the element at index 3.
>>> s[:3:-1]
'gnir'
Here, k is negative, so we move left from i to j. Since i is empty, we start from the right edge and select everything up to but excluding the element at index 3.
>>> s[0:3:-1]
''
Here, k is negative, so we move left from i to j. Since index 3 isn't to the left of index 0, no elements are selected and we get back the empty string.
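The informal algorithm above can be turned into a short function. This is a sketch under assumptions, not how CPython implements slicing: the name walk_slice and the helper resolve are made up for illustration, and negative or out-of-range values are handled by first counting negative indices from the end and then clamping to the nearest edge.

```python
def walk_slice(seq, i=None, j=None, k=None):
    """Pick items by walking right (k > 0) or left (k < 0) from i to j."""
    n = len(seq)
    if k is None:
        k = 1
    if k == 0:
        raise ValueError("slice step cannot be zero")

    def resolve(idx, low, high):
        # Negative indices count from the end; then clamp to an edge.
        if idx < 0:
            idx += n
        return max(low, min(idx, high))

    if k > 0:
        # Left edge is position 0, right edge is just past the last item.
        start = 0 if i is None else resolve(i, 0, n)
        stop = n if j is None else resolve(j, 0, n)
    else:
        # Right edge is the last item, left edge is just before the first.
        start = n - 1 if i is None else resolve(i, -1, n - 1)
        stop = -1 if j is None else resolve(j, -1, n - 1)

    picked = []
    pos = start
    while (pos < stop) if k > 0 else (pos > stop):
        picked.append(seq[pos])
        pos += k
    return picked


s = 'mystring'
print(''.join(walk_slice(s, None, 3)))       # mys
print(''.join(walk_slice(s, None, 3, -1)))   # gnir
print(''.join(walk_slice(s, 0, 3, -1)))      # (prints an empty line)
```

The "left edge" for a negative step is position -1 inside the function, a position that exists only internally; that is the value that cannot be written as a literal stop in the slice itself.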

Related

Why does `"abc"[10:11]` not produce an index error but `"abc"[10]` does? [duplicate]

Why doesn't 'example'[999:9999] result in error? Since 'example'[9] does, what is the motivation behind it?
From this behavior I can assume that 'example'[3] is, essentially/internally, not the same as 'example'[3:4], even though both result in the same 'm' string.
You're correct! 'example'[3:4] and 'example'[3] are fundamentally different, and slicing outside the bounds of a sequence (at least for built-ins) doesn't cause an error.
It might be surprising at first, but it makes sense when you think about it. Indexing returns a single item, but slicing returns a subsequence of items. So when you try to index a nonexistent value, there's nothing to return. But when you slice a sequence outside of bounds, you can still return an empty sequence.
Part of what's confusing here is that strings behave a little differently from lists. Look what happens when you do the same thing to a list:
>>> [0, 1, 2, 3, 4, 5][3]
3
>>> [0, 1, 2, 3, 4, 5][3:4]
[3]
Here the difference is obvious. In the case of strings, the results appear to be identical because in Python, there's no such thing as an individual character outside of a string. A single character is just a 1-character string.
(For the exact semantics of slicing outside the range of a sequence, see mgilson's answer.)
For the sake of adding an answer that points to a robust section in the documentation:
Given a slice expression like s[i:j:k],
The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n < (j-i)/k. In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). When k is positive, i and j are reduced to len(s) if they are greater
If you write s[999:9999], Python returns s[len(s):len(s)], since len(s) < 999 and your step is positive (1, the default).
Slicing is not bounds-checked by the built-in types. And although both of your examples appear to have the same result, they work differently; try them with a list instead.
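A short check of both behaviors, assuming Python 3. It uses the built-in slice.indices method to show the clamped bounds that the slice actually works with:

```python
word = 'example'

# Indexing must land on an existing item, otherwise IndexError is raised.
try:
    word[9]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)   # True

# Slicing clamps both bounds to len(word) first, so the result is just empty.
clamped = slice(999, 9999).indices(len(word))
print(clamped)                # (7, 7, 1)
print(repr(word[999:9999]))   # ''
```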

Python: Cyclic indexing of lists [closed]

Take the following example :
a = range(10)
We can proceed through the list from left to right as follows: a[0], a[1], ...., a[9]
Or in the other way around with negative indexes: a[-1], a[-2], a[-3], ....
It is also possible to index a range, e.g. a[from:to-1]
Because I know that the index of the last element is -1, I would say (theoretical thought) that a[0:0] should deliver the whole list, since a[0:0-1] is from 0 to -1 (including -1).
This is wrong, but why? It makes more sense to me than a[0:] (whole list)
EDIT:
So to make it simple (I'm just wondering!^^):
a[from:to-1] means: get elements from from to to. Ok, we want to get the whole list, means (following this reasoning): a[0:0] (which is the empty list), but hey 0-1 is the last element, right?
The indexing isn't "cyclic":
a = [0, 1, 2]
a[2] # 2
a[3] # IndexError, not 0
a[-3] # 0
a[-4] # IndexError, not 2
-3 as an index is just a shorthand for length-3.
range doesn't subtract 1 from the stop parameter. It increases (or decreases - let's assume it increases in this example) until it is greater than or equal to stop (since it is exclusive of the end point) and then it returns. a[0:0] should not deliver the whole array because you told it go from 0 to 0 non-inclusive of the end point 0, which is an empty range.
This is because when you slice, positive numbers are counted from the beginning of the list whereas negative numbers are counted from the end. When you slice, you take everything from the first index (inclusive) up to the second index (not inclusive). If there is nothing in that range, you get an empty list.
a[0:0]
would be a very confusing API for a lot of people. Sometimes it is helpful to think of what python is actually doing:
a[slice(0,None)]
which says that we start from 0 but there is no upper bound, which is Python's way of saying that the upper bound is infinite, so you take all the elements.
Of course, this could also be accomplished by:
a[:]
In which case there is no lower bound either ...
The slice s[i:j] is just defined that way:
If i or j is negative, the index is relative to the end of the string: len(s) + i or len(s) + j is substituted. But note that -0 is still 0.
The slice of s from i to j is defined as the sequence of items with index k such that i <= k < j. If i or j is greater than len(s), use len(s). If i is omitted or None, use 0. If j is omitted or None, use len(s). If i is greater than or equal to j, the slice is empty.
So a[0:0] gives you an empty list, because i is equal to j. And a[i:j] for negative j is translated to a[i:len(a) + j] before the slicing happens, so a[0:-1] read literally would be empty (since i < j does not hold), but because the translation happens first, it works.
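The translation of negative bounds can be checked directly. A minimal sketch, assuming Python 3 (so the list is built with list(range(10))):

```python
a = list(range(10))
n = len(a)

# A negative bound is replaced by len(a) + bound before slicing happens.
same_as_translated = a[0:-1] == a[0:n - 1]
print(same_as_translated)   # True
print(a[0:-1])              # [0, 1, 2, 3, 4, 5, 6, 7, 8]

# 0 is never translated, so a[0:0] asks for positions 0 <= k < 0: nothing.
print(a[0:0])               # []

# Omitting the stop (or passing None) is how to say "through the end".
whole = a[0:] == a[0:None] == a
print(whole)                # True
```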
It looks like you are asking about reverse iteration, not cyclic indexing. Reverse iteration is simple, using either slicing or reversed().
for i in range(10)[::-1]:
    print(i)
for i in reversed(range(10)):
    print(i)
Reversed is pretty self-explanatory. For slicing, the third value in the colon-delimited list is the step. If you specify a negative step, it will iterate backwards through the list. If you specify no start or stop, it will iterate through the entire list.

selecting sub-sequence confusion in python [duplicate]

I am confused by the way Python subsequence selection works.
Suppose I have the following code:
>>> t = 'hi'
>>> t[:3]
'hi'
>>> t[3:]
''
>>> print t[:3] + t[3:]
hi
>>> print t[3]
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
print t[3]
IndexError: string index out of range
Please explain how this works in Python.
Subsequence, or slice, notation is forgiving. t[:3] gets you a slice of t from the beginning up to (but not including) index 3, or up to the end, whichever comes first. t[3:] gets you a slice of t from index 3, if it exists, through the end. Direct indexing such as t[3] is not forgiving; the indexed element must exist or else you get an exception. With slices, an out-of-range end index is treated as the end of the sequence, and an out-of-range start index gives you an empty result.
I have always found it somewhat odd that sequences allow slicing out of bounds. However, this is documented, specifically in bullet point 4, which describes slicing of a sequence type:
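One consequence of this forgiving behavior is that splitting at any integer position and gluing the halves back together always rebuilds the original. A small sketch, assuming Python 3 (hence print as a function) and the question's t = 'hi':

```python
t = 'hi'

# For any integer n, in range or not, the two halves rebuild t.
always_rebuilds = all(t[:n] + t[n:] == t for n in range(-5, 6))
print(always_rebuilds)            # True

print(repr(t[:3]), repr(t[3:]))   # 'hi' ''

# Indexing has no such tolerance.
try:
    t[3]
    message = None
except IndexError as exc:
    message = str(exc)
print(message)                    # string index out of range
```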
The slice of s from i to j is defined as the sequence of items with index k such that i <= k < j. If i or j is greater than len(s), use len(s). If i is omitted or None, use 0. If j is omitted or None, use len(s). If i is greater than or equal to j, the slice is empty.
or bullet point 5 which describes slicing with the optional stride parameter:
The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n < (j-i)/k. In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). If i or j is greater than len(s), use len(s). If i or j are omitted or None, they become “end” values (which end depends on the sign of k). Note, k cannot be zero. If k is None, it is treated like 1
Note that if you look at point 3 (which describes s[index]), there is no corresponding transform of out-of-bounds indices to in-bounds-indices.
t[start:stop] returns all elements whose index x satisfies start <= x < stop. Elements that do not exist are simply left out.
t[index], on the other hand, raises an error if there is no element at the given index.
In your example only t[0] = 'h' and t[1] = 'i' exist, which explains your results.
print t[3:] should print an empty string rather than 'hi', which is also what my Python interpreter does.

Python Syntax / List Slicing Question: What does this syntax mean?

lines = file('info.csv','r').readlines()
counts = []
for i in xrange(4):
    counts.append(fromstring(lines[i][:-2],sep=',')[0:-1])
If anyone can explain this code to me, it would be greatly appreciated. I can't seem to find more advanced examples on slicing--only very simple ones that don't explain this situation.
Thank you very much.
A slice takes the form o[start:stop:step], all of which are optional. start defaults to 0, the first index. stop defaults to len(o), the exclusive upper bound on the indices of the list. step defaults to 1, including every value of the list.
If you specify a negative value, it represents an offset from the end of the list. For example, [-1] accesses the last element in a list, and [-2] the second to last.
If you enter a non-1 value for step, you will include different elements or include them in a different order. 2 would skip every other element. 3 would skip two out of every three. -1 would go backwards through the list.
[:-2]
Since start is omitted, it defaults to the beginning of the list. A stop of -2 indicates to exclude the last two elements. So o[:-2] slices the list to exclude the last two elements.
[0:-1]
The 0 here is redundant, because it's what start would have defaulted to anyway. This is the same as the other slice, except that it only excludes the last element.
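To see the two slices at work without NumPy, here is a sketch on a made-up line. The sample text and the '\r\n' line ending are assumptions (the contents of info.csv are not shown in the question), and plain float parsing stands in for fromstring:

```python
# Hypothetical line as readlines() might return it, ending in "\r\n".
line = '1.5,2.5,3.5,99\r\n'

# [:-2] drops the last two characters (here, the "\r\n" line ending).
trimmed = line[:-2]
print(repr(trimmed))   # '1.5,2.5,3.5,99'

# Parse the comma-separated fields into numbers.
values = [float(field) for field in trimmed.split(',')]
print(values)          # [1.5, 2.5, 3.5, 99.0]

# [0:-1] (the same as [:-1]) keeps everything except the last value.
kept = values[0:-1]
print(kept)            # [1.5, 2.5, 3.5]
```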
From the Data model page of the Python 2.7 docs:
Sequences also support slicing: a[i:j] selects all items with index k such that i <= k < j. When used as an expression, a slice is a sequence of the same type. This implies that the index set is renumbered so that it starts at 0.
Some sequences also support “extended slicing” with a third “step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.
The "what's new" section of the Python 2.3 documentation discusses them as well, when they were added to the language.
A good way to understand the slice syntax is to think of it as syntactic sugar for the equivalent for loop. For example:
L[a:b:c]
Is equivalent to (e.g., in C):
for (int i = a; i < b; i += c) {
    // slice contains L[i]
}
Where a defaults to 0, b defaults to len(L), and c defaults to 1.
(And if c, the step, is a negative number, then the default values of a and b are reversed. This gives a sensible result for L[::-1]).
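Note that the C loop as written only terminates correctly for a positive step; with a negative step the comparison would also have to flip to i > b. A rough Python rendering that handles both directions (a sketch, assuming Python 3; slice_as_loop is a made-up name) can let slice.indices fill in the defaults and let range do the stepping:

```python
def slice_as_loop(L, a=None, b=None, c=None):
    """Build L[a:b:c] with an explicit loop over positions."""
    start, stop, step = slice(a, b, c).indices(len(L))
    result = []
    # range() stops before `stop` whichever direction `step` points.
    for i in range(start, stop, step):
        result.append(L[i])
    return result


L = [10, 20, 30, 40, 50]
print(slice_as_loop(L, 1, 4))             # [20, 30, 40]
print(slice_as_loop(L, None, None, -1))   # [50, 40, 30, 20, 10]
print(slice_as_loop(L, None, None, 2))    # [10, 30, 50]
```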
Then the only other thing you need to know is that, in Python, indexes "wrap around", so that L[-1] signifies the last item in the list, L[-2] is the second to last, and so forth.
If list is a list then list[-1] is the last element of the list, list[-2] is the element before it and so on.
Also, list[a:b] means the list of all elements at positions from a up to, but not including, b. If one of them is missing, it is assumed to mean the corresponding end of the list. Thus, list[2:] is the list of all elements starting from list[2]. And list[:-2] is the list of all elements from list[0] up to, but not including, list[-2].
In your code, the [0:-1] part is the same as [:-1].
