Python: Want to use a string as a slice specifier

Python: Want to use a string as a slice specifier - python

Suppose I have a variable S with the string "1:3" (or for that matter, "1", or "1:" or ":3") and I want to use that as a slice specifier on list L. You cannot simply do L[S] since the required args for a slice are "int:int".
Now, I current have some ugly code that parses S into its two constituent ints and deals with all the edge cases (4 of them) to come up with the correct slice access but this is just plain ugly and unpythonic.
How do I elegantly take string S and use it as my slice specifier?

This can be done without much hacking by using a list comprehension. We split the string on :, passing the split items as arguments to the slice() builtin. This allows us to quite nicely produce the slice in one line, in a way which works in every case I can think of:
slice(*[int(i.strip()) if i else None for i in string_slice.split(":")])
By using the slice() builtin, we neatly avoid having to deal with too many edge cases ourselves.
Example usage:
>>> some_list = [1, 2, 3]
>>> string_slice = ":2"
>>> some_list[slice(*[int(i.strip()) if i else None for i in string_slice.split(":")])]
[1, 2]
>>> string_slice = "::-1"
>>> some_list[slice(*[int(i.strip()) if i else None for i in string_slice.split(":")])]
[3, 2, 1]

Here's another solution
eval("L[%s]" % S)
warning - It's not safe if S is coming from an external(unreliable) source.

Related

Why does splatting create a tuple on the rhs but a list on the lhs?

Consider, for example,
squares = *map((2).__rpow__, range(5)),
squares
# (0, 1, 4, 9, 16)
*squares, = map((2).__rpow__, range(5))
squares
# [0, 1, 4, 9, 16]
So, all else being equal we get a list when splatting on the lhs and a tuple when splatting on the rhs.
Why?
Is this by design, and if yes, what's the rationale? Or, if not, are there any technical reasons? Or is this just how it is, no particular reason?

The fact that you get a tuple on the RHS has nothing to do with the splat. The splat just unpacks your map iterator. What you unpack it into is decided by the fact that you've used tuple syntax:
*whatever,
instead of list syntax:
[*whatever]
or set syntax:
{*whatever}
You could have gotten a list or a set. You just told Python to make a tuple.
On the LHS, a splatted assignment target always produces a list. It doesn't matter whether you use "tuple-style"
*target, = whatever
or "list-style"
[*target] = whatever
syntax for the target list. The syntax looks a lot like the syntax for creating a list or tuple, but target list syntax is an entirely different thing.
The syntax you're using on the left was introduced in PEP 3132, to support use cases like
first, *rest = iterable
In an unpacking assignment, elements of an iterable are assigned to unstarred targets by position, and if there's a starred target, any extras are stuffed into a list and assigned to that target. A list was chosen instead of a tuple to make further processing easier. Since you have only a starred target in your example, all items go in the "extras" list assigned to that target.

This is specified in PEP-0448 disadvantages
Whilst *elements, = iterable causes elements to be a list, elements = *iterable, causes elements to be a tuple. The reason for this may confuse people unfamiliar with the construct.
Also as per: PEP-3132 specification
This PEP proposes a change to iterable unpacking syntax, allowing to specify a "catch-all" name which will be assigned a list of all items not assigned to a "regular" name.
Also mentioned here: Python-3 exprlists
Except when part of a list or set display, an expression list containing at least one comma yields a tuple.
The trailing comma is required only to create a single tuple (a.k.a. a singleton); it is optional in all other cases. A single expression without a trailing comma doesn’t create a tuple, but rather yields the value of that expression. (To create an empty tuple, use an empty pair of parentheses: ().)
This might also be seen in a simpler example here, where elements in a list
In [27]: *elements, = range(6)
In [28]: elements
Out[28]: [0, 1, 2, 3, 4, 5]
and here, where elements is a tuple
In [13]: elements = *range(6),
In [14]: elements
Out[14]: (0, 1, 2, 3, 4, 5)
From what I could understand from the comments and the other answers:
The first behaviour is to keep in-line with the existing arbitrary argument lists used in functions ie.*args
The second behaviour is to be able to use the variables on LHS further down in the evaluation, so making it a list, a mutable value rather than a tuple makes more sense

There is an indication of the reason why at the end of PEP 3132 -- Extended Iterable Unpacking:
Acceptance
After a short discussion on the python-3000 list [1], the
PEP was accepted by Guido in its current form. Possible changes
discussed were:
[...]
Make the starred target a tuple instead of a list. This would be
consistent with a function's *args, but make further processing of the
result harder.
[1] https://mail.python.org/pipermail/python-3000/2007-May/007198.html
So, the advantage of having a mutable list instead of an immutable tuple seems to be the reason.

not a complete answer, but disassembling gives some clues:
from dis import dis
def a():
squares = (*map((2).__rpow__, range(5)),)
# print(squares)
print(dis(a))
disassembles as
5 0 LOAD_GLOBAL 0 (map)
2 LOAD_CONST 1 (2)
4 LOAD_ATTR 1 (__rpow__)
6 LOAD_GLOBAL 2 (range)
8 LOAD_CONST 2 (5)
10 CALL_FUNCTION 1
12 CALL_FUNCTION 2
14 BUILD_TUPLE_UNPACK 1
16 STORE_FAST 0 (squares)
18 LOAD_CONST 0 (None)
20 RETURN_VALUE
while
def b():
*squares, = map((2).__rpow__, range(5))
print(dis(b))
results in
11 0 LOAD_GLOBAL 0 (map)
2 LOAD_CONST 1 (2)
4 LOAD_ATTR 1 (__rpow__)
6 LOAD_GLOBAL 2 (range)
8 LOAD_CONST 2 (5)
10 CALL_FUNCTION 1
12 CALL_FUNCTION 2
14 UNPACK_EX 0
16 STORE_FAST 0 (squares)
18 LOAD_CONST 0 (None)
20 RETURN_VALUE
the doc on UNPACK_EX states:
UNPACK_EX(counts)
Implements assignment with a starred target: Unpacks an iterable in TOS into individual values, where the total number of values can be
smaller than the number of items in the iterable: one of the new
values will be a list of all leftover items.
The low byte of counts is the number of values before the list value, the high byte of counts the number of values after it. The
resulting values are put onto the stack right-to-left.
(emphasis mine). while BUILD_TUPLE_UNPACK returns a tuple:
BUILD_TUPLE_UNPACK(count)
Pops count iterables from the stack, joins them in a single tuple, and pushes the result. Implements iterable unpacking in tuple displays
(*x, *y, *z).

For the RHS, there is not much of an issue. the answer here states it well:
We have it working as it usually does in function calls. It expands
the contents of the iterable it is attached to. So, the statement:
elements = *iterable
can be viewed as:
elements = 1, 2, 3, 4,
which is another way for a tuple to be initialized.
Now, for the LHS,
Yes, there are technical reasons for the LHS using a list, as indicated in the discussion around the initial PEP 3132 for extending unpacking
The reasons can be gleaned from the conversation on the PEP(added at the end).
Essentially it boils down to a couple key factors:
The LHS needed to support a "starred expression" that was not necessarily restricted to the end only.
The RHS needed to allow various sequence types to be accepted, including iterators.
The combination of the two points above required manipulation/mutation of the contents after accepting them into the starred expression.
An alternative approach to handling, one to mimic the iterator fed on the RHS, even leaving implementation difficulties aside, was shot down by Guido for its inconsistent behaviour.
Given all the factors above, a tuple on LHS would have to be a list first, and then converted. This approach would then just add overhead, and did not invite any further discussion.
Summary: A combination of various factors led to the decision to allow a list on the LHS, and the reasons fed off of each other.
Relevant extract for disallowing inconsistent types:
The important use case in Python for the proposed semantics is when
you have a variable-length record, the first few items of which are
interesting, and the rest of which is less so, but not unimportant.
(If you wanted to throw the rest away, you'd just write a, b, c =
x[:3] instead of a, b, c, *d = x.) It is much more convenient for this
use case if the type of d is fixed by the operation, so you can count
on its behavior.
There's a bug in the design of filter() in Python 2 (which will be
fixed in 3.0 by turning it into an iterator BTW): if the input is a
tuple, the output is a tuple too, but if the input is a list or
anything else, the output is a list. That's a totally insane
signature, since it means that you can't count on the result being a
list, nor on it being a tuple -- if you need it to be one or the
other, you have to convert it to one, which is a waste of time and
space. Please let's not repeat this design bug.
-Guido
I have also tried to recreate a partially quoted conversation that pertains to the summary above.Source
Emphasis mine.
1.
In argument lists, *args exhausts iterators, converting them to
tuples. I think it would be confusing if *args in tuple unpacking
didn't do the same thing.
This brings up the question of why the patch produces lists, not
tuples. What's the reasoning behind that?
STeVe
2.
IMO, it's likely that you would like to further process the resulting
sequence, including modifying it.
Georg
3.
Well if that's what you're aiming at, then I'd expect it to be more
useful to have the unpacking generate not lists, but the same type you
started with, e.g. if I started with a string, I probably want to
continue using strings::
--additional text snipped off
4.
When dealing with an iterator, you don't know the length in advance,
so the only way to get a tuple would be to produce a list first and
then create a tuple from it.
Greg
5.
Yep. That was one of the reasons it was suggested that the *args
should only appear at the end of the tuple unpacking.
STeVe
couple convos skipped
6.
I don't think that returning the type given is a goal that should be
attempted, because it can only ever work for a fixed set of known
types. Given an arbitrary sequence type, there is no way of knowing
how to create a new instance of it with specified contents.
-- Greg
skipped convos
7.
I'm suggesting, that:
lists return lists
tuples return tuples
XYZ containers return XYZ containers
non-container iterables return iterators.
How do you propose to distinguish between the last two cases?
Attempting to slice it and catching an exception is not acceptable,
IMO, as it can too easily mask bugs.
-- Greg
8.
But I expect less useful. It won't support "a, *b, c = "
either. From an implementation POV, if you have an unknown object on
the RHS, you have to try slicing it before you try iterating over it;
this may cause problems e.g. if the object happens to be a defaultdict
-- since x[3:] is implemented as x[slice(None, 3, None)], the defaultdict will give you its default value. I'd much rather define
this in terms of iterating over the object until it is exhausted,
which can be optimized for certain known types like lists and tuples.
--
--Guido van Rossum

TLDR: You get a tuple on the RHS because you asked for one. You get a list on the LHS because it is easier.
It is important to keep in mind that the RHS is evaluated before the LHS - this is why a, b = b, a works. The difference then becomes apparent when splitting the assignment and using additional capabilities for the LHS and RHS:
# RHS: Expression List
a = head, *tail
# LHS: Target List
*leading, last = a
In short, while the two look similar, they are entirely different things. The RHS is an expression to create one tuple from all names - the LHS is a binding to multiple names from one tuple. Even if you see the LHS as a tuple of names, that does not restrict the type of each name.
The RHS is an expression list - a tuple literal without the optional () parentheses. This is the same as how 1, 2 creates a tuple even without parentheses, and how enclosing [] or {} create a list or set. The *tail just means unpacking into this tuple.
New in version 3.5: Iterable unpacking in expression lists, originally proposed by PEP 448.
The LHS does not create one value, it binds values to multiple names. With a catch-all name such as *leading, the binding is not known up-front in all cases. Instead, the catch-all contains whatever remains.
Using a list to store values makes this simples - the values for trailing names can be efficiently removed from the end. The remaining list then contains the exactly the values for the catch-all name. In fact, this is exactly what CPython does:
collect all items for mandatory targets before the starred one
collect all remaining items from the iterable in a list
pop items for mandatory targets after the starred one from the list
push the single items and the resized list on the stack
Even when the LHS has a catch-all name without trailing names, it is a list for consistency.

Using a = *b,:
If you do:
a = *[1, 2, 3],
It would give:
(1, 2, 3)
Because:
Unpacking and some other stuff give tuples in default, but if you say i.e
[*[1, 2, 3]]
Output:
[1, 2, 3] as a list since I do a list, so {*[1, 2, 3]} will give a set.
Unpacking gives three elements, and for [1, 2, 3] it really just does
1, 2, 3
Which outputs:
(1, 2, 3)
That's what unpacking does.
The main part:
Unpacking simply executes:
1, 2, 3
For:
[1, 2, 3]
Which is a tuple:
(1, 2, 3)
Actually this creates a list, and changes it into a tuple.
Using *a, = b:
Well, this is really gonna be:
a = [1, 2, 3]
Since it isn't:
*a, b = [1, 2, 3]
Or something similar, there is not much about this.
It is equivalent without * and ,, not fully, it just always gives a list.
This is really almost only used for multiple variables i.e:
*a, b = [1, 2, 3]
One thing is that no matter what it stores a list type:
>>> *a, = {1,2,3}
>>> a
[1, 2, 3]
>>> *a, = (1,2,3)
>>> a
[1, 2, 3]
>>>
Also it would be a strange to have:
a, *b = 'hello'
And:
print(b)
To be:
'ello'
Then it doesn't seem like splatting.
Also list have more functions than others, easier to handle.
There is probably no reason for this to happen, it really a decision in Python.
The a = *b, section there is a reason, in "The main part:" section.
Summary:
Also as #Devesh mentioned here in PEP 0448 disadvantages:
Whilst *elements, = iterable causes elements to be a list, elements = *iterable, causes elements to be a tuple. The reason for this may confuse people unfamiliar with the construct.
(emphasis mine)
Why bother, this doesn't really matter for us, why not just use the below if you want a list:
print([*a])
Or a tuple:
print((*a))
And a set:
print({*a})
And so on...

Replacing Nones in a python array with zeroes

I've just joined two arrays of unequal length together with the command:
allorders = map(None,todayorders, lastyearorders)
where "none" is given where today orders fails to have a value (as the todayorders array is not as long).
However, when I try to pass the allorders array into a matplotlib bar chart:
p10= plt.bar(ind, allorders[9], width, color='#0000DD', bottom=allorders[8])
..I get the following error:
TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'
So, is there a way for matplotlib to accept none datatypes? if not, how do I replace the 'Nones' with zeroes in my allorders array?
If you can, as I am a Python newbie (coming over from the R community), please provide detailed code from start to finish that I can use/test.

Use a list comprehension:
allorders = [i if i[0] is not None else (0, i[1]) for i in allorders]

With numpy:
import numpy as np
allorders = np.array(allorders)
This creates an arrray of objects due to the Nones. We can replace them with zeros:
allorders[allorders == None] = 0
Then convert the array to the proper type:
allorders.astype(int)

Since it sounds like you want this all to be in numpy, the direct answer to your question is really just an aside, and the right answer doesn't being until the "Of course…" paragraph.
If you think about it, you're using map with a None first parameter as a zip_longest, because Python doesn't have a zip_longest. But it does have one, in itertools—and it allows you to specify a custom fillvalue. So, you can do this all in one step with izip_longest:
>>> import itertools
>>> todayorders = [1, 2]
>>> lastyearorders = [1, 2, 3]
>>> allorders = itertools.izip_longest(todayorders, lastyearorders, fillvalue=0)
>>> list(allorders)
[(1, 1), (2, 2), (0, 3)]
This only fills in 0 for the Nones that show up as extra values for the shorter list; if you want to replace every None with a 0, you have to do it Martijn Pieters's way. But I think this is what you want.
Also, note that list(allorders) at the end: izip_longest, like most things in itertools, returns an iterator, not a list. Or, in terms you might be more familiar with, it returns a "lazy" sequence rather than a "strict" one. If you're just going to iterate over the result, that's actually better, but if you need to use it with some function that requires a list (like printing it out in human-readable form—or accessing allorders[9], as in your example), you need to explicitly convert it first.
If you actually want a numpy.array rather than a list, you can get there directly, without going through a list first. (If all you're ever going to do with it is matplotlib it, you probably do want an array.) The clearest way is to just use np.fromiter(allorders) instead of list(allorders). You might want to pass an explicit dtype=int (or whatever's appropriate). And, if you know the size (which you do—it's max(len(todayorders), len(lastyearorders))), in some cases it's faster or simpler to pass an explicit count as well.
Of course if any of the numpy stuff sounds appealing, you probably should stay within numpy in the first place, instead of using map or izip_longest:
>>> todayorders.resize(lastyearorders.shape)
>>> allorders = np.vstack(todayorders, lastyearorders).transpose()
Unfortunately, that mutates todayorders, and as far as I know, the equivalent immutable function numpy.resize doesn't give you any way to "zero-extend", but instead repeats the values. Hopefully I'm wrong and someone will suggest the easy way, but otherwise, you have to do it explicitly:
>>> extrazeros = np.zeros(len(lastyearorders) - len(todayorders), dtype=int)
>>> allorders = np.vstack(np.concatenate((todayorders, extrazeros)), lastyearorders)
>>> allorders = allorders.transpose()
array([[ 1, 1],
[ 2, 2],
[ 0, 3]])
Of course if you do a lot of that, I'd write a zeroextend function that takes a pair of arrays and extends one to match the other (or, if you're not just dealing with 1D, extends the shorter one on each axis to make the other).
At any rate, aside from being faster and using less temporary memory than using map, izip_longest, etc., this also means that you end up with a final array with the right dtype (int rather than object)—which means your result also uses less long-term memory, and everything you do from then on will also be faster and use less temporary memory.
For completeness: It is possible to have pyplot handle None values, but I don't think it's what you want. For example, you can pass it a Transform object whose transform method converts None to 0. But this will be effectively the same as Martijn Pieters's answer but much more verbose, and there's no advantage at all unless you need to plot tons of such arrays.

Why do some list methods in Python work only with defined variables? [duplicate]

This question already has answers here:
Why do these list operations (methods: clear / extend / reverse / append / sort / remove) return None, rather than the resulting list?
(6 answers)
Closed 4 years ago.
>>> a = [1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
But:
>>> [1, 2, 3].append(4)
>>>
Why do list methods in Python (such as insert and append) only work with defined variables?

In the second sample nothing is printed, because append, that was called on a list (note that append was actually performed), returns None.
Alternatively you should mention that a.append(4) also gave you a blank line (as your first sample shows), and final output of a first code sample was a representation of result of a expression, not a.append('4') expression.
Nothing is printed after append call in both cases because it is a representation of None.

list.append returns None. a.append('4') didn't print anything either since things which return None don't print anything in the interactive interpreter ...
Note that your second method call did work. It appended '4' to the list, you'll just never get to see it since you immediately lose any handle you had on the list you created.

It does work. It just doesn't print anything, the same way that a.append(4) doesn't print anything. It's only because you saved the list as a variable that you can display its new value.

Another way to concatenate lists:
>>> [1, 2, 3] + [4]
[1, 2, 3, 4]
Note that in this case, a new list is created and returned. list.append adds an item to an existing list and does not return anything.

It's not that the list methods works only with defined variables. It's that the specified method append of list always return None while change the internal state of the list in question.

The function only works with defined variables because that is how it is indented to work. From the python documentation:
list.append(x)
Add an item to the end of the list; equivalent to a[len(a):] = [x].
Note that is does "work" in the sense that a value is returned and the function does not raise an exception. But the results are lost because you have no reference to the modified list.
As an aside, some languages enforce naming conventions to make it explicit that a function modifies a given object. For example, Ruby and Scheme append a ! - so in this case the function might be called append!. Such a naming convention helps make the intent of the function more clear.

Colon Operator with Deques (in Python)

I was hoping to use the colon operator with my deque but it didn't seem to work the same as a list.
I was trying something like:
myDeque = deque([0,1,2,3,4,5])
myDequeFunction(myDeque[3:])
This is the error I recieved:
"TypeError: sequence index must be integer, not 'slice'"
What is the best way to do array slicing with deques?

Iterating is probably faster than brute-force methods (note: unproven) due to the nature of a deque.
>>> myDeque = collections.deque([0,1,2,3,4,5])
>>> list(itertools.islice(myDeque, 3, sys.maxint))
[3, 4, 5]

deque objects don't support slicing themselves, but you can make a new deque:
sliced_deque = deque(list(old_deque)[3:])

collections.deque objects don't support slicing. It'd be more straightforward to make a new one.
n_deque = deque(list(d)[3:])

Python LIST functions not returning new lists [duplicate]

This question already has answers here:
Why do these list operations (methods: clear / extend / reverse / append / sort / remove) return None, rather than the resulting list?
(6 answers)
Closed 5 months ago.
I'm having an issue considering the built-in Python List-methods.
As I learned Python, I always thought Python mutators, as any value class mutators should do, returned the new variable it created.
Take this example:
a = range(5)
# will give [0, 1, 2, 3, 4]
b = a.remove(1)
# as I learned it, b should now be [0, 2, 3, 4]
# what actually happens:
# a = [0, 2, 3, 4]
# b = None
The main problem with this list mutator not returning a new list, is that you cannot to multiple mutations subsequently.
Say I want a list ranging from 0 to 5, without the 2 and the 3.
Mutators returning new variables should be able to do it like this:
a = range(5).remove(2).remove(3)
This sadly isn't possible, as range(5).remove(2) = None.
Now, is there a way to actually do multiple mutations on lists like I wanna do in my example? I think even PHP allows these types of subsequent mutations with Strings.
I also can't find a good reference on all the built-in Python functions. If anyone can find the actual definition (with return values) of all the list mutator methods, please let me know. All I can find is this page: http://docs.python.org/tutorial/datastructures.html

Rather than both mutating and returning objects, the Python library chooses to have just one way of using the result of a mutator. From import this:
There should be one-- and preferably only one --obvious way to do it.
Having said that, the more usual Python style for what you want to do is using list comprehensions or generator expressions:
[x for x in range(5) if x != 2 and x != 3]
You can also chain these together:
>>> [x for x in (x for x in range(5) if x != 2) if x != 3]
[0, 1, 4]
The above generator expression has the added advantage that it runs in O(n) time because Python only iterates over the range() once. For large generator expressions, and even for infinite generator expressions, this is advantageous.

Many methods of list and other mutable types intentionally return None so that there is no question in your mind as to whether you are creating a new object or mutating an existing object. The only thing that could be happening is mutation since, if a new object were created, it would have to be returned by the method, and it is not returned.
As you may have noticed, the methods of str that edit the string do return the new string, because strings are immutable and a new string is always returned.
There is of course nothing at all keeping you from writing a list subclass that has the desired behavior on .append() et al, although this seems like rather a heavy hammer to swing merely to allow you to chain method calls. Also, most Python programmers won't expect that behavior, making your code less clear.

In Python, essentially all methods that mutate the object return None.
That's so you don't accidentally think you've got a new object as a result.
You mention
I think even PHP allows these types of subsequent mutations with Strings.
While I don't remember about PHP, with string manipulations, you can chain method calls, because strings are immutable, so all string methods return a new string, since they don't (can't) change the one you call them on.
>>> "a{0}b".format(3).center(5).capitalize().join('ABC').swapcase()
'a A3B b A3B c'

Neither Python or php have built-in "mutators". You can simply write multiple lines, like
a = list(range(5))
a.remove(2)
a.remove(3)
If you want to generate a new list, copy the old one beforehand:
a = list(range(5))
b = a[:]
b.remove(2)
Note that in most cases, a singular call to remove indicates you should have used a set in the first place.

To remove multiple mutations on lists as you want to do on your example, you can do :
a = range(5)
a = [i for j, i in enumerate(a) if j not in [2, 3]]
a will be [0, 1, 4]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.