Why does Python's slice indexing give counterintuitive results? [duplicate] - python

This question already has answers here:
Printing a column of a 2-D List in Python
(7 answers)
Closed 6 years ago.
If I have a 2D list in Python, Data, and I want to create a slice of that 2D list, where I select all the elements from the first index and a single one from the second.
eg.
Data = [[a,b,c],[d,e,f],[h,i,g]]
and I want the list;
raw_data = [b,e,i]
Why does doing something like;
raw_data = Data[:][1]
not give the desired output?
I have specified the whole first index and the 1 index for the second.
Instead I get the output that is;
raw_data = [d,e,f]
Which is what I would expect to get from;
raw_data = Data[1][:]
raw_data = [d,e,f]
So;
Data[1][:] = Data[:][1]
Which is not compatible with my mental model of how lists work in python.
Instead I have to use a loop to do it;
raw_data = []
for i in xrange(0,len(Data),1):
    raw_data.append(Data[i][1])
So my question is, can anyone explain why Data[1][:] = Data[:][1] ?
Thanks for reading!

lst[:] has no explicit start and no explicit end, so according to the Python documentation, it will return a copy of the list starting at the start and ending at the end of the list. In other words, it will return a copy of the same list you had before. So:
>>> Data = [['a','b','c'],['d','e','f'],['h','i','g']]
>>> Data[:]
[['a', 'b', 'c'], ['d', 'e', 'f'], ['h', 'i', 'g']]
So when you say Data[:], that evaluates to a copy of Data, meaning that Data[:][1] is essentially just Data[1], which is [d,e,f].
If you do it the other way:
>>> Data[1]
['d', 'e', 'f']
>>> Data[1][:]
['d', 'e', 'f']
You get the second element in data, [d,e,f], then you use that same list slicing syntax as before to get that same list again.
To get what you want, I'd use a list comprehension:
>>> [x[1] for x in Data]
['b', 'e', 'i']
Simple as that.
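To make the left-to-right evaluation concrete, here is a small sketch (written for Python 3; the name copy_of_data is only for illustration). It shows that Data[:] is a new outer list holding the same rows, so indexing it with [1] still picks a row, not a column:

```python
Data = [['a', 'b', 'c'], ['d', 'e', 'f'], ['h', 'i', 'g']]

# Data[:] builds a new outer list, but it holds the very same row objects.
copy_of_data = Data[:]
print(copy_of_data == Data)        # True: equal contents
print(copy_of_data is Data)        # False: a different outer list
print(copy_of_data[1] is Data[1])  # True: the rows themselves are shared

# Indexing is applied left to right, so both expressions pick row 1.
print(Data[:][1])   # ['d', 'e', 'f']
print(Data[1][:])   # ['d', 'e', 'f']

# Picking column 1 means indexing each row in turn.
raw_data = [row[1] for row in Data]
print(raw_data)     # ['b', 'e', 'i']
```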

Vanilla Python doesn't have two dimensional arrays, but it does allow for extensions to implement them. You have a list of lists, which is somewhat different.
The solution to your problem is to use numpy which does have a 2d array type. You can then say data[:,1]
Why your example doesn't work as you expect: data[:] means "a copy of data" and so data[:][1] means the index 1 element of the copy of data, which is [d,e,f]

It's pretty obvious if you go through what's happening from left to right.
raw_data = Data[:]
will give you the entirety of Data, so the whole list of lists: [[a,b,c],[d,e,f],[h,i,g]]
raw_data = Data[:][1]
will then give you the element at index 1 in this list, which is [d,e,f].
On the other hand,
raw_data = Data[1]
will return the element at position 1 in data, which is also [d,e,f].
[:] on this object will again return a copy of it in its entirety.
What you are trying to do is best done with a list comprehension, such as:
raw_data = [x[1] for x in Data]
This will give you a list of all second elements in all lists in Data.
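If you need columns often, the comprehension can be wrapped in a small function. This is only a sketch for Python 3; the helper name column is made up for illustration. It also shows zip(*rows), which transposes rows and columns in one step, assuming all rows have the same length (zip stops at the shortest row):

```python
def column(rows, j):
    """Return the j-th element of every row (hypothetical helper)."""
    return [row[j] for row in rows]

Data = [['a', 'b', 'c'], ['d', 'e', 'f'], ['h', 'i', 'g']]
print(column(Data, 1))          # ['b', 'e', 'i']

# zip(*rows) transposes rows and columns; each column comes back as a tuple.
columns = list(zip(*Data))
print(list(columns[1]))         # ['b', 'e', 'i']
```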

Related

set item at multiple indexes in a list

I am trying to find a way to use a list of indexes to set values at multiple places in a list (as is possible with numpy arrays).
I found that I can map __getitem__ and a list of indexes to return the values at those indexes:
# something like
a_list = ['a', 'b', 'c']
idxs = [0, 1]
get_map = map(a_list.__getitem__, idxs)
print(list(get_map)) # yields ['a', 'b']
However, when applying this same line of thought to __setitem__, the setting fails. This probably has something to do with pass-by-reference vs pass-by-value, which I have never fully understood no matter how many times I've read about it.
Is there a way to do this?
b_list = ['a', 'b', 'c']
idxs = [0, 1]
put_map = map(b_list.__setitem__, idxs, ['YAY', 'YAY'])
print(b_list) # want: ['YAY', 'YAY', 'c']
For my use case, I only want to set one value at multiple locations. Not multiple values at multiple locations.
EDIT: I know how to use list comprehension. I am trying to mimic numpy's capability to accept a list of indexes for both getting and setting items in an array, except for lists.
The difference between the get and set case is that in the get case you are interested in the result of map itself, but in the set case you want a side effect. In Python 3, map is lazy: it returns an iterator and calls nothing until you iterate over it. Thus, you never consume the map iterator and the instructions are never actually executed. Once you do, b_list gets changed as expected.
>>> put_map = map(b_list.__setitem__, idxs, ['YAY', 'YAY'])
>>> b_list
['a', 'b', 'c']
>>> list(put_map)
[None, None]
>>> b_list
['YAY', 'YAY', 'c']
Having said that, the proper way for get would be a list comprehension and for set a simple for loop. That also has the advantage that you do not have to repeat the value to put in place n times.
>>> for i in idxs: b_list[i] = "YAY"
>>> [b_list[i] for i in idxs]
['YAY', 'YAY']
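If you want something that reads a bit like numpy's list-of-indexes access, the loop and the comprehension can be wrapped in two small functions. This is just a sketch for Python 3; get_at and set_at are made-up names, not part of any library. operator.itemgetter from the standard library covers the get side as well:

```python
from operator import itemgetter

def get_at(seq, idxs):
    """Return the items of seq at the given positions (hypothetical helper)."""
    return [seq[i] for i in idxs]

def set_at(seq, idxs, value):
    """Store the same value at every given position, in place (hypothetical helper)."""
    for i in idxs:
        seq[i] = value

b_list = ['a', 'b', 'c']
idxs = [0, 1]

set_at(b_list, idxs, 'YAY')
print(b_list)                     # ['YAY', 'YAY', 'c']
print(get_at(b_list, idxs))       # ['YAY', 'YAY']

# itemgetter with two or more indexes returns a tuple of the selected items.
print(itemgetter(*idxs)(b_list))  # ('YAY', 'YAY')
```

Note that itemgetter called with a single index returns the bare item instead of a one-element tuple, so the comprehension is the safer choice when the number of indexes can vary.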

How to split a list on an element delimiter [duplicate]

This question already has answers here:
How to split a list into chunks determined by a separator?
(4 answers)
Closed 5 years ago.
Are there concise and elegant ways of splitting a list in Python into a list of sub-lists by a delimiting element, such that ['a', 'delim', 'b'] -> [['a'], ['b']]?
Here is the example:
ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
lspl = [] # an elegant python one-liner wanted on this line!
print(lspl) # want: [['a', 'b'], ['a', 'b'], ['a', 'b']]
Working examples that seem overly complex
I have surveyed documentation and related questions on stackoverflow - many referenced below - which did not answer my question, and am summarizing my research below: several approaches which do generate the desired output, but are verbose and intricate, and what is happening (splitting a list) is not immediately apparent -- you really have to squint.
Are there better ways? I am primarily interested in readability for beginners (e.g. teaching), canonical / 'Pythonic' approaches, and secondarily in the most efficient approaches (e.g. timeit speed). Ideally answers would address both Python 2.7 and 3.x.
with conditional .append()
Loop through the list and either append to the last output list or add a new output list. Based on an example that includes the delimiter, but altered to exclude it. I'm not sure how to make it a one-liner, or if that is even desirable.
lspl = [[]]
for i in ldat:
    if i==dlim:
        lspl.append([])
    else:
        lspl[-1].append(i)
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
with itertools.groupby
Combine itertools.groupby with a list comprehension. Many answers include delimiters; this is based on those that exclude delimiters.
import itertools
lspl = [list(y) for x, y in itertools.groupby(ldat, lambda z: z == dlim) if not x]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
with slicing on indices
Some related questions have discussed how to use slicing after using .index() -- however answers usually focus on finding the first index only. One can extend this approach by first finding a list of indices and then looping through a self-zipped list to slice the ranges.
indices = [i for i, x in enumerate(ldat) if x == dlim]
lspl = [ldat[s+1:e] for s, e in zip([-1] + indices, indices + [len(ldat)])]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
However, like all the approaches I have found, this seems like a very complex way of enacting a simple split-on-delimiter operation.
Comparison to string splitting
By comparison and as a model only, here is a working, concise, and elegant way of splitting a string into a list of sub-strings by a delimiter.
sdat = 'abcabcab'
dlim = 'c'
sspl = sdat.split(dlim)
print(sspl) # prints: ['ab', 'ab', 'ab']
NOTE: I understand there is no split method on lists in Python, and I am not asking about splitting a string. I am also not asking about splitting element-strings into new elements.
or this:
ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
lspl = [] # an elegant python one-liner wanted on this line!
print(lspl) # want: [['a', 'b'], ['a', 'b'], ['a', 'b']]
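# Turn every ", 'c', " in the list's string form into "],[" so it reads as several lists.
s = str(ldat).replace(", '%s', " % dlim, "],[")
# eval of "[...],[...],[...]" yields a tuple of lists, so convert it to a list.
# Caution: eval is unsafe on untrusted data.
result = list(eval(s))
print(result) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]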
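For teaching purposes, the conditional .append() approach from the question can also be wrapped in a function, which hides the loop behind a readable name without resorting to eval. This is only a sketch; split_list is a made-up name. Like str.split, it returns empty sub-lists for leading, trailing, or adjacent delimiters:

```python
def split_list(items, delim):
    """Split items into sub-lists at each occurrence of delim (hypothetical helper)."""
    result = [[]]
    for item in items:
        if item == delim:
            result.append([])       # delimiter found: start a new sub-list
        else:
            result[-1].append(item)
    return result

ldat = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b']
print(split_list(ldat, 'c'))             # [['a', 'b'], ['a', 'b'], ['a', 'b']]
print(split_list(['c', 'a', 'c'], 'c'))  # [[], ['a'], []]
```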

Efficiently comparing the first item in each list of two large list of lists?

I'm currently working with a large list of lists (~280k lists) and a smaller one (~3.5k lists). I'm trying to efficiently compare the first index in the smaller list to the first index in the large list. If they match, I want to return both lists from the small and large list that have a matching first index.
For example:
Large List 1:
[[a,b,c,d],[e,f,g,h],[i,j,k,l],[m,n,o,p]]
Smaller list 2:
[[e,q,r,s],[a,t,w,s]]
Would return
[([e,q,r,s],[e,f,g,h]),([a,t,w,s],[a,b,c,d])]
I currently have it setup as shown here, where a list of tuples is returned with each tuple holding the two lists that have a matching first element. I'm fine with any other data structures being used. I was trying to use a set of tuples but was having issues trying to figure out how to do it quicker than what I already have.
My code to compare these two lists of lists is currently this:
match = []
for list_one in small_list:
    for list_two in large_list:
        if str(list_one[0]).lower() in str(list_two[0]).lower():
            match.append((list_one, list_two))
            break
return match
Assuming order doesn't matter, I would highly recommend using a dictionary to map each prefix (the first element) to its list, and a set intersection to find the matches:
# generation of data... not important
>>> lst1 = [list(c) for c in ["abcd", "efgh", "ijkl", "mnop"]]
>>> lst2 = [list(c) for c in ["eqrs", "atws"]]
# mapping prefix to list (assuming uniqueness)
>>> by_prefix1 = {chars[0]: chars for chars in lst1}
>>> by_prefix2 = {chars[0]: chars for chars in lst2}
# actually finding matches by intersecting sets (fast)
>>> common = set(by_prefix1.keys()) & set(by_prefix2.keys())
>>> tuples = tuple(((by_prefix1[k], by_prefix2[k]) for k in common))
>>> tuples
Here's a one liner using list comprehension. I'm not sure how efficient it is, though.
large = [list(c) for c in ["abcd", "efgh", "ijkl", "mnop"]]
small = [list(c) for c in ["eqrs", "atws"]]
ret = [(x,y) for x in large for y in small if x[0] == y[0]]
print ret
#output
[(['a', 'b', 'c', 'd'], ['a', 't', 'w', 's']), (['e', 'f', 'g', 'h'], ['e', 'q', 'r', 's'])]
I'm actually using Python 2.7.11, although I guess this may work.
l1 =[['a','b','c','d'],['e','f','g','h'],['i','j','k','l'],['m','n','o','p']]
l2 =[['e','q','r','s'],['a','t','w','s']]
def org(Smalllist,Largelist):
    L = Largelist
    S = Smalllist
    Final = []
    for i in range(len(S)):
        for j in range(len(L)):
            if S[i][0] == L[j][0]:
                Final.append((S[i],L[j]))
    return Final
I suggest you put the smaller list in the first argument in order to get the results in the order you expected.
It's very important that you enter these letters as strings upon testing, as I did, otherwise they might be considered variables and the code will not run properly.
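For lists of this size, the nested loops do on the order of 3.5k x 280k comparisons, while a dictionary lookup needs only one pass over each list. Below is a rough sketch of that idea for Python 3. The name match_by_first is made up, and it assumes an exact, case-insensitive match on the first element rather than the substring test (in) used in the question, so adjust it if partial matches matter:

```python
from collections import defaultdict

def match_by_first(small_list, large_list):
    """Pair rows whose first elements are equal, ignoring case (hypothetical helper)."""
    # Index the large list once: lowercased first element -> rows with that key.
    index = defaultdict(list)
    for row in large_list:
        index[str(row[0]).lower()].append(row)

    matches = []
    for row in small_list:
        candidates = index.get(str(row[0]).lower())
        if candidates:
            # Keep only the first hit, like the `break` in the original loop.
            matches.append((row, candidates[0]))
    return matches

large = [list(s) for s in ["abcd", "efgh", "ijkl", "mnop"]]
small = [list(s) for s in ["eqrs", "atws"]]
print(match_by_first(small, large))
# [(['e', 'q', 'r', 's'], ['e', 'f', 'g', 'h']), (['a', 't', 'w', 's'], ['a', 'b', 'c', 'd'])]
```

Unlike the dictionary comprehension approach, this keeps the order of the small list and does not silently drop rows when several large rows share a first element.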

Python - List of Lists Slicing Behavior

When I define a list and try to change a single item like this:
list_of_lists = [['a', 'a', 'a'], ['a', 'a', 'a'], ['a', 'a', 'a']]
list_of_lists[1][1] = 'b'
for row in list_of_lists:
    print row
It works as intended. But when I try to use list comprehension to create the list:
row = ['a' for range in xrange(3)]
list_of_lists = [row for range in xrange(3)]
list_of_lists[1][1] = 'b'
for row in list_of_lists:
    print row
It results in an entire column of items in the list being changed. Why is this? How can I achieve the desired effect with list comprehension?
Think about if you do this:
>>> row = ['a' for range in xrange(3)]
>>> row2 = row
>>> row2[0] = 'b'
>>> row
['b', 'a', 'a']
This happens because row and row2 are two different names for the same list (you have row is row2) - your example with nested lists only obscures this a little.
To make them different lists, you can cause it to re-run the list-creation code each time instead of doing a variable assignment:
list_of_lists = [['a' for range in xrange(3)] for _ in xrange(3)]
or, create a new list each time by using a slice of the full old list:
list_of_lists = [row[:] for range in xrange(3)]
Although this isn't guaranteed to work in general for all sequences - it just happens that list slicing makes a new list for the slice. This doesn't happen for, eg, numpy arrays - a slice in those is a view of part of the array rather than a copy. If you need to work more generally than just lists, use the copy module:
from copy import copy
list_of_lists = [copy(row) for range in xrange(3)]
Also, note that range isn't the best name for a variable, since it shadows the builtin - for a throwaway like this, _ is reasonably common.
This happens because names and list elements in Python hold references to objects rather than copies of them. So when you build the list the "list comprehension" way, you get a list of 3 references to the same list (the one you called "row"), and when you change a value in one row you see that change in all of them.
So what you have to do is to change your "matrix" creation like this:
list_of_lists = [list(row) for range in xrange(3)]
Here you have some ideas on how to correctly get a copy of a list. Depending on what you are trying to do, you may use one or another...
Hope it helps!
Copy the list instead of just the reference.
list_of_lists = [row[:] for range in xrange(3)]
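The difference between sharing and copying is easy to check with the is operator. A small sketch for Python 3 (range instead of xrange; the names shared and copied are only for illustration):

```python
row = ['a'] * 3

shared = [row for _ in range(3)]       # three references to one list
copied = [row[:] for _ in range(3)]    # three independent copies

print(shared[0] is shared[1])   # True
print(copied[0] is copied[1])   # False

shared[1][1] = 'b'
copied[1][1] = 'b'
print(shared)   # [['a', 'b', 'a'], ['a', 'b', 'a'], ['a', 'b', 'a']]
print(copied)   # [['a', 'a', 'a'], ['a', 'b', 'a'], ['a', 'a', 'a']]
```

Keep in mind that row[:] is a shallow copy: it is enough here because the rows contain only strings, but rows that themselves contain lists would need copy.deepcopy.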

Is there a python method to re-order a list based on the provided new indices?

Say I have a working list:
['a','b','c']
and an index list
[2,1,0]
which will change the working list to:
['c','b','a']
Is there any python method to do this easily (the working list may also be a numpy array, and so a more adaptable method is greatly preferred)? Thanks!
ordinary sequence:
L = [L[i] for i in ndx]
numpy.array:
L = L[ndx]
Example:
>>> L = "abc"
>>> [L[i] for i in [2,1,0]]
['c', 'b', 'a']
Converting my comment-answer from when this question was closed:
As you mentioned numpy, here is an answer for that case:
For numerical data, you can do this directly with numpy arrays. Details here: http://www.scipy.org/Cookbook/Indexing#head-a8f6b64c733ea004fd95b47191c1ca54e9a579b5
the syntax is then
myarray[myindexlist]
For non-numerical data, the most efficient way in the case of long arrays which you read out only once is most likely this:
(myarray[i] for i in myindex)
note the () for generator expressions instead of [] for list comprehension.
Read this:
Generator Expressions vs. List Comprehension
This is very easy to do with numpy if you don't mind converting to numpy arrays:
>>> import numpy
>>> vals = numpy.array(['a','b','c'])
>>> idx = numpy.array([2,1,0])
>>> vals[idx]
array(['c', 'b', 'a'],
dtype='|S1')
To get back to a list, you can do:
>>> vals[idx].tolist()
['c', 'b', 'a']
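For plain lists without numpy, here is a short sketch (Python 3, variable names chosen for illustration) of the two common variants: building a new reordered list, and reordering the existing list object in place with slice assignment so that other references to it see the change:

```python
working = ['a', 'b', 'c']
ndx = [2, 1, 0]
alias = working                 # a second name for the same list object

# Build a new, reordered list; the original is left untouched.
reordered = [working[i] for i in ndx]
print(reordered)                # ['c', 'b', 'a']
print(working)                  # ['a', 'b', 'c']

# Slice assignment replaces the contents of the existing list in place.
working[:] = [working[i] for i in ndx]
print(working)                  # ['c', 'b', 'a']
print(alias)                    # ['c', 'b', 'a']
```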
