Removing a list of lists from another - python

I want to remove an element if it appears in both a and b (but not every occurrence of it):
a = [[2.0, 3.0], [1.0, 2.0] , [2.0, 3.0]]
b = [[1.0, 4.0], [2.0, 3.0] , [3.0, 4.0]]
Expected output
c = [[1.0, 2.0], [2.0, 3.0], [1.0, 4.0], [3.0, 4.0]]
If a point appears twice in a and twice in b then the output should contain the point twice
a = [[2.0, 3.0], [1.0, 2.0] , [2.0, 3.0]]
b = [[1.0, 4.0], [2.0, 3.0] , [3.0, 4.0], [2.0, 3.0]]
Expected output
c = [[1.0, 2.0], [2.0, 3.0], [1.0, 4.0], [3.0, 4.0], [2.0, 3.0]]
I have tried
first_set = set(map(tuple, a))
secnd_set = set(map(tuple, b))
first_set.symmetric_difference(secnd_set)
But this takes into account elements within a or b themselves.
Edit: added a second example for clarification.

If you first convert each pair to a tuple with something like this:
a = [tuple(item) for item in a]
b = [tuple(item) for item in b]
then you can simply take the set union between the two:
c = set(a).union(b)
This will give you a set with one of each pair that appears at least once in either or both collections:
>>> c
{(1.0, 2.0), (3.0, 4.0), (2.0, 3.0), (1.0, 4.0)}
If you want this behaviour while keeping multiples, simply substitute Counter for set: the | operator keeps the larger count of each pair, and you can get them all back in one collection with .elements().
from collections import Counter
a = [[2.0, 3.0], [1.0, 2.0] , [2.0, 3.0]]
b = [[1.0, 4.0], [2.0, 3.0] , [3.0, 4.0], [2.0, 3.0]]
a1 = Counter(map(tuple, a))
b1 = Counter(map(tuple, b))
c = a1 | b1
>>> c
Counter({(2.0, 3.0): 2, (1.0, 2.0): 1, (1.0, 4.0): 1, (3.0, 4.0): 1})
>>> list(c.elements())
[(2.0, 3.0), (2.0, 3.0), (1.0, 2.0), (1.0, 4.0), (3.0, 4.0)]

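This solution is not efficient (each item not in c check scans the whole list), and it keeps only one copy of each pair, but it gives a result for the first example: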
a = [[2.0, 3.0], [1.0, 2.0] , [2.0, 3.0]]
b = [[1.0, 4.0], [2.0, 3.0] , [3.0, 4.0]]
c = []
for item in a:
    if item not in c:
        c.append(item)
for item in b:
    if item not in c:
        c.append(item)
print(c)
Output:
[[2.0, 3.0], [1.0, 2.0], [1.0, 4.0], [3.0, 4.0]]
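The loop above keeps only one copy of each pair, so it cannot produce the question's second example. Below is a minimal sketch of a variant that keeps multiples, assuming the same rule as the Counter answer (a pair appears max(count in a, count in b) times) and first-seen order; the name union_keep_order is hypothetical.

```python
from collections import Counter


def union_keep_order(a, b):
    """Keep every pair of a, then append the pairs of b that a does not already cover."""
    # How many copies of each pair in a are still available to match a copy in b.
    unmatched = Counter(map(tuple, a))
    c = [list(item) for item in a]
    for item in b:
        key = tuple(item)
        if unmatched[key] > 0:
            unmatched[key] -= 1  # this copy in b is matched by a copy in a
        else:
            c.append(list(item))  # b has more copies than a: keep the extra one
    return c


# The question's second example
a = [[2.0, 3.0], [1.0, 2.0], [2.0, 3.0]]
b = [[1.0, 4.0], [2.0, 3.0], [3.0, 4.0], [2.0, 3.0]]
print(union_keep_order(a, b))
```

Each list is walked once and Counter lookups are constant time on average, so this avoids the repeated list scans of the loop above.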

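Alternatively, you could use itertools.groupby on the sorted concatenation of the two lists; sorting puts equal pairs next to each other, so each group yields a single pair:
import itertools
a = [[2.0, 3.0], [1.0, 2.0] , [2.0, 3.0]]
b = [[1.0, 4.0], [2.0, 3.0] , [3.0, 4.0]]
c = [k for k, g in itertools.groupby(sorted(a + b))]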
will result in
[[1.0, 2.0], [1.0, 4.0], [2.0, 3.0], [3.0, 4.0]]
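The sort matters because groupby only merges consecutive equal items. A quick sketch comparing the two on the same data:

```python
import itertools

a = [[2.0, 3.0], [1.0, 2.0], [2.0, 3.0]]
b = [[1.0, 4.0], [2.0, 3.0], [3.0, 4.0]]

# Without sorting, equal pairs are not adjacent, so nothing is merged.
unsorted_keys = [k for k, _ in itertools.groupby(a + b)]

# After sorting, all copies of a pair sit together and collapse to one key.
sorted_keys = [k for k, _ in itertools.groupby(sorted(a + b))]

print(unsorted_keys)
print(sorted_keys)  # [[1.0, 2.0], [1.0, 4.0], [2.0, 3.0], [3.0, 4.0]]
```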

Related

Python - Reordering the final list in a list of lists and having all corresponding list indices change to the same index ordering

I have a list of lists in python i.e.
[[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]]
I then want to sort the final list in the list of lists in ascending order, and apply the same change in index order to the corresponding elements of the other lists. For example,
[[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]]
turns into
[[3.0, 16.0, 3.0, 6.0], [7.0, 5.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]]
Apologies if this isn't worded well; I am rather new to Python.
I have looked into using the zip and sorted functions however haven't been able to use them to the effect I want to.
One way to do this is to pair each number in the list you are ordering with its index, sort those pairs by number, and then use the sorted indexes as the new order for every list.
def order_by_last(data):
    indexes = list(enumerate(data[-1]))
    indexes.sort(key=lambda pair: pair[1])
    new_list = [[sublist[index[0]] for index in indexes] for sublist in data]
    return new_list
In [56]: order_by_last([[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]])
Out[56]: [[3.0, 16.0, 3.0, 6.0], [7.0, 5.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]]
I'm not sure if you are willing to use external libraries, but numpy's argsort gives you exactly the index order you need. Note that argsort returns a numpy array of indexes rather than a Python list (it can be converted with .tolist()); the loop below still builds plain lists.
So you can get your result by doing the following:
import numpy as np

list_of_lists = [[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]]
# kind="stable" keeps tied values in their original order
list_order = np.argsort(list_of_lists[-1], kind="stable")
new_list = []
for single_list in list_of_lists:
    buffer_list = []
    for position in list_order:
        buffer_list.append(single_list[position])
    new_list.append(buffer_list)
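Keep in mind, though, that this breaks if the lists have different lengths: a list shorter than the last one raises an IndexError, and a longer one silently loses its extra elements.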
Create a sorted list of indexes based on the last list, then recreate each other list based on these indexes.
l = [[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]]
indexes = sorted(range(len(l[-1])), key=lambda x:l[-1][x])
res = [[x[i] for i in indexes] for x in l]
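The index-based approach above can be wrapped in a small function. Because sorted is stable, tied values in the last row keep their left-to-right order and every row is permuted the same way. A sketch (the function name reorder_by_last is illustrative):

```python
def reorder_by_last(rows):
    """Reorder every row by the ascending order of the last row's values."""
    last = rows[-1]
    # Stable sort of positions by value: ties keep their original order.
    order = sorted(range(len(last)), key=lambda i: last[i])
    return [[row[i] for i in order] for row in rows]


data = [[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]]
print(reorder_by_last(data))
# [[3.0, 16.0, 3.0, 6.0], [7.0, 5.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]]

# A tie in the last row: columns stay aligned across rows.
print(reorder_by_last([[10, 20, 30], [2.0, 1.0, 2.0]]))
# [[20, 10, 30], [1.0, 2.0, 2.0]]
```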
One option is to use zip to restructure the list into columnwise tuples, sort them and then turn that back into original lists:
L = [[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]]
R = [*map(list,zip(*sorted(zip(*L[::-1]))))][::-1]
# [[3.0, 16.0, 3.0, 6.0], [7.0, 5.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]]
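Another way (much less efficient, but perhaps more readable) is to sort each row based on the last row's corresponding values. Sorting on the last row's value alone (key=lambda p: p[0]) keeps the columns aligned when that row contains ties; sorting the bare pairs would break ties using each row's own values:
R = [[v for _, v in sorted(zip(L[-1], r), key=lambda p: p[0])] for r in L]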
>>> a = [[6.0, 3.0, 16.0, 3.0], [3.0, 2.0, 5.0, 7.0], [4.0, 3.0, 2.0, 1.0]]
>>> list(zip(*sorted((zip(*a)), key=lambda x: x[-1])))
[(3.0, 16.0, 3.0, 6.0), (7.0, 5.0, 2.0, 3.0), (1.0, 2.0, 3.0, 4.0)]
I'm using two idioms here:
zip(*list_of_lists) acts as a matrix transposer by swapping rows and columns of the matrix, represented by a list of lists.
sorting the transposed list of lists by the value of the last element.

How to iterate over each single column of the chunk in Python without using pandas?

I have a two-part question about this:
X = [[[-1.0, -1.0], [-2.0, -1.9], [-3.4, -2.0], [3.0, 1.5], [3.7, 1.0]],
[[3.0, 2.0] , [-4.0, 10.0]],
[[-10.0, 5.0], [-6.0, -10.0]],
[[2.0, -2.0]]]
First, I want to iterate over every single column of the chunk. How do I do that?
This is my approach:
ds = zip(*X)
for list in ds:
    print(list)
but it's giving me only one column:
([-1.0, -1.0], [3.0, 2.0], [-10.0, 5.0], [2.0, -2.0])
Second, how do I create 2 points for each column?
It's unclear what your expected output is, but:
data = [
[[-1.0, -1.0], [-2.0, -1.9], [-3.4, -2.0], [3.0, 1.5], [3.7, 1.0]],
[[3.0, 2.0], [-4.0, 10.0]],
[[-10.0, 5.0], [-6.0, -10.0]],
[[2.0, -2.0]]
]
flattened = [column for row in data for column in row]
Output:
[[-1.0, -1.0],
[-2.0, -1.9],
[-3.4, -2.0],
[3.0, 1.5],
[3.7, 1.0],
[3.0, 2.0],
[-4.0, 10.0],
[-10.0, 5.0],
[-6.0, -10.0],
[2.0, -2.0]]
Use itertools.zip_longest:
from itertools import zip_longest
for lst in zip_longest(*X):
    print(lst)
Output:
([-1.0, -1.0], [3.0, 2.0], [-10.0, 5.0], [2.0, -2.0])
([-2.0, -1.9], [-4.0, 10.0], [-6.0, -10.0], None)
([-3.4, -2.0], None, None, None)
([3.0, 1.5], None, None, None)
([3.7, 1.0], None, None, None)
If you don't want Nones:
for lst in zip_longest(*X):
    print(lst[:(lst.index(None) if None in lst else None)])
([-1.0, -1.0], [3.0, 2.0], [-10.0, 5.0], [2.0, -2.0])
([-2.0, -1.9], [-4.0, 10.0], [-6.0, -10.0])
([-3.4, -2.0],)
([3.0, 1.5],)
([3.7, 1.0],)
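The slicing trick above relies on the chunks in X getting shorter from left to right, which happens to be true here (lengths 5, 2, 2, 1). If a shorter chunk came before a longer one, a None could sit in the middle of a column and the slice would drop real elements after it. A sketch that instead pads with a private sentinel object and filters it out (the helper name columns_of is hypothetical):

```python
from itertools import zip_longest


def columns_of(chunks):
    """For each position i, collect the i-th element of every chunk that has one."""
    missing = object()  # unique padding value that no real element can be
    return [
        tuple(item for item in col if item is not missing)
        for col in zip_longest(*chunks, fillvalue=missing)
    ]


X = [[[-1.0, -1.0], [-2.0, -1.9], [-3.4, -2.0], [3.0, 1.5], [3.7, 1.0]],
     [[3.0, 2.0], [-4.0, 10.0]],
     [[-10.0, 5.0], [-6.0, -10.0]],
     [[2.0, -2.0]]]
for col in columns_of(X):
    print(col)

# A shorter chunk in the middle: the second column still keeps 'e'.
print(columns_of([['a', 'b'], ['c'], ['d', 'e']]))
# [('a', 'c', 'd'), ('b', 'e')]
```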
Not sure what you mean by 'every column'. When I look at your nested list X, it seems to be a list of 2D nested lists each with 2 columns and a variable number of rows:
>>> import numpy as np
>>> [np.array(ch).shape for ch in X]
[(5, 2), (2, 2), (2, 2), (1, 2)]
In your question you describe the following as a column:
([-1.0, -1.0], [3.0, 2.0], [-10.0, 5.0], [2.0, -2.0])
This is the first row of each of the 2D arrays I describe above. Since the last of these only has one row, the zip operation will stop after the first row because it runs out of items.
Maybe you are trying to concatenate all the 2D arrays into a vertical stack.
>>> np.vstack(X)
array([[ -1. , -1. ],
[ -2. , -1.9],
[ -3.4, -2. ],
[ 3. , 1.5],
[ 3.7, 1. ],
[ 3. , 2. ],
[ -4. , 10. ],
[-10. , 5. ],
[ -6. , -10. ],
[ 2. , -2. ]])
Or, if you don't want to use numpy, as @Lukas suggested:
>>> [row for chunk in X for row in chunk]
[[-1.0, -1.0],
[-2.0, -1.9],
[-3.4, -2.0],
[3.0, 1.5],
[3.7, 1.0],
[3.0, 2.0],
[-4.0, 10.0],
[-10.0, 5.0],
[-6.0, -10.0],
[2.0, -2.0]]
Also, note that you should not use the name list as a variable: it shadows Python's built-in list type.
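If "column" instead means the two coordinate columns (each row here is an [x, y] pair, as the (n, 2) shapes above show), zip(*rows) transposes the flattened rows into one tuple per column. A sketch under that assumed reading of the question:

```python
X = [[[-1.0, -1.0], [-2.0, -1.9], [-3.4, -2.0], [3.0, 1.5], [3.7, 1.0]],
     [[3.0, 2.0], [-4.0, 10.0]],
     [[-10.0, 5.0], [-6.0, -10.0]],
     [[2.0, -2.0]]]

# All [x, y] rows from every chunk, in order
rows = [row for chunk in X for row in chunk]

# zip(*rows) transposes: one tuple of x values, one tuple of y values
xs, ys = zip(*rows)
print(xs)
print(ys)

# The same transpose applied to each chunk separately
per_chunk = [tuple(zip(*chunk)) for chunk in X]
print(per_chunk[0])
```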

Python concatenating elements of one list that are between elements of another list

I have two lists: a and b. I want to concatenate all of the elements of b that are between elements of a. All of the elements of a are in b, but b also has some extra elements that are extraneous. I would like to take the first instance of every element of a in b and concatenate it with the extraneous elements that follow it in b until we find another element of a in b. The following example should make it clearer.
a = [[11.0, 1.0], [11.0, 2.0], [11.0, 3.0], [11.0, 4.0], [11.0, 5.0], [12.0, 1.0], [12.0, 2.0], [12.0, 3.0], [12.0, 4.0], [12.0, 5.0], [12.0, 6.0], [12.0, 7.0], [12.0, 8.0], [12.0, 9.0], [12.0, 10.0], [12.0, 11.0], [12.0, 12.0], [12.0, 13.0], [12.0, 14.0], [13.0, 1.0], [13.0, 2.0], [13.0, 3.0], [13.0, 4.0], [13.0, 5.0], [13.0, 6.0], [13.0, 7.0], [13.0, 8.0], [13.0, 9.0], [13.0, 10.0]]
b = [[11.0, 1.0], [11.0, 1.0], [1281.0, 8.0], [11.0, 2.0], [11.0, 3.0], [11.0, 3.0], [11.0, 4.0], [11.0, 5.0], [12.0, 1.0], [12.0, 2.0], [12.0, 3.0], [12.0, 4.0], [12.0, 5.0], [12.0, 6.0], [12.0, 7.0], [12.0, 5.0], [12.0, 8.0], [12.0, 9.0], [12.0, 10.0], [13.0, 5.0], [12.0, 11.0], [12.0, 8.0], [3.0, 1.0], [13.0, 1.0], [9.0, 7.0], [12.0, 12.0], [12.0, 13.0], [12.0, 14.0], [13.0, 1.0], [13.0, 2.0], [11.0, 3.0], [13.0, 3.0], [13.0, 4.0], [13.0, 5.0], [13.0, 5.0], [13.0, 5.0], [13.0, 6.0], [13.0, 7.0], [13.0, 7.0], [13.0, 8.0], [13.0, 9.0], [13.0, 10.0]]
c = [[[11.0, 1.0], [11.0, 1.0], [1281.0, 8.0]], [[11.0, 2.0]], [[11.0, 3.0], [11.0, 3.0]], [[11.0, 4.0]], [[11.0, 5.0]], [[12.0, 1.0]], [[12.0, 2.0]], [[12.0, 3.0]], [[12.0, 4.0]], [[12.0, 5.0]], [[12.0, 6.0]], [[12.0, 7.0], [12.0, 5.0]], [[12.0, 8.0]], [[12.0, 9.0]], [[12.0, 10.0], [13.0, 5.0]], [[12.0, 11.0], [12.0, 8.0], [3.0, 1.0]], [[13.0, 1.0], [9.0, 7.0], [12.0, 12.0], [12.0, 13.0], [12.0, 14.0], [13.0, 1.0]], [[13.0, 2.0]], [[11.0, 3.0], [13.0, 3.0]], [[13.0, 4.0]], [[13.0, 5.0], [13.0, 5.0], [13.0, 5.0]], [[13.0, 6.0]], [[13.0, 7.0], [13.0, 7.0]], [[13.0, 8.0]], [[13.0, 9.0]], [[13.0, 10.0]]]
What I have thought of is something like this:
slice_list = []
for i, elem in enumerate(a):
    if i < len(a) - 1:
        b_first_index = b.index(a[i])
        b_second_index = b.index(a[i + 1])
        slice_list.append([b_first_index, b_second_index])
c = [b[slice_list[i][0]:slice_list[i][1]] for i in range(len(slice_list))]
This however will not catch the last item in the list (which I am not quite sure how to fit into my list comprehension anyways) and it seems quite ugly. My question is, is there a neater way of doing this (perhaps in itertools)?
Let's simplify the visual a bit:
key_list = ['a', 'c', 'f']
wrong_list = ['a', 'b', 'c', 'd', 'e', 'f']
wrong_list_fixed = [['a', 'b'], ['c', 'd', 'e'], ['f']]
This will be semantically identical to what you have, but I think it is easier to see without all the extra nested brackets.
You could use itertools.groupby, if only you could come up with a clever key. Luckily, the mapping of key_list to wrong_list gives you exactly what you want:
import itertools

class key:
    def __init__(self, key_list):
        self.last = -1
        self.key_list = key_list

    def __call__(self, item):
        try:
            self.last = self.key_list.index(item, self.last + 1)
        except ValueError:
            pass
        return self.last

wrong_list_fixed = [list(g) for k, g in itertools.groupby(wrong_list, key(key_list))]
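The key maps elements of wrong_list to key_list using index. For elements that are not found, it just returns the last index successfully found, ensuring that groups are not split until a new index is found. By starting the search from the next available index, you ensure that duplicate entries in key_list are handled correctly. One caveat: if an extraneous element equals a key that appears later in key_list (such as [13.0, 5.0] right after [12.0, 10.0] in the question's b), the key jumps ahead to that later index and the groups in between get merged.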
I think your example wrong_list_fixed is incorrect.
[[12.0, 10.0], [13.0, 5.0], [12.0, 11.0], [12.0, 8.0],
# There should be a new list here -^
Here's a solution that walks the lists. It can be optimized further:
from contextlib import suppress

fixed = []
current = []
key_list_iter = iter(key_list)
next_key = next(key_list_iter)
for wrong in wrong_list:
    if wrong == next_key:
        if current:
            fixed.append(current)
            current = []
        next_key = None
        with suppress(StopIteration):
            next_key = next(key_list_iter)
    current.append(wrong)
if current:
    fixed.append(current)
Here are the correct lists (modified to be easier to visually parse):
key_list = ['_a0', '_b0', '_c0', '_d0', '_e0', '_f0', '_g0', '_h0', '_i0', '_j0', '_k0', '_l0', '_m0', '_n0', '_o0', '_p0', '_q0', '_r0', '_s0', '_t0', '_u0', '_v0', '_w0', '_x0', '_y0', '_z0', '_A0', '_B0', '_C0']
wrong_list = ['_a0', '_a0', 'D0', '_b0', '_c0', '_c0', '_d0', '_e0', '_f0', '_g0', '_h0', '_i0', '_j0', '_k0', '_l0', '_j0', '_m0', '_n0', '_o0', '_x0', '_p0', '_m0', 'E0', '_t0', 'F0', '_q0', '_r0', '_s0', '_t0', '_u0', '_c0', '_v0', '_w0', '_x0', '_x0', '_x0', '_y0', '_z0', '_z0', '_A0', '_B0', '_C0']
wrong_list_fixed = [['_a0', '_a0', 'D0'], ['_b0'], ['_c0', '_c0'], ['_d0'], ['_e0'], ['_f0'], ['_g0'], ['_h0'], ['_i0'], ['_j0'], ['_k0'], ['_l0', '_j0'], ['_m0'], ['_n0'], ['_o0', '_x0'], ['_p0', '_m0', 'E0', '_t0', 'F0'], ['_q0'], ['_r0'], ['_s0'], ['_t0'], ['_u0', '_c0'], ['_v0'], ['_w0'], ['_x0', '_x0', '_x0'], ['_y0'], ['_z0', '_z0'], ['_A0'], ['_B0'], ['_C0']]
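The walking loop above can be packaged as a function, with next(iterator, None) standing in for the suppress(StopIteration) block. Because it only ever compares against the next expected key, an extraneous element that happens to match a later key (such as the early '_x0') stays in the current group. A sketch with a hypothetical function name; like the original, it uses None as the "no keys left" marker, so it assumes None never appears in wrong_list:

```python
def split_at_keys(key_list, wrong_list):
    """Start a new group each time the next expected key appears in wrong_list."""
    fixed = []
    current = []
    keys = iter(key_list)
    next_key = next(keys, None)  # None once every key has been consumed
    for wrong in wrong_list:
        if wrong == next_key:
            if current:
                fixed.append(current)  # close the previous group
            current = []
            next_key = next(keys, None)
        current.append(wrong)
    if current:
        fixed.append(current)  # the last group has no following key
    return fixed


print(split_at_keys(['a', 'c', 'f'], ['a', 'b', 'c', 'd', 'e', 'f']))
# [['a', 'b'], ['c', 'd', 'e'], ['f']]

# 'f' shows up early, before 'c', but stays in the first group
print(split_at_keys(['a', 'c', 'f'], ['a', 'f', 'b', 'c', 'd', 'f']))
# [['a', 'f', 'b'], ['c', 'd'], ['f']]
```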
I get slightly different result from yours, but give it a try. If this is not what you want, I will delete my answer.
idx = sorted(set([b.index(ai) for ai in a] + [len(b)]))
c = [b[i:j] for i, j in zip(idx[:-1], idx[1:])]

Python dictionary iteratively expanded into lists

I have the following Python dictionary:
b = {'SP:1': 1.0,
'SP:2': 2.0,
'SP:3': 3.0,
'SP:4': 4.0,
'SP:5': 5.0,
'SP:6': 6.0,
'SP:7': 40.0,
'SP:8': 7.0,
'SP:9': 8.0}
I want to take these values and iterate over them to create 9 lists, each successive list being a superset of its predecessor. So:
[1.0]
[1.0,2.0]
[1.0,2.0,3.0]
[1.0,2.0,3.0,4.0]
...
[1.0,2.0,3.0,4.0,5.0,6.0,40.0,7.0,8.0]
There is probably a really easy way of doing this with a list comprehension, but I can't work it out!
You can do the following:
>>> vals = [v for k, v in sorted(b.items())]
# or shorter, but less explicit:
# vals = [b[k] for k in sorted(b)]
>>> [vals[:i+1] for i in range(len(vals))]
[[1.0],
[1.0, 2.0],
[1.0, 2.0, 3.0],
[1.0, 2.0, 3.0, 4.0],
[1.0, 2.0, 3.0, 4.0, 5.0],
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 40.0],
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 40.0, 7.0],
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 40.0, 7.0, 8.0]]
The first comprehension gives you the values sorted by key; before Python 3.7 a dict had no guaranteed order, and sorting makes the order explicit either way. The second gives you all of the desired slices of that list of values.
Dictionaries are not really meant to be used this way, and older Python versions do not keep them ordered. However, since the keys are basically indices, we can use them like that:
[[b['SP:'+str(j+1)] for j in range(i+1)] for i in range(len(b))]
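The first answer sorts the keys as strings, which only works because every number is a single digit: a key such as 'SP:10' (hypothetical, not in the question) would sort between 'SP:1' and 'SP:2'. A sketch that sorts on the numeric suffix instead, assuming every key has the form 'SP:<integer>':

```python
def sp_number(key):
    """'SP:10' -> 10 (assumes the 'SP:<integer>' key format)."""
    return int(key.split(':')[1])


b = {'SP:1': 1.0, 'SP:2': 2.0, 'SP:10': 10.0}

print(sorted(b))  # string order: ['SP:1', 'SP:10', 'SP:2']
vals = [b[k] for k in sorted(b, key=sp_number)]
prefixes = [vals[:i + 1] for i in range(len(vals))]
print(prefixes)  # [[1.0], [1.0, 2.0], [1.0, 2.0, 10.0]]
```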

Remove duplicates in each list of a list of lists

I have a list of lists:
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
What I need to do is remove the duplicates within each inner list while keeping the original order of the elements. Such as
a = [[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]
]
If order is important, you can just compare to the set of items seen so far:
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]]
for index, lst in enumerate(a):
    seen = set()
    a[index] = [i for i in lst if i not in seen and seen.add(i) is None]
Here i is added to seen as a side effect, relying on the short-circuit evaluation of and: seen.add(i) is only called when the first check (i not in seen) evaluates True, and since add returns None, the whole condition is then True.
Attribution: I saw this technique yesterday from @timgeb.
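The same seen-set idea can also be written as an ordinary loop, which avoids relying on a side effect inside the comprehension. A sketch (the helper name dedupe_keep_order is illustrative):

```python
def dedupe_keep_order(items):
    """Return items with later duplicates removed, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # first time this value appears
            seen.add(item)
            result.append(item)
    return result


a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
     [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
     [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
     [1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
     [5.0, 5.0, 5.0],
     [1.0]]
a = [dedupe_keep_order(lst) for lst in a]
print(a)
```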
If you have access to OrderedDict (Python 2.7 onwards), abusing it is a good way to do this:
import collections
import pprint
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
b = [list(collections.OrderedDict.fromkeys(i)) for i in a]
pprint.pprint(b, width = 40)
Outputs:
[[1.0],
[2.0, 3.0, 4.0],
[3.0, 5.0],
[1.0, 4.0, 5.0],
[5.0],
[1.0]]
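On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys. A sketch on the same data:

```python
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
     [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
     [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
     [1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
     [5.0, 5.0, 5.0],
     [1.0]]

# Each inner list becomes dict keys (duplicates collapse, first position kept),
# and list() reads the keys back in insertion order.
b = [list(dict.fromkeys(lst)) for lst in a]
print(b)
```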
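If the values within each inner list are already in ascending order (as they are here), converting to a set and sorting restores that order: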
a = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0],
[3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
[1.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
[5.0, 5.0, 5.0],
[1.0]
]
for i in range(len(a)):
    a[i] = sorted(set(a[i]))
print(a)
OUTPUT:
[[1.0], [2.0, 3.0, 4.0], [3.0, 5.0], [1.0, 4.0, 5.0], [5.0], [1.0]]
Inspired by DOSHI's answer, here's another way. It is probably best for a small number of distinct elements (i.e. a small number of index lookups for sorted); otherwise, a method that remembers insertion order may be better:
b = [sorted(set(i), key=i.index) for i in a]
So just to compare the methods, a seen set versus sorting a set by an original index lookup:
>>> import timeit
>>> setup = 'l = [1,2,3,4,1,2,3,4,1,2,3,4]*100'
>>> timeit.repeat('sorted(set(l), key=l.index)', setup)
[23.231241687943111, 23.302754517266294, 23.29650511717773]
>>> timeit.repeat('seen = set(); [i for i in l if i not in seen and seen.add(i) is None]', setup)
[49.855933579601697, 50.171151882997947, 51.024657420945005]
Here we see that for a larger list with few distinct values, the membership test that Jon's method performs on every element becomes relatively costly, while the index lookups stay cheap because every distinct value occurs near the start of the list, so this method is much more efficient.
However, by appending new values to the end of the list, we see that Jon's method does not incur much extra cost, whereas mine does, since each new value's index lookup now has to scan almost the whole list:
>>> setup = 'l = [1,2,3,4,1,2,3,4,1,2,3,4]*100 + [8,7,6,5]'
>>> timeit.repeat('sorted(set(l), key=l.index)', setup)
[93.221347206941573, 93.013769266020972, 92.64512197257136]
>>> timeit.repeat('seen = set(); [i for i in l if i not in seen and seen.add(i) is None]', setup)
[51.042504915545578, 51.059295348750311, 50.979311841569142]
I think I'd prefer Jon's method with a seen set, given the bad lookup times for the index.
