comparing elements of a list of list and removing - python

l=[['a', 'random_str', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7]]
So I have a list like this, and I want to traverse it to check whether the zeroth element of any sublist is equal to the zeroth element of another. If two or more match, I want to compare the int at index 2 of those sublists, keep only the sublist with the lowest value, and remove all the others.
So the output should be:
[['a', 'random_str', 4], ['b', 'random_str2', 5]]
That is, it removes the sublist with the higher int at index 2.
I'm thinking of something like this:
for i in l:
    for k in i:
        if k[0] == i[0][0]:
            pass  # then I don't know

It can be achieved with pandas, using sort_values and groupby:
import pandas as pd
l=[['a', 'random_str', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7]]
#create dataframe from list of list
df = pd.DataFrame(l)
#sort rows by the third column (column label 2)
df = df.sort_values(by=2)
#group by the first column and keep only the first row of each group, which has the lowest int after the sort
df = df.groupby(0).head(1)
#put back to list of list
df = df.values.tolist()
print(df)
This prints:
[['a', 'random_str', 4], ['b', 'random_str2', 5]]

One more option may be this:
l = [['a', 'random_str', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7]]
# initialize a dict
d = {x[0]: list() for x in l}
# lambda function to compare values
f = lambda x: x if not d[x[0]] or x[2] < d[x[0]][2] else d[x[0]]
# list comprehension to iterate and process the list of values
[d.update({x[0]: f(x)}) for x in l]
# expected output: [['a', 'random_str', 4], ['b', 'random_str2', 5]]
print(list(d.values()))

This list comprehension is fairly horrible, but it does what you're after: for each distinct first element (taken from a set of first elements), it finds the sublist with the minimum third element among the sublists that start with it.
>>> [min((x for x in l if x[0] == y), key=lambda x: x[2]) for y in set(z[0] for z in l)]
[['b', 'random_str2', 5], ['a', 'random_str', 4]]
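Note the output order: a set has no guaranteed iteration order, and for strings it can even change between interpreter runs because of hash randomization. A small variant that wraps the set in sorted gives a predictable order; it still scans the whole list once per distinct first element, so it is worst-case quadratic (the name result is illustrative):

```python
l = [['a', 'random_str', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7]]

# Iterate over the distinct first elements in sorted order, so the output order is stable.
# For each one, pick the matching sublist with the smallest value at index 2.
result = [min((x for x in l if x[0] == y), key=lambda x: x[2])
          for y in sorted(set(z[0] for z in l))]
print(result)  # [['a', 'random_str', 4], ['b', 'random_str2', 5]]
```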

Group by the first element, and in each group find the min by the third:
import itertools
f2 = lambda x: x[2]
f0 = lambda x: x[0]
[min(subl, key=f2) for _, subl in itertools.groupby(sorted(l, key=f0), key=f0)]
# => [['a', 'random_str', 4], ['b', 'random_str2', 5]]
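The sorted call matters here: itertools.groupby only merges consecutive items with equal keys. A quick check on a hypothetical unsorted input shows the same key split into two groups when the sort is skipped:

```python
from itertools import groupby
from operator import itemgetter

first = itemgetter(0)
data = [['b', 'x', 5], ['a', 'y', 4], ['b', 'z', 7]]  # hypothetical, not sorted by key

# Without sorting, the two 'b' rows are not adjacent, so they form separate groups.
unsorted_keys = [k for k, _ in groupby(data, key=first)]
# After sorting by the key, each key forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(data, key=first), key=first)]
print(unsorted_keys)  # ['b', 'a', 'b']
print(sorted_keys)    # ['a', 'b']
```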

Not a very good or pythonic way, but you can use a dictionary to get the result:
l=[['a', 'random_str', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7]]
res = {}
for sublist in l:
    if sublist[0] not in res:
        res[sublist[0]] = [sublist[1], sublist[2]]
    elif sublist[2] < res[sublist[0]][1]:
        # replace both the string and the int, so they always come from the same sublist
        res[sublist[0]] = [sublist[1], sublist[2]]
final_res = [[key, value[0], value[1]] for key, value in res.items()]
print(final_res)
Output:
[['a', 'random_str', 4], ['b', 'random_str2', 5]]

This should do the trick. (I'm sure you could reduce the time complexity with a clever algorithm that does not sort, but unless we're talking about a bottleneck here you should not be too concerned about that.)
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> first, third = itemgetter(0), itemgetter(2)
>>> l = [['a', 'random_str', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7]]
>>>
>>> groups = groupby(sorted(l), key=first)
>>> [min(list(group), key=third) for _, group in groups]
[['a', 'random_str', 4], ['b', 'random_str2', 5]]
The idea is to group your data by the first element in each sublist. groupby only groups consecutive items with equal keys, so l needs to be sorted first. (sorted already sorts lexicographically, but you could optimize it by using sorted(l, key=first) so that only the first element is considered for sorting.) Afterwards, we extract the minimum of each group with respect to the third element (index 2).
All of this could be done in one line, but I find groupby one-liners terribly unreadable, so I opted for a solution with more lines and some self-documenting variable names.
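The answer mentions that the sort could be avoided. One way, sketched here as an illustration rather than the answer's own code, is a single pass that remembers the best sublist per first element in a dictionary. This runs in O(n) instead of O(n log n), and on ties it keeps the first sublist encountered (the names best and result are illustrative):

```python
l = [['a', 'random_str', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7]]

# One pass over l: for each first element, keep the sublist with the smallest third element.
best = {}
for sub in l:
    key = sub[0]
    if key not in best or sub[2] < best[key][2]:
        best[key] = sub

# dicts preserve insertion order (Python 3.7+), so keys appear in first-seen order.
result = list(best.values())
print(result)  # [['a', 'random_str', 4], ['b', 'random_str2', 5]]
```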

In my opinion, doing this via a DataFrame is the easiest and most efficient approach. Try:
import pandas as pd
data = [['a', 'random_str4', 4], ['b', 'random_str2', 5], ['b', 'random_str3', 7], ['a', 'random_str2', 6]]
df = pd.DataFrame(data)
df = df.sort_values(by = 2)
df = df.drop_duplicates(0)
df.values.tolist()
This outputs the desired result, i.e.,
[['a', 'random_str4', 4], ['b', 'random_str2', 5]]

I know this is not the most elegant solution, but it works fine for the time being.
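Assuming the list is sorted in ascending order by its first element:
def get_result(l):
    # Walk through runs of sublists that share the same first element
    # (requires l to be sorted by that element) and keep the lowest of each run.
    temp_list = []
    i = 0
    while i < len(l):
        best = l[i]  # lowest sublist seen so far in the current run
        while i < len(l) - 1 and l[i][0] == l[i + 1][0]:
            if l[i + 1][2] < best[2]:
                best = l[i + 1]
            i += 1
        temp_list.append(best)
        i += 1
    return temp_list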
Then do:
print(get_result(l))
If the list is not sorted, sort it by its first element before calling the function:
l.sort(key=lambda x: x[0])

Related

use threshold on a list of numbers to subset corresponding elements from other list of objects

I have two lists of lists with the same dimensions (same length, and corresponding sublists have the same number of elements).
list1 looks like this:
list1 = [[1,2,3],[2,3,4],[4,5,2],[4,0,9]]
list2 looks like this:
list2 = [['a','b','c'],['a','d','e'],['a','f','b'],['p','o','a']]
Let's say I have a threshold x = 3.
I want to filter out the elements of list2 whose corresponding elements in list1 (same position) are less than or equal to x, keeping only those where the list1 value is greater than x.
So for instance, for x = 3 I would want to obtain:
list3 = [[],['e'],['a','f'],['p','a']]
How can I do so?
Thank you for your help.
You can use zip to walk the two lists (and sublists) together and filter the values in list2 that satisfy the condition in the corresponding position in list1:
out = [[j for i, j in zip(li1, li2) if i>3] for li1, li2 in zip(list1, list2)]
Output:
[[], ['e'], ['a', 'f'], ['p', 'a']]
I would use the Python zip function to "merge" the two lists.
list1 = [[1,2,3],[2,3,4],[4,5,2],[4,0,9]]
list2 = [['a','b','c'],['a','d','e'],['a','f','b'],['p','o','a']]
threshold = 3
list3 = [
    [char for value, char in zip(sub_one, sub_two) if value > threshold]
    for sub_one, sub_two in zip(list1, list2)
]
print(list3)
This outputs
[[], ['e'], ['a', 'f'], ['p', 'a']]
The outer zip merges list1 and list2 into:
[([1, 2, 3], ['a', 'b', 'c']),
([2, 3, 4], ['a', 'd', 'e']),
([4, 5, 2], ['a', 'f', 'b']),
([4, 0, 9], ['p', 'o', 'a'])]
while the inner zip merges e.g. [1,2,3] and ['a','b','c'] into [(1, 'a'), (2, 'b'), (3, 'c')]. Then, the inner list comprehension can use a simple filtering with if value > threshold.
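To see the two levels of zip in isolation, the intermediate values can be printed directly. This is a small sketch reusing the example data; the names outer and inner are illustrative:

```python
list1 = [[1, 2, 3], [2, 3, 4], [4, 5, 2], [4, 0, 9]]
list2 = [['a', 'b', 'c'], ['a', 'd', 'e'], ['a', 'f', 'b'], ['p', 'o', 'a']]
threshold = 3

# Outer zip: pairs each sublist of list1 with the sublist of list2 at the same position.
outer = list(zip(list1, list2))
# Inner zip on the first pair: pairs each number with the letter at the same position.
inner = list(zip(*outer[0]))
print(outer[0])  # ([1, 2, 3], ['a', 'b', 'c'])
print(inner)     # [(1, 'a'), (2, 'b'), (3, 'c')]

# The filter keeps a letter only where its paired number exceeds the threshold.
print([char for value, char in inner if value > threshold])           # []
print([char for value, char in zip(*outer[1]) if value > threshold])  # ['e']
```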

calculate sum of all unique elements in a list based on another list python

I have two lists like this,
a=[['a', 'b', 'c'], ['b', 'c'], ['a', 'd'], ['x']]
b=[[1, 2, 3], [4,5], [6,7], [8]] (a and b always have the same shape)
Now I want to create two lists: the unique elements of a, and for each of them the sum of its corresponding values in b. So the final lists should look like:
a=['a', 'b', 'c', 'd', 'x']
b=[7, 6, 8, 7, 8] (the sums for a, b, c, d and x)
I could do this using a for loop, but I'm looking for a more efficient way to reduce execution time.
Not very pythonic, but it will do the job:
a=[['a', 'b', 'c'], ['b', 'c'], ['a', 'd'], ['x']]
b=[[1, 2, 3], [4,5], [6,7], [8]]
mapn = dict()
for elt1, elt2 in zip(a, b):
    for e1, e2 in zip(elt1, elt2):
        mapn[e1] = mapn.get(e1, 0) + e2
elts = mapn.keys()
counts = mapn.values()
print(mapn)
print(elts)
print(counts)
You can use zip and collections.Counter along the following lines:
from collections import Counter
c = Counter()
for la, lb in zip(a, b):
    for xa, xb in zip(la, lb):
        c[xa] += xb
list(c.keys())
# ['a', 'b', 'c', 'd', 'x']
list(c.values())
# [7, 6, 8, 7, 8]
Here are some ideas.
First, your input lists:
a=[['a', 'b', 'c'], ['b', 'c'], ['a', 'd'], ['x']]
b=[[1, 2, 3], [4,5], [6,7], [8]]
To get the unique elements, you can flatten a and build a set from it:
A = set([item for sublist in a for item in sublist])
But what I would do first (perhaps not the most efficient approach) is:
import pandas as pd
LIST1 = [item for sublist in a for item in sublist]
LIST2 = [item for sublist in b for item in sublist]
df = pd.DataFrame({'a':LIST1,'b':LIST2})
df.groupby(df.a).sum()
OUTPUT:
At the end of the day, you're going to have to use two for loops. I have a one-liner solution using zip and Counter.
The first solution works only in this specific case, where all the strings are single characters: it builds a string containing each letter repeated the right number of times, and then counts the frequency of each letter.
from collections import Counter
a = [['a', 'b', 'c'], ['b', 'c'], ['a', 'd'], ['x']]
b = [[1, 2, 3], [4,5], [6,7], [8]]
a, b = zip(*Counter(''.join(x*y for al, bl in zip(a, b) for x, y in zip(al, bl))).items())
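To see why this first one-liner is limited to single characters, here is a quick check with a hypothetical multi-character key 'ab' (the names words and nums are illustrative). Joining the repeated strings loses the boundaries between keys, so the letters end up counted individually:

```python
from collections import Counter

# Hypothetical input with a multi-character key 'ab'.
words = [['ab', 'c'], ['ab']]
nums = [[1, 2], [3]]

# String-repetition trick: 'ab' * 1 + 'c' * 2 + 'ab' * 3 == 'abccababab'
joined = ''.join(x * y for wl, nl in zip(words, nums) for x, y in zip(wl, nl))
print(joined)           # abccababab
print(Counter(joined))  # Counter({'a': 4, 'b': 4, 'c': 2}), with no 'ab' key at all
```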
For the more general case, you can do:
a, b = zip(*sum((Counter({k: v}) for al, bl in zip(a, b) for k, v in zip(al, bl)), Counter()).items())
You can combine the lists and their internal lists using zip(), then feed the list of tuples to a dictionary constructor to get a list of dictionaries with values for each letter. Then convert those dictionaries to Counter and add them up.
a = [['a', 'b', 'c'], ['b', 'c'], ['a', 'd'], ['x']]
b = [[ 1, 2, 3 ], [ 4, 5 ], [ 6, 7 ], [ 8 ]]
from collections import Counter
from itertools import starmap
mapn = sum(map(Counter,map(dict,starmap(zip,zip(a,b)))),Counter())
elts,counts = map(list,zip(*mapn.items()))
print(mapn) # Counter({'c': 8, 'x': 8, 'a': 7, 'd': 7, 'b': 6})
print(elts) # ['a', 'b', 'c', 'd', 'x']
print(counts) # [7, 6, 8, 7, 8]
Detailed explanation:
zip(a,b) combines the lists into pairs of sublists, e.g. (['a','b','c'],[1,2,3]), ...
starmap(zip,...) takes these list pairs and merges them together into sublists of letter-number pairs: [('a',1),('b',2),('c',3)], ...
Each of these lists of pairs is converted to a dictionary by map(dict,...) and then into a Counter object by map(Counter,...)
We end up with a sequence of Counter objects corresponding to the pairing of each sublist. Applying sum(...,Counter()) computes the totals for each letter into a single Counter object.
Apart from being a Counter object, mapn is exactly the same as the dictionary that you produced.
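The starmap pipeline can be unpacked into named intermediate steps to inspect each stage. This is a sketch with illustrative variable names, and unlike the one-liner it materializes every stage as a list. One caveat worth knowing: adding Counters drops keys whose total is zero or negative, so this approach relies on the values being positive.

```python
from collections import Counter
from itertools import starmap

a = [['a', 'b', 'c'], ['b', 'c'], ['a', 'd'], ['x']]
b = [[1, 2, 3], [4, 5], [6, 7], [8]]

# Step 1: pair up corresponding sublists.
pairs = list(zip(a, b))
# Step 2: zip each pair of sublists into letter-number pairs.
letter_pairs = [list(p) for p in starmap(zip, pairs)]
# Step 3: one Counter per sublist.
counters = [Counter(dict(p)) for p in letter_pairs]
# Step 4: add them all up, starting from an empty Counter.
total = sum(counters, Counter())

print(letter_pairs[0])  # [('a', 1), ('b', 2), ('c', 3)]
print(counters[1])      # Counter({'c': 5, 'b': 4})
print(dict(total))      # {'a': 7, 'b': 6, 'c': 8, 'd': 7, 'x': 8}
```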

Selecting elements from list of lists that satisfy a condition

I have the following list
a = [['a', 'b', 1], ['c', 'b', 3], ['c','a', 4], ['a', 'd', 2]]
and I'm trying to remove all the elements from the list where the last element is less than 3. So the output should look like
a = [['c', 'b', 3], ['c','a', 4]]
I tried to use filter in the following way
list(filter(lambda x: x == [_, _, 2], a))
Here _ is meant to denote that the element in those places can be anything. I'm used to this kind of syntax from Mathematica, but I have been unable to find something like this in Python (is there even such a symbol in Python?).
I would prefer a solution using map and filter, as those are the most intuitive to me.
You should use x[-1] >= 3 in the lambda to retain all sublists whose last value is greater than or equal to 3:
>>> a = [['a', 'b', 1], ['c', 'b', 3], ['c','a', 4], ['a', 'd', 2]]
>>> list(filter(lambda x: x[-1] >= 3, a))
[['c', 'b', 3], ['c', 'a', 4]]
List comprehension approach:
a_new = [sublist for sublist in a if sublist[-1] >= 3]
a = [['a', 'b', 1], ['c', 'b', 3], ['c','a', 4], ['a', 'd', 2]]
Filter above list with list comprehension like:
b = [x for x in a if x[-1] >= 3]

Merge two or more lists with given order of merging

I start with two lists and a third list that says in what order I should merge them.
For example I have first list equal to [a, b, c] and second list equal to [d, e] and 'merging' list equal to [0, 1, 0, 0, 1].
That means: to make merged list first I need to take element from first list, then second, then first, then first, then second... And I end up with [a, d, b, c, e].
To solve this I just used a for loop and two "pointers", but I was wondering if I can do this task more pythonically... I tried to find some functions that could help me, but with no real result.
You could create iterators from those lists, loop through the ordering list, and call next on one of the iterators:
i1 = iter(['a', 'b', 'c'])
i2 = iter(['d', 'e'])
# Select the iterator to advance: `i2` if `x` == 1, `i1` otherwise
print([next(i2 if x else i1) for x in [0, 1, 0, 0, 1]]) # ['a', 'd', 'b', 'c', 'e']
It's possible to generalize this solution to any number of lists as shown below
def ordered_merge(lists, selector):
    its = [iter(l) for l in lists]
    for i in selector:
        yield next(its[i])
In [4]: list(ordered_merge([[3, 4], [1, 5], [2, 6]], [1, 2, 0, 0, 1, 2]))
Out[4]: [1, 2, 3, 4, 5, 6]
If the ordering list contains strings, floats, or any other objects that can't be used as list indexes, use a dictionary:
def ordered_merge(mapping, selector):
    its = {k: iter(v) for k, v in mapping.items()}
    for i in selector:
        yield next(its[i])
In [6]: mapping = {'A': [3, 4], 'B': [1, 5], 'C': [2, 6]}
In [7]: list(ordered_merge(mapping, ['B', 'C', 'A', 'A', 'B', 'C']))
Out[7]: [1, 2, 3, 4, 5, 6]
Of course, you can use integers as dictionary keys as well.
Alternatively, you could remove elements from the left side of each of the original lists one by one and add them to the resulting list. Quick example:
In [8]: A = ['a', 'b', 'c']
...: B = ['d', 'e']
...: selector = [0, 1, 0, 0, 1]
...:
In [9]: [B.pop(0) if x else A.pop(0) for x in selector]
Out[9]: ['a', 'd', 'b', 'c', 'e']
I would expect the first approach to be more efficient (list.pop(0) is slow).
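If you prefer the pop-from-the-left style, collections.deque avoids that cost: list.pop(0) shifts every remaining element (O(n) per call), while deque.popleft() is O(1). A sketch of the same merge:

```python
from collections import deque

A = deque(['a', 'b', 'c'])
B = deque(['d', 'e'])
selector = [0, 1, 0, 0, 1]

# popleft() removes from the left end in constant time, unlike list.pop(0),
# which has to shift every remaining element of the list.
merged = [B.popleft() if x else A.popleft() for x in selector]
print(merged)  # ['a', 'd', 'b', 'c', 'e']
```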
How about this,
list1 = ['a', 'b', 'c']
list2 = ['d', 'e']
options = [0,1,0,0,1]
list1_iterator = iter(list1)
list2_iterator = iter(list2)
new_list = [next(list2_iterator) if option else next(list1_iterator) for option in options]
print(new_list)
# Output
['a', 'd', 'b', 'c', 'e']

sort list of lists in ascending and descending order with strings

I was looking into this question about sorting a list of lists with multiple criteria (element 0 descending and element 1 ascending) .
L = [['a',1], ['a',2], ['a',3], ['b',1], ['b',2], ['b',3]]
L.sort(key=lambda k: (-k[0], k[1]), reverse=True)
is there a way to make it work when the first or the second element is a string?
It is simple, just use -ord for your strings (this only works for single-character strings, since ord() accepts exactly one character):
L.sort(key=lambda k: (-ord(k[0]), k[1]), reverse=True)
Output:
In [1]: L = [['a',1], ['a',2], ['a',3], ['b',1], ['b',2], ['b',3]]
In [2]: L.sort(key=lambda k: (-ord(k[0]), k[1]), reverse=True)
In [3]: L
Out[3]: [['a', 3], ['a', 2], ['a', 1], ['b', 3], ['b', 2], ['b', 1]]
You can get the exact same result by not using reverse and negating the int values:
L = [['a',1], ['a',2], ['a',3], ['b',1], ['b',2], ['b',3]]
L.sort(key=lambda k: (k[0], -k[1]))
print(L)
[['a', 3], ['a', 2], ['a', 1], ['b', 3], ['b', 2], ['b', 1]]
So for a mixture of strings and ints you don't actually need cmp, ord, or anything other than negating the int values.
If you want to reverse the sort order for the second element when it is a string, you cannot negate it, so a simple key function won't work. However, a cmp function will (Python 2 only; Python 3 removed both the cmp argument to sort and the cmp builtin):
L.sort(cmp=lambda x, y: cmp(x[0], y[0]) or -cmp(x[1], y[1]))
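Since Python 3 has neither the cmp argument nor the cmp builtin, a rough Python 3 equivalent wraps the comparison with functools.cmp_to_key. The cmp helper below is a hand-written stand-in for the old builtin, and the sample list with string second elements is hypothetical:

```python
from functools import cmp_to_key

def cmp(x, y):
    # Stand-in for Python 2's cmp builtin: returns -1, 0 or 1.
    return (x > y) - (x < y)

def compare_rows(x, y):
    # Element 0 ascending; element 1 descending (note the minus sign).
    return cmp(x[0], y[0]) or -cmp(x[1], y[1])

L = [['a', 'x'], ['b', 'y'], ['a', 'z'], ['b', 'x']]
L.sort(key=cmp_to_key(compare_rows))
print(L)  # [['a', 'z'], ['a', 'x'], ['b', 'y'], ['b', 'x']]
```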
