Operations on sub-lists in list - python

I have a list of lists:
a = [[1, 2], [2, 3], [4, 3]]
How can I get the following results in two steps?
b = [[1, 2, 2, 3], [1, 2, 4, 3], [2, 3, 4, 3]]
b = [[1, 2, 3], [1, 2, 4, 3]]
The second step means:
1. If the same values occur next to each other in a sub-list b[i], then one of these values must be deleted.
2. If the same values appear in a given sub-list b[i] but not next to each other, then the entire sub-list b[i] must be deleted.

timegb is right. An elegant solution involves some amount of trickery and deception. I'll try and break down the steps.
1. Find all 2-combinations of your input using itertools.combinations.
2. Flatten the returned combinations with map and chain.
3. For each combination, group by consecutive elements.
4. Keep only those that satisfy your condition by doing a length check.
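from itertools import chain, combinations, groupby
out = []
# flatten every 2-combination of sub-lists into a single list
for r in map(lambda x: list(chain.from_iterable(x)), combinations(a, 2)):
    # collapse runs of equal adjacent values into one value
    j = [i for i, _ in groupby(r)]
    # a duplicate that survives collapsing makes j longer than set(r)
    if len(j) <= len(set(r)):
        out.append(j)
print(out)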
[[1, 2, 3], [1, 2, 4, 3]]
If you need only the first part, just find combinations and flatten:
out = list(map(lambda x: list(chain.from_iterable(x)), combinations(a, 2)))
print(out)
[[1, 2, 2, 3], [1, 2, 4, 3], [2, 3, 4, 3]]
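Both steps can also be wrapped in small helper functions. The following is a minimal sketch, not the answer's exact code: the names concatenate_pairs and collapse_or_drop are made up for illustration, and the length check is written as len(collapsed) == len(set(collapsed)), which should be equivalent to the check above because collapsing adjacent duplicates does not change the set of values.

```python
from itertools import chain, combinations, groupby


def concatenate_pairs(lists):
    """Step 1: concatenate every 2-combination of sub-lists."""
    return [list(chain.from_iterable(pair)) for pair in combinations(lists, 2)]


def collapse_or_drop(merged):
    """Step 2: collapse adjacent duplicates, drop sub-lists that still repeat a value."""
    result = []
    for sub in merged:
        # groupby yields one key per run of equal adjacent values
        collapsed = [key for key, _ in groupby(sub)]
        # any value that is still repeated was not adjacent, so skip the sub-list
        if len(collapsed) == len(set(collapsed)):
            result.append(collapsed)
    return result


a = [[1, 2], [2, 3], [4, 3]]
step1 = concatenate_pairs(a)
step2 = collapse_or_drop(step1)
print(step1)
print(step2)
```

Keeping the two steps separate makes it possible to inspect the intermediate result from the first step before filtering.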

Related

Fastest way to find all elements that maximize / minimize a function in a Python list

Let's use a simple example: say I have a list of lists
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
and I want to find all longest lists, which means all lists that maximize the len function. Of course we can do
def func(x):
    return len(x)
maxlen = func(max(ll, key=lambda x: func(x)))
res = [l for l in ll if func(l) == maxlen]
print(res)
Output
[[1, 2, 3], [2, 3, 4]]
But I wonder if there is a more efficient way to do this, especially when the function is very expensive or the list is very long. Any suggestions?
From a computer science/algorithms perspective, this is a very classical "reduce" problem.
So, in pseudocode (it's honestly very straightforward):
metric() := a mapping from elements to non-negative numbers
winner = []
maxmetric = 0
for element in ll:
    if metric(element) larger than maxmetric:
        winner = [ element ]
        maxmetric = metric(element)
    else if metric(element) equal to maxmetric:
        append element to winner
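The pseudocode could be turned into Python roughly as follows. This is a sketch under two assumptions: the function name all_max is made up here, and the running maximum starts as None rather than 0 so that metrics which can be negative also work. The metric is called exactly once per element.

```python
def all_max(items, metric):
    """Return every item that maximizes metric, calling metric once per item."""
    winners = []
    best = None  # no score seen yet
    for item in items:
        score = metric(item)
        if best is None or score > best:
            # strictly better: start a new list of winners
            winners = [item]
            best = score
        elif score == best:
            # tie with the current best: keep this item as well
            winners.append(item)
    return winners


ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
res = all_max(ll, len)
print(res)
```

Because the loop makes a single pass, it also works on iterators that can only be consumed once.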
Regarding "when the function is very expensive":
Note that you compute func(x) twice for each element, first here
maxlen = func(max(ll, key=lambda x: func(x)))
and then here
res = [l for l in ll if func(l) == maxlen]
so it would be beneficial to store what was already computed. functools.lru_cache allows that easily; just replace
def func(x):
    return len(x)
with
import functools
@functools.lru_cache(maxsize=None)
def func(x):
    return len(x)
However, beware: because of the way the cached results are stored, the argument(s) must be hashable, so in your example you would first need to convert the lists to, e.g., tuples:
ll = [(1, 2), (1, 3), (2, 3), (1, 2, 3), (2, 3, 4)]
See the description in the docs for further discussion.
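To check that the cache really avoids recomputation, the cache_info() method that lru_cache attaches to the decorated function can be inspected. A small sketch, using the tuple version of the example data:

```python
import functools


@functools.lru_cache(maxsize=None)
def func(x):
    # stand-in for an expensive function; x must be hashable
    return len(x)


ll = [(1, 2), (1, 3), (2, 3), (1, 2, 3), (2, 3, 4)]
maxlen = func(max(ll, key=func))
res = [l for l in ll if func(l) == maxlen]

info = func.cache_info()
print(res)
print(info.misses)  # number of times the function body actually ran
```

Each distinct argument is computed only once (a cache miss); every later call with the same argument is answered from the cache.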
Is it not OK to use a dictionary like below? (This is O(n).)
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
from collections import defaultdict
dct = defaultdict(list)
for l in ll:
    dct[len(l)].append(l)
dct[max(dct)]
Output:
[[1, 2, 3], [2, 3, 4]]
>>> dct
defaultdict(list, {2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]})
Or use setdefault, without defaultdict, like below:
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
dct = {}
for l in ll:
    dct.setdefault(len(l), []).append(l)
Output:
>>> dct
{2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]}

How to merge smaller sub-elements into larger parent-elements in a list?

I have a list of lists, but some lists are "sublists" of other lists. What I want to do is remove the sublists from the larger list so that we only have the largest unique sublists.
For example:
>>> some_list = [[1], [1, 2], [1, 2, 3], [1, 4]]
>>> ideal_list = [[1, 2, 3], [1, 4]]
The code that I've written right now is:
new_list = []
for i in range(len(some_list)):
    for j in range(i + 1, len(some_list)):
        count = 0
        for k in some_list[i]:
            if k in some_list[j]:
                count += 1
        if count == len(some_list[i]):
            new_list.append(some_list[j])
The basic algorithm that I had in mind is that we'd check if a list's elements were in the following sublists, and if so then we use the other larger sublist. It doesn't give the desired output (it actually gives [[1, 2], [1, 2, 3], [1, 4], [1, 2, 3]]) and I'm wondering what I could do to achieve what I want.
I don't want to use sets because duplicate elements matter.
Same idea as with sets, but using Counter instead. The sublist check should be a lot more efficient than brute force.
from collections import Counter
new_list = []
counters = []
for arr in sorted(some_list, key=len, reverse=True):
    arr_counter = Counter(arr)
    if any((c & arr_counter) == arr_counter for c in counters):
        continue  # it is a sublist of something else
    new_list.append(arr)
    counters.append(arr_counter)
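The Counter check works because c & arr_counter keeps, for each value, the smaller of the two counts; the result equals arr_counter only when c contains every value at least as many times. A sketch that wraps the loop in a function (the name keep_maximal is made up for illustration) and shows that duplicates are respected:

```python
from collections import Counter


def keep_maximal(some_list):
    """Keep only lists that are not multiset-contained in an already kept list."""
    kept = []
    counters = []
    # longest first, so a containing list is always seen before its sublists
    for arr in sorted(some_list, key=len, reverse=True):
        arr_counter = Counter(arr)
        # intersection takes the minimum count per value
        if any((c & arr_counter) == arr_counter for c in counters):
            continue
        kept.append(arr)
        counters.append(arr_counter)
    return kept


print(keep_maximal([[1], [1, 2], [1, 2, 3], [1, 4]]))
print(keep_maximal([[1, 1], [1, 2]]))
```

In the second call, [1, 1] is kept even though set([1, 1]) is a subset of set([1, 2]), which is the behaviour the question asks for when duplicate elements matter. Note that this treats lists as multisets, so element order is ignored.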
With some inspiration from @mkrieger1's comment, one possible solution would be:
def merge_sublists(some_list):
    new_list = []
    for i in range(len(some_list)):
        true_or_false = []
        for j in range(len(some_list)):
            if some_list[j] == some_list[i]:
                continue
            true_or_false.append(all([x in some_list[j] for x in some_list[i]]))
        if not any(true_or_false):
            new_list.append(some_list[i])
    return new_list
As is stated in the comment, a brute-force solution would be to loop through each element and check if it's a sublist of any other sublist. If it's not, then append it to the new list.
Test cases:
>>> merge_sublists([[1], [1, 2], [1, 2, 3], [1, 4]])
[[1, 2, 3], [1, 4]]
>>> merge_sublists([[1, 2, 3], [4, 5], [3, 4]])
[[1, 2, 3], [4, 5], [3, 4]]
Input:
l = [[1], [1, 2], [1, 2, 3], [1, 4]]
One way here:
l1 = l.copy()
for i in l:
    for j in l:
        if set(i).issubset(set(j)) and i!=j:
            l1.remove(i)
            break
This prints:
print(l1)
[[1, 2, 3], [1, 4]]
EDIT: (Taking care of duplicates as well)
l1 = [list(tupl) for tupl in {tuple(item) for item in l }]
l2 = l1.copy()
for i in l1:
    for j in l1:
        if set(i).issubset(set(j)) and i!=j:
            l2.remove(i)
            break

How to do Math Functions on Lists within a List

I'm very new to python (using python3) and I'm trying to add numbers from one list to another list. The only problem is that the second list is a list of lists. For example:
[[1, 2, 3], [4, 5, 6]]
What I want is to, say, add 1 to each item in the first list and 2 to each item in the second, returning something like this:
[[2, 3, 4], [6, 7, 8]]
I tried this:
original_lst = [[1, 2, 3], [4, 5, 6]]
transposition_lst = [1, 2]
new_lst = [x+y for x,y in zip(original_lst, transposition_lst)]
print(new_lst)
When I do this, I get an error
can only concatenate list (not "int") to list
This leads me to believe that I can't operate in this way on the lists as long as they are nested within another list. I want to do this operation without flattening the nested list. Is there a solution?
One approach using enumerate
Demo:
l = [[1, 2, 3], [4, 5, 6]]
print( [[j+i for j in v] for i,v in enumerate(l, 1)] )
Output:
[[2, 3, 4], [6, 7, 8]]
You can use enumerate:
l = [[1, 2, 3], [4, 5, 6]]
new_l = [[c+i for c in a] for i, a in enumerate(l, 1)]
Output:
[[2, 3, 4], [6, 7, 8]]
Why not use numpy instead?
import numpy as np
mat = np.array([[1, 2, 3], [4, 5, 6]])
mul = np.array([1,2])
m = np.ones(mat.shape)
res = (m.T *mul).T + mat
You were very close with your original method; you just fell one step short.
Small addition
original_lst = [[1, 2, 3], [4, 5, 6]]
transposition_lst = [1, 2]
new_lst = [[xx + y for xx in x] for x, y in zip(original_lst, transposition_lst)]
print(new_lst)
Output
[[2, 3, 4], [6, 7, 8]]
Reasoning
If you print your original zip it is easy to see the issue. Your original zip yielded this:
In:
original_lst = [[1, 2, 3], [4, 5, 6]]
transposition_lst = [1, 2]
for x,y in zip(original_lst, transposition_lst):
    print(x, y)
Output
[1, 2, 3] 1
[4, 5, 6] 2
Now it is easy to see that you are trying to add an integer to a list (hence the error), which Python doesn't support. If they were both integers it would add them, and if they were both lists it would concatenate them.
To fix this, you need one extra step in your code that adds the integer to each value in the list, hence the extra list comprehension in the solution above.
A different approach than numpy that could work even for lists of different lengths is
lst = [[1, 2, 3], [4, 5, 6, 7]]
c = [1, 2]
res = [[l + c[i] for l in lst[i]] for i in range(len(c))]
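The zip-based idea can also be wrapped in a small function that works for rows of different lengths. This is only a sketch: the name add_per_row is made up here, and the length check is an added assumption about what should happen when the number of offsets does not match the number of rows.

```python
def add_per_row(rows, offsets):
    """Add offsets[i] to every value of rows[i], keeping the nesting intact."""
    if len(rows) != len(offsets):
        # assumption: a mismatch is treated as an error instead of being truncated
        raise ValueError("need exactly one offset per row")
    return [[value + offset for value in row] for row, offset in zip(rows, offsets)]


print(add_per_row([[1, 2, 3], [4, 5, 6]], [1, 2]))
print(add_per_row([[1, 2, 3], [4, 5, 6, 7]], [1, 2]))
```

Without the explicit check, zip would silently stop at the shorter of the two inputs.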

find occurrences of elements of a list in a list of list

I have a list [1, 2, 3]
I want to find the number of times the elements of this list appear in a list of lists:
lol = [[1, 2, 4, 5], [2, 3, 1, 2], [1, 2, 3], [3, 2, 6, 7, 1], [1, 4, 2, 6, 3]]
occurrences = 4
What I’m doing currently is the following:
a = [1, 2, 3]
lol = [[1, 2, 4, 5], [2, 3, 1, 2], [1, 2, 3], [3, 2, 6, 7, 1], [1, 4, 2, 6, 3]]
def get_count(a, b):
    a = set(a)
    return sum([a.issubset(x) for x in b])
print(get_count(a, lol))
This method works but is quite slow when I have hundreds of thousands of lists to compare against a list of lists (lol remains static!).
Can we also preserve the "order" of the elements? There can be other elements in between. In that case, occurrences would be 2 for the example above.
Why not try:
testlist = list(lol)  # work on a copy so that lol itself is not modified
occurrences = 0
for i in range(len(testlist)):  # repeat at most len(testlist) times
    if a in testlist:  # a occurs as a whole sub-list
        occurrences += 1  # add 1 to the number of occurrences
        pos = testlist.index(a)
        del testlist[pos]  # delete that instance from the list
Note that this only counts sub-lists that are exactly equal to a, not sub-lists that merely contain its elements.
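A minimal sketch of both counts asked for in the question. The function names are made up for illustration. Since lol stays static, the sub-lists are converted to sets once and reused for every query; the order-preserving variant relies on the fact that x in it consumes an iterator up to and including the first match.

```python
def count_containing(needle, haystack_sets):
    """Count pre-built sets that contain every element of needle."""
    needle = set(needle)
    return sum(needle <= s for s in haystack_sets)


def is_subsequence(needle, haystack):
    """True if needle's elements appear in haystack in order, gaps allowed."""
    it = iter(haystack)
    # each membership test continues where the previous one stopped
    return all(x in it for x in needle)


def count_ordered(needle, lists):
    """Count lists that contain needle as an ordered subsequence."""
    return sum(is_subsequence(needle, lst) for lst in lists)


a = [1, 2, 3]
lol = [[1, 2, 4, 5], [2, 3, 1, 2], [1, 2, 3], [3, 2, 6, 7, 1], [1, 4, 2, 6, 3]]
lol_sets = [set(x) for x in lol]  # built once because lol is static

print(count_containing(a, lol_sets))
print(count_ordered(a, lol))
```

How much the precomputed sets help depends on the data; for very large inputs, an index from element to the positions of the sub-lists containing it may be worth trying instead.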

extracting item with most common probability in python list

I have a list [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and I need [1,2,3,7] as final result (this is kind of reverse engineering). One logic is to check intersections -
while(i<dlistlen):
    j=i+1
    while(j<dlistlen):
        il = dlist1[i]
        jl = dlist1[j]
        tmp = list(set(il) & set(jl))
        print tmp
        #print i,j
        j=j+1
    i=i+1
This gives me the following output:
[1, 2]
[1, 2, 7]
[1, 2, 7]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 7]
[]
It looks like I am close to getting [1,2,3,7] as my final answer, but I can't figure out how. Please note that in the very first list ([[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]) there may be more items, leading to one more final answer besides [1,2,3,7]. But as of now, I need to extract only [1,2,3,7].
Please note that this is not homework; I am creating my own clustering algorithm that fits my needs.
You can use the Counter class to keep track of how often elements appear.
>>> from itertools import chain
>>> from collections import Counter
>>> l = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
>>> #use chain(*l) to flatten the lists into a single list
>>> c = Counter(chain(*l))
>>> print c
Counter({1: 4, 2: 4, 3: 3, 7: 3, 5: 1, 6: 1})
>>> #sort keys in order of descending frequency
>>> sortedValues = sorted(c.keys(), key=lambda x: c[x], reverse=True)
>>> #show the four most common values
>>> print sortedValues[:4]
[1, 2, 3, 7]
>>> #alternatively, show the values that appear in more than 50% of all lists
>>> print [value for value, freq in c.iteritems() if float(freq) / len(l) > 0.50]
[1, 2, 3, 7]
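The session above uses Python 2 syntax (print statements and iteritems). A Python 3 sketch of the same idea might look like the following; the results are sorted before printing because the order in which tied values come out of a Counter can differ between Python versions.

```python
from collections import Counter
from itertools import chain

lists = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]

# count how often each value occurs across all sub-lists
counts = Counter(chain.from_iterable(lists))

# the four most common values, sorted for a stable display order
top_four = sorted(value for value, _ in counts.most_common(4))

# values whose frequency exceeds 50% of the number of sub-lists
majority = sorted(v for v, freq in counts.items() if freq / len(lists) > 0.5)

print(top_four)
print(majority)
```

In Python 3 the / operator already performs true division, so the float() conversion is not needed.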
It looks like you're trying to find the largest intersection of two list elements. This will do that:
from itertools import combinations
# convert all list elements to sets for speed
dlist = [set(x) for x in dlist]
intersections = (x & y for x, y in combinations(dlist, 2))
longest_intersection = max(intersections, key=len)
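Applied to the data from the question, the snippet could be run as follows. This is a sketch that assumes dlist holds the list from the question; the final sorted() call is an addition that turns the resulting set back into an ordered list.

```python
from itertools import combinations

dlist = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]

# convert once so that each pairwise intersection is a cheap set operation
sets = [set(x) for x in dlist]

# lazily intersect every pair and keep the largest result
longest_intersection = max((x & y for x, y in combinations(sets, 2)), key=len)

result = sorted(longest_intersection)
print(result)
```

If several pairs share the same maximal size, max returns the first one it encounters.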
