Segmenting a list of lists in Python


I have a list of lists all of the same length. I would like to segment the first list into contiguous runs of a given value. I would then like to segment the remaining lists to match the segments generated from the first list.
For example:
Given value: 2
Given list of lists: [[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]]
Return: [ [[2,2,2],[2]], [[3,4,5],[9]], [[1,1,1],[1]] ]
The closest I have gotten is to get the indices by:
>>> import itertools
>>> import operator
>>> x = 2
>>> L = [[0,0,2,2,2,1,1,1,2,3],[1,2,3,4,5,6,7,8,9,10],[1,1,1,1,1,1,1,1,1,1]]
>>> I = [[i for i,value in it] for key,it in itertools.groupby(enumerate(L[0]), key=operator.itemgetter(1)) if key == x]
>>> print I
[[2, 3, 4], [8]]
This code was modified from another question on this site.
I would like to find the most efficient way possible, since these lists may be very long.
EDIT:
Maybe if I place the lists one on top of each other it might be clearer:
[[0,0,[2,2,2],1,1,1,[2],3], -> [2,2,2],[2]
[1,2,[3,4,5],6,7,8,[9],10],-> [3,4,5],[9]
[1,1,[1,1,1],1,1,1,[1],1]] -> [1,1,1],[1]

You can use groupby to create a list of groups in the form of a tuple of starting index and length of the group, and use this list to extract the values from each sub-list:
from itertools import groupby
from operator import itemgetter
def match(L, x):
    groups = [(next(g)[0], sum(1 for _ in g) + 1)
              for k, g in groupby(enumerate(L[0]), key=itemgetter(1)) if k == x]
    return [[lst[i: i + length] for i, length in groups] for lst in L]
so that:
match([[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]], 2)
returns:
[[[2, 2, 2], [2]], [[3, 4, 5], [9]], [[1, 1, 1], [1]]]
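To make the intermediate step visible, here is a minimal sketch that separates the two stages: finding the (start, length) pairs in the first list, then slicing every list with them. The names `run_spans` and `segment_like` are hypothetical and only used for illustration; the logic follows the groupby idea described above.

```python
from itertools import groupby
from operator import itemgetter

def run_spans(reference, x):
    """Return a (start, length) pair for each contiguous run of x in reference."""
    spans = []
    for key, group in groupby(enumerate(reference), key=itemgetter(1)):
        if key == x:
            indices = [i for i, _ in group]   # positions covered by this run
            spans.append((indices[0], len(indices)))
    return spans

def segment_like(L, x):
    """Slice every sub-list at the runs of x found in the first sub-list."""
    spans = run_spans(L[0], x)
    return [[lst[start:start + length] for start, length in spans] for lst in L]

L = [[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]]
print(run_spans(L[0], 2))   # [(2, 3), (8, 1)]
print(segment_like(L, 2))   # [[[2, 2, 2], [2]], [[3, 4, 5], [9]], [[1, 1, 1], [1]]]
```

Because the spans are computed once from the first list, each remaining list only costs a handful of slices, which should matter when the lists are long.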

l=[[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]]
temp=l[0]
value=2
dict={}
k=-1
prev=-999
for i in range(0,len(temp)):
    if(temp[i]==value):
        if(prev!=-999 and prev==i-1):
            if(k in dict):
                dict[k].append(i)
            else:
                dict[k]=[i]
        else:
            k+=1
            if(k in dict):
                dict[k].append(i)
            else:
                dict[k]=[i]
        prev=i
output=[]
for i in range(0,len(l)):
    single=l[i]
    final=[]
    for keys in dict: #{0: [2, 3, 4], 1: [8]}
        ans=[]
        desired_indices=dict[keys]
        for j in range(0,len(desired_indices)):
            ans.append(single[desired_indices[j]])
        final.append(ans)
    output.append(final)
print(output) #[[[2, 2, 2], [2]], [[3, 4, 5], [9]], [[1, 1, 1], [1]]]
This is one possible approach: it first builds a dictionary of the indices of contiguous matching elements, then looks up those indices in every list and stores the results in output.
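The same index-tracking idea can be sketched more compactly by recording only where each run starts and stops, so that the other lists can be sliced instead of indexed element by element. `find_runs` and `segment_all` are hypothetical names; this is a sketch under the assumption that all lists have the same length, as stated in the question.

```python
def find_runs(reference, value):
    """Return (start, stop) index pairs, stop exclusive, for each run of value."""
    runs = []
    start = None                      # start index of the run in progress, if any
    for i, item in enumerate(reference):
        if item == value:
            if start is None:
                start = i             # a new run begins here
        elif start is not None:
            runs.append((start, i))   # the run ended just before index i
            start = None
    if start is not None:             # a run that reaches the end of the list
        runs.append((start, len(reference)))
    return runs

def segment_all(lists, value):
    """Slice every list at the runs of value found in the first list."""
    runs = find_runs(lists[0], value)
    return [[lst[a:b] for a, b in runs] for lst in lists]

l = [[0,0,2,2,2,1,1,1,2,3], [1,2,3,4,5,6,7,8,9,10], [1,1,1,1,1,1,1,1,1,1]]
print(find_runs(l[0], 2))    # [(2, 5), (8, 9)]
print(segment_all(l, 2))     # [[[2, 2, 2], [2]], [[3, 4, 5], [9]], [[1, 1, 1], [1]]]
```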

Related

Split lists and tuples in Python

I have a simple question.
I have list, or a tuple, and I want to split it into many lists (or tuples) that contain the same elements.
I'll try to be more clear using an example:
(1,1,2,2,3,3,4) --> (1,1),(2,2),(3,3),(4,)
(1,2,3,3,3,3) --> (1,),(2,),(3,3,3,3)
[2,2,3,3,2,3] --> [2,2],[3,3],[2],[3]
How can I do this? I know that tuples and lists do not have a "split" attribute, so I thought I could turn them into strings first. This is what I tried:
def splitt(l):
    x=str(l)
    for i in range (len(x)-1):
        if x[i]!=x[i+1]:
            x.split()
    return x

You can use groupby.
import itertools as it
[list(grp) if isinstance(t,list) else tuple(grp) for k, grp in it.groupby(t)]
Examples:
>>> t = (1,2,3,3,3,3)
[(1,), (2,), (3, 3, 3, 3)]
>>> t = [2,2,3,3,2,3]
[[2, 2], [3, 3], [2], [3]]
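As a runnable sketch of the comprehension above, the type check can be replaced by calling the input's own type, so lists stay lists and tuples stay tuples. `split_runs` is a hypothetical name, and the sketch assumes the input is a list or a tuple.

```python
from itertools import groupby

def split_runs(seq):
    """Split a list or tuple into runs of equal adjacent items, keeping its type."""
    make = type(seq)                 # list -> list, tuple -> tuple
    return [make(group) for _, group in groupby(seq)]

print(split_runs((1, 1, 2, 2, 3, 3, 4)))   # [(1, 1), (2, 2), (3, 3), (4,)]
print(split_runs([2, 2, 3, 3, 2, 3]))      # [[2, 2], [3, 3], [2], [3]]
```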

You also may try with for-loop:
def group_lt(list_or_tuple):
    result = []
    for x in list_or_tuple:
        if not result or result[-1][0] != x:
            result.append(type(list_or_tuple)([x]))
        else:
            result[-1] += type(list_or_tuple)([x])
    return result
t = (1,1,2,2,3,3,4)
print(group_lt(t)) # [(1,1),(2,2),(3,3),(4,)]
l = [2,2,3,3,2,3]
print(group_lt(l)) # [[2,2],[3,3],[2],[3]]

Try this
from itertools import groupby
input_list = [1, 1, 2, 4, 6, 6, 7]
output = [list(g) for k, g in groupby(input_list)]

unite lists if at least one value matches in python

Let's say I have a list of lists, for example:
[[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]
and if at least one of the values on a list is the same that another one of a different list, i would like to unite the lists so in the example the final result would be:
[[0, 1, 2, 3], [4, 5, 6, 7, 8]]
I really don't care about the order of the values inside the list [0, 1, 2, 3] or [0, 2, 1, 3].
I tried to do it but it doesn't work. So have you got any ideas? Thanks.
Edit(sorry for not posting the code that i tried before):
What i tried to do was the following:
for p in llista:
    for q in p:
        for k in llista:
            if p==k:
                llista.remove(k)
            else:
                for h in k:
                    if p!=k:
                        if q==h:
                            k.remove(h)
                            for t in k:
                                if t not in p:
                                    p.append(t)
llista_final = [x for x in llista if x != []]
Where llista is the list of lists.

I have to admit this is a tricky problem. I'm really curious what this problem represents and/or where you found it...
I initially thought this was just a graph connected components problem, but I wanted to take a shortcut and avoid creating an explicit representation of the graph, running BFS, etc.
The idea of the solution is this: for every sublist, check whether it has an element in common with any other sublist, and if so replace that other sublist with their union.
Not very pythonic, but here it is:
def merge(l):
    l = list(map(tuple, l))
    for i, h in enumerate(l):
        sh = set(h)
        for j, k in enumerate(l):
            if i == j: continue
            sk = set(k)
            if sh & sk: # h and k have some element in common
                l[j] = tuple(sh | sk)
    return list(map(list, set(l)))
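For comparison, the connected-components view mentioned above can be sketched with a small union-find (disjoint-set) structure instead of an explicit graph and BFS. This is a swapped-in technique, not the method of the answer above; `merge_overlapping` is a hypothetical name, and the sketch assumes the items are hashable and mutually orderable (needed only for the final `sorted`).

```python
def merge_overlapping(lists):
    """Unite sub-lists that share at least one element."""
    parent = {}                      # item -> parent item in the union-find forest

    def find(x):
        # Follow parent links to the representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for sub in lists:
        for item in sub:
            find(item)               # register every item, even in 1-element lists
        for item in sub[1:]:
            union(sub[0], item)      # all items of one sub-list belong together

    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [sorted(g) for g in groups.values()]

print(merge_overlapping([[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]))
# [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
```

Each element is touched a small number of times, so this avoids comparing every sub-list against every other sub-list.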

Here is a function that does what you want. I tried to use self-documenting variable names and comments to help you understand how this code works. As far as I can tell, the code is pythonic. I used sets to speed up and simplify some of the operations. The downside of that is that the items in your input list-of-lists must be hashable, but your example uses integers which works perfectly well.
def cliquesfromlistoflists(inputlistoflists):
    """Given a list of lists, return a new list of lists that unites
    the old lists that have at least one element in common.
    """
    listofdisjointsets = []
    for inputlist in inputlistoflists:
        # Update the list of disjoint sets using the current sublist
        inputset = set(inputlist)
        unionofsetsoverlappinginputset = inputset.copy()
        listofdisjointsetsnotoverlappinginputset = []
        for aset in listofdisjointsets:
            # Unite set if overlaps the new input set, else just store it
            if aset.isdisjoint(inputset):
                listofdisjointsetsnotoverlappinginputset.append(aset)
            else:
                unionofsetsoverlappinginputset.update(aset)
        listofdisjointsets = (listofdisjointsetsnotoverlappinginputset
                              + [unionofsetsoverlappinginputset])
    # Return the information in a list-of-lists format
    return [list(aset) for aset in listofdisjointsets]
print(cliquesfromlistoflists([[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]))
# printout is [[0, 1, 2, 3], [4, 5, 6, 7, 8]]

This solution modifies the generic breadth-first search to gradually diminish the initial deque and update a result list with either a combination should a match be found or a list addition if no grouping is discovered:
from collections import deque
d = deque([[0,2] , [0,1] , [2,3] , [4,5,7,8] , [6,4]])
result = [d.popleft()]
while d:
    v = d.popleft()
    result = [list(set(i+v)) if any(c in i for c in v) else i for i in result] if any(any(c in i for c in v) for i in result) else result + [v]
Output:
[[0, 1, 2, 3], [8, 4, 5, 6, 7]]

Filtering the duplicates in subset sum combinations

Given an array, I've found all the subsets that add up to a target sum, because I want the largest such subset.
For instance, the array [1, 2, 2, 2] for the target sum of "4" returns [[2, 2], [2, 2], [2, 2]].
subsets = []
def subset_sum(numbers, target, partial=[]):
    s = sum(partial)
    if s == target:
        subsets.append(partial)
    if s >= target:
        return
    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i + 1:]
        subset_sum(remaining, target, partial + [n])
subsets.sort()
subsets.reverse()
How can I remove subsets that already appear once in the subsets list?
In the example above, how can I have only one [2,2]?
And then, how can I show the values of the initial array that are not in this final list?
In the example above, [1].

You can use itertools.groupby to remove duplicate lists:
>>> import itertools
>>> lst = [[2, 2], [2, 2], [2, 2]]
>>> lst.sort()
>>> new_lst = list(k for k,_ in itertools.groupby(lst))
>>> print(new_lst)
[[2, 2]]
Then simply flatten new_lst with itertools.chain.from_iterable and check if any of the elements from the initial list do not exist in this flattened list:
>>> initial = [1,2,2,2]
>>> print([x for x in initial if x not in itertools.chain.from_iterable(new_lst)])
[1]
Note: You can probably make your subset_sum() return a list of non duplicate items also, but the above should also work fine.
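An alternative sketch for the same two steps, without relying on the list being sorted before groupby: convert each sub-list to a tuple so it can go into a set, then test membership against a set of used values. `unique_subsets` and `unused_values` are hypothetical names, and the sketch assumes the values are hashable and orderable.

```python
def unique_subsets(subsets):
    """Drop repeated sub-lists (element order inside a sub-list is significant)."""
    return [list(t) for t in sorted(set(map(tuple, subsets)))]

def unused_values(initial, subsets):
    """Values of initial that appear in none of the subsets."""
    used = {x for sub in subsets for x in sub}   # built once, O(1) lookups
    return [x for x in initial if x not in used]

subsets = [[2, 2], [2, 2], [2, 2]]
print(unique_subsets(subsets))                 # [[2, 2]]
print(unused_values([1, 2, 2, 2], subsets))    # [1]
```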

This is not a direct answer to your question, but a better algorithm. If you're only looking for one example of a list of maximal length which satisfies your sum criterion, you should be looking at longer lists first. This code uses itertools for the combinatorial bits and will stop when the longest list is found.
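import itertools
numbers = [1, 2, 2, 2]
target = 5
for size in reversed(range(1, 1 + len(numbers))):
    for c in itertools.combinations(numbers, size):
        if sum(c) == target:
            break  # found the longest subset; c holds it
    else:
        continue  # no match at this size, try the next smaller size
    break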
c now contains the longest subset as a tuple (1, 2, 2)
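The for/else/break pattern can be hard to follow, so here is a sketch of the same longest-first search wrapped in a function, where an early `return` replaces the nested breaks. `longest_subset` is a hypothetical name; returning `None` when nothing matches is an assumption, since the original leaves that case open.

```python
from itertools import combinations

def longest_subset(numbers, target):
    """Return the first longest combination summing to target, or None."""
    for size in range(len(numbers), 0, -1):      # try longer combinations first
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return combo
    return None                                   # no combination matches

print(longest_subset([1, 2, 2, 2], 5))   # (1, 2, 2)
print(longest_subset([1, 2, 2, 2], 4))   # (2, 2)
```

The search is still exponential in the worst case; it only stops early when a long match exists.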

You can do something like this:
Data is :
data=[1, 2, 2,2]
import itertools
your_target=4
One line solution:
print(set([k for k in itertools.combinations(data,r=2) if sum(k)==your_target]))
output:
{(2, 2)}
or better if you use a function:
def targeted_sum(data,your_target):
    result=set([k for k in itertools.combinations(data,r=2) if sum(k)==your_target])
    return result
print(targeted_sum(data,4))

Separating same numbers from the list and making the list of such lists [duplicate]

From this list:
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
I'm trying to create:
L = [[1],[2,2],[3,3,3],[4,4,4,4],[5,5,5,5,5]]
Any values which are found to be the same are grouped into their own sublist.
Here is my attempt so far, I'm thinking I should use a while loop?
global n
n = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5] #Sorted list
l = [] #Empty list to append values to
def compare(val):
    """ This function receives index values
    from the n list (n[0] etc) """
    global valin
    valin = val
    global count
    count = 0
    for i in xrange(len(n)):
        if valin == n[count]: # If the input value i.e. n[x] == n[iteration]
            temp = valin, n[count]
            l.append(temp) #append the values to a new list
            count +=1
        else:
            count +=1
for x in xrange (len(n)):
    compare(n[x]) #pass the n[x] to compare function

Use itertools.groupby:
from itertools import groupby
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
print([list(j) for i, j in groupby(N)])
Output:
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
Side note: avoid global variables when you don't need them.

Someone pointed out that for N=[1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 1] groupby returns [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [1]]
In other words, when the list is not sorted, equal values that are not adjacent end up in separate groups.
Counter handles that case:
from collections import Counter
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
C = Counter(N)
print([[k] * v for k, v in C.items()])
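Another way to group equal values wherever they occur (a sketch, not taken from the answer above) is to sort a copy of the list first and then apply groupby. `group_equal` is a hypothetical name, and the sketch assumes the values can be ordered; unlike the Counter version, the groups come out in sorted order rather than first-seen order.

```python
from itertools import groupby

def group_equal(values):
    """Group equal values together regardless of their original positions."""
    return [list(g) for _, g in groupby(sorted(values))]

print(group_equal([1, 2, 2, 3, 3, 3, 1]))   # [[1, 1], [2, 2], [3, 3, 3]]
```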

You can use itertools.groupby along with a list comprehension
>>> l = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
>>> [list(v) for k,v in itertools.groupby(l)]
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
This can be assigned to the variable L as in
L = [list(v) for k,v in itertools.groupby(l)]

You're overcomplicating this.
What you want to do is: for each value, if it's the same as the last value, just append it to the list of last values; otherwise, create a new list. You can translate that English directly to Python:
new_list = []
for value in old_list:
    if new_list and new_list[-1][0] == value:
        new_list[-1].append(value)
    else:
        new_list.append([value])
There are even simpler ways to do this if you're willing to get a bit more abstract, e.g., by using the grouping functions in itertools. But this should be easy to understand.
If you really need to do this with a while loop, you can translate any for loop into a while loop like this:
for value in iterable:
    do_stuff(value)
# is equivalent to:
iterator = iter(iterable)
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break
    do_stuff(value)
Or, if you know the iterable is a sequence, you can use a slightly simpler while loop:
index = 0
while index < len(sequence):
    value = sequence[index]
    do_stuff(value)
    index += 1
But both of these make your code less readable, less Pythonic, more complicated, less efficient, easier to get wrong, etc.

You can do that using numpy too:
import numpy as np
N = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
counter = np.arange(1, len(N))  # np.alen has been removed from recent NumPy releases; len() is equivalent here
L = np.split(N, counter[N[1:]!=N[:-1]])
The advantage of this method is when you have another list which is related to N and you want to split it in the same way.

Another slightly different solution that doesn't rely on itertools:
#!/usr/bin/env python
def group(items):
    """
    groups a sorted list of integers into sublists based on the integer key
    """
    if len(items) == 0:
        return []
    grouped_items = []
    prev_item, rest_items = items[0], items[1:]
    subgroup = [prev_item]
    for item in rest_items:
        if item != prev_item:
            grouped_items.append(subgroup)
            subgroup = []
        subgroup.append(item)
        prev_item = item
    grouped_items.append(subgroup)
    return grouped_items
print(group([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]))
# [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]

Find multiple occuring string in array and output index

I have an array filled with e-mail addresses which change constantly, e.g.
mailAddressList = ['chip#plastroltech.com','spammer#example.test','webdude#plastroltech.com','spammer#example.test','spammer#example.test','support#plastroltech.com']
How do I find multiple occurrences of the same string in the array and output their indexes?

Just group the indexes by email and print only those items where the length of the index list is greater than 1:
from collections import defaultdict
mailAddressList = ['chip#plastroltech.com',
                   'spammer#example.test',
                   'webdude#plastroltech.com',
                   'spammer#example.test',
                   'spammer#example.test',
                   'support#plastroltech.com'
                   ]
index = defaultdict(list)
for i, email in enumerate(mailAddressList):
    index[email].append(i)
print([(email, positions) for email, positions in index.items()
       if len(positions) > 1])
# [('spammer#example.test', [1, 3, 4])]
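The same grouping can be packaged as a reusable function, sketched below. `duplicate_positions` is a hypothetical name; it returns a plain dict so the result can be looked up by address.

```python
from collections import defaultdict

def duplicate_positions(items):
    """Map each item that occurs more than once to the list of its indexes."""
    positions = defaultdict(list)
    for i, item in enumerate(items):
        positions[item].append(i)            # single pass over the input
    return {item: idx for item, idx in positions.items() if len(idx) > 1}

mails = ['chip#plastroltech.com', 'spammer#example.test',
         'webdude#plastroltech.com', 'spammer#example.test',
         'spammer#example.test', 'support#plastroltech.com']
print(duplicate_positions(mails))   # {'spammer#example.test': [1, 3, 4]}
```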

Try this:
query = 'spammer#example.test'
indexes = [i for i, x in enumerate(mailAddressList) if x == query]
Output:
[1, 3, 4]

In [7]: import collections
In [8]: q=collections.Counter(mailAddressList).most_common()
In [9]: indexes = [i for i, x in enumerate(mailAddressList) if x == q[0][0]]
In [10]: indexes
Out[10]: [1, 3, 4]

Note: the solutions submitted before are more Pythonic than mine, but in my opinion the lines below are easier to understand. I simply create a dictionary with the mail addresses as keys and lists of indexes as values.
First, declare an empty dictionary.
>>> dct = {}
Then iterate over the mail addresses (m) and their indexes (i) in mailAddressList and add them to the dictionary.
>>> for i, m in enumerate(mailAddressList):
...     if m not in dct.keys():
...         dct[m]=[i]
...     else:
...         dct[m].append(i)
...
Now, dct looks like this.
>>> dct
{'support#plastroltech.com': [5], 'webdude#plastroltech.com': [2],
'chip#plastroltech.com': [0], 'spammer#example.test': [1, 3, 4]}
there are many ways to grab the [1,3,4]. one of them (also not so pythonic :) )
>>> [i for i in dct.values() if len(i)>1][0]
[1, 3, 4]
or this
>>> [i for i in dct.items() if len(i[1])>1][0] #you can add [1] to get [1,3,4]
('spammer#example.test', [1, 3, 4])

Here's a dictionary comprehension solution:
result = { i: [ k[0] for k in list(enumerate(mailAddressList)) if k[1] == i ] for j, i in list(enumerate(mailAddressList)) }
# Gives you: {'webdude#plastroltech.com': [2], 'support#plastroltech.com': [5], 'spammer#example.test': [1, 3, 4], 'chip#plastroltech.com': [0]}
It's not ordered, of course, since it's a hash table. If you want to order it, you can use the OrderedDict collection. For instance, like so:
from collections import OrderedDict
final = OrderedDict(sorted(result.items(), key=lambda t: t[0]))
# Gives you: OrderedDict([('chip#plastroltech.com', [0]), ('spammer#example.test', [1, 3, 4]), ('support#plastroltech.com', [5]), ('webdude#plastroltech.com', [2])])
This discussion is less relevant, but it might also prove useful to you.

mailAddressList = ["chip#plastroltech.com","spammer#example.test","webdude#plastroltech.com","spammer#example.test","spammer#example.test","support#plastroltech.com"]
print([index for index, address in enumerate(mailAddressList) if mailAddressList.count(address) > 1])
prints [1, 3, 4], the indices of the addresses occurring more than once in the list.
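Calling `list.count` inside the comprehension rescans the whole list for every element, which is quadratic in the list length. A sketch that counts once with `collections.Counter` and gives the same indices is shown below; `indexes_of_repeats` is a hypothetical name.

```python
from collections import Counter

def indexes_of_repeats(items):
    """Indexes of every item whose value occurs more than once in items."""
    counts = Counter(items)          # one pass to count every item
    return [i for i, item in enumerate(items) if counts[item] > 1]

mails = ["chip#plastroltech.com", "spammer#example.test",
         "webdude#plastroltech.com", "spammer#example.test",
         "spammer#example.test", "support#plastroltech.com"]
print(indexes_of_repeats(mails))   # [1, 3, 4]
```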
