I am creating a list of lists and want to prevent dupes. For example, I have:
mainlist = [[a,b],[c,d],[a,d]]
The next item (list) to be added is [b,a], which is considered a duplicate of [a,b].
UPDATE
mainlist = [[a,b],[c,d],[a,d]]
swap = [b,a]
for item in mainlist:
    if set(item) & set(swap):
        print "match was found", item
    else:
        mainlist.append(swap)
Any suggestions as to how I can test whether the next item to be added is already in the list?
Here's an approach using frozensets within a set to check for duplicates. It's a bit ugly since I'm invoking a function that works with global variables.
def add_to_mainlist(new_list):
    key = frozenset(new_list)
    if key not in dups:
        mainlist.append(new_list)
        dups.add(key)  # remember it so later duplicates are caught too
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
dups = set()
for l in mainlist:
    dups.add(frozenset(l))
print("Before:", mainlist)
add_to_mainlist(['a', 'b'])
print("After:", mainlist)
This outputs:
Before: [['a', 'b'], ['c', 'd'], ['a', 'd']]
After: [['a', 'b'], ['c', 'd'], ['a', 'd']]
Showing that the new list was indeed not added to the original.
Here's a cleaner version that calculates the existing set on the fly inside a function that does everything locally:
def add_to_mainlist(mainlist, new_list):
    dups = set()
    for l in mainlist:
        dups.add(frozenset(l))
    if frozenset(new_list) not in dups:
        mainlist.append(new_list)
    return mainlist
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
print("Before:", mainlist)
mainlist = add_to_mainlist(mainlist, ['a', 'b']) # the assignment isn't needed, but done anyway :-)
print("After:", mainlist)
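Rebuilding dups on every call costs a full pass over the list. A minimal sketch of one way around that, assuming you are free to wrap the list in a small class (the name UniquePairs and its attributes are made up for illustration), keeps the set of frozensets in step with the list:

```python
class UniquePairs:
    """List of lists that ignores element order when checking for duplicates."""

    def __init__(self, initial=()):
        self.items = []      # the lists, in insertion order
        self._seen = set()   # one frozenset per stored list
        for item in initial:
            self.add(item)

    def add(self, new_list):
        """Append new_list unless an order-insensitive duplicate is stored.

        Returns True if the list was added, False otherwise.
        """
        key = frozenset(new_list)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.items.append(new_list)
        return True


pairs = UniquePairs([['a', 'b'], ['c', 'd'], ['a', 'd']])
added_dupe = pairs.add(['b', 'a'])   # duplicate of ['a', 'b']
added_new = pairs.add(['b', 'c'])    # genuinely new
```

Each add is then a single set lookup rather than a rebuild, at the cost of storing every key twice.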
Why doesn't your existing code work?
This is what you're doing:
...
for item in mainlist:
    if set(item) & set(swap):
        print "match was found", item
    else:
        mainlist.append(swap)
You're intersecting two sets and checking the truthiness of the result. That is fine when the sets share nothing, but if even one element is common (for example, ['a', 'b'] and ['b', 'd']), you'd still declare a match, which is wrong.
Ideally you'd check the length of the resulting set and make sure it is equal to 2:
dups = False
for item in mainlist:
    if len(set(item) & set(swap)) == 2:
        dups = True
        break
if not dups:
    mainlist.append(swap)
You'd also ideally want a flag to ensure that you did not find duplicates. Your previous code would add without checking all items first.
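To see why the truthiness test over-matches, compare it with a full set comparison on one overlapping pair. This is only an illustrative sketch: set(item) == set(swap) is swapped in here as an alternative to the length check, and it also copes with inner lists that are not exactly two items long.

```python
item = ['a', 'b']
partial = ['b', 'd']   # shares only 'b' with item
swap = ['b', 'a']      # same elements as item, different order

# A non-empty intersection is truthy even though the lists differ.
overlap_partial = bool(set(item) & set(partial))

# Set equality only succeeds when both lists hold the same elements.
equal_partial = set(item) == set(partial)
equal_swap = set(item) == set(swap)
```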
If the order of your inner lists doesn't matter, then this can trivially be accomplished using frozenset()s:
>>> mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
>>> mainlist = [frozenset(sublist) for sublist in mainlist]
>>>
>>> def add_to_list(lst, sublist):
...     if frozenset(sublist) not in lst:
...         lst.append(frozenset(sublist))
...
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>>
If the order does matter, you can either do what @Coldspeed suggested (construct a set() from your list, construct a frozenset() from the list to be added, and test for membership) or you can use all() and sorted() to test whether the list to be added is equivalent to any of the other lists:
>>> def add_to_list(lst, sublist):
...     for l in lst:
...         if all(a == b for a, b in zip(sorted(sublist), sorted(l))):
...             return
...     lst.append(sublist)
...
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>>
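One caveat with the set-based checks is that they ignore repeated elements, so ['a', 'a', 'b'] and ['a', 'b'] count as the same. If that matters, a sketch using collections.Counter (the helper name add_if_new is hypothetical) compares the lists as multisets instead. Unlike sorted(), it only needs the elements to be hashable, not orderable.

```python
from collections import Counter

def add_if_new(lst, sublist):
    """Append sublist unless lst already holds the same elements in any order."""
    target = Counter(sublist)
    if any(Counter(existing) == target for existing in lst):
        return False
    lst.append(sublist)
    return True

mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
add_if_new(mainlist, ['b', 'a'])       # rejected: same elements as ['a', 'b']
add_if_new(mainlist, ['a', 'a', 'b'])  # accepted: 'a' occurs twice here
```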
I'm looking to create a positional_index function that takes in two or more lists as argument and should return the doc_id and the position of the word that occurs in various lists.
ex:
index = create_positional_index([['a', 'b', 'a'], ['a', 'c']])
index['a']
[[0, 0, 2], [1, 0]]
index['b']
[[0, 1]]
index['c']
[[1, 1]]
Here the first '0' represents the doc_id, followed by the positions at which 'a' appears in document 0. Since 'a' appears in both documents, we get two lists. 'b' appears only in document 0 and 'c' appears only in document 1.
Can anyone help me write this function?
I would suggest using dicts for doc_id.
EDIT: Changed function to OPs output format.
class create_positional_index():
    def __init__(self, lst):
        self.lst = lst
    def __getitem__(self, elm):
        return [[doc_id] + [pos for pos, key in enumerate(sub_lst) if key==elm] for doc_id, sub_lst in enumerate(self.lst) if elm in sub_lst]
index = create_positional_index([['a', 'b', 'a'], ['a', 'c']])
print(index['a'])
print(index['b'])
print(index['c'])
def create_positional_index2(lst, elm):
    return [[doc_id] + [pos for pos, key in enumerate(sub_lst) if key==elm] for doc_id, sub_lst in enumerate(lst) if elm in sub_lst]
print(create_positional_index2([['a', 'b', 'a'], ['a', 'c']], 'a'))
print(create_positional_index2([['a', 'b', 'a'], ['a', 'c']], 'b'))
print(create_positional_index2([['a', 'b', 'a'], ['a', 'c']], 'c'))
You can use the following function:
>>> from itertools import chain
>>> sample_list = [['a', 'b', 'a'], ['a', 'c']]
>>> def find_index(s_list):
...     for elem in set(chain(*s_list)):
...         yield {elem:[[i]+[t for t,k in enumerate(j) if k==elem] for i,j in enumerate(s_list)]}
...
>>> list(find_index(sample_list))
[{'a': [[0, 0, 2], [1, 0]]}, {'c': [[0], [1, 1]]}, {'b': [[0, 1], [1]]}]
All you need here is enumerate inside two list comprehensions. Note that set(chain(*s_list)) creates a set of the unique elements across all your sublists.
I found it fairly difficult to understand your question. However, in spite of that, and after looking at the other answers along with your comments, I think the following would be a good way to accomplish your goal. It defines a subclass of dict to hold the index, so you can also use regular dictionary methods such as keys(), items(), or update() on any instances created if you want.
from itertools import chain
class PositionalIndex(dict):
    def __init__(self, *lists):
        self.update(
            (doc_id, [[i]+[j for j, k in enumerate(sublist) if k == doc_id]
                      for i, sublist in enumerate(lists)])
            for doc_id in set(chain.from_iterable(lists)))
index = PositionalIndex(['a', 'b', 'a'], ['a', 'c'])
for doc_id in sorted(index):
    print('index[{!r}] --> {}'.format(doc_id, index[doc_id]))
Output:
index['a'] --> [[0, 0, 2], [1, 0]]
index['b'] --> [[0, 1], [1]]
index['c'] --> [[0], [1, 1]]
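The outputs above include a bare [doc_id] entry for documents that do not contain the word. If you want exactly the format from the question, a single pass with collections.defaultdict is one option. This is a sketch rather than a drop-in replacement, and it relies on dicts preserving insertion order (Python 3.7+) to keep the documents in ascending doc_id order.

```python
from collections import defaultdict

def create_positional_index(docs):
    """Map each word to [[doc_id, pos, pos, ...], ...] for docs containing it."""
    by_word = defaultdict(dict)   # word -> {doc_id: [positions]}
    for doc_id, doc in enumerate(docs):
        for pos, word in enumerate(doc):
            by_word[word].setdefault(doc_id, []).append(pos)
    # Flatten each {doc_id: positions} mapping into the requested list form.
    return {word: [[doc_id] + positions for doc_id, positions in by_doc.items()]
            for word, by_doc in by_word.items()}

index = create_positional_index([['a', 'b', 'a'], ['a', 'c']])
```

Because each document is scanned once, this avoids re-walking every sublist for every distinct word.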
How can I reorganize sublists and exclude certain items from sublists to create a new list of sublists?
By reorganize I mean that I want to change the order of the items within each sublist, in the same way for every sublist. For example, moving every element at index 0 to index 1, and moving every element at index 2 to index 0, across every sublist. At the same time, I don't want to include index 1 of the original sublists.
Original_List = [['a','b','c'],['a','b','c'],['a','b','c']]
Desired_List = [['c','a'],['c','a'],['c','a']]
I currently have this function, which rearranges and pulls out different indexes from a sublist.
def Function(li):
    return map(lambda x: (x[2] + "|" + x[0]).split("|"),li)
However, there are situations in which the sublists are much longer and there are more indexes that I want to pull out.
Rather than making this same function for 3 or 4 indexes like this for example:
def Function(li):
    return map(lambda x: (x[2] + "|" + x[1] + "|" + x[0]).split("|"),li)
I'd like to use the *args, so that I can specify different amounts of indexes of the sublists to pull out. This is what I have so far, but I get a TypeError.
def Function(self,li,*args):
    return map(lambda x: ([int(arg) + "|" for arg in args]).split("|"))
I get a TypeError, which I can understand but can't get around:
TypeError: string indices must be integers, not str
Perhaps there is a better and faster method entirely to rearrange sublists and exclude certain items within those sublists?
Also, it would be amazing if the function could deal with sub-sub-lists like this.
Original_List = [['a','b','c',['1','2','3']],['a','b','c',['1','2','3']],['a','b','c',['1','2','3']]]
Inputs that I'd like to achieve this:
[2] for c
[0] for a
[3][1] for '2'
Desired_List = [['c','a','2'],['c','a','2'],['c','a','2']]
I think what you are describing is this:
def sublist_indices(lst, *args):
    return [[l[i] for i in args] for l in lst]
>>> sublist_indices([[1, 2, 3], [4, 5, 6]], 2, 0)
[[3, 1], [6, 4]]
If your sublists and sub-sublists contain all iterable items (e.g. strings, lists), you can use itertools.chain.from_iterable to flatten the sub-sublists, and then index in:
from itertools import chain
def sublists(lst, *args):
    return [[list(chain.from_iterable(l))[i] for i in args] for l in lst]
e.g.
>>> lst = [['a', 'b', 'c', ['1', '2', '3']],
...        ['a', 'b', 'c', ['1', '2', '3']],
...        ['a', 'b', 'c', ['1', '2', '3']]]
>>> sublists(lst, 2, 0, 4)
[['c', 'a', '2'], ['c', 'a', '2'], ['c', 'a', '2']]
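Flattening changes the index numbering: the '2' becomes index 4 rather than [3][1]. If you would rather address nested items the way the question writes them, one option is to accept each selector as either an int or a tuple path. The function name pick and the tuple convention are assumptions made for this sketch.

```python
from functools import reduce
from operator import getitem

def pick(rows, *selectors):
    """Select items from each row; a selector is an int or a tuple path like (3, 1)."""
    def resolve(row, selector):
        path = selector if isinstance(selector, tuple) else (selector,)
        # Walk the path one index at a time, e.g. row[3][1] for (3, 1).
        return reduce(getitem, path, row)
    return [[resolve(row, s) for s in selectors] for row in rows]

rows = [['a', 'b', 'c', ['1', '2', '3']]] * 3
result = pick(rows, 2, 0, (3, 1))
```

This leaves the rows untouched and works for deeper nesting too, since the path can have any length.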
original = [['a','b','c'],['a','b','c'],['a','b','c']]
desired = [['c','a'],['c','a'],['c','a']]
def filter_indices(xs, indices):
    return [[x[i] for i in indices if i < len(x)] for x in xs]
filter_indices(original, [2, 0])
# [['c', 'a'], ['c', 'a'], ['c', 'a']]
filter_indices(original, [2, 1, 0])
# [['c', 'b', 'a'], ['c', 'b', 'a'], ['c', 'b', 'a']]
I'm not sure what exactly you mean by "reorganize", but this nested list comprehension will take in a list of lists li and return a new list which contains the lists in li, but with the indices in args excluded.
def exclude_indices(li, *args):
    return [[subli[i] for i in range(len(subli)) if i not in args] for subli in li]
I have two lists, the first of which is guaranteed to contain exactly one more item than the second. I would like to know the most Pythonic way to create a new list whose even-index values come from the first list and whose odd-index values come from the second list.
# example inputs
list1 = ['f', 'o', 'o']
list2 = ['hello', 'world']
# desired output
['f', 'hello', 'o', 'world', 'o']
This works, but isn't pretty:
list3 = []
while True:
    try:
        list3.append(list1.pop(0))
        list3.append(list2.pop(0))
    except IndexError:
        break
How else can this be achieved? What's the most Pythonic approach?
If you need to handle lists of mismatched length (e.g. the second list is longer, or the first has more than one element more than the second), some solutions here will work while others will require adjustment. For more specific answers, see How to interleave two lists of different length? to leave the excess elements at the end, or How to elegantly interleave two lists of uneven length? to try to intersperse elements evenly, or Insert element in Python list after every nth element for the case where a specific number of elements should come before each "added" element.
Here's one way to do it by slicing:
>>> list1 = ['f', 'o', 'o']
>>> list2 = ['hello', 'world']
>>> result = [None]*(len(list1)+len(list2))
>>> result[::2] = list1
>>> result[1::2] = list2
>>> result
['f', 'hello', 'o', 'world', 'o']
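Extended-slice assignment needs the right-hand side to have exactly as many items as the slice, so this only works when the first list is the same length as the second or one item longer. A small wrapper (the name interleave_by_slices is made up here) makes that precondition explicit:

```python
def interleave_by_slices(first, second):
    """Interleave two lists; first must be as long as second or one item longer."""
    if len(first) - len(second) not in (0, 1):
        raise ValueError("first must be the same length as second, or one longer")
    result = [None] * (len(first) + len(second))
    result[::2] = first    # even positions
    result[1::2] = second  # odd positions
    return result

merged = interleave_by_slices(['f', 'o', 'o'], ['hello', 'world'])
```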
There's a recipe for this in the itertools documentation (note: for Python 3):
from itertools import cycle, islice
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # Remove the iterator we just exhausted from the cycle.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))
import itertools
# Compare against None explicitly so falsy items such as 0 or '' are kept
print([x for x in itertools.chain.from_iterable(itertools.zip_longest(list1,list2)) if x is not None])
I think this is the most pythonic way of doing it.
In Python 2, this should do what you want:
>>> iters = [iter(list1), iter(list2)]
>>> print list(it.next() for it in itertools.cycle(iters))
['f', 'hello', 'o', 'world', 'o']
Without itertools and assuming l1 is 1 item longer than l2:
>>> sum(zip(l1, l2+[0]), ())[:-1]
('f', 'hello', 'o', 'world', 'o')
In python 2, using itertools and assuming that lists don't contain None:
>>> filter(None, sum(itertools.izip_longest(l1, l2), ()))
('f', 'hello', 'o', 'world', 'o')
If both lists have equal length, you can do:
[x for y in zip(list1, list2) for x in y]
As the first list has one more element, you can add it post hoc:
[x for y in zip(list1, list2) for x in y] + [list1[-1]]
Edit: To illustrate what is happening in that first list comprehension, this is how you would spell it out as a nested for loop:
result = []
for y in zip(list1, list2): # y is a 2-tuple, containing one element from each list
    for x in y: # iterate over the 2-tuple
        result.append(x) # append each element individually
I know the question asks about two lists with one having one item more than the other, but I figured I would put this here for others who may find this question.
Here is Duncan's solution adapted to work with two lists of different sizes.
list1 = ['f', 'o', 'o', 'b', 'a', 'r']
list2 = ['hello', 'world']
num = min(len(list1), len(list2))
result = [None]*(num*2)
result[::2] = list1[:num]
result[1::2] = list2[:num]
result.extend(list1[num:])
result.extend(list2[num:])
result
This outputs:
['f', 'hello', 'o', 'world', 'o', 'b', 'a', 'r']
Here's a one liner that does it:
list3 = [ item for pair in zip(list1, list2 + [0]) for item in pair][:-1]
Here's a one liner using list comprehensions, w/o other libraries:
list3 = [sub[i] for i in range(len(list2)) for sub in [list1, list2]] + [list1[-1]]
Here is another approach, if you allow alteration of your initial list1 by side effect:
[list1.insert((i+1)*2-1, list2[i]) for i in range(len(list2))]
This one is based on Carlos Valiente's contribution above, with an option to alternate groups of multiple items and make sure that all items are present in the output:
A=["a","b","c","d"]
B=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
def cyclemix(xs, ys, n=1):
    for p in range(0,int((len(ys)+len(xs))/n)):
        for g in range(0,min(len(ys),n)):
            yield ys[0]
            ys.append(ys.pop(0))
        for g in range(0,min(len(xs),n)):
            yield xs[0]
            xs.append(xs.pop(0))
print [x for x in cyclemix(A, B, 3)]
This will interlace lists A and B by groups of 3 values each:
['a', 'b', 'c', 1, 2, 3, 'd', 'a', 'b', 4, 5, 6, 'c', 'd', 'a', 7, 8, 9, 'b', 'c', 'd', 10, 11, 12, 'a', 'b', 'c', 13, 14, 15]
Might be a bit late, but here is yet another Python one-liner. It works when the two lists have equal or unequal size. One thing worth noting is that it will modify a and b. If that's an issue, you need to use one of the other solutions.
a = ['f', 'o', 'o']
b = ['hello', 'world']
sum([[a.pop(0), b.pop(0)] for i in range(min(len(a), len(b)))],[])+a+b
['f', 'hello', 'o', 'world', 'o']
from itertools import chain
list(chain(*zip('abc', 'def'))) # Note: this only works for lists of equal length
['a', 'd', 'b', 'e', 'c', 'f']
itertools.zip_longest returns an iterator of tuple pairs with any missing elements in one list replaced with fillvalue=None (passing fillvalue=object lets you use None as a value). If you flatten these pairs, then filter fillvalue in a list comprehension, this gives:
>>> from itertools import zip_longest
>>> def merge(a, b):
...     return [
...         x for y in zip_longest(a, b, fillvalue=object)
...         for x in y if x is not object
...     ]
...
>>> merge("abc", "defgh")
['a', 'd', 'b', 'e', 'c', 'f', 'g', 'h']
>>> merge([0, 1, 2], [4])
[0, 4, 1, 2]
>>> merge([0, 1, 2], [4, 5, 6, 7, 8])
[0, 4, 1, 5, 2, 6, 7, 8]
Generalized to arbitrary iterables:
>>> def merge(*its):
...     return [
...         x for y in zip_longest(*its, fillvalue=object)
...         for x in y if x is not object
...     ]
...
>>> merge("abc", "lmn1234", "xyz9", [None])
['a', 'l', 'x', None, 'b', 'm', 'y', 'c', 'n', 'z', '1', '9', '2', '3', '4']
>>> merge(*["abc", "x"]) # unpack an iterable
['a', 'x', 'b', 'c']
Finally, you may want to return a generator rather than a list comprehension:
>>> def merge(*its):
...     return (
...         x for y in zip_longest(*its, fillvalue=object)
...         for x in y if x is not object
...     )
...
>>> merge([1], [], [2, 3, 4])
<generator object merge.<locals>.<genexpr> at 0x000001996B466740>
>>> next(merge([1], [], [2, 3, 4]))
1
>>> list(merge([1], [], [2, 3, 4]))
[1, 2, 3, 4]
If you're OK with other packages, you can try more_itertools.roundrobin:
>>> from more_itertools import roundrobin
>>> list(roundrobin('ABC', 'D', 'EF'))
['A', 'D', 'E', 'B', 'F', 'C']
My take:
a = "hlowrd"
b = "el ol"
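def func(xs, ys):
    ys = iter(ys)
    for x in xs:
        yield x
        try:
            yield next(ys)  # next() works on Python 2.6+ and Python 3
        except StopIteration:
            return  # ys ran out first, so stop cleanly
print(list(func(a, b)))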
def combine(list1, list2):
    lst = []
    len1 = len(list1)
    len2 = len(list2)
    for index in range( max(len1, len2) ):
        if index+1 <= len1:
            lst += [list1[index]]
        if index+1 <= len2:
            lst += [list2[index]]
    return lst
How about numpy? It works with strings as well:
import numpy as np
np.array([[a,b] for a,b in zip([1,2,3],[2,3,4,5,6])]).ravel()
Result:
array([1, 2, 2, 3, 3, 4])
Stops on the shortest:
from collections.abc import Iterable
from itertools import cycle

def interlace(*iters, next = next) -> Iterable:
    """
    interlace(i1, i2, ..., in) -> (
        i1-0, i2-0, ..., in-0,
        i1-1, i2-1, ..., in-1,
        .
        .
        .
        i1-n, i2-n, ..., in-n,
    )
    """
    return map(next, cycle([iter(x) for x in iters]))
Sure, resolving the next/__next__ method may be faster.
Multiple one-liners inspired by answers to another question (Python 2 only, since they rely on izip_longest and map(None, ...)):
import itertools
list(itertools.chain.from_iterable(itertools.izip_longest(list1, list2, fillvalue=object)))[:-1]
[i for l in itertools.izip_longest(list1, list2, fillvalue=object) for i in l if i is not object]
[item for sublist in map(None, list1, list2) for item in sublist][:-1]
An alternative in a functional & immutable way (Python 3):
from itertools import zip_longest
from functools import reduce
reduce(lambda lst, zipped: [*lst, *zipped] if zipped[1] is not None else [*lst, zipped[0]], zip_longest(list1, list2),[])
This can also be achieved easily with a plain for loop:
list1 = ['f', 'o', 'o']
list2 = ['hello', 'world']
list3 = []
for i in range(len(list1)):
    #print(list3)
    list3.append(list1[i])
    if i < len(list2):
        list3.append(list2[i])
print(list3)
output :
['f', 'hello', 'o', 'world', 'o']
This could be shortened further with a list comprehension, but the explicit loop is easier to follow.
My approach looks as follows:
from itertools import chain, zip_longest
def intersperse(*iterators):
    # A unique sentinel object that cannot occur in the iterators
    filler = object()
    r = (x for x in chain.from_iterable(zip_longest(*iterators, fillvalue=filler)) if x is not filler)
    return r
list1 = ['f', 'o', 'o']
list2 = ['hello', 'world']
print(list(intersperse(list1, list2)))
It works for an arbitrary number of iterators and yields an iterator, so I applied list() in the print line.
def alternate_elements(small_list, big_list):
    mew = []
    count = 0
    for i in range(len(small_list)):
        mew.append(small_list[i])
        mew.append(big_list[i])
        count +=1
    return mew+big_list[count:]
if len(l2)>len(l1):
    res = alternate_elements(l1,l2)
else:
    res = alternate_elements(l2,l1)
print(res)
Here we swap the lists based on size before interleaving. Can someone provide a better solution with time complexity O(len(l1)+len(l2))?
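The loop above is already linear in the combined length. A shorter sketch with the same O(len(l1)+len(l2)) cost, which keeps the arguments in the order given instead of swapping them by size (the name alternate is just for illustration):

```python
def alternate(first, second):
    """Alternate items from first and second, then append whatever is left over."""
    n = min(len(first), len(second))
    merged = [x for pair in zip(first, second) for x in pair]
    # At most one of these two slices is non-empty.
    return merged + first[n:] + second[n:]

res = alternate(['f', 'o', 'o'], ['hello', 'world'])
```

Not swapping means the result always starts with an item from the first argument, whichever list is longer.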
I'd do the simple:
chain.from_iterable( izip( list1, list2 ) )
It'll come up with an iterator without creating any additional storage needs.
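On Python 3, izip is gone and the built-in zip is already lazy. Note also that zipping stops at the shorter list, so the final item of list1 is dropped. A sketch that stays lazy and keeps the leftover items, assuming list1 is a sequence that supports slicing (the slice copies only the leftover tail):

```python
from itertools import chain

list1 = ['f', 'o', 'o']
list2 = ['hello', 'world']

# Pairs are flattened lazily; the slice supplies the items zip() never reached.
interleaved = chain(chain.from_iterable(zip(list1, list2)), list1[len(list2):])
result = list(interleaved)
```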
This is nasty but works no matter the size of the lists:
list3 = [
    element for element in
    list(itertools.chain.from_iterable([
        val for val in itertools.izip_longest(list1, list2)
    ]))
    if element is not None
]
Obviously late to the party, but here's a concise one for equal-length lists:
output = [e for sub in zip(list1,list2) for e in sub]
It generalizes for an arbitrary number of equal-length lists, too:
output = [e for sub in zip(list1,list2,list3) for e in sub]
etc.
I'm too old to be down with list comprehensions, so:
import operator
from functools import reduce  # built in on Python 2, must be imported on Python 3
list3 = reduce(operator.add, zip(list1, list2))