Combining tuples in list of tuples
test_list = [([1, 2, 3], 'gfg'), ([5, 4, 3], 'cs')]
How to get this output:
[(1, 'gfg'), (2, 'gfg'), (3, 'gfg'), (5, 'cs'), (4, 'cs'), (3, 'cs')]
Just to go into a bit more detail about how to do this with list comprehensions and explain what they are and how they work...
To begin with, here's a fairly long-winded way of achieving what you want:
test_list = [([1, 2, 3], 'gfg'), ([5, 4, 3], 'cs')]
result = []  # set up empty list to hold the result
for group in test_list:  # loop through each 'group' in your list
    (numbers, text) = group  # unpack into the list of numbers and the text string
    for n in numbers:  # loop through the numbers
        result.append((n, text))  # add the (number, text) tuple to the result list
print(result)
# [(1, 'gfg'), (2, 'gfg'), (3, 'gfg'), (5, 'cs'), (4, 'cs'), (3, 'cs')]
So we've achieved the result using two for loops, one nested inside the other.
But there's a really neat Python construct called a list comprehension which lets you do this kind of loop in just one line.
Using an example with just a single loop:
numbers = [1, 2, 3]
doubles = []
for n in numbers:
    doubles.append(n * 2)
print(doubles)
# [2, 4, 6]
We can re-write this as the following list comprehension:
numbers = [1, 2, 3]
doubles = [n * 2 for n in numbers]
print(doubles)
# [2, 4, 6]
A list comprehension is of the form:
result = [<expression> for item in iterable]
which is equivalent to:
result = []
for item in iterable:
    result.append(<expression>)
where <expression> is something involving item.
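A comprehension can also carry an optional if filter after the loop clause, of the form result = [<expression> for item in iterable if <condition>]. A quick sketch with made-up data:

```python
numbers = [1, 2, 3, 4, 5, 6]
# keep only the items for which the condition is true
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # [2, 4, 6]
```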
You can also nest list comprehensions like you can nest for loops. Going back to our original problem, we need to first change it so that we 'unpack' group into numbers and text directly when we set up the for loop:
result = []
for (numbers, text) in test_list:
    for n in numbers:
        result.append((n, text))
Now imagine dragging the for loops off to the right until we can line them all up:
result = []
for (numbers, text) in test_list:
                                  for n in numbers:
                                                     result.append((n, text))
and then put the expression (i.e. (n, text)) at the left:
result = [(n, text) for (numbers, text) in test_list for n in numbers]
List comprehensions may seem strange at first (especially if you're jumping straight into a double list comprehension!), but once you've got your head around how they work, they are really neat and can be very powerful! There are also similar set comprehensions and dictionary comprehensions. Read more here: https://dbader.org/blog/list-dict-set-comprehensions-in-python
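For example, the analogous set and dictionary comprehensions look like this (a quick sketch with made-up data):

```python
words = ['apple', 'banana', 'cherry']
lengths = {len(w) for w in words}      # set comprehension: duplicates collapse
by_word = {w: len(w) for w in words}   # dict comprehension: key -> value pairs
print(lengths)  # {5, 6}
print(by_word)  # {'apple': 5, 'banana': 6, 'cherry': 6}
```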
You can use nested list comprehensions:
test_list = [([1, 2, 3], 'gfg'), ([5, 4, 3], 'cs')]
result = [(z, y) for x, y in test_list for z in x]
# z = numbers in the lists inside the tuples
# x = the lists inside the tuples
# y = the strings inside the tuples
print(result)
Output:
[(1, 'gfg'), (2, 'gfg'), (3, 'gfg'), (5, 'cs'), (4, 'cs'), (3, 'cs')]
result = [(z, y) for x, y in test_list for z in x] is the list comprehension version for:
result = []
for x, y in test_list:
    for z in x:
        result.append((z, y))
Related
I am trying to get a list of lists that represent all possible ordered pairs from an existing list of lists.
import itertools
list_of_lists = [[0, 1, 2, 3, 4], [5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15], [16, 17], [18, 19], [20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31], [32, 33], [34, 35], [36, 37], [38], [39]]
Ideally, we would just use itertools.product in order to get that list of ordered pairs.
scenarios_list=list(itertools.product(*list_of_lists))
However, if I were to do this for a larger list of lists I would get a memory error and so this solution is not scalable for larger lists of lists where there could be numerous different sets of ordered pairs.
So, is there a way to iterate through these ordered pairs as they are produced, testing each one against certain criteria before appending it to another list (for example, whether it contains a certain number of even numbers, or whether its sum equals the maximum)? If a pair fails the test, it would not be appended, and thus would not unnecessarily use up memory when there are only certain ordered pairs that we care about.
Starting with a recursive base implementation of product:
def product(*lsts):
    if not lsts:
        yield ()
        return
    first_lst, *rest = lsts
    for element in first_lst:
        for rec_p in product(*rest):
            p = (element,) + rec_p
            yield p
[*product([1, 2], [3, 4, 5])]
# [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
Now, you could augment that with a condition by which you filter any p not meeting it:
def product(*lsts, condition=None):
    if condition is None:
        condition = lambda tpl: True
    if not lsts:
        yield ()
        return
    first_lst, *rest = lsts
    for element in first_lst:
        for rec_p in product(*rest, condition=condition):
            p = (element,) + rec_p
            if condition(p):  # stop overproduction right where it happens
                yield p
Now you can - for instance - restrict to only even elements:
[*product([1, 2], [3, 4, 5], condition=lambda tpl: not any(x%2 for x in tpl))]
# [(2, 4)]
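Since product is a generator, you can also consume it lazily instead of materializing the whole result. A small sketch (repeating the filtered product from above so it runs standalone), pulling just the first few tuples with itertools.islice; note the condition is also applied to partial suffix tuples, so it should be meaningful on those too:

```python
from itertools import islice

def product(*lsts, condition=None):
    # same filtered product as above: the condition also prunes partial
    # (suffix) tuples, cutting off failing branches early
    if condition is None:
        condition = lambda tpl: True
    if not lsts:
        yield ()
        return
    first_lst, *rest = lsts
    for element in first_lst:
        for rec_p in product(*rest, condition=condition):
            p = (element,) + rec_p
            if condition(p):
                yield p

# only triples whose elements sum to less than 5; take the first three lazily
gen = product(range(10), range(10), range(10),
              condition=lambda t: sum(t) < 5)
print(list(islice(gen, 3)))  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```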
I need to make a "generator function" which will take 2 lists and combine the numbers from the two lists, by index, into tuples. For example:
l1 = [3, 2, 1]
l2 = [4, 3, 2]
The result of the first iteration will be
(3, 4)
The result of the second iteration will be
(2, 3)
And the third
(1, 2)
And also, one of the lists may have more numbers than the other. In that case I need to implement the condition "if one of the lists is exhausted while iterating, then no further iterations are performed" (using try/except).
I know that generator functions use yield instead of return, but I have no idea how to write this function...
I did this
def generate_something(l1, l2):
    l3 = tuple(tuple(x) for x in zip(l1, l2))
    return l3
output is
((3, 4), (2, 3), (1, 2))
It works, but this is not a generator function: there is no yield, and there are no first, second, etc. iterations. I hope you can help me...
Maybe this will help:
def generate_something_2(l1, l2):
    # If one of the lists ends while iterating, no further iterations are performed.
    # For this, take the minimal length of the two lists.
    min_i = len(l1) if len(l1) < len(l2) else len(l2)
    for i in range(0, min_i):
        yield (l1[i], l2[i])

l1 = [3, 2, 1, 1]
l2 = [4, 3, 2]
l3 = tuple(generate_something_2(l1, l2))
print(l3)
# Result: ((3, 4), (2, 3), (1, 2))
you can create your own generator:
def iterate_lists(l1, l2):
    for i, item in enumerate(l1):
        yield l1[i], l2[i]
or as a variable via a generator expression:
iterate_lists = ((l1[i], l2[i]) for i, item in enumerate(l1))
As for the lengths not being equal: it's not clear to me what exactly you want to do; without the error message you can just go by the smaller list...
just change for i, item in enumerate(l1) to for i in range(min(len(l1), len(l2)))
Use a while loop and keep track of the index position; when an IndexError is raised, you've hit the end of the shortest list, so stop:
def generate_something(l1, l2):
    idx = 0
    while True:
        try:
            yield l1[idx], l2[idx]
            idx += 1
        except IndexError:
            break
l1 = [3, 2, 1]
l2 = [4, 3, 2]
g = generate_something(l1, l2)
print(list(g))
Output:
[(3, 4), (2, 3), (1, 2)]
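For completeness: zip itself already stops at the shorter of its inputs, so a minimal sketch (not using the try/except the exercise asks for; the function name is mine) could be:

```python
def generate_pairs(l1, l2):
    # zip stops as soon as either list is exhausted,
    # so unequal lengths are handled automatically
    yield from zip(l1, l2)

g = generate_pairs([3, 2, 1, 1], [4, 3, 2])
print(tuple(g))  # ((3, 4), (2, 3), (1, 2))
```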
My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicate as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]  # This is 20M long!
list2 = ...  # another list of size len(list1), also 20M long!
i = 0
while i < len(list1) - 1:
    if list1[i] == list1[i+1]:
        del list1[i]
        del list2[i]
    else:
        i = i + 1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow since deleting an element from a list is a slow operation by itself. Is there any way I can speed up this process? Please note that, as shown in the above code snippet, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has groupby in the standard library (itertools) for you:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it with a key function (groupby's second argument) to also process the second list at the same time.
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences again:
>>> zip(*_) # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
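In Python 3, zip returns an iterator rather than a list, so the same pipeline might look like this sketch (the variable names are mine):

```python
from itertools import groupby
from operator import itemgetter

list1 = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]
list2 = [9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 5]

# take the first pair of each run of equal list1 values
pairs = [next(g) for _, g in groupby(zip(list1, list2), key=itemgetter(0))]
# "unzip" the pairs back into two lists
out1, out2 = (list(t) for t in zip(*pairs))
print(out1)  # [1, 2, 3, 4, 5, 1, 2]
print(out2)  # [9, 7, 7, 7, 6, 6, 5]
```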
You can use collections.deque and its maxlen argument to set a window size of 2. Then compare the two entries in the window, and append to the results if they differ.
def remove_adj_dups(x):
    """
    :parameter x is something like '1, 1, 2, 3, 3'
        from an iterable such as a string or list or a generator
    :return 1,2,3, as list
    """
    result = []
    from collections import deque
    d = deque([object()], maxlen=2)  # 1st entry is object(), which only matches itself. Kudos to Trey Hunner --> object()
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result
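A quick sanity check with the first list from the question (the function is repeated so the snippet runs on its own):

```python
from collections import deque

def remove_adj_dups(x):
    result = []
    # sentinel that compares unequal to everything else,
    # so the first real item is always kept
    d = deque([object()], maxlen=2)
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result

print(remove_adj_dups([1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]))
# [1, 2, 3, 4, 5, 1, 2]
```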
I generated a random list with duplicates of 2 million numbers between 0 and 10.
def random_nums_with_dups(number_range=None, range_len=None):
    """
    :parameter
    :param number_range: use the numbers between 0 and number_range. The smaller this is, the more dups
    :param range_len: max len of the results list used in the generator
    :return: a generator
    Note: If number_range = 2, then random binary is returned
    """
    import random
    return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000

def mytest():
    return remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))

big_result = mytest()
print(len(big_result))
The len was 1800197 (i.e. with consecutive duplicates removed), in under 5 seconds, which includes the time to spin up the random list generator. I lack the experience/know-how to say whether it is memory efficient as well; could someone comment, please?
I want to compare two lists and return the differing indices and elements.
So I write the following code:
l1 = [1,1,1,1,1]
l2 = [1,2,1,1,3]
ind = []
diff = []
for i in range(len(l1)):
    if l1[i] != l2[i]:
        ind.append(i)
        diff.append([l1[i], l2[i]])
print ind
print diff
# output:
# [1, 4]
# [[1, 2], [1, 3]]
The code works, but are there any better ways to do that?
Update the Question:
I want to ask for other solutions, for example with an iterator, or a ternary expression like [a, b](expression) (not the easy way I used above; I want to exclude it). Thanks very much for your patience! :)
You could use a list comprehension to output all the information in a single list.
>>> [[idx, (i,j)] for idx, (i,j) in enumerate(zip(l1, l2)) if i != j]
[[1, (1, 2)], [4, (1, 3)]]
This will produce a list where each element is: [index, (first value, second value)] so all the information regarding a single difference is together.
An alternative way is the following
>>> l1 = [1,1,1,1,1]
>>> l2 = [1,2,1,1,3]
>>> z = zip(l1,l2)
>>> ind = [i for i, x in enumerate(z) if x[0] != x[1]]
>>> ind
[1, 4]
>>> diff = [z[i] for i in ind]
>>> diff
[(1, 2), (1, 3)]
In Python3 you have to add a call to list around zip.
You can try functional style:
res = filter(lambda (idx, x): x[0] != x[1], enumerate(zip(l1, l2)))
# [(1, (1, 2)), (4, (1, 3))]
to unzip res you can use:
zip(*res)
# [(1, 4), ((1, 2), (1, 3))]
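Note that tuple parameter unpacking in a lambda, as in lambda (idx, x), was removed in Python 3 (PEP 3113); an equivalent Python 3 sketch:

```python
l1 = [1, 1, 1, 1, 1]
l2 = [1, 2, 1, 1, 3]

# index the pairwise-zipped values, keeping only positions that differ
res = [(idx, pair) for idx, pair in enumerate(zip(l1, l2)) if pair[0] != pair[1]]
print(res)  # [(1, (1, 2)), (4, (1, 3))]

# "unzip" into the indices and the differing value pairs
ind, diff = zip(*res)
print(ind)   # (1, 4)
print(diff)  # ((1, 2), (1, 3))
```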
I have a list of numbers
l = [1,2,3,4,5]
and a list of tuples which describe which items should not be in the output together.
gl_distribute = [(1, 2), (1,4), (1, 5), (2, 3), (3, 4)]
the possible lists are
[1,3]
[2,4,5]
[3,5]
and I want my algorithm to give me the second one [2,4,5]
I was thinking to do it recursively.
In the first case (t1) I call my recursive algorithm with all the items except the 1st, and in the second case (t2) I call it again removing the pairs from gl_distribute where the 1st item appears.
Here is my algorithm
def check_distribute(items, distribute):
    i = sorted(items[:])
    d = distribute[:]
    if not i:
        return []
    if not d:
        return i
    if len(remove_from_distribute(i, d)) == len(d):
        return i
    first = i[0]
    rest = items[1:]
    distr_without_first = remove_from_distribute([first], d)
    t1 = check_distribute(rest, d)
    t2 = check_distribute(rest, distr_without_first)
    t2.append(first)
    if len(t1) >= len(t2):
        return t1
    else:
        return t2
The remove_from_distribute(items, distr_list) removes the pairs from distr_list that include any of the items in items.
def remove_from_distribute(items, distribute_list):
    new_distr = distribute_list[:]
    for item in items:
        for pair in distribute_list:
            x, y = pair
            if x == item or y == item and pair in new_distr:
                new_distr.remove((x, y))
    if new_distr:
        return new_distr
    else:
        return []
My output is [4, 5, 3, 2, 1] which obviously is not correct. Can you tell me what I am doing wrong here? Or can you give me a better way to approach this?
I will suggest an alternative approach.
Assume your list and your distribution are sorted, your list has length n, and your distribution has length m.
First, create a list of two-tuples with all valid combinations. This should be an O(n^2) solution.
Once you have that list, it's just a simple loop through the valid combinations to find the longest list. There are probably better solutions that further reduce the complexity.
Here is my sample code:
def get_valid():
    seq = [1, 2, 3, 4, 5]
    gl_dist = [(1, 2), (1, 4), (1, 5), (2, 3), (3, 4)]
    gl_index = 0
    valid = []
    for i in xrange(len(seq)):
        for j in xrange(i+1, len(seq)):
            if gl_index < len(gl_dist):
                if (seq[i], seq[j]) != gl_dist[gl_index]:
                    valid.append((seq[i], seq[j]))
                else:
                    gl_index += 1
            else:
                valid.append((seq[i], seq[j]))
    return valid

>>> get_valid()
[(1, 3), (2, 4), (2, 5), (3, 5), (4, 5)]
def get_list():
    total = get_valid()
    start = total[0][0]
    result = [start]
    for i, j in total:
        if i == start:
            result.append(j)
        else:
            start = i
            return_result = list(result)
            result = [i, j]
            yield return_result
    yield list(result)
    raise StopIteration
>>> list(get_list())
[[1, 3], [2, 4, 5], [3, 5], [4, 5]]
I am not sure I fully understand your output, as I think (4, 5) and (2, 5) should also be possible lists, since those pairs are not in the list of tuples.
If so, you could use itertools to get the combinations and filter them against the gl_distribute list, using sets to see whether any combination in combs contains two elements that should not be together, then take the max:
from itertools import combinations

combs = (combinations(l, r) for r in range(2, len(l)))
final = []
for x in combs:
    final += x
res = max(filter(lambda x: not any(len(set(x).intersection(s)) == 2 for s in gl_distribute), final), key=len)
print res
# (2, 4, 5)