Using list comprehension, tuples and itertools.groupby - python

This has been giving me some trouble for a while; maybe I've got tunnel vision. Given a list of integers, generate a new list where every group of adjacent duplicates has been turned into a tuple.
For example, given the list: [1, 2, 3, 3, 4, 5, 5, 5, 6]
The generated list contains: [1, 2, (3, 3), 4, (5, 5, 5), 6]
I'd like to achieve this using list comprehension.
numbers = [1, 2, 3, 3, 4, 5, 5, 5, 6]
it = itertools.groupby(numbers)
numbers = [tuple(group) if len(tuple(group)) > 1 else key for key, group in it]
The result I'm expecting:
[1, 2, (3, 3), 4, (5, 5, 5), 6]
The result I'm getting:
[1, 2, (), 4, (), 6]
The inserted tuples are empty, apparently, but at the same time they're not, since they would have had to contain more than one element to get inserted in the first place. What's going on? I'm new to Python, and even after exhausting all the keywords I can think of I still haven't been able to find a similar question online. I'm sure it's something simple and I just can't see it. Any help is appreciated.

If you want to use a list comprehension:
>>> import itertools
>>> l = [1, 2, 3, 3, 4, 5, 5, 5, 6]
>>> [k[0] if len(k) == 1 else tuple(k) for k in [list(j) for i, j in itertools.groupby(l)]]
[1, 2, (3, 3), 4, (5, 5, 5), 6]

The problem is that the group variable is an iterator that can only be iterated once. After tuple(group) has consumed it inside the len() check, it is exhausted, so the second tuple(group) is empty. You need to store the intermediate group temporarily. One way to go is to use nested generators/comprehensions, as itzmeontv suggested; another is to use a mapping function:
import itertools
def make_group(group):
    # Materialize the one-shot iterator before inspecting it
    group = tuple(group)
    if len(group) == 1:
        return group[0]
    return group
numbers = [make_group(group) for key, group in itertools.groupby(numbers)]
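To see the exhaustion directly, here is a small sketch that reads every group twice, followed by one possible way to keep everything in a single comprehension. The names first_read, second_read and run are only illustrative; the one-element list trick is just one option among several.

```python
import itertools

numbers = [1, 2, 3, 3, 4, 5, 5, 5, 6]

# Reading each group twice: the second read finds the iterator already exhausted.
first_read, second_read = [], []
for key, group in itertools.groupby(numbers):
    first_read.append(tuple(group))
    second_read.append(tuple(group))

# Binding the materialized tuple to a name (via a one-element list)
# lets the comprehension test its length and reuse it.
grouped = [
    run if len(run) > 1 else run[0]
    for _, group in itertools.groupby(numbers)
    for run in [tuple(group)]
]

print(second_read)  # every entry is an empty tuple
print(grouped)
```

In the original attempt, len(tuple(group)) plays the role of the first read, which is why the tuple that ends up in the result is empty.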

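You may try this one. Note that it counts occurrences across the whole list rather than runs of adjacent duplicates, and it relies on the iteration order of set(a), so it only reproduces the expected output for inputs like this one: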
a = [1, 2, 3, 3, 4, 5, 5, 5, 6]
[(i,)*a.count(i) if a.count(i)>1 else i for i in set(a)]
output:
[1, 2, (3, 3), 4, (5, 5, 5), 6]

Related

How to split a list into two based on a value?

I am trying to create a function which can separate a list into two new lists based on a value (in this case 3.5).
The code I have tried to make so far makes a list of the first values in the main list (the ones I want to compare). This list is [1,1,3,1,4,4,5,1] I now want to create two lists. This would be [1,1,3,1,1] and [4,4,5]. However, I cannot use > to compare the different values in the list and am unsure how to do so.
As I said above, I'm confused about what your code is trying to do, but you can split a list like this.
my_list=[1,1,3,1,4,4,5,1]
my_val=3.5 #value to split on
list1=[x for x in my_list if x>my_val]
list2=[x for x in my_list if x<my_val]
EDIT
To make this work for a list of tuples, based on their first value, you can do the same but with a slight modification
my_list = [(1, 4, 3, 0),
(1, 7, 6, 0),
(3, 8, 7, 0),
(1, 1, 9, 0),
(4, 1, 1, 0),
(4, 3, 8, 1),
(5, 4, 2, 1),
(1, 7, 7, 1)]
list1=[x for x in my_list if x[0]>my_val]
list2=[x for x in my_list if x[0]<my_val]
The itertools module documentation provides a series of recipes for common tasks. One of them is a partition function.
from itertools import tee, filterfalse
def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
my_list=[1,1,3,1,4,4,5,1]
t1, t2 = partition(lambda x: x > 3.5, my_list)
list1 = list(t1) # [1, 1, 3, 1, 1]
list2 = list(t2) # [4, 4, 5]
You can have a function to split your list for a value.
def split_list(val, _list):
    list1 = []
    list2 = []
    for _x in _list:
        (list1 if _x <= val else list2).append(_x)
    return list1, list2
# Your example
print(split_list(3.5, [1, 1, 3, 1, 4, 4, 5, 1]))
#brings output
#([1, 1, 3, 1, 1], [4, 4, 5])
And call it repeatedly for a data set like the one you described:
# For a data set like yours
my_list = [(1, 4, 3, 0),
(1, 7, 6, 0),
(3, 8, 7, 0),
(1, 1, 9, 0),
(4, 1, 1, 0),
(4, 3, 8, 1),
(5, 4, 2, 1),
(1, 7, 7, 1)]
my_list_1 = []
my_list_2 = []
for x in my_list:
    # split_list returns (values <= 3.5, values > 3.5), in that order
    le, g = split_list(3.5, x)
    my_list_1.append(le)
    my_list_2.append(g)
print(my_list_1)
print(my_list_2)
# Separates the list to
[[1, 3, 0], [1, 0], [3, 0], [1, 1, 0], [1, 1, 0], [3, 1], [2, 1], [1, 1]]
[[4], [7, 6], [8, 7], [9], [4], [4, 8], [5, 4], [7, 7]]

Generating all the increasing subsequences

Given an array of integers, how can we generate all the increasing subsequences such that all of them have the same length?
Example: given this list
l = [1, 2, 4, 5, 3, 6]
The answer should be if we consider subsequences of length 4:
[1, 2, 3, 6]
[1, 2, 4, 5]
[1, 2, 4, 6]
[1, 2, 5, 6]
[1, 4, 5, 6]
[2, 4, 5, 6]
from itertools import combinations
# the second item of the tuple is the position of the element in the array
zeta = [(1, 1), (2, 2), (4, 3), (5, 4), (3, 5), (6, 6)]
comb = combinations(sorted(zeta, key=lambda x: x[0]), 4)
def verif(x):
    l = []
    for k in x:
        l.append(k[1])
    for i in range(len(l)-1):
        if l[i+1]-l[i] < 0:
            return 0
    return 1
for i in comb:
    if verif(list(i)):
        print(i)
I want a better approach, such as a dynamic programming solution, because my solution is obviously very slow for bigger lists of integers. Is the LIS problem helpful in this situation?
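One possible direction, sketched here as an assumption rather than a definitive answer: an LIS-style table can be used to prune a backtracking search. The table longest[i] holds the length of the longest increasing subsequence that starts at index i, so any branch that can no longer reach the target length is skipped. The function name increasing_subsequences and the helper extend are hypothetical.

```python
def increasing_subsequences(seq, k):
    """Yield every strictly increasing subsequence of seq with length k."""
    n = len(seq)
    # longest[i] = length of the longest increasing subsequence starting at i
    longest = [1] * n
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if seq[j] > seq[i]:
                longest[i] = max(longest[i], 1 + longest[j])

    def extend(start, prefix):
        if len(prefix) == k:
            yield list(prefix)
            return
        need = k - len(prefix)
        for i in range(start, n):
            # Skip elements that cannot start a long enough increasing run
            if longest[i] >= need and (not prefix or seq[i] > prefix[-1]):
                prefix.append(seq[i])
                yield from extend(i + 1, prefix)
                prefix.pop()

    yield from extend(0, [])


for sub in increasing_subsequences([1, 2, 4, 5, 3, 6], 4):
    print(sub)
```

The pruning test is a necessary condition only, so some dead branches are still explored, and the total work remains bounded below by the number of subsequences in the output, which can itself be exponential.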

How to use `filter` with multiple iterables, as is supported by `map`?

filter accepts only one iterable, whereas map accepts a variadic number of iterables. For example, I can exhaust map(operator.add, [1, 2, 3, 4], [1, 2, 2, 4]) to get [2, 4, 5, 8].
I'm looking for a similar mechanism for filter, accepting any predicate and a variable number of iterables. Exhausting filter(operator.eq, [1, 2, 3, 4], [1, 2, 2, 4]) causes a TypeError about how filter only accepts 1 iterable, not 2.
My expected output for that particular case is ([1, 2, 4], [1, 2, 4]), i.e. the pairwise elements that don't satisfy operator.eq are removed.
Here's what I have so far (eager version supporting only 2 iterables instead of N):
from typing import Callable, Iterable, Tuple, TypeVar
A = TypeVar("A")
B = TypeVar("B")
def filter_(predicate: Callable[[A, B], bool], iterable1: Iterable[A], iterable2: Iterable[B]) -> Tuple[Iterable[A], Iterable[B]]:
    filtered_iterable1 = []
    filtered_iterable2 = []
    for value1, value2 in zip(iterable1, iterable2):
        if predicate(value1, value2):
            filtered_iterable1.append(value1)
            filtered_iterable2.append(value2)
    return filtered_iterable1, filtered_iterable2
However my goal is to 1) be able to support N iterables and 2) to have filter_ be lazy instead of eager, as is with filter.
Unfortunately there's no equivalent to starmap like starfilter, so the equivalent I can think of is:
[i for i in zip(*lists) if predicate(*i)]
lists here being something like ([..], [..]). This results in:
[(1, 1), (2, 2), (4, 4)]
To turn this back into separate lists, use tuple(map(list, zip(*result))):
([1, 2, 4], [1, 2, 4])
So, putting it together:
import operator
predicate = operator.eq
lists = [1, 2, 3, 4], [1, 2, 2, 4]
result = tuple(map(list, zip(*(i for i in zip(*lists) if predicate(*i)))))
Your answer is in your implementation. Map accepts a function taking multiple lists, which must match the number of arguments. Filter takes a single list to filter, so the difference is not just semantic: it makes sense for filter to only take a single list. In your case the list is indeed the zip, and that is what you implement. What you are missing is a nifty way to unpair the paired results:
>>> r1, r2 = zip(*filter(lambda x: predicate(*x), zip([1, 2, 3, 4, 5], [1, 1, 3, 3, 5])))
>>> r1
(1, 3, 5)
>>> r2
(1, 3, 5)
How about:
import operator
def filter_(predicate, *iterables):
    for t in zip(*iterables):
        if predicate(*t):
            yield t
print(list(filter_(operator.eq, [1, 2, 3, 4], [1, 2, 2, 4])))
It is lazy, it outputs [(1, 1), (2, 2), (4, 4)] for your test case, and no, you can't have ([1, 2, 4], [1, 2, 4]) as a result in a lazy way. To convert from [(1, 1), (2, 2), (4, 4)] to ([1, 2, 4], [1, 2, 4]) you could use: zip(*filter_(operator.eq, [1, 2, 3, 4], [1, 2, 2, 4])) but then of course you lose the laziness.
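If some buffering is acceptable, a partly lazy split into columns can be sketched with itertools.tee. This is an illustration under that assumption, not a way around the limitation described above: tee has to buffer every row that one column has consumed and another has not. The helper name unzip_lazy is hypothetical, and the number of columns must be passed in explicitly.

```python
import operator
from itertools import tee


def filter_(predicate, *iterables):
    # Lazily yield the tuples whose elements satisfy the predicate
    for t in zip(*iterables):
        if predicate(*t):
            yield t


def unzip_lazy(rows, n):
    # One independent copy of the row stream per column; map() stays lazy,
    # and itemgetter(i) is bound at creation time for each column.
    copies = tee(rows, n)
    return tuple(map(operator.itemgetter(i), copy) for i, copy in enumerate(copies))


first, second = unzip_lazy(filter_(operator.eq, [1, 2, 3, 4], [1, 2, 2, 4]), 2)
```

Nothing is evaluated until first or second is iterated; consuming one column completely before touching the other makes tee hold all filtered rows in memory.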

How to split a numpy array based on a tuple content? [duplicate]

Let's say I've got an array [0, 1, 2, 3, 4, 5, 6, 7] and a tuple: (3, 3, 2).
I'm looking for a way to split my array into 3 arrays based on my tuple data:
[0, 1, 2]
[3, 4, 5]
[6, 7]
I can write a simple code like this to get what I want, however I'm looking for a correct and pythonic way to do this:
I used lists for simplicity.
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = (3, 3, 2)
pointer = 0
for i in b:
    lst = []
    for j in range(i):
        lst.append(a[pointer])
        pointer += 1
    print(lst)
Or this one:
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = (3, 3, 2)
pointer = 0
for i in b:
    lst = a[pointer:pointer+i]
    pointer += i
    print(lst)
Results:
[0, 1, 2]
[3, 4, 5]
[6, 7]
You can use NumPy's split function:
import numpy as np
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = (3, 3, 2)
c = np.split(a, np.cumsum(b)[:-1])
for r in c:
    print(r)
np.split(a, indices) splits a at the given indices along an axis (0 by default). Here np.cumsum(b)[:-1] turns the section sizes (3, 3, 2) into the split indices [3, 6].
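To make the role of the cumulative sum concrete, here is a pure-Python sketch of the same idea using itertools.accumulate instead of NumPy. The function name split_by_sizes is hypothetical, and the sketch assumes the sizes are non-negative.

```python
from itertools import accumulate


def split_by_sizes(seq, sizes):
    # Running totals of the sizes are the end offsets of each section
    ends = list(accumulate(sizes))   # (3, 3, 2) -> [3, 6, 8]
    starts = [0] + ends[:-1]         # -> [0, 3, 6]
    return [seq[s:e] for s, e in zip(starts, ends)]


parts = split_by_sizes([0, 1, 2, 3, 4, 5, 6, 7], (3, 3, 2))
print(parts)
```

The interior end offsets [3, 6] are exactly the indices handed to np.split in the answer above.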
If you don't want to modify your input list, you can use an iterator and the itertools module.
>>> from itertools import islice
>>> a = [0, 1, 2, 3, 4, 5, 6, 7]
>>> b = (3, 3, 2)
>>> i = iter(a)
>>> [list(islice(i, x)) for x in b]
[[0, 1, 2], [3, 4, 5], [6, 7]]
In the first step you create an iterator, which starts at the first element of a. Then you iterate in a list comprehension over your numbers in b and in each step you pull accordingly many elements from the iterator and store them in your result list.
One simpler way is this:
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = (3, 3, 2)
for ind in b:
    print(a[:ind])
    a = a[ind:]
It loops through slice sizes in b while shortening the original array every time. You can easily append the resulting slices as sublists if you need them for something else. It's almost like one of your solutions except it doesn't use any extra variables and iterates directly through elements of b.
Also, I wouldn't call the variables a and b, certainly not in this case, where they have clear meanings that you can express through their names. More meaningful names reduce the number of bugs and make code clearer, which makes a real difference in larger or more complex code. I'd call a at least in_list and b slices, but with more context the names could be better.
The most "concise" syntax would be :
ex_array = [0, 1, 2, 3, 4, 5, 6, 7]
extuple = (3, 3, 2)
result = [ex_array[sum(extuple[:iii]):sum(extuple[:iii])+extuple[iii]] for iii in range(len(extuple))]
result would be a list of the expected sub-lists
Re-using the pairwise function from Compare two adjacent elements in same list, you could also:
from itertools import accumulate
from more_itertools import pairwise
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = (3, 3, 2)
[a[slice(*s)] for s in pairwise(accumulate((0,)+b))]
That being said, the np.split answer is probably faster (and easier to read).

Python Random List Comprehension

I have a list similar to:
[1 2 1 4 5 2 3 2 4 5 3 1 4 2]
I want to create a list of x random elements from this list where none of the chosen elements are the same. The difficult part is that I would like to do this by using list comprehension...
So possible results if x = 3 would be:
[1 2 3]
[2 4 5]
[3 1 4]
[4 5 1]
etc...
Thanks!
I should have specified that I cannot convert the list to a set. Sorry!
I need the randomly selected numbers to be weighted. So if 1 appears 4 times in the list and 3 appears 2 times in the list, then 1 is twice as likely to be selected...
Disclaimer: the "use a list comprehension" requirement is absurd.
Moreover, if you want to use the weights, there are many excellent approaches listed at Eli Bendersky's page on weighted random sampling.
The following is inefficient, doesn't scale, etc., etc.
That said, it has not one but two (TWO!) list comprehensions, returns a list, never duplicates elements, and respects the weights in a sense:
>>> s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[3, 1, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[5, 3, 4]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[1, 5, 2]
.. or, as simplified by FMc:
>>> [x for x in random.choice([p for p in itertools.permutations(s, 3) if len(set(p)) == 3])]
[3, 5, 2]
(I'll leave the x for x in there, even though it hurts not to simply write list(random.choice(..)) or just leave it as a tuple..)
Generally, you don't want to do this sort of thing in a list comprehension -- It'll lead to much harder to read code. However, if you really must, we can write a completely horrible 1 liner:
>>> values = [random.randint(0,10) for _ in xrange(12)]
>>> values
[1, 10, 6, 6, 3, 9, 0, 1, 8, 9, 1, 2]
>>> # This is the 1 liner -- The other line was just getting us a list to work with.
>>> [(lambda x=random.sample(values,3):any(values.remove(z) for z in x) or x)() for _ in xrange(4)]
[[6, 1, 8], [1, 6, 10], [1, 0, 2], [9, 3, 9]]
Please never use this code -- I only post it for fun/academic reasons.
Here's how it works:
I create a function inside the list comprehension with a default argument of 3 randomly selected elements from the input list. Inside the function, I remove the elements from values so that they aren't available to be picked again. Since list.remove returns None, I can use any(lst.remove(x) for x in ...) to remove the values and return False. Since any returns False, we hit the or clause, which just returns x (the default value with 3 randomly selected items) when the function is called. All that is left then is to call the function and let the magic happen.
The one catch here is that you need to make sure that the number of groups you request (here I chose 4) multiplied by the number of items per group (here I chose 3) is less than or equal to the number of values in your input list. It may seem obvious, but it's probably worth mentioning anyway...
Here's another version where I pull shuffle into the list comprehension:
>>> lst = [random.randint(0,10) for _ in xrange(12)]
>>> lst
[3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
>>> [lst[i*3:i*3+3] for i in xrange(shuffle(lst) or 4)]
[[6, 10, 6], [3, 4, 10], [1, 3, 5], [9, 10, 5]]
This is significantly better than my first attempt; however, most people would still need to stop and scratch their heads a bit before they figured out what this code was doing. I still assert that it would be much better to do this in multiple lines.
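For comparison, a multi-line version of the shuffle-and-slice idea might look like the sketch below. It is written for Python 3, the function name random_chunks is hypothetical, and it works on a shuffled copy (random.sample over the full length) so the input list is left untouched. Like the one-liner, it only guarantees that the chunks do not reuse positions of the input, not that values within a chunk are distinct.

```python
import random


def random_chunks(values, size, count):
    # Shuffled copy of the input; the original list is not modified
    shuffled = random.sample(values, len(values))
    # Consecutive slices of the shuffled copy form the groups
    return [shuffled[i * size:(i + 1) * size] for i in range(count)]


values = [3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
groups = random_chunks(values, 3, 4)
print(groups)
```

As with the one-liner, size * count should not exceed len(values), otherwise the trailing groups come back short or empty.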
If I'm understanding your question properly, this should work:
def weighted_sample(L, x):
    # might consider raising some kind of exception if len(set(L)) < x
    while True:
        ans = random.sample(L, x)
        if len(set(ans)) == x:
            return ans
Then if you want many such samples you can just do something like:
[weighted_sample(L, x) for _ in range(num_samples)]
I have a hard time conceiving of a comprehension for the sampling logic that isn't just obfuscated. The logic is a bit too complicated. It sounds like something randomly tacked on to a homework assignment to me.
If you don't like infinite looping, I haven't tried it but I think this will work:
import collections
import random
def weighted_sample(L, x):
    ans = []
    c = collections.Counter(L)
    while len(ans) < x:
        # randrange excludes the upper bound, so r always falls inside some count
        r = random.randrange(sum(c.values()))
        for k in c:
            if r < c[k]:
                ans.append(k)
                del c[k]
                break
            else:
                r -= c[k]
        else:
            # should never happen on valid input
            raise RuntimeError("no element was selected")
    return ans
First of all, I assume your list looks like
[1,2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
so if you want to print the permutations of size 3 from the given list, you can do the following.
import itertools
l = [1,2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
for permutation in itertools.permutations(list(set(l)),3):
    print permutation,
Output:
(1, 2, 3) (1, 2, 4) (1, 2, 5) (1, 3, 2) (1, 3, 4) (1, 3, 5) (1, 4, 2) (1, 4, 3) (1, 4, 5) (1, 5, 2) (1, 5, 3) (1, 5, 4) (2, 1, 3) (2, 1, 4) (2, 1, 5) (2, 3, 1) (2, 3, 4) (2, 3, 5) (2, 4, 1) (2, 4, 3) (2, 4, 5) (2, 5, 1) (2, 5, 3) (2, 5, 4) (3, 1, 2) (3, 1, 4) (3, 1, 5) (3, 2, 1) (3, 2, 4) (3, 2, 5) (3, 4, 1) (3, 4, 2) (3, 4, 5) (3, 5, 1) (3, 5, 2) (3, 5, 4) (4, 1, 2) (4, 1, 3) (4, 1, 5) (4, 2, 1) (4, 2, 3) (4, 2, 5) (4, 3, 1) (4, 3, 2) (4, 3, 5) (4, 5, 1) (4, 5, 2) (4, 5, 3) (5, 1, 2) (5, 1, 3) (5, 1, 4) (5, 2, 1) (5, 2, 3) (5, 2, 4) (5, 3, 1) (5, 3, 2) (5, 3, 4) (5, 4, 1) (5, 4, 2) (5, 4, 3)
Hope this helps. :)
>>> from random import shuffle
>>> L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> x=3
>>> shuffle(L)
>>> zip(*[L[i::x] for i in range(x)])
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
You could also use a generator expression instead of the list comprehension
>>> zip(*(L[i::x] for i in range(x)))
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
Starting with a way to do it without list comprehensions:
import random
import itertools
alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
def alphas():
    while True:
        yield random.choice(alphabet)
def filter_unique(iter):
    found = set()
    for a in iter:
        if a not in found:
            found.add(a)
            yield a
def dice(x):
    while True:
        yield itertools.islice(
            filter_unique(alphas()),
            x
        )
for i, output in enumerate(dice(3)):
    print list(output)
    if i > 10:
        break
The part where list comprehensions have trouble is filter_unique(), since a list comprehension has no 'memory' of what it has already output. A possible workaround is to keep generating candidate outputs until a valid one is found, as @DSM suggested.
The slow, naive approach is:
import random
def pick_n_unique(l, n):
    res = set()
    while len(res) < n:
        res.add(random.choice(l))
    return list(res)
This will pick elements and only quit when it has n unique ones:
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[2, 3, 4]
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[3, 4, 5]
However, it can get slow if, for example, you have a list with thirty 1s and one 2, since once it has a 1 it'll keep spinning until it finally hits a 2. A better approach is to count the number of occurrences of each unique element, choose a random one weighted by occurrence count, remove that element from the counts, and repeat until you have the desired number of elements:
import collections
def weighted_choice(item__counts):
    total_counts = sum(count for item, count in item__counts.items())
    which_count = random.random() * total_counts
    for item, count in item__counts.items():
        which_count -= count
        if which_count < 0:
            return item
    raise ValueError("Should never get here")
def pick_n_unique(items, n):
    item__counts = collections.Counter(items)
    if len(item__counts) < n:
        raise ValueError(
            "Can't pick %d values with only %d unique values" % (
                n, len(item__counts)))
    res = []
    for i in range(n):
        choice = weighted_choice(item__counts)
        res.append(choice)
        # Remove the chosen item so it cannot be picked again
        del item__counts[choice]
    return tuple(res)
Either way, this is a problem not well-suited to list comprehensions.
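A shorter sketch of the same draw-and-remove idea, offered as an assumption rather than a verified equivalent: shuffle a copy of the list and keep the first n distinct values in order of first appearance. The first value of a random permutation is chosen in proportion to its count, and the next new value is chosen in proportion to the counts of the remaining values, which appears to match the weighted procedure described above. The function name pick_unique_by_shuffle is hypothetical, and the code relies on dict preserving insertion order (Python 3.7+).

```python
import random


def pick_unique_by_shuffle(items, n):
    # Random permutation of the whole list, duplicates included
    shuffled = random.sample(items, len(items))
    # dict.fromkeys keeps only the first occurrence of each value, in order
    distinct = list(dict.fromkeys(shuffled))
    if len(distinct) < n:
        raise ValueError("not enough unique values to pick from")
    return distinct[:n]


s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
picked = pick_unique_by_shuffle(s, 3)
print(picked)
```

This does a full shuffle of the input for every sample, so the Counter-based version is likely preferable when the list is long and n is small.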
def sample(self, population, k):
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = int(random.random() * n)
            while j in selected:
                j = int(random.random() * n)
            selected_add(j)
            result[i] = population[j]
    except (TypeError, KeyError): # handle (at least) sets
        if isinstance(population, list):
            raise
        return self.sample(tuple(population), k)
    return result
Above is a simplified version of the sample function in Lib/random.py. I only removed some optimization code for small data sets. The code shows directly how to implement a customized sample function:
get a random number
if the number has appeared before, discard it and get a new one
repeat the above steps until you have all the sample numbers you want.
Then the real problem turns out to be how to get a random value from a list by weight. This could be done with the original random.sample(population, 1) from the Python standard library (a little overkill here, but simple).
Below is an implementation. Because duplicates represent weight in your given list, we can use int(random.random() * array_length) to get a random index into your array.
import random
arr = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
def sample_by_weight(population, k):
    n = len(population)
    if not 0 <= k <= len(set(population)):
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in range(k):
            # Pick by random index so that duplicates act as weights
            j = population[int(random.random() * n)]
            while j in selected:
                j = population[int(random.random() * n)]
            selected_add(j)
            result[i] = j
    except (TypeError, KeyError): # handle (at least) sets
        if isinstance(population, list):
            raise
        # This is a plain function, not a method, so call it directly
        return sample_by_weight(tuple(population), k)
    return result
[sample_by_weight(arr,3) for i in range(10)]
With the setup:
from random import shuffle
from collections import deque
l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
This code:
def getSubLists(l,n):
    shuffle(l) #shuffle l so the elements are in 'random' order
    l = deque(l,len(l)) #create a structure with O(1) insert/pop at both ends
    while l: #while there are still elements to choose
        sample = set() #use a set O(1) to check for duplicates
        while len(sample) < n and l: #until the sample is n long or l is exhausted
            top = l.pop() #get the top value in l
            if top in sample:
                l.appendleft(top) #add it to the back of l for a later sample
            else:
                sample.add(top) #it isn't in sample already so use it
        yield sample #yield the sample
You end up with:
for s in getSubLists(l,3):
    print s
>>>
set([1, 2, 5])
set([1, 2, 3])
set([2, 4, 5])
set([2, 3, 4])
set([1, 4])
