Python creating tuple groups in list from another list - python

Let's say I have this data:
data = [1, 2, 3, -4, -5, 3, 2, 4, -2, 5, 6, -5, -1, 1]
I need it to be grouped into another list of tuples. Each tuple consists of two lists: one for positive numbers, the other for negative numbers. Tuples are created by checking what kind of number comes next: the last negative number in a run (a run being consecutive negative numbers with no positive ones in between) closes the current tuple, and the numbers that follow go into a new tuple, until the next run of negatives ends.
So the rules are these: every number is added to the current tuple; negative numbers keep being added to that tuple until a positive number is found after them, which means a new tuple must be created.
I think it is easier to show, than to explain. After parsing data, the list should look like this:
l = [([1, 2, 3], [-4, -5]), ([3, 2, 4], [-2]), ([5, 6], [-5, -1]), ([1], [])]
I created a solution, but I wonder if it's quite optimal. Maybe it is possible to write a more elegant one (and I wonder about performance, is there some better way to write such parser with best possible performance:))?
def neighborhood(iterable):
    iterator = iter(iterable)
    prev = None
    item = iterator.next()  # throws StopIteration if empty.
    for next in iterator:
        yield (prev,item,next)
        prev = item
        item = next
    yield (prev,item,None)

l = []
pos = []
neg = []
for prev, item, next in neighborhood(data):
    if item > 0:
        pos.append(item)
        if not next:
            l.append((pos, neg))
    else:
        neg.append(item)
        if next > 0:
            l.append((pos, neg))
            pos = []
            neg = []
        elif not next:
            l.append((pos, neg))
print l
P.S. I think the "if not next" part could be done just once, after the main check.

I'd use itertools.groupby to split the data into consecutive runs of positive/negative numbers first, and then group those runs into consecutive pairs. This can still be done in one pass through the list by taking advantage of generators:
from itertools import groupby, zip_longest
x = (list(v) for k,v in groupby(data, lambda x: x < 0))
l = list(zip_longest(x, x, fillvalue=[]))
This gives l as:
[([1, 2, 3], [-4, -5]), ([3, 2, 4], [-2]), ([5, 6], [-5, -1]), ([1], [])]
A couple of notes on the code above:
The initial grouping into positive/negative values is handed to groupby which should be reasonably performant (it's compiled code).
The zipping-a-generator method for grouping into pairs is a reasonably common idiom in Python. It's guaranteed to work since zip guarantees that an iterable is consumed from left to right.
In Python 2, use izip_longest.
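One thing the pairing above relies on is that the data starts with a positive number; if it starts with a negative run, that run lands in the "positive" slot. A minimal sketch of one way to guard against that, assuming Python 3 and assuming such input should simply get an empty positive list (the function name pos_neg_pairs is made up for illustration):

```python
from itertools import groupby, zip_longest

def pos_neg_pairs(data):
    # Split into consecutive runs; the key is True for runs of negatives.
    runs = [(is_neg, list(vals)) for is_neg, vals in groupby(data, lambda x: x < 0)]
    # Assumption: if the data starts with negatives, pad with an empty
    # positive list so the pairing below stays aligned.
    if runs and runs[0][0]:
        runs.insert(0, (False, []))
    # Zipping one iterator with itself takes the runs two at a time.
    lists = iter(vals for _, vals in runs)
    return list(zip_longest(lists, lists, fillvalue=[]))

data = [1, 2, 3, -4, -5, 3, 2, 4, -2, 5, 6, -5, -1, 1]
result = pos_neg_pairs(data)
```

For the question's data this gives the same list as above; the padding only matters for input such as [-1, 2, -3].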

You could go with an O(n) solution, which is much less beautiful than @ajcr's but should be more efficient.
def pos_neg(data):
    split = []
    for r in data:
        if len(split) == 0 or (r > 0 and len(split[-1][-1]) > 0):
            split.append(([], []))
        if r < 0:
            split[-1][-1].append(r)
        else:
            split[-1][-2].append(r)
    return split

data = [1, 2, 3, -4, -5, 3, 2, 4, -2, 5, 6, -5, -1, 1]
print pos_neg(data)
#=> [([1, 2, 3], [-4, -5]), ([3, 2, 4], [-2]), ([5, 6], [-5, -1]), ([1], [])]

Related

python: efficient and pythonic way to get sublist from a descending list which is larger than threshold

Given a list with descending order, e.g. [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2] and threshold = 1.2, I want to get sublist from original list with all elements larger than threshold
Method1:
orgin_lst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
lst = [i for i in orgin_lst if i > threshold]
This is the pythonic way, but it doesn't use the descending property and cannot break out once it finds an element that is not larger than the threshold. If there are few satisfied elements but the original list is very large, the performance is not good.
Method2:
orgin_lst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
lst = []
for i in orgin_lst:
    if i <= threshold:
        break
    lst.append(i)
However this code is not quite pythonic.
Is there a way that I can combine pythonic style and performance?
Python 3.10+
Binary search is fast for sorted data, O(log n) time. And Python's bisect module already does it. It wants increasing data and yours is decreasing, but we can virtually make it increasing. Just use its shiny new key parameter to negate the O(log n) accessed elements (and search for the negated threshold):
from bisect import bisect_left
from operator import neg
i = bisect_left(orgin_lst, -threshold, key=neg)
lst = orgin_lst[:i]
Alternatively, use a key function that returns False for values larger than the threshold and True otherwise. Since False is smaller than True (they act like 0 and 1, respectively), we again have a monotonically increasing sequence and can use bisect with this:
from bisect import bisect
i = bisect(orgin_lst, False, key=lambda x: x <= threshold)
lst = orgin_lst[:i]
If you don't need a separate new list, you could use del orgin_lst[i:] to instead remove the unwanted elements.
Before Python 3.10
Previously I would've written a proxy class to do the job now done by that much more convenient key parameter:
from bisect import bisect_left
class Negate:
    def __getitem__(_, i):
        return -orgin_lst[i]

i = bisect_left(Negate(), -threshold, 0, len(orgin_lst))
lst = orgin_lst[:i]
Or I might've written binary search myself, but I've done that so many times that at some point I started to loathe it.
Exponential search
Under your Method1, the list comprehension comparing every element, you wrote: "If there are few satisfied elements but the original list is very large, the performance is not good". If that was not just an argument against that list comprehension but you actually do have mostly very few satisfied elements and a very long list, then exponential search could be better than binary search. But it would be more code (unless you find a package for it, I guess).
A simple iterative search like your Method2 (which I btw do find pythonic) or Chris' answer or with itertools.takewhile would also be fast in such extreme cases, but for cases with large numbers of satisfied elements, they'd be much slower than binary search and exponential search.
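A minimal sketch of what such an exponential search could look like, assuming the list is sorted in descending order (the function name cut_descending is made up for illustration, and this is not tuned or benchmarked). It doubles an upper bound until it passes the threshold, then does a plain binary search inside the last doubling step:

```python
def cut_descending(lst, threshold):
    """Return the prefix of a descending list whose elements are > threshold."""
    n = len(lst)
    # Gallop: double hi until lst[hi] is no longer above the threshold
    # (or we run off the end of the list).
    hi = 1
    while hi < n and lst[hi] > threshold:
        hi *= 2
    # The first element <= threshold, if any, lies in lst[lo:hi + 1].
    lo = hi // 2
    hi = min(hi, n)
    # Binary search for the first index whose element is <= threshold.
    while lo < hi:
        mid = (lo + hi) // 2
        if lst[mid] > threshold:
            lo = mid + 1
        else:
            hi = mid
    return lst[:lo]

orgin_lst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
lst = cut_descending(orgin_lst, 1.2)
```

The number of comparisons then grows with the logarithm of the result length rather than of the whole list, which is the case the quoted sentence is about.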
itertools.takewhile
Like I said it would be slower in general, but it's fast for those best-cases and it's quite simple and clean:
from itertools import takewhile
lst = list(takewhile(lambda x: x > threshold, orgin_lst))
Faster loop
Like I said, I do find your loop pythonic, and it's good for best-cases. But calling append to individually append elements to the result is quite costly. It would probably be faster to first just find the first too-small element, then find its index and slice:
for i in orgin_lst:
    if i <= threshold:
        lst = orgin_lst[:orgin_lst.index(i)]
        break
else:
    lst = orgin_lst[:]
Again, if you're ok with just removing the unwanted elements from the existing list, use del inside the if and then you don't need the else part here.
A similar solution I wrote for another question ended up second-fastest in the benchmark there.
Alternative implementation:
cut = None
for i in orgin_lst:
    if i <= threshold:
        cut = orgin_lst.index(i)
        break
lst = orgin_lst[:cut]
I think your code was very close:
orgin_lst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, -1, -2, -2]
lst = []
for i in orgin_lst:
    if i <= threshold:
        break
    lst.append(i)
But let's employ a generator.
def take_until(f, it):
    for x in it:
        if f(x): return
        yield x
Now, we can write something like the following, for instance.
>>> for x in take_until(lambda x: x <= 1.2, orgin_lst):
...     print(x)
...
10
9
8
7
6
5
4
3
2
>>>
Heck, if we really want a list, that's just as easy.
>>> list(take_until(lambda x: x <= 1.2, orgin_lst))
[10, 9, 8, 7, 6, 5, 4, 3, 2]
>>>

Mapping a function to each list in a list of lists

I've been given a homework task that asks me to find the greatest continuous increase in a list of data. For example, in [1,2,3,4,5,3,1,2,3] the greatest continuous increase is 4 (from 1 to 5).
I've written a function that takes a single list and spits out a list of sublists like this.
def group_data(lst):
    sublist = [[lst[0]]]
    for i in range(1, len(lst)):
        if lst[i-1] < lst[i]:
            sublist[-1].append(lst[i])
        else:
            sublist.append([lst[i]])
    return sublist
Which does what it's supposed to
group_data([1,2,3,4,5,6,7,8,9,10,1,2,3,5,4,7,8])
Out[3]: [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 5], [4, 7, 8]]
And I now want to subtract the first element of each individual list from the last to find their differences. But I'm having difficulty figuring out how to map the function to each list rather than each element of the list. Any help would be greatly appreciated.
You can do it using the map function, where arr is your grouped list:
list(map(lambda x: x[-1]-x[0], arr ))
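Putting the two pieces together, here is a small sketch of the whole task: it reuses group_data from the question and takes the largest last-minus-first difference. The name greatest_increase is made up for illustration, and a non-empty input list is assumed.

```python
def group_data(lst):
    # Split lst into maximal strictly increasing runs (as in the question).
    sublist = [[lst[0]]]
    for i in range(1, len(lst)):
        if lst[i - 1] < lst[i]:
            sublist[-1].append(lst[i])
        else:
            sublist.append([lst[i]])
    return sublist

def greatest_increase(lst):
    # Difference between the last and first element of each run; keep the largest.
    return max(run[-1] - run[0] for run in group_data(lst))

small = greatest_increase([1, 2, 3, 4, 5, 3, 1, 2, 3])
large = greatest_increase([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 5, 4, 7, 8])
```

A generator expression inside max does the same job as the map call, without building the intermediate list.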
For this problem I think itertools.groupby would be a good choice. Since your final goal is to find the difference of longest consecutive numbers:
from itertools import groupby
max_l = max([len(list(g)) - 1 for k, g in groupby(enumerate([1,2,3,4,5,6,7,8,9,10,1,2,3,5,4,7,8]), key=lambda x: x[0] - x[1])])
print(max_l)
#it will print 9
Explanation:
First, group the numbers by the difference between index and value. For example, [0, 1, 2, 4] produces the keys [0, 0, 0, -1]: the index of 0 is 0, so 0-0=0; for the second element 1-1=0; and for the last one 3-4=-1. Then take the maximum length of the groups. Since you want the difference, I used len(list(g)) - 1. Note that this treats numbers as consecutive only when each value is exactly 1 greater than the previous one.
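To see what the grouping key does, here is a small sketch on the four-element example from the explanation (the variable names are only for illustration):

```python
from itertools import groupby

data = [0, 1, 2, 4]

# Key used for grouping: index minus value.
keys = [i - v for i, v in enumerate(data)]

# Consecutive elements with the same key end up in the same group.
runs = [[v for _, v in group]
        for _, group in groupby(enumerate(data), key=lambda pair: pair[0] - pair[1])]
```

Here keys is [0, 0, 0, -1] and runs is [[0, 1, 2], [4]]. Because the key only stays constant while the values grow by exactly 1, a run such as [1, 2, 3, 5] would be split before the 5, unlike in group_data from the question.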

Code not performing as expected [For in cycle]

Why is this not working? Actual result is [] for any entry.
def non_unique(ints):
    """
    Return a list consisting of only the non-unique elements from the list lst.
    You are given a non-empty list of integers (ints). You should return a
    list consisting of only the non-unique elements in this list. To do so
    you will need to remove all unique elements (elements which are
    contained in a given list only once). When solving this task, do not
    change the order of the list.
    >>> non_unique([1, 2, 3, 1, 3])
    [1, 3, 1, 3]
    >>> non_unique([1, 2, 3, 4, 5])
    []
    >>> non_unique([5, 5, 5, 5, 5])
    [5, 5, 5, 5, 5]
    >>> non_unique([10, 9, 10, 10, 9, 8])
    [10, 9, 10, 10, 9]
    """
    new_list = []
    for x in ints:
        for a in ints:
            if ints.index(x) != ints.index(a):
                if x == a:
                    new_list.append(a)
    return new_list
Working code (not from me):
result = []
for c in ints:
    if ints.count(c) > 1:
        result.append(c)
return result
list.index will return the first index that contains the input parameter, so if x==a is true, then ints.index(x) will always equal ints.index(a). If you want to keep your same code structure, I'd recommend keeping track of the indices within the loop using enumerate, as in:
for x_ind, x in enumerate(ints):
    for a_ind, a in enumerate(ints):
        if x_ind != a_ind:
            if x == a:
                new_list.append(a)
                break  # stop at the first match so each element is added only once
Although, for what it's worth, I think your example of working code is a better way of accomplishing the same task.
Although the example of working code is correct, it suffers from quadratic complexity, which makes it slow for larger lists. I'd prefer something like this:
from nltk.probability import FreqDist
def non_unique(ints):
fd = FreqDist(ints)
return [x for x in ints if fd[x] > 1]
It precomputes a frequency distribution in the first step, and then selects all non-unique elements. Both steps have a O(n) performance characteristic.
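If you'd rather not pull in nltk just for this, the same two-step idea can be sketched with collections.Counter from the standard library (my substitution, not part of the answer above):

```python
from collections import Counter

def non_unique(ints):
    # First pass: count how often each value occurs.
    counts = Counter(ints)
    # Second pass: keep, in original order, only values seen more than once.
    return [x for x in ints if counts[x] > 1]

result = non_unique([10, 9, 10, 10, 9, 8])
```

Counter is a dict subclass, so the lookup counts[x] works the same way as fd[x] does with FreqDist.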

Python map a value to each i-th sublist's element

I'm trying to do the following in python: given a list of lists and an integer i
input = [[1, 2, 3, 4], [1, 2, 3, 4], [5, 6, 7, 8]]
i = 1
I need to obtain another list which has all 1s for the elements of the i-th list, 0 otherwise
output = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
I wrote this code
output = []
for sublist in range(0, len(input)):
    for item in range(0, len(input[sublist])):
        output.append(1 if sublist == i else 0)
and it obviously works, but since I'm a newbie in python I suppose there's a better 'pythonic' way of doing this.
I thought using map could work, but I can't get the index of the list with it.
Creating an extra variable to get the index of the current element during iteration is quite unpythonic. The usual alternative is the built-in enumerate function.
Return an enumerate object. sequence must be a sequence, an iterator,
or some other object which supports iteration. The next() method of
the iterator returned by enumerate() returns a tuple containing a
count (from start which defaults to 0) and the values obtained from
iterating over sequence.
You can use a list comprehension with a double loop inside it for a concise one-liner:
input_seq = [[1, 2, 3, 4], [1, 2, 3, 4], [5, 6, 7, 8]]
i = 1
o = [1 if idx == i else 0 for idx, l in enumerate(input_seq) for _ in l]
Alternatively,
o = [int(idx == i) for idx, l in enumerate(input_seq) for _ in l]
The underscore is just a throwaway name, since in this case we don't care about the actual values stored in the input sublists.
Here's a 1-liner, but it's not really obvious:
output = [int(j == i) for j, sublist in enumerate(input) for _ in sublist]
Somewhat more obvious:
output = []
for j, sublist in enumerate(input):
    output.extend([int(i == j)] * len(sublist))
Then "0 or 1?" is computed only once per sublist, which may or may not be more efficient.

Randomly get the 3 minimum values of an array with repeated values in Python

I have an array my_array and, for specific reasons, I want to ignore the values -5 and -10 in it (yes, the example below contains no -10, but other arrays I have to handle do), get the indices of the three minimum values of the array, and append them to a new list called lista_indices_candidatos.
This is my code.
import numpy as np
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
a = np.array(my_array)
indices = a.argsort()
indices = indices[a[indices] != -5]
indices = indices[a[indices] != -10]
lista_indices_candidatos = []
for i in indices[:3]:
    lista_indices_candidatos.append(i)
print lista_indices_candidatos
This gives me the indices of the 3 minimum values, [6, 0, 3], from the array [4, -5, 10, 4, 4, 4, 0, 4, 4].
The thing is that, if there are repeated values, this always gives me the first of them (the first 4 at index 0 and the second 4 at index 3), ignoring the rest of the 4s in the array.
How can I change the code to pick among the three minimum values at random, instead of always taking the first three?
myArray = [4, -5, 10, 4, 4, 4, 0, 4, 4]
myUniqueArray = list(set(myArray))
myUniqueArray.sort()
indices = [myArray.index(myUniqueArray[0]), myArray.index(myUniqueArray[1]), myArray.index(myUniqueArray[2])]
.index would not give you a random index, in the sense that it will always return the same value for a given input list, but you could play with that part.
I haven't introduced randomness, because I don't really see the point of doing this.
If you need the 3 lowest non-negative values:
sorted([x for x in my_array if x >= 0])[:3]
If you need the three lowest non-negative values and their initial indices:
sorted([(x,idx) for idx,x in enumerate(my_array) if x >= 0], key=lambda t: t[0])[:3]
If you just need the initial indices of the 3 lowest non-negative values:
[i for x,i in sorted([(x,idx) for idx,x in enumerate(my_array) if x >= 0], key=lambda t: t[0])[:3]]
My take is that you want to get 3 random indices for values in my_array, excluding [-10, -5], the 3 random indices must be chosen within the index list of the 3 lowest values of the remaining set, right?
What about this:
from random import sample
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
sample([i for i, x in enumerate(my_array) if x in sorted(set(my_array) - {-10, -5})[:3]], 3)
Factoring out the limited set of values, that would be:
from random import sample
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
filtered_list = sorted(set(my_array) - {-10, -5})[:3]
# Print 3 sample indices from my_array
print sample([i for i, x in enumerate(my_array) if x in filtered_list], 3)
Ok, I'm also not sure what you are trying to achieve. I like the simplicity of Nasha's answer, but I think you want to always have the index of the 0 in the result set. The way I understand you, you want the index of the lowest three values and only if one of those values is listed more than once, do you want to pick randomly from those.
Here's my attempt at a solution:
import random
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
my_dict = {}
lista_indices_candidatos = []
for index, item in enumerate(my_array):
    try:
        my_dict[item] = my_dict[item] + [index]
    except KeyError:
        my_dict[item] = [index]
for i in [x for x in sorted(my_array) if x != -10 and x != -5][:3]:
    lista_indices_candidatos.append(random.choice(my_dict[i]))
print lista_indices_candidatos
In this solution, I build a dictionary with all the values from my_array as keys. Each value of the dictionary is a list of the indexes that number has in my_array. I then use a list comprehension and slicing to get the three lowest values to iterate over in the for loop. There, I can randomly pick an index for a given value by randomly selecting from my_dict.
I bet there are better ways to achieve what you want to achieve, though. Maybe you can let us know what it is you are trying to do so we can improve on our answers.
After reading your comment, I see that you do not actually want a completely random selection, but instead a random selection without repetition. So here's an updated version.
import random
my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
my_dict = {}
lista_indices_candidatos = []
for index, item in enumerate(my_array):
    try:
        my_dict[item] = my_dict[item] + [index]
    except KeyError:
        my_dict[item] = [index]
for l in my_dict:
    random.shuffle(my_dict[l])
for i in [x for x in sorted(my_array) if x != -10 and x != -5][:3]:
    lista_indices_candidatos.append(my_dict[i].pop())
print lista_indices_candidatos
How about this one:
import random
def eachIndexSorted(a): # ... without -5 and -10
for value in sorted(set(a) - { -5, -10 }):
indexes = [ i for i in range(len(a)) if a[i] == value ]
random.shuffle(indexes)
for i in indexes:
yield i
def firstN(iterator, n):
for i in range(n):
yield iterator.next()
print list(firstN(eachIndexSorted(my_array), 3))
If you have very large data, then sorting the complete set might be too costly; finding each next minimum iteratively might then be a better approach. (Ask for more details if this aspect is unclear and important for you.)
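For small to medium lists, the shuffle-within-ties idea from the answers above can also be sketched in a few lines of Python 3. This is only one possible reading of the question (random choice among equal values, no repeated indices); the function name random_min_indices and its parameters are made up for illustration.

```python
import random

def random_min_indices(values, n=3, ignored=(-5, -10)):
    # Indices of all values that are not in the ignored set.
    candidates = [i for i, v in enumerate(values) if v not in ignored]
    # Shuffle first so that equal values end up in random order ...
    random.shuffle(candidates)
    # ... then sort by value; list.sort is stable, so ties keep the shuffled order.
    candidates.sort(key=lambda i: values[i])
    return candidates[:n]

my_array = [4, -5, 10, 4, 4, 4, 0, 4, 4]
picked = random_min_indices(my_array)
```

For my_array the first index is always 6 (the 0), and the other two are drawn at random from the indices of the 4s.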
