How to merge repeated elements in a list in Python?


I have a list of coordinates like:
list_coordinate = [(9,0),(9,1),(9,3) ... (53,0),(53,1),(53,3)...(54,0),(54,1)..]
value = []
for m in range(0, len(list_coordinate)):
    if m != len(list_coordinate)-1:
        if list_coordinate[m][0] == list_coordinate[m+1][0]:
            value.append(list_coordinate[m][0])
Output of this code:
value = [9,9 ,9,...,53,53,53,...,54,54,54,54...]
I want to merge the repeated elements in this value list so that each one appears only once.
Expected output:
[9,53,54]

If you prefer one-liners, you can do it like this:
list(set(map(lambda x: x[0], list_coordinate)))
It will output:
[9, 53, 54]
Note: As set is being used in the code, ordering of the elements is not guaranteed here.

You can use itertools.groupby:
from itertools import groupby
value = [9,9 ,9,53,53,53,54,54,54,54]
g = [k for k,_ in groupby(value)]
print(g)
which produces
[9, 53, 54]
and it is guaranteed to be in the same order as the input list (if it matters).
Basically
groupby(iterable[, keyfunc])
groups the elements in the iterable, starting a new group whenever the value of the key function changes.
If the key function is omitted, the identity function is assumed, and the key for the group will be each element encountered.
So as long as the elements in value stay the same, they will be grouped under the same key, which is the element itself.
Note: this works for contiguous repetitions only. If you want to get rid of duplicates that re-occur later in the list, you should sort the list first (as the groupby docs explain).
As per your comment below, in case you want to operate on the coordinates directly:
list_coordinate = [(9,0), (9,1), (9,3), (53,0), (53,1), (53,3), (54,0), (54,1)]
g = [k for k,_ in groupby(list_coordinate, lambda x: x[0])]
print(g)
produces the same output
[9, 53, 54]
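To make the note about contiguous repetitions concrete, here is a small sketch. The input list is a made-up example (not from the question) in which equal values are not adjacent.

```python
from itertools import groupby

# Hypothetical input with non-adjacent repeats.
values = [9, 9, 53, 9, 54, 53]

# groupby only collapses runs of adjacent equal items.
adjacent_only = [k for k, _ in groupby(values)]
print(adjacent_only)  # [9, 53, 9, 54, 53]

# Sorting first makes equal items adjacent, at the cost of the original order.
fully_merged = [k for k, _ in groupby(sorted(values))]
print(fully_merged)  # [9, 53, 54]
```

If the original order has to be kept as well, sorting is not an option and a different approach is needed.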

You could use an OrderedDict for both of your cases. Firstly for just the x coordinates:
from collections import OrderedDict
list_coords = [(9, 0), (9, 1), (9, 3), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
merged = OrderedDict()
for coord in list_coords:
    merged[coord[0]] = 1
print(list(merged.keys()))
Giving:
[9, 53, 54]
Note, if for example (9, 0) was repeated later on, it would not change the output.
Secondly, for whole coordinates. Note, the data has (10, 0) repeated 3 times:
list_coords = [(9, 0), (9, 1), (9, 3), (10, 0), (10, 0), (10, 0), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
merged = OrderedDict()
for coord in list_coords:
    merged[coord] = 1
print(list(merged.keys()))
Giving:
[(9, 0), (9, 1), (9, 3), (10, 0), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]

Why don't you use a set:
{ k[0] for k in list_coordinate }
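If the order of first appearance matters, one possible alternative to a set is dict.fromkeys, since plain dicts keep insertion order from Python 3.7 on. The sample list below is a made-up one with non-adjacent repeats, not the asker's data.

```python
# Hypothetical sample in which the same x value re-occurs later in the list.
list_coordinate = [(9, 0), (53, 0), (9, 1), (54, 0), (53, 3)]

# dict keys are unique and remember insertion order (Python 3.7+).
unique_x = list(dict.fromkeys(x for x, _ in list_coordinate))
print(unique_x)  # [9, 53, 54]
```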

Related

Combining a list of onset-offset tuples if the previous element's offset equals the next element's onset

Is there any standard library Python or Numpy operation for doing the following:
my_array = [(1, 3), (3, 4), (4, 5), (5, 7), (10, 12), (12, 17), (21, 24)]
new_array = magic_function(my_array)
print(new_array)
> [(1, 7), (10, 17), (21, 24)]
I feel like something in itertools should be able to do this, seems like something a lot of people would use. We can assume the list is sorted by onset times already. It wouldn't be hard to do that anyway, you'd just use the sorted function with a key on the first element.
Apologies if this question has already been asked, wasn't sure how to word this problem, but this could be seen as a list of onsets and offsets and I want to merge elements with adjacent/equivalent timing.
EDIT: Inspired by @chris-charley's answer below, which relies on a third-party module, I just wrote up a small function which does what I wanted.
import re
def magic_function(mylist):
    # convert list to intspan
    intspan = ','.join([f'{int(a)}-{int(b)}' for (a,b) in mylist])
    # collapse adjacent ranges; the lookahead (?=-) stops an offset like 3 from matching the start of an onset like 31
    intspan = re.sub(r'-(\d+),\1(?=-)', '', intspan)
    # convert back to list
    return [tuple(map(int, _.split('-'))) for _ in intspan.split(',')]
Here is the same function generalized for floats also:
import re
def magic_function(mylist):
    # convert list to floatspan (assumes non-negative values that print in plain decimal notation)
    floatspan = ','.join([f'{float(a)}-{float(b)}' for (a,b) in mylist])
    # collapse adjacent ranges
    floatspan = re.sub(r'-(\d+\.\d+),\1(?=-)', '', floatspan)
    # convert back to list
    return [tuple(map(float, _.split('-'))) for _ in floatspan.split(',')]

intspan has the methods from_ranges() and ranges() to produce the results you need.
>>> from intspan import intspan
>>> my_array = [(1, 3), (3, 4), (4, 5), (5, 7), (10, 12), (12, 17), (21, 24)]
>>> intspan.from_ranges(my_array).ranges()
[(1, 7), (10, 17), (21, 24)]
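For comparison, here is a dependency-free sketch that avoids both the third-party package and the string round trip. The name merge_adjacent is made up for illustration, and the function assumes the list is already sorted by onset, as the question states.

```python
def merge_adjacent(intervals):
    """Merge (onset, offset) pairs where one offset equals the next onset.

    Assumes the input is already sorted by onset.
    """
    merged = []
    for onset, offset in intervals:
        if merged and merged[-1][1] == onset:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], offset)
        else:
            merged.append((onset, offset))
    return merged

my_array = [(1, 3), (3, 4), (4, 5), (5, 7), (10, 12), (12, 17), (21, 24)]
print(merge_adjacent(my_array))  # [(1, 7), (10, 17), (21, 24)]
```

Because it compares the numbers directly, the same function also handles floats and negative values.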

How to add N OrderedDict() in Python

Assuming that I have 2 OrderedDict(), I can get the result of the (+) operation by doing the following action:
dict1 = OrderedDict([(52, 0),
(53, 0),
(1, 0),
(2, 0),
(3, 0),
(4, 0),
(5, 0),
(6, 0),
(7, 0),
(8, 0),
(9, 0),
(10, 0),
(11, 1)])
dict2 = OrderedDict([(52, 0),
(53, 0),
(1, 0),
(2, 5),
(3, 0),
(4, 0),
(5, 0),
(6, 1),
(7, 0),
(8, 0),
(9, 0),
(10, 1),
(11, 1)])
dict3 = OrderedDict((k, dict1[k] + dict2[k]) for k in dict1 if k in dict2)
print(dict3)
OrderedDict([(52, 0),
(53, 0),
(1, 0),
(2, 5),
(3, 0),
(4, 0),
(5, 0),
(6, 1),
(7, 0),
(8, 0),
(9, 0),
(10, 1),
(11, 2)])
My question is: how can I generalize the above action so I can get the (+) operation result for N OrderedDict()?

By testing each key for membership of each other dict you're essentially performing an operation of a set intersection, but you can't actually use set intersections because sets are unordered in Python.
You can work around this limitation by installing the ordered-set package, so that you can use the OrderedSet.intersection method to obtain common keys among the dicts ordered by keys in the first dict, which you can then iterate over to construct a new OrderedDict with each value being the sum of the values of the current key from all dicts:
from collections import OrderedDict
from ordered_set import OrderedSet
dicts = [dict1, dict2]
common_keys = OrderedSet.intersection(*dicts)
print(OrderedDict((k, sum(d[k] for d in dicts)) for k in common_keys))
Demo: https://replit.com/#blhsing/FlawlessGrowlingAccounting

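Some naive approach using map-reduce. Note that I didn't test the following code, so it might need some adjustments.
import operator
from collections import OrderedDict
from functools import reduce
dicts = [dict1, dict2, dict3, dict4]
dicts_keys = map(lambda d: set(d.keys()), dicts)
common_keys = set.intersection(*dicts_keys)
sum_dict = OrderedDict(
    (k, reduce(operator.add, map(lambda d: d[k], dicts)))
    for k in common_keys)
Note that common_keys is a plain set here, so the key order of the result is not guaranteed to match the input dicts.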

In case you don't want to install an external package, a similar result can be achieved by using this function:
from collections import OrderedDict
def add_dicts(*args):
    items_list = list()
    for k in args[0]:
        if all([k in arg for arg in args[1:]]):
            value = 0
            for arg in args:
                value += arg[k]
            items_list.append((k, value))
    return OrderedDict(items_list)
To call it:
dict3 = add_dicts(dict1, dict2)
dict4 = add_dicts(dict1, dict2, dict3)
If you want to call it with a list of dictionaries:
dict_list=[dict1, dict2]
dict5 = add_dicts(*dict_list)
More information about *args can be found in this answer
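The same idea can be written more compactly with all() and sum(). This is only a sketch: sum_common is a hypothetical name, and the three small dicts are made-up sample data rather than the ones from the question.

```python
from collections import OrderedDict

def sum_common(*dicts):
    """Sum values for keys present in every dict, in the first dict's key order."""
    if not dicts:
        return OrderedDict()
    first, rest = dicts[0], dicts[1:]
    return OrderedDict(
        (k, sum(d[k] for d in dicts))
        for k in first
        if all(k in d for d in rest)
    )

# Made-up sample data; key 1 is missing from d3 and key 99 only exists in d3.
d1 = OrderedDict([(52, 0), (1, 0), (2, 0), (11, 1)])
d2 = OrderedDict([(52, 0), (1, 0), (2, 5), (11, 1)])
d3 = OrderedDict([(52, 1), (2, 1), (11, 1), (99, 7)])
print(sum_common(d1, d2, d3))  # keys 52, 2 and 11 survive, in d1's order
```

Iterating over the first dict rather than over a set intersection is what keeps the key order stable.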

Categorizing a list of tuples in Python

Help me please. I'm trying to find the fastest, most logical way to categorize a list of tuples by the value of each tuple's first element.
For example, I have a list of tuples like
a = [(378, 123), (100, 12), (112, 23), (145, 14), (165, 34), (178, 45), (227, 32), (234, 12), (356, 15)] # and more and more
How can I dynamically categorize it into groups like
100to150 = [(100, 12), (112, 23), (145, 14)]
150to200 = [(165, 34), (178, 45)]
200to250 = [(227, 32), (234, 12)]
350to400 = [(378, 123), (356, 15)]
Here I used a step of 50, but of course I want to be able to change it. The output format doesn't matter; it could be a list of lists, for example
[[(100, 112), (124, 145)], [(165, 12), (178, 12)], [(234, 14)], [(356, 65)]] (random data), or a list with a tuple. I just want to be able to get the length of each category and print the category out. Thank you very much.

You can try something like this. Of course, this gives you back a categorized dictionary, not separate variables.
a = [(378, 123), (100, 12), (112, 23), (145, 14), (165, 34), (178, 45), (227, 32), (234, 12), (356, 15)] # and more and more
def categorize(array, step=50):
    d = dict()
    for e in array:
        from_n = e[0]//step*step
        s = f'{from_n}to{from_n+step}'
        if s not in d:
            d[s] = []
        d[s].append(e)
    return d
print(categorize(a))
Output:
{'350to400': [(378, 123), (356, 15)], '100to150': [(100, 12), (112, 23), (145, 14)], '150to200': [(165, 34), (178, 45)], '200to250': [(227, 32), (234, 12)]}

l = [x for x in a if 100 <= x[0] < 150]
I should say this is the minimum you need to get going. If you want the full solution, you could put this into a function where your low and high (100 and 150 in this example) are arguments. You could even have a list of lows/highs, loop through them all, and collect the output as a list of lists of tuples.
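A minimal sketch of that suggestion follows. The function name in_range and the bounds list are made up for illustration, and each range is treated as half-open so that a value such as 150 lands in exactly one group.

```python
def in_range(pairs, low, high):
    """Return the tuples whose first element falls in [low, high)."""
    return [p for p in pairs if low <= p[0] < high]

a = [(378, 123), (100, 12), (112, 23), (145, 14), (165, 34),
     (178, 45), (227, 32), (234, 12), (356, 15)]

# Hypothetical list of (low, high) bounds to loop over.
bounds = [(100, 150), (150, 200), (200, 250), (350, 400)]
groups = [in_range(a, low, high) for low, high in bounds]

for (low, high), group in zip(bounds, groups):
    print(f"{low}to{high}: {len(group)} items {group}")
```

This scans the whole list once per range, so it suits a handful of ranges better than a large number of them.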

You can try something like this, using a dictionary to store the grouped values so they can be looked up instantly later:
def categorize_by_first(pairs, step=50):
    d = {}
    for pair in pairs:
        range_start = (pair[0] // step) * step
        dict_key_name = f"{range_start}_{range_start + step}"
        if dict_key_name not in d:
            d[dict_key_name] = []
        d[dict_key_name].append(pair)
    return d
Output:
{'350_400': [(378, 123), (356, 15)],
'100_150': [(100, 12), (112, 23), (145, 14)],
'150_200': [(165, 34), (178, 45)],
'200_250': [(227, 32), (234, 12)]}
The time complexity of grouping is O(n), since we iterate over the input list only once.
The time complexity of getting an element from a dictionary is O(1) on average.
So that should be efficient.
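As a variation on the single-pass grouping, collections.defaultdict removes the need for the "create the list if missing" check. In this sketch bucket_by_first is a hypothetical name, and the keys are the numeric range starts rather than strings, which makes the categories easy to sort and count.

```python
from collections import defaultdict

def bucket_by_first(pairs, step=50):
    """Group pairs by the start of the step-wide range their first element falls in."""
    buckets = defaultdict(list)
    for pair in pairs:
        buckets[(pair[0] // step) * step].append(pair)
    return dict(buckets)

a = [(378, 123), (100, 12), (112, 23), (145, 14), (165, 34),
     (178, 45), (227, 32), (234, 12), (356, 15)]
step = 50
buckets = bucket_by_first(a, step)

# Print each category with its length, in ascending order of range start.
for start in sorted(buckets):
    print(f"{start}to{start + step}: {len(buckets[start])} items {buckets[start]}")
```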

How to get the highest 4 tuple values?

I am trying to get the highest 4 values in a list of tuples and put them into a new list. However, if there are two tuples with the same value I want to take the one with the lowest number.
The list originally looks like this:
[(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)...]
And I want the new list to look like this:
[(9,20), (3,16), (54, 13), (2,10)]
This is my current code. Any suggestions?
sorted_y = sorted(sorted_x, key=lambda t: t[1], reverse=True)[:5]
sorted_z = []
while n < 4:
    n = 0
    x = 0
    y = 0
    if sorted_y[x][y] > sorted_y[x+1][y]:
        sorted_z.append(sorted_y[x][y])
        print(sorted_z)
        print(n)
        n = n + 1
    elif sorted_y[x][y] == sorted_y[x+1][y]:
        a = sorted_y[x]
        b = sorted_y[x+1]
        if a > b:
            sorted_z.append(sorted_y[x+1][y])
        else:
            sorted_z.append(sorted_y[x][y])
        n = n + 1
        print(sorted_z)
        print(n)
Edit: To clarify, I want the tuples with the highest second values, and if two tuples have the same second value I want the one with the lowest first value.

How about groupby?
from itertools import groupby, islice
from operator import itemgetter
data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
pre_sorted = sorted(data, key=itemgetter(1), reverse=True)
result = [sorted(group, key=itemgetter(0))[0] for key, group in islice(groupby(pre_sorted, key=itemgetter(1)), 4)]
print(result)
Output:
[(9, 20), (3, 16), (54, 13), (2, 10)]
Explanation:
This first sorts the data by the second element's value in descending order. groupby then puts them into groups where each tuple in the group has the same value for the second element.
Using islice, we take the top four groups and sort each by the value of the first element in ascending order. Taking the first value of each group, we arrive at our answer.

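You can try this:
l = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
asv = set([i[1] for i in l]) # The set of unique second elements
new_l = [(min([i[0] for i in l if i[1]==k]),k) for k in asv]
Output:
[(3, 16), (2, 10), (9, 20), (54, 13)]
Note that a set is unordered, so new_l comes out in arbitrary order and contains every distinct second value; sort it by the second element in descending order and slice the first four to get the final result.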
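Both requirements (highest second values first, lowest first value on ties) can also be handled with a plain dictionary. This is only a sketch under that reading of the question, and top_by_value is a made-up name.

```python
def top_by_value(pairs, n=4):
    """Keep the lowest first element per distinct second element, then take the n highest."""
    best = {}
    for number, value in pairs:
        # Remember the smallest number seen for each value.
        if value not in best or number < best[value]:
            best[value] = number
    ranked = sorted(best.items(), key=lambda kv: kv[0], reverse=True)
    return [(number, value) for value, number in ranked[:n]]

data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
print(top_by_value(data))  # [(9, 20), (3, 16), (54, 13), (2, 10)]
```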

Returning a value from a python for loop function

I have edited my question. The code contains an is_prime function which indicates whether a number is prime. I'm trying to extract all the prime values in the range 3 to 65.
a = []
b = []
c = []
d = []
def ll_prime(n_start, n_end):
    for number in range(n_start, n_end):
        if is_prime(number) ==True:
            a.append(number)
            b.append(1)
        else:
            c.append(number)
            d.append(0)
        return (list(zip(a,b)))
The above code runs fine but when I call the function ll_prime(3,65) it gives me the following error:
TypeError Traceback (most recent call last)
<ipython-input-498-1a1a58988fa7> in <module>()
----> 1 ll_prime(3,65)
2 #type(tyl)
3 #list_values = [ v for v in tyl.values()]
<ipython-input-497-d99272d4b655> in ll_prime(n_start, n_end)
11 c.append(number)
12 d.append(0)
---> 13 return (list(zip(a,b)))
TypeError: 'list' object is not callable
Can anyone explain why I'm getting this error? I have searched previous questions on Stack Overflow but none were helpful in my case.
I want the result as: [(3,1),(5,1),(7,1)] etc.

You could use a list comprehension:
def l1_prime():
    return [(i, 1) for i in mb]

You can use repeat from itertools and zip this with your list. In Python 3, zip returns a lazy iterator, so wrap it in list() to see the pairs:
>>> from itertools import repeat
>>> mb = [3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61]
>>> list(zip(mb, repeat(1)))
[(3, 1), (5, 1), (7, 1), (11, 1), (13, 1), (17, 1), (19, 1), (23, 1), (29, 1), (31, 1), (37, 1), (41, 1), (43, 1), (47, 1), (53, 1), (59, 1), (61, 1)]
Or you can use a list comprehension like this:
>>> [(x, 1) for x in mb]
[(3, 1), (5, 1), (7, 1), (11, 1), (13, 1), (17, 1), (19, 1), (23, 1), (29, 1), (31, 1), (37, 1), (41, 1), (43, 1), (47, 1), (53, 1), (59, 1), (61, 1)]
To your solution: In your solution you return your result after the first loop iteration. So it doesn't have the right values yet. Try to move the return outside of your loop.

One issue is that your return command is inside your for loop, so it will only execute once. That could be why you aren't getting what you want. When I ran your code, it returned (3,1), which was only the first set of items. That makes sense if it is only running through the for loop once and then returning. Try this:
mb = [3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61]
list1 = []
list2 = []
def prime():
    for i in mb:
        list1.append(i)
        list2.append(1)
    print(str(len(list1)))
    print(str(len(list2)))
    return (list(zip(list1,list2)))
When I run that, I get the correct answer.
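Putting the pieces together, a sketch of the function with the return placed after the loop could look like this. The is_prime below is a hypothetical stand-in, since the asker's version is not shown.

```python
def is_prime(n):
    """Hypothetical stand-in for the asker's is_prime (simple trial division)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def ll_prime(n_start, n_end):
    # Build the pairs locally and return once, after every number has been checked.
    return [(number, 1) for number in range(n_start, n_end) if is_prime(number)]

print(ll_prime(3, 20))  # [(3, 1), (5, 1), (7, 1), (11, 1), (13, 1), (17, 1), (19, 1)]
```

As for the TypeError itself, "'list' object is not callable" is what Python raises when the name list has been rebound to a list object, most likely by an earlier assignment such as list = [...] in the same notebook session. Running del list (or restarting the kernel) makes the built-in visible again, and the version above sidesteps the issue by not calling list() at all.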
