Grouping same list value into one list value - python

I have a question about collapsing repeated list values into a single value. For example, I have this list:
data_list = ['A','A','B','B','B','C','C','C','C']
and I want to turn it into this:
data_list = ['A', 'B', 'C']
I have tried using itertools.groupby, but I still cannot get the result I want:
from itertools import groupby
data_list = ['A','A','B','B','B','C','C','C','C']
data_group = [(key, len(list(group))) for key, group in groupby(data_list)]
print(data_group)
The expected output is data_group = ['A', 'B', 'C']
The actual result is data_group = [('A', 2), ('B', 3), ('C', 4)]

Method 1
You can use NumPy to get the unique values (np.unique returns them sorted):
import numpy as np
data_list = np.array(['A','A','B','B','B','C','C','C','C'])
np.unique(data_list)
Method 2
You can use a set to get the unique values, but the result is not guaranteed to keep the original order.
new_list = list(set(data_list))
new_list
I hope this helps.

Try this code:
mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)
On Python versions before 3.7, where a plain dict does not guarantee insertion order, you can use OrderedDict to keep the order:
from collections import OrderedDict
mylist = ['A','A','B','B','B','C','C','C','C']
mylist = list(OrderedDict.fromkeys(mylist))
print(mylist)

Have you tried looking into sets?
You can first convert your original data_list into a set using set(data_list), then convert that back into a list.
data_list = ['A','A','B','B','B','C','C','C','C']
print(list(set(data_list)))
#OUTPUT (the order may vary):
['A', 'B', 'C']
Sets only hold unique values, so running set() on data_list leaves only the unique values. In Python, set literals are written with curly brackets { }, like dicts, but sets do not contain key:value pairs. The list() call converts the set back into a list so you can treat it like a list afterwards.

A good idea is to use python sets.
Per documentation, a part of the description is:
"A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries."
For example:
my_list = [1,1,2,2,3,3]
my_set = set(my_list)
print(my_set)
print(type(my_set))
Will output:
{1, 2, 3}
<class 'set'>
Note that the resulting data type is set.
So, if you want your result to be a list, you can convert it back into one:
unique_values = list(set(my_list))
And if you are planning to use that a lot in your code, a function would help:
def giveUnique(x):
    return list(set(x))
my_list = giveUnique(my_list)
This replaces my_list with a list containing only the unique values.
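Because giveUnique goes through a set, the order of the result is not guaranteed. If the order of first appearance matters, a minimal sketch of an order-preserving variant could look like this (unique_in_order is a hypothetical name, and the sketch assumes Python 3.7+, where dicts keep insertion order):

```python
def unique_in_order(items):
    """Return the distinct items, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (guaranteed from Python 3.7)
    return list(dict.fromkeys(items))


my_list = [3, 3, 1, 1, 2, 2]
print(unique_in_order(my_list))  # [3, 1, 2]
```

Like set(), this only works when the items are hashable.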

Just adapt the itertools.groupby solution you have (found?) to only use the key:
>>> data_list = [A, A, B, B, B, C, C, C, C] # with A, B, C = "ABC"
>>> [(key, len(list(group))) for key, group in groupby(data_list)]
[('A', 2), ('B', 3), ('C', 4)]
>>> [key for key, group in groupby(data_list)]
['A', 'B', 'C']
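One caveat worth keeping in mind: groupby only merges values that sit next to each other, so this works here because equal values are already adjacent. A small sketch with made-up data showing the difference:

```python
from itertools import groupby

data_list = ["A", "A", "B", "A", "C", "C"]

# groupby only merges neighbours, so the second run of "A" is a new group
runs = [key for key, _ in groupby(data_list)]
print(runs)  # ['A', 'B', 'A', 'C']

# Sorting first makes equal values adjacent (the original order is lost)
distinct = [key for key, _ in groupby(sorted(data_list))]
print(distinct)  # ['A', 'B', 'C']
```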

Related

How to join certain elements of a list?

I have a list that looks like this:
lst = [(1,'X1', 256),(1,'X2', 356),(2,'X3', 223)]
The first item of each tuple is an ID, and I want to merge the items of the tuples that share the same ID.
For example, I want the list to look like this:
lst = [(1,('X1','X2'),(256,356)),(2,'X3',223)]
What is the easiest way to do this?
I have tried some solutions based on my own logic, but they did not work out.
Use a dictionary whose keys are the IDs, so you can combine all the elements with the same ID.
from collections import defaultdict
lst = [(1,'X1', 256),(1,'X2', 356),(2,'X3', 223)]
result_dict = defaultdict(lambda: [[], []])
for id, item1, item2 in lst:
    result_dict[id][0].append(item1)
    result_dict[id][1].append(item2)
result = [(id, *map(tuple, vals)) for id, vals in result_dict.items()]
print(result)
Output is:
[(1, ('X1', 'X2'), (256, 356)), (2, ('X3',), (223,))]
This can be done with a single-line list comprehension (after obtaining a set of ids), and for a general case of having multiple fields other than the id (and not just 2 other fields):
lst = [(1,'X1', 256),(1,'X2', 356),(2,'X3', 223)]
ids = {x[0] for x in lst}
result = [(id, list(zip(*[x[1:] for x in lst if x[0] == id]))) for id in ids]
print(result)
# [(1, [('X1', 'X2'), (256, 356)]), (2, [('X3',), (223,)])]
So there's no need to go through a dictionary stage which is then turned into a list, and there's also no need to hardcode indexing of two elements and other such limitations.
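One thing to watch with the set-based version is that a set has no guaranteed order, so the IDs may not come out in the order they appear in lst. A sketch of the same idea that keeps first-appearance order and produces the flat tuple shape from the first answer (assuming Python 3.7+ for ordered dicts; id_ and row are just illustrative names):

```python
lst = [(1, 'X1', 256), (1, 'X2', 356), (2, 'X3', 223)]

# dict.fromkeys keeps each ID once, in order of first appearance
ids = list(dict.fromkeys(row[0] for row in lst))

# For each ID, transpose the remaining fields of its rows with zip(*...)
result = [
    (id_, *zip(*[row[1:] for row in lst if row[0] == id_]))
    for id_ in ids
]
print(result)
# [(1, ('X1', 'X2'), (256, 356)), (2, ('X3',), (223,))]
```

Note that this still rescans lst once per ID, so for long lists the dictionary approach is likely the better fit.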

Identify distinct elements in a list and map it to a corresponding index in another list in python

I have 2 different lists:
l1 = ['a','b','a','e','b','c','a','d']
l2 = ['t1','t2','t3','t4','t5','t6','t7','t8']
The lengths of l1 and l2 will always be the same. They're in fact logical mappings - each item in l1 corresponds to a value in l2.
I wanted to identify distinct elements in l1. I did that using set and list comprehension as follows:
used = set()
distl1 = [x for x in l1 if x not in used and (used.add(x) or True)]
Here, the output will be:
distl1 = ['a','b','e','c','d']
which is nothing but the first occurrence of every distinct element.
Now, how do I build a list distl2 so that I get the output as the value in l2 that corresponds to the first occurrence's value i.e., distl1?
distl2 = ['t1','t2','t4','t6','t8']
My idea is to use an OrderedDict to build a mapping of (key, value) pairs corresponding to the elements of l1 and l2 and then extract the values from that dict as a list.
>>> from collections import OrderedDict
>>>
>>> l1 = ['a','b','a','e','d','c','a','b']
>>> l2 = ['t1','t2','t3','t4','t5','t6','t7','t8']
>>>
>>> d = OrderedDict()
>>> for k, v in zip(l1, l2):
...     if k not in d:  # <--- check if this key has already been seen!
...         d[k] = v
...
>>> distl2 = list(d.values())
>>> distl2
['t1', 't2', 't4', 't5', 't6']
Note for Python 3.7+ users: regular dicts are guaranteed to remember their key insertion order, so you can omit importing the OrderedDict.
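As a sketch of what that looks like with a plain dict (assuming Python 3.7+, and using the lists from the question; first_seen is a hypothetical name), setdefault can replace the explicit membership check:

```python
l1 = ['a', 'b', 'a', 'e', 'b', 'c', 'a', 'd']
l2 = ['t1', 't2', 't3', 't4', 't5', 't6', 't7', 't8']

first_seen = {}
for key, value in zip(l1, l2):
    # setdefault stores the value only if the key is not present yet
    first_seen.setdefault(key, value)

distl1 = list(first_seen)           # keys, in order of first appearance
distl2 = list(first_seen.values())  # the l2 value paired with each first occurrence
print(distl1)  # ['a', 'b', 'e', 'c', 'd']
print(distl2)  # ['t1', 't2', 't4', 't6', 't8']
```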
You can also do this:
distl2 = [l2[l1.index(key)] for key in distl1]
Python 3.6+
Dictionaries are ordered in Python 3.6+, as an implementation detail in 3.6 and confirmed in 3.7+. So in this case you can use dict with an iterable which ignores duplicates. To ignore duplicates, you can use the itertools unique_everseen recipe, also available via 3rd party more_itertools.unique_everseen or toolz.unique:
from operator import itemgetter
from toolz import unique
l1 = ['a','b','a','e','b','c','a','d']
l2 = ['t1','t2','t3','t4','t5','t6','t7','t8']
keys, values = zip(*dict(unique(zip(l1, l2), key=itemgetter(0))).items())
print(keys)
('a', 'b', 'e', 'c', 'd')
print(values)
('t1', 't2', 't4', 't6', 't8')
Python 2.7
You can use collections.OrderedDict instead of dict for Python 2.7, where dictionaries are not ordered:
from collections import OrderedDict
keys, values = zip(*OrderedDict(unique(zip(l1, l2), key=itemgetter(0))).items())
The question doesn't say if you need to preserve the order. If not, list of unique values of l1 would be:
distl1 = list(set(l1))
And the corresponding values of l2:
distl2 = [l2[l1.index(value)] for value in distl1]
(where index() always returns the first occurrence)
The resulting lists will keep your logical mapping, in arbitrary order, for example:
['b', 'e', 'c', 'd', 'a']
['t2', 't4', 't6', 't8', 't1']
EDIT:
Another approach (no dictionaries, no index() in a loop, order preserved, 2.7 friendly):
l1 = ['a','b','a','e','b','c','a','d']
l2 = ['t1','t2','t3','t4','t5','t6','t7','t8']
distl1 = []
distl2 = []
for i, val in enumerate(l1):
    if val not in distl1:
        distl1.append(val)
        distl2.append(l2[i])

Multi Dimensional List - Sum Integer Element X by Common String Element Y

I have a multi dimensional list:
multiDimList = [['a',1],['a',1],['a',1],['b',2],['c',3],['c',3]]
I'm trying to sum the instances of element [1] where element [0] is common.
To put it more clearly, my desired output is another multi dimensional list:
multiDimListSum = [['a',3],['b',2],['c',6]]
I see I can access, say the value '2' in multiDimList by
x = multiDimList [3][1]
so I can grab the individual elements, and could probably build some sort of function to do this job, but it'd be disgusting.
Does anyone have a suggestion of how to do this pythonically?
Assuming your actual sequence has similar elements grouped together as in your example (all instances of 'a', 'b' etc. together), you can use itertools.groupby() and operator.itemgetter():
from itertools import groupby
from operator import itemgetter
[[k, sum(v[1] for v in g)] for k, g in groupby(multiDimList, itemgetter(0))]
# result: [['a', 3], ['b', 2], ['c', 6]]
Zero Piraeus's answer covers the case when field entries are grouped in order. If they're not, then the following is short and reasonably efficient.
from collections import Counter
from functools import reduce  # reduce is not a builtin in Python 3
reduce(lambda c, x: c.update({x[0]: x[1]}) or c, multiDimList, Counter())
This returns a collection, accessible by element name. If you prefer it as a list you can call the .items() method on it, but note that the order of the labels in the output may be different from the order in the input even in the cases where the input was consistently ordered.
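If the reduce/lambda form is hard to read, the same accumulation can be sketched as a plain loop over a Counter. The input below is deliberately shuffled to show that grouping is not required, and sorting the labels at the end is just one assumed way to get a predictable order:

```python
from collections import Counter

multiDimList = [['a', 1], ['c', 3], ['a', 1], ['b', 2], ['a', 1], ['c', 3]]

totals = Counter()
for label, value in multiDimList:
    totals[label] += value  # missing labels start at 0

# Convert back to a list of [label, total] pairs, sorted by label
multiDimListSum = [[label, total] for label, total in sorted(totals.items())]
print(multiDimListSum)  # [['a', 3], ['b', 2], ['c', 6]]
```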
You could use a dict to accumulate the total associated with each string:
d = {}
multiDimList = [['a',1],['a',1],['a',1],['b',2],['c',3],['c',3]]
for string, value in multiDimList:
    # Retrieve the current total from the dict if it exists, or 0
    current_value = d.get(string, 0)
    d[string] = current_value + value
print(d) # {'a': 3, 'b': 2, 'c': 6}
You can then access the value for b by using d["b"].

How to remove None when iterating through a list in python

I have these two lists of unequal length and I'm using itertools to loop through them. I'm trying to use the filter function to remove the None generated for List1, so that in the end a takes only two values instead of three (counting the None), but I keep getting this error: TypeError: 'NoneType' object is not iterable
import itertools
List1 = [['a'],['b']]
List2 = ['A','b','C']
l = list(itertools.chain(*List1))
print(l)
for a, b in itertools.zip_longest((b for a in List1 for b in a),List2):
    filter(None, a)
    print(a,b)
It's not entirely clear what you want. As I understand the question and the comments, you want to use zip_longest (izip_longest in Python 2) to combine the lists, but without any None elements in the result.
This will filter the None from the zipped 'slices' of the lists and print only the non-None values. But note that this way you cannot be sure whether, e.g., the first element in the non_none tuple came from the first list or the second or third.
from itertools import zip_longest
a = ["1", "2"]
b = ["a", "b", "c", "d"]
c = ["x", "y", "z"]
for zipped in zip_longest(a, b, c):
    non_none = tuple(filter(None, zipped))  # filter() returns an iterator in Python 3
    print(non_none)
Output:
('1', 'a', 'x')
('2', 'b', 'y')
('c', 'z')
('d',)
BTW, what your filter(None, a) does: It filters the None values from your a, i.e. from the strings "a" and "b" (which does not do much, as they contain no None values), until it fails for the last value, as None is not iterable. Also, it discards the result anyway, as you do not bind it to a variable. filter does not alter the original list, but returns a filtered copy!
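A small sketch of those two points, using a made-up list: filter leaves its input alone, so the result has to be bound to a name, and passing None itself as the iterable raises a TypeError.

```python
values = ['a', None, 'b', None]

filter(None, values)               # result discarded, so nothing changes
print(values)                      # ['a', None, 'b', None]

kept = list(filter(None, values))  # bind the result to keep it
print(kept)                        # ['a', 'b']

try:
    filter(None, None)             # None itself cannot be iterated
except TypeError as exc:
    print("TypeError:", exc)
```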
Why not just use zip?
for a, b in zip((b for a in List1 for b in a),List2):
    print(a,b)
However, if you really insist on using zip_longest, you don't need to use filter to remove None values. You just need an if.
for a, b in itertools.zip_longest((b for a in List1 for b in a),List2):
    if a is None: continue
    print(a,b)
import itertools as it
def grouper(inputs, n, fillvalue=None):
    iters = [iter(inputs)] * n
    interim = it.zip_longest(*iters, fillvalue=fillvalue)
    return interim
nums = range(23)
results = (list(grouper(nums, 4)))
finalanswer = []
for zipped in results:
    # non_none = tuple(filter(None, zipped))
    # unfortunately the line above filters 0 which is incorrect so instead use:
    non_none = tuple(filter(lambda x: x is not None, zipped))
    finalanswer.append(non_none)
print(finalanswer)
The code above uses zip_longest to split the input into groups of n items, padding the last group with None when the input runs out, and then strips out the None padding. Note that filter(None, ...) drops every falsy value, so it would remove 0 as well as None; that is why the lambda version is used.
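To make the difference concrete, here is a tiny sketch on a made-up tuple containing several falsy values:

```python
row = (0, 1, None, '', 2)

# filter(None, ...) keeps only truthy items, so 0 and '' disappear too
truthy_only = tuple(filter(None, row))
print(truthy_only)   # (1, 2)

# An explicit identity check removes None and nothing else
without_none = tuple(x for x in row if x is not None)
print(without_none)  # (0, 1, '', 2)
```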

Combine Python List Elements Based On Another List

I have 2 lists:
phon = ["A","R","K","H"]
idx = [1,2,3,3]
idx corresponds to how phon should be grouped. In this case, phon_grouped should be ["A","R","KH"] because both "K" and "H" correspond to group 3.
I'm assuming some sort of zip or map function is required, but I'm not sure how to implement it. I have something like:
a = []
for i in enumerate(phon):
a[idx[i-1].append(phon[i])
but this does not actually work/compile
Use zip() and itertools.groupby() to group the output after zipping:
from itertools import groupby
from operator import itemgetter
result = [''.join([c for i, c in group])
for key, group in groupby(zip(idx, phon), itemgetter(0))]
itertools.groupby() requires that your input is already sorted on the key (your idx values here).
zip() pairs up the indices from idx with characters from phon
itertools.groupby() groups the resulting tuples on the first value, the index. Equal index values puts the tuples into the same group
The list comprehension then picks the characters from the group again and joins them into strings.
Demo:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> phon = ["A","R","K","H"]
>>> idx = [1,2,3,3]
>>> [''.join([c for i, c in group]) for key, group in groupby(zip(idx, phon), itemgetter(0))]
['A', 'R', 'KH']
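If idx might not be sorted, groupby would split a group into several runs. A sketch of an alternative that collects the characters in a dict first (the shuffled input here is made up for illustration, and groups is a hypothetical name):

```python
phon = ["A", "K", "R", "H"]
idx = [1, 3, 2, 3]  # group 3 is not contiguous here

groups = {}
for i, c in zip(idx, phon):
    # Start an empty list the first time a group number is seen
    groups.setdefault(i, []).append(c)

# Emit the groups in ascending group-number order
phon_grouped = [''.join(groups[i]) for i in sorted(groups)]
print(phon_grouped)  # ['A', 'R', 'KH']
```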
If you don't want to use an extra class:
phon = ["A","R","K","H"]
idx = [1,2,3,3]
a = [[] for i in range(idx[-1])] # One empty list per group; idx is sorted, so idx[-1] is max(idx)
for data,place in enumerate(idx):
    a[place-1].append(phon[data])
print(a)
[['A'], ['R'], ['K', 'H']]
Mainly the trick is to just pre-initialize your list. You know the final list will be of the max number found in idx, which should be the last number as you said idx is sorted.
Not sure if you wanted the end result to be an appended list, or concatenated characters, i.e. "KH" vs ['K', 'H']
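If the concatenated form is what you are after, the nested lists can be joined in one more step. A short sketch, starting from the list of lists shown above:

```python
a = [['A'], ['R'], ['K', 'H']]

# Join the characters of each group into a single string
phon_grouped = [''.join(group) for group in a]
print(phon_grouped)  # ['A', 'R', 'KH']
```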
