I have 2 lists:
phon = ["A","R","K","H"]
idx = [1,2,3,3]
idx corresponds to how phon should be grouped. In this case, phon_grouped should be ["A","R","KH"] because both "K" and "H" correspond to group 3.
I'm assuming some sort of zip or map function is required, but I'm not sure how to implement it. I have something like:
a = []
for i in enumerate(phon):
a[idx[i-1].append(phon[i])
but this does not actually work/compile
Use zip() and itertools.groupby() to group the output after zipping:
from itertools import groupby
from operator import itemgetter
result = [''.join([c for i, c in group])
          for key, group in groupby(zip(idx, phon), itemgetter(0))]
itertools.groupby() requires that your input is already sorted on the key (your idx values here).
zip() pairs up the indices from idx with characters from phon
itertools.groupby() groups the resulting tuples on the first value, the index. Tuples with equal index values are put into the same group.
The list comprehension then picks the characters from the group again and joins them into strings.
Demo:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> phon = ["A","R","K","H"]
>>> idx = [1,2,3,3]
>>> [''.join([c for i, c in group]) for key, group in groupby(zip(idx, phon), itemgetter(0))]
['A', 'R', 'KH']
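If idx were not already sorted, groupby() would start a new group each time the index changes. A minimal sketch of an alternative for that case, assuming you want the groups ordered by index value; the helper name group_by_index is hypothetical and not part of the answer above:

```python
from collections import defaultdict

def group_by_index(idx, phon):
    """Join characters that share an index, even when idx is unsorted."""
    groups = defaultdict(list)
    for i, c in zip(idx, phon):
        groups[i].append(c)
    # Emit the groups in ascending index order.
    return [''.join(groups[i]) for i in sorted(groups)]

print(group_by_index([1, 2, 3, 3], ["A", "R", "K", "H"]))  # ['A', 'R', 'KH']
print(group_by_index([3, 1, 3, 2], ["K", "A", "H", "R"]))  # ['A', 'R', 'KH']
```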
If you don't want to import anything extra:
phon = ["A","R","K","H"]
idx = [1,2,3,3]
a = [[] for i in range(idx[-1])]  # One empty list per group; idx is sorted, so idx[-1] == max(idx)
for data, place in enumerate(idx):
    a[place-1].append(phon[data])
print(a)  # [['A'], ['R'], ['K', 'H']]
The main trick is to pre-initialize your list. You know the final list needs as many sublists as the largest number in idx, which should be the last number, since you said idx is sorted.
Not sure if you wanted the end result to be an appended list, or concatenated characters, i.e. "KH" vs ['K', 'H']
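If concatenated strings are what you want, the sublists can be joined afterwards. A small sketch of that extra step, rebuilding the list a so the snippet runs on its own (the names position and group_number are just illustrative):

```python
phon = ["A", "R", "K", "H"]
idx = [1, 2, 3, 3]

# Pre-initialize one empty list per group (assumes idx is sorted and starts at 1).
a = [[] for _ in range(idx[-1])]
for position, group_number in enumerate(idx):
    a[group_number - 1].append(phon[position])

# Join each sublist into a single string.
phon_grouped = [''.join(chars) for chars in a]
print(phon_grouped)  # ['A', 'R', 'KH']
```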
Related
I am looking to group values in an input set together with the first element in a tuple acting as the key. The second elements need to be grouped together to a list based on the common key. Output needs to be a list with tuples.
# Input set
values = {(304008, 2020.0), (304008, 2017.0), (250128, 2020.0), (93646, 2020.0), (93646, 2017.0)}
# Current workflow
keys = {i[0] for i in values}
id_dict = dict()
for k in keys:
    id_dict[k] = [int(i[1]) for i in values if i[0] == k]
lst2 = list(id_dict.items())
# Expected output
# [(250128, [2020]), (304008, [2017, 2020]), (93646, [2020, 2017])]
I have the expected output, but the whole process is too slow. I am looking to make it faster. I was looking at groupby functions, but I can't seem to make them work.
You can use itertools.groupby to accomplish this. Basically groupby the first element in the tuple, then make a list of the second elements in each group.
>>> from itertools import groupby
>>> [(k, [i[1] for i in g]) for k, g in groupby(sorted(values), key=lambda i: i[0])]
[(93646, [2017.0, 2020.0]), (250128, [2020.0]), (304008, [2017.0, 2020.0])]
You can use setdefault to build a dict keyed on the first item of each tuple, populating it in a single pass over the set.
Then use the list constructor to get the required list. See below:
>>> values = {(304008, 2020.0), (304008, 2017.0), (250128, 2020.0), (93646, 2020.0), (93646, 2017.0)}
>>> info = {}
>>> for elements in values:
...     info.setdefault(elements[0], []).append(elements[1])
...
>>> list(info.items())
[(304008, [2017.0, 2020.0]), (93646, [2017.0, 2020.0]), (250128, [2020.0])]
>>>
This does not use groupby but avoids your second loop.
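For comparison, here is a sketch of the same single-pass idea using collections.defaultdict, with the int conversion from the question's expected output added. The names grouped and result are illustrative, and sorting the keys and years is an assumption made only to get a deterministic result, since a set has no defined order:

```python
from collections import defaultdict

values = {(304008, 2020.0), (304008, 2017.0), (250128, 2020.0),
          (93646, 2020.0), (93646, 2017.0)}

grouped = defaultdict(list)
for key, year in values:
    grouped[key].append(int(year))

# Sort keys and years so the result does not depend on set iteration order.
result = [(key, sorted(years)) for key, years in sorted(grouped.items())]
print(result)
# [(93646, [2017, 2020]), (250128, [2020]), (304008, [2017, 2020])]
```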
I have a question about collapsing repeated list values into a single value. For example, I have this list
data_list = [A,A,B,B,B,C,C,C,C]
then I want to make it into this
data_list = [A, B, C]
I have tried using itertools.groupby but I still cannot find my solution
from itertools import groupby
data_list = [A,A,B,B,B,C,C,C,C]
data_group = [(key, len(list(group))) for key, group in groupby(data_list)]
print(data_group)
the expected output is data_group = [A, B, C]
the actual result is data_group = [(A, 2), (B, 3), (C, 4)]
Method-1
You can use numpy to get the unique values:
import numpy as np
data_list = np.array(['A','A','B','B','B','C','C','C','C'])
np.unique(data_list)
Method-2
You can use set to get the unique values, but a set is unordered, so the result may not keep the original order.
new_list = list( set(data_list) )
new_list
I hope it may help you.
Try with this code
mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)
On Python versions older than 3.7, where a plain dict does not guarantee insertion order, you can use OrderedDict to keep the original order
from collections import OrderedDict
mylist = ['A','A','B','B','B','C','C','C','C']
mylist = list(OrderedDict.fromkeys(mylist))
print(mylist)
Have you tried looking into sets?
you can first cast your original data_list into a set using set(data_list) then cast that again into a list.
data_list = ['A','A','B','B','B','C','C','C','C']
print(list(set(data_list)))
#OUTPUT (order may vary, since sets are unordered):
['A', 'B', 'C']
Sets only contain unique values, so when we call set() on your data_list variable, we are left with only the unique values. Sets in Python are written with curly brackets, { }, like dicts, but they do not contain key:value pairs. The list() function converts your set to a list so you can treat it like a list afterwards.
A good idea is to use python sets.
Per documentation, a part of the description is:
"A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries."
For example:
my_list = [1,1,2,2,3,3]
my_set = set(my_list)
print(my_set)
print(type(my_set))
Will output:
{1, 2, 3}
<class 'set'>
Mind that the resulting data type is set
So, if you want your result to be a list, you can cast it back into one:
unique_values = list(set(my_list))
And if you are planning to use that a lot in your code, a function would help:
def giveUnique(x):
    return list(set(x))
my_list = giveUnique(my_list)
This would replace my_list with a list containing only the unique values (not necessarily in the original order).
Just adapt the itertools.groupby solution you have (found?) to only use the key:
>>> data_list = [A, A, B, B, B, C, C, C, C] # with A, B, C = "ABC"
>>> [(key, len(list(group))) for key, group in groupby(data_list)]
[('A', 2), ('B', 3), ('C', 4)]
>>> [key for key, group in groupby(data_list)]
['A', 'B', 'C']
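Note that groupby() only merges consecutive equal items, so taking the keys yields unique values only when equal items are adjacent. A short sketch of the difference, using a made-up list in which "A" reappears later:

```python
from itertools import groupby

data_list = ["A", "A", "B", "B", "A"]

# groupby starts a new group whenever the value changes.
consecutive = [key for key, _ in groupby(data_list)]
print(consecutive)  # ['A', 'B', 'A']

# Sorting first makes equal items adjacent.
unique_sorted = [key for key, _ in groupby(sorted(data_list))]
print(unique_sorted)  # ['A', 'B']

# dict.fromkeys keeps first-seen order (guaranteed on Python 3.7+).
unique_ordered = list(dict.fromkeys(data_list))
print(unique_ordered)  # ['A', 'B']
```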
Not sure how else to word this, but say I have a list containing the following sequence:
[a,a,a,b,b,b,a,a,a]
and I would like to return:
[a,b,a]
How would one do this in principle?
You can use itertools.groupby. It puts consecutive equal elements into the same group and returns an iterator of (key, group) pairs, where the key is the element you are looking for:
from itertools import groupby
lst = ['a','a','a','b','b','b','a','a','a']
[k for k, _ in groupby(lst)]
# ['a', 'b', 'a']
Psidom's way is a lot better, but I may as well write this so you can see how it's possible using just basic loops and statements. It's always good to figure out what steps you'd need to take for any problem, as it usually makes coding the simple things a bit easier :)
original = ['a','a','a','b','b','b','a','a','a']
new = [original[0]]
for letter in original[1:]:
    if letter != new[-1]:
        new.append(letter)
Basically it will append a letter if the previous letter is something different.
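The loop above assumes the list is non-empty, since original[0] raises an IndexError otherwise. Here is a sketch of the same idea wrapped in a function that also handles an empty list; the name collapse_runs is hypothetical:

```python
def collapse_runs(items):
    """Return items with each run of consecutive duplicates collapsed to one."""
    result = []
    for item in items:
        # Keep the item if nothing is kept yet or it differs from the last kept one.
        if not result or item != result[-1]:
            result.append(item)
    return result

print(collapse_runs(['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a']))  # ['a', 'b', 'a']
print(collapse_runs([]))  # []
```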
Using list comprehension:
original = ['a','a','a','b','b','b','a','a','a']
packed = [original[i] for i in range(len(original)) if i == 0 or original[i] != original[i-1]]
print(packed) # > ['a', 'b', 'a']
Similarly (thanks to pylang) you can use enumerate instead of range:
[ x for i,x in enumerate(original) if i == 0 or x != original[i-1] ]
more_itertools has an implementation of the unique_justseen recipe from itertools:
import more_itertools as mit
list(mit.unique_justseen(["a","a","a","b","b","b","a","a","a"]))
# ['a', 'b', 'a']
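If you would rather not install more_itertools, the recipe is short enough to copy. This is a sketch along the lines of the unique_justseen recipe in the itertools documentation; the exact form shown in the docs may differ between Python versions:

```python
from itertools import groupby
from operator import itemgetter

def unique_justseen(iterable, key=None):
    """Yield elements in order, skipping consecutive duplicates."""
    # groupby yields (key, group) pairs; take the first element of each group.
    return map(next, map(itemgetter(1), groupby(iterable, key)))

print(list(unique_justseen(["a", "a", "a", "b", "b", "b", "a", "a", "a"])))
# ['a', 'b', 'a']
print(list(unique_justseen("ABBcCAD", key=str.lower)))
# ['A', 'B', 'c', 'A', 'D']
```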
I have a multi dimensional list:
multiDimList = [['a',1],['a',1],['a',1],['b',2],['c',3],['c',3]]
I'm trying to sum the instances of element [1] where element [0] is common.
To put it more clearly, my desired output is another multi dimensional list:
multiDimListSum = [['a',3],['b',2],['c',6]]
I see I can access, say the value '2' in multiDimList by
x = multiDimList [3][1]
so I can grab the individual elements, and could probably build some sort of function to do this job, but it would be disgusting.
Does anyone have a suggestion of how to do this pythonically?
Assuming your actual sequence has similar elements grouped together as in your example (all instances of 'a', 'b' etc. together), you can use itertools.groupby() and operator.itemgetter():
from itertools import groupby
from operator import itemgetter
[[k, sum(v[1] for v in g)] for k, g in groupby(multiDimList, itemgetter(0))]
# result: [['a', 3], ['b', 2], ['c', 6]]
Zero Piraeus's answer covers the case when field entries are grouped in order. If they're not, then the following is short and reasonably efficient.
from collections import Counter
from functools import reduce  # reduce is no longer a builtin in Python 3
reduce(lambda c,x: c.update({x[0]: x[1]}) or c, multiDimList, Counter())
This returns a Counter, accessible by element name. If you prefer it as a list you can call the .items() method on it, but note that the order of the labels in the output may differ from the order in the input, even in cases where the input was consistently ordered.
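A plain loop is arguably easier to read than the reduce call. Here is a sketch of the same Counter-based idea, with the input deliberately left ungrouped; converting back to a list of lists at the end is an addition for illustration, and on Python 3.7+ a Counter keeps its keys in first-seen order:

```python
from collections import Counter

# Same pairs as in the question, but shuffled so equal labels are not adjacent.
multiDimList = [['a', 1], ['c', 3], ['a', 1], ['b', 2], ['a', 1], ['c', 3]]

totals = Counter()
for label, value in multiDimList:
    totals[label] += value  # missing labels start at 0

multiDimListSum = [[label, total] for label, total in totals.items()]
print(multiDimListSum)  # [['a', 3], ['c', 6], ['b', 2]]
```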
You could use a dict to accumulate the total associated to each string
d = {}
multiDimList = [['a',1],['a',1],['a',1],['b',2],['c',3],['c',3]]
for string, value in multiDimList:
    # Retrieves the current value in the dict if it exists or 0
    current_value = d.get(string, 0)
    d[string] = current_value + value
print(d) # {'a': 3, 'b': 2, 'c': 6}
You can then access the value for b by using d["b"].
I have an iterator containing strings:
it = (_ for _ in ['aaxbb', 'aayybb', 'aaaaaaabb', 'ccabcavabb', 'yyaaadbb', 'yyaabb', 'a'])
I want to group these string if they have the same first and last two characters. The end result of the groupby in the above example should be:
[['aaxbb', 'aayybb', 'aaaaaaabb'],
['ccabcavabb'],
['yyaaadbb', 'yyaabb'],
['a']]
Can this complex groupby be achieved using itertools.groupby?
Not complex at all, just return a tuple of the first and last two characters:
lambda v: (v[:2], v[-2:])
or, if you want to use operator.itemgetter():
from operator import itemgetter
itemgetter(slice(2), slice(-2, None))
Demo:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> sample = ['aaxbb', 'aayybb', 'aaaaaaabb', 'ccabcavabb', 'yyaaadbb', 'yyaabb', 'a']
>>> for key, group in groupby(sample, lambda v: (v[:2], v[-2:])):
...     print(list(group))
...
['aaxbb', 'aayybb', 'aaaaaaabb']
['ccabcavabb']
['yyaaadbb', 'yyaabb']
['a']
>>> for key, group in groupby(sample, itemgetter(slice(2), slice(-2, None))):
...     print(list(group))
...
['aaxbb', 'aayybb', 'aaaaaaabb']
['ccabcavabb']
['yyaaadbb', 'yyaabb']
['a']
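To see why both key functions group the same way, it can help to print the key computed for each string. A quick sketch; the names by_lambda and by_itemgetter are only for illustration:

```python
from operator import itemgetter

sample = ['aaxbb', 'ccabcavabb', 'a']

by_lambda = lambda v: (v[:2], v[-2:])
by_itemgetter = itemgetter(slice(2), slice(-2, None))

for s in sample:
    # Both produce a (first two chars, last two chars) tuple.
    print(s, by_lambda(s), by_itemgetter(s))
# aaxbb ('aa', 'bb') ('aa', 'bb')
# ccabcavabb ('cc', 'bb') ('cc', 'bb')
# a ('a', 'a') ('a', 'a')
```

Slicing never raises on short strings, which is why the single-character 'a' still gets a key.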
It is important to do sorting first before using groupby. In this specific example, all the items that belong to a group appear consecutively, so sorting might be optional. But in general, the collection must be sorted before using groupby.
See the note in the Python documentation on this:
https://docs.python.org/2/library/itertools.html
"The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order."
from itertools import groupby
sample = ['aaxbb', 'aayybb', 'aaaaaaabb', 'ccabcavabb', 'yyaaadbb', 'yyaabb', 'a', 'aaxxbb']
key = lambda x: x[:2]+x[-2:]  # sort and group with the same key function
print([list(group) for k, group in groupby(sorted(sample, key=key), key)])
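Here is a short sketch of what goes wrong without sorting. The list is made up for illustration, with the two strings that share a key deliberately separated:

```python
from itertools import groupby

key = lambda x: (x[:2], x[-2:])
sample = ['aaxbb', 'ccabcavabb', 'aayybb']

# Unsorted: the ('aa', 'bb') group is split in two.
unsorted_groups = [list(g) for _, g in groupby(sample, key)]
print(unsorted_groups)  # [['aaxbb'], ['ccabcavabb'], ['aayybb']]

# Sorted by the same key: equal keys are adjacent, so they form one group.
sorted_groups = [list(g) for _, g in groupby(sorted(sample, key=key), key)]
print(sorted_groups)  # [['aaxbb', 'aayybb'], ['ccabcavabb']]
```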