Concatenate strings by groups python - python

I would like to concatenate a list of strings into new strings grouped over values in a list. Here is an example of what I mean:
Input
key = ['1','2','2','3']
data = ['a','b','c','d']
Result
newkey = ['1','2','3']
newdata = ['a','b c','d']
I understand how to join text. But I don't know how to iterate correctly over the values of the list to aggregate the strings that are common to the same key value.
Any help or suggestions appreciated. Thanks.

from collections import defaultdict
d = defaultdict(list)
for k, v in zip(key, data):
    d[k].append(v)
print([(k, ' '.join(v)) for k, v in d.items()])
Output (dict order is arbitrary before Python 3.7):
[('1', 'a'), ('3', 'd'), ('2', 'b c')]
And how to get new lists:
newkey, newvalue = d.keys(), [' '.join(v) for v in d.values()]
And with saved order:
newkey, newvalue = zip(*[(k, ' '.join(d.pop(k))) for k in key if k in d])
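On Python 3.7 and later, plain dicts keep insertion order, so the order-preserving variant can probably be written without popping from the dictionary. A minimal sketch; the helper name concat_by_key and its sep parameter are made up for illustration:

```python
def concat_by_key(keys, values, sep=' '):
    """Join values that share a key, keeping keys in first-seen order."""
    groups = {}
    for k, v in zip(keys, values):
        # setdefault creates the list the first time k is seen
        groups.setdefault(k, []).append(v)
    # plain dicts preserve insertion order on Python 3.7+
    return list(groups), [sep.join(vs) for vs in groups.values()]


newkey, newdata = concat_by_key(['1', '2', '2', '3'], ['a', 'b', 'c', 'd'])
print(newkey)   # ['1', '2', '3']
print(newdata)  # ['a', 'b c', 'd']
```

Unlike groupby, this does not need equal keys to be adjacent in the input.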

Use the itertools.groupby() function to combine elements; zip will let you group two input lists into two output lists:
import itertools
import operator
newkey, newdata = [], []
for k, items in itertools.groupby(zip(key, data), key=operator.itemgetter(0)):
    # k is the grouped key, items an iterable of (key, data) pairs
    # (the loop variable is named k so it does not overwrite the input list key)
    newkey.append(k)
    newdata.append(' '.join(d for _, d in items))
You can turn this into a list comprehension with a bit more zip() magic:
from itertools import groupby
from operator import itemgetter
newkey, newdata = zip(*[(k, ' '.join(d for _, d in it)) for k, it in groupby(zip(key, data), key=itemgetter(0))])
Note that this does require the input to be sorted; groupby only groups elements based on the consecutive keys being the same. On the other hand, it does preserve that initial sorted order.
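To illustrate the sorting requirement, here is a small sketch with deliberately unsorted input. The helper name join_runs is hypothetical; the rest uses only itertools.groupby, operator.itemgetter and sorted:

```python
from itertools import groupby
from operator import itemgetter

key = ['2', '1', '2', '3']
data = ['b', 'a', 'c', 'd']


def join_runs(pairs):
    """Join the data of each run of consecutive equal keys."""
    return [(k, ' '.join(d for _, d in grp))
            for k, grp in groupby(pairs, key=itemgetter(0))]


# Without sorting, the two '2' entries are not adjacent and stay separate.
unsorted_result = join_runs(zip(key, data))

# sorted() is stable, so values keep their original relative order per key.
sorted_result = join_runs(sorted(zip(key, data), key=itemgetter(0)))

print(unsorted_result)  # [('2', 'b'), ('1', 'a'), ('2', 'c'), ('3', 'd')]
print(sorted_result)    # [('1', 'a'), ('2', 'b c'), ('3', 'd')]
```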

You can use itertools.groupby() on zip(key, data):
In [128]: from itertools import *
In [129]: from operator import *
In [133]: lis=[(k," ".join(x[1] for x in g)) for k,g in groupby(zip(key,data),key=itemgetter(0))]
In [134]: newkey,newdata=zip(*lis)
In [135]: newkey
Out[135]: ('1', '2', '3')
In [136]: newdata
Out[136]: ('a', 'b c', 'd')

If you don't feel like importing collections you can always use a regular dictionary.
key = ['1','2','2','3']
data = ['a','b','c','d']
newkeydata = {}
for k,d in zip(key,data):
    # list.append returns None, so append in place instead of assigning its result
    newkeydata.setdefault(k, []).append(d)

Just for the sake of variety, here is a solution that works without any external libraries and without dictionaries:
def group_vals(keys, vals):
    # assumes equal keys are adjacent in the input
    new_keys, new_vals = [], []
    for key, val in zip(keys, vals):
        if new_keys and new_keys[-1] == key:
            # same key as the previous element: extend the last string
            new_vals[-1] += ' ' + val
        else:
            new_keys.append(key)
            new_vals.append(val)
    return new_keys, new_vals
group_vals([1,2,2,3], ['a','b','c','d'])
# --> ([1, 2, 3], ['a', 'b c', 'd'])
But I know that it's quite ugly and probably not as performant as the other solutions. Just for demonstration purposes. :)

Related

Get sequences of same values within list and count elements within sequences

I'd like to find the number of values within each sequence of the same value in a list:
list = ['A','A','A','B','B','C','A','A']
The result should look like:
result_dic = {'A': [3,2], 'B': [2], 'C': [1]}
I do not just want the counts of different values in a list as you can see in the result for A.
collections.defaultdict and itertools.groupby
from itertools import groupby
from collections import defaultdict
listy = ['A','A','A','B','B','C','A','A']
d = defaultdict(list)
for k, v in groupby(listy):
    d[k].append(len([*v]))
d
defaultdict(list, {'A': [3, 2], 'B': [2], 'C': [1]})
groupby will loop through an iterable and lump contiguous things together.
[(k, [*v]) for k, v in groupby(listy)]
[('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C']), ('A', ['A', 'A'])]
So I loop through those results and append the length of each group to the values of a defaultdict.
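As a variation on the same idea, the run length can be counted lazily instead of unpacking each group into a list. This is only a sketch; the function name run_lengths is made up for illustration:

```python
from collections import defaultdict
from itertools import groupby


def run_lengths(items):
    """Map each value to the lengths of its consecutive runs."""
    lengths = defaultdict(list)
    for value, run in groupby(items):
        # count the run without materialising it as a list
        lengths[value].append(sum(1 for _ in run))
    return dict(lengths)


result = run_lengths(['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A'])
print(result)  # {'A': [3, 2], 'B': [2], 'C': [1]}
```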
I'd suggest using a defaultdict and looping through the list.
from collections import defaultdict
sample = ['A','A','A','B','B','C','A','A']
result_dic = defaultdict(list)
last_letter = None
num = 0
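for l in sample:
    if last_letter == l or last_letter is None:
        num += 1
    else:
        # the run ended: store its length and start counting the new run
        result_dic[last_letter].append(num)
        num = 1
    last_letter = l
# store the final run, which the loop never closes
if last_letter is not None:
    result_dic[last_letter].append(num)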
Edit
This is my approach, although I'd have a look at @piRSquared's answer because they were keen enough to include groupby as well. Nice work!
I'd suggest looping through the list.
result_dic = {}
old_word = ''
for word in list:
    if word not in result_dic:
        result_dic[word] = [1]
    elif word == old_word:
        result_dic[word][-1] += 1
    else:
        result_dic[word].append(1)
    old_word = word

How can I only parse/split this list with multiple colons in each element? Create dictionary

I have the following Python list:
list1 = ['EW:G:B<<LADHFSSFAFFF', 'CB:E:OWTOWTW', 'PP:E:A,A<F<AF', 'GR:A:OUO-1-XXX-EGD:forthyFive:1:HMJeCXX:7', 'SX:F:-111', 'DS:f:115.5', 'MW:AA:0', 'MA:A:0XT:i:0', 'EY:EE:KJERWEWERKJWE']
I would like to take the entries of this list and create a dictionary of key-values pairs that looks like
dictionary_list1 = {'EW':'G:B<<LADHFSSFAFFF', 'CB':'E:OWTOWTW', 'PP':'E:A,A<F<AF', 'GR':'A:OUO-1-XXX-EGD:forthyFive:1:HMJeCXX:7', 'SX':'F:-111', 'DS':'f:115.5', 'MW':'AA:0', 'MA':'A:0XT:i:0', 'EW':'EE:KJERWEWERKJWE'}
How does one parse/split the list above list1 to do this? My first instinct was to try try1 = list1.split(":"), but then I think it is impossible to retrieve the "key" for this list, as there are multiple colons :
What is the most pythonic way to do this?
You can specify a maximum number of times to split with the second argument to split.
list1 = ['EW:G:B<<LADHFSSFAFFF', 'CB:E:OWTOWTW', 'PP:E:A,A<F<AF', 'GR:A:OUO-1-XXX-EGD:forthyFive:1:HMJeCXX:7', 'SX:F:-111', 'DS:f:115.5', 'MW:AA:0', 'MA:A:0XT:i:0', 'EW:EE:KJERWEWERKJWE']
d = dict(item.split(':', 1) for item in list1)
Result:
>>> import pprint
>>> pprint.pprint(d)
{'CB': 'E:OWTOWTW',
'DS': 'f:115.5',
'EW': 'EE:KJERWEWERKJWE',
'GR': 'A:OUO-1-XXX-EGD:forthyFive:1:HMJeCXX:7',
'MA': 'A:0XT:i:0',
'MW': 'AA:0',
'PP': 'E:A,A<F<AF',
'SX': 'F:-111'}
If you'd like to keep track of values for non-unique keys, like 'EW:G:B<<LADHFSSFAFFF' and 'EW:EE:KJERWEWERKJWE', you could add keys to a collections.defaultdict:
import collections
d = collections.defaultdict(list)
for item in list1:
    k,v = item.split(':', 1)
    d[k].append(v)
Result:
>>> pprint.pprint(d)
{'CB': ['E:OWTOWTW'],
'DS': ['f:115.5'],
'EW': ['G:B<<LADHFSSFAFFF', 'EE:KJERWEWERKJWE'],
'GR': ['A:OUO-1-XXX-EGD:forthyFive:1:HMJeCXX:7'],
'MA': ['A:0XT:i:0'],
'MW': ['AA:0'],
'PP': ['E:A,A<F<AF'],
'SX': ['F:-111']}
You can also use str.partition
list1 = ['EW:G:B<<LADHFSSFAFFF', 'CB:E:OWTOWTW', 'PP:E:A,A<F<AF', 'GR:A:OUO-1-XXX-EGD:forthyFive:1:HMJeCXX:7', 'SX:F:-111', 'DS:f:115.5', 'MW:AA:0', 'MA:A:0XT:i:0', 'EW:EE:KJERWEWERKJWE']
d = dict([t for t in x.partition(':') if t!=':'] for x in list1)
# or more simply as TigerhawkT3 mentioned in the comment
d = dict(x.partition(':')[::2] for x in list1)
for k, v in d.items():
    print('{}: {}'.format(k, v))
Output:
MW: AA:0
CB: E:OWTOWTW
GR: A:OUO-1-XXX-EGD:forthyFive:1:HMJeCXX:7
PP: E:A,A<F<AF
EW: EE:KJERWEWERKJWE
SX: F:-111
DS: f:115.5
MA: A:0XT:i:0
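One practical difference between the two approaches shows up when an entry has no colon at all: str.partition still returns three parts, while split(':', 1) returns a single-element list that dict() rejects. A small sketch; the 'NOCOLON' entry and the helper name parse_with_partition are invented for the demonstration:

```python
items = ['EW:G:B<<LADHFSSFAFFF', 'MA:A:0XT:i:0', 'NOCOLON']


def parse_with_partition(entries):
    """Split each entry at the first colon; the value is '' if there is none."""
    result = {}
    for entry in entries:
        k, _sep, v = entry.partition(':')
        result[k] = v
    return result


parsed = parse_with_partition(items)
print(parsed)
# {'EW': 'G:B<<LADHFSSFAFFF', 'MA': 'A:0XT:i:0', 'NOCOLON': ''}

try:
    dict(item.split(':', 1) for item in items)
    split_failed = False
except ValueError:
    # 'NOCOLON'.split(':', 1) gives ['NOCOLON'], which is not a key/value pair
    split_failed = True
print(split_failed)  # True
```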

weighted counting in python

I want to count the instances of X in a list, similar to
How can I count the occurrences of a list item in Python?
but taking into account a weight for each instance.
For example,
L = [(a,4), (a,1), (b,1), (b,1)]
the function weighted_count() should return something like
[(a,5), (b,2)]
Edited to add: my a, b will be integers.
You can still use Counter:
from collections import Counter
c = Counter()
for k,v in L:
    c.update({k:v})
print(c)
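Since a Counter returns 0 for missing keys, the weights can also be added with +=. A minimal sketch with integer items, as the question specifies; the function name weighted_count is taken from the question, the sample data is assumed:

```python
from collections import Counter


def weighted_count(pairs):
    """Sum the weights of each item in a list of (item, weight) pairs."""
    counts = Counter()
    for item, weight in pairs:
        counts[item] += weight  # missing keys start at 0
    return counts


L = [(1, 4), (1, 1), (2, 1), (2, 1)]
counts = weighted_count(L)
print(sorted(counts.items()))  # [(1, 5), (2, 2)]
print(counts.most_common(1))   # [(1, 5)]
```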
The following will give you a dictionary of all the items in the list and their corresponding weighted counts:
counts = {}
for value in L:
    if value[0] in counts:
        counts[value[0]] += value[1]
    else:
        counts[value[0]] = value[1]
Alternatively, if you're looking for one specific value, you can filter the list for that value, map the remaining pairs to their weights, and sum them.
def countOf(x, L):
    # keep only the pairs whose item equals x, then sum their weights
    filteredL = filter(lambda value: value[0] == x, L)
    return sum(map(lambda value: value[1], filteredL))
>>> import itertools
>>> L = [ ('a',4), ('a',1), ('b',1), ('b',1) ]
>>> [(k, sum(amt for _,amt in v)) for k,v in itertools.groupby(sorted(L), key=lambda tup: tup[0])]
[('a', 5), ('b', 2)]
defaultdict will do:
from collections import defaultdict
L = [('a',4), ('a',1), ('b',1), ('b',1)]
res = defaultdict(int)
for k, v in L:
    res[k] += v
print(list(res.items()))
prints:
[('b', 2), ('a', 5)]
Group the tuples by their first element using groupby from itertools (this relies on equal keys being adjacent, as they are in L):
>>> from itertools import groupby
>>> from operator import itemgetter
>>> L = [('a',4), ('a',1), ('b',1), ('b',1)]
>>> L_new = []
>>> for k,v in groupby(L,key=itemgetter(0)):
...     L_new.append((k,sum(map(itemgetter(1), v))))
...
>>> L_new
[('a', 5), ('b', 2)]
>>> L_new = [(k,sum(map(itemgetter(1), v))) for k,v in groupby(L, key=itemgetter(0))] # for those fond of list comprehensions and one-liners
>>> L_new
[('a', 5), ('b', 2)]
Tested in both Python2 & Python3
Use the dictionary's get method.
>>> d = {}
>>> for item in L:
...     d[item[0]] = d.get(item[0], 0) + item[1]
...
>>> d
{'a': 5, 'b': 2}

Refer a value from a key which is in the form tuple of multiple elements

dic = {('UUU','UUC'):'F',('GUU','GUC','GUA','GUG'):'V'}
L = ['UUU', 'GUG', 'GUU']
As you can see, each element of the list L appears inside one of the dictionary's tuple keys. I want to replace each element of L with its corresponding value. The output would be:
output = ['F','V']
How can I do that?
One way would be to decompose the keys into individual elements, and create a new dict from those:
new_dic = {}
for k, v in dic.items():
    for sub_k in k:
        new_dic[sub_k] = v
Now it's a simple matter of looping through the list:
output = [new_dic[i] for i in L]
and you can de-duplicate with set:
output = list(set(output))
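Note that set does not keep the order of the results. If the order of first appearance matters, dict.fromkeys can be used for de-duplication instead (on Python 3.7+, where dicts preserve insertion order). A sketch using the data from the question; the names lookup and translated are chosen for illustration:

```python
dic = {('UUU', 'UUC'): 'F', ('GUU', 'GUC', 'GUA', 'GUG'): 'V'}
L = ['UUU', 'GUG', 'GUU']

# flatten the tuple keys into one dictionary entry per codon
lookup = {codon: value for codons, value in dic.items() for codon in codons}

translated = [lookup[codon] for codon in L]

# dict.fromkeys keeps the first occurrence of each value, in order
output = list(dict.fromkeys(translated))

print(translated)  # ['F', 'V', 'V']
print(output)      # ['F', 'V']
```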
Using a list comprehension:
In [1]: dic = {('UUU','UUC'):'F',('GUU','GUC','GUA','GUG'):'V'}
In [2]: L = ['UUU', 'GUG', 'GUU']
In [3]: list(set([v for k,v in dic.items() for x in L if x in k]))
Out [3]: ['V', 'F']

list to dictionary conversion with multiple values per key?

I have a Python list which holds pairs of key/value:
l = [[1, 'A'], [1, 'B'], [2, 'C']]
I want to convert the list into a dictionary, where multiple values per key would be aggregated into a tuple:
{1: ('A', 'B'), 2: ('C',)}
The iterative solution is trivial:
l = [[1, 'A'], [1, 'B'], [2, 'C']]
d = {}
for pair in l:
    # (pair[1],) is a one-element tuple; tuple(pair[1]) would split longer strings into characters
    if pair[0] in d:
        d[pair[0]] = d[pair[0]] + (pair[1],)
    else:
        d[pair[0]] = (pair[1],)
print(d)
{1: ('A', 'B'), 2: ('C',)}
Is there a more elegant, Pythonic solution for this task?
from collections import defaultdict
d1 = defaultdict(list)
for k, v in l:
    d1[k].append(v)
d = dict((k, tuple(v)) for k, v in d1.items())
d now contains {1: ('A', 'B'), 2: ('C',)}
d1 is a temporary defaultdict with lists as values, which will be converted to tuples in the last line. This way you are appending to lists and not recreating tuples in the main loop.
Using lists instead of tuples as dict values:
l = [[1, 'A'], [1, 'B'], [2, 'C']]
d = {}
for key, val in l:
    d.setdefault(key, []).append(val)
print(d)
Using a plain dictionary is often preferable over a defaultdict, in particular if you build it just once and then continue to read from it later in your code:
First, the plain dictionary is faster to build and access.
Second, and more importantly, the later read operations will error out if you try to access a key that doesn't exist, instead of silently creating that key. A plain dictionary lets you explicitly state when you want to create a key-value pair, while the defaultdict implicitly creates one whenever a missing key is looked up with d[key].
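The difference in read behaviour can be seen in a short sketch. The lookup keys 99 and 100 are arbitrary values chosen because they are absent from the data:

```python
from collections import defaultdict

pairs = [[1, 'A'], [1, 'B'], [2, 'C']]

plain = {}
auto = defaultdict(list)
for key, val in pairs:
    plain.setdefault(key, []).append(val)
    auto[key].append(val)

# Reading a missing key from the plain dict raises KeyError.
try:
    plain[99]
    plain_raised = False
except KeyError:
    plain_raised = True

# Indexing a missing key on the defaultdict inserts an empty list.
_ = auto[99]

# .get() does not trigger the default factory, so nothing is inserted here.
missing = auto.get(100)

print(plain_raised)  # True
print(99 in plain)   # False
print(99 in auto)    # True
print(100 in auto)   # False
```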
This method is relatively efficient and quite compact:
reduce(lambda x, (k,v): x[k].append(v) or x, l, defaultdict(list))
In Python 3 this becomes (making imports explicit):
dict(functools.reduce(lambda x, d: x[d[0]].append(d[1]) or x, l, collections.defaultdict(list)))
Note that reduce has moved to functools and that lambdas no longer accept tuples. This version still works in 2.6 and 2.7.
Are the keys already sorted in the input list? If that's the case, you have a functional solution:
import itertools
lst = [(1, 'A'), (1, 'B'), (2, 'C')]
dct = dict((key, tuple(v for (k, v) in pairs))
           for (key, pairs) in itertools.groupby(lst, lambda pair: pair[0]))
print(dct)
# {1: ('A', 'B'), 2: ('C',)}
I had a list of values created as follows:
performance_data = driver.execute_script('return window.performance.getEntries()')
Then I had to store the data (name and duration) in a dictionary with multiple values:
dictionary = {}
for _ in range(3):
    driver.get(self.base_url)
    performance_data = driver.execute_script('return window.performance.getEntries()')
    for result in performance_data:
        key = result['name']
        val = result['duration']
        dictionary.setdefault(key, []).append(val)
print(dictionary)
My data was in a pandas DataFrame:
myDict = dict()
for id in set(data['id'].values):
    temp = data[data['id'] == id]
    myDict[id] = temp['IP_addr'].to_list()
myDict
This gave me a dict mapping each id to one or more IP_addr values. The first IP_addr is guaranteed to exist, but the code should also work if temp['IP_addr'].to_list() == [].
{'fooboo_NaN': ['1.1.1.1', '8.8.8.8']}
My two cents for this discussion.
I tried to come up with a one-line solution using only the standard library. Excuse the two extra imports. Perhaps the code below solves the problem adequately (for Python 3):
from functools import reduce
from collections import defaultdict
a = [1, 1, 2, 3, 1]
b = ['A', 'B', 'C', 'D', 'E']
c = zip(a, b)
print({**reduce(lambda d,e: d[e[0]].append(e[1]) or d, c, defaultdict(list))})
