weighted counting in python


I want to count the instances of X in a list, similar to
"How can I count the occurrences of a list item in Python?",
but taking into account a weight for each instance.
For example,
L = [(a,4), (a,1), (b,1), (b,1)]
the function weighted_count() should return something like
[(a,5), (b,2)]
Edited to add: my a, b will be integers.

You can still use Counter:
from collections import Counter
c = Counter()
for k, v in L:
    c.update({k: v})  # update() adds v to k's running total
print(c)
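The weighted_count() function the question asks for can be sketched with the same Counter idea. This is a minimal sketch, assuming the input is an iterable of (key, weight) pairs and that you want a list of (key, total) pairs back. The function name comes from the question; the first-seen output order relies on Python 3.7+ dict ordering.

```python
from collections import Counter

def weighted_count(pairs):
    """Sum the weights for each key in an iterable of (key, weight) pairs."""
    totals = Counter()
    for key, weight in pairs:
        totals[key] += weight  # a missing key starts at 0 in a Counter
    # Counter is a dict subclass, so (on Python 3.7+) keys keep the
    # order in which they were first seen.
    return list(totals.items())

# Integer keys, as in the question's edit.
print(weighted_count([(1, 4), (1, 1), (2, 1), (2, 1)]))  # [(1, 5), (2, 2)]
```

Using `totals[key] += weight` avoids building a throwaway one-item dict for every `update()` call; the result is the same.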

The following will give you a dictionary mapping each key in the list to its total weight:
counts = {}
for value in L:
    if value[0] in counts:
        counts[value[0]] += value[1]
    else:
        counts[value[0]] = value[1]
Alternatively, if you're looking for one specific value, you can filter the list for that value, then map the matching items to their weights and sum them:
def countOf(x, L):
    # keep only the pairs whose key is x, then add up their weights
    filteredL = filter(lambda value: value[0] == x, L)
    return sum(map(lambda value: value[1], filteredL))

>>> import itertools
>>> L = [ ('a',4), ('a',1), ('b',1), ('b',1) ]
>>> [(k, sum(amt for _,amt in v)) for k,v in itertools.groupby(sorted(L), key=lambda tup: tup[0])]
[('a', 5), ('b', 2)]

defaultdict will do:
from collections import defaultdict
L = [('a',4), ('a',1), ('b',1), ('b',1)]
res = defaultdict(int)
for k, v in L:
    res[k] += v
print(list(res.items()))
prints (on Python 3.7+, where dicts keep insertion order):
[('a', 5), ('b', 2)]

Group items by the first element of each tuple using groupby from itertools:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> L = [('a',4), ('a',1), ('b',1), ('b',1)]
>>> L_new = []
>>> for k,v in groupby(L,key=itemgetter(0)):
...     L_new.append((k,sum(map(itemgetter(1), v))))
...
>>> L_new
[('a', 5), ('b', 2)]
>>> L_new = [(k,sum(map(itemgetter(1), v))) for k,v in groupby(L, key=itemgetter(0))]  # for fans of list comprehensions and one-liners
>>> L_new
[('a', 5), ('b', 2)]
Tested in both Python 2 and Python 3.
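One caveat with groupby: it only merges *consecutive* items that share a key, so the code above relies on L already being ordered by key. A small sketch showing what happens with interleaved input, and the usual fix of sorting by the same key first:

```python
from itertools import groupby
from operator import itemgetter

L = [('a', 4), ('b', 1), ('a', 1), ('b', 1)]  # same data, but interleaved

# Without sorting, each run of equal keys becomes its own group.
unsorted_result = [(k, sum(map(itemgetter(1), v)))
                   for k, v in groupby(L, key=itemgetter(0))]
print(unsorted_result)  # [('a', 4), ('b', 1), ('a', 1), ('b', 1)]

# Sorting by the grouping key first makes equal keys adjacent.
sorted_result = [(k, sum(map(itemgetter(1), v)))
                 for k, v in groupby(sorted(L, key=itemgetter(0)), key=itemgetter(0))]
print(sorted_result)  # [('a', 5), ('b', 2)]
```

Sorting costs O(n log n), whereas the dict-based answers need a single O(n) pass.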

Use the dictionary's get method:
>>> d = {}
>>> for item in L:
... d[item[0]] = d.get(item[0], 0) + item[1]
...
>>> d
{'a': 5, 'b': 2}

Related

Summing values in dictionary who stores list of tuples

I have a list of tuples named lista, and the values in this list look like:
[('A', 1234),('A', 9876),('B',6574),('B',9562), etc]
Next I create a defaultdict(list) where I store my tuples, and I get:
([('A', [1234, 9876]),('B',[6574, 9562]), etc])
To create this I wrote:
# list
lista = []
for w in data:
    if self.getAmountOfProceededInSpecificYear(w.status, w.year, w.district):
        lista.append((w.district, w.amount))
# dict
passed_dict = defaultdict(list)
for k, v in lista:
    passed_dict[k].append(v)
Now I want to sum up the values for each key and get:
'A', 11110
Does anybody know how to sum up these values?

You can use defaultdict(int) instead of defaultdict(list) and add each amount as you go:
from collections import defaultdict
lista = [('A', 1234),('A', 9876),('B',6574),('B',9562)]
passed_dict = defaultdict(int)
for k, v in lista:
    passed_dict[k] += v
print(passed_dict)
# defaultdict(<class 'int'>, {'A': 11110, 'B': 16136})
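If you already have the defaultdict(list) built in the question, summing each list with a dict comprehension gives the same totals. A minimal sketch, reusing the sample data from the answer above in place of the question's `data` objects:

```python
from collections import defaultdict

lista = [('A', 1234), ('A', 9876), ('B', 6574), ('B', 9562)]

# Grouping step from the question: key -> list of amounts.
passed_dict = defaultdict(list)
for k, v in lista:
    passed_dict[k].append(v)

# Sum each list to get key -> total.
totals = {k: sum(v) for k, v in passed_dict.items()}
print(totals)  # {'A': 11110, 'B': 16136}
```

Keeping the lists is useful if you need the individual amounts later; if you only need totals, defaultdict(int) avoids storing them.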

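You can use itemgetter and groupby. groupby only merges consecutive items, so this relies on lista already being ordered by key; sort it with sorted(lista, key=itemgetter(0)) first if it isn't.
If you want a list as output: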
from itertools import groupby
from operator import itemgetter
lista = [('A', 1234),('A', 9876),('B',6574),('B',9562)]
passed_dict = [(k, sum(list(zip(*v))[1])) for k, v in groupby(lista, itemgetter(0))]
# [('A', 11110), ('B', 16136)]
If you want a dictionary as an output
passed_dict = {k: sum(list(zip(*v))[1]) for k, v in groupby(lista, itemgetter(0))}
# {'A': 11110, 'B': 16136}

Python reduce sum tuple

I have an input that will vary in size.
data = [(("101","A"),5), (("105","C"),12), (("101", "B"),4)]
I'm looking for an output that groups by key[0], keeps all the key[1] items, and sums up the values.
output = [(("101", "A", "B"),9), (("105", "C"),12)]
I've tried:
my_dict = dict(data)
final_values = {}
for k, v in my_dict.items():
    key1 = k[0]
    key2 = k[1]
    if key1 not in final_values:
        final_values[key1] = []
    final_values[key1].append(key2)
    final_values[key1].append(v)
Which returns:
{'101': ['A', 5, 'B', 4], '105': ['C', 12]}
I'd like to get the sum of the numbers in the list.

for k in final_values:
    print('%s: sum is %d' % (k, sum(x for x in final_values[k] if type(x) is int)))

You can try using a collections.defaultdict() to group the items, then flattening the results at the end (the *-unpacking inside the tuple needs Python 3.5+):
from collections import defaultdict
from operator import itemgetter
data = [(("101","A"),5), (("105","C"),12), (("101", "B"),4)]
d = defaultdict(list)
for (x, y), z in data:
    d[x].append((y, z))
result = [
    ((k, *map(itemgetter(0), v)), sum(map(itemgetter(1), v)))
    for k, v in d.items()
]
print(result)
# [(('101', 'A', 'B'), 9), (('105', 'C'), 12)]
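The grouping and the summing can also happen in the same pass, by keeping a running total next to the label list. A minimal sketch using a plain dict with setdefault; the helper name group_and_sum is hypothetical, and the output order assumes Python 3.7+ dict ordering:

```python
data = [(("101", "A"), 5), (("105", "C"), 12), (("101", "B"), 4)]

def group_and_sum(pairs):
    """Group ((key, label), value) items by key, keeping labels and summing values."""
    groups = {}  # key -> [list of labels, running total]
    for (key, label), value in pairs:
        entry = groups.setdefault(key, [[], 0])
        entry[0].append(label)
        entry[1] += value
    # Flatten to ((key, label1, label2, ...), total), in first-seen key order.
    return [((key, *labels), total) for key, (labels, total) in groups.items()]

print(group_and_sum(data))  # [(('101', 'A', 'B'), 9), (('105', 'C'), 12)]
```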

Find count of identical adjacent characters in a string

I have a string: 'AAAAATTT'
I want to write a program that counts each time two adjacent characters are identical.
So in 'AAAAATTT' it would give a count of:
AA: 4
TT: 2

You can use collections.defaultdict for this. This is an O(n) solution that loops through adjacent pairs of letters and counts the pairs whose letters match.
Your output will be a dictionary with keys as repeated letters and values as counts.
itertools.islice is used to avoid building a new string (x[1:]) for the second argument of zip.
from collections import defaultdict
from itertools import islice
x = 'AAAAATTT'
d = defaultdict(int)
for i, j in zip(x, islice(x, 1, None)):
    if i == j:
        d[i+j] += 1
Result:
print(d)
# defaultdict(<class 'int'>, {'AA': 4, 'TT': 2})

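You could use a Counter. Note that v - 1 equals the number of adjacent pairs only when each character appears in a single run; for 'AABAA' it reports AA: 3 instead of 2.
from collections import Counter
s = 'AAAAATTT'
print([(k*2, v - 1) for k, v in Counter(s).items() if v > 1])
#output: [('AA', 4), ('TT', 2)]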
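Counting run by run keeps the speed of a single pass while also handling a character that appears in several separate runs: a run of length n contains n - 1 overlapping pairs. A minimal sketch using itertools.groupby on the string; the function name adjacent_pair_counts is hypothetical:

```python
from itertools import groupby

def adjacent_pair_counts(s):
    """Count overlapping pairs of identical adjacent characters, e.g. 'AA'."""
    counts = {}
    for ch, run in groupby(s):      # groupby(s) yields runs of equal characters
        n = sum(1 for _ in run)     # length of this run
        if n > 1:
            # a run of length n contains n - 1 overlapping adjacent pairs
            counts[ch * 2] = counts.get(ch * 2, 0) + n - 1
    return counts

print(adjacent_pair_counts('AAAAATTT'))  # {'AA': 4, 'TT': 2}
print(adjacent_pair_counts('AABAA'))     # {'AA': 2}
```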

You can use collections.Counter with a dict comprehension and zip:
>>> from collections import Counter
>>> s = 'AAAAATTT'
>>> {k: v for k, v in Counter(zip(s, s[1:])).items() if k[0]==k[1]}
{('A', 'A'): 4, ('T', 'T'): 2}
Here's another alternative using itertools.groupby, but it is not as clean as the solution above and will also be slower. It also keeps only the last run when the same character appears in separate runs (for 'AABAA' it reports ('A', 'A'): 1).
>>> from itertools import groupby
>>> {x[0]:len(x) for i,j in groupby(zip(s, s[1:]), lambda y: y[0]==y[1]) for x in (tuple(j),) if i}
{('A', 'A'): 4, ('T', 'T'): 2}

One way is to use Counter:
from collections import Counter
string = 'AAAAATTT'
result = dict(Counter(s1+s2 for s1, s2 in zip(string, string[1:]) if s1==s2))
print(result)
Result:
{'AA': 4, 'TT': 2}

You can also do it with just range, without importing anything:
data = 'AAAAATTT'
count_dict = {}
for i in range(len(data) - 1):  # stop one early so every slice has two characters
    data_x = data[i:i+2]
    if data_x[0] == data_x[1]:
        if data_x not in count_dict:
            count_dict[data_x] = 1
        else:
            count_dict[data_x] += 1
print(count_dict)
output:
{'AA': 4, 'TT': 2}

Splitting a list of tuples by 2nd element - python

How can I split a list of tuples by the 2nd element?
I can do it with two list comprehensions:
tup = [('x',1),('y',2),('z',1)]
ones = [i for i in tup if i[1] == 1]
twos = [i for i in tup if i[1] == 2]
but is there a way to avoid looping through the list twice, like this?
ones, twos = [], []
for i in tup:
    if i[1] == 1:
        ones.append(i)
    if i[1] == 2:
        twos.append(i)
Any other way?

Using a collections.defaultdict() object:
from collections import defaultdict
numbered = defaultdict(list)
for i in tup:
    numbered[i[1]].append(i)
Now numbered[1] contains a list of all ones and numbered[2] a list of all twos. This solution extends naturally to more values of i[1] without having to define any additional lists or if statements.
Demo:
>>> from collections import defaultdict
>>> tup = [('x',1),('y',2),('z',1)]
>>> numbered = defaultdict(list)
>>> for i in tup:
...     numbered[i[1]].append(i)
...
>>> numbered
defaultdict(<class 'list'>, {1: [('x', 1), ('z', 1)], 2: [('y', 2)]})
>>> numbered[1]
[('x', 1), ('z', 1)]
>>> numbered[2]
[('y', 2)]
A defaultdict is just a dict subclass with additional behaviour; you can do without it, with a little more code and a slight loss in speed:
numbered = {}
for i in tup:
    numbered.setdefault(i[1], []).append(i)
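One behaviour to watch for when unpacking the groups: reading a missing key from a defaultdict inserts it, while .get() only reads. A small sketch with the question's data (the keys 3 and 4 are just illustrative missing values):

```python
from collections import defaultdict

tup = [('x', 1), ('y', 2), ('z', 1)]
numbered = defaultdict(list)
for item in tup:
    numbered[item[1]].append(item)

ones, twos = numbered[1], numbered[2]
print(ones, twos)  # [('x', 1), ('z', 1)] [('y', 2)]

# Indexing a missing key creates it with an empty list ...
print(numbered[3])    # []
print(3 in numbered)  # True
# ... whereas .get() returns the default without inserting anything.
print(numbered.get(4, []), 4 in numbered)  # [] False
```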

How to write a function that takes a string and prints the letters in decreasing order of frequency?

I got this far:
def most_frequent(string):
    d = dict()
    for key in string:
        if key not in d:
            d[key] = 1
        else:
            d[key] += 1
    return d
print(most_frequent('aabbbc'))
Returning:
{'a': 2, 'b': 3, 'c': 1}
Now I need to:
reverse the pair
sort by count in decreasing order
only print the letters out
Should I convert this dictionary to tuples or a list?

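Here's a one-line answer (it sorts by count, then by letter, in ascending order; reverse the result for decreasing order):
sortedLetters = sorted(d.items(), key=lambda kv: (kv[1], kv[0]))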

This should do it nicely.
def frequency_analysis(string):
    d = dict()
    for key in string:
        d[key] = d.get(key, 0) + 1
    return d
def letters_in_order_of_frequency(string):
    frequencies = frequency_analysis(string)
    # frequencies has bounded size: the number of distinct letters is bounded by the alphabet, not by the input length
    frequency_list = [(freq, letter) for (letter, freq) in frequencies.items()]
    frequency_list.sort(reverse=True)
    return [letter for freq, letter in frequency_list]
string = 'aabbbc'
print(letters_in_order_of_frequency(string))

Here is something that returns a list of tuples rather than a dictionary:
import operator
if __name__ == '__main__':
    test_string = 'cnaa'
    string_dict = dict()
    for letter in test_string:
        if letter not in string_dict:
            string_dict[letter] = test_string.count(letter)
    # Sort dictionary by values, credits go here http://stackoverflow.com/questions/613183/sort-a-dictionary-in-python-by-the-value/613218#613218
    ordered_answer = sorted(string_dict.items(), key=operator.itemgetter(1), reverse=True)
    print(ordered_answer)

Python 2.7 (and 3.1+) supports this use case directly with collections.Counter:
>>> from collections import Counter
>>> Counter('abracadabra').most_common()
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
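To get just the letters, as the question asks, you can drop the counts from the most_common() pairs. A minimal sketch; the function name letters_by_frequency is hypothetical:

```python
from collections import Counter

def letters_by_frequency(s):
    """Return the distinct characters of s, most frequent first."""
    # most_common() with no argument returns every (char, count) pair,
    # highest count first.
    return ''.join(ch for ch, _ in Counter(s).most_common())

print(letters_by_frequency('aabbbc'))  # bac
```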

chills42's lambda function wins, I think, but as an alternative, how about generating the dictionary with the counts as the keys instead? (Letters that share a count overwrite each other here, so only one of them survives.)
def count_chars(string):
    distinct = set(string)
    dictionary = {}
    for s in distinct:
        num = len(string.split(s)) - 1  # number of occurrences of s
        dictionary[num] = s
    return dictionary
def print_dict_in_reverse_order(d):
    for s in sorted(d, reverse=True):
        print(d[s])
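A sketch that keeps every letter by mapping each count to a list of letters instead of a single one. It uses str.count in place of the split trick (both give the number of occurrences), and iterates in sorted order only so the output is deterministic:

```python
def count_chars(string):
    """Map each count to the list of characters that occur that many times."""
    by_count = {}
    for ch in sorted(set(string)):  # sorted only to make the output deterministic
        by_count.setdefault(string.count(ch), []).append(ch)
    return by_count

def print_dict_in_reverse_order(d):
    for count in sorted(d, reverse=True):
        print(count, ''.join(d[count]))

counts = count_chars('aabbc')
print(counts)  # {2: ['a', 'b'], 1: ['c']}
print_dict_in_reverse_order(counts)
```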

EDIT: This will do what you want. I'm borrowing chills42's line and adding another:
sortedLetters = sorted(d.items(), key=lambda kv: (kv[1], kv[0]))
sortedString = ''.join([c[0] for c in reversed(sortedLetters)])
------------original answer------------
To build the sorted string, add another line to chills42's one-liner:
''.join(map(lambda c: c[0] * c[1], reversed(sortedLetters)))
This produces 'bbbaac'.
If you want single letters, 'bac', use this:
''.join([c[0] for c in reversed(sortedLetters)])

from collections import defaultdict
def most_frequent(s):
    d = defaultdict(int)
    for c in s:
        d[c] += 1
    return "".join([
        k for k, v in sorted(
            d.items(), reverse=True, key=lambda kv: kv[1])
    ])
EDIT:
here is my one-liner:
def most_frequent(s):
    return "".join([
        c for frequency, c in sorted(
            [(s.count(c), c) for c in set(s)], reverse=True
        )
    ])
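Sorting (count, letter) pairs with reverse=True, as in the one-liner above, also orders tied letters reverse-alphabetically ('arbdc' for 'abracadabra'). If you would rather break ties alphabetically, negate the count in the sort key. A minimal sketch, assuming alphabetical tie-breaking is what you want:

```python
from collections import Counter

def most_frequent(s):
    """Distinct letters of s by decreasing count; ties broken alphabetically."""
    counts = Counter(s)
    # Negating the count sorts high counts first while letters stay ascending.
    return ''.join(sorted(counts, key=lambda c: (-counts[c], c)))

print(most_frequent('abracadabra'))  # abrcd
```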

Here's the code for your most_frequent function:
>>> a = 'aabbbc'
>>> {i: a.count(i) for i in set(a)}
{'a': 2, 'c': 1, 'b': 3}
Dict comprehensions need Python 2.7 or 3.x; on older versions, dict((i, a.count(i)) for i in set(a)) does the same. It seems to me a bit more readable than yours.

from collections import defaultdict
def reversedSortedFrequency(string):
    d = defaultdict(int)
    for c in string:
        d[c] += 1
    return sorted([(v, k) for k, v in d.items()], key=lambda pair: -pair[0])

Here is the fixed version (thank you for pointing out bugs):
from functools import reduce
def frequency(s):
    return ''.join(
        [k for k, v in
         sorted(
             reduce(
                 lambda d, c: d.update([[c, d.get(c, 0) + 1]]) or d,
                 list(s),
                 dict()).items(),
             key=lambda kv: kv[1],
             reverse=True)])
I think the use of reduce is what sets this solution apart from the others...
In action:
>>> from frequency import frequency
>>> frequency('abbbccddddxxxyyyyyz')
'ydbxcaz'
This includes extracting the keys (and counting them) as well! Another nice property is that the dictionary is initialized on the same line :)
Also: apart from reduce, which Python 3 moved into functools, it uses only builtins.
The reduce function is kinda hard to wrap my head around, and setting dictionary values in a lambda is also a bit cumbersome in Python, but, ah well, it works!
