Python tuple operations and count - python

I have the following list of tuples. I want to build a string that looks like the output shown below. I want to count the elements corresponding to 'a', i.e. how many times k1 occurred with 'a', and so on. What is the easiest way to do this?
a=[('a','k1'),('b','k2'),('a','k2'),('a','k1'),('b','k2'),('a','k1'),('b','k2'),('c','k3'),('c','k4')]
The output should be a single string, output = "", containing:
a k1 3
a k2 1
b k1 1
b k2 3
c k3 1
c k4 1

Use the Counter class from collections:
>>> a = [('a', 'k1'), ('b', 'k2'), ('a', 'k2'), ('a', 'k1'), ('b', 'k2'), ('a', 'k1'), ('b', 'k2'), ('c', 'k3'), ('c', 'k4')]
>>> from collections import Counter
>>> c = Counter(a)
>>> c
Counter({('b', 'k2'): 3, ('a', 'k1'): 3, ('a', 'k2'): 1, ('c', 'k3'): 1, ('c', 'k4'): 1})
You can use c.items() to iterate over the counts:
>>> for item in c.items():
...     print(item)
...
(('a', 'k2'), 1)
(('c', 'k3'), 1)
(('b', 'k2'), 3)
(('a', 'k1'), 3)
(('c', 'k4'), 1)
You can now rearrange the items in the desired order and convert them to a string if needed. The code above is Python 3; the Counter class has been available since Python 2.7.
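To go from the counts to the string the question asks for, one possible sketch sorts the items and joins them with newlines. The names `lines` and `output` are illustrative choices, and only pairs that actually occur in the list show up in the result.

```python
from collections import Counter

a = [('a', 'k1'), ('b', 'k2'), ('a', 'k2'), ('a', 'k1'), ('b', 'k2'),
     ('a', 'k1'), ('b', 'k2'), ('c', 'k3'), ('c', 'k4')]

c = Counter(a)

# Sort by (first, second) element so the lines come out grouped by letter.
lines = ['%s %s %d' % (first, second, count)
         for (first, second), count in sorted(c.items())]
output = '\n'.join(lines)
print(output)
```

Sorting the items is a design choice here; without it the lines follow the order in which each pair was first seen.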

You can do the counting part easily with defaultdict. A defaultdict works like a normal dictionary, except that it supplies a default value for missing keys, so you can increment a counter directly as you iterate over your data set.
a=[('a','k1'),('b','k2'),('a','k2'),('a','k1'),('b','k2'),('a','k1'),('b','k2'),('c','k3'),('c','k4')]
from collections import defaultdict
b = defaultdict(int)
for item in a:
    b[item] += 1
print(b)
defaultdict(<class 'int'>, {('a', 'k1'): 3, ('b', 'k2'): 3, ('a', 'k2'): 1, ('c', 'k3'): 1, ('c', 'k4'): 1})
To pretty-print it, iterate over the resulting data and format each entry however you want.
for key, value in b.items():
    print('%s %s %s' % (key[0], key[1], value))

Related

Sort the letter count descending [duplicate]

Other than building a list comprehension over a reversed list comprehension, is there a Pythonic way to sort a Counter by value? If so, is it faster than this:
>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> sorted(x)
['a', 'b', 'c']
>>> sorted(x.items())
[('a', 5), ('b', 3), ('c', 7)]
>>> [(l,k) for k,l in sorted([(j,i) for i,j in x.items()])]
[('b', 3), ('a', 5), ('c', 7)]
>>> [(l,k) for k,l in sorted([(j,i) for i,j in x.items()], reverse=True)]
[('c', 7), ('a', 5), ('b', 3)]
Use the Counter.most_common() method; it'll sort the items for you:
>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> x.most_common()
[('c', 7), ('a', 5), ('b', 3)]
It does so efficiently: if you ask for the top N instead of all values, heapq.nlargest() is used instead of a full sort:
>>> x.most_common(1)
[('c', 7)]
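As a rough illustration of the top-N case: in CPython, most_common(n) is implemented with heapq.nlargest over the counter's items, so the two calls below should agree. The variable names `top_two` and `via_heap` exist only for this sketch.

```python
import heapq
from collections import Counter
from operator import itemgetter

x = Counter({'a': 5, 'b': 3, 'c': 7})

top_two = x.most_common(2)
# Select the two largest (key, count) pairs by count without sorting everything.
via_heap = heapq.nlargest(2, x.items(), key=itemgetter(1))

print(top_two)
print(via_heap)
```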
Outside of counters, sorting can always be adjusted with a key function: .sort() and sorted() both take a callable that lets you specify the value on which to sort the input sequence. For example, sorted(x, key=x.get, reverse=True) gives the same ordering as x.most_common(), but returns only the keys:
>>> sorted(x, key=x.get, reverse=True)
['c', 'a', 'b']
Or, given (key, value) pairs, you can sort on the value alone:
>>> sorted(x.items(), key=lambda pair: pair[1], reverse=True)
[('c', 7), ('a', 5), ('b', 3)]
See the Python sorting howto for more information.
A rather nice addition to @MartijnPieters' answer is to get back a dictionary sorted by occurrence, since Counter.most_common() only returns a list of tuples. I often couple this with JSON output for handy log files:
from collections import Counter, OrderedDict
x = Counter({'a':5, 'b':3, 'c':7})
y = OrderedDict(x.most_common())
With the output:
OrderedDict([('c', 7), ('a', 5), ('b', 3)])
{
    "c": 7,
    "a": 5,
    "b": 3
}
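The JSON shown above is not produced by the snippet itself. One way to get it, assuming the standard json module and an indent of 4, is sketched here.

```python
import json
from collections import Counter, OrderedDict

x = Counter({'a': 5, 'b': 3, 'c': 7})
y = OrderedDict(x.most_common())

# json.dumps keeps the insertion order of the mapping it is given.
text = json.dumps(y, indent=4)
print(text)
```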
Yes:
>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
Using sorted() with the key keyword and a lambda function:
>>> sorted(x.items(), key=lambda i: i[1])
[('b', 3), ('a', 5), ('c', 7)]
>>> sorted(x.items(), key=lambda i: i[1], reverse=True)
[('c', 7), ('a', 5), ('b', 3)]
This works for all dictionaries. However, Counter has a dedicated method that already returns the items sorted from most frequent to least frequent. It's called most_common():
>>> x.most_common()
[('c', 7), ('a', 5), ('b', 3)]
>>> list(reversed(x.most_common())) # in order of least to most
[('b', 3), ('a', 5), ('c', 7)]
You can also specify how many items you want to see:
>>> x.most_common(2) # specify number you want
[('c', 7), ('a', 5)]
More generally, the key keyword of sorted() defines the sort order, and a minus sign in front of a numeric value gives descending order:
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> sorted(x.items(), key=lambda k: -k[1]) # descending
[('c', 7), ('a', 5), ('b', 3)]
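When several keys share a count, a tuple key makes the tie-break explicit. This sketch, with an extra key 'd' added purely for illustration, sorts by descending count and then alphabetically.

```python
from collections import Counter

x = Counter({'a': 5, 'b': 3, 'c': 7, 'd': 5})  # 'd' is a made-up tie with 'a'

# Negate the count for descending order; the key itself breaks ties ascending.
ranked = sorted(x.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```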

Python: How to count the tuple ('a' , 'b') and tuple ('b', 'a') in a list as the same thing ? [duplicate]

I want to get the following results:
Input: list = [('a' , 'b'), ('b', 'a'), ('c', 'd'), ('d','e'), ('e','d')]
Output: Counter({('a','b'):2,('c','d'):1, ('d','e'):2})
I have tried to implement the counter as:
count = Counter(list)
But it only returns:
Counter({('a', 'b'):1, ('b', 'a'):1, ('c', 'd'):1, ('d', 'e'):1, ('e','d'):1})
Simply sort the tuples first:
In [21]: l = [tuple(sorted(i)) for i in l]
In [22]: l
Out[22]: [('a', 'b'), ('a', 'b'), ('c', 'd'), ('d', 'e'), ('d', 'e')]
In [23]: Counter(l)
Out[23]: Counter({('a', 'b'): 2, ('c', 'd'): 1, ('d', 'e'): 2})
You can use collections.defaultdict with frozenset.
This works because a frozenset is hashable and can therefore be used as a dictionary key.
If you do not need to preserve the order of the elements within each tuple, this is a good alternative to sorting every tuple.
from collections import defaultdict
lst = [('a' , 'b'), ('b', 'a'), ('c', 'd'), ('d','e'), ('e','d')]
d = defaultdict(int)
for i in map(frozenset, lst):
    d[i] += 1
# defaultdict(int,
#             {frozenset({'a', 'b'}): 2,
#              frozenset({'c', 'd'}): 1,
#              frozenset({'d', 'e'}): 2})
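The same idea also works with Counter directly. One caveat worth keeping in mind: frozenset drops repeated elements, so a pair such as ('a', 'a') becomes a one-element set, whereas the sorted-tuple key keeps it as a pair. A minimal sketch comparing the two keys (the extra ('a', 'a') entry is added here for illustration):

```python
from collections import Counter

lst = [('a', 'b'), ('b', 'a'), ('c', 'd'), ('d', 'e'), ('e', 'd'), ('a', 'a')]

by_set = Counter(map(frozenset, lst))
by_sorted = Counter(tuple(sorted(t)) for t in lst)

print(by_set[frozenset('ab')])   # ('a', 'b') and ('b', 'a') counted together
print(by_set[frozenset('a')])    # ('a', 'a') collapsed to a single element
print(by_sorted[('a', 'a')])     # the sorted tuple keeps both elements
```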

Most Pythonic way for creating a defaultdictionary counter

I am trying to count occurrences of various items based on a condition. What I have so far is a function that, given pairs of items, increments a counter like this:
given [('a', 'a'), ('a', 'b'), ('b', 'a')], it outputs defaultdict(<class 'collections.Counter'>, {'a': Counter({'a': 1, 'b': 1}), 'b': Counter({'a': 1})})
The function can be seen below:
from collections import Counter, defaultdict
def freq(samples=None):
    out = defaultdict(Counter)
    if samples:
        for (c, s) in samples:
            out[c][s] += 1
    return out
It is limited to 2-tuples, though. I would like it to be more generic and work with tuples of any length, e.g. [('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a')] should still work, and I should be able to query the result with, say, res['a']['b'] and get the count for 'c', which is one.
What would be the best way to do this in Python?
Assuming all tuples in the list have the same length:
from collections import Counter
from itertools import groupby
from operator import itemgetter
def freq(samples=()):
    sorted_samples = sorted(samples)
    if sorted_samples and len(sorted_samples[0]) > 2:
        return {key: freq(value[1:] for value in values) for key, values in groupby(sorted_samples, itemgetter(0))}
    else:
        return {key: Counter(value[1] for value in values) for key, values in groupby(sorted_samples, itemgetter(0))}
That gives:
>>> freq([('a', 'a'), ('a', 'b'), ('b', 'a'), ('a', 'c')])
{'a': Counter({'a': 1, 'b': 1, 'c': 1}), 'b': Counter({'a': 1})}
>>> freq([('a', 'a', 'a'), ('a', 'b', 'c'), ('b', 'a', 'a'), ('a', 'c', 'c')])
{'a': {'a': Counter({'a': 1}), 'b': Counter({'c': 1}), 'c': Counter({'c': 1})}, 'b': {'a': Counter({'a': 1})}}
One option is to use the full tuples as keys:
def freq(samples=()):
    out = Counter()
    for sample in samples:
        out[sample] += 1
    return out
which would then return something like
Counter({('a', 'a', 'b'): 1, ('a', 'b', 'c'): 1, ('b', 'a', 'a'): 1})
You could convert the tuples to strings to select certain slices, e.g. "('a', 'b',". For example, in a new dictionary: {k: v for k, v in out.items() if str(k)[:10] == "('a', 'b',"}.
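Comparing tuple slices avoids depending on the exact string representation. A small sketch of that alternative, where `prefix` and `selected` are names chosen for this example and the sample data is made up:

```python
from collections import Counter

samples = [('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a'), ('a', 'b', 'd')]
out = Counter(samples)

prefix = ('a', 'b')
# Keep only the entries whose leading elements match the prefix.
selected = {k: v for k, v in out.items() if k[:len(prefix)] == prefix}
print(selected)
```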
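If all the groups have the same length, either 2 or 3 but never a mix, you can change it to:
from collections import defaultdict
def freq(samples):
    l = len(samples[0])
    if l == 2:
        # two levels: out[a][b] is an int count
        out = defaultdict(lambda: defaultdict(int))
        for a, b in samples:
            out[a][b] += 1
    elif l == 3:
        # three levels: out[a][b][c] is an int count
        out = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        for a, b, c in samples:
            out[a][b][c] += 1
    else:
        raise ValueError('expected tuples of length 2 or 3')
    return out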
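For tuples of any fixed length, the nesting can be built recursively instead of being written out per length. This is only a sketch: `nested_counter` is a helper invented here, and it assumes every tuple in the list has the same length.

```python
from collections import Counter, defaultdict

def nested_counter(depth):
    """Build `depth` levels of defaultdict with a Counter at the bottom."""
    if depth == 0:
        return Counter()
    return defaultdict(lambda: nested_counter(depth - 1))

def freq(samples):
    samples = list(samples)
    if not samples:
        return {}
    out = nested_counter(len(samples[0]) - 1)
    for sample in samples:
        node = out
        # Walk down one level per leading element, creating levels on demand.
        for key in sample[:-1]:
            node = node[key]
        # The last element is counted in the innermost Counter.
        node[sample[-1]] += 1
    return out

res = freq([('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a')])
print(res['a']['b']['c'])
```

Because the levels are defaultdicts, looking up a key that was never seen creates an empty entry; convert the result to plain dicts first if that matters.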

Matrix file to dictionary in python

I have a file matrix.txt that contains :
A B C
A 1 2 3
B 4 5 6
C 7 8 9
I want to read the content of the file and store it in a dictionary as following :
{('A', 'A') : 1, ('A', 'B') : 2, ('A', 'C') : 3,
('B', 'A') : 4, ('B', 'B') : 5, ('B', 'C') : 6,
('C', 'A') : 7, ('C', 'B') : 8, ('C', 'C') : 9}
The following Python 3 function yields every matrix item together with its indices, in a form compatible with the dict constructor:
def read_mx_cells(file, parse_cell=lambda x: x):
    rows = (line.rstrip().split() for line in file)
    header = next(rows)
    for row in rows:
        row_id = row[0]
        for col_id, cell in zip(header, row[1:]):
            yield ((row_id, col_id), parse_cell(cell))
with open('matrix.txt') as f:
    for x in read_mx_cells(f, int):
        print(x)
# (('A', 'A'), 1)
# (('A', 'B'), 2)
# (('A', 'C'), 3) ...
with open('matrix.txt') as f:
    print(dict(read_mx_cells(f, int)))
# {('A', 'A'): 1, ('A', 'B'): 2, ('A', 'C'): 3 ... }
# Note that Python dicts only retain insertion order from Python 3.7 onward
You can use itertools.product to create the keys from the file header and the first column, which you get by transposing the remaining rows. Then zip the keys with the values, obtained by transposing the remaining columns back to their original row layout and chaining them into a single iterable of the split substrings. To maintain order we also need to use an OrderedDict:
from collections import OrderedDict
from itertools import izip, product, imap, chain
with open("matrix.txt") as f:
    head, zipped = next(f).split(), izip(*imap(str.split, f))
    cols = next(zipped)
    od = OrderedDict(zip(product(head, cols), chain.from_iterable(izip(*zipped))))
Output:
OrderedDict([(('A', 'A'), '1'), (('A', 'B'), '2'), (('A', 'C'), '3'),
(('B', 'A'), '4'), (('B', 'B'), '5'), (('B', 'C'), '6'), (('C', 'A'), '7'),
(('C', 'B'), '8'), (('C', 'C'), '9')])
For Python 3, just use the built-in map and zip instead of imap and izip.
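A possible Python 3 version of the transposing approach is sketched below. It reads from an in-memory string instead of matrix.txt so that it runs on its own, pairs row labels with column labels via product(rows, head), and relies on plain dicts keeping insertion order (Python 3.7+) rather than on OrderedDict.

```python
import io
from itertools import chain, product

# Stand-in for the contents of matrix.txt (an assumption for this sketch).
text = "A B C\nA 1 2 3\nB 4 5 6\nC 7 8 9\n"

with io.StringIO(text) as f:
    head = next(f).split()                  # column labels from the header line
    zipped = zip(*map(str.split, f))        # transpose the remaining rows
    rows = next(zipped)                     # first column holds the row labels
    values = chain.from_iterable(zip(*zipped))  # transpose back, flatten by row
    d = dict(zip(product(rows, head), values))

print(d)
```

The values stay as strings, as in the output above; wrap `values` in map(int, ...) if integers are wanted.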
Or without transposing and using the csv lib:
from collections import OrderedDict
from itertools import izip,repeat
import csv
with open("matrix.txt") as f:
    r = csv.reader(f, delimiter=" ", skipinitialspace=True)
    head = repeat(next(r))
    od = OrderedDict((((row[0], k), v) for row in r
                      for k, v in izip(next(head), row[1:])))
The output will be the same.
pandas makes it pretty neat.
import pandas as pd
Approach 1
df = pd.read_table('matrix.txt', sep=' ')
>>> df
A B C
A 1 2 3
B 4 5 6
C 7 8 9
d = df.to_dict()
>>> d
{'A': {'A': 1, 'B': 4, 'C': 7},
'B': {'A': 2, 'B': 5, 'C': 8},
'C': {'A': 3, 'B': 6, 'C': 9}}
new_d = {(r, c): v for c, v1 in d.items() for r, v in v1.items()}
>>> new_d
{('A', 'A'): 1,
('A', 'B'): 2,
('A', 'C'): 3,
('B', 'A'): 4,
('B', 'B'): 5,
('B', 'C'): 6,
('C', 'A'): 7,
('C', 'B'): 8,
('C', 'C'): 9}
Approach 2
df = pd.read_table('matrix.txt', sep=' ')
>>> df
A B C
A 1 2 3
B 4 5 6
C 7 8 9
new_d = {}
for r, v in df.iterrows():
    for c, v1 in v.items():
        new_d[(r, c)] = v1
>>> new_d
{('A', 'A'): 1,
('A', 'B'): 2,
('A', 'C'): 3,
('B', 'A'): 4,
('B', 'B'): 5,
('B', 'C'): 6,
('C', 'A'): 7,
('C', 'B'): 8,
('C', 'C'): 9}

