In Python 2.7: I'm measuring a process that counts the keys of a dictionary returned from a function.
A basic example is shown where the function getList() returns a list of chars which may be ['a'], ['b'], ['c'] or ['d']; most lists are single elements though two may be returned sometimes, e.g. ['a', 'd']. I'd like to count everything returned. A way I thought of doing this is shown below:
myDict = {'a':0, 'b':0, 'c':0, 'd':0, 'error':0, 'total':0}
for key in charList:
    myDict[key] += 1
    myDict['total'] += 1
Is there a more Pythonic way, perhaps a dictionary comprehension, to count keys within lists (of varying length)?
import random

def getList():
    '''mimics a process that returns a list of chars between a - d
    [most lists are single elements, though some are two elements]'''
    number = random.randint(97, 101)
    if number == 101:
        charList = [chr(number-1), chr(random.randint(97,100))]
        if charList[0] == charList[1]:
            # retry and return the new result (a bare call would discard it)
            return getList()
    else:
        charList = [chr(number)]
    return charList

myDict = {'a':0, 'b':0, 'c':0, 'd':0, 'error':0, 'total':0}

for counter in range(0,5):
    charList = getList()
    for key in charList:
        print charList, '\t', key
        try:
            myDict[key] += 1
            myDict['total'] += 1
        except KeyError:
            # unknown character: count it as an error
            myDict['error'] += 1

print "\n",myDict
Output generated:
You can use the built-in collections.Counter class: https://docs.python.org/2/library/collections.html#collections.Counter
For example with your code:
import collections
ctr = collections.Counter()
for ii in range(0,5):
    charList = getList()
    ctr.update(charList)
ctr['total'] = sum(ctr.values())
print ctr
This will print:
Counter({'total': 7, 'd': 5, 'a': 1, 'c': 1})
You can use collections.Counter:
# You need to initialize the counter or you won't get the entry with 0 count.
myDict = collections.Counter({'a': 0, 'b': 0, 'c': 0, 'd': 0})
myDict.update(x for _ in range(0, 5) for x in getList())
# Then create the 'total' entry
myDict['total'] = sum(myDict.values())
Note: This may add new keys to the counter without setting the 'error' entry if the list returned by getList() contains new characters ('e', 'f', ...).
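One way to handle the case in the note is to check each element before counting it. This is only a minimal sketch, assuming the valid characters are 'a' to 'd'; the names VALID_KEYS and count_chars are illustrative and not part of the original answer. It is written to run on Python 3 as well.

```python
import collections

VALID_KEYS = set('abcd')  # assumption: only these characters are expected

def count_chars(char_lists):
    """Count characters across several lists, routing unknown ones to 'error'."""
    counts = collections.Counter(dict.fromkeys(VALID_KEYS, 0))
    counts['error'] = 0
    for char_list in char_lists:
        for ch in char_list:
            counts[ch if ch in VALID_KEYS else 'error'] += 1
    # as in the question, 'total' only includes successfully counted characters
    counts['total'] = sum(counts[k] for k in VALID_KEYS)
    return counts

result = count_chars([['a'], ['d'], ['a', 'd'], ['e']])
```

Here the unexpected 'e' lands in result['error'] instead of creating a new key.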
Use collections.Counter and a double-loop generator expression for feeding the individual elements into the counter:
>>> lst = [['a'], ['a', 'b'], ['c'], ['c', 'd']]
>>> c = collections.Counter((y for x in lst for y in x))
>>> c
Counter({'a': 2, 'c': 2, 'b': 1, 'd': 1})
>>> c.most_common(2)
[('a', 2), ('c', 2)]
>>> sum(c.values())
6
The easiest way I can think of is to flatten your list with chain and then use a Counter: let lst be the list [['a'], ['b'], ['c'], ['d'], ['a', 'd']]
>>> from itertools import chain
>>> from collections import Counter
>>> c = Counter(chain(*lst))
>>> c['total'] = sum(c.values())
>>> c
Counter({'total': 6, 'd': 2, 'a': 2, 'b': 1, 'c': 1})
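A small variation, shown as a sketch: chain.from_iterable(lst) flattens the same nested list without unpacking it into separate arguments, which may be preferable when the outer list is long or is itself a generator. The total is kept in its own variable here (an illustrative choice) so it does not show up as a counted key.

```python
from collections import Counter
from itertools import chain

lst = [['a'], ['b'], ['c'], ['d'], ['a', 'd']]

# from_iterable walks the outer list lazily instead of unpacking it with *
c = Counter(chain.from_iterable(lst))
total = sum(c.values())
```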
Related
I would like to know how to change this code so that it does not use the function zip. I haven't been taught this function yet, so I want to know if there is an alternative way to get the output I require.
list_one = ['a', 'a', 'c', 'd']
list_two = [1, 2, 3, 4]
dict_1 = {}
for key, value in zip(list_one, list_two):
    if key not in dict_1:
        dict_1[key] = [value]
    else:
        dict_1[key].append(value)
print(dict_1)
I would like the output to be:
{'a': [1, 2], 'd': [4], 'c': [3]}
A simple way to do this:
l1 = ['a', 'a', 'c', 'd']
l2 = [1, 2, 3, 4]
# Dict comprehension to initialize keys: list pairs
dct = {x: [] for x in l1}
# Append value related to key
for i in range(len(l1)):
    dct[l1[i]].append(l2[i])
print(dct)
Output:
{'a': [1, 2], 'c': [3], 'd': [4]}
zip() makes it easy to iterate through multiple lists in parallel. Once you understand what zip() does, it is easy to replicate; the official docs have an explanation and an example.
Below is an example implementation:
list_one = ['a', 'a', 'c', 'd']
list_two = [1,2,3,4]
dict_1={}
for index in range(len(list_one)):
    if list_one[index] not in dict_1:
        dict_1[list_one[index]]=[list_two[index]]
    else:
        dict_1[list_one[index]].append(list_two[index])
print(dict_1)
Output:
{'a': [1, 2], 'c': [3], 'd': [4]}
You can rewrite like this without any library.
list_one = ['a', 'a', 'c', 'd']
list_two = [1,2,3,4]
dict_1={}
if len(list_one) == len(list_two):
    for i in range(len(list_one)):
        if list_one[i] not in dict_1:
            dict_1[list_one[i]]=[list_two[i]]
        else:
            dict_1[list_one[i]].append(list_two[i])
print(dict_1)
Assuming the two lists always have the same length, you can circumvent the use of zip() by iterating over the indices:
dict_1 = {}
for i in range(len(list_one)):
    key = list_one[i]
    value = list_two[i]
    if key not in dict_1:
        dict_1[key] = [value]
    else:
        dict_1[key].append(value)
print(dict_1)
As others have said, this is not a recommended way to do this, because the code using zip() is more readable, and zip() is a built-in function, so there shouldn't be any reason not to use it.
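If zip() really has to be avoided, the index-based loops in this thread can be shortened a little. The following is only a sketch, assuming both lists have the same length: enumerate() supplies the index and dict.setdefault() replaces the if/else branch.

```python
list_one = ['a', 'a', 'c', 'd']
list_two = [1, 2, 3, 4]

dict_1 = {}
for i, key in enumerate(list_one):
    # setdefault returns the list already stored for key,
    # or stores a new empty list first and returns that
    dict_1.setdefault(key, []).append(list_two[i])
```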
In a Python list, I want to delete all elements that are repeated fewer than k times.
For example, if k == 3 and our list is:
l = [a,b,c,c,c,a,d,e,e,d,d]
then the output must be:
[c,c,c,d,d,d]
What is a fast way to do that (my data is large)? Any good Pythonic suggestion?
This is what I coded, but I don't think it is the fastest or most Pythonic way:
from collections import Counter
l = ['a', 'b', 'c', 'c', 'c', 'a', 'd', 'e', 'e', 'd', 'd']
counted = Counter(l)
temp = []
for i in counted:
    if counted[i] < 3:
        temp.append(i)
new_l = []
for i in l:
    if i not in temp:
        new_l.append(i)
print(new_l)
You can use collections.Counter to construct a dictionary mapping values to counts. Then use a list comprehension to keep the values whose count is at least a specified threshold.
from collections import Counter
L = list('abcccadeedd')
c = Counter(L)
res = [x for x in L if c[x] >=3]
# ['c', 'c', 'c', 'd', 'd', 'd']
A brute-force option would be to get the number of occurrences per item, then filter that output. The collections.Counter object works nicely here:
from collections import Counter
l = ['a', 'b', 'c', 'c', 'c', 'a', 'd', 'e', 'e', 'd', 'd']
c = Counter(l)
# Counter looks like {'a': 2, 'b': 1, 'c': 3...}
l = [item for item in l if c[item]>=3]
Under the hood, Counter acts as a dictionary, which you can build yourself like so:
c = {}
for item in l:
    # This will check if item is in the dictionary
    # if it is, add to current count, if it is not, start at 0
    # and add 1
    c[item] = c.get(item, 0) + 1
# And the rest of the syntax follows from here
l = [item for item in l if c[item]>=3]
I would use a Counter from collections:
from collections import Counter
count_dict = Counter(l)
[el for el in l if count_dict[el]>2]
Any drawback with this option?
l = ['a','b','c','c','c','a','d','e','e','d','d']
res = [ e for e in l if l.count(e) >= 3]
#=> ['c', 'c', 'c', 'd', 'd', 'd']
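One likely drawback is cost: l.count(e) rescans the whole list for every element, so the work grows with the square of the list length, while counting once up front stays roughly linear. The sketch below (function names are illustrative) shows both variants so they can be compared on the sample data:

```python
from collections import Counter

def filter_with_count(items, k):
    # list.count walks the entire list on every call: quadratic overall
    return [e for e in items if items.count(e) >= k]

def filter_with_counter(items, k):
    # one counting pass, then one filtering pass
    counts = Counter(items)
    return [e for e in items if counts[e] >= k]

l = ['a', 'b', 'c', 'c', 'c', 'a', 'd', 'e', 'e', 'd', 'd']
slow = filter_with_count(l, 3)
fast = filter_with_counter(l, 3)
```

Both return the same list; the difference only matters once the input gets large, which is the case the question describes.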
Say I have a list with some numbers that are duplicates.
list = [1,1,1,1,2,3,4,4,1,2,5,6]
I want to identify all the elements in the list that are repeating and consecutive, including the first element, i.e. replacing all elements in the list to values in a dictionary:
mydict = {1: 'a', 4: 'd'}
list = ['a','a','a','a',2,3,'d','d',1,2,5,6]
Because I want to replace the first instance of the repetition as well, I am quite confused as to how to proceed!
itertools.groupby is your friend:
from itertools import groupby
mydict = {1: 'a', 4: 'd'}
A = [1,1,1,1,2,3,4,4,1,2,5,6]
res = []
for k, g in groupby(A):
    size = len(list(g))
    if size > 1:
        res.extend([mydict[k]] * size)  # see note 1
    else:
        res.append(k)
print(res)  # -> ['a', 'a', 'a', 'a', 2, 3, 'd', 'd', 1, 2, 5, 6]
Notes:
If you want to catch possible KeyErrors and have a default value you want to fall back on, use mydict.get(k, <default>) instead of mydict[k]
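The fallback mentioned in the note could look like the sketch below. The choice of default is an assumption made for illustration: a repeated value with no entry in the mapping is simply kept as it is. The function name replace_runs is also illustrative.

```python
from itertools import groupby

def replace_runs(values, mapping):
    """Replace consecutive repeats using mapping, keeping unmapped values."""
    result = []
    for key, group in groupby(values):
        size = len(list(group))
        if size > 1:
            # mapping.get(key, key) falls back to the value itself
            result.extend([mapping.get(key, key)] * size)
        else:
            result.append(key)
    return result

# the trailing 6, 6 run has no entry in the mapping
res = replace_runs([1, 1, 1, 1, 2, 3, 4, 4, 1, 2, 5, 6, 6], {1: 'a', 4: 'd'})
```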
I have dictionary like below
dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
Then in following dictionary two dictionaries are same, so expected result will be like below
result = [['a','c'],['b','d']]
>>> seen = {}
>>> dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
>>> for k in dict1:
        fs = frozenset(dict1[k].items())
        seen.setdefault(fs, []).append(k)
>>> seen.values() # note: unordered
[['a', 'c'], ['b', 'd']]
If order is needed:
>>> from collections import OrderedDict
>>> dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
>>> seen = OrderedDict()
>>> for k in sorted(dict1):
        fs = frozenset(dict1[k].items())
        seen.setdefault(fs, []).append(k)
>>> seen.values()
[['a', 'c'], ['b', 'd']]
Note: This code is currently cross-compatible on Python 2/3. On Python 2 you can make it more efficient by using .iteritems() instead of .items()
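The same idea can be wrapped in a function. This is a sketch only: the name group_equal_values is illustrative, and it assumes the inner dictionaries hold hashable values, since frozenset() needs that. The groups are sorted at the end so the result does not depend on dictionary ordering, and a plain list is returned on both Python 2 and 3.

```python
def group_equal_values(mapping):
    """Group the keys of mapping whose inner dictionaries are equal."""
    groups = {}
    for key in sorted(mapping):
        # a frozenset of (key, value) pairs is hashable and order-independent
        signature = frozenset(mapping[key].items())
        groups.setdefault(signature, []).append(key)
    return sorted(groups.values())

dict1 = {'a': {'a': 20, 'b': 30}, 'b': {'a': 30, 'b': 40},
         'c': {'a': 20, 'b': 30}, 'd': {'a': 30, 'b': 40}}
result = group_equal_values(dict1)
```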
A quick one: first collect the distinct values, then use a list comprehension.
>>> values = []
>>> for k in dict1:
        if dict1[k] not in values:
            values.append(dict1[k])
>>> values
[{'a': 20, 'b': 30}, {'a': 30, 'b': 40}]
>>> [[k for k in dict1 if dict1[k] == v] for v in values]
[['a', 'c'], ['b', 'd']]
Here is a list containing duplicates:
l1 = ['a', 'b', 'c', 'a', 'a', 'b']
Here is the desired result:
l1 = ['a', 'b', 'c', 'a_1', 'a_2', 'b_1']
How can the duplicates be renamed by appending a count number?
Here is an attempt to achieve this goal; however, is there a more Pythonic way?
for index in range(len(l1)):
    counter = 1
    list_of_duplicates_for_item = [dup_index for dup_index, item in enumerate(l1) if item == l1[index] and l1.count(l1[index]) > 1]
    for dup_index in list_of_duplicates_for_item[1:]:
        l1[dup_index] = l1[dup_index] + '_' + str(counter)
        counter = counter + 1
In Python, generating a new list is usually much easier than changing an existing list. We have generators to do this efficiently. A dict can keep count of occurrences.
l = ['a', 'b', 'c', 'a', 'a', 'b']
def rename_duplicates( old ):
    seen = {}
    for x in old:
        if x in seen:
            seen[x] += 1
            yield "%s_%d" % (x, seen[x])
        else:
            seen[x] = 0
            yield x
print list(rename_duplicates(l))
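For comparison, here is a sketch of the same generator idea using collections.Counter, which returns 0 for keys it has not seen and so removes the if/else bookkeeping. It is a variation for illustration, not the answer's own code, and it also runs on Python 3.

```python
from collections import Counter

def rename_duplicates(old):
    seen = Counter()
    for x in old:
        # the first occurrence keeps its name; later ones get _1, _2, ...
        yield x if seen[x] == 0 else "%s_%d" % (x, seen[x])
        seen[x] += 1

renamed = list(rename_duplicates(['a', 'b', 'c', 'a', 'a', 'b']))
```

Note that neither version checks whether a generated name such as 'a_1' already exists in the input.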
I would do something like this:
a1 = ['a', 'b', 'c', 'a', 'a', 'b']
a2 = []
d = {}
for i in a1:
    d.setdefault(i, -1)
    d[i] += 1
    if d[i] >= 1:
        a2.append('%s_%d' % (i, d[i]))
    else:
        a2.append(i)
print a2
Based on your comment to @mathmike, if your ultimate goal is to create a dictionary from a list with duplicate keys, I would use a defaultdict from the collections library.
>>> from collections import defaultdict
>>> multidict = defaultdict(list)
>>> multidict['a'].append(1)
>>> multidict['b'].append(2)
>>> multidict['a'].append(11)
>>> multidict
defaultdict(<type 'list'>, {'a': [1, 11], 'b': [2]})
I think the output you're asking for is messy itself, and so there is no clean way of creating it.
How do you intend to use this new list? Would a dictionary of counts like the following work instead?
{'a':3, 'b':2, 'c':1}
If so, I would recommend:
from collections import defaultdict
d = defaultdict(int) # values default to 0
for key in l1:
    d[key] += 1
I wrote this approach for renaming duplicates in a list with any separator and a numeric or alphabetical postfix (e.g. _1, _2 or _a, _b, _c etc.). It might not be the most efficient version you could write, but I like it as clean, readable code that is also easy to extend.
def rename_duplicates(label_list, separator="_", mode="numeric"):
    """
    options for 'mode': numeric, alphabet
    """
    import string
    if not isinstance(label_list, list) or not isinstance(separator, str):
        raise TypeError("label_list and separator must be of type list and str, respectively")
    for item in label_list:
        l_count = label_list.count(item)
        if l_count > 1:
            if mode == "alphabet":
                postfix_str = string.ascii_lowercase
                if len(postfix_str) < l_count:
                    # do something
                    pass
            elif mode == "numeric":
                # a list of strings, so postfixes of 10 and above stay intact
                postfix_str = [str(i+1) for i in range(l_count)]
            else:
                raise ValueError("the 'mode' could be either 'numeric' or 'alphabet'")
            postfix_iter = iter(postfix_str)
            for i in range(l_count):
                # rename the next still-unrenamed occurrence in place
                item_index = label_list.index(item)
                label_list[item_index] += separator + next(postfix_iter)
    return label_list
label_list = ['a', 'b', 'c', 'a', 'a', 'b']
use the function:
rename_duplicates(label_list)
result:
['a_1', 'b_1', 'c', 'a_2', 'a_3', 'b_2']
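The same output can also be produced in a single pass without repeated count() and index() calls. The sketch below is an alternative for illustration, not the answer's own method: the name rename_all_duplicates is made up, only the numeric mode is covered, and it returns a new list instead of modifying the input.

```python
from collections import Counter

def rename_all_duplicates(labels, separator="_"):
    totals = Counter(labels)  # how often each label occurs overall
    seen = Counter()          # how often each label has been seen so far
    renamed = []
    for label in labels:
        if totals[label] > 1:
            seen[label] += 1
            renamed.append("%s%s%d" % (label, separator, seen[label]))
        else:
            renamed.append(label)
    return renamed

original = ['a', 'b', 'c', 'a', 'a', 'b']
result = rename_all_duplicates(original)
```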