x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
Let's say we have a list of lists containing random single-letter strings.
How can I write a function that tells me how many times the letter 'a' appears, which in this case is 2? The same should work for any other letter: 'b' appears once, 'f' appears twice, etc.
Thank you!
You could flatten the list and use collections.Counter:
>>> import collections
>>> x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
>>> d = collections.Counter(e for sublist in x for e in sublist)
>>> d
Counter({'a': 2, 'c': 2, 'f': 2, 'b': 1, 'e': 1, 'd': 1})
>>> d['a']
2
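Since the question asks for a function, the Counter approach above could be wrapped up roughly as follows. This is only a sketch; the names count_items and count_of are made up for illustration and are not part of any library.

```python
from collections import Counter
from itertools import chain


def count_items(nested):
    """Return a Counter of every element in a list of lists."""
    return Counter(chain.from_iterable(nested))


def count_of(nested, item):
    """Return how many times `item` occurs across all sublists."""
    # A Counter returns 0 for keys it has never seen, so no KeyError here.
    return count_items(nested)[item]


x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
print(count_of(x, 'a'))  # 2
print(count_of(x, 'b'))  # 1
print(count_of(x, 'z'))  # 0
```

If many letters need to be looked up, calling count_items once and indexing the result avoids recounting the list each time.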
import itertools, collections
result = collections.defaultdict(int)
for i in itertools.chain(*x):
    result[i] += 1
This will create result as a dictionary with the characters as keys and their counts as values.
Just FYI, you can use sum() to flatten a single nested list.
>>> from collections import Counter
>>>
>>> x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]
>>> c = Counter(sum(x, []))
>>> c
Counter({'a': 2, 'c': 2, 'f': 2, 'b': 1, 'e': 1, 'd': 1})
But, as Blender and John Clements have pointed out, itertools.chain.from_iterable() may be clearer.
>>> from itertools import chain
>>> c = Counter(chain.from_iterable(x))
>>> c
Counter({'a': 2, 'c': 2, 'f': 2, 'b': 1, 'e': 1, 'd': 1})
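A hedged note on why chain.from_iterable() is usually preferred for larger inputs: sum(x, []) builds a new intermediate list for every sublist it adds, so the copying grows roughly quadratically with the number of sublists, while chain.from_iterable() yields the elements lazily. Both produce the same counts, as this small check shows (the variable names are illustrative):

```python
from collections import Counter
from itertools import chain

x = [['a', 'b', 'c'], ['a', 'c', 'd'], ['e', 'f', 'f']]

# sum() concatenates the sublists one by one, copying the growing list each time.
flat_with_sum = sum(x, [])

# chain.from_iterable() walks the sublists lazily without building copies.
flat_with_chain = list(chain.from_iterable(x))

print(flat_with_sum == flat_with_chain)                    # True
print(Counter(flat_with_sum) == Counter(flat_with_chain))  # True
```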
I am trying to iterate through a double list but am getting the incorrect results. I am trying to get the count of each element in the list.
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
dict = {}
for words in l:
    for letters in words:
        dict[letters] = words.count(letters)
for x in dict:
    print(x + ":" + str(dict[x]))
At the moment, I am getting:
<s>:1
a:1
b:2
c:2
</s>:1
It seems as if it is only counting the last list in 'l': ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']
But I am trying to get:
<s>: 3
a: 4
b: 5
c: 6
</s>:3
In each pass of the inner for loop, you are not adding to the current value of dict[letters]; you are setting it to the count for the current sublist (peculiarly named words).
Fixing your code with a vanilla dict:
>>> l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
>>> d = {}
>>>
>>> for sublist in l:
...     for x in sublist:
...         d[x] = d.get(x, 0) + 1
...
>>> d
{'<s>': 3, 'a': 4, 'b': 5, 'c': 6, '</s>': 3}
Note that I am not calling list.count in each inner for loop. Calling count will iterate over the whole list again and again. It is far more efficient to just add 1 every time a value is seen, which can be done by looking at each element of the (sub)lists exactly once.
Using a Counter:
>>> from collections import Counter
>>> Counter(x for sub in l for x in sub)
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})
Using a Counter and not manually unnesting the nested list:
>>> from collections import Counter
>>> from itertools import chain
>>> Counter(chain.from_iterable(l))
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})
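If the sublists arrive one at a time, Counter.update() adds to the existing counts rather than replacing them, which is the behaviour the original loop was missing. A minimal sketch (the name totals is illustrative):

```python
from collections import Counter

l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]

totals = Counter()
for sublist in l:
    # update() increments the stored counts; it does not overwrite them.
    totals.update(sublist)

print(totals['c'])    # 6
print(totals['<s>'])  # 3
```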
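The dictionary entry is being overwritten on every iteration; it should be incremented instead:
count_dict[letters] += 1
(Adding words.count(letters) here would overcount, because the inner loop already visits every occurrence of each letter.)
Initialize the dictionary as a defaultdict so that missing keys start at 0:
from collections import defaultdict
count_dict = defaultdict(int)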
As @Vishnudev said, you must add to the current count. But dict[letters] must already exist (otherwise you'll get a KeyError exception). You can use the get method of dict with a default value to avoid this:
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
dict = {}
for words in l:
    for letters in words:
        dict[letters] = dict.get(letters, 0) + 1
As your question suggests, the result only reflects the last sublist. This happens because on every iteration the previous dictionary values are overwritten by the values from the current sublist. You need to keep the previous values and add the new counts to them.
You can try this:
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
d = {}
for lis in l:
    for x in lis:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
So the resulting dictionary d will be:
{'<s>': 3, 'a': 4, 'c': 6, 'b': 5, '</s>': 3}
I hope this helps!
Here is some code that starts from a list, draws two random elements from it, and appends them back to the main list as a sublist. I then count each letter in each generated list:
import random
import collections
def randMerge(l:list, count:int) -> list:
    return l + [random.sample(l,k=count)]
def flatten(d):
    return [i for b in [[c] if not isinstance(c, list) else flatten(c)
                        for c in d] for i in b]
num = 2
aList = ['A','B','C','D']
newList = aList[:]
for _ in range(3):
    newList = randMerge(newList,num)
    print(newList)
    new_counts = collections.Counter(flatten(newList))
    print(new_counts)
which gives:
['A', 'B', 'C', 'D', ['A', 'C']]
Counter({'A': 2, 'C': 2, 'B': 1, 'D': 1})
['A', 'B', 'C', 'D', ['A', 'C'], ['D', 'A']]
Counter({'A': 3, 'C': 2, 'D': 2, 'B': 1})
['A', 'B', 'C', 'D', ['A', 'C'], ['D', 'A'], ['A', 'B']]
Counter({'A': 4, 'B': 2, 'C': 2, 'D': 2})
Now I wonder how I can make a DataFrame in which each column holds the numbers from one Counter and each row represents a letter. I tried this:
df = pandas.DataFrame.from_dict(new_counts, orient='index')
Yet this gives me only the last Counter. Also, how can I make a histogram of each Counter and show them together?
If you need each column to represent the contents of a Counter object, you can create your dataframe from the list of objects, then transpose.
import collections, random, pandas as pd
def randMerge(l:list, count:int) -> list:
    return l + [random.sample(l,k=count)]
def flatten(d):
    return [i for b in [[c] if not isinstance(c, list) else flatten(c)
                        for c in d] for i in b]
num = 2
aList = ['A','B','C','D']
res = []
newList = aList[:]
for _ in range(3):
    newList = randMerge(newList,num)
    new_counts = collections.Counter(flatten(newList))
    res.append(new_counts)
print(res)
# Example output (it varies between runs because of random.sample):
# [Counter({'A': 2, 'C': 2, 'B': 1, 'D': 1}),
#  Counter({'C': 3, 'A': 2, 'D': 2, 'B': 1}),
#  Counter({'C': 4, 'A': 2, 'B': 2, 'D': 2})]
df = pd.DataFrame(res).T
print(df)
#    0  1  2
# A  2  2  2
# B  1  1  2
# C  2  3  4
# D  1  2  2
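The same letters-by-run layout can also be sketched with the standard library only, filling in 0 where a letter is missing from a Counter. The helper name counts_table and the fixed sample counters are assumptions for illustration, and the text bars are only a rough stand-in for a real histogram.

```python
from collections import Counter


def counts_table(counters):
    """Map each letter to its list of counts, one entry per Counter."""
    letters = sorted(set().union(*counters))
    # Counter lookups return 0 for missing keys, so absent letters become 0.
    return {letter: [c[letter] for c in counters] for letter in letters}


# Fixed sample data so the output is reproducible (the real runs are random).
res = [Counter('ABCDAC'), Counter('ABCDACDC'), Counter('ABCDACDCCB')]

table = counts_table(res)
for letter, row in table.items():
    # One bar of '#' characters per Counter, shown side by side.
    bars = '  '.join(f"{'#' * n:<5}" for n in row)
    print(letter, row, bars)
```

Each row of the table corresponds to one row of the transposed DataFrame, so the same structure could be handed to a plotting library for a grouped bar chart.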
I need to quickly hash a dictionary (a counter), and I’m noticing that python seems to order dictionaries with the same keys in the same order, even if they are constructed differently. In fact the dictionaries seem to be able to survive quite a bit of abuse:
>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> list(D)
['b', 'c', 'a']
>>> list(D)
['b', 'c', 'a']
>>> list(D)
['b', 'c', 'a']
>>> list(D)
['b', 'c', 'a']
>>> E = {'a': 1, 'b': 2, 'c': 3}
>>> list(E)
['b', 'c', 'a']
>>> list(E)
['b', 'c', 'a']
>>> list(E)
['b', 'c', 'a']
>>> F = {'a': 1, 'b': 2, 'c': 3}
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> G = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> list(G)
['b', 'c', 'a', 'd']
>>> list(G)
['b', 'c', 'a', 'd']
>>> list(G)
['b', 'c', 'a', 'd']
>>> list(F)
['b', 'c', 'a']
>>> F.pop('a')
1
>>> list(F)
['b', 'c']
>>> F['a'] = 2
>>> list(F)
['b', 'c', 'a']
>>> list(F)
['b', 'c', 'a']
>>> H = {'b': 2, 'a': 1, 'c': 3}
>>> list(H)
['b', 'c', 'a']
>>> H = {'b': 2, 'c': 1, 'a': 3}
>>> list(H)
['b', 'c', 'a']
>>> K = {'b': 2, 'c': 1, 'a': 3, 'd': 4}
>>> list(K)
['b', 'c', 'a', 'd']
>>> K = {'b': 2, 'c': 1, 'd': 3, 'a': 4}
>>> list(K)
['b', 'c', 'a', 'd']
My question is then, if my dictionaries have the same keys and the same values, can I count on the keys being in the same order, at least for the lifetime of that running instance of python?
Note that I’m aware python is a bit incomprehensible in how it decides to order a dictionary, but I want to know if given the same inputs, the same instance of python will return the same key ordering each time.
Before Python 3.7, regular Python dicts are not ordered: there is no guarantee that the keys come back in the order you expect.
If you need to preserve order on those versions, use an OrderedDict.
https://docs.python.org/2/library/collections.html#collections.OrderedDict
Python 3.7+
Dictionary order is guaranteed to be insertion order.
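A small sketch of what that guarantee means in practice, assuming Python 3.7 or later: keys come back in the order they were first inserted, and a key that is removed and re-added moves to the end. Two dicts with the same items still compare equal even when their key order differs.

```python
d = {}
d['b'] = 2
d['a'] = 1
d['c'] = 3
print(list(d))   # ['b', 'a', 'c']

d.pop('a')
d['a'] = 1       # re-inserting puts 'a' at the end
print(list(d))   # ['b', 'c', 'a']

e = {'a': 1, 'b': 2, 'c': 3}
print(d == e)              # True: equality ignores order
print(list(d) == list(e))  # False: iteration order differs
```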
Python <3.7
In terms of the language definition, no you cannot rely on stable ordering, because it is not promised in the language definition.
Now, it might be that over the short and medium term you will find that this ordering is stable, and this makes sense: computers are deterministic, so it's reasonable to expect the same results from one iteration of the experiment to the next. (However, since they are complex systems, even a deterministic machine might still produce unexpected results, because you don't know all the factors that determine the outcome.) This reasoning does not extend to the long term, which is what you should be programming for, because the language implementation is free to choose any means of ordering those keys that it likes, and to change that choice at any time, as long as the implementation is consistent with the language definition. This means that programs depending on some order remaining stable are subject to breakage if run under different implementations, and they are subject to breakage when the implementation is updated.
This is not a place you want to be, therefore you should not make any assumptions about the stability of ordering of dictionary keys.
That being said, if you are only concerned about stability just across the lifetime of one running instance of python then this seems like a safe gamble - again, computers are deterministic - but still a gamble. Test carefully against cases rather more complex than the ones you're expecting to encounter, and then decide whether that chopping block looks like a comfortable place to rest your neck.
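If the underlying goal is to hash a counter, one option that sidesteps key order entirely is to hash a frozenset of the items. This is a sketch, assuming all keys and values are themselves hashable; the helper name dict_hash is made up here.

```python
from collections import Counter


def dict_hash(d):
    """Hash a dict by its items, independent of insertion order."""
    # frozenset is unordered and hashable, so key order cannot affect the result.
    return hash(frozenset(d.items()))


c1 = Counter('abcabc')
c2 = Counter('ccbbaa')   # same counts, different insertion order

print(list(c1) == list(c2))            # False
print(dict_hash(c1) == dict_hash(c2))  # True
```

The hash value itself may differ from one Python process to the next when string hashing is randomized, so it is only suitable for comparisons within a single run.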
if my dictionaries have the same keys and the same values, can I count on the keys being in the same order
No.
>>> list({'d': 0, 'l': 0})
['d', 'l']
>>> list({'l': 0, 'd': 0})
['l', 'd']
Given that nobody mentioned this yet, I'll tell you that hash randomization is enabled by default since Python 3.3.
With hash randomization, the result of hash('abc') is different between each Python run. Because hashes are at the base of dictionaries (they are used to determine the location of the item in the internal array used by dict), there are even fewer guarantees about ordering.
$ python3.5
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> list(d)
['a', 'c', 'b']
>>> list(d)
['a', 'c', 'b']
$ python3.5
# new process, new random seed, new ordering
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> list(d)
['c', 'a', 'b']
>>> list(d)
['c', 'a', 'b']
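For reproducible experiments, the PYTHONHASHSEED environment variable fixes the seed used for hash randomization. The sketch below starts fresh interpreters with a chosen seed; hash_in_fresh_process is a hypothetical helper, and it assumes sys.executable points at a working Python 3.7+ interpreter.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text, seed):
    """Return hash(text) as computed by a new interpreter with a fixed seed."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


first = hash_in_fresh_process('abc', 123)
second = hash_in_fresh_process('abc', 123)
print(first == second)  # True: same seed, same hash in both processes
```

Without a fixed seed, each new process would normally pick its own random seed, which is why string hashes differ between the two sessions shown above.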
I am looking to transpose a dictionary in Python, and after looking around I was not able to find a solution for this. Does anybody know how I could reverse a dictionary like the following input:
graph = {'A': ['B', 'C'],
'B': ['C', 'D'],
'C': ['D'],
'D': ['C'],
'E': ['F'],
'F': ['C']}
so that I get something like:
newgraph = {'A': [''],
'B': ['A'],
'C': ['A', 'B', 'D','F'],
'D': ['B', 'C'],
'E': [''],
'F': ['E']}
Use defaultdict:
from collections import defaultdict
newgraph = defaultdict(list)
for x, adj in graph.items():
    for y in adj:
        newgraph[y].append(x)
While it doesn't seem to make any sense to have the empty string '' in the empty lists, it's certainly possible:
# Iterate over graph so that nodes without incoming edges (A, E) are included.
for x in graph:
    newgraph[x] = newgraph[x] or ['']
Use defaultdict:
>>> from collections import defaultdict
>>> from pprint import pprint
>>> graph = {'A': ['B', 'C'],
...          'B': ['C', 'D'],
...          'C': ['D'],
...          'D': ['C'],
...          'E': ['F'],
...          'F': ['C']}
>>> new_graph = defaultdict(list)
>>> for ele in graph.keys():
...     new_graph[ele] = []
...
>>> for k, v in graph.items():
...     for ele in v:
...         new_graph[ele].append(k)
...
>>> pprint(dict(new_graph))
{'A': [],
 'B': ['A'],
 'C': ['A', 'B', 'D', 'F'],
 'D': ['B', 'C'],
 'E': [],
 'F': ['E']}
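The same idea can be packaged as a function that returns a plain dict. This is only a sketch (the name transpose is illustrative); it also gives an entry to nodes that appear only as targets and are not keys of the input.

```python
def transpose(graph):
    """Reverse every edge of a graph given as {node: [neighbours]}."""
    # Start every known node with an empty list so isolated sources are kept.
    reversed_graph = {node: [] for node in graph}
    for node, neighbours in graph.items():
        for neighbour in neighbours:
            # setdefault covers neighbours that are not keys of `graph`.
            reversed_graph.setdefault(neighbour, []).append(node)
    return reversed_graph


graph = {'A': ['B', 'C'], 'B': ['C', 'D'], 'C': ['D'],
         'D': ['C'], 'E': ['F'], 'F': ['C']}
print(transpose(graph))
```

Transposing twice gives back the original edges, which can serve as a quick sanity check (the order inside each list may differ).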
It's also possible without defaultdict.
Here I've left the empty keys in the new dict with the value None.
graph = {'A': ['B', 'C'],
'B': ['C', 'D'],
'C': ['D'],
'D': ['C'],
'E': ['F'],
'F': ['C']}
g = dict.fromkeys(graph)  # every key starts out as None
for k, v in graph.items():
    for x in v:
        if g[x]:
            g[x].append(k)
        else:
            g[x] = [k]
for k in sorted(graph):
    print(k, ':', g[k])
Output:
A : None
B : ['A']
C : ['A', 'B', 'D', 'F']
D : ['B', 'C']
E : None
F : ['E']
I am trying to learn Python dictionary comprehension, and I think it is possible to do in one line what the following functions do. I wasn't able to make the n+1 as in the first or avoid using range() as in the second.
Is it possible to use a counter that automatically increments during the comprehension, as in test1()?
def test1():
    l = ['a', 'b', 'c', 'd']
    d = {}
    n = 1
    for i in l:
        d[i] = n
        n = n + 1
    return d
def test2():
    l = ['a', 'b', 'c', 'd']
    d = {}
    for n in range(len(l)):
        d[l[n]] = n + 1
    return d
It's quite simple using the enumerate function:
>>> L = ['a', 'b', 'c', 'd']
>>> {letter: i for i,letter in enumerate(L, start=1)}
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
Note that, if you wanted the inverse mapping, i.e. mapping 1 to a, 2 to b etc, you could simply do:
>>> dict(enumerate(L, start=1))
{1: 'a', 2: 'b', 3: 'c', 4: 'd'}
This works:
>>> l = ['a', 'b', 'c', 'd']
>>> { x:(y+1) for (x,y) in zip(l, range(len(l))) }
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
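For a counter that increments on its own, as asked in the question, itertools.count can be zipped with the list, which avoids range(len(l)) altogether. A brief sketch:

```python
from itertools import count

l = ['a', 'b', 'c', 'd']

# count(1) yields 1, 2, 3, ... and zip stops when the list is exhausted.
d = {letter: n for letter, n in zip(l, count(1))}
print(d)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Since no transformation is applied to the pairs, dict(zip(l, count(1))) builds the same mapping without a comprehension.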