I am trying to track seen elements from a big array using a dict.
Is there a way to force the values of a dictionary to be integers that default to zero upon initialization?
I have done this with very clunky code and two loops.
Here is what I do now:
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = {}
for val in fl:
    seenit[val] = 0
for val in fl:
    seenit[val] = seenit[val] + 1
Of course, just use collections.defaultdict([default_factory[, ...]]):
from collections import defaultdict
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = defaultdict(int)
for val in fl:
    seenit[val] += 1
print(seenit)
# Output
defaultdict(<class 'int'>, {0: 1, 1: 3, 2: 1, 3: 1, 4: 1})
print(dict(seenit))
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
In addition, if you would rather not import collections, you can use dict.get(key[, default]):
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = {}
for val in fl:
    seenit[val] = seenit.get(val, 0) + 1
print(seenit)
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
Also, if you only want to solve the problem and don't insist on a plain dictionary, you may use collections.Counter([iterable-or-mapping]):
from collections import Counter
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = Counter(fl)
print(seenit)
# Output
Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})
print(dict(seenit))
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
Both collections.defaultdict and collections.Counter can be read as dictionary[key] and support .keys(), .values(), .items(), etc. Both are subclasses of the built-in dict.
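As a quick illustration of that subclass relationship (a minimal sketch; the variable names are chosen only for this example), both types pass an isinstance check against dict. One behavioral difference is worth knowing: reading a missing key from a defaultdict stores that key, while a Counter returns 0 without storing it.

```python
from collections import Counter, defaultdict

fl = [0, 1, 1, 2, 1, 3, 4]

dd = defaultdict(int)
for val in fl:
    dd[val] += 1
cnt = Counter(fl)

# Both are subclasses of the built-in dict.
both_are_dicts = isinstance(dd, dict) and isinstance(cnt, dict)

# Reading a missing key: defaultdict inserts it, Counter does not.
dd[99]
cnt[99]
dd_has_99 = 99 in dd
cnt_has_99 = 99 in cnt
print(both_are_dicts, dd_has_99, cnt_has_99)
```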
If you want to talk about performance, I used timeit.timeit() to time the creation of the dictionary plus the loop over a million executions:
collections.defaultdict: 2.160868141 seconds
dict.get: 1.3540439499999999 seconds
collections.Counter: 4.700308418999999 seconds
collections.Counter may be easier, but it is much slower here.
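A comparison like this could be reproduced roughly as follows. This is only a sketch: the helper function names are made up for the example, the run count is reduced so it finishes quickly, and the absolute numbers will vary with the machine and the Python version.

```python
import timeit
from collections import Counter, defaultdict

fl = [0, 1, 1, 2, 1, 3, 4]

def with_defaultdict(data):
    seen = defaultdict(int)
    for val in data:
        seen[val] += 1
    return seen

def with_get(data):
    seen = {}
    for val in data:
        seen[val] = seen.get(val, 0) + 1
    return seen

def with_counter(data):
    return Counter(data)

# number is kept small so the sketch runs quickly; the answer used a million runs.
for func in (with_defaultdict, with_get, with_counter):
    elapsed = timeit.timeit(lambda: func(fl), number=10_000)
    print(f"{func.__name__}: {elapsed:.4f} seconds")
```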
You can use collections.Counter:
from collections import Counter
Counter([0, 1, 1, 2, 1, 3, 4])
Output:
Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})
You can then address it like a dictionary:
>>> Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})[1]
3
>>> Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})[0]
1
Using val in seenit is a bit faster than .get():
seenit = dict()
for val in fl:
    if val in seenit:
        seenit[val] += 1
    else:
        seenit[val] = 1
For larger lists, Counter will eventually outperform all other approaches, and defaultdict is going to be faster than using .get() or val in seenit.
I'm doing a course in bioinformatics. We were supposed to create a function that takes a list of strings like this:
Motifs = [
    "AACGTA",
    "CCCGTT",
    "CACCTT",
    "GGATTA",
    "TTCCGG"]
and turn it into a count matrix that counts the occurrence of the nucleotides (the letters A, C, G and T) in each column and adds a pseudocount 1 to it, represented by a dictionary with multiple values for each key like this:
count = {
    'A': [2, 3, 2, 1, 1, 3],
    'C': [3, 2, 5, 3, 1, 1],
    'G': [2, 2, 1, 3, 2, 2],
    'T': [2, 2, 1, 2, 5, 3]}
For example A occurs 1 + 1 pseudocount = 2 in the first column. C appears 2 + 1 pseudocount = 3 in the fourth column.
Here is my solution:
def CountWithPseudocounts(Motifs):
    t = len(Motifs)
    k = len(Motifs[0])
    count = {}
    for symbol in "ACGT":
        count[symbol] = [1 for j in range(k)]
    for i in range(t):
        for j in range(k):
            symbol = Motifs[i][j]
            count[symbol][j] += 1
    return count
The first set of for loops generates a dictionary with the keys A,C,G,T and the initial values 1 for each column like this:
count ={
'A': [1, 1, 1, 1, 1, 1],
'C': [1, 1, 1, 1, 1, 1],
'G': [1, 1, 1, 1, 1, 1],
'T': [1, 1, 1, 1, 1, 1]}
The second set of for loops counts the occurrence of the nucleotides and adds it to the values of the existing dictionary as seen above.
This works and does its job, but I want to know how to further compress both for loops using dict comprehensions.
NOTE:
I am fully aware that there are a multitude of modules and libraries like biopython, scipy and numpy that can probably turn the entire function into a one-liner. The problem with modules is that their output format often doesn't match what the course's automated solution check expects.
This
count = {}
for symbol in "ACGT":
    count[symbol] = [1 for j in range(k)]
can be changed to a comprehension as follows:
count = {symbol:[1 for j in range(k)] for symbol in "ACGT"}
and then further simplified, using Python's ability to multiply a list by an integer, to
count = {symbol:[1]*k for symbol in "ACGT"}
Compressing the first loop:
count = {symbol: [1 for j in range(k)] for symbol in "ACGT"}
This construct is called a dict comprehension (not a generator): it builds a dict using a for loop.
I'm not sure you can compress the second (nested) loop, since it isn't generating anything; it only mutates the first dict.
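One possible way to fold both loops into a single dict comprehension, without any imports, is to iterate over the columns with zip(*Motifs) and count each symbol per column. This is a sketch under the assumption that all motifs have the same length; the function name count_with_pseudocounts is made up for the example.

```python
def count_with_pseudocounts(motifs):
    # zip(*motifs) yields one tuple per column of the motif matrix.
    columns = list(zip(*motifs))
    # Each cell starts at 1 (the pseudocount) plus the symbol's occurrences in that column.
    return {symbol: [1 + col.count(symbol) for col in columns] for symbol in "ACGT"}

motifs = ["AACGTA", "CCCGTT", "CACCTT", "GGATTA", "TTCCGG"]
count = count_with_pseudocounts(motifs)
print(count)
```

This trades the in-place updates for four passes over each column (one per symbol), which is negligible for an alphabet of four letters.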
You can compress your code a lot using collections.Counter and collections.defaultdict:
from collections import Counter, defaultdict
out = defaultdict(list)
bases = 'ACGT'
for m in zip(*Motifs):
    c = Counter(m)
    for b in bases:
        out[b].append(c[b]+1)
dict(out)
output:
{'A': [2, 3, 2, 1, 1, 3],
'C': [3, 2, 5, 3, 1, 1],
'G': [2, 2, 1, 3, 2, 2],
'T': [2, 2, 1, 2, 5, 3]}
You can use collections.Counter:
from collections import Counter
m = ['AACGTA', 'CCCGTT', 'CACCTT', 'GGATTA', 'TTCCGG']
d = [Counter(i) for i in zip(*m)]
r = {a:[j.get(a, 0)+1 for j in d] for a in 'ACGT'}
Output:
{'A': [2, 3, 2, 1, 1, 3], 'C': [3, 2, 5, 3, 1, 1], 'G': [2, 2, 1, 3, 2, 2], 'T': [2, 2, 1, 2, 5, 3]}
I am trying to group the words_count column by both essay_set and domain1_score and add up the Counter objects in words_count, using Counter addition as described here:
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
I grouped them using this command:
words_freq_by_set = words_freq_by_set.groupby(by=["essay_set", "domain1_score"])
but I do not know how to apply the Counter addition function, which is simply +, to the words_count column.
Here is my dataframe:
GroupBy.sum works with Counter objects. However, I should mention that the process is pairwise, so this may not be very fast. Let's try:
words_freq_by_set.groupby(by=["essay_set", "domain1_score"])['words_count'].sum()
df = pd.DataFrame({
'a': [1, 1, 2],
'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])]
})
df
   a             b
0  1  {1: 1, 2: 1}
1  1  {1: 1, 3: 1}
2  2  {2: 1, 3: 1}
df.groupby(by=['a'])['b'].sum()
a
1    {1: 2, 2: 1, 3: 1}
2          {2: 1, 3: 1}
Name: b, dtype: object
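If the pairwise addition turns out to be slow, the same grouping can be sketched in plain Python with a defaultdict of Counters. This is an illustration only: the rows list below is hypothetical data mirroring the small a/b example, not the asker's dataframe.

```python
from collections import Counter, defaultdict

# Hypothetical rows of (group key, Counter), mirroring the small a/b example.
rows = [
    (1, Counter([1, 2])),
    (1, Counter([1, 3])),
    (2, Counter([2, 3])),
]

totals = defaultdict(Counter)
for key, counter in rows:
    # update() adds counts in place instead of building a new Counter each time.
    totals[key].update(counter)

print(dict(totals))
```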
I am using Python 3. I have a list a containing only integers. I want to store each element, together with the number of times it repeats in a row, in another list.
Example:
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
Output:
result = ["6,1", "0, 2", "2, 4", "1, 1", "89, 2"]
# The number before the "," is the element; the number after the "," is how many times it repeats in a row.
How can I achieve this efficiently?
I believe all the solutions given are counting the total occurrences of a number in the list rather than counting the repeating runs of a number.
Here is a solution using groupby from itertools. It gathers the runs and appends them to a dictionary keyed by the number.
from itertools import groupby
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
d = dict()
for k, v in groupby(a):
    d.setdefault(k, []).append(len(list(v)))
Dictionary created:
>>> d
{6: [1], 0: [2], 2: [4], 1: [1], 89: [2]}
Note that every list here contains a single count, because each number occurs in only one run. If a number that had already been seen occurred again in a later run, its list (the dictionary value) would contain multiple counts.
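To get the list of strings the question asks for, the same groupby call can feed a list comprehension directly. This is a minimal sketch that assumes the "element,count" format without a space after the comma.

```python
from itertools import groupby

a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]

# groupby yields (value, iterator over the consecutive run of that value).
result = [f"{value},{sum(1 for _ in run)}" for value, run in groupby(a)]
print(result)
```

Because groupby only groups adjacent equal items, a value that reappears later in the list produces a second entry rather than being merged into the first.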
To count an individual element, use list.count. For example, a.count(2) returns 4. Also, set(a) gives the unique elements in a.
Overall answer:
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
nums = set(a)
result = [f"{val}, {a.count(val)}" for val in nums]
print(result)
which gives
['0, 2', '1, 1', '2, 4', '6, 1', '89, 2']
Method 1: using for loop
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
result = []
a_set = set(a) # transform the list into a set to have unique integer
for nbr in a_set:
    nbr_count = a.count(nbr)
    result.append("{},{}".format(nbr, nbr_count))
print(result) # ['0,2', '1,1', '2,4', '6,1', '89,2']
Method 2: using list-comprehensions
result = ["{},{}".format(item, a.count(item)) for item in set(a)]
print(result) # ['0,2', '1,1', '2,4', '6,1', '89,2']
You can use the list count() method, which returns the number of elements with the specified value.
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
print({x: a.count(x) for x in a})
output:
{6: 1, 0: 2, 2: 4, 1: 1, 89: 2}
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
dic = dict()
for i in a:
    if i in dic:
        dic[i] = dic[i] + 1
    else:
        dic[i] = 1
result = []
for i in dic:
    result.append(str(i) + "," + str(dic[i]))
Or:
from collections import Counter
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
mylist = [Counter(a)]
print(mylist)
You can use Counter from collections:
from collections import Counter
a = [6, 0, 0, 2, 2, 2, 2, 1, 89, 89]
counter = Counter(a)
result = ['{},{}'.format(k, v) for k,v in counter.items()]
In Python, I want to find the maximum value in d, but considering only the keys 1, 2 and 3 rather than all the keys in d. How can I do this? Thank you.
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
Collect the (key, value) pairs for the keys 1, 2 and 3 in a list of tuples, sort the list by value in descending order, and take the key [0] of the first tuple [0].
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
pairs = [(k, v) for k, v in d.items() if k in [1, 2, 3]]
key_max_val = sorted(pairs, key=lambda kv: kv[1], reverse=True)[0][0]
print(key_max_val) # Outputs 1
You can use operator:
It returns the key with the maximum value (note that this considers every key in d, not only 1, 2 and 3):
In [873]: import operator
In [874]: d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
In [875]: max(d.items(), key=operator.itemgetter(1))[0]
Out[875]: 1
I think the code below should work (based on @Mayank Porwal's idea; sorry, I cannot comment yet):
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
max(v for k,v in d.items())
Use a generator expression and the built-in max function.
Max value:
max(v for k,v in d.items() if k in [1,2,3])
Max key:
max(k for k,v in d.items() if k in [1,2,3])
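If what is needed is the key whose value is largest among the allowed keys, rather than the largest key itself, max can compare the allowed keys by their values. This is a sketch; the names wanted, candidates, best_key and best_value are introduced only for the example.

```python
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
wanted = [1, 2, 3]  # keys to consider; name chosen for this example

# Skip any wanted key that is missing from d, then compare the rest by value.
candidates = [k for k in wanted if k in d]
best_key = max(candidates, key=d.get)
best_value = d[best_key]
print(best_key, best_value)
```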
Suppose I have an array as follows
arr = [1 , 2, 3, 4, 5]
I would like to convert it to a dictionary like
{
1: 1,
2: 1,
3: 1,
4: 1,
5: 1
}
My motivation behind this is so I can quickly increment the count of any of the keys in O(1) time.
Help will be much appreciated. Thanks
from collections import Counter
answer = Counter(arr)
You can use the fromkeys method:
>>> arr = [1 , 2, 3, 4, 5]
>>> dict.fromkeys(arr,1)
{1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
>>>
You can use a dictionary comprehension:
{k: 1 for k in arr}
from collections import Counter
arr = [1, 2, 3, 4, 5]
c = Counter(arr)
Try collections.Counter:
>>> import collections
>>> arr = [1, 2, 3, 4, 5]
>>> collections.Counter(arr)
Counter({1: 1, 2: 1, 3: 1, 4: 1, 5: 1})
arr = [1 , 2, 3, 4, 5]
print({p: arr.count(p) for p in arr})
I think this is more precise, and it still works when an element in the array repeats.
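For the motivation stated in the question (incrementing the count of a key in constant time on average), a Counter built from the list can be updated in place. A minimal sketch, with the key 42 chosen arbitrarily to show what happens for a key that was not in the list:

```python
from collections import Counter

arr = [1, 2, 3, 4, 5]
counts = Counter(arr)

counts[3] += 1   # existing key: 1 -> 2
counts[42] += 1  # unseen key: starts from 0, so it becomes 1
print(counts[3], counts[42], counts[1])
```

A plain dict built with dict.fromkeys(arr, 1) supports the same += on existing keys, but it raises KeyError for a key that is not present.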