Counting specific characters while ignoring consecutive duplicates in Python

I have an input like this: BFFBFBFFFBFBBBFBBBBFF.
I want to count the 'B's, treating each run of consecutive 'B's as a single one, so the answer should be 6.
How can I do this in Python?

Use itertools.groupby, which collapses each run of identical consecutive characters into a single key:
>>> from itertools import groupby
>>> s = 'BFFBFBFFFBFBBBFBBBBFF'
>>> l = [k for k, v in groupby(s)]
>>> l
['B', 'F', 'B', 'F', 'B', 'F', 'B', 'F', 'B', 'F', 'B', 'F']
>>> l.count('B')
6
EDIT: For more general use, it is better to use collections.Counter, which gives the count for every character at once.
>>> from collections import Counter
>>> Counter(l)
Counter({'B': 6, 'F': 6})

s = "BFFBFBFFFBFBBBFBBBBFF"
f = False  # True while we are inside a run of 'B's
count = 0
for i in s:
    if f and i == 'B':
        continue
    elif i == 'B':
        count += 1
        f = True
    else:
        f = False
print(count)  # 6
Another way, using groupby:
from itertools import groupby
count = 0
for i, _ in groupby(s):
    if i == 'B':
        count += 1
print(count)  # 6

You should set a counter and a flag variable, then count only the occurrences that are not duplicates and flip the flag. The logic is simple: if the current letter is 'B' and the letter before it isn't 'B' (dup is False), count it and set the flag; any other letter resets the flag:
s = 'BFFBFBFFFBFBBBFBBBBFF'
count = 0
dup = False
for l in s:
    if l == 'B' and not dup:
        count += 1
        dup = True
    elif l != 'B':
        dup = False
# count: 6
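The same flag logic can be wrapped in a reusable function that compares each character with the previous one. This is only a minimal sketch; the name count_runs and its parameters are hypothetical and do not come from the answer above.

```python
def count_runs(s, target):
    """Count runs of consecutive `target` characters in s (hypothetical helper)."""
    count = 0
    prev = None  # character seen on the previous iteration
    for ch in s:
        # A run starts when we see target and the previous character was not target.
        if ch == target and prev != target:
            count += 1
        prev = ch
    return count

print(count_runs('BFFBFBFFFBFBBBFBBBBFF', 'B'))  # 6
```

Tracking the previous character instead of a boolean makes the function work for any target character, not just 'B'.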

We can remove consecutive dups and use collections.Counter to count the B's that are left:
from collections import Counter
def remove_conseq_dups(s):
    res = ""
    for i in range(len(s) - 1):
        if s[i] != s[i+1]:
            res += s[i]
    # The loop never emits the last character, so append it here.
    return res + s[-1:]
s = "BFFBFBFFFBFBBBFBBBBFF"
print(Counter(remove_conseq_dups(s))['B']) # 6
And a groupby solution:
from itertools import groupby
s = "BFFBFBFFFBFBBBFBBBBFF"
print(sum(map(lambda x: 1 if x == 'B' else 0, [x for x, v in groupby(s)])))
Or
print(len(list(filter(lambda x: x == 'B', [x for x, v in groupby(s)]))))

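Another solution that first removes consecutive duplicates using the re module:
import re
l1 = "BFFBFBFFFBFBBBFBBBBFF"
# [A-Za-z] matches letters only; [A-z] would also match characters such as '[' and '_'.
l2 = re.sub(r'([A-Za-z])\1+', r'\1', l1) # Remove consecutive duplicates
print(l2.count("B")) # 6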

You want to count the places where the letters change from F to B, and another function can do that: split. Splitting on 'F' removes all the Fs but creates empty strings between consecutive Fs, so we must leave those out of the count.
s = "BFFBFBFFFBFBBBFBBBBFF"
t = s.split('F')
n = sum([1 for b in t if len(b) > 0])
print(n)
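The split approach relies on the input containing only B and F. If other characters may appear, one alternative (swapped in here, not part of the answer above) is to match each maximal run of the target with a regular expression. The helper name count_runs_regex is hypothetical, and the sketch assumes target is a single character.

```python
import re

def count_runs_regex(s, target):
    """Count maximal runs of the single character `target` in s (hypothetical helper)."""
    # re.escape guards against targets that are regex metacharacters.
    return len(re.findall(re.escape(target) + '+', s))

s = "BFFBFBFFFBFBBBFBBBBFF"
print(count_runs_regex(s, 'B'))         # 6
print(count_runs_regex("BXBFFB", 'B'))  # 3
```

For "BXBFFB", splitting on 'F' would yield only two non-empty pieces ('BXB' and 'B'), whereas the string contains three separate runs of B.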

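Alternative solution:
s = 'BFFBFBFFFBFBBBFBBBBFF'
# Keep a character only if it starts a new run; the i == 0 check avoids comparing s[0] with s[-1].
l = [c for i, c in enumerate(s) if i == 0 or s[i-1] != c]
l.count('B')  # or use Counter
# 6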

Related

Common characters between strings in an array

I am trying to find the common characters between the strings in an array. I am using a hash map for this purpose, which I define as Counter. After several attempts I still cannot get the correct answer. What am I doing wrong here?
Expected answer: {(c,1),(o,1)}
What I am getting: {('c', 1)}
My code:
arr = ["cool","lock","cook"]
def Counter(arr):
    d = {}
    for items in arr:
        if items not in d:
            d[items] = 0
        d[items] += 1
    return d
res = Counter(arr[0]).items()
for items in arr:
    res &= Counter(items).items()
print(res)
In [29]: from collections import Counter
In [30]: words = ["cool","coccoon","cook"]
In [31]: chars = ''.join(set(''.join(words)))
In [32]: counts = [Counter(w) for w in words]
In [33]: common = {ch: min(wcount[ch] for wcount in counts) for ch in chars}
In [34]: answer = {ch: count for ch, count in common.items() if count}
In [35]: answer
Out[35]: {'c': 1, 'o': 2}
Try using functools.reduce and collections.Counter:
>>> from functools import reduce
>>> from collections import Counter
>>> reduce(lambda x,y: x&y, (Counter(elem) for elem in arr[1:]), Counter(arr[0]))
Counter({'c': 1, 'o': 1})
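The original attempt fails because intersecting .items() views keeps only the (character, count) pairs that are identical in every word, so 'o' (count 2 in "cool" and "cook" but 1 in "lock") is dropped. The & operator of collections.Counter instead keeps each common key with the minimum of its counts. A small sketch of the difference, assuming the arr from the question:

```python
from collections import Counter

arr = ["cool", "lock", "cook"]

# Intersecting items() requires the (char, count) pairs to match exactly.
pairs = set(Counter(arr[0]).items())
for word in arr[1:]:
    pairs &= set(Counter(word).items())
print(pairs)  # {('c', 1)}

# Counter & Counter keeps each common key with the minimum of its counts.
common = Counter(arr[0])
for word in arr[1:]:
    common &= Counter(word)
print(dict(common))
```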
An approach without any other library could look like this:
arr = ["cool","lock","cook"]
def Counter(obj_str):
    countdict = {x: 0 for x in set(obj_str)}
    for char in obj_str:
        countdict[char] += 1
    return {(k, v) for k,v in countdict.items()}
print(Counter(arr[0]))
This should give you the result formatted the way you want it.

searching element of 1-D list in 2-D list

I have two lists of the form:
A = ["qww","ewq","ert","ask"]
B = [("qww",2) ,("ert",4) , ("qww",6), ("ewq" , 5),("ewq" , 10),("ewq" , 15),("ask",11)]
I have to process them so that the final output is
C = [("qww",8),("ewq",30),("ert",4),("ask",11)]
For that I have written this code:
# in code temp_list is of form A
# in code total_list is of form B
# in code final is of form C
def ispresent(key,list):
    for qwerty in list:
        if qwerty == key:
            return 1
        else:
            return 0
def indexreturn(key,list):
    counter = 0
    for qwerty in list:
        if qwerty != key:
            counter = counter + 1
        else:
            return counter
def mult_indexreturn(key,list):
    for i in range(len(list)):
        if key == list[i][0]:
            return i
final = map(lambda n1, n2: (n1,n2 ), temp_list,[ 0 for _ in range(len(temp_list))])
for object2 in total_list:#****
    for object1 in temp_list:
        if object2 == object1:
            final[ indexreturn(object2,final) ][1] = final[ indexreturn(object2, final) ][1] + object2[mult_indexreturn(object2,total_list)][1]#total_list[ mult_indexreturn(object2,total_list) ][1]
print(final)
It should output a list of form C, but instead it gives
C = [("qww",0),("ewq",0),("ert",0),("ask",0)]
I think the main problem is in my looping part (marked with the **** comment). Is there a problem with the logic, or something else?
I have included a lot of code so that you can see how it works.
You can build a dictionary with dict.fromkeys() and then use a for loop to accumulate the integers:
A = ["qww","ewq","ert","ask"]
B = [("qww",2) ,("ert",4) , ("qww",6), ("ewq" , 5),("ewq" , 10),("ewq" , 15),("ask",11)]
C = dict.fromkeys(A, 0)
# {'qww': 0, 'ewq': 0, 'ert': 0, 'ask': 0}
for k, v in B:
    C[k] += v
C = list(C.items())
# [('qww', 8), ('ewq', 30), ('ert', 4), ('ask', 11)]
Try this:
from collections import defaultdict
result = defaultdict(int)
for i in A:
    result[i] = sum([j[1] for j in B if j[0] == i])
Then tuple(result.items()) will be your output.
Or you can do it in just one line:
result = tuple({i:sum([j[1] for j in B if j[0] == i]) for i in A}.items())
Using collections.defaultdict
Example:
from collections import defaultdict
A = ["qww","ewq","ert","ask"]
B = [("qww",2) ,("ert",4) , ("qww",6), ("ewq" , 5),("ewq" , 10),("ewq" , 15),("ask",11)]
result = defaultdict(int)
for key, value in B:
    if key in A:  # Check that the key is in A.
        result[key] += value  # Add the value.
print(result)
Output:
defaultdict(<class 'int'>, {'qww': 8, 'ert': 4, 'ewq': 30, 'ask': 11})
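None of the answers above says why the original loop leaves every total at 0: object2 is a tuple such as ("qww", 2) while object1 is a string, so object2 == object1 is never true. Below is a sketch of the loop with that comparison fixed. It keeps the question's names temp_list and total_list, and it uses mutable [key, total] lists as an assumption of this sketch, because tuples cannot be updated in place.

```python
temp_list = ["qww", "ewq", "ert", "ask"]
total_list = [("qww", 2), ("ert", 4), ("qww", 6), ("ewq", 5),
              ("ewq", 10), ("ewq", 15), ("ask", 11)]

# Mutable [key, total] pairs, one per key, in the order of temp_list.
final = [[key, 0] for key in temp_list]

for name, value in total_list:
    for entry in final:
        # Compare the key part of the pair, not the whole tuple.
        if entry[0] == name:
            entry[1] += value

final = [tuple(entry) for entry in final]
print(final)  # [('qww', 8), ('ewq', 30), ('ert', 4), ('ask', 11)]
```

The nested loop mirrors the question's structure; the dictionary-based answers avoid the inner loop altogether.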

Get sequences of same values within list and count elements within sequences

I'd like to find the lengths of the runs of identical consecutive values in a list:
list = ['A','A','A','B','B','C','A','A']
The result should look like:
result_dic = {'A': [3,2], 'B': [2], 'C': [1]}
I do not just want the total count of each value in the list, as you can see from the result for A.
Use collections.defaultdict and itertools.groupby:
from itertools import groupby
from collections import defaultdict
listy = ['A','A','A','B','B','C','A','A']
d = defaultdict(list)
for k, v in groupby(listy):
    d[k].append(len([*v]))
d
defaultdict(list, {'A': [3, 2], 'B': [2], 'C': [1]})
groupby loops through an iterable and lumps contiguous equal items together.
[(k, [*v]) for k, v in groupby(listy)]
[('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C']), ('A', ['A', 'A'])]
So I loop through those results and append the length of each group to the matching list in a defaultdict.
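The groupby idea above can be packaged as a small function. This is a sketch only: the name run_lengths is hypothetical, and it counts each group with sum() rather than unpacking it into a list, which is a stylistic choice and not the answer's own code.

```python
from collections import defaultdict
from itertools import groupby

def run_lengths(items):
    """Map each value to the lengths of its consecutive runs (hypothetical helper)."""
    lengths = defaultdict(list)
    for key, group in groupby(items):
        # sum(1 for ...) counts the run without building a temporary list.
        lengths[key].append(sum(1 for _ in group))
    return dict(lengths)

print(run_lengths(['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A']))
# {'A': [3, 2], 'B': [2], 'C': [1]}
```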
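I'd suggest using a defaultdict and looping through the list.
from collections import defaultdict
sample = ['A','A','A','B','B','C','A','A']
result_dic = defaultdict(list)
last_letter = None
num = 0
for l in sample:
    if last_letter == l or last_letter is None:
        num += 1
    else:
        # The run ended: record its length and start counting the new run.
        result_dic[last_letter].append(num)
        num = 1
    last_letter = l
# Record the final run, which the loop never closes.
if last_letter is not None:
    result_dic[last_letter].append(num)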
Edit
This is my approach, although I'd have a look at @piRSquared's answer because they were keen enough to include groupby as well. Nice work!
I'd suggest looping through the list.
result_dic = {}
old_word = ''
for word in list:
    if word not in result_dic:
        result_dic[word] = [1]
    elif word == old_word:
        result_dic[word][-1] += 1
    else:
        result_dic[word].append(1)
    old_word = word

Count how many times are items from list 1 in list 2

I have 2 lists:
1. ['a', 'b', 'c']
2. ['a', 'd', 'a', 'b']
And I want dictionary output like this:
{'a': 2, 'b': 1, 'c': 0}
I already made it:
# b = list 1
# words = list 2
c = {}
for i in b:
    c.update({i: words.count(i)})
But it is very slow; I need to process a text file of about 10 MB.
EDIT: Here is the entire code. I am still testing, so there are unused imports.
import string
import os
import operator
import time
from collections import Counter
def getbookwords():
    a = open("wu.txt", encoding="utf-8")
    b = a.read().replace("\n", "").lower()
    a.close()
    # str.translate needs a translation table and returns a new string.
    b = b.translate(str.maketrans("", "", string.punctuation))
    b = b.split(" ")
    return b
def wordlist(words):
    a = open("wordlist.txt")
    b = a.read().lower()
    b = b.split("\n")
    a.close()
    t = time.time()
    #c = dict((i, words.count(i)) for i in b )
    c = Counter(words)
    result = {k: v for k, v in c.items() if k in set(b)}
    print(time.time() - t)
    sorted_d = sorted(c.items(), key=operator.itemgetter(1))
    return(sorted_d)
print(wordlist(getbookwords()))
Since speed is currently an issue, it is worth avoiding a full pass through the list for each thing you want to count. The set() function lets you work with only the unique keys in your list words.
An important thing to remember for speed in all cases is the line unique_words = set(b). Without it, a new set is built from b on every iteration, whichever kind of data structure you happen to use.
counts = {k: 0 for k in set(words)}
for w in words:
    counts[w] += 1
unique_words = set(b)
c = {k: counts[k] for k in counts if k in unique_words}
Alternatively, a defaultdict can be used to eliminate some of the initialization.
from collections import defaultdict
counts = defaultdict(int)
for w in words:
    counts[w] += 1
unique_words = set(b)
c = {k: counts[k] for k in counts if k in unique_words}
For completeness' sake, I do like the Counter-based solutions in the other answers (like the one from Reut Sharabani). The code is cleaner, and though I haven't benchmarked it, I wouldn't be surprised if a built-in counting class is faster than home-rolled solutions with dictionaries.
from collections import Counter
c = Counter(words)
unique_words = set(b)
c = {k:v for k, v in c.items() if k in unique_words}
Try using collections.Counter and move b to a set, not a list:
from collections import Counter
c = Counter(words)
b = set(b)
result = {k: v for k, v in c.items() if k in b}
Also, if you can read the words lazily and not create an intermediate list that should be faster.
Counter provides the functionality you want (counting items), and filtering the result against a set uses hashing which should be a lot faster.
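Reading the words lazily could look like the sketch below. It assumes whitespace-separated text and a collection of wanted words; count_wanted is a hypothetical name, and io.StringIO stands in for an open file such as wu.txt.

```python
import io
from collections import Counter

def count_wanted(lines, wanted):
    """Count occurrences of the words in `wanted` across an iterable of lines."""
    wanted = set(wanted)
    # The generator yields one word at a time, so no full word list is built.
    counts = Counter(
        word
        for line in lines
        for word in line.lower().split()
        if word in wanted
    )
    # Report a zero for wanted words that never appear.
    return {word: counts[word] for word in wanted}

text = io.StringIO("a d a\nb\n")
print(count_wanted(text, ['a', 'b', 'c']))
```

Iterating over a file object yields one line at a time, so passing the open file directly in place of the StringIO keeps memory use low for large inputs.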
You can use collections.Counter on a generator that skips ignored keys using a set lookup.
from collections import Counter
keys = ['a', 'b', 'c']
lst = ['a', 'd', 'a', 'b']
unique_keys = set(keys)
count = Counter(x for x in lst if x in unique_keys)
print(count) # Counter({'a': 2, 'b': 1})
# count['c'] == 0
Note that count['c'] is not printed, but is still 0 by default in a Counter.
Here's an example I just put together in a REPL, assuming you're not counting duplicates in list two. We create a hash table using a dictionary. For each item in the list we're matching against, we create a key-value pair with the item as the key and the value set to 0.
Next we iterate through the second list. For each value, we check whether it has already been defined as a key; if it has, we increment its count. Otherwise, we ignore it.
This needs the fewest iterations possible: you hit each item in each list only once.
x = [1, 2, 3, 4, 5]
z = [1, 2, 2, 2, 1]
y = {}
for n in x:
    y[n] = 0  # Set the value to zero for each item in the list
for n in z:
    if n in y:  # If we defined the key in the hash already, increment by one
        y[n] += 1
print(y)
@Makalone, the answers above are good. You can also try the code sample below, which uses Counter from Python's collections module.
You can try it at http://rextester.com/OTYG56015.
Python code »
from collections import Counter
list1 = ['a', 'b', 'c']
list2 = ['a', 'd', 'a', 'b']
counter = Counter(list2)
d = {key: counter[key] for key in set(list1)}
print(d)
Output »
{'a': 2, 'c': 0, 'b': 1}

Python Removing duplicates ( and not keeping them) in a list

Say I have:
x=[a,b,a,b,c,d]
I want a way to get
y=[c,d]
I have managed to do it with count:
unique = []
for i in x:
    if x.count(i) == 1:
        unique.append(i)
The problem is that this is very slow for bigger lists. Any help?
First use a dict to count:
d = {}
for i in x:
    if i not in d:
        d[i] = 0
    d[i] += 1
y = [i for i, j in d.items() if j == 1]
x=["a","b","a","b","c","d"]
from collections import Counter
print([k for k,v in Counter(x).items() if v == 1])
['c', 'd']
Or, to guarantee the order, create the Counter dict first, then iterate over the x list, looking up each count and keeping only the elements whose count is 1:
x = ["a","b","a","b","c","d"]
from collections import Counter
cn = Counter(x)
print([k for k in x if cn[k] == 1])
So there is one pass over x to create the dict and another pass in the comprehension, giving you an overall O(n) solution as opposed to your quadratic approach using count.
The Counter dict counts the occurrences of each element:
In [1]: x = ["a","b","a","b","c","d"]
In [2]: from collections import Counter
In [3]: cn = Counter(x)
In [4]: cn
Out[4]: Counter({'b': 2, 'a': 2, 'c': 1, 'd': 1})
In [5]: cn["a"]
Out[5]: 2
In [6]: cn["b"]
Out[6]: 2
In [7]: cn["c"]
Out[7]: 1
Doing cn[k] returns the count for each element so we only end up keeping c and d.
If you instead want to keep one copy of each element, you can use the set() function like this (note that this does not drop the duplicated elements entirely, so it gives a, b, c and d rather than only c and d):
x=['a','b','a','b','c','d']
print(list(set(x)))
The set() function returns an unordered result. Using the sorted() function, this problem can be solved like so:
x=['a','b','a','b','c','d']
print(sorted(set(x)))
