Common characters between strings in an array - python

I am trying to find the characters common to all the strings in an array. I am using a hashmap for this, implemented as a function named Counter. After several attempts I still cannot get the correct answer. What am I doing wrong here?
Expected Ans: {(c,1),(o,1)}
What I am getting: {('c', 1)}
My code:
arr = ["cool","lock","cook"]
def Counter(arr):
    d = {}
    for items in arr:
        if items not in d:
            d[items] = 0
        d[items] += 1
    return d
res = Counter(arr[0]).items()
for items in arr:
    res &= Counter(items).items()
print(res)

In [29]: from collections import Counter
In [30]: words = ["cool","coccoon","cook"]
In [31]: chars = ''.join(set(''.join(words)))
In [32]: counts = [Counter(w) for w in words]
In [33]: common = {ch: min(wcount[ch] for wcount in counts) for ch in chars}
In [34]: answer = {ch: count for ch, count in common.items() if count}
In [35]: answer
Out[35]: {'c': 1, 'o': 2}

Try using functools.reduce and collections.Counter:
>>> from functools import reduce
>>> from collections import Counter
>>> reduce(lambda x,y: x&y, (Counter(elem) for elem in arr[1:]), Counter(arr[0]))
Counter({'c': 1, 'o': 1})
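As a minimal sketch of why this works: the & operator on two Counter objects keeps only the keys present in both and takes the smaller of the two counts, so folding it over every word leaves the characters common to all of them. The helper name common_chars is hypothetical, and the word list is the one from the question.

```python
from collections import Counter
from functools import reduce

def common_chars(words):
    """Return a Counter of the characters shared by every word (hypothetical helper)."""
    if not words:
        return Counter()
    # Counter & Counter keeps shared keys with the minimum of the two counts.
    return reduce(lambda acc, cur: acc & cur, (Counter(w) for w in words))

result = common_chars(["cool", "lock", "cook"])
print(sorted(result.items()))  # [('c', 1), ('o', 1)]
```

This is also why intersecting the .items() views fails in the question: ('o', 2) and ('o', 1) are different tuples, so 'o' is dropped instead of being reduced to the smaller count.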

An approach without any other library could be like this:
arr = ["cool","lock","cook"]
def Counter(obj_str):
    countdict = {x: 0 for x in set(obj_str)}
    for char in obj_str:
        countdict[char] += 1
    return {(k, v) for k,v in countdict.items()}
print(Counter(arr[0]))
This should give you the result formatted the way you want it.

Related

How to make function unique in python

Okay, so I have to make a function called unique. This is what it should do:
If the input is: s1 = [{1,2,3,4}, {3,4,5}]
unique(s1) should return: {1,2,5} because the 1, 2 and 5 are NOT in both lists.
And if the input is s2 = [{1,2,3,4}, {3,4,5}, {2,6}]
unique(s2) should return: {1,5,6} because those numbers are unique and are in only one list of this collection of 3 lists.
I tried to make something like this:
for x in s1:
    if x not in unique_list:
        unique_list.append(x)
    else:
        unique_list.remove(x)
print(unique_list)
But the problem with this is that it takes a whole list as "x" and not each element from each list.
Anyone that can help me a bit with this?
I am not allowed to import anything.
Python set() objects have a symmetric_difference() method to find elements in either, but not both sets. You can reduce your list with this to find the total elements unique to each set:
from functools import reduce
l = [{1,2,3,4}, {3,4,5}, {2,6}]
reduce(set.symmetric_difference, l)
# {1, 5, 6}
You can, of course do this without reduce by manually looping over the list. ^ will produce the symmetric_difference:
l = [{1,2,3,4}, {3,4,5}, {2,6}]
final = set()
for s in l:
    final = final ^ s
print(final)
# {1, 5, 6}
In [13]: def f(sets):
    ...:     c = {}
    ...:     for s in sets:
    ...:         for x in s:
    ...:             c[x] = c.setdefault(x, 0) + 1
    ...:     return {x for x, v in c.items() if v == 1}
    ...:
In [14]: f([{1,2}, {2, 3}, {3, 4}])
Out[14]: {1, 4}
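One caveat worth illustrating, as a hedged sketch: folding ^ over the sets keeps the elements that occur in an odd number of sets. That matches "in exactly one set" for the inputs in the question, but not in general. The function names and the input s3 below are made up for illustration; neither function needs an import.

```python
def xor_fold(sets):
    """Fold ^ over the sets: keeps elements seen in an odd number of sets."""
    result = set()
    for s in sets:
        result ^= s
    return result

def in_exactly_one(sets):
    """Count occurrences and keep elements seen in exactly one set."""
    counts = {}
    for s in sets:
        for x in s:
            counts[x] = counts.get(x, 0) + 1
    return {x for x, n in counts.items() if n == 1}

s2 = [{1, 2, 3, 4}, {3, 4, 5}, {2, 6}]
print(xor_fold(s2) == in_exactly_one(s2))  # True

s3 = [{1, 2}, {2, 3}, {2, 4}]  # illustrative input: 2 appears in all three sets
print(sorted(xor_fold(s3)))        # [1, 2, 3, 4]
print(sorted(in_exactly_one(s3)))  # [1, 3, 4]
```

If an element can appear in three or more of the sets, the counting version is the safer choice.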

Count how many times are items from list 1 in list 2

I have 2 lists:
1. ['a', 'b', 'c']
2. ['a', 'd', 'a', 'b']
And I want dictionary output like this:
{'a': 2, 'b': 1, 'c': 0}
I already made it:
#b = list #1
#words = list #2
c = {}
for i in b:
    c.update({i:words.count(i)})
But it is very slow, I need to process like 10MB txt file.
EDIT: Entire code, currently testing so unused imports..
import string
import os
import operator
import time
from collections import Counter
def getbookwords():
    a = open("wu.txt", encoding="utf-8")
    b = a.read().replace("\n", "").lower()
    a.close()
    b.translate(string.punctuation)
    b = b.split(" ")
    return b
def wordlist(words):
    a = open("wordlist.txt")
    b = a.read().lower()
    b = b.split("\n")
    a.close()
    t = time.time()
    #c = dict((i, words.count(i)) for i in b )
    c = Counter(words)
    result = {k: v for k, v in c.items() if k in set(b)}
    print(time.time() - t)
    sorted_d = sorted(c.items(), key=operator.itemgetter(1))
    return(sorted_d)
print(wordlist(getbookwords()))
Since speed is currently an issue, it might be worth considering not passing through the list for each thing you want to count. The set() function allows you to only use the unique keys in your list words.
An important line for speed in every version below is unique_words = set(b). Building the set once up front avoids rebuilding it from b (a full pass over the list) on every iteration of the filter, whichever data structure you use.
c = {k:0 for k in set(words)}
for w in words:
    c[w] += 1
unique_words = set(b)
c = {k:c[k] for k in c if k in unique_words}
Alternatively, defaultdicts can be used to eliminate some of the initialization.
from collections import defaultdict
c = defaultdict(int)
for w in words:
    c[w] += 1
unique_words = set(b)
c = {k:c[k] for k in c if k in unique_words}
For completeness sake, I do like the Counter based solutions in the other answers (like from Reut Sharabani). The code is cleaner, and though I haven't benchmarked it I wouldn't be surprised if a built-in counting class is faster than home-rolled solutions with dictionaries.
from collections import Counter
c = Counter(words)
unique_words = set(b)
c = {k:v for k, v in c.items() if k in unique_words}
Try using collections.Counter and move b to a set, not a list:
from collections import Counter
c = Counter(words)
b = set(b)
result = {k: v for k, v in c.items() if k in b}
Also, if you can read the words lazily and not create an intermediate list that should be faster.
Counter provides the functionality you want (counting items), and filtering the result against a set uses hashing which should be a lot faster.
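A rough sketch of the lazy variant, assuming the text can be consumed line by line and split on whitespace. The function name count_wanted and the sample data are made up here, and io.StringIO stands in for an open file:

```python
import io
from collections import Counter

def count_wanted(lines, wanted):
    """Count words from an iterable of lines, keeping only those in `wanted`."""
    wanted = set(wanted)  # build the set once for fast membership tests
    counts = Counter(
        word
        for line in lines
        for word in line.lower().split()
        if word in wanted
    )
    # Report every wanted word, including those that never occurred.
    return {word: counts[word] for word in wanted}

text = io.StringIO("a d a\nb\n")  # stand-in for open("wu.txt", encoding="utf-8")
print(count_wanted(text, ["a", "b", "c"]))
```

Because the generator feeds Counter one word at a time, no intermediate list of all words is ever built.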
You can use collections.Counter on a generator that skips ignored keys using a set lookup.
from collections import Counter
keys = ['a', 'b', 'c']
lst = ['a', 'd', 'a', 'b']
unique_keys = set(keys)
count = Counter(x for x in lst if x in unique_keys)
print(count) # Counter({'a': 2, 'b': 1})
# count['c'] == 0
Note that count['c'] is not printed, but is still 0 by default in a Counter.
Here's an example I just put together in a REPL, assuming you're not counting duplicates in list two. We create a hash table using a dictionary: for each item in the list we're matching against, we create a key-value pair with the item as the key and the value set to 0.
Next we iterate through the second list. For each value, we check whether it has already been defined as a key; if it has, then we increment its count, otherwise we ignore it.
This takes the fewest iterations possible: you hit each item in each list only once.
x = [1, 2, 3, 4, 5]
z = [1, 2, 2, 2, 1]
y = {}
for n in x:
    y[n] = 0  # Set the value to zero for each item in the list
for n in z:
    if n in y:  # If we defined the value in the hash already, increment by one
        y[n] += 1
print(y)
@Makalone, the answers above are good. You can also try the code sample below, which uses Counter() from Python's collections module.
You can try it at http://rextester.com/OTYG56015.
Python code »
from collections import Counter
list1 = ['a', 'b', 'c']
list2 = ['a', 'd', 'a', 'b']
counter = Counter(list2)
d = {key: counter[key] for key in set(list1)}
print(d)
Output »
{'a': 2, 'c': 0, 'b': 1}

Find count of identical adjacent characters in a string

I have a string: 'AAAAATTT'
I want to write a program that would count each time 2 values are identical.
So in 'AAAAATTT' it would give a count of:
AA: 4
TT: 2
You can use collections.defaultdict for this. This is an O(n) complexity solution which loops through adjacent letters and builds a dictionary based on a condition.
Your output will be a dictionary with keys as repeated letters and values as counts.
The use of itertools.islice is to avoid building a new list for the second argument of zip.
from collections import defaultdict
from itertools import islice
x = 'AAAAATTT'
d = defaultdict(int)
for i, j in zip(x, islice(x, 1, None)):
    if i == j:
        d[i+j] += 1
Result:
print(d)
defaultdict(<class 'int'>, {'AA': 4, 'TT': 2})
You could use a Counter:
from collections import Counter
s = 'AAAAATTT'
print([(k*2, v - 1) for k, v in Counter(list(s)).items() if v > 1])
#output: [('AA', 4), ('TT', 2)]
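Note that counting letters and subtracting one only matches the pair count when each letter forms a single contiguous run, as in 'AAAAATTT'. A hedged sketch that counts per run instead, using itertools.groupby (the function name adjacent_pairs and the second test string are illustrative assumptions):

```python
from collections import Counter
from itertools import groupby

def adjacent_pairs(s):
    """Count identical adjacent pairs; a run of n equal letters holds n - 1 pairs."""
    pairs = Counter()
    for letter, run in groupby(s):
        length = sum(1 for _ in run)
        if length > 1:
            pairs[letter * 2] += length - 1
    return dict(pairs)

print(adjacent_pairs('AAAAATTT'))  # {'AA': 4, 'TT': 2}
print(adjacent_pairs('AATAA'))     # {'AA': 2}
```

For 'AATAA', the letter-count version would report ('AA', 3), because it cannot tell that the four A's are split across two runs.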
You may use collections.Counter with dictionary comprehension and zip as:
>>> from collections import Counter
>>> s = 'AAAAATTT'
>>> {k: v for k, v in Counter(zip(s, s[1:])).items() if k[0]==k[1]}
{('A', 'A'): 4, ('T', 'T'): 2}
Here's another alternative to achieve this using itertools.groupby, but this one is not as clean as the above solution (also will be slow in terms of performance).
>>> from itertools import groupby
>>> {x[0]:len(x) for i,j in groupby(zip(s, s[1:]), lambda y: y[0]==y[1]) for x in (tuple(j),) if i}
{('A', 'A'): 4, ('T', 'T'): 2}
One way may be as following using Counter:
from collections import Counter
string = 'AAAAATTT'
result = dict(Counter(s1+s2 for s1, s2 in zip(string, string[1:]) if s1==s2))
print(result)
Result:
{'AA': 4, 'TT': 2}
You can try it with just range method without importing anything :
data='AAAAATTT'
count_dict={}
for i in range(0,len(data),1):
    data_x=data[i:i+2]
    if len(data_x)>1:
        if data_x[0] == data_x[1]:
            if data_x not in count_dict:
                count_dict[data_x] = 1
            else:
                count_dict[data_x] += 1
print(count_dict)
output:
{'TT': 2, 'AA': 4}

Counting the number of specific characters ignoring duplicates: Python

I have an input like this: BFFBFBFFFBFBBBFBBBBFF .
I want to count the 'B's, ignoring consecutive duplicates, so the answer should be 6.
How to do it in python?
Use itertools.groupby :
>>> from itertools import groupby
>>> s = 'BFFBFBFFFBFBBBFBBBBFF'
>>> l = [k for k,v in groupby(s)]
>>> l
['B', 'F', 'B', 'F', 'B', 'F', 'B', 'F', 'B', 'F', 'B', 'F']
>>> l.count('B')
6
EDIT: Also, for more extensive use, it's better to use collections.Counter to get the count for all the characters.
>>> from collections import Counter
>>> Counter(l)
Counter({'B': 6, 'F': 6})
s = "BFFBFBFFFBFBBBFBBBBFF"
f = False
count = 0
for i in s:
    if f and i == 'B':
        continue
    elif i == 'B':
        count += 1
        f = True
    else:
        f = False
print(count)
Another approach:
from itertools import groupby
count = 0
for i,_ in groupby(s):
    if i == 'B':
        count += 1
print(count)
You should set a counter and a flag variable, then count only the occurrences that are not duplicates and flip the flag. The logic is simple: if the current letter is 'B' and the letter before it isn't 'B' (dup is False), count it and flip the boolean:
s = 'BFFBFBFFFBFBBBFBBBBFF'
count = 0
dup = False
for l in s:
    if l == 'B' and not dup:
        count += 1
        dup = True
    elif l != 'B':
        dup = False
# count: 6
We can remove consecutive dups and use collections.Counter to count the B's that are left:
from collections import Counter
def remove_conseq_dups(s):
    res = ""
    for i in range(len(s)-1):
        if s[i] != s[i+1]:
            res += s[i]
    return res + s[-1:]  # the loop never appends the final character, so add it here
s = "BFFBFBFFFBFBBBFBBBBFF"
print(Counter(remove_conseq_dups(s))['B']) # 6
And a groupby solution:
from itertools import groupby
s = "BFFBFBFFFBFBBBFBBBBFF"
print(sum(map(lambda x: 1 if x == 'B' else 0, [x for x, v in groupby(s)])))
Or
print(len(list(filter(lambda x: x == 'B', [x for x, v in groupby(s)]))))
Another solution by first removing duplicates using RE-library:
import re
l1 = "BFFBFBFFFBFBBBFBBBBFF"
l2 = re.sub(r'([A-Za-z])\1+', r'\1', l1) # Collapse each run of a repeated letter to one letter
l2.count("B") # 6
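A related sketch, offered as an alternative reading rather than as part of the answer above: since only runs of 'B' matter, the runs can be counted directly with re.findall and the pattern B+, without building the collapsed string first.

```python
import re

s = "BFFBFBFFFBFBBBFBBBBFF"
runs = re.findall(r"B+", s)  # each match is one maximal run of B's
print(len(runs))  # 6
```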
You want to count each time the letters change from F to B, and another function can do that: split. It removes all the Fs but creates empty strings for consecutive Fs, so we must exclude them from the count.
s = "BFFBFBFFFBFBBBFBBBBFF"
t = s.split('F')
n = sum([1 for b in t if len(b) > 0])
print(n)
Alternative solution:
s = 'BFFBFBFFFBFBBBFBBBBFF'
l = [c for i,c in enumerate(s) if i == 0 or s[i-1] != c] # i == 0 avoids comparing s[0] with s[-1]
l.count('B') # or use Counter
# 6

Python Removing duplicates ( and not keeping them) in a list

Say I have:
x=[a,b,a,b,c,d]
I want a way to get
y=[c,d]
I have managed to do it with count:
for i in x:
    if x.count(i) == 1:
        unique.append(i)
The problem is, this is very slow for bigger lists, help?
First use a dict to count:
d = {}
for i in x:
    if i not in d:
        d[i] = 0
    d[i] += 1
y = [i for i, j in d.items() if j == 1]
x=["a","b","a","b","c","d"]
from collections import Counter
print([k for k,v in Counter(x).items() if v == 1])
['c', 'd']
Or to guarantee the order create the Counter dict first then iterate over the x list doing lookups for the values only keeping k's that have a value of 1:
x = ["a","b","a","b","c","d"]
from collections import Counter
cn = Counter(x)
print([k for k in x if cn[k] == 1])
So there is one pass over x to create the dict and another pass in the comprehension, giving you an overall O(n) solution as opposed to your quadratic approach using count.
The Counter dict counts the occurrences of each element:
In [1]: x = ["a","b","a","b","c","d"]
In [2]: from collections import Counter
In [3]: cn = Counter(x)
In [4]: cn
Out[4]: Counter({'b': 2, 'a': 2, 'c': 1, 'd': 1})
In [5]: cn["a"]
Out[5]: 2
In [6]: cn["b"]
Out[6]: 2
In [7]: cn["c"]
Out[7]: 1
Doing cn[k] returns the count for each element so we only end up keeping c and d.
Another option is the set() function. Note that it keeps one copy of every element, so a and b remain in the result rather than being dropped:
x=['a','b','a','b','c','d']
print(list(set(x)))
Because set() returns an unordered result, the sorted() function can be used to get a consistent order:
x=['a','b','a','b','c','d']
print(list(sorted(set(x))))
