Print count (word occurrence) from a random text (Print hackerearth) - python

I am trying to count how many times a fixed word can be formed from the letters of a given string.
Fixed word = 'hackerearth'
A random string might be s = 'aahkcreeatrhaaahkcreeatrha'
From this string we can form 'hackerearth' 2 times.
I have written some code to count the letters h, a, e, r, c, k and t in the string:
Code:
word = list(input())
print(word)
h = word.count('h')
a = word.count('a')
c = word.count('c')
k = word.count('k')
e = word.count('e')
r = word.count('r')
t = word.count('t')
if (h >= 2 and a >= 2 and e >= 2 and r >= 2) and (c >= 1 and k >= 1 and t >= 1):
    # 'hackerearth' needs two each of h, a, e, r (integer division keeps whole copies)
    hc = h // 2
    ac = a // 2
    ec = e // 2
    rc = r // 2
    num_words = []
    num_words.append(hc)
    num_words.append(ac)
    num_words.append(ec)
    num_words.append(rc)
    num_words.append(c)
    num_words.append(k)
    num_words.append(t)
    print(num_words)
Output:
[2, 4, 2, 2, 2, 2, 2]
From the output list above, I want to calculate the total number of times the word occurs.
How can I get the total count of the fixed word, and is there a simpler way to write this code?

You could use Counter:
from collections import Counter
s = 'aahkcreeatrhaaahkcreeatrha'
word = 'hackerearth'
wd = Counter(word)
sd = Counter(s)
print(min((sd.get(c, 0) // wd[c] for c in wd), default=0))
Output:
2
The code above creates two dict-like counters in which letters are keys and their occurrence counts are values. A generator expression then iterates over the letters of the word and, for each letter, computes the ratio (integer division) of the available count to the required count. min picks the lowest ratio, and the default value of 0 covers the case where word is an empty string.
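Wrapped as a reusable function, the same ratio idea can be sketched as follows. The name count_formable is hypothetical. The sketch relies on the fact that indexing a Counter with a missing key returns 0, so a letter absent from the text yields 0 copies.

```python
from collections import Counter

def count_formable(text, word):
    """Return how many complete copies of `word` can be built from the letters of `text`.

    Each letter of `text` is used at most once and letter order is ignored.
    """
    if not word:
        return 0  # same convention as default=0 in the answer above
    need = Counter(word)  # letters required for one copy of the word
    have = Counter(text)  # letters available in the text
    # Indexing a Counter with a missing key returns 0, so an absent letter gives 0 copies.
    return min(have[ch] // n for ch, n in need.items())

print(count_formable('aahkcreeatrhaaahkcreeatrha', 'hackerearth'))  # 2
print(count_formable('hackerart', 'hackerearth'))  # 0: only one 'h' and one 'e'
```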

When looking for the word as a subsequence (its letters appearing in order, not necessarily adjacent), you need to account for the character order and not just the counts.
Something like this should work:
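def subword(lookup, whole):
    if len(whole) < len(lookup):
        return 0  # not enough characters left
    if lookup == whole:
        return 1
    if lookup == '':
        return 1  # the empty word can always be formed
    if lookup[0] == whole[0]:
        # either use whole[0] for lookup[0], or skip it
        return subword(lookup[1:], whole[1:]) + subword(lookup, whole[1:])
    return subword(lookup, whole[1:])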
For example:
In [21]: subword('hello','hhhello')
Out[21]: 3
That's because you can choose any of the 3 h's and construct the word hello from the remaining characters.
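In the worst case the recursion above re-explores the same suffixes many times and can take exponential time. Long inputs can also hit Python's recursion limit. Below is a minimal dynamic-programming sketch of the same count. The function name count_subsequences is hypothetical. It runs in O(len(lookup) × len(whole)) time:

```python
def count_subsequences(lookup, whole):
    """Count the ways `lookup` can be formed from `whole` by deleting characters
    (i.e. as a subsequence), the same quantity the recursive subword computes."""
    # ways[j] = number of ways to form lookup[:j] from the characters of `whole` read so far
    ways = [1] + [0] * len(lookup)  # the empty prefix can always be formed in one way
    for ch in whole:
        # walk j downwards so this character extends each partial match at most once
        for j in range(len(lookup), 0, -1):
            if lookup[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(lookup)]

print(count_subsequences('hello', 'hhhello'))  # 3
```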

Related

Check the most frequent letter(s) in a word. Python

My task is:
To write a function that gets a string as an argument and returns the letter(s) with the maximum appearance in it.
Example 1:
s = 'Astana'
Output:
a
Example 2:
s = 'Kaskelen'
Output:
ke
So far, I've got this code:
a = input()
def most_used(w):
    a = list(w)
    indexes = []
    g_count_max = a.count(a[0])
    for letter in a:
        count = 0
        i = int()
        for index in range(len(a)):
            if letter == a[index] or letter == a[index].upper():
                count += 1
                i = index
        if g_count_max <= count:  # here is the problem.
            g_count_max = count
            if i not in indexes:
                indexes.append(i)
    letters = str()
    for i in indexes:
        letters = letters + a[i].lower()
    return letters
print(most_used(a))
The problem is that the first letter is always added to the result. g_count_max starts out as the first letter's own count, so when the loop reaches the first letter, its count passes the <= check and it gets appended.
Example 1:
s = 'hheee'
Output:
he
Example 2:
s = 'malaysia'
Output:
ma
I think what you're trying to do can be simplified a lot by using the standard library's Counter object:
from collections import Counter

def most_used(word):
    # this has the form [(letter, count), ...] ordered from most to least common
    most_common = Counter(word.lower()).most_common()
    result = []
    for letter, count in most_common:
        if count == most_common[0][1]:
            result.append(letter)  # if equal largest -- add to result
        else:
            break  # otherwise don't bother looping over the whole thing
    return result  # or ''.join(result) to return a string
You can use a dictionary comprehension, a generator expression and max():
s = 'Kaskelen'
s_lower = s.lower() #convert string to lowercase
counts = {i: s_lower.count(i) for i in s_lower}
max_counts = max(counts.values()) #maximum count
most_common = ''.join(k for k,v in counts.items() if v == max_counts)
Yields:
'ke'
Try this code using list comprehensions:
word = input('word=').lower()
letters = dict.fromkeys(word)  # unique letters in order of first appearance (a set would lose the order)
max_w = max([word.count(item) for item in letters])
out = ''.join([item for item in letters if word.count(item) == max_w])
print(out)
You can also use Counter, although most_common(1) returns only a single letter even when several letters tie:
from collections import Counter
a = "dagsdvwdsbd"
print(Counter(a).most_common(1)[0][0])
This prints:
d
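The asker's loop-based approach can also be kept instead of switching to Counter. In this minimal sketch of the fix, the result is reset whenever a strictly larger count turns up, and a letter is appended only on a tie. Unlike the original, it does not seed the maximum from the first letter. The variable names are illustrative only:

```python
def most_used(word):
    """Return the letter(s) occurring most often in `word`, case-insensitively,
    in order of first appearance."""
    word = word.lower()
    best_count = 0
    best_letters = ''
    for letter in word:
        if letter in best_letters:
            continue  # already recorded as a current winner
        count = word.count(letter)
        if count > best_count:
            # strictly larger count: discard the previous winners
            best_count = count
            best_letters = letter
        elif count == best_count:
            # tie with the current maximum: keep both letters
            best_letters += letter
    return best_letters

print(most_used('Kaskelen'))  # ke
print(most_used('hheee'))     # e
```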

Write a program that prints the number of times the string contains a substring

s = "bobobobobobsdfsdfbob"
count = 0
for x in s:
    if x == "bob":
        count += 1
print(count)
I want to count how many times "bob" occurs in s, but this gives me 17.
What's wrong with my code? I'm new to Python.
When you loop over a string, the loop variable holds one character at a time, so in your loop x is never equal to "bob".
If you want to count non-overlapping occurrences, you can simply use str.count:
In [52]: s.count('bob')
Out[52]: 4
For overlapping substrings you can use a lookahead in a regex:
In [57]: import re
In [59]: len(re.findall(r'(?=bob)', s))
Out[59]: 6
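The same overlapping count can also be done without regular expressions. This sketch uses str.find and restarts the search one character after the start of each match instead of after its end. The helper name count_overlapping is hypothetical:

```python
def count_overlapping(text, sub):
    """Count occurrences of `sub` in `text`, allowing matches to overlap."""
    if not sub:
        raise ValueError("sub must be non-empty")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)  # resume one char later so overlaps are found
    return count

s = "bobobobobobsdfsdfbob"
print(count_overlapping(s, "bob"))  # 6
print(s.count("bob"))               # 4 (non-overlapping)
```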
You can use str.count, for example:
s = "bobobobobobsdfsdfbob"
count = s.count("bob")
print(count)
I'm not giving the best solution, just trying to correct your code.
First, understand what a for-each loop (a.k.a. range-based for) does in your case:
for c in "Hello":
    print(c)
Outputs:
H
e
l
l
o
In each iteration you compare a single character to the three-character string "bob", so the comparison is never true.
Try something like this for non-overlapping matches:
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
i = 0
while i <= len(s) - len(w):
    if s[i:i+len(w)] == w:
        count += 1
        i += len(w)  # jump past the match so matches cannot overlap
    else:
        i += 1
print(count)
Output:
4
For overlapping matches:
s = "bobobobobobsdfsdfbob"
w = "bob"
count = 0
for i in range(len(s) - len(w) + 1):
    if s[i:i+len(w)] == w:  # check every start position, so overlaps are counted
        count += 1
print(count)
Output:
6

Comparing occurrences of characters in strings

code
def jottoScore(s1, s2):
    n = len(s1)
    score = 0
    sorteds1 = ''.join(sorted(s1))
    sorteds2 = ''.join(sorted(s2))
    if sorteds1 == sorteds2:
        return n
    if(sorteds1[0] == sorteds2[0]):
        score = 1
    if(sorteds2[1] == sorteds2[1]):
        score = 2
    if(sorteds2[2] == sorteds2[2]):
        score = 3
    if(sorteds2[3] == sorteds2[3]):
        score = 4
    if(sorteds2[4] == sorteds2[4]):
        score = 5
    return score
print(jottoScore('cat', 'mattress'))
I am trying to write a jottoScore function that will take in two strings and return how many character occurrences are shared between two strings.
For example, jottoScore('maat', 'caat') should return 3, because two A's and one T are shared.
I feel like this is a simple enough independent practice problem, but I can't figure out how to iterate over the strings and compare each character (I already sorted the strings alphabetically).
If you are on Python 2.7+, this is the approach I would take:
from collections import Counter

def jotto_score(str1, str2):
    count1 = Counter(str1)
    count2 = Counter(str2)
    # for each letter, the shared occurrences are the smaller of the two counts
    return sum(min(v, count2.get(k, 0)) for k, v in count1.items())

print(jotto_score("caat", "maat"))
print(jotto_score("bigzeewig", "ringzbuz"))
Output:
3
4
In case the strings are sorted and the order matters (characters must match at the same position):
>>> a = "maat"
>>> b = "caat"
>>> sum(1 for c1,c2 in zip(a,b) if c1==c2)
3
def chars_occur(string_a, string_b):
    list_a, list_b = list(string_a), list(string_b)  # makes a list of all the chars
    count = 0
    for c in list_a:
        if c in list_b:
            count += 1
            list_b.remove(c)
    return count
EDIT: this solution doesn't take into account if the chars are at the same index in the string or that the strings are of the same length.
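A streamlined version of sberry's answer. The & operator on two Counters keeps the minimum of each count, so summing the values gives the number of shared occurrences: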
from collections import Counter

def jotto_score(str1, str2):
    return sum((Counter(str1) & Counter(str2)).values())
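The question already sorts both strings, so the "how do I iterate over them" part can be answered with a merge-style two-pointer walk. This is a swapped-in technique, not one of the answers above. Advance whichever pointer sees the smaller character, and count a match when both characters are equal. The function name jotto_score_sorted is hypothetical:

```python
def jotto_score_sorted(s1, s2):
    """Count shared character occurrences by walking two sorted strings in step."""
    a, b = sorted(s1), sorted(s2)
    i = j = score = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            score += 1  # one occurrence matched in each string
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1      # a[i] cannot appear later in b, so skip it
        else:
            j += 1      # b[j] cannot appear later in a, so skip it
    return score

print(jotto_score_sorted('maat', 'caat'))  # 3
```

This runs in O(n log n) because of the sort, and it gives the same result as the Counter-based versions.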

check if string is in abc order

The function should count how many uppercase letters are out of alphabetical order.
>>> abc('ABBZHDL')
2
Above, Z and D are out of order.
>>> abc('ABCD')
0
>>> abc('DCBA')
4
My code:
def abc(check):
    order = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for c in check:
        if check != order:
            pass  # then I get stuck here
Pointers?
The question is ill-defined. One solution to a closely related question uses the built-in sorted():
def abc(s):
    count = 0
    s = ''.join(i for i in s if i.isupper())
    l = sorted(s)
    for i, c in enumerate(s):
        if l[i] != c:
            count += 1
    return count
It counts all of the places where the alphabetized string does not match the original.
def abc(check):
    last = ''
    count = 0
    for letter in check:
        if not letter.isupper():
            continue
        if letter < last:
            count += 1
        last = letter
    return count
import string
a = 'acbdefr'
b = 'abdcfe'
assert ''.join(sorted(b)) in string.ascii_letters
assert ''.join(sorted(a)) in string.ascii_letters #should fail
It's really simple; everyone seems to be overcomplicating it somewhat?

Counting longest occurrence of repeated sequence in Python

What's the easiest way to count the longest consecutive repeat of a certain character in a string? For example, the longest consecutive repeat of "b" in the following string:
my_str = "abcdefgfaabbbffbbbbbbfgbb"
would be 6, since other consecutive repeats are shorter (3 and 2, respectively.) How can I do this in Python?
How about a regex example:
import re
my_str = "abcdefgfaabbbffbbbbbbfgbb"
len(max(re.compile("(b+b)*").findall(my_str))) #changed the regex from (b+b) to (b+b)*
# max([len(i) for i in re.compile("(b+b)").findall(my_str)]) also works
Edit: timing mine vs. interjay's:
import timeit
x = timeit.Timer(stmt='import itertools;my_str = "abcdefgfaabbbffbbbbbbfgbb";max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=="b")')
x.timeit()
22.759046077728271
x = timeit.Timer(stmt='import re;my_str = "abcdefgfaabbbffbbbbbbfgbb";len(max(re.compile("(b+b)").findall(my_str)))')
x.timeit()
8.4770550727844238
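The pattern b+b needs at least two consecutive b's, so a lone b is never counted as a run of length 1. Here is a simpler sketch that assumes the target is a single character. It uses the pattern target+ and takes the length of the longest match. The name longest_run_regex is hypothetical:

```python
import re

def longest_run_regex(text, target):
    """Length of the longest run of the single character `target` in `text` (0 if absent)."""
    if len(target) != 1:
        raise ValueError("target must be a single character")
    # re.escape keeps characters such as '.' literal; '+' matches one or more in a row
    runs = re.findall(re.escape(target) + "+", text)
    return max(map(len, runs), default=0)

my_str = "abcdefgfaabbbffbbbbbbfgbb"
print(longest_run_regex(my_str, "b"))  # 6
```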
Here is a one-liner:
import itertools
max(len(list(y)) for (c,y) in itertools.groupby(my_str) if c=='b')
Explanation:
itertools.groupby will return groups of consecutive identical characters, along with an iterator for all items in that group. For each such iterator, len(list(y)) will give the number of items in the group. Taking the maximum of that (for the given character) will give the required result.
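The one-liner raises ValueError when the character never occurs, because max then receives an empty sequence. Below is a minimal sketch that returns 0 in that case by using max's default argument (Python 3.4+). It also counts each group without building a list. longest_run is a hypothetical name:

```python
from itertools import groupby

def longest_run(text, target):
    """Length of the longest run of consecutive `target` characters in `text` (0 if none)."""
    # groupby yields (key, group_iterator) for each block of identical consecutive items;
    # sum(1 for _ in group) counts a group's items without building a list.
    return max((sum(1 for _ in group) for key, group in groupby(text) if key == target),
               default=0)

my_str = "abcdefgfaabbbffbbbbbbfgbb"
print(longest_run(my_str, "b"))  # 6
print(longest_run(my_str, "z"))  # 0
```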
Here's my really boring, inefficient, straightforward counting method (interjay's is much better). Note, I wrote this in this little text field, which doesn't have an interpreter, so I haven't tested it, and I may have made a really dumb mistake that a proof-read didn't catch.
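my_str = "abcdefgfaabbbffbbbbbbfgbb"
last_char = ""
current_seq_len = 0
max_seq_len = 0
for c in my_str:
    if c == last_char:
        current_seq_len += 1  # the current run continues
    else:
        current_seq_len = 1   # a new run starts with this character
    last_char = c
    # update after every character so a run of length 1 is also counted
    if current_seq_len > max_seq_len:
        max_seq_len = current_seq_len
print(max_seq_len)  # longest run of any character (here, the six b's)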
Using run-length encoding:
import numpy as NP
signal = NP.array([4,5,6,7,3,4,3,5,5,5,5,3,4,2,8,9,0,1,2,8,8,8,0,9,1,3])
px, = NP.where(NP.ediff1d(signal) != 0)
px = NP.r_[(0, px+1, [len(signal)])]
# collect the run-lengths for each unique item in the signal
rx = [ (m, n, signal[m]) for (m, n) in zip(px[:-1], px[1:]) if (n - m) > 1 ]
# get longest:
rx2 = [ (b-a, c) for (a, b, c) in rx ]
rx2.sort(reverse=True)
# rx2 is now [(4, 5), (3, 8)], i.e. '5' occurs 4 times consecutively, '8' occurs 3 times consecutively
Here is my code; not that efficient, but it works:
def LongCons(mystring):
    dictionary = {}  # character -> longest run seen so far
    CurrentCount = 0
    latestchar = ''
    for i in mystring:
        if i == latestchar:
            CurrentCount += 1
        else:
            CurrentCount = 1
        # keep the longest run per character; don't overwrite an earlier, longer run
        if CurrentCount > dictionary.get(i, 0):
            dictionary[i] = CurrentCount
        latestchar = i
    k = max(dictionary, key=dictionary.get)
    print(k, dictionary[k])
    return
