Calculate percentages in Python (0% to 100%) - python

In this code, I have data containing many combinations of the letters 'a', 'b', 'c' and 'd', and I am trying to find out how often each combination occurs (example of the data: abdc, abcc, abcd, abbb, aaaa, abdc, ...).
After that, I want the result as a percentage from 0% to 100% for each letter combination, including combinations that occur zero times.
Example Input:
letters: ['abc','aaa','abb','acc','aac','abc','bbb','ccc','ddd','abc','adc','acd','acd','aac','aad','bba','bab','abb','abc','abd'...]
I get df from this (tab_files is the list of files my data comes from):
for i, tab_file in enumerate(tab_files):
    df = pd.DataFrame.from_csv(tab_file, sep='\t')
Here is my try:
#letter_l = all combinations of letters (abcd) together
nt_l = "abcd"
letter_l = []
for i1 in nt_l:
    for i2 in nt_l:
        for i3 in nt_l:
            letter = i1+i2+i3
            letter_l.append(letter)
#print(letter_l)
#calculates the amount of each letter combination and shows the percentage
x = []
number_per_combination = {}
for b in letter_l:
    counter = 0
    number_per_combination[b] = 0
    for c2 in df.letter:
        if c2 == b:
            counter +=1
            number_per_combination[b] += 1
    # amount of each letter combination divided through the whole amount
    x.append(counter/(len(df.letter)))
But I get strange percentages as the answer, and I don't understand why. Can somebody help me?
Output I want: number_per_combination
'abc': 20% (40)
'aaa': 10% (20)
'ccd': 0% (0)
'ddd': 3% (6)...

So what you're trying to do is a histogram? Here's a simple way to do it:
input_list = ['a', 'a', 'b', 'b', 'b', 'c']
def histogram(my_list):
    result = {}
    for item in my_list:
        result[item] = result.get(item, 0) + 1
    return result
print(str(histogram(input_list)))
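The .get() method returns the value for the given key from the dictionary. If the key isn't there, it returns the value provided in the second argument instead; .get() itself never inserts anything, and the dictionary is only changed by the assignment to result[item].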
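A small sketch of how a missing key is handled, for illustration only: `.get()` just reads, while `.setdefault()` is the dictionary method that also stores the fallback. The dictionary name `counts` is made up for this example.

```python
counts = {"a": 2}

# .get() returns the fallback for a missing key but leaves the dict unchanged.
missing = counts.get("b", 0)
unchanged = dict(counts)  # snapshot taken right after the .get() call

# .setdefault() returns the fallback AND stores it under the missing key.
stored = counts.setdefault("b", 0)

print(missing, unchanged, stored, counts)
```

This is why the histogram needs the explicit assignment `result[item] = result.get(item, 0) + 1`.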

import re
import itertools
data="aaa, abc, aab"
words = re.split(', ',data)
words_count = {}
total_count = len( words )
for word in list(itertools.product(["a","b","c","d"], repeat=3)):
    words_count["".join(word)] = 0
for word in words:
    words_count[word] = words_count.get(word,0) + 1
for word in words_count:
    p = words_count[word]/total_count * 100
    print( "%s: %.3f%%\t(%d)" % (word,p,words_count[word]) )
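The same idea can be wrapped in a function built on `collections.Counter`. This is only a sketch, not the asker's DataFrame code: the function name `combination_percentages` and its parameters are invented here, and it assumes the combinations are available as a plain list of strings (for example `list(df.letter)`).

```python
from collections import Counter
from itertools import product


def combination_percentages(observed, alphabet="abcd", length=3):
    """Map every possible combination to (percentage, count), zeros included."""
    counts = Counter(observed)
    total = len(observed)
    result = {}
    for combo in product(alphabet, repeat=length):
        key = "".join(combo)
        n = counts[key]  # a Counter returns 0 for keys it has not seen
        pct = 100.0 * n / total if total else 0.0
        result[key] = (pct, n)
    return result


stats = combination_percentages(["abc", "aaa", "abc", "ddd"])
for key in ("abc", "aaa", "ccd"):
    print("%s: %.0f%% (%d)" % (key, stats[key][0], stats[key][1]))
```

Multiplying by `100.0` keeps the division in floating point, which matters on Python 2, where dividing two integers truncates the result.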

Related

Sort non repeated characters and then repeated ones?

I am doing the following programming exercise: Return String As Sorted Blocks. The statement is:
Task
You will receive a string consisting of lowercase letters, uppercase
letters and digits as input. Your task is to return this string as
blocks separated by dashes ("-"). The elements of a block should be
sorted with respect to the hierarchy listed below, and each block
cannot contain multiple instances of the same character.
The hierarchy is:
lowercase letters (a - z), in alphabetic order
uppercase letters (A - Z), in alphabetic order
digits (0 - 9), in ascending order
Examples
"21AxBz" -> "xzAB12" - since input does not contain repeating characters, you only need 1 block
"abacad" -> "abcd-a-a" - character "a" repeats 3 times, thus 3 blocks are needed
"" -> "" - an empty input should result in an empty output
Good luck!
I have written the following code:
def blocks(s):
print("s: "+s)
lowers = []
uppers = []
digits = []
for c in s:
if c.islower():
lowers.append(c)
if c.isupper():
uppers.append(c)
if c.isdigit():
digits.append(c)
lowers.sort()
uppers.sort()
digits.sort()
print("lowers: ")
print(lowers)
print("uppers: ")
print(uppers)
print("digits: ")
print(digits)
result = ""
sorted = lowers+uppers+digits
removedLetters = 0
needsNextBlock = False
nextBlock = "-"
while len(sorted) > 0:
for i, c in enumerate(sorted):
print(i, c)
print("result: ")
print(result)
if c not in result:
result += c
print("we want to delete: ")
print(c)
sorted = sorted[0:i-removedLetters] + sorted[i+1-removedLetters:]
removedLetters += 1
print("new sorted: ")
print(sorted)
else:
if c not in nextBlock:
needsNextBlock = True
nextBlock += c
sorted = sorted[0:i-removedLetters] + sorted[i+1-removedLetters:]
removedLetters += 1
print("new sorted: ")
print(sorted)
if needsNextBlock:
result += nextBlock
needsNextBlock = False
nextBlock = "-"
return result
And there is a bug, because of when we have the following test:
Test.assert_equals(blocks("abacad"), "abcd-a-a")
The trace is:
s: abacad
lowers:
['a', 'a', 'a', 'b', 'c', 'd']
uppers:
[]
digits:
[]
0 a
result:
we want to delete:
a
new sorted:
['a', 'a', 'b', 'c', 'd']
1 a
result:
a
new sorted:
['a', 'b', 'c', 'd']
2 a
result:
a
3 b
result:
a
we want to delete:
b
new sorted:
['a', 'c', 'd']
4 c
result:
ab
we want to delete:
c
new sorted:
['a', 'd']
5 d
result:
abc
we want to delete:
d
new sorted:
['a']
0 a
result:
abcd-a
new sorted:
['a']
0 a
result:
abcd-a-a
new sorted:
['a']
0 a
result:
abcd-a-a-a
new sorted:
['a']
0 a
result:
abcd-a-a-a-a
new sorted:
['a']
0 a
(infinite loop)
So as we see the difficulty is created when we execute:
sorted = sorted[0:i-removedLetters] + sorted[i+1-removedLetters:]
removedLetters += 1
Because we have already passed over the repeated letter (in this case 'a') without counting it, the calculation of the new sorted sublist keeps producing the same result.
I tried a naive approach:
def blocks(s):
print("\n\n\ns: "+s)
lowers = []
uppers = []
digits = []
for c in s:
if c.islower():
lowers.append(c)
if c.isupper():
uppers.append(c)
if c.isdigit():
digits.append(c)
lowers.sort()
uppers.sort()
digits.sort()
print("lowers: ")
print(lowers)
print("uppers: ")
print(uppers)
print("digits: ")
print(digits)
result = ""
sorted = lowers+uppers+digits
removedLetters = 0
needsNextBlock = False
nextBlock = "-"
while len(sorted) > 0:
initialIterationLength = len(sorted)
for i, c in enumerate(sorted):
print(i, c)
print("result: ")
print(result)
if c not in result:
result += c
print("we want to delete: ")
print(c)
sorted = sorted[0:i-removedLetters] + sorted[i+1-removedLetters:]
removedLetters += 1
print("new sorted: ")
print(sorted)
else:
if c not in nextBlock:
needsNextBlock = True
nextBlock += c
sorted = sorted[0:i-removedLetters] + sorted[i+1-removedLetters:]
removedLetters += 1
if initialIterationLength == len(sorted):
sorted = []
print("new sorted: ")
print(sorted)
if needsNextBlock:
result += nextBlock
needsNextBlock = False
nextBlock = "-"
return result
As you can see, at the start of the while loop I added the statement initialIterationLength = len(sorted), and inside the loop, in the if branch, I added:
if initialIterationLength == len(sorted):
sorted = []
It does work for the test being discussed, however for larger inputs it won't work.
For example when input is:
ZrXx2VpVJMgPs54SwwxSophZEWvwKUxzqNxaxlgY0T
Our result is:
aghlopqrsvwxzEJKMNPSTUVWXYZ0245-gpwxSVZ-wx
Expected is:
aghlopqrsvwxzEJKMNPSTUVWXYZ0245-gpwxSVZ-wx-x-x
I think there should be a better algorithm.
I have read:
How do I get a substring of a string in Python?
Does Python have a string 'contains' substring method?
Accessing the index in 'for' loops?
How do I concatenate two lists in Python?
How can I check if a string represents an int, without using try/except?
Check if string is upper, lower, or mixed case in Python
Iterating each character in a string using Python
How to detect lowercase letters in Python?
How could we sort non repeated characters and then repeated ones?
You could use a Counter to keep track of the iterations you need according to the repeated digits.
import string
from collections import Counter
ORDER = {s:i for i, s in enumerate(string.ascii_letters + string.digits)}
def my_sorted(s):
    c = Counter(s)
    res = []
    it = 1
    to_sort = set(c)
    while len(to_sort) > 0:
        res.append(sorted(to_sort ,key=lambda x:ORDER[x]))
        to_sort = [k for k in c if c[k] > it]
        it+=1
    return "-".join(["".join(l) for l in res])
Example:
>>> s="ZrXx2VpVJMgPs54SwwxSophZEWvwKUxzqNxaxlgY0T"
>>> my_sorted(s)
'aghlopqrsvwxzEJKMNPSTUVWXYZ0245-gpwxSVZ-wx-x-x'
Stealing from #abc's answer...
import string
from collections import Counter
ORDER = {s:i for i, s in enumerate(string.ascii_letters + string.digits)}
def my_sorted(s):
    c = Counter(s)
    res = []
    while c:
        res.append(''.join(sorted(c, key=ORDER.get)))
        c -= Counter(set(c))
    return "-".join(res)
Example:
>>> s = "ZrXx2VpVJMgPs54SwwxSophZEWvwKUxzqNxaxlgY0T"
>>> my_sorted(s)
'aghlopqrsvwxzEJKMNPSTUVWXYZ0245-gpwxSVZ-wx-x-x'
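The loop above terminates because in-place subtraction on a Counter discards every entry whose count drops to zero or below. A stripped-down sketch of just that mechanism, using the "abacad" example from the task; the name `peel_blocks` is hypothetical, and plain `sorted` is used instead of the ORDER table, which is only adequate here because the example contains lowercase letters only.

```python
from collections import Counter


def peel_blocks(s):
    """Split s into blocks by repeatedly taking one copy of each remaining character."""
    remaining = Counter(s)
    blocks = []
    while remaining:
        blocks.append("".join(sorted(remaining)))
        # In-place subtraction removes one of each character and drops
        # every entry whose count is no longer positive.
        remaining -= Counter(set(remaining))
    return blocks


print("-".join(peel_blocks("abacad")))
```

Each pass removes exactly one occurrence of every character still present, so the number of blocks equals the highest character count.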
Still stealing #abc's setup, but completely different solution. I append the needed number of dashes and then sort everything based on 1) the how-many-eth occurrence a character is and 2) the aA0- order.
import string
from collections import Counter
ORDER = {s:i for i, s in enumerate(string.ascii_letters + string.digits + '-')}
def my_sorted(s):
    return ''.join(sorted(s + '-' * (max(Counter(s).values()) - 1),
                          key=lambda c, ctr=Counter(): (ctr.update(c) or ctr[c], ORDER[c])))
Example:
>>> s = "ZrXx2VpVJMgPs54SwwxSophZEWvwKUxzqNxaxlgY0T"
>>> my_sorted(s)
'aghlopqrsvwxzEJKMNPSTUVWXYZ0245-gpwxSVZ-wx-x-x'

Print count (word occurrence) from a random text (Print hackerearth)

I am trying to find the count of occurrence of fixed word from any given string.
Fixed word = 'hackerearth'
Random string may be s = 'aahkcreeatrhaaahkcreeatrha'
Now from string we can generate 2-times hackerearth.
I have written some code to find the count of (h,a,e,r,c,k,t) letters in string:
Code:
word = list(raw_input())
print word
h = word.count('h')
a = word.count('a')
c = word.count('c')
k = word.count('k')
e = word.count('e')
r = word.count('r')
t = word.count('t')
if (h >= 2 and a >= 2 and e >= 2 and r >=2) and (c >= 1 and k >= 1 and t >=1 ):
    hc = h/2
    ac = a/2
    ec = e/2
    rc = r/2
    num_words = []
    num_words.append(hc)
    num_words.append(ac)
    num_words.append(ec)
    num_words.append(rc)
    num_words.append(c)
    num_words.append(k)
    num_words.append(t)
    print num_words
Output:
[2, 4, 2, 2, 2, 2, 2]
From the above output list, I want to calculate the total number of occurrences of the word.
How can I get the total count of the fixed word, and is there a simpler way to write this code?
You could utilize Counter:
from collections import Counter
s = 'aahkcreeatrhaaahkcreeatrha'
word = 'hackerearth'
wd = Counter(word)
sd = Counter(s)
print(min((sd.get(c, 0) // wd[c] for c in wd), default=0))
Output:
2
The code above creates two dict-like counters whose keys are letters and whose values are their numbers of occurrences. A generator expression then iterates over the letters found in the word and, for each letter, computes how many copies of the word the string's supply of that letter allows (integer division). min picks the lowest of these ratios, and the default value of 0 covers the case where the word is an empty string.
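As a sketch, the same calculation can be packaged as a function. The name `count_formable` and its parameters are made up for this example; the logic is the Counter approach described above.

```python
from collections import Counter


def count_formable(word, text):
    """Return how many times `word` can be assembled from the letters of `text`."""
    need = Counter(word)
    have = Counter(text)
    # For each required letter: how many copies of the word that letter allows.
    ratios = {ch: have[ch] // need[ch] for ch in need}
    return min(ratios.values(), default=0)


print(count_formable("hackerearth", "aahkcreeatrhaaahkcreeatrha"))
```

For the example string the per-letter ratios are 2 for every letter except 'a' (8 available, 2 needed, ratio 4), so the limiting value is 2.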
When looking for a substring, you need to account for the character order, and not just the counts
something like this should work:
def subword(lookup,whole):
    if len(whole)<len(lookup):
        return 0
    if lookup==whole:
        return 1
    if lookup=='':
        return 1
    if lookup[0]==whole[0]:
        return subword(lookup[1:],whole[1:])+subword(lookup,whole[1:])
    return subword(lookup,whole[1:])
For example:
In [21]: subword('hello','hhhello')
Out[21]: 3
Because you can choose each of the 3 hs and construct the word hello with the remainder
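The recursive version makes two recursive calls whenever the first characters match, so it can become slow on long inputs. A possible iterative alternative, offered as a sketch rather than as part of the original answer, counts the same in-order selections with a small table; the name `count_subsequences` is hypothetical.

```python
def count_subsequences(lookup, whole):
    """Count the ways to pick characters of `whole`, in order, that spell `lookup`."""
    # ways[j] = number of ways to spell lookup[:j] with the characters seen so far
    ways = [1] + [0] * len(lookup)
    for ch in whole:
        # Go right to left so one character of `whole` is not used twice in a single way.
        for j in range(len(lookup), 0, -1):
            if lookup[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(lookup)]


print(count_subsequences("hello", "hhhello"))
```

The running time is proportional to len(lookup) * len(whole), and no recursion depth limit is involved.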

Count consecutive characters

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?
At first, I thought I could do something like:
word = '1000'
counter = 0
print range(len(word))
for i in range(len(word) - 1):
    while word[i] == word[i + 1]:
        counter += 1
        print counter * "0"
    else:
        counter = 1
        print counter * "1"
So that in this manner I could see the number of times each unique digit repeats. But this, of course, falls out of range when i reaches the last value.
In the example above, I would want Python to tell me that 1 repeats 1, and that 0 repeats 3 times. The code above fails, however, because of my while statement.
How could I do this with just built-in functions?
Consecutive counts:
You can use itertools.groupby:
s = "111000222334455555"
from itertools import groupby
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
After which, result looks like:
[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
And you could format with something like:
", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
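Applied to the string from the question, the groupby approach could look like the sketch below. The helper name `run_lengths` is an assumption made for this example.

```python
from itertools import groupby


def run_lengths(s):
    """Return (character, run length) pairs for the consecutive runs in s."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(s)]


pairs = run_lengths("1000")
print(", ".join("{}x{}".format(label, count) for label, count in pairs))
```

`sum(1 for _ in group)` counts the items of each group without building a list, and an empty string simply yields an empty result.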
Total counts:
Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:
from collections import Counter
s = "11100111"
result = Counter(s)
# Counter({"1": 6, "0": 2})
Your method:
As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so i+1 raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more pythonic to generate something to iterate over.
For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:
counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
The only problem is that the final run is never appended, because zip(s, s[1:]) has no pair that starts at the last character, so you'd have to special-case it after the loop. That can be fixed with itertools.zip_longest
import itertools
counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.
def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)
counts = []
count = 1
for a, b in pairwise(s):
    ...
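Put together as one runnable piece, this might look like the sketch below. It assumes the elided loop body is the same as in the zip_longest version above; the names `pairwise_padded` and `run_lengths_pairwise` are invented here to avoid clashing with other helpers.

```python
import itertools


def pairwise_padded(iterable):
    """Yield (current, next) pairs; the final pair is (last_item, None)."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)


def run_lengths_pairwise(s):
    """Return (character, run length) pairs for the consecutive runs in s."""
    counts = []
    count = 1
    for a, b in pairwise_padded(s):
        if a == b:
            count += 1
        else:
            # The None padding never equals a character, so the last run is flushed too.
            counts.append((a, count))
            count = 1
    return counts


print(run_lengths_pairwise("aabccc"))
```

Because the padding value is None, this sketch is only suitable for sequences that do not themselves contain None, which is always true for strings.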
A solution "that way", with only basic statements:
word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
    for i in range(1,len(word)):
        if word[i-1]==word[i]:
            count+=1
        else :
            length += word[i-1]+" repeats "+str(count)+", "
            count=1
    length += ("and "+word[i]+" repeats "+str(count))
else:
    i=0
    length += ("and "+word[i]+" repeats "+str(count))
print (length)
Output :
'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'
Totals (without sub-groupings)
#!/usr/bin/python3 -B
charseq = 'abbcccdddd'
distros = { c:1 for c in charseq }
for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1
print(distros)
I'll provide a brief explanation for the interesting lines.
distros = { c:1 for c in charseq }
The line above is a dictionary comprehension, and it basically iterates over the characters in charseq and creates a key/value pair for a dictionary where the key is the character and the value is the number of times it has been encountered so far.
Then comes the loop:
for c in range(len(charseq)-1):
We go from 0 to length - 1 to avoid going out of bounds with the c+1 indexing in the loop's body.
if charseq[c] == charseq[c+1]:
    distros[charseq[c]] += 1
At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):
# replacing vars for their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1
# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1
You can see the program output below with the correct counts:
➜ /tmp ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}
You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False's value is 0 and True's value is 1 with sum:
sum(word[i] == word[i+1] for i in range(len(word)-1))
This produces the sum of (False, True, True, False) where False is 0 and True is 1 - which is what you're after.
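For an empty word, range(len(word)-1) is range(-1), which is already empty, so the sum is 0 and no index is accessed. If you prefer to make that explicit, you can clamp the bound: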
sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))
And this can be improved with zip:
sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
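A short sketch of what this sum actually measures: it counts adjacent equal pairs across the whole string, not the length of each individual run. The function name `adjacent_equal_pairs` is hypothetical.

```python
def adjacent_equal_pairs(word):
    """Count positions where a character equals the one right after it."""
    # zip stops at the shorter argument, so word[1:] alone bounds the loop.
    return sum(c1 == c2 for c1, c2 in zip(word, word[1:]))


print(adjacent_equal_pairs("1000"))
```

For "1000" the comparisons are False, True, True, which sum to 2: the run of three zeros contains two adjacent equal pairs.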
If we want to count consecutive characters without looping, we can make use of pandas:
In [1]: import pandas as pd
In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))
In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]
The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:
In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))
In [7]: d.ne(d.shift())
Out[7]:
0 True
1 True
2 False
3 True
dtype: bool
In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0 1
1 2
2 2
3 3
dtype: int32
This is my simple code for finding the maximum number of consecutive 1's in a binary string in Python 3:
count = 0
maxcount = 0
for i in str(bin(13)):
    if i == '1':
        count += 1
    elif count > maxcount:
        maxcount = count
        count = 0
    else:
        count = 0
if count > maxcount: maxcount = count
maxcount
There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.
w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # digits
['1', '0', '2', '3', '4', '5']
print(cw) # counts
[3, 3, 3, 2, 2, 5]
w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw) # characters
print(cw) # counts
['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
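The boundary-index idea can be sketched as a reusable function. The name `runs_from_boundaries` is made up for this example, and the guard for empty input is an addition that the snippets above do not have (they would fail on `w[-1]`).

```python
def runs_from_boundaries(w):
    """Return (characters, counts) for the consecutive runs in w."""
    if not w:
        return [], []
    # Index where each run starts; len(w) is appended as a closing boundary.
    starts = [0] + [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]]
    bounds = starts + [len(w)]
    chars = [w[i] for i in starts]
    counts = [bounds[j] - bounds[j - 1] for j in range(1, len(bounds))]
    return chars, counts


print(runs_from_boundaries("111000222334455555"))
```

Reading the character at each start index, instead of at each change position plus the last character, keeps the two result lists the same length by construction.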
A one liner that returns the amount of consecutive characters with no imports:
def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}
That returns the amount of times any repeated character in a list is in a consecutive run of characters.
alternatively, this accomplishes the same thing, albeit much slower:
def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}
In terms of performance, I ran them with
import copy
import timeit
import urllib.request
site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))
which resulted in:
12.528256356999918
5.351301653001428
This method can definitely be improved, but without using any external libraries, this was the best I could come up with.
In python
your_string = "wwwwweaaaawwbbbbn"
current = ''
count = 0
for index, loop in enumerate(your_string):
    current = loop
    count = count + 1
    if index == len(your_string)-1:
        print(f"{count}{current}", end ='')
        break
    if your_string[index+1] != current:
        print(f"{count}{current}",end ='')
        count = 0
        continue
This will output
5w1e4a2w4b1n
# I wrote the code using simple loops and if statements
s='feeekksssh' #len(s) = 10
count=1 #f:1, e:3, k:2, s:3, h:1
l=[]
for i in range(1,len(s)): #range(1,10)
    if s[i-1]==s[i]:
        count = count+1
    else:
        l.append(count)
        count=1
    if i == len(s)-1: #To check the last character sequence we need to loop in reverse order
        reverse_count=1
        for i in range(-1,-(len(s)),-1): #Looping only for the last character
            if s[i] == s[i-1]:
                reverse_count = reverse_count+1
            else:
                l.append(reverse_count)
                break
print(l)
Today I had an interview and was asked the same question. I was struggling with the original solution in mind:
s = 'abbcccda'
old = ''
cnt = 0
res = ''
for c in s:
    cnt += 1
    if old != c:
        res += f'{old}{cnt}'
        old = c
        cnt = 0 # default 0 or 1 neither work
print(res)
# 1a1b2c3d1
Sadly, this solution kept producing unexpected results on edge cases (can anyone fix the code? Maybe I need to post another question), and I eventually ran out of time in the interview.
After the interview I calmed down and soon arrived at what I think is a stable solution (though I like the groupby one best).
s = 'abbcccda'
olds = []
for c in s:
    if olds and c in olds[-1]:
        olds[-1].append(c)
    else:
        olds.append([c])
print(olds)
res = ''.join([f'{lst[0]}{len(lst)}' for lst in olds])
print(res)
# [['a'], ['b', 'b'], ['c', 'c', 'c'], ['d'], ['a']]
# a1b2c3d1a1
Here is my simple solution:
def count_chars(s):
    size = len(s)
    count = 1
    op = ''
    for i in range(1, size):
        if s[i] == s[i-1]:
            count += 1
        else:
            op += "{}{}".format(count, s[i-1])
            count = 1
    if size:
        op += "{}{}".format(count, s[size-1])
    return op
data_input = 'aabaaaabbaaaaax'
start = 0
end = 0
temp_dict = dict()
while start < len(data_input):
    if data_input[start] == data_input[end]:
        end = end + 1
    if end == len(data_input):
        value = data_input[start:end]
        temp_dict[value] = len(value)
        break
    if data_input[start] != data_input[end]:
        value = data_input[start:end]
        temp_dict[value] = len(value)
        start = end
print(temp_dict)
PROBLEM: we need to count consecutive characters and return characters with their count.
def countWithString(input_string:str)-> str:
    count = 1
    output = ''
    for i in range(1,len(input_string)):
        if input_string[i]==input_string[i-1]:
            count +=1
        else:
            output += f"{count}{input_string[i-1]}"
            count = 1
    # Append the count of the last run (the else branch never runs for it, so it would otherwise be missing from the output string)
    output += f"{count}{input_string[-1]}"
    return output
countWithString(input)
input:'aaabbbaabbcc'
output:'3a3b2a2b2c'
Time Complexity: O(n)
Space Complexity: O(1) auxiliary, plus O(n) for the output string
temp_str = "aaaajjbbbeeeeewwjjj"
def consecutive_charcounter(input_str):
    counter = 0
    temp_list = []
    for i in range(len(input_str)):
        if i==0:
            counter+=1
        elif input_str[i]== input_str[i-1]:
            counter+=1
            if i == len(input_str)-1:
                temp_list.extend([input_str[i - 1], str(counter)])
        else:
            temp_list.extend([input_str[i-1],str(counter)])
            counter = 1
    print("".join(temp_list))
consecutive_charcounter(temp_str)

I have a program to find the amount of alphabets in a string but its not complete can you complete it python

def encode(source):
    dest="";
    i=0
    while i<len(source):
        runLength = 1;
        while source[runLength] == source[runLength-1]:
            runLength=runLength+1
            i=i+1
        dest+=source[i]
        dest+=str(runLength)
        i=i+1
    return dest
source = raw_input("")
print (encode(source))
sample input:
AABBCCCCDADD
sample output:
3A2B4C3D
please fix it, mostly changing line 6 should do it I think
You can simply do it using dictionary.
x="AABBCCCCDDD"
d={}
for i in x:
    d.setdefault(i,0)
    d[i]=d[i]+1
print "".join([str(j)+i for i,j in d.items()])
The best way is to use a dict to keep the count, an OrderedDict will also keep the order for you:
from collections import OrderedDict
def encode(source):
    od = OrderedDict()
    # iterate over input string
    for ch in source:
        # create key/value pairing if key not already in dict
        od.setdefault(ch,0)
        # increase count by one each time we see the char/key
        od[ch] += 1
    # join the key/char and the value/count for each char
    return "".join([str(v)+k for k,v in od.items()])
source = "AABBCCCCDDD"
print (encode(source))
This will only work for strings like your input, where the chars don't repeat later. If they do, we can keep track in a loop and reset the count when we meet a char that is not the same as the previous one:
def encode(source):
    it = iter(source)
    # set prev to first char from source
    prev = next(it)
    count = 1
    out = ""
    for ch in it:
        # if prev and char are equal add 1 to count
        if prev == ch:
            count += 1
        # else we don't have sequence so add count and prev char to output string
        # and reset count to 1
        else:
            out += prev + str(count)
            count = 1
        prev = ch
    # catch out last match or a single string
    out += prev + str(count)
    return out
Output:
In [7]: source = "AABBCCCCDDDEE"
In [8]: print (encode(source))
A2B2C4D3E2
As an alternative solution, there is a Python library called itertools that has a function which is useful in this situation. It can split your string up into groups of the same letter.
import itertools
def encode(source):
    return "".join(["%u%s" % (len(list(g)), k) for k,g in itertools.groupby(source)])
print encode("AABBCCCCDDD")
This will print out the following:
2A2B4C3D
To see how this works, see the following smaller version:
for k, g in itertools.groupby("AABBCCCCDDD"):
    print k, list(g)
This prints the following:
A ['A', 'A']
B ['B', 'B']
C ['C', 'C', 'C', 'C']
D ['D', 'D', 'D']
You can see k is the key, and g is the group. If we take the length of each group, you have your solution.
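For comparison, here is one way the index-based loop from the question could be repaired, written for Python 3. This is a sketch under the assumption that each run should be encoded as count followed by character, as in the groupby version; the name `encode_runs` is hypothetical.

```python
def encode_runs(source):
    """Run-length encode source as <count><char> for each consecutive run."""
    dest = ""
    i = 0
    while i < len(source):
        run_length = 1
        # Extend the run while the next character matches the one at i,
        # checking the bound first so the index never passes the end.
        while i + run_length < len(source) and source[i + run_length] == source[i]:
            run_length += 1
        dest += str(run_length) + source[i]
        i += run_length  # jump to the start of the next run
    return dest


print(encode_runs("AABBCCCCDADD"))
```

The key change is that the inner loop compares relative to `i` rather than to the start of the string, and `i` is advanced by the whole run length afterwards.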

Determining the most common word from a user's input. [Python]

The way I tried to solve this problem was by putting the user's words into a list and then using .count() to see how many times each word is in the list. The problem is that whenever there is a tie, I need to print all of the words that appear the most times. It works only if none of the words I use is contained inside another word that appears the same number of times. Ex: if I use Jimmy and Jim in that order, it will only print Jimmy.
for value in usrinput:
    dict.append(value)
for val in range(len(dict)):
    count = dict.count(dict[val])
    print(dict[val],count)
    if (count > max):
        max = count
        common= dict[val]
    elif(count == max):
        if(dict[val] in common):
            pass
        else:
            common+= "| " + dict[val]
Use a collections.Counter class. I'll give you a hint.
>>> from collections import Counter
>>> a = Counter()
>>> a['word'] += 1
>>> a['word'] += 1
>>> a['test'] += 1
>>> a.most_common()
[('word', 2), ('test', 1)]
You can extract the word and the frequencies from here.
Using it to extract frequencies from user input.
>>> userInput = raw_input("Enter Something: ")
Enter Something: abc def ghi abc abc abc ghi
>>> testDict = Counter(userInput.split(" "))
>>> testDict.most_common()
[('abc', 4), ('ghi', 2), ('def', 1)]
Why not use a collections.defaultdict?
from collections import defaultdict
d = defaultdict(int)
for value in usrinput:
    d[value] += 1
To get the most common words sorted in descending order by the number of occurrences:
print sorted(d.items(), key=lambda x: x[1])[::-1]
Rather than concatenating to common, where "Jim" in "Fred|Jimmy|etc" is true, use a list to store the words found with the max count and then print "|".join(commonlist).
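A minimal sketch of that suggestion combined with collections.Counter, written for Python 3. The function name `most_common_words` is invented for this example; it relies on Counter keeping first-seen order, which holds on Python 3.7 and later.

```python
from collections import Counter


def most_common_words(words):
    """Return every word that shares the highest count, in first-seen order."""
    counts = Counter(words)
    if not counts:
        return []
    top = max(counts.values())
    # Whole words are compared by count, so "Jim" is never confused with "Jimmy".
    return [word for word, n in counts.items() if n == top]


print(" | ".join(most_common_words(["Jim", "Jimmy", "Jim", "Jimmy", "Matt"])))
```

Because ties are collected in a list of whole words, the substring test that caused the Jim/Jimmy problem is no longer needed.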
This is a quick and dirty solution, not elegant at all, and uses numpy.
import numpy as np
def print_common( usrinput ):
    '''prints the most common entry of usrinput, printing all entries if there is a tie '''
    usrinput = np.array( usrinput )
    # np.unique returns the unique elements of usrinput
    unique_inputs = np.unique( usrinput )
    # an array to store the counts of each input
    counts = np.array( [] )
    # loop over the unique inputs and store the count for each item
    for u in unique_inputs:
        ind = np.where( usrinput == u )
        counts = np.append( counts, len( usrinput[ ind ] ) )
    # find the maximum counts and indices in the original input array
    max_counts = np.max( counts )
    max_ind = np.where( counts == max_counts )
    # if there's a tie for most common, print all of the ties
    if len( max_ind[0] ) > 1:
        for i in max_ind[0]:
            print unique_inputs[i], counts[i]
    #otherwise just print the maximum
    else:
        print unique_inputs[max_ind][0], counts[max_ind][0]
    return 1
# two test arrays which show desired results
usrinput = ['Jim','Jim','Jim', 'Jimmy','Jimmy','Matt','Matt','Matt']
print_common( usrinput )
usrinput = ['Jim','Jim','Jim', 'Jimmy','Jimmy','Matt','Matt']
print_common( usrinput )
