Python: Case Sensitive Counter - python

I'm making a small script that needs to take any string as an argument and determine if there are any duplicates (letters or numbers). However, it needs to treat uppercase and lowercase as different entities. What I got so far is:
import collections
string = str(input('Enter Sequence: '))
x = list(string)
a = [item for item, count in collections.Counter(x).items() if count > 1]
if len(a) == 0:
return True
else:
return False
This yields correct results only if there is no both upper and lower case instances of the same letter, so it wont work if I enter i.e. 'moOse'
If anyone can help on how to make different case letters count separately, I'd appreciate it.
Thank you

Unable to replicate your problem - Counter is case sensitive.
Demo:
import collections
for s in ["aa","aAbBcC"]:
x = list(s)
a = [item for item, count in collections.Counter(x).items() if count > 1]
if len(a) == 0:
print(x, "has no dupes")
else:
print(x, "has dupes:", a)
Output:
['a', 'a'] has dupes: ['a']
['a', 'A', 'b', 'B', 'c', 'C'] has no dupes
No need to use/import Counter to test if you only have unique elements. Compare len(set(data)) against len(data):
def is_unique(d):
return len(set(d)) == len(d)
for d in ["qwertzui4567QWERTZUI","AA"]:
print(f"{d} :" ,'is Unique' if is_unique(d) else 'has Duplicates')
Output:
qwertzui4567QWERTZUI : is Unique
AA : has Duplicates

Related

Need help for p-language deciphering

I don't know if you're familiar with the P-language or if it's something that's just known in my country. Basically, everytime you come across a vowel in a word, you replace the vowel the same vowel + p + the vowel again.
So 'home' would be 'hopomepe' in the p-language.
Now I'm tasked to decipher the p-language and turn sentences that are written in the p-language back to normal.
p = str(input())
for letter in range(1, len(p)):
if p[letter]=='.':
break
if p[letter-1]==p[letter+1] and p[letter]=='p':
p = p[:letter-1] + p[letter+1:]
print(p)
This is my code so far, it works except I don't know how to make it work for double vowel sounds like 'io' in scorpion (scoporpiopion) for example.
Also when a sentence starts with a vowel, this code doesn't work on that vowel.
For example 'Apan epelepephapant' becomes 'Apan elephant' with my code.
And my code crashes with string index out of bounds when it doesn't end on '.' but it crashes everytime when I don't have that if for the '.' case.
TLDR; How can I get change my code so it works for double vowels and at the start of my sentence.
EDIT: To clarify, like in my example, combination vowels should count as 1 vowel. Scorpion would be Scoporpiopion instead of Scoporpipiopon, boat would be boapoat, boot would be boopoot, ...
You can do it using regular expressions:
import re
def decodePLanguage(p):
return re.subn(r'([aeiou]+)p\1', r'\1', p, flags=re.IGNORECASE)[0]
In [1]: decodePLanguage('Apan epelepephapant')
Out[1]: 'An elephant'
In [2]: decodePLanguage('scoporpiopion')
Out[2]: 'scorpion'
This uses re.subn function to replace all regex matches.
In r'([aeiou]+)p\1', the [aeiou]+ part matches several vowels in a row, and \1 ensures you have the same combination after a p.
Then r'\1' is used to replace the whole match with the first vowel group.
EDIT: working code
def decipher(p):
result = ''
while len(p) > 2:
# first strip out all the consecutive consonants each iteration
idx = 0
while p[idx].lower() not in 'aeiou' and idx < len(p) - 2:
idx += 1
result += p[:idx]
p = p[idx:]
# if there is any string remaining to process, that starts with a vowel
if len(p) > 2:
idx = 0
# scan forward until 'p'
while p[idx].lower() != 'p':
idx += 1
# sanity check
if len(p) < (idx*2 + 1) or p[:idx].lower() != p[idx+1:2*idx+1].lower():
raise ValueError
result += p[:idx]
p = p[2*idx+1:]
result += p
return result
In your example input 'Apan epelepephapant', you compare 'A' == 'a' and get False. It seems you want to compare 'a' == 'a', that is, the str.lower() of each.
It also seems you don't check if the character before the p and after the p is a vowel; that is, if you come across the string hph, as written, your function deciphers it to simply h.
Earlier version of code below:
def decipher(p):
while len(p) > 2:
if p[0].lower() in 'aeiou' and p[0].lower() == p[2].lower() and p[1] == 'p':
result += p[0]
p = p[3:]
else:
result += p[0]
p = p[1:]
result += p
return result
called as e.g.
p = str(input())
print(decipher(p))
Since #Kolmar already gave a regex solution, I'm going to add one without regex
To help think through this, I'm going to first show you my solution to encode a regular string into p-language. In this approach, I group the characters in the string by whether or not they are vowels using itertools.groupby(). This function groups consecutive elements having the same key in the same group.
def p_encode(s):
vowels = {'a', 'e', 'i', 'o', 'u'}
s_groups = [(k, list(v)) for k, v in itertools.groupby(s, lambda c: c.lower() in vowels)]
# For scorpion, this will look like this:
# [(False, ['s', 'c']),
# (True, ['o']),
# (False, ['r', 'p']),
# (True, ['i', 'o']),
# (False, ['n'])]
p_output = []
# Now, we go over each group and do the encoding for the vowels.
for is_vowel_group, group_chars in s_groups:
p_output.extend(group_chars) # Add these chars to the output
if is_vowel_group: # Special treatment for vowel groups
p_output.append("p")
p_output.extend(c.lower() for c in group_chars)
return "".join(p_output)
I added a list comprehension to define s_groups to show you how it works. You can skip the list comprehension and directly iterate for is_vowel_group, group_chars in itertools.groupby(s, lambda c: c.lower() in vowels)
Now, to decode this, we can do something similar in reverse, but this time manually do the grouping because we need to process ps differently when they're in the middle of a vowel group.
I suggest you don't modify the string as you're iterating over it. At best, you'll have written some code that's hard to understand. At worst, you'll have bugs because the loop will try to iterate over more indices than actually exist.
Also, you iterate over 1..len(p), and then try to access p[i+1]. In the last iteration this will throw an IndexError. And because you want repeated vowels to count as a single group, this doesn't work. You're going to have to group the vowels and non-vowels separately, and then join them into a single string.
def p_decode(p):
vowels = {'a', 'e', 'i', 'o', 'u'}
p_groups = []
current_group = None
for c in p:
if current_group is not None:
# If the 'vowelness' of the current group is the same as this character
# or ( the current group is a vowel group
# and the current character is a 'p'
# and the current group doesn't contain a 'p' already )
if (c.lower() in vowels) == current_group[0] or \
( current_group[0] and
c.lower() == 'p' and
'p' not in current_group[1]):
current_group[1].append(c) # Add c to the current group
else:
current_group = None # Reset the current group to None so you can make it later
if current_group is None:
current_group = (c.lower() in vowels, [c]) # Make the current group
p_groups.append(current_group) # Append it to the list
# For scorpion => scoporpiopion
# p_groups looks like this:
# [(False, ['s', 'c']),
# (True, ['o', 'p', 'o']),
# (False, ['r', 'p']),
# (True, ['i', 'o', 'p', 'i', 'o']),
# (False, ['n'])]
p_output = []
for is_vowel_group, group_chars in p_groups:
if is_vowel_group:
h1 = group_chars[:len(group_chars)//2] # First half of the group
h2 = group_chars[-len(group_chars)//2+1:] # Second half of the group, excluding the p
# Add the first half to the output
p_output.extend(h1)
if h1 != h2:
# The second half of this group is not repeated characters
# so something in the input was wrong!
raise ValueError(f"Invalid input '{p}' to p_decode(): vowels before and after 'p' are not the same in group '{''.join(group_chars)}'")
else:
# Add all chars in non-vowel groups to the output
p_output.extend(group_chars)
return "".join(p_output)
And now, we have:
words = ["An elephant", "scorpion", "boat", "boot", "Hello World", "stupid"]
for w in words:
p = p_encode(w)
d = p_decode(p)
print(w, p, d, sep=" | ")
Which gives (prettification mine):
Word
Encoded
Decoded
An elephant
Apan epelepephapant
An elephant
scorpion
scoporpiopion
scorpion
boat
boapoat
boat
boot
boopoot
boot
Hello World
Hepellopo Woporld
Hello World
stupid
stupupipid
stupid
Also, words that aren't actually encoded correctly (such as "stupid") throw a ValueError
>>> p_decode("stupid")
ValueError: Invalid input 'stupid' to p_decode(): vowels before and after 'p' are not the same in group 'upi'

Write a while loop that takes a string and counts the vowels?

Write a while loop that takes a string and counts the vowels. Use the string “May the force be with you.” Print the results. (Answer: 8)
Any help with this would be greatly appreciated! I kept coming up with a continuous loop. It has to use a while loop and print the number of vowels (8). Thank you!!
count = 0
vowels = ['a', 'e', 'i', 'o', 'u']
s = "May the force be with you."
while i in s:
if i in vowels:
count += 1
print(count)
The statement:
while i in s:
does not do what you think. Were that while a for, it would iterate over the string one character at a time, and probably work.
However, the expression i in s (which is what that is in a while statement) simply checks if i is one of the things in the s "collection" and gives you true or false. It does not iterate i over the s collection.
If i had been set to something, the while loop would either execute infinitely or never, depending on the value of i. If i is not bound to a value, you'll get a run-time error.
As a solution, you can iterate over the characters in a string with something like (from an actual transcript):
>>> str = "pax"
>>> for ch in str:
... print(ch)
...
p
a
x
The equivalent while version would be:
>>> str = "pax"
>>> idx = 0 # OR: idx, slen = 0, len(str)
>>> while idx < len(str): # while idx < slen:
... print str[idx]
... idx += 1
...
p
a
x
though the for variant is generally considered more Pythonic for this sort of task.
Further, you can detect if a character is one of a set of characters by using in, such as in the following transcript:
>>> str = "pax"
>>> for ch in str:
... if ch in "ABCabc":
... print(f"{ch} is either a, b, or c")
...
a is either a, b, or c
So you should be able to combine that for/while loop and if statement to count the vowels and output it (with print).
Note especially the string I use for vowel checking, it also contains the upper-case vowels. And keep in mind, though your specification may only be to use the Latin-ish vowels, the Unicode world of today would not forgive this oversight. See here for example.
Since it needs to be a while, you can slice off a character at a time and loop until the string is empty. As a quick optimization, True adds as 1 and False as 0, so the interior if can be removed.
vowels = ['a', 'e', 'i', 'o', 'u']
s = "May the force be with you."
count = 0
while s:
count += s[0] in vowels
s = s[1:]
print(count)
I think you could use something like this
phrase = "May the force be with you."
vowels = 0
count = 0
while phrase[count] != ".":
if phrase[count] in 'aeiou':
vowels += 1
count+=1
print(vowels)

Deleting adjacent repeating characters in a string

I want to delete adjacent repeating characters in a string in Python 3. For ex if the input is AABBC the output is ABC or if the input is AAABBBCC the output is ABC. I made two attempts to solve this problem.
Attempt #1
string = input()
for i in range (len(string) - 1):
if string[i] == string[i+1]:
string.replace(string[i],"")
print(string)
The above code returns the same string that is entered. If I enter AABBC it simply returns the same string. Not knowing what I was doing wrong, I tried another attempt.
Attempt #2
string = input()
new = []
for i in string:
new.append(i)
for i in range (len(new) - 3):
"""in the above line, if I set the range to (len(new)-2), it gives me an
error saying "list index out of range"."""
if new[i] == new[i+1]:
new.pop(i)
print(new)
The above code works for double repeating characters, but fails when there are 3 or more repeating characters. If I input AABBC it returns the list ['A','B','C'] which is perfectly fine, but with the input AAABBCC it returns ['A', 'A', 'B', 'C'].
Using Regex:
import re
s = ["AABBC", "AAABBBCC"]
for i in s:
print( re.sub(r"(.)\1+", r"\1", i) )
Or:
s = ["AABBC", "AAABBBCC"]
for i in s:
temp = []
for j in i:
if not temp:
temp.append(j)
else:
if temp[-1] != j:
temp.append(j)
print("".join(temp))
Output:
ABC
ABC
You can use itertools to group the characters like,
>>> import itertools
>>> [x[0] for x in itertools.groupby('AABBCC')]
['A', 'B', 'C']
string = 'AAABBBCC'
result = ''
for letter in string:
if len(result) > 0:
if result[-1] != letter: # [-1] is the last letter
result += letter
else:
result = letter # the first letter will always be included
print(result) # ABC
That is, only append the letter if it is not already at the end of the result string.
An easy to understand short solution :
mot = 'AABBBCC'
a = [mot[0]] + [mot[i] if mot[i]!=mot[i-1] else '' for i in range(1, len(mot))]
>>> ['A', '', 'B', '', '', 'C', '']
result = ''
for lettre in a:
result += lettre
result
>>> 'ABC'
You first create a list of the letters which respects a certain condition, then you convert this list into a string. This algorithm can be used for many different conditions.
Note that you don't need to import any new library.

Finding the number of occurrences of an element in every list inside a list

Here I have a nested list:
[['a,b,c,d,e’], [‘d,e,c,b,a’], [‘e,b,a,e,c’], ['a,c,b,e,d'], [‘e,d,c,b,a’], [‘a,c,b,d,e']]
I want to count the number of occurrences of "a" in the beginning of each list, so that will be 3. I also want to be able to do the same for each letter in the lists, so I know that the number of "b" in the second position of each list is 2.
This is perhaps easier to imagine if you put each list on top of each other and look down the column.
I want to be able to give the letter and position to scan through and receive the number of occurrences of that letter in that position. I hope I'm clear.
Here is what I have tried so far:
number=sum(x.count("a") for x in data[0][0])
In most simple case you may use the following:
data = [['a,b,c,d,e'], ['d,e,c,b,a'], ['e,b,a,e,c'], ['a,c,b,e,d'], ['e,d,c,b,a'], ['a,c,b,d,e']]
letter = 'b'
pos = 1
result = sum(1 for i in data if letter in i[0] and i[0].split(',')[pos] == letter)
print(result)
The output:
2
To get a more accurate and robust solution:
data = [['a,b,c,d,e'], ['d,e,c,b,a'], ['e,b,a,e,c'], ['a,c,b,e,d'], ['e,d,c,b,a'], ['a,c,b,d,e']]
letter = 'b'
pos = 1
result = 0
for i in data:
if letter in i[0]:
items = i[0].split(',')
if pos < len(items) and items[pos] == letter:
result += 1
print(result)
You should use a function instead of just loops. Because it's easier to give a parameter as index no and letter:
Here is solution with a function:
list_1= [['a,b,c,d,e'], ['d,e,c,b,a'], ['e,b,a,e,c'], ['a,c,b,e,d'], ['e,d,c,b,a'], ['a,c,b,d,e']]
def find(index,letter,list_1):
count=0
for item in list_1:
for subitem in item:
if subitem[index]==letter:
count+=1
return count
Test with 'a' with index 0:
print(find(0, 'a', list_1))
Output:
3
Test with 'b' with index 2:
print(find(2, 'b', list_1))
Output:
2

Substring of a string from a point where character starts to repeat

I am a sophomore CS student and I was practicing for interviews. In this problem, I am trying to print substring of an input parameter from the point where character starts to repeat. In other words, for a string like 'college', i want to print 'col', 'lege', 'colleg', 'e'.
The code implementation is shown below, but I wanted to ask about how to think of solving these types of problems, because they are really tricky and I wanted to know if there are certain algorithms to get hang of these dynamic problems quickly.
def checkrepeat(word):
i = 0
temp_w =''
check_char = {}
my_l = list()
while i < len(word)-1:
if word[i] not in check_char:
temp_w += word[i]
check_char[word[i]] = i
else:
my_l.append(temp_w)
temp_w=''
i = check_char[word[i]]
check_char.pop(word[i])
i+=1
return my_l
print(checkrepeat('college'))
This may not be best practice, but it seems functional:
def checkrepeat(word):
for letter in set(word):
split_word = []
copyword = word
while copyword.count(letter) > 1:
split_loc = copyword.rfind(letter)
split_word.insert(0, copyword[split_loc:])
copyword = copyword[:split_loc]
if len(split_word) > 0:
split_word.insert(0, copyword)
print split_word
checkrepeat('college')
set(word) gives us a list of the unique characters in word. We create an empty list (split_word) to maintain the separate sections of the word. count lets us count the number of times a letter appears in a word - we want to split our word until every substring contains the given letter only once.
We iterate over a copy of word (as we need to repeat the exercise for each duplicated letter, thus don't want to tamper with the original word variable), and add the end-section of the word from our letter onwards to the start of our list. We repeat this until copyword only has our letter in it once, at which point we exit the while loop. The remaining characters of copyword must be added to the start of our list, and we print the word given. This example prints:
['colleg', 'e']
['col', 'lege']
EDIT2 - The Working Solution, that's semi-elegant and almost Pythonic:
def split_on_recursion(your_string, repeat_character): #Recursive function
temp_string = ''
i = 0
for character in your_string:
if repeat_character == character:
if i==1:
return split_on_recursion(temp_string, repeat_character) #Recursion
else:
i += 1
temp_string += character
return temp_string
def split_on_repeat(your_string):
temp_dict = {}
your_dict = {}
your_end_strings = []
for char in set(your_string):
temp_dict[char] = your_string.count(char) #Notice temp_dict
for key in temp_dict:
if temp_dict[key] >= 2:
your_dict[key] = temp_dict[key] #Isolate only the characters which repeat
if your_dict != {}:
for key in your_dict:
pre_repeat_string = split_on_recursion(your_string,key)
post_repeat_string = your_string.replace(pre_repeat_string,'')
your_end_strings.append((pre_repeat_string, post_repeat_string))
else:
your_end_strings = [(your_string)]
return your_end_strings
Use:
>>> print(split_on_repeat('Innocent'))
[('In', 'nocent')]
>>> print(split_on_repeat('College'))
[('Colleg', 'e'), ('Col', 'lege')]
>>> print(split_on_repeat('Python.py'))
[('Python.p', 'y')]
>>> print(split_on_repeat('Systems'))
[('System', 's')]
As is the case, the solution is case-sensitive, but that is a minor issue.
To fathom the solution, though, you need to understand how recursions work. If you don't, this might not be a great example; I would recommend people to start with math problems.
But here's some quick context about how indexing works in python:
'Word'[:1] == 'Wo'
'Word'[-1] == 'd'
'Word'[:-1] == 'Wor'
This indexing works for every object that is indexable.
Solution derived from original #asongtoruin's idea:
import collections
def checkrepeat(word):
out = collections.defaultdict(int)
for c in word:
out[c] += 1
out = {k: [] for (k, v) in out.items() if v > 1}
for letter, split_word in out.iteritems():
copyword = word
while copyword.count(letter) > 1:
split_loc = copyword.rfind(letter)
split_word.insert(0, copyword[split_loc:])
copyword = copyword[:split_loc]
if len(split_word) > 0:
split_word.insert(0, copyword)
return out
for word in ["bloomberg", "college", "systems"]:
print checkrepeat(word)
Output:
{'b': ['bloom', 'berg'], 'o': ['blo', 'omberg']}
{'e': ['colleg', 'e'], 'l': ['col', 'lege']}
{'s': ['sy', 'stem', 's']}
def split_repeated(string):
visited = set()
res = []
for i, c in enumerate(string):
if c in visited: res.append([string[0:i], string[i:]])
visited.add(c)
return res
Output:
split_repeated("college")
#=> [['col', 'lege'], ['colleg', 'e']]
split_repeated("hello world")
#=> [['hel', 'lo world'], ['hello w', 'orld'], ['hello wor', 'ld']]
If you need to split a string only when you meet repeated letter first time:
def split_repeated_unique(string):
visited = set()
shown = set()
res = []
for i, c in enumerate(string):
if c in visited:
if c not in shown:
res.append([string[0:i], string[i:]])
shown.add(c)
else:
visited.add(c)
return res
And the key difference is following:
split_repeated("Hello, Dolly")
#=> [('Hel', 'lo, Dolly'), ('Hello, D', 'olly'), ('Hello, Do', 'lly'), ('Hello, Dol', 'ly')]
split_repeated_unique("Hello, Dolly")
#=> [['Hel', 'lo, Dolly'], ['Hello, D', 'olly']]

Categories

Resources