Find the Letters Occurring Odd Number of Times - python

I came across a fun question and am wondering whether it can be solved.
The Background
In O(n) time, can we find the letters that occur an odd number of times? The output should be a list of those letters, in the same order as they appear in the original string.
If a letter has several occurrences to choose from, take the last occurrence as the unpaired character.
Here is an example:
# note we should keep the order of letters
findodd('Hello World') == ["H", "e", " ", "W", "r", "l", "d"] # it is good
findodd('Hello World') == ["H", "l", " ", "W", "r", "e", "d"] # it is wrong
My attempt
def findodd(s):
    hash_map = {}
    # This step is a bit strange. I will show an example:
    # If I have a string 'abc', I will convert string to list = ['a','b','c'].
    # Just because we can not use dict.get(a) to lookup dict. However, dict.get('a') works well.
    s = list(s)
    res = []
    for i in range(len(s)):
        if hash_map.get(s[i]) == 1:
            hash_map[s[i]] = 0
            res.remove(s[i])
        else:
            hash_map[s[i]] = 1
            res.append(s[i])
    return res
findodd('Hello World')
Out:
["H", "e", " ", "W", "r", "l", "d"]
However, since I use list.remove, the time complexity is above O(n) in my solution.
My Question:
Can anyone give some advice about O(n) solution?
If I don't wanna use s = list(s), how to iterate over a string 'abc' to lookup the value of key = 'a' in a dict? dict.get('a') works but dict.get(a) won't work.
Source
Here are two web pages I looked at; however, neither takes the order of the letters into account or provides an O(n) solution.
find even time number, stack overflow
find odd time number, geeks for geeks

From Python 3.7 on, dictionaries preserve key insertion order. Use collections.OrderedDict for older Python versions.
Go through your word: add each letter to the dict if it is not in it yet, otherwise delete its key from the dict.
The solution is the dict.keys() collection:
t = "Hello World"
d = {}
for c in t:
    if c in d:  # even number of occurrences: delete key
        del d[c]
    else:
        d[c] = None  # odd number of occurrences: add key
print(d.keys())
Output:
dict_keys(['H', 'e', ' ', 'W', 'r', 'l', 'd'])
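It's O(n) because you touch each letter in your input exactly once, and a dict lookup is O(1) on average.
There is some overhead from adding and deleting keys. If that bothers you, use a counter instead and filter the keys() collection for those with odd counts. That takes two passes, roughly 2*n steps, and since 2 is a constant it is still O(n). Note that a counter keeps its keys in first-occurrence order, so the result can differ from the toggle approach when a letter occurs three or more times.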
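For interpreters older than 3.7, the same toggle idea can be sketched with collections.OrderedDict, as suggested above. This is a minimal sketch, and the function name find_odd_ordered is made up for illustration.

```python
from collections import OrderedDict


def find_odd_ordered(text):
    """Return letters with an odd count, ordered by their last unpaired occurrence."""
    unpaired = OrderedDict()
    for ch in text:
        if ch in unpaired:
            del unpaired[ch]      # second, fourth, ... occurrence: pair it off
        else:
            unpaired[ch] = None   # first, third, ... occurrence: currently unpaired
    return list(unpaired)


print(find_odd_ordered("Hello World"))  # ['H', 'e', ' ', 'W', 'r', 'l', 'd']
```

Deleting and re-adding a key moves it to the end, which is what puts 'l' after 'r' in the result.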

Here is an attempt (dict keys are insertion-ordered in CPython 3.6, and guaranteed to be so from Python 3.7):
from collections import defaultdict
def find_odd(s):
    counter = defaultdict(int)
    for x in s:
        counter[x] += 1
    return [l for l, c in counter.items() if c % 2 != 0]
This algorithm takes at most about 2n steps, which is O(n)!
Example
>>> s = "hello world"
>>> find_odd(s)
['h', 'e', 'l', ' ', 'w', 'r', 'd']

You could use the hash map to store the index at which a character occurs, and toggle it when it already has a value.
And then you just iterate the string again and only keep those letters that occur at the index that you have in the hash map:
from collections import defaultdict
def findodd(s):
    hash_map = defaultdict(int)
    for i, c in enumerate(s):
        hash_map[c] = 0 if hash_map[c] else i+1
    return [c for i, c in enumerate(s) if hash_map[c] == i+1]
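To see why the two passes give the right order, here is a sketch of the same idea with a plain dict and more explicit names (find_odd_last and last_unpaired are hypothetical names, not taken from the answer above):

```python
def find_odd_last(text):
    # Pass 1: for each letter, remember the 1-based position of its
    # currently unpaired occurrence, or 0 when all occurrences are paired.
    last_unpaired = {}
    for pos, ch in enumerate(text, start=1):
        last_unpaired[ch] = 0 if last_unpaired.get(ch) else pos
    # Pass 2: keep a letter only at the position recorded for it.
    return [ch for pos, ch in enumerate(text, start=1) if last_unpaired[ch] == pos]


print(find_odd_last("Hello World"))  # ['H', 'e', ' ', 'W', 'r', 'l', 'd']
```

Positions start at 1 so that 0 can safely mean "no unpaired occurrence".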

My solution from scratch
It relies on dicts preserving insertion order (true in CPython 3.6, guaranteed from Python 3.7).
def odd_one_out(s):
    hash_map = {}
    # reverse the original string to capture the last occurrence
    s = list(reversed(s))
    res = []
    for i in range(len(s)):
        if hash_map.get(s[i]):
            hash_map[s[i]] += 1
        else:
            hash_map[s[i]] = 1
    for k,v in hash_map.items():
        if v % 2 != 0:
            res.append(k)
    return res[::-1]
Crazy super short solution
#from user FArekkusu on Codewars
from collections import Counter
def find_odd(s):
    d = Counter(reversed(s))
    return [x for x in d if d[x] % 2][::-1]

Using Counter from collections will give you an O(n) solution. And since the Counter object is a dictionary (which keeps the occurrence order), your result can simply be a filter on the counts:
from collections import Counter
text = 'Hello World'
oddLetters = [ char for char,count in Counter(text).items() if count&1 ]
print(oddLetters) # ['H', 'e', 'l', ' ', 'W', 'r', 'd']

Related

How to do slicing in strings in python?

I am trying to slice the string "abcdeeefghij". Whatever the input, I want to split it into a list of pieces such that no letter repeats within a single list element.
In this case: [abcde,e,efghij].
Another example is the input "aaabcdefghiii". Here the expected output is [a,a,abcdefghi,i,i].
Also, to find the length of the longest string in the list, I tried the logic below:
max_str = max(len(sub_strings[0]),len(sub_strings[1]),len(sub_strings[2]))
print(max_str) #output - 6
which yields 6 as the output, but I presume this logic is not generic. Can someone suggest a generic way to print the length of the longest string?
Here is how:
s = "abcdeeefghij"
l = ['']
for c in s:  # For each character in s
    if c in l[-1]:  # If the character is already in the last string in l
        l.append('')  # Add a new string to l
    l[-1] += c  # Add the character to the last string, whether new or old
print(l)
Output:
['abcde', 'e', 'efghij']
Use a regular expression:
import re
rx = re.compile(r'(\w)\1+')
strings = ['abcdeeefghij', 'aaabcdefghiii']
lst = [[part for part in rx.split(item) if part] for item in strings]
print(lst)
Which yields
[['abcd', 'e', 'fghij'], ['a', 'bcdefgh', 'i']]
You would loop over the characters in the input and start a new string if there is an existing match, otherwise join them onto the last string in the output list.
input_ = "aaabcdefghiii"
output = []
for char in input_:
    if not output or char in output[-1]:
        output.append("")
    output[-1] += char
print(output)
To avoid repeated letters within a list element, you can greedily track which letters are already in the current piece. Append the piece to your answer once you detect a repeated letter.
from collections import defaultdict
s = input()
ans = []
d = defaultdict(int)
cur = ""
for i in s:
    if d[i]:
        ans.append(cur)
        cur = i  # start again since there is a repetition
        d = defaultdict(int)
        d[i] = 1
    else:
        cur += i  # append to cur since no repetition yet
        d[i] = 1
if cur:  # handling the last part
    ans.append(cur)
print(ans)
An input of aaabcdefghiii produces ['a', 'a', 'abcdefghi', 'i', 'i'] as expected.
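The question also asked for a generic way to get the length of the longest piece. One possible sketch, assuming the pieces are already in a list (called sub_strings here, as in the question), is to let max() work over the whole list instead of naming each index:

```python
sub_strings = ['abcde', 'e', 'efghij']  # sample data taken from the example above

# Length of the longest piece, whatever the number of pieces.
# default=0 covers an empty list.
max_len = max((len(part) for part in sub_strings), default=0)
print(max_len)  # 6

# The longest piece itself (the first one wins on ties).
longest = max(sub_strings, key=len, default='')
print(longest)  # efghij
```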

Is it possible to use the Counter function to see the occurrences of elements in a list within a bigger list?

I am asked to write code that counts the occurrences of each vowel in a single input word, case-insensitively.
So I basically want to count the occurrences of different elements within a list. The way I thought of doing this is to create a list: vowels=( "a" ,"e" ,"i" ,"o" ,"u" )
Then I read the word, lowercase it, etc.
from collections import Counter
x = input()
y = x.lower()
z = list(y)
Then I want to use counter so it can count all of the vowels at once.
C = z.Counter(vowels)
print(C)
But when I run the software it shows me
AttributeError: 'list' object has no attribute 'Counter'
So what am I doing wrong? Or can you just not use Counter the same way that you use count?
(I already solved the exercise using count, but I'm trying to find a more elegant, concise solution.)
This is the whole code I'm trying to make work:
from collections import Counter
x = input()
y = x.lower()
z = list(y)
vowels=[ "a" ,"e" ,"i" ,"o" ,"u" ]
C = z.Counter(vowels)
print(C)
Counter is neither an attribute nor a method of list; it is a class that you call on an iterable.
Try this instead:
from collections import Counter
vowels = ("a", "e", "i", "o", "u")
x = input("Enter a word:") # input: aeiai
y = x.lower()
vowels_counter = {k: v for k, v in Counter(y).items() if k in vowels}
print(vowels_counter) # output: {'a': 2, 'e': 1, 'i': 2}
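A variation on the same idea, as a sketch: filter the letters before they reach Counter, so the result is itself a Counter and vowels that never appear simply report 0. The helper name count_vowels is an assumption for illustration.

```python
from collections import Counter

VOWELS = "aeiou"


def count_vowels(word):
    # Counter accepts any iterable, so a generator of just the vowels works.
    return Counter(ch for ch in word.lower() if ch in VOWELS)


counts = count_vowels("AeIai")
print(counts["a"], counts["e"], counts["i"], counts["o"])  # 2 1 2 0
```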

How do I remove a string from a list if it DOES NOT contain certain characters in Python

I am working on a list filter. This is as far as I've gone. I would like to remove every string that doesn't contain H, L or C. So far this is my attempt:
input_list = input("Enter The Results(leave a space after each one):").split(' ')
for i in input_list:
    if 'H' not in i or 'L' not in i or 'C' not in i:
Use this Pythonic code:
input_list = input("Enter The Results(leave a space after each one):").split(' ') # this is the input source
after_removed = [a for a in input_list if ('H' in a or 'L' in a or 'C' in a)] # keeps only the strings that contain 'H', 'L', or 'C'; everything else is removed
A list comprehension makes the code shorter and usually faster than an explicit loop.
If you don't believe it, just try it for yourself :D
For clarity, you can use a function
def contains_required_character(my_string):
    # To be more pythonic, you can use the following
    # return any(letter in my_string for letter in ("H", "L", "C"))
    return 'H' in my_string or 'L' in my_string or 'C' in my_string
results = []
for i in input_list:
    if contains_required_character(i):
        results.append(i)
# Or to be more pythonic
# results = [i for i in input_list if contains_required_character(i)]
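Another way to express the membership test is any(), which stops at the first matching letter. This sketch keeps only the strings that contain at least one of H, L or C, as the question describes; the sample list and the names REQUIRED and keep_matching are made up for illustration.

```python
REQUIRED = ("H", "L", "C")


def keep_matching(strings, required=REQUIRED):
    # Keep a string when at least one required letter occurs in it.
    return [s for s in strings if any(letter in s for letter in required)]


sample = ["H1", "x9", "LC", "ok", "7C"]
print(keep_matching(sample))  # ['H1', 'LC', '7C']
```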

How to iterate through multiple items using split() and check for vowels using for loops?

I have the following homework problem:
Write a function countVowels() that takes a string as a parameter and prints the number of occurrences of vowels in the string.
>>> countVowels('It was the best of times; it was the worst of times.')
a, e, i, o, and u appear, respectively, 2, 5, 4, 3, 0 times.
>>> countVowels('All for one, and one for all!')
a, e, i, o, and u appear, respectively, 3, 2, 0, 4, 0 times.
>>>
I've created a function with the intention of taking a string like str1 = "bananas are tasty", splitting it and assigning the result to a variable so it becomes str1_array = ['bananas', 'are', 'tasty'], then using for loops to iterate over str1_array, count the vowels a, e, i, o, and u respectively, and include the count of each vowel in the final print() statement.
I've run into a problem where the function only counts "a" in the first item of str1_array, and it repeats that count in my final print statement.
I.e. for str1_array = ['bananas', 'are', 'tasty'] I get the print statement A, E, I, O, and U appear, respectively, 3 3 3 3 and 3 times.
def countVowels(str1):
    str1_array = str1.split(" ")
    vA = ["A", "a"]
    vE = ["E", "e"]
    vI = ["I", "i"]
    vO = ["O", "o"]
    vU = ["U", "u"]
    vA_count = 0
    for vA in str1_array:
        vA_count = vA_count + 1
    vE_count = 0
    for vE in str1_array:
        vE_count = vE_count + 1
    vI_count = 0
    for vI in str1_array:
        vI_count = vI_count + 1
    vO_count = 0
    for vO in str1_array:
        vO_count = vO_count + 1
    vU_count = 0
    for vU in str1_array:
        vU_count = vU_count + 1
    print("A, E, I, O, and U appear, respectively, ", vA_count, vE_count, vI_count, vO_count, "and", vU_count, "times.")
Take a look at str.count(). You can simply do:
def countVowels(s):
    vowels = ['a', 'e', 'i', 'o', 'u']
    # Save the string in lower case to also match upper case instances
    lower_s = s.lower()
    # Possible improvement: generate this message from vowels list
    msg = 'a, e, i, o, and u appear, respectively'
    for v in vowels:
        msg += ', ' + str(lower_s.count(v))
    msg += ' times.'
    print(msg)
This solution is, admittedly, not great, since it iterates over the whole string once per vowel. You could improve it by iterating over the string just once and counting how many times each character (not only vowels) shows up. You can then simply print the values you are interested in:
from collections import defaultdict
def countVowels(s):
    vowels = ['a', 'e', 'i', 'o', 'u']
    lower_s = s.lower()
    # Create a dictionary of int values to store the number of appearances of a
    # letter
    results = defaultdict(int)
    for c in lower_s:
        results[c] += 1
    msg = 'a, e, i, o, and u appear, respectively'
    for v in vowels:
        msg += ', ' + str(results[v])
    msg += ' times.'
    print(msg)
Let's first spot problems in your code:
First, you split the string: str1_array = ['bananas', 'are', 'tasty']. Then you define your vowels as e.g. vA = ["A", "a"]. And then you loop to count with:
vA_count = 0
for vA in str1_array:
    vA_count = vA_count + 1
Actually, that last part doesn't do that at all. If you do
for vA in str1_array:
    print(vA)
You will get:
'bananas'
'are'
'tasty'
What happens is that the variable vA, previously defined as ["A", "a"], is overwritten and bound successively to each of the 3 words in str1_array, so the loop just counts the words.
Aside from this, counting is already implemented in the standard library; there is no need to reprogram it. For instance, you could do:
from collections import Counter
c = Counter("bananas are tasty".lower())
This lowercases the string, meaning the upper-case letters get turned into lower-case letters, and then creates a Counter object. Then you can access the number of occurrences of a vowel as:
IN: c['a']
OUT: 5
And thus with a loop:
vowels = ['a', 'e', 'i', 'o', 'u']
for v in vowels:
    print(c[v])
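Putting the pieces together, the homework sentence can be built from a Counter. This is only a sketch: it returns the message instead of printing it (so it is easy to check), and the name vowel_report is hypothetical.

```python
from collections import Counter


def vowel_report(text):
    counts = Counter(text.lower())  # one pass over the string
    # Missing keys count as 0, so absent vowels need no special case.
    numbers = ", ".join(str(counts[v]) for v in "aeiou")
    return "a, e, i, o, and u appear, respectively, " + numbers + " times."


print(vowel_report("All for one, and one for all!"))
# a, e, i, o, and u appear, respectively, 3, 2, 0, 4, 0 times.
```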

Sub-dictionary erroneously repeated throughout dictionary?

I'm trying to store in a dictionary the number of times a given letter occurs after another given letter. For example, dictionary['a']['d'] would give me the number of times 'd' follows 'a' in short_list.
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
short_list = ['ford','hello','orange','apple']
# dictionary to keep track of how often a given letter occurs
tally = {}
for a in alphabet:
    tally[a] = 0
# dictionary to keep track of how often a given letter occurs after a given letter
# e.g. how many times does 'd' follow 'a' -- master_dict['a']['d']
master_dict = {}
for a in alphabet:
    master_dict[a] = tally
def precedingLetter(letter,word):
    if word.index(letter) == 0:
        return
    else:
        return word[word.index(letter)-1]
for a in alphabet:
    for word in short_list:
        for b in alphabet:
            if precedingLetter(b,word) == a:
                master_dict[a][b] += 1
However, the entries for all of the letters (the keys) in master_dict are all the same. I can't think of another way to properly tally each letter's occurrence after another letter. Can anyone offer some insight here?
If the sub-dicts are all supposed to be updated independently after creation, you need to shallow copy them. Easiest/fastest way is with .copy():
for a in alphabet:
    master_dict[a] = tally.copy()
The other approach is to initialize the dict lazily. The easiest way to do that is with defaultdict:
from collections import defaultdict
masterdict = defaultdict(lambda: defaultdict(int))
# or
from collections import Counter, defaultdict
masterdict = defaultdict(Counter)
No need to pre-create empty tallies or populate masterdict at all, and this avoids creating dicts when the letter never occurs. If you access masterdict[a] for an a that doesn't yet exist, it creates a defaultdict(int) value for it automatically. When masterdict[a][b] is accessed and doesn't exist, the count is initialized to 0 automatically.
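As a sketch of how the lazily initialised version could be filled, the loop below walks each word once and pairs every letter with its successor using zip. The loop itself is an assumption about the intended use, not part of the answer above.

```python
from collections import Counter, defaultdict

short_list = ['ford', 'hello', 'orange', 'apple']
masterdict = defaultdict(Counter)

for word in short_list:
    # zip(word, word[1:]) yields (letter, following letter) pairs
    for first, second in zip(word, word[1:]):
        masterdict[first][second] += 1

print(masterdict['o']['r'])  # 2
print(masterdict['l']['l'])  # 1
print(masterdict['a']['d'])  # 0
```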
In addition to the first answer, it can be handy to perform the search the other way around: instead of testing each possible pair of letters, iterate just over the words.
Combined with defaultdict, this simplifies the process. As an example:
from collections import defaultdict
short_list = ['ford','hello','orange','apple']
master_dict = defaultdict(lambda: defaultdict(int))
for word in short_list:
    for i in range(0,len(word)-1):
        master_dict[word[i]][word[i+1]] += 1
Now master_dict contains all the letter combinations that occurred, while it returns zero for all other ones. A few examples below:
print(master_dict["f"]["o"]) # ==> 1
print(master_dict["o"]["r"]) # ==> 2
print(master_dict["a"]["a"]) # ==> 0
The problem you ask about is that master_dict[a] = tally only gives the same object another name, so updating it through any of the references updates them all. You could fix that by making a copy each time with master_dict[a] = tally.copy(), as already pointed out in @ShadowRanger's answer.
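The aliasing can be seen directly with a small sketch (a three-letter alphabet is used here only to keep it short, and the names shared and separate are made up):

```python
alphabet = ['a', 'b', 'c']
tally = {letter: 0 for letter in alphabet}

shared = {letter: tally for letter in alphabet}           # every value is the same dict
separate = {letter: tally.copy() for letter in alphabet}  # every value is its own dict

shared['a']['b'] += 1
separate['a']['b'] += 1

print(shared['c']['b'])    # 1, the update shows up under every key
print(separate['c']['b'])  # 0, only separate['a'] changed
print(shared['a'] is shared['c'])      # True
print(separate['a'] is separate['c'])  # False
```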
As @ShadowRanger goes on to point out, it would also be considerably less wasteful to make your master_dict a defaultdict(lambda: defaultdict(int)), because doing so would only allocate and initialize counts for the combinations that are actually encountered rather than for all possible 2-letter permutations (if it is used properly).
To give you a concrete idea of the savings, consider that there are only 15 unique letter pairs in your sample short_list of words, yet the exhaustive approach would still create and initialize 26 placeholders in each of 26 dictionaries, one for each of the 676 possible counts.
It also occurs to me that you really don't need a two-level dictionary at all to accomplish what you want, since the same thing could be done with a single dictionary whose keys are tuples of pairs of characters.
Beyond that, another important improvement, as pointed out in @AdmPicard's answer, is that your approach of iterating through all possible permutations and checking via the precedingLetter() function whether any of those pairs are in each word is significantly more time-consuming than simply iterating over the successive pairs of letters that actually occur in each word.
So, putting all this advice together would result in something like the following:
from collections import defaultdict
from string import ascii_lowercase
alphabet = set(ascii_lowercase)
short_list = ['ford','hello','orange','apple']
# dictionary to keep track of how often a letter pair occurred after one other.
# e.g. how many times 'd' followed an 'a' -> master_dict[('a','d')]
master_dict = defaultdict(int)
try:
    from itertools import izip
except ImportError: # Python 3
    izip = zip
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = iter(iterable), iter(iterable) # 2 independent iterators
    next(b, None) # advance the 2nd one
    return izip(a, b)
for word in short_list:
    for (ch1,ch2) in pairwise(word.lower()):
        if ch1 in alphabet and ch2 in alphabet:
            master_dict[(ch1,ch2)] += 1
# display results
unique_pairs = 0
for (ch1,ch2) in sorted(master_dict):
    print('({},{}): {}'.format(ch1, ch2, master_dict[(ch1,ch2)]))
    unique_pairs += 1
print('A total of {} different letter pairs occurred in'.format(unique_pairs))
print('the words: {}'.format(', '.join(repr(word) for word in short_list)))
Which produces this output from the short_list:
(a,n): 1
(a,p): 1
(e,l): 1
(f,o): 1
(g,e): 1
(h,e): 1
(l,e): 1
(l,l): 1
(l,o): 1
(n,g): 1
(o,r): 2
(p,l): 1
(p,p): 1
(r,a): 1
(r,d): 1
A total of 15 different letter pairs occurred in
the words: 'ford', 'hello', 'orange', 'apple'
