Find first non-repeating character - Python

Problem:
Given a string s consisting of small English letters, find and return the first instance of a non-repeating character in it. If there is no such character, return '_'.
Example
For s = "abacabad", the output should be
firstNotRepeatingCharacter(s) = 'c'.
This is what I have so far, but it's too slow. How can I make the run time faster? Thanks.
def firstNotRepeatingCharacter(s):
    char = set(s)
    for i in range(len(s)):
        if s.count(s[i]) == 1:
            return s[i]
    return "_"

You could use collections.Counter to count the characters in linear time, and then filter the result in conjunction with next, like this:
from collections import Counter
def firstNotRepeatingCharacter(s):
    counts = Counter(s)
    return next((ch for ch in s if counts[ch] < 2), "_")
print(firstNotRepeatingCharacter("abacabad"))
Output
c
Or simply use a dictionary (no imports needed):
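def firstNotRepeatingCharacter(s):
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1  # first pass: count each character
    return next((ch for ch in s if counts[ch] < 2), "_")  # second pass: first count of 1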
Both approaches are linear in the length of the input string. Your current approach is quadratic in the worst case, because s.count() scans the whole string again for every position it checks.
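A small variation, assuming Python 3.7 or later (where dictionaries, and therefore Counter, keep keys in insertion order): the second pass can walk the counts instead of the string, so it visits each distinct character once rather than every position. This is a sketch; both passes together are still linear overall.

```python
from collections import Counter

def firstNotRepeatingCharacter(s):
    # Counter keys are ordered by first occurrence (Python 3.7+ dict ordering)
    counts = Counter(s)
    # Second pass walks only the distinct characters, not the whole string
    return next((ch for ch, n in counts.items() if n == 1), "_")

print(firstNotRepeatingCharacter("abacabad"))        # c
print(firstNotRepeatingCharacter("abacabaabacaba"))  # _
```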

Related

Remove string character after run of n characters in string

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row. For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'. I have written this crude function that runs without errors but doesn't quite get me what I want:
def strcut(string, n):
    for i in range(len(string)):
        for j in range(n):
            if i + j < len(string)-(n-1):
                if string[i] == string[i+j]:
                    beg = string[:i]
                    ends = string[i+1:]
                    string = beg + ends
    print(string)
These are the outputs for strcut('aaabccdddd', n):
n    output     expected
1    'abcdd'    'abcd'
2    'acdd'     'aabccdd'
3    'acddd'    'aaabccddd'
I am new to Python but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions or know of any methods that would make this easier?
This may not answer why your code does not work, but here's an alternate solution using regex:
import re
def strcut(string, n):
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)
How it works: First, the pattern formatted is "(.)\1{n-1,}". If n=3 then the pattern becomes "(.)\1{2,}"
(.) is a capture group that matches any single character
\1 matches the first capture group
{2,} matches the previous token 2 or more times
The replacement string is the first capture group repeated n times
For example, take string = "aaaab" and n = 3. The first "a" is the capture group (.). The next three characters "aaa" match \1{2,}, which in this example is effectively a{2,}. So the whole thing matches "a" + "aaa" = "aaaa", and that is replaced with "aaa", giving "aaab".
regex101 can explain it better than me.
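To see the pattern the f-string actually produces, here is a small check. run_pattern is a hypothetical helper that simply wraps the same fr-string used in strcut above: doubled braces become literal { and }, while {n-1} is interpolated.

```python
import re

def run_pattern(n):
    # {{ and }} are literal braces; {n-1} is replaced by the number
    return fr"(.)\1{{{n-1},}}"

print(run_pattern(3))                              # (.)\1{2,}
print(re.sub(run_pattern(3), r"\1" * 3, "aaaab"))  # aaab
```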
You can implement a stack data structure.
The idea is to push each new character onto the stack. If it is the same as the previous character on the stack, increase the counter and push it only if the counter is still within the limit n. If it is different from the previous character, push it and reset the counter to 1.
def func(string, n):
    stack = []
    counter = None
    for i in string:
        if not stack:
            counter = 1
            stack.append(i)
        elif stack[-1]==i:
            if counter+1<=n:
                stack.append(i)
                counter+=1
        elif stack[-1]!=i:
            stack.append(i)
            counter = 1
    return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')
output
True
True
True
True
The method I would use is to create a new empty string at the start of the function and then, every time a character exceeds the allowed number of repetitions in the input string, simply not insert it into the output string. This is computationally efficient because it is O(n):
def strcut(string, n):
    new_string = ""
    first_c, s = "", 0  # current character and its run length; "" so an empty input doesn't raise IndexError
    for c in string:
        if c != first_c:
            first_c, s = c, 0
        s += 1
        if s > n: continue
        else: new_string += c
    return new_string
print(strcut("aabcaaabbba",2)) # output : #aabcaabba
Simply, to answer the question
appears in the string more than n times in a row
the following code is small and simple, and will work fine :-)
def strcut(string: str, n: int) -> None:
    tmp = "*" * (n+1)
    for char in string:
        if tmp[len(tmp) - n:] != char * n:
            tmp += char
    print(tmp[n+1:])
strcut("aaabccdddd", 1)
strcut("aaabccdddd", 2)
strcut("aaabccdddd", 3)
Output:
abcd
aabccdd
aaabccddd
Notes:
The character "*" in the line tmp = "*" * (n+1) can be any character that is not in the string; it's just a placeholder to handle the start case when there are no characters yet.
The print(tmp[n+1:]) line simply removes the "*" characters added at the beginning.
You don't need nested loops. Keep track of the current character and its count. Include characters when the count is less than or equal to n, and reset the current character and count when it changes.
def strcut(s,n):
    result = ''                    # resulting string
    char,count = '',0              # initial character and count
    for c in s:                    # only loop once on the characters
        if c == char: count += 1   # increase count
        else: char,count = c,1     # reset character/count
        if count<=n: result += c   # include character if count is ok
    return result
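Just to give some ideas, this is a different approach. I didn't like how n was iterating each time: even if I was on i=3 and n=2, I still jumped to i=4 even though I had already checked that character while going through n. And since you are checking the next n characters in the string, your method doesn't keep the strings in order. Here is a rough method that I find easier to read. Note that string.count() counts every occurrence in the whole string, not just consecutive ones, so this only gives the expected result when each character appears in a single run, as in 'aaabccdddd'.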
def strcut(string, n):
    for i in range(len(string)-1,0,-1): # I go backwards assuming you want to keep the front characters
        if string.count(string[i]) > n:
            string = remove(string,i)
    print(string)
def remove(string, i):
    if i > len(string):
        return string[:i]
    return string[:i] + string[i+1:]
strcut('aaabccdddd',2)
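For comparison, the run-based logic that several answers implement by hand can also be sketched with itertools.groupby, which yields one group per stretch of consecutive equal characters; each run is then capped at n characters. This is an alternative technique, not one used in the answers above.

```python
from itertools import groupby

def strcut(string, n):
    # groupby yields (character, iterator over one run of that character)
    # for each stretch of consecutive equal characters
    return "".join(char * min(n, len(list(run))) for char, run in groupby(string))

print(strcut("aaabccdddd", 1))  # abcd
print(strcut("aaabccdddd", 2))  # aabccdd
print(strcut("aaabccdddd", 3))  # aaabccddd
```

Because groupby works on consecutive runs only, a string like 'aabaa' with n = 2 is left unchanged.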

Shifting all the alphabets of a string by a certain step

input: ['baNaNa', 7] # string and step size
required output : 'utGtGt' # every character of string shifted backwards by step size
import ast
in_string = input()
lis = ast.literal_eval(in_string)
st = lis[0]
step = lis[1]
alphabets = 'abcdefghijklmnopqrstuvwxyz'
password = ''
for letter in st:
    if letter in alphabets:
        index_val = alphabets.index(letter) - (step)
        password += alphabets[index_val]
print(password)
The output I am getting is 'utgtgt', but I want 'utGtGt'. Help on this would be much appreciated.
Python strings have a str.maketrans() method to create a translation dictionary and a translate() method to apply it, which do exactly what you want:
st = "baNaNa"
step = 7
alphabets = 'abcdefghijklmnopqrstuvwxyz'
alph2 = alphabets.upper()
# lower case translation table
t = str.maketrans(alphabets, alphabets[-step:]+alphabets[:-step])
# upper case translation table
t2 = str.maketrans(alph2, alph2[-step:]+alph2[:-step])
# merge both translation tables
t.update(t2)
print(st.translate(t))
Output:
utGtGt
You give str.maketrans() the original letters and an equally long string of letters to map them to, and apply the resulting dictionary using str.translate(dictionary).
The sliced strings equate to:
print(alphabets)
print(alphabets[-step:]+alphabets[:-step])
abcdefghijklmnopqrstuvwxyz
tuvwxyzabcdefghijklmnopqrs
which is what your step is for.
See Understanding slice notation if you never saw string slicing in use.
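The slicing above assumes 0 <= step <= 26: for a larger step, alphabets[-step:] is the whole alphabet and alphabets[:-step] is empty, so nothing gets shifted. A sketch that reduces the step modulo 26 first and builds a single table with str.maketrans (shift_back is a hypothetical name):

```python
from string import ascii_lowercase, ascii_uppercase

def shift_back(text, step):
    # Reduce the step first; a shift of 26 (or 0) leaves every letter unchanged
    step %= 26
    # Map each letter to the letter `step` places earlier, wrapping around
    table = str.maketrans(
        ascii_lowercase + ascii_uppercase,
        ascii_lowercase[-step:] + ascii_lowercase[:-step]
        + ascii_uppercase[-step:] + ascii_uppercase[:-step],
    )
    # Characters not in the table (digits, punctuation, spaces) pass through
    return text.translate(table)

print(shift_back("baNaNa", 7))   # utGtGt
print(shift_back("baNaNa", 33))  # utGtGt, since 33 % 26 == 7
```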
Processing each character, checking its ordinal value with ord() and calculating the shifted character accordingly will get you the result:
def func(string, size):
    if size%26==0:
        size=26
    else:
        size=size%26
    new_str = ''
    for char in string:
        if char.isupper():
            if ord(char)-size<ord('A'):
                new_str+=chr(ord(char)-size+26)
            else:
                new_str+=chr(ord(char)-size)
        elif char.islower():
            if ord(char)-size<ord('a'):
                new_str+=chr(ord(char)-size+26)
            else:
                new_str+=chr(ord(char)-size)
    return new_str
res =func('baNaNa', 7)
print(res)
# output utGtGt
Here's a simple solution that makes use of the % modulo operator to shift letters backwards.
It basically collects all of the letters in a reverse-lookup dictionary, so looking up a letter's position is O(1) instead of the linear O(N) scan done by str.index().
Then it goes through each letter and calculates the shifted index from the letter's index, e.g. for the letter a with a shift value of 7, the calculation is (0 - 7) % 26, which gives 19, the position of t.
Then once you have this shift value, convert it to uppercase or lowercase depending on the case of the original letter.
At the end we just str.join() the result list into one string. This is more efficient than doing += to join strings.
Demo:
from string import ascii_lowercase
def letter_backwards_shift(word, shift):
    letter_lookups = {letter: idx for idx, letter in enumerate(ascii_lowercase)}
    alphabet = list(letter_lookups)
    result = []
    for letter in word:
        idx = letter_lookups[letter.lower()]
        shifted_letter = alphabet[(idx - shift) % len(alphabet)]
        if letter.isupper():
            result.append(shifted_letter.upper())
        else:
            result.append(shifted_letter.lower())
    return ''.join(result)
Output:
>>> letter_backwards_shift('baNaNa', 7)
utGtGt
I would probably go with Patrick Artner's pythonic solution. I just showed the above implementation as a learning exercise :-).
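The wrap-around in letter_backwards_shift relies on Python's % returning a result with the same sign as the divisor, so a negative index difference lands back inside the alphabet. A quick check of the arithmetic for 'a' and 'b' shifted back by 7:

```python
from string import ascii_lowercase

# (idx - shift) % 26 is never negative in Python, even when idx < shift
print((0 - 7) % 26)                   # 19
print(ascii_lowercase[(0 - 7) % 26])  # t  ('a' shifted back by 7)
print(ascii_lowercase[(1 - 7) % 26])  # u  ('b' shifted back by 7)
```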

Longest Common Prefix with Python

I am trying to figure out an easy LeetCode question and I do not know why my answer does not work.
Problem:
Write a function to find the longest common prefix string amongst an array of strings.
If there is no common prefix, return an empty string "".
My Code:
shortest=min(strs,key=len)
strs.remove(shortest)
common=shortest
for i in range(1,len(shortest)):
    comparisons=[common in str for str in strs]
    if all(comparisons):
        print(common)
        break
    else:
        common=common[:-i]
The above attempt does not work when the strings in the list all have the same length, but it works for other cases.
Thank you very much.
Friend, try to make it as 'pythonic' as possible, just like you would do it in real life.
In real life, what do you see? You see words, and maybe you look for the shortest word and compare it to all the others. Okay, let's do that: let's find the longest word and then the shortest.
First we create an empty string, where the characters that are the same in both strings will be stored:
prefix = ''
# 'key=len' makes max() and min() compare the strings by length; without it they compare them alphabetically (lexicographically)
max_sentense = max(strings, key=len)
min_sentense = min(strings, key=len)
Okay, now what would we do in real life?
Loop through both, one character at a time, from the beginning. Is that possible in Python? Yes, with zip():
for i, o in zip(max_sentense, min_sentense):
The 'i' will go through the longest string and the 'o' will go through the shortest string.
OK, now it's easy: we just have to stop going through them when 'i' and 'o' are not the same character.
for i, o in zip(max_sentense, min_sentense):
    if i == o:
        prefix += i
    else:
        break
Full code:
prefix = ''
max_sentense = max(strings, key=len)
min_sentense = min(strings, key=len)
for i, o in zip(max_sentense, min_sentense):
    if i == o:
        prefix += i
    else:
        break
print(prefix)
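A caveat on comparing only the shortest and longest strings by length: a third string can diverge earlier. For strings = ["flower", "flow", "xlight"], the length-based code above prints "flow", even though "xlight" shares no prefix with the others. Comparing the lexicographically smallest and largest strings (plain min() and max(), without key=len) is sufficient, because every other string sorts between those two, so any prefix they share is shared by all. A minimal sketch (longest_common_prefix is a hypothetical name):

```python
def longest_common_prefix(strings):
    if not strings:
        return ""
    # Lexicographic min/max (note: no key=len). Every other string sorts
    # between these two, so any prefix they share is shared by all of them.
    lo, hi = min(strings), max(strings)
    prefix = []
    for a, b in zip(lo, hi):
        if a != b:
            break
        prefix.append(a)
    return "".join(prefix)

print(repr(longest_common_prefix(["flower", "flow", "xlight"])))  # ''
print(longest_common_prefix(["flower", "flow", "flight"]))        # fl
```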
It's quickest to compare the first characters of all the words, and then the second characters, etc. Otherwise you're doing unnecessary comparisons.
def longestCommonPrefix(self, strs):
    prefix = ''
    for char in zip(*strs):
        if len(set(char)) == 1:
            prefix += char[0]
        else:
            break
    return prefix
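To make the zip(*strs) step concrete: unpacking the list into zip() groups the i-th characters of every string into one tuple, stopping at the end of the shortest string, so a tuple with a single distinct character means that position belongs to the prefix.

```python
strs = ["flower", "flow", "flight"]
# zip(*strs) is zip("flower", "flow", "flight"): one tuple per character
# position, stopping when the shortest string runs out
columns = list(zip(*strs))
print(columns)
# [('f', 'f', 'f'), ('l', 'l', 'l'), ('o', 'o', 'i'), ('w', 'w', 'g')]
# A column with a single distinct character belongs to the common prefix
print([len(set(col)) == 1 for col in columns])  # [True, True, False, False]
```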
You can do this fairly efficiently in a single iteration over the list. I've made this a little verbose so that it's easier to understand.
import itertools
def get_longest_common_prefix(strs):
    if not strs:
        return ""
    longest_common_prefix = strs[0]  # index instead of pop() so the caller's list isn't modified
    for string in strs[1:]:
        pairs = zip(longest_common_prefix, string)
        longest_common_prefix_pairs = itertools.takewhile(lambda pair: pair[0] == pair[1], pairs)
        longest_common_prefix = (x[0] for x in longest_common_prefix_pairs)
    return ''.join(longest_common_prefix)
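In your code you cross-check with the shortest string, which is only one of several candidates if multiple strings share the shortest length. Furthermore, the shortest string might not have the longest common prefix.
This is not very clean code, and note that it finds the longest prefix shared by any pair of strings, which can be longer than the prefix common to all of them (for ["flower", "flow", "flight"] it gives "flow", not "fl"):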
common, max_cnt = "", 0
for i, s1 in enumerate(strs[:-1]):  # [:-1], not [:-2], so the second-to-last string is compared too
    for s2 in strs[i+1:]:
        for j in range(1, min(len(s1), len(s2))+1):
            if s1[:j] == s2[:j]:
                if j > max_cnt:
                    max_cnt = j
                    common = s1[:j]
This function takes any number of positional arguments.
If no argument is given, it returns "".
If just one argument is given, it is returned.
from itertools import zip_longest
def common_prefix(*strings) -> str:
    length = len(strings)
    if not length:
        return ""
    if length == 1:
        return strings[0]
    # compare the lexicographically smallest and largest strings (no key=len):
    # every other string sorts between these two, so whatever prefix they
    # share is shared by all of the strings
    smallest = min(strings)
    largest = max(strings)
    # we use zip_longest instead of zip because `smallest` might be a prefix
    # of `largest`; that is, the longest common prefix might be `smallest`
    # itself
    for i, chars in enumerate(zip_longest(smallest, largest)):
        if chars[0] != chars[1]:
            return smallest[:i]
    # if it didn't return by now, `smallest` and `largest` are identical,
    # so the whole string is the common prefix
    return smallest
if __name__ == "__main__":
    for args in [
        ("amigo", "amiga", "amizade"),
        tuple(),
        ("teste",),
        ("amigo", "amiga", "amizade", "atm"),
    ]:
        print(*args, sep=", ", end=": ")
        print(common_prefix(*args))
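Simple Python code: after sorting, only the first and last strings need to be compared, because every other string sorts between them: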
def longestCommonPrefix(self, arr):
    if not arr:
        return ""
    arr.sort(reverse = False)
    n= len(arr)
    str1 = arr[0]
    str2 = arr[n-1]
    n1 = len(str1)
    n2 = len(str2)
    result = ""
    j = 0
    i = 0
    while(i <= n1 - 1 and j <= n2 - 1):
        if (str1[i] != str2[j]):
            break
        result += (str1[i])
        i += 1
        j += 1
    return (result)
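For reference, the standard library already provides os.path.commonprefix(), which computes this character-by-character prefix for a list of strings. Despite living in os.path, it does not respect path boundaries, as the last line shows.

```python
import os.path

words = ["flower", "flow", "flight"]
print(os.path.commonprefix(words))                       # fl
print(repr(os.path.commonprefix([])))                    # ''
# Works on characters, not path components
print(os.path.commonprefix(["/usr/lib", "/usr/local"]))  # /usr/l
```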

How to find the largest repeating substring given character in Python?

Given some string, say 'aabaaab', how would I go about finding the longest substring made up only of 'a'? It should return 'aaa'. Any help would be greatly appreciated.
def sub_string(s):
    best_run = 0
    current_run = 0
    for char in s:
        if char == 'a':
            current_run += 1
        else:
            current_letter = char
    return(best_run)
I have something like the one above. Not sure where I can fix it up.
Not the most efficient, but a straightforward solution:
word = "aasfgaaassaasdsddaaaaaafff"
substr_count = 0
substr_counts = []
character = "f"
for i, letter in enumerate(word):
    if (letter == character):
        substr_count += 1
    else:
        substr_counts.append(substr_count)
        substr_count = 0
    if (i == len(word) - 1):
        substr_counts.append(substr_count)
print(max(substr_counts))
If you want a short method using standard Python tools (and want to avoid writing loops to reconstruct the string as you iterate), you can use regex to split the string on any non-a character, then get the max() according to len:
import re
test_string = 'aabaaab'
split_string_list = re.split( '[^a]', test_string )
longest_string_subset = max( split_string_list, key=len )
print( longest_string_subset )
The re library is for regex, and '[^a]' is a regex pattern for any non-a character. Basically, 'aabaaab' is split into a list at every match of the pattern, so that it becomes ['aa', 'aaa', '']. Then max() looks for the longest string based on len (i.e. length).
You can read more about functions like re.split() in the docs: https://docs.python.org/2/library/re.html
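Another option, sketched here with itertools.groupby (not used in the answers above): group the string into runs of identical consecutive characters, keep the runs of the target character, and take the longest. longest_run is a hypothetical name, and max(..., default="") covers strings that do not contain the character at all.

```python
from itertools import groupby

def longest_run(s, target):
    # Each group is one stretch of identical consecutive characters
    runs = ("".join(group) for char, group in groupby(s) if char == target)
    # default="" handles strings that never contain the target character
    return max(runs, key=len, default="")

print(longest_run("aabaaab", "a"))                     # aaa
print(longest_run("aasfgaaassaasdsddaaaaaafff", "f"))  # fff
```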

Most common character in a string

Write a function that takes a string consisting of alphabetic
characters as input argument and returns the most common character.
Ignore white spaces, i.e. do not count any white space as a character.
Note that capitalization does not matter here, i.e. a lower case
character is equal to an upper case character. In case of a tie between
certain characters, return the last character that has the most count.
This is the updated code
def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    print(new_string)
    length = len(new_string)
    print(length)
    count = 1
    j = 0
    higher_count = 0
    return_character = ""
    for i in range(0, len(new_string)):
        character = new_string[i]
        while (length - 1):
            if (character == new_string[j + 1]):
                count += 1
            j += 1
            length -= 1
        if (higher_count < count):
            higher_count = count
    return (character)
#Main Program
input_str = input("Enter a string: ")
result = most_common_character(input_str)
print(result)
The above is my code. I am getting a "string index out of range" error and I can't understand why. Also, the code only checks the occurrences of the first character, and I am confused about how to proceed to the next character and take the maximum count.
The error I get when I run my code:
> Your answer is NOT CORRECT Your code was tested with different inputs.
> For example when your function is called as shown below:
>
> most_common_character ('The cosmos is infinite')
>
> ############# Your function returns ############# e The returned variable type is: type 'str'
>
> ######### Correct return value should be ######## i The returned variable type is: type 'str'
>
> ####### Output of student print statements ###### thecosmosisinfinite 19
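You can use a regex pattern to search for all characters. \w matches any alphanumeric character and the underscore; for ASCII text this is equivalent to the set [a-zA-Z0-9_] (in Python 3, \w also matches other Unicode letters and digits unless the re.ASCII flag is used). The + after [\w] means to match one or more repetitions.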
Finally, you use Counter to total them and most_common(1) to get the top value. See below for the case of a tie.
from collections import Counter
import re
s = "Write a function that takes a string consisting of alphabetic characters as input argument and returns the most common character. Ignore white spaces i.e. Do not count any white space as a character. Note that capitalization does not matter here i.e. that a lower case character is equal to a upper case character. In case of a tie between certain characters return the last character that has the most count"
>>> Counter(c.lower() for c in re.findall(r"\w", s)).most_common(1)
[('t', 46)]
In the case of a tie, it is a little more tricky.
def top_character(some_string):
    # one entry per character: r"\w" rather than r"\w+", which would return whole words
    joined_characters = re.findall(r"\w", some_string.lower())
    d = Counter(joined_characters)
    top_count = max(d.values())
    top_characters = [c for c, n in d.most_common() if n == top_count]
    if len(top_characters) == 1:
        return top_characters[0]
    reversed_characters = joined_characters[::-1]
    for c in reversed_characters:
        if c in top_characters:
            return c
>>> top_character(s)
't'
>>> top_character('the the')
'e'
In the case of your code above and your sentence "The cosmos is infinite", you can see that 'i' occurs more frequently than 'e' (the output of your function):
>>> Counter(c.lower() for c in "".join(re.findall(r"[\w]+", 'The cosmos is infinite'))).most_common(3)
[('i', 4), ('s', 3), ('e', 2)]
You can see the issue in your code block:
for i in range(0, len(new_string)):
    character = new_string[i]
    ...
return (character)
You are iterating through the string and assigning each letter to the variable character, which is never reassigned anywhere else. So the function always returns the last character in your string.
Actually, your code is almost correct. You need to move count, j and length inside your for i in range(0, len(new_string)) loop, because they must start over on each iteration. Also, if count is greater than higher_count, you need to save that character as return_character and return it, instead of returning character, which is always the last character of your string because of character = new_string[i].
I don't see why you used j+1 and while length-1. After correcting those, the code now covers ties as well: because the comparison is higher_count <= count, a later character with the same count replaces an earlier one.
def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    higher_count = 0
    return_character = ""
    for i in range(0, len(new_string)):
        count = 0
        length = len(new_string)
        j = 0
        character = new_string[i]
        while length > 0:
            if (character == new_string[j]):
                count += 1
            j += 1
            length -= 1
        if (higher_count <= count):
            higher_count = count
            return_character = character
    return (return_character)
If we ignore the tie requirement, collections.Counter() works:
from collections import Counter
from itertools import chain
def most_common_character(input_str):
    return Counter(chain(*input_str.casefold().split())).most_common(1)[0][0]
Example:
>>> most_common_character('The cosmos is infinite')
'i'
>>> most_common_character('ab' * 3)
'a'
To return the last character that has the most count, we could use collections.OrderedDict:
from collections import Counter, OrderedDict
from itertools import chain
from operator import itemgetter
class OrderedCounter(Counter, OrderedDict):
    pass
def most_common_character(input_str):
    counter = OrderedCounter(chain(*input_str.casefold().split()))
    return max(reversed(counter.items()), key=itemgetter(1))[0]
Example:
>>> most_common_character('ab' * 3)
'b'
Note: this solution assumes that max() returns the first character that has the most count (and therefore there is a reversed() call, to get the last one) and all characters are single Unicode codepoints. In general, you might want to use \X regular expression (supported by regex module), to extract "user-perceived characters" (eXtended grapheme cluster) from the Unicode string.
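A sketch of the tie rule under the same interpretation as the corrected loop with higher_count <= count (among the characters with the highest count, the one occurring latest in the string wins): count once with Counter, then scan the string backwards, since max() returns the first maximal item it encounters. The function name mirrors the question's; note that the OrderedCounter version instead orders ties by first appearance, so the two can differ (e.g. for 'abba').

```python
from collections import Counter

def most_common_character(input_str):
    # Case-insensitive, and whitespace is removed before counting
    chars = "".join(input_str.lower().split())
    counts = Counter(chars)
    # Scan from the end of the string: max() returns the first maximal item
    # it encounters, i.e. the top-count character that occurs latest
    return max(reversed(chars), key=counts.__getitem__, default="")

print(most_common_character("The cosmos is infinite"))  # i
print(most_common_character("ab" * 3))                  # b
print(most_common_character("abba"))                    # a
```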
