First recurring character problem in Python

First recurring character problem in Python - python

I'm trying to solve a problem that I have with a recurring character problem.
I'm a beginner in development so I'm trying to think of ways I can do this.
thisWord = input()
def firstChar(thisWord):
for i in range(len(thisWord)):
for j in range(i+1, len(thisWord)):
if thisWord[i] == thisWord[j]:
return thisWord[i]
print(firstChar(thisWord))
This is what I came up with. In plenty of use cases, the result is fine. The problem I found after some fiddling around is that with a word like "statistics", where the "t" is the first recurring letter rather than the "s" because of the distance between the letters, my code counts the "s" first and returns that as the result.
I've tried weird solutions like measuring the entire string first for each possible case, creating variables for string length, and then comparing it to another variable, but I'm just ending up with more errors than I can handle.
Thank you in advance.

So you want to find the first letter that recurs in your text, with "first" being determined by the recurrence, not the first occurrence of the letter? To illustrate that with your "statistics" example, the t is the first letter that recurs, but the s had its first occurrence before the first occurrence of the t. I understand that in such cases, it's the t you want, not the s.
If that's the case, then I think a set is what you want, since it allows you to keep track of letters you've already seen before:
thisword = "statistics"
set_of_letters = set()
for letter in thisword:
if letter not in set_of_letters:
set_of_letters.add(letter)
else:
firstchar = letter
break
print(firstchar)

Whenever you're looking at a certain character in the word, you should not check whether the character will occur again at all, but whether it has already occurred. The algorithmically optimal way would be to use a set to store and look up characters as you go, but it could just as well be done with your double loop. The second one should then become for j in range(i).

This is not an answer to your problem (one was already provided), but an advice for a better solution:
def firstChar(thisWord):
occurrences: dict[str, int] = {char: 0 for char in thisWord} # At the beginning all the characters occurred once
for char in thisWord:
occurrences[char] += 1 # You found this char
if (occurrences[char] == 2): # This was already found one time before
return char # So you return it as the first duplicate
This works as expected:
>>> firstChar("statistics")
't'
EDIT:
occurrences: dict[str, int] = {char: 0 for char in thisWord}
This line of code creates a dictionary with the chars from thisWord as keys and 0 as values, so that you can use it to count the occurrences starting from 0 (before finding a char its count is 0).

Related

Why does it recognize the second capital T as 0?

I'm trying to make a short program that will find all the capital letters in a single string. I got it to work for the first two capital letters but it won't return the correct position of the last capital letter. What did I do wrong?
def capital_indexes(n):
listOfUpperPlaces = []
for x in n:
print(x)
if x.isupper():
characterPlace = n.index(x)
print(characterPlace)
listOfUpperPlaces.append(characterPlace)
return listOfUpperPlaces
print(capital_indexes("TEsTo"))

That is because n.index(x) returns the first occurrence of x in the string n. Because "T" occurs multiple times, n.index(x) returns the first occurrence of "T"
You want to iterate through range(len(n), like
def capital_indexes(n):
listOfUpperPlaces = []
for x in range(len(n)):
print(n[x])
if n[x].isupper():
print(x)
listOfUpperPlaces.append(x)
return listOfUpperPlaces
print(capital_indexes("TEsTo"))

The issue is the call to n.index(x)
This is searching the string to find x, and its able to find a capital T right at the beginning of the string.
A better way to do this would be to use enumerate, which gives you both the index and the item at the same time.
Can't code very well from a phone, but something like:
for index, character in enumerate(n):
if character.isUpper():
list_of_upper_places.append(index)
This will handle duplicates correctly, and will also be faster, since you don't need to search through the string just to count which character you are currently checking. It will be easier to read for most python programmers too.

Understanding this solution to the longest substring without repetition problem in leetcode

So I'm trying to understand this solution for this problem. The goal here is to get the length of the longest substring from a string without having repeated characters.
How I understand it is that it goes character by character. Using the current index, it will subtract from the start position which is 0 because the the index initially starts 0. The addition of 1 is to compensate from starting at index 0.
If it encounters a duplicate character, it will shift the start position until no duplicates are found, this essentially separates the previous characters into a substring and starts at the position of the new substring with the duplicates, e.g. abcab => "abc" and "ab". It will continue until the length longest substring with no duplicates is found.
The code for the solution is as seen below:
class Solution(object):
def lengthOfLongestSubstring(self, s):
"""
:type s: str
:rtype: int
"""
used = {}
max_length = start = 0
for i,c in enumerate(s):
if c in used and start<=used[c]:
start = used[c]+1
else:
max_length = max(max_length,i-start+1)
used[c] = i
return max_length
What I don't understand is the start<=used[c] and used[c] = i part of this solution, what does it do? Can someone clarify with me?
EDIT: I understand that the dictionary is being used to keep track of the character count. I just don't understand the logic of it. Sorry, I should've clarified.
Thank you for reading.

If I understand correctly, your goal is to form a longest sub-string without repeating characters.
Algorithm Psuedocode:
Start with an empty string, with start as begin index of string. You want to extend the string until you get a duplicate character.
There are 2 possibilities for each character, either a character has been seen for the first time or character is already seen before. After each character we update bookmark used to keep track of last seen index.
a) If the character is not seen before, you can safely extend the current string.
Or
b) If the character was seen before, then we can only extend the string if it is not part of current string (start > used[c]). If it is part of the string ( start <= used[c]), you will need to update the sub-string's begin index start with index next to the last seen of current character as we don't want the characters to repeat, i.e. start = used[c] + 1. Since you are shortening the string in the latter case, maximal string won't be ending at this position.

How to change a single letter in input string

I'm newbie in Python so that I have a question. I want to change letter in word if the first letter appears more than once. Moreover I want to use input to get the word from user. I'll present the problem using an example:
word = 'restart'
After changes the word should be like this:
word = 'resta$t'
I was trying couple of ideas but always I got stuck. Is there any simple sollutions for this?
Thanks in advance.
EDIT: In response to Simas Joneliunas
It's not my homework. I'm just finished reading some basic Python tutorials and I found some questions that I couldn't solve on my own. My first thought was to separate word into a single letters and then to find out the place of the letter I want to replace by "$". I have wrote that code but I couldn't came up with sollution how to get to specific place and replace it.
word = 'restart'
how_many = {}
for x in word:
how_many=+1
else:
how_many=1
for y in how_many:
if how_many[y] > 0:
print(y,how_many[y])

Using str.replace:
s = "restart"
new_s = s[0] + s[1:].replace(s[0], "$")
Output:
'resta$t'

Try:
"".join([["$" if ch in word[:i] else ch for i, ch in enumerate(word)])
enumerate iterates through the string (i.e. a list of characters) and keeps a running index of the iteration
word[:i] checks the list of chars until the current index, i.e. previously appeared characters
"$" if ch in word[:i] else ch means replace the character at existing position with $ if it appears before others keep the character
"".join() joins the list of characters into a single string.

This is where the python console is handy and lets you experiment. Since you have to keep track of number of letters, for a good visual I would list the alphabet in a list. Then in the loop remove from the list the current letter. If letter does not exist in the list replace the letter with $.
So check if it exists first thing in the loop, if it exists, remove it, if it doesn’t exist replace it from example above.

Most Frequent Character - User Submitted String without Dictionaries or Counters

Currently, I am in the midst of writing a program that calculates all of the non white space characters in a user submitted string and then returns the most frequently used character. I cannot use collections, a counter, or the dictionary. Here is what I want to do:
Split the string so that white space is removed. Then count each character and return a value. I would have something to post here but everything I have attempted thus far has been met with critical failure. The closest I came was this program here:
strin=input('Enter a string: ')
fc=[]
nfc=0
for ch in strin:
i=0
j=0
while i<len(strin):
if ch.lower()==strin[i].lower():
j+=1
i+=1
if j>nfc and ch!=' ':
nfc=j
fc=ch
print('The most frequent character in string is: ', fc )
If you can fix this code or tell me a better way of doing it that meets the required criteria that would be helpful. And, before you say this has been done a hundred times on this forum please note I created an account specifically to ask this question. Yes there are a ton of questions like this but some that are reading from a text file or an existing string within the program. And an overwhelmingly large amount of these contain either a dictionary, counter, or collection which I cannot presently use in this chapter.

Just do it "the old way". Create a list (okay it's a collection, but a very basic one so shouldn't be a problem) of 26 zeroes and increase according to position. Compute max index at the same time.
strin="lazy cat dog whatever"
l=[0]*26
maxindex=-1
maxvalue=0
for c in strin.lower():
pos = ord(c)-ord('a')
if 0<=pos<=25:
l[pos]+=1
if l[pos]>maxvalue:
maxindex=pos
maxvalue = l[pos]
print("max count {} for letter {}".format(maxvalue,chr(maxindex+ord('a'))))
result:
max count 3 for letter a

As an alternative to Jean's solution (not using a list that allows for one-pass over the string), you could just use str.count here which does pretty much what you're trying to do:
strin = input("Enter a string: ").strip()
maxcount = float('-inf')
maxchar = ''
for char in strin:
c = strin.count(char) if not char.isspace() else 0
if c > maxcount:
maxcount = c
maxchar = char
print("Char {}, Count {}".format(maxchar, maxcount))
If lists are available, I'd use Jean's solution. He doesn't use a O(N) function N times :-)
P.s: you could compact this with one line if you use max:
max(((strin.count(i), i) for i in strin if not i.isspace()))

To keep track of several counts for different characters, you have to use a collection (even if it is a global namespace implemented as a dictionary in Python).
To print the most frequent non-space character while supporting arbitrary Unicode strings:
import sys
text = input("Enter a string (case is ignored)").casefold() # default caseless matching
# count non-space character frequencies
counter = [0] * (sys.maxunicode + 1)
for nonspace in map(ord, ''.join(text.split())):
counter[nonspace] += 1
# find the most common character
print(chr(max(range(len(counter)), key=counter.__getitem__)))
A similar list in Cython was the fastest way to find frequency of each character.

python: dictionary of words and wordforms

I have the following problem: I created a dictionary (german) with words and their corresponding lemma. exemple:
"Lagerbestände", "Lager-bestand"; "Wohnhäuser", "Wohn-haus"; "Bahnhof", "Bahn-hof"
I now have a text and I want to check for all word their lemmata. It can happen that it appears a word which is not in the dict, such as "Restbestände". But the lemma of "bestände", we already know it. So I want to take the first part of the word which is unknown in dicti and add this to the lemmatized second part and print this out (or return it).
Example: "Restbestände" --> "Rest-bestand". ("bestand" is taken from the lemma of "Lagerbestände")
I coded the following:
for limit in range(1, len(Word)):
for k, v in dicti.iteritems():
if re.search('[\w]*'+Word[limit:], k, re.IGNORECASE) != None:
if '-' in v:
tmp = v.find('-')
end = v[tmp:]
end = re.sub(ur'[-]',"", end)
Word = Word[:limit] + '-' + end `
But I got 2 problems:
At the end of the words, it is printed out every time "&#10". How can I avoid this?
The second part of the word is sometimes not correct - there must be a logical error.
However; how would you solve this?

At the end of the words, it is printed out every time "&#10". How can
I avoid this?
In must use UNICODE everywhere in your script. Everywhere, everywhere, everywhere.
Also, python RegEx functions accept flag re.UNICODE that you should always set. German letters are out of ASCII set, so RegEx can be sometimes confused, for instance when matching r'\w'

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

First recurring character problem in Python - python

Related

Why does it recognize the second capital T as 0?

Understanding this solution to the longest substring without repetition problem in leetcode

How to change a single letter in input string

Most Frequent Character - User Submitted String without Dictionaries or Counters

python: dictionary of words and wordforms

Categories

Resources