Why does it recognize the second capital T as 0? - python

I'm trying to make a short program that will find all the capital letters in a single string. I got it to work for the first two capital letters but it won't return the correct position of the last capital letter. What did I do wrong?
def capital_indexes(n):
    listOfUpperPlaces = []
    for x in n:
        print(x)
        if x.isupper():
            characterPlace = n.index(x)
            print(characterPlace)
            listOfUpperPlaces.append(characterPlace)
    return listOfUpperPlaces
print(capital_indexes("TEsTo"))

That is because n.index(x) returns the index of the first occurrence of x in the string n. Because "T" occurs twice, n.index("T") always returns 0, the position of the first "T", even when the loop is looking at the second one.
You want to iterate through range(len(n)) instead, like this:
def capital_indexes(n):
    listOfUpperPlaces = []
    for x in range(len(n)):
        print(n[x])
        if n[x].isupper():
            print(x)
            listOfUpperPlaces.append(x)
    return listOfUpperPlaces
print(capital_indexes("TEsTo"))
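A quick way to see the str.index behavior this answer describes, using the "TEsTo" string from the question. The optional start argument of str.index is what you would need to get past the first match:

```python
text = "TEsTo"

# index() scans from position 0 and stops at the first match,
# so both capital T's report the same position.
first_t = text.index("T")

# The optional start argument begins the search later in the string,
# which is how you would reach the second "T".
second_t = text.index("T", first_t + 1)

print(first_t, second_t)  # 0 3
```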

The issue is the call to n.index(x).
This searches the string for x, and it finds a capital T right at the beginning of the string.
A better way to do this would be to use enumerate, which gives you both the index and the item at the same time.
Can't code very well from a phone, but something like:
list_of_upper_places = []
for index, character in enumerate(n):
    if character.isupper():  # the str method is isupper(), not isUpper()
        list_of_upper_places.append(index)
This handles duplicates correctly and is also faster, since you don't need to search through the string just to find the position of the character you are currently checking. It will also be easier to read for most Python programmers.
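The enumerate approach from this answer can also be written as a single list comprehension. A minimal sketch, keeping the function name from the question:

```python
def capital_indexes(text):
    # enumerate yields (position, character) pairs, so no searching is needed
    return [index for index, character in enumerate(text) if character.isupper()]


print(capital_indexes("TEsTo"))  # [0, 1, 3]
```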

Related

First recurring character problem in Python

I'm trying to solve the "first recurring character" problem.
I'm a beginner in development, so I'm trying to think of ways I can do this.
thisWord = input()
def firstChar(thisWord):
    for i in range(len(thisWord)):
        for j in range(i+1, len(thisWord)):
            if thisWord[i] == thisWord[j]:
                return thisWord[i]
print(firstChar(thisWord))
This is what I came up with, and in plenty of cases the result is fine. The problem I found after some fiddling around is with a word like "statistics": the "t" is the first letter to recur, but because the outer loop reaches the first "s" before the first "t", my code finds the "s" pair first and returns that as the result.
I've tried weird solutions like measuring the entire string first for each possible case, creating variables for string length, and then comparing it to another variable, but I'm just ending up with more errors than I can handle.
Thank you in advance.
So you want to find the first letter that recurs in your text, with "first" being determined by the recurrence, not the first occurrence of the letter? To illustrate that with your "statistics" example, the t is the first letter that recurs, but the s had its first occurrence before the first occurrence of the t. I understand that in such cases, it's the t you want, not the s.
If that's the case, then I think a set is what you want, since it allows you to keep track of letters you've already seen before:
thisword = "statistics"
set_of_letters = set()
firstchar = None  # stays None if no letter repeats
for letter in thisword:
    if letter not in set_of_letters:
        set_of_letters.add(letter)
    else:
        firstchar = letter
        break
print(firstchar)
When you are looking at a certain character in the word, you should not check whether the character will occur again later, but whether it has already occurred. The algorithmically optimal way is to use a set to store and look up characters as you go, but it can just as well be done with your double loop: the inner loop should then become for j in range(i).
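A sketch of the double-loop variant this answer suggests, with the inner loop changed to range(i) so it looks backwards at characters already seen. The name first_recurring_char is illustrative, and returning None when nothing repeats is an assumption:

```python
def first_recurring_char(word):
    for i in range(len(word)):
        # Only compare against characters *before* position i,
        # i.e. ask "has this character already occurred?"
        for j in range(i):
            if word[i] == word[j]:
                return word[i]
    return None  # no character repeats


print(first_recurring_char("statistics"))  # t
```

This is quadratic in the length of the word, whereas the set version makes a single pass.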
This is not an answer to your problem (one was already provided), but a suggestion for a better solution:
def firstChar(thisWord):
    occurrences: dict[str, int] = {char: 0 for char in thisWord}  # At the beginning, every character has been seen 0 times
    for char in thisWord:
        occurrences[char] += 1  # You found this char
        if occurrences[char] == 2:  # This char was already found once before
            return char  # So you return it as the first duplicate
    return None  # No character repeats
This works as expected:
>>> firstChar("statistics")
't'
EDIT:
occurrences: dict[str, int] = {char: 0 for char in thisWord}
This line of code creates a dictionary with the chars from thisWord as keys and 0 as values, so that you can use it to count the occurrences starting from 0 (before finding a char its count is 0).
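If you would rather not pre-fill the dictionary with zeros, collections.defaultdict(int) creates a missing key with the value 0 on first access. This is a swapped-in alternative, not part of the original answer, and first_char is a hypothetical name:

```python
from collections import defaultdict


def first_char(word):
    occurrences = defaultdict(int)  # any unseen key starts at 0
    for char in word:
        occurrences[char] += 1
        if occurrences[char] == 2:  # second sighting: first recurring char
            return char
    return None  # nothing repeats


print(first_char("statistics"))  # t
```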

How to change a single letter in input string

I'm new to Python, so I have a question. I want to change a letter in a word when the word's first letter appears again later in it. I also want to use input() to get the word from the user. I'll illustrate the problem with an example:
word = 'restart'
After changes the word should be like this:
word = 'resta$t'
I've tried a couple of ideas but always got stuck. Is there a simple solution for this?
Thanks in advance.
EDIT: In response to Simas Joneliunas
It's not my homework. I've just finished reading some basic Python tutorials and found some questions that I couldn't solve on my own. My first thought was to split the word into single letters and then find the position of the letter I want to replace with "$". I wrote the code below, but I couldn't come up with a way to get to a specific position and replace it.
word = 'restart'
how_many = {}
for x in word:
    how_many=+1
else:
    how_many=1
for y in how_many:
    if how_many[y] > 0:
        print(y,how_many[y])
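Assuming the loop above was meant to count how often each letter appears, a working version of that idea could put an if/else inside the loop. This is a guess at the intent, not the asker's code:

```python
word = "restart"

# Count how many times each letter occurs.
how_many = {}
for x in word:
    if x in how_many:
        how_many[x] += 1
    else:
        how_many[x] = 1

# Report only the letters that occur more than once.
for y in how_many:
    if how_many[y] > 1:
        print(y, how_many[y])
```

For "restart" this prints r 2 and then t 2; checking how_many[word[0]] > 1 tells you whether the first letter repeats.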
Using str.replace:
s = "restart"
new_s = s[0] + s[1:].replace(s[0], "$")
Output:
'resta$t'
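The str.replace answer wrapped in a small function. The empty-string guard and the mark parameter are illustrative additions, and since the question mentions input(), the word could come from input() before calling it:

```python
def replace_repeats_of_first(word, mark="$"):
    if not word:
        return word  # nothing to replace in an empty string
    first = word[0]
    # Keep the first letter; replace its later occurrences in the rest of the word.
    return first + word[1:].replace(first, mark)


print(replace_repeats_of_first("restart"))  # resta$t
# Interactive use, as the question asks: replace_repeats_of_first(input("Word: "))
```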
Try:
"".join(["$" if ch in word[:i] else ch for i, ch in enumerate(word)])
enumerate iterates through the string (a sequence of characters) and keeps a running index of the iteration.
word[:i] is the slice of characters before the current index, i.e. the characters that have already appeared.
"$" if ch in word[:i] else ch replaces the character at the current position with $ if it appeared earlier; otherwise it keeps the character.
"".join() joins the list of characters into a single string.
Note that this replaces every repeated character, not only repeats of the first letter, so "restart" becomes "resta$$" rather than "resta$t".
This is where the Python console is handy, because it lets you experiment. Since you have to keep track of which letters you have already seen, for a good visual I would put the alphabet in a list. Then, in the loop, remove the current letter from the list. If the letter no longer exists in the list, replace it with $.
So the first thing in the loop is to check whether the letter exists in the list: if it does, remove it; if it doesn't, replace it as in the example above.
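A minimal sketch of the alphabet-list idea described above. It assumes lowercase ASCII input (string.ascii_lowercase); any other character is never in the list and would therefore always be replaced. Like the join answer, it marks every repeated letter, so "restart" becomes "resta$$":

```python
import string


def mark_repeats(word, mark="$"):
    unseen = list(string.ascii_lowercase)  # letters not encountered yet
    result = []
    for ch in word:
        if ch in unseen:
            unseen.remove(ch)  # first sighting: cross the letter off
            result.append(ch)
        else:
            result.append(mark)  # already crossed off, so it is a repeat
    return "".join(result)


print(mark_repeats("restart"))  # resta$$
```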

How to use multiple 'if' statements nested inside an enumerator?

I have a massive string of letters all jumbled up, 1.2k lines long.
I'm trying to find a lowercase letter that has EXACTLY three capital letters on either side of it.
This is what I have so far:
def scramble(sentence):
    try:
        for i,v in enumerate(sentence):
            if v.islower():
                if sentence[i-4].islower() and sentence[i+4].islower():
                    ....
                    ....
    except IndexError:
        print() #Trying to deal with the problem of reaching the end of the list
#This section is checking if the fourth letters before
#and after i are lowercase to ensure the central lowercase letter has
#exactly three uppercase letters around it
But now I am stuck on the next step. What I would like to do is create a for loop over range(-3, 4) and check that each of these letters is uppercase. If there are in fact three uppercase letters on either side of the lowercase letter, then print them out.
For example:
for j in range(-3,4):
    if j != 0:
        #Some code to check if the letters in this range are uppercase
        #if j != 0 is there because we already know it is lowercase
        #because of the previous if v.islower(): statement.
If this doesn't make sense, this would be an example output if the code worked as expected:
scramble("abcdEFGhIJKlmnop")
OUTPUT
EFGhIJK
One lowercase letter with three uppercase letters either side of it.
Here is a way to do it "Pythonically" without regular expressions:
s = 'abcdEFGhIJKlmnop'
# range(len(s) - 6) so that the last 7-character window is also checked
words = [s[i:i+7] for i in range(len(s) - 6)
         if s[i:i+3].isupper() and s[i+3].islower() and s[i+4:i+7].isupper()]
print(words)
And the output is:
['EFGhIJK']
And here is a way to do it with regular expressions, which is, well, also Pythonic :-)
import re
words = re.findall(r'[A-Z]{3}[a-z][A-Z]{3}', s)
If you can't use regular expressions, maybe this for loop can do the trick:
if v.islower():
    if sentence[i-4].islower() and sentence[i+4].islower():
        for k in range(1,4):
            if sentence[i-k].islower() or sentence[i+k].islower():
                break
            if k == 3:
                return i
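A complete, runnable version of the loop-based idea, sketched under a few assumptions: it returns every matching 7-character window instead of a single index, treats the start and end of the string as valid boundaries, and checks bounds explicitly rather than relying on IndexError. Note that a negative index such as sentence[i-4] does not raise in Python; it silently wraps around to the end of the string.

```python
def scramble(sentence):
    """Return every lowercase letter flanked by exactly three uppercase letters on each side."""
    matches = []
    n = len(sentence)
    for i, v in enumerate(sentence):
        # Need a lowercase centre with room for three letters on each side.
        if not v.islower() or i < 3 or i + 3 >= n:
            continue
        # Three uppercase letters immediately left and right of the centre.
        if not all(sentence[i + k].isupper() for k in (-3, -2, -1, 1, 2, 3)):
            continue
        # "Exactly three": the letter beyond each run must not be uppercase,
        # or the run must touch the edge of the string.
        left_ok = i - 4 < 0 or not sentence[i - 4].isupper()
        right_ok = i + 4 >= n or not sentence[i + 4].isupper()
        if left_ok and right_ok:
            matches.append(sentence[i - 3:i + 4])
    return matches


print(scramble("abcdEFGhIJKlmnop"))  # ['EFGhIJK']
```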
Regex is probably the easiest. Using a modified version of @Israel Unterman's answer to account for the edges of the string and for non-uppercase surroundings, the full regex might be:
s = 'abcdEFGhIJKlmnopABCdEFGGIddFFansTBDgRRQ'
import re
words = re.findall(r'(?:^|[^A-Z])([A-Z]{3}[a-z][A-Z]{3})(?:[^A-Z]|$)', s)
# words is ['EFGhIJK', 'TBDgRRQ']
Using non-capturing (?:...) groups keeps the start-of-string or non-uppercase context out of the capture group, so re.findall returns only the desired tokens. This should account for all the conditions listed by the OP.
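An alternative to consuming the neighboring character is to use zero-width lookarounds, (?<![A-Z]) and (?![A-Z]). This is a swapped-in variant, not the answer's regex. Because the lookarounds don't consume the boundary character, a match that starts right after another match's trailing separator is still found:

```python
import re

# Three capitals, one lowercase letter, three capitals, with no capital
# immediately before or after. Lookarounds check the neighbors without
# consuming them.
PATTERN = re.compile(r"(?<![A-Z])[A-Z]{3}[a-z][A-Z]{3}(?![A-Z])")

s = "abcdEFGhIJKlmnopABCdEFGGIddFFansTBDgRRQ"
print(PATTERN.findall(s))  # ['EFGhIJK', 'TBDgRRQ']

# Here "x" separates two matches; the lookaround version finds both,
# while a pattern that consumes "x" finds only the first.
t = "ABCdEFGxIJKyLMN"
print(PATTERN.findall(t))  # ['ABCdEFG', 'IJKyLMN']
consuming = re.findall(r"(?:^|[^A-Z])([A-Z]{3}[a-z][A-Z]{3})(?:[^A-Z]|$)", t)
print(consuming)  # ['ABCdEFG']
```

Neither pattern returns overlapping windows (EFGxIJK also qualifies in the second string), because re.findall only reports non-overlapping matches.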
(removed all my prior code as it was generally *bad*)

Procedure in Python

Question
Write a procedure that takes a string of words separated by spaces (assume no punctuation or capitalization), together with a "target" word, and shows the position of the target word in the string of words. For example, if the string is:
'we dont need no education we dont need no thought control no we dont'
and the target is the word "dont", then your procedure should return the list [1, 6, 13] because "dont" appears at the 1st, 6th, and 13th position in the string. (We start counting positions of words in the string from 0.) Your procedure should return False if the target word doesn't appear in the string.
My solution:
def procedure(string,target):
    words=string.split(" ") #turn the string into a list of words
    solution=[] #list that will be displayed
    for i in range(len(words)):
        if words[i]==target: solution.append(i)
    if len(solution)==0: return False
    return solution
string="we dont need no education we dont need no thought control no we dont"
print procedure(string, "dont")
assert procedure(string, "dont")
Why won't this run in Python? The problem is on the line print procedure(string, "dont"): it reports invalid syntax. I am running it in IDLE.
The following is your code with the indentation fixed; compare it with what you posted and you should see why it now works.
Because the formatting of your post was lost, it is unclear to me exactly why your original code fails, but indentation controls how Python groups blocks of code, and the code will fail to run if the indentation is incorrect. I suspect your problem is that you had these lines in your code:
for i in range(len(words)):
    if words[i]==target: solution.append(i)
    if len(solution)==0: return False
The above returns False too early: unless the target is the very first word, solution is still empty at the end of the first iteration, so the function exits immediately. You should check the length of solution after the for loop has finished.
def procedure(string,target):
    words=string.split(" ") #turn the string into a list of words
    solution=[] #list that will be displayed
    for i in range(len(words)):
        if words[i]==target: solution.append(i)
    if len(solution)==0: return False
    return solution
string="we dont need no education we dont need no thought control no we dont"
print(procedure(string, "dont"))
assert(procedure(string, "dont"))
[1, 6, 13]
You can use a list comprehension for this:
def list_word_indexes(word, text):
    return [index for index, text_word in enumerate(text.split())
            if text_word == word]
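The assignment also asks for False when the target is missing, which the comprehension above doesn't do (it returns an empty list). A small sketch that combines both, relying on an empty list being falsy:

```python
def procedure(text, target):
    # Positions of every word equal to target, counting from 0.
    positions = [index for index, word in enumerate(text.split()) if word == target]
    # An empty list is falsy, so `or` turns "no matches" into False.
    return positions or False


sentence = "we dont need no education we dont need no thought control no we dont"
print(procedure(sentence, "dont"))  # [1, 6, 13]
print(procedure(sentence, "pudding"))  # False
```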
The problem is on print procedure(string, "dont") it mentions invalid syntax
This means you are using Python 3, where print is a function rather than a statement. You should add parentheses around the argument(s) to print, or make sure to use Python 2.
For example:
print(procedure(string, "dont"))

python: dictionary of words and wordforms

I have the following problem: I created a (German) dictionary of words and their corresponding lemmas. Example:
"Lagerbestände", "Lager-bestand"; "Wohnhäuser", "Wohn-haus"; "Bahnhof", "Bahn-hof"
I now have a text, and I want to look up the lemma of every word in it. It can happen that a word is not in the dictionary, such as "Restbestände", even though we already know the lemma of its second part, "bestände". So I want to take the first part of the unknown word, append the lemmatized second part, and print (or return) the result.
Example: "Restbestände" --> "Rest-bestand". ("bestand" is taken from the lemma of "Lagerbestände")
I coded the following:
for limit in range(1, len(Word)):
    for k, v in dicti.iteritems():
        if re.search('[\w]*'+Word[limit:], k, re.IGNORECASE) != None:
            if '-' in v:
                tmp = v.find('-')
                end = v[tmp:]
                end = re.sub(ur'[-]',"", end)
                Word = Word[:limit] + '-' + end
But I have two problems:
"&#10" is printed at the end of every word. How can I avoid this?
The second part of the word is sometimes wrong, so there must be a logical error.
Overall, how would you solve this?
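One possible approach, sketched in Python 3 under stated assumptions: each lemma is stored as "first-second" with a single hyphen, the first part is spelled the same at the start of the inflected word, and unknown words are matched against the longest known ending. build_suffix_table and lemmatize are hypothetical helper names, and the three-entry dictionary is just the example from the question:

```python
def build_suffix_table(lemmas):
    """Map each known inflected ending (e.g. "bestände") to its lemma part ("bestand")."""
    table = {}
    for word, lemma in lemmas.items():
        head, sep, tail = lemma.partition("-")
        # Skip entries without a hyphen, or whose first part isn't spelled
        # the same way at the start of the inflected word.
        if not sep or not word.startswith(head):
            continue
        ending = word[len(head):].lower()
        if ending:
            table[ending] = tail
    return table


def lemmatize(word, lemmas, table):
    if word in lemmas:
        return lemmas[word]  # known word: use its stored lemma directly
    lowered = word.lower()  # assumes lower() keeps the length (true for German letters)
    # Try longer endings first so the most specific known ending wins.
    for ending in sorted(table, key=len, reverse=True):
        if lowered.endswith(ending) and len(ending) < len(word):
            prefix = word[:len(word) - len(ending)]
            return prefix + "-" + table[ending]
    return None  # no known ending matched


LEMMAS = {
    "Lagerbestände": "Lager-bestand",
    "Wohnhäuser": "Wohn-haus",
    "Bahnhof": "Bahn-hof",
}
TABLE = build_suffix_table(LEMMAS)
print(lemmatize("Restbestände", LEMMAS, TABLE))  # Rest-bestand
```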
At the end of the words, it is printed out every time "&#10". How can I avoid this?
You must use Unicode everywhere in your script. Everywhere, everywhere, everywhere.
Also, Python regex functions accept the re.UNICODE flag, which you should always set. German letters are outside the ASCII set, so regexes can sometimes get confused, for instance when matching r'\w'.
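A small Python 3 check of the \w point. In Python 3, str patterns already use Unicode matching by default, so re.UNICODE is redundant there, and re.ASCII restores the ASCII-only behavior:

```python
import re

word = "Lagerbestände"

# Unicode-aware \w treats "ä" as a word character.
unicode_parts = re.findall(r"\w+", word, re.UNICODE)
print(unicode_parts)  # ['Lagerbestände']

# ASCII-only \w does not, so the word gets split at the umlaut.
ascii_parts = re.findall(r"\w+", word, re.ASCII)
print(ascii_parts)  # ['Lagerbest', 'nde']
```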
