How can I remove duplicate string with Python? - python

I am learning Python and have a problem to solve.
Consider the following example:
string1 = "Galaxy S S10 Lite"
string2 = "Galaxy Note Note 10 Plus"
How can I remove the duplicated part, i.e. one of the two "S" or one of the two "Note"?
The result should look like:
string1a = "Galaxy S10 Lite"
string2a = "Galaxy Note 10 Plus"
Only the duplicate should be removed, and the order of the words must not change!

string1a = string1.split()
del string1a[1]
string1a = " ".join(string1a)
This does what you want for the two strings provided.
It will only work for all of your strings if you know for sure that the second and third words are always the duplicates and that the third one is the one to keep.

The accepted answer only removes the second word in the sentence manually.
If you have a very long string, that will be very tedious to clean.
I assume there are only two cases:
Skip a single letter if it is the same as the first letter of the following word
Skip a word if it is the same as the following word
This function can clean them automatically:
def clean_string(string):
    """Remove a word or single letter that is repeated at the start of the next word."""
    following_word = ''
    complete_words = []
    # loop through the string in reverse to be able to skip the earlier word/letter
    # string.split() splits your string on whitespace and returns a list
    for word in reversed(string.split()):
        # to skip a duplicated letter, in your case skip "S" and retain "S10"
        # startswith() is safe even while following_word is still empty
        if len(word) == 1 and following_word.startswith(word):
            following_word = word
            continue
        # to skip a duplicated word, in your case skip "Note" and retain the latter "Note"
        elif word == following_word:
            following_word = word
            continue
        following_word = word
        complete_words.append(word)
    # join all appended words and re-reverse them into the expected sequence
    return ' '.join(reversed(complete_words))
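The two rules can also be applied in a single forward pass by comparing each word with the one after it. The following is only a sketch under the same two assumptions as the function above; the name dedupe_adjacent is made up for illustration, and it treats any single-character word as a possibly duplicated letter.

```python
def dedupe_adjacent(text):
    """Drop a word when the next word repeats it (hypothetical helper)."""
    words = text.split()
    kept = []
    # Pair every word with its successor; the last word is paired with "".
    for current, following in zip(words, words[1:] + [""]):
        is_dup_word = current == following
        is_dup_letter = len(current) == 1 and following.startswith(current)
        if not (is_dup_word or is_dup_letter):
            kept.append(current)
    return " ".join(kept)


print(dedupe_adjacent("Galaxy S S10 Lite"))         # Galaxy S10 Lite
print(dedupe_adjacent("Galaxy Note Note 10 Plus"))  # Galaxy Note 10 Plus
```

Because the word order is never reversed, the kept words come out in their original sequence without a second reversal.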

Related

Why my function doesn't return a new string?

My assignment was to write a function with one string parameter that returns a string. The function should extract the words within this string, drop empty words as well as words that are equal to "end" and "exit", convert the remaining words to upper case, join them with the joining token string ";" and return this newly joined string.
This is my function, but if the string doesn't contain the words "exit" or "end", no new string is returned:
def fun(long_string):
    stop_words = ('end', 'exit', ' ')
    new_line = ''
    for word in long_string:
        if word in stop_words:
            new_line = long_string.replace(stop_words, " ")
    result = ';'.join(new_line.upper())
    return result
print(fun("this is a long string"))
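The if condition never matches 'end' or 'exit', because word is not a real word: iterating over long_string yields one character at a time. So what the if really does here is compare 't' with 'end' and so on. The only entry a single character can ever equal is ' ', and in that case long_string.replace(stop_words, " ") fails, because str.replace expects a string rather than a tuple. Either way, new_line never receives a cleaned-up string.
You will need split to work with words: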
def fun(long_string):
    return ';'.join(word.upper() for word in long_string.split() if word not in ('end', 'exit'))
print(fun("this is a long string")) # THIS;IS;A;LONG;STRING
You don't need to check for empty words, because split() without arguments treats any run of whitespace as a single separator, so it never produces an empty word.
for word in long_string will iterate over each character in long_string, not each word. The next line then compares each character to the words in stop_words.
You probably want something like for word in long_string.split(' ') in order to iterate over the words.
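To make the difference concrete, here is a small sketch that contrasts iterating over characters with iterating over words, and then builds the joined string. The function name join_words and the sample sentences are only illustrative, and the comparison with "end" and "exit" is assumed to be case-sensitive.

```python
def join_words(long_string):
    """Drop 'end'/'exit', upper-case the remaining words, join them with ';'."""
    stop_words = ("end", "exit")
    kept = [word.upper() for word in long_string.split() if word not in stop_words]
    return ";".join(kept)


print(list("to end"))        # ['t', 'o', ' ', 'e', 'n', 'd']  (characters)
print("to  end".split())     # ['to', 'end']                   (words)
print("to  end".split(" "))  # ['to', '', 'end']               (note the empty word)
print(join_words("this is the end of a long string"))
# THIS;IS;THE;OF;A;LONG;STRING
```

Note that split(' ') keeps an empty string for every doubled space, which is why split() without arguments is the more convenient choice when empty words have to be dropped.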

find all words in a certain alphabet with multi character letters

I want to find out what words can be formed using the names of musical notes.
This question is very similar: Python code that will find words made out of specific letters. Any subset of the letters could be used
But my alphabet also contains "fis","cis" and so on.
letters = ["c","d","e","f","g","a","h","c","fis","cis","dis"]
I have a really long word list with one word per line and want to use
with open(...) as f:
    for line in f:
        if
to check if each word is part of that "language" and then save it to another file.
My problem is how to alter
>>> import re
>>> m = re.compile('^[abilrstu]+$')
>>> m.match('australia') is not None
True
>>> m.match('dummy') is not None
False
>>> m.match('australian') is not None
False
so it also matches with "fis","cis" and so on.
e.g. "fish" is a match but "ifsh" is not a match.
I believe ^(fis|cis|dis|[abcfhg])+$ will do the job. (Adjust the character class to your own single-letter notes; for the list in the question that would be [acdefgh].)
Some deconstruction of what's going on here:
| works like an OR conjunction
[...] denotes "any symbol from what's inside the brackets"
^ and $ stand for the beginning and end of the line, respectively
+ stands for "1 or more times"
( ... ) stands for grouping, needed to apply the +/*/{} modifiers. Without grouping, such modifiers apply to the closest expression on the left
Altogether this "reads" as "the whole string is one or more repetitions of fis/cis/dis or one of abcfhg"
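The same idea can be built directly from the list of note names instead of typing the pattern by hand. This is a sketch, not a definitive implementation: it assumes the note list from the question, and the name is_note_word is chosen only for illustration.

```python
import re

# Note names taken from the question; the duplicate "c" is removed by set().
notes = ["c", "d", "e", "f", "g", "a", "h", "c", "fis", "cis", "dis"]

# Longest names first, each escaped, joined with | inside a repeated group.
alternatives = sorted(set(notes), key=len, reverse=True)
pattern = re.compile("(?:" + "|".join(map(re.escape, alternatives)) + ")+")


def is_note_word(word):
    """Return True if the whole word can be spelled with note names."""
    # strip() removes the newline that lines read from a file carry.
    return pattern.fullmatch(word.strip().lower()) is not None


for candidate in ["fish", "ifsh", "disc", "disco"]:
    print(candidate, is_note_word(candidate))
```

re.fullmatch plays the role of the ^ and $ anchors, and the regex engine backtracks when needed, so a word is accepted only if some split into note names covers it completely.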
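This function works for simple cases and doesn't use any external libraries:
def func(word, letters):
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    return not word
It works because if word == "" at the end, then it has been decomposed into your letters. Keep in mind that removing a unit joins the text on either side of it, so this is a heuristic rather than an exact check.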
Update:
It seems that my explanation wasn't clear. WORD.replace(LETTER, "") replaces every occurrence of the note/LETTER in WORD with nothing. Here is an example:
func("banana", {'na'})
it will replace every 'na' in "banana" by nothing ('')
the result after this is "ba", which is not a note
not "" is True and not "ba" is False, because an empty string is falsy and a non-empty string is truthy.
here is another example :
func("banana", {'na', 'chicken', 'b', 'ba'})
it will replace every 'chicken' in "banana" by nothing ('')
the result after this is "banana"
it will replace every 'ba' in "banana" by nothing ('')
the result after this is "nana"
it will replace every 'na' in "nana" by nothing ('')
the result after this is ""
it will replace every 'b' in "" by nothing ('')
the result after this is ""
not "" is True ==> HURRAY IT IS A MELODY !
note: The reason for the sorted by length is because otherwise, the second example would not have worked. The result after deleting "b" would be "a", which can't be decomposed in notes.
You can count how many letters of the word are covered by the units (names of musical notes) it contains, and compare this number to the length of the word.
from collections import Counter
units = {"c","d","e","f","g","a","h", "fis","cis","dis"}
def func(word, units=units):
    letters_count = Counter()
    for unit in units:
        num_of_units = word.count(unit)
        letters_count[unit] += num_of_units * len(unit)
        if len(unit) == 1:
            continue
        # if the unit consists of more than 1 letter (e.g. dis)
        # check if these letters are also one-letter units
        # if yes, subtract the number of repeated letters
        for letter in unit:
            if letter in units:
                letters_count[letter] -= num_of_units
    return len(word) == sum(letters_count.values())
print(func('disc'))
print(func('disco'))
# True
# False
A solution with tkinter window opening to choose file:
import re
from tkinter import filedialog as fd
m = re.compile('^(fis|ges|gis|as|ais|cis|des|es|dis|[abcfhg])+$')
matches = list()
filename = fd.askopenfilename()
with open(filename) as f:
    for line in f:
        if m.match(str(line).lower()) is not None:
            matches.append(line[:-1])
print(matches)
This answer was posted as an edit to the question find all words in a certain alphabet with multi character letters by the OP Nivatius under CC BY-SA 4.0.

Python - string index out of range issue

This is the question I was given to solve:
Create a program that inputs a phrase (like a famous quotation) and prints all of the words that start with h-z.
I solved the problem, but the first two methods didn't work and I wanted to know why:
#1 string index out of range
quote = input("enter a 1 sentence quote, non-alpha separate words: ")
word = ""
for character in quote:
    if character.isalpha():
        word += character.upper()
    else:
        if word[0].lower() >= "h":
            print(word)
            word = ""
        else:
            word = ""
I get the IndexError: string index out of range message for any words after "g". Shouldn't the else statement catch it? I don't get why it doesn't, because if I remove the brackets [] from word[0], it works.
#2: last word not printing
quote = input("enter a 1 sentence quote, non-alpha separate words: ")
word = ""
for character in quote:
    if character.isalpha():
        word += character.upper()
    else:
        if word.lower() >= "h":
            print(word)
            word = ""
        else:
            word = ""
In this example, it works to a degree. It eliminates any words before 'h' and prints words after 'h', but for some reason it doesn't print the last word. It doesn't matter what quote I use: the last word is never printed, even if it's after 'h'. Why is that?
You're indexing word[0]. This accesses the first character of the string word. If word is empty (that is, word == ""), there is no first character to access, so you get an IndexError. This happens whenever the else branch runs while word is still empty, for example when the quote starts with a non-alphabetic character or when two of them are adjacent (such as a comma followed by a space).
The second problem, with your second code snippet leaving off the last word, comes from the approach you're using. It looks like you're trying to walk through the sentence character by character and decide whether to print a word after having read through it (which you know because you hit a space character). That is exactly why the last word isn't printed: the last character in your sentence isn't a space, it's just the last letter of the last word. So your else branch is never executed for it.
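If you want to stay with the character-by-character loop, both problems can be addressed with two small changes: guard against an empty word before indexing it, and make sure the last word is followed by a separator. The sketch below is one possible way to do that; the function name words_h_to_z and the sample quote are only illustrative, and it returns a list instead of printing so the result is easy to check.

```python
def words_h_to_z(quote):
    """Collect the upper-cased words whose first letter is h through z."""
    found = []
    word = ""
    # Appending a space guarantees that the final word is followed by a separator.
    for character in quote + " ":
        if character.isalpha():
            word += character.upper()
        else:
            # "word and ..." skips the empty word left by adjacent separators.
            if word and word[0].lower() >= "h":
                found.append(word)
            word = ""
    return found


print(words_h_to_z("Wheresoever you go, go with all your heart"))
# ['WHERESOEVER', 'YOU', 'WITH', 'YOUR', 'HEART']
```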
I'd recommend an entirely different approach, using the method str.split(). This method is built into Python and transforms one string into a list of smaller strings, split on the character/substring you specify. So if I do
quote = "Hello this is a sentence"
words = quote.split(' ')
print(words)
you'll end up seeing this:
['Hello', 'this', 'is', 'a', 'sentence']
A couple of things to keep in mind on your next approach to this problem:
You need to account for empty words (like if I have two spaces in a row for some reason), and make sure they don't break the script.
You need to account for non-alphanumeric characters like numbers and dashes. You can either ignore them or handle them differently, but you have to have something in place.
You need to make sure that you handle the last word at some point, even if the sentence doesn't end in a space character.
Good luck!
Instead of what you're doing, you can iterate over each word in the string and check whether it begins with one of those letters. Read about the function str.split(): the parameter is the separator, in this case ' ' since you want the words, and it returns a list of strings. Iterate over that list in a loop and it should work.
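A split-based version might look like the sketch below. It is written under a few assumptions that are not in the original exercise text: words are separated by whitespace, any non-letter characters inside a chunk are simply dropped (so "go," becomes "go"), and the name words_h_to_z_split is hypothetical.

```python
def words_h_to_z_split(quote):
    """Return the upper-cased words that start with h through z."""
    result = []
    # split() without arguments ignores repeated spaces, so no empty chunks appear.
    for chunk in quote.split():
        # Keep only the letters of each chunk; a chunk like "--" becomes "".
        word = "".join(ch for ch in chunk if ch.isalpha())
        if word and word[0].lower() >= "h":
            result.append(word.upper())
    return result


for word in words_h_to_z_split("Wheresoever you go, go with all your heart"):
    print(word)
```

Because the last word is just another item in the list, it needs no special handling here.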

Python Challenge #3: Loop stops way too early

I'm working on PythonChallenge #3. I've got a huge block of text that I have to sort through. I am trying to find a sequence in which the first and last three letters are caps, and the middle one is lowercase.
My function loops through the text. The variable block stores the seven letters that are currently being looped through. There's a variable, toPrint, which gets turned on and off based on whether the letters in block correspond to my pattern (AAAaAAA). Based on the last block printed according to my function, my loop stops early in my text. I have no idea why this is happening and if you could help me figure this out, that would be great.
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
words = []
for i in text:
    toPrint = True
    block = text[text.index(i):text.index(i)+7]
    for b in block[:3]:
        if b.isupper() == False:
            toPrint = False
    for b in block[3]:
        if b.islower() == False:
            toPrint = False
    for b in block[4:]:
        if b.isupper() == False:
            toPrint = False
    if toPrint == True and block not in words:
        words.append(block)
    print (block)
print (words)
With Regex:
This is a really good time to use regex, it's super fast, more clear, and doesn't require a bunch of nested if statements.
import re
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
print(re.search(r"[A-Z]{3}[a-z][A-Z]{3}", text).group(0))
Explanation of regex:
[A-Z]{3} ---> matches any 3 uppercase letters
[a-z] -------> matches a single lowercase letter
[A-Z]{3} ---> matches 3 more uppercase letters
Without Regex:
If you really don't want to use regex this is how you could do it:
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
for i, _ in enumerate(text[:-6]):  # loop through the index of each char (not including the last 6)
    sevenCharacters = text[i:i+7]  # create a chunk of seven characters
    shouldBeCapital = sevenCharacters[0:3] + sevenCharacters[4:7]  # combine all the chars that should be capital into one string
    if all(char.isupper() for char in shouldBeCapital):  # make sure all those characters are indeed capital
        if sevenCharacters[3].islower():  # make sure the middle character is lowercase
            print(sevenCharacters)
I think your first problem is that you are using str.index(). Like find(), the .index() method of a string returns the index of the first match that is found.
Thus, in your example, whenever you search for 'x' you will get the index of the first 'x' found, etc. Your loop only works correctly for a character that is unique in the string or that is the first occurrence of a repeated character; every later occurrence sends you back to the first one.
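A tiny experiment shows the effect. The string "abcabc" and the helper name windows are made up for this illustration; the helper is just a sketch of slicing by position instead of by character lookup.

```python
text = "abcabc"

# index() always reports the first occurrence, so positions 3..5 are never seen.
print([text.index(ch) for ch in text])   # [0, 1, 2, 0, 1, 2]

# enumerate() yields the real position of every character.
print([i for i, _ in enumerate(text)])   # [0, 1, 2, 3, 4, 5]


def windows(text, size=7):
    """Return every block of `size` consecutive characters, by position."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


print(windows("abcabcab"))               # ['abcabca', 'bcabcab']
```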
In order to keep the same structure (which isn't necessary- there is an answer posted using enumerate that I prefer myself) I implemented a queuing approach with your block variable. Each iteration, a character is dropped from the front of block, while the new character is appended to the end.
I also cleaned up some of your needless comparisons with False. You will find that this is not only inefficient, it is frequently wrong, because many of the "boolean" activities you perform will not be on actual boolean values. Get out of the habit of spelling out True/False. Just use if c or if not c.
Here's the result:
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
words = []
block = '.' + text[0:6]
for i in text[6:]:
    block = block[1:] + i # Drop 1st char, append 'i'
    toPrint = True
    for b in block[:3]:
        if not b.isupper():
            toPrint = False
    if not block[3].islower():
        toPrint = False
    for b in block[4:]:
        if not b.isupper():
            toPrint = False
    if toPrint and block not in words:
        words.append(block)
print (words)
If I understood your question correctly, there is no need for a loop. This simple code can find the required sequence.
# Use this code
import re
text = """kAewtloYgcFQaJNhHVGxXDiQmzjfcpYbzxlWrVcqsmUbCunkfxZWDZjUZMiGqhRRiUvGmYmvnJ"""
print(re.findall(r"[A-Z]{3}[a-z][A-Z]{3}", text))

How might I create an acronym by splitting a string at the spaces, taking the character indexed at 0, joining it together, and capitalizing it?

My code
beginning = input("What would you like to acronymize? : ")
second = beginning.upper()
third = second.split()
fourth = "".join(third[0])
print(fourth)
I can't seem to figure out what I'm missing. The code is supposed to take the phrase the user inputs, put it all in caps, split it into words, join the first character of each word together, and print it. I feel like there should be a loop somewhere, but I'm not entirely sure if that's right or where to put it.
Say input is "Federal Bureau of Agencies"
Typing third[0] gives you the first element of the split, which is "FEDERAL" (the input has already been upper-cased). You want the first character of each element in the split. Use a generator expression or list comprehension to apply [0] to each item in the list:
val = input("What would you like to acronymize? ")
print("".join(word[0] for word in val.upper().split()))
In Python, it would not be idiomatic to use an explicit loop here. Generator expressions are shorter and easier to read, and do not require the use of an explicit accumulator variable.
When you run the code third[0], Python will index the variable third and give you the first part of it.
The results of .split() are a list of strings. Thus, third[0] is a single string, the first word (all capitalized).
You need some sort of loop to get the first letter of each word, or else you could do something with regular expressions. I'd suggest the loop.
Try this:
fourth = "".join(word[0] for word in third)
There is a little for loop inside the call to .join(). Python calls this a "generator expression". The variable word will be set to each word from third, in turn, and then word[0] gets you the char you want.
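Wrapping the generator expression in a small function makes it easy to try without typing any input. The name acronym is just an illustrative choice, and the sketch assumes the words are separated by whitespace.

```python
def acronym(phrase):
    """Upper-case the phrase, split it into words, join the first letters."""
    return "".join(word[0] for word in phrase.upper().split())


print(acronym("Federal Bureau of Agencies"))  # FBOA
```

Since split() never returns empty strings, word[0] is always safe here, even if the phrase contains repeated spaces.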
works for me this way:
>>> a = "What would you like to acronymize?"
>>> a.split()
['What', 'would', 'you', 'like', 'to', 'acronymize?']
>>> ''.join([i[0] for i in a.split()]).upper()
'WWYLTA'
>>>
One intuitive approach would be:
get the sentence using input (or raw_input in python 2)
split the sentence into a list of words
get the first letter of each word
join the letters with an empty string and capitalize the result
Here is the code:
sentence = input('What would you like to acronymize?: ')
words = sentence.split() # split the sentence into words
just_first_letters = [] # a list containing just the first letter of each word
# traverse the list of words, adding the first letter of
# each word into just_first_letters
for word in words:
    just_first_letters.append(word[0])
result = "".join(just_first_letters).upper() # join the list of first letters and capitalize it
print(result)
# acronym2.py
# illustrating how to build an acronym
import string
def main():
    sent = input("Enter the sentence: ")  # take the input sentence with spaces
    # capwords() capitalizes each word; split() turns the sentence into a list of words
    words = string.capwords(sent).split()
    # loop through the words and concatenate the first letter of each one
    # to get your acronym
    print("".join(word[0] for word in words))
main()
name = input("Enter a name with uppercase and lowercase letters: ")
print(f'the original string = {name}')
def uppercase(name):
    res = [char for char in name if char.isupper()]
    print("The uppercase characters in string are : " + "".join(res))
uppercase(name)
