How to remove the first occurrence of a repeated character with Python - python

I have been given the string 'abcdea', and I need to find the repeated character and remove its first occurrence, so the result must be 'bcdea'. I have tried the following:
def remove_rep(x):
    new_list = []
    for i in x:
        if i not in new_list:
            new_list.append(i)
    new_list = ''.join(new_list)
    print(new_list)
remove_rep('abcdea')
but the result is 'abcde', not the 'bcdea' that I was looking for.

You could make use of str.find(), which returns the index of the first occurrence within the string:
def remove_rep(oldString):
    newString = ''
    for i in oldString:
        if i in newString:
            # Character used previously, .find() returns the first position within string
            first_position_index = newString.find(i)
            newString = newString[:first_position_index] + newString[
                first_position_index + 1:]
        newString += i
    print(newString)
remove_rep('abcdea')
remove_rep('abcdeaabcdea')
Out:
bcdea
bcdea

One approach can be to iterate in reverse order over the string, and keep track of all the characters seen in the string. If a character is repeated, we don't add it to the new_list.
def remove_rep(x: str):
    new_list = []
    seen = set()
    for char in reversed(x):
        if char not in seen:
            new_list.append(char)
            seen.add(char)
    return ''.join(reversed(new_list))
print(remove_rep('abcdea'))
Result: 'bcdea'
Note that the above solution doesn't work exactly as desired when a character occurs more than twice: it removes every occurrence except the last one, whereas you only want to remove the first. To resolve that, you can instead do something like the following:
def remove_rep(x: str):
    new_list = []
    first_seen = set()
    for char in x:
        freq = x.count(char)
        if char in first_seen or freq == 1:
            new_list.append(char)
        elif freq > 1:
            first_seen.add(char)
    return ''.join(new_list)
Now for the given input:
print(remove_rep('abcdeaca'))
We get the desired result, with only the first a and the first c removed:
bdeaca
Test for a more complicated input:
print(remove_rep('abcdeaabcdea'))
We do get the correct result:
aabcdea
Do you see what happened in that last one? The first abcde sequence got removed, because every character in this string is repeated. So the result is actually correct, even though it doesn't look so at first glance.
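Calling x.count(char) inside the loop rescans the whole string for every character. As a minimal sketch of the same idea, the counts can be computed once up front with collections.Counter. The function name remove_first_of_repeated is made up for this illustration and is not from the answer above.

```python
from collections import Counter


def remove_first_of_repeated(text: str) -> str:
    """Drop the first occurrence of every character that appears more than once."""
    counts = Counter(text)   # one pass to count every character
    dropped = set()          # repeated characters whose first occurrence was already skipped
    kept = []
    for char in text:
        if counts[char] > 1 and char not in dropped:
            dropped.add(char)   # skip only this first occurrence
        else:
            kept.append(char)
    return ''.join(kept)


print(remove_first_of_repeated('abcdea'))        # bcdea
print(remove_first_of_repeated('abcdeaca'))      # bdeaca
print(remove_first_of_repeated('abcdeaabcdea'))  # aabcdea
```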

Another approach, with one small change to the if condition in your code:
def remove_rep(x):
    new_list = []
    visited = []
    for i, item in enumerate(x):
        if item not in x[i+1:] or item in visited:
            new_list.append(item)
        else:
            visited.append(item)
    new_list = ''.join(new_list)
    print(new_list)
remove_rep('abcdeaa')
remove_rep('abcdeaabcdea')
Output:
bcdeaa
aabcdea

str.replace() does that when you already know which character to remove:
https://docs.python.org/3/library/stdtypes.html#str.replace
str.replace(old, new[, count])
    Return a copy of the string with all occurrences of substring old
    replaced by new. If the optional argument count is given, only the
    first count occurrences are replaced.
So basically:
"abcabc".replace('b', '', 1)
# output: 'acabc'
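In the original question the character to remove is not known in advance, so it has to be found first. Here is a minimal sketch, assuming that only the first character found to be repeated should lose its first occurrence. remove_first_repeat is a hypothetical helper name.

```python
def remove_first_repeat(text):
    """Remove the first occurrence of the first character that repeats later on."""
    for char in text:
        if text.count(char) > 1:
            # count=1 limits the replacement to the leftmost occurrence
            return text.replace(char, '', 1)
    return text  # no character is repeated


print(remove_first_repeat('abcdea'))  # bcdea
print(remove_first_repeat('xabcb'))   # xacb
print(remove_first_repeat('abc'))     # abc
```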

Change
new_list = ''.join(new_list)
to
new_list = ''.join(new_list[1:]+[i])
(and figure out why! Hint: what's the condition of your if block? What are you checking for and why?)

Related

How can I remove specific duplicates from a list, rather than remove all duplicates indiscriminately?

In a python script, I need to assess whether a string contains duplicates of a specific character (e.g., "f") and, if so, remove all but the first instance of that character. Other characters in the string may also have duplicates, but the script should not remove any duplicates other than those of the specified character.
This is what I've got so far. The script runs, but it is not accomplishing the desired task. I modified the reduce() line from the top answer to this question, but it's a little more complex than what I've learned at this point, so it's difficult for me to tell what part of this is wrong.
import re
from functools import reduce
string = "100 ffeet"
dups = ["f", "t"]
for char in dups:
    if string.count(char) > 1:
        lst = list(string)
        reduce(lambda acc, el: acc if re.match(char, el) and el in acc else acc + [el], lst, [])
        string = "".join(lst)
Let's create a function that receives a string s and a character c as parameters, and returns a new string where all but the first occurrence of c in s are removed.
We'll be making use of the following functions from Python std lib:
- str.find(sub): Return the lowest index in the string where substring sub is found.
- str.replace(old, new): Return a copy of the string with all occurrences of substring old replaced by new.
The idea is straightforward:
1. Find the first index of c in s.
2. If none is found, return s.
3. Make a substring of s starting from the next character after c.
4. Remove all occurrences of c in the substring.
5. Concatenate the first part of s with the updated substring.
6. Return the final string.
In Python:
def remove_all_but_first(s, c):
    i = s.find(c)
    if i == -1:
        return s
    i += 1
    return s[:i] + s[i:].replace(c, '')
Now you can use this function to remove all the characters you want.
def main():
    s = '100 ffffffffeet'
    dups = ['f', 't', 'x']
    print('Before:', s)
    for c in dups:
        s = remove_all_but_first(s, c)
    print('After:', s)
if __name__ == '__main__':
    main()
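Calling remove_all_but_first once per character rescans the string each time. As a rough alternative sketch, a single pass with a set of already-seen characters handles all listed characters together. The helper name keep_first_of and its parameters are assumptions for illustration, not part of the answer above.

```python
def keep_first_of(text, chars):
    """Keep only the first occurrence of each character listed in chars."""
    targets = set(chars)
    seen = set()
    kept = []
    for char in text:
        if char in targets:
            if char in seen:
                continue      # a later duplicate of a listed character
            seen.add(char)
        kept.append(char)     # unlisted characters are always kept
    return ''.join(kept)


print(keep_first_of('100 ffffffffeet', ['f', 't', 'x']))  # 100 feet
```

Characters that are not listed in chars, such as the doubled e, are left untouched.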
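Here is one way that you could do it. Note that because it walks the string from the end, it keeps the last occurrence of each character in dups rather than the first; for adjacent duplicates such as the 'ff' in this example the result is the same.
string = "100 ffeet"
dups = ["f", "t"]
seen = []
# Stop value of -1 so that index 0 is checked as well
for s in range(len(string)-1, -1, -1):
    if string[s] in dups and string[s] in seen:
        string = string[:s] + string[s+1:]
    elif string[s] in dups:
        seen.append(string[s])
print(string)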

Iterate through a list in python and delete characters after the second instance of a character from an element

Sorry, very new to python.
Essentially I have a long list of file names, some in the format NAME_XX123456 and others in the format NAME_XX123456_123456.
I need to drop everything from the second underscore onward in each element.
The code below only processes the first two elements, though, and when it encounters a second underscore it just splits the name instead of dropping the remainder.
sample_list=['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
shortlist=[]
item = "_"
count = 0
i=0
for i in range(0,len(sample_list)):
    if(item in sample_list[i]):
        count = count + 1
        if(count == 2):
            shortlist.append(sample_list[i].rpartition("_"))
            i+=1
        if (count == 1):
            shortlist.append(sample_list[i])
            i+=1
print(shortlist)
Here is a simple split join approach. We can split each input on underscore, and then join the first two elements together using underscore as the separator.
sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = ['_'.join(x.split('_')[0:2]) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
You could also use regular expressions here:
import re
sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = [re.sub(r'([^_]+_[^_]+)_.*', r'\1', x) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
You can simply use split method to split each item in the list using '_' and then join the first two parts of the split. Thus ignoring everything after the second underscore.
Try this:
res = []
for item in sample_list:
    item_split = item.split('_')
    res.append('_'.join(item_split[0:2]))  # taking only the first two items
print(res)  # ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
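The answers above split on every underscore. As a small variation, str.split accepts a maxsplit argument, so the split can stop after the second underscore. Here is a sketch reusing sample_list from the question (short_list is just an illustrative variable name).

```python
sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']

# maxsplit=2 yields at most three pieces; the first two are the part to keep
short_list = ['_'.join(name.split('_', 2)[:2]) for name in sample_list]
print(short_list)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
```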

How does the loop help iterate in this code

The problem at hand is that given a string S, we can transform every letter individually to be lowercase or uppercase to create another string.
Desired result is a list of all possible strings we could create.
Eg:
Input:
S = "a1b2"
Output:
["a1b2", "a1B2", "A1b2", "A1B2"]
I see that the code below generates the correct result, but I'm a beginner in Python. Can you help me understand how the loops on lines 5 and 7, which assign a value to res, work?
def letterCasePermutation(self, S):
    res = ['']
    for ch in S:
        if ch.isalpha():
            res = [i+j for i in res for j in [ch.upper(), ch.lower()]]
        else:
            res = [i+ch for i in res]
    return res
res is a list of all possible strings built up to this point. Each iteration of the outer for loop handles the next character.
If the character is a non-letter (line 7), the comprehension simply adds that character to each string in the list.
If the character is a letter, then the new list contains two strings for each one in the input: one with the upper-case version added, one for the lower-case version.
If you're still confused, then I strongly recommend that you make an attempt to understand this with standard debugging techniques. Insert a couple of useful print statements to display the values that confuse you.
def letterCasePermutation(self, S):
    res = ['']
    for ch in S:
        print("char = ", ch)
        if ch.isalpha():
            res = [i+j for i in res for j in [ch.upper(), ch.lower()]]
        else:
            res = [i+ch for i in res]
        print(res)
    return res
letterCasePermutation(None, "a1b2")
Output:
char = a
['A', 'a']
char = 1
['A1', 'a1']
char = b
['A1B', 'A1b', 'a1B', 'a1b']
char = 2
['A1B2', 'A1b2', 'a1B2', 'a1b2']
The best way to analyze this code is to add the line:
print(res)
at the end of the outer for loop, as the first answer suggests.
Then run it with the string '123' and the string 'abc' which will isolate the two conditionals. This gives the following output:
['1']
['12']
['123']
and
['A','a']
['AB','Ab','aB','ab']
['ABC','ABc','AbC','Abc','aBC','aBc','abC','abc']
Here we can see the loop is just taking the previously generated list as its input, and if the next string char is not a letter, is simply tagging the number/symbol onto the end of each string in the list, via string concatenation. If the next char in the initial input string is a letter, however, then the list is doubled in length by creating two copies for each item in the list, while simultaneously appending an upper version of the new char to the first copy, and a lower version of the new char to the second copy.
For an interesting result, see how the code fails if this change is made at line 2:
res = []
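To see what the comprehension on the letter branch is doing, it can be unrolled into explicit loops. The sketch below is a standalone function (the name letter_case_permutations is made up here, and the self parameter is dropped) that should build the same list in the same order.

```python
def letter_case_permutations(s):
    results = ['']                      # start with one empty prefix
    for ch in s:
        extended = []
        for prefix in results:          # every string built so far...
            if ch.isalpha():
                # ...branches into an upper-case and a lower-case variant
                extended.append(prefix + ch.upper())
                extended.append(prefix + ch.lower())
            else:
                extended.append(prefix + ch)   # digits and symbols: a single variant
        results = extended
    return results


print(letter_case_permutations('a1b2'))
# ['A1B2', 'A1b2', 'a1B2', 'a1b2']
```

Starting from an empty list instead of [''] leaves the inner loop with no prefixes to extend, so nothing is ever appended.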

How to index a middle character in a list in python

Complete the get_mid_letter() function which is passed a list of strings as a parameter. The function returns a string made up of the concatenation of the middle letter of each word from the parameter list. The string returned by the function should be in lowercase characters. If the parameter list is an empty list, the function should return an empty string.
def get_mid_letter(a_list):
    middle_list = []
    for item in a_list:
        middle_index = int(len(item) / 2)
        middle_letter = a_list.index(middle_index)
        middle_list = middle_list + [middle_letter]
    return middle_list.lower()
def test_get_mid_letter():
    print("1.", get_mid_letter(["Jess", "Cain", "Amity", "Raeann"]))
In my case, it shows an error message like "2 is not in the list".
What can I do to run my code successfully? Thanks!
array.index(element) returns the index of an element, not the character or element itself. So use item[middle_index] to get the character, and then append it to middle_list.
As the other answer pointed out, the mistake is here:
middle_letter = a_list.index(middle_index)
The index() method is looking for an element equal to middle_index (in this case, 2) in the list, trying to return the index of that element, but since there's no element equal to 2 in the list you get that error. To find the middle_letter, you should directly access the list item:
middle_letter = item[middle_index]
Also be aware that you're calling the lower() method on a list, which raises an AttributeError because lists have no such method. To get around this problem, you can call lower() on middle_letter in each loop iteration and simply return middle_list, like this:
for item in a_list:
    middle_index = int(len(item) / 2)
    middle_letter = item[middle_index]
    middle_letter = middle_letter.lower()
    middle_list = middle_list + [middle_letter]
return middle_list
There are a few problems with your code:
1. The function is supposed to return a string, but you declare middle_list as a list, not a string.
2. The index method returns the position in the list of its argument. You want to use [] for access.
3. You are not retrieving the character from the item, but the item from the list.
The corrected code could look like below:
def get_mid_letter(a_list):
    middle_list = ""  # declare the variable as an empty string
    for item in a_list:
        middle_index = int(len(item) / 2)  # get the middle index
        middle_letter = item[middle_index]  # get the char from the string in the list
        middle_list += middle_letter  # concatenate the char and the string
    return middle_list.lower()
If you turn each word into a list ['J', 'e', 's', 's'] in your for loop then you can just add the letter of the middle index to your growing word of middle letters
def middles(listb):
    middle_word = ''
    for i in listb:
        letters = list(i)
        index = int(len(i)/2)
        letter = letters[index]
        middle_word += letter.lower()
    return middle_word
lista = ['Jess', 'Cain', 'Amity', 'Reann']
print(middles(lista))
(xenial)vash#localhost:~/python/stack_overflow$ python3.7 middle.py
siia
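The same idea can be written more compactly with integer division (//) and str.join. This is a minimal sketch, assuming every word in the list is non-empty (an empty string would raise IndexError).

```python
def get_mid_letter(words):
    """Return the lower-cased middle letters of the words, concatenated."""
    # len(word) // 2 gives the same index as int(len(word) / 2)
    return ''.join(word[len(word) // 2] for word in words).lower()


print(get_mid_letter(["Jess", "Cain", "Amity", "Raeann"]))  # siia
print(repr(get_mid_letter([])))                              # ''
```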

Removing duplicates from the list of unicode strings

I am trying to remove duplicates from a list of unicode strings without changing the order in which the elements appear (so I don't want to use set).
Program:
result = [u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html',u'http://amazon.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://yahoo.com']
result.reverse()
for e in result:
    count_e = result.count(e)
    if count_e > 1:
        for i in range(0, count_e - 1):
            result.remove(e)
result.reverse()
print result
Output:
[u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']
Expected Output:
[u'http://google.com', u'http://catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']
So, is there a way of doing this as simply as possible?
You actually don't have duplicates in your list. One time you have http://catb.org while another time you have http://www.catb.org.
You'll have to figure a way to determine whether the URL has www. in front or not.
You can create a new list and add items to it if they're not already in it.
result = [...]  # some list items
uniq = []
for item in result:
    if item not in uniq:
        uniq.append(item)
You could use a set and then sort it by the original index:
sorted(set(result), key=result.index)
This works because index returns the first occurrence (so it keeps them in order according to first appearance in the original list)
I also notice that one of the strings in your original isn't a unicode string. So you might want to do something like:
u = [unicode(s) for s in result]
return sorted(set(u), key=u.index)
EDIT: 'http://google.com' and 'http://www.google.com' are not string duplicates. If you want to treat them as such, you could do something like:
def remove_www(s):
    s = unicode(s)
    prefix = u'http://'
    # u'http://www.' is 11 characters long, u'http://' is 7
    suffix = s[11:] if s.startswith(u'http://www.') else s[7:]
    return prefix+suffix
And then replace the earlier code with
u = [remove_www(s) for s in result]
return sorted(set(u), key=u.index)
Here is a method that modifies result in place:
result = [u'http://google.com', u'http://catb.org/~esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html',u'http://amazon.com', 'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://yahoo.com']
seen = set()
i = 0
while i < len(result):
    if result[i] not in seen:
        seen.add(result[i])
        i += 1
    else:
        del result[i]
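The code in this thread is Python 2 (print result, unicode()). On Python 3.7 or later, where dictionaries preserve insertion order, an order-preserving de-duplication can be sketched with dict.fromkeys. The URLs below are shortened stand-ins for illustration, not the list from the question.

```python
urls = [
    'http://google.com',
    'http://example.org/a',
    'http://amazon.com',
    'http://example.org/a',
    'http://google.com',
]

# dict keys are unique and keep first-insertion order (Python 3.7+)
unique_urls = list(dict.fromkeys(urls))
print(unique_urls)
# ['http://google.com', 'http://example.org/a', 'http://amazon.com']
```

Unlike sorted(set(result), key=result.index), this does not call index once per element, so it avoids rescanning the list.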
