Remove duplicates from a string - Python

I wrote this code, which takes two words as input. It outputs, in uppercase, the letters of the second word that are in exactly the same position as in the first word, and, in lowercase, the letters that appear in both words but in a different position. The code works for two of my test cases but fails on the third, where the second word contains duplicate letters.
If I input zabbz and cbbxb it outputs .bB.b, but there are only two b's in the first word, so the output should be '.bB..'. Please assist if you can.
Here is my code:
sec=list(input().upper())
guess=list(input().upper())
output=['.']*len(guess)
letters=''
for i in range(len(guess)-1,-1,-1):
    if guess[i]==sec[i]:
        output[i]=guess[i]
        sec[i]=guess[i]=0
for i in range (len(guess)):
    if guess[i] in sec and guess[i]!=0:
        output[i]=guess[i].lower()
for letter in output:
    letters=letters+letter
print(letters)

To remove duplicates I use something like this:
start_array = []
final_array = []
for item in start_array:
    if item in final_array:
        # Do nothing: it's a duplicate, so move on
        continue
    else:
        final_array.append(item)
This checks each entry in the array: if that entry is already in the final array, it is skipped; otherwise, it is appended. This assumes you have already changed all cases with upper().

Using the list.count(item) method lets you verify that your output list never contains more copies of a letter than your sec list does. Here is an example:
if guess[i] in sec and sec.count(guess[i]) > output.count(guess[i]) + output.count(guess[i].lower()):
    output[i] = guess[i].lower()
Here is an example that I made real quick if you want to test it:
sec = list(input().upper())
guess = list(input().upper())
output = ["." for i in range(len(guess))]
for i in range(len(guess)):
    if guess[i] == sec[i]:
        output[i] = guess[i]
    # count both the uppercase and the lowercase marks already placed for this letter
    elif guess[i] in sec and sec.count(guess[i]) > output.count(guess[i]) + output.count(guess[i].lower()):
        output[i] = guess[i].lower()
output = "".join(output)
print(output)
I also noticed you were using string concatenation to build the final result. A simpler way to create a string from a list is the "".join(list) method, which joins all elements of the list, separated by the given string (in this case, separated by nothing).
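A single left-to-right pass can hand out a lowercase mark before it reaches a later exact match for the same letter (for example, secret xb with guess bb). One way around that is to do two passes, as in the rough sketch below. The function name score_guess and the use of collections.Counter are illustrative choices, not part of the original code, and the sketch assumes both words have the same length.

```python
from collections import Counter


def score_guess(secret: str, guess: str) -> str:
    """Uppercase = right letter, right place; lowercase = right letter, wrong place."""
    secret, guess = secret.upper(), guess.upper()
    output = ["."] * len(guess)

    # Pass 1: mark exact matches and count the secret letters left unmatched.
    remaining = Counter()
    for i, (s, g) in enumerate(zip(secret, guess)):
        if s == g:
            output[i] = g
        else:
            remaining[s] += 1

    # Pass 2: give a lowercase mark only while unmatched copies remain.
    for i, g in enumerate(guess):
        if output[i] == "." and remaining[g] > 0:
            output[i] = g.lower()
            remaining[g] -= 1

    return "".join(output)


print(score_guess("zabbz", "cbbxb"))  # .bB..
```

Because exact matches are consumed first, the third b in cbbxb finds no unmatched b left in zabbz and stays a dot.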

Related

Showing "Kernel Dead" while comparing two string sequences of huge dataset in Python

I wrote a script to compare human and mouse genome sequences for 3,641,094 strings, which means it requires 1,820,547 iterations. The code below works perfectly for ~999,999 character strings, but it results in a dead Jupyter kernel when the number of strings increases to ~9,999,999 (7 digits). I wrote a function to compare the strings and insert a \n if:
at least one string is lowercase, or
they are not similar, or
they are not one of the letters ['A','T','C','G'] (letters such as N, X, Y, ... should not be counted as similar), or
at least one of them is '-'.
Otherwise, we can append the letters to the result.
For example:
str1="ATCxGCGCGCGAcTTAATCGatcg----NXWQnxwq"
str2="aTCxGCGCGCGAcTTAatcgatcg--atNXWQnxwq"
Output:
Letters similar in strings 1 and 2:
TC
GCGCGCGA
TTA
Here is my code:
import re
def compare_strings(df, column, j):
    str1 = df["Genome Sequence"][j] #even rows
    str2 = df["Genome Sequence"][j+1] #odd rows
    #then creating a new variable to store the result after
    #comparing the strings. You note that I added result2 because
    #if string 2 is longer than string 1 then you have extra characters
    #in result 2, if string 1 is longer then the result you want to take
    #a look at is result 2
    result1 = ''
    result2 = ''
    #handle the case where one string is longer than the other
    maxlen=len(str2) if len(str1)<len(str2) else len(str1)
    #loop through the characters
    for i in range(maxlen):
        #use a slice rather than index in case one string longer than other
        letter1=str1[i:i+1]
        letter2=str2[i:i+1]
        #create string with differences
        if (letter1.islower() and letter2.islower() and letter1==letter2):
            result1+='\n'
            result2+='\n'
        if ((letter1 not in ['A','T','C','G']) and (letter2 not in ['A','T','C','G']) and letter1==letter2):
            result1+='\n'
            result2+='\n'
        if ((letter1 == letter2) and letter1.isupper() and letter2.isupper() and (letter1!='-') and (letter2!='-') and
            letter1 in ['A','T','C','G'] and letter2 in ['A','T','C','G']):
            result1+=letter1
            result2+=letter2
        # if ((letter1 == letter2) and (letter1!='-') and (letter2!='-') and
        #     letter1 in ['A','T','C','G'] and letter2 in ['A','T','C','G']):
        #     result1+=letter1
        #     result2+=letter2
        if ((letter1 != letter2) or (letter1=='-') or (letter2=='-')):
            result1+='\n'
            result2+='\n'
    #print out result
    #print ("Letters different in string 1:",result1)
    #print ("Letters different in string 2:",result2)
    word=re.sub(r'\n+', '\n', result1).strip()
    return word
#print(compare_strings(lines_frame, "Genome Sequence", 0))
comparisons = ""
for i in range(int(len(lines_frame)/2)):
    if i%2==0:
        comparisons+=(compare_strings(lines_frame, "Genome Sequence", i))
Could you help me to solve this Kernel Dead error on Jupyter Notebook?
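The comparison rules listed in the question can be sketched more compactly. The version below is only an illustration: similar_runs is a hypothetical helper name, it takes two plain strings rather than DataFrame rows, and it collects matching runs in a list instead of growing a string with +=. Whether that alone prevents the dead kernel depends on the data size and available memory, which the question does not state.

```python
def similar_runs(str1: str, str2: str) -> list[str]:
    """Return the runs of positions where both strings hold the same uppercase A/T/C/G."""
    runs = []
    current = []
    for a, b in zip(str1, str2):
        # A position counts only if both letters are equal and one of A, T, C, G.
        # Lowercase letters, '-', and other symbols all break the current run.
        if a == b and a in "ATCG":
            current.append(a)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


str1 = "ATCxGCGCGCGAcTTAATCGatcg----NXWQnxwq"
str2 = "aTCxGCGCGCGAcTTAatcgatcg--atNXWQnxwq"
print("\n".join(similar_runs(str1, str2)))
```

zip stops at the shorter string, which matches the original behaviour for this purpose: a position that exists in only one string can never be a match.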

Reverse a specific word function

I'm having trouble with the following task:
I need to build a function that receives (sentence, word, occurrence),
searches for that word, and reverses it only at the given occurrence.
For example:
function("Dani likes bananas, Dani also likes apples", "lik", "2")
returns: "Dani likes bananas, Dani also kiles apples"
As you can see, the "word" is 'lik', and its second occurrence was reversed to 'kil'.
I wrote something, but it's too messy and this part still doesn't work for me:
def q2(sentence, word, occurrence):
    count = 0
    reSentence = ''
    reWord = ''
    for char in word:
        if sentence.find(word) == -1:
            print('could not find the word')
            break
        for letter in sentence:
            if char == letter:
                if word != reWord:
                    reWord += char
                    reSentence += letter
                    break
                elif word == reWord:
                    if count == int(occurrence):
                        reWord = word[::-1]
                        reSentence += reWord
                    elif count > int(occurrence):
                        print("no such occurrence")
                    else:
                        count += 1
            else:
                reSentence += letter
    print(reSentence)
sentence = 'Dani likes bananas, Dani also likes apples'
word = 'li'
occurrence = '2'
q2(sentence,word,occurrence)
The main problem right now is that, after it breaks, it goes back to checking from the start of the sentence, so it finds the i in "Dani". I couldn't think of a way to make it continue from where it stopped.
I tried using enumerate but still had no idea how.

This will work for the given scenario:
sentence = 'Dani likes bananas, Dani also likes apples'
word = 'lik'
st = word
occ = 2
lt = sentence.split(word)
op = ''
if (len(lt) > 1):
    for i,x in enumerate(lt[:-1]):
        if (i+1) == occ:
            word = ''.join(reversed(word))
        op = op + x + word
        word = st
print(op+lt[-1])
Please test it yourself for other scenarios.
The line for i,x in enumerate(lt[:-1]) loops over the list, excluding the last element. With enumerate we get the index of each element in i and its value in x. As the code loops, I re-join the split pieces with the same word I split on, but I reverse the word at the position you specified. The last element is excluded from the loop because the loop appends the word after each piece; if the whole list were included, there would be an extra word at the end. Hope that explains it.

Your approach shows that you've clearly thought about the problem and are using the means you know well enough to solve it. However, your code has a few too many issues to simply fix, for example:
you only check for occurrence of the word once you're inside the loop;
you loop over the entire sentence for each letter in the word;
you only compare a character at a time, and make some mistakes in keeping track of how much you've matched so far;
you pass a string '2', which you intend to use as a number 2.
All of that and other problems can be fixed, but you would do well to use what the language gives you. Your task breaks down into:
find the n-th occurrence of a substring in a string
replace it with another word where found and return the string
Note that you're not really looking for a 'word' per se, as your example shows you replacing only part of a word (i.e. 'lik') and a 'word' is commonly understood to mean a whole word between word boundaries.
def q2(sentence, word, occurrence):
    # the first bit
    # start at -1 so that the first search begins at index 0
    position = -1
    count = 0
    while count < occurrence:
        position = sentence.find(word, position+1)
        count += 1
        if position == -1:
            print (f'Word "{word}" does not appear {occurrence} times in "{sentence}"')
            return None
    # and then using what was found for a result
    return sentence[0:position] + word[::-1] + sentence[position+len(word):]
print(q2('Dani likes bananas, Dani also likes apples','lik',2))
print(q2('Dani likes bananas, Dani also likes apples','nope',2))
A bit of explanation on that return statement:
sentence[0:position] gets sentence from the start 0 to the character just before position, this is called a 'slice'
word[::-1] gets word from start to end, but going in reverse (-1). Leaving out the values in the slice implies 'from one end to the other'
sentence[position+len(word):] gets sentence from the position position + len(word), which is the character after the found word, until the end (no index, so taking everything).
All those combined is the result you need.
Note that the function returns None if it can't find the word the right number of times - that may not be what is needed in your case.

import re
from itertools import islice
s = "Dani likes bananas, Dani also likes apples"
t = "lik"
n = 2
# escape t so that it is matched literally rather than as a regex pattern
x = re.finditer(re.escape(t), s)
try:
    i = next(islice(x, n - 1, n)).start()
except StopIteration:
    i = -1
if i >= 0:
    y = s[i: i + len(t)][::-1]
    print(f"{s[:i]}{y}{s[i + len(t):]}")
else:
    print(s)
This finds the starting index of the 2nd occurrence (if it exists) using a regex. It may require two passes over string s in the worst case: one to find the index, one to form the output. This can also be done in one pass using two pointers, but I'll leave that to you. From what I see, no one has yet offered a solution that does it in one pass.

index = find the index of the nth occurrence
Use slice notation to get the part you are interested in (you have its beginning and length)
Reverse it
Construct your result string:
result = sentence[:index] + reversed part + sentence[index+len(word):]
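The steps above can be sketched as follows. The function name reverse_nth_occurrence is hypothetical, and returning the sentence unchanged when there is no n-th occurrence is an assumption; the outline does not say what should happen in that case.

```python
def reverse_nth_occurrence(sentence: str, word: str, n: int) -> str:
    """Reverse the n-th (1-based) occurrence of word; return sentence unchanged if absent."""
    if n < 1 or not word:
        return sentence

    index = -1
    for _ in range(n):
        # Resume just past the previous hit; the first search starts at index 0.
        index = sentence.find(word, index + 1)
        if index == -1:
            return sentence

    reversed_part = sentence[index:index + len(word)][::-1]
    return sentence[:index] + reversed_part + sentence[index + len(word):]


print(reverse_nth_occurrence("Dani likes bananas, Dani also likes apples", "lik", 2))
```

str.find accepts a start offset, which is what lets each search continue from where the previous one stopped instead of restarting at the beginning of the sentence.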

How to print horizontally instead of vertically?

I wrote a script that is supposed to print an answer to a specific input horizontally.
For example if the input is:
TTACTGGCAT
It should print:
TTACTGGCAT
AATGACCGTA
My code:
x = 0
n = input("Insert DNA sequence: ")
print(n.upper())
while x < len(n):
    if 'T' in n[x]:
        print('A')
    if 'G' in n[x]:
        print('C')
    if 'C' in n[x]:
        print('G')
    if 'A' in n[x]:
        print('T')
    x = x + 1

I assume you want to do something like this:
nucl_dict = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
n = input("Insert DNA sequence: ").upper()
print(n)
print(''.join(nucl_dict.get(nucl, nucl) for nucl in n))
nucl_dict defines which nucleotides are complementary.
The join call builds a string from the complement of each nucleotide and prints the result.
If a character is not a valid nucleotide, it is added to the complementary string unchanged. get looks up its first argument as a key (here, each character in n) and, if the key does not exist, returns the second argument (here, the same character).

You should concatenate everything into a string and print it after the loop ends.

You can use the end parameter, as in print(some_var, end=''), to avoid printing the trailing newline after each call. Where you do want a new line, just call print without the end parameter. See the print documentation.
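As a small illustration of the end parameter, here is a sketch that keeps the character-by-character printing of the question but stays on one line. The names COMPLEMENT and print_complement are made up for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def print_complement(seq: str) -> None:
    """Print seq, then its complement on the next line, one character per print call."""
    seq = seq.upper()
    print(seq)
    for base in seq:
        # end="" suppresses the newline that print would normally append.
        print(COMPLEMENT.get(base, base), end="")
    print()  # finish the line


print_complement("TTACTGGCAT")
```

For the input from the question this prints TTACTGGCAT on the first line and AATGACCGTA on the second.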

Beginner Issue; string index out of range

# word reverser
#user input word is printed backwards
word = input("please type a word")
#letters are to be added to "reverse" creating a new string each time
reverse = ""
#the index of the letter of the final letter of "word" the users' input
#use this to "steal" a letter each time
#index is the length of the word - 1 to give a valid index number
index = len(word) - 1
#steals a letter until word is empty, adding each letter to "reverse" each time (in reverse)
while word:
    reverse += word[index]
    word = word[:index]
    print(reverse)
print(reverse)
input("press enter to exit")
I'm working on a simple program that spells a user's input word backwards and prints it back to them by "stealing" letters from the original and making new strings from them.
The trouble I'm having is that this code throws a string index out of range error at
reverse += word[index]
Help, or a better way of achieving the same result, would be much appreciated.

Reversing a word in Python is simpler than that:
backward = forward[::-1]
I wouldn't use a loop; it's longer and less readable.

While others have pointed out multiple ways of reversing words in Python, here is what I believe to be the problem with your code.
index always stays the same. Let's say the user inputs a four-letter word, like abcd. index will be set to three (index = len(word) - 1). Then, during the first iteration of the loop, word will be reduced to abc (word = word[:index]). During the next iteration, the first line inside the loop (reverse += word[index]) raises the error: index is still three, so you try to access word[3], but since word has been cut short there is no longer a word[3]. You need to reduce index by one each iteration:
while word:
    reverse += word[index]
    word = word[:index]
    index -= 1
And here is yet another way of reversing a word in Python (Will's code is the neatest, though):
reverse = "".join([word[i-1] for i in range(len(word), 0, -1)])
Happy coding!

You're going to want to use the range function.
range(start, stop, step)
Returns a sequence of integers from start up to, but not including, stop, increasing (or decreasing) by step. You can then iterate through it. All together, it would look something like this:
for i in range(len(word) -1, -1, -1):
    reverse += word[i]
    print(reverse)
Or the easier way would be to use string slicing to reverse the word directly and then iterate through that. Like so:
for letter in word[::-1]:
    reverse += letter
    print(reverse)
With the way it is written now, it will not only print the word backwards, but it will also print each part of the backwards word. For example, if the user entered "Hello" it would print
o
ol
oll
olle
olleH
If you just want to print the word backwards, the best way is just
print(word[::-1])

It is because you are not changing the value of index.
Modification:
while word:
    reverse += word[index]
    word = word[:index]
    index-=1
    print(reverse)
That is, you have to reduce index each time you loop through, to get the current last letter of the word.
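A variation on the same fix, offered only as a sketch: negative indexing avoids tracking index at all, because -1 always refers to the current last letter. The function name reverse_by_stealing is made up for this example.

```python
def reverse_by_stealing(word: str) -> str:
    """Reverse word by repeatedly moving its last letter onto a new string."""
    reverse = ""
    while word:
        reverse += word[-1]  # -1 is always the current last letter
        word = word[:-1]     # drop that letter from the original
    return reverse


print(reverse_by_stealing("abcd"))  # dcba
```

The loop ends when word becomes the empty string, so no index can ever point past the end.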

Count vowels from raw input

I have a homework question which asks to read a string through raw input and count how many vowels are in the string. This is what I have so far but I have encountered a problem:
def vowels():
    vowels = ["a","e","i","o","u"]
    count = 0
    string = raw_input ("Enter a string: ")
    for i in range(0, len(string)):
        if string[i] == vowels[i]:
            count = count+1
    print count
vowels()
It counts the vowels fine, but due to if string[i] == vowels[i]:, it will only count one vowel once as i keeps increasing in the range. How can I change this code to check the inputted string for vowels without encountering this problem?

in operator
You probably want to use the in operator instead of the == operator - the in operator lets you check to see if a particular item is in a sequence/set.
1 in [1,2,3] # True
1 in [2,3,4] # False
'a' in ['a','e','i','o','u'] # True
'a' in 'aeiou' # Also True
Some other comments:
Sets
The in operator is most efficient when used with a set, which is a data type specifically designed to be quick for "is item X part of this set of items" kind of operations.*
vowels = set(['a','e','i','o','u'])
*dicts are also efficient with in, which checks to see if a key exists in the dict.
Iterating on strings
A string is a sequence type in Python, which means that you don't need to go to all of the effort of getting the length and then using indices - you can just iterate over the string and you'll get each character in turn:
E.g.:
for character in my_string:
    if character in vowels:
        # ...
Initializing a set with a string
Above, you may have noticed that creating a set with pre-set values (at least in Python 2.x) involves using a list. This is because the set() type constructor takes a sequence of items. You may also notice that in the previous section, I mentioned that strings are sequences in Python - sequences of characters.
What this means is that if you want a set of characters, you can actually just pass a string of those characters to the set() constructor - you don't need to have a list of single-character strings. In other words, the following two lines are equivalent:
set_from_string = set('aeiou')
set_from_list = set(['a','e','i','o','u'])
Neat, huh? :) Do note, however, that this can also bite you if you're trying to make a set of strings, rather than a set of characters. For instance, the following two lines are not the same:
set_with_one_string = set(['cat'])
set_with_three_characters = set('cat')
The former is a set with one element:
'cat' in set_with_one_string # True
'c' in set_with_one_string # False
Whereas the latter is a set with three elements (each one a character):
'c' in set_with_three_characters # True
'cat' in set_with_three_characters # False
Case sensitivity
Comparing characters is case sensitive. 'a' == 'A' is False, as is 'A' in 'aeiou'. To get around this, you can transform your input to match the case of what you're comparing against:
lowercase_string = input_string.lower()

You can simplify this code:
def vowels():
    vowels = 'aeiou'
    count = 0
    string = raw_input ("Enter a string: ")
    for i in string:
        if i in vowels:
            count += 1
    print count
Strings are iterable in Python.

for i in range(0, len(string)):
    if string[i] == vowels[i]:
This actually has a subtler problem than only counting each vowel once - it actually only tests if the first letter of the string is exactly a, if the second is exactly e and so on.. until you get past the fifth. It will try to test string[5] == vowels[5] - which gives an error.
You don't want to use i to look into vowels, you want a nested loop with a second index that will make sense for vowels - eg,
for i in range(len(string)):
    for j in range(len(vowels)):
        if string[i] == vowels[j]:
            count += 1
This can be simplified further by realising that, in Python, you very rarely want to iterate over the indexes into a sequence - the for loop knows how to iterate over everything that you can do string[0], string[1] and so on, giving:
for s in string:
    for v in vowels:
        if s == v:
            count += 1
The inner loop can be simplified using the in operation on lists - it does exactly the same thing as this code, but it keeps your code's logic at a higher level (what you want to do vs. how to do it):
for s in string:
    if s in vowels:
        count += 1
Now, it turns out that Python lets you do math with booleans (which is what s in vowels gives you) and ints - True behaves as 1, False as 0, so True + True + False is 2. This leads to a one-liner using a generator expression and sum:
sum(s in vowels for s in string)
Which reads as 'for every character in string, count how many are in vowels'.
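For anyone on Python 3, the same idea might look like the sketch below. The names VOWELS and count_vowels are illustrative, and lowercasing the input first is an added assumption so that uppercase vowels are counted too.

```python
VOWELS = set("aeiou")


def count_vowels(text: str) -> int:
    """Count the vowels in text, ignoring case."""
    # Each membership test yields a bool; sum treats True as 1 and False as 0.
    return sum(ch in VOWELS for ch in text.lower())


print(count_vowels("The lazy DOG jumped Over"))  # 7
```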

You can use filter for a one-liner:
print len(filter(lambda ch:ch.lower() in "aeiou","This is a String"))
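Note that the line above relies on Python 2 behaviour. In Python 3, filter returns a lazy iterator, so len cannot be applied to it directly. A rough Python 3 equivalent, with text and vowel_count as illustrative names:

```python
text = "This is a String"

# filter is lazy in Python 3, so build a list before asking for its length.
vowel_count = len(list(filter(lambda ch: ch.lower() in "aeiou", text)))
print(vowel_count)  # 4
```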

Here's a more condensed version using sum with a generator:
def vowels():
    string = raw_input("Enter a string: ")
    print sum(1 for x in string if x.lower() in 'aeiou')
vowels()

Option on a theme
Mystring = "The lazy DOG jumped Over"
Usestring = ""
count=0
for i in Mystring:
    if i.lower() in 'aeiou':
        count +=1
        Usestring +='^'
    else:
        Usestring +=' '
print (Mystring+'\n'+Usestring)
print ('Vowels =',count)
The lazy DOG jumped Over
  ^  ^    ^   ^  ^  ^ ^
Vowels = 7
