I have a function for run-length decoding in Python. It works with single-digit numbers, but if a number is above 9, it doesn't work. As you can see in my code, I want to print C 11 times, but it only prints it once. How can I fix it? I want to keep the code I have, without any special libraries. Example: if the input is (A2B3C11), the output should be (AABBBCCCCCCCCCCC), but currently my output is only (AABBBC).
def run_length_decoding(compressed_seq):
    seq = ''
    for i in range(0, len(compressed_seq)):
        if compressed_seq[i].isalpha() == True:
            for j in range(int(compressed_seq[i + 1])):
                seq += compressed_seq[i]
    return (seq)

print(run_length_decoding('A2B3C11'))
I'll preface this answer by saying a regex would solve this pretty nicely.
Trying to keep relatively closely to your original code without importing any additional modules:
def run_length_decoding(compressed_seq: str) -> str:
    seq = ""
    current_letter = None
    for character in compressed_seq:
        if character.isalpha():
            if current_letter is not None:
                seq += current_letter * int(number)
            current_letter = character
            number = ""
        else:
            # We assume that this is the number following the letter.
            number += character
    if current_letter is not None:
        seq += current_letter * int(number)
    return seq
Try it at https://www.mycompiler.io/view/CVsq0tCieVP
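For comparison, here is a minimal sketch of the regex approach mentioned at the start of this answer. It uses only the standard-library re module and assumes every letter is followed by a (possibly multi-digit) count; run_length_decoding_re is a hypothetical name:

```python
import re

def run_length_decoding_re(compressed_seq: str) -> str:
    """Decode strings such as 'A2B3C11'.

    Assumes every letter is followed by a count, which may have
    several digits.
    """
    # Each match is a (letter, digits) pair, e.g. ('C', '11')
    pairs = re.findall(r"([A-Za-z])(\d+)", compressed_seq)
    return "".join(letter * int(count) for letter, count in pairs)

print(run_length_decoding_re("A2B3C11"))  # AABBB followed by eleven C's
```

Because \d+ is greedy, "11" is captured as a single number instead of two separate digits.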
Related
I am coding a game similar to Wordle.
I need to compare two strings:
If the character and position are the same, return -
If the character in guess is in the answer but in the wrong position, return *
If the character is not in the answer at all, return .
If my guess repeats a character that appears only once in the answer (e.g. guess: accept, answer: castle), * can only be returned once, meaning the expected output would be **.*.*
I can't figure out how to iterate over the string while also taking the position into account.
def process(guess: str, answer: str) -> str:
    output = ""
    for i, ch in enumerate(guess):
        if ch not in answer:
            output += '.'
        elif ch != answer[i]:
            output += '*'
        else:
            output += '-'
    return output
You don't track the characters you have already identified in the answer. You can add a set of identified characters and check against it:
def process(guess: str, answer: str) -> str:
    output = ""
    already_identified_characters = set()
    for i, ch in enumerate(guess):
        if ch not in answer or ch in already_identified_characters:
            output += "."
        elif ch != answer[i]:
            output += "*"
        else:
            output += "-"
        already_identified_characters.add(ch)
    return output
If guess and answer are of equal length, this is how you could implement it:
def process(guess: str, answer: str) -> str:
    output = []
    misplaced_chars = set()
    for g, a in zip(guess, answer):
        if g == a:
            # Identical character on same location
            output.append('-')
        elif g in answer and g not in misplaced_chars:
            # Character exists in answer
            output.append('*')
            misplaced_chars.add(g)
        else:
            # Wrong guess
            output.append('.')
    return ''.join(output)
Using Counter to keep track of the letters in the answer makes sure that letters repeated in the answer are handled properly as well.
Basically, you keep track of each letter's count in the answer and subtract from it as you encounter matches.
from collections import Counter

def process(guess: str, answer: str) -> str:
    countAnswer = Counter(answer)
    output = ""
    for i, ch in enumerate(guess):
        if ch not in answer:
            output += '.'
        elif countAnswer[ch] == 0:
            output += '.'
        elif ch != answer[i] and countAnswer[ch] != 0:
            output += '*'
            countAnswer[ch] -= 1
        else:
            output += '-'
    return output
This should behave very similarly to Wordle's treatment of repeated characters.
Various inputs and their outputs:
>>> process("crate","watch")
'*.**.'
>>> process("crate","slosh")
'.....'
>>> process("pious","slosh")
'..-.*'
>>> process("pesos","slosh")
'..***'
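One caveat: in Wordle, exact matches take priority over misplaced letters. With guess "geese" and answer "those", the Counter version above returns "*..-." because the first e uses up the answer's only e before the correctly placed final e is reached. A two-pass sketch that marks exact matches first, assuming guess and answer have equal length (process_two_pass is a hypothetical name):

```python
from collections import Counter

def process_two_pass(guess: str, answer: str) -> str:
    """'-' = right place, '*' = wrong place, '.' = not available.

    Assumes len(guess) == len(answer).
    """
    output = ["."] * len(guess)
    # Answer letters that were NOT consumed by an exact match
    remaining = Counter()

    # Pass 1: exact matches, and count the unmatched answer letters
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            output[i] = "-"
        else:
            remaining[a] += 1

    # Pass 2: misplaced letters, only while unmatched copies remain
    for i, g in enumerate(guess):
        if output[i] != "-" and remaining[g] > 0:
            output[i] = "*"
            remaining[g] -= 1

    return "".join(output)

print(process_two_pass("geese", "those"))  # ...--
```

For the four examples above it gives the same outputs as the Counter version; it differs only when an exact match comes after a misplaced copy of the same letter.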
For example, given the string aaaacccc, I need the output to be 4a4c. Is there a way to do this without using any advanced methods, such as libraries or functions?
Also, if someone knows how to do the reverse, turning "4a4c" into aaaacccc, that would be great to know.
This will do the work in one iteration.
Keep two temporary variables, one for the current character and another for the count of that character, plus one variable for the result.
Iterate through the string and keep increasing the count as long as the character matches the previous one.
If it doesn't, append the count and the character to the result, then update the current character and reset the count to 1.
At the end, append the count and the last character to the result. Done!
input_str = "aaaacccc"
if input_str.isalpha():
    current_str = input_str[0]
    count = 0
    final_string = ""
    for i in input_str:
        if i == current_str:
            count += 1
        else:
            final_string += str(count) + current_str
            current_str = i
            count = 1
    final_string += str(count) + current_str
    print(final_string)
Here is another solution, which also includes a patchwork reverse operation like the one you mentioned in your post. Both run in O(n) and are fairly simple to understand. The encode() is basically identical to the one posted by Akanasha; he was just a bit faster posting his answer while I was writing decode().
def encode(x):
    if not x.isalpha():
        raise ValueError()
    output = ""
    current_l = x[0]
    counter = 0
    for pos in x:
        if current_l != pos:
            output += str(counter) + current_l
            counter = 1
            current_l = pos
        else:
            counter += 1
    return output + str(counter) + current_l

def decode(x):
    output = ""
    i = 0
    while i < len(x):
        if x[i].isnumeric():
            n = i + 1
            while x[n].isnumeric():
                n += 1
            output += int(x[i:n]) * x[n]
            i = n
        i += 1
    return output

test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasasggggggbbbbdd"
test1 = encode(test)
print(test1)
test2 = decode(test1)
print(test2)
print(test == test2)
Yes, you don't need any libraries:
list1 = list("aaaacccc")
letters = []
for i in list1:
    if i not in letters:
        letters.append(i)
string = ""
for i in letters:
    string += str(list1.count(i))
    string += str(i)
print(string)
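Basically, it loops through the list, finds the unique letters, and then prints each letter's total count followed by the letter itself. Note that it counts all occurrences rather than consecutive runs, so it only matches run-length encoding when each letter appears in a single run (for example, "aabaa" gives 4a1b rather than 2a1b2a). Reversing follows the same idea: print each letter the given number of times.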
I'm getting an error if the input contains a character without a number attached to it. For example, if the user inputs "a2bc", the output should be "aabc". I have to meet the run-length format. The decode function works if the input is "a2b1c1", but a character without a count isn't handled. I played with the conditions and the debugger, but I can't get it to handle this run-length format.
The code below is my attempt. I commented out the block where I tried to fix the problem.
def decode(user_input):
    if not user_input:
        return ""
    else:
        char = user_input[0]
        num = user_input[1]
        if num.isdigit():
            result = char * int(num)
        # elif num.isalpha():
        #     # this should skip to the next two characters
        else:
            result = char * int(num)
        return result + decode(user_input[2:])

test1 = decode("a2b3c1")
test2 = decode("a2b3c")
print(test1)
print(test2)
(Note: the output for test2 should be "aabbbc")
This requires two changes. As you already figured out, if num is not actually a number, you use char only once and advance one character; otherwise, you use the number and advance two characters.
But you also need to handle a single character without a number at the end of your string. You can solve this by checking not only whether user_input is empty but also whether it has only one character; in both cases you can simply return the string.
def decode(user_input):
    if len(user_input) < 2:
        return user_input
    char = user_input[0]
    num = user_input[1]
    if num.isdigit():
        return char * int(num) + decode(user_input[2:])
    else:
        return char + decode(user_input[1:])
You should advance by 1 instead of 2 when the next character is not a digit (i.e. the 1 is implicit):
def decode(user_input):
    if len(user_input) < 2: return user_input
    multiplier, skip = (int(user_input[1]), 2) if user_input[1].isdigit() else (1, 1)
    return user_input[0] * multiplier + decode(user_input[skip:])
Note that doing this recursively limits the size of the input string you can process, because of Python's maximum recursion depth (1000 by default in CPython).
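Both recursive versions also read only a single digit after each letter. An iterative sketch that avoids the recursion limit and also accepts multi-digit counts, assuming the input is letters each optionally followed by digits (decode_iterative is a hypothetical name):

```python
def decode_iterative(user_input: str) -> str:
    """Decode strings such as 'a2b3c' or 'a12b'.

    A character with no count after it is used once; counts may have
    several digits.
    """
    result = []
    i = 0
    while i < len(user_input):
        char = user_input[i]
        i += 1
        # Collect every digit that follows the character
        start = i
        while i < len(user_input) and user_input[i].isdigit():
            i += 1
        digits = user_input[start:i]
        # No digits means an implicit count of 1
        count = int(digits) if digits else 1
        result.append(char * count)
    return "".join(result)

print(decode_iterative("a2b3c"))  # aabbbc
```

Collecting the pieces in a list and joining once at the end avoids building many intermediate strings.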
I got homework to do "Run Length Encoding" in Python. I wrote some code, but it prints something I don't want: it prints just the string, exactly as it was written. I want it to print each character of the string only once, and, if the character appears more than once, the number of times it appears next to it. How can I do this?
For example:
the string: 'lelamfaf'
the result: 'l2ea2mf2'
def encode(input_string):
    count = 1
    prev = ''
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
                #print lst
            count = 1
            prev = character
        else:
            count += 1
    else:
        entry = (character, count)
        lst.append(entry)
    return lst

def decode(lst):
    q = ""
    for character, count in lst:
        q += character * count
    return q

def main():
    s = 'emanuelshmuel'
    print(decode(encode(s)))

if __name__ == "__main__":
    main()
Three remarks:
You should use the existing method str.count for the encode function.
The decode function repeats each character count times; it does not output the character followed by its count.
Actually, the decode(encode(string)) combination is effectively an identity function: decoding the encoding gives you back the starting string, which is why your program just prints the input.
Here is working code:
def encode(input_string):
    characters = []
    result = ''
    for character in input_string:
        # End loop if all characters were counted
        if set(characters) == set(input_string):
            break
        if character not in characters:
            characters.append(character)
            count = input_string.count(character)
            result += character
            if count > 1:
                result += str(count)
    return result

def main():
    s = 'emanuelshmuel'
    print(encode(s))
    assert encode(s) == 'e3m2anu2l2sh'
    s = 'lelamfaf'
    print(encode(s))
    assert encode(s) == 'l2ea2mf2'

if __name__ == "__main__":
    main()
I came up with this quickly, so there may be room for optimization (for example, for very large strings it would be better to look letters up in a set of the letters already seen rather than in the char_lst list itself). But it does the job for short strings:
text = 'lelamfaf'
counts = {s: text.count(s) for s in text}
char_lst = []
for l in text:
    if l not in char_lst:
        char_lst.append(l)
        if counts[l] > 1:
            char_lst.append(str(counts[l]))
encoded_str = ''.join(char_lst)
print(encoded_str)
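Note that the requested 'l2ea2mf2' format counts every occurrence of a character, not consecutive runs, so it is a frequency count rather than strict run-length encoding. A single-pass sketch with a plain dict, assuming Python 3.7+ where dicts preserve insertion order (encode_counts is a hypothetical name):

```python
def encode_counts(text: str) -> str:
    """Each distinct character once, in order of first appearance,
    followed by its total count when that count is greater than 1."""
    counts = {}
    # One pass to count; dict keys keep their insertion order
    # (Python 3.7+), i.e. the order of first appearance.
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    parts = []
    for ch, n in counts.items():
        parts.append(ch if n == 1 else ch + str(n))
    return "".join(parts)

print(encode_counts("lelamfaf"))  # l2ea2mf2
```

Unlike calling text.count for every character, this visits the string only once.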
I've got code that, in theory, should take a DNA sequence containing errors (N in my case), remove them, and put a count of how many N's were removed at that location.
My code:
class dnaString (str):
    def __new__(self,s):
        #the inputted DNA sequence is converted as a string in all upper cases
        return str.__new__(self,s.upper())
    def getN (self):
        #returns the count of value of N in the sequence
        return self.count("N")
    def remove(self):
        print(self.replace("N", "{}".format(coolString.getN())))

#asks the user to input a DNA sequence
dna = input("Enter a dna sequence: ")
#takes the inputted DNA sequence, ???
coolString = dnaString(dna)
coolString.remove()
When I input AaNNNNNNGTC I should get AA{6}GTC as the answer, but when I run my code it prints AA666666GTC, because every N gets replaced with the count. How do I insert the count only once?
If you want to complete the task without external libraries, you can do it with the following:
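def fix_dna(dna_str):
    fixed_str = ''
    n_count = 0
    n_found = False
    for i in range(len(dna_str)):
        if dna_str[i].upper() == 'N':
            # Inside a run of N's: just count it
            if not n_found:
                n_found = True
            n_count += 1
        elif n_found:
            # The run just ended: emit the count, then the current character
            fixed_str += '{' + str(n_count) + '}' + dna_str[i]
            n_found = False
            n_count = 0
        elif not n_found:
            fixed_str += dna_str[i]
    # Flush a run of N's that reaches the end of the string
    if n_found:
        fixed_str += '{' + str(n_count) + '}'
    return fixed_str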
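Not the cleanest solution, but it does the job:
from itertools import accumulate
from operator import add

s = "AaNNNNNNGTC"
# Replace the longest runs first ('N'*100 down to 'N'), so a run of six
# N's becomes '{6}' instead of six separate '{1}'s.
# A run longer than 100 N's would be split into several counts.
for i in reversed(list(enumerate(accumulate('N'*100, add)))):
    s = s.replace(i[1], '{' + str(i[0] + 1) + '}')
print(s)  # Aa{6}GTC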
That's expected, from the documentation:
Return a copy of string s with all occurrences of substring old replaced by new.
One solution could be using regexes. The re.sub can take a callable that generates the replacement string:
import re

def replace_with_count(x):
    # x is the match object for one run of N's; return "{run length}"
    return "{%d}" % len(x.group())

test = 'AaNNNNNNGTNNC'
print(re.sub('N+', replace_with_count, test))  # Aa{6}GT{2}C
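The same regex can be wired into the question's dnaString class. This is a sketch that assumes you want remove() to return the result instead of printing it, and it uses self instead of the global coolString:

```python
import re

class dnaString(str):
    def __new__(cls, s):
        # Normalise the sequence to upper case, as in the question
        return str.__new__(cls, s.upper())

    def getN(self):
        # Total number of N's in the whole sequence
        return self.count("N")

    def remove(self):
        # Replace each run of consecutive N's with "{run length}"
        return re.sub(r"N+", lambda m: "{%d}" % len(m.group()), self)

print(dnaString("AaNNNNNNGTNNC").remove())  # AA{6}GT{2}C
```

Because the input is upper-cased first, lowercase n's are counted as errors too.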