Python: Run-Length Encoding


I'm getting an error if the input contains a character without a number attached to it. For example, if the user inputs "a2bc", the output should be "aabc". I have to meet the run-length format. The decode function works if the input is "a2b1c1", but it doesn't handle a single character that has no count. I played with the conditions and the debugger, but I can't seem to meet the run-length format.
The code below is my attempt. I commented the block where I tried to fix the problem.
def decode(user_input):
    if not user_input:
        return ""
    else:
        char = user_input[0]
        num = user_input[1]
        if num.isdigit():
            result = char * int(num)
        # elif num.isalpha():
        #     # this should skip to the next two characters
        else:
            result = char * int(num)
        return result + decode(user_input[2:])
test1 = decode("a2b3c1")
test2 = decode("a2b3c")
print(test1)
print(test2)
(Note: the output for test2 should be "aabbbc")
Thank you so much.

This requires two changes: as you already figured out, if num is not actually a number, then you only use the char once and skip one character ahead. Otherwise you use the number and skip two characters ahead.
But you also need to handle a single character without a number at the end of your string. You can solve this by checking not only whether user_input is empty, but also whether it has only one character; in both cases you can simply return the string.
def decode(user_input):
    if len(user_input) < 2:
        return user_input
    char = user_input[0]
    num = user_input[1]
    if num.isdigit():
        return char * int(num) + decode(user_input[2:])
    else:
        return char + decode(user_input[1:])

You should advance by 1 instead of 2 when the next character is not a digit (i.e. the 1 is implicit):
def decode(user_input):
    if len(user_input) < 2:
        return user_input
    multiplier, skip = (int(user_input[1]), 2) if user_input[1].isdigit() else (1, 1)
    return user_input[0] * multiplier + decode(user_input[skip:])
Note that doing this recursively constrains the size of the input string you can process, because every decoded run adds a stack frame and Python enforces a maximum recursion depth (1000 by default, see sys.getrecursionlimit()).
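For longer inputs, here is a minimal iterative sketch of the same idea that avoids the recursion limit entirely. As an assumption beyond what the question asks, it reads every digit after a letter, so multi-digit counts such as "c11" also work; a letter with no digits still gets the implicit count of 1.

```python
def decode(user_input: str) -> str:
    """Iterative run-length decoder: each letter is optionally followed by a count."""
    parts = []  # collect decoded pieces and join once at the end
    i = 0
    while i < len(user_input):
        char = user_input[i]
        i += 1
        # Consume every digit that follows the letter (assumption: multi-digit counts allowed)
        start = i
        while i < len(user_input) and user_input[i].isdigit():
            i += 1
        digits = user_input[start:i]
        # No digits means the count of 1 is implicit
        count = int(digits) if digits else 1
        parts.append(char * count)
    return "".join(parts)

print(decode("a2b3c"))  # aabbbc
```

Because the loop never recurses, input length is limited only by memory.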

Related

Run Length Decoding

I have a function for run-length decoding in Python. It works with single-digit numbers, but if my number is above 9, it doesn't work. As you can see in my code, I want to print C 11 times, but it only prints it once. How can I fix it? I want to keep the code I have, without any special libraries. Example: if the input is (A2B3C11), the output should be (AABBBCCCCCCCCCCC), but currently my output is only (AABBBC).
def run_length_decoding(compressed_seq):
    seq = ''
    for i in range(0, len(compressed_seq)):
        if compressed_seq[i].isalpha() == True:
            for j in range(int(compressed_seq[i + 1])):
                seq += compressed_seq[i]
    return (seq)
print(run_length_decoding('A2B3C11'))

I'll preface this answer by saying a regex would solve this pretty nicely.
Trying to keep relatively closely to your original code without importing any additional modules:
def run_length_decoding(compressed_seq: str) -> str:
    seq = ""
    current_letter = None
    for character in compressed_seq:
        if character.isalpha():
            if current_letter is not None:
                seq += current_letter * int(number)
            current_letter = character
            number = ""
        else:
            # We assume that this is the number following the letter.
            number += character
    if current_letter is not None:
        seq += current_letter * int(number)
    return seq
Try it at https://www.mycompiler.io/view/CVsq0tCieVP
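Since the answer mentions that a regex would solve this nicely, here is a minimal sketch of that approach using re.sub with a function as the replacement. It assumes the input is a sequence of single letters, each followed by a count of one or more digits; a letter with no count is left unchanged.

```python
import re

def run_length_decoding(compressed_seq: str) -> str:
    # Group 1 is the letter, group 2 is its (possibly multi-digit) count
    return re.sub(
        r"([A-Za-z])(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        compressed_seq,
    )

print(run_length_decoding("A2B3C11"))
```

The \d+ part is what lets "C11" be read as one count instead of "C1" followed by a stray "1".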

Remove string character after run of n characters in string

Suppose you have a given string and an integer, n. Every time a character appears in the string more than n times in a row, you want to remove some of the characters so that it only appears n times in a row. For example, for the case n = 2, we would want the string 'aaabccdddd' to become 'aabccdd'. I have written this crude function that compiles without errors but doesn't quite get me what I want:
def strcut(string, n):
    for i in range(len(string)):
        for j in range(n):
            if i + j < len(string)-(n-1):
                if string[i] == string[i+j]:
                    beg = string[:i]
                    ends = string[i+1:]
                    string = beg + ends
    print(string)
These are the outputs for strcut('aaabccdddd', n):
n | output  | expected
1 | 'abcdd' | 'abcd'
2 | 'acdd'  | 'aabccdd'
3 | 'acddd' | 'aaabccddd'
I am new to Python, but I am pretty sure that my error is in line 3, 4 or 5 of my function. Does anyone have any suggestions, or know of any methods that would make this easier?

This may not answer why your code does not work, but here's an alternate solution using regex:
import re
def strcut(string, n):
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1"*n, string)
How it works: after formatting, the pattern is "(.)\1{n-1,}" with the value of n-1 filled in. If n=3, the pattern becomes "(.)\1{2,}":
(.) is a capture group that matches any single character
\1 matches whatever the first capture group matched
{2,} matches the previous token 2 or more times
The replacement string is the first capture group repeated n times.
For example: string = "aaaab" and n = 3. The first "a" is captured by (.). The next three a's ("aaa") match \1{2,}, which in this example is a{2,}. So the whole match is "a" + "aaa" = "aaaa", and it is replaced with "aaa".
regex101 can explain it better than I can.
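To see the exact pattern the f-string builds, the hypothetical helper matched_runs below constructs the same pattern and lists the runs it matches with re.finditer; strcut is repeated so the snippet runs on its own.

```python
import re

def strcut(string: str, n: int) -> str:
    # Collapse any run longer than n characters down to exactly n copies
    return re.sub(fr"(.)\1{{{n-1},}}", r"\1" * n, string)

def matched_runs(string: str, n: int) -> tuple:
    # Hypothetical helper: build the identical pattern and list the runs it matches
    pattern = fr"(.)\1{{{n-1},}}"
    return pattern, [m.group() for m in re.finditer(pattern, string)]

print(matched_runs("aaaab", 3))  # ('(.)\\1{2,}', ['aaaa'])
print(strcut("aaaab", 3))        # aaab
```

In the f-string, {{ and }} produce literal braces while {n-1} is evaluated, which is why n=3 yields the quantifier {2,}. The "b" is never matched because a single character is shorter than the minimum run of three.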

You can implement this with a stack data structure.
The idea: for each new character, check whether it is the same as the character on top of the stack. If it is, increase the counter and push the character only if the counter is still within the limit n. If it is a different character, push it and reset the counter to 1.
def func(string, n):
    stack = []
    counter = None
    for i in string:
        if not stack:
            counter = 1
            stack.append(i)
        elif stack[-1]==i:
            if counter+1<=n:
                stack.append(i)
                counter+=1
        elif stack[-1]!=i:
            stack.append(i)
            counter = 1
    return ''.join(stack)
print(func('aaabbcdaaacccdsdsccddssse', 2)=='aabbcdaaccdsdsccddsse')
print(func('aaabccdddd',1 )=='abcd')
print(func('aaabccdddd',2 )=='aabccdd')
print(func('aaabccdddd',3 )=='aaabccddd')
output
True
True
True
True

The method I would use is to create a new empty string at the start of the function and then, every time a run exceeds the allowed number of characters in the input string, simply not insert the extra characters into the output string. This is computationally efficient because it is O(n):
def strcut(string, n):
    new_string = ""
    first_c, s = "", 0  # current character and its run length ("" also handles empty input)
    for c in string:
        if c != first_c:
            first_c, s = c, 0
        s += 1
        if s > n:
            continue
        else:
            new_string += c
    return new_string
print(strcut("aabcaaabbba", 2)) # output: aabcaabba

To answer the question as asked, a character that
appears in the string more than n times in a row,
the following code is small and simple, and works fine :-)
def strcut(string: str, n: int) -> None:
    tmp = "*" * (n+1)
    for char in string:
        # Append char only if the last n characters are not already n copies of it
        if tmp[len(tmp) - n:] != char * n:
            tmp += char
    print(tmp[n+1:])
strcut("aaabccdddd", 1)
strcut("aaabccdddd", 2)
strcut("aaabccdddd", 3)
Output:
abcd
aabccdd
aaabccddd
Notes:
The character "*" in the line tmp = "*" * (n+1) can be any character that does not appear in the string; it's just a placeholder to handle the start, when there are no characters yet.
The print(tmp[n+1:]) line simply removes the n+1 "*" characters added at the beginning.

You don't need nested loops. Keep track of the current character and its count. Include characters while the count is less than or equal to n, and reset the current character and count when the character changes.
def strcut(s,n):
    result = ''                       # resulting string
    char,count = '',0                 # initial character and count
    for c in s:                       # only loop once on the characters
        if c == char: count += 1      # increase count
        else: char,count = c,1        # reset character/count
        if count<=n: result += c      # include character if count is ok
    return result

Just to give some ideas, this is a different approach. I didn't like how the inner loop over range(n) ran every time: even if I was on i=3 with n=2, I still jumped to i=4, although I had already checked that character while going through n. And since you are checking the next n characters in the string, your method doesn't keep the string in order. Here is a rough method that I find easier to read.
def strcut(string, n):
    for i in range(len(string)-1,0,-1): # I go backwards assuming you want to keep the front characters
        # Note: str.count counts every occurrence in the whole string, not just the current run
        if string.count(string[i]) > n:
            string = remove(string,i)
    print(string)
def remove(string, i):
    if i > len(string):
        return string[:i]
    return string[:i] + string[i+1:]
strcut('aaabccdddd',2)

How to make all of the permutations of a password for brute force?

So I was trying to make a program that brute forces passwords.
Firstly, I made a program for a password of length 1:
password = input('What is your password?\n')
chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
def brute_force():
    for char in chars:
        if char == password:
            return char
print(brute_force())
Then I edited it for a password of length 2:
def brute_force():
    guess = [None, None]
    for char in chars:
        guess[0] = char
        for char2 in chars:
            guess[1] = char2
            if ''.join(guess) == password:
                return ''.join(guess)
Finally I did the same for a password of length 3:
def brute_force():
    guess = [None, None, None]
    for char in chars:
        guess[0] = char
        for char2 in chars:
            guess[1] = char2
            for char3 in chars:
                guess[2] = char3
                if ''.join(guess) == password:
                    return ''.join(guess)
How could I generalize this for a variable called length, which contains the integer value of the length of the password?

You can use the following recursive function:
def brute_force(string, length, goal):
    if not length:
        if string == goal:
            return string
        return False
    for c in chars:
        s = brute_force(string + c, length - 1, goal)
        if s:
            return s
    return False
which you can call with syntax like:
>>> brute_force('', 3, 'bob')
'bob'
>>> brute_force('', 2, 'yo')
'yo'
Why does this work?
We always call the function with three arguments: string, length and goal. The variable string holds the current guess up to this point, so in the first example, string will be a partial guess on the way to bob, such as ab, bo, etc.
The next variable, length, holds how many characters remain until the string is the right length.
The last variable, goal, is the correct password, which we just pass through and compare against.
In the main body of the function, we need to first check the case where length is 0 (done by checking not length as 0 evaluates to False). This is the case when we already have a string that is the length of the goal and we just want to check whether it is correct.
If it matches, then we return the string, otherwise we return False. We return either the solution or False to indicate to the function which called us (the call above in the stack) that we found the right password (or not).
We have now finished the case where length = 0 and now need to handle the other cases.
In these cases, the aim is to take the string that we have been called with and loop through all of the characters in chars, each time calling the brute_force function (recursive) with the result of the concatenation of the string we were called with and that character (c).
This creates a tree-like effect in which every string of the target length is checked until a match is found.
We also need to know what to do with the length and goal variables when calling the next function.
Well, to handle these, we just need to think what the next function needs to know. It already has the string (as this was the result of concatenating the next character in the chars string) and the length is just going to be one less as we just added one to the string through the concatenation and the goal is clearly going to be the same - we are still searching for the same password.
Now that we have called this function, it will run through subtracting one from the length at each of the subsequent calls it makes until it eventually reaches the case where length == 0. And we are at the easy case again and already know what to do!
So, after calling it, the function will return one of two things, either False indicating that the last node did not find the password (so this would occur in the case where something like ab reached the end in our search for bob so returned False after no solution was found), or, the call could return the actual solution.
Handling these cases is simple: if we got the actual solution, we just return it up the chain. If we got a failure (False), we move on to the next character, and only after every character has failed do we return False. That tells the node above us that we did not succeed, so it can continue its own search.
So now, we just need to know how to call the function. We just need to send in an empty string and a target length and goal value and let the recursion take place.
One last thing: if you want this to be even neater, you could change the function definition to:
def brute_force(length, goal, string=''):
    ...
and change the recursive call inside it to match. This way, you could call the function simply as brute_force(3, 'bob') without having to specify that string starts out empty. This is optional; the function works without it.
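With that change applied, the whole function might look like the sketch below. chars is the same alphabet as in the question, and the recursive call passes its arguments in the new order; this is one possible way to write it, not the only one.

```python
chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

def brute_force(length, goal, string=''):
    # Base case: the guess has reached full length, so just compare it
    if not length:
        return string if string == goal else False
    # Recursive case: extend the guess by each character in turn
    for c in chars:
        s = brute_force(length - 1, goal, string + c)
        if s:
            return s
    return False

print(brute_force(3, 'bob'))  # bob
```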

In addition to the answer that shows how this works, I'd like to point out that the standard library has a function for exactly this: itertools.product. Not itertools.permutations, because that does not allow repetitions and would therefore only generate guesses in which all characters are unique:
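from itertools import product
def brute_force(password, chars, min_length, max_length):
    # Try each length in turn, shortest first
    for length in range(min_length, max_length + 1):
        # product(chars, repeat=length) yields every string of that length, repeats included
        for p in product(chars, repeat=length):
            guess = ''.join(p)
            if guess == password:
                return guess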
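A quick way to see why product is needed rather than permutations is to compare them on a two-character alphabet. This small check only illustrates the difference in how repetition is handled:

```python
from itertools import permutations, product

# product with repeat=2 allows the same character twice ("aa", "bb")
with_repeats = [''.join(p) for p in product('ab', repeat=2)]
# permutations never reuses a character, so "aa" and "bb" are missing
without_repeats = [''.join(p) for p in permutations('ab', 2)]

print(with_repeats)     # ['aa', 'ab', 'ba', 'bb']
print(without_repeats)  # ['ab', 'ba']
```

A password such as "aa" would never be generated by permutations, which is why product is the right tool here.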

Here's one solution:
password = input('What is your password? ')
chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
def brute_force(length, check_callback, guess = ""):
    if check_callback(guess):
        return guess
    elif len(guess) == length: #Reached maximum length and didn't find the password
        return None
    for char in chars:
        retval = brute_force(length, check_callback, guess = guess + char)
        if retval is not None:
            return retval
    return None #Couldn't find anything
print(brute_force(len(password), lambda guess: (guess == password))) #len(password) => cheating just for this example
length is the maximum guess length the function will go up to. check_callback should take a guess and return a truthy value if it worked. The function returns the first successful guess, or None if it couldn't find anything.
I will admit I forgot about the guess length and was reminded by #Joe Iddon's answer.
Now, that function checks for a correct answer even if the guess isn't the right length yet, which is wasteful in some circumstances. Here's a function that doesn't do that:
def brute_force(length, check_callback, guess = ""):
    if len(guess) == length: #Reached maximum length
        return (guess if check_callback(guess) else None)
    for char in chars:
        retval = brute_force(length, check_callback, guess = guess + char)
        if retval is not None:
            return retval
    return None #Couldn't find anything
print(brute_force(len(password), lambda guess: guess == password)) #len(password) => cheating just for this example

Try this:
password = input('What is your password?\n')
chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
answer = ''
for i in range(len(password)):
    for char in chars:
        if char==password[i]:
            answer += char
print(answer)
Instead of using nested loops it guesses each character in turn.

Code is not validating: TypeError cannot concatenate 'str' and 'int' objects

While loops are new to me and I'm having trouble getting my code to validate.
Description
In this exercise your function will receive two parameters: a string (long_word) and a character (char). Use a while loop to go through all the letters in the string and build a new string made up from those letters until you find the char. You may assume that each string will contain the passed in character (char).
This is my code.
def letters_up_to_char(long_word, char):
    new = ""
    i = 0
    while i != char:
        for letter in long_word:
            new += letter
        i += 1
    return new
Example output
letters_up_to_char('coderoxthesox', 'x') -> 'codero'
letters_up_to_char('abcdefghijklmnop', 'f') -> 'abcde'
When I go to run my code I get:
TypeError: cannot concatenate 'str' and 'int' objects

To get rid of TypeError: cannot concatenate 'str' and 'xyz' objects, just convert the object being concatenated to a string. If your code was string + num or string += num, convert num to a string first, like so:
str(num)
BUT, your code won't return the desired output. See why below:
If I'm not mistaken, the code shouldn't compile because when new is defined, you don't close the double quotes. Or if you are using single quotes, change your code and your question to reflect your change.
When I ran your code, it went into an infinite loop, the beginner's worst enemy! In your code, the
for letter in long_word:
    new += letter
part is the same as saying new += long_word: you are just adding the characters one at a time instead of the whole string at once.
Your code can then be rewritten as follows:
def letters_up_to_char(long_word, char):
    new = ""
    i = 0
    while i != char:
        new += long_word
        i += 1
    return new
Now it is clear what your code is doing. It just adds the whole word to new each time the while loop runs, and the while loop keeps running as long as i != char. Since i is an int and char is a str, i != char is always true. Infinite loop in the making!
Your function should look like this:
def letters_up_to_char(long_word, char):
    new = ""
    i = 0
    while i < len(long_word) and long_word[i] != char:
        new += long_word[i]
        i += 1
    return new
Explanation:
Go through each character in long_word from the start (this can be done more easily with a for...in loop, but I'm using a while loop as per your request) and, until the current character equals char, add that character to new.
This code returns the desired output for both your test cases.

Considering
You may assume that each string will contain the passed in character (char).
not including the char:
def letters_up_to_char(long_word, char):
    i=0
    while long_word[i] != char:
        i+=1
    return long_word[:i]
including the char:
def letters_up_to_char(long_word, char):
    i=0
    while long_word[i] != char:
        i+=1
    return long_word[:i+1]
Though a more pythonic way is, e.g.:
def letters_up_to_char(long_word, char):
    return long_word.partition(char)[0]
I suggest using http://docs.python.org/3/tutorial/index.html as a reference when completing your assignments.

i is an int, so comparing it with the str char never finds a match (i != char is always True), and the loop never stops. Compare the character at position i instead:
while i < len(long_word) and long_word[i] != char:

There are a couple of ways similar to this to write this code. In your example, the while i != char line results in an infinite loop, because i is an int and char is a str, so the two are never equal. I would write it with either a for or a while, as below:
def letters_while(long_word, char):
    new = ""
    i = 0
    # A second condition is needed to prevent an IndexError
    # in the case that char is not in long_word; the bounds
    # check must come first so long_word[i] is never out of range
    while i < len(long_word) and long_word[i] != char:
        new += long_word[i]
        i += 1
    return new
def letters_for(long_word, char):
    new = ""
    for letter in long_word:
        # Stop at the first occurrence of char
        if letter == char:
            break
        new += letter
    return new
As a note, these are easy to understand examples, and a better way to do this would be
long_word.split(char)[0]

Removing all occurrences of a letter and replacing with a count of how many errors

I've got code that, in theory, should take an input of DNA that has errors in it (N in my case), remove all of the errors, and place a count of how many N's were removed at that location.
My code:
class dnaString (str):
    def __new__(self,s):
        #the inputted DNA sequence is converted as a string in all upper cases
        return str.__new__(self,s.upper())
    def getN (self):
        #returns the count of value of N in the sequence
        return self.count("N")
    def remove(self):
        print(self.replace("N", "{}".format(coolString.getN())))
#asks the user to input a DNA sequence
dna = input("Enter a dna sequence: ")
#takes the inputted DNA sequence, ???
coolString = dnaString(dna)
coolString.remove()
When I input AaNNNNNNGTC I should get AA{6}GTC as the answer, but when I run my code it prints out AA666666GTC because I ended up replacing every error with the count. How do I go about just inputting the count once?

If you want to complete the task without external libraries, you can do it with the following:
def fix_dna(dna_str):
    fixed_str = ''
    n_count = 0
    n_found = False
    for i in range(len(dna_str)):
        if dna_str[i].upper() == 'N':
            if not n_found:
                n_found = True
            n_count += 1
        elif n_found:
            fixed_str += '{' + str(n_count) + '}' + dna_str[i]
            n_found = False
            n_count = 0
        elif not n_found:
            fixed_str += dna_str[i]
    # Flush a run of N's that reaches the end of the string
    if n_found:
        fixed_str += '{' + str(n_count) + '}'
    return fixed_str

Not the cleanest solution, but does the job
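from itertools import accumulate
from operator import add
s = "AaNNNNNNGTC"
# Replace the longest runs first ('N'*100 down to 'N') so shorter runs don't split longer ones
for i in reversed(list(enumerate(accumulate('N'*100, add)))):
    s=s.replace(i[1], '{'+str(i[0] + 1)+'}')
print(s)  # Aa{6}GTC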

That's expected, from the documentation:
Return a copy of string s with all occurrences of substring old replaced by new.
One solution could be using regexes. re.sub can take a callable that generates the replacement string:
import re
def replace_with_count(x):
    return "{%d}" % len(x.group())
test = 'AaNNNNNNGTNNC'
print(re.sub('N+', replace_with_count, test))
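Applied to the class from the question, a sketch of remove() using the same re.sub idea might look like this. It keeps the upper-casing from __new__, but returns the new string instead of printing it, which is a choice made for this sketch rather than something the question requires.

```python
import re

class dnaString(str):
    def __new__(cls, s):
        # Store the sequence in upper case, as in the question
        return str.__new__(cls, s.upper())

    def getN(self):
        # Total number of N's in the whole sequence
        return self.count("N")

    def remove(self):
        # Replace each run of N's with that run's own length, e.g. NNN -> {3}
        return re.sub("N+", lambda m: "{%d}" % len(m.group()), self)

print(dnaString("AaNNNNNNGTC").remove())  # AA{6}GTC
```

Because the count comes from each match rather than from getN(), separate runs get separate counts instead of the overall total.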
