Run Length Encoding in python

Run Length Encoding in python - python

i got homework to do "Run Length Encoding" in python and i wrote a code but it is print somthing else that i dont want. it prints just the string(just like he was written) but i want that it prints the string and if threre are any characthers more than one time in this string it will print the character just one time and near it the number of time that she appeard in the string. how can i do this?
For example:
the string : 'lelamfaf"
the result : 'l2ea2mf2
def encode(input_string):
count = 1
prev = ''
lst = []
for character in input_string:
if character != prev:
if prev:
entry = (prev, count)
lst.append(entry)
#print lst
count = 1
prev = character
else:
count += 1
else:
entry = (character, count)
lst.append(entry)
return lst
def decode(lst):
q = ""
for character, count in lst:
q += character * count
return q
def main():
s = 'emanuelshmuel'
print decode(encode(s))
if __name__ == "__main__":
main()

Three remarks:
You should use the existing method str.count for the encode function.
The decode function will print count times a character, not the character and its counter.
Actually the decode(encode(string)) combination is a coding function since you do not retrieve the starting string from the encoding result.
Here is a working code:
def encode(input_string):
characters = []
result = ''
for character in input_string:
# End loop if all characters were counted
if set(characters) == set(input_string):
break
if character not in characters:
characters.append(character)
count = input_string.count(character)
result += character
if count > 1:
result += str(count)
return result
def main():
s = 'emanuelshmuel'
print encode(s)
assert(encode(s) == 'e3m2anu2l2sh')
s = 'lelamfaf'
print encode(s)
assert(encode(s) == 'l2ea2mf2')
if __name__ == "__main__":
main()

Came up with this quickly, maybe there's room for optimization (for example, if the strings are too large and there's enough memory, it would be better to use a set of the letters of the original string for look ups rather than the list of characters itself). But, does the job fairly efficiently:
text = 'lelamfaf'
counts = {s:text.count(s) for s in text}
char_lst = []
for l in text:
if l not in char_lst:
char_lst.append(l)
if counts[l] > 1:
char_lst.append(str(counts[l]))
encoded_str = ''.join(char_lst)
print encoded_str

Related

How to group consecutive letters in a string in Python?

For example: string = aaaacccc, then I need the output to be 4a4c. Is there a way to do this without using any advanced methods, such as libraries or functions?
Also, if someone knows how to do the reverse: turning "4a4c: into aaaacccc, that would be great to know.

This will do the work in one iteration
Keep two temp variable one for current character, another for count of that character and one variable for the result.
Just iterate through the string and keep increasing the count if it matches with the previous one.
If it doesn't then update the result with count and value of character and update the character and count.
At last add the last character and the count to the result. Done!
input_str = "aaaacccc"
if input_str.isalpha():
current_str = input_str[0]
count = 0
final_string = ""
for i in input_str:
if i==current_str:
count+=1
else:
final_string+=str(count)+current_str
current_str = i
count = 1
final_string+=str(count)+current_str
print (final_string)

Another solution and I included even a patchwork reverse operation like you mentioned in your post. Both run in O(n) and are fairly simple to understand. The encode is basically identical one posted by Akanasha, he was just a bit faster in posting his answer while i was writing the decode().
def encode(x):
if not x.isalpha():
raise ValueError()
output = ""
current_l = x[0]
counter = 0
for pos in x:
if current_l != pos:
output += str(counter) + current_l
counter = 1
current_l = pos
else:
counter += 1
return output + str(counter) + current_l
def decode(x):
output = ""
i = 0
while i < len(x):
if x[i].isnumeric():
n = i + 1
while x[n].isnumeric():
n += 1
output += int(x[i:n])*x[n]
i = n
i += 1
return output
test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasasggggggbbbbdd"
test1 = encode(test)
print(test1)
test2 = decode(test1)
print(test2)
print(test == test2)

yes, you do not need any libraries:
list1 = list("aaaacccc")
letters = []
for i in list1:
if i not in letters:
letters.append(i)
string = ""
for i in letters:
string += str(list1.count(i))
string+=str(i)
print(string)
Basically, it loops through the list, finds the unique letters and then prints the count with the letter itself. Reversing would be the same function, just print the amount.

How to remove Triplicate Letters in Python

So I'm a little confused as far as putting this small code together. My teacher gave me this info:
Iterate over the string and remove any triplicated letters (e.g.
"byeee mmmy friiiennd" becomes "bye my friennd"). You may assume any
immediate following same letters are a triplicate.
I've mostly only seen examples for duplicates, so how do I remove triplicates? My code doesn't return anything when I run it.
def removeTriplicateLetters(i):
result = ''
for i in result:
if i not in result:
result.append(i)
return result
def main():
print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()

I have generalized the scenario with "n". In your case, you can pass n=3 as below
def remove_n_plicates(input_string, n):
i=0
final_string = ''
if not input_string:
return final_string
while(True):
final_string += input_string[i]
if input_string[i:i+n] == input_string[i]*n:
i += n
else:
i += 1
if i >= len(input_string):
break
return final_string
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd
You can use this for any "n" value now (where n > 0 and n < length of input string)

Your code returns an empty string because that's exactly what you coded:
result = ''
for i in result:
...
return result
Since result is an empty string, you don't enter the loop at all.
If you did enter the loop you couldn't return anything:
for i in result:
if i not in result:
The if makes no sense: to get to that statement, i must be in result
Instead, do as #newbie showed you. Iterate through the string, looking at a 3-character slice. If the slice is equal to 3 copies of the first character, then you've identified a triplet.
if input_string[i:i+n] == input_string[i]*n:

Without going in to writing the code to resolve the problem.
When you iterate over the string, add that iteration to a new string.
If the next iteration is the same as the previous iteration then do not add that to the new string.
This will catch both the triple and the double characters in your problem.
Tweaked a previous answer to remove a few lines that were not needed.
def remove_n_plicates(input_string, n):
i=0
result = ''
while(True):
result += input_string[i]
if input_string[i:i+n] == input_string[i]*n:
i += n
else:
i += 1
if i >= len(input_string):
break
return result
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd

Here's a fun way using itertools.groupby:
def removeTriplicateLetters(s):
return ''.join(k*(l//3+l%3) for k,l in ((k,len(list(g))) for k, g in groupby(s)))
>>> removeTriplicateLetters('byeee mmmy friiiennd')
'bye my friennd'

just modifying #newbie solution and using stack data structure as solution
def remove_n_plicates(input_string, n):
if input_string =='' or n<1:
return None
w = ''
c = 0
if input_string!='':
tmp =[]
for i in range(len(input_string)):
if c==n:
w+=str(tmp[-1])
tmp=[]
c =0
if tmp==[]:
tmp.append(input_string[i])
c = 1
else:
if input_string[i]==tmp[-1]:
tmp.append(input_string[i])
c+=1
elif input_string[i]!=tmp[-1]:
w+=str(''.join(tmp))
tmp=[input_string[i]]
c = 1
w+=''.join(tmp)
return w
input_string = "byeee mmmy friiiennd nnnn"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
output
bye my friennd nn

so this is a bit dirty but it's short and works
def removeTriplicateLetters(i):
result,string = i[:2],i[2:]
for k in string:
if result[-1]==k and result[-2]==k:
result=result[:-1]
else:
result+=k
return result
print(removeTriplicateLetters('byeee mmmy friiiennd'))
bye my friennd

You have already got a working solution. But here, I come with another way to achieve your goal.
def removeTriplicateLetters(sentence):
"""
:param sentence: The sentence to transform.
:param words: The words in the sentence.
:param new_words: The list of the final words of the new sentence.
"""
words = sentence.split(" ") # split the sentence into words
new_words = []
for word in words: # loop through words of the sentence
new_word = []
for char in word: # loop through characters in a word
position = word.index(char)
if word.count(char) >= 3:
new_word = [i for i in word if i != char]
new_word.insert(position, char)
new_words.append(''.join(new_word))
return ' '.join(new_words)
def main():
print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()
Output: bye my friennd

Split a string, loop through it character by character, and replace specific ones?

I'm working on an assignment and have gotten stuck on a particular task. I need to write two functions that do similar things. The first needs to correct capitalization at the beginning of a sentence, and count when this is done. I've tried the below code:
def fix_capitalization(usrStr):
count = 0
fixStr = usrStr.split('.')
for sentence in fixStr:
if sentence[0].islower():
sentence[0].upper()
count += 1
print('Number of letters capitalized: %d' % count)
print('Edited text: %s' % fixStr)
Bu receive an out of range error. I'm getting an "Index out of range error" and am not sure why. Should't sentence[0] simply reference the first character in that particular string in the list?
I also need to replace certain characters with others, as shown below:
def replace_punctuation(usrStr):
s = list(usrStr)
exclamationCount = 0
semicolonCount = 0
for sentence in s:
for i in sentence:
if i == '!':
sentence[i] = '.'
exclamationCount += 1
if i == ';':
sentence[i] = ','
semicolonCount += 1
newStr = ''.join(s)
print(newStr)
print(semicolonCount)
print(exclamationCount)
But I'm struggling to figure out how to actually do the replacing once the character is found. Where am I going wrong here?
Thank you in advance for any help!

I would use str.capitalize over str.upper on one character. It also works correctly on empty strings. The other major improvement would be to use enumerate to also track the index as you iterate over the list:
def fix_capitalization(s):
sentences = [sentence.strip() for sentence in s.split('.')]
count = 0
for index, sentence in enumerate(sentences):
capitalized = sentence.capitalize()
if capitalized != sentence:
count += 1
sentences[index] = capitalized
result = '. '.join(sentences)
return result, count
You can take a similar approach to replacing punctuation:
replacements = {'!': '.', ';': ','}
def replace_punctuation(s):
l = list(s)
counts = dict.fromkeys(replacements, 0)
for index, item in enumerate(l):
if item in replacements:
l[index] = replacements[item]
counts[item] += 1
print("Replacement counts:")
for k, v in counts.items():
print("{} {:>5}".format(k, v))
return ''.join(l)

There are better ways to do these things but I'll try to change your code minimally so you will learn something.
The first function's issue is that when you split the sentence like "Hello." there will be two sentences in your fixStr list that the last one is an empty string; so the first index of an empty string is out of range. fix it by doing this.
def fix_capitalization(usrStr):
count = 0
fixStr = usrStr.split('.')
for sentence in fixStr:
# changed line
if sentence != "":
sentence[0].upper()
count += 1
print('Number of letters capitalized: %d' % count)
print('Edited text: %s' % fixStr)
In second snippet you are trying to write, when you pass a string to list() you get a list of characters of that string. So all you need to do is to iterate over the elements of the list and replace them and after that get string from the list.
def replace_punctuation(usrStr):
newStr = ""
s = list(usrStr)
exclamationCount = 0
semicolonCount = 0
for c in s:
if c == '!':
c = '.'
exclamationCount += 1
if c == ';':
c = ','
semicolonCount += 1
newStr = newStr + c
print(newStr)
print(semicolonCount)
print(exclamationCount)
Hope I helped!

Python has a nice build in function for this
for str in list:
new_str = str.replace('!', '.').replace(';', ',')
You can write a oneliner to get a new list
new_list = [str.replace('!', '.').replace(';', ',') for str in list]
You also could go for the split/join method
new_str = '.'.join(str.split('!'))
new_str = ','.join(str.split(';'))
To count capitalized letters you could do
result = len([cap for cap in str if str(cap).isupper()])
And to capitalize them words just use the
str.capitalize()
Hope this works out for you

Removing all occurrences of a letter and replacing with a count of how many errors

I've got a code that in theory should take an input of DNA that has errors in it and removes all errors (N in my case) and places a count of how many N's were removing in that location.
My code:
class dnaString (str):
def __new__(self,s):
#the inputted DNA sequence is converted as a string in all upper cases
return str.__new__(self,s.upper())
def getN (self):
#returns the count of value of N in the sequence
return self.count("N")
def remove(self):
print(self.replace("N", "{}".format(coolString.getN())))
#asks the user to input a DNA sequence
dna = input("Enter a dna sequence: ")
#takes the inputted DNA sequence, ???
coolString = dnaString(dna)
coolString.remove()
When I input AaNNNNNNGTC I should get AA{6}GTC as the answer, but when I run my code it prints out AA666666GTC because I ended up replacing every error with the count. How do I go about just inputting the count once?

If you want to complete the task without external libraries, you can do it with the following:
def fix_dna(dna_str):
fixed_str = ''
n_count = 0
n_found = False
for i in range(len(dna_str)):
if dna_str[i].upper() == 'N':
if not n_found:
n_found = True
n_count += 1
elif n_found:
fixed_str += '{' + str(n_count) + '}' + dna_str[i]
n_found = False
n_count = 0
elif not n_found:
fixed_str += dna_str[i]
return fixed_str

Not the cleanest solution, but does the job
from itertools import accumulate
s = "AaNNNNNNGTC"
for i in reversed(list(enumerate(accumulate('N'*100, add)))):
s=s.replace(i[1], '{'+str(i[0] + 1)+'}')
s = 'Aa{6}GTC'

That's expected, from the documentation:
Return a copy of string s with all occurrences of substring old replaced by new.
One solution could be using regexes. The re.sub can take a callable that generates the replacement string:
import re
def replace_with_count(x):
return "{%d}" % len(x.group())
test = 'AaNNNNNNGTNNC'
print re.sub('N+', replace_with_count, test)

Python String Comparisons Using A Word List

Eventually I will be able to post simple questions like this in a chat room, but for now I must post it. I am still struggling with comparison issues in Python. I have a list containing strings that I obtained from a file. I have a function which takes in the word list (previously created from a file) and some 'ciphertext'. I am trying to Brute Force crack the ciphertext using a Shift Cipher. My issue is the same as with comparing integers. Although I can see when trying to debug using print statements, that my ciphertext will be shifted to a word in the word list, it never evaluates to True. I am probably comparing two different variable types or a /n is probably throwing the comparison off. Sorry for all of the posts today, I am doing lots of practice problems today in preparation for an upcoming assignment.
def shift_encrypt(s, m):
shiftAmt = s % 26
msgAsNumList = string2nlist(m)
shiftedNumList = add_val_mod26(msgAsNumList, shiftAmt)
print 'Here is the shifted number list: ', shiftedNumList
# Take the shifted number list and convert it back to a string
numListtoMsg = nlist2string(shiftedNumList)
msgString = ''.join(numListtoMsg)
return msgString
def add_val_mod26(nlist, value):
newValue = value % 26
print 'Value to Add after mod 26: ', newValue
listLen = len(nlist)
index = 0
while index < listLen:
nlist[index] = (nlist[index] + newValue) % 26
index = index + 1
return nlist
def string2nlist(m):
characters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
numbers = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
newList = []
msgLen = len(m) # var msgLen will be an integer of the length
index = 0 # iterate through message length in while loop
while index < msgLen:
letter = m[index] # iterate through message m
i = 0
while i < 26:
if letter == characters[i]:
newList.append(numbers[i])
i = i + 1
index = index + 1
return newList
def nlist2string(nlist):
characters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
numbers = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
newList = []
nListLen = len(nlist)
index = 0
while index < nListLen:
num = nlist[index]
newNum = num % 26
i = 0
while i < 26:
num1 = newNum
num2 = numbers[i]
if (num1 == num2):
newList.append(characters[i])
i = i + 1
index = index + 1
return newList
def wordList(filename):
fileObject = open(filename, "r+")
wordsList = fileObject.readlines()
return wordsList
def shift_computePlaintext(wlist, c):
index = 0
while index < 26:
newCipher = shift_encrypt(index, c)
print 'The new cipher text is: ', newCipher
wordlistLen = len(wlist)
i = 0
while i < wordlistLen:
print wlist[i]
if newCipher == wlist[i]:
return newCipher
else:
print 'Word not found.'
i = i + 1
index = index + 1
print 'Take Ciphertext and Find Plaintext from Wordlist Function: \n'
list = wordList('test.txt')
print list
plainText = shift_computePlaintext(list, 'vium')
print 'The plaintext was found in the wordlist: ', plainText
When the shift amount = 18, the ciphertext = name which is a word in my wordlist, but it never evaluates to True. Thanks for any help in advance!!

It's hard to be sure with the information we have so far, but here's a guess:
wordsList = fileObject.readlines()
This is going to return you a list of strings with the newlines preserved, like:
['hello\n', 'my\n', 'name\n', 'is\n', 'jesi\n']
So, inside shift_computePlaintext, when you iterate over wlist looking for something that matches the decrypted 'vium', you're looking for a string that matches 'name', and none of them match, including 'name\n'.
In other words, exactly what you suspected.
There are a few ways to fix this, but the most obvious are to use wlist[i].strip() instead of wlist[i], or to strip everything in the first place by using something like wordsList = [line.strip() for line in fileObject] instead of wordsList = fileObject.readlines().
A few side notes:
There is almost never a good reason to call readlines(). That returns a list of lines that you can iterate over… but the file object itself was already an iterable of lines that you can iterate over. If you really need to make sure it's a list instead of some other kind of iterable, or make a separate copy for later, or whatever, just call list on it, as you would with any other iterable.
You should almost never write a loop like this:
index = 0
while index < 26:
# ...
index = index + 1
Instead, just do this:
for index in range(26):
It's easier to read, harder to get wrong (subtle off-by-one errors are responsible for half the frustrating debugging you will do in your lifetime), etc.
And if you're looping over the length of a collection, don't even do that. Instead of this:
wordlistLen = len(wlist)
i = 0
while i < wordlistLen:
# ...
word = wlist[i]
# ...
i = i + 1
… just do this:
for word in wlist:
… or, if you need both i and word (which you occasionally do):
for i, word in enumerate(wlist):
Meanwhile, if the only reason you're looping over a collection is to check each of its values, you don't even need that. Instead of this:
wordlistLen = len(wlist)
while i < wordlistLen:
print wlist[i]
if newCipher == wlist[i]:
return newCipher
else:
print 'Word not found.'
i = i + 1
… just do this:
if newCipher in wlist:
return newCipher
else:
print 'Word not found.'
Here, you've actually got one of those subtle bugs: you print 'Word not found' over and over, instead of only printing it once at the end if it wasn't found.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Run Length Encoding in python - python

Related

How to group consecutive letters in a string in Python?

How to remove Triplicate Letters in Python

Split a string, loop through it character by character, and replace specific ones?

Removing all occurrences of a letter and replacing with a count of how many errors

Python String Comparisons Using A Word List

Categories

Resources